[Tux3] Patch : Data Deduplication in Userspace

Daniel Phillips phillips at phunq.net
Wed Feb 25 01:39:10 PST 2009

On Wednesday 25 February 2009, OGAWA Hirofumi wrote:
> Philipp Marek <philipp.marek at emerion.com> writes:
> > On Wednesday, 25 February 2009, Daniel Phillips wrote:
> >> Anyway, there is nothing magic about SHA1.  We certainly do not require
> >> cryptographic security for a dedup hash.  Maybe we should look for a
> >> more efficient hash than SHA1.
> > If you want to go that way, I recently read some interesting work: 
> >
> > 	Performance in Practice of String Hashing Functions
> > 	http://www.cs.mu.oz.au/~jz/fulltext/dasfaa97.ps
> >
> > This proposes a class of hashing functions that give word-sized hash values
> > with five operations per input character (which could be changed to per input
> > word, I expect). For a 4kB block and 64-bit words, that would result in
> > 	4kB / 8 bytes => 512 words per block
> > 	times 5 operations per word
> > 	=> 2560 operations per block
> > which looks very nice.
> >
> > (Maybe that could be done in SSE or something like that, too.)
> >
> >
> > If you just need some hash value, and want (need) to compare the *entire* 
> > block data, this might do the trick.
> In the kernel, the crypto subsystem has sha1 (and some other hashes), and
> some systems can use hardware for it (and IIRC, it can calculate the hash
> asynchronously, if you want). The algorithm would also be selectable
> without changing the interface. So the crypto subsystem may be a good choice.
> BTW, git has this issue too, so git uses the OpenSSL implementation and an
> MPL-licensed replacement (from Mozilla).

Fine, so there is a solution and it is just a matter of choosing the
best solution.  There is no need for any immediate change to the dedup
prototype in my opinion.  For the userspace code, I can add a specific
exception if we consider it necessary.  I also like the idea of trying
a few alternative hashes, more because of the chance to increase
performance than the OpenSSL issue.
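
To make the "five operations per word" estimate quoted above concrete, here
is a rough sketch of what a word-wise hash of that kind could look like. It
assumes the class of functions in the cited paper is the shift-add-xor
family; the names block_hash and blocks_identical, the shift amounts 5 and 2,
and the 4096-byte block size are illustrative assumptions, not anything from
the Tux3 dedup prototype. Because such a hash is not collision resistant, the
sketch treats a hash match only as a hint and still compares the whole block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 4096	/* assumed block size, matching the 4kB estimate */

/*
 * Shift-add-xor step applied per 64-bit word instead of per character:
 * two shifts, two adds and one xor, i.e. five operations per word.
 * The shift amounts are illustrative, not tuned. The result depends on
 * host byte order, which is fine for an in-memory dedup hint only.
 */
static uint64_t block_hash(const unsigned char *block, size_t len, uint64_t seed)
{
	uint64_t h = seed;

	assert(len % sizeof(uint64_t) == 0);
	for (size_t i = 0; i < len; i += sizeof(uint64_t)) {
		uint64_t w;

		memcpy(&w, block + i, sizeof w);	/* safe for unaligned input */
		h ^= (h << 5) + (h >> 2) + w;
	}
	return h;
}

/* A cheap hash only nominates candidates; identity needs a full compare. */
static int blocks_identical(const unsigned char *a, const unsigned char *b,
			    uint64_t seed)
{
	if (block_hash(a, BLOCK_SIZE, seed) != block_hash(b, BLOCK_SIZE, seed))
		return 0;
	return memcmp(a, b, BLOCK_SIZE) == 0;
}
```

With a hash like this, the full memcmp on a match is not optional: two
different blocks can hash to the same value, so the hash can only rule
duplicates out, never confirm them.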



Tux3 mailing list
Tux3 at tux3.org
