[Tux3] Patch : Data Deduplication in Userspace
phillips at phunq.net
Wed Feb 25 01:39:10 PST 2009
On Wednesday 25 February 2009, OGAWA Hirofumi wrote:
> Philipp Marek <philipp.marek at emerion.com> writes:
> > On Mittwoch, 25. Februar 2009, Daniel Phillips wrote:
> >> Anyway, there is nothing magic about SHA1. We certainly do not require
> >> cryptographic security for a dedup hash. Maybe we should look for a
> >> more efficient hash than SHA1.
> > If you want to go that way, I recently read some interesting work:
> > Performance in Practice of String Hashing Functions
> > http://www.cs.mu.oz.au/~jz/fulltext/dasfaa97.ps
> > This proposes a class of hashing functions, which give word-sized hash values
> > with five operations per input character (which could be changed to input
> > word, I expect); that would result for 4kB, 64bit words in
> > 4kB / 8 => 512 words per block
> > times 5 operations
> > 2560 operations per block
> > which looks very nice.
> > (Maybe that could be done in SSE or something like that, too.)
> > If you just need some hash value, and want (need) to compare the *entire*
> > block data, this might do the trick.
> In the kernel, the crypto subsystem has sha1 (and some other hashes). And
> some systems can use hardware for it (and IIRC, it can calculate the hash
> asynchronously, if you want). And the algorithm would be selectable
> without changing the interface. So, the crypto stuff may be a good choice.
> BTW, git has this issue too. So git uses either the OpenSSL one or an
> MPL-licensed replacement (from Mozilla).
Fine, so there is a solution and it is just a matter of choosing the
best solution. There is no need for any immediate change to the dedup
prototype in my opinion. For the userspace code, I can add a specific
exception if we consider it necessary. I also like the idea of trying
a few alternative hashes, more because of the chance to increase
performance than the OpenSSL issue.
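
To make the "cheaper hash plus full compare" idea concrete, here is a
rough sketch of what one candidate might look like. It assumes the
shift-add-xor form of hash applied to 64-bit words rather than
characters; the shift constants (5 and 2), the names block_hash and
blocks_identical, and BLOCK_SIZE are illustrative assumptions, not
anything taken from the dedup prototype. Each word costs two shifts,
two adds and one xor, so a 4kB block comes to 512 * 5 = 2560
operations. Because such a hash is not collision resistant, a match is
only a hint and the block data still has to be compared in full.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 4096 /* 4kB block, i.e. 512 64-bit words */

/*
 * Hypothetical helper: shift-add-xor hash over 64-bit words.
 * Five operations per word: two shifts, two adds, one xor.
 * The constants 5 and 2 are assumed for illustration only.
 */
uint64_t block_hash(const void *block, uint64_t seed)
{
	const unsigned char *p = block;
	uint64_t h = seed;

	for (size_t i = 0; i < BLOCK_SIZE; i += sizeof(uint64_t)) {
		uint64_t w;

		memcpy(&w, p + i, sizeof w); /* safe for unaligned buffers */
		h ^= (h << 5) + (h >> 2) + w;
	}
	return h;
}

/*
 * A hash mismatch proves the blocks differ. A hash match is only a
 * candidate, so memcmp over the entire block makes the final decision.
 */
int blocks_identical(const void *a, const void *b)
{
	if (block_hash(a, 0) != block_hash(b, 0))
		return 0;
	return memcmp(a, b, BLOCK_SIZE) == 0;
}
```

Whether something like this actually beats SHA1 in practice would have
to be measured, including how often false hash matches force a wasted
full compare.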