[Tux3] Patch : Data Deduplication in Userspace

OGAWA Hirofumi hirofumi at mail.parknet.co.jp
Wed Feb 25 01:30:05 PST 2009


Philipp Marek <philipp.marek at emerion.com> writes:

> On Wednesday, 25 February 2009, Daniel Phillips wrote:
>> Anyway, there is nothing magic about SHA1.  We certainly do not require
>> cryptographic security for a dedup hash.  Maybe we should look for a
>> more efficient hash than SHA1.
> If you want to go that way, I recently read some interesting work: 
>
> 	Performance in Practice of String Hashing Functions
> 	http://www.cs.mu.oz.au/~jz/fulltext/dasfaa97.ps
>
> This proposes a class of hashing functions, which give word-sized hash values 
> with five operations per input character (which could be changed to input 
> word, I expect); that would result for 4kB, 64bit words in
> 	4kB / 8 => 512 words per block
> 	times 5 operations
> 	2560 operations per block
> which looks very nice.
>
> (Maybe that could be done in SSE or something like that, too.)
>
>
> If you just need some hash value, and want (need) to compare the *entire* 
> block data, this might do the trick.
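
The estimate quoted above can be sketched roughly as follows. This is only
an illustration of a shift-add-xor style hash applied per 64-bit word and
followed by a full block compare; the names block_hash and
blocks_identical, the shift amounts, and the seed are assumptions for the
example, not values taken from the paper or from any Tux3 code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 4096	/* assumed block size, as in the 4kB estimate */

/*
 * Shift-add-xor style hash, one step per 64-bit word instead of per
 * character.  The shifts (5 and 2) and the seed are illustrative only.
 */
static uint64_t block_hash(const void *data, size_t len, uint64_t seed)
{
	const unsigned char *p = data;
	uint64_t h = seed;
	size_t i;

	assert(len % sizeof(uint64_t) == 0);
	for (i = 0; i < len; i += sizeof(uint64_t)) {
		uint64_t w;

		memcpy(&w, p + i, sizeof(w));	/* safe for unaligned input */
		h ^= (h << 5) + (h >> 2) + w;	/* 2 shifts, 2 adds, 1 xor */
	}
	return h;
}

/*
 * A weak hash only nominates candidates: two blocks count as duplicates
 * only if the hashes match and the entire contents compare equal.
 */
static int blocks_identical(const void *a, const void *b)
{
	if (block_hash(a, BLOCK_SIZE, 0) != block_hash(b, BLOCK_SIZE, 0))
		return 0;
	return memcmp(a, b, BLOCK_SIZE) == 0;
}
```

For a 4kB block the loop runs 512 times, five operations each, which
matches the per-block count above if loads and loop overhead are ignored.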

In the kernel, the crypto subsystem provides sha1 (and some other
hashes). Some systems can use hardware for it (and IIRC, it can compute
the hash asynchronously, if you want). The algorithm would also be
selectable without changing the interface. So the crypto subsystem may
be a good choice.
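
To illustrate what "selectable without changing the interface" could look
like, here is a small userspace sketch. It is an analogy only, not the
kernel crypto API: struct hash_alg and hash_alg_find are hypothetical
names, and FNV-1a and djb2 are simple stand-ins for real digests such as
sha1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Every algorithm is reached through the same function signature. */
typedef uint64_t (*hash_fn)(const void *data, size_t len);

struct hash_alg {		/* hypothetical descriptor */
	const char *name;
	hash_fn digest;
};

/* 64-bit FNV-1a, used here only as a stand-in algorithm. */
static uint64_t fnv1a64(const void *data, size_t len)
{
	const unsigned char *p = data;
	uint64_t h = 14695981039346656037ULL;

	while (len--) {
		h ^= *p++;
		h *= 1099511628211ULL;
	}
	return h;
}

/* djb2 (h = h * 33 + c), a second stand-in algorithm. */
static uint64_t djb2(const void *data, size_t len)
{
	const unsigned char *p = data;
	uint64_t h = 5381;

	while (len--)
		h = h * 33 + *p++;
	return h;
}

static const struct hash_alg hash_algs[] = {
	{ "fnv1a64", fnv1a64 },
	{ "djb2", djb2 },
};

/* Look an algorithm up by name; returns NULL if it is not registered. */
static const struct hash_alg *hash_alg_find(const char *name)
{
	size_t i;

	assert(name != NULL);
	for (i = 0; i < sizeof(hash_algs) / sizeof(hash_algs[0]); i++)
		if (strcmp(hash_algs[i].name, name) == 0)
			return &hash_algs[i];
	return NULL;
}
```

A caller would pick the algorithm once by name and then call
alg->digest(block, len) everywhere, so switching hashes later only
changes the name passed to the lookup.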

BTW, git has this issue too. Git can use the openssl sha1 or an
MPL-licensed replacement (from mozilla).
-- 
OGAWA Hirofumi <hirofumi at mail.parknet.co.jp>

_______________________________________________
Tux3 mailing list
Tux3 at tux3.org
http://mailman.tux3.org/cgi-bin/mailman/listinfo/tux3


