[Tux3] Patch : Data Deduplication in Userspace

Philipp Marek philipp.marek at emerion.com
Wed Feb 25 00:58:17 PST 2009

On Wednesday, 25 February 2009, Daniel Phillips wrote:
> Anyway, there is nothing magic about SHA1.  We certainly do not require
> cryptographic security for a dedup hash.  Maybe we should look for a
> more efficient hash than SHA1.
If you want to go that way, I recently read some interesting work: 

	Performance in Practice of String Hashing Functions

This proposes a class of hash functions that produce word-sized hash values
using five operations per input character (which, I expect, could be changed
to per input word). For a 4kB block and 64-bit words that would give
	4kB / 8 bytes => 512 words per block
	times 5 operations per word
	=> 2560 operations per block
which looks very nice.

(Maybe that could be done in SSE or something like that, too.)

If you just need some hash value to find candidate blocks, and want (or need)
to compare the *entire* block data anyway, this might do the trick.
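
To make the idea concrete, here is a rough sketch in C, not a tested
proposal. It assumes a shift-add-xor style step (two shifts, two adds, one
xor per input unit, which is my reading of the class the paper describes)
applied per 64-bit word instead of per character. The names block_hash and
blocks_identical, the shift amounts 5 and 2, and the seed parameter are all
made up for illustration; none of this is existing Tux3 code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 4096u   /* 4kB block, i.e. 512 64-bit words */

/* Hypothetical shift amounts; the best values would need measuring. */
enum { SHIFT_LEFT = 5, SHIFT_RIGHT = 2 };

/*
 * Word-wise shift-add-xor hash over one block.
 * Per word: two shifts, two adds, one xor = five operations.
 */
uint64_t block_hash(const unsigned char *block, uint64_t seed)
{
    uint64_t h = seed;

    for (size_t i = 0; i < BLOCK_SIZE; i += sizeof(uint64_t)) {
        uint64_t w;

        /* memcpy avoids alignment and aliasing problems. */
        memcpy(&w, block + i, sizeof w);
        h ^= (h << SHIFT_LEFT) + (h >> SHIFT_RIGHT) + w;
    }
    return h;
}

/*
 * The hash is not collision resistant, so it only selects candidates;
 * the full block contents are compared before declaring a duplicate.
 */
int blocks_identical(const unsigned char *a, const unsigned char *b,
                     uint64_t seed)
{
    if (block_hash(a, seed) != block_hash(b, seed))
        return 0;
    return memcmp(a, b, BLOCK_SIZE) == 0;
}
```

In a real dedup path the hash of each block would be computed once and
stored in an index, so only the memcmp() would run per candidate pair; the
sketch recomputes both hashes just to stay self-contained.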


