[Tux3] Patch : Data Deduplication in Userspace

Philipp Marek philipp.marek at emerion.com
Wed Feb 25 03:30:51 PST 2009

On Wednesday, 25 February 2009, Christensen Stefan wrote:
> Behalf Of Daniel Phillips
> Sent: Wednesday, February 25, 2009 10:39 AM
> It should be a cryptographically secure hash, just to make sure
> it is collision resistant. 
That's the question ... if it's "cryptographically secure", it means (AFAIU) 
that it's "hard" to find collisions ... but they're not impossible.
Really, collisions are *guaranteed* to exist, because there are far more 
possible blocks than hash values; and the more blocks a filesystem holds 
(some TB, anyone?), the more likely it is that two of them get the same 
hash value.
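
To put rough numbers on that risk, here is a minimal sketch using the standard 
birthday approximation p ~ 1 - exp(-n(n-1) / 2^(b+1)) for n blocks and a b-bit 
hash. The function name, the 4 KiB block size and the 4 TiB filesystem size are 
assumptions picked for illustration, and the formula assumes the hash behaves 
like a uniform random function.

```python
import math

def collision_probability(num_blocks: int, hash_bits: int) -> float:
    """Approximate chance that at least two of num_blocks distinct blocks
    share a hash value, for an ideal hash_bits-wide hash (birthday bound)."""
    exponent = -num_blocks * (num_blocks - 1) / 2.0 ** (hash_bits + 1)
    # -expm1(x) equals 1 - exp(x) but stays accurate for very small |x|.
    return -math.expm1(exponent)

# Assumed example: 4 TiB of distinct data in 4 KiB blocks = 2**30 blocks.
blocks = (4 * 2**40) // 4096
for bits in (32, 64, 128, 256):
    print(bits, collision_probability(blocks, bits))
```

With these assumptions a 32-bit hash collides almost certainly, while a 256-bit 
hash gives a probability that is astronomically small but still not zero.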

Therefore I asked whether the risk is acceptable ... there was a filesystem 
(I think more than 10 years ago; I couldn't find a link) that tried 
deduplication based on a hash alone - but it got shot down, because without 
*verification* that the data is identical you might *silently* shoot yourself 
(and all others) in the foot.

> It might be an idea to follow the
> SHA-3 competition by NIST. It can be found here:
> http://csrc.nist.gov/groups/ST/hash/sha-3/index.html
> An off-site wiki about SHA-3 can be found here:
> http://ehash.iaik.tugraz.at/wiki/The_SHA-3_Zoo
> The idea behind SHA-3 is to find a hash that is as resilient as
> SHA-2 (256,512bit), but a lot faster.
But if verification is needed anyway, then something *much* simpler (and 
*much* faster) would be ok, too.
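
A minimal sketch of that idea, assuming an in-memory block store: a cheap 
checksum (CRC-32 here, purely as an example) only narrows down the candidates, 
and a byte-for-byte comparison decides whether two blocks really are 
duplicates. The class and method names are hypothetical and not taken from 
the Tux3 patch.

```python
import zlib
from collections import defaultdict

class DedupStore:
    """Toy block store: cheap hash to find candidates, full compare to verify."""

    def __init__(self, hash_fn=zlib.crc32):
        self.hash_fn = hash_fn
        self.blocks = []                   # unique block contents, index = block id
        self.by_hash = defaultdict(list)   # hash value -> ids of blocks with that hash

    def add(self, data: bytes) -> int:
        """Store data and return its block id, reusing an identical block if any."""
        key = self.hash_fn(data)
        for block_id in self.by_hash[key]:
            # Verification step: a matching hash alone is never trusted.
            if self.blocks[block_id] == data:
                return block_id
        # No identical block found (new hash, or a mere hash collision).
        self.blocks.append(data)
        block_id = len(self.blocks) - 1
        self.by_hash[key].append(block_id)
        return block_id

store = DedupStore()
first = store.add(b"x" * 4096)
second = store.add(b"x" * 4096)   # identical content, shares the first block
third = store.add(b"y" * 4096)    # different content, gets its own block
print(first, second, third, len(store.blocks))
```

Because every hash match is verified, a collision only costs one extra 
comparison and can never merge two different blocks, so the hash can be chosen 
for speed rather than for collision resistance.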


