[Tux3] Patch : Data Deduplication in Userspace

Michael Keulkeul kriptomik at gmail.com
Wed Feb 25 23:39:37 PST 2009

On Wed, Feb 25, 2009 at 8:20 PM, Chinmay Kamat <chinmaykamat at gmail.com> wrote:

> We had thought of using a smaller hash function. However, the following
> issue arises: the hash value of a block being written to disk is
> calculated and compared in the tree index. If a match is found, we are
> never sure whether the blocks are identical or it's a hash collision.

Still, you can't be sure even with a 512-bit hash...
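
To put rough numbers on "not sure", here is a minimal sketch of the usual
birthday-bound approximation. It assumes the hash output is uniformly
distributed; the function name and the block counts are only illustrative and
are not taken from the patch.

```python
import math


def collision_probability(num_blocks: int, hash_bits: int) -> float:
    """Approximate chance that at least two of num_blocks distinct blocks
    share a hash value, using p ~= 1 - exp(-n(n-1) / 2^(bits+1)).

    Assumes an ideal, uniformly distributed hash of hash_bits bits.
    """
    exponent = -num_blocks * (num_blocks - 1) / 2.0 ** (hash_bits + 1)
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(exponent)


if __name__ == "__main__":
    for bits in (64, 160, 512):
        print(bits, collision_probability(2 ** 40, bits))
```

Under that assumption the probability is never exactly zero, whatever the
width, but it shrinks very quickly as the hash gets wider.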

> So we
> need to do a byte-by-byte comparison of the 2 blocks: the current block
> being written and the block pointed to by the matching tree entry. This
> would mean doing a disk read to fetch the block pointed to by the tree
> entry. So each detection of a duplicate block will have the overhead of a
> block read.

Yes, but if you process blocks asynchronously and make sure that the list of
blocks being processed is seek-friendly, this should not cost that much, and
maybe the block to compare with is already in cache...
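
A minimal sketch of that idea, in Python for brevity rather than as Tux3 code:
queue up the hash matches, sort them by on-disk block number so the reads
happen in ascending order, and consult a cache before touching the disk. The
names `verify_candidates`, `read_block` and `cache` are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def verify_candidates(
    candidates: Iterable[Tuple[int, bytes]],
    read_block: Callable[[int], bytes],
    cache: Dict[int, bytes],
) -> List[Tuple[int, bool]]:
    """Confirm queued hash matches with a byte-by-byte compare.

    candidates: (existing_block_number, new_data) pairs whose hashes matched.
    Returns (existing_block_number, is_duplicate) in ascending block order.
    """
    results = []
    # Sorting by block number keeps the reads in one sweep across the disk.
    for blocknr, new_data in sorted(candidates, key=lambda c: c[0]):
        old_data = cache.get(blocknr)
        if old_data is None:
            # Cache miss: this is the extra block read discussed above.
            old_data = read_block(blocknr)
            cache[blocknr] = old_data
        results.append((blocknr, old_data == new_data))
    return results
```

The write path would only queue candidates; a background pass could then call
something like this and turn confirmed matches into shared references.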

> On the other hand, in the case of a larger hash (SHA1), when 2 hash values
> match, the blocks should be duplicates, assuming the chances of collision
> with large hashes are very remote. Probably more remote than hw failure.

Very true. Would it be difficult to add block compare to your current
implementation? If not, letting people choose in the final patch is perhaps a
good option. What do you think?
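
To show what such a choice could look like, here is a toy in-memory sketch
(Python, not the patch itself). The class `DedupStore`, the `verify` flag and
the `key_bytes` parameter are all made up for illustration; `key_bytes` only
exists so that a short key can be used to provoke collisions in a test.

```python
import hashlib
from typing import Dict, List


class DedupStore:
    """Toy block store with hash-based dedup and an optional compare step."""

    def __init__(self, verify: bool = True, key_bytes: int = 20):
        self.verify = verify        # True: confirm matches byte by byte
        self.key_bytes = key_bytes  # 20 = full SHA1 digest, less = truncated
        self.blocks: List[bytes] = []            # stand-in for the disk
        self.index: Dict[bytes, List[int]] = {}  # hash key -> block numbers

    def write(self, data: bytes) -> int:
        """Store data and return the block number that now holds it."""
        key = hashlib.sha1(data).digest()[: self.key_bytes]
        for blocknr in self.index.get(key, []):
            # Without verify, a hash match alone is trusted as a duplicate.
            if not self.verify or self.blocks[blocknr] == data:
                return blocknr
        # No confirmed duplicate: allocate a new block and index it.
        self.blocks.append(data)
        blocknr = len(self.blocks) - 1
        self.index.setdefault(key, []).append(blocknr)
        return blocknr
```

With `verify=True` a collision costs one extra compare per candidate and never
merges different data; with `verify=False` the store trusts the hash alone,
which is only as safe as the key is wide.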

> The tree contains only the first 64 bits of the SHA1 hash,
> so there is some optimization (we are working on handling collisions
> here).
> About the SHA1 implementation, as Daniel mentioned in his mail, we
> will be looking at implementations other than openssl before the
> kernel port.
> Regards,
> Gaurav Tungatkar
> Chinmay Kamat
> Kushal Dalmia
> Amey Magar
> _______________________________________________
> Tux3 mailing list
> Tux3 at tux3.org
> http://mailman.tux3.org/cgi-bin/mailman/listinfo/tux3