[Tux3] Patch : Data Deduplication in Userspace

Philipp Marek philipp.marek at emerion.com
Wed Feb 25 23:22:57 PST 2009


On Wednesday, 25 February 2009, Chinmay Kamat wrote:
> We had thought of using a smaller hash function. However, the following
> issue arises: the hash value of a block being written to disk is
> calculated and compared against the tree index. If a match is found, we
> are never sure whether the blocks are identical or it's a hash collision.
> So we need to do a byte-by-byte comparison of the two blocks: the block
> currently being written and the block pointed to by the matching tree
> entry. This would mean doing a disk read to fetch the block pointed to by
> the tree entry. So each detection of a duplicate block carries the
> overhead of a block read.

I thought about that last night, and arrived at an idea similar to Michael's:

On Wednesday, 25 February 2009, Michael Keulkeul wrote:
> If someone asked me, I would answer that verification is necessary and
> that using a weaker, smaller hash is fine.
> Storing the block with its hash, marked "non-deduplicated", is fine; just
> dedup it later with a background process when the filesystem has some
> idle IOPS to spend on it, and mark it "deduplicated" when done.
> I have the feeling that the Tux3 design is well suited to such a thing
> (multiple trees, just add one to the forest).
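
Roughly, I picture that background pass like the toy below: writes stay
cheap and only record a weak hash plus a "not deduplicated" flag, and the
verify-and-share step is deferred to idle time.  Every structure and name
here is mine; nothing of this exists in Tux3:

/*
 * Sketch of the "dedup later" idea: the foreground write just records a
 * weak hash and flags the block NOT_DEDUPED; a background pass spends idle
 * I/O on the verify-and-share step.  Illustrative only.
 */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define BLOCK_SIZE 4096
#define MAX_BLOCKS 64

enum dedup_state { NOT_DEDUPED, DEDUPED, SHARED_AWAY };

static struct {
    uint8_t data[BLOCK_SIZE];
    uint32_t hash;
    enum dedup_state state;
    int points_to;              /* valid when state == SHARED_AWAY */
} blk[MAX_BLOCKS];
static int nblocks;

static uint32_t weak_hash(const uint8_t *d)
{
    uint32_t h = 2166136261u;   /* FNV-1a again, weak on purpose */
    for (int i = 0; i < BLOCK_SIZE; i++)
        h = (h ^ d[i]) * 16777619u;
    return h;
}

/* Foreground write: cheap -- hash, store, flag for later. */
static int block_write(const uint8_t *data)
{
    int n = nblocks++;
    memcpy(blk[n].data, data, BLOCK_SIZE);
    blk[n].hash = weak_hash(data);
    blk[n].state = NOT_DEDUPED;
    return n;
}

/* Background pass: run when the filesystem has idle IOPS to spend. */
static void dedup_pass(void)
{
    for (int i = 0; i < nblocks; i++) {
        if (blk[i].state != NOT_DEDUPED)
            continue;
        for (int j = 0; j < i; j++) {
            if (blk[j].state == SHARED_AWAY || blk[j].hash != blk[i].hash)
                continue;
            /* Same weak hash: verify byte by byte before sharing. */
            if (memcmp(blk[j].data, blk[i].data, BLOCK_SIZE) == 0) {
                blk[i].state = SHARED_AWAY;
                blk[i].points_to = j;
                break;
            }
        }
        if (blk[i].state == NOT_DEDUPED)
            blk[i].state = DEDUPED;     /* unique block; nothing to share */
    }
}

int main(void)
{
    uint8_t a[BLOCK_SIZE];
    memset(a, 0xaa, sizeof a);
    int x = block_write(a), y = block_write(a);
    dedup_pass();
    printf("block %d state=%d, block %d state=%d points_to=%d\n",
           x, blk[x].state, y, blk[y].state, blk[y].points_to);
    return 0;
}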

How about not doing deduplication on *every* block, but only for specially 
marked files?

I.e. the whole root filesystem would not be marked; but if I create a big file 
that is to be used as a loopback mount, and I want to keep several versions of 
it, I could mark it for deduplication (which reminds me of copy-on-write 
semantics), and that would cause it to be indexed in some way.

If I need a copy, I just mark the file to be written as "to be de-duplicated", 
and the copying process finds the duplicate blocks.
(Or maybe there's some special "cowcp" for Tux3, since a real copy only needs 
the extents refcounted?)

Then far fewer hash values would be computed, therefore fewer collisions, and 
there would be *much* less read traffic.
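
Just to make the marking concrete: in userspace it could be as little as an
attribute that the copy tool and the filesystem agree on.  A throwaway sketch
using an ordinary user xattr follows; the "user.dedup" name is my invention,
and a real Tux3 would presumably use an inode flag or its own attribute:

/*
 * Mark a file for deduplication via a user xattr.  The attribute name is
 * purely a made-up convention for this sketch, not anything Tux3 defines.
 */
#include <stdio.h>
#include <sys/xattr.h>

/* Mark a file so the filesystem (or a dedup daemon) indexes its blocks. */
static int mark_for_dedup(const char *path)
{
    return setxattr(path, "user.dedup", "1", 1, 0);
}

/* Ask whether a file is marked. */
static int is_marked_for_dedup(const char *path)
{
    char val[2];
    return getxattr(path, "user.dedup", val, sizeof val) > 0;
}

int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s <file>\n", argv[0]);
        return 1;
    }
    if (mark_for_dedup(argv[1]) != 0) {
        perror("setxattr");
        return 1;
    }
    printf("%s marked: %d\n", argv[1], is_marked_for_dedup(argv[1]));
    return 0;
}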


Well, maybe the SSD people would create a new command: "compare this data 
against blocks A, B, C and D; if there's a match, return the block number; 
else write it into block E".
As SSDs have no seek time, this should on average be faster than normal 
writes.

[ A similar command would be good for RAID sets and online verification, BTW ]
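
Spelled out as a plain C model, the contract I mean would be something like
the sketch below.  No such ATA/SCSI command exists today; the in-memory
"flash" array is only there to pin the semantics down:

/*
 * Model of the proposed device command: compare data against candidate
 * blocks; if one matches, return its number; else write the data into the
 * fallback block and return that.  Hypothetical semantics only.
 */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define BLOCK_SIZE 4096
#define DEV_BLOCKS 16

static uint8_t flash[DEV_BLOCKS][BLOCK_SIZE];   /* stand-in for the SSD */

static uint64_t compare_or_write(const uint8_t *data,
                                 const uint64_t *candidates, int ncand,
                                 uint64_t fallback)
{
    for (int i = 0; i < ncand; i++)
        if (memcmp(flash[candidates[i]], data, BLOCK_SIZE) == 0)
            return candidates[i];               /* duplicate: no write done */
    memcpy(flash[fallback], data, BLOCK_SIZE);
    return fallback;                            /* no match: data now in E */
}

int main(void)
{
    uint8_t buf[BLOCK_SIZE];
    memset(buf, 0xaa, sizeof buf);

    uint64_t cand[] = { 1, 2, 3, 4 };           /* "blocks A, B, C and D" */
    printf("first write lands in %llu\n",
           (unsigned long long)compare_or_write(buf, cand, 4, 5));
    cand[0] = 5;                                /* now include the new block */
    printf("second write resolves to %llu\n",
           (unsigned long long)compare_or_write(buf, cand, 4, 6));
    return 0;
}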

> On the other hand, in the case of a larger hash (SHA-1), when two hash
> values match the blocks should be duplicates, assuming the chances of
> collision with large hashes are very remote. Probably more remote than
> hardware failure.
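
To put rough numbers on "remote" (back-of-the-envelope, my figures): with a
160-bit hash like SHA-1, the birthday bound puts the chance of *any* collision
among N distinct blocks at roughly N^2 / 2^161.  For N = 2^32 blocks of 4 KiB
(16 TiB of data) that is about 2^-97, i.e. around 10^-29, which is indeed many
orders of magnitude below the unrecoverable read error rates disk vendors
quote (on the order of one per 10^14 to 10^15 bits read).
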
Well, IMO even a remote chance of unseen data damage is bad.


Regards,

Phil

