[Tux3] Patch : Data Deduplication in Userspace
philipp.marek at emerion.com
Wed Feb 25 01:54:04 PST 2009
On Wednesday, 25 February 2009, OGAWA Hirofumi wrote:
> OGAWA Hirofumi <hirofumi at mail.parknet.co.jp> writes:
> > In kernel, crypto subsystem has sha1 (and some other hashes). And
> > some systems can use hardware for it (and IIRC, it can calc the hash
> > asynchronously, if you want). And the algorithms would be selectable
> > without changing interface for it. So, crypto stuff may be good one.
> BTW, IIRC, asynchronous stuff on hardware was the good optimization when
> I was playing with IPSEC.
Well, doing that asynchronously in hardware would probably mean some latency,
if e.g. 1 MB of data has to be hashed in 4kB blocks.
I'm not sure what the best way is, performance-wise; I could see a benefit in
fast (in-CPU) hash calculation (with something like what I mentioned earlier),
*if* the data is still in the CPU cache later when the comparison is done ... but
that's possibly some 0.03 seconds later, and so that cache would be spilled
Should the de-duplication be *fully* asynchronous, i.e. done while the rest
of the (IO) system is idle? Or would you bet the data on the hash being
collision-free, so that no direct comparison is necessary?
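To make the two options concrete, here is a minimal userspace sketch in Python, not Tux3 code. The function name dedup_blocks, the verify flag, and the in-memory store are all made up for illustration; SHA-1 is used only because it was the hash mentioned above. With verify=True a hash match is confirmed by a direct comparison of the block contents; with verify=False the hash alone is trusted.

```python
import hashlib

BLOCK_SIZE = 4096  # 4 kB blocks, as in the discussion above


def dedup_blocks(data, verify=True):
    """Split data into 4 kB blocks and keep each distinct block once.

    Returns (store, refs): store is the list of unique blocks, and refs
    holds, for every input block, the index of its copy in store.
    All names here are hypothetical and chosen for this sketch only.
    """
    store = []    # unique block contents
    by_hash = {}  # SHA-1 digest -> indices into store with that digest
    refs = []
    for off in range(0, len(data), BLOCK_SIZE):
        block = data[off:off + BLOCK_SIZE]
        digest = hashlib.sha1(block).digest()
        candidates = by_hash.setdefault(digest, [])
        match = None
        for idx in candidates:
            # verify=False bets on the hash being collision-free;
            # verify=True does the direct byte comparison as well.
            if not verify or store[idx] == block:
                match = idx
                break
        if match is None:
            # First time this content is seen: store it and index it.
            store.append(block)
            match = len(store) - 1
            candidates.append(match)
        refs.append(match)
    return store, refs
```

Calling dedup_blocks(data) on 1 MB of input hashes 256 blocks; the original data can be rebuilt with b"".join(store[i] for i in refs). The verify=True path costs one extra memory comparison per hash hit, which is where the cache question above comes in.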
(Of course, using a 32kBit hash key for a 4kB block would work, but be