[Tux3] Patch : Data Deduplication in Userspace

OGAWA Hirofumi hirofumi at mail.parknet.co.jp
Wed Feb 25 02:55:29 PST 2009


Philipp Marek <philipp.marek at emerion.com> writes:

> On Mittwoch, 25. Februar 2009, OGAWA Hirofumi wrote:
>> OGAWA Hirofumi <hirofumi at mail.parknet.co.jp> writes:
>> > In the kernel, the crypto subsystem has sha1 (and some other hashes).
>> > And some systems can use hardware for it (and IIRC, it can calc the
>> > hash asynchronously, if you want). And the algorithms would be
>> > selectable without changing the interface for it. So, the crypto
>> > stuff may be a good choice.
>>
>> BTW, IIRC, the asynchronous stuff on hardware was a good optimization
>> when I was playing with IPSEC.
> Well, doing that asynchronously in hardware would probably mean some latency,
> if e.g. a MB of data has to be hashed in 4kB blocks.
>
> I'm not sure what the best way is, performance-wise; I could see a benefit of
> fast (in-CPU) hash calculation (with something like I mentioned earlier), *if*
> the data is still in the CPU cache later when the comparison is done ... but
> that's possibly some 0.03 seconds later, and so that cache would be spilled
> anyway.
>
> Should the de-duplication be *fully* asynchronous, i.e. done while the rest
> of the (IO) system is idle? Or would you bet the data on the hash being
> collision-free, so that no direct comparison is necessary?
> (Of course, using a 32kBit hash key for a 4kB block would work, but be 
> meaningless ;-)

Well, since it was an embedded system, the CPU was not fast enough. And
the hardware calculation is fast, because that hardware is optimized for
crypto.

Probably: set up the asynchronous hardware for the hash, then do other
jobs during the calculation. On the hardware's completion callback, go
back to the initial job with the hash.

Well, IPSEC does crypto on the packet data, so it needed much more CPU
power than hashing alone. So the benefit may be small or nothing on
faster multi-core CPUs.

However, I guess it still has some benefit, more or less. I think we
will calculate the hash in stage_delta(). So we submit some pages to the
async hardware, then I guess we would work on other pages immediately.
And once the hash is done, we would submit the bio after the data
compare.
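
To make that flow concrete, here is a rough userspace sketch in Python.
It is not Tux3 code and does not model stage_delta(); everything in it is
an assumption for illustration. A thread pool stands in for the async
hash hardware, a plain list (store) stands in for bio submission, and the
names hash_block, dedup_write, store and index are hypothetical. It only
shows the ordering: submit all hashes first, then for each finished hash
compare the actual data before deciding whether to write.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 4096  # 4 kB blocks, as in the discussion above


def hash_block(data: bytes) -> bytes:
    """Stand-in for the asynchronous hardware hash (SHA-1 here)."""
    return hashlib.sha1(data).digest()


def dedup_write(blocks, store, index, pool):
    """Return one block id per input block, writing only unseen data.

    store: list of blocks already written (hypothetical stand-in for disk)
    index: dict mapping digest -> list of ids in store with that digest
    """
    # Submit every block to the hasher up front, so hashing of later
    # blocks overlaps with the compare/write work on earlier ones.
    pending = [(data, pool.submit(hash_block, data)) for data in blocks]

    ids = []
    for data, future in pending:
        digest = future.result()  # completion point: the hash is ready
        # Compare the data itself, so a hash collision can never merge
        # two different blocks.
        match = next(
            (i for i in index.get(digest, []) if store[i] == data), None
        )
        if match is None:
            store.append(data)  # this is where the bio would be submitted
            match = len(store) - 1
            index.setdefault(digest, []).append(match)
        ids.append(match)
    return ids


if __name__ == "__main__":
    a, b = b"a" * BLOCK_SIZE, b"b" * BLOCK_SIZE
    store, index = [], {}
    with ThreadPoolExecutor(max_workers=2) as pool:
        print(dedup_write([a, b, a], store, index, pool))  # [0, 1, 0]
```

The byte-for-byte compare is kept even when the digest matches, which is
the conservative answer to the "bet the data on the hash" question quoted
above. Whether the extra read is affordable is a separate matter.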

Just an idea though.

Thanks.
-- 
OGAWA Hirofumi <hirofumi at mail.parknet.co.jp>
