[Tux3] More xattr design details
Daniel Phillips
phillips at phunq.net
Wed Sep 10 15:52:12 PDT 2008
The parallel design/implementation effort for xattrs is now well under
way. There was a lot of obsessing over the wisdom of the xattr atom
idea vs storing the literal ascii xattr names. A nasty problem with
atoms is, how can we know the filesystem will never fill up with atom
names that were once used and now are just historical garbage?
Choose a solution:
1) Ignore the problem and hope it never bites (typical)
2) Refcount all atoms and delete any that fall to zero
3) Garbage collect unused atoms from time to time
For the immediate future Tux3 is using solution number 1, which is to
say, just let the on-disk atom table grow as new attribute types arrive
and do not worry too much about how big it gets.
For production Tux3 we need a better solution that acts just like the
traditional method of storing literal ascii strings. It looks like
refcounting is the best idea, but we have to do it efficiently.
Fortunately, that looks pretty easy, and even better, the methods we
need to apply are almost the same as for file atimes. So there is some
design synergy there, which helps ameliorate the pain.
Persistent atom reference counting
Reference counting means incrementing an atom refcount each time a new
xattr using the atom is stored in an inode, and decrementing it each
time the xattr is removed, either because the inode using it was
removed or the xattr body was set to empty. This must be done
*persistently*, which should give you a good idea about how easy it is
to lose the performance plot here. But of course, Tux3 is not going to
lose the plot: it is going to do this refcounting with as close to
zero performance cost as we can possibly manage. Otherwise, if the
overhead of the atom idea is noticeable vs some cruder method then we
drop the atom idea, that is all there is to that.
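To make the bookkeeping concrete, here is a minimal in-memory sketch of atom refcounting. It is not the Tux3 code: the names atom_table, atom_get, atom_put and the fixed MAX_ATOMS limit are all made up for illustration, and the hard part discussed above (making the counts persistent cheaply) is left out entirely.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_ATOMS 64	/* hypothetical fixed table size, for the sketch only */

struct atom_entry {
	char *name;		/* xattr name, NULL when the slot is free */
	unsigned refcount;	/* number of stored xattrs using this atom */
};

static struct atom_entry atom_table[MAX_ATOMS];

/* Find or create the atom for name and take one reference.
 * Returns the atom number, or -1 if the table is full or out of memory. */
static int atom_get(const char *name)
{
	int free_slot = -1;

	for (int i = 0; i < MAX_ATOMS; i++) {
		if (!atom_table[i].name) {
			if (free_slot < 0)
				free_slot = i;	/* remember first free slot */
		} else if (!strcmp(atom_table[i].name, name)) {
			atom_table[i].refcount++;
			return i;
		}
	}
	if (free_slot < 0)
		return -1;

	char *copy = malloc(strlen(name) + 1);
	if (!copy)
		return -1;
	strcpy(copy, name);
	atom_table[free_slot].name = copy;
	atom_table[free_slot].refcount = 1;
	return free_slot;
}

/* Drop one reference; the atom name is deleted when the count hits zero,
 * so names that are no longer used do not pile up as garbage. */
static void atom_put(int atom)
{
	assert(atom >= 0 && atom < MAX_ATOMS && atom_table[atom].name);
	if (--atom_table[atom].refcount == 0) {
		free(atom_table[atom].name);
		atom_table[atom].name = NULL;
	}
}
```

In this sketch atom_get would be called when an xattr is stored in an inode, and atom_put when the xattr is removed or its inode is deleted. A freed slot is reused by the next new name.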
Extended attributes in cache
The inode->xcache mechanism has now landed, where the xcache is a block
of memory holding the (small) xattrs that currently belong to a given
inode. This block of memory is realloced as necessary when new xattrs
are set on the inode. It is never shrunk in the current implementation,
which maybe should be corrected, but when an inode is evicted from
cache then so is its xattr cache. That constitutes an unconscionably
lazy but effective way to shrink an xcache.
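A rough sketch of what such a grow-only cache could look like follows. The layout and the names (struct xcache fields, xcache_add, xcache_lookup) are assumptions for illustration and do not claim to match the real inode->xcache code. Each record is an atom number, a length and the body, packed back to back in one realloced block.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct xcache {
	size_t used;		/* bytes of records stored in data[] */
	size_t cap;		/* bytes allocated for data[] */
	unsigned char data[];	/* records: atom, length, body, ... */
};

/* Append one xattr, growing the block with realloc when needed.
 * Returns the (possibly moved) cache, or NULL on allocation failure,
 * in which case the old cache is left untouched. */
static struct xcache *xcache_add(struct xcache *xc, unsigned short atom,
				 const void *body, unsigned short len)
{
	size_t need = sizeof(atom) + sizeof(len) + len;
	size_t used = xc ? xc->used : 0;
	size_t cap = xc ? xc->cap : 0;

	if (used + need > cap) {
		size_t newcap = cap ? cap : 64;
		while (newcap < used + need)
			newcap *= 2;	/* grow only, never shrink */
		struct xcache *bigger = realloc(xc, sizeof(*xc) + newcap);
		if (!bigger)
			return NULL;
		bigger->used = used;
		bigger->cap = newcap;
		xc = bigger;
	}
	assert(xc->used + need <= xc->cap);

	/* memcpy avoids unaligned access to the packed record header */
	unsigned char *p = xc->data + xc->used;
	memcpy(p, &atom, sizeof(atom));
	memcpy(p + sizeof(atom), &len, sizeof(len));
	if (len)
		memcpy(p + sizeof(atom) + sizeof(len), body, len);
	xc->used += need;
	return xc;
}

/* Find the body stored for atom. Returns its length, or -1 if absent. */
static int xcache_lookup(const struct xcache *xc, unsigned short atom,
			 const unsigned char **body)
{
	size_t pos = 0;

	while (xc && pos < xc->used) {
		unsigned short a, len;
		memcpy(&a, xc->data + pos, sizeof(a));
		memcpy(&len, xc->data + pos + sizeof(a), sizeof(len));
		pos += sizeof(a) + sizeof(len);
		if (a == atom) {
			*body = xc->data + pos;
			return len;
		}
		pos += len;
	}
	return -1;
}
```

Evicting the inode then amounts to a single free() of the block, which is the lazy shrink described above.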
In time the xcache arrangement will be elaborated to support referencing
large xattrs cached in kernel address_space mappings (inode->map in
tux3 userspace). This is where we get to have a little fun by allowing
such xattrs to be accessed exactly like files, just with a different
open method.
Atoms for xattr data too?
One obvious direction we could investigate is whether the atom concept
serves well for xattr bodies as well as names. So we could do global
compression by mapping (small) xattr values through an atom cache. And
we can also think about doing this for immediate file data, which in
Tux3 is designed to be handled almost identically to xattrs. More fun.
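Purely as a thought experiment, value atoms might look something like the sketch below, where identical small xattr bodies are stored once and inodes would hold only the atom number. Nothing here exists in Tux3; value_table, value_intern and MAX_VALUES are hypothetical, and a real version would want a hash lookup plus the same refcounting as name atoms.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_VALUES 128	/* hypothetical limit, for the sketch only */

struct value_atom {
	unsigned char *bytes;	/* the single shared copy of the value */
	size_t len;
};

static struct value_atom value_table[MAX_VALUES];
static int value_count;

/* Return the atom number for this value, storing a copy only the first
 * time it is seen. Returns -1 if the table is full or out of memory. */
static int value_intern(const void *bytes, size_t len)
{
	assert(bytes && len > 0);

	for (int i = 0; i < value_count; i++)
		if (value_table[i].len == len &&
		    !memcmp(value_table[i].bytes, bytes, len))
			return i;	/* already stored: share it */

	if (value_count == MAX_VALUES)
		return -1;
	unsigned char *copy = malloc(len);
	if (!copy)
		return -1;
	memcpy(copy, bytes, len);
	value_table[value_count].bytes = copy;
	value_table[value_count].len = len;
	return value_count++;
}
```

If a million inodes all carry the same short value, this scheme stores the bytes once and a small number per inode, which is the global compression hinted at above.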
Regards,
Daniel