[Tux3] More xattr design details

Wed Sep 10 15:52:12 PDT 2008

The parallel design/implementation effort for xattrs is now well under
way.  There was a lot of obsessing over the wisdom of the xattr atom
idea versus storing the literal ASCII xattr names.  A nasty problem with
atoms is: how can we know the filesystem will never fill up with atom
names that were once used and are now just historical garbage?

Choose a solution:

  1) Ignore the problem and hope it never bites (typical)

  2) Refcount all atoms and delete any that fall to zero

  3) Garbage collect unused atoms from time to time

For the immediate future Tux3 is using solution number 1, which is to 
say, just let the on-disk atom table grow as new attribute types arrive 
and do not worry too much about how big it gets.

For production Tux3 we need a better solution that behaves just like the
traditional method of storing literal ASCII strings, where the space a
name occupies goes away along with the last xattr that uses it.  It
looks like refcounting is the best idea, but we have to do it
efficiently.  Fortunately, that looks pretty easy, and even better, the
methods we need to apply are almost the same as for file atimes.  So
there is some design synergy there, which helps ameliorate the pain.

Persistent atom reference counting

Reference counting means incrementing an atom refcount each time a new 
xattr using the atom is stored in an inode, and decrementing it each 
time the xattr is removed, either because the inode using it was 
removed or the xattr body was set to empty.  This must be done 
*persistently*, which should give you a good idea about how easy it is 
to lose the performance plot here.  But of course, Tux3 is not going to
lose the plot; it is going to do this refcounting with as close to
zero performance cost as we can possibly manage.  If the overhead of
the atom idea turns out to be noticeable compared with some cruder
method, then we drop the atom idea, and that is all there is to it.
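
To make the counting rule concrete, here is a minimal in-memory sketch
in C.  Everything in it (atom_entry, atom_table, atom_get, atom_put,
the fixed MAX_ATOMS table) is made up for illustration and is not the
Tux3 code or its on-disk format.  It also skips the hard part, which is
making the counts persistent at low cost.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_ATOMS 64	/* arbitrary limit, for the sketch only */

/* Hypothetical atom table entry; not the Tux3 on-disk layout. */
struct atom_entry {
	char *name;		/* literal xattr name, NULL if slot is free */
	unsigned refcount;	/* number of stored xattrs using this atom */
};

static struct atom_entry atom_table[MAX_ATOMS];

/*
 * Find the atom for a name, creating it if needed, and take one
 * reference.  Returns the atom number, or -1 if the table is full
 * or memory runs out.
 */
static int atom_get(const char *name)
{
	int free_slot = -1;

	for (int i = 0; i < MAX_ATOMS; i++) {
		if (!atom_table[i].name) {
			if (free_slot < 0)
				free_slot = i;
			continue;
		}
		if (!strcmp(atom_table[i].name, name)) {
			atom_table[i].refcount++;
			return i;
		}
	}
	if (free_slot < 0)
		return -1;

	size_t len = strlen(name) + 1;
	char *copy = malloc(len);
	if (!copy)
		return -1;
	memcpy(copy, name, len);
	atom_table[free_slot].name = copy;
	atom_table[free_slot].refcount = 1;
	return free_slot;
}

/* Drop one reference; the name is freed when the count hits zero. */
static void atom_put(int atom)
{
	assert(atom >= 0 && atom < MAX_ATOMS);
	assert(atom_table[atom].refcount > 0);
	if (--atom_table[atom].refcount == 0) {
		free(atom_table[atom].name);
		atom_table[atom].name = NULL;
	}
}
```

In this sketch atom_get would be called when an xattr is stored in an
inode, and atom_put when the xattr is removed or its inode is deleted,
so a name that nobody uses any more stops taking up a table slot.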

Extended attributes in cache

The inode->xcache mechanism has now landed, where the xcache is a block 
of memory holding the (small) xattrs that currently belong to a given 
inode.  This block of memory is realloced as necessary when new xattrs
are set on the inode.  It is never shrunk in the current implementation,
which maybe should be corrected, but when an inode is evicted from
cache, so is its xattr cache.  That constitutes an unconscionably
lazy but effective way to shrink an xcache.
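
For readers who want a picture of the arrangement, here is a rough
userspace sketch in C of a per-inode block that grows by realloc and is
only released on eviction.  The record layout and the names (xattr_rec,
xcache_set, xcache_lookup, xcache_remove, xcache_evict) are assumptions
made for this example, not the actual Tux3 xcache code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record: an atom number, a body size, then the body. */
struct xattr_rec {
	unsigned atom;
	unsigned size;
	char body[];
};

/* One of these would hang off each cached inode. */
struct xcache {
	size_t used, cap;	/* bytes in use, bytes allocated */
	char *data;		/* records packed back to back */
};

/* Bytes a record occupies, padded so the next header stays aligned. */
static size_t rec_bytes(unsigned size)
{
	size_t align = _Alignof(struct xattr_rec);
	return (sizeof(struct xattr_rec) + size + align - 1) / align * align;
}

static struct xattr_rec *xcache_lookup(struct xcache *xc, unsigned atom)
{
	for (size_t off = 0; off < xc->used;) {
		struct xattr_rec *rec = (struct xattr_rec *)(xc->data + off);
		if (rec->atom == atom)
			return rec;
		off += rec_bytes(rec->size);
	}
	return NULL;
}

/* Close the gap left by a record; the block itself is not shrunk. */
static void xcache_remove(struct xcache *xc, unsigned atom)
{
	struct xattr_rec *rec = xcache_lookup(xc, atom);
	if (!rec)
		return;
	size_t bytes = rec_bytes(rec->size);
	char *next = (char *)rec + bytes;
	memmove(rec, next, (size_t)(xc->data + xc->used - next));
	xc->used -= bytes;
}

/* Set (or replace) an xattr, growing the block with realloc if needed. */
static int xcache_set(struct xcache *xc, unsigned atom,
		      const void *body, unsigned size)
{
	assert(xc->used <= xc->cap);
	xcache_remove(xc, atom);
	size_t bytes = rec_bytes(size);
	if (xc->used + bytes > xc->cap) {
		size_t cap = xc->cap ? xc->cap : 64;
		while (cap < xc->used + bytes)
			cap *= 2;
		char *data = realloc(xc->data, cap);
		if (!data)
			return -1;
		xc->data = data;
		xc->cap = cap;
	}
	struct xattr_rec *rec = (struct xattr_rec *)(xc->data + xc->used);
	rec->atom = atom;
	rec->size = size;
	memcpy(rec->body, body, size);
	xc->used += bytes;
	return 0;
}

/* Called when the inode leaves cache: the only place memory is freed. */
static void xcache_evict(struct xcache *xc)
{
	free(xc->data);
	*xc = (struct xcache){ 0 };
}
```

Note that xcache_remove only compacts the records; the allocation keeps
its high-water size until xcache_evict, which mirrors the lazy shrinking
described above.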

In time the xcache arrangement will be elaborated to support referencing 
large xattrs cached in kernel address_space mappings (inode->map in 
tux3 userspace).  This is where we get to have a little fun by allowing 
such xattrs to be accessed exactly like files, just with a different 
open method.

Atoms for xattr data too?

One obvious direction we could investigate is whether the atom concept 
serves well for xattr bodies as well as names.  So we could do global 
compression by mapping (small) xattr values through an atom cache.  And 
we can also think about doing this for immediate file data, which in 
Tux3 is designed to be handled almost identically to xattrs.  More fun.

Regards,

Daniel
