[Tux3] The long and short of extended attributes

Daniel Phillips phillips at phunq.net
Mon Sep 8 12:40:23 PDT 2008


On Sunday 07 September 2008 17:43, Shapor Naghibzadeh wrote:
> I've noticed most filesystems have relatively little diversity in file
> attributes (especially within a directory), so we have lots of
> duplicated bits of attribute metadata.  For example, an email system
> with "virtual" accounts (not tied to real Unix users) may have
> millions of files with the exact same user/group/mode (Maildirs).
> With Tux3, if the inodes didn't explicitly track the extra 6 or so
> bytes of user/group/mode data per entry, we could see a potential 25%
> reduction in size of our already compact inodes.
> 
> After first reading this post, I thought the right approach may be to
> combine xattrs and user/group/mode into a single attribute atom table
> which could grow dynamically in addressability (with 2 or 3 levels).
> However, I think an inheritance model would work better.  With atoms,
> it is possible for any user (malicious or not) to grow the atom table
> significantly.  Updating reference counts also sounds complex, with a
> lot of corner cases.
> 
> Initially, I thought we could track user/group/mode defaults on a
> per-directory basis, but discarded this due to the inability to
> (easily) map an inode to a parent directory (not to mention hard
> links, duh).  It would be possible, however, to have attribute
> defaults for inode table blocks (or higher level branches of the tree,
> even).  If we did that, it could lessen the need for a more complex
> atom based approach.

I completely agree with you on the thrust of this.  This is purely
a compression optimization; in other words, it had better cause no
change to semantics.  The inheritance can be per inode table block,
that is, each inode table block has a default user/group/mode in its
header, and if an inode exactly matches that, it is not represented,
otherwise the attribute appears in the inode.  A slight variation on
that idea is to say that the user/group/mode attribute of each inode
applies to the next one, if the next inode does not have one of its
own.  That requires scanning the inodes in a table block to find out
what the user/group/mode attribute should be, so I think I prefer the
one per table block approach.  The header default costs 12 bytes,
versus savings of up to 64 * 12 = 768 bytes per table block when every
inode matches it, which is a big deal.
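
To make the per table block idea concrete, here is a rough sketch in C.
It is not Tux3 code.  The struct and function names are hypothetical,
the 12 byte attribute is assumed to be three 32-bit fields, and 64
inodes per block is assumed from the arithmetic above.  The real inode
table block is a variable-length attribute layout, not a fixed array.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed 12-byte ownership attribute: uid, gid, mode (hypothetical layout). */
struct owner_attr {
    uint32_t uid, gid, mode;
};

/* Hypothetical inode: stores the attribute only when it differs from the
 * block default. */
struct sketch_inode {
    bool has_owner;           /* false means "inherit from the block header" */
    struct owner_attr owner;  /* meaningful only when has_owner is true */
};

#define INODES_PER_BLOCK 64   /* assumed, from "64 * 12" in the text */

/* Hypothetical inode table block with a default attribute in its header. */
struct sketch_block {
    struct owner_attr dflt;
    struct sketch_inode inodes[INODES_PER_BLOCK];
};

bool owner_equal(const struct owner_attr *a, const struct owner_attr *b)
{
    return a->uid == b->uid && a->gid == b->gid && a->mode == b->mode;
}

/* Lookup is pure decompression: the caller cannot tell whether the value
 * was stored in the inode or inherited, so semantics do not change. */
struct owner_attr get_owner(const struct sketch_block *blk, unsigned i)
{
    const struct sketch_inode *inode = &blk->inodes[i];
    return inode->has_owner ? inode->owner : blk->dflt;
}

/* Store the attribute only if it does not exactly match the default. */
void set_owner(struct sketch_block *blk, unsigned i, struct owner_attr attr)
{
    struct sketch_inode *inode = &blk->inodes[i];
    inode->has_owner = !owner_equal(&attr, &blk->dflt);
    if (inode->has_owner)
        inode->owner = attr;
}

/* Bytes an on-disk encoding would spend on ownership for this block:
 * 12 for the header default plus 12 per inode that overrides it. */
size_t owner_bytes(const struct sketch_block *blk)
{
    size_t bytes = sizeof(struct owner_attr);
    for (unsigned i = 0; i < INODES_PER_BLOCK; i++)
        if (blk->inodes[i].has_owner)
            bytes += sizeof(struct owner_attr);
    return bytes;
}
```

With every inode matching the default, owner_bytes() comes to 12 for
the whole block, against 64 * 12 = 768 if each inode carried its own
copy.  Each inode that differs from the default adds its 12 bytes back.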

So yes, I think we should do something very much like this.  Later, of
course, say after atomic commit and versioning are working, but with
FUSE being a reality there is no need to wait for the kernel port.

> I suppose the inheritance and atom approaches could be combined or
> chosen based on how the filesystem is being used, but that sounds
> exponentially complex. :)

Yup.

Regards,

Daniel

_______________________________________________
Tux3 mailing list
Tux3 at tux3.org
http://tux3.org/cgi-bin/mailman/listinfo/tux3