[Tux3] The long and short of extended attributes
Daniel Phillips
phillips at phunq.net
Mon Sep 8 12:40:23 PDT 2008
On Sunday 07 September 2008 17:43, Shapor Naghibzadeh wrote:
> I've noticed most filesystems have relatively little diversity in file
> attributes (especially within a directory), so we have lots of
> duplicated bits of attribute metadata. For example, an email system
> with "virtual" accounts (not tied to real Unix users) may have
> millions of files with the exact same user/group/mode (Maildirs).
> With Tux3, if the inodes didn't explicitly track the extra 6 or so
> bytes of user/group/mode data per entry, we could see a potential 25%
> reduction in size of our already compact inodes.
>
> After first reading this post, I thought the right approach may be to
> combine xattrs and user/group/mode into a single attribute atom table
> which could grow dynamically in addressability (with 2 or 3 levels).
> However, I think an inheritance model would work better. With atoms,
> it is possible for any user (malicious or not) to grow the atom table
> significantly. Updating reference counts also sounds complex, with a
> lot of corner cases.
>
> Initially, I thought we could track user/group/mode defaults on a
> per-directory basis, but discarded this due to the inability to
> (easily) map an inode to a parent directory (not to mention hard
> links, duh). It would be possible, however, to have attribute
> defaults for inode table blocks (or higher level branches of the tree,
> even). If we did that, it could lessen the need for a more complex
> atom based approach.
I completely agree with you on the thrust of this. This is purely
a compression optimization; in other words, it had better cause no
change to semantics. The inheritance can be per inode table block,
that is, each inode table block has a default user/group/mode in its
header, and if an inode exactly matches that, it is not represented,
otherwise the attribute appears in the inode. A slight variation on
that idea is to say that the user/group/mode attribute of each inode
applies to the next one, if the next inode does not have one of its
own. That, however, requires scanning the inodes in a table block to find
out what the user/group/mode attribute should be, so I think I prefer the
one-per-table-block approach. The default costs 12 bytes per block, versus
savings of up to 64 * 12 = 768 bytes per block, which is a big deal.
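To make the per-table-block idea concrete, here is a minimal Python sketch.
It is not Tux3 code. The names (Inode, TableBlock, pack, lookup, attr_bytes)
are hypothetical, and the sizes are just the figures from this thread: a
12 byte user/group/mode attribute and 64 inodes per table block. Picking the
most common triple as the block default is also an assumption; any policy
for choosing the default would do.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Assumed sizes, taken from the figures in this thread (not from Tux3 source).
ATTR_BYTES = 12        # one user/group/mode attribute
INODES_PER_BLOCK = 64  # inodes in one inode table block

Owner = Tuple[int, int, int]  # (uid, gid, mode)


@dataclass
class Inode:
    inum: int
    owner: Optional[Owner] = None  # None: not represented, inherit the default


@dataclass
class TableBlock:
    default: Owner       # default user/group/mode kept in the block header
    inodes: List[Inode]


def pack(owners: List[Owner]) -> TableBlock:
    """Build a block, dropping every attribute that exactly matches the default."""
    if not owners:
        raise ValueError("a table block needs at least one inode")
    # Hypothetical policy: use the most common triple as the block default.
    default = Counter(owners).most_common(1)[0][0]
    inodes = [Inode(i, None if o == default else o) for i, o in enumerate(owners)]
    return TableBlock(default, inodes)


def lookup(block: TableBlock, index: int) -> Owner:
    """Return the effective user/group/mode; semantics must not change."""
    inode = block.inodes[index]
    return block.default if inode.owner is None else inode.owner


def attr_bytes(block: TableBlock) -> int:
    """Bytes spent on user/group/mode: one header default plus explicit ones."""
    explicit = sum(1 for inode in block.inodes if inode.owner is not None)
    return ATTR_BYTES + explicit * ATTR_BYTES


# A Maildir-like block: 63 files with the same owner, one exception.
owners = [(1000, 1000, 0o600)] * (INODES_PER_BLOCK - 1) + [(0, 0, 0o644)]
block = pack(owners)
```

Under these assumptions, storing the attribute in every inode costs
64 * 12 = 768 bytes, while the packed block above costs 12 bytes for the
header default plus 12 for the one exception. lookup() returns the same
triple for every inode as before packing, which is the "no change to
semantics" requirement.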
So yes, I think we should do something very much like this. It can come
later of course, say after atomic commit and versioning are working, but
with FUSE being a reality there is no need to wait for the kernel port.
> I suppose the inheritance and atom approaches could be combined or
> chosen based on how the filesystem is being used, but that sounds
> exponentially complex. :)
Yup.
Regards,
Daniel