[Tux3] The long and short of extended attributes

Kent Overstreet kent.overstreet at gmail.com
Tue Sep 9 03:40:31 PDT 2008


How about only refcounting xattrs that aren't used by root?

On Mon, Sep 8, 2008 at 11:40 AM, Daniel Phillips <phillips at phunq.net> wrote:
> On Sunday 07 September 2008 17:43, Shapor Naghibzadeh wrote:
>> I've noticed most filesystems have relatively little diversity in file
>> attributes (especially within a directory), so we have lots of
>> duplicated bits of attribute metadata.  For example, an email system
>> with "virtual" accounts (not tied to real Unix users) may have
>> millions of files with the exact same user/group/mode (Maildirs).
>> With Tux3, if the inodes didn't explicitly track the extra 6 or so
>> bytes of user/group/mode data per entry, we could see a potential 25%
>> reduction in size of our already compact inodes.
>>
>> After first reading this post, I thought the right approach may be to
>> combine xattrs and user/group/mode into a single attribute atom table
>> which could grow dynamically in addressability (with 2 or 3 levels).
>> However, I think an inheritance model would work better.  With atoms,
>> it is possible for any user (malicious or not) to grow the atom table
>> significantly.  Updating reference counts also sounds complex, with a
>> lot of corner cases.
>>
>> Initially, I thought we could track user/group/mode defaults on a
>> per-directory basis, but discarded this due to the inability to
>> (easily) map an inode to a parent directory (not to mention hard
>> links, duh).  It would be possible, however, to have attribute
>> defaults for inode table blocks (or higher level branches of the tree,
>> even).  If we did that, it could lessen the need for a more complex
>> atom-based approach.
>
> I completely agree with you on the thrust of this.  This is purely
> a compression optimization, in other words, it had better cause no
> change to semantics.  The inheritance can be per inode table block,
> that is, each inode table block has a default user/group/mode in its
> header, and if an inode exactly matches that, it is not represented,
> otherwise the attribute appears in the inode.  A slight variation on
> that idea is to say that the user/group/mode attribute of each inode
> applies to the next one, if the next inode does not have one of its
> own.  That requires scanning all inodes in a table block to find out
> what the user/group/mode attribute should be, so I think I prefer the
> one-per-table-block approach.  The header default costs 12 bytes, vs
> savings of up to 64 * 12 = 768 bytes per table block, which is a big
> deal.
>
> So yes, I think we should do something very much like this.  Later of
> course, say after atomic commit and versioning are working, but with
> FUSE being a reality there is no need to wait for the kernel port.
>
>> I suppose the inheritance and atom approaches could be combined or
>> chosen based on how the filesystem is being used, but that sounds
>> exponentially complex. :)
>
> Yup.
>
> Regards,
>
> Daniel
>
> _______________________________________________
> Tux3 mailing list
> Tux3 at tux3.org
> http://tux3.org/cgi-bin/mailman/listinfo/tux3
>
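
For illustration, the per-table-block inheritance described in the quoted
message could be sketched roughly as below.  This is not Tux3 code: the
struct layout, the names (owner_attr, itable_block, set_owner, get_owner,
owner_bytes) and the fixed 64 inodes per block are all assumptions, chosen
only to match the 12-byte and 64 * 12 figures in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define INODES_PER_BLOCK 64   /* assumed, from the "64 * 12" figure above */

/* Hypothetical 12-byte owner attribute: uid, gid, mode at 4 bytes each. */
struct owner_attr {
    uint32_t uid;
    uint32_t gid;
    uint32_t mode;
};

/* Hypothetical in-memory view of one inode table block. */
struct itable_block {
    struct owner_attr dflt;                   /* block default, stored once in the header */
    bool has_own[INODES_PER_BLOCK];           /* true if the inode carries its own attribute */
    struct owner_attr own[INODES_PER_BLOCK];  /* meaningful only where has_own is true */
};

static bool owner_equal(const struct owner_attr *a, const struct owner_attr *b)
{
    return a->uid == b->uid && a->gid == b->gid && a->mode == b->mode;
}

/* Store an attribute, omitting it when it exactly matches the block default. */
static void set_owner(struct itable_block *blk, unsigned slot, struct owner_attr attr)
{
    assert(slot < INODES_PER_BLOCK);
    blk->has_own[slot] = !owner_equal(&attr, &blk->dflt);
    if (blk->has_own[slot])
        blk->own[slot] = attr;
}

/* Look up an attribute: an explicit one wins, otherwise inherit the default. */
static struct owner_attr get_owner(const struct itable_block *blk, unsigned slot)
{
    assert(slot < INODES_PER_BLOCK);
    return blk->has_own[slot] ? blk->own[slot] : blk->dflt;
}

/* Bytes of owner data this block would store, including the header default. */
static size_t owner_bytes(const struct itable_block *blk)
{
    size_t n = sizeof blk->dflt;
    for (unsigned i = 0; i < INODES_PER_BLOCK; i++)
        if (blk->has_own[i])
            n += sizeof blk->own[i];
    return n;
}
```

With a default of uid 1000 / gid 1000 / mode 0100644, storing a matching
inode adds nothing, while a root-owned inode adds its own 12 bytes.  Lookup
needs no scan of neighbouring inodes, which is the advantage over the
"applies to the next one" variation.  One thing the sketch leaves out:
changing the block default later would silently change every inheriting
inode, so a real implementation would presumably have to materialize those
attributes first to keep semantics unchanged.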
