[Tux3] The long and short of extended attributes
Kent Overstreet
kent.overstreet at gmail.com
Tue Sep 9 03:46:37 PDT 2008
Actually, I've got a better idea. In your atom table, keep next to
the link count a use counter that records how many times the link
count has changed: increment the use counter whenever the link count
is incremented, and also whenever it is decremented. Use the use
count, or possibly both counts, to heuristically decide which xattr
names to reference count, and then just use garbage collection for the
ones you mark as permanent, if the table ever gets huge; in practice,
I doubt it ever will. Integrate it into the online fsck or whatever.
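
A minimal sketch of what I mean, in Python for brevity. Everything
named here (AtomTable, PERMANENT_AFTER, collect) is made up for
illustration and is not Tux3 code, and the rule "enough link count
changes means stop refcounting and mark the name permanent" is only
one possible reading of the heuristic:

```python
from dataclasses import dataclass

# Hypothetical threshold: once a name's link count has changed this many
# times, stop refcounting it and leave cleanup to garbage collection.
PERMANENT_AFTER = 1000


@dataclass
class Atom:
    name: str
    links: int = 0           # inodes currently using this xattr name
    changes: int = 0         # bumped on every increment AND decrement
    permanent: bool = False  # True: no longer reference counted


class AtomTable:
    def __init__(self, permanent_after=PERMANENT_AFTER):
        self.atoms = {}
        self.permanent_after = permanent_after

    def _touch(self, atom):
        # The use counter moves on every link count change, up or down.
        atom.changes += 1
        if atom.changes >= self.permanent_after:
            atom.permanent = True

    def link(self, name):
        atom = self.atoms.setdefault(name, Atom(name))
        if not atom.permanent:
            atom.links += 1
            self._touch(atom)
        return atom

    def unlink(self, name):
        atom = self.atoms[name]
        if atom.permanent:
            return  # permanent names are only ever freed by collect()
        atom.links -= 1
        self._touch(atom)
        if atom.links == 0 and not atom.permanent:
            del self.atoms[name]

    def collect(self, names_in_use):
        # Garbage collection pass, e.g. driven by an online fsck scan that
        # reports which names are still referenced by some inode.
        dead = [n for n, a in self.atoms.items()
                if a.permanent and n not in names_in_use]
        for n in dead:
            del self.atoms[n]
        return dead
```

Rarely used names are freed as soon as their link count drops to zero,
while heavily churned names stop paying for refcount updates and are
only reclaimed by the occasional collect() pass.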
On Tue, Sep 9, 2008 at 2:40 AM, Kent Overstreet
<kent.overstreet at gmail.com> wrote:
> How about only refcounting xattrs that aren't used by root?
>
> On Mon, Sep 8, 2008 at 11:40 AM, Daniel Phillips <phillips at phunq.net> wrote:
>> On Sunday 07 September 2008 17:43, Shapor Naghibzadeh wrote:
>>> I've noticed most filesystems have relatively little diversity in file
>>> attributes (especially within a directory), so we have lots of
>>> duplicated bits of attribute metadata. For example, an email system
>>> with "virtual" accounts (not tied to real Unix users) may have
>>> millions of files with the exact same user/group/mode (Maildirs).
>>> With Tux3, if the inodes didn't explicitly track the extra 6 or so
>>> bytes of user/group/mode data per entry, we could see a potential 25%
>>> reduction in size of our already compact inodes.
>>>
>>> After first reading this post, I thought the right approach may be to
>>> combine xattrs and user/group/mode into a single attribute atom table
>>> which could grow dynamically in addressability (with 2 or 3 levels).
>>> However, I think an inheritance model would work better. With atoms,
>>> it is possible for any user (malicious or not) to grow the atom table
>>> significantly. Updating reference counts also sounds complex, with a
>>> lot of corner cases.
>>>
>>> Initially, I thought we could track user/group/mode defaults on a
>>> per-directory basis, but discarded this due to the inability to
>>> (easily) map an inode to a parent directory (not to mention hard
>>> links, duh). It would be possible, however, to have attribute
>>> defaults for inode table blocks (or higher level branches of the tree,
>>> even). If we did that, it could lessen the need for a more complex
>>> atom based approach.
>>
>> I completely agree with you on the thrust of this. This is purely
>> a compression optimization, in other words, it had better cause no
>> change to semantics. The inheritance can be per inode table block,
>> that is, each inode table block has a default user/group/mode in its
>> header, and if an inode exactly matches that, it is not represented,
>> otherwise the attribute appears in the inode. A slight variation on
>> that idea is to say that the user/group/mode attribute of each inode
>> applies to the next one, if the next inode does not have one of its
>> own. Which requires scanning all inodes in a table block to find out
>> what the user/group/mode attribute should be, so I think I prefer the
>> one per table block approach. The default costs 12 bytes per block,
>> vs savings of up to 64 * 12 = 768 bytes per table block, which is a
>> big deal.
>>
>> So yes, I think we should do something very much like this. Later of
>> course, say after atomic commit and versioning are working, but with
>> fuse being a reality there is no need to wait for the kernel port.
>>
>>> I suppose the inheritance and atom approaches could be combined or
>>> chosen based on how the filesystem is being used, but that sounds
>>> exponentially complex. :)
>>
>> Yup.
>>
>> Regards,
>>
>> Daniel
>>
>> _______________________________________________
>> Tux3 mailing list
>> Tux3 at tux3.org
>> http://tux3.org/cgi-bin/mailman/listinfo/tux3
>>
>
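
The per-table-block default discussed in the quoted thread can be
sketched roughly as follows. This is an illustration only, not the
Tux3 on-disk format: TableBlock, set_owner, get_owner and owner_bytes
are hypothetical names, and the 12 byte attribute size and 64 inodes
per block are simply the figures used in the thread.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple

Owner = Tuple[int, int, int]  # (uid, gid, mode)
OWNER_BYTES = 12              # attribute size quoted in the thread
INODES_PER_BLOCK = 64         # inodes per table block quoted in the thread


@dataclass
class Inode:
    owner: Optional[Owner] = None  # None means "inherit the block default"


@dataclass
class TableBlock:
    default: Owner                 # kept once, in the block header
    inodes: dict = field(default_factory=dict)

    def set_owner(self, inum, owner):
        # Store the attribute only when it differs from the block default.
        inode = self.inodes.setdefault(inum, Inode())
        inode.owner = None if owner == self.default else owner

    def get_owner(self, inum):
        # Lookup semantics are unchanged: callers never see the compression.
        stored = self.inodes[inum].owner
        return self.default if stored is None else stored

    def owner_bytes(self):
        # One header default plus one attribute per non-matching inode.
        explicit = sum(1 for i in self.inodes.values() if i.owner is not None)
        return OWNER_BYTES * (1 + explicit)
```

With 63 inodes matching the default and one that differs, the block
spends 24 bytes on user/group/mode instead of 64 * 12 = 768. Note that
in this sketch changing the block default would first require writing
the old value into every inode that inherits it, otherwise the
semantics would change.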