[Tux3] More xattr design details

Daniel Phillips phillips at phunq.net
Thu Sep 11 15:04:29 PDT 2008


Hi Benjamin,

So now we have an actual implementation of the xattr atom idea in
place, which makes the annoying little questions much less theoretical
and more pressing.  My thinking has evolved a little:

  * Now we know that the incremental amount of code needed for
    xattr name deduplication is really tiny, just 20 or 30 lines.

  * But we also need reference counting to make it nice, which is
    going to cost more code and some performance.

  * The only immediately measurable benefits to xattr atoms are space
    reduction and cache pressure reduction.  So this is just a space
    optimization, against which we have additional code complexity
    and performance impact.

These thoughts seem to point directly at an inescapable conclusion:
make it optional as has been suggested.  There are of course infinite
variations on optional.  I will touch on a couple that immediately
come to mind.

One option is to trust root, as you suggested, to create only the xattrs
the system actually needs, for example ACLs.  Other users might be
similarly trusted.  How would we know?  A side channel message to Tux3,
like an ioctl or a ddlink message?  Rely on the uid?  Something else?

Another option would be to have a global on/off mount option for xattr
name deduplication.  Kind of an obscure option, maybe.  We could
instead cast that as a generic "try to compress" level, so Tux3 could
have a compression level option.  0 -> none, 1 -> compress xattr names,
2 -> compress xattr bodies too, 6 -> use gzip all over the place, 9 ->
go nuts, or something like that.
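To make the compression level idea a little more concrete, here is a minimal sketch of how a hypothetical compress=N mount option might map onto behaviour.  The struct, its fields and compress_policy_for() are invented for illustration and are not Tux3 code; treating each level as switching on everything named at the levels below it is also an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical policy for a compress=N mount option, N in 0..9.
 * Only levels 0, 1, 2, 6 and 9 are given meaning; levels in between
 * inherit the behaviour of the nearest lower named level. */
struct compress_policy {
	bool atomize_xattr_names;   /* level >= 1: store xattr names as atoms */
	bool compress_xattr_bodies; /* level >= 2: compress xattr bodies too */
	bool gzip_everywhere;       /* level >= 6: gzip all over the place */
	bool go_nuts;               /* level 9: whatever else we think of */
};

static struct compress_policy compress_policy_for(int level)
{
	assert(level >= 0 && level <= 9);
	struct compress_policy p = {
		.atomize_xattr_names = level >= 1,
		.compress_xattr_bodies = level >= 2,
		.gzip_everywhere = level >= 6,
		.go_nuts = level == 9,
	};
	return p;
}
```

A cumulative scale like this keeps the option to a single number, at the cost of not being able to, say, gzip bodies while leaving names alone.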

Whatever we settle on, I do not think the effort that went into xattr
name deduplication is in any way wasted.  It only cost an hour or two
of development; writing emails about it actually took a lot longer,
and that was fun.  Even better, the code to bypass the atom
optimization and store xattr names directly in inodes looks like it
will be really easy and short.  We do everything just as it is done for
atoms, but the xattr body now starts with a one byte xattr name length,
and we know from the atom value (zero?) to do a string compare rather
than an atom equality check.  We keep a bit in the in-memory inode flags
to tell us whether any atoms are present, so that we can skip the atom
lookup when it is not required.  It all amounts to not very much code.
And best of all, it makes no real difference to the user, so we can
take our sweet time implementing it.
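A minimal sketch of that inline encoding, under loudly stated assumptions: the record layout, the 16 bit atom field with zero meaning "name stored inline", the INODE_HAS_ATOMS flag and the lookup callback are all hypothetical names chosen for illustration, not the actual Tux3 format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-inode xattr record, not the real Tux3 layout.
 * atom != 0: the name lives once, in the atom table.
 * atom == 0: body holds a one byte name length, the name, then the value. */
struct xattr_rec {
	uint16_t atom;
	const uint8_t *body;
	size_t body_len;
};

/* Hypothetical in-memory inode flag: set if any record uses an atom. */
#define INODE_HAS_ATOMS 0x1u

/* Hypothetical atom table lookup; returns 0 if the name has no atom. */
typedef uint16_t (*atom_lookup_fn)(const char *name, size_t len);

/* Does rec hold the xattr called name?  name_atom is 0 if not looked up. */
static bool xattr_match(const struct xattr_rec *rec, uint16_t name_atom,
			const char *name, size_t name_len)
{
	if (rec->atom)	/* atom record: integer equality check */
		return name_atom != 0 && rec->atom == name_atom;
	/* inline record: check the length byte, then compare the strings */
	if (rec->body_len < 1 + name_len || rec->body[0] != name_len)
		return false;
	return memcmp(rec->body + 1, name, name_len) == 0;
}

/* Return the index of the matching record, or -1 if there is none. */
static int xattr_find(unsigned inode_flags, const struct xattr_rec *recs,
		      int count, const char *name, atom_lookup_fn lookup)
{
	size_t len = strlen(name);
	/* Only consult the atom table if this inode actually uses atoms. */
	uint16_t atom = (inode_flags & INODE_HAS_ATOMS) ? lookup(name, len) : 0;

	for (int i = 0; i < count; i++)
		if (xattr_match(&recs[i], atom, name, len))
			return i;
	return -1;
}
```

With INODE_HAS_ATOMS clear, xattr_find() never calls the lookup at all, so an inode holding only inline names pays nothing beyond a few string compares.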

Now let me take another look at the pros and cons of xattr name atoms.

Pro:

  - Space efficiency: with atoms we can encode an entire inode in
    52 bytes, complete with an xattr whose name is as long as we want.
    So if the xattr name is 16 bytes long (typical?) and the atom
    field costs two bytes, we save 14 bytes, about 27% of the 52 byte
    encoding.  That is a big deal.

  - Versioning will make the atom compression much more important,
    because every time you update an xattr you get the compression
    all over again, so it could translate into 50% inode table
    compression for some common cases.

  - Cache efficiency.  Often tied directly to performance.  Saving
    27% worth of inode table cache pressure might speed up some
    benchmarks by, say, 5%.  Or in some boundary cases it could be
    much more than that.  Or, another way of looking at it, 27% less
    cache pressure moves the knee of the cache thrashing curve out
    to a 27% bigger filesystem, which may make the difference
    between a happy user and an annoyed user somewhere, sometime.

  - Future grooviness.  Xattr atoms provide some infrastructure for
    deduplication and compression in general, without costing too
    much in code bulk or complexity.  (Assuming that refcounting
    does not translate into a big mess.)

  - Rename attribute: this is trivial with xattr atoms, though there
    is no Posix interface for it and likely never will be.

  - Per user xattr stats: easily implemented with atom tables, and
    everyone loves stats!

  - Other benefits?  Please try to think of any and help me out here.
    There is not actually much on the benefit side of the ledger yet.
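The space arithmetic in the first two items can be checked with a few lines of C.  The inputs (52 byte inode, 16 byte name, 2 byte atom field) are the estimates given above; compute_atom_savings() is a hypothetical helper, and note that the percentage depends on which encoding you divide by.

```c
#include <assert.h>

/* Savings from replacing an inline xattr name with an atom field. */
struct atom_savings {
	int saved_bytes;            /* name bytes minus atom field bytes */
	int inline_inode_bytes;     /* same inode with the name stored inline */
	double pct_of_atom_inode;   /* savings as a share of the atom encoding */
	double pct_of_inline_inode; /* savings as a share of the inline encoding */
	double cache_capacity_gain; /* inodes per unit of cache, atom vs inline */
};

static struct atom_savings compute_atom_savings(int atom_inode_bytes,
						int name_bytes,
						int atom_field_bytes)
{
	assert(atom_inode_bytes > 0 && name_bytes > atom_field_bytes);
	struct atom_savings s;

	s.saved_bytes = name_bytes - atom_field_bytes;
	s.inline_inode_bytes = atom_inode_bytes + s.saved_bytes;
	s.pct_of_atom_inode = 100.0 * s.saved_bytes / atom_inode_bytes;
	s.pct_of_inline_inode = 100.0 * s.saved_bytes / s.inline_inode_bytes;
	s.cache_capacity_gain = (double)s.inline_inode_bytes / atom_inode_bytes;
	return s;
}
```

For compute_atom_savings(52, 16, 2) this gives 14 bytes saved and a 66 byte inline inode.  That is about 26.9% of the 52 byte atom encoding (the 27% figure above), or about 21.2% of the 66 byte inline encoding, and about 1.27 times as many inodes per unit of cache, which is where the "27% bigger filesystem" estimate comes from.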

Con:

  - If we do not do something about it, it is a trivial denial of
    service vector where an unprivileged user can fill up an entire
    volume just by setting and removing random xattrs on their own
    files.

  - Extra code, maybe 20-30 lines so far, but more likely two or
    three hundred by the time reference counting and atomic update
    thereof is implemented.

  - Efficiency.  Maybe reference count updating is costly, maybe it
    isn't; we do not know.  Maybe the cost of reference count atomic
    updates is less than the savings due to reduced cache pressure,
    in which case we win the grand prize and can stop obsessing about
    it.
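Reference counting is what would address the denial of service item: if each atom counts its users, removing the last xattr that uses a name can free that atom, so setting and removing random xattrs no longer grows the table without bound.  This is a minimal in-memory sketch assuming a small fixed-size table; all names are hypothetical, and a real version would live on disk and need the atomic refcount updates mentioned above, which this sketch does not attempt.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_ATOMS 64	/* hypothetical table size */
#define MAX_NAME 255	/* names fit a one byte length */

/* Hypothetical atom table: atom n lives in slot n - 1, atom 0 means none. */
struct atom_table {
	char name[MAX_ATOMS][MAX_NAME + 1];
	unsigned refs[MAX_ATOMS];	/* 0 means the slot is free */
};

/* Take a reference on the atom for name, creating it if needed.
 * Returns the atom, or 0 if the table is full. */
static uint16_t atom_get(struct atom_table *t, const char *name)
{
	int free_slot = -1;

	assert(strlen(name) <= MAX_NAME);
	for (int i = 0; i < MAX_ATOMS; i++) {
		if (t->refs[i] && strcmp(t->name[i], name) == 0) {
			t->refs[i]++;
			return (uint16_t)(i + 1);
		}
		if (!t->refs[i] && free_slot < 0)
			free_slot = i;
	}
	if (free_slot < 0)
		return 0;
	strcpy(t->name[free_slot], name);
	t->refs[free_slot] = 1;
	return (uint16_t)(free_slot + 1);
}

/* Drop a reference, e.g. when an xattr is removed; the last drop frees it. */
static void atom_put(struct atom_table *t, uint16_t atom)
{
	assert(atom >= 1 && atom <= MAX_ATOMS && t->refs[atom - 1] > 0);
	if (--t->refs[atom - 1] == 0)
		t->name[atom - 1][0] = '\0';
}

/* Number of atoms currently allocated. */
static unsigned atoms_in_use(const struct atom_table *t)
{
	unsigned n = 0;

	for (int i = 0; i < MAX_ATOMS; i++)
		n += t->refs[i] != 0;
	return n;
}
```

Setting and removing many throwaway names leaves atoms_in_use() at zero, because freed slots are reused; without the atom_put() calls, atom_get() would start returning 0 after 64 distinct names.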

There is also the distinct possibility that we have not considered
every aspect of the question yet, or every possible solution.

The biggest reason to have xattrs at all in Linux is to represent ACLs,
so that naturally becomes the biggest "customer" for Tux3 xattrs.  It
is worth taking a little time to see how the ACL folks out there think
about things, and what kind of code they write.  Here is an example
from Tru64:

   http://www.phys.uu.nl/DU/Tru64_5.0/HTML/ARH95ATE/CHDCXXXX.HTM

Examples of direct usage on Linux are harder to come by, perhaps
showing that we penguins really do not care very much about ACLs or
granular security in general.  True.  I am not sure that is good.

From my mainframe days, I found that being able to control access to my
files per user was a very natural feature that I used frequently.  On
Linux that went away, but why?  Just because it is more complex to
implement, I think, and that complexity is an open invitation to
exploits.  Well, a lot of time has gone by and everybody is much better
at writing tight code now.  Or, another way of looking at it, there is
so much really awful code out there that implements the simple minded
model that tight code implementing something more sophisticated looks
good by comparison.  I think that is what we should be aiming at for
Tux3: tight code that goes beyond certain simplifications that date
back to the dawn of Unix.  I mean, those mainframe guys got it
right back in the mid 70s; see MTS (Michigan Terminal System).  So
how about we Tux3 penguins show that with an extra thirty years to
think about it (in some cases, for our parents to think about it :) we
can get it right too?  Surely we can.

Regards,

Daniel

_______________________________________________
Tux3 mailing list
Tux3 at tux3.org
http://tux3.org/cgi-bin/mailman/listinfo/tux3