[Tux3] More xattr design details - efficient atom refcounting

Daniel Phillips phillips at phunq.net
Fri Sep 12 15:14:34 PDT 2008


A couple of addenda to that post:

>   * Both tables are mapped into the atom table at a high logical
>     offset.  Allowing 32 bits worth of atom numbers, and with at most
>     256 atom entries per 4K dirent block, we need at most
>     (32 << 8) = 1 TB dirent bytes for the atom dictionary...

The 1 TB size is correct, but the reason is woefully wrong.  It should
be: Allowing 32 bits worth of atom numbers, and with a 255 byte name
for every xattr...

This could actually use a little more than 1 TB, which I am not going
to sweat about: any attempt to expand the dirent part of the atom table
past 1 TB will fail with ENOSPC.  If that ever happens, it is because
somebody had permission to do it (and thus no quota) and did it
intentionally, which falls into the same category as just filling up
the entire fs with a big file.
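
The bound and the ENOSPC rule can be written down in a few lines.  This
is only a sketch, not Tux3 code: ATOM_BITS, ENTRY_BITS, ATOM_DIRENT_LIMIT
and atom_dirent_check() are made-up names, and rounding a 255 byte name
up to 2^8 bytes per entry ignores the dirent header, which is where the
"little more than 1 TB" would come from.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical constants, chosen to match the numbers in this post. */
enum {
	ATOM_BITS  = 32,	/* 2^32 possible atom numbers */
	ENTRY_BITS = 8,		/* a 255 byte name, rounded up to 2^8 bytes */
};

/* Cap on the dirent part of the atom table: 2^32 * 2^8 = 2^40 bytes = 1 TB. */
#define ATOM_DIRENT_LIMIT ((uint64_t)1 << (ATOM_BITS + ENTRY_BITS))

/*
 * Return 0 if a dirent of 'len' bytes fits at byte 'offset' within the
 * dirent part of the atom table, or -ENOSPC if it would cross the cap.
 */
int atom_dirent_check(uint64_t offset, uint32_t len)
{
	assert(len > 0);
	if (offset >= ATOM_DIRENT_LIMIT || len > ATOM_DIRENT_LIMIT - offset)
		return -ENOSPC;
	return 0;
}
```

A caller extending the dirent area would run this check before
allocating, so hitting the cap looks like any other out of space error.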

The other necessary component of the atom table I forgot to mention is
the reverse map, which maps atom numbers back to dirents so we can
implement xattr listing efficiently.  When a new atom dirent is
created, we also set the reverse map for the dirent's atom number to
the file offset at which the dirent was created.  This will be 64 bits
if we want to be lazy, which I do here, so that is 2^32 atoms * 8 bytes
= 2^35 revmap bytes = 2^35 >> 12 blocks = 2^23 blocks.  We locate this
just above the count table (low + high part), which puts it at logical
block offset 2^28 + 2^23, since the refcount table is also (by
coincidence) 2^23 blocks in size.
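
The revmap arithmetic can be sketched as follows.  This is an
illustration rather than the real implementation.  It assumes 4K blocks,
that the offsets above are in units of blocks (2^28 blocks = 1 TB), and
that the count table occupies 2^23 blocks as stated.  All names here
(COUNT_BASE, REVMAP_BASE, revmap_locate() and so on) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum {
	BLOCK_BITS   = 12,	/* 4K blocks (assumed) */
	ATOMNUM_BITS = 32,	/* 2^32 possible atom numbers */
	REVMAP_SHIFT = 3,	/* lazy 64 bit (8 byte) dirent offset per atom */
};

/* Each revmap entry holds one 64 bit file offset. */
static_assert(sizeof(uint64_t) == 1 << REVMAP_SHIFT, "revmap entry is 8 bytes");

/* Logical block offsets within the atom table, using the post's numbers. */
#define COUNT_BASE   ((uint64_t)1 << 28)	/* 1 TB of dirents / 4K */
#define COUNT_BLOCKS ((uint64_t)1 << 23)	/* count table size, taken as given */
#define REVMAP_BASE  (COUNT_BASE + COUNT_BLOCKS)

/* Worst case revmap size: 2^32 atoms * 8 bytes = 2^35 bytes = 2^23 blocks. */
uint64_t revmap_blocks(void)
{
	return ((uint64_t)1 << (ATOMNUM_BITS + REVMAP_SHIFT)) >> BLOCK_BITS;
}

/* Where the revmap entry for one atom lives. */
struct revmap_pos {
	uint64_t block;		/* logical block within the atom table */
	uint32_t offset;	/* byte offset within that block */
};

struct revmap_pos revmap_locate(uint32_t atom)
{
	uint64_t byte = (uint64_t)atom << REVMAP_SHIFT;
	struct revmap_pos pos = {
		.block  = REVMAP_BASE + (byte >> BLOCK_BITS),
		.offset = (uint32_t)(byte & (((uint64_t)1 << BLOCK_BITS) - 1)),
	};
	return pos;
}
```

With these assumptions each 4K block holds 512 entries, so atoms 0 to
511 land in the first revmap block and atom 512 starts the second.
Creating an atom dirent would store its file offset at revmap_locate(atom),
and xattr listing would read it back from the same place.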

An alternative to the reverse map approach would be to use the dirent
offset directly as the atom number, which gives a less dense mapping of
atom numbers and thus somewhat less compression, with slightly simpler
code in return and less work to do the first time an xattr is set on
any inode.  But since the code and efficiency difference verges on
negligible, I will err on the side of better compression and go with the
simple-minded reverse map.  Keep in mind that the limits I'm designing
for are way, way, way higher than what will be used in practice.  I
just do not want this design being attacked on the basis of something
ASCII strings can do that atoms cannot.  I guess we are about there at
this point, but of course the proof will be in the implementation,
which ought to land in the not too distant future.

Regards,

Daniel



