[Tux3] Structure of Tux3
phillips at phunq.net
Mon Aug 11 22:40:05 PDT 2008
It is about time to take a step back and describe what I have been
implementing. A tux3 filesystem is a hierarchical structure with a
   Free map (btree)
      Free btree index*
         Bitmap block or extents*
   Volume table (radix tree)
      Version table (radix tree)
         Version index block*
      Atime table (btree)
         Atime index block*
      Inode table (btree)
         Data attribute* (btree)
            Data btree index*
               Data tree leaf*
Versioned elements are marked by !, repeated elements by *.
There is a single free map from which extents for all other objects are
allocated.
The volume table is a new addition not central to the goals of Tux3,
but a nice feature to have given that it comes nearly for free. One
Tux3 volume can have an arbitrary number of separate filesystems tucked
inside it, indexed by a simple integer parameter at mount time. People
say they like this idea and it imposes no significant complexity, so it
goes in. I am not sure I would ever use it, personally. I like my
volumes to be nice separate pieces of disk, but I suppose that is just
me being old fashioned.
Each volume has a metablock pointing at the forward log chain for the
volume, a version table that describes the hierarchical relationship
between versions (snapshots), an atime table to take care of that
horrid legacy Unix feature, and an inode table containing files and
attributes of files. The atime tables, version tables and atom tables
could possibly just be types of files. I have not gotten there quite
yet, so let's see what works out best in practice.
Versioning takes place in three places: versioned pointers in the atime
btree, versioned extents in a file data btree and versioned attributes
in the inode table.
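As a rough illustration of what a versioned element means, here is a
minimal Python sketch. It assumes, which this post does not spell out,
that a version without its own copy of an element inherits the copy of
its nearest ancestor in the version table. The names parent, size_attr
and lookup, and the sample data, are hypothetical and not Tux3 code.

```python
# Hypothetical version table: child version -> parent version (0 is the root).
parent = {0: None, 1: 0, 2: 1, 3: 1}

# A versioned element: version label -> value stored for that version.
size_attr = {0: 4096, 2: 8192}

def lookup(versioned, version, parent):
    """Return the value visible in `version`, walking up through ancestors.

    Assumption: a version with no entry of its own inherits from its parent.
    """
    while version is not None:
        if version in versioned:
            return versioned[version]
        version = parent[version]
    return None  # the element does not exist in this version
```

Under these assumptions, version 3 has no entry of its own and sees the
root's value, while version 2 sees the value it overwrote.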
Notice the absence of a journal, the functionality of which is provided
by forward log elements that I described in the Hammer thread (and will
eventually write a separate post about).
Kinds of data attributes:
Directory (btree mapped into file)
Atom dictionary (Tux3 special!)
Version link (Tux3 special!)
The atom dictionary maps the names of extended attributes to small
integers, which are then used to identify extended attributes in
inodes. This makes the attributes smaller and also means there is only
a single variable length field per attribute which ought to save some
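The mapping described above can be sketched in a few lines of Python.
This is only an in-memory model under assumed names (AtomDictionary,
intern, name); the real dictionary is stored in the filesystem and its
layout is not described here.

```python
class AtomDictionary:
    """Hypothetical model: extended attribute name <-> small integer (atom)."""

    def __init__(self):
        self._atom_by_name = {}
        self._name_by_atom = []

    def intern(self, name):
        """Return the atom for `name`, assigning the next integer if it is new."""
        atom = self._atom_by_name.get(name)
        if atom is None:
            atom = len(self._name_by_atom)
            self._atom_by_name[name] = atom
            self._name_by_atom.append(name)
        return atom

    def name(self, atom):
        """Reverse lookup, e.g. for listing the attributes of an inode."""
        return self._name_by_atom[atom]

atoms = AtomDictionary()
# An inode then stores (atom, value) pairs; the value is the only
# variable-length field in each attribute.
xattrs = [(atoms.intern("user.comment"), b"hello"),
          (atoms.intern("user.mime_type"), b"text/plain")]
```

Interning the same name twice returns the same atom, so every inode
that carries a given attribute name stores the same small integer.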
Version links are a new idea in Tux3 to allow you to link files in
different versions. Just like a symlink, but also specifies the
version of the link target.
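A small Python sketch of the difference from a plain symlink, assuming
versions are identified by integers. Symlink, VersionLink and resolve
are hypothetical names used only for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Symlink:
    target: str

@dataclass(frozen=True)
class VersionLink:
    target: str
    version: int  # hypothetical: label of the version the target lives in

def resolve(link, current_version):
    """Return the (path, version) pair a lookup should continue with."""
    if isinstance(link, VersionLink):
        # A version link overrides the version being traversed.
        return link.target, link.version
    # A plain symlink stays in the version it was found in.
    return link.target, current_version
```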
Data attributes come in four varieties, which I described in a previous
post. Any of the above kinds of data attributes can be stored in any
of the four forms:
Direct pointers to blocks
Pointer to btree leaves
Other inode attributes besides data attributes are:
All inode attributes are versioned except for the block count, which is
the same for all versions: it is the total number of blocks allocated
to all data attributes of the inode.
I doubt that blocks count is used by any applications other than as an
advisory item, which is what it will be in Tux3. If more detailed
information about how many blocks a given version owns in a given file
is needed, it could be provided by a utility.
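To illustrate the difference between the single advisory count and a
per-version breakdown, here is a Python sketch. The data model is an
assumption made for the example: each data attribute is a list of
(version, block_count) extents, with every allocated extent listed
once. The names inode, blocks_count and blocks_by_version are
hypothetical.

```python
# Hypothetical inode: data attribute name -> list of (version, block_count).
inode = {
    "data attribute 1": [(0, 8), (2, 4)],
    "data attribute 2": [(0, 1)],
}

def blocks_count(inode):
    """The one unversioned total reported for every version of the inode."""
    return sum(count for extents in inode.values() for _, count in extents)

def blocks_by_version(inode):
    """What a separate utility might report: blocks recorded per version."""
    totals = {}
    for extents in inode.values():
        for version, count in extents:
            totals[version] = totals.get(version, 0) + count
    return totals
```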
Note: radix trees in tux3 are implemented as btrees, just so the btree
code can be reused without having to implement a special radix tree.
The radix trees (volume table, version table) are fairly small so the
extra space for btree keys vs a true radix tree is insignificant.
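The point can be shown with a Python sketch in which a sorted-key table
stands in for the btree. A table indexed by small integers, such as the
volume table, then needs nothing beyond ordinary keyed insert and
lookup. SortedTable and the metablock strings are hypothetical; the
sorted list is only a stand-in for real btree code.

```python
import bisect

class SortedTable:
    """Stand-in for a btree: sorted integer keys searched by bisection."""

    def __init__(self):
        self._keys = []
        self._values = []

    def insert(self, key, value):
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._values[i] = value  # overwrite the existing entry
        else:
            self._keys.insert(i, key)
            self._values.insert(i, value)

    def lookup(self, key):
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return None  # no entry for this index

# Hypothetical volume table: mount-time integer -> volume metablock location.
volumes = SortedTable()
volumes.insert(0, "metablock@100")
volumes.insert(7, "metablock@2048")
```

The cost relative to a true radix tree is the stored key per entry,
which is small when the table itself is small.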