[Tux3] The version table
Daniel Phillips
phillips at phunq.net
Sun Sep 7 01:00:01 PDT 2008

It is about time to add some of that tangy secret sauce to make Tux3
special like it is supposed to be. The version table is a simple list
of version heads, with a structure something like this:

    struct { u16 flags; u48 parent; u64 tag; } vtable[];

Big endian, like all Tux3 disk structures are supposed to be (we have
some work to do there). Some more fields such as create time may be
added to this basic structure. We shall see what is required in
practice. The above is all that is needed to implement versioning.
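
As a rough sketch of what that layout implies, here is one way to pack and unpack a 16-byte big-endian entry in portable C. C has no u48 type, so the parent is held in a 64-bit integer and only its low 48 bits are written. The names vt_entry, vt_encode and vt_decode are made up for this illustration and are not Tux3 functions.

```c
#include <stdint.h>

/* Hypothetical in-memory form of one version table entry. */
struct vt_entry {
	uint16_t flags;
	uint64_t parent;	/* only the low 48 bits reach the disk */
	uint64_t tag;
};

enum { VT_ENTRY_SIZE = 16 };	/* 2 + 6 + 8 bytes, a binary power */

/* Store the low `bytes` bytes of val at p, most significant byte first. */
static void put_be(uint8_t *p, uint64_t val, int bytes)
{
	for (int i = bytes - 1; i >= 0; i--) {
		p[i] = (uint8_t)(val & 0xff);
		val >>= 8;
	}
}

/* Read `bytes` bytes at p as a big-endian unsigned integer. */
static uint64_t get_be(const uint8_t *p, int bytes)
{
	uint64_t val = 0;

	for (int i = 0; i < bytes; i++)
		val = (val << 8) | p[i];
	return val;
}

/* Assumed field order on disk: flags, parent, tag, as in the struct above. */
void vt_encode(uint8_t disk[VT_ENTRY_SIZE], const struct vt_entry *e)
{
	put_be(disk, e->flags, 2);
	put_be(disk + 2, e->parent, 6);	/* bits above 48 are dropped */
	put_be(disk + 8, e->tag, 8);
}

void vt_decode(struct vt_entry *e, const uint8_t disk[VT_ENTRY_SIZE])
{
	e->flags = (uint16_t)get_be(disk, 2);
	e->parent = get_be(disk + 2, 6);
	e->tag = get_be(disk + 8, 8);
}
```

Going through byte arrays rather than a packed struct avoids depending on compiler bitfield layout or host byte order.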

On startup, the entire version table is read and converted to a tree,
using the parent field. (This will work fine up to a few thousand
versions, then we might think about making the tree persistent.)
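
The conversion could look something like the sketch below, which links each version to its parent with first-child and next-sibling indexes. It assumes the parent fields have already been decoded into an array indexed by version number, that slot zero is the unused zeroth version, and that every other slot is live. The vnode structure and vtree_build are hypothetical names, not Tux3 code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory node; the array index is the version number. */
struct vnode {
	uint64_t parent;
	uint64_t child;		/* first child, 0 if none */
	uint64_t sibling;	/* next sibling, 0 if none */
};

/*
 * Link nodes[1..count-1] into a tree from parents[].
 * Returns 0 on success, -1 if a parent is out of range or an entry
 * names itself as parent. Longer cycles are not detected here.
 */
int vtree_build(struct vnode *nodes, const uint64_t *parents, size_t count)
{
	for (size_t i = 0; i < count; i++)
		nodes[i] = (struct vnode){ 0, 0, 0 };

	/* Walk downward so each child list ends up in ascending order. */
	for (size_t i = count; i-- > 1; ) {
		uint64_t p = parents[i];

		if (p >= count || p == i)
			return -1;
		nodes[i].parent = p;
		nodes[i].sibling = nodes[p].child;
		nodes[p].child = i;
	}
	return 0;
}
```

This is a single linear pass over the table, which is why rebuilding at every mount stays cheap for a few thousand versions.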

The version tree is used to encode and decode versioned entities in
Tux3, such as versioned pointers and attributes. The gory details of
how that is done are written here:

    "Versioned pointers: a new method of representing snapshots"
    http://lwn.net/Articles/288896/

The version table is stored in an unlinked file (inode number 2) just
like the allocation bitmap. Since it has no directory entry, it is
inaccessible from userspace. The size of table entries is a binary
power so that the table divides evenly into blocks. These blocks are
mapped into the page cache (buffer cache in Tux3 userspace) where they
may be readily accessed and paged, which may help in future when the
version table grows very large. There are other benefits to having
the table as a file: it is convenient to walk it using a bread loop
as in ext2_find_entry in dir.c. To save it back to disk just mark any
changed buffers dirty and sync the inode. The biggest benefit is
probably just that it takes very little code to handle the version
table as a file, compared to having a purpose-built disk structure.
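
To give a feel for the block-at-a-time walk, here is a userspace sketch that searches the table for a tag. The read_block callback is only a stand-in for bread, and the memory-backed reader exists so the loop can be tried outside the kernel. It assumes 16-byte entries with the tag in the last eight bytes, and that the file size in bytes divided by the entry size gives the number of entries. All names here are invented for the example.

```c
#include <stddef.h>
#include <stdint.h>

enum { ENTRY_SIZE = 16 };	/* assumed entry size, a binary power */

/* Hypothetical stand-in for bread(): returns the data of block `index`. */
typedef const uint8_t *(*read_block_fn)(void *dev, uint64_t index);

/* Memory-backed "device" for trying the loop in userspace. */
struct memdev {
	const uint8_t *base;
	unsigned blocksize;
};

const uint8_t *mem_read_block(void *dev, uint64_t index)
{
	struct memdev *m = dev;

	return m->base + index * m->blocksize;
}

/* Return the first version whose tag matches, or 0 if there is none. */
uint64_t vt_find_tag(void *dev, read_block_fn read_block,
		     uint64_t isize, unsigned blocksize, uint64_t tag)
{
	unsigned per_block = blocksize / ENTRY_SIZE;	/* divides evenly */
	uint64_t count = isize / ENTRY_SIZE;
	uint64_t blocks = (count + per_block - 1) / per_block;

	for (uint64_t b = 0; b < blocks; b++) {
		const uint8_t *data = read_block(dev, b);

		if (!data)
			return 0;
		for (unsigned i = 0; i < per_block; i++) {
			uint64_t version = b * per_block + i;
			const uint8_t *e = data + (size_t)i * ENTRY_SIZE;
			uint64_t t = 0;

			if (version >= count)
				break;
			if (version == 0)
				continue;	/* entry zero is unused */
			for (int k = 8; k < 16; k++)	/* big-endian tag */
				t = (t << 8) | e[k];
			if (t == tag)
				return version;
		}
	}
	return 0;
}
```

Because the entry size divides the block size, no entry ever straddles two blocks, so the inner loop never needs to stitch buffers together.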

Version zero is never used, because of a quirk mentioned in the post
above: N versions exposed to the user may require as many as N-1
"ghost" versions to exist in order to correctly represent data
inheritance as the topology of the version tree changes over time.
The on-disk entry for version zero is therefore unused. (The in-memory version tree
records the root of the version tree as the one child of the zeroth
version.) The inode isize field tells us the version highwater mark,
the maximum number of versions that were ever used.

Shrinking the version table is impractical, as any inode may be using
any particular version. But the version table is relatively small so
this probably does not matter.

Initially, the size of the version table is zero, until the first child
version is created by a user-initiated action such as a snapshot. So
for filesystems that don't use versioning (floppy disk!) the version
table does not even exist, except for a roughly 64-byte entry in the
inode table.

The version table is atomically committed to disk like any other file.
Of course, the version table is not itself versioned, so its dleaf
pointers will always have zero in the version field.

Regards,
Daniel