[Tux3] leaf_split has landed

Daniel Phillips phillips at phunq.net
Thu Jul 31 21:58:08 PDT 2008


Another messy little function for the file index btree set of btree
leaf functions has landed, leaf_split:

   http://tux3.org/tux3?f=0fed5998fbec;file=test/leaf.c

This appears to work.  It was able to split the sample leaf at any
entry position, include zero (leaving an empty leaf in the original
btree block) or num_entries (leaving an empty leaf in the split block).

This was not a whole lot easier to write than leaf_insert.  In fact,
this whole two level design makes the coding quite hard, but the
benefit is clearly worth the effort I think.  The functions so far are
very fast, and the structure is really, really compact.  Which will
help offset the fact that tux3 will often be loading a lot of version
data that it doesn't actually need in order to get at a particular
version, including the current version.  I do not think this will be a
problem in practice because heavily versioned files are actually pretty
rare, and cache effects most likely will not be noticeable until up in
the several hundred version range.  At that level, probably every
versioning filesystem is going to exhibit some cache effects, not just
Tux3.

And we have a long term plan for keeping the cache footprint of
versioned extents to a manageable level, even for thousands of
versions, described in the original lkml posting.

Another possibility is to keep around a "digest" metadata btree that
contains only the pointers for a particular version, and make that be
consistent somehow with the "full" weave-style metadata.  Since this
would be maintained only for the current version, total metadata still
stays small vs schemes with distinct tree roots for each version (ZFS
and Btrfs).

Anyway, whatever cache pressure comes up with the current scheme, this
compressed leaf indexing scheme will help with it, to the tune of
fitting 33% more versioned extents into the same memory.

The next function to implement is leaf_merge, slightly less complex than
leaf_split I think.  Then the delete, which I am playing with ideas for
now, and finally a proper fuzzing test.  There will be 11 methods for
file index leaf nodes in total when done:

  1) show - done
  2) check - started
  3) lookup - done
  4) insert - done
  5) split - done
  6) merge
  7) delete
  8) create - done
  9) destroy - done
  10) freespace
  11) needspace

Then all 11 methods have to be implemented for the inode table btree,
which is actually quite a lot easier than the file index because
instead of having the two level directory, the directory is just a
simple table indexed by inum.

Then there are the index node methods, then there are three more btrees
to do, to implement all five I described earlier:

   inode table
   file index
   free extents*
   atime table*
   PHTree directory*

That all adds up to 110 methods to implement, quite a lot of work by
any measure.  Fortunately, most of them are much simpler than the file
index leaves.  Another nice time saver on the way to a functional
prototype is, none of the btrees marked with * are needed in the
functional prototype.  So after implementing two complete sets of leaf
functions (file index and inode table) I will move on to the generic
btree code, which will get the system to the point where we can test
the path all the way from inode lookup to file data block access.

Then the thing to do is hook up the versioned pointer code (not extents
for now) and try accessing some versioned data.  All still in userspace.

That is about a dozen code drops away.

Regards,

Daniel

_______________________________________________
Tux3 mailing list
Tux3 at tux3.org
http://tux3.org/cgi-bin/mailman/listinfo/tux3



More information about the Tux3 mailing list