[Tux3] Count: a few thoughts about REALLY outstanding features

Daniel Phillips phillips at phunq.net
Sun Dec 21 15:15:08 PST 2008


On Saturday 20 December 2008 12:50, Michael Pattrick wrote:
> >> Just another filesystem I am afraid, with one big advance
> >> (versioning) and a number of incremental ones.
> And that's the way to go about it, it may seem easy to tack side
> projects on but scopecreep (ScopeCreep: the devourer of souls and
> government projects) could destroy Tux3. Adding this type of feature
> would increase the complexity of a filesystem with the stated goal of
> having a 'tight' code base. Having a well defined list of reasonable
> features increases the likelihood that a project will be successful,
> adding a feature like this right now - just as Tux3 is preparing for
> its mainline merge- could delay the merge, increase the time needed to
> document, increase code complexity, and possibly introduce new types
> of bugs.
> 
> But that's just my take on it.

That's it all right.  To be more specific, by sticking with the rule 
that each allocated extent has exactly one pointer to it, we bypass a 
whole class of complexity and associated bugs.  When versioning is 
added using the versioned pointers method (versioned extents, versioned 
attributes), we still keep the single pointer per extent model.  At that 
point, we have snapshotting in a nice flexible form including writeable 
snapshots of snapshots, without elaborating the Tux3 structural model 
at all.  Only the btree leaf block scanning and editing code changes.
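
To give a rough idea of the shape this takes at the leaf level, here 
is a tiny sketch of my own (illustrative only, not the actual Tux3 
dleaf format or the real version resolution algorithm).  Each entry 
still carries exactly one physical pointer; the only new ingredient is 
a version label that the leaf lookup scans for:

#include <stdint.h>
#include <stddef.h>

struct vextent {
	uint16_t version;	/* version label that owns this extent */
	uint16_t count;		/* length of the extent in blocks */
	uint64_t block;		/* the single physical pointer to it */
};

/*
 * Illustration: find the entry visible to 'version' by exact match.
 * Real versioned pointers resolve the nearest ancestor version
 * instead, but either way all of the work happens while scanning and
 * editing leaf entries; the structure above the leaf never changes.
 */
static struct vextent *lookup_extent(struct vextent *table,
	size_t entries, uint16_t version)
{
	for (size_t i = 0; i < entries; i++)
		if (table[i].version == version)
			return &table[i];
	return NULL;
}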

We will use our user space unit testing strategy to manage the 
additional leaf complexity, because it gives us the huge number of 
development and testing iterations necessary to make code of that 
nature work really reliably.  Lots of unit testing iterations also 
help code settle down to a relatively simple form.

Look in version.c and check out the unit testing there to see what I 
mean: it implements a random fuzz tester to beat heavily on corner 
cases, trying out millions of combinations in a few seconds and 
checking for correctness at every step.  Of course, this is no 
substitute for thinking deeply about what is going on, but it is a 
powerful tool for catching issues that slip through the net of pure 
reasoning.
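
For anybody who has not read it, the pattern is roughly the following 
(a simplified sketch of my own, not the actual version.c code): drive 
the code under test with random operations while maintaining a 
trivially correct reference model, and compare the two after every 
single step so a divergence is caught the moment it appears.

#include <assert.h>
#include <stdlib.h>

#define SLOTS 64

static int model[SLOTS];	/* trivially correct reference model */
static int target[SLOTS];	/* stands in for the code under test */

static void target_set(int slot, int value) { target[slot] = value; }
static int target_get(int slot) { return target[slot]; }

int main(void)
{
	srand(1);	/* fixed seed so any failure reproduces exactly */
	for (long step = 0; step < 1000000; step++) {
		int slot = rand() % SLOTS, value = rand();
		model[slot] = value;
		target_set(slot, value);
		/* check full correctness at every step */
		for (int i = 0; i < SLOTS; i++)
			assert(target_get(i) == model[i]);
	}
	return 0;
}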

When we add the additional versioning complexity to the ileaf and dleaf 
processing code, we will have another layer of unit testing at the leaf 
level.

What this means is that to implement versioning, we combine two well 
tested components: our classic single-referenced filesystem design and 
versioning logic that stays strictly within the dleaf processing code.
We therefore hope that the vast majority of bugs will be caught by Tux3 
developers in unit testing and not by users in full-system testing.

Now, single referencing does not immediately support data de-duplication 
and pointer techniques to avoid file copies.  But it does support 
snapshotting, and should make it easier to do online expand, shrink and 
checking reliably.  These are the must-have features that are currently 
deficient in Linux, and are real impediments for Linux storage.  I 
respect and admire those developers who are willing to jump in and 
tackle those other cool features, but to get where we need to be in the 
Linux storage space, our little group needs to stay focussed on 
essentials.

That said, we will eventually elaborate the Tux3 allocation model to add 
an allocation btree as a complement to the bitmap table.  I have 
written a little bit about this previously.  The executive summary: 
on highly fragmented filesystems, bitmap allocation is far more space 
efficient than extents (up to 50 times), while for large files on 
unfragmented filesystems, extent allocation is much more efficient.  
The efficiency equation is compelling enough to justify some extra 
complexity to switch between the two, depending on observed allocation 
statistics.  The point is that when extent allocation arrives, we can 
attach reference counts to the extents and use them to implement such 
things as de-duplication.  Future fanciness.
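
To give a feel for where that kind of factor comes from, here is a 
back-of-envelope comparison of my own (illustrative numbers, not 
measurements): a bitmap charges a flat one bit per block no matter how 
fragmented the volume gets, while an extent record costs a fixed 
handful of bytes however short the extent is.  With every extent 
shrunk to a single block, an eight byte record is 64 bits standing in 
for the one bit the bitmap needs; with a single extent covering a 
thousand blocks, those same 64 bits replace a thousand bits of bitmap.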

If somebody wanted to work on de-duplication right now, I would 
recommend using a per-block reference count table mapped into a file, 
like the xattr atom refcounting we already have.  This is not the most 
efficient reference counting mechanism in the world, but it will work 
fine for testing algorithms and proving the worth of the feature.
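
To make that concrete, here is a minimal userspace sketch of the kind 
of thing I mean (made-up names, not an actual Tux3 interface): one 16 
bit count per physical block, kept in a table mapped straight into a 
file so that the block number indexes it directly.

#include <stdint.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

static uint16_t *refmap;	/* mapped table, one count per block */

static int refmap_open(const char *path, uint64_t blocks)
{
	size_t size = blocks * sizeof *refmap;
	int fd = open(path, O_RDWR | O_CREAT, 0644);
	if (fd < 0 || ftruncate(fd, size) < 0)
		return -1;
	refmap = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	return refmap == MAP_FAILED ? -1 : 0;
}

/* De-duplication points a second file at an existing block and bumps
   the count instead of copying the data. */
static void block_get(uint64_t block) { refmap[block]++; }

/* The block goes back to the free pool when its count hits zero. */
static int block_put(uint64_t block) { return --refmap[block] == 0; }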

Regards,

Daniel
