[Tux3] Aha on the way to versioned extents

Mon Jul 28 18:20:48 PDT 2008

Hi all,

There are now over fifty of you here on the Tux3 list, I am most 
pleased.  A special shoutout to all the exnominate people who joined 
last night.  Gruesse!

I mentioned that the biggest piece of coding work remaining in Tux3 is  
generalizing from versioned pointers to versioned extents.  Just now, a 
way to turn that big project into a small project popped into my head.

First a couple of terminology adjustments: from now on there will be no 
more exceptions (legacy volume management terminology) and there will 
be elements instead.  A versioned element can be a pointer, an extent 
or an attribute.  Also, there will no longer be chunks.  Chunks are now 
just blocks in accord with standard filesystem/database terminology.

The aha is this: there is already nice code for the versioned pointer 
case, working on a single element list that is the set of versioned 
pointers for some logical block of a file or volume.  To generalize it 
to extents, we take a region of extents on which we want to perform 
some operation and convert it to a set of element lists, one for each 
logical block in the region.  Then we apply an edit operation (say 
version delete, the nastiest) to each of the lists separately.  Then we 
walk through the lists of single block pointers in parallel, converting 
them back to extents.  Then insert those emitted extents back into the 
region we started with, merging with extents on the boundares of the 
region where possible.  Easy enough to see how to do that, right?

Issue: this brute force approach may suck for performance.  But it does 
not suck for verifying correctness, and that is the dominating issue at 
the moment, the other dominating issue being that without extents we 
cannot compare Tux3's metadata compactness in any meaningful way to 
filesystems that do have extents.

The thing to do then, is iteratively improve the above simple idea to be 
more efficient.  We do not have to generate a versioned pointer list 
for each logical address, we only need one list and we will modify that 
one as we walk across the extent region.  Actually, make that two 
lists: a before-edit list and an after-edit list.  Actually, make that 
a source list of extents and an edited list of extents and also have an 
output list of extents.

First, load the source list of extents by adding in each extent that 
overlaps the logical block at the start of the region.  Now apply the 
edit operation to that list as if it were a list of single block 
versioned pointers, yielding an edited extent list.  Any extents in the 
source list that failed to appear in the output list are truncated to 
end at the current location, and if now zero length, deleted entirely.  
Copy the edited list to the output list.

Now advance to the beginning of the next extent in the region.  For any 
extents that end during that advance, set the extent count, remove them 
from the output extent list and emit them as final results.  Add the 
extents that begin at the new position to the source extent list.  
Apply the edit operation to the list of source extents as if they were 
block pointers, generating a new edited list.  Compare the edited list 
to the output list to see which output extents have disappeared from 
the edited list and therefore should have their length set and be 
removed from the output list and emitted as final results.  Repeat 
from "advance" above.

This is not meant to be a completely accurate algorithm, only to 
communicate the idea and to show that the messy problem can be 
unravelled into a nice simple problem with a simple solution.

Even this improved, iterative algorithm is far from optimal, but it is 
not too bad either.  It will be simple minded code (the way I like it) 
and it build strictly on the existing versioned pointer editing code 
without making the underlying code more complicated.  This approach 
will likely serve well for some time.

I now think that coding up versioned extents is not a big callenge, and 
that the biggest outstanding issue is designing a good, efficient 
transaction commit model, which is proceding well in another thread.

Regards,

Daniel

_______________________________________________
Tux3 mailing list
Tux3 at tux3.org
http://tux3.org/cgi-bin/mailman/listinfo/tux3