[Tux3] Aha on the way to versioned extents
Daniel Phillips
phillips at phunq.net
Mon Jul 28 18:20:48 PDT 2008
Hi all,
There are now over fifty of you here on the Tux3 list, I am most
pleased. A special shoutout to all the exnominate people who joined
last night. Gruesse!
I mentioned that the biggest piece of coding work remaining in Tux3 is
generalizing from versioned pointers to versioned extents. Just now, a
way to turn that big project into a small project popped into my head.
First a couple of terminology adjustments: from now on there will be no
more exceptions (legacy volume management terminology) and there will
be elements instead. A versioned element can be a pointer, an extent
or an attribute. Also, there will no longer be chunks. Chunks are now
just blocks in accord with standard filesystem/database terminology.
The aha is this: there is already nice code for the versioned pointer
case, working on a single element list that is the set of versioned
pointers for some logical block of a file or volume. To generalize it
to extents, we take a region of extents on which we want to perform
some operation and convert it to a set of element lists, one for each
logical block in the region. Then we apply an edit operation (say
version delete, the nastiest) to each of the lists separately. Then we
walk through the lists of single block pointers in parallel, converting
them back to extents. Then insert those emitted extents back into the
region we started with, merging with extents on the boundares of the
region where possible. Easy enough to see how to do that, right?
Issue: this brute force approach may suck for performance. But it does
not suck for verifying correctness, and that is the dominating issue at
the moment, the other dominating issue being that without extents we
cannot compare Tux3's metadata compactness in any meaningful way to
filesystems that do have extents.
The thing to do then, is iteratively improve the above simple idea to be
more efficient. We do not have to generate a versioned pointer list
for each logical address, we only need one list and we will modify that
one as we walk across the extent region. Actually, make that two
lists: a before-edit list and an after-edit list. Actually, make that
a source list of extents and an edited list of extents and also have an
output list of extents.
First, load the source list of extents by adding in each extent that
overlaps the logical block at the start of the region. Now apply the
edit operation to that list as if it were a list of single block
versioned pointers, yielding an edited extent list. Any extents in the
source list that failed to appear in the output list are truncated to
end at the current location, and if now zero length, deleted entirely.
Copy the edited list to the output list.
Now advance to the beginning of the next extent in the region. For any
extents that end during that advance, set the extent count, remove them
from the output extent list and emit them as final results. Add the
extents that begin at the new position to the source extent list.
Apply the edit operation to the list of source extents as if they were
block pointers, generating a new edited list. Compare the edited list
to the output list to see which output extents have disappeared from
the edited list and therefore should have their length set and be
removed from the output list and emitted as final results. Repeat
from "advance" above.
This is not meant to be a completely accurate algorithm, only to
communicate the idea and to show that the messy problem can be
unravelled into a nice simple problem with a simple solution.
Even this improved, iterative algorithm is far from optimal, but it is
not too bad either. It will be simple minded code (the way I like it)
and it build strictly on the existing versioned pointer editing code
without making the underlying code more complicated. This approach
will likely serve well for some time.
I now think that coding up versioned extents is not a big callenge, and
that the biggest outstanding issue is designing a good, efficient
transaction commit model, which is proceding well in another thread.
Regards,
Daniel
_______________________________________________
Tux3 mailing list
Tux3 at tux3.org
http://tux3.org/cgi-bin/mailman/listinfo/tux3
More information about the Tux3
mailing list