[Tux3] Feature interaction between multiple volumes and atomic update
Matthew Dillon
dillon at apollo.backplane.com
Fri Aug 29 20:31:03 PDT 2008
:It turns out that multiple independent volumes sharing the same
:allocation space is a feature that does not quite come for free, as I
:had earlier claimed. The issue is this:
:...
: * Therefore it seems logical that Tux3 should have a separate forward
: log for each subvolume to allow independent syncing of subvolumes.
: But global allocation state must always be consistent regardless of
: the order in which subvolumes are synced.
I had a lot of trouble trying to implement multiple logs in HAMMER
(the idea being to improve I/O throughput). I eventually gave up
and went with a single log (well, an UNDO fifo in HAMMER's case). So,
for example, even though HAMMER does implement pseudo-filesystem spaces
for mirroring slaves and such, everything still uses a single log
space.
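To make that concrete, the single-log scheme amounts to something
like the sketch below. The record layout is illustrative only, not
HAMMER's actual media format:

    #include <stdint.h>

    /*
     * Illustrative single UNDO FIFO record shared by the whole
     * filesystem (all PFSs log here).  Fields are assumptions,
     * not HAMMER's real on-disk structures.
     */
    struct undo_record {
            uint64_t seq;           /* monotonic FIFO sequence number */
            uint64_t data_offset;   /* media offset being protected */
            uint32_t bytes;         /* length of the saved pre-image */
            uint32_t crc;           /* sanity check during recovery */
            /* pre-image data follows the header in the FIFO */
    };

    /*
     * Recovery scans the FIFO backward from the last committed
     * point and rewrites each pre-image, undoing partial updates
     * for every PFS in one pass.
     */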
: 3) When the first subvolume is remounted after a crash, implicitly
: remount and replay all subvolumes that were also mounted at the time
: of the crash, roll up the logs, and unmount them.
If you synchronize the transaction id spaces between the subvolumes,
then the crash recovery code could use a single number to determine
how far to replay each subvolume. That sounds like it ought to work.
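Roughly, with a shared tid space, recovery reduces to replaying
everything past one number. The types and the apply() hook here are
hypothetical, not actual Tux3 or HAMMER code:

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical log record and per-subvolume log, for illustration. */
    struct log_record {
            uint64_t tid;           /* transaction id, shared id space */
            /* ... payload describing the logged update ... */
    };

    struct subvol_log {
            struct log_record *rec; /* records, oldest first */
            size_t nrec;
    };

    /* Stand-in for whatever reapplies a logged update. */
    static void apply(const struct log_record *r) { (void)r; }

    /*
     * Because every subvolume draws tids from the same space, a
     * single recovery_tid tells each log how far to replay.
     */
    static void replay_subvol(const struct subvol_log *log,
                              uint64_t recovery_tid)
    {
            size_t i;

            for (i = 0; i < log->nrec; i++)
                    if (log->rec[i].tid > recovery_tid)
                            apply(&log->rec[i]);
    }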
: 4) Partition the allocation space so that each subvolume allocates
: from a completely independent allocation space, which is separately
: logged and synced. Either implement this by providing an
: additional level of indirection so that each subvolume has its own
: map of the complete volume which may be expanded from time to time
: by large increments, or record in each subvolume allocation map
: only those regions that are free and available to the subvolume.
I tried this in an earlier HAMMER implementation and it was a
nightmare. I gave up on it. Also, in an earlier iteration, I
had a blockmap translation layer to support the above. That
worked fairly well as long as the blocks were very large (at least
8MB). When I went to the single global B-Tree model, I didn't
need the layer any more and devolved it back down to a simple
2-layer freemap.
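A 2-layer freemap of that sort looks roughly like this; the names
and the 8MB zone size are illustrative assumptions, not the real
HAMMER media layout:

    #include <stdint.h>

    #define ZONE_SHIFT      23                      /* 8MB zones */
    #define ZONE_SIZE       (1ULL << ZONE_SHIFT)

    /* Layer 1: one entry per large region, pointing at a layer-2 array. */
    struct freemap_layer1 {
            uint64_t layer2_offset; /* media offset of the layer-2 block */
            /* ... crc, etc. ... */
    };

    /* Layer 2: one entry per zone, tracking free space within it. */
    struct freemap_layer2 {
            uint32_t bytes_free;    /* free bytes left in this zone */
            uint32_t append_off;    /* next append point in the zone */
    };

    /* A media offset indexes the map directly; no translation layer. */
    static inline uint64_t zone_index(uint64_t offset)
    {
            return offset >> ZONE_SHIFT;
    }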
:something Tux3 wishes to avoid. We would be better advised to improve
:the volume manager so that it is capable enough to provide such
:incremental allocation itself in a way that maps well to the needs of
:filesystems such as Tux3.
:
:I CC'd this one to Matt Dillon, perhaps mainly for sympathy. Hammer
:does not have this issue as it does not support subvolumes, perhaps
:wisely.
Yah. We do support pseudo-filesystems within a HAMMER filesystem,
but they are implemented using a field in the B-Tree element key.
They aren't actually separate filesystems; they just use totally
independent key spaces within the global B-Tree.
We use the PFSs as replication sources and targets. This also allows
the inode numbers to be replicated (each PFS gets its own inode
numbering space).
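Schematically it looks something like the fragment below. The field
names are illustrative and the real HAMMER key has more in it, but
comparing the PFS id first is what keeps each key space disjoint:

    #include <stdint.h>

    struct btree_key {
            uint16_t pfs_id;  /* pseudo-filesystem id, compared first */
            uint64_t obj_id;  /* inode number, per-PFS numbering space */
            uint64_t key;     /* record key within the object */
    };

    /* Comparing pfs_id first keeps each PFS's keys fully disjoint. */
    static int btree_key_cmp(const struct btree_key *a,
                             const struct btree_key *b)
    {
            if (a->pfs_id != b->pfs_id)
                    return (a->pfs_id < b->pfs_id) ? -1 : 1;
            if (a->obj_id != b->obj_id)
                    return (a->obj_id < b->obj_id) ? -1 : 1;
            if (a->key != b->key)
                    return (a->key < b->key) ? -1 : 1;
            return 0;
    }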
:Regards,
:
:Daniel
-Matt
Matthew Dillon
<dillon at backplane.com>