[Tux3] Feature interaction between multiple volumes and atomic update

Matthew Dillon dillon at apollo.backplane.com
Fri Aug 29 20:31:03 PDT 2008

:It turns out that multiple independent volumes sharing the same
:allocation space is a feature that does not quite come for free as I
:had earlier claimed.  The issue is this:
: * Therefore it seems logical that Tux3 should have a separate forward
:   log for each subvolume to allow independent syncing of subvolumes.
:   But global allocation state must always be consistent regardless of
:   the order in which subvolumes are synced.

    I had a lot of trouble trying to implement multiple logs in HAMMER
    (the idea being to improve I/O throughput).  I eventually gave up
    and went with a single log (well, an UNDO FIFO in HAMMER's case).
    So even though HAMMER does implement pseudo-filesystem spaces
    for mirroring slaves and such, everything still uses a single log.

: 3) When the first subvolume is remounted after a crash, implicitly
:    remount and replay all subvolumes that were also mounted at the time
:    of the crash, roll up the logs, and unmount them.

    If you synchronize the transaction id spaces between the subvolumes
    then the crash recovery code could use a single number to determine
    how far to replay each subvolume.  That sounds like it ought to work.
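    The scheme above can be sketched roughly as follows (illustrative C
    only -- the struct and function names are invented, not Tux3 or
    HAMMER code): every subvolume log draws transaction ids from one
    shared monotonic counter, so recovery just replays each log's
    entries above the last globally synced id.

```c
#include <stdint.h>

/* Hypothetical log entry: the txid comes from a counter shared by
 * all subvolumes, so ids are comparable across logs. */
struct log_entry {
    uint64_t txid;      /* id from the shared, monotonic counter */
    /* ... payload describing the logged change ... */
};

/* Replay every entry in one subvolume's log newer than the single
 * globally synced id; returns how many entries were replayed. */
static int replay_subvol(const struct log_entry *log, int nentries,
                         uint64_t synced_txid)
{
    int replayed = 0;
    for (int i = 0; i < nentries; i++) {
        if (log[i].txid > synced_txid) {
            /* apply log[i] to the subvolume here */
            replayed++;
        }
    }
    return replayed;
}
```

    With this layout the crash recovery code needs to persist only one
    number, and each subvolume can still be replayed independently.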

: 4) Partition the allocation space so that each subvolume allocates
:    from a completely independent allocation space, which is separately
:    logged and synced.  Either implement this by providing an
:    additional level of indirection so that each subvolume has its own
:    map of the complete volume which may be expanded from time to time
:    by large increments, or record in each subvolume allocation map
:    only those regions that are free and available to the subvolume.

    I tried this in an earlier HAMMER implementation and it was a
    nightmare.  I gave up on it.  Also, in an earlier iteration, I
    had a blockmap translation layer to support the above.  That 
    worked fairly well as long as the blocks were very large (at least
    8MB).  When I went to the single global B-Tree model I didn't
    need the layer any more and devolved it back down to a simple
    2-layer freemap.
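    For reference, such a blockmap translation layer might look roughly
    like this (a hypothetical C sketch, not the actual early HAMMER
    code; names are invented): each subvolume holds a table mapping its
    logical space to large fixed-size chunks of the shared device, and
    the coarse 8MB granularity is what keeps the table small.

```c
#include <stdint.h>

#define BIGBLOCK_SHIFT 23                    /* 8MB translation chunks */
#define BIGBLOCK_SIZE  (1ULL << BIGBLOCK_SHIFT)

/* Per-subvolume map: entry N gives the device offset of the
 * subvolume's Nth 8MB chunk; (uint64_t)-1 marks an unmapped chunk.
 * The map can be grown by appending entries, one 8MB chunk at a time. */
struct blockmap {
    uint64_t *phys_base;    /* indexed by logical bigblock number */
    int       nmaps;        /* number of mapped entries */
};

/* Translate a subvolume-relative offset to a device offset, or
 * return (uint64_t)-1 if the chunk is not mapped. */
static uint64_t blockmap_xlate(const struct blockmap *bm, uint64_t loff)
{
    uint64_t idx = loff >> BIGBLOCK_SHIFT;

    if (idx >= (uint64_t)bm->nmaps || bm->phys_base[idx] == (uint64_t)-1)
        return (uint64_t)-1;
    return bm->phys_base[idx] + (loff & (BIGBLOCK_SIZE - 1));
}
```

    The cost is the extra lookup on every access, which is why the
    single global B-Tree made the layer unnecessary.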

:something Tux3 wishes to avoid.  We would be better advised to improve
:the volume manager so that it is capable enough to provide such
:incremental allocation itself in a way that maps well to the needs of
:filesystems such as Tux3.
:I CC'd this one to Matt Dillon, perhaps mainly for sympathy.  Hammer
:does not have this issue as it does not support subvolumes, perhaps

    Yah.  We do support pseudo-filesystems within a HAMMER filesystem,
    but they are implemented using a field in the B-Tree element key.
    They aren't actually separate filesystems, they just use totally
    independent key spaces within the global B-Tree.

    We use the PFSs as replication sources and targets.  This also allows
    the inode numbers to be replicated (each PFS gets its own inode
    numbering space).
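    The key-field approach can be sketched like this (illustrative C,
    not HAMMER's actual key layout; field names are invented): the PFS
    id is the most significant comparison field, so each
    pseudo-filesystem occupies a disjoint, contiguous range of the one
    global B-Tree, and object (inode) ids only need to be unique within
    their own PFS.

```c
#include <stdint.h>

/* Hypothetical B-Tree element key: pfs_id compares first, carving the
 * global key space into one independent region per PFS. */
struct btree_key {
    uint32_t pfs_id;    /* which pseudo-filesystem */
    uint64_t obj_id;    /* inode number, private to the PFS */
    uint64_t offset;    /* offset within the object */
};

static int key_cmp(const struct btree_key *a, const struct btree_key *b)
{
    if (a->pfs_id != b->pfs_id)
        return a->pfs_id < b->pfs_id ? -1 : 1;
    if (a->obj_id != b->obj_id)
        return a->obj_id < b->obj_id ? -1 : 1;
    if (a->offset != b->offset)
        return a->offset < b->offset ? -1 : 1;
    return 0;
}
```

    Because replication copies a PFS's key range wholesale, inode
    numbers survive the trip unchanged.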


					Matthew Dillon 
					<dillon at backplane.com>

Tux3 mailing list
Tux3 at tux3.org