[Tux3] Unit test for atomic commit

Daniel Phillips phillips at phunq.net
Wed Jan 28 23:03:49 PST 2009


Changeset 927 adds a new unit test for atomic commit. It is pretty
cool because it creates a mountable Tux3 filesystem:

   http://hg.tux3.org/tux3/rev/a1e52f667cad

The idea is that committest will soon be running lots of delta cycles,
with all the little pieces of atomic commit hooked up.  This is much
better than relying on full system tests to verify that things work as
they should.

The big change in progress is that sync_super will no longer rely on
flush_buffers(volmap) to write physical metadata to disk.  Instead, that
happens in change_end, on a delta transition.  We also will not rely on
tuxclose(inode) to send file data to disk, especially directory and
bitmap data.

With this test, tuxsync(sb->bitmap) fails with EAGAIN, by design.  As we
established earlier, a simple flush of dirty bitmap buffers causes
recursive bitmap block dirtying.  That would leave the bitmap blocks on
disk in an unpredictable state if we did not control it with the buffer
forking technique and a custom block write function that only writes
blocks that were dirty before the flush.  This function returns an
EAGAIN error for any re-dirtied buffer, so the caller must be prepared
to handle such blocks specially (they stay on the bitmap dirty list to
be written later).
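A toy model of that write rule may help.  This is a sketch under stated
assumptions, not the Tux3 code: struct toy_bitmap and toy_flush_bitmap
are hypothetical names, a flag array stands in for the bitmap dirty
list, and side_effect[] simulates a block write that dirties another
bitmap block (for instance by allocating the written block's location).
The flush snapshots which blocks were dirty when it started, writes only
those, and reports -EAGAIN if anything is left dirty.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define NBLOCKS 4

/* Toy stand-in for the bitmap inode's buffers (hypothetical, not Tux3's
 * struct buffer).  side_effect[i] is 1 + the index of a block that gets
 * dirtied when block i is written, or 0 for none. */
struct toy_bitmap {
	bool dirty[NBLOCKS];       /* block is on the bitmap dirty list */
	int  writes[NBLOCKS];      /* times the block was sent to "disk" */
	int  side_effect[NBLOCKS];
};

/* Write only the blocks that were dirty when the flush began.  Blocks that
 * become dirty (again) because of the flush stay on the dirty list and are
 * reported with -EAGAIN, so the caller writes them later. */
static int toy_flush_bitmap(struct toy_bitmap *bm)
{
	bool was_dirty[NBLOCKS];
	int i;

	for (i = 0; i < NBLOCKS; i++)
		was_dirty[i] = bm->dirty[i];   /* snapshot before any writes */

	for (i = 0; i < NBLOCKS; i++) {
		if (!was_dirty[i])
			continue;
		/* In Tux3 the buffer would be forked here, freezing the data
		 * being written even if the block is dirtied again. */
		bm->dirty[i] = false;
		bm->writes[i]++;
		assert(bm->side_effect[i] >= 0 && bm->side_effect[i] <= NBLOCKS);
		if (bm->side_effect[i])
			bm->dirty[bm->side_effect[i] - 1] = true; /* recursive dirtying */
	}

	for (i = 0; i < NBLOCKS; i++)
		if (bm->dirty[i])
			return -EAGAIN;
	return 0;
}
```

For example, if blocks 0 and 1 start dirty and writing block 1 re-dirties
block 0, each block is written exactly once and block 0 stays dirty for a
later pass instead of being chased recursively.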

With buffer forking and cursor_redirect enabled, filesystem activity 
generates the following lists:

  - dirty inode list

  - dirty block list per inode (kernel uses a different mechanism)

  - two global dirty lists:
      - delta dirty list
          - forked bitmap blocks
          - redirected btree leaf blocks
      - rollup dirty list
          - redirected btree index blocks

  - two deferred free lists:
      - list of extents to free after delta commit
      - list of extents to free after next rollup

  - log blocks
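One way to picture the global lists above as data structures is sketched
below.  Every name here (struct commit_lists, enum blk_kind, mark_dirty)
is hypothetical and chosen for illustration only; the dirty inode list
and per-inode block lists are left out.  The point is the routing rule:
forked bitmap blocks and redirected btree leaves join the delta dirty
list, while redirected btree index blocks wait on the rollup dirty list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical block kinds, named after the lists in the post. */
enum blk_kind { BLK_BITMAP, BLK_BTREE_LEAF, BLK_BTREE_INDEX, BLK_LOG };

struct blk {
	enum blk_kind kind;
	unsigned long long addr;   /* physical address once mapped */
	struct blk *next;
};

struct extent {
	unsigned long long start, count;
	struct extent *next;
};

/* Illustrative layout of the global lists; field names are not Tux3's. */
struct commit_lists {
	struct blk *delta_dirty;       /* forked bitmap blocks, redirected leaves */
	struct blk *rollup_dirty;      /* redirected btree index blocks */
	struct extent *defree_delta;   /* extents to free after delta commit */
	struct extent *defree_rollup;  /* extents to free after next rollup */
	struct blk *log;               /* log blocks */
};

static void push(struct blk **list, struct blk *b)
{
	b->next = *list;
	*list = b;
}

/* Route a block that was just forked or redirected to the right list. */
static void mark_dirty(struct commit_lists *cl, struct blk *b)
{
	assert(b != NULL);
	switch (b->kind) {
	case BLK_BITMAP:
	case BLK_BTREE_LEAF:
		push(&cl->delta_dirty, b);
		break;
	case BLK_BTREE_INDEX:
		push(&cl->rollup_dirty, b);
		break;
	case BLK_LOG:
		push(&cl->log, b);
		break;
	}
}
```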

Delta staging does this:

  - flush dirty inode data except bitmap data; that is, map each dirty
    data block to disk and initiate writeout.

  - flush dirty inodes to inode table blocks (this redirects inode
    btree blocks, which go onto the rollup dirty list)

  - initiate writeout for delta dirty list blocks.

  - allocate disk locations and initiate writeout for log blocks,
    adding log blocks to rollup deferred free list.
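The four staging steps can be sketched as a single function over a toy
superblock in which counters stand in for the lists.  This is an
illustrative model, not Tux3's implementation: struct toy_sb,
stage_delta and toy_alloc are hypothetical, and the assumption that each
dirty inode redirects exactly one inode table block is a simplification.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_INODES 8

/* Hypothetical, heavily simplified state; counters stand in for lists. */
struct toy_inode {
	bool is_bitmap;
	int  dirty_data;   /* dirty data blocks not yet mapped */
	bool dirty;        /* inode attributes need writing */
};

struct toy_sb {
	struct toy_inode *inodes[MAX_INODES];
	int ninodes;         /* dirty inode list */
	int delta_dirty;     /* blocks on the delta dirty list */
	int rollup_dirty;    /* blocks on the rollup dirty list */
	int log_blocks;      /* log blocks queued this delta */
	int defree_rollup;   /* blocks to free after next rollup */
	int next_free;       /* toy allocator cursor */
	int writes;          /* blocks submitted for writeout */
};

static int toy_alloc(struct toy_sb *sb) { return sb->next_free++; }

static void stage_delta(struct toy_sb *sb)
{
	int i;

	assert(sb->ninodes <= MAX_INODES);
	/* 1. Map dirty file data to disk and start writeout, skipping bitmap. */
	for (i = 0; i < sb->ninodes; i++) {
		struct toy_inode *in = sb->inodes[i];
		if (in->is_bitmap)
			continue;
		for (; in->dirty_data > 0; in->dirty_data--) {
			(void)toy_alloc(sb);
			sb->writes++;
		}
	}
	/* 2. Flush inodes into inode table blocks; the redirected inode btree
	 *    blocks go onto the rollup dirty list (one per inode, toy rule). */
	for (i = 0; i < sb->ninodes; i++) {
		if (!sb->inodes[i]->dirty)
			continue;
		sb->inodes[i]->dirty = false;
		sb->rollup_dirty++;
	}
	/* 3. Start writeout for the delta dirty list. */
	sb->writes += sb->delta_dirty;
	sb->delta_dirty = 0;
	/* 4. Allocate and write log blocks; free them after the next rollup. */
	for (; sb->log_blocks > 0; sb->log_blocks--) {
		(void)toy_alloc(sb);
		sb->writes++;
		sb->defree_rollup++;
	}
}
```

Note that the bitmap inode's data is left dirty here; in the post it
reaches disk through log rollup instead.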

Log rollup does this:

  - add per-rollup blocks to delta list (dirty btree nodes and bitmap
    blocks)

  - move deferred frees for rollup to delta deferred free list

  - set sb->logbase to sb->lognext, emptying the log

  - increment rollup counter (further block allocations belong to
    the next rollup)

  - map dirty bitmap blocks to disk and add to delta dirty list.
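The rollup steps can be modeled the same way, again with counters in
place of lists.  This is a hypothetical sketch: struct toy_sb and
log_rollup are illustrative names, and only logbase, lognext and the
rollup counter correspond to fields mentioned in the post.  It shows how
little rollup does itself: it mostly moves work onto the delta's lists.

```c
#include <assert.h>

/* Hypothetical counters standing in for the lists in the post. */
struct toy_sb {
	int delta_dirty, rollup_dirty;      /* dirty block lists */
	int defree_delta, defree_rollup;    /* deferred free lists */
	int dirty_bitmap;                   /* dirty bitmap blocks, unmapped */
	unsigned long long logbase, lognext;
	unsigned rollup;                    /* rollup counter */
};

static void log_rollup(struct toy_sb *sb)
{
	assert(sb->lognext >= sb->logbase);
	/* Per-rollup dirty btree nodes join the delta dirty list. */
	sb->delta_dirty += sb->rollup_dirty;
	sb->rollup_dirty = 0;
	/* Frees deferred to this rollup now wait only for delta commit. */
	sb->defree_delta += sb->defree_rollup;
	sb->defree_rollup = 0;
	/* Empty the log. */
	sb->logbase = sb->lognext;
	/* Further block allocations belong to the next rollup. */
	sb->rollup++;
	/* Map dirty bitmap blocks to disk and queue them with the delta. */
	sb->delta_dirty += sb->dirty_bitmap;
	sb->dirty_bitmap = 0;
}
```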

So log rollup relies on the delta mechanism to do most of its work.
Rollup can be done at any time during a delta; however, I think the
easiest place to do it is just after delta commit completes.
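A small loop illustrates that placement.  This is a sketch under
assumptions: the names are hypothetical, and the policy of rolling up
after every Nth delta commit is made up for illustration (the post does
not say how often rollup runs).  Because rollup runs right after a
commit, the blocks and frees it hands over ride along with the following
delta, so rollup-deferred extents are actually released one delta
commit later.

```c
#include <assert.h>

struct toy_sb {
	unsigned delta, rollup;            /* delta and rollup counters */
	int delta_dirty, rollup_dirty;     /* dirty block lists */
	int defree_delta, defree_rollup;   /* deferred free lists */
	int freed;                         /* extents actually released */
};

static void delta_commit(struct toy_sb *sb)
{
	/* Writeout of the delta dirty list (elided) completes here; after
	 * that, extents deferred to this delta commit can be released. */
	sb->delta_dirty = 0;
	sb->freed += sb->defree_delta;
	sb->defree_delta = 0;
	sb->delta++;
}

static void log_rollup(struct toy_sb *sb)
{
	sb->delta_dirty += sb->rollup_dirty;
	sb->rollup_dirty = 0;
	sb->defree_delta += sb->defree_rollup;
	sb->defree_rollup = 0;
	sb->rollup++;
}

/* Hypothetical policy: roll up right after every `every`th delta commit. */
static void run_deltas(struct toy_sb *sb, unsigned n, unsigned every)
{
	assert(every > 0);
	for (unsigned i = 0; i < n; i++) {
		delta_commit(sb);
		if (sb->delta % every == 0)
			log_rollup(sb);
	}
}
```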

Regards,

Daniel

_______________________________________________
Tux3 mailing list
Tux3 at tux3.org
http://mailman.tux3.org/cgi-bin/mailman/listinfo/tux3
