[Tux3] Basic logging code committed

Mon Jan 5 01:07:35 PST 2009

A userspace prototype for logging and replay is committed:

  http://hg.tux3.org/tux3/rev/b329f332ec12

This code demonstrates a nice efficient approach to logging and replay.  
The encode/decode primitives used for inode attributes are re-used 
here, which is pleasant.  This code is already endian-correct, which 
was easy because we had already developed the primitives.  I ran sparse 
checking, which was easy thanks to Hirofumi's makefile improvements, 
and this code came up endian clean on the first try.  Except that 
sparse found a bug in balloc.c I had introduced witt the new (untested) 
all_clear function.  Which means I should run sparse more often.

Log blocks are kept in a mapping, for which it is probably easiest just 
to allocate an inode in kernel.  Having them in their own mapping lets 
us treat the log blocks as an array, and by not using the buffer 
mapping for logs, we can delay deciding where each log block should be 
physically located until it is time to commit them to disk.

Variable sized log records are serially encoded into the highest log 
block in the log mapping.  The only example so far is log_alloc, which 
logs bitmap allocations and frees to implement the deferred bitmap 
update idea I described earlier.

We will also have log entries to implement the "promise" idea where we 
store a block and do not update its parent index on disk, but just 
update the parent in cache and add a log entry so that the cached 
parent can be reconstructed on replay.

The user space prototype replays just by re-walking the list of log 
blocks.  Replay in production will be a little more complex.  Later log 
entries may invalidate earlier ones, for example, storing a bitmap 
block invalidates all earlier sets and clears for that bitmap block 
because they are based on the previously recorded version of the block.  
So when we update a bitmap block, redirecting it to a new location, the 
log receives a DEADMAP record that means all previous log entries for 
the indicated bitmap block should be ignored on replay.  So log replay 
is going to be a multi-pass affair: first we walk the log blocks and 
find all the places where things are invalidated, then do the replay 
using that information to ignore what must be ignored.  Suggestions are 
welcome for how to do this elegantly.

Each delta will have one or more log blocks.  Each log block points at 
the previous log block, so we can load them in reverse order.  The last 
log block for a delta has a commit entry that includes a pointer to the 
previous delta commit block, and a count of how many delta commits 
blocks there are in the chain so that rollup can shorten the log chain 
just by writing a new commit block.  For the time being, the disksuper 
will maintain a pointer to the most recent delta commit block.  This 
idea should be improved eventually, because repeatedly updating the 
superblock and hoping that there will never be a partial update is not 
good practice.  However, it is extremely unlikely to cause a problem in 
practice, so we can defer this issue.

On startup, we load the log mapping by chaining back through the commit 
chain blocks, and chaining backward through each delta log chain.  
Being lazy, we can start at some high index in the log mapping and work 
backwards.

Loading the log chain this way is perhaps not the greatest idea for the 
long term, because each log block has to be read synchronously, with an 
average access time around 6 ms for rotating media.  It does not take 
many log blocks to add up to a significant delay on startup.  We can 
think about improving this later.

With this logging code, we officially enter the next phase of the 
project: robustness and crash recovery.

Regards,

Daniel

_______________________________________________
Tux3 mailing list
Tux3 at tux3.org
http://mailman.tux3.org/cgi-bin/mailman/listinfo/tux3