[Tux3] Basic logging code committed
Daniel Phillips
phillips at phunq.net
Mon Jan 5 01:07:35 PST 2009
A userspace prototype for logging and replay is committed:
http://hg.tux3.org/tux3/rev/b329f332ec12
This code demonstrates a nice efficient approach to logging and replay.
The encode/decode primitives used for inode attributes are re-used
here, which is pleasant. This code is already endian-correct, which
was easy because we had already developed the primitives. I ran sparse
checking, which was easy thanks to Hirofumi's makefile improvements,
and this code came up endian clean on the first try. Except that
sparse found a bug in balloc.c I had introduced witt the new (untested)
all_clear function. Which means I should run sparse more often.
Log blocks are kept in a mapping, for which it is probably easiest just
to allocate an inode in kernel. Having them in their own mapping lets
us treat the log blocks as an array, and by not using the buffer
mapping for logs, we can delay deciding where each log block should be
physically located until it is time to commit them to disk.
Variable sized log records are serially encoded into the highest log
block in the log mapping. The only example so far is log_alloc, which
logs bitmap allocations and frees to implement the deferred bitmap
update idea I described earlier.
We will also have log entries to implement the "promise" idea where we
store a block and do not update its parent index on disk, but just
update the parent in cache and add a log entry so that the cached
parent can be reconstructed on replay.
The user space prototype replays just by re-walking the list of log
blocks. Replay in production will be a little more complex. Later log
entries may invalidate earlier ones, for example, storing a bitmap
block invalidates all earlier sets and clears for that bitmap block
because they are based on the previously recorded version of the block.
So when we update a bitmap block, redirecting it to a new location, the
log receives a DEADMAP record that means all previous log entries for
the indicated bitmap block should be ignored on replay. So log replay
is going to be a multi-pass affair: first we walk the log blocks and
find all the places where things are invalidated, then do the replay
using that information to ignore what must be ignored. Suggestions are
welcome for how to do this elegantly.
Each delta will have one or more log blocks. Each log block points at
the previous log block, so we can load them in reverse order. The last
log block for a delta has a commit entry that includes a pointer to the
previous delta commit block, and a count of how many delta commits
blocks there are in the chain so that rollup can shorten the log chain
just by writing a new commit block. For the time being, the disksuper
will maintain a pointer to the most recent delta commit block. This
idea should be improved eventually, because repeatedly updating the
superblock and hoping that there will never be a partial update is not
good practice. However, it is extremely unlikely to cause a problem in
practice, so we can defer this issue.
On startup, we load the log mapping by chaining back through the commit
chain blocks, and chaining backward through each delta log chain.
Being lazy, we can start at some high index in the log mapping and work
backwards.
Loading the log chain this way is perhaps not the greatest idea for the
long term, because each log block has to be read synchronously, with an
average access time around 6 ms for rotating media. It does not take
many log blocks to add up to a significant delay on startup. We can
think about improving this later.
With this logging code, we officially enter the next phase of the
project: robustness and crash recovery.
Regards,
Daniel
_______________________________________________
Tux3 mailing list
Tux3 at tux3.org
http://mailman.tux3.org/cgi-bin/mailman/listinfo/tux3
More information about the Tux3
mailing list