[Tux3] Design note: Block Redirect
Daniel Phillips
phillips at phunq.net
Mon Jan 5 19:48:51 PST 2009
Block redirect is the operation of remapping a dirty buffered block to a
new physical location so that it can be flushed to disk without
overwriting part of a previously committed consistent image. This
involves allocating a new disk block and updating the cached parent
block to point to it.
A physically mapped block (that is, btree node or leaf in buffer cache)
must be copied to a new buffer and the btree cursor used to access it
must be updated, but a logically mapped block (data, dirent or bitmap)
needs no copy or cursor update. In either case, alloc and free records
are logged, along with a promise to update the disk image of the parent
block.
Block redirect algorithm:
   - if buffer not dirty in current delta:
      - balloc new block
      - use cursor to update pointer in parent
      - log old free, new alloc and promise to update parent on disk
      - set buffer dirty in current delta
      - add old block to free-at-end-of-delta list
      - if physically mapped:
         - blockget new block
         - copy old buffer to new
         - replace buffer and next in cursor
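The steps above can be sketched in C roughly as follows. This is only an
illustration against a toy in-memory model, not the Tux3 code: struct fs,
struct buffer, struct cursor_level, log_record() and redirect() are all
hypothetical names, balloc is replaced by a trivial bump allocator,
blockget by malloc, and the log and the free-at-end-of-delta list are
fixed-size arrays. It also assumes that "next" in the cursor is a position
inside the buffer data, which therefore has to be rebased onto the copy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define BLOCK_SIZE 16	/* tiny blocks, for illustration only */
#define MAX_RECORDS 16

typedef unsigned long block_t;

/* Hypothetical in-memory stand-ins, not the real Tux3 structures. */
struct buffer {
	block_t index;		/* cache key: physical block for btree blocks */
	int dirty_delta;	/* delta that dirtied this buffer, -1 if clean */
	bool physical;		/* true for a btree node or leaf */
	unsigned char data[BLOCK_SIZE];
};

struct cursor_level {
	struct buffer *buffer;	/* block at this level of the btree path */
	unsigned char *next;	/* assumed: position inside buffer->data */
	block_t *parent_slot;	/* pointer to this block in the cached parent */
};

enum record_type { LOG_FREE, LOG_ALLOC, LOG_UPDATE_PARENT };

struct fs {
	int delta;		/* current delta number */
	block_t next_free;	/* bump allocator standing in for balloc */
	struct { enum record_type type; block_t block; } log[MAX_RECORDS];
	int log_count;
	block_t defree[MAX_RECORDS];	/* blocks to free at end of delta */
	int defree_count;
};

static void log_record(struct fs *fs, enum record_type type, block_t block)
{
	assert(fs->log_count < MAX_RECORDS);
	fs->log[fs->log_count].type = type;
	fs->log[fs->log_count].block = block;
	fs->log_count++;
}

/* Returns the buffer the caller should modify from now on. */
static struct buffer *redirect(struct fs *fs, struct cursor_level *level)
{
	struct buffer *old = level->buffer;

	if (old->dirty_delta == fs->delta)
		return old;	/* new or already redirected in this delta */

	block_t oldblock = *level->parent_slot;
	block_t newblock = fs->next_free++;	/* "balloc new block" */

	*level->parent_slot = newblock;	/* update pointer in cached parent */
	log_record(fs, LOG_FREE, oldblock);
	log_record(fs, LOG_ALLOC, newblock);
	log_record(fs, LOG_UPDATE_PARENT, newblock);	/* the promise */

	assert(fs->defree_count < MAX_RECORDS);
	fs->defree[fs->defree_count++] = oldblock;

	if (!old->physical) {
		/* Logically mapped: same buffer, no copy or cursor update. */
		old->dirty_delta = fs->delta;
		return old;
	}

	/* Physically mapped: copy to a buffer indexed by the new block. */
	struct buffer *copy = malloc(sizeof *copy);	/* "blockget" */
	assert(copy);
	*copy = *old;
	copy->index = newblock;
	copy->dirty_delta = fs->delta;
	if (level->next)	/* rebase the cursor position onto the copy */
		level->next = copy->data + (level->next - old->data);
	level->buffer = copy;
	return copy;
}
```

A second redirect() of the same cursor level within the same delta returns
the same buffer and logs nothing, which is the "already redirected" case.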
Redirecting can be done before or after altering buffer contents: the
block will not be flushed to disk before the filesystem change that
caused the redirect completes. If the block already belongs to the
current delta then it is new or already redirected, and does not need
to be redirected again (a common case for metadata). However, to start
with we will always redirect, to exercise the mechanism.
Copying buffer contents for physical block redirect can be avoided in
user space by rehashing the buffer, and in the kernel, when block size
equals page size, by moving the page in the radix tree.
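To illustrate the user space case, here is a minimal sketch of rehashing,
assuming a buffer cache that is a chained hash table keyed by block number.
The names struct cached_buffer, struct buffer_cache and rehash() are
hypothetical, not the actual Tux3 buffer cache interface. The buffer is
unlinked from the bucket for its old block and relinked under the new
block, so its contents never move.

```c
#include <assert.h>
#include <stddef.h>

#define BUCKETS 8

typedef unsigned long block_t;

/* Hypothetical user space buffer cache, chained hash keyed by block. */
struct cached_buffer {
	block_t block;
	struct cached_buffer *hash_next;
	unsigned char data[16];
};

struct buffer_cache {
	struct cached_buffer *bucket[BUCKETS];
};

static unsigned bucket_of(block_t block)
{
	return block % BUCKETS;
}

static void cache_insert(struct buffer_cache *cache, struct cached_buffer *buffer)
{
	struct cached_buffer **head = &cache->bucket[bucket_of(buffer->block)];
	buffer->hash_next = *head;
	*head = buffer;
}

static struct cached_buffer *cache_lookup(struct buffer_cache *cache, block_t block)
{
	struct cached_buffer *b = cache->bucket[bucket_of(block)];
	for (; b; b = b->hash_next)
		if (b->block == block)
			return b;
	return NULL;
}

static void cache_remove(struct buffer_cache *cache, struct cached_buffer *buffer)
{
	struct cached_buffer **link = &cache->bucket[bucket_of(buffer->block)];
	while (*link && *link != buffer)
		link = &(*link)->hash_next;
	assert(*link == buffer);	/* buffer must be in the cache */
	*link = buffer->hash_next;
	buffer->hash_next = NULL;
}

/* Redirect without copying: the same buffer now caches newblock. */
static void rehash(struct buffer_cache *cache, struct cached_buffer *buffer,
		   block_t newblock)
{
	cache_remove(cache, buffer);
	buffer->block = newblock;
	cache_insert(cache, buffer);
}
```

After rehash(), a lookup of the old block misses and a lookup of the new
block returns the original buffer with its data untouched, so any pointers
into the buffer held by a cursor remain valid.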
Block redirect is the main workhorse of atomic commit in Tux3. The
other elements are: logging, change brackets, buffer forking, delta
commit, log rollup and log replay. We have now seen prototype code for
logging, log replay and change brackets, a reasonably detailed
description of block redirect, and various hints about the other three.
Next on the list is a detailed description of buffer forking, and how
it simplifies the tricky process of updating the allocation bitmap.
Regards,
Daniel