[Tux3] Design note: Buffer forking
Daniel Phillips
phillips at phunq.net
Thu Jan 8 04:16:02 PST 2009
Buffer forking is a technique I proposed to allow dirty metadata blocks
in cache to be further modified while a version that belongs to an
earlier delta is still in flight to disk. It is a kind of copy on
write: the memory backing a buffer is swapped out for a copy that can
be modified without affecting the original.
In contrast to block redirect which copies data from one buffer to
another, buffer forking affects a single buffer. We could say buffer
forking splits a buffer in time, leaving a read-only copy in an earlier
delta and a writable copy in the current delta.
Buffer forking is a front end operation, while block redirect is a back
end operation.
Buffer forking was conceived as a method of avoiding data dependencies
between cache layers that would cause pipeline stalls. Even though our
immediate plans are to run deltas one at a time, so that there is only
a single cache layer, buffer forking is still useful as a means of
dealing with the allocate-during-flush problem that comes up when
flushing bitmaps to disk.
Buffer forking takes place inside the blockdirty(buffer) function. The
idea is simple:
if the buffer is dirty in an earlier delta
allocate a new buffer
swap the buffer's backing memory with the new buffer
remove the old buffer from the earlier delta list
insert the new buffer on the earlier delta list
set the buffer dirty in the current delta
insert the buffer on the current delta list
Our immediate use case is flushing bitmaps. For now, this will happen
at delta transition time. We have a list of bitmaps that are dirty in
the current delta. We increment the delta counter, so the current
delta becomes the committing delta, and move the list. Now we walk the
commit list and allocate or reallocate physical addresses for each
block (map_region). If an allocation sets a bit in a bitmap block that
is dirty in the committing delta, that block will be forked and become
dirty in the current delta, leaving a read only copy on the commit
delta list. Finally, we walk the committing list again, initiating the
write transfer.
What could be simpler? Well, to be honest, a lot of things could be
simpler. But this is a pretty simple algorithm compared to trying to
handle all the cases of allocate-in-allocate some other way. Just for
fun, try it!
Forking is easy to implement in userspace: each buffer has a pointer to
memory for a block, and we just change the pointer. In kernel, there
are complications:
* Multiple buffers are packed onto a single page of memory.
* Multiple tasks may access a buffer asynchronously.
So we want to replace a page which may be shared by, say, four buffers,
each of which may be clean, dirty, in process of being read or written,
or not in use at all. Is it even possible? I think it does work, just
barely.
When we want to fork a buffer, we start by bringing its underlying page
uptodate (read_mapping_page). This gets rid of the otherwise very
nasty case of read IO that might be in progress on one or more other
blocks on the same page. After the read_mapping_page, each buffer on
the page is either clean, dirty in the current delta, or dirty in some
previous delta. We have to look at the cases.
Other dirty buffer on same page in current delta:
Not possible, if that was the case this page would already have
been forked
Other dirty buffers on same page in earlier delta:
After the page copy, all other dirty buffers are now dirty in the
current delta and may be freely modified thereafter. They were
forked "for free". They are either not being accessed, or being
read, in which case it is ok for the reading code to continue to
read the copy of the buffer on the old page. If the accessing
code then decides to write, the buffer is already forked.
Clean buffers being read by filesystem code:
The data being read is unchanged by the copy, as for a dirty
buffers in earlier delta being accessed for read.
Fork algorithm with multiple blocks per page:
Allocate a new page and put buffers on it
Use read_mapping_page to bring the full page uptodate
Take the page lock (protects the buffer list)
Swap the pages (set b_page for each buffer, exchange page->private)
Walk the two buffer lists together:
If the old buffer is dirty in a previous delta
Set it dirty in the current delta
Move it from the earlier delta list to the current delta list
Add the new buffer to the earlier delta list
Release the page lock
Users of blocks that can fork have to take some extra care. The address
of the buffer data can change asynchronously, but not if the buffer is
dirty in the current delta. A user that reads buffer data must be
aware that two calls to bufdata(buffer) may return different addresses.
Usually there is only one call to bufdata(buffer) per blockread(buffer)
so that is not a big change. A user that first reads buffer data then
writes to it will call blockdirty(buffer), and then it must get the
data pointer from bufdata(buffer) again. This requires changes in a
few places in our code.
_______________________________________________
Tux3 mailing list
Tux3 at tux3.org
http://mailman.tux3.org/cgi-bin/mailman/listinfo/tux3
More information about the Tux3
mailing list