[Tux3] Block handles: Our fancy new atomic block state handler

Daniel Phillips phillips at phunq.net
Mon Nov 24 18:44:29 PST 2008

Here is the cool new synchronizing primitive that Hirifumi, Maciej and I 
developed last night:

unsigned change_block_state(handle_t handle, unsigned state)
	atomic_t *statemap = &handles(handle)->statemap;
	unsigned shift = handle_shift(handle);
	unsigned oldmap = atomic_read(statemap);
	while (1) {
		unsigned newmap = (oldmap & ~(7 << shift)) | (state << shift);
		unsigned latest = atomic_cmpxchg(statemap, oldmap, newmap);
		if (oldmap == latest)
		oldmap = latest;
	return (oldmap >> shift) & 7;

It atomically updates one of up to eight block states and returns the 
previous statemap.  We can use the old statemap to detect transitions, 
not only on the field we just updated but on the statemap as a whole.  
For example, we can detect when the last block on a page went uptodate, 
and therefore be sure that we will not race with a parallel block read 
and accidentally unlock a page twice.

To illustrate, suppose a bio endio completes for block 2 of four blocks 
on a page.  We update the block state atomically from "empty" 
to "clean", and get the pre-update version of the entire statemap back 
as a result.  We check that the block was actually locked as we 
expected it to be, then we look at the states of all the blocks on the 
page just before the update.  We can check that no other blocks were 
empty when we took the transition with a simple closed form bit 

   !(((~oldmap & 0x7077) + 0x1111) & 0x8888)

This assumes that the BLOCK_EMPTY state is zero, which it was not in my 
previous post, but now it is, just to make this work.  This is a trick 
from signal processing, where we add to multiple bitfields in one 
instruction and carry into an unused bit.  The unused bits are 
available because we masked off the lock bits.  Before that, we turned 
any zero fields into all ones.  Then we add one to every field, which 
carries into the fomer lock bits only for fields that are all ones.  
Then mask for just the carry bits, and if the result is zero then all 
the state fields except the one we just updated must have been zero.

The code that the current block IO library to do essentially the same 
thing is:

 403        /*
 404         * Be _very_ careful from here on. Bad things can happen if
 405         * two buffer heads end IO at almost the same time and both
 406         * decide that the page is now completely done.
 407         */
 408        first = page_buffers(page);
 409        local_irq_save(flags);
 410        bit_spin_lock(BH_Uptodate_Lock, &first->b_state);
 412        unlock_buffer(bh);
 413        tmp = bh;
 414        do {
 415                if (!buffer_uptodate(tmp))
 416                        page_uptodate = 0;
 417                if (buffer_async_read(tmp)) {
 418                        BUG_ON(!buffer_locked(tmp));
 419                        goto still_busy;
 420                }
 421                tmp = tmp->b_this_page;
 422        } while (tmp != bh);
 423        bit_spin_unlock(BH_Uptodate_Lock, &first->b_state);
 424        local_irq_restore(flags);

Also, we avoid a bit_spin_lock/unlock pair and irq disable/enable 
because our change_ operation is atomic.  And, though I didn't show it 
here, we released the block lock in the same operation as updating the 
block state, we just have to wake up any waiters.

Our version is much shorter than the original, and faster.  This code 
executes a lot, not as much as it used to with multipage IO being the 
main path now, but still a lot.  And I expect the multipage path will 
get similarly dramatic cleanups as we go.



Tux3 mailing list
Tux3 at tux3.org

More information about the Tux3 mailing list