[Tux3] Patch: Convert log mutex to spinlock
Daniel Phillips
phillips at phunq.net
Wed Jan 7 13:51:27 PST 2009
...I forgot to explain how this spinlock strategy actually works:
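	spin_lock(&sb->loglock);
	while (sb->logpos + bytes > sb->logtop) {
		/* Not enough room: remember which block comes next, then drop
		 * the lock, because blockget can sleep. */
		unsigned lognext = sb->lognext;
		struct buffer_head *logbuf;

		spin_unlock(&sb->loglock);
		logbuf = blockget(mapping(sb->logmap), lognext);
		spin_lock(&sb->loglock);
		if (lognext != sb->lognext) {
			/* Lost the race: another log_begin already advanced
			 * the log. Release our buffer and recheck the space. */
			brelse(logbuf);
			continue;
		}
		/* Won the race: retire the old block, install the new one. */
		if (sb->logbuf)
			log_finish(sb);
		log_next(sb);
		sb->logbuf = logbuf;
		sb->lognext++;
		log_start(sb);
		*(struct logblock *)bufdata(sb->logbuf) = (struct logblock){
			.magic = to_be_u16(0xc0de) };
	}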
After taking the spinlock and finding that the requested space is not
available, log_begin has to get a new buffer. Because blockget can
sleep, log_begin must first note the current ->lognext and then drop
the spinlock. Blockget is SMP-safe, so several log_begins can execute
it in parallel. By the time our log_begin retakes the spinlock, some
other log_begin may have taken it first and already advanced
->lognext. If ->lognext still matches the value noted earlier, our
log_begin knows that it won the race, so it can advance ->lognext,
update the buffer pointer and do the other bookkeeping. By advancing
->lognext, it ensures that no other log_begin will try to make these
same changes. If it finds that ->lognext has changed, it releases the
buffer and repeats the whole algorithm, starting from the check for
available log space.
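A minimal user-space sketch of the same snapshot-and-recheck protocol, assuming C11 atomics for the spinlock. Everything named toy_* and BLOCK_BYTES is hypothetical: toy_blockget stands in for blockget, free() stands in for brelse and log_finish, and a plain heap block stands in for the buffer_head. Unlike the kernel fragment, this version also releases the lock before returning.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

#define BLOCK_BYTES 64u  /* hypothetical log block size */

/* Hypothetical user-space stand-in for the superblock fields log_begin uses. */
struct toy_sb {
	atomic_flag loglock;      /* test-and-set spinlock */
	unsigned logpos, logtop;  /* used / usable bytes in the current log block */
	unsigned lognext;         /* index of the next log block to allocate */
	int *logbuf;              /* current log block (stand-in for a buffer_head) */
};

static void toy_spin_lock(atomic_flag *lock)
{
	while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
		;  /* busy-wait */
}

static void toy_spin_unlock(atomic_flag *lock)
{
	atomic_flag_clear_explicit(lock, memory_order_release);
}

/* Stand-in for blockget: an allocation that may sleep, so call it unlocked. */
static int *toy_blockget(unsigned index)
{
	int *block = malloc(sizeof *block);
	if (block)
		*block = (int)index;
	return block;
}

/* Reserve `bytes` of log space; returns the offset in the current block, or -1. */
static long toy_log_begin(struct toy_sb *sb, unsigned bytes)
{
	if (bytes > BLOCK_BYTES)
		return -1;  /* would never fit, even in a fresh block */
	toy_spin_lock(&sb->loglock);
	while (sb->logpos + bytes > sb->logtop) {
		unsigned lognext = sb->lognext;  /* snapshot taken under the lock */
		toy_spin_unlock(&sb->loglock);   /* never sleep while holding it */
		int *buf = toy_blockget(lognext);
		if (!buf)
			return -1;               /* lock is not held here */
		toy_spin_lock(&sb->loglock);
		if (lognext != sb->lognext) {    /* another caller already advanced */
			free(buf);               /* drop our block, recheck space */
			continue;
		}
		free(sb->logbuf);                /* we won: retire the old block */
		sb->logbuf = buf;
		sb->lognext++;                   /* every other racer now loses */
		sb->logpos = 0;
		sb->logtop = BLOCK_BYTES;
	}
	long offset = sb->logpos;
	sb->logpos += bytes;
	toy_spin_unlock(&sb->loglock);
	return offset;
}
```

Because only the caller whose snapshot still matches may bump lognext, each block is installed exactly once; a loser's only cost is one wasted allocation.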
It is remotely possible that some other log_begin could win the race to
update the buffer pointer and then use up all the space in the new log
block before our log_begin retakes the spinlock. This unlikely case is
handled by repeating the space check.
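To make that unlikely case concrete, here is a single-threaded replay of the interleaving, written as a sketch: the struct, BLOCK_BYTES and the helpers are hypothetical, locking is omitted, and the second caller's work is simply spliced into the first caller's unlocked window.

```c
#include <assert.h>
#include <stdbool.h>

#define BLOCK_BYTES 64u  /* hypothetical log block size */

/* Hypothetical model of the log state. Locking is left out: the point is
   the order of the checks, not the lock itself. */
struct logstate {
	unsigned logpos, logtop, lognext;
};

/* The space check log_begin makes (under the lock, in the real code). */
static bool has_room(const struct logstate *s, unsigned bytes)
{
	return s->logpos + bytes <= s->logtop;
}

/* Install a fresh block, but only if lognext still equals the snapshot. */
static bool try_install(struct logstate *s, unsigned snapshot)
{
	if (snapshot != s->lognext)
		return false;  /* somebody else got there first */
	s->lognext++;
	s->logpos = 0;
	s->logtop = BLOCK_BYTES;
	return true;
}

/* Replay: caller A snapshots lognext and drops the lock; caller B installs
   the next block and fills it completely; A then loses the race. Returns
   how many space checks A made before it could reserve its bytes. */
static unsigned replay_rare_case(struct logstate *s, unsigned bytes)
{
	unsigned checks = 0;
	bool b_ran = false;

	for (;;) {
		checks++;
		if (has_room(s, bytes))
			break;
		unsigned snapshot = s->lognext;  /* A: snapshot, then "unlock" */
		if (!b_ran) {                    /* B runs in A's unlocked window */
			b_ran = true;
			try_install(s, s->lognext);
			s->logpos = s->logtop;   /* B uses up the whole new block */
		}
		try_install(s, snapshot);        /* A: "relock"; win or lose, recheck */
	}
	s->logpos += bytes;
	return checks;
}
```

Starting from a full block, A needs three checks: the first fails, the second fails because B filled the block A lost, and the third succeeds in the block A installed itself. Skipping the second check would let A write into a block that has no room left.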
Unlike blockget, brelse does not sleep, so it can be called while
holding a spinlock, as can be seen by examining the code:
http://lxr.linux.no/linux+v2.6.27/fs/buffer.c#L1211
Otherwise, the spinlock would have to be released before the brelse,
and this messy code would be even messier.
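The linked code suggests that brelse amounts to dropping a reference with put_bh, a single atomic decrement with no I/O and no waiting, which is what makes it safe under a spinlock. A hypothetical C11 sketch of such a release (toy_buf, toy_get, toy_release and toy_lock are illustrative names, not kernel APIs):

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical buffer with a reference count, loosely modelled on b_count. */
struct toy_buf {
	atomic_int count;
};

static atomic_flag toy_lock = ATOMIC_FLAG_INIT;

/* Take a reference (compare get_bh). */
static void toy_get(struct toy_buf *b)
{
	atomic_fetch_add_explicit(&b->count, 1, memory_order_relaxed);
}

/* Drop a reference: one atomic decrement, never blocks. Returns the new count. */
static int toy_release(struct toy_buf *b)
{
	int old = atomic_fetch_sub_explicit(&b->count, 1, memory_order_release);
	assert(old > 0);  /* releasing an unreferenced buffer is a bug */
	return old - 1;
}

/* The release happens with the spinlock held, which is fine because it
   cannot sleep; a sleeping call here would have to move outside the lock. */
static int release_under_lock(struct toy_buf *b)
{
	while (atomic_flag_test_and_set_explicit(&toy_lock, memory_order_acquire))
		;  /* spin */
	int left = toy_release(b);
	atomic_flag_clear_explicit(&toy_lock, memory_order_release);
	return left;
}
```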
In general, you have to do a lot of stupid-looking tricks like the one
above to use spinlocks in place of mutexes, and the reward is a tiny
speedup. It is better just to avoid such conversions in development
code. I don't know what possessed me ;-)
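For contrast, a hedged sketch of the mutex-style alternative, assuming POSIX threads: because a sleeping lock may be held across the allocation, the snapshot, the race and the retry loop all disappear. The names and BLOCK_BYTES are hypothetical, and pthread_mutex_t stands in for a kernel mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define BLOCK_BYTES 64u  /* hypothetical log block size */

/* Hypothetical log state guarded by a sleeping lock instead of a spinlock. */
struct toy_log {
	pthread_mutex_t lock;
	unsigned logpos, logtop, lognext;
	int *logbuf;
};

/* Stand-in for blockget: may sleep, which is fine while holding a mutex. */
static int *toy_blockget(unsigned index)
{
	int *block = malloc(sizeof *block);
	if (block)
		*block = (int)index;
	return block;
}

/* Mutex version: the lock is held across the allocation, so there is no
   snapshot of lognext, no race to lose and no retry loop. */
static long toy_log_reserve(struct toy_log *tl, unsigned bytes)
{
	long offset = -1;

	if (bytes > BLOCK_BYTES)
		return -1;
	pthread_mutex_lock(&tl->lock);
	if (tl->logpos + bytes > tl->logtop) {
		int *buf = toy_blockget(tl->lognext);
		if (!buf)
			goto out;
		free(tl->logbuf);  /* retire the old block */
		tl->logbuf = buf;
		tl->lognext++;
		tl->logpos = 0;
		tl->logtop = BLOCK_BYTES;
	}
	offset = tl->logpos;
	tl->logpos += bytes;
out:
	pthread_mutex_unlock(&tl->lock);
	return offset;
}
```

The trade-off is the one described above: simpler code in exchange for a lock that may put waiters to sleep instead of spinning.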
Regards,
Daniel
_______________________________________________
Tux3 mailing list
Tux3 at tux3.org
http://mailman.tux3.org/cgi-bin/mailman/listinfo/tux3