Fsck Revisited

Daniel Phillips daniel.raymond.phillips at gmail.com
Sun Jan 27 12:41:08 PST 2013


On Sun, Jan 27, 2013 at 1:03 AM, OGAWA Hirofumi
<hirofumi at mail.parknet.co.jp> wrote:
> Daniel Phillips <daniel.raymond.phillips at gmail.com> writes:
>
>> This algorithm is easily extended to repair inconsistent bitmaps. Simply copy
>> any blocks that differ from the shadow to the allocation bitmap. This can be
>> done in cache, then the above check may be repeated to verify success, prior
>> to committing repaired bitmap blocks to disk. Note: the log may need to be
>> adjusted to remove pending bitmap updates. Alternatively, log entries may be
>> added rather than flushing bitmap blocks, creating a state consistent with a
>> future rollup.
>
> Copying the shadow bitmap to bitmap blocks is not good. It has to
> follow atomic-commit rules (i.e. redirect the blocks to modify, and
> modify the pointers to those blocks).
>
> Instead, we can just use balloc/bfree and log_balloc/log_bfree to fix
> the differences in the bitmap. Those operations already follow
> atomic-commit rules (i.e. they are crash proof), and there is no need
> to care about the details of the current logblock either.

Strongly agree, that is the right place to start. But what about the case
where a large number of bits differ? For example, suppose an entire block
of bits was zeroed (which you discover after copying your volume off a
failing disk). Then the most efficient strategy is to copy the whole block,
which would require extra care to keep the log consistent, but is clearly
worth it in this case.
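
A rough sketch of that choice, in Python purely for illustration (this is
not Tux3 code). It compares one allocation bitmap block against its shadow
and either lists per-bit fixes or falls back to a whole-block copy. The
names differing_bits, plan_repair and copy_threshold are hypothetical, the
"balloc"/"bfree" strings only label which operation would be called (the
real signatures are not shown here), and the bit order within a byte is an
assumption.

```python
def differing_bits(bitmap: bytes, shadow: bytes):
    """Yield (bit_index, allocated_in_shadow) for each bit that disagrees."""
    if len(bitmap) != len(shadow):
        raise ValueError("blocks must be the same size")
    for byte_index, (have, want) in enumerate(zip(bitmap, shadow)):
        diff = have ^ want  # set bits mark disagreements
        for bit in range(8):  # assumption: bit 0 is the lowest block number
            if diff & (1 << bit):
                yield byte_index * 8 + bit, bool(want & (1 << bit))


def plan_repair(bitmap: bytes, shadow: bytes, copy_threshold: int):
    """Return ("per_bit", ops) or ("copy_block", []) for one bitmap block."""
    ops = [
        ("balloc" if allocated else "bfree", index)
        for index, allocated in differing_bits(bitmap, shadow)
    ]
    if len(ops) > copy_threshold:
        # Too many single-bit fixes: copying the block is cheaper, but the
        # log then needs the extra care mentioned above.
        return ("copy_block", [])
    return ("per_bit", ops)
```

With a small threshold, a block that differs in two bits yields two
operations, while a fully zeroed block tips over into the copy path.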

Of course, the above is just an optimization. Your suggestion is the best
one to implement first. At worst, it might create an unusually large
number of log blocks, which is worth testing in itself. Note: it is possible
to construct a case where the volume hits ENOSPC during fsck repair,
which would be bad. So both approaches may need to be available in a
production-ready fsck.
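
To make the ENOSPC concern concrete, here is a minimal Python sketch, again
only an illustration. It estimates how many log blocks the per-bit repair
would need and compares that with the free space implied by the shadow
bitmap. The records_per_log_block parameter and all function names are
hypothetical, and the sketch ignores whatever space the whole-block copy
would itself need.

```python
import math


def free_blocks(shadow: bytes) -> int:
    """Count clear bits in the shadow bitmap, i.e. blocks it considers free."""
    return sum(8 - bin(byte).count("1") for byte in shadow)


def log_blocks_needed(records: int, records_per_log_block: int) -> int:
    """Log blocks required to hold the given number of repair records."""
    return math.ceil(records / records_per_log_block)


def choose_strategy(records: int, records_per_log_block: int, shadow: bytes) -> str:
    """Prefer per-bit repair unless its log blocks would not fit on the volume."""
    needed = log_blocks_needed(records, records_per_log_block)
    return "per_bit" if needed <= free_blocks(shadow) else "copy_block"
```

On a nearly full volume, even a modest number of repair records can exceed
the free blocks, which is the case where the fallback would be needed.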

> To write log entries, we have to be on the backend though. In userland,
> we can violate this rule, because nothing is asynchronous (this is why
> mkfs can reserve the superblock on the frontend).

Yes, true. But we should also keep in mind that our fsck will be online at
some point, so we should favour algorithms that port easily to the kernel.
Note that "online" does not necessarily mean "running".

Regards,

Daniel


