Design note: Perfect ENOSPC

Daniel Phillips daniel at phunq.net
Mon May 12 15:22:43 PDT 2014


"Perfect ENOSPC handling" means returning ENOSPC for any syscall that
would otherwise hit ENOSPC in the backend, and never returning ENOSPC
for a syscall that would otherwise be successfully committed to media.

This builds on my earlier design note, which defines a way to shrink
the delta size as a volume approaches full. I do not propose to implement
perfect ENOSPC handling in the near future because we do not need to be
perfect, just good enough. This note is just to prove that perfect ENOSPC
handling is possible and practical.
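For concreteness, the shrinking idea could be sketched something like
this; the names, the unit (blocks) and the divisor are made up here for
illustration, not taken from that note or from the Tux3 tree:

/* Illustrative only: bound the size of the next delta by the space
 * still free, so a delta never needs more space to commit than the
 * volume can still provide. Names and divisor are assumptions. */
static unsigned max_delta_blocks(unsigned free_blocks)
{
	unsigned cap = free_blocks / 4;	/* commit more often as the volume fills */
	return cap ? cap : 1;		/* always allow at least one block */
}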

We only need to consider the single transaction case, because we already
have a nice way of reducing to that case. The only certain way to know
whether the backend would run out of space is to actually run it, and
for that, the change must first be written to cache. If the backend
fails, then we must be able to back the change out of cache. That is
easier than it sounds: by definition, we are always able to rebuild
dirty cache from the most recent commit on media, using the log. This
is how replay works at mount or after a crash. So to back out a failed
change, we just invalidate all our cache and replay the log. This is not
inefficient, because the delta has already been shrunk as the volume
approached full, so there are only a few changes to replay. We already
do things like this in our unit tests, so it is not particularly
difficult either.
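As a rough sketch of that back-out step, with placeholder names
(struct sb, invalidate_dirty_cache, replay_log) standing in for whatever
we actually call the superblock, cache invalidation and replay entry
points:

struct sb;				/* volume state (placeholder) */
void invalidate_dirty_cache(struct sb *sb);	/* placeholder */
int replay_log(struct sb *sb);			/* placeholder */

/* Sketch only: back out a change that the backend refused by throwing
 * away dirty cache and rebuilding it from the log, exactly as replay
 * does at mount or after a crash. */
static int back_out_failed_change(struct sb *sb)
{
	invalidate_dirty_cache(sb);	/* drop all dirty cache, including the failed change */
	return replay_log(sb);		/* rebuild dirty cache up to the last committed delta */
}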

So for perfect ENOSPC handling, instead of returning ENOSPC where the
current algorithm does, we try one more time by actually calling the
backend. If the backend says "ENOSPC", then we invalidate the cache and
replay the log, without setting the filesystem read-only or otherwise
inconveniencing the user.
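Putting the pieces together, the path could be shaped roughly like
this; again, every identifier is illustrative, and backend_commit
stands in for the real backend entry point:

#include <errno.h>

struct sb;					/* volume state (placeholder) */
int backend_commit(struct sb *sb);		/* placeholder: returns 0 or -ENOSPC etc. */
int back_out_failed_change(struct sb *sb);	/* sketch above: invalidate cache, replay log */

/* Sketch of the perfect-ENOSPC path: where the current algorithm would
 * return ENOSPC from its estimate, run the real backend once, and only
 * on a genuine backend ENOSPC undo the change and report it, leaving
 * the filesystem read-write. */
static int commit_with_perfect_enospc(struct sb *sb)
{
	int err = backend_commit(sb);	/* the only authoritative answer */

	if (err == -ENOSPC) {
		int replay_err = back_out_failed_change(sb);
		if (replay_err)
			return replay_err;
		return -ENOSPC;		/* this syscall fails; the volume stays usable */
	}
	return err;
}

The point being that the estimate only decides when to bother trying
the backend; the backend's own answer is what the syscall finally
reports.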

If we ever do implement this, it will enable even the smallest volumes
to be completely filled without needing any reserve, so that our smallest
volume size becomes something like 32K. Whether anybody really needs
that is a good question.

Regards,

Daniel


