Tux3 Report: How fast can we fail?

Wed May 27 15:46:27 PDT 2015

On 05/27/2015 02:39 PM, Pavel Machek wrote:
> On Wed 2015-05-27 11:28:50, Daniel Phillips wrote:
>> On Tuesday, May 26, 2015 11:41:39 PM PDT, Mosis Tembo wrote:
>>> On Tue, May 26, 2015 at 6:03 PM, Pavel Machek <pavel at ucw.cz> wrote:
>>>
>>>>
>>>>> We identified the following quality metrics for this algorithm:
>>>>>
>>>>> 1) Never fails to detect out of space in the front end.
>>>>> 2) Always fills a volume to 100% before reporting out of space.
>>>>> 3) Allows rm, rmdir and truncate even when a volume is full.
>>>
>>> This is definitely nonsense. You can not rm, rmdir and truncate
>>> when the volume is full. You will need a free space on disk to perform
>>> such operations. Do you know why?
>>
>> Because some extra space needs to be on the volume in order to do the
>> atomic commit. Specifically, there must be enough extra space to keep
>> both old and new copies of any changed metadata, plus enough space for
>> new data or metadata. You are almost right: we can't support rm, rmdir
>> or truncate _with atomic commit_ unless some space is available on the
>> volume. So we keep a small reserve to handle those operations, which
>> only those operations can access. We define the volume as "full" when
>> only the reserve remains. The reserve is not included in "available"
>> blocks reported to statfs, so the volume appears to be 100% full when
>> only the reserve remains.
>>
>> For Tux3, that reserve is variable - about 1% of free space, declining
>> to a minimum of 10 blocks as free space runs out. Eventually, we will
>> reduce the minimum a bit as we develop finer control over how free
>> space is used in very low space conditions, but 10 blocks is not bad
>> at all. With no journal and only 10 blocks of unusable space, we do
>> pretty well with tiny volumes.
> 
> Yeah. Filesystem that could not do rm on full filesystem would be
> braindead.
> 
> Now, what about
> 
> 1) writing to already-allocated space in existing files?

As I mentioned earlier, this seems to work pretty well in Tux3. But do
user applications really expect it to work? I do not know of any;
perhaps you do.

Incidentally, I have been torture testing this very property using a
32K filesystem consisting of 64 blocks of 512 bytes each, with repeated
dd, mknod, rm, etc., just to show that we are serious about getting
this part right.
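
To make the arithmetic on such a tiny volume concrete, here is a toy
model of the reserve rule quoted above (about 1% of free space, with a
minimum of 10 blocks). It is only a sketch: the names reserve_blocks,
available_blocks and TinyVolume are invented for illustration, the 1%
is rounded down, the two scratch blocks used by remove are an arbitrary
assumption, and none of this is actual Tux3 code.

```python
# Toy model only: names and exact formulas are assumptions, not Tux3 code.
MIN_RESERVE = 10  # minimum reserve in blocks, as stated in the thread


def reserve_blocks(free):
    """About 1% of free space, never less than the minimum reserve."""
    return max(MIN_RESERVE, free // 100)


def available_blocks(free):
    """Blocks an ordinary writer may use: free space minus the reserve."""
    return max(0, free - reserve_blocks(free))


class TinyVolume:
    """64 blocks, all treated as free at the start (a simplification)."""

    def __init__(self, blocks=64):
        self.free = blocks

    def write_block(self):
        """Ordinary allocation; refuses to dip into the reserve."""
        if available_blocks(self.free) < 1:
            return False  # the caller would see ENOSPC
        self.free -= 1
        return True

    def remove(self, blocks_freed, scratch=2):
        """rm/truncate may borrow scratch blocks from the reserve."""
        if self.free < scratch:
            return False
        # Scratch blocks are returned after the commit, so the net
        # effect is just the blocks released by the removal.
        self.free += blocks_freed
        return True


vol = TinyVolume()
written = 0
while vol.write_block():
    written += 1
print(written, "blocks written;", vol.free, "blocks left as reserve")
print("available now:", available_blocks(vol.free))
print("remove still works:", vol.remove(blocks_freed=1))
```

In this model the volume reports zero available blocks once only the
reserve is left, yet a removal can still proceed because it is allowed
to draw on the reserve.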

> 2) writing to already-allocated space in existing files using mmap?

Not part of the preliminary nospace patch, but planned. I intend to
work on that detail after merge.

The problem is almost the same as for write(2), in that the reserve
must be large enough to accommodate both old and new versions of all
changed data blocks. Otherwise we lose our ACID guarantees, which we
will go to great lengths to preserve. The thing that makes this work
nicely is the way the delta shrinks as free space runs out, which is
the central point of our new nospace algorithm.
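
As a rough illustration of a delta that shrinks as free space runs out,
here is a toy accounting model. It assumes each change is charged a
worst-case block cost (covering both old and new copies) against a
budget of free space minus the reserve, and that the delta is committed
early whenever the next charge would exceed that budget. The class
DeltaAccounting, its fields and the cost figures are all hypothetical;
this is a sketch of the idea, not the Tux3 implementation.

```python
# Toy model only: an assumed accounting scheme, not the Tux3 algorithm.
class DeltaAccounting:
    def __init__(self, free_blocks, reserve=10):
        self.free = free_blocks  # free blocks as of the last commit
        self.reserve = reserve   # blocks ordinary changes may not use
        self.charged = 0         # worst-case blocks charged to this delta
        self.pending = []        # actual block use of each pending change
        self.delta_sizes = []    # number of changes in each committed delta

    def budget(self):
        """Blocks the current delta may consume in the worst case."""
        return max(0, self.free - self.reserve)

    def commit(self):
        """End the delta: account for real usage and start a fresh one."""
        self.free -= sum(self.pending)
        self.delta_sizes.append(len(self.pending))
        self.pending.clear()
        self.charged = 0

    def change(self, worst_case, actual):
        """Add one change; return False where ENOSPC would be reported."""
        if self.charged + worst_case > self.budget():
            if self.pending:
                self.commit()  # flush early, then recompute the budget
            if worst_case > self.budget():
                return False   # does not fit even in an empty delta
        self.charged += worst_case
        self.pending.append(actual)
        return True


acct = DeltaAccounting(free_blocks=30, reserve=10)
# Each change really uses 2 blocks, but is charged 4 (old + new copies).
results = [acct.change(worst_case=4, actual=2) for _ in range(10)]
print("accepted:", results.count(True), "of", len(results))
print("changes per committed delta:", acct.delta_sizes)
print("free blocks remaining:", acct.free)
```

With these made-up numbers the committed deltas hold 5, 2, 1 and 1
changes: each successive delta is smaller, and a change is refused only
when it would not fit even in an empty delta.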

Regards,

Daniel