Tux3 report: New news for the new year
Daniel Phillips
lkml at phunq.net
Wed Jan 2 03:03:13 PST 2013
On Tuesday, January 01, 2013 10:58:35 PM Shentino wrote:
> From what I can tell on the design, tux3 is "fsync satiating" with a
> single disk write. It writes the data to the final location, updates
> the log, and at that point the data is considered committed and it can
> let userspace go on its merry way and take care of rolling up the
> changes later.
Yes, correct. I think we currently sync a small file create+write with seven
blocks and a file rewrite with four blocks, including the commit block and only
one long seek. We haven't benchmarked that yet, but it sounds fast. There are
two synchronous waits in the backend, but the frontend only waits on the
commit block completion in the task doing the sync while other concurrent
filesystem operations just keep going.
> If I understand btrfs correctly though it has to block
> until the cow logic percolates all the way up to the superblock.
A careful reading of the Btrfs design doc left me confused about that. Perhaps
Btrfs devs could clarify?
> One other thing that interests me is this "page forking" that allows
> userspace to write to a page that's already busy being written to
> disk. From what I heard it bypasses a stall caused by userspace I/O
> hitting a locked page.
Page forking is an amazing thing and should really head into core, after being
thoroughly proved out of course.
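For anyone who has not met the idea, here is a minimal Python sketch of page forking as described in the quoted paragraph. It is a toy model and not Tux3 code: the names Page, PageCache, start_writeback and write are hypothetical, invented only for illustration, and a real implementation has to handle locking, mappings and reference counts that are ignored here.

```python
class Page:
    """Toy stand-in for a page cache page (hypothetical, not a kernel structure)."""
    def __init__(self, data):
        self.data = bytearray(data)
        self.under_writeback = False


class PageCache:
    """Maps a page index to the page the frontend currently sees."""
    def __init__(self):
        self.pages = {}

    def start_writeback(self, index):
        # The backend takes this page for I/O; its contents must stay stable.
        page = self.pages[index]
        page.under_writeback = True
        return page

    def write(self, index, offset, payload):
        page = self.pages[index]
        if page.under_writeback:
            # Fork: instead of waiting for the I/O to finish, copy the page
            # and let the writer modify the copy. The old page is left
            # untouched for the in-flight write.
            page = Page(page.data)
            self.pages[index] = page
        page.data[offset:offset + len(payload)] = payload
        return page
```

The writer never blocks on the busy page. It gets a fresh copy, and the copy being written to disk keeps the contents it had when writeback started.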
> Finally, atime handling. I personally dislike the forced default of
> "relatime" for mount options and anything that can let atime updates
> happen without being a bottleneck is a plus for me.
Atime is an odious invention indeed from a developer's perspective, but it is
apparently well loved by some users and has real applications, knowing which
videos you watched recently apparently being one of them. We have a pretty
good plan for it that is actually just a small development item, the main
feature of which is avoiding polluting the inode table btree, which would
cause a lot of churn and aggravate allocate-on-write issues that are already
difficult, plus be horribly unfriendly to flash. Instead, we churn a dedicated
btree array (actually a regular file) where the write-on-reads are densely
concentrated. It somehow feels good to quarantine this craziness at least.
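As a rough illustration of keeping atime out of the inode table, here is a small Python sketch that stores one timestamp per inode number in a separate file-like object. Everything in it is an assumption made for the example: the flat fixed-width layout, the 64-bit record size, and the names set_atime and get_atime are hypothetical and say nothing about the eventual Tux3 on-disk format.

```python
import io
import struct

# Assumed layout: one little-endian 64-bit timestamp per inode number.
ATIME_FMT = "<Q"
ATIME_SIZE = struct.calcsize(ATIME_FMT)


def set_atime(f, inum, atime):
    """Record atime for inode inum in the side file f; the inode table is not touched."""
    f.seek(inum * ATIME_SIZE)
    f.write(struct.pack(ATIME_FMT, atime))


def get_atime(f, inum):
    """Return the stored atime for inum, or 0 if nothing was ever recorded."""
    f.seek(inum * ATIME_SIZE)
    raw = f.read(ATIME_SIZE)
    if len(raw) < ATIME_SIZE:
        return 0
    return struct.unpack(ATIME_FMT, raw)[0]


# Usage: an in-memory buffer stands in for the dedicated atime file.
atimes = io.BytesIO()
set_atime(atimes, 3, 1357124593)
```

All the write-on-read traffic lands in this one dense file, so neighbouring inodes share blocks and the inode table itself stays clean.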
Regards,
Daniel