<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN" "http://www.w3.org/TR/REC-html40/strict.dtd">
<html><head><meta name="qrichtext" content="1" /><style type="text/css">
p, li { white-space: pre-wrap; }
</style></head><body style=" font-family:'Ubuntu'; font-size:11pt; font-weight:400; font-style:normal;">
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">Two weeks ago, Hirofumi completed the kernel implementation of the last major design element needed for realistic performance measurements while operating with the atomicity and durability guarantees expected of a modern filesystem. I am happy to be able to say that early results indicate that Tux3 now shows performance competitive with other Linux filesystems and may possibly have taken the lead in some respects. Here are some details of this recent work.</p>
<p style="-qt-paragraph-type:empty; margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; "> </p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">Front/back separation in Tux3 decouples front end filesystem updates (Posix syscalls) from back end atomic delta transfer to media so that the user only observes cache transfer latency, not the overhead of preparing dirty cache for transfer to media, which is done in an asynchronous background task instead. This relatively small change improves performance significantly, bringing Tux3 near the performance of Tmpfs, a pure cache filesystem. In particular, this makes Tux3 somewhat faster than Ext4 and quite a lot faster than Btrfs for the particular benchmarks we have run so far.</p>
<p style="-qt-paragraph-type:empty; margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; "> </p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">In spirit, front/back separation resembles the "delayed allocation" employed by Ext4 to improve write performance significantly. However, Tux3 does not limit this to disk allocations. Instead, it delays every kind of filesystem change: create, delete and rename directory operations, inode attribute changes, truncates, and in short, anything that affects disk. And these operations are not just delayed, but hived off into an entirely separate task context where they do not affect foreground task latency.</p>
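<p style="-qt-paragraph-type:empty; margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; "> </p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">To make the idea concrete, here is a minimal user space sketch in Python. It is a toy model under stated assumptions, not Tux3 kernel code: the class DeltaFilesystem, its methods and the dictionary standing in for media are all hypothetical names invented for illustration. The front end only updates cache and records the operation, while a separate back end thread applies each delta to media as a unit.</p>
<pre>
```python
import queue
import threading


class DeltaFilesystem:
    """Toy model: the front end only touches cache, a back end thread commits deltas."""

    def __init__(self):
        self.cache = {}                # front end view: name to data
        self.media = {}                # hypothetical stand-in for the disk
        self.pending = []              # operations recorded for the current delta
        self.lock = threading.Lock()   # protects cache and pending
        self.deltas = queue.Queue()    # closed deltas handed to the back end
        self.committed = 0             # number of deltas applied to media
        self.backend = threading.Thread(target=self._backend_loop, daemon=True)
        self.backend.start()

    # Front end: update cache, record the operation, return immediately.
    def write(self, name, data):
        with self.lock:
            self.cache[name] = data
            self.pending.append(("write", name, data))

    def delete(self, name):
        with self.lock:
            self.cache.pop(name, None)
            self.pending.append(("delete", name, None))

    def rename(self, old, new):
        with self.lock:
            self.cache[new] = self.cache.pop(old)
            self.pending.append(("rename", old, new))

    def end_delta(self):
        """Close the current delta and hand it to the back end without waiting."""
        with self.lock:
            delta, self.pending = self.pending, []
        self.deltas.put(delta)

    def sync(self):
        """Demo helper: close the current delta and wait for the back end to drain."""
        self.end_delta()
        self.deltas.join()

    # Back end: apply each delta to media in the order the operations were issued.
    def _backend_loop(self):
        while True:
            delta = self.deltas.get()
            staged = dict(self.media)
            for op, a, b in delta:
                if op == "write":
                    staged[a] = b
                elif op == "delete":
                    staged.pop(a, None)
                elif op == "rename":
                    staged[b] = staged.pop(a)
            self.media = staged        # one swap stands in for an atomic commit
            self.committed += 1
            self.deltas.task_done()
```
</pre>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">In this sketch a caller of write, delete or rename never waits on media. Only the explicit sync helper blocks, and it exists purely so that a demonstration can inspect the result.</p>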
<p style="-qt-paragraph-type:empty; margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; "> </p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">Implementing front/back separation efficiently, without major stalls where the front end waits on the back end, was challenging, and credit for this great work all goes to Hirofumi, who has checked in a stunning amount of beautiful, reliable and highly performant code over the past few months. One of the problems we needed to address was this: what happens if a dirty data page is being written out as part of a "delta" atomic update and a user task wants to rewrite that same page? Should the user task wait until the page has been fully written to media, to avoid polluting the earlier delta with more recent changes? That would stall the front end transaction, perhaps for several milliseconds, which is not very nice. Instead, we "fork" the dirty page, creating a new copy in cache that the front end can modify without affecting an earlier delta, or worse, changing the page contents halfway through a DMA transfer. This is done with lightweight or no locking to keep front end stalls as few and brief as possible. Analysis with perf shows that our front end is indeed very well decoupled from the back end, and this shows up as excellent benchmark numbers.</p>
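<p style="-qt-paragraph-type:empty; margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; "> </p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">The forking rule itself can be sketched in a few lines of Python. Again this is only an illustration under assumptions: Page, PageCache, begin_writeout and finish_writeout are hypothetical names, the sketch is single threaded and ignores locking entirely, and it assumes every write fits inside one page. It shows only the decision to copy a page that still belongs to an earlier delta instead of waiting for that delta to reach media.</p>
<pre>
```python
PAGE_SIZE = 4096


class Page:
    def __init__(self, data):
        self.data = bytearray(data)    # private copy of the contents
        self.delta = None              # delta that dirtied this page, None when clean


class PageCache:
    def __init__(self):
        self.pages = {}                # index to the page the front end sees
        self.current_delta = 0         # delta the front end is currently dirtying
        self.in_flight = {}            # delta number to {index: page} being written

    def write(self, index, offset, data):
        page = self.pages.get(index)
        if page is None:
            page = Page(bytes(PAGE_SIZE))
            self.pages[index] = page
        elif page.delta is not None and page.delta != self.current_delta:
            # Dirty for an earlier delta that is still in flight: fork instead
            # of waiting. The old copy stays untouched for the back end.
            page = Page(page.data)
            self.pages[index] = page
        page.data[offset:offset + len(data)] = data
        page.delta = self.current_delta

    def begin_writeout(self):
        """Freeze the pages of the current delta and open the next delta."""
        delta = self.current_delta
        self.in_flight[delta] = {
            i: p for i, p in self.pages.items() if p.delta == delta
        }
        self.current_delta += 1
        return delta

    def finish_writeout(self, delta, media):
        """Copy the frozen pages to media, then mark unforked pages clean."""
        for index, page in self.in_flight.pop(delta).items():
            media[index] = bytes(page.data)
            if self.pages.get(index) is page:
                page.delta = None
```
</pre>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">If the front end rewrites a page between begin_writeout and finish_writeout, the earlier delta still lands on media with its original contents, and the newer contents go out with the following delta.</p>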
<p style="-qt-paragraph-type:empty; margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; "> </p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">Dirty page forking is a key technique for Tux3, and recent work by other kernel developers suggests that similar techniques are likely to become pervasive throughout the kernel. See the "stable page" work that has been going on for the last few months. But Tux3 uses its fork technique to improve performance, whereas so far the ongoing stable page work has slowed things down. If Tux3 had nothing else to offer, an effective implementation of forking would already be enough. But that is just one of several significant innovations in Tux3 that are likely to change the way Linux filesystems are designed and built. I will touch on some examples in upcoming posts.</p>
<p style="-qt-paragraph-type:empty; margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; "> </p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">With page forking, Tux3 implements stronger data consistency semantics than have so far been seen on Linux, even stronger than Ext4's "data=journal" mode, which has performance issues. To be clear about this point, when we compare Tux3 performance to Ext4 performance, Tux3 is actually doing more than Ext4 because we guarantee that files will always be committed to disk with their correct data and correct directory entries, no matter what kind of bad luck you may have with sudden interruptions, and no matter whether you remember to do fsync or fdatasync in all the right places. And we provide guarantees about the order in which updates arrive on disk that may avoid the need for performance-harming fsync operations in common situations. And we do not leave data sitting in dirty cache for an unpredictable period. Instead, we commit all dirty cache data to disk at each delta, and in a predictable order. You would think that all the extra care we take with data consistency would slow Tux3 down, but the opposite would now appear to be the case.</p>
<p style="-qt-paragraph-type:empty; margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; "> </p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">Our front/back separation is still not quite perfect. The front end still stalls on some back end locks from time to time. We have a pretty good idea of what to do about it, and it is on the list of things to do. But at this point, it works well enough that other work has become higher priority.</p>
<p style="-qt-paragraph-type:empty; margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; "> </p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">The way Tux3 transfers log blocks to disk was also improved, gaining a little more performance and widening what appears to be a performance lead for Tux3 in the cases we exercise with the Fsx and Fsstress filesystem stress scripts.</p>
<p style="-qt-paragraph-type:empty; margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; "> </p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">There are cases where Tux3 is still not the fastest Linux filesystem. For example, our disk layout algorithm is too simplistic to avoid read fragmentation under many common loads. And there is scalability work to do, particularly in the areas of directory indexing and free resource management. However, we are not aware of any cases where Tux3 is limited by its design to being less than competitive as a general purpose filesystem on Linux.</p>
<p style="-qt-paragraph-type:empty; margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; "> </p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">Now here is the bad news: for today I will not post any actual numbers, because some work still remains to prepare these properly. Coming soon.</p>
<p style="-qt-paragraph-type:empty; margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; "> </p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">Regards,</p>
<p style="-qt-paragraph-type:empty; margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; "> </p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">Daniel</p>
<p style="-qt-paragraph-type:empty; margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; "> </p></body></html>