<div dir="ltr"><div>(resent as plain text)<br><br>On Sat, May 11, 2013 at 2:26 PM, Theodore Ts'o <<a href="mailto:tytso@mit.edu">tytso@mit.edu</a>> wrote:<br>> Dropping fsync() does a lot more than "amplify Tux3's advantage in<br>

> delete performance".  Since fsync(2) is defined as not returning until<br>> the data written to the file descriptor is flushed out to stable<br>> storage --- so it is guaranteed to be seen after a system crash --- it<br>

> means that the foreground application must not continue until the data<br>> is written by Tux3's back-end.<br>><br>> So it also means that any advantage of decoupling the front/back end<br>> is nullified, since fsync(2) requires a temporal coupling.  In fact,<br>

> if there are any delays introduced between when the front-end sends the<br>> fsync request, and when the back-end finishes writing the data and<br>> then communicates this back to the front-end --- i.e., caused by<br>

> scheduler latencies, this may end up being a disadvantage compared to<br>> more traditional file system designs.<br>><br>> Like many things in file system design, there are tradeoffs.  It's<br>> perhaps more useful when having these discussions to be clear what<br>

> you are trading off for what; in this case, the front/back design may<br>> be good for some things, and less good for others, such as mail server<br>> workloads where fsync(2) semantics is extremely important for<br>

> application correctness.<br><br>Exactly, Ted. We avoided measuring the fsync load on this particular benchmark because we have not yet optimized fsync. When we do get to it (not an immediate priority), I expect Tux3 to perform competitively, because our delta commit scheme manages the job with a minimal number of block writes. To have a really efficient fsync, we need to isolate just the changes for the fsynced file into a special "half delta" that gets its own commit, ahead of any other pending changes to the filesystem. There is a plan for this, but we would rather not get sidetracked on it now while we are getting ready for merge.<br>

<br></div>The point that seems to be getting a little lost in this thread is that the benchmark, just as we ran it, models an important and common type of workload, arguably the most common workload for real users, and the resulting performance measurement is easily reproducible by anyone who cares to try. In fact, I think we should prepare and post a detailed recipe for doing just that, since the interest level seems to be high.<br>

<div><br>Regards,<br><br>Daniel<br><br><br><br></div></div>