xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)

David Lang david at lang.hm
Tue May 12 14:30:28 PDT 2015

On Tue, 12 May 2015, Daniel Phillips wrote:

> On 05/12/2015 11:39 AM, David Lang wrote:
>> On Mon, 11 May 2015, Daniel Phillips wrote:
>>>> ...it's the mm and core kernel developers that need to
>>>> review and accept that code *before* we can consider merging tux3.
>>> Please do not say "we" when you know that I am just as much a "we"
>>> as you are. Merging Tux3 is not your decision. The people whose
>>> decision it actually is are perfectly capable of recognizing your
>>> agenda for what it is.
>>>   http://www.phoronix.com/scan.php?page=news_item&px=MTA0NzM
>>>   "XFS Developer Takes Shots At Btrfs, EXT4"
>> umm, Phoronix has no input on what gets merged into the kernel. they also have a reputation for
>> trying to turn anything into click-bait by making it sound like a fight when it isn't.
> Perhaps you misunderstood. Linus decides what gets merged. Andrew
> decides. Greg decides. Dave Chinner does not decide, he just does
> his level best to create the impression that our project is unfit
> to merge. Any chance there might be an agenda?
> Phoronix published a headline that identifies Dave Chinner as
> someone who takes shots at other projects. Seems pretty much on
> the money to me, and it ought to be obvious why he does it.

Phoronix turns any correction or criticism into an attack.

You need to get out of the mindset that Ted and Dave are Enemies that you need 
to overcome. They are friendly competitors, not Enemies. They assume that you 
are working in good faith (but are inexperienced compared to them), and you need 
to assume that they are working in good faith. If they ever do resort to 
underhanded means to sabotage you, Linus and the other kernel developers will 
take action. But pointing out limits in your current implementation, problems in 
your benchmarks based on how they are run, and concepts that are going to be 
difficult to merge is not underhanded; it's exactly the type of assistance that 
you should be grateful for in friendly competition.

You were the one who started crowing about how badly XFS performed. Dave gave a 
long and detailed explanation of the reasons for the differences, and showed 
benchmarks on other hardware where XFS works very well. That's 
not an attack on EXT4 (or Tux3), it's an explanation.

>>> The real question is, has the Linux development process become
>>> so political and toxic that worthwhile projects fail to benefit
>>> from supposed grassroots community support. You are the poster
>>> child for that.
>> The linux development process is making code available, responding to concerns from the experts in
>> the community, and letting the code talk for itself.
> Nice idea, but it isn't working. Did you let the code talk to you?
> Right, you let the code talk to Dave Chinner, then you listen to
> what Dave Chinner has to say about it. Any chance that there might
> be some creative licence acting somewhere in that chain?

I have my own concerns about how things are going to work (I've voiced some of 
them), but no, I haven't tried running Tux3 because you say it's not ready yet.

>> There have been many people pushing code for inclusion that has not gotten into the kernel, or has
>> not been used by any distros after it's made it into the kernel, in spite of benchmarks being posted
>> that seem to show how wonderful the new code is. ReiserFS was one of the first, and part of what
>> tarnished its reputation with many people was how much they were pushing the benchmarks that were
>> shown to be faulty (the one I remember most vividly was that the entire benchmark completed in <30
>> seconds, and they had the FS tuned to not start flushing data to disk for 30 seconds, so the entire
>> 'benchmark' ran out of ram without ever touching the disk)
> You know what to do about checking for faulty benchmarks.

That requires that the code be readily available, which last I heard, Tux3 
wasn't. Has this been fixed?
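
For anyone who wants to check a benchmark for the problem described above (a 
run that finishes before the filesystem starts flushing, so it never touches 
the disk), here is a minimal sketch, not taken from any of the benchmarks 
discussed in this thread. It times the same write with and without fsync. The 
function names, file names and sizes are made up for illustration, and on a 
ramdisk or tmpfs the fsync costs next to nothing, so the two numbers will look 
alike there.

```python
import os
import tempfile
import time


def timed_write(path, total_bytes, chunk_bytes=1 << 20, sync=True):
    """Write total_bytes of zeros to path and return elapsed seconds.

    With sync=True the timing includes os.fsync(), so the kernel has been
    asked to push the data to the storage device before the clock stops.
    With sync=False the writes may be satisfied entirely from the page cache.
    """
    chunk = b"\0" * chunk_bytes
    written = 0
    start = time.perf_counter()
    with open(path, "wb") as f:
        while written < total_bytes:
            n = min(chunk_bytes, total_bytes - written)
            f.write(chunk[:n])
            written += n
        f.flush()                 # push Python's own buffer to the kernel
        if sync:
            os.fsync(f.fileno())  # ask the kernel to push it to the device
    return time.perf_counter() - start


def compare(total_bytes=8 << 20, directory=None):
    """Return (cached_seconds, synced_seconds) for the same amount of data."""
    with tempfile.TemporaryDirectory(dir=directory) as d:
        cached = timed_write(os.path.join(d, "cached.bin"), total_bytes, sync=False)
        synced = timed_write(os.path.join(d, "synced.bin"), total_bytes, sync=True)
    return cached, synced


if __name__ == "__main__":
    cached, synced = compare()
    print(f"without fsync: {cached:.4f}s  with fsync: {synced:.4f}s")
```

If the two times are close on real storage, or the whole run is shorter than 
the flush delay the filesystem is tuned for, the benchmark is probably 
measuring RAM rather than the disk.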

>> So when Ted and Dave point out problems with the benchmark (the difference in behavior between a
>> single spinning disk, different partitions on the same disk, SSDs, and ramdisks), you would be
>> better off acknowledging them and if you can't adjust and re-run the benchmarks, don't start
>> attacking them as a result.
> Ted and Dave failed to point out any actual problem with any
> benchmark. They invented issues with benchmarks and promoted those
> as FUD.

They pointed out problems with using a ramdisk to simulate an SSD, and the huge 
differences between spinning rust and an SSD (or disk array). Those aren't FUD.

>> As Dave says above, it's not the other filesystem people you have to convince, it's the core VFS and
>> Memory Management folks you have to convince. You may need a little benchmarking to show that there
>> is a real advantage to be gained, but the real discussion is going to be on the impact that page
>> forking is going to have on everything else (both in complexity and in performance impact to other
>> things)
> Yet he clearly wrote "we" as if he believes he is part of it.

He is part of the group of people who use and work with this stuff, so he is 
part of it.

> Now that ENOSPC is done to a standard way beyond what Btrfs had
> when it was merged, the next item on the agenda is writeback. That
> involves us and VFS people as you say, and not Dave Chinner, who
> only intends to obstruct the process as much as he possibly can. He
> should get back to work on his own project. Nobody will miss his
> posts if he doesn't make them. They contribute nothing of value,
> create a lot of bad blood, and just serve to further besmirch the
> famously tarnished reputation of LKML.

BTRFS is a perfect example of how not to introduce a new filesystem. Lots of 
hype, the presumption that it is going to replace all the existing filesystems 
because it's so much better (especially according to benchmarks). But then 
progress stalled before it was really ready, and it's still something most 
people avoid.

>>> You know that Tux3 is already fast. Not just that of course. It
>>> has a higher standard of data integrity than your metadata-only
>>> journalling filesystem and a small enough code base that it can
>>> be reasonably expected to reach the quality expected of an
>>> enterprise class filesystem, quite possibly before XFS gets
>>> there.
>> We wouldn't expect anyone developing a new filesystem to believe any differently.
> It is not a matter of belief, it is a matter of testable fact. For
> example, you can count the lines. You can run the same benchmarks.
> Proving the data consistency claims would be a little harder, you
> need tools for that, and some of those aren't built yet. Or, if you
> have technical ability, you can read the code and the copious design
> material that has been posted and convince yourself that, yes, there
> is something cool here, why didn't anybody do it that way before?
> But of course that starts to sound like work. Debating nontechnical
> issues and playing politics seems so much more like fun.

Why are you picking a fight? There was no attack in my statement.

>> If they didn't
>> believe this, why would they be working on the filesystem instead of just using an existing filesystem.
> Right, and it is my job to convince you that what I believe for
> perfectly valid, demonstrable technical reasons, is really true. I do
> not see why you feel it is your job to convince me that the obviously
> broken Linux community process is not in fact broken, and that a
> certain person who obviously has an agenda, is not actually obstructing.

You will need to have a fully working, usable system before you can convince 
people that you are right. A partial system may look good, but how much is 
fixing the corner cases that you haven't gotten to yet going to hurt it? That 
there are going to be such cases is pretty much a given, and that changing 
things to add code to work around the pathological conditions is going to hurt 
the common case is pretty close to a given (it's one of those things that isn't 
mathematically guaranteed, but happens on 99.99999+% of projects).

>> The ugly reality is that everyone's early versions of their new filesystem looks really good. The
>> problem is when they extend it to cover the corner cases and when it gets stressed by real-world (as
>> opposed to benchmark) workloads. This isn't saying that you are wrong in your belief, just that you
>> may not be right, and nobody will know until you are to a usable state and other people can start
>> beating on it.
> With ENOSPC we are at that state. Tux3 would get more testing and advance
> faster if it was merged. Things like ifdefs, grandiose new schemes for
> writeback infrastructure, dumb little hooks in the mkwrite path, those
> are all just manufactured red herrings. Somebody wanted those to be
> issues, so now they are issues. Fake ones.

Ok, so you are happy with your allocation strategy? You didn't seem to be a few 
e-mails ago.

But if you think it's ready for users, then start working to submit it in the 
next merge window. Dave said that except for one part, there was no reason not 
to merge it. That's pretty good. So you need to be discussing that one part with 
the folks that Dave pointed you at.

> Nobody is trying to trick you. Just stating a fact. You ought to be able
> to figure out by now that Tux3 is worth merging.
> You might possibly have an argument that merging a filesystem that
> crashes as soon as it fills the disk is just sheer stupidity that can
> only lead to embarrassment in the long run, but then you would need to
> explain why Btrfs was merged. As I recall, it went something like, Chris
> had it on a laptop, so it must be a filesystem, and wow look at that
> feature list. Then it got merged in a completely unusable state and got
> worked on. If it had not been merged, Btrfs would most likely be dead
> right now. After all, who cares about an out of tree filesystem?

As I said above, Btrfs is a perfect example of how not to do things.

The other thing you need to realize is that getting something in the kernel 
isn't a one-time effort. The code needs to be maintained over time (especially 
for a filesystem), and it's very possible for a developer/team/company to be so 
toxic and hostile to others that the Linux folks don't want to deal with the 
hassle of dealing with them. You are starting out on a path to put yourself into 
that category. Calm down and stop taking offense at everything. Your succeeding 
doesn't require that other people lose, so stop talking as if it's a zero sum 
game and you have to beat down the enemy to get your code accepted.

David Lang

> By the way, I gave my Tux3 presentation at SCALE 7x in Los Angeles in
> 2009, with Tux3 running as my root filesystem. By the standard applied
> to Btrfs, Tux3 should have been merged then, right? After all, our
> nospace handling worked just as well as theirs at that time.
> Regards,
> Daniel
