xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)

Theodore Ts'o tytso at mit.edu
Tue May 12 16:55:00 PDT 2015


On Tue, May 12, 2015 at 03:35:43PM -0700, David Lang wrote:
> 
> I happen to think that it's correct. It's not that Ext4 isn't tested, but
> that people's expectations of how much it's been tested, and at what scale
> don't match the reality.

Ext4 is used at Google, on a very large number of disks.  Exactly how
large is not something I'm allowed to say, but there's a very amusing
Ted Talk by Randall Munroe (of xkcd fame) on that topic:

http://tedsummaries.com/2014/05/14/randall-munroe-comics-that-ask-what-if/

One thing I can say is that shortly after we deployed ext4 at Google,
thanks to having a very large number of disks, and because we have
very good system monitoring, we detected a file system corruption
problem that happened with a very low probability, but we had enough
disks that we could detect the pattern.  (Fortunately, because
Google's cluster file system has replication and/or erasure coding, no
user data was lost.)  Even though we could notice the problem, it took
us several months to track down the problem.

When we finally did, it turned out to be a race condition which only
took place under high memory pressure.  What was *very* amusing was
after fixing the problem for ext4, I looked at ext3, and discovered
that (a) the ext4 had inerited the bug was also in ext3, and (b) the
bug in ext3 had not been noticed in several enterprise distribution
testing runs done by Red Hat, SuSE, and IBM --- for well over a
**decade**.

What this means is that it's hard for *any* file system to be that
well tested; it's hard to substitute for years and years of production
use, hopefully in systems that have very rigorous monitoring so you
would notice if data or file system metadata is getting corrupted in
ways that can't be explained as hardware errors.  The fact that we
found a bug that was never discovered in ext3 after years and years of
use in many enterprises is a testimony to that fact.

(This is also why the fact that Facebook has started using btrfs in
production is going to be a very good thing for btrfs.  I'm sure they
will find all sorts of problems once they start running at large
scale, which is a _good_ thing; that's how those problems get fixed.)

Of course, using xfstests certainly helps a lot, and so in my opinion
all serious file system developers should be regularly using xfstests
as a part of the daily development cycle, and to be be extremely
ruthless about not allowing any test regressions.

Best regards,

					- Ted



More information about the Tux3 mailing list