xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)

Martin Steigerwald martin at lichtvoll.de
Thu Apr 30 02:00:05 PDT 2015


On Thursday, 30 April 2015, 10:20:08, Dave Chinner wrote:
> On Wed, Apr 29, 2015 at 09:05:26PM +0200, Mike Galbraith wrote:
> > Here's something that _might_ interest xfs folks.
> > 
> > cd git (source repository of git itself)
> > make clean
> > echo 3 > /proc/sys/vm/drop_caches
> > time make -j8 test
> > 
> > ext4    2m20.721s
> > xfs     6m41.887s <-- ick
> > btrfs   1m32.038s
> > tux3    1m30.262s
> > 
> > Testing by Aunt Tilly: mkfs, no fancy switches, mount the thing, test.
> 
> TL;DR: Results are *very different* on a 256GB Samsung 840 EVO SSD
> with slightly slower CPUs (E5-4620 @ 2.20GHz), all filesystems
> using defaults:
> 
> 	real		user		sys
> xfs	3m16.138s	7m8.341s	14m32.462s
> ext4	3m18.045s	7m7.840s	14m32.994s
> btrfs	3m45.149s	7m10.184s	16m30.498s
> 
> What you are seeing is physical seek distances impacting read
> performance.  XFS does not optimise for minimal physical seek
> distance, and hence is slower than filesystems that do optimise for
> minimal seek distance. This shows up especially well on slow single
> spindles.
> 
> XFS is *adequate* for use on slow single drives, but it is
> really designed for best performance on storage hardware that is not
> seek distance sensitive.
> 
> IOWs, XFS just hates your disk. Spend $50 and buy a cheap SSD and
> the problem goes away. :)


I am quite surprised that a traditional filesystem that was created in the 
age of rotating media does not like this kind of media, and that it even 
seems to outperform BTRFS on the new non-rotating media.

But…

> ----
> 
> And now in more detail.
> 
> It's easy to be fast on empty filesystems. XFS does not aim to be
> fast in such situations - it aims to have consistent performance
> across the life of the filesystem.

… this is quite an important addition.

> Thing is, once you've abused those filesystems for a couple of
> months, the files in ext4, btrfs and tux3 are not going to be laid
> out perfectly on the outer edge of the disk. They'll be spread all
> over the place and so all the filesystems will be seeing large seeks
> on read. The thing is, XFS will have roughly the same performance as
> when the filesystem is empty because the spreading of the allocation
> allows it to maintain better locality and separation and hence
> doesn't fragment free space nearly as badly as the other filesystems.
> Free space fragmentation is what leads to performance degradation in
> filesystems, and all the other filesystem will have degraded to be
> *much worse* than XFS.
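
For what it is worth, free space fragmentation can be inspected directly. 
A minimal sketch (the device paths are placeholders; xfs_db is run 
read-only, and e2freefrag ships with e2fsprogs):

# XFS: summary histogram of free space extent sizes, read-only
xfs_db -r -c "freesp -s" /dev/sdX1

# ext4: free space fragmentation report
e2freefrag /dev/sdY1

A long tail of tiny free extents in these reports is exactly the 
degradation Dave describes.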

I still see hangs from what I take to be free space fragmentation in 
BTRFS. My /home on a dual (!) SSD BTRFS setup can basically grind to a 
halt once it has reserved all space of the device for chunks. So this
merkaba:~> btrfs fi sh /home
Label: 'home'  uuid: […]
        Total devices 2 FS bytes used 129.48GiB
        devid    1 size 170.00GiB used 146.03GiB path /dev/mapper/msata-home
        devid    2 size 170.00GiB used 146.03GiB path /dev/mapper/sata-home

Btrfs v3.18
merkaba:~> btrfs fi df /home
Data, RAID1: total=142.00GiB, used=126.72GiB
System, RAID1: total=32.00MiB, used=48.00KiB
Metadata, RAID1: total=4.00GiB, used=2.76GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

is safe, but once I have size 170 GiB used 170 GiB, even if inside the 
chunks there is enough free space to allocate from (enough as in 30-40 
GiB), it can happen that writes stall to the point that applications on 
the desktop freeze and I see hung task messages in the kernel log.

This is the case up to kernel 4.0. I have seen Chris Mason fix some write 
stalls for big Facebook setups; maybe that will help here. But until this 
issue is fixed, I think BTRFS is not yet fully production ready, unless 
you leave a *huge* amount of free space, as in: for 200 GiB of data you 
want to write, make a 400 GiB volume.
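
What tends to get me out of that state, as a sketch rather than a 
guaranteed fix: rebalancing the data chunks so that nearly empty ones are 
freed back to the unallocated pool (the usage threshold is just an 
example value):

# rewrite data chunks that are less than 10% used,
# returning their space to the unallocated pool
btrfs balance start -dusage=10 /home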

> Put simply: empty filesystem benchmarking does not show the real
> performance of the filesystem under sustained production workloads.
> Hence benchmarks like this - while interesting from a theoretical
> point of view and are widely used for bragging about who's got the
> fastest - are mostly irrelevant to determining how the filesystem
> will perform in production environments.
> 
> We can also look at this algorithm in a different way: take a large
> filesystem (say a few hundred TB) across a few tens of disks in a
> linear concat.  ext4, btrfs and tux3 will only hit the first disk in
> the concat, and so go no faster because they are still bound by
> physical seek times.  XFS, however, will spread the load across many
> (if not all) of the disks, and so effectively reduce the average
> seek time by the number of disks doing concurrent IO. Then you'll
> see that application level IO concurrency becomes the performance
> limitation, not the physical seek time of the hardware.

Those are the allocation groups. I always wondered how it can be 
beneficial to spread the allocations across 4 areas of one partition on 
expensive-seek media. Now that makes more sense to me. I always had the 
gut feeling that XFS may not be the fastest in all cases, but that it is 
one of the filesystems with the most consistent performance over time; I 
just was never able to fully explain why that is.
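
For reference, the allocation group count can be inspected on an existing 
filesystem and chosen at mkfs time. A sketch (the device path and count 
are placeholders; the defaults are usually fine on a single disk):

# show agcount and the rest of the geometry
xfs_info /home

# create a filesystem with 32 allocation groups,
# e.g. for a many-spindle linear concat
mkfs.xfs -d agcount=32 /dev/sdX1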

Thanks,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7


