[PATCH] Optimize wait_sb_inodes()
hirofumi at mail.parknet.co.jp
Wed Jun 26 22:18:17 PDT 2013
Dave Chinner <david at fromorbit.com> writes:
>> Optimizing wait_sb_inodes() might help lock contention, but it doesn't
>> help unnecessary wait/check.
> You have your own wait code, that doesn't make what the VFS does
> unnecessary. Quite frankly, I don't trust individual filesystems to
> get it right - there's a long history of filesystem specific data
> sync problems (including in XFS), and the best way to avoid that is
> to ensure the VFS gets it right for you.
> Indeed, we've gone from having sooper special secret sauce data sync
> code in XFS to using the VFS over the past few years, and the result
> is that it is now more reliable and faster than when we were trying
> to be smart and do it all ourselves. We got to where we are by
> fixing the problems in the VFS rather than continuing to try to work
> around them.
I guess you are assuming an FS that uses data=writeback or similar.
>> Some FSes already know about their current in-flight I/O internally,
>> so I think those FSes can do it here, or are already doing it in
>> ->sync_fs().
> Sure, do your internal checks in ->sync_fs(), but if
> wait_sb_inodes() does not have any lock contention and very little
> overhead, then why do you need to avoid it? This wait has to be done
> somewhere between sync_inodes_sb() dispatching all the IO and
> ->sync_fs completing, so what's the problem with having the VFS do
> that *for everyone* efficiently?
Are you saying the vfs should track all in-flight I/O with some sort of
Otherwise, the vfs can't know whether the data is from before or after
the sync point, and so whether it has to wait or not. An FS with
data=journal-like behavior already tracks this, and can reuse it.
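
A minimal userspace sketch of the kind of tracking described above,
assuming each I/O is tagged with an epoch that is bumped at every sync
point. All names here (toy_fs, toy_io, toy_submit and so on) are
hypothetical; this is not Tux3 or VFS code, only an illustration of how
a sync could skip I/O submitted after its sync point.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_IO 16

/* Hypothetical record for one submitted I/O (not a kernel structure). */
struct toy_io {
	unsigned long epoch;	/* epoch in which the I/O was submitted */
	bool in_flight;		/* true until the I/O completes */
};

/* Hypothetical per-filesystem state. */
struct toy_fs {
	unsigned long epoch;	/* current epoch, bumped at each sync point */
	struct toy_io io[MAX_IO];
	size_t nr_io;
};

/* Submit an I/O, tagging it with the current epoch. Returns its index. */
static size_t toy_submit(struct toy_fs *fs)
{
	assert(fs->nr_io < MAX_IO);
	fs->io[fs->nr_io].epoch = fs->epoch;
	fs->io[fs->nr_io].in_flight = true;
	return fs->nr_io++;
}

/* Mark one I/O as completed. */
static void toy_complete(struct toy_fs *fs, size_t idx)
{
	assert(idx < fs->nr_io);
	fs->io[idx].in_flight = false;
}

/*
 * Mark a sync point: return the epoch this sync has to wait for,
 * and start a new epoch for I/O submitted from now on.
 */
static unsigned long toy_sync_point(struct toy_fs *fs)
{
	return fs->epoch++;
}

/* Count the I/Os that a sync for sync_epoch still has to wait on. */
static size_t toy_pending_for_sync(const struct toy_fs *fs,
				   unsigned long sync_epoch)
{
	size_t n = 0;

	for (size_t i = 0; i < fs->nr_io; i++)
		if (fs->io[i].in_flight && fs->io[i].epoch <= sync_epoch)
			n++;
	return n;
}
```

With this, I/O submitted after toy_sync_point() belongs to a later
epoch and is not counted by toy_pending_for_sync(), so the sync does
not have to wait for it.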
> Fix the root cause of the problem - the sub-optimal VFS code.
> Hacking around it specifically for out-of-tree code is not the way
> things get done around here...
I think the root cause is that the vfs can't have knowledge of FS
internals, e.g. whether the FS handles data in a transactional way or not.
OGAWA Hirofumi <hirofumi at mail.parknet.co.jp>