[Tux3] Design note: Data flush and rename ordering
Daniel Phillips
phillips at phunq.net
Fri Mar 27 18:59:40 PDT 2009
On Friday 27 March 2009, OGAWA Hirofumi wrote:
> Daniel Phillips <phillips at phunq.net> writes:
>
> > The problem is, Ext4 holds recently written file data in cache even
> > across an atomic (journalled) update of directory metadata. While a
> > strict reading of Posix permits this, application writers do not expect
> > it and I think we want to define stronger semantics for Tux3. That is,
> > we should guarantee that a rename will never be committed to disk
> > before the source file of the rename is flushed.
> >
> > Our initial implementation of atomic commit will always flush every
> > dirty inode to disk at each delta transition, which provides the above
> > guarantee by default. That is, a rename will always be committed in or
> > after the delta that flushes its source inode.
>
> A bit related to this (fsync()).
>
> http://lkml.org/lkml/2009/3/26/72
>
> And personally, this is one of the behaviors of ext3 which I hate very much.
> [the process waits on transactions unrelated to its own job]
I often have horrible system response while doing simple things like
cp -a or scp on a local network, and this has been going on for years.
Just a single CPU workstation, nothing unusual. I often wonder why
there have not been more complaints.
I have not done experiments to see whether the cause is the filesystem,
block IO, process scheduling or memory management. It must be one of those.
From the general comments it sounds like the filesystem, Ext3, may at
least contribute to the problem. It would be easy enough to try similar
loads on Ext2.
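
To make that comparison concrete, something like the little test below
is what I have in mind: one process streams writes into a big file while
another times fsync() of a tiny, unrelated file on the same filesystem.
The file names, sizes and loop counts are arbitrary; this is just a
sketch of the measurement, nothing we have settled on.

/* Sketch: time fsync() of a small file while an unrelated process
 * streams writes to a big file on the same filesystem. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/time.h>

static double now(void)
{
        struct timeval tv;
        gettimeofday(&tv, NULL);
        return tv.tv_sec + tv.tv_usec / 1e6;
}

int main(void)
{
        char buf[4096];
        int i, small;

        memset(buf, 0, sizeof buf);

        if (fork() == 0) {
                /* Unrelated streaming writer: dirties lots of pages. */
                int big = open("bigfile", O_CREAT | O_WRONLY | O_TRUNC, 0644);
                for (i = 0; i < 256 * 1024; i++)        /* about 1GB */
                        write(big, buf, sizeof buf);
                close(big);
                exit(0);
        }

        /* Victim: small append plus fsync, timed. */
        small = open("smallfile", O_CREAT | O_WRONLY | O_APPEND, 0644);
        for (i = 0; i < 20; i++) {
                double t0 = now();
                write(small, buf, 100);
                fsync(small);
                printf("fsync %2d: %.3f seconds\n", i, now() - t0);
                sleep(1);
        }
        close(small);
        return 0;
}

Run it once on Ext3 and once on Ext2 with the same kernel and compare
the fsync times under the background load.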
If Ext3 is the problem, then this is highly topical for us. We hope
to implement a largely non-blocking commit model for Tux3 and have
discussed the strategy for doing this. I expect we will run into some
surprises when we actually start running with atomic commit, and shortly
after, start working on the layered cache model. Also, our initial
fsync will just be a sync, roughly speaking, which is sure to cause
latency problems, so we will need a finer-grained model there. And
then, with those things working fairly well, we may uncover other
bottlenecks that are shared by all Linux filesystems. I don't know; I
can only speculate at this point, but soon we will start to be able to
tackle those issues. It promises to be interesting.
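
To illustrate the rename ordering rule quoted at the top, the first-cut
atomic commit behaves roughly like the toy model below: flush every
dirty inode belonging to the delta, then commit the delta's metadata,
renames included. The structures and names here are invented for
illustration and are not actual Tux3 code.

/* Toy model of the ordering rule, not Tux3 code: each delta flushes
 * every dirty inode before its metadata updates (renames included) are
 * committed, so a rename can never reach disk ahead of the data of its
 * source file. */
#include <stdio.h>

struct inode { int ino; int dirty; };

#define NR_INODES 3

static struct inode inodes[NR_INODES] = {
        { 1, 1 }, { 2, 0 }, { 3, 1 },
};

/* Pretend a rename is queued in the current delta; inode 3 is its source. */
static int rename_source = 3;

static void flush_inode(struct inode *inode)
{
        printf("flush data of inode %d\n", inode->ino);
        inode->dirty = 0;
}

static void commit_delta(void)
{
        int i;

        /* Stage 1: write out all dirty file data for this delta. */
        for (i = 0; i < NR_INODES; i++)
                if (inodes[i].dirty)
                        flush_inode(&inodes[i]);

        /* Stage 2: only now commit the delta's metadata, including the
         * rename.  Its source inode was flushed in stage 1, so the
         * ordering guarantee holds by construction. */
        printf("commit rename whose source is inode %d\n", rename_source);
        printf("delta committed\n");
}

int main(void)
{
        commit_delta();
        return 0;
}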
For now, I am just going to read the thread and keep working towards a
basic atomic commit.
Regards,
Daniel