[Tux3] Cool features

Fri Jan 9 03:41:18 PST 2009

On 1/8/09, Daniel Phillips <phillips at phunq.net> wrote:
>
> On Wednesday 07 January 2009 07:21, OGAWA Hirofumi wrote:
> > Daniel Phillips <phillips at phunq.net> writes:
> >
> > >> Filesystem freeze:
> > >> Get a utility that flushes the cache and returns something when it's
> > >> done, then freezes IO on disk and throttles/stacks it in a memory
> > >> buffer until the buffer is full. When it's full, it returns something
> > >> again and resumes normal operation, or freezes IO until we ask it to
> > >> resume. This is in order to take clean snapshots when the backend
> > >> supports versioning. Even if it's not necessary due to the tux3
> > >> design, it would be nice to be able to do it in order to ensure that
> > >> some IOs are committed to disk, then get some time to do something to
> > >> the disk backend, with no impact on the filesystem side.
> > >
> > > I think all you want there is the ability to treat a snapshot as a
> > > barrier: user asks for a snapshot, Tux3 starts a new delta and sets a
> > > flag on it; when that snapshot has committed, the snapshot request
> > > is acknowledged.  That way, the user gets a snapshot of what has been
> > > sent to the filesystem most recently, without needing to stall the
> > > filesystem throughput.
> > >
> > > Tux3 does not need a new memory buffer for this, the needed mechanism
> > > is just what has already been designed.
> >
> > FWIW, if it is needed, we just implement ->write_super_lockfs and
> > unlockfs.  And I guess it shouldn't be hard.
>
> Yes, we will support lock/unlockfs.  I think he was asking for something
> a little different.  IO is supposed to continue to memory while block IO
> is prevented, to take a clean snapshot.  We can do that better.  I think
> what he really wants is for the snapshot to include the most recent
> changes, and not stall the filesystem.  lockfs causes a big stall and
> danger of system lockup too.

Yes, exactly!

> What we can provide instead is a command to set a new delta
> that carries a new volume version.  All changes after that delta belong
> to the next snapshot.  When the delta that carries the snapshot completes
> to disk, the snapshot setting command returns.  This way, the user gets
> exactly what they expect: whatever they wrote before the snapshot ends
> up on disk.  And the only thing that stalls is the snapshot command: if
> there is parallel IO on the filesystem, it will continue and belong to
> the next snapshot.

Very cool, my explanation was apparently clear enough, or you can access my
brain's filesystem :)
(If the second hypothesis is the right one, please share the access with my
wife)
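
As a rough illustration of the barrier-style snapshot described above, here
is a toy Python model. It is only a sketch under stated assumptions, not
Tux3 code: ToyVolume and all of its methods and fields are hypothetical
names, and the "disk" is a dictionary. It shows writes joining the open
delta, a snapshot request closing that delta and bumping the volume version,
and only the snapshot caller waiting for the commit.

```python
import threading


class ToyVolume:
    """Toy model of delta-based snapshots (hypothetical, not Tux3 code)."""

    def __init__(self):
        self.lock = threading.Lock()
        self.version = 0   # current volume version
        self.delta = []    # writes accumulated in the open delta
        self.on_disk = {}  # version -> writes committed for that version

    def write(self, data):
        # Writers never wait for a snapshot; they just join the open delta.
        with self.lock:
            self.delta.append(data)

    def snapshot(self):
        # Close the open delta: it carries the snapshot. Writes arriving
        # after this point land in a fresh delta for the next version.
        with self.lock:
            closed, closed_version = self.delta, self.version
            self.delta = []
            self.version += 1
        # Only the snapshot caller waits for the "disk" commit.
        self._commit(closed_version, closed)
        return closed_version

    def _commit(self, version, writes):
        # Stand-in for flushing the closed delta to disk.
        with self.lock:
            self.on_disk.setdefault(version, []).extend(writes)
```

For example, after vol.write("a"), vol.write("b"), vol.snapshot() and then
vol.write("c"), the snapshot holds "a" and "b", while "c" sits in the open
delta that belongs to the next version.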

> Contrary to popular belief, lockfs does not make snapshots "clean".
> Parallel application IO can be interrupted at any point by lockfs.
> There is nothing clean about that.  The only way to make a clean
> snapshot with respect to application IO is for applications to be aware
> of the snapshot interface and prepare themselves accordingly.  Because
> there is no standard interface, no applications do that.  So lockfs does
> not produce any cleaner results than just taking a snapshot at some
> random time.  What it does do, is flush dirty cache to disk before the
> snapshot.  Another way to do that is sync(1), without danger of system
> lockup.  So I don't know what lockfs is actually supposed to accomplish.

Yes, but if I'm not mistaken, sync does not hold IOs to disk until you
decide to release them, so if you don't have a filesystem freeze feature to
hold them, bad things can still happen if you're hammering block devices
with requests (the most common case, in fact). If you host databases on the
filesystem, it's very important to have that feature to be able to
synchronize the snapshot with the application layer, and getting a
consistent database in snapshots is crucial; otherwise versioning isn't
really interesting.
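
To make the "synchronize the snapshot with the application layer" point
concrete, here is a minimal sketch of an application-side quiesce around a
snapshot. Everything in it is an assumption chosen for illustration: SQLite
stands in for "a database" (it is not mentioned in this thread), the database
is assumed to use SQLite's default rollback-journal mode, the function name
snapshot_with_quiesce is hypothetical, and a plain file copy stands in for
the real snapshot command.

```python
import shutil
import sqlite3


def snapshot_with_quiesce(db_path, snapshot_path):
    """Copy db_path while writers are held off, then let them resume."""
    # isolation_level=None lets us issue BEGIN/ROLLBACK ourselves.
    conn = sqlite3.connect(db_path, isolation_level=None)
    try:
        # BEGIN IMMEDIATE takes SQLite's write lock, so no other connection
        # can commit changes to the file until we release it.
        conn.execute("BEGIN IMMEDIATE")
        # Stand-in for the real snapshot request.
        shutil.copyfile(db_path, snapshot_path)
        # Release the lock; we changed nothing, so there is nothing to undo.
        conn.execute("ROLLBACK")
    finally:
        conn.close()
```

The point of the ordering is that the application decides when its data is
in a consistent state, and the snapshot is taken inside that window; readers
keep working, and writers resume as soon as the lock is released.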

> Regards,
>
> Daniel

Having this little tool could greatly increase tux3's popularity in
enterprise-class applications.
And this is a cool, future-proof feature.
Tux3-clustered, even with limitations, would be the "cerise sur le gâteau"
(the icing on the cake) that would really put this filesystem ahead of
others :)

Best regards

Michael