[Tux3] Deferred namespace operations, change return type of fs create method

Mon Dec 8 13:02:55 PST 2008

2008/12/8 Daniel Phillips <phillips at phunq.net>:
> This updated patch implements an instantiate variant that takes care of
> the orphan dirent problem (unlinked while open) by implementing a
> variant of d_instantiate that unhashes the orphan and returns a clone of
> the open dirent in the rare case that somebody creates an entry of the
> same name before the orphan closes:

Not to hijack this thread, but I have a general tux3 design question
related to orphaned inodes.

In reviewing http://userweb.kernel.org/~hirofumi/tux3/doc/design.html
I saw that forward logging should enable:
"logging orphan inodes that are unlinked while open, so they can be
deleted on replay after a crash."

and

"One traditional nasty case that becomes really nice with logical
forward logging is truncate of a gigantic file. We just need to commit
a logical update like ['resize', inum, 0] then the inode data truncate
can proceed as convenient. Another is orphan inode handling where an
open file has been completely unlinked, in which case we log the
logical change ['free', inum] then proceed with the actual delete when
the file is closed or when the log is replayed after a surprise
reboot."
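
To make the quoted idea concrete, here is a toy model of logical
forward logging: a small logical record is committed first, and the
expensive physical work happens later or on replay. This is only a
sketch and not tux3 code; ToyFS, commit(), apply() and replay() are
hypothetical names, and the record tuples just mirror the
['resize', inum, 0] and ['free', inum] examples from the design doc.

```python
# Toy model of logical forward logging. Not tux3 code; all names are
# hypothetical and exist only to illustrate the quoted design text.

class ToyFS:
    def __init__(self):
        self.inodes = {}   # inum -> {"size": int, "blocks": list}
        self.log = []      # committed logical records, e.g. ("free", inum)

    def create(self, inum, nblocks):
        self.inodes[inum] = {"size": nblocks, "blocks": list(range(nblocks))}

    def commit(self, record):
        # Only the small logical record has to be made durable up front.
        self.log.append(record)

    def apply(self, record):
        # The expensive physical work, done "as convenient" or on replay.
        op, inum = record[0], record[1]
        inode = self.inodes.get(inum)
        if inode is None:
            return  # already freed, so re-applying the record is harmless
        if op == "resize":
            new_size = record[2]
            inode["size"] = new_size
            del inode["blocks"][new_size:]  # drop blocks past the new size
        elif op == "free":
            del self.inodes[inum]

    def replay(self):
        # After a surprise reboot, apply every logged record, then clear.
        for record in self.log:
            self.apply(record)
        self.log.clear()
```

In this model, committing ("resize", 7, 0) or ("free", 8) returns
immediately, and the inode data is only touched when apply() or
replay() runs.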

So, putting my distributed filesystem hat on: one unfortunate aspect of
ext3 is that orphaned inode processing after a crash blindly deletes
all inodes with i_nlink == 0.  This is a problem if a remote client
application still has the orphaned inode open but the filesystem was
unmounted (either forcibly, in the case of a Linux crash, or cleanly,
if write access to the fs was revoked on a given server, e.g. because
filesystem ownership migrated to another server).  The new owning
server will re-mount the fs, and the conventional orphaned inode
processing will clean up the orphaned inodes out from underneath the
remote client application, thereby breaking the application.

So my question is: how might tux3 be trained to _not_ clean up orphaned
inodes on re-mount, as conventional Linux filesystems do?  Could a
re-mount filter be added that would trap and then somehow reschedule
tux3's deferred delete of orphan inodes?  This would leave a window of
time for an exposed hook to be called (by an upper layer) to
reconstitute a reference on each orphaned inode that is still open.
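
Roughly, the filter being asked about could behave like the toy model
below. It is only a sketch of the question, not a proposed tux3
interface: OrphanFilter, reclaim() and expire() are hypothetical
names, and the log records are assumed to be tuples like
("free", inum).

```python
# Hypothetical re-mount filter: logged orphan deletes are parked instead
# of being applied during replay, an upper layer may reclaim inodes that
# are still open, and whatever is left is deleted when the window ends.

class OrphanFilter:
    def __init__(self, log):
        # Trap every logged orphan delete; pass other records through.
        self.pending = {rec[1] for rec in log if rec[0] == "free"}
        self.passthrough = [rec for rec in log if rec[0] != "free"]

    def reclaim(self, inum):
        # Hook for the upper layer: re-establish a reference on an orphan
        # that a remote client still has open. Returns True if the inode
        # was parked and is now protected from deletion.
        if inum in self.pending:
            self.pending.discard(inum)
            return True
        return False

    def expire(self):
        # End of the grace window: hand back the orphans nobody claimed
        # so the deferred delete can finally proceed.
        doomed = sorted(self.pending)
        self.pending.clear()
        return doomed
```

For example, with a log holding ("free", 8) and ("free", 9), a call to
reclaim(8) during the window would leave only inode 9 in the list that
expire() returns.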

thanks,
Mike
