[Tux3] Deferred namespace operations, change return type of fs create method
Mike Snitzer
snitzer at gmail.com
Mon Dec 8 13:02:55 PST 2008
2008/12/8 Daniel Phillips <phillips at phunq.net>:
> This updated patch implements an instantiate variant that takes care of
> the orphan dirent problem (unlinked while open) by implementing a
> variant of d_instantiate that unhashes the orphan and returns a clone of
> the open dirent in the rare case that somebody creates an entry of the
> same name before the orphan closes:
Not to hijack this thread with a general tux3 design question related
to orphaned inodes but:
In reviewing http://userweb.kernel.org/~hirofumi/tux3/doc/design.html
I saw that forward logging should enable:
"logging orphan inodes that are unlinked while open, so they can be
deleted on replay after a crash."
and
"One traditional nasty case that becomes really nice with logical
forward logging is truncate of a gigantic file. We just need to commit
a logical update like ['resize', inum, 0] then the inode data truncate
can proceed as convenient. Another is orphan inode handling where an
open file has been completely unlinked, in which case we log the
logical change ['free', inum] then proceed with the actual delete when
the file is closed or when the log is replayed after a surprise
reboot."
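To make sure I am reading that correctly, here is a minimal sketch of how
replay of such logical records might behave. It is a toy model in Python,
not tux3 code: the record shapes follow the design doc's examples, but the
dict-based inode table and the name replay_log are my own assumptions, and
'resize' is only modelled as a truncate.

```python
# Hypothetical toy model, not tux3 code: inodes live in a dict keyed by inum.
def replay_log(inodes, log):
    """Apply logical log records in order; return the inums freed on replay."""
    freed = []
    for record in log:
        op, inum = record[0], record[1]
        if inum not in inodes:
            # Already deleted by an earlier record, so replay stays idempotent.
            continue
        if op == "resize":
            # Only shrinking is modelled: cut the data down to the new size.
            size = record[2]
            inodes[inum]["data"] = inodes[inum]["data"][:size]
        elif op == "free":
            # Orphan that was unlinked while open: the delete happens here.
            del inodes[inum]
            freed.append(inum)
        else:
            raise ValueError(f"unknown log record: {record!r}")
    return freed
```

The point of the sketch is that every ['free', inum] record gets acted on
unconditionally at replay time, which is the behaviour my question below is
about.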
So putting my distributed filesystem hat on: One unfortunate aspect of
ext3 is that orphaned inode processing after a crash blindly deletes
all inodes with i_nlink == 0. This is a problem if a remote client
application still has the orphaned inode open but the filesystem was
unmounted (either forcibly, in the case of a Linux crash, or cleanly, if
write access to the fs was revoked on a given server, e.g. filesystem
ownership migrated to another server). It is a problem because the
new owning server will re-mount the fs and the conventional orphaned
inode processing will clean up the orphaned inodes out from underneath
the remote client application, thereby breaking the application.
So my question is, how might tux3 be trained to _not_ clean up orphaned
inodes on re-mount like conventional Linux filesystems? Could a
re-mount filter be added that would trap and then somehow reschedule
tux3's deferred delete of orphan inodes? This would leave a window of
time for an exposed hook to be called (by an upper layer) to
reconstitute a reference on each orphaned inode that is still open.
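Roughly what I have in mind, again as a toy Python sketch rather than
kernel code. Everything here is hypothetical: the class name OrphanFilter,
the reclaim/expire hooks and the fixed grace period are assumptions of
mine, not an existing tux3 or VFS interface.

```python
import time


class OrphanFilter:
    """Hypothetical re-mount filter: hold orphan deletes for a grace period."""

    def __init__(self, orphans, grace_seconds, now=time.monotonic):
        self._now = now
        self._deadline = now() + grace_seconds
        self._pending = set(orphans)  # orphans found during log replay
        self._reclaimed = set()       # orphans re-referenced by the upper layer

    def reclaim(self, inum):
        """Hook for the upper layer: re-take a reference on an open orphan."""
        if inum not in self._pending:
            return False  # unknown inum, or it was already handed to expire()
        self._pending.discard(inum)
        self._reclaimed.add(inum)
        return True

    def expire(self):
        """Once the grace period is over, return the orphans safe to delete."""
        if self._now() < self._deadline:
            return set()
        expired, self._pending = self._pending, set()
        return expired
```

The upper layer would call reclaim() for each orphan a remote client still
holds open, and the filesystem would only delete what expire() returns once
the window has closed. Reclaimed orphans would then be deleted on their
final close, as in the normal unlinked-while-open case.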
thanks,
Mike