[Tux3] Deferred namespace operations, change return type of fs create method

Mon Dec 8 14:20:03 PST 2008

On Monday 08 December 2008 13:02, Mike Snitzer wrote:
> 2008/12/8 Daniel Phillips <phillips at phunq.net>:
> > This updated patch takes care of the orphan dirent problem (unlinked
> > while open) with a variant of d_instantiate that unhashes the orphan
> > and returns a clone of the open dirent in the rare case that somebody
> > creates an entry of the same name before the orphan closes:
> 
> Not to hijack this thread with a general tux3 design question related
> to orphaned inodes but:
> 
> In reviewing http://userweb.kernel.org/~hirofumi/tux3/doc/design.html
> I saw that forward logging should enable:
> "logging orphan inodes that are unlinked while open, so they can be
> deleted on replay after a crash."
> 
> and
> 
> "One traditional nasty case that becomes really nice with logical
> forward logging is truncate of a gigantic file. We just need to commit
> a logical update like ['resize', inum, 0] then the inode data truncate
> can proceed as convenient. Another is orphan inode handling where an
> open file has been completely unlinked, in which case we log the
> logical change ['free', inum] then proceed with the actual delete when
> the file is closed or when the log is replayed after a surprise
> reboot."
> 
> So putting my distributed filesystem hat on: One unfortunate aspect of
> ext3 is that orphaned inode processing after a crash blindly deletes
> all inodes with n_link==0.  This is a problem if a remote client
> application still has the orphaned inode open but the filesystem was
> unmounted (either forcibly in the case of a Linux crash; or cleanly if
> write access to the fs was revoked on a given server, e.g. filesystem
> ownership migrated to another server).  It is a problem because the
> new owning server will re-mount the fs and the conventional orphaned
> inode processing will clean up the orphaned inodes out from underneath
> the remote client application, thereby breaking the application.
> 
> So my question is, how might tux3 be trained to _not_ clean up orphaned
> inodes on re-mount like conventional Linux filesystems?  Could a
> re-mount filter be added that would trap and then somehow reschedule
> tux3's deferred delete of orphan inodes?  This would leave a window of
> time for an exposed hook to be called (by an upper layer) to
> reconstitute a reference on each orphaned inode that is still open.

This is something like the NFS silly rename problem.  There, the client
avoids unlinking an open file by renaming it instead, which creates a
cleanup problem.  Something more elegant ought to be possible.

If the dirent is gone, leaving an orphaned inode, and the filesystem
has been convinced not to delete the orphan on restart, how would you
re-open the file?  Open by inode number from within the kernel?
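
To make the replay side of the question concrete, here is a toy model in
Python of the kind of re-mount filter being asked about.  It is only a
sketch under stated assumptions: ToyVolume, still_open and close_deferred
are hypothetical names that exist nowhere in tux3, and the "log" is just a
list of ('free', inum) tuples in the spirit of the design note quoted
above.  It does not answer how the file would be re-opened; it only shows
the delete being held back and the log record being kept so that a second
crash does not lose track of the orphan.

```python
# Toy model only: none of these names exist in tux3.
from dataclasses import dataclass, field


@dataclass
class ToyVolume:
    inodes: dict = field(default_factory=dict)   # inum -> link count
    log: list = field(default_factory=list)      # logical log records
    deferred: set = field(default_factory=set)   # orphans held back at replay

    def unlink_while_open(self, inum):
        # Last link is gone but the file is still open: log the intent
        # to free and leave the inode in place until close or replay.
        self.inodes[inum] = 0
        self.log.append(("free", inum))

    def replay(self, still_open=lambda inum: False):
        # still_open is the hypothetical upper-layer hook: it returns True
        # for orphans that some remote client still holds open.
        self.deferred = set()
        for op, inum in self.log:
            if op != "free":
                continue
            if still_open(inum):
                self.deferred.add(inum)       # reschedule the delete
            else:
                self.inodes.pop(inum, None)   # conventional orphan cleanup
        # Keep a 'free' record for every deferred orphan so that another
        # crash before the final close still finds it on the next replay.
        self.log = [("free", inum) for inum in sorted(self.deferred)]

    def close_deferred(self, inum):
        # The last remote holder closed the file: finish the delete.
        if inum in self.deferred:
            self.deferred.discard(inum)
            self.inodes.pop(inum, None)
            self.log.remove(("free", inum))


vol = ToyVolume(inodes={10: 1, 11: 1})
vol.unlink_while_open(10)
vol.unlink_while_open(11)
# After a simulated crash, the upper layer claims inode 11 is still open.
vol.replay(still_open=lambda inum: inum == 11)
```

With the default still_open, replay behaves like conventional orphan
processing and deletes everything that was logged as free.  The filter
only changes which orphans survive the re-mount.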

Regards,

Daniel
