[Tux3] Possible solution to the deferred nameops delete issue

Daniel Phillips phillips at phunq.net
Thu Nov 27 15:42:05 PST 2008


On Thursday 27 November 2008 07:51, OGAWA Hirofumi wrote:
> Daniel Phillips <phillips at phunq.net> writes:
> 
> > The patch below seems to resolve the issue where dcache would unhash
> > any dentry that has multiple use counts at the time of a delete.  I
> > have no idea what the benefit of that is, and after scratching my head
> > about it for days, I decided there probably is no good reason at all.
> > A later dput will drop the dentry count to zero and do the d_iput.  The
> > elevated dentry count case is rare, and the guilty party is probably
> > going to do a dput very soon.  So I just can't imagine what the intent
> > of the aggressive __d_drop really was.
> 
> IIRC, usually, if user deletes the file that is still opened,
> dentry->d_count > 1. open() has the reference count:
> 
> 	fd -> filp -> filp-f_path.dentry -> dentry
> 
> and of course, unlink() should also have the reference count for self.
>         
> So, even if dentry->d_count > 1, we have to remove dentry from
> namespace, otherwise the user can open it via cached_lookup() without
> inode->i_op->lookup().
> 
> So, I think it's necessary.
> 
> Maybe, test case is:
> 
>     $ echo 111 > test.txt
>     $ sleep 30 < test.txt &
>     $ rm test.txt
>     $ cat test.txt

Thanks, I missed that obvious point entirely.  Yes, the VFS must make
the dentry invisible at the point of the delete.  There are two ways to
do that: 1) unhash the dentry; 2) turn the dentry negative.  The VFS
cannot make the dentry negative at this point because that would lose
track of the inode.  But it could call the filesystem to make the
dentry negative.  The filesystem _can_ keep track of the inode.

How about the patch below?  This provides a new dentry hook that lets
the filesystem turn the dentry negative instead of the VFS brutally
unhashing it.  The dentry still has a non-zero count at this point, and
our filesystem just becomes the new owner of that reference.  It puts
the dentry on a list for later unlinking, drops the dcache spinlocks
and does the iput (which we will also defer if iput_final is called).
Alternatively, we return 1 to tell the VFS to go ahead and unhash if we
cannot defer the unlink for some reason.

Eventually our filesystem will do the deferred unlink and dput the
dentry, which is now negative, or the holder of the last outstanding
dentry reference will.  After the filesystem has done the deferred
unlink, it does not care any more what happens to the negative dentry.

We need to be sure that the negative dentry will never be unhashed
before we finalize the deferred unlink.  All unhashes (__d_drop) fall
into five categories:

  1) On delete.  We use the new hook below to defer this case and turn
     the dentry negative even if it has nonzero reference count.

	- d_delete

  2) On final dput.  This comes either from us when we finalize the
     deferred unlink, or from a holder of an outstanding reference,
     after we have finalized the unlink and do not need the negative
     dentry any more:

	- dput

  3) On rename that implicitly deletes a target.  Our dentry will be
     unhashed, but replaced by another dentry having the same name,
     which hides our still-linked dirent just as well as a negative
     dentry.

	- d_move_locked


  4) After final sync on umount: filesystem operations are already all
     done and flushed to disk.

	- shrink_dcache_for_umount_subtree
	- prune_one_dentry
	- d_prune_aliases

  5) Library calls only used by filesystems on their own dentries:

	- d_materialise_unique
	- d_invalidate
        - d_drop

So it looks like this patch will let us rely on the negative dentry to
stay around until we do the deferred unlink.  Can you find a hole in
the analysis?

Daniel

diff --git a/fs/dcache.c b/fs/dcache.c
index 6068c25..5c5ab41 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1394,7 +1394,8 @@ void d_delete(struct dentry * dentry)
 	spin_lock(&dcache_lock);
 	spin_lock(&dentry->d_lock);
 	isdir = S_ISDIR(dentry->d_inode->i_mode);
-	if (atomic_read(&dentry->d_count) == 1) {
+	if (atomic_read(&dentry->d_count) == 1 || (dentry->d_op &&
+	    dentry->d_op->d_hide && dentry->d_op->d_hide(dentry))) {
 		dentry_iput(dentry);
 		fsnotify_nameremove(dentry, isdir);
 		return;
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index d982eb8..11c416a 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -133,6 +133,7 @@ struct dentry_operations {
 	void (*d_release)(struct dentry *);
 	void (*d_iput)(struct dentry *, struct inode *);
 	char *(*d_dname)(struct dentry *, char *, int);
+	int (*d_hide)(struct dentry *);
 };
 
 /* the dentry parameter passed to d_hash and d_compare is the parent

_______________________________________________
Tux3 mailing list
Tux3 at tux3.org
http://tux3.org/cgi-bin/mailman/listinfo/tux3



More information about the Tux3 mailing list