[Tux3] Possible solution to the deferred nameops delete issue
Daniel Phillips
phillips at phunq.net
Thu Nov 27 15:42:05 PST 2008
On Thursday 27 November 2008 07:51, OGAWA Hirofumi wrote:
> Daniel Phillips <phillips at phunq.net> writes:
>
> > The patch below seems to resolve the issue where dcache would unhash
> > any dentry that has multiple use counts at the time of a delete. I
> > have no idea what the benefit of that is, and after scratching my head
> > about it for days, I decided there probably is no good reason at all.
> > A later dput will drop the dentry count to zero and do the d_iput. The
> > elevated dentry count case is rare, and the guilty party is probably
> > going to do a dput very soon. So I just can't imagine what the intent
> > of the aggressive __d_drop really was.
>
> IIRC, usually, if user deletes the file that is still opened,
> dentry->d_count > 1. open() has the reference count:
>
> fd -> filp -> filp-f_path.dentry -> dentry
>
> and of course, unlink() should also have the reference count for self.
>
> So, even if dentry->d_count > 1, we have to remove dentry from
> namespace, otherwise the user can open it via cached_lookup() without
> inode->i_op->lookup().
>
> So, I think it's necessary.
>
> Maybe, test case is:
>
> $ echo 111 > test.txt
> $ sleep 30 < test.txt &
> $ rm test.txt
> $ cat test.txt
Thanks, I missed that obvious point entirely. Yes, the VFS must make
the dentry invisible at the point of the delete. There are two ways to
do that: 1) unhash the dentry; 2) turn the dentry negative. The VFS
cannot make the dentry negative at this point because that would lose
track of the inode. But it could call the filesystem to make the
dentry negative. The filesystem _can_ keep track of the inode.
How about the patch below? This provides a new dentry hook that lets
the filesystem turn the dentry negative instead of the VFS brutally
unhashing it. The dentry still has a non-zero count at this point, and
our filesystem just becomes the new owner of that reference. It puts
the dentry on a list for later unlinking, drops the dcache spinlocks
and does the iput (which we will also defer if iput_final is called).
Alternatively, we return 1 to tell the VFS to go ahead and unhash if we
cannot defer the unlink for some reason.
Eventually our filesystem will do the deferred unlink and dput the
dentry, which is now negative, or the holder of the last outstanding
dentry reference will. After the filesystem has done the deferred
unlink, it does not care any more what happens to the negative dentry.
We need to be sure that the negative dentry will never be unhashed
before we finalize the deferred unlink. All unhashes (__d_drop) fall
into five categories:
1) On delete. We use the new hook below to defer this case and turn
the dentry negative even if it has nonzero reference count.
- d_delete
2) On final dput. This comes either from us when we finalize the
deferred unlink, or from a holder of an outstanding reference,
after we have finalized the unlink and do not need the negative
dentry any more:
- dput
3) On rename that implicitly deletes a target. Our dentry will be
unhashed, but replaced by another dentry having the same name,
which hides our still-linked dirent just as well as a negative
dentry.
- d_move_locked
4) After final sync on umount: filesystem operations are already all
done and flushed to disk.
- shrink_dcache_for_umount_subtree
- prune_one_dentry
- d_prune_aliases
5) Library calls only used by filesystems on their own dentries:
- d_materialise_unique
- d_invalidate
- d_drop
So it looks like this patch will let us rely on the negative dentry to
stay around until we do the deferred unlink. Can you find a hole in
the analysis?
Daniel
diff --git a/fs/dcache.c b/fs/dcache.c
index 6068c25..5c5ab41 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1394,7 +1394,8 @@ void d_delete(struct dentry * dentry)
spin_lock(&dcache_lock);
spin_lock(&dentry->d_lock);
isdir = S_ISDIR(dentry->d_inode->i_mode);
- if (atomic_read(&dentry->d_count) == 1) {
+ if (atomic_read(&dentry->d_count) == 1 || (dentry->d_op &&
+ dentry->d_op->d_hide && dentry->d_op->d_hide(dentry))) {
dentry_iput(dentry);
fsnotify_nameremove(dentry, isdir);
return;
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index d982eb8..11c416a 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -133,6 +133,7 @@ struct dentry_operations {
void (*d_release)(struct dentry *);
void (*d_iput)(struct dentry *, struct inode *);
char *(*d_dname)(struct dentry *, char *, int);
+ int (*d_hide)(struct dentry *);
};
/* the dentry parameter passed to d_hash and d_compare is the parent
_______________________________________________
Tux3 mailing list
Tux3 at tux3.org
http://tux3.org/cgi-bin/mailman/listinfo/tux3
More information about the Tux3
mailing list