[Tux3] Deferred namespace operations, the Hirofumi Method

Mon Dec 8 23:48:16 PST 2008

This patch uses an alternative approach to dealing with deferred unlink 
of in-use files, suggested yesterday by Hirofumi.  We allow d_delete to 
unhash the in-use dentry as it is fond of doing, but clone a new, 
negative dentry to take its place, so that the file appears to be 
deleted as expected even though the filesystem has not removed the 
dentry yet.

Unlike my previous attempt, this approach avoids making any changes to 
other filesystems.  There are four changes to the core VFS:

  1) Add a new flag DCACHE_HIDDEN that is cleared in dcache.c wherever
     a dentry is changed from negative to positive (i.e., when an inode
     is attached).

  2) Add a d_negative inline that checks the HIDDEN flag as well as the
     traditional null inode condition for negative dentry.  Use the new
     inline throughout fs/namei.c (arguably this should have been done
     long ago for readability).

  3) Add a ->hide method to d_delete that allows a filesystem to add
     its own handling to dentry hiding.

  4) Reserve two dentry state bits for use by the filesystem.  These
     are used to decide what kind of deferred processing needs to be
     done on a dentry.
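
As a rough illustration of changes 1 and 2, here is a user-space sketch 
of what a d_negative test and the flag clearing might look like.  It is 
not taken from the patch: the flag value, the cut-down struct dentry and 
the helper model_attach_inode are all assumptions made for the example.

```c
#include <stddef.h>

/* Hypothetical bit value; the real one is defined by the patch. */
#define DCACHE_HIDDEN 0x0100

/* Cut-down stand-ins for the kernel structures (assumption). */
struct inode { int unused; };

struct dentry {
	unsigned int d_flags;
	struct inode *d_inode;
};

/* A dentry is negative if it is hidden or has no inode attached. */
static inline int d_negative(const struct dentry *dentry)
{
	return (dentry->d_flags & DCACHE_HIDDEN) || !dentry->d_inode;
}

/*
 * Model of change 1: wherever an inode is attached to a dentry
 * (negative to positive), the HIDDEN flag is cleared.
 */
static inline void model_attach_inode(struct dentry *dentry,
				      struct inode *inode)
{
	dentry->d_inode = inode;
	if (inode)
		dentry->d_flags &= ~DCACHE_HIDDEN;
}
```

With a helper like this, a hidden dentry that still has an inode 
attached reads as negative to path lookup, and becomes positive again 
only when an inode is attached anew.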

While a new method had to be added to d_delete, no new method was needed 
for dentry instantiation, because that is already left in the hands of 
the filesystem.  This apparent asymmetry actually makes sense: at 
instantiation time, only one task knows about the new dentry, whereas 
at unlink time (d_delete, which should have been called d_unlink) 
multiple tasks on multiple CPUs can be accessing the dentry, so all the 
work has to be done very carefully under locks.

The ext2-specific hide method provided in the patch is:

static int ext2_hide_dentry(struct dentry *dentry)
{
	struct dentry *clone;

	/* Entered with dcache_lock and dentry->d_lock held */
	dentry->d_flags |= DCACHE_HIDDEN;
	if (!(dentry->d_flags & DCACHE_BACKED)) {
		/* converting unbacked to negative */
		dput(dentry); /* Cancel dget from deferred create */
		return 0;
	}
	if (atomic_read(&dentry->d_count) == 1) {
		/* Not busy: mark stale so directory sync removes the dirent */
		BUG_ON(!dentry->d_inode);
		dentry->d_flags &= ~DCACHE_BACKED;
		dentry->d_flags |= DCACHE_STALE;
		dget(dentry); /* Extra use count for the deferred unlink */
		return 1;
	}
	/* Busy: drop the locks and hash a negative, stale clone */
	spin_unlock(&dentry->d_lock);
	spin_unlock(&dcache_lock);
	clone = d_alloc(dentry->d_parent, &dentry->d_name);
	clone->d_flags |= DCACHE_STALE;
	d_instantiate(clone, NULL);
	d_rehash(clone);
	/* Retake the locks on the original dentry, dcache_lock first */
	spin_lock(&dcache_lock);
	spin_lock(&dentry->d_lock);
	return 0;
}

A zero return means d_delete should process the dentry normally, which 
means unhashing any busy dentries, or directly unlinking if not busy.  
A return of one tells d_delete to allow for an extra use count due to 
deferring the unlink, but go ahead and unlink the dentry from the inode 
now.  In the case of a busy dentry that requires deferred processing, 
we add a new, negative dentry to the dentry cache, which is marked 
STALE so that our directory sync operation will remove the dirent.  We 
allow d_delete to unhash the original dentry.
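
The return value convention can be sketched with a small user-space 
model of d_delete.  This is only an illustration under stated 
assumptions, not the code from the patch: locking and directory 
handling are left out, and struct dentry, model_d_delete, hide_normal 
and hide_defer are hypothetical names invented for the example.

```c
#include <stddef.h>

/* Cut-down stand-in for the kernel's struct dentry (assumption). */
struct dentry {
	int d_count;	/* use count (an atomic_t in the kernel) */
	int hashed;	/* nonzero while lookup can find the dentry */
	void *d_inode;	/* NULL once unlinked from the inode */
};

typedef int (*hide_fn)(struct dentry *);

/* Hide method that asks for normal processing. */
static int hide_normal(struct dentry *dentry)
{
	(void)dentry;
	return 0;
}

/* Hide method that defers the unlink: takes one extra use count. */
static int hide_defer(struct dentry *dentry)
{
	dentry->d_count++;
	return 1;
}

/*
 * Simplified d_delete.  The hide return value is the number of
 * extra use counts to allow for when deciding whether the dentry
 * is busy.
 */
static void model_d_delete(struct dentry *dentry, hide_fn hide)
{
	int extra = hide ? hide(dentry) : 0;

	if (dentry->d_count == 1 + extra) {
		/* Not busy: unlink the dentry from the inode now */
		dentry->d_inode = NULL;
		return;
	}
	/* Busy: unhash, leaving the inode attached for current users */
	dentry->hashed = 0;
}
```

In this model an idle dentry passed through hide_defer ends up 
unlinked from its inode with a use count of two, while a busy dentry 
passed through hide_normal is simply unhashed.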

This approach is not quite as efficient as the previous version because 
it clones a dentry on every unlink of a busy dentry, whereas the more 
invasive patch only clones when a busy, unlinked dentry is re-opened.  
But the clone case is still rare.  I think this new approach will work 
out well.

Now I will move on to the finishing details:

  1) Handle deferred inode creation
  2) Handle rename

Regards,

Daniel
-------------- next part --------------
A non-text attachment was scrubbed...
Name: defer.patch
Type: text/x-diff
Size: 18390 bytes
Desc: not available
URL: <http://phunq.net/pipermail/tux3/attachments/20081208/4496a97e/attachment-0001.patch>