[Tux3] Deferred namespace operations, the Hirofumi Method
Daniel Phillips
phillips at phunq.net
Mon Dec 8 23:48:16 PST 2008
This patch uses an alternative approach to dealing with deferred unlink
of in-use files, suggested yesterday by Hirofumi. We allow d_delete to
unhash the in-use dentry as it is fond of doing, but clone a new,
negative dentry to take its place to make the file appear to be deleted
as expected, even though the filesystem has not yet removed the on-disk
directory entry.
Unlike my previous attempt, this approach avoids making any changes to
other filesystems. There are four changes to core vfs:
1) Add a new flag DCACHE_HIDDEN that is cleared in dcache.c wherever
a dentry is changed from negative to positive (i.e., when an inode
is attached).
2) Add a d_negative inline that checks the HIDDEN flag as well as the
traditional null inode condition for negative dentry. Use the new
inline throughout fs/namei.c (arguably this should have been done
long ago for readability).
3) Add a ->hide method to d_delete that allows a filesystem to add
its own handling to dentry hiding.
4) Reserve two dentry state bits for use by the filesystem. These
are used to decide what kind of deferred processing needs to be
done on a dentry.
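For illustration, changes 1 and 2 can be modeled in userspace roughly as
follows. This is only a sketch, not the code from the patch: the struct
definitions are cut-down stand-ins for the kernel ones, the bit value of
DCACHE_HIDDEN is made up, and attach_inode is a hypothetical helper that
stands in for the places in dcache.c where an inode gets attached.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel types; only the fields used here. */
struct inode {
	unsigned long i_ino;
};

struct dentry {
	unsigned int d_flags;
	struct inode *d_inode;
};

/* Assumed bit value for illustration; the patch picks the real one. */
#define DCACHE_HIDDEN 0x0100

/* Negative if no inode is attached or the dentry has been hidden. */
static inline int d_negative(const struct dentry *dentry)
{
	return !dentry->d_inode || (dentry->d_flags & DCACHE_HIDDEN);
}

/*
 * Hypothetical helper modeling the negative-to-positive transition:
 * attaching an inode always clears the HIDDEN flag.
 */
static inline void attach_inode(struct dentry *dentry, struct inode *inode)
{
	assert(inode != NULL);
	dentry->d_flags &= ~DCACHE_HIDDEN;
	dentry->d_inode = inode;
}
```

With this, a dentry that still has an inode attached but has been hidden
tests negative, and it becomes positive again only through attach_inode.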
While a new method had to be added to d_delete, no new method was needed
for dentry instantiation, because that is already left in the hands of
the filesystem. This apparent asymmetry actually makes sense: at
instantiation time, only one task knows about the new dentry, whereas
at unlink time (d_delete, which should have been called d_unlink)
multiple tasks on multiple cpus can be accessing the dentry, so all the
work has to be done very carefully under locks.
The ext2-specific hide method provided in the patch is:
static int ext2_hide_dentry(struct dentry *dentry)
{
	struct dentry *clone;

	dentry->d_flags |= DCACHE_HIDDEN;
	if (!(dentry->d_flags & DCACHE_BACKED)) {
		/* converting unbacked to negative */
		dput(dentry); /* Cancel dget from deferred create */
		return 0;
	}
	if (atomic_read(&dentry->d_count) == 1) {
		/* Not busy: defer the unlink and hold an extra reference */
		BUG_ON(!dentry->d_inode);
		dentry->d_flags &= ~DCACHE_BACKED;
		dentry->d_flags |= DCACHE_STALE;
		dget(dentry);
		return 1;
	}
	/* Busy: drop the locks so a negative clone can be hashed */
	spin_unlock(&dentry->d_lock);
	spin_unlock(&dcache_lock);
	/* d_alloc returns NULL on allocation failure; not handled here */
	clone = d_alloc(dentry->d_parent, &dentry->d_name);
	clone->d_flags |= DCACHE_STALE;
	d_instantiate(clone, NULL);
	d_rehash(clone);
	/* Retake the original dentry's locks, dcache_lock first */
	spin_lock(&dcache_lock);
	spin_lock(&dentry->d_lock);
	return 0;
}
A zero return means d_delete should process the dentry normally, which
means unhashing any busy dentries, or directly unlinking if not busy.
A return of one tells d_delete to allow for an extra use count due to
deferring the unlink, but go ahead and unlink the dentry from the inode
now. In the case of a busy dentry that requires deferred processing,
we add a new, negative dentry to the dentry cache, which is marked
STALE so that our directory sync operation will remove the dirent. We
allow d_delete to unhash the original dentry.
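To make the return convention concrete, here is a rough userspace model
of how d_delete might consume the hide result. It is a sketch under
assumptions, not the patched d_delete: the struct is a cut-down
stand-in, locking is left out entirely, putting the hide pointer
directly in the dentry is a hypothetical placement, and model_d_delete,
sample_hide and the delete_result enum are names invented for this
example.

```c
#include <assert.h>
#include <stddef.h>

struct inode {
	unsigned long i_ino;
};

struct dentry {
	int d_count;                  /* use count (atomic_t in the kernel) */
	int hashed;                   /* 1 while lookups can find the dentry */
	struct inode *d_inode;
	int (*hide)(struct dentry *); /* hypothetical placement of ->hide */
};

enum delete_result { DELETE_UNLINKED, DELETE_UNHASHED };

/* Simplified hook: defer the unlink when we are the only user. */
static int sample_hide(struct dentry *dentry)
{
	if (dentry->d_count == 1) {
		dentry->d_count++;    /* stands in for dget() */
		return 1;
	}
	return 0;
}

static enum delete_result model_d_delete(struct dentry *dentry)
{
	int extra = 0;

	if (dentry->hide)
		extra = dentry->hide(dentry);
	assert(extra == 0 || extra == 1);

	/* Sole user, allowing for the count held by a deferred unlink. */
	if (dentry->d_count == 1 + extra) {
		dentry->d_inode = NULL; /* stands in for dentry_iput() */
		return DELETE_UNLINKED;
	}

	/* Busy: unhash so that lookups no longer find this dentry. */
	dentry->hashed = 0;
	return DELETE_UNHASHED;
}
```

In this model an idle dentry ends up unlinked from its inode with a use
count of two, while a busy one keeps its inode and is only unhashed.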
This approach is not quite as efficient as the previous version because
it clones a dentry on every unlink of a busy dentry, whereas the more
invasive patch only clones when a busy, unlinked dentry is re-opened.
But the clone case is still rare. I think this new approach will work
out well.
Now I will move on to the finishing details:
1) Handle deferred inode creation
2) Handle rename
Regards,
Daniel
-------------- next part --------------
A non-text attachment was scrubbed...
Name: defer.patch
Type: text/x-diff
Size: 18390 bytes
Desc: not available
URL: <http://phunq.net/pipermail/tux3/attachments/20081208/4496a97e/attachment-0001.patch>