[Tux3] Deferred namespace operations, Split up ext2_new_inode

Daniel Phillips phillips at phunq.net
Wed Dec 10 01:17:54 PST 2008


This diff against my previous patch splits ext2_new_inode into front end 
and back end parts in preparation for deferring the back end part.  The 
front end just initializes the in-memory inode as with ramfs and leaves 
the inode number unassigned.  The back end goes fishing around in inode 
allocation maps to find a suitable free inode and assigns the inode 
number to the in-memory inode.  Right now, the front end part just 
calls the back end, so not much has changed yet.
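
As a rough illustration of the split, here is a minimal userspace sketch.
It is not the code in defer.diff: every name in it (toy_inode, toy_sb,
toy_new_inode_front, toy_new_inode_back, NO_INO) is hypothetical, and a
single 64-bit word stands in for the real inode allocation maps.

```c
#include <assert.h>
#include <stddef.h>

#define NO_INO 0UL              /* hypothetical "not yet assigned" marker */
#define INODES_PER_MAP 64       /* toy map size, not an Ext2 value */

struct toy_inode {
    unsigned long ino;          /* stays NO_INO until the back end runs */
    unsigned mode;
};

struct toy_sb {
    unsigned long long inode_map;   /* bit n set means inode n + 1 is in use */
};

/* Front end: set up the in-memory inode only, touching no allocation map. */
static void toy_new_inode_front(struct toy_inode *inode, unsigned mode)
{
    inode->ino = NO_INO;
    inode->mode = mode;
}

/*
 * Back end: search the map for a free inode and assign its number to the
 * in-memory inode.  Returns 0 on success, -1 if the map is full.
 */
static int toy_new_inode_back(struct toy_sb *sb, struct toy_inode *inode)
{
    assert(inode->ino == NO_INO);
    for (unsigned i = 0; i < INODES_PER_MAP; i++) {
        if (!(sb->inode_map & (1ULL << i))) {
            sb->inode_map |= 1ULL << i;
            inode->ino = i + 1;
            return 0;
        }
    }
    return -1;
}

/* State described above: the front end calls the back end right away. */
static int toy_new_inode(struct toy_sb *sb, struct toy_inode *inode,
                         unsigned mode)
{
    toy_new_inode_front(inode, mode);
    return toy_new_inode_back(sb, inode);
}
```

Deferring the back end then amounts to dropping the call from
toy_new_inode and running toy_new_inode_back later, at sync time.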

In the next iteration, the back end part will be deferred, so that 
between the creation of a file and the next sync the file has no inode 
number.  This raises a couple of issues:

  * What about sys_fstat, which exposes the inode number to user
    applications?

  * What about NFS, which needs inode numbers to generate stable
    handles?

The inode number is not actually used in that many places in Ext2, which 
is good.  The most important user is ext2_iget, which looks up an inode 
in the vfs inode cache given an inode number.  It is called from only 
two places: ext2_lookup and ext2_get_parent.  The latter is for NFS, 
which we will worry about later.  The former does a "real lookup" in 
the filesystem for any name the vfs fails to find in the dentry cache.  
We pin the new dentry in cache precisely to ensure that a real lookup 
is never performed for a new inode before we complete the deferred back 
end update of filesystem blocks.
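
The pinning argument can be modelled in a few lines of userspace C.  This
is only a sketch under simplifying assumptions: the fixed-size array is a
stand-in for the dentry cache, and toy_dentry, toy_create, toy_shrink and
toy_complete are hypothetical names, not vfs or Ext2 functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define CACHE_SLOTS 4

struct toy_dentry {
    const char *name;       /* NULL marks an empty slot */
    bool pinned;            /* pinned entries survive cache shrinking */
    unsigned long ino;      /* 0 until the deferred back end assigns one */
};

static struct toy_dentry cache[CACHE_SLOTS];

/* Cache lookup; a NULL result is where a "real lookup" would be needed. */
static struct toy_dentry *toy_cache_find(const char *name)
{
    for (int i = 0; i < CACHE_SLOTS; i++)
        if (cache[i].name && strcmp(cache[i].name, name) == 0)
            return &cache[i];
    return NULL;
}

/* Create: insert a pinned dentry whose inode has no number yet. */
static struct toy_dentry *toy_create(const char *name)
{
    for (int i = 0; i < CACHE_SLOTS; i++) {
        if (!cache[i].name) {
            cache[i] = (struct toy_dentry){ .name = name, .pinned = true,
                                            .ino = 0 };
            return &cache[i];
        }
    }
    return NULL;
}

/* Memory pressure: evict every unpinned entry, return how many went. */
static int toy_shrink(void)
{
    int evicted = 0;
    for (int i = 0; i < CACHE_SLOTS; i++) {
        if (cache[i].name && !cache[i].pinned) {
            cache[i].name = NULL;
            evicted++;
        }
    }
    return evicted;
}

/* Deferred back end finished: record the inode number and drop the pin. */
static void toy_complete(struct toy_dentry *dentry, unsigned long ino)
{
    dentry->ino = ino;
    dentry->pinned = false;
}
```

While the entry is pinned, toy_cache_find always succeeds for the new
name, so the fallback path that needs an inode number is never reached.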

To give NFS the real inode numbers it needs, we would introduce a "wait 
on ino assignment" operation.  However, that is outside the scope of 
this Ext2 patch.  Tux3 will have this, but all we want to demonstrate 
with Ext2 is that namespace consistency can be maintained while updates 
to directory and inode table blocks are deferred.

For sys_fstat, we will use the wait on ino assignment strategy.  This 
will most likely be implemented as a wait-on-bit operation, with a new 
inode flag to indicate that an inode number has been assigned.  (The 
kernel wait-on-bit facility uses hashed locks that do not require 
adding new lock or wait fields to objects, so the space cost is just 
one new flags bit.)
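
To show why hashing keeps the per-object cost to one flag bit, here is a
userspace analogue built on pthreads.  It is not the kernel wait-on-bit
interface and not the planned implementation: TOY_INO_ASSIGNED,
toy_wait_on_ino, toy_assign_ino and the table size are all assumptions
made for this sketch.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_INO_ASSIGNED 0x1u   /* hypothetical "ino assigned" flag bit */
#define WAIT_TABLE_SIZE 16      /* arbitrary size for the shared table */

struct toy_inode {
    unsigned long ino;
    unsigned flags;             /* the only per-inode cost: one bit here */
};

/* Locks and wait queues live in a shared table, not in each inode. */
struct wait_slot {
    pthread_mutex_t lock;
    pthread_cond_t cond;
};

static struct wait_slot wait_table[WAIT_TABLE_SIZE];
static pthread_once_t wait_table_once = PTHREAD_ONCE_INIT;

static void wait_table_init(void)
{
    for (int i = 0; i < WAIT_TABLE_SIZE; i++) {
        pthread_mutex_init(&wait_table[i].lock, NULL);
        pthread_cond_init(&wait_table[i].cond, NULL);
    }
}

/* Hash the object address to pick a slot; unrelated inodes may share one. */
static struct wait_slot *slot_for(const void *object)
{
    pthread_once(&wait_table_once, wait_table_init);
    return &wait_table[((uintptr_t)object >> 4) % WAIT_TABLE_SIZE];
}

/* What fstat would do: block until the number exists, then return it. */
static unsigned long toy_wait_on_ino(struct toy_inode *inode)
{
    struct wait_slot *slot = slot_for(inode);
    unsigned long ino;

    pthread_mutex_lock(&slot->lock);
    while (!(inode->flags & TOY_INO_ASSIGNED))
        pthread_cond_wait(&slot->cond, &slot->lock);
    ino = inode->ino;
    pthread_mutex_unlock(&slot->lock);
    return ino;
}

/* What the deferred back end would do: assign, set the bit, wake waiters. */
static void toy_assign_ino(struct toy_inode *inode, unsigned long ino)
{
    struct wait_slot *slot = slot_for(inode);

    pthread_mutex_lock(&slot->lock);
    inode->ino = ino;
    inode->flags |= TOY_INO_ASSIGNED;
    pthread_cond_broadcast(&slot->cond);
    pthread_mutex_unlock(&slot->lock);
}
```

Because slots are shared, a waiter can be woken by an assignment to some
other inode; rechecking the flag in the while loop handles that case.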

Quota subsystem initialization and security hooks add further 
complexity to the new_inode regimen for Ext2.  I am not sure whether to 
do those things in the front end or the back end.  Probably in the 
front end, but as I did not look closely at this, I left them in the 
back end for now.

Regards,

Daniel
-------------- next part --------------
A non-text attachment was scrubbed...
Name: defer.diff
Type: text/x-diff
Size: 4930 bytes
Desc: not available
URL: <http://phunq.net/pipermail/tux3/attachments/20081210/3e639a13/attachment-0001.diff>