[Tux3] Deferred namespace operations, Split up ext2_new_inode
Daniel Phillips
phillips at phunq.net
Wed Dec 10 01:17:54 PST 2008
This diff against my previous patch splits ext2_new_inode into front end
and back end parts in preparation for deferring the back end part. The
front end just initializes the in-memory inode as with ramfs and leaves
the inode number unassigned. The back end goes fishing around in inode
allocation maps to find a suitable free inode and assigns the inode
number to the in-memory inode. Right now, the front end part just
calls the back end, so not much has changed yet.
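To make the split concrete, here is a minimal userspace sketch of the idea. It is not the code in defer.diff. The names toy_sb, toy_inode, toy_new_inode_front and toy_assign_ino_back are hypothetical, and the flat one-byte-per-inode map stands in for Ext2's per-group inode bitmaps. Inode number 0 is assumed to mean "not yet assigned".

```c
#include <string.h>

#define TOY_INODES 16   /* hypothetical, tiny inode table */

/* Hypothetical superblock: one byte per inode, indexed by inode number. */
struct toy_sb {
    unsigned char inode_map[TOY_INODES + 1];   /* 1 = in use; slot 0 unused */
};

/* Hypothetical in-memory inode; ino == 0 means "unassigned". */
struct toy_inode {
    unsigned long ino;
    unsigned mode;
    unsigned nlink;
};

/* Front end: initialize the in-memory inode only, as ramfs would.
 * The allocation map is not touched and no inode number is assigned. */
void toy_new_inode_front(struct toy_inode *inode, unsigned mode)
{
    memset(inode, 0, sizeof *inode);
    inode->mode = mode;
    inode->nlink = 1;
}

/* Back end: search the allocation map for a free inode, claim it and
 * assign its number. Returns 0 on success, -1 if the map is full. */
int toy_assign_ino_back(struct toy_sb *sb, struct toy_inode *inode)
{
    unsigned long ino;

    for (ino = 1; ino <= TOY_INODES; ino++) {
        if (!sb->inode_map[ino]) {
            sb->inode_map[ino] = 1;
            inode->ino = ino;
            return 0;
        }
    }
    return -1;
}

/* State of things in this patch: the front end calls the back end
 * immediately, so callers see no change in behavior yet. */
int toy_new_inode(struct toy_sb *sb, struct toy_inode *inode, unsigned mode)
{
    toy_new_inode_front(inode, mode);
    return toy_assign_ino_back(sb, inode);
}
```

Deferring the back end then amounts to calling toy_assign_ino_back at sync time instead of from toy_new_inode.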
In the next iteration, the back end part will be deferred, so that
between the creation of a file and the next sync the file has no inode
number. This raises a couple of issues:
* What about sys_fstat, which exposes the inode number to user
applications?
* What about NFS, which needs inode numbers to generate stable
handles?
The inode number is not actually used in that many places in Ext2, which
is good. The most important user is ext2_iget which looks up an inode
in the vfs inode cache given an inode number. This is only used in two
places: ext2_lookup and ext2_get_parent. The latter is for NFS, which
we will worry about later. The former does a "real lookup" in the
filesystem for any name the vfs fails to find in the dentry cache. To
deal with that, we pin the new dentry in cache so that a real lookup is
never performed for a new inode before we complete the deferred back
end update of filesystem blocks.
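The pinning argument can be modeled in a few lines of userspace C. This is only a toy, not the kernel dcache: toy_dcache, toy_create, toy_shrink, toy_backend_done and toy_lookup are hypothetical names, and the real_lookups counter merely stands in for ext2_lookup going to directory blocks.

```c
#include <stddef.h>
#include <string.h>

#define TOY_CACHE_SLOTS 4   /* hypothetical, tiny name cache */

struct toy_dentry {
    const char *name;   /* NULL means the slot is empty */
    int pinned;         /* nonzero: must not be evicted */
    void *inode;        /* in-memory inode, possibly with no ino yet */
};

struct toy_dcache {
    struct toy_dentry slot[TOY_CACHE_SLOTS];
    unsigned real_lookups;   /* how often we fell through to "disk" */
};

/* Create: enter the new name in the cache, pinned until the deferred
 * back end has run. Returns NULL if the toy cache is full. */
struct toy_dentry *toy_create(struct toy_dcache *dc, const char *name,
                              void *inode)
{
    int i;

    for (i = 0; i < TOY_CACHE_SLOTS; i++) {
        if (!dc->slot[i].name) {
            dc->slot[i].name = name;
            dc->slot[i].pinned = 1;
            dc->slot[i].inode = inode;
            return &dc->slot[i];
        }
    }
    return NULL;
}

/* Memory pressure: evict every unpinned entry, return how many went. */
unsigned toy_shrink(struct toy_dcache *dc)
{
    unsigned evicted = 0;
    int i;

    for (i = 0; i < TOY_CACHE_SLOTS; i++) {
        if (dc->slot[i].name && !dc->slot[i].pinned) {
            dc->slot[i].name = NULL;
            evicted++;
        }
    }
    return evicted;
}

/* Deferred back end has updated directory and inode table: unpin. */
void toy_backend_done(struct toy_dentry *d)
{
    d->pinned = 0;
}

/* Lookup: a cache hit never reaches the filesystem; a miss counts as
 * one "real lookup" and returns NULL in this toy. */
struct toy_dentry *toy_lookup(struct toy_dcache *dc, const char *name)
{
    int i;

    for (i = 0; i < TOY_CACHE_SLOTS; i++)
        if (dc->slot[i].name && strcmp(dc->slot[i].name, name) == 0)
            return &dc->slot[i];
    dc->real_lookups++;
    return NULL;
}
```

As long as the entry stays pinned, toy_shrink cannot remove it, so toy_lookup for that name always hits the cache and real_lookups does not change.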
To give NFS the real inode numbers it needs, we would introduce a "wait
on ino assignment" operation. However, that is outside the scope of this
Ext2 patch. Tux3 will have this, but all we want to demonstrate with Ext2
is that namespace consistency can be maintained while updates to
directory and inode table blocks are deferred.
sys_fstat will use the same wait-on-ino-assignment strategy. This will
most likely be implemented as a wait-on-bit operation, with a new inode
flag to indicate that an inode number has been assigned. (The
kernel wait-on-bit facility uses hashed locks that do not require
adding new lock or wait fields to objects, so the space cost is just
one new flags bit.)
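A userspace analogue of this idea, sketched with pthreads rather than the kernel's wait-on-bit API, might look like the following. Everything here is an assumption for illustration: lazy_inode, LAZY_INO_ASSIGNED, lazy_set_ino, lazy_wait_ino and the four-bucket table are hypothetical names, and the hash is arbitrary. The point it shows is that the waiting machinery lives in a shared table indexed by a hash of the object address, so the inode itself only grows by one flag bit.

```c
#include <pthread.h>
#include <stdint.h>

#define LAZY_INO_ASSIGNED 0x1u   /* hypothetical "ino has been assigned" bit */
#define LAZY_WAIT_BUCKETS 4      /* hypothetical table size */

/* The inode carries no lock or wait queue of its own, only a flag bit. */
struct lazy_inode {
    unsigned long ino;
    unsigned flags;              /* protected by the hashed bucket lock */
};

struct lazy_bucket {
    pthread_mutex_t lock;
    pthread_cond_t wake;
};

/* Shared table of locks and wait points, used by all inodes. */
static struct lazy_bucket lazy_wait_table[LAZY_WAIT_BUCKETS] = {
    { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER },
    { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER },
    { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER },
    { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER },
};

/* Pick a bucket by hashing the object address. */
static struct lazy_bucket *lazy_bucket_for(const void *obj)
{
    uintptr_t h = (uintptr_t)obj;

    h ^= h >> 7;
    return &lazy_wait_table[h % LAZY_WAIT_BUCKETS];
}

/* Deferred back end: publish the inode number and wake any waiters. */
void lazy_set_ino(struct lazy_inode *inode, unsigned long ino)
{
    struct lazy_bucket *b = lazy_bucket_for(inode);

    pthread_mutex_lock(&b->lock);
    inode->ino = ino;
    inode->flags |= LAZY_INO_ASSIGNED;
    /* Waiters on other inodes sharing this bucket wake up too; they
     * recheck their own flag and go back to sleep. */
    pthread_cond_broadcast(&b->wake);
    pthread_mutex_unlock(&b->lock);
}

/* A stat-like caller: block until the inode number exists, then return it. */
unsigned long lazy_wait_ino(struct lazy_inode *inode)
{
    struct lazy_bucket *b = lazy_bucket_for(inode);
    unsigned long ino;

    pthread_mutex_lock(&b->lock);
    while (!(inode->flags & LAZY_INO_ASSIGNED))
        pthread_cond_wait(&b->wake, &b->lock);
    ino = inode->ino;
    pthread_mutex_unlock(&b->lock);
    return ino;
}
```

If the flag is already set, lazy_wait_ino returns immediately, which would be the common case for fstat on any file that has been through a sync. Build with -pthread when using this from more than one thread.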
Quota subsystem initialization and security hooks add further
complexity to the new_inode regimen for Ext2. I am not sure whether to
do those things in the front end or back end. Probably in the front
end, but as I did not look closely at this, I left them in the back end
for now.
Regards,
Daniel
-------------- next part --------------
A non-text attachment was scrubbed...
Name: defer.diff
Type: text/x-diff
Size: 4930 bytes
Desc: not available
URL: <http://phunq.net/pipermail/tux3/attachments/20081210/3e639a13/attachment-0001.diff>