Proposal: Persistent path

Tue Jan 1 03:41:05 PST 2013

Here is a very simple thing we can do to significantly speed up repeated 
accesses to nearby locations in the inode table. We already probe and edit 
btrees using a "path" abstraction, where each level of the path corresponds to 
a level of the btree. We can make the path persistent: simply do not pop the 
path when done with it (popping releases the blocks in the path). Instead, use 
the path traverse operation to move to the new location. This does a minimal 
number of pops and pushes, so it reuses the references to block buffers that 
the path already holds and avoids lookups in the volmap radix tree.
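
A rough sketch of the traverse step in C, to make the pop/push accounting 
concrete. This is not Tux3 code: the fixed-depth toy tree, struct ppath, 
block_for() and ppath_seek() are all hypothetical names made up for 
illustration, a block "id" stands in for a held buffer reference, and the 
lookups counter stands in for volmap radix tree lookups.

```c
#include <assert.h>
#include <stdint.h>

/* Toy tree (an assumption): 3 levels, 16 children per node, root is level 0. */
enum { DEPTH = 3, FANOUT_BITS = 4 };

struct level { uint64_t block; };   /* stands in for a held buffer reference */

struct ppath {
    struct level level[DEPTH];
    int held;                       /* number of valid levels, 0 means empty */
    unsigned long lookups;          /* simulated volmap lookups */
};

/* Toy id of the block at level lvl that covers key. */
static uint64_t block_for(uint64_t key, int lvl)
{
    return key >> (FANOUT_BITS * (DEPTH - lvl));
}

/* Move the persistent path to key and return the leaf block id. */
static uint64_t ppath_seek(struct ppath *p, uint64_t key)
{
    assert(key < (UINT64_C(1) << (FANOUT_BITS * DEPTH)));

    /* Keep every cached level, from the root down, that still covers key. */
    int keep = 0;
    while (keep < p->held && p->level[keep].block == block_for(key, keep))
        keep++;

    /* Pop the levels below that; a real path would release buffers here. */
    p->held = keep;

    /* Push the missing levels; each push costs one block lookup. */
    while (p->held < DEPTH) {
        p->level[p->held].block = block_for(key, p->held);
        p->held++;
        p->lookups++;
    }
    return p->level[DEPTH - 1].block;
}
```

In this toy model the first seek pays for every level, a second seek that 
lands in the same leaf costs no lookups at all, and a seek to a neighbouring 
leaf under the same parent costs one.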

This strategy would do a lot to close the theoretical performance gap between 
Tux3, with its separate directories and inode table, and filesystem designs 
that embed inodes directly in directory entries. Once an itable block has been 
read into cache, there would often be several accesses to it. These become 
lookups in the path object, which obtain the buffer object directly in a few 
nanoseconds.

When marshal updates the itable, it can just invalidate the persistent path 
object, and the front end will redo the volmap probes as necessary.
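
One possible way to do the invalidation, sketched in C under my own 
assumptions: a generation counter that the back end bumps on every itable 
update, which the front end compares before trusting its cached path. The 
proposal does not specify a mechanism, so the counter and the names 
itable_state, cached_path, marshal_update() and path_get() are all 
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

struct itable_state {
    unsigned generation;    /* bumped by the back end on every itable update */
};

struct cached_path {
    unsigned generation;    /* itable generation the path was built against */
    bool loaded;            /* false until the first probe */
    unsigned long probes;   /* count of full probes from the root */
};

/* Back end: an itable update makes every cached path stale. */
static void marshal_update(struct itable_state *it)
{
    assert(it);
    it->generation++;
}

/*
 * Front end: reuse the cached path if it is still current, otherwise redo
 * the probe. Returns true when the cached path was reused.
 */
static bool path_get(struct cached_path *p, const struct itable_state *it)
{
    assert(p && it);
    if (p->loaded && p->generation == it->generation)
        return true;
    p->generation = it->generation;   /* a real path would reprobe here */
    p->loaded = true;
    p->probes++;
    return false;
}
```

The back end never has to find or touch the path object itself in this 
variant; the front end notices the stale generation the next time it uses the 
path.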

There would still be some cases where embedded inodes could beat a separate 
inode table. For example, a single, isolated fstat would need two probes in 
Tux3, one in the directory and one in the inode table, but only one in some 
theoretical embedded inode design. However, in mass operations the difference 
would quickly disappear, and Tux3 can even win in some cases. For example, a 
design with separate directories and inode table may be considerably more 
compact than a combined design, so Tux3 might load fewer blocks in total. Also, 
Tux3 will definitely win in a load that only or mainly accesses the directory, 
such as:

   ls -l *foo*

Regards,

Daniel