[Tux3] Tux3 kernel port - about ext2_get_inode

Sat Aug 30 16:49:03 PDT 2008

On Tuesday 26 August 2008 22:42, Daniel Phillips wrote:
> Skeleton code for Tux3:
> 
>    inode = tux3_<various creates>
>       inode = tux3_create(sb, name, len, iattrs)
>          inode = new_inode(sb)
>          tux3_get_inode(inum, &buffer, iattrs)
>             <encode new attrs to buffer>
>          tux3_create_entry(dir, name, len, ...)
> 
>    inode = tux3_iget(sb, name, len)
>       inode = iget_locked(sb, ino)
>          <return cached inode unless inode is marked new>
>       inode = tux3_open(sb, name, len)
>          tux3_get_inode(inum, &buffer, NULL)
> 
>    tux3_isync(inode)
>       tux3_get_inode(inum, &buffer, NULL)
>          <decode saved attrs from buffer>
>          <compare to attrs in inode>
>          <encode changed attrs to buffer>
>       <sync buffer>
> 
> So tux3_get_inode is to combine functionality of three ext2 functions:
> ext2_new_inode, ext2_get_inode and ext2_update_inode.  This places the
> inode btree handling logic (lookup, create and update) in exactly one
> place, and also centralizes the attribute encoding and decoding, which
> is considerably more complex than Ext2's...

Actually, I ultimately factored these into inode_make, inode_open and
inode_save.  There turned out to be little code repeated between these
because helper functions do nearly all the work.  Attribute encoding
and decoding has evolved into a pleasantly regular form which looks to
be blindingly fast, consisting entirely of inline calls to libc endian
conversion macros that expand to just a handful of machine instructions
on common architectures.  Too bad gcc can't handle this common chore on
its own; that would be even better.  Sigh.  Anyway, unpacking/repacking
inode attributes now approaches the efficiency of Ext2/3, a pleasing
result.
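
To make the encoding style concrete, here is a minimal sketch of packing
and unpacking a few attributes as big-endian fields using the libc endian
macros from glibc's <endian.h>.  The attribute set, the field order and
every name in it (demo_attrs, encode_attrs, decode_attrs and the encodeN
and decodeN helpers) are assumptions made up for illustration, not the
actual Tux3 on-disk format or code.

```c
#define _DEFAULT_SOURCE /* ask glibc to declare htobe32() and friends */
#include <assert.h>
#include <endian.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical attribute set, NOT the real Tux3 inode layout. */
struct demo_attrs {
	uint16_t mode;
	uint32_t uid;
	uint32_t gid;
	uint64_t size;
};

enum { DEMO_ATTRS_BYTES = 2 + 4 + 4 + 8 };

/* Each encoder stores one big-endian field and returns the next position.
 * memcpy keeps unaligned stores legal and normally compiles to one move. */
static inline uint8_t *encode16(uint8_t *p, uint16_t v)
{
	uint16_t be = htobe16(v);
	memcpy(p, &be, sizeof be);
	return p + sizeof be;
}

static inline uint8_t *encode32(uint8_t *p, uint32_t v)
{
	uint32_t be = htobe32(v);
	memcpy(p, &be, sizeof be);
	return p + sizeof be;
}

static inline uint8_t *encode64(uint8_t *p, uint64_t v)
{
	uint64_t be = htobe64(v);
	memcpy(p, &be, sizeof be);
	return p + sizeof be;
}

/* Each decoder loads one big-endian field and returns the next position. */
static inline const uint8_t *decode16(const uint8_t *p, uint16_t *v)
{
	uint16_t be;
	memcpy(&be, p, sizeof be);
	*v = be16toh(be);
	return p + sizeof be;
}

static inline const uint8_t *decode32(const uint8_t *p, uint32_t *v)
{
	uint32_t be;
	memcpy(&be, p, sizeof be);
	*v = be32toh(be);
	return p + sizeof be;
}

static inline const uint8_t *decode64(const uint8_t *p, uint64_t *v)
{
	uint64_t be;
	memcpy(&be, p, sizeof be);
	*v = be64toh(be);
	return p + sizeof be;
}

/* Pack all attributes; the caller provides DEMO_ATTRS_BYTES of space. */
uint8_t *encode_attrs(uint8_t *p, const struct demo_attrs *a)
{
	assert(p && a);
	p = encode16(p, a->mode);
	p = encode32(p, a->uid);
	p = encode32(p, a->gid);
	return encode64(p, a->size);
}

/* Unpack in the same order the fields were packed. */
const uint8_t *decode_attrs(const uint8_t *p, struct demo_attrs *a)
{
	assert(p && a);
	p = decode16(p, &a->mode);
	p = decode32(p, &a->uid);
	p = decode32(p, &a->gid);
	return decode64(p, &a->size);
}
```

Because every helper returns the advanced pointer, the encode and decode
paths read as straight-line sequences with no offset arithmetic, which is
the kind of regular form described above.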

As I mentioned earlier, Ext2/3 have a slight advantage in being able to
compute the disk address of an inode table block directly rather than
traversing a btree to find it.  But that can be optimized in Tux3 by
keeping btree "cursors" that cache the result of previous btree probes
to take advantage of spatial locality in lookups, which is the common
case.  For a completely random access load, disk seeking will tend to
dominate anyway, and there will always be some amount of btree index
caching going on.  Just two levels of btree index gives us access to
about three million inodes, and both index levels will fit easily in
cache.  Three levels gives over a billion inodes and then we need to
be concerned mainly about keeping the terminal index nodes relatively
close to their parents, which lets the disk hardware combine seeks
efficiently.  Every filesystem is going to have to seek a lot in this
case.  The winner will be the one with the best disk layout policy,
and in the case of modern btree-based filesystems, the highest btree
branching factor.
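
The cursor idea can be sketched in a few lines.  The model below is a
deliberate simplification: a sorted array of leaf descriptors stands in
for the inode btree, a binary search stands in for a full btree probe,
and the cursor simply remembers the inode number range of the last leaf
it found.  All names (demo_leaf, demo_index, demo_cursor, cursor_lookup)
are hypothetical and none of this is taken from the Tux3 source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One leaf covers inode numbers from 'first' up to the next leaf's 'first'. */
struct demo_leaf {
	uint64_t first;	/* lowest inode number stored in this leaf */
	uint64_t block;	/* disk block holding the leaf */
};

/* Stand-in for the btree: leaves sorted by 'first'. */
struct demo_index {
	const struct demo_leaf *leaves;
	size_t count;
	unsigned long probes;	/* full searches performed, for illustration */
};

/* Cached result of the previous probe: leaf position and its range. */
struct demo_cursor {
	int valid;
	size_t pos;
	uint64_t lo, hi;	/* cached leaf covers [lo, hi) */
};

/* "Full probe": binary search for the last leaf with first <= inum. */
static size_t full_probe(struct demo_index *ix, uint64_t inum)
{
	size_t lo = 0, hi = ix->count;

	assert(ix->count > 0 && ix->leaves[0].first <= inum);
	ix->probes++;
	while (hi - lo > 1) {
		size_t mid = lo + (hi - lo) / 2;
		if (ix->leaves[mid].first <= inum)
			lo = mid;
		else
			hi = mid;
	}
	return lo;
}

/* Return the leaf block for inum, probing only when the cursor misses. */
uint64_t cursor_lookup(struct demo_index *ix, struct demo_cursor *c,
		       uint64_t inum)
{
	if (!c->valid || inum < c->lo || inum >= c->hi) {
		size_t pos = full_probe(ix, inum);

		c->pos = pos;
		c->lo = ix->leaves[pos].first;
		/* The last leaf is open-ended; UINT64_MAX itself always misses. */
		c->hi = pos + 1 < ix->count ? ix->leaves[pos + 1].first
					    : UINT64_MAX;
		c->valid = 1;
	}
	return ix->leaves[c->pos].block;
}
```

With leaves starting at inode numbers 0, 64 and 128, lookups of inodes 5,
6 and 63 cost one probe in total, and a probe is repeated only when a
lookup crosses into a different leaf.  A real cursor would cache the
whole path of index nodes rather than a single range, and would have to
be invalidated when the tree changes.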

Regards,

Daniel
