[Tux3] File index leaf format

Daniel Phillips phillips at phunq.net
Mon Jul 28 00:18:57 PDT 2008


In an earlier post I was seen fretting about how to reduce the 
relatively expensive linear search in a PHTree directory file dirent 
block to O(log(dirents)), while keeping the compactness and physical 
stability of the good old Ext dirent format.  It did not take too long 
to find one.

   struct tux3_dirent { unsigned inum:48, type:8, len:8; char name[]; };

Versus an Ext2 dirent, the two byte record size is gone and the inum
is increased from 32 bits to 48.  The fields are big-endian and the 
structure is byte-aligned, and therefore needs to be declared with 
attribute packed so that gcc will generate code that does not throw 
exceptions on alignment-challenged architectures when trying to pick up 
the 48 bit inum field.

A simple directory grows down from the top of the dirent block towards
the dirents:

   be_16 entries[];

Each big endian 16 bit entry is the offset of a dirent from the base of 
the block.  The entries are in lexically sorted order to support 
binsearch.  To maintain physical stability for telldir cookies, an 
entry is never moved once created.  Too bad about directory compaction, 
that is an offline operation.

To search for space within the dirent we make a copy of the directory 
and sort it.  In the base of the directory we record the size of the 
biggest contiguous free space in the dirent block in order to avoid 
most fruitless searches and the associated sort.  There is also a 
highwater mark to optimize the common case of dirent blocks that just 
fill up from bottom to top, with free space seldom created in the 
middle, and to know how far down the directory can grow before it
meets the dirents.

To delete an entry, just delete it from the directory and reduce the
count of entries.

The block header looks like:

   struct tux3_dirent_block { char magic[4]; unsigned count, high; };

The inum of the directory file may also be stored in the header to help
with fsck.

I am not in a big hurry to implement this, because the good old Ext2
dirent block format will serve quite nicely for now, and I can just cut 
and paste the code to implement that.  But when dirop speed rises to 
the top of this list of things that people are benchmarking, this 
improvement is ready and will measureably increase the speed of creates 
and random deletes.  Also, Tux3 is supposed to have 48 bit inode 
numbers, so the directory format has to support that.

Regards,

Daniel

_______________________________________________
Tux3 mailing list
Tux3 at tux3.org
http://tux3.org/cgi-bin/mailman/listinfo/tux3



More information about the Tux3 mailing list