[Tux3] File index leaf format
Daniel Phillips
phillips at phunq.net
Mon Jul 28 00:18:57 PDT 2008
In an earlier post I was seen fretting about how to reduce the
relatively expensive linear search in a PHTree directory file dirent
block to O(log(dirents)), while keeping the compactness and physical
stability of the good old Ext dirent format. It did not take too long
to find one.
struct tux3_dirent { unsigned inum:48, type:8, len:8; char name[]; };
Versus an Ext2 dirent, the two byte record size is gone and the inum
is increased from 32 bits to 48. The fields are big-endian and the
structure is byte-aligned, and therefore needs to be declared with
attribute packed so that gcc will generate code that does not throw
exceptions on alignment-challenged architectures when trying to pick up
the 48 bit inum field.
A simple directory grows down from the top of the dirent block towards
the dirents:
be_16 entries[];
Each big endian 16 bit entry is the offset of a dirent from the base of
the block. The entries are in lexically sorted order to support
binsearch. To maintain physical stability for telldir cookies, an
entry is never moved once created. Too bad about directory compaction,
that is an offline operation.
To search for space within the dirent we make a copy of the directory
and sort it. In the base of the directory we record the size of the
biggest contiguous free space in the dirent block in order to avoid
most fruitless searches and the associated sort. There is also a
highwater mark to optimize the common case of dirent blocks that just
fill up from bottom to top, with free space seldom created in the
middle, and to know how far down the directory can grow before it
meets the dirents.
To delete an entry, just delete it from the directory and reduce the
count of entries.
The block header looks like:
struct tux3_dirent_block { char magic[4]; unsigned count, high; };
The inum of the directory file may also be stored in the header to help
with fsck.
I am not in a big hurry to implement this, because the good old Ext2
dirent block format will serve quite nicely for now, and I can just cut
and paste the code to implement that. But when dirop speed rises to
the top of this list of things that people are benchmarking, this
improvement is ready and will measureably increase the speed of creates
and random deletes. Also, Tux3 is supposed to have 48 bit inode
numbers, so the directory format has to support that.
Regards,
Daniel
_______________________________________________
Tux3 mailing list
Tux3 at tux3.org
http://tux3.org/cgi-bin/mailman/listinfo/tux3
More information about the Tux3
mailing list