[Tux3] Encoding of extent information

Daniel Phillips phillips at phunq.net
Fri Oct 10 15:53:38 PDT 2008


On Wednesday 08 October 2008 01:02, Daniel Phillips wrote:
> On Wednesday 08 October 2008 00:26, Philip Pokorny wrote:
> > >  it is certainly worth thinking about.  There
> > > may be other benefits of the alignment restriction, such as helping
> > > control fragmentation.
> > It does make the fragmentation somewhat predictable.  For a given extent 
> > of a given size, the extent "next door" must be the same size or smaller 
> > if it's a candidate for merging the two into a larger extent.
> > 
> > > But does it not require more extents to
> > > represent odd sized files at the short end of the scale?
> > I suppose it requires log2(n) extents at most for any "odd" length of an 
> > extent.  Specifically the number of "1" bits in the binary 
> > representation is the number of extents required and their position 
> > tells you what length of extent is required.  So yes, in the general 
> > case, you would probably have more extents.  Given the predictability of 
> > the sizes, this might not be as much of an issue.  For a filesystem, the 
> > requirement that extents be a power of 2 in length might be a 
> > significant down-side to this scheme.
> 
> I haven't thought about it enough to know if it is a problem.  We win
> 5 bits and it costs some extra extent entries where it is necessary to
> represent extent sizes exactly.  At worst, this is fun to think about,
> and there might well be something useful there.

Hi Philip,

I thought about your idea some more and I am thinking there could be
some kind of win there.  Not only can we represent extents this way, we
can also do buddy allocation.  Why not manage each 128 MB allocation
region (mappable by one 4K block) as a buddy space?  The well known
advantages of a buddy allocator are rapid allocation and coalescing, and
the well known disadvantages, high internal fragmentation due to
exponential granularity and fragmentation due to limited coalescing
possibilities, can be ameliorated in Tux3 firstly by the ability to
represent an odd sized extent with multiple even sized extents (possibly
contiguous) and secondly by block migration.

The annoying issue is that pointer density decreases in the case of
small, odd-sized extents, the common case.  For large files, we do not
care very much how many extents are needed to represent the odd-sized
tail, but for a 7 block file, needing an additional three extents to
make up the odd size would cut our metadata density by a factor of four.

So I thought: suppose we grab a bit from the block address space and use
it to represent some odd sized allocations.  For example:

   Block alignment    Size if odd bit clear      Size if odd bit set

         1                     1                         3
         2                     2                         5
         4                     4                         6
         8                     8                         7

So with the help of the "odd" bit we can represent all sizes in the
range 1 - 8 blocks, with the implication that two or three successive
buddy blocks are allocated.  In other words, this odd bit does not
change the buddy allocation scheme, but just compresses the way we
encode extents.

So a 3 block extent includes the addressed single block and the buddy at
the next larger size.  A 5 block extent includes the addressed two block
extent, its buddy, and either the single block immediately above or
below, depending on wither the addressed block is even or odd.  And so on.

Above 8 blocks the scheme repeats at coarser granularity.  The first
size we cannot represent exactly is 9 blocks, which with uniform
distribution of file sizes from 0 to 9 blocks would be an 11% looser
representation, against which we save 5 bits of extent, nearly a tie.
But file size distribution is not uniform, it is strongly skewed towards
small files, so our dleaf wastage will be much lower in practice, maybe
around 1%.

I am really just playing around with your idea here.  For practical
reasons we need to keep pressing forwards with the simplest scheme that
can possibly win benchmarks soon enough to matter.  But we can afford
the luxury of thinking about how we might go about things more
efficiently in the future.

Regards,

Daniel

_______________________________________________
Tux3 mailing list
Tux3 at tux3.org
http://tux3.org/cgi-bin/mailman/listinfo/tux3



More information about the Tux3 mailing list