[Tux3] Q: inode numbering, and placement on-disk
Daniel Phillips
phillips at phunq.net
Wed Feb 25 01:12:58 PST 2009
On Wednesday 25 February 2009, Philipp Marek wrote:
> Hello Daniel,
>
> thank you for your answer.
>
> On Wednesday 25 February 2009, Daniel Phillips wrote:
> > On Wednesday 25 February 2009, Philipp Marek wrote:
> > > Now, if I understand Tux3 design correctly, it's no longer the case that
> > > the inode numbers have any meaning regarding the on-disk location
> > > (because of snapshots, versioning, and other strategies), so this hack
> > > (or "feature", if you like) wouldn't work anymore, would it?
> > Tux3 does in fact attach significance to inode numbers. Currently, the
> > file data allocation goal (a block number) is used as the inode number
> > goal, and Tux3 will assign the next available inode number after that
> > to a new file. In time we will improve this strategy to work well when
> > you come back and create a new file in a directory later, most probably
> > > by maintaining or computing an overall directory allocation goal based
> > on physical location of files already in the directory.
> That's good to hear.
>
> > As it is now, when you initially write a directory, inode numbers will
> > correspond spatially quite well to file data block ordering, but this
> > pristine condition will degenerate over time, which can be regarded as
> > an optimization bug that needs to be fixed.
>
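The inode numbering strategy quoted above can be sketched roughly as follows. This is only an illustration under stated assumptions, not Tux3 code: the names pick_inode_number, in_use and data_goal are hypothetical, "next available inode number after that" is read as "at or after the goal", and each file is assumed to take four data blocks.

```python
def pick_inode_number(in_use, data_goal):
    """Return the first free inode number at or after data_goal.

    Assumption: the data allocation goal (a block number) doubles as the
    inode number goal, and "next available" means "at or after the goal".
    """
    inum = data_goal
    while inum in in_use:
        inum += 1
    return inum


in_use = set()      # inode numbers already assigned (hypothetical model)
data_goal = 1000    # block number where the next file's data should go
created = []

# Initial write of a directory: the goal advances as data is laid out,
# so inode numbers follow the data block ordering.
for _ in range(3):
    inum = pick_inode_number(in_use, data_goal)
    in_use.add(inum)
    created.append(inum)
    data_goal += 4  # assume each file occupied four data blocks

# Coming back later with the old goal: the number at the goal is taken,
# so the new file gets the next free one, and ordering starts to drift.
late = pick_inode_number(in_use, 1000)
```

In this toy model the first three files get inode numbers that match their data goals, while the late file lands on the next free number after an already used goal, which is the gradual loss of ordering described above.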
> Yes; I now looked at http://kerneltrap.org/Linux/Tux3_Hierarchical_Structure,
> and I think I just misunderstood something (or remembered wrong facts).
>
> Now, IIUC, other volumes in the same filesystem get completely distinct inode
> numbers, and if some snapshot gets extended new inodes are used for that, too
> (and these inodes are close to their data again).
We dropped the multiple volume per physical volume feature, noting the
feature does not offer any benefit that cannot be delivered by a volume
manager. I am not sure what you mean about snapshots. New inodes can in
fact exist in one snapshot and not in another, which ought to be good for
some confusion when we extend layout optimization to cover that.
> Of course that means that their data gets heavily fragmented - some inodes are
> at the start of the device, others in the middle ...
> But that's no different from ext3 or whatever now.
Right, and we will open a general discussion on what to do about this at
this point. The basic idea is, when laying out files linearly near the
"original" goal is impossible, at least try to lay out related files
linearly at the new, far away place.
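One way to picture this fallback is the sketch below. It is a toy model, not the Tux3 allocator: the free map is a plain list of booleans, and find_free_run and place_related are hypothetical names. The assumption is that the whole group of related files is moved together to the first free run large enough to hold all of them, so they stay linear relative to each other even when they end up far from the original goal.

```python
def find_free_run(free, start, length):
    """Return the first index >= start that begins `length` free blocks."""
    run = 0
    for i in range(start, len(free)):
        run = run + 1 if free[i] else 0
        if run == length:
            return i - length + 1
    return None


def place_related(free, goal, sizes):
    """Lay out related files back to back, as near the goal as possible.

    Assumption: if the group does not fit at the goal, the whole group is
    placed at the next run big enough for all of it, searching forward
    from the goal first and then from the start of the volume.
    """
    total = sum(sizes)
    base = find_free_run(free, goal, total)
    if base is None:
        base = find_free_run(free, 0, total)
    if base is None:
        return None  # no single run can hold the group
    extents = []
    pos = base
    for size in sizes:
        for block in range(pos, pos + size):
            free[block] = False  # mark blocks as allocated
        extents.append((pos, size))
        pos += size
    return extents


free = [True] * 32
for block in range(4, 10):
    free[block] = False  # blocks 4..9 are already in use

# Goal is block 2, but only two free blocks remain there.
extents = place_related(free, goal=2, sizes=[3, 2, 3])
```

Here the three files cannot fit at the goal, so they are placed together as (start, length) extents beginning at block 10 instead of being scattered into whatever small gaps sit near the goal.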
> Do you know of any analysis that shows whether directories and
> inodes should be kept closer together than inodes and their file data?
I have thought about this off and on. I think it is more important to
keep inode table blocks near the data than near the directory entries
because directory readahead is easy, and relatively many entries are
packed into each dirent block so seeking to a far away place to pick
out an out of cache dirent block is not very painful. Having learned
an inode number from a (cached) dirent block, the disk then goes off
to pick up the inode table block and file data. How nice it would be
if those are often on the same track, so the disk slurps them up into
its track cache at the same time.
An additional variable which I have not thought deeply about is, what
about inode table readahead? It sounds like a good idea, doesn't it?
What that does to the optimization equation is, it makes us want to
store inode table blocks in bursty groups so we can pick up several
with a single seek. In that case, the importance of being near the
data is diminished because a high portion of inode lookups will hit
cache.
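The effect of storing inode table blocks in groups can be illustrated with a very small model. This is a sketch under simplifying assumptions, not a measurement: count_seeks is a hypothetical helper, every cache miss is assumed to cost exactly one seek, a miss reads the whole aligned group of burst inode table blocks, and the cache never evicts anything.

```python
def count_seeks(lookups, burst):
    """Count seeks needed to serve a sequence of inode table block lookups.

    Assumptions: a miss costs one seek and pulls in the aligned group of
    `burst` consecutive inode table blocks; the cache is unbounded.
    """
    cached = set()
    seeks = 0
    for block in lookups:
        if block in cached:
            continue  # served from cache, no seek
        seeks += 1
        first = (block // burst) * burst
        cached.update(range(first, first + burst))
    return seeks


lookups = list(range(64))                 # e.g. walking a freshly written directory
no_readahead = count_seeks(lookups, 1)    # one block per seek
with_readahead = count_seeks(lookups, 8)  # eight blocks per seek
hit_ratio = 1 - with_readahead / len(lookups)
```

In this model, groups of eight blocks cut the seek count from 64 to 8, so most inode lookups hit cache and the distance between an inode table block and its file data matters correspondingly less.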
Regards,
Daniel
_______________________________________________
Tux3 mailing list
Tux3 at tux3.org
http://mailman.tux3.org/cgi-bin/mailman/listinfo/tux3