[Tux3] Atom table i_size issue
Daniel Phillips
phillips at phunq.net
Thu Dec 4 12:04:09 PST 2008
There is a slight disconnect between Tux3's userspace way of going about
things and the kernel's, which has left kernel xattr support in a broken
state at the moment. Rather than just fixing it, I thought I would
write about it. Then fix it. Somehow, writing about such things seems
to help clear the mind, and more often than not somebody has a better
idea, so here goes.
The issue is, the vfs block library refuses to read, write or map file
space above inode->i_size (cached file size) while Tux3 will happily do
so in userspace. The kernel is right on this one, because its
semantics are expected by the user and ordained by POSIX: reading above
file size gives a short read. Since the user cannot retrieve any data
past the end of file, it makes no sense to write any there, or for the
filesystem to allocate and index it.
However, Tux3 does happily write way up high in at least one of its
internal files, the atom table. The atom table is in fact a directory
file, with two tables written way up high: one to count atom references
and the other to reverse map atom numbers back to directory entries.
These tables are written above the dir->i_size, which is used by the
directory operations (essentially Ext2's directory code) to know how
many blocks a directory has, thus when to stop searching. A new
directory block is added simply by increasing the dir->i_size.
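
The conflict can be modelled in a few lines. This is only a toy, not
the Tux3 code: struct toy_inode, the 4 KiB block size and the
TABLE_BASE_BLOCK position are all assumptions picked for illustration.
It shows the directory code deriving its block count from i_size, and a
kernel-style check that only blocks below i_size are reachable.

```c
#include <assert.h>
#include <stdint.h>

enum { BLOCK_BITS = 12 };                     /* assumed 4 KiB blocks */
#define TABLE_BASE_BLOCK ((uint64_t)1 << 28)  /* invented "way up high" block */

struct toy_inode {
    uint64_t i_size;  /* cached file size in bytes */
};

/* Directory code: how many dirent blocks to search, derived from i_size. */
uint64_t dir_block_count(const struct toy_inode *dir)
{
    return dir->i_size >> BLOCK_BITS;
}

/* Directory code: adding a dirent block is just an i_size bump.
 * Returns the index of the new block. */
uint64_t dir_add_block(struct toy_inode *dir)
{
    uint64_t block = dir_block_count(dir);

    /* Directory files are assumed to be a whole number of blocks. */
    assert(dir->i_size % ((uint64_t)1 << BLOCK_BITS) == 0);
    dir->i_size += (uint64_t)1 << BLOCK_BITS;
    return block;
}

/* Kernel-style rule: a block is reachable only if it lies below i_size. */
int block_below_size(const struct toy_inode *inode, uint64_t block)
{
    return block < dir_block_count(inode);
}
```

With a small i_size the table block fails block_below_size(). Raise
i_size far enough to cover it and the check passes, but
dir_block_count() then reports hundreds of millions of dirent blocks to
search.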
Hirofumi's partial fix was to set the atom table file size very high,
which works fine for the two tables up high, but breaks the directory
operations by making them think they are working on terabytes worth of
dirent blocks. So this is what needs to be fixed.
It is cute to be able to reuse the directory code to operate the atom
table, and the slight refactoring that had to be done to allow this did
not hurt the normal vfs usage at all. But now we have to do something
if we want to keep using the code in this dual way, or maybe just admit
that this is not quite the right tool for the job and give xattrs their
own custom directory code. After all, xattrs only use create, delete
and lookup.
Another option is to generalize the directory operation interface a bit,
perhaps passing a pointer to inode->i_size instead of having the
directory operation go delving into the inode itself. This is probably
what we are going to do, but it does add another parameter to several
operations, only needed to make the atom table work, so I thought I
would introspect a little bit about this first.
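
A rough sketch of what that interface change might look like, with
made-up names throughout. In particular, keeping the atom table's
dirent size in a separate dirent_size field is my assumption for the
sake of the example; the only fixed idea is that the directory
operation receives a pointer to the size rather than reading it from
the inode.

```c
#include <assert.h>
#include <stdint.h>

enum { BLOCK_BITS = 12 };  /* assumed 4 KiB blocks */

struct toy_inode {
    uint64_t i_size;
};

/* Hypothetical atom table: i_size may be set high to cover the tables,
 * while the dirent area is tracked in its own field. */
struct toy_atable {
    struct toy_inode inode;
    uint64_t dirent_size;  /* bytes of dirent blocks only */
};

/* Directory operations take the size by pointer instead of delving
 * into the inode. */
uint64_t dir_block_count(const uint64_t *size)
{
    return *size >> BLOCK_BITS;
}

/* Append one dirent block; returns the index of the new block. */
uint64_t dir_add_block(uint64_t *size)
{
    uint64_t block = dir_block_count(size);

    assert(*size % ((uint64_t)1 << BLOCK_BITS) == 0);
    *size += (uint64_t)1 << BLOCK_BITS;
    return block;
}
```

An ordinary directory would pass &inode->i_size, the atom table would
pass &atable->dirent_size, and the extra parameter is the cost paid by
every caller.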
A third option is to have the directory operations use ->i_blocks
instead of ->i_size, which would have the nice effect of reducing some
of the shifting going on in the directory operations to convert
between ->i_size and blocks. Directory files are never sparse in the
Ext2 model, and they never shrink except on rmdir, so the actual file
size only needs to be updated in two interface functions (in the new
kernel-only namei.c file). We would then take care not to update
the ->i_blocks field for blocks above ->i_size. On balance, this would
probably give the shortest code, so I don't know, maybe this is the
right thing to do. It would need suitable warning comments on how to
treat ->i_blocks, which we are not updating at all right now, though we
should.
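
For comparison, a toy version of the ->i_blocks idea. Everything here
is hypothetical: the names are invented, i_blocks is treated as a plain
count of dirent blocks (the kernel conventionally counts that field in
512-byte units, which a real patch would have to reconcile), and the
atom table is assumed to keep a high i_size so that the tables stay
reachable.

```c
#include <assert.h>
#include <stdint.h>

enum { BLOCK_BITS = 12 };  /* assumed 4 KiB blocks */

struct toy_inode {
    uint64_t i_size;    /* bytes */
    uint64_t i_blocks;  /* in this sketch: number of dirent blocks only */
};

/* Directory operation: no shifting, the block count is read directly. */
uint64_t dir_block_count(const struct toy_inode *dir)
{
    return dir->i_blocks;
}

/* Directory operation: append one dirent block, return its index.
 * Note that i_size is not touched here. */
uint64_t dir_add_block(struct toy_inode *dir)
{
    return dir->i_blocks++;
}

/* Interface-level helper for ordinary directories: bring i_size back
 * in step with the block count. */
void dir_sync_size(struct toy_inode *dir)
{
    assert(dir->i_blocks <= UINT64_MAX >> BLOCK_BITS);  /* no overflow */
    dir->i_size = dir->i_blocks << BLOCK_BITS;
}
```

An atom table with a very large i_size still reports only its real
dirent blocks through dir_block_count(), as long as nothing bumps
i_blocks when the tables up high are written.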
Regards,
Daniel