[Tux3] Atom table i_size issue

Daniel Phillips phillips at phunq.net
Thu Dec 4 12:04:09 PST 2008


There is a slight disconnect between Tux3's userspace way of going about 
things and the kernel's, which has left kernel xattr support in a broken 
state at the moment.  Rather than just fixing it, I thought I would 
write about it, then fix it.  Somehow, writing about such things seems 
to help clear the mind, and more often than not somebody has a better 
idea, so here goes.

The issue is that the vfs block library refuses to read, write or map 
file space above inode->i_size (cached file size), while Tux3 will 
happily do so in userspace.  The kernel is right on this one, because 
its semantics are expected by the user and ordained by POSIX: reading 
above file size gives a short read.  Since the user cannot retrieve any 
data past the end of file, it makes no sense to write any there, or for 
the filesystem to allocate and index it.
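
To make the short read rule concrete, here is a small userspace sketch 
using only standard POSIX calls.  The helper name file_read_at() and the 
temporary file path are made up for illustration and have nothing to do 
with Tux3 itself.  It writes a few bytes to a scratch file and then 
reports what pread() returns for a given offset.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for pread() and mkstemp() */
#endif
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Illustrative helper (not Tux3 code): create a scratch file holding
 * `len` bytes, then ask for `want` bytes starting at byte offset `pos`.
 * Returns whatever pread() returned, or -1 if the setup failed.
 */
static long file_read_at(size_t len, off_t pos, size_t want)
{
    char path[] = "/tmp/shortread_XXXXXX";
    char data[256], buf[256];
    long got = -1;
    int fd;

    if (len > sizeof data || want > sizeof buf)
        return -1;
    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                 /* file goes away when fd is closed */

    memset(data, 'x', len);
    if (write(fd, data, len) == (ssize_t)len)
        got = (long)pread(fd, buf, want, pos);
    close(fd);
    return got;
}
```

With a 10 byte file, asking for 100 bytes at offset 0 comes back with 
just the 10 bytes that exist, and asking at offset 10 or anywhere higher 
comes back with 0 bytes, which is the end of file indication.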

However, Tux3 does happily write way up high in at least one of its 
internal files, the atom table.  The atom table is in fact a directory 
file, with two tables written way up high: one to count atom references 
and the other to reverse map atom numbers back to directory entries.  
These tables are written above the dir->i_size, which is used by the 
directory operations (essentially Ext2's directory code) to know how 
many blocks a directory has, thus when to stop searching.  A new 
directory block is added simply by increasing the dir->i_size.

Hirofumi's partial fix was to set the atom table file size very high, 
which works fine for the two tables up high, but breaks the directory 
operations by making them think they are working on terabytes' worth of 
dirent blocks.  So this is what needs to be fixed.
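
The two halves of the dilemma can be modelled in a few lines.  This is 
only a toy under stated assumptions: the 4K block size, the TABLE_BASE 
offset and the names toy_inode, dirent_blocks() and may_map() are all 
invented for the illustration, and the real offsets of the two tables 
are not given here.

```c
#include <stdbool.h>
#include <stdint.h>

enum { BLOCKBITS = 12 };                 /* assumed 4K blocks */
#define BLOCKSIZE  (UINT64_C(1) << BLOCKBITS)
#define TABLE_BASE (UINT64_C(1) << 40)   /* hypothetical byte offset of the high tables */

/* Stand-in for the one inode field that matters here. */
struct toy_inode {
    uint64_t i_size;                     /* cached file size in bytes */
};

/* Ext2-style bound: how many dirent blocks a lookup would search. */
static uint64_t dirent_blocks(const struct toy_inode *dir)
{
    return dir->i_size >> BLOCKBITS;
}

/* Model of the block library rule: nothing at or above i_size gets mapped. */
static bool may_map(const struct toy_inode *inode, uint64_t block)
{
    uint64_t limit = (inode->i_size + BLOCKSIZE - 1) >> BLOCKBITS;

    return block < limit;
}
```

With i_size covering three dirent blocks, dirent_blocks() is 3 but 
may_map() rejects the table block at TABLE_BASE.  Raise i_size just past 
TABLE_BASE and the table block maps, but dirent_blocks() jumps to over 
268 million blocks, a terabyte of imaginary dirents at this block size.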

It is cute to be able to reuse the directory code to operate the atom 
table, and the slight refactoring that had to be done to allow this did 
not hurt the normal vfs usage at all.  But now we have to do something 
if we want to keep using the code in this dual way, or maybe just admit 
that this is not quite the right tool for the job and give xattrs their 
own custom directory code.  After all, xattrs only use create, delete 
and lookup.

Another option is to generalize the directory operation interface a bit, 
perhaps passing a pointer to inode->i_size instead of having the 
directory operation go delving into the inode itself.  This is probably 
what we are going to do, but it does add another parameter to several 
operations, only needed to make the atom table work, so I thought I 
would introspect a little bit about this first.
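
As a rough sketch of what passing the size by pointer could look like, 
here is a toy directory with fixed-size name slots.  Everything in it 
(toy_file, toy_lookup(), toy_create(), the slot layout) is hypothetical 
and much simpler than the real Ext2-style dirent code.  The only point 
is that the operations take a pointer to a size and never look inside an 
inode.

```c
#include <stdint.h>
#include <string.h>

enum {
    SLOTSIZE = 16,                /* bytes per name slot, including NUL */
    SLOTS_PER_BLOCK = 4,
    BLOCKSIZE = SLOTSIZE * SLOTS_PER_BLOCK,
    MAXBLOCKS = 8,
};

/* Toy backing store standing in for the cached blocks of one file. */
struct toy_file {
    char data[MAXBLOCKS][SLOTS_PER_BLOCK][SLOTSIZE];
};

/* Search only the blocks below *size.  Returns the slot number or -1. */
static int toy_lookup(struct toy_file *file, const uint64_t *size, const char *name)
{
    unsigned blocks = (unsigned)(*size / BLOCKSIZE);

    if (!name[0] || strlen(name) >= SLOTSIZE)
        return -1;
    for (unsigned b = 0; b < blocks; b++)
        for (unsigned s = 0; s < SLOTS_PER_BLOCK; s++)
            if (!strcmp(file->data[b][s], name))
                return (int)(b * SLOTS_PER_BLOCK + s);
    return -1;
}

/* Use a free slot below *size, or extend *size by one block. */
static int toy_create(struct toy_file *file, uint64_t *size, const char *name)
{
    unsigned blocks = (unsigned)(*size / BLOCKSIZE);

    if (!name[0] || strlen(name) >= SLOTSIZE)
        return -1;
    for (unsigned b = 0; b < blocks; b++)
        for (unsigned s = 0; s < SLOTS_PER_BLOCK; s++)
            if (!file->data[b][s][0]) {
                strcpy(file->data[b][s], name);
                return (int)(b * SLOTS_PER_BLOCK + s);
            }
    if (blocks == MAXBLOCKS)
        return -1;                /* toy store is full */
    memset(file->data[blocks], 0, sizeof file->data[blocks]);
    strcpy(file->data[blocks][0], name);
    *size += BLOCKSIZE;           /* a new block is added by growing the size */
    return (int)(blocks * SLOTS_PER_BLOCK);
}
```

A normal directory would pass the address of its own i_size, while the 
atom table would pass the address of a separate dirent size kept 
somewhere else, leaving its i_size free to be set as high as the two 
tables need.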

A third option is to have the directory operations use ->i_blocks 
instead of ->i_size, which would have the nice effect of reducing some 
of the shifting going on in the directory operations to convert 
between ->i_size and blocks.  Directory files are never sparse in the 
Ext2 model, and they never shrink except on rmdir, so the actual file 
size only needs to be updated in two interface functions (in the new 
kernel-only namei.c file).  We would then take care not to update 
the ->i_blocks field for blocks above ->i_size.  On balance, this would 
probably give the shortest code, so I don't know, maybe this is the 
right thing to do.  It would need suitable warning comments on how to 
treat ->i_blocks, which we are not updating at all right now, though we 
should.
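
A minimal model of the third option, again with invented names 
(toy_inode, toy_dir_append(), toy_table_write(), toy_sync_size()) and an 
assumed 4K block size.  The toy i_blocks here is a plain count of 
filesystem blocks holding dirents; whatever units the real kernel field 
uses are not modelled.

```c
#include <stdint.h>

enum { BLOCKBITS = 12 };          /* assumed 4K blocks */

struct toy_inode {
    uint64_t i_size;              /* bytes, only touched at the interface */
    uint64_t i_blocks;            /* toy count of dirent blocks */
};

/* Directory operations bound their search by this, with no shifting. */
static uint64_t toy_dir_blocks(const struct toy_inode *dir)
{
    return dir->i_blocks;
}

/* Appending a dirent block bumps the count and returns the new block index. */
static uint64_t toy_dir_append(struct toy_inode *dir)
{
    return dir->i_blocks++;
}

/*
 * Writing one of the high tables: deliberately leaves i_blocks alone,
 * so the dirent search bound does not move.  The block index is unused
 * in this toy because no data is actually stored.
 */
static void toy_table_write(struct toy_inode *dir, uint64_t block)
{
    (void)dir;
    (void)block;
}

/* What an interface function would do to keep i_size in step. */
static void toy_sync_size(struct toy_inode *dir)
{
    dir->i_size = dir->i_blocks << BLOCKBITS;
}
```

After two appends and any number of table writes, toy_dir_blocks() is 
still 2, and toy_sync_size() sets i_size to 8192.  For the atom table 
the sync step would simply be skipped, or i_size set by other means.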

Regards,

Daniel
