[Tux3] VFS -Tux3 Univ

Pranith Kumar bobby.prani at gmail.com
Wed Sep 17 04:16:44 PDT 2008


<MaZe> ugh, oh right, what was the homework?
<flips> read the superblock? ;-)
<RazvanM> flips: homework is: know how the root dir is loaded and
initialized, and how that differs from how any other inode is opened
<flips> it was about loading the root directory
2008-09-16 20:00 -!- pranith(7aa040b1 at webchat.mibbit.com) has joined #tux3
<flips> and what did we find?
<MaZe> that it gets loaded explicitly
<flips> because...
<flips> because dir lookup doesn't work
<MaZe> well it's the mount point
<flips> because there is no dir to look up in
<MaZe> root of the tree and all that
<RazvanM> ACTION is searching for s_root...
<flips> so we have to open the root dir "manually", using functionality that
normally gets called by something like ext2_lookup
<flips> not quite that function
<flips> anyway
<flips> we're starting somewhere different today
<flips> because maze wants to go faster ;)
<flips> so let's go to sys_write
<MaZe> I'm guiltless - I tell you...
<RazvanM> http://lxr.linux.no/linux+v2.6.26.5/fs/dcache.c#L1062
<RazvanM> ok ok ok ok
<MaZe> I think we killed lxr
<flips> seems
<flips> next time I'll go there before I announce the destination ;)
<RazvanM> http://lxr.linux.no/linux+v2.6.26.5/fs/read_write.c#L370
<MaZe> http://lxr.linux.no/linux+v2.6.26.5/fs/read_write.c#L370
<RazvanM> it works from here
<pranith> works here too
<MaZe> Razvan's always faster ;-)
<flips> ok, who wants to walk down into it?
<flips> instead of me this time?
<flips> seems to me, razvanm does that pretty well
<flips> you know the first few layers
<flips> it's just the same idea as sys_open
<RazvanM> ACTION doesn't know too much about fs yet :(
<flips> you know how to poke down into a syscall though
<MaZe> file_pos_read and file_pos_write are probably to fetch and store the
current file offset
<flips> just keep clicking until you see something that isn't obvious
<flips> let's look at those
<MaZe> fget_light and fput_light must be fd to struct file lookup with
locking
<MaZe> so all that's left is vfs_write
<flips> pretty simple (file_pos_read/write)
<MaZe> which was kind of obvious to begin with ;-)
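
The call shape MaZe reads off sys_write above (look up the file, fetch the offset, call vfs_write, store the offset back) can be mimicked in user space. This is only a toy sketch: every `toy_` name is hypothetical, the fd table is a fixed array, and the reference counting that fget_light/fput_light provide is reduced to a comment.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for struct file: a current offset plus a small backing buffer. */
struct toy_file {
    long long f_pos;
    char data[64];
};

/* Hypothetical fd table; the kernel's fget_light() does the real lookup. */
static struct toy_file toy_fd_table[4];

static struct toy_file *toy_fget(int fd)
{
    if (fd < 0 || fd >= 4)
        return NULL;
    return &toy_fd_table[fd];
}

/* Stand-in for vfs_write(): write at *pos, then advance *pos. */
static long toy_vfs_write(struct toy_file *file, const char *buf,
                          size_t count, long long *pos)
{
    size_t room;

    if (*pos < 0 || (size_t)*pos >= sizeof(file->data))
        return 0;
    room = sizeof(file->data) - (size_t)*pos;
    if (count > room)
        count = room;
    memcpy(file->data + *pos, buf, count);
    *pos += (long long)count;
    return (long)count;
}

/* Same outline as sys_write: lookup, read offset, write, store offset. */
static long toy_sys_write(int fd, const char *buf, size_t count)
{
    long ret = -EBADF;
    struct toy_file *file = toy_fget(fd);

    if (file) {
        long long pos = file->f_pos;                 /* file_pos_read  */
        ret = toy_vfs_write(file, buf, count, &pos);
        file->f_pos = pos;                           /* file_pos_write */
        /* fput_light() would drop the file reference here */
    }
    return ret;
}
```

Two calls of `toy_sys_write(1, ...)` in a row append one after the other, because the offset is carried in the file object between calls rather than passed by the caller.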
<flips> I don't know why they're even abstracted
<flips> fget/put_light are demented
<flips> two of the most subtle and demented functions in the entire kernel
<flips> don't worry about them today ;)
<flips> they were conceived by a vile and twisted mind, and get to live
because they are fast
<MaZe> what's demented about them?
<flips> heh
<flips> later
<flips> really
<flips> google if you must
<RazvanM> http://lxr.linux.no/linux+v2.6.26.5/fs/read_write.c#L313
<MaZe> ok, that's vfs_write
<flips> suffice to say that they keep our file from disappearing while we
are writing to it
<flips> it would be bad otherwise
<MaZe> right - locking
<flips> razvanm, good, and what do you see there?
<MaZe> a bunch of permission checks
<MaZe> and then a f_op->write call
<RazvanM> f_op->write if exists
<flips> typical, right?
<MaZe> provided it's available
<flips> what you don't see is any locks being taken
<RazvanM> or do_sync_write otherwise
<flips> there is _very little locking_ in this path
<flips> helping make it fast
<MaZe> and a cute inc_syscw
2008-09-16 20:09 -!- kbingham(~kbingham at 92.8.217.48) has joined #tux3
<flips> the consequence of that is, the filesystem can be hit in a very
parallel way
<RazvanM> what is rw_verify_area?
<MaZe> probably locking
<flips> sometimes in ways that don't make sense, or are from buggy, racy
applications, and the filesystem has to do something reasonable
<flips> i.e., not crash and not corrupt
<flips> rw_verify_area... hmm
<MaZe> as in byte-range locks
<flips> newish thing
<flips> no sorry
<flips> it's implementing flock
<flips> bad name
<flips> very
<RazvanM> http://lxr.linux.no/linux+v2.6.26.5/fs/read_write.c#L196
<flips> we don't care about it really
<MaZe> I'd guess it checks no-one else has locked the area we're about to
write to
<flips> normally nobody uses flock
<flips> crufty old baggage
<flips> more interesting that selinux has a hook there
<- typical selinux hook
<pranith> flips: inc_syscw.. tsk->syscw++
<flips> but this is not really interesting, let's pop back out and go deeper
<MaZe> that's a generic security hook though right?
<flips> yes
<flips> I forget what we call the generic harness
<- back here
<flips> next we see that meme again
<flips> our fs can either completely replace the write logic with its own,
or the vfs will supply a basic framework and call lower level methods in the
fs
<flips> if (file->f_op->write)
<flips>         ret = file->f_op->write(file, buf, count, pos);
<flips> very few fs's will use this hook
<pranith> i thought we were supposed to use the vfs framework...
<RazvanM> http://lxr.linux.no/linux+v2.6.26.5/fs/read_write.c#L288
<flips> almost all continue on down into do_sync_write
<flips> which is still the vfs
<flips> most filesystems don't want to have the responsibility of doing all
the things the vfs is about to do now
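
The "replace it or let the vfs do it" pattern flips describes can be sketched with a table of function pointers. This is a user-space toy, not the kernel code: the `toy_` names and the `path_taken` field are invented for illustration, and the permission checks and rw_verify_area call that precede the dispatch are left out.

```c
#include <errno.h>
#include <stddef.h>

struct toy_file;

typedef long (*toy_write_fn)(struct toy_file *file, const char *buf,
                             size_t count, long long *pos);

struct toy_file_operations {
    toy_write_fn write;      /* optional: fs replaces the write logic itself */
    toy_write_fn aio_write;  /* optional: reached through the sync wrapper   */
};

struct toy_file {
    const struct toy_file_operations *f_op;
    int path_taken;          /* demo only: 1 = ->write, 2 = sync wrapper */
};

/* Stand-in for a generic library write that most filesystems plug in. */
static long toy_generic_aio_write(struct toy_file *file, const char *buf,
                                  size_t count, long long *pos)
{
    (void)file;
    (void)buf;
    *pos += (long long)count;
    return (long)count;
}

/* Stand-in for a filesystem that supplies its own ->write. */
static long toy_custom_write(struct toy_file *file, const char *buf,
                             size_t count, long long *pos)
{
    (void)buf;
    file->path_taken = 1;
    *pos += (long long)count;
    return (long)count;
}

/* Stand-in for do_sync_write(): hand the request to ->aio_write. */
static long toy_do_sync_write(struct toy_file *file, const char *buf,
                              size_t count, long long *pos)
{
    file->path_taken = 2;
    return file->f_op->aio_write(file, buf, count, pos);
}

/* Dispatch: use ->write if the fs provides it, else the sync wrapper. */
static long toy_vfs_write(struct toy_file *file, const char *buf,
                          size_t count, long long *pos)
{
    if (!file->f_op || (!file->f_op->write && !file->f_op->aio_write))
        return -EINVAL;
    if (file->f_op->write)
        return file->f_op->write(file, buf, count, pos);
    return toy_do_sync_write(file, buf, count, pos);
}
```

A file whose table only fills in `aio_write` ends up on path 2, which is the case the log says almost every filesystem takes.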
2008-09-16 20:16 -!- amey(~amey at 116.73.35.180) has joined #tux3
<- do_sync_write
<flips> so, internally the kernel is kind of aio oriented
<flips> asynchronous IO
<flips> and synchronous IO is just a shell around it of the form "start an
IO op; wait on a wait queue until it's done"
<flips> we see that here
<flips> very simple... if you don't poke into the details
<flips> we will, but later
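
The "start an IO op, then wait until it is done" shell can be modelled in a few lines. The sketch below is single-threaded and purely illustrative: `toy_iocb` loosely plays the role of the request object, `toy_device_tick` stands in for the completion path, and the loop in `toy_sync_write` spins where the kernel would put the task to sleep on a wait queue. None of these names come from the kernel.

```c
#include <stddef.h>

/* Toy request object: what was asked for, and whether it has completed. */
struct toy_iocb {
    int done;
    long result;
    size_t count;
};

/* One-slot "device queue" holding the request in flight, if any. */
static struct toy_iocb *toy_pending;

/* Asynchronous start: queue the request and return immediately. */
static void toy_aio_start(struct toy_iocb *req, size_t count)
{
    req->done = 0;
    req->result = 0;
    req->count = count;
    toy_pending = req;
}

/* Stand-in for the completion path: finish whatever is in flight. */
static void toy_device_tick(void)
{
    if (toy_pending) {
        toy_pending->result = (long)toy_pending->count;
        toy_pending->done = 1;
        toy_pending = NULL;
    }
}

/* Synchronous shell: start the async op, then wait for completion. */
static long toy_sync_write(size_t count)
{
    struct toy_iocb req;

    toy_aio_start(&req, count);
    while (!req.done)        /* the kernel would sleep here, not spin */
        toy_device_tick();
    return req.result;
}
```

The asynchronous entry point returns before any work is done; only the wrapper adds the waiting, which is why one code path can serve both kinds of caller.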
<MaZe> http://lxr.linux.no/linux+v2.6.26.5/fs/ext2/file.c#L50
<MaZe> http://lxr.linux.no/linux+v2.6.26.5/mm/filemap.c#L2487
<flips> so now... we lose the trail
<flips> because the vfs calls the real write action through a variable
<flips> any suggestions how we can pick up that trail again?
<RazvanM> aio_write :P
<flips> filp->f_op->aio_write
<flips> right
<flips> we can grep the entire kernel for it
<flips> or we can go back to ext2/inode.c
<flips> where I know it is ;)
<flips> let's do that
<MaZe> http://lxr.linux.no/linux+v2.6.26.5/mm/filemap.c#L2364
<flips> you're getting ahead ;)
<flips> let's see how we get there
<flips> and I was wrong about the file
<flips> http://lxr.linux.no/linux+v2.6.26.5/fs/ext2/file.c#L50
<RazvanM> interesting
<MaZe> ?
<flips> now we see that ext2 just fills that in with a generic function
<flips> that maze already found
<flips> so let's click on it and go to filemap
<RazvanM> even this fs is not interested in implementing it :D
<flips> that's right
<flips> ext2 mostly lets the vfs do everything for it
<flips> and it's still 7,500 lines long
<flips> worth considering what's in those 7,500 lines
<flips> keep in mind that the VFS was essentially created just by taking a
functioning filesystem and chopping it in half
<flips> the top half, which became the vfs
<flips> and the bottom half, which is a bunch of specific methods for doing
things like figuring out the position of a block on disk
<MaZe> and the bottom half which became the fs drivers
<RazvanM> ext2 should still have something to say about the write...
<flips> which became ext2 and all its friends
<MaZe> might not
<MaZe> ext2 is not journaled
<MaZe> might just have a get_disk_block(file, offset)
<flips> ext2 is happy to let the vfs take over completely here, but of
course, the vfs will come back to ext2 at some point
<pranith> why not ext3?
<MaZe> and allocate/free_disk_block
<flips> we will get there in about 5-10 minutes
<pranith> ok
<flips> for comparison, you could look at ext3/file.c
<flips> let's do that later
<flips> http://lxr.linux.no/linux+v2.6.26.5/+code=generic_file_aio_write
<flips> http://lxr.linux.no/linux+v2.6.26.5/fs/ext2/file.c#L50
<MaZe> ext2 is not journaled - so each file is just a read/write collection
of blocks on disk
<flips> even ext3 doesn't normally journal data
<MaZe> so all you need is the ability to look up a given file/offset's block
location on disk and you can read/write just fine
<MaZe> but it can...
<RazvanM> next step: http://lxr.linux.no/linux+v2.6.26.5/mm/filemap.c#L2364
<flips> yes, and so it must supply different methods for its different
journalling options
<MaZe> http://lxr.linux.no/linux+v2.6.26.5/fs/ext3/file.c#L113
<flips> not *must*, but that is what it does
<MaZe>  113        .aio_read       = generic_file_aio_read,
<MaZe> 114        .aio_write      = ext3_file_write,
<MaZe> so ext3 has its own write, but uses the generic read
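
The two method tables being compared can be imitated in user space. In this sketch only the shape follows the log (ext2 plugs in the generic write, ext3 plugs in its own write but the generic read). The rest is assumption: the ext2 read entry, the idea that the ext3 write wraps the generic one, and the `toy_journal_commits` counter standing in for whatever journal work it does.

```c
#include <stddef.h>

/* Demo-only counter standing in for journal-related work. */
static int toy_journal_commits;

/* Stand-ins for the generic library routines in filemap.c. */
static long toy_generic_aio_read(size_t count)
{
    return (long)count;
}

static long toy_generic_aio_write(size_t count)
{
    return (long)count;
}

/* Hypothetical ext3-style wrapper: generic write plus journal work. */
static long toy_ext3_file_write(size_t count)
{
    long ret = toy_generic_aio_write(count);

    if (ret > 0)
        toy_journal_commits++;   /* placeholder, not real journaling */
    return ret;
}

struct toy_fops {
    long (*aio_read)(size_t count);
    long (*aio_write)(size_t count);
};

/* ext2-like table: both entries point straight at the generic code. */
static const struct toy_fops toy_ext2_fops = {
    .aio_read  = toy_generic_aio_read,
    .aio_write = toy_generic_aio_write,
};

/* ext3-like table: generic read, filesystem-specific write. */
static const struct toy_fops toy_ext3_fops = {
    .aio_read  = toy_generic_aio_read,
    .aio_write = toy_ext3_file_write,
};
```

The point of the layout is that a filesystem only has to override the entries where it has something extra to do.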
<flips> thanks razvanm
<flips> notice that generic_file_aio_write didn't really do much
<RazvanM> generic read but custom write... interesting
<flips> just took care of some options
<flips> optional unix semantics
<flips> razvanm, sure, no journal needed on read
<flips> finally, __generic_file_aio_write_nolock is doing something
<flips> not much... but more than the others
<RazvanM> aaaa... ext3 :D
<MaZe> since on read you can just let the generic file/offset block lookup
code handle it, but on write - you might need to go through the journal if
the right mount options (data=ordered I think) were used
<MaZe> or data=journaled - never sure
<flips> here we see readv being implemented
<flips> um
<flips> writev
<pranith> where?
<flips> generic_segment_checks(iov, &nr_segs, &ocount, VERIFY_READ);
<flips> nr_segs... writev segs
<flips> not important
<flips> easy enough to understand
<MaZe> is that verifying we can read the ram the user passed us?
<flips> probably
<flips> let's find out
<flips> /*
<flips>  * If any segment has a negative length, or the cumulative
<flips>  * length ever wraps negative then return -EINVAL.
<flips>  */
<flips> no, just checking for properly formed structs
<MaZe> if (access_ok(access_flags, iv->iov_base, iv->iov_len))
<MaZe> I think it does full access checks
<flips> security
<MaZe> note the return -EFAULT
<flips> so we will rely on the mmu
<flips> to fault
<flips> and sometimes check for faulting conditions by hand
<- access_ok just within memory or not
<MaZe> no I think it checks by hand, but only returns EFAULT if first part
is bad, otherwise it marks how many are good, and ignore the rest
<flips> vfs_check_frozen implements the filesystem "freeze" feature... which
is used for snapshotting
<flips> kind of misconceived
<MaZe> so you'll get a partial write instead of an EFAULT if you have a bad
mapping in the middle of a writev
<flips> sounds reasonable
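
The behaviour worked out above for the segment checks (reject malformed lengths with -EINVAL, fail with -EFAULT only when the first segment is bad, otherwise shorten the request) can be sketched in user space as follows. The function and the `toy_access_ok` predicate are stand-ins written for this note; the predicate simply treats a NULL base as inaccessible, which is an assumption of the example and not how the kernel decides.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical accessibility test: treat a NULL base as a bad mapping. */
static int toy_access_ok(const void *base, size_t len)
{
    (void)len;
    return base != NULL;
}

/*
 * Walk the segments, accumulating the total length in *count.
 * - negative length or wrapped total: -EINVAL
 * - first segment inaccessible:       -EFAULT
 * - later segment inaccessible:       truncate *nr_segs there, return 0
 */
static int toy_segment_checks(const struct iovec *iov,
                              unsigned long *nr_segs, size_t *count)
{
    size_t cnt = 0;
    unsigned long seg;

    for (seg = 0; seg < *nr_segs; seg++) {
        const struct iovec *iv = &iov[seg];

        cnt += iv->iov_len;
        if ((ssize_t)(cnt | iv->iov_len) < 0)
            return -EINVAL;
        if (toy_access_ok(iv->iov_base, iv->iov_len))
            continue;
        if (seg == 0)
            return -EFAULT;
        *nr_segs = seg;          /* keep only the good leading segments */
        cnt -= iv->iov_len;      /* the bad segment does not count      */
        break;
    }
    *count = cnt;
    return 0;
}
```

With three 4-byte segments where the middle one is bad, the call succeeds with `*nr_segs` cut to 1 and `*count` set to 4, which is the partial write MaZe describes.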
<MaZe> can't rely on mmu since we probably will use dma
<flips> then we have a bunch of code associated with direct IO
<flips> which we are going to skip
<MaZe> http://lxr.linux.no/linux+v2.6.26.5/mm/filemap.c#L2319
<flips> maze, true
<flips> so we're going to check access somewhere
<flips> but not here
<flips> notice, no real work got done
<flips> we're still just deepening the call chain and allowing for various
options and whatnot
<MaZe> at this point, we're seriously not expecting any real work to get
done ;-)
<flips> then we get to generic_file_buffered_write
<RazvanM> ACTION does! :D
<flips> think that's going to do work?
<MaZe> nope
<flips> you'd be right
<flips> short break
<flips> while I fill the wine glass
<pranith> wine? i thought u wanted beer
<pranith> ;)
<flips> nobody sent any
<pranith> aww
<flips> ok here we go again
<RazvanM> ACTION thinks a_ops->write_begin must be the key...
<flips> we have a ->write_begin option
<flips> which is new for me
<MaZe> the two functions are right next to each other
<MaZe> and look similar
<flips> and that 2copy thing, likewise
<MaZe> probably something aio related
<flips> looks like braindamage
<RazvanM> the 2copy is also using some a_ops
<MaZe> notice a_ops
<MaZe> is struct address_space_operations
<RazvanM> http://lxr.linux.no/linux+v2.6.26.5/include/linux/fs.h#L444
<flips> lost the scent for a moment
<RazvanM> ACTION knows readpage from romfs...
<MaZe> sounds mmap-ish
<pranith> ACTION has to go to work :(
<pranith> ACTION says bbyee, do post the logs ...
<MaZe> guessing a_ops are operations that can be performed on mmaped fs
pages
<MaZe> with ability for fs to override it to trigger journaling etc
<MaZe> bye bye
<flips> ok, this code has been "worked on"
<flips> rearranged hopefully for a good reason
<RazvanM> readpage is the only 'read' the romfs is doing
<flips> http://lxr.linux.no/linux+v2.6.26.5/mm/filemap.c#L2231
<RazvanM> so its called not only for mmap stuff
<flips> generic_perform_write
<MaZe> that may be an optimization though
<flips> this is where the real action happens
<MaZe> who knows...
<flips> or one form of real action
<MaZe> http://lxr.linux.no/linux+v2.6.26.5/mm/filemap.c#L2231
<flips> we're going to talk about a_ops
<flips> this is the key to most filesystem io in linux
<flips> ok, so here is a typical write meme
<RazvanM> write_begin, write_end
<flips> right
<flips> and in between we copy data from userspace
<flips> onto a page
<MaZe>  copied = iov_iter_copy_from_user_atomic(page, i, offset, bytes);
<flips> so what is in write_begin? probably get a page into the page cache
of an inode
<flips> and write_end will send that page down to the hardware
<MaZe> looks like the kernel basically mmaps in the page and then mmaps it
out
<flips> copy_from_user gets the data, and generates EFAULT if necessary
<flips> either because of illegal access, or page swapped out
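
The write_begin / copy / write_end loop can be tried out on a toy page cache. Everything here is a simplified assumption made for illustration: pages are 8 bytes so a short write spans several of them, `toy_write_begin` just returns a preallocated page, `memcpy` stands in for the copy from user space, and `toy_write_end` only marks the page dirty, leaving the trip to disk for a later writeback step.

```c
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 8   /* tiny pages so the example crosses boundaries */
#define TOY_NR_PAGES  4

/* Toy page cache for one file: fixed pages plus a dirty flag per page. */
struct toy_mapping {
    char pages[TOY_NR_PAGES][TOY_PAGE_SIZE];
    int dirty[TOY_NR_PAGES];
};

/* write_begin: hand back the cache page that covers this index. */
static char *toy_write_begin(struct toy_mapping *m, size_t index)
{
    return index < TOY_NR_PAGES ? m->pages[index] : NULL;
}

/* write_end: mark the page dirty so a later writeback would pick it up. */
static void toy_write_end(struct toy_mapping *m, size_t index)
{
    m->dirty[index] = 1;
}

/* Split the write into per-page chunks: begin, copy, end, advance. */
static long toy_perform_write(struct toy_mapping *m, const char *buf,
                              size_t count, size_t pos)
{
    size_t written = 0;

    while (count > 0) {
        size_t index = pos / TOY_PAGE_SIZE;
        size_t offset = pos % TOY_PAGE_SIZE;
        size_t bytes = TOY_PAGE_SIZE - offset;
        char *page;

        if (bytes > count)
            bytes = count;
        page = toy_write_begin(m, index);
        if (!page)
            break;                          /* short write: out of pages */
        memcpy(page + offset, buf + written, bytes);
        toy_write_end(m, index);
        pos += bytes;
        written += bytes;
        count -= bytes;
    }
    return (long)written;
}
```

A 12-byte write starting at offset 6 touches three pages (2, 8 and 2 bytes), and each one goes through its own begin/copy/end round.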
<MaZe>      pagefault_disable();
<MaZe> uhm?
<flips> things get interesting if the page was swapped out to a swapfile
on the same filesystem
<RazvanM> swapfile on the same filesystem??
<flips> right
<RazvanM> swapfile is not a separate fs?
<flips> trying to prevent recursive fault
<MaZe> sounds like that just turned off page-in
<flips> I don't have the details at hand just now
<flips> razvanm, swap can be separate, or it can be on a filesystem
<flips> there are some nasty possible recursions when its on a filesystem
<MaZe> very nasty
<RazvanM> ACTION doesn't know how to create a swap on a fs :|
<flips> 2 minutes until question time
<flips> it's going to be another "cliffhanger" ending
<RazvanM> :-)
<MaZe> lol
<flips> now this function is not very instructive
<flips> because it doesn't directly use the page cache ops
<flips> it provides hooks for them
<MaZe> are you sure we went into the right function? not the 2copy one?
<flips> let's see if we can pop out and find a variant that does use the
page cache ops
<flips> I'm sure we didn't
<flips> somebody has been messing with names
<flips> I hope it was for a good reason
<flips> it isn't always
<MaZe> http://lxr.linux.no/linux+v2.6.26.5/mm/filemap.c#L2063
<flips> and as you can see, the call chain is kind of unreasonably deep
<MaZe> this all seems extremely complex
<MaZe> for now I can't say unnecessarily... but...
<RazvanM> what does the 2copy mean?
<flips> yes, this looks like what remains of good old generic_write
<MaZe> it means brain-dead original 1st copy apparently
<flips> maze, I am happy to have reached your "complex" threshold
<flips> it gets more complex
<flips> in _2copy, we will alloc pages, map them into a page cache, copy
data onto them, and submit them to disk
<flips> we will call the fs's ->write_page method to do the latter
<flips> and that method will figure out _where_ on disk the page should go
<flips> I don't know what 2copy means
<MaZe> why do we have to copy_from_user
<MaZe> can't we write directly from userspace data?
<flips> feels like... wanking... but I will know for sure for thursday's
session
<flips> maze, because this is _buffered_ write
<flips> we are placing the data in cache
<MaZe> oh, right
<flips> we can't just place references to pages in cache
<flips> because the user data is not necessarily properly aligned
<MaZe> couldn't we just rip the page out from under the user, and give him a
r/o cow page?
<flips> linus does want to attempt something like that
<flips> but it's too hard, even for him
<RazvanM> ACTION doesn't see the write_page....
<flips> me neither
<RazvanM> there is prepare_write
<MaZe> http://lxr.linux.no/linux+v2.6.26.5/mm/filemap.c#L2192
<flips> homework is: see the writepage
<flips> ;-)
<RazvanM> and commit_write
<flips> on thursday we will pick up at the writepage
<MaZe> I'm not sure why there would need to be a write page
<flips> yep, it looks like _2copy really is the new incarnation of
generic_write
<flips> it used to just be generic_write
<flips> but then it started getting more and more "wrapped"
<flips> until we see this thing
<flips> unreadable thing you could say
<RazvanM> :-)
<flips> maze, the purpose of the ->writepages in there is to get dirty,
buffered pages onto disk
2008-09-16 20:57 -!- kbingham(~kbingham at 92.20.210.138) has joined #tux3
<MaZe> won't commit_write do that?
<flips> ah, that's what you asked
<flips> why two
<flips> no good reason actually
<flips> there's usually a "prepare_write" and a "commit_write"
<RazvanM> http://lxr.linux.no/linux+v2.6.26.5/include/linux/fs.h#L458
<flips> one or the other generally doesn't do much
<MaZe> there's a writepage, writepages,
prepare_write,commit_write,write_begin,write_end ...
<MaZe> pick'n'choose
<flips> yes
<flips> big mess
<flips> linux IO is trying to find its identity
<MaZe> lol
<RazvanM> it was simpler and nicer in the past?
<flips> beginning of 2.6 was simpler, yes
<flips> o_direct is a very good thing, but it added considerable complexity
<MaZe> it looks like different file systems use different interfaces
<flips> likewise aio
<flips> maze, somewhat true
<flips> almost everybody uses generic_write
<MaZe> and thus we have a lot
<flips> not much global structural analysis goes on
<flips> so that the structure can be simplified
<flips> because that doesn't add new features
<flips> or fix bugs
<MaZe> are the address_space_operations fs internal?
<flips> introduces them more likely
<MaZe> or are they more global mm?
<flips> but it makes the code messy
<flips> like many such things in linux, they are usually library methods
<flips> kernel library
<flips> which the fs can lightly wrap
<flips> or use directly
<flips> the ->writepages thing is a relatively new invention
<flips> that allows the filesystem to map more than one page at a time for
IO
<flips> led to nice benchmark improvements
<flips> and more mess in filemap.c
<tim_dimm> and this is where variable page sizes will get interesting
<flips> filemap.c is where most of the impact is, yes
<flips> insightful
<flips> 4 minutes over ;)
<flips> how did we do for pacing today?
<tim_dimm> i try
<tim_dimm> nice pace
<MaZe> pretty decent I think
<tim_dimm> sorry I asked so many questions
<flips> ok, we will be back into write on thursday
<tim_dimm> ;-)
<MaZe> tim_dimm: ask questions - it's the only way to learn anything
<RazvanM> ACTION is not happy with the length though ;-)
<flips> homework is: find the implementations of the ->writepage calls in
ext2
<tim_dimm> I was just trying to figure out what / where to read
<tim_dimm> never been inside the kernel like that before
<flips> it's bizarre, isn't it
<noob
<tim_dimm> yeah
<MaZe> so here's a question: buffered, aio, o_direct - what are the
permutations/combinations, what do they mean, and how do they interact with
each other if the same spot is being accessed via different means
<flips> maze, very good question, and the answer is: with considerable
complexity
<MaZe> lovely answer
<flips> it is necessary to maintain cache consistency with all possible
combinations
<MaZe> that's like my friend at work, who sits next to me and regularly
answers either/or questions with a 'yes' spoken in a deadpan voice
<tim_dimm> are there hooks for cache consistency or is it handled another
way?
<flips> that is why that section handling o_direct that we skipped is so...
um... interesting
<flips> tim_dimm, the vfs handles it
<flips> and there are rules that the fs has to follow
<MaZe> O_DIRECT means unbuffered straight to disk, right?
<flips> basically "do not skate over that cliff"
<MaZe> and is pretty meaningless for read...
<flips> maze, right
<flips> o_direct write has to invalidate any buffer data at that point
<MaZe> all synchronous io should be easily implementable via aio
<flips> also flush out dirty buffered data in that range
<tim_dimm> did you guys cover vfs on another tux3 night?
<flips> maze, it is
<flips> tim_dimm, partly
<flips> this is part of the vfs we're doing now
<MaZe> so you basically need to support {buffered | direct } asynchronous io
<tim_dimm> would it be worthwhile to have an entire session on it?
<flips> we did an easy one first
<flips> maze, yes
<flips> in fact we already looked at the functions that support it
<flips> tim_dimm, that was essentially the first session
<MaZe> o_direct write has to invalidate any buffered data at that point -
uh?
<tim_dimm> k, I'll revisit in the logs
<flips> maze, yes
<MaZe> buffered data for what?
<flips> somebody might have been reading/writing the device with buffered
ops at the same time
<flips> this is not uncommon
<MaZe> oh, the buffered but not yet written stuff gets dropped?
<flips> flushed to disk
<MaZe> or overwritten with the - so flushed, not invalidated
<MaZe> what gets invalidated?
<flips> you're right, fully replaced pages get dropped
<flips> partially replaced pages have to be flushed
<MaZe> so it's not so much invalidated, as overwritten and thus
dropped/replaced with the new data
<flips> right
<flips> haven't spent a lot of time in that code myself
<flips> but that's correct
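
The rule of thumb just agreed on (cached pages fully covered by a direct write get dropped, partially covered ones have to be flushed) comes down to range arithmetic. The sketch below encodes only that rule as stated in the discussion; it is not taken from the kernel's O_DIRECT path, and the 4096-byte page size and all `toy_`/`TOY_` names are assumptions.

```c
#include <stddef.h>

#define TOY_PAGE_SIZE 4096UL

enum toy_page_action {
    TOY_UNTOUCHED,   /* the write does not overlap this page           */
    TOY_FLUSH,       /* partially overwritten: dirty data must go out  */
    TOY_DROP         /* fully overwritten: cached copy can be dropped  */
};

/* Classify cached page `index` for a direct write to [pos, pos + len). */
static enum toy_page_action toy_classify(unsigned long index,
                                         unsigned long pos,
                                         unsigned long len)
{
    unsigned long page_start = index * TOY_PAGE_SIZE;
    unsigned long page_end = page_start + TOY_PAGE_SIZE;   /* exclusive */
    unsigned long end = pos + len;                         /* exclusive */

    if (len == 0 || end <= page_start || pos >= page_end)
        return TOY_UNTOUCHED;
    if (pos <= page_start && end >= page_end)
        return TOY_DROP;
    return TOY_FLUSH;
}
```

An aligned write of two whole pages drops both cached copies, while an unaligned 5000-byte write at offset 100 only partially covers pages 0 and 1, so both need flushing.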
<MaZe> does O_DIRECT mean anything on read?
<flips> yes
<flips> will not read from buffer afaik
<MaZe> Try to minimize cache effects of the I/O to and from this  file
<flips> but I could be wrong
<MaZe> according to man open, basically skip buffer cache populating
<flips> anything not buffered is read directly from disk and not added to
the page cache
<MaZe> unless already there
<flips> so o_direct read avoids double buffering
<RazvanM> O_DIRECT (Since Linux 2.4.10)
<RazvanM>     Try to minimize cache effects of the I/O to and from this file.
<RazvanM>     In general this will degrade performance, but it is useful in
<RazvanM>     special situations, such as when applications do their own
<RazvanM>     caching. File I/O is done directly to/from user space buffers.
<RazvanM>     The I/O is synchronous, that is, at the completion of a read(2)
<RazvanM>     or write(2), data is guaranteed to have been transferred. See
<RazvanM>     NOTES below for further discussion.
<RazvanM>     A semantically similar (but deprecated) interface for block
<RazvanM>     devices is described in raw(8).
<flips> I'm not sure what it does with already-buffered data
<flips> if dirty then it _must_ use the dirty version
<MaZe> so, how expensive is a write to read only page fault?
<RazvanM> from man 2 open, sorry for the long lines
<flips> but I don't know if it does that by flushing it first, then reading
it back, or doing buffered read just for that bit
<MaZe> yeah, found it
<MaZe> doesn't look like there's any requirement to flush
<MaZe> seems like O_DIRECT read is meant for access once - not worth caching
- data
<flips> yes
<flips> still leaves the question about what it does with pages already in
cache, or dirty in cache
<MaZe> it says minimize
<flips> shall we leave that as your homework?
<MaZe> not ignore cache
<flips> can't rely on the man page
<RazvanM> the pages should not be dirty for too long
<flips> have to read the code
<RazvanM> :D
<MaZe> from NOTES
<MaZe> Applications should avoid mixing O_DIRECT and normal I/O to the same
<MaZe> file, and especially to overlapping byte regions in the same file.
<MaZe> Even when the filesystem correctly handles the coherency issues in
<MaZe> this situation, overall I/O throughput is likely to be slower than
<MaZe> using either mode alone. Likewise, applications should avoid mixing
<MaZe> mmap(2) of files with direct I/O to the same files.
<flips> one thing you see is that o_direct has to be constantly checking the
page cache to be sure nothing is aliased there
<MaZe> "The thing that has always disturbed me about O_DIRECT is that the
<MaZe> whole interface is just stupid, and was probably designed by a
<MaZe> deranged monkey on some serious mind-controlling substances."
<MaZe> -- Linus
<flips> maze, the advice is often ignored
<flips> linux is not absolved from responsibility for keeping the cache
consistent
<MaZe> right
<flips> linus doesn't run a database company
<MaZe> lol
<flips> which is why he thinks that
<flips> the interface is quite simple
<flips> open with o_direct, make sure your data is aligned
<shapor> hi all
<flips> maze, how'd you do with reading your superblock
<MaZe> hey
<flips> shapor, right on time ;)
<MaZe> I slept well, thank you ;-)
<flips> good thing we have logs
<shapor> yeah
<shapor> reading now
<MaZe> I'm going to be working on it now
<flips> maze, that little subproject will be highly instructive
<MaZe> agreed
<MaZe> it already has been
<flips> especially if you write your own custom endio
<flips> and figure out how to have your task (which is "mount") wait on a
wait queue for the io to complete
<MaZe> exactly
<MaZe> well, it's the in-kernel portion of mount
<flips> it's all not very much code, but each line takes about 15 minutes of
study
<flips> or maybe an hour the first time
<MaZe> I expect I need something, sleep on something, wake something from
endio
<flips> precisely
<MaZe> apparently something called a waitqueue
<flips> the waiting bits are covered in a nice tutorial manner on lwn
<MaZe> so probably something like a dynamic init of a waitqueue
<RazvanM> ACTION is off to bed. Tomorrow he needs to be early at school.
<MaZe> then submit io
<flips> bio is... an acquired taste
<MaZe> then sleep on wq
<flips> acquired ore
<flips> acquired lore
<MaZe> in endio wake wq
<MaZe> more like acquired love
<flips> exactly
<flips> probably using the "wake" function
<MaZe> that sounds awesome
<MaZe> and either wake or wakeall likely
<MaZe> here wakeall being more appropriate
<flips> usually wake
<flips> no need for a thundering herd
<flips> of course you know there is only one waiter
<flips> there better not be more, or something else broke
<MaZe> well, but in general, since the op is complete - I should wake all
<MaZe> interesting question then is how to dealloc the wq
<MaZe> must be some put_wq in the waiters
<MaZe> which on last dec to zero does free
<flips> next move for me is to drop over to whole foods to pick up some
munchies
<flips> I only have a few more days left as a bachelor
<flips> before the girls get back ;)
<shapor> flips: hah thats where i was instead of class
<flips> at which time I'm afraid my checking rate will drop somewhat
<MaZe> linux/wait.h
<shapor> didn't think it'd be so early
<flips> checkin
<flips> shapor, 8 pm tue and thur
<flips> hmm
<flips> looks like it's too late for whole food
<flips> unless I really run
<flips> don't feel like really running
<flips> maybe it's 3rd street for dinner tonight
<MaZe> so i need to make a dynamic wq, init with init_waitqueue_head()
<flips> yes, and there are various convenience wrappers
<flips> best is to write it on the metal the first time
<MaZe> #define wake_up_all(x)                  __wake_up(x, TASK_NORMAL, 0,
NULL)
<flips> well if I don't go shopping there will be no coffee for breakfast
<MaZe> seems to be the way to wake
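
The difference between the two wake calls comes down to the third argument of __wake_up, the number of exclusive waiters to wake, with 0 meaning "all" as in the wake_up_all define quoted above. The toy below models just that counting rule in user space. Its `toy_` names are invented, waiters are plain flags instead of sleeping tasks, and it assumes non-exclusive waiters are queued ahead of exclusive ones.

```c
#include <stddef.h>

#define TOY_MAX_WAITERS 8

struct toy_waiter {
    int exclusive;   /* exclusive waiters are woken one at a time */
    int awake;
};

struct toy_wait_queue {
    struct toy_waiter *waiters[TOY_MAX_WAITERS];
    size_t nr;
};

static void toy_init_waitqueue(struct toy_wait_queue *q)
{
    q->nr = 0;
}

/* Queue a sleeping waiter; returns -1 if the toy queue is full. */
static int toy_add_waiter(struct toy_wait_queue *q, struct toy_waiter *w,
                          int exclusive)
{
    if (q->nr == TOY_MAX_WAITERS)
        return -1;
    w->exclusive = exclusive;
    w->awake = 0;
    q->waiters[q->nr++] = w;
    return 0;
}

/*
 * Wake sleepers in queue order. Every non-exclusive waiter reached is
 * woken; the walk stops after nr_exclusive exclusive waiters have been
 * woken. nr_exclusive == 0 never hits the stop condition: wake them all.
 */
static int toy_wake_up(struct toy_wait_queue *q, int nr_exclusive)
{
    int woken = 0;

    for (size_t i = 0; i < q->nr; i++) {
        struct toy_waiter *w = q->waiters[i];

        if (w->awake)
            continue;
        w->awake = 1;
        woken++;
        if (w->exclusive && --nr_exclusive == 0)
            break;
    }
    return woken;
}
```

With a single ordinary waiter, as in the mount case discussed here, `toy_wake_up(&q, 1)` and `toy_wake_up(&q, 0)` behave the same; the choice only matters once several exclusive waiters share the queue.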
<flips> so I'm gone...