[FYI] tux3: Core changes

Sun Jul 5 05:54:45 PDT 2015

Jan Kara <jack at suse.cz> writes:

>> I'm not sure I'm understanding your pseudocode logic correctly though.
>> This logic doesn't seems to be a page forking specific issue.  And
>> this pseudocode logic seems to be missing the locking and revalidate of
>> page.
>> 
>> If you can show more details, it would be helpful to see more, and
>> discuss the issue of page forking, or we can think about how to handle
>> the corner cases.
>> 
>> Well, before that, why need more details?
>> 
>> For example, replace the page fork at (4) with "truncate", "punch
>> hole", or "invalidate page".
>> 
>> Those operations remove the old page from radix tree, so the
>> userspace's write creates the new page, and HW still refererences the
>> old page.  (I.e. situation should be same with page forking, in my
>> understand of this pseudocode logic.)
>
> Yes, if userspace truncates the file, the situation we end up with is
> basically the same. However for truncate to happen some malicious process
> has to come and truncate the file - a failure scenario that is acceptable
> for most use cases since it doesn't happen unless someone is actively
> trying to screw you. With page forking it is enough for flusher thread
> to start writeback for that page to trigger the problem - event that is
> basically bound to happen without any other userspace application
> interfering.

Acceptable conclusion is where came from? That pseudocode logic doesn't
say about usage at all. And even if assume it is acceptable, as far as I
can see, for example /proc/sys/vm/drop_caches is enough to trigger, or a
page on non-exists block (sparse file. i.e. missing disk space check in
your logic). And if really no any lock/check, there would be another
races.

>> IOW, this pseudocode logic seems to be broken without page forking if
>> no lock and revalidate.  Usually, we prevent unpleasant I/O by
>> lock_page or PG_writeback, and an obsolated page is revalidated under
>> lock_page.
>
> Well, good luck with converting all the get_user_pages() users in kernel to
> use lock_page() or PG_writeback checks to avoid issues with page forking. I
> don't think that's really feasible.

What does all get_user_pages() conversion mean? Well, maybe right more
or less, I also think there is the issue in/around get_user_pages() that
we have to tackle.

IMO, if there is a code that pseudocode logic actually, it is the
breakage. And "it is acceptable and limitation, and give up to fix", I
don't think it is the right way to go. If there is really code broken
like your logic, I think we should fix.

Could you point which code is using your logic? Since that seems to be
so racy, I can't believe yet there are that racy codes actually.

>> For page forking, we may also be able to prevent similar situation by
>> locking, flags, and revalidate. But those details might be different
>> with current code, because page states are different.
>
> Sorry, I don't understand what do you mean in this paragraph. Can you
> explain it a bit more?

This just means a forked page (old page) and a truncated page have
different set of flags and state, so we may have to adjust revalidation.

Thanks.
-- 
OGAWA Hirofumi <hirofumi at mail.parknet.co.jp>