[FYI] tux3: Core changes
OGAWA Hirofumi
hirofumi at mail.parknet.co.jp
Sun Jul 5 05:54:45 PDT 2015
Jan Kara <jack at suse.cz> writes:
>> I'm not sure I'm understanding your pseudocode logic correctly though.
>> This logic doesn't seems to be a page forking specific issue. And
>> this pseudocode logic seems to be missing the locking and revalidate of
>> page.
>>
>> If you can show more details, it would be helpful to see more, and
>> discuss the issue of page forking, or we can think about how to handle
>> the corner cases.
>>
>> Well, before that, why need more details?
>>
>> For example, replace the page fork at (4) with "truncate", "punch
>> hole", or "invalidate page".
>>
>> Those operations remove the old page from radix tree, so the
>> userspace's write creates the new page, and HW still refererences the
>> old page. (I.e. situation should be same with page forking, in my
>> understand of this pseudocode logic.)
>
> Yes, if userspace truncates the file, the situation we end up with is
> basically the same. However for truncate to happen some malicious process
> has to come and truncate the file - a failure scenario that is acceptable
> for most use cases since it doesn't happen unless someone is actively
> trying to screw you. With page forking it is enough for flusher thread
> to start writeback for that page to trigger the problem - event that is
> basically bound to happen without any other userspace application
> interfering.
Acceptable conclusion is where came from? That pseudocode logic doesn't
say about usage at all. And even if assume it is acceptable, as far as I
can see, for example /proc/sys/vm/drop_caches is enough to trigger, or a
page on non-exists block (sparse file. i.e. missing disk space check in
your logic). And if really no any lock/check, there would be another
races.
>> IOW, this pseudocode logic seems to be broken without page forking if
>> no lock and revalidate. Usually, we prevent unpleasant I/O by
>> lock_page or PG_writeback, and an obsolated page is revalidated under
>> lock_page.
>
> Well, good luck with converting all the get_user_pages() users in kernel to
> use lock_page() or PG_writeback checks to avoid issues with page forking. I
> don't think that's really feasible.
What does all get_user_pages() conversion mean? Well, maybe right more
or less, I also think there is the issue in/around get_user_pages() that
we have to tackle.
IMO, if there is a code that pseudocode logic actually, it is the
breakage. And "it is acceptable and limitation, and give up to fix", I
don't think it is the right way to go. If there is really code broken
like your logic, I think we should fix.
Could you point which code is using your logic? Since that seems to be
so racy, I can't believe yet there are that racy codes actually.
>> For page forking, we may also be able to prevent similar situation by
>> locking, flags, and revalidate. But those details might be different
>> with current code, because page states are different.
>
> Sorry, I don't understand what do you mean in this paragraph. Can you
> explain it a bit more?
This just means a forked page (old page) and a truncated page have
different set of flags and state, so we may have to adjust revalidation.
Thanks.
--
OGAWA Hirofumi <hirofumi at mail.parknet.co.jp>
More information about the Tux3
mailing list