[FYI] tux3: Core changes
Jan Kara
jack at suse.cz
Mon Aug 3 06:42:51 PDT 2015
On Fri 31-07-15 13:44:44, OGAWA Hirofumi wrote:
> Jan Kara <jack at suse.cz> writes:
>
> >> > Yes, if userspace truncates the file, the situation we end up with is
> >> > basically the same. However for truncate to happen some malicious process
> >> > has to come and truncate the file - a failure scenario that is acceptable
> >> > for most use cases since it doesn't happen unless someone is actively
> >> > trying to screw you. With page forking it is enough for flusher thread
> >> > to start writeback for that page to trigger the problem - event that is
> >> > basically bound to happen without any other userspace application
> >> > interfering.
> >>
> >> Acceptable conclusion is where came from? That pseudocode logic doesn't
> >> say about usage at all. And even if assume it is acceptable, as far as I
> >> can see, for example /proc/sys/vm/drop_caches is enough to trigger, or a
> >> page on non-exists block (sparse file. i.e. missing disk space check in
> >> your logic). And if really no any lock/check, there would be another
> >> races.
> >
> > So drop_caches won't cause any issues because it avoids mmaped pages.
> > Also page reclaim or page migration don't cause any issues because
> > they avoid pages with increased refcount (and increased refcount would stop
> > drop_caches from reclaiming the page as well if it was not for the mmaped
> > check before). Generally, elevated page refcount currently guarantees page
> > isn't migrated, reclaimed, or otherwise detached from the mapping (except
> > for truncate where the combination of mapping-index becomes invalid) and
> > your page forking would change that assumption - which IMHO has a big
> > potential for some breakage somewhere.
>
> Lifetime and visibility from user are different topic. The issue here
> is visibility. Of course, those has relation more or less though,
> refcount doesn't stop to drop page from radix-tree at all.
Well, refcount prevents dropping page from a radix-tree in some cases -
memory pressure, page migration to name the most prominent ones. It doesn't
prevent page from being dropped because of truncate, that is correct. In
general, the rule we currently obey is that kernel doesn't detach a page
with increased refcount from a radix tree unless there is a syscall asking
kernel to do that.
> Well, anyway, your claim seems to be assuming the userspace app
> workarounds the issues. And it sounds like still not workarounds the
> ENOSPC issue (validate at page fault/GUP) even if assuming userspace
> behave as perfect. Calling it as kernel assumption is strange.
Realistically, I don't think userspace apps workaround anything. They just
do what happens to work. Nobody happens to delete files while application
works on it and expect application to gracefully handle that. So everyone
is happy. I'm not sure about which ENOSPC issue you are speaking BTW. Can
you please ellaborate?
> If you claim, there is strange logic widely used already, and of course,
> we can't simply break it because of compatibility. I would be able to
> agree. But your claim sounds like that logic is sane and well designed
> behavior. So I disagree.
To me the rule: "Do not detach a page from a radix tree if it has an elevated
refcount unless explicitely requested by a syscall" looks like a sane one.
Yes.
> > And frankly I fail to see why you and Daniel care so much about this
> > corner case because from performance POV it's IMHO a non-issue and you
> > bother with page forking because of performance, don't you?
>
> Trying to penalize the corner case path, instead of normal path, should
> try at first. Penalizing normal path to allow corner case path is insane
> basically.
>
> Make normal path faster and more reliable is what we are trying.
Elevated refcount of a page is in my opinion a corner case path. That's why
I think that penalizing that case by waiting for IO instead of forking is
acceptable cost for the improved compatibility & maintainability of the
code.
Honza
--
Jan Kara <jack at suse.com>
SUSE Labs, CR
More information about the Tux3
mailing list