RFR: 8272807: Permit use of memory concurrent with pretouch

Wed Sep 8 22:19:15 UTC 2021

> On Sep 8, 2021, at 10:07 AM, Stefan Karlsson <stefank at openjdk.java.net> wrote:
> 
> On Thu, 2 Sep 2021 18:33:56 GMT, Kim Barrett <kbarrett at openjdk.org> wrote:
> 
>> Note that this PR replaces the withdrawn https://github.com/openjdk/jdk/pull/5215.
>> 
>> Please review this change which adds os::touch_memory, which is similar to
>> os::pretouch_memory but allows concurrent access to the memory while it is
>> being touched.  This is accomplished by using an atomic add of zero as the
>> operation for touching the memory, ensuring the virtual location is backed
>> by physical memory while not changing any values being read or written by
>> other threads.
>> 
> 
> I think it would be prudent to separate this PR into two separate PRs.

I don’t see any benefit to that, but I can do it.

> 1) For the changes to the pretouch loop.
> 
> * I'm not sure we need all the added safeguarding, which, as you say, complicate the code. I think a few asserts would be good enough.

It’s not; see below.

> I don't think the function needs to:
> - work if the caller pass in end < start. Sounds like an assert.

There's already an assert for end < start.

The test currently in the code is for start < end, e.g. is the range
non-empty.  I originally tried asserting a non-empty range, but that seems
such show up sometimes.  I didn't try to figure out whether those could be
eliminated, though sometimes empty ranges show up pretty naturally and
range-based operations typically permit the range to be empty.

I found the empty range check up front simplified the analysis of the
protected code, since certain kinds of errors simply can't happen because of
it.  Unless empty ranges are forbidden, something needs to be done somewhere
to prevent writing outside the range.

> - Safeguard that end + page doesn't overflow. Do we ever hand out pages at the end of the virtual address range? There's a lot of HotSpot code that assumes that we never get that kind of memory, so is it worth protecting against this compared to the extra complexity it gives? Previously, I could take a glance at that function and understand it. Now I need to pause and stare at it a bit.

I suspect overflow can't happen on at least some 64bit platforms, but I
don't know of anything that would prevent it on a 32bit platform. And I had
exactly the opposite reaction from you to the old code. I looked at it and
immediately wondered what might happen on pointer overflow, and questioned
whether I understood the code.

And as noted in the PR description, the old code for PretouchTask checks for
overflow, but does so incorrectly, such that the check could be optimized
away (or lead to other problems) because it would involve UB to fail the check.

> 2) For the new infrastructure.
> 
> * I'm not sure we should add TouchTask until we have a concrete use-case (with prototype/PR) for it. Without that, it is hard to gauge the usability of this feature. How do other threads proceed concurrently with the worker threads? What is the impact of using TouchTask? One of the concerns that has been brought up by others is that using a WorkGang will block the rest of the JVM from safepointing while the workers are pre-touching. A use-case would clarify this, I think.

Here is a description of a specific example from my experiments. When the G1
allocator runs out of regions that were pre-allocated for eden and can't GC
because of the GCLocker, it tries to allocate a new region, first from the
free region list, and then an actually new region (assuming that's possible
within the heap size constraints). In the latter case, when AlwaysUsePretouch
is true, it currently does a *single-threaded* pretouch of the region and
ancillary memory. (It's single-threaded because most of the code path involved
is shared with during-GC code that is running under the work gang. That code
is a good candidate for os::touch_memory, so adding a gang argument through
that call tree that is null to indicate don't pretouch doesn't seem like an
improvement.) It could instead do a parallel concurrent touch after making the
new region available for other mutator threads to use for allocation. Yes, it
blocks safepoints, but that's already true for the existing code, and the
existing code may block them for *much* longer due to the unparallelized
pretouch. The downside of using a workgang here is the gang threads could be
competing with mutator threads for cores. I think parallelizing here is likely
beneficial though. I think there's a similar situation in ParallelGC, though I
haven't chased through all of that code carefully yet.

> * Is the usage of template specialization really needed? Wouldn't it be sufficient to pass down a bool parameter instead? I doubt we would see any performance difference by doing that.

The use of a template here seemed pretty straight-forward when I wrote it,
but I agree a boolean argument would work just as well and is more obvious
to read. I'll change that.  I expect the same generated code either way.

> * It's a little bit confusing that in the context of TouchTask "touch" means "concurrent touch", while in the touch_memory_* it means either concurrent or non-concurrent. Maybe rename TouchTask to something to make this distinction a bit clearer.

I'm open to alternative naming. I thought ConcurrentTouchTask was rather
long. My experiments repo currently uses PretouchTask::concurrent_touch, but
I don't like that naming either.

The rationale for "touch" vs "pretouch" is "touching" is the primary generic
concept, and touch_memory can be used anywhere. Meanwhile "pretouching" is now
a restricted variant that might have better performance under those limitions.
Similarly for TouchTask vs PretouchTask. The static functions are just shared
helpers for the API functions; "touch_memory" is the "primary" thing.