RFR: 8272807: Permit use of memory concurrent with pretouch

Tue Aug 24 11:01:27 UTC 2021

On Mon, 23 Aug 2021 11:35:18 GMT, Kim Barrett <kbarrett at openjdk.org> wrote:

> Please review this change to os::pretouch_memory to permit use of the memory
> concurrently with the pretouch operation.  This is accomplished by using an
> atomic add of zero as the operation for touching the memory, ensuring the
> virtual location is backed by physical memory while not changing any values
> being read or written by the application.
> 
> While I was there, fixed some other lurking issues in os::pretouch_memory.
> There was a potential overflow in the iteration that has been fixed.  And if
> the range arguments weren't page aligned then the last page might not get
> touched.  The latter was even mentioned in the function's description.  Both
> of those have been fixed by careful alignment and some extra checks.  The
> resulting code is a little more complicated, but more robust and complete.
> 
> This change doesn't make use of the new capability; I have some other
> changes in development to do that.
> 
> Testing:
> mach5 tier1-3.
> 
> I've been using this change while developing uses of the new capability.
> Performance testing hasn't found any regressions related to this change.

> I _haven't_ written a microbenchmark to commit memory, time touching it with
> one of the approaches, uncommit, repeat. I could do that, though I don't
> expect it to show anything either.

Yeah, the overhead is measurable. See for example Epsilon with a 100G heap (several runs; the most typical result is shown):

$ time ~/trunks/jdk/build/baseline/bin/java -Xms100g -Xmx100g -XX:+AlwaysPreTouch -XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC Hello
Hello!

real	0m23.075s
user	0m1.880s
sys	0m21.108s

$ time ~/trunks/jdk/build/patched/bin/java -Xms100g -Xmx100g -XX:+AlwaysPreTouch -XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC Hello
Hello!

real	0m23.568s  ; + 500ms
user	0m2.306s    ; + 420ms
sys	0m21.189s  ; + 80ms (noise?)

This correlates with 100G / 4K ≈ 25M pages to touch with atomics, which gives us roughly an additional 500ms / 25M = 20 ns per atomic/page (most likely a cache-missing atomic costing extra). In the test above, this adds up to ~2% overhead on wall-clock time. I do believe this overhead is inconsequential (since the user already kinda loses startup performance "privileges" with `-XX:+AlwaysPreTouch` anyway), especially if we are able to leverage this feature to pre-touch the heap in the background in future RFEs.
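
To spell out that arithmetic, here is a small C++ sketch. The helper names (`page_count`, `extra_ns_per_page`) are made up for illustration, and it assumes binary units (100 GiB heap, 4 KiB pages), which gives slightly more pages than the rounded 25M figure.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: number of whole pages in a heap of heap_bytes.
constexpr std::uint64_t page_count(std::uint64_t heap_bytes,
                                   std::uint64_t page_bytes) {
  return heap_bytes / page_bytes;
}

// Hypothetical helper: extra wall-clock time attributed to each page touch,
// in nanoseconds, given the total extra time in seconds.
inline double extra_ns_per_page(double extra_seconds, std::uint64_t pages) {
  assert(pages > 0);
  return extra_seconds * 1e9 / static_cast<double>(pages);
}

// Hypothetical helper: relative overhead in percent against a baseline time.
inline double overhead_percent(double baseline_seconds, double patched_seconds) {
  assert(baseline_seconds > 0.0);
  return (patched_seconds - baseline_seconds) / baseline_seconds * 100.0;
}
```

Plugging in the "real" times from the runs above (23.075 s baseline, 23.568 s patched) gives 26,214,400 pages, a bit under 19 ns per page, and a little over 2% overhead, which agrees with the rough estimate.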

And this is on x86_64, whereas I see that AArch64 seems to always call the helper with `memory_order_conservative`.
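
For readers following along, a rough sketch of the "atomic add of zero" touch loop being discussed. This is not the code in the PR: `touch_page` and `pretouch_range` are hypothetical names, the GCC/Clang `__atomic_fetch_add` builtin stands in for HotSpot's Atomic layer, and the relaxed ordering is my assumption to show where the memory-order argument would go.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: touch one page without changing its contents.
// Adding zero is a read-modify-write, so the page must be backed for writing,
// yet any value concurrently stored by the application is preserved.
// Caveat: an optimizer may in principle simplify an add of zero, so the
// generated code should be checked; this sketch does not guard against that.
inline void touch_page(unsigned char* p) {
  __atomic_fetch_add(p, static_cast<unsigned char>(0), __ATOMIC_RELAXED);
}

// Hypothetical sketch: touch every page overlapping [start, end).
// Returns the number of pages touched.
inline std::size_t pretouch_range(void* start, void* end, std::size_t page_size) {
  assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
  const std::uintptr_t s = reinterpret_cast<std::uintptr_t>(start);
  const std::uintptr_t e = reinterpret_cast<std::uintptr_t>(end);
  if (s >= e) return 0;

  // Align to page boundaries so an unaligned range still covers its last page.
  const std::uintptr_t mask = ~(static_cast<std::uintptr_t>(page_size) - 1);
  std::uintptr_t cur = s & mask;
  const std::uintptr_t last = (e - 1) & mask;

  // Iterate first through last page inclusive; testing before the increment
  // avoids overflow if the last page abuts the end of the address space.
  std::size_t touched = 0;
  for (;; cur += page_size) {
    touch_page(reinterpret_cast<unsigned char*>(cur));
    ++touched;
    if (cur >= last) break;
  }
  return touched;
}
```

The per-page cost measured above is the cost of that one atomic operation, so a stronger ordering than relaxed on a given platform would show up directly in the pretouch time.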

-------------

PR: https://git.openjdk.java.net/jdk/pull/5215