RFR: 8312023: Parallel pretouch should shortcut when only 1 thread is needed [v2]

Thu Nov 30 20:12:09 UTC 2023

On Wed, 29 Nov 2023 20:16:31 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

>> This is a follow-up for code introduced in JDK 16 by [JDK-8252221](https://bugs.openjdk.org/browse/JDK-8252221). 
>> 
>> Unfortunately, we do pre-touch not only at startup, but also during the application lifetime when a space boundary moves, see [JDK-8312021](https://bugs.openjdk.org/browse/JDK-8312021). On that code path, we start the pre-touch threads unconditionally, even when only one thread is needed. This is clearly visible with `-Xlog:gc+heap=debug`. The actual pretouch takes single-digit microseconds, but the round-trip through the worker pool incurs latency of about 100us (2x 50us) in my setups. This is significant for very short GC pauses.
>> 
>> Additional testing:
>>  - [x] Ad-hoc benchmarks
>>  - [x] Linux x86_64 server fastdebug, `tier{1,2,3}` with `-XX:+UseParallelGC -XX:+AlwaysPreTouch`
>
> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Restore calculation, hoping to fix Windows build

I recall several discussions about bypassing the available workers when the
amount of work to be done is small enough that only one worker would be
requested. The alternative is to go ahead and use the workers, but have the
worker implementation recognize the special case of only one requested thread
and just run the task on the current thread. That avoids making every use of
workers handle this case specially. I thought there was an RFE for this, but
I can't find it. There may have been some concern that some task might care
about the kind of thread it runs on?
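
For illustration only, the alternative described above might look roughly like
the sketch below. `WorkerPool` and `run_task` are hypothetical names, not
HotSpot's actual worker API, and the sketch spawns fresh `std::thread`s instead
of waking a persistent pool. It only shows where the one-thread shortcut would
live if the pool handled it instead of each caller.

```cpp
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for a GC worker pool; NOT HotSpot's real API.
class WorkerPool {
public:
  // Runs task(worker_id) once for each worker_id in [0, num_workers).
  void run_task(const std::function<void(std::size_t)>& task,
                std::size_t num_workers) {
    if (num_workers == 1) {
      // Shortcut inside the pool: run on the calling thread, so callers
      // need no special case and pay no dispatch round-trip.
      task(0);
      return;
    }
    // General case (simplified): one thread per requested worker.
    std::vector<std::thread> workers;
    workers.reserve(num_workers);
    for (std::size_t i = 0; i < num_workers; ++i) {
      workers.emplace_back(task, i);
    }
    for (std::thread& w : workers) {
      w.join();
    }
  }
};
```

With the shortcut placed here, a task that assumes it runs on a worker thread
would instead observe the caller's thread, which is the concern raised above.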

So I'm not convinced this change should be made.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16882#issuecomment-1834471328