RFR: 8321242: Enable WorkerThreads to run tasks in caller thread [v3]
Aleksey Shipilev
shade at openjdk.org
Wed Dec 6 13:00:08 UTC 2023
On Wed, 6 Dec 2023 12:45:53 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:
>> `WorkerThreads` is a generic abstraction that allows running a task in multiple threads. There are cases, however, when we only ask for a single thread to carry the computation, and we can accept the caller running the computation instead of ping-ponging with a worker thread. As the rendezvous with a worker thread is a potential latency hiccup, we would like to avoid delegating work to worker threads unnecessarily. This improves latency on critical paths: the usual round-trip takes about 10..50 us on the systems I tried.
>>
>> Nominally, we would like to handle the case of 1 worker requested. But I would argue we would also like to do this even if we are requesting N (N > 1) threads to carry the compute. We can submit (N-1) tasks to workers, and execute the remaining task in the caller. For lower N, this removes an additional rendezvous point with workers, and allows the caller to complete the bulk of the work while workers are waking up. This would also simplify testing: if the caller path is always taken, this would verify the task can indeed be executed by the caller.
>>
>> We cannot, however, do this optimization unconditionally, because the caller thread might not be set up in the same way as workers are, and executing the code in the caller might cause bugs. Therefore, it would be nice to have an opt-in option that allows running in the caller thread. Since this looks to be a property of the task, I added it to the task definition.
>>
>> This PR only adds the infrastructure code, without product code behavior changes. The new option is sanity-tested by a new gtest. Other PRs can then opt tasks into this, for example #16882.
>>
>> Additional testing:
>> - [x] New gtest
>> - [x] Linux x86_64 server fastdebug, `tier{1,2,3}`
>> - [x] Linux AArch64 server fastdebug, `tier{1,2,3}`
>
> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 13 additional commits since the last revision:
>
> - More test updates
> - Always run at least one task in caller, if possible
> - Split out performance test
> - Better test
> - Merge branch 'master' into JDK-8321242-worker-threads-caller-runs
> - Remove priority adjustment
> - More touchups
> - Merge branch 'master' into JDK-8321242-worker-threads-caller-runs
> - Allow caller to execute more than one task. Allow workers to take all tasks before caller is able to act.
> - Atomic touchups
> - ... and 3 more: https://git.openjdk.org/jdk/compare/b28263f1...99060fe9
Rewrote this a bit: priority adjustments are gone (they can be reinstated later), atomics are not used since we are covered by semaphores, the caller can now execute more than 1 task, and the gtest now carries the performance test to ballpark the improvements.
On c5.9xlarge, I get the following (quite noisy) results:
Full parallelism:
  only workers:
    2856293.574 us total; 57.126 us avg; 84.191 us max
    2796672.079 us total; 55.933 us avg; 78.010 us max
    2786075.701 us total; 55.722 us avg; 80.546 us max
    2791395.216 us total; 55.828 us avg; 78.631 us max
    2787526.197 us total; 55.751 us avg; 75.513 us max
  workers + caller:
    2718025.592 us total; 54.361 us avg; 79.392 us max
    2723385.505 us total; 54.468 us avg; 88.255 us max
    2723581.929 us total; 54.472 us avg; 84.199 us max
    2725052.775 us total; 54.501 us avg; 81.570 us max
    2727365.276 us total; 54.547 us avg; 82.973 us max

Half parallelism:
  only workers:
    1405136.045 us total; 28.103 us avg; 47.395 us max
    1407230.696 us total; 28.145 us avg; 47.097 us max
    1409279.497 us total; 28.186 us avg; 47.073 us max
    1407346.105 us total; 28.147 us avg; 47.638 us max
    1404350.254 us total; 28.087 us avg; 54.614 us max
  workers + caller:
    1318669.870 us total; 26.373 us avg; 37.887 us max
    1321017.398 us total; 26.420 us avg; 40.561 us max
    1337775.555 us total; 26.756 us avg; 48.519 us max
    1327958.718 us total; 26.559 us avg; 39.517 us max
    1329176.320 us total; 26.584 us avg; 46.197 us max

Min parallelism:
  only workers:
    459122.457 us total; 9.182 us avg; 26.771 us max
    455487.663 us total; 9.110 us avg; 22.733 us max
    457087.605 us total; 9.142 us avg; 23.489 us max
    455424.606 us total; 9.108 us avg; 20.371 us max
    457171.190 us total; 9.143 us avg; 22.381 us max
  workers + caller:
    3316.230 us total; 0.066 us avg; 0.691 us max
    3332.589 us total; 0.067 us avg; 6.017 us max
    3335.294 us total; 0.067 us avg; 6.859 us max
    3317.823 us total; 0.066 us avg; 0.226 us max
    3325.820 us total; 0.067 us avg; 5.868 us max
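
For illustration only, the caller-runs idea from the PR description (submit N-1 tasks to workers, run the remaining one in the caller) can be sketched in standalone C++ roughly as below. This is not the HotSpot code: `run_task` and its parameters are hypothetical names, the sketch spawns fresh `std::thread`s instead of waking pooled workers through semaphores, and it always keeps exactly one task in the caller, which is simpler than letting the caller and workers compete for tasks.

```cpp
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for a WorkerThreads-style dispatcher; NOT the HotSpot API.
// Runs task(id) once for every id in [0, n) and returns how many ids were
// executed by the calling thread itself.
inline unsigned run_task(unsigned n,
                         const std::function<void(unsigned)>& task,
                         bool caller_can_run) {
  assert(n >= 1);

  // With the opt-in set, the caller keeps id 0 and only n-1 ids go to workers.
  const unsigned first_worker_id = caller_can_run ? 1 : 0;

  std::vector<std::thread> workers;
  workers.reserve(n - first_worker_id);
  for (unsigned id = first_worker_id; id < n; id++) {
    // Assumption: a freshly spawned thread models "handing work to a worker".
    workers.emplace_back([&task, id] { task(id); });
  }

  unsigned ran_in_caller = 0;
  if (caller_can_run) {
    // The caller does its share while the workers are still starting up.
    task(0);
    ran_in_caller = 1;
  }

  // Wait for all workers; for n == 1 with the opt-in there is nothing to wait for.
  for (std::thread& t : workers) {
    t.join();
  }
  return ran_in_caller;
}
```

In this sketch, requesting a single task with `caller_can_run == true` creates no worker thread at all, which corresponds to the "Min parallelism" case above where the worker round-trip disappears. The flag is passed per call here to mirror the opt-in being a property of the task.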
-------------
PR Comment: https://git.openjdk.org/jdk/pull/16945#issuecomment-1842826179