RFR: 8321242: Enable WorkerThreads to run tasks in caller thread

Kim Barrett kbarrett at openjdk.org
Mon Dec 4 21:48:34 UTC 2023


On Mon, 4 Dec 2023 10:37:42 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

> `WorkerThreads` is a generic abstraction for running a task in multiple threads. There are cases, however, when we ask for only a single thread to carry the computation, and we can accept the caller running the computation instead of ping-ponging with a worker thread. Because the rendezvous with a worker thread is a potential latency hiccup, we would like to avoid delegating work to worker threads unnecessarily. This improves latency on critical paths: the usual round-trip takes about 10..50 us on the systems I tried.
> 
> Nominally, we would like to handle the case where 1 worker is requested. But I would argue we would also like to do this even when requesting N (N > 1) threads to carry the computation. We can submit (N-1) tasks to workers and execute the remaining task in the caller. For lower N, this removes an additional rendezvous point with workers and allows the caller to complete the bulk of the work while workers are waking up. This would also simplify testing: if the caller path is always taken, this verifies that the task can indeed be executed by the caller.
> 
> We cannot, however, do this optimization unconditionally, because the caller thread might not be set up the same way the workers are, and executing the code in the caller might cause bugs. Therefore, it would be nice to have an opt-in option that allows running in the caller thread. Since this looks to be a property of the task, I added it to the task definition.
> 
> This PR contains only the infrastructure code additions, without product code behavior changes. The new option is sanity-tested by a new gtest. Other PRs can then opt tasks into this, for example #16882.
> 
> Additional testing:
>  - [x] New gtest
>  - [x] Linux x86_64 server fastdebug, `tier{1,2,3}`
>  - [x] Linux AArch64 server fastdebug, `tier{1,2,3}`
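
For readers following along, the dispatch scheme described in the quoted text (N-1 tasks handed to workers, one run by the caller, gated by a per-task opt-in) can be sketched roughly as below. This is a standalone illustration, not the PR's code: `Task`, `caller_can_run`, `run_task` and `RecordingTask` are hypothetical names, plain `std::thread` stands in for the `WorkerThreads` machinery, and giving the caller worker id 0 is an assumption based on the `task->work(0)` line in the diff.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical stand-in for a worker task; not the HotSpot task class.
struct Task {
  const bool caller_can_run;  // opt-in property carried by the task itself
  explicit Task(bool caller_can_run) : caller_can_run(caller_can_run) {}
  virtual ~Task() = default;
  virtual void work(uint32_t worker_id) = 0;
};

// Runs `task` once per logical worker. If the task opted in, the caller
// executes the id-0 share itself and only (num_workers - 1) threads start.
inline void run_task(Task* task, uint32_t num_workers) {
  assert(num_workers > 0);
  const bool use_caller = task->caller_can_run;
  const uint32_t first_worker_id = use_caller ? 1 : 0;

  std::vector<std::thread> workers;
  for (uint32_t id = first_worker_id; id < num_workers; ++id) {
    workers.emplace_back([task, id] { task->work(id); });
  }
  if (use_caller) {
    task->work(0);  // caller makes progress while the workers wake up
  }
  for (std::thread& t : workers) {
    t.join();  // rendezvous with the remaining workers
  }
}

// Example task that counts how many shares ran, and how many ran in the
// thread that constructed it (assumed to be the caller).
struct RecordingTask : Task {
  const std::thread::id caller;
  std::atomic<uint32_t> ran_in_caller{0};
  std::atomic<uint32_t> total{0};
  explicit RecordingTask(bool caller_can_run)
      : Task(caller_can_run), caller(std::this_thread::get_id()) {}
  void work(uint32_t) override {
    if (std::this_thread::get_id() == caller) {
      ran_in_caller.fetch_add(1);
    }
    total.fetch_add(1);
  }
};
```

With `num_workers == 1` and the opt-in set, no thread is started at all, which is the single-worker case the description is most concerned with.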

Changes requested by kbarrett (Reviewer).

src/hotspot/share/gc/shared/workerThread.cpp line 49:

> 47:   // No workers are allowed to read the state variables until they have been signaled.
> 48:   _task = task;
> 49:   _not_finished = num_worker_tasks;

[preexisting] The assignment of _not_finished should be done with an Atomic::store.
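
To illustrate the shape of that change outside the JVM sources, here is a minimal sketch that uses `std::atomic` as a stand-in for HotSpot's `Atomic` class. `DispatcherState` and its member functions are hypothetical, and the memory orderings chosen are assumptions for the sketch rather than a statement about what the dispatcher requires.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical miniature of the dispatcher's counter; not the HotSpot class.
class DispatcherState {
  std::atomic<uint32_t> _not_finished{0};

public:
  // Publish the number of outstanding worker tasks with an explicit atomic
  // store instead of a plain assignment, before any worker is signaled.
  void start(uint32_t num_worker_tasks) {
    _not_finished.store(num_worker_tasks, std::memory_order_relaxed);
  }

  // Called by each worker when its share is done.
  // Returns true for the worker that finishes last.
  bool worker_done() {
    assert(_not_finished.load(std::memory_order_relaxed) > 0);
    return _not_finished.fetch_sub(1, std::memory_order_acq_rel) == 1;
  }

  uint32_t not_finished() const {
    return _not_finished.load(std::memory_order_relaxed);
  }
};
```

The point is only that every access to the shared counter, including the initial write, goes through an atomic operation, so reads and writes of the variable are treated consistently.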

src/hotspot/share/gc/shared/workerThread.cpp line 63:

> 61:   if (use_caller) {
> 62:     // Execute task in caller.
> 63:     task->work(0);

The worker threads do this in the context of a GCIdMark.  Seems like that should be done here too.
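
Roughly what I have in mind, as a standalone sketch of the scoped-mark pattern: `ScopedGcId`, `current_gc_id`, `kUndefinedGcId` and `run_in_caller` are hypothetical stand-ins, not the real `GCIdMark` interface, and the thread-local variable only imitates per-thread GC id state.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-thread "current GC id" with a sentinel for "none".
constexpr uint32_t kUndefinedGcId = UINT32_MAX;
thread_local uint32_t current_gc_id = kUndefinedGcId;

// Scoped mark: sets the id for the current thread on construction and
// restores the previous value on destruction.
class ScopedGcId {
  const uint32_t _previous;

public:
  explicit ScopedGcId(uint32_t gc_id) : _previous(current_gc_id) {
    assert(gc_id != kUndefinedGcId);
    current_gc_id = gc_id;
  }
  ~ScopedGcId() { current_gc_id = _previous; }
  ScopedGcId(const ScopedGcId&) = delete;
  ScopedGcId& operator=(const ScopedGcId&) = delete;
};

// Caller-side execution wrapped in the same kind of mark a worker would use.
template <typename Work>
void run_in_caller(uint32_t gc_id, Work work) {
  ScopedGcId mark(gc_id);
  work(0);  // the caller takes worker id 0, as in the diff above
}
```

Anything the task does that depends on the current GC id, such as logging, then sees the same id in the caller as it would in a worker, and the caller's previous state is restored afterwards.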

src/hotspot/share/gc/shared/workerThread.cpp line 63:

> 61:   if (use_caller) {
> 62:     // Execute task in caller.
> 63:     task->work(0);

WorkerThreads have their priority set to NearMaxPriority, while the calling thread has whatever priority it has.
Priorities don't seem to matter much on unix-based platforms, but I _think_ they do matter on Windows.  Should we
temporarily adjust the calling thread's priority here?
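
If we did want that, a scoped save/raise/restore would be the obvious shape. The sketch below only models the idea: `FakeThread`, `ScopedPriority` and `run_boosted` are hypothetical, the priority is a plain field rather than a call into the OS, and the numeric values are illustrative.

```cpp
#include <cassert>

// Illustrative priority levels; the numeric values are assumptions.
enum Priority { NormPriority = 5, NearMaxPriority = 9 };

// Hypothetical thread record standing in for the calling thread.
struct FakeThread {
  Priority priority = NormPriority;
};

// Raises the thread's priority for the lifetime of the scope and
// restores the saved value afterwards.
class ScopedPriority {
  FakeThread& _thread;
  const Priority _saved;

public:
  ScopedPriority(FakeThread& thread, Priority raised)
      : _thread(thread), _saved(thread.priority) {
    _thread.priority = raised;
  }
  ~ScopedPriority() { _thread.priority = _saved; }
  ScopedPriority(const ScopedPriority&) = delete;
  ScopedPriority& operator=(const ScopedPriority&) = delete;
};

// Runs the caller's share of the work at worker priority.
template <typename Work>
void run_boosted(FakeThread& caller, Work work) {
  ScopedPriority boost(caller, NearMaxPriority);
  assert(caller.priority == NearMaxPriority);
  work();
}
```

Whether two extra priority changes per task are worth it on the latency-sensitive path this PR targets is a separate question.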

-------------

PR Review: https://git.openjdk.org/jdk/pull/16945#pullrequestreview-1763431174
PR Review Comment: https://git.openjdk.org/jdk/pull/16945#discussion_r1414534949
PR Review Comment: https://git.openjdk.org/jdk/pull/16945#discussion_r1414506984
PR Review Comment: https://git.openjdk.org/jdk/pull/16945#discussion_r1414544071


More information about the hotspot-gc-dev mailing list