RFR: 8321242: Enable WorkerThreads to run tasks in caller thread [v5]
Aleksey Shipilev
shade at openjdk.org
Wed Dec 6 15:27:36 UTC 2023
On Wed, 6 Dec 2023 14:24:32 GMT, Stefan Karlsson <stefank at openjdk.org> wrote:
> I had hoped that we could get rid of even more special handling of the caller in the WorkerTaskDispatcher, but I see that the proposed patch still tweaks `_num_finished` and runs the caller class without partaking in the counter logic.
Sorry, I don't see "even more special handling". Caller _does_ participate in counting `_started` and `_not_finished` now. The only thing that caller specifically does is reserving one task for itself (by issuing one less permit at `_start_semaphore`), because...
> I guess that the motivation for this is so that we skip all signaling and skip potentially waking up a worker when it isn't needed. I think I would prefer if we had a special-case for this, and make sure that all threads that participate in the WorkerTaskDispatcher use signaling and accounting.
...the problem is not just waking up the workers unnecessarily, but dealing with the situation when caller wakes up a worker, loses a tiny race against the waking up worker, and then realizes that worker started executing the task. This would miss the optimization we actually wanted to begin with: avoiding the round-trip to worker thread for executing a single task.
I see your https://github.com/openjdk/jdk/commit/c1c7295eb0e5bb5b3ad35bfae8316cbaed62cacb avoids this for `workers=1` case by special-handling it, but the problem manifests for threads=2 cases as well.
# This PR:
Test with 2 workers:
only workers:
88857.521 us total; 8.886 us avg; 19.308 us max
91031.770 us total; 9.103 us avg; 19.632 us max
92511.074 us total; 9.251 us avg; 18.840 us max
88850.082 us total; 8.885 us avg; 20.686 us max
88864.039 us total; 8.886 us avg; 19.132 us max
workers + caller:
16625.124 us total; 1.663 us avg; 11.781 us max
16658.809 us total; 1.666 us avg; 9.663 us max
16434.287 us total; 1.643 us avg; 11.400 us max
16498.131 us total; 1.650 us avg; 9.998 us max
16496.808 us total; 1.650 us avg; 13.919 us max
# Mixing in: https://github.com/openjdk/jdk/commit/c1c7295eb0e5bb5b3ad35bfae8316cbaed62cacb
Test with 2 workers:
only workers:
99672.791 us total; 9.967 us avg; 19.332 us max
95419.048 us total; 9.542 us avg; 19.795 us max
64211.455 us total; 6.421 us avg; 20.518 us max
63979.810 us total; 6.398 us avg; 46.090 us max
92039.049 us total; 9.204 us avg; 32.014 us max
workers + caller:
31425.773 us total; 3.143 us avg; 10.903 us max
31333.930 us total; 3.133 us avg; 17.883 us max
31492.294 us total; 3.149 us avg; 12.184 us max
31560.567 us total; 3.156 us avg; 12.971 us max
31555.060 us total; 3.156 us avg; 12.468 us max
> Yet another alternative is to limit the change to only let the caller run if the task is only supposed to run with one worker.
Yeah, I started from that version. But then I realized that doing N-1 instead on N round-trips to worker threads is also beneficial, since every round-trip is a latency hiccup waiting to happen.
But I also think we have to consider reliability/testability: if we make a special case for `num_tasks = 1`, the whole thing would actually be less testable. You would need to make sure that every time you have a task that can be executed in caller, _some testing configuration_ would execute it with single-threaded worker pool to cover caller-run path. Always taking the task in caller, if we can, makes caller-run paths exercised much more often.
So at this point I am not in favor of special-casing the single-task path.
-------------
PR Comment: https://git.openjdk.org/jdk/pull/16945#issuecomment-1843107550
More information about the hotspot-gc-dev
mailing list