RFR: 8306738: Select num workers for safepoint ParallelCleanupTask [v2]
Aleksey Shipilev
shade at openjdk.org
Wed May 3 11:44:15 UTC 2023
On Thu, 27 Apr 2023 12:36:54 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote:
>>> So we need to investigate the cost of each operation, the possibility (and granularity) of estimating the work for each task and when the cost of spinning up worker threads is amortised by the parallel speedup.
>>
>> Yes.
>>
>> I think the guiding principle here is that thread hand-off (delivering the wake-up signal, wake-up lag, waiting for worker thread termination) can easily take tens of microseconds. We established that during the Parallel Streams work in JDK 8. So for subtasks that only notify other, already-threaded subsystems, spinning up a worker thread just to deliver the notification to _another_ thread seems wasteful.
>>
>> I think the only subtasks that do actual work are lazy roots (since they walk an unbounded number of threads) and IC buffer cleanups (when they have work, they can walk lots of IC stubs). The compounding factor is that the string/symbol/oopstorage cleanup notification paths all take `ServiceLock`, so not only do we spin up worker threads unnecessarily to essentially deliver a flag update, we also contend those worker threads over that lock! Oops. I wonder if that is where part of the cleanup time spikes come from.
>>
>> I believe we don't need to go very deep into this. The crude patch I have above is probably already 80% there; we "only" need to handle the IC buffer updates there and test that this actually works across many scenarios. There is no need, IMO, to micro-optimize the stability of `ParallelCleanupTask::estimated_workers`, or change the way we claim the subtasks, etc., since from the ballpark experiments I ran, those things consume about a microsecond -- diminishing returns for the complexity a deeper rewrite would entail. Just estimating the number of safepoint workers gets us lots of bang for little buck :)
>
> Alright, my only fear is underestimating the work for some workload, and shipping a change that increases safepoint times because we selected too few workers.
@xmas92, what is your plan for this? It would be perfect to get it into JDK 21!
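For illustration, the crude estimation discussed above could look roughly like the sketch below. The function name, the per-256-threads scaling, and the inputs are all hypothetical placeholders, not HotSpot's actual code; the point is only that notification-only subtasks stay on the VM thread, while lazy roots and non-empty IC buffer cleanup justify extra workers:

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical sketch of a safepoint-cleanup worker estimate.
// Notification-only subtasks (string/symbol/oopstorage "cleanup requested"
// flag updates) are cheap enough for the VM thread alone; only subtasks
// with real work should cause worker threads to be woken up.
static unsigned estimated_cleanup_workers(size_t java_thread_count,
                                          size_t ic_buffer_refills,
                                          unsigned max_workers) {
  unsigned workers = 1;  // VM thread handles the notification-only subtasks

  // Lazy root processing walks every Java thread: scale workers with the
  // thread count (one worker per 256 threads is an illustrative ballpark).
  workers = std::max(workers, (unsigned)((java_thread_count + 255) / 256));

  // IC buffer cleanup only has work when there are stubs to release.
  if (ic_buffer_refills > 0) {
    workers = std::max(workers, 2u);
  }

  // Never exceed the configured worker pool size.
  return std::min(workers, max_workers);
}
```

With an estimate like this, a mostly idle safepoint stays single-threaded and avoids both the thread hand-off latency and the `ServiceLock` contention described above, while thread-heavy or IC-heavy safepoints still fan out.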
-------------
PR Comment: https://git.openjdk.org/jdk/pull/13616#issuecomment-1532878652
More information about the hotspot-runtime-dev
mailing list