RFR: 8306738: Select num workers for safepoint ParallelCleanupTask [v2]
Axel Boldt-Christmas
aboldtch at openjdk.org
Tue Apr 25 14:34:08 UTC 2023
On Tue, 25 Apr 2023 11:46:45 GMT, Thomas Schatzl <tschatzl at openjdk.org> wrote:
>> It is a "known" issue that if the particular task does not select its number of threads properly before executing the task (or not at all like this task) that there may be perf issues.
>>
>> If this is what you want I would suggest that `active_workers()` as the limit preserves the current behavior best. Otherwise the task should calculate and set its optimal number of threads itself, as it does in this change.
>>
>> Not sure if setting it to `SAFEPOINT_CLEANUP_NUM_TASKS` is a good idea if some/most of this work is trivial/empty but it's probably a good first guess.
>>
>> The only GC that does not seem to set safepoint_workers to something non-default is Shenandoah (the STW collectors use the "main" worker threads they also use for evacuation, so this should be maxed out mostly, and ZGC explicitly sets them to max_workers at initialization). (Via code inspection, so I might be wrong.)
>
> @xmas92: what is the reason for this change, did you see any situation where an inappropriate amount (too few) number of threads were used? Just curious.
@tschatzl I noticed the behaviour when I added JFR events for worker thread activity while profiling generational ZGC. There we have separate WorkerThreads for each generation, and a pool of "Runtime" WorkerThreads which is only used for `CollectedHeap::safepoint_workers()`. We do not configure this and only give the VM `ParallelGCThreads` workers.
It seems correct to respect this flag for heap inspection and heap dump, but it seemed strange and unnecessary to spin up all workers when we know there is a limit to the amount of parallel work. (Just looking at the code, heap inspection always uses one worker, and the heap dumper uses the active number of workers.)
Overall, my thoughts on which number to use are:
* **Active**: A bit of a strange API, because it depends on some externally set state. It is the current behaviour, so maybe it should be kept and changed in a different PR.
* **Created**: A sort of best effort: only use as many workers as the system has created so far. This also seems to respect `UseDynamicNumberOfGCThreads` better.
* **Max**: We are a task that requires X workers; we have at least X workers in `CollectedHeap::safepoint_workers()`, so let us use them. Sort of the purest API, and the easiest to reason about.
The idea with this PR was mostly to put an upper limit on the number of workers. So I will change it to use `active_workers`.
Maybe `expected_workers()` is worthwhile; I have no data on this. This PR was not motivated by any performance numbers, only by the mismatch between the number of workers used and the amount of work available.
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/13616#discussion_r1176603865
More information about the hotspot-runtime-dev mailing list