RFR: 8256265 G1: Improve parallelism in regions that failed evacuation [v2]
Thomas Schatzl
tschatzl at openjdk.java.net
Fri Dec 3 12:29:16 UTC 2021
On Thu, 2 Dec 2021 01:51:42 GMT, Hamlin Li <mli at openjdk.org> wrote:
>> Summary
>> -------
>>
>> Currently G1 assigns a thread per failed evacuated region. This can in effect serialize the whole process as often (particularly with region pinning) there is only one region to fix up.
>> Try to improve the parallelsim when walking over the regions by
>>
>> - first, split a region into tasks;
>> - then, process these task in parallel and load balance among GC threads;
>> - last, necessary cleanup
>>
>> NOTE: load balance part of code is almost same as G1ParScanThreadState, if necessary and feasible, consider to refactor this part into a shared code base.
>>
>> Performance Test
>> -------
>>
>> The perf test based on lastest implementation + JDK-8277736 shows that:
>>
>> - when `ParallelGCThreads`=32, when `G1EvacuationFailureALotCSetPercent` <= 50, the parallelism bring more benefit than regression;
>> - when `ParallelGCThreads`=128, whatever `G1EvacuationFailureALotCSetPercent` is, the parallelism bring more benefit than regression;
>>
>> other related evac failure vm options:
>> - `G1EvacuationFailureALotInterval`=1
>> - `G1EvacuationFailureALotCount`=1
>>
>> For detailed perf test result, please check:
>>
>> - https://bugs.openjdk.java.net/secure/attachment/97227/parallel.evac.failure-threads.32.png
>> - https://bugs.openjdk.java.net/secure/attachment/97228/parallel.evac.failure-threads.128.png
>>
>> For the situation like G1EvacuationFailureALotCSetPercent > 50 and ParallelGCThreads=32 , we could fall back to current implmentation, or further optimize the thread sizing at this phase if necessary.
>>
>> NOTE: I don't include perf data for `Remove Self Forwards`, because the comparison of pause time in this phase does not well show the improvement of this implementation, I think the reason is that the original implementation is not load balanced, and the new implementation is. But as `Remove Self Forwards` is part of `Post Evacuate Cleanup 1`, so only `Post Evacuate Cleanup 1` well show the improvement of the new implementation.
>> It could be a potential improvement to refine the Pause time data in `Remove Self Forwards` phase.
>
> Hamlin Li has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains ten commits:
>
> - Merge branch 'master' into parallelize-evac-failure
> - Adjust worker cost by a factor; initialize task queues set and terminator threads by active workers
> - Fix wrong merge
> - Merge with master
> - Remove and merge code of G1ParRemoveSelfForwardPtrsTask into RemoveSelfForwardPtrsTask
> - Fix crashes in ~G1GCParPhaseTimesTracker(), G1PreRemoveSelfForwardClosure::do_heap_region, G1CollectedHeap::par_iterate_regions_array()=>~StubRoutines::atomic entry points; Refine comments
> - Fix inconsistent length between task queues and terminator
> - Fix crash when heap verification; Fix compilation error; Refine comments
> - Initial commit
Hi again,
>
> * maybe also the "Remove Self Forwards" phase could get some additional information about how many regions/tasks/bytes/objects it worked on (just throwing some ideas here, some may be redundant)
>
> * also Termination information for any phase that uses a `TaskTerminator` is a must; also, the `TASKQUEUE_STATS` mechanism should be supported in a useful way.
It would also be very interesting to get information to differentiate how long the sorting and how long the actual self-forward removal of that phase took. Additional interesting information is memory consumption of the segment allocators/sorting. If significant compared to other allocation, maybe we could even separate them out in a separate category (not sure, I do not really think so, just thinking out loud).
If you need help with adding gc pause log statistics, ping me; for timing information it should be sufficient to add enum values to `G1GCPhaseTimes::GCParPhases`, and initialize them like the others at the appropriate places; counts can be added as "thread work item" like for the `ScanHR` enum value.
The current evacuation failure handling implementation is already really good compared to 17 due to your hard work, now I guess it starts to become necessary knowing more details for further investigation and in general, for analysis and troubleshooting by users now and later :-).
Thanks a lot for your effort,
Thomas
-------------
PR: https://git.openjdk.java.net/jdk/pull/6627
More information about the hotspot-gc-dev
mailing list