RFR: 8256265 G1: Improve parallelism in regions that failed evacuation [v2]

Hamlin Li mli at openjdk.java.net
Wed Dec 15 12:10:02 UTC 2021

On Thu, 2 Dec 2021 01:51:42 GMT, Hamlin Li <mli at openjdk.org> wrote:

>> Summary
>> -------
>> Currently G1 assigns a thread per failed evacuated region. This can in effect serialize the whole process as often (particularly with region pinning) there is only one region to fix up.
>> Try to improve the parallelism when walking over the regions by:
>>  - first, splitting a region into tasks;
>>  - then, processing these tasks in parallel and load balancing them among GC threads;
>>  - last, doing the necessary cleanup.
>> NOTE: the load-balancing part of the code is almost the same as in G1ParScanThreadState; if necessary and feasible, consider refactoring this part into a shared code base.
>> Performance Test
>> -------
>> The perf test, based on the latest implementation + JDK-8277736, shows that:
>>  - when `ParallelGCThreads`=32 and `G1EvacuationFailureALotCSetPercent` <= 50, the parallelism brings more benefit than regression;
>>  - when `ParallelGCThreads`=128, the parallelism brings more benefit than regression whatever `G1EvacuationFailureALotCSetPercent` is;
>> Other related evac failure VM options:
>>  - `G1EvacuationFailureALotInterval`=1
>>  - `G1EvacuationFailureALotCount`=1
>> For detailed perf test results, please check:
>>  - https://bugs.openjdk.java.net/secure/attachment/97227/parallel.evac.failure-threads.32.png
>>  - https://bugs.openjdk.java.net/secure/attachment/97228/parallel.evac.failure-threads.128.png
>> For situations like `G1EvacuationFailureALotCSetPercent` > 50 with `ParallelGCThreads`=32, we could fall back to the current implementation, or further optimize the thread sizing in this phase if necessary.
>> NOTE: I don't include perf data for `Remove Self Forwards`, because comparing the pause time of this phase does not show the improvement well; I think the reason is that the original implementation is not load balanced, while the new implementation is. But since `Remove Self Forwards` is part of `Post Evacuate Cleanup 1`, only `Post Evacuate Cleanup 1` shows the improvement of the new implementation well.
>> Refining the pause time data in the `Remove Self Forwards` phase could be a potential improvement.
> Hamlin Li has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains ten commits:
>  - Merge branch 'master' into parallelize-evac-failure
>  - Adjust worker cost by a factor; initialize task queues set and terminator threads by active workers
>  - Fix wrong merge
>  - Merge with master
>  - Remove and merge code of G1ParRemoveSelfForwardPtrsTask into RemoveSelfForwardPtrsTask
>  - Fix crashes in ~G1GCParPhaseTimesTracker(), G1PreRemoveSelfForwardClosure::do_heap_region, G1CollectedHeap::par_iterate_regions_array()=>~StubRoutines::atomic entry points; Refine comments
>  - Fix inconsistent length between task queues and terminator
>  - Fix crash when heap verification; Fix compilation error; Refine comments
>  - Initial commit

I have found the root cause of the inefficiency of parallelism in chunks in my PoC; it's not related to the bitmap.
I have also fixed this fake-parallelism issue in my PoC, which is based on yours. Now parallelism in chunks brings much improvement when G1EvacuationFailureALotCSetPercent is low (e.g. below 25), and it does not bring much regression when G1EvacuationFailureALotCSetPercent is high. The high case might be improved further, or at least we can fall back to the previous version (your PoC) when we detect at runtime that G1EvacuationFailureALotCSetPercent is high.

So, I will send out my PoC of parallelism in chunks after your PoC is integrated.
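
To make the "parallelism in chunks" idea concrete, here is a minimal sketch of the three steps from the quoted summary (split into tasks, process in parallel with load balancing, cleanup). It is not the HotSpot code: the real G1 code is C++, and every name here (ChunkedRegionSketch, Chunk, split, processInParallel, the region and chunk sizes) is hypothetical. It also swaps in a simpler technique: workers claim chunks from one shared atomic counter instead of using per-thread task queues with stealing, as the G1ParScanThreadState-style code mentioned above does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

final class ChunkedRegionSketch {
    /** Half-open range [start, end) of word indices inside one region (hypothetical type). */
    static final class Chunk {
        final int regionId, start, end;
        Chunk(int regionId, int start, int end) {
            this.regionId = regionId; this.start = start; this.end = end;
        }
    }

    /** Step 1: split every failed region into fixed-size chunks (the last one may be shorter). */
    static List<Chunk> split(int[] failedRegionIds, int regionWords, int chunkWords) {
        List<Chunk> chunks = new ArrayList<>();
        for (int id : failedRegionIds) {
            for (int start = 0; start < regionWords; start += chunkWords) {
                chunks.add(new Chunk(id, start, Math.min(start + chunkWords, regionWords)));
            }
        }
        return chunks;
    }

    /** Step 2: each worker claims the next unclaimed chunk until none remain. Returns words processed. */
    static long processInParallel(List<Chunk> chunks, int workers) {
        AtomicInteger next = new AtomicInteger();          // shared claim counter
        AtomicLong wordsProcessed = new AtomicLong();
        Thread[] threads = new Thread[workers];
        for (int w = 0; w < workers; w++) {
            threads[w] = new Thread(() -> {
                int i;
                while ((i = next.getAndIncrement()) < chunks.size()) {
                    Chunk c = chunks.get(i);
                    // Placeholder for the real per-chunk fix-up work.
                    wordsProcessed.addAndGet(c.end - c.start);
                }
            });
            threads[w].start();
        }
        try {
            for (Thread t : threads) t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        // Step 3: any per-region cleanup would run here, once, after all workers are done.
        return wordsProcessed.get();
    }

    public static void main(String[] args) {
        List<Chunk> chunks = split(new int[] {7}, 1000, 256);
        long words = processInParallel(chunks, 4);
        System.out.println(chunks.size() + " chunks, " + words + " words processed");
    }
}
```

With a single failed region, the per-region scheme would keep one thread busy, while here all four workers can claim chunks of that same region.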


PR: https://git.openjdk.java.net/jdk/pull/6627
