RFR: 8256265: G1: Improve parallelism in regions that failed evacuation [v14]

Fri Mar 18 08:32:36 UTC 2022

On Fri, 18 Mar 2022 08:02:12 GMT, Hamlin Li <mli at openjdk.org> wrote:

>> Currently G1 assigns a thread per failed evacuated region. This can in effect serialize the whole process as often (particularly with region pinning) there is only one region to fix up.
>> 
>> This patch tries to improve parallelism when walking over the regions in chunks
>> 
>> Latest implementation scans regions in chunks to bring parallelism, it's based on JDK-8278917 which changes to uses prev bitmap to mark evacuation failure objs.
>> 
>> Here's the summary of performance data based on latest implementation, basically, it brings better and stable performance than baseline at "Post Evacuate Cleanup 1/remove self forwardee" phase. (Although some regression is spotted when calculate the results in geomean, becuase one pause time from baseline is far too small than others.)
>> 
>> The performance benefit trend is:
>>  - pause time (Post Evacuate Cleanup 1) is decreased from 76.79% to 2.28% for average time, from 71.61% to 3.04% for geomean, when G1EvacuationFailureALotCSetPercent is changed from 2 to 90 (-XX:ParallelGCThreads=8)
>>  - pause time (Post Evacuate Cleanup 1) is decreased from 63.84% to 15.16% for average time, from 55.41% to 12.45% for geomean, when G1EvacuationFailureALotCSetPercent is changed from 2 to 90 (-XX:ParallelGCThreads=<default=123>)
>> ( Other common Evacuation Failure configurations are:
>> -XX:+G1EvacuationFailureALot -XX:G1EvacuationFailureALotInterval=0 -XX:G1EvacuationFailureALotCount=0 )
>> 
>> For more detailed performance data, please check the related bug.
>
> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Fix compilation error

> Thanks for the detailed reviews. :)
> 
> I'm not sure if it's feasible to move `prepare_regions` into post_evacuate_cleanup_1 as a G1BatchedTask. My concern is that in `prepare_regions()->PrepareEvacFailureRegionTask::prepare_region()->HeapRegion::note_self_forwarding_removal_start()`, nTAMS is set, and nTAMS is used _cleanup 1_ phase at RemoveSelfForwardPtrObjClosure to set next bitmap. If move `prepare_regions` into post_evacuate_cleanup_1 as a G1BatchedTask, there is no guarantee that preparation will be done before the usage of nTAMS in RemoveSelfForwardPtrObjClosure. How do you think about it?

I believe this is an unnecessary dependency.

`PrepareEvacFailureRegionTask::prepare_region()->HeapRegion::note_self_forwarding_removal_start` sets nTAMS to `top()` unconditionally with the intent that the mark always happens.

So instead of calling `_cm->mark_in_next_bitmap(_worker_id, obj);` which checks nTAMS, just *unconditionally* mark the next bitmap (not sure if there is already a method for this) to achieve the same effect.

The alternative would be to add a new `G1BatchedTask` for just this preparation, which seems much more work not only in terms of code, but also much more work spinning up threads.

Of course, the use of this "raw" mark method needs to be documented.

Fwiw, in the protoype we have for [JDK-8210708](https://bugs.openjdk.java.net/browse/JDK-8210708), which looks fairly good at this point, a similar change would be needed anyway.

Thanks,
  Thomas

-------------

PR: https://git.openjdk.java.net/jdk/pull/7047