RFR: 8256265: G1: Improve parallelism in regions that failed evacuation

Tue Jan 25 16:53:36 UTC 2022

On Wed, 12 Jan 2022 09:03:45 GMT, Hamlin Li <mli at openjdk.org> wrote:

> Currently G1 assigns one thread per region that failed evacuation. This can in effect serialize the whole process, as often (particularly with region pinning) there is only one region to fix up.
> 
> This patch tries to improve parallelism by walking over the regions in chunks.
> 
> The latest implementation scans regions in chunks to introduce parallelism. It is based on JDK-8278917, which changes G1 to use the prev bitmap to mark objects that failed evacuation.
> 
> Here's a summary of the performance data for the latest implementation: basically, it gives better and more stable performance than the baseline in the "Post Evacuate Cleanup 1/remove self forwardee" phase. (Some regression shows up when the results are calculated as a geomean, because one pause time from the baseline is far smaller than the others.)
> 
> The performance benefit trend is:
>  - pause time (Post Evacuate Cleanup 1) is decreased from 76.79% to 2.28% for average time, from 71.61% to 3.04% for geomean, when G1EvacuationFailureALotCSetPercent is changed from 2 to 90 (-XX:ParallelGCThreads=8)
>  - pause time (Post Evacuate Cleanup 1) is decreased from 63.84% to 15.16% for average time, from 55.41% to 12.45% for geomean, when G1EvacuationFailureALotCSetPercent is changed from 2 to 90 (-XX:ParallelGCThreads=<default=123>)
> ( Other common Evacuation Failure configurations are:
> -XX:+G1EvacuationFailureALot -XX:G1EvacuationFailureALotInterval=0 -XX:G1EvacuationFailureALotCount=0 )
> 
> For more detailed performance data, please check the related bug.

Just a recap of what the change adds:

* on evacuation failure, also records the number of bytes that failed evacuation in that region in a per-region live-map (using `G1RegionMarkStats`)
* at the start of Post Evacuation Cleanup 1 we flush that cache (ideally this would be done in `Merge PSS`, but we can't because we need it in the remove self forwards pointer task, which is potentially running in parallel)
* remove self forwards in Post Evacuation Cleanup 1 does roughly the following:
1) let the threads claim and "prepare" the region - mostly setting live bytes from that new per-region live map and "readying the region" (BOT reset, some statistics), then finally marking it as "ready"
2) wait for the region to become "ready"
3) let the threads claim parts ("chunks") of the region; these chunks are first generated using information from the region (and the bitmap). They contain information to handle the zapping and restoring, which is then immediately used by that thread.
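
To make the three steps above concrete, here is a minimal, standalone sketch of the claim/prepare/wait/chunk-claim pattern. It is not the code from the PR: `Region`, `fix_up_worker`, `fix_up_regions` and all field names are made up for illustration, plain `std::atomic` and `std::thread` stand in for HotSpot's own primitives, and the per-chunk work is only a placeholder.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for a region that failed evacuation (not HotSpot code).
struct Region {
  std::size_t num_chunks = 0;               // chunks this region is split into
  std::size_t live_bytes = 0;               // filled in during "prepare"
  std::atomic<bool> prepare_claimed{false}; // has some thread claimed step 1?
  std::atomic<bool> ready{false};           // is step 1 finished?
  std::atomic<std::size_t> next_chunk{0};   // next unclaimed chunk index
  std::atomic<std::size_t> chunks_done{0};  // chunks processed so far
};

// Body run by every worker. Returns how many chunks this worker handled.
std::size_t fix_up_worker(std::vector<Region>& regions,
                          const std::vector<std::size_t>& flushed_live_bytes) {
  assert(regions.size() == flushed_live_bytes.size());
  std::size_t handled = 0;
  for (std::size_t i = 0; i < regions.size(); ++i) {
    Region& r = regions[i];
    // 1) Exactly one thread wins the claim and prepares the region.
    if (!r.prepare_claimed.exchange(true)) {
      r.live_bytes = flushed_live_bytes[i];  // assumed: values from the flushed cache
      r.ready.store(true, std::memory_order_release);
    }
    // 2) All other threads wait until the region is "ready".
    while (!r.ready.load(std::memory_order_acquire)) {
      std::this_thread::yield();
    }
    // 3) Claim chunks until none are left in this region.
    for (;;) {
      std::size_t chunk = r.next_chunk.fetch_add(1);
      if (chunk >= r.num_chunks) break;
      // Placeholder for the per-chunk work (zapping, restoring).
      r.chunks_done.fetch_add(1);
      ++handled;
    }
  }
  return handled;
}

// Runs num_threads workers over the same set of regions.
void fix_up_regions(std::vector<Region>& regions,
                    const std::vector<std::size_t>& flushed_live_bytes,
                    unsigned num_threads) {
  std::vector<std::thread> workers;
  for (unsigned t = 0; t < num_threads; ++t) {
    workers.emplace_back([&regions, &flushed_live_bytes] {
      fix_up_worker(regions, flushed_live_bytes);
    });
  }
  for (std::thread& w : workers) w.join();
}
```

The point of the sketch is only that with a single failed region, all workers still get work in step 3, whereas a thread-per-region scheme would leave all but one idle. How chunks are actually sized and what each chunk carries is up to the real change.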

Fwiw, I did some hacking, adding lots of statistics output to it because I was a bit surprised by some of the numbers I saw (available at https://github.com/tschatzl/jdk/tree/pull/7047-evac-failure-chunking).

-------------

PR: https://git.openjdk.java.net/jdk/pull/7047