RFR: 8256265: G1: Improve parallelism in regions that failed evacuation [v7]

Thomas Schatzl tschatzl at openjdk.java.net
Thu Feb 17 16:11:05 UTC 2022


On Mon, 14 Feb 2022 12:10:51 GMT, Hamlin Li <mli at openjdk.org> wrote:

>> Currently G1 assigns one thread per failed evacuated region. This can in effect serialize the whole process, since often (particularly with region pinning) there is only one region to fix up.
>> 
>> This patch tries to improve parallelism by walking over the regions in chunks.
>> 
>> The latest implementation scans regions in chunks to bring parallelism; it is based on JDK-8278917, which changes G1 to use the prev bitmap to mark evacuation failure objects.
>> 
>> Here's a summary of the performance data based on the latest implementation. Overall it brings better and more stable performance than the baseline in the "Post Evacuate Cleanup 1/remove self forwardee" phase. (Some regression shows up when the results are aggregated with geomean, because one pause time from the baseline is far smaller than the others.)
>> 
>> The performance benefit trend is:
>>  - the pause time (Post Evacuate Cleanup 1) reduction goes from 76.79% down to 2.28% for average time, and from 71.61% down to 3.04% for geomean, as G1EvacuationFailureALotCSetPercent is changed from 2 to 90 (-XX:ParallelGCThreads=8)
>>  - the pause time (Post Evacuate Cleanup 1) reduction goes from 63.84% down to 15.16% for average time, and from 55.41% down to 12.45% for geomean, as G1EvacuationFailureALotCSetPercent is changed from 2 to 90 (-XX:ParallelGCThreads=<default=123>)
>> ( Other common Evacuation Failure configurations are:
>> -XX:+G1EvacuationFailureALot -XX:G1EvacuationFailureALotInterval=0 -XX:G1EvacuationFailureALotCount=0 )
>> 
>> For more detailed performance data, please check the related bug.
>
> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Clean code; adapt to new bot implementation; others

Hi,

> My test (with the latest implementation) shows that when the number of evacuation failure regions is less than the number of parallel GC threads, it brings a stable benefit in the Post 1 phase; but when the number of evacuation failure regions is more than the number of parallel GC threads, the benefit is not stable and can even bring some regression in the Post 1 phase. I think this result is reasonable: when there are more evacuation failure regions than parallel GC threads, parallelism at the region level already assigns some regions to every GC thread, i.e. it is already fully parallelized to some degree; whether parallelism at the chunk level brings more benefit then depends on the distribution of evacuation failure objects in the regions. Conversely, when there are fewer evacuation failure regions, parallelism at the region level cannot assign every GC thread an evacuation failure region to process; in this situation parallelism at the chunk level can bring more benefit, and the benefit is stable.
> 
> A simple heuristic is to switch to the original implementation, i.e. parallelize only at the region level, when we detect that the number of evacuation failure regions is more than the number of parallel GC threads. The advantage is that it avoids consuming extra CPU on unnecessary parallelism at the chunk level. The drawback is that it leaves two pieces of code: parallelism over regions, and parallelism over chunks.
> 
> What do you think about it?

I agree it is unnecessary to slice up the work into too many work units. One option is to determine the number of chunks per region depending on the number of failed regions and the number of threads, e.g. so that the target total number of chunks is some (small) multiple of the number of threads.

Something like `ChunksPerRegion = next_log_2(#threads-actually-launched * 10 / #regions-retained)` maybe?

That `10` is just some arbitrary value (I also hope I got the formula right), but I think you get the idea.
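To make the idea concrete, here is a minimal, self-contained sketch of that heuristic (the names `round_up_power_of_2` and `chunks_per_region` are hypothetical, the factor `10` is the arbitrary multiplier from above, and I am reading `next_log_2` as rounding the per-region chunk count up to the next power of two):

```c++
#include <algorithm>
#include <cstdint>

// Round value up to the next power of two (returns 1 for value <= 1).
static uint32_t round_up_power_of_2(uint32_t value) {
  uint32_t result = 1;
  while (result < value) {
    result <<= 1;
  }
  return result;
}

// Hypothetical heuristic: choose the number of chunks per region so that the
// total number of chunks across all retained regions is roughly a small
// multiple (here 10, an arbitrary value) of the number of launched workers.
static uint32_t chunks_per_region(uint32_t workers_launched,
                                  uint32_t regions_retained) {
  if (regions_retained == 0) {
    return 1;  // Nothing to chunk; avoid dividing by zero.
  }
  uint32_t target_total_chunks = workers_launched * 10;
  uint32_t per_region =
      std::max<uint32_t>(target_total_chunks / regions_retained, 1);
  return round_up_power_of_2(per_region);
}
```

For example, with 8 launched workers and 2 retained regions this gives 64 chunks per region; with 8 workers and 100 retained regions it degenerates to 1 chunk per region, i.e. effectively the original region-level parallelism, which matches the observation above that chunking mainly helps when there are fewer failed regions than threads.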

Note that this is a bit of a circular calculation, as the #threads is sort-of determined by the number of regions... :D Maybe cap the number of threads started (`worker_cost()`) at some small multiple of the number of regions retained (which is already done, but I'd probably bump that multiple a bit).
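Again purely as an illustration (the shape of `worker_cost()` here and the multiplier are made up, not the actual task code), capping the requested worker count could look like:

```c++
#include <algorithm>
#include <cstdint>

// Hypothetical cap: request at most a small multiple of the number of
// retained regions as workers. Choosing the worker count first, and only
// then the chunk count, breaks the circularity mentioned above.
static uint32_t worker_cost(uint32_t regions_retained, uint32_t max_workers) {
  const uint32_t workers_per_region = 2;  // small multiple; value made up
  return std::min<uint32_t>(
      std::max<uint32_t>(regions_retained * workers_per_region, 1),
      max_workers);
}
```

The worker count computed this way could then be fed into `chunks_per_region()` from the sketch above.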

Thomas

-------------

PR: https://git.openjdk.java.net/jdk/pull/7047


