RFR: 8256265: G1: Improve parallelism in regions that failed evacuation [v8]

Hamlin Li mli at openjdk.java.net
Sat Feb 26 09:32:53 UTC 2022


On Fri, 18 Feb 2022 12:00:29 GMT, Hamlin Li <mli at openjdk.org> wrote:

>> Currently G1 assigns a thread per failed evacuated region. This can in effect serialize the whole process as often (particularly with region pinning) there is only one region to fix up.
>> 
>> This patch tries to improve parallelism when walking over the regions in chunks
>> 
>> Latest implementation scans regions in chunks to bring parallelism, it's based on JDK-8278917 which changes to uses prev bitmap to mark evacuation failure objs.
>> 
>> Here's the summary of performance data based on latest implementation, basically, it brings better and stable performance than baseline at "Post Evacuate Cleanup 1/remove self forwardee" phase. (Although some regression is spotted when calculate the results in geomean, becuase one pause time from baseline is far too small than others.)
>> 
>> The performance benefit trend is:
>>  - pause time (Post Evacuate Cleanup 1) is decreased from 76.79% to 2.28% for average time, from 71.61% to 3.04% for geomean, when G1EvacuationFailureALotCSetPercent is changed from 2 to 90 (-XX:ParallelGCThreads=8)
>>  - pause time (Post Evacuate Cleanup 1) is decreased from 63.84% to 15.16% for average time, from 55.41% to 12.45% for geomean, when G1EvacuationFailureALotCSetPercent is changed from 2 to 90 (-XX:ParallelGCThreads=<default=123>)
>> ( Other common Evacuation Failure configurations are:
>> -XX:+G1EvacuationFailureALot -XX:G1EvacuationFailureALotInterval=0 -XX:G1EvacuationFailureALotCount=0 )
>> 
>> For more detailed performance data, please check the related bug.
>
> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Thomas review

> `_evac_failure_regions->par_iterate_chunks_in_regions(&chunk_closure, worker_id);` is essentially:
> 
> ```c++
> for (r : evac_fail_regions) {
>   for (chunk : r->chunks) {
>     if (chunk.try_claim()) {
>       chunk_closure(chunk);
>     }
>   }
> }
> ```
> 
> However, I don't see a compelling reason for a two-level iteration, first on all regions and second on chunks inside one region.
> 
> I wonder if sth like the following works. IMO, it expresses the intent more clearly and gets rid of the def/use of closures.
> 
> ```c++
> auto start_chunk_id = worker_id * total_chunks / n_workers;
> 
> for (auto i = 0; i < total_chunks; ++i) {
>   auto chunk_id = (start_chunk_id + i) % total_chunks;
>   G1HeapRegionChunk chunk(chunk_id, ...); // I think the constructor does too much currently
>   if (!chunk.try_claim())
>     continue;
>   // claimed
>   process_chunk(chunk);
> }
> ```
> 
> (This gist is just to show the structure; probably contains bugs.)

Thanks, I agree, will upate the patch as you suggested.

-------------

PR: https://git.openjdk.java.net/jdk/pull/7047



More information about the hotspot-gc-dev mailing list