RFR: 8256265: G1: Improve parallelism in regions that failed evacuation [v18]

Albert Mingkun Yang ayang at openjdk.java.net
Wed Apr 6 07:53:48 UTC 2022


On Wed, 6 Apr 2022 06:31:26 GMT, Hamlin Li <mli at openjdk.org> wrote:

>> Currently G1 assigns a thread per failed evacuated region. This can in effect serialize the whole process as often (particularly with region pinning) there is only one region to fix up.
>> 
>> This patch tries to improve parallelism when walking over the regions in chunks
>> 
>> Latest implementation scans regions in chunks to bring parallelism, it's based on JDK-8278917 which changes to uses prev bitmap to mark evacuation failure objs.
>> 
>> Here's the summary of performance data based on latest implementation, basically, it brings better and stable performance than baseline at "Post Evacuate Cleanup 1/remove self forwardee" phase. (Although some regression is spotted when calculate the results in geomean, becuase one pause time from baseline is far too small than others.)
>> 
>> The performance benefit trend is:
>>  - pause time (Post Evacuate Cleanup 1) is decreased from 76.79% to 2.28% for average time, from 71.61% to 3.04% for geomean, when G1EvacuationFailureALotCSetPercent is changed from 2 to 90 (-XX:ParallelGCThreads=8)
>>  - pause time (Post Evacuate Cleanup 1) is decreased from 63.84% to 15.16% for average time, from 55.41% to 12.45% for geomean, when G1EvacuationFailureALotCSetPercent is changed from 2 to 90 (-XX:ParallelGCThreads=<default=123>)
>> ( Other common Evacuation Failure configurations are:
>> -XX:+G1EvacuationFailureALot -XX:G1EvacuationFailureALotInterval=0 -XX:G1EvacuationFailureALotCount=0 )
>> 
>> For more detailed performance data, please check the related bug.
>
> Hamlin Li has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 25 commits:
> 
>  - Merge branch 'master' into parallelize-evac-failure-in-bm
>  - some cleanup
>  - Remove unnecessary method
>  - Move prepare_regions to post cleanup 1 phase
>  - Fix compilation error
>  - Thomas review
>  - Albert review
>  - move 'chunks per region' log
>  - Albert review
>  - Thomas & Albert review
>  - ... and 15 more: https://git.openjdk.java.net/jdk/compare/741be461...4b672e66

Changes requested by ayang (Reviewer).

src/hotspot/share/gc/g1/g1EvacFailure.cpp line 218:

> 216: 
> 217:     G1GCPhaseTimes* p = _g1h->phase_times();
> 218:     p->record_or_add_thread_work_item(G1GCPhaseTimes::RemoveSelfForwardsInChunks, _worker_id, rspc.marked_words(), G1GCPhaseTimes::RemoveSelfForwardObjectsBytes);

It seems a unit mismatch, `words` vs `bytes`.

src/hotspot/share/gc/g1/g1EvacFailure.hpp line 38:

> 36: // Task to fixup self-forwarding pointers
> 37: // installed as a result of an evacuation failure.
> 38: class G1ParRemoveSelfForwardPtrsTask: public WorkerTask {

Sort of preexisting, I don't get why it needs to inherit from `WorkerTask`.

src/hotspot/share/gc/g1/g1HeapRegionChunk.cpp line 55:

> 53: 
> 54: void G1ScanChunksInHeapRegions::process_chunk(G1HeapRegionChunkClosure* chunk_closure, uint chunk_id, uint worker_id) {
> 55:   G1CollectedHeap* glh = G1CollectedHeap::heap();

Should be `g1h` not `glh`.

src/hotspot/share/gc/g1/g1HeapRegionChunk.cpp line 84:

> 82: 
> 83:   _chunks_per_region = next_power_of_2(num_workers * G1RemoveSelfForwardPtrsThreadLoadFactor / evac_failure_regions_length);
> 84:   _chunk_size = static_cast<uint>(G1HeapRegionSize / _chunks_per_region);

I believe `HeapRegion::GrainWords` should be used.

src/hotspot/share/gc/g1/g1YoungGCPostEvacuateTasks.cpp line 103:

> 101:   G1ParRemoveSelfForwardPtrsTask _task;
> 102:   G1EvacFailureRegions* _evac_failure_regions;
> 103:   const char* _task_name;

I don't see where it's used.

src/hotspot/share/gc/g1/g1YoungGCPostEvacuateTasks.cpp line 126:

> 124: };
> 125: 
> 126: const char* G1PostEvacuateCollectionSetCleanupTask1::_name = "Post Evacuate Cleanup 1";

If the above `_task_name` can be removed, this can be restored IMO.

src/hotspot/share/gc/g1/g1_globals.hpp line 382:

> 380:            section of the heap.")                                           \
> 381:                                                                             \
> 382:   product(uint, G1RemoveSelfForwardPtrsWorkerCost, 2, DIAGNOSTIC,           \

Looking at how it's used, `G1RemoveSelfForwardPtrsWorkerCost` is essentially #workers per evac-fail regions. The current name/doc msg exposes too much impl detail and is not end-user friendly. How about "G1PerRetainedRegionThreads: The number of worker threads used for each retained (evacuation failure) region, subject to ParallelGCThreads constraint"?

src/hotspot/share/gc/g1/g1_globals.hpp line 387:

> 385:           range(1, 16)                                                      \
> 386:                                                                             \
> 387:   product(uint, G1RemoveSelfForwardPtrsThreadLoadFactor, 16, DIAGNOSTIC,    \

I can't comprehend `G1RemoveSelfForwardPtrsThreadLoadFactor`, even after seeing how it is used in the impl. I get its purpose though: calculating a good chunk-size, small enough to have sufficient parallelism within a region but also large enough to keep bookkeeping/sync cost low. This reminds me of `G1RemSetScanState::get_chunks_per_region`. I wonder if we can reuse that heuristic for now.

-------------

PR: https://git.openjdk.java.net/jdk/pull/7047



More information about the hotspot-gc-dev mailing list