RFR: 8256265: G1: Improve parallelism in regions that failed evacuation [v18]
Albert Mingkun Yang
ayang at openjdk.java.net
Wed Apr 6 07:53:48 UTC 2022
On Wed, 6 Apr 2022 06:31:26 GMT, Hamlin Li <mli at openjdk.org> wrote:
>> Currently G1 assigns a thread per failed evacuated region. This can in effect serialize the whole process as often (particularly with region pinning) there is only one region to fix up.
>>
>> This patch tries to improve parallelism when walking over the regions in chunks
>>
>> Latest implementation scans regions in chunks to bring parallelism, it's based on JDK-8278917 which changes to uses prev bitmap to mark evacuation failure objs.
>>
>> Here's the summary of performance data based on latest implementation, basically, it brings better and stable performance than baseline at "Post Evacuate Cleanup 1/remove self forwardee" phase. (Although some regression is spotted when calculate the results in geomean, becuase one pause time from baseline is far too small than others.)
>>
>> The performance benefit trend is:
>> - pause time (Post Evacuate Cleanup 1) is decreased from 76.79% to 2.28% for average time, from 71.61% to 3.04% for geomean, when G1EvacuationFailureALotCSetPercent is changed from 2 to 90 (-XX:ParallelGCThreads=8)
>> - pause time (Post Evacuate Cleanup 1) is decreased from 63.84% to 15.16% for average time, from 55.41% to 12.45% for geomean, when G1EvacuationFailureALotCSetPercent is changed from 2 to 90 (-XX:ParallelGCThreads=<default=123>)
>> ( Other common Evacuation Failure configurations are:
>> -XX:+G1EvacuationFailureALot -XX:G1EvacuationFailureALotInterval=0 -XX:G1EvacuationFailureALotCount=0 )
>>
>> For more detailed performance data, please check the related bug.
>
> Hamlin Li has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 25 commits:
>
> - Merge branch 'master' into parallelize-evac-failure-in-bm
> - some cleanup
> - Remove unnecessary method
> - Move prepare_regions to post cleanup 1 phase
> - Fix compilation error
> - Thomas review
> - Albert review
> - move 'chunks per region' log
> - Albert review
> - Thomas & Albert review
> - ... and 15 more: https://git.openjdk.java.net/jdk/compare/741be461...4b672e66
Changes requested by ayang (Reviewer).
src/hotspot/share/gc/g1/g1EvacFailure.cpp line 218:
> 216:
> 217: G1GCPhaseTimes* p = _g1h->phase_times();
> 218: p->record_or_add_thread_work_item(G1GCPhaseTimes::RemoveSelfForwardsInChunks, _worker_id, rspc.marked_words(), G1GCPhaseTimes::RemoveSelfForwardObjectsBytes);
It seems a unit mismatch, `words` vs `bytes`.
src/hotspot/share/gc/g1/g1EvacFailure.hpp line 38:
> 36: // Task to fixup self-forwarding pointers
> 37: // installed as a result of an evacuation failure.
> 38: class G1ParRemoveSelfForwardPtrsTask: public WorkerTask {
Sort of preexisting, I don't get why it needs to inherit from `WorkerTask`.
src/hotspot/share/gc/g1/g1HeapRegionChunk.cpp line 55:
> 53:
> 54: void G1ScanChunksInHeapRegions::process_chunk(G1HeapRegionChunkClosure* chunk_closure, uint chunk_id, uint worker_id) {
> 55: G1CollectedHeap* glh = G1CollectedHeap::heap();
Should be `g1h` not `glh`.
src/hotspot/share/gc/g1/g1HeapRegionChunk.cpp line 84:
> 82:
> 83: _chunks_per_region = next_power_of_2(num_workers * G1RemoveSelfForwardPtrsThreadLoadFactor / evac_failure_regions_length);
> 84: _chunk_size = static_cast<uint>(G1HeapRegionSize / _chunks_per_region);
I believe `HeapRegion::GrainWords` should be used.
src/hotspot/share/gc/g1/g1YoungGCPostEvacuateTasks.cpp line 103:
> 101: G1ParRemoveSelfForwardPtrsTask _task;
> 102: G1EvacFailureRegions* _evac_failure_regions;
> 103: const char* _task_name;
I don't see where it's used.
src/hotspot/share/gc/g1/g1YoungGCPostEvacuateTasks.cpp line 126:
> 124: };
> 125:
> 126: const char* G1PostEvacuateCollectionSetCleanupTask1::_name = "Post Evacuate Cleanup 1";
If the above `_task_name` can be removed, this can be restored IMO.
src/hotspot/share/gc/g1/g1_globals.hpp line 382:
> 380: section of the heap.") \
> 381: \
> 382: product(uint, G1RemoveSelfForwardPtrsWorkerCost, 2, DIAGNOSTIC, \
Looking at how it's used, `G1RemoveSelfForwardPtrsWorkerCost` is essentially #workers per evac-fail regions. The current name/doc msg exposes too much impl detail and is not end-user friendly. How about "G1PerRetainedRegionThreads: The number of worker threads used for each retained (evacuation failure) region, subject to ParallelGCThreads constraint"?
src/hotspot/share/gc/g1/g1_globals.hpp line 387:
> 385: range(1, 16) \
> 386: \
> 387: product(uint, G1RemoveSelfForwardPtrsThreadLoadFactor, 16, DIAGNOSTIC, \
I can't comprehend `G1RemoveSelfForwardPtrsThreadLoadFactor`, even after seeing how it is used in the impl. I get its purpose though: calculating a good chunk-size, small enough to have sufficient parallelism within a region but also large enough to keep bookkeeping/sync cost low. This reminds me of `G1RemSetScanState::get_chunks_per_region`. I wonder if we can reuse that heuristic for now.
-------------
PR: https://git.openjdk.java.net/jdk/pull/7047
More information about the hotspot-gc-dev
mailing list