RFR: 8256265: G1: Improve parallelism in regions that failed evacuation [v12]
Thomas Schatzl
tschatzl at openjdk.java.net
Tue Mar 15 14:43:50 UTC 2022
On Fri, 11 Mar 2022 14:28:22 GMT, Hamlin Li <mli at openjdk.org> wrote:
>> Currently G1 assigns a thread per failed evacuated region. This can in effect serialize the whole process as often (particularly with region pinning) there is only one region to fix up.
>>
>> This patch tries to improve parallelism when walking over the regions in chunks
>>
>> Latest implementation scans regions in chunks to bring parallelism, it's based on JDK-8278917 which changes to uses prev bitmap to mark evacuation failure objs.
>>
>> Here's the summary of performance data based on latest implementation, basically, it brings better and stable performance than baseline at "Post Evacuate Cleanup 1/remove self forwardee" phase. (Although some regression is spotted when calculate the results in geomean, becuase one pause time from baseline is far too small than others.)
>>
>> The performance benefit trend is:
>> - pause time (Post Evacuate Cleanup 1) is decreased from 76.79% to 2.28% for average time, from 71.61% to 3.04% for geomean, when G1EvacuationFailureALotCSetPercent is changed from 2 to 90 (-XX:ParallelGCThreads=8)
>> - pause time (Post Evacuate Cleanup 1) is decreased from 63.84% to 15.16% for average time, from 55.41% to 12.45% for geomean, when G1EvacuationFailureALotCSetPercent is changed from 2 to 90 (-XX:ParallelGCThreads=<default=123>)
>> ( Other common Evacuation Failure configurations are:
>> -XX:+G1EvacuationFailureALot -XX:G1EvacuationFailureALotInterval=0 -XX:G1EvacuationFailureALotCount=0 )
>>
>> For more detailed performance data, please check the related bug.
>
> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:
>
> Albert review
Changes requested by tschatzl (Reviewer).
src/hotspot/share/gc/g1/g1EvacFailure.cpp line 155:
> 153:
> 154: class RemoveSelfForwardPtrHRChunkClosure : public G1HeapRegionChunkClosure {
> 155: class RegionMarkedWordsCache {
Suggestion:
// Caches the currently accumulated number of live/marked words found in this heap region. Avoids direct (frequent) atomic operations on the HeapRegion's marked words.
class RegionMarkedWordsCache {
src/hotspot/share/gc/g1/g1EvacFailure.cpp line 170:
> 168: void note_self_forwarding_removal_end_par() {
> 169: _g1h->region_at(_region_idx)->note_self_forwarding_removal_end_par(_marked_words * BytesPerWord);
> 170: }
This helper should be `private` afaict.
src/hotspot/share/gc/g1/g1EvacFailureRegions.cpp line 50:
> 48: _regions_failed_evacuation.resize(max_regions);
> 49: _evac_failure_regions = NEW_C_HEAP_ARRAY(uint, max_regions, mtGC);
> 50: _chunks_in_regions = new (NEW_C_HEAP_OBJ(G1ScanChunksInHeapRegions, mtGC)) G1ScanChunksInHeapRegions();
If the code made `G1ScanChunksInHeapRegions` inherit from `CHeapObj<mtGC>`, then the macros to allocate/delete would not be required.
src/hotspot/share/gc/g1/g1EvacFailureRegions.cpp line 98:
> 96:
> 97: void prepare_region(uint region_idx, uint worker_id) {
> 98: G1CollectedHeap* _heap = G1CollectedHeap::heap();
Locals should not have an underscore. Also we tend to use `g1h` for variables containing `G1CollectedHeap::heap()`.
src/hotspot/share/gc/g1/g1EvacFailureRegions.cpp line 98:
> 96:
> 97: void prepare_region(uint region_idx, uint worker_id) {
> 98: G1CollectedHeap* _heap = G1CollectedHeap::heap();
Locals should not have an underscore. I personally also prefer `g1h` for `G1CollectedHeap::heap()`, but `heap` is fine.
src/hotspot/share/gc/g1/g1EvacFailureRegions.cpp line 139:
> 137: public:
> 138: PrepareEvacFailureRegionTask(G1EvacFailureRegions* evac_failure_regions, uint num_workers) :
> 139: WorkerTask("Prepare Evacuation Failure Region Task"),
The whole initializer list should indented one more level.
src/hotspot/share/gc/g1/g1EvacFailureRegions.cpp line 152:
> 150:
> 151: void G1EvacFailureRegions::prepare_regions() {
> 152: uint num_workers = MAX2(1u, MIN2(_evac_failure_regions_cur_length, G1CollectedHeap::heap()->workers()->active_workers()));
There is this `clamp(value, min max)` function which is a bit more readable than this. Please also introduce a helper variable `g1h` to improve readability.
src/hotspot/share/gc/g1/g1EvacFailureRegions.cpp line 154:
> 152: uint num_workers = MAX2(1u, MIN2(_evac_failure_regions_cur_length, G1CollectedHeap::heap()->workers()->active_workers()));
> 153: PrepareEvacFailureRegionTask task(this, num_workers);
> 154: G1CollectedHeap::heap()->workers()->run_task(&task, num_workers);
Suggestion:
WorkerThreads* workers = G1CollectedHeap::heap()->workers();
uint num_workers = clamp(_evac_failure_regions_cur_length, 1u, workers->active_workers()));
PrepareEvacFailureRegionTask task(this, num_workers);
workers->run_task(&task, num_workers);
This suggestion is kind of obsolete when wrapping this into a `G1AbstractSubTask`
src/hotspot/share/gc/g1/g1EvacFailureRegions.hpp line 68:
> 66:
> 67: // Do necessary preparation for evacuation failure regions
> 68: void prepare_regions();
I think the whole of `prepare_regions` should be integrated as a `G1BatchedTask`; from an initial look there does not seem to be anything in cleanup phase 1 that is dependent on what this code updates.
Simply add a method that returns an `G1AbstractSubTask` for use in the `G1BatchedGangTask` as done e.g. for `G1RemSet::create_cleanup_after_scan_heap_roots_task`.
src/hotspot/share/gc/g1/g1GCPhaseTimes.cpp line 118:
> 116: _gc_par_phases[EagerlyReclaimHumongousObjects] = new WorkerDataArray<double>("EagerlyReclaimHumongousObjects", "Eagerly Reclaim Humongous Objects (ms):", max_gc_threads);
> 117: _gc_par_phases[RestorePreservedMarks] = new WorkerDataArray<double>("RestorePreservedMarks", "Restore Preserved Marks (ms):", max_gc_threads);
> 118: _gc_par_phases[VerifyAfterSelfForwardingPtrRemoval] = new WorkerDataArray<double>("VerifyAfterSelfForwardingPtrRemoval", "Verify Retained Regions (ms):", max_gc_threads);
Not sure if we need extra verification pass time logging, but keep it if you think it is useful.
src/hotspot/share/gc/g1/g1HeapRegionChunk.cpp line 2:
> 1: /*
> 2: * Copyright (c) 2021, Huawei Technologies Co. Ltd. All rights reserved.
It's 2022 already :)
src/hotspot/share/gc/g1/g1HeapRegionChunk.cpp line 2:
> 1: /*
> 2: * Copyright (c) 2021, Huawei Technologies Co. Ltd. All rights reserved.
It's 2022 now :)
src/hotspot/share/gc/g1/g1HeapRegionChunk.cpp line 30:
> 28: #include "gc/g1/g1ConcurrentMarkBitMap.inline.hpp"
> 29: #include "gc/g1/g1HeapRegionChunk.hpp"
> 30: #include "gc/g1/heapRegion.hpp"
I think including the .inline is better here as likely one or more of its methods are used in this code (I did not check), but there is a risk on the .inline contents being imported from somewhere else.
src/hotspot/share/gc/g1/g1HeapRegionChunk.cpp line 55:
> 53:
> 54: void G1ScanChunksInHeapRegions::process_chunk(G1HeapRegionChunkClosure* chunk_closure, uint chunk_id, uint worker_id) {
> 55: G1GCPhaseTimes* p = G1CollectedHeap::heap()->phase_times();
Introduce a `g1h` variable for `G1CollectedHeap::heap()` here (your call)
src/hotspot/share/gc/g1/g1HeapRegionChunk.cpp line 55:
> 53:
> 54: void G1ScanChunksInHeapRegions::process_chunk(G1HeapRegionChunkClosure* chunk_closure, uint chunk_id, uint worker_id) {
> 55: G1GCPhaseTimes* p = G1CollectedHeap::heap()->phase_times();
Please extract `G1CollectedHeap::heap()` into a variable (`g1h`).
src/hotspot/share/gc/g1/g1HeapRegionChunk.hpp line 80:
> 78: uint _chunks_per_region;
> 79: uint _chunk_size;
> 80: uint _total_chunks;
This can be retrieved (via a getter) from `_chunks` to avoid storing the same value multiple times.
src/hotspot/share/gc/g1/g1HeapRegionChunk.hpp line 80:
> 78: uint _chunks_per_region;
> 79: uint _chunk_size;
> 80: uint _total_chunks;
Instead of the explicit variable, I recommend having a getter that returns `_chunks.size()`. This avoids duplication of that information.
src/hotspot/share/gc/g1/g1YoungCollector.cpp line 986:
> 984: allocator()->release_gc_alloc_regions(evacuation_info);
> 985:
> 986: _evac_failure_regions.prepare_regions();
Can this task be moved into the `post_evacuate_cleanup_1` phase as a `G1BatchedTask`?
src/hotspot/share/gc/g1/g1YoungGCPostEvacuateTasks.cpp line 70:
> 68:
> 69: class G1PostEvacuateCollectionSetCleanupTask1::SampleCollectionSetCandidatesTask : public G1AbstractSubTask {
> 70:
Seems to be an unrelated cleanup. Please remove.
src/hotspot/share/gc/g1/g1YoungGCPostEvacuateTasks.cpp line 73:
> 71: public:
> 72: SampleCollectionSetCandidatesTask() :
> 73: G1AbstractSubTask(G1GCPhaseTimes::SampleCollectionSetCandidates) { }
Seems to be an unrelated cleanup. Please remove.
src/hotspot/share/gc/g1/g1YoungGCPostEvacuateTasks.cpp line 90:
> 88:
> 89: bool do_heap_region(HeapRegion* hr) override {
> 90: _total.add(hr->rem_set()->card_set_memory_stats());
Seems to be an unrelated cleanup. Please remove.
src/hotspot/share/gc/g1/g1_globals.hpp line 391:
> 389: "for evecuation failure regions.") \
> 390: range(1, 1024) \
> 391: \
Please make these diagnostic. They really are just for troubleshooting and not general use.
-------------
PR: https://git.openjdk.java.net/jdk/pull/7047
More information about the hotspot-gc-dev
mailing list