RFR: 8256265: G1: Improve parallelism in regions that failed evacuation [v12]

Thomas Schatzl tschatzl at openjdk.java.net
Tue Mar 15 14:43:50 UTC 2022


On Fri, 11 Mar 2022 14:28:22 GMT, Hamlin Li <mli at openjdk.org> wrote:

>> Currently G1 assigns a thread per failed evacuated region. This can in effect serialize the whole process as often (particularly with region pinning) there is only one region to fix up.
>> 
>> This patch tries to improve parallelism when walking over the regions in chunks
>> 
>> Latest implementation scans regions in chunks to bring parallelism, it's based on JDK-8278917 which changes to uses prev bitmap to mark evacuation failure objs.
>> 
>> Here's the summary of performance data based on latest implementation, basically, it brings better and stable performance than baseline at "Post Evacuate Cleanup 1/remove self forwardee" phase. (Although some regression is spotted when calculate the results in geomean, becuase one pause time from baseline is far too small than others.)
>> 
>> The performance benefit trend is:
>>  - pause time (Post Evacuate Cleanup 1) is decreased from 76.79% to 2.28% for average time, from 71.61% to 3.04% for geomean, when G1EvacuationFailureALotCSetPercent is changed from 2 to 90 (-XX:ParallelGCThreads=8)
>>  - pause time (Post Evacuate Cleanup 1) is decreased from 63.84% to 15.16% for average time, from 55.41% to 12.45% for geomean, when G1EvacuationFailureALotCSetPercent is changed from 2 to 90 (-XX:ParallelGCThreads=<default=123>)
>> ( Other common Evacuation Failure configurations are:
>> -XX:+G1EvacuationFailureALot -XX:G1EvacuationFailureALotInterval=0 -XX:G1EvacuationFailureALotCount=0 )
>> 
>> For more detailed performance data, please check the related bug.
>
> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Albert review

Changes requested by tschatzl (Reviewer).

src/hotspot/share/gc/g1/g1EvacFailure.cpp line 155:

> 153: 
> 154: class RemoveSelfForwardPtrHRChunkClosure : public G1HeapRegionChunkClosure {
> 155:   class RegionMarkedWordsCache {

Suggestion:


  // Caches the currently accumulated number of live/marked words found in this heap region. Avoids direct (frequent) atomic operations on the HeapRegion's marked words.
  class RegionMarkedWordsCache {

src/hotspot/share/gc/g1/g1EvacFailure.cpp line 170:

> 168:     void note_self_forwarding_removal_end_par() {
> 169:       _g1h->region_at(_region_idx)->note_self_forwarding_removal_end_par(_marked_words * BytesPerWord);
> 170:     }

This helper should be `private` afaict.

src/hotspot/share/gc/g1/g1EvacFailureRegions.cpp line 50:

> 48:   _regions_failed_evacuation.resize(max_regions);
> 49:   _evac_failure_regions = NEW_C_HEAP_ARRAY(uint, max_regions, mtGC);
> 50:   _chunks_in_regions = new (NEW_C_HEAP_OBJ(G1ScanChunksInHeapRegions, mtGC)) G1ScanChunksInHeapRegions();

If the code made `G1ScanChunksInHeapRegions` inherit from `CHeapObj<mtGC>`, then the macros to allocate/delete would not be required.

src/hotspot/share/gc/g1/g1EvacFailureRegions.cpp line 98:

> 96: 
> 97:     void prepare_region(uint region_idx, uint worker_id) {
> 98:       G1CollectedHeap* _heap = G1CollectedHeap::heap();

Locals should not have an underscore. Also we tend to use `g1h` for variables containing `G1CollectedHeap::heap()`.

src/hotspot/share/gc/g1/g1EvacFailureRegions.cpp line 98:

> 96: 
> 97:     void prepare_region(uint region_idx, uint worker_id) {
> 98:       G1CollectedHeap* _heap = G1CollectedHeap::heap();

Locals should not have an underscore. I personally also prefer `g1h` for `G1CollectedHeap::heap()`, but `heap` is fine.

src/hotspot/share/gc/g1/g1EvacFailureRegions.cpp line 139:

> 137: public:
> 138:   PrepareEvacFailureRegionTask(G1EvacFailureRegions* evac_failure_regions, uint num_workers) :
> 139:   WorkerTask("Prepare Evacuation Failure Region Task"),

The whole initializer list should indented one more level.

src/hotspot/share/gc/g1/g1EvacFailureRegions.cpp line 152:

> 150: 
> 151: void G1EvacFailureRegions::prepare_regions() {
> 152:   uint num_workers = MAX2(1u, MIN2(_evac_failure_regions_cur_length, G1CollectedHeap::heap()->workers()->active_workers()));

There is this `clamp(value, min max)` function which is a bit more readable than this. Please also introduce a helper variable `g1h` to improve readability.

src/hotspot/share/gc/g1/g1EvacFailureRegions.cpp line 154:

> 152:   uint num_workers = MAX2(1u, MIN2(_evac_failure_regions_cur_length, G1CollectedHeap::heap()->workers()->active_workers()));
> 153:   PrepareEvacFailureRegionTask task(this, num_workers);
> 154:   G1CollectedHeap::heap()->workers()->run_task(&task, num_workers);

Suggestion:

  WorkerThreads* workers = G1CollectedHeap::heap()->workers();
  uint num_workers = clamp(_evac_failure_regions_cur_length, 1u, workers->active_workers()));
  PrepareEvacFailureRegionTask task(this, num_workers);
  workers->run_task(&task, num_workers);


This suggestion is kind of obsolete when wrapping this into a `G1AbstractSubTask`

src/hotspot/share/gc/g1/g1EvacFailureRegions.hpp line 68:

> 66: 
> 67:   // Do necessary preparation for evacuation failure regions
> 68:   void prepare_regions();

I think the whole of `prepare_regions` should be integrated as a `G1BatchedTask`; from an initial look there does not seem to be anything in cleanup phase 1 that is dependent on what this code updates.
Simply add a method that returns an `G1AbstractSubTask` for use in the `G1BatchedGangTask` as done e.g. for `G1RemSet::create_cleanup_after_scan_heap_roots_task`.

src/hotspot/share/gc/g1/g1GCPhaseTimes.cpp line 118:

> 116:   _gc_par_phases[EagerlyReclaimHumongousObjects] = new WorkerDataArray<double>("EagerlyReclaimHumongousObjects", "Eagerly Reclaim Humongous Objects (ms):", max_gc_threads);
> 117:   _gc_par_phases[RestorePreservedMarks] = new WorkerDataArray<double>("RestorePreservedMarks", "Restore Preserved Marks (ms):", max_gc_threads);
> 118:   _gc_par_phases[VerifyAfterSelfForwardingPtrRemoval] = new WorkerDataArray<double>("VerifyAfterSelfForwardingPtrRemoval", "Verify Retained Regions (ms):", max_gc_threads);

Not sure if we need extra verification pass time logging, but keep it if you think it is useful.

src/hotspot/share/gc/g1/g1HeapRegionChunk.cpp line 2:

> 1: /*
> 2:  * Copyright (c) 2021, Huawei Technologies Co. Ltd. All rights reserved.

It's 2022 already :)

src/hotspot/share/gc/g1/g1HeapRegionChunk.cpp line 2:

> 1: /*
> 2:  * Copyright (c) 2021, Huawei Technologies Co. Ltd. All rights reserved.

It's 2022 now :)

src/hotspot/share/gc/g1/g1HeapRegionChunk.cpp line 30:

> 28: #include "gc/g1/g1ConcurrentMarkBitMap.inline.hpp"
> 29: #include "gc/g1/g1HeapRegionChunk.hpp"
> 30: #include "gc/g1/heapRegion.hpp"

I think including the .inline is better here as likely one or more of its methods are used in this code (I did not check), but there is a risk on the .inline contents being imported from somewhere else.

src/hotspot/share/gc/g1/g1HeapRegionChunk.cpp line 55:

> 53: 
> 54: void G1ScanChunksInHeapRegions::process_chunk(G1HeapRegionChunkClosure* chunk_closure, uint chunk_id, uint worker_id) {
> 55:   G1GCPhaseTimes* p = G1CollectedHeap::heap()->phase_times();

Introduce a `g1h` variable for `G1CollectedHeap::heap()` here (your call)

src/hotspot/share/gc/g1/g1HeapRegionChunk.cpp line 55:

> 53: 
> 54: void G1ScanChunksInHeapRegions::process_chunk(G1HeapRegionChunkClosure* chunk_closure, uint chunk_id, uint worker_id) {
> 55:   G1GCPhaseTimes* p = G1CollectedHeap::heap()->phase_times();

Please extract `G1CollectedHeap::heap()` into a variable (`g1h`).

src/hotspot/share/gc/g1/g1HeapRegionChunk.hpp line 80:

> 78:   uint _chunks_per_region;
> 79:   uint _chunk_size;
> 80:   uint _total_chunks;

This can be retrieved (via a getter) from `_chunks` to avoid storing the same value multiple times.

src/hotspot/share/gc/g1/g1HeapRegionChunk.hpp line 80:

> 78:   uint _chunks_per_region;
> 79:   uint _chunk_size;
> 80:   uint _total_chunks;

Instead of the explicit variable, I recommend having a getter that returns `_chunks.size()`. This avoids duplication of that information.

src/hotspot/share/gc/g1/g1YoungCollector.cpp line 986:

> 984:   allocator()->release_gc_alloc_regions(evacuation_info);
> 985: 
> 986:   _evac_failure_regions.prepare_regions();

Can this task be moved into the `post_evacuate_cleanup_1` phase as a `G1BatchedTask`?

src/hotspot/share/gc/g1/g1YoungGCPostEvacuateTasks.cpp line 70:

> 68: 
> 69: class G1PostEvacuateCollectionSetCleanupTask1::SampleCollectionSetCandidatesTask : public G1AbstractSubTask {
> 70: 

Seems to be an unrelated cleanup. Please remove.

src/hotspot/share/gc/g1/g1YoungGCPostEvacuateTasks.cpp line 73:

> 71: public:
> 72:   SampleCollectionSetCandidatesTask() :
> 73:     G1AbstractSubTask(G1GCPhaseTimes::SampleCollectionSetCandidates) { }

Seems to be an unrelated cleanup. Please remove.

src/hotspot/share/gc/g1/g1YoungGCPostEvacuateTasks.cpp line 90:

> 88: 
> 89:       bool do_heap_region(HeapRegion* hr) override {
> 90:         _total.add(hr->rem_set()->card_set_memory_stats());

Seems to be an unrelated cleanup. Please remove.

src/hotspot/share/gc/g1/g1_globals.hpp line 391:

> 389:           "for evecuation failure regions.")                                \
> 390:           range(1, 1024)                                                    \
> 391:                                                                             \

Please make these diagnostic. They really are just for troubleshooting and not general use.

-------------

PR: https://git.openjdk.java.net/jdk/pull/7047



More information about the hotspot-gc-dev mailing list