RFR: 8256265: G1: Improve parallelism in regions that failed evacuation [v2]

Stefan Johansson sjohanss at openjdk.org
Tue Sep 13 10:34:08 UTC 2022


On Mon, 12 Sep 2022 10:34:18 GMT, Thomas Schatzl <tschatzl at openjdk.org> wrote:

>> Hi all,
>> 
>>   can I have reviews on this change that makes the evacuation failure remove forwards phase split the work across threads within regions?
>> 
>> This work is a continuation of PR#7047 from @Hamlin-Li and the latest improvements from @albertnetymk in that thread. This is reflected in the first commit; on top of that I added a fair number of changes to fix bugs and streamline the code.
>> 
>> Testing: tier1-8 with G1EvacuationFailureALot enabled, tier1-5 as is
>
> Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits:
> 
>  - Merge branch 'master' of gh:tschatzl/jdk into 8256265-parallel-evac-failure
>  - disable G1EvacuationFailureALot by default again
>  - Remove unneeded clear metadata phase
>  - Remove debug code
>  - More refactoring
>  - Initial cleanup
>  - some refactoring, fix clearing of opt index in cset
>  - fix test
>  - some cleanup
>  - Cleanup, phase names, fixes
>  - ... and 2 more: https://git.openjdk.org/jdk/compare/37df5f56...2628451d

Looks good, this code has really improved over the last 6 months (since I looked at it before my leave) and I just have a few small comments/suggestions.

src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 3308:

> 3306:   } else {
> 3307:     _cm->raw_mark_in_bitmap(obj);
> 3308:   }

I think `raw_mark_in_bitmap(worker, obj, size)` isn't that raw anymore if we add the liveness calculation, so we should either give that wrapper another name or do something like this:
Suggestion:

  _cm->raw_mark_in_bitmap(obj);
  if (collector_state()->in_concurrent_start_gc()) {
    _cm->add_to_liveness(worker_id, obj, obj_size);
  }
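
Alternatively, if we want to keep it as a single call, a renamed helper along these lines would also work (just a sketch; `mark_in_bitmap_and_count_live` is a name I made up):

  // Not "raw" anymore: marks the object and also accounts its live words.
  void G1ConcurrentMark::mark_in_bitmap_and_count_live(uint worker_id, oop obj, size_t obj_size) {
    raw_mark_in_bitmap(obj);                    // set the bit in the mark bitmap
    add_to_liveness(worker_id, obj, obj_size);  // record live words for this worker
  }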

src/hotspot/share/gc/g1/g1EvacFailure.cpp line 108:

> 106: 
> 107: // Caches the currently accumulated number of garbage words found in this heap region.
> 108: // Avoids direct (frequent) atomic operations on the HeapRegion's garbage counter.

Just to be sure we need this: have we seen problems with calling `note_self_forwarding_removal_end_par(...)` on return from `process_chunk(...)`?
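
For reference, the simpler version I have in mind would be something like this (a sketch only, assuming `note_self_forwarding_removal_end_par(...)` takes the garbage words accumulated for the chunk):

  void process_chunk(uint worker_id, uint chunk_idx) {
    size_t garbage_words = 0;
    // ... iterate the objects in the chunk, adding the size of each dead
    // (non-self-forwarded) object to garbage_words ...

    // One atomic update per chunk, instead of caching across chunks:
    hr->note_self_forwarding_removal_end_par(garbage_words);
  }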

src/hotspot/share/gc/g1/g1EvacFailure.hpp line 66:

> 64: };
> 65: 
> 66: #endif // SHARE_GC_G1_G1EVACFAILURE_HPP

Suggestion:

#endif // SHARE_GC_G1_G1EVACFAILURE_HPP

src/hotspot/share/gc/g1/g1GCPhaseTimes.cpp line 106:

> 104:   _gc_par_phases[MergePSS] = new WorkerDataArray<double>("MergePSS", "Merge Per-Thread State (ms):", max_gc_threads);
> 105:   _gc_par_phases[RestoreRetainedRegions] = new WorkerDataArray<double>("RestoreRetainedRegions", "Restore Retained Regions (ms):", max_gc_threads);
> 106:   _gc_par_phases[RemoveSelfForwardsInChunks] = new WorkerDataArray<double>("RemoveSelfForwardsInChunks", "Remove Self Forwards In Chunks (ms):", max_gc_threads);

Is the "In Chunks" part really interesting to users? We do mention chunks below so the info is there. 

And if we could come up with a more user-friendly name than "Remove Self Forwards" that would probably be good as well. Something like "Clean/Fix Failed Regions" maybe.

src/hotspot/share/gc/g1/g1YoungGCPostEvacuateTasks.cpp line 112:

> 110:   double worker_cost() const override {
> 111:     assert(_evac_failure_regions->evacuation_failed(), "Should not call this if not executed");
> 112:     return _evac_failure_regions->num_regions_failed_evacuation() * G1PerRetainedRegionThreads;

It feels a bit wrong to have a flag for this, or at least I'm not sure the number of threads should be tied to the number of regions alone. Instead maybe we should look at the total size of those regions. Now with the chunking there will be a lot more work items than regions, so we should be able to look at size, which to me looks like a better metric.
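
Something along these lines is what I mean (only a sketch; `live_words_in_failed_regions()` is a hypothetical accessor for the summed sizes of the retained regions, and `WordsProcessedPerWorker` is an invented tuning constant):

  double worker_cost() const override {
    assert(_evac_failure_regions->evacuation_failed(), "Should not call this if not executed");
    // Scale the number of workers with the amount of work (total size of the
    // failed regions), not with the region count.
    size_t total_words = _evac_failure_regions->live_words_in_failed_regions();
    return (double)total_words / WordsProcessedPerWorker;
  }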

-------------

PR: https://git.openjdk.org/jdk/pull/9980

