RFR: 8256265: G1: Improve parallelism in regions that failed evacuation [v2]

Albert Mingkun Yang ayang at openjdk.org
Mon Sep 12 11:53:43 UTC 2022


On Mon, 12 Sep 2022 10:34:18 GMT, Thomas Schatzl <tschatzl at openjdk.org> wrote:

>> Hi all,
>> 
>>   can I have reviews for this change that makes the evacuation-failure remove-self-forwards phase split its work across threads within regions?
>> 
>> This work is a continuation of PR#7047 from @Hamlin-Li and the latest improvements from @albertnetymk in that thread. This is reflected in the first commit; on top of it I added a fair number of changes to fix bugs and streamline the code.
>> 
>> Testing: tier1-8 with G1EvacuationFailureALot enabled, tier1-5 as is
>
> Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits:
> 
>  - Merge branch 'master' of gh:tschatzl/jdk into 8256265-parallel-evac-failure
>  - disable G1EvacuationFailureALot by default again
>  - Remove unneeded clear metadata phase
>  - Remove debug code
>  - More refactoring
>  - Initial cleanup
>  - some refactoring, fix clearing of opt index in cset
>  - fix test
>  - some cleanup
>  - Cleanup, phase names, fixes
>  - ... and 2 more: https://git.openjdk.org/jdk/compare/37df5f56...2628451d

src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 2948:

> 2946: 
> 2947: void G1CollectedHeap::clear_bitmap_for_region(HeapRegion* hr, bool update_tams) {
> 2948:   concurrent_mark()->clear_bitmap_for_region(hr, update_tams);

Instead of adding one more arg, I wonder if it would be clearer to move the operation on `tams` to the higher-level caller(s).
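
Roughly what I have in mind, as a sketch (`reset_top_at_mark_start` is a placeholder name, not necessarily the actual API):

```c++
// Sketch: keep the helper single-purpose ...
void G1CollectedHeap::clear_bitmap_for_region(HeapRegion* hr) {
  concurrent_mark()->clear_bitmap_for_region(hr);
}

// ... and let the caller that actually needs it reset TAMS explicitly:
clear_bitmap_for_region(hr);
concurrent_mark()->reset_top_at_mark_start(hr); // placeholder name
```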

src/hotspot/share/gc/g1/g1EvacFailure.cpp line 254:

> 252:   _chunk_size = static_cast<uint>(HeapRegion::GrainWords / _num_chunks_per_region);
> 253: 
> 254:   log_debug(gc, ergo)("Initializing removing self forwards with %u chunks per region given %u workers",

`num_workers` seems to be used only for logging; I wonder if it can be dropped or the logging moved elsewhere. It's odd that this task reports the number of workers while others don't.
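
If the worker count is only needed for the message, something like this would do (sketch):

```c++
// Same log_debug(gc, ergo) channel as the quoted line, minus the worker count:
log_debug(gc, ergo)("Initializing removing self forwards with %u chunks per region",
                    _num_chunks_per_region);
```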

src/hotspot/share/gc/g1/g1ParScanThreadState.cpp line 632:

> 630:     // Mark the failing object in the marking bitmap and later use the bitmap to handle
> 631:     // evacuation failure recovery.
> 632:     _g1h->mark_evac_failure_object(_worker_id, old, word_sz);

The same live-size is calculated again in `process_chunk`, isn't it?
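
To spell out the suspected duplication (illustrative fragments, not the actual code):

```c++
// At evacuation-failure time the live size is already at hand:
_g1h->mark_evac_failure_object(_worker_id, old, word_sz);  // word_sz is known here

// Later, while walking a chunk, the size of each marked object is
// recomputed from the object itself:
size_t live_words = obj->size();                           // same value derived again
```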

src/hotspot/share/gc/g1/g1RemSet.cpp line 112:

> 110:   // Testing showed that 64 for 1M/2M regions, 128 for 4M/8M regions, 256 for 16M/32M regions,
> 111:   // and so on seems to be a good trade-off.
> 112:   static uint get_chunks_per_region(uint log_region_size) {

I actually think the concrete examples (`64 for 1M/2M regions ...`) make this method easy to comprehend. Could you keep them?
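
For reference, the scaling those examples describe, as a self-contained sketch (the formula is my reading of the comment, not necessarily the actual implementation):

```c++
#include <cstdio>

// Chunk count doubles for every two doublings of the region size,
// starting at 64 chunks for 1M regions (log_region_size == 20).
static unsigned get_chunks_per_region(unsigned log_region_size) {
  // 1M/2M -> 64, 4M/8M -> 128, 16M/32M -> 256, ...
  return 64u << ((log_region_size - 20) / 2);
}

int main() {
  for (unsigned log = 20; log <= 25; log++) {
    printf("%2uM regions: %u chunks\n",
           1u << (log - 20), get_chunks_per_region(log));
  }
  return 0;
}
```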

src/hotspot/share/gc/g1/g1YoungGCPostEvacuateTasks.cpp line 373:

> 371:     bool do_heap_region(HeapRegion* r) override {
> 372:       G1CollectedHeap::heap()->clear_bitmap_for_region(r, false /* update_tams */);
> 373:       return false;

I wonder if it's possible to assert `tams == bottom` here.
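
Something along these lines (sketch; whether a TAMS accessor is reachable here is an assumption):

```c++
bool do_heap_region(HeapRegion* r) override {
  assert(r->top_at_mark_start() == r->bottom(),
         "TAMS should already be at bottom for region %u", r->hrm_index());
  G1CollectedHeap::heap()->clear_bitmap_for_region(r, false /* update_tams */);
  return false;
}
```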

-------------

PR: https://git.openjdk.org/jdk/pull/9980

