RFR: 8256265: G1: Improve parallelism in regions that failed evacuation [v2]
Albert Mingkun Yang
ayang at openjdk.org
Mon Sep 12 11:53:43 UTC 2022
On Mon, 12 Sep 2022 10:34:18 GMT, Thomas Schatzl <tschatzl at openjdk.org> wrote:
>> Hi all,
>>
>> can I have reviews on this change that makes the evacuation failure remove forwards phase split the work across threads within regions?
>>
>> This work is a continuation of PR#7047 from @Hamlin-Li and the latest improvements from @albertnetymk in that thread. This is reflected in the first commit, on top of which I added a fair number of changes to fix bugs and streamline the code.
>>
>> Testing: tier1-8 with G1EvacuationFailureALot enabled, tier1-5 as is
>
> Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits:
>
> - Merge branch 'master' of gh:tschatzl/jdk into 8256265-parallel-evac-failure
> - disable G1EvacuationFailureALot by default again
> - Remove unneeded clear metadata phase
> - Remove debug code
> - More refactoring
> - Initial cleanup
> - some refactoring, fix clearing of opt index in cset
> - fix test
> - some cleanup
> - Cleanup, phase names, fixes
> - ... and 2 more: https://git.openjdk.org/jdk/compare/37df5f56...2628451d
src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 2948:
> 2946:
> 2947: void G1CollectedHeap::clear_bitmap_for_region(HeapRegion* hr, bool update_tams) {
> 2948: concurrent_mark()->clear_bitmap_for_region(hr, update_tams);
Instead of adding one more arg, I wonder if it's clearer to move the operation on `tams` to higher-level caller(s).
src/hotspot/share/gc/g1/g1EvacFailure.cpp line 254:
> 252: _chunk_size = static_cast<uint>(HeapRegion::GrainWords / _num_chunks_per_region);
> 253:
> 254: log_debug(gc, ergo)("Initializing removing self forwards with %u chunks per region given %u workers",
`num_workers` seems to be used only for logging; I wonder if it can be dropped or moved elsewhere. It's odd that this task reports the number of workers while others don't.
src/hotspot/share/gc/g1/g1ParScanThreadState.cpp line 632:
> 630: // Mark the failing object in the marking bitmap and later use the bitmap to handle
> 631: // evacuation failure recovery.
> 632: _g1h->mark_evac_failure_object(_worker_id, old, word_sz);
The same live size is calculated again in `process_chunk`, isn't it?
src/hotspot/share/gc/g1/g1RemSet.cpp line 112:
> 110: // Testing showed that 64 for 1M/2M region, 128 for 4M/8M regions, 256 for 16/32M regions,
> 111: // and so on seems to be such a good trade-off.
> 112: static uint get_chunks_per_region(uint log_region_size) {
I actually think the concrete examples (`64 for 1M/2M region ...`) make this method easy to comprehend. Could you keep them?
src/hotspot/share/gc/g1/g1YoungGCPostEvacuateTasks.cpp line 373:
> 371: bool do_heap_region(HeapRegion* r) override {
> 372: G1CollectedHeap::heap()->clear_bitmap_for_region(r, false /* update_tams */);
> 373: return false;
I wonder if it's possible to assert `tams == bottom` here.
-------------
PR: https://git.openjdk.org/jdk/pull/9980
More information about the hotspot-dev mailing list