RFR: 8345423: Shenandoah: Parallelize concurrent cleanup [v2]
William Kemper
wkemper at openjdk.org
Wed Dec 4 19:06:58 UTC 2024
On Wed, 4 Dec 2024 19:02:21 GMT, Xiaolong Peng <xpeng at openjdk.org> wrote:
>> Parallelize concurrent cleanup after Shenandoah collection cycle is executed by single thread(Shenandoah control thread), since currently recycling trashed regions requires heap lock even it can be done w/o heap lock. This PR is a proposal to parallelize the execution of Shenandoah concurrent cleanup after making recycling trashed regions lock free.
>>
>> With the change the time execute Concurrent cleanup has been significantly improved by 10+ times, throughput/allocation rate is also improved significantly:
>>
>> TIP:
>>
>> [30.380s][info][gc] GC(1245) Concurrent cleanup (Young) 3491M->739M(4096M) 3.634ms
>> [30.404s][info][gc] GC(1246) Concurrent cleanup (Young) 3258M->377M(4096M) 2.233ms
>> [30.434s][info][gc] GC(1247) Concurrent cleanup (Young) 2887M->333M(4096M) 7.958ms
>> [30.464s][info][gc] GC(1248) Concurrent cleanup (Young) 3134M->472M(4096M) 6.097ms
>> [30.487s][info][gc] GC(1249) Concurrent cleanup (Young) 2922M->212M(4096M) 3.072ms
>> [30.519s][info][gc] GC(1250) Concurrent cleanup (Young) 3404M->549M(4096M) 3.730ms
>> [30.552s][info][gc] GC(1251) Concurrent cleanup (Young) 3542M->712M(4096M) 6.118ms
>> [30.579s][info][gc] GC(1252) Concurrent cleanup (Young) 3257M->373M(4096M) 5.049ms
>> [30.608s][info][gc] GC(1253) Concurrent cleanup (Young) 3390M->418M(4096M) 2.779ms
>>
>> Parallelized:
>>
>> [30.426s][info][gc] GC(1557) Concurrent cleanup (Young) 3208M->43M(4096M) 0.177ms
>> [30.510s][info][gc] GC(1560) Concurrent cleanup (Young) 2938M->161M(4096M) 0.220ms
>> [30.534s][info][gc] GC(1561) Concurrent cleanup (Young) 2960M->57M(4096M) 0.164ms
>> [30.564s][info][gc] GC(1562) Concurrent cleanup (Young) 3189M->106M(4096M) 0.176ms
>> [30.595s][info][gc] GC(1563) Concurrent cleanup (Young) 3389M->367M(4096M) 0.247ms
>> [30.625s][info][gc] GC(1564) Concurrent cleanup (Young) 3662M->628M(4096M) 0.246ms
>> [30.649s][info][gc] GC(1565) Concurrent cleanup (Young) 3190M->150M(4096M) 0.172ms
>> [30.678s][info][gc] GC(1566) Concurrent cleanup (Young) 3225M->69M(4096M) 0.175ms
>> [30.709s][info][gc] GC(1567) Concurrent cleanup (Young) 3250M->107M(4096M) 0.179ms
>> [30.765s][info][gc] GC(1570) Concurrent cleanup (Young) 2932M->211M(4096M) 0.422ms
>>
>>
>> JVM args for the tests: `-Xms4G -Xmx4G -XX:+AlwaysPreTouch -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:ShenandoahGCMode=generational -XX:-ShenandoahPacing -XX:+UseTLAB -Xlog:gc`
>>
>>
>> ### Additional test
>> - [x] MacOS AArch64 server fastdebug, hotspot_gc_shenandoah
>
> Xiaolong Peng has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision:
>
> - Merge branch 'openjdk:master' into parallel-cleanup
> - Remove _trash_regions
> - Completely remove heap lock from recycling trashed regions
> - Void reordering
> - Rename recycling to _recycling
> - Remove comments
> - Parallelize concurrent cleanup and make recycling trashed regions mostly lock-free
Changes look good. Left some nits.
src/hotspot/share/gc/shenandoah/shenandoahConcurrentGC.cpp line 1049:
> 1047:
> 1048: void ShenandoahConcurrentGC::op_cleanup_early() {
> 1049: ShenandoahWorkerScope scope(ShenandoahHeap::heap()->workers(),
Can we align these arguments with the first argument after the `(`?
src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 1262:
> 1260: public:
> 1261: ShenandoahRecycleTrashedRegionTask() :
> 1262: WorkerTask("Shenandoah Recycle trashed region.") {}
Should be "Shenandoah Recycle Trashed Regions" (no period, title case).
src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 1265:
> 1263:
> 1264: void work(uint worker_id) {
> 1265: const ShenandoahHeap* heap = ShenandoahHeap::heap();
`heap` looks unused here.
src/hotspot/share/gc/shenandoah/shenandoahGeneration.cpp line 890:
> 888: size_t ShenandoahGeneration::decrement_affiliated_region_count() {
> 889: // Assertions only hold true for Java threads since they call this method under heap lock.
> 890: bool const is_java_thread = Thread::current()->is_Java_thread();
Prefer not to check `Thread::current` to gate assertions. Could we use an `#ifdef ASSERT` block here? Could this be `decrease_affiliated_region_count(1)` instead? or should we have a separate `decrement_under_lock` method?
src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.cpp line 592:
> 590: shenandoah_assert_heaplocked();
> 591: if (is_trash() && _recycling.try_set()) {
> 592: if (is_trash()) {
Is it necessary to check `is_trash` a second time while the heap lock is held? Also, if it _is_ necessary, then it seems like we should `_recycling.unset` in the scope where `_recycling.try_set` happened. As it is, if the second check for `is_trash` was `false`, then the code would not `_recycling.unset`.
-------------
Changes requested by wkemper (Committer).
PR Review: https://git.openjdk.org/jdk/pull/22538#pullrequestreview-2479579023
PR Review Comment: https://git.openjdk.org/jdk/pull/22538#discussion_r1870112931
PR Review Comment: https://git.openjdk.org/jdk/pull/22538#discussion_r1870083766
PR Review Comment: https://git.openjdk.org/jdk/pull/22538#discussion_r1870085143
PR Review Comment: https://git.openjdk.org/jdk/pull/22538#discussion_r1870092090
PR Review Comment: https://git.openjdk.org/jdk/pull/22538#discussion_r1870100238
More information about the hotspot-gc-dev
mailing list