RFR: 8372861: Genshen: Override parallel_region_stride of ShenandoahResetBitmapClosure to a reasonable value for better parallelism [v2]

Wed Dec 3 10:14:51 UTC 2025

On Tue, 2 Dec 2025 23:28:58 GMT, Xiaolong Peng <xpeng at openjdk.org> wrote:

>> In the concurrent reset and concurrent reset after collect phases, the worker needs to reset bitmaps for all the regions in the current GC generation. The problem is that resetting bitmaps may take a long time on a large heap, because its marking bitmaps are larger than those of a small heap. We should always consider using multiple threads when more than one concurrent worker is available for concurrent reset.
>> 
>> In this PR, parallel_region_stride for ShenandoahResetBitmapClosure is set to 8 to distribute the workload as evenly as possible across all active workers.
>> 
>> Test result:
>> 
>> java -XX:+TieredCompilation -XX:+AlwaysPreTouch -Xms32G -Xmx32G -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -Xlog:gc* -XX:-ShenandoahUncommit -XX:ShenandoahGCMode=generational  -XX:+UseTLAB -jar ~/Downloads/dacapo-23.11-MR2-chopin.jar -n 5 h2 | grep "Concurrent Reset"
>> 
>> With the change:
>> 
>> [77.867s][info][gc,stats    ] Concurrent Reset               =    0.043 s (a =     3039 us) (n =    14) (lvls, us =     1133,     1230,     1270,     1328,    14650)
>> [77.867s][info][gc,stats    ] Concurrent Reset After Collect =    0.043 s (a =     3107 us) (n =    14) (lvls, us =     1094,     1230,     1855,     3457,     8348)
>> 
>> Original:
>> 
>> 
>> [77.289s][info][gc,stats    ] Concurrent Reset               =    0.045 s (a =     3197 us) (n =    14) (lvls, us =     1172,     1191,     1309,     1426,    15582)
>> [77.289s][info][gc,stats    ] Concurrent Reset After Collect =    0.105 s (a =     7476 us) (n =    14) (lvls, us =     2246,     3828,     4395,     7695,    21266)
>> 
>> 
>> The average time of concurrent reset after collect is reduced from 7476 us to 3107 us: a 58% reduction in time, or roughly a 2.4x speedup.
>> 
>> ### Other tests
>> - [x] hotspot_gc_shenandoah
>> - [x] GHA
>
> Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Add more comments.

Changes requested by shade (Reviewer).

src/hotspot/share/gc/shenandoah/shenandoahGeneration.cpp line 88:

> 86:   // Using a smaller value here yields better task distribution for a lumpy workload. The task will be split
> 87:   // into smaller batches with 8 regions in batch, the worker processes more regions w/o needs to reset bitmaps
> 88:   // will process more batches, but overall all workers will be saturated throughout the whole concurrent reset phase.

I have a very general comment about writing comments like this one. This entire block of prose is really excessive, is set up to become outdated (are you tracking the real behavior of `SH::parallel_heap_region_iterate` and its magical `4096`?), and can be boiled down to something much more succinct:

Bitmap reset task is heavy-weight and benefits from much smaller tasks than the default.
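
For readers who want to see why smaller tasks help here, a minimal sketch follows. It is a toy simulation, not the Shenandoah implementation: the function names, the cost model, and the "earliest-free worker claims the next batch" scheduling are all assumptions made for illustration. Workers claim `stride` consecutive regions at a time, and only a few regions at the front of the heap carry real reset work.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model, not taken from the Shenandoah sources.
// Workers claim `stride` consecutive regions per batch; the worker that is
// free earliest claims the next batch. Returns the finish time of the slowest
// worker, i.e. the duration of the whole phase in abstract cost units.
size_t simulated_phase_time(const std::vector<size_t>& region_cost,
                            size_t num_workers, size_t stride) {
  assert(num_workers > 0 && stride > 0);
  std::vector<size_t> busy_until(num_workers, 0);
  for (size_t start = 0; start < region_cost.size(); start += stride) {
    size_t end = std::min(start + stride, region_cost.size());
    size_t batch_cost = 0;
    for (size_t i = start; i < end; i++) {
      batch_cost += region_cost[i];
    }
    // The earliest-free worker takes this batch.
    auto worker = std::min_element(busy_until.begin(), busy_until.end());
    *worker += batch_cost;
  }
  return *std::max_element(busy_until.begin(), busy_until.end());
}

// Lumpy workload (assumed shape): only the first `heavy` regions need their
// bitmaps reset, every other region costs nothing.
std::vector<size_t> lumpy_costs(size_t num_regions, size_t heavy, size_t heavy_cost) {
  std::vector<size_t> cost(num_regions, 0);
  std::fill(cost.begin(), cost.begin() + std::min(heavy, num_regions), heavy_cost);
  return cost;
}
```

With 1024 regions, 64 of them heavy, and 4 workers, `simulated_phase_time(lumpy_costs(1024, 64, 10), 4, 256)` puts all the heavy regions into one batch owned by a single worker, while a stride of 8 spreads them over eight batches that the four workers share evenly. The absolute numbers mean nothing; the point is only that the slowest worker finishes much earlier with the smaller stride.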

src/hotspot/share/gc/shenandoah/shenandoahHeap.hpp line 119:

> 117:   // ShenandoahHeap::parallel_heap_region_iterate will derive a reasonable value based
> 118:   // on active worker threads and number of regions.
> 119:   // For some lumpy workload, the value can be overridden for better task distribution.

Again, excessive. You can just drop the comment; its purpose is obvious from the code.

-------------

PR Review: https://git.openjdk.org/jdk/pull/28613#pullrequestreview-3534247421
PR Review Comment: https://git.openjdk.org/jdk/pull/28613#discussion_r2584465890
PR Review Comment: https://git.openjdk.org/jdk/pull/28613#discussion_r2584468632