RFR: 8372861: Genshen: Override parallel_region_stride of ShenandoahResetBitmapClosure to a reasonable value for better parallelism

Kelvin Nilsen kdnilsen at openjdk.org
Tue Dec 2 19:43:03 UTC 2025


On Tue, 2 Dec 2025 18:59:25 GMT, Xiaolong Peng <xpeng at openjdk.org> wrote:

> In concurrent reset/concurrent reset after collect phase, the worker needs to reset bitmaps for all the regions in current GC generation. The problem is resetting bitmaps may takes long for large heap because the marking bitmaps are also larger than small heap, we should always consider multiple threads if there are more than concurrent workers for concurrent reset. 
> 
> In this PR, parallel_region_stride for ShenandoahResetBitmapClosure is set to 8 for best possible workload distribution to all active workers.
> 
> Test result:
> 
> java -XX:+TieredCompilation -XX:+AlwaysPreTouch -Xms32G -Xmx32G -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -Xlog:gc* -XX:-ShenandoahUncommit -XX:ShenandoahGCMode=generational  -XX:+UseTLAB -jar ~/Downloads/dacapo-23.11-MR2-chopin.jar -n 5 h2 | grep "Concurrent Reset"
> 
> With the change:
> 
> [77.867s][info][gc,stats    ] Concurrent Reset               =    0.043 s (a =     3039 us) (n =    14) (lvls, us =     1133,     1230,     1270,     1328,    14650)
> [77.867s][info][gc,stats    ] Concurrent Reset After Collect =    0.043 s (a =     3107 us) (n =    14) (lvls, us =     1094,     1230,     1855,     3457,     8348)
> 
> Original:
> 
> 
> [77.289s][info][gc,stats    ] Concurrent Reset               =    0.045 s (a =     3197 us) (n =    14) (lvls, us =     1172,     1191,     1309,     1426,    15582)
> [77.289s][info][gc,stats    ] Concurrent Reset After Collect =    0.105 s (a =     7476 us) (n =    14) (lvls, us =     2246,     3828,     4395,     7695,    21266)
> 
> 
> The average time of concurrent reset after collect is reduced from 7476 us to 3107 us, 100%+ improvement.
> 
> ### Other tests
> - [x] hotspot_gc_shenandoah

See below

src/hotspot/share/gc/shenandoah/shenandoahGeneration.cpp line 83:

> 81:   // For a 31G heap resetting bitmaps could take more than 60ms for single thread, we should use a small
> 82:   // parallel region stride for ShenandoahResetBitmapClosure.
> 83:   size_t parallel_region_stride() override { return 8; }

Should this be:

if (ShenandoahParallelRegionStride == 0) {
  return 8;
} else {
  return ShenandoahParallelRegionStride;
}

-------------

Changes requested by kdnilsen (Committer).

PR Review: https://git.openjdk.org/jdk/pull/28613#pullrequestreview-3531815307
PR Review Comment: https://git.openjdk.org/jdk/pull/28613#discussion_r2582551300


More information about the shenandoah-dev mailing list