RFR: 8372861: Genshen: Override parallel_region_stride of ShenandoahResetBitmapClosure to a reasonable value for better parallelism

Tue Dec 2 19:43:04 UTC 2025

On Tue, 2 Dec 2025 19:33:01 GMT, Kelvin Nilsen <kdnilsen at openjdk.org> wrote:

>> In concurrent reset/concurrent reset after collect phase, the worker needs to reset bitmaps for all the regions in current GC generation. The problem is resetting bitmaps may takes long for large heap because the marking bitmaps are also larger than small heap, we should always consider multiple threads if there are more than concurrent workers for concurrent reset. 
>> 
>> In this PR, parallel_region_stride for ShenandoahResetBitmapClosure is set to 8 for best possible workload distribution to all active workers.
>> 
>> Test result:
>> 
>> java -XX:+TieredCompilation -XX:+AlwaysPreTouch -Xms32G -Xmx32G -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -Xlog:gc* -XX:-ShenandoahUncommit -XX:ShenandoahGCMode=generational  -XX:+UseTLAB -jar ~/Downloads/dacapo-23.11-MR2-chopin.jar -n 5 h2 | grep "Concurrent Reset"
>> 
>> With the change:
>> 
>> [77.867s][info][gc,stats    ] Concurrent Reset               =    0.043 s (a =     3039 us) (n =    14) (lvls, us =     1133,     1230,     1270,     1328,    14650)
>> [77.867s][info][gc,stats    ] Concurrent Reset After Collect =    0.043 s (a =     3107 us) (n =    14) (lvls, us =     1094,     1230,     1855,     3457,     8348)
>> 
>> Original:
>> 
>> 
>> [77.289s][info][gc,stats    ] Concurrent Reset               =    0.045 s (a =     3197 us) (n =    14) (lvls, us =     1172,     1191,     1309,     1426,    15582)
>> [77.289s][info][gc,stats    ] Concurrent Reset After Collect =    0.105 s (a =     7476 us) (n =    14) (lvls, us =     2246,     3828,     4395,     7695,    21266)
>> 
>> 
>> The average time of concurrent reset after collect is reduced from 7476 us to 3107 us, 100%+ improvement.
>> 
>> ### Other tests
>> - [x] hotspot_gc_shenandoah
>
> src/hotspot/share/gc/shenandoah/shenandoahGeneration.cpp line 83:
> 
>> 81:   // For a 31G heap resetting bitmaps could take more than 60ms for single thread, we should use a small
>> 82:   // parallel region stride for ShenandoahResetBitmapClosure.
>> 83:   size_t parallel_region_stride() override { return 8; }
> 
> Should this be:
> 
> if (ShenandoahParallelRegionStride == 0) {
>   return 8;
> } else {
>   return ShenandoahParallelRegionStride;
> }

In fact, rather than the "constant" value 8, should we return ShenandoahWorkerPolicy::calc_workers_for_conc_reset()?

This makes the change robust against future integration of "worker surge".

In fact, the parallel stride depends on the reason we are iterating.  Can we make this change more "generic"?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28613#discussion_r2582566703