RFR: 8372861: Genshen: Override parallel_region_stride of ShenandoahResetBitmapClosure to a reasonable value for better parallelism

Tue Dec 2 22:03:25 UTC 2025

On Tue, 2 Dec 2025 19:40:19 GMT, Kelvin Nilsen <kdnilsen at openjdk.org> wrote:

>> src/hotspot/share/gc/shenandoah/shenandoahGeneration.cpp line 83:
>> 
>>> 81:   // For a 31G heap resetting bitmaps could take more than 60ms for single thread, we should use a small
>>> 82:   // parallel region stride for ShenandoahResetBitmapClosure.
>>> 83:   size_t parallel_region_stride() override { return 8; }
>> 
>> Should this be:
>> 
>> if (ShenandoahParallelRegionStride == 0) {
>>   return 8;
>> } else {
>>   return ShenandoahParallelRegionStride;
>> }
>
> In fact, rather than the "constant" value 8, should we return ShenandoahWorkerPolicy::calc_workers_for_conc_reset()?
> 
> This makes the change robust against future integration of "worker surge".
> 
> In fact, the parallel stride depends on the reason we are iterating.  Can we make this change more "generic"?

The intention of the change is to let `ShenandoahResetBitmapClosure` not use the global ShenandoahParallelRegionStride value at all. Here are the reasons:
ShenandoahParallelRegionStride is usually set to a large value (the default used to be 1024). A large value does not help the performance of this ShenandoahHeapRegionClosure at all, for two reasons:
1. ShenandoahResetBitmapClosure resets the marking bitmaps before/after a GC cycle, and the reset may not be needed for every region, e.g. when `top_bitmap == bottom` (immediate trash regions?) or when the region is not in the current GC generation.
2. With a large ShenandoahParallelRegionStride, each task gets a large number of successive regions, e.g. worker 0 will process region 1 to region 1024. This makes it impossible to ensure the actual workload is evenly distributed across workers: some workers may get most of the regions that need a bitmap reset, while others may not do any actual bitmap reset at all.

A smaller parallel region stride helps with workload distribution and also adapts to different numbers of workers. It should also work just fine with "worker surge".
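
To illustrate the distribution argument, here is a minimal standalone sketch. It is not HotSpot code: `resets_per_worker`, `max_load` and `example_heap` are hypothetical names, the heap layout (4096 regions, only the first 1024 needing a reset) is made up, and chunks are handed to workers round-robin as a simplification of how workers actually claim chunks.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model, not HotSpot code: needs_reset[i] is true when region i
// actually requires a bitmap reset. Chunks of `stride` consecutive regions are
// assigned to workers round-robin (an assumed simplification of chunk claiming).
// Returns the number of real bitmap resets performed by each worker.
std::vector<std::size_t> resets_per_worker(const std::vector<bool>& needs_reset,
                                           std::size_t stride,
                                           std::size_t workers) {
  assert(stride > 0 && workers > 0);
  std::vector<std::size_t> load(workers, 0);
  std::size_t chunk = 0;
  for (std::size_t start = 0; start < needs_reset.size(); start += stride, ++chunk) {
    std::size_t end = std::min(start + stride, needs_reset.size());
    std::size_t w = chunk % workers;  // worker that owns this chunk
    for (std::size_t i = start; i < end; ++i) {
      if (needs_reset[i]) {
        load[w]++;
      }
    }
  }
  return load;
}

// Largest per-worker load; the busiest worker bounds the duration of the phase.
std::size_t max_load(const std::vector<bool>& needs_reset,
                     std::size_t stride,
                     std::size_t workers) {
  std::vector<std::size_t> load = resets_per_worker(needs_reset, stride, workers);
  return *std::max_element(load.begin(), load.end());
}

// Made-up example heap: 4096 regions, only the first 1024 need a reset.
std::vector<bool> example_heap() {
  std::vector<bool> regions(4096, false);
  std::fill(regions.begin(), regions.begin() + 1024, true);
  return regions;
}
```

Under these assumptions with 4 workers, `max_load(example_heap(), 1024, 4)` puts all 1024 resets on a single worker, while `max_load(example_heap(), 8, 4)` leaves each worker with 256.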

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28613#discussion_r2582857990