RFR: 8372861: Genshen: Override parallel_region_stride of ShenandoahResetBitmapClosure to a reasonable value for better parallelism
Xiaolong Peng
xpeng at openjdk.org
Tue Dec 2 22:03:28 UTC 2025
On Tue, 2 Dec 2025 21:38:26 GMT, Xiaolong Peng <xpeng at openjdk.org> wrote:
>> In fact, rather than the "constant" value 8, should we return ShenandoahWorkerPolicy::calc_workers_for_conc_reset()?
>>
>> This makes the change robust against future integration of "worker surge".
>>
>> In fact, the parallel stride depends on the reason we are iterating. Can we make this change more "generic"?
>
> The intention of the change is for `ShenandoahResetBitmapClosure` not to use the global ShenandoahParallelRegionStride value at all. Here are the reasons:
> ShenandoahParallelRegionStride is usually set to a large value (the default used to be 1024), and a large value does not help the performance of this ShenandoahHeapRegionClosure at all, for two reasons:
> 1. ShenandoahResetBitmapClosure resets the marking bitmaps before/after a GC cycle, but the reset may not be needed for every region, e.g. when `top_bitmap == bottom` (immediate trash regions?) or when the region is not in the current GC generation.
> 2. With a large ShenandoahParallelRegionStride, each task gets a large number of successive regions, e.g. worker 0 will process region 1 to region 1024. This makes it impossible to ensure that the actual workload is evenly distributed across all workers: some workers may get most of the regions that need a bitmap reset, while others may not do any actual bitmap reset at all.
>
> A smaller parallel region stride value helps with workload distribution and also adapts to different numbers of workers; it should also work just fine with "worker surge".
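
The load-balancing argument above can be illustrated with a toy model. This is only a sketch, not the HotSpot code: it assumes workers repeatedly claim the next chunk of `stride` consecutive regions, that a region costs 1 unit if its bitmap needs a reset and 0 otherwise, and it ignores the cost of claiming a chunk. The function name `simulate_makespan` and the region layout are hypothetical.

```python
import heapq

def simulate_makespan(costs, num_workers, stride):
    """Return the finish time of the slowest worker.

    costs[i] is the (hypothetical) cost of resetting region i.
    Chunks of `stride` consecutive regions are handed, in order,
    to whichever worker becomes free first.
    """
    # Min-heap of the times at which each worker becomes free.
    free_at = [0] * num_workers
    heapq.heapify(free_at)
    for start in range(0, len(costs), stride):
        chunk_cost = sum(costs[start:start + stride])
        earliest = heapq.heappop(free_at)
        heapq.heappush(free_at, earliest + chunk_cost)
    return max(free_at)

if __name__ == "__main__":
    # Hypothetical heap: 4096 regions, only the first 1024 need a reset.
    costs = [1] * 1024 + [0] * 3072
    for stride in (1024, 8):
        print(stride, simulate_makespan(costs, num_workers=8, stride=stride))
```

In this model, a stride of 1024 puts all the real work into a single chunk, so one worker does everything while the other seven finish immediately; a stride of 8 spreads the same work evenly over all eight workers.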
In the JBS bug report, I attached a test I did for this; I tested stride values from 1 to 4096:
java -XX:+TieredCompilation -XX:+AlwaysPreTouch -Xms32G -Xmx32G -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -Xlog:gc* -XX:-ShenandoahUncommit -XX:ShenandoahGCMode=generational -XX:+UseTLAB -XX:ShenandoahParallelRegionStride=<Stride Value> -jar ~/Downloads/dacapo-23.11-MR2-chopin.jar -n 5 h2 | grep "Concurrent Reset"
[1]
[77.444s][info][gc,stats ] Concurrent Reset = 0.043 s (a = 3078 us) (n = 14) (lvls, us = 1172, 1289, 1328, 1406, 14780)
[77.444s][info][gc,stats ] Concurrent Reset After Collect = 0.044 s (a = 3150 us) (n = 14) (lvls, us = 1074, 1504, 1895, 4121, 8952)
[2]
[77.304s][info][gc,stats ] Concurrent Reset = 0.043 s (a = 3036 us) (n = 14) (lvls, us = 1152, 1211, 1289, 1328, 14872)
[77.305s][info][gc,stats ] Concurrent Reset After Collect = 0.046 s (a = 3297 us) (n = 14) (lvls, us = 939, 1602, 2148, 3945, 8744)
[4]
[76.898s][info][gc,stats ] Concurrent Reset = 0.043 s (a = 3048 us) (n = 14) (lvls, us = 1152, 1230, 1270, 1328, 14989)
[76.898s][info][gc,stats ] Concurrent Reset After Collect = 0.045 s (a = 3215 us) (n = 14) (lvls, us = 1016, 1309, 1914, 3301, 7076)
[8]
[77.916s][info][gc,stats ] Concurrent Reset = 0.043 s (a = 3067 us) (n = 14) (lvls, us = 1152, 1211, 1270, 1309, 15091)
[77.916s][info][gc,stats ] Concurrent Reset After Collect = 0.043 s (a = 3050 us) (n = 14) (lvls, us = 1133, 1484, 1934, 3086, 8113)
[16]
[77.071s][info][gc,stats ] Concurrent Reset = 0.042 s (a = 3019 us) (n = 14) (lvls, us = 1152, 1250, 1270, 1328, 14615)
[77.071s][info][gc,stats ] Concurrent Reset After Collect = 0.046 s (a = 3284 us) (n = 14) (lvls, us = 932, 1523, 2090, 2930, 8841)
[32]
[76.965s][info][gc,stats ] Concurrent Reset = 0.044 s (a = 3117 us) (n = 14) (lvls, us = 1191, 1211, 1328, 1348, 14768)
[76.965s][info][gc,stats ] Concurrent Reset After Collect = 0.047 s (a = 3323 us) (n = 14) (lvls, us = 930, 1406, 1875, 4316, 8565)
[64]
[77.255s][info][gc,stats ] Concurrent Reset = 0.042 s (a = 3033 us) (n = 14) (lvls, us = 1152, 1211, 1270, 1406, 14635)
[77.255s][info][gc,stats ] Concurrent Reset After Collect = 0.054 s (a = 3862 us) (n = 14) (lvls, us = 1133, 1504, 2852, 5508, 8947)
[128]
[76.502s][info][gc,stats ] Concurrent Reset = 0.042 s (a = 3027 us) (n = 14) (lvls, us = 1133, 1230, 1250, 1426, 14264)
[76.502s][info][gc,stats ] Concurrent Reset After Collect = 0.053 s (a = 3762 us) (n = 14) (lvls, us = 1172, 1582, 2129, 5273, 9272)
[256]
[76.751s][info][gc,stats ] Concurrent Reset = 0.043 s (a = 3057 us) (n = 14) (lvls, us = 1133, 1230, 1270, 1426, 14713)
[76.751s][info][gc,stats ] Concurrent Reset After Collect = 0.056 s (a = 4029 us) (n = 14) (lvls, us = 1484, 1602, 3027, 4629, 11267)
[512]
[77.508s][info][gc,stats ] Concurrent Reset = 0.043 s (a = 3082 us) (n = 14) (lvls, us = 1133, 1230, 1270, 1426, 14893)
[77.508s][info][gc,stats ] Concurrent Reset After Collect = 0.068 s (a = 4822 us) (n = 14) (lvls, us = 1953, 2285, 3633, 5605, 16366)
[1024]
[76.933s][info][gc,stats ] Concurrent Reset = 0.043 s (a = 3073 us) (n = 14) (lvls, us = 1152, 1211, 1270, 1426, 14957)
[76.933s][info][gc,stats ] Concurrent Reset After Collect = 0.082 s (a = 5877 us) (n = 14) (lvls, us = 1895, 3203, 4258, 7793, 15587)
[2048]
[76.746s][info][gc,stats ] Concurrent Reset = 0.042 s (a = 3022 us) (n = 14) (lvls, us = 1133, 1172, 1211, 1406, 14586)
[76.746s][info][gc,stats ] Concurrent Reset After Collect = 0.099 s (a = 7104 us) (n = 14) (lvls, us = 1875, 3281, 4590, 7695, 19292)
[4096]
[77.356s][info][gc,stats ] Concurrent Reset = 0.042 s (a = 3031 us) (n = 14) (lvls, us = 1133, 1191, 1250, 1426, 14606)
[77.356s][info][gc,stats ] Concurrent Reset After Collect = 0.101 s (a = 7213 us) (n = 14) (lvls, us = 1914, 3262, 4238, 7871, 19862)
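
To compare the runs side by side, the phase name, total time and average (`a = ... us`) can be pulled out of each line. The following is a minimal sketch that assumes the line format is exactly as in the output above; `STAT_RE` and `parse_stat` are hypothetical helper names, not part of any JDK tooling.

```python
import re

# Matches e.g. "Concurrent Reset After Collect = 0.082 s (a = 5877 us)".
STAT_RE = re.compile(
    r"(Concurrent Reset(?: After Collect)?) += +([\d.]+) s \(a += +(\d+) us\)"
)

def parse_stat(line):
    """Return (phase, total_seconds, average_us), or None if the line does not match."""
    m = STAT_RE.search(line)
    if m is None:
        return None
    return m.group(1), float(m.group(2)), int(m.group(3))

if __name__ == "__main__":
    # Two lines copied from the stride=1024 run above.
    sample = [
        "[76.933s][info][gc,stats ] Concurrent Reset = 0.043 s (a = 3073 us) (n = 14) (lvls, us = 1152, 1211, 1270, 1426, 14957)",
        "[76.933s][info][gc,stats ] Concurrent Reset After Collect = 0.082 s (a = 5877 us) (n = 14) (lvls, us = 1895, 3203, 4258, 7793, 15587)",
    ]
    for line in sample:
        print(parse_stat(line))
```

Read this way, the numbers above show the average for "Concurrent Reset After Collect" going from 3150 us at stride 1 to 7213 us at stride 4096, while the "Concurrent Reset" average stays close to 3000 us for every stride value.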
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/28613#discussion_r2582863336