RFR: 8336640: Shenandoah: Parallel worker use in parallel_heap_region_iterate [v2]

Wed Jul 24 20:05:32 UTC 2024

On Wed, 24 Jul 2024 19:50:44 GMT, Xiaolong Peng <xpeng at openjdk.org> wrote:

>> [parallel_heap_region_iterate](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp#L1726-L1734) is used to execute lightweight operations on heap regions, including ShenandoahPrepareForMarkClosure, ShenandoahInitMarkUpdateRegionStateClosure, ShenandoahFinalUpdateRefsUpdateRegionStateClosure, ShenandoahResetUpdateRegionStateClosure and ShenandoahFinalMarkUpdateRegionStateClosure. Since all of these operations are very lightweight, in regular cases without a large number of heap regions the parallelism seems to be overkill, because the cost of orchestrating multiple threads can exceed the cost of the work itself; in most cases, a single thread should be more efficient. Also, if multi-threading is needed, we should maximize the utilization of all active workers for best performance.
>> 
>> This PR includes proposed improvements addressing the known issues:
>> 1. Change the default value of ShenandoahParallelRegionStride to 0; when it is 0, Shenandoah automatically derives the stride for best performance.
>> 2. If num_regions is <= 4096, do not use worker threads at all, to avoid the overhead of multi-threading.
>> 3. When num_regions is more than 4096, use worker threads to parallelize the workload, and derive the stride so that the workload is evenly distributed across all active workers.
>> 4. When the number of active workers is 1, don't bother the workers; it is faster to finish the workload in the current thread (this avoids the overhead of multi-thread orchestration).
>> 
>> Here are some time metrics I collected from a test with the tip version (I added time metrics for parallel_heap_region_iterate):
>> 
>> JVM args: export JAVA_OPTS="-Xms8G -Xmx8G  -XX:+AlwaysPreTouch -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:ShenandoahParallelRegionStride=<stride> -XX:ShenandoahTargetNumRegions=<num_regions>  -Xlog:gc*"
>> 
>> |              | 1024 regions | 2048 regions | 4096 regions | 8192 regions | 16384 regions |
>> | ------------ | ------------ | ------------ | ------------ | ------------ | ------------- |
>> | 1024 stride  | 5785 ns      | 22194 ns     | 20953 ns     | 23008 ns     | 33013 ns      |
>> | 2048 stride  | N/A          | 6491 ns      | 22476 ns     | 25842 ns     | 34378 ns      |
>> | 4096 stride  | N/A          | N/A          | 14034 ns     | 28425 ns     | 36324 ns      |
>> | 8192 stride  | N/A          | N/A          | N/A          | 24359 ns     | 45231 ns      |
>> | 16384 stride | N/A          | N/A          | N/A          | N/A   ...
>
> Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Add trailing space

Marked as reviewed by shade (Reviewer).

I see double-digit-us improvements on my Mac, nice.

# Before
[30.851s][info][gc,stats] Pause Init Mark (G)            =    0.076 s (a =      272 us) (n =   281) (lvls, us =       47,      137,      199,      246,     5296)
[30.851s][info][gc,stats] Pause Init Mark (N)            =    0.019 s (a =       68 us) (n =   281) (lvls, us =       20,       49,       57,       71,      617)
[30.851s][info][gc,stats]   Update Region States         =    0.014 s (a =       49 us) (n =   281) (lvls, us =       13,       34,       41,       51,      368)

# After
[30.301s][info][gc,stats] Pause Init Mark (G)            =    0.073 s (a =      239 us) (n =   307) (lvls, us =       31,       90,      162,      217,     3586)
[30.301s][info][gc,stats] Pause Init Mark (N)            =    0.009 s (a =       28 us) (n =   307) (lvls, us =       10,       21,       26,       33,      188)
[30.301s][info][gc,stats]   Update Region States         =    0.004 s (a =       14 us) (n =   307) (lvls, us =        4,       11,       13,       15,      176)
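
For readers following along, the dispatch policy listed in the quoted PR description (points 1 to 4) can be sketched roughly as below. This is an illustrative sketch only, not the code from the PR: `IterationPlan`, `plan_region_iteration` and `kSerialRegionLimit` are hypothetical names, and the ceiling division used to derive the stride is an assumption about what "evenly distribute" means here.

```cpp
#include <cstddef>

// Hypothetical illustration, not HotSpot code.
struct IterationPlan {
  bool        use_workers;  // false: iterate regions in the calling thread
  std::size_t stride;       // regions handed out per chunk when workers are used
};

// Threshold taken from the PR description: <= 4096 regions stays single-threaded.
constexpr std::size_t kSerialRegionLimit = 4096;

IterationPlan plan_region_iteration(std::size_t num_regions,
                                    std::size_t active_workers,
                                    std::size_t configured_stride) {
  // Points 2 and 4: small heaps, or a single active worker, skip the worker gang.
  if (num_regions <= kSerialRegionLimit || active_workers <= 1) {
    return {false, num_regions};
  }
  // Point 1: a configured stride of 0 means "derive it automatically".
  std::size_t stride = configured_stride;
  if (stride == 0) {
    // Point 3 (assumed formula): ceiling division, so each active worker
    // gets roughly one chunk of equal size.
    stride = (num_regions + active_workers - 1) / active_workers;
  }
  return {true, stride};
}
```

Under these assumptions, 8192 regions with 8 active workers and a stride of 0 gives a derived stride of 1024, while 2048 regions are always processed in the calling thread.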

-------------

PR Review: https://git.openjdk.org/jdk/pull/20305#pullrequestreview-2197663679
PR Comment: https://git.openjdk.org/jdk/pull/20305#issuecomment-2248800641