Integrated: 8340490: Shenandoah: Optimize ShenandoahPacer

Xiaolong Peng xpeng at openjdk.org
Fri Sep 27 17:08:45 UTC 2024


On Thu, 19 Sep 2024 23:32:14 GMT, Xiaolong Peng <xpeng at openjdk.org> wrote:

> In a simple latency benchmark for memory allocation, I found that ShenandoahPacer contributed quite a lot to the long tail latency > 10ms: when multiple mutator threads fail at the fast path to claim budget [here](https://github.com/openjdk/jdk/blob/fdc16a373459cb2311316448c765b1bee5c73694/src/hotspot/share/gc/shenandoah/shenandoahPacer.cpp#L230), all of them will forcefully claim and then wait for up to 10ms ([code link](https://github.com/openjdk/jdk/blob/fdc16a373459cb2311316448c765b1bee5c73694/src/hotspot/share/gc/shenandoah/shenandoahPacer.cpp#L239-L277)).
> 
> The change in this PR makes ShenandoahPacer affect long tail latency much less: instead of forcefully claiming budget and then waiting, it attempts to claim again after waiting for 1ms, and keeps doing this until either 1/ it has spent 10ms waiting in total, or 2/ it has successfully claimed the budget.
> 
> Here is the latency comparison for the optimization:
> ![hdr-histogram-optimize-pacer](https://github.com/user-attachments/assets/811f48c5-87eb-462d-8b27-d15bd08be7b0)
> 
> With the optimization, the long tail latency from the test code below has improved considerably, from over 20ms to ~10ms, on macOS with an M3 chip:
> 
>     static final int threadCount = Runtime.getRuntime().availableProcessors();
>     static final LongAdder totalCount = new LongAdder();
>     static volatile byte[] sink;
>     public static void main(String[] args) {
>         runAllocationTest(100000);
>     }
>     static void recordTimeToAllocate(final int dataSize, final Histogram histogram) {
>         long startTime = System.nanoTime();
>         sink = new byte[dataSize];
>         long endTime = System.nanoTime();
>         histogram.recordValue(endTime - startTime);
>     }
> 
>     static void runAllocationTest(final int dataSize) {
>         final long endTime = System.currentTimeMillis() + 30_000;
>         final CountDownLatch startSignal = new CountDownLatch(1);
>         final CountDownLatch finished = new CountDownLatch(threadCount);
>         final Thread[] threads = new Thread[threadCount];
>         final Histogram[] histograms = new Histogram[threadCount];
>         final Histogram totalHistogram = new Histogram(3600000000000L, 3);
>         for (int i = 0; i < threadCount; i++) {
>             final var histogram = new Histogram(3600000000000L, 3);
>             histograms[i] = histogram;
>             threads[i] = new Thread(() -> {
>                 wait(startSignal);
>                 do {
>                     recordTimeToAllocate(dataSize, histogram);
>                 } while (System.currentTimeMillis() < e...
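For illustration, the retry strategy described in the PR can be sketched in plain Java. This is a hedged, simplified model, not the actual HotSpot C++ code: the class name `PacerSketch`, its methods, and the atomic-long budget are assumptions standing in for the real ShenandoahPacer internals, but the control flow mirrors the described change (claim, then wait 1ms and retry, up to 10ms total, instead of force-claiming and sleeping the full duration).

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketch only; names and structure are hypothetical,
// not the ShenandoahPacer API.
class PacerSketch {
    static final long MAX_WAIT_MS = 10; // total wait budget per allocation
    static final long STEP_MS = 1;      // wait slice between claim attempts

    final AtomicLong budget;

    PacerSketch(long initialBudget) {
        budget = new AtomicLong(initialBudget);
    }

    // Fast path: CAS the budget down if enough remains.
    boolean tryClaim(long words) {
        while (true) {
            long cur = budget.get();
            if (cur < words) return false;
            if (budget.compareAndSet(cur, cur - words)) return true;
        }
    }

    // New behavior: retry the claim after short 1ms waits, giving up
    // once ~10ms has been spent waiting in total.
    boolean paceForAlloc(long words) {
        long waited = 0;
        while (!tryClaim(words)) {
            if (waited >= MAX_WAIT_MS) return false;
            try {
                Thread.sleep(STEP_MS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
            waited += STEP_MS;
        }
        return true;
    }
}
```

The key difference from the old scheme is that a thread that fails the fast path never claims budget it did not actually receive; it only sleeps in small slices and re-checks, so a replenishment between slices lets it proceed without paying the full 10ms.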

This pull request has now been integrated.

Changeset: 65200a95
Author:    Xiaolong Peng <xpeng at openjdk.org>
Committer: Aleksey Shipilev <shade at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/65200a9589e46956a2194b20c4c90d003351a539
Stats:     41 lines in 3 files changed: 8 ins; 16 del; 17 mod

8340490: Shenandoah: Optimize ShenandoahPacer

Reviewed-by: shade, kdnilsen

-------------

PR: https://git.openjdk.org/jdk/pull/21099


More information about the hotspot-gc-dev mailing list