RFR: 8340490: Shenandoah: Optimize ShenandoahPacer [v2]

duke duke at openjdk.org
Fri Sep 27 15:07:39 UTC 2024


On Fri, 20 Sep 2024 18:47:50 GMT, Xiaolong Peng <xpeng at openjdk.org> wrote:

>> In a simple latency benchmark for memory allocation, I found that ShenandoahPacer contributed quite a lot to the long-tail latency > 10ms: when multiple mutator threads fail at the fast path to claim budget [here](https://github.com/openjdk/jdk/blob/fdc16a373459cb2311316448c765b1bee5c73694/src/hotspot/share/gc/shenandoah/shenandoahPacer.cpp#L230), all of them forcefully claim budget and then wait for up to 10ms ([code link](https://github.com/openjdk/jdk/blob/fdc16a373459cb2311316448c765b1bee5c73694/src/hotspot/share/gc/shenandoah/shenandoahPacer.cpp#L239-L277))
>> 
>> The change in this PR makes ShenandoahPacer contribute much less to long-tail latency: instead of forcefully claiming budget and then waiting, it attempts the claim again after each 1ms wait, and keeps doing so until either: 1/ it has spent 10ms waiting in total; or 2/ it has successfully claimed the budget.
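>> The retry strategy described above can be sketched roughly as follows. This is a hypothetical Java sketch, not the actual HotSpot C++ code: the `PacerSketch` class, its `claim` helper, and the atomic-budget representation are illustrative assumptions; only the 1ms slice and 10ms cap come from the description above.
>> 
```java
import java.util.concurrent.atomic.AtomicLong;

public class PacerSketch {
    static final long MAX_WAIT_MS = 10;  // total wait budget, per the PR description
    static final long SLICE_MS = 1;      // per-iteration wait, per the PR description

    // Illustrative stand-in for the pacer's allocation budget.
    final AtomicLong budget = new AtomicLong();

    // Fast-path claim: succeed only if enough budget is available.
    boolean claim(long words) {
        long cur;
        do {
            cur = budget.get();
            if (cur < words) {
                return false;  // not enough budget; do not go negative
            }
        } while (!budget.compareAndSet(cur, cur - words));
        return true;
    }

    // New behavior: re-attempt the claim after each 1ms wait,
    // giving up after 10ms total instead of claiming up front.
    boolean pace(long words) {
        for (long waited = 0; waited < MAX_WAIT_MS; waited += SLICE_MS) {
            if (claim(words)) {
                return true;  // claimed within the wait window
            }
            try {
                Thread.sleep(SLICE_MS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return claim(words);  // one last attempt after the window expires
    }
}
```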
>> 
>> Here is the latency comparison for the optimization:
>> ![hdr-histogram-optimize-pacer](https://github.com/user-attachments/assets/811f48c5-87eb-462d-8b27-d15bd08be7b0)
>> 
>> With the optimization, the long-tail latency from the test code below improved from over 20ms to ~10ms on macOS with an M3 chip:
>> 
>>     static final int threadCount = Runtime.getRuntime().availableProcessors();
>>     static final LongAdder totalCount = new LongAdder();
>>     static volatile byte[] sink;
>>     public static void main(String[] args) {
>>         runAllocationTest(100000);
>>     }
>>     static void recordTimeToAllocate(final int dataSize, final Histogram histogram) {
>>         long startTime = System.nanoTime();
>>         sink = new byte[dataSize];
>>         long endTime = System.nanoTime();
>>         histogram.recordValue(endTime - startTime);
>>     }
>> 
>>     static void runAllocationTest(final int dataSize) {
>>         final long endTime = System.currentTimeMillis() + 30_000;
>>         final CountDownLatch startSignal = new CountDownLatch(1);
>>         final CountDownLatch finished = new CountDownLatch(threadCount);
>>         final Thread[] threads = new Thread[threadCount];
>>         final Histogram[] histograms = new Histogram[threadCount];
>>         final Histogram totalHistogram = new Histogram(3600000000000L, 3);
>>         for (int i = 0; i < threadCount; i++) {
>>             final var histogram = new Histogram(3600000000000L, 3);
>>             histograms[i] = histogram;
>>             threads[i] = new Thread(() -> {
>>                 wait(startSignal);
>>                 do {
>>                     recordTimeToAllocate(dataS...
>
> Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision:
> 
>   clean up

@pengxiaolong 
Your change (at version 58196a4f6f9f509525667dba1bd1fb2c2afa3e8e) is now ready to be sponsored by a Committer.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21099#issuecomment-2379489972
