RFC: AArch64: Implementing spin pauses with ISB

Astigeevich, Evgeny eastig at amazon.co.uk
Tue Aug 17 20:42:52 UTC 2021


Hi Stuart,

> The ISB instruction wasn't intended to be used for that purpose...

It might be time for YIELD to become a real instruction, especially on Neoverse. High thread contention is a typical situation in server workloads.
It would be great if the Neoverse architects considered this.

> Your experiments were with one ISB - did you experiment at all with multiple ISBs? I'm curious as to what the overall effect would be.

Yes, there were experiments with 2 ISBs. With 2 ISBs the performance improvements were smaller. The Graviton 2 performance engineers' explanation is that a spin should take 15-30ns. One ISB stays within that range. Two or more ISBs make each spin longer, which increases the chance of falling into an expensive code path and of the OS rescheduling the threads.
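
To illustrate the trade-off, here is a minimal Java sketch of a bounded spin loop: it pauses briefly with Thread.onSpinWait() a limited number of times and only then falls back to the more expensive path of parking the thread. The class name BoundedSpin, the MAX_SPINS limit and the 1 microsecond park time are hypothetical values chosen for illustration; they are not taken from HotSpot or from the patches discussed here. What a single Thread.onSpinWait() call costs depends on how the JVM implements the intrinsic on the given CPU.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.LockSupport;

final class BoundedSpin {
    // Hypothetical tuning knob: spin iterations allowed before parking.
    static final int MAX_SPINS = 64;

    /**
     * Waits until the flag becomes true.
     * Returns how many times the thread left the spin loop and parked.
     */
    static int awaitTrue(AtomicBoolean flag) {
        int parks = 0;
        int spins = 0;
        while (!flag.get()) {
            if (spins < MAX_SPINS) {
                // Cheap path: a short pause hint to the CPU.
                Thread.onSpinWait();
                spins++;
            } else {
                // Expensive path: give the core back to the OS scheduler.
                LockSupport.parkNanos(1_000L); // assumed value, 1 microsecond
                parks++;
                spins = 0;
            }
        }
        return parks;
    }
}
```

If each pause is too long, the waiter reacts late to the flag change; if the spin budget runs out, the thread ends up on the parking path, which is the kind of expensive code path mentioned above.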

Thanks,
Evgeny


On 16/08/2021, 23:43, "hotspot-dev on behalf of Stuart Monteith" <hotspot-dev-retn at openjdk.java.net on behalf of stuart.monteith at arm.com> wrote:

    On 10/08/2021 22:52, Astigeevich, Evgeny wrote:
    > Hello everyone.
    >
    > We’d like to discuss a proposal for implementing spin pauses with the ISB instruction:
    >
    > https://bugs.openjdk.java.net/browse/JDK-8186670 “Implement _onSpinWait() intrinsic for AArch64”
    > https://bugs.openjdk.java.net/browse/JDK-8258604 “Use 'isb' instruction in SpinPause on linux-aarch64”
    >
    > In 2017, Dmitry Chuyko from BellSoft proposed to implement onSpinWait() on ARM with the help of the YIELD instruction (see JDK-8186670). The contribution was discussed on the mailing list (http://mail.openjdk.java.net/pipermail/aarch64-port-dev/2017-August/004870.html) but never made it into OpenJDK 10 because at that time there were no hardware YIELD implementations and the exact effect was therefore unknown. Dmitry did a nice writeup and produced some benchmark results (http://cr.openjdk.java.net/~dchuyko/8186670/yield/spinwait.html). He was able to test and measure his implementation only on Cavium ThunderX and Raspberry Pi 3, which both implement YIELD as a NOP. Improvements were therefore minimal.
    >
    > As far as I am aware, the YIELD instruction is still implemented as a NOP by most AArch64 vendors. There were experiments using a sequence of NOPs to emulate x86 PAUSE.
    > At Amazon, experiments with ISB on Graviton 2 showed that it can create a small delay without consuming ALU resources. It is more reliable than NOPs. The experiments resulted in contributions of ISB-based spin pauses to MySQL (https://bugs.mysql.com/bug.php?id=100664) and MongoDB (https://jira.mongodb.org/browse/WT-6872).
    >
    > We have tested the ISB-based spin pauses for JDK11. Internal customer-based benchmarks showed 3% - 7% improvements in latencies and throughput.
    > We also used https://github.com/ben-manes/caffeine/wiki/Benchmarks microbenchmarks and a SynchronousQueue microbenchmark (you can find it in https://bugs.openjdk.java.net/browse/JDK-8267502). Caffeine GetPutBenchmark for LinkedHashMap_Lru, which has synchronized accesses to a cache, got +14% - +29% improvement in throughput. The SynchronousQueue microbenchmark, which uses Thread.onSpinWait, got 2.9x improvement.
    >
    > OpenJDK tip with the ISB-based spin pauses gets similar improvements in Caffeine GetPutBenchmark for LinkedHashMap_Lru. OpenJDK tip java.util.concurrent.SynchronousQueue does not use Thread.onSpinWait. Because of this the SynchronousQueue microbenchmark gets no improvements.
    >
    > We would like to contribute the ISB-based spin pauses implementation to OpenJDK: SpinPause (https://github.com/corretto/corretto-11/commit/dfb4965877a5810011514bd9294175eccd4b6d0d) and intrinsic (https://github.com/corretto/corretto-11/commit/a49a79bb2e7ac4a2265c51c9fb1c3fcf90dc7c9d).
    > If there is interest in the contribution, the open question is whether it should be enabled for all AArch64 implementations or only for Neoverse N1.
    >
    > Comments welcome!
    >
    > Thanks,
    > Evgeny
    >
    >
    >
    > Amazon Development Centre (London) Ltd. Registered in England and Wales with registration number 04543232 with its registered office at 1 Principal Place, Worship Street, London EC2A 2FA, United Kingdom.
    >
    >

    Hello Evgeny,
            This is interesting, and thank you for bringing it up for discussion here. The ISB instruction wasn't intended to be
    used for that purpose, so while you can measure a benefit for now, there is no guarantee that it would continue to be
    beneficial in the future. I hate to suggest adding more flags, but we ought to consider adding one to disable the ISB
    instruction in the spins. The counter argument is of course that we'd update the implementation as new cores come out.

    Your experiments were with one ISB - did you experiment at all with multiple ISBs? I'm curious as to what the overall
    effect would be.

    BR,
            Stuart







