RFC: AArch64: Implementing spin pauses with ISB
Astigeevich, Evgeny
eastig at amazon.co.uk
Tue Aug 10 21:52:17 UTC 2021
Hello everyone.
We’d like to discuss a proposal for implementing spin pauses with the ISB instruction:
https://bugs.openjdk.java.net/browse/JDK-8186670 “Implement _onSpinWait() intrinsic for AArch64”
https://bugs.openjdk.java.net/browse/JDK-8258604 “Use 'isb' instruction in SpinPause on linux-aarch64”
In 2017, Dmitry Chuyko from BellSoft proposed to implement onSpinWait() on ARM with the help of the YIELD instruction (see JDK-8186670). The contribution was discussed on the mailing list (http://mail.openjdk.java.net/pipermail/aarch64-port-dev/2017-August/004870.html) but never made it into OpenJDK 10 because at that time there were no hardware YIELD implementations and the exact effect was therefore unknown. Dmitry did a nice writeup and produced some benchmark results (http://cr.openjdk.java.net/~dchuyko/8186670/yield/spinwait.html). He was able to test and measure his implementation only on Cavium ThunderX and Raspberry Pi 3, which both implement YIELD as a NOP. Improvements were therefore minimal.
As far as I am aware, the YIELD instruction is still implemented as a NOP by most AArch64 vendors. There were experiments to use a sequence of NOPs to emulate x86 PAUSE.
At Amazon, experiments with the ISB instruction on Graviton 2 showed that it can create a small delay without consuming ALU resources, and that it is more reliable than NOPs. The experiments resulted in contributions of ISB-based spin pauses to MySQL (https://bugs.mysql.com/bug.php?id=100664) and MongoDB (https://jira.mongodb.org/browse/WT-6872).
We have tested the ISB-based spin pauses with JDK 11. Internal customer-based benchmarks showed 3% to 7% improvements in latency and throughput.
We also used the Caffeine microbenchmarks (https://github.com/ben-manes/caffeine/wiki/Benchmarks) and a SynchronousQueue microbenchmark (you can find it in https://bugs.openjdk.java.net/browse/JDK-8267502). The Caffeine GetPutBenchmark for LinkedHashMap_Lru, which has synchronized accesses to a cache, showed a 14% to 29% improvement in throughput. The SynchronousQueue microbenchmark, which uses Thread.onSpinWait, showed a 2.9x improvement.
OpenJDK tip with the ISB-based spin pauses gets similar improvements in the Caffeine GetPutBenchmark for LinkedHashMap_Lru. However, java.util.concurrent.SynchronousQueue in OpenJDK tip does not use Thread.onSpinWait, so the SynchronousQueue microbenchmark shows no improvement there.
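For readers less familiar with the API, here is a minimal sketch of the kind of busy-wait loop that Thread.onSpinWait is meant for. It is not taken from the benchmarks or from the proposed patches. The class name SpinWaitSketch and the helpers spinUntilSet and handOff are hypothetical, chosen only for illustration. Which instruction, if any, the JVM emits for the hint depends on the platform and the JVM build.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative only: class and method names are hypothetical.
class SpinWaitSketch {

    // Busy-waits until the flag is set and returns how many times it spun.
    // Thread.onSpinWait() is only a hint; the loop is correct without it.
    static long spinUntilSet(AtomicBoolean flag) {
        long spins = 0;
        while (!flag.get()) {
            Thread.onSpinWait(); // hint to the runtime that we are spinning
            spins++;
        }
        return spins;
    }

    // Starts a second thread that sets the flag, spins until the change
    // becomes visible, and returns the final flag value.
    static boolean handOff() {
        AtomicBoolean ready = new AtomicBoolean(false);
        Thread producer = new Thread(() -> ready.set(true));
        producer.start();
        spinUntilSet(ready);
        try {
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
        }
        return ready.get();
    }

    public static void main(String[] args) {
        System.out.println("hand-off observed: " + handOff());
    }
}
```

The hint sits inside the loop body, so any change to how the JVM implements it on a given CPU affects how long each iteration of such a loop takes, without changing the loop's result.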
We would like to contribute the ISB-based spin pauses implementation to OpenJDK: SpinPause (https://github.com/corretto/corretto-11/commit/dfb4965877a5810011514bd9294175eccd4b6d0d) and intrinsic (https://github.com/corretto/corretto-11/commit/a49a79bb2e7ac4a2265c51c9fb1c3fcf90dc7c9d).
If there is interest in the contribution, the open question is whether it should be enabled for all AArch64 implementations or only for Neoverse N1.
Comments welcome!
Thanks,
Evgeny