RFR: 8359435: AArch64: add support for SB instruction to MacroAssembler::spin_wait [v2]

Aleksey Shipilev shade at openjdk.org
Tue Jun 24 16:43:33 UTC 2025


On Tue, 24 Jun 2025 14:53:45 GMT, Evgeny Astigeevich <eastigeevich at openjdk.org> wrote:

>> There is data showing that SB-based spin pauses are less disruptive than ISB-based ones, so performance is better:
>>   - https://github.com/mysql/mysql-server/pull/611
>>   - https://github.com/facebook/folly/pull/2390
>> 
>> There are discussions regarding using it for spin pauses:
>>   - https://github.com/gperftools/gperftools/pull/1594
>>   - https://github.com/haproxy/haproxy/pull/2974
>> 
>> Instruction support: https://developer.arm.com/documentation/109697/2025_03/Feature-descriptions/The-Armv8-5-architecture-extension
>> 
>> CPUs supporting it:
>> - Apple M2+
>> - Neoverse-N2
>> - Neoverse-V2
>> 
>> Tests:
>> - Gtests passed.
>> - `test/hotspot/jtreg/compiler/onSpinWait/TestOnSpinWaitAArch64.java` passed.
>> - `test/hotspot/jtreg/compiler/onSpinWait/TestOnSpinWaitNoneAArch64.java` passed.
>> 
>> Micro-benchmarks (Graviton 4, c8g.16xlarge (64 CPU), Neoverse-V2):
>> 
>> 
>> Benchmark             Mode  Cnt   Score   Error  Units  Diff
>> ThreadOnSpinWait.ISB  avgt   15  11.875 ± 0.129  ns/op
>> ThreadOnSpinWait.SB   avgt   15   6.930 ± 0.054  ns/op  -42%
>> 
>> Benchmark                          (maxNum)  (threadCount)  Mode  Cnt    Score    Error  Units  Diff
>> ThreadOnSpinWaitSharedCounter.ISB   1000000              4  avgt   15   49.874 ± 10.160  ms/op
>> ThreadOnSpinWaitSharedCounter.SB    1000000              4  avgt   15   26.948 ±  4.036  ms/op  -46%
>> ThreadOnSpinWaitSharedCounter.ISB   1000000              8  avgt   15   65.173 ±  7.228  ms/op
>> ThreadOnSpinWaitSharedCounter.SB    1000000              8  avgt   15   44.476 ±  1.292  ms/op  -31%
>> ThreadOnSpinWaitSharedCounter.ISB   1000000             16  avgt   15  177.805 ± 44.925  ms/op
>> ThreadOnSpinWaitSharedCounter.SB    1000000             16  avgt   15   67.267 ± 13.814  ms/op  -62%
>> ThreadOnSpinWaitSharedCounter.ISB   1000000             32  avgt   15  265.149 ±  5.353  ms/op
>> ThreadOnSpinWaitSharedCounter.SB    1000000             32  avgt   15   42.297 ±  3.436  ms/op  -84%
>> ThreadOnSpinWaitSharedCounter.ISB   1000000             48  avgt   15  125.231 ±  9.272  ms/op
>> ThreadOnSpinWaitSharedCounter.SB    1000000             48  avgt   15   83.504 ±  8.561  ms/op  -33%
>> ThreadOnSpinWaitSharedCounter.ISB   1000000             64  avgt   15  124.505 ±  7.543  ms/op
>> ThreadOnSpinWaitSharedCounter.SB    1000000             64  avgt   15   86.588 ±  9.519  ms/op  -30%
>
> Evgeny Astigeevich has updated the pull request incrementally with two additional commits since the last revision:
> 
>  - Add SB detection
>  - Add support for SB to MacroAssembler::spin_wait

Looks reasonable, but the test needs more work.

Also, please merge from mainline to pick up the windows-aarch64 build fix, so that we test things there too.

test/hotspot/jtreg/compiler/onSpinWait/TestOnSpinWaitAArch64.java line 36:

> 34:  * @run driver compiler.onSpinWait.TestOnSpinWaitAArch64 c2 isb 3
> 35:  * @run driver compiler.onSpinWait.TestOnSpinWaitAArch64 c2 yield 1
> 36:  * @run driver compiler.onSpinWait.TestOnSpinWaitAArch64 c2 sb

Since we are touching up the test: maybe just say `sb 1` explicitly, and then read `spinWaitInstCount` from `args[2]` unconditionally?
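
A minimal sketch of what that could look like, assuming the test receives the compiler, the instruction, and the count as `args[0]`, `args[1]`, and `args[2]`. The class `SpinWaitArgs`, the record `Parsed`, and the `parse` helper are hypothetical names used only for illustration; they are not the test's actual code.

```java
public class SpinWaitArgs {
    // Hypothetical holder for the three positional test arguments.
    public record Parsed(String compiler, String spinWaitInst, int spinWaitInstCount) {}

    // With `sb 1` spelled out in every @run line, the count is always present,
    // so args[2] can be read without a special case for `sb`.
    public static Parsed parse(String[] args) {
        if (args.length != 3) {
            throw new IllegalArgumentException("expected: <compiler> <inst> <count>");
        }
        return new Parsed(args[0], args[1], Integer.parseInt(args[2]));
    }

    public static void main(String[] args) {
        Parsed p = parse(new String[] {"c2", "sb", "1"});
        System.out.println(p.spinWaitInst() + " x" + p.spinWaitInstCount());
    }
}
```

The point of the design is that every `@run` line has the same shape, so a missing count shows up as an argument error instead of silently taking a default.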

test/hotspot/jtreg/compiler/onSpinWait/TestOnSpinWaitAArch64.java line 80:

> 78:         OutputAnalyzer analyzer = new OutputAnalyzer(pb.start());
> 79: 
> 80:         if (analyzer.getExitValue() != 0 && "sb".equals(spinWaitInst) && analyzer.contains("CPU does not support SB")) {

The logic here is a bit off. Suppose we _do_ have a non-zero exit code for, say, `isb`. This would not fail the test now. Do something like this instead?


if ("sb".equals(spinWaitInst) && analyzer.contains("CPU does not support SB")) {
    System.out.println("Skipping the test. The current CPU does not support SB instruction.");
    return;
}

analyzer.shouldHaveExitValue(0);
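
To make the difference concrete, here is a small self-contained model of the two checks. It assumes, as described above, that in the current version nothing else looks at the exit value. `ExitCheckModel`, `Outcome`, `original`, and `suggested` are hypothetical stand-ins for the test body and `OutputAnalyzer`, not code from the PR.

```java
public class ExitCheckModel {
    public enum Outcome { SKIPPED, CONTINUES, FAILS }

    // Current shape: the exit code is only consulted together with the
    // SB-specific condition, so a non-zero exit for `isb` slips through.
    public static Outcome original(int exitValue, String inst, boolean sbUnsupportedMsg) {
        if (exitValue != 0 && "sb".equals(inst) && sbUnsupportedMsg) {
            return Outcome.SKIPPED;
        }
        return Outcome.CONTINUES;
    }

    // Suggested shape: skip only for the unsupported-SB case, then require
    // a zero exit value for everything else.
    public static Outcome suggested(int exitValue, String inst, boolean sbUnsupportedMsg) {
        if ("sb".equals(inst) && sbUnsupportedMsg) {
            return Outcome.SKIPPED;
        }
        return exitValue == 0 ? Outcome.CONTINUES : Outcome.FAILS;
    }

    public static void main(String[] args) {
        // A crashing `isb` run: the original check carries on, the suggested one fails.
        System.out.println("original:  " + original(1, "isb", false));
        System.out.println("suggested: " + suggested(1, "isb", false));
    }
}
```

Both versions skip when `sb` is requested on a CPU without it; they differ only in how a non-zero exit for any other reason is treated.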

-------------

PR Review: https://git.openjdk.org/jdk/pull/25801#pullrequestreview-2954582448
PR Comment: https://git.openjdk.org/jdk/pull/25801#issuecomment-3001173366
PR Review Comment: https://git.openjdk.org/jdk/pull/25801#discussion_r2164468471
PR Review Comment: https://git.openjdk.org/jdk/pull/25801#discussion_r2164461092


More information about the hotspot-compiler-dev mailing list