RFR: 8359435: AArch64: add support for SB instruction to MacroAssembler::spin_wait [v2]
Aleksey Shipilev
shade at openjdk.org
Tue Jun 24 16:43:33 UTC 2025
On Tue, 24 Jun 2025 14:53:45 GMT, Evgeny Astigeevich <eastigeevich at openjdk.org> wrote:
>> There is data showing that SB-based spin pauses are less disruptive than ISB-based ones, so performance is better:
>> - https://github.com/mysql/mysql-server/pull/611
>> - https://github.com/facebook/folly/pull/2390
>>
>> There are discussions regarding using it for spin pauses:
>> - https://github.com/gperftools/gperftools/pull/1594
>> - https://github.com/haproxy/haproxy/pull/2974
>>
>> Instruction support: https://developer.arm.com/documentation/109697/2025_03/Feature-descriptions/The-Armv8-5-architecture-extension
>>
>> CPUs supporting it:
>> - Apple M2+
>> - Neoverse-N2
>> - Neoverse-V2
>>
>> Tests:
>> - Gtests passed.
>> - `test/hotspot/jtreg/compiler/onSpinWait/TestOnSpinWaitAArch64.java` passed.
>> - `test/hotspot/jtreg/compiler/onSpinWait/TestOnSpinWaitNoneAArch64.java` passed.
>>
>> Micro-benchmarks (Graviton 4, c8g.16xlarge (64 CPU), Neoverse-V2):
>>
>>
>> Benchmark             Mode  Cnt   Score   Error  Units  Diff
>> ThreadOnSpinWait.ISB  avgt   15  11.875 ± 0.129  ns/op
>> ThreadOnSpinWait.SB   avgt   15   6.930 ± 0.054  ns/op  -42%
>>
>> Benchmark                          (maxNum)  (threadCount)  Mode  Cnt    Score    Error  Units  Diff
>> ThreadOnSpinWaitSharedCounter.ISB   1000000              4  avgt   15   49.874 ± 10.160  ms/op
>> ThreadOnSpinWaitSharedCounter.SB    1000000              4  avgt   15   26.948 ±  4.036  ms/op  -46%
>> ThreadOnSpinWaitSharedCounter.ISB   1000000              8  avgt   15   65.173 ±  7.228  ms/op
>> ThreadOnSpinWaitSharedCounter.SB    1000000              8  avgt   15   44.476 ±  1.292  ms/op  -31%
>> ThreadOnSpinWaitSharedCounter.ISB   1000000             16  avgt   15  177.805 ± 44.925  ms/op
>> ThreadOnSpinWaitSharedCounter.SB    1000000             16  avgt   15   67.267 ± 13.814  ms/op  -62%
>> ThreadOnSpinWaitSharedCounter.ISB   1000000             32  avgt   15  265.149 ±  5.353  ms/op
>> ThreadOnSpinWaitSharedCounter.SB    1000000             32  avgt   15   42.297 ±  3.436  ms/op  -84%
>> ThreadOnSpinWaitSharedCounter.ISB   1000000             48  avgt   15  125.231 ±  9.272  ms/op
>> ThreadOnSpinWaitSharedCounter.SB    1000000             48  avgt   15   83.504 ±  8.561  ms/op  -33%
>> ThreadOnSpinWaitSharedCounter.ISB   1000000             64  avgt   15  124.505 ±  7.543  ms/op
>> ThreadOnSpinWaitSharedCounter.SB    1000000             64  avgt   15   86.588 ±  9.519  ms/op  -30%
>
> Evgeny Astigeevich has updated the pull request incrementally with two additional commits since the last revision:
>
> - Add SB detection
> - Add support for SB to MacroAssembler::spin_wait
Looks reasonable, but the test needs more work.
Also, please merge from mainline to pick up the windows-aarch64 build fix, so that we test things there too.
test/hotspot/jtreg/compiler/onSpinWait/TestOnSpinWaitAArch64.java line 36:
> 34: * @run driver compiler.onSpinWait.TestOnSpinWaitAArch64 c2 isb 3
> 35: * @run driver compiler.onSpinWait.TestOnSpinWaitAArch64 c2 yield 1
> 36: * @run driver compiler.onSpinWait.TestOnSpinWaitAArch64 c2 sb
Since we are touching up the test: maybe just say `sb 1` explicitly, and then read `spinWaitInstCount` from `args[2]` unconditionally?
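A minimal sketch of that suggestion, assuming the arguments arrive in the order the `@run` lines imply (compiler, instruction, count). The class and method names below are hypothetical and are not taken from the actual test:

```java
// Hypothetical helper: every @run line passes three arguments, so the
// instruction count is always read from args[2] with no special case for "sb".
final class SpinWaitArgsSketch {
    final String compiler;
    final String spinWaitInst;
    final int spinWaitInstCount;

    private SpinWaitArgsSketch(String compiler, String spinWaitInst, int spinWaitInstCount) {
        this.compiler = compiler;
        this.spinWaitInst = spinWaitInst;
        this.spinWaitInstCount = spinWaitInstCount;
    }

    // Expects exactly: <compiler> <instruction> <count>, e.g. "c2 sb 1".
    static SpinWaitArgsSketch parse(String[] args) {
        if (args.length != 3) {
            throw new IllegalArgumentException("Expected: <compiler> <instruction> <count>");
        }
        return new SpinWaitArgsSketch(args[0], args[1], Integer.parseInt(args[2]));
    }

    public static void main(String[] args) {
        SpinWaitArgsSketch parsed = parse(new String[] {"c2", "sb", "1"});
        System.out.println(parsed.spinWaitInst + " x" + parsed.spinWaitInstCount);
    }
}
```

With `sb 1` spelled out in the `@run` line, a missing count becomes an immediate argument error rather than a silently applied default.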
test/hotspot/jtreg/compiler/onSpinWait/TestOnSpinWaitAArch64.java line 80:
> 78: OutputAnalyzer analyzer = new OutputAnalyzer(pb.start());
> 79:
> 80: if (analyzer.getExitValue() != 0 && "sb".equals(spinWaitInst) && analyzer.contains("CPU does not support SB")) {
The logic here is a bit off. Suppose we _do_ have a non-zero exit code for, say, `isb`. This would not fail the test now. Do something like this instead?
    if ("sb".equals(spinWaitInst) && analyzer.contains("CPU does not support SB")) {
        System.out.println("Skipping the test. The current CPU does not support SB instruction.");
        return;
    }
    analyzer.shouldHaveExitValue(0);
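To make the intended outcomes concrete, here is a rough standalone model of the check above. It drops the jtreg `OutputAnalyzer` dependency and takes the exit value and output as plain parameters; the class, enum, and method names are hypothetical:

```java
// Hypothetical stand-in for the suggested check, usable without jtreg.
final class SpinWaitOutcomeSketch {
    enum Outcome { SKIP, PASS, FAIL }

    static Outcome classify(String spinWaitInst, int exitValue, String output) {
        // Skip only when SB was requested and the VM reported it as unsupported.
        if ("sb".equals(spinWaitInst) && output.contains("CPU does not support SB")) {
            return Outcome.SKIP;
        }
        // Otherwise a non-zero exit is a failure, whatever the instruction is.
        return exitValue == 0 ? Outcome.PASS : Outcome.FAIL;
    }

    public static void main(String[] args) {
        System.out.println(classify("isb", 1, "unexpected error"));       // FAIL
        System.out.println(classify("sb", 1, "CPU does not support SB")); // SKIP
        System.out.println(classify("sb", 0, ""));                        // PASS
    }
}
```

The first case is the one the comment is about: a non-zero exit for `isb` is reported as a failure because the exit-value check no longer sits behind the SB-specific condition.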
-------------
PR Review: https://git.openjdk.org/jdk/pull/25801#pullrequestreview-2954582448
PR Comment: https://git.openjdk.org/jdk/pull/25801#issuecomment-3001173366
PR Review Comment: https://git.openjdk.org/jdk/pull/25801#discussion_r2164468471
PR Review Comment: https://git.openjdk.org/jdk/pull/25801#discussion_r2164461092