RFR: 8277137: Set OnSpinWaitInst/OnSpinWaitInstCount defaults to "isb"/1 for Arm Neoverse N1 [v3]
Evgeny Astigeevich
duke at openjdk.java.net
Wed Nov 17 15:46:38 UTC 2021
On Wed, 17 Nov 2021 12:31:10 GMT, Evgeny Astigeevich <duke at openjdk.java.net> wrote:
>> An implementation of `Thread.onSpinWait` using one `ISB` shows performance improvements on Graviton2 (Arm Neoverse N1 implementation), e.g. https://github.com/openjdk/jdk/pull/5562#issuecomment-966153163.
>>
>> Testing:
>> - `make test TEST=gtest`: Passed
>> - `make run-test TEST=tier1`: Passed
>> - `make run-test TEST=tier2`: Passed
>> - `make run-test TEST=hotspot/jtreg/compiler/onSpinWait`: Passed
>
> Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision:
>
> Set defaults for OnSpinWaitInst/OnSpinWaitInstCount independently
Hi Andrew,
Thank you for reviewing.
> Did we establish that this is the right default for Neoverse N1?
This is based on:
- MySQL: https://bugs.mysql.com/bug.php?id=100664
- MongoDB: https://jira.mongodb.org/browse/WT-6872
- Netty: https://github.com/netty/netty/pull/11677
- Customers' benchmarks and workloads.
- Experiments with two and three `ISB` instructions.
> On the other hand, do we know of possible cases where ISB makes things worse?
`Thread.onSpinWait` can make things worse when synchronisation overhead is not on the critical path. It also might not improve performance when there is thread contention, because in that case the spinning thread might not give up CPU resources to another thread. This applies to both arm64 and x86_64.
For example, my x86 system:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-15
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
Stepping: 7
CPU MHz: 3097.588
BogoMIPS: 4999.99
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 36608K
Results of `org.openjdk.bench.java.lang.ThreadOnSpinWaitSharedCounter` with 4 threads running on 2 vCPUs:
- `taskset -c 0-1 build/linux-x86_64-server-release/images/jdk/bin/java -XX:+UnlockDiagnosticVMOptions -XX:DisableIntrinsic=_onSpinWait -jar build/linux-x86_64-server-release/images/test/micro/benchmarks.jar -f 3 org.openjdk.bench.java.lang.ThreadOnSpinWaitSharedCounter`
Benchmark                            (maxNum)  (threadCount)  Mode  Cnt   Score   Error  Units
ThreadOnSpinWaitSharedCounter.trial   1000000              4  avgt   15  45.317 ± 1.741  ms/op
- `taskset -c 0-1 build/linux-x86_64-server-release/images/jdk/bin/java -jar build/linux-x86_64-server-release/images/test/micro/benchmarks.jar -f 3 org.openjdk.bench.java.lang.ThreadOnSpinWaitSharedCounter`
Benchmark                            (maxNum)  (threadCount)  Mode  Cnt   Score   Error  Units
ThreadOnSpinWaitSharedCounter.trial   1000000              4  avgt   15  55.530 ± 4.606  ms/op
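In this oversubscribed setup, the x86 `PAUSE`-based implementation causes a 22.5% slowdown (55.530 ms/op with the intrinsic versus 45.317 ms/op with it disabled).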
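For readers less familiar with the API, here is a minimal sketch of the kind of busy-wait loop `Thread.onSpinWait` is meant for. It is not the code of `ThreadOnSpinWaitSharedCounter`; the class name `SpinWaitSketch`, the method `spinUntil` and the 10 ms delay are made up for illustration.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class SpinWaitSketch {
    // Busy-waits until the flag becomes true and returns the number of spins.
    // Hypothetical helper for illustration only.
    static long spinUntil(AtomicBoolean flag) {
        long spins = 0;
        while (!flag.get()) {
            // Hint to the runtime that this is a spin loop. With the intrinsic
            // enabled this is PAUSE on x86_64; on AArch64 the instruction is
            // selected by OnSpinWaitInst/OnSpinWaitInstCount.
            Thread.onSpinWait();
            spins++;
        }
        return spins;
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicBoolean ready = new AtomicBoolean(false);
        Thread waiter = new Thread(
            () -> System.out.println("spins: " + spinUntil(ready)));
        waiter.start();
        Thread.sleep(10);   // arbitrary delay before releasing the waiter
        ready.set(true);
        waiter.join();
    }
}
```

The hint does not deschedule the thread. When there are more runnable threads than CPUs, as in the 4 threads on 2 vCPUs run above, the waiter keeps occupying its CPU while it spins.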
-------------
PR: https://git.openjdk.java.net/jdk/pull/6415