RFR: 8277137: Set OnSpinWaitInst/OnSpinWaitInstCount defaults to "isb"/1 for Arm Neoverse N1 [v3]

Evgeny Astigeevich duke at openjdk.java.net
Wed Nov 17 15:46:38 UTC 2021


On Wed, 17 Nov 2021 12:31:10 GMT, Evgeny Astigeevich <duke at openjdk.java.net> wrote:

>> One `ISB` implementation of `Thread.OnSpinWait` shows performance improvements on Graviton2 (Arm Neoverse N1 implementation), e.g. https://github.com/openjdk/jdk/pull/5562#issuecomment-966153163. 
>> 
>> Testing:
>> - `make test TEST=gtest`: Passed
>> - `make run-test TEST=tier1`: Passed
>> - `make run-test TEST=tier2`: Passed
>> - `make run-test TEST=hotspot/jtreg/compiler/onSpinWait`: Passed
>
> Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Set defaults for OnSpinWaitInst/OnSpinWaitInstCount independently

Hi Andrew,
Thank you for reviewing.

> Did we establish that this is the right default for Neoverse N1?

This is based on:
- MySQL: https://bugs.mysql.com/bug.php?id=100664
- MongoDB: https://jira.mongodb.org/browse/WT-6872
- Netty: https://github.com/netty/netty/pull/11677
- Customers' benchmarks and workloads.
- Experiments with two and three `ISB` instructions.

> On the other hand, do we know of possible cases where ISB makes things worse?

`Thread.onSpinWait` can make things worse when synchronisation overhead is not on the critical path. It also might not improve performance when there is thread contention, because in that case it might not give CPU resources to another thread. This applies to both arm64 and x86_64.
For example, on my x86 system:

Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              16
On-line CPU(s) list: 0-15
Thread(s) per core:  2
Core(s) per socket:  8
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               85
Model name:          Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
Stepping:            7
CPU MHz:             3097.588
BogoMIPS:            4999.99
Hypervisor vendor:   KVM
Virtualization type: full
L1d cache:           32K
L1i cache:           32K
L2 cache:            1024K
L3 cache:            36608K

Results of `org.openjdk.bench.java.lang.ThreadOnSpinWaitSharedCounter` with 4 threads running on 2 vCPUs:
- `taskset -c 0-1 build/linux-x86_64-server-release/images/jdk/bin/java -XX:+UnlockDiagnosticVMOptions -XX:DisableIntrinsic=_onSpinWait -jar build/linux-x86_64-server-release/images/test/micro/benchmarks.jar -f 3 org.openjdk.bench.java.lang.ThreadOnSpinWaitSharedCounter` 

Benchmark                            (maxNum)  (threadCount)  Mode  Cnt   Score   Error  Units
ThreadOnSpinWaitSharedCounter.trial   1000000              4  avgt   15  45.317 ± 1.741  ms/op

- `taskset -c 0-1 build/linux-x86_64-server-release/images/jdk/bin/java -jar build/linux-x86_64-server-release/images/test/micro/benchmarks.jar -f 3 org.openjdk.bench.java.lang.ThreadOnSpinWaitSharedCounter`

Benchmark                            (maxNum)  (threadCount)  Mode  Cnt   Score   Error  Units
ThreadOnSpinWaitSharedCounter.trial   1000000              4  avgt   15  55.530 ± 4.606  ms/op
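
The relative difference between the two runs can be checked from the reported scores. This is only a sketch: the class and method names are made up for illustration, and it uses the point estimates (45.317 and 55.530 ms/op) while ignoring the error margins.

```java
// Hypothetical helper, not part of the JDK or the benchmark suite.
final class SlowdownSketch {

    // Relative slowdown, in percent, of `slower` versus `baseline`.
    // Both arguments are average times per operation (ms/op).
    static double slowdownPercent(double baseline, double slower) {
        return (slower - baseline) / baseline * 100.0;
    }

    public static void main(String[] args) {
        // Scores taken from the two benchmark runs above:
        // intrinsic disabled (baseline) vs. default PAUSE-based intrinsic.
        double percent = slowdownPercent(45.317, 55.530);
        System.out.println("slowdown, percent: " + percent);
    }
}
```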


The x86 `PAUSE`-based implementation causes a 22.5% slowdown in this oversubscribed setup.
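
For readers who want to reproduce the effect outside JMH, a contended shared-counter loop of this kind can be sketched roughly as below. This is not the source of `ThreadOnSpinWaitSharedCounter`; the class name, method name and structure are assumptions made for illustration. It needs Java 9 or later for `Thread.onSpinWait`.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch, not the actual JMH benchmark.
final class SpinWaitSharedCounterSketch {

    // Starts threadCount threads that race to increment a shared counter
    // until it reaches maxNum, then returns the final counter value.
    static int run(int threadCount, int maxNum) {
        AtomicInteger counter = new AtomicInteger();
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(() -> {
                for (;;) {
                    int prev = counter.get();
                    if (prev >= maxNum) {
                        return; // target reached, stop this worker
                    }
                    if (!counter.compareAndSet(prev, prev + 1)) {
                        // Lost the race to another thread: emit the spin-wait hint
                        // (PAUSE on x86, configurable on AArch64).
                        Thread.onSpinWait();
                    }
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while joining", e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        int result = run(4, 1_000_000);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("counter=" + result + " elapsed=" + elapsedMs + " ms");
    }
}
```

Running it under `taskset -c 0-1`, once with `-XX:+UnlockDiagnosticVMOptions -XX:DisableIntrinsic=_onSpinWait` and once without, should give a rough comparison. A single timed run like this is much noisier than the JMH results above.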

-------------

PR: https://git.openjdk.java.net/jdk/pull/6415


More information about the hotspot-dev mailing list