RFR: JDK-8302736: Major performance regression in Math.log on aarch64
Tobias Holenstein
tholenstein at openjdk.org
Wed May 10 12:52:36 UTC 2023
### Performance java.lang.Math exp, log, log10, pow and tan
The class `java.lang.Math` contains methods for performing basic numeric operations such as the elementary exponential, logarithm, square root, and trigonometric functions. The numeric methods of class `java.lang.StrictMath` are defined to return bit-for-bit identical results on all platforms. The implementations of the equivalent functions in class `java.lang.Math` do not have this requirement. This relaxation permits better-performing implementations where strict reproducibility is not required. By default, most of the `java.lang.Math` methods simply call the equivalent method in `java.lang.StrictMath` for their implementation. Code generators (like C2) are encouraged to use platform-specific native libraries or microprocessor instructions, where available, to provide higher-performance implementations of `java.lang.Math` methods. Such higher-performance implementations must still conform to the specification for `java.lang.Math`.
Running JMH benchmarks `org.openjdk.bench.java.lang.StrictMathBench` and `org.openjdk.bench.java.lang.MathBench` on `aarch64` shows that for `exp`, `log`, `log10`, `pow` and `tan` `java.lang.Math` is around 10x slower than `java.lang.StrictMath` - which is NOT expected.
### Reason for major performance regression
If an intrinsic is implemented, as for `Math.sin` and `Math.cos`, C2 generates a call to the corresponding `StubRoutines` stub.
Unfortunately, on `macOS aarch64` there are no intrinsics for `Math.tan`, `Math.exp`, `Math.log`, `Math.pow` and `Math.log10` yet.
_Tracked here:_
[JDK-8189106 AARCH64: create intrinsic for tan - Java Bug System](https://bugs.openjdk.org/browse/JDK-8189106)
[JDK-8189107 AARCH64: create intrinsic for pow - Java Bug System](https://bugs.openjdk.org/browse/JDK-8189107)
[JDK-8307332 AARCH64: create intrinsic for exp - Java Bug System](https://bugs.openjdk.org/browse/JDK-8307332)
[JDK-8210858 AArch64: Math.log intrinsic gives incorrect results - Java Bug System](https://bugs.openjdk.org/browse/JDK-8210858)
Instead, for `Math.tan`, `Math.exp`, `Math.log`, `Math.pow` and `Math.log10`, a call to a C++ function is generated in `LibraryCallKit::inline_math_native`, for example with `CAST_FROM_FN_PTR(address, SharedRuntime::dlog)` for `Math.log`.
The shared runtime functions are implemented in `sharedRuntimeTrans.cpp` as follows:
```c++
JRT_LEAF(jdouble, SharedRuntime::dlog(jdouble x))
  return __ieee754_log(x);
JRT_END
```
`JRT_LEAF` uses `VM_LEAF_BASE`, which puts a write lock on the code cache:
```c++
MACOS_AARCH64_ONLY(ThreadWXEnable __wx(WXWrite, JavaThread::current()));
```
This lock causes the 10x slowdown. Since the shared runtime functions do not access the code cache, the lock is not needed.
### Side note about WXWrite
On Apple Silicon, the Write/Execute lock is a new Hardened Runtime capability, see:
https://developer.apple.com/documentation/apple-silicon/porting-just-in-time-compilers-to-apple-silicon
It prevents memory regions from being writable and executable at the same time. Therefore, we need to acquire `WXWrite` whenever we want to write to the code cache.
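The scoped acquire-and-restore pattern described here can be sketched in a few lines. This is a minimal, platform-independent simulation and not HotSpot's `ThreadWXEnable`: `ThreadState`, `WXMode`, `set_wx_mode`, `ScopedWX`, `patch_code` and `patch_with_guard` are all hypothetical names, and the mode switch is modelled by a counter rather than a real call.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration; these are not HotSpot's definitions.
enum class WXMode { Exec, Write };

struct ThreadState {
  WXMode mode = WXMode::Exec;  // threads normally run with the code cache executable
  int toggles = 0;             // counts mode switches performed by this thread
};

// Switch the thread's mode and return the previous one.
// Assumption: on macOS/aarch64 a real switch would go through
// pthread_jit_write_protect_np(); here it is only simulated.
inline WXMode set_wx_mode(ThreadState& t, WXMode m) {
  WXMode old = t.mode;
  if (old != m) {
    t.mode = m;
    ++t.toggles;
  }
  return old;
}

// RAII guard: enter the requested mode on construction,
// restore the previous mode on destruction.
class ScopedWX {
 public:
  ScopedWX(ThreadState& t, WXMode m) : t_(t), old_(set_wx_mode(t, m)) {}
  ~ScopedWX() { set_wx_mode(t_, old_); }
  ScopedWX(const ScopedWX&) = delete;
  ScopedWX& operator=(const ScopedWX&) = delete;

 private:
  ThreadState& t_;
  WXMode old_;
};

// Stand-in for a write to the code cache; only legal in Write mode.
inline void patch_code(ThreadState& t) {
  assert(t.mode == WXMode::Write);
}

// A caller that needs to write acquires the guard for exactly that scope.
inline void patch_with_guard(ThreadState& t) {
  ScopedWX wx(t, WXMode::Write);
  patch_code(t);
}
```

Each guarded scope costs two mode switches (one in, one out) unless the thread is already in the requested mode, which is why the placement of the guard matters for frequently called functions.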
### Solution: moving WXWrite out of JRT_LEAF
At the moment, the `WXWrite` lock is too coarse-grained. This fix removes the `WXWrite` lock from `VM_LEAF_BASE` and moves it further down in the call hierarchy. This resolves the performance issue because the shared runtime functions in `sharedRuntimeTrans.cpp` can now be called without the `WXWrite` lock. Overall, this change gives performance improvements of 10x for `Math.tan`, `Math.exp`, `Math.log`, `Math.pow` and `Math.log10` on specific JMH benchmarks. It also gives performance improvements of up to 8% on other workloads, for example `SPECjvm2008-XML.transform` on `macOS aarch64`.
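To illustrate why moving the lock down the call hierarchy helps, the following is a rough sketch that counts simulated mode switches for the two placements. It is not the actual patch: `WXWriteScope`, `wx_switches`, `dlog_coarse`, `dlog_fine` and `patch_call_site` are hypothetical names, `std::log` stands in for `__ieee754_log`, and the guard does not handle nesting.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical bookkeeping for the simulation.
static int wx_switches = 0;    // number of simulated mode switches
static bool wx_write = false;  // true while the thread may write to the code cache

// Simplified guard: one switch on entry, one on exit (no nesting support).
struct WXWriteScope {
  WXWriteScope() { wx_write = true; ++wx_switches; }
  ~WXWriteScope() { wx_write = false; ++wx_switches; }
};

// Coarse-grained placement: every leaf call pays for the guard,
// even though computing a logarithm never touches the code cache.
double dlog_coarse(double x) {
  WXWriteScope wx;
  return std::log(x);
}

// Fine-grained placement: the pure math function runs without the guard.
double dlog_fine(double x) {
  return std::log(x);
}

// Only code that actually writes to the code cache takes the guard.
void patch_call_site() {
  WXWriteScope wx;
  assert(wx_write);  // a write would be legal here
}
```

With the coarse placement, `n` calls to `dlog_coarse` perform `2 * n` simulated switches. With the fine placement, calls to `dlog_fine` perform none, and the cost is paid only on the comparatively rare paths that modify the code cache.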
-------------
Commit messages:
- comment
- moved lock
- comments added
- Delete BenchmarkMath.java
- remove trailing whitespace
- remove redundant lock from OptoRuntime::rethrow_C
- remove lock in InterpreterRuntime::resolve_from_cache
- lock moved down
- benchmark
- Revert "JDK-8302736: Major performance regression in Math.log on aarch64"
- ... and 1 more: https://git.openjdk.org/jdk/compare/5c7ede94...c073342f
Changes: https://git.openjdk.org/jdk/pull/13606/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13606&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8302736
Stats: 14 lines in 4 files changed: 8 ins; 6 del; 0 mod
Patch: https://git.openjdk.org/jdk/pull/13606.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/13606/head:pull/13606
PR: https://git.openjdk.org/jdk/pull/13606