RFR: JDK-8302736: Major performance regression in Math.log on aarch64

Tobias Holenstein tholenstein at openjdk.org
Wed May 10 12:52:36 UTC 2023


###  Performance java.lang.Math exp, log, log10, pow and tan
The class `java.lang.Math` contains methods for performing basic numeric operations such as the elementary exponential, logarithm, square root, and trigonometric functions. The numeric methods of class `java.lang.StrictMath` are defined to return bit-for-bit identical results on all platforms. The implementations of the equivalent functions in class `java.lang.Math` do not have this requirement. This relaxation permits better-performing implementations where strict reproducibility is not required. By default, most of the `java.lang.Math` methods simply call the equivalent method in `java.lang.StrictMath` for their implementation. Code generators (like C2) are encouraged to use platform-specific native libraries or microprocessor instructions, where available, to provide higher-performance implementations of `java.lang.Math` methods. Such higher-performance implementations must still conform to the specification for `java.lang.Math`.

Running the JMH benchmarks `org.openjdk.bench.java.lang.StrictMathBench` and `org.openjdk.bench.java.lang.MathBench` on `aarch64` shows that `java.lang.Math` is around 10x slower than `java.lang.StrictMath` for `exp`, `log`, `log10`, `pow` and `tan`, which is NOT expected.

### Reason for major performance regression
If an intrinsic is implemented, as for `Math.sin` and `Math.cos`, C2 generates a call to the corresponding `StubRoutines` stub.
Unfortunately, on `macOS aarch64` there are no intrinsics for `Math.tan`, `Math.exp`, `Math.log`, `Math.pow` and `Math.log10` yet.

_Tracked here:_
[JDK-8189106 AARCH64: create intrinsic for tan - Java Bug System](https://bugs.openjdk.org/browse/JDK-8189106)
[JDK-8189107 AARCH64: create intrinsic for pow - Java Bug System](https://bugs.openjdk.org/browse/JDK-8189107)
[JDK-8307332 AARCH64: create intrinsic for exp - Java Bug System](https://bugs.openjdk.org/browse/JDK-8307332)
[JDK-8210858 AArch64: Math.log intrinsic gives incorrect results - Java Bug System](https://bugs.openjdk.org/browse/JDK-8210858)

Instead, for `Math.tan`, `Math.exp`, `Math.log`, `Math.pow` and `Math.log10`, a call to a C++ function is generated in `LibraryCallKit::inline_math_native`, for example with `CAST_FROM_FN_PTR(address, SharedRuntime::dlog)` for `Math.log`.
 
The shared runtime functions are implemented in `sharedRuntimeTrans.cpp` as follows: 
```c++ 
JRT_LEAF(jdouble, SharedRuntime::dlog(jdouble x)) 
  return __ieee754_log(x); 
JRT_END 
``` 

`JRT_LEAF` uses `VM_LEAF_BASE`, which puts a write lock on the code cache:
```c++
MACOS_AARCH64_ONLY(ThreadWXEnable __wx(WXWrite, JavaThread::current()));
```

This lock causes the 10x slowdown. Since the shared runtime functions do not access the code cache, the lock is not needed.

### Side note about WXWrite 
On Apple Silicon, the Write/Execute lock is a new Hardened Runtime capability, see:
https://developer.apple.com/documentation/apple-silicon/porting-just-in-time-compilers-to-apple-silicon

It prevents memory regions from being writable and executable at the same time. Therefore, we need to acquire `WXWrite` when we want to write to the code cache.
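
The acquire/release pattern described above can be pictured as a scoped guard. The following is only a minimal, portable sketch: `FakeThread`, `ScopedWX` and `write_code_byte` are hypothetical names invented for illustration and are not HotSpot's types. The mode switch is simulated with a field and a counter; on real hardware the per-thread protection would be toggled at that point (Apple's documentation linked above describes `pthread_jit_write_protect_np` for this).

```c++
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for illustration; these are not HotSpot's types.
enum class WXMode { Exec, Write };

struct FakeThread {
  WXMode mode = WXMode::Exec;   // assumed default: code memory is executable
  std::uint64_t switches = 0;   // number of simulated mode transitions

  // Switch to new_mode and return the previous mode. On real hardware this
  // is where the per-thread protection would be toggled.
  WXMode enable_wx(WXMode new_mode) {
    WXMode old = mode;
    if (old != new_mode) {
      mode = new_mode;
      ++switches;
    }
    return old;
  }
};

// RAII guard: enter the requested mode now, restore the previous mode
// when the guard goes out of scope.
class ScopedWX {
  FakeThread& thread_;
  WXMode old_mode_;
public:
  ScopedWX(FakeThread& thread, WXMode new_mode)
    : thread_(thread), old_mode_(thread.enable_wx(new_mode)) {}
  ~ScopedWX() { thread_.enable_wx(old_mode_); }
  ScopedWX(const ScopedWX&) = delete;
  ScopedWX& operator=(const ScopedWX&) = delete;
};

// Writing to (simulated) code memory is only legal in Write mode.
void write_code_byte(FakeThread& thread, unsigned char* code, unsigned char value) {
  assert(thread.mode == WXMode::Write && "code memory is not writable in Exec mode");
  *code = value;
}
```

A caller would open a `ScopedWX wx(thread, WXMode::Write);` scope around `write_code_byte(...)`. Because the guard restores the previous mode rather than a fixed one, nesting a second guard inside an already writable scope causes no extra transition in this model.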

### Solution: moving WXWrite out of JRT_LEAF
At the moment the `WXWrite` lock is too coarse-grained. This fix removes the `WXWrite` lock from `VM_LEAF_BASE` and moves it further down in the call hierarchy. This resolves the performance issue because the shared runtime functions in `sharedRuntimeTrans.cpp` can now be called without the `WXWrite` lock. Overall, this change gives performance improvements of 10x for `Math.tan`, `Math.exp`, `Math.log`, `Math.pow` and `Math.log10` on specific JMH benchmarks. Further, it also gives performance improvements of up to 8% on other workloads, for example `SPECjvm2008-XML.transform` on `macOS aarch64`.
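
To make the granularity argument concrete, here is a simplified model of the change. It is not the actual patch: `ScopedWXWrite`, `g_wx_switches`, `dlog_coarse`, `dlog_fine` and `patch_code` are hypothetical names, `std::log` stands in for `__ieee754_log`, and the mode switch is only counted rather than performed.

```c++
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical model for illustration; not HotSpot code.
static std::uint64_t g_wx_switches = 0;   // counts simulated W^X mode switches

// Stand-in for a scoped WXWrite guard: one switch on entry, one on exit.
struct ScopedWXWrite {
  ScopedWXWrite()  { ++g_wx_switches; }   // Exec -> Write
  ~ScopedWXWrite() { ++g_wx_switches; }   // Write -> Exec
};

// Before: the guard sits in the generic leaf-entry wrapper, so even a pure
// math function pays for two mode switches on every call.
double dlog_coarse(double x) {
  ScopedWXWrite wx;
  return std::log(x);
}

// After: the math leaf takes no guard, since it never writes code memory.
double dlog_fine(double x) {
  return std::log(x);
}

// After: only a callee that really writes to (simulated) code memory
// acquires the guard.
void patch_code(unsigned char* code, unsigned char value) {
  assert(code != nullptr);
  ScopedWXWrite wx;
  *code = value;
}
```

In this model a call to `dlog_coarse` costs two switches while `dlog_fine` costs none, and both return the same value. The cost is still paid in `patch_code`, where it is actually needed.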

-------------

Commit messages:
 - comment
 - moved lock
 - comments added
 - Delete BenchmarkMath.java
 - remove trailing whitespace
 - remove redundant lock from  OptoRuntime::rethrow_C
 - remove lock in InterpreterRuntime::resolve_from_cache
 - lock moved down
 - benchmark
 - Revert "JDK-8302736: Major performance regression in Math.log on aarch64"
 - ... and 1 more: https://git.openjdk.org/jdk/compare/5c7ede94...c073342f

Changes: https://git.openjdk.org/jdk/pull/13606/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13606&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8302736
  Stats: 14 lines in 4 files changed: 8 ins; 6 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/13606.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/13606/head:pull/13606

PR: https://git.openjdk.org/jdk/pull/13606


More information about the hotspot-dev mailing list