RFR: JDK-8302736: Major performance regression in Math.log on aarch64

David Holmes dholmes at openjdk.org
Thu May 11 01:08:53 UTC 2023


On Mon, 24 Apr 2023 08:10:02 GMT, Tobias Holenstein <tholenstein at openjdk.org> wrote:

> ### Performance of java.lang.Math exp, log, log10, pow and tan
> The class `java.lang.Math` contains methods for performing basic numeric operations such as the elementary exponential, logarithm, square root, and trigonometric functions. The numeric methods of class `java.lang.StrictMath` are defined to return bit-for-bit identical results on all platforms. The equivalent methods in `java.lang.Math` do not have this requirement, which permits better-performing implementations where strict reproducibility is not required. By default, most `java.lang.Math` methods simply delegate to the equivalent method in `java.lang.StrictMath`. Code generators (like C2) are encouraged to use platform-specific native libraries or microprocessor instructions, where available, to provide higher-performance implementations of `java.lang.Math` methods. Such implementations must still conform to the specification of `java.lang.Math`.
> 
> Running the JMH benchmarks `org.openjdk.bench.java.lang.StrictMathBench` and `org.openjdk.bench.java.lang.MathBench` on `aarch64` shows that for `exp`, `log`, `log10`, `pow` and `tan`, `java.lang.Math` is around 10x slower than `java.lang.StrictMath`, which is NOT expected.
> 
> ### Reason for major performance regression
> If an intrinsic is implemented, as for `Math.sin` and `Math.cos`, C2 generates a call to the corresponding `StubRoutines` stub.
> Unfortunately, on `macOS aarch64` there are no intrinsics for `Math.tan`, `Math.exp`, `Math.log`, `Math.pow` and `Math.log10` yet.
> 
> _Tracked here:_
> [JDK-8189106 AARCH64: create intrinsic for tan - Java Bug System](https://bugs.openjdk.org/browse/JDK-8189106)
> [JDK-8189107 AARCH64: create intrinsic for pow - Java Bug System](https://bugs.openjdk.org/browse/JDK-8189107)
> [JDK-8307332 AARCH64: create intrinsic for exp - Java Bug System](https://bugs.openjdk.org/browse/JDK-8307332)
> [JDK-8210858 AArch64: Math.log intrinsic gives incorrect results - Java Bug System](https://bugs.openjdk.org/browse/JDK-8210858)
> 
> Instead, for `Math.tan`, `Math.exp`, `Math.log`, `Math.pow` and `Math.log10`, `LibraryCallKit::inline_math_native` generates a call to a C++ function, e.g. `CAST_FROM_FN_PTR(address, SharedRuntime::dlog)`.
>  
> The shared runtime functions are implemented in `sharedRuntimeTrans.cpp` as follows: 
> ```c++ 
> JRT_LEAF(jdouble, SharedRuntime::dlog(jdouble x)) 
>   return __ieee754_log(x); 
> JRT_END 
> ``` 
> 
> `JRT_LEAF` uses `VM_LEAF_BASE` ...
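The dispatch described in the quoted text (a `StubRoutines` entry when an intrinsic exists, otherwise a call to a C++ function in `SharedRuntime`) can be modeled in a few lines of plain C++. This is a simplified sketch, not HotSpot code: `MathFn`, `stub_dlog`, `shared_runtime_dlog`, `select_log_entry` and `call_log` are hypothetical stand-ins, and `std::log` replaces `__ieee754_log` only to keep the example portable.

```cpp
#include <cassert>
#include <cmath>

// Signature shared by the double -> double math entries (log, exp, tan, ...).
using MathFn = double (*)(double);

// Hypothetical stand-in for SharedRuntime::dlog: a plain C++ fallback.
// HotSpot's version calls __ieee754_log; std::log is used here for portability.
double shared_runtime_dlog(double x) { return std::log(x); }

// Hypothetical stand-in for StubRoutines::dlog(): nullptr models
// "no intrinsic stub was generated on this platform".
MathFn stub_dlog = nullptr;

// Prefer the generated stub when it exists, otherwise fall back to the
// out-of-line runtime function (roughly the choice made when compiling Math.log).
MathFn select_log_entry() {
  return stub_dlog != nullptr ? stub_dlog : shared_runtime_dlog;
}

// Call log through whichever entry was selected.
double call_log(double x) {
  MathFn entry = select_log_entry();
  assert(entry != nullptr);  // the runtime fallback always exists
  return entry(x);
}
```

In this model the fallback is always available, so correctness never depends on the stub; what differs between the two paths is the cost of the call, which for the runtime function includes whatever the leaf-entry wrapper does around it.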

This is day-one code for the macOS/AArch64 port, which has been in place for two years. Why is this only now being seen to be a problem?

The high-level placement of these calls was done to stop playing whack-a-mole every time we hit a new failure due to a missing `ThreadWXEnable`. I'm all for placing these where they are actually needed, but no one seems to be able to clearly state exactly where that is in the code. The changes in this PR are pushing it down further, but based on the comments, e.g.

    // we might modify the code cache via BarrierSetNMethod::nmethod_entry_barrier
    MACOS_AARCH64_ONLY(ThreadWXEnable __wx(WXWrite, thread));
    return ConfigT::thaw(thread, (Continuation::thaw_kind)kind);

we are not pushing it down to where it is actually needed. The trade-off of course is that if we push this too far down we may have to execute it far more often and so take a performance hit. So figuring out the optimum placement for these in the call stack seems rather difficult.
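The placement trade-off in the paragraph above can be made concrete with a toy counting model. It is not HotSpot code: `WXMode`, `wx_state`, `wx_transitions`, `WXGuard`, `leaf_work`, `leaf_with_guard`, `patch_code_cache`, `transitions_for` and `patch_sites` are all hypothetical. `WXGuard` plays the role of `ThreadWXEnable` (switch the thread's W^X mode on entry, restore the previous mode on exit), and a counter stands in for the real mode switch, which on macOS is backed by `pthread_jit_write_protect_np`.

```cpp
#include <cassert>

// Hypothetical per-thread W^X state: Exec while running generated code,
// Write while the VM may modify the code cache.
enum class WXMode { Exec, Write };

thread_local WXMode wx_state = WXMode::Exec;
thread_local long wx_transitions = 0;  // counts actual mode switches

// Only a real change of mode is counted (each one models a costly OS call).
void set_wx(WXMode m) {
  if (wx_state != m) {
    wx_state = m;
    ++wx_transitions;
  }
}

// RAII guard in the spirit of ThreadWXEnable: switch on entry, restore on exit.
class WXGuard {
 public:
  explicit WXGuard(WXMode m) : prev_(wx_state) { set_wx(m); }
  ~WXGuard() { set_wx(prev_); }
  WXGuard(const WXGuard&) = delete;
  WXGuard& operator=(const WXGuard&) = delete;

 private:
  WXMode prev_;
};

// Hypothetical leaf function that never touches the code cache (like dlog).
double leaf_work(double x) { return x * 0.5; }

// Guard placed high: a blanket wrapper around every leaf call.
double leaf_with_guard(double x) {
  WXGuard g(WXMode::Write);
  return leaf_work(x);
}

// Guard placed low: only around the code that actually writes code memory.
void patch_code_cache() {
  WXGuard g(WXMode::Write);
  assert(wx_state == WXMode::Write);
  // ... hypothetical code-cache modification would go here ...
}

// n leaf calls made from Exec mode; returns the number of mode switches.
long transitions_for(bool guard_every_call, int n) {
  wx_transitions = 0;
  double acc = 0.0;
  for (int i = 0; i < n; ++i) {
    acc += guard_every_call ? leaf_with_guard(i) : leaf_work(i);
  }
  (void)acc;
  return wx_transitions;
}

// n code-cache writes, with or without one enclosing guard around the loop.
long patch_sites(int n, bool guard_outside_loop) {
  wx_transitions = 0;
  if (guard_outside_loop) {
    WXGuard g(WXMode::Write);  // nested guards then find Write already set
    for (int i = 0; i < n; ++i) patch_code_cache();
  } else {
    for (int i = 0; i < n; ++i) patch_code_cache();
  }
  return wx_transitions;
}
```

With a guard on every leaf call, 1000 calls cost 2000 switches even though the leaf never writes code; without it they cost none. Pushing the guard down into a loop shows the opposite risk: 1000 patched sites cost 2000 switches, versus 2 when a single guard encloses the loop, because the nested guards find the thread already in `Write` mode and do nothing.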

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13606#issuecomment-1543020620

