RFR: 8348638: Performance regression in Math.tanh [v9]
Mohamed Issa
duke at openjdk.org
Sat Apr 26 01:06:55 UTC 2025
> The changes described below are meant to resolve the performance regression introduced by the **x86_64 tanh** double precision floating point scalar intrinsic in #20657. Additionally, new constant value micro-benchmarks are included alongside a new micro-benchmark to check the performance of specific input value ranges to help prevent regressions in the future.
>
> 1. Check and handle high magnitude input values before those in other ranges. If one is found, **+/- 1** is returned almost immediately, skipping most of the remaining computations and branches.
> 2. Reduce the lower bound of the input range that triggers a quick **+/- 1** return from **|x| >= 32** to **|x| >= 22**. The new bound is the exact value required for correctness, and it is the one used by the original OpenJDK implementation.
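
The ordering described in the two items above can be illustrated with a minimal Java sketch. This is not the intrinsic itself, which is x86_64 assembly generated by HotSpot; the class name `TanhEarlyOut`, the method `tanhSketch`, and the constant `SATURATION_BOUND` are hypothetical names chosen for illustration, and `StrictMath.tanh` only stands in for the general computation path.

```java
public final class TanhEarlyOut {
    // Bound quoted in the description: |x| >= 22 yields +/- 1.
    // The constant name is an assumption made for this sketch.
    static final double SATURATION_BOUND = 22.0;

    static double tanhSketch(double x) {
        // Test the high magnitude range first so that saturated inputs
        // return before any other range checks or polynomial work.
        // NaN fails this comparison and falls through to the general path.
        if (Math.abs(x) >= SATURATION_BOUND) {
            return Math.copySign(1.0, x);
        }
        // Stand-in for the remaining ranges handled by the real intrinsic.
        return StrictMath.tanh(x);
    }

    public static void main(String[] args) {
        System.out.println(tanhSketch(1000.0));
        System.out.println(tanhSketch(-22.0));
        System.out.println(tanhSketch(0.5) == StrictMath.tanh(0.5));
    }
}
```

Placing the saturation test first means inputs such as 1000 or 100000 take a single compare and branch, which is the case the regression report exercised.
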
>
> The results of all tests posted below were captured with an [Intel® Xeon 6761P](https://www.intel.com/content/www/us/en/products/sku/241842/intel-xeon-6761p-processor-336m-cache-2-50-ghz/specifications.html) using [OpenJDK v25-b15](https://github.com/openjdk/jdk/releases/tag/jdk-25%2B15) as the baseline version. The term _baseline1_ refers to runs with the intrinsic enabled and _baseline2_ refers to runs with the intrinsic disabled.
>
> For the first set of performance data collected with the new built-in **tanhRange** micro-benchmark, see the tables below. Each result is the mean of 8 individual runs, and the input ranges used match those in the bug report with two additional ones included. In the high value scenarios (100, 1000, 10000, 100000), the changes increase throughput values over _baseline1_. Also, there is a small negative impact to the low value (1, 2, 10, 20) scenarios compared to _baseline1_. When comparing against _baseline2_, the changes have significant uplift with the lower value inputs (1, 2, 10, 20, 100). However, they slightly lag behind _baseline2_ when the high value inputs (1000, 10000, 100000) are used.
>
> | Input range(s) | Baseline1 (ops/s) | Change (ops/s) | Change vs baseline1 (%) |
> | :-------------------: | :-----------------: | :----------------: | :-------------------------: |
> | [-1, 1] | 103342 | 103705 | +0.35 |
> | [-2, 2] | 99977 | 100819 | +0.84 |
> | [-10, 10] | 99147 | 100240 | +1.10 |
> | [-20, 20] ...
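
The "Change vs baseline1 (%)" column appears to be the relative throughput difference, (change - baseline1) / baseline1 * 100. The short Java sketch below recomputes it for the three complete rows shown above; the class and method names are hypothetical and are not part of the pull request.

```java
import java.util.Locale;

public final class PercentChange {
    // Relative difference of the changed build against the baseline, in percent.
    static double percentChange(double baseline, double change) {
        return (change - baseline) / baseline * 100.0;
    }

    public static void main(String[] args) {
        // {baseline1 ops/s, change ops/s} for [-1, 1], [-2, 2], [-10, 10].
        long[][] rows = {{103342, 103705}, {99977, 100819}, {99147, 100240}};
        for (long[] r : rows) {
            // Locale.ROOT keeps the decimal separator a dot on every machine.
            System.out.println(
                String.format(Locale.ROOT, "%+.2f", percentChange(r[0], r[1])));
        }
    }
}
```

For these rows the formula gives +0.35, +0.84 and +1.10, matching the table.
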
Mohamed Issa has updated the pull request incrementally with one additional commit since the last revision:
Create separate tanh micro-benchmark module to avoid noise in MathBench
-------------
Changes:
- all: https://git.openjdk.org/jdk/pull/23889/files
- new: https://git.openjdk.org/jdk/pull/23889/files/66be269e..006eef6a
Webrevs:
- full: https://webrevs.openjdk.org/?repo=jdk&pr=23889&range=08
- incr: https://webrevs.openjdk.org/?repo=jdk&pr=23889&range=07-08
Stats: 220 lines in 2 files changed: 154 ins; 65 del; 1 mod
Patch: https://git.openjdk.org/jdk/pull/23889.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/23889/head:pull/23889
PR: https://git.openjdk.org/jdk/pull/23889