RFR: 8348638: Performance regression in Math.tanh [v3]

Mohamed Issa duke at openjdk.org
Wed Apr 9 23:57:51 UTC 2025


> The changes described below are meant to resolve the performance regression introduced by the **x86_64 tanh** double-precision floating-point scalar intrinsic in #20657. Additionally, a new micro-benchmark is included that measures performance over specific input value ranges, to help prevent regressions in the future.
> 
> 1. Check for and handle high-magnitude input values before those in other ranges. When one is found, **+/- 1** is returned almost immediately, without going through the computations and branches needed for the other ranges.
> 2. Reduce the lower bound of the input range that triggers a quick **+/- 1** return from **|x| >= 32** to **|x| >= 22**. This new endpoint is the exact value required for correctness, and it is the one used by the original OpenJDK implementation.
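
The two changes above can be illustrated with a minimal plain-Java sketch. This is not the x86_64 intrinsic itself, which is generated assembly; the class name, method name, and constant name are hypothetical, and `StrictMath.tanh` is used only as a stand-in for the general computation path.

```java
// Illustrative sketch only: the names below are hypothetical and do not
// appear in the pull request. The real change lives in the x86_64 stub.
final class TanhEarlyExitSketch {
    // Cutoff quoted in the description: |x| >= 22 returns +/- 1 directly.
    static final double SATURATION_BOUND = 22.0;

    static double tanhSketch(double x) {
        // Change 1: test the high-magnitude range first, before any other
        // range checks. NaN fails this comparison and falls through.
        if (Math.abs(x) >= SATURATION_BOUND) {
            // Change 2: the bound is 22 rather than 32; the result is
            // +1.0 or -1.0 with the sign of the input.
            return Math.copySign(1.0, x);
        }
        // Stand-in for the remaining ranges (assumption: any correct
        // tanh implementation could be substituted here).
        return StrictMath.tanh(x);
    }

    public static void main(String[] args) {
        double[] samples = {-1.0e5, -22.0, -1.0, 0.0, 0.5, 21.9, 22.0, 1000.0};
        for (double x : samples) {
            System.out.printf("x=%s sketch=%s reference=%s%n",
                    x, tanhSketch(x), StrictMath.tanh(x));
        }
    }
}
```

Ordering the saturation check first means that inputs drawn from wide ranges, which mostly fall at or beyond the bound, take the shortest path through the function.
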
> 
> The results of all tests posted below were captured with an [Intel® Xeon 6761P](https://www.intel.com/content/www/us/en/products/sku/241842/intel-xeon-6761p-processor-336m-cache-2-50-ghz/specifications.html) using [OpenJDK v25-b15](https://github.com/openjdk/jdk/releases/tag/jdk-25%2B15) as the baseline version.
> 
> The table below shows the first set of performance data, collected with the new built-in **tanhRange** micro-benchmark. Each result is the mean of 8 individual runs, and the input ranges match those in the bug report, with two additional ones included. In the high value scenarios (100, 1000, 10000, 100000), the changes significantly increase throughput over the baseline. There is almost no impact on the low value scenarios (1, 2, 10, 20).
> 
> | Input range(s)   | Baseline (ops/s) | Change (ops/s) | Change vs baseline (%) |
> | :--------------: | :--------------: | :------------: | :--------------------: |
> | [-1, 1]          | 26.043           | 25.929         | -0.44                  |
> | [-2, 2]          | 25.330           | 25.260         | -0.28                  |
> | [-10, 10]        | 24.930           | 24.936         | +0.02                  |
> | [-20, 20]        | 24.908           | 24.844         | -0.26                  |
> | [-100, 100]      | 53.813           | 76.650         | +42.44                 |
> | [-1000, 1000]    | 84.459           | 115.106        | +36.29                 |
> | [-10000, 10000]  | 93.980           | 123.320        | +31.22                 |
> | [-100000, 1000...

Mohamed Issa has updated the pull request incrementally with one additional commit since the last revision:

  Add new tanh micro-benchmark that covers different ranges of input values

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/23889/files
  - new: https://git.openjdk.org/jdk/pull/23889/files/e563fd73..4a9ad41a

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=23889&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23889&range=01-02

  Stats: 3422 lines in 2 files changed: 60 ins; 2897 del; 465 mod
  Patch: https://git.openjdk.org/jdk/pull/23889.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/23889/head:pull/23889

PR: https://git.openjdk.org/jdk/pull/23889


More information about the hotspot-compiler-dev mailing list