Integrated: 8348638: Performance regression in Math.tanh

Mohamed Issa duke at openjdk.org
Fri May 2 17:25:04 UTC 2025


On Tue, 4 Mar 2025 09:44:32 GMT, Mohamed Issa <duke at openjdk.org> wrote:

> The changes described below are meant to resolve the performance regression introduced by the **x86_64 tanh** double-precision floating-point scalar intrinsic in #20657. Additionally, a new set of micro-benchmarks is included to check the performance of specific input value ranges and help prevent future regressions.
> 
> 1. Check and handle high-magnitude input values before those in other ranges. When such a value is found, **+/- 1** is returned almost immediately, without going through further computations or branches.
> 2. Reduce the lower bound of the input range that triggers a quick **+/- 1** return from **|x| >= 32** to **|x| >= 22**. This new endpoint is the exact value required for correctness that's used by the original OpenJDK implementation.
> 
> The results of all tests posted below were captured with an [Intel® Xeon 6761P](https://www.intel.com/content/www/us/en/products/sku/241842/intel-xeon-6761p-processor-336m-cache-2-50-ghz/specifications.html) using [OpenJDK v25-b15](https://github.com/openjdk/jdk/releases/tag/jdk-25%2B15) as the baseline version. The term _baseline1_ refers to runs with the intrinsic enabled and _baseline2_ refers to runs with the intrinsic disabled.
> 
> For the first set of performance data, collected with the new built-in range micro-benchmark, see the tables below. Each result is the mean of 8 individual runs, and the input ranges match those in the bug report, plus two additional ones. In all scenarios, the changes increase throughput over _baseline1_. The uplift over _baseline1_ is quite significant for the high value (100, 1000, 10000, 100000) scenarios. Compared against _baseline2_, the changes show significant uplift with the lower value inputs (1, 2, 10, 20, 100). However, they lag significantly behind _baseline2_ when the high value inputs (1000, 10000, 100000) are used.
> 
> | Input range(s)        | Baseline1 (ops/s) | Change (ops/s) | Change vs baseline1 (%) |
> | :-------------------: | :-----------------: | :----------------: | :-------------------------: |
> | [-1, 1]                     | 103342                | 103705              | +0.35                              |
> | [-2, 2]                     | 99977                  | 100819              | +0.84                              |
> | [-10, 10]                 | 99147                  | 100240              | +1.10                              |
> | [-20, 20]                 | 99419                  | 99492                | +0.07                        ...
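The two changes listed in the quoted description (test for high-magnitude inputs first, and use **|x| >= 22** as the bound for the quick **+/- 1** return) can be illustrated with a minimal Java sketch. This is not the intrinsic itself, which is x86_64 assembly emitted by HotSpot. The class name `TanhRangeSketch`, the method `tanhSketch`, and the constant `SATURATION_BOUND` are hypothetical names chosen for illustration, and `StrictMath.tanh` stands in for the general-case computation.

```java
public final class TanhRangeSketch {
    // Bound taken from the description: |x| >= 22 returns +/-1 right away.
    // The name is hypothetical and used only for this illustration.
    private static final double SATURATION_BOUND = 22.0;

    static double tanhSketch(double x) {
        // Check the high-magnitude case first so that large inputs
        // (including +/-Infinity) exit before any other work is done.
        if (Math.abs(x) >= SATURATION_BOUND) {
            return Math.copySign(1.0, x);
        }
        // NaN fails the comparison above and falls through to the
        // general path, which stands in for the rest of the algorithm.
        return StrictMath.tanh(x);
    }

    public static void main(String[] args) {
        double[] inputs = {0.5, -20.0, 22.0, -1000.0};
        for (double x : inputs) {
            System.out.println("tanhSketch(" + x + ") = " + tanhSketch(x));
        }
    }
}
```

In this sketch the early exit costs one comparison for inputs in the saturated range, which corresponds to the high value scenarios where the description reports the largest uplift over _baseline1_.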

This pull request has now been integrated.

Changeset: c8bbcaf5
Author:    Mohamed Issa <mohamed.issa at intel.com>
Committer: Jatin Bhateja <jbhateja at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/c8bbcaf5de6982f673504a8dc766fb80bb6f0d07
Stats:     178 lines in 2 files changed: 160 ins; 7 del; 11 mod

8348638: Performance regression in Math.tanh

Reviewed-by: jbhateja, epeter, sviswanathan

-------------

PR: https://git.openjdk.org/jdk/pull/23889

