Integrated: 8348638: Performance regression in Math.tanh
Mohamed Issa
duke at openjdk.org
Fri May 2 17:25:04 UTC 2025
On Tue, 4 Mar 2025 09:44:32 GMT, Mohamed Issa <duke at openjdk.org> wrote:
> The changes described below are meant to resolve the performance regression introduced by the **x86_64 tanh** double precision floating point scalar intrinsic in #20657. Additionally, a new set of micro-benchmarks is included to check performance across specific input value ranges and help prevent future regressions.
>
> 1. Check for and handle high-magnitude input values before those in other ranges. For such inputs, **+/- 1** is returned almost immediately, skipping most of the computations and branches.
> 2. Reduce the lower bound of the input range that triggers a quick **+/- 1** return from **|x| >= 32** to **|x| >= 22**. This new endpoint is the exact value required for correctness that's used by the original OpenJDK implementation.
>
> The results of all tests posted below were captured with an [Intel® Xeon 6761P](https://www.intel.com/content/www/us/en/products/sku/241842/intel-xeon-6761p-processor-336m-cache-2-50-ghz/specifications.html) using [OpenJDK v25-b15](https://github.com/openjdk/jdk/releases/tag/jdk-25%2B15) as the baseline version. The term _baseline1_ refers to runs with the intrinsic enabled and _baseline2_ refers to runs with the intrinsic disabled.
>
> The first set of performance data, collected with the new built-in range micro-benchmark, is shown in the tables below. Each result is the mean of 8 individual runs, and the input ranges match those in the bug report, with two additional ones included. In all scenarios, the changes increase throughput over _baseline1_. The uplift over _baseline1_ is quite significant for the high-value (100, 1000, 10000, 100000) scenarios. Compared against _baseline2_, the changes show significant uplift with the lower-value inputs (1, 2, 10, 20, 100). However, they lag significantly behind _baseline2_ when the high-value inputs (1000, 10000, 100000) are used.
>
> | Input range(s) | Baseline1 (ops/s) | Change (ops/s) | Change vs baseline1 (%) |
> | :-------------------: | :-----------------: | :----------------: | :-------------------------: |
> | [-1, 1] | 103342 | 103705 | +0.35 |
> | [-2, 2] | 99977 | 100819 | +0.84 |
> | [-10, 10] | 99147 | 100240 | +1.10 |
> | [-20, 20] | 99419 | 99492 | +0.07 ...
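
The ordering described in points 1 and 2 of the quoted description can be illustrated with a small Java sketch. This is not the intrinsic itself (which is x86_64 assembly generated by HotSpot) and not code from the changeset; the class name `TanhRangeSketch`, the method `tanhSketch`, and the constant `SATURATION_BOUND` are hypothetical names chosen for illustration, and `StrictMath.tanh` stands in for the slower general path.

```java
// Illustrative sketch only: hypothetical names, not code from the changeset.
final class TanhRangeSketch {
    // Bound taken from the quoted description: |x| >= 22 yields +/- 1.
    private static final double SATURATION_BOUND = 22.0;

    static double tanhSketch(double x) {
        // Check the high-magnitude range first so those inputs return
        // immediately. NaN fails this comparison and falls through.
        if (Math.abs(x) >= SATURATION_BOUND) {
            return Math.copySign(1.0, x);
        }
        // Stand-in for the general computation used for smaller magnitudes.
        return StrictMath.tanh(x);
    }

    public static void main(String[] args) {
        System.out.println(tanhSketch(1000.0));  // saturated positive input
        System.out.println(tanhSketch(-22.0));   // saturated negative input
        System.out.println(tanhSketch(0.5));     // general path
    }
}
```

For |x| >= 22, the gap 1 - tanh(|x|) is below 2e^-44 (about 1.6e-19), which is smaller than half the spacing between doubles just below 1 (2^-54, about 5.6e-17), so the correctly rounded result in that range is exactly +/- 1.
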
This pull request has now been integrated.
Changeset: c8bbcaf5
Author: Mohamed Issa <mohamed.issa at intel.com>
Committer: Jatin Bhateja <jbhateja at openjdk.org>
URL: https://git.openjdk.org/jdk/commit/c8bbcaf5de6982f673504a8dc766fb80bb6f0d07
Stats: 178 lines in 2 files changed: 160 ins; 7 del; 11 mod
8348638: Performance regression in Math.tanh
Reviewed-by: jbhateja, epeter, sviswanathan
-------------
PR: https://git.openjdk.org/jdk/pull/23889