RFR: 8348638: Performance regression in Math.tanh [v9]
Mohamed Issa
duke at openjdk.org
Mon Apr 28 14:23:52 UTC 2025
On Sat, 26 Apr 2025 01:06:55 GMT, Mohamed Issa <duke at openjdk.org> wrote:
>> The changes described below are meant to resolve the performance regression introduced by the **x86_64 tanh** double-precision floating-point scalar intrinsic in #20657. Additionally, a new set of micro-benchmarks is included to check the performance of specific input value ranges and help prevent future regressions.
>>
>> 1. Check for and handle high-magnitude input values before those in other ranges. For such inputs, **+/- 1** is returned almost immediately, with few computations or branches.
>> 2. Reduce the lower bound of the input range that triggers a quick **+/- 1** return from **|x| >= 32** to **|x| >= 22**. This new endpoint is the exact value required for correctness, and it is the one used by the original OpenJDK implementation.
>>
>> The results of all tests posted below were captured with an [Intel® Xeon 6761P](https://www.intel.com/content/www/us/en/products/sku/241842/intel-xeon-6761p-processor-336m-cache-2-50-ghz/specifications.html) using [OpenJDK v25-b15](https://github.com/openjdk/jdk/releases/tag/jdk-25%2B15) as the baseline version. The term _baseline1_ refers to runs with the intrinsic enabled and _baseline2_ refers to runs with the intrinsic disabled.
>>
>> For the first set of performance data, collected with the new built-in range micro-benchmark, see the tables below. Each result is the mean of 8 individual runs, and the input ranges match those in the bug report, with two additional ones included. In all scenarios, the changes increase throughput over _baseline1_. The uplift over _baseline1_ is quite significant for the high value (100, 1000, 10000, 100000) scenarios. Compared with _baseline2_, the changes show significant uplift for the lower value inputs (1, 2, 10, 20, 100). However, they lag significantly behind _baseline2_ for the high value inputs (1000, 10000, 100000).
>>
>> | Input range(s) | Baseline1 (ops/s) | Change (ops/s) | Change vs baseline1 (%) |
>> | :-------------------: | :-----------------: | :----------------: | :-------------------------: |
>> | [-1, 1] | 103342 | 103705 | +0.35 |
>> | [-2, 2] | 99977 | 100819 | +0.84 |
>> | [-10, 10] | 99147 | 100240 | +1.10 |
>> | [-20, 20] | 99419 | 99492 |...
>
> Mohamed Issa has updated the pull request incrementally with one additional commit since the last revision:
>
> Create separate tanh micro-benchmark module to avoid noise in MathBench
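
For readers skimming the thread, the ordering described in items 1 and 2 of the quoted description can be sketched in plain Java. This is only an illustration, not the intrinsic itself: the actual change lives in the x86_64 stub, and the names `TanhRangeSketch`, `tanhSketch`, and `SATURATION_BOUND` are hypothetical. `StrictMath.tanh` stands in for whatever the remaining ranges compute.

```java
// Illustrative sketch only; the real intrinsic is an x86_64 stub, not Java.
// All names here are hypothetical and do not appear in the pull request.
final class TanhRangeSketch {
    // Bound quoted in the description: |x| >= 22 yields +/- 1.
    static final double SATURATION_BOUND = 22.0;

    static double tanhSketch(double x) {
        // High-magnitude inputs are tested first so they return right away.
        // NaN fails this comparison and falls through to the general path.
        if (Math.abs(x) >= SATURATION_BOUND) {
            return Math.copySign(1.0, x);
        }
        // Stand-in for the computation used on the remaining input ranges.
        return StrictMath.tanh(x);
    }

    public static void main(String[] args) {
        double[] samples = {0.5, -2.0, 21.9, 22.0, -1000.0, Double.NaN};
        for (double x : samples) {
            System.out.println(x + " -> " + tanhSketch(x));
        }
    }
}
```

Placing the magnitude test first means inputs in the saturated range never reach the general path, which is the effect item 1 aims for.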
@TobiHartmann @vnkozlov OK to run this through the Oracle test framework before integration?
-------------
PR Comment: https://git.openjdk.org/jdk/pull/23889#issuecomment-2835422011