RFR: 8348638: Performance regression in Math.tanh [v2]

Jatin Bhateja jbhateja at openjdk.org
Wed Apr 2 13:51:07 UTC 2025


On Fri, 28 Mar 2025 00:18:41 GMT, Mohamed Issa <duke at openjdk.org> wrote:

>> The changes described below are meant to resolve the performance regression introduced by the **x86_64 tanh** double-precision floating-point scalar intrinsic in #20657.
>> 
>> 1. Check for and handle high-magnitude input values before those in other ranges. For such inputs, **+/- 1** is returned almost immediately, without going through the computations and branches used for the other ranges.
>> 2. Reduce the lower bound of the input range that triggers a quick **+/- 1** return from **|x| >= 32** to **|x| >= 22**. The new endpoint is the value required for correctness, and it matches the one used by the original OpenJDK implementation.
>> 
>> The results of all tests posted below were captured with an [Intel® Xeon 6761P](https://www.intel.com/content/www/us/en/products/sku/241842/intel-xeon-6761p-processor-336m-cache-2-50-ghz/specifications.html) using [OpenJDK v25-b15](https://github.com/openjdk/jdk/releases/tag/jdk-25%2B15) as the baseline version.
>> 
>> For performance data collected with the regression micro-benchmark referenced in the bug report, see the table below. Each result is the mean of 3 individual runs. In the high value scenarios (100, 1000, 10000, 100000), the changes bring execution times back to near parity with the baseline (within about 8% in every case, versus a 20-28% slowdown without the fix). There is also almost no impact on the low value (1, 2) scenarios.
>> 
>> | Input range (+/-) | Baseline (ms) | No fix (ms) | With fix (ms) | No fix vs baseline (%) | Fix vs baseline (%) |
>> | :---------------: | :-----------: | :---------: | :-----------: | :--------------------: | :-----------------: |
>> | 1                 | 1846          | 1925        | 1972          | +4.28                  | +6.83               |
>> | 2                 | 2099          | 1991        | 2016          | -5.15                  | -3.95               |
>> | 100               | 803           | 1007        | 742           | +25.40                 | -7.60               |
>> | 1000              | 497           | 635         | 514           | +27.77                 | +3.42               |
>> | 10000             | 474           | 572         | 477           | +20.68                 | +0.63               |
>> | 100000            | 473           | 567         | 474           | +19.87                 | +0.21               |
>> 
>> For perfo...
>
> Mohamed Issa has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Change tanh intrinsic endpoint comparison value to match reference OpenJDK implementation

Please add a micro benchmark for different value ranges.
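
The value ranges matter because the quoted change moves the bound of the quick **+/- 1** return. As a rough illustration of the two changes in the quoted description, here is a minimal plain-Java sketch of the check order. It is not the intrinsic itself, which is x86_64 stub code; the class name `TanhFastPathSketch`, the constant `SATURATION_BOUND`, and the use of `StrictMath.tanh` as a stand-in for the remaining ranges are assumptions made for this example only.

```java
// Illustrative sketch only: NOT the HotSpot intrinsic, which is generated x86_64 code.
// All names here are hypothetical and exist just for this example.
final class TanhFastPathSketch {
    // Bound taken from the quoted PR description (|x| >= 22).
    static final double SATURATION_BOUND = 22.0;

    static double tanh(double x) {
        // Change 1: test the high-magnitude range before any other range.
        // Change 2: the bound is 22 rather than 32.
        if (Math.abs(x) >= SATURATION_BOUND) {
            return Math.copySign(1.0, x); // also covers +/- infinity
        }
        // NaN fails the comparison above and falls through, as do all
        // smaller magnitudes. StrictMath.tanh stands in for the other ranges.
        return StrictMath.tanh(x);
    }

    public static void main(String[] args) {
        // Inputs on both sides of the bound, loosely following the table's ranges.
        double[] inputs = {1.0, 2.0, 21.9, 22.0, 100.0, -100000.0};
        for (double x : inputs) {
            System.out.println(x + " -> " + tanh(x)
                    + " (StrictMath: " + StrictMath.tanh(x) + ")");
        }
    }
}
```

Inputs just below and just above 22 are the interesting ones for a benchmark, since they land on opposite sides of the early return.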

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23889#issuecomment-2772623273


More information about the hotspot-compiler-dev mailing list