RFR: 8358179: Performance regression in Math.cbrt

Mohamed Issa missa at openjdk.org
Wed Jun 25 18:41:41 UTC 2025


The changes described below resolve the performance regression introduced by the **x86_64 cbrt** double-precision floating-point scalar intrinsic in #24470.

1. Check for +0, -0, +INF, -INF, and NaN before processing any other input values.
2. If these special values are found, return immediately with minimal modifications to the result register.
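Steps 1 and 2 above can be modelled in plain Java as follows. This is a sketch only, not the x86_64 assembly from the patch. The class name `CbrtFastPathSketch`, the helpers `isSpecial` and `cbrt`, and the constants `ABS_MASK` and `INF_BITS` are hypothetical. Using a sign-clearing absolute-value mask for the special-value test is an assumption suggested by the commit message about the absolute mask constant. The sketch relies on the documented `Math.cbrt` semantics: for ±0, ±INF, and NaN the result equals the input, sign included. That is why the fast path can return the input essentially unchanged. `Math.cbrt` merely stands in for the heavy-compute path.

```java
public class CbrtFastPathSketch {
    // Hypothetical stand-in for the intrinsic's absolute-value mask:
    // clears the IEEE 754 sign bit of a double.
    private static final long ABS_MASK = 0x7FFF_FFFF_FFFF_FFFFL;
    // Bit pattern of +INF: all exponent bits set, zero mantissa.
    private static final long INF_BITS = 0x7FF0_0000_0000_0000L;

    // True for +0, -0, +INF, -INF, and every NaN encoding.
    static boolean isSpecial(double x) {
        long abs = Double.doubleToRawLongBits(x) & ABS_MASK;
        // abs == 0 means a zero; abs >= INF_BITS means infinity or NaN.
        // abs is non-negative after masking, so a signed compare is safe.
        return abs == 0L || abs >= INF_BITS;
    }

    static double cbrt(double x) {
        if (isSpecial(x)) {
            // cbrt(x) == x for these inputs, so return the input as-is.
            return x;
        }
        // Stand-in for the heavy-compute path of the intrinsic.
        return Math.cbrt(x);
    }

    public static void main(String[] args) {
        double[] inputs = {0.0, -0.0, Double.POSITIVE_INFINITY,
                           Double.NEGATIVE_INFINITY, Double.NaN, 27.0, -8.0};
        for (double x : inputs) {
            System.out.println(x + " -> " + cbrt(x) + " (special: " + isSpecial(x) + ")");
        }
    }
}
```

In this model, the extra `isSpecial` test sits in front of the expensive path for every normal input. This mirrors why a small slowdown on the (**2^(-1022) <= |x| < INF**) range is plausible.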

The commands to run all relevant micro-benchmarks are posted below.

`make test TEST="micro:CbrtPerf.CbrtPerfRanges"`

`make test TEST="micro:CbrtPerf.CbrtPerfSpecialValues"`

All results below were captured on an [Intel® Xeon 8488C](https://www.intel.com/content/www/us/en/products/sku/231730/intel-xeon-platinum-8480c-processor-105m-cache-2-00-ghz/specifications.html) using [OpenJDK v26-b1](https://github.com/openjdk/jdk/releases/tag/jdk-26%2B1) as the baseline version. The term _baseline1_ refers to runs with the intrinsic enabled, and _baseline2_ refers to runs with the intrinsic disabled.

Each result is the mean of 8 individual runs, and the input ranges match those used for the original Java implementation. Overall, the changes provide a significant uplift over _baseline1_, except for a mild regression (-5.52%) in the (**2^(-1022) <= |x| < INF**) input range, which is expected given the extra checks on that path. Compared with _baseline2_, the modified intrinsic still significantly outperforms on the inputs (**-INF < x < INF**) that require heavy computation. However, the special-value inputs that take the fast-path return still perform better with _baseline2_.

| Input range(s)                       | Baseline1 (ops/ms) | With change (ops/ms) | Change vs baseline1 (%) |
| :----------------------------------: | :----------------: | :------------------: | :---------------------: |
| [-2^(-1022), 2^(-1022)]              | 18470              | 20847                | +12.87                  |
| (-INF, -2^(-1022)], [2^(-1022), INF) | 210538             | 198925               | -5.52                   |
| [0]                                  | 344990             | 627561               | +81.91                  |
| [-0]                                 | 291983             | 629941               | +115.75                 |
| [INF]                                | 382685             | 542211               | +41.68                  |
| [-INF]                               | 386174             | 542291               | +40.43                  |
| [NaN]                                | 421700             | 615157               | +45.88                  |

| Input range(s)                       | Baseline2 (ops/ms) | With change (ops/ms) | Change vs baseline2 (%) |
| :----------------------------------: | :----------------: | :------------------: | :---------------------: |
| [-2^(-1022), 2^(-1022)]              | 7072               | 20847                | +194.78                 |
| (-INF, -2^(-1022)], [2^(-1022), INF) | 147884             | 198925               | +34.51                  |
| [0]                                  | 1890520            | 627561               | -66.80                  |
| [-0]                                 | 1890404            | 629941               | -66.68                  |
| [INF]                                | 1247633            | 542211               | -56.54                  |
| [-INF]                               | 1242287            | 542291               | -56.35                  |
| [NaN]                                | 1253700            | 615157               | -50.93                  |
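The percentage columns are the relative throughput change, (changed - baseline) / baseline * 100, rounded to two decimals. As a sanity check, the hypothetical helper below (`ThroughputDelta.percentChange`) recomputes a few entries from the ops/ms values in the tables. It is illustrative arithmetic only and not part of the JMH benchmark tooling.

```java
public class ThroughputDelta {
    // Relative throughput change in percent; positive means the
    // changed intrinsic completes more operations per millisecond.
    static double percentChange(double baselineOpsPerMs, double changedOpsPerMs) {
        return (changedOpsPerMs - baselineOpsPerMs) / baselineOpsPerMs * 100.0;
    }

    public static void main(String[] args) {
        // Values copied from the tables above (ops/ms).
        System.out.printf("[-2^(-1022), 2^(-1022)] vs baseline1: %+.2f%%%n",
                percentChange(18470, 20847));
        System.out.printf("[-0] vs baseline1: %+.2f%%%n",
                percentChange(291983, 629941));
        System.out.printf("[0] vs baseline2: %+.2f%%%n",
                percentChange(1890520, 627561));
    }
}
```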

Finally, the `jtreg:test/jdk/java/lang/Math/CubeRootTests.java` test passed with the changes.

-------------

Commit messages:
 - Make absolute mask memory constant 16 byte aligned for compatibility with andpd instruction
 - Check for special values first in x86_64 cbrt intrinsic

Changes: https://git.openjdk.org/jdk/pull/25962/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25962&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8358179
  Stats: 49 lines in 1 file changed: 10 ins; 36 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/25962.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/25962/head:pull/25962

PR: https://git.openjdk.org/jdk/pull/25962

