RFR: 8358179: Performance regression in Math.cbrt [v2]

Yudi Zheng yzheng at openjdk.org
Wed Jul 2 11:28:47 UTC 2025


On Fri, 27 Jun 2025 01:43:16 GMT, Mohamed Issa <missa at openjdk.org> wrote:

>> The changes described below are meant to resolve the performance regression introduced by the **x86_64 cbrt** double precision floating point scalar intrinsic in #24470.
>> 
>> 1. Check for +0, -0, +INF, -INF, and NaN before any other input values.
>> 2. If these special values are found, return immediately with minimal modifications to the result register.
>> 3. Performance testing shows the modified intrinsic improves throughput by 65.1% over the original intrinsic on average for the special values while throughput drops by 5.5% for the normal value range (-INF, -2^(-1022)], [2^(-1022), INF).
>> 
>> The commands to run all relevant micro-benchmarks are posted below.
>> 
>> `make test TEST="micro:CbrtPerf.CbrtPerfRanges"`
>> `make test TEST="micro:CbrtPerf.CbrtPerfSpecialValues"`
>> 
>> The results of all tests posted below were captured with an [Intel® Xeon 8488C](https://www.intel.com/content/www/us/en/products/sku/231730/intel-xeon-platinum-8480c-processor-105m-cache-2-00-ghz/specifications.html) using [OpenJDK v26-b1](https://github.com/openjdk/jdk/releases/tag/jdk-26%2B1) as the baseline version. The term _baseline1_ refers to runs with the intrinsic enabled and _baseline2_ refers to runs with the intrinsic disabled.
>> 
>> Each result is the mean of 8 individual runs, and the input ranges used match those from the original Java implementation. Overall, the changes provide a significant uplift over _baseline1_ except for a mild regression in the (**2^(-1022) <= |x| < INF**) input range, which is expected due to the extra checks. When comparing against _baseline2_, the modified intrinsic still significantly outperforms it for the inputs (**-INF < x < INF**) that require heavy compute. However, the special value inputs that trigger fast path returns still perform better with _baseline2_.
>> 
>> | Input range(s)                       | Baseline1 (ops/ms) | Change (ops/ms) | Change vs baseline1 (%) |
>> | :----------------------------------: | :----------------: | :-------------: | :---------------------: |
>> | [-2^(-1022), 2^(-1022)]              | 18470              | 20847           | +12.87                  |
>> | (-INF, -2^(-1022)], [2^(-1022), INF) | 210538             | 198925          | -5.52                   |
>> | [0]                                  | 344990             | 627561          | +81.91                  |
>> | [-0]              ...
>
> Mohamed Issa has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Ensure ABS_MASK is a 128-bit memory sized location and only use equal enum for UCOMISD checks

src/hotspot/cpu/x86/stubGenerator_x86_64_cbrt.cpp line 350:

> 348: 
> 349:   __ bind(L_2TAG_PACKET_6_0_1);
> 350:   __ movsd(xmm0, ExternalAddress(NEG_INF), r11 /*rscratch*/);

Note that `NEG_INF` is now unused.
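
For readers following along, the special-value fast path described in the quoted PR summary (check for +0, -0, +INF, -INF, and NaN first, then return with the input essentially untouched) can be illustrated in plain Java. This is only a minimal sketch of the idea, not the HotSpot stub code: the class `CbrtFastPathSketch` and the method `cbrtWithFastPath` are hypothetical names, and `Math.cbrt` stands in for the heavy-compute path.

```java
// Hypothetical illustration only; the real intrinsic is generated assembly.
final class CbrtFastPathSketch {

    // Returns the cube root of x, handling special values before anything else.
    static double cbrtWithFastPath(double x) {
        // +0, -0, +INF, -INF and NaN are each their own cube root, so the
        // input can be returned as-is. Note that (x == 0.0) is also true
        // for -0.0, and returning x preserves the sign of zero.
        if (x == 0.0 || Double.isInfinite(x) || Double.isNaN(x)) {
            return x;
        }
        // Stand-in for the regular computation on all other inputs.
        return Math.cbrt(x);
    }

    public static void main(String[] args) {
        double[] specials = {
            0.0, -0.0,
            Double.POSITIVE_INFINITY, Double.NEGATIVE_INFINITY,
            Double.NaN
        };
        for (double v : specials) {
            System.out.println(v + " -> " + cbrtWithFastPath(v));
        }
    }
}
```

The trade-off matches the one reported in the summary: the extra comparison makes special values cheap, at the cost of one more check on every ordinary input.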

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25962#discussion_r2179808403

