RFR: 8358179: Performance regression in Math.cbrt
Sandhya Viswanathan
sviswanathan at openjdk.org
Thu Jun 26 22:56:40 UTC 2025
On Tue, 24 Jun 2025 22:33:56 GMT, Mohamed Issa <missa at openjdk.org> wrote:
> The changes described below are meant to resolve the performance regression introduced by the **x86_64 cbrt** double precision floating point scalar intrinsic in #24470.
>
> 1. Check for +0, -0, +INF, -INF, and NaN before any other input values.
> 2. If these special values are found, return immediately with minimal modifications to the result register.
>
> The commands to run all relevant micro-benchmarks are posted below.
>
> `make test TEST="micro:CbrtPerf.CbrtPerfRanges"`
> `make test TEST="micro:CbrtPerf.CbrtPerfSpecialValues"`
>
> The results of all tests posted below were captured with an [Intel® Xeon 8488C](https://www.intel.com/content/www/us/en/products/sku/231730/intel-xeon-platinum-8480c-processor-105m-cache-2-00-ghz/specifications.html) using [OpenJDK v26-b1](https://github.com/openjdk/jdk/releases/tag/jdk-26%2B1) as the baseline version. The term _baseline1_ refers to runs with the intrinsic enabled and _baseline2_ refers to runs with the intrinsic disabled.
>
> Each result is the mean of 8 individual runs, and the input ranges used match those from the original Java implementation. Overall, the changes provide a significant uplift over _baseline1_ except for a mild regression in the (**2^(-1022) <= |x| < INF**) input range, which is expected due to the extra checks. When comparing against _baseline2_, the modified intrinsic still significantly outperforms it for the inputs (**-INF < x < INF**) that require heavy compute. However, the special value inputs that trigger fast path returns still perform better with _baseline2_.
>
> | Input range(s) | Baseline1 (ops/ms) | Change (ops/ms) | Change vs baseline1 (%) |
> | :-------------------------------------: | :-------------------: | :------------------: | :--------------------------: |
> | [-2^(-1022), 2^(-1022)] | 18470 | 20847 | +12.87 |
> | (-INF, -2^(-1022)], [2^(-1022), INF) | 210538 | 198925 | -5.52 |
> | [0] | 344990 | 627561 | +81.91 |
> | [-0] | 291983 | 629941 | +115.75 |
> | [INF] | 382685 | 542211 | +41.68 |
> | [-INF...
src/hotspot/cpu/x86/stubGenerator_x86_64_cbrt.cpp line 51:
> 49: ATTRIBUTE_ALIGNED(16) static const juint _ABS_MASK[] =
> 50: {
> 51: 4294967295, 2147483647
This should be a 128-bit constant, since we are using it with andpd. Also, please add the hex value in a comment.
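
To illustrate the point, here is a minimal standalone sketch, not the HotSpot code. The two decimal words are 0xFFFFFFFF and 0x7FFFFFFF, which together form the 64-bit mask 0x7FFFFFFFFFFFFFFF that clears the sign bit of a double. Because andpd takes a 128-bit memory operand, a two-word table leaves the upper 8 bytes of the load outside the array. The names `ABS_MASK_128` and `abs_via_mask` are hypothetical, and the zero upper 64 bits are an assumption; the comment above does not say what they should hold.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the _ABS_MASK table, widened to 128 bits.
// Low 64 bits:  0x7FFFFFFFFFFFFFFF (words 0xFFFFFFFF, 0x7FFFFFFF = 4294967295, 2147483647).
// High 64 bits: assumed zero here; only the low lane matters for a scalar result.
alignas(16) static const uint32_t ABS_MASK_128[4] = {
    0xFFFFFFFFu, 0x7FFFFFFFu, 0x00000000u, 0x00000000u
};

// andpd reads a full 16-byte operand, so the table must be 16 bytes long.
static_assert(sizeof(ABS_MASK_128) == 16, "mask must cover the whole 128-bit operand");

// Models what andpd does to the low lane: AND the double's bit pattern with the mask.
double abs_via_mask(double x) {
    uint64_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    // Combine the two low words explicitly so the sketch does not depend on host endianness.
    const uint64_t mask = (static_cast<uint64_t>(ABS_MASK_128[1]) << 32) | ABS_MASK_128[0];
    bits &= mask;  // clears bit 63, the sign bit
    double result;
    std::memcpy(&result, &bits, sizeof result);
    return result;
}
```

Calling `abs_via_mask(-8.0)` yields 8.0, and `abs_via_mask(-0.0)` yields +0.0, since only the sign bit changes.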
src/hotspot/cpu/x86/stubGenerator_x86_64_cbrt.cpp line 217:
> 215: __ bind(B1_1);
> 216: __ ucomisd(xmm0, ExternalAddress(ZERON), r11 /*rscratch*/);
> 217: __ jcc(Assembler::zero, L_2TAG_PACKET_1_0_1); // Branch only if x is +/- zero or NaN
This could be Assembler::equal to be consistent.
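
As background for the comment above, a small sketch of the flag behaviour, written as a plain C++ model rather than HotSpot code. ucomisd sets ZF, PF, and CF to 1,1,1 for an unordered compare (either operand is NaN), 1,0,0 for equal, 0,0,1 for less than, and 0,0,0 for greater than. On x86, JZ and JE are two mnemonics for the same condition (ZF == 1), so choosing `equal` over `zero` should change readability only, not behaviour. The names `Flags`, `ucomisd_model`, and `branch_taken_*` are hypothetical, and the -0.0 comparand is an assumption based on the name ZERON.

```cpp
#include <cmath>

// Subset of EFLAGS written by ucomisd (illustrative struct, not a HotSpot type).
struct Flags { bool zf, pf, cf; };

// Models the flag results of ucomisd a, b.
Flags ucomisd_model(double a, double b) {
    if (std::isnan(a) || std::isnan(b)) return {true, true, true};  // unordered
    if (a > b) return {false, false, false};                        // greater than
    if (a < b) return {false, false, true};                         // less than
    return {true, false, false};                                    // equal (+0 == -0)
}

// Both conditions test the same flag, so they always agree.
bool branch_taken_zero(const Flags& f)  { return f.zf; }
bool branch_taken_equal(const Flags& f) { return f.zf; }
```

With -0.0 as the comparand, the branch is taken for +0.0, -0.0, and NaN, and not taken for values such as 1.0 or infinity, which matches the comment on the quoted jcc line.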
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/25962#discussion_r2170180412
PR Review Comment: https://git.openjdk.org/jdk/pull/25962#discussion_r2170181404