RFR: 8353686: Optimize Math.cbrt for x86 64 bit platforms [v3]
Sandhya Viswanathan
sviswanathan at openjdk.org
Tue May 27 23:43:53 UTC 2025
On Tue, 6 May 2025 21:45:34 GMT, Mohamed Issa <duke at openjdk.org> wrote:
>> The goal of this PR is to implement an x86_64 intrinsic for java.lang.Math.cbrt() using libm. A new set of micro-benchmarks is included to check the performance of specific input value ranges and help prevent future regressions.
>>
>> The command to run all range specific micro-benchmarks is posted below.
>>
>> `make test TEST="micro:CbrtPerf.CbrtPerfRanges"`
>>
>> The results of all tests posted below were captured with an [Intel® Xeon 6761P](https://www.intel.com/content/www/us/en/products/sku/241842/intel-xeon-6761p-processor-336m-cache-2-50-ghz/specifications.html) using [OpenJDK v25-b21](https://github.com/openjdk/jdk/releases/tag/jdk-25%2B21) as the baseline version.
>>
>> For performance data collected with the new built-in range micro-benchmark, see the table below. Each result is the mean of 8 individual runs, and the input ranges used match those from the original Java implementation. Overall, the intrinsic provides a major uplift of 169% when very small inputs are used and a more modest uplift of 45% for all other inputs.
>>
>> | Input range(s) | Throughput with baseline (op/s) | Throughput with intrinsic (op/s) | Speedup |
>> | :-------------------------------------: | :----------------------------------: | :----------------------------------: | :---------: |
>> | [-2^(-1022), 2^(-1022)] | 6568 | 17678 | 2.69x |
>> | (-INF, -2^(-1022)], [2^(-1022), INF) | 138932 | 200897 | 1.45x |
>>
>> Finally, the `jtreg:test/jdk/java/lang/Math/CubeRootTests.java` test passed with the changes.
>
> Mohamed Issa has updated the pull request incrementally with one additional commit since the last revision:
>
> Add new set of cbrt micro-benchmarks
src/hotspot/cpu/x86/macroAssembler_x86.hpp line 1251:
> 1249: void movapd(XMMRegister dst, Address src) { Assembler::movapd(dst, src); }
> 1250: void movapd(XMMRegister dst, AddressLiteral src, Register rscratch = noreg);
> 1251:
You could write it as:
using Assembler::movapd;
void movapd(XMMRegister dst, AddressLiteral src, Register rscratch = noreg);
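To illustrate why the using-declaration is enough, here is a minimal standalone sketch. BaseAsm and DerivedAsm are hypothetical stand-ins for Assembler and MacroAssembler, and the string return values exist only to show which overload was picked; none of this is HotSpot code.

```cpp
#include <string>

// Hypothetical stand-in for Assembler: two overloads of the same name.
struct BaseAsm {
  std::string mov(int /*dst*/, int /*src*/) { return "reg"; }
  std::string mov(int /*dst*/, const char* /*addr*/) { return "mem"; }
};

// Hypothetical stand-in for MacroAssembler. Declaring mov() here would
// normally hide every BaseAsm::mov overload, which is why one-line
// forwarding wrappers are otherwise needed.
struct DerivedAsm : BaseAsm {
  // Re-exposes all BaseAsm::mov overloads in this scope, so no
  // forwarding wrappers are required.
  using BaseAsm::mov;

  // The additional overload that only the derived class provides.
  std::string mov(int /*dst*/, double /*literal*/, int /*scratch*/ = -1) {
    return "literal";
  }
};
```

With the using-declaration in place, calls through DerivedAsm resolve across both the inherited overloads and the new one. Removing it would leave only the (int, double, int) overload visible in DerivedAsm.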
src/hotspot/cpu/x86/macroAssembler_x86.hpp line 1323:
> 1321: void unpckhpd(XMMRegister dst, XMMRegister src) { Assembler::unpckhpd(dst, src); }
> 1322: void unpcklpd(XMMRegister dst, XMMRegister src) { Assembler::unpcklpd(dst, src); }
> 1323:
Do we need these declarations here?
src/hotspot/cpu/x86/stubGenerator_x86_64_cbrt.cpp line 43:
> 41: //
> 42: // Special cases:
> 43: // cbrt(NaN) = quiet NaN, and raise invalid exception
No exception is raised, so the comment needs to be corrected.
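As a loose analogy only, the sketch below checks the same property for the C++ standard library's std::cbrt rather than for the stub itself. It assumes a platform whose libm follows IEEE 754 handling of quiet NaNs; the helper names are made up for this example.

```cpp
#include <cfenv>
#include <cmath>
#include <limits>

// Returns true if evaluating std::cbrt(x) sets the FE_INVALID flag.
// volatile keeps the compiler from folding the call away at compile time.
bool cbrt_raises_invalid(double x) {
  volatile double in = x;
  std::feclearexcept(FE_ALL_EXCEPT);
  volatile double out = std::cbrt(in);
  (void)out;
  return std::fetestexcept(FE_INVALID) != 0;
}

// Returns true if std::cbrt maps a quiet NaN input to a NaN result.
bool cbrt_of_quiet_nan_is_nan() {
  volatile double in = std::numeric_limits<double>::quiet_NaN();
  return std::isnan(std::cbrt(in));
}
```

On such a platform a quiet NaN input is expected to propagate to the result without setting the invalid flag, which is the behaviour the corrected comment should describe.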
src/hotspot/cpu/x86/stubGenerator_x86_64_cbrt.cpp line 226:
> 224: __ andl(rcx, 248);
> 225: __ lea(r8, ExternalAddress(rcp_table));
> 226: __ movsd(xmm4, Address(r8, rcx, Address::times_1));
This address, and the similar addresses used by other instructions, could be written as Address(rcx, r8, Address::times_1).
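The two forms name the same memory location because, with a scale of 1, base and index are interchangeable. A toy model of the arithmetic follows; Scale and effective_address are hypothetical names for this sketch, not HotSpot's Address class.

```cpp
#include <cstdint>

// Toy model of an x86 "base + index * scale" memory operand.
enum class Scale : std::uint64_t { times_1 = 1, times_2 = 2, times_4 = 4, times_8 = 8 };

constexpr std::uint64_t effective_address(std::uint64_t base,
                                          std::uint64_t index,
                                          Scale scale) {
  return base + index * static_cast<std::uint64_t>(scale);
}

// With times_1, swapping base and index yields the same address.
static_assert(effective_address(0x1000, 0x28, Scale::times_1) ==
              effective_address(0x28, 0x1000, Scale::times_1),
              "base and index commute when the scale is 1");

// With any larger scale the swap changes the result.
static_assert(effective_address(0x1000, 0x28, Scale::times_8) !=
              effective_address(0x28, 0x1000, Scale::times_8),
              "base and index do not commute when the scale is 8");
```

In the stub, rcx has already been masked with 248 and r8 holds the table address, so either ordering addresses the same table entry.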
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/24470#discussion_r2110406675
PR Review Comment: https://git.openjdk.org/jdk/pull/24470#discussion_r2110426188
PR Review Comment: https://git.openjdk.org/jdk/pull/24470#discussion_r2110536680
PR Review Comment: https://git.openjdk.org/jdk/pull/24470#discussion_r2110535561