RFR: 8265325: Optimize StubRoutines::dpow() for Math.pow(x, 0.5) [v4]
Nils Eliasson
neliasso at openjdk.java.net
Mon Apr 19 09:41:57 UTC 2021
On Fri, 16 Apr 2021 23:51:06 GMT, Jie Fu <jiefu at openjdk.org> wrote:
>> Hi all,
>>
>> I'd like to optimize StubRoutines::dpow() for Math.pow(x, 0.5).
>>
>> In the pow and sqrt discussion [1], Joe taught me that the Java library implementation of pow has been optimized for pow(x, 2.0) [2] and pow(x, 0.5) [3].
>> However, HotSpot's StubRoutines::dpow() only implements this optimization for pow(x, 2.0), not for pow(x, 0.5).
>> This patch optimizes StubRoutines::dpow() for pow(x, 0.5).
>>
>> Although not every Math.pow(x, 0.5) call can be replaced with sqrt(x), the replacement is safe in the following cases:
>> 1) x >= 0.0 (fully implemented)
>> 2) x is +Inf (fully implemented)
>> 3) x is NaN (NaN can be further divided into +NaN and -NaN; only +NaN is implemented)
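Why the list stops at x >= 0.0, +Inf and NaN: per the java.lang.Math specification, pow(-0.0, 0.5) is +0.0 and pow(-Infinity, 0.5) is +Infinity, whereas sqrt(-0.0) is -0.0 and sqrt(-Infinity) is NaN. Below is a minimal Java sketch of a guarded fast path, meant only to illustrate the idea; it is not the HotSpot stub, and `Pow0Dot5Sketch` and `powHalf` are hypothetical names. One subtlety: `-0.0 >= 0.0` evaluates to true in Java, so the sketch adds 0.0 before calling sqrt to turn -0.0 into +0.0.

```java
// Hypothetical illustration only: not the HotSpot StubRoutines::dpow() code or a JDK API.
class Pow0Dot5Sketch {

    // Intended to return the same value as Math.pow(x, 0.5), using sqrt where that is safe.
    static double powHalf(double x) {
        if (x >= 0.0) {
            // Covers +0.0, positive finite x and +Infinity. Because -0.0 >= 0.0 is also
            // true, add 0.0 so that -0.0 becomes +0.0: pow(-0.0, 0.5) is +0.0, but
            // sqrt(-0.0) would be -0.0.
            return Math.sqrt(x + 0.0);
        }
        // Negative x, -Infinity and NaN fall back to the general routine.
        return Math.pow(x, 0.5);
    }

    public static void main(String[] args) {
        double[] inputs = {4.0, 0.0, -0.0, Double.POSITIVE_INFINITY,
                           Double.NEGATIVE_INFINITY, -4.0, Double.NaN};
        for (double x : inputs) {
            System.out.printf("x=%s  pow=%s  sqrt=%s  powHalf=%s%n",
                    x, Math.pow(x, 0.5), Math.sqrt(x), powHalf(x));
        }
    }
}
```

For the special inputs (signed zeros, infinities, -4.0, NaN), Math.pow and Math.sqrt should disagree only at -0.0 and -Infinity, while powHalf should match Math.pow on all of them.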
>>
>> This optimization has been tested on several platforms, showing a 3.0x ~ 6.3x performance improvement.
>> No performance regression was observed.
>>
>> Testing:
>> - tier1 ~ tier3 on Linux/x64
>>
>> Thanks.
>> Best regards,
>> Jie
>>
>> [1] https://mail.openjdk.java.net/pipermail/core-libs-dev/2021-April/076220.html
>> [2] https://github.com/openjdk/jdk/blob/d84a7e55be40eae57b6c322694d55661a5053a55/src/java.base/share/classes/java/lang/FdLibm.java#L362
>> [3] https://github.com/openjdk/jdk/blob/d84a7e55be40eae57b6c322694d55661a5053a55/src/java.base/share/classes/java/lang/FdLibm.java#L364
>>
>> Detailed performance numbers:
>> * Linux/Intel
>>
>> --------- Before -----------
>> Benchmark                     (seed)   Mode  Cnt       Score       Error  Units
>> MathBench.powDouble                0  thrpt    8  218783.605 ±   838.379  ops/ms
>> MathBench.powDouble0Dot5           0  thrpt    8   45498.351 ±     7.558  ops/ms
>> MathBench.powDouble0Dot5Const      0  thrpt    8   45243.530 ±  1097.100  ops/ms
>> MathBench.powDouble0Dot5Loop       0  thrpt    8       0.031 ±     0.001  ops/ms
>> MathBench.powDoubleLoop            0  thrpt    8       0.031 ±     0.001  ops/ms
>> StrictMathBench.powDouble        N/A  thrpt    8  176106.602 ± 13127.650  ops/ms
>> ----------------------------
>>
>> --------- After -----------
>> Benchmark                     (seed)   Mode  Cnt       Score       Error  Units
>> MathBench.powDouble                0  thrpt    8  219930.462 ±   181.922  ops/ms
>> MathBench.powDouble0Dot5           0  thrpt    8  204966.834 ±   329.032  ops/ms  <-- 4.5x up
>> MathBench.powDouble0Dot5Const      0  thrpt    8  203004.302 ±   684.072  ops/ms
>> MathBench.powDouble0Dot5Loop       0  thrpt    8       0.121 ±     0.001  ops/ms  <-- 3.9x up
>> MathBench.powDoubleLoop            0  thrpt    8       0.031 ±     0.001  ops/ms
>> StrictMathBench.powDouble        N/A  thrpt    8  178818.861 ± 16235.465  ops/ms
>> ----------------------------
>>
>>
>> * Linux/AMD
>>
>> --------- Before -----------
>> Benchmark                     (seed)   Mode  Cnt       Score       Error  Units
>> MathBench.powDouble                0  thrpt    8  100741.348 ±   207.766  ops/ms
>> MathBench.powDouble0Dot5           0  thrpt    8   33896.623 ±   103.352  ops/ms
>> MathBench.powDouble0Dot5Const      0  thrpt    8   34195.944 ±   230.703  ops/ms
>> MathBench.powDouble0Dot5Loop       0  thrpt    8       0.039 ±     0.001  ops/ms
>> MathBench.powDoubleLoop            0  thrpt    8       0.038 ±     0.001  ops/ms
>> StrictMathBench.powDouble        N/A  thrpt    8   72000.166 ±   135.002  ops/ms
>> ----------------------------
>>
>> --------- After -----------
>> Benchmark                     (seed)   Mode  Cnt       Score       Error  Units
>> MathBench.powDouble                0  thrpt    8  100738.866 ±   222.820  ops/ms
>> MathBench.powDouble0Dot5           0  thrpt    8  100799.098 ±    95.537  ops/ms  <-- 3.0x up
>> MathBench.powDouble0Dot5Const      0  thrpt    8  100765.571 ±   178.436  ops/ms
>> MathBench.powDouble0Dot5Loop       0  thrpt    8       0.244 ±     0.002  ops/ms  <-- 6.3x up
>> MathBench.powDoubleLoop            0  thrpt    8       0.038 ±     0.001  ops/ms
>> StrictMathBench.powDouble        N/A  thrpt    8   71758.725 ±   339.660  ops/ms
>> ----------------------------
>>
>>
>> * MacOS/Intel
>>
>> --------- Before -----------
>> Benchmark                     (seed)   Mode  Cnt       Score       Error  Units
>> MathBench.powDouble                0  thrpt    8  238064.722 ±  5181.318  ops/ms
>> MathBench.powDouble0Dot5           0  thrpt    8   59235.979 ±  2046.519  ops/ms
>> MathBench.powDouble0Dot5Const      0  thrpt    8   59695.014 ±  1079.692  ops/ms
>> MathBench.powDouble0Dot5Loop       0  thrpt    8       0.040 ±     0.001  ops/ms
>> MathBench.powDoubleLoop            0  thrpt    8       0.041 ±     0.001  ops/ms
>> StrictMathBench.powDouble        N/A  thrpt    8  238391.026 ±  2743.385  ops/ms
>> ----------------------------
>>
>> --------- After -----------
>> Benchmark                     (seed)   Mode  Cnt       Score       Error  Units
>> MathBench.powDouble                0  thrpt    8  238582.414 ±  3661.261  ops/ms
>> MathBench.powDouble0Dot5           0  thrpt    8  224102.701 ±  2846.892  ops/ms  <-- 3.8x up
>> MathBench.powDouble0Dot5Const      0  thrpt    8  224542.331 ± 19027.596  ops/ms
>> MathBench.powDouble0Dot5Loop       0  thrpt    8       0.158 ±     0.002  ops/ms  <-- 4.0x up
>> MathBench.powDoubleLoop            0  thrpt    8       0.041 ±     0.001  ops/ms
>> StrictMathBench.powDouble        N/A  thrpt    8  233689.504 ± 10141.034  ops/ms
>> ----------------------------
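The "Nx up" annotations are the After score divided by the Before score for the same benchmark on the same machine. The short Java check below recomputes the six annotated rows from the scores in the tables; `SpeedupCheck` is just an illustrative name. Note that the Loop scores carry only two significant digits (e.g. 0.031 ± 0.001), so their ratios are coarse.

```java
// Recomputes the annotated speedups (after / before) from the reported JMH scores in ops/ms.
class SpeedupCheck {

    // Each row: {before score, after score, annotated speedup}
    static final double[][] ROWS = {
        {45498.351, 204966.834, 4.5}, // Linux/Intel  powDouble0Dot5
        {0.031, 0.121, 3.9},          // Linux/Intel  powDouble0Dot5Loop
        {33896.623, 100799.098, 3.0}, // Linux/AMD    powDouble0Dot5
        {0.039, 0.244, 6.3},          // Linux/AMD    powDouble0Dot5Loop
        {59235.979, 224102.701, 3.8}, // MacOS/Intel  powDouble0Dot5
        {0.040, 0.158, 4.0},          // MacOS/Intel  powDouble0Dot5Loop
    };

    static double speedup(double before, double after) {
        return after / before;
    }

    public static void main(String[] args) {
        for (double[] r : ROWS) {
            System.out.printf("computed %.3fx (annotated %.1fx)%n", speedup(r[0], r[1]), r[2]);
        }
    }
}
```

The computed ratios range from about 2.97x (Linux/AMD powDouble0Dot5) to about 6.26x (Linux/AMD powDouble0Dot5Loop), which round to the "3.0x ~ 6.3x" range quoted in the description.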
>
> Jie Fu has updated the pull request incrementally with one additional commit since the last revision:
>
> Revert TestPow0Dot5Opt.java change
Approved.
-------------
Marked as reviewed by neliasso (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/3536