RFR: 8265325: Optimize StubRoutines::dpow() for Math.pow(x, 0.5) [v3]
Jie Fu
jiefu at openjdk.java.net
Fri Apr 16 23:04:56 UTC 2021
> Hi all,
>
> I'd like to optimize the StubRoutines::dpow() for Math.pow(x, 0.5).
>
> In the pow and sqrt discussion [1], Joe taught me that the Java library implementation of pow has been optimized for pow(x, 2.0) [2] and pow(x, 0.5) [3].
> However, the HotSpot StubRoutines::dpow() implements this optimization only for pow(x, 2.0), not yet for pow(x, 0.5).
> This patch optimizes StubRoutines::dpow() for pow(x, 0.5).
>
> Although Math.pow(x, 0.5) cannot be replaced with sqrt(x) for every x, the replacement is safe in the following cases:
> 1) x >= 0.0 (fully implemented)
> 2) x is +Inf (fully implemented)
> 3) x is NaN (can be further divided into +NaN and -NaN; only +NaN is implemented)
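
The cases listed above can be illustrated with a small Java sketch. It is an illustration only, not the stub code from the patch: the class `PowHalfSketch` and the helpers `signBitClear` and `powHalf` are hypothetical names, and testing the sign bit is just one assumed way to select +0.0, positive finite values, +Inf and +NaN. Per the Math.pow and Math.sqrt specifications, the two functions differ for x = -0.0 (pow gives +0.0, sqrt gives -0.0) and x = -Inf (pow gives +Inf, sqrt gives NaN), which is why negative inputs fall back to pow here.

```java
final class PowHalfSketch {
    // Hypothetical helper: true when the sign bit of x is clear, i.e. x is
    // +0.0, a positive finite value, +Inf, or a NaN whose sign bit is clear.
    static boolean signBitClear(double x) {
        return Double.doubleToRawLongBits(x) >= 0L;
    }

    // Hypothetical wrapper: take the sqrt shortcut only when the sign bit is
    // clear; otherwise fall back to the general pow.
    static double powHalf(double x) {
        return signBitClear(x) ? Math.sqrt(x) : Math.pow(x, 0.5);
    }

    public static void main(String[] args) {
        double[] inputs = {
            0.0, 2.0, Double.POSITIVE_INFINITY, Double.NaN,
            -0.0, -2.0, Double.NEGATIVE_INFINITY
        };
        for (double x : inputs) {
            // Print pow, sqrt and the guarded wrapper side by side.
            System.out.printf("x=%-10s pow=%-10s sqrt=%-10s powHalf=%s%n",
                    x, Math.pow(x, 0.5), Math.sqrt(x), powHalf(x));
        }
    }
}
```

Running `main` prints the three results for each input, so the inputs where pow and sqrt disagree are easy to spot.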
>
> This optimization has been tested on several platforms, showing a 3.0x ~ 6.3x performance improvement.
> No performance drop was observed.
>
> Testing:
> - tier1 ~ tier3 on Linux/x64
>
> Thanks.
> Best regards,
> Jie
>
> [1] https://mail.openjdk.java.net/pipermail/core-libs-dev/2021-April/076220.html
> [2] https://github.com/openjdk/jdk/blob/d84a7e55be40eae57b6c322694d55661a5053a55/src/java.base/share/classes/java/lang/FdLibm.java#L362
> [3] https://github.com/openjdk/jdk/blob/d84a7e55be40eae57b6c322694d55661a5053a55/src/java.base/share/classes/java/lang/FdLibm.java#L364
>
> Detailed performance numbers:
> * Linux/Intel
>
> --------- Before -----------
> Benchmark                      (seed)   Mode  Cnt       Score        Error  Units
> MathBench.powDouble                 0  thrpt    8  218783.605  ±   838.379  ops/ms
> MathBench.powDouble0Dot5            0  thrpt    8   45498.351  ±     7.558  ops/ms
> MathBench.powDouble0Dot5Const       0  thrpt    8   45243.530  ±  1097.100  ops/ms
> MathBench.powDouble0Dot5Loop        0  thrpt    8       0.031  ±     0.001  ops/ms
> MathBench.powDoubleLoop             0  thrpt    8       0.031  ±     0.001  ops/ms
> StrictMathBench.powDouble         N/A  thrpt    8  176106.602  ± 13127.650  ops/ms
> ----------------------------
>
> --------- After -----------
> Benchmark                      (seed)   Mode  Cnt       Score        Error  Units
> MathBench.powDouble                 0  thrpt    8  219930.462  ±   181.922  ops/ms
> MathBench.powDouble0Dot5            0  thrpt    8  204966.834  ±   329.032  ops/ms  <-- 4.5x up
> MathBench.powDouble0Dot5Const       0  thrpt    8  203004.302  ±   684.072  ops/ms
> MathBench.powDouble0Dot5Loop        0  thrpt    8       0.121  ±     0.001  ops/ms  <-- 3.9x up
> MathBench.powDoubleLoop             0  thrpt    8       0.031  ±     0.001  ops/ms
> StrictMathBench.powDouble         N/A  thrpt    8  178818.861  ± 16235.465  ops/ms
> ----------------------------
>
>
> * Linux/AMD
>
> --------- Before -----------
> Benchmark                      (seed)   Mode  Cnt       Score        Error  Units
> MathBench.powDouble                 0  thrpt    8  100741.348  ±   207.766  ops/ms
> MathBench.powDouble0Dot5            0  thrpt    8   33896.623  ±   103.352  ops/ms
> MathBench.powDouble0Dot5Const       0  thrpt    8   34195.944  ±   230.703  ops/ms
> MathBench.powDouble0Dot5Loop        0  thrpt    8       0.039  ±     0.001  ops/ms
> MathBench.powDoubleLoop             0  thrpt    8       0.038  ±     0.001  ops/ms
> StrictMathBench.powDouble         N/A  thrpt    8   72000.166  ±   135.002  ops/ms
> ----------------------------
>
> --------- After -----------
> Benchmark                      (seed)   Mode  Cnt       Score        Error  Units
> MathBench.powDouble                 0  thrpt    8  100738.866  ±   222.820  ops/ms
> MathBench.powDouble0Dot5            0  thrpt    8  100799.098  ±    95.537  ops/ms  <-- 3.0x up
> MathBench.powDouble0Dot5Const       0  thrpt    8  100765.571  ±   178.436  ops/ms
> MathBench.powDouble0Dot5Loop        0  thrpt    8       0.244  ±     0.002  ops/ms  <-- 6.3x up
> MathBench.powDoubleLoop             0  thrpt    8       0.038  ±     0.001  ops/ms
> StrictMathBench.powDouble         N/A  thrpt    8   71758.725  ±   339.660  ops/ms
> ----------------------------
>
>
> * MacOS/Intel
>
> --------- Before -----------
> Benchmark                      (seed)   Mode  Cnt       Score        Error  Units
> MathBench.powDouble                 0  thrpt    8  238064.722  ±  5181.318  ops/ms
> MathBench.powDouble0Dot5            0  thrpt    8   59235.979  ±  2046.519  ops/ms
> MathBench.powDouble0Dot5Const       0  thrpt    8   59695.014  ±  1079.692  ops/ms
> MathBench.powDouble0Dot5Loop        0  thrpt    8       0.040  ±     0.001  ops/ms
> MathBench.powDoubleLoop             0  thrpt    8       0.041  ±     0.001  ops/ms
> StrictMathBench.powDouble         N/A  thrpt    8  238391.026  ±  2743.385  ops/ms
> ----------------------------
>
> --------- After -----------
> Benchmark                      (seed)   Mode  Cnt       Score        Error  Units
> MathBench.powDouble                 0  thrpt    8  238582.414  ±  3661.261  ops/ms
> MathBench.powDouble0Dot5            0  thrpt    8  224102.701  ±  2846.892  ops/ms  <-- 3.8x up
> MathBench.powDouble0Dot5Const       0  thrpt    8  224542.331  ± 19027.596  ops/ms
> MathBench.powDouble0Dot5Loop        0  thrpt    8       0.158  ±     0.002  ops/ms  <-- 4.0x up
> MathBench.powDoubleLoop             0  thrpt    8       0.041  ±     0.001  ops/ms
> StrictMathBench.powDouble         N/A  thrpt    8  233689.504  ± 10141.034  ops/ms
> ----------------------------
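
The speedup annotations in the tables above are the After score divided by the Before score for the same benchmark. The following sketch recomputes them from the quoted numbers; the class `SpeedupSketch` and the method `speedup` are hypothetical names used only for this illustration, and the error margins are ignored.

```java
final class SpeedupSketch {
    // Ratio of the "After" throughput to the "Before" throughput (both ops/ms).
    static double speedup(double before, double after) {
        return after / before;
    }

    public static void main(String[] args) {
        // Labels and {before, after} scores copied from the tables above.
        String[] labels = {
            "Linux/Intel powDouble0Dot5",
            "Linux/Intel powDouble0Dot5Loop",
            "Linux/AMD   powDouble0Dot5",
            "Linux/AMD   powDouble0Dot5Loop",
            "MacOS/Intel powDouble0Dot5",
            "MacOS/Intel powDouble0Dot5Loop"
        };
        double[][] scores = {
            {45498.351, 204966.834},
            {0.031, 0.121},
            {33896.623, 100799.098},
            {0.039, 0.244},
            {59235.979, 224102.701},
            {0.040, 0.158}
        };
        for (int i = 0; i < labels.length; i++) {
            System.out.printf("%-32s %.2fx%n",
                    labels[i], speedup(scores[i][0], scores[i][1]));
        }
    }
}
```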
Jie Fu has updated the pull request incrementally with one additional commit since the last revision:
Fix tests
-------------
Changes:
- all: https://git.openjdk.java.net/jdk/pull/3536/files
- new: https://git.openjdk.java.net/jdk/pull/3536/files/a97cb957..dc194975
Webrevs:
- full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3536&range=02
- incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3536&range=01-02
Stats: 5 lines in 2 files changed: 3 ins; 0 del; 2 mod
Patch: https://git.openjdk.java.net/jdk/pull/3536.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/3536/head:pull/3536
PR: https://git.openjdk.java.net/jdk/pull/3536