RFR: 8265325: Optimize StubRoutines::dpow() for Math.pow(x, 0.5)
Jie Fu
jiefu at openjdk.java.net
Fri Apr 16 06:58:00 UTC 2021
Hi all,
I'd like to optimize StubRoutines::dpow() for Math.pow(x, 0.5).
In the pow and sqrt discussion [1], Joe taught me that the Java library implementation of pow has been optimized for pow(x, 2.0) [2] and pow(x, 0.5) [3].
However, the HotSpot StubRoutines::dpow() implements this optimization only for pow(x, 2.0), not yet for pow(x, 0.5).
This patch optimizes StubRoutines::dpow() for pow(x, 0.5).
Although Math.pow(x, 0.5) cannot be replaced with sqrt(x) for every x, the replacement is safe in the following cases:
1) x >= 0.0 (fully implemented)
2) x is +Inf (fully implemented)
3) x is NaN (this splits further into +NaN and -NaN; only +NaN is implemented)
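To illustrate why the replacement is not valid for every x: per the Math.pow specification, pow(-0.0, 0.5) is +0.0 and pow(-Infinity, 0.5) is +Infinity, whereas Math.sqrt returns -0.0 and NaN for those inputs. Below is a minimal Java-level sketch of a guarded fast path. It is not the stub code from this patch: the class name and the powHalf helper are hypothetical, and adding 0.0 to normalize -0.0 is an assumption modeled on the library code referenced in [3].

```java
public class PowHalfSketch {
    /**
     * Hypothetical Java-level fast path for pow(x, 0.5).
     * Not the HotSpot stub; only an illustration of the guard.
     */
    static double powHalf(double x) {
        // True for +0.0, -0.0, positive finite values and +Infinity;
        // false for negative values, -Infinity and NaN.
        if (x >= 0.0) {
            // Assumption: x + 0.0 turns -0.0 into +0.0 so the result
            // matches pow(-0.0, 0.5) == +0.0 instead of sqrt(-0.0) == -0.0.
            return Math.sqrt(x + 0.0);
        }
        // Everything else takes the general path.
        return Math.pow(x, 0.5);
    }

    public static void main(String[] args) {
        double[] inputs = {
            4.0, 0.0, -0.0,
            Double.POSITIVE_INFINITY, Double.NEGATIVE_INFINITY,
            -1.0, Double.NaN
        };
        for (double x : inputs) {
            System.out.printf("x=%s pow=%s sqrt=%s powHalf=%s%n",
                    x, Math.pow(x, 0.5), Math.sqrt(x), powHalf(x));
        }
    }
}
```

Running main prints pow, sqrt and the guarded result side by side; the rows for -0.0 and -Infinity are the ones where pow and sqrt disagree.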
The effect of this optimization has been tested on several platforms, showing a 3.0x to 6.3x performance improvement for Math.pow(x, 0.5).
No performance drop was observed in the other benchmarks.
Testing:
- tier1 ~ tier3 on Linux/x64
Thanks.
Best regards,
Jie
[1] https://mail.openjdk.java.net/pipermail/core-libs-dev/2021-April/076220.html
[2] https://github.com/openjdk/jdk/blob/d84a7e55be40eae57b6c322694d55661a5053a55/src/java.base/share/classes/java/lang/FdLibm.java#L362
[3] https://github.com/openjdk/jdk/blob/d84a7e55be40eae57b6c322694d55661a5053a55/src/java.base/share/classes/java/lang/FdLibm.java#L364
Detailed performance numbers:
* Linux/Intel
--------- Before -----------
Benchmark                      (seed)   Mode  Cnt       Score       Error  Units
MathBench.powDouble                 0  thrpt    8  218783.605 ±   838.379  ops/ms
MathBench.powDouble0Dot5            0  thrpt    8   45498.351 ±     7.558  ops/ms
MathBench.powDouble0Dot5Const       0  thrpt    8   45243.530 ±  1097.100  ops/ms
MathBench.powDouble0Dot5Loop        0  thrpt    8       0.031 ±     0.001  ops/ms
MathBench.powDoubleLoop             0  thrpt    8       0.031 ±     0.001  ops/ms
StrictMathBench.powDouble         N/A  thrpt    8  176106.602 ± 13127.650  ops/ms
----------------------------
--------- After -----------
Benchmark                      (seed)   Mode  Cnt       Score       Error  Units
MathBench.powDouble                 0  thrpt    8  219930.462 ±   181.922  ops/ms
MathBench.powDouble0Dot5            0  thrpt    8  204966.834 ±   329.032  ops/ms  <-- 4.5x up
MathBench.powDouble0Dot5Const       0  thrpt    8  203004.302 ±   684.072  ops/ms
MathBench.powDouble0Dot5Loop        0  thrpt    8       0.121 ±     0.001  ops/ms  <-- 3.9x up
MathBench.powDoubleLoop             0  thrpt    8       0.031 ±     0.001  ops/ms
StrictMathBench.powDouble         N/A  thrpt    8  178818.861 ± 16235.465  ops/ms
----------------------------
* Linux/AMD
--------- Before -----------
Benchmark                      (seed)   Mode  Cnt       Score       Error  Units
MathBench.powDouble                 0  thrpt    8  100741.348 ±   207.766  ops/ms
MathBench.powDouble0Dot5            0  thrpt    8   33896.623 ±   103.352  ops/ms
MathBench.powDouble0Dot5Const       0  thrpt    8   34195.944 ±   230.703  ops/ms
MathBench.powDouble0Dot5Loop        0  thrpt    8       0.039 ±     0.001  ops/ms
MathBench.powDoubleLoop             0  thrpt    8       0.038 ±     0.001  ops/ms
StrictMathBench.powDouble         N/A  thrpt    8   72000.166 ±   135.002  ops/ms
----------------------------
--------- After -----------
Benchmark                      (seed)   Mode  Cnt       Score       Error  Units
MathBench.powDouble                 0  thrpt    8  100738.866 ±   222.820  ops/ms
MathBench.powDouble0Dot5            0  thrpt    8  100799.098 ±    95.537  ops/ms  <-- 3.0x up
MathBench.powDouble0Dot5Const       0  thrpt    8  100765.571 ±   178.436  ops/ms
MathBench.powDouble0Dot5Loop        0  thrpt    8       0.244 ±     0.002  ops/ms  <-- 6.3x up
MathBench.powDoubleLoop             0  thrpt    8       0.038 ±     0.001  ops/ms
StrictMathBench.powDouble         N/A  thrpt    8   71758.725 ±   339.660  ops/ms
----------------------------
* MacOS/Intel
--------- Before -----------
Benchmark                      (seed)   Mode  Cnt       Score       Error  Units
MathBench.powDouble                 0  thrpt    8  238064.722 ±  5181.318  ops/ms
MathBench.powDouble0Dot5            0  thrpt    8   59235.979 ±  2046.519  ops/ms
MathBench.powDouble0Dot5Const       0  thrpt    8   59695.014 ±  1079.692  ops/ms
MathBench.powDouble0Dot5Loop        0  thrpt    8       0.040 ±     0.001  ops/ms
MathBench.powDoubleLoop             0  thrpt    8       0.041 ±     0.001  ops/ms
StrictMathBench.powDouble         N/A  thrpt    8  238391.026 ±  2743.385  ops/ms
----------------------------
--------- After -----------
Benchmark                      (seed)   Mode  Cnt       Score       Error  Units
MathBench.powDouble                 0  thrpt    8  238582.414 ±  3661.261  ops/ms
MathBench.powDouble0Dot5            0  thrpt    8  224102.701 ±  2846.892  ops/ms  <-- 3.8x up
MathBench.powDouble0Dot5Const       0  thrpt    8  224542.331 ± 19027.596  ops/ms
MathBench.powDouble0Dot5Loop        0  thrpt    8       0.158 ±     0.002  ops/ms  <-- 4.0x up
MathBench.powDoubleLoop             0  thrpt    8       0.041 ±     0.001  ops/ms
StrictMathBench.powDouble         N/A  thrpt    8  233689.504 ± 10141.034  ops/ms
----------------------------
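The "Nx up" annotations in the tables above are the After score divided by the Before score. As a small sketch for re-deriving them, the snippet below recomputes the ratios from the reported powDouble0Dot5 and powDouble0Dot5Loop scores; the class and method names are hypothetical, and the numbers are copied from the tables.

```java
public class SpeedupSketch {
    /** Throughput ratio: how many times faster "after" is than "before". */
    static double speedup(double beforeOpsPerMs, double afterOpsPerMs) {
        return afterOpsPerMs / beforeOpsPerMs;
    }

    public static void main(String[] args) {
        String[] labels = {
            "Linux/Intel powDouble0Dot5",
            "Linux/Intel powDouble0Dot5Loop",
            "Linux/AMD   powDouble0Dot5",
            "Linux/AMD   powDouble0Dot5Loop",
            "MacOS/Intel powDouble0Dot5",
            "MacOS/Intel powDouble0Dot5Loop",
        };
        // {before, after} scores in ops/ms, copied from the tables.
        double[][] scores = {
            {45498.351, 204966.834},
            {0.031, 0.121},
            {33896.623, 100799.098},
            {0.039, 0.244},
            {59235.979, 224102.701},
            {0.040, 0.158},
        };
        for (int i = 0; i < labels.length; i++) {
            System.out.printf("%-32s %.2fx%n",
                    labels[i], speedup(scores[i][0], scores[i][1]));
        }
    }
}
```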
-------------
Commit messages:
- 8265325: Optimize StubRoutines::dpow() for Math.pow(x, 0.5)
Changes: https://git.openjdk.java.net/jdk/pull/3536/files
Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=3536&range=00
Issue: https://bugs.openjdk.java.net/browse/JDK-8265325
Stats: 144 lines in 3 files changed: 142 ins; 0 del; 2 mod
Patch: https://git.openjdk.java.net/jdk/pull/3536.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/3536/head:pull/3536
PR: https://git.openjdk.java.net/jdk/pull/3536