RFR: 8265325: Optimize StubRoutines::dpow() for Math.pow(x, 0.5)

Jie Fu jiefu at openjdk.java.net
Fri Apr 16 06:58:00 UTC 2021


Hi all,

I'd like to optimize StubRoutines::dpow() for Math.pow(x, 0.5).

In the pow and sqrt discussion [1], Joe pointed out that the Java library implementation of pow already special-cases pow(x, 2.0) [2] and pow(x, 0.5) [3].
However, HotSpot's StubRoutines::dpow() implements the same optimization only for pow(x, 2.0), not yet for pow(x, 0.5).
This patch optimizes StubRoutines::dpow() for pow(x, 0.5).

Although not every Math.pow(x, 0.5) call can be replaced with sqrt(x) (for example, pow(-0.0, 0.5) is +0.0 whereas sqrt(-0.0) is -0.0, and pow(-Inf, 0.5) is +Inf whereas sqrt(-Inf) is NaN), the replacement is safe in the following cases:
  1) x >= 0.0    (fully implemented)
  2) x is +Inf   (fully implemented)
  3) x is NaN    (NaNs can be further divided into +NaN and -NaN; only +NaN is implemented)

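The case analysis above can be illustrated with a small Java sketch of the decision logic. It is not the HotSpot stub code, and the names PowHalfSketch, sqrtIsSafe and powHalf are hypothetical. As an assumption, "x >= 0.0" is read here as a sign-bit test (consistent with the +NaN/-NaN split in case 3), so +0.0, positive values, +Inf and +NaN take the sqrt path, while -0.0, -Inf, negative values and -NaN fall back to the general pow.

```java
// Hypothetical sketch of the sqrt fast path for pow(x, 0.5); not HotSpot's StubRoutines::dpow().
final class PowHalfSketch {

    // True when the sign bit of x is clear: +0.0, positive finite values,
    // +Infinity, and NaNs whose sign bit is 0 (cases 1-3 in the list).
    // -0.0 is excluded on purpose, even though (-0.0 >= 0.0) evaluates to true.
    static boolean sqrtIsSafe(double x) {
        return Double.doubleToRawLongBits(x) >= 0L;
    }

    // pow(x, 0.5) with a sqrt fast path for the inputs where both specs agree.
    static double powHalf(double x) {
        if (sqrtIsSafe(x)) {
            return Math.sqrt(x);     // fast path
        }
        return Math.pow(x, 0.5);     // -0.0, -Inf, negative x and -NaN use the general pow
    }

    public static void main(String[] args) {
        // Why the sign matters: Math.pow and Math.sqrt are specified differently here.
        System.out.println(Math.pow(-0.0, 0.5) + " vs " + Math.sqrt(-0.0));   // 0.0 vs -0.0
        System.out.println(Math.pow(Double.NEGATIVE_INFINITY, 0.5) + " vs "
                + Math.sqrt(Double.NEGATIVE_INFINITY));                        // Infinity vs NaN
        System.out.println(powHalf(2.0) + " " + powHalf(-0.0) + " " + powHalf(Double.NaN));
    }
}
```

Testing the sign bit rather than writing x >= 0.0 matters because -0.0 >= 0.0 is true in Java, yet sqrt(-0.0) returns -0.0 where pow(-0.0, 0.5) must return +0.0.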
This optimization has been measured on several platforms, showing a 3.0x ~ 6.3x performance improvement.
No performance regression was observed.

Testing:
  - tier1 ~ tier3 on Linux/x64

Thanks.
Best regards,
Jie

[1] https://mail.openjdk.java.net/pipermail/core-libs-dev/2021-April/076220.html
[2] https://github.com/openjdk/jdk/blob/d84a7e55be40eae57b6c322694d55661a5053a55/src/java.base/share/classes/java/lang/FdLibm.java#L362
[3] https://github.com/openjdk/jdk/blob/d84a7e55be40eae57b6c322694d55661a5053a55/src/java.base/share/classes/java/lang/FdLibm.java#L364

Detailed performance numbers:
* Linux/Intel

--------- Before -----------
Benchmark                      (seed)   Mode  Cnt       Score       Error   Units
MathBench.powDouble                 0  thrpt    8  218783.605 ±   838.379  ops/ms
MathBench.powDouble0Dot5            0  thrpt    8   45498.351 ±     7.558  ops/ms
MathBench.powDouble0Dot5Const       0  thrpt    8   45243.530 ±  1097.100  ops/ms
MathBench.powDouble0Dot5Loop        0  thrpt    8       0.031 ±     0.001  ops/ms
MathBench.powDoubleLoop             0  thrpt    8       0.031 ±     0.001  ops/ms
StrictMathBench.powDouble         N/A  thrpt    8  176106.602 ± 13127.650  ops/ms
----------------------------

--------- After -----------
Benchmark                      (seed)   Mode  Cnt       Score       Error   Units
MathBench.powDouble                 0  thrpt    8  219930.462 ±   181.922  ops/ms
MathBench.powDouble0Dot5            0  thrpt    8  204966.834 ±   329.032  ops/ms   <-- 4.5x up
MathBench.powDouble0Dot5Const       0  thrpt    8  203004.302 ±   684.072  ops/ms
MathBench.powDouble0Dot5Loop        0  thrpt    8       0.121 ±     0.001  ops/ms   <-- 3.9x up
MathBench.powDoubleLoop             0  thrpt    8       0.031 ±     0.001  ops/ms
StrictMathBench.powDouble         N/A  thrpt    8  178818.861 ± 16235.465  ops/ms
----------------------------


* Linux/AMD

--------- Before -----------
Benchmark                      (seed)   Mode  Cnt       Score     Error   Units
MathBench.powDouble                 0  thrpt    8  100741.348 ± 207.766  ops/ms
MathBench.powDouble0Dot5            0  thrpt    8   33896.623 ± 103.352  ops/ms
MathBench.powDouble0Dot5Const       0  thrpt    8   34195.944 ± 230.703  ops/ms
MathBench.powDouble0Dot5Loop        0  thrpt    8       0.039 ±   0.001  ops/ms
MathBench.powDoubleLoop             0  thrpt    8       0.038 ±   0.001  ops/ms
StrictMathBench.powDouble         N/A  thrpt    8   72000.166 ± 135.002  ops/ms
----------------------------

--------- After -----------
Benchmark                      (seed)   Mode  Cnt       Score     Error   Units
MathBench.powDouble                 0  thrpt    8  100738.866 ± 222.820  ops/ms
MathBench.powDouble0Dot5            0  thrpt    8  100799.098 ±  95.537  ops/ms   <-- 3.0x up
MathBench.powDouble0Dot5Const       0  thrpt    8  100765.571 ± 178.436  ops/ms
MathBench.powDouble0Dot5Loop        0  thrpt    8       0.244 ±   0.002  ops/ms   <-- 6.3x up
MathBench.powDoubleLoop             0  thrpt    8       0.038 ±   0.001  ops/ms
StrictMathBench.powDouble         N/A  thrpt    8   71758.725 ± 339.660  ops/ms
----------------------------


* MacOS/Intel

--------- Before -----------
Benchmark                      (seed)   Mode  Cnt       Score      Error   Units
MathBench.powDouble                 0  thrpt    8  238064.722 ± 5181.318  ops/ms
MathBench.powDouble0Dot5            0  thrpt    8   59235.979 ± 2046.519  ops/ms
MathBench.powDouble0Dot5Const       0  thrpt    8   59695.014 ± 1079.692  ops/ms
MathBench.powDouble0Dot5Loop        0  thrpt    8       0.040 ±    0.001  ops/ms
MathBench.powDoubleLoop             0  thrpt    8       0.041 ±    0.001  ops/ms
StrictMathBench.powDouble         N/A  thrpt    8  238391.026 ± 2743.385  ops/ms
----------------------------

--------- After -----------
Benchmark                      (seed)   Mode  Cnt       Score       Error   Units
MathBench.powDouble                 0  thrpt    8  238582.414 ±  3661.261  ops/ms
MathBench.powDouble0Dot5            0  thrpt    8  224102.701 ±  2846.892  ops/ms   <-- 3.8x up
MathBench.powDouble0Dot5Const       0  thrpt    8  224542.331 ± 19027.596  ops/ms
MathBench.powDouble0Dot5Loop        0  thrpt    8       0.158 ±     0.002  ops/ms   <-- 4.0x up
MathBench.powDoubleLoop             0  thrpt    8       0.041 ±     0.001  ops/ms
StrictMathBench.powDouble         N/A  thrpt    8  233689.504 ± 10141.034  ops/ms
----------------------------
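The "<-- N.Nx up" annotations in the tables above are after/before ratios of the Score column (throughput, so higher is better). A small Java sketch that recomputes them from the reported numbers, which also shows where the 3.0x ~ 6.3x range in the summary comes from; the class and field names are hypothetical and the values are copied from the tables.

```java
// Hypothetical helper: recomputes the speedup annotations from the reported Score values (ops/ms).
final class SpeedupCheck {

    static final String[] LABELS = {
        "Linux/Intel powDouble0Dot5",
        "Linux/Intel powDouble0Dot5Loop",
        "Linux/AMD   powDouble0Dot5",
        "Linux/AMD   powDouble0Dot5Loop",
        "MacOS/Intel powDouble0Dot5",
        "MacOS/Intel powDouble0Dot5Loop",
    };

    // {before, after} throughput pairs copied from the benchmark tables.
    static final double[][] SCORES = {
        {45498.351, 204966.834},
        {0.031, 0.121},
        {33896.623, 100799.098},
        {0.039, 0.244},
        {59235.979, 224102.701},
        {0.040, 0.158},
    };

    // For a throughput metric, speedup = after / before.
    static double speedup(double before, double after) {
        return after / before;
    }

    public static void main(String[] args) {
        for (int i = 0; i < LABELS.length; i++) {
            System.out.printf("%s: %.2fx%n", LABELS[i], speedup(SCORES[i][0], SCORES[i][1]));
        }
    }
}
```

Because the Loop scores are reported with only two or three significant digits, their ratios (for example 0.158 / 0.040) are accurate to only a few percent.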

-------------

Commit messages:
 - 8265325: Optimize StubRoutines::dpow() for Math.pow(x, 0.5)

Changes: https://git.openjdk.java.net/jdk/pull/3536/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=3536&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8265325
  Stats: 144 lines in 3 files changed: 142 ins; 0 del; 2 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3536.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3536/head:pull/3536

PR: https://git.openjdk.java.net/jdk/pull/3536


More information about the hotspot-compiler-dev mailing list