RFR: 8265325: Optimize StubRoutines::dpow() for Math.pow(x, 0.5) [v2]

Jie Fu jiefu at openjdk.java.net
Fri Apr 16 07:30:11 UTC 2021


> Hi all,
> 
> I'd like to optimize StubRoutines::dpow() for Math.pow(x, 0.5).
> 
> In the pow and sqrt discussion [1], Joe taught me that the Java library implementation of pow has been optimized for pow(x, 2.0) [2] and pow(x, 0.5) [3].
> However, the HotSpot StubRoutines::dpow() implements this optimization only for pow(x, 2.0), not yet for pow(x, 0.5).
> This patch adds the pow(x, 0.5) optimization to StubRoutines::dpow().
> 
> Although not every Math.pow(x, 0.5) call can be replaced with sqrt(x), the replacement is safe in the following cases:
>   1) x >= 0.0    (fully implemented)
>   2) x is +Inf   (fully implemented)
>   3) x is NaN    (NaN can be further divided into +NaN and -NaN; only +NaN is implemented)
> 
> This optimization has been tested on several platforms and shows a 3.0x ~ 6.3x performance improvement.
> No performance drop was observed.
> 
> Testing:
>   - tier1 ~ tier3 on Linux/x64
> 
> Thanks.
> Best regards,
> Jie
> 
> [1] https://mail.openjdk.java.net/pipermail/core-libs-dev/2021-April/076220.html
> [2] https://github.com/openjdk/jdk/blob/d84a7e55be40eae57b6c322694d55661a5053a55/src/java.base/share/classes/java/lang/FdLibm.java#L362
> [3] https://github.com/openjdk/jdk/blob/d84a7e55be40eae57b6c322694d55661a5053a55/src/java.base/share/classes/java/lang/FdLibm.java#L364
> 
> Detailed performance numbers:
> * Linux/Intel
> 
> --------- Before -----------
> Benchmark                      (seed)   Mode  Cnt       Score       Error   Units
> MathBench.powDouble                 0  thrpt    8  218783.605 ±   838.379  ops/ms
> MathBench.powDouble0Dot5            0  thrpt    8   45498.351 ±     7.558  ops/ms
> MathBench.powDouble0Dot5Const       0  thrpt    8   45243.530 ±  1097.100  ops/ms
> MathBench.powDouble0Dot5Loop        0  thrpt    8       0.031 ±     0.001  ops/ms
> MathBench.powDoubleLoop             0  thrpt    8       0.031 ±     0.001  ops/ms
> StrictMathBench.powDouble         N/A  thrpt    8  176106.602 ± 13127.650  ops/ms
> ----------------------------
> 
> --------- After -----------
> Benchmark                      (seed)   Mode  Cnt       Score       Error   Units
> MathBench.powDouble                 0  thrpt    8  219930.462 ±   181.922  ops/ms
> MathBench.powDouble0Dot5            0  thrpt    8  204966.834 ±   329.032  ops/ms   <-- 4.5x up
> MathBench.powDouble0Dot5Const       0  thrpt    8  203004.302 ±   684.072  ops/ms
> MathBench.powDouble0Dot5Loop        0  thrpt    8       0.121 ±     0.001  ops/ms   <-- 3.9x up
> MathBench.powDoubleLoop             0  thrpt    8       0.031 ±     0.001  ops/ms
> StrictMathBench.powDouble         N/A  thrpt    8  178818.861 ± 16235.465  ops/ms
> ----------------------------
> 
> 
> * Linux/AMD
> 
> --------- Before -----------
> Benchmark                      (seed)   Mode  Cnt       Score     Error   Units
> MathBench.powDouble                 0  thrpt    8  100741.348 ± 207.766  ops/ms
> MathBench.powDouble0Dot5            0  thrpt    8   33896.623 ± 103.352  ops/ms
> MathBench.powDouble0Dot5Const       0  thrpt    8   34195.944 ± 230.703  ops/ms
> MathBench.powDouble0Dot5Loop        0  thrpt    8       0.039 ±   0.001  ops/ms
> MathBench.powDoubleLoop             0  thrpt    8       0.038 ±   0.001  ops/ms
> StrictMathBench.powDouble         N/A  thrpt    8   72000.166 ± 135.002  ops/ms
> ----------------------------
> 
> --------- After -----------
> Benchmark                      (seed)   Mode  Cnt       Score     Error   Units
> MathBench.powDouble                 0  thrpt    8  100738.866 ± 222.820  ops/ms
> MathBench.powDouble0Dot5            0  thrpt    8  100799.098 ±  95.537  ops/ms   <-- 3.0x up
> MathBench.powDouble0Dot5Const       0  thrpt    8  100765.571 ± 178.436  ops/ms
> MathBench.powDouble0Dot5Loop        0  thrpt    8       0.244 ±   0.002  ops/ms   <-- 6.3x up
> MathBench.powDoubleLoop             0  thrpt    8       0.038 ±   0.001  ops/ms
> StrictMathBench.powDouble         N/A  thrpt    8   71758.725 ± 339.660  ops/ms
> ----------------------------
> 
> 
> * MacOS/Intel
> 
> --------- Before -----------
> Benchmark                      (seed)   Mode  Cnt       Score      Error   Units
> MathBench.powDouble                 0  thrpt    8  238064.722 ± 5181.318  ops/ms
> MathBench.powDouble0Dot5            0  thrpt    8   59235.979 ± 2046.519  ops/ms
> MathBench.powDouble0Dot5Const       0  thrpt    8   59695.014 ± 1079.692  ops/ms
> MathBench.powDouble0Dot5Loop        0  thrpt    8       0.040 ±    0.001  ops/ms
> MathBench.powDoubleLoop             0  thrpt    8       0.041 ±    0.001  ops/ms
> StrictMathBench.powDouble         N/A  thrpt    8  238391.026 ± 2743.385  ops/ms
> ----------------------------
> 
> --------- After -----------
> Benchmark                      (seed)   Mode  Cnt       Score       Error   Units
> MathBench.powDouble                 0  thrpt    8  238582.414 ±  3661.261  ops/ms
> MathBench.powDouble0Dot5            0  thrpt    8  224102.701 ±  2846.892  ops/ms   <-- 3.8x up
> MathBench.powDouble0Dot5Const       0  thrpt    8  224542.331 ± 19027.596  ops/ms
> MathBench.powDouble0Dot5Loop        0  thrpt    8       0.158 ±     0.002  ops/ms   <-- 4.0x up
> MathBench.powDoubleLoop             0  thrpt    8       0.041 ±     0.001  ops/ms
> StrictMathBench.powDouble         N/A  thrpt    8  233689.504 ± 10141.034  ops/ms
> ----------------------------
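
The "x up" annotations in the tables above are the After score divided by the Before score. As a quick check, the Java sketch below recomputes them for the Linux/Intel and Linux/AMD rows; the class and method names (SpeedupCheck, speedup) are hypothetical and not part of the patch.

```java
public class SpeedupCheck {
    // Throughput speedup: After score divided by Before score (both in ops/ms).
    static double speedup(double before, double after) {
        return after / before;
    }

    public static void main(String[] args) {
        // Scores copied from the powDouble0Dot5 and powDouble0Dot5Loop rows
        // of the quoted tables.
        System.out.printf("Linux/Intel powDouble0Dot5:     %.1fx%n",
                speedup(45498.351, 204966.834));
        System.out.printf("Linux/Intel powDouble0Dot5Loop: %.1fx%n",
                speedup(0.031, 0.121));
        System.out.printf("Linux/AMD   powDouble0Dot5:     %.1fx%n",
                speedup(33896.623, 100799.098));
        System.out.printf("Linux/AMD   powDouble0Dot5Loop: %.1fx%n",
                speedup(0.039, 0.244));
    }
}
```

The Loop rows have only three significant digits in their scores, so their ratios are coarser than those of the non-loop rows.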

Jie Fu has updated the pull request incrementally with one additional commit since the last revision:

  Add test for x=0.0
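
For context on why zero is worth testing: per the Math.pow and Math.sqrt specifications, the two functions disagree for x = -0.0 and x = -Inf, so sqrt can only stand in for pow(x, 0.5) on a restricted set of inputs. The Java sketch below illustrates this. The class and method names (PowHalfSketch, canUseSqrt, powHalf) are hypothetical, and the sign-bit test is only an assumed way of expressing the cases listed in the description, not the actual stub code.

```java
public class PowHalfSketch {
    // Hypothetical guard: true when the sign bit of x is clear, i.e. for
    // +0.0, positive finite values, +Inf, and NaNs whose sign bit is clear.
    static boolean canUseSqrt(double x) {
        return Double.doubleToRawLongBits(x) >= 0L;
    }

    // Sketch of pow(x, 0.5): take the sqrt shortcut when the guard holds,
    // otherwise fall back to the general Math.pow.
    static double powHalf(double x) {
        return canUseSqrt(x) ? Math.sqrt(x) : Math.pow(x, 0.5);
    }

    public static void main(String[] args) {
        double[] inputs = {0.0, -0.0, 4.0, Double.POSITIVE_INFINITY,
                           Double.NEGATIVE_INFINITY, -4.0, Double.NaN};
        for (double x : inputs) {
            System.out.println("x=" + x
                + " pow=" + Math.pow(x, 0.5)
                + " sqrt=" + Math.sqrt(x)
                + " sketch=" + powHalf(x));
        }
    }
}
```

In this sketch -0.0 and -Inf take the fallback path, because Math.sqrt returns -0.0 and NaN for them while Math.pow(x, 0.5) is specified to return +0.0 and +Infinity.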

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/3536/files
  - new: https://git.openjdk.java.net/jdk/pull/3536/files/392a5b92..a97cb957

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3536&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3536&range=00-01

  Stats: 4 lines in 1 file changed: 2 ins; 0 del; 2 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3536.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3536/head:pull/3536

PR: https://git.openjdk.java.net/jdk/pull/3536


More information about the hotspot-compiler-dev mailing list