Integrated: 8265325: Optimize StubRoutines::dpow() for Math.pow(x, 0.5)

Jie Fu jiefu at openjdk.java.net
Mon Apr 19 10:49:38 UTC 2021


On Fri, 16 Apr 2021 06:51:00 GMT, Jie Fu <jiefu at openjdk.org> wrote:

> Hi all,
> 
> I'd like to optimize the StubRoutines::dpow() for Math.pow(x, 0.5).
> 
> In the pow and sqrt discussion [1], Joe taught me that the Java library implementation of pow has been optimized for pow(x, 2.0) [2] and pow(x, 0.5) [3].
> However, the HotSpot StubRoutines::dpow() implements this optimization only for pow(x, 2.0), not yet for pow(x, 0.5).
> This patch optimizes StubRoutines::dpow() for pow(x, 0.5).
> 
> Although not all Math.pow(x, 0.5) can be replaced with sqrt(x), we can still do it safely for the following cases:
>   1) x >= 0.0    (fully implemented)
>   2) x is +Inf   (fully implemented)
>   3) x is NaN    (can be further divided into +NaN and -NaN and only +NaN is implemented)
> 
> The effect of this optimization has been tested on several platforms, showing a 3.0x to 6.3x performance improvement.
> No performance regression was observed.
> 
> Testing:
>   - tier1 ~ tier3 on Linux/x64
> 
> Thanks.
> Best regards,
> Jie
> 
> [1] https://mail.openjdk.java.net/pipermail/core-libs-dev/2021-April/076220.html
> [2] https://github.com/openjdk/jdk/blob/d84a7e55be40eae57b6c322694d55661a5053a55/src/java.base/share/classes/java/lang/FdLibm.java#L362
> [3] https://github.com/openjdk/jdk/blob/d84a7e55be40eae57b6c322694d55661a5053a55/src/java.base/share/classes/java/lang/FdLibm.java#L364
> 
> Detailed performance numbers:
> * Linux/Intel
> 
> --------- Before -----------
> Benchmark                      (seed)   Mode  Cnt       Score       Error   Units
> MathBench.powDouble                 0  thrpt    8  218783.605 ±   838.379  ops/ms
> MathBench.powDouble0Dot5            0  thrpt    8   45498.351 ±     7.558  ops/ms
> MathBench.powDouble0Dot5Const       0  thrpt    8   45243.530 ±  1097.100  ops/ms
> MathBench.powDouble0Dot5Loop        0  thrpt    8       0.031 ±     0.001  ops/ms
> MathBench.powDoubleLoop             0  thrpt    8       0.031 ±     0.001  ops/ms
> StrictMathBench.powDouble         N/A  thrpt    8  176106.602 ± 13127.650  ops/ms
> ----------------------------
> 
> --------- After -----------
> Benchmark                      (seed)   Mode  Cnt       Score       Error   Units
> MathBench.powDouble                 0  thrpt    8  219930.462 ±   181.922  ops/ms
> MathBench.powDouble0Dot5            0  thrpt    8  204966.834 ±   329.032  ops/ms   <-- 4.5x up
> MathBench.powDouble0Dot5Const       0  thrpt    8  203004.302 ±   684.072  ops/ms
> MathBench.powDouble0Dot5Loop        0  thrpt    8       0.121 ±     0.001  ops/ms   <-- 3.9x up
> MathBench.powDoubleLoop             0  thrpt    8       0.031 ±     0.001  ops/ms
> StrictMathBench.powDouble         N/A  thrpt    8  178818.861 ± 16235.465  ops/ms
> ----------------------------
> 
> 
> * Linux/AMD
> 
> --------- Before -----------
> Benchmark                      (seed)   Mode  Cnt       Score     Error   Units
> MathBench.powDouble                 0  thrpt    8  100741.348 ± 207.766  ops/ms
> MathBench.powDouble0Dot5            0  thrpt    8   33896.623 ± 103.352  ops/ms
> MathBench.powDouble0Dot5Const       0  thrpt    8   34195.944 ± 230.703  ops/ms
> MathBench.powDouble0Dot5Loop        0  thrpt    8       0.039 ±   0.001  ops/ms
> MathBench.powDoubleLoop             0  thrpt    8       0.038 ±   0.001  ops/ms
> StrictMathBench.powDouble         N/A  thrpt    8   72000.166 ± 135.002  ops/ms
> ----------------------------
> 
> --------- After -----------
> Benchmark                      (seed)   Mode  Cnt       Score     Error   Units
> MathBench.powDouble                 0  thrpt    8  100738.866 ± 222.820  ops/ms
> MathBench.powDouble0Dot5            0  thrpt    8  100799.098 ±  95.537  ops/ms   <-- 3.0x up
> MathBench.powDouble0Dot5Const       0  thrpt    8  100765.571 ± 178.436  ops/ms
> MathBench.powDouble0Dot5Loop        0  thrpt    8       0.244 ±   0.002  ops/ms   <-- 6.3x up
> MathBench.powDoubleLoop             0  thrpt    8       0.038 ±   0.001  ops/ms
> StrictMathBench.powDouble         N/A  thrpt    8   71758.725 ± 339.660  ops/ms
> ----------------------------
> 
> 
> * macOS/Intel
> 
> --------- Before -----------
> Benchmark                      (seed)   Mode  Cnt       Score      Error   Units
> MathBench.powDouble                 0  thrpt    8  238064.722 ± 5181.318  ops/ms
> MathBench.powDouble0Dot5            0  thrpt    8   59235.979 ± 2046.519  ops/ms
> MathBench.powDouble0Dot5Const       0  thrpt    8   59695.014 ± 1079.692  ops/ms
> MathBench.powDouble0Dot5Loop        0  thrpt    8       0.040 ±    0.001  ops/ms
> MathBench.powDoubleLoop             0  thrpt    8       0.041 ±    0.001  ops/ms
> StrictMathBench.powDouble         N/A  thrpt    8  238391.026 ± 2743.385  ops/ms
> ----------------------------
> 
> --------- After -----------
> Benchmark                      (seed)   Mode  Cnt       Score       Error   Units
> MathBench.powDouble                 0  thrpt    8  238582.414 ±  3661.261  ops/ms
> MathBench.powDouble0Dot5            0  thrpt    8  224102.701 ±  2846.892  ops/ms   <-- 3.8x up
> MathBench.powDouble0Dot5Const       0  thrpt    8  224542.331 ± 19027.596  ops/ms
> MathBench.powDouble0Dot5Loop        0  thrpt    8       0.158 ±     0.002  ops/ms   <-- 4.0x up
> MathBench.powDoubleLoop             0  thrpt    8       0.041 ±     0.001  ops/ms
> StrictMathBench.powDouble         N/A  thrpt    8  233689.504 ± 10141.034  ops/ms
> ----------------------------
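
The quoted message notes that not every Math.pow(x, 0.5) can be replaced with sqrt(x). The sketch below is not taken from the patch; it compares the two calls on the special inputs where the Java API specification defines the results. The class name PowHalfCases and the helper sqrtAgreesWithPow are hypothetical, introduced only for illustration. Per the specification, the two calls agree for +0.0, +Infinity and NaN, and differ for -0.0 and -Infinity.

```java
public class PowHalfCases {
    /**
     * Hypothetical helper: returns true when Math.sqrt(x) and Math.pow(x, 0.5)
     * produce the same double. Double.compare treats all NaN values as equal
     * and distinguishes -0.0 from 0.0, which is what this comparison needs.
     */
    public static boolean sqrtAgreesWithPow(double x) {
        return Double.compare(Math.sqrt(x), Math.pow(x, 0.5)) == 0;
    }

    public static void main(String[] args) {
        // Only special values whose results are fixed by the Math.pow and
        // Math.sqrt specifications, so the comparison does not depend on rounding.
        double[] inputs = {
            0.0, -0.0,
            Double.POSITIVE_INFINITY, Double.NEGATIVE_INFINITY,
            Double.NaN
        };
        for (double x : inputs) {
            System.out.printf("x=%s sqrt=%s pow=%s agree=%b%n",
                    x, Math.sqrt(x), Math.pow(x, 0.5), sqrtAgreesWithPow(x));
        }
    }
}
```

Negative zero is the subtle case: a floating-point comparison such as x >= 0.0 is true for -0.0, yet Math.pow(-0.0, 0.5) is specified to return positive zero while Math.sqrt(-0.0) returns negative zero.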

This pull request has now been integrated.

Changeset: b64a3fb9
Author:    Jie Fu <jiefu at openjdk.org>
URL:       https://git.openjdk.java.net/jdk/commit/b64a3fb9
Stats:     146 lines in 3 files changed: 145 ins; 0 del; 1 mod

8265325: Optimize StubRoutines::dpow() for Math.pow(x, 0.5)

Reviewed-by: kvn, neliasso

-------------

PR: https://git.openjdk.java.net/jdk/pull/3536

