RFR: 8264945: Optimize the code-gen for Math.pow(x, 0.5)
Jie Fu
jiefu at openjdk.java.net
Mon Apr 19 15:10:55 UTC 2021
On Fri, 9 Apr 2021 02:19:10 GMT, Jie Fu <jiefu at openjdk.org> wrote:
> Hi all,
>
> I'd like to optimize the code-gen for Math.pow(x, 0.5).
> The JMH micro-benchmarks show a 7x ~ 14x performance improvement.
>
> While I was optimizing a machine learning program, I found both Math.pow(x, 2) and Math.pow(x, 0.5) are used.
> To my surprise, C2 optimizes only the Math.pow(x, 2) case [1], and not yet Math.pow(x, 0.5).
>
> The patch just replaces Math.pow(x, 0.5) with Math.sqrt(x).
>
> Before:
>
> Benchmark                     (seed)   Mode  Cnt      Score    Error   Units
> MathBench.powDouble0Dot5           0  thrpt    8  45525.117 ± 11.686  ops/ms
> MathBench.powDouble0Dot5Loop       0  thrpt    8      0.031 ±  0.001  ops/ms
>
> Benchmark                     (seed)   Mode  Cnt      Score    Error   Units
> MathBench.powDouble0Dot5           0  thrpt    8  45509.317 ±  6.581  ops/ms
> MathBench.powDouble0Dot5Loop       0  thrpt    8      0.031 ±  0.001  ops/ms
>
>
> After:
>
> Benchmark                     (seed)   Mode  Cnt       Score     Error   Units
> MathBench.powDouble0Dot5           0  thrpt    8  343354.892 ± 362.900  ops/ms
> MathBench.powDouble0Dot5Loop       0  thrpt    8       0.457 ±   0.001  ops/ms
>
> Benchmark                     (seed)   Mode  Cnt       Score     Error   Units
> MathBench.powDouble0Dot5           0  thrpt    8  343421.559 ±  49.326  ops/ms
> MathBench.powDouble0Dot5Loop       0  thrpt    8       0.457 ±   0.001  ops/ms
>
>
> Testing:
> - tier1~3 on Linux/x64
>
> Thanks,
> Best regards,
> Jie
>
> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/library_call.cpp#L1680
Hi all,
This is the follow-up of JDK-8265325, which optimizes the code-gen of C2 for pow(x, 0.5).
Instead of calling into StubRoutines::dpow(), the compiler can generate sqrt directly for all x >= 0.0.
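To illustrate why the replacement is restricted by the sign of x, here is a minimal Java-level sketch. It is not the C2 code: PowHalfSketch, powHalf and sameBits are hypothetical names, and the strict x > 0.0 guard is an assumption of the sketch (in Java, -0.0 >= 0.0 evaluates to true, so a plain >= comparison would also let -0.0 through). By the Math.pow and Math.sqrt specifications, the two calls differ for -0.0 and -Infinity:

```java
public class PowHalfSketch {
    // Hypothetical helper, not the C2 implementation: use sqrt only for
    // strictly positive x and fall back to Math.pow for everything else.
    // The strict guard is an assumption of this sketch.
    static double powHalf(double x) {
        if (x > 0.0) {
            return Math.sqrt(x);
        }
        // Covers +0.0, -0.0, negative values, -Infinity and NaN.
        return Math.pow(x, 0.5);
    }

    // Bitwise comparison: distinguishes -0.0 from +0.0 and treats NaN as equal to NaN.
    static boolean sameBits(double a, double b) {
        return Double.doubleToLongBits(a) == Double.doubleToLongBits(b);
    }

    public static void main(String[] args) {
        double[] inputs = {
            4.0, 0.0, -0.0, -1.0,
            Double.POSITIVE_INFINITY, Double.NEGATIVE_INFINITY, Double.NaN
        };
        for (double x : inputs) {
            double p = Math.pow(x, 0.5);
            double s = Math.sqrt(x);
            System.out.println("x=" + x + " pow=" + p + " sqrt=" + s
                    + " same=" + sameBits(p, s));
        }
    }
}
```

Running main prints the pow and sqrt results side by side for each input, so the inputs where the two disagree are easy to spot.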
A 1.5x ~ 4.2x performance improvement is observed with this optimization.
Before
Benchmark                      (seed)   Mode  Cnt       Score      Error   Units
MathBench.powDouble0Dot5Const       0  thrpt    8  203004.302 ±  684.072  ops/ms
MathBench.powDouble0Dot5Loop        0  thrpt    8       0.121 ±    0.001  ops/ms
After
Benchmark                      (seed)   Mode  Cnt       Score      Error   Units
MathBench.powDouble0Dot5Const       0  thrpt    8  308771.237 ± 1604.567  ops/ms  <-- 1.5x
MathBench.powDouble0Dot5Loop        0  thrpt    8       0.508 ±    0.001  ops/ms  <-- 4.2x
Testing:
tier1 ~ tier3 on Linux/x64
Thanks.
Best regards,
Jie
-------------
PR: https://git.openjdk.java.net/jdk/pull/3404