RFR: 8221542: ~15% performance degradation due to less optimized inline decision
Vladimir Ivanov
vladimir.x.ivanov at oracle.com
Thu Mar 28 06:21:51 UTC 2019
Hi Jie,
The heuristic quirk looks very similar to the one Sergey reported recently:
http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-February/032623.html
Overall, tweaking the heuristic to favor inlining doesn't look like the
right thing to do here. profile.count=0 is a sign that the profile isn't
mature enough, and it's likely the callee doesn't have enough profiling
info either.
(And that's what Sergey observed on some of the microbenchmarks during
his experiments.)
In your particular case (Random::<init>), tweaking the heuristic so that
is_init_with_ea [1] overrules "profile.count > 0" may be a more
promising approach. After all, the fact that the call site is being
considered for inlining (and was not pruned along with the basic block
it belongs to) is a strong signal in favor of the "profile.count > 0"
case. (Though it's not guaranteed, due to the immaturity of the profile
data.)
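
To make the suggestion concrete, here is a minimal sketch of the two
decision orders. It is a simplified model, not HotSpot code: the class,
method, and parameter names are hypothetical, and the real checks in
bytecodeInfo.cpp involve many more conditions.

```java
// Simplified, hypothetical model of the call-site decision discussed above.
// This is NOT HotSpot code; every name here is illustrative only.
final class InlineHeuristicSketch {

    // Behavior as described in the thread: a call site whose profile count
    // is zero is treated as unreached and is not inlined.
    static boolean inlineCurrent(long profileCount, boolean isInitWithEa) {
        return profileCount > 0;
    }

    // Suggested variant: a constructor that qualifies for escape analysis
    // overrules the "profile.count > 0" requirement.
    static boolean inlineSuggested(long profileCount, boolean isInitWithEa) {
        if (isInitWithEa) {
            return true;
        }
        return profileCount > 0;
    }

    public static void main(String[] args) {
        // A constructor call site with an immature (zero) profile count.
        System.out.println(inlineCurrent(0, true));   // prints "false"
        System.out.println(inlineSuggested(0, true)); // prints "true"
    }
}
```

In this model the two variants differ only for call sites with a zero
count whose callee is a constructor eligible for escape analysis; all
other call sites get the same answer.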
But IMO the root problem is that top-tier compilation happens too early:
profile data isn't mature enough yet and it will easily lead to similar
problems later (during compilation).
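
As an illustration of how a reachable call site might still show a zero
count, here is a hypothetical caller with the shape named in the quoted
report: the constructor call runs once per invocation, while the loop
body runs many times. This is a sketch, not the SPECjvm2008 source:
java.util.Random stands in for the benchmark's own Random class, and
SEED and the sample count are arbitrary. It assumes compilation is
triggered by the hot loop at a point when the call site in front of the
loop has been profiled rarely or not at all.

```java
import java.util.Random;

// Hypothetical caller shape, loosely modeled on a Monte Carlo estimate of pi.
// Not the SPECjvm2008 source: java.util.Random is a stand-in class.
final class CallerWithLoopSketch {
    static final long SEED = 113L; // arbitrary illustrative value

    static double integrate(int numSamples) {
        // Executed once per call: this call site accumulates profile
        // counts far more slowly than anything inside the loop.
        Random r = new Random(SEED);
        int underCurve = 0;
        // Executed numSamples times per call: this is what makes the method hot.
        for (int i = 0; i < numSamples; i++) {
            double x = r.nextDouble();
            double y = r.nextDouble();
            if (x * x + y * y <= 1.0) {
                underCurve++;
            }
        }
        // Fraction of points inside the quarter circle, scaled to estimate pi.
        return 4.0 * underCurve / numSamples;
    }

    public static void main(String[] args) {
        System.out.println(integrate(1_000_000));
    }
}
```

If the constructor is not inlined here, every nextDouble() call in the
loop operates on an object the compiler cannot see being initialized,
which is consistent with the cost of the not-inline decision described
in the report.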
Best regards,
Vladimir Ivanov
[1] http://hg.openjdk.java.net/jdk/jdk/file/9c84d2865c2d/src/hotspot/share/opto/bytecodeInfo.cpp#l81
On 27/03/2019 03:15, Jie Fu wrote:
> Hi all,
>
> JBS: https://bugs.openjdk.java.net/browse/JDK-8221542
> Webrev: http://cr.openjdk.java.net/~jiefu/monte_carlo-perf-drop/webrev.00/
>
> ## Symptom
> ~15% performance degradation (from 700 ops/m to 600 ops/m) was observed
> randomly on x86 while running SPECjvm2008's scimark.monte_carlo with
> -XX:-TieredCompilation.
>
> ## Reproduce
> It can always be reproduced with the script [1] in less than 5 minutes.
>
> ## Reason
> The drop was caused by a decision not to inline
> spec.benchmarks.scimark.utils.Random::<init> into
> spec.benchmarks.scimark.monte_carlo.MonteCarlo::integrate.
>
> ## Fix
> It might be better to make a small change to the inline heuristic [2].
>
> For callers without loops, the original heuristic works fine.
> But for callers with loops, it would be better to be more conservative
> about deciding not to inline.
>
> ## Testing
> - Running scimark.monte_carlo on jdk/x64 with -XX:-TieredCompilation
>   about 5000 times: no performance drop.
>   Also on jdk8u/mips64 with -XX:-TieredCompilation: no performance drop.
> - Running make test TEST="micro" on jdk/x64: no performance regression.
> - Running SPECjvm2008 on jdk8u/x64 with -XX:-TieredCompilation: no
>   performance regression.
>
> For more detailed info, please see the JBS.
>
> Could you please review it?
> Thanks a lot.
>
> Best regards,
> Jie
>
> [1] http://cr.openjdk.java.net/~jiefu/monte_carlo-perf-drop/reproduce.sh
> [2] http://hg.openjdk.java.net/jdk/jdk/file/0a2d73e02076/src/hotspot/share/opto/bytecodeInfo.cpp#l375
>
>
>