RFR: 8221542: ~15% performance degradation due to less optimized inline decision

Thu Mar 28 06:21:51 UTC 2019

Hi Jie,

The heuristic quirk looks very similar to the one Sergey reported recently:

http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-February/032623.html

Overall, tweaking the heuristic to favor inlining doesn't look like the
right thing to do here. profile.count=0 is a sign that the profile isn't
mature enough, and it's likely the callee doesn't have enough profiling
info either. (And that's what Sergey observed on some of the
microbenchmarks during his experiments.)

In your particular case (Random::<init>), tweaking the heuristic so that
is_init_with_ea [1] overrules the "profile.count > 0" check may be a more
promising approach. After all, the fact that the call site is being
considered for inlining (and not pruned along with the basic block it
belongs to) is a strong signal in favor of the "profile.count > 0" case.
(Though it's not guaranteed due to the immaturity of profile data.)
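
To make the ordering concrete, here is a rough standalone sketch of what
I mean. It is not the HotSpot code: CallSiteInfo, Verdict and decide()
are hypothetical names made up for illustration, and the init_with_ea
flag just stands in for whatever is_init_with_ea [1] would return for
the callee.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for the data the heuristic looks at.
// These are NOT HotSpot types.
struct CallSiteInfo {
  int  profile_count;  // invocation count recorded for this call site
  bool init_with_ea;   // assumed result of an is_init_with_ea-style check
};

enum class Verdict {
  Inline,          // inline regardless of the call-site count
  DoNotInline,     // reject: call site looks unreached
  ContinueChecks   // fall through to the remaining (size etc.) checks
};

// Sketch of the proposed ordering: the constructor/escape-analysis check
// is consulted before the "profile.count > 0" check, so an immature
// profile (count == 0) cannot veto such a constructor on its own.
Verdict decide(const CallSiteInfo& cs) {
  if (cs.init_with_ea) {
    return Verdict::Inline;
  }
  if (cs.profile_count == 0) {
    return Verdict::DoNotInline;
  }
  return Verdict::ContinueChecks;
}
```

With this ordering, a call site with profile_count == 0 is still
rejected for ordinary callees, but not for a constructor that passes
the init_with_ea check.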

But IMO the root problem is that top-tier compilation happens too early:
profile data isn't mature enough yet, which will easily lead to similar
problems later (during compilation).

Best regards,
Vladimir Ivanov

[1] 
http://hg.openjdk.java.net/jdk/jdk/file/9c84d2865c2d/src/hotspot/share/opto/bytecodeInfo.cpp#l81

On 27/03/2019 03:15, Jie Fu wrote:
> Hi all,
> 
> JBS:    https://bugs.openjdk.java.net/browse/JDK-8221542
> Webrev: http://cr.openjdk.java.net/~jiefu/monte_carlo-perf-drop/webrev.00/
> 
> ## Symptom
> ~15% performance degradation (from 700 ops/m to 600 ops/m) was observed 
> randomly on x86 while running SPECjvm2008's scimark.monte_carlo with 
> -XX:-TieredCompilation.
> 
> ## Reproduce
> It can always be reproduced with the script[1] in less than 5 minutes.
> 
> ## Reason
> The drop was caused by a not-inline decision on 
> spec.benchmarks.scimark.utils.Random::<init> in 
> spec.benchmarks.scimark.monte_carlo.MonteCarlo::integrate.
> 
> ## Fix
> It might be better to make a little change to the inline heuristic[2].
> 
> For callers without loops, the original heuristic works fine.
> But for callers with loops, it would be better to be more conservative
> about making a not-inline decision.
> 
> ## Testing
> - Running scimark.monte_carlo on jdk/x64 with -XX:-TieredCompilation
> about 5000 times, no performance drop
>    Also on jdk8u/mips64 with -XX:-TieredCompilation, no performance drop
> - Running make test TEST="micro" on jdk/x64, no performance regression
> - Running SPECjvm2008 on jdk8u/x64 with -XX:-TieredCompilation, no 
> performance regression
> 
> For more detailed info, please see the JBS.
> 
> Could you please review it?
> Thanks a lot.
> 
> Best regards,
> Jie
> 
> [1] http://cr.openjdk.java.net/~jiefu/monte_carlo-perf-drop/reproduce.sh
> [2] 
> http://hg.openjdk.java.net/jdk/jdk/file/0a2d73e02076/src/hotspot/share/opto/bytecodeInfo.cpp#l375 
> 
> 
>