RFR: 8221542: ~15% performance degradation due to less optimized inline decision

Jie Fu fujie at loongson.cn
Thu Mar 28 07:54:16 UTC 2019


Hi Vladimir,

Thanks for your review and valuable suggestions.
I will study your suggestions and Sergey's discussion to find a better 
solution.

Thanks a lot.

Best regards,
Jie

On 2019/3/28 2:21 PM, Vladimir Ivanov wrote:
> Hi Jie,
>
> The heuristic quirk looks very similar to the one Sergey reported 
> recently:
>
>
> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-February/032623.html 
>
>
> Overall, tweaking the heuristic to favor inlining doesn't look like the 
> right thing here. profile.count=0 is a sign the profile isn't mature 
> enough, and it's likely the callee doesn't have enough profiling info 
> either. (And that's what Sergey observed on some of the 
> microbenchmarks during his experiments.)
>
> In your particular case (Random::<init>), tweaking the heuristic so that 
> is_init_with_ea [1] overrules "profile.count > 0" may be a more 
> promising approach. After all, the fact that the call site is being 
> considered for inlining at all (and not pruned along with the basic block 
> it belongs to) is a strong signal in favor of the "profile.count > 0" case. 
> (Though it's not guaranteed, given the immaturity of the profile data.)
>
> But IMO the root problem is that top-tier compilation happens too 
> early: the profile data isn't mature enough yet, and that will easily 
> lead to similar problems later (during compilation).
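For illustration only, the options discussed here can be compared with a deliberately simplified model, written in Java for readability. It assumes the decision reduces to one test, where a call site whose profile.count() is 0 is treated as not reached and is not inlined, and it reduces is_init_with_ea to a boolean flag. The class, record and method names are hypothetical, the loop-aware variant is only one possible reading of the change proposed in the original RFR (the webrev may differ), and the real C2 logic in bytecodeInfo.cpp involves many more conditions.

```java
/**
 * Hypothetical, heavily simplified model of the "call site not reached" check
 * and two possible relaxations of it. Not HotSpot code: the real decision is
 * made in C++ (bytecodeInfo.cpp) and involves many more conditions.
 * All names here are made up for illustration.
 */
final class UnreachedCallSiteModel {

    /** Made-up stand-in for the facts the check looks at. */
    record CallSite(int profileCount, boolean callerHasLoops, boolean isInitWithEa) {}

    /** Baseline: a call site with a zero profile count is never inlined. */
    static boolean rejectBaseline(CallSite s) {
        return s.profileCount() == 0;
    }

    /** One reading of the RFR: distrust a zero count when the caller has loops. */
    static boolean rejectLoopAware(CallSite s) {
        return s.profileCount() == 0 && !s.callerHasLoops();
    }

    /** Reviewer's alternative: let is_init_with_ea override a zero count. */
    static boolean rejectEaOverride(CallSite s) {
        return s.profileCount() == 0 && !s.isInitWithEa();
    }

    public static void main(String[] args) {
        // A constructor call before a hot loop, compiled while its count is still 0.
        CallSite ctorBeforeLoop = new CallSite(0, true, true);
        // An ordinary (non-constructor) call in a method with loops, also with a count of 0.
        CallSite plainCallInLoopyMethod = new CallSite(0, true, false);

        System.out.println(rejectBaseline(ctorBeforeLoop));           // true
        System.out.println(rejectLoopAware(ctorBeforeLoop));          // false
        System.out.println(rejectEaOverride(ctorBeforeLoop));         // false
        System.out.println(rejectLoopAware(plainCallInLoopyMethod));  // false
        System.out.println(rejectEaOverride(plainCallInLoopyMethod)); // true
    }
}
```

The last two lines show why the EA-based override is narrower: in this model the loop-aware rule also inlines an unreached ordinary call in any method that contains a loop, which is where an immature callee profile could cause trouble, whereas the override only affects constructors.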
>
> Best regards,
> Vladimir Ivanov
>
> [1] http://hg.openjdk.java.net/jdk/jdk/file/9c84d2865c2d/src/hotspot/share/opto/bytecodeInfo.cpp#l81
>
> On 27/03/2019 03:15, Jie Fu wrote:
>> Hi all,
>>
>> JBS:    https://bugs.openjdk.java.net/browse/JDK-8221542
>> Webrev: http://cr.openjdk.java.net/~jiefu/monte_carlo-perf-drop/webrev.00/
>>
>> ## Symptom
>> A ~15% performance degradation (from 700 ops/m to 600 ops/m) was 
>> observed intermittently on x86 while running SPECjvm2008's 
>> scimark.monte_carlo with -XX:-TieredCompilation.
>>
>> ## Reproduce
>> It can always be reproduced with the script [1] in less than 5 minutes.
>>
>> ## Reason
>> The drop was caused by a decision not to inline 
>> spec.benchmarks.scimark.utils.Random::<init> into 
>> spec.benchmarks.scimark.monte_carlo.MonteCarlo::integrate.
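A hypothetical method with the same general shape helps show why the constructor's call site can look unreached. This is an assumption about the structure of MonteCarlo::integrate, not the SciMark source: one allocation before a long sampling loop, with a made-up 64-bit LCG (the class Lcg) standing in for the benchmark's Random class. The constructor runs once per invocation while the loop body runs many times, so if C2 compiles the method while the profile is still immature, the count recorded for the constructor's call site may still be 0.

```java
/**
 * Hypothetical caller with the same general shape as the method discussed in
 * the thread: one constructor call before a hot loop. All names are made up;
 * this is not the SciMark source.
 */
final class MonteCarloShape {

    /** Made-up stand-in for the benchmark's generator: a 64-bit LCG. */
    static final class Lcg {
        private long state;

        Lcg(long seed) {            // constructor: executed once per integrate() call
            this.state = seed;
        }

        double nextDouble() {       // executed twice per loop iteration
            state = state * 6364136223846793005L + 1442695040888963407L;
            return (state >>> 11) * 0x1.0p-53;   // top 53 bits mapped to [0, 1)
        }
    }

    /** Estimates pi by sampling points in the unit square. */
    static double integrate(int numSamples, long seed) {
        Lcg g = new Lcg(seed);                      // cold call site: runs once per invocation
        int underCurve = 0;
        for (int i = 0; i < numSamples; i++) {      // hot loop: dominates the profile
            double x = g.nextDouble();
            double y = g.nextDouble();
            if (x * x + y * y <= 1.0) {
                underCurve++;
            }
        }
        return ((double) underCurve / numSamples) * 4.0;
    }

    public static void main(String[] args) {
        System.out.println(integrate(1_000_000, 113L));  // prints an estimate of pi
    }
}
```

If new Lcg(seed) is not inlined, the object is passed to an opaque call, so escape analysis presumably cannot scalar-replace it, and the state updates in nextDouble() would have to go through memory on every iteration. That is one plausible mechanism for the slowdown.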
>>
>> ## Fix
>> It might be better to make a small change to the inlining heuristic [2].
>>
>> For callers without loops, the original heuristic works fine.
>> But for callers with loops, it would be better to be more conservative 
>> about deciding not to inline.
>>
>> ## Testing
>> - Ran scimark.monte_carlo on jdk/x64 with -XX:-TieredCompilation 
>>   about 5000 times: no performance drop.
>>   Also ran it on jdk8u/mips64 with -XX:-TieredCompilation: no performance drop.
>> - Ran make test TEST="micro" on jdk/x64: no performance regression.
>> - Ran SPECjvm2008 on jdk8u/x64 with -XX:-TieredCompilation: no 
>>   performance regression.
>>
>> For more detailed info, please see the JBS.
>>
>> Could you please review it?
>> Thanks a lot.
>>
>> Best regards,
>> Jie
>>
>> [1] http://cr.openjdk.java.net/~jiefu/monte_carlo-perf-drop/reproduce.sh
>> [2] http://hg.openjdk.java.net/jdk/jdk/file/0a2d73e02076/src/hotspot/share/opto/bytecodeInfo.cpp#l375
>>
>>
>>



More information about the hotspot-compiler-dev mailing list