Using immature CounterData for inliner (JDK-8325681)

Liu, Xin xxinliu at amazon.com
Wed Feb 21 06:06:38 UTC 2024


Hi, 

I found that a method can be brought to C2 before its methoddata becomes mature. For C2, the maturity threshold is 
InvocationThreshold = Tier4InvocationThreshold * ProfileMaturityPercentage / 100.0. 

By default, the threshold is 600[1]. That is not a high bar, but a method that lies on an unlikely path may still fail to reach it. 
E.g., method bar() below is underprofiled if the probability of taking the branch is less than ProfileMaturityPercentage%. 
For simplicity, let's assume the probability is only 10%, or ODD=10 in my program. 

    public void foo(boolean cond) {
      if (cond) { // unlikely, let's say 10%
        bar();
      }
    }

    public void bar() {
      baz(); // this callsite is underprofiled!
    }
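
To put numbers on the example above, here is a minimal sketch of the maturity arithmetic. The constants are the ones from footnote [1] (3000 and 20%), hard-coded as stand-ins for the VM flags. The class and method names are made up for illustration, and the assumption that foo() has run exactly 3000 times is mine. It is a model of the check, not HotSpot code.

```java
// Hypothetical model of the maturity check; not HotSpot code.
final class MaturitySketch {
    // Stand-ins for Tier4InvocationThreshold and ProfileMaturityPercentage,
    // using the values from footnote [1].
    static final int TIER4_INVOCATION_THRESHOLD = 3000;
    static final int PROFILE_MATURITY_PERCENTAGE = 20;

    // InvocationThreshold = Tier4InvocationThreshold * ProfileMaturityPercentage / 100.0
    static double maturityThreshold() {
        return TIER4_INVOCATION_THRESHOLD * PROFILE_MATURITY_PERCENTAGE / 100.0;
    }

    // A profile counts as mature once the method has been invoked often enough.
    static boolean isMature(int invocationCount) {
        return invocationCount >= maturityThreshold();
    }

    public static void main(String[] args) {
        int fooCalls = 3000;          // assumed number of foo() invocations so far
        int barCalls = fooCalls / 10; // cond is true 10% of the time
        System.out.println("threshold   = " + maturityThreshold());
        System.out.println("foo mature? " + isMature(fooCalls));
        System.out.println("bar mature? " + isMature(barCalls));
    }
}
```

With these assumed numbers bar() has been invoked 300 times, which is below the threshold, while foo() is well above it.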
  
After https://bugs.openjdk.org/browse/JDK-8273712, C2 extracts the CounterData at the invoke bytecode (bci=1 in this case) from the methoddata of bar(). 
C2 uses it as a hint to decide whether to inline the callee baz. We got “failed to inline: low call site frequency” because the frequency of calling baz is -1, i.e. unknown. 
The real frequency is not -1. Even if the branch were taken only 1% of the time, the frequency would still be 0.01, which is > MinInlineFrequencyRatio. C2 refuses to inline baz because its ciCallProfile is not initialized! 
The struct is not initialized because the methoddata is not mature.
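
The failure mode can be modelled in a few lines. This is only a sketch of the behaviour described above, not the actual C2 code: the class and method names are hypothetical, and minRatio stands in for MinInlineFrequencyRatio with an illustrative value rather than the real default.

```java
// Hypothetical model of the frequency check; not HotSpot code.
final class CallSiteFrequencySketch {
    static final int UNKNOWN_COUNT = -1;

    // An immature profile is not parsed, so the call site count stays unknown.
    static int callSiteCount(boolean profileIsMature, int recordedCount) {
        return profileIsMature ? recordedCount : UNKNOWN_COUNT;
    }

    // freq = call_site_count / invoke_count of caller; reject when it is below minRatio.
    static boolean rejectedForLowFrequency(int callSiteCount, int callerInvokeCount, double minRatio) {
        double freq = (double) callSiteCount / callerInvokeCount;
        return freq < minRatio;
    }

    public static void main(String[] args) {
        double minRatio = 0.005;  // illustrative value only
        int barInvocations = 300; // assumed invocation count of the caller bar()

        // A call site hit in only 1% of bar() invocations still passes the check.
        System.out.println(rejectedForLowFrequency(callSiteCount(true, 3), barInvocations, minRatio));
        // An unknown count makes freq negative, so the call is rejected.
        System.out.println(rejectedForLowFrequency(callSiteCount(false, 300), barInvocations, minRatio));
    }
}
```

The second call is rejected even though the recorded count (300 of 300) would give a frequency of 100%, which is the situation in the log.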

This is obscure. If we check -XX:+PrintCompilation, it will show you something like this. 
60 13 b 4 UnderProfiledSubprocedure::foo (9 bytes) 
                              @ 5 UnderProfiledSubprocedure::bar (6 bytes) inline (hot) 
                                @ 1 UnderProfiledSubprocedure::baz (19 bytes) failed to inline: low call site frequency
 
Only LogCompilation spills the truth. Its count is -1!
<method id='1419' holder='1331' name='add' return='1223' arguments='1228 1404 1221' flags='2' bytes='23' iicount='834'/>
<call method='1419' count='-1' prof_factor='0.595745' inline='1'/>
<inline_fail reason='low call site frequency'/>
 
It's worth noting that baking programs longer may or may not dodge this issue. The root cause is in the tiered compilation model. 
HotSpot submits the compiler task based on predefined thresholds, but it can't determine how long it takes before the task gets processed by the C2 compiler.
If the C1-generated bar is executed long enough, C2 will observe a mature methoddata. If not, it will see an immature one. 

The inliner broadens the horizon of the optimizer. Not inlining a method often has consequences: C2 will mark the 1st argument (receiver) as ArgEscaped and then refrain from scalar replacement. 
The method 'allocationExample' shows an example I caught when I compiled the java.base module.  
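
For readers without the JBS issue at hand, here is a minimal sketch of the general pattern. It is NOT the 'allocationExample' method mentioned above; the class Vec and the method lengthSquared are hypothetical and only illustrate where a receiver would be treated as ArgEscaped when the callee is not inlined.

```java
// Hypothetical illustration of the allocation pattern; not the code from the JBS issue.
final class EscapeSketch {
    static final class Vec {
        final int x;
        final int y;

        Vec(int x, int y) {
            this.x = x;
            this.y = y;
        }

        // If this call is not inlined, the receiver is passed to an out-of-line
        // method and, per the description above, is marked ArgEscaped.
        int dot(Vec other) {
            return x * other.x + y * other.y;
        }
    }

    static int lengthSquared(int x, int y) {
        Vec v = new Vec(x, y); // candidate for scalar replacement if dot() is inlined
        return v.dot(v);
    }

    public static void main(String[] args) {
        System.out.println(lengthSquared(3, 4));
    }
}
```

Whether the allocation is actually eliminated depends on the inlining decisions C2 makes at runtime; the sketch only shows the shape of code that is affected.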

I wonder if we can use the methoddata of method bar in this context. Even though it is immature, I think it still makes sense to use its CounterData. 
As I pasted in the JBS issue, the C2 inliner can make a better judgement with it. Why do we have to ensure the methoddata is mature before parsing it in ciMethod::call_profile_at_bci?  
I tried loosening the condition and it passed tier1 tests on linux. Is it a better choice? 

Besides maturity, I have a follow-up question. In both should_inline() and should_not_inline(), the frequency heuristic is as follows:
freq = call_site_count / invoke_count of caller

E.g., freq is close to 100% if we only consider the callsite and its caller 'bar'. When C2 inlines a call, it actually inlines it into the compilation unit of the root method. 
Is it better to consider freq' = call_site_count / invoke_count of root? 
In my testcase, freq' is only 10% because its root is foo(). freq is a local frequency, but freq' is a global frequency. 
Have you tried this? If I would like to explore more, what kind of benchmarks should I evaluate? 
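
To make the two definitions concrete, here is a small sketch that computes both for the foo/bar/baz testcase. The invocation counts are assumed round numbers chosen to match the 10% branch, and the class and method names are hypothetical.

```java
// Hypothetical comparison of the two frequency definitions; not HotSpot code.
final class FrequencyScopeSketch {
    // freq = call_site_count / invoke_count of caller
    static double localFreq(int callSiteCount, int callerInvokeCount) {
        return (double) callSiteCount / callerInvokeCount;
    }

    // freq' = call_site_count / invoke_count of root
    static double globalFreq(int callSiteCount, int rootInvokeCount) {
        return (double) callSiteCount / rootInvokeCount;
    }

    public static void main(String[] args) {
        int fooInvocations = 3000;   // root method, assumed count
        int barInvocations = 300;    // reached through the 10% branch in foo()
        int bazCallSiteCount = 300;  // bar() always calls baz()

        System.out.println("freq  = " + localFreq(bazCallSiteCount, barInvocations));
        System.out.println("freq' = " + globalFreq(bazCallSiteCount, fooInvocations));
    }
}
```

With these assumed counts freq comes out at 100% and freq' at 10%, matching the numbers in the testcase description.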

[1] 3000 * 0.2

Thanks,
--lx






