C2: Advantage of parse time inlining

Krystal Mok rednaxelafx at gmail.com
Thu May 14 22:00:20 UTC 2015


Hi Vitaly,

The code I posted comes from should_not_inline(). It's a negative filter,
so returning true means "don't inline".
You can take a look at opto/bytecodeInfo.cpp. It's a lot to explain in
words, but very obvious from the code.

I'm not sure what your 49% means here.

In the positive filter, if the frequency of a call site is at least
InlineFrequencyRatio (=20; think of a call site in a loop that runs at least
20 times per invocation of this method), or the profile recorded that the
call site was called at least InlineFrequencyCount (=100 on x86) times, or
some other heuristic applies, the max_inline_size for this call site is
bumped from MaxInlineSize (=35) to FreqInlineSize (=325 on x86).
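
As a rough illustration of the positive filter, here is a minimal C++ sketch.
It is not the HotSpot source: max_inline_size_for() is a hypothetical helper,
the constants are the x86 defaults quoted in this thread, the guard against a
zero invoke_count is my own assumption, and the unboxing and is_init_with_ea
heuristics are left out.

```cpp
// Illustrative sketch only, not the HotSpot source.
// Constants mirror the x86 defaults quoted in this thread.
constexpr int MaxInlineSize        = 35;   // bytecode size limit for any call site
constexpr int FreqInlineSize       = 325;  // bytecode size limit for frequent call sites
constexpr int InlineFrequencyRatio = 20;   // call-site executions per caller invocation
constexpr int InlineFrequencyCount = 100;  // absolute call-site execution count

// Hypothetical helper: returns the bytecode size limit that applies to a
// call site, given the profiled call-site count and the caller's invocation
// count. The invoke_count > 0 guard is an assumption added for the sketch.
constexpr int max_inline_size_for(int call_site_count, int invoke_count) {
  return ((invoke_count > 0 &&
           call_site_count / invoke_count >= InlineFrequencyRatio) ||
          call_site_count >= InlineFrequencyCount)
             ? FreqInlineSize
             : MaxInlineSize;
}
```

With these values, max_inline_size_for(60, 3) gives 325 because 60 / 3 = 20,
while max_inline_size_for(99, 5) stays at 35 (the integer ratio is 19 and the
count is below 100).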

In the negative filter, the callee has to have been executed more than
MIN2(MinInliningThreshold, counter_high_value) times in order to be
considered a candidate for inlining. This is the invocation counter on the
callee side, not on the call site side.
In tiered mode that would be MIN2(250, 134217728) = 250. It doesn't matter
what CompileThreshold is set to in this case.
In non-tiered mode, it'd be MIN2(250, CompileThreshold/2); for
CompileThreshold=100 that's 50.
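
The arithmetic above can be sketched in C++ as follows. This is only an
illustration, not the HotSpot source: min2() and min_executions_to_inline()
are hypothetical names, and kCountLimit is simply twice the 134217728 figure
quoted above, standing in for InvocationCounter::count_limit.

```cpp
// Illustrative sketch only, not the HotSpot source.
constexpr int MinInliningThreshold = 250;
// Stand-in for InvocationCounter::count_limit: twice the 134217728 above.
constexpr int kCountLimit = 268435456;

// Hypothetical stand-in for HotSpot's MIN2 macro.
constexpr int min2(int a, int b) { return a < b ? a : b; }

// Hypothetical helper: the value handed to was_executed_more_than() for the
// callee, depending on whether tiered compilation is enabled.
constexpr int min_executions_to_inline(bool tiered, int compile_threshold) {
  return min2(MinInliningThreshold,
              (tiered ? kCountLimit : compile_threshold) / 2);
}
```

So min_executions_to_inline(false, 100) is 50, min_executions_to_inline(false,
10000) is 250, and with tiered on the result is 250 whatever CompileThreshold
is.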

- Kris

On Thu, May 14, 2015 at 2:35 PM, Vitaly Davidovich <vitalyd at gmail.com>
wrote:

> Thanks Kris.  Hmm, this sounds pretty bad for non-tiered compilations with
> a relatively low CompileThreshold.  If I have a (larger than MaxInlineSize)
> method executed 49% of the time, it'll inline at CompileThreshold=10k but
> not CompileThreshold=100.  Or am I missing something?
>
> On Thu, May 14, 2015 at 5:28 PM, Krystal Mok <rednaxelafx at gmail.com>
> wrote:
>
>> Yes and no.
>>
>>       intx counter_high_value;
>>       // Tiered compilation uses a different "high value" than non-tiered compilation.
>>       // Determine the right value to use.
>>       if (TieredCompilation) {
>>         counter_high_value = InvocationCounter::count_limit / 2;
>>       } else {
>>         counter_high_value = CompileThreshold / 2;
>>       }
>>       if (!callee_method->was_executed_more_than(MIN2(MinInliningThreshold, counter_high_value))) {
>>         set_msg("executed < MinInliningThreshold times");
>>         return true;
>>       }
>>
>> So it's not scaling MinInliningThreshold directly, but rather using a min
>> of MinInliningThreshold and counter_high_value (where the latter is
>> calculated from CompileThreshold when not using tiered compilation) to make
>> the actual decision.
>>
>> Because tiered is on by default now, the short answer to your question
>> would probably be a "no".
>>
>> - Kris
>>
>> On Thu, May 14, 2015 at 1:01 PM, Vitaly Davidovich <vitalyd at gmail.com>
>> wrote:
>>
>>> Right, thank you.
>>>
>>> Is MinInliningThreshold scaled with XX:CompileThreshold=XXX values? Say
>>> I turn CompileThreshold down to 100 (as an example).
>>>
>>> On Thu, May 14, 2015 at 3:20 PM, Vladimir Kozlov <
>>> vladimir.kozlov at oracle.com> wrote:
>>>
>>>> On 5/14/15 12:02 PM, Vitaly Davidovich wrote:
>>>>
>>>>> Thanks Vladimir.  I recall seeing changes around incremental inlining,
>>>>> and may have mistakenly thought it happens at some later point in time.
>>>>> Appreciate the clarification.
>>>>>
>>>>> Ok, so based on what you say, I can see a theoretical problem whereby a
>>>>> method is being parsed, is larger than MaxInlineSize, but doesn't happen
>>>>> to be frequent enough yet at this point, and so it won't be inlined; if
>>>>> it turns out to be hot later on, the lack of inlining will not be undone
>>>>> (assuming the caller isn't deopted and recompiled later, with updated
>>>>> frequency info, for other reasons).
>>>>>
>>>>
>>>> That is correct.
>>>>
>>>> Note that the problem is not how hot the callee is (invocation count) but
>>>> how hot the call site is in the caller. Usually that does not change during
>>>> execution. If it is called in a loop, C2 will try to inline it because the
>>>> frequency should be high.
>>>>
>>>> There is MinInliningThreshold (250), but since we compile the caller when
>>>> it has been executed 10000 times, such a call site should be on a slow path
>>>> that is executed only 2.5% of the time. So you will not see a performance
>>>> difference whether we inline it or call the callee, which is compiled if it
>>>> is hot.
>>>>
>>>> Vladimir
>>>>
>>>>
>>>>> Thanks again.
>>>>>
>>>>> On Thu, May 14, 2015 at 2:30 PM, Vladimir Kozlov
>>>>> <vladimir.kozlov at oracle.com> wrote:
>>>>>
>>>>>     Vitaly,
>>>>>
>>>>>     You have a small misconception: almost all C2 inlining (of normal Java
>>>>>     methods) is done during parsing, in one pass. Recently we changed it
>>>>>     to inline jsr292 methods after parsing (to execute IGVN and reduce the
>>>>>     graph; otherwise they blow up the number of ideal nodes and we bail out
>>>>>     of compilation due to MaxNodeLimit).
>>>>>
>>>>>     As the parser goes along and sees a call site, it checks whether it can
>>>>>     inline it or not (see opto/bytecodeInfo.cpp, should_inline() and
>>>>>     should_not_inline()). There are several conditions which drive
>>>>>     inlining, and the following (most important) flags control them:
>>>>>     MaxTrivialSize, MaxInlineSize, FreqInlineSize, InlineSmallCode,
>>>>>     MinInliningThreshold, MaxInlineLevel, InlineFrequencyCount,
>>>>>     InlineFrequencyRatio.
>>>>>
>>>>>     The most tedious flag is InlineSmallCode (the size of compiled
>>>>>     assembler code, unlike the other sizes, which are bytecode sizes),
>>>>>     which controls inlining of an already compiled method. Usually, if a
>>>>>     call site is hot, the callee is compiled before the caller. But
>>>>>     sometimes the caller gets compiled first (if it has a hot loop, for
>>>>>     example), so a different condition is used and as a result you can get
>>>>>     performance variation between runs.
>>>>>
>>>>>     The difference between MaxInlineSize (35) and FreqInlineSize (325)
>>>>>     is that FreqInlineSize takes into account how frequently the call site
>>>>>     is executed relative to caller invocations:
>>>>>
>>>>>        int call_site_count  = method()->scale_count(profile.count());
>>>>>        int invoke_count     = method()->interpreter_invocation_count();
>>>>>        int freq = call_site_count / invoke_count;
>>>>>        int max_inline_size  = MaxInlineSize;
>>>>>        // bump the max size if the call is frequent
>>>>>        if ((freq >= InlineFrequencyRatio) ||
>>>>>            (call_site_count >= InlineFrequencyCount) ||
>>>>>            is_unboxing_method(callee_method, C) ||
>>>>>            is_init_with_ea(callee_method, caller_method, C)) {
>>>>>          max_inline_size = FreqInlineSize;
>>>>>
>>>>>     And there is an additional inlining condition for all methods whose
>>>>>     size > MaxTrivialSize:
>>>>>
>>>>>        if (!callee_method->was_executed_more_than(MinInliningThreshold)) {
>>>>>          set_msg("executed < MinInliningThreshold times");
>>>>>
>>>>>     Regards,
>>>>>     Vladimir
>>>>>
>>>>>     On 5/14/15 10:03 AM, Vitaly Davidovich wrote:
>>>>>
>>>>>         I should also add that I see how inlining without taking call freq
>>>>>         into account could lead to faster time to peak performance for
>>>>>         methods that eventually get hot anyway but aren't at parse time.
>>>>>         Peak perf will be the same if the method is too big for parse
>>>>>         inlining but eventually gets compiled due to reaching hotness.  Is
>>>>>         that about right?
>>>>>
>>>>>         sent from my phone
>>>>>
>>>>>         On May 14, 2015 12:57 PM, "Vitaly Davidovich"
>>>>>         <vitalyd at gmail.com> wrote:
>>>>>
>>>>>              Vladimir,
>>>>>
>>>>>              I'm comparing MaxInlineSize (35) with FreqInlineSize (325).
>>>>>              AFAIU, MaxInlineSize drives which methods are inlined at parse
>>>>>              time by C2, whereas FreqInlineSize is the threshold for "late"
>>>>>              (or what do you guys call inlining after parsing?) inlining.
>>>>>              Most of the inlining discussions (or worries, rather) seem to
>>>>>              focus around the MaxInlineSize value, and not FreqInlineSize,
>>>>>              even if the target method will get hot.
>>>>>
>>>>>                  Usually, people care about 35 (= MaxInlineSize), because
>>>>>                  for methods up to MaxInlineSize their call frequency is
>>>>>                  ignored. So, fewer chances to end up with non-inlined call.
>>>>>
>>>>>
>>>>>              Ok, so for hot methods then MaxInlineSize isn't really a
>>>>>              concern, and FreqInlineSize would be the threshold to worry
>>>>>              about (for C2 compiler) then? Why are people worried about
>>>>>              inlining in cold paths then?
>>>>>
>>>>>              Thanks Vladimir
>>>>>
>>>>>              On Thu, May 14, 2015 at 12:36 PM, Vladimir Ivanov
>>>>>              <vladimir.x.ivanov at oracle.com> wrote:
>>>>>
>>>>>                  Vitaly,
>>>>>
>>>>>                  Can you elaborate your question a bit? What do you compare
>>>>>                  parse-time inlining with? Mentioning of C1 & profile
>>>>>                  pollution in this context confuses me.
>>>>>
>>>>>                  Usually, people care about 35 (= MaxInlineSize), because
>>>>>                  for methods up to MaxInlineSize their call frequency is
>>>>>                  ignored. So, fewer chances to end up with non-inlined call.
>>>>>
>>>>>                  Best regards,
>>>>>                  Vladimir Ivanov
>>>>>
>>>>>                  On 5/14/15 7:09 PM, Vitaly Davidovich wrote:
>>>>>
>>>>>                      Any pointers? Sorry to bug you guys, but just want to
>>>>>                      make sure I understand this point as I see quite a bit
>>>>>                      of discussion on core-libs and elsewhere where people
>>>>>                      are worrying about the 35 bytecode size threshold for
>>>>>                      parse inlining.
>>>>>
>>>>>                      On Wed, May 13, 2015 at 3:36 PM, Vitaly Davidovich
>>>>>                      <vitalyd at gmail.com> wrote:
>>>>>
>>>>>                           Hi guys,
>>>>>
>>>>>                           Could someone please explain the advantage, if
>>>>>                           any, of parse time inlining in C2? Given that
>>>>>                           FreqInlineSize is quite large by default, most hot
>>>>>                           methods will get inlined anyway (well, ones that
>>>>>                           can be for other reasons).  What is the advantage
>>>>>                           of parse time inlining?
>>>>>
>>>>>                           Is it quicker time to peak performance if C1 is
>>>>>                           reached first?
>>>>>
>>>>>                           Does it ensure that a method is inlined whereas it
>>>>>                           may not be if it's already compiled into a
>>>>>                           medium/large method otherwise?
>>>>>
>>>>>                           Is parse time inlining not susceptible to profile
>>>>>                           pollution? I suspect it is since the interpreter
>>>>>                           has already profiled the inlinee either way, but
>>>>>                           wanted to check.
>>>>>
>>>>>                           Anything else I'm not thinking about?
>>>>>
>>>>>                           Thanks
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>
>>
>