C2: Advantage of parse time inlining

Thu May 14 21:35:21 UTC 2015

Thanks Kris.  Hmm, this sounds pretty bad for non-tiered compilations with
a relatively low CompileThreshold.  If I have a (larger than MaxInlineSize)
method executed 49% of the time, it'll inline at CompileThreshold=10k but
not CompileThreshold=100.  Or am I missing something?
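
To make that arithmetic concrete, here is a rough standalone sketch of the
non-tiered branch of the check Kris quoted below. It is not HotSpot code: the
function names are made up, was_executed_more_than(n) is modelled as a plain
"count > n" (an assumption), and 250 is the MinInliningThreshold default
Vladimir mentioned.

```cpp
// Hypothetical model of the non-tiered check; names are illustrative only.
// Effective limit = MIN2(MinInliningThreshold, CompileThreshold / 2).
constexpr long effective_min_inlining_threshold(
    long compile_threshold, long min_inlining_threshold = 250) {
  return (min_inlining_threshold < compile_threshold / 2)
             ? min_inlining_threshold
             : compile_threshold / 2;
}

// Assumption: was_executed_more_than(n) behaves like a strict "count > n".
constexpr bool passes_min_inlining_check(long callee_count,
                                         long compile_threshold) {
  return callee_count > effective_min_inlining_threshold(compile_threshold);
}

// Callee reached on 49% of caller invocations, caller compiled right at
// CompileThreshold invocations.
static_assert(passes_min_inlining_check(4900, 10000),
              "CompileThreshold=10000: 4900 > MIN2(250, 5000)");
static_assert(!passes_min_inlining_check(49, 100),
              "CompileThreshold=100: 49 is not > MIN2(250, 50)");
```

If that model is right, the same 49% callee passes the check with a count of
4900 at CompileThreshold=10000 (limit 250) but fails it with a count of 49 at
CompileThreshold=100 (limit 50).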

On Thu, May 14, 2015 at 5:28 PM, Krystal Mok <rednaxelafx at gmail.com> wrote:

> Yes and no.
>
>       intx counter_high_value;
>       // Tiered compilation uses a different "high value" than non-tiered
>       // compilation.
>       // Determine the right value to use.
>       if (TieredCompilation) {
>         counter_high_value = InvocationCounter::count_limit / 2;
>       } else {
>         counter_high_value = CompileThreshold / 2;
>       }
>       if (!callee_method->was_executed_more_than(MIN2(MinInliningThreshold,
>                                                       counter_high_value))) {
>         set_msg("executed < MinInliningThreshold times");
>         return true;
>       }
>
> So it's not scaling MinInliningThreshold directly, but rather using a min
> of MinInliningThreshold and counter_high_value (where the latter is
> calculated from CompileThreshold when not using tiered compilation) to make
> the actual decision.
>
> Because tiered is on by default now, the short answer to your question
> would probably be a "no".
>
> - Kris
>
> On Thu, May 14, 2015 at 1:01 PM, Vitaly Davidovich <vitalyd at gmail.com>
> wrote:
>
>> Right, thank you.
>>
>> Is MinInliningThreshold scaled with -XX:CompileThreshold=XXX values? Say I
>> turn CompileThreshold down to 100 (as an example).
>>
>> On Thu, May 14, 2015 at 3:20 PM, Vladimir Kozlov <
>> vladimir.kozlov at oracle.com> wrote:
>>
>>> On 5/14/15 12:02 PM, Vitaly Davidovich wrote:
>>>
>>>> Thanks Vladimir.  I recall seeing changes around incremental inlining,
>>>> and may have mistakenly thought it happens at some later point in time.
>>>> Appreciate the clarification.
>>>>
>>>> Ok, so based on what you say, I can see a theoretical problem whereby a
>>>> method is being parsed, is larger than MaxInlineSize, but doesn't happen
>>>> to be frequent enough yet at this point, and so it won't be inlined; if
>>>> it turns out to be hot later on, the lack of inlining will not be undone
>>>> (assuming the caller isn't deopted and recompiled later, with updated
>>>> frequency info, for other reasons).
>>>>
>>>
>>> That is correct.
>>>
>>> Note that the problem is not how hot the callee is (its invocation count)
>>> but how hot the call site in the caller is. Usually that does not change
>>> during execution. If the callee is called in a loop, C2 will try to inline
>>> it because freq should be high.
>>>
>>> There is MinInliningThreshold (250), but since we compile the caller when
>>> it has been executed 10000 times, a call site that fails that check must
>>> be on a slow path executed at most 2.5% of the time. So you will not see a
>>> performance difference between inlining it and calling the callee, which
>>> is compiled if it is hot.
>>>
>>> Vladimir
>>>
>>>
>>>> Thanks again.
>>>>
>>>> On Thu, May 14, 2015 at 2:30 PM, Vladimir Kozlov
>>>> <vladimir.kozlov at oracle.com> wrote:
>>>>
>>>>     Vitaly,
>>>>
>>>>     You have a small misconception: almost all C2 inlining (normal Java
>>>>     methods) is done during parsing, in one pass. Recently we changed it
>>>>     to inline jsr292 methods after parsing (to execute IGVN and reduce
>>>>     the graph; otherwise they blow up the number of ideal nodes and we
>>>>     bail out of compilation due to MaxNodeLimit).
>>>>
>>>>     As the parser goes along and sees a call site, it checks whether it
>>>>     can inline it (see opto/bytecodeInfo.cpp, should_inline() and
>>>>     should_not_inline()). Several conditions drive inlining, and the
>>>>     following (most important) flags control them:
>>>>     MaxTrivialSize, MaxInlineSize, FreqInlineSize, InlineSmallCode,
>>>>     MinInliningThreshold, MaxInlineLevel, InlineFrequencyCount,
>>>>     InlineFrequencyRatio.
>>>>
>>>>     The most tedious flag is InlineSmallCode, which controls inlining of
>>>>     an already compiled method (it is the size of the compiled assembler
>>>>     code, unlike the other sizes, which are bytecode sizes). Usually, if
>>>>     a call site is hot, the callee is compiled before the caller. But
>>>>     sometimes the caller gets compiled first (if it has a hot loop, for
>>>>     example), so a different condition is used, and as a result you can
>>>>     get performance variation between runs.
>>>>
>>>>     The difference between MaxInlineSize (35) and FreqInlineSize (325)
>>>>     is that FreqInlineSize takes into account how frequently the call
>>>>     site is executed relative to caller invocations:
>>>>
>>>>        int call_site_count  = method()->scale_count(profile.count());
>>>>        int invoke_count     = method()->interpreter_invocation_count();
>>>>        int freq = call_site_count / invoke_count;
>>>>        int max_inline_size  = MaxInlineSize;
>>>>        // bump the max size if the call is frequent
>>>>        if ((freq >= InlineFrequencyRatio) ||
>>>>            (call_site_count >= InlineFrequencyCount) ||
>>>>            is_unboxing_method(callee_method, C) ||
>>>>            is_init_with_ea(callee_method, caller_method, C)) {
>>>>          max_inline_size = FreqInlineSize;
>>>>
>>>>     And there is an additional inlining condition for all methods whose
>>>>     size > MaxTrivialSize:
>>>>
>>>>        if (!callee_method->was_executed_more_than(MinInliningThreshold)) {
>>>>          set_msg("executed < MinInliningThreshold times");
>>>>
>>>>     Regards,
>>>>     Vladimir
>>>>
>>>>     On 5/14/15 10:03 AM, Vitaly Davidovich wrote:
>>>>
>>>>         I should also add that I see how inlining without taking call
>>>>         freq into account could lead to faster time to peak performance
>>>>         for methods that eventually get hot anyway but aren't at parse
>>>>         time.  Peak perf will be the same if the method is too big for
>>>>         parse inlining but eventually gets compiled due to reaching
>>>>         hotness.  Is that about right?
>>>>
>>>>         sent from my phone
>>>>
>>>>         On May 14, 2015 12:57 PM, "Vitaly Davidovich"
>>>>         <vitalyd at gmail.com> wrote:
>>>>
>>>>              Vladimir,
>>>>
>>>>              I'm comparing MaxInlineSize (35) with FreqInlineSize (325).
>>>>              AFAIU, MaxInlineSize drives which methods are inlined at
>>>>              parse time by C2, whereas FreqInlineSize is the threshold
>>>>              for "late" (or what do you guys call inlining after
>>>>              parsing?) inlining.  Most of the inlining discussions (or
>>>>              worries, rather) seem to focus around the MaxInlineSize
>>>>              value, and not FreqInlineSize, even if the target method
>>>>              will get hot.
>>>>
>>>>                  Usually, people care about 35 (= MaxInlineSize), because
>>>>                  for methods up to MaxInlineSize their call frequency is
>>>>                  ignored. So, fewer chances to end up with non-inlined
>>>>                  call.
>>>>
>>>>
>>>>              Ok, so for hot methods then MaxInlineSize isn't really a
>>>>              concern, and FreqInlineSize would be the threshold to worry
>>>>              about (for C2 compiler) then? Why are people worried about
>>>>              inlining in cold paths then?
>>>>
>>>>              Thanks Vladimir
>>>>
>>>>              On Thu, May 14, 2015 at 12:36 PM, Vladimir Ivanov
>>>>              <vladimir.x.ivanov at oracle.com> wrote:
>>>>
>>>>                  Vitaly,
>>>>
>>>>                  Can you elaborate on your question a bit? What do you
>>>>                  compare parse-time inlining with? The mention of C1 &
>>>>                  profile pollution in this context confuses me.
>>>>
>>>>                  Usually, people care about 35 (= MaxInlineSize), because
>>>>                  for methods up to MaxInlineSize their call frequency is
>>>>                  ignored. So, fewer chances to end up with non-inlined
>>>>                  call.
>>>>
>>>>                  Best regards,
>>>>                  Vladimir Ivanov
>>>>
>>>>                  On 5/14/15 7:09 PM, Vitaly Davidovich wrote:
>>>>
>>>>                      Any pointers? Sorry to bug you guys, but just want to
>>>>                      make sure I understand this point as I see quite a
>>>>                      bit of discussion on core-libs and elsewhere where
>>>>                      people are worrying about the 35 bytecode size
>>>>                      threshold for parse inlining.
>>>>
>>>>                      On Wed, May 13, 2015 at 3:36 PM, Vitaly Davidovich
>>>>                      <vitalyd at gmail.com> wrote:
>>>>
>>>>                           Hi guys,
>>>>
>>>>                           Could someone please explain the advantage, if
>>>>                           any, of parse time inlining in C2? Given that
>>>>                           FreqInlineSize is quite large by default, most
>>>>                           hot methods will get inlined anyway (well, ones
>>>>                           that can be for other reasons).  What is the
>>>>                           advantage of parse time inlining?
>>>>
>>>>                           Is it quicker time to peak performance if C1 is
>>>>                           reached first?
>>>>
>>>>                           Does it ensure that a method is inlined whereas
>>>>                           it may not be if it's already compiled into a
>>>>                           medium/large method otherwise?
>>>>
>>>>                           Is parse time inlining not susceptible to
>>>>                           profile pollution? I suspect it is since the
>>>>                           interpreter has already profiled the inlinee
>>>>                           either way, but wanted to check.
>>>>
>>>>                           Anything else I'm not thinking about?
>>>>
>>>>                           Thanks
>>>>
>>>>
>>>>
>>>>
>>>>
>>
>
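
For reference, the size-limit selection Vladimir Kozlov quoted above can be
sketched in isolation as below. This is a simplified model, not HotSpot code:
max_inline_size_for is a made-up name, the is_unboxing_method() and
is_init_with_ea() conditions are left out, the zero guard on invoke_count is
added here, and the InlineFrequencyRatio (20) and InlineFrequencyCount (100)
values are assumed defaults that vary by platform and release. Only 35 and
325 come from this thread.

```cpp
// Values from this thread.
constexpr int kMaxInlineSize = 35;
constexpr int kFreqInlineSize = 325;
// Assumed defaults, not stated in this thread.
constexpr int kInlineFrequencyRatio = 20;
constexpr int kInlineFrequencyCount = 100;

// Hypothetical helper: picks the bytecode size limit for one call site.
// freq uses integer division, as in the quoted snippet.
constexpr int max_inline_size_for(int call_site_count, int invoke_count) {
  return ((invoke_count > 0 &&
           call_site_count / invoke_count >= kInlineFrequencyRatio) ||
          call_site_count >= kInlineFrequencyCount)
             ? kFreqInlineSize
             : kMaxInlineSize;
}

// Call site in a loop: 60 calls over 2 caller invocations gives freq 30.
static_assert(max_inline_size_for(60, 2) == kFreqInlineSize, "hot by ratio");
// Rarely taken call site: 50 calls over 10000 caller invocations.
static_assert(max_inline_size_for(50, 10000) == kMaxInlineSize, "cold site");
```

Under these assumptions, a call site only has to reach the absolute count or
the per-invocation ratio to get the larger 325-bytecode limit; everything
else is held to 35.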