C2: Advantage of parse time inlining

Thu May 14 20:01:29 UTC 2015

Right, thank you.

Is MinInliningThreshold scaled with -XX:CompileThreshold=XXX values? Say I
turn CompileThreshold down to 100 (as an example).

On Thu, May 14, 2015 at 3:20 PM, Vladimir Kozlov
<vladimir.kozlov at oracle.com> wrote:

> On 5/14/15 12:02 PM, Vitaly Davidovich wrote:
>
>> Thanks Vladimir.  I recall seeing changes around incremental inlining,
>> and may have mistakenly thought it happens at some later point in time.
>> Appreciate the clarification.
>>
>> Ok, so based on what you say, I can see a theoretical problem whereby a
>> method is being parsed, is larger than MaxInlineSize, but doesn't happen
>> to be frequent enough yet at this point, and so it won't be inlined; if
>> it turns out to be hot later on, the lack of inlining will not be undone
>> (assuming the caller isn't deopted and recompiled later, with updated
>> frequency info, for other reasons).
>>
>
> That is correct.
>
> Note that the problem is not how hot the callee is (its invocation count) but
> how hot the call site is in the caller. Usually that does not change during
> execution. If the callee is called in a loop, C2 will try to inline it because
> freq should be high.
>
> There is MinInliningThreshold (250), but since we compile the caller when it
> has been executed 10000 times, such a call site must be on a slow path that is
> executed at most 2.5% of the time. So you will not see a performance
> difference whether we inline it or call the callee, which is compiled if it
> is hot.
>
> Vladimir
>
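
The 2.5% figure above follows from dividing the two numbers in the message. A minimal sketch of that arithmetic, assuming the callee is reached only through this one call site; the helper name and constants below are illustrative and are not HotSpot code:

```cpp
#include <cassert>

// Values quoted in the message above, treated here as plain constants.
constexpr int kMinInliningThreshold = 250;    // callee executions needed to pass the check
constexpr int kCallerCompileCount   = 10000;  // caller invocations before C2 compiles it

// Hypothetical helper: the highest share of caller invocations (in percent)
// that can reach the call site while the callee is still at or below the
// threshold.
double max_cold_call_site_percent(int min_inlining_threshold, int caller_invocations) {
  assert(caller_invocations > 0);
  return 100.0 * min_inlining_threshold / caller_invocations;
}
```

With the quoted values, max_cold_call_site_percent(kMinInliningThreshold, kCallerCompileCount) evaluates to 2.5.
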
>
>> Thanks again.
>>
>> On Thu, May 14, 2015 at 2:30 PM, Vladimir Kozlov
>> <vladimir.kozlov at oracle.com> wrote:
>>
>>     Vitaly,
>>
>>     You have a small misconception: almost all C2 inlining (of normal Java
>>     methods) is done during parsing, in one pass. Recently we changed it
>>     to inline jsr292 methods after parsing (to run IGVN and reduce the
>>     graph; otherwise they blow up the number of ideal nodes and we bail
>>     out of compilation due to MaxNodeLimit).
>>
>>     As the parser goes along and sees a call site, it checks whether it
>>     can inline or not (see opto/bytecodeinfo.cpp, should_inline() and
>>     should_not_inline()). Several conditions drive inlining, and the
>>     following (most important) flags control them:
>>     MaxTrivialSize, MaxInlineSize, FreqInlineSize, InlineSmallCode,
>>     MinInliningThreshold, MaxInlineLevel, InlineFrequencyCount,
>>     InlineFrequencyRatio.
>>
>>     The most tedious flag is InlineSmallCode (the size of compiled
>>     assembler code, unlike the other sizes, which are bytecode sizes),
>>     which controls inlining of an already compiled method. Usually, if a
>>     call site is hot, the callee is compiled before the caller. But
>>     sometimes the caller gets compiled first (if it has a hot loop, for
>>     example), so a different condition will be used and, as a result, you
>>     can get performance variation between runs.
>>
>>     The difference between MaxInlineSize (35) and FreqInlineSize (325)
>>     is that FreqInlineSize takes into account how frequently the call
>>     site is executed relative to caller invocations:
>>
>>        int call_site_count  = method()->scale_count(profile.count());
>>        int invoke_count     = method()->interpreter_invocation_count();
>>        int freq = call_site_count / invoke_count;
>>        int max_inline_size  = MaxInlineSize;
>>        // bump the max size if the call is frequent
>>        if ((freq >= InlineFrequencyRatio) ||
>>            (call_site_count >= InlineFrequencyCount) ||
>>            is_unboxing_method(callee_method, C) ||
>>            is_init_with_ea(callee_method, caller_method, C)) {
>>          max_inline_size = FreqInlineSize;
>>
>>     And there is an additional inlining condition for all methods whose
>>     size > MaxTrivialSize:
>>
>>        if (!callee_method->was_executed_more_than(MinInliningThreshold)) {
>>          set_msg("executed < MinInliningThreshold times");
>>
>>     Regards,
>>     Vladimir
>>
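
The two excerpts above can be combined into a small standalone sketch. It assumes simplified stand-in types (InlineFlags and CallSite are hypothetical, not HotSpot classes), leaves out the unboxing and is_init_with_ea cases as well as InlineSmallCode and MaxInlineLevel, and does not claim to follow the real order of checks in bytecodeinfo.cpp. MaxInlineSize, FreqInlineSize and MinInliningThreshold use the values quoted in this thread; the other three values are placeholders for illustration only.

```cpp
#include <cassert>

// Flag values. Fields marked "placeholder" are illustrative assumptions,
// not values taken from this thread.
struct InlineFlags {
  int max_trivial_size       = 6;    // placeholder
  int max_inline_size        = 35;
  int freq_inline_size       = 325;
  int inline_frequency_ratio = 20;   // placeholder
  int inline_frequency_count = 100;  // placeholder
  int min_inlining_threshold = 250;
};

// Hypothetical stand-in for the profile data available at a call site.
struct CallSite {
  int callee_bytecode_size;
  int call_site_count;      // how often this call site was executed
  int caller_invoke_count;  // how often the caller was invoked
  int callee_invoke_count;  // how often the callee was executed
};

// Size limit for this call site: MaxInlineSize, raised to FreqInlineSize
// when the call site is frequent.
int size_limit(const CallSite& s, const InlineFlags& f) {
  assert(s.caller_invoke_count > 0);
  int freq = s.call_site_count / s.caller_invoke_count;  // integer division, as in the excerpt
  bool frequent = freq >= f.inline_frequency_ratio ||
                  s.call_site_count >= f.inline_frequency_count;
  return frequent ? f.freq_inline_size : f.max_inline_size;
}

// Simplified decision covering only the two conditions quoted above.
bool would_inline(const CallSite& s, const InlineFlags& f) {
  // Callee is larger than the limit that applies to this call site.
  if (s.callee_bytecode_size > size_limit(s, f)) return false;
  // Non-trivial callee that was not executed more than MinInliningThreshold times.
  if (s.callee_bytecode_size > f.max_trivial_size &&
      s.callee_invoke_count <= f.min_inlining_threshold) return false;
  return true;
}
```

Under these assumptions, a 100-bytecode callee is rejected at a call site executed once per 10000 caller invocations (limit 35), but accepted when the call site count reaches 5000 (limit 325) and the callee itself has been executed more than 250 times.
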
>>     On 5/14/15 10:03 AM, Vitaly Davidovich wrote:
>>
>>         I should also add that I see how inlining without taking call freq
>>         into account could lead to faster time to peak performance for
>>         methods that eventually get hot anyway but aren't at parse time.
>>         Peak perf will be the same if the method is too big for parse
>>         inlining but eventually gets compiled due to reaching hotness.  Is
>>         that about right?
>>
>>         sent from my phone
>>
>>         On May 14, 2015 12:57 PM, "Vitaly Davidovich" <vitalyd at gmail.com>
>>         wrote:
>>
>>              Vladimir,
>>
>>              I'm comparing MaxInlineSize (35) with FreqInlineSize (325).
>>              AFAIU, MaxInlineSize drives which methods are inlined at parse
>>              time by C2, whereas FreqInlineSize is the threshold for "late"
>>              (or what do you guys call inlining after parsing?) inlining.
>>              Most of the inlining discussions (or worries, rather) seem to
>>              focus around the MaxInlineSize value, and not FreqInlineSize,
>>              even if the target method will get hot.
>>
>>                  Usually, people care about 35 (= MaxInlineSize), because
>>                  for methods up to MaxInlineSize their call frequency is
>>                  ignored. So, fewer chances to end up with non-inlined call.
>>
>>
>>              Ok, so for hot methods then MaxInlineSize isn't really a
>>              concern, and FreqInlineSize would be the threshold to worry
>>              about (for C2 compiler) then? Why are people worried about
>>              inlining in cold paths then?
>>
>>              Thanks Vladimir
>>
>>              On Thu, May 14, 2015 at 12:36 PM, Vladimir Ivanov
>>              <vladimir.x.ivanov at oracle.com> wrote:
>>
>>                  Vitaly,
>>
>>                  Can you elaborate on your question a bit? What do you
>>                  compare parse-time inlining with? Mentioning C1 & profile
>>                  pollution in this context confuses me.
>>
>>                  Usually, people care about 35 (= MaxInlineSize), because
>>                  for methods up to MaxInlineSize their call frequency is
>>                  ignored. So, fewer chances to end up with non-inlined call.
>>
>>                  Best regards,
>>                  Vladimir Ivanov
>>
>>                  On 5/14/15 7:09 PM, Vitaly Davidovich wrote:
>>
>>                      Any pointers? Sorry to bug you guys, but just want to
>>                      make sure I understand this point as I see quite a bit
>>                      of discussion on core-libs and elsewhere where people
>>                      are worrying about the 35 bytecode size threshold for
>>                      parse inlining.
>>
>>                      On Wed, May 13, 2015 at 3:36 PM, Vitaly Davidovich
>>                      <vitalyd at gmail.com> wrote:
>>
>>                           Hi guys,
>>
>>                           Could someone please explain the advantage, if
>>                           any, of parse time inlining in C2? Given that
>>                           FreqInlineSize is quite large by default, most hot
>>                           methods will get inlined anyway (well, ones that
>>                           can be for other reasons).  What is the advantage
>>                           of parse time inlining?
>>
>>                           Is it quicker time to peak performance if C1 is
>>                           reached first?
>>
>>                           Does it ensure that a method is inlined whereas it
>>                           may not be if it's already compiled into a
>>                           medium/large method otherwise?
>>
>>                           Is parse time inlining not susceptible to profile
>>                           pollution? I suspect it is since the interpreter
>>                           has already profiled the inlinee either way, but
>>                           wanted to check.
>>
>>                           Anything else I'm not thinking about?
>>
>>                           Thanks
>>
>>
>>
>>
>>