C2: Advantage of parse time inlining

Thu May 14 21:50:30 UTC 2015

On 5/14/15 1:01 PM, Vitaly Davidovich wrote:
> Right, thank you.
>
> Is MinInliningThreshold scaled with -XX:CompileThreshold=XXX values? Say
> I turn CompileThreshold down to 100 (as an example).

No, it does not scale.

Note that with tiered compilation, C1 compilation is triggered after 100
invocations, so you will get compiled code (not as optimal as C2's) very early.

Vladimir
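
The 2.5% figure quoted below is simply the ratio of the two thresholds. As a
minimal sketch of that arithmetic (the function name is hypothetical and not
part of HotSpot, and it assumes the callee is reached only through this one
caller):

```cpp
// Illustrative helper, not HotSpot code.
// Returns the largest average number of callee executions per caller
// invocation that still leaves the callee at or below MinInliningThreshold
// at the moment the caller is compiled.
double max_cold_site_fraction(int min_inlining_threshold,
                              int caller_invocations_at_compile) {
    return static_cast<double>(min_inlining_threshold) /
           caller_invocations_at_compile;
}
```

With the values from the thread, max_cold_site_fraction(250, 10000) is 0.025,
i.e. 2.5%. Because MinInliningThreshold does not scale, the same call with a
caller compiled after 100 invocations gives 2.5, so under this simplified
model a callee would have to run more than 2.5 times per caller invocation to
pass the check.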

>
> On Thu, May 14, 2015 at 3:20 PM, Vladimir Kozlov
> <vladimir.kozlov at oracle.com> wrote:
>
>     On 5/14/15 12:02 PM, Vitaly Davidovich wrote:
>
>         Thanks Vladimir.  I recall seeing changes around incremental
>         inlining, and may have mistakenly thought it happens at some
>         later point in time.  Appreciate the clarification.
>
>         Ok, so based on what you say, I can see a theoretical problem
>         whereby a method is being parsed, is larger than MaxInlineSize,
>         but doesn't happen to be frequent enough yet at this point, and
>         so it won't be inlined; if it turns out to be hot later on, the
>         lack of inlining will not be undone (assuming the caller isn't
>         deopted and recompiled later, with updated frequency info, for
>         other reasons).
>
>
>     That is correct.
>
>     Note that the problem is not how hot the callee is (invocation
>     count) but how hot the call site in the caller is. Usually that
>     does not change during execution. If the callee is called in a
>     loop, C2 will try to inline it because freq should be high.
>
>     There is MinInliningThreshold (250), but since we compile the
>     caller when it has been executed 10000 times, such a call site
>     must be on a slow path which is executed only 2.5% of the time
>     (250 / 10000). So you will not see a performance difference
>     between inlining it and calling the callee, which is compiled if
>     it is hot.
>
>     Vladimir
>
>
>         Thanks again.
>
>         On Thu, May 14, 2015 at 2:30 PM, Vladimir Kozlov
>         <vladimir.kozlov at oracle.com> wrote:
>
>              Vitaly,
>
>              You have a small misconception: almost all C2 inlining
>              (normal Java methods) is done during parsing, in one pass.
>              Recently we changed it to inline jsr292 methods after
>              parsing (to execute IGVN and reduce the graph; otherwise
>              they blow up the number of ideal nodes and we bail out of
>              compilation due to MaxNodeLimit).
>
>              As the parser goes along and sees a call site, it checks
>              whether it can inline it or not (see opto/bytecodeinfo.cpp,
>              should_inline() and should_not_inline()). Several
>              conditions drive inlining, and the following (most
>              important) flags control them:
>              MaxTrivialSize, MaxInlineSize, FreqInlineSize, InlineSmallCode,
>              MinInliningThreshold, MaxInlineLevel, InlineFrequencyCount,
>              InlineFrequencyRatio.
>
>              The most tedious flag is InlineSmallCode (the size of
>              compiled assembler code, unlike the other sizes, which are
>              bytecode sizes), which controls inlining of an already
>              compiled method. Usually, if a call site is hot, the callee
>              is compiled before the caller. But sometimes the caller
>              gets compiled first (if it has a hot loop, for example), so
>              a different condition is used, and as a result you can get
>              performance variation between runs.
>
>              The difference between MaxInlineSize (35) and
>              FreqInlineSize (325) is that FreqInlineSize takes into
>              account how frequently the call site is executed relative
>              to caller invocations:
>
>                 int call_site_count  = method()->scale_count(profile.count());
>                 int invoke_count     = method()->interpreter_invocation_count();
>                 int freq = call_site_count / invoke_count;
>                 int max_inline_size  = MaxInlineSize;
>                 // bump the max size if the call is frequent
>                 if ((freq >= InlineFrequencyRatio) ||
>                     (call_site_count >= InlineFrequencyCount) ||
>                     is_unboxing_method(callee_method, C) ||
>                     is_init_with_ea(callee_method, caller_method, C)) {
>                   max_inline_size = FreqInlineSize;
>                 }
>
>              And there is an additional inlining condition for all
>              methods whose size > MaxTrivialSize:
>
>                 if (!callee_method->was_executed_more_than(MinInliningThreshold)) {
>                   set_msg("executed < MinInliningThreshold times");
>                 }
>
>              Regards,
>              Vladimir
>
>              On 5/14/15 10:03 AM, Vitaly Davidovich wrote:
>
>                  I should also add that I see how inlining without
>                  taking call freq into account could lead to faster
>                  time to peak performance for methods that eventually
>                  get hot anyway but aren't at parse time.  Peak perf
>                  will be the same if the method is too big for parse
>                  inlining but eventually gets compiled due to reaching
>                  hotness.  Is that about right?
>
>                  sent from my phone
>
>                  On May 14, 2015 12:57 PM, "Vitaly Davidovich"
>                  <vitalyd at gmail.com> wrote:
>
>                       Vladimir,
>
>                       I'm comparing MaxInlineSize (35) with
>                       FreqInlineSize (325).  AFAIU, MaxInlineSize drives
>                       which methods are inlined at parse time by C2,
>                       whereas FreqInlineSize is the threshold for "late"
>                       (or what do you guys call inlining after parsing?)
>                       inlining.  Most of the inlining discussions (or
>                       worries, rather) seem to focus around the
>                       MaxInlineSize value, and not FreqInlineSize, even
>                       if the target method will get hot.
>
>                           Usually, people care about 35 (= MaxInlineSize),
>                           because for methods up to MaxInlineSize their
>                           call frequency is ignored. So there are fewer
>                           chances of ending up with a non-inlined call.
>
>
>                       Ok, so for hot methods then MaxInlineSize isn't
>                       really a concern, and FreqInlineSize would be the
>                       threshold to worry about (for C2 compiler) then?
>                       Why are people worried about inlining in cold
>                       paths then?
>
>                       Thanks Vladimir
>
>                       On Thu, May 14, 2015 at 12:36 PM, Vladimir Ivanov
>                       <vladimir.x.ivanov at oracle.com> wrote:
>
>                           Vitaly,
>
>                           Can you elaborate on your question a bit? What
>                           do you compare parse-time inlining with? The
>                           mention of C1 & profile pollution in this
>                           context confuses me.
>
>                           Usually, people care about 35 (= MaxInlineSize),
>                           because for methods up to MaxInlineSize their
>                           call frequency is ignored. So there are fewer
>                           chances of ending up with a non-inlined call.
>
>                           Best regards,
>                           Vladimir Ivanov
>
>                           On 5/14/15 7:09 PM, Vitaly Davidovich wrote:
>
>                               Any pointers? Sorry to bug you guys, but I
>                               just want to make sure I understand this
>                               point, as I see quite a bit of discussion on
>                               core-libs and elsewhere where people are
>                               worrying about the 35 bytecode size
>                               threshold for parse inlining.
>
>                               On Wed, May 13, 2015 at 3:36 PM, Vitaly
>                               Davidovich <vitalyd at gmail.com> wrote:
>
>                                    Hi guys,
>
>                                    Could someone please explain the
>                                    advantage, if any, of parse time
>                                    inlining in C2? Given that
>                                    FreqInlineSize is quite large by
>                                    default, most hot methods will get
>                                    inlined anyway (well, ones that can be
>                                    for other reasons).  What is the
>                                    advantage of parse time inlining?
>
>                                    Is it quicker time to peak performance
>                                    if C1 is reached first?
>
>                                    Does it ensure that a method is inlined
>                                    whereas it may not be if it's already
>                                    compiled into a medium/large method
>                                    otherwise?
>
>                                    Is parse time inlining not susceptible
>                                    to profile pollution? I suspect it is
>                                    since the interpreter has already
>                                    profiled the inlinee either way, but
>                                    wanted to check.
>
>                                    Anything else I'm not thinking about?
>
>                                    Thanks
>
>
>
>
>
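
For readers who want to experiment with the size check discussed in this
thread, here is a standalone sketch modelled on the should_inline() excerpt
quoted above. It is not the HotSpot code: the struct and function names are
hypothetical, the is_unboxing_method() and is_init_with_ea() cases are left
out, and the InlineFrequencyRatio and InlineFrequencyCount values are
placeholders, since the thread does not state their defaults. Treating a
callee whose size does not exceed the limit as passing is also an assumption.

```cpp
// Hypothetical stand-ins for the HotSpot flags named in the thread.
struct InlineFlags {
    int max_inline_size        = 35;   // MaxInlineSize, value quoted in the thread
    int freq_inline_size       = 325;  // FreqInlineSize, value quoted in the thread
    int inline_frequency_ratio = 20;   // InlineFrequencyRatio: assumed placeholder
    int inline_frequency_count = 100;  // InlineFrequencyCount: assumed placeholder
};

// Returns the bytecode-size limit applied to a callee at one call site.
int size_limit_for_call_site(const InlineFlags& flags,
                             int call_site_count,
                             int caller_invoke_count) {
    // Avoid dividing by zero: this sketch treats a never-invoked caller
    // as if it had been invoked once.
    const int invokes = caller_invoke_count > 0 ? caller_invoke_count : 1;
    const int freq = call_site_count / invokes;  // integer division, as in the excerpt
    // Bump the limit if the call site is frequent by either measure.
    const bool frequent = freq >= flags.inline_frequency_ratio ||
                          call_site_count >= flags.inline_frequency_count;
    return frequent ? flags.freq_inline_size : flags.max_inline_size;
}

// True if the callee is small enough for this call site (assumed rule:
// size must not exceed the selected limit).
bool passes_size_check(const InlineFlags& flags,
                       int callee_bytecode_size,
                       int call_site_count,
                       int caller_invoke_count) {
    return callee_bytecode_size <=
           size_limit_for_call_site(flags, call_site_count, caller_invoke_count);
}
```

With these placeholder values, a 100-bytecode callee at a call site counted 10
times in a caller invoked 10000 times is held to the MaxInlineSize limit and
fails the check, while the same callee at a call site counted 500 times is
held to the FreqInlineSize limit and passes.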