C2: Advantage of parse time inlining

Vitaly Davidovich vitalyd at gmail.com
Thu May 14 22:16:08 UTC 2015


Yes, I'll look at that code.

My 49% example (probably poorly explained, I'll try to correct this below)
was based on what Vladimir said:

> There is MinInliningThreshold (250) but since we compile caller when it is
> executed 10000 times the call site should be on slow path which is executed
> only 2.5% times. So you will not see the performance difference if we
> inline it or call callee which is compiled if it is hot.


Assume we're talking about a callee (method) whose size is > MaxInlineSize
(e.g. 64 bytes, just to pull a number out of the air).  Using the default
compile threshold of 10k, the callee will not be inlined if it's executed
< 250 times.  That's fine since, as Vladimir says, it's a cold path anyway
(fewer than 2.5% of the caller's invocations reach that call site), and I
agree given the 10k compile threshold.  If we change CompileThreshold to 100,
then the code snippet you showed earlier (for the negative filter) will
yield MIN2(250, 50) = 50 as the cutoff.  So the "49%" case is: the caller is
invoked 100 times, which triggers compilation, and the compiler looks at a
call site that's executed 49 times out of those 100 (that's the 49% I'm
using).  With a 10k compile threshold, the same call site would have been
executed 4900 times, clearly above 250; I'm using an example where the code
profile is the same and only the compile threshold changes.  So the negative
filter now rejects this method for inlining, whereas it would've accepted it
had I let the code run to 10k invocations.
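
To make the arithmetic concrete, here is a minimal sketch of the check as I
read it from the snippet Kris posted.  It is a model, not HotSpot code: the
names (kMinInliningThreshold, negative_filter_floor, rejected) are made up
for illustration, it assumes was_executed_more_than(n) behaves like a plain
"count > n", and it assumes the callee is reached only through this one call
site, so the callee's invocation count equals the call site count.

```cpp
// Sketch only: models the quoted negative-filter check, not HotSpot itself.
// Constant values are the defaults cited in this thread.
constexpr long kMinInliningThreshold = 250;
constexpr long kTieredHighValue = 134217728;  // InvocationCounter::count_limit / 2, per Kris

constexpr long min2(long a, long b) { return a < b ? a : b; }

// Effective floor: MIN2(MinInliningThreshold, counter_high_value).
constexpr long negative_filter_floor(bool tiered, long compile_threshold) {
  return min2(kMinInliningThreshold,
              tiered ? kTieredHighValue : compile_threshold / 2);
}

// Assumption: was_executed_more_than(n) behaves like "count > n".
// Returns true when the negative filter would refuse to inline the callee.
constexpr bool rejected(long callee_count, bool tiered, long compile_threshold) {
  return !(callee_count > negative_filter_floor(tiered, compile_threshold));
}

// Same 49% profile, two non-tiered compile thresholds.
static_assert(negative_filter_floor(false, 100) == 50, "MIN2(250, 100/2)");
static_assert(rejected(49, false, 100), "49 of 100 calls: not above the floor of 50");
static_assert(negative_filter_floor(false, 10000) == 250, "MIN2(250, 10000/2)");
static_assert(!rejected(4900, false, 10000), "4900 of 10000 calls: above 250");
```

The static_asserts are the worked example: the same relative profile is
rejected at CompileThreshold=100 and accepted at CompileThreshold=10000.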

I guess my overall question is the following: does changing
CompileThreshold to a lower value, while keeping the execution profile the
same, alter inlining decisions?  Intuitively I would think the answer should
be "no, inlining is the same" as everything would be scaled down
appropriately, but it sounds like there are "magic" absolute numbers
involved.
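
For what I mean by "scaled down appropriately", a rough sketch follows.  The
scaled_floor() function is purely hypothetical (as far as this thread shows,
nothing like it exists in HotSpot); absolute_floor() follows the non-tiered
branch of the quoted snippet.  All names are invented for illustration, and
10000 is taken as the default CompileThreshold as used in this discussion.

```cpp
// Sketch only: contrasts the absolute floor from the quoted snippet with a
// HYPOTHETICAL proportional floor. scaled_floor() is not a HotSpot function.
constexpr long kMinInliningThreshold = 250;
constexpr long kDefaultCompileThreshold = 10000;

constexpr long min2(long a, long b) { return a < b ? a : b; }

// Non-tiered floor as in the quoted code: MIN2(250, CompileThreshold / 2).
constexpr long absolute_floor(long compile_threshold) {
  return min2(kMinInliningThreshold, compile_threshold / 2);
}

// Hypothetical: keep the floor at the same 2.5% share of caller invocations.
constexpr long scaled_floor(long compile_threshold) {
  return kMinInliningThreshold * compile_threshold / kDefaultCompileThreshold;
}

// A floor expressed in per-mille of the compile threshold.
constexpr long floor_permille(long floor, long compile_threshold) {
  return floor * 1000 / compile_threshold;
}

static_assert(floor_permille(absolute_floor(10000), 10000) == 25, "2.5% at 10k");
static_assert(floor_permille(absolute_floor(100), 100) == 500, "50% at 100");
static_assert(scaled_floor(10000) == 250, "unchanged at the default");
static_assert(scaled_floor(100) == 2, "2.5 truncated by integer division");
```

Under the absolute rule the floor moves from 2.5% of caller invocations to
50% when CompileThreshold drops from 10000 to 100; under the hypothetical
proportional rule a call site executed 49 times out of 100 would clear it.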




On Thu, May 14, 2015 at 6:00 PM, Krystal Mok <rednaxelafx at gmail.com> wrote:

> Hi Vitaly,
>
> The code I posted comes from should_not_inline(). It's a negative filter,
> so return true means don't inline.
> You can take a look at opto/bytecodeInfo.cpp. It's a lot to explain in
> words, but very obvious from the code.
>
> I'm not sure what your 49% means here.
>
> In the positive filter, if the frequency of a call site is more
> than InlineFrequencyRatio (=20, think of a call site in a loop run at least
> 20 times per invocation of this method), or the profile recorded that the
> call site was called at least InlineFrequencyCount (=100 on x86) times, or
> some other heuristic applies, the max_inline_size for this call site is
> bumped from MaxInlineSize (=35) to FreqInlineSize (=325 on x86).
>
> In the negative filter, the callee has to be executed at
> least MIN2(MinInliningThreshold, counter_high_value) times in order to be
> considered candidate for inlining. This is the invocation counter on the
> callee side, not on the call site side.
> In tiered mode that would be MIN2(250, 134217728) = 250. It doesn't matter
> what CompileThreshold is set in this case.
> In non-tiered mode, it'd be MIN2(250, CompileThreshold/2), for
> CompileThreshold=100 that's 50.
>
> - Kris
>
> On Thu, May 14, 2015 at 2:35 PM, Vitaly Davidovich <vitalyd at gmail.com>
> wrote:
>
>> Thanks Kris.  Hmm, this sounds pretty bad for non-tiered compilations
>> with a relatively low CompileThreshold.  If I have a (larger than
>> MaxInlineSize) method executed 49% of the time, it'll inline at
>> CompileThreshold=10k but not CompileThreshold=100.  Or am I missing
>> something?
>>
>> On Thu, May 14, 2015 at 5:28 PM, Krystal Mok <rednaxelafx at gmail.com>
>> wrote:
>>
>>> Yes and no.
>>>
>>>       intx counter_high_value;
>>>       // Tiered compilation uses a different "high value" than non-tiered compilation.
>>>       // Determine the right value to use.
>>>       if (TieredCompilation) {
>>>         counter_high_value = InvocationCounter::count_limit / 2;
>>>       } else {
>>>         counter_high_value = CompileThreshold / 2;
>>>       }
>>>       if (!callee_method->was_executed_more_than(MIN2(MinInliningThreshold, counter_high_value))) {
>>>         set_msg("executed < MinInliningThreshold times");
>>>         return true;
>>>       }
>>>
>>> So it's not scaling MinInliningThreshold directly, but rather using a
>>> min of MinInliningThreshold and counter_high_value (where the latter is
>>> calculated from CompileThreshold when not using tiered compilation) to make
>>> the actual decision.
>>>
>>> Because tiered is on by default now, the short answer to your question
>>> would probably be a "no".
>>>
>>> - Kris
>>>
>>> On Thu, May 14, 2015 at 1:01 PM, Vitaly Davidovich <vitalyd at gmail.com>
>>> wrote:
>>>
>>>> Right, thank you.
>>>>
>>>> Is MinInliningThreshold scaled with XX:CompileThreshold=XXX values? Say
>>>> I turn CompileThreshold down to 100 (as an example).
>>>>
>>>> On Thu, May 14, 2015 at 3:20 PM, Vladimir Kozlov <
>>>> vladimir.kozlov at oracle.com> wrote:
>>>>
>>>>> On 5/14/15 12:02 PM, Vitaly Davidovich wrote:
>>>>>
>>>>>> Thanks Vladimir.  I recall seeing changes around incremental inlining,
>>>>>> and may have mistakenly thought it happens at some later point in time.
>>>>>> Appreciate the clarification.
>>>>>>
>>>>>> Ok, so based on what you say, I can see a theoretical problem whereby a
>>>>>> method is being parsed, is larger than MaxInlineSize, but doesn't happen
>>>>>> to be frequent enough yet at this point, and so it won't be inlined; if
>>>>>> it turns out to be hot later on, the lack of inlining will not be undone
>>>>>> (assuming the caller isn't deopted and recompiled later, with updated
>>>>>> frequency info, for other reasons).
>>>>>>
>>>>>
>>>>> That is correct.
>>>>>
>>>>> Note, that the problem is not how hot is callee (invocation times) but
>>>>> how hot the call site in caller. Usually it does not change during
>>>>> execution. If it is called in a loop C2 will try to inline because freq
>>>>> should be high.
>>>>>
>>>>> There is MinInliningThreshold (250) but since we compile caller when
>>>>> it is executed 10000 times the call site should be on slow path which is
>>>>> executed only 2.5% times. So you will not see the performance difference if
>>>>> we inline it or call callee which is compiled if it is hot.
>>>>>
>>>>> Vladimir
>>>>>
>>>>>
>>>>>> Thanks again.
>>>>>>
>>>>>> On Thu, May 14, 2015 at 2:30 PM, Vladimir Kozlov
>>>>>> <vladimir.kozlov at oracle.com> wrote:
>>>>>>
>>>>>>     Vitaly,
>>>>>>
>>>>>>     You have small misconception - almost all C2 inlining (normal java
>>>>>>     methods) is done during parsing in one pass. Recently we changed it
>>>>>>     to inline jsr292 methods after parsing (to execute IGVN and reduce
>>>>>>     graph - otherwise they blow up number of ideal nodes and we bailout
>>>>>>     compilation due to MaxNodeLimit).
>>>>>>
>>>>>>     As parser goes and see a call site it check can it inline or not
>>>>>>     (see opto/bytecodeInfo.cpp, should_inline() and
>>>>>>     should_not_inline()). There are several conditions which drive
>>>>>>     inlining and following (most important) flags controls them:
>>>>>>     MaxTrivialSize, MaxInlineSize, FreqInlineSize, InlineSmallCode,
>>>>>>     MinInliningThreshold, MaxInlineLevel, InlineFrequencyCount,
>>>>>>     InlineFrequencyRatio.
>>>>>>
>>>>>>     Most tedious flag is InlineSmallCode (size of compiled assembler
>>>>>>     code which is different from other sizes which are bytecode size)
>>>>>>     which control inlining of already compiled method. Usually if a call
>>>>>>     site is hot the callee is compiled before caller. But sometimes you
>>>>>>     can get caller compiled first (if it has hot loop, for example) so
>>>>>>     different condition will be used and as result you can get
>>>>>>     performance variation between runs.
>>>>>>
>>>>>>     The difference between MaxInlineSize (35) and FreqInlineSize (325)
>>>>>>     is FreqInlineSize takes into account how frequent call site is
>>>>>>     executed relatively to caller invocations:
>>>>>>
>>>>>>        int call_site_count  = method()->scale_count(profile.count());
>>>>>>        int invoke_count     = method()->interpreter_invocation_count();
>>>>>>        int freq = call_site_count / invoke_count;
>>>>>>        int max_inline_size  = MaxInlineSize;
>>>>>>        // bump the max size if the call is frequent
>>>>>>        if ((freq >= InlineFrequencyRatio) ||
>>>>>>            (call_site_count >= InlineFrequencyCount) ||
>>>>>>            is_unboxing_method(callee_method, C) ||
>>>>>>            is_init_with_ea(callee_method, caller_method, C)) {
>>>>>>          max_inline_size = FreqInlineSize;
>>>>>>
>>>>>>     And there is additional inlining condition for all methods which
>>>>>>     size > MaxTrivialSize:
>>>>>>
>>>>>>        if (!callee_method->was_executed_more_than(MinInliningThreshold)) {
>>>>>>          set_msg("executed < MinInliningThreshold times");
>>>>>>
>>>>>>     Regards,
>>>>>>     Vladimir
>>>>>>
>>>>>>     On 5/14/15 10:03 AM, Vitaly Davidovich wrote:
>>>>>>
>>>>>>         I should also add that I see how inlining without taking call
>>>>>>         freq into account could lead to faster time to peak performance
>>>>>>         for methods that eventually get hot anyway but aren't at parse
>>>>>>         time.  Peak perf will be the same if the method is too big for
>>>>>>         parse inlining but eventually gets compiled due to reaching
>>>>>>         hotness.  Is that about right?
>>>>>>
>>>>>>         sent from my phone
>>>>>>
>>>>>>         On May 14, 2015 12:57 PM, "Vitaly Davidovich"
>>>>>>         <vitalyd at gmail.com> wrote:
>>>>>>
>>>>>>              Vladimir,
>>>>>>
>>>>>>              I'm comparing MaxInlineSize (35) with FreqInlineSize (325).
>>>>>>              AFAIU, MaxInlineSize drives which methods are inlined at
>>>>>>              parse time by C2, whereas FreqInlineSize is the threshold
>>>>>>              for "late" (or what do you guys call inlining after
>>>>>>              parsing?) inlining.  Most of the inlining discussions (or
>>>>>>              worries, rather) seem to focus around the MaxInlineSize
>>>>>>              value, and not FreqInlineSize, even if the target method
>>>>>>              will get hot.
>>>>>>
>>>>>>                  Usually, people care about 35 (= MaxInlineSize),
>>>>>>         because for
>>>>>>                  methods up to MaxInlineSize their call frequency is
>>>>>>         ignored. So,
>>>>>>                  fewer chances to end up with non-inlined call.
>>>>>>
>>>>>>
>>>>>>              Ok, so for hot methods then MaxInlineSize isn't really a
>>>>>>              concern, and FreqInlineSize would be the threshold to worry
>>>>>>              about (for C2 compiler) then? Why are people worried about
>>>>>>              inlining in cold paths then?
>>>>>>
>>>>>>              Thanks Vladimir
>>>>>>
>>>>>>              On Thu, May 14, 2015 at 12:36 PM, Vladimir Ivanov
>>>>>>              <vladimir.x.ivanov at oracle.com> wrote:
>>>>>>
>>>>>>                  Vitaly,
>>>>>>
>>>>>>                  Can you elaborate your question a bit? What do you
>>>>>>                  compare parse-time inlining with? Mentioning of C1 &
>>>>>>                  profile pollution in this context confuses me.
>>>>>>
>>>>>>                  Usually, people care about 35 (= MaxInlineSize),
>>>>>>         because for
>>>>>>                  methods up to MaxInlineSize their call frequency is
>>>>>>         ignored. So,
>>>>>>                  fewer chances to end up with non-inlined call.
>>>>>>
>>>>>>                  Best regards,
>>>>>>                  Vladimir Ivanov
>>>>>>
>>>>>>                  On 5/14/15 7:09 PM, Vitaly Davidovich wrote:
>>>>>>
>>>>>>                      Any pointers? Sorry to bug you guys, but just want
>>>>>>                      to make sure I understand this point as I see quite
>>>>>>                      a bit of discussion on core-libs and elsewhere where
>>>>>>                      people are worrying about the 35 bytecode size
>>>>>>                      threshold for parse inlining.
>>>>>>
>>>>>>                      On Wed, May 13, 2015 at 3:36 PM, Vitaly Davidovich
>>>>>>                      <vitalyd at gmail.com> wrote:
>>>>>>
>>>>>>                           Hi guys,
>>>>>>
>>>>>>                           Could someone please explain the advantage, if
>>>>>>                           any, of parse time inlining in C2? Given that
>>>>>>                           FreqInlineSize is quite large by default, most
>>>>>>                           hot methods will get inlined anyway (well, ones
>>>>>>                           that can be for other reasons).  What is the
>>>>>>                           advantage of parse time inlining?
>>>>>>
>>>>>>                           Is it quicker time to peak performance if C1 is
>>>>>>                           reached first?
>>>>>>
>>>>>>                           Does it ensure that a method is inlined whereas
>>>>>>                           it may not be if it's already compiled into a
>>>>>>                           medium/large method otherwise?
>>>>>>
>>>>>>                           Is parse time inlining not susceptible to
>>>>>>                           profile pollution? I suspect it is since the
>>>>>>                           interpreter has already profiled the inlinee
>>>>>>                           either way, but wanted to check.
>>>>>>
>>>>>>                           Anything else I'm not thinking about?
>>>>>>
>>>>>>                           Thanks
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>
>>>
>>
>