C2: Advantage of parse time inlining
Vitaly Davidovich
vitalyd at gmail.com
Thu May 14 20:01:29 UTC 2015
Right, thank you.
Is MinInliningThreshold scaled with -XX:CompileThreshold=XXX values? Say I
turn CompileThreshold down to 100 (as an example).
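
For reference, here is the arithmetic behind the question as a minimal sketch. It assumes MinInliningThreshold stays at its default of 250 and is not scaled when CompileThreshold changes (whether that holds is exactly what is being asked, so treat it as an assumption). The helper name min_threshold_share is made up for illustration and is not HotSpot code.

```cpp
#include <cassert>

// Hypothetical helper (not part of HotSpot): the number of callee executions
// required by MinInliningThreshold, expressed as a fraction of the caller
// invocations needed to trigger compilation of the caller.
double min_threshold_share(int min_inlining_threshold, int compile_threshold) {
    return static_cast<double>(min_inlining_threshold) / compile_threshold;
}

// Small self-check using the numbers mentioned in the thread.
void self_check() {
    // Defaults: 250 / 10000, the 2.5% figure from the quoted reply.
    double dflt = min_threshold_share(250, 10000);
    assert(dflt > 0.0249 && dflt < 0.0251);
    // ASSUMPTION: threshold left unscaled with CompileThreshold=100.
    double lowered = min_threshold_share(250, 100);
    assert(lowered > 2.49 && lowered < 2.51);
}
```

Under that assumption the ratio rises from 0.025 to 2.5, so a callee invoked once per caller invocation would not have reached 250 executions by the time the caller is compiled. A callee that is also called from elsewhere, or several times per caller invocation, could still pass.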
On Thu, May 14, 2015 at 3:20 PM, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:
> On 5/14/15 12:02 PM, Vitaly Davidovich wrote:
>
>> Thanks Vladimir. I recall seeing changes around incremental inlining,
>> and may have mistakenly thought it happens at some later point in time.
>> Appreciate the clarification.
>>
>> Ok, so based on what you say, I can see a theoretical problem whereby a
>> method is being parsed, is larger than MaxInlineSize, but doesn't happen
>> to be frequent enough yet at this point, and so it won't be inlined; if
>> it turns out to be hot later on, the lack of inlining will not be undone
>> (assuming the caller isn't deopted and recompiled later, with updated
>> frequency info, for other reasons).
>>
>
> That is correct.
>
> Note that the problem is not how hot the callee is (invocation count) but how
> hot the call site in the caller is. Usually that does not change during execution.
> If it is called in a loop, C2 will try to inline because freq should be high.
>
> There is MinInliningThreshold (250), but since we compile the caller when it has
> executed 10000 times, the call site would be on a slow path that is executed
> only 2.5% of the time. So you will not see a performance difference whether we
> inline it or call the callee, which is compiled if it is hot.
>
> Vladimir
>
>
>> Thanks again.
>>
>> On Thu, May 14, 2015 at 2:30 PM, Vladimir Kozlov
>> <vladimir.kozlov at oracle.com> wrote:
>>
>> Vitaly,
>>
>> You have a small misconception: almost all C2 inlining (normal Java
>> methods) is done during parsing, in one pass. Recently we changed it
>> to inline jsr292 methods after parsing (to execute IGVN and reduce the
>> graph; otherwise they blow up the number of ideal nodes and we bail out
>> of compilation due to MaxNodeLimit).
>>
>> As the parser goes along and sees a call site, it checks whether it can
>> inline it or not (see opto/bytecodeInfo.cpp, should_inline() and
>> should_not_inline()). Several conditions drive inlining, and the
>> following (most important) flags control them:
>> MaxTrivialSize, MaxInlineSize, FreqInlineSize, InlineSmallCode,
>> MinInliningThreshold, MaxInlineLevel, InlineFrequencyCount,
>> InlineFrequencyRatio.
>>
>> The most tedious flag is InlineSmallCode (the size of compiled assembler
>> code, unlike the other sizes, which are bytecode sizes), which controls
>> inlining of an already compiled method. Usually, if a call site is hot,
>> the callee is compiled before the caller. But sometimes the caller gets
>> compiled first (if it has a hot loop, for example), so a different
>> condition is used, and as a result you can get performance variation
>> between runs.
>>
>> The difference between MaxInlineSize (35) and FreqInlineSize (325)
>> is that FreqInlineSize takes into account how frequently the call site
>> is executed relative to caller invocations:
>>
>>   int call_site_count = method()->scale_count(profile.count());
>>   int invoke_count = method()->interpreter_invocation_count();
>>   int freq = call_site_count / invoke_count;
>>   int max_inline_size = MaxInlineSize;
>>   // bump the max size if the call is frequent
>>   if ((freq >= InlineFrequencyRatio) ||
>>       (call_site_count >= InlineFrequencyCount) ||
>>       is_unboxing_method(callee_method, C) ||
>>       is_init_with_ea(callee_method, caller_method, C)) {
>>     max_inline_size = FreqInlineSize;
>>   }
>>
>> And there is an additional inlining condition for all methods whose
>> size > MaxTrivialSize:
>>
>>   if (!callee_method->was_executed_more_than(MinInliningThreshold)) {
>>     set_msg("executed < MinInliningThreshold times");
>>
>> Regards,
>> Vladimir
>>
>> On 5/14/15 10:03 AM, Vitaly Davidovich wrote:
>>
>> I should also add that I see how inlining without taking call freq into
>> account could lead to faster time to peak performance for methods that
>> eventually get hot anyway but aren't at parse time. Peak perf will be
>> the same if the method is too big for parse inlining but eventually gets
>> compiled due to reaching hotness. Is that about right?
>>
>> sent from my phone
>>
>> On May 14, 2015 12:57 PM, "Vitaly Davidovich" <vitalyd at gmail.com> wrote:
>>
>> Vladimir,
>>
>> I'm comparing MaxInlineSize (35) with FreqInlineSize (325). AFAIU,
>> MaxInlineSize drives which methods are inlined at parse time by C2,
>> whereas FreqInlineSize is the threshold for "late" (or what do you
>> guys call inlining after parsing?) inlining. Most of the inlining
>> discussions (or worries, rather) seem to focus around the
>> MaxInlineSize value, and not FreqInlineSize, even if the target
>> method will get hot.
>>
>> Usually, people care about 35 (= MaxInlineSize), because for
>> methods up to MaxInlineSize their call frequency is ignored. So,
>> fewer chances to end up with non-inlined call.
>>
>>
>> Ok, so for hot methods then MaxInlineSize isn't really a concern,
>> and FreqInlineSize would be the threshold to worry about (for C2
>> compiler) then? Why are people worried about inlining in cold paths
>> then?
>>
>> Thanks Vladimir
>>
>> On Thu, May 14, 2015 at 12:36 PM, Vladimir Ivanov
>> <vladimir.x.ivanov at oracle.com> wrote:
>>
>> Vitaly,
>>
>> Can you elaborate on your question a bit? What do you compare
>> parse-time inlining with? The mention of C1 & profile pollution
>> in this context confuses me.
>>
>> Usually, people care about 35 (= MaxInlineSize), because for
>> methods up to MaxInlineSize their call frequency is ignored. So,
>> fewer chances to end up with non-inlined call.
>>
>> Best regards,
>> Vladimir Ivanov
>>
>> On 5/14/15 7:09 PM, Vitaly Davidovich wrote:
>>
>> Any pointers? Sorry to bug you guys, but just want to make sure I
>> understand this point as I see quite a bit of discussion on core-libs
>> and elsewhere where people are worrying about the 35 bytecode size
>> threshold for parse inlining.
>>
>> On Wed, May 13, 2015 at 3:36 PM, Vitaly Davidovich
>> <vitalyd at gmail.com> wrote:
>>
>> Hi guys,
>>
>> Could someone please explain the advantage, if any, of parse time
>> inlining in C2? Given that FreqInlineSize is quite large by default,
>> most hot methods will get inlined anyway (well, ones that can be for
>> other reasons). What is the advantage of parse time inlining?
>>
>> Is it quicker time to peak performance if C1 is reached first?
>>
>> Does it ensure that a method is inlined whereas it may not be if
>> it's already compiled into a medium/large method otherwise?
>>
>> Is parse time inlining not susceptible to profile pollution? I
>> suspect it is since the interpreter has already profiled the inlinee
>> either way, but wanted to check.
>>
>> Anything else I'm not thinking about?
>>
>> Thanks
>>
>>
>>
>>
>>
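
As a recap of the thread, the size-limit selection from the should_inline() snippet quoted above can be written as a standalone sketch. This is not HotSpot code: InlineFlags, max_inline_size_for and fits_size_limit are hypothetical names, the unboxing and escape-analysis special cases are left out, and the two frequency values are placeholders rather than verified defaults. Only 35 and 325 come from the thread.

```cpp
#include <cassert>

// Illustrative stand-ins for the flags named in the thread.
struct InlineFlags {
    int max_inline_size = 35;          // MaxInlineSize, as quoted in the thread
    int freq_inline_size = 325;        // FreqInlineSize, as quoted in the thread
    int inline_frequency_ratio = 20;   // placeholder value (assumption)
    int inline_frequency_count = 100;  // placeholder value (assumption)
};

// Picks the bytecode size limit for one call site: the larger limit applies
// only when the call site is frequent relative to caller invocations, or
// frequent in absolute terms.
int max_inline_size_for(const InlineFlags& f, int call_site_count, int invoke_count) {
    // Simplification: guard against a zero invocation count instead of asserting.
    int freq = invoke_count > 0 ? call_site_count / invoke_count : 0;
    if (freq >= f.inline_frequency_ratio ||
        call_site_count >= f.inline_frequency_count) {
        return f.freq_inline_size;
    }
    return f.max_inline_size;
}

// True if a callee of the given bytecode size passes the size check.
bool fits_size_limit(const InlineFlags& f, int callee_bytecode_size,
                     int call_site_count, int invoke_count) {
    return callee_bytecode_size <= max_inline_size_for(f, call_site_count, invoke_count);
}

// Small self-check: a 100-bytecode callee is rejected at a cold call site
// but accepted at a frequent one.
void self_check() {
    InlineFlags f;
    assert(!fits_size_limit(f, 100, 5, 1000));
    assert(fits_size_limit(f, 100, 5000, 1000));
}
```

The sketch illustrates the point made in the replies: a method of up to 35 bytecodes passes the size check regardless of call-site frequency, while a larger one (up to 325) passes only if the call site itself is frequent when the caller is parsed.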