C2: Advantage of parse time inlining
Vitaly Davidovich
vitalyd at gmail.com
Thu May 14 21:35:21 UTC 2015
Thanks Kris. Hmm, this sounds pretty bad for non-tiered compilations with
a relatively low CompileThreshold. If I have a (larger than MaxInlineSize)
method executed 49% of the time, it'll inline at CompileThreshold=10k but
not CompileThreshold=100. Or am I missing something?
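
To make the arithmetic concrete, here is a minimal sketch of how I read the non-tiered branch of the check Kris quotes below. It is not HotSpot code: the function names are hypothetical, MinInliningThreshold is hard-coded to the 250 mentioned in this thread, and I am assuming was_executed_more_than(n) behaves like a plain "count > n" comparison.

```cpp
#include <cassert>

// Value mentioned in this thread for MinInliningThreshold.
constexpr long kMinInliningThreshold = 250;

// Hypothetical model of the non-tiered case:
//   MIN2(MinInliningThreshold, CompileThreshold / 2)
constexpr long effective_min_threshold(long compile_threshold) {
    return (compile_threshold / 2 < kMinInliningThreshold)
               ? compile_threshold / 2
               : kMinInliningThreshold;
}

// Assumption: was_executed_more_than(n) is simply "callee_count > n".
constexpr bool passes_min_inlining_check(long callee_count, long compile_threshold) {
    return callee_count > effective_min_threshold(compile_threshold);
}

// Callee reached on 49% of caller invocations, caller compiled at CompileThreshold.
static_assert(passes_min_inlining_check(4900, 10000),
              "CompileThreshold=10000: 4900 > min(250, 5000)");
static_assert(!passes_min_inlining_check(49, 100),
              "CompileThreshold=100: 49 is not > min(250, 50)");
```

Under those assumptions the same 49% callee passes the check at CompileThreshold=10000 and fails it at CompileThreshold=100.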
On Thu, May 14, 2015 at 5:28 PM, Krystal Mok <rednaxelafx at gmail.com> wrote:
> Yes and no.
>
> intx counter_high_value;
> // Tiered compilation uses a different "high value" than non-tiered compilation.
> // Determine the right value to use.
> if (TieredCompilation) {
>   counter_high_value = InvocationCounter::count_limit / 2;
> } else {
>   counter_high_value = CompileThreshold / 2;
> }
> if (!callee_method->was_executed_more_than(MIN2(MinInliningThreshold, counter_high_value))) {
>   set_msg("executed < MinInliningThreshold times");
>   return true;
> }
>
> So it's not scaling MinInliningThreshold directly, but rather using a min
> of MinInliningThreshold and counter_high_value (where the latter is
> calculated from CompileThreshold when not using tiered compilation) to make
> the actual decision.
>
> Because tiered is on by default now, the short answer to your question
> would probably be a "no".
>
> - Kris
>
> On Thu, May 14, 2015 at 1:01 PM, Vitaly Davidovich <vitalyd at gmail.com>
> wrote:
>
>> Right, thank you.
>>
>> Is MinInliningThreshold scaled with XX:CompileThreshold=XXX values? Say I
>> turn CompileThreshold down to 100 (as an example).
>>
>> On Thu, May 14, 2015 at 3:20 PM, Vladimir Kozlov <
>> vladimir.kozlov at oracle.com> wrote:
>>
>>> On 5/14/15 12:02 PM, Vitaly Davidovich wrote:
>>>
>>>> Thanks Vladimir. I recall seeing changes around incremental inlining,
>>>> and may have mistakenly thought it happens at some later point in time.
>>>> Appreciate the clarification.
>>>>
>>>> Ok, so based on what you say, I can see a theoretical problem whereby a
>>>> method is being parsed, is larger than MaxInlineSize, but doesn't happen
>>>> to be frequent enough yet at this point, and so it won't be inlined; if
>>>> it turns out to be hot later on, the lack of inlining will not be undone
>>>> (assuming the caller isn't deopted and recompiled later, with updated
>>>> frequency info, for other reasons).
>>>>
>>>
>>> That is correct.
>>>
>>> Note that the problem is not how hot the callee is (its invocation count) but
>>> how hot the call site in the caller is. Usually that does not change during
>>> execution. If it is called in a loop, C2 will try to inline it because the freq
>>> should be high.
>>>
>>> There is MinInliningThreshold (250), but since we compile the caller when it
>>> is executed 10000 times, such a call site should be on a slow path that is
>>> executed only 2.5% of the time. So you will not see a performance difference
>>> whether we inline it or call the callee, which is compiled if it is hot.
>>>
>>> Vladimir
>>>
>>>
>>>> Thanks again.
>>>>
>>>> On Thu, May 14, 2015 at 2:30 PM, Vladimir Kozlov
>>>> <vladimir.kozlov at oracle.com> wrote:
>>>>
>>>> Vitaly,
>>>>
>>>> You have a small misconception: almost all C2 inlining (of normal Java
>>>> methods) is done during parsing, in one pass. Recently we changed it
>>>> to inline jsr292 methods after parsing (to execute IGVN and reduce the
>>>> graph; otherwise they blow up the number of ideal nodes and we bail out
>>>> of compilation due to MaxNodeLimit).
>>>>
>>>> As the parser goes along and sees a call site, it checks whether it can
>>>> inline or not (see opto/bytecodeinfo.cpp, should_inline() and
>>>> should_not_inline()). Several conditions drive inlining, and the
>>>> following (most important) flags control them:
>>>> MaxTrivialSize, MaxInlineSize, FreqInlineSize, InlineSmallCode,
>>>> MinInliningThreshold, MaxInlineLevel, InlineFrequencyCount,
>>>> InlineFrequencyRatio.
>>>>
>>>> The most tedious flag is InlineSmallCode (the size of compiled assembler
>>>> code, unlike the other sizes, which are bytecode sizes),
>>>> which controls inlining of an already compiled method. Usually, if a call
>>>> site is hot, the callee is compiled before the caller. But sometimes the
>>>> caller gets compiled first (if it has a hot loop, for example), so
>>>> a different condition will be used and, as a result, you can get
>>>> performance variation between runs.
>>>>
>>>> The difference between MaxInlineSize (35) and FreqInlineSize (325)
>>>> is that FreqInlineSize takes into account how frequently the call site is
>>>> executed relative to caller invocations:
>>>>
>>>> int call_site_count = method()->scale_count(profile.count());
>>>> int invoke_count = method()->interpreter_invocation_count();
>>>> int freq = call_site_count / invoke_count;
>>>> int max_inline_size = MaxInlineSize;
>>>> // bump the max size if the call is frequent
>>>> if ((freq >= InlineFrequencyRatio) ||
>>>>     (call_site_count >= InlineFrequencyCount) ||
>>>>     is_unboxing_method(callee_method, C) ||
>>>>     is_init_with_ea(callee_method, caller_method, C)) {
>>>>   max_inline_size = FreqInlineSize;
>>>>
>>>> And there is an additional inlining condition for all methods whose
>>>> size > MaxTrivialSize:
>>>>
>>>> if (!callee_method->was_executed_more_than(MinInliningThreshold)) {
>>>>   set_msg("executed < MinInliningThreshold times");
>>>>
>>>> Regards,
>>>> Vladimir
>>>>
>>>> On 5/14/15 10:03 AM, Vitaly Davidovich wrote:
>>>>
>>>> I should also add that I see how inlining without taking call freq into
>>>> account could lead to faster time to peak performance for methods that
>>>> eventually get hot anyway but aren't at parse time. Peak perf will be
>>>> the same if the method is too big for parse inlining but eventually gets
>>>> compiled due to reaching hotness. Is that about right?
>>>>
>>>> sent from my phone
>>>>
>>>> On May 14, 2015 12:57 PM, "Vitaly Davidovich" <vitalyd at gmail.com> wrote:
>>>>
>>>> Vladimir,
>>>>
>>>> I'm comparing MaxInlineSize (35) with FreqInlineSize (325). AFAIU,
>>>> MaxInlineSize drives which methods are inlined at parse time by C2,
>>>> whereas FreqInlineSize is the threshold for "late" (or what do you
>>>> guys call inlining after parsing?) inlining. Most of the inlining
>>>> discussions (or worries, rather) seem to focus around the
>>>> MaxInlineSize value, and not FreqInlineSize, even if the target
>>>> method will get hot.
>>>>
>>>> Usually, people care about 35 (= MaxInlineSize), because for
>>>> methods up to MaxInlineSize their call frequency is ignored. So,
>>>> fewer chances to end up with non-inlined call.
>>>>
>>>> Ok, so for hot methods then MaxInlineSize isn't really a concern,
>>>> and FreqInlineSize would be the threshold to worry about (for C2
>>>> compiler) then? Why are people worried about inlining in cold paths
>>>> then?
>>>>
>>>> Thanks Vladimir
>>>>
>>>> On Thu, May 14, 2015 at 12:36 PM, Vladimir Ivanov
>>>> <vladimir.x.ivanov at oracle.com> wrote:
>>>>
>>>> Vitaly,
>>>>
>>>> Can you elaborate on your question a bit? What do you compare
>>>> parse-time inlining with? The mention of C1 & profile pollution
>>>> in this context confuses me.
>>>>
>>>> Usually, people care about 35 (= MaxInlineSize), because for
>>>> methods up to MaxInlineSize their call frequency is ignored. So,
>>>> fewer chances to end up with non-inlined call.
>>>>
>>>> Best regards,
>>>> Vladimir Ivanov
>>>>
>>>> On 5/14/15 7:09 PM, Vitaly Davidovich wrote:
>>>>
>>>> Any pointers? Sorry to bug you guys, but just want to make sure I
>>>> understand this point as I see quite a bit of discussion on core-libs
>>>> and elsewhere where people are worrying about the 35 bytecode size
>>>> threshold for parse inlining.
>>>>
>>>> On Wed, May 13, 2015 at 3:36 PM, Vitaly Davidovich
>>>> <vitalyd at gmail.com> wrote:
>>>>
>>>> Hi guys,
>>>>
>>>> Could someone please explain the advantage, if any, of parse time
>>>> inlining in C2? Given that FreqInlineSize is quite large by default,
>>>> most hot methods will get inlined anyway (well, ones that can be for
>>>> other reasons). What is the advantage of parse time inlining?
>>>>
>>>> Is it quicker time to peak performance if C1 is reached first?
>>>>
>>>> Does it ensure that a method is inlined whereas it may not be if
>>>> it's already compiled into a medium/large method otherwise?
>>>>
>>>> Is parse time inlining not susceptible to profile pollution? I
>>>> suspect it is since the interpreter has already profiled the inlinee
>>>> either way, but wanted to check.
>>>>
>>>> Anything else I'm not thinking about?
>>>>
>>>> Thanks
>>>>
>>>>
>>>>
>>>>
>>>>
>>
>
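
For reference, the call-site frequency check that Vladimir Kozlov quotes above can be modelled roughly as follows. This is only a sketch under stated assumptions: InlineFlags and both function names are hypothetical, the is_unboxing_method() and is_init_with_ea() conditions are left out, the guard against a zero invocation count is my addition, and the ratio/count values in the example (20 and 100) are purely illustrative rather than defaults confirmed anywhere in this thread.

```cpp
#include <cassert>

// Hypothetical bundle of flag values; not a HotSpot type.
struct InlineFlags {
    int max_inline_size;         // MaxInlineSize
    int freq_inline_size;        // FreqInlineSize
    int inline_frequency_ratio;  // InlineFrequencyRatio
    int inline_frequency_count;  // InlineFrequencyCount
};

// A call site counts as frequent if either the integer ratio
// call_site_count / invoke_count or the raw call-site count is high enough.
constexpr bool is_frequent_call_site(int call_site_count, int invoke_count,
                                     const InlineFlags& f) {
    return (invoke_count > 0 &&
            call_site_count / invoke_count >= f.inline_frequency_ratio) ||
           call_site_count >= f.inline_frequency_count;
}

// Size limit applied to the callee: bumped to FreqInlineSize for frequent sites.
constexpr int max_inline_size_for(int call_site_count, int invoke_count,
                                  const InlineFlags& f) {
    return is_frequent_call_site(call_site_count, invoke_count, f)
               ? f.freq_inline_size
               : f.max_inline_size;
}

// 35 and 325 are the sizes mentioned in the thread; 20 and 100 are illustrative only.
constexpr InlineFlags kExampleFlags{35, 325, 20, 100};

static_assert(max_inline_size_for(5, 10000, kExampleFlags) == 35,
              "rarely reached call site keeps the MaxInlineSize limit");
static_assert(max_inline_size_for(150, 10000, kExampleFlags) == 325,
              "raw call-site count alone can bump the limit");
static_assert(max_inline_size_for(40, 1, kExampleFlags) == 325,
              "a high per-invocation ratio (e.g. a loop) bumps the limit");
```

Note that freq is an integer division, so in this model the ratio test only fires when the call site runs many times per caller invocation, which matches the remark above about calls inside loops.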
More information about the hotspot-compiler-dev
mailing list