C2: Advantage of parse time inlining

Thu May 14 18:30:35 UTC 2015

Vitaly,

You have a small misconception: almost all C2 inlining (of normal Java 
methods) is done during parsing, in one pass. Recently we changed it to 
inline JSR 292 methods after parsing (to run IGVN and reduce the graph; 
otherwise they blow up the number of ideal nodes and we bail out of 
compilation due to MaxNodeLimit).

As the parser goes along and sees a call site, it checks whether it can 
inline it or not (see opto/bytecodeInfo.cpp, should_inline() and 
should_not_inline()). Several conditions drive inlining, and the 
following (most important) flags control them: MaxTrivialSize, 
MaxInlineSize, FreqInlineSize, InlineSmallCode, MinInliningThreshold, 
MaxInlineLevel, InlineFrequencyCount, InlineFrequencyRatio.

The most tedious flag is InlineSmallCode. It is the size of compiled 
assembler code (unlike the other sizes, which are bytecode sizes) and it 
controls inlining of an already compiled method. Usually, if a call site 
is hot, the callee is compiled before the caller. But sometimes the 
caller gets compiled first (if it has a hot loop, for example), so a 
different condition is used and as a result you can get performance 
variation between runs.
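
To make the ordering effect concrete, here is a minimal sketch, assuming the 
check simply rejects a callee whose already compiled code is larger than 
InlineSmallCode. CalleeInfo, kInlineSmallCode (with an illustrative value) and 
rejected_by_compiled_size() are hypothetical names for this example, not 
HotSpot code.

```cpp
#include <cassert>

// Hypothetical stand-in for what the compiler knows about a callee.
struct CalleeInfo {
    int  bytecode_size;       // size of the method's bytecode
    bool has_compiled_code;   // true if the callee was compiled earlier
    int  compiled_code_size;  // size of its machine code, 0 if not compiled
};

// Illustrative limit only; the real default depends on platform and build.
constexpr int kInlineSmallCode = 1000;

// Returns true when the compiled-code size check would reject the callee.
// A callee that has not been compiled yet can never trip this check.
inline bool rejected_by_compiled_size(const CalleeInfo& callee) {
    assert(callee.compiled_code_size >= 0);
    return callee.has_compiled_code &&
           callee.compiled_code_size > kInlineSmallCode;
}
```

With the same bytecode size, a callee described as {120, true, 2500} is 
rejected while {120, false, 0} is not, so the outcome depends only on which 
method happened to be compiled first.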

The difference between MaxInlineSize (35) and FreqInlineSize (325) is 
that FreqInlineSize takes into account how frequently the call site is 
executed relative to caller invocations:

   int call_site_count  = method()->scale_count(profile.count());
   int invoke_count     = method()->interpreter_invocation_count();
   int freq = call_site_count / invoke_count;
   int max_inline_size  = MaxInlineSize;
   // bump the max size if the call is frequent
   if ((freq >= InlineFrequencyRatio) ||
       (call_site_count >= InlineFrequencyCount) ||
       is_unboxing_method(callee_method, C) ||
       is_init_with_ea(callee_method, caller_method, C)) {
     max_inline_size = FreqInlineSize;
   }
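
The excerpt above can be turned into a small standalone sketch. It is only an 
approximation: the two size limits are the values quoted in this thread, the 
two frequency thresholds are assumed values for illustration, and the 
is_unboxing_method() / is_init_with_ea() cases are left out. The function 
names are hypothetical.

```cpp
#include <cassert>

// Size limits as quoted in this thread.
constexpr int kMaxInlineSize        = 35;
constexpr int kFreqInlineSize       = 325;
// Assumed values, for illustration only.
constexpr int kInlineFrequencyRatio = 20;
constexpr int kInlineFrequencyCount = 100;

// Picks the bytecode size limit for one call site.
inline int max_inline_size_for(int call_site_count, int invoke_count) {
    assert(invoke_count > 0);
    int freq = call_site_count / invoke_count;  // integer division, as in the excerpt
    // Bump the limit if the call is frequent.
    if (freq >= kInlineFrequencyRatio ||
        call_site_count >= kInlineFrequencyCount) {
        return kFreqInlineSize;
    }
    return kMaxInlineSize;
}

// True if a callee of the given bytecode size passes the size check.
inline bool fits_size_limit(int callee_bytecode_size,
                            int call_site_count, int invoke_count) {
    return callee_bytecode_size <=
           max_inline_size_for(call_site_count, invoke_count);
}
```

Under these assumptions a 200-byte callee fails the check at a call site 
executed 10 times over 5 caller invocations, but passes at one executed 150 
times over the same 5 invocations.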

And there is an additional inlining condition for all methods whose size 
is greater than MaxTrivialSize:

   if (!callee_method->was_executed_more_than(MinInliningThreshold)) {
     set_msg("executed < MinInliningThreshold times");
   }
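
As a rough sketch of how this condition combines with MaxTrivialSize, 
assuming a plain invocation counter stands in for was_executed_more_than(): 
the constant values and the function name below are assumptions for 
illustration, not taken from HotSpot.

```cpp
#include <cassert>

// Assumed values, for illustration only.
constexpr int kMaxTrivialSize       = 6;
constexpr int kMinInliningThreshold = 250;

// Returns true when the callee is rejected for not having run often enough.
// Methods no larger than kMaxTrivialSize are exempt from the check.
inline bool rejected_as_rarely_executed(int callee_bytecode_size,
                                        int callee_invocations) {
    assert(callee_bytecode_size >= 0 && callee_invocations >= 0);
    if (callee_bytecode_size <= kMaxTrivialSize) {
        return false;  // trivial method: execution count is not consulted
    }
    // Mirrors !was_executed_more_than(MinInliningThreshold).
    return callee_invocations <= kMinInliningThreshold;
}
```

So a 40-byte callee invoked 250 times is still rejected, one invoked 251 
times is not, and a 5-byte callee is never rejected by this check.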

Regards,
Vladimir

On 5/14/15 10:03 AM, Vitaly Davidovich wrote:
> I should also add that I see how inlining without taking call freq into
> account could lead to faster time to peak performance for methods that
> eventually get hot anyway but aren't at parse time.  Peak perf will be
> the same if the method is too big for parse inlining but eventually gets
> compiled due to reaching hotness.  Is that about right?
>
> sent from my phone
>
> On May 14, 2015 12:57 PM, "Vitaly Davidovich" <vitalyd at gmail.com> wrote:
>
>     Vladimir,
>
>     I'm comparing MaxInlineSize (35) with FreqInlineSize (325).  AFAIU,
>     MaxInlineSize drives which methods are inlined at parse time by C2,
>     whereas FreqInlineSize is the threshold for "late" (or what do you
>     guys call inlining after parsing?) inlining.  Most of the inlining
>     discussions (or worries, rather) seem to focus around the
>     MaxInlineSize value, and not FreqInlineSize, even if the target
>     method will get hot.
>
>         Usually, people care about 35 (= MaxInlineSize), because for
>         methods up to MaxInlineSize their call frequency is ignored. So,
>         fewer chances to end up with non-inlined call.
>
>
>     Ok, so for hot methods then MaxInlineSize isn't really a concern,
>     and FreqInlineSize would be the threshold to worry about (for C2
>     compiler) then? Why are people worried about inlining in cold paths
>     then?
>
>     Thanks Vladimir
>
>     On Thu, May 14, 2015 at 12:36 PM, Vladimir Ivanov
>     <vladimir.x.ivanov at oracle.com> wrote:
>
>         Vitaly,
>
>         Can you elaborate your question a bit? What do you compare
>         parse-time inlining with? Mentioning of C1 & profile pollution
>         in this context confuses me.
>
>         Usually, people care about 35 (= MaxInlineSize), because for
>         methods up to MaxInlineSize their call frequency is ignored. So,
>         fewer chances to end up with non-inlined call.
>
>         Best regards,
>         Vladimir Ivanov
>
>         On 5/14/15 7:09 PM, Vitaly Davidovich wrote:
>
>             Any pointers? Sorry to bug you guys, but just want to make
>             sure I understand this point as I see quite a bit of
>             discussion on core-libs and elsewhere where people are
>             worrying about the 35 bytecode size threshold for parse
>             inlining.
>
>             On Wed, May 13, 2015 at 3:36 PM, Vitaly Davidovich
>             <vitalyd at gmail.com> wrote:
>
>                  Hi guys,
>
>                  Could someone please explain the advantage, if any, of
>                  parse time inlining in C2? Given that FreqInlineSize is
>                  quite large by default, most hot methods will get
>                  inlined anyway (well, ones that can be for other
>                  reasons).  What is the advantage of parse time inlining?
>
>                  Is it quicker time to peak performance if C1 is reached
>                  first?
>
>                  Does it ensure that a method is inlined whereas it may
>                  not be if it's already compiled into a medium/large
>                  method otherwise?
>
>                  Is parse time inlining not susceptible to profile
>                  pollution? I suspect it is since the interpreter has
>                  already profiled the inlinee either way, but wanted to
>                  check.
>
>                  Anything else I'm not thinking about?
>
>                  Thanks
>
>
>