Odd inlining failure
Vitaly Davidovich
vitalyd at gmail.com
Wed Sep 28 14:49:34 UTC 2016
Apologies, I accidentally dropped the list from my reply to Roland (quoted
below).
On Wed, Sep 28, 2016 at 10:38 AM, Roland Westrelin <rwestrel at redhat.com>
wrote:
>
> > In this case when b() is called its invocation count is +1 to c() because
> > c() is only called by b(). Now, a() has a big switch statement, with one
> > arm calling into b(). a() is called in a loop of sorts. So I think the
> > switch arm calling b() gets hot and inlining starts. But since inlining is
> > going top-down here, I suspect it's failing to inline helper methods, such
> > as c(), that are just as hot as b() but the 10000'th invocation hasn't been
> > recorded yet? This seems kind of broken though, if true, so I'm wondering
> > if I'm missing something.
> >
> > When recursively inlining, starting at a hot method, do the recursive
> > callsites, like c(), need to also have exactly 10,000 (or more)
> > invocations? What if it's, say, 9995?
>
> CompileThreshold=10000 is when compilation is triggered. It doesn't come
> into play to decide whether inlining happens or not. Also if you have
> loops, compilation is triggered when invocation counter + backedge
> counter exceeds CompileThreshold.
>
> Note also, that profiling (invocation counters at calls etc.) doesn't
> start until a method has been invoked a minimum number of times:
> InterpreterProfilePercentage % of CompileThreshold. So profiling doesn't
> start until invocation counter + backedge counter is greater than 3300
> by default with tiered off. If your method is inlined before it's been
> invoked 3300 times, all the call sites in the method are cold.
>
Ah, that may be the reason: there's a loop in the outer method (a()). I'll
need to look at the compilation log or the PrintMethodData output that you
suggested.
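
To make the thresholds above concrete, here is a rough sketch of the
arithmetic as I understand it from Roland's description. The helper names are
hypothetical (they are not HotSpot functions), and the ">" comparisons simply
follow the wording "exceeds" / "greater than" above rather than the actual
VM source.

```cpp
#include <cassert>

// Hypothetical helper: number of (invocations + backedges) after which the
// interpreter starts profiling, i.e. InterpreterProfilePercentage % of
// CompileThreshold (33% of 10000 = 3300 with the defaults quoted above).
int profile_start_threshold(int compile_threshold, int interpreter_profile_percentage) {
    return compile_threshold * interpreter_profile_percentage / 100;
}

// Hypothetical helper: profiling is active once the combined counters pass
// the profiling threshold.
bool profiling_active(int invocations, int backedges, int profile_threshold) {
    return invocations + backedges > profile_threshold;
}

// Hypothetical helper: compilation is triggered once the combined counters
// exceed CompileThreshold. Loop backedges count toward this sum, so a method
// with a loop can be compiled after very few invocations.
bool compile_triggered(int invocations, int backedges, int compile_threshold) {
    return invocations + backedges > compile_threshold;
}
```

Under these assumptions, a method entered only 100 times but with 9950 loop
backedges would already trigger compilation, which is the situation I suspect
for a().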
>
> And if methods are invoked by multiple threads, updates to the counters
> can be lost.
>
Single thread here.
>
> The code that triggers inlining is:
>
>   int call_site_count = method()->scale_count(profile.count());
>   int invoke_count = method()->interpreter_invocation_count();
>
>   assert(invoke_count != 0, "require invocation count greater than zero");
>   int freq = call_site_count / invoke_count;
>
>   // bump the max size if the call is frequent
>   if ((freq >= InlineFrequencyRatio) ||
>       (call_site_count >= InlineFrequencyCount) ||
>       is_unboxing_method(callee_method, C) ||
>       is_init_with_ea(callee_method, caller_method, C)) {
>
>     max_inline_size = C->freq_inline_size();
>     if (size <= max_inline_size && TraceFrequencyInlining) {
>       CompileTask::print_inline_indent(inline_level());
>       tty->print_cr("Inlined frequent method (freq=%d count=%d):", freq,
>                     call_site_count);
>       CompileTask::print_inline_indent(inline_level());
>       callee_method->print();
>       tty->cr();
>     }
>   } else {
>     // Not hot. Check for medium-sized pre-existing nmethod at cold sites.
>     if (callee_method->has_compiled_code() &&
>         callee_method->instructions_size() > inline_small_code_size) {
>       set_msg("already compiled into a medium method");
>       return false;
>     }
>   }
>   if (size > max_inline_size) {
>     if (max_inline_size > default_max_inline_size) {
>       set_msg("hot method too big");
>     } else {
>       set_msg("too big");
>     }
>     return false;
>   }
>
> So a call site is hot if the call site count exceeds
> InlineFrequencyCount (100) or the frequency (the ratio of the number of
> times the call was taken to the number of times the caller was entered)
> exceeds InlineFrequencyRatio (20). InlineFrequencyCount is way below 10000.
>
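
For my own reference, a minimal standalone sketch of just the count/ratio part
of that test. The struct and function names are made up for illustration, the
defaults are the values Roland quotes, and the is_unboxing_method /
is_init_with_ea clauses are deliberately left out.

```cpp
#include <cassert>

// Illustrative stand-ins for the VM flags named above (values as quoted).
struct InlineFlags {
    int inline_frequency_ratio = 20;   // InlineFrequencyRatio
    int inline_frequency_count = 100;  // InlineFrequencyCount
};

// Returns true when a call site passes the count/ratio part of the "hot"
// test. This is only a sketch of the quoted condition, not the full C2 check.
bool is_hot_call_site(int call_site_count, int invoke_count,
                      const InlineFlags& flags = InlineFlags{}) {
    assert(invoke_count != 0);
    // Integer division, mirroring the quoted code.
    int freq = call_site_count / invoke_count;
    return freq >= flags.inline_frequency_ratio ||
           call_site_count >= flags.inline_frequency_count;
}
```

So a call site profiled 9995 times in a caller entered 10000 times is hot on
the count alone, whereas a call site with no recorded profile (count 0) is
not, whatever the caller's invocation count.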
> Do you have this as a simple test case that you can share?
>
I don't yet - I'll see if I can reproduce something. As noted,
microbenchmarks/reduced test cases usually do the right thing but when
same/similar code shapes/call graphs are incorporated into a large app,
they don't.
>
> > I need to go look at the inlining heuristic code again, but maybe you know
> > offhand.
> >
> > As a general observation, I'm seeing lots of inlining failures, for a
> > variety of reasons, in a complex app where I think inlining would help.
> > The heuristics aren't doing the "right" thing. I know there are a few
> > longstanding JBS entries around inlining, but I'm wondering if they will
> > ever be addressed or whether Graal simply takes over for C2. I wonder if
> > Oracle or Red Hat or anyone else looks at inlining output on large apps as a
> > way to assess its effect? Microbenchmarks are usually fine because the
> > profile is different, methods typically don't fail to inline because of
> > InlineSmallCode, etc.
> >
> > I know I'm preaching to the choir and I apologize for the semi-rant, but
> > inlining is paramount to Java performance, more so than other languages (e.g.
> > C/C++) because of all the safety checks. Given @ForceInline isn't really
> > available for end users, it's a huge pain and sometimes practically
> > impossible to convince C2 to inline something.
> >
> > I understand Graal has better inlining properties (I believe it pseudo
> > inlines to see if it's profitable, regardless of bytecode size). Is that
> > the Hotspot answer to improved inlining?
> >
> > What the heck is everyone else doing for large apps with lots of hot
> > callsites? :) I can move some code around manually to outline some
> > (uncommon) code to slim down methods, but that's a hack IMO.
>
> You didn't send that email to the list. Was it intended?
Argh - no, that was unintentional. I'm adding the list back in here.
> I'm curious
> what others would say. All I can say is that inlining heuristics are a
> known weakness of C2. Improving them is not a simple project. Also,
> having Graal on the horizon probably doesn't help: it could be a lot of
> work that will be of little value when Graal is here, whenever that
> happens.
>
> Roland.
>
Thanks again