request for review (L): 7121756 Improve C1 inlining policy by using profiling at call sites

Thu Dec 15 07:28:42 PST 2011

Hi Roland,

Interesting. Now C1's getting more of the heavier gears :-)

A few things I'd like to ask:

1. How does this change interact with tiered mode?
I see that in a tiered build, C1ProfileInlining is set to false in
arguments.cpp. So this set of changes is only meant for a Client VM, right?

2. Is this change mainly targeted at embedded builds?
In a desktop/server scenario, I had a feeling that the Client VM was going
away, replaced by a unified tiered VM in the future. Hence the question.

3. Are there any plans to do a late-inlining phase for C1?
I tried to make C1 able to inline at more callsites by adding
Phi::exact_type(), but failed [1]. The main reason for failing is that
before the whole HIR graph is built, the CFG isn't stable yet, and querying
Phi::exact_type() during graph building just won't work. But if there's
more compilation budget to spend, say we're in a hot method, an optional
inlining phase after the HIR graph is built might be profitable. I'd like
to see more comments on this point.

Regards,
Kris Mok

[1]:
http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2011-August/006049.html

On Thu, Dec 15, 2011 at 10:52 PM, Roland Westrelin <roland.westrelin at oracle.com> wrote:

> http://cr.openjdk.java.net/~roland/7121756/webrev.00/
>
> Implements profile based inlining in C1.
>
> Execution of a method starts interpreted as usual. A method transitions
> from interpreted to compiled in the usual way as well. When the method is
> compiled, the compiler identifies a number of call sites that are
> candidates for profiling and further inlining. At those call sites, the
> method is compiled so that a per call site counter is incremented and
> tested for overflow when the call site is used. On first call site
> resolution, a timestamp is also recorded. The count and timestamp are used
> to compute a frequency. A frequency higher than a high water mark detects a
> hot call site. A hot call site triggers a recompilation of the caller
> method in which the callee is inlined. A frequency higher than a low water
> mark detects a warm call site. Otherwise the call site is cold. Recompiling
> with the extra inlining won't bring a performance advantage for a warm or
> cold call site. But keeping the profiling on at a warm call site is
> detrimental so it is dropped. At a cold call site profiling can be kept
> enabled to trigger later recompilation if the call site becomes hot.
>
> To perform profiling, the compiler identifies the candidate call sites and
> generates a stub similar to the static call stub in the nmethod's stub
> area. The profile call stub performs the following steps:
> 1- load mdo pointer in register
> 2- increment counter for call site
> 3- branch to runtime if counter crosses threshold
> 4- jump to callee
>
> On call site resolution, for a call to a compiled method, the jump (4-
> above) is patched with the resolved call site info (to continue to callee's
> code or transition stub) then the call site is patched to point to the
> profile call stub. Profiling can be later fully disabled for the call site
> (if the call site is polymorphic or if the compilation policy finds it's
> better to not profile the call site anymore) by reresolving the call.
>
> The compiler also uses profile data to inline a frequent virtual method if
> profile data suggests a single receiver class. State changes of inline
> caches associated with call sites (performed in the runtime) are used to
> collect receiver class data. Correctness during execution is enforced with
> a compiled guard and a deoptimization can be triggered.
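
To check my reading of the hot/warm/cold policy quoted above, here is a
minimal sketch of how I understand the classification. It is not taken from
the webrev: the threshold values, the calls-per-second unit, and all names
(HIGH_WATER_MARK, LOW_WATER_MARK, call_frequency, classify, next_action) are
hypothetical and only illustrate the description.

```python
from enum import Enum


class Temperature(Enum):
    HOT = "hot"
    WARM = "warm"
    COLD = "cold"


# Hypothetical thresholds in calls per second; the real values are not
# given in the description.
HIGH_WATER_MARK = 1000.0
LOW_WATER_MARK = 10.0


def call_frequency(count, first_resolved_at, now):
    """Frequency from the per call site counter and the timestamp
    recorded on first call site resolution (seconds assumed)."""
    elapsed = now - first_resolved_at
    if elapsed <= 0:
        return 0.0  # no elapsed time yet, so no meaningful frequency
    return count / elapsed


def classify(frequency):
    """Above the high water mark is hot, above the low one is warm,
    anything else is cold."""
    if frequency > HIGH_WATER_MARK:
        return Temperature.HOT
    if frequency > LOW_WATER_MARK:
        return Temperature.WARM
    return Temperature.COLD


def next_action(temperature):
    """Action the description associates with each class."""
    if temperature is Temperature.HOT:
        return "recompile caller with callee inlined"
    if temperature is Temperature.WARM:
        return "drop profiling"
    return "keep profiling"


if __name__ == "__main__":
    freq = call_frequency(count=5000, first_resolved_at=0.0, now=1.0)
    print(classify(freq).value, "->", next_action(classify(freq)))
```

If that matches the intent, the interesting case is the warm one: profiling
is dropped there because its overhead is paid often but never pays off in a
recompilation.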