request for review (L): 7121756 Improve C1 inlining policy by using profiling at call sites

Thu May 3 08:53:01 PDT 2012

Dec 15 is a long time ago :-)  Sorry, I thought this had already been reviewed.

src/share/vm/c1/c1_GraphBuilder.cpp:

+  if (C1TypeProfileInlining && 
+      code != Bytecodes::_invokestatic &&
+      code != Bytecodes::_invokespecial &&
+      code != Bytecodes::_invokedynamic &&
+      (code != Bytecodes::_invokevirtual || !target->is_loaded() || !target->is_final_method())) {

Can this be reversed and simplified?
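
A minimal sketch of the kind of reversal I have in mind, assuming the intent is "profile unless the callee is already statically bound". The Bytecodes and Target types below are simplified stand-ins so the sketch compiles outside HotSpot, and is_statically_bound() and conditions_agree() are hypothetical names, not something in the patch:

```cpp
#include <cassert>

// Simplified stand-ins for Bytecodes::Code and ciMethod (assumptions for
// this sketch only; the real HotSpot types are much richer).
struct Bytecodes {
  enum Code { _invokevirtual, _invokespecial, _invokestatic,
              _invokeinterface, _invokedynamic };
};

struct Target {
  bool loaded;
  bool final_method;
  bool is_loaded() const { return loaded; }
  bool is_final_method() const { return final_method; }
};

// The condition as written in the patch.
bool original_condition(bool C1TypeProfileInlining, Bytecodes::Code code, const Target* target) {
  return C1TypeProfileInlining &&
         code != Bytecodes::_invokestatic &&
         code != Bytecodes::_invokespecial &&
         code != Bytecodes::_invokedynamic &&
         (code != Bytecodes::_invokevirtual || !target->is_loaded() || !target->is_final_method());
}

// Hypothetical helper: true when the callee is known without looking at
// the receiver type, so type profiling would not help.
bool is_statically_bound(Bytecodes::Code code, const Target* target) {
  switch (code) {
    case Bytecodes::_invokestatic:
    case Bytecodes::_invokespecial:
    case Bytecodes::_invokedynamic:
      return true;
    case Bytecodes::_invokevirtual:
      return target->is_loaded() && target->is_final_method();
    default:
      return false;
  }
}

// The reversed form: one positive predicate, negated once.
bool reversed_condition(bool C1TypeProfileInlining, Bytecodes::Code code, const Target* target) {
  return C1TypeProfileInlining && !is_statically_bound(code, target);
}

// Exhaustively compares both forms over every input combination.
bool conditions_agree() {
  const Bytecodes::Code codes[] = {
    Bytecodes::_invokevirtual, Bytecodes::_invokespecial, Bytecodes::_invokestatic,
    Bytecodes::_invokeinterface, Bytecodes::_invokedynamic
  };
  for (int flag = 0; flag < 2; flag++) {
    for (int c = 0; c < 5; c++) {
      for (int loaded = 0; loaded < 2; loaded++) {
        for (int final_method = 0; final_method < 2; final_method++) {
          Target t = { loaded != 0, final_method != 0 };
          if (original_condition(flag != 0, codes[c], &t) !=
              reversed_condition(flag != 0, codes[c], &t)) {
            return false;
          }
        }
      }
    }
  }
  return true;
}
```

The two forms are equivalent by De Morgan's laws; the helper just gives the negated part a name.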

src/share/vm/c1/c1_LIR.cpp:

+  assert (code == lir_checkcast ||code == lir_instanceof, "expects checkcast or instanceof only");

Spacing is odd.

src/share/vm/ci/ciMethod.cpp:

+bool ciMethod::profile_is_hot_helper(ciProfileData* data, int bci, bool &warm, bool full) {

What is "full"?  Maybe you can find a better name.

src/share/vm/code/compiledIC.cpp:

+bool CompiledProfile::is_profiled(NativeCall* call) {
+#ifdef COMPILER2
+  return false;
+#endif
+#ifdef COMPILER1
+  if (!C1ProfileInlining) return false;

I think this will produce a compiler warning (or error) when compiling C2, since that build defines both -DCOMPILER2 and -DCOMPILER1, which leaves the COMPILER1 code unreachable after the unconditional return.
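
One way to avoid that, sketched here with stand-in types, is to make the two branches mutually exclusive with #if/#elif. The macros are defined locally only to simulate a build that passes both flags; NativeCall, C1ProfileInlining and is_profiled_sketch() are simplified placeholders, and the body of the COMPILER1 branch is a dummy because the rest of the function is not shown above:

```cpp
#include <cassert>
#include <cstddef>

// Simulate a build that passes both -DCOMPILER1 and -DCOMPILER2
// (an assumption made only for this sketch).
#ifndef COMPILER1
#define COMPILER1
#endif
#ifndef COMPILER2
#define COMPILER2
#endif

// Simplified placeholders for the HotSpot flag and type.
bool C1ProfileInlining = true;
struct NativeCall {};

bool is_profiled_sketch(NativeCall* call) {
#if defined(COMPILER2)
  // Same behavior as the patch whenever COMPILER2 is defined,
  // but no code follows the return in this configuration.
  (void)call;
  return false;
#elif defined(COMPILER1)
  if (!C1ProfileInlining) return false;
  // Placeholder for the real lookup, which is not part of this excerpt.
  return call != NULL;
#else
  (void)call;
  return false;
#endif
}
```

With both macros defined, only the first branch is compiled, so there is nothing after the return for the compiler to complain about.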

+#ifdef ARM
+    // The first word of a static call stub has no reloc info. So if
+    // we don't find a reloc info we know, it's a static call stub.
+    return obj != NULL;
+#endif

Why does ARM not require a reloc info?

More to come...

I remember you mentioning that you planned to rework the patch.  Is there a new version available?

-- Chris

On Dec 15, 2011, at 6:52 AM, Roland Westrelin wrote:

> http://cr.openjdk.java.net/~roland/7121756/webrev.00/
> 
> Implements profile based inlining in C1. 
> 
> Execution of a method starts interpreted as usual. A method transitions from interpreted to compiled in the usual way as well. When the method is compiled, the compiler identifies a number of call sites that are candidates for profiling and further inlining. At those call sites, the method is compiled so that a per call site counter is incremented and tested for overflow when the call site is used. On first call site resolution, a timestamp is also recorded. The count and timestamp are used to compute a frequency. A frequency higher than a high water mark detects a hot call site. A hot call site triggers a recompilation of the caller method in which the callee is inlined. A frequency higher than a low water mark detects a warm call site. Otherwise the call site is cold. Recompiling with the extra inlining won't bring a performance advantage for a warm or cold call site. But keeping the profiling on at a warm call site is detrimental so it is dropped. At a cold call site profiling can be kept enabled to trigger later recompilation if the call site becomes hot.
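
To check my reading of the hot/warm/cold policy described above, here is a minimal sketch. The struct, the function name, the time unit and the threshold values in the usage are all hypothetical; the description only says that a count and a timestamp give a frequency that is compared against two water marks:

```cpp
#include <cassert>

enum CallSiteState { COLD, WARM, HOT };

// Hypothetical per call site data: a counter plus the time of first resolution.
struct CallSiteProfile {
  long count;                 // number of calls counted by the stub
  double first_resolved_sec;  // timestamp recorded on first resolution
};

// Classifies a call site from its call frequency (calls per second here,
// which is an assumption; the description does not give the unit).
CallSiteState classify(const CallSiteProfile& p, double now_sec,
                       double low_water_mark, double high_water_mark) {
  double elapsed = now_sec - p.first_resolved_sec;
  if (elapsed <= 0.0) return COLD;  // no meaningful frequency yet
  double frequency = p.count / elapsed;
  if (frequency > high_water_mark) return HOT;   // recompile caller with callee inlined
  if (frequency > low_water_mark)  return WARM;  // drop profiling, no recompilation
  return COLD;                                   // keep profiling, may become hot later
}
```

For example, with water marks of 100 and 500 calls per second, a site that counted 1000 calls in its first second would be hot, one with 200 calls warm, and one with 10 calls cold.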
> 
> To perform profiling, the compiler identifies the candidate call sites and generates a stub similar to the static call stub in the nmethod's stub area. The profile call stub performs the following steps:
> 1- load mdo pointer in register
> 2- increment counter for call site
> 3- branch to runtime if counter crosses threshold
> 4- jump to callee
> 
> On call site resolution, for a call to a compiled method, the jump (4- above) is patched with the resolved call site info (to continue to the callee's code or transition stub), then the call site is patched to point to the profile call stub. Profiling can later be fully disabled for the call site (if the call site is polymorphic or if the compilation policy finds it's better not to profile the call site anymore) by re-resolving the call.
> 
> The compiler also uses profile data to inline a frequent virtual method if profile data suggests a single receiver class. State changes of inline caches associated with call sites (performed in the runtime) are used to collect receiver class data. Correctness during execution is enforced with a compiled guard and a deoptimization can be triggered.