RFR: 8221542: ~15% performance degradation due to less optimized inline decision

Thu Apr 18 09:54:20 UTC 2019

Hi Vladimir,

> Though I don't consider parallel execution case as problematic,
> I got a better idea while browsing the code :-)
>
>   http://cr.openjdk.java.net/~vlivanov/jiefu/8221542/webrev.01

Aha! I've found a way to show you that the following condition in 
patch[1] does NOT hold with the parallel execution of the caller.
-----------------------------------------------
if (caller_method->was_executed_more_than(1))  return false; // trust profile
-----------------------------------------------

Step 1: Apply this patch
-----------------------------------------------
diff -r 5de35f58f70c src/hotspot/share/opto/bytecodeInfo.cpp

--- a/src/hotspot/share/opto/bytecodeInfo.cpp   Thu Apr 18 02:45:02 2019 +0200
+++ b/src/hotspot/share/opto/bytecodeInfo.cpp   Thu Apr 18 17:32:16 2019 +0800
@@ -374,6 +374,8 @@
        // Inlining was forced by CompilerOracle, ciReplay or annotation
      } else if (profile.count() == 0) {
        // don't inline unreached call sites
+       tty->print_cr("caller_method count = %d, was_executed_more_than(1) is %s",
+            caller_method->interpreter_invocation_count(), caller_method->was_executed_more_than(1) ? "true" : "false");
         set_msg("call site not reached");
         return false;
      }
-----------------------------------------------

Step 2: Run SPECjvm2008's scimark.monte_carlo with the reproduction 
script[2] on a machine with high parallelism.

Step 3: Just wait and see the result.


For example, I ran it on an i7-8700 machine with just 12 threads.
Here is the result, showing that profile.count() is 0 while 
caller_method->was_executed_more_than(1) is true.
-----------------------------------------------
   Benchmark:   scimark.monte_carlo
   Run mode:    timed run
   Test type:   multi
   Threads:     12
   Warmup:      120s
   Iterations:  1
   Run length:  240s
     275   72             java.lang.StringBuilder::append (8 bytes)   made not entrant
     275   99             java.io.File::<init> (47 bytes) made not entrant

Warmup (120s) begins: Thu Apr 18 17:25:33 CST 2019
     281  113  s spec.benchmarks.scimark.utils.Random::nextDouble (124 bytes)
     282  114 % spec.benchmarks.scimark.monte_carlo.MonteCarlo::integrate @ 15 (68 bytes)
               s             @ 22 spec.benchmarks.scimark.utils.Random::nextDouble (124 bytes) inline (hot)
               s             @ 28 spec.benchmarks.scimark.utils.Random::nextDouble (124 bytes) inline (hot)
     432  114 % spec.benchmarks.scimark.monte_carlo.MonteCarlo::integrate @ 15 (68 bytes)   made not entrant
     433  115 spec.benchmarks.scimark.monte_carlo.MonteCarlo::integrate (68 bytes)

caller_method count = 13, was_executed_more_than(1) is true

                             @ 6 spec.benchmarks.scimark.utils.Random::<init> (53 bytes) call site not reached
               s             @ 22 spec.benchmarks.scimark.utils.Random::nextDouble (124 bytes) inline (hot)
               s             @ 28 spec.benchmarks.scimark.utils.Random::nextDouble (124 bytes) inline (hot)
     436  116 % spec.benchmarks.scimark.monte_carlo.MonteCarlo::integrate @ 15 (68 bytes)
               s             @ 22 spec.benchmarks.scimark.utils.Random::nextDouble (124 bytes) inline (hot)
               s             @ 28 spec.benchmarks.scimark.utils.Random::nextDouble (124 bytes) inline (hot)
-----------------------------------------------
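
To make the scenario concrete, here is a minimal Java sketch of the code
shape that the log above suggests. It is not the SPECjvm2008 source: the
class ParallelCallerSketch and the helper runParallel are hypothetical, and
java.util.Random stands in for spec.benchmarks.scimark.utils.Random. Each
thread enters integrate() only once, so the caller's invocation count grows
with the number of threads, while the constructor call in front of the loop
runs just once per invocation. Whether C2 actually sees a zero count for
that call site depends on VM flags and timing, so please treat this as an
illustration rather than a guaranteed reproducer.

```java
import java.util.Random;

public class ParallelCallerSketch {

    // Hypothetical stand-in for MonteCarlo::integrate: one call before the
    // loop (the constructor), then a long-running loop that is a natural
    // candidate for OSR compilation.
    static double integrate(int numSamples) {
        Random r = new Random(113);          // executed once per invocation
        int underCurve = 0;
        for (int i = 0; i < numSamples; i++) {
            double x = r.nextDouble();       // hot call site
            double y = r.nextDouble();       // hot call site
            if (x * x + y * y <= 1.0) {
                underCurve++;
            }
        }
        return 4.0 * underCurve / numSamples;
    }

    // Runs integrate() once on each of `threads` threads in parallel and
    // returns one result per thread.
    static double[] runParallel(int threads, int numSamples) {
        double[] results = new double[threads];
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> results[id] = integrate(numSamples));
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();                    // join makes results[id] visible
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        // 12 threads, matching the thread count in the run above.
        for (double pi : runParallel(12, 50_000_000)) {
            System.out.println(pi);
        }
    }
}
```

With 12 threads, integrate() is entered 12 times in parallel, which is the
situation in which the invocation count of the caller alone says little
about whether the call site in front of the loop has been profiled.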

So would you agree to remove that condition from your patch[1]?
Thanks a lot.

Best regards,
Jie


[1] http://cr.openjdk.java.net/~vlivanov/jiefu/8221542/webrev.00/
[2] http://cr.openjdk.java.net/~jiefu/monte_carlo-perf-drop/reproduce.sh