[9] RFR(M): 8167656: Unstable MethodHandle inlining causing huge performance variations

Nils Eliasson nils.eliasson at oracle.com
Thu Oct 13 13:02:48 UTC 2016


Hi,

Please review,

Background:
The fix for JDK-8160543 "C1: Crash in java.lang.String.indexOf in some 
java.sql tests" looked like it was causing a performance regression in 
the navierStokes benchmark. Consecutive runs of the benchmark showed very 
high variance between results (~100% speedup). The behaviour was bimodal: 
fast runs were as fast as before and slow runs as slow as before, but the 
proportion of slow runs increased.

Drilling down to the cause of this was hard: any intrusive debug flag 
(like PrintAssembly) made the benchmark end up in the fast case.

Problem description:
The problem is that sometimes invokeSpecial calls aren't inlined because 
no type profile data has been collected for that callsite even though it 
is hot.

This method is compiled on level 3 (full profile):

java.lang.invoke.LambdaForm$DMH/2008276237::invokeSpecial_LID_V (18 bytes)
     @ 1   java.lang.invoke.DirectMethodHandle::internalMemberName (8 bytes)   force inline by annotation
     @ 14   java.lang.invoke.MethodHandle::linkToSpecial(LIDL)V (0 bytes)   MemberName not constant

@14 is a native method: compiled as
java.lang.invoke.MethodHandle::linkToSpecial(LIDL)V (native) (static)
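
For readers who want to see the kind of call involved, here is a minimal 
sketch of a method handle with invokespecial semantics. The classes Base 
and Derived and the helper callSuper are hypothetical stand-ins (not taken 
from Nashorn); the assumption is that a handle obtained through 
Lookup.findSpecial is the DirectMethodHandle$Special kind seen in the 
profile, whose LambdaForm ends in linkToSpecial.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class SpecialHandleDemo {
    // Hypothetical stand-in for a class such as NumberArrayData.
    public static class Base {
        public double last;
        public void setElem(int index, double value) { last = index + value; }
    }

    public static class Derived extends Base {
        @Override
        public void setElem(int index, double value) { last = -1.0; }

        // findSpecial requires the lookup class to be the special caller,
        // so the lookup is created inside Derived itself.
        static MethodHandle superSetElem() throws ReflectiveOperationException {
            MethodType type = MethodType.methodType(void.class, int.class, double.class);
            return MethodHandles.lookup()
                    .findSpecial(Base.class, "setElem", type, Derived.class);
        }
    }

    // Calls Base.setElem non-virtually on a Derived receiver (shape: L, I, D -> V).
    public static double callSuper(Derived receiver, int index, double value) {
        try {
            MethodHandle mh = Derived.superSetElem();
            mh.invokeExact(receiver, index, value);
            return receiver.last;
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        // Base.setElem runs, not the override in Derived.
        System.out.println(callSuper(new Derived(), 1, 2.0)); // prints 3.0
    }
}
```

Running such code with -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining 
should show whether the call behind the handle gets inlined, although a 
toy like this is unlikely to reproduce the timing-dependent behaviour.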

Sometimes some method data has already been collected for 
invokeSpecial_LID_V (by the interpreter?); the generated code then looks 
slightly different and we always end up in the fast case. If there 
was no earlier profile data, it sometimes gets collected anyway, and then 
we end up in the fast case too.

When invokeSpecial_LID_V is later compiled on level 4 (full opt) this 
method data is available:

-----------------
static java.lang.invoke.LambdaForm$DMH/335334514::invokeSpecial_LID_V(Ljava/lang/Object;Ljava/lang/Object;ID)V
   interpreter_invocation_count:     5472
   invocation_counter:               5474
   backedge_counter:                    0
   mdo size: 640 bytes

   parameter types        0: stack(0) 'java/lang/invoke/DirectMethodHandle$Special'
                          1: stack(1) 'jdk/nashorn/internal/runtime/arrays/NumberArrayData'
0 fast_aload_0
1 invokestatic 18 <java/lang/invoke/DirectMethodHandle.internalMemberName(Ljava/lang/Object;)Ljava/lang/Object;>
     bci: 1    CounterData         count(5336)
               argument types      0: stack(0) 'java/lang/invoke/DirectMethodHandle$Special'
               return type 'java/lang/invoke/MemberName'
4 astore #5
6 aload_1
7 iload_2
8 dload_3
9 aload #5
11 checkcast 20 <java/lang/invoke/MemberName>
     bci: 11   ReceiverTypeData    count(0) nonprofiled_count(0) entries(0)
14 invokestatic 26 <java/lang/invoke/MethodHandle.linkToSpecial(Ljava/lang/Object;IDLjava/lang/invoke/MemberName;)V>
     bci: 14   CounterData         count(0)
               argument types      0: stack(0) none
                                   1: stack(4) none
-----------------
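
When a method data dump is long, the entries with a zero count can be 
picked out mechanically. The following is a small sketch that assumes the 
entry layout shown in the dump above ("bci: N  Kind  count(C)"); the class 
and method names are made up for illustration and are not part of any 
HotSpot tooling.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnprofiledSites {
    // Matches entries such as "bci: 14   CounterData         count(0)".
    // Assumption: the count directly follows the entry kind, as in the dump above.
    private static final Pattern ENTRY =
            Pattern.compile("bci:\\s*(\\d+)\\s+(\\w+)\\s+count\\((\\d+)\\)");

    /** Returns the bcis whose profile entry recorded a count of zero. */
    public static List<Integer> zeroCountBcis(String dump) {
        List<Integer> result = new ArrayList<>();
        Matcher m = ENTRY.matcher(dump);
        while (m.find()) {
            if (Long.parseLong(m.group(3)) == 0) {
                result.add(Integer.parseInt(m.group(1)));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String dump = String.join("\n",
                "     bci: 1    CounterData         count(5336)",
                "     bci: 11   ReceiverTypeData    count(0) nonprofiled_count(0) entries(0)",
                "     bci: 14   CounterData         count(0)");
        System.out.println(zeroCountBcis(dump)); // prints [11, 14]
    }
}
```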

No data at bci 14 makes the inliner go:

[.. snippet from huge inline log...]
@ 55 java.lang.invoke.LambdaForm$DMH/221378065::invokeSpecial_LID_V (18 bytes)   force inline by annotation
     @ 1   java.lang.invoke.DirectMethodHandle::internalMemberName (8 bytes)   force inline by annotation
     @ 14 jdk.nashorn.internal.runtime.arrays.NumberArrayData::setElem (24 bytes)   call site not reached
..

@14 is a linkToSpecial that was redirected to an invokeSpecial, which is 
linked to NumberArrayData::setElem, and that call site is missing a 
profile.

We can note that the profile data is still there, as the method parameter 
profile has caught the class.

Why does this only happen sometimes?
linkToSpecial is a shared form. It may be compiled into a signature 
specialized method - but it is native and is missing type profile 
counters. We will get no data when it is running. The level 3 compiled 
invokeSpecial_LID_V sometimes only lives for a very short time (~50ms), 
and the call stack of the benchmark is pretty deep. The number of 
invocations varies between 0 and about 500, and is often either very low 
or quite high. It looks like the call is used in some phases, and 
depending on the timing it may not have been run.

Solution:
Don't forbid method handle call sites that are missing type profile data 
from being inlined; the surrounding invokeSpecial probably will be. Leave 
it to the normal inlining heuristics to decide if it is hot enough.
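
As a rough illustration of the change in behaviour, here is a toy model of 
the decision. It is not HotSpot code: the CallSite class, its fields and 
both methods are hypothetical, and the real logic is in the webrev linked 
below. The only thing taken from the logs above is the "call site not 
reached" message.

```java
public class InlineDecisionSketch {
    /** Hypothetical summary of what the compiler knows about one call site. */
    public static final class CallSite {
        final long profiledCount;       // invocations recorded in the caller's profile
        final boolean methodHandleCall; // true for sites reached through linkTo* forms

        public CallSite(long profiledCount, boolean methodHandleCall) {
            this.profiledCount = profiledCount;
            this.methodHandleCall = methodHandleCall;
        }
    }

    /** Old behaviour (model): a zero count always rejects the call site. */
    public static String before(CallSite site) {
        if (site.profiledCount == 0) {
            return "call site not reached";
        }
        return "use normal heuristics";
    }

    /** Proposed behaviour (model): a zero count is ignored for method handle sites. */
    public static String after(CallSite site) {
        if (site.profiledCount == 0 && !site.methodHandleCall) {
            return "call site not reached";
        }
        return "use normal heuristics";
    }

    public static void main(String[] args) {
        CallSite bci14 = new CallSite(0, true); // like bci 14 in the dump above
        System.out.println(before(bci14)); // prints "call site not reached"
        System.out.println(after(bci14));  // prints "use normal heuristics"
    }
}
```

In this model, ordinary call sites with a zero count are still rejected; 
only method handle call sites fall through to the usual heuristics.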

Lessons:
Type profiling of MethodHandle and LambdaForm code may need to be 
revisited. We seem to profile both too little and too much. Native 
methods get no profile, but in normal level 3 code we profile the 
same reference at every use, causing a lot of overhead even in trivial 
non-branchy code.

Testing:
The proposed fix makes us end up in the fast case on every run. It 
should remove a lot of variance from Nashorn benchmarks.

Bug: https://bugs.openjdk.java.net/browse/JDK-8167656
Webrev: http://cr.openjdk.java.net/~neliasso/8167656/webrev.01/

Regards,
Nils Eliasson


More information about the hotspot-compiler-dev mailing list