[9] RFR(M): 8167656: Unstable MethodHandle inlining causing huge performance variations

Nils Eliasson nils.eliasson at oracle.com
Wed Nov 9 20:45:56 UTC 2016


Hi,

After additional discussions with Vladimir I., Igor and Claes we tracked 
down the root cause to be missing profiling on invokestatic that is 
necessary for MH.linkToSpecial. (Can be seem in the examples earlier in 
this thread.) This missing profiling in compiled code was sometimes 
hidden by interpreter profiling that kicks in when C1 doesn't keep up 
with compiler requests.

This bug was be introduced and included in b120, which saw introduction 
of a lot of performance variation. In the same build was also indify 
string concat which which was assumed to be the cause of some of the 
changed behavior.

The bug was intruduced by JDK-8154172 which missed that calls without a 
reciver also should have profiling.

I fixed another issue in the same code in JDK-8160543 but missed this case.

The change is smaller than it looks at first sight. The check (recv != 
NULL) is moved out, making it only guard the nullcheck insertion.

Webrev: http://cr.openjdk.java.net/~neliasso/8167656/webrev.02/

Testing:

- nashorn octane for perf stability
- JCK/BINC, LinkageErrorTest for correctness

Please have a look,
Nils


On 2016-10-26 14:20, Vladimir Ivanov wrote:
> Nils,
>
> Thanks for taking care of the problem.
>
> It sounds like you fixed the symptoms (missed inlining opportunity), 
> but not the root cause.
>
> Invokers are heavily shared, but it can't explain inconsistent profile 
> info: the method was called 5k times, but some per-bytecode info is 
> absent (counts at @11 & @14 are 0s, but @1 it's ~5k) [1]. The method 
> has a single basic block and DMH.internalMemberName doesn't throw 
> exceptions (it's just a field getter).
>
> It doesn't even look MH-specific. Do you know why does it happen?
>
> Best regards,
> Vladimir Ivanov
>
> [1]
> -----------------
> static 
> java.lang.invoke.LambdaForm$DMH/335334514::invokeSpecial_LID_V(Ljava/lang/Object;Ljava/lang/Object;ID)V
>   interpreter_invocation_count:     5472
>   invocation_counter:               5474
>   backedge_counter:                    0
>   mdo size: 640 bytes
>
>   parameter types        0: stack(0) 
> 'java/lang/invoke/DirectMethodHandle$Special'
>                          1: stack(1) 
> 'jdk/nashorn/internal/runtime/arrays/NumberArrayData'
> 0 fast_aload_0
> 1 invokestatic 18 
> <java/lang/invoke/DirectMethodHandle.internalMemberName(Ljava/lang/Object;)Ljava/lang/Object;>
>     bci: 1    CounterData         count(5336)
>               argument types      0: stack(0) 
> 'java/lang/invoke/DirectMethodHandle$Special'
>               return type 'java/lang/invoke/MemberName'
> 4 astore #5
> 6 aload_1
> 7 iload_2
> 8 dload_3
> 9 aload #5
> 11 checkcast 20 <java/lang/invoke/MemberName>
>     bci: 11   ReceiverTypeData    count(0) nonprofiled_count(0) 
> entries(0)
> 14 invokestatic 26 
> <java/lang/invoke/MethodHandle.linkToSpecial(Ljava/lang/Object;IDLjava/lang/invoke/MemberName;)V>
>     bci: 14   CounterData         count(0)
>               argument types      0: stack(0) none
>                                   1: stack(4) none
> -----------------
>
> On 10/13/16 4:02 PM, Nils Eliasson wrote:
>> Hi,
>>
>> Please review,
>>
>> Background:
>> The fix for JDK-8160543 "C1: Crash in java.lang.String.indexOf in some
>> java.sql tests" looked like it was causing a performance regression in
>> navierStokes benchmark. Consecutive runs of the benchmark showed very
>> high variance between results (~100% speedup). It showed a bimodal
>> behaviour where fast runs were as fast, and slow as slow, but the
>> proportion of slow increased.
>>
>> Drilling down to the cause of this was hard, any intrusive debug flag
>> (like PrintAssembly) made the benchmark end up in the fast case.
>>
>> Problem description:
>> The problem is that sometimes invokeSpecial calls aren't inlined because
>> no type profile data has been collected for that callsite even though it
>> is hot.
>>
>> This method is compiled on level 3 (full profile):
>>
>> java.lang.invoke.LambdaForm$DMH/2008276237::invokeSpecial_LID_V (18 
>> bytes)
>>     @ 1 java.lang.invoke.DirectMethodHandle::internalMemberName (8
>> bytes)   force inline by annotation
>>     @ 14   java.lang.invoke.MethodHandle::linkToSpecial(LIDL)V (0
>> bytes)   MemberName not constant
>>
>> @14 is a native method: compiled as
>> java.lang.invoke.MethodHandle::linkToSpecial(LIDL)V (native) (static)
>>
>> Sometimes there are already some method data collected for
>> invokeSpecial_LID_V (interpreter?), then the generated code will look
>> slightly different and we will always end up in the fast case. If there
>> were no earlier profile data, it sometimes get collected and then we end
>> up in the fast case too.
>>
>> When invokeSpecial_LID_V is later compiled on level 4 (full opt) this
>> method data is available:
>>
>> -----------------
>> static
>> java.lang.invoke.LambdaForm$DMH/335334514::invokeSpecial_LID_V(Ljava/lang/Object;Ljava/lang/Object;ID)V 
>>
>>
>>   interpreter_invocation_count:     5472
>>   invocation_counter:               5474
>>   backedge_counter:                    0
>>   mdo size: 640 bytes
>>
>>   parameter types        0: stack(0)
>> 'java/lang/invoke/DirectMethodHandle$Special'
>>                          1: stack(1)
>> 'jdk/nashorn/internal/runtime/arrays/NumberArrayData'
>> 0 fast_aload_0
>> 1 invokestatic 18
>> <java/lang/invoke/DirectMethodHandle.internalMemberName(Ljava/lang/Object;)Ljava/lang/Object;> 
>>
>>
>>     bci: 1    CounterData         count(5336)
>>               argument types      0: stack(0)
>> 'java/lang/invoke/DirectMethodHandle$Special'
>>               return type 'java/lang/invoke/MemberName'
>> 4 astore #5
>> 6 aload_1
>> 7 iload_2
>> 8 dload_3
>> 9 aload #5
>> 11 checkcast 20 <java/lang/invoke/MemberName>
>>     bci: 11   ReceiverTypeData    count(0) nonprofiled_count(0) 
>> entries(0)
>> 14 invokestatic 26
>> <java/lang/invoke/MethodHandle.linkToSpecial(Ljava/lang/Object;IDLjava/lang/invoke/MemberName;)V> 
>>
>>
>>     bci: 14   CounterData         count(0)
>>               argument types      0: stack(0) none
>>                                   1: stack(4) none
>> -----------------
>>
>> No data at bci 14 makes the inliner go:
>>
>> [.. snippet from huge inline log...]
>> @ 55 java.lang.invoke.LambdaForm$DMH/221378065::invokeSpecial_LID_V (18
>> bytes)   force inline by annotation
>>     @ 1 java.lang.invoke.DirectMethodHandle::internalMemberName (8
>> bytes)   force inline by annotation
>>     @ 14 jdk.nashorn.internal.runtime.arrays.NumberArrayData::setElem
>> (24 bytes)   call site not reached
>> ..
>>
>> @14 is a linkToSpecial that was redirected to a invokeSpecial, that is
>> linked to NumberArrayData::setElem that is missing a profile.
>>
>> We can note that the perf data is still there as the method parameter
>> profile have caught the class.
>>
>> Why does this only happens sometimes?
>> linkToSpecial is a shared form. It may be compiled into a signature
>> specialized method - but it is native and is missing type profile
>> counters. We will get no data when it is running. The level 3 compiled
>> invokeSpecial_LID_V sometimes only live a very short time ~50ms, and the
>> callstack of the benchmark is pretty deep. The amount of invocations
>> varies between 0 and ±500, often being very low, or quite high. It looks
>> like the call is used in some phases, and depending on the timing it may
>> not have been run.
>>
>> Solution:
>> Don't forbid methodhandle call sites missing type profile data from
>> being inlined, the surrounding invokeSpecial probably will - leave it to
>> normal inlining heuristics to decide if it is hot enough.
>>
>> Lessons:
>> Type profiling of methodHandle and Lambda form code may need to be
>> revisited. We both seem to profile to little and too much. Native
>> methods get no profile, but in normal level 3 code we are profiling the
>> same reference at every use causing a lot of overhead even in trivial
>> non-branchy code.
>>
>> Testing:
>> The fix proposed fix makes us end up in the fast case every run. It
>> should remove a lot of variance from Nashorn benchmarks.
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8167656
>> Webrev: http://cr.openjdk.java.net/~neliasso/8167656/webrev.01/
>>
>> Regards,
>> Nils Eliasson

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20161109/d0ee56cb/attachment-0001.html>


More information about the hotspot-compiler-dev mailing list