[9] RFR(M): 8167656: Unstable MethodHandle inlining causing huge performance variations
Nils Eliasson
nils.eliasson at oracle.com
Thu Oct 13 13:02:48 UTC 2016
Hi,
Please review,
Background:
The fix for JDK-8160543 "C1: Crash in java.lang.String.indexOf in some
java.sql tests" looked like it was causing a performance regression in
navierStokes benchmark. Consecutive runs of the benchmark showed very
high variance between results (~100% speedup). It showed a bimodal
behaviour where fast runs were as fast, and slow as slow, but the
proportion of slow increased.
Drilling down to the cause of this was hard, any intrusive debug flag
(like PrintAssembly) made the benchmark end up in the fast case.
Problem description:
The problem is that sometimes invokeSpecial calls aren't inlined because
no type profile data has been collected for that callsite even though it
is hot.
This method is compiled on level 3 (full profile):
java.lang.invoke.LambdaForm$DMH/2008276237::invokeSpecial_LID_V (18 bytes)
@ 1 java.lang.invoke.DirectMethodHandle::internalMemberName (8
bytes) force inline by annotation
@ 14 java.lang.invoke.MethodHandle::linkToSpecial(LIDL)V (0
bytes) MemberName not constant
@14 is a native method: compiled as
java.lang.invoke.MethodHandle::linkToSpecial(LIDL)V (native) (static)
Sometimes there are already some method data collected for
invokeSpecial_LID_V (interpreter?), then the generated code will look
slightly different and we will always end up in the fast case. If there
were no earlier profile data, it sometimes get collected and then we end
up in the fast case too.
When invokeSpecial_LID_V is later compiled on level 4 (full opt) this
method data is available:
-----------------
static
java.lang.invoke.LambdaForm$DMH/335334514::invokeSpecial_LID_V(Ljava/lang/Object;Ljava/lang/Object;ID)V
interpreter_invocation_count: 5472
invocation_counter: 5474
backedge_counter: 0
mdo size: 640 bytes
parameter types 0: stack(0)
'java/lang/invoke/DirectMethodHandle$Special'
1: stack(1)
'jdk/nashorn/internal/runtime/arrays/NumberArrayData'
0 fast_aload_0
1 invokestatic 18
<java/lang/invoke/DirectMethodHandle.internalMemberName(Ljava/lang/Object;)Ljava/lang/Object;>
bci: 1 CounterData count(5336)
argument types 0: stack(0)
'java/lang/invoke/DirectMethodHandle$Special'
return type 'java/lang/invoke/MemberName'
4 astore #5
6 aload_1
7 iload_2
8 dload_3
9 aload #5
11 checkcast 20 <java/lang/invoke/MemberName>
bci: 11 ReceiverTypeData count(0) nonprofiled_count(0) entries(0)
14 invokestatic 26
<java/lang/invoke/MethodHandle.linkToSpecial(Ljava/lang/Object;IDLjava/lang/invoke/MemberName;)V>
bci: 14 CounterData count(0)
argument types 0: stack(0) none
1: stack(4) none
-----------------
No data at bci 14 makes the inliner go:
[.. snippet from huge inline log...]
@ 55 java.lang.invoke.LambdaForm$DMH/221378065::invokeSpecial_LID_V (18
bytes) force inline by annotation
@ 1 java.lang.invoke.DirectMethodHandle::internalMemberName (8
bytes) force inline by annotation
@ 14 jdk.nashorn.internal.runtime.arrays.NumberArrayData::setElem
(24 bytes) call site not reached
..
@14 is a linkToSpecial that was redirected to a invokeSpecial, that is
linked to NumberArrayData::setElem that is missing a profile.
We can note that the perf data is still there as the method parameter
profile have caught the class.
Why does this only happens sometimes?
linkToSpecial is a shared form. It may be compiled into a signature
specialized method - but it is native and is missing type profile
counters. We will get no data when it is running. The level 3 compiled
invokeSpecial_LID_V sometimes only live a very short time ~50ms, and the
callstack of the benchmark is pretty deep. The amount of invocations
varies between 0 and ±500, often being very low, or quite high. It looks
like the call is used in some phases, and depending on the timing it may
not have been run.
Solution:
Don't forbid methodhandle call sites missing type profile data from
being inlined, the surrounding invokeSpecial probably will - leave it to
normal inlining heuristics to decide if it is hot enough.
Lessons:
Type profiling of methodHandle and Lambda form code may need to be
revisited. We both seem to profile to little and too much. Native
methods get no profile, but in normal level 3 code we are profiling the
same reference at every use causing a lot of overhead even in trivial
non-branchy code.
Testing:
The fix proposed fix makes us end up in the fast case every run. It
should remove a lot of variance from Nashorn benchmarks.
Bug: https://bugs.openjdk.java.net/browse/JDK-8167656
Webrev: http://cr.openjdk.java.net/~neliasso/8167656/webrev.01/
Regards,
Nils Eliasson
More information about the hotspot-compiler-dev
mailing list