Invokedynamic deoptimization issues
Vladimir Ivanov
vladimir.x.ivanov at oracle.com
Wed Jan 14 19:28:32 UTC 2015
Sam,
I assume the scores are "lower/better".
I tried to run your benchmark and here's what I see.
On a sufficiently long run (-n >100) I observe that scores are equivalent:
indy wrapper: 7
direct: 8
But there's a spike in indy score which is caused by method invalidation
and concequent recompilation:
867 88 b indybench.Main::timeIndy (73 bytes)
indy wrapper: 7
3276 88 indybench.Main::timeIndy (73 bytes) made not
entrant
indy wrapper: 114
3969 91 b indybench.Main::timeIndy (73 bytes)
indy wrapper: 7
The cause of recompilation is a pruned branch in Integer.valueOf()
during first compilation [1].
Can you confirm my observations that the score settles in a sufficiently
long run?
If not, please, tell JDK version you are using (java -version). I'd also
be grateful for -XX:+LogCompilation logs for 2 separate runs (-indy &
-direct, -n 1000).
Best regards,
Vladimir Ivanov
[1] excerpt from "-XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation
-XX:LogFile=hotspot.log":
<uncommon_trap thread='6403' reason='unstable_if' action='reinterpret'
compile_id='88' compiler='C2' count='4' stamp='3.276'>
<jvms bci='10' method='java.lang.Integer valueOf (I)Ljava/lang/Integer;'
bytes='32' count='2048' backedge_count='1' iicount='16140'
decompiles='1' unstable_if_traps='4' overflow_recompiles='1'/>
<jvms bci='1' method='sun.invoke.util.ValueConversions boxInteger
(I)Ljava/lang/Integer;' bytes='5' count='2048' iicount='16140'/>
<jvms bci='10' method='java.lang.invoke.LambdaForm$DMH/361993357
invokeStatic_I_L (Ljava/lang/Object;I)Ljava/lang/Object;' bytes='14'
count='2048' iicount='16140'/>
<jvms bci='51' method='java.lang.invoke.LambdaForm$BMH/1551870003
reinvoke
(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;'
bytes='55' count='2048' iicount='16140'/>
<jvms bci='11' method='java.lang.invoke.LambdaForm$MH/391359742
linkToCallSite
(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;'
bytes='15' count='1' backedge_count='1' iicount='10000'/>
<jvms bci='19'
method='com.github.mustachejava.indy.W_Main_someMethod_3c771472_abcd_41de_94a9_8eb85a3ae5bf
call ([Ljava/lang/Object;)Ljava/lang/Object;' bytes='25' count='2048'
backedge_count='1' iicount='16141' decompiles='1'/>
<jvms bci='31' method='indybench.Main timeIndy (Lindybench/Main;)V'
bytes='73' count='5000' backedge_count='2049' iicount='3' decompiles='3'
unstable_if_traps='1'/>
</uncommon_trap>
On 1/14/15 9:57 PM, Sam Pullara wrote:
> I'm using invokedynamic bytecode:
>
> https://github.com/spullara/mustache.java/blob/master/indy/src/main/java/com/github/mustachejava/indy/IndyWrapper.java
>
> If you run the benchmark with "-direct -indy -codegenReflectionOH -n 10"
> you will see interesting behavior. It is able to optimize indy and to be
> as fast as the direct call but around the 7th iteration it optimizes it
> again and makes it 10x slower. When I use your flags, I see that there
> is little difference in the optimization output however:
>
> samair11:indybenchmark sam$ diff indy1.txt indy2.txt
>
> 1c1,2
>
> < 2408 396 % 4 indybench.Main::timeIndy @ 16 (73 bytes)
>
> ---
>
> > 6482 412 4 indybench.Main::timeIndy (73 bytes)
>
> > @ 0
> java.lang.System::currentTimeMillis (0 bytes) (intrinsic)
>
> 4c5
>
> < \-> TypeProfile (604309/604309 counts)
> =
> com/github/mustachejava/indy/W_Main_someMethod_a36851e6_61ca_4b15_b94e_7c73cdc71d1c
>
> ---
>
> > \-> TypeProfile (4204749/4204749
> counts) =
> com/github/mustachejava/indy/W_Main_someMethod_a36851e6_61ca_4b15_b94e_7c73cdc71d1c
>
> 8,9c9,10
>
> < \-> TypeProfile (6124/56502 counts)
> = com/github/mustachejava/codegen/CompiledGuards1
>
> < \-> TypeProfile (50378/56502
> counts) = com/github/mustachejava/codegen/CompiledGuards3
>
> ---
>
> > \-> TypeProfile (6124/100293
> counts) = com/github/mustachejava/codegen/CompiledGuards1
>
> > \-> TypeProfile (94169/100293
> counts) = com/github/mustachejava/codegen/CompiledGuards3
>
> 33a35
>
> > @ 21 java.lang.Integer::intValue (5
> bytes) accessor
>
> 55a58
>
> >
>
>
> Sam
>
> Output:
>
> samair11:indybenchmark sam$ ./target/appassembler/bin/indybench -direct
> -indy -codegenReflectionOH -n 10
>
> codegen reflection OH: 736
>
> indy wrapper: 66
>
> direct: 16
>
> -----------------
>
> codegen reflection OH: 1136
>
> indy wrapper: 40
>
> direct: 25
>
> -----------------
>
> codegen reflection OH: 577
>
> indy wrapper: 4
>
> direct: 5
>
> -----------------
>
> codegen reflection OH: 658
>
> indy wrapper: 4
>
> direct: 6
>
> -----------------
>
> codegen reflection OH: 567
>
> indy wrapper: 3
>
> direct: 6
>
> -----------------
>
> codegen reflection OH: 658
>
> indy wrapper: 4
>
> direct: 6
>
> -----------------
>
> codegen reflection OH: 625
>
> indy wrapper: 5
>
> direct: 6
>
> -----------------
>
> codegen reflection OH: 648
>
> indy wrapper: 121
>
> direct: 6
>
> -----------------
>
> codegen reflection OH: 618
>
> indy wrapper: 65
>
> direct: 7
>
> -----------------
>
> codegen reflection OH: 553
>
> indy wrapper: 68
>
> direct: 6
>
> -----------------
>
>
>
> On Wed, Jan 14, 2015 at 10:40 AM, Vladimir Ivanov
> <vladimir.x.ivanov at oracle.com <mailto:vladimir.x.ivanov at oracle.com>> wrote:
>
> Sam,
>
> The first thing I usually do is run with:
> -XX:+PrintCompilation -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining
>
> I haven't looked at your benchmarks yet, but my main suspicion is
> that MethodHandles aren't compile-time constants, so inlining fails.
> Do you use invokedynamic bytecode or call
> MethodHandle.invokeExact()/__invoke()? In the latter case, you
> should store MethodHandles in static final fields. Otherwise, JIT
> won't inline them.
>
> Best regards,
> Vladimir Ivanov
>
>
> On 1/14/15 8:23 PM, Sam Pullara wrote:
>
> Hi,
>
> I've implemented a bunch of different ways for mustache.java to
> get data
> from names fields and methods. One thing I have noticed running
> benchmarks
> is that accessing the fields/methods in different ways can cause
> invokedynamic to either never be optimized or actually be
> deoptimized later
> during the running of the benchmarks (20x+ difference in
> performance). Here
> is the code I have been running:
>
> https://github.com/spullara/__indybenchmark
> <https://github.com/spullara/indybenchmark>
>
> I was wondering what the best way to debug this would be?
>
> Thanks,
> Sam
>
>
More information about the hotspot-dev
mailing list