Invokedynamic deoptimization issues

Wed Jan 14 19:28:32 UTC 2015

Sam,

I assume the scores are "lower/better".

I tried to run your benchmark and here's what I see.

On a sufficiently long run (-n >100) I observe that scores are equivalent:
indy wrapper: 7
direct: 8

But there's a spike in indy score which is caused by method invalidation 
and concequent recompilation:
     867   88    b        indybench.Main::timeIndy (73 bytes)

indy wrapper: 7

    3276   88             indybench.Main::timeIndy (73 bytes)   made not 
entrant

indy wrapper: 114

    3969   91    b        indybench.Main::timeIndy (73 bytes)

indy wrapper: 7

The cause of recompilation is a pruned branch in Integer.valueOf() 
during first compilation [1].

Can you confirm my observations that the score settles in a sufficiently 
long run?

If not, please, tell JDK version you are using (java -version). I'd also 
be grateful for -XX:+LogCompilation logs for 2 separate runs (-indy & 
-direct, -n 1000).

Best regards,
Vladimir Ivanov

[1] excerpt from "-XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation 
-XX:LogFile=hotspot.log":

<uncommon_trap thread='6403' reason='unstable_if' action='reinterpret' 
compile_id='88' compiler='C2' count='4' stamp='3.276'>
<jvms bci='10' method='java.lang.Integer valueOf (I)Ljava/lang/Integer;' 
bytes='32' count='2048' backedge_count='1' iicount='16140' 
decompiles='1' unstable_if_traps='4' overflow_recompiles='1'/>
<jvms bci='1' method='sun.invoke.util.ValueConversions boxInteger 
(I)Ljava/lang/Integer;' bytes='5' count='2048' iicount='16140'/>
<jvms bci='10' method='java.lang.invoke.LambdaForm$DMH/361993357 
invokeStatic_I_L (Ljava/lang/Object;I)Ljava/lang/Object;' bytes='14' 
count='2048' iicount='16140'/>
<jvms bci='51' method='java.lang.invoke.LambdaForm$BMH/1551870003 
reinvoke 
(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;' 
bytes='55' count='2048' iicount='16140'/>
<jvms bci='11' method='java.lang.invoke.LambdaForm$MH/391359742 
linkToCallSite 
(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;' 
bytes='15' count='1' backedge_count='1' iicount='10000'/>
<jvms bci='19' 
method='com.github.mustachejava.indy.W_Main_someMethod_3c771472_abcd_41de_94a9_8eb85a3ae5bf 
call ([Ljava/lang/Object;)Ljava/lang/Object;' bytes='25' count='2048' 
backedge_count='1' iicount='16141' decompiles='1'/>
<jvms bci='31' method='indybench.Main timeIndy (Lindybench/Main;)V' 
bytes='73' count='5000' backedge_count='2049' iicount='3' decompiles='3' 
unstable_if_traps='1'/>
</uncommon_trap>

On 1/14/15 9:57 PM, Sam Pullara wrote:
> I'm using invokedynamic bytecode:
>
> https://github.com/spullara/mustache.java/blob/master/indy/src/main/java/com/github/mustachejava/indy/IndyWrapper.java
>
> If you run the benchmark with "-direct -indy -codegenReflectionOH -n 10"
> you will see interesting behavior. It is able to optimize indy and to be
> as fast as the direct call but around the 7th iteration it optimizes it
> again and makes it 10x slower. When I use your flags, I see that there
> is little difference in the optimization output however:
>
> samair11:indybenchmark sam$ diff indy1.txt indy2.txt
>
> 1c1,2
>
> <    2408  396 %     4       indybench.Main::timeIndy @ 16 (73 bytes)
>
> ---
>
>  >    6482  412       4       indybench.Main::timeIndy (73 bytes)
>
>  >                               @ 0
> java.lang.System::currentTimeMillis (0 bytes)   (intrinsic)
>
> 4c5
>
> <                                \-> TypeProfile (604309/604309 counts)
> =
> com/github/mustachejava/indy/W_Main_someMethod_a36851e6_61ca_4b15_b94e_7c73cdc71d1c
>
> ---
>
>  >                                \-> TypeProfile (4204749/4204749
> counts) =
> com/github/mustachejava/indy/W_Main_someMethod_a36851e6_61ca_4b15_b94e_7c73cdc71d1c
>
> 8,9c9,10
>
> <                                    \-> TypeProfile (6124/56502 counts)
> = com/github/mustachejava/codegen/CompiledGuards1
>
> <                                    \-> TypeProfile (50378/56502
> counts) = com/github/mustachejava/codegen/CompiledGuards3
>
> ---
>
>  >                                    \-> TypeProfile (6124/100293
> counts) = com/github/mustachejava/codegen/CompiledGuards1
>
>  >                                    \-> TypeProfile (94169/100293
> counts) = com/github/mustachejava/codegen/CompiledGuards3
>
> 33a35
>
>  >                               @ 21   java.lang.Integer::intValue (5
> bytes)   accessor
>
> 55a58
>
>  >
>
>
> Sam
>
> Output:
>
> samair11:indybenchmark sam$ ./target/appassembler/bin/indybench -direct
> -indy -codegenReflectionOH -n 10
>
> codegen reflection OH: 736
>
> indy wrapper: 66
>
> direct: 16
>
> -----------------
>
> codegen reflection OH: 1136
>
> indy wrapper: 40
>
> direct: 25
>
> -----------------
>
> codegen reflection OH: 577
>
> indy wrapper: 4
>
> direct: 5
>
> -----------------
>
> codegen reflection OH: 658
>
> indy wrapper: 4
>
> direct: 6
>
> -----------------
>
> codegen reflection OH: 567
>
> indy wrapper: 3
>
> direct: 6
>
> -----------------
>
> codegen reflection OH: 658
>
> indy wrapper: 4
>
> direct: 6
>
> -----------------
>
> codegen reflection OH: 625
>
> indy wrapper: 5
>
> direct: 6
>
> -----------------
>
> codegen reflection OH: 648
>
> indy wrapper: 121
>
> direct: 6
>
> -----------------
>
> codegen reflection OH: 618
>
> indy wrapper: 65
>
> direct: 7
>
> -----------------
>
> codegen reflection OH: 553
>
> indy wrapper: 68
>
> direct: 6
>
> -----------------
>
>
>
> On Wed, Jan 14, 2015 at 10:40 AM, Vladimir Ivanov
> <vladimir.x.ivanov at oracle.com <mailto:vladimir.x.ivanov at oracle.com>> wrote:
>
>     Sam,
>
>     The first thing I usually do is run with:
>     -XX:+PrintCompilation -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining
>
>     I haven't looked at your benchmarks yet, but my main suspicion is
>     that MethodHandles aren't compile-time constants, so inlining fails.
>     Do you use invokedynamic bytecode or call
>     MethodHandle.invokeExact()/__invoke()? In the latter case, you
>     should store MethodHandles in static final fields. Otherwise, JIT
>     won't inline them.
>
>     Best regards,
>     Vladimir Ivanov
>
>
>     On 1/14/15 8:23 PM, Sam Pullara wrote:
>
>         Hi,
>
>         I've implemented a bunch of different ways for mustache.java to
>         get data
>         from names fields and methods. One thing I have noticed running
>         benchmarks
>         is that accessing the fields/methods in different ways can cause
>         invokedynamic to either never be optimized or actually be
>         deoptimized later
>         during the running of the benchmarks (20x+ difference in
>         performance). Here
>         is the code I have been running:
>
>         https://github.com/spullara/__indybenchmark
>         <https://github.com/spullara/indybenchmark>
>
>         I was wondering what the best way to debug this would be?
>
>         Thanks,
>         Sam
>
>