profiling of branches - odd code generation?

Fri Jun 5 12:59:54 UTC 2015

Hi Gilles,

Yes, that's exactly what I was expecting/hoping C2 emits.  Nice to see
Graal does this.  Does Graal also use profile info for switch statements?
Would it peel out a check for 2 as the first thing in this case?

Thanks

sent from my phone
On Jun 5, 2015 3:35 AM, "Gilles Duboscq" <duboscq at ssw.jku.at> wrote:

> Regarding interpreter restart/uncommon_trap, for this kind of problem, the
> compiler has a better chance when it does not bind them to a precise
> bytecode restart location right from the start of the compilation. The
> branches can then be reordered and eliminated to achieve what you want.
>
> For example with Graal, on your code i get:
>
>   # {method} {0x00007f442685e318} 'f' '(I)I' in 'Test'
>   # parm0:    rsi       = int
>   #           [sp+0x20]  (sp of caller)
>   0x00007f44288d2ba0: mov    qword ptr [rsp-0x14000],rax
>   0x00007f44288d2ba8: sub    rsp,0x18
>   0x00007f44288d2bac: mov    qword ptr [rsp+0x10],rbp
>   0x00007f44288d2bb1: cmp    esi,0x2
>   0x00007f44288d2bb4: jne    0x00007f44288d2bca
>   0x00007f44288d2bba: mov    eax,0x3
>   0x00007f44288d2bbf: add    rsp,0x18
>   0x00007f44288d2bc3: test   dword ptr [rip+0x1a8d743d],eax  ;
> {poll_return}
>   0x00007f44288d2bc9: ret
>   0x00007f44288d2bca: mov    dword ptr [r15+0x8],0xffffffed
>   0x00007f44288d2bd2: mov    qword ptr [r15+0x10],r12
>   0x00007f44288d2bd6: call   0x00007f4428047341  ; OopMap{off=59}
>                                                 ;*iload_0 {reexecute=1
> rethrow=0 return_oop=0}
>                                                 ; - Test::f at 0 (line 10)
>                                                 ;   {runtime_call}
>
> Which is more or less what you wanted modulo frame setup/tear down and
> poll_return. In this deopt case, execution restarts at bci 0 (it
> re-executes the method from the start). In general it would re-execute from
> the last side-effect.
>
>  Gilles
>
> On Fri, Jun 5, 2015 at 2:35 AM Vitaly Davidovich <vitalyd at gmail.com>
> wrote:
>
>> Ok I see the complication with the restart in interpreter (I think that's
>> what Remi was saying as well).  I suspect that most checks will tend to not
>> have side effects (bad practice), but of course there may be some and for
>> more complex scenarios sufficient inlining would need to occur.  For simple
>> cases however, it's a pity since the VM has enough info and deopt ability
>> to do this safely.  The real case i had was checks against enum constants,
>> but the int example is just as good.
>>
>> As for switch, I didn't try it this time but we had a thread on here a
>> few months back where I was complaining about the switch not handling the
>> same type of scenario, and John Rose mentioned it's a known issue with
>> switches (i.e. no profile based optimization) :).  In addition, I find some
>> of the switch codegen suboptimal (e.g. same cmp performed back to back with
>> just a different jump after).  So I then tried if/else chain given that's
>> supposedly profiled, and it is but apparently the codegen isn't what I
>> thought it would be.  I was basically hoping for a single class CHA-like
>> analysis for branches :).
>>
>> sent from my phone
>> On Jun 4, 2015 8:15 PM, "Vladimir Kozlov" <vladimir.kozlov at oracle.com>
>> wrote:
>>
>>> Uncommon traps are bound to bytecode. If we hit uncommon trap, for
>>> example, for (x == 1) test then after deoptimization Interpreter will
>>> execute only 'return 2;'. If generated code as you suggested we need to
>>> bind uncommon trap to the BCI of the first (x == 0) check so it will be
>>> executed in Interpreter after deoptimization.
>>>
>>> So it is not simple optimization but doable for cases like this (integer
>>> checks).
>>>
>>> Did you tried 'switch' instead?
>>>
>>> Regards,
>>> Vladimir
>>>
>>> On 6/4/15 4:44 PM, Vitaly Davidovich wrote:
>>>
>>>> By the way, the context for this example is the following.  Suppose you
>>>> have a class with such a method.  This class is then used in different
>>>> java processes such that in each instance only one of those branches is
>>>> ever taken and the other compares have no side effects.  Ideally, the
>>>> compiled code would favor that fast path, which may not be the first arm
>>>> of the if/else chain.
>>>>
>>>> On Thu, Jun 4, 2015 at 7:42 PM, Vitaly Davidovich <vitalyd at gmail.com
>>>> <mailto:vitalyd at gmail.com>> wrote:
>>>>
>>>>     Thanks for the response Vladimir.  In this case though, can the JIT
>>>>     not see that the cmp bytecodes of non-taken branches have no side
>>>>     effects and remove them altogether? Is that just deemed not worth
>>>>     the cost or is there something fundamental I'm missing here?
>>>>
>>>>     On Thu, Jun 4, 2015 at 7:36 PM, Vladimir Kozlov
>>>>     <vladimir.kozlov at oracle.com <mailto:vladimir.kozlov at oracle.com>>
>>>> wrote:
>>>>
>>>>         VM does not profiling values. We profiling branches.
>>>>         When C2 construct control flow graph it follows bytecode. And it
>>>>         can't eliminate cmp code based only on branch profiling.
>>>>         Profiling still shows that all cmp bytecodes are always executed
>>>>         - only branches are not taken. We would eliminate tests if they
>>>>         were on non taken branch.
>>>>         We generate uncommon traps for branches which were not taken
>>>>         based on profiling.
>>>>
>>>>         Vladimir
>>>>
>>>>
>>>>         On 6/4/15 4:20 PM, Vitaly Davidovich wrote:
>>>>
>>>>             Hi,
>>>>
>>>>             Suppose you have a method like this:
>>>>
>>>>             private static int f(final int x) {
>>>>                       if (x == 0)
>>>>                           return 1;
>>>>                       else if (x == 1)
>>>>                           return 2;
>>>>                       else if (x == 2)
>>>>                           return 3;
>>>>                       return 4;
>>>>
>>>>                   }
>>>>
>>>>             If I then call it with x=2 always, the generated asm is not
>>>>             what I
>>>>             expect (8u40 with C2 compiler)
>>>>
>>>>                 # parm0:    rsi       = int
>>>>                 #           [sp+0x30]  (sp of caller)
>>>>                 0x00007fcc5970c520: mov    %eax,-0x14000(%rsp)
>>>>                 0x00007fcc5970c527: push   %rbp
>>>>                 0x00007fcc5970c528: sub    $0x20,%rsp
>>>>               ;*synchronization entry
>>>>
>>>>                 0x00007fcc5970c52c: test   %esi,%esi
>>>>                 0x00007fcc5970c52e: je     0x00007fcc5970c55d  ;*ifne
>>>>
>>>>                 0x00007fcc5970c530: cmp    $0x1,%esi
>>>>                 0x00007fcc5970c533: je     0x00007fcc5970c571
>>>> ;*if_icmpne
>>>>
>>>>                 0x00007fcc5970c535: cmp    $0x2,%esi
>>>>                 0x00007fcc5970c538: jne    0x00007fcc5970c54b
>>>> ;*if_icmpne
>>>>
>>>>                 0x00007fcc5970c53a: mov    $0x3,%eax
>>>>                 0x00007fcc5970c53f: add    $0x20,%rsp
>>>>                 0x00007fcc5970c543: pop    %rbp
>>>>                 0x00007fcc5970c544: test   %eax,0x5e0dab6(%rip)300000
>>>>                  #
>>>>             0x00007fcc5f51a000
>>>>                                                               ;
>>>>               {poll_return}
>>>>                 0x00007fcc5970c54a: retq
>>>>             <snip>
>>>>
>>>>             It's checking the if conditions in order, and then jumps to
>>>>             some runtime
>>>>             calls (I'm assuming that's for deopt to restore pruned
>>>>             branches? Cause I
>>>>             don't see anything that returns 1 or 2 otherwise).  Why is
>>>>             this code not
>>>>             favoring x=2? I'd have thought this code would be something
>>>>             like (after
>>>>             epilogue):
>>>>
>>>>             cmp $0x2, %esi
>>>>             jne <deopt_or_other_cases>
>>>>             mov $0x3, %eax
>>>>             retq
>>>>
>>>>             Thanks
>>>>
>>>>
>>>>
>>>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20150605/5d73be7c/attachment.html>