profiling of branches - odd code generation?
Vitaly Davidovich
vitalyd at gmail.com
Fri Jun 5 12:59:54 UTC 2015
Hi Gilles,
Yes, that's exactly what I was expecting/hoping C2 emits. Nice to see
Graal does this. Does Graal also use profile info for switch statements?
Would it peel out a check for 2 as the first thing in this case?
Thanks
sent from my phone
On Jun 5, 2015 3:35 AM, "Gilles Duboscq" <duboscq at ssw.jku.at> wrote:
> Regarding interpreter restart/uncommon_trap, for this kind of problem, the
> compiler has a better chance when it does not bind them to a precise
> bytecode restart location right from the start of the compilation. The
> branches can then be reordered and eliminated to achieve what you want.
>
> For example with Graal, on your code i get:
>
> # {method} {0x00007f442685e318} 'f' '(I)I' in 'Test'
> # parm0: rsi = int
> # [sp+0x20] (sp of caller)
> 0x00007f44288d2ba0: mov qword ptr [rsp-0x14000],rax
> 0x00007f44288d2ba8: sub rsp,0x18
> 0x00007f44288d2bac: mov qword ptr [rsp+0x10],rbp
> 0x00007f44288d2bb1: cmp esi,0x2
> 0x00007f44288d2bb4: jne 0x00007f44288d2bca
> 0x00007f44288d2bba: mov eax,0x3
> 0x00007f44288d2bbf: add rsp,0x18
> 0x00007f44288d2bc3: test dword ptr [rip+0x1a8d743d],eax ;
> {poll_return}
> 0x00007f44288d2bc9: ret
> 0x00007f44288d2bca: mov dword ptr [r15+0x8],0xffffffed
> 0x00007f44288d2bd2: mov qword ptr [r15+0x10],r12
> 0x00007f44288d2bd6: call 0x00007f4428047341 ; OopMap{off=59}
> ;*iload_0 {reexecute=1
> rethrow=0 return_oop=0}
> ; - Test::f at 0 (line 10)
> ; {runtime_call}
>
> Which is more or less what you wanted modulo frame setup/tear down and
> poll_return. In this deopt case, execution restarts at bci 0 (it
> re-executes the method from the start). In general it would re-execute from
> the last side-effect.
>
> Gilles
>
> On Fri, Jun 5, 2015 at 2:35 AM Vitaly Davidovich <vitalyd at gmail.com>
> wrote:
>
>> Ok I see the complication with the restart in interpreter (I think that's
>> what Remi was saying as well). I suspect that most checks will tend to not
>> have side effects (bad practice), but of course there may be some and for
>> more complex scenarios sufficient inlining would need to occur. For simple
>> cases however, it's a pity since the VM has enough info and deopt ability
>> to do this safely. The real case i had was checks against enum constants,
>> but the int example is just as good.
>>
>> As for switch, I didn't try it this time but we had a thread on here a
>> few months back where I was complaining about the switch not handling the
>> same type of scenario, and John Rose mentioned it's a known issue with
>> switches (i.e. no profile based optimization) :). In addition, I find some
>> of the switch codegen suboptimal (e.g. same cmp performed back to back with
>> just a different jump after). So I then tried if/else chain given that's
>> supposedly profiled, and it is but apparently the codegen isn't what I
>> thought it would be. I was basically hoping for a single class CHA-like
>> analysis for branches :).
>>
>> sent from my phone
>> On Jun 4, 2015 8:15 PM, "Vladimir Kozlov" <vladimir.kozlov at oracle.com>
>> wrote:
>>
>>> Uncommon traps are bound to bytecode. If we hit uncommon trap, for
>>> example, for (x == 1) test then after deoptimization Interpreter will
>>> execute only 'return 2;'. If generated code as you suggested we need to
>>> bind uncommon trap to the BCI of the first (x == 0) check so it will be
>>> executed in Interpreter after deoptimization.
>>>
>>> So it is not simple optimization but doable for cases like this (integer
>>> checks).
>>>
>>> Did you tried 'switch' instead?
>>>
>>> Regards,
>>> Vladimir
>>>
>>> On 6/4/15 4:44 PM, Vitaly Davidovich wrote:
>>>
>>>> By the way, the context for this example is the following. Suppose you
>>>> have a class with such a method. This class is then used in different
>>>> java processes such that in each instance only one of those branches is
>>>> ever taken and the other compares have no side effects. Ideally, the
>>>> compiled code would favor that fast path, which may not be the first arm
>>>> of the if/else chain.
>>>>
>>>> On Thu, Jun 4, 2015 at 7:42 PM, Vitaly Davidovich <vitalyd at gmail.com
>>>> <mailto:vitalyd at gmail.com>> wrote:
>>>>
>>>> Thanks for the response Vladimir. In this case though, can the JIT
>>>> not see that the cmp bytecodes of non-taken branches have no side
>>>> effects and remove them altogether? Is that just deemed not worth
>>>> the cost or is there something fundamental I'm missing here?
>>>>
>>>> On Thu, Jun 4, 2015 at 7:36 PM, Vladimir Kozlov
>>>> <vladimir.kozlov at oracle.com <mailto:vladimir.kozlov at oracle.com>>
>>>> wrote:
>>>>
>>>> VM does not profiling values. We profiling branches.
>>>> When C2 construct control flow graph it follows bytecode. And it
>>>> can't eliminate cmp code based only on branch profiling.
>>>> Profiling still shows that all cmp bytecodes are always executed
>>>> - only branches are not taken. We would eliminate tests if they
>>>> were on non taken branch.
>>>> We generate uncommon traps for branches which were not taken
>>>> based on profiling.
>>>>
>>>> Vladimir
>>>>
>>>>
>>>> On 6/4/15 4:20 PM, Vitaly Davidovich wrote:
>>>>
>>>> Hi,
>>>>
>>>> Suppose you have a method like this:
>>>>
>>>> private static int f(final int x) {
>>>> if (x == 0)
>>>> return 1;
>>>> else if (x == 1)
>>>> return 2;
>>>> else if (x == 2)
>>>> return 3;
>>>> return 4;
>>>>
>>>> }
>>>>
>>>> If I then call it with x=2 always, the generated asm is not
>>>> what I
>>>> expect (8u40 with C2 compiler)
>>>>
>>>> # parm0: rsi = int
>>>> # [sp+0x30] (sp of caller)
>>>> 0x00007fcc5970c520: mov %eax,-0x14000(%rsp)
>>>> 0x00007fcc5970c527: push %rbp
>>>> 0x00007fcc5970c528: sub $0x20,%rsp
>>>> ;*synchronization entry
>>>>
>>>> 0x00007fcc5970c52c: test %esi,%esi
>>>> 0x00007fcc5970c52e: je 0x00007fcc5970c55d ;*ifne
>>>>
>>>> 0x00007fcc5970c530: cmp $0x1,%esi
>>>> 0x00007fcc5970c533: je 0x00007fcc5970c571
>>>> ;*if_icmpne
>>>>
>>>> 0x00007fcc5970c535: cmp $0x2,%esi
>>>> 0x00007fcc5970c538: jne 0x00007fcc5970c54b
>>>> ;*if_icmpne
>>>>
>>>> 0x00007fcc5970c53a: mov $0x3,%eax
>>>> 0x00007fcc5970c53f: add $0x20,%rsp
>>>> 0x00007fcc5970c543: pop %rbp
>>>> 0x00007fcc5970c544: test %eax,0x5e0dab6(%rip)300000
>>>> #
>>>> 0x00007fcc5f51a000
>>>> ;
>>>> {poll_return}
>>>> 0x00007fcc5970c54a: retq
>>>> <snip>
>>>>
>>>> It's checking the if conditions in order, and then jumps to
>>>> some runtime
>>>> calls (I'm assuming that's for deopt to restore pruned
>>>> branches? Cause I
>>>> don't see anything that returns 1 or 2 otherwise). Why is
>>>> this code not
>>>> favoring x=2? I'd have thought this code would be something
>>>> like (after
>>>> epilogue):
>>>>
>>>> cmp $0x2, %esi
>>>> jne <deopt_or_other_cases>
>>>> mov $0x3, %eax
>>>> retq
>>>>
>>>> Thanks
>>>>
>>>>
>>>>
>>>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20150605/5d73be7c/attachment.html>
More information about the hotspot-compiler-dev
mailing list