Conditional moves vs. branching in unrolled loops

Vitaly Davidovich vitalyd at gmail.com
Wed Jan 6 14:45:35 UTC 2016


>
> Ok. The generated code for an unrolled loop first loads array elements
> into registers before performing the cmovs.


Yes, but the cmov cannot proceed until that load retires.  If you had a
normal branch, speculation can continue past the branch and put more
instructions into the pipeline barring other hazards/dependencies.  By
"available in registers" I meant a cmov executed against 2 values in
registers that are already available (i.e. the loads which put the values
into registers have already completed, or the registers were set with
immediates, etc).

Basically, if the cost of branch misprediction is higher than waiting for
both inputs to cmov to be available, then cmov is better.  For very
predictable branches, cmov is a loss (as we've already established in this
thread) and I think always will be (i.e. cpu vendors seem to be putting
more and more smarts into branch prediction instead).
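
For illustration only, here is a minimal sketch of the kind of max-accumulator loop being discussed, written two ways. The class and method names (MaxLoops, maxWithBranch, maxWithSelect) are hypothetical and do not come from this thread or from the JDK. Whether the JIT emits a branch or a cmov for either form is not guaranteed; it depends on the collected branch profile, the compiler version and the target CPU, so treat this as a starting point for a benchmark rather than a demonstration of specific generated code.

```java
public final class MaxLoops {

    // Max reduction written with an explicit branch. On ascending input the
    // branch is taken every iteration, on descending input almost never;
    // both cases are highly predictable. On random input it is not.
    public static int maxWithBranch(int[] a) {
        int max = Integer.MIN_VALUE;
        for (int i = 0; i < a.length; i++) {
            if (a[i] > max) {
                max = a[i];
            }
        }
        return max;
    }

    // Same reduction written as a select. Each iteration's result depends on
    // the previous max and on the loaded element, which is the data
    // dependency chain described above if this is compiled to a cmov.
    public static int maxWithSelect(int[] a) {
        int max = Integer.MIN_VALUE;
        for (int i = 0; i < a.length; i++) {
            max = Math.max(max, a[i]);
        }
        return max;
    }

    public static void main(String[] args) {
        int[] ascending = {1, 2, 3, 4, 5, 6, 7, 8};
        int[] mixed = {5, 1, 8, 2, 7, 3, 6, 4};
        System.out.println(maxWithBranch(ascending) + " " + maxWithSelect(ascending));
        System.out.println(maxWithBranch(mixed) + " " + maxWithSelect(mixed));
    }
}
```

Both methods return the same value for any input; only the shape of the loop body differs. Timing them over sorted versus shuffled arrays (with a proper harness such as JMH) is one way to observe the trade-off between misprediction cost and waiting on cmov inputs.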

> Yes, that was me not understanding the underlying branch profiling
> mechanisms.


Actually, that question of mine was more aimed at John who said we should
do something special for loops with max/min accumulators :).


On Wed, Jan 6, 2016 at 9:34 AM, Paul Sandoz <paul.sandoz at oracle.com> wrote:

>
> > On 6 Jan 2016, at 13:38, Vitaly Davidovich <vitalyd at gmail.com> wrote:
> >
> > Perhaps for conditional moves data dependency chains are more costly?
> >
> > cmov carries a dependency on both inputs, making it more likely to stall
> > when at least one isn't available whereas the branch still allows cpu to
> > continue with speculative execution.  In a tight loop with a memory access
> > as one input to cmov, the memory op has to retire before cmov can proceed;
> > using cmov when both inputs are already ready (e.g. values in registers) is
> > pretty harmless though and avoids a branch entirely.  cmov also has larger
> > encoding than a branch.
> >
>
> Ok. The generated code for an unrolled loop first loads array elements
> into registers before performing the cmovs.
>
>
> > As the original jira on this issue states, cmov should only be used when
> > the branch is profiled to be unpredictable.  I'm not sure why loops with a
> > max/min accumulator need to be called out separately in this regard -
> > wouldn't the branch profile dictate this anyway?
>
> Yes, that was me not understanding the underlying branch profiling
> mechanisms.
>
> Paul.
>