Strange branching performance

Wed Feb 12 18:33:42 PST 2014

As an experiment, we could try using branch hint prefixes:

http://software.intel.com/en-us/articles/branch-and-loop-reorganization-to-prevent-mispredicts

Thanks,
Vladimir

On 2/12/14 6:14 PM, Vitaly Davidovich wrote:
> FWIW, I recall reading on the gcc forums that someone compared cmov
> vs jmp on i7 and Bulldozer and concluded that cmov becomes better if
> the branch is predicted correctly less than about 92% of the time.  In
> addition, Bulldozer suffered a bigger penalty for cmov than Intel.  I can
> try to dig up that thread if there's interest.
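A rough way to look for such a crossover on one's own hardware is to run the same data-dependent branch over random inputs with a controlled taken probability p; with random data a branch taken with probability p can at best be predicted about max(p, 1 - p) of the time. The sketch below is hypothetical (class and method names are made up), uses naive `System.nanoTime` timing, and is only indicative; a harness such as JMH is the proper tool, and the Java source cannot force C2 to pick a branch or a cmov.

```java
import java.util.Random;

final class BranchProbabilitySketch {
    // Fill an array with 0/1 values where 1 appears with probability p.
    // A fixed seed keeps runs repeatable.
    static int[] makeData(int n, double p, long seed) {
        Random rnd = new Random(seed);
        int[] data = new int[n];
        for (int i = 0; i < n; i++) {
            data[i] = rnd.nextDouble() < p ? 1 : 0;
        }
        return data;
    }

    // Data-dependent conditional increment. C2 decides from the profile
    // whether to compile this as a branch or as a CMOV.
    static int countNonZero(int[] data) {
        int result = 0;
        for (int v : data) {
            if (v != 0) result++;
        }
        return result;
    }

    public static void main(String[] args) {
        // One probability per JVM run, because C2 compiles countNonZero
        // using the branch profile it has gathered so far.
        double p = args.length > 0 ? Double.parseDouble(args[0]) : 0.2;
        int[] data = makeData(1 << 20, p, 42L);
        int sink = 0;
        for (int w = 0; w < 200; w++) sink += countNonZero(data); // warm-up
        int runs = 100;
        long start = System.nanoTime();
        for (int r = 0; r < runs; r++) sink += countNonZero(data);
        double msPerRun = (System.nanoTime() - start) / 1e6 / runs;
        // Printing sink keeps the JIT from discarding the work.
        System.out.printf("p=%.3f  %.3f ms/run  (sink=%d)%n", p, msPerRun, sink);
    }
}
```

Running it once per probability (e.g. 0.05, 0.1, 0.2, 0.5) and adding `-XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly` (which needs the hsdis plugin) is one way to check which instruction C2 actually chose.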
>
> In addition to register pressure, cmov also adds a dependency chain and
> the instruction size is bigger.
>
> I guess writing code with more predictable branching is the answer :).
>
> On Feb 12, 2014 7:20 PM, "Vladimir Kozlov" <vladimir.kozlov at oracle.com> wrote:
>
>     Hi Martin,
>
>     The issue is more complicated than I thought. The code I pointed to
>     before was added by me about 3 years ago for:
>
>     7097546: Optimize use of CMOVE instructions
>     https://bugs.openjdk.java.net/browse/JDK-7097546
>
>     The changes were made to avoid a 2x performance hit with cmov for
>     code like the following:
>
>          public static int test(int result, int limit, int mask) { // mask = 15
>              for (int i = 0; i < limit; i++) {
>                if ((i&mask) == 0) result++; // infrequent
>              }
>              return result;
>          }
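With mask = 15 the condition (i & mask) == 0 holds for one i in every 16, so the increment is taken with probability 1/16 = 6.25%. A small driver (the class name is hypothetical; the method body is the one from the message) makes that concrete:

```java
final class CmovTestDriver {
    // Same loop as in the message: increments result when the low bits
    // selected by mask are all zero. For mask = 2^k - 1 that happens once
    // every mask + 1 iterations.
    public static int test(int result, int limit, int mask) { // mask = 15
        for (int i = 0; i < limit; i++) {
            if ((i & mask) == 0) result++; // infrequent
        }
        return result;
    }

    public static void main(String[] args) {
        int mask = 15;
        int limit = 1_000_000;
        int count = test(0, limit, mask);
        // Fraction of iterations that take the increment.
        System.out.println("taken fraction = " + (double) count / limit); // prints "taken fraction = 0.0625"
    }
}
```

At 6.25% the branch is well under the 20% diamond threshold discussed later in the thread, which is why C2 keeps a branch here.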
>
>     The cmov instruction has a big flaw: it requires an additional register.
>     If the loop body is complex, using cmov will result in register
>     spilling, i.e. additional instructions. The performance hit could be
>     higher than that of a branch misprediction.
>
>     I am not sure how to proceed from here. I may do some benchmark
>     testing to see the effects of using cmov in more cases.
>
>     Regards,
>     Vladimir
>
>     On 2/8/14 1:11 PM, Martin Grajcar wrote:
>
>         Hi Vladimir!
>
>         On Sat, Feb 8, 2014 at 4:36 AM, Vladimir Kozlov
>         <vladimir.kozlov at oracle.com> wrote:
>
>              Hi Martin,
>
>              Your observation is correct. The corresponding code is as follows:
>
>                 float infrequent_prob = PROB_UNLIKELY_MAG(3); // 0.001
>
>                 // BlockLayoutByFrequency optimization moves infrequent branch
>                 // from hot path. No point in CMOV'ing in such case (110 is used
>                 // instead of 100 to take into account not exactness of float value).
>                 if (BlockLayoutByFrequency) {
>                   infrequent_prob = MAX2(infrequent_prob,
>                                          (float)BlockLayoutMinDiamondPercentage/110.0f);
>                 }
>                 // Check for highly predictable branch.  No point in CMOV'ing if
>                 // we are going to predict accurately all the time.
>                 if (iff->_prob < infrequent_prob ||
>                     iff->_prob > (1.0f - infrequent_prob))
>                   return NULL;
>
>              Note that BlockLayoutMinDiamondPercentage defaults to 20, so
>              infrequent_prob becomes 20/110, about 0.18 (roughly the 0.2 you observed).
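The check can be restated in Java to see where the cut-off lands. This is a hypothetical port for illustration only: the names are made up, PROB_UNLIKELY_MAG(3) is taken as 0.001 per the comment in the quoted code, and it models only this one predictability test, not C2's other CMOV conditions.

```java
final class CmovThresholdSketch {
    // Assumed value of PROB_UNLIKELY_MAG(3), per the quoted comment.
    static final float PROB_UNLIKELY_MAG_3 = 0.001f;

    // Mirrors the quoted logic: a branch whose probability is below this
    // value (or above 1 minus it) is treated as highly predictable.
    static float infrequentProb(boolean blockLayoutByFrequency,
                                int blockLayoutMinDiamondPercentage) {
        float infrequentProb = PROB_UNLIKELY_MAG_3;
        if (blockLayoutByFrequency) {
            // 110 instead of 100, as in the original, to allow for float inexactness.
            infrequentProb = Math.max(infrequentProb,
                    (float) blockLayoutMinDiamondPercentage / 110.0f);
        }
        return infrequentProb;
    }

    // True if this particular check would NOT reject the branch for CMOV.
    static boolean passesPredictabilityCheck(float prob, float infrequentProb) {
        return !(prob < infrequentProb || prob > 1.0f - infrequentProb);
    }

    public static void main(String[] args) {
        float t = infrequentProb(true, 20);
        System.out.println("threshold = " + t);
        System.out.println("p=1/16 passes: " + passesPredictabilityCheck(1f / 16, t));
        System.out.println("p=0.5  passes: " + passesPredictabilityCheck(0.5f, t));
    }
}
```

With the defaults, anything between roughly 0.18 and 0.82 survives this test, so the mask = 15 loop (p = 0.0625) is rejected, whereas with BlockLayoutByFrequency off it would pass.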
>
>
>         Yes, there's a sharp edge somewhere below 0.2.
>
>              C2 moves infrequent code outside the loop (with branches out and
>              back) to keep only hot code inside.
>
>
>         To me it looks like there's nothing to be moved outside of the loop,
>         mainly because you'd hardly save anything: you'd replace the two
>         instructions
>
>         LEA 1(%result_reg), %tmp_reg
>         CMOVE %tmp_reg, %result_reg
>
>         with a conditional jump. Saving a single instruction on the hot path
>         and risking a branch misprediction penalty might make sense for very
>         low probabilities like PROB_UNLIKELY_MAG(3), but not for 20%.
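At the source level, the two shapes being compared correspond roughly to the methods below (hypothetical names). The select form spells out what the LEA/CMOV pair computes, including the temporary that needs the extra register Vladimir mentions; writing a ternary does not oblige C2 to emit a cmov, so this only illustrates the data flow.

```java
final class SelectFormSketch {
    // Branchy form: the increment sits behind a conditional jump.
    static int branchy(int result, int limit, int mask) {
        for (int i = 0; i < limit; i++) {
            if ((i & mask) == 0) result++;
        }
        return result;
    }

    // Select form: compute result + 1 into a temporary (the LEA), then
    // keep it only when the condition holds (the conditional move).
    static int select(int result, int limit, int mask) {
        for (int i = 0; i < limit; i++) {
            int tmp = result + 1;                       // extra live value
            result = ((i & mask) == 0) ? tmp : result;  // select, no increment branch
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(branchy(0, 1000, 15) + " " + select(0, 1000, 15)); // prints "63 63"
    }
}
```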
>
>              It looks like it does not happen in your case, and I need to look
>              into why. There are several conditions besides
>              BlockLayoutByFrequency, and the above code could be incorrect and
>              need to be fixed (or removed).
>
>
>         Nice that you can look into it. There are many attempts to eliminate
>         branching manually, like in
>         http://grepcode.com/file/repo1.maven.org/maven2/com.google.guava/guava/15.0/com/google/common/math/IntMath.java#IntMath.gcd%28int%2Cint%29
>         but this is nearly always less efficient than using CMOVcc.
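For reference, manual branch elimination of the kind linked above looks roughly like this binary GCD, which replaces the "swap if a < b" step with sign-mask arithmetic. This is a sketch written along the lines of the linked Guava method, not Guava's code verbatim; the class name is hypothetical and both versions assume non-negative inputs. The branchy variant is the one a compiler could turn into CMOVcc.

```java
final class BranchFreeGcdSketch {
    // Binary GCD with the "swap if a < b" step done arithmetically.
    static int gcd(int a, int b) {
        if (a < 0 || b < 0) throw new IllegalArgumentException("negative input");
        if (a == 0) return b;
        if (b == 0) return a;
        int aTwos = Integer.numberOfTrailingZeros(a);
        a >>= aTwos;                        // a is now odd
        int bTwos = Integer.numberOfTrailingZeros(b);
        b >>= bTwos;                        // b is now odd
        while (a != b) {
            int delta = a - b;              // no overflow: both are non-negative
            // delta >> 31 is all ones if delta < 0, else zero,
            // so minDeltaOrZero == Math.min(delta, 0).
            int minDeltaOrZero = delta & (delta >> 31);
            a = delta - minDeltaOrZero - minDeltaOrZero; // |old a - old b|, even, > 0
            b += minDeltaOrZero;                         // min(old a, old b), still odd
            a >>= Integer.numberOfTrailingZeros(a);      // make a odd again
        }
        return a << Math.min(aTwos, bTwos);
    }

    // Same algorithm with an ordinary conditional swap.
    static int gcdBranchy(int a, int b) {
        if (a < 0 || b < 0) throw new IllegalArgumentException("negative input");
        if (a == 0) return b;
        if (b == 0) return a;
        int aTwos = Integer.numberOfTrailingZeros(a);
        a >>= aTwos;
        int bTwos = Integer.numberOfTrailingZeros(b);
        b >>= bTwos;
        while (a != b) {
            if (a < b) { int t = a; a = b; b = t; } // the branch being avoided above
            a -= b;                                 // even and > 0
            a >>= Integer.numberOfTrailingZeros(a);
        }
        return a << Math.min(aTwos, bTwos);
    }

    public static void main(String[] args) {
        System.out.println(gcd(12, 18) + " " + gcdBranchy(12, 18)); // prints "6 6"
    }
}
```

Whether the arithmetic version beats the branchy one depends on how predictable the a < b comparison is and on whether C2 already emits a cmov for the swap, which is the trade-off this thread is about.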
>
>         Regards,
>         Martin.
>