Strange branching performance

Vladimir Kozlov vladimir.kozlov at oracle.com
Fri Feb 7 19:36:02 PST 2014


Hi Martin,

Your observation is correct. The corresponding code is:

   float infrequent_prob = PROB_UNLIKELY_MAG(3); // 0.001

   // The BlockLayoutByFrequency optimization moves an infrequent branch
   // off the hot path. No point in CMOV'ing in such a case (110 is used
   // instead of 100 to account for the inexactness of float values).
   if (BlockLayoutByFrequency) {
     infrequent_prob = MAX2(infrequent_prob,
                            (float)BlockLayoutMinDiamondPercentage/110.0f);
   }
   // Check for highly predictable branch.  No point in CMOV'ing if
   // we are going to predict accurately all the time.
   if (iff->_prob < infrequent_prob ||
       iff->_prob > (1.0f - infrequent_prob))
     return NULL;

Note that BlockLayoutMinDiamondPercentage defaults to 20, so infrequent_prob
becomes 20/110, about 0.18, which matches the threshold you observed.

C2 moves infrequent code outside the loop (with branches out and back)
to keep only hot code inside. It looks like this does not happen in your
case, and I need to find out why. The block layout depends on several
conditions besides BlockLayoutByFrequency, so the above code may be
incorrect and need to be fixed (or removed).

Regards,
Vladimir

On 2/7/14 11:35 AM, Martin Grajcar wrote:
> I wrote a simple benchmark showing much better performance (on a Core
> i5) for a branching probability of about 50% than for 15%. The branch is
> unpredictable. The better performance comes from HotSpot replacing Jcc
> with CMOVcc; the bad performance comes from it not doing so in a case
> where it seemingly should.
>
> The linked picture shows the duration as measured with Caliper:
> http://i.stack.imgur.com/TstzH.png
> The attached JMH benchmark confirms it:
>
> PERCENTAGE:      MEAN    MIN    MAX   UNIT
> branchless:     7.237  6.977  7.283 ops/ms
>           5:     7.848  7.355  8.306 ops/ms
>          10:     5.522  5.359  5.665 ops/ms
>          15:     4.205  4.027  4.372 ops/ms
>          16:     3.964  3.677  4.255 ops/ms
> *        17:     3.779  3.478  4.048 ops/ms*
> *        18:     4.459  3.458  7.983 ops/ms*
> *        19:     7.922  7.168  8.188 ops/ms*
>          20:     8.008  7.697  8.328 ops/ms
>          30:     7.938  5.410  8.075 ops/ms
>          40:     8.004  7.651  8.256 ops/ms
>          50:     7.995  7.440  8.055 ops/ms
>
> It looks like the JIT switches to CMOVcc somewhere around 18% branching
> probability, but by then the branch misprediction penalty has already
> reduced the speed to about half. The break-even point lies somewhere
> around 5%, and always using CMOVcc would also be better than the current
> behavior.
>
> Is this a performance bug or is there an explanation?
>
> I may have left out some details; you can find them in my SO question,
> or just ask me:
> http://stackoverflow.com/questions/19689214/strange-branching-performance
>
> Regards, Martin.
>


More information about the hotspot-compiler-dev mailing list