Strange branching performance

Vladimir Kozlov vladimir.kozlov at oracle.com
Wed Feb 12 17:09:50 PST 2014


I filed an RFE to track this issue:

https://bugs.openjdk.java.net/browse/JDK-8034833

Regards,
Vladimir

On 2/12/14 4:18 PM, Vladimir Kozlov wrote:
> Hi Martin,
>
> The issue is more complicated than I thought. The code I pointed to before
> was added by me about 3 years ago for:
>
> 7097546: Optimize use of CMOVE instructions
> https://bugs.openjdk.java.net/browse/JDK-7097546
>
> The changes were made to avoid a 2x performance hit with cmov for code
> like the following:
>
>      public static int test(int result, int limit, int mask) { // mask = 15
>          for (int i = 0; i < limit; i++) {
>              if ((i & mask) == 0) result++; // Infrequent
>          }
>          return result;
>      }
>
> The cmov instruction has a big flaw: it requires an additional register.
> If the loop's body is complex, using cmov will result in register
> spilling, which means additional instructions. The performance hit could
> be higher than that of a branch misprediction.
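
A minimal sketch of the trade-off described above, written at the Java source level. The class and method names are hypothetical, and the second method is only a source-level analogue of the cmov form (compute the incremented value unconditionally into a temporary, then select); it does not claim to show what C2 actually emits for either method.

```java
class CmovLoopSketch {
    // The loop from the message: the increment is guarded by a branch.
    static int testBranch(int result, int limit, int mask) {
        for (int i = 0; i < limit; i++) {
            if ((i & mask) == 0) result++; // taken once every mask+1 iterations
        }
        return result;
    }

    // Hypothetical analogue of the cmov form: the incremented value is
    // always computed and needs its own temporary (an extra register in
    // machine code) before the conditional select.
    static int testSelect(int result, int limit, int mask) {
        for (int i = 0; i < limit; i++) {
            int incremented = result + 1;
            result = ((i & mask) == 0) ? incremented : result;
        }
        return result;
    }

    public static void main(String[] args) {
        // Both variants compute the same value; only the code shape differs.
        System.out.println(testBranch(0, 1000, 15));
        System.out.println(testSelect(0, 1000, 15));
    }
}
```

The temporary `incremented` stands in for the additional register mentioned above: in a loop body that is already short of registers, keeping it live is what can force a spill.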
>
> I am not sure how to proceed from here. I may do some benchmark testing
> to see the effects if cmov is used in more cases.
>
> Regards,
> Vladimir
>
> On 2/8/14 1:11 PM, Martin Grajcar wrote:
>> Hi Vladimir!
>>
>> On Sat, Feb 8, 2014 at 4:36 AM, Vladimir Kozlov
>> <vladimir.kozlov at oracle.com> wrote:
>>
>>     Hi Martin,
>>
>>     Your observation is correct. The corresponding code is as follows:
>>
>>        float infrequent_prob = PROB_UNLIKELY_MAG(3); // 0.001
>>
>>        // BlockLayoutByFrequency optimization moves infrequent branch
>>        // from hot path. No point in CMOV'ing in such case (110 is used
>>        // instead of 100 to take into account not exactness of float
>>        // value).
>>        if (BlockLayoutByFrequency) {
>>          infrequent_prob = MAX2(infrequent_prob,
>>                                 (float)BlockLayoutMinDiamondPercentage/110.0f);
>>        }
>>        // Check for highly predictable branch.  No point in CMOV'ing if
>>        // we are going to predict accurately all the time.
>>        if (iff->_prob < infrequent_prob ||
>>            iff->_prob > (1.0f - infrequent_prob))
>>          return NULL;
>>
>>     Note that BlockLayoutMinDiamondPercentage is 20 by default, so
>>     infrequent_prob becomes about 0.2 (20/110), as you observed.
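
The arithmetic behind that threshold can be sketched in Java as follows. This is only a transcription of the quoted C++ check for illustration, not HotSpot code: the class, method, and constant names are hypothetical, and it assumes the profiled branch probability equals the true taken ratio (1/16 for mask = 15).

```java
class CmovThresholdSketch {
    // Value of PROB_UNLIKELY_MAG(3), per the comment in the quoted code.
    static final float PROB_UNLIKELY_MAG_3 = 1e-3f;

    // Returns true when the quoted check would give up on CMOV because the
    // branch is considered predictable (very unlikely or very likely).
    static boolean skipsCmov(float prob, boolean blockLayoutByFrequency,
                             int minDiamondPercentage) {
        float infrequentProb = PROB_UNLIKELY_MAG_3;
        if (blockLayoutByFrequency) {
            // 110 rather than 100, as in the quoted source.
            infrequentProb = Math.max(infrequentProb, minDiamondPercentage / 110.0f);
        }
        return prob < infrequentProb || prob > (1.0f - infrequentProb);
    }

    public static void main(String[] args) {
        System.out.println(20 / 110.0f);                      // roughly 0.18
        System.out.println(skipsCmov(1.0f / 16, true, 20));   // true: 0.0625 is below the threshold
        System.out.println(skipsCmov(0.25f, true, 20));       // false: inside the CMOV window
        System.out.println(skipsCmov(1.0f / 16, false, 20));  // false: only the 0.001 limit applies
    }
}
```

With the percentage at 20, any branch taken less than about 18% or more than about 82% of the time is left as a branch, which matches the sharp edge observed just below 0.2.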
>>
>>
>> Yes, there's a sharp edge somewhere below 0.2.
>>
>>     C2 moves infrequent code outside the loop (with branches out and
>>     back) to keep only hot code inside.
>>
>>
>> To me it looks like there's nothing to be moved outside of the loop.
>> Mainly because you'd hardly save anything as you'd replace the two
>> instructions
>>
>> LEA (%result_reg, 1), %tmp_reg
>> CMOVEQ %tmp_reg, %result_reg
>>
>> by a conditional jump. Saving a single instruction on the hot path and
>> risking a branch misprediction penalty might make sense for very low
>> probabilities like PROB_UNLIKELY_MAG(3), not 20%.
>>
>>     It looks like it does not happen in your case and I need to look
>>     into why. There are several conditions besides BlockLayoutByFrequency,
>>     and the above code could be incorrect and may need to be fixed (or
>>     removed).
>>
>>
>> Nice that you can look into it. There are a lot of attempts to eliminate
>> branching manually like in
>> http://grepcode.com/file/repo1.maven.org/maven2/com.google.guava/guava/15.0/com/google/common/math/IntMath.java#IntMath.gcd%28int%2Cint%29
>>
>> but this is nearly always less efficient than using CMOVcc.
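
For readers unfamiliar with this kind of manual branch elimination, here is a minimal sketch of a shift-and-mask minimum, in the same spirit as the trick in the linked gcd method but not a copy of it. The class and method names are hypothetical, and the bit-trick version is only valid when `a - b` does not overflow.

```java
class BranchFreeMinSketch {
    // Manual branch elimination: select the smaller value with a sign mask.
    // Assumes a - b does not overflow.
    static int minBitTrick(int a, int b) {
        int delta = a - b;
        // delta >> 31 is all ones when delta is negative, otherwise zero,
        // so deltaOrZero is delta when a < b and 0 otherwise.
        int deltaOrZero = delta & (delta >> (Integer.SIZE - 1));
        return b + deltaOrZero;
    }

    // The plain conditional form, which leaves the choice between a branch
    // and a conditional move to the compiler.
    static int minTernary(int a, int b) {
        return (a < b) ? a : b;
    }

    public static void main(String[] args) {
        System.out.println(minBitTrick(3, 7) + " " + minTernary(3, 7));
        System.out.println(minBitTrick(-4, 2) + " " + minTernary(-4, 2));
    }
}
```

The bit-trick version spends a subtract, a shift, an AND, and an add on every call, where a compare plus a conditional move would do the same selection in fewer instructions.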
>>
>> Regards,
>> Martin.


More information about the hotspot-compiler-dev mailing list