RFR: 8285973: x86_64: Improve fp comparison and cmove for eq/ne

Fri May 20 22:01:48 UTC 2022

On Wed, 18 May 2022 14:59:33 GMT, Quan Anh Mai <duke at openjdk.java.net> wrote:

>> Hi,
>> 
>> This patch optimises the matching rules for floating-point comparison with respect to eq/ne on x86-64.
>> 
>> 1, When both inputs of a comparison are the same (i.e. `isNaN` patterns), `ZF` is always set, so we don't need `cmpOpUCF2` for the eq/ne cases. This improves the sequence for `If (CmpF x x) (Bool ne)` from
>> 
>>     ucomiss xmm0, xmm0
>>     jp      label
>>     jne     label
>> 
>> into
>> 
>>     ucomiss xmm0, xmm0
>>     jp      label
>> 
>> 2, The cmove rules for `cmpOpUCF2` are missing, so patterns such as `x == y ? 1 : 0` fall back to `cmpOpU`, which has a very high cost for fixing up the flags, for example
>> 
>>         xorl    ecx, ecx
>>         ucomiss xmm0, xmm1
>>         jnp     done
>>         pushf
>>         andq    [rsp], 0xffffff2b
>>         popf
>>     done:
>>         movl    eax, 1
>>         cmovel  eax, ecx
>> 
>> The patch changes this sequence into
>> 
>>     xorl    ecx, ecx
>>     ucomiss xmm0, xmm1
>>     movl    eax, 1
>>     cmovpl  eax, ecx
>>     cmovnel eax, ecx
>> 
>> 3, The patch also changes the pattern of `isInfinite` to use `Math.abs`, which saves one comparison, and to compare the result against `MAX_VALUE`, since `>` is cheaper than `==` for floating-point types: an unordered result sets `CF`, so `>` needs only a single `ja`-style test, whereas `==` must also check `PF`.
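The flag logic behind points 1 to 3 can be checked with a small Java model. This is a sketch, not HotSpot code: `UcomisFlags` and its methods are hypothetical names, and the model only encodes the documented `ucomisd` results (unordered sets ZF, PF and CF; less sets CF; equal sets ZF; greater clears all three), which `ucomiss` produces the same way for floats. It shows that a self-compare always sets ZF, that `==` needs ZF=1 and PF=0 (hence `cmovp` plus `cmovne`), and that `>` maps to "above", which is already false for NaN.

```java
// Hypothetical model of the ZF/PF/CF bits written by ucomisd/ucomiss.
// Not HotSpot code: it only encodes the documented x86 flag results.
public final class UcomisFlags {
    final boolean zf, pf, cf;

    private UcomisFlags(boolean zf, boolean pf, boolean cf) {
        this.zf = zf; this.pf = pf; this.cf = cf;
    }

    // ucomisd a, b: unordered -> ZF=PF=CF=1, less -> CF=1, equal -> ZF=1, greater -> all 0.
    static UcomisFlags ucomis(double a, double b) {
        if (Double.isNaN(a) || Double.isNaN(b)) return new UcomisFlags(true, true, true);
        if (a < b) return new UcomisFlags(false, false, true);
        if (a == b) return new UcomisFlags(true, false, false);
        return new UcomisFlags(false, false, false);
    }

    // Java's x == y means "equal and ordered": ZF=1 and PF=0.
    // Branch-free form: eax = 1; cmovp eax, 0; cmovne eax, 0.
    static int equalAsInt(double x, double y) {
        UcomisFlags f = ucomis(x, y);
        int eax = 1;
        if (f.pf) eax = 0;   // cmovp
        if (!f.zf) eax = 0;  // cmovne
        return eax;
    }

    // x != x: a self-compare is either "equal" or "unordered", and both set ZF,
    // so the ne test reduces to PF alone (a single jp is enough).
    static boolean isNaNViaSelfCompare(double x) {
        UcomisFlags f = ucomis(x, x);
        if (!f.zf) throw new AssertionError("ZF must be set for a self-compare");
        return f.pf;
    }

    // a > b maps to "above" (CF=0 and ZF=0); an unordered result sets CF,
    // so NaN already yields false without a separate PF test.
    static boolean above(double a, double b) {
        UcomisFlags f = ucomis(a, b);
        return !f.cf && !f.zf;
    }

    // Point 3: |v| > MAX_VALUE holds exactly for +/-Infinity; NaN gives false.
    static boolean isInfinite(double v) {
        return above(Math.abs(v), Double.MAX_VALUE);
    }

    public static void main(String[] args) {
        double[] vals = {0.0, -0.0, 1.5, Double.NaN, Double.POSITIVE_INFINITY,
                         Double.NEGATIVE_INFINITY, Double.MAX_VALUE};
        for (double x : vals) {
            if (isNaNViaSelfCompare(x) != Double.isNaN(x)) throw new AssertionError("isNaN " + x);
            if (isInfinite(x) != Double.isInfinite(x)) throw new AssertionError("isInfinite " + x);
            for (double y : vals) {
                if (equalAsInt(x, y) != (x == y ? 1 : 0)) throw new AssertionError("eq " + x + " " + y);
                if (above(x, y) != (x > y)) throw new AssertionError("gt " + x + " " + y);
            }
        }
        System.out.println("all checks passed");
    }
}
```

Running `main` compares each modelled condition with the corresponding Java operator on a few edge values, including signed zeros, NaN and both infinities.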
>> 
>> The benchmark results are as follows:
>> 
>>     Before:
>>     Benchmark                      Mode  Cnt     Score     Error  Units
>>     FPComparison.equalDouble       avgt    5  2876.242 ±  58.875  ns/op
>>     FPComparison.equalFloat        avgt    5  3062.430 ±  31.371  ns/op
>>     FPComparison.isFiniteDouble    avgt    5   475.749 ±  19.027  ns/op
>>     FPComparison.isFiniteFloat     avgt    5   506.525 ±  14.417  ns/op
>>     FPComparison.isInfiniteDouble  avgt    5  1232.800 ±  31.677  ns/op
>>     FPComparison.isInfiniteFloat   avgt    5  1234.708 ±  70.239  ns/op
>>     FPComparison.isNanDouble       avgt    5  2255.847 ±   7.238  ns/op
>>     FPComparison.isNanFloat        avgt    5  2567.044 ±  36.078  ns/op
>> 
>>     After:
>>     Benchmark                      Mode  Cnt     Score     Error  Units
>>     FPComparison.equalDouble       avgt    5   594.636 ±   8.922  ns/op
>>     FPComparison.equalFloat        avgt    5   663.849 ±   3.656  ns/op
>>     FPComparison.isFiniteDouble    avgt    5   518.309 ± 107.352  ns/op
>>     FPComparison.isFiniteFloat     avgt    5   515.576 ±  14.669  ns/op
>>     FPComparison.isInfiniteDouble  avgt    5   621.185 ±  11.935  ns/op
>>     FPComparison.isInfiniteFloat   avgt    5   623.566 ±  15.206  ns/op
>>     FPComparison.isNanDouble       avgt    5   400.124 ±   0.762  ns/op
>>     FPComparison.isNanFloat        avgt    5   546.486 ±   1.509  ns/op
>> 
>> Thank you very much.
>
> I have reverted the changes to `java.lang.Float` and `java.lang.Double` so as not to interfere with the intrinsic PR. I have also added more tests covering all cases of floating-point comparison in compiled code.
> 
> The rules for fp comparison that output the result to `rFlagsRegU` are expensive and should be avoided, so I removed the shortcut rules with memory or constant operands to reduce the number of match rules. Only the basic rules are kept.
> 
> Thanks.
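The cost of the `cmpOpU` path comes from the flag fix-up in the old sequence quoted in point 2 (`pushf; andq [rsp], 0xffffff2b; popf`), which is only reached when PF=1. A minimal Java sketch of what that mask does to the saved RFLAGS word is below; the class and method names are hypothetical, while the bit positions are the standard x86 RFLAGS layout (CF bit 0, reserved bit 1 always set, PF bit 2, AF bit 4, ZF bit 6, SF bit 7).

```java
// Hypothetical sketch of the fix-up: pushf; andq [rsp], 0xffffff2b; popf
public final class FlagFixup {
    // Standard x86 RFLAGS bit positions.
    static final long CF = 1L << 0;
    static final long RESERVED1 = 1L << 1; // always reads as 1
    static final long PF = 1L << 2;
    static final long AF = 1L << 4;
    static final long ZF = 1L << 6;
    static final long SF = 1L << 7;

    // andq with a 32-bit immediate sign-extends it to 64 bits,
    // so the effective mask is 0xffffffffffffff2b: it clears PF, AF, ZF and SF.
    static final long MASK = (long) 0xffffff2b;

    // The jnp skips the fix-up, so it only changes the flags when PF=1.
    static long applyFixup(long rflags) {
        return (rflags & PF) != 0 ? rflags & MASK : rflags;
    }

    public static void main(String[] args) {
        long unordered = ZF | PF | CF | RESERVED1; // ucomis result for a NaN operand
        long fixed = applyFixup(unordered);
        // prints "ZF=false PF=false CF=true"
        System.out.println("ZF=" + ((fixed & ZF) != 0)
                + " PF=" + ((fixed & PF) != 0)
                + " CF=" + ((fixed & CF) != 0));
    }
}
```

After the fix-up an unordered result reads as not-equal (ZF=0), so the following `cmovel` is not taken; the whole round trip through the stack exists just to rewrite these bits, while the `cmpOpUCF2` sequence tests PF directly with an extra `cmovp`.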

@merykitty Very nice work! The patch looks good to me.

-------------

PR: https://git.openjdk.java.net/jdk/pull/8525