RFR: 8285973: x86_64: Improve fp comparison and cmove for eq/ne [v2]

Thu May 19 20:46:40 UTC 2022

On Wed, 18 May 2022 14:59:49 GMT, Quan Anh Mai <duke at openjdk.java.net> wrote:

>> Hi,
>> 
>> This patch optimises the matching rules for floating-point comparison with respects to eq/ne on x86-64
>> 
>> 1, When the inputs of a comparison is the same (i.e `isNaN` patterns), `ZF` is always set, so we don't need `cmpOpUCF2` for the eq/ne cases, which improves the sequence of `If (CmpF x x) (Bool ne)` from
>> 
>>     ucomiss xmm0, xmm0
>>     jp      label
>>     jne     label
>> 
>> into
>> 
>>     ucomiss xmm0, xmm0
>>     jp      label
>> 
>> 2, The move rules for `cmpOpUCF2` is missing, which makes patterns such as `x == y ? 1 : 0` to fall back to `cmpOpU`, which have a really high cost of fixing the flags, such as
>> 
>>         xorl    ecx, ecx
>>         ucomiss xmm0, xmm1
>>         jnp     done
>>         pushf
>>         andq    [rsp], 0xffffff2b
>>         popf
>>     done:
>>         movl    eax, 1
>>         cmovel  eax, ecx
>> 
>> The patch changes this sequence into
>> 
>>     xorl    ecx, ecx
>>     ucomiss xmm0, xmm1
>>     movl    eax, 1
>>     cmovpl  eax, ecx
>>     cmovnel eax, ecx
>> 
>> 3, The patch also changes the pattern of `isInfinite` to be more optimised by using `Math.abs` to reduce 1 comparison and compares the result with `MAX_VALUE` since `>` is more optimised than `==` for floating-point types.
>> 
>> The benchmark results are as follow:
>> 
>>     Before:
>>     Benchmark                      Mode  Cnt     Score     Error  Units
>>     FPComparison.equalDouble       avgt    5  2876.242 ±  58.875  ns/op
>>     FPComparison.equalFloat        avgt    5  3062.430 ±  31.371  ns/op
>>     FPComparison.isFiniteDouble    avgt    5   475.749 ±  19.027  ns/op
>>     FPComparison.isFiniteFloat     avgt    5   506.525 ±  14.417  ns/op
>>     FPComparison.isInfiniteDouble  avgt    5  1232.800 ±  31.677  ns/op
>>     FPComparison.isInfiniteFloat   avgt    5  1234.708 ±  70.239  ns/op
>>     FPComparison.isNanDouble       avgt    5  2255.847 ±   7.238  ns/op
>>     FPComparison.isNanFloat        avgt    5  2567.044 ±  36.078  ns/op
>> 
>>     After:
>>     Benchmark                      Mode  Cnt     Score     Error  Units
>>     FPComparison.equalDouble       avgt    5   594.636 ±   8.922  ns/op
>>     FPComparison.equalFloat        avgt    5   663.849 ±   3.656  ns/op
>>     FPComparison.isFiniteDouble    avgt    5   518.309 ± 107.352  ns/op
>>     FPComparison.isFiniteFloat     avgt    5   515.576 ±  14.669  ns/op
>>     FPComparison.isInfiniteDouble  avgt    5   621.185 ±  11.935  ns/op
>>     FPComparison.isInfiniteFloat   avgt    5   623.566 ±  15.206  ns/op
>>     FPComparison.isNanDouble       avgt    5   400.124 ±   0.762  ns/op
>>     FPComparison.isNanFloat        avgt    5   546.486 ±   1.509  ns/op
>> 
>> Thank you very much.
>
> Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision:
> 
>  - incidental ws
>  - add tests
>  - Merge branch 'master' into fpcompare
>  - fix tests
>  - test
>  - improve infinity
>  - remove expensive rules
>  - improve fp comparison

Could you pls show the performance delta with the baseline and after the patch? Otherwise, people reviewing this PR have to manually compute how much improvement is obtained. For example, `FPComparison.isNanFloat` is showing `4.7x` improvement. Kindly fill the delta column for rest of the data points. 

Instead of two separate tables, the suggested table format for each row would be:
`<benchmark> , <baseline throughput> , <this patch throughput>, <performance delta in % or x>`

-------------

PR: https://git.openjdk.java.net/jdk/pull/8525