RFR: 8285973: x86_64: Improve fp comparison and cmove for eq/ne [v2]
Srinivas Vamsi Parasa
duke at openjdk.java.net
Thu May 19 20:46:40 UTC 2022
On Wed, 18 May 2022 14:59:49 GMT, Quan Anh Mai <duke at openjdk.java.net> wrote:
>> Hi,
>>
>> This patch optimises the matching rules for floating-point comparison with respects to eq/ne on x86-64
>>
>> 1, When the inputs of a comparison is the same (i.e `isNaN` patterns), `ZF` is always set, so we don't need `cmpOpUCF2` for the eq/ne cases, which improves the sequence of `If (CmpF x x) (Bool ne)` from
>>
>> ucomiss xmm0, xmm0
>> jp label
>> jne label
>>
>> into
>>
>> ucomiss xmm0, xmm0
>> jp label
>>
>> 2, The move rules for `cmpOpUCF2` is missing, which makes patterns such as `x == y ? 1 : 0` to fall back to `cmpOpU`, which have a really high cost of fixing the flags, such as
>>
>> xorl ecx, ecx
>> ucomiss xmm0, xmm1
>> jnp done
>> pushf
>> andq [rsp], 0xffffff2b
>> popf
>> done:
>> movl eax, 1
>> cmovel eax, ecx
>>
>> The patch changes this sequence into
>>
>> xorl ecx, ecx
>> ucomiss xmm0, xmm1
>> movl eax, 1
>> cmovpl eax, ecx
>> cmovnel eax, ecx
>>
>> 3, The patch also changes the pattern of `isInfinite` to be more optimised by using `Math.abs` to reduce 1 comparison and compares the result with `MAX_VALUE` since `>` is more optimised than `==` for floating-point types.
>>
>> The benchmark results are as follow:
>>
>> Before:
>> Benchmark Mode Cnt Score Error Units
>> FPComparison.equalDouble avgt 5 2876.242 ± 58.875 ns/op
>> FPComparison.equalFloat avgt 5 3062.430 ± 31.371 ns/op
>> FPComparison.isFiniteDouble avgt 5 475.749 ± 19.027 ns/op
>> FPComparison.isFiniteFloat avgt 5 506.525 ± 14.417 ns/op
>> FPComparison.isInfiniteDouble avgt 5 1232.800 ± 31.677 ns/op
>> FPComparison.isInfiniteFloat avgt 5 1234.708 ± 70.239 ns/op
>> FPComparison.isNanDouble avgt 5 2255.847 ± 7.238 ns/op
>> FPComparison.isNanFloat avgt 5 2567.044 ± 36.078 ns/op
>>
>> After:
>> Benchmark Mode Cnt Score Error Units
>> FPComparison.equalDouble avgt 5 594.636 ± 8.922 ns/op
>> FPComparison.equalFloat avgt 5 663.849 ± 3.656 ns/op
>> FPComparison.isFiniteDouble avgt 5 518.309 ± 107.352 ns/op
>> FPComparison.isFiniteFloat avgt 5 515.576 ± 14.669 ns/op
>> FPComparison.isInfiniteDouble avgt 5 621.185 ± 11.935 ns/op
>> FPComparison.isInfiniteFloat avgt 5 623.566 ± 15.206 ns/op
>> FPComparison.isNanDouble avgt 5 400.124 ± 0.762 ns/op
>> FPComparison.isNanFloat avgt 5 546.486 ± 1.509 ns/op
>>
>> Thank you very much.
>
> Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision:
>
> - incidental ws
> - add tests
> - Merge branch 'master' into fpcompare
> - fix tests
> - test
> - improve infinity
> - remove expensive rules
> - improve fp comparison
Could you pls show the performance delta with the baseline and after the patch? Otherwise, people reviewing this PR have to manually compute how much improvement is obtained. For example, `FPComparison.isNanFloat` is showing `4.7x` improvement. Kindly fill the delta column for rest of the data points.
Instead of two separate tables, the suggested table format for each row would be:
`<benchmark> , <baseline throughput> , <this patch throughput>, <performance delta in % or x>`
-------------
PR: https://git.openjdk.java.net/jdk/pull/8525
More information about the hotspot-compiler-dev
mailing list