RFR: 8285973: x86_64: Improve fp comparison and cmove for eq/ne
Vladimir Kozlov
kvn at openjdk.java.net
Wed May 4 23:20:18 UTC 2022
On Wed, 4 May 2022 19:32:41 GMT, Vladimir Kozlov <kvn at openjdk.org> wrote:
>> Hi,
>>
>> This patch optimises the matching rules for floating-point comparison with respects to eq/ne on x86-64
>>
>> 1, When the inputs of a comparison is the same (i.e `isNaN` patterns), `ZF` is always set, so we don't need `cmpOpUCF2` for the eq/ne cases, which improves the sequence of `If (CmpF x x) (Bool ne)` from
>>
>> ucomiss xmm0, xmm0
>> jp label
>> jne label
>>
>> into
>>
>> ucomiss xmm0, xmm0
>> jp label
>>
>> 2, The move rules for `cmpOpUCF2` is missing, which makes patterns such as `x == y ? 1 : 0` to fall back to `cmpOpU`, which have a really high cost of fixing the flags, such as
>>
>> xorl ecx, ecx
>> ucomiss xmm0, xmm1
>> jnp done
>> pushf
>> andq [rsp], 0xffffff2b
>> popf
>> done:
>> movl eax, 1
>> cmovel eax, ecx
>>
>> The patch changes this sequence into
>>
>> xorl ecx, ecx
>> ucomiss xmm0, xmm1
>> movl eax, 1
>> cmovpl eax, ecx
>> cmovnel eax, ecx
>>
>> 3, The patch also changes the pattern of `isInfinite` to be more optimised by using `Math.abs` to reduce 1 comparison and compares the result with `MAX_VALUE` since `>` is more optimised than `==` for floating-point types.
>>
>> The benchmark results are as follow:
>>
>> Before:
>> Benchmark Mode Cnt Score Error Units
>> FPComparison.equalDouble avgt 5 2876.242 ± 58.875 ns/op
>> FPComparison.equalFloat avgt 5 3062.430 ± 31.371 ns/op
>> FPComparison.isFiniteDouble avgt 5 475.749 ± 19.027 ns/op
>> FPComparison.isFiniteFloat avgt 5 506.525 ± 14.417 ns/op
>> FPComparison.isInfiniteDouble avgt 5 1232.800 ± 31.677 ns/op
>> FPComparison.isInfiniteFloat avgt 5 1234.708 ± 70.239 ns/op
>> FPComparison.isNanDouble avgt 5 2255.847 ± 7.238 ns/op
>> FPComparison.isNanFloat avgt 5 2567.044 ± 36.078 ns/op
>>
>> After:
>> Benchmark Mode Cnt Score Error Units
>> FPComparison.equalDouble avgt 5 594.636 ± 8.922 ns/op
>> FPComparison.equalFloat avgt 5 663.849 ± 3.656 ns/op
>> FPComparison.isFiniteDouble avgt 5 518.309 ± 107.352 ns/op
>> FPComparison.isFiniteFloat avgt 5 515.576 ± 14.669 ns/op
>> FPComparison.isInfiniteDouble avgt 5 621.185 ± 11.935 ns/op
>> FPComparison.isInfiniteFloat avgt 5 623.566 ± 15.206 ns/op
>> FPComparison.isNanDouble avgt 5 400.124 ± 0.762 ns/op
>> FPComparison.isNanFloat avgt 5 546.486 ± 1.509 ns/op
>>
>> Thank you very much.
>
> src/hotspot/cpu/x86/x86_64.ad line 6998:
>
>> 6996: ins_encode %{
>> 6997: __ cmovl(Assembler::parity, $dst$$Register, $src$$Register);
>> 6998: __ cmovl(Assembler::notEqual, $dst$$Register, $src$$Register);
>
> Should this be `equal`?
I see that you swapped `src, dst` in `match()` but `format` is sill incorrect and the code is confusing.
-------------
PR: https://git.openjdk.java.net/jdk/pull/8525
More information about the hotspot-compiler-dev
mailing list