RFR: 8297753: AArch64: Add optimized rules for vector compare with zero on NEON [v2]

Sat Jan 28 10:32:20 UTC 2023

On Sat, 28 Jan 2023 02:38:10 GMT, Chang Peng <duke at openjdk.org> wrote:

>> We can use compare-with-zero instructions such as cmgt(zero)[1] directly, avoiding the extra scalar2vector operations.
>> 
>> The following instruction sequence
>> 
>> movi  v16.4s, #0x0
>> cmgt  v16.4s, v17.4s, v16.4s
>> 
>> can be optimized to:
>> 
>> cmgt v16.4s, v17.4s, #0x0
>> 
>> This patch does the following:
>> 1. Add NEON floating-point compare-with-zero instructions.
>> 2. Add optimized match rules to generate the compare-with-zero instructions.
>> 
>> [1]: https://developer.arm.com/documentation/ddi0602/2022-06/SIMD-FP-Instructions/CMGT--zero---Compare-signed-Greater-than-zero--vector--
>
> Chang Peng has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits:
> 
>  - Resolving the merge conflicts caused by test/hotspot/gtest/aarch64/asmtest.out.h
>    
>    Change-Id: I896b879c8b7097a99e35fc1e53abab646240281a
>  - 8297753: AArch64: Add optimized rules for vector compare with zero on NEON
>    
>    We can use compare-with-zero instructions such as cmgt(zero)[1]
>    directly, avoiding the extra scalar2vector operations.
>    
>    The following instruction sequence
>    ```
>    movi  v16.4s, #0x0
>    cmgt  v16.4s, v17.4s, v16.4s
>    ```
>    can be optimized to:
>    ```
>    cmgt v16.4s, v17.4s, #0x0
>    ```
>    This patch does the following:
>    1. Add NEON floating-point compare-with-zero instructions.
>    2. Add optimized match rules to generate the compare-with-zero
>    instructions.
>    
>    [1]: https://developer.arm.com/documentation/ddi0602/2022-06/SIMD-FP-Instructions/CMGT--zero---Compare-signed-Greater-than-zero--vector--
>    
>    Change-Id: If026b477a0cad809bd201feafbfc9ab301a1b569

src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 3174:

> 3172:   INSN(fcvtzs, 0, 0b10, 0b01, 0b11011);
> 3173:   INSN(fcvtms, 0, 0b00, 0b01, 0b11011);
> 3174:   INSN(fcmgt,  0, 0b10, 0b01, 0b01100); // Floating-point compare greater than zero (vector)

If you made this `fcm(Condition cond, ...)` rather than having a separate definition for each condition, the code might be simpler and shorter.
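
To illustrate the idea, here is a minimal standalone sketch of a single condition-driven encoder. It is not the HotSpot `Assembler` code: the `Condition` enum, the function name `encode_fcm_zero`, and its parameters are hypothetical, and the U/opcode values for conditions other than GT come from my reading of the Arm ARM's "Advanced SIMD two-register miscellaneous" table, so please check them against the manual before relying on them.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the assembler's condition enum (illustrative only).
enum Condition { EQ, GE, GT, LE, LT };

// Encode a NEON floating-point compare-with-zero (vector) instruction.
//   is_q      : true for 128-bit arrangements (4S, 2D), false for 2S
//   is_double : true for the 2D arrangement
//   vd, vn    : destination and source vector register numbers (0..31)
inline uint32_t encode_fcm_zero(Condition cond, bool is_q, bool is_double,
                                unsigned vd, unsigned vn) {
  assert(vd < 32 && vn < 32);
  assert(is_q || !is_double);  // no 1D arrangement for this form

  // The condition selects the U bit (bit 29) and the 5-bit opcode (bits 16..12).
  // Assumed mapping; GT matches the 0b01100 opcode in the INSN line above.
  uint32_t u = 0, opcode = 0;
  switch (cond) {
    case GT: u = 0; opcode = 0b01100; break;
    case GE: u = 1; opcode = 0b01100; break;
    case EQ: u = 0; opcode = 0b01101; break;
    case LE: u = 1; opcode = 0b01101; break;
    case LT: u = 0; opcode = 0b01110; break;
  }

  uint32_t insn = 0x0EA00800u;            // fixed bits shared by all five forms
  insn |= (is_q ? 1u : 0u) << 30;         // Q: 64-bit vs 128-bit vector
  insn |= u << 29;                        // U: picks GE/LE over GT/EQ
  insn |= (is_double ? 1u : 0u) << 22;    // sz: single vs double precision
  insn |= opcode << 12;
  insn |= vn << 5;
  insn |= vd;
  return insn;
}
```

With this shape, a caller would write something like `encode_fcm_zero(GT, true, false, 16, 17)` for `fcmgt v16.4s, v17.4s, #0.0`, and the five per-condition definitions collapse into one switch.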

-------------

PR: https://git.openjdk.org/jdk/pull/11822