RFR: 8285790: AArch64: Merge C2 NEON and SVE matching rules
Hao Sun
haosun at openjdk.org
Tue Jul 5 04:25:25 UTC 2022
On Mon, 4 Jul 2022 12:51:22 GMT, Andrew Haley <aph at openjdk.org> wrote:
> However, just putting aside for a moment the lack of useful abstraction mechanisms, I note that there's a lot of code like this:
>
> ```
> if (length_in_bytes <= 16) {
>   // ... Neon
> } else {
>   assert(UseSVE > 0, "must be sve");
>   // ... SVE
> }
> ```
>
> which is to say, there's an implicit assumption that if an operation can be done with Neon it will be, and SVE will only be used if not. What is the justification for that assumption?
Not exactly.
It applies only to common **64/128-bit unpredicated** vector operations, where NEON has instructions equivalent to the SVE ones.
Recall **Drawback-1** and **Update-2 (part 2)** in the commit message.
Besides the code pattern you mentioned, there are many pairs of rules with "**_le128b**" and "**_gt128b**" suffixes, e.g., vmulI_le128b() and vmulI_gt128b(). We use two rules mainly because the NEON and SVE forms take different numbers of arguments. Otherwise, we tend to merge them into one rule, which is the pattern you mentioned, e.g., vadd().
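For illustration, such a pair could look roughly like the sketch below. This is a hand-written sketch, not code from the patch: the operand classes, predicates, arrangements and encodings are simplified (e.g., the arrangement is hard-coded to 4 x 32-bit lanes). The point is that SVE MUL is a predicated, destructive instruction, so the gt128b rule needs a different operand shape from the NEON one:

```
instruct vmulI_le128b(vReg dst, vReg src1, vReg src2) %{
  predicate(Matcher::vector_length_in_bytes(n) <= 16);
  match(Set dst (MulVI src1 src2));
  format %{ "mulv  $dst, $src1, $src2\t# 4S (NEON)" %}
  ins_encode %{
    __ mulv($dst$$FloatRegister, __ T4S,
            $src1$$FloatRegister, $src2$$FloatRegister);
  %}
  ins_pipe(pipe_slow);
%}

instruct vmulI_gt128b(vReg dst_src1, vReg src2) %{
  predicate(Matcher::vector_length_in_bytes(n) > 16);
  // SVE MUL is predicated and destructive: dst is also the first source.
  match(Set dst_src1 (MulVI dst_src1 src2));
  format %{ "sve_mul  $dst_src1, $dst_src1, $src2\t# S (SVE)" %}
  ins_encode %{
    __ sve_mul($dst_src1$$FloatRegister, __ S, ptrue, $src2$$FloatRegister);
  %}
  ins_pipe(pipe_slow);
%}
```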
The main reason for this change is that, according to the Neoverse V1 and N2 optimization guides, when the vector size fits in a NEON register, common NEON instructions are no slower than the equivalent SVE instructions in both latency and throughput.
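By contrast, where the NEON and SVE forms take the same operands, a single rule can dispatch on the vector length inside ins_encode, which is the pattern quoted above. Again, this is only a simplified sketch with a hard-coded 4 x 32-bit arrangement, not the exact code in this PR:

```
instruct vaddI(vReg dst, vReg src1, vReg src2) %{
  match(Set dst (AddVI src1 src2));
  format %{ "vaddI  $dst, $src1, $src2" %}
  ins_encode %{
    uint length_in_bytes = Matcher::vector_length_in_bytes(this);
    if (length_in_bytes <= 16) {
      // Prefer NEON when the payload fits in a 128-bit register.
      __ addv($dst$$FloatRegister, __ T4S,
              $src1$$FloatRegister, $src2$$FloatRegister);
    } else {
      assert(UseSVE > 0, "must be sve");
      // Unpredicated SVE ADD has the same three-operand shape as NEON.
      __ sve_add($dst$$FloatRegister, __ S,
                 $src1$$FloatRegister, $src2$$FloatRegister);
    }
  %}
  ins_pipe(pipe_slow);
%}
```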
Note-1: In the current aarch64_sve.ad file, several rules already follow this pattern, e.g., loadV16_vreg(), vroundFtoI() and insertI_le128bits(). There is also an ongoing patch in [link](https://github.com/openjdk/jdk/pull/7999). This patch makes them clearer.
Note-2: As mentioned in part 4 of the **TESTING** section, we ran JMH tests on one SVE machine and did not observe any regression; we will do more measurements on different systems.
-------------
PR: https://git.openjdk.org/jdk/pull/9346