RFR: 8285790: AArch64: Merge C2 NEON and SVE matching rules

Fri Jul 22 05:38:04 UTC 2022

On Mon, 4 Jul 2022 12:51:22 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> Aha! I was looking forward to this.
>> 
>> On 7/1/22 11:46, Hao Sun wrote:
>>  > Note-1: the m4 file is not introduced because duplication is greatly
>>  > reduced now.
>> 
>> Yes, but there's still a lot of duplication. I'll make a few examples
>> of where you should make simple changes that will usefully increase the
>> level of abstraction. That will be a start.
>
>> @theRealAph Thanks for your comment. Yes, there is still duplicate code. I can easily list several cases, such as the reduce-and/or/xor rules, the vector shift ops and several reg-with-imm rules. We're open to keeping the m4 file.
>> 
>> But I would suggest that we first focus on 1) our implementation of generic vector registers and 2) the merged rules (in particular those where we share the codegen between NEON-only platforms and 128-bit vector ops on SVE platforms). After that we can discuss whether to use an m4 file and how to implement it if needed.
> 
> We can do both: there's no sense in which one excludes the other, and we have time.
> 
> However, just putting aside for a moment the lack of useful abstraction mechanisms, I note that there's a lot of code like this:
> 
> 
>     if (length_in_bytes <= 16) {
>       // ... Neon
>     } else {
>       assert(UseSVE > 0, "must be sve");
>       // ... SVE
>     }
> 
> 
> which is to say, there's an implicit assumption that if an operation can be done with Neon it will be, and SVE will only be used if not. What is the justification for that assumption?

Hi @theRealAph, I have uploaded three commits. Could you take a look at them when you have time? Thanks.

commit-1: merge the `master` branch as of 7 July. More merges will of course follow during the rest of the review process.

commit-2: add one `VM_Version` flag to control whether to generate NEON instructions for 64/128-bit vector operations on SVE.
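
To make the intent of commit-2 concrete, here is a minimal standalone C++ sketch of the decision such a flag could drive. It is not the code in the PR: `VmConfig`, `prefer_neon_on_sve`, `Isa` and `select_isa` are hypothetical names used only for illustration, and `use_sve` merely mirrors the `UseSVE` option from the snippet quoted above.

```cpp
#include <cassert>

// Hypothetical stand-in for the VM state; NOT the real HotSpot VM_Version API.
struct VmConfig {
  int  use_sve;             // 0 = NEON only, >0 = SVE available (mirrors UseSVE)
  bool prefer_neon_on_sve;  // hypothetical flag in the spirit of commit-2
};

enum class Isa { Neon, Sve };

// Decide which instruction set a merged rule would emit for a vector of the
// given size. NEON registers are at most 16 bytes (128 bits) wide.
inline Isa select_isa(const VmConfig& cfg, int length_in_bytes) {
  const bool fits_neon = length_in_bytes <= 16;
  if (cfg.use_sve == 0) {
    // Without SVE, every supported vector must fit in a NEON register.
    assert(fits_neon && "NEON-only platform cannot handle vectors wider than 128 bits");
    return Isa::Neon;
  }
  // On SVE hardware, 64/128-bit operations use NEON only if the flag says so.
  if (fits_neon && cfg.prefer_neon_on_sve) {
    return Isa::Neon;
  }
  return Isa::Sve;
}
```

In this sketch, setting the flag to false on an SVE machine sends 64/128-bit operations down the SVE path too, so the "NEON whenever it fits" assumption becomes an explicit, switchable choice rather than an implicit one.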

commit-3: add the m4 file. We tried to make as many abstractions as possible in the m4 file.
Before m4 was introduced, the `aarch64_vector.ad` file was ~5k LOC. With this commit, the `aarch64_vector_ad.m4` file is ~4k LOC, i.e. only a ~20% reduction.
Personally, I think this reduction is not that big compared to the reductions between `aarch64_neon/sve_ad.m4` and `aarch64_neon/sve.ad`.

-------------

PR: https://git.openjdk.org/jdk/pull/9346