RFR: 8264352: AArch64: Optimize vector "not/andNot" for NEON and SVE

Thu Apr 8 01:52:08 UTC 2021

On Wed, 7 Apr 2021 09:03:55 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> Since the vector bitwise `"andNot"` is implemented with `"v1.and(v2.xor(-1))"`, the generated code with SVE looks like:
>>   mov z16.b, #-1
>>   eor z17.d, z20.d, z16.d
>>   and z18.d, z18.d, z17.d
>> This could be improved with a single instruction:
>>   bic z16.d, z16.d, z18.d
>> Similarly, the following optimization is needed for NEON:
>>   not v21.16b, v21.16b
>>   and v21.16b, v21.16b, v18.16b  ==>  bic v21.16b, v18.16b, v21.16b
>> This patch also adds the following optimization of vector `"not"` for SVE, which already exists for NEON:
>>   mov z16.b, #-1
>>   eor z17.d, z20.d, z16.d     ==>   not z17.d, p7/m, z20.d
>> With NEON, performance improves by about `16% ~ 36%` on the `"AND_NOT"` benchmark [1].
>> 
>> [1] https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/jdk/jdk/incubator/vector/benchmark/src/main/java/benchmark/jdk/incubator/vector/ByteMaxVector.java#L343
>> 
>> Tested tier1 and jdk:tier3.
>
> Marked as reviewed by aph (Reviewer).
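The two rewrites in the patch rely on simple per-lane identities. The first is that `a.and(b.xor(-1))` equals AArch64 `bic` ("bit clear", `n & ~m`). The second is that XOR with all-ones equals bitwise NOT. The sketch below is a scalar Java model of one byte lane that checks both identities exhaustively. It is not HotSpot code, and the class and method names (`AndNotSketch`, `andViaXor`, `bic`, `notViaXor`) are hypothetical.

```java
// Hypothetical scalar model of one byte lane (illustration only, not HotSpot code).
public final class AndNotSketch {

    // Lane semantics of v1.and(v2.xor(-1)): XOR with all-ones (-1) flips
    // every bit of b, then AND keeps the bits of a that are not set in b.
    static byte andViaXor(byte a, byte b) {
        return (byte) (a & (b ^ -1));
    }

    // Model of AArch64 BIC: "bic Vd, Vn, Vm" computes Vn & ~Vm, so the
    // operand to be complemented must be the last source register.
    static byte bic(byte n, byte m) {
        return (byte) (n & ~m);
    }

    // Lane semantics of the old SVE "not" sequence: eor with an all-ones register.
    static byte notViaXor(byte a) {
        return (byte) (a ^ -1);
    }

    // Lane semantics of a single NOT instruction: bitwise complement.
    static byte not(byte a) {
        return (byte) ~a;
    }

    public static void main(String[] args) {
        // Exhaustively compare both forms over every byte value (and every pair for andNot).
        for (int i = Byte.MIN_VALUE; i <= Byte.MAX_VALUE; i++) {
            byte a = (byte) i;
            if (notViaXor(a) != not(a)) {
                throw new AssertionError("not mismatch at " + a);
            }
            for (int j = Byte.MIN_VALUE; j <= Byte.MAX_VALUE; j++) {
                byte b = (byte) j;
                if (andViaXor(a, b) != bic(a, b)) {
                    throw new AssertionError("andNot mismatch at " + a + ", " + b);
                }
            }
        }
        System.out.println("and(xor(-1)) == bic and xor(-1) == not for all byte inputs");
    }
}
```

Because `bic` complements its last source operand, the NEON sequence `not v21; and v21, v21, v18` (which computes `v18 & ~v21`) becomes `bic v21.16b, v18.16b, v21.16b`, with `v18` moved to the first source position.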

Thanks for the reviews, @theRealAph and @nsjian!

-------------

PR: https://git.openjdk.java.net/jdk/pull/3370