[vectorIntrinsics+mask] RFR: 8270349: Initial X86 backend support for optimizing masking operations on AVX512 targets. [v2]

Thu Aug 12 06:23:49 UTC 2021

On Wed, 11 Aug 2021 22:32:46 GMT, Sandhya Viswanathan <sviswanathan at openjdk.org> wrote:

>> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits:
>> 
>>  - 8270349: Merge with latest vectorIntrinsics+mask tip + extend backend support for XorV,AndV,OrV and Compare masked operations.
>>  - 8270349: Fix for 32-bit build failure.
>>  - 8270349: Initial X86 backend support for optimizing masking operations on AVX512 targets.
>
> src/hotspot/cpu/x86/assembler_x86.cpp line 8877:
> 
>> 8875: }
>> 8876: 
>> 8877: void Assembler::evfmaps(XMMRegister dst, KRegister mask, XMMRegister nds, XMMRegister src, bool merge, int vector_len) {
> 
> It will be good to specify the flavor of fma here, say evfmaps213ps based on the opcode that you use.
> Another point is that the 213 flavor does the following operation:
>    dst = src + dst * nds;
> Wouldn't the 231 flavor be better?

The existing implementation is functionally correct: predication still selects either the old value (dst) or the new value for each lane, and the operand connections are set up appropriately in the IR.
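
To make the two flavors concrete, here is a minimal scalar sketch in Java of merge-masked FMA per lane. It is only an illustration, not the HotSpot code: the class and method names are hypothetical, the operand names (dst, mask, nds, src) are borrowed from the assembler signature quoted above, and merge masking is assumed (lanes with a clear mask bit keep the old dst value).

```java
// Hypothetical scalar model of merge-masked FMA; not part of HotSpot.
final class MaskedFmaSketch {

    // 213 flavor: dst = nds * dst + src, for lanes whose mask bit is set.
    static float[] fma213(float[] dst, boolean[] mask, float[] nds, float[] src) {
        float[] out = dst.clone();              // merge masking: start from the old dst
        for (int i = 0; i < out.length; i++) {
            if (mask[i]) {
                out[i] = Math.fma(nds[i], dst[i], src[i]);
            }
        }
        return out;
    }

    // 231 flavor: dst = nds * src + dst, for lanes whose mask bit is set.
    static float[] fma231(float[] dst, boolean[] mask, float[] nds, float[] src) {
        float[] out = dst.clone();              // merge masking: start from the old dst
        for (int i = 0; i < out.length; i++) {
            if (mask[i]) {
                out[i] = Math.fma(nds[i], src[i], dst[i]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        float[] dst = {2f, 3f};
        boolean[] mask = {true, false};
        float[] nds = {4f, 5f};
        float[] src = {1f, 1f};
        System.out.println(java.util.Arrays.toString(fma213(dst, mask, nds, src)));
        System.out.println(java.util.Arrays.toString(fma231(dst, mask, nds, src)));
    }
}
```

In both flavors the masked-off lane passes the old dst through unchanged; the flavors differ only in which operand role dst plays (multiplicand in 213, addend in 231).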

> src/hotspot/cpu/x86/assembler_x86.cpp line 9000:
> 
>> 8998: }
>> 8999: 
>> 9000: void Assembler::evppermd(XMMRegister dst, KRegister mask, XMMRegister nds, Address src, bool merge, int vector_len) {
> 
> This should be evpermd.

evp (EVEX + vector + packed) is the prefix chosen for all the masked assembler routines.

> src/hotspot/cpu/x86/x86.ad line 8610:
> 
>> 8608:   match(Set dst (AddVL (Binary dst src2) mask));
>> 8609:   match(Set dst (AddVF (Binary dst src2) mask));
>> 8610:   match(Set dst (AddVD (Binary dst src2) mask));
> 
> Could we add the match rules for sub, and, or, xor, mul, div, min, max, rearrange here?
> The ins_encode part is the same for all of these.

We would not save anything by clubbing the rules together; one pattern per operation is easier to review and maintain.
Your suggestion would make a single pattern very bulky.

> src/hotspot/cpu/x86/x86.ad line 8629:
> 
>> 8627:   match(Set dst (AddVL (Binary dst src2) mask));
>> 8628:   match(Set dst (AddVF (Binary dst src2) mask));
>> 8629:   match(Set dst (AddVD (Binary dst src2) mask));
> 
> Could we add the match rules for sub, and, or, xor, mul, div, min, max, rearrange here?
> The ins_encode part is the same for all of these.

Same as above

> src/hotspot/cpu/x86/x86.ad line 8833:
> 
>> 8831:   match(Set dst (LShiftVS (Binary dst src2) mask));
>> 8832:   match(Set dst (LShiftVI (Binary dst src2) mask));
>> 8833:   match(Set dst (LShiftVL (Binary dst src2) mask));
> 
> Could we add the match rules for RShift, URShift here?
> The ins_encode part is the same for all of these.

Same as above

> src/hotspot/cpu/x86/x86.ad line 8850:
> 
>> 8848:   match(Set dst (LShiftVS (Binary dst src2) mask));
>> 8849:   match(Set dst (LShiftVI (Binary dst src2) mask));
>> 8850:   match(Set dst (LShiftVL (Binary dst src2) mask));
> 
> Could we add the match rules for RShift, URShift here?
> The ins_encode part is the same for all of these.

Same as above

-------------

PR: https://git.openjdk.java.net/panama-vector/pull/99