[vectorIntrinsics+mask] RFR: 8270349: Initial X86 backend support for optimizing masking operations on AVX512 targets. [v4]

Mon Aug 16 18:01:52 UTC 2021

On Sat, 14 Aug 2021 21:22:57 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> src/hotspot/cpu/x86/x86.ad line 8996:
>> 
>>> 8994:     int opc = this->ideal_Opcode();
>>> 8995:     __ evmasked_op(opc, bt, $mask$$KRegister, $dst$$XMMRegister,
>>> 8996:                    $dst$$XMMRegister, $src2$$XMMRegister, false, vlen_enc);
>> 
>> Since merge masking is false here, dst and src1 could be separate registers.
>> There is another flavor of rearrange with a second vector, e.g.:
>>  IntVector rearrange(VectorShuffle<Integer> s, Vector<Integer> v);
>> That flavor could use rearrange with merge masking set to true.
>> I don't see a rule for that. Do you plan to add that later?
>
> All these instructions are in two-address format, so src1 is copied to dst first. The second flavor of rearrange does not accept a mask argument; this pattern specifically handles the masked case.

Could you please clarify further? For example, vpermd is a three-address instruction:
 VPERMD zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst :Permute doublewords in zmm3/m512/m32bcst using indices in zmm2 and store the result in zmm1 using writemask k1.
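For illustration, here is a scalar Java model of the masked VPERMD semantics quoted above. The class and method names (`VpermdModel`, `vpermd`) are hypothetical and this is not HotSpot code; it assumes a power-of-two lane count and models only the register form. With zero masking the result never reads the old dst, so dst can be allocated independently of both the index and the data register.

```java
import java.util.Arrays;

// Hypothetical scalar model of AVX-512 VPERMD with a writemask; not HotSpot code.
final class VpermdModel {
    private VpermdModel() {}

    /**
     * dst:   previous destination contents (only read when merge == true)
     * idx:   permute indices (zmm2 in the Intel description)
     * src:   data to permute (zmm3 in the Intel description)
     * mask:  writemask k1, one entry per lane
     * merge: true for merge masking, false for zero masking ({z})
     * Assumes the lane count is a power of two (8 for ymm, 16 for zmm).
     */
    static int[] vpermd(int[] dst, int[] idx, int[] src, boolean[] mask, boolean merge) {
        int n = src.length;
        int[] out = new int[n];
        for (int j = 0; j < n; j++) {
            if (mask[j]) {
                // Only the low log2(n) bits of each index select a source lane.
                out[j] = src[idx[j] & (n - 1)];
            } else {
                // Masked-off lane: keep the old dst value (merge) or write zero ({z}).
                out[j] = merge ? dst[j] : 0;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] dst = {100, 101, 102, 103, 104, 105, 106, 107};
        int[] idx = {7, 6, 5, 4, 3, 2, 1, 0};
        int[] src = {0, 10, 20, 30, 40, 50, 60, 70};
        boolean[] k = {true, false, true, false, true, false, true, false};
        System.out.println(Arrays.toString(vpermd(dst, idx, src, k, true)));  // [70, 101, 50, 103, 30, 105, 10, 107]
        System.out.println(Arrays.toString(vpermd(dst, idx, src, k, false))); // [70, 0, 50, 0, 30, 0, 10, 0]
    }
}
```

Only the merge-masking variant depends on the previous dst contents, and even then dst does not have to alias either source.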

>> src/hotspot/cpu/x86/x86.ad line 9021:
>> 
>>> 9019:   match(Set dst (AbsVL dst mask));
>>> 9020:   format %{ "vabs_masked $dst, $mask \t! vabs masked operation" %}
>>> 9021:   ins_cost(100);
>> 
>> It is not clear why ins_cost is required for matching.
>
> To give this pattern preference over the Set dst (AbsVI src) pattern.

The patterns are different: one has a mask and the other doesn't, so ins_cost should not be needed to choose between them.
match(Set dst (AbsVI dst mask));
match(Set dst (AbsVI src));

>> src/hotspot/cpu/x86/x86.ad line 9057:
>> 
>>> 9055:     int opc = this->ideal_Opcode();
>>> 9056:     __ evmasked_op(opc, bt, $mask$$KRegister, $dst$$XMMRegister,
>>> 9057:                    $src2$$XMMRegister, $src3$$Address, true, vlen_enc);
>> 
>> This and the previous instruct should translate to fma231.
>
> Same as above.

In the non-masked version, fma231 is used. Please double-check:
FMA operation is dst = dst + src2 * src3, where dst is the accumulator.
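To make the operand-order difference concrete, here is a scalar Java sketch contrasting the 231 form (dst is the accumulator) with the 213 form (dst is a multiplicand) under merge masking. The names `MaskedFmaModel`, `fma231` and `fma213` are hypothetical; the sketch assumes merge masking keeps the old dst in masked-off lanes, and it does not check how the FmaV node's inputs are bound to dst/src2/src3 in the .ad rule.

```java
import java.util.Arrays;

// Hypothetical scalar model of masked FMA operand orders; not HotSpot code.
final class MaskedFmaModel {
    private MaskedFmaModel() {}

    // 231 form: dst = src2 * src3 + dst (dst is the accumulator).
    // Masked-off lanes keep the old dst value (merge masking).
    static float[] fma231(float[] dst, float[] src2, float[] src3, boolean[] mask) {
        float[] out = dst.clone();
        for (int j = 0; j < out.length; j++) {
            if (mask[j]) {
                out[j] = Math.fma(src2[j], src3[j], dst[j]);
            }
        }
        return out;
    }

    // 213 form, for contrast: dst = src2 * dst + src3 (dst is a multiplicand).
    static float[] fma213(float[] dst, float[] src2, float[] src3, boolean[] mask) {
        float[] out = dst.clone();
        for (int j = 0; j < out.length; j++) {
            if (mask[j]) {
                out[j] = Math.fma(src2[j], dst[j], src3[j]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        float[] dst = {1f, 2f, 3f, 4f};
        float[] src2 = {2f, 2f, 2f, 2f};
        float[] src3 = {10f, 20f, 30f, 40f};
        boolean[] k = {true, true, false, true};
        System.out.println(Arrays.toString(fma231(dst, src2, src3, k))); // [21.0, 42.0, 3.0, 84.0]
        System.out.println(Arrays.toString(fma213(dst, src2, src3, k))); // [12.0, 24.0, 3.0, 48.0]
    }
}
```

Because merge masking preserves dst in inactive lanes, the chosen form also decides which operand survives there: the accumulator for 231, a multiplicand for 213.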

-------------

PR: https://git.openjdk.java.net/panama-vector/pull/99