[vectorIntrinsics+mask] RFR: 8270349: Initial X86 backend support for optimizing masking operations on AVX512 targets. [v7]

Thu Aug 19 20:03:29 UTC 2021

On Mon, 16 Aug 2021 18:41:25 GMT, Sandhya Viswanathan <sviswanathan at openjdk.org> wrote:

>> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   8270349: Optimizing JIT sequence for alltrue/anytrue and maskAll operations.
>
> src/hotspot/cpu/x86/x86.ad line 1962:
> 
>> 1960:       assert(bt != T_INT  || VM_Version::supports_avx512bw(), "");
>> 1961:       assert(bt != T_LONG || VM_Version::supports_avx512bw(), "");
>> 1962:       if (bt == T_BYTE && VM_Version::supports_avx512dq()) {
> 
> Should this be 
> if (bt == T_BYTE && !VM_Version::supports_avx512dq())

Done.

> src/hotspot/cpu/x86/x86.ad line 3798:
> 
>> 3796:     BasicType elem_bt = vector_element_basic_type(this);
>> 3797:     assert(!is_subword_type(elem_bt), "sanity"); // T_INT, T_LONG, T_FLOAT, T_DOUBLE
>> 3798:     __ kmovwl($ktmp$$KRegister, $mask$$KRegister);
> 
> Why do we need kmovwl here?

Gather/scatter instructions modify the predicate (mask) register as they execute, so the mask is first copied to a temporary and the instruction operates on that copy.
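
To illustrate the point, here is a rough Java model of a masked gather. It is only a sketch, not the HotSpot code: the class GatherMaskSketch and the method gather are hypothetical names, and the mask register is modelled as a plain int. It assumes the AVX-512 behaviour where the mask bit of each lane is cleared once that lane has been loaded, so the mask operand ends up as zero after a completed gather.

```java
// Hypothetical model, for illustration only.
final class GatherMaskSketch {

    // Models a masked gather over dst.length lanes.
    // For every lane whose mask bit is set, the element is loaded and the
    // mask bit is cleared. Returns the final state of the mask operand.
    static int gather(int[] dst, int[] base, int[] idx, int mask) {
        for (int lane = 0; lane < dst.length; lane++) {
            int bit = 1 << lane;
            if ((mask & bit) != 0) {
                dst[lane] = base[idx[lane]]; // load the selected element
                mask &= ~bit;                // lane completed: clear its bit
            }
        }
        return mask;
    }

    public static void main(String[] args) {
        int[] dst  = {-1, -1, -1, -1};
        int[] base = {10, 20, 30, 40};
        int[] idx  = {3, 2, 1, 0};

        int mask = 0b0101;  // original predicate, still needed afterwards
        int ktmp = mask;    // plays the role of the kmovwl copy
        ktmp = gather(dst, base, idx, ktmp);

        System.out.println("ktmp after gather = " + ktmp);
        System.out.println("mask still intact = " + Integer.toBinaryString(mask));
    }
}
```

In this model only the temporary is consumed by the gather, while the original mask value stays available to later users.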

> src/hotspot/cpu/x86/x86.ad line 3838:
> 
>> 3836:     assert(vector_length_in_bytes(this, $src) >= 16, "sanity");
>> 3837:     assert(!is_subword_type(elem_bt), "sanity"); // T_INT, T_LONG, T_FLOAT, T_DOUBLE
>> 3838:     __ kmovwl($ktmp$$KRegister, $mask$$KRegister);
> 
> Why do we need kmovwl here?

Same reason as above: gather/scatter instructions modify the predicate (mask) register as they execute, so the mask is first copied to a temporary.

> src/hotspot/cpu/x86/x86.ad line 9061:
> 
>> 9059:     __ movslq($tmp$$Register, $src$$Register);
>> 9060:     __ kmovql($dst$$KRegister, $tmp$$Register);
>> 9061:     __ kshiftrql($dst$$KRegister, $dst$$KRegister, 64 - vec_len);
> 
> Could we not do kmovdl followed by kshiftrdl here? Thereby removing the need for movslq.

maskAll accepts a boolean argument (false (0), true (-1)).
The value operand has the rRegI register class, which represents a 32-bit register. This value needs to be sign-extended to 64 bits before the final mask is computed with the shift-right operation.
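
The arithmetic can be checked with a small Java sketch. The names MaskAllSketch, maskAll and maskAllZeroExtended are hypothetical and only mirror the three instructions quoted above; vecLen is assumed to be between 1 and 64.

```java
// Hypothetical model of the movslq / kmovql / kshiftrql sequence.
final class MaskAllSketch {

    // src is the 32-bit boolean encoding: 0 for false, -1 for true.
    static long maskAll(int src, int vecLen) {
        long tmp = (long) src;         // movslq: sign-extend 32 -> 64 bits
        return tmp >>> (64 - vecLen);  // kshiftrql: logical shift right
    }

    // Same shift, but with the upper 32 bits zeroed instead of sign-extended.
    static long maskAllZeroExtended(int src, int vecLen) {
        long tmp = src & 0xFFFFFFFFL;  // upper half is zero
        return tmp >>> (64 - vecLen);
    }

    public static void main(String[] args) {
        // true, 8 lanes: low 8 bits set
        System.out.println(Long.toHexString(maskAll(-1, 8)));             // ff
        // false, 8 lanes: empty mask
        System.out.println(Long.toHexString(maskAll(0, 8)));              // 0
        // without sign extension the set bits are shifted out entirely
        System.out.println(Long.toHexString(maskAllZeroExtended(-1, 8))); // 0
    }
}
```

With the 64-bit shift, the sign extension is what fills the upper bits, so that shifting right by 64 - vec_len leaves exactly vec_len set bits for true.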

-------------

PR: https://git.openjdk.java.net/panama-vector/pull/99