RFR: 8292898: [vectorapi] Unify vector mask cast operation [v4]

Fri Sep 23 05:32:25 UTC 2022

On Mon, 19 Sep 2022 03:04:57 GMT, Xiaohong Gong <xgong at openjdk.org> wrote:

>> The current implementation of the vector mask cast operation is
>> complex: the compiler generates different patterns for different
>> scenarios. For architectures that do not support the predicate
>> feature, a vector mask is represented the same way as a normal vector,
>> so the vector mask cast is implemented by a `VectorCast` node. But this
>> is not always needed. When two masks have the same element size (e.g.
>> int vs. float), their bit layouts are the same, so casting between
>> them does not need to emit any instructions.
>> 
>> Currently the compiler generates different patterns based on the
>> vector type of the input/output and the platform. Normally the
>> "`VectorMaskCast`" op is only used for cases that don't emit any
>> instructions, and the "`VectorCast`" op is used to implement the necessary
>> expand/narrow operations. This avoids adding some duplicate rules
>> in the backend. However, it also has drawbacks:
>> 
>>  1) The code is complex, especially when the compiler needs to
>>     check whether the hardware supports the necessary IR nodes for the
>>     vector mask cast. It needs to check different patterns for
>>     different cases.
>>  2) The vector mask cast operation could be implemented with cheaper
>>     instructions than the vector casting on some architectures.
>> 
>> Instead of generating `VectorCast` or `VectorMaskCast` nodes for different
>> cases of vector mask cast operations, this patch unifies the vector
>> mask cast implementation with the "`VectorMaskCast`" node for all vector types
>> and platforms. The missing backend rules are also added for it.
>> 
>> This patch also simplifies the vector mask conversion that happens in
>> "`VectorUnbox::Ideal()`". Normally "`VectorUnbox (VectorBox vmask)`" can
>> be optimized to "`vmask`" if the unboxing type matches the boxed
>> "`vmask`" type. Otherwise, it needs a type conversion. Currently
>> "`VectorUnbox`" is transformed to one of two patterns to implement
>> the conversion:
>> 
>>  1) If the element size is not changed, it is transformed to:
>> 
>>     "VectorMaskCast vmask"
>> 
>>  2) Otherwise, it is transformed to:
>> 
>>     "VectorLoadMask (VectorStoreMask vmask)"
>> 
>> It first converts the "`vmask`" to a boolean vector with "`VectorStoreMask`",
>> and then uses "`VectorLoadMask`" to convert the boolean vector to the
>> dst mask vector. Since this patch makes the "`VectorMaskCast`" op supported
>> for all types on all platforms, "`VectorLoadMask`" and
>> "`VectorStoreMask`" are no longer needed for the conversion. The existing transformation:
>> 
>>   VectorUnbox (VectorBox vmask) => VectorLoadMask (VectorStoreMask vmask)
>> 
>> can be simplified to:
>> 
>>   VectorUnbox (VectorBox vmask) => VectorMaskCast vmask
>
> Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Add assertion to the elem num for mast cast

Some initial comments.

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4643:

> 4641:         if (vlen_enc == AVX_512bit) {
> 4642:           if (VM_Version::supports_avx512bw()) {
> 4643:             evpmovwb(dst, src, vlen_enc);

Same problem as above: with src = long and dst = int, this is a narrowing conversion (src/dst = 2).
The expectation is to convert each quadword to a doubleword, not each word to a byte.
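
To make the expected lane-wise semantics concrete, here is a minimal scalar sketch of the long -> int mask narrowing. It is only a model of the intended result, not HotSpot code: the helper name `narrow_mask_lanes_q2d` is hypothetical, and the assumption that every mask lane holds exactly 0 or -1 is checked with an assert.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical scalar model (not HotSpot code): narrow a mask whose lanes are
// 64-bit quadwords into a mask whose lanes are 32-bit doublewords.
// Assumption: each mask lane holds either 0 (unset) or -1 (set).
std::vector<int32_t> narrow_mask_lanes_q2d(const std::vector<int64_t>& src) {
  std::vector<int32_t> dst;
  dst.reserve(src.size());
  for (int64_t lane : src) {
    assert(lane == 0 || lane == -1);          // mask lanes are all-zeros or all-ones
    dst.push_back(static_cast<int32_t>(lane)); // one quadword in, one doubleword out
  }
  return dst;
}
```

The lane count is preserved and only the lane width is halved, which is the quadword-to-doubleword behaviour described above.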

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4650:

> 4648:           if (VM_Version::supports_avx512bw()) {
> 4649:             evpmovwb(dst, src, vlen_enc);
> 4650:           } else if (dst_bt != T_BYTE) {

Line #4649 will be dead code, since BW support is already checked above at line #4642.

src/hotspot/share/opto/vectorIntrinsics.cpp line 2559:

> 2557:           op = gvn().transform(new VectorMaskCastNode(op, dst_type));
> 2558:         } else {
> 2559:           op = VectorMaskCastNode::makeCastNode(&gvn(), op, dst_type);

Masks are either predicate registers or vectors. In the former case the cast is a no-op; in the latter case mask casting is similar to vector casting, where each vector lane holds either a 0 or -1 value, and we leverage the backend implementation of the existing VectorCastX2Y IR nodes. Unification here is, in a way, duplicating the backend implementation, at least for X86.
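
As a rough illustration of the two representations, the scalar sketch below models both cases. The function names are hypothetical and nothing here is HotSpot code. It assumes a predicate mask can be modelled as one bit per lane in a 64-bit integer, and that a vector mask lane holds exactly 0 or -1.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical model of a predicate-register mask: one bit per lane.
// Casting between element types keeps the same bits, so the cast is a no-op.
uint64_t cast_predicate_mask(uint64_t lane_bits) {
  return lane_bits;
}

// Hypothetical model of a vector mask: each lane holds 0 or -1.
// Widening int -> long is an ordinary sign-extending lane cast, which maps
// 0 to 0 and -1 to -1, the same result a vector cast would produce.
std::vector<int64_t> widen_mask_lanes_d2q(const std::vector<int32_t>& src) {
  std::vector<int64_t> dst;
  dst.reserve(src.size());
  for (int32_t lane : src) {
    dst.push_back(static_cast<int64_t>(lane)); // sign extension preserves 0 / -1
  }
  return dst;
}
```

Under these assumptions, the vector-mask case needs nothing beyond what a lane-wise vector cast already does, which is the overlap noted above.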

-------------

PR: https://git.openjdk.org/jdk/pull/10192