RFR: 8292898: [vectorapi] Unify vector mask cast operation [v4]
Jatin Bhateja
jbhateja at openjdk.org
Mon Sep 26 20:23:29 UTC 2022
On Mon, 19 Sep 2022 03:04:57 GMT, Xiaohong Gong <xgong at openjdk.org> wrote:
>> The current implementation of the vector mask cast operation is
>> complex, in that the compiler generates different patterns for
>> different scenarios. For architectures that do not support the
>> predicate feature, a vector mask is represented the same as a normal
>> vector, so the vector mask cast is implemented with a `VectorCast`
>> node. But this is not always needed: when two masks have the same
>> element size (e.g. int vs. float), their bit layouts are the same,
>> so casting between them does not need to emit any instructions.
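>>
>> For example, here is a tiny standalone C++ sketch of that observation
>> (not HotSpot code; it only shows that the lane bits are identical, so
>> no conversion instruction is required):
>>
>> ```c++
>> #include <cassert>
>> #include <cstdint>
>> #include <cstring>
>>
>> int main() {
>>   // Without predicate registers, a mask lane is all-ones (true) or
>>   // all-zeros (false). An int mask lane and a float mask lane of the
>>   // same size therefore share the exact same bit layout.
>>   uint32_t int_mask_lane = 0xFFFFFFFFu; // a "true" lane of an int mask
>>   float float_mask_lane;                // the same lane viewed as float
>>   std::memcpy(&float_mask_lane, &int_mask_lane, sizeof(int_mask_lane));
>>
>>   uint32_t round_trip;
>>   std::memcpy(&round_trip, &float_mask_lane, sizeof(float_mask_lane));
>>   assert(round_trip == int_mask_lane);  // same bits, zero-cost cast
>>   return 0;
>> }
>> ```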
>>
>> Currently the compiler generates different patterns based on the
>> vector type of the input/output and the platform. Normally the
>> "`VectorMaskCast`" op is only used for cases that don't emit any
>> instructions, and the "`VectorCast`" op is used to implement the
>> necessary expand/narrow operations. This avoids adding duplicate
>> rules in the backend. However, it also has drawbacks:
>>
>> 1) The code is complex, especially when the compiler needs to
>> check whether the hardware supports the necessary IRs for the
>> vector mask cast. It has to check different patterns for
>> different cases.
>> 2) On some architectures, the vector mask cast operation can be
>> implemented with cheaper instructions than a full vector cast.
>>
>> Instead of generating `VectorCast` or `VectorMaskCast` nodes for the
>> different cases of the vector mask cast operation, this patch unifies
>> the implementation on the "`VectorMaskCast`" node for all vector types
>> and platforms, and adds the missing backend rules for it.
>>
>> This patch also simplifies the vector mask conversion performed in
>> "`VectorUnbox::Ideal()`". Normally "`VectorUnbox (VectorBox vmask)`" can
>> be optimized to "`vmask`" if the unboxing type matches the boxed
>> "`vmask`" type. Otherwise, a type conversion is needed. Currently
>> "`VectorUnbox`" is transformed into one of two patterns to implement
>> the conversion:
>>
>> 1) If the element size is not changed, it is transformed to:
>>
>> "VectorMaskCast vmask"
>>
>> 2) Otherwise, it is transformed to:
>>
>> "VectorLoadMask (VectorStoreMask vmask)"
>>
>> This first converts "`vmask`" to a boolean vector with "`VectorStoreMask`",
>> and then uses "`VectorLoadMask`" to convert the boolean vector to the
>> dst mask vector. Since this patch makes the "`VectorMaskCast`" op supported
>> for all types on all platforms, "`VectorLoadMask`" and "`VectorStoreMask`"
>> are no longer needed for the conversion. The existing transformation:
>>
>> VectorUnbox (VectorBox vmask) => VectorLoadMask (VectorStoreMask vmask)
>>
>> can be simplified to:
>>
>> VectorUnbox (VectorBox vmask) => VectorMaskCast vmask
>
> Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision:
>
> Add assertion to the elem num for mask cast
Common IR changes look good to me; a few more comments below on the x86 backend implementation.
src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4626:
> 4624: switch (dst_bt_size / src_bt_size) {
> 4625: case 2: {
> 4626: if (vlen_enc == AVX_512bit && !VM_Version::supports_avx512bw()) {
The ideal type for a valid mask cast involving 512-bit species should be `TypeVectMask`, so this code may never be executed, since the mask will be propagated using an opmask register.
src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4642:
> 4640: case 2: {
> 4641: if (vlen_enc == AVX_512bit) {
> 4642: if (VM_Version::supports_avx512bw()) {
Please elaborate on which mask cast src/dst species you expect this code to be executed for. AVX-512 masks are propagated using opmask registers.
src/hotspot/cpu/x86/x86.ad line 8412:
> 8410:
> 8411: instruct vmaskcast_avx(vec dst, vec src) %{
> 8412: predicate(Matcher::vector_length(n) == Matcher::vector_length(n->in(1)) &&
Redundant length check, since we only create a `VectorMaskCast` node if the source and destination have the same lane count.
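In other words, the predicate could start directly at the byte-length check, e.g. (a sketch; the remaining clauses stay exactly as quoted above):

```c++
predicate(Matcher::vector_length_in_bytes(n) != Matcher::vector_length_in_bytes(n->in(1)) &&
          (UseAVX >= 2 || /* remaining checks unchanged */ ...));
```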
src/hotspot/cpu/x86/x86.ad line 8414:
> 8412: predicate(Matcher::vector_length(n) == Matcher::vector_length(n->in(1)) &&
> 8413: Matcher::vector_length_in_bytes(n) != Matcher::vector_length_in_bytes(n->in(1)) &&
> 8414: (UseAVX >= 2 ||
The pattern looks relevant only for UseAVX == 2.
src/hotspot/cpu/x86/x86.ad line 8429:
> 8427:
> 8428: instruct vmaskcast_avx1_32B_expand(vec dst, vec src, vec xtmp1, vec xtmp2) %{
> 8429: predicate(UseAVX == 1 && Matcher::vector_length_in_bytes(n) == 32 &&
The limitation related to `VectorLoadMask` (https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L1852) will prevent this pattern from being exercised.
src/hotspot/cpu/x86/x86.ad line 8439:
> 8437: BasicType src_bt = Matcher::vector_element_basic_type(this, $src);
> 8438: BasicType dst_bt = Matcher::vector_element_basic_type(this);
> 8439: __ vector_mask_cast($dst$$XMMRegister, $src$$XMMRegister, $xtmp1$$XMMRegister,
Renaming the macro-assembler routine to something like `vector_mask_cast32B` would make the intent clearer; the same goes for the shrink-cast pattern below.
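For example (hypothetical name; the argument list stays as in the quoted code):

```c++
__ vector_mask_cast32B($dst$$XMMRegister, $src$$XMMRegister, $xtmp1$$XMMRegister,
                       /* remaining arguments unchanged */ ...);
```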
src/hotspot/cpu/x86/x86.ad line 8446:
> 8444:
> 8445: instruct vmaskcast_avx1_32B_shrink(vec dst, vec src, vec xtmp) %{
> 8446: predicate(UseAVX == 1 && Matcher::vector_length_in_bytes(n->in(1)) == 32 &&
The limitation related to `VectorLoadMask` (https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L1852) will prevent this pattern from being exercised.
src/hotspot/share/opto/vectorIntrinsics.cpp line 2547:
> 2545: if (is_mask) {
> 2546: // Make sure that cast for vector mask is implemented to particular type/size combination.
> 2547: if (!arch_supports_vector(Op_VectorMaskCast, num_elem_to, elem_bt_to, VecMaskNotUsed)) {
Mask casting does involve loading masks, hence we should pass `VecMaskUseLoad` instead of `VecMaskNotUsed`. I see that we are already querying target support for the load/store mask nodes at line 2439, but the use of `VecMaskUseAll` there looks stricter than needed.
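I.e., a sketch of the suggested change to the quoted line:

```c++
if (!arch_supports_vector(Op_VectorMaskCast, num_elem_to, elem_bt_to, VecMaskUseLoad)) {
```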
-------------
PR: https://git.openjdk.org/jdk/pull/10192