RFR: 8292898: [vectorapi] Unify vector mask cast operation [v8]
Quan Anh Mai
qamai at openjdk.org
Mon Oct 10 08:20:58 UTC 2022
On Mon, 10 Oct 2022 04:14:57 GMT, Xiaohong Gong <xgong at openjdk.org> wrote:
>> The current implementation of the vector mask cast operation is
>> so complex that the compiler generates different patterns for different
>> scenarios. For architectures that do not support the predicate
>> feature, a vector mask is represented the same way as a normal vector.
>> So the vector mask cast is implemented by the `VectorCast` node. But this
>> is not always needed. When two masks have the same element size (e.g.
>> int vs. float), their bit layouts are the same. So casting between
>> them does not need to emit any instructions.
>>
>> Currently the compiler generates different patterns based on the
>> vector type of the input/output and the platform. Normally the
>> "`VectorMaskCast`" op is only used for cases that don't emit any
>> instructions, and the "`VectorCast`" op is used to implement the necessary
>> expand/narrow operations. This avoids adding some duplicate rules
>> in the backend. However, it also has drawbacks:
>>
>> 1) The code is complex, especially when the compiler needs to
>> check whether the hardware supports the necessary IRs for the
>> vector mask cast. It needs to check different patterns for
>> different cases.
>> 2) The vector mask cast operation could be implemented with cheaper
>> instructions than the vector cast on some architectures.
>>
>> Instead of generating `VectorCast` or `VectorMaskCast` nodes for different
>> cases of vector mask cast operations, this patch unifies the vector
>> mask cast implementation with the "`VectorMaskCast`" node for all vector types
>> and platforms. The missing backend rules are also added for it.
>>
>> This patch also simplifies the vector mask conversion that happens in
>> "`VectorUnbox::Ideal()`". Normally "`VectorUnbox (VectorBox vmask)`" can
>> be optimized to "`vmask`" if the unboxing type matches the boxed
>> "`vmask`" type. Otherwise, a type conversion is needed. Currently the
>> "`VectorUnbox`" is transformed to one of two different patterns to implement
>> the conversion:
>>
>> 1) If the element size is not changed, it is transformed to:
>>
>> "VectorMaskCast vmask"
>>
>> 2) Otherwise, it is transformed to:
>>
>> "VectorLoadMask (VectorStoreMask vmask)"
>>
>> It first converts the "`vmask`" to a boolean vector with "`VectorStoreMask`",
>> and then uses "`VectorLoadMask`" to convert the boolean vector to the
>> dst mask vector. Since this patch makes the "`VectorMaskCast`" op supported
>> for all types on all platforms, the "`VectorLoadMask`" and
>> "`VectorStoreMask`" are no longer needed for the conversion. The existing transformation:
>>
>> VectorUnbox (VectorBox vmask) => VectorLoadMask (VectorStoreMask vmask)
>>
>> can be simplified to:
>>
>> VectorUnbox (VectorBox vmask) => VectorMaskCast vmask
>
> Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 17 commits:
>
> - Use "setDefaultWarmup" instead of adding the annotation for each test
> - Merge branch 'jdk:master' into JDK-8292898
> - Change to use "avx512vl" cpu feature for some IR tests
> - Add the IR test and fix review comments on x86 backend
> - Remove untaken code paths on x86 match rules
> - Add assertion to the elem num for mast cast
> - Merge branch 'jdk:master' into JDK-8292898
> - 8292898: [vectorapi] Unify vector mask cast operation
> - Merge branch 'jdk:master' into JDK-8291600
> - Address review comments
> - ... and 7 more: https://git.openjdk.org/jdk/compare/8713dfa6...3845f926
Actually, I also encountered intrinsification failures while working on [JDK-8259610](https://bugs.openjdk.java.net/browse/JDK-8259610) when the warmup iteration count was set too low (`INVOCATIONS` is set to 10000 in those tests). I don't know the exact cause; probably some information fails to propagate through inlining. This can be seen frequently with `-XX:+PrintIntrinsics`, although the compiler eventually manages to get the required constant information. As a result, I think setting the warmup iterations to 10000 is fine here. Thanks.
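
As an aside on the quoted description, here is a minimal plain-Java sketch of why a mask cast between same-sized element types needs no instructions on non-predicate hardware, while a cast between different element sizes needs an expand or narrow. It is only a model, not HotSpot code and not the Vector API: the class `MaskCastSketch` and its methods are hypothetical names, and it assumes a mask lane is stored as all ones when set and all zeros when unset.

```java
import java.util.Arrays;

// Hypothetical model of vector masks on hardware without predicate registers.
// Assumption: a set lane is all ones (-1), an unset lane is 0.
final class MaskCastSketch {

    // Build a mask with 32-bit lanes (the layout shared by int and float masks).
    static int[] intMask(boolean... bits) {
        int[] lanes = new int[bits.length];
        for (int i = 0; i < bits.length; i++) {
            lanes[i] = bits[i] ? -1 : 0;
        }
        return lanes;
    }

    // int mask -> float mask: same lane size, so the same bits are reused
    // unchanged. This models the "no instructions emitted" case.
    static int[] castSameSize(int[] lanes) {
        return lanes;
    }

    // int mask -> long mask: every lane is expanded from 32 to 64 bits.
    // Sign extension keeps set lanes all ones.
    static long[] castWiden(int[] lanes) {
        long[] out = new long[lanes.length];
        for (int i = 0; i < lanes.length; i++) {
            out[i] = lanes[i];
        }
        return out;
    }

    // long mask -> int mask: every lane is narrowed from 64 to 32 bits.
    static int[] castNarrow(long[] lanes) {
        int[] out = new int[lanes.length];
        for (int i = 0; i < lanes.length; i++) {
            out[i] = (int) lanes[i];
        }
        return out;
    }

    public static void main(String[] args) {
        int[] m = intMask(true, false, true, true);
        System.out.println(Arrays.toString(castSameSize(m)));
        System.out.println(Arrays.toString(castWiden(m)));
        System.out.println(Arrays.toString(castNarrow(castWiden(m))));
    }
}
```

In this model the same-size cast returns its input untouched, whereas the widening and narrowing casts have to touch every lane, which is the work the backend rules for the mask cast need to cover.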
-------------
PR: https://git.openjdk.org/jdk/pull/10192