RFR: 8291600: [vectorapi] vector cast op check is not always needed for vector mask cast [v7]

Thu Sep 15 07:49:08 UTC 2022

On Wed, 7 Sep 2022 06:01:22 GMT, Xiaohong Gong <xgong at openjdk.org> wrote:

>> Recently we found the performance of "`FIRST_NONZERO`" for double type is largely worse than the other types on x86 when `UseAVX=2`. The main reason is the "`VectorCastL2X`" op is not supported by the backend when the dst element type is `T_DOUBLE`. This makes the check of `VectorCast` op fail before intrinsifying "`VectorMask.cast()`" which is used in the
>> "`FIRST_NONZERO`" java implementation (see [1]). However, the compiler will not generate the `VectorCast `op for `VectorMask.cast()` if:
>> 
>>  1) the current platform supports the predicated feature
>>  2) the element size (in bytes) of the src and dst type is the same
>> 
>> So the check of "`VectorCast`" op is needless for such cases. To fix it, this patch:
>> 
>>  1) limits the specified vector cast op check to vectors
>>  2) adds the relative mask cast op check for VectorMask.cast()
>>  3) cleans up the unnecessary codes
>> 
>> Here is the performance of "`FIRST_NONZERO`" benchmark [2] on a x86 machine with `UseAVX=2`:
>> 
>> Benchmark                          (size) Mode Cnt Before  After   Units
>> DoubleMaxVector.FIRST_NONZERO       1024  thrpt 15 49.266 2460.886 ops/ms
>> DoubleMaxVector.FIRST_NONZEROMasked 1024  thrpt 15 49.554 1892.223 ops/ms
>> 
>> [1] https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/DoubleVector.java#L770
>> [2] https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/DoubleMaxVector.java#L246
>
> Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains ten commits:
> 
>  - Merge branch 'jdk:master' into JDK-8291600
>  - Address review comments
>  - Add vector cast op check for vector mask for some cases
>  - Revert the unify changes to vector mask cast
>  - Merge branch 'jdk:master' into JDK-8291600
>  - Fix x86 codegen issue
>  - Unify VectorMaskCast for all platforms
>  - Merge branch 'master' into JDK-8291600
>  - 8291600: [vectorapi] vector cast op check is not always needed for vector mask cast

Thanks @XiaohongGong  for looking into this,  patch is showing significant speedups for AVX2 and AVX512 KNL targets.

src/hotspot/share/opto/vectorIntrinsics.cpp line 2486:

> 2484:       ((src_type->isa_vectmask() == NULL && dst_type->isa_vectmask()) ||
> 2485:        (dst_type->isa_vectmask() == NULL && src_type->isa_vectmask()) ||
> 2486:        num_elem_from != num_elem_to)) {

This check is already done on [java side](https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/Double512Vector.java#L643).

src/hotspot/share/opto/vectorIntrinsics.cpp line 2505:

> 2503:     bool no_vec_cast_check = is_mask &&
> 2504:                              ((src_type->isa_vectmask() && dst_type->isa_vectmask()) ||
> 2505:                               type2aelembytes(elem_bt_from) == type2aelembytes(elem_bt_to));

Same check is existing at https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/vectorIntrinsics.cpp#L2551, can we do some factoring.

src/hotspot/share/opto/vectorIntrinsics.cpp line 2554:

> 2552:       op = gvn().transform(new VectorReinterpretNode(op, src_type, resize_type));
> 2553:       op = gvn().transform(VectorCastNode::make(cast_vopc, op, elem_bt_to, num_elem_to));
> 2554:     } else { // num_elem_from == num_elem_to

There is one comment unrelated to this patch but since your patch touches this function may be we can address it.
Call to VectorMaskCastNode::makeCastNode in else block at following line
https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/vectorIntrinsics.cpp#L2555
should be able to handle the true block. 
But, newly created IR node on following location https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/vectornode.cpp#L1761 should be passed through gvn transform before returning.

-------------

PR: https://git.openjdk.org/jdk/pull/9737