RFR: 8291600: [vectorapi] vector cast op check is not always needed for vector mask cast [v4]
Jie Fu
jiefu at openjdk.org
Mon Aug 29 06:47:11 UTC 2022
On Sun, 28 Aug 2022 14:20:20 GMT, Xiaohong Gong <xgong at openjdk.org> wrote:
>> Recently we found that the performance of "`FIRST_NONZERO`" for the double type is much worse than for the other types on x86 when `UseAVX=2`. The main reason is that the "`VectorCastL2X`" op is not supported by the backend when the dst element type is `T_DOUBLE`. This makes the `VectorCast` op check fail before "`VectorMask.cast()`" can be intrinsified, and that method is used in the
>> "`FIRST_NONZERO`" Java implementation (see [1]). However, the compiler does not generate the `VectorCast` op for `VectorMask.cast()` if:
>>
>> 1) the current platform supports the predicated feature
>> 2) the element size (in bytes) of the src and dst type is the same
>>
>> So the `VectorCast` op check is unnecessary in such cases. To fix it, this patch:
>>
>> 1) limits the `VectorCast` op check to vector (non-mask) casts
>> 2) adds the corresponding mask cast op check for `VectorMask.cast()`
>> 3) cleans up unnecessary code
>>
>> Here is the performance of the "`FIRST_NONZERO`" benchmark [2] on an x86 machine with `UseAVX=2`:
>>
>> Benchmark                            (size)   Mode  Cnt  Before     After   Units
>> DoubleMaxVector.FIRST_NONZERO          1024  thrpt   15  49.266  2460.886  ops/ms
>> DoubleMaxVector.FIRST_NONZEROMasked    1024  thrpt   15  49.554  1892.223  ops/ms
>>
>> [1] https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/DoubleVector.java#L770
>> [2] https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/DoubleMaxVector.java#L246
>
> Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision:
>
> - Revert the unify changes to vector mask cast
> - Merge branch 'jdk:master' into JDK-8291600
> - Fix x86 codegen issue
> - Unify VectorMaskCast for all platforms
> - Merge branch 'master' into JDK-8291600
> - 8291600: [vectorapi] vector cast op check is not always needed for vector mask cast
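The rule in the quoted PR description, that `VectorMask.cast()` needs no `VectorCast` op under the listed conditions, can be illustrated with a small compile-time model. This is a hypothetical sketch, not HotSpot code: `ElemType`, `elem_bytes()` and `mask_cast_skips_vector_cast()` are made-up names (the first two loosely play the roles of HotSpot's `BasicType` and `type2aelembytes()`). It also assumes the two listed conditions are alternatives, i.e. either one suffices; that reading fits the speedup reported with `UseAVX=2`, where the x86 backend does not use predicate (opmask) registers.

```cpp
// Hypothetical model of the mask-cast rule from the PR description; not HotSpot code.
// ElemType and elem_bytes() loosely stand in for HotSpot's BasicType and
// type2aelembytes(). Assumption: the two listed conditions are alternatives.
enum class ElemType { Byte, Short, Int, Long, Float, Double };

// Element size in bytes for each Vector API lane type.
constexpr int elem_bytes(ElemType t) {
  switch (t) {
    case ElemType::Byte:   return 1;
    case ElemType::Short:  return 2;
    case ElemType::Int:
    case ElemType::Float:  return 4;
    case ElemType::Long:
    case ElemType::Double: return 8;
  }
  return 0;  // not reached for valid enumerators
}

// True when VectorMask.cast() from `from` to `to` is assumed not to produce a
// VectorCast node, so checking backend support for VectorCast is unnecessary.
constexpr bool mask_cast_skips_vector_cast(bool supports_predicate,
                                           ElemType from, ElemType to) {
  return supports_predicate || elem_bytes(from) == elem_bytes(to);
}

// Long -> double mask cast without predicates (e.g. UseAVX=2): 8 == 8 bytes.
static_assert(mask_cast_skips_vector_cast(false, ElemType::Long, ElemType::Double),
              "same element size needs no VectorCast check");
// Int -> double mask cast without predicates: 4 != 8, so the check still applies.
static_assert(!mask_cast_skips_vector_cast(false, ElemType::Int, ElemType::Double),
              "different element sizes keep the VectorCast check");
```

Under this model, the long-to-double mask cast used by the double `FIRST_NONZERO` path skips the `VectorCast` support check even without predicates, because both element types are 8 bytes wide.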
Thanks for the update.
May I ask whether we could do it like this?
diff --git a/src/hotspot/share/opto/vectorIntrinsics.cpp b/src/hotspot/share/opto/vectorIntrinsics.cpp
index 66bacf0..c4e807a 100644
--- a/src/hotspot/share/opto/vectorIntrinsics.cpp
+++ b/src/hotspot/share/opto/vectorIntrinsics.cpp
@@ -2494,8 +2494,9 @@ bool LibraryCallKit::inline_vector_convert() {
new_elem_bt_from = elem_bt_from == T_FLOAT ? T_INT : T_LONG;
}
int cast_vopc = VectorCastNode::opcode(new_elem_bt_from, !is_ucast);
+ bool no_vec_cast_check = is_mask && (type2aelembytes(elem_bt_from) == type2aelembytes(elem_bt_to));
// Make sure that cast is implemented to particular type/size combination.
- if (!arch_supports_vector(cast_vopc, num_elem_to, elem_bt_to, VecMaskNotUsed)) {
+ if (!no_vec_cast_check && !arch_supports_vector(cast_vopc, num_elem_to, elem_bt_to, VecMaskNotUsed)) {
if (C->print_intrinsics()) {
tty->print_cr(" ** not supported: arity=1 op=cast#%d/3 vlen2=%d etype2=%s ismask=%d",
cast_vopc,
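The gate that the suggested diff adds can be modelled in isolation. In this hypothetical sketch (not HotSpot code), `passes_cast_check()` is a made-up name, `from_bytes`/`to_bytes` stand for `type2aelembytes(elem_bt_from)`/`type2aelembytes(elem_bt_to)`, and the `backend_supports_cast` flag stands for the result of `arch_supports_vector(cast_vopc, num_elem_to, elem_bt_to, VecMaskNotUsed)`.

```cpp
// Hypothetical, self-contained model of the gate in the suggested diff; not HotSpot code.
// from_bytes/to_bytes stand for type2aelembytes(elem_bt_from)/type2aelembytes(elem_bt_to);
// backend_supports_cast stands for the result of
// arch_supports_vector(cast_vopc, num_elem_to, elem_bt_to, VecMaskNotUsed).
constexpr bool passes_cast_check(bool is_mask, int from_bytes, int to_bytes,
                                 bool backend_supports_cast) {
  // Mirrors: bool no_vec_cast_check = is_mask && (size_from == size_to);
  const bool no_vec_cast_check = is_mask && from_bytes == to_bytes;
  // Mirrors: if (!no_vec_cast_check && !arch_supports_vector(...)) -> bail out.
  return no_vec_cast_check || backend_supports_cast;
}

// Long -> double mask cast (8 -> 8 bytes) passes this check even though the
// backend lacks VectorCastL2X for a T_DOUBLE destination.
static_assert(passes_cast_check(true, 8, 8, false), "mask, same size");
// A plain vector cast with the same sizes still requires backend support.
static_assert(!passes_cast_check(false, 8, 8, false), "vector cast needs support");
```

Unlike the rule stated in the PR description, this gate keys only on `is_mask` and element size; it never consults predicate support.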
-------------
PR: https://git.openjdk.org/jdk/pull/9737
More information about the hotspot-compiler-dev mailing list