Withdrawn: 8266720: Wrong implementation in LibraryCallKit::inline_vector_shuffle_iota
Wang Huang
whuang at openjdk.java.net
Thu May 13 01:50:01 UTC 2021
On Sat, 8 May 2021 02:30:06 GMT, Wang Huang <whuang at openjdk.org> wrote:
> Dear All,
> Here is the patch of JDK-8266720. Could you do me a favor to review this?
> * Reproduce:
> * cherry-pick JDK-8265956
> * run patch's `TestVectorShuffleIotaByteWrongImpl.java`
> * However, this wrong of this code is obvious.
> * Reason :
> 1. In interpreter:
>
> static int partiallyWrapIndex(int index, int laneCount) {
> return checkIndex0(index, laneCount, (byte)-1);
> }
>
> @ForceInline
> static int checkIndex0(int index, int laneCount, byte mode) {
> int wrapped = VectorIntrinsics.wrapToRange(index, laneCount);
> if (mode == 0 || wrapped == index) { // NOTE here
> return wrapped;
> }
> if (mode < 0) {
> return wrapped - laneCount; // special mode for internal storage
> }
> throw checkIndexFailed(index, laneCount);
> }
>
> @ForceInline
> static int wrapToRange(int index, int size) {
> if ((size & (size - 1)) == 0) {
> // Size is zero or a power of two, so we got this.
> return index & (size - 1);
> } else {
> return wrapToRangeNPOT(index, size);
> }
> }
>
> 2. However, we have this intrinsics in
> src/hotspot/share/opto/vectorIntrinsics.cpp [jdk/jdk]
> ```c++
> 386 } else {
> 387 ConINode* pred_node = (ConINode*)gvn().makecon(TypeInt::make(1)); // BoolTest::gt here
> 388 Node * lane_cnt = gvn().makecon(TypeInt::make(num_elem));
> 389 Node * bcast_lane_cnt = gvn().transform(VectorNode::scalar2vector(lane_cnt, num_elem, type_bt));
> // here BoolTest::ge != 1 (which means BoolTest::gt)
> 390 Node* mask = gvn().transform(new VectorMaskCmpNode(BoolTest::ge, bcast_lane_cnt, res, pred_node, vt));
>
> 3. In `aarch64` neon backend, we use `BoolTest::ge` for generated code:
> ```c++
> // cond is useless here
> instruct vcmge8B(vecD dst, vecD src1, vecD src2, immI cond)
> %{
> predicate(n->as_Vector()->length() == 8 &&
> n->as_VectorMaskCmp()->get_predicate() == BoolTest::ge &&
> n->in(1)->in(1)->bottom_type()->is_vect()->element_basic_type() == T_BYTE);
> match(Set dst (VectorMaskCmp (Binary src1 src2) cond));
> format %{ "cmge $dst, T8B, $src1, $src2\t# vector cmp (8B)" %}
> ins_cost(INSN_COST);
> ins_encode %{
> __ cmge(as_FloatRegister($dst$$reg), __ T8B,
> as_FloatRegister($src1$$reg), as_FloatRegister($src2$$reg));
> %}
> ins_pipe(vdop64);
> %}
>
>
> However, we use cond (=1 or BoolTest::gt). So X86 is **right** on jdk/jdk
> ```c++
> instruct vcmp(legVec dst, legVec src1, legVec src2, immI8 cond, rRegP scratch) %{
> predicate(vector_length_in_bytes(n->in(1)->in(1)) >= 8 && // src1
> vector_length_in_bytes(n->in(1)->in(1)) <= 32 && // src1
> is_integral_type(vector_element_basic_type(n->in(1)->in(1)))); // src1
> match(Set dst (VectorMaskCmp (Binary src1 src2) cond));
> effect(TEMP scratch);
> format %{ "vector_compare $dst,$src1,$src2,$cond\t! using $scratch as TEMP" %}
> ins_encode %{
> int vlen_enc = vector_length_encoding(this, $src1);
> Assembler::ComparisonPredicate cmp = booltest_pred_to_comparison_pred($cond$$constant);
> Assembler::Width ww = widthForType(vector_element_basic_type(this, $src1));
> __ vpcmpCCW($dst$$XMMRegister, $src1$$XMMRegister, $src2$$XMMRegister, cmp, ww, vlen_enc, $scratch$$Register);
> %}
> ins_pipe( pipe_slow );
> %}
>
> 4. In repo `panama-vector`, both of them are wrong, because the IR is fixed:
> ```c++
> 455 } else {
> 456 ConINode* pred_node = (ConINode*)gvn().makecon(TypeInt::make(BoolTest::ge));// WRONG here
> 457 Node * lane_cnt = gvn().makecon(TypeInt::make(num_elem));
> 458 Node * bcast_lane_cnt = gvn().transform(VectorNode::scalar2vector(lane_cnt, num_elem, type_bt));
> 459 Node* mask = gvn().transform(new VectorMaskCmpNode(BoolTest::ge, bcast_lane_cnt, res, pred_node, vt));
>
> Yours,
> Wang Huang
This pull request has been closed without being integrated.
-------------
PR: https://git.openjdk.java.net/jdk/pull/3933
More information about the hotspot-compiler-dev
mailing list