RFR: 8266720: Wrong implementation in LibraryCallKit::inline_vector_shuffle_iota [v2]

Sat May 8 03:30:06 UTC 2021

> Dear All,
>   Here is the patch for JDK-8266720. Could you please review it?
> * Reproduce:
>    * cherry-pick JDK-8265956
>    * run the patch's `TestVectorShuffleIotaByteWrongImpl.java`
>    * That said, the bug is obvious from reading the code.
> * Reason:
>  1. In the interpreter:
> 
> static int partiallyWrapIndex(int index, int laneCount) {
>     return checkIndex0(index, laneCount, (byte)-1);
> }
> 
> @ForceInline
> static int checkIndex0(int index, int laneCount, byte mode) {
>     int wrapped = VectorIntrinsics.wrapToRange(index, laneCount);
>     if (mode == 0 || wrapped == index) { // NOTE here
>         return wrapped;
>     }
>     if (mode < 0) {
>         return wrapped - laneCount;  // special mode for internal storage
>     }
>     throw checkIndexFailed(index, laneCount);
> }
> 
> @ForceInline
> static int wrapToRange(int index, int size) {
>     if ((size & (size - 1)) == 0) {
>         // Size is zero or a power of two, so we got this.
>         return index & (size - 1);
>     } else {
>         return wrapToRangeNPOT(index, size);
>     }
> }
> 
> 2. However, the corresponding intrinsic in
> src/hotspot/share/opto/vectorIntrinsics.cpp [jdk/jdk] does this:
> ```c++
>   } else {
>     ConINode* pred_node = (ConINode*)gvn().makecon(TypeInt::make(1)); // BoolTest::gt here
>     Node * lane_cnt  = gvn().makecon(TypeInt::make(num_elem));
>     Node * bcast_lane_cnt = gvn().transform(VectorNode::scalar2vector(lane_cnt, num_elem, type_bt));
>     // The node below is built with BoolTest::ge, which is not 1 (1 means BoolTest::gt)
>     Node* mask = gvn().transform(new VectorMaskCmpNode(BoolTest::ge, bcast_lane_cnt, res, pred_node, vt));
> ```
> 
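> 3. The `aarch64` NEON backend selects the instruction from the node's own predicate, `BoolTest::ge`, so the generated code uses `cmge`:
> ```c++
> // cond is ignored here; only the node's predicate is checked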
> instruct vcmge8B(vecD dst, vecD src1, vecD src2, immI cond)
> %{
>   predicate(n->as_Vector()->length() == 8 &&
>             n->as_VectorMaskCmp()->get_predicate() == BoolTest::ge &&
>             n->in(1)->in(1)->bottom_type()->is_vect()->element_basic_type() == T_BYTE);
>   match(Set dst (VectorMaskCmp (Binary src1 src2) cond));
>   format %{ "cmge  $dst, T8B, $src1, $src2\t# vector cmp (8B)" %}
>   ins_cost(INSN_COST);
>   ins_encode %{
>     __ cmge(as_FloatRegister($dst$$reg), __ T8B,
>             as_FloatRegister($src1$$reg), as_FloatRegister($src2$$reg));
>   %}
>   ins_pipe(vdop64);
> %}
> ```
> 
> The x86 backend, however, uses `cond` (= 1, i.e. `BoolTest::gt`), so x86 is **right** on jdk/jdk:
> ```c++
> instruct vcmp(legVec dst, legVec src1, legVec src2, immI8 cond, rRegP scratch) %{
>   predicate(vector_length_in_bytes(n->in(1)->in(1)) >=  8 && // src1
>             vector_length_in_bytes(n->in(1)->in(1)) <= 32 && // src1
>             is_integral_type(vector_element_basic_type(n->in(1)->in(1)))); // src1
>   match(Set dst (VectorMaskCmp (Binary src1 src2) cond));
>   effect(TEMP scratch);
>   format %{ "vector_compare $dst,$src1,$src2,$cond\t! using $scratch as TEMP" %}
>   ins_encode %{
>     int vlen_enc = vector_length_encoding(this, $src1);
>     Assembler::ComparisonPredicate cmp = booltest_pred_to_comparison_pred($cond$$constant);
>     Assembler::Width ww = widthForType(vector_element_basic_type(this, $src1));
>     __ vpcmpCCW($dst$$XMMRegister, $src1$$XMMRegister, $src2$$XMMRegister, cmp, ww, vlen_enc, $scratch$$Register);
>   %}
>   ins_pipe( pipe_slow );
> %}
> ```
> 
> 4. In the `panama-vector` repo, both backends are wrong, because the IR consistently uses `BoolTest::ge` for both the constant and the node predicate:
> ```c++
>   } else {
>     ConINode* pred_node = (ConINode*)gvn().makecon(TypeInt::make(BoolTest::ge));// WRONG here
>     Node * lane_cnt  = gvn().makecon(TypeInt::make(num_elem));
>     Node * bcast_lane_cnt = gvn().transform(VectorNode::scalar2vector(lane_cnt, num_elem, type_bt));
>     Node* mask = gvn().transform(new VectorMaskCmpNode(BoolTest::ge, bcast_lane_cnt, res, pred_node, vt));
> ```
> 
> Yours, 
> Wang Huang

Wang Huang has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision:

  8266720: Wrong implementation in LibraryCallKit::inline_vector_shuffle_iota

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/3933/files
  - new: https://git.openjdk.java.net/jdk/pull/3933/files/65e20a50..9f5577f0

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3933&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3933&range=00-01

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3933.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3933/head:pull/3933

PR: https://git.openjdk.java.net/jdk/pull/3933