RFR: 8266720: Wrong implementation in LibraryCallKit::inline_vector_shuffle_iota [v2]
Wang Huang
whuang at openjdk.java.net
Thu May 13 01:50:01 UTC 2021
On Sat, 8 May 2021 03:30:06 GMT, Wang Huang <whuang at openjdk.org> wrote:
>> Dear All,
>> Here is the patch for JDK-8266720. Could you please review it?
>> * Reproduce:
>> * cherry-pick JDK-8265956
>> * run the patch's `TestVectorShuffleIotaByteWrongImpl.java`
>> * However, the bug in this code is obvious.
>> * Reason :
>> 1. In the interpreter:
>>
>> static int partiallyWrapIndex(int index, int laneCount) {
>>     return checkIndex0(index, laneCount, (byte)-1);
>> }
>>
>> @ForceInline
>> static int checkIndex0(int index, int laneCount, byte mode) {
>>     int wrapped = VectorIntrinsics.wrapToRange(index, laneCount);
>>     if (mode == 0 || wrapped == index) { // NOTE here
>>         return wrapped;
>>     }
>>     if (mode < 0) {
>>         return wrapped - laneCount; // special mode for internal storage
>>     }
>>     throw checkIndexFailed(index, laneCount);
>> }
>>
>> @ForceInline
>> static int wrapToRange(int index, int size) {
>>     if ((size & (size - 1)) == 0) {
>>         // Size is zero or a power of two, so we got this.
>>         return index & (size - 1);
>>     } else {
>>         return wrapToRangeNPOT(index, size);
>>     }
>> }
>>
>> 2. However, we have this intrinsic in
>> src/hotspot/share/opto/vectorIntrinsics.cpp [jdk/jdk]
>> ```c++
>>   } else {
>>     ConINode* pred_node = (ConINode*)gvn().makecon(TypeInt::make(1)); // BoolTest::gt here
>>     Node * lane_cnt = gvn().makecon(TypeInt::make(num_elem));
>>     Node * bcast_lane_cnt = gvn().transform(VectorNode::scalar2vector(lane_cnt, num_elem, type_bt));
>>     // here BoolTest::ge != 1 (which means BoolTest::gt)
>>     Node* mask = gvn().transform(new VectorMaskCmpNode(BoolTest::ge, bcast_lane_cnt, res, pred_node, vt));
>> ```
>>
>> 3. The `aarch64` NEON backend uses `BoolTest::ge` for the generated code:
>> ```c++
>> // cond is useless here
>> instruct vcmge8B(vecD dst, vecD src1, vecD src2, immI cond)
>> %{
>>   predicate(n->as_Vector()->length() == 8 &&
>>             n->as_VectorMaskCmp()->get_predicate() == BoolTest::ge &&
>>             n->in(1)->in(1)->bottom_type()->is_vect()->element_basic_type() == T_BYTE);
>>   match(Set dst (VectorMaskCmp (Binary src1 src2) cond));
>>   format %{ "cmge $dst, T8B, $src1, $src2\t# vector cmp (8B)" %}
>>   ins_cost(INSN_COST);
>>   ins_encode %{
>>     __ cmge(as_FloatRegister($dst$$reg), __ T8B,
>>             as_FloatRegister($src1$$reg), as_FloatRegister($src2$$reg));
>>   %}
>>   ins_pipe(vdop64);
>> %}
>> ```
>>
>>
>> However, the x86 backend uses cond (= 1, i.e. BoolTest::gt), so x86 is **right** on jdk/jdk:
>> ```c++
>> instruct vcmp(legVec dst, legVec src1, legVec src2, immI8 cond, rRegP scratch) %{
>>   predicate(vector_length_in_bytes(n->in(1)->in(1)) >= 8 && // src1
>>             vector_length_in_bytes(n->in(1)->in(1)) <= 32 && // src1
>>             is_integral_type(vector_element_basic_type(n->in(1)->in(1)))); // src1
>>   match(Set dst (VectorMaskCmp (Binary src1 src2) cond));
>>   effect(TEMP scratch);
>>   format %{ "vector_compare $dst,$src1,$src2,$cond\t! using $scratch as TEMP" %}
>>   ins_encode %{
>>     int vlen_enc = vector_length_encoding(this, $src1);
>>     Assembler::ComparisonPredicate cmp = booltest_pred_to_comparison_pred($cond$$constant);
>>     Assembler::Width ww = widthForType(vector_element_basic_type(this, $src1));
>>     __ vpcmpCCW($dst$$XMMRegister, $src1$$XMMRegister, $src2$$XMMRegister, cmp, ww, vlen_enc, $scratch$$Register);
>>   %}
>>   ins_pipe( pipe_slow );
>> %}
>> ```
>>
>> 4. In the `panama-vector` repo, both backends are wrong, because the IR uses `BoolTest::ge` for both the node and `pred_node`:
>> ```c++
>>   } else {
>>     ConINode* pred_node = (ConINode*)gvn().makecon(TypeInt::make(BoolTest::ge)); // WRONG here
>>     Node * lane_cnt = gvn().makecon(TypeInt::make(num_elem));
>>     Node * bcast_lane_cnt = gvn().transform(VectorNode::scalar2vector(lane_cnt, num_elem, type_bt));
>>     Node* mask = gvn().transform(new VectorMaskCmpNode(BoolTest::ge, bcast_lane_cnt, res, pred_node, vt));
>> ```
>>
>> Yours,
>> Wang Huang
>
> Wang Huang has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision:
>
> 8266720: Wrong implementation in LibraryCallKit::inline_vector_shuffle_iota
This issue will be closed: since #3803 has not been merged, I will fix it on panama-vector instead.
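
For readers following the thread, the difference between `BoolTest::gt` and `BoolTest::ge` in the quoted analysis can be sketched as a scalar Java model. This is only an illustration, not the HotSpot code: the class `IotaIndexSketch` and the method `maskedIndex` are hypothetical names, the lane count is assumed to be a power of two, only non-negative indices are considered, and it is an assumption that the compare mask selects between the wrapped index and `wrapped - laneCount`.

```java
// Scalar model of the non-wrapping iota index computation (illustrative only).
final class IotaIndexSketch {

    // Mirrors the quoted interpreter path (mode == -1) for power-of-two lane counts.
    static int partiallyWrapIndex(int index, int laneCount) {
        int wrapped = index & (laneCount - 1);
        return (wrapped == index) ? wrapped : wrapped - laneCount;
    }

    // ASSUMPTION: the mask from "laneCount <cmp> index" keeps the wrapped index
    // when set and selects the biased value (wrapped - laneCount) otherwise.
    // useGe == true models BoolTest::ge, useGe == false models BoolTest::gt.
    static int maskedIndex(int index, int laneCount, boolean useGe) {
        boolean inRange = useGe ? (laneCount >= index) : (laneCount > index);
        int wrapped = index & (laneCount - 1);
        return inRange ? wrapped : wrapped - laneCount;
    }

    public static void main(String[] args) {
        int laneCount = 8;
        // Print all three variants side by side for indices 0 .. 2*laneCount-1.
        for (int index = 0; index < 2 * laneCount; index++) {
            System.out.printf("index=%2d interpreter=%3d gt=%3d ge=%3d%n",
                    index,
                    partiallyWrapIndex(index, laneCount),
                    maskedIndex(index, laneCount, false),
                    maskedIndex(index, laneCount, true));
        }
    }
}
```

Under these assumptions the `gt` variant agrees with the interpreter for every index in the loop, while the `ge` variant differs only at `index == laneCount`, where it yields 0 instead of `-laneCount`.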
-------------
PR: https://git.openjdk.java.net/jdk/pull/3933