RFR: 8266720: Wrong implementation in LibraryCallKit::inline_vector_shuffle_iota [v2]

Wang Huang whuang at openjdk.java.net
Thu May 13 01:50:01 UTC 2021


On Sat, 8 May 2021 03:30:06 GMT, Wang Huang <whuang at openjdk.org> wrote:

>> Dear All,
>>   Here is the patch of JDK-8266720. Could you do me a favor to review this?
>> * Reproduce:
>>    * cherry-pick JDK-8265956 
>>    * run patch's `TestVectorShuffleIotaByteWrongImpl.java`
>>    * However, this wrong of this code is obvious.
>> * Reason :
>>  1. In interpreter: 
>> 
>> static int partiallyWrapIndex(int index, int laneCount) {
>>     return checkIndex0(index, laneCount, (byte)-1);
>> }
>> 
>> @ForceInline
>> static int checkIndex0(int index, int laneCount, byte mode) {
>>     int wrapped = VectorIntrinsics.wrapToRange(index, laneCount);
>>     if (mode == 0 || wrapped == index) { // NOTE here
>>         return wrapped;
>>     }
>>     if (mode < 0) {
>>         return wrapped - laneCount;  // special mode for internal storage
>>     }
>>     throw checkIndexFailed(index, laneCount);
>> }
>> 
>> @ForceInline
>> static int wrapToRange(int index, int size) {
>>     if ((size & (size - 1)) == 0) {
>>         // Size is zero or a power of two, so we got this.
>>         return index & (size - 1);
>>     } else {
>>         return wrapToRangeNPOT(index, size);
>>     }
>> }
>> 
>> 2. However, we have this intrinsics in   
>> src/hotspot/share/opto/vectorIntrinsics.cpp [jdk/jdk]
>> ```c++
>>  386   } else {
>>  387     ConINode* pred_node = (ConINode*)gvn().makecon(TypeInt::make(1)); // BoolTest::gt here
>>  388     Node * lane_cnt  = gvn().makecon(TypeInt::make(num_elem));
>>  389     Node * bcast_lane_cnt = gvn().transform(VectorNode::scalar2vector(lane_cnt, num_elem, type_bt));
>> // here BoolTest::ge != 1 (which means BoolTest::gt)
>>  390     Node* mask = gvn().transform(new VectorMaskCmpNode(BoolTest::ge, bcast_lane_cnt, res, pred_node, vt));
>> 
>> 3. In `aarch64` neon backend, we use `BoolTest::ge` for generated code:
>> ```c++
>> // cond is useless here
>> instruct vcmge8B(vecD dst, vecD src1, vecD src2, immI cond)
>> %{
>>   predicate(n->as_Vector()->length() == 8 &&
>>             n->as_VectorMaskCmp()->get_predicate() == BoolTest::ge &&
>>             n->in(1)->in(1)->bottom_type()->is_vect()->element_basic_type() == T_BYTE);
>>   match(Set dst (VectorMaskCmp (Binary src1 src2) cond));
>>   format %{ "cmge  $dst, T8B, $src1, $src2\t# vector cmp (8B)" %}
>>   ins_cost(INSN_COST);
>>   ins_encode %{
>>     __ cmge(as_FloatRegister($dst$$reg), __ T8B,
>>             as_FloatRegister($src1$$reg), as_FloatRegister($src2$$reg));
>>   %}
>>   ins_pipe(vdop64);
>> %}
>> 
>> 
>> However, we use cond (=1 or BoolTest::gt). So X86 is **right** on jdk/jdk
>> ```c++
>> instruct vcmp(legVec dst, legVec src1, legVec src2, immI8 cond, rRegP scratch) %{
>>   predicate(vector_length_in_bytes(n->in(1)->in(1)) >=  8 && // src1
>>             vector_length_in_bytes(n->in(1)->in(1)) <= 32 && // src1
>>             is_integral_type(vector_element_basic_type(n->in(1)->in(1)))); // src1
>>   match(Set dst (VectorMaskCmp (Binary src1 src2) cond));
>>   effect(TEMP scratch);
>>   format %{ "vector_compare $dst,$src1,$src2,$cond\t! using $scratch as TEMP" %}
>>   ins_encode %{
>>     int vlen_enc = vector_length_encoding(this, $src1);
>>     Assembler::ComparisonPredicate cmp = booltest_pred_to_comparison_pred($cond$$constant);
>>     Assembler::Width ww = widthForType(vector_element_basic_type(this, $src1));
>>     __ vpcmpCCW($dst$$XMMRegister, $src1$$XMMRegister, $src2$$XMMRegister, cmp, ww, vlen_enc, $scratch$$Register);
>>   %}
>>   ins_pipe( pipe_slow );
>> %}
>> 
>> 4. In repo `panama-vector`, both of them are wrong, because the IR is fixed:
>> ```c++
>>  455   } else {
>>  456     ConINode* pred_node = (ConINode*)gvn().makecon(TypeInt::make(BoolTest::ge));// WRONG here
>>  457     Node * lane_cnt  = gvn().makecon(TypeInt::make(num_elem));
>>  458     Node * bcast_lane_cnt = gvn().transform(VectorNode::scalar2vector(lane_cnt, num_elem, type_bt));
>>  459     Node* mask = gvn().transform(new VectorMaskCmpNode(BoolTest::ge, bcast_lane_cnt, res, pred_node, vt));
>> 
>> Yours, 
>> Wang Huang
>
> Wang Huang has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision:
> 
>   8266720: Wrong implementation in LibraryCallKit::inline_vector_shuffle_iota

This issue will be closed because I will fix it on panama-vector since #3803 has not been merged.

-------------

PR: https://git.openjdk.java.net/jdk/pull/3933


More information about the hotspot-compiler-dev mailing list