[vectorIntrinsics] RFR: 8274569: X86 backend related incorrectness issues in legacy store mask patterns
Jie Fu
jiefu at openjdk.java.net
Fri Oct 1 00:39:15 UTC 2021
On Thu, 30 Sep 2021 19:34:43 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:
> - Issue was seen while unit testing changes for masking related optimizations on vectorIntrinsics branch.
>
> - The following patterns result in incorrect store mask computation. Even though, contextually, the store mask is followed by a StoreVector that takes care of the various vector sizes, C2 still emits a VectorStoreMask node in the IR to populate a byte vector.
>
> - instruct storeMask2B
> - instruct storeMask4B
> - instruct storeMask8B
> - instruct storeMask8B_avx
>
> - Replicate with an immI operand correctly takes care of shorter vector lengths, so it can serve as the fallback case for the following pattern with an immediate -1 argument.
> - instruct ReplI_M1
>
> The patch also adds a test case which exhaustively covers load/store vector mask operations for different SPECIES.
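For context, a VectorStoreMask narrows an element-wise mask vector (each lane all-ones or all-zeros) into a byte vector holding a 0/1 flag per lane; that is the computation the storeMask2B/4B/8B patterns implement. A minimal plain-Java sketch of that semantics (class and method names are hypothetical, not HotSpot code):

```java
import java.util.Arrays;

public class StoreMaskSketch {
    // Hypothetical model of VectorStoreMask: each input lane is all-ones
    // (-1) or all-zeros; the result is one 0/1 byte per lane.
    static byte[] storeMask(long[] maskLanes) {
        byte[] out = new byte[maskLanes.length];
        for (int i = 0; i < maskLanes.length; i++) {
            out[i] = (byte) (maskLanes[i] == 0 ? 0 : 1);
        }
        return out;
    }

    public static void main(String[] args) {
        long[] lanes = { -1, 0, -1, 0 };
        System.out.println(Arrays.toString(storeMask(lanes))); // [1, 0, 1, 0]
    }
}
```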
Do we still need `andq`?
void C2_MacroAssembler::vector_mask_operation(int opc, Register dst, XMMRegister mask, XMMRegister xtmp,
                                              XMMRegister xtmp1, Register tmp, int masklen, int vec_enc) {
  assert(VM_Version::supports_avx(), "");
  vpxor(xtmp, xtmp, xtmp, vec_enc);
  vpsubb(xtmp, xtmp, mask, vec_enc);
  vpmovmskb(tmp, xtmp, vec_enc);
  if (masklen < 64) {
    andq(tmp, (((jlong)1 << masklen) - 1)); // <---- Do we still need this?
  }
  switch (opc) {
    case Op_VectorMaskTrueCount:
      popcntq(dst, tmp);
      break;
    case Op_VectorMaskLastTrue:
      mov64(dst, -1);
      bsrq(tmp, tmp);
      cmov(Assembler::notZero, dst, tmp);
      break;
    case Op_VectorMaskFirstTrue:
      mov64(dst, masklen);
      bsfq(tmp, tmp);
      cmov(Assembler::notZero, dst, tmp);
      break;
    default: assert(false, "Unhandled mask operation");
  }
}
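For what it's worth, the `andq` looks like it is still needed whenever `masklen` is smaller than the register's byte count: `vpmovmskb` extracts one bit per byte of the whole register, so bits at positions `masklen` and above may reflect stale upper lanes of the mask rather than real lanes. A small plain-Java sketch (names hypothetical, just modeling the bit manipulation) of the true-count with and without the mask-off:

```java
public class MaskBits {
    // Model of the tmp register after vpmovmskb: the low `masklen` bits
    // are real lanes; higher bits may contain garbage from unused lanes.
    static long trueCount(long rawMask, int masklen) {
        long m = rawMask;
        if (masklen < 64) {
            m &= (1L << masklen) - 1;   // the andq in question
        }
        return Long.bitCount(m);        // popcntq
    }

    public static void main(String[] args) {
        // 8-lane mask, all lanes true, with stray garbage above bit 7.
        long raw = 0xFFL | (0xABL << 8);
        System.out.println(trueCount(raw, 8));    // 8: garbage masked off
        System.out.println(Long.bitCount(raw));   // 13: over-counts without it
    }
}
```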
-------------
PR: https://git.openjdk.java.net/panama-vector/pull/139
More information about the panama-dev mailing list