[vectorIntrinsics+compress] RFR: 8274971: Add PrefixMask API

Mon Oct 11 22:54:03 UTC 2021

On Fri, 8 Oct 2021 11:56:05 GMT, Joshua Zhu <jzhu at openjdk.org> wrote:

> I have split my implementation of the "compress" API into several patches for easier review.
> This change introduces the PrefixMask API for VectorMask.
> It works together with the compress/expand API. (See its usage in the ALIBABA selectiveStore use case.)
> It returns a prefix mask based on the true count of the mask.
> If "N" is the true count of the mask, the lanes numbered 0 through "N-1" are set and all remaining lanes are unset.
> For now, mask.prefixMask() is implemented as
> 
>     vectorSpecies.iota().compare(VectorOperators.LT, trueCount());
> 
> The alternative implementation is:
> 
>     vectorSpecies().indexInRange(0, m.trueCount())
> 
> I chose the former implementation since the latter depends on intrinsic support for indexVector.
> 
> I'm looking for instructions that could be used to accelerate indexVector/iota, so that a vector-to-vector operation combined with a store/load and a prefix mask could be further optimized into the single memory-operand form of the instruction.
> Intel experts, do you have any suggestions on SIMD instructions for iota vector generation?
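
To make the quoted semantics concrete, here is a minimal scalar sketch in plain Java. It models a mask as a boolean[] rather than using the Vector API, and the class and method names (PrefixMaskSketch, trueCount, prefixMask) are illustrative assumptions, not code from the patch. It mirrors the "iota < trueCount" formulation quoted above.

```java
import java.util.Arrays;

public class PrefixMaskSketch {
    // Number of set lanes, analogous to VectorMask.trueCount().
    static int trueCount(boolean[] mask) {
        int n = 0;
        for (boolean lane : mask) {
            if (lane) {
                n++;
            }
        }
        return n;
    }

    // Hypothetical scalar model of prefixMask(): lanes 0..N-1 are set,
    // where N is the true count of the input mask.
    static boolean[] prefixMask(boolean[] mask) {
        int n = trueCount(mask);
        boolean[] result = new boolean[mask.length];
        for (int lane = 0; lane < mask.length; lane++) {
            result[lane] = lane < n; // same test as iota.compare(LT, trueCount)
        }
        return result;
    }

    public static void main(String[] args) {
        boolean[] m = {false, true, false, true, true, false, false, false};
        // Three lanes are set, so the first three lanes of the result are set.
        System.out.println(Arrays.toString(prefixMask(m)));
        // prints [true, true, true, false, false, false, false, false]
    }
}
```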

@JoshuaZhuwj Yes, let us use the name mask.compress() instead.
For AVX512, one way to implement mask.compress() is with the PEXT instruction:
   * Move the mask from the k (opmask) register to a general-purpose register.
   * Execute PEXT with an all-ones register as the source and the moved mask as the selector.
   * Move the result from the general-purpose register back to a k register.
Are you planning to add the test cases as well?
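
As a rough illustration of why the PEXT approach yields a prefix mask, the following plain-Java sketch emulates PEXT in software on a long holding the mask bits. The names (PextMaskCompressSketch, pext, maskCompress) are assumptions made for this example and are not part of the patch or the Vector API; a real implementation would emit the hardware instruction instead of looping.

```java
public class PextMaskCompressSketch {
    // Software emulation of PEXT: gather the bits of src at the positions
    // where selector has a 1, and pack them into the low bits of the result.
    static long pext(long src, long selector) {
        long result = 0L;
        int outBit = 0;
        while (selector != 0L) {
            long lowest = selector & -selector; // isolate lowest set selector bit
            if ((src & lowest) != 0L) {
                result |= 1L << outBit;
            }
            outBit++;
            selector &= selector - 1;           // clear that selector bit
        }
        return result;
    }

    // Hypothetical model of mask.compress() on raw mask bits: with an
    // all-ones source, PEXT returns one low bit per set bit of the mask.
    static long maskCompress(long maskBits) {
        return pext(-1L, maskBits);
    }

    public static void main(String[] args) {
        long maskBits = 0b10110010L; // four lanes set
        System.out.println(Long.toBinaryString(maskCompress(maskBits)));
        // prints 1111
    }
}
```

For masks with fewer than 64 set bits, the result equals (1L << Long.bitCount(maskBits)) - 1, which is the same prefix mask described in the original proposal.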

-------------

PR: https://git.openjdk.java.net/panama-vector/pull/148