[vectorIntrinsics+compress] RFR: 8274971: Add PrefixMask API

Mon Oct 11 15:37:23 UTC 2021

On Fri, 8 Oct 2021 11:56:05 GMT, Joshua Zhu <jzhu at openjdk.org> wrote:

> I have split my implementation of the "compress" API into several patches for easier review.
> This change introduces the PrefixMask API for VectorMask.
> It works together with the compress/expand API. (See the usage in the ALIBABA selectiveStore use case.)
> It returns a prefix mask based on the true count of the mask.
> If "N" is the true count of the mask, the mask bits are set from the first lane through the lane numbered "N-1"; all other lanes are unset.
> For now, mask.prefixMask() is implemented as:
> 
>     vectorSpecies.iota().compare(VectorOperators.LT, trueCount());
> 
> The alternative implementation is:
> 
>     vectorSpecies().indexInRange(0, m.trueCount())
> 
> I chose the former implementation since the latter depends on intrinsic support for indexVector.
> 
> I'm looking for instructions that could be used to accelerate indexVector/iota, so that vector-to-vector operations combined with a store/load and a prefix mask could be optimized further into a single memory-version instruction.
> Intel experts, do you have any suggestions on SIMD instructions for iota vector generation?

As discussed on the panama list, this is an important concept, even if it can be composed from other methods (plus we might have opportunities to optimize).

In that discussion we referred to the method as `compress`. That aligns nicely with `Vector.compress`. Can we please update to that name?
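
To make the semantics under discussion concrete, here is a minimal scalar sketch. It is not the patch's implementation and does not use the Vector API: `boolean[]` stands in for `VectorMask`, `int[]` stands in for a vector, and the class and method names (`PrefixMaskSketch`, `prefixMask`, `selectiveStore`) are hypothetical. It assumes the behaviour described in the quoted text: lanes 0 through N-1 are set, where N is the true count.

```java
import java.util.Arrays;

// Scalar model only: boolean[] stands in for VectorMask, int[] for a vector.
// All class and method names here are hypothetical, not part of the Vector API.
final class PrefixMaskSketch {

    // Number of set lanes, modelling the true count of a mask.
    static int trueCount(boolean[] mask) {
        int n = 0;
        for (boolean lane : mask) {
            if (lane) n++;
        }
        return n;
    }

    // Lanes 0..N-1 are set, where N is the true count; all other lanes are unset.
    // This models comparing an iota vector (0, 1, 2, ...) against N with LT.
    static boolean[] prefixMask(boolean[] mask) {
        int n = trueCount(mask);
        boolean[] prefix = new boolean[mask.length];
        for (int lane = 0; lane < mask.length; lane++) {
            prefix[lane] = lane < n;
        }
        return prefix;
    }

    // Selective store: pack the selected lanes of src to the front (compress),
    // then write only the lanes covered by the prefix mask into dst at offset.
    // Returns the number of elements written.
    static int selectiveStore(int[] src, boolean[] mask, int[] dst, int offset) {
        int[] compressed = new int[src.length];
        int next = 0;
        for (int lane = 0; lane < src.length; lane++) {
            if (mask[lane]) compressed[next++] = src[lane];
        }
        boolean[] prefix = prefixMask(mask);
        for (int lane = 0; lane < src.length; lane++) {
            if (prefix[lane]) dst[offset + lane] = compressed[lane];
        }
        return next;
    }

    public static void main(String[] args) {
        boolean[] mask = {false, true, false, true};
        System.out.println(Arrays.toString(prefixMask(mask)));
        int[] dst = new int[4];
        int written = selectiveStore(new int[] {10, 20, 30, 40}, mask, dst, 0);
        System.out.println(written + " " + Arrays.toString(dst));
    }
}
```

For the mask `{false, true, false, true}` the true count is 2, so the prefix mask is `{true, true, false, false}`, and the selective store writes the two selected values contiguously at the start of the destination.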

-------------

PR: https://git.openjdk.java.net/panama-vector/pull/148