[vectorIntrinsics+compress] RFR: 8274971: Add PrefixMask API

Fri Oct 8 12:01:31 UTC 2021

I have split my implementation of the "compress" API into several patches to make review easier.
This change adds the PrefixMask API to VectorMask.
It works together with the compress/expand API. (See its usage in the ALIBABA selectiveStore use case.)
It returns a prefix mask based on the true count of the mask.
If "N" is the true count of the mask, the mask bits are set for lanes 0 through N-1 and unset for all remaining lanes.
For now, mask.prefixMask() is implemented as

    vectorSpecies.iota().compare(VectorOperators.LT, trueCount());

The alternative implementation is:

    vectorSpecies().indexInRange(0, m.trueCount())

I chose the former implementation because the latter depends on intrinsic support for indexVector.
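
To make the intended semantics concrete, here is a minimal scalar sketch in plain Java. It models both formulations on boolean arrays rather than calling the Vector API, so it says nothing about performance or intrinsics. The class and method names (PrefixMaskModel, prefixViaIotaCompare, prefixViaIndexInRange) are hypothetical and exist only for illustration.

```java
import java.util.Arrays;

public class PrefixMaskModel {
    // Scalar model of trueCount(): the number of set lanes in the mask.
    static int trueCount(boolean[] mask) {
        int n = 0;
        for (boolean b : mask) {
            if (b) n++;
        }
        return n;
    }

    // Models iota().compare(LT, trueCount()): lane i is set when i < N.
    static boolean[] prefixViaIotaCompare(boolean[] mask) {
        int n = trueCount(mask);
        boolean[] out = new boolean[mask.length];
        for (int i = 0; i < mask.length; i++) {
            int iota = i;          // the iota vector holds 0, 1, 2, ...
            out[i] = iota < n;
        }
        return out;
    }

    // Models indexInRange(0, N): lane i is set when 0 <= 0 + i < N.
    static boolean[] prefixViaIndexInRange(boolean[] mask) {
        int n = trueCount(mask);
        boolean[] out = new boolean[mask.length];
        for (int i = 0; i < mask.length; i++) {
            int index = 0 + i;     // offset 0 plus the lane number
            out[i] = index >= 0 && index < n;
        }
        return out;
    }

    public static void main(String[] args) {
        // Three lanes are set, so the prefix mask sets lanes 0, 1 and 2.
        boolean[] m = {false, true, true, false, true, false, false, false};
        System.out.println(Arrays.toString(prefixViaIotaCompare(m)));
        System.out.println(Arrays.equals(prefixViaIotaCompare(m), prefixViaIndexInRange(m)));
    }
}
```

In this model the two formulations always produce the same mask, so the choice between them is only about which one the JIT can compile well.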

I'm looking for instructions that could accelerate indexVector/iota, so that a vector-to-vector operation combined with a store/load and a prefix mask could be optimized further into a single instruction in its memory-operand form.
Intel experts, do you have any suggestions for SIMD instructions that generate an iota vector?
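
For readers unfamiliar with the pattern, the combination referred to above (a vector-to-vector compress followed by a store under the prefix mask) can be sketched as a scalar model in plain Java. This is an illustration only and not the code in this patch. All names (SelectiveStoreModel, compress, maskedStore, selectiveStore) are hypothetical, and the model assumes that compress zero-fills the lanes after the selected ones.

```java
import java.util.Arrays;

public class SelectiveStoreModel {
    static int trueCount(boolean[] mask) {
        int n = 0;
        for (boolean b : mask) {
            if (b) n++;
        }
        return n;
    }

    // Scalar model of prefixMask(): lanes 0..N-1 set, where N = trueCount.
    static boolean[] prefixMask(boolean[] mask) {
        int n = trueCount(mask);
        boolean[] out = new boolean[mask.length];
        for (int i = 0; i < n; i++) {
            out[i] = true;
        }
        return out;
    }

    // Scalar model of compress: selected lanes move to the front, in order.
    // Assumption: the remaining lanes are zero.
    static int[] compress(int[] v, boolean[] mask) {
        int[] out = new int[v.length];
        int j = 0;
        for (int i = 0; i < v.length; i++) {
            if (mask[i]) out[j++] = v[i];
        }
        return out;
    }

    // Scalar model of a masked store: only lanes set in storeMask are written.
    static void maskedStore(int[] v, boolean[] storeMask, int[] dst, int offset) {
        for (int i = 0; i < v.length; i++) {
            if (storeMask[i]) dst[offset + i] = v[i];
        }
    }

    // Compress, then store under the prefix mask; returns the next free offset.
    static int selectiveStore(int[] v, boolean[] mask, int[] dst, int offset) {
        maskedStore(compress(v, mask), prefixMask(mask), dst, offset);
        return offset + trueCount(mask);
    }

    public static void main(String[] args) {
        int[] dst = new int[6];
        Arrays.fill(dst, -1);
        int next = selectiveStore(new int[] {10, 11, 12, 13},
                                  new boolean[] {false, true, false, true}, dst, 1);
        System.out.println(Arrays.toString(dst) + " next=" + next);
    }
}
```

The prefix mask is what keeps the store from touching memory beyond the N compressed elements, which is why a fused compress-and-store instruction would make the separate mask computation unnecessary.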

-------------

Commit messages:
 - 8274971: Add PrefixMask API

Changes: https://git.openjdk.java.net/panama-vector/pull/148/files
 Webrev: https://webrevs.openjdk.java.net/?repo=panama-vector&pr=148&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8274971
  Stats: 16 lines in 1 file changed: 16 ins; 0 del; 0 mod
  Patch: https://git.openjdk.java.net/panama-vector/pull/148.diff
  Fetch: git fetch https://git.openjdk.java.net/panama-vector pull/148/head:pull/148

PR: https://git.openjdk.java.net/panama-vector/pull/148