[vectorIntrinsics+mask] RFR: 8273057: [vector] New VectorAPI "SelectiveStore"

Tue Aug 31 16:44:24 UTC 2021

Yes, my suggestion is that a vector-to-vector compress might be a composition of mask -> partitioning shuffle -> rearrange, such that on supported architectures it reduces down to a single instruction. In combination with a store and a prefix mask it may be possible to reduce further to a single instruction accepting the source vector, the mask, and a memory location.
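To make the composition concrete, here is a minimal scalar sketch of the lane-level semantics I have in mind. It models lanes with plain arrays rather than using the Vector API itself, and the class and method names (CompressSketch, partitioningShuffle, selectiveStore) are hypothetical, chosen only for illustration. It assumes the partitioning shuffle lists the indices of the set lanes first, in order, followed by the unset lanes.

```java
import java.util.Arrays;

// Scalar model only: arrays stand in for vector lanes. All names are hypothetical.
public class CompressSketch {

    // mask -> partitioning shuffle: indices of set lanes first (in lane order),
    // then indices of unset lanes (in lane order).
    static int[] partitioningShuffle(boolean[] mask) {
        int[] shuffle = new int[mask.length];
        int next = 0;
        for (int i = 0; i < mask.length; i++) {
            if (mask[i]) shuffle[next++] = i;
        }
        for (int i = 0; i < mask.length; i++) {
            if (!mask[i]) shuffle[next++] = i;
        }
        return shuffle;
    }

    // rearrange: result lane i takes its value from source lane shuffle[i].
    static int[] rearrange(int[] v, int[] shuffle) {
        int[] r = new int[v.length];
        for (int i = 0; i < v.length; i++) {
            r[i] = v[shuffle[i]];
        }
        return r;
    }

    // Number of set lanes; this is the length of the prefix mask used by the store.
    static int trueCount(boolean[] mask) {
        int count = 0;
        for (boolean b : mask) {
            if (b) count++;
        }
        return count;
    }

    // Compress, then store only the first trueCount(mask) lanes (the prefix mask).
    // Returns the offset just past the last lane written.
    static int selectiveStore(int[] v, boolean[] mask, int[] dst, int offset) {
        int[] compressed = rearrange(v, partitioningShuffle(mask));
        int count = trueCount(mask);
        System.arraycopy(compressed, 0, dst, offset, count);
        return offset + count;
    }

    public static void main(String[] args) {
        int[] v = {10, 20, 30, 40, 50, 60, 70, 80};
        boolean[] mask = {false, true, true, false, true, false, false, true};
        int[] dst = new int[8];
        int end = selectiveStore(v, mask, dst, 0);
        System.out.println(Arrays.toString(partitioningShuffle(mask)));
        System.out.println(Arrays.toString(dst) + " end=" + end);
    }
}
```

The three steps stay separate here on purpose: the hope is that the compiler could recognize this particular shape of shuffle and fuse the steps where the hardware allows it.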

That may be wishful thinking; however, we have made significant improvements to the optimization of shuffles and masking, which gives me some hope.

I think we should give it some more thought (on whether the C2 heroics are required or not, whether we can internally classify certain kinds of shuffle, etc.) before committing to a more specific/specialized operation, such as:

- Vector.compress(VectorMask<>); or perhaps
- a new class of operators, rearrange operators, whose behaviors are documented with regard to mask and non-mask variants.
(It’s tempting to create a special unary lanewise operator, whose non-mask variant returns the input. But, that would be a misuse.)

Paul.

> On Aug 31, 2021, at 2:18 AM, Ningsheng Jian <njian at openjdk.java.net> wrote:
> 
> On Mon, 30 Aug 2021 11:39:31 GMT, Joshua Zhu <jzhu at openjdk.org> wrote:
> 
>>> I think I would rather see a vector-to-vector compress operation, than a vector-to-memory operation that also includes compression. Isn't that the real underlying primitive?
>> 
>> Agree. John, thanks a lot for your review comments. This will make the primitive more friendly.
> 
> Yes, if we have to introduce a new API, a vector-to-vector compress operation sounds more reasonable. The Arm SVE instruction COMPACT could also do such work.
> 
> -------------
> 
> PR: https://git.openjdk.java.net/panama-vector/pull/115