[vectorIntrinsics+mask] RFR: 8273057: [vector] New VectorAPI "SelectiveStore"

Tue Sep 7 15:41:49 UTC 2021

On Tue, 7 Sep 2021 09:38:29 GMT, Joshua Zhu <jzhu at openjdk.org> wrote:

> > While adding new macro-level APIs is appealing, we can also extend the following existing Vector API methods to accept another boolean flag, "is_selective", under which compression/expansion is triggered.
> > ```
> > public static IntVector fromArray(VectorSpecies<Integer> species,
> >                                   int[] a,
> >                                   int offset,
> >                                   VectorMask<Integer> m)
> >
> > public final void intoArray(int[] a,
> >                             int offset,
> >                             VectorMask<Integer> m)
> > ```
> 
> Per design discussion in this thread, compared to vector-to-memory operation, vector-to-vector compress/expand operation is the more friendly primitive.
> It can also be used to "bridge to and from permutation simply by working with index vectors like iota, and perhaps (as sugar) lifting selected vector operations to shuffles."
> For different architectures, like SVE, memory destination version is also not supported natively.
> 
> > In this use case it's difficult to infer COMPRESSION through the auto-vectorizer, though we have made attempts in the past to infer complex loop patterns for VNNI instructions.
> 
> Could you elaborate on it please? I do not follow this.
>

I meant that auto-vectorizing the following loop, which mimics compression semantics, could be tricky when it is difficult to ascertain the independence of the memory references. For example, in the case below 'j' could start at an index in the middle of the array, so if the distance between the load at a[i] and the store at a[j] is less than the chosen vector width, vectorizing may result in an incorrect overwrite.

```
for (int i = 0; i < n; i++) {
    if (mask[i] > 1) {
        a[j++] = a[i];
    }
}
```
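To make the hazard concrete, here is a minimal sketch in plain Java (no Vector API needed). The class name, the `width` parameter and the "load a whole block first, then store" model of a vectorized loop are my own assumptions for illustration, not what C2 actually generates. It runs the scalar loop and a block-wise version on the same input, once with j starting behind i and once with j starting ahead of i.

```java
import java.util.Arrays;

// Hypothetical demo class; all names here are illustrative only.
class CompressOverlapDemo {
    // Scalar reference: the loop above, run on a copy of src with an explicit start index j.
    static int[] scalar(int[] src, int[] mask, int n, int j) {
        int[] a = src.clone();
        for (int i = 0; i < n; i++) {
            if (mask[i] > 1) {
                a[j++] = a[i];
            }
        }
        return a;
    }

    // Naive model of a vectorized loop: read a whole block of `width` elements,
    // then store the selected lanes contiguously starting at j.
    static int[] blockwise(int[] src, int[] mask, int n, int j, int width) {
        int[] a = src.clone();
        for (int i0 = 0; i0 < n; i0 += width) {
            int len = Math.min(width, n - i0);
            int[] lanes = Arrays.copyOfRange(a, i0, i0 + len); // all loads happen first
            for (int k = 0; k < len; k++) {
                if (mask[i0 + k] > 1) {
                    a[j++] = lanes[k];
                }
            }
        }
        return a;
    }

    public static void main(String[] args) {
        int[] src = {10, 11, 12, 13, 14, 15, 16, 17};
        int[] mask = {2, 2, 2, 2, 0, 0, 0, 0};
        // j starts at 0: stores never run ahead of loads, both versions agree.
        System.out.println(Arrays.equals(scalar(src, mask, 4, 0), blockwise(src, mask, 4, 0, 4)));
        // j starts at 1: the scalar loop re-reads values it has just stored,
        // the block-wise version does not, so the results differ.
        System.out.println(Arrays.toString(scalar(src, mask, 4, 1)));
        System.out.println(Arrays.toString(blockwise(src, mask, 4, 1, 4)));
    }
}
```

With j = 0 every store lands at or behind the element being read, so reading a block up front is harmless. With j = 1 the store to a[j] feeds the next iteration's load of a[i], and that loop-carried dependence is what the auto-vectorizer would have to rule out before it could infer a compress.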

> > This way we can also share common optimizations as you suggested earlier to convert masked COMPRESS to unmasked vector move for ALLTRUE mask, some work[1][2] is already in place on this front.
> > [1] https://github.com/openjdk/panama-vector/blob/master/src/hotspot/share/opto/vectornode.cpp#L752
> > [2] https://github.com/openjdk/panama-vector/blob/master/src/hotspot/share/opto/vectornode.cpp#L771
> 
> Yes. Since compress/expand op is also mask-based, this piece of optimization is common. Maybe we can think of one way to share this optimization for different kinds of masked operations?

Yes, agree.
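
As a rough illustration of the ALLTRUE case, here is a scalar model in plain Java. It assumes that a vector-to-vector compress packs the selected lanes to the low end and zero-fills the rest; the class and method names are hypothetical and this is not the C2 code behind the links above. With an all-true mask the compress degenerates to a plain move, which is the rewrite that could be shared across masked operations.

```java
import java.util.Arrays;

// Hypothetical scalar model; names are illustrative only.
class CompressAllTrueDemo {
    // Assumed compress semantics: selected lanes packed to the low end, remaining lanes zero.
    static int[] compress(int[] v, boolean[] m) {
        int[] r = new int[v.length];
        int j = 0;
        for (int i = 0; i < v.length; i++) {
            if (m[i]) {
                r[j++] = v[i];
            }
        }
        return r;
    }

    static boolean allTrue(boolean[] m) {
        for (boolean b : m) {
            if (!b) {
                return false;
            }
        }
        return true;
    }

    // Same result as compress(), but skips the lane shuffling when every lane is selected.
    static int[] compressWithShortcut(int[] v, boolean[] m) {
        return allTrue(m) ? v.clone() : compress(v, m);
    }

    public static void main(String[] args) {
        int[] v = {1, 2, 3, 4};
        boolean[] all = {true, true, true, true};
        boolean[] some = {false, true, false, true};
        System.out.println(Arrays.toString(compressWithShortcut(v, all)));
        System.out.println(Arrays.toString(compressWithShortcut(v, some)));
    }
}
```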

-------------

PR: https://git.openjdk.java.net/panama-vector/pull/115