[vectorIntrinsics+mask] RFR: 8266287: Basic mask IR implementation for the Vector API masking feature support [v3]

Thu Jun 24 05:12:47 UTC 2021

On Mon, 17 May 2021 04:04:03 GMT, Xiaohong Gong <xgong at openjdk.org> wrote:

> > Can you elaborate, please, what's the purpose of new nodes?
> > There's some duplication with existing vector nodes and I'd like to understand how you intend to merge them.
> 
> Sure. I added some new nodes for vector mask generation and mask operations. Their bottom type is the newly added mask type `TypeVectMask`. These nodes are created only on platforms that support the predicate feature. Other platforms still use the existing vector nodes (with `TypeVect` as the bottom type) for masks. An alternative solution for the mask IRs is to use the same vector nodes and set different bottom types depending on whether the platform supports the predicate feature. To avoid mixing mask operations with vector operations, we chose to separate masks from vectors by adding new nodes.
> 
> > `LoadVectorMask`/`StoreVectorMask` duplicate `VectorLoadMask (LoadVector)`/`StoreVector (VectorStoreMask)`. What kind of benefits do you expect from exposing the operation as a single node?
> > Depending on how `MaskToVector`/`VectorToMask` are specified, they can duplicate `VectorStoreMask`/`VectorLoadMask`.
> 
> We added the transformation for mask loading/storing: `VectorLoadMask (LoadVector)`/`StoreVector (VectorStoreMask)` ==> `LoadVectorMask`/`StoreVectorMask`. To be honest, this seems to benefit only Arm SVE. Unlike AVX-512, an SVE predicate works on the basic element type. Since the memory type for a mask is `boolean`, while the vector data type might be `byte, short, int, long, float, double`, we always need to unpack the boolean values to the corresponding data type when converting a vector mask to a predicate. So the SVE code for `VectorLoadMask (LoadVector)` with int type is:
> 
> ```
>   mov x0, #64
>   whilelo p0.b, zr, x0
>   ld1b  z16.b, p0, [address]   // LoadVector  (since the mask vector length is less than the max vector length,
>                                // we need the predicate to implement the partial loading)
> 
>   uunpklo z16.h, z16.h
>   uunpklo z16.s, z16.s         // VectorLoadMask
>   cmpne  p0.s, z16.s, #0       // VectorToMask
> ```
> 
> Since SVE supports load-and-extend instructions, we can use one node, `LoadVectorMask`, to load the mask values from memory and extend them to the data type. The code above could be optimized to:
> 
> ```
>   ld1b  z16.s, p7, [address]   // LoadVectorMask  (load mask values from boolean type memory, and extend the values to int type)
>   cmpne p0.s, z16.s, #0        // VectorToMask
> ```
> 
> Note that `MaskToVector`/`VectorToMask` differ from `VectorStoreMask`/`VectorLoadMask`: the former simply convert between a mask and a vector with the same element type, while `VectorStoreMask`/`VectorLoadMask` must also include the type cast between mask and vector, which SVE needs as well.
> 
> The existing pattern works well for AVX-512, but would it be possible to use the same optimized pattern there as well, @jatin-bhateja?
> 

Hi @XiaohongGong, apologies, I missed this query earlier. I think VectorLoadMask and VectorStoreMask serve to load/store a raw mask (boolean array) from/to a vector or predicate register. An explicit mask-casting IR node (VectorMaskCastNode) was introduced recently and can be plugged in after loading the mask value if a cast is needed.

As you mentioned, there is little use for LoadVectorMask/StoreVectorMask from an X86 perspective.
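
To make the two quoted SVE sequences concrete, here is a minimal scalar sketch in plain Java that mimics them lane by lane. It is only a model under stated assumptions: the class and method names (`MaskLoadModel`, `loadUnfused`, `loadFused`) are hypothetical and are not HotSpot or Vector API identifiers, and the governing predicate of the partial load is represented simply by the `lanes` argument.

```java
import java.util.Arrays;

// Scalar model only: mimics what the quoted SVE sequences compute.
// All names are hypothetical; nothing here is HotSpot or Vector API code.
final class MaskLoadModel {

    // Unfused shape: LoadVector (ld1b .b), VectorLoadMask (two uunpklo), VectorToMask (cmpne).
    static boolean[] loadUnfused(byte[] mem, int lanes) {
        byte[] bytes = Arrays.copyOf(mem, lanes);      // ld1b z16.b: one byte per mask lane
        short[] halves = new short[lanes];
        for (int i = 0; i < lanes; i++) {
            halves[i] = (short) (bytes[i] & 0xFF);     // uunpklo .h: zero-extend byte to short
        }
        int[] words = new int[lanes];
        for (int i = 0; i < lanes; i++) {
            words[i] = halves[i] & 0xFFFF;             // uunpklo .s: zero-extend short to int
        }
        return nonZero(words);                         // cmpne p0.s, z16.s, #0
    }

    // Fused shape: LoadVectorMask (widening ld1b into .s lanes), then VectorToMask (cmpne).
    static boolean[] loadFused(byte[] mem, int lanes) {
        int[] words = new int[lanes];
        for (int i = 0; i < lanes; i++) {
            words[i] = mem[i] & 0xFF;                  // ld1b z16.s: load byte, zero-extend to int
        }
        return nonZero(words);                         // cmpne p0.s, z16.s, #0
    }

    // Predicate generation: a lane is active when its value is non-zero.
    private static boolean[] nonZero(int[] words) {
        boolean[] pred = new boolean[words.length];
        for (int i = 0; i < words.length; i++) {
            pred[i] = words[i] != 0;
        }
        return pred;
    }

    public static void main(String[] args) {
        byte[] mem = {1, 0, 0, 1, 1, 1, 0, 0};         // a boolean[] as laid out in memory
        System.out.println(Arrays.toString(loadUnfused(mem, 4)));
        System.out.println(Arrays.toString(loadFused(mem, 4)));
    }
}
```

Both methods yield the same predicate for the same input; the fused shape just skips the intermediate unpack steps, which is the saving described for SVE in the quoted text.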

> > If mask value always has a vector type,`AndVMask`/`OrVMask`/`XorVMask` can be replaced by `AndV`/`OrV`/`XorV` and special implementations for 2 representations (canonical and native). Same considerations apply to `VectorCmpMaskGen` (compared to `VectorMaskCmp`).
> 
> In the end we added `AndVMask`/`OrVMask`/`XorVMask` for mask logical operations to separate them from the vector logical operations. It might be confusing if the bottom type of `AndV`/`OrV`/`XorV` could be either `TypeVect` or `TypeVectMask` on the same platform.
> 
> In conclusion, we are not actually sure what the best way to represent mask operations is, but we prefer new mask IRs (definitely with `TypeVectMask`) for all mask operations rather than just changing the bottom type of the existing vector nodes. Any ideas about it? It would be very helpful if you could give us more advice. Thanks so much!
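
As a rough illustration of the two mask representations mentioned in the quoted discussion, the plain Java sketch below models a mask held in a vector (one lane per element) next to a mask held as predicate bits, and applies a logical AND to each. The encoding is an assumption made for illustration (all-ones or zero int lanes for the vector form, one bit per lane for the predicate form), and `MaskRepModel` and its methods are hypothetical names, not C2 node implementations.

```java
// Scalar model only. Assumed encodings, for illustration:
//   vector-style mask:    one int lane per element, -1 = set, 0 = clear
//   predicate-style mask: one bit per lane, packed into a long
final class MaskRepModel {

    // Logical AND on the vector-style representation (lane-wise, like a vector AND).
    static int[] andVector(int[] a, int[] b) {
        int[] r = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            r[i] = a[i] & b[i];
        }
        return r;
    }

    // Logical AND on the predicate-style representation (a single bitwise AND).
    static long andPredicate(long a, long b) {
        return a & b;
    }

    // Vector-style to predicate-style: lane i non-zero sets bit i.
    static long toPredicate(int[] v) {
        long bits = 0L;
        for (int i = 0; i < v.length; i++) {
            if (v[i] != 0) {
                bits |= 1L << i;
            }
        }
        return bits;
    }

    // Predicate-style to vector-style: bit i set yields an all-ones lane.
    static int[] toVector(long bits, int lanes) {
        int[] v = new int[lanes];
        for (int i = 0; i < lanes; i++) {
            v[i] = ((bits >>> i) & 1L) != 0 ? -1 : 0;
        }
        return v;
    }

    public static void main(String[] args) {
        int[] a = {-1, 0, -1, -1};
        int[] b = {-1, -1, 0, -1};
        long viaVector = toPredicate(andVector(a, b));
        long viaPredicate = andPredicate(toPredicate(a), toPredicate(b));
        System.out.println(viaVector == viaPredicate);
    }
}
```

In this model the two forms agree on the result and differ only in where the mask lives, which is the distinction the separate mask nodes and `TypeVectMask` are meant to make explicit in the IR.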

-------------

PR: https://git.openjdk.java.net/panama-vector/pull/78