[vectorIntrinsics+mask] RFR: 8273406: Optimize various masked vector operations for AVX512 target.
Jatin Bhateja
jbhateja at openjdk.java.net
Tue Sep 7 14:59:04 UTC 2021
This patch continues the X86 backend support for optimizing masked operations on AVX-512 targets (JDK-8262356).
Summary of changes:
1) Support for masked rotate left and right operations over integer/long vectors.
2) Support for masked square root operation over float/double vectors.
3) Support for masked logical shift left and logical/arithmetic shift right operations with a constant shift count.
4) Optimized the VectorMask.not operation by emitting a direct KNOT instruction.
5) Extended masking optimization support to the X86 KNL target, which has a limited set of AVX-512 features.
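
As a rough illustration of items 1 and 4, here is a scalar Java model of what a masked rotate and a mask negation compute. It is only a sketch under stated assumptions: the names `MaskedOpsModel`, `maskedRotateLeft` and `maskNot` are hypothetical, the mask is assumed to be held in the low bits of a `long`, and unset lanes are assumed to keep their source value (merge semantics). It is neither the Vector API nor the code emitted by this patch.

```java
import java.util.Arrays;

final class MaskedOpsModel {
    // Scalar model of a masked lanewise rotate-left (assumed merge semantics):
    // lanes whose mask bit is set are rotated, all other lanes keep the source value.
    static int[] maskedRotateLeft(int[] src, int shift, long maskBits) {
        int[] dst = new int[src.length];
        for (int lane = 0; lane < src.length; lane++) {
            boolean set = ((maskBits >>> lane) & 1L) != 0;
            dst[lane] = set ? Integer.rotateLeft(src[lane], shift) : src[lane];
        }
        return dst;
    }

    // Scalar model of mask negation: complement only the low laneCount bits,
    // so bits above the lane count stay clear.
    static long maskNot(long maskBits, int laneCount) {
        long valid = (laneCount == 64) ? -1L : (1L << laneCount) - 1;
        return ~maskBits & valid;
    }

    public static void main(String[] args) {
        int[] src = {1, 2, 3, 4};
        long mask = 0b0101L; // lanes 0 and 2 are set
        System.out.println(Arrays.toString(maskedRotateLeft(src, 1, mask))); // [2, 2, 6, 4]
        System.out.println(Long.toBinaryString(maskNot(mask, 4)));           // 1010
    }
}
```

Keeping the mask as a plain bit set makes the negation a single complement, which is the property a direct KNOT on an opmask register relies on.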
- Currently the vector type associated with the VectorLoadMask operation is created during the parsing stage.
For targets supporting opmask registers, the lane type is explicitly set to BOOLEAN irrespective of the primitive
type of the species, i.e. for the Int512 species the ideal type TypeVectMask(16,BOOL) represents a vector of 16 BOOLEAN elements,
each of which represents the mask bit for the corresponding vector lane.
This type information is also associated with the respective mask boxes (Int512Mask).
- During macro expansion, vbox/vunbox nodes are broken down into granular, target-mappable ideal nodes.
```
VectorBoxNode -> VectorStoreMask + StoreVector
VectorUnboxNode -> LoadVector + VectorLoadMask
```
At this stage the vector type (TypeVectMask(16,BOOL)) earlier associated with the vunbox node is used to create the
type for the VectorLoadMask operation.
- Masks can be propagated either through a vector (non-AVX512 targets) or using opmask registers (K1-K7).
The decision to create the correct ideal type based on the target features is delegated to the low-level
type creation routine TypeVect::makemask.
- This creates a problem for targets like KNL, which support only a limited set of AVX-512 features, i.e. they do
not support the AVX512VL and AVX512BW features.
- For the Int512 species, the initial ideal type constructed during parsing is based on the primitive type and
lane count associated with the species, but during macro expansion the type creation
decision is based on the vector type associated with the v[u]box nodes, i.e. TypeVectMask(16,BOOL).
Thus for the KNL target the incorrect vector mask type TypeVectX(16,BOOL) gets created, since KNL does not
support vector length extensions (128/256-bit operations using EVEX-encoded instructions).
- There are multiple ways to fix this discrepancy; the cleanest approach is to create the ideal type TypeVectMask
based on the primitive lane type of the species, instead of always setting the lane type to BOOLEAN.
This also preserves the original lane type information, which was needed in some cases, e.g.
reinterpretation operations over masks. To circumvent such issues, explicit src/dst primitive types
were added to ideal nodes.
- Also, this does not disturb the register mask and spilling behavior associated with opmask registers,
thus the change is transparent to backend passes.
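
The box/unbox expansion described above can be pictured with a small scalar Java model. This is a conceptual sketch only: `MaskBoxModel`, `box` and `unbox` are hypothetical names, the mask is assumed to live in the low bits of a `long` (standing in for an opmask register), and the boxed payload is assumed to be one boolean per lane. It does not reproduce the C2 ideal nodes or their type handling.

```java
final class MaskBoxModel {
    // Models VectorStoreMask + StoreVector: expand mask bits into one boolean per lane.
    // Assumes laneCount <= 64 so the mask fits in a long.
    static boolean[] box(long maskBits, int laneCount) {
        boolean[] payload = new boolean[laneCount];
        for (int lane = 0; lane < laneCount; lane++) {
            payload[lane] = ((maskBits >>> lane) & 1L) != 0;
        }
        return payload;
    }

    // Models LoadVector + VectorLoadMask: pack one boolean per lane back into mask bits.
    static long unbox(boolean[] payload) {
        long maskBits = 0L;
        for (int lane = 0; lane < payload.length; lane++) {
            if (payload[lane]) {
                maskBits |= 1L << lane;
            }
        }
        return maskBits;
    }

    public static void main(String[] args) {
        long mask = 0b1010_0000_1111_0101L;   // 16 lanes, as for an Int512 mask
        boolean[] payload = box(mask, 16);
        System.out.println(unbox(payload) == mask); // round trip preserves the mask
    }
}
```

Note that the boxed form carries only the lane count, not the lane type of the species, which is why the type chosen when the mask is reloaded has to come from somewhere else.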
Validation:
Patch regression-tested with tier1-3 tests at AVX Level=0,1,2,3 and with UseKNLSetting.
-------------
Commit messages:
- 8273406: Optimize various masked vector operations for AVX512 target.
Changes: https://git.openjdk.java.net/panama-vector/pull/122/files
Webrev: https://webrevs.openjdk.java.net/?repo=panama-vector&pr=122&range=00
Issue: https://bugs.openjdk.java.net/browse/JDK-8273406
Stats: 736 lines in 14 files changed: 666 ins; 26 del; 44 mod
Patch: https://git.openjdk.java.net/panama-vector/pull/122.diff
Fetch: git fetch https://git.openjdk.java.net/panama-vector pull/122/head:pull/122
PR: https://git.openjdk.java.net/panama-vector/pull/122