[vectorIntrinsics+mask] RFR: 8273406: Optimize various masked vector operations for AVX512 target.

Tue Sep 7 14:59:04 UTC 2021

This patch continues the X86 backend work on optimizing masked operations for AVX-512 targets (JDK-8262356).

Summary of changes:

1) Support for masked rotate left and right operations over integer/long vectors.

2) Support for masked square root operation over float/double vectors.

3) Support for masked logical shift-left and logical/arithmetic shift-right operations with a constant shift count.

4) Optimized the VectorMask.not operation by emitting a direct KNOT instruction.

5) Extended masking optimization support to the X86 KNL target, which has a limited set of AVX-512 features.

      - Currently, the vector type associated with a VectorLoadMask operation is created during the parsing stage.
        For targets supporting opmask registers, the lane type is explicitly set to BOOLEAN irrespective of the primitive
        type of the species, i.e. for the Int512 species the ideal type TypeVectMask(16,BOOL) represents a vector of
        16 BOOLEAN elements, each of which represents a mask bit for the corresponding vector lane.
        This type information is also associated with the respective mask boxes (Int512Mask).

      - During macro expansion, vbox/vunbox nodes are broken down into granular, target-mappable ideal nodes.

          ``` 
              VectorBoxNode   -> VectorStoreMask + StoreVector

              VectorUnboxNode -> LoadVector + VectorLoadMask 
          ```

        At this stage, the vector type (TypeVectMask(16,BOOL)) earlier associated with the vunbox node is used to
        create the type for the VectorLoadMask operation.

      - Masks can be propagated either through a vector (non-AVX512 targets) or using opmask registers (K1-K7).
        The decision to create the correct ideal type based on the target features is delegated to the low-level
        type creation routine TypeVect::makemask.

      - This creates a problem for targets like KNL, which support only a limited set of AVX-512 features, i.e. do
        not support the AVX512VL and AVX512BW features.

      - For the Int512 species, the initial ideal type constructed during parsing is based on the primitive type and
        lane count associated with the species, but during macro expansion the type creation
        decision is based on the vector type associated with the v[u]box nodes, i.e. TypeVectMask(16,BOOL).
        Thus, for a KNL target, the incorrect vector mask type TypeVectX(16,BOOL) gets created, since KNL does not
        support the vector length extension (128/256-bit operations over EVEX-encoded instructions).

      - There are multiple ways to fix this discrepancy; the cleanest approach is to create the ideal type TypeVectMask
        based on the primitive lane type of the species, instead of always setting the lane type to BOOLEAN.
        This also preserves the original lane type information, which was needed in some cases, e.g. a
        reinterpretation operation over a mask. To circumvent such issues, explicit src/dst primitive types
        were added to ideal nodes.

      - Also, this does not disturb the register mask and spilling behavior associated with opmask registers,
        so the change is transparent to backend passes.
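
The KNL discrepancy described above can be sketched with a small stand-alone model. This is not HotSpot code: the class, the method names and the size rule (opmask types are usable for 512-bit vectors with AVX512F alone, while narrower vectors additionally need AVX512VL) are assumptions inferred from the description, and the real decision lives in TypeVect::makemask and the matcher.

```java
// Illustrative model only; names and the size rule are assumptions, not HotSpot code.
final class MaskTypeModel {
    // Lane kinds used in the description, with their size in bytes.
    enum Lane {
        BOOL(1), INT(4), LONG(8);
        final int bytes;
        Lane(int bytes) { this.bytes = bytes; }
    }

    // Plain vector type name for a given width, following the TypeVectS/D/X/Y/Z naming.
    static String vectorTypeName(int bits) {
        switch (bits) {
            case 32:  return "TypeVectS";
            case 64:  return "TypeVectD";
            case 128: return "TypeVectX";
            case 256: return "TypeVectY";
            case 512: return "TypeVectZ";
            default:  throw new IllegalArgumentException("unsupported width: " + bits);
        }
    }

    // Assumed rule: an opmask type is chosen only if the vector implied by
    // (lane type x lane count) is 512 bits wide, or the target has AVX512VL.
    static String maskType(boolean avx512f, boolean avx512vl, Lane lane, int lanes) {
        int bits = lane.bytes * 8 * lanes;
        boolean useOpmask = avx512f && (bits == 512 || avx512vl);
        String name = useOpmask ? "TypeVectMask" : vectorTypeName(bits);
        return name + "(" + lanes + "," + lane + ")";
    }

    public static void main(String[] args) {
        // KNL-like target: AVX512F without AVX512VL.
        System.out.println(maskType(true, false, Lane.BOOL, 16)); // TypeVectX(16,BOOL)
        System.out.println(maskType(true, false, Lane.INT, 16));  // TypeVectMask(16,INT)
        // Target with AVX512VL: the BOOLEAN lane type is harmless.
        System.out.println(maskType(true, true, Lane.BOOL, 16));  // TypeVectMask(16,BOOL)
    }
}
```

Under these assumptions, deriving the size from the BOOLEAN lane type makes a 16-lane mask look like a 128-bit vector, whereas keeping the INT lane type of the Int512 species yields the full 512-bit width and therefore an opmask type on KNL as well.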

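For reference, the lane-wise behavior expected from the masked operations listed in the summary can be written as a scalar model. This is a minimal sketch in plain Java, so it runs without the incubator module; the class and method names are hypothetical. It assumes the usual Vector API convention that lanes with an unset mask bit keep the value of the source vector, and it says nothing about the instructions the backend emits.

```java
import java.util.Arrays;

// Scalar reference model of masked lane-wise operations; names are hypothetical.
final class MaskedOpsModel {
    // Masked rotate-left: only lanes with a set mask bit are rotated.
    static int[] rotateLeft(int[] src, int count, boolean[] mask) {
        int[] res = src.clone();
        for (int i = 0; i < src.length; i++) {
            if (mask[i]) res[i] = Integer.rotateLeft(src[i], count);
        }
        return res;
    }

    // Masked square root over double lanes.
    static double[] sqrt(double[] src, boolean[] mask) {
        double[] res = src.clone();
        for (int i = 0; i < src.length; i++) {
            if (mask[i]) res[i] = Math.sqrt(src[i]);
        }
        return res;
    }

    // Masked arithmetic shift-right by a constant count.
    static int[] shiftRightArithmetic(int[] src, int count, boolean[] mask) {
        int[] res = src.clone();
        for (int i = 0; i < src.length; i++) {
            if (mask[i]) res[i] = src[i] >> count;
        }
        return res;
    }

    // Mask negation: every mask bit is flipped.
    static boolean[] not(boolean[] mask) {
        boolean[] res = new boolean[mask.length];
        for (int i = 0; i < mask.length; i++) res[i] = !mask[i];
        return res;
    }

    public static void main(String[] args) {
        boolean[] m = {true, false, true, false};
        System.out.println(Arrays.toString(rotateLeft(new int[] {1, 1, 0x80000000, 4}, 1, m)));
        System.out.println(Arrays.toString(sqrt(new double[] {9.0, 9.0, 16.0, 16.0}, m)));
        System.out.println(Arrays.toString(shiftRightArithmetic(new int[] {-8, -8, 8, 8}, 1, m)));
        System.out.println(Arrays.toString(not(m)));
    }
}
```

A model like this can serve as the expected-value side of a test that compares vectorized results against scalar results lane by lane.
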
Validation:
Patch was run through tier1-3 regression tests at AVX level=0,1,2,3 and with UseKNLSetting.

-------------

Commit messages:
 - 8273406: Optimize various masked vector operations for AVX512 target.

Changes: https://git.openjdk.java.net/panama-vector/pull/122/files
 Webrev: https://webrevs.openjdk.java.net/?repo=panama-vector&pr=122&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8273406
  Stats: 736 lines in 14 files changed: 666 ins; 26 del; 44 mod
  Patch: https://git.openjdk.java.net/panama-vector/pull/122.diff
  Fetch: git fetch https://git.openjdk.java.net/panama-vector pull/122/head:pull/122

PR: https://git.openjdk.java.net/panama-vector/pull/122