RFR: 8306136: [vectorapi] Intrinsics of VectorMask.laneIsSet()

Eric Liu eliu at openjdk.org
Mon May 29 12:03:07 UTC 2023


VectorMask.laneIsSet() [1] is implemented based on VectorMask.toLong() [2], and it's performance highly depends on the intrinsification of toLong(). However, if `toLong()` is failed to intrinsify, on some architectures or unsupported species, it's much more expensive than pure getBits(). Besides, some CPUs (e.g. with Arm Neon) may not have efficient instructions to implementation toLong(), so we propose to intrinsify VectorMask.laneIsSet separately.

This patch optimize laneIsSet() by calling the existing intrinsic method VectorSupport.extract(), which actually does not introduce new intrinsic method. The C2 compiler intrinsification logic to support _VectorExtract has also been extended to better support laneIsSet(). It tries to extract the mask's lane value with an ExtractUB node if the hardware backend supports it. While on hardware without ExtractUB backend support , c2 will still try to generate toLong() related nodes, which behaves the same as before the patch.

Key changes in this patch:

1. Reuse intrinsic `VectorSupport.extract()` in Java side. No new intrinsic method is introduced.
2. In compiler, `ExtractUBNode` is generated if backend support is. If not, the original "toLong" pattern is generated if it's implemented. Otherwise, it uses the default Java `getBits[i]` rather than the expensive and complicated toLong() based implementation.
3. Enable `ExtractUBNode` on AArch64 to extract the lane value for a vector mask in compiler, together with changing its bottom type to TypeInt::BOOL. This helps optimize the conditional selection generated by

   ```

       public boolean laneIsSet(int i) {
           return VectorSupport.extract(..., defaultImpl) == 1L;
       }

   ```

[Test]
hotspot:compiler/vectorapi and jdk/incubator/vector passed.

[Performance]

Below shows the performance gain on 128-bit vector size Neon machine. For 64 and 128 SPECIES, the improvment caused by this intrinsics. For other SPECIES which can not be intrinfied, performance gain comes from the default Java implementation changes, i.e. getBits[i] vs. toLong().


Benchmark                               Gain (after/before)
microMaskLaneIsSetByte128_con           2.47
microMaskLaneIsSetByte128_var           1.82
microMaskLaneIsSetByte256_con           3.01
microMaskLaneIsSetByte256_var           3.04
microMaskLaneIsSetByte512_con           4.83
microMaskLaneIsSetByte512_var           4.86
microMaskLaneIsSetByte64_con            1.57
microMaskLaneIsSetByte64_var            1.18
microMaskLaneIsSetDouble128_con         1.19
microMaskLaneIsSetDouble128_var         1.13
microMaskLaneIsSetDouble256_con         1.33
microMaskLaneIsSetDouble256_var         1.40
microMaskLaneIsSetDouble512_con         1.62
microMaskLaneIsSetDouble512_var         1.70
microMaskLaneIsSetDouble64_con          1.18
microMaskLaneIsSetDouble64_var          1.21
microMaskLaneIsSetFloat128_con          1.56
microMaskLaneIsSetFloat128_var          1.25
microMaskLaneIsSetFloat256_con          1.63
microMaskLaneIsSetFloat256_var          1.68
microMaskLaneIsSetFloat512_con          2.13
microMaskLaneIsSetFloat512_var          2.18
microMaskLaneIsSetFloat64_con           1.21
microMaskLaneIsSetFloat64_var           1.11
microMaskLaneIsSetInt128_con            1.54
microMaskLaneIsSetInt128_var            1.24
microMaskLaneIsSetInt256_con            1.63
microMaskLaneIsSetInt256_var            1.72
microMaskLaneIsSetInt512_con            2.09
microMaskLaneIsSetInt512_var            2.19
microMaskLaneIsSetInt64_con             1.22
microMaskLaneIsSetInt64_var             1.13
microMaskLaneIsSetLong128_con           1.21
microMaskLaneIsSetLong128_var           1.08
microMaskLaneIsSetLong256_con           1.29
microMaskLaneIsSetLong256_var           1.41
microMaskLaneIsSetLong512_con           1.62
microMaskLaneIsSetLong512_var           1.67
microMaskLaneIsSetLong64_con            1.18
microMaskLaneIsSetLong64_var            1.21
microMaskLaneIsSetShort128_con          1.57
microMaskLaneIsSetShort128_var          1.19
microMaskLaneIsSetShort256_con          2.09
microMaskLaneIsSetShort256_var          2.17
microMaskLaneIsSetShort512_con          3.00
microMaskLaneIsSetShort512_var          3.08
microMaskLaneIsSetShort64_con           1.56
microMaskLaneIsSetShort64_var           1.25

No regression on x86.

[1] https://github.com/openjdk/jdk/blob/3d3eaed9133dbe728ca8e00a626d33f7e35ba9ff/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/AbstractMask.java#L72
[2] https://github.com/openjdk/jdk/blob/3d3eaed9133dbe728ca8e00a626d33f7e35ba9ff/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/Int128Vector.java#L726
[3] https://github.com/openjdk/jdk/blob/3d3eaed9133dbe728ca8e00a626d33f7e35ba9ff/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/AbstractMask.java#L195

-------------

Commit messages:
 - 8306136: [vectorapi] Intrinsics of VectorMask.laneIsSet()

Changes: https://git.openjdk.org/jdk/pull/14200/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14200&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8306136
  Stats: 1101 lines in 40 files changed: 1023 ins; 29 del; 49 mod
  Patch: https://git.openjdk.org/jdk/pull/14200.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/14200/head:pull/14200

PR: https://git.openjdk.org/jdk/pull/14200


More information about the hotspot-dev mailing list