RFR: 8319872: AArch64: [vectorapi] Implementation of unsigned (zero extended) casts

Eric Liu eliu at openjdk.org
Wed Nov 15 07:54:41 UTC 2023


The Vector API defines zero-extend operations [1], which C2 intrinsifies into `VectorUCastNode`. This patch adds the backend implementation of `VectorUCastNode` on AArch64.
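For readers less familiar with these operations, the per-lane semantics can be sketched in scalar Java. This is only an illustration of what zero extension means for each element, not the code added by this patch; the class and method names below are hypothetical and simply mirror the benchmark names (byte2Int, short2Int, int2Long).

```java
import java.util.Arrays;

// Scalar reference for zero-extending casts. Hypothetical helper class,
// not part of the JDK or of this patch.
public class ZeroExtendReference {

    // Zero-extend each byte lane to an int lane: the upper 24 bits are
    // cleared instead of being filled with copies of the sign bit.
    static int[] byte2Int(byte[] src) {
        int[] dst = new int[src.length];
        for (int i = 0; i < src.length; i++) {
            dst[i] = src[i] & 0xFF;
        }
        return dst;
    }

    // Zero-extend each short lane to an int lane.
    static int[] short2Int(short[] src) {
        int[] dst = new int[src.length];
        for (int i = 0; i < src.length; i++) {
            dst[i] = src[i] & 0xFFFF;
        }
        return dst;
    }

    // Zero-extend each int lane to a long lane.
    static long[] int2Long(int[] src) {
        long[] dst = new long[src.length];
        for (int i = 0; i < src.length; i++) {
            dst[i] = src[i] & 0xFFFFFFFFL;
        }
        return dst;
    }

    public static void main(String[] args) {
        byte[] bytes = {(byte) 0xFF, (byte) 0x80, 0x7F};
        // A plain (sign-extending) cast would give -1, -128, 127 here.
        System.out.println(Arrays.toString(byte2Int(bytes))); // [255, 128, 127]
    }
}
```

A sign-extending cast differs only for lanes whose top bit is set, which is why the two kinds of cast need distinct nodes in the compiler.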

The microbenchmark shows a significant performance improvement. On my test machine (SVE, 256-bit), the results are shown below, where Gain is the relative improvement, (After - Before) / Before:



  Benchmark                     Before     After       Units   Gain
  VectorZeroExtend.byte2Int     3168.251   243012.399  ops/ms  75.70
  VectorZeroExtend.byte2Long    3212.201   216291.588  ops/ms  66.33
  VectorZeroExtend.byte2Short   3391.968   182655.365  ops/ms  52.85
  VectorZeroExtend.int2Long     1012.197    80448.553  ops/ms  78.48
  VectorZeroExtend.short2Int    1812.471   153416.828  ops/ms  83.65
  VectorZeroExtend.short2Long   1788.382   129794.814  ops/ms  71.58


On NEON systems, we can expect a similar performance boost, since these operations are now successfully intrinsified.

Since `VectorUCastNode` is currently used only for the Vector API's zero extension, this patch also adds assertions to the node definitions to clarify their usage.

[TEST]
compiler/vectorapi and jdk/incubator/vector passed on NEON and SVE machines.

[1] https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorOperators.java#L726

-------------

Commit messages:
 - 8319872: AArch64: [vectorapi] Implementation of unsigned (zero extended) casts

Changes: https://git.openjdk.org/jdk/pull/16670/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16670&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8319872
  Stats: 376 lines in 7 files changed: 337 ins; 0 del; 39 mod
  Patch: https://git.openjdk.org/jdk/pull/16670.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/16670/head:pull/16670

PR: https://git.openjdk.org/jdk/pull/16670

