RFR: 8372980: [VectorAPI] AArch64: Add intrinsic support for unsigned min/max reduction operations [v4]

Mon Jan 26 09:26:35 UTC 2026

> This patch adds intrinsic support for UMIN and UMAX reduction operations in the Vector API on AArch64, enabling direct hardware instruction mapping for better performance.
> 
> Changes:
> --------
> 
> 1. C2 mid-end:
>    - Added UMinReductionVNode and UMaxReductionVNode
> 
> 2. AArch64 Backend:
>    - Added uminp/umaxp/sve_uminv/sve_umaxv instructions
>    - Updated match rules for all vector sizes and element types
>    - Both NEON and SVE implementation are supported
> 
> 3. Test:
>    - Added UMIN_REDUCTION_V and UMAX_REDUCTION_V to IRNode.java
>    - Added assembly tests in aarch64-asmtest.py for new instructions
>    - Added a JTReg test file VectorUMinMaxReductionTest.java
> 
> Different configurations were tested on aarch64 and x86 machines, and all tests passed.
> 
> Test results of JMH benchmarks from the panama-vector project:
> --------
> 
> On a Nvidia Grace machine with 128-bit SVE:
> 
> Benchmark                       Unit    Before  Error   After           Error   Uplift
> Byte128Vector.UMAXLanes         ops/ms  411.60  42.18   25226.51        33.92   61.29
> Byte128Vector.UMAXMaskedLanes   ops/ms  558.56  85.12   25182.90        28.74   45.09
> Byte128Vector.UMINLanes         ops/ms  645.58  780.76  28396.29        103.11  43.99
> Byte128Vector.UMINMaskedLanes   ops/ms  621.09  718.27  26122.62        42.68   42.06
> Byte64Vector.UMAXLanes          ops/ms  296.33  34.44   14357.74        15.95   48.45
> Byte64Vector.UMAXMaskedLanes    ops/ms  376.54  44.01   14269.24        21.41   37.90
> Byte64Vector.UMINLanes          ops/ms  373.45  426.51  15425.36        66.20   41.31
> Byte64Vector.UMINMaskedLanes    ops/ms  353.32  346.87  14201.37        13.79   40.19
> Int128Vector.UMAXLanes          ops/ms  174.79  192.51  9906.07         286.93  56.67
> Int128Vector.UMAXMaskedLanes    ops/ms  157.23  206.68  10246.77        11.44   65.17
> Int64Vector.UMAXLanes           ops/ms  95.30   126.49  4719.30         98.57   49.52
> Int64Vector.UMAXMaskedLanes     ops/ms  88.19   87.44   4693.18         19.76   53.22
> Long128Vector.UMAXLanes         ops/ms  80.62   97.82   5064.01         35.52   62.82
> Long128Vector.UMAXMaskedLanes   ops/ms  78.15   102.91  5028.24         8.74    64.34
> Long64Vector.UMAXLanes          ops/ms  47.56   62.01   46.76           52.28   0.98
> Long64Vector.UMAXMaskedLanes    ops/ms  45.44   46.76   45.79           42.91   1.01
> Short128Vector.UMAXLanes        ops/ms  316.65  410.30  14814.82        23.65   46.79
> Short128Vector.UMAXMaskedLanes  ops/ms  308.90  351.78  15155.26        31.03   49.06
> Sh...

Eric Fang has updated the pull request incrementally with one additional commit since the last revision:

  Move helper functions into c2_MacroAssembler_aarch64.hpp

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/28693/files
  - new: https://git.openjdk.org/jdk/pull/28693/files/fc3dee3d..10d74f13

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=28693&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28693&range=02-03

  Stats: 104 lines in 2 files changed: 26 ins; 64 del; 14 mod
  Patch: https://git.openjdk.org/jdk/pull/28693.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28693/head:pull/28693

PR: https://git.openjdk.org/jdk/pull/28693