RFR: 8372980: [VectorAPI] AArch64: Add intrinsic support for unsigned min/max reduction operations

Thu Dec 11 11:11:31 UTC 2025

On Mon, 8 Dec 2025 03:29:03 GMT, Eric Fang <erfang at openjdk.org> wrote:

> This patch adds intrinsic support for UMIN and UMAX reduction operations in the Vector API on AArch64, enabling direct hardware instruction mapping for better performance.
> 
> Changes:
> --------
> 
> 1. C2 mid-end:
>    - Added UMinReductionVNode and UMaxReductionVNode
> 
> 2. AArch64 Backend:
>    - Added uminp/umaxp/sve_uminv/sve_umaxv instructions
>    - Updated match rules for all vector sizes and element types
>    - Both NEON and SVE implementations are supported
> 
> 3. Test:
>    - Added UMIN_REDUCTION_V and UMAX_REDUCTION_V to IRNode.java
>    - Added assembly tests in aarch64-asmtest.py for new instructions
>    - Added a JTReg test file VectorUMinMaxReductionTest.java
> 
> Different configurations were tested on aarch64 and x86 machines, and all tests passed.
> 
> Test results of JMH benchmarks from the panama-vector project:
> --------
> 
> On an NVIDIA Grace machine with 128-bit SVE:
> 
> Benchmark                       Unit    Before  Error   After           Error   Uplift
> Byte128Vector.UMAXLanes         ops/ms  411.60  42.18   25226.51        33.92   61.29
> Byte128Vector.UMAXMaskedLanes   ops/ms  558.56  85.12   25182.90        28.74   45.09
> Byte128Vector.UMINLanes         ops/ms  645.58  780.76  28396.29        103.11  43.99
> Byte128Vector.UMINMaskedLanes   ops/ms  621.09  718.27  26122.62        42.68   42.06
> Byte64Vector.UMAXLanes          ops/ms  296.33  34.44   14357.74        15.95   48.45
> Byte64Vector.UMAXMaskedLanes    ops/ms  376.54  44.01   14269.24        21.41   37.90
> Byte64Vector.UMINLanes          ops/ms  373.45  426.51  15425.36        66.20   41.31
> Byte64Vector.UMINMaskedLanes    ops/ms  353.32  346.87  14201.37        13.79   40.19
> Int128Vector.UMAXLanes          ops/ms  174.79  192.51  9906.07         286.93  56.67
> Int128Vector.UMAXMaskedLanes    ops/ms  157.23  206.68  10246.77        11.44   65.17
> Int64Vector.UMAXLanes           ops/ms  95.30   126.49  4719.30         98.57   49.52
> Int64Vector.UMAXMaskedLanes     ops/ms  88.19   87.44   4693.18         19.76   53.22
> Long128Vector.UMAXLanes         ops/ms  80.62   97.82   5064.01         35.52   62.82
> Long128Vector.UMAXMaskedLanes   ops/ms  78.15   102.91  5028.24         8.74    64.34
> Long64Vector.UMAXLanes          ops/ms  47.56   62.01   46.76           52.28   0.98
> Long64Vector.UMAXMaskedLanes    ops/ms  45.44   46.76   45.79           42.91   1.01
> Short128Vector.UMAXLanes        ops/ms  316.65  410.30  14814.82        23.65   46.79
> Short128Vector.UMAXMaskedLanes  ops/ms  308.90  351.78  15155.26        31.03   49.06
> Sh...
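
For readers unfamiliar with the operation being intrinsified, here is a minimal scalar sketch of what an unsigned min/max reduction over `int` lanes computes. It is not code from the patch: the class and method names are hypothetical, and it deliberately uses only `java.lang.Integer` helpers rather than the incubating Vector API, where the corresponding call would presumably be `reduceLanes(VectorOperators.UMAX)` or `reduceLanes(VectorOperators.UMIN)`.

```java
public class UnsignedReductionSketch {
    // Hypothetical scalar reference for an unsigned max reduction.
    static int umaxReduce(int[] lanes) {
        int acc = 0; // identity: 0 is the smallest unsigned 32-bit value
        for (int v : lanes) {
            // compareUnsigned treats both operands as unsigned 32-bit values
            if (Integer.compareUnsigned(v, acc) > 0) {
                acc = v;
            }
        }
        return acc;
    }

    // Hypothetical scalar reference for an unsigned min reduction.
    static int uminReduce(int[] lanes) {
        int acc = -1; // identity: 0xFFFFFFFF is the largest unsigned 32-bit value
        for (int v : lanes) {
            if (Integer.compareUnsigned(v, acc) < 0) {
                acc = v;
            }
        }
        return acc;
    }

    public static void main(String[] args) {
        // -1 is 0xFFFFFFFF and Integer.MIN_VALUE is 0x80000000 when read as unsigned
        int[] lanes = {1, -1, 7, Integer.MIN_VALUE};
        System.out.println(Integer.toUnsignedString(umaxReduce(lanes))); // 4294967295
        System.out.println(Integer.toUnsignedString(uminReduce(lanes))); // 1
    }
}
```

The example shows why a signed reduction cannot simply be reused: a signed max over the same lanes would return 7, whereas the unsigned max is the lane holding `-1`.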

> Nice work. Thanks for your support!
> 
> I noticed that this PR contains the same commit as #28692. Could you please split that change out of this PR? If this PR depends on #28692, could we change the target merge branch to `pr/28692` instead of `master`?

Yeah, I think I made a mistake when pushing the PR. I'll convert this PR to a draft since #28692 is under active review, then rebase it after #28692 is merged. Thanks for the reminder~

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28693#issuecomment-3641410633