RFR: 8372980: [VectorAPI] AArch64: Add intrinsic support for unsigned min/max reduction operations [v4]
Eric Fang
erfang at openjdk.org
Mon Jan 26 09:26:35 UTC 2026
> This patch adds intrinsic support for UMIN and UMAX reduction operations in the Vector API on AArch64, enabling direct hardware instruction mapping for better performance.
>
> Changes:
> --------
>
> 1. C2 mid-end:
> - Added UMinReductionVNode and UMaxReductionVNode
>
> 2. AArch64 Backend:
> - Added uminp/umaxp/sve_uminv/sve_umaxv instructions
> - Updated match rules for all vector sizes and element types
> - Both NEON and SVE implementations are supported
>
> 3. Test:
> - Added UMIN_REDUCTION_V and UMAX_REDUCTION_V to IRNode.java
> - Added assembly tests in aarch64-asmtest.py for new instructions
> - Added a JTReg test file VectorUMinMaxReductionTest.java
>
> Different configurations were tested on aarch64 and x86 machines, and all tests passed.
>
> Test results of JMH benchmarks from the panama-vector project:
> --------
>
> On an NVIDIA Grace machine with 128-bit SVE:
>
> Benchmark                       Unit    Before  Error   After     Error   Uplift
> Byte128Vector.UMAXLanes         ops/ms  411.60  42.18   25226.51  33.92   61.29
> Byte128Vector.UMAXMaskedLanes   ops/ms  558.56  85.12   25182.90  28.74   45.09
> Byte128Vector.UMINLanes         ops/ms  645.58  780.76  28396.29  103.11  43.99
> Byte128Vector.UMINMaskedLanes   ops/ms  621.09  718.27  26122.62  42.68   42.06
> Byte64Vector.UMAXLanes          ops/ms  296.33  34.44   14357.74  15.95   48.45
> Byte64Vector.UMAXMaskedLanes    ops/ms  376.54  44.01   14269.24  21.41   37.90
> Byte64Vector.UMINLanes          ops/ms  373.45  426.51  15425.36  66.20   41.31
> Byte64Vector.UMINMaskedLanes    ops/ms  353.32  346.87  14201.37  13.79   40.19
> Int128Vector.UMAXLanes          ops/ms  174.79  192.51  9906.07   286.93  56.67
> Int128Vector.UMAXMaskedLanes    ops/ms  157.23  206.68  10246.77  11.44   65.17
> Int64Vector.UMAXLanes           ops/ms  95.30   126.49  4719.30   98.57   49.52
> Int64Vector.UMAXMaskedLanes     ops/ms  88.19   87.44   4693.18   19.76   53.22
> Long128Vector.UMAXLanes         ops/ms  80.62   97.82   5064.01   35.52   62.82
> Long128Vector.UMAXMaskedLanes   ops/ms  78.15   102.91  5028.24   8.74    64.34
> Long64Vector.UMAXLanes          ops/ms  47.56   62.01   46.76     52.28   0.98
> Long64Vector.UMAXMaskedLanes    ops/ms  45.44   46.76   45.79     42.91   1.01
> Short128Vector.UMAXLanes        ops/ms  316.65  410.30  14814.82  23.65   46.79
> Short128Vector.UMAXMaskedLanes  ops/ms  308.90  351.78  15155.26  31.03   49.06
> Sh...
Eric Fang has updated the pull request incrementally with one additional commit since the last revision:
Move helper functions into c2_MacroAssembler_aarch64.hpp
-------------
Changes:
- all: https://git.openjdk.org/jdk/pull/28693/files
- new: https://git.openjdk.org/jdk/pull/28693/files/fc3dee3d..10d74f13
Webrevs:
- full: https://webrevs.openjdk.org/?repo=jdk&pr=28693&range=03
- incr: https://webrevs.openjdk.org/?repo=jdk&pr=28693&range=02-03
Stats: 104 lines in 2 files changed: 26 ins; 64 del; 14 mod
Patch: https://git.openjdk.org/jdk/pull/28693.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/28693/head:pull/28693
PR: https://git.openjdk.org/jdk/pull/28693
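
For readers unfamiliar with the operations named in the PR summary, the following is a minimal scalar sketch of what an unsigned min/max reduction computes over int lanes, including a masked variant. It is not code from the patch: the class and method names are hypothetical, and the identity values used for inactive lanes (0 for UMAX, all bits set for UMIN) are an assumption based on the usual definition of a reduction identity. It only illustrates the semantics that the intrinsic is expected to preserve.

```java
// Hypothetical scalar reference for unsigned min/max reductions.
// Names and structure are illustrative only; they do not come from the patch.
public class UnsignedReductionSketch {

    // Unsigned max over all lanes; 0 is the smallest unsigned value,
    // so it serves as the starting (identity) value.
    static int umaxReduce(int[] lanes) {
        int acc = 0;
        for (int v : lanes) {
            if (Integer.compareUnsigned(v, acc) > 0) {
                acc = v;
            }
        }
        return acc;
    }

    // Unsigned min over all lanes; -1 (all bits set) is the largest
    // unsigned value, so it serves as the starting (identity) value.
    static int uminReduce(int[] lanes) {
        int acc = -1;
        for (int v : lanes) {
            if (Integer.compareUnsigned(v, acc) < 0) {
                acc = v;
            }
        }
        return acc;
    }

    // Masked unsigned max: lanes whose mask bit is false are skipped,
    // which is equivalent to replacing them with the identity 0 (assumption).
    static int umaxReduceMasked(int[] lanes, boolean[] mask) {
        int acc = 0;
        for (int i = 0; i < lanes.length; i++) {
            if (mask[i] && Integer.compareUnsigned(lanes[i], acc) > 0) {
                acc = lanes[i];
            }
        }
        return acc;
    }

    public static void main(String[] args) {
        int[] lanes = {1, -1, 7, 0};
        boolean[] mask = {true, false, true, false};
        // -1 is 0xFFFFFFFF, the largest value when interpreted as unsigned.
        System.out.println(Integer.toUnsignedString(umaxReduce(lanes)));
        System.out.println(Integer.toUnsignedString(uminReduce(lanes)));
        System.out.println(Integer.toUnsignedString(umaxReduceMasked(lanes, mask)));
    }
}
```

Note that a signed max over the same lanes would return 7 rather than the lane holding -1, which is why a separate unsigned reduction is needed.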