RFR: 8372980: [VectorAPI] AArch64: Add intrinsic support for unsigned min/max reduction operations [v2]

Eric Fang erfang at openjdk.org
Wed Jan 21 10:15:27 UTC 2026


On Tue, 20 Jan 2026 19:23:38 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> Eric Fang has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits:
>> 
>>  - Rebase commit 56d7b52
>>  - Merge branch 'master' into JDK-8372980-umin-umax-intrinsic
>>  - 8372980: [VectorAPI] AArch64: Add intrinsic support for unsigned min/max reduction operations
>>    
>>    This patch adds intrinsic support for UMIN and UMAX reduction operations
>>    in the Vector API on AArch64, enabling direct hardware instruction mapping
>>    for better performance.
>>    
>>    Changes:
>>    --------
>>    
>>    1. C2 mid-end:
>>       - Added UMinReductionVNode and UMaxReductionVNode
>>    
>>    2. AArch64 Backend:
>>       - Added uminp/umaxp/sve_uminv/sve_umaxv instructions
>>       - Updated match rules for all vector sizes and element types
>>       - Both NEON and SVE implementations are supported
>>    
>>    3. Test:
>>       - Added UMIN_REDUCTION_V and UMAX_REDUCTION_V to IRNode.java
>>       - Added assembly tests in aarch64-asmtest.py for new instructions
>>       - Added a JTReg test file VectorUMinMaxReductionTest.java
>>    
>>    Different configurations were tested on aarch64 and x86 machines, and
>>    all tests passed.
>>    
>>    Test results of JMH benchmarks from the panama-vector project:
>>    --------
>>    
>>    On an NVIDIA Grace machine with 128-bit SVE:
>>    ```
>>    Benchmark                      Unit    Before  Error   After     Error   Uplift
>>    Byte128Vector.UMAXLanes        ops/ms  411.60  42.18   25226.51  33.92   61.29
>>    Byte128Vector.UMAXMaskedLanes  ops/ms  558.56  85.12   25182.90  28.74   45.09
>>    Byte128Vector.UMINLanes        ops/ms  645.58  780.76  28396.29  103.11  43.99
>>    Byte128Vector.UMINMaskedLanes  ops/ms  621.09  718.27  26122.62  42.68   42.06
>>    Byte64Vector.UMAXLanes         ops/ms  296.33  34.44   14357.74  15.95   48.45
>>    Byte64Vector.UMAXMaskedLanes   ops/ms  376.54  44.01   14269.24  21.41   37.90
>>    Byte64Vector.UMINLanes         ops/ms  373.45  426.51  15425.36  66.20   41.31
>>    Byte64Vector.UMINMaskedLanes   ops/ms  353.32  346.87  14201.37  13.79   40.19
>>    Int128Vector.UMAXLanes         ops/ms  174.79  192.51  9906.07   286.93  56.67
>>    Int128Vector.UMAXMaskedLanes   ops/ms  157.23  206.68  10246.77  11.44   65.17
>>    Int64Vector.UMAXLanes          ops/ms  95.30   126.49  4719.30   98.57   49.52
>>    Int64Vector.UMAXMaskedLanes    ops/ms  88.19   87.44   4693.18   19.76   53.22
>>    Long128Vector.UMAXLanes        ops/ms  80.62   97.82   5064.01   35.52   62.82
>>    Long128Vector.UMAXMaskedLanes  ops/ms  78.15   102.91  5028.24   8.74    64.34
>>    Long64Vector.UMAXLanes         ops/ms  47.56   62.01   46.76     52.28   0.98
>>    Long64V...
>
> I'm sorry, I _completely_ overthought that one. All you need are definitions for `min[vp]` and `max[vp]` in `C2_MacroAssembler`.
> 
> Like so:
> 
> `void minv(bool is_unsigned, ...) { if (is_unsigned) { uminv(...); } else { sminv(...); } }`
> 
> No need to mess with class `Assembler`.

@theRealAph I have made the change. Could you please take another look? Thanks.
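
For readers following the thread, here is a minimal, self-contained sketch of the wrapper shape suggested in the quoted comment. It is not the code in the PR: `MockAssembler`, `MockC2MacroAssembler`, `Emitted`, and the `int` register numbers are hypothetical stand-ins, and the real emitters take the operands elided as `...` above. The pairwise `minp`/`maxp` helpers would presumably follow the same pattern.

```cpp
#include <string>
#include <vector>

// One recorded instruction: mnemonic plus hypothetical register numbers.
struct Emitted {
    std::string mnemonic;
    int dst;
    int src;
};

// Hypothetical stand-in for the low-level assembler. It records mnemonics
// instead of encoding machine instructions.
class MockAssembler {
public:
    std::vector<Emitted> emitted;

    void uminv(int dst, int src) { emitted.push_back({"uminv", dst, src}); }
    void sminv(int dst, int src) { emitted.push_back({"sminv", dst, src}); }
    void umaxv(int dst, int src) { emitted.push_back({"umaxv", dst, src}); }
    void smaxv(int dst, int src) { emitted.push_back({"smaxv", dst, src}); }
};

// Hypothetical stand-in for the macro assembler: the signedness dispatch
// lives here, so the low-level assembler class stays untouched.
class MockC2MacroAssembler : public MockAssembler {
public:
    void minv(bool is_unsigned, int dst, int src) {
        if (is_unsigned) {
            uminv(dst, src);
        } else {
            sminv(dst, src);
        }
    }

    void maxv(bool is_unsigned, int dst, int src) {
        if (is_unsigned) {
            umaxv(dst, src);
        } else {
            smaxv(dst, src);
        }
    }
};
```

With this shape, a caller passes a single `is_unsigned` flag and does not need separate code paths for the signed and unsigned reductions.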

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28693#issuecomment-3777257496

