RFR: 8373344: Add support for min/max reduction operations for Float16 [v2]
Yi Wu
duke at openjdk.org
Mon Jan 5 11:31:26 UTC 2026
> This patch adds mid-end support for vectorized min/max reduction operations on half-precision floats (Float16), together with AArch64 backend support for these operations.
> Neither floating-point min nor max reduction requires strict ordering, because both operations are associative.
>
> The patch generates NEON fminv/fmaxv reduction instructions when the max vector length is 8B or 16B. On SVE-capable machines with vector lengths > 16B, it generates the SVE fminv/fmaxv instructions.
> The patch also adds support for partial min/max reductions on SVE machines using fminv/fmaxv.
>
> The tables below report the ratio of throughput (ops/ms) with this patch to throughput on mainline; a ratio > 1 indicates that the patch performs better.
>
> Neoverse N1 (UseSVE = 0, max vector length = 16B):
>
> Benchmark         vectorDim  Mode   Cnt    8B    16B
> ReductionMaxFP16        256  thrpt    9  3.69   6.44
> ReductionMaxFP16        512  thrpt    9  3.71   7.62
> ReductionMaxFP16       1024  thrpt    9  4.16   8.64
> ReductionMaxFP16       2048  thrpt    9  4.44   9.12
> ReductionMinFP16        256  thrpt    9  3.69   6.43
> ReductionMinFP16        512  thrpt    9  3.70   7.62
> ReductionMinFP16       1024  thrpt    9  4.16   8.64
> ReductionMinFP16       2048  thrpt    9  4.44   9.10
>
>
> Neoverse V1 (UseSVE = 1, max vector length = 32B):
>
> Benchmark         vectorDim  Mode   Cnt    8B    16B    32B
> ReductionMaxFP16        256  thrpt    9  3.96   8.62   8.02
> ReductionMaxFP16        512  thrpt    9  3.54   9.25  11.71
> ReductionMaxFP16       1024  thrpt    9  3.77   8.71  14.07
> ReductionMaxFP16       2048  thrpt    9  3.88   8.44  14.69
> ReductionMinFP16        256  thrpt    9  3.96   8.61   8.03
> ReductionMinFP16        512  thrpt    9  3.54   9.28  11.69
> ReductionMinFP16       1024  thrpt    9  3.76   8.70  14.12
> ReductionMinFP16       2048  thrpt    9  3.87   8.45  14.70
>
>
> Neoverse V2 (UseSVE = 2, max vector length = 16B):
>
> Benchmark         vectorDim  Mode   Cnt    8B    16B
> ReductionMaxFP16        256  thrpt    9  4.78  10.00
> ReductionMaxFP16        512  thrpt    9  3.74  11.33
> ReductionMaxFP16       1024  thrpt    9  3.86   9.59
> ReductionMaxFP16       2048  thrpt    9  3.94   8.71
> ReductionMinFP16        256  thrpt    9  4.78  10.00
> ReductionMinFP16        512  thrpt    9  3.74  11.29
> ReductionMinFP16       1024  thrpt    9  3.86   9.58
> ReductionMinFP16       2048  thrpt    9  3.94   8.71
>
>
> Testing:
> hotspot_all, jdk (tier1-3) and langtools (tier1) all pass on Neoverse N1/V1/V2.
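
To illustrate why the quoted description can drop the strict-order requirement, here is a minimal Java sketch. It is not the HotSpot implementation: it uses plain `float` and `Math.max` as a stand-in for Float16, and the class and method names (`LaneWiseMaxSketch`, `maxSequential`, `maxLaneWise`) are hypothetical. It keeps one running maximum per "lane", the way a vectorized loop would, then combines the lanes at the end. The tail loop mimics a partial reduction: lanes with no remaining input keep the identity value (negative infinity for max).

```java
import java.util.Arrays;

// Hypothetical illustration only; float stands in for Float16.
final class LaneWiseMaxSketch {

    // Strict left-to-right reduction, as a scalar loop would do it.
    static float maxSequential(float[] a) {
        float r = Float.NEGATIVE_INFINITY;
        for (float v : a) {
            r = Math.max(r, v);
        }
        return r;
    }

    // Reordered reduction: `lanes` independent running maxima,
    // combined into a single value after the main loop.
    static float maxLaneWise(float[] a, int lanes) {
        float[] acc = new float[lanes];
        Arrays.fill(acc, Float.NEGATIVE_INFINITY); // identity for max

        int i = 0;
        // Full "vectors": every lane is active.
        for (; i + lanes <= a.length; i += lanes) {
            for (int l = 0; l < lanes; l++) {
                acc[l] = Math.max(acc[l], a[i + l]);
            }
        }
        // Partial tail: only the first few lanes are active,
        // the inactive lanes keep their current value.
        for (int l = 0; i < a.length; i++, l++) {
            acc[l] = Math.max(acc[l], a[i]);
        }

        // Final cross-lane reduction.
        float r = Float.NEGATIVE_INFINITY;
        for (float v : acc) {
            r = Math.max(r, v);
        }
        return r;
    }

    public static void main(String[] args) {
        float[] data = {1.5f, -2.0f, 7.25f, 0.0f, 3.0f, 7.0f, 6.5f, 2.0f, -8.0f, 4.0f, 0.5f};
        System.out.println(maxSequential(data) == maxLaneWise(data, 8));
    }
}
```

Because max (and likewise min) gives the same result under any grouping of its operands, both methods agree for any lane count, which is the property that lets the compiler vectorize the reduction.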
Yi Wu has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision:
- Replace assert with verify
- Add IRNode constant and code refactor
- Merge remote-tracking branch 'origin/master' into yiwu-8373344
- 8373344: Add support for FP16 min/max reduction operations
-------------
Changes:
- all: https://git.openjdk.org/jdk/pull/28828/files
- new: https://git.openjdk.org/jdk/pull/28828/files/2f80bc4f..9971752e
Webrevs:
- full: https://webrevs.openjdk.org/?repo=jdk&pr=28828&range=01
- incr: https://webrevs.openjdk.org/?repo=jdk&pr=28828&range=00-01
Stats: 17385 lines in 2438 files changed: 9261 ins; 2408 del; 5716 mod
Patch: https://git.openjdk.org/jdk/pull/28828.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/28828/head:pull/28828
PR: https://git.openjdk.org/jdk/pull/28828