RFR: 8373344: Add support for min/max reduction operations for Float16 [v2]

Mon Jan 5 11:31:26 UTC 2026

> This patch adds mid-end support for vectorized min/max reduction operations on half-precision floats (Float16), along with AArch64 backend support for these operations.
> Neither floating-point min nor max reduction requires strict ordering, because both operations are associative.
> 
> It generates NEON fminv/fmaxv reduction instructions when the max vector length is 8B or 16B. On SVE-capable machines with vector lengths > 16B, it generates the SVE fminv/fmaxv instructions.
> The patch also adds support for partial min/max reductions on SVE machines using fminv/fmaxv.
> 
> The numbers below are throughput (ops/ms) ratios of this patch to mainline; a ratio > 1 means this patch performs better than mainline.
> 
> Neoverse N1 (UseSVE = 0, max vector length = 16B):
> 
> Benchmark         vectorDim  Mode   Cnt     8B    16B
> ReductionMaxFP16   256       thrpt 9      3.69   6.44
> ReductionMaxFP16   512       thrpt 9      3.71   7.62
> ReductionMaxFP16   1024      thrpt 9      4.16   8.64
> ReductionMaxFP16   2048      thrpt 9      4.44   9.12
> ReductionMinFP16   256       thrpt 9      3.69   6.43
> ReductionMinFP16   512       thrpt 9      3.70   7.62
> ReductionMinFP16   1024      thrpt 9      4.16   8.64
> ReductionMinFP16   2048      thrpt 9      4.44   9.10
> 
> 
> Neoverse V1 (UseSVE = 1, max vector length = 32B):
> 
> Benchmark         vectorDim  Mode   Cnt     8B    16B    32B
> ReductionMaxFP16   256       thrpt 9      3.96   8.62   8.02
> ReductionMaxFP16   512       thrpt 9      3.54   9.25  11.71
> ReductionMaxFP16   1024      thrpt 9      3.77   8.71  14.07
> ReductionMaxFP16   2048      thrpt 9      3.88   8.44  14.69
> ReductionMinFP16   256       thrpt 9      3.96   8.61   8.03
> ReductionMinFP16   512       thrpt 9      3.54   9.28  11.69
> ReductionMinFP16   1024      thrpt 9      3.76   8.70  14.12
> ReductionMinFP16   2048      thrpt 9      3.87   8.45  14.70
> 
> 
> Neoverse V2 (UseSVE = 2, max vector length = 16B):
> 
> Benchmark         vectorDim  Mode   Cnt     8B    16B
> ReductionMaxFP16   256       thrpt 9      4.78  10.00
> ReductionMaxFP16   512       thrpt 9      3.74  11.33
> ReductionMaxFP16   1024      thrpt 9      3.86   9.59
> ReductionMaxFP16   2048      thrpt 9      3.94   8.71
> ReductionMinFP16   256       thrpt 9      4.78  10.00
> ReductionMinFP16   512       thrpt 9      3.74  11.29
> ReductionMinFP16   1024      thrpt 9      3.86   9.58
> ReductionMinFP16   2048      thrpt 9      3.94   8.71
> 
> 
> Testing:
> hotspot_all, jdk (tier1-3) and langtools (tier1) all pass on Neoverse N1/V1/V2.
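
To illustrate the ordering argument in the description, here is a minimal scalar sketch, not code from the patch. It uses plain `float` and `Math.max` in place of Float16, and the class name, method names, and the lane count of 8 (a 16B vector of 2-byte elements) are assumptions made for illustration. It compares a strictly ordered max reduction with a lane-wise one that ends in a cross-lane step (the role fmaxv plays in the description), handles the tail by updating only the active lanes (a stand-in for a partial reduction), and contrasts this with float addition, where the order does change the result.

```java
import java.util.Arrays;

final class MinMaxReductionSketch {
    // Assumed lane count: a 16-byte vector holds 8 two-byte half floats.
    static final int LANES = 8;

    // Strictly ordered reference: one element at a time, left to right.
    static float maxInOrder(float[] a) {
        float acc = Float.NEGATIVE_INFINITY; // identity element for max
        for (float v : a) {
            acc = Math.max(acc, v);
        }
        return acc;
    }

    // Vector-style reduction: lane-wise accumulators, then one cross-lane step.
    static float maxByLanes(float[] a) {
        float[] lanes = new float[LANES];
        Arrays.fill(lanes, Float.NEGATIVE_INFINITY);

        int i = 0;
        // Main loop: every lane is active.
        for (; i + LANES <= a.length; i += LANES) {
            for (int l = 0; l < LANES; l++) {
                lanes[l] = Math.max(lanes[l], a[i + l]);
            }
        }
        // Tail: only the first 'active' lanes take part; inactive lanes keep
        // their accumulated value, which mimics a predicated (partial) update.
        int active = a.length - i;
        for (int l = 0; l < active; l++) {
            lanes[l] = Math.max(lanes[l], a[i + l]);
        }
        // Cross-lane reduction of the accumulators.
        float acc = Float.NEGATIVE_INFINITY;
        for (float v : lanes) {
            acc = Math.max(acc, v);
        }
        return acc;
    }

    // Ordered float sum, used only to show that addition is order-sensitive.
    static float sumInOrder(float[] a) {
        float acc = 0f;
        for (float v : a) {
            acc += v;
        }
        return acc;
    }

    public static void main(String[] args) {
        float[] data = new float[19]; // two full 8-lane chunks plus a 3-element tail
        for (int i = 0; i < data.length; i++) {
            data[i] = (i * 7) % 19 - 9;
        }
        System.out.println(maxInOrder(data) == maxByLanes(data));

        // Same four values in two different orders give different sums.
        System.out.println(sumInOrder(new float[] {1e8f, 1f, -1e8f, 1f}));
        System.out.println(sumInOrder(new float[] {1e8f, -1e8f, 1f, 1f}));
    }
}
```

A min reduction would look the same with `Math.min` and `Float.POSITIVE_INFINITY` as the identity. The two sums differ because `1e8f + 1f` rounds back to `1e8f`, which is why an add reduction cannot be freely reordered while min/max can.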

Yi Wu has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision:

 - Replace assert with verify
 - Add IRNode constant and code refactor
 - Merge remote-tracking branch 'origin/master' into yiwu-8373344
 - 8373344: Add support for FP16 min/max reduction operations

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/28828/files
  - new: https://git.openjdk.org/jdk/pull/28828/files/2f80bc4f..9971752e

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=28828&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28828&range=00-01

  Stats: 17385 lines in 2438 files changed: 9261 ins; 2408 del; 5716 mod
  Patch: https://git.openjdk.org/jdk/pull/28828.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28828/head:pull/28828

PR: https://git.openjdk.org/jdk/pull/28828