RFR: 8346256: Optimize UMIN/UMAX reduction operations for x86 targets [v2]

Tue Feb 17 06:36:05 UTC 2026

> Hi all,
> 
> This patch adds an x86 backend implementation for the UMIN/UMAX reduction operations.
> 
> The following are the performance numbers for the existing micro-benchmark test/micro/org/openjdk/bench/jdk/incubator/vector/VectorUMinUMaxReductionBenchmark.java
> 
> System Configuration:
>   Model name:             AMD EPYC 9755 128-Core Processor (Turin)
>   Fixed frequency:        2.1 GHz
> 
> Baseline:-
> ----------
> Benchmark                                                  (size)   Mode  Cnt     Score   Error   Units
> VectorUMinUMaxReductionBenchmark.byteUMaxReduction           1024  thrpt    2  1183.300          ops/ms
> VectorUMinUMaxReductionBenchmark.byteUMaxReductionMasked     1024  thrpt    2  1426.570          ops/ms
> VectorUMinUMaxReductionBenchmark.byteUMinReduction           1024  thrpt    2  1186.889          ops/ms
> VectorUMinUMaxReductionBenchmark.byteUMinReductionMasked     1024  thrpt    2  1360.700          ops/ms
> VectorUMinUMaxReductionBenchmark.intUMaxReduction            1024  thrpt    2   967.264          ops/ms
> VectorUMinUMaxReductionBenchmark.intUMaxReductionMasked      1024  thrpt    2   767.641          ops/ms
> VectorUMinUMaxReductionBenchmark.intUMinReduction            1024  thrpt    2   969.714          ops/ms
> VectorUMinUMaxReductionBenchmark.intUMinReductionMasked      1024  thrpt    2   799.210          ops/ms
> VectorUMinUMaxReductionBenchmark.longUMaxReduction           1024  thrpt    2   410.210          ops/ms
> VectorUMinUMaxReductionBenchmark.longUMaxReductionMasked     1024  thrpt    2   452.717          ops/ms
> VectorUMinUMaxReductionBenchmark.longUMinReduction           1024  thrpt    2   470.575          ops/ms
> VectorUMinUMaxReductionBenchmark.longUMinReductionMasked     1024  thrpt    2   485.897          ops/ms
> VectorUMinUMaxReductionBenchmark.shortUMaxReduction          1024  thrpt    2   958.935          ops/ms
> VectorUMinUMaxReductionBenchmark.shortUMaxReductionMasked    1024  thrpt    2   937.805          ops/ms
> VectorUMinUMaxReductionBenchmark.shortUMinReduction          1024  thrpt    2   950.125          ops/ms
> VectorUMinUMaxReductionBenchmark.shortUMinReductionMasked    1024  thrpt    2   928.718          ops/ms
> 
> With optimization:-
> -------------------
> Benchmark                                                  (size)   Mode  Cnt      Score   Error   Units
> VectorUMinUMaxReductionBenchmark.byteUMaxReduction           1024  thrpt    2  21391.700          ops/ms
> VectorUMinUMaxReductionBenchmark.byteUMaxReductionMasked     1024  thrpt    2  19865.073          ops/ms
> VectorUMinUMaxRed...
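
For readers unfamiliar with the operations being benchmarked, here is a minimal scalar sketch of what an unsigned min/max reduction computes over int lanes. It is not the code from the patch: the class name, method names, and the masked variant's boolean-array mask are hypothetical, and the starting values (0 for UMAX, -1 for UMIN) are assumed here simply because they are the smallest and largest unsigned 32-bit values.

```java
// Scalar reference for unsigned min/max reduction over int lanes.
// All names here are illustrative and are not taken from the patch.
final class UnsignedReductionSketch {

    // Unsigned max: start from 0, the smallest unsigned value.
    static int umax(int[] lanes) {
        int acc = 0;
        for (int v : lanes) {
            if (Integer.compareUnsigned(v, acc) > 0) {
                acc = v;
            }
        }
        return acc;
    }

    // Unsigned min: start from -1 (0xFFFFFFFF), the largest unsigned value.
    static int umin(int[] lanes) {
        int acc = -1;
        for (int v : lanes) {
            if (Integer.compareUnsigned(v, acc) < 0) {
                acc = v;
            }
        }
        return acc;
    }

    // Masked unsigned max: only lanes whose mask entry is true participate.
    static int umaxMasked(int[] lanes, boolean[] mask) {
        int acc = 0;
        for (int i = 0; i < lanes.length; i++) {
            if (mask[i] && Integer.compareUnsigned(lanes[i], acc) > 0) {
                acc = lanes[i];
            }
        }
        return acc;
    }

    public static void main(String[] args) {
        int[] lanes = {1, -1, 5};
        // -1 is 0xFFFFFFFF, so it is the largest value when compared unsigned.
        System.out.println(Integer.toUnsignedString(umax(lanes))); // prints 4294967295
        System.out.println(Integer.toUnsignedString(umin(lanes))); // prints 1
        System.out.println(umaxMasked(lanes, new boolean[] {true, false, true})); // prints 5
    }
}
```

The point of the sketch is only the comparison semantics: a signed reduction over the same input would report 5 as the maximum and -1 as the minimum, which is why the unsigned variants need their own handling.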

Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision:

 - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8346256
 - 8346256: Optimize UMIN/UMAX reduction operations for x86 targets

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/29751/files
  - new: https://git.openjdk.org/jdk/pull/29751/files/913f22a3..19a40fa0

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=29751&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29751&range=00-01

  Stats: 24143 lines in 508 files changed: 11385 ins; 2730 del; 10028 mod
  Patch: https://git.openjdk.org/jdk/pull/29751.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/29751/head:pull/29751

PR: https://git.openjdk.org/jdk/pull/29751