RFR: 8346256: Optimize UMIN/UMAX reduction operations for x86 targets

Jatin Bhateja jbhateja at openjdk.org
Tue Feb 17 06:27:13 UTC 2026


Hi all,

This patch adds an x86 backend implementation for the UMIN/UMAX reduction operations.
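
For readers less familiar with the operation, here is a scalar sketch of what an unsigned min/max reduction over int lanes computes. It is only a reference for the semantics, not code from the patch; the class and method names are hypothetical.

```java
// Scalar reference for unsigned min/max reductions over int values.
// Hypothetical helper names; this is NOT code from the patch.
final class UnsignedReductionSketch {

    // Identity for UMAX is 0, the smallest unsigned value.
    static int umaxReduce(int[] a) {
        int acc = 0;
        for (int v : a) {
            if (Integer.compareUnsigned(v, acc) > 0) {
                acc = v;
            }
        }
        return acc;
    }

    // Identity for UMIN is -1 (0xFFFFFFFF), the largest unsigned value.
    static int uminReduce(int[] a) {
        int acc = -1;
        for (int v : a) {
            if (Integer.compareUnsigned(v, acc) < 0) {
                acc = v;
            }
        }
        return acc;
    }

    public static void main(String[] args) {
        int[] data = {3, -1, 7, Integer.MIN_VALUE};
        // -1 is 4294967295 when read as unsigned, so it is the unsigned maximum.
        System.out.println(Integer.toUnsignedString(umaxReduce(data)));
        System.out.println(Integer.toUnsignedString(uminReduce(data)));
    }
}
```

A signed max over the same data would return 7; the unsigned reduction treats -1 as the largest value, which is why a dedicated code path is needed.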

Performance numbers for the existing micro-benchmark test/micro/org/openjdk/bench/jdk/incubator/vector/VectorUMinUMaxReductionBenchmark.java follow (throughput in ops/ms, higher is better).

System Configuration:
  Model name:             AMD EPYC 9755 128-Core Processor (Turin)
  Fixed frequency:        2.1 GHz



Baseline:-
----------
Benchmark                                                  (size)   Mode  Cnt     Score   Error   Units
VectorUMinUMaxReductionBenchmark.byteUMaxReduction           1024  thrpt    2  1183.300          ops/ms
VectorUMinUMaxReductionBenchmark.byteUMaxReductionMasked     1024  thrpt    2  1426.570          ops/ms
VectorUMinUMaxReductionBenchmark.byteUMinReduction           1024  thrpt    2  1186.889          ops/ms
VectorUMinUMaxReductionBenchmark.byteUMinReductionMasked     1024  thrpt    2  1360.700          ops/ms
VectorUMinUMaxReductionBenchmark.intUMaxReduction            1024  thrpt    2   967.264          ops/ms
VectorUMinUMaxReductionBenchmark.intUMaxReductionMasked      1024  thrpt    2   767.641          ops/ms
VectorUMinUMaxReductionBenchmark.intUMinReduction            1024  thrpt    2   969.714          ops/ms
VectorUMinUMaxReductionBenchmark.intUMinReductionMasked      1024  thrpt    2   799.210          ops/ms
VectorUMinUMaxReductionBenchmark.longUMaxReduction           1024  thrpt    2   410.210          ops/ms
VectorUMinUMaxReductionBenchmark.longUMaxReductionMasked     1024  thrpt    2   452.717          ops/ms
VectorUMinUMaxReductionBenchmark.longUMinReduction           1024  thrpt    2   470.575          ops/ms
VectorUMinUMaxReductionBenchmark.longUMinReductionMasked     1024  thrpt    2   485.897          ops/ms
VectorUMinUMaxReductionBenchmark.shortUMaxReduction          1024  thrpt    2   958.935          ops/ms
VectorUMinUMaxReductionBenchmark.shortUMaxReductionMasked    1024  thrpt    2   937.805          ops/ms
VectorUMinUMaxReductionBenchmark.shortUMinReduction          1024  thrpt    2   950.125          ops/ms
VectorUMinUMaxReductionBenchmark.shortUMinReductionMasked    1024  thrpt    2   928.718          ops/ms

With optimization:-
-------------------
Benchmark                                                  (size)   Mode  Cnt      Score   Error   Units
VectorUMinUMaxReductionBenchmark.byteUMaxReduction           1024  thrpt    2  21391.700          ops/ms
VectorUMinUMaxReductionBenchmark.byteUMaxReductionMasked     1024  thrpt    2  19865.073          ops/ms
VectorUMinUMaxReductionBenchmark.byteUMinReduction           1024  thrpt    2  20783.616          ops/ms
VectorUMinUMaxReductionBenchmark.byteUMinReductionMasked     1024  thrpt    2  19703.367          ops/ms
VectorUMinUMaxReductionBenchmark.intUMaxReduction            1024  thrpt    2   9883.694          ops/ms
VectorUMinUMaxReductionBenchmark.intUMaxReductionMasked      1024  thrpt    2   9067.299          ops/ms
VectorUMinUMaxReductionBenchmark.intUMinReduction            1024  thrpt    2   9960.181          ops/ms
VectorUMinUMaxReductionBenchmark.intUMinReductionMasked      1024  thrpt    2   8824.220          ops/ms
VectorUMinUMaxReductionBenchmark.longUMaxReduction           1024  thrpt    2   6144.831          ops/ms
VectorUMinUMaxReductionBenchmark.longUMaxReductionMasked     1024  thrpt    2   5820.244          ops/ms
VectorUMinUMaxReductionBenchmark.longUMinReduction           1024  thrpt    2   6157.208          ops/ms
VectorUMinUMaxReductionBenchmark.longUMinReductionMasked     1024  thrpt    2   5803.597          ops/ms
VectorUMinUMaxReductionBenchmark.shortUMaxReduction          1024  thrpt    2  12798.922          ops/ms
VectorUMinUMaxReductionBenchmark.shortUMaxReductionMasked    1024  thrpt    2  11872.386          ops/ms
VectorUMinUMaxReductionBenchmark.shortUMinReduction          1024  thrpt    2  12543.426          ops/ms
VectorUMinUMaxReductionBenchmark.shortUMinReductionMasked    1024  thrpt    2  11888.700          ops/ms
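
To compare the two tables, the speedup per benchmark is the optimized score divided by the baseline score. A minimal sketch of that arithmetic, using three rows copied from the tables above (class and method names are hypothetical):

```java
// Speedup = optimized throughput / baseline throughput (both in ops/ms).
// Class and method names are hypothetical; scores are copied from the tables above.
final class SpeedupSketch {

    static double speedup(double baseline, double optimized) {
        return optimized / baseline;
    }

    public static void main(String[] args) {
        String[] names  = {"byteUMaxReduction", "intUMaxReduction", "longUMaxReduction"};
        double[] before = {1183.300, 967.264, 410.210};
        double[] after  = {21391.700, 9883.694, 6144.831};
        for (int i = 0; i < names.length; i++) {
            System.out.printf("%-20s %.2fx%n", names[i], speedup(before[i], after[i]));
        }
    }
}
```

For these three rows the ratios come out at roughly 18x, 10x and 15x respectively.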


Kindly review and share your feedback.

Best Regards,
Jatin

-------------

Commit messages:
 - 8346256: Optimize UMIN/UMAX reduction operations for x86 targets

Changes: https://git.openjdk.org/jdk/pull/29751/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29751&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8346256
  Stats: 106 lines in 5 files changed: 88 ins; 0 del; 18 mod
  Patch: https://git.openjdk.org/jdk/pull/29751.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/29751/head:pull/29751

PR: https://git.openjdk.org/jdk/pull/29751

