RFR: 8346256: Optimize UMIN/UMAX reduction operations for x86 targets
Jatin Bhateja
jbhateja at openjdk.org
Tue Feb 17 06:27:13 UTC 2026
Hi all,
This patch adds an x86 backend implementation for the UMIN/UMAX reduction operations.
The following are the performance numbers for the existing micro-benchmark test/micro/org/openjdk/bench/jdk/incubator/vector/VectorUMinUMaxReductionBenchmark.java:
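For readers less familiar with the operation, here is a minimal scalar sketch of what an unsigned min/max reduction over int lanes computes. It is only a reference for the semantics, not the code in this patch; the class and method names (UnsignedReductionSketch, umaxReduce, uminReduce) are made up for illustration.

```java
public class UnsignedReductionSketch {
    // Hypothetical helper: unsigned max over all elements.
    // The identity is 0, the smallest value under unsigned ordering.
    public static int umaxReduce(int[] a) {
        int acc = 0;
        for (int v : a) {
            if (Integer.compareUnsigned(v, acc) > 0) {
                acc = v;
            }
        }
        return acc;
    }

    // Hypothetical helper: unsigned min over all elements.
    // The identity is -1 (0xFFFFFFFF), the largest value under unsigned ordering.
    public static int uminReduce(int[] a) {
        int acc = -1;
        for (int v : a) {
            if (Integer.compareUnsigned(v, acc) < 0) {
                acc = v;
            }
        }
        return acc;
    }

    public static void main(String[] args) {
        int[] data = {3, -1, 7, Integer.MIN_VALUE};
        // -1 is 0xFFFFFFFF, so it is the unsigned maximum here.
        System.out.println(Integer.toUnsignedString(umaxReduce(data))); // prints 4294967295
        System.out.println(Integer.toUnsignedString(uminReduce(data))); // prints 3
    }
}
```

The point of the example is that the ordering differs from signed min/max: negative int values compare as large unsigned values, so a signed reduction cannot simply be reused.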
System Configuration:
Model name: AMD EPYC 9755 128-Core Processor (Turin)
Fixed Frequency : 2.1GHz
Baseline:-
----------
Benchmark                                                  (size)   Mode  Cnt      Score  Error  Units
VectorUMinUMaxReductionBenchmark.byteUMaxReduction           1024  thrpt    2   1183.300         ops/ms
VectorUMinUMaxReductionBenchmark.byteUMaxReductionMasked     1024  thrpt    2   1426.570         ops/ms
VectorUMinUMaxReductionBenchmark.byteUMinReduction           1024  thrpt    2   1186.889         ops/ms
VectorUMinUMaxReductionBenchmark.byteUMinReductionMasked     1024  thrpt    2   1360.700         ops/ms
VectorUMinUMaxReductionBenchmark.intUMaxReduction            1024  thrpt    2    967.264         ops/ms
VectorUMinUMaxReductionBenchmark.intUMaxReductionMasked      1024  thrpt    2    767.641         ops/ms
VectorUMinUMaxReductionBenchmark.intUMinReduction            1024  thrpt    2    969.714         ops/ms
VectorUMinUMaxReductionBenchmark.intUMinReductionMasked      1024  thrpt    2    799.210         ops/ms
VectorUMinUMaxReductionBenchmark.longUMaxReduction           1024  thrpt    2    410.210         ops/ms
VectorUMinUMaxReductionBenchmark.longUMaxReductionMasked     1024  thrpt    2    452.717         ops/ms
VectorUMinUMaxReductionBenchmark.longUMinReduction           1024  thrpt    2    470.575         ops/ms
VectorUMinUMaxReductionBenchmark.longUMinReductionMasked     1024  thrpt    2    485.897         ops/ms
VectorUMinUMaxReductionBenchmark.shortUMaxReduction          1024  thrpt    2    958.935         ops/ms
VectorUMinUMaxReductionBenchmark.shortUMaxReductionMasked    1024  thrpt    2    937.805         ops/ms
VectorUMinUMaxReductionBenchmark.shortUMinReduction          1024  thrpt    2    950.125         ops/ms
VectorUMinUMaxReductionBenchmark.shortUMinReductionMasked    1024  thrpt    2    928.718         ops/ms
With optimization:-
-------------------
Benchmark                                                  (size)   Mode  Cnt      Score  Error  Units
VectorUMinUMaxReductionBenchmark.byteUMaxReduction           1024  thrpt    2  21391.700         ops/ms
VectorUMinUMaxReductionBenchmark.byteUMaxReductionMasked     1024  thrpt    2  19865.073         ops/ms
VectorUMinUMaxReductionBenchmark.byteUMinReduction           1024  thrpt    2  20783.616         ops/ms
VectorUMinUMaxReductionBenchmark.byteUMinReductionMasked     1024  thrpt    2  19703.367         ops/ms
VectorUMinUMaxReductionBenchmark.intUMaxReduction            1024  thrpt    2   9883.694         ops/ms
VectorUMinUMaxReductionBenchmark.intUMaxReductionMasked      1024  thrpt    2   9067.299         ops/ms
VectorUMinUMaxReductionBenchmark.intUMinReduction            1024  thrpt    2   9960.181         ops/ms
VectorUMinUMaxReductionBenchmark.intUMinReductionMasked      1024  thrpt    2   8824.220         ops/ms
VectorUMinUMaxReductionBenchmark.longUMaxReduction           1024  thrpt    2   6144.831         ops/ms
VectorUMinUMaxReductionBenchmark.longUMaxReductionMasked     1024  thrpt    2   5820.244         ops/ms
VectorUMinUMaxReductionBenchmark.longUMinReduction           1024  thrpt    2   6157.208         ops/ms
VectorUMinUMaxReductionBenchmark.longUMinReductionMasked     1024  thrpt    2   5803.597         ops/ms
VectorUMinUMaxReductionBenchmark.shortUMaxReduction          1024  thrpt    2  12798.922         ops/ms
VectorUMinUMaxReductionBenchmark.shortUMaxReductionMasked    1024  thrpt    2  11872.386         ops/ms
VectorUMinUMaxReductionBenchmark.shortUMinReduction          1024  thrpt    2  12543.426         ops/ms
VectorUMinUMaxReductionBenchmark.shortUMinReductionMasked    1024  thrpt    2  11888.700         ops/ms
Kindly review and share your feedback.
Best Regards,
Jatin
-------------
Commit messages:
- 8346256: Optimize UMIN/UMAX reduction operations for x86 targets
Changes: https://git.openjdk.org/jdk/pull/29751/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29751&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8346256
Stats: 106 lines in 5 files changed: 88 ins; 0 del; 18 mod
Patch: https://git.openjdk.org/jdk/pull/29751.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/29751/head:pull/29751
PR: https://git.openjdk.org/jdk/pull/29751