RFR: 8346256: Optimize UMIN/UMAX reduction operations for x86 targets [v2]
Jatin Bhateja
jbhateja at openjdk.org
Tue Feb 24 09:54:55 UTC 2026
On Tue, 17 Feb 2026 06:36:05 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:
>> Hi all,
>>
>> This patch adds an x86 backend implementation for UMIN/UMAX reduction operations.
>>
>> The following are the performance numbers from the existing micro-benchmark test/micro/org/openjdk/bench/jdk/incubator/vector/VectorUMinUMaxReductionBenchmark.java:
>>
>> System Configuration:
>> Model name: AMD EPYC 9755 128-Core Processor (Turin)
>> Fixed Frequency : 2.1GHz
>>
>>
>>
>> Baseline:-
>> ----------
>> Benchmark                                                  (size)   Mode  Cnt      Score  Error  Units
>> VectorUMinUMaxReductionBenchmark.byteUMaxReduction           1024  thrpt    2   1183.300         ops/ms
>> VectorUMinUMaxReductionBenchmark.byteUMaxReductionMasked     1024  thrpt    2   1426.570         ops/ms
>> VectorUMinUMaxReductionBenchmark.byteUMinReduction           1024  thrpt    2   1186.889         ops/ms
>> VectorUMinUMaxReductionBenchmark.byteUMinReductionMasked     1024  thrpt    2   1360.700         ops/ms
>> VectorUMinUMaxReductionBenchmark.intUMaxReduction            1024  thrpt    2    967.264         ops/ms
>> VectorUMinUMaxReductionBenchmark.intUMaxReductionMasked      1024  thrpt    2    767.641         ops/ms
>> VectorUMinUMaxReductionBenchmark.intUMinReduction            1024  thrpt    2    969.714         ops/ms
>> VectorUMinUMaxReductionBenchmark.intUMinReductionMasked      1024  thrpt    2    799.210         ops/ms
>> VectorUMinUMaxReductionBenchmark.longUMaxReduction           1024  thrpt    2    410.210         ops/ms
>> VectorUMinUMaxReductionBenchmark.longUMaxReductionMasked     1024  thrpt    2    452.717         ops/ms
>> VectorUMinUMaxReductionBenchmark.longUMinReduction           1024  thrpt    2    470.575         ops/ms
>> VectorUMinUMaxReductionBenchmark.longUMinReductionMasked     1024  thrpt    2    485.897         ops/ms
>> VectorUMinUMaxReductionBenchmark.shortUMaxReduction          1024  thrpt    2    958.935         ops/ms
>> VectorUMinUMaxReductionBenchmark.shortUMaxReductionMasked    1024  thrpt    2    937.805         ops/ms
>> VectorUMinUMaxReductionBenchmark.shortUMinReduction          1024  thrpt    2    950.125         ops/ms
>> VectorUMinUMaxReductionBenchmark.shortUMinReductionMasked    1024  thrpt    2    928.718         ops/ms
>>
>> With optimization:-
>> -------------------
>> Benchmark                                                  (size)   Mode  Cnt      Score  Error  Units
>> VectorUMinUMaxReductionBenchmark.byteUMaxReduction           1024  thrpt    2  21391.700         ops/ms
>> VectorUMinUMaxReductionBenchmark.byteUMaxReducti...
>
> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision:
>
> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8346256
> - 8346256: Optimize UMIN/UMAX reduction operations for x86 targets
Hi @sviswa7, could you kindly review this patch?
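
For context while reviewing, here is a minimal scalar sketch of what an unsigned min/max reduction over int lanes computes. It is not code from the patch: the class and method names (UnsignedReductionSketch, umaxReduce, uminReduce) are hypothetical, and it only illustrates the semantics of the operation, not the x86 instruction sequence the backend emits.

```java
public class UnsignedReductionSketch {

    // Hypothetical helper: unsigned-max reduction over an int array.
    // The identity is 0, the smallest value under unsigned ordering.
    static int umaxReduce(int[] a) {
        int acc = 0;
        for (int v : a) {
            if (Integer.compareUnsigned(v, acc) > 0) {
                acc = v;
            }
        }
        return acc;
    }

    // Hypothetical helper: unsigned-min reduction over an int array.
    // The identity is -1 (0xFFFFFFFF), the largest value under unsigned ordering.
    static int uminReduce(int[] a) {
        int acc = -1;
        for (int v : a) {
            if (Integer.compareUnsigned(v, acc) < 0) {
                acc = v;
            }
        }
        return acc;
    }

    public static void main(String[] args) {
        // -1 and Integer.MIN_VALUE are large values when read as unsigned.
        int[] data = {3, -1, 7, Integer.MIN_VALUE};
        System.out.println(Integer.toUnsignedString(umaxReduce(data))); // 4294967295
        System.out.println(Integer.toUnsignedString(uminReduce(data))); // 3
    }
}
```

The point of the example is that negative signed values sort above all non-negative ones under unsigned ordering, so a signed min/max reduction cannot be reused as-is.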
-------------
PR Comment: https://git.openjdk.org/jdk/pull/29751#issuecomment-3950439305