RFR: 8346256: Optimize UMIN/UMAX reduction operations for x86 targets [v2]

Jatin Bhateja jbhateja at openjdk.org
Tue Feb 24 09:54:55 UTC 2026


On Tue, 17 Feb 2026 06:36:05 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> Hi all,
>> 
>> This patch adds an x86 backend implementation for the UMIN/UMAX reduction operations.
>> 
>> The following are the performance numbers for the existing micro-benchmark test/micro/org/openjdk/bench/jdk/incubator/vector/VectorUMinUMaxReductionBenchmark.java
>> 
>> System Configuration:
>>   Model name:             AMD EPYC 9755 128-Core Processor (Turin)
>>   Fixed frequency:        2.1 GHz
>> 
>> Baseline:
>> ---------
>> Benchmark                                                  (size)   Mode  Cnt     Score   Error   Units
>> VectorUMinUMaxReductionBenchmark.byteUMaxReduction           1024  thrpt    2  1183.300          ops/ms
>> VectorUMinUMaxReductionBenchmark.byteUMaxReductionMasked     1024  thrpt    2  1426.570          ops/ms
>> VectorUMinUMaxReductionBenchmark.byteUMinReduction           1024  thrpt    2  1186.889          ops/ms
>> VectorUMinUMaxReductionBenchmark.byteUMinReductionMasked     1024  thrpt    2  1360.700          ops/ms
>> VectorUMinUMaxReductionBenchmark.intUMaxReduction            1024  thrpt    2   967.264          ops/ms
>> VectorUMinUMaxReductionBenchmark.intUMaxReductionMasked      1024  thrpt    2   767.641          ops/ms
>> VectorUMinUMaxReductionBenchmark.intUMinReduction            1024  thrpt    2   969.714          ops/ms
>> VectorUMinUMaxReductionBenchmark.intUMinReductionMasked      1024  thrpt    2   799.210          ops/ms
>> VectorUMinUMaxReductionBenchmark.longUMaxReduction           1024  thrpt    2   410.210          ops/ms
>> VectorUMinUMaxReductionBenchmark.longUMaxReductionMasked     1024  thrpt    2   452.717          ops/ms
>> VectorUMinUMaxReductionBenchmark.longUMinReduction           1024  thrpt    2   470.575          ops/ms
>> VectorUMinUMaxReductionBenchmark.longUMinReductionMasked     1024  thrpt    2   485.897          ops/ms
>> VectorUMinUMaxReductionBenchmark.shortUMaxReduction          1024  thrpt    2   958.935          ops/ms
>> VectorUMinUMaxReductionBenchmark.shortUMaxReductionMasked    1024  thrpt    2   937.805          ops/ms
>> VectorUMinUMaxReductionBenchmark.shortUMinReduction          1024  thrpt    2   950.125          ops/ms
>> VectorUMinUMaxReductionBenchmark.shortUMinReductionMasked    1024  thrpt    2   928.718          ops/ms
>> 
>> With optimization:
>> ------------------
>> Benchmark                                                  (size)   Mode  Cnt      Score   Error   Units
>> VectorUMinUMaxReductionBenchmark.byteUMaxReduction           1024  thrpt    2  21391.700          ops/ms
>> VectorUMinUMaxReductionBenchmark.byteUMaxReducti...
>
> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision:
> 
>  - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8346256
>  - 8346256: Optimize UMIN/UMAX reduction operations for x86 targets

Hi @sviswa7, could you kindly review this patch?
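
For review context, here is a minimal scalar sketch of the semantics a UMIN/UMAX reduction is expected to preserve: lanes are compared as unsigned values and folded into a single result. It is not taken from the patch. The class name UnsignedReductionSketch and its helper methods are hypothetical, and the choice of identity values (0 for unsigned max, all bits set for unsigned min) is an assumption made for this illustration.

```java
// Hypothetical reference helpers, not part of the patch: scalar loops that
// compute unsigned min/max reductions over int, long and byte arrays.
class UnsignedReductionSketch {

    // Unsigned max over int lanes. Assumed identity: 0 (smallest unsigned value).
    static int umaxReduce(int[] a) {
        int acc = 0;
        for (int v : a) {
            acc = Integer.compareUnsigned(v, acc) > 0 ? v : acc;
        }
        return acc;
    }

    // Unsigned min over int lanes. Assumed identity: -1 (0xFFFFFFFF, largest unsigned value).
    static int uminReduce(int[] a) {
        int acc = -1;
        for (int v : a) {
            acc = Integer.compareUnsigned(v, acc) < 0 ? v : acc;
        }
        return acc;
    }

    // Unsigned max over long lanes, using Long.compareUnsigned.
    static long umaxReduce(long[] a) {
        long acc = 0L;
        for (long v : a) {
            acc = Long.compareUnsigned(v, acc) > 0 ? v : acc;
        }
        return acc;
    }

    // Unsigned max over byte lanes: widen each lane with Byte.toUnsignedInt,
    // compare as int, then narrow the winner back to byte.
    static byte umaxReduce(byte[] a) {
        int acc = 0;
        for (byte v : a) {
            acc = Math.max(acc, Byte.toUnsignedInt(v));
        }
        return (byte) acc;
    }

    public static void main(String[] args) {
        // -1 is 0xFFFFFFFF, the largest value under unsigned comparison.
        System.out.println(Integer.toUnsignedString(umaxReduce(new int[] {1, -1, 5})));
        System.out.println(Integer.toUnsignedString(uminReduce(new int[] {3, -1, 7})));
    }
}
```

A signed reduction over the same inputs would pick different lanes (for example, 5 rather than -1 as the int maximum), which is why the unsigned variants need separate handling.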

-------------

PR Comment: https://git.openjdk.org/jdk/pull/29751#issuecomment-3950439305

