RFR: 8346236: Auto vectorization support for various Float16 operations [v2]
Sandhya Viswanathan
sviswanathan at openjdk.org
Tue Mar 18 00:35:10 UTC 2025
On Mon, 10 Mar 2025 06:25:38 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:
>> This is a follow-up PR for https://github.com/openjdk/jdk/pull/22754
>>
>> The patch adds support to vectorize various float16 scalar operations (add/subtract/divide/multiply/sqrt/fma).
>>
>> Summary of changes included with the patch:
>> 1. New Vector IR node creation in the C2 compiler.
>> 2. Auto-vectorization support.
>> 3. x86 backend implementation.
>> 4. New IR verification test for each newly supported vector operation.
>>
>> Following are the performance numbers of Float16OperationsBenchmark
>>
>> System : Intel(R) Xeon(R) Processor code-named Granite Rapids
>> Frequency fixed at 2.5 GHz
>>
>>
>> Baseline
>> Benchmark (vectorDim) Mode Cnt Score Error Units
>> Float16OperationsBenchmark.absBenchmark 1024 thrpt 2 4191.787 ops/ms
>> Float16OperationsBenchmark.addBenchmark 1024 thrpt 2 1211.978 ops/ms
>> Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 1024 thrpt 2 493.026 ops/ms
>> Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16 1024 thrpt 2 612.430 ops/ms
>> Float16OperationsBenchmark.cosineSimilaritySingleRoundingFP16 1024 thrpt 2 616.012 ops/ms
>> Float16OperationsBenchmark.divBenchmark 1024 thrpt 2 604.882 ops/ms
>> Float16OperationsBenchmark.dotProductFP16 1024 thrpt 2 410.798 ops/ms
>> Float16OperationsBenchmark.euclideanDistanceDequantizedFP16 1024 thrpt 2 602.863 ops/ms
>> Float16OperationsBenchmark.euclideanDistanceFP16 1024 thrpt 2 640.348 ops/ms
>> Float16OperationsBenchmark.fmaBenchmark 1024 thrpt 2 809.175 ops/ms
>> Float16OperationsBenchmark.getExponentBenchmark 1024 thrpt 2 2682.764 ops/ms
>> Float16OperationsBenchmark.isFiniteBenchmark 1024 thrpt 2 3373.901 ops/ms
>> Float16OperationsBenchmark.isFiniteCMovBenchmark 1024 thrpt 2 1881.652 ops/ms
>> Float16OperationsBenchmark.isFiniteStoreBenchmark 1024 thrpt 2 2273.745 ops/ms
>> Float16OperationsBenchmark.isInfiniteBenchmark 1024 thrpt 2 2147.913 ops/ms
>> Float16OperationsBenchmark.isInfiniteCMovBen...
>
> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits:
>
> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8346236
> - Updating benchmark
> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8346236
> - Updating copyright
> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8346236
> - Add MinVHF/MaxVHF to commutative op list
> - Auto Vectorization support for Float16 operations.
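For context, the scalar pattern this PR teaches the auto-vectorizer to recognize looks roughly like the loop below. This is only an illustrative sketch: it uses the core-library binary16 conversions (`Float.floatToFloat16` / `Float.float16ToFloat`, JDK 20+) rather than the incubating Float16 class the benchmark exercises, and the class and method names are made up for the example.

```java
public class Fp16AddLoop {
    // Sketch of the scalar float16 add pattern that, under this PR,
    // C2 can auto-vectorize into an AddVHF vector IR node. Inputs and
    // output are IEEE 754 binary16 values stored as short bit patterns.
    static void add(short[] dst, short[] a, short[] b) {
        for (int i = 0; i < a.length; i++) {
            // Widen both operands to float, add, then round the result
            // back to binary16 (one rounding step per operation).
            float sum = Float.float16ToFloat(a[i]) + Float.float16ToFloat(b[i]);
            dst[i] = Float.floatToFloat16(sum);
        }
    }

    public static void main(String[] args) {
        short one = Float.floatToFloat16(1.0f);
        short two = Float.floatToFloat16(2.0f);
        short[] dst = new short[4];
        add(dst, new short[]{one, one, one, one}, new short[]{two, two, two, two});
        for (short s : dst) {
            // 1.0 + 2.0 = 3.0 is exactly representable in binary16.
            if (Float.float16ToFloat(s) != 3.0f) throw new AssertionError();
        }
        System.out.println("ok");
    }
}
```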
src/hotspot/cpu/x86/x86.ad line 11034:
> 11032: %{
> 11033: match(Set dst (FmaVHF src2 (Binary dst src1)));
> 11034: effect(DEF dst);
DEF dst is the default behavior; do we need the effect statement here?
src/hotspot/cpu/x86/x86.ad line 11046:
> 11044: %{
> 11045: match(Set dst (FmaVHF src2 (Binary dst (VectorReinterpret (LoadVector src1)))));
> 11046: effect(DEF dst);
DEF dst is the default behavior; do we need the effect statement here?
test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java line 65:
> 63: input2[i] = floatToFloat16(rng.nextFloat());
> 64: input3[i] = floatToFloat16(rng.nextFloat());
> 65: }
You could use the new Generators fill method here.
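For reference, the idea behind the fill suggestion is to replace the hand-rolled loop with a reusable, seeded fill helper. The sketch below is a standalone analogue only, not the actual compiler.lib.generators API; the class and method names here are hypothetical:

```java
import java.util.Random;

public class Float16Fill {
    // Hypothetical standalone analogue of a seeded array-fill helper:
    // populates a short[] with binary16 bit patterns derived from a
    // deterministic (reproducible) random stream.
    static void fillFloat16(short[] a, long seed) {
        Random rng = new Random(seed);
        for (int i = 0; i < a.length; i++) {
            // nextFloat() is in [0, 1); floatToFloat16 rounds it to
            // the nearest binary16 value (so at most 1.0 after rounding).
            a[i] = Float.floatToFloat16(rng.nextFloat());
        }
    }

    public static void main(String[] args) {
        short[] a = new short[8];
        fillFloat16(a, 42L);
        for (short s : a) {
            float f = Float.float16ToFloat(s);
            if (f < 0.0f || f > 1.0f) throw new AssertionError("out of range: " + f);
        }
        System.out.println("ok");
    }
}
```

The benefit over the inline loop is that the same seed reproduces the same inputs across test runs, which helps when an IR verification failure needs to be replayed.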
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/22755#discussion_r1999876698
PR Review Comment: https://git.openjdk.org/jdk/pull/22755#discussion_r1999876291
PR Review Comment: https://git.openjdk.org/jdk/pull/22755#discussion_r1999897286
More information about the hotspot-compiler-dev mailing list