RFR: 8346236: Auto vectorization support for various Float16 operations [v2]
Sandhya Viswanathan
sviswanathan at openjdk.org
Tue Mar 18 00:35:10 UTC 2025
On Mon, 10 Mar 2025 06:25:38 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:
>> This is a follow-up PR for https://github.com/openjdk/jdk/pull/22754
>>
>> The patch adds support to vectorize various float16 scalar operations (add/subtract/divide/multiply/sqrt/fma).
>>
>> Summary of changes included with the patch:
>> 1. New Vector IR node creation in the C2 compiler.
>> 2. Auto-vectorization support.
>> 3. x86 backend implementation.
>> 4. New IR verification test for each newly supported vector operation.
>>
>> Following are the performance numbers of Float16OperationsBenchmark
>>
>> System : Intel(R) Xeon(R) Processor code-named Granite Rapids
>> Frequency fixed at 2.5 GHz
>>
>>
>> Baseline
>> Benchmark (vectorDim) Mode Cnt Score Error Units
>> Float16OperationsBenchmark.absBenchmark 1024 thrpt 2 4191.787 ops/ms
>> Float16OperationsBenchmark.addBenchmark 1024 thrpt 2 1211.978 ops/ms
>> Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 1024 thrpt 2 493.026 ops/ms
>> Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16 1024 thrpt 2 612.430 ops/ms
>> Float16OperationsBenchmark.cosineSimilaritySingleRoundingFP16 1024 thrpt 2 616.012 ops/ms
>> Float16OperationsBenchmark.divBenchmark 1024 thrpt 2 604.882 ops/ms
>> Float16OperationsBenchmark.dotProductFP16 1024 thrpt 2 410.798 ops/ms
>> Float16OperationsBenchmark.euclideanDistanceDequantizedFP16 1024 thrpt 2 602.863 ops/ms
>> Float16OperationsBenchmark.euclideanDistanceFP16 1024 thrpt 2 640.348 ops/ms
>> Float16OperationsBenchmark.fmaBenchmark 1024 thrpt 2 809.175 ops/ms
>> Float16OperationsBenchmark.getExponentBenchmark 1024 thrpt 2 2682.764 ops/ms
>> Float16OperationsBenchmark.isFiniteBenchmark 1024 thrpt 2 3373.901 ops/ms
>> Float16OperationsBenchmark.isFiniteCMovBenchmark 1024 thrpt 2 1881.652 ops/ms
>> Float16OperationsBenchmark.isFiniteStoreBenchmark 1024 thrpt 2 2273.745 ops/ms
>> Float16OperationsBenchmark.isInfiniteBenchmark 1024 thrpt 2 2147.913 ops/ms
>> Float16OperationsBenchmark.isInfiniteCMovBen...
>
> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits:
>
> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8346236
> - Updating benchmark
> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8346236
> - Updating copyright
> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8346236
> - Add MinVHF/MaxVHF to commutative op list
> - Auto Vectorization support for Float16 operations.
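For context, the scalar pattern this PR teaches the auto-vectorizer to recognize looks roughly like the loop below. This is only an illustrative sketch: it uses the core-library binary16 conversions (`Float.floatToFloat16` / `Float.float16ToFloat`, JDK 20+) rather than the incubating Float16 class the benchmark exercises, and the class and method names are made up for the example.

```java
public class Fp16AddLoop {
    // Sketch of the scalar float16 add pattern that, under this PR,
    // C2 can auto-vectorize into an AddVHF vector IR node. Inputs and
    // output are IEEE 754 binary16 values stored as short bit patterns.
    static void add(short[] dst, short[] a, short[] b) {
        for (int i = 0; i < a.length; i++) {
            // Widen both operands to float, add, then round the result
            // back to binary16 (one rounding step per operation).
            float sum = Float.float16ToFloat(a[i]) + Float.float16ToFloat(b[i]);
            dst[i] = Float.floatToFloat16(sum);
        }
    }

    public static void main(String[] args) {
        short one = Float.floatToFloat16(1.0f);
        short two = Float.floatToFloat16(2.0f);
        short[] dst = new short[4];
        add(dst, new short[]{one, one, one, one}, new short[]{two, two, two, two});
        for (short s : dst) {
            // 1.0 + 2.0 = 3.0 is exactly representable in binary16.
            if (Float.float16ToFloat(s) != 3.0f) throw new AssertionError();
        }
        System.out.println("ok");
    }
}
```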
src/hotspot/cpu/x86/x86.ad line 11034:
> 11032: %{
> 11033: match(Set dst (FmaVHF src2 (Binary dst src1)));
> 11034: effect(DEF dst);
DEF dst is the default behavior; do we need the effect statement here?
src/hotspot/cpu/x86/x86.ad line 11046:
> 11044: %{
> 11045: match(Set dst (FmaVHF src2 (Binary dst (VectorReinterpret (LoadVector src1)))));
> 11046: effect(DEF dst);
DEF dst is the default behavior; do we need the effect statement here?
test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java line 65:
> 63: input2[i] = floatToFloat16(rng.nextFloat());
> 64: input3[i] = floatToFloat16(rng.nextFloat());
> 65: }
You could use the new Generators fill method here.
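For reference, the idea behind the fill suggestion is to replace the hand-rolled loop with a reusable, seeded fill helper. The sketch below is a standalone analogue only, not the actual compiler.lib.generators API; the class and method names here are hypothetical:

```java
import java.util.Random;

public class Float16Fill {
    // Hypothetical standalone analogue of a seeded array-fill helper:
    // populates a short[] with binary16 bit patterns derived from a
    // deterministic (reproducible) random stream.
    static void fillFloat16(short[] a, long seed) {
        Random rng = new Random(seed);
        for (int i = 0; i < a.length; i++) {
            // nextFloat() is in [0, 1); floatToFloat16 rounds it to
            // the nearest binary16 value (so at most 1.0 after rounding).
            a[i] = Float.floatToFloat16(rng.nextFloat());
        }
    }

    public static void main(String[] args) {
        short[] a = new short[8];
        fillFloat16(a, 42L);
        for (short s : a) {
            float f = Float.float16ToFloat(s);
            if (f < 0.0f || f > 1.0f) throw new AssertionError("out of range: " + f);
        }
        System.out.println("ok");
    }
}
```

The benefit over the inline loop is that the same seed reproduces the same inputs across test runs, which helps when an IR verification failure needs to be replayed.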
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/22755#discussion_r1999876698
PR Review Comment: https://git.openjdk.org/jdk/pull/22755#discussion_r1999876291
PR Review Comment: https://git.openjdk.org/jdk/pull/22755#discussion_r1999897286
More information about the hotspot-compiler-dev mailing list