RFR: 8346236: Auto vectorization support for various Float16 operations [v10]
Jatin Bhateja
jbhateja at openjdk.org
Thu Apr 3 18:33:36 UTC 2025
> This is a follow-up PR for https://github.com/openjdk/jdk/pull/22754
>
> The patch adds support for vectorizing various Float16 scalar operations (add/subtract/divide/multiply/sqrt/fma).
>
> Summary of changes included with the patch:
> 1. New vector IR node creation in the C2 compiler.
> 2. Auto-vectorization support.
> 3. x86 backend implementation.
> 4. New IR verification test for each newly supported vector operation.
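For readers unfamiliar with the loop shape that auto-vectorization targets, here is a minimal sketch of an element-wise FP16 add. It is an illustration only, not code from the patch: the patch concerns operations on the incubating Float16 type, whereas this sketch assumes JDK 20 or later and uses the standard `Float.float16ToFloat` / `Float.floatToFloat16` conversions on raw binary16 bit patterns so that it runs without the incubator module. The class and method names (`Fp16AddSketch`, `addFp16`) are hypothetical, and no claim is made that C2 vectorizes this exact loop.

```java
public class Fp16AddSketch {

    // Element-wise add over FP16 values stored as raw binary16 bits in short[].
    // Each lane is widened to float, added, and rounded back to FP16.
    // (Hypothetical helper for illustration; not part of the patch.)
    static void addFp16(short[] a, short[] b, short[] out) {
        for (int i = 0; i < out.length; i++) {
            float sum = Float.float16ToFloat(a[i]) + Float.float16ToFloat(b[i]);
            out[i] = Float.floatToFloat16(sum);
        }
    }

    public static void main(String[] args) {
        int n = 1024; // matches the vectorDim used in the benchmark table
        short[] a = new short[n];
        short[] b = new short[n];
        short[] out = new short[n];
        for (int i = 0; i < n; i++) {
            a[i] = Float.floatToFloat16(1.0f);
            b[i] = Float.floatToFloat16(0.5f);
        }
        addFp16(a, b, out);
        System.out.println(Float.float16ToFloat(out[0]));
    }
}
```

The loop has a simple counted form with independent iterations and unit-stride array accesses, which is the general shape a superword-style auto-vectorizer looks for.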
>
> The following are the performance numbers for Float16OperationsBenchmark:
>
> System: Intel(R) Xeon(R) Processor code-named Granite Rapids
> Frequency fixed at 2.5 GHz
>
>
> Baseline
> Benchmark                                                      (vectorDim)   Mode  Cnt     Score  Error  Units
> Float16OperationsBenchmark.absBenchmark                               1024  thrpt    2  4191.787         ops/ms
> Float16OperationsBenchmark.addBenchmark                               1024  thrpt    2  1211.978         ops/ms
> Float16OperationsBenchmark.cosineSimilarityDequantizedFP16            1024  thrpt    2   493.026         ops/ms
> Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16         1024  thrpt    2   612.430         ops/ms
> Float16OperationsBenchmark.cosineSimilaritySingleRoundingFP16         1024  thrpt    2   616.012         ops/ms
> Float16OperationsBenchmark.divBenchmark                               1024  thrpt    2   604.882         ops/ms
> Float16OperationsBenchmark.dotProductFP16                             1024  thrpt    2   410.798         ops/ms
> Float16OperationsBenchmark.euclideanDistanceDequantizedFP16           1024  thrpt    2   602.863         ops/ms
> Float16OperationsBenchmark.euclideanDistanceFP16                      1024  thrpt    2   640.348         ops/ms
> Float16OperationsBenchmark.fmaBenchmark                               1024  thrpt    2   809.175         ops/ms
> Float16OperationsBenchmark.getExponentBenchmark                       1024  thrpt    2  2682.764         ops/ms
> Float16OperationsBenchmark.isFiniteBenchmark                          1024  thrpt    2  3373.901         ops/ms
> Float16OperationsBenchmark.isFiniteCMovBenchmark                      1024  thrpt    2  1881.652         ops/ms
> Float16OperationsBenchmark.isFiniteStoreBenchmark                     1024  thrpt    2  2273.745         ops/ms
> Float16OperationsBenchmark.isInfiniteBenchmark                        1024  thrpt    2  2147.913         ops/ms
> Float16OperationsBenchmark.isInfiniteCMovBenchmark                    1024  thrpt    2  1962.579         ops/ms...
Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 13 commits:
- Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8346236
- Review comment resolutions
- Some re-factoring
- Adding tests for new float16 Generator
- Removing Generator dependency on incubation module
- Review comments resolution.
- Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8346236
- Updating benchmark
- Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8346236
- Updating copyright
- ... and 3 more: https://git.openjdk.org/jdk/compare/d894b781...6d05863d
-------------
Changes: https://git.openjdk.org/jdk/pull/22755/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22755&range=09
Stats: 1165 lines in 23 files changed: 1077 ins; 12 del; 76 mod
Patch: https://git.openjdk.org/jdk/pull/22755.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/22755/head:pull/22755
PR: https://git.openjdk.org/jdk/pull/22755