RFR: 8346236: Auto vectorization support for various Float16 operations [v9]

Jatin Bhateja jbhateja at openjdk.org
Fri Mar 28 18:39:40 UTC 2025


> This is a follow-up PR for https://github.com/openjdk/jdk/pull/22754
> 
> This patch adds support for vectorizing various Float16 scalar operations (add/subtract/divide/multiply/sqrt/fma).
> 
> Summary of changes included with the patch:
>    1. New vector IR creation in the C2 compiler.
>    2. Auto-vectorization support.
>    3. x86 backend implementation.
>    4. New IR verification test for each newly supported vector operation.
> 
> The following are the performance numbers for Float16OperationsBenchmark:
> 
> System: Intel(R) Xeon(R) processor, code-named Granite Rapids
> Frequency fixed at 2.5 GHz
> 
> 
> Baseline
> Benchmark                                                      (vectorDim)   Mode  Cnt     Score   Error   Units
> Float16OperationsBenchmark.absBenchmark                               1024  thrpt    2  4191.787          ops/ms
> Float16OperationsBenchmark.addBenchmark                               1024  thrpt    2  1211.978          ops/ms
> Float16OperationsBenchmark.cosineSimilarityDequantizedFP16            1024  thrpt    2   493.026          ops/ms
> Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16         1024  thrpt    2   612.430          ops/ms
> Float16OperationsBenchmark.cosineSimilaritySingleRoundingFP16         1024  thrpt    2   616.012          ops/ms
> Float16OperationsBenchmark.divBenchmark                               1024  thrpt    2   604.882          ops/ms
> Float16OperationsBenchmark.dotProductFP16                             1024  thrpt    2   410.798          ops/ms
> Float16OperationsBenchmark.euclideanDistanceDequantizedFP16           1024  thrpt    2   602.863          ops/ms
> Float16OperationsBenchmark.euclideanDistanceFP16                      1024  thrpt    2   640.348          ops/ms
> Float16OperationsBenchmark.fmaBenchmark                               1024  thrpt    2   809.175          ops/ms
> Float16OperationsBenchmark.getExponentBenchmark                       1024  thrpt    2  2682.764          ops/ms
> Float16OperationsBenchmark.isFiniteBenchmark                          1024  thrpt    2  3373.901          ops/ms
> Float16OperationsBenchmark.isFiniteCMovBenchmark                      1024  thrpt    2  1881.652          ops/ms
> Float16OperationsBenchmark.isFiniteStoreBenchmark                     1024  thrpt    2  2273.745          ops/ms
> Float16OperationsBenchmark.isInfiniteBenchmark                        1024  thrpt    2  2147.913          ops/ms
> Float16OperationsBenchmark.isInfiniteCMovBenchmark                    1024  thrpt    2  1962.579          ops/ms...
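
For readers unfamiliar with the kind of scalar loop the description refers to, here is a minimal sketch of an element-wise FP16 add over arrays of raw binary16 bit patterns. It is not taken from the patch or from Float16OperationsBenchmark: the class and method names are hypothetical, and it uses the standard java.lang.Float conversion methods (Java 20 and later) as a stand-in for the incubating Float16 type. Whether C2 vectorizes this exact loop shape is not something the sketch claims.

```java
// Hypothetical illustration only; not code from the pull request.
final class Float16AddSketch {
    private Float16AddSketch() {}

    /**
     * Adds two arrays of IEEE 754 binary16 values, stored as raw short bits,
     * element by element. Each element is widened to float, added, and
     * rounded back to binary16.
     */
    static void addFp16(short[] a, short[] b, short[] out) {
        for (int i = 0; i < out.length; i++) {
            float x = Float.float16ToFloat(a[i]);   // binary16 bits -> float
            float y = Float.float16ToFloat(b[i]);
            out[i] = Float.floatToFloat16(x + y);   // float -> binary16 bits
        }
    }
}
```

The same loop shape applies to the other listed operations (subtract, divide, multiply, sqrt, fma) by swapping the arithmetic in the loop body.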

Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:

  Review comment resolutions

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/22755/files
  - new: https://git.openjdk.org/jdk/pull/22755/files/6f89f3f3..a25eb507

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=22755&range=08
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22755&range=07-08

  Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/22755.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/22755/head:pull/22755

PR: https://git.openjdk.org/jdk/pull/22755


More information about the hotspot-compiler-dev mailing list