[lworld+fp16] RFR: 8341003: [lworld+fp16] Benchmarks for various Float16 operations [v2]
Jatin Bhateja
jbhateja at openjdk.org
Fri Sep 27 07:53:52 UTC 2024
On Fri, 27 Sep 2024 07:06:18 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:
> > Hi @jatin-bhateja , thanks for doing the micros. Can I please ask why are you benchmarking/testing the cosine similarity tests specifically? Are there any real world usecases which are similar to these for FP16 for which you have written these smaller benchmark kernels?
> > Also, regarding the performance results you posted for the Intel machine, have you compared it with anything else (like the default FP32 implementation for FP16/case without the intrinsics or the scalar FP16 version) so that we can better interpret the scores?
>
> Hi @Bhavana-Kilambi , This patch adds **micro benchmarks** for all Float16 APIs optimized so far. The **macro-benchmarks** demonstrate use cases for low-precision semantic search primitives.
Hey, for the baseline we should not pass --enable-preview, since omitting it prevents the following:
- Flat layout of Float16 arrays.
- Creation of the Valhalla-specific IR needed for intrinsification.
Here are the first baseline numbers, without --enable-preview.
Benchmark                                               (vectorDim)   Mode  Cnt      Score  Error  Units
Float16OpsBenchmark.absBenchmark                               1024  thrpt    2     99.424         ops/ms
Float16OpsBenchmark.addBenchmark                               1024  thrpt    2     97.498         ops/ms
Float16OpsBenchmark.cosineSimilarityDequantizedFP16            1024  thrpt    2    525.360         ops/ms
Float16OpsBenchmark.cosineSimilarityDoubleRoundingFP16         1024  thrpt    2     51.132         ops/ms
Float16OpsBenchmark.cosineSimilaritySingleRoundingFP16         1024  thrpt    2     46.921         ops/ms
Float16OpsBenchmark.divBenchmark                               1024  thrpt    2     97.186         ops/ms
Float16OpsBenchmark.euclideanDistanceDequantizedFP16           1024  thrpt    2    583.051         ops/ms
Float16OpsBenchmark.euclideanDistanceFP16                      1024  thrpt    2     56.133         ops/ms
Float16OpsBenchmark.fmaBenchmark                               1024  thrpt    2     81.386         ops/ms
Float16OpsBenchmark.getExponentBenchmark                       1024  thrpt    2   2257.619         ops/ms
Float16OpsBenchmark.isFiniteBenchmark                          1024  thrpt    2   3086.476         ops/ms
Float16OpsBenchmark.isInfiniteBenchmark                        1024  thrpt    2   1718.411         ops/ms
Float16OpsBenchmark.isNaNBenchmark                             1024  thrpt    2   1685.557         ops/ms
Float16OpsBenchmark.maxBenchmark                               1024  thrpt    2     92.078         ops/ms
Float16OpsBenchmark.minBenchmark                               1024  thrpt    2     63.377         ops/ms
Float16OpsBenchmark.mulBenchmark                               1024  thrpt    2     98.202         ops/ms
Float16OpsBenchmark.negateBenchmark                            1024  thrpt    2     98.158         ops/ms
Float16OpsBenchmark.sqrtBenchmark                              1024  thrpt    2     83.760         ops/ms
Float16OpsBenchmark.subBenchmark                               1024  thrpt    2     98.200         ops/ms
The following are the numbers where we allow the flat array layout and only disable the intrinsics (-XX:DisableIntrinsics=<INTIN_ID>+).
Benchmark                                               (vectorDim)   Mode  Cnt      Score  Error  Units
Float16OpsBenchmark.absBenchmark                               1024  thrpt    2  25978.876         ops/ms
Float16OpsBenchmark.addBenchmark                               1024  thrpt    2   6406.685         ops/ms
Float16OpsBenchmark.cosineSimilarityDequantizedFP16            1024  thrpt    2    528.877         ops/ms
Float16OpsBenchmark.cosineSimilarityDoubleRoundingFP16         1024  thrpt    2     76.680         ops/ms
Float16OpsBenchmark.cosineSimilaritySingleRoundingFP16         1024  thrpt    2     53.692         ops/ms
Float16OpsBenchmark.divBenchmark                               1024  thrpt    2   3227.037         ops/ms
Float16OpsBenchmark.euclideanDistanceDequantizedFP16           1024  thrpt    2    740.490         ops/ms
Float16OpsBenchmark.euclideanDistanceFP16                      1024  thrpt    2     83.747         ops/ms
Float16OpsBenchmark.fmaBenchmark                               1024  thrpt    2    256.399         ops/ms
Float16OpsBenchmark.getExponentBenchmark                       1024  thrpt    2   2135.678         ops/ms
Float16OpsBenchmark.isFiniteBenchmark                          1024  thrpt    2   3916.860         ops/ms
Float16OpsBenchmark.isInfiniteBenchmark                        1024  thrpt    2   1497.417         ops/ms
Float16OpsBenchmark.isNaNBenchmark                             1024  thrpt    2   2747.704         ops/ms
Float16OpsBenchmark.maxBenchmark                               1024  thrpt    2   3625.708         ops/ms
Float16OpsBenchmark.minBenchmark                               1024  thrpt    2   3628.261         ops/ms
Float16OpsBenchmark.mulBenchmark                               1024  thrpt    2   6340.403         ops/ms
Float16OpsBenchmark.negateBenchmark                            1024  thrpt    2  25727.870         ops/ms
Float16OpsBenchmark.sqrtBenchmark                              1024  thrpt    2    157.519         ops/ms
Float16OpsBenchmark.subBenchmark                               1024  thrpt    2   6404.047         ops/ms
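
For readers unfamiliar with the macro kernels named in the tables, here is a rough sketch of what a "dequantized" FP16 cosine similarity might look like. This is not the benchmark source from the patch: the class and method names are hypothetical, and it uses the standard Float.floatToFloat16 / Float.float16ToFloat conversions (JDK 20 and later) on raw short bit patterns rather than the Valhalla Float16 class. The idea is only that inputs are stored in half precision and widened to float before the arithmetic.

```java
// Hypothetical illustration only; not the code under review.
final class Fp16CosineSketch {

    // Assumed helper: store a float vector as IEEE 754 binary16 bit patterns.
    static short[] quantize(float[] v) {
        short[] out = new short[v.length];
        for (int i = 0; i < v.length; i++) {
            out[i] = Float.floatToFloat16(v[i]);
        }
        return out;
    }

    // Widen every FP16 element to float first, then accumulate in float.
    static float cosineSimilarityDequantized(short[] a, short[] b) {
        float dot = 0.0f;
        float normA = 0.0f;
        float normB = 0.0f;
        for (int i = 0; i < a.length; i++) {
            float x = Float.float16ToFloat(a[i]);
            float y = Float.float16ToFloat(b[i]);
            dot += x * y;
            normA += x * x;
            normB += y * y;
        }
        return dot / (float) Math.sqrt((double) normA * normB);
    }

    public static void main(String[] args) {
        short[] a = quantize(new float[] {1.0f, 2.0f, 3.0f});
        short[] b = quantize(new float[] {3.0f, 2.0f, 1.0f});
        System.out.println(cosineSimilarityDequantized(a, b));
    }
}
```

A variant that keeps the arithmetic in half precision would instead round after each operation, which is presumably what the single-rounding and double-rounding kernels in the tables exercise.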
-------------
PR Comment: https://git.openjdk.org/valhalla/pull/1254#issuecomment-2378638423