[lworld+fp16] RFR: 8341003: [lworld+fp16] Benchmarks for various Float16 operations [v2]
Jatin Bhateja
jbhateja at openjdk.org
Fri Sep 27 07:09:00 UTC 2024
On Thu, 26 Sep 2024 10:42:14 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:
>> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Update benchmark
>
> Hi @Bhavana-Kilambi , I see vector IR in almost all the micros apart from three, i.e. isNaN, isFinite and isInfinity, with the following command
>
> `numactl --cpunodebind=1 -l java -jar target/benchmarks.jar -jvmArgs "-XX:+TraceNewVectors" -p vectorDim=512 -f 1 -i 2 -wi 1 -w 30 org.openjdk.bench.java.lang.Float16OpsBenchmark.<BM_NAME>`
>
> This indicates that the Java implementation in those cases is not getting auto-vectorized. We didn't have benchmarks earlier; after tuning we can verify with this new benchmark.
>
> Kindly let me know if the micro looks good, and I can integrate it.
> Hi @jatin-bhateja , thanks for doing the micros. Can I please ask why you are benchmarking/testing the cosine similarity tests specifically? Are there any real-world use cases similar to these for FP16 for which you have written these smaller benchmark kernels?
>
> Also, regarding the performance results you posted for the Intel machine, have you compared it with anything else (like the default FP32 implementation for FP16/case without the intrinsics or the scalar FP16 version) so that we can better interpret the scores?
Hi @Bhavana-Kilambi , This patch adds **micro benchmarks** for all Float16 APIs optimized up till now.
The **macro-benchmarks** demonstrate a use case for low-precision semantic search primitives.
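For readers unfamiliar with the macro-benchmark's workload, here is a minimal sketch of the kind of cosine-similarity kernel used in semantic search. Plain `float` is used as a stand-in, since the Valhalla `Float16` value class and its exact API are not assumed here; the class and method names below are illustrative only, not taken from the patch.

```java
// Hypothetical sketch: cosine similarity between two embedding vectors,
// the core primitive of semantic search. In the actual macro-benchmark the
// arithmetic would be done in half precision (Float16); plain float is used
// here as a stand-in so the sketch compiles on a stock JDK.
public class CosineSimilaritySketch {

    static float cosineSimilarity(float[] a, float[] b) {
        float dot = 0f, normA = 0f, normB = 0f;
        for (int i = 0; i < a.length; i++) {
            dot   += a[i] * b[i];   // dot product
            normA += a[i] * a[i];   // squared magnitude of a
            normB += b[i] * b[i];   // squared magnitude of b
        }
        return dot / (float) (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        float[] query = {1f, 2f, 3f};
        float[] doc   = {1f, 2f, 3f};
        // Identical vectors have similarity close to 1.0
        System.out.println(cosineSimilarity(query, doc));
    }
}
```

This loop shape (multiply-accumulate over contiguous arrays) is exactly the pattern the auto-vectorizer is expected to turn into vector IR, which is what the `-XX:+TraceNewVectors` run above checks for.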
-------------
PR Comment: https://git.openjdk.org/valhalla/pull/1254#issuecomment-2378550615