[lworld+fp16] RFR: 8341003: [lworld+fp16] Benchmarks for various Float16 operations [v2]

Bhavana Kilambi bkilambi at openjdk.org
Fri Sep 27 07:50:53 UTC 2024


On Fri, 27 Sep 2024 07:06:18 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> Hi @Bhavana-Kilambi , I see vector IR in almost all the micros apart from three, i.e. isNaN, isFinite and isInfinity, with the following command
>> 
>> `numactl --cpunodebind=1 -l java -jar target/benchmarks.jar -jvmArgs "-XX:+TraceNewVectors" -p vectorDim=512 -f 1 -i 2 -wi 1 -w 30 org.openjdk.bench.java.lang.Float16OpsBenchmark.<BM_NAME>`
>> 
>> This indicates that the Java implementation in those cases is not getting auto-vectorized. We didn't have benchmarks earlier; after tuning we can verify with this new benchmark.
>> 
>> Kindly let me know if the micro looks good, and I can integrate it.
>
>> Hi @jatin-bhateja , thanks for doing the micros. Can I please ask why you are benchmarking/testing the cosine similarity tests specifically? Are there any real-world use cases similar to these for FP16, for which you have written these smaller benchmark kernels?
>> 
>> Also, regarding the performance results you posted for the Intel machine, have you compared them with anything else (like the default FP32 implementation for FP16, the case without the intrinsics, or the scalar FP16 version) so that we can better interpret the scores?
> 
> Hi @Bhavana-Kilambi , This patch adds **micro benchmarks** for all Float16 APIs optimized so far.
> The **macro-benchmarks** demonstrate a use case for low-precision semantic search primitives.

@jatin-bhateja , Thanks! While we are on the topic, can I ask if there are any real-world use cases or workloads that you are targeting with the FP16 work, and on which you perhaps plan to do performance testing in the future?
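[Editor's note: for readers unfamiliar with the semantic-search kernel discussed above, the shape of a half-precision cosine-similarity routine can be sketched as below. This is a minimal illustration, not the benchmark's actual code: the `Fp16Cosine` class name is hypothetical, and it represents FP16 values as `short` bit patterns converted with the JDK's standard `Float.float16ToFloat`/`Float.floatToFloat16` (JDK 20+) rather than the Valhalla `Float16` API under discussion, whose operations would replace the conversions here.]

```java
// Sketch: cosine similarity over half-precision vectors stored as
// IEEE 754 binary16 bit patterns in short[]. Uses the JDK's real
// Float.float16ToFloat / Float.floatToFloat16 conversions (JDK 20+);
// the Valhalla Float16 API would operate on Float16 values directly.
public class Fp16Cosine {

    static float cosineSimilarity(short[] a, short[] b) {
        float dot = 0f, normA = 0f, normB = 0f;
        for (int i = 0; i < a.length; i++) {
            // Widen each binary16 element to float for the arithmetic.
            float x = Float.float16ToFloat(a[i]);
            float y = Float.float16ToFloat(b[i]);
            dot += x * y;
            normA += x * x;
            normB += y * y;
        }
        return dot / (float) Math.sqrt((double) normA * normB);
    }

    public static void main(String[] args) {
        short one = Float.floatToFloat16(1.0f);
        short[] v = {one, one, one, one};
        // Identical vectors have similarity 1.0.
        System.out.println(cosineSimilarity(v, v));
    }
}
```

The per-element convert-then-compute loop is exactly the pattern the auto-vectorizer needs to recognize for the FP16 intrinsics to pay off.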

-------------

PR Comment: https://git.openjdk.org/valhalla/pull/1254#issuecomment-2378632353

