[lworld+fp16] RFR: 8341003: [lworld+fp16] Benchmarks for various Float16 operations [v2]

Bhavana Kilambi bkilambi at openjdk.org
Thu Sep 26 15:41:52 UTC 2024


On Thu, 26 Sep 2024 10:42:14 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Update benchmark
>
> Hi @Bhavana-Kilambi , I see vector IR in almost all the micros apart from three, i.e. isNaN, isFinite and isInfinity, with the following command
> 
> `numactl --cpunodebind=1 -l java -jar target/benchmarks.jar -jvmArgs "-XX:+TraceNewVectors" -p vectorDim=512 -f 1 -i 2 -wi 1 -w 30 org.openjdk.bench.java.lang.Float16OpsBenchmark.<BM_NAME>`
> 
> This indicates that the Java implementation in those cases is not getting auto-vectorized. We didn't have benchmarks earlier; after tuning we can verify with this new benchmark.
> 
> Kindly let me know if the micro looks good, and I can integrate it.

Hi @jatin-bhateja , thanks for doing the micros.
Can I please ask why you are benchmarking/testing the cosine similarity kernels specifically? Are there any real-world use cases for FP16 similar to these, for which you have written these smaller benchmark kernels?

Also, regarding the performance results you posted for the Intel machine, have you compared it with anything else (like the default FP32 implementation/case without the intrinsics or the scalar FP16 version) so that we can better interpret the scores?
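For readers following the thread, a rough sketch of what a scalar cosine-similarity kernel of the kind being discussed might look like. This is purely illustrative and not taken from the PR: it uses plain `float` arithmetic (the FP32 baseline mentioned above), since `java.lang.Float16` exists only on the lworld+fp16 branch; the class and method names here are hypothetical.

```java
// Hypothetical FP32 baseline for a cosine-similarity benchmark kernel.
// The actual Float16 micros in the PR may differ; this only shows the
// shape of the computation being benchmarked.
public class CosineSimilaritySketch {
    static float cosineSimilarity(float[] a, float[] b) {
        float dot = 0f, normA = 0f, normB = 0f;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];      // dot product
            normA += a[i] * a[i];    // squared norm of a
            normB += b[i] * b[i];    // squared norm of b
        }
        return dot / (float) Math.sqrt((double) normA * normB);
    }

    public static void main(String[] args) {
        float[] v = {1f, 2f, 3f};
        System.out.println(cosineSimilarity(v, v)); // ~1.0 for identical vectors
    }
}
```

An FP16 variant would perform the same loop over `Float16` values (converting through `float` where the API requires it), which is what makes the auto-vectorization behaviour observed with `-XX:+TraceNewVectors` interesting to compare against this baseline.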

-------------

PR Comment: https://git.openjdk.org/valhalla/pull/1254#issuecomment-2377319124
