[lworld+fp16] RFR: 8339473: Add support for FP16 isFinite, isInfinite and isNaN
Bhavana Kilambi
bkilambi at openjdk.org
Thu Sep 12 08:15:18 UTC 2024
On Wed, 11 Sep 2024 08:37:38 GMT, Bhavana Kilambi <bkilambi at openjdk.org> wrote:
> This patch adds intrinsic support for FP16 isNaN, isFinite and isInfinite methods and also adds aarch64 backend for these intrinsics.
>
> Tested all FP16 related tests successfully on aarch64.
Hi @jatin-bhateja , I have implemented these intrinsics in floating point in this patch. I also have another version where I tried implementing isInfinite in integer arithmetic for aarch64. I have created a draft PR here - https://github.com/openjdk/valhalla/pull/1242/commits/e095b2d47a4c28bece8fc0fcbc656da713feb132
I avoided generating the ReinterpretS2HF nodes in the mid-end which will make the "src" to be in a GPR and made the implementation in integer arithmetic easier and resulted in significant gains which can be seen in the commit message of the draft PR.
I have also tried implementing similar integer logic on x86 (in the draft PR) and I could see ~2.6x better performance than the case without the intrinsic (I have however not tried to implement the vpclass instruction for FP16 and compare performance).
Keeping the RenterpretS2HF nodes would mean that we need to move the `src` from an FPR to a GPR in the backend, before executing the integer operations which proved to be costly and not very beneficial.
What do you think of not using ReinterpretS2HF node for these intrinsics and keeping the `src` in a GPR instead? I know that technically FP16 is floating point (even if storage is `short`) and we have always implemented the `src` in an FPR so as to generate FP instructions but I was wondering if this tweak is ok if we see good performance gains? Looking forward to hearing your feedback. Thanks!
-------------
PR Comment: https://git.openjdk.org/valhalla/pull/1239#issuecomment-2345562624
More information about the valhalla-dev
mailing list