[lworld+fp16] RFR: 8339473: Add support for FP16 isFinite, isInfinite and isNaN
Bhavana Kilambi
bkilambi at openjdk.org
Mon Sep 16 11:45:24 UTC 2024
On Thu, 12 Sep 2024 08:12:18 GMT, Bhavana Kilambi <bkilambi at openjdk.org> wrote:
>> This patch adds intrinsic support for FP16 isNaN, isFinite and isInfinite methods and also adds aarch64 backend for these intrinsics.
>>
>> Tested all FP16 related tests successfully on aarch64.
>
> Hi @jatin-bhateja , I have implemented these intrinsics in floating point in this patch. I also have another version where I tried implementing isInfinite in integer arithmetic for aarch64. I have created a draft PR here - https://github.com/openjdk/valhalla/pull/1242/commits/e095b2d47a4c28bece8fc0fcbc656da713feb132
>
> I avoided generating the ReinterpretS2HF nodes in the mid-end which will make the "src" to be in a GPR and made the implementation in integer arithmetic easier and resulted in significant gains which can be seen in the commit message of the draft PR.
>
> I have also tried implementing similar integer logic on x86 (in the draft PR) and I could see ~2.6x better performance than the case without the intrinsic (I have however not tried to implement the vpclass instruction for FP16 and compare performance).
>
> Keeping the RenterpretS2HF nodes would mean that we need to move the `src` from an FPR to a GPR in the backend, before executing the integer operations which proved to be costly and not very beneficial.
>
> What do you think of not using ReinterpretS2HF node for these intrinsics and keeping the `src` in a GPR instead? I know that technically FP16 is floating point (even if storage is `short`) and we have always implemented the `src` in an FPR so as to generate FP instructions but I was wondering if this tweak is ok if we see good performance gains? Looking forward to hearing your feedback. Thanks!
> Hi @Bhavana-Kilambi , Did you trying modifying the Java implementation and measuring the performance with intrinsic solution ? e.g. isNaNTest := (Float16.float16ToRawShortValue() & 0x7e00) == 0x7e00 etc..
Hi @jatin-bhateja , I just tried for isNaN and the Java version is 13% faster than intrinsic (both with FP16 instructions and integer arithmetic). Do you think it's better to default to the java version instead of using intrinsics? I havent' tried for isFinite and isInfinite.
-------------
PR Comment: https://git.openjdk.org/valhalla/pull/1239#issuecomment-2352683852
More information about the valhalla-dev
mailing list