[lworld+fp16] RFR: 8339473: Add support for FP16 isFinite, isInfinite and isNaN
Jatin Bhateja
jbhateja at openjdk.org
Thu Sep 26 03:38:51 UTC 2024
On Tue, 17 Sep 2024 12:09:41 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:
>> Hi @Bhavana-Kilambi ,
>> Did you trying modifying the Java implementation and measuring the performance with intrinsic solution ?
>> e.g. isNaNTest := (Float16.float16ToRawShortValue() & 0x7e00) == 0x7e00 etc..
>
>> Hi @jatin-bhateja , just tested the other two intrinsics as well. I do get huge performance gains (on aarch64) with the java versions. I used these -
>>
>> ```
>> @IntrinsicCandidate
>> public static boolean isNaN(Float16 f16) {
>> return ((float16ToRawShortBits(f16) & 0x7e00) == 0x7e00);
>> }
>> ```
>>
>> ```
>> @IntrinsicCandidate
>> public static boolean isInfinite(Float16 f16) {
>> return ((float16ToRawShortBits(f16) ^ float16ToRawShortBits(POSITIVE_INFINITY)) & 0x7fff) == 0;
>> }
>> ```
>>
>> ```
>> @IntrinsicCandidate
>> public static boolean isFinite(Float16 f16) {
>> return !(isInfinite(f16) || isNaN(f16));
>> }
>> ```
>>
>> I see these gains compared to the intrinsic version uploaded in the latest patch on this PR -
>>
>> ```
>> Benchmark (SIZE) Gain
>> FP16Ops.isFiniteHF 65504 1.18409353876134
>> FP16Ops.isInfiniteHF 65504 1.57611028955779
>> FP16Ops.isNaNHF 65504 1.48276431718062
>> ```
>>
>> Do you think it's better to not have any intrinsics at all for these operations?
>
> Thanks @Bhavana-Kilambi , yes it will be optimal, your java versions still carries @IntrinsicCandidate, hope we are not accedently running intrinsics, if that is not the case then making only a java side change should suffice as its also easily auto-vectorizatable
> Hi @jatin-bhateja , Hope the changes you suggested for the JTREG testcase are ok? I am marking it ready for integration -
>
> /integrate
Hi @Bhavana-Kilambi , I think we can take same approach for unary operations - Float16.negate and Float16.abs and remove explicit intrinsification, java side implimentation is auto-vectorizable and performance should be at par with explicit intrinsic.
-------------
PR Comment: https://git.openjdk.org/valhalla/pull/1239#issuecomment-2375792389
More information about the valhalla-dev
mailing list