[lworld+fp16] RFR: 8339473: Add support for FP16 isFinite, isInfinite and isNaN

Wed Sep 18 10:00:36 UTC 2024

On Tue, 17 Sep 2024 12:09:41 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> Hi @Bhavana-Kilambi , 
>> Did you trying modifying the Java implementation and measuring the performance with intrinsic solution ? 
>> e.g.  isNaNTest :=  (Float16.float16ToRawShortValue() & 0x7e00) == 0x7e00 etc..
>
>> Hi @jatin-bhateja , just tested the other two intrinsics as well. I do get huge performance gains (on aarch64) with the java versions. I used these -
>> 
>> ```
>> @IntrinsicCandidate
>>     public static boolean isNaN(Float16 f16) {
>>         return ((float16ToRawShortBits(f16) & 0x7e00) == 0x7e00);
>>     }
>> ```
>> 
>> ```
>> @IntrinsicCandidate
>>     public static boolean isInfinite(Float16 f16) {
>>         return ((float16ToRawShortBits(f16) ^ float16ToRawShortBits(POSITIVE_INFINITY)) & 0x7fff) == 0;
>>     }
>> ```
>> 
>> ```
>> @IntrinsicCandidate
>>     public static boolean isFinite(Float16 f16) {
>>         return !(isInfinite(f16) || isNaN(f16));
>>      }
>> ```
>> 
>> I see these gains compared to the intrinsic version uploaded in the latest patch on this PR -
>> 
>> ```
>> Benchmark               (SIZE)    Gain
>> FP16Ops.isFiniteHF      65504     1.18409353876134
>> FP16Ops.isInfiniteHF    65504     1.57611028955779
>> FP16Ops.isNaNHF         65504     1.48276431718062
>> ```
>> 
>> Do you think it's better to not have any intrinsics at all for these operations?
> 
> Thanks @Bhavana-Kilambi , yes it will be optimal, your java versions still carries @IntrinsicCandidate, hope we are not accedently running intrinsics, if that is not the case then making only a java side change should suffice as its also easily auto-vectorizatable

Hi @jatin-bhateja , I have uploaded a new patch with changes in the java implementation instead of using intrinsics for these three operations. I have new performance results with this patch here - 

Benchmark               Gain vs FP16 insns      Gain vs default
FP16Ops.isFiniteHF      1.64                    2.36
FP16Ops.isInfiniteHF    1.56                    2.97
FP16Ops.isNaNHF	        1.49                    1.62

The second column shows the ratio of thrpt between this patch and theprevious one which implements these operations as intrinsics (either in FP16 or integer arithmetic) and the third column is the ratio of thrpt between this patch and the default JDK implementation which requires the result to first be converted to FP32 before computing the result.

Please review. Thanks!

-------------

PR Comment: https://git.openjdk.org/valhalla/pull/1239#issuecomment-2358025460