[lworld+fp16] RFR: 8339473: Add support for FP16 isFinite, isInfinite and isNaN
Bhavana Kilambi
bkilambi at openjdk.org
Wed Sep 18 10:00:36 UTC 2024
On Tue, 17 Sep 2024 12:09:41 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:
>> Hi @Bhavana-Kilambi ,
>> Did you trying modifying the Java implementation and measuring the performance with intrinsic solution ?
>> e.g. isNaNTest := (Float16.float16ToRawShortValue() & 0x7e00) == 0x7e00 etc..
>
>> Hi @jatin-bhateja , just tested the other two intrinsics as well. I do get huge performance gains (on aarch64) with the java versions. I used these -
>>
>> ```
>> @IntrinsicCandidate
>> public static boolean isNaN(Float16 f16) {
>> return ((float16ToRawShortBits(f16) & 0x7e00) == 0x7e00);
>> }
>> ```
>>
>> ```
>> @IntrinsicCandidate
>> public static boolean isInfinite(Float16 f16) {
>> return ((float16ToRawShortBits(f16) ^ float16ToRawShortBits(POSITIVE_INFINITY)) & 0x7fff) == 0;
>> }
>> ```
>>
>> ```
>> @IntrinsicCandidate
>> public static boolean isFinite(Float16 f16) {
>> return !(isInfinite(f16) || isNaN(f16));
>> }
>> ```
>>
>> I see these gains compared to the intrinsic version uploaded in the latest patch on this PR -
>>
>> ```
>> Benchmark (SIZE) Gain
>> FP16Ops.isFiniteHF 65504 1.18409353876134
>> FP16Ops.isInfiniteHF 65504 1.57611028955779
>> FP16Ops.isNaNHF 65504 1.48276431718062
>> ```
>>
>> Do you think it's better to not have any intrinsics at all for these operations?
>
> Thanks @Bhavana-Kilambi , yes it will be optimal, your java versions still carries @IntrinsicCandidate, hope we are not accedently running intrinsics, if that is not the case then making only a java side change should suffice as its also easily auto-vectorizatable
Hi @jatin-bhateja , I have uploaded a new patch with changes in the java implementation instead of using intrinsics for these three operations. I have new performance results with this patch here -
Benchmark Gain vs FP16 insns Gain vs default
FP16Ops.isFiniteHF 1.64 2.36
FP16Ops.isInfiniteHF 1.56 2.97
FP16Ops.isNaNHF 1.49 1.62
The second column shows the ratio of thrpt between this patch and theprevious one which implements these operations as intrinsics (either in FP16 or integer arithmetic) and the third column is the ratio of thrpt between this patch and the default JDK implementation which requires the result to first be converted to FP32 before computing the result.
Please review. Thanks!
-------------
PR Comment: https://git.openjdk.org/valhalla/pull/1239#issuecomment-2358025460
More information about the valhalla-dev
mailing list