[lworld+fp16] RFR: 8330021: AArch64: Add backend support for FP16 add operation [v3]

Fri Jun 21 08:40:26 UTC 2024

On Fri, 21 Jun 2024 06:09:54 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> Hello @jatin-bhateja , I noticed that the vector test - `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorSum.java` fails on aarch64/x86 machines that support the Float16 feature. Have you noticed this on any of your `avx512_fp16` supporting machines too? I suspect this failure does not show up on github actions for this PR or others because they might be running the tests on non-fp16 supporting hardware.
>> 
>> I digged a bit deeper and found out that the Float16 `add` operation was not getting unrolled in the first place because of which it doesn't get vectorized eventually. I have an internal implementation for other binary FP16 operations (like subtract, multiply, divide, min and max) and none of these operations get vectorized either. After some debugging, I noticed that the inner loop body -> `_body` at this point of code execution - `https://github.com/openjdk/valhalla/blob/404a4fae6fc099cd93cc73d57e29b08b5418adb9/src/hotspot/share/opto/loopTransform.cpp#L1069` contains 188 nodes for Half float add compared to 29 nodes for Float32 add and 31 nodes for FP16 add before the merge on June 9th. I think due to so many nodes in the loop, the unrolling policy decides not to unroll it. 
>> I printed the extra nodes that the loop (for FP16 add operation) contains and I can see nodes like - `DecodeNKlass`, `DecodeN`, `LoadNKlass`, `LoadN`, `CmpP`, `StoreP`, `StoreCM`, `MergeMem`, `EncodeP`, `CastPP` and other pointer related IR nodes and maybe some nodes related to decoding compressed class pointers. Although none of these exist in the FP32 code or the FP16 code before the merge. Do you have any idea about this failure or the extra nodes in the loop?
>
>> Hello @jatin-bhateja , I noticed that the vector test - `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorSum.java` fails on aarch64/x86 machines that support the Float16 feature. Have you noticed this on any of your `avx512_fp16` supporting machines too? I suspect this failure does not show up on github actions for this PR or others because they might be running the tests on non-fp16 supporting hardware.
>> 
>> I digged a bit deeper and found out that the Float16 `add` operation was not getting unrolled in the first place because of which it doesn't get vectorized eventually. I have an internal implementation for other binary FP16 operations (like subtract, multiply, divide, min and max) and none of these operations get vectorized either. After some debugging, I noticed that the inner loop body -> `_body` at this point of code execution - `https://github.com/openjdk/valhalla/blob/404a4fae6fc099cd93cc73d57e29b08b5418adb9/src/hotspot/share/opto/loopTransform.cpp#L1069` contains 188 nodes for Half float add compared to 29 nodes for Float32 add and 31 nodes for FP16 add before the merge on June 9th. I think due to so many nodes in the loop, the unrolling policy decides not to unroll it. I printed the extra nodes that the loop (for FP16 add operation) contains and I can see nodes like - `DecodeNKlass`, `DecodeN`, `LoadNKlass`, `LoadN`, `CmpP`, `StoreP`, `StoreCM`, `MergeMem`, `EncodeP`, `Cast
 PP` and other pointer related IR nodes and maybe some nodes related to decoding compressed class pointers. Although none of these exist in the FP32 code or the FP16 code before the merge. Do you have any idea about this failure or the extra nodes in the loop?
> 
> Hi @Bhavana-Kilambi , 
> 
> Flat array layout is currently broken, it used to work when Float16 was a primitive value class, please refer following link for more details.
> https://github.com/openjdk/valhalla/pull/1117#issuecomment-2161269886
> 
> I have already created a RFE for it https://bugs.openjdk.org/browse/JDK-8333852 and working on the same.
> 
> Best Regards,
> Jatin

Thanks @jatin-bhateja. I seem to have missed this RFE. I get it now.

-------------

PR Comment: https://git.openjdk.org/valhalla/pull/1096#issuecomment-2182283500