[lworld+fp16] RFR: 8330021: AArch64: Add backend support for FP16 add operation [v4]

Fri Jul 5 14:13:27 UTC 2024

On Fri, 5 Jul 2024 09:01:45 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> Bhavana Kilambi has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits:
>> 
>>  - Merge branch 'lworld+fp16'
>>  - Remove couple of IR testcases + add --enable-preview flag to tests
>>  - Merge branch 'lworld+fp16'
>>  - Add a missing return statement in the ad file
>>  - 8330021: AArch64: Add backend support for FP16 add operation
>>    
>>    This commit [1] adds initial support for FP16 operations and adds
>>    backend support for FP16 add operation for X86. This task adds backend
>>    support for scalar and vector FP16 add operation on aarch64.
>>    
>>    [1] https://github.com/openjdk/valhalla/commit/f03fb4e4ee4d59ed692d0c26ddce260511f544e7#diff-a799ce8da7f3062bb3699beb65aae504840c649942032e325c2a50f88a2869ad
>
> src/hotspot/share/opto/vectornode.cpp line 595:
> 
>> 593:   case Op_ConvF2HF:
>> 594:   case Op_ReinterpretS2HF:
>> 595:   case Op_ReinterpretHF2S:
> 
> ReinterpretHF2S carry a short type, does not qualify as float16 node.
> 
> Follwing codes should handle it.
> 
> https://github.com/openjdk/valhalla/blob/779d77f04810c137bb6a3374608b80cbe5cc9ef3/src/hotspot/share/opto/superword.cpp#L3172

I actually stumbled upon a test failure when I was testing this pattern - `dst (ConvHF2F (ReinterpretHF2S src))` using this test case - 
 ```
// fin[] -> Float16 array
// flout[] -> Float array
public static void fp16Add() {
        for (int i = 0; i < SIZE; i++) {
            flout[i] = Float16.sum(fin[i], fin[i]).floatValue();
}

and with a debug build the test fails with the following error -

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (/home/bhakil01/valhalla_gerrit/valhalla/src/hotspot/share/opto/vectornode.cpp:1423), pid=2559308, tid=2559322
#  Error: assert(bt == T_SHORT) failed
#
# JRE version: OpenJDK Runtime Environment (23.0) (fastdebug build 23-internal-adhoc.bhakil01.valhalla)
# Java VM: OpenJDK 64-Bit Server VM (fastdebug 23-internal-adhoc.bhakil01.valhalla, mixed mode, compressed oops, compressed class ptrs, g1 gc, linux-aarch64)
# Problematic frame:
# V  [libjvm.so+0x18d00d8]  VectorCastNode::opcode(int, BasicType, bool)+0xf4
#

This happens because during auto-vectorization, the container type of `ReinterpretHF2S` is computed as `T_INT` and not `T_SHORT` while the input to ConvHF2F is supposed to be T_SHORT which is why the assertion fails here - https://github.com/openjdk/valhalla/blob/779d77f04810c137bb6a3374608b80cbe5cc9ef3/src/hotspot/share/opto/vectornode.cpp#L1422

The container type for `ReinterpretHF2S` is computed as `TypeInt::INT` here - https://github.com/openjdk/valhalla/blob/779d77f04810c137bb6a3374608b80cbe5cc9ef3/src/hotspot/share/opto/superword.cpp#L3352 which is propagating down to the VectorCastNode and leading to the assertion failure.

So I have included this node in `is_float16_node()` to return `TypeInt::SHORT` for `ReinterpretHF2S`.

If you feel this is not the right place for this node, maybe I can do something like - 

if (VectorNode::is_float16_node(n->Opcode()) || Op_ReinterpretHF2S) {
    return TypeInt::SHORT;
  }

-------------

PR Review Comment: https://git.openjdk.org/valhalla/pull/1096#discussion_r1666844923