RFR: 8345125: Aarch64: Add aarch64 backend for Float16 scalar operations [v2]

Wed Feb 26 10:27:02 UTC 2025

On Wed, 26 Feb 2025 08:49:58 GMT, Bhavana Kilambi <bkilambi at openjdk.org> wrote:

>> src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 2097:
>> 
>>> 2095: 
>>> 2096:   // Half-precision floating-point instructions
>>> 2097:   INSN(fabdh,  0b011, 0b11, 0b000101, 0b0);
>> 
>> I suppose `fadbh` and `fnmulh` are added to keep aligned with the float and double ones, i.e. `fabd(s|d)` and `fnmul(s|d)`.
>> 
>> 
>> I noticed that there are matching rules for `fabd(s|d)`, i.e. `absd(F|D)_reg`. I wonder if we need add the corresponding rule for fp16 here?
>
> Hi @shqking , thanks for your review comments. Yes I added `fabdh` and `fnmulh` to keep aligned with float and double types.
> For adding support for FP16 `absd` we need `AbsHF` to be supported (along with SubHF) but `AbsHF` node is not implemented currently. `abs` operation is directly executed from the java code here - https://github.com/openjdk/jdk/blob/037e47112bdf2fa2324f7c58198f6d433f17d9fd/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/Float16.java#L1464 and is not intrinsified or pattern matched like other FP16 operations. Same with `negate` operation for FP16 - https://github.com/openjdk/jdk/blob/037e47112bdf2fa2324f7c58198f6d433f17d9fd/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/Float16.java#L1449
> On the Valhalla repo, while these operation were being developed, I tried adding support for `AbsHF/NegHF` which emitted `fabs` and `fneg` instructions but the performance with the direct java code(bit manipulation operations) was much faster (sorry don't remember the exact number) so we decided to go with the java implementation instead.
> I still added `fabd` here because `op21` is 0 only in `fabd` H variant and felt that it'd be better to handle it here as it belongs to this group of instructions. Please let me know your thoughts.

According to the RM, fabd is in _Advanced SIMD scalar three same FP16_, but the rest are in _Floating-point data-processing (2 source)_.  The decoding scheme looks rather different.`fabd`, then, doesn't really fit here, but in a section with the rest of the three same FP16 instructions.
The encoding scheme for _Advanced SIMD scalar three same FP16_ is pretty simple, so I suggest you create a new group for them, and put `fabd` in there.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23748#discussion_r1971330062