RFR: 8342103: C2 compiler support for Float16 type and associated operations [v2]

Mon Nov 25 08:59:26 UTC 2024

On Fri, 22 Nov 2024 10:36:10 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> Hi All,
>> 
>> This patch adds C2 compiler support for various Float16 operations added by [PR#22128](https://github.com/openjdk/jdk/pull/22128)
>> 
>> Following is the summary of changes included with this patch:-
>> 
>> 1. Detection of various Float16 operations through inline expansion or pattern folding idealizations.
>> 2. Float16 operations like add, sub, mul, div, max, and min are inferred through pattern folding idealization.
>> 3. Float16 SQRT and FMA operation are inferred through inline expansion and their corresponding entry points are defined in the newly added Float16Math class.
>>       -    These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values.
>> 5. New specialized IR nodes for Float16 operations, associated idealizations, and constant folding routines.
>> 6. New Ideal type for constant and non-constant Float16 IR nodes. Please refer to [FAQs ](https://github.com/openjdk/jdk/pull/21490#issuecomment-2482867818)for more details.
>> 7. Since Float16 uses short as its storage type, hence raw FP16 values are always loaded into general purpose register, but FP16 ISA instructions generally operate over floating point registers, therefore compiler injectes reinterpretation IR before and after Float16 operation nodes to move short value to floating point register and vice versa.
>> 8. New idealization routines to optimize redundant reinterpretation chains. HF2S + S2HF = HF
>> 6. Auto-vectorization of newly supported scalar operations.
>> 7. X86 and AARCH64 backend implementation for all supported intrinsics.
>> 9. Functional and Performance validation tests.
>> 
>> **Missing Pieces:-**
>> **-  AARCH64 Backend.**
>> 
>> Kindly review and share your feedback.
>> 
>> Best Regards,
>> Jatin
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Testpoints for new value transforms + code cleanups

I heard no argument about why you did not split this up. Please do that in the future. It is hard to review well when there is this much code. If it is really necessary, then sure. Here it does not seem necessary to deliver all at once.

> The patch includes IR framework-based scalar constant folding test points.
You mention this IR test:
https://github.com/openjdk/jdk/pull/21490/files#diff-3f8786f9f62662eda4b4a5c76c01fa04534c94d870d496501bfc20434ad45579R169-R174

Here I only see the use of very trivial values. I think we need more complicated cases.

What about these:
- Add/Sub/Mul/Div/Min/Max ... with NaN and infinity.
- Same where it would overflow the FP16 range.
- Negative zero tests.
- Division by powers of 2.

It would for example be nice if you could iterate over all inputs. FP16 with 2 inputs is only 32bits, that can be iterated in just a few seconds. Then you can run the computation with constants in the interpreter, and compare to the results in compiled code.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21490#issuecomment-2497315686