RFR: 8342103: C2 compiler support for Float16 type and associated scalar operations [v17]
Paul Sandoz
psandoz at openjdk.org
Mon Feb 10 21:26:25 UTC 2025
On Tue, 4 Feb 2025 10:05:09 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:
>> Hi All,
>>
>> This patch adds C2 compiler support for various Float16 operations added by [PR#22128](https://github.com/openjdk/jdk/pull/22128)
>>
>> Following is a summary of the changes included in this patch:
>>
>> 1. Detection of various Float16 operations through inline expansion or pattern folding idealizations.
>> 2. Float16 operations like add, sub, mul, div, max, and min are inferred through pattern folding idealization.
>> 3. Float16 SQRT and FMA operations are inferred through inline expansion, and their corresponding entry points are defined in the newly added Float16Math class.
>>    - These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values.
>> 4. New specialized IR nodes for Float16 operations, associated idealizations, and constant folding routines.
>> 5. New Ideal type for constant and non-constant Float16 IR nodes. Please refer to the [FAQs](https://github.com/openjdk/jdk/pull/22754#issuecomment-2543982577) for more details.
>> 6. Since Float16 uses short as its storage type, raw FP16 values are always loaded into general purpose registers, while FP16 ISAs generally operate over floating point registers; the compiler therefore injects reinterpretation IR before and after Float16 operation nodes to move the short value into a floating point register and back.
>> 7. New idealization routines to optimize redundant reinterpretation chains: HF2S + S2HF = HF.
>> 8. X86 backend implementation for all supported intrinsics.
>> 9. Functional and performance validation tests.
>>
>> Kindly review the patch and share your feedback.
>>
>> Best Regards,
>> Jatin
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>
> Fixing typos
An impressive and substantial change. I focused on the Java code. There are some small tweaks, presented in the comments, that we can make to the intrinsics to improve the expression of the code, with no impact on the intrinsic implementation.
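A small aside on the summary's note about redundant reinterpretation chains (HF2S + S2HF = HF): the identity can be sanity-checked at the Java level with the binary16 conversion methods on `java.lang.Float` (available since JDK 20). This is only a rough analogue, since it round-trips values rather than the raw register moves the compiler performs, and the class and method names here are hypothetical:

```java
public class Fp16RoundTripDemo {
    // Round-trip a binary16 bit pattern through the float domain; for any
    // non-NaN pattern this is the identity, since every binary16 value is
    // exactly representable in float.
    static short roundTrip(short bits) {
        return Float.floatToFloat16(Float.float16ToFloat(bits));
    }

    public static void main(String[] args) {
        // Exhaustively check all 2^16 bit patterns.
        for (int i = Short.MIN_VALUE; i <= Short.MAX_VALUE; i++) {
            short s = (short) i;
            if (Float.isNaN(Float.float16ToFloat(s))) {
                continue;  // NaN payloads may be canonicalized by conversion
            }
            if (roundTrip(s) != s) {
                throw new AssertionError("mismatch at 0x" + Integer.toHexString(i & 0xFFFF));
            }
        }
        System.out.println("all non-NaN binary16 patterns round-trip");
    }
}
```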
src/java.base/share/classes/jdk/internal/vm/vector/Float16Math.java line 32:
> 30: * The class {@code Float16Math} contains intrinsic entry points corresponding
> 31: * to scalar numeric operations defined in Float16 class.
> 32: * @since 25
You can remove this line, since this is an internal class.
src/java.base/share/classes/jdk/internal/vm/vector/Float16Math.java line 38:
> 36: }
> 37:
> 38: public interface Float16UnaryMathOp {
You can just use `UnaryOperator<T>`, no need for a new type; here are the updated methods you can apply to this class.
    @FunctionalInterface
    public interface TernaryOperator<T> {
        T apply(T a, T b, T c);
    }

    @IntrinsicCandidate
    public static <T> T sqrt(Class<T> box_class, T oa, UnaryOperator<T> defaultImpl) {
        assert isNonCapturingLambda(defaultImpl) : defaultImpl;
        return defaultImpl.apply(oa);
    }

    @IntrinsicCandidate
    public static <T> T fma(Class<T> box_class, T oa, T ob, T oc, TernaryOperator<T> defaultImpl) {
        assert isNonCapturingLambda(defaultImpl) : defaultImpl;
        return defaultImpl.apply(oa, ob, oc);
    }

    static boolean isNonCapturingLambda(Object o) {
        return o.getClass().getDeclaredFields().length == 0;
    }
And in `src/hotspot/share/classfile/vmIntrinsics.hpp`:
  /* Float16Math API intrinsification support */                                  \
  /* Float16 signatures */                                                        \
  do_signature(float16_unary_math_op_sig,   "(Ljava/lang/Class;"                  \
                                             "Ljava/lang/Object;"                 \
                                             "Ljava/util/function/UnaryOperator;)" \
                                             "Ljava/lang/Object;")                \
  do_signature(float16_ternary_math_op_sig, "(Ljava/lang/Class;"                  \
                                             "Ljava/lang/Object;"                 \
                                             "Ljava/lang/Object;"                 \
                                             "Ljava/lang/Object;"                 \
                                             "Ljdk/internal/vm/vector/Float16Math$TernaryOperator;)" \
                                             "Ljava/lang/Object;")                \
  do_intrinsic(_sqrt_float16, jdk_internal_vm_vector_Float16Math, sqrt_name, float16_unary_math_op_sig, F_S) \
  do_intrinsic(_fma_float16,  jdk_internal_vm_vector_Float16Math, fma_name,  float16_ternary_math_op_sig, F_S) \
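As an aside on the `isNonCapturingLambda` guard in the entry points above: on current HotSpot, captured values are stored as fields of the generated lambda class, so a non-capturing lambda's class has no declared fields. A minimal sketch of the heuristic (the class name `LambdaCaptureDemo` is hypothetical, and the observed behavior is an implementation detail of the current lambda metafactory, not a specified guarantee):

```java
import java.util.function.UnaryOperator;

public class LambdaCaptureDemo {
    // Same field-count heuristic the intrinsic entry points assert on.
    static boolean isNonCapturingLambda(Object o) {
        return o.getClass().getDeclaredFields().length == 0;
    }

    public static void main(String[] args) {
        UnaryOperator<Integer> nonCapturing = x -> x + 1;
        int k = 41;
        UnaryOperator<Integer> capturing = x -> x + k;  // captures local k
        // On current HotSpot: the first prints true, the second false.
        System.out.println(isNonCapturingLambda(nonCapturing));
        System.out.println(isNonCapturingLambda(capturing));
    }
}
```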
src/jdk.incubator.vector/share/classes/jdk/incubator/vector/Float16.java line 1202:
> 1200: */
> 1201: public static Float16 sqrt(Float16 radicand) {
> 1202: return (Float16) Float16Math.sqrt(Float16.class, radicand,
With the changes to the intrinsics (presented in another comment) you no longer need explicit casts, and the code is precisely the same as before, just embedded in a lambda body:
    public static Float16 sqrt(Float16 radicand) {
        return Float16Math.sqrt(Float16.class, radicand,
            (_radicand) -> {
                // Rounding path of sqrt(Float16 -> double) -> Float16 is fine
                // for preserving the correct final value. The conversion
                // Float16 -> double preserves the exact numerical value. The
                // conversion of double -> Float16 also benefits from the
                // 2p+2 property of IEEE 754 arithmetic.
                return valueOf(Math.sqrt(_radicand.doubleValue()));
            }
        );
    }
Similarly for `fma`:
    return Float16Math.fma(Float16.class, a, b, c,
        (_a, _b, _c) -> {
            // product is numerically exact in float before the cast to
            // double; not necessary to widen to double before the
            // multiply.
            double product = (double) (_a.floatValue() * _b.floatValue());
            return valueOf(product + _c.doubleValue());
        });
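The two numerical claims in the comments above can be sanity-checked without the incubator `Float16` class, using the binary16 conversions on `java.lang.Float` (JDK 20+). This is only an illustrative sketch: the names `Float16SanityDemo`, `sqrtFp16`, and `productExactInFloat` are hypothetical, and `sqrtFp16` rounds through float (`floatToFloat16` takes a float) rather than rounding once from double as `valueOf(double)` does, so it is exercised only on exactly representable results:

```java
public class Float16SanityDemo {
    // sqrt over binary16 bit patterns, widened exactly to double first.
    static short sqrtFp16(short bits) {
        double wide = Float.float16ToFloat(bits);  // exact widening
        return Float.floatToFloat16((float) Math.sqrt(wide));
    }

    // The product of two binary16 values is exact in float: two 11-bit
    // significands multiply into at most 22 bits, which fits in float's 24,
    // and the exponent range of such products lies well inside float's.
    static boolean productExactInFloat(short a, short b) {
        float fa = Float.float16ToFloat(a);
        float fb = Float.float16ToFloat(b);
        double viaFloat  = (double) (fa * fb);         // round to float first
        double viaDouble = (double) fa * (double) fb;  // exact reference
        return viaFloat == viaDouble;
    }

    public static void main(String[] args) {
        if (sqrtFp16(Float.floatToFloat16(9.0f)) != Float.floatToFloat16(3.0f))
            throw new AssertionError("sqrt(9) != 3 in binary16");
        // Exhaustively pairing all patterns is ~2^32 cases; spot-check the
        // smallest subnormals and an odd-significand pair instead.
        if (!productExactInFloat((short) 0x0001, (short) 0x0001))
            throw new AssertionError("subnormal product not exact in float");
        if (!productExactInFloat(Float.floatToFloat16(1.5f), Float.floatToFloat16(2.5f)))
            throw new AssertionError("1.5 * 2.5 not exact in float");
        System.out.println("binary16 sanity checks passed");
    }
}
```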
test/jdk/jdk/incubator/vector/ScalarFloat16OperationsTest.java line 44:
> 42: import static jdk.incubator.vector.Float16.*;
> 43:
> 44: public class ScalarFloat16OperationsTest {
Now that we have IR tests, do you still think this test is necessary, or should we have more IR tests instead? @eme64 thoughts? We could follow up in another PR if need be.
-------------
PR Review: https://git.openjdk.org/jdk/pull/22754#pullrequestreview-2607094727
PR Review Comment: https://git.openjdk.org/jdk/pull/22754#discussion_r1949842011
PR Review Comment: https://git.openjdk.org/jdk/pull/22754#discussion_r1949871647
PR Review Comment: https://git.openjdk.org/jdk/pull/22754#discussion_r1949847574
PR Review Comment: https://git.openjdk.org/jdk/pull/22754#discussion_r1949858554