RFR: 8342103: C2 compiler support for Float16 type and associated scalar operations [v9]

Tue Jan 14 13:09:45 UTC 2025

On Mon, 13 Jan 2025 16:51:02 GMT, Paul Sandoz <psandoz at openjdk.org> wrote:

>> Hi @PaulSandoz , In the current scheme we are passing unboxed carriers to intrinsic entry point, in the fallback implementation carrier type is first converted to floating point value using Float.float16ToFloat API which expects to receive a short type argument,  after the operation we again convert float value to carrier type (short) using Float.floatToFloat16 API which expects a float argument,  thus our intent here is to perform unboxing and boxing outside the intrinsic thereby avoiding all complexities around boxing by compiler. Even if we pass 3 additional parameters we still need to use Float16.floatValue which invokes Float.float16ToFloat underneath, thus this minor modification on Java side is on account of optimizing the intrinsic interface.
>
> Yes, i understand the approach. It's about clarity of the fallback implementation retaining what was expressed in the original code:
> 
>         short res = Float16Math.fma(fa, fb, fc, a, b, c,
>                 (a_, b_, c_) -> {
>                     double product = (double)(a_.floatValue() * b_.floatValue());
>                     return valueOf(product + c_.doubleValue());
>                 });

Hi @PaulSandoz ,

In the above code snippet, the 'short' return type of the intrinsic call does not match the value being returned, which is a box type, thereby mandating additional glue code.

Regular primitive type boxing APIs are lazily intrinsified, thereby generating an intrinsifiable Call IR during parsing.
LoadNode’s idealization can fetch a boxed value from the input of the boxing Call IR and directly forward it to its users.

Q1. What is the problem in directly passing Float16 boxes to FMA and SQRT intrinsic entry points?

A. The compiler will have to unbox them before the actual operation. There are multiple schemes to perform unboxing, such as name-based, offset-based, and index-based field lookup.
Vector API unbox expansion uses an offset-based payload field lookup; for this, the VM records the payload’s offset in its runtime representation of the VectorPayload class, which is created as part of VM initialization.
However, the VM can only record this information for classes that are part of the java.base module; Float16, being part of an incubator module, cannot use offset-based field lookup. Thus the only viable alternative is to unbox using name-based or index-based field lookup.
For this, the compiler would first verify that the incoming oop is of Float16 type and then use a hardcoded name-based lookup to load the field value. This looks fragile, as it establishes an unwanted dependency between Float16 field names and the compiler implementation. The same applies to index-based lookup: index values depend on the combined count of class-level and instance-specific fields, so any addition or deletion of a class-level static helper field before the field of interest can invalidate a hardcoded index value used by the compiler.
All in all, for safe and reliable unboxing by the compiler, it is necessary to create an upfront VM representation like vector_VectorPayload.
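
The index-based fragility described above can be illustrated with a toy model. This is only a sketch: the class name, the field lists, and the hardcoded index are all hypothetical and do not mirror the real Float16 layout or any compiler code.

```java
import java.util.List;

// Toy model only: these field lists are hypothetical, not the real Float16 layout.
final class FieldIndexSketch {
    private FieldIndexSketch() {}

    // Combined static + instance fields in declaration order,
    // as an index-based lookup would enumerate them.
    static final List<String> LAYOUT_V1 = List.of("POSITIVE_INFINITY", "value");

    // Same class after someone adds a static helper field ahead of "value".
    static final List<String> LAYOUT_V2 = List.of("POSITIVE_INFINITY", "HELPER_CACHE", "value");

    // Index a compiler might hardcode against LAYOUT_V1.
    static final int HARDCODED_INDEX = 1;

    static String fieldAt(List<String> layout, int index) {
        return layout.get(index);
    }

    public static void main(String[] args) {
        System.out.println(fieldAt(LAYOUT_V1, HARDCODED_INDEX)); // prints "value"
        System.out.println(fieldAt(LAYOUT_V2, HARDCODED_INDEX)); // prints "HELPER_CACHE"
    }
}
```

The hardcoded index silently resolves to a different field once the layout changes, which is the dependency the compiler should not take on.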

Q2. What are the pros and cons of passing both the unboxed value and boxed values to the intrinsic entry point?
A.  
Pros:
- This avoids an unsafe unboxing implementation when the holder class is not part of the java.base module.
- We can leverage existing box intrinsification infrastructure which directly forwards the embedded values to its users.
- Also, it will minimize the changes in the Java side implementation.

Cons:
- It's suboptimal when the call is neither intrinsified nor inlined, as the extra arguments will cause additional spills before the call.

Q3. The primitive box class boxing API “valueOf” accepts an argument of the corresponding primitive type. How do the Float16 boxing APIs differ?
A.  Unlike primitive box classes, Float16 has multiple boxing APIs and none of them accept a short type argument. 
    public static Float16 valueOf(int value) 
    public static Float16 valueOf(long value) 
    public static Float16 valueOf(float f) 
    public static Float16 valueOf(double d) 
    public static Float16 valueOf(String s) throws NumberFormatException 
    public static Float16 valueOf(BigDecimal v)
    public static Float16 valueOf(BigInteger bi) 
Thus, we need to add special handling to first downcast the parameter value to the short carrier type; otherwise, forwarding the boxed values will be problematic. Existing LoadNode idealization directly forwards the input of unboxed Call IR to its users. To use the existing idealization, we need to massage the input of unboxed Call IR to the exact carrier size, so enabling seamless intrinsification of Float16 boxing APIs is not a mere one-line change in the following methods:
bool ciMethod::is_boxing_method() const 
bool ciMethod::is_unboxing_method() const 
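
To make the mismatch concrete, here is a minimal sketch, assuming JDK 20 or later for Float.floatToFloat16. HalfBox is a hypothetical stand-in for a Float16-like box, not the incubating class itself. It shows that the argument of a float-taking valueOf is not the 16-bit payload stored in the box, unlike Integer.valueOf, where the argument and the payload coincide.

```java
// Sketch only: HalfBox is a hypothetical stand-in, not the JDK Float16 class.
// Assumes JDK 20+ for Float.floatToFloat16.
final class HalfBox {
    private final short bits; // 16-bit carrier actually stored in the box

    private HalfBox(short bits) {
        this.bits = bits;
    }

    // Boxing entry point takes a float, not the short carrier.
    static HalfBox valueOf(float f) {
        return new HalfBox(Float.floatToFloat16(f));
    }

    short bits() {
        return bits;
    }

    public static void main(String[] args) {
        float in = 1.0f;
        // For Integer, the valueOf argument is exactly the stored payload.
        System.out.println(Integer.valueOf(42).intValue() == 42);
        // For HalfBox, the argument bits and the stored payload differ.
        System.out.println(Integer.toHexString(Float.floatToRawIntBits(in)));
        System.out.println(Integer.toHexString(HalfBox.valueOf(in).bits() & 0xFFFF));
    }
}
```

Forwarding the valueOf argument straight to a load of the payload would therefore hand users a 32-bit float pattern where a 16-bit carrier is expected, hence the need for a narrowing step.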

Given the above observations, passing 3 additional box arguments to the intrinsic and returning a box value needs additional changes in the compiler, whereas a minor restructuring of the Java implementation, packed within the glue logic, looks like a reasonable approach.
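
For illustration, the carrier-based fallback discussed in this thread can be sketched roughly as follows. This is not the code in the PR: the class and method names are hypothetical, JDK 20 or later is assumed for Float.float16ToFloat and Float.floatToFloat16, and the final narrowing goes through float for brevity, which may round twice in rare cases.

```java
// Sketch only: class and method names are hypothetical, not the PR's code.
// Assumes JDK 20+ for Float.float16ToFloat / Float.floatToFloat16.
final class Float16CarrierSketch {
    private Float16CarrierSketch() {}

    // Fallback for a fused multiply-add over raw 16-bit carriers.
    static short fma(short a, short b, short c) {
        // Widen each carrier to float; the float product of two
        // half-precision values is exact.
        double product = (double) (Float.float16ToFloat(a) * Float.float16ToFloat(b));
        double sum = product + (double) Float.float16ToFloat(c);
        // Narrow back to the carrier. Rounding via float is a simplification
        // of this sketch and may double-round in rare cases.
        return Float.floatToFloat16((float) sum);
    }

    // Fallback for square root over a raw 16-bit carrier.
    static short sqrt(short a) {
        return Float.floatToFloat16((float) Math.sqrt(Float.float16ToFloat(a)));
    }

    public static void main(String[] args) {
        short one = Float.floatToFloat16(1.0f);
        short two = Float.floatToFloat16(2.0f);
        short three = Float.floatToFloat16(3.0f);
        // 2 * 3 + 1
        System.out.println(Float.float16ToFloat(fma(two, three, one))); // prints 7.0
    }
}
```

Boxing and unboxing stay in the Java caller, so the intrinsic entry point only ever sees and returns short values.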

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/22754#discussion_r1914782512