RFR: 8342103: C2 compiler support for Float16 type and associated scalar operations [v9]

Fri Jan 17 16:02:55 UTC 2025

On Wed, 15 Jan 2025 00:28:50 GMT, Paul Sandoz <psandoz at openjdk.org> wrote:

>> Hi @PaulSandoz ,
>> 
>> In  above code snippet the return type 'short' of intrinsic call does not comply with the value being returned which is of box type, thereby mandating addition glue code. 
>> 
>> Regular primitive type boxing APIs are lazily intrinsified, thereby generating  an intrinsifiable Call IR during parsing. 
>> LoadNode’s idealization can fetch a boxed value from the input of boxing call IR and directly forward it to users.
>> 
>> Q1. What is the problem in directly passing Float16 boxes to FMA and SQRT intrinsic entry points?
>> 
>> A. The compiler will have to unbox them before the actual operation. There are multiple schemes to perform unboxing, such as name-based, offset-based, and index-based field lookup. 
>> Vector API unbox expansion uses an offset-based payload field lookup, for this it bookkeeps the payload’s offset over runtime representation of VectorPayload class created as part of VM initialization.
>> However, VM can only bookkeep this information for classes that are part of java.base module, Float16 being part of incubation module cannot use offset-based field lookup.  Thus only viable alternative is to unbox using field name/index based lookup.
>> For this compiler will first verify that the incoming oop is of Float16 type and then use a hardcoded name-based lookup to Load the field value. This looks fragile as it establishes an unwanted dependency b/w Float16 field names and compiler implementation, same applies to index-based lookup as index values are dependent onthe combined field count of class and instance-specific fields, thus any addition or deletion of a class-level static helper field before the field of interest can invalidate any hardcoded index value used by the compiler. 
>> All in all, for safe and reliable unboxing by compiler, it's necessary to create an upfront VM representation like vector_VectorPayload.
>> 
>> Q2. What are the pros and cons of passing both the unboxed value and boxed values to the intrinsic entry point?
>> A.  
>> Pros:
>> - This will save unsafe unboxing implementation if the holder class is not part of java.base module.
>> - We can leverage existing box intrinsification infrastructure which directly forwards the embedded values to its users.
>> - Also, it will minimize the changes in the Java side implementation.
>> 
>> Cons:
>> - It's suboptimal in case the call is neither intrinsified or inlined, as it will add additional spills before the call.
>> 
>> Q3.  Primitive box class boxing API “valueOf” accepts...
>
>> Given the above observations passing 3 additional box arguments to intrinsic and returning a box value needs additional changes in the compiler while minor re-structuring in Java implementation packed with in the glue logic looks like a reasonable approach.
> 
> Did you mean to say *no* additional changes in the compiler? Otherwise, if not what would those changes be?

Hi @PaulSandoz ,

Many thanks for your suggestion. From a compiler standpoint, changes are mainly around unboxing/boxing arguments and return values. Since the _Float16_ class is defined in the incubation module, we cannot refer to it in the wrapper class _Float16Math_ which is part of _java.base_ module. Thus, lambda must pass the boxes as _java.lang.Object_ type arguments will introduce additional type sharpening casts in lambda and defeat our purpose of preserving unmodified code in lambda. The current scheme only deals in intrinsic with short parameters and return values and is free from these concerns.

The user may declare MUL and ADD as separate operations, and to honor JLS specification and precise floating-point model semantics we should not infer an FMA scalar operation, thus, intrinsification is appropriate here rather than a pattern match.

With this incremental commit, I am trying to minimize the Java side changes. Let me know if this looks fine to you.

Verified that auto-vectorization planned for follow-on patch is aligned with new changes.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/22754#discussion_r1920398466