[lworld+fp16] RFR: 8341414: Add support for FP16 conversion routines [v2]

Thu Nov 7 07:27:56 UTC 2024

On Thu, 31 Oct 2024 13:50:40 GMT, Bhavana Kilambi <bkilambi at openjdk.org> wrote:

>> This patch adds intrinsic support for FP16 conversion routines to int/long/double, along with the aarch64 backend support. It implements both scalar and vector versions of these conversions.
>> 
>> Performance numbers on an aarch64 machine with SVE support:
>> 
>> 
>> Benchmark                         (vectorDim)   Gain
>> Float16OpsBenchmark.fp16ToDouble  1024          18.23
>> Float16OpsBenchmark.fp16ToInt     1024          1.93
>> Float16OpsBenchmark.fp16ToLong    1024          3.95
>> 
>> 
>> The Gain column is the ratio of the throughput with this patch to the throughput with the intrinsics disabled (which generates FP32 arithmetic).
>
> Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Remove intrinsification of conversion methods in Float16

My bad, I meant the other way round, i.e. the integral-to-Float16 conversion case, which currently takes a slow path. Consider the following micro kernel:

    public class float16_allocation {
        public static float micro(int value) {
            Float16 val = Float16.valueOf(value); // [a]
            return val.floatValue();              // [b]
        }

        public static void main(String[] args) {
            float res = 0.0f;
            for (int i = 0; i < 100000; i++) {
                res += micro(i);
            }
            System.out.println("[res]" + res);
        }
    }

Here, the integer parameter is first converted to a Float16 value [a]. The valueOf(int) routine first casts the integer to double and then passes it to the Float16.valueOf(double) routine, resulting in a bulky JIT sequence.

We can outline the code [c] below into a new leaf routine that returns a short value, and pass that value directly to the Float16 constructor, similar to https://github.com/openjdk/valhalla/blob/lworld%2Bfp16/src/java.base/share/classes/java/lang/Float16.java#L411

The new routine can then be intrinsified to yield a ConvI2HF IR node, whose result then gets boxed as a value object. Since Float16 is a value type, its field accesses will be scalarized, thus forwarding the HF ('short') value directly to the subsequent ConvHF2F [b]. On mainline, where Float16 is a value-based class, we can rely on escape analysis to eliminate the redundant boxing allocations.

    public static Float16 valueOf(int value) {
        // int -> double conversion is exact
        return valueOf((double)value);     // [c] 
    }
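A minimal sketch of what such a leaf routine could compute, assuming JDK 20+ for the existing Float.floatToFloat16(float) and Float.float16ToFloat(short) methods. The class name Float16Bits and the method intToFloat16Bits are hypothetical; in the real Float16 the returned short would be handed to its short-taking constructor, which a standalone class cannot call. The sketch goes through a (float) cast rather than (double) because the JDK has no public double-to-binary16 method, and the cast is exact for every int whose binary16 result is finite.

    // Hypothetical sketch, not the actual Float16 code; requires JDK 20+.
    public final class Float16Bits {
        private Float16Bits() {}

        // Hypothetical leaf routine: rounds an int to the nearest binary16
        // value (ties to even) and returns its raw bit pattern.
        // The (float) cast is exact for |value| <= 2^24, which covers every
        // int with a finite binary16 result (|value| < 65520). Larger
        // magnitudes become +/-infinity in binary16 whether or not the cast
        // rounded, so no double-rounding error can occur.
        public static short intToFloat16Bits(int value) {
            return Float.floatToFloat16((float) value);
        }

        // What micro(int) boils down to once the Float16 box is scalarized:
        // an int -> binary16 conversion followed by binary16 -> float.
        public static float micro(int value) {
            return Float.float16ToFloat(intToFloat16Bits(value));
        }

        public static void main(String[] args) {
            float res = 0.0f;
            for (int i = 0; i < 100000; i++) {
                res += micro(i);
            }
            System.out.println("[res]" + res);
        }
    }

For example, micro(2049) yields 2048.0f (a tie, rounded to even), while any value of 65520 or more maps to infinity.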

We can defer this to a separate patch if you prefer; kindly let me know your views.

Best Regards,
Jatin

-------------

PR Comment: https://git.openjdk.org/valhalla/pull/1283#issuecomment-2461500053