RFR: 8342103: C2 compiler support for Float16 type and associated scalar operations [v3]

Tue Dec 17 11:09:10 UTC 2024

On Tue, 17 Dec 2024 07:15:35 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Adding more test points
>
> src/hotspot/share/opto/convertnode.cpp line 282:
> 
>> 280:       return new ReinterpretHF2SNode(binop);
>> 281:     }
>> 282:   }
> 
> Where are the constant folding tests for this?

This is the core idealization logic which infers FP16 IR. Every test point added in the test points added in test/hotspot/jtreg/compiler/c2/irTests/TestFloat16ScalarOperations.java verifies this.

> src/hotspot/share/opto/divnode.cpp line 815:
> 
>> 813:       !g_isnan(t1->getf()) && g_isfinite(t1->getf()) && t1->getf() != 0.0) { // could be negative ZERO or NaN
>> 814:     return TypeH::ONE;
>> 815:   }
> 
> Do we cover all cases here?

Please refer to test point at https://github.com/openjdk/jdk/pull/22754/files#diff-3f8786f9f62662eda4b4a5c76c01fa04534c94d870d496501bfc20434ad45579R419

> src/hotspot/share/opto/divnode.cpp line 840:
> 
>> 838:   if (g_isnan(t1->getf()) || g_isnan(t2->getf())) {
>> 839:     return TypeH::make(NAN);
>> 840:   }
> 
> I'm a little confused here. We are working with nodes that have type Float16, but we are asking for Float constants here. Why is that, how does it work?

Please refer to PhaseIGVN::transform, we create constant IR for singleton types. 
https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/phaseX.cpp#L721

> src/hotspot/share/opto/subnode.cpp line 566:
> 
>> 564:     return t1;
>> 565:   }
>> 566:   else if(g_isnan(t2->getf())) {
> 
> General question: why are you using `getf` and not `geth` all over the code?

getf is a routine that performs half float to float conversion for TypeH.

> src/hotspot/share/opto/type.cpp line 1530:
> 
>> 1528: uint TypeH::hash(void) const {
>> 1529:   return *(uint*)(&_f);
>> 1530: }
> 
> I just saw that `_f` is a `short`, which I think is 16 bits, right? And the cast to `uint` would mean we take 32 bits. That looks a bit off, but maybe it is not. Can you explain, and maybe also put a comment in the code for that?

This is to comply with Node::hash signature which returns uint value

> test/hotspot/jtreg/compiler/c2/irTests/TestFloat16ScalarOperations.java line 275:
> 
>> 273:     @IR(counts = {IRNode.ADD_HF, " 0 ", IRNode.REINTERPRET_S2HF, " 0 ", IRNode.REINTERPRET_HF2S, " 0 "},
>> 274:         applyIfCPUFeature = {"avx512_fp16", "true"})
>> 275:     public void testAddConstantFolding() {
> 
> Ok, this is great. I'm missing some cases that check correct rounding. For that, it might be good to have one example with random constants, so 2 random Float16 values. You can generate them in static context, and also compute the result in static context, so it should be evaluated in the interpreter. That way, we can compare the result of interpreter to compiled code.
> 
> Do that for all operations.

Hey @eme64 , constant folding is done at FP32 granularity, so we first upcast FP16 to FP32 values using hf2f runtime helper, operate over FP32 values and then down cast it back to FP16 value using f2hf helper.  Thus both compiler value transformations and interpreter use the same runtime helper underneath.

Fallback implementation of each Float16 API is using Float.floatToFloat16 and Float.floa16ToFloat routines which are intrinsified at [interpreter level.](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/interpreter/templateInterpreterGenerator.cpp#L488), these interpreter intrinsic invokes same leaf level macro assembly routine flt16_to_flt which is also called though runtime helpers. 

So it may not add much value to do interpreter vs compiler comparison in these cases.

> test/hotspot/jtreg/compiler/c2/irTests/TestFloat16ScalarOperations.java line 421:
> 
>> 419: 
>> 420:         assertResult(divide(valueOf(2.0f), valueOf(2.0f)).floatValue(), 1.0f, "testDivConstantFolding");
>> 421:     }
> 
> What about cases like `x/x`, where `x` is a variable, and then feed in all sorts of values, including NaN. I think there we must ensure that it does not fold to `1`. Could be a separate IR test.
> 
> But also `x/x` with all sorts of constants is relevant. It would test this section in the `Ideal` code:
> 
>   // x/x == 1, we ignore 0/0.
>   // Note: if t1 and t2 are zero then result is NaN (JVMS page 213)
>   // Does not work for variables because of NaN's
>   if (in(1) == in(2) && t1->base() == Type::HalfFloatCon &&
>       !g_isnan(t1->getf()) && g_isfinite(t1->getf()) && t1->getf() != 0.0) { // could be negative ZERO or NaN
>     return TypeH::ONE;
>   }

The above predicate filters out the NaN dividend case. For the variable argument, we rely on the x86 floating point ISA specification, which complies with the IEEE 754 floating point specification.

Please refer to section 4.8.3.5 Operating on SNaNs and QNaNs for Intel Software Development Manual for more details. 

Note: Float16 to float conversion helpers preserve the NaN significand bits, but Java only deals in QNaN values.  I am adding a few test points for signaling NaN.

> test/hotspot/jtreg/compiler/c2/irTests/TestFloat16ScalarOperations.java line 494:
> 
>> 492:         assertResult(fma(valueOf(1.0f), valueOf(2.0f), valueOf(3.0f)).floatValue(), 1.0f * 2.0f + 3.0f, "testFMAConstantFolding");
>> 493:     }
>> 494: }
> 
> I am missing constant folding tests with `shortBitsToFloat16` etc.

Added a few test points for the same

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/22754#discussion_r1888331413
PR Review Comment: https://git.openjdk.org/jdk/pull/22754#discussion_r1888331196
PR Review Comment: https://git.openjdk.org/jdk/pull/22754#discussion_r1888330121
PR Review Comment: https://git.openjdk.org/jdk/pull/22754#discussion_r1888330274
PR Review Comment: https://git.openjdk.org/jdk/pull/22754#discussion_r1888331089
PR Review Comment: https://git.openjdk.org/jdk/pull/22754#discussion_r1888331480
PR Review Comment: https://git.openjdk.org/jdk/pull/22754#discussion_r1888330506
PR Review Comment: https://git.openjdk.org/jdk/pull/22754#discussion_r1888330775