RFR: 8342103: C2 compiler support for Float16 type and associated scalar operations [v3]

Tue Dec 17 07:50:04 UTC 2024

On Mon, 16 Dec 2024 14:23:16 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> Hi All,
>> 
>> This patch adds C2 compiler support for various Float16 operations added by [PR#22128](https://github.com/openjdk/jdk/pull/22128)
>> 
>> The following is a summary of the changes included in this patch:
>> 
>> 1. Detection of various Float16 operations through inline expansion or pattern folding idealizations.
>> 2. Float16 operations like add, sub, mul, div, max, and min are inferred through pattern folding idealization.
>> 3. Float16 SQRT and FMA operations are inferred through inline expansion, and their corresponding entry points are defined in the newly added Float16Math class.
>>       -    These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values.
>> 4. New specialized IR nodes for Float16 operations, associated idealizations, and constant folding routines.
>> 5. New Ideal type for constant and non-constant Float16 IR nodes. Please refer to the [FAQs](https://github.com/openjdk/jdk/pull/22754#issuecomment-2543982577) for more details.
>> 6. Because Float16 uses short as its storage type, raw FP16 values are always loaded into general-purpose registers, whereas FP16 ISA instructions generally operate on floating-point registers. The compiler therefore injects reinterpretation IR nodes before and after Float16 operation nodes to move the short value into a floating-point register and back.
>> 7. New idealization routines to optimize redundant reinterpretation chains, e.g. HF2S + S2HF = HF.
>> 8. X86 backend implementation for all supported intrinsics.
>> 9. Functional and performance validation tests.
>> 
>> Kindly review the patch and share your feedback.
>> 
>> Best Regards,
>> Jatin
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Adding more test points

@jatin-bhateja I spent 1h going over this change and left 15 comments. You can probably answer some of them with a quick explanation or by pointing to the relevant test.

src/hotspot/share/opto/convertnode.cpp line 282:

> 280:       return new ReinterpretHF2SNode(binop);
> 281:     }
> 282:   }

Where are the constant folding tests for this?

src/hotspot/share/opto/convertnode.cpp line 960:

> 958:   }
> 959:   return TypeInt::SHORT;
> 960: }

Do we have tests for these constant folding operations?

src/hotspot/share/opto/divnode.cpp line 815:

> 813:       !g_isnan(t1->getf()) && g_isfinite(t1->getf()) && t1->getf() != 0.0) { // could be negative ZERO or NaN
> 814:     return TypeH::ONE;
> 815:   }

Do we cover all cases here?

src/hotspot/share/opto/divnode.cpp line 821:

> 819:   }
> 820: 
> 821:   // If divisor is a constant and not zero, divide them numbers

Suggestion:

  // If divisor is a constant and not zero, divide the numbers

src/hotspot/share/opto/divnode.cpp line 826:

> 824:       t2->getf() != 0.0)  {
> 825:     // could be negative zero
> 826:     return TypeH::make(t1->getf()/t2->getf());

Suggestion:

    return TypeH::make(t1->getf() / t2->getf());

src/hotspot/share/opto/divnode.cpp line 840:

> 838:   if (g_isnan(t1->getf()) || g_isnan(t2->getf())) {
> 839:     return TypeH::make(NAN);
> 840:   }

I'm a little confused here. We are working with nodes that have type Float16, but we are asking for Float constants here. Why is that, and how does it work?

src/hotspot/share/opto/subnode.cpp line 566:

> 564:     return t1;
> 565:   }
> 566:   else if(g_isnan(t2->getf())) {

General question: why are you using `getf` and not `geth` all over the code?
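
For context on the `getf` question: one property that could make float accessors workable (not confirmed by the patch author) is that every binary16 value widens to a float without loss. A minimal sketch, using the standard `Float.float16ToFloat` / `Float.floatToFloat16` methods (JDK 20+) as stand-ins; the class and method names are hypothetical, and this says nothing about how `TypeH` actually stores its constant:

```java
// Sketch only: the JDK conversion methods stand in for whatever conversion
// the C2 type system performs internally.
final class Float16WideningSketch {
    // Counts non-NaN binary16 bit patterns that do NOT survive a round trip
    // through float. NaN payloads are skipped on purpose.
    static int countLossyRoundTrips() {
        int lossy = 0;
        for (int bits = 0; bits <= 0xFFFF; bits++) {
            short h = (short) bits;
            float widened = Float.float16ToFloat(h);
            if (Float.isNaN(widened)) {
                continue;
            }
            if (Float.floatToFloat16(widened) != h) {
                lossy++;
            }
        }
        return lossy;
    }

    public static void main(String[] args) {
        System.out.println("lossy round trips: " + countLossyRoundTrips());
    }
}
```

If the type really does hold a 16-bit value, a comment in the code stating where the widening happens would still be helpful.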

src/hotspot/share/opto/type.cpp line 1465:

> 1463: //------------------------------meet-------------------------------------------
> 1464: // Compute the MEET of two types.  It returns a new Type object.
> 1465: const Type *TypeH::xmeet( const Type *t ) const {

Please write `TypeH*` and not `TypeH *`

src/hotspot/share/opto/type.cpp line 1530:

> 1528: uint TypeH::hash(void) const {
> 1529:   return *(uint*)(&_f);
> 1530: }

I just saw that `_f` is a `short`, which I think is 16 bits, right? And the cast to `uint` would mean we take 32 bits. That looks a bit off, but maybe it is not. Can you explain, and maybe also put a comment in the code for that?

test/hotspot/jtreg/compiler/c2/irTests/TestFloat16ScalarOperations.java line 275:

> 273:     @IR(counts = {IRNode.ADD_HF, " 0 ", IRNode.REINTERPRET_S2HF, " 0 ", IRNode.REINTERPRET_HF2S, " 0 "},
> 274:         applyIfCPUFeature = {"avx512_fp16", "true"})
> 275:     public void testAddConstantFolding() {

Ok, this is great. What I'm missing are cases that check correct rounding. For that, it would be good to have an example with random constants, i.e. two random Float16 values. You can generate them in a static context and also compute the expected result there, so that it is evaluated in the interpreter. That way, we can compare the interpreter's result to the result of the compiled code.

Do that for all operations.
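
A rough sketch of the shape such a test could take, shown for addition only. It is not written against the IR framework or the `Float16` class: raw `short` bit patterns and `Float.float16ToFloat` / `Float.floatToFloat16` (JDK 20+) stand in for `Float16` values, and all class, field, and method names are hypothetical.

```java
import java.util.Random;

final class RandomConstantFoldingSketch {
    static final Random RANDOM = new Random();

    // Two random binary16 bit patterns, fixed once per JVM run.
    static final short A = (short) RANDOM.nextInt();
    static final short B = (short) RANDOM.nextInt();

    // Computed during class initialization, which normally runs in the interpreter.
    static final short GOLD_ADD = add(A, B);

    // Stand-in for Float16 addition: widen, add as float, round back to binary16.
    static short add(short a, short b) {
        return Float.floatToFloat16(Float.float16ToFloat(a) + Float.float16ToFloat(b));
    }

    // Candidate for compilation: A and B are static finals, so a JIT compiler
    // may treat them as constants and fold the whole expression.
    static short testAdd() {
        return add(A, B);
    }

    // Bitwise comparison, except that any two NaNs are considered equal.
    static boolean sameFloat16(short x, short y) {
        boolean bothNaN = Float.isNaN(Float.float16ToFloat(x))
                       && Float.isNaN(Float.float16ToFloat(y));
        return bothNaN || x == y;
    }

    public static void main(String[] args) {
        // Repeat often enough that the method is likely to get compiled.
        for (int i = 0; i < 20_000; i++) {
            short result = testAdd();
            if (!sameFloat16(result, GOLD_ADD)) {
                throw new AssertionError(String.format(
                    "add(0x%04x, 0x%04x): got 0x%04x, expected 0x%04x",
                    A & 0xFFFF, B & 0xFFFF, result & 0xFFFF, GOLD_ADD & 0xFFFF));
            }
        }
        System.out.println("ok");
    }
}
```

In the real test, the same pattern would be repeated per operation, with the random values printed on failure so that a mismatch can be reproduced.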

test/hotspot/jtreg/compiler/c2/irTests/TestFloat16ScalarOperations.java line 421:

> 419: 
> 420:         assertResult(divide(valueOf(2.0f), valueOf(2.0f)).floatValue(), 1.0f, "testDivConstantFolding");
> 421:     }

What about cases like `x/x`, where `x` is a variable that is fed all sorts of values, including NaN? There, I think we must ensure that it does not fold to `1`. This could be a separate IR test.

`x/x` with all sorts of constants is also relevant. It would test this section in the `Value` code:

  // x/x == 1, we ignore 0/0.
  // Note: if t1 and t2 are zero then result is NaN (JVMS page 213)
  // Does not work for variables because of NaN's
  if (in(1) == in(2) && t1->base() == Type::HalfFloatCon &&
      !g_isnan(t1->getf()) && g_isfinite(t1->getf()) && t1->getf() != 0.0) { // could be negative ZERO or NaN
    return TypeH::ONE;
  }
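
To make the interesting inputs concrete, here is a small sketch that mirrors the guard above and checks it against actual IEEE 754 division. It uses `float` as a stand-in for Float16, and the class and method names are hypothetical:

```java
final class SelfDivisionSketch {
    static float selfDivide(float x) {
        return x / x;
    }

    // Mirrors the guard in the quoted snippet: only finite, non-zero,
    // non-NaN values may fold to 1.
    static boolean mayFoldToOne(float x) {
        return !Float.isNaN(x) && Float.isFinite(x) && x != 0.0f;
    }

    // True if the guard's decision matches what the division really produces:
    // exactly 1 when folding is allowed, NaN otherwise.
    static boolean guardAgreesWithDivision(float x) {
        float q = selfDivide(x);
        return mayFoldToOne(x) ? q == 1.0f : Float.isNaN(q);
    }

    public static void main(String[] args) {
        float[] inputs = {
            0.0f, -0.0f, 1.0f, -2.5f, Float.MIN_VALUE, Float.MAX_VALUE,
            Float.POSITIVE_INFINITY, Float.NEGATIVE_INFINITY, Float.NaN
        };
        for (float x : inputs) {
            System.out.println(x + " / " + x + " = " + selfDivide(x)
                + ", guard agrees: " + guardAgreesWithDivision(x));
        }
    }
}
```

The zeros, the infinities, and NaN all produce NaN for `x/x`, so those are the constants and variable inputs a test should cover in addition to ordinary finite values.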

test/hotspot/jtreg/compiler/c2/irTests/TestFloat16ScalarOperations.java line 494:

> 492:         assertResult(fma(valueOf(1.0f), valueOf(2.0f), valueOf(3.0f)).floatValue(), 1.0f * 2.0f + 3.0f, "testFMAConstantFolding");
> 493:     }
> 494: }

I am missing constant folding tests with `shortBitsToFloat16` etc.
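
As a starting point, these are some binary16 bit patterns that such tests could feed in as constants. The sketch decodes them with `Float.float16ToFloat` (JDK 20+), used here as a stand-in for `shortBitsToFloat16(...).floatValue()`; the class and method names are hypothetical:

```java
final class Float16BitPatternSketch {
    // Decodes a binary16 bit pattern (low 16 bits of the argument) to float.
    static float decode(int bits) {
        return Float.float16ToFloat((short) bits);
    }

    public static void main(String[] args) {
        int[] patterns = {
            0x3C00, // 1.0
            0xC000, // -2.0
            0x7BFF, // largest finite value, 65504
            0x0001, // smallest positive subnormal, 2^-24
            0x8000, // negative zero
            0x7C00, // positive infinity
            0xFC00, // negative infinity
            0x7E00  // a quiet NaN
        };
        for (int bits : patterns) {
            System.out.println(String.format("0x%04X -> %s", bits, decode(bits)));
        }
    }
}
```

Folding these through each operation, and through the reinterpretation nodes, would exercise the special-value paths as well as ordinary arithmetic.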

-------------

Changes requested by epeter (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/22754#pullrequestreview-2508020252
PR Review Comment: https://git.openjdk.org/jdk/pull/22754#discussion_r1888008209
PR Review Comment: https://git.openjdk.org/jdk/pull/22754#discussion_r1888009160
PR Review Comment: https://git.openjdk.org/jdk/pull/22754#discussion_r1888012154
PR Review Comment: https://git.openjdk.org/jdk/pull/22754#discussion_r1888027070
PR Review Comment: https://git.openjdk.org/jdk/pull/22754#discussion_r1888027339
PR Review Comment: https://git.openjdk.org/jdk/pull/22754#discussion_r1888038360
PR Review Comment: https://git.openjdk.org/jdk/pull/22754#discussion_r1888030240
PR Review Comment: https://git.openjdk.org/jdk/pull/22754#discussion_r1888013140
PR Review Comment: https://git.openjdk.org/jdk/pull/22754#discussion_r1888017396
PR Review Comment: https://git.openjdk.org/jdk/pull/22754#discussion_r1888005513
PR Review Comment: https://git.openjdk.org/jdk/pull/22754#discussion_r1888026278
PR Review Comment: https://git.openjdk.org/jdk/pull/22754#discussion_r1888021315