RFR: 8342103: C2 compiler support for Float16 type and associated operations [v2]

Mon Nov 25 08:05:36 UTC 2024

On Fri, 22 Nov 2024 10:36:10 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> Hi All,
>> 
>> This patch adds C2 compiler support for various Float16 operations added by [PR#22128](https://github.com/openjdk/jdk/pull/22128)
>> 
>> Following is the summary of changes included with this patch:-
>> 
>> 1. Detection of various Float16 operations through inline expansion or pattern folding idealizations.
>> 2. Float16 operations like add, sub, mul, div, max, and min are inferred through pattern folding idealization.
>> 3. Float16 SQRT and FMA operation are inferred through inline expansion and their corresponding entry points are defined in the newly added Float16Math class.
>>       -    These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values.
>> 5. New specialized IR nodes for Float16 operations, associated idealizations, and constant folding routines.
>> 6. New Ideal type for constant and non-constant Float16 IR nodes. Please refer to [FAQs ](https://github.com/openjdk/jdk/pull/21490#issuecomment-2482867818)for more details.
>> 7. Since Float16 uses short as its storage type, hence raw FP16 values are always loaded into general purpose register, but FP16 ISA instructions generally operate over floating point registers, therefore compiler injectes reinterpretation IR before and after Float16 operation nodes to move short value to floating point register and vice versa.
>> 8. New idealization routines to optimize redundant reinterpretation chains. HF2S + S2HF = HF
>> 6. Auto-vectorization of newly supported scalar operations.
>> 7. X86 and AARCH64 backend implementation for all supported intrinsics.
>> 9. Functional and Performance validation tests.
>> 
>> **Missing Pieces:-**
>> **-  AARCH64 Backend.**
>> 
>> Kindly review and share your feedback.
>> 
>> Best Regards,
>> Jatin
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Testpoints for new value transforms + code cleanups

Wow, thanks for tackling this!

Ok, lots of style comments.

But again:
I would have loved to see this split up into these parts:
- Scalar
- Scalar optimizations (value, ideal, identity)
- Vector

This will again take many many week to get reviewed because it is a 3k+ change with lots of details.

Do you have any tests for the scalar constant folding optimizations? I did not find them.

src/hotspot/cpu/x86/x86.ad line 10910:

> 10908: %}
> 10909: 
> 10910: instruct convF2HFAndS2HF(regF dst, regF src)

I'm starting to see that you use sometimes `H` and sometimes `HF`. That needs to be consistent - unless they are 2 different things?

src/hotspot/cpu/x86/x86.ad line 10930:

> 10928: %}
> 10929: 
> 10930: instruct scalar_sqrt_fp16_reg(regF dst, regF src)

Hmm, and them you also use `fp16`... so now we have `H`, `HF` and `fp16`...

src/hotspot/share/opto/addnode.cpp line 713:

> 711: //------------------------------add_of_identity--------------------------------
> 712: // Check for addition of the identity
> 713: const Type *AddHFNode::add_of_identity(const Type *t1, const Type *t2) const {

I would generally drop out these comments, unless they actually have something useful to say that the name does not say. You could make a comment why you are returning `nullptr`, i.e. doing nothing.

And for style: the `*` belongs with the type ;)
Suggestion:

const Type* AddHFNode::add_of_identity(const Type* t1, const Type* t2) const {

src/hotspot/share/opto/addnode.cpp line 721:

> 719: // This also type-checks the inputs for sanity.  Guaranteed never to
> 720: // be passed a TOP or BOTTOM type, these are filtered out by pre-check.
> 721: const Type *AddHFNode::add_ring(const Type *t0, const Type *t1) const {

Suggestion:

// Supplied function returns the sum of the inputs.
// This also type-checks the inputs for sanity.  Guaranteed never to
// be passed a TOP or BOTTOM type, these are filtered out by pre-check.
const Type* AddHFNode::add_ring(const Type* t0, const Type* t1) const {

Here the comments are great :)

src/hotspot/share/opto/addnode.cpp line 1625:

> 1623: 
> 1624:   // handle min of 0.0, -0.0 case.
> 1625:   return (jint_cast(f0) < jint_cast(f1)) ? r0 : r1;

Can you please add some comments for this here? Why is there an int-case on floats? Why not just do the ternary comparison on line 1621: `return f0 < f1 ? r0 : r1;`?

src/hotspot/share/opto/addnode.hpp line 179:

> 177:   virtual Node* Identity(PhaseGVN* phase) { return this; }
> 178:   virtual uint ideal_reg() const { return Op_RegF; }
> 179: };

Please put the `*` with the type everywhere.

src/hotspot/share/opto/connode.cpp line 49:

> 47:   switch( t->basic_type() ) {
> 48:   case T_INT:         return new ConINode( t->is_int() );
> 49:   case T_SHORT:       return new ConHNode( t->is_half_float_constant() );

That will be quite confusing.... don't you think?

src/hotspot/share/opto/connode.hpp line 122:

> 120: class ConHNode : public ConNode {
> 121: public:
> 122:   ConHNode( const TypeH *t ) : ConNode(t) {}

Suggestion:

  ConHNode(const TypeH* t) : ConNode(t) {}

src/hotspot/share/opto/connode.hpp line 129:

> 127:     return new ConHNode( TypeH::make(con) );
> 128:   }
> 129: 

Suggestion:

src/hotspot/share/opto/convertnode.cpp line 256:

> 254: //------------------------------Ideal------------------------------------------
> 255: Node* ConvF2HFNode::Ideal(PhaseGVN* phase, bool can_reshape) {
> 256:   // Optimize pattern - ConvHF2F (FP32BinOp) ConvF2HF ==> ReinterpretS2HF (FP16BinOp) ReinterpretHF2S.

This is a little dense and I don't understand your notation.

So we are pattern matching:
`ConvF2HF( FP32BinOp(ConvHF2F(x), ConvHF2F(y)) )` <- I think that would be more readable.
You could also create local variables for `x` and `y`, just so it is more readable.
And then instead we generate:
`ReinterpretHF2S(FP16BinOp(ReinterpretS2HF(x), ReinterpretS2HF(y)))`

Ok, so you are saying why lift to FP32, if we cast down to FP16 anyway... would be nice to have such a comment at the top to motivate the optimization!

What confuses me a little here: why do we even have to cast from and to `short` here? Maybe a quick comment about that would also help.

src/hotspot/share/opto/convertnode.cpp line 948:

> 946: }
> 947: 
> 948: bool Float16NodeFactory::is_binary_oper(int opc) {

Suggestion:

bool Float16NodeFactory::is_float32_binary_oper(int opc) {

Just so it is explicit, since you have the parallel `get_float16_binary_oper` below.

src/hotspot/share/opto/convertnode.hpp line 234:

> 232: class ReinterpretHF2SNode : public Node {
> 233:   public:
> 234:   ReinterpretHF2SNode( Node *in1 ) : Node(0,in1) {}

Suggestion:

  ReinterpretHF2SNode(Node* in1) : Node(0, in1) {}

src/hotspot/share/opto/divnode.cpp line 759:

> 757:   const Type* t2 = phase->type(in(2));
> 758:   if(t1 == Type::TOP) return Type::TOP;
> 759:   if(t2 == Type::TOP) return Type::TOP;

Suggestion:

  if(t1 == Type::TOP) { return Type::TOP; }
  if(t2 == Type::TOP) { return Type::TOP; }

Please use the brackets consistently.

src/hotspot/share/opto/divnode.cpp line 765:

> 763:   if((t1 == bot) || (t2 == bot) ||
> 764:      (t1 == Type::BOTTOM) || (t2 == Type::BOTTOM))
> 765:     return bot;

Suggestion:

  if((t1 == bot) || (t2 == bot) ||
     (t1 == Type::BOTTOM) || (t2 == Type::BOTTOM)) {
    return bot;
  }

Again: please always use brackets.

src/hotspot/share/opto/divnode.cpp line 776:

> 774: 
> 775:   if(t2 == TypeH::ONE)
> 776:     return t1;

brackets

src/hotspot/share/opto/divnode.cpp line 782:

> 780:      t2->base() == Type::HalfFloatCon &&
> 781:      t2->getf() != 0.0) // could be negative zero
> 782:     return TypeH::make(t1->getf()/t2->getf());

Suggestion:

  // If divisor is a constant and not zero, divide the numbers
  if(t1->base() == Type::HalfFloatCon &&
     t2->base() == Type::HalfFloatCon &&
     t2->getf() != 0.0) { // could be negative zero
    return TypeH::make(t1->getf() / t2->getf());
  }

src/hotspot/share/opto/divnode.cpp line 789:

> 787: 
> 788:   if(t1 == TypeH::ZERO && !g_isnan(t2->getf()) && t2->getf() != 0.0)
> 789:     return TypeH::ZERO;

brackets for if

Ok, why not also do it for negative zero then?

src/hotspot/share/opto/divnode.cpp line 797:

> 795: //------------------------------isA_Copy---------------------------------------
> 796: // Dividing by self is 1.
> 797: // If the divisor is 1, we are an identity on the dividend.

Suggestion:

// If the divisor is 1, we are an identity on the dividend.

`Dividing by self is 1.` That does not seem to apply here. Maybe you meant `dividing by 1 is self`?

src/hotspot/share/opto/divnode.cpp line 804:

> 802: 
> 803: //------------------------------Idealize---------------------------------------
> 804: Node *DivHFNode::Ideal(PhaseGVN* phase, bool can_reshape) {

Suggestion:

Node* DivHFNode::Ideal(PhaseGVN* phase, bool can_reshape) {

src/hotspot/share/opto/divnode.cpp line 805:

> 803: //------------------------------Idealize---------------------------------------
> 804: Node *DivHFNode::Ideal(PhaseGVN* phase, bool can_reshape) {
> 805:   if (in(0) && remove_dead_region(phase, can_reshape))  return this;

Suggestion:

  if (in(0) != nullptr && remove_dead_region(phase, can_reshape)) { return this; }

brackets for if and no implicit null checks please!

src/hotspot/share/opto/divnode.cpp line 814:

> 812: 
> 813:   const TypeH* tf = t2->isa_half_float_constant();
> 814:   if(!tf) return nullptr;

no implicit booleans!

src/hotspot/share/opto/divnode.cpp line 836:

> 834: 
> 835:   // return multiplication by the reciprocal
> 836:   return (new MulHFNode(in(1), phase->makecon(TypeH::make(reciprocal))));

Do we have good tests for this optimization?

src/hotspot/share/opto/mulnode.cpp line 559:

> 557: 
> 558: // Compute the product type of two half float ranges into this node.
> 559: const Type *MulHFNode::mul_ring(const Type *t0, const Type *t1) const {

Suggestion:

const Type* MulHFNode::mul_ring(const Type* t0, const Type* t1) const {

src/hotspot/share/opto/mulnode.cpp line 561:

> 559: const Type *MulHFNode::mul_ring(const Type *t0, const Type *t1) const {
> 560:   if( t0 == Type::HALF_FLOAT || t1 == Type::HALF_FLOAT ) return Type::HALF_FLOAT;
> 561:   return TypeH::make( t0->getf() * t1->getf() );

I hope that `TypeH::make` handles the overflow cases well... does it?
And do we have tests for this?

src/hotspot/share/opto/mulnode.cpp line 1945:

> 1943:   return TypeH::make(fma(f1, f2, f3));
> 1944: #endif
> 1945: }

I need:
- brackets for ifs
- all `*` on the left with the type
- An explanation what the `ifdef __STDC_IEC_559__` does.

src/hotspot/share/opto/mulnode.hpp line 155:

> 153:   virtual const Type *mul_ring( const Type *, const Type * ) const;
> 154:   const Type *mul_id() const { return TypeH::ONE; }
> 155:   const Type *add_id() const { return TypeH::ZERO; }

Suggestion:

  const Type* mul_id() const { return TypeH::ONE; }
  const Type* add_id() const { return TypeH::ZERO; }

src/hotspot/share/opto/mulnode.hpp line 160:

> 158:   int max_opcode() const { return Op_MaxHF; }
> 159:   int min_opcode() const { return Op_MinHF; }
> 160:   const Type *bottom_type() const { return Type::HALF_FLOAT; }

Suggestion:

  const Type* bottom_type() const { return Type::HALF_FLOAT; }

src/hotspot/share/opto/subnode.cpp line 1975:

> 1973:   if( f < 0.0f ) return Type::HALF_FLOAT;
> 1974:   return TypeH::make( (float)sqrt( (double)f ) );
> 1975: }

if brackets and asterisks with types please

src/hotspot/share/opto/subnode.hpp line 143:

> 141:   const Type   *bottom_type() const { return Type::HALF_FLOAT; }
> 142:   virtual uint  ideal_reg() const { return Op_RegF; }
> 143: };

Suggestion:

//------------------------------SubHFNode--------------------------------------
// Subtract 2 half floats
class SubHFNode : public SubFPNode {
public:
  SubHFNode(Node* in1, Node* in2) : SubFPNode(in1, in2) {}
  virtual int Opcode() const;
  virtual const Type* sub(const Type *, const Type *) const;
  const Type* add_id() const { return TypeH::ZERO; }
  const Type* bottom_type() const { return Type::HALF_FLOAT; }
  virtual uint  ideal_reg() const { return Op_RegF; }
};

src/hotspot/share/opto/subnode.hpp line 552:

> 550:   }
> 551:   virtual int Opcode() const;
> 552:   const Type *bottom_type() const { return Type::HALF_FLOAT; }

Suggestion:

  const Type* bottom_type() const { return Type::HALF_FLOAT; }

src/hotspot/share/opto/type.cpp line 1487:

> 1485:     typerr(t);
> 1486: 
> 1487:   case HalfFloatCon:                // Float-constant vs Float-constant?

Suggestion:

  case HalfFloatCon:             // Float-constant vs Float-constant?

-------------

Changes requested by epeter (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/21490#pullrequestreview-2457382009
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1855943470
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1855944584
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1855948500
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1855950333
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1855954166
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1855955074
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1855958333
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1855958773
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1855959025
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1855977560
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1855981273
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1855982405
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1855984366
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1855985484
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1855988545
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1855989752
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1855992127
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1855994876
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1855995436
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1855996454
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1856000589
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1856002336
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1856007382
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1856006524
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1856009749
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1856010212
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1856010391
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1856013278
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1856013945
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1856014893
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1856016525