[lworld+fp16] RFR: 8308363: Initial compiler support for FP16 scalar operations. [v4]

Thu Aug 24 08:53:48 UTC 2023

On Fri, 18 Aug 2023 18:56:32 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> Starting with 4th Generation Xeon, Intel has extensively extended the existing ISA to support 16-bit scalar and vector floating-point operations based on the IEEE 754 binary16 format.
>> 
>> We plan to support this in multiple stages: the Java-side definition of the Float16 type, scalar operations, and finally SLP vectorization support.
>> 
>> This patch adds minimal Java and compiler-side support for one API, Float16.add.
>> 
>> **Summary of changes :-**
>> - Minimal implementation of Float16 primitive class supporting one operation (Float16.add)
>> - X86 AVX512-FP16 feature detection at VM startup.
>> - C2 IR and Inline expander changes for Float16.add API.
>> - FP16 constant folding handling.
>> - Backend support : Instruction selection patterns and assembler support.
>> - New IR framework and functional tests.
>> 
>> **Implementation details:-**
>> 
>> 1/ The newly defined Float16 class encapsulates a short value holding an IEEE 754 binary16 encoded value.
>> 
>> 2/ Float16 is a primitive class which in future will be aligned with other enhanced primitive wrapper classes proposed by [JEP-402.](https://openjdk.org/jeps/402)
>> 
>> 3/ Float16 will support all the operations supported by the corresponding Float class.
>> 
>> 4/ The Java implementation of each API will internally perform the floating-point operation at FP32 granularity.
>> 
>> 5/ An API which can be directly mapped to an Intel AVX512FP16 instruction will be a candidate for intrinsification by the C2 compiler.
>> 
>> 6/ With Valhalla, the C2 compiler always creates an InlineType IR node for a value class instance.
>> The total number of inputs of an InlineType node matches the number of non-static fields. In this case the node will have one input of short type, TypeInt::SHORT.
>> 
>> 7/ All the scalar AVX512FP16 instructions operate on floating-point registers, while the Float16 backing storage is held in a general-purpose register. We therefore need to introduce appropriate conversion IR which moves a 16-bit value from a GPR to an XMM register and vice versa.
>> ![image](https://github.com/openjdk/valhalla/assets/59989778/192fca7e-6b7e-4e62-9b09-677e33eca48d)
>> 
>> 8/ The current plan is to introduce a new IR node for each operation as a subclass of its corresponding single-precision IR node. This will allow leveraging the idealization routines (Ideal/Identity/Value) of its parent operation.
>> 
>> 9/ All the single/double precision IR nodes carry a Type::FLOAT/DOUBLE ideal type. This represents entire FP32/64 value range and is different from integral types which expli...
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Addressing offline review comments from Sandhya, new IR test addition.
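
For reference, point 4/ of the description (performing each operation at FP32 granularity) could be sketched in plain Java roughly as follows. This is only an illustration, not the code in the patch: the class name `Float16Sketch` and its methods are hypothetical, it is an ordinary class rather than a Valhalla primitive class, and it assumes JDK 20 or later for `Float.float16ToFloat` and `Float.floatToFloat16`.

```java
// Hypothetical illustration only; not the Float16 class from this PR.
public final class Float16Sketch {
    // Backing storage: IEEE 754 binary16 encoding held in a short.
    private final short bits;

    private Float16Sketch(short bits) {
        this.bits = bits;
    }

    // Round a float to the nearest binary16 value and wrap it.
    public static Float16Sketch valueOf(float f) {
        return new Float16Sketch(Float.floatToFloat16(f));
    }

    // Widen the binary16 payload to a float.
    public float floatValue() {
        return Float.float16ToFloat(bits);
    }

    // Raw binary16 bit pattern.
    public short rawBits() {
        return bits;
    }

    // Widen both operands to FP32, add, then round the sum back to binary16.
    public static Float16Sketch add(Float16Sketch a, Float16Sketch b) {
        float sum = Float.float16ToFloat(a.bits) + Float.float16ToFloat(b.bits);
        return new Float16Sketch(Float.floatToFloat16(sum));
    }

    public static void main(String[] args) {
        Float16Sketch r = add(valueOf(1.0f), valueOf(2.0f));
        System.out.println(r.floatValue());
    }
}
```

A method of this shape is what an intrinsic such as `_add_float16` would presumably replace with a single AVX512-FP16 instruction on supporting hardware, with the Java body remaining as the fallback.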

src/hotspot/share/classfile/vmIntrinsics.hpp line 201:

> 199:   /* Float16 intrinsics, similar to what we have in Math. */                                                            \
> 200:   do_intrinsic(_add_float16,              java_lang_Float16,      add_name,           floa16_float16_signature,  F_R)   \
> 201:    do_name(add_name,    "add")                                                                                          \

style: please remove extra spaces between `add_name` and `"add"`.

src/hotspot/share/opto/addnode.hpp line 133:

> 131: 
> 132: //------------------------------AddHFNode---------------------------------------
> 133: // Add 2 floats

Change to: `Add 2 half-precision floats` ?

src/hotspot/share/opto/convertnode.hpp line 179:

> 177: class ReinterpretS2HFNode : public Node {
> 178:   public:
> 179:   ReinterpretS2HFNode( Node *in1 ) : Node(0,in1) {}

Suggested change:

`ReinterpretS2HFNode(Node* in1) : Node(0, in1) {}`

src/hotspot/share/opto/convertnode.hpp line 181:

> 179:   ReinterpretS2HFNode( Node *in1 ) : Node(0,in1) {}
> 180:   virtual int Opcode() const;
> 181:   virtual const Type *bottom_type() const { return Type::FLOAT; }

Suggested change:

`virtual const Type* bottom_type() const { return Type::FLOAT; }`

src/hotspot/share/opto/convertnode.hpp line 191:

> 189:   virtual int Opcode() const;
> 190:   virtual const Type* Value(PhaseGVN* phase) const;
> 191:   virtual const Type *bottom_type() const { return TypeInt::SHORT; }

Same as above: please write `const Type* bottom_type()`.

src/hotspot/share/opto/library_call.cpp line 4794:

> 4792: 
> 4793: bool LibraryCallKit::inline_fp16_operations(vmIntrinsics::ID id) {
> 4794:   Node* result = NULL;

Please use `nullptr` instead of `NULL`.

-------------

PR Review Comment: https://git.openjdk.org/valhalla/pull/848#discussion_r1304001891
PR Review Comment: https://git.openjdk.org/valhalla/pull/848#discussion_r1304004451
PR Review Comment: https://git.openjdk.org/valhalla/pull/848#discussion_r1304009240
PR Review Comment: https://git.openjdk.org/valhalla/pull/848#discussion_r1304009952
PR Review Comment: https://git.openjdk.org/valhalla/pull/848#discussion_r1304010260
PR Review Comment: https://git.openjdk.org/valhalla/pull/848#discussion_r1304012865