[lworld+fp16] Integrated: 8308363: Initial compiler support for FP16 scalar operations.

Tue Sep 19 16:45:17 UTC 2023

On Mon, 22 May 2023 17:07:42 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

> Starting with 4th Generation Xeon, Intel has extensively extended the existing ISA to support 16-bit scalar and vector floating-point operations based on the IEEE 754 binary16 format.
> 
> We plan to support this in multiple stages: a Java-side definition of the Float16 type, scalar operations, and finally SLP vectorization support.
> 
> This patch adds minimal Java and compiler-side support for one API, Float16.add.
> 
> **Summary of changes :-**
> - Minimal implementation of Float16 primitive class supporting one operation (Float16.add)
> - X86 AVX512-FP16 feature detection at VM startup.
> - C2 IR and Inline expander changes for Float16.add API.
> - FP16 constant folding handling.
> - Backend support : Instruction selection patterns and assembler support.
> - New IR framework and functional tests.
> 
> **Implementation details:-**
> 
> 1/ The newly defined Float16 class encapsulates a short value holding an IEEE 754 binary16 encoded value.
> 
> 2/ Float16 is a primitive class which in the future will be aligned with the other enhanced primitive wrapper classes proposed by [JEP-402](https://openjdk.org/jeps/402).
> 
> 3/ Float16 will support all the operations supported by the corresponding Float class.
> 
> 4/ The Java implementation of each API will internally perform the floating-point operation at FP32 granularity.
> 
> 5/ An API which can be directly mapped to an Intel AVX512FP16 instruction will be a candidate for intrinsification by the C2 compiler.
> 
> 6/ With Valhalla, the C2 compiler always creates an InlineType IR node for a value class instance.
> The total number of inputs of an InlineType node matches the number of non-static fields. In this case the node will have one input of short type, TypeInt::SHORT.
> 
> 7/ All the scalar AVX512FP16 instructions operate on floating-point registers, while the Float16 backing storage is held in a general-purpose register, so we need to introduce appropriate conversion IR which moves a 16-bit value from a GPR to an XMM register and vice versa.
> ![image](https://github.com/openjdk/valhalla/assets/59989778/192fca7e-6b7e-4e62-9b09-677e33eca48d)
> 
> 8/ The current plan is to introduce a new IR node for each operation, which is a subclass of its corresponding single-precision IR node. This will allow leveraging the idealization routines (Ideal/Identity/Value) of its parent operation.
> 
> 9/ All the single/double precision IR nodes carry a Type::FLOAT/DOUBLE ideal type. This represents the entire FP32/64 value range and is different from integral types, which explicitly record lower and upper bounds of value ranges. Value resolution ...
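
As an illustration of points 1/ and 4/ in the quoted description (a short-backed value whose operations run at FP32 granularity), here is a minimal sketch. It is not the Float16 class from the patch: the class name `Float16Sketch` and its methods are hypothetical, and it assumes JDK 20 or later, where the standard conversions `Float.float16ToFloat` and `Float.floatToFloat16` are available.

```java
// Hypothetical sketch only; NOT the Float16 class added by this changeset.
// Assumes JDK 20+ for Float.float16ToFloat / Float.floatToFloat16.
final class Float16Sketch {
    // Raw IEEE 754 binary16 bits, mirroring the short-valued backing storage
    // described in point 1/.
    private final short bits;

    private Float16Sketch(short bits) {
        this.bits = bits;
    }

    // Wraps an already-encoded binary16 bit pattern.
    static Float16Sketch valueOf(short bits) {
        return new Float16Sketch(bits);
    }

    // Returns the raw binary16 bit pattern.
    short rawBits() {
        return bits;
    }

    // Widens the binary16 value to a float.
    float floatValue() {
        return Float.float16ToFloat(bits);
    }

    // Point 4/: widen both operands to FP32, add there, then round the
    // FP32 sum back to binary16.
    static Float16Sketch add(Float16Sketch a, Float16Sketch b) {
        float sum = Float.float16ToFloat(a.bits) + Float.float16ToFloat(b.bits);
        return new Float16Sketch(Float.floatToFloat16(sum));
    }
}
```

For example, adding the bit patterns 0x3E00 (1.5) and 0x4080 (2.25) with this sketch yields 0x4380 (3.75). An intrinsified version, as described in point 5/, would replace the widen/add/narrow sequence with a single hardware instruction where one is available.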

This pull request has now been integrated.

Changeset: f03fb4e4
Author:    Jatin Bhateja <jbhateja at openjdk.org>
URL:       https://git.openjdk.org/valhalla/commit/f03fb4e4ee4d59ed692d0c26ddce260511f544e7
Stats:     883 lines in 33 files changed: 870 ins; 3 del; 10 mod

8308363: Initial compiler support for FP16 scalar operations.

Reviewed-by: sviswanathan

-------------

PR: https://git.openjdk.org/valhalla/pull/848