[lworld+fp16] RFR: 8308363: Initial compiler support for FP16 scalar operations. [v5]

Thu Aug 31 10:04:20 UTC 2023

> Starting with 4th Generation Xeon, Intel has made extensive extensions to existing ISA to support 16 bit scalar and vector floating point operations based on IEEE 754 binary16 format.
> 
> We plan to support this in multiple stages spanning across Java side definition of Float16 type, scalar operation and finally SLP vectorization support.
> 
> This patch adds  minimal Java and Compiler side support for one API Float16.add.
> 
> **Summary of changes :-**
> - Minimal implementation of Float16 primitive class supporting one operation (Float16.add)
> - X86 AVX512-FP16 feature detection at VM startup.
> - C2 IR and Inline expander changes for Float16.add API.
> - FP16 constant folding handling.
> - Backend support : Instruction selection patterns and assembler support.
> - New IR framework and functional tests.
> 
> **Implementation details:-**
> 
> 1/ Newly defined Float16 class encapsulate a short value holding IEEE 754 binary16 encoded value.
> 
> 2/ Float16 is a primitive class which in future will be aligned with other enhanced primitive wrapper classes proposed by [JEP-402.](https://openjdk.org/jeps/402)
> 
> 3/ Float16 to support all the operations supported by corresponding Float class.
> 
> 4/ Java implementation of each API will internally perform floating point operation at FP32 granularity.
> 
> 5/ API which can be directly mapped to an Intel AVX512FP16 instruction will be a candidate for intensification by C2 compiler.
> 
> 6/ With Valhalla, C2 compiler always creates an InlineType IR node for a value class instance.
> Total number of inputs of an InlineType node match the number of non-static fields. In this case node will have one input of short type TypeInt::SHORT.
> 
> 7/ Since all the scalar AVX512FP16 instructions operate on floating point registers and Float16 backing storage is held in a general-purpose register hence we need to introduce appropriate conversion IR which moves a 16-bit value from GPR to a XMM register and vice versa.
> ![image](https://github.com/openjdk/valhalla/assets/59989778/192fca7e-6b7e-4e62-9b09-677e33eca48d)
> 
> 8/ Current plan is to introduce a new IR node for each operation which is a subclass of its corresponding single precision IR node. This will allow leveraging idealization routines (Ideal/Identity/Value) of its parent operation.
> 
> 9/ All the single/double precision IR nodes carry a Type::FLOAT/DOUBLE ideal type. This represents entire FP32/64 value range and is different from integral types which explicitly record lower and upper bounds of value ranges. Value resolution ...

Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:

  Resolving offline review comments from Paul Sandoz and Xiaohong Gong.

-------------

Changes:
  - all: https://git.openjdk.org/valhalla/pull/848/files
  - new: https://git.openjdk.org/valhalla/pull/848/files/6989b465..53ac8929

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=valhalla&pr=848&range=04
 - incr: https://webrevs.openjdk.org/?repo=valhalla&pr=848&range=03-04

  Stats: 162 lines in 9 files changed: 103 ins; 27 del; 32 mod
  Patch: https://git.openjdk.org/valhalla/pull/848.diff
  Fetch: git fetch https://git.openjdk.org/valhalla.git pull/848/head:pull/848

PR: https://git.openjdk.org/valhalla/pull/848