RFP: Float16 Support in the OpenJDK Vector API
Jatin Bhateja
jatinbha.cloud at gmail.com
Thu Aug 14 17:02:12 UTC 2025
*Float16 Support in the OpenJDK Vector API*
*Summary*
Add an FP16 (IEEE‑754 half‑precision) vector type to the Vector API, enabling
compute and memory operations over half‑precision lanes with a short carrier
type and a Float16 box type. Provide C2 support, baseline/fallback semantics
via FP32 promotion, and validation via jtreg/JMH.
*Goals*
- Introduce HalffloatVector with 64/128/256/512‑bit concrete species.
- Preserve the Vector API’s carrier model (short) while disambiguating
FP16 from ShortVector via Float16 box type and an explicit *operation
type*.
- Enable core ops: load/store, lanewise (unary/binary/ternary incl.
FMA), compare, mask operations, broadcasts, splats.
- Provide deterministic fallback semantics using FP32 compute + FP16
round‑to‑nearest‑even (RNE) down‑conversion.
*Non‑Goals*
- No new hardware backends in this RFC (use existing C2 instruction
selection patterns for respective backends and extend where necessary).
- No FP8/INT8 semantics in this change (but design leaves a path).
- No changes to Float16 scalar APIs beyond what’s required for interop.
*Motivation*
- FP16 is prevalent in ML/AI, image/video, and DSP for higher throughput
and reduced bandwidth/footprint.
- Many architectures expose native FP16 SIMD; exposing this via Vector
API unlocks portable performance while keeping well‑defined fallbacks.
*Design Overview*
*Java Types*
- Abstract: HalffloatVector
- Concrete: Halffloat64Vector, Halffloat128Vector, Halffloat256Vector,
Halffloat512Vector
- ElementType (carrier): short.class
- BoxType: Float16 (prevents dispatch ambiguity with Vector<Short> during
virtual dispatch)
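For orientation, species selection would presumably mirror the existing
per-type vector classes (e.g. FloatVector.SPECIES_128); a minimal sketch,
assuming the HalffloatVector species handles follow that naming convention:

    // Hypothetical species handles, mirroring FloatVector.SPECIES_64/128/256/512/PREFERRED.
    VectorSpecies<Float16> s128  = HalffloatVector.SPECIES_128;       // 8 FP16 lanes
    VectorSpecies<Float16> sPref = HalffloatVector.SPECIES_PREFERRED; // widest supported on this CPU
    int lanes = sPref.length();  // e.g. 32 lanes for a 512-bit species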
*Fallback Semantics*
- Promote each short lane via Float16.floatValue() → compute in FP32 →
down‑convert to FP16.
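A minimal scalar sketch of this fallback for a single lane of a binary op,
using the existing Float.float16ToFloat / Float.floatToFloat16 conversions
(the latter rounds to nearest even); illustrative only, not the VM's actual
expansion:

    // One FP16 lane of an ADD under the fallback: widen to FP32, compute, narrow back.
    // a and b are FP16 bit patterns carried in shorts.
    static short fp16AddLane(short a, short b) {
        float fa = Float.float16ToFloat(a);   // promote to FP32
        float fb = Float.float16ToFloat(b);
        float r  = fa + fb;                   // compute in FP32
        return Float.floatToFloat16(r);       // RNE down-conversion back to FP16
    }

For basic operations such as add and multiply, FP32 carries enough extra
precision that the single RNE narrowing yields the correctly rounded FP16
result.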
*Operation Type (disambiguation)*
- Pass an additional int operationType to VectorSupport entry points
(e.g., VECTOR_TYPE_FP16 now; VECTOR_TYPE_INT8 and VECTOR_TYPE_FP8 in the
future).
- Carrier type remains T_SHORT for TypeVect; IR opcode inference uses
(operationType, opKind).
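A rough sketch of the proposed disambiguation at the entry-point layer; the
concrete constant values and the placement of the extra parameter are
hypothetical, only the names come from this proposal:

    // Hypothetical operation-type tags, passed alongside the existing
    // (vectortype, elemtype, length) arguments of the VectorSupport entry points.
    static final int VECTOR_TYPE_DEFAULT = 0;  // element basic type fully describes the op
    static final int VECTOR_TYPE_FP16    = 1;  // short carrier, FP16 arithmetic semantics
    static final int VECTOR_TYPE_INT8    = 2;  // reserved for future reduced-precision work
    static final int VECTOR_TYPE_FP8     = 3;  // reserved for future reduced-precision work

    // C2 then infers the IR opcode from (operationType, opKind) while TypeVect
    // keeps element_basic_type = T_SHORT for the FP16 case.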
*HotSpot/C2 Integration (selected)*
- *Load/Store*
Entry: VectorSupport.load/store with (vectortype=Halffloat*Vector.class,
elemtype=short.class, length=N, operationType=VECTOR_TYPE_FP16)
Expander: LoadVector/StoreVector using
TypeVect{element_basic_type=T_SHORT, num_elem=N}; existing short‑vector
match rules apply.
- *Lanewise*
Entry: VectorSupport.unaryOp/binaryOp/ternaryOp (+ operationType).
IR: Reuse existing vector IR where backend ops exist; add
FP16‑specialized nodes only where needed. (Today C2 creates specialized
FP16 nodes; we continue that path.)
- *Compare/Mask*
Entry: VectorSupport.compare (+ operationType).
IR: Introduce VectorMaskCmpHFNode (semantic compare in FP16 domain).
- *Incubation Note*
Halffloat* lives in the incubator module, so the inline expander can infer
FP16 IR in one of two ways. The first is class-name-based resolution: because
the VM only keeps track of classes that are part of the java.base module,
matching the incubator class by name is acceptable as a short-term solution.
The more robust solution is to pass an explicit operationType parameter to the
intrinsic entry points, as discussed above. This scheme avoids the loopholes
of fragile name-based resolution and extends naturally to other
reduced-precision types such as INT8 or FP8 in the future.
*Compatibility & Interactions*
- Distinct BoxType=Float16 prevents dispatch ambiguity with
Vector<Short>.
- Interop: explicit conversion via Float.float16ToFloat(short) /
Float.floatToFloat16(float).
- No behavioral changes to existing vector types.
*Testing & Validation*
- *Functional*: Extend Vector API jtreg to cover all Halffloat ops
(loads/stores, lanewise incl. FMA, compares, masks, predication).
- *Performance*: Extend JMH harness; add microbenchmarks (e.g., FP16
dot‑product, semantic search kernels).
- *Correctness*: Cross‑check fallback vs hardware where available;
verify RNE rounding and edge cases (NaNs, subnormals, signed zero,
infinities).
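As one possible shape for the JMH additions, a dot-product microbenchmark
might compare the proposed FP16 vector path against an FP32-promoted scalar
baseline; the class below is a sketch (fp16Dot is the kernel from the Minimal
Usage Sketch further down, and the data shapes are arbitrary):

    import org.openjdk.jmh.annotations.*;

    @State(Scope.Thread)
    public class Float16DotBench {
        @Param({"1024", "65536"})
        int size;
        short[] a, b;   // FP16 values stored as short bit patterns

        @Setup
        public void setup() {
            a = new short[size];
            b = new short[size];
            for (int i = 0; i < size; i++) {
                a[i] = Float.floatToFloat16(i * 1e-3f);
                b[i] = Float.floatToFloat16(1.0f - i * 1e-3f);
            }
        }

        @Benchmark
        public float fp16Vector() {
            return fp16Dot(a, b);   // proposed HalffloatVector path
        }

        @Benchmark
        public float fp32ScalarBaseline() {
            float sum = 0.0f;
            for (int i = 0; i < size; i++) {
                sum += Float.float16ToFloat(a[i]) * Float.float16ToFloat(b[i]);
            }
            return sum;
        }
    }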
*Risks & Mitigations*
- *Dispatch ambiguity* → Use Float16 box type and operationType flag.
- *Backend coverage variance* → Fall back to FP32 emulation; gated
intrinsics per‑platform.
- *Precision surprises* → Document FP16 semantics, rounding, and
conversions.
*Reference Implementation Plan*
1. Refresh vectorIntrinsics+fp16 support.
2. Implement C2 decode of (carrier=T_SHORT, operationType=FP16) to infer
FP16 vector IR.
3. Leverage existing instruction selection and backend support from JDK
mainline and extend it where applicable (e.g., VectorMaskCmpHFNode).
4. Land jtreg/JMH; publish perf/correctness data.
*Minimal Usage Sketch*
    static final VectorSpecies<Float16> S = HalffloatVector.SPECIES_PREFERRED;

    // FP16 dot product; a and b hold FP16 values as short bit patterns.
    static float fp16Dot(short[] a, short[] b) {
        HalffloatVector acc = HalffloatVector.broadcast(S, 0);
        int bound = S.loopBound(a.length);
        for (int i = 0; i < bound; i += S.length()) {
            var v1 = HalffloatVector.fromArray(S, a, i);
            var v2 = HalffloatVector.fromArray(S, b, i);
            acc = v1.lanewise(VectorOperators.FMA, v2, acc);    // acc += v1 * v2
        }
        float sum = 0.0f;
        for (int i = 0; i < S.length(); i++) {
            sum += Float.float16ToFloat(acc.lane(i));           // reduce lanes in FP32
        }
        for (int i = bound; i < a.length; i++) {                // scalar tail
            sum += Float.float16ToFloat(a[i]) * Float.float16ToFloat(b[i]);
        }
        return sum;
    }
*Future Activities*
   - The scope of this RFC is to extend the existing array-backed Vector API
infrastructure to support the Float16 type, enabling Java users to harness
the FP16 ISA support of various targets and bringing Float16 support on par
with the existing primitive vector types.
   - Our eventual goal is to make Float16 a value type and use flat-array
backing storage as supported by Project Valhalla.