RFR: 8342103: C2 compiler support for Float16 type and associated operations

Tue Nov 19 19:57:19 UTC 2024

On Mon, 14 Oct 2024 11:40:01 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

> Hi All,
> 
> This patch adds C2 compiler support for various Float16 operations added by [PR#22128](https://github.com/openjdk/jdk/pull/22128)
> 
> Following is the summary of changes included with this patch:-
> 
> 1. Detection of various Float16 operations through inline expansion or pattern folding idealizations.
> 2. Float16 operations like add, sub, mul, div, max, and min are inferred through pattern folding idealization.
> 3. Float16 SQRT and FMA operation are inferred through inline expansion and their corresponding entry points are defined in the newly added Float16Math class.
>       -    These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values.
> 5. New specialized IR nodes for Float16 operations, associated idealizations, and constant folding routines.
> 6. New Ideal type for constant and non-constant Float16 IR nodes. Please refer to [FAQs ](https://github.com/openjdk/jdk/pull/21490#issuecomment-2482867818)for more details.
> 7. Since Float16 uses short as its storage type, hence raw FP16 values are always loaded into general purpose register, but FP16 ISA instructions generally operate over floating point registers, therefore compiler injectes reinterpretation IR before and after Float16 operation nodes to move short value to floating point register and vice versa.
> 8. New idealization routines to optimize redundant reinterpretation chains. HF2S + S2HF = HF
> 6. Auto-vectorization of newly supported scalar operations.
> 7. X86 and AARCH64 backend implementation for all supported intrinsics.
> 9. Functional and Performance validation tests.
> 
> **Missing Pieces:-**
> **-  AARCH64 Backend.**
> 
> Kindly review and share your feedback.
> 
> Best Regards,
> Jatin

src/hotspot/cpu/x86/assembler_x86.cpp line 3481:

> 3479: void Assembler::vmovw(XMMRegister dst, Register src) {
> 3480:   assert(VM_Version::supports_avx512_fp16(), "requires AVX512-FP16");
> 3481:   InstructionAttr attributes(AVX_128bit, false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false);

It will be good to have the second argument with comment as "/* vex_w */ false".

src/hotspot/cpu/x86/assembler_x86.cpp line 3483:

> 3481:   InstructionAttr attributes(AVX_128bit, false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false);
> 3482:   attributes.set_is_evex_instruction();
> 3483:   int encode = vex_prefix_and_encode(dst->encoding(), 0, src->encoding(), VEX_SIMD_66, VEX_OPCODE_MAP5, &attributes);

I think we need to change this to:
int encode = vex_prefix_and_encode(dst->encoding(), 0, src->encoding(), VEX_SIMD_66, VEX_OPCODE_MAP5, &attributes, true);
Please note the last argument for APX encoding when src is in higher register bank.

src/hotspot/cpu/x86/assembler_x86.cpp line 3489:

> 3487: void Assembler::vmovw(Register dst, XMMRegister src) {
> 3488:   assert(VM_Version::supports_avx512_fp16(), "requires AVX512-FP16");
> 3489:   InstructionAttr attributes(AVX_128bit, false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false);

It will be good to have the second argument with comment as "/* vex_w */ false".

src/hotspot/cpu/x86/assembler_x86.cpp line 3491:

> 3489:   InstructionAttr attributes(AVX_128bit, false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false);
> 3490:   attributes.set_is_evex_instruction();
> 3491:   int encode = vex_prefix_and_encode(src->encoding(), 0, dst->encoding(), VEX_SIMD_66, VEX_OPCODE_MAP5, &attributes);

I think we need to change this to:
int encode = vex_prefix_and_encode(src->encoding(), 0, dst->encoding(), VEX_SIMD_66, VEX_OPCODE_MAP5, &attributes, true);
Please note the last argument for APX encoding when dst is in higher register bank.

src/hotspot/cpu/x86/assembler_x86.cpp line 8464:

> 8462: void Assembler::evaddph(XMMRegister dst, XMMRegister nds, XMMRegister src, int vector_len) {
> 8463:   assert(VM_Version::supports_avx512_fp16(), "requires AVX512-FP16");
> 8464:   InstructionAttr attributes(vector_len, false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ true);

It will be good to have the second argument with comment as "/* vex_w */ false".

src/hotspot/cpu/x86/assembler_x86.cpp line 8483:

> 8481: void Assembler::evsubph(XMMRegister dst, XMMRegister nds, XMMRegister src, int vector_len) {
> 8482:   assert(VM_Version::supports_avx512_fp16(), "requires AVX512-FP16");
> 8483:   InstructionAttr attributes(vector_len, false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ true);

It will be good to have the second argument with comment as "/* vex_w */ false"

src/hotspot/cpu/x86/assembler_x86.cpp line 8502:

> 8500: void Assembler::evmulph(XMMRegister dst, XMMRegister nds, XMMRegister src, int vector_len) {
> 8501:   assert(VM_Version::supports_avx512_fp16(), "requires AVX512-FP16");
> 8502:   InstructionAttr attributes(vector_len, false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ true);

It will be good to have the second argument with comment as "/* vex_w */ false"

src/hotspot/cpu/x86/assembler_x86.cpp line 8521:

> 8519: void Assembler::evminph(XMMRegister dst, XMMRegister nds, XMMRegister src, int vector_len) {
> 8520:   assert(VM_Version::supports_avx512_fp16(), "requires AVX512-FP16");
> 8521:   InstructionAttr attributes(vector_len, false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ true);

It will be good to have the second argument with comment as "/* vex_w */ false"

src/hotspot/cpu/x86/assembler_x86.cpp line 8540:

> 8538: void Assembler::evmaxph(XMMRegister dst, XMMRegister nds, XMMRegister src, int vector_len) {
> 8539:   assert(VM_Version::supports_avx512_fp16(), "requires AVX512-FP16");
> 8540:   InstructionAttr attributes(vector_len, false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ true);

It will be good to have the second argument with comment as "/* vex_w */ false"

src/hotspot/cpu/x86/assembler_x86.cpp line 8559:

> 8557: void Assembler::evdivph(XMMRegister dst, XMMRegister nds, XMMRegister src, int vector_len) {
> 8558:   assert(VM_Version::supports_avx512_fp16(), "requires AVX512-FP16");
> 8559:   InstructionAttr attributes(vector_len, false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ true);

It will be good to have the second argument with comment as "/* vex_w */ false"

src/hotspot/cpu/x86/assembler_x86.cpp line 8576:

> 8574: }
> 8575: 
> 8576: void Assembler::evsqrtph(XMMRegister dst, XMMRegister src1, int vector_len) {

A nitpick src1 could be src :).

src/hotspot/cpu/x86/assembler_x86.cpp line 8614:

> 8612: }
> 8613: 
> 8614: void Assembler::eaddsh(XMMRegister dst, XMMRegister nds, XMMRegister src) {

This should be vaddsh.

src/hotspot/cpu/x86/assembler_x86.cpp line 8616:

> 8614: void Assembler::eaddsh(XMMRegister dst, XMMRegister nds, XMMRegister src) {
> 8615:   assert(VM_Version::supports_avx512_fp16(), "requires AVX512-FP16");
> 8616:   InstructionAttr attributes(AVX_128bit, false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false);

It will be good to have the second argument with comment as "/* vex_w */ false"

src/hotspot/cpu/x86/assembler_x86.cpp line 8622:

> 8620: }
> 8621: 
> 8622: void Assembler::esubsh(XMMRegister dst, XMMRegister nds, XMMRegister src) {

This should be vsubsh.

src/hotspot/cpu/x86/assembler_x86.cpp line 8624:

> 8622: void Assembler::esubsh(XMMRegister dst, XMMRegister nds, XMMRegister src) {
> 8623:   assert(VM_Version::supports_avx512_fp16(), "requires AVX512-FP16");
> 8624:   InstructionAttr attributes(AVX_128bit, false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false);

It will be good to have the second argument with comment as "/* vex_w */ false"

src/hotspot/cpu/x86/assembler_x86.cpp line 8630:

> 8628: }
> 8629: 
> 8630: void Assembler::edivsh(XMMRegister dst, XMMRegister nds, XMMRegister src) {

This should be vdivsh.

src/hotspot/cpu/x86/assembler_x86.cpp line 8632:

> 8630: void Assembler::edivsh(XMMRegister dst, XMMRegister nds, XMMRegister src) {
> 8631:   assert(VM_Version::supports_avx512_fp16(), "requires AVX512-FP16");
> 8632:   InstructionAttr attributes(AVX_128bit, false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false);

It will be good to have the second argument with comment as "/* vex_w */ false"

src/hotspot/cpu/x86/assembler_x86.cpp line 8638:

> 8636: }
> 8637: 
> 8638: void Assembler::emulsh(XMMRegister dst, XMMRegister nds, XMMRegister src) {

This should be vmulsh.

src/hotspot/cpu/x86/assembler_x86.cpp line 8640:

> 8638: void Assembler::emulsh(XMMRegister dst, XMMRegister nds, XMMRegister src) {
> 8639:   assert(VM_Version::supports_avx512_fp16(), "requires AVX512-FP16");
> 8640:   InstructionAttr attributes(AVX_128bit, false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false);

It will be good to have the second argument with comment as "/* vex_w */ false"

src/hotspot/cpu/x86/assembler_x86.cpp line 8646:

> 8644: }
> 8645: 
> 8646: void Assembler::emaxsh(XMMRegister dst, XMMRegister nds, XMMRegister src) {

This should be vmaxsh.

src/hotspot/cpu/x86/assembler_x86.cpp line 8648:

> 8646: void Assembler::emaxsh(XMMRegister dst, XMMRegister nds, XMMRegister src) {
> 8647:   assert(VM_Version::supports_avx512_fp16(), "requires AVX512-FP16");
> 8648:   InstructionAttr attributes(AVX_128bit, false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false);

It will be good to have the second argument with comment as "/* vex_w */ false"

src/hotspot/cpu/x86/assembler_x86.cpp line 8654:

> 8652: }
> 8653: 
> 8654: void Assembler::eminsh(XMMRegister dst, XMMRegister nds, XMMRegister src) {

This should be vminsh.

src/hotspot/cpu/x86/assembler_x86.cpp line 8656:

> 8654: void Assembler::eminsh(XMMRegister dst, XMMRegister nds, XMMRegister src) {
> 8655:   assert(VM_Version::supports_avx512_fp16(), "requires AVX512-FP16");
> 8656:   InstructionAttr attributes(AVX_128bit, false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false);

It will be good to have the second argument with comment as "/* vex_w */ false"

src/hotspot/cpu/x86/assembler_x86.cpp line 8662:

> 8660: }
> 8661: 
> 8662: void Assembler::esqrtsh(XMMRegister dst, XMMRegister src) {

This should be vsqrtsh.

src/hotspot/cpu/x86/assembler_x86.cpp line 8664:

> 8662: void Assembler::esqrtsh(XMMRegister dst, XMMRegister src) {
> 8663:   assert(VM_Version::supports_avx512_fp16(), "requires AVX512-FP16");
> 8664:   InstructionAttr attributes(AVX_128bit, false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false);

It will be good to have the second argument with comment as "/* vex_w */ false"

src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 3974:

> 3972:   generate_libm_stubs();
> 3973: 
> 3974:   StubRoutines::_fmod = generate_libmFmod(); // from stubGenerator_x86_64_fmod.cpp

Good to retain the is_intrinsic_available checks.

src/hotspot/cpu/x86/x86.ad line 4518:

> 4516: #ifdef _LP64
> 4517: instruct ReplS_imm(vec dst, immH con, rRegI rtmp) %{
> 4518:   predicate(VM_Version::supports_avx512_fp16() && Matcher::vector_element_basic_type(n) == T_SHORT);

I have a question about the predicate for ReplS_imm. What happens if the predicate is false? There doesn't seem to be any other instruct rule to cover that situation. Also I don't see any check in match rule supported on Replicate node.

src/hotspot/cpu/x86/x86.ad line 10895:

> 10893:   format %{ "esqrtsh $dst, $src" %}
> 10894:   ins_encode %{
> 10895:     int opcode = this->ideal_Opcode();

opcode is unused.

src/hotspot/cpu/x86/x86.ad line 10936:

> 10934:   ins_encode %{
> 10935:     int vlen_enc = vector_length_encoding(this);
> 10936:     int opcode = this->ideal_Opcode();

opcode unused later.

src/hotspot/cpu/x86/x86.ad line 10949:

> 10947:   ins_encode %{
> 10948:     int vlen_enc = vector_length_encoding(this);
> 10949:     int opcode = this->ideal_Opcode();

opcode unused later.

src/hotspot/cpu/x86/x86.ad line 10964:

> 10962:   match(Set dst (SubVHF src1 src2));
> 10963:   format %{ "evbinopfp16_reg $dst, $src1, $src2" %}
> 10964:   ins_cost(450);

Why ins_cost 450 here for reg version and 150 for mem version of binOps?  Whereas sqrt above has 150 cost for both reg and mem version. Good to be consistent.

src/hotspot/cpu/x86/x86.ad line 11012:

> 11010:   effect(DEF dst);
> 11011:   format %{ "evfmaph_reg $dst, $src1, $src2\t# $dst = $dst * $src1 + $src2 fma packedH" %}
> 11012:   ins_cost(450);

Good to be consistent with ins_cost for reg vs mem version.

src/hotspot/cpu/x86/x86.ad line 11015:

> 11013:   ins_encode %{
> 11014:     int vlen_enc = vector_length_encoding(this);
> 11015:     __ evfmadd132ph($dst$$XMMRegister, $src2$$XMMRegister, $src1$$XMMRegister, vlen_enc);

Wondering if for auto vectorization the natural fma form is dst = dst + src1 * src2 i.e.
     match(Set dst (FmaVHF dst (Binary src1 src2)));
which then leads to fmadd231.

src/hotspot/share/adlc/output_h.cpp line 1298:

> 1296:         case Form::idealD: type = "Type::DOUBLE";   break;
> 1297:         case Form::idealL: type = "TypeLong::LONG"; break;
> 1298:         case Form::idealH: type = "Type::HALF_LONG"; break;

This should be Type::HALF_FLOAT

src/hotspot/share/classfile/vmSymbols.hpp line 143:

> 141:   template(java_util_DualPivotQuicksort,              "java/util/DualPivotQuicksort")             \
> 142:   template(jdk_internal_misc_Signal,                  "jdk/internal/misc/Signal")                 \
> 143:   template(jdk_internal_math_Float16Math,             "jdk/internal/math/Float16Math")            \

This seems to be leftover template.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843870304
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843899813
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843870852
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843902337
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843871328
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843906656
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843908957
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843910609
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843912897
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843914392
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843916999
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843922125
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843922490
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843923239
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843924299
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843925126
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843925319
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843926551
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843926789
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843928252
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843928447
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843929519
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843929686
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843930969
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843931641
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1847403451
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1847400518
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1844234786
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1844237825
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1844238487
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1844244532
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1847443990
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1847448109
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1847470619
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1847475384