[vectorIntrinsics+fp16] RFR: 8290204: FP16 initial backend implementation
Jatin Bhateja
jbhateja at openjdk.org
Thu Jul 14 17:57:15 UTC 2022
On Tue, 12 Jul 2022 21:52:29 GMT, Smita Kamath <svkamath at openjdk.org> wrote:
> Initial backend implementation for enabling FP16.
Hi Smita,
Please find my initial review comments.
Thanks
src/hotspot/cpu/x86/assembler_x86.cpp line 2147:
> 2145:
> 2146: void Assembler::evcvtpd2ph(XMMRegister dst, XMMRegister src, int vector_len) {
> 2147: assert(VM_Version::supports_avx512_fp16(), "");
It would be beneficial to pass an extra opmask register to the leaf-level assembler routines for packed operations, and to add a wrapper without an opmask that passes K0 as the opmask operand. This will help when we add support for predicated operations.
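A minimal sketch of the suggested shape, using toy stand-in types rather than the real HotSpot Assembler, XMMRegister and KRegister classes. ToyAssembler, XMMReg, KReg and the logged text format are all hypothetical and exist only to show how the unpredicated wrapper forwards to the masked leaf routine:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-ins for the register types; NOT the real HotSpot classes.
struct XMMReg { int enc; };
struct KReg   { int enc; };
constexpr KReg k0{0};  // in EVEX encodings, k0 as writemask means "no masking"

// Hypothetical recorder used in place of the real Assembler.
struct ToyAssembler {
  std::vector<std::string> log;

  // Leaf routine: always takes an explicit opmask register.
  void evcvtpd2ph(XMMReg dst, KReg mask, XMMReg src, int vector_len) {
    log.push_back("vcvtpd2ph xmm" + std::to_string(dst.enc) +
                  " {k" + std::to_string(mask.enc) + "}, xmm" +
                  std::to_string(src.enc) +
                  " len=" + std::to_string(vector_len));
  }

  // Wrapper for the unpredicated form: forwards k0 as the opmask.
  void evcvtpd2ph(XMMReg dst, XMMReg src, int vector_len) {
    evcvtpd2ph(dst, k0, src, vector_len);
  }
};
```

With this split, existing unpredicated call sites stay unchanged, and predicated patterns can call the masked overload directly once they are added.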
src/hotspot/cpu/x86/assembler_x86.cpp line 2156:
> 2154:
> 2155: void Assembler::evcvtps2ph(XMMRegister dst, XMMRegister src, int imm8, int vector_len) {
> 2156: assert(VM_Version::supports_evex(), "");
The VM_Version::supports_avx512_fp16() check is missing from this assert.
src/hotspot/cpu/x86/assembler_x86.cpp line 2166:
> 2164:
> 2165: void Assembler::evcvtph2ps(XMMRegister dst, XMMRegister src, int vector_len) {
> 2166: assert(VM_Version::supports_evex(), "");
The VM_Version::supports_avx512_fp16() check is missing from this assert.
src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 7765:
> 7763: StubRoutines::x86::_vector_halffloat_sign_mask = generate_vector_fp_mask("vector_halffloat_sign_mask", 0x7FFF7FFF7FFF7FFF);
> 7764: StubRoutines::x86::_vector_halffloat_sign_flip = generate_vector_fp_mask("vector_halffloat_sign_flip", 0x8000800080008000);
> 7765:
These stubs should also be added to the 32-bit stub generator.
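For reference, a scalar sketch of what the two constants do to packed half-float lanes, assuming the usual usage (AND with the sign mask for abs, XOR with the sign flip for negation). The helper names abs4hf and neg4hf are hypothetical; each 64-bit word carries four 16-bit half floats:

```cpp
#include <cassert>
#include <cstdint>

// Same 64-bit patterns as passed to generate_vector_fp_mask in the patch.
constexpr uint64_t kHalfSignMask = 0x7FFF7FFF7FFF7FFFULL;  // clears bit 15 of each lane
constexpr uint64_t kHalfSignFlip = 0x8000800080008000ULL;  // toggles bit 15 of each lane

// Hypothetical helpers operating on four packed half-float bit patterns.
inline uint64_t abs4hf(uint64_t lanes) { return lanes & kHalfSignMask; }
inline uint64_t neg4hf(uint64_t lanes) { return lanes ^ kHalfSignFlip; }
```

For example, the lanes {-1.0, 1.0, -2.0, 2.0} are encoded as 0xBC00, 0x3C00, 0xC000, 0x4000 in IEEE 754 binary16, so abs4hf clears the sign bits of the first and third lanes only.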
src/hotspot/cpu/x86/vm_version_x86.hpp line 369:
> 367: decl(GFNI, "gfni", 48) /* Vector GFNI instructions */ \
> 368: decl(AVX512_BITALG, "avx512_bitalg", 49) /* Vector sub-word popcount and bit gather instructions */\
> 369: decl(AVX512_FP16, "avx512_fp16", 50) /* Vector FP16 instructions*/
Please also handle the newly added feature in
src/jdk.internal.vm.ci/share/classes/jdk.vm.ci.amd64/src/jdk/vm/ci/amd64/AMD64.java
src/hotspot/cpu/x86/x86.ad line 2120:
> 2118: if(bt != T_SHORT && !VM_Version::supports_avx512_fp16()) {
> 2119: return false;
> 2120: }
Should this check be moved to Matcher::match_rule_supported_vector? The backend does not currently support predicated half-float instructions.
src/hotspot/cpu/x86/x86.ad line 4852:
> 4850: // =======================Half Float Reduction==========================================
> 4851: instruct reduction8HF(rRegI dst, vec src2, vec tmp, vec tmp1, vec tmp2) %{
> 4852: predicate(UseAVX > 2);
This pattern handles only 128-bit vectors. If we defer handling of 256/512-bit vectors, we should add a check in match_rule_supported_vector.
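A rough sketch of such a size check, written as a standalone toy function rather than the real Matcher code. ToyOpcode, toy_match_rule_supported_vector and the two-argument signature are assumptions made for illustration:

```cpp
#include <cassert>

// Toy opcode set; only the half-float reduction is relevant here.
enum ToyOpcode { Toy_AddReductionVHF, Toy_Other };

constexpr int kHalfFloatBits = 16;

// Hypothetical stand-in for Matcher::match_rule_supported_vector.
// vlen is the number of vector lanes.
bool toy_match_rule_supported_vector(ToyOpcode opc, int vlen) {
  if (opc == Toy_AddReductionVHF) {
    // reduction8HF covers 8 lanes x 16 bits = 128 bits only,
    // so reject wider vectors until patterns for them exist.
    return vlen * kHalfFloatBits == 128;
  }
  return true;
}
```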
src/hotspot/cpu/x86/x86.ad line 4889:
> 4887: __ pshuflw($tmp$$XMMRegister, $tmp2$$XMMRegister, 0x03);
> 4888: __ evaddsh($tmp1$$XMMRegister, $tmp1$$XMMRegister, $tmp$$XMMRegister);
> 4889: __ movdl($dst$$Register, $tmp1$$XMMRegister);
We can move this sequence into a macro assembler routine.
src/hotspot/cpu/x86/x86.ad line 5411:
> 5409: // Halffloat vector add
> 5410: instruct vaddHF_reg(vec dst, vec src1, vec src2) %{
> 5411: predicate(UseAVX > 2);
We can check for an AVX3 target in match_rule_supported for the half-float IR nodes and remove the explicit predicates from each pattern.
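Roughly, the centralized check could look like the toy function below. This is a sketch only: ToyOp, ToyCpu, toy_is_half_float_op and toy_match_rule_supported are hypothetical names, and the opcode list is limited to two of the half-float nodes that appear in this patch:

```cpp
#include <cassert>

// Toy opcode set: two half-float nodes plus one unrelated float node.
enum ToyOp { Toy_AbsVHF, Toy_AddRedVHF, Toy_AddVF };

// Hypothetical CPU description; use_avx mirrors the UseAVX level.
struct ToyCpu { int use_avx; };

bool toy_is_half_float_op(ToyOp opc) {
  return opc == Toy_AbsVHF || opc == Toy_AddRedVHF;
}

// Hypothetical stand-in for Matcher::match_rule_supported.
bool toy_match_rule_supported(ToyOp opc, const ToyCpu& cpu) {
  // One gate for every half-float node, so individual instruct
  // patterns no longer need their own predicate(UseAVX > 2).
  if (toy_is_half_float_op(opc) && cpu.use_avx <= 2) {
    return false;
  }
  return true;
}
```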
src/hotspot/cpu/x86/x86.ad line 7424:
> 7422: // Convert from Halffloat to other types
> 7423: instruct vcvtHFtoD_reg(vec dst, vec src) %{
> 7424: predicate(UseAVX > 2 && VM_Version::supports_avx512_fp16() && Matcher::vector_element_basic_type(n) == T_DOUBLE);
Should there be a reverse check (!FP16) in vcastStoX_evex to resolve any ambiguity during instruction selection?
src/hotspot/cpu/x86/x86.ad line 7435:
> 7433:
> 7434: instruct vcvtHFtoF_reg(vec dst, vec src) %{
> 7435: predicate(UseAVX > 2 && Matcher::vector_element_basic_type(n) == T_FLOAT);
Same as above.
src/hotspot/cpu/x86/x86.ad line 7447:
> 7445: // Convert from other types to Halffloat
> 7446: instruct vcvtFtoHF_reg(vec dst, vec src) %{
> 7447: predicate(UseAVX > 2 && Matcher::vector_element_basic_type(n) == T_SHORT);
Same as above.
src/hotspot/cpu/x86/x86.ad line 7459:
> 7457:
> 7458: instruct vcvtDtoHF_reg(vec dst, vec src) %{
> 7459: predicate(UseAVX > 2 && VM_Version::supports_avx512_fp16() && Matcher::vector_element_basic_type(n) == T_SHORT);
Same as above.
src/hotspot/cpu/x86/x86.ad line 7992:
> 7990: predicate(UseAVX > 2);
> 7991: match(Set dst (AbsVHF src));
> 7992: format %{ "vandps $dst,$src\t# $dst = |$src| abs packedHF" %}
Incorrect format string.
src/hotspot/share/opto/vectornode.cpp line 1222:
> 1220: switch (vopc) {
> 1221: case Op_AddReductionVHF: return new AddReductionVHFNode(ctrl, n1, n2);
> 1222: case Op_MulReductionVF: return new MulReductionVFNode(ctrl, n1, n2);
Backend handling is missing for MulReductionVF.
-------------
PR: https://git.openjdk.org/panama-vector/pull/204