RFR: 8302976: C2 intrinsification of Float.floatToFloat16 and Float.float16ToFloat yields different result than the interpreter

Tue Mar 7 02:07:26 UTC 2023

On Tue, 7 Mar 2023 01:04:00 GMT, Sandhya Viswanathan <sviswanathan at openjdk.org> wrote:

>> Implemented `Float.floatToFloat16` and `Float.float16ToFloat` intrinsics in Interpreter and C1 compiler to produce the same results as C2 intrinsics on x64, Aarch64 and RISC-V - all platforms where C2 intrinsics for these Java methods were implemented originally.
>> 
>> Replaced `SharedRuntime::f2hf()` and `hf2f()` C runtime functions with calls to runtime stubs which use the same HW instructions as C2 intrinsics. Only for 64-bit x64 because 32-bit x86 stub does not work: result is passed through FPU register and NaN values become different from C2 intrinsic. This runtime stub is only used to calculate constant values during C2 compilation and can be skipped.
>> 
>> I added new tests based on Tobias's `TestAll.java` And copied `jdk/lang/Float/Binary16Conversion*.java` tests to run them with `-Xcomp` to make sure code is compiled by C1 or C2. I modified `Binary16ConversionNaN.java` to compare results from Interpreter, C1 and C2.
>> 
>> Tested tier1-5, Xcomp, stress
>
> src/hotspot/cpu/x86/macroAssembler_x86.hpp line 199:
> 
>> 197:   void flt_to_flt16(Register dst, XMMRegister src, XMMRegister tmp) {
>> 198:     // Instruction requires different XMM registers
>> 199:     vcvtps2ph(tmp, src, 0x04, Assembler::AVX_128bit);
> 
> vcvtps2ph can have source and destination as same. Did you mean to say here in the comment that "Instruction requires XMM register as destination"?

`flt_to_flt16` is used in `x86.ad` instruction which requires preserving `src` register.
I did not want to add an other macroassembler instruction for src->src case.
I will add this to this comment.

> src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 3928:
> 
>> 3926:   }
>> 3927: 
>> 3928:   if (VM_Version::supports_f16c() || VM_Version::supports_avx512vl()) {
> 
> We could check for VM_Version::supports_float16() here instead.

Yes.

> src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 3931:
> 
>> 3929:     // For results consistency both intrinsics should be enabled.
>> 3930:     if (vmIntrinsics::is_intrinsic_available(vmIntrinsics::_float16ToFloat) &&
>> 3931:         vmIntrinsics::is_intrinsic_available(vmIntrinsics::_floatToFloat16)) {
> 
> Should this also check for InlineIntrinsics?

`vmIntrinsics::disabled_by_jvm_flags()` checks `InlineIntrinsics`. See `vmIntrinsics.cpp` changes.

> src/hotspot/cpu/x86/templateInterpreterGenerator_x86_64.cpp line 346:
> 
>> 344:   }
>> 345:   // For AVX CPUs only. f16c support is disabled if UseAVX == 0.
>> 346:   if (VM_Version::supports_f16c() || VM_Version::supports_avx512vl()) {
> 
> We could check for VM_Version::supports_float16() here instead.

Yes. And I need to remove `!InlineIntrinsics` check at line 340.

-------------

PR: https://git.openjdk.org/jdk/pull/12869