RFR: 8302976: C2 intrinsification of Float.floatToFloat16 and Float.float16ToFloat yields different result than the interpreter

Sandhya Viswanathan sviswanathan at openjdk.org
Tue Mar 7 00:22:15 UTC 2023


On Mon, 6 Mar 2023 23:54:44 GMT, Vladimir Kozlov <kvn at openjdk.org> wrote:

>> Implemented `Float.floatToFloat16` and `Float.float16ToFloat` intrinsics in Interpreter and C1 compiler to produce the same results as C2 intrinsics on x64, AArch64 and RISC-V - all platforms where C2 intrinsics for these Java methods were implemented originally.
>> 
>> Replaced `SharedRuntime::f2hf()` and `hf2f()` C runtime functions with calls to runtime stubs which use the same HW instructions as C2 intrinsics. Only for 64-bit x64 because 32-bit x86 stub does not work: result is passed through FPU register and NaN values become different from C2 intrinsic. This runtime stub is only used to calculate constant values during C2 compilation and can be skipped.
>> 
>> I added new tests based on Tobias's `TestAll.java` and copied `jdk/lang/Float/Binary16Conversion*.java` tests to run them with `-Xcomp` to make sure code is compiled by C1 or C2. I modified `Binary16ConversionNaN.java` to compare results from Interpreter, C1 and C2.
>> 
>> Tested tier1-5, Xcomp, stress
>
> @fyang, please help to verify that new tests passed on RISC-V with these changes and review these changes. Thanks!
> 
> I tested x86 (64- and 32-bit) and AArch64.

@vnkozlov Thanks a lot for taking this up. Is the following in the PR description still true:
"Replaced SharedRuntime::f2hf() and hf2f() C runtime functions with calls to runtime stubs which use the same HW instructions as C2 intrinsics. Only for 64-bit x64 because 32-bit x86 stub does not work: result is passed through FPU register and NaN values become different from C2 intrinsic."
From the PR, it looks to me that for x86_64 you have the changes in place for SharedRuntime, and the same result is produced across SharedRuntime, the interpreter, C1, and C2.
For 32-bit x86, things are also consistent across the interpreter, C1, and C2. Only the SharedRuntime optimization doesn't happen on 32-bit x86, because StubRoutines::hf2f() and StubRoutines::f2hf() are set to null. The fallback is handled correctly in the interpreter, C1, and C2.
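
For readers following along, here is a minimal sketch of what "same result" means for the NaN case. It is not the HotSpot stub or the JDK implementation: `Float16NaNSketch` and `float16ToFloatSketch` are hypothetical names used only for illustration, and the sketch assumes the plain IEEE 754 binary16 layout (1 sign bit, 5 exponent bits, 10 fraction bits). It widens binary16 bits to a float with integer operations and keeps the NaN fraction bits, so the raw bit pattern of the result can be compared across execution modes.

```java
public class Float16NaNSketch {
    // Hypothetical helper, not the JDK or HotSpot code:
    // widen IEEE 754 binary16 bits to a float, keeping NaN fraction bits.
    static float float16ToFloatSketch(short h) {
        int bits = h & 0xFFFF;
        int sign = (bits & 0x8000) << 16;   // move sign to bit 31
        int exp  = (bits >>> 10) & 0x1F;    // 5-bit biased exponent
        int frac = bits & 0x3FF;            // 10-bit fraction

        if (exp == 0x1F) {
            // Infinity or NaN: the 10 fraction bits become the top bits
            // of the 23-bit float fraction.
            return Float.intBitsToFloat(sign | 0x7F800000 | (frac << 13));
        }
        if (exp == 0) {
            // Signed zero or subnormal: value is frac * 2^-24 (exact in float).
            float v = frac * 0x1.0p-24f;
            return sign != 0 ? -v : v;
        }
        // Normal number: rebias the exponent from 15 to 127.
        return Float.intBitsToFloat(sign | ((exp - 15 + 127) << 23) | (frac << 13));
    }

    public static void main(String[] args) {
        short nan16 = (short) 0x7E01;       // a quiet NaN with a non-zero payload
        float f = float16ToFloatSketch(nan16);
        // Compare raw bits, since NaN != NaN under ==.
        System.out.println(Integer.toHexString(Float.floatToRawIntBits(f)));
    }
}
```

A test in the spirit of `Binary16ConversionNaN.java` would compare `Float.floatToRawIntBits` of the results rather than the float values, because any two NaNs compare unequal with `==` even when their bit patterns match.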

-------------

PR: https://git.openjdk.org/jdk/pull/12869


More information about the hotspot-compiler-dev mailing list