RFR: 8318227: RISC-V: C2 ConvHF2F [v2]
Fei Yang
fyang at openjdk.org
Tue Nov 28 13:27:11 UTC 2023
On Mon, 27 Nov 2023 13:29:33 GMT, Hamlin Li <mli at openjdk.org> wrote:
>> Hi,
>> Can you review the patch to add ConvHF2F intrinsic to JDK for riscv?
>> Thanks!
>>
>> (By latest kernel patch, `#define RISCV_HWPROBE_EXT_ZFH (1 << 27)`
>> https://lore.kernel.org/lkml/20231114141256.126749-11-cleger@rivosinc.com/)
>>
>> ## Test
>> ### Functionality
>> #### hotspot tests
>> test/hotspot/jtreg/compiler/intrinsics/
>> test/hotspot/jtreg/compiler/c2/irTests
>>
>> #### jdk tests
>> test/jdk/java/lang/Float/Binary16Conversion*.java
>>
>> ### Performance
>> tested on licheepi.
>>
>> #### with UseZfh enabled & stub out-of-band
>>
>> Benchmark (size) Mode Cnt Score Error Units
>> Fp16ConversionBenchmark.float16ToFloat 2048 avgt 10 3493.376 ± 18.631 ns/op
>> Fp16ConversionBenchmark.float16ToFloatMemory 2048 avgt 10 19.819 ± 0.193 ns/op
>>
>>
>> #### with UseZfh enabled only
>> (i.e. enable the intrinsic)
>>
>> Benchmark (size) Mode Cnt Score Error Units
>> Fp16ConversionBenchmark.float16ToFloat 2048 avgt 10 4659.796 ± 13.262 ns/op
>> Fp16ConversionBenchmark.float16ToFloatMemory 2048 avgt 10 22.957 ± 0.098 ns/op
>>
>>
>> #### with UseZfh disabled
>> (i.e. disable the intrinsic)
>>
>> Benchmark (size) Mode Cnt Score Error Units
>> Fp16ConversionBenchmark.float16ToFloat 2048 avgt 10 22930.591 ± 72.595 ns/op
>> Fp16ConversionBenchmark.float16ToFloatMemory 2048 avgt 10 25.970 ± 0.063 ns/op
>
> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:
>
> optimize perf with stub out-of-line
src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1704:
> 1702: // check whether it's a NaN.
> 1703: mv(t0, 0x7c00);
> 1704: andr(tmp, src, t0);
From the float16 exponent encoding described in [1], the value could also be positive or negative infinity when the exponent is 0b11111; which one it is depends on whether the significand is zero. So is this check for NaN sufficient?
[1] https://en.wikipedia.org/wiki/Half-precision_floating-point_format
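To illustrate the concern, here is a minimal sketch in plain C++ (not HotSpot code; the helper names and constants are hypothetical and only follow the binary16 layout from [1]). Masking with 0x7c00 alone only tells us that the exponent field is all ones, which covers both NaN and the two infinities; the significand bits have to be inspected to tell them apart.

```cpp
#include <cassert>
#include <cstdint>

// IEEE 754 binary16 layout: 1 sign bit, 5 exponent bits, 10 significand bits.
// These names are illustrative assumptions, not identifiers from the patch.
constexpr uint16_t kExpMask  = 0x7c00;  // exponent field, same constant as in the patch
constexpr uint16_t kFracMask = 0x03ff;  // significand (fraction) field

// True when the exponent field is all ones: the value is NaN or +/-infinity.
constexpr bool is_float16_nan_or_inf(uint16_t bits) {
  return (bits & kExpMask) == kExpMask;
}

// NaN: exponent all ones and a non-zero significand.
constexpr bool is_float16_nan(uint16_t bits) {
  return is_float16_nan_or_inf(bits) && (bits & kFracMask) != 0;
}

// Infinity: exponent all ones and a zero significand (sign bit is ignored).
constexpr bool is_float16_inf(uint16_t bits) {
  return is_float16_nan_or_inf(bits) && (bits & kFracMask) == 0;
}

// Small sanity check over a few well-known encodings.
inline void check_float16_classification() {
  assert(is_float16_nan(0x7e00));          // a quiet NaN
  assert(is_float16_inf(0x7c00));          // +infinity
  assert(is_float16_inf(0xfc00));          // -infinity
  assert(!is_float16_nan_or_inf(0x3c00));  // 1.0
}
```

If the exponent-only test is used merely to route both NaN and infinity to a slow path that handles each correctly, it may well be enough; the sketch only shows that the mask by itself does not single out NaN.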
src/hotspot/cpu/riscv/riscv.ad line 8288:
> 8286: __ float16_to_float($dst$$FloatRegister, $src$$Register, $tmp$$Register);
> 8287: %}
> 8288: ins_pipe(fp_f2i);
Seems we should use `ins_pipe(pipe_slow)` here as this emits multiple instructions.
src/hotspot/os_cpu/linux_riscv/riscv_hwprobe.cpp line 52:
> 50: #define RISCV_HWPROBE_EXT_ZBB (1 << 4)
> 51: #define RISCV_HWPROBE_EXT_ZBS (1 << 5)
> 52: #define RISCV_HWPROBE_EXT_ZFH (1 << 27)
Will this value change in the future? It does not seem to be in the kernel source yet [1].
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/riscv/include/uapi/asm/hwprobe.h?h=v6.7-rc3
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/16802#discussion_r1407751915
PR Review Comment: https://git.openjdk.org/jdk/pull/16802#discussion_r1407588997
PR Review Comment: https://git.openjdk.org/jdk/pull/16802#discussion_r1407610639