RFR: 8299525: RISC-V: Add backend support for half float conversion intrinsics

Feilong Jiang fjiang at openjdk.org
Wed Jan 4 13:04:53 UTC 2023


On Tue, 3 Jan 2023 12:08:47 GMT, Yadong Wang <yadongwang at openjdk.org> wrote:

> This patch adds RISC-V backend support for the library intrinsics that convert between half-precision and single-precision floats, using the RISC-V Zfh extension, which was already ratified by November 2021 (https://wiki.riscv.org/display/HOME/Recently+Ratified+Extensions).
> 
> The C2 output for PrintOptoAssembly:
> 0dc      B10: #  out( B33 B11 ) <- in( B9 )  Freq: 1.99802
> 0dc +   flw  F1, [R29, #16]     # float, #@loadF
> 0e0 +   fcvt.h.s F0, F1 #@convF2HF_reg_reg
>              fmv.x.h R8, F0  #@convF2HF_reg_reg
> 
> 0dc      B10: #  out( B33 B11 ) <- in( B9 )  Freq: 1.99801
> 0dc +   lh  R12, [R11, #16]     # short, #@loadS
> 0e0 +   fmv.h.x F0, R12 #@convHF2F_reg_reg
>              fcvt.s.h F1, F0 #@convHF2F_reg_reg
> 
> We don't have any hardware that supports Zfh yet, so we ran the following benchmarks in QEMU; treat the numbers as an unreliable reference only:
> 
> VM options: -XX:+UnlockExperimentalVMOptions -XX:-UseZfh
> Benchmark                                           (size)   Mode  Samples      Score  Score error   Units
> o.s.Fp16ConversionBenchmark.float16ToFloat            2048  thrpt       15     44.523        0.116  ops/ms
> o.s.Fp16ConversionBenchmark.float16ToFloatMemory      2048  thrpt       15   8379.835       27.309  ops/ms
> o.s.Fp16ConversionBenchmark.floatToFloat16            2048  thrpt       15      7.370        0.028  ops/ms
> o.s.Fp16ConversionBenchmark.floatToFloat16Memory      2048  thrpt       15  11292.278       11.962  ops/ms
> 
> VM options: -XX:+UnlockExperimentalVMOptions -XX:+UseZfh
> Benchmark                                           (size)   Mode  Samples      Score  Score error   Units
> o.s.Fp16ConversionBenchmark.float16ToFloat            2048  thrpt       15     12.357        0.153  ops/ms
> o.s.Fp16ConversionBenchmark.float16ToFloatMemory      2048  thrpt       15  10213.944       69.222  ops/ms
> o.s.Fp16ConversionBenchmark.floatToFloat16            2048  thrpt       15     11.728        0.067  ops/ms
> o.s.Fp16ConversionBenchmark.floatToFloat16Memory      2048  thrpt       15  15008.550       13.917  ops/ms

src/hotspot/cpu/riscv/globals_riscv.hpp line 106:

> 104:   product(bool, UseZbb, false, EXPERIMENTAL, "Use Zbb instructions")             \
> 105:   product(bool, UseZbs, false, EXPERIMENTAL, "Use Zbs instructions")             \
> 106:   product(bool, UseZfh, false, EXPERIMENTAL, "Use Zfh instructions")             \

`fcvt.s.h`/`fcvt.h.s`/`fmv.h.x`/`fmv.x.h` also belong to Zfhmin [1], so we can use `UseZfhmin` instead of `UseZfh` here.

1. https://github.com/riscv/riscv-isa-manual/blob/cb3b9d1dcdacefbde6602ada7a0050f5c723ddee/src/zfh.tex#L377-L391
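For reference, here is a rough plain-Java sketch of the binary16 to binary32 widening that the `fmv.h.x` + `fcvt.s.h` pair performs, i.e. the conversion-only functionality that Zfhmin already covers. The class and method names (`HalfToFloatSketch`, `halfBitsToFloat`) are made up for illustration and are not part of the patch or the JDK. The NaN handling (payload shifted into the wider fraction) is just one possible choice and is not claimed to match the hardware or the intrinsic.

```java
// Illustrative sketch only: a software model of half -> float widening.
// Names are hypothetical; this is not code from the patch.
public final class HalfToFloatSketch {
    private HalfToFloatSketch() {}

    /** Converts IEEE 754 binary16 bits (held in a short) to a float. */
    public static float halfBitsToFloat(short h) {
        int bits = h & 0xFFFF;                  // treat the short as unsigned
        int sign = (bits & 0x8000) << 16;       // sign bit moves to bit 31
        int exp  = (bits >>> 10) & 0x1F;        // 5-bit biased exponent
        int frac = bits & 0x03FF;               // 10-bit fraction

        if (exp == 0x1F) {
            // Infinity or NaN: all-ones exponent, fraction widened by 13 bits.
            // (Assumption: payload is kept; hardware may canonicalize NaNs.)
            return Float.intBitsToFloat(sign | 0x7F800000 | (frac << 13));
        }
        if (exp == 0) {
            // Zero or subnormal: value is frac * 2^-24, which is exact in float.
            float magnitude = frac * 0x1.0p-24f;
            return sign != 0 ? -magnitude : magnitude;
        }
        // Normal number: rebias the exponent from 15 to 127 (difference 112).
        return Float.intBitsToFloat(sign | ((exp + 112) << 23) | (frac << 13));
    }

    public static void main(String[] args) {
        System.out.println(halfBitsToFloat((short) 0x3C00)); // bit pattern of 1.0 in binary16
        System.out.println(halfBitsToFloat((short) 0x7BFF)); // largest finite binary16 value
    }
}
```

Since widening is exact, no rounding mode is involved in this direction; the narrowing direction (`fcvt.h.s` + `fmv.x.h`) additionally has to round, which is not modelled here.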

src/hotspot/cpu/riscv/riscv.ad line 8159:

> 8157:   %}
> 8158: 
> 8159:   ins_pipe(fp_f2d);

Given the types of `src` and `dst`, we can use `fp_i2f` as the `ins_pipe` here.

src/hotspot/cpu/riscv/riscv.ad line 8177:

> 8175:   %}
> 8176: 
> 8177:   ins_pipe(fp_f2d);

Same here: `fp_f2i` is a better fit.

-------------

PR: https://git.openjdk.org/jdk/pull/11828

