RFR: 8307446: RISC-V: Improve performance of floating point to integer conversion

Thu May 4 14:20:24 UTC 2023

On Thu, 4 May 2023 12:06:05 GMT, Feilong Jiang <fjiang at openjdk.org> wrote:

> Hi, 
> 
> can I have reviews for this change that improves the performance of floating point to integer conversion?
> 
> Currently, risc-v port converts floating point to integer using `FCVT_SAFE` in macroAssembler_riscv.cpp.
> 
> The main issue here is Java spec returns 0 when the floating point number is NaN [1].
> But for RISC-V ISA, instructions converting a floating-point value to an integer value (`FCVT.W.S`/`FCVT.L.S`/`FCVT.W.D`/`FCVT.L.D`) return the largest/smallest value when the floating point number is NaN [2].
> That requires additional logic to handle the case when the src of conversion is NaN, as the following code did:
> 
> 
> #define FCVT_SAFE(FLOATCVT, FLOATEQ)                                                             \
> void MacroAssembler:: FLOATCVT##_safe(Register dst, FloatRegister src, Register tmp) {           \
>   Label L_Okay;                                                                                  \
>   fscsr(zr);                                                                                     \
>   FLOATCVT(dst, src);                                                                            \
>   frcsr(tmp);                                                                                    \
>   andi(tmp, tmp, 0x1E);                                                                          \
>   beqz(tmp, L_Okay);                                                                             \
>   FLOATEQ(tmp, src, src);                                                                        \
>   bnez(tmp, L_Okay);                                                                             \
>   mv(dst, zr);                                                                                   \
>   bind(L_Okay);                                                                                  \
> }
> 
> FCVT_SAFE(fcvt_w_s, feq_s)
> FCVT_SAFE(fcvt_l_s, feq_s)
> FCVT_SAFE(fcvt_w_d, feq_d)
> FCVT_SAFE(fcvt_l_d, feq_d)
> 
> 
> We can improve the logic of NaN checking with the `fclass` instruction just as [JDK-8297359](https://bugs.openjdk.org/browse/JDK-8297359) did.
> 
> Here are the JMH results, we can got an obvious improvement for `f2i`/`f2l`/`d2i`/`d2l` conversions (source: [FloatConversion.java](https://gist.github.com/feilongjiang/b59bdd8db8460242bafac4a2ee6c2e06#file-floatconversion-java), tests on HiFive Unmatched board):
> 
> 
> Before:
> 
> Benchmark                     (size)   Mode  Cnt   Score   Error   Units
> FloatConversion.doubleToInt     2048  thrpt   15  29.311 ± 0.063  ops/ms
> FloatConversion.doubleToLong    2048  thrpt   15  29.914 ± 0.023  ops/ms
> FloatConversion.floatToInt      2048  thrpt   15  30.530 ± 0.011  ops/ms
> FloatConversion.floatToLong     2048  thrpt   15  29.657 ± 0.021  ops/ms
> 
> Benchmark                     (size)   Mode  Cnt   Score   Error   Units
> FloatConversion.doubleToInt     2048  thrpt   15  29.335 ± 0.014  ops/ms
> FloatConversion.doubleToLong    2048  thrpt   15  29.919 ± 0.022  ops/ms
> FloatConversion.floatToInt      2048  thrpt   15  30.523 ± 0.026  ops/ms
> FloatConversion.floatToLong     2048  thrpt   15  29.670 ± 0.011  ops/ms
> 
> Benchmark                     (size)   Mode  Cnt   Score   Error   Units
> FloatConversion.doubleToInt     2048  thrpt   15  29.344 ± 0.017  ops/ms
> FloatConversion.doubleToLong    2048  thrpt   15  29.908 ± 0.060  ops/ms
> FloatConversion.floatToInt      2048  thrpt   15  30.539 ± 0.009  ops/ms
> FloatConversion.floatToLong     2048  thrpt   15  29.676 ± 0.013  ops/ms
> 
> ---------------------------------------------------------------------------
> 
> After:
> 
> Benchmark                     (size)   Mode  Cnt   Score   Error   Units
> FloatConversion.doubleToInt     2048  thrpt   15  65.903 ± 0.385  ops/ms
> FloatConversion.doubleToLong    2048  thrpt   15  66.491 ± 0.057  ops/ms
> FloatConversion.floatToInt      2048  thrpt   15  68.045 ± 0.061  ops/ms
> FloatConversion.floatToLong     2048  thrpt   15  68.441 ± 0.077  ops/ms
> 
> Benchmark                     (size)   Mode  Cnt   Score   Error   Units
> FloatConversion.doubleToInt     2048  thrpt   15  66.015 ± 0.059  ops/ms
> FloatConversion.doubleToLong    2048  thrpt   15  66.511 ± 0.059  ops/ms
> FloatConversion.floatToInt      2048  thrpt   15  68.077 ± 0.051  ops/ms
> FloatConversion.floatToLong     2048  thrpt   15  68.467 ± 0.076  ops/ms
> 
> Benchmark                     (size)   Mode  Cnt   Score   Error   Units
> FloatConversion.doubleToInt     2048  thrpt   15  65.999 ± 0.067  ops/ms
> FloatConversion.doubleToLong    2048  thrpt   15  66.454 ± 0.090  ops/ms
> FloatConversion.floatToInt      2048  thrpt   15  68.048 ± 0.055  ops/ms
> FloatConversion.floatToLong     2048  thrpt   15  68.467 ± 0.054  ops/ms
> 
> 
> 1. https://docs.oracle.com/javase/specs/jls/se20/html/jls-5.html#jls-5.1.3
> 2. https://github.com/riscv/riscv-isa-manual/blob/63aeaada9b2fee7ca15e5c6b6a28f3b710fb7e58/src/f-st-ext.adoc?plain=1#L365-L386
> 
> ## Testing:
> - [x] tier1~3 on Unmatched board (release build)

Changes requested by vkempik (Committer).

src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 4075:

> 4073:   bind(do_convert);                                                                       \
> 4074:   FLOATCVT(dst, src);                                                                     \
> 4075:   bind(done);                                                                             \

what about reducing the branching?

e.g.

mv (dst, zr); //pretty cheap anyway
fclass(..);
andi(tmp, tmp, 0b1100000000);
bnez(tmp, done);
  FLOATCVT(dst, src);                                                                    
bind(done);

-------------

PR Review: https://git.openjdk.org/jdk/pull/13800#pullrequestreview-1413204751
PR Review Comment: https://git.openjdk.org/jdk/pull/13800#discussion_r1185081049