RFR: 8307446: RISC-V: Improve performance of floating point to integer conversion [v2]

Fri May 5 09:31:25 UTC 2023

On Fri, 5 May 2023 03:01:09 GMT, Xiaolin Zheng <xlinzheng at openjdk.org> wrote:

>> src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 4075:
>> 
>>> 4073:   bind(do_convert);                                                                       \
>>> 4074:   FLOATCVT(dst, src);                                                                     \
>>> 4075:   bind(done);                                                                             \
>> 
>> what about reducing the branching?
>> 
>> e.g.
>> 
>> mv (dst, zr); //pretty cheap anyway
>> fclass(..);
>> andi(tmp, tmp, 0b1100000000);
>> bnez(tmp, done);
>>   FLOATCVT(dst, src);                                                                    
>> bind(done);
>
> After applying this, the results look better:
> 
> 
> Benchmark                     (size)   Mode  Cnt    Score   Error   Units
> FloatConversion.doubleToInt     2048  thrpt   15  286.038 ± 1.472  ops/ms
> FloatConversion.doubleToLong    2048  thrpt   15  289.585 ± 1.501  ops/ms
> FloatConversion.floatToInt      2048  thrpt   15  294.313 ± 1.263  ops/ms
> FloatConversion.floatToLong     2048  thrpt   15  273.749 ± 2.261  ops/ms
> 
> 
> Stable.

I tweaked this version a bit (moved `mv(dst, zr)` to after `fclass`). Results from two runs on the Unmatched board are still good and stable:

Benchmark                     (size)   Mode  Cnt   Score   Error   Units
FloatConversion.doubleToInt     2048  thrpt   15  66.022 ± 0.308  ops/ms
FloatConversion.doubleToLong    2048  thrpt   15  66.549 ± 0.052  ops/ms
FloatConversion.floatToInt      2048  thrpt   15  68.108 ± 0.042  ops/ms
FloatConversion.floatToLong     2048  thrpt   15  68.483 ± 0.099  ops/ms

Benchmark                     (size)   Mode  Cnt   Score   Error   Units
FloatConversion.doubleToInt     2048  thrpt   15  66.106 ± 0.065  ops/ms
FloatConversion.doubleToLong    2048  thrpt   15  66.590 ± 0.060  ops/ms
FloatConversion.floatToInt      2048  thrpt   15  68.121 ± 0.032  ops/ms
FloatConversion.floatToLong     2048  thrpt   15  68.505 ± 0.082  ops/ms

Here is the change:

#define FCVT_SAFE(FLOATCVT, FLOATSIG)                                                     \
void MacroAssembler::FLOATCVT##_safe(Register dst, FloatRegister src, Register tmp) {     \
  Label done;                                                                             \
  assert_different_registers(dst, tmp);                                                   \
  fclass_##FLOATSIG(tmp, src);                                                            \
  mv(dst, zr);                                                                            \
  /* fclass bits 8 and 9 flag signaling and quiet NaN; leave dst as 0 in that case */     \
  andi(tmp, tmp, 0b1100000000);                                                           \
  bnez(tmp, done);                                                                        \
  FLOATCVT(dst, src);                                                                     \
  bind(done);                                                                             \
}
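
For readers who want to check the control flow off-target, here is a rough host-side model of what the macro emits for the double-to-int case. It is only a sketch: `model_fclass_d`, `model_fcvt_w_d` and `model_fcvt_w_d_safe` are hypothetical names that are not part of HotSpot, and the fclass model fills in only the two NaN bits, since the mask discards everything else.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// fclass result bits: bit 8 = signaling NaN, bit 9 = quiet NaN.
constexpr uint32_t kNaNMask = 0b1100000000;

// Hypothetical model of fclass.d, restricted to the two NaN bits.
// Real hardware also sets one of bits 0..7 for non-NaN inputs; those
// bits are masked out by the caller, so they are not modeled here.
uint32_t model_fclass_d(double x) {
  if (!std::isnan(x)) return 0;
  uint64_t bits;
  std::memcpy(&bits, &x, sizeof bits);
  bool quiet = (bits >> 51) & 1;  // top mantissa bit set means quiet NaN
  return quiet ? (1u << 9) : (1u << 8);
}

// Hypothetical model of fcvt.w.d with round-towards-zero: out-of-range
// inputs saturate, and NaN produces the maximum int, which is why the
// raw instruction alone does not match Java's (int) cast.
int32_t model_fcvt_w_d(double x) {
  if (std::isnan(x)) return std::numeric_limits<int32_t>::max();
  if (x >= 2147483648.0) return std::numeric_limits<int32_t>::max();
  if (x <= -2147483649.0) return std::numeric_limits<int32_t>::min();
  return static_cast<int32_t>(x);  // truncates towards zero, in range here
}

// Mirrors the macro body: classify, zero dst, skip the convert on NaN.
int32_t model_fcvt_w_d_safe(double src) {
  uint32_t tmp = model_fclass_d(src);  // fclass_d(tmp, src)
  int32_t dst = 0;                     // mv(dst, zr)
  tmp &= kNaNMask;                     // andi(tmp, tmp, 0b1100000000)
  if (tmp == 0) {                      // bnez(tmp, done)
    dst = model_fcvt_w_d(src);         // FLOATCVT(dst, src)
  }
  return dst;                          // bind(done)
}
```

The only input for which the branch is taken is NaN, so the common path is fclass, mv, andi, a not-taken branch, and the convert itself.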

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13800#discussion_r1185876272