RFR: 8307446: RISC-V: Improve performance of floating point to integer conversion
Xiaolin Zheng
xlinzheng at openjdk.org
Thu May 4 13:41:13 UTC 2023
On Thu, 4 May 2023 12:06:05 GMT, Feilong Jiang <fjiang at openjdk.org> wrote:
> Hi,
>
> can I have reviews for this change that improves the performance of floating point to integer conversion?
>
> Currently, risc-v port converts floating point to integer using `FCVT_SAFE` in macroAssembler_riscv.cpp.
>
> The main issue here is Java spec returns 0 when the floating point number is NaN [1].
> But for RISC-V ISA, instructions converting a floating-point value to an integer value (`FCVT.W.S`/`FCVT.L.S`/`FCVT.W.D`/`FCVT.L.D`) return the largest/smallest value when the floating point number is NaN [2].
> That requires additional logic to handle the case when the src of conversion is NaN, as the following code did:
>
>
> #define FCVT_SAFE(FLOATCVT, FLOATEQ) \
> void MacroAssembler:: FLOATCVT##_safe(Register dst, FloatRegister src, Register tmp) { \
> Label L_Okay; \
> fscsr(zr); \
> FLOATCVT(dst, src); \
> frcsr(tmp); \
> andi(tmp, tmp, 0x1E); \
> beqz(tmp, L_Okay); \
> FLOATEQ(tmp, src, src); \
> bnez(tmp, L_Okay); \
> mv(dst, zr); \
> bind(L_Okay); \
> }
>
> FCVT_SAFE(fcvt_w_s, feq_s)
> FCVT_SAFE(fcvt_l_s, feq_s)
> FCVT_SAFE(fcvt_w_d, feq_d)
> FCVT_SAFE(fcvt_l_d, feq_d)
>
>
> We can improve the logic of NaN checking with the `fclass` instruction just as [JDK-8297359](https://bugs.openjdk.org/browse/JDK-8297359) did.
>
> Here are the JMH results, we can got an obvious improvement for `f2i`/`f2l`/`d2i`/`d2l` conversions (source: [FloatConversion.java](https://gist.github.com/feilongjiang/b59bdd8db8460242bafac4a2ee6c2e06#file-floatconversion-java), tests on HiFive Unmatched board):
>
>
> Before:
>
> Benchmark (size) Mode Cnt Score Error Units
> FloatConversion.doubleToInt 2048 thrpt 15 29.311 ± 0.063 ops/ms
> FloatConversion.doubleToLong 2048 thrpt 15 29.914 ± 0.023 ops/ms
> FloatConversion.floatToInt 2048 thrpt 15 30.530 ± 0.011 ops/ms
> FloatConversion.floatToLong 2048 thrpt 15 29.657 ± 0.021 ops/ms
>
> Benchmark (size) Mode Cnt Score Error Units
> FloatConversion.doubleToInt 2048 thrpt 15 29.335 ± 0.014 ops/ms
> FloatConversion.doubleToLong 2048 thrpt 15 29.919 ± 0.022 ops/ms
> FloatConversion.floatToInt 2048 thrpt 15 30.523 ± 0.026 ops/ms
> FloatConversion.floatToLong 2048 thrpt 15 29.670 ± 0.011 ops/ms
>
> Benchmark (size) Mode Cnt Score Error Units
> FloatConversion.doubleToInt 2048 thrpt 15 29.344 ± 0.017 ops/ms
> FloatConversion.doubleToLong 2048 thrpt 15 29.908 ± 0.060 ops/ms
> FloatConversion.floatToInt 2048 thrpt 15 30.539 ± 0.009 ops/ms
> FloatConversion.floatToLong 2048 thrpt 15 29.676 ± 0.013 ops/ms
>
> ---------------------------------------------------------------------------
>
> After:
>
> Benchmark (size) Mode Cnt Score Error Units
> FloatConversion.doubleToInt 2048 thrpt 15 65.903 ± 0.385 ops/ms
> FloatConversion.doubleToLong 2048 thrpt 15 66.491 ± 0.057 ops/ms
> FloatConversion.floatToInt 2048 thrpt 15 68.045 ± 0.061 ops/ms
> FloatConversion.floatToLong 2048 thrpt 15 68.441 ± 0.077 ops/ms
>
> Benchmark (size) Mode Cnt Score Error Units
> FloatConversion.doubleToInt 2048 thrpt 15 66.015 ± 0.059 ops/ms
> FloatConversion.doubleToLong 2048 thrpt 15 66.511 ± 0.059 ops/ms
> FloatConversion.floatToInt 2048 thrpt 15 68.077 ± 0.051 ops/ms
> FloatConversion.floatToLong 2048 thrpt 15 68.467 ± 0.076 ops/ms
>
> Benchmark (size) Mode Cnt Score Error Units
> FloatConversion.doubleToInt 2048 thrpt 15 65.999 ± 0.067 ops/ms
> FloatConversion.doubleToLong 2048 thrpt 15 66.454 ± 0.090 ops/ms
> FloatConversion.floatToInt 2048 thrpt 15 68.048 ± 0.055 ops/ms
> FloatConversion.floatToLong 2048 thrpt 15 68.467 ± 0.054 ops/ms
>
>
> 1. https://docs.oracle.com/javase/specs/jls/se20/html/jls-5.html#jls-5.1.3
> 2. https://github.com/riscv/riscv-isa-manual/blob/63aeaada9b2fee7ca15e5c6b6a28f3b710fb7e58/src/f-st-ext.adoc?plain=1#L365-L386
>
> ## Testing:
> - [x] tier1~3 on Unmatched board (release build)
The results are good on T-Head board, several times with similar figures:
Before:
Benchmark (size) Mode Cnt Score Error Units
FloatConversion.doubleToInt 2048 thrpt 15 17.011 ± 0.077 ops/ms
FloatConversion.doubleToLong 2048 thrpt 15 17.121 ± 0.046 ops/ms
FloatConversion.floatToInt 2048 thrpt 15 17.091 ± 0.048 ops/ms
FloatConversion.floatToLong 2048 thrpt 15 16.948 ± 0.032 ops/ms
After:
Benchmark (size) Mode Cnt Score Error Units
FloatConversion.doubleToInt 2048 thrpt 15 269.501 ± 2.230 ops/ms
FloatConversion.doubleToLong 2048 thrpt 15 267.121 ± 2.237 ops/ms
FloatConversion.floatToInt 2048 thrpt 15 275.593 ± 5.272 ops/ms
FloatConversion.floatToLong 2048 thrpt 15 265.564 ± 6.730 ops/ms
-------------
PR Comment: https://git.openjdk.org/jdk/pull/13800#issuecomment-1534799145
More information about the hotspot-dev
mailing list