RFR: 8318158: RISC-V: implement roundD/roundF intrinsics [v7]

Mon Dec 18 06:30:45 UTC 2023

On Fri, 8 Dec 2023 22:46:45 GMT, Olga Mikhaltsova <omikhaltcova at openjdk.org> wrote:

>> Please, review this Implementation of the roundD/roundF intrinsics for RISC-V platform.
>> 
>> In the table below it is shown that NaN argument should be processed as a special case.
>> 
>>                                                   RISC-V                            Java
>>                                         (FCVT.W.S)    (FCVT.L.D)  (long round(double a)) (int round(float a))
>> Minimum valid input (after rounding)     −2^31         −2^63         Long.MIN_VALUE       Integer.MIN_VALUE
>> Maximum valid input (after rounding)      2^31 − 1      2^63 − 1     Long.MAX_VALUE       Integer.MAX_VALUE
>> Output for out-of-range negative input   −2^31         −2^63         Long.MIN_VALUE       Integer.MIN_VALUE
>> Output for −∞                            −2^31         −2^63         Long.MIN_VALUE       Integer.MIN_VALUE
>> Output for out-of-range positive input    2^31 − 1      2^63 - 1     Long.MAX_VALUE       Integer.MAX_VALUE
>> Output for +∞                             2^31 − 1      2^63 - 1     Long.MAX_VALUE       Integer.MAX_VALUE
>> Output for NaN                            2^31 − 1      2^63 - 1           0                      0
>> 
>> The benchmark running with the 2nd fixed implementation on the T-Head RVB-ICE board shows the following performance improvement::
>> 
>> **Before**
>> 
>> Benchmark                              (TESTSIZE)   Mode  Cnt    Score   Error   Units
>> FpRoundingBenchmark.test_round_double        2048  thrpt   15   59.555  0.179  ops/ms
>> FpRoundingBenchmark.test_round_float         2048  thrpt   15   49.760  0.103  ops/ms
>> 
>> 
>> **After**
>> 
>> Benchmark                              (TESTSIZE)   Mode  Cnt    Score   Error   Units
>> FpRoundingBenchmark.test_round_double        2048  thrpt   15  110.956  0.186  ops/ms
>> FpRoundingBenchmark.test_round_float         2048  thrpt   15  115.947  0.122  ops/ms
>
> Olga Mikhaltsova has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Optimization against regression on SiFive

So I tried your test on both Unmatched and Licheepi-4A boards. And I see the C2 JIT code is exercised and the test is passing over the full 32-bit range. Several minor comments though.

src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 4264:

> 4262: void MacroAssembler::java_round_float(Register dst, FloatRegister src, FloatRegister ftmp) {
> 4263:   Label done;
> 4264:   mv(dst, zr);

I see slightly improvement on both platforms when moving `mv` into between `feq_s`/`feq_d` and `beqz`.

src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 4265:

> 4263:   Label done;
> 4264:   mv(dst, zr);
> 4265:   li(t0, 0x3f000000);

Suggestion: `mv(t0, jint_cast(0.5f));`

src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 4282:

> 4280:   Label done;
> 4281:   mv(dst, zr);
> 4282:   li(t0, 0x3fe0000000000000);

Suggestion: `mv(t0, julong_cast(0.5));`

src/hotspot/cpu/riscv/riscv.ad line 8367:

> 8365:   %}
> 8366: 
> 8367:   ins_pipe(pipe_class_default);

Suggestion: `ins_pipe(pipe_slow);`

src/hotspot/cpu/riscv/riscv.ad line 8381:

> 8379:   %}
> 8380: 
> 8381:   ins_pipe(pipe_class_default);

Suggestion: `ins_pipe(pipe_slow);`

-------------

PR Review: https://git.openjdk.org/jdk/pull/16382#pullrequestreview-1785876685
PR Review Comment: https://git.openjdk.org/jdk/pull/16382#discussion_r1429517638
PR Review Comment: https://git.openjdk.org/jdk/pull/16382#discussion_r1429518186
PR Review Comment: https://git.openjdk.org/jdk/pull/16382#discussion_r1429518536
PR Review Comment: https://git.openjdk.org/jdk/pull/16382#discussion_r1429546156
PR Review Comment: https://git.openjdk.org/jdk/pull/16382#discussion_r1429546238