RFR: 8312569: RISC-V: Missing intrinsics for Math.ceil, floor, rint [v10]

Fri Sep 1 07:41:48 UTC 2023

On Tue, 29 Aug 2023 08:28:42 GMT, Ilya Gavrilin <duke at openjdk.org> wrote:

>> Please review this changes into risc-v double rounding intrinsic.
>> 
>> On risc-v intrinsics for rounding doubles with mode (like Math.ceil/floor/rint) were missing. On risc-v we don`t have special instruction for such conversion, so two times conversion was used: double -> long int -> double (using fcvt.l.d, fcvt.d.l).
>> 
>> Also, we should provide some rounding mode to fcvt.x.x instruction.
>> 
>> Rounding mode selection on ceil (similar for floor and rint): according to Math.ceil requirements: 
>> 
>>> Returns the smallest (closest to negative infinity) value that is greater than or equal to the argument and is equal to a mathematical integer (Math.java:475).
>> 
>> For double -> long int we choose rup (round towards +inf) mode to get the integer that more than or equal to the input value. 
>> For long int -> double we choose rdn (rounds towards -inf) mode to get the smallest (closest to -inf) representation of integer that we got after conversion.
>> 
>> For cases when we got inf, nan, or value more than 2^63 return input value (double value which more than 2^63 is guaranteed integer).
>> As well when we store result we copy sign from input value (need for cases when for (-1.0, 0.0) ceil need to return -0.0).
>> 
>> We have observed significant improvement on hifive and thead boards.
>> 
>> testing: tier1, tier2 and hotspot:tier3 on hifive
>> 
>> Performance results on hifive (FpRoundingBenchmark.testceil/floor/rint):
>> 
>> Without intrinsic:
>> 
>> Benchmark                      (TESTSIZE)   Mode  Cnt   Score   Error   Units
>> FpRoundingBenchmark.testceil         1024  thrpt   25  39.297 ± 0.037  ops/ms
>> FpRoundingBenchmark.testfloor        1024  thrpt   25  39.398 ± 0.018  ops/ms
>> FpRoundingBenchmark.testrint         1024  thrpt   25  36.388 ± 0.844  ops/ms
>> 
>> With intrinsic:
>> 
>> Benchmark                      (TESTSIZE)   Mode  Cnt   Score   Error   Units
>> FpRoundingBenchmark.testceil         1024  thrpt   25  80.560 ± 0.053  ops/ms
>> FpRoundingBenchmark.testfloor        1024  thrpt   25  80.541 ± 0.081  ops/ms
>> FpRoundingBenchmark.testrint         1024  thrpt   25  80.603 ± 0.071  ops/ms
>
> Ilya Gavrilin has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Fix typo in c2_MacroAssembler_riscv.cpp

Thanks for the update. Several nits remain. Otherwise LGTM.

src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1380:

> 1378: // round out-of-range values to the nearest max or min value), therefore special
> 1379: // handling is needed by NaN, +/-Infinity, +/-0.
> 1380: void C2_MacroAssembler::round_double_mode(FloatRegister dst, FloatRegister src, int round_mode, Register tmp1, Register tmp2, Register tmp3) {

Start a new line for the three temporary register parameters.

src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1383:

> 1381: 
> 1382:   assert_different_registers(dst, src);
> 1383:   assert_different_registers(tmp1, tmp2, tmp3);

Suggestion: `assert_different_registers(dst, src, tmp1, tmp2, tmp3);`

src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1385:

> 1383:   assert_different_registers(tmp1, tmp2, tmp3);
> 1384: 
> 1385:   // setting rounding mode for conversions

Suggestion: s/setting/Set/

src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1388:

> 1386:   // here we use similar modes to double->long and long->double conversions
> 1387:   // different mode for long->double conversion matter only if long value was not representable as double
> 1388:   // we got long value as a result of double->long conversion so it is defenitely representable

Typo: s/defenitely/definitely/

src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1409:

> 1407:   Label done, bad_val;
> 1408: 
> 1409:   // generating constant (tmp2)

Suggestion: s/generating/Generate/

src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1412:

> 1410:   // tmp2 = 100...0000
> 1411:   addi(tmp2, zr, 1);
> 1412:   slli(tmp2, tmp2, 63);

Better to move these two lines with code comments after `fcvt_l_d(tmp1, src, rm);`

src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1416:

> 1414:   fcvt_l_d(tmp1, src, rm);
> 1415: 
> 1416:   // preparing converted long (tmp1)

Suggestion: s/preparing/Prepare/

src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1422:

> 1420:   addi(tmp3, tmp1, 1);
> 1421:   andi(tmp3, tmp3, -2);
> 1422:   beq(tmp3, tmp2, bad_val);

Please start a new line here after `beq`;

src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1427:

> 1425:   // add sign of input value to result for +/- 0 cases
> 1426:   fsgnj_d(dst, dst, src);
> 1427:   j(done);

Please start a new line here after `j(done);`

-------------

Changes requested by fyang (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/14991#pullrequestreview-1606389159
PR Review Comment: https://git.openjdk.org/jdk/pull/14991#discussion_r1312661362
PR Review Comment: https://git.openjdk.org/jdk/pull/14991#discussion_r1312659940
PR Review Comment: https://git.openjdk.org/jdk/pull/14991#discussion_r1312662917
PR Review Comment: https://git.openjdk.org/jdk/pull/14991#discussion_r1312662397
PR Review Comment: https://git.openjdk.org/jdk/pull/14991#discussion_r1312666632
PR Review Comment: https://git.openjdk.org/jdk/pull/14991#discussion_r1312665535
PR Review Comment: https://git.openjdk.org/jdk/pull/14991#discussion_r1312671271
PR Review Comment: https://git.openjdk.org/jdk/pull/14991#discussion_r1312669272
PR Review Comment: https://git.openjdk.org/jdk/pull/14991#discussion_r1312663760