RFR: 8312569: RISC-V: Missing intrinsics for Math.ceil, floor, rint [v2]

Thu Aug 17 07:45:38 UTC 2023

On Fri, 4 Aug 2023 16:27:50 GMT, Ilya Gavrilin <duke at openjdk.org> wrote:

>> Please review this changes into risc-v double rounding intrinsic.
>> 
>> On risc-v intrinsics for rounding doubles with mode (like Math.ceil/floor/rint) were missing. On risc-v we don`t have special instruction for such conversion, so two times conversion was used: double -> long int -> double (using fcvt.l.d, fcvt.d.l).
>> 
>> Also, we should provide some rounding mode to fcvt.x.x instruction.
>> 
>> Rounding mode selection on ceil (similar for floor and rint): according to Math.ceil requirements: 
>> 
>>> Returns the smallest (closest to negative infinity) value that is greater than or equal to the argument and is equal to a mathematical integer (Math.java:475).
>> 
>> For double -> long int we choose rup (round towards +inf) mode to get the integer that more than or equal to the input value. 
>> For long int -> double we choose rdn (rounds towards -inf) mode to get the smallest (closest to -inf) representation of integer that we got after conversion.
>> 
>> For cases when we got inf, nan, or value more than 2^63 return input value (double value which more than 2^63 is guaranteed integer).
>> As well when we store result we copy sign from input value (need for cases when for (-1.0, 0.0) ceil need to return -0.0).
>> 
>> We have observed significant improvement on hifive and thead boards.
>> 
>> testing: tier1, tier2 and hotspot:tier3 on hifive
>> 
>> Performance results on hifive (FpRoundingBenchmark.testceil/floor/rint):
>> 
>> Without intrinsic:
>> 
>> Benchmark                      (TESTSIZE)   Mode  Cnt   Score   Error   Units
>> FpRoundingBenchmark.testceil         1024  thrpt   25  39.297 ± 0.037  ops/ms
>> FpRoundingBenchmark.testfloor        1024  thrpt   25  39.398 ± 0.018  ops/ms
>> FpRoundingBenchmark.testrint         1024  thrpt   25  36.388 ± 0.844  ops/ms
>> 
>> With intrinsic:
>> 
>> Benchmark                      (TESTSIZE)   Mode  Cnt   Score   Error   Units
>> FpRoundingBenchmark.testceil         1024  thrpt   25  80.560 ± 0.053  ops/ms
>> FpRoundingBenchmark.testfloor        1024  thrpt   25  80.541 ± 0.081  ops/ms
>> FpRoundingBenchmark.testrint         1024  thrpt   25  80.603 ± 0.071  ops/ms
>
> Ilya Gavrilin has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Change fsgnj_d(dst, src, src) to fmv_d(dst, src)

Changes requested by fyang (Reviewer).

src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 4250:

> 4248: 
> 4249: // Round double with mode
> 4250: 

We should add some comment here. Maybe:

// According to Java SE specification, for floating-point round operations, if
// the input is NaN, +/-infinity, or +/-0, the same input is returned as the
// rounded result; this differs from behavior of RISC-V fcvt instructions (which
// round out-of-range values to the nearest max or min value), therefore special
// handling is needed by NaN, +/-Infinity, +/-0.

src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 4258:

> 4256: 
> 4257:   // setting roundig mode to double->long (rm_direct) and long->double (rm_back) conversions
> 4258:   RoundingMode rm_direct, rm_back;

Can we use the same rounding mode for conversions in both direction? Say `rup` for `ceil`, and `rdn` for `floor`.
I see this policy is used for both glibc [1] and V8. 

[1] https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/riscv/rv64/rvd/s_ceil.c;h=6c355cd72691c45c97201fe8947683287982ade9;hb=41d8c3bc33bcae1ebb8077b0442caef4917f763a

src/hotspot/cpu/riscv/macroAssembler_riscv.hpp line 1236:

> 1234:     rmode_floor,
> 1235:     rmode_rint
> 1236:   };

Why not use the existing `RoundDoubleModeNode::rmode_ceil`, `RoundDoubleModeNode::rmode_floor` and `RoundDoubleModeNode::rmode_rint` instead?

src/hotspot/cpu/riscv/macroAssembler_riscv.hpp line 1238:

> 1236:   };
> 1237: 
> 1238:   void round_double_mode(FloatRegister dst, FloatRegister src, enum Round_double_mode round_mode, Register converted_dbl, Register mask, Register converted_dbl_masked);

I would prefer to rename the last three parameters as `tmp1`, `tmp2` and `tmp3`.  You could create aliases for these parameters where you feel necessary.

src/hotspot/cpu/riscv/riscv.ad line 7706:

> 7704:   match(Set dst (RoundDoubleMode src rmode));
> 7705:   ins_cost(2 * XFER_COST + BRANCH_COST);
> 7706:   effect(TEMP_DEF dst, TEMP tmp1, TEMP tmp2, TEMP tmp3, KILL cr);

Did we kill `cr` anywhere in the assembly code?

src/hotspot/cpu/riscv/riscv.ad line 7708:

> 7706:   effect(TEMP_DEF dst, TEMP tmp1, TEMP tmp2, TEMP tmp3, KILL cr);
> 7707: 
> 7708:   format %{ "RoundDoubleMode $src,$rmode" %}

Indentation: Please leave a space between the two operands.

src/hotspot/cpu/riscv/riscv.ad line 7727:

> 7725:     }
> 7726:   %}
> 7727:   ins_pipe(fp_rnd_d);

I think `pipe_class_default` will do here. No need for another new pipe class.

-------------

PR Review: https://git.openjdk.org/jdk/pull/14991#pullrequestreview-1581868109
PR Review Comment: https://git.openjdk.org/jdk/pull/14991#discussion_r1296756944
PR Review Comment: https://git.openjdk.org/jdk/pull/14991#discussion_r1296766101
PR Review Comment: https://git.openjdk.org/jdk/pull/14991#discussion_r1296734767
PR Review Comment: https://git.openjdk.org/jdk/pull/14991#discussion_r1296743831
PR Review Comment: https://git.openjdk.org/jdk/pull/14991#discussion_r1296750058
PR Review Comment: https://git.openjdk.org/jdk/pull/14991#discussion_r1296751637
PR Review Comment: https://git.openjdk.org/jdk/pull/14991#discussion_r1296749660