RFR: 8317971: RISC-V: implement copySignF/D and signumF/D intrinsics

Mon Oct 16 07:40:13 UTC 2023

On Fri, 13 Oct 2023 15:36:56 GMT, Ilya Gavrilin <igavrilin at openjdk.org> wrote:

> Hi all, please review this changes into risc-v floating point copysign and signum intrinsics.
> CopySign - returns first argument with the sign of second. On risc-v we have `fsgnj.x` instruction, which can implement this intrinsic.
> Signum - returns input value if it is +/- 0.0 or NaN, otherwise 1.0 with the sign of input value returned.  On risc-v we can use `fclass.x` to specify type of input value and return appropriate value.
> 
> Tests:
> Performance tests on t-head board:
> With intrinsics:
> 
> Benchmark                 (seed)   Mode  Cnt      Score      Error   Units
> MathBench.copySignDouble       0  thrpt    8  34156.580 ±   76.272  ops/ms
> MathBench.copySignFloat        0  thrpt    8  34181.731 ±   38.182  ops/ms
> MathBench.signumDouble         0  thrpt    8  31977.258 ± 1122.327  ops/ms
> MathBench.signumFloat          0  thrpt    8  31836.852 ±   56.013  ops/ms
> 
> Intrinsics turned off (`-XX:+UnlockDiagnosticVMOptions -XX:-UseCopySignIntrinsic -XX:-UseSignumIntrinsic`):
> 
> Benchmark                 (seed)   Mode  Cnt      Score      Error   Units
> MathBench.copySignDouble       0  thrpt    8  31000.996 ±  943.094  ops/ms
> MathBench.copySignFloat        0  thrpt    8  30678.016 ±   28.087  ops/ms
> MathBench.signumDouble         0  thrpt    8  25435.010 ± 2047.085  ops/ms
> MathBench.signumFloat          0  thrpt    8  25257.058 ±   79.175  ops/ms
> 
> Regression tests: tier1, hotspot:tier2 on risc-v board.
> 
> Also, changed name of one micro test: before we had: `sigNumDouble` and `signumFloat` tests, they does not matches to `signum` or `sigNum`. Now we have similar part: `signum`.
> Performance tests has been changed a bit, to check intrinsics result better, diff to modify tests:
> 
> diff --git a/test/micro/org/openjdk/bench/java/lang/MathBench.java b/test/micro/org/openjdk/bench/java/lang/MathBench.java
> index 6cd1353907e..0bee25366bf 100644
> --- a/test/micro/org/openjdk/bench/java/lang/MathBench.java
> +++ b/test/micro/org/openjdk/bench/java/lang/MathBench.java
> @@ -143,12 +143,12 @@ public double  ceilDouble() {
>  
>      @Benchmark
>      public double  copySignDouble() {
> -        return  Math.copySign(double81, doubleNegative12);
> +        return  Math.copySign(double81, doubleNegative12) + Math.copySign(double81, double2) + Math.copySign(double4Dot1, doubleNegative12);
>      }
>  
>      @Benchmark
>      public float  copySignFloat() {
> -        return  Math.copySign(floatNegative99, float1);
> +        return  Math.copySign(floatNegative99, float1) + Math.copySign(eFloat, float1) + Math.copySign...

Some comments after a brief look.

src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1659:

> 1657: // on input we have NaN or +/-0.0 value we should return it,
> 1658: // otherwise return +/- 1.0 using sign of input.
> 1659: // tmp1 - used to store result of fclass operation,

Seems to me that scratch register `t0` could be used in this function in order to save use of `tmp1`.

src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1661:

> 1659: // tmp1 - used to store result of fclass operation,
> 1660: // one - gives us a floating-point 1.0 (got from matching rule)
> 1661: // bool single_precision - specififes single or double precision operations will be used.

Suggestion: s/bool single_precision - specififes/bool is_double - specifies/

src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1663:

> 1661: // bool single_precision - specififes single or double precision operations will be used.
> 1662: void C2_MacroAssembler::signum_fp(FloatRegister dst, FloatRegister src, FloatRegister one, Register tmp1, bool is_double) {
> 1663:   assert_different_registers(dst, src, one);

This constraint could be relaxed if we use scratch register `t0` here instead of `tmp`.

src/hotspot/cpu/riscv/riscv.ad line 7511:

> 7509: // Copysign and signum intrinsics
> 7510: 
> 7511: instruct copySignD_reg(fRegD dst, fRegD src1, fRegD src2, immD0 zero) %{

A more simpler `immD zero` will do here which is similar with the x86 counterpart.

src/hotspot/cpu/riscv/riscv.ad line 7513:

> 7511: instruct copySignD_reg(fRegD dst, fRegD src1, fRegD src2, immD0 zero) %{
> 7512:   match(Set dst (CopySignD src1 (Binary src2 zero)));
> 7513:   effect(TEMP_DEF dst, USE src1, USE src2);

Unnecessary effect.

src/hotspot/cpu/riscv/riscv.ad line 7526:

> 7524: instruct copySignF_reg(fRegF dst, fRegF src1, fRegF src2) %{
> 7525:   match(Set dst (CopySignF src1 src2));
> 7526:   effect(TEMP_DEF dst, USE src1, USE src2);

Unnecessary effect.

src/hotspot/cpu/riscv/riscv.ad line 7537:

> 7535: %}
> 7536: 
> 7537: instruct signumD_reg(fRegD dst, fRegD src, fRegD zero, fRegD one, iRegINoSp tmp1) %{

Use `immD zero` instread, which will help avoid reserving one FP register here.

src/hotspot/cpu/riscv/riscv.ad line 7539:

> 7537: instruct signumD_reg(fRegD dst, fRegD src, fRegD zero, fRegD one, iRegINoSp tmp1) %{
> 7538:   match(Set dst (SignumD src (Binary zero one)));
> 7539:   effect(TEMP_DEF dst, USE src, USE one, TEMP tmp1);

Unnecessary effect if changed to use `t0` instead of `tmp1` in `signum_fp`.

src/hotspot/cpu/riscv/riscv.ad line 7548:

> 7546: %}
> 7547: 
> 7548: instruct signumF_reg(fRegF dst, fRegF src, fRegF zero, fRegF one, iRegINoSp tmp1) %{

Use `immF zero` instread, which will help avoid reserving one FP register here.

src/hotspot/cpu/riscv/riscv.ad line 7550:

> 7548: instruct signumF_reg(fRegF dst, fRegF src, fRegF zero, fRegF one, iRegINoSp tmp1) %{
> 7549:   match(Set dst (SignumF src (Binary zero one)));
> 7550:   effect(TEMP_DEF dst, USE src, USE one, TEMP tmp1);

Unnecessary effect if changed to use `t0` instead of `tmp1` in `signum_fp`.

-------------

PR Review: https://git.openjdk.org/jdk/pull/16186#pullrequestreview-1679272664
PR Review Comment: https://git.openjdk.org/jdk/pull/16186#discussion_r1360216914
PR Review Comment: https://git.openjdk.org/jdk/pull/16186#discussion_r1360215721
PR Review Comment: https://git.openjdk.org/jdk/pull/16186#discussion_r1360219620
PR Review Comment: https://git.openjdk.org/jdk/pull/16186#discussion_r1360221019
PR Review Comment: https://git.openjdk.org/jdk/pull/16186#discussion_r1360230241
PR Review Comment: https://git.openjdk.org/jdk/pull/16186#discussion_r1360230310
PR Review Comment: https://git.openjdk.org/jdk/pull/16186#discussion_r1360222911
PR Review Comment: https://git.openjdk.org/jdk/pull/16186#discussion_r1360231462
PR Review Comment: https://git.openjdk.org/jdk/pull/16186#discussion_r1360223205
PR Review Comment: https://git.openjdk.org/jdk/pull/16186#discussion_r1360231642