RFR: 8317971: RISC-V: implement copySignF/D and signumF/D intrinsics
Fei Yang
fyang at openjdk.org
Mon Oct 16 07:40:13 UTC 2023
On Fri, 13 Oct 2023 15:36:56 GMT, Ilya Gavrilin <igavrilin at openjdk.org> wrote:
> Hi all, please review this changes into risc-v floating point copysign and signum intrinsics.
> CopySign - returns first argument with the sign of second. On risc-v we have `fsgnj.x` instruction, which can implement this intrinsic.
> Signum - returns input value if it is +/- 0.0 or NaN, otherwise 1.0 with the sign of input value returned. On risc-v we can use `fclass.x` to specify type of input value and return appropriate value.
>
> Tests:
> Performance tests on t-head board:
> With intrinsics:
>
> Benchmark (seed) Mode Cnt Score Error Units
> MathBench.copySignDouble 0 thrpt 8 34156.580 ± 76.272 ops/ms
> MathBench.copySignFloat 0 thrpt 8 34181.731 ± 38.182 ops/ms
> MathBench.signumDouble 0 thrpt 8 31977.258 ± 1122.327 ops/ms
> MathBench.signumFloat 0 thrpt 8 31836.852 ± 56.013 ops/ms
>
> Intrinsics turned off (`-XX:+UnlockDiagnosticVMOptions -XX:-UseCopySignIntrinsic -XX:-UseSignumIntrinsic`):
>
> Benchmark (seed) Mode Cnt Score Error Units
> MathBench.copySignDouble 0 thrpt 8 31000.996 ± 943.094 ops/ms
> MathBench.copySignFloat 0 thrpt 8 30678.016 ± 28.087 ops/ms
> MathBench.signumDouble 0 thrpt 8 25435.010 ± 2047.085 ops/ms
> MathBench.signumFloat 0 thrpt 8 25257.058 ± 79.175 ops/ms
>
> Regression tests: tier1, hotspot:tier2 on risc-v board.
>
> Also, changed name of one micro test: before we had: `sigNumDouble` and `signumFloat` tests, they does not matches to `signum` or `sigNum`. Now we have similar part: `signum`.
> Performance tests has been changed a bit, to check intrinsics result better, diff to modify tests:
>
> diff --git a/test/micro/org/openjdk/bench/java/lang/MathBench.java b/test/micro/org/openjdk/bench/java/lang/MathBench.java
> index 6cd1353907e..0bee25366bf 100644
> --- a/test/micro/org/openjdk/bench/java/lang/MathBench.java
> +++ b/test/micro/org/openjdk/bench/java/lang/MathBench.java
> @@ -143,12 +143,12 @@ public double ceilDouble() {
>
> @Benchmark
> public double copySignDouble() {
> - return Math.copySign(double81, doubleNegative12);
> + return Math.copySign(double81, doubleNegative12) + Math.copySign(double81, double2) + Math.copySign(double4Dot1, doubleNegative12);
> }
>
> @Benchmark
> public float copySignFloat() {
> - return Math.copySign(floatNegative99, float1);
> + return Math.copySign(floatNegative99, float1) + Math.copySign(eFloat, float1) + Math.copySign...
Some comments after a brief look.
src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1659:
> 1657: // on input we have NaN or +/-0.0 value we should return it,
> 1658: // otherwise return +/- 1.0 using sign of input.
> 1659: // tmp1 - used to store result of fclass operation,
Seems to me that scratch register `t0` could be used in this function in order to save use of `tmp1`.
src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1661:
> 1659: // tmp1 - used to store result of fclass operation,
> 1660: // one - gives us a floating-point 1.0 (got from matching rule)
> 1661: // bool single_precision - specififes single or double precision operations will be used.
Suggestion: s/bool single_precision - specififes/bool is_double - specifies/
src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1663:
> 1661: // bool single_precision - specififes single or double precision operations will be used.
> 1662: void C2_MacroAssembler::signum_fp(FloatRegister dst, FloatRegister src, FloatRegister one, Register tmp1, bool is_double) {
> 1663: assert_different_registers(dst, src, one);
This constraint could be relaxed if we use scratch register `t0` here instead of `tmp`.
src/hotspot/cpu/riscv/riscv.ad line 7511:
> 7509: // Copysign and signum intrinsics
> 7510:
> 7511: instruct copySignD_reg(fRegD dst, fRegD src1, fRegD src2, immD0 zero) %{
A more simpler `immD zero` will do here which is similar with the x86 counterpart.
src/hotspot/cpu/riscv/riscv.ad line 7513:
> 7511: instruct copySignD_reg(fRegD dst, fRegD src1, fRegD src2, immD0 zero) %{
> 7512: match(Set dst (CopySignD src1 (Binary src2 zero)));
> 7513: effect(TEMP_DEF dst, USE src1, USE src2);
Unnecessary effect.
src/hotspot/cpu/riscv/riscv.ad line 7526:
> 7524: instruct copySignF_reg(fRegF dst, fRegF src1, fRegF src2) %{
> 7525: match(Set dst (CopySignF src1 src2));
> 7526: effect(TEMP_DEF dst, USE src1, USE src2);
Unnecessary effect.
src/hotspot/cpu/riscv/riscv.ad line 7537:
> 7535: %}
> 7536:
> 7537: instruct signumD_reg(fRegD dst, fRegD src, fRegD zero, fRegD one, iRegINoSp tmp1) %{
Use `immD zero` instread, which will help avoid reserving one FP register here.
src/hotspot/cpu/riscv/riscv.ad line 7539:
> 7537: instruct signumD_reg(fRegD dst, fRegD src, fRegD zero, fRegD one, iRegINoSp tmp1) %{
> 7538: match(Set dst (SignumD src (Binary zero one)));
> 7539: effect(TEMP_DEF dst, USE src, USE one, TEMP tmp1);
Unnecessary effect if changed to use `t0` instead of `tmp1` in `signum_fp`.
src/hotspot/cpu/riscv/riscv.ad line 7548:
> 7546: %}
> 7547:
> 7548: instruct signumF_reg(fRegF dst, fRegF src, fRegF zero, fRegF one, iRegINoSp tmp1) %{
Use `immF zero` instread, which will help avoid reserving one FP register here.
src/hotspot/cpu/riscv/riscv.ad line 7550:
> 7548: instruct signumF_reg(fRegF dst, fRegF src, fRegF zero, fRegF one, iRegINoSp tmp1) %{
> 7549: match(Set dst (SignumF src (Binary zero one)));
> 7550: effect(TEMP_DEF dst, USE src, USE one, TEMP tmp1);
Unnecessary effect if changed to use `t0` instead of `tmp1` in `signum_fp`.
-------------
PR Review: https://git.openjdk.org/jdk/pull/16186#pullrequestreview-1679272664
PR Review Comment: https://git.openjdk.org/jdk/pull/16186#discussion_r1360216914
PR Review Comment: https://git.openjdk.org/jdk/pull/16186#discussion_r1360215721
PR Review Comment: https://git.openjdk.org/jdk/pull/16186#discussion_r1360219620
PR Review Comment: https://git.openjdk.org/jdk/pull/16186#discussion_r1360221019
PR Review Comment: https://git.openjdk.org/jdk/pull/16186#discussion_r1360230241
PR Review Comment: https://git.openjdk.org/jdk/pull/16186#discussion_r1360230310
PR Review Comment: https://git.openjdk.org/jdk/pull/16186#discussion_r1360222911
PR Review Comment: https://git.openjdk.org/jdk/pull/16186#discussion_r1360231462
PR Review Comment: https://git.openjdk.org/jdk/pull/16186#discussion_r1360223205
PR Review Comment: https://git.openjdk.org/jdk/pull/16186#discussion_r1360231642
More information about the hotspot-compiler-dev
mailing list