RFR: 8322790: RISC-V: Tune costs for shuffles with no conversion

Sat Jan 6 16:30:21 UTC 2024

On Sat, 30 Dec 2023 20:07:13 GMT, Ilya Gavrilin <igavrilin at openjdk.org> wrote:

> Hi all, please review this small change to the insertion costs of RISC-V nodes.
> We now have several nodes that implement shuffles without conversion: https://github.com/openjdk/jdk/blob/32d80e2caf6063b58128bd5f3dc87b276f3bd0cb/src/hotspot/cpu/riscv/riscv.ad#L8525-L8741
> On most RISC-V CPUs we prefer reg<->reg operations because they are faster, but stack<->reg operations are currently selected (for the reasons, please see the linked JBS issue).
> After changing the insertion costs, reg<->reg operations are selected, and benchmarks that use such shuffles show performance improvements (tested on a T-Head C910 board):
> |              Benchmark              | Upstream build (ops/ms) | Patched build (ops/ms) | Difference (%) |
> |:-----------------------------------:|:-----------------------:|:----------------------:|:--------------:|
> | MathBench.doubleToRawLongBitsDouble |        30935.139        |        32171.761       |      +4.00     |
> |      StrictMathBench.ceilDouble     |        24682.810        |        29782.050       |     +20.66     |
> |      StrictMathBench.cosDouble      |         6948.309        |        6938.276        |      -0.14     |
> |      StrictMathBench.expDouble      |         6816.143        |        7211.021        |      +5.79     |
> |     StrictMathBench.floorDouble     |        30699.630        |        34189.509       |     +11.37     |
> |      StrictMathBench.maxDouble      |        35157.355        |        34675.191       |      -1.37     |
> |      StrictMathBench.minDouble      |        35192.135        |        35183.015       |      -0.03     |
> |      StrictMathBench.sinDouble      |         6698.405        |        6721.809        |      +0.35     |
> 
> New benchmark for the changed nodes:
> 
> --- a/test/micro/org/openjdk/bench/java/lang/MathBench.java
> +++ b/test/micro/org/openjdk/bench/java/lang/MathBench.java
> @@ -540,4 +540,11 @@ public class MathBench {
>          return  Math.ulp(float7);
>      }
>  
> +    @Benchmark
> +    public long doubleToRawLongBitsDouble() {
> +        double dbl162Dot5 = double81 * 2.0d + double0Dot5;
> +        double dbl3 = double2 + double1;
> +        return Double.doubleToRawLongBits(dbl162Dot5) + Double.doubleToRawLongBits(dbl3);
> +    }
> +
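
For readers less familiar with the terminology: a "shuffle without conversion" presumably refers to moving a value's raw bit pattern between the floating-point and integer domains without any numeric conversion, which is what `Double.doubleToRawLongBits` and `Double.longBitsToDouble` expose at the Java level. The sketch below only illustrates that semantics; the class name `RawBitsDemo` and its helper methods are hypothetical and not part of the patch.

```java
public class RawBitsDemo {
    // Reinterpret the 64 bits of a double as a long. No rounding or
    // numeric conversion takes place; the bit pattern is copied as-is.
    public static long bitsOf(double d) {
        return Double.doubleToRawLongBits(d);
    }

    // The inverse direction: reinterpret 64 integer bits as a double.
    public static double doubleOf(long bits) {
        return Double.longBitsToDouble(bits);
    }

    public static void main(String[] args) {
        // IEEE 754 encoding of 1.0: sign 0, biased exponent 0x3ff, zero mantissa.
        System.out.println(Long.toHexString(bitsOf(1.0)));   // 3ff0000000000000
        // Bit pattern 0x4000000000000000 encodes 2.0.
        System.out.println(doubleOf(0x4000000000000000L));   // 2.0
        // For contrast, a cast is a numeric conversion, not a bit move.
        System.out.println((long) 1.0);                      // 1
    }
}
```

Because no conversion is involved, such a move can in principle stay in registers, which is the case the cost tuning above is meant to favour.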

Thanks @RealFYang for the suggested changes. I ran some additional tests on the T-Head board and also checked the JIT-generated code for some of the tests.
| Benchmark                                | Upstream  | Old patch | Current patch |
|------------------------------------------|-----------|-----------|---------------|
| lang.MathBench.doubleToRawLongBitsDouble | 30495.868 | 32332.48  | 31635.15      |
| lang.MathBench.longBitsToDoubleLong      | 35161.101 | 34542.878 | 34146.705     |
| lang.StrictMathBench.ceilDouble          | 24272.224 | 29797.862 | 29094.981     |
| lang.StrictMathBench.cosDouble           | 6967.161  | 6930.468  | 6960.957      |
| lang.StrictMathBench.expDouble           | 6812.605  | 7211.988  | 7123.429      |
| lang.StrictMathBench.floorDouble         | 29893.151 | 34193.412 | 33257.669     |
| lang.StrictMathBench.maxDouble           | 34684.497 | 35194.694 | 35199.944     |
| lang.StrictMathBench.minDouble           | 34692.521 | 34673.531 | 34678.324     |
| lang.StrictMathBench.sinDouble           | 6769.593  | 6714.003  | 6736.884      |
| math.FpRoundingBenchmark.testnativeceil  | 67.801    | 115.6     | 116.822       |
| math.FpRoundingBenchmark.testnativefloor | 71.745    | 116.59    | 116.662       |
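
The table above lists raw scores only. As a reading aid, the relative change against upstream can be derived as `(patched - upstream) / upstream * 100`. The sketch below applies that formula to two rows copied from the table; the class `RelativeChange` and the method `percentChange` are hypothetical helpers, not part of the patch or of JMH.

```java
public class RelativeChange {
    // Relative change of a patched score against the upstream baseline, in percent.
    public static double percentChange(double upstream, double patched) {
        return (patched - upstream) / upstream * 100.0;
    }

    public static void main(String[] args) {
        // Scores taken from the "Upstream" and "Current patch" columns.
        System.out.printf("StrictMathBench.ceilDouble:         %+.2f%%%n",
                percentChange(24272.224, 29094.981));
        System.out.printf("FpRoundingBenchmark.testnativeceil: %+.2f%%%n",
                percentChange(67.801, 116.822));
    }
}
```

With these inputs, the current patch comes out at roughly +20% for `ceilDouble` and roughly +72% for `testnativeceil`.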

Additional benchmarks:

diff --git a/test/micro/org/openjdk/bench/java/lang/MathBench.java b/test/micro/org/openjdk/bench/java/lang/MathBench.java
index 27d8033b8b7..fd39cc58222 100644
--- a/test/micro/org/openjdk/bench/java/lang/MathBench.java
+++ b/test/micro/org/openjdk/bench/java/lang/MathBench.java
@@ -540,4 +540,17 @@ public class MathBench {
         return  Math.ulp(float7);
     }
 
+    @Benchmark
+    public long doubleToRawLongBitsDouble() {
+        double dbl162Dot5 = double81 * 2.0d + double0Dot5;
+        double dbl3 = double2 + double1;
+        return Double.doubleToRawLongBits(dbl162Dot5) + Double.doubleToRawLongBits(dbl3);
+    }
+
+    @Benchmark
+    public double longBitsToDoubleLong() {
+        long lng14 = long13 + long1;
+        long lng750 = long747 + 3;
+        return Double.longBitsToDouble(lng14) + Double.longBitsToDouble(lng750);
+    }
 }
diff --git a/test/micro/org/openjdk/bench/java/math/FpRoundingBenchmark.java b/test/micro/org/openjdk/bench/java/math/FpRoundingBenchmark.java
index cf0eed32e07..3687f43b886 100644
--- a/test/micro/org/openjdk/bench/java/math/FpRoundingBenchmark.java
+++ b/test/micro/org/openjdk/bench/java/math/FpRoundingBenchmark.java
@@ -75,4 +75,16 @@ public class FpRoundingBenchmark {
     for (int i = 0; i < TESTSIZE; i++)
       Res[i] = Math.rint(DargV1[i]);
   }
+
+  @Benchmark
+  public void testnativeceil(Blackhole bh) {
+    for (int i = 0; i < TESTSIZE; i++)
+      Res[i] = StrictMath.ceil(DargV1[i]);
+  }
+
+  @Benchmark
+  public void testnativefloor(Blackhole bh) {
+    for (int i = 0; i < TESTSIZE; i++)
+      Res[i] = StrictMath.floor(DargV1[i]);
+  }
 }

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17206#issuecomment-1879745479