RFR: 8322790: RISC-V: Tune costs for shuffles with no conversion
Ilya Gavrilin
igavrilin at openjdk.org
Sat Jan 6 16:30:21 UTC 2024
On Sat, 30 Dec 2023 20:07:13 GMT, Ilya Gavrilin <igavrilin at openjdk.org> wrote:
> Hi all, please review this small change to the instruction costs (`ins_cost`) of some RISC-V nodes.
> We currently have several nodes that provide shuffles without conversion: https://github.com/openjdk/jdk/blob/32d80e2caf6063b58128bd5f3dc87b276f3bd0cb/src/hotspot/cpu/riscv/riscv.ad#L8525-L8741
> On most RISC-V CPUs reg<->reg moves are preferable because they are faster, but currently the stack<->reg variants are selected (for details on why, please see the linked JBS issue).
> After changing these costs, the reg<->reg variants are selected, and benchmarks that use such shuffles show performance improvements (tested on a T-Head C910 board):
> | Benchmark | Upstream build (ops/ms) | Patched build (ops/ms) | difference (%) |
> |:-----------------------------------:|:-----------------------:|:----------------------:|:--------------:|
> | MathBench.doubleToRawLongBitsDouble | 30935.139 | 32171.761 | +4.00 |
> | StrictMathBench.ceilDouble | 24682.810 | 29782.050 | +20.66 |
> | StrictMathBench.cosDouble | 6948.309 | 6938.276 | -0.14 |
> | StrictMathBench.expDouble | 6816.143 | 7211.021 | +5.79 |
> | StrictMathBench.floorDouble | 30699.630 | 34189.509 | +11.37 |
> | StrictMathBench.maxDouble | 35157.355 | 34675.191 | -1.37 |
> | StrictMathBench.minDouble | 35192.135 | 35183.015 | -0.03 |
> | StrictMathBench.sinDouble | 6698.405 | 6721.809 | +0.35 |
>
> New benchmark for the changed nodes:
>
> ```diff
> --- a/test/micro/org/openjdk/bench/java/lang/MathBench.java
> +++ b/test/micro/org/openjdk/bench/java/lang/MathBench.java
> @@ -540,4 +540,11 @@ public class MathBench {
>          return Math.ulp(float7);
>      }
>  
> +    @Benchmark
> +    public long doubleToRawLongBitsDouble() {
> +        double dbl162Dot5 = double81 * 2.0d + double0Dot5;
> +        double dbl3 = double2 + double1;
> +        return Double.doubleToRawLongBits(dbl162Dot5) + Double.doubleToRawLongBits(dbl3);
> +    }
> +
> ```
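As background: the "shuffles with no conversion" discussed here copy raw bits between floating-point and integer registers without changing them. Our understanding is that Java calls such as `Double.doubleToRawLongBits` and `Double.longBitsToDouble` are what C2 lowers to these move nodes; treat that mapping as an assumption and consult the linked riscv.ad rules for the actual matching. The following sketch (the class `RawBitsShuffleDemo` and its helpers are hypothetical names) shows the bits being preserved, in contrast to a numeric cast:

```java
// Illustrative only: assumes doubleToRawLongBits/longBitsToDouble correspond to
// C2's "move without conversion" nodes. The bits are copied, never converted.
public class RawBitsShuffleDemo {
    // Copies the 64 IEEE 754 bits of d into a long, unchanged.
    static long rawBits(double d) {
        return Double.doubleToRawLongBits(d);
    }

    // Copies 64 bits from a long back into a double, unchanged.
    static double fromBits(long bits) {
        return Double.longBitsToDouble(bits);
    }

    public static void main(String[] args) {
        double d = 162.5;
        long bits = rawBits(d);
        System.out.println(Long.toHexString(bits));          // 4064500000000000
        System.out.println(fromBits(bits) == d);             // true: bits round-trip unchanged

        // -0.0 shows the difference between a bit move and a numeric conversion.
        System.out.println(Long.toHexString(rawBits(-0.0))); // 8000000000000000 (sign bit set)
        System.out.println((long) -0.0);                     // 0 (value conversion, sign lost)
    }
}
```

A cast such as `(long) d` is a real conversion and would map to a different node, so it is not affected by this change.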
Thanks @RealFYang for the suggested changes. I performed some additional tests on the T-Head board and also checked the JIT-compiled code for some of the tests.
| Benchmark | Upstream | Old patch | Current patch |
|------------------------------------------|-----------|-----------|---------------|
| lang.MathBench.doubleToRawLongBitsDouble | 30495.868 | 32332.48 | 31635.15 |
| lang.MathBench.longBitsToDoubleLong | 35161.101 | 34542.878 | 34146.705 |
| lang.StrictMathBench.ceilDouble | 24272.224 | 29797.862 | 29094.981 |
| lang.StrictMathBench.cosDouble | 6967.161 | 6930.468 | 6960.957 |
| lang.StrictMathBench.expDouble | 6812.605 | 7211.988 | 7123.429 |
| lang.StrictMathBench.floorDouble | 29893.151 | 34193.412 | 33257.669 |
| lang.StrictMathBench.maxDouble | 34684.497 | 35194.694 | 35199.944 |
| lang.StrictMathBench.minDouble | 34692.521 | 34673.531 | 34678.324 |
| lang.StrictMathBench.sinDouble | 6769.593 | 6714.003 | 6736.884 |
| math.FpRoundingBenchmark.testnativeceil | 67.801 | 115.6 | 116.822 |
| math.FpRoundingBenchmark.testnativefloor | 71.745 | 116.59 | 116.662 |
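The large ceil/floor gains are consistent with how `StrictMath.floor` and `StrictMath.ceil` work in recent JDKs, to our understanding: they are written in Java and drop fraction bits via `Double.doubleToRawLongBits` and `Double.longBitsToDouble`, so each call performs an FP-to-integer and an integer-to-FP shuffle. The following is a minimal sketch of that bit-masking technique under that assumption; `BitFloorCeil` and its methods are hypothetical names, and the code is modeled on, not copied from, the JDK source.

```java
// Hypothetical sketch: floor()/ceil() via raw-bit masking. Each non-trivial input
// moves its bits double -> long, clears fraction bits, and moves them back long -> double.
public class BitFloorCeil {
    private static final long SIGNIF_MASK = 0x000FFFFFFFFFFFFFL; // the 52 fraction bits

    // Truncates toward zero by clearing fraction bits, then steps by `sign` if needed.
    private static double floorOrCeil(double a, double negBoundary, double posBoundary, double sign) {
        int exponent = Math.getExponent(a); // unbiased exponent
        if (exponent < 0) {
            // |a| < 1, zero, or subnormal: result is a itself (for +-0.0) or a boundary value
            return (a == 0.0) ? a : ((a < 0.0) ? negBoundary : posBoundary);
        }
        if (exponent >= 52) {
            return a; // already integral, or NaN/infinity
        }
        long bits = Double.doubleToRawLongBits(a); // FP -> integer shuffle
        long mask = SIGNIF_MASK >> exponent;       // fraction bits below the binary point
        if ((bits & mask) == 0L) {
            return a; // no fractional part
        }
        double result = Double.longBitsToDouble(bits & ~mask); // integer -> FP shuffle
        if (sign * a > 0.0) {
            result += sign; // truncation went toward zero; move one step away from it
        }
        return result;
    }

    static double floor(double a) { return floorOrCeil(a, -1.0, 0.0, -1.0); }

    static double ceil(double a) { return floorOrCeil(a, -0.0, 1.0, 1.0); }

    public static void main(String[] args) {
        System.out.println(floor(-2.5) + " " + ceil(-2.5)); // -3.0 -2.0
        System.out.println(floor(2.5) + " " + ceil(0.5));   // 2.0 1.0
    }
}
```

In this sketch every input with a fractional part makes a round trip through both shuffles, so their cost would show up directly in a floor/ceil benchmark.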
Additional benchmarks:
```diff
diff --git a/test/micro/org/openjdk/bench/java/lang/MathBench.java b/test/micro/org/openjdk/bench/java/lang/MathBench.java
index 27d8033b8b7..fd39cc58222 100644
--- a/test/micro/org/openjdk/bench/java/lang/MathBench.java
+++ b/test/micro/org/openjdk/bench/java/lang/MathBench.java
@@ -540,4 +540,17 @@ public class MathBench {
         return Math.ulp(float7);
     }
 
+    @Benchmark
+    public long doubleToRawLongBitsDouble() {
+        double dbl162Dot5 = double81 * 2.0d + double0Dot5;
+        double dbl3 = double2 + double1;
+        return Double.doubleToRawLongBits(dbl162Dot5) + Double.doubleToRawLongBits(dbl3);
+    }
+
+    @Benchmark
+    public double longBitsToDoubleLong() {
+        long lng14 = long13 + long1;
+        long lng750 = long747 + 3;
+        return Double.longBitsToDouble(lng14) + Double.longBitsToDouble(lng750);
+    }
 }
diff --git a/test/micro/org/openjdk/bench/java/math/FpRoundingBenchmark.java b/test/micro/org/openjdk/bench/java/math/FpRoundingBenchmark.java
index cf0eed32e07..3687f43b886 100644
--- a/test/micro/org/openjdk/bench/java/math/FpRoundingBenchmark.java
+++ b/test/micro/org/openjdk/bench/java/math/FpRoundingBenchmark.java
@@ -75,4 +75,16 @@ public class FpRoundingBenchmark {
         for (int i = 0; i < TESTSIZE; i++)
             Res[i] = Math.rint(DargV1[i]);
     }
+
+    @Benchmark
+    public void testnativeceil(Blackhole bh) {
+        for (int i = 0; i < TESTSIZE; i++)
+            Res[i] = StrictMath.ceil(DargV1[i]);
+    }
+
+    @Benchmark
+    public void testnativefloor(Blackhole bh) {
+        for (int i = 0; i < TESTSIZE; i++)
+            Res[i] = StrictMath.floor(DargV1[i]);
+    }
 }
```
-------------
PR Comment: https://git.openjdk.org/jdk/pull/17206#issuecomment-1879745479
More information about the hotspot-compiler-dev mailing list