RFR: 8320998: RISC-V: C2 RoundDoubleModeV

Dingli Zhang dzhang at openjdk.org
Tue Sep 24 16:10:14 UTC 2024


Hi all,

This patch adds RoundDoubleModeV intrinsics for riscv64. The vector implementation is similar to the scalar version. Please take a look and review. Thanks a lot!
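Below is a rough Java model of the "convert to integer and back" idea. It assumes the scalar RISC-V code does three things. First, it leaves NaN, Infinity and any |x| >= 2^52 unchanged, since those values are already integral. Second, it converts to a long using the requested rounding mode and converts back. Third, it copies the sign of the input so that, for example, ceil(-0.5) yields -0.0. Java has no rounding-mode control, so the rounding of the conversion is emulated in software here. The class, enum and method names are hypothetical. This is a sketch, not the HotSpot code.

```java
// Hypothetical model of round-to-integral via long conversion; names are illustrative.
final class RoundModeSketch {
    enum Mode { RINT, FLOOR, CEIL }

    // Every double with |x| >= 2^52 has no fractional bits.
    private static final double TWO_POW_52 = 0x1.0p52;

    static double roundDoubleMode(double x, Mode mode) {
        // NaN, Infinity and large values are returned as-is (the test is false for NaN).
        if (!(Math.abs(x) < TWO_POW_52)) {
            return x;
        }
        long t = (long) x;  // exact truncation toward zero for |x| < 2^52
        switch (mode) {
            case FLOOR:
                if ((double) t > x) t--;
                break;
            case CEIL:
                if ((double) t < x) t++;
                break;
            case RINT: {
                long f = ((double) t > x) ? t - 1 : t;  // floor(x)
                double frac = x - (double) f;           // fractional part in [0, 1]
                // Round half to even, as Math.rint does.
                if (frac > 0.5 || (frac == 0.5 && (f & 1) != 0)) f++;
                t = f;
                break;
            }
        }
        // Converting back is exact for |t| <= 2^52; copySign restores -0.0 results.
        return Math.copySign((double) t, x);
    }
}
```

Applying the sign copy unconditionally works because a nonzero result of floor, ceil or rint always has the same sign as its input, so only the zero results actually change.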

Just like https://github.com/openjdk/jdk/pull/17745, current testing shows that the intrinsic brings a performance gain when vlenb >= 32 (as on k1), but a regression when vlenb == 16 (as on k230). So I only enable the intrinsic when vlenb >= 32.
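The enablement rule above can be written out as a small check. The sketch also shows how many double lanes one vector register holds for a given vlenb, assuming LMUL = 1. With vlenb == 16 a register holds 2 doubles, and with vlenb == 32 it holds 4. The names are hypothetical, and the real check lives in HotSpot's RISC-V backend, not in Java.

```java
// Hypothetical sketch of the vlenb gate; not the actual HotSpot check.
final class VlenbGate {
    // vlenb = vector register length in bytes; a double takes 8 bytes (LMUL = 1 assumed).
    static int doubleLanesPerRegister(int vlenb) {
        return vlenb / Double.BYTES;
    }

    // Enable RoundDoubleModeV only when vlenb >= 32, as described above.
    static boolean useRoundDoubleModeV(int vlenb) {
        return vlenb >= 32;
    }
}
```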

Please compare the data below, thanks!

## Test
test/hotspot/jtreg/compiler/c2/cr6340864/TestDoubleVect.java
test/hotspot/jtreg/compiler/floatingpoint/TestRound.java
test/jdk/java/lang/Math/RoundTests.java
test/jdk/jdk/incubator/vector/*
test/micro/org/openjdk/bench/java/math/FpRoundingBenchmark.java
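For context, here is a hypothetical kernel of the kind the SuperWord pass may turn into RoundDoubleModeV nodes: a simple counted loop applying Math.ceil, Math.floor or Math.rint elementwise. It is not copied from FpRoundingBenchmark, and whether C2 actually vectorizes it depends on the platform, flags (e.g. UseSuperWord) and the vlenb gate.

```java
// Hypothetical elementwise rounding loops; dst is assumed to be at least as long as src.
final class RoundingKernels {
    static void ceilAll(double[] src, double[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = Math.ceil(src[i]);
        }
    }

    static void floorAll(double[] src, double[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = Math.floor(src[i]);
        }
    }

    static void rintAll(double[] src, double[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = Math.rint(src[i]);  // round half to even
        }
    }
}
```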

## Performance - with Intrinsic
### on k1
Benchmark on k1 (+intrinsic)

Benchmark                              (TESTSIZE)   Mode  Cnt   Score   Error   Units
FpRoundingBenchmark.test_ceil                2048  thrpt   15  58.973 ± 0.460  ops/ms
FpRoundingBenchmark.test_floor               2048  thrpt   15  59.873 ± 0.054  ops/ms
FpRoundingBenchmark.test_rint                2048  thrpt   15  59.460 ± 0.552  ops/ms


Benchmark on k1 (-intrinsic)

Benchmark                              (TESTSIZE)   Mode  Cnt   Score   Error   Units
FpRoundingBenchmark.test_ceil                2048  thrpt   15  51.335 ± 0.068  ops/ms
FpRoundingBenchmark.test_floor               2048  thrpt   15  51.356 ± 0.062  ops/ms
FpRoundingBenchmark.test_rint                2048  thrpt   15  51.387 ± 0.059  ops/ms

### on k230
Benchmark on k230 (+intrinsic, enable intrinsic even when vlenb == 16)

Benchmark                              (TESTSIZE)   Mode  Cnt   Score   Error   Units
FpRoundingBenchmark.test_ceil                2048  thrpt   15  28.263 ± 0.837  ops/ms
FpRoundingBenchmark.test_floor               2048  thrpt   15  28.130 ± 0.789  ops/ms
FpRoundingBenchmark.test_rint                2048  thrpt   15  28.241 ± 0.868  ops/ms


Benchmark on k230 (-intrinsic, enable intrinsic even when vlenb == 16)

Benchmark                              (TESTSIZE)   Mode  Cnt   Score   Error   Units
FpRoundingBenchmark.test_ceil                2048  thrpt   15  44.391 ± 1.249  ops/ms
FpRoundingBenchmark.test_floor               2048  thrpt   15  44.423 ± 1.187  ops/ms
FpRoundingBenchmark.test_rint                2048  thrpt   15  44.441 ± 1.218  ops/ms
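The throughput ratios (+intrinsic / -intrinsic) can be computed directly from the scores in the tables above, in ops/ms, where higher is better. The class name is only illustrative.

```java
// Throughput ratios computed from the JMH scores above (order: ceil, floor, rint).
final class SpeedupFromTables {
    static final double[] K1_WITH      = {58.973, 59.873, 59.460};
    static final double[] K1_WITHOUT   = {51.335, 51.356, 51.387};
    static final double[] K230_WITH    = {28.263, 28.130, 28.241};
    static final double[] K230_WITHOUT = {44.391, 44.423, 44.441};

    // ratio > 1 means the intrinsic is faster; ratio < 1 means a regression.
    static double[] ratios(double[] with, double[] without) {
        double[] r = new double[with.length];
        for (int i = 0; i < r.length; i++) {
            r[i] = with[i] / without[i];
        }
        return r;
    }
}
```

The ratios come out at roughly 1.15 to 1.17 on k1 and about 0.63 to 0.64 on k230 (vlenb == 16), which is why the intrinsic is only enabled when vlenb >= 32.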


## Performance - without Intrinsic
### on k1, intrinsic disabled due to -UseSuperWord
Benchmark on k1, -UseSuperWord (+intrinsic)

Benchmark                              (TESTSIZE)   Mode  Cnt   Score   Error   Units
FpRoundingBenchmark.test_ceil                2048  thrpt   15  51.249 ± 0.038  ops/ms
FpRoundingBenchmark.test_floor               2048  thrpt   15  51.232 ± 0.021  ops/ms
FpRoundingBenchmark.test_rint                2048  thrpt   15  51.110 ± 0.176  ops/ms


Benchmark on k1, -UseSuperWord (-intrinsic)

Benchmark                              (TESTSIZE)   Mode  Cnt   Score   Error   Units
FpRoundingBenchmark.test_ceil                2048  thrpt   15  51.287 ± 0.151  ops/ms
FpRoundingBenchmark.test_floor               2048  thrpt   15  51.313 ± 0.107  ops/ms
FpRoundingBenchmark.test_rint                2048  thrpt   15  51.350 ± 0.067  ops/ms
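A quick sanity check that the two -UseSuperWord runs on k1 are within noise is to treat each JMH "score ± error" (a confidence-interval half-width) as an interval and test whether the intervals overlap. This is only a rough heuristic, not a significance test, and the helper name is hypothetical.

```java
// Hypothetical helper: do [s1 - e1, s1 + e1] and [s2 - e2, s2 + e2] overlap?
final class ErrorBarOverlap {
    static boolean overlaps(double s1, double e1, double s2, double e2) {
        return Math.abs(s1 - s2) <= e1 + e2;
    }
}
```

For ceil, floor and rint, the k1 -UseSuperWord scores with and without the intrinsic overlap. By contrast, the k1 +UseSuperWord ceil scores (58.973 ± 0.460 vs 51.335 ± 0.068) do not.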

### on k230, intrinsic disabled due to -UseSuperWord
Benchmark on k230, -UseSuperWord (+intrinsic)

Benchmark                              (TESTSIZE)   Mode  Cnt   Score   Error   Units
FpRoundingBenchmark.test_ceil                2048  thrpt   15  44.375 ± 1.364  ops/ms
FpRoundingBenchmark.test_floor               2048  thrpt   15  44.532 ± 1.221  ops/ms
FpRoundingBenchmark.test_rint                2048  thrpt   15  44.675 ± 1.295  ops/ms

### on k230, intrinsic disabled due to vlenb == 16
Benchmark on k230, +UseSuperWord (+intrinsic)

Benchmark                              (TESTSIZE)   Mode  Cnt   Score   Error   Units
FpRoundingBenchmark.test_ceil                2048  thrpt   15  44.372 ± 1.357  ops/ms
FpRoundingBenchmark.test_floor               2048  thrpt   15  44.513 ± 1.278  ops/ms
FpRoundingBenchmark.test_rint                2048  thrpt   15  44.609 ± 1.151  ops/ms

-------------

Commit messages:
 - 8320998: RISC-V: C2 RoundDoubleModeV

Changes: https://git.openjdk.org/jdk/pull/21164/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21164&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8320998
  Stats: 75 lines in 4 files changed: 75 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/21164.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/21164/head:pull/21164

PR: https://git.openjdk.org/jdk/pull/21164


More information about the hotspot-compiler-dev mailing list