RFR: 8320998: RISC-V: C2 RoundDoubleModeV
Dingli Zhang
dzhang at openjdk.org
Tue Sep 24 16:10:14 UTC 2024
Hi all,
This patch adds the RoundDoubleModeV intrinsic for riscv64. The vector implementation is similar to the scalar version. Please take a look and review. Thanks a lot!
As with https://github.com/openjdk/jdk/pull/17745, current testing shows that the intrinsic brings a performance gain when vlenb >= 32 (as on k1) but a regression when vlenb == 16 (as on k230). So I only enable the intrinsic when vlenb >= 32.
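For readers less familiar with the gating condition, here is a minimal Java sketch of the decision described above. It is only a model: the real check lives in the HotSpot C++ sources touched by the patch and is not reproduced here. The class and method names are hypothetical, and the lane count assumes a single vector register (LMUL = 1).

```java
// Illustrative model of the vlenb gate described in this mail.
// All names are hypothetical; the actual check is in HotSpot C++ code.
final class RoundDoubleModeVGate {
    // vlenb is the vector register length in bytes; a double takes 8 bytes,
    // so this is the number of doubles one register holds (assuming LMUL = 1).
    static int doubleLanes(int vlenb) {
        return vlenb / Double.BYTES;
    }

    // The patch enables the intrinsic only when vlenb >= 32.
    static boolean intrinsicEnabled(int vlenb) {
        return vlenb >= 32;
    }

    public static void main(String[] args) {
        // 16 corresponds to the k230 case and 32 to the k1 case in this mail.
        for (int vlenb : new int[] {16, 32}) {
            System.out.println("vlenb=" + vlenb
                    + " doubleLanes=" + doubleLanes(vlenb)
                    + " intrinsicEnabled=" + intrinsicEnabled(vlenb));
        }
    }
}
```

With vlenb == 16 a register holds only two doubles, which may be part of why the vector sequence does not pay off on k230; that explanation is a guess, and the benchmark data is the actual basis for the threshold.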
Please compare the data below, thanks!
## Test
test/hotspot/jtreg/compiler/c2/cr6340864/TestDoubleVect.java
test/hotspot/jtreg/compiler/floatingpoint/TestRound.java
test/jdk/java/lang/Math/RoundTests.java
test/jdk/jdk/incubator/vector/*
test/micro/org/openjdk/bench/java/math/FpRoundingBenchmark.java
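As a rough illustration of what the micro-benchmark exercises, the sketch below shows element-wise Math.ceil, Math.floor and Math.rint loops over a double array. It is not a copy of FpRoundingBenchmark.java; the class name, method names and input data are assumptions made for illustration. Loops of this shape are the kind that C2 auto-vectorization can turn into vector rounding nodes when the intrinsic is available.

```java
import java.util.Random;

// Hypothetical stand-in for the benchmark kernels; not the real JMH class.
final class RoundLoopSketch {
    static final int TESTSIZE = 2048; // matches the (TESTSIZE) column below

    static void ceilLoop(double[] in, double[] out) {
        for (int i = 0; i < in.length; i++) {
            out[i] = Math.ceil(in[i]);
        }
    }

    static void floorLoop(double[] in, double[] out) {
        for (int i = 0; i < in.length; i++) {
            out[i] = Math.floor(in[i]);
        }
    }

    static void rintLoop(double[] in, double[] out) {
        for (int i = 0; i < in.length; i++) {
            out[i] = Math.rint(in[i]);
        }
    }

    public static void main(String[] args) {
        Random r = new Random(42);
        double[] in = new double[TESTSIZE];
        double[] out = new double[TESTSIZE];
        for (int i = 0; i < in.length; i++) {
            in[i] = (r.nextDouble() - 0.5) * 1000.0;
        }
        ceilLoop(in, out);
        floorLoop(in, out);
        rintLoop(in, out);
        System.out.println("in[0]=" + in[0] + " rint=" + out[0]);
    }
}
```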
## Performance - with Intrinsic
### on k1
Benchmark on k1 (+intrinsic)
Benchmark (TESTSIZE) Mode Cnt Score Error Units
FpRoundingBenchmark.test_ceil 2048 thrpt 15 58.973 ± 0.460 ops/ms
FpRoundingBenchmark.test_floor 2048 thrpt 15 59.873 ± 0.054 ops/ms
FpRoundingBenchmark.test_rint 2048 thrpt 15 59.460 ± 0.552 ops/ms
Benchmark on k1 (-intrinsic)
Benchmark (TESTSIZE) Mode Cnt Score Error Units
FpRoundingBenchmark.test_ceil 2048 thrpt 15 51.335 ± 0.068 ops/ms
FpRoundingBenchmark.test_floor 2048 thrpt 15 51.356 ± 0.062 ops/ms
FpRoundingBenchmark.test_rint 2048 thrpt 15 51.387 ± 0.059 ops/ms
### on k230
Benchmark on k230 (+intrinsic, enable intrinsic even when vlenb == 16)
Benchmark (TESTSIZE) Mode Cnt Score Error Units
FpRoundingBenchmark.test_ceil 2048 thrpt 15 28.263 ± 0.837 ops/ms
FpRoundingBenchmark.test_floor 2048 thrpt 15 28.130 ± 0.789 ops/ms
FpRoundingBenchmark.test_rint 2048 thrpt 15 28.241 ± 0.868 ops/ms
Benchmark on k230 (-intrinsic, enable intrinsic even when vlenb == 16)
Benchmark (TESTSIZE) Mode Cnt Score Error Units
FpRoundingBenchmark.test_ceil 2048 thrpt 15 44.391 ± 1.249 ops/ms
FpRoundingBenchmark.test_floor 2048 thrpt 15 44.423 ± 1.187 ops/ms
FpRoundingBenchmark.test_rint 2048 thrpt 15 44.441 ± 1.218 ops/ms
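To make the comparison easier to read, the following small Java sketch computes the relative throughput change from the test_ceil scores reported above. The class and method names are hypothetical; only the four scores come from the tables. For test_ceil this works out to roughly +15% on k1 and roughly -36% on k230.

```java
// Hypothetical helper; only the scores are taken from the tables above.
final class RelativeChange {
    // Percent change of the +intrinsic score relative to the -intrinsic score.
    static double percentChange(double withIntrinsic, double withoutIntrinsic) {
        return (withIntrinsic / withoutIntrinsic - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        // test_ceil scores in ops/ms: (+intrinsic, -intrinsic)
        double k1 = percentChange(58.973, 51.335);
        double k230 = percentChange(28.263, 44.391);
        System.out.printf("k1   test_ceil: %+.1f%%%n", k1);
        System.out.printf("k230 test_ceil: %+.1f%%%n", k230);
    }
}
```

The test_floor and test_rint rows can be compared the same way by substituting their scores.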
## Performance - without Intrinsic
### on k1, intrinsic disabled due to -UseSuperWord
Benchmark on k1, -UseSuperWord (+intrinsic)
Benchmark (TESTSIZE) Mode Cnt Score Error Units
FpRoundingBenchmark.test_ceil 2048 thrpt 15 51.249 ± 0.038 ops/ms
FpRoundingBenchmark.test_floor 2048 thrpt 15 51.232 ± 0.021 ops/ms
FpRoundingBenchmark.test_rint 2048 thrpt 15 51.110 ± 0.176 ops/ms
Benchmark on k1, -UseSuperWord (-intrinsic)
Benchmark (TESTSIZE) Mode Cnt Score Error Units
FpRoundingBenchmark.test_ceil 2048 thrpt 15 51.287 ± 0.151 ops/ms
FpRoundingBenchmark.test_floor 2048 thrpt 15 51.313 ± 0.107 ops/ms
FpRoundingBenchmark.test_rint 2048 thrpt 15 51.350 ± 0.067 ops/ms
### on k230, intrinsic disabled due to -UseSuperWord
Benchmark on k230, -UseSuperWord (+intrinsic)
Benchmark (TESTSIZE) Mode Cnt Score Error Units
FpRoundingBenchmark.test_ceil 2048 thrpt 15 44.375 ± 1.364 ops/ms
FpRoundingBenchmark.test_floor 2048 thrpt 15 44.532 ± 1.221 ops/ms
FpRoundingBenchmark.test_rint 2048 thrpt 15 44.675 ± 1.295 ops/ms
### on k230, intrinsic disabled due to vlenb == 16
Benchmark on k230, +UseSuperWord (+intrinsic)
Benchmark (TESTSIZE) Mode Cnt Score Error Units
FpRoundingBenchmark.test_ceil 2048 thrpt 15 44.372 ± 1.357 ops/ms
FpRoundingBenchmark.test_floor 2048 thrpt 15 44.513 ± 1.278 ops/ms
FpRoundingBenchmark.test_rint 2048 thrpt 15 44.609 ± 1.151 ops/ms
-------------
Commit messages:
- 8320998: RISC-V: C2 RoundDoubleModeV
Changes: https://git.openjdk.org/jdk/pull/21164/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21164&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8320998
Stats: 75 lines in 4 files changed: 75 ins; 0 del; 0 mod
Patch: https://git.openjdk.org/jdk/pull/21164.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/21164/head:pull/21164
PR: https://git.openjdk.org/jdk/pull/21164