RFR: 8320998: RISC-V: C2 RoundDoubleModeV

Fei Yang fyang at openjdk.org
Wed Sep 25 09:32:35 UTC 2024


On Tue, 24 Sep 2024 16:01:47 GMT, Dingli Zhang <dzhang at openjdk.org> wrote:

> Hi all,
> 
> This patch adds RoundDoubleModeV intrinsics for riscv64. The vector implementation is similar to the scalar version. Please take a look and review. Thanks a lot!
> 
> Just like https://github.com/openjdk/jdk/pull/17745, current testing shows a performance gain when vlenb >= 32 (as on k1) but a regression when vlenb == 16 (as on k230). So I only enable the intrinsic when vlenb >= 32.
> 
> Please compare the data below, thanks!
> 
> ## Test
> ### Test on k1
> test/hotspot/jtreg/compiler/c2/cr6340864/TestDoubleVect.java
> test/hotspot/jtreg/compiler/floatingpoint/TestRound.java
> test/jdk/java/lang/Math/RoundTests.java
> test/micro/org/openjdk/bench/java/math/FpRoundingBenchmark.java
> ### Test on qemu (RVV 1.0 enabled)
> test/jdk/jdk/incubator/vector/*
> 
> ## Performance - with Intrinsic
> ### on k1
> Benchmark on k1 (+intrinsic)
> 
> Benchmark                              (TESTSIZE)   Mode  Cnt   Score   Error   Units
> FpRoundingBenchmark.test_ceil                2048  thrpt   15  58.973 ± 0.460  ops/ms
> FpRoundingBenchmark.test_floor               2048  thrpt   15  59.873 ± 0.054  ops/ms
> FpRoundingBenchmark.test_rint                2048  thrpt   15  59.460 ± 0.552  ops/ms
> 
> 
> Benchmark on k1 (-intrinsic)
> 
> Benchmark                              (TESTSIZE)   Mode  Cnt   Score   Error   Units
> FpRoundingBenchmark.test_ceil                2048  thrpt   15  51.335 ± 0.068  ops/ms
> FpRoundingBenchmark.test_floor               2048  thrpt   15  51.356 ± 0.062  ops/ms
> FpRoundingBenchmark.test_rint                2048  thrpt   15  51.387 ± 0.059  ops/ms
> 
> ### on k230
> Benchmark on k230 (+intrinsic, enable intrinsic even when vlenb == 16)
> 
> Benchmark                              (TESTSIZE)   Mode  Cnt   Score   Error   Units
> FpRoundingBenchmark.test_ceil                2048  thrpt   15  28.263 ± 0.837  ops/ms
> FpRoundingBenchmark.test_floor               2048  thrpt   15  28.130 ± 0.789  ops/ms
> FpRoundingBenchmark.test_rint                2048  thrpt   15  28.241 ± 0.868  ops/ms
> 
> 
> Benchmark on k230 (-intrinsic, enable intrinsic even when vlenb == 16)
> 
> Benchmark                              (TESTSIZE)   Mode  Cnt   Score   Error   Units
> FpRoundingBenchmark.test_ceil                2048  thrpt   15  44.391 ± 1.249  ops/ms
> FpRoundingBenchmark.test_floor               2048  thrpt   15  44.423 ± 1.187  ops/ms
> FpRoundingBenchmark.test_rint                2048  thrpt   15  44.441 ± 1.218  ops/ms
> 
> 
> ## Performance - without Intrinsic
> ### on k1, intrinsic disabled due to -Us...

src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 3119:

> 3117:       break;
> 3118:     case RoundDoubleModeNode::rmode_rint:
> 3119:       csrwi(CSR_FRM, C2_MacroAssembler::rne);

No need to set the CSR here, as `FRM` is already set to Round to Nearest mode on entry to Java code. Check [JDK-8330094](https://bugs.openjdk.org/browse/JDK-8330094) for more details. And if you set `FRM` to some other rounding mode, you will need to restore it to Round to Nearest mode after processing. But the problem is that modifying a CSR on RISC-V is very costly. I guess that's one of the reasons why the gain in the JMH results is not more obvious.
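
To illustrate the pattern described above, here is a minimal host-side sketch in portable C++. It is not HotSpot code: `round_with_mode` and `RoundKind` are hypothetical names, and `std::fesetround` merely stands in for a write to `FRM`. It assumes the ambient rounding mode is round-to-nearest on entry, so the rint case needs no mode write, while ceil and floor switch the mode and restore it afterwards.

```cpp
#include <cfenv>
#include <cmath>

// Hypothetical stand-in for the three RoundDoubleMode cases.
enum class RoundKind { Rint, Floor, Ceil };

// Rounds x to an integral value. Assumes the caller's ambient rounding mode
// is round-to-nearest-even (analogous to FRM being rne on entry to Java code).
double round_with_mode(double x, RoundKind kind) {
  // volatile keeps the compiler from constant-folding the rounding or
  // moving it across the rounding-mode switches below.
  volatile double in = x;
  volatile double out;

  if (kind == RoundKind::Rint) {
    // Ambient mode is already round-to-nearest: no mode write is needed.
    out = std::nearbyint(in);
    return out;
  }

  // Any other mode must be set explicitly and restored afterwards, which
  // costs two mode writes per call.
  const int saved = std::fegetround();
  std::fesetround(kind == RoundKind::Ceil ? FE_UPWARD : FE_DOWNWARD);
  out = std::nearbyint(in);
  std::fesetround(saved);  // restore so later code sees the expected mode
  return out;
}
```

In this sketch the rint path performs no mode writes at all, whereas the ceil and floor paths pay for a set and a restore. That is the cost the comment above is pointing at.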

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21164#discussion_r1774893250


More information about the hotspot-compiler-dev mailing list