RFR: 8320998: RISC-V: C2 RoundDoubleModeV
Fei Yang
fyang at openjdk.org
Wed Sep 25 09:32:35 UTC 2024
On Tue, 24 Sep 2024 16:01:47 GMT, Dingli Zhang <dzhang at openjdk.org> wrote:
> Hi all,
>
> This patch adds RoundDoubleModeV intrinsics for riscv64. The vector implementation is similar to the scalar version. Please take a look and review. Thanks a lot!
>
> As with https://github.com/openjdk/jdk/pull/17745, current testing shows that it brings a performance gain when vlenb >= 32 (as on k1) but a regression when vlenb == 16 (as on k230). So I only enable the intrinsic when vlenb >= 32.
>
> Please compare the data below, thanks!
>
> ## Test
> ### Test on k1
> test/hotspot/jtreg/compiler/c2/cr6340864/TestDoubleVect.java
> test/hotspot/jtreg/compiler/floatingpoint/TestRound.java
> test/jdk/java/lang/Math/RoundTests.java
> test/micro/org/openjdk/bench/java/math/FpRoundingBenchmark.java
> ### Test on qemu (RVV 1.0 enabled)
> test/jdk/jdk/incubator/vector/*
>
> ## Performance - with Intrinsic
> ### on k1
> Benchmark on k1 (+intrinsic)
>
> Benchmark                       (TESTSIZE)   Mode  Cnt   Score   Error  Units
> FpRoundingBenchmark.test_ceil         2048  thrpt   15  58.973 ± 0.460  ops/ms
> FpRoundingBenchmark.test_floor        2048  thrpt   15  59.873 ± 0.054  ops/ms
> FpRoundingBenchmark.test_rint         2048  thrpt   15  59.460 ± 0.552  ops/ms
>
>
> Benchmark on k1 (-intrinsic)
>
> Benchmark                       (TESTSIZE)   Mode  Cnt   Score   Error  Units
> FpRoundingBenchmark.test_ceil         2048  thrpt   15  51.335 ± 0.068  ops/ms
> FpRoundingBenchmark.test_floor        2048  thrpt   15  51.356 ± 0.062  ops/ms
> FpRoundingBenchmark.test_rint         2048  thrpt   15  51.387 ± 0.059  ops/ms
>
> ### on k230
> Benchmark on k230 (+intrinsic, enable intrinsic even when vlenb == 16)
>
> Benchmark                       (TESTSIZE)   Mode  Cnt   Score   Error  Units
> FpRoundingBenchmark.test_ceil         2048  thrpt   15  28.263 ± 0.837  ops/ms
> FpRoundingBenchmark.test_floor        2048  thrpt   15  28.130 ± 0.789  ops/ms
> FpRoundingBenchmark.test_rint         2048  thrpt   15  28.241 ± 0.868  ops/ms
>
>
> Benchmark on k230 (-intrinsic, enable intrinsic even when vlenb == 16)
>
> Benchmark                       (TESTSIZE)   Mode  Cnt   Score   Error  Units
> FpRoundingBenchmark.test_ceil         2048  thrpt   15  44.391 ± 1.249  ops/ms
> FpRoundingBenchmark.test_floor        2048  thrpt   15  44.423 ± 1.187  ops/ms
> FpRoundingBenchmark.test_rint         2048  thrpt   15  44.441 ± 1.218  ops/ms
>
>
> ## Performance - without Intrinsic
> ### on k1, intrinsic disabled due to -Us...
src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 3119:
> 3117: break;
> 3118: case RoundDoubleModeNode::rmode_rint:
> 3119: csrwi(CSR_FRM, C2_MacroAssembler::rne);
No need to set the CSR here, as `FRM` has already been set to Round to Nearest mode on entry to Java code. See [JDK-8330094](https://bugs.openjdk.org/browse/JDK-8330094) for more details. And if you set `FRM` to some other rounding mode, you will need to restore it to Round to Nearest mode after processing. The problem is that modifying a CSR on RISC-V is very costly. I guess that's one of the reasons why the JMH results do not show a clearer gain.
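
To illustrate the save/set/restore pattern described above, here is a minimal sketch in plain C++ using `<cfenv>`. It is only an analogy, not HotSpot code: `round_in_mode` is a hypothetical helper, and the `<cfenv>` dynamic rounding mode stands in for `FRM`. It skips the mode switch when the requested mode is already active (the `rint` case under Round to Nearest) and otherwise restores the caller's mode afterwards.

```cpp
#include <cassert>
#include <cfenv>
#include <cmath>

// Hypothetical helper (an analogy, not HotSpot code): round `value` to an
// integral double under the requested <cfenv> rounding mode, then restore
// whatever mode the caller had.
double round_in_mode(double value, int mode) {
    const int saved = std::fegetround();  // plays the role of the current FRM value

    // volatile keeps the compiler from constant-folding the rounding or
    // moving it across the mode switches below.
    volatile double in = value;

    if (saved != mode) {
        std::fesetround(mode);            // analogous to the costly CSR write
    }

    volatile double out = std::nearbyint(in);  // rounds using the current dynamic mode

    if (saved != mode) {
        std::fesetround(saved);           // restore the caller's mode
    }

    assert(std::fegetround() == saved);   // the caller's mode must be unchanged
    return out;
}
```

With the default mode being round-to-nearest, `round_in_mode(2.5, FE_TONEAREST)` needs no mode switch at all, while `round_in_mode(2.1, FE_UPWARD)` pays for two switches. That is the overhead the comment above attributes to the ceil and floor paths.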
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/21164#discussion_r1774893250