Withdrawn: 8320998: RISC-V: C2 RoundDoubleModeV

duke duke at openjdk.org
Sat Feb 1 06:25:47 UTC 2025


On Tue, 24 Sep 2024 16:01:47 GMT, Dingli Zhang <dzhang at openjdk.org> wrote:

> Hi all,
> 
> This patch adds RoundDoubleModeV intrinsics for riscv64. The vector implementation is similar to the scalar version. Please take a look and review. Thanks a lot!
> 
> As with https://github.com/openjdk/jdk/pull/17745, current testing shows a performance gain when vlenb >= 32 (as on k1) but a regression when vlenb == 16 (as on k230), so I only enable the intrinsic when vlenb >= 32.
> 
> Please compare the data below, thanks!
> 
> ## Test
> ### Test on k1
> test/hotspot/jtreg/compiler/c2/cr6340864/TestDoubleVect.java
> test/hotspot/jtreg/compiler/floatingpoint/TestRound.java
> test/jdk/java/lang/Math/RoundTests.java
> test/micro/org/openjdk/bench/java/math/FpRoundingBenchmark.java
> ### Test on qemu (RVV 1.0 enabled)
> test/jdk/jdk/incubator/vector/*
> 
> ## Performance - with Intrinsic
> ### on k1
> Benchmark on k1 (+intrinsic)
> 
> Benchmark                              (TESTSIZE)   Mode  Cnt   Score   Error   Units
> FpRoundingBenchmark.test_ceil                2048  thrpt   15  58.973 ± 0.460  ops/ms
> FpRoundingBenchmark.test_floor               2048  thrpt   15  59.873 ± 0.054  ops/ms
> FpRoundingBenchmark.test_rint                2048  thrpt   15  59.460 ± 0.552  ops/ms
> 
> 
> Benchmark on k1 (-intrinsic)
> 
> Benchmark                              (TESTSIZE)   Mode  Cnt   Score   Error   Units
> FpRoundingBenchmark.test_ceil                2048  thrpt   15  51.335 ± 0.068  ops/ms
> FpRoundingBenchmark.test_floor               2048  thrpt   15  51.356 ± 0.062  ops/ms
> FpRoundingBenchmark.test_rint                2048  thrpt   15  51.387 ± 0.059  ops/ms
> 
> ### on k230
> Benchmark on k230 (+intrinsic, enable intrinsic even when vlenb == 16)
> 
> Benchmark                              (TESTSIZE)   Mode  Cnt   Score   Error   Units
> FpRoundingBenchmark.test_ceil                2048  thrpt   15  28.263 ± 0.837  ops/ms
> FpRoundingBenchmark.test_floor               2048  thrpt   15  28.130 ± 0.789  ops/ms
> FpRoundingBenchmark.test_rint                2048  thrpt   15  28.241 ± 0.868  ops/ms
> 
> 
> Benchmark on k230 (-intrinsic, enable intrinsic even when vlenb == 16)
> 
> Benchmark                              (TESTSIZE)   Mode  Cnt   Score   Error   Units
> FpRoundingBenchmark.test_ceil                2048  thrpt   15  44.391 ± 1.249  ops/ms
> FpRoundingBenchmark.test_floor               2048  thrpt   15  44.423 ± 1.187  ops/ms
> FpRoundingBenchmark.test_rint                2048  thrpt   15  44.441 ± 1.218  ops/ms
> 
> 
> ## Performance - without Intrinsic
> ### on k1, intrinsic disabled due to -Us...
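
To make the comparison in the quoted tables easier to read, here is a minimal sketch that turns the quoted test_ceil scores into relative throughput changes and restates the vlenb >= 32 rule from the description. The class and method names (`RoundIntrinsicGain`, `relativeChange`, `enableIntrinsic`) are hypothetical and are not part of the patch or of HotSpot; the actual check lives in the JVM's RISC-V backend and is not reproduced here.

```java
// Illustrative only: names are hypothetical and not taken from the patch.
class RoundIntrinsicGain {

    // Relative throughput change of the +intrinsic run versus the -intrinsic run.
    // Positive means the intrinsic is faster, negative means it is slower.
    static double relativeChange(double withIntrinsic, double withoutIntrinsic) {
        return withIntrinsic / withoutIntrinsic - 1.0;
    }

    // Restates the rule from the description: enable only when vlenb >= 32.
    // This is an assumption about the shape of the check, not the real code.
    static boolean enableIntrinsic(int vlenb) {
        return vlenb >= 32;
    }

    public static void main(String[] args) {
        // test_ceil scores (ops/ms) copied from the quoted benchmark tables.
        double k1 = relativeChange(58.973, 51.335);
        double k230 = relativeChange(28.263, 44.391);

        System.out.printf("k1   test_ceil: %+.1f%%, enabled: %b%n",
                100 * k1, enableIntrinsic(32));
        System.out.printf("k230 test_ceil: %+.1f%%, enabled: %b%n",
                100 * k230, enableIntrinsic(16));
    }
}
```

With the quoted numbers this works out to roughly a 14.9% gain for test_ceil on k1 and roughly a 36.3% regression on k230, which matches the decision to leave the intrinsic disabled when vlenb == 16.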

This pull request has been closed without being integrated.

-------------

PR: https://git.openjdk.org/jdk/pull/21164


More information about the hotspot-compiler-dev mailing list