RFR: 8318158: RISC-V: implement roundD/roundF intrinsics [v6]

Vladimir Kempik vkempik at openjdk.org
Fri Dec 8 10:02:18 UTC 2023


On Tue, 5 Dec 2023 03:33:52 GMT, Fei Yang <fyang at openjdk.org> wrote:

>> Olga Mikhaltsova has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Replaced tmp with t0
>
> Unfortunately, I witnessed a performance regression on the SiFive Unmatched board.
> 
> Before:
> 
> FpRoundingBenchmark.test_ceil                2048  thrpt   15  39.243 ± 0.506  ops/ms
> FpRoundingBenchmark.test_floor               2048  thrpt   15  39.448 ± 0.076  ops/ms
> FpRoundingBenchmark.test_rint                2048  thrpt   15  39.411 ± 0.134  ops/ms
> FpRoundingBenchmark.test_round_double        2048  thrpt   15  31.329 ± 0.085  ops/ms
> FpRoundingBenchmark.test_round_float         2048  thrpt   15  31.328 ± 0.031  ops/ms
> 
> After:
> 
> FpRoundingBenchmark.test_ceil                2048  thrpt   15  39.375 ± 0.125  ops/ms
> FpRoundingBenchmark.test_floor               2048  thrpt   15  39.407 ± 0.076  ops/ms
> FpRoundingBenchmark.test_rint                2048  thrpt   15  39.387 ± 0.235  ops/ms
> FpRoundingBenchmark.test_round_double        2048  thrpt   15  23.940 ± 0.025  ops/ms
> FpRoundingBenchmark.test_round_float         2048  thrpt   15  30.629 ± 0.021  ops/ms

> > @RealFYang I've reproduced this performance regression on VisionFive 2. The results are as follows:
> > ```
> >  Before
> > Benchmark                              (TESTSIZE)   Mode  Cnt   Score   Error   Units
> > FpRoundingBenchmark.test_round_double        2048  thrpt   15  39.335 ± 0.122  ops/ms
> > FpRoundingBenchmark.test_round_float         2048  thrpt   15  39.327 ± 0.138  ops/ms
> > After
> > FpRoundingBenchmark.test_round_double        2048  thrpt   15  30.004 ± 0.192  ops/ms
> > FpRoundingBenchmark.test_round_float         2048  thrpt   15  38.489 ± 0.120  ops/ms
> > ```
> 
> That is, to say the very least, surprising. I'd use -prof:perfasm to find out why.

-prof:perfasm doesn't work as is on U74 boards (HiFive Unmatched and VisionFive 2); there are some problems with the cycles event.
This works: -prof perfasm:"events=cpu-clock"
but cpu-clock is a software event, so it is less precise than a hardware counter. Still better than nothing.
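
For scale, the size of the regression in the scores quoted above can be worked out with a small sketch like the one below. The class and method names are hypothetical, chosen only for illustration; the numbers are copied from the quoted FpRoundingBenchmark results.

```java
import java.util.Locale;

public class RegressionCheck {
    // Relative change of a throughput score; a negative value means a slowdown.
    public static double relativeChange(double before, double after) {
        return (after - before) / before;
    }

    public static void main(String[] args) {
        // Scores in ops/ms, taken from the quoted before/after tables.
        String[] names = {
            "Unmatched    test_round_double",
            "Unmatched    test_round_float",
            "VisionFive 2 test_round_double",
            "VisionFive 2 test_round_float",
        };
        double[][] scores = {
            {31.329, 23.940},
            {31.328, 30.629},
            {39.335, 30.004},
            {39.327, 38.489},
        };
        for (int i = 0; i < names.length; i++) {
            double pct = 100.0 * relativeChange(scores[i][0], scores[i][1]);
            System.out.printf(Locale.ROOT, "%-32s %+.1f%%%n", names[i], pct);
        }
    }
}
```

On both boards test_round_double loses roughly 24% of its throughput, while test_round_float loses about 2%.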

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16382#issuecomment-1846888863

