RFR: 8317720: RISC-V: Implement Adler32 intrinsic [v11]

Fei Yang fyang at openjdk.org
Mon Jul 1 07:10:28 UTC 2024


On Thu, 27 Jun 2024 12:46:11 GMT, ArsenyBochkarev <duke at openjdk.org> wrote:

>> Hi! Sorry for such a late reply. Here are the results on Banana-Pi. 
>> Numbers for different `vsetvli` depending on the `MaxVectorSize` (= 32 by default on Banana-Pi):
>> 
>> Benchmark                              (count)   Mode  Cnt     Score     Error   Units
>> Adler32.TestAdler32.testAdler32Update       64  thrpt   25  7564.243 ± 103.398  ops/ms
>> Adler32.TestAdler32.testAdler32Update      128  thrpt   25  5691.398 ±  72.201  ops/ms
>> Adler32.TestAdler32.testAdler32Update      256  thrpt   25  3831.476 ±  32.985  ops/ms
>> Adler32.TestAdler32.testAdler32Update      512  thrpt   25  2325.933 ±  10.056  ops/ms
>> Adler32.TestAdler32.testAdler32Update     1024  thrpt   25  1311.345 ±   3.337  ops/ms
>> Adler32.TestAdler32.testAdler32Update     2048  thrpt   25   697.412 ±   0.906  ops/ms
>> Adler32.TestAdler32.testAdler32Update     5012  thrpt   25   294.467 ±   0.131  ops/ms
>> Adler32.TestAdler32.testAdler32Update     8192  thrpt   25   182.197 ±   0.075  ops/ms
>> Adler32.TestAdler32.testAdler32Update    16384  thrpt   25    92.100 ±   0.075  ops/ms
>> Adler32.TestAdler32.testAdler32Update    32768  thrpt   25    45.242 ±   0.006  ops/ms
>> Adler32.TestAdler32.testAdler32Update    65536  thrpt   25    22.878 ±   0.008  ops/ms
>> 
>> 
>> and for stable `vsetvli` with `LMULx2` for any `MaxVectorSize`:
>> 
>> Benchmark                              (count)   Mode  Cnt     Score     Error   Units
>> Adler32.TestAdler32.testAdler32Update       64  thrpt   25  7439.400 ± 128.059  ops/ms
>> Adler32.TestAdler32.testAdler32Update      128  thrpt   25  5662.910 ±  73.459  ops/ms
>> Adler32.TestAdler32.testAdler32Update      256  thrpt   25  3831.516 ±  33.050  ops/ms
>> Adler32.TestAdler32.testAdler32Update      512  thrpt   25  2345.353 ±   9.946  ops/ms
>> Adler32.TestAdler32.testAdler32Update     1024  thrpt   25  1305.337 ±   3.294  ops/ms
>> Adler32.TestAdler32.testAdler32Update     2048  thrpt   25   695.615 ±   0.775  ops/ms
>> Adler32.TestAdler32.testAdler32Update     5012  thrpt   25   294.332 ±   0.132  ops/ms
>> Adler32.TestAdler32.testAdler32Update     8192  thrpt   25   182.178 ±   0.086  ops/ms
>> Adler32.TestAdler32.testAdler32Update    16384  thrpt   25    92.125 ±   0.032  ops/ms
>> Adler32.TestAdler32.testAdler32Update    32768  thrpt   25    45.238 ±   0.030  ops/ms
>> Adler32.TestAdler32.testAdler32Update    65536  thrpt   25    22.880 ±   0.008  ops/ms
>
> So I suppose it is safe to stay on conditionless `vsetvli`?
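The `MaxVectorSize` dependence discussed above comes down to how many elements a single `vsetvli` can grant. Under RVV 1.0, VLMAX = VLEN × LMUL / SEW, and when the requested AVL (here `count`) is at or below VLMAX the granted `vl` equals the AVL. The following is a minimal sketch of that arithmetic. It assumes that `MaxVectorSize` equals VLEN in bytes, and it simplifies the spec's `vl` rule. `VlmaxSketch`, `vlmax` and `vl` are hypothetical names and are not HotSpot code.

```java
// Hypothetical helper (not HotSpot code): models how many SEW-bit elements
// one `vsetvli ..., e<SEW>, m<LMUL>` can cover under RVV 1.0.
final class VlmaxSketch {

    // VLMAX = VLEN * LMUL / SEW.
    // Assumption: MaxVectorSize (in bytes) == VLEN / 8.
    static int vlmax(int maxVectorSizeBytes, int sewBits, int lmul) {
        int vlenBits = maxVectorSizeBytes * 8;
        return vlenBits * lmul / sewBits;
    }

    // Simplified vl choice: vl = min(AVL, VLMAX).
    // RVV 1.0 lets an implementation pick any vl in [ceil(AVL/2), VLMAX]
    // when VLMAX < AVL < 2*VLMAX; this sketch always picks VLMAX there.
    static int vl(int avl, int vlmax) {
        return Math.min(avl, vlmax);
    }

    public static void main(String[] args) {
        int[] sizes = {16, 32};  // MaxVectorSize values mentioned in the thread
        int[] lmuls = {1, 2};    // m1 and m2 (LMULx2)
        for (int size : sizes) {
            for (int lmul : lmuls) {
                int max = vlmax(size, 16, lmul);  // e16, as in both variants
                System.out.printf("MaxVectorSize=%d, e16, m%d -> VLMAX=%d, vl(count=64)=%d%n",
                        size, lmul, max, vl(64, max));
            }
        }
    }
}
```

Under this model, `vlmax(32, 16, 1)` and `vlmax(16, 16, 2)` both yield 16. This suggests that an `LMULx2` setting on a 16-byte machine covers the same element count per `vsetvli` as `m1` on a 32-byte one.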

Hmm... The JMH data on my Banana-Pi (running `OS: Armbian (24.5.0-trunk) riscv64 / 6.1.15-legacy-k1 kernel` from the vendor) differs from yours for these two approaches: the second one is faster at every size, by roughly 25-31% for counts of 1024 and above.

1. `__ vsetvli(temp0, count, Assembler::e16, LMUL)` for `MaxVectorSize > 16`

Benchmark                      (count)   Mode  Cnt     Score     Error   Units
TestAdler32.testAdler32Update       64  thrpt   25  7364.310 ± 103.256  ops/ms
TestAdler32.testAdler32Update      128  thrpt   25  5651.856 ±  71.376  ops/ms
TestAdler32.testAdler32Update      256  thrpt   25  3803.744 ±  18.320  ops/ms
TestAdler32.testAdler32Update      512  thrpt   25  2324.802 ±   8.553  ops/ms
TestAdler32.testAdler32Update     1024  thrpt   25  1306.936 ±   4.027  ops/ms
TestAdler32.testAdler32Update     2048  thrpt   25   696.408 ±   1.925  ops/ms
TestAdler32.testAdler32Update     5012  thrpt   25   294.126 ±   0.644  ops/ms
TestAdler32.testAdler32Update     8192  thrpt   25   182.142 ±   0.048  ops/ms
TestAdler32.testAdler32Update    16384  thrpt   25    92.007 ±   0.253  ops/ms
TestAdler32.testAdler32Update    32768  thrpt   25    45.190 ±   0.158  ops/ms
TestAdler32.testAdler32Update    65536  thrpt   25    22.873 ±   0.014  ops/ms


2. `__ vsetvli(temp0, count, Assembler::e16, LMULx2)` for `MaxVectorSize == 16`

Benchmark                      (count)   Mode  Cnt     Score    Error   Units
TestAdler32.testAdler32Update       64  thrpt   25  7683.759 ± 92.761  ops/ms
TestAdler32.testAdler32Update      128  thrpt   25  6226.934 ± 71.597  ops/ms
TestAdler32.testAdler32Update      256  thrpt   25  4409.333 ± 27.677  ops/ms
TestAdler32.testAdler32Update      512  thrpt   25  2813.737 ±  5.570  ops/ms
TestAdler32.testAdler32Update     1024  thrpt   25  1635.601 ±  1.207  ops/ms
TestAdler32.testAdler32Update     2048  thrpt   25   891.615 ±  0.999  ops/ms
TestAdler32.testAdler32Update     5012  thrpt   25   382.035 ±  0.255  ops/ms
TestAdler32.testAdler32Update     8192  thrpt   25   237.338 ±  0.282  ops/ms
TestAdler32.testAdler32Update    16384  thrpt   25   120.517 ±  0.044  ops/ms
TestAdler32.testAdler32Update    32768  thrpt   25    58.957 ±  0.059  ops/ms
TestAdler32.testAdler32Update    65536  thrpt   25    29.881 ±  0.009  ops/ms
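To make the gap between the two approaches concrete, the ratio of approach 2 to approach 1 can be computed from the scores above. The following is a small sketch that uses only the posted numbers and ignores the error columns. `Adler32Speedup` and `ratios` are illustrative names.

```java
// Throughput ratio of approach 2 (LMULx2) over approach 1 (LMUL), computed
// from the JMH scores posted above (ops/ms). Error bars are ignored.
final class Adler32Speedup {

    static final int[] COUNTS = {64, 128, 256, 512, 1024, 2048, 5012, 8192, 16384, 32768, 65536};

    // Approach 1: vsetvli(temp0, count, Assembler::e16, LMUL)
    static final double[] APPROACH1 = {7364.310, 5651.856, 3803.744, 2324.802, 1306.936,
            696.408, 294.126, 182.142, 92.007, 45.190, 22.873};

    // Approach 2: vsetvli(temp0, count, Assembler::e16, LMULx2)
    static final double[] APPROACH2 = {7683.759, 6226.934, 4409.333, 2813.737, 1635.601,
            891.615, 382.035, 237.338, 120.517, 58.957, 29.881};

    // Returns APPROACH2[i] / APPROACH1[i] for every benchmarked count.
    static double[] ratios() {
        double[] r = new double[COUNTS.length];
        for (int i = 0; i < COUNTS.length; i++) {
            r[i] = APPROACH2[i] / APPROACH1[i];
        }
        return r;
    }

    public static void main(String[] args) {
        double[] r = ratios();
        for (int i = 0; i < COUNTS.length; i++) {
            System.out.printf("count=%5d  speedup=%.2fx%n", COUNTS[i], r[i]);
        }
    }
}
```

The ratio grows with input size, from about 1.04x at 64 bytes to about 1.31x at 65536 bytes. The small inputs are presumably dominated by fixed per-call overhead rather than by the vector loop.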

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18382#discussion_r1660568178


More information about the hotspot-compiler-dev mailing list