RFR: 8317720: RISC-V: Implement Adler32 intrinsic [v11]
Fei Yang
fyang at openjdk.org
Mon Jul 1 07:10:28 UTC 2024
On Thu, 27 Jun 2024 12:46:11 GMT, ArsenyBochkarev <duke at openjdk.org> wrote:
>> Hi! Sorry for such a late reply. Here are the results on Banana-Pi.
>> Numbers for different `vsetvli` depending on the `MaxVectorSize` (= 32 by default on Banana-Pi):
>>
>> Benchmark                              (count)   Mode  Cnt     Score     Error  Units
>> Adler32.TestAdler32.testAdler32Update       64  thrpt   25  7564.243 ± 103.398  ops/ms
>> Adler32.TestAdler32.testAdler32Update      128  thrpt   25  5691.398 ±  72.201  ops/ms
>> Adler32.TestAdler32.testAdler32Update      256  thrpt   25  3831.476 ±  32.985  ops/ms
>> Adler32.TestAdler32.testAdler32Update      512  thrpt   25  2325.933 ±  10.056  ops/ms
>> Adler32.TestAdler32.testAdler32Update     1024  thrpt   25  1311.345 ±   3.337  ops/ms
>> Adler32.TestAdler32.testAdler32Update     2048  thrpt   25   697.412 ±   0.906  ops/ms
>> Adler32.TestAdler32.testAdler32Update     5012  thrpt   25   294.467 ±   0.131  ops/ms
>> Adler32.TestAdler32.testAdler32Update     8192  thrpt   25   182.197 ±   0.075  ops/ms
>> Adler32.TestAdler32.testAdler32Update    16384  thrpt   25    92.100 ±   0.075  ops/ms
>> Adler32.TestAdler32.testAdler32Update    32768  thrpt   25    45.242 ±   0.006  ops/ms
>> Adler32.TestAdler32.testAdler32Update    65536  thrpt   25    22.878 ±   0.008  ops/ms
>>
>>
>> and for stable `vsetvli` with `LMULx2` for any `MaxVectorSize`:
>>
>> Benchmark                              (count)   Mode  Cnt     Score     Error  Units
>> Adler32.TestAdler32.testAdler32Update       64  thrpt   25  7439.400 ± 128.059  ops/ms
>> Adler32.TestAdler32.testAdler32Update      128  thrpt   25  5662.910 ±  73.459  ops/ms
>> Adler32.TestAdler32.testAdler32Update      256  thrpt   25  3831.516 ±  33.050  ops/ms
>> Adler32.TestAdler32.testAdler32Update      512  thrpt   25  2345.353 ±   9.946  ops/ms
>> Adler32.TestAdler32.testAdler32Update     1024  thrpt   25  1305.337 ±   3.294  ops/ms
>> Adler32.TestAdler32.testAdler32Update     2048  thrpt   25   695.615 ±   0.775  ops/ms
>> Adler32.TestAdler32.testAdler32Update     5012  thrpt   25   294.332 ±   0.132  ops/ms
>> Adler32.TestAdler32.testAdler32Update     8192  thrpt   25   182.178 ±   0.086  ops/ms
>> Adler32.TestAdler32.testAdler32Update    16384  thrpt   25    92.125 ±   0.032  ops/ms
>> Adler32.TestAdler32.testAdler32Update    32768  thrpt   25    45.238 ±   0.030  ops/ms
>> Adler32.TestAdler32.testAdler32Update    65536  thrpt   25    22.880 ±   0.008  ops/ms
>
> So I suppose it is safe to stay on conditionless `vsetvli`?
Hmm ... The JMH data from my Banana-Pi (running the vendor's `OS: Armbian (24.5.0-trunk) riscv64 / 6.1.15-legacy-k1 kernel`) differs from yours for these two approaches:
1. `__ vsetvli(temp0, count, Assembler::e16, LMUL)` for `MaxVectorSize > 16`
Benchmark                      (count)   Mode  Cnt     Score     Error  Units
TestAdler32.testAdler32Update       64  thrpt   25  7364.310 ± 103.256  ops/ms
TestAdler32.testAdler32Update      128  thrpt   25  5651.856 ±  71.376  ops/ms
TestAdler32.testAdler32Update      256  thrpt   25  3803.744 ±  18.320  ops/ms
TestAdler32.testAdler32Update      512  thrpt   25  2324.802 ±   8.553  ops/ms
TestAdler32.testAdler32Update     1024  thrpt   25  1306.936 ±   4.027  ops/ms
TestAdler32.testAdler32Update     2048  thrpt   25   696.408 ±   1.925  ops/ms
TestAdler32.testAdler32Update     5012  thrpt   25   294.126 ±   0.644  ops/ms
TestAdler32.testAdler32Update     8192  thrpt   25   182.142 ±   0.048  ops/ms
TestAdler32.testAdler32Update    16384  thrpt   25    92.007 ±   0.253  ops/ms
TestAdler32.testAdler32Update    32768  thrpt   25    45.190 ±   0.158  ops/ms
TestAdler32.testAdler32Update    65536  thrpt   25    22.873 ±   0.014  ops/ms
2. `__ vsetvli(temp0, count, Assembler::e16, LMULx2)` for `MaxVectorSize == 16`
Benchmark                      (count)   Mode  Cnt     Score     Error  Units
TestAdler32.testAdler32Update       64  thrpt   25  7683.759 ±  92.761  ops/ms
TestAdler32.testAdler32Update      128  thrpt   25  6226.934 ±  71.597  ops/ms
TestAdler32.testAdler32Update      256  thrpt   25  4409.333 ±  27.677  ops/ms
TestAdler32.testAdler32Update      512  thrpt   25  2813.737 ±   5.570  ops/ms
TestAdler32.testAdler32Update     1024  thrpt   25  1635.601 ±   1.207  ops/ms
TestAdler32.testAdler32Update     2048  thrpt   25   891.615 ±   0.999  ops/ms
TestAdler32.testAdler32Update     5012  thrpt   25   382.035 ±   0.255  ops/ms
TestAdler32.testAdler32Update     8192  thrpt   25   237.338 ±   0.282  ops/ms
TestAdler32.testAdler32Update    16384  thrpt   25   120.517 ±   0.044  ops/ms
TestAdler32.testAdler32Update    32768  thrpt   25    58.957 ±   0.059  ops/ms
TestAdler32.testAdler32Update    65536  thrpt   25    29.881 ±   0.009  ops/ms
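
To make the gap between approaches 1 and 2 easier to read, here is a small Python sketch that divides the approach 2 score by the approach 1 score for each `count`. The scores are copied from the two tables above; the names `approach_1`, `approach_2` and `speedups` are made up for this illustration and are not part of the PR or of JMH. The ratio ignores the reported error margins, so treat it as a rough indication only. On these numbers approach 2 comes out roughly 4% ahead at count 64 and about 30% ahead at the largest counts.

```python
# Scores in ops/ms, copied from the two tables above, keyed by the
# benchmark's (count) parameter. Variable names are illustrative only.
approach_1 = {
    64: 7364.310, 128: 5651.856, 256: 3803.744, 512: 2324.802,
    1024: 1306.936, 2048: 696.408, 5012: 294.126, 8192: 182.142,
    16384: 92.007, 32768: 45.190, 65536: 22.873,
}
approach_2 = {
    64: 7683.759, 128: 6226.934, 256: 4409.333, 512: 2813.737,
    1024: 1635.601, 2048: 891.615, 5012: 382.035, 8192: 237.338,
    16384: 120.517, 32768: 58.957, 65536: 29.881,
}


def speedups(base, other):
    """Return {count: other / base} for every count present in both tables."""
    return {n: other[n] / base[n] for n in base if n in other}


if __name__ == "__main__":
    # Print one ratio per count; values above 1.0 mean approach 2 is faster.
    for n, ratio in speedups(approach_1, approach_2).items():
        print(f"{n:>6}  {ratio:.3f}x")
```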
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/18382#discussion_r1660568178