RFR: 8317720: RISC-V: Implement Adler32 intrinsic [v11]
ArsenyBochkarev
duke at openjdk.org
Thu Jun 27 12:49:18 UTC 2024
On Wed, 26 Jun 2024 18:27:57 GMT, ArsenyBochkarev <duke at openjdk.org> wrote:
>> Yeah, it would be simpler if you used it for any MaxVectorSize. But to my surprise, it has a negative performance impact when I test on my BananaPi-F3 (MaxVectorSize = 32). I am not sure whether this is an issue across boards. Maybe you can give it a try on your hardware to see?
>
> Hi! Sorry for such a late reply. Here are the results on Banana-Pi.
> Numbers for a `vsetvli` chosen according to `MaxVectorSize` (= 32 by default on Banana-Pi):
>
> Benchmark                              (count)   Mode  Cnt     Score     Error   Units
> Adler32.TestAdler32.testAdler32Update       64  thrpt   25  7564.243 ± 103.398  ops/ms
> Adler32.TestAdler32.testAdler32Update      128  thrpt   25  5691.398 ±  72.201  ops/ms
> Adler32.TestAdler32.testAdler32Update      256  thrpt   25  3831.476 ±  32.985  ops/ms
> Adler32.TestAdler32.testAdler32Update      512  thrpt   25  2325.933 ±  10.056  ops/ms
> Adler32.TestAdler32.testAdler32Update     1024  thrpt   25  1311.345 ±   3.337  ops/ms
> Adler32.TestAdler32.testAdler32Update     2048  thrpt   25   697.412 ±   0.906  ops/ms
> Adler32.TestAdler32.testAdler32Update     5012  thrpt   25   294.467 ±   0.131  ops/ms
> Adler32.TestAdler32.testAdler32Update     8192  thrpt   25   182.197 ±   0.075  ops/ms
> Adler32.TestAdler32.testAdler32Update    16384  thrpt   25    92.100 ±   0.075  ops/ms
> Adler32.TestAdler32.testAdler32Update    32768  thrpt   25    45.242 ±   0.006  ops/ms
> Adler32.TestAdler32.testAdler32Update    65536  thrpt   25    22.878 ±   0.008  ops/ms
>
>
> and for stable `vsetvli` with `LMULx2` for any `MaxVectorSize`:
>
> Benchmark                              (count)   Mode  Cnt     Score     Error   Units
> Adler32.TestAdler32.testAdler32Update       64  thrpt   25  7439.400 ± 128.059  ops/ms
> Adler32.TestAdler32.testAdler32Update      128  thrpt   25  5662.910 ±  73.459  ops/ms
> Adler32.TestAdler32.testAdler32Update      256  thrpt   25  3831.516 ±  33.050  ops/ms
> Adler32.TestAdler32.testAdler32Update      512  thrpt   25  2345.353 ±   9.946  ops/ms
> Adler32.TestAdler32.testAdler32Update     1024  thrpt   25  1305.337 ±   3.294  ops/ms
> Adler32.TestAdler32.testAdler32Update     2048  thrpt   25   695.615 ±   0.775  ops/ms
> Adler32.TestAdler32.testAdler32Update     5012  thrpt   25   294.332 ±   0.132  ops/ms
> Adler32.TestAdler32.testAdler32Update     8192  thrpt   25   182.178 ±   0.086  ops/ms
> Adler32.TestAdler32.testAdler32Update    16384  thrpt   25    92.125 ±   0.032  ops/ms
> Adler32.TestAdler32.testAdler32Update    32768  thrpt   25    45.238 ±   0.030  ops/ms
> Adler32.TestAdler32.testAdler32Update    65536  thrpt   25    22.880 ±   0.008  ops/ms
So I suppose it is safe to stay with the unconditional `vsetvli`?
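
For anyone who wants to double-check the comparison, here is a rough sketch that puts the two runs side by side. It only uses the scores and errors from the two tables above; the class and method names are made up for illustration, and treating the JMH error as a simple plus/minus interval is a simplifying assumption, not a proper significance test.

```java
public class VsetvliComparison {
    // Input sizes and results (ops/ms) copied from the two tables above.
    static final int[] COUNTS =
        {64, 128, 256, 512, 1024, 2048, 5012, 8192, 16384, 32768, 65536};
    // Run 1: vsetvli chosen according to MaxVectorSize.
    static final double[] COND_SCORE =
        {7564.243, 5691.398, 3831.476, 2325.933, 1311.345, 697.412,
         294.467, 182.197, 92.100, 45.242, 22.878};
    static final double[] COND_ERR =
        {103.398, 72.201, 32.985, 10.056, 3.337, 0.906,
         0.131, 0.075, 0.075, 0.006, 0.008};
    // Run 2: the same vsetvli (LMULx2) for any MaxVectorSize.
    static final double[] FIXED_SCORE =
        {7439.400, 5662.910, 3831.516, 2345.353, 1305.337, 695.615,
         294.332, 182.178, 92.125, 45.238, 22.880};
    static final double[] FIXED_ERR =
        {128.059, 73.459, 33.050, 9.946, 3.294, 0.775,
         0.132, 0.086, 0.032, 0.030, 0.008};

    // True when [a - ea, a + ea] and [b - eb, b + eb] share at least one point.
    static boolean intervalsOverlap(double a, double ea, double b, double eb) {
        return Math.abs(a - b) <= ea + eb;
    }

    // Signed difference of 'other' relative to 'base', in percent.
    static double relativeDiffPercent(double base, double other) {
        return (other - base) / base * 100.0;
    }

    // Largest absolute relative difference between the two runs, in percent.
    static double maxAbsRelativeDiffPercent() {
        double max = 0.0;
        for (int i = 0; i < COUNTS.length; i++) {
            double d = Math.abs(relativeDiffPercent(COND_SCORE[i], FIXED_SCORE[i]));
            max = Math.max(max, d);
        }
        return max;
    }

    public static void main(String[] args) {
        for (int i = 0; i < COUNTS.length; i++) {
            System.out.printf("%6d  diff %+.3f%%  overlap=%b%n",
                COUNTS[i],
                relativeDiffPercent(COND_SCORE[i], FIXED_SCORE[i]),
                intervalsOverlap(COND_SCORE[i], COND_ERR[i],
                                 FIXED_SCORE[i], FIXED_ERR[i]));
        }
        System.out.printf("max |diff| = %.3f%%%n", maxAbsRelativeDiffPercent());
    }
}
```

Running `main` prints one line per input size with the relative difference between the two runs and whether their error intervals overlap, followed by the largest absolute difference.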
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/18382#discussion_r1657073016