RFR: 8317720: RISC-V: Implement Adler32 intrinsic [v11]

ArsenyBochkarev duke at openjdk.org
Wed Jun 26 18:30:12 UTC 2024


On Wed, 19 Jun 2024 03:37:20 GMT, Fei Yang <fyang at openjdk.org> wrote:

>> Sounds good, thanks! As far as I can see it is safe to do it for any `MaxVectorSize`, so the code will be even simpler. Please correct me if I'm wrong.
>
> Yeah, it will be simpler if you use it for any MaxVectorSize. But to my surprise, it has a negative performance impact when I test on my BananaPi-F3 (MaxVectorSize = 32). I am not sure whether this happens across boards. Maybe you can give it a try on your hardware and see?

Hi! Sorry for the late reply. Here are the results on the BananaPi-F3.
Numbers for the variant that selects a different `vsetvli` depending on `MaxVectorSize` (32 bytes by default on this board):

Benchmark                              (count)   Mode  Cnt     Score     Error   Units
Adler32.TestAdler32.testAdler32Update       64  thrpt   25  7564.243 ± 103.398  ops/ms
Adler32.TestAdler32.testAdler32Update      128  thrpt   25  5691.398 ±  72.201  ops/ms
Adler32.TestAdler32.testAdler32Update      256  thrpt   25  3831.476 ±  32.985  ops/ms
Adler32.TestAdler32.testAdler32Update      512  thrpt   25  2325.933 ±  10.056  ops/ms
Adler32.TestAdler32.testAdler32Update     1024  thrpt   25  1311.345 ±   3.337  ops/ms
Adler32.TestAdler32.testAdler32Update     2048  thrpt   25   697.412 ±   0.906  ops/ms
Adler32.TestAdler32.testAdler32Update     5012  thrpt   25   294.467 ±   0.131  ops/ms
Adler32.TestAdler32.testAdler32Update     8192  thrpt   25   182.197 ±   0.075  ops/ms
Adler32.TestAdler32.testAdler32Update    16384  thrpt   25    92.100 ±   0.075  ops/ms
Adler32.TestAdler32.testAdler32Update    32768  thrpt   25    45.242 ±   0.006  ops/ms
Adler32.TestAdler32.testAdler32Update    65536  thrpt   25    22.878 ±   0.008  ops/ms


and for a single fixed `vsetvli` with `LMULx2` (LMUL = 2), regardless of `MaxVectorSize`:

Benchmark                              (count)   Mode  Cnt     Score     Error   Units
Adler32.TestAdler32.testAdler32Update       64  thrpt   25  7439.400 ± 128.059  ops/ms
Adler32.TestAdler32.testAdler32Update      128  thrpt   25  5662.910 ±  73.459  ops/ms
Adler32.TestAdler32.testAdler32Update      256  thrpt   25  3831.516 ±  33.050  ops/ms
Adler32.TestAdler32.testAdler32Update      512  thrpt   25  2345.353 ±   9.946  ops/ms
Adler32.TestAdler32.testAdler32Update     1024  thrpt   25  1305.337 ±   3.294  ops/ms
Adler32.TestAdler32.testAdler32Update     2048  thrpt   25   695.615 ±   0.775  ops/ms
Adler32.TestAdler32.testAdler32Update     5012  thrpt   25   294.332 ±   0.132  ops/ms
Adler32.TestAdler32.testAdler32Update     8192  thrpt   25   182.178 ±   0.086  ops/ms
Adler32.TestAdler32.testAdler32Update    16384  thrpt   25    92.125 ±   0.032  ops/ms
Adler32.TestAdler32.testAdler32Update    32768  thrpt   25    45.238 ±   0.030  ops/ms
Adler32.TestAdler32.testAdler32Update    65536  thrpt   25    22.880 ±   0.008  ops/ms
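To make the comparison easier to read, here is a small Python helper (not part of the PR; the dictionary names `per_size` and `fixed_m2` are just my own labels for the two runs) that takes the scores and ± error values from the two tables above and reports, for each `count`, the relative change of the fixed-LMUL variant and whether the two error intervals overlap. Treating overlapping intervals as "within noise" is only a rough heuristic, not a proper statistical test.

```python
# Scores (ops/ms) and JMH +/- error from the two runs above, keyed by `count`.
# Variant 1: vsetvli chosen depending on MaxVectorSize.
per_size = {
    64: (7564.243, 103.398), 128: (5691.398, 72.201), 256: (3831.476, 32.985),
    512: (2325.933, 10.056), 1024: (1311.345, 3.337), 2048: (697.412, 0.906),
    5012: (294.467, 0.131), 8192: (182.197, 0.075), 16384: (92.100, 0.075),
    32768: (45.242, 0.006), 65536: (22.878, 0.008),
}
# Variant 2: one fixed vsetvli with LMUL = 2 for any MaxVectorSize.
fixed_m2 = {
    64: (7439.400, 128.059), 128: (5662.910, 73.459), 256: (3831.516, 33.050),
    512: (2345.353, 9.946), 1024: (1305.337, 3.294), 2048: (695.615, 0.775),
    5012: (294.332, 0.132), 8192: (182.178, 0.086), 16384: (92.125, 0.032),
    32768: (45.238, 0.030), 65536: (22.880, 0.008),
}


def compare(base, other):
    """Return (count, relative change in %, error intervals overlap?) per count."""
    rows = []
    for count in sorted(base):
        score_a, err_a = base[count]
        score_b, err_b = other[count]
        delta_pct = (score_b - score_a) / score_a * 100.0
        # [a - ea, a + ea] and [b - eb, b + eb] overlap iff |b - a| <= ea + eb.
        overlap = abs(score_b - score_a) <= err_a + err_b
        rows.append((count, delta_pct, overlap))
    return rows


if __name__ == "__main__":
    for count, delta, overlap in compare(per_size, fixed_m2):
        verdict = "within error" if overlap else "outside error"
        print(f"{count:>6}  {delta:+6.2f}%  {verdict}")
```

On these numbers every relative difference is below 2%, and only `count = 2048` falls (slightly) outside the combined error bars.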

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18382#discussion_r1655347904

