RFR: 8317720: RISC-V: Implement Adler32 intrinsic [v11]

ArsenyBochkarev duke at openjdk.org
Wed Jun 26 11:19:18 UTC 2024


On Wed, 19 Jun 2024 03:37:20 GMT, Fei Yang <fyang at openjdk.org> wrote:

>> Sounds good, thanks! As far as I can see, it is safe to do it for any `MaxVectorSize`, so the code will be even simpler. Please correct me if I'm wrong.
>
> Yeah, it will be simpler if you use it for any MaxVectorSize. But to my surprise, it has a negative performance impact when I test on my BananaPi-F3 (MaxVectorSize = 32). I am not sure whether it's an issue across boards. Maybe you can give it a try on your hardware to see?

Hi! Sorry for such a late reply. Here are the results on Kendryte K230 for both options.
Without the condition on `MaxVectorSize`:

Benchmark                              (count)   Mode  Cnt     Score   Error   Units
Adler32.TestAdler32.testAdler32Update       64  thrpt   25  7253.108 ± 7.362  ops/ms
Adler32.TestAdler32.testAdler32Update      128  thrpt   25  5849.102 ± 9.263  ops/ms
Adler32.TestAdler32.testAdler32Update      256  thrpt   25  4220.237 ± 4.695  ops/ms
Adler32.TestAdler32.testAdler32Update      512  thrpt   25  2717.498 ± 3.531  ops/ms
Adler32.TestAdler32.testAdler32Update     1024  thrpt   25  1585.664 ± 2.455  ops/ms
Adler32.TestAdler32.testAdler32Update     2048  thrpt   25   865.849 ± 0.717  ops/ms
Adler32.TestAdler32.testAdler32Update     5012  thrpt   25   372.534 ± 0.293  ops/ms
Adler32.TestAdler32.testAdler32Update     8192  thrpt   25   231.532 ± 0.306  ops/ms
Adler32.TestAdler32.testAdler32Update    16384  thrpt   25   117.180 ± 0.157  ops/ms
Adler32.TestAdler32.testAdler32Update    32768  thrpt   25    55.013 ± 0.152  ops/ms
Adler32.TestAdler32.testAdler32Update    65536  thrpt   25    25.604 ± 0.126  ops/ms


and with a different `vsetvli` depending on `MaxVectorSize`:

Benchmark                              (count)   Mode  Cnt     Score   Error   Units
Adler32.TestAdler32.testAdler32Update       64  thrpt   25  7239.485 ± 8.705  ops/ms
Adler32.TestAdler32.testAdler32Update      128  thrpt   25  5836.018 ± 9.489  ops/ms
Adler32.TestAdler32.testAdler32Update      256  thrpt   25  4212.986 ± 4.596  ops/ms
Adler32.TestAdler32.testAdler32Update      512  thrpt   25  2712.742 ± 3.114  ops/ms
Adler32.TestAdler32.testAdler32Update     1024  thrpt   25  1583.161 ± 2.374  ops/ms
Adler32.TestAdler32.testAdler32Update     2048  thrpt   25   864.321 ± 0.870  ops/ms
Adler32.TestAdler32.testAdler32Update     5012  thrpt   25   371.964 ± 0.463  ops/ms
Adler32.TestAdler32.testAdler32Update     8192  thrpt   25   231.092 ± 0.328  ops/ms
Adler32.TestAdler32.testAdler32Update    16384  thrpt   25   116.995 ± 0.189  ops/ms
Adler32.TestAdler32.testAdler32Update    32768  thrpt   25    54.923 ± 0.075  ops/ms
Adler32.TestAdler32.testAdler32Update    65536  thrpt   25    24.864 ± 0.618  ops/ms
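
To make the two tables easier to compare, here is a minimal sketch that prints the relative score difference per `count`. The class and method names are hypothetical and not part of the PR; the scores are copied from the tables above. A negative value means the `MaxVectorSize`-dependent variant scored lower. By this arithmetic every row differs by less than 0.25%, except `count = 65536` at roughly 2.9%, which is also the row with the largest reported error in the second run.

```java
public class Adler32ScoreDelta {
    // Benchmark "count" values, in the same row order as the tables above.
    static final int[] COUNTS =
        {64, 128, 256, 512, 1024, 2048, 5012, 8192, 16384, 32768, 65536};

    // Scores (ops/ms) without the condition on MaxVectorSize.
    static final double[] UNCONDITIONAL =
        {7253.108, 5849.102, 4220.237, 2717.498, 1585.664, 865.849,
         372.534, 231.532, 117.180, 55.013, 25.604};

    // Scores (ops/ms) with a different vsetvli depending on MaxVectorSize.
    static final double[] SIZE_DEPENDENT =
        {7239.485, 5836.018, 4212.986, 2712.742, 1583.161, 864.321,
         371.964, 231.092, 116.995, 54.923, 24.864};

    // Relative difference of `other` against `baseline`, in percent.
    static double relativeDeltaPercent(double baseline, double other) {
        return (other - baseline) / baseline * 100.0;
    }

    public static void main(String[] args) {
        for (int i = 0; i < COUNTS.length; i++) {
            double delta = relativeDeltaPercent(UNCONDITIONAL[i], SIZE_DEPENDENT[i]);
            System.out.printf("%6d  %+.3f%%%n", COUNTS[i], delta);
        }
    }
}
```

This only compares the reported mean scores; it does not take the error column into account.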

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18382#discussion_r1654623817


More information about the hotspot-compiler-dev mailing list