RFR: 8317720: RISC-V: Implement Adler32 intrinsic [v7]

Gui Cao gcao at openjdk.org
Fri May 17 13:37:06 UTC 2024


On Thu, 16 May 2024 13:03:24 GMT, ArsenyBochkarev <duke at openjdk.org> wrote:

>> Hello everyone! Please review this ~non-vectorized~ implementation of `_updateBytesAdler32` intrinsic. Reference implementation for AArch64 can be found [here](https://github.com/openjdk/jdk9/blob/master/hotspot/src/cpu/aarch64/vm/stubGenerator_aarch64.cpp#L3281).
>> 
>> ### Correctness checks
>> 
>> Test `test/hotspot/jtreg/compiler/intrinsics/zip/TestAdler32.java` is ok. All tier1 also passed.
>> 
>> ### Performance results on T-Head board
>> 
>> Enabled intrinsic:
>> 
>> | Benchmark                          |    (count) |  Mode |  Cnt  |   Score  |  Error |  Units |
>> | ------------------------------------- | ----------- | ------ | --------- | ------ | --------- | ---------- |
>> | Adler32.TestAdler32.testAdler32Update |      64 | thrpt  | 25 | 5522.693 | 23.387 | ops/ms |
>> | Adler32.TestAdler32.testAdler32Update |     128 | thrpt |  25 | 3430.761 |  9.210 | ops/ms |
>> | Adler32.TestAdler32.testAdler32Update |     256 | thrpt |  25 | 1962.888 |  5.323 | ops/ms |
>> | Adler32.TestAdler32.testAdler32Update |     512 | thrpt  | 25 | 1050.938 |  0.144 | ops/ms |
>> | Adler32.TestAdler32.testAdler32Update |    1024 | thrpt  | 25 |  549.227 |  0.375 | ops/ms |
>> | Adler32.TestAdler32.testAdler32Update |    2048 | thrpt  | 25 |  280.829 |  0.170 | ops/ms |
>> | Adler32.TestAdler32.testAdler32Update |    5012 | thrpt  | 25 |  116.333 |  0.057 | ops/ms |
>> | Adler32.TestAdler32.testAdler32Update |    8192 | thrpt  | 25  |  71.392 |  0.060 | ops/ms |
>> | Adler32.TestAdler32.testAdler32Update |   16384 | thrpt |  25  |  35.784 |  0.019 | ops/ms |
>> | Adler32.TestAdler32.testAdler32Update |   32768 | thrpt |  25  |  17.924 |  0.010 | ops/ms |
>> | Adler32.TestAdler32.testAdler32Update  |  65536 | thrpt |  25  |   8.940 |  0.003 | ops/ms |
>> 
>> Disabled intrinsic:
>> 
>> | Benchmark                          |    (count) |  Mode |  Cnt  |   Score  |  Error |  Units |
>> | ------------------------------------- | ----------- | ------ | --------- | ------ | --------- | ---------- |
>> |Adler32.TestAdler32.testAdler32Update|64|thrpt|25|655.633|5.845|ops/ms|
>> |Adler32.TestAdler32.testAdler32Update|128|thrpt|25|587.418|10.062|ops/ms|
>> |Adler32.TestAdler32.testAdler32Update|256|thrpt|25|546.675|11.598|ops/ms|
>> |Adler32.TestAdler32.testAdler32Update|512|thrpt|25|432.328|11.517|ops/ms|
>> |Adler32.TestAdler32.testAdler32Update|1024|thrpt|25|311.771|4.238|ops/ms|
>> |Adler32.TestAdler32.testAdler32Update|2048|thrpt|25|202.648|2.486|ops/ms|
>> |Adler32.TestAdler32.testAdler32Update|5012|thrpt|...
>
> ArsenyBochkarev has updated the pull request incrementally with eight additional commits since the last revision:
> 
>  - Prettify L_nmax loop
>  - Add comments in functions
>  - Add explanation comment for L_nmax_loop
>  - Fix L_nmax_loop for big lengths
>  - Fix L_by16 loop step
>  - Prettify intrinsic
>  - Use LMUL=4 for most of the calculations
>  - Use LMUL to load multiple data in one step
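
For readers following the `L_nmax` commits above, here is a minimal scalar sketch of the deferred-modulo idea as zlib defines it. It assumes the intrinsic's `L_nmax` loop refers to zlib's `NMAX` bound, which the thread itself does not spell out. The class and method names are hypothetical and this is not the code from the PR.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Adler32;

// Hypothetical reference sketch; not taken from the PR.
public class Adler32Sketch {
    static final int BASE = 65521; // largest prime below 2^16 (Adler-32 modulus)
    static final int NMAX = 5552;  // zlib's chunk size between modulo reductions

    // Updates an Adler-32 value over buf[off .. off+len), reducing once per chunk.
    static int adler32(int adler, byte[] buf, int off, int len) {
        // long accumulators sidestep signed 32-bit overflow in this sketch
        long s1 = adler & 0xffff;
        long s2 = (adler >>> 16) & 0xffff;
        while (len > 0) {
            int chunk = Math.min(len, NMAX);
            len -= chunk;
            for (int end = off + chunk; off < end; off++) {
                s1 += buf[off] & 0xff; // running byte sum
                s2 += s1;              // running sum of the byte sums
            }
            s1 %= BASE;                // deferred reduction, once per chunk
            s2 %= BASE;
        }
        return (int) ((s2 << 16) | s1);
    }

    public static void main(String[] args) {
        byte[] data = new byte[65536];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) (i * 31 + 7);
        }
        Adler32 ref = new Adler32();
        ref.update(data, 0, data.length);
        long mine = adler32(1, data, 0, data.length) & 0xffffffffL;
        System.out.println("match=" + (mine == ref.getValue()));
    }
}
```

Comparing against `java.util.zip.Adler32` on the same buffer is a quick sanity check for any such sketch; the jtreg test named in the PR description remains the real correctness gate.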

Hi, I ran the JMH test on a Banana Pi BPI-F3 board (which has RVV 1.0):

With this PR applied and UseRVV disabled:

Benchmark                      (count)   Mode  Cnt     Score    Error   Units
TestAdler32.testAdler32Update       64  thrpt   25  1845.347 ± 17.961  ops/ms
TestAdler32.testAdler32Update      128  thrpt   25  1622.564 ± 18.082  ops/ms
TestAdler32.testAdler32Update      256  thrpt   25  1337.308 ± 12.022  ops/ms
TestAdler32.testAdler32Update      512  thrpt   25   971.847 ± 12.653  ops/ms
TestAdler32.testAdler32Update     1024  thrpt   25   637.476 ±  1.802  ops/ms
TestAdler32.testAdler32Update     2048  thrpt   25   377.564 ±  2.189  ops/ms
TestAdler32.testAdler32Update     5012  thrpt   25   172.410 ±  0.295  ops/ms
TestAdler32.testAdler32Update     8192  thrpt   25   109.077 ±  0.213  ops/ms
TestAdler32.testAdler32Update    16384  thrpt   25    55.915 ±  0.062  ops/ms
TestAdler32.testAdler32Update    32768  thrpt   25    26.653 ±  0.131  ops/ms
TestAdler32.testAdler32Update    65536  thrpt   25    13.421 ±  0.015  ops/ms
Finished running test 'micro:java.util.TestAdler32'


With this PR applied and UseRVV enabled:

Benchmark                      (count)   Mode  Cnt     Score     Error   Units
TestAdler32.testAdler32Update       64  thrpt   25  7822.238 ± 175.797  ops/ms
TestAdler32.testAdler32Update      128  thrpt   25  5054.415 ±   0.133  ops/ms
TestAdler32.testAdler32Update      256  thrpt   25  2859.404 ±  83.301  ops/ms
TestAdler32.testAdler32Update      512  thrpt   25  1546.183 ±  47.910  ops/ms
TestAdler32.testAdler32Update     1024  thrpt   25   808.569 ±  25.122  ops/ms
TestAdler32.testAdler32Update     2048  thrpt   25   413.848 ±  12.909  ops/ms
TestAdler32.testAdler32Update     5012  thrpt   25   168.005 ±   5.176  ops/ms
TestAdler32.testAdler32Update     8192  thrpt   25   159.197 ±   3.353  ops/ms
TestAdler32.testAdler32Update    16384  thrpt   25    78.056 ±   1.514  ops/ms
TestAdler32.testAdler32Update    32768  thrpt   25    45.334 ±   0.756  ops/ms
TestAdler32.testAdler32Update    65536  thrpt   25    24.339 ±   0.342  ops/ms
Finished running test 'micro:java.util.TestAdler32'
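
To make the two BPI-F3 runs easier to compare, the following sketch divides each UseRVV-enabled score by the matching UseRVV-disabled score. The scores are copied from the two result listings in this message; the class and method names are hypothetical, and the error columns are ignored here.

```java
// Hypothetical helper; the numbers are the Score columns reported above.
public class RvvSpeedup {
    static final int[] COUNTS = {
        64, 128, 256, 512, 1024, 2048, 5012, 8192, 16384, 32768, 65536
    };
    // ops/ms with UseRVV disabled
    static final double[] RVV_OFF = {
        1845.347, 1622.564, 1337.308, 971.847, 637.476, 377.564,
        172.410, 109.077, 55.915, 26.653, 13.421
    };
    // ops/ms with UseRVV enabled
    static final double[] RVV_ON = {
        7822.238, 5054.415, 2859.404, 1546.183, 808.569, 413.848,
        168.005, 159.197, 78.056, 45.334, 24.339
    };

    // Ratio of enabled to disabled throughput per count; above 1.0 means RVV is faster.
    static double[] ratios() {
        double[] r = new double[COUNTS.length];
        for (int i = 0; i < r.length; i++) {
            r[i] = RVV_ON[i] / RVV_OFF[i];
        }
        return r;
    }

    public static void main(String[] args) {
        double[] r = ratios();
        for (int i = 0; i < r.length; i++) {
            System.out.printf("%6d  %.2fx%n", COUNTS[i], r[i]);
        }
    }
}
```

The 5012 row is the only one where the UseRVV-enabled score is lower (168.005 vs 172.410 ops/ms), and that gap is smaller than the ± 5.176 error reported for the enabled run.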

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18382#issuecomment-2117621767


More information about the hotspot-compiler-dev mailing list