RFR: 8317720: RISC-V: Implement Adler32 intrinsic [v17]
Fei Yang
fyang at openjdk.org
Tue Jul 16 06:54:59 UTC 2024
On Mon, 15 Jul 2024 20:24:11 GMT, ArsenyBochkarev <duke at openjdk.org> wrote:
>> Hello everyone! Please review this ~non-vectorized~ implementation of `_updateBytesAdler32` intrinsic. Reference implementation for AArch64 can be found [here](https://github.com/openjdk/jdk9/blob/master/hotspot/src/cpu/aarch64/vm/stubGenerator_aarch64.cpp#L3281).
>>
>> ### Correctness checks
>>
>> Test `test/hotspot/jtreg/compiler/intrinsics/zip/TestAdler32.java` is ok. All tier1 also passed.
>>
>> ### Performance results on T-Head board
>>
>> Enabled intrinsic:
>>
>> | Benchmark | (count) | Mode | Cnt | Score | Error | Units |
>> | ------------------------------------- | ----------- | ------ | --------- | ------ | --------- | ---------- |
>> | Adler32.TestAdler32.testAdler32Update | 64 | thrpt | 25 | 5522.693 | 23.387 | ops/ms |
>> | Adler32.TestAdler32.testAdler32Update | 128 | thrpt | 25 | 3430.761 | 9.210 | ops/ms |
>> | Adler32.TestAdler32.testAdler32Update | 256 | thrpt | 25 | 1962.888 | 5.323 | ops/ms |
>> | Adler32.TestAdler32.testAdler32Update | 512 | thrpt | 25 | 1050.938 | 0.144 | ops/ms |
>> | Adler32.TestAdler32.testAdler32Update | 1024 | thrpt | 25 | 549.227 | 0.375 | ops/ms |
>> | Adler32.TestAdler32.testAdler32Update | 2048 | thrpt | 25 | 280.829 | 0.170 | ops/ms |
>> | Adler32.TestAdler32.testAdler32Update | 5012 | thrpt | 25 | 116.333 | 0.057 | ops/ms |
>> | Adler32.TestAdler32.testAdler32Update | 8192 | thrpt | 25 | 71.392 | 0.060 | ops/ms |
>> | Adler32.TestAdler32.testAdler32Update | 16384 | thrpt | 25 | 35.784 | 0.019 | ops/ms |
>> | Adler32.TestAdler32.testAdler32Update | 32768 | thrpt | 25 | 17.924 | 0.010 | ops/ms |
>> | Adler32.TestAdler32.testAdler32Update | 65536 | thrpt | 25 | 8.940 | 0.003 | ops/ms |
>>
>> Disabled intrinsic:
>>
>> | Benchmark | (count) | Mode | Cnt | Score | Error | Units |
>> | ------------------------------------- | ----------- | ------ | --------- | ------ | --------- | ---------- |
>> |Adler32.TestAdler32.testAdler32Update|64|thrpt|25|655.633|5.845|ops/ms|
>> |Adler32.TestAdler32.testAdler32Update|128|thrpt|25|587.418|10.062|ops/ms|
>> |Adler32.TestAdler32.testAdler32Update|256|thrpt|25|546.675|11.598|ops/ms|
>> |Adler32.TestAdler32.testAdler32Update|512|thrpt|25|432.328|11.517|ops/ms|
>> |Adler32.TestAdler32.testAdler32Update|1024|thrpt|25|311.771|4.238|ops/ms|
>> |Adler32.TestAdler32.testAdler32Update|2048|thrpt|25|202.648|2.486|ops/ms|
>> |Adler32.TestAdler32.testAdler32Update|5012|thrpt|...
>
> ArsenyBochkarev has updated the pull request incrementally with two additional commits since the last revision:
>
> - Use t2 instead of count as scratch register
> - Remove blt after by16_loop_unroll
src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 5048:
> 5046: VectorRegister vzero, VectorRegister vbytes, VectorRegister vs1acc, VectorRegister vs2acc,
> 5047: Register temp0, Register temp1, Register temp2, Register temp3,
> 5048: VectorRegister vtemp1, VectorRegister vtemp2, int step, Assembler::LMUL LMUL) {
Better to use small ‘lmul’ and `lmulx2` for `LMUL` and `LMULx2` respectively to be consistent in naming style.
src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 5277:
> 5275: __ sub(len, len, step_64);
> 5276: // By now the count should still be 64
> 5277: __ bge(len, count, L_by16_loop_unroll);
Code comment and `count` here needs update as well when you change to `t2`.
src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 5290:
> 5288: __ bltz(len, L_do_mod);
> 5289:
> 5290: __ bind(L_by1_loop);
The loop body of `L_by1_loop` and `L_simple_by1_loop` looks the same except for the branch at the end. Could we eliminate `L_simple_by1_loop` and jump to `L_by1_loop` instead? Seems the only need is to substract `len` by one before the jump.
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/18382#discussion_r1678848582
PR Review Comment: https://git.openjdk.org/jdk/pull/18382#discussion_r1678841827
PR Review Comment: https://git.openjdk.org/jdk/pull/18382#discussion_r1678845662
More information about the hotspot-compiler-dev
mailing list