RFR: 8339738: RISC-V: Vectorize crc32 intrinsic [v6]
Fei Yang
fyang at openjdk.org
Wed Sep 11 14:33:08 UTC 2024
On Wed, 11 Sep 2024 08:59:23 GMT, Hamlin Li <mli at openjdk.org> wrote:
>> Hi,
>> Can you help to review this patch?
>> Thanks.
>>
>> This improvement is based on java.base/share/native/libzip/zlib/zcrc32.c. I modified the code that depends on N (setting N to 16), regenerated the required tables, and then vectorized the code (the original implementation in zcrc32.c is scalar only).
>>
>> ## Test
>> test/hotspot/jtreg/compiler/intrinsics/zip/TestCRC32.java,
>> test/jdk/java/util/zip/TestCRC32.java
>>
>> ## Performance
>>
>> ### on bananapi
>>
>> with patch
>>
>> Benchmark | (count) | Mode | Cnt | Score | Error | Units
>> -- | -- | -- | -- | -- | -- | --
>> TestCRC32.testCRC32Update | 64 | avgt | 10 | 222.297 | 0.106 | ns/op
>> TestCRC32.testCRC32Update | 128 | avgt | 10 | 365.144 | 0.196 | ns/op
>> TestCRC32.testCRC32Update | 256 | avgt | 10 | 687.14 | 0.235 | ns/op
>> TestCRC32.testCRC32Update | 512 | avgt | 10 | 1043.833 | 0.083 | ns/op
>> TestCRC32.testCRC32Update | 2048 | avgt | 10 | 3299.928 | 1.361 | ns/op
>> TestCRC32.testCRC32Update | 16384 | avgt | 10 | 24384.502 | 25.298 | ns/op
>> TestCRC32.testCRC32Update | 65536 | avgt | 10 | 103200.458 | 8.297 | ns/op
>>
>> without patch
>>
>> Benchmark | (count) | Mode | Cnt | Score | Error | Units
>> -- | -- | -- | -- | -- | -- | --
>> TestCRC32.testCRC32Update | 64 | avgt | 10 | 220.878 | 0.02 | ns/op
>> TestCRC32.testCRC32Update | 128 | avgt | 10 | 364.173 | 0.032 | ns/op
>> TestCRC32.testCRC32Update | 256 | avgt | 10 | 685.815 | 0.055 | ns/op
>> TestCRC32.testCRC32Update | 512 | avgt | 10 | 1329.049 | 0.084 | ns/op
>> TestCRC32.testCRC32Update | 2048 | avgt | 10 | 5189.302 | 0.666 | ns/op
>> TestCRC32.testCRC32Update | 16384 | avgt | 10 | 41250.873 | 23.882 | ns/op
>> TestCRC32.testCRC32Update | 65536 | avgt | 10 | 171664.002 | 15.011 | ns/op
> ...
>
> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:
>
> remove redundant jump
src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 1583:
> 1581: const int64_t tmp_limit = MaxVectorSize >= 32 ? unroll_words*2 : unroll_words*4;
> 1582: sub(tmp1, len, tmp_limit);
> 1583: bge(tmp1, zr, L_vector_entry);
I don't quite understand this comparison of `len` with `tmp_limit`, since `len` has already been updated on entry with `subw(len, len, unroll_words)`. Should we compare against the original `len`, before the update (and remove the `addi(len, len, unroll_words)` in `vector_update_crc32` at the same time)?
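To make the concern concrete, here is a minimal sketch in plain C++ (not HotSpot code) of the two comparisons. The value 128 for `unroll_words` and the helper names `tmp_limit`, `takes_vector_path_biased` and `takes_vector_path_original` are assumptions for illustration only; only the three quoted assembler lines and the `subw` on entry come from the patch.

```cpp
#include <cstdint>

// Hypothetical value, chosen only for illustration; the real constant
// is defined elsewhere in the patch.
constexpr int64_t kUnrollWords = 128;

// Mirrors quoted line 1581; max_vector_size stands in for MaxVectorSize.
int64_t tmp_limit(int64_t max_vector_size) {
    return max_vector_size >= 32 ? kUnrollWords * 2 : kUnrollWords * 4;
}

// The branch as written: len was already reduced by unroll_words on entry.
bool takes_vector_path_biased(int64_t original_len, int64_t max_vector_size) {
    int64_t len = original_len - kUnrollWords;          // subw(len, len, unroll_words)
    int64_t tmp1 = len - tmp_limit(max_vector_size);    // sub(tmp1, len, tmp_limit)
    return tmp1 >= 0;                                   // bge(tmp1, zr, L_vector_entry)
}

// The alternative asked about: compare the original len with tmp_limit.
bool takes_vector_path_original(int64_t original_len, int64_t max_vector_size) {
    return original_len - tmp_limit(max_vector_size) >= 0;
}
```

Under these assumptions the branch as written is only taken when the original length is at least `tmp_limit + unroll_words`, so the effective threshold is one `unroll_words` higher than `tmp_limit` alone suggests.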
src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 1654:
> 1652:
> 1653: addiw(len, len, -4);
> 1654: bge(len, zr, L_by4_loop);
Suggestion: `bgez(len, L_by4_loop);`
src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 1656:
> 1654: bge(len, zr, L_by4_loop);
> 1655: addiw(len, len, 4);
> 1656: bgt(len, zr, L_by1_loop);
Suggestion: `bgtz(len, L_by1_loop);`
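For reference, the control flow around these two branches can be sketched in plain C++ as below. This is only a model under assumptions: it assumes `len` is reduced by 4 before the by-4 loop is entered and that the by-1 loop consumes one byte per iteration, neither of which is visible in the quoted lines. `TailCounts` and `tail_counts` are hypothetical names. The sketch also shows why `bgez`/`bgtz` are drop-in replacements: both branches compare against zero only.

```cpp
#include <cstdint>

// Number of iterations each tail loop would run for a given length.
struct TailCounts {
    int by4;
    int by1;
};

TailCounts tail_counts(int32_t len) {
    TailCounts c{0, 0};
    len -= 4;              // assumed bias applied before the by-4 loop
    while (len >= 0) {     // bgez(len, L_by4_loop)
        ++c.by4;           // one 4-byte step
        len -= 4;          // addiw(len, len, -4)
    }
    len += 4;              // addiw(len, len, 4): undo the bias
    while (len > 0) {      // bgtz(len, L_by1_loop)
        ++c.by1;           // one 1-byte step (assumed loop body)
        --len;
    }
    return c;
}
```

With this model, every input length is covered exactly once: `4 * by4 + by1` equals the original `len`.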
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/20910#discussion_r1754754549
PR Review Comment: https://git.openjdk.org/jdk/pull/20910#discussion_r1753728829
PR Review Comment: https://git.openjdk.org/jdk/pull/20910#discussion_r1753730125