RFR: 8343502: RISC-V: SIGBUS in updateBytesCRC32 after JDK-8339738
Fei Yang
fyang at openjdk.org
Tue Nov 5 01:34:32 UTC 2024
On Mon, 4 Nov 2024 13:16:23 GMT, Hamlin Li <mli at openjdk.org> wrote:
>> Hi, please review this small change.
>>
>> [JDK-8339738](https://bugs.openjdk.org/browse/JDK-8339738) adds vectorization for crc32 intrinsic, which does `vle32.v` from the input byte buffer and calculates the checksum. But the input byte buffer could be misaligned (not 4-byte aligned). This leads to SIGBUS on hardware platforms like `BPI-F3` board where misaligned vector loads are not supported. Similar issue is there for scalar version as well, which could mean performance issue on other hardwares. Patch fixes this issue by adding a small alignment processing on entry for both scalar and vector version.
>>
>> This also fixes another potential issue in tail handling, where we do a single `lwu` for both versions to load the remaining bytes from the input byte buffer and extract each byte from the loaded 32-bit value. Since we only have less than 4 bytes remaining, this `lwu` would exceed the buffer limit, which I think is not safe. Patch fixes this issue by doing three separate `lbu` instead.
>>
>> Testing on `BPI-F3` with RVV 1.0 extension:
>> - [x] test/hotspot/jtreg/compiler/intrinsics/zip/TestCRC32.java
>> - [x] test/jdk/java/util/zip/TestCRC32.java
>> - [x] SPECjbb2015
>>
>> No obvious impact witnessed on `micro:java.util.TestCRC32`:
>>
>>
>> Before:
>> Benchmark (count) Mode Cnt Score Error Units
>> TestCRC32.testCRC32Update 64 thrpt 12 4778.903 ± 1.793 ops/ms
>> TestCRC32.testCRC32Update 128 thrpt 12 2655.639 ± 2.958 ops/ms
>> TestCRC32.testCRC32Update 256 thrpt 12 1430.997 ± 0.970 ops/ms
>> TestCRC32.testCRC32Update 512 thrpt 12 965.785 ± 1.840 ops/ms
>> TestCRC32.testCRC32Update 2048 thrpt 12 303.056 ± 0.620 ops/ms
>> TestCRC32.testCRC32Update 16384 thrpt 12 40.601 ± 0.220 ops/ms
>> TestCRC32.testCRC32Update 65536 thrpt 12 9.575 ± 0.045 ops/ms
>> TestCRC32C.testCRC32CUpdate 64 thrpt 12 3923.698 ± 23.209 ops/ms
>> TestCRC32C.testCRC32CUpdate 128 thrpt 12 2514.616 ± 22.991 ops/ms
>> TestCRC32C.testCRC32CUpdate 256 thrpt 12 1477.223 ± 2.319 ops/ms
>> TestCRC32C.testCRC32CUpdate 512 thrpt 12 806.179 ± 1.961 ops/ms
>> TestCRC32C.testCRC32CUpdate 2048 thrpt 12 216.396 ± 0.172 ops/ms
>> TestCRC32C.testCRC32CUpdate 16384 thrpt 12 27.526 ± 0.049 ops/ms
>> TestCRC32C.testCRC32CUpdate 65536 thrpt 12 6.530 ± 0.041 ops/ms
>>
>> After:
>> Benchmark (count) Mode Cn...
>
> src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 1649:
>
>> 1647:
>> 1648: subw(len, len, 1);
>> 1649: lbu(tmp1, Address(buf));
>
> I guess after the alignment prelogue added in this pr, the previous tail implementation should be fine now.
> As in the case of both java heap and native heap, the tail of passed-in buf can not cross the boundary of pages.
Yeah, maybe. But I don't think it's good to do things that would exceed the buffer limit :-)
I prefer a safer approach which won't performance much because it's only tail processing here.
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/21863#discussion_r1828608490
More information about the hotspot-dev
mailing list