RFR: 8343502: RISC-V: SIGBUS in updateBytesCRC32 after JDK-8339738
Hamlin Li
mli at openjdk.org
Mon Nov 4 13:27:28 UTC 2024
On Mon, 4 Nov 2024 03:48:48 GMT, Fei Yang <fyang at openjdk.org> wrote:
> Hi, please review this small change.
>
> [JDK-8339738](https://bugs.openjdk.org/browse/JDK-8339738) adds vectorization for the crc32 intrinsic, which does `vle32.v` from the input byte buffer and calculates the checksum. But the input byte buffer could be misaligned (not 4-byte aligned). This leads to SIGBUS on hardware platforms like the `BPI-F3` board, where misaligned vector loads are not supported. The scalar version has a similar issue, which could mean a performance problem on other hardware. This patch fixes the issue by adding a small alignment step on entry for both the scalar and vector versions.
>
> This also fixes another potential issue in tail handling, where both versions do a single `lwu` to load the remaining bytes from the input byte buffer and then extract each byte from the loaded 32-bit value. Since fewer than 4 bytes remain at that point, this `lwu` would read past the end of the buffer, which I think is not safe. The patch fixes this by doing three separate `lbu` loads instead.
>
> Testing on `BPI-F3` with RVV 1.0 extension:
> - [x] test/hotspot/jtreg/compiler/intrinsics/zip/TestCRC32.java
> - [x] test/jdk/java/util/zip/TestCRC32.java
> - [x] SPECjbb2015
>
> No obvious impact witnessed on `micro:java.util.TestCRC32`:
>
>
> Before:
> Benchmark                    (count)   Mode  Cnt     Score    Error   Units
> TestCRC32.testCRC32Update         64  thrpt   12  4778.903 ±  1.793  ops/ms
> TestCRC32.testCRC32Update        128  thrpt   12  2655.639 ±  2.958  ops/ms
> TestCRC32.testCRC32Update        256  thrpt   12  1430.997 ±  0.970  ops/ms
> TestCRC32.testCRC32Update        512  thrpt   12   965.785 ±  1.840  ops/ms
> TestCRC32.testCRC32Update       2048  thrpt   12   303.056 ±  0.620  ops/ms
> TestCRC32.testCRC32Update      16384  thrpt   12    40.601 ±  0.220  ops/ms
> TestCRC32.testCRC32Update      65536  thrpt   12     9.575 ±  0.045  ops/ms
> TestCRC32C.testCRC32CUpdate       64  thrpt   12  3923.698 ± 23.209  ops/ms
> TestCRC32C.testCRC32CUpdate      128  thrpt   12  2514.616 ± 22.991  ops/ms
> TestCRC32C.testCRC32CUpdate      256  thrpt   12  1477.223 ±  2.319  ops/ms
> TestCRC32C.testCRC32CUpdate      512  thrpt   12   806.179 ±  1.961  ops/ms
> TestCRC32C.testCRC32CUpdate     2048  thrpt   12   216.396 ±  0.172  ops/ms
> TestCRC32C.testCRC32CUpdate    16384  thrpt   12    27.526 ±  0.049  ops/ms
> TestCRC32C.testCRC32CUpdate    65536  thrpt   12     6.530 ±  0.041  ops/ms
>
> After:
> Benchmark (count) Mode Cnt Score Error Units
> TestCRC32.testCRC32Update 64 ...
Does misaligned data cause similar issues with other intrinsics?
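
For reference, the approach described in the quoted summary (byte-wise processing until the position is 4-byte aligned, a word-at-a-time main loop, then a byte-wise tail that never reads past the end of the buffer) can be sketched in plain Java. This is only an illustration, not the RISC-V stub code from the patch: the class `AlignedCrc32Sketch` and its methods are hypothetical names, and the array offset stands in for the real address alignment, which ordinary Java code cannot observe.

```java
import java.nio.charset.StandardCharsets;

public class AlignedCrc32Sketch {
    // Standard reflected CRC-32 table (polynomial 0xEDB88320).
    private static final int[] TABLE = new int[256];
    static {
        for (int n = 0; n < 256; n++) {
            int c = n;
            for (int k = 0; k < 8; k++) {
                c = (c & 1) != 0 ? (c >>> 1) ^ 0xEDB88320 : c >>> 1;
            }
            TABLE[n] = c;
        }
    }

    // Fold a single byte into the running (inverted) CRC state.
    private static int updateByte(int c, byte b) {
        return TABLE[(c ^ b) & 0xFF] ^ (c >>> 8);
    }

    // Assumption: offset 0 of buf is treated as 4-byte aligned.
    static int update(int crc, byte[] buf, int off, int len) {
        int c = ~crc;
        int end = off + len;

        // Alignment prologue: single-byte loads until off is a multiple of 4.
        while (off < end && (off & 3) != 0) {
            c = updateByte(c, buf[off++]);
        }

        // Main loop: one aligned 32-bit little-endian load per iteration.
        while (end - off >= 4) {
            int w = (buf[off] & 0xFF)
                  | (buf[off + 1] & 0xFF) << 8
                  | (buf[off + 2] & 0xFF) << 16
                  | (buf[off + 3] & 0xFF) << 24;
            c ^= w;
            for (int i = 0; i < 4; i++) {
                c = TABLE[c & 0xFF] ^ (c >>> 8);
            }
            off += 4;
        }

        // Tail: at most 3 bytes, loaded one at a time so nothing past
        // the end of the buffer is ever read.
        while (off < end) {
            c = updateByte(c, buf[off++]);
        }
        return ~c;
    }

    public static void main(String[] args) {
        byte[] data = "123456789".getBytes(StandardCharsets.US_ASCII);
        // Standard CRC-32 check value for "123456789": prints cbf43926
        System.out.println(Integer.toHexString(update(0, data, 0, data.length)));
    }
}
```

The result is the same for any starting offset; only the split between prologue, main loop, and tail changes, which is the property the fix relies on.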
-------------
PR Comment: https://git.openjdk.org/jdk/pull/21863#issuecomment-2454710709
More information about the hotspot-dev mailing list