RFR: 8358032: Use crypto pmull for CRC32(C) on Ampere CPU and improve for short inputs [v4]
Emanuel Peter
epeter at openjdk.org
Mon Jun 23 05:54:32 UTC 2025
On Thu, 5 Jun 2025 07:15:34 GMT, Liming Liu <lliu at openjdk.org> wrote:
>> This PR is to enable the use of crypto pmull for CRC32/CRC32C intrinsics on Ampere CPU. There is an option UseCryptoPmullForCRC32 that can enable crypto pmull, but directly enabling it on Ampere CPU will cause the following problems.
>>
>> 1. There will be regressions (-14% ~ -8%) on Ampere1 when the length is 64. When <= 128, both kernel_crc32_using_crc32 and kernel_crc32_using_crypto_pmull use the loop labeled as CRC_by32_loop, but their implements are a little different, and the loop in kernel_crc32_using_crc32 is better at hiding latency on Ampere1. So this PR takes the loop in kernel_crc32_using_crc32 to kernel_crc32_using_crypto_pmull, and does the same for CRC32C intrinsic.
>>
>> 2. The intrinsics only use crypto pmull when the length is higher than 383, while the loop in kernel_crc32_common_fold_using_crypto_pmull looks able to handle 256, and if it handles 256 on Ampere1, the improvements can be as high as 110% compared with kernel_crc32_using_crc32/kernel_crc32c_using_crc32c. However, there are regressions (~-6%) on Neoverse V1 when the length is 256. So this PR introduces a new option named CryptoPmullForCRC32LowLimit. It defaults to 256 since the code could handle 256, while it is set to 384 for V1/V2 to keep the old behavior on these platforms.
>>
>> The performance regressions and improvements were measured with the following microbenchmarks:
>> org.openjdk.bench.java.util.TestCRC32.testCRC32Update
>> org.openjdk.bench.java.util.TestCRC32C.testCRC32CUpdate
>>
>> Ran the following JTReg tests on Ampere1 and did not find problems:
>> test/hotspot/jtreg/compiler/intrinsics/zip/TestCRC32.java
>> test/hotspot/jtreg/compiler/intrinsics/zip/TestCRC32C.java
>
> Liming Liu has updated the pull request incrementally with one additional commit since the last revision:
>
> Add the message for the assertions
Looks like this is progressing nicely here :)
I have 2 comments below. Once you addressed them, I'll run some internal testing, and then I can approve it :)
src/hotspot/cpu/aarch64/globals_aarch64.hpp line 92:
> 90: product(bool, UseCryptoPmullForCRC32, false, \
> 91: "Use Crypto PMULL instructions for CRC32 computation") \
> 92: product(uint, CryptoPmullForCRC32LowLimit, 256, DIAGNOSTIC, \
Can you please add a test that uses this flag, and sets it to some selected values, and maybe even a random value?
src/hotspot/cpu/aarch64/vm_version_aarch64.cpp line 126:
> 124: warning("CryptoPmullForCRC32LowLimit must be a multiple of 128");
> 125: CryptoPmullForCRC32LowLimit = align_down(CryptoPmullForCRC32LowLimit, 128);
> 126: }
Can you describe somewhere why it has to be a multiple of `128`? Imagine someone comes across this later, and wonders if that is just some strange implementation limitation or something more fundamental, or something very subtle.
-------------
Changes requested by epeter (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/25609#pullrequestreview-2948733798
PR Review Comment: https://git.openjdk.org/jdk/pull/25609#discussion_r2160764901
PR Review Comment: https://git.openjdk.org/jdk/pull/25609#discussion_r2160767137
More information about the hotspot-dev
mailing list