RFR: JDK-8300584: Accelerate AVX-512 CRC32C for small buffers
Vladimir Kozlov
kvn at openjdk.org
Wed Jan 18 22:36:28 UTC 2023
On Wed, 18 Jan 2023 19:21:58 GMT, Scott Gibbons <duke at openjdk.org> wrote:
> Use AVX2 code for CRC32C for small buffers in the AVX-512 path. Breakeven buffer size between the two algorithms is on the order of 384 bytes.
>
> **Performance numbers for the previous version:**
>
> Benchmark                    (count)   Mode  Cnt       Score      Error  Units
> TestCRC32C.testCRC32CUpdate       64  thrpt    4   66974.957 ±   8.872  ops/ms
> TestCRC32C.testCRC32CUpdate      128  thrpt    4   44224.810 ±  11.801  ops/ms
> TestCRC32C.testCRC32CUpdate      256  thrpt    4   63997.611 ± 173.577  ops/ms
> TestCRC32C.testCRC32CUpdate      512  thrpt    4   56068.683 ± 269.582  ops/ms
> TestCRC32C.testCRC32CUpdate     2048  thrpt    4   27048.098 ±  87.350  ops/ms
> TestCRC32C.testCRC32CUpdate    16384  thrpt    4    4066.736 ±  10.318  ops/ms
> TestCRC32C.testCRC32CUpdate    65536  thrpt    4    1040.754 ±   6.419  ops/ms
>
>
> **Performance numbers for this version:**
>
> Benchmark                    (count)   Mode  Cnt       Score      Error  Units
> TestCRC32C.testCRC32CUpdate       64  thrpt    3  161659.326 ±  74.974  ops/ms
> TestCRC32C.testCRC32CUpdate      128  thrpt    3   88456.935 ±  11.940  ops/ms
> TestCRC32C.testCRC32CUpdate      256  thrpt    3   73254.993 ±   5.004  ops/ms
> TestCRC32C.testCRC32CUpdate      512  thrpt    3   56508.541 ± 298.229  ops/ms
> TestCRC32C.testCRC32CUpdate     2048  thrpt    3   26701.995 ±  31.369  ops/ms
> TestCRC32C.testCRC32CUpdate    16384  thrpt    3    4110.819 ±   4.618  ops/ms
> TestCRC32C.testCRC32CUpdate    65536  thrpt    3    1045.821 ±   2.037  ops/ms
I mean this (`else` is removed so that `AVX2` code is always generated):
    __ enter(); // required for proper stackwalking of RuntimeStub frame
    Label L_small, L_continue;
    if (VM_Version::supports_sse4_1() && VM_Version::supports_avx512_vpclmulqdq() &&
        VM_Version::supports_avx512bw() &&
        VM_Version::supports_avx512vl()) {
      __ cmpl(len, 384);
      __ jcc(Assembler::belowEqual, L_small);
      __ lea(j, ExternalAddress(StubRoutines::x86::crc32c_table_avx512_addr()));
      __ kernel_crc32_avx512(crc, buf, len, j, l, k);
      __ jmp(L_continue);
      __ bind(L_small);
    }
#ifdef _WIN64
    __ push(y);
    __ push(z);
#endif
    __ crc32c_ipl_alg2_alt2(crc, buf, len,
                            a, j, k,
                            l, y, z,
                            c_farg0, c_farg1, c_farg2,
                            is_pclmulqdq_supported);
#ifdef _WIN64
    __ pop(z);
    __ pop(y);
#endif
    __ bind(L_continue);
    __ movl(rax, crc);
    __ vzeroupper();
    __ leave(); // required for proper stackwalking of RuntimeStub frame
    __ ret(0);
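
For readers less familiar with stub generation, the control flow the stub ends up with can be sketched in plain C++. This is only an illustration under stated assumptions, not HotSpot code: `choose_path`, `crc32c_update`, `crc32c_small_path`, `crc32c_large_path` and `kSmallLimit` are hypothetical names, and both paths are stood in for by the same bitwise reference CRC32C so that the example is self-contained. The only details taken from the stub above are the 384-byte limit and the `belowEqual` comparison, i.e. lengths up to and including 384 take the small-buffer path.

```cpp
#include <cstddef>
#include <cstdint>

// Assumption: mirrors the constant used in `cmpl(len, 384)` above.
constexpr size_t kSmallLimit = 384;

enum class Path { Small, Large };

// Bitwise reference CRC32C (Castagnoli, reflected polynomial 0x82F63B78).
// It operates on the raw CRC register; the caller does any initial/final inversion.
static uint32_t crc32c_bitwise(uint32_t crc, const uint8_t* buf, size_t len) {
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++) {
            // Shift right by one; XOR in the polynomial if the dropped bit was set.
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
        }
    }
    return crc;
}

// Hypothetical stand-ins for crc32c_ipl_alg2_alt2 and kernel_crc32_avx512.
static uint32_t crc32c_small_path(uint32_t crc, const uint8_t* buf, size_t len) {
    return crc32c_bitwise(crc, buf, len);
}
static uint32_t crc32c_large_path(uint32_t crc, const uint8_t* buf, size_t len) {
    return crc32c_bitwise(crc, buf, len);
}

// Same decision as the stub: the large path is used only when the CPU
// features are present and len is strictly above the limit.
static Path choose_path(bool has_avx512, size_t len) {
    return (has_avx512 && len > kSmallLimit) ? Path::Large : Path::Small;
}

static uint32_t crc32c_update(bool has_avx512, uint32_t crc,
                              const uint8_t* buf, size_t len) {
    if (choose_path(has_avx512, len) == Path::Large) {
        return crc32c_large_path(crc, buf, len);
    }
    return crc32c_small_path(crc, buf, len);
}
```

One difference to keep in mind: in the stub the feature test runs once at generation time, so only the length comparison remains at run time, whereas this sketch passes `has_avx512` on every call.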
-------------
PR: https://git.openjdk.org/jdk/pull/12079