RFR: JDK-8300584: Accelerate AVX-512 CRC32C for small buffers

Scott Gibbons duke at openjdk.org
Wed Jan 18 23:10:14 UTC 2023


On Wed, 18 Jan 2023 22:32:17 GMT, Vladimir Kozlov <kvn at openjdk.org> wrote:

>> Use the AVX2 code for CRC32C on small buffers in the AVX-512 path. The break-even buffer size between the two algorithms is on the order of 384 bytes.
>> 
>> **Performance numbers for the previous version:**
>> 
>> Benchmark                    (count)   Mode  Cnt      Score     Error   Units
>> TestCRC32C.testCRC32CUpdate       64  thrpt    4  66974.957 ±   8.872  ops/ms
>> TestCRC32C.testCRC32CUpdate      128  thrpt    4  44224.810 ±  11.801  ops/ms
>> TestCRC32C.testCRC32CUpdate      256  thrpt    4  63997.611 ± 173.577  ops/ms
>> TestCRC32C.testCRC32CUpdate      512  thrpt    4  56068.683 ± 269.582  ops/ms
>> TestCRC32C.testCRC32CUpdate     2048  thrpt    4  27048.098 ±  87.350  ops/ms
>> TestCRC32C.testCRC32CUpdate    16384  thrpt    4   4066.736 ±  10.318  ops/ms
>> TestCRC32C.testCRC32CUpdate    65536  thrpt    4   1040.754 ±   6.419  ops/ms
>> 
>> 
>> **Performance numbers for this version:**
>> 
>> Benchmark                    (count)   Mode  Cnt       Score     Error   Units
>> TestCRC32C.testCRC32CUpdate       64  thrpt    3  161659.326 ±  74.974  ops/ms
>> TestCRC32C.testCRC32CUpdate      128  thrpt    3   88456.935 ±  11.940  ops/ms
>> TestCRC32C.testCRC32CUpdate      256  thrpt    3   73254.993 ±   5.004  ops/ms
>> TestCRC32C.testCRC32CUpdate      512  thrpt    3   56508.541 ± 298.229  ops/ms
>> TestCRC32C.testCRC32CUpdate     2048  thrpt    3   26701.995 ±  31.369  ops/ms
>> TestCRC32C.testCRC32CUpdate    16384  thrpt    3    4110.819 ±   4.618  ops/ms
>> TestCRC32C.testCRC32CUpdate    65536  thrpt    3    1045.821 ±   2.037  ops/ms
>
> I mean this (`else` is removed so that `AVX2` code is always generated):
> 
>   __ enter(); // required for proper stackwalking of RuntimeStub frame
> 
>   Label L_small, L_continue;
> 
>   if (VM_Version::supports_sse4_1() && VM_Version::supports_avx512_vpclmulqdq() &&
>       VM_Version::supports_avx512bw() &&
>       VM_Version::supports_avx512vl()) {
>     __ cmpl(len, 384);
>     __ jcc(Assembler::belowEqual, L_small);
> 
>     __ lea(j, ExternalAddress(StubRoutines::x86::crc32c_table_avx512_addr()));
>     __ kernel_crc32_avx512(crc, buf, len, j, l, k);
>     __ jmp(L_continue);
> 
>     __ bind(L_small);
>   }
> #ifdef _WIN64
>   __ push(y);
>   __ push(z);
> #endif
>   __ crc32c_ipl_alg2_alt2(crc, buf, len,
>                           a, j, k,
>                           l, y, z,
>                           c_farg0, c_farg1, c_farg2,
>                           is_pclmulqdq_supported);
> #ifdef _WIN64
>   __ pop(z);
>   __ pop(y);
> #endif
>   __ bind(L_continue);
>   __ movl(rax, crc);
>   __ vzeroupper();
>   __ leave(); // required for proper stackwalking of RuntimeStub frame
>   __ ret(0);

@vnkozlov Thanks for clarifying. I've pushed the changed code; let me know if this is what you had in mind.
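
For readers following the thread, the control flow in the quoted stub amounts to a length-based dispatch: buffers of at most 384 bytes go to one routine and larger ones to another, with both producing the same CRC32C value. The following is a minimal, portable C++ sketch of that pattern and is not the HotSpot code. The names `crc32c_small`, `crc32c_large`, `crc32c_update`, and `verify_paths_agree` are hypothetical, the two routines are plain software stand-ins for the AVX2 and AVX-512 kernels, and the pre/post inversion in the wrapper is this sketch's own convention rather than something taken from the stub.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// CRC32C (Castagnoli) polynomial in reflected form.
constexpr std::uint32_t kPoly = 0x82F63B78u;

// Threshold from the quoted stub: lengths <= 384 take the small-buffer path.
constexpr std::size_t kSmallThreshold = 384;

// Hypothetical stand-in for the small-buffer (AVX2) path: bit-at-a-time update.
std::uint32_t crc32c_small(std::uint32_t crc, const std::uint8_t* buf, std::size_t len) {
  for (std::size_t i = 0; i < len; ++i) {
    crc ^= buf[i];
    for (int bit = 0; bit < 8; ++bit) {
      // Shift right; XOR in the polynomial when the bit shifted out was 1.
      crc = (crc >> 1) ^ (kPoly & (0u - (crc & 1u)));
    }
  }
  return crc;
}

// Hypothetical stand-in for the large-buffer (AVX-512) path: table-driven update.
std::uint32_t crc32c_large(std::uint32_t crc, const std::uint8_t* buf, std::size_t len) {
  // Build the 256-entry lookup table once, on first use.
  static const std::array<std::uint32_t, 256> table = [] {
    std::array<std::uint32_t, 256> t{};
    for (std::uint32_t n = 0; n < 256; ++n) {
      std::uint32_t c = n;
      for (int bit = 0; bit < 8; ++bit) {
        c = (c >> 1) ^ (kPoly & (0u - (c & 1u)));
      }
      t[n] = c;
    }
    return t;
  }();
  for (std::size_t i = 0; i < len; ++i) {
    crc = (crc >> 8) ^ table[(crc ^ buf[i]) & 0xFFu];
  }
  return crc;
}

// Length-based dispatch mirroring the cmpl/jcc(belowEqual) pair in the stub.
// The inversion before and after is the sketch's convention for the public value.
std::uint32_t crc32c_update(std::uint32_t crc, const std::uint8_t* buf, std::size_t len) {
  crc = ~crc;
  crc = (len <= kSmallThreshold) ? crc32c_small(crc, buf, len)
                                 : crc32c_large(crc, buf, len);
  return ~crc;
}

// Sanity check: both paths must agree for the same input, whatever its length.
void verify_paths_agree(const std::uint8_t* buf, std::size_t len) {
  assert(crc32c_small(0xFFFFFFFFu, buf, len) == crc32c_large(0xFFFFFFFFu, buf, len));
}
```

Because both routines are always present and the choice depends only on `len`, a check such as `verify_paths_agree` on buffers just below and just above the threshold is a cheap way to confirm that the two paths stay interchangeable.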

-------------

PR: https://git.openjdk.org/jdk/pull/12079

