RFR: JDK-8300584: Accelerate AVX-512 CRC32C for small buffers

Vladimir Kozlov kvn at openjdk.org
Wed Jan 18 22:36:28 UTC 2023


On Wed, 18 Jan 2023 19:21:58 GMT, Scott Gibbons <duke at openjdk.org> wrote:

> Use AVX2 code for CRC32C for small buffers in the AVX-512 path.  Breakeven buffer size between the two algorithms is on the order of 384 bytes.
> 
> **Performance numbers for previous:**
> 
> Benchmark                    (count)   Mode  Cnt      Score     Error   Units
> TestCRC32C.testCRC32CUpdate       64  thrpt    4  66974.957 ±   8.872  ops/ms
> TestCRC32C.testCRC32CUpdate      128  thrpt    4  44224.810 ±  11.801  ops/ms
> TestCRC32C.testCRC32CUpdate      256  thrpt    4  63997.611 ± 173.577  ops/ms
> TestCRC32C.testCRC32CUpdate      512  thrpt    4  56068.683 ± 269.582  ops/ms
> TestCRC32C.testCRC32CUpdate     2048  thrpt    4  27048.098 ±  87.350  ops/ms
> TestCRC32C.testCRC32CUpdate    16384  thrpt    4   4066.736 ±  10.318  ops/ms
> TestCRC32C.testCRC32CUpdate    65536  thrpt    4   1040.754 ±   6.419  ops/ms
> 
> 
> **Performance numbers for this version:**
> 
> Benchmark                    (count)   Mode  Cnt       Score     Error   Units
> TestCRC32C.testCRC32CUpdate       64  thrpt    3  161659.326 ±  74.974  ops/ms
> TestCRC32C.testCRC32CUpdate      128  thrpt    3   88456.935 ±  11.940  ops/ms
> TestCRC32C.testCRC32CUpdate      256  thrpt    3   73254.993 ±   5.004  ops/ms
> TestCRC32C.testCRC32CUpdate      512  thrpt    3   56508.541 ± 298.229  ops/ms
> TestCRC32C.testCRC32CUpdate     2048  thrpt    3   26701.995 ±  31.369  ops/ms
> TestCRC32C.testCRC32CUpdate    16384  thrpt    3    4110.819 ±   4.618  ops/ms
> TestCRC32C.testCRC32CUpdate    65536  thrpt    3    1045.821 ±   2.037  ops/ms
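
To read the two tables side by side, the per-size ratio of new to previous throughput can be computed directly. The sketch below only restates the quoted scores; `Sample`, `kSamples` and `speedup` are illustrative names, and the ratios ignore the error column and the different iteration counts (4 vs. 3), so treat them as approximate.

```cpp
#include <array>

// Hypothetical helper types; the numbers are the Score columns quoted above.
struct Sample {
    int count;             // buffer size in bytes
    double before_ops_ms;  // "previous" throughput
    double after_ops_ms;   // "this version" throughput
};

constexpr std::array<Sample, 7> kSamples{{
    {64,    66974.957, 161659.326},
    {128,   44224.810,  88456.935},
    {256,   63997.611,  73254.993},
    {512,   56068.683,  56508.541},
    {2048,  27048.098,  26701.995},
    {16384,  4066.736,   4110.819},
    {65536,  1040.754,   1045.821},
}};

// Ratio above 1.0 means the new version is faster for that size.
constexpr double speedup(const Sample& s) {
    return s.after_ops_ms / s.before_ops_ms;
}
```

By this measure the gain is roughly 2.4x at 64 bytes and 2.0x at 128 bytes, and the two versions are within about 1-2% of each other from 512 bytes up, which is consistent with a breakeven near 384 bytes.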

I mean this (the `else` is removed so that the `AVX2` code is always generated and also serves as the small-buffer path on AVX-512 machines):

  __ enter(); // required for proper stackwalking of RuntimeStub frame

  Label L_small, L_continue;

  if (VM_Version::supports_sse4_1() && VM_Version::supports_avx512_vpclmulqdq() &&
      VM_Version::supports_avx512bw() &&
      VM_Version::supports_avx512vl()) {
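    // Buffers of 384 bytes or fewer branch to the AVX2 code below.
    __ cmpl(len, 384);
    __ jcc(Assembler::belowEqual, L_small);

    // Larger buffers use the AVX-512 kernel, then skip the AVX2 code.
    __ lea(j, ExternalAddress(StubRoutines::x86::crc32c_table_avx512_addr()));
    __ kernel_crc32_avx512(crc, buf, len, j, l, k);
    __ jmp(L_continue);

    __ bind(L_small);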
  }
#ifdef _WIN64
  __ push(y);
  __ push(z);
#endif
  __ crc32c_ipl_alg2_alt2(crc, buf, len,
                          a, j, k,
                          l, y, z,
                          c_farg0, c_farg1, c_farg2,
                          is_pclmulqdq_supported);
#ifdef _WIN64
  __ pop(z);
  __ pop(y);
#endif
  __ bind(L_continue);
  __ movl(rax, crc);
  __ vzeroupper();
  __ leave(); // required for proper stackwalking of RuntimeStub frame
  __ ret(0);
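
The control flow of that stub can be modelled in plain C++ as a rough sketch. This is not HotSpot code: `Crc32cPath`, `select_crc32c_path`, `kSmallBufferLimit` and the single `cpu_has_avx512_crc` flag (standing in for the four `VM_Version` checks) are all hypothetical names, and the function only reports which path a given length would take.

```cpp
#include <cstdint>

// Hypothetical model of the branch structure only; no CRC is computed here.
constexpr std::uint32_t kSmallBufferLimit = 384;  // constant used by cmpl(len, 384)

enum class Crc32cPath { Avx512Kernel, Avx2Code };

// cpu_has_avx512_crc stands in for the combined VM_Version::supports_*() checks.
// The unsigned comparison mirrors jcc(Assembler::belowEqual, L_small):
// len <= 384 falls to the AVX2 code, anything larger uses the AVX-512 kernel.
Crc32cPath select_crc32c_path(bool cpu_has_avx512_crc, std::uint32_t len) {
    if (cpu_has_avx512_crc && len > kSmallBufferLimit) {
        return Crc32cPath::Avx512Kernel;
    }
    // Reached either via L_small or because the AVX-512 checks failed at
    // stub-generation time, so this path must always be emitted.
    return Crc32cPath::Avx2Code;
}
```

Under these assumptions a 384-byte buffer still takes the AVX2 code and a 385-byte buffer is the first to use the AVX-512 kernel; on a CPU without the AVX-512 features every length takes the AVX2 code, which is why that code cannot sit in an `else` branch.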

-------------

PR: https://git.openjdk.org/jdk/pull/12079

