RFR: 8360934: Add AVX-512 intrinsics for ML-KEM - enhancement on AVX512_VBMI [v4]
Jatin Bhateja
jbhateja at openjdk.org
Sat Jan 10 03:26:20 UTC 2026
On Thu, 8 Jan 2026 17:59:35 GMT, Shawn M Emery <duke at openjdk.org> wrote:
>> This change allows use of the AVX512_VBMI instruction set to further optimize decompression/parsing of polynomial coefficients for ML-KEM. The speedup gained in the ML-KEM benchmarks is 0.3 to 0.6% for key generation, 0.4 to 1.7% for encapsulation, and 0.3 to 1.9% for decapsulation.
>>
>> Thank you to @sviswa7 and @ferakocz for their help in working through the early stages of this code with me.
>
> Shawn M Emery has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits since the last revision:
>
> - Merge with mainline
> - 8360934: Add AVX-512 intrinsics for ML-KEM - enhancement on AVX512_VBMI
> Change Swap to Dup named function/variable
> Check for only VBMI support (not VBMI2)
> - Update copyright year
> - Merge with mainline
> - Swap parameter operation with source
> - Remove wrong mask from evpsrlvw
> - Reverse ordering for vpermb and vpsrlvw instructions
> - Switch from vpshldvw to vpsrlvw
> - Fix whitespaces
> - 8360934: Add AVX-512 intrinsics for ML-KEM - enhancement on AVX512_VBMI and AVX512_VBMI2
src/hotspot/cpu/x86/stubGenerator_x86_64_kyber.cpp line 876:
> 874: __ evmovdquq(xmm22, Address(perms), Assembler::AVX_512bit);
> 875:
> 876: __ BIND(VBMILoop);
Better to align the loop's starting address to OptoLoopAlignment.
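For context, other HotSpot stubs usually express this by emitting `__ align(OptoLoopAlignment);` immediately before the `BIND` of the loop label. The standalone sketch below only illustrates the arithmetic behind such an alignment request: how many padding bytes are needed so the loop head lands on the boundary. The helper names `padding_for` and `aligned_offset` are hypothetical and are not part of HotSpot, and the value 16 used in the usage note is an assumed alignment, not one taken from this PR.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not a HotSpot API): number of padding bytes that
// would have to be emitted so that code starting at `offset` begins on an
// `alignment`-byte boundary. `alignment` must be a power of two.
inline std::size_t padding_for(std::size_t offset, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    // Distance to the next boundary; the outer mask maps a full
    // `alignment` (already aligned) back to zero.
    return (alignment - (offset & (alignment - 1))) & (alignment - 1);
}

// Hypothetical helper: the offset at which the loop label would be bound
// once the padding has been emitted.
inline std::size_t aligned_offset(std::size_t offset, std::size_t alignment) {
    return offset + padding_for(offset, alignment);
}
```

With an assumed alignment of 16, a loop head that would otherwise start at offset 13 needs 3 bytes of padding and is bound at offset 16. A loop head already at offset 32 needs none.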
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/28815#discussion_r2678272848