RFR: 8360934: Add AVX-512 intrinsics for ML-KEM - enhancement on AVX512_VBMI [v4]

Sat Jan 10 03:26:20 UTC 2026

On Thu, 8 Jan 2026 17:59:35 GMT, Shawn M Emery <duke at openjdk.org> wrote:

>> This change allows use of the AVX512_VBMI instruction set to further optimize decompression/parsing of polynomial coefficients for ML-KEM. The speedup in the ML-KEM benchmarks is 0.3 to 0.6% for key generation, 0.4 to 1.7% for encapsulation, and 0.3 to 1.9% for decapsulation.
>> 
>> Thank you to @sviswa7 and @ferakocz for their help in working through the early stages of this code with me.
>
> Shawn M Emery has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits since the last revision:
> 
>  - Merge with mainline
>  - 8360934: Add AVX-512 intrinsics for ML-KEM - enhancement on AVX512_VBMI
>    Change Swap to Dup named function/variable
>    Check for only VBMI support (not VBMI2)
>  - Update copyright year
>  - Merge with mainline
>  - Swap parameter operation with source
>  - Remove wrong mask from evpsrlvw
>  - Reverse ordering for vpermb and vpsrlvw instructions
>  - Switch from vpshldvw to vpsrlvw
>  - Fix whitespaces
>  - 8360934: Add AVX-512 intrinsics for ML-KEM - enhancement on AVX512_VBMI and AVX512_VBMI2

src/hotspot/cpu/x86/stubGenerator_x86_64_kyber.cpp line 876:

> 874:     __ evmovdquq(xmm22, Address(perms), Assembler::AVX_512bit);
> 875: 
> 876:     __ BIND(VBMILoop);

It would be better to align the loop's starting address to OptoLoopAlignment.
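
To spell out what is being asked for: elsewhere in the HotSpot stubs this is usually done by emitting `__ align(OptoLoopAlignment);` immediately before the `BIND`, which pads the code buffer so the loop head lands on an alignment boundary. The standalone sketch below only illustrates the padding arithmetic behind that. The helper names `padding_for` and `align_up` are hypothetical, and the value 16 used in the usage note is an assumed alignment for illustration, not something taken from this patch.

```cpp
#include <cstdint>

// Hypothetical helper (not HotSpot code): number of padding bytes needed so
// that `offset` becomes a multiple of `alignment`.
// Assumes `alignment` is a non-zero power of two.
static inline std::uint32_t padding_for(std::uint32_t offset,
                                        std::uint32_t alignment) {
    const std::uint32_t mask = alignment - 1;
    // Distance to the next boundary, or 0 if already on one.
    return (alignment - (offset & mask)) & mask;
}

// Hypothetical helper: the first aligned offset at or after `offset`.
static inline std::uint32_t align_up(std::uint32_t offset,
                                     std::uint32_t alignment) {
    return offset + padding_for(offset, alignment);
}
```

For example, with an assumed alignment of 16, a loop head that would otherwise be emitted at code offset 37 needs 11 bytes of padding and ends up at offset 48, while one already at offset 16 needs none.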

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28815#discussion_r2678272848