RFR: 8360934: Add AVX-512 intrinsics for ML-KEM - enhancement on AVX512_VBMI and AVX512_VBMI2 [v2]

Wed Jan 7 06:22:48 UTC 2026

On Tue, 6 Jan 2026 23:29:52 GMT, Volodymyr Paprotski <vpaprotski at openjdk.org> wrote:

>> Shawn M Emery has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Update copyright year
>
> src/hotspot/cpu/x86/stubGenerator_x86_64_kyber.cpp line 862:
> 
>> 860:   __ addptr(condensed, condensedOffs);
>> 861: 
>> 862:   if (VM_Version::supports_avx512_vbmi2()) {
> 
> Which instruction needs vbmi2? All I could spot was that `evpermb` that needs vbmi. Relax the restriction slightly?

Good catch!  Initially the code was using `vpshldvw`, but was changed to just use `vpsrlvw`.  Fixed in next commit.
I should probably update the bug synopsis to exclude VBMI2?

> src/hotspot/cpu/x86/stubGenerator_x86_64_kyber.cpp line 906:
> 
>> 904:       __ addptr(condensed, 192);
>> 905:       __ addptr(parsed, 256);
>> 906:       __ subl(parsedLength, 128);
> 
> (128 instead of 256 here because `parsedLength` is an index to an `short` array..)
> 
> I am confused by the stride. The `twelve2Sixteen()` seems to (almost) guarantee that the parsed length is a multiple of 64 (last block can be 48 bytes). This would imply a stride of 128 bytes for `parsed`. And 96 for `condensed`.
> 
> This is exactly how the existing code already behaves so I am less concerned, but I would like an explanation why it works?

I believe the numbers are right: with each pass 256 bytes of coefficients are `parsed` into the parse buffer.  This means that half of the coefficients have been processed (`parseLength` = 128).  Would having a comment stating as such be sufficient for your concerns?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28815#discussion_r2667206396
PR Review Comment: https://git.openjdk.org/jdk/pull/28815#discussion_r2667206828