RFR: 8360934: Add AVX-512 intrinsics for ML-KEM - enhancement on AVX512_VBMI and AVX512_VBMI2 [v2]
Shawn M Emery
duke at openjdk.org
Wed Jan 7 06:22:48 UTC 2026
On Tue, 6 Jan 2026 23:29:52 GMT, Volodymyr Paprotski <vpaprotski at openjdk.org> wrote:
>> Shawn M Emery has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Update copyright year
>
> src/hotspot/cpu/x86/stubGenerator_x86_64_kyber.cpp line 862:
>
>> 860: __ addptr(condensed, condensedOffs);
>> 861:
>> 862: if (VM_Version::supports_avx512_vbmi2()) {
>
> Which instruction needs vbmi2? All I could spot was that `evpermb` that needs vbmi. Relax the restriction slightly?
Good catch! Initially the code was using `vpshldvw`, but was changed to just use `vpsrlvw`. Fixed in next commit.
I should probably update the bug synopsis to exclude VBMI2?
> src/hotspot/cpu/x86/stubGenerator_x86_64_kyber.cpp line 906:
>
>> 904: __ addptr(condensed, 192);
>> 905: __ addptr(parsed, 256);
>> 906: __ subl(parsedLength, 128);
>
> (128 instead of 256 here because `parsedLength` is an index to an `short` array..)
>
> I am confused by the stride. The `twelve2Sixteen()` seems to (almost) guarantee that the parsed length is a multiple of 64 (last block can be 48 bytes). This would imply a stride of 128 bytes for `parsed`. And 96 for `condensed`.
>
> This is exactly how the existing code already behaves so I am less concerned, but I would like an explanation why it works?
I believe the numbers are right: with each pass 256 bytes of coefficients are `parsed` into the parse buffer. This means that half of the coefficients have been processed (`parseLength` = 128). Would having a comment stating as such be sufficient for your concerns?
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/28815#discussion_r2667206396
PR Review Comment: https://git.openjdk.org/jdk/pull/28815#discussion_r2667206828
More information about the security-dev
mailing list