RFR: 8360934: Add AVX-512 intrinsics for ML-KEM - enhancement on AVX512_VBMI and AVX512_VBMI2 [v2]
Ferenc Rakoczi
duke at openjdk.org
Wed Jan 7 17:50:06 UTC 2026
On Wed, 7 Jan 2026 16:43:30 GMT, Volodymyr Paprotski <vpaprotski at openjdk.org> wrote:
>> I believe the numbers are right: with each pass 256 bytes of coefficients are `parsed` into the parse buffer. This means that half of the coefficients have been processed (`parsedLength` = 128). Would a comment stating this address your concerns?
>
> I wasn't clear enough in my question. The asm is indeed processing the bytes in the increment. What I was trying to convince myself of is: how come we are not reading past the end of the array? Or are we?
>
> On one hand, this is exactly what the existing asm code does, so I will assume that it's correct. On the other hand, on the Java side/version of this code, I could only convince myself about processing ~two AVX512 vectors at a time, not four.
>
> So either I can't count, or there are some further (implicit) restrictions on the callers of `twelve2Sixteen`.
In ML_KEM.java there is this assert (and this is the only call to implKyber12To16()):
    assert ((remainder == 0) || (remainder == 48)) &&
            (index + i * 96 <= condensed.length);
    implKyber12To16(condensed, index, parsed, parsedLength);
and one can check how the callers of twelve2Sixteen() make sure that this is the case.
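
To make the arithmetic behind that assert concrete, here is a minimal sketch. It is not the code from ML_KEM.java. It assumes that `i` is the number of 64-coefficient blocks needed to cover `parsedLength`, rounded up, and that `remainder` is `parsedLength % 64`; neither definition appears in the snippet above. The class name and the helpers `blocks`, `bytesNeeded`, `bytesCovered` and `preconditionHolds` are hypothetical names used only for illustration.

```java
public class TwelveToSixteenBounds {
    // Assumption: 64 coefficients (96 packed bytes) is the block granularity
    // implied by the constants in the quoted assert.
    static final int COEFFS_PER_BLOCK = 64;
    static final int BYTES_PER_BLOCK = 96; // 64 coefficients * 12 bits / 8

    // Assumed meaning of `i`: blocks needed to cover parsedLength, rounded up.
    static int blocks(int parsedLength) {
        return (parsedLength + COEFFS_PER_BLOCK - 1) / COEFFS_PER_BLOCK;
    }

    // Bytes that strictly have to be read: 3 bytes per 2 coefficients.
    static int bytesNeeded(int parsedLength) {
        return parsedLength * 3 / 2;
    }

    // Bytes covered when the input is consumed in whole 96-byte blocks.
    static int bytesCovered(int parsedLength) {
        return blocks(parsedLength) * BYTES_PER_BLOCK;
    }

    // The quoted assert, restated as a boolean under the assumptions above.
    static boolean preconditionHolds(int condensedLength, int index, int parsedLength) {
        int i = blocks(parsedLength);
        int remainder = parsedLength % COEFFS_PER_BLOCK;
        return ((remainder == 0) || (remainder == 48))
                && (index + i * BYTES_PER_BLOCK <= condensedLength);
    }

    public static void main(String[] args) {
        int[] lengths = {48, 64, 112, 256};
        for (int n : lengths) {
            System.out.println(n + " coefficients: need " + bytesNeeded(n)
                    + " bytes, whole blocks cover " + bytesCovered(n));
        }
    }
}
```

Under these assumptions, a call with `parsedLength` = 48 strictly needs 72 input bytes, but the assert requires 96 bytes to be available from `index`, so a routine that always consumes whole 96-byte blocks stays inside `condensed`. Whether that slack is enough for a given vectorized loop still depends on how many blocks the loop reads per pass.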
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/28815#discussion_r2669490940