RFR: 8360934: Add AVX-512 intrinsics for ML-KEM - enhancement on AVX512_VBMI and AVX512_VBMI2 [v2]

Wed Jan 7 17:50:06 UTC 2026

On Wed, 7 Jan 2026 16:43:30 GMT, Volodymyr Paprotski <vpaprotski at openjdk.org> wrote:

>> I believe the numbers are right: with each pass 256 bytes of coefficients are `parsed` into the parse buffer.  This means that half of the coefficients have been processed (`parsedLength` = 128).  Would having a comment stating as such address your concerns?
>
> I wasn't clear in my question. The asm is indeed processing the bytes in the increment. What I was trying to convince myself about is: 'how come we are not reading past the end of the array? Or are we?'
> 
> On one hand, this is exactly what the existing asm code does, so I will assume that it's correct. However, on the Java side/version of this code, I could only convince myself about processing ~two AVX512 vectors at a time, not four.
> 
> So either I can't count, or there are some further (implicit) restrictions on the callers of `twelve2Sixteen`

In ML_KEM.java there is this assert (and this is the only call to implKyber12To16()):

        assert ((remainder == 0) || (remainder == 48)) &&
                (index + i * 96 <= condensed.length);
        implKyber12To16(condensed, index, parsed, parsedLength);

and one can check how the callers of twelve2Sixteen() make sure that this is the case.
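
To illustrate the kind of bound such an assert expresses, here is a minimal sketch of a scalar 12-bit to 16-bit expansion that checks the input length before reading anything. It is not the code in ML_KEM.java or the intrinsic: the class name, the method name, the explicit exceptions, and the assumption that the input is consumed in 3-byte groups starting at `index` are all hypothetical and chosen only for illustration.

```java
// Illustrative sketch only: class, method, and parameter names are hypothetical,
// and this is not the ML_KEM.java implementation or its intrinsic.
final class TwelveToSixteenSketch {

    // Expands packed 12-bit values into 16-bit slots: every 3 input bytes
    // yield 2 coefficients. The bound is checked before any byte is read.
    static void unpack(byte[] condensed, int index, short[] parsed, int parsedLength) {
        if (parsedLength < 0 || parsedLength % 2 != 0 || parsedLength > parsed.length) {
            throw new IllegalArgumentException("bad parsedLength");
        }
        int bytesNeeded = (parsedLength / 2) * 3;
        if (index < 0 || index + bytesNeeded > condensed.length) {
            throw new IndexOutOfBoundsException("would read past the end of the input");
        }
        for (int i = 0; i < parsedLength / 2; i++) {
            int b0 = condensed[index + 3 * i] & 0xFF;
            int b1 = condensed[index + 3 * i + 1] & 0xFF;
            int b2 = condensed[index + 3 * i + 2] & 0xFF;
            // Low 8 bits from b0, high 4 bits from the low nibble of b1.
            parsed[2 * i] = (short) (b0 | ((b1 & 0x0F) << 8));
            // Low 4 bits from the high nibble of b1, high 8 bits from b2.
            parsed[2 * i + 1] = (short) ((b1 >> 4) | (b2 << 4));
        }
    }

    public static void main(String[] args) {
        short[] out = new short[2];
        unpack(new byte[] {(byte) 0xAB, (byte) 0xCD, (byte) 0xEF}, 0, out, 2);
        System.out.println(Integer.toHexString(out[0]) + " " + Integer.toHexString(out[1]));
    }
}
```

Under these assumptions, producing 128 coefficients requires 192 bytes to be available from `index` onward, so a caller-side check of this shape is what rules out a read past the end of the array.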

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28815#discussion_r2669490940