RFR: 8360934: Add AVX-512 intrinsics for ML-KEM - enhancement on AVX512_VBMI and AVX512_VBMI2 [v2]
Volodymyr Paprotski
vpaprotski at openjdk.org
Wed Jan 7 18:04:14 UTC 2026
On Wed, 7 Jan 2026 17:47:59 GMT, Ferenc Rakoczi <duke at openjdk.org> wrote:
>> I wasn't clear in my question. The asm is indeed processing the bytes in the increment. What I was trying to convince myself about was: how come we are not reading past the end of the array? Or are we?
>>
>> On one hand, this is exactly what the existing asm code does, so I will assume that it's correct. However, on the Java side/version of this code, I could only convince myself about processing ~two AVX512 vectors at a time, not four.
>>
>> So either I can't count, or there are some further (implicit) restrictions on the callers of `twelve2Sixteen`.
>
> In ML_KEM.java there is this assert (and this is the only call to implKyber12To16()):
>
> assert ((remainder == 0) || (remainder == 48)) &&
> (index + i * 96 <= condensed.length);
> implKyber12To16(condensed, index, parsed, parsedLength);
>
> and one can check how the callers of twelve2Sixteen() make sure that this is the case.
Yep, that's exactly the assert I was looking at as well. It looks to me like it's dividing the 'expanded-short-array-length' by 64 and ensuring the remainder is zero (ignoring the 48 case for a bit, and the condensed-length check).
For simplicity: so the 'expanded' array length should be a multiple of 64 shorts, i.e. 128 bytes. But we stride the expanded array by 256 bytes (and parsedLength by 128 shorts)?
I haven't checked the callers of `twelve2Sixteen`, but I suspect that the length of the expanded array is always a multiple of 256 bytes (128 shorts), in which case the assert is 'incomplete'?
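
To make the counting concrete, here is a minimal Java sketch of the concern. It is only a model, not the intrinsic or the ML_KEM.java code: the class `StrideCheck` and its helpers are hypothetical, and it assumes a loop that always handles a full stride per iteration and keeps going while any work remains.

```java
// Hypothetical model of the stride question; not taken from the JDK sources.
class StrideCheck {
    // Bytes of 12-bit packed input needed to produce `shorts` 16-bit outputs.
    static int condensedBytes(int shorts) {
        return shorts * 12 / 8;
    }

    // Shorts written by a loop that always handles `stride` shorts per
    // iteration and iterates while any work remains (assumed loop shape).
    static int shortsTouched(int parsedLength, int stride) {
        int iterations = (parsedLength + stride - 1) / stride;
        return iterations * stride;
    }

    // True if such a loop would write past parsedLength shorts.
    static boolean overruns(int parsedLength, int stride) {
        return shortsTouched(parsedLength, stride) > parsedLength;
    }

    public static void main(String[] args) {
        int stride = 128; // shorts per iteration, per the reading above
        // All of these lengths satisfy "parsedLength % 64 == 0".
        for (int parsedLength : new int[] {64, 128, 192, 256}) {
            System.out.println("parsedLength=" + parsedLength
                    + " condensedBytes=" + condensedBytes(parsedLength)
                    + " overruns=" + overruns(parsedLength, stride));
        }
    }
}
```

Under those assumptions, lengths of 64 and 192 shorts pass a multiple-of-64 check but still overrun with a 128-short stride, while 128 and 256 do not. Whether such lengths can actually reach the intrinsic depends on the callers.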
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/28815#discussion_r2669535184