RFR: 8360934: Add AVX-512 intrinsics for ML-KEM - enhancement on AVX512_VBMI and AVX512_VBMI2 [v2]

Wed Jan 7 13:21:35 UTC 2026

On Wed, 7 Jan 2026 00:18:43 GMT, Volodymyr Paprotski <vpaprotski at openjdk.org> wrote:

> "Insert 0b0000 nibble after every third nibble". I only have two questions, looks good otherwise.

Yes, that is the idea.
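
For reference, a minimal scalar sketch of that transformation, assuming the usual little-endian packing of two 12-bit values into three bytes. The class and method names here are hypothetical and are not taken from the PR or from ML_KEM.java:

```java
class NibbleInsertSketch {
    // Expands packed 12-bit values into 16-bit values: every group of
    // three nibbles is followed by an inserted 0b0000 nibble.
    // Hypothetical helper for illustration only.
    static short[] expand12To16(byte[] packed) {
        if (packed.length % 3 != 0) {
            throw new IllegalArgumentException("length must be a multiple of 3");
        }
        short[] out = new short[packed.length / 3 * 2];
        for (int i = 0, j = 0; i < packed.length; i += 3, j += 2) {
            int b0 = packed[i] & 0xFF;
            int b1 = packed[i + 1] & 0xFF;
            int b2 = packed[i + 2] & 0xFF;
            // First value: all of b0 plus the low nibble of b1.
            out[j] = (short) (b0 | ((b1 & 0x0F) << 8));
            // Second value: the high nibble of b1 plus all of b2.
            out[j + 1] = (short) ((b1 >> 4) | (b2 << 4));
        }
        return out;
    }
}
```

With the input bytes 0xBC, 0xFA, 0xDE this produces 0x0ABC and 0x0DEF, i.e. the top nibble of each 16-bit result is zero.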

> 
> PS: things I've considered:
> 
> * Loop controls?
>   
>   * ML_KEM.java guarantees (per callee comment and assert) lengths are multiple of 64
>   * also same as original code
> * Why not simply a vpermb? Have zeroes already from the masked load with k1..

It *is* using vpermb (evpermb() generates the EVEX-encoded VPERMB).

>   
>   * shuffle granularity is actually 4-bits, not 8-bits

Really? In what instruction? I haven't found one in the manual.

> * logical shift already zeroes top bits, so `vpand` not required?

Only every 2nd byte is shifted; the rest need to be masked.
>   
>   * odd columns not shifted, so still have extra bits that need clearing

Yes, that is what the vpand does. (Actually, it also unnecessarily masks the shifted bytes.)
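
A scalar emulation of the permute, shift, mask sequence may make this easier to see. This is only a sketch of my description above, not the stub code: it assumes the byte permute places bytes (3k, 3k+1) in 16-bit lane 2k and bytes (3k+1, 3k+2) in lane 2k+1, and the class and method names are hypothetical:

```java
class PermuteShiftMaskSketch {
    // Emulates, lane by lane, a byte permute followed by a shift of
    // every second 16-bit lane and a mask applied to all lanes.
    // The lane layout is an assumption made for illustration.
    static int[] expand(byte[] src) {
        int lanes = src.length / 3 * 2;
        int[] out = new int[lanes];
        for (int lane = 0; lane < lanes; lane++) {
            // "Permute": pick the two source bytes that feed this lane.
            int base = (lane / 2) * 3 + (lane & 1);
            int word = (src[base] & 0xFF) | ((src[base + 1] & 0xFF) << 8);
            // "Shift": only odd lanes are shifted right by one nibble,
            // which already clears their top four bits.
            if ((lane & 1) == 1) {
                word >>>= 4;
            }
            // "Mask": needed for the unshifted even lanes, redundant
            // for the shifted odd lanes.
            out[lane] = word & 0x0FFF;
        }
        return out;
    }
}
```

For the bytes 0xBC, 0xFA, 0xDE, lane 0 holds 0xFABC before the mask and 0x0ABC after it, while lane 1 is already 0x0DEF after the shift, so the mask changes nothing there.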

> * Why VBMI?
>   
>   * needed for `evpermb`

Yes.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28815#issuecomment-3718842604