RFR: 8360934: Add AVX-512 intrinsics for ML-KEM - enhancement on AVX512_VBMI and AVX512_VBMI2 [v2]
Ferenc Rakoczi
duke at openjdk.org
Wed Jan 7 13:21:35 UTC 2026
On Wed, 7 Jan 2026 00:18:43 GMT, Volodymyr Paprotski <vpaprotski at openjdk.org> wrote:
> "Insert 0b0000 nibble after every third nibble". I only have two questions, looks good otherwise.
Yes, that is the idea.
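For reference, the transformation confirmed above can be written in scalar form. This is a plain-Java model written for this thread, not the code in ML_KEM.java, and the class and method names are hypothetical. It assumes the ML-KEM 12-bit little-endian packing, in which every 3 input bytes hold two 12-bit coefficients. Each 16-bit output is therefore three input nibbles followed by one zero nibble.

```java
public class TwelveBitDecodeSketch {

    // Hypothetical helper: unpacks 12-bit little-endian coefficients.
    // Every 3 input bytes (6 nibbles) become two 16-bit outputs, each made of
    // 3 input nibbles plus a zero nibble in the top 4 bits.
    static short[] decode12(byte[] in) {
        if (in.length % 3 != 0) {
            throw new IllegalArgumentException("length must be a multiple of 3");
        }
        short[] out = new short[in.length / 3 * 2];
        for (int i = 0, j = 0; i < in.length; i += 3, j += 2) {
            int b0 = in[i] & 0xFF;
            int b1 = in[i + 1] & 0xFF;
            int b2 = in[i + 2] & 0xFF;
            // nibbles n0, n1 (from b0) and n2 (low nibble of b1)
            out[j] = (short) (b0 | ((b1 & 0x0F) << 8));
            // nibble n3 (high nibble of b1) and n4, n5 (from b2)
            out[j + 1] = (short) ((b1 >>> 4) | (b2 << 4));
        }
        return out;
    }

    public static void main(String[] args) {
        // Bytes 0x21, 0x43, 0x65 hold nibbles 1,2,3,4,5,6 (low nibble first).
        short[] r = decode12(new byte[] {0x21, 0x43, 0x65});
        System.out.printf("%04x %04x%n", r[0], r[1]); // prints "0321 0654"
    }
}
```

The outputs are printed with `%04x` so the inserted zero nibble at the top of each 16-bit value is visible.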
>
> PS: things I've considered:
>
> * Loop controls?
>   * ML_KEM.java guarantees (per callee comment and assert) that lengths are multiples of 64
>   * also same as original code
> * Why not simply a vpermb? Have zeroes already from the masked load with k1.
It *is* using vpermb (evpermb() generates the EVEX-encoded VPERMB).
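A scalar Java model of what a 512-bit VPERMB computes may help here: every destination byte i is taken from source byte idx[i] & 63, because only the low 6 index bits are used for a zmm operand. The index table below, which copies bytes 3k, 3k+1, 3k+1, 3k+2 into each 4-byte group, is an assumed layout for illustration and not a copy of the table in the PR. The 48 packed bytes in a zero-filled 64-byte array stand in for the masked load. All names are hypothetical.

```java
public class VpermbSketch {

    // Scalar model of a 512-bit VPERMB: dst[i] = src[idx[i] & 63].
    static byte[] vpermb512(byte[] idx, byte[] src) {
        byte[] dst = new byte[64];
        for (int i = 0; i < 64; i++) {
            dst[i] = src[idx[i] & 0x3F];
        }
        return dst;
    }

    // Hypothetical index table: output group k (4 bytes = two 16-bit lanes)
    // gets source bytes 3k, 3k+1, 3k+1, 3k+2.
    static byte[] duplicateMiddleByteIndex() {
        byte[] idx = new byte[64];
        for (int k = 0; k < 16; k++) {
            idx[4 * k]     = (byte) (3 * k);
            idx[4 * k + 1] = (byte) (3 * k + 1);
            idx[4 * k + 2] = (byte) (3 * k + 1);
            idx[4 * k + 3] = (byte) (3 * k + 2);
        }
        return idx;
    }

    public static void main(String[] args) {
        // Model a load of 48 packed bytes into a 64-byte register,
        // with the upper 16 bytes left as zero.
        byte[] src = new byte[64];
        for (int i = 0; i < 48; i++) {
            src[i] = (byte) i;
        }
        byte[] dst = vpermb512(duplicateMiddleByteIndex(), src);
        // First group: source bytes 0, 1, 1, 2
        System.out.println(dst[0] + " " + dst[1] + " " + dst[2] + " " + dst[3]); // prints "0 1 1 2"
    }
}
```

In this model the index table never selects bytes 48 to 63, so the zero-filled tail does not affect the result; the permute only places bytes, and the zero nibbles still have to come from the later shift and mask.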
>   * shuffle granularity is actually 4 bits, not 8 bits
Really? In what instruction? I haven't found it in the manual.
> * logical shift already zeroes top bits, so `vpand` not required?
Only every second byte is shifted; the rest need to be masked.
>   * odd columns not shifted, so still have extra bits that need clearing
Yes, that is what the vpand does (actually, it also unnecessarily masks the shifted bytes).
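The shift-and-mask point can be checked with a small scalar model. As an illustrative assumption only, it takes each pair of 16-bit lanes after the permute to hold b0|b1<<8 and b1|b2<<8. The odd lane of each pair is shifted right logically by 4, then a single AND with 0x0FFF is applied to all lanes. The names are hypothetical and this is not the PR's code. It shows why the unshifted lanes need the mask while the AND is a no-op on the shifted ones.

```java
public class ShiftMaskSketch {

    // Hypothetical model of one pair of 16-bit lanes produced from packed
    // bytes b0, b1, b2 by a permute that duplicates the middle byte.
    static int[] decodePair(int b0, int b1, int b2) {
        int even = (b0 | (b1 << 8)) & 0xFFFF; // holds n0 n1 n2 n3
        int odd  = (b1 | (b2 << 8)) & 0xFFFF; // holds n2 n3 n4 n5
        odd >>>= 4;                           // logical shift: n3 n4 n5, top nibble now 0
        // One AND for all lanes: needed for the even lane (drops n3),
        // a no-op for the shifted odd lane.
        return new int[] {even & 0x0FFF, odd & 0x0FFF};
    }

    public static void main(String[] args) {
        int[] r = decodePair(0x21, 0x43, 0x65);
        System.out.printf("%04x %04x%n", r[0], r[1]); // prints "0321 0654"

        // Check that the mask is redundant on the shifted lane
        // for every possible (b1, b2).
        boolean redundant = true;
        for (int b1 = 0; b1 < 256; b1++) {
            for (int b2 = 0; b2 < 256; b2++) {
                int odd = ((b1 | (b2 << 8)) & 0xFFFF) >>> 4;
                redundant &= (odd & 0x0FFF) == odd;
            }
        }
        System.out.println(redundant); // prints "true"
    }
}
```

The exhaustive loop confirms that, in this model, a logical right shift by 4 of a 16-bit value always leaves the top nibble zero, so only the unshifted lanes depend on the AND.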
> * Why VBMI?
>   * needed for `evpermb`
Yes.
-------------
PR Comment: https://git.openjdk.org/jdk/pull/28815#issuecomment-3718842604
More information about the security-dev mailing list