RFR: 8371259: ML-DSA AVX2 and AVX512 intrinsics and improvements [v3]

Volodymyr Paprotski vpaprotski at openjdk.org
Mon Nov 24 17:19:12 UTC 2025


On Mon, 24 Nov 2025 16:28:44 GMT, Mark Powers <mpowers at openjdk.org> wrote:

> SignatureBench.MLDSA with `+UseDilithiumIntrinsics` shows an average 1.61% improvement across all algorithms and data sizes. Measuring SignatureBench.MLDSA against a baseline build without the fix, shows an average 2.24% improvement across all algorithms and data sizes.

I need a bit of clarification (I think you are saying there is a regression?).
- `+UseDilithiumIntrinsics` should be redundant (i.e. `vm_version_x86.cpp` should automatically detect and turn the feature on).
    - So if I read correctly, the baseline you measured already has the original intrinsics (implicitly) enabled.
        - Does that mean there is 2.24% noise in the benchmark?
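
To make the noise question concrete, here is a rough sketch of the arithmetic I have in mind. The class name, helper names, and throughput numbers are all made up for illustration (they are not measurements from this PR). It assumes higher-is-better ops/s scores and treats the spread between two equivalent configurations (flag implicit vs. explicit) as the noise floor:

```java
public class NoiseCheck {
    // Percent change of a throughput score (ops/s, higher is better) relative to a baseline.
    static double pctChange(double baseline, double candidate) {
        return (candidate - baseline) / baseline * 100.0;
    }

    // A measured change is only distinguishable if it is larger than the spread
    // seen between two runs that should have behaved identically.
    static boolean exceedsNoise(double changePct, double noisePct) {
        return Math.abs(changePct) > Math.abs(noisePct);
    }

    public static void main(String[] args) {
        // Hypothetical scores, NOT real SignatureBench.MLDSA results.
        double flagImplicit = 1000.0; // baseline, intrinsics auto-enabled
        double flagExplicit = 1015.0; // same build, flag passed explicitly
        double patched      = 1020.0; // build with the new intrinsics

        double noise = pctChange(flagImplicit, flagExplicit);
        double gain  = pctChange(flagImplicit, patched);

        System.out.printf("noise floor: %.2f%%, gain: %.2f%%, distinguishable: %b%n",
                noise, gain, exceedsNoise(gain, noise));
    }
}
```

This is only a single-sample comparison; a real check would use the error bars JMH reports over several forks.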

In my measurements for the AVX512 parts, I saw between 0% and 6% improvement across `SignatureBench.MLDSA`:
    - (some variation between desktop and server parts)
    - `SignatureBench.MLDSA.verify` gained less, only 0-2% depending on key size (iirc, a bigger portion of that benchmark was spent in SHA3 instead)
    - `SignatureBench.MLDSA.sign` gained more, 4-6% (also depending on data size)

That is also why I had included the other (since deleted) microbenchmark. `SignatureBench.MLDSA` has a lot of 'other things' (e.g. SHA3) going on as well, so the AVX512 intrinsic changes were harder to differentiate from noise.
    - I measured a ~25%-50% improvement on the 5 changed intrinsics alone.

Hence the claim 'never worse'. A more precise claim:
    - "New intrinsics seem to be better, but (at least for AVX512) the existing intrinsics were already plenty good for MLDSA"

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28136#issuecomment-3571871477

