RFR: 8371259: ML-DSA AVX2 and AVX512 intrinsics and improvements [v3]
Volodymyr Paprotski
vpaprotski at openjdk.org
Mon Nov 24 17:19:12 UTC 2025
On Mon, 24 Nov 2025 16:28:44 GMT, Mark Powers <mpowers at openjdk.org> wrote:
> SignatureBench.MLDSA with `+UseDilithiumIntrinsics` shows an average 1.61% improvement across all algorithms and data sizes. Measuring SignatureBench.MLDSA against a baseline build without the fix, shows an average 2.24% improvement across all algorithms and data sizes.
I need a bit of clarification (I think you are saying there is a regression?).
- `+UseDilithiumIntrinsics` should be redundant (i.e. `vm_version_x86.cpp` should automatically detect and turn the feature on).
- So if I read correctly, the baseline you measured already has the original intrinsics (implicitly) enabled..
- therefore there is 2.24% noise in the benchmark?
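One way to sanity-check the noise reading above: if the flag really is redundant, both configurations run the same code, so any gap between them should be comparable to the run-to-run spread. A minimal sketch, using made-up throughput numbers (not measurements from this PR) and a hypothetical helper name:

```python
from statistics import mean, stdev

def relative_gap_and_spread(a, b):
    """Return (relative gap between the means of b and a,
    largest coefficient of variation of the two samples)."""
    gap = (mean(b) - mean(a)) / mean(a)
    spread = max(stdev(a) / mean(a), stdev(b) / mean(b))
    return gap, spread

# Hypothetical ops/s from repeated runs; NOT data from this PR.
default_runs = [1000.0, 1020.0, 980.0, 1010.0]
flag_runs = [1015.0, 1030.0, 990.0, 1025.0]

gap, spread = relative_gap_and_spread(default_runs, flag_runs)
print(f"gap={gap:.2%} spread={spread:.2%}")
```

If the gap is no larger than the spread, as in these illustrative numbers, the difference is hard to tell apart from noise.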
In my measurements for AVX512 parts, I had seen between 0% and 6% improvement across `SignatureBench.MLDSA`
- (some variation on desktop-vs-server parts..)
- `SignatureBench.MLDSA.verify` was worse, only 0-2% depending on key size (iirc, a bigger portion of the benchmark was in SHA3 instead)
- `SignatureBench.MLDSA.sign` was better, 4-6% (also depending on data size)
That is also why I had included the other (deleted) microbenchmark.. `SignatureBench.MLDSA` has a lot of 'other things' (e.g. SHA3) also happening, so the AVX512 intrinsic changes were harder to differentiate from noise..
- I had measured ~25%-50% improvement on purely the 5 intrinsics changed..
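The gap between ~25%-50% on the intrinsics alone and 0-6% on the full benchmark is what one would expect if the intrinsics account for only a small share of the benchmark's time. A rough sketch of the arithmetic, assuming "improvement" means a throughput gain and using assumed (not profiled) time shares for the intrinsics:

```python
def overall_gain(fraction, local_gain):
    """Amdahl-style estimate: overall throughput gain when `fraction` of the
    run time becomes `local_gain` faster (0.5 means 50% more throughput)."""
    new_time = (1.0 - fraction) + fraction / (1.0 + local_gain)
    return 1.0 / new_time - 1.0

# The 10% and 15% time shares are illustrative assumptions, not measured values.
for fraction in (0.10, 0.15):
    for local_gain in (0.25, 0.50):
        print(f"share={fraction:.0%} local={local_gain:.0%} "
              f"overall={overall_gain(fraction, local_gain):.1%}")
```

Under these assumed shares the overall gain stays in the low single digits, which is in the same range as the `SignatureBench.MLDSA` numbers above.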
Hence the claim 'never worse'.. A more precise claim..:
- "New intrinsics seem to be better, but (at least for AVX512) existing intrinsics were already plenty good for MLDSA"
-------------
PR Comment: https://git.openjdk.org/jdk/pull/28136#issuecomment-3571871477