RFR: 8371259: ML-DSA AVX2 and AVX512 intrinsics and improvements [v2]

Sandhya Viswanathan sviswanathan at openjdk.org
Thu Nov 20 21:34:56 UTC 2025


On Mon, 17 Nov 2025 23:35:44 GMT, Volodymyr Paprotski <vpaprotski at openjdk.org> wrote:

>> - New AVX2 intrinsics are 1.6x-6.9x faster than Java baseline 
>>    - `SignatureBench.MLDSA` is 1.2x-2.2x faster
>>    - Note: there is no AVX2-SHA3 intrinsics yet (Being reviewed https://github.com/vpaprotsk/jdk/pull/7)
>> - AVX512 intrinsic improvements are 1.24x-1.5x faster then current version 
>>   - `SignatureBench.MLDSA` is upto 5% faster, never slower
>> 
>> Note on intrinsic:
>> - The emitted (existing) AVX512 assembler was not "significantly" changed; mostly more efficient instruction selection and tighter register allocation, which allowed removal of NTT loop and stack spill.
>> - Code was refactored to allow reuse of same assembler (as possible) for AVX512 and AVX2
>> 
>> Tests and benchmarks:
>> - Added a fuzz test to ensure Java and intrinsic produces exactly same result
>> - Added benchmark to measure the performance of intrinsic itself
>> 
>> make test TEST="test/jdk/sun/security/provider/acvp/Launcher.java test/jdk/sun/security/provider/acvp/ML_DSA_Intrinsic_Test.java"
>> make test TEST="test/jdk/sun/security/provider/acvp/Launcher.java test/jdk/sun/security/provider/acvp/ML_DSA_Intrinsic_Test.java" JTREG="JAVA_OPTIONS=-XX:UseAVX=2"
>> make test TEST="micro:org.openjdk.bench.javax.crypto.full.SignatureBench.MLDSA" MICRO="JAVA_OPTIONS=-XX:+UnlockDiagnosticVMOptions -XX:+UseDilithiumIntrinsics;FORK=1"
>> make test TEST="micro:org.openjdk.bench.javax.crypto.full.SignatureBench.MLDSA" MICRO="JAVA_OPTIONS=-XX:+UnlockDiagnosticVMOptions -XX:-UseDilithiumIntrinsics;FORK=1"
>
> Volodymyr Paprotski has updated the pull request incrementally with two additional commits since the last revision:
> 
>  - whitespace
>  - address first comments

src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 1283:

> 1281:     // r1 = r1 & quotient; // copy 0 or keep as is, using EqMsk as filter
> 1282:     for (int i = 0; i < regCnt; i++) {
> 1283:       // FIXME: replace with void evmovdqul(Address dst, KRegister mask, XMMRegister src, bool merge, int vector_len);?

Is the fixme a leftover?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2547729185


More information about the security-dev mailing list