RFR: 8351034: Add AVX-512 intrinsics for ML-DSA [v13]

Sat Apr 5 00:44:56 UTC 2025

On Wed, 2 Apr 2025 07:38:34 GMT, Ferenc Rakoczi <duke at openjdk.org> wrote:

>> By using the AVX-512 vector registers the speed of the computation of the ML-DSA algorithms (key generation, document signing, signature verification) can be approximately doubled.
>
> Ferenc Rakoczi has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Reacting to comment by Sandhya.

src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 339:

> 337: 
> 338:   // levels 2 to 7 are done in 2 batches, by first saving half of the coefficients
> 339:   // from level 1 into memory, doing all the level 2 to level 7 computations

In lines 344-347, we seem to be storing all the coefficients from level 1 into memory, not half of them as the comment says.

src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 345:

> 343: 
> 344:   store4Xmms(coeffs, 0, xmm0_3, _masm);
> 345:   store4Xmms(coeffs, 4 * XMMBYTES, xmm4_7, _masm);

This seems to be an unnecessary store.

src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 370:

> 368:   loadPerm(xmm16_19, perms, nttL4PermsIdx, _masm);
> 369:   loadPerm(xmm12_15, perms, nttL4PermsIdx + 64, _masm);
> 370:   load4Xmms(xmm24_27, zetas, 4 * 512, _masm); // for level 3

The comment "// for level 3" is not relevant here and could be removed.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r2029437396
PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r2029578599
PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r2029583308