RFR: 8351034: Add AVX-512 intrinsics for ML-DSA [v13]
Sandhya Viswanathan
sviswanathan at openjdk.org
Sat Apr 5 00:44:56 UTC 2025
On Wed, 2 Apr 2025 07:38:34 GMT, Ferenc Rakoczi <duke at openjdk.org> wrote:
>> By using the AVX-512 vector registers the speed of the computation of the ML-DSA algorithms (key generation, document signing, signature verification) can be approximately doubled.
>
> Ferenc Rakoczi has updated the pull request incrementally with one additional commit since the last revision:
>
> Reacting to comment by Sandhya.
src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 339:
> 337:
> 338: // levels 2 to 7 are done in 2 batches, by first saving half of the coefficients
> 339: // from level 1 into memory, doing all the level 2 to level 7 computations
In lines 344 to 347, we seem to be storing all the coefficients from level 1 into memory, not just half of them.
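For context on why a two-batch split is possible at all, here is a minimal scalar sketch, not the stub code itself. It assumes the levels are numbered 0 to 7 with butterfly distance 128 >> level over 256 coefficients, and it uses a made-up placeholderZeta function instead of the real zeta table. All function names are hypothetical. Under those assumptions, once levels 0 and 1 are done, no later butterfly crosses the boundary between coefficients [0, 128) and [128, 256), so each half can go through levels 2 to 7 on its own.

```cpp
#include <array>
#include <cstdint>

constexpr int N = 256;          // coefficients per polynomial
constexpr int64_t Q = 8380417;  // ML-DSA modulus

using Poly = std::array<int64_t, N>;

// Placeholder twiddle factor. This is NOT the real ML-DSA zeta table;
// any deterministic value in [0, Q) is enough to show the access pattern.
inline int64_t placeholderZeta(int k) {
    return (static_cast<int64_t>(k) * 1753 + 1) % Q;
}

// Apply one butterfly level with distance `len` to coefficients [begin, end).
// `begin` must be a multiple of 2 * len.
void levelOnRange(Poly& a, int len, int begin, int end) {
    for (int start = begin; start < end; start += 2 * len) {
        int64_t zeta = placeholderZeta(N / (2 * len) + start / (2 * len));
        for (int j = start; j < start + len; ++j) {
            int64_t t = zeta * a[j + len] % Q;
            a[j + len] = (a[j] - t + Q) % Q;
            a[j] = (a[j] + t) % Q;
        }
    }
}

// Reference order: every level over the whole polynomial.
Poly nttAllLevels(Poly a) {
    for (int level = 0; level < 8; ++level) {
        levelOnRange(a, 128 >> level, 0, N);
    }
    return a;
}

// Batched order: levels 0 and 1 over everything, then levels 2 to 7
// on the first half, then levels 2 to 7 on the second half.
Poly nttTwoBatches(Poly a) {
    levelOnRange(a, 128, 0, N);
    levelOnRange(a, 64, 0, N);
    for (int half = 0; half < N; half += N / 2) {
        for (int level = 2; level < 8; ++level) {
            levelOnRange(a, 128 >> level, half, half + N / 2);
        }
    }
    return a;
}
```

In this sketch both orders produce the same result for any input in [0, Q), so only the half that is not currently being processed needs to live in memory between the two batches.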
src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 345:
> 343:
> 344: store4Xmms(coeffs, 0, xmm0_3, _masm);
> 345: store4Xmms(coeffs, 4 * XMMBYTES, xmm4_7, _masm);
This seems to be an unnecessary store.
src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 370:
> 368: loadPerm(xmm16_19, perms, nttL4PermsIdx, _masm);
> 369: loadPerm(xmm12_15, perms, nttL4PermsIdx + 64, _masm);
> 370: load4Xmms(xmm24_27, zetas, 4 * 512, _masm); // for level 3
The comment "// for level 3" is not relevant here and could be removed.
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r2029437396
PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r2029578599
PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r2029583308