RFR: 8351034: Add AVX-512 intrinsics for ML-DSA [v5]
Volodymyr Paprotski
vpaprotski at openjdk.org
Mon Mar 17 21:49:14 UTC 2025
On Thu, 6 Mar 2025 17:37:33 GMT, Ferenc Rakoczi <duke at openjdk.org> wrote:
>> By using the AVX-512 vector registers the speed of the computation of the ML-DSA algorithms (key generation, document signing, signature verification) can be approximately doubled.
>
> Ferenc Rakoczi has updated the pull request incrementally with one additional commit since the last revision:
>
> Accepted review comments.
src/hotspot/cpu/x86/stubGenerator_x86_64.hpp line 494:
> 492: address generate_sha3_implCompress(StubGenStubId stub_id);
> 493:
> 494: address generate_double_keccak();
You can hide internal helper functions (e.g. `montmulEven(*)`) if you wish.
The trick is to add `MacroAssembler* _masm` as a parameter to the static (file-local) function. It's a trick I use to keep the header clean while still having plenty of helpers.
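A minimal sketch of that trick, runnable outside the JDK build. Everything here is a stand-in: the `MacroAssembler` struct only records text, `StubGenerator` is a toy class, and `copy_pair` is a hypothetical helper name. The only point is that `__` resolves against whatever `_masm` is in scope, so a file-local function taking `_masm` as a parameter needs no declaration in the header.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-in for HotSpot's MacroAssembler (assumption): it only records the
// requested moves as text so the sketch can run on its own.
struct MacroAssembler {
  std::vector<std::string> emitted;
  void move(int dst, int src) {
    assert(dst >= 0 && dst < 32 && src >= 0 && src < 32);  // zmm0..zmm31
    emitted.push_back("mov xmm" + std::to_string(dst) +
                      ", xmm" + std::to_string(src));
  }
};

// Same convention as the stub generators: `__` expands to a call on the
// `_masm` that is visible at the point of use.
#define __ _masm->

// File-local helper (hypothetical name). It is not a member, so it never
// appears in the header; naming the parameter `_masm` keeps `__` working.
static void copy_pair(int dst, int src, MacroAssembler* _masm) {
  __ move(dst, src);
  __ move(dst + 1, src + 1);
}

// Toy generator class: only generate_stub() would need a header declaration.
class StubGenerator {
  MacroAssembler* _masm;
 public:
  explicit StubGenerator(MacroAssembler* masm) : _masm(masm) {}
  void generate_stub() {
    __ move(5, 0);            // here `__` uses the member _masm
    copy_pair(6, 1, _masm);   // here it is handed to the helper explicitly
  }
};
```

The helper reads like member code because of the parameter name, which is the whole trick; a differently named parameter would compile only if the helper avoided the `__` macro.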
src/hotspot/cpu/x86/stubGenerator_x86_64_sha3.cpp line 409:
> 407: __ evmovdquq(xmm29, Address(permsAndRots, 768), Assembler::AVX_512bit);
> 408: __ evmovdquq(xmm30, Address(permsAndRots, 832), Assembler::AVX_512bit);
> 409: __ evmovdquq(xmm31, Address(permsAndRots, 896), Assembler::AVX_512bit);
Matter of taste, but I liked the compactness of montmulEven; e.g.
for (int i = 0; i < 15; i++) {
  __ evmovdquq(xmm(17 + i), Address(permsAndRots, 64 * i), Assembler::AVX_512bit);
}
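As a sanity check on the indices, here is a small standalone sketch that lists the (register number, byte offset) pairs the loop above would visit. It assumes the unrolled sequence starts at xmm17 with offset 0, as the loop implies; only the last three loads (xmm29 to xmm31) are visible in the quoted context. `loop_operands` is a hypothetical name used for illustration.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Collects the (register number, byte offset) pairs visited by
// `for (int i = 0; i < 15; i++)` with xmm(17 + i) and 64 * i.
// 64 is the size in bytes of one 512-bit register.
static std::vector<std::pair<int, int>> loop_operands() {
  std::vector<std::pair<int, int>> ops;
  for (int i = 0; i < 15; i++) {
    ops.emplace_back(17 + i, 64 * i);
  }
  assert(ops.back().first <= 31);  // AVX-512 has registers 0..31 only
  return ops;
}
```

The last three entries correspond to the quoted lines 407 to 409 (offsets 768, 832 and 896 for xmm29, xmm30 and xmm31).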
src/hotspot/cpu/x86/stubGenerator_x86_64_sha3.cpp line 426:
> 424: __ subl( roundsLeft, 1);
> 425:
> 426: __ evmovdquw(xmm5, xmm0, Assembler::AVX_512bit);
Is there a pattern here that can be 'compacted' into a loop?
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r1983903347
PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r1983935964
PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r1983937154