RFR: 8351034: Add AVX-512 intrinsics for ML-DSA [v5]

Volodymyr Paprotski vpaprotski at openjdk.org
Mon Mar 17 21:49:14 UTC 2025


On Thu, 6 Mar 2025 17:37:33 GMT, Ferenc Rakoczi <duke at openjdk.org> wrote:

>> By using the AVX-512 vector registers the speed of the computation of the ML-DSA algorithms (key generation, document signing, signature verification) can be approximately doubled.
>
> Ferenc Rakoczi has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Accepted review comments.

src/hotspot/cpu/x86/stubGenerator_x86_64.hpp line 494:

> 492:   address generate_sha3_implCompress(StubGenStubId stub_id);
> 493: 
> 494:   address generate_double_keccak();

You can hide internal helper functions (e.g. `montmulEven(*)`) if you wish.

The trick is to add `MacroAssembler* _masm` as a parameter to the static (local) function. It's a trick I use to keep the header clean while still having plenty of helpers.
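
A minimal sketch of that trick, using a mock `MacroAssembler` rather than the real HotSpot class. HotSpot stub generator sources define `__` as `_masm->`, so a file-local static helper whose parameter is named `_masm` can use the same shorthand without being declared in the header. The `load_constants` helper, the mock `movl`/`ret` methods, and the simplified `StubGenerator` shape below are hypothetical and only illustrate the pattern:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Mock stand-in for HotSpot's MacroAssembler (hypothetical, illustration only):
// it just records the "instructions" it is asked to emit.
class MacroAssembler {
 public:
  std::vector<std::string> code;
  void movl(int reg, int imm) {
    code.push_back("movl r" + std::to_string(reg) + ", " + std::to_string(imm));
  }
  void ret() { code.push_back("ret"); }
};

// Same shorthand the HotSpot stub generators use.
#define __ _masm->

// File-local helper: not declared in any header. Because its parameter is
// named _masm, the __ macro expands correctly inside it.
static void load_constants(int first_reg, int count, MacroAssembler* _masm) {
  for (int i = 0; i < count; i++) {
    __ movl(first_reg + i, 64 * i);
  }
}

// Simplified mock of the class whose header we want to keep small.
class StubGenerator {
  MacroAssembler* _masm;
 public:
  explicit StubGenerator(MacroAssembler* masm) : _masm(masm) {}
  // Only this entry point would need a declaration in the header.
  void generate_stub() {
    load_constants(0, 3, _masm);  // the helper shares the same assembler
    __ ret();
  }
};

#undef __
```

Only `generate_stub()` would appear in the header here; `load_constants` stays private to the .cpp file.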

src/hotspot/cpu/x86/stubGenerator_x86_64_sha3.cpp line 409:

> 407:   __ evmovdquq(xmm29, Address(permsAndRots, 768), Assembler::AVX_512bit);
> 408:   __ evmovdquq(xmm30, Address(permsAndRots, 832), Assembler::AVX_512bit);
> 409:   __ evmovdquq(xmm31, Address(permsAndRots, 896), Assembler::AVX_512bit);

This is a matter of taste, but I liked the compactness of montmulEven, e.g.:

for (int i = 0; i < 15; i++) {
    __ evmovdquq(xmm(17+i), Address(permsAndRots, 64*i), Assembler::AVX_512bit);
}
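
As a quick sanity check of the index arithmetic in that loop, the sketch below computes the (register number, byte offset) pairs it would visit. It assumes the unrolled sequence starts at xmm17 with offset 0 and advances by 64 bytes (one 512-bit register) per load; only the last three loads are visible in the quoted diff, so the earlier ones are an assumption. `loop_loads` is a hypothetical helper, not HotSpot code:

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Returns the (xmm register number, byte offset) pairs visited by
//   for (int i = 0; i < 15; i++) load xmm(17 + i) from permsAndRots + 64 * i
// Hypothetical helper for checking the arithmetic only.
static std::vector<std::pair<int, int>> loop_loads() {
  std::vector<std::pair<int, int>> loads;
  for (int i = 0; i < 15; i++) {
    loads.emplace_back(17 + i, 64 * i);
  }
  return loads;
}
```

The last three pairs are (29, 768), (30, 832) and (31, 896), which matches the quoted lines 407-409.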

src/hotspot/cpu/x86/stubGenerator_x86_64_sha3.cpp line 426:

> 424:   __ subl( roundsLeft, 1);
> 425: 
> 426:   __ evmovdquw(xmm5, xmm0, Assembler::AVX_512bit);

Is there a pattern here that can be 'compacted' into a loop?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r1983903347
PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r1983935964
PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r1983937154


More information about the hotspot-dev mailing list