RFR: 8351034: Add AVX-512 intrinsics for ML-DSA [v7]
Volodymyr Paprotski
vpaprotski at openjdk.org
Sat Mar 22 20:05:12 UTC 2025
On Thu, 20 Mar 2025 21:06:30 GMT, Ferenc Rakoczi <duke at openjdk.org> wrote:
>> src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 58:
>>
>>> 56:
>>> 57: ATTRIBUTE_ALIGNED(64) static const uint32_t dilithiumAvx512Perms[] = {
>>> 58: // collect montmul results into the destination register
>>
>> same as `dilithiumAvx512Consts()`, 'magic offsets'; except here they are harder to count (eg. not clear visually what is the offset of `ntt inverse`).
>>
>> Could be split into three constant arrays to make the compiler count for us
>
> Well, it is 64 bytes per line (16 4-byte uint32_ts), not that hard :-) ...
Ha! I didn't realize it was 16 per line.. ran out of fingers while counting!!! :)
'works for me, as long as its a "premeditated" decision'
>> src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 980:
>>
>>> 978: // Dilithium multiply polynomials in the NTT domain.
>>> 979: // Implements
>>> 980: // static int implDilithiumNttMult(
>>
>> I suppose no java changes in this PR, but I notice that the inputs are all assumed to have fixed size.
>>
>> Most/all intrinsics I worked with had some sort of guard (eg `Objects.checkFromIndexSize`) right before the intrinsic java call. (It usually looks like it can be optimized away). But I notice no such guard here on the java side.
>
> These functions will not be used anywhere else and in ML_DSA.java all of the arrays passed to inrinsics are of the correct size.
Works for me; just thought I would point it out, so its a 'premeditated' decision.
>> src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 1012:
>>
>>> 1010: __ evmovdqul(xmm28, Address(perms, 0), Assembler::AVX_512bit);
>>> 1011:
>>> 1012: __ movl(len, 4);
>>
>> Compile-time constant, why not 'unroll at compile time'? i.e. wrap this loop with `for (int len=0; len<4; len++)` instead?
>
> I have found that unrolling these loops actually hurts performance (probably an I-cache effect.
Interesting; I keep on having to re-train my intuition, thanks for the data
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r2008806159
PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r2008805574
PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r2008805113
More information about the hotspot-dev
mailing list