RFR: 8351034: Add AVX-512 intrinsics for ML-DSA [v3]

Jatin Bhateja jbhateja at openjdk.org
Wed Mar 5 14:05:53 UTC 2025


On Wed, 5 Mar 2025 13:07:54 GMT, Ferenc Rakoczi <duke at openjdk.org> wrote:

>> src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 292:
>> 
>>> 290:   __ movl(iterations, 2);
>>> 291: 
>>> 292:   __ BIND(L_loop);
>> 
>> Hi @ferakocz , Kindly align loop entry address using __align64() here and at all the places before __BIND(LOOP)
>
> Hi, @jatin-bhateja, thanks for the suggestion. I have added __ align(OptoLoopAlignment); before all loop entries.

Hi @ferakocz , 

Thanks! For efficient utilization of the Decoded ICache (please refer to Intel SDM section 3.4.2.5), code blocks should be aligned to 32-byte boundaries. Every 64-byte-aligned address is also 16- and 32-byte aligned, and 64 bytes matches the cache line size. However, I noticed that we have also been using OptoLoopAlignment in places in AES-GCM.

I deliberately introduced some errors into the generate_dilithiumAlmostInverseNtt_avx512 implementation, expecting them to be caught by the existing ML_DSA_Tests under
test/jdk/sun/security/provider/acvp

But all the tests passed for me.
`java  -jar /home/jatinbha/sandboxes/jtreg/build/images/jtreg/lib/jtreg.jar -jdk:$JAVA_HOME -Djdk.test.lib.artifacts.ACVP-Server=/home/jatinbha/softwares/v1.1.0.38.zip -va -timeout:4 Launcher.java`

Could you please point me to a test I should use for validation?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r1981468903