RFR: 8351034: Add AVX-512 intrinsics for ML-DSA [v3]
Jatin Bhateja
jbhateja at openjdk.org
Wed Mar 5 14:05:53 UTC 2025
On Wed, 5 Mar 2025 13:07:54 GMT, Ferenc Rakoczi <duke at openjdk.org> wrote:
>> src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 292:
>>
>>> 290: __ movl(iterations, 2);
>>> 291:
>>> 292: __ BIND(L_loop);
>>
>> Hi @ferakocz , Kindly align loop entry address using __align64() here and at all the places before __BIND(LOOP)
>
> Hi, @jatin-bhateja, thanks for the suggestion. I have added __ align(OptoLoopAlignment); before all loop entries.
Hi @ferakocz ,
Thanks! For efficient utilization of the Decoded ICache (please refer to Intel SDM section 3.4.2.5), code blocks should be aligned to 32-byte boundaries; a 64-byte-aligned address is also 16- and 32-byte aligned, and it matches the cache line size. However, I noticed that we have been using OptoLoopAlignment in places in AES-GCM as well.
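To illustrate the alignment arithmetic only, here is a minimal sketch. The helpers is_aligned and align_up are hypothetical names for this example, not HotSpot MacroAssembler APIs, and the addresses are arbitrary:

```cpp
#include <cstdint>

// Hypothetical helpers for illustration; not part of HotSpot.
// Both assume 'alignment' is a power of two.
constexpr bool is_aligned(std::uint64_t addr, std::uint64_t alignment) {
    return (addr & (alignment - 1)) == 0;
}

// Round 'addr' up to the next multiple of 'alignment'.
constexpr std::uint64_t align_up(std::uint64_t addr, std::uint64_t alignment) {
    return (addr + alignment - 1) & ~(alignment - 1);
}

// An address rounded up to 64 bytes also meets the 32- and 16-byte requirements.
static_assert(is_aligned(align_up(0x1234, 64), 64), "64-byte aligned");
static_assert(is_aligned(align_up(0x1234, 64), 32), "hence 32-byte aligned");
static_assert(is_aligned(align_up(0x1234, 64), 16), "hence 16-byte aligned");

// The converse does not hold: 0x20 is 32-byte aligned but not 64-byte aligned.
static_assert(is_aligned(0x20, 32) && !is_aligned(0x20, 64), "32 does not imply 64");
```

The trade-off is padding: a larger alignment can insert more padding bytes ahead of each loop entry.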
I deliberately introduced some errors in the generate_dilithiumAlmostInverseNtt_avx512 implementation, expecting the existing ML_DSA_Tests under
test/jdk/sun/security/provider/acvp
to catch them, but all the tests passed for me.
`java -jar /home/jatinbha/sandboxes/jtreg/build/images/jtreg/lib/jtreg.jar -jdk:$JAVA_HOME -Djdk.test.lib.artifacts.ACVP-Server=/home/jatinbha/softwares/v1.1.0.38.zip -va -timeout:4 Launcher.java`
Can you please point out a test I need to use for validation?
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r1981468903