RFR: 8349721: Add aarch64 intrinsics for ML-KEM [v7]
Andrew Dinn
adinn at openjdk.org
Tue Apr 15 14:18:54 UTC 2025
On Thu, 10 Apr 2025 13:19:05 GMT, Ferenc Rakoczi <duke at openjdk.org> wrote:
>> By using the aarch64 vector registers, the speed of the ML-KEM computations (key generation, encapsulation, decapsulation) can be approximately doubled.
>
> Ferenc Rakoczi has updated the pull request incrementally with two additional commits since the last revision:
>
> - Code rearrange, some renaming, fixing comments
> - Changes suggested by Andrew Dinn.
src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 5665:
> 5663: vs_ld2_post(vs_back(vs1), __ T8H, nttb);
> 5664: vs_ld2_post(vs_front(vs4), __ T8H, ntta);
> 5665: vs_ld2_post(vs_back(vs4), __ T8H, nttb);
Suggestion:
vs_ld2_post(vs_front(vs1), __ T8H, ntta); // <a0, a1> x 8H
vs_ld2_post(vs_back(vs1), __ T8H, nttb); // <b0, b1> x 8H
vs_ld2_post(vs_front(vs4), __ T8H, ntta); // <a2, a3> x 8H
vs_ld2_post(vs_back(vs4), __ T8H, nttb); // <b2, b3> x 8H
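
For readers less familiar with the load pattern annotated above, here is a scalar sketch of what a two-register de-interleaving load with the 8H arrangement does. It assumes that vs_ld2_post wraps the AArch64 LD2 instruction with post-increment addressing; the names Lanes, Ld2Result and ld2_8h are hypothetical and are not part of the patch.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical scalar stand-in for one 128-bit vector register viewed as 8 halfwords.
struct Lanes { int16_t v[8]; };

// The two registers written by a single two-register de-interleaving load.
struct Ld2Result { Lanes even; Lanes odd; };

// Models LD2 {Vt.8H, Vt2.8H}: reads 16 consecutive halfwords and splits them so
// that even-indexed elements land in the first register and odd-indexed elements
// land in the second register.
constexpr Ld2Result ld2_8h(const int16_t* p) {
  Ld2Result r{};
  for (std::size_t i = 0; i < 8; ++i) {
    r.even.v[i] = p[2 * i];
    r.odd.v[i] = p[2 * i + 1];
  }
  return r;
}

// Sixteen coefficients laid out as consecutive (a0, a1) pairs, matching the
// <a0, a1> x 8H annotation.
constexpr int16_t kDemo[16] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15};
static_assert(ld2_8h(kDemo).even.v[3] == 6, "lane 3 of the first register holds element 6");
static_assert(ld2_8h(kDemo).odd.v[3] == 7, "lane 3 of the second register holds element 7");
```

Under the same assumption, the post-increment form would advance the address by 16 halfwords, so a second load from the same pointer register would pick up the next eight pairs (the <a2, a3> case).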
src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 5668:
> 5666: // montmul the first and second pair of values loaded into vs1
> 5667: // in order and then with one pair reversed storing the two
> 5668: // results in vs3
Suggestion:
// compute the 4 montmul cross-products for pairs (a0,a1) and (b0,b1),
// i.e. montmul the first and second halves of vs1 in order and
// then with one sequence reversed, storing the two results in vs3
//
// vs3[0] <- montmul(a0, b0)
// vs3[1] <- montmul(a1, b1)
// vs3[2] <- montmul(a0, b1)
// vs3[3] <- montmul(a1, b0)
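
As a scalar reference for the four products listed above, here is a minimal sketch of a signed Montgomery multiplication modulo q = 3329 with R = 2^16, written in the style of the reference Kyber C code. The names montmul, Cross and cross_products are hypothetical; the stub's kyber_montmul16 works on whole vector sequences, and its exact instruction sequence and output range may differ from this model.

```cpp
#include <cstdint>

constexpr int32_t kQ = 3329;        // ML-KEM modulus
constexpr uint32_t kQInv = 62209;   // q^-1 mod 2^16 (3329 * 62209 == 1 mod 65536)

// Interprets the low 16 bits of x as a signed 16-bit value.
constexpr int32_t low16_signed(uint32_t x) {
  int32_t lo = static_cast<int32_t>(x & 0xFFFFu);
  return lo >= 32768 ? lo - 65536 : lo;
}

// Returns a value congruent to a * b * 2^-16 modulo q.
constexpr int16_t montmul(int16_t a, int16_t b) {
  int32_t prod = int32_t{a} * int32_t{b};
  // t is chosen so that prod - t * q is an exact multiple of 2^16.
  int32_t t = low16_signed(static_cast<uint32_t>(prod) * kQInv);
  return static_cast<int16_t>((prod - t * kQ) / 65536);
}

// The four cross-products for one (a0, a1), (b0, b1) pair, in the order
// used by the vs3[0..3] comments.
struct Cross { int16_t a0b0, a1b1, a0b1, a1b0; };

constexpr Cross cross_products(int16_t a0, int16_t a1, int16_t b0, int16_t b1) {
  return Cross{montmul(a0, b0), montmul(a1, b1), montmul(a0, b1), montmul(a1, b0)};
}

// 2285 is 2^16 mod q, so multiplying by it cancels the 2^-16 factor.
static_assert(montmul(1, 2285) == 1, "montmul(x, R mod q) is congruent to x");
```

In the vectorized code each of these scalars corresponds to a register of eight lanes, so one call covers eight coefficient pairs at once.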
src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 5674:
> 5672: // montmul the first and second pair of values loaded into vs4
> 5673: // in order and then with one pair reversed storing the two
> 5674: // results in vs1
Suggestion:
// compute the 4 montmul cross-products for pairs (a2,a3) and (b2,b3),
// i.e. montmul the first and second halves of vs4 in order and
// then with one sequence reversed, storing the two results in vs1
//
// vs1[0] <- montmul(a2, b2)
// vs1[1] <- montmul(a3, b3)
// vs1[2] <- montmul(a2, b3)
// vs1[3] <- montmul(a3, b2)
src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 5680:
> 5678: // for each pair of results pick the second value in the first
> 5679: // pair to create a sequence that we montmul by the zetas
> 5680: // i.e. we want sequence <vs3[1], vs1[1]>
Suggestion:
// montmul the second result of each cross-product, i.e. (a1*b1, a3*b3),
// by a zeta. We can schedule two montmuls at a time if we use a suitable
// vector sequence <vs3[1], vs1[1]>.
src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 5683:
> 5681: int delta = vs1[1]->encoding() - vs3[1]->encoding();
> 5682: VSeq<2> vs5(vs3[1], delta);
> 5683: kyber_montmul16(vs5, vz, vs5, vs_front(vs2), vq);
Suggestion:
// vs3[1] <- montmul(montmul(a1, b1), z0)
// vs1[1] <- montmul(montmul(a3, b3), z1)
kyber_montmul16(vs5, vz, vs5, vs_front(vs2), vq);
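
To show where the extra multiplication by a zeta fits, here is a specification-level sketch of the degree-1 product (a0 + a1 X)(b0 + b1 X) mod (X^2 - zeta), using plain modular arithmetic in place of the vector code. It assumes, as in the reference Kyber implementation, that the zeta operand is stored pre-multiplied by R = 2^16; whether the stub's zeta table follows the same convention is not stated in this thread. All names (montmul_spec, basemul_spec, Coeffs) are hypothetical.

```cpp
#include <cstdint>

constexpr int64_t Q = 3329;     // ML-KEM modulus
constexpr int64_t R = 2285;     // 2^16 mod q
constexpr int64_t R_INV = 169;  // 2^-16 mod q, since 2285 * 169 == 1 (mod 3329)

// Canonical representative of x modulo q in [0, q).
constexpr int64_t mod_q(int64_t x) {
  int64_t r = x % Q;
  return r < 0 ? r + Q : r;
}

// Specification of a Montgomery product: a * b * 2^-16 mod q.
constexpr int64_t montmul_spec(int64_t a, int64_t b) {
  return mod_q(mod_q(a * b) * R_INV);
}

struct Coeffs { int64_t c0, c1; };

// zeta_mont is assumed to be zeta * R mod q, so the second montmul on the
// a1*b1 term leaves it with a single 2^-16 factor, like the other products.
constexpr Coeffs basemul_spec(int64_t a0, int64_t a1, int64_t b0, int64_t b1,
                              int64_t zeta_mont) {
  int64_t a1b1z = montmul_spec(montmul_spec(a1, b1), zeta_mont);
  return Coeffs{mod_q(montmul_spec(a0, b0) + a1b1z),
                mod_q(montmul_spec(a0, b1) + montmul_spec(a1, b0))};
}

// Sample values only: multiplying the result by R recovers the plain product.
static_assert(mod_q(basemul_spec(3, 5, 7, 11, mod_q(17 * R)).c0 * R) ==
                  mod_q(3 * 7 + 5 * 11 * 17),
              "c0 is (a0*b0 + a1*b1*zeta) scaled by 2^-16");
```

Under that assumption every term of the result carries the same single 2^-16 factor, which is why only the a1*b1 (and a3*b3) products need the second montmul.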
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/23663#discussion_r2044679089
PR Review Comment: https://git.openjdk.org/jdk/pull/23663#discussion_r2044682671
PR Review Comment: https://git.openjdk.org/jdk/pull/23663#discussion_r2044684696
PR Review Comment: https://git.openjdk.org/jdk/pull/23663#discussion_r2044689607
PR Review Comment: https://git.openjdk.org/jdk/pull/23663#discussion_r2044691632