RFR: 8349721: Add aarch64 intrinsics for ML-KEM [v7]
Andrew Dinn
adinn at openjdk.org
Thu Apr 10 14:42:35 UTC 2025
On Thu, 10 Apr 2025 13:19:05 GMT, Ferenc Rakoczi <duke at openjdk.org> wrote:
>> By using the aarch64 vector registers the speed of the computation of the ML-KEM algorithms (key generation, encapsulation, decapsulation) can be approximately doubled.
>
> Ferenc Rakoczi has updated the pull request incrementally with two additional commits since the last revision:
>
> - Code rearrange, some renaming, fixing comments
> - Changes suggested by Andrew Dinn.
src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 5300:
> 5298: // level 5
> 5299: vs_ldpq(vq, kyberConsts);
> 5300: int offsets4[4] = { 0, 32, 64, 96 };
Again a comment
// At level 5 related coefficients occur in discrete blocks of size 8 so
// need to be loaded interleaved using an ld2 operation with arrangement 2D
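For readers unfamiliar with ld2, the de-interleaving the suggested comment relies on can be sketched as below. This is only a scalar model, not code from the PR: the names `Reg2D`, `model_ld2_2d` and `model_ldpq` are hypothetical, and it assumes a "block of size 8" corresponds to one 64-bit lane of the 2D arrangement.

```cpp
#include <array>
#include <cstdint>
#include <utility>

// Hypothetical model of a 128-bit vector register viewed as two 64-bit lanes (2D).
using Reg2D = std::array<uint64_t, 2>;

// Models ld2 {Va.2D, Vb.2D}, [mem]: 32 bytes are read and de-interleaved,
// so memory lanes 0 and 2 land in the first register and lanes 1 and 3
// land in the second register.
inline std::pair<Reg2D, Reg2D> model_ld2_2d(const std::array<uint64_t, 4>& mem) {
  Reg2D a = { mem[0], mem[2] };
  Reg2D b = { mem[1], mem[3] };
  return { a, b };
}

// Models a plain load of two consecutive Q registers (as ldp of q registers
// would do): memory lanes 0 and 1 go to the first register, lanes 2 and 3
// to the second, with no de-interleaving.
inline std::pair<Reg2D, Reg2D> model_ldpq(const std::array<uint64_t, 4>& mem) {
  Reg2D a = { mem[0], mem[1] };
  Reg2D b = { mem[2], mem[3] };
  return { a, b };
}
```

With alternating 8-byte blocks in memory, the ld2 model places every even block in one register and every odd block in the other, which is what lets related coefficients be paired lane by lane.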
src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 5319:
> 5317: vs_st2_indexed(vs1, __ T2D, coeffs, tmpAddr, 384, offsets4);
> 5318:
> 5319: // level 6
And again
// At level 6 related coefficients occur in discrete blocks of size 4 so
// need to be loaded interleaved using an ld2 operation with arrangement 4S
src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 5377:
> 5375: // level 0
> 5376: vs_ldpq(vq, kyberConsts);
> 5377: int offsets4[4] = { 0, 32, 64, 96 };
Again a comment
// At level 0 related coefficients occur in discrete blocks of size 4 so
// need to be loaded interleaved using an ld2 operation with arrangement 4S
src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 5399:
> 5397: vs_st2_indexed(vs1, __ T4S, coeffs, tmpAddr, 384, offsets4);
> 5398:
> 5399: // level 1
Again a comment
// At level 1 related coefficients occur in discrete blocks of size 8 so
// need to be loaded interleaved using an ld2 operation with arrangement 2D
src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 5423:
> 5421:
> 5422: // level 2
> 5423: int offsets3[8] = { 0, 32, 64, 96, 128, 160, 192, 224 };
Again
// At level 2 coefficients occur in 8 discrete blocks of size 16
// so they are loaded using an ldr at 8 distinct offsets.
src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 5464:
> 5462: vs_str_indexed(vs1, __ Q, coeffs, 256, offsets3);
> 5463:
> 5464: // level 3
// From level 3 upwards coefficients occur in discrete blocks whose size is
// some multiple of 32 so can be loaded using ldpq and suitable indexes.
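Taken together, the suggested comments imply a mapping from block size to load strategy. A minimal sketch of that mapping follows. The function `load_strategy` is hypothetical and is not part of the stub generator; block sizes are in whatever unit the comments above use.

```cpp
#include <string>

// Hypothetical helper: returns the load strategy that the suggested
// comments associate with a given discrete block size.
inline std::string load_strategy(int blockSize) {
  if (blockSize == 4)  return "ld2 with arrangement 4S";
  if (blockSize == 8)  return "ld2 with arrangement 2D";
  if (blockSize == 16) return "ldr (Q) at distinct offsets";
  if (blockSize > 0 && blockSize % 32 == 0) return "ldpq with suitable indexes";
  return "not covered by the comments";
}
```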
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/23663#discussion_r2037571231
PR Review Comment: https://git.openjdk.org/jdk/pull/23663#discussion_r2037573218
PR Review Comment: https://git.openjdk.org/jdk/pull/23663#discussion_r2037577265
PR Review Comment: https://git.openjdk.org/jdk/pull/23663#discussion_r2037578385
PR Review Comment: https://git.openjdk.org/jdk/pull/23663#discussion_r2037581149
PR Review Comment: https://git.openjdk.org/jdk/pull/23663#discussion_r2037585101