RFR: 8348561: Add aarch64 intrinsics for ML-DSA [v5]
Ferenc Rakoczi
duke at openjdk.org
Fri Feb 21 10:14:00 UTC 2025
On Tue, 18 Feb 2025 13:43:18 GMT, Andrew Dinn <adinn at openjdk.org> wrote:
>> Ferenc Rakoczi has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Adding comments + some code reorganization
>
> src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 4066:
>
>> 4064: }
>> 4065:
>> 4066: // Execute on round of keccak of two computations in parallel.
>
> Suggestion:
>
> It would be helpful to add comments that relate the register and instruction selection to the original Java source code. e.g. change the header as follows
>
> // Performs 2 keccak round transformations using vector parallelism
> //
> // Two sets of 25 * 64-bit input states a0[lo:hi]...a24[lo:hi] are passed in
> // the lower/upper halves of registers v0...v24 and the transformed states
> // are returned in the same registers. Intermediate 64-bit pairs
> // c0...c5 and d0...d5 are computed in registers v25...v30. v31 is
> // loaded with the required pair of 64 bit rounding constants.
> // During computation of the output states some intermediate results are
> // shuffled around registers v0...v30. Comments on each line indicate
> // how the values in registers correspond to variables ai, ci, di in
> // the Java source code, likewise how the generated machine instructions
> // correspond to Java source operations (n.b. rol means rotate left).
>
> The annotate the generation steps as follows:
>
> __ eor3(v29, __ T16B, v4, v9, v14); // c4 = a4 ^ a9 ^ a14
> __ eor3(v26, __ T16B, v1, v6, v11); // c1 = a1 ^ a16 ^ a11
> __ eor3(v28, __ T16B, v3, v8, v13); // c3 = a3 ^ a8 ^a13
> __ eor3(v25, __ T16B, v0, v5, v10); // c0 = a0 ^ a5 ^ a10
> __ eor3(v27, __ T16B, v2, v7, v12); // c2 = a2 ^ a7 ^ a12
> __ eor3(v29, __ T16B, v29, v19, v24); // c4 ^= a19 ^ a24
> __ eor3(v26, __ T16B, v26, v16, v21); // c1 ^= a16 ^ a21
> __ eor3(v28, __ T16B, v28, v18, v23); // c3 ^= a18 ^ a23
> __ eor3(v25, __ T16B, v25, v15, v20); // c0 ^= a15 ^ a20
> __ eor3(v27, __ T16B, v27, v17, v22); // c2 ^= a17 ^ a22
>
> __ rax1(v30, __ T2D, v29, v26); // d0 = c4 ^ rol(c1, 1)
> __ rax1(v26, __ T2D, v26, v28); // d2 = c1 ^ rol(c3, 1)
> __ rax1(v28, __ T2D, v28, v25); // d4 = c3 ^ rol(c0, 1)
> __ rax1(v25, __ T2D, v25, v27); // d1 = c0 ^ rol(c2, 1)
> __ rax1(v27, __ T2D, v27, v29); // d3 = c2 ^ rol(c4, 1)
>
> __ eor(v0, __ T16B, v0, v30); // a0 = a0 ^ d0
> __ xar(v29, __ T2D, v1, v25, (64 - 1)); // a10' = rol((a1^d1), 1)
> __ xar(v1, __ T2D, v6, v25, (64 - 44)); // a1 = rol(a6^d1), 44)
> __ xar(v6, __ T2D, v9, v28, (64 - 20)); // a6 = rol((a9^d4), 20)
> __ xar(v...
Although this piece of code is not new, and I don't really think that this level of commenting is necessary, especially in code that is very unlikely to change, I added the comments.
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/23300#discussion_r1965220606
More information about the hotspot-dev
mailing list