RFR: 8316592: RISC-V: implement poly1305 intrinsic [v7]

Fri Nov 10 10:04:04 UTC 2023

On Thu, 9 Nov 2023 18:54:26 GMT, Hamlin Li <mli at openjdk.org> wrote:

>> @theRealAph Besides the reduce discussed above, I still have several other questions about the original aarch64 implementation (this implementation is largely a translation of it). Would you mind helping to clarify? Thanks!
>> 1. `__ add(S_2, S_2, 1);` at L7168 of stubGenerator_aarch64.cpp: what is the purpose of adding 1?
>> 2. `wide_mul(U_1, U_1HI, S_0, R_1);  wide_madd(U_1, U_1HI, S_1, R_0);  wide_madd(U_1, U_1HI, S_2, RR_1);` at L7178 of stubGenerator_aarch64.cpp: why does S_2*RR_1 not consider the low 2 bits of R_1, but only the higher bits (which are in RR_1)?
>
> It seems we don't need the extra reduce introduced in commit `053b7c0`.
> Imagine this situation:
> 1. In the loop (final round), at the end of the reduce, `__ add(U_2, U_2, tmp2);`, U_2 is `0b11` and tmp2 is `0b1`, which is the maximum value we can get.
> 2. Out of the loop, the reduce can no longer overflow 130 bits, because the top 2 bits of the 130 bits are `0b00` now.
> 
> So I think we are safe to revert to the state before `053b7c0`.
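
For readers following the reduce discussion, here is a minimal sketch of one partial-reduction pass modulo 2^130 - 5. It is not the stub code: the `Limbs` type and `partial_reduce` are hypothetical names, the value is assumed to be held in three 64-bit limbs (least significant first), and the bits above bit 129 are assumed small enough that multiplying them by 5 fits in 64 bits.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical layout: u[0] holds bits 0..63, u[1] bits 64..127, u[2] bits 128 and up.
using Limbs = std::array<uint64_t, 3>;

// One partial-reduction pass: fold everything at bit 130 and above back into
// the low limbs, using 2^130 == 5 (mod 2^130 - 5).
inline void partial_reduce(Limbs& u) {
    uint64_t top = u[2] >> 2;   // bits at position 130 and above
    u[2] &= 3;                  // keep only bits 128..129
    uint64_t add = top * 5;     // assumption: top is small, so this cannot wrap

    uint64_t s0 = u[0] + add;
    uint64_t c = (s0 < add);    // carry out of limb 0
    uint64_t s1 = u[1] + c;
    c = (s1 < c);               // carry out of limb 1

    u[0] = s0;
    u[1] = s1;
    u[2] += c;                  // a carry here can push the value past bit 129 again
}
```

In this sketch a single pass is not always enough in general: starting from `{~0, ~0, 7}` it yields `{4, 0, 4}`, which again has bit 130 set. Whether such a value can actually reach that point in the stub is exactly what the quoted argument is about.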

Regarding one of the questions above (`__ add(S_2, S_2, 1);` at L7168 of stubGenerator_aarch64.cpp: what is the purpose of adding 1?), I think I get it now. It is described in the paper at https://cr.yp.to/mac/poly1305-20050329.pdf, in the section `Conversion and padding`. I missed that part, sorry.
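
To spell out the padding step: Poly1305 treats each 16-byte chunk as a little-endian integer with a 1 byte appended, so a full chunk contributes its 128-bit value plus 2^128. A minimal sketch follows, assuming the accumulator is kept in three 64-bit limbs like S_0, S_1, S_2 in the stub. The `Acc` type and `add_padded_block` are hypothetical names for illustration, not code from the JDK.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical layout: acc[0] holds bits 0..63, acc[1] bits 64..127, acc[2] bits 128 and up.
using Acc = std::array<uint64_t, 3>;

// Add one full 16-byte block, together with its 2^128 padding bit, to the accumulator.
inline void add_padded_block(Acc& acc, const unsigned char block[16]) {
    // Load the block as two little-endian 64-bit words.
    uint64_t lo = 0, hi = 0;
    for (int i = 0; i < 8; i++) {
        lo |= static_cast<uint64_t>(block[i]) << (8 * i);
        hi |= static_cast<uint64_t>(block[8 + i]) << (8 * i);
    }

    // 128-bit add with carry propagation into the third limb.
    uint64_t s0 = acc[0] + lo;
    uint64_t c0 = (s0 < lo);      // carry out of limb 0
    uint64_t s1 = acc[1] + hi;
    uint64_t c1 = (s1 < hi);      // carry from adding hi
    s1 += c0;
    c1 += (s1 < c0);              // carry from adding c0 (the two carries never both occur)

    acc[0] = s0;
    acc[1] = s1;
    acc[2] += c1;
    acc[2] += 1;                  // the padding bit: 2^128 is bit 0 of the third limb
}
```

Under these assumptions the trailing `+= 1` plays the same role as the `add(S_2, S_2, 1)` in question: an all-zero block added to a zero accumulator leaves `{0, 0, 1}`, i.e. exactly 2^128.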

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16417#discussion_r1389188086