RFR: 8316592: RISC-V: implement poly1305 intrinsic [v7]

Thu Nov 9 18:57:08 UTC 2023

On Thu, 9 Nov 2023 18:25:42 GMT, Hamlin Li <mli at openjdk.org> wrote:

>> I think the caller does a full mod N reduction of your result in `IntegerPolynomial::finalReduce()`.
>
> @theRealAph Besides the reduce discussed above, I still have several other questions about the original aarch64 implementation (this implementation is more or less a translation of it). Would you mind helping to clarify them? Thanks!
> 1. `__ add(S_2, S_2, 1);` at L7168 in stubGenerator_aarch64.cpp: what is the purpose of adding 1?
> 2. `wide_mul(U_1, U_1HI, S_0, R_1);  wide_madd(U_1, U_1HI, S_1, R_0);  wide_madd(U_1, U_1HI, S_2, RR_1);` at L7178 in stubGenerator_aarch64.cpp: why does S_2*RR_1 not take the low 2 bits of R_1 into account, only the higher bits (which are in RR_1)?

It seems we don't need the extra reduce introduced in commit `053b7c0`.
Imagine this situation:
1. In the loop (final round), at the end of the reduce, `__ add(U_2, U_2, tmp2);` runs with U_2 equal to `0b11` and tmp2 equal to `0b1`. These are the largest values either can hold, so U_2 is at most `0b100` afterwards.
2. Outside the loop, the reduce can no longer overflow 130 bits, because the top 2 bits of the 130-bit value are now `0b00`.
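
To make the bound in the two steps above easier to check, here is a minimal sketch in Java using `BigInteger`. It is not the stub code: the class name `Poly1305FoldSketch` and the method `fold` are hypothetical, and it assumes the accumulator is three limbs (two 64-bit limbs plus a top limb holding bits 128 and up), with the top limb no larger than `0b100` after the in-loop reduce. It only models the identity 2^130 ≡ 5 (mod 2^130 - 5), not the register-level carries.

```java
import java.math.BigInteger;

public class Poly1305FoldSketch {
    static final BigInteger TWO_130 = BigInteger.ONE.shiftLeft(130);
    static final BigInteger P = TWO_130.subtract(BigInteger.valueOf(5));
    static final BigInteger FIVE = BigInteger.valueOf(5);

    // One partial reduction: keep the low 130 bits of x and add 5 * (x >> 130).
    // The result is congruent to x mod 2^130 - 5 because 2^130 == 5 (mod P).
    static BigInteger fold(BigInteger x) {
        BigInteger low = x.and(TWO_130.subtract(BigInteger.ONE));
        BigInteger high = x.shiftRight(130);
        return low.add(high.multiply(FIVE));
    }

    // Build a value from a top limb (bits 128 and up) and 128 low bits.
    static BigInteger limbs(long top, BigInteger low128) {
        return BigInteger.valueOf(top).shiftLeft(128).add(low128);
    }

    public static void main(String[] args) {
        // Assumed upper bound: top limb 0b100 (0b11 plus a carry of 1),
        // with both low 64-bit limbs set to all ones.
        BigInteger allOnes = BigInteger.ONE.shiftLeft(128).subtract(BigInteger.ONE);
        BigInteger x = limbs(4, allOnes);
        BigInteger y = fold(x);
        System.out.println("congruent mod p: " + x.mod(P).equals(y.mod(P)));
        System.out.println("fits in 130 bits: " + (y.compareTo(TWO_130) < 0));
    }
}
```

With a top limb of `0b100`, bits 128 and 129 are zero, so adding the folded 5 to the low limbs stays well below 2^130. With a top limb of `0b11` or less there is nothing above bit 129 to fold at all.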

So I think it is safe to revert to the state before `053b7c0`.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16417#discussion_r1388457503