RFR: 8316592: RISC-V: implement poly1305 intrinsic [v7]

Thu Nov 9 18:30:10 UTC 2023

On Thu, 9 Nov 2023 15:04:36 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> Thanks for updating.
>> Looks safer, but I'm not sure whether it's too conservative or not safe enough.
>> Can we wait for a while to see what others think about it, or whether we discover anything new ourselves?
>
> I think the caller does a full mod N reduction of your result in `IntegerPolynomial::finalReduce()`.

Thanks for the information, @theRealAph.
It seems that `finalReduce` in the Java code does not make the reduction in the intrinsic code here unnecessary: the intrinsic stores only 130 bits back to acc_start, so if the accumulator has overflowed, the bits beyond 130 are simply discarded, which does not seem to be the expected behaviour.
Also, `finalReduce` (Java code) does 2 reduction passes as the final step.
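
To illustrate the concern, here is a minimal sketch in plain Java with `BigInteger`. It is not the intrinsic or the JDK code, and the class and method names are made up for illustration. It only assumes the Poly1305 modulus 2^130 - 5, and compares discarding the bits beyond 130 with folding them back in using 2^130 = 5 (mod 2^130 - 5):

```java
import java.math.BigInteger;

// Hypothetical names for illustration only; not taken from the JDK sources.
class Poly1305ReduceSketch {
    static final BigInteger TWO_130 = BigInteger.ONE.shiftLeft(130);
    static final BigInteger P = TWO_130.subtract(BigInteger.valueOf(5)); // 2^130 - 5
    static final BigInteger MASK_130 = TWO_130.subtract(BigInteger.ONE);

    // Keep only the low 130 bits and discard everything above them.
    static BigInteger truncate130(BigInteger x) {
        return x.and(MASK_130);
    }

    // One partial-reduction pass: fold the bits above 130 back into the
    // low part, using 2^130 = 5 (mod P).
    static BigInteger foldOnce(BigInteger x) {
        BigInteger hi = x.shiftRight(130);
        BigInteger lo = x.and(MASK_130);
        return lo.add(hi.multiply(BigInteger.valueOf(5)));
    }

    public static void main(String[] args) {
        // An accumulator value that has overflowed past bit 130: 3 * 2^130 + 7.
        BigInteger acc = BigInteger.valueOf(3).shiftLeft(130).add(BigInteger.valueOf(7));

        // Truncation changes the value mod P; folding preserves it.
        System.out.println(truncate130(acc).mod(P).equals(acc.mod(P))); // false
        System.out.println(foldOnce(acc).mod(P).equals(acc.mod(P)));    // true
    }
}
```

In this sketch a later full reduction can only recover the right residue from the folded value, not from the truncated one, which is why I think the reduce before the store still matters.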

@theRealAph Besides the reduction discussed above, I still have several other questions about the original aarch64 implementation (this implementation is more or less a translation of it). Would you mind helping to clarify them? Thanks!
1. `__ add(S_2, S_2, 1);` at L7168 in stubGenerator_aarch64.cpp: what is the purpose of this add 1?
2. `wide_mul(U_1, U_1HI, S_0, R_1); wide_madd(U_1, U_1HI, S_1, R_0); wide_madd(U_1, U_1HI, S_2, RR_1);` at L7178 in stubGenerator_aarch64.cpp: why does the S_2*RR_1 term not take the low 2 bits of R_1 into account, but only the higher bits (which are in RR_1)?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16417#discussion_r1388230683
PR Review Comment: https://git.openjdk.org/jdk/pull/16417#discussion_r1388421744