RFR: 8316592: RISC-V: implement poly1305 intrinsic [v9]

Tue Nov 14 19:57:36 UTC 2023

On Tue, 14 Nov 2023 19:27:40 GMT, Hamlin Li <mli at openjdk.org> wrote:

>> Hmm, I looked at the code once again and it seems to me that the overflow is actually impossible. Consider the case of 8-bit registers (for simplicity).
>> 
>> 1. Above `wide_mul`'s and `wide_madd`'s:
>>> we know that the top four bits of R_0 and R_1 are zero
>> 
>>   Therefore,`R_1` and `R_0`'s max value (in our case of 8-bit registers) is `0b1111`. Then max value for `RR_1` is `(R_1 >> 2) * 5` = `0b1000`
>> 
>> 2. As we figured out earlier: the max value for `S_2` is `0b110`;
>> 
>> Using these facts we can deduce the restrictions for `U_1HI`:
>> 1. `wide_mul(U_1, U_1HI, S_0, R_1)`: let `S_0 = 0b11111111`. Then the `U_1HI` here is `0b1110`;
>> 
>> 2. `wide_madd(U_1, U_1HI, S_1, R_0, t1, t2)`: let `S_1 = 0b11111111`, carry is `0b1`, then the `U_1HI` = `U_1HI` + `0b1110` + `0b1` = `0b11101`;
>> 
>> 3. `wide_madd(U_1, U_1HI, S_2, RR_1, t1, t2)`: let carry is `0b1`, then the `U_1HI` = `U_1HI` + `0b101010` + `0b1` = `0b1001000`;
>> 
>> 4. `mul(U_2, S_2, U_2)`: max value for `U_2` is `0b10010`;
>> 
>> 5. `adc(U_2, U_2, U_1HI, t1)`: max value for `U_2` is `0b10010` + `0b1001000` + `0b1` = `0b1011011`;
>> 
>> 6. `poly1305_reduce`: max value for `tmp1` is `0b1101110`
>> 
>> Am I missing something once again or is it alright?
>
> Seems right! Based on this deducation, I think we don't have to expand the tests.
> 
> But I would recommend to still use `2 steps` in poly1305_reduce, as it's safer, although it brings a bit performance cost, unless more people could looking into the details. How do you think about it?

I'm fine with that.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16417#discussion_r1393190262