RFR: 8316592: RISC-V: implement poly1305 intrinsic [v9]
ArsenyBochkarev
duke at openjdk.org
Tue Nov 14 19:57:36 UTC 2023
On Tue, 14 Nov 2023 19:27:40 GMT, Hamlin Li <mli at openjdk.org> wrote:
>> Hmm, I looked at the code once again and it seems to me that the overflow is actually impossible. Consider the case of 8-bit registers (for simplicity).
>>
>> 1. Above `wide_mul`'s and `wide_madd`'s:
>>> we know that the top four bits of R_0 and R_1 are zero
>>
>> Therefore, the max value of `R_1` and `R_0` (in our case of 8-bit registers) is `0b1111`. Then the max value for `RR_1` is `(R_1 >> 2) * 5` = `0b1000`
>>
>> 2. As we figured out earlier: the max value for `S_2` is `0b110`;
>>
>> Using these facts we can deduce the restrictions for `U_1HI`:
>> 1. `wide_mul(U_1, U_1HI, S_0, R_1)`: let `S_0 = 0b11111111`. Then the `U_1HI` here is `0b1110`;
>>
>> 2. `wide_madd(U_1, U_1HI, S_1, R_0, t1, t2)`: let `S_1 = 0b11111111` and the carry be `0b1`, then `U_1HI` = `U_1HI` + `0b1110` + `0b1` = `0b11101`;
>>
>> 3. `wide_madd(U_1, U_1HI, S_2, RR_1, t1, t2)`: let the carry be `0b1`, then `U_1HI` = `U_1HI` + `0b101010` + `0b1` = `0b1001000`;
>>
>> 4. `mul(U_2, S_2, U_2)`: max value for `U_2` is `0b10010`;
>>
>> 5. `adc(U_2, U_2, U_1HI, t1)`: max value for `U_2` is `0b10010` + `0b1001000` + `0b1` = `0b1011011`;
>>
>> 6. `poly1305_reduce`: max value for `tmp1` is `0b1101110`
>>
>> Am I missing something once again or is it alright?
>
> Seems right! Based on this deduction, I think we don't have to expand the tests.
>
> But I would still recommend using `2 steps` in poly1305_reduce, as it's safer, although it costs a bit of performance, unless more people can look into the details. What do you think?
I'm fine with that.
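
For reference, the 8-bit arithmetic quoted above can be re-checked mechanically. This is only a minimal sketch of that chain of bounds, not code from the patch: the Python names are hypothetical, the step 3 addend (`0b101010`) and the step 4 bound (`0b10010`) are taken as quoted rather than re-derived, and `tmp1 = (U_2 >> 2) * 5` is an assumption about what a single-step `poly1305_reduce` would compute (it does match the quoted figure).

```python
# Sanity check of the quoted 8-bit bound derivation (toy model, not the patch).
MASK = 0xFF  # largest value an 8-bit register can hold


def hi8(a, b):
    """High 8 bits of the 16-bit product a * b."""
    return (a * b) >> 8


R_MAX = 0b1111       # top four bits of R_0 and R_1 are zero
S_FULL = 0b11111111  # S_0 and S_1 taken at their maximum
CARRY = 0b1

# Step 1: wide_mul(U_1, U_1HI, S_0, R_1)
step1 = hi8(S_FULL, R_MAX)
# Step 2: wide_madd(U_1, U_1HI, S_1, R_0, t1, t2)
step2 = step1 + hi8(S_FULL, R_MAX) + CARRY
# Step 3: wide_madd(U_1, U_1HI, S_2, RR_1, t1, t2); addend taken as quoted
step3 = step2 + 0b101010 + CARRY
# Step 4: mul(U_2, S_2, U_2); bound taken as quoted
step4 = 0b10010
# Step 5: adc(U_2, U_2, U_1HI, t1)
step5 = step4 + step3 + CARRY
# Step 6: assumed single-step reduce term, (U_2 >> 2) * 5
step6 = (step5 >> 2) * 5

bounds = [step1, step2, step3, step4, step5, step6]
for n, value in enumerate(bounds, start=1):
    print(f"step {n}: {value:#b} fits in 8 bits: {value <= MASK}")
```

Under these assumptions every intermediate bound stays below `0b11111111`, which agrees with the conclusion that the overflow is impossible in the 8-bit model. It says nothing by itself about the 64-bit case beyond the analogy drawn in the quoted message.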
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/16417#discussion_r1393190262