RFR: 8316592: RISC-V: implement poly1305 intrinsic [v9]
ArsenyBochkarev
duke at openjdk.org
Mon Nov 13 09:44:27 UTC 2023
On Fri, 10 Nov 2023 15:45:31 GMT, Hamlin Li <mli at openjdk.org> wrote:
>> src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 4474:
>>
>>> 4472:
>>> 4473: __ srli(tmp1, U_2, 2);
>>> 4474: __ shadd(tmp1, tmp1, tmp1, tmp2, 2);
>>
>> I'm afraid there might be an overflow issue here.
>> Consider a smplifed case(8 bits per U): U_2==0b11111111, tmp1 will be 0b111111, tmp1*5 will overflow, and there is carry for it.
>> If I'm right, then the fix is to use 2 steps rather than one to reduce, please check the aarch64 version for details.
>
> If the above issue is not a false one, and the implemention was tested with existing jdk test suite, then I'm afraid we need to add more test to catch this issue.
Please correct me if I'm wrong:
1. In first loop iteration: after `poly1305_pack_26` before the loop `U_2` is `0b11` max. After any loop iteration the max for `U_2` is `0b100` (as we found out earlier);
2. After `Add U_0 to S_0`: the carry is `0b1` max, plus mandatory `addi` of `0b1` to `S_2` --> `S_2` is `0b110` max;
3. After `wide_mul`'s and `wide_madd`'s: `U_2` is again `0b11` max due to `andi(U_2, R_0, bits2)` (and even `0b0` in case of `0b100` in first step). NB: inside `wide_*` functions `S_2` is unchanged;
4. `mul(U_2, S_2, U_2)`: `U_2` is `0b1111` max --> in `poly1305_reduce` the `tmp1` is `0b11` max, which is safe?
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/16417#discussion_r1390842464
More information about the hotspot-compiler-dev
mailing list