RFR: 8317720: RISC-V: Implement Adler32 intrinsic [v15]

Tue Jul 2 17:19:25 UTC 2024

On Tue, 2 Jul 2024 02:32:16 GMT, Fei Yang <fyang at openjdk.org> wrote:

>> ArsenyBochkarev has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Change srliw and zero_extend order
>
> src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 5279:
> 
>> 5277:     __ bge(len, count, L_by16_loop_unroll);
>> 5278:     __ mv(count, step_16);
>> 5279:     __ blt(len, count, L_by1);
> 
> Question: Why do we need this `blt` branch after the loop unroll here? The `len` has been subtracted by 16 at `L_by16` by `__ add(len, len, count)` where the input `len == len - nmax` and `count == nmax - 16`.

Reaching `L_by16` just means that the length is less than `NMAX` (which is 5552). So even after the subtraction, it is possible for `len` to be less than 16 but greater than 0 when this `blt` is reached.

For example, take 90 as the initial `len`. We first go to `L_nmax`, bypassing `L_simple_by1_loop`. Then we branch to `L_by16`, since `len` is smaller than `nmax`. After the `add(len, len, count)` instruction, `len` holds 74 (that is, 90 - 16). After executing `adler32_process_bytes` and `sub(len, len, step_64)`, `len` holds 10, which is smaller than 64, so we have to choose between processing 16 bytes per step in `L_by16_loop` and going to `L_by1`. Since 10 is less than 16, execution goes to `L_by1` instead of falling through.
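
To make the bookkeeping easier to check, here is a minimal C++ sketch of the `len` arithmetic on this path. It is only a model, not the stub generator code: `Trace`, `Next` and `model_by16_path` are hypothetical names, and it assumes the unrolled loop is entered only when at least 64 remains in `len` after the adjustment at `L_by16`. It follows the arithmetic from the quoted question, where `len - nmax + (nmax - 16)` equals `len - 16`.

```cpp
#include <cassert>

// Simplified model of the `len` bookkeeping discussed in this thread.
// All names here are hypothetical; this is not the HotSpot stub code.
constexpr long NMAX    = 5552;
constexpr long STEP_64 = 64;
constexpr long STEP_16 = 16;

enum class Next { ByOneTail, By16Loop };

struct Trace {
  long len_after_adjust;  // len after `add(len, len, count)` at L_by16
  long len_after_unroll;  // len when `blt(len, count, L_by1)` is reached
  Next next;              // where the modelled `blt` sends execution
};

// Assumption: 80 <= initial_len < NMAX, so the L_by16 path is taken and
// the unrolled 64-byte loop runs at least once.
Trace model_by16_path(long initial_len) {
  assert(initial_len >= STEP_64 + STEP_16 && initial_len < NMAX);

  long len = initial_len - NMAX;   // L_nmax: negative here, so branch to L_by16
  long count = NMAX - STEP_16;
  len += count;                    // L_by16: len == initial_len - 16

  Trace t{};
  t.len_after_adjust = len;

  count = STEP_64;
  do {                             // L_by16_loop_unroll
    len -= STEP_64;                // sub(len, len, step_64)
  } while (len >= count);          // bge(len, count, L_by16_loop_unroll)
  t.len_after_unroll = len;

  count = STEP_16;                 // mv(count, step_16)
  // blt(len, count, L_by1): taken when fewer than 16 remain in len.
  t.next = (len < count) ? Next::ByOneTail : Next::By16Loop;
  return t;
}
```

Under these assumptions, `model_by16_path(90)` records 74 after the adjustment and 10 after one unrolled step, and selects the `L_by1` side of the branch. An input such as 160 leaves 16 after the unrolled loop and would fall through to `L_by16_loop` instead.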

Please correct me if you see any cases that don't fit into this model.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18382#discussion_r1662899966