RFR: 8339738: RISC-V: Vectorize crc32 intrinsic [v4]

Wed Sep 11 07:55:05 UTC 2024

On Wed, 11 Sep 2024 07:43:12 GMT, Hamlin Li <mli at openjdk.org> wrote:

>> src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 1584:
>> 
>>> 1582:     sub(tmp1, len, tmp_limit);
>>> 1583:     bge(tmp1, zr, L_vector_entry);
>>> 1584:   }
>> 
>> Hi Hamlin, I think maybe we should introduce another assembler routine for the vector code? Let's say `kernel_crc32_using_vector`, and delegate the work to it under `UseRVV`. That seems cleaner to me and avoids the "offset is too large" issue. I will take a look at the vector code later. BTW: Should `single_talbe_size` be `single_table_size`?
>
> Not sure if I understand your suggestion correctly. Do you mean something like below?
> 
> address generate_updateBytesCRC32() {
>   if (UseRVV) { kernel_crc32_using_vector(); }
>   else { kernel_crc32(...); }
> }
> 
> But kernel_crc32_using_vector reuses the code in kernel_crc32, and even with UseRVV, in some conditions (when the size is not large enough) we still need to fall back to L_unroll_loop_entry.
> Or maybe I misunderstood what you mean?

In summary, the code paths are taken in the following order: vector (optional) -> loop unroll -> other scalar cases, depending on the data size and UseRVV. So in the UseRVV case, we still need all of the code paths.
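
To make that ordering concrete, here is a minimal sketch in plain C++ rather than MacroAssembler code. It is not the code in the PR: `crc32_tiered`, `kVectorLimit`, `kUnrollBytes` and the `path` trace are hypothetical names and values chosen only for illustration, and the "vector" tier is simulated with scalar code. It only shows that the vector tier is an optional first stage and that the remaining bytes always fall through to the unrolled loop and then the scalar tail.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical thresholds, for illustration only (not the values in the PR).
constexpr std::size_t kVectorLimit = 64;  // minimum length for the "vector" tier
constexpr std::size_t kUnrollBytes = 4;   // bytes consumed per unrolled iteration

// Bitwise CRC-32 update for one byte (reflected polynomial 0xEDB88320).
static inline std::uint32_t crc32_byte(std::uint32_t crc, std::uint8_t b) {
  crc ^= b;
  for (int i = 0; i < 8; i++) {
    crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
  }
  return crc;
}

struct Crc32Result {
  std::uint32_t crc;
  std::string path;  // tiers that ran, in order: V = vector, U = unrolled, S = scalar tail
};

// Assumed shape of the dispatch: vector (optional) -> loop unroll -> scalar tail.
Crc32Result crc32_tiered(const std::uint8_t* buf, std::size_t len, bool use_rvv) {
  std::uint32_t crc = 0xFFFFFFFFu;
  std::string path;

  // Tier 1: only entered with use_rvv and enough data. Simulated with scalar code here.
  if (use_rvv && len >= kVectorLimit) {
    path += 'V';
    while (len >= kVectorLimit) {
      for (std::size_t i = 0; i < kVectorLimit; i++) {
        crc = crc32_byte(crc, buf[i]);
      }
      buf += kVectorLimit;
      len -= kVectorLimit;
    }
  }

  // Tier 2: unrolled loop. Also handles what the vector tier left over,
  // and everything when the vector tier was skipped.
  if (len >= kUnrollBytes) {
    path += 'U';
    while (len >= kUnrollBytes) {
      crc = crc32_byte(crc, buf[0]);
      crc = crc32_byte(crc, buf[1]);
      crc = crc32_byte(crc, buf[2]);
      crc = crc32_byte(crc, buf[3]);
      buf += kUnrollBytes;
      len -= kUnrollBytes;
    }
  }

  // Tier 3: remaining bytes, one at a time.
  if (len > 0) {
    path += 'S';
    for (; len > 0; --len) {
      crc = crc32_byte(crc, *buf++);
    }
  }

  return {crc ^ 0xFFFFFFFFu, path};
}
```

With these assumed thresholds, a 71-byte input with `use_rvv` set goes through all three tiers, while a 9-byte input skips the vector tier even with `use_rvv` set. That is why a separate vector-only routine would still need the scalar paths.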

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20910#discussion_r1753467371