RFR: 8339738: RISC-V: Vectorize crc32 intrinsic [v4]
Hamlin Li
mli at openjdk.org
Wed Sep 11 07:55:05 UTC 2024
On Wed, 11 Sep 2024 07:43:12 GMT, Hamlin Li <mli at openjdk.org> wrote:
>> src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 1584:
>>
>>> 1582: sub(tmp1, len, tmp_limit);
>>> 1583: bge(tmp1, zr, L_vector_entry);
>>> 1584: }
>>
>> Hi Hamlin, I think maybe we should introduce another assembler routine for the vector code? Let's say `kernel_crc32_using_vector` and delegate the work to it under `UseRVV`. That seems cleaner to me and avoids the "offset is too large" issue. I will take a look at the vector code later. BTW: Should `single_talbe_size` be `single_table_size`?
>
> Not sure if I understand your suggestion correctly. Do you mean something like below?
>
> address generate_updateBytesCRC32() {
>   if (UseRVV) { kernel_crc32_using_vector(); }
>   else { kernel_crc32(...); }
> }
>
> But kernel_crc32_using_vector reuses the code in kernel_crc32, and even with UseRVV, under some conditions (when the size is not large enough) we still need to fall back to L_unroll_loop_entry.
> Or maybe I misunderstood what you mean?
In summary, the code paths are taken in the following order: vector (optional) -> loop unroll -> other scalar cases, depending on the data size and UseRVV. So in the UseRVV case, we still need all of the code paths.
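To make the ordering concrete, here is a minimal sketch in plain C++. It is not the assembler code from the PR: the function name `crc32_path`, the thresholds `kVectorMin` and `kUnrollStep`, and the `use_rvv` parameter (standing in for the UseRVV flag) are all made up for illustration. It only records which stages a buffer of a given length would pass through.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical thresholds, chosen only for illustration; the real
// limits in the PR may differ.
constexpr std::size_t kVectorMin  = 64;  // minimum bytes for the vector stage
constexpr std::size_t kUnrollStep = 16;  // bytes consumed per unrolled iteration

// Returns a trace of the stages that would process `len` bytes.
// `use_rvv` stands in for the UseRVV flag.
std::string crc32_path(std::size_t len, bool use_rvv) {
  std::string trace;

  // Stage 1 (optional): vector code, only with RVV and enough data.
  if (use_rvv && len >= kVectorMin) {
    len %= kVectorMin;  // full vector blocks consumed, remainder falls through
    trace += "vector;";
  }

  // Stage 2: unrolled scalar loop, also handles the vector stage's remainder.
  if (len >= kUnrollStep) {
    len %= kUnrollStep;
    trace += "unroll;";
  }

  // Stage 3: remaining scalar cases for the final few bytes.
  assert(len < kUnrollStep);
  if (len > 0) {
    trace += "scalar;";
  }
  return trace;
}
```

With these assumed thresholds, a 100-byte input with `use_rvv` set goes through all three stages, while the same input without it starts at the unrolled loop. That is why the vector path cannot simply replace the scalar one.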
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/20910#discussion_r1753467371