RFR: 8339738: RISC-V: Vectorize crc32 intrinsic [v10]

Fei Yang fyang at openjdk.org
Tue Sep 17 14:12:11 UTC 2024


On Tue, 17 Sep 2024 13:45:44 GMT, Hamlin Li <mli at openjdk.org> wrote:

>> This does not seem to help much: we would still need to slide down `vtmp` in every loop round, as we do for `vcrc`, so we cannot save any instructions. On the other hand, since the `lwu` in the outer loop is a contiguous load, we can expect most of the actual loads to be served from the cache.
>> 
>> That is, unless we can also vectorize most of the code of the outer loop (i < N), i.e. turn the subsequent `xorr` into `vxor_vv`. But it seems we cannot do that, because each loop round `i` depends on the `crc` result of the previous round.
>
> Sorry, I gave it another thought.
> Although we cannot vectorize the whole outer loop, we can still move one `xor` out of the outer loop.

Yes. Looks better.
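
For readers following the thread, here is a minimal scalar sketch of the idea. It assumes each outer-loop round has the shape `crc = step(lane_crc[i] ^ word[i] ^ crc)`. That shape is an assumption made for illustration and is not taken from the patch. All names (`crc_word`, `lane_crcs`, `words`) are hypothetical. The `lane_crc[i] ^ word[i]` part does not depend on the previous round, so it can be computed for all lanes up front (standing in for a single vector xor), while the xor with `crc` has to stay inside the loop.

```python
POLY = 0xEDB88320  # reflected CRC-32 polynomial


def make_table():
    """Build the standard byte-wise CRC-32 lookup table."""
    table = []
    for b in range(256):
        c = b
        for _ in range(8):
            c = (c >> 1) ^ POLY if c & 1 else c >> 1
        table.append(c)
    return table


TABLE = make_table()


def crc_word(w):
    """Advance a 32-bit CRC state by four bytes (hypothetical scalar step)."""
    for _ in range(4):
        w = (w >> 8) ^ TABLE[w & 0xFF]
    return w


def fold_two_xors_in_loop(lane_crcs, words, crc=0):
    """Assumed baseline: both xors are executed in every round."""
    for lane_crc, word in zip(lane_crcs, words):
        crc = crc_word(lane_crc ^ word ^ crc)
    return crc


def fold_one_xor_hoisted(lane_crcs, words, crc=0):
    """Same result, with the round-independent xor moved out of the loop."""
    # Independent of crc, so it could be done once for all lanes
    # (this list comprehension stands in for one vector xor).
    mixed = [lane_crc ^ word for lane_crc, word in zip(lane_crcs, words)]
    for m in mixed:
        # Loop-carried dependency: this xor needs the previous round's crc.
        crc = crc_word(m ^ crc)
    return crc
```

Both functions return the same value for any input, since xor is associative. The second form only changes where the independent xor is executed.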

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20910#discussion_r1763322224

