RFR: 8339738: RISC-V: Vectorize crc32 intrinsic [v10]
Fei Yang
fyang at openjdk.org
Tue Sep 17 14:12:11 UTC 2024
On Tue, 17 Sep 2024 13:45:44 GMT, Hamlin Li <mli at openjdk.org> wrote:
>> This does not seem to help much: we need to slide down vtmp in every loop round, as with vcrc, so we cannot save any instructions. On the other hand, since the `lwu` in the outer loop is a continuous load, we can expect most of the actual loads to come from the cache.
>>
>> Unless we can also vectorize most of the code in the outer loop (i < N), i.e. turn the subsequent `xorr` into `vxor_vv`. But it seems we cannot do that, because each loop round `i` depends on the `crc` result of the previous round.
>
> Sorry, I gave it another thought.
> Although we cannot vectorize the whole outer loop, we can still move one `xor` outside of the outer loop.
Yes. Looks better.
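
For reference, the loop-carried dependency on `crc` mentioned in the quoted text can be sketched in scalar C++. This is only a generic slicing-by-4 CRC32 written for illustration, not the code from the PR; `make_tables` and `crc32_slice4` are hypothetical names, and the table layout is an assumption.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: builds four 256-entry tables for slicing-by-4,
// using the reflected CRC-32 polynomial 0xEDB88320.
static std::array<std::array<uint32_t, 256>, 4> make_tables() {
    std::array<std::array<uint32_t, 256>, 4> t{};
    for (uint32_t i = 0; i < 256; ++i) {
        uint32_t c = i;
        for (int k = 0; k < 8; ++k) {
            c = (c & 1u) ? (c >> 1) ^ 0xEDB88320u : (c >> 1);
        }
        t[0][i] = c;
    }
    for (uint32_t i = 0; i < 256; ++i) {
        for (int s = 1; s < 4; ++s) {
            t[s][i] = (t[s - 1][i] >> 8) ^ t[0][t[s - 1][i] & 0xFFu];
        }
    }
    return t;
}

// Hypothetical scalar model of the outer loop: one 32-bit word per round.
uint32_t crc32_slice4(const uint8_t* p, size_t len) {
    static const auto T = make_tables();
    uint32_t crc = 0xFFFFFFFFu;
    while (len >= 4) {
        // Little-endian word load (the role played by `lwu` in the discussion).
        uint32_t w = static_cast<uint32_t>(p[0])
                   | static_cast<uint32_t>(p[1]) << 8
                   | static_cast<uint32_t>(p[2]) << 16
                   | static_cast<uint32_t>(p[3]) << 24;
        // This xor needs the crc produced by the previous round, so the
        // rounds cannot simply be processed side by side in vector lanes.
        crc ^= w;
        crc = T[3][crc & 0xFFu] ^ T[2][(crc >> 8) & 0xFFu]
            ^ T[1][(crc >> 16) & 0xFFu] ^ T[0][crc >> 24];
        p += 4;
        len -= 4;
    }
    // Remaining bytes, one at a time.
    while (len-- > 0) {
        crc = (crc >> 8) ^ T[0][(crc ^ *p++) & 0xFFu];
    }
    return ~crc;
}
```

The four table lookups within one round are independent of each other, but the `crc ^= w` at the top of each round is not independent of the previous round.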
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/20910#discussion_r1763322224