RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v3]
Mikhail Ablakatov
duke at openjdk.org
Tue Aug 20 16:31:26 UTC 2024
On Tue, 20 Aug 2024 15:58:31 GMT, Andrew Haley <aph at openjdk.org> wrote:
>> src/hotspot/cpu/aarch64/aarch64.ad line 16613:
>>
>>> 16611: vRegD_V12 vtmp8, vRegD_V13 vtmp9, vRegD_V14 vtmp10,
>>> 16612: vRegD_V15 vtmp11, vRegD_V16 vtmp12, vRegD_V17 vtmp13,
>>> 16613: rFlagsReg cr)
>>
>> Using fixed registers here is rather odd.
>>
>> Is there some reason not simply to use `vReg` here, rather than named specific vector registers? You could just pass them down to `arrays_hashcode`. This issue isn't drop-dead-critical, but it would simplify this patch, which is otherwise fine.
>
>> Using fixed registers here is rather odd.
>
> My mistake. I see that you're calling a stub, unlike x86 which expands inline. It could go either way, whichever you choose to do is OK. Inline might be a bit more performant, but I suspect it's marginal.
The implementation is split into two parts: an unrolled scalar loop that handles up to 16/32 tail elements, depending on the data type, and a vectorized Neon loop. Inlining both parts takes more than 300 bytes of code cache. The Neon loop accounts for most of that size, so it was moved into a stub: the scalar loop expands inline, and the Neon loop is implemented as a stub that is called from the inlined part. This had no statistically significant effect on the performance of the `ArraysHashCode` and `StringHashCode` benchmarks compared to the fully inlined (no stub) version.
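
For readers unfamiliar with the split, here is a rough Java sketch of the idea, not the actual AArch64 code. It assumes the usual `31 * h + a[i]` recurrence used by `Arrays.hashCode(int[])`; the class name `HashSplitSketch`, the method `splitHash`, the block width `LANES = 4`, and the order in which the two loops run are all hypothetical choices made for illustration. The block loop stands in for the Neon stub, and the element-wise loop stands in for the inlined scalar part.

```java
import java.util.Arrays;

public class HashSplitSketch {
    // Hypothetical block width; the real code handles up to 16/32 tail elements.
    static final int LANES = 4;
    // Powers of 31 used to fold one block of LANES elements in a single step.
    static final int P1 = 31, P2 = 31 * 31, P3 = 31 * 31 * 31, P4 = 31 * 31 * 31 * 31;

    static int splitHash(int[] a) {
        int h = 1;
        int i = 0;
        int blockEnd = a.length - (a.length % LANES);
        // "Vector" part: consumes whole blocks. Four applications of
        // h = 31 * h + x expand to the expression below (int arithmetic wraps).
        for (; i < blockEnd; i += LANES) {
            h = h * P4 + a[i] * P3 + a[i + 1] * P2 + a[i + 2] * P1 + a[i + 3];
        }
        // Scalar part: handles the remaining a.length % LANES elements.
        for (; i < a.length; i++) {
            h = 31 * h + a[i];
        }
        return h;
    }

    public static void main(String[] args) {
        int[] data = new int[37];
        for (int k = 0; k < data.length; k++) {
            data[k] = k * 7919 - 1000;
        }
        // Compare against the library implementation.
        System.out.println(splitHash(data) == Arrays.hashCode(data));
    }
}
```

In this sketch both loops live in one method; in the patch, as described above, only the scalar part is expanded inline and the vector part sits behind a stub call.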
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1723610944