RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v3]

Mikhail Ablakatov duke at openjdk.org
Tue Aug 20 16:31:26 UTC 2024


On Tue, 20 Aug 2024 15:58:31 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> src/hotspot/cpu/aarch64/aarch64.ad line 16613:
>> 
>>> 16611:                          vRegD_V12 vtmp8, vRegD_V13 vtmp9, vRegD_V14 vtmp10,
>>> 16612:                          vRegD_V15 vtmp11, vRegD_V16 vtmp12, vRegD_V17 vtmp13,
>>> 16613:                          rFlagsReg cr)
>> 
>> Using fixed registers here is rather odd.
>> 
>> Is there some reason not simply to use `vReg` here, rather than named specific vector registers? You could just pass them down to `arrays_hashcode`. This issue isn't drop-dead-critical, but it would simplify this patch, which is otherwise fine.
>
>> Using fixed registers here is rather odd.
> 
> My mistake. I see that you're calling a stub, unlike x86 which expands inline. It could go either way, whichever you choose to do is OK. Inline might be a bit more performant, but I suspect it's marginal.

The implementation is split into two parts: an unrolled scalar loop that handles a tail of up to 16 or 32 elements, depending on the data type, and a vectorized Neon loop. Inlining both parts takes more than 300 bytes of code cache, and most of that is the Neon loop, so the Neon loop was moved into a stub. The scalar loop expands inline and calls the Neon stub from the inlined code. Compared with the fully inlined (no-stub) version, this had no statistically significant effect on the performance of the `ArraysHashCode` and `StringHashCode` benchmarks.
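To illustrate why the work splits cleanly into a vector loop and a scalar tail, here is a minimal scalar Java emulation of a lane-parallel polynomial hash for `int[]`, checked against `java.util.Arrays.hashCode`. It is a sketch under stated assumptions, not the HotSpot stub: the class and helper names (`VectorizedHashSketch`, `LANES`, `pow31`) are hypothetical, it assumes 4 lanes of 32-bit ints (one 128-bit Neon register), and it processes the leftover elements after the vector loop, which may not match the order the actual intrinsic uses. Each lane accumulates every `LANES`-th element multiplied by 31^LANES per step. The lanes are then combined with weights 31^(LANES-1-j), and the result continues into the ordinary `h = 31 * h + a[i]` loop.

```java
import java.util.Arrays;

// Hypothetical sketch: scalar emulation of a lane-parallel Arrays.hashCode(int[]).
public final class VectorizedHashSketch {
    // Assumed lane count: 4 x 32-bit lanes, as in one 128-bit Neon register.
    static final int LANES = 4;

    // 31^e with Java int wrap-around, matching hashCode arithmetic mod 2^32.
    static int pow31(int e) {
        int p = 1;
        for (int i = 0; i < e; i++) p *= 31;
        return p;
    }

    static int hashCode(int[] a) {
        int n = a.length;
        int vecLen = n - (n % LANES); // elements handled by the "vector" loop
        int[] acc = new int[LANES];   // one accumulator per lane
        int step = pow31(LANES);      // per-iteration lane multiplier 31^LANES
        for (int i = 0; i < vecLen; i += LANES) {
            for (int j = 0; j < LANES; j++) {
                acc[j] = acc[j] * step + a[i + j];
            }
        }
        // Combine lanes: the initial value 1 is weighted by 31^vecLen,
        // and lane j by 31^(LANES - 1 - j).
        int h = pow31(vecLen);
        for (int j = 0; j < LANES; j++) {
            h += acc[j] * pow31(LANES - 1 - j);
        }
        // Scalar tail: the remaining n % LANES elements.
        for (int i = vecLen; i < n; i++) {
            h = 31 * h + a[i];
        }
        return h;
    }

    public static void main(String[] args) {
        int[] data = new int[37];
        for (int i = 0; i < data.length; i++) data[i] = i * 7919 - 123;
        System.out.println(hashCode(data) == Arrays.hashCode(data)); // prints true
    }
}
```

Because int arithmetic wraps modulo 2^32 in both versions, the reordered sum is bit-for-bit equal to the sequential hash. That equality is what allows an implementation to hand the bulk of the array to a vector loop and finish the rest with scalar code.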

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1723610944


More information about the hotspot-dev mailing list