RFR: 8322770: Implement C2 VectorizedHashCode on AArch64

Mon Apr 15 22:04:08 UTC 2024

On Tue, 26 Mar 2024 13:59:12 GMT, Mikhail Ablakatov <duke at openjdk.org> wrote:

> Hello,
> 
> Please review the following PR for [JDK-8322770 Implement C2 VectorizedHashCode on AArch64](https://bugs.openjdk.org/browse/JDK-8322770). It follows previous work done in https://github.com/openjdk/jdk/pull/16629 and https://github.com/openjdk/jdk/pull/10847 for RISC-V and x86 respectively. 
> 
> The code to calculate a hash code consists of two parts: a vectorized loop of Neon instructions that processes 4 or 8 elements per iteration, depending on the data type, and a fully unrolled scalar "loop" that processes up to 7 tail elements.
> 
> At the time of writing I don't see potential benefits from providing an SVE/SVE2 implementation, but it could be added as a follow-up or independently later if required.
> 
> # Performance
> 
> ## Neoverse N1
> 
> 
>   --------------------------------------------------------------------------------------------
>   Version                                            Baseline           This patch
>   --------------------------------------------------------------------------------------------
>   Benchmark                   (size)  Mode  Cnt      Score    Error     Score     Error  Units
>   --------------------------------------------------------------------------------------------
>   ArraysHashCode.bytes             1  avgt   15      1.249 ±  0.060     1.247 ±   0.062  ns/op
>   ArraysHashCode.bytes            10  avgt   15      8.754 ±  0.028     4.387 ±   0.015  ns/op
>   ArraysHashCode.bytes           100  avgt   15     98.596 ±  0.051    26.655 ±   0.097  ns/op
>   ArraysHashCode.bytes         10000  avgt   15  10150.578 ±  1.352  2649.962 ± 216.744  ns/op
>   ArraysHashCode.chars             1  avgt   15      1.286 ±  0.062     1.246 ±   0.054  ns/op
>   ArraysHashCode.chars            10  avgt   15      8.731 ±  0.002     5.344 ±   0.003  ns/op
>   ArraysHashCode.chars           100  avgt   15     98.632 ±  0.048    23.023 ±   0.142  ns/op
>   ArraysHashCode.chars         10000  avgt   15  10150.658 ±  3.374  2410.504 ±   8.872  ns/op
>   ArraysHashCode.ints              1  avgt   15      1.189 ±  0.005     1.187 ±   0.001  ns/op
>   ArraysHashCode.ints             10  avgt   15      8.730 ±  0.002     5.676 ±   0.001  ns/op
>   ArraysHashCode.ints            100  avgt   15     98.559 ±  0.016    24.378 ±   0.006  ns/op
>   ArraysHashCode.ints          10000  avgt   15  10148.752 ±  1.336  2419.015 ±   0.492  ns/op
>   ArraysHashCode.multibytes        1  avgt   15      1.037 ±  0.001     1.037 ±   0.001  ns/op
>   ArraysHashCode.multibytes       10  avgt   15      5.4...

A minor note: this change also speeds up String.hashCode(). For instance, on Neoverse V1:

Benchmark                               size   Improvement
StringHashCode.Algorithm.defaultLatin1     1        -2.86%
StringHashCode.Algorithm.defaultLatin1    10        45.84%
StringHashCode.Algorithm.defaultLatin1   100        79.43%
StringHashCode.Algorithm.defaultLatin1 10000        79.16%
StringHashCode.Algorithm.defaultUTF16      1        -1.57%
StringHashCode.Algorithm.defaultUTF16     10        41.83%
StringHashCode.Algorithm.defaultUTF16    100        80.01%
StringHashCode.Algorithm.defaultUTF16  10000        78.44%

SVE can give a notable additional speedup only for very long strings (>1k).
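
For readers less familiar with why this hash vectorizes at all, here is a rough Java sketch of the underlying idea. It is not the Neon code from the PR: the class and method names are made up for illustration, and the block size of 4 is just one of the factors mentioned in the description. The polynomial recurrence h = 31 * h + a[i] can be applied to several elements at once by multiplying the running hash by a higher power of 31, with a scalar loop picking up the tail:

```java
import java.util.Arrays;

// Illustrative only: hypothetical names, not code from the patch.
class BlockedHashSketch {
    // Plain scalar recurrence, seeded with 1 as Arrays.hashCode(int[]) is.
    static int scalarHash(int[] a) {
        int h = 1;
        for (int v : a) {
            h = 31 * h + v;
        }
        return h;
    }

    // Same result, but four elements are folded in per main-loop iteration.
    // Int overflow wraps around, so the algebraic identity still holds.
    static int blockedHash(int[] a) {
        final int p1 = 31;
        final int p2 = p1 * 31;
        final int p3 = p2 * 31;
        final int p4 = p3 * 31;
        int h = 1;
        int i = 0;
        for (; i + 4 <= a.length; i += 4) {
            h = p4 * h + p3 * a[i] + p2 * a[i + 1] + p1 * a[i + 2] + a[i + 3];
        }
        // Scalar tail for the remaining 0..3 elements.
        for (; i < a.length; i++) {
            h = 31 * h + a[i];
        }
        return h;
    }

    public static void main(String[] args) {
        int[] data = {3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5};
        System.out.println(blockedHash(data) == Arrays.hashCode(data));
    }
}
```

The four products in the main loop are independent of each other, which is what lets a SIMD implementation compute them in parallel lanes; the patch itself may organize its accumulators differently.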

src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 65:

> 63:                              : eltype == T_CHAR || eltype == T_SHORT || eltype == T_INT ? 4
> 64:                                                                                         : 0;
> 65:   guarantee(loop_factor, "unsopported eltype");

typo: unsupported

src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 102:

> 100:    * Pseudocode:
> 101:    *
> 102:    *  cnt -= unroll_facotor + 1 - loop_factor;

typo: factor

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18487#issuecomment-2024948481
PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1542839364
PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1543169690