RFR: 8322770: Implement C2 VectorizedHashCode on AArch64
Dmitry Chuyko
dchuyko at openjdk.org
Mon Apr 15 22:04:08 UTC 2024
On Tue, 26 Mar 2024 13:59:12 GMT, Mikhail Ablakatov <duke at openjdk.org> wrote:
> Hello,
>
> Please review the following PR for [JDK-8322770 Implement C2 VectorizedHashCode on AArch64](https://bugs.openjdk.org/browse/JDK-8322770). It follows previous work done in https://github.com/openjdk/jdk/pull/16629 and https://github.com/openjdk/jdk/pull/10847 for RISC-V and x86 respectively.
>
> The code to calculate a hash code consists of two parts: a vectorized loop of Neon instructions that processes 4 or 8 elements per iteration, depending on the data type, and a fully unrolled scalar "loop" that processes up to 7 tail elements.
>
> At the time of writing, I don't see potential benefits from providing an SVE/SVE2 implementation, but one could be added as a follow-up or independently later if required.
>
> # Performance
>
> ## Neoverse N1
>
>
> ------------------------------------------------------------------------------------------
> Version                                               Baseline             This patch
> ------------------------------------------------------------------------------------------
> Benchmark                 (size)  Mode  Cnt      Score   Error      Score     Error  Units
> ------------------------------------------------------------------------------------------
> ArraysHashCode.bytes           1  avgt   15      1.249 ± 0.060      1.247 ±   0.062  ns/op
> ArraysHashCode.bytes          10  avgt   15      8.754 ± 0.028      4.387 ±   0.015  ns/op
> ArraysHashCode.bytes         100  avgt   15     98.596 ± 0.051     26.655 ±   0.097  ns/op
> ArraysHashCode.bytes       10000  avgt   15  10150.578 ± 1.352   2649.962 ± 216.744  ns/op
> ArraysHashCode.chars           1  avgt   15      1.286 ± 0.062      1.246 ±   0.054  ns/op
> ArraysHashCode.chars          10  avgt   15      8.731 ± 0.002      5.344 ±   0.003  ns/op
> ArraysHashCode.chars         100  avgt   15     98.632 ± 0.048     23.023 ±   0.142  ns/op
> ArraysHashCode.chars       10000  avgt   15  10150.658 ± 3.374   2410.504 ±   8.872  ns/op
> ArraysHashCode.ints            1  avgt   15      1.189 ± 0.005      1.187 ±   0.001  ns/op
> ArraysHashCode.ints           10  avgt   15      8.730 ± 0.002      5.676 ±   0.001  ns/op
> ArraysHashCode.ints          100  avgt   15     98.559 ± 0.016     24.378 ±   0.006  ns/op
> ArraysHashCode.ints        10000  avgt   15  10148.752 ± 1.336   2419.015 ±   0.492  ns/op
> ArraysHashCode.multibytes      1  avgt   15      1.037 ± 0.001      1.037 ±   0.001  ns/op
> ArraysHashCode.multibytes 10 avgt 15 5.4...
Just a trivial note: this change also improves the calculation of String.hashCode(). For instance, on V1:
Benchmark                                   size  Improvement
StringHashCode.Algorithm.defaultLatin1         1       -2.86%
StringHashCode.Algorithm.defaultLatin1        10       45.84%
StringHashCode.Algorithm.defaultLatin1       100       79.43%
StringHashCode.Algorithm.defaultLatin1     10000       79.16%
StringHashCode.Algorithm.defaultUTF16          1       -1.57%
StringHashCode.Algorithm.defaultUTF16         10       41.83%
StringHashCode.Algorithm.defaultUTF16        100       80.01%
StringHashCode.Algorithm.defaultUTF16      10000       78.44%
SVE can give notable additional speedup only for very long strings (>1k).
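
For readers less familiar with the scheme described in the quoted text (a blocked main loop followed by a scalar tail), here is a rough Java sketch of the underlying arithmetic. It is only an illustration, not the Neon code from the PR: the class and method names are hypothetical, the block width is fixed at 4 ints, and the four "lanes" are plain local variables. The idea is that each lane is multiplied by 31^4 per iteration, the lanes are combined once at the end, and the remaining elements go through the usual `h = 31 * h + e` step.

```java
import java.util.Arrays;

// Illustrative sketch only; names and layout are assumptions, not the PR's code.
final class BlockedHashCode {
    // Hypothetical helper: intended to match Arrays.hashCode(int[]) for non-null input.
    static int hash(int[] a) {
        final int p = 31;
        final int p4 = p * p * p * p; // 31^4, applied once per block of 4 elements

        // Four independent lane accumulators. The initial hash value 1
        // is carried in the last lane so that it picks up 31^(4 * blocks).
        int acc0 = 0, acc1 = 0, acc2 = 0, acc3 = 1;

        int i = 0;
        for (; i + 4 <= a.length; i += 4) {
            acc0 = acc0 * p4 + a[i];
            acc1 = acc1 * p4 + a[i + 1];
            acc2 = acc2 * p4 + a[i + 2];
            acc3 = acc3 * p4 + a[i + 3];
        }

        // Combine lanes: lane j still needs a factor of 31^(3 - j).
        int h = acc0 * (p * p * p) + acc1 * (p * p) + acc2 * p + acc3;

        // Scalar tail for the elements that did not fill a whole block.
        for (; i < a.length; i++) {
            h = p * h + a[i];
        }
        return h;
    }

    public static void main(String[] args) {
        int[] data = {3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5};
        // Prints whether the blocked result agrees with the library method.
        System.out.println(hash(data) == Arrays.hashCode(data));
    }
}
```

The lanes have no data dependency on each other inside the loop, which is what makes this form a candidate for SIMD; all arithmetic relies on Java's wrap-around `int` overflow, as the sequential definition does.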
src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 65:
> 63: : eltype == T_CHAR || eltype == T_SHORT || eltype == T_INT ? 4
> 64: : 0;
> 65: guarantee(loop_factor, "unsopported eltype");
typo: unsupported
src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 102:
> 100: * Pseudocode:
> 101: *
> 102: * cnt -= unroll_facotor + 1 - loop_factor;
typo: factor
-------------
PR Comment: https://git.openjdk.org/jdk/pull/18487#issuecomment-2024948481
PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1542839364
PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1543169690