RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v9]
Mikhail Ablakatov
duke at openjdk.org
Wed Sep 18 10:30:49 UTC 2024
> Hello,
>
> Please review the following PR for [JDK-8322770 Implement C2 VectorizedHashCode on AArch64](https://bugs.openjdk.org/browse/JDK-8322770). It follows previous work done in https://github.com/openjdk/jdk/pull/16629 and https://github.com/openjdk/jdk/pull/10847 for RISC-V and x86 respectively.
>
> The code to calculate a hash code consists of two parts: a vectorized loop of Neon instructions that processes 4 or 8 elements per iteration, depending on the data type, and a fully unrolled scalar "loop" that processes up to 7 tail elements.
>
> At the time of writing, I don't see potential benefits from providing an SVE/SVE2 implementation, but one could be added later as a follow-up or independently if required.
>
> # Performance
>
> ## Neoverse N1
>
>
> --------------------------------------------------------------------------------------------
> Version                                          Baseline              This patch
> --------------------------------------------------------------------------------------------
> Benchmark                 (size)  Mode  Cnt       Score     Error       Score     Error   Units
> --------------------------------------------------------------------------------------------
> ArraysHashCode.bytes           1  avgt   15       1.249 ±   0.060       1.247 ±   0.062   ns/op
> ArraysHashCode.bytes          10  avgt   15       8.754 ±   0.028       4.387 ±   0.015   ns/op
> ArraysHashCode.bytes         100  avgt   15      98.596 ±   0.051      26.655 ±   0.097   ns/op
> ArraysHashCode.bytes       10000  avgt   15   10150.578 ±   1.352    2649.962 ± 216.744   ns/op
> ArraysHashCode.chars           1  avgt   15       1.286 ±   0.062       1.246 ±   0.054   ns/op
> ArraysHashCode.chars          10  avgt   15       8.731 ±   0.002       5.344 ±   0.003   ns/op
> ArraysHashCode.chars         100  avgt   15      98.632 ±   0.048      23.023 ±   0.142   ns/op
> ArraysHashCode.chars       10000  avgt   15   10150.658 ±   3.374    2410.504 ±   8.872   ns/op
> ArraysHashCode.ints            1  avgt   15       1.189 ±   0.005       1.187 ±   0.001   ns/op
> ArraysHashCode.ints           10  avgt   15       8.730 ±   0.002       5.676 ±   0.001   ns/op
> ArraysHashCode.ints          100  avgt   15      98.559 ±   0.016      24.378 ±   0.006   ns/op
> ArraysHashCode.ints        10000  avgt   15   10148.752 ±   1.336    2419.015 ±   0.492   ns/op
> ArraysHashCode.multibytes      1  avgt   15       1.037 ±   0.001       1.037 ±   0.001   ns/op
> ArraysHashCode.multibytes     10  avgt   15       5.4...
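The two-part structure in the description works because the polynomial hash `h = 31 * h + a[i]` (starting from `h = 1`, as `java.util.Arrays.hashCode` does) can be split across independent lanes. The plain-Java sketch below models that arithmetic only, assuming a lane count `VF` of 4 or 8; `LaneHashModel`, `pow31` and `hashCode(int[], int)` are hypothetical names, and this is not the Neon stub itself. Each lane accumulator is multiplied by 31^VF per iteration, the lanes are combined with weights 31^(VF-1-j), and a scalar loop finishes the at most VF-1 tail elements.

```java
import java.util.Arrays;

/**
 * Hypothetical scalar model of a lane-parallel polynomial hash.
 * Not the HotSpot stub: it only demonstrates why splitting
 * h = 31 * h + a[i] across VF lanes gives the same result.
 */
public class LaneHashModel {

    // 31^n using Java's wrapping int arithmetic (mod 2^32).
    static int pow31(int n) {
        int r = 1;
        for (int i = 0; i < n; i++) {
            r *= 31;
        }
        return r;
    }

    // vf models the vector width in elements (assumed 4 or 8).
    static int hashCode(int[] a, int vf) {
        int n = a.length;
        int k = n - (n % vf);          // elements covered by full vector blocks
        int[] acc = new int[vf];       // one accumulator per lane
        int step = pow31(vf);          // per-iteration lane multiplier 31^VF
        for (int base = 0; base < k; base += vf) {
            for (int j = 0; j < vf; j++) {   // models one SIMD multiply-add
                acc[j] = acc[j] * step + a[base + j];
            }
        }
        // Reduction: initial value 1 scaled by 31^k, lane j weighted by 31^(VF-1-j).
        int h = pow31(k);
        for (int j = 0; j < vf; j++) {
            h += acc[j] * pow31(vf - 1 - j);
        }
        // Scalar tail: at most VF-1 remaining elements.
        for (int i = k; i < n; i++) {
            h = 31 * h + a[i];
        }
        return h;
    }

    public static void main(String[] args) {
        int[] data = new int[103];
        for (int i = 0; i < data.length; i++) {
            data[i] = i * 7919 - 50;
        }
        for (int vf : new int[] {4, 8}) {
            System.out.println(hashCode(data, vf) == Arrays.hashCode(data));
        }
    }
}
```

Running `main` should print `true` twice, once per lane count: int overflow wraps modulo 2^32 identically in both computations, so the reordered sum is exact.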
Mikhail Ablakatov has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits since the last revision:
- Merge branch 'master' into 8322770
- cleanup: adjust a comment in the light of the latest change
- cleanup: fix comment formatting
Co-authored-by: Andrew Haley <aph-open at littlepinkcloud.com>
- Optimize both the stub and inlined parts of the implementation
Process T_CHAR/T_SHORT elements using the T8H arrangement instead of T4H.
Add a non-unrolled vectorized loop to the stub to handle the vectorizable
part of the tail (a multiple of 4 elements for ints, 8 for other
types). Make the stub process the array as a whole instead of relying on
the inlined part to process the unvectorizable tail.
- cleanup: add comments and simplify the orr ins
- cleanup: remove redundant copyright notice
- cleanup: use a constexpr function for intpow instead of a templated class
- cleanup: address review comments
- cleanup: remove a redundant parameter
- 8322770: AArch64: C2: Implement VectorizedHashCode
The code to calculate a hash code consists of two parts: a stub method that
implements a vectorized loop using Neon instructions, which processes 16 or 32
elements per iteration depending on the data type, and an unrolled inlined
scalar loop that processes the remaining tail elements.
[Performance]
[[Neoverse V2]]
```
                                                        | 328a053 (master)  | dc2909f (this)    |
----------------------------------------------------------------------------------------------------------
Benchmark                             (size)  Mode  Cnt |    Score    Error |    Score    Error | Units
----------------------------------------------------------------------------------------------------------
ArraysHashCode.bytes                       1  avgt   15 |    0.805 ±  0.206 |    0.815 ±  0.141 | ns/op
ArraysHashCode.bytes                      10  avgt   15 |    4.362 ±  0.013 |    3.522 ±  0.124 | ns/op
ArraysHashCode.bytes                     100  avgt   15 |   78.374 ±  0.136 |   12.935 ±  0.016 | ns/op
ArraysHashCode.bytes                   10000  avgt   15 | 9247.335 ± 13.691 | 1344.770 ±  1.898 | ns/op
ArraysHashCode.chars                       1  avgt   15 |    0.731 ±  0.035 |    0.723 ±  0.046 | ns/op
ArraysHashCode.chars                      10  avgt   15 |    4.359 ±  0.007 |    3.385 ±  0.004 | ns/op
ArraysHashCode.chars                     100  avgt   15 |   78.374 ±  0.117 |   11.903 ±  0.023 | ns/op
ArraysHashCode.chars                   10000  avgt   15 | 9248.328 ± 13.644 | 1344.007 ±  1.795 | ns/op
ArraysHashCode.ints                        1  avgt   15 |    0.746 ±  0.083 |    0.631 ±  0.020 | ns/op
ArraysHashCode.ints                       10  avgt   15 |    4.357 ±  0.009 |    3.387 ±  0.005 | ns/op
ArraysHashCode.ints                      100  avgt   15 |   78.391 ±  0.103 |   10.934 ±  0.015 | ns/op
ArraysHashCode.ints                    10000  avgt   15 | 9248.125 ± 12.583 | 1340.644 ±  1.869 | ns/op
ArraysHashCode.multibytes                  1  avgt   15 |    0.555 ±  0.020 |    0.559 ±  0.020 | ns/op
ArraysHashCode.multibytes                 10  avgt   15 |    2.681 ±  0.020 |    2.175 ±  0.045 | ns/op
ArraysHashCode.multibytes                100  avgt   15 |   36.954 ±  0.051 |   12.870 ±  0.021 | ns/op
ArraysHashCode.multibytes              10000  avgt   15 | 4862.703 ±  6.909 |  720.774 ±  3.487 | ns/op
ArraysHashCode.multichars                  1  avgt   15 |    0.551 ±  0.017 |    0.552 ±  0.018 | ns/op
ArraysHashCode.multichars                 10  avgt   15 |    2.683 ±  0.018 |    2.182 ±  0.086 | ns/op
ArraysHashCode.multichars                100  avgt   15 |   36.988 ±  0.054 |    8.830 ±  0.013 | ns/op
ArraysHashCode.multichars              10000  avgt   15 | 4862.279 ±  6.839 |  756.074 ±  6.754 | ns/op
ArraysHashCode.multiints                   1  avgt   15 |    0.555 ±  0.018 |    0.557 ±  0.019 | ns/op
ArraysHashCode.multiints                  10  avgt   15 |    2.689 ±  0.029 |    2.184 ±  0.074 | ns/op
ArraysHashCode.multiints                 100  avgt   15 |   36.992 ±  0.044 |    8.098 ±  0.012 | ns/op
ArraysHashCode.multiints               10000  avgt   15 | 4873.863 ±  6.689 |  783.540 ±  9.151 | ns/op
ArraysHashCode.multishorts                 1  avgt   15 |    0.563 ±  0.021 |    0.561 ±  0.021 | ns/op
ArraysHashCode.multishorts                10  avgt   15 |    2.679 ±  0.020 |    2.164 ±  0.054 | ns/op
ArraysHashCode.multishorts               100  avgt   15 |   36.976 ±  0.053 |    8.828 ±  0.013 | ns/op
ArraysHashCode.multishorts             10000  avgt   15 | 4861.118 ±  7.057 |  748.952 ±  6.040 | ns/op
ArraysHashCode.shorts                      1  avgt   15 |    0.631 ±  0.020 |    0.643 ±  0.033 | ns/op
ArraysHashCode.shorts                     10  avgt   15 |    4.362 ±  0.005 |    3.400 ±  0.025 | ns/op
ArraysHashCode.shorts                    100  avgt   15 |   78.324 ±  0.151 |   11.892 ±  0.017 | ns/op
ArraysHashCode.shorts                  10000  avgt   15 | 9246.323 ± 13.126 | 1344.304 ±  1.906 | ns/op
StringHashCode.Algorithm.defaultLatin1     1  avgt   15 |    0.946 ±  0.061 |    0.924 ±  0.001 | ns/op
StringHashCode.Algorithm.defaultLatin1    10  avgt   15 |    4.334 ±  0.046 |    3.447 ±  0.051 | ns/op
StringHashCode.Algorithm.defaultLatin1   100  avgt   15 |   78.136 ±  0.105 |   12.950 ±  0.048 | ns/op
StringHashCode.Algorithm.defaultLatin1 10000  avgt   15 | 9266.117 ± 13.184 | 1345.097 ±  1.963 | ns/op
StringHashCode.Algorithm.defaultUTF16      1  avgt   15 |    0.692 ±  0.035 |    0.687 ±  0.034 | ns/op
StringHashCode.Algorithm.defaultUTF16     10  avgt   15 |    4.323 ±  0.023 |    3.394 ±  0.015 | ns/op
StringHashCode.Algorithm.defaultUTF16    100  avgt   15 |   78.317 ±  0.109 |   11.911 ±  0.017 | ns/op
StringHashCode.Algorithm.defaultUTF16  10000  avgt   15 | 9249.620 ± 14.594 | 1344.533 ±  1.908 | ns/op
StringHashCode.cached                    N/A  avgt   15 |    0.518 ±  0.017 |    0.530 ±  0.031 | ns/op
StringHashCode.empty                     N/A  avgt   15 |    0.733 ±  0.086 |    0.849 ±  0.168 | ns/op
StringHashCode.notCached                 N/A  avgt   15 |    0.687 ±  0.084 |    0.630 ±  0.018 | ns/op
```
[Test]
jtreg::tier1 passed on AArch64 and x86.
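According to the commit list, the later revision moves all processing into the stub: an unrolled vectorized loop, then a non-unrolled vectorized loop for the remaining multiple of 4/8 elements, then a scalar tail. The plain-Java model below sketches that three-phase split for `byte[]`, assuming 32 elements per unrolled iteration and 8 per single-vector iteration (figures taken from the commit messages; the real register usage and Neon arrangements are not modeled). `ThreePhaseHashModel` and `fold` are hypothetical names.

```java
import java.util.Arrays;

/**
 * Hypothetical three-phase model for byte[]: an unrolled block loop,
 * a single-vector loop and a scalar tail. Arithmetic model only.
 */
public class ThreePhaseHashModel {

    // 31^n with wrapping int arithmetic.
    static int pow31(int n) {
        int r = 1;
        for (int i = 0; i < n; i++) {
            r *= 31;
        }
        return r;
    }

    // Folds a[from, to) into h using `width` independent lanes; (to - from)
    // must be a multiple of width. Equivalent to h = 31 * h + a[i] over the range.
    static int fold(int h, byte[] a, int from, int to, int width) {
        int[] acc = new int[width];
        int step = pow31(width);
        for (int base = from; base < to; base += width) {
            for (int j = 0; j < width; j++) {
                // The byte operand is sign-extended to int, as in Arrays.hashCode(byte[]).
                acc[j] = acc[j] * step + a[base + j];
            }
        }
        int r = h * pow31(to - from);
        for (int j = 0; j < width; j++) {
            r += acc[j] * pow31(width - 1 - j);
        }
        return r;
    }

    static int hashCode(byte[] a) {
        final int unrolled = 32;   // assumed elements per unrolled iteration
        final int vector = 8;      // assumed elements per single-vector iteration
        int n = a.length;
        int end1 = n - n % unrolled;
        int end2 = end1 + (n - end1) / vector * vector;
        int h = fold(1, a, 0, end1, unrolled);   // phase 1: unrolled loop
        h = fold(h, a, end1, end2, vector);      // phase 2: single-vector loop
        for (int i = end2; i < n; i++) {         // phase 3: scalar tail (< 8)
            h = 31 * h + a[i];
        }
        return h;
    }

    public static void main(String[] args) {
        for (int n : new int[] {0, 1, 10, 100, 10000}) {
            byte[] data = new byte[n];
            for (int i = 0; i < n; i++) {
                data[i] = (byte) (i * 37 + 11);
            }
            System.out.println(n + ": " + (hashCode(data) == Arrays.hashCode(data)));
        }
    }
}
```

With these assumed widths, an array of 100 bytes is split as 96 elements in phase one, none in phase two and 4 in the scalar tail, while 10 bytes are split as 0, 8 and 2.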
-------------
Changes:
- all: https://git.openjdk.org/jdk/pull/18487/files
- new: https://git.openjdk.org/jdk/pull/18487/files/6b8eb78c..f5918cca
Webrevs:
- full: https://webrevs.openjdk.org/?repo=jdk&pr=18487&range=08
- incr: https://webrevs.openjdk.org/?repo=jdk&pr=18487&range=07-08
Stats: 177814 lines in 1617 files changed: 159782 ins; 9374 del; 8658 mod
Patch: https://git.openjdk.org/jdk/pull/18487.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/18487/head:pull/18487
PR: https://git.openjdk.org/jdk/pull/18487
More information about the hotspot-dev mailing list