RFR: 8322770: Implement C2 VectorizedHashCode on AArch64
Andrew Haley
aph at openjdk.org
Mon Apr 22 12:32:28 UTC 2024
On Tue, 26 Mar 2024 13:59:12 GMT, Mikhail Ablakatov <duke at openjdk.org> wrote:
> Hello,
>
> Please review the following PR for [JDK-8322770 Implement C2 VectorizedHashCode on AArch64](https://bugs.openjdk.org/browse/JDK-8322770). It follows previous work done in https://github.com/openjdk/jdk/pull/16629 and https://github.com/openjdk/jdk/pull/10847 for RISC-V and x86 respectively.
>
> The code to calculate a hash code consists of two parts: a vectorized loop of Neon instruction that process 4 or 8 elements per iteration depending on the data type and a fully unrolled scalar "loop" that processes up to 7 tail elements.
>
> At the time of writing this I don't see potential benefits from providing SVE/SVE2 implementation, but it could be added as a follow-up or independently later if required.
>
> # Performance
>
> ## Neoverse N1
>
>
> --------------------------------------------------------------------------------------------
> Version Baseline This patch
> --------------------------------------------------------------------------------------------
> Benchmark (size) Mode Cnt Score Error Score Error Units
> --------------------------------------------------------------------------------------------
> ArraysHashCode.bytes 1 avgt 15 1.249 ? 0.060 1.247 ? 0.062 ns/op
> ArraysHashCode.bytes 10 avgt 15 8.754 ? 0.028 4.387 ? 0.015 ns/op
> ArraysHashCode.bytes 100 avgt 15 98.596 ? 0.051 26.655 ? 0.097 ns/op
> ArraysHashCode.bytes 10000 avgt 15 10150.578 ? 1.352 2649.962 ? 216.744 ns/op
> ArraysHashCode.chars 1 avgt 15 1.286 ? 0.062 1.246 ? 0.054 ns/op
> ArraysHashCode.chars 10 avgt 15 8.731 ? 0.002 5.344 ? 0.003 ns/op
> ArraysHashCode.chars 100 avgt 15 98.632 ? 0.048 23.023 ? 0.142 ns/op
> ArraysHashCode.chars 10000 avgt 15 10150.658 ? 3.374 2410.504 ? 8.872 ns/op
> ArraysHashCode.ints 1 avgt 15 1.189 ? 0.005 1.187 ? 0.001 ns/op
> ArraysHashCode.ints 10 avgt 15 8.730 ? 0.002 5.676 ? 0.001 ns/op
> ArraysHashCode.ints 100 avgt 15 98.559 ? 0.016 24.378 ? 0.006 ns/op
> ArraysHashCode.ints 10000 avgt 15 10148.752 ? 1.336 2419.015 ? 0.492 ns/op
> ArraysHashCode.multibytes 1 avgt 15 1.037 ? 0.001 1.037 ? 0.001 ns/op
> ArraysHashCode.multibytes 10 avgt 15 5.4...
You only need one load, add, and multiply per iteration.
You don't need to add across columns until the end.
This is an example of how to do it. The full thing is at https://gist.github.com/theRealAph/cbc85299d6cd24101d46a06c12a97ce6.
public static int vectorizedHashCode(int result, int[] a, int fromIndex, int length) {
if (length < WIDTH) {
return hashCode(result, a, fromIndex, length);
}
int offset = fromIndex;
int[] sum = new int[WIDTH];
sum[WIDTH - 1] = result;
int[] temp = new int[WIDTH];
int remaining = length;
while (remaining >= WIDTH * 2) {
vmult(sum, sum, n31powerWIDTH);
vload(temp, a, offset);
vadd(sum, sum, temp);
offset += WIDTH;
remaining -= WIDTH;
}
vmult(sum, sum, n31powerWIDTH);
vload(temp, a, offset);
vadd(sum, sum, temp);
vmult(sum, sum, n31powersToWIDTH);
offset += WIDTH;
remaining -= WIDTH;
result = vadd(sum);
return hashCode(result, a, fromIndex + offset, remaining);
}
-------------
PR Comment: https://git.openjdk.org/jdk/pull/18487#issuecomment-2069274761
More information about the hotspot-dev
mailing list