RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v8]
Andrew Haley
aph at openjdk.org
Wed Sep 18 09:54:11 UTC 2024
On Tue, 17 Sep 2024 16:24:29 GMT, Mikhail Ablakatov <duke at openjdk.org> wrote:
>> Hello,
>>
>> Please review the following PR for [JDK-8322770 Implement C2 VectorizedHashCode on AArch64](https://bugs.openjdk.org/browse/JDK-8322770). It follows previous work done in https://github.com/openjdk/jdk/pull/16629 and https://github.com/openjdk/jdk/pull/10847 for RISC-V and x86 respectively.
>>
>> The code to calculate a hash code consists of two parts: a vectorized loop of Neon instructions that processes 4 or 8 elements per iteration, depending on the data type, and a fully unrolled scalar "loop" that processes up to 7 tail elements.
>>
>> At the time of writing, I don't see potential benefits from providing an SVE/SVE2 implementation, but it could be added as a follow-up or independently later if required.
>>
>> # Performance
>>
>> ## Neoverse N1
>>
>>
>> --------------------------------------------------------------------------------------------
>>   Version                                          Baseline              This patch
>> --------------------------------------------------------------------------------------------
>>   Benchmark                  (size)  Mode  Cnt      Score     Error      Score     Error  Units
>> --------------------------------------------------------------------------------------------
>>   ArraysHashCode.bytes            1  avgt   15      1.249 ±   0.060      1.247 ±   0.062  ns/op
>>   ArraysHashCode.bytes           10  avgt   15      8.754 ±   0.028      4.387 ±   0.015  ns/op
>>   ArraysHashCode.bytes          100  avgt   15     98.596 ±   0.051     26.655 ±   0.097  ns/op
>>   ArraysHashCode.bytes        10000  avgt   15  10150.578 ±   1.352   2649.962 ± 216.744  ns/op
>>   ArraysHashCode.chars            1  avgt   15      1.286 ±   0.062      1.246 ±   0.054  ns/op
>>   ArraysHashCode.chars           10  avgt   15      8.731 ±   0.002      5.344 ±   0.003  ns/op
>>   ArraysHashCode.chars          100  avgt   15     98.632 ±   0.048     23.023 ±   0.142  ns/op
>>   ArraysHashCode.chars        10000  avgt   15  10150.658 ±   3.374   2410.504 ±   8.872  ns/op
>>   ArraysHashCode.ints             1  avgt   15      1.189 ±   0.005      1.187 ±   0.001  ns/op
>>   ArraysHashCode.ints            10  avgt   15      8.730 ±   0.002      5.676 ±   0.001  ns/op
>>   ArraysHashCode.ints           100  avgt   15     98.559 ±   0.016     24.378 ±   0.006  ns/op
>>   ArraysHashCode.ints         10000  avgt   15  10148.752 ±   1.336   2419.015 ±   0.492  ns/op
>>   ArraysHashCode.multibytes       1  avgt   15      1.037 ±   0.001      1.037 ±   0.001  ...
>
> Mikhail Ablakatov has updated the pull request incrementally with one additional commit since the last revision:
>
> cleanup: adjust a comment in the light of the latest change
OK, I think we're now good enough, performance-wise, with and without the vectorized intrinsic:
Benchmark             (size)  Mode  Cnt   Score    Error   Score    Error  Units
ArraysHashCode.bytes       1  avgt    5   0.591 ±  0.043   0.584 ±  0.006  ns/op
ArraysHashCode.bytes       2  avgt    5   1.343 ±  0.003   0.838 ±  0.016  ns/op
ArraysHashCode.bytes       4  avgt    5   2.262 ±  0.028   1.096 ±  0.032  ns/op
ArraysHashCode.bytes       8  avgt    5   2.432 ±  0.038   2.215 ±  0.049  ns/op
ArraysHashCode.bytes      12  avgt    5   3.605 ±  0.042   2.292 ±  0.068  ns/op
ArraysHashCode.bytes      16  avgt    5   5.149 ±  0.220   2.245 ±  0.132  ns/op
ArraysHashCode.bytes      20  avgt    5   6.819 ±  0.266   2.575 ±  0.046  ns/op
ArraysHashCode.bytes      24  avgt    5   8.478 ±  0.430   2.965 ±  0.085  ns/op
ArraysHashCode.bytes      28  avgt    5  10.308 ±  0.386   3.047 ±  0.377  ns/op
ArraysHashCode.bytes      32  avgt    5  12.425 ±  0.453   4.045 ±  0.123  ns/op
ArraysHashCode.bytes      48  avgt   35  21.086 ±  0.061   4.756 ±  0.053  ns/op
ArraysHashCode.bytes      64  avgt   35  32.817 ±  0.078   5.934 ±  0.039  ns/op
> This is what I'm seeing now. Scorching fast with large blocks, poor with smaller ones.
>
> ```
> Benchmark             (size)  Mode  Cnt   Score    Error  Units
> ArraysHashCode.bytes       1  avgt    5   0.532 ±  0.036  ns/op
> ArraysHashCode.bytes       2  avgt    5   0.812 ±  0.011  ns/op
> ArraysHashCode.bytes       4  avgt    5   1.104 ±  0.020  ns/op
> ArraysHashCode.bytes       8  avgt    5   2.136 ±  0.032  ns/op
> ArraysHashCode.bytes      12  avgt    5   3.596 ±  0.061  ns/op
> ArraysHashCode.bytes      16  avgt    5   5.278 ±  0.240  ns/op
> ArraysHashCode.bytes      20  avgt    5   7.390 ±  0.043  ns/op
> ArraysHashCode.bytes      24  avgt    5   9.606 ±  0.059  ns/op
> ArraysHashCode.bytes      28  avgt    5  12.144 ±  0.064  ns/op
> ArraysHashCode.bytes      32  avgt    5   3.898 ±  0.096  ns/op
> ArraysHashCode.bytes      36  avgt    5   4.468 ±  0.113  ns/op
> ArraysHashCode.bytes      40  avgt    5   4.481 ±  0.082  ns/op
> ArraysHashCode.bytes      44  avgt    5   5.143 ±  0.060  ns/op
> ArraysHashCode.bytes      48  avgt    5   6.727 ±  0.103  ns/op
> ArraysHashCode.bytes      52  avgt    5   8.844 ±  0.029  ns/op
> ArraysHashCode.bytes      56  avgt    5  11.108 ±  0.108  ns/op
> ArraysHashCode.bytes      60  avgt    5  13.864 ±  0.071  ns/op
> ArraysHashCode.bytes      64  avgt    5   5.796 ±  0.146  ns/op
> ```
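
For readers following the thread, the shape described in the PR summary (a main loop that consumes a fixed-size block per iteration, followed by a scalar tail) can be illustrated in plain Java. This is only a sketch of the arithmetic, not the AArch64 intrinsic: the class and method names are hypothetical, the block size of 4 is chosen for brevity, and the real code uses Neon instructions rather than scalar multiplies. It relies on the fact that four steps of `h = 31 * h + x` collapse into one expression with precomputed powers of 31, which is what makes the block independent enough to vectorize.

```java
import java.util.Arrays;

// Hypothetical illustration class; not part of the JDK or of the PR.
final class BlockedHashSketch {

    // Computes the same value as Arrays.hashCode(byte[]) for a non-null array,
    // but consumes four elements per main-loop iteration.
    static int blockedHashCode(byte[] a) {
        final int p1 = 31;
        final int p2 = p1 * p1;
        final int p3 = p2 * p1;
        final int p4 = p3 * p1;   // int overflow wraps, as in the reference loop

        int h = 1;
        int i = 0;

        // Main loop: one block of four elements per iteration.
        // Equivalent to applying h = 31 * h + a[k] four times in a row.
        for (; i + 4 <= a.length; i += 4) {
            h = h * p4
              + a[i]     * p3
              + a[i + 1] * p2
              + a[i + 2] * p1
              + a[i + 3];
        }

        // Scalar tail: at most three leftover elements with this block size.
        for (; i < a.length; i++) {
            h = 31 * h + a[i];
        }
        return h;
    }

    public static void main(String[] args) {
        byte[] data = new byte[37];
        for (int k = 0; k < data.length; k++) {
            data[k] = (byte) (k * 7 - 50);
        }
        // Both sides use wrapping 32-bit arithmetic, so they agree exactly.
        System.out.println(blockedHashCode(data) == Arrays.hashCode(data)); // prints "true"
    }
}
```

A sketch like this also hints at why the numbers above move in steps: with a fixed block size, lengths just below a multiple of the block spend the most time in the tail.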
-------------
PR Comment: https://git.openjdk.org/jdk/pull/18487#issuecomment-2358012793