RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v14]
Andrew Dinn
adinn at openjdk.org
Tue Sep 24 16:07:46 UTC 2024
On Tue, 24 Sep 2024 10:17:18 GMT, Mikhail Ablakatov <duke at openjdk.org> wrote:
>> Hello,
>>
>> Please review the following PR for [JDK-8322770 Implement C2 VectorizedHashCode on AArch64](https://bugs.openjdk.org/browse/JDK-8322770). It follows previous work done in https://github.com/openjdk/jdk/pull/16629 and https://github.com/openjdk/jdk/pull/10847 for RISC-V and x86 respectively.
>>
>> The code to calculate a hash code consists of two parts: a vectorized loop of Neon instruction that process 4 or 8 elements per iteration depending on the data type and a fully unrolled scalar "loop" that processes up to 7 tail elements.
>>
>> At the time of writing this I don't see potential benefits from providing SVE/SVE2 implementation, but it could be added as a follow-up or independently later if required.
>>
>> # Performance
>>
>> ## Neoverse N1
>>
>>
>> --------------------------------------------------------------------------------------------
>> Version Baseline This patch
>> --------------------------------------------------------------------------------------------
>> Benchmark (size) Mode Cnt Score Error Score Error Units
>> --------------------------------------------------------------------------------------------
>> ArraysHashCode.bytes 1 avgt 15 1.249 ? 0.060 1.247 ? 0.062 ns/op
>> ArraysHashCode.bytes 10 avgt 15 8.754 ? 0.028 4.387 ? 0.015 ns/op
>> ArraysHashCode.bytes 100 avgt 15 98.596 ? 0.051 26.655 ? 0.097 ns/op
>> ArraysHashCode.bytes 10000 avgt 15 10150.578 ? 1.352 2649.962 ? 216.744 ns/op
>> ArraysHashCode.chars 1 avgt 15 1.286 ? 0.062 1.246 ? 0.054 ns/op
>> ArraysHashCode.chars 10 avgt 15 8.731 ? 0.002 5.344 ? 0.003 ns/op
>> ArraysHashCode.chars 100 avgt 15 98.632 ? 0.048 23.023 ? 0.142 ns/op
>> ArraysHashCode.chars 10000 avgt 15 10150.658 ? 3.374 2410.504 ? 8.872 ns/op
>> ArraysHashCode.ints 1 avgt 15 1.189 ? 0.005 1.187 ? 0.001 ns/op
>> ArraysHashCode.ints 10 avgt 15 8.730 ? 0.002 5.676 ? 0.001 ns/op
>> ArraysHashCode.ints 100 avgt 15 98.559 ? 0.016 24.378 ? 0.006 ns/op
>> ArraysHashCode.ints 10000 avgt 15 10148.752 ? 1.336 2419.015 ? 0.492 ns/op
>> ArraysHashCode.multibytes 1 avgt 15 1.037 ? 0.001 1.037 ? 0.001 ...
>
> Mikhail Ablakatov has updated the pull request incrementally with one additional commit since the last revision:
>
> cleanup: fix a comment typo
>
> Co-authored-by: Andrew Haley <aph-open at littlepinkcloud.com>
A few small nits to address but otherwise good.
src/hotspot/cpu/aarch64/aarch64.ad line 16608:
> 16606: %}
> 16607:
> 16608: instruct arrays_hashcode(iRegP_R1 ary, iRegI_R2 cnt, iRegI_R0 result, immI basic_type,
I'm not sure why `arrays_hashcode` uses the plural, ditto for the macroassembler method name and stub name/stub generator method name. Other instructions and stubs use the singular e.g. `instruction array_equal_NNN`, `generate_arraycopy_stubs` etc. It would be better to follow that by systematically renaming all occurrences of `arrays_hashcode` to `array_hashcode`.
src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.hpp line 39:
> 37: // Helper functions for arrays_hashcode.
> 38: void arrays_hashcode_elload(Register dst, Address src, BasicType eltype);
> 39: int arrays_hashcode_elsize(BasicType eltype);
The above two methods don't seem to exist any more?
src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 5439:
> 5437: case Assembler::T8B:
> 5438: case Assembler::T4H:
> 5439: case Assembler::T8H:
With the current code we should never see T4H as a value for load_arrangement. Is there a reason why you are folding it into the same case handling block as T8B and T8H rather than into the default block? If not then best to remove that case here and below.
src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 5477:
> 5475:
> 5476: assert(is_power_of_2(vf), "can't use this value to calculate the jump target PC");
> 5477: __ andr(rscratch2, cnt, vf - 1);
It would probably be helpful to include here a repeat of the comment you added to the macroassembler method explaining how this deals with the correct number of leftover elements modulo `vf`
-------------
Marked as reviewed by adinn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/18487#pullrequestreview-2325580547
PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1773570214
PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1773545405
PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1773592005
PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1773627876
More information about the hotspot-dev
mailing list