RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v11]
Andrew Haley
aph at openjdk.org
Mon Sep 23 10:55:40 UTC 2024
On Mon, 23 Sep 2024 09:40:19 GMT, Mikhail Ablakatov <duke at openjdk.org> wrote:
>> Hello,
>>
>> Please review the following PR for [JDK-8322770 Implement C2 VectorizedHashCode on AArch64](https://bugs.openjdk.org/browse/JDK-8322770). It follows previous work done in https://github.com/openjdk/jdk/pull/16629 and https://github.com/openjdk/jdk/pull/10847 for RISC-V and x86 respectively.
>>
>> The code to calculate a hash code consists of two parts: a vectorized loop of Neon instructions that processes 4 or 8 elements per iteration, depending on the data type, and a fully unrolled scalar "loop" that processes up to 7 tail elements.
>>
>> At the time of writing, I don't see potential benefits from providing an SVE/SVE2 implementation, but one could be added later, as a follow-up or independently, if required.
>>
>> # Performance
>>
>> ## Neoverse N1
>>
>>
>> --------------------------------------------------------------------------------------------------
>> Version                                              Baseline              This patch
>> --------------------------------------------------------------------------------------------------
>> Benchmark                   (size)  Mode  Cnt       Score      Error       Score      Error  Units
>> --------------------------------------------------------------------------------------------------
>> ArraysHashCode.bytes             1  avgt   15       1.249 ±    0.060       1.247 ±    0.062  ns/op
>> ArraysHashCode.bytes            10  avgt   15       8.754 ±    0.028       4.387 ±    0.015  ns/op
>> ArraysHashCode.bytes           100  avgt   15      98.596 ±    0.051      26.655 ±    0.097  ns/op
>> ArraysHashCode.bytes         10000  avgt   15   10150.578 ±    1.352    2649.962 ±  216.744  ns/op
>> ArraysHashCode.chars             1  avgt   15       1.286 ±    0.062       1.246 ±    0.054  ns/op
>> ArraysHashCode.chars            10  avgt   15       8.731 ±    0.002       5.344 ±    0.003  ns/op
>> ArraysHashCode.chars           100  avgt   15      98.632 ±    0.048      23.023 ±    0.142  ns/op
>> ArraysHashCode.chars         10000  avgt   15   10150.658 ±    3.374    2410.504 ±    8.872  ns/op
>> ArraysHashCode.ints              1  avgt   15       1.189 ±    0.005       1.187 ±    0.001  ns/op
>> ArraysHashCode.ints             10  avgt   15       8.730 ±    0.002       5.676 ±    0.001  ns/op
>> ArraysHashCode.ints            100  avgt   15      98.559 ±    0.016      24.378 ±    0.006  ns/op
>> ArraysHashCode.ints          10000  avgt   15   10148.752 ±    1.336    2419.015 ±    0.492  ns/op
>> ArraysHashCode.multibytes        1  avgt   15       1.037 ±    0.001       1.037 ±    0.001 ...
>
> Mikhail Ablakatov has updated the pull request incrementally with two additional commits since the last revision:
>
> - Add asm tests for Neon Vector - Scalar insts
> - fixup: restrict Vm to V0-V15 for mulvs when esize is H
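The quoted description splits the work into a vector loop and a scalar tail. That split is valid because the recurrence `h = 31 * h + a[i]` (starting from `h = 1`, as `java.util.Arrays.hashCode` does) can be distributed across independent lane accumulators and recombined with powers of 31. The C++ model below shows only that arithmetic, under stated assumptions: `scalar_hash` and `lane_hash` are hypothetical names, elements are taken as already widened to 32-bit `int` (byte arrays sign-extend, char arrays zero-extend), and `kLanes` stands in for the number of elements one Neon loop iteration handles. It is a sketch of the math, not the instruction sequence the patch emits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Reference recurrence (hypothetical helper): h = 31 * h + a[i],
// using uint32_t so overflow wraps like Java int arithmetic.
uint32_t scalar_hash(const int32_t* a, size_t n, uint32_t h = 1) {
  for (size_t i = 0; i < n; i++) {
    h = 31u * h + static_cast<uint32_t>(a[i]);
  }
  return h;
}

// Lane-split model (hypothetical helper): kLanes independent accumulators,
// each advanced by 31^kLanes per iteration, then recombined.
template <size_t kLanes>
uint32_t lane_hash(const int32_t* a, size_t n) {
  assert(a != nullptr || n == 0);
  uint32_t pow_lanes = 1;  // 31^kLanes (mod 2^32)
  for (size_t j = 0; j < kLanes; j++) pow_lanes *= 31u;

  uint32_t acc[kLanes] = {};  // one accumulator per lane, all zero
  uint32_t h = 1;             // contribution of the initial value
  size_t i = 0;
  for (; i + kLanes <= n; i += kLanes) {
    h *= pow_lanes;
    for (size_t j = 0; j < kLanes; j++) {
      acc[j] = acc[j] * pow_lanes + static_cast<uint32_t>(a[i + j]);
    }
  }
  // Reduction: lane j is still missing a factor of 31^(kLanes - 1 - j).
  uint32_t w = 1;
  for (size_t j = kLanes; j-- > 0;) {
    h += acc[j] * w;
    w *= 31u;
  }
  // Scalar tail: fewer than kLanes elements remain.
  return scalar_hash(a + i, n - i, h);
}
```

With `kLanes = 8`, the tail passed to `scalar_hash` holds at most 7 elements, which is the size of the unrolled tail described in the quoted text.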
src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 2887:
> 2885: f(0b10, 23, 22), f(index & 1, 21), rf(Vm, 16), f(op2, 15, 12), f(index >> 1, 11); \
> 2886: } \
> 2887: f(0, 10), rf(Vn, 5), rf(Vd, 0); \
Suggestion:
#define INSN(NAME, op1, op2) \
  void NAME(FloatRegister Vd, SIMD_Arrangement T, FloatRegister Vn, FloatRegister Vm, int index) { \
    starti; \
    assert(T == T4H || T == T8H || T == T2S || T == T4S, "invalid arrangement"); \
    assert(index >= 0 && \
           ((T == T2S && index <= 1) || (T != T2S && index <= 3) || (T == T8H && index <= 7)), \
           "invalid index"); \
    assert((T != T4H && T != T8H) || Vm->encoding() < 16, "invalid source SIMD&FP register"); \
    f(0, 31), f((int)T & 1, 30), f(op1, 29), f(0b01111, 28, 24); \
    if (T == T4H || T == T8H) { \
      /* size = 01: index is H:L:M, so Vm has only 4 bits (V0-V15) */ \
      f(0b01, 23, 22), f(index & 0b11, 21, 20), lrf(Vm, 16), f(index >> 2, 11); \
    } else { \
      /* size = 10: index is H:L, Vm is a full 5-bit register number */ \
      f(0b10, 23, 22), f(index & 1, 21), rf(Vm, 16), f(index >> 1, 11); \
    } \
    f(op2, 15, 12), f(0, 10), rf(Vn, 5), rf(Vd, 0); \
I think it's a bit easier to see what's going on here if we lose the duplicated code.
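The fields the macro writes come from the Arm "Advanced SIMD vector x indexed element" encoding class. The standalone C++ sketch below mirrors that layout; `set_field` and `encode_by_element` are hypothetical helpers, not HotSpot's `f()`, `rf()` and `lrf()`. It makes the reason for the V0-V15 restriction concrete: for 16-bit elements the index is H:L:M, so bit 20 (M) belongs to the index and only 4 bits remain for Vm, while for 32-bit elements the index is H:L and M is the top bit of a 5-bit Vm. The size field (bits 23-22) and the H bit (bit 11) therefore differ between the two element sizes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: OR `val` into bits [msb:lsb] of `insn`.
static void set_field(uint32_t& insn, uint32_t val, int msb, int lsb) {
  uint32_t width = static_cast<uint32_t>(msb - lsb + 1);
  uint32_t mask = (1u << width) - 1;
  assert((val & ~mask) == 0 && "value does not fit in field");
  insn |= val << lsb;
}

// Hypothetical encoder for the vector x indexed element class.
// esize_bits is 16 (H) or 32 (S); q selects 64-bit (false) or 128-bit (true).
uint32_t encode_by_element(uint32_t u, uint32_t opcode, bool q, int esize_bits,
                           unsigned vd, unsigned vn, unsigned vm, unsigned index) {
  uint32_t insn = 0;                       // bit 31 and bit 10 stay 0
  set_field(insn, q ? 1u : 0u, 30, 30);    // Q
  set_field(insn, u, 29, 29);              // U
  set_field(insn, 0b01111, 28, 24);
  if (esize_bits == 16) {
    // size = 01: index = H:L:M, M (bit 20) is taken, Vm limited to 4 bits.
    assert(vm < 16 && index < 8);
    set_field(insn, 0b01, 23, 22);
    set_field(insn, index & 0b11, 21, 20);  // L:M
    set_field(insn, vm, 19, 16);            // Rm
    set_field(insn, index >> 2, 11, 11);    // H
  } else {
    // size = 10: index = H:L, M:Rm form a 5-bit Vm.
    assert(esize_bits == 32 && vm < 32 && index < 4);
    set_field(insn, 0b10, 23, 22);
    set_field(insn, index & 1, 21, 21);     // L
    set_field(insn, vm, 20, 16);            // M:Rm
    set_field(insn, index >> 1, 11, 11);    // H
  }
  set_field(insn, opcode, 15, 12);
  set_field(insn, vn, 9, 5);                // Rn
  set_field(insn, vd, 4, 0);                // Rd
  return insn;
}
```

For example, `encode_by_element(0, 0b1000, true, 16, 0, 1, 2, 7)` assembles the fields for a MUL by element of `v1.8h` with `v2.h[7]` into `v0.8h` (U = 0 with opcode 0b1000 selects MUL by element). Passing `vm = 16` with 16-bit elements trips the assertion, which is the same condition the macro's third `assert` guards against.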
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1771185298
More information about the hotspot-dev mailing list