RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v11]

Andrew Haley aph at openjdk.org
Mon Sep 23 10:55:40 UTC 2024


On Mon, 23 Sep 2024 09:40:19 GMT, Mikhail Ablakatov <duke at openjdk.org> wrote:

>> Hello,
>> 
>> Please review the following PR for [JDK-8322770 Implement C2 VectorizedHashCode on AArch64](https://bugs.openjdk.org/browse/JDK-8322770). It follows previous work done in https://github.com/openjdk/jdk/pull/16629 and https://github.com/openjdk/jdk/pull/10847 for RISC-V and x86 respectively. 
>> 
>> The code to calculate a hash code consists of two parts: a vectorized loop of Neon instruction that process 4 or 8 elements per iteration depending on the data type and a fully unrolled scalar "loop" that processes up to 7 tail elements.
>> 
>> At the time of writing this I don't see potential benefits from providing SVE/SVE2 implementation, but it could be added as a follow-up or independently later if required.
>> 
>> # Performance
>> 
>> ## Neoverse N1
>> 
>> 
>>   --------------------------------------------------------------------------------------------
>>   Version                                            Baseline           This patch
>>   --------------------------------------------------------------------------------------------
>>   Benchmark                   (size)  Mode  Cnt      Score    Error     Score     Error  Units
>>   --------------------------------------------------------------------------------------------
>>   ArraysHashCode.bytes             1  avgt   15      1.249 ?  0.060     1.247 ?   0.062  ns/op
>>   ArraysHashCode.bytes            10  avgt   15      8.754 ?  0.028     4.387 ?   0.015  ns/op
>>   ArraysHashCode.bytes           100  avgt   15     98.596 ?  0.051    26.655 ?   0.097  ns/op
>>   ArraysHashCode.bytes         10000  avgt   15  10150.578 ?  1.352  2649.962 ? 216.744  ns/op
>>   ArraysHashCode.chars             1  avgt   15      1.286 ?  0.062     1.246 ?   0.054  ns/op
>>   ArraysHashCode.chars            10  avgt   15      8.731 ?  0.002     5.344 ?   0.003  ns/op
>>   ArraysHashCode.chars           100  avgt   15     98.632 ?  0.048    23.023 ?   0.142  ns/op
>>   ArraysHashCode.chars         10000  avgt   15  10150.658 ?  3.374  2410.504 ?   8.872  ns/op
>>   ArraysHashCode.ints              1  avgt   15      1.189 ?  0.005     1.187 ?   0.001  ns/op
>>   ArraysHashCode.ints             10  avgt   15      8.730 ?  0.002     5.676 ?   0.001  ns/op
>>   ArraysHashCode.ints            100  avgt   15     98.559 ?  0.016    24.378 ?   0.006  ns/op
>>   ArraysHashCode.ints          10000  avgt   15  10148.752 ?  1.336  2419.015 ?   0.492  ns/op
>>   ArraysHashCode.multibytes        1  avgt   15      1.037 ?  0.001     1.037 ?   0.001  ...
>
> Mikhail Ablakatov has updated the pull request incrementally with two additional commits since the last revision:
> 
>  - Add asm tests for Neon Vector - Scalar insts
>  - fixup: restrict Vm to V0-V15 for mulvs when esize is H

src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 2887:

> 2885:       f(0b10, 23, 22), f(index & 1, 21), rf(Vm, 16), f(op2, 15, 12), f(index >> 1, 11);             \
> 2886:     }                                                                                               \
> 2887:     f(0, 10), rf(Vn, 5), rf(Vd, 0);                                                                 \

Suggestion:

#define INSN(NAME, op1, op2)                                                              \
  void NAME(FloatRegister Vd, SIMD_Arrangement T, FloatRegister Vn, FloatRegister Vm, int index) { \
    starti;                                                                               \
    assert(T == T4H || T == T8H || T == T2S || T == T4S, "invalid arrangement");          \
    assert(index >= 0 &&                                                                  \
               ((T == T2S && index <= 1) || (T != T2S && index <= 3) || (T == T8H && index <= 7)), \
           "invalid index");                                                              \
    assert((T != T4H && T != T8H) || Vm->encoding() < 16, "invalid source SIMD&FP register"); \
    f(0, 31), f((int)T & 1, 30), f(op1, 29), f(0b01111, 28, 24), f(0b01, 23, 22);         \
    if (T == T4H || T == T8H) {                                                           \
      f(index & 0b11, 21, 20), lrf(Vm, 16);                                               \
    } else {                                                                              \
      f(index & 1, 21), rf(Vm, 16);                                                       \
    }                                                                                     \
    f(op2, 15, 12), f(index >> 1, 11), f(0, 10), rf(Vn, 5), rf(Vd, 0);                    \

I think it's a bit easier to see what's going on here if we lose the duplicated code.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1771185298


More information about the hotspot-dev mailing list