RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v9]
Mikhail Ablakatov
duke at openjdk.org
Thu Sep 19 14:07:54 UTC 2024
On Wed, 18 Sep 2024 11:49:08 GMT, Andrew Haley <aph at openjdk.org> wrote:
>> Mikhail Ablakatov has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits since the last revision:
>>
>> - Merge branch 'master' into 8322770
>> - cleanup: adjust a comment in the light of the latest change
>> - cleanup: fix comment formatting
>>
>> Co-authored-by: Andrew Haley <aph-open at littlepinkcloud.com>
>> - Optimize both the stub and inlined parts of the implementation
>>
>> Process T_CHAR/T_SHORT elements using T8H arrangement instead of T4H.
>> Add a non-unrolled vectorized loop to the stub to handle vectorizable
>> tail portions of arrays multiple to 4/8 elements (for ints / other
>> types). Make the stub process array as a whole instead of relying on
>> the inlined part to process an unvectorizable tail.
>> - cleanup: add comments and simplify the orr ins
>> - cleanup: remove redundant copyright notice
>> - cleanup: use a constexpr function for intpow instead of a templated class
>> - cleanup: address review comments
>> - cleanup: remove a redundant parameter
>> - 8322770: AArch64: C2: Implement VectorizedHashCode
>>
>> The code to calculate a hash code consists of two parts: a stub method that
>> implements a vectorized loop using Neon instruction which processes 16 or 32
>> elements per iteration depending on the data type; and an unrolled inlined
>> scalar loop that processes remaining tail elements.
>>
>> [Performance]
>>
>> [[Neoverse V2]]
>> ```
>> | 328a053 (master) | dc2909f (this) |
>> ----------------------------------------------------------------------------------------------------------
>> Benchmark (size) Mode Cnt | Score Error | Score Error | Units
>> ----------------------------------------------------------------------------------------------------------
>> ArraysHashCode.bytes 1 avgt 15 | 0.805 ? 0.206 | 0.815 ? 0.141 | ns/op
>> ArraysHashCode.bytes 10 avgt 15 | 4.362 ? 0.013 | 3.522 ? 0.124 | ns/op
>> ArraysHashCode.bytes 100 avgt 15 | 78.374 ? 0.136 | 12.935 ? 0.016 | ns/op
>> ArraysHashCode.bytes 10000 avgt 15 | 9247.335 ? 13.691 | 1344.770 ? 1.898 | ns/op
>> ArraysHashCode.cha...
>
> src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 2877:
>
>> 2875: f(0b01111, 28, 24); \
>> 2876: if (T == T4H || T == T8H) { \
>> 2877: f(0b01, 23, 22), f(index & 0b11, 21, 20), rf(Vm, 16), f(op2, 15, 12), f(index >> 2 & 1, 11); \
>
> This isn't right.
> Please go to test/hotspot/gtest/aarch64/aarch64-asmtest.py and add `mulv` to the set of tested instructions. Please make sure you test all modes.
Could you point me towards asm tests for *Vector - Scalar* insts like `fmlavs`, `fmlsvs` or `fmulxvs` ([assembler_aarch64.hpp#L2857](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/assembler_aarch64.hpp:2857))? Judging by git blame these were added without any new tests. Am I right assuming we are missing asm tests for *Vector - Scalar* instructions in general?
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1766902957
More information about the hotspot-dev
mailing list