RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v2]

Wed Aug 21 15:00:08 UTC 2024

On Wed, 21 Aug 2024 14:49:05 GMT, Mikhail Ablakatov <duke at openjdk.org> wrote:

>> Mikhail Ablakatov has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit:
>> 
>>   8322770: AArch64: C2: Implement VectorizedHashCode
>>   
>>   The code to calculate a hash code consists of two parts: a stub method that
>>   implements a vectorized loop using Neon instruction which processes 16 or 32
>>   elements per iteration depending on the data type; and an unrolled inlined
>>   scalar loop that processes remaining tail elements.
>>   
>>   [Performance]
>>   
>>   [[Neoverse V2]]
>>   ```
>>                                                               |  328a053 (master) |  dc2909f (this)  |
>>   ----------------------------------------------------------------------------------------------------------
>>     Benchmark                               (size)  Mode  Cnt |    Score    Error |    Score   Error | Units
>>   ----------------------------------------------------------------------------------------------------------
>>     ArraysHashCode.bytes                         1  avgt   15 |    0.805 ?  0.206 |    0.815 ? 0.141 | ns/op
>>     ArraysHashCode.bytes                        10  avgt   15 |    4.362 ?  0.013 |    3.522 ? 0.124 | ns/op
>>     ArraysHashCode.bytes                       100  avgt   15 |   78.374 ?  0.136 |   12.935 ? 0.016 | ns/op
>>     ArraysHashCode.bytes                     10000  avgt   15 | 9247.335 ? 13.691 | 1344.770 ? 1.898 | ns/op
>>     ArraysHashCode.chars                         1  avgt   15 |    0.731 ?  0.035 |    0.723 ? 0.046 | ns/op
>>     ArraysHashCode.chars                        10  avgt   15 |    4.359 ?  0.007 |    3.385 ? 0.004 | ns/op
>>     ArraysHashCode.chars                       100  avgt   15 |   78.374 ?  0.117 |   11.903 ? 0.023 | ns/op
>>     ArraysHashCode.chars                     10000  avgt   15 | 9248.328 ? 13.644 | 1344.007 ? 1.795 | ns/op
>>     ArraysHashCode.ints                          1  avgt   15 |    0.746 ?  0.083 |    0.631 ? 0.020 | ns/op
>>     ArraysHashCode.ints                         10  avgt   15 |    4.357 ?  0.009 |    3.387 ? 0.005 | ns/op
>>     ArraysHashCode.ints                        100  avgt   15 |   78.391 ?  0.103 |   10.934 ? 0.015 | ns/op
>>     ArraysHashCode.ints                      10000  avgt   15 | 9248.125 ? 12.583 | 1340.644 ? 1.869 | ns/op
>>     ArraysHashCode.multibytes                    1  avgt   15 |    0.555 ?  0.020 |    0.559 ? 0.020 | ns/op
>>     ArraysHashCode.multibytes                   10  avgt   1...
>
> src/hotspot/share/utilities/intpow.hpp line 58:
> 
>> 56: struct intpow<T, v, 1, no_overflow> {
>> 57:   static const T value = v;
>> 58: };
> 
>> There's no need for any of this metaprogramming. A constexpr function would be better.
> 
> The main advantage of this implementation over one using a `constexpr` function is the ability to verify input parameter values and detect overflows at compile time using `static_assert`. This feature helps prevent user errors that could otherwise be difficult to debug. I chose this approach for its functional benefits over a `constexpr` function. If you have any suggestions on how to implement this functionality in a `constexpr` function, they would be highly welcome, as I might be missing something here. 😃
> 
> Below is a similar `constexpr` implementation. **Please don't accept it through the UI**. I kept a `typedef` template parameter to ensure the return value has the desired width, if needed, without applying a mask at the call site.
> 
> If you'd still prefer a `constexpr` function, please let me know if you have any comments on the implementation below (I'll rename it, remove `no_overflow` parameter and the comments).
> 
> Suggestion:
> 
> #include <limits>
> 
> template <typename T>
> static constexpr T intpow_c(T v, unsigned p, bool no_overflow = false) {
>   // Can't be used in constexpr function
>   // static_assert(v || p, "0^0 is not defined");
> 
>   if (p == 0) {
>     return 1;
>   }
> 
>   T a = intpow_c(v, p / 2, no_overflow);
>   T b = (p % 2) ? v : 1;
> 
>   // Can't be used in constexpr function
>   // static_assert(!no_overflow || a <= std::numeric_limits<T>::max() / a, "Integer overflow");
>   // static_assert(!no_overflow || a * a <= std::numeric_limits<T>::max() / b, "Integer overflow");
> 
>   return a * a * b;
> }

You're assuming a requirement that isn't really there. HotSpot calculates a great many things at startup time, and given that the arguments to intpow here are constants, there's  no possibility of any surprises at runtime. Just assert what needs to be asserted, and please keep utility functions like this as simple as possible, leaving template metaprogramming for places that need it.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1725219259