RFR: 8322770: Implement C2 VectorizedHashCode on AArch64

Mikhail Ablakatov duke at openjdk.org
Fri Jul 5 17:25:34 UTC 2024


On Thu, 16 May 2024 12:40:30 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> Hi,
>> 
>>>  I can update the patch with current results on Monday and we could decide how to proceed with this PR after that. Sounds good?
>> 
>> Yes, that's right.
>
>> Hi @theRealAph ! You may find the latest version here: [mikabl-arm at b3db421](https://github.com/mikabl-arm/jdk/commit/b3db421c795f683db1a001853990026bafc2ed4b) . I gave a short explanation in the commit message, feel free to ask for more details if required.
>> 
>> Unfortunately, it still contains critical bugs and I won't be able to take a look into the issue before the next week at best. Until it's fixed, it's not possible to run the benchmarks. Although I expect it to improve performance on longer integer arrays based on a benchmark I've written in C++ and Assembly. The results aren't comparable to the jmh results, so I won't post them here.
> 
> OK. One small thing, I think it's possible to rearrange things a bit to use `mlav`, which may help performance. No need for that until the code is correct, though.

Hi @theRealAph ! This took a while, but please find a fixed version here: https://github.com/mikabl-arm/jdk/tree/285826-vmul

Here are performance numbers collected for Neoverse V2 compared to the common baseline and the latest state of this PR:

                                                          |    d2ea6b1e657    |    f19203015fb    |    5504227bfe3   |
                                                          |     baseline      |        PR         |    285826-vmul   |
----------------------------------------------------------|-------------------|-------------------|------------------|------
Benchmark                               (size)  Mode  Cnt |    Score    Error |    Score    Error |    Score   Error | Units
----------------------------------------------------------|-------------------|-------------------|------------------|------
ArraysHashCode.bytes                         1  avgt   15 |    0.859 ±  0.166 |    0.720 ±  0.103 |    0.732 ± 0.105 | ns/op
ArraysHashCode.bytes                        10  avgt   15 |    4.440 ±  0.013 |    2.262 ±  0.009 |    3.454 ± 0.057 | ns/op
ArraysHashCode.bytes                       100  avgt   15 |   78.642 ±  0.119 |   15.997 ±  0.023 |   12.753 ± 0.072 | ns/op
ArraysHashCode.bytes                     10000  avgt   15 | 9248.961 ± 11.332 | 1879.905 ± 11.609 | 1345.014 ± 1.947 | ns/op
ArraysHashCode.chars                         1  avgt   15 |    0.695 ±  0.036 |    0.694 ±  0.035 |    0.682 ± 0.036 | ns/op
ArraysHashCode.chars                        10  avgt   15 |    4.436 ±  0.015 |    2.428 ±  0.034 |    3.352 ± 0.031 | ns/op
ArraysHashCode.chars                       100  avgt   15 |   78.660 ±  0.113 |   14.508 ±  0.075 |   11.784 ± 0.088 | ns/op
ArraysHashCode.chars                     10000  avgt   15 | 9253.807 ± 13.660 | 2010.053 ±  3.549 | 1344.716 ± 1.936 | ns/op
ArraysHashCode.ints                          1  avgt   15 |    0.635 ±  0.022 |    0.640 ±  0.022 |    0.640 ± 0.022 | ns/op
ArraysHashCode.ints                         10  avgt   15 |    4.424 ±  0.006 |    2.752 ±  0.012 |    3.388 ± 0.004 | ns/op
ArraysHashCode.ints                        100  avgt   15 |   78.680 ±  0.120 |   14.794 ±  0.131 |   11.090 ± 0.055 | ns/op
ArraysHashCode.ints                      10000  avgt   15 | 9249.520 ± 13.305 | 1997.441 ±  3.299 | 1340.916 ± 1.843 | ns/op
ArraysHashCode.multibytes                    1  avgt   15 |    0.566 ±  0.023 |    0.563 ±  0.021 |    0.554 ± 0.012 | ns/op
ArraysHashCode.multibytes                   10  avgt   15 |    2.679 ±  0.018 |    1.798 ±  0.038 |    1.973 ± 0.021 | ns/op
ArraysHashCode.multibytes                  100  avgt   15 |   36.934 ±  0.055 |    9.118 ±  0.018 |   12.712 ± 0.026 | ns/op
ArraysHashCode.multibytes                10000  avgt   15 | 4861.700 ±  6.563 | 1005.809 ±  2.260 |  721.366 ± 1.570 | ns/op
ArraysHashCode.multichars                    1  avgt   15 |    0.557 ±  0.016 |    0.552 ±  0.001 |    0.563 ± 0.021 | ns/op
ArraysHashCode.multichars                   10  avgt   15 |    2.700 ±  0.018 |    1.840 ±  0.024 |    1.978 ± 0.008 | ns/op
ArraysHashCode.multichars                  100  avgt   15 |   36.932 ±  0.054 |    8.633 ±  0.020 |    8.678 ± 0.052 | ns/op
ArraysHashCode.multichars                10000  avgt   15 | 4859.462 ±  6.693 | 1063.788 ±  3.057 |  752.857 ± 5.262 | ns/op
ArraysHashCode.multiints                     1  avgt   15 |    0.574 ±  0.023 |    0.554 ±  0.011 |    0.559 ± 0.017 | ns/op
ArraysHashCode.multiints                    10  avgt   15 |    2.707 ±  0.028 |    1.907 ±  0.031 |    1.992 ± 0.036 | ns/op
ArraysHashCode.multiints                   100  avgt   15 |   36.942 ±  0.056 |    9.141 ±  0.013 |    8.174 ± 0.029 | ns/op
ArraysHashCode.multiints                 10000  avgt   15 | 4872.540 ±  7.479 | 1187.393 ± 12.083 |  785.256 ± 9.472 | ns/op
ArraysHashCode.multishorts                   1  avgt   15 |    0.558 ±  0.016 |    0.555 ±  0.012 |    0.566 ± 0.022 | ns/op
ArraysHashCode.multishorts                  10  avgt   15 |    2.696 ±  0.015 |    1.854 ±  0.027 |    1.983 ± 0.009 | ns/op
ArraysHashCode.multishorts                 100  avgt   15 |   36.930 ±  0.051 |    8.652 ±  0.011 |    8.681 ± 0.039 | ns/op
ArraysHashCode.multishorts               10000  avgt   15 | 4863.966 ±  6.736 | 1068.627 ±  1.902 |  760.280 ± 5.150 | ns/op
ArraysHashCode.shorts                        1  avgt   15 |    0.665 ±  0.058 |    0.644 ±  0.022 |    0.636 ± 0.023 | ns/op
ArraysHashCode.shorts                       10  avgt   15 |    4.431 ±  0.006 |    2.432 ±  0.024 |    3.332 ± 0.026 | ns/op
ArraysHashCode.shorts                      100  avgt   15 |   78.630 ±  0.103 |   14.521 ±  0.077 |   11.783 ± 0.093 | ns/op
ArraysHashCode.shorts                    10000  avgt   15 | 9249.908 ± 12.039 | 2010.461 ±  2.548 | 1344.441 ± 1.818 | ns/op
StringHashCode.Algorithm.defaultLatin1       1  avgt   15 |    0.770 ±  0.001 |    0.770 ±  0.001 |    0.770 ± 0.001 | ns/op
StringHashCode.Algorithm.defaultLatin1      10  avgt   15 |    4.305 ±  0.009 |    2.260 ±  0.009 |    3.433 ± 0.015 | ns/op
StringHashCode.Algorithm.defaultLatin1     100  avgt   15 |   78.355 ±  0.102 |   16.140 ±  0.038 |   12.767 ± 0.023 | ns/op
StringHashCode.Algorithm.defaultLatin1   10000  avgt   15 | 9269.665 ± 13.817 | 1893.354 ±  3.677 | 1345.571 ± 1.930 | ns/op
StringHashCode.Algorithm.defaultUTF16        1  avgt   15 |    0.736 ±  0.100 |    0.653 ±  0.083 |    0.690 ± 0.101 | ns/op
StringHashCode.Algorithm.defaultUTF16       10  avgt   15 |    4.280 ±  0.018 |    2.374 ±  0.021 |    3.394 ± 0.010 | ns/op
StringHashCode.Algorithm.defaultUTF16      100  avgt   15 |   78.312 ±  0.118 |   14.603 ±  0.103 |   11.837 ± 0.016 | ns/op
StringHashCode.Algorithm.defaultUTF16    10000  avgt   15 | 9249.562 ± 13.113 | 2011.717 ±  4.097 | 1344.715 ± 1.896 | ns/op
StringHashCode.cached                      N/A  avgt   15 |    0.539 ±  0.027 |    0.525 ±  0.018 |    0.525 ± 0.018 | ns/op
StringHashCode.empty                       N/A  avgt   15 |    0.861 ±  0.163 |    0.670 ±  0.079 |    0.694 ± 0.093 | ns/op
StringHashCode.notCached                   N/A  avgt   15 |    0.698 ±  0.108 |    0.648 ±  0.024 |    0.637 ± 0.023 | ns/op


There are several known issues:

- [ ] For arrays shorter than the number of elements processed by a single iteration of the Neon loop, performance is not optimal, though it is still better than the baseline's.
- [ ] The intrinsic takes 364 bytes in the worst case (for BYTE/BOOLEAN types), which may either significantly increase code size or limit inlining opportunities.
- [ ] As mentioned before, the implementation might be affected by https://bugs.openjdk.org/browse/JDK-8139457 .

To address the first two, we could implement the vectorized part of the algorithm as a separate stub method. Please let me know if this sounds like the right approach or if you have other suggestions.
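
For readers who want to see why short arrays are the awkward case, here is a minimal scalar Java sketch of a blocked evaluation of the `31 * h + a[i]` polynomial. It is an illustration only, not the code in the branch: the class name `BlockedHashSketch`, the method `blockedHash` and the block width `LANES = 4` are all hypothetical, and the real intrinsic is written with Neon instructions rather than Java. Each "lane" keeps its own accumulator that is multiplied by 31^4 per block, and any elements that do not fill a whole block fall through to the plain scalar loop.

```java
import java.util.Arrays;

public class BlockedHashSketch {
    // Hypothetical block width: stands in for the number of elements
    // consumed by one iteration of a vector loop.
    static final int LANES = 4;

    static int blockedHash(int[] a) {
        // Powers of 31; int arithmetic wraps modulo 2^32, as in Arrays.hashCode.
        final int p1 = 31, p2 = p1 * p1, p3 = p2 * p1, p4 = p3 * p1;
        // Weight applied to each lane when the accumulators are combined.
        final int[] weight = {p3, p2, p1, 1};
        // The last lane starts at 1 so that the combined value starts at 1,
        // matching the initial value used by Arrays.hashCode.
        int[] acc = {0, 0, 0, 1};

        int i = 0;
        // Main loop: one full block per iteration, lanes are independent.
        for (; i + LANES <= a.length; i += LANES) {
            for (int j = 0; j < LANES; j++) {
                acc[j] = acc[j] * p4 + a[i + j];
            }
        }

        // Combine the lanes into a single hash value.
        int h = 0;
        for (int j = 0; j < LANES; j++) {
            h += acc[j] * weight[j];
        }

        // Scalar tail: arrays shorter than LANES are handled entirely here.
        for (; i < a.length; i++) {
            h = 31 * h + a[i];
        }
        return h;
    }

    public static void main(String[] args) {
        for (int n : new int[] {0, 1, 3, 4, 10, 100}) {
            int[] a = new int[n];
            for (int k = 0; k < n; k++) {
                a[k] = k * 2654435 - 7; // arbitrary test data
            }
            System.out.println(n + ": " + (blockedHash(a) == Arrays.hashCode(a)));
        }
    }
}
```

In this sketch an array with fewer than `LANES` elements never enters the main loop, so it pays for the accumulator setup and the combine step without benefiting from them. That is roughly the situation described in the first item above.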

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18487#issuecomment-2211186951

