RFR: 8322770: Implement C2 VectorizedHashCode on AArch64

Mikhail Ablakatov duke at openjdk.org
Mon Apr 15 22:04:08 UTC 2024


Hello,

Please review the following PR for [JDK-8322770 Implement C2 VectorizedHashCode on AArch64](https://bugs.openjdk.org/browse/JDK-8322770). It follows previous work done in https://github.com/openjdk/jdk/pull/16629 and https://github.com/openjdk/jdk/pull/10847 for RISC-V and x86 respectively. 

The code to calculate a hash code consists of two parts: a vectorized loop of Neon instructions that processes 4 or 8 elements per iteration, depending on the data type, and a fully unrolled scalar "loop" that processes up to 7 tail elements.
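For readers unfamiliar with the approach, the two-part structure can be sketched in plain Java. This is only an illustration of the general lane-wise technique, not the code emitted by the intrinsic: the class name `LaneHashSketch`, the fixed width of 4 lanes, and the way the lanes are combined are assumptions made for the example. Each "lane" accumulates every fourth element and is scaled by 31^4 per iteration; the lanes are then weighted by 31^3, 31^2, 31 and 1, and the remaining elements are handled by a scalar tail.

```java
import java.util.Arrays;

// Hypothetical illustration; not the code generated by C2.
public class LaneHashSketch {
    private static final int P1 = 31;
    private static final int P2 = 31 * 31;            // 961
    private static final int P3 = 31 * 31 * 31;       // 29791
    private static final int P4 = 31 * 31 * 31 * 31;  // 923521

    // Computes the same value as Arrays.hashCode(int[]) for non-null input,
    // using four independent accumulators to emulate a 4-lane vector loop.
    static int hash(int[] a) {
        int h = 1;                       // initial value used by Arrays.hashCode
        int acc0 = 0, acc1 = 0, acc2 = 0, acc3 = 0;
        int i = 0;
        int limit = a.length & ~3;       // largest multiple of 4 <= length

        // "Vector" part: 4 elements per iteration, one per lane.
        for (; i < limit; i += 4) {
            h *= P4;
            acc0 = acc0 * P4 + a[i];
            acc1 = acc1 * P4 + a[i + 1];
            acc2 = acc2 * P4 + a[i + 2];
            acc3 = acc3 * P4 + a[i + 3];
        }

        // Combine lanes: lane j holds elements whose final weight is 31^(3-j).
        int result = h + acc0 * P3 + acc1 * P2 + acc2 * P1 + acc3;

        // Scalar tail: up to 3 leftover elements in this 4-lane sketch.
        for (; i < a.length; i++) {
            result = 31 * result + a[i];
        }
        return result;
    }

    public static void main(String[] args) {
        int[] data = {3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5};
        System.out.println(hash(data) == Arrays.hashCode(data));
    }
}
```

Because `int` arithmetic wraps modulo 2^32, reordering the multiplications this way does not change the result compared with the sequential `31 * h + a[i]` recurrence.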

At the time of writing, I don't see potential benefits from providing an SVE/SVE2 implementation, but one could be added as a follow-up or independently later if required.

# Performance

## Neoverse N1


  --------------------------------------------------------------------------------------------
  Version                                            Baseline           This patch
  --------------------------------------------------------------------------------------------
  Benchmark                   (size)  Mode  Cnt      Score    Error     Score     Error  Units
  --------------------------------------------------------------------------------------------
  ArraysHashCode.bytes             1  avgt   15      1.249 ±  0.060     1.247 ±   0.062  ns/op
  ArraysHashCode.bytes            10  avgt   15      8.754 ±  0.028     4.387 ±   0.015  ns/op
  ArraysHashCode.bytes           100  avgt   15     98.596 ±  0.051    26.655 ±   0.097  ns/op
  ArraysHashCode.bytes         10000  avgt   15  10150.578 ±  1.352  2649.962 ± 216.744  ns/op
  ArraysHashCode.chars             1  avgt   15      1.286 ±  0.062     1.246 ±   0.054  ns/op
  ArraysHashCode.chars            10  avgt   15      8.731 ±  0.002     5.344 ±   0.003  ns/op
  ArraysHashCode.chars           100  avgt   15     98.632 ±  0.048    23.023 ±   0.142  ns/op
  ArraysHashCode.chars         10000  avgt   15  10150.658 ±  3.374  2410.504 ±   8.872  ns/op
  ArraysHashCode.ints              1  avgt   15      1.189 ±  0.005     1.187 ±   0.001  ns/op
  ArraysHashCode.ints             10  avgt   15      8.730 ±  0.002     5.676 ±   0.001  ns/op
  ArraysHashCode.ints            100  avgt   15     98.559 ±  0.016    24.378 ±   0.006  ns/op
  ArraysHashCode.ints          10000  avgt   15  10148.752 ±  1.336  2419.015 ±   0.492  ns/op
  ArraysHashCode.multibytes        1  avgt   15      1.037 ±  0.001     1.037 ±   0.001  ns/op
  ArraysHashCode.multibytes       10  avgt   15      5.481 ±  0.001     3.136 ±   0.001  ns/op
  ArraysHashCode.multibytes      100  avgt   15     50.950 ±  0.006    15.277 ±   0.007  ns/op
  ArraysHashCode.multibytes    10000  avgt   15   5335.181 ±  0.692  1340.850 ±   4.291  ns/op
  ArraysHashCode.multichars        1  avgt   15      1.038 ±  0.001     1.037 ±   0.001  ns/op
  ArraysHashCode.multichars       10  avgt   15      5.480 ±  0.001     3.783 ±   0.001  ns/op
  ArraysHashCode.multichars      100  avgt   15     50.955 ±  0.006    13.890 ±   0.018  ns/op
  ArraysHashCode.multichars    10000  avgt   15   5338.597 ±  0.853  1335.599 ±   0.652  ns/op
  ArraysHashCode.multiints         1  avgt   15      1.042 ±  0.001     1.043 ±   0.001  ns/op
  ArraysHashCode.multiints        10  avgt   15      5.526 ±  0.001     3.866 ±   0.001  ns/op
  ArraysHashCode.multiints       100  avgt   15     50.917 ±  0.005    14.918 ±   0.026  ns/op
  ArraysHashCode.multiints     10000  avgt   15   5348.365 ±  5.836  1287.685 ±   1.083  ns/op
  ArraysHashCode.multishorts       1  avgt   15      1.036 ±  0.001     1.037 ±   0.001  ns/op
  ArraysHashCode.multishorts      10  avgt   15      5.480 ±  0.001     3.783 ±   0.001  ns/op
  ArraysHashCode.multishorts     100  avgt   15     50.975 ±  0.034    13.890 ±   0.015  ns/op
  ArraysHashCode.multishorts   10000  avgt   15   5338.790 ±  1.276  1337.034 ±   1.600  ns/op
  ArraysHashCode.shorts            1  avgt   15      1.187 ±  0.001     1.187 ±   0.001  ns/op
  ArraysHashCode.shorts           10  avgt   15      8.731 ±  0.002     5.342 ±   0.001  ns/op
  ArraysHashCode.shorts          100  avgt   15     98.544 ±  0.013    23.017 ±   0.141  ns/op
  ArraysHashCode.shorts        10000  avgt   15  10148.275 ±  1.119  2408.041 ±   1.478  ns/op

## Neoverse N2, Neoverse V1
Performance metrics have been collected for these cores as well. They are similar to the results above and can be posted upon request.

# Test

Full jtreg passed on AArch64 and x86.

-------------

Commit messages:
 - 8322770: AArch64: C2: Implement VectorizedHashCode

Changes: https://git.openjdk.org/jdk/pull/18487/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18487&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8322770
  Stats: 264 lines in 4 files changed: 263 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/18487.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18487/head:pull/18487

PR: https://git.openjdk.org/jdk/pull/18487

