RFR: 8308606: C2 SuperWord: remove alignment checks when not required [v6]

Fei Gao fgao at openjdk.org
Fri Jun 16 03:36:13 UTC 2023


On Wed, 14 Jun 2023 11:27:35 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

> aarch64 asimd: vectorizing the misaligned cases leads to clear performance win compared to non-vectorization. However, we can see that the vectorized misaligned cases are consistently a bit slower than the vectorized aligned cases.

Hi @eme64, thanks for your perf data! I also tried your new benchmark on some of the latest `aarch64` machines using `asimd`. Here is part of the results:

  VectorAlignment.VectorAlignmentSuperWord.bench000B_control                                   2048       0  avgt        152.831          ns/op
  VectorAlignment.VectorAlignmentSuperWord.bench000C_control                                   2048       0  avgt        285.819          ns/op
  VectorAlignment.VectorAlignmentSuperWord.bench000D_control                                   2048       0  avgt        749.996          ns/op
  VectorAlignment.VectorAlignmentSuperWord.bench000F_control                                   2048       0  avgt        396.433          ns/op
  VectorAlignment.VectorAlignmentSuperWord.bench000I_control                                   2048       0  avgt        560.767          ns/op
  VectorAlignment.VectorAlignmentSuperWord.bench000L_control                                   2048       0  avgt       1131.909          ns/op
  VectorAlignment.VectorAlignmentSuperWord.bench000S_control                                   2048       0  avgt        285.215          ns/op
  VectorAlignment.VectorAlignmentSuperWord.bench001_control                                    2048       0  avgt        562.436          ns/op
  VectorAlignment.VectorAlignmentSuperWord.bench100B_misaligned_load                           2048       0  avgt        152.459          ns/op
  VectorAlignment.VectorAlignmentSuperWord.bench100C_misaligned_load                           2048       0  avgt        290.888          ns/op
  VectorAlignment.VectorAlignmentSuperWord.bench100D_misaligned_load                           2048       0  avgt        754.443          ns/op
  VectorAlignment.VectorAlignmentSuperWord.bench100F_misaligned_load                           2048       0  avgt        386.633          ns/op
  VectorAlignment.VectorAlignmentSuperWord.bench100I_misaligned_load                           2048       0  avgt        560.587          ns/op
  VectorAlignment.VectorAlignmentSuperWord.bench100L_misaligned_load                           2048       0  avgt       1134.492          ns/op
  VectorAlignment.VectorAlignmentSuperWord.bench100S_misaligned_load                           2048       0  avgt        284.768          ns/op
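To quantify the gap, here is a small Java sketch that computes the relative difference of each `_misaligned_load` score against the matching `_control` score, using the avgt numbers (ns/op) above. The class and method names are illustrative only.

```java
// Relative difference of each `_misaligned_load` score versus the matching `_control`
// score, using the avgt numbers (ns/op) reported above. Names are illustrative only.
public class MisalignedVsControl {
    // Element types in the same order as the benchmark names (B, C, D, F, I, L, S).
    static final String[] TYPES = {"B", "C", "D", "F", "I", "L", "S"};
    static final double[] CONTROL = {152.831, 285.819, 749.996, 396.433, 560.767, 1131.909, 285.215};
    static final double[] MISALIGNED = {152.459, 290.888, 754.443, 386.633, 560.587, 1134.492, 284.768};

    /** Percent change of misaligned relative to control; positive means misaligned is slower. */
    static double percentDelta(double control, double misaligned) {
        return (misaligned - control) / control * 100.0;
    }

    public static void main(String[] args) {
        for (int i = 0; i < TYPES.length; i++) {
            System.out.printf("%s: %+.2f%%%n", TYPES[i], percentDelta(CONTROL[i], MISALIGNED[i]));
        }
    }
}
```

With these numbers, the largest deltas are about +1.77% (C) and -2.47% (F).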



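I believe the performance gap between the vectorized misaligned cases and the vectorized aligned cases may shrink, and in some cases disappear, on newer `aarch64` machines. In the results above, every `_misaligned_load` score is within about 2.5% of its `_control` counterpart, and for B, F, I and S the misaligned variant is even slightly faster.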

Also, I strongly agree with your conclusion: it is clearly profitable to vectorize these misaligned cases.
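To make the distinction concrete, here is a minimal sketch of the two loop shapes. It assumes that a "misaligned load" is a load whose index is offset from the store index by a non-multiple of the vector length. `AlignmentShapes`, `copySameIndex` and `copyOffsetLoad` are hypothetical names, not the sources of the benchmarks above.

```java
import java.util.Arrays;

// Hypothetical loop shapes for illustration only; these are NOT the sources of the
// VectorAlignmentSuperWord benchmarks quoted above.
public class AlignmentShapes {
    // Load and store use the same index, so one pre-loop can align both accesses
    // (assuming both arrays share the same base alignment).
    static void copySameIndex(int[] dst, int[] src) {
        for (int i = 0; i < dst.length; i++) {
            dst[i] = src[i] + 1;
        }
    }

    // The load is shifted by `offset` elements relative to the store. Unless `offset`
    // is a multiple of the vector length in elements, no single pre-loop iteration
    // count aligns both src[i + offset] and dst[i] (same base-alignment assumption).
    static void copyOffsetLoad(int[] dst, int[] src, int offset) {
        for (int i = 0; i < dst.length - offset; i++) {
            dst[i] = src[i + offset] + 1;
        }
    }

    public static void main(String[] args) {
        int[] src = new int[8];
        for (int i = 0; i < src.length; i++) {
            src[i] = i;
        }
        int[] a = new int[8];
        int[] b = new int[8];
        copySameIndex(a, src);
        copyOffsetLoad(b, src, 1);
        System.out.println(Arrays.toString(a));
        System.out.println(Arrays.toString(b));
    }
}
```

Both loops are vectorizable in principle; the second is the kind of case where an alignment requirement would otherwise block vectorization.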

Thanks!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14096#issuecomment-1594036940


More information about the hotspot-compiler-dev mailing list