RFR: 8308606: C2 SuperWord: remove alignment checks when not required [v6]
Fei Gao
fgao at openjdk.org
Fri Jun 16 03:36:13 UTC 2023
On Wed, 14 Jun 2023 11:27:35 GMT, Emanuel Peter <epeter at openjdk.org> wrote:
> aarch64 asimd: vectorizing the misaligned cases leads to clear performance win compared to non-vectorization. However, we can see that the vectorized misaligned cases are consistently a bit slower than the vectorized aligned cases.
Hi @eme64, thanks for your perf data! I also tried your new benchmark on some recent `aarch64` machines using `asimd`. Here is part of the results:
VectorAlignment.VectorAlignmentSuperWord.bench000B_control          2048  0  avgt   152.831  ns/op
VectorAlignment.VectorAlignmentSuperWord.bench000C_control          2048  0  avgt   285.819  ns/op
VectorAlignment.VectorAlignmentSuperWord.bench000D_control          2048  0  avgt   749.996  ns/op
VectorAlignment.VectorAlignmentSuperWord.bench000F_control          2048  0  avgt   396.433  ns/op
VectorAlignment.VectorAlignmentSuperWord.bench000I_control          2048  0  avgt   560.767  ns/op
VectorAlignment.VectorAlignmentSuperWord.bench000L_control          2048  0  avgt  1131.909  ns/op
VectorAlignment.VectorAlignmentSuperWord.bench000S_control          2048  0  avgt   285.215  ns/op
VectorAlignment.VectorAlignmentSuperWord.bench001_control           2048  0  avgt   562.436  ns/op
VectorAlignment.VectorAlignmentSuperWord.bench100B_misaligned_load  2048  0  avgt   152.459  ns/op
VectorAlignment.VectorAlignmentSuperWord.bench100C_misaligned_load  2048  0  avgt   290.888  ns/op
VectorAlignment.VectorAlignmentSuperWord.bench100D_misaligned_load  2048  0  avgt   754.443  ns/op
VectorAlignment.VectorAlignmentSuperWord.bench100F_misaligned_load  2048  0  avgt   386.633  ns/op
VectorAlignment.VectorAlignmentSuperWord.bench100I_misaligned_load  2048  0  avgt   560.587  ns/op
VectorAlignment.VectorAlignmentSuperWord.bench100L_misaligned_load  2048  0  avgt  1134.492  ns/op
VectorAlignment.VectorAlignmentSuperWord.bench100S_misaligned_load  2048  0  avgt   284.768  ns/op
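For readers without the benchmark source at hand, the two families compared above might have roughly the following loop shapes. This is only a sketch under assumptions: the class name `LoopShapes`, the method names and the loop bodies are hypothetical and are not copied from the `VectorAlignment` benchmark. The point it illustrates is that in the "misaligned load" shape the load and the store are offset by one element, so a vectorized version of the loop cannot have both accesses aligned to the vector width at the same time.

```java
public class LoopShapes {
    // Hypothetical "control" shape: the load and the store use the same
    // index, so both accesses share the same relative alignment.
    static void control(int[] a, int[] b) {
        for (int i = 0; i < a.length - 1; i++) {
            b[i] = a[i] + 1;
        }
    }

    // Hypothetical "misaligned load" shape: the load is shifted by one
    // element relative to the store, so aligning the store leaves the
    // load misaligned (and vice versa).
    static void misalignedLoad(int[] a, int[] b) {
        for (int i = 0; i < a.length - 1; i++) {
            b[i] = a[i + 1] + 1;
        }
    }

    public static void main(String[] args) {
        int[] a = {10, 20, 30, 40};
        int[] b = new int[a.length];
        control(a, b);
        System.out.println(java.util.Arrays.toString(b));
        misalignedLoad(a, b);
        System.out.println(java.util.Arrays.toString(b));
    }
}
```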
I believe that the perf gap between the vectorized misaligned cases and the vectorized aligned cases may become smaller, and in some cases may disappear, on newer `aarch64` machines.
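As a rough way to quantify that gap, the relative difference between each `control` score and its `misaligned_load` counterpart can be computed directly from the numbers above. The sketch below is illustrative only: the class name `AlignmentGap` and the helper `gapPercent` are hypothetical, and since the posted results include no error column, small differences may simply be run-to-run noise.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class AlignmentGap {
    // Scores in ns/op copied from the results above: {control, misaligned_load}.
    // Keys are the element-type suffixes from the benchmark names.
    static final Map<String, double[]> SCORES = new LinkedHashMap<>();
    static {
        SCORES.put("B", new double[] {152.831, 152.459});
        SCORES.put("C", new double[] {285.819, 290.888});
        SCORES.put("D", new double[] {749.996, 754.443});
        SCORES.put("F", new double[] {396.433, 386.633});
        SCORES.put("I", new double[] {560.767, 560.587});
        SCORES.put("L", new double[] {1131.909, 1134.492});
        SCORES.put("S", new double[] {285.215, 284.768});
    }

    // Relative slowdown of the misaligned case in percent.
    // A negative value means the misaligned run was faster.
    static double gapPercent(double control, double misaligned) {
        return (misaligned - control) / control * 100.0;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, double[]> e : SCORES.entrySet()) {
            double[] s = e.getValue();
            System.out.printf("%s: %+.2f%%%n", e.getKey(), gapPercent(s[0], s[1]));
        }
    }
}
```

On these numbers every pair differs by less than 3% in either direction, and in four of the seven pairs (B, F, I, S) the misaligned run is marginally faster.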
Also, I strongly agree with your conclusion: it is clearly profitable to vectorize these misaligned cases.
Thanks!
-------------
PR Comment: https://git.openjdk.org/jdk/pull/14096#issuecomment-1594036940
More information about the hotspot-compiler-dev
mailing list