RFR: 8308606: C2 SuperWord: remove alignment checks when not required [v6]

Emanuel Peter epeter at openjdk.org
Mon Jun 19 13:44:10 UTC 2023


On Fri, 16 Jun 2023 03:30:58 GMT, Fei Gao <fgao at openjdk.org> wrote:

>> I'm collecting the new benchmark results here, so that we see the effect of misaligned load-stores.
>> I have a series of control cases (aligned), and a series of misaligned cases.
>> 
>> -------------
>> Machine: 11th Gen Intel® Core™ i7-11850H @ 2.50GHz × 16. With AVX512 support.
>> 
>> With patch:
>> 
>> Benchmark                                                                                            (COUNT)  (seed)  Mode  Cnt     Score   Error  Units
>> VectorAlignment.VectorAlignmentNoSuperWord.bench000B_control                                            2048       0  avgt       2465.281          ns/op
>> VectorAlignment.VectorAlignmentNoSuperWord.bench000C_control                                            2048       0  avgt       2467.440          ns/op
>> VectorAlignment.VectorAlignmentNoSuperWord.bench000D_control                                            2048       0  avgt       1276.895          ns/op
>> VectorAlignment.VectorAlignmentNoSuperWord.bench000F_control                                            2048       0  avgt       1313.390          ns/op
>> VectorAlignment.VectorAlignmentNoSuperWord.bench000I_control                                            2048       0  avgt       2465.260          ns/op
>> VectorAlignment.VectorAlignmentNoSuperWord.bench000L_control                                            2048       0  avgt       2469.814          ns/op
>> VectorAlignment.VectorAlignmentNoSuperWord.bench000S_control                                            2048       0  avgt       2466.305          ns/op
>> VectorAlignment.VectorAlignmentNoSuperWord.bench001_control                                             2048       0  avgt       2470.130          ns/op
>> VectorAlignment.VectorAlignmentNoSuperWord.bench100B_misaligned_load                                    2048       0  avgt       2463.569          ns/op
>> VectorAlignment.VectorAlignmentNoSuperWord.bench100C_misaligned_load                                    2048       0  avgt       2467.426          ns/op
>> VectorAlignment.VectorAlignmentNoSuperWord.bench100D_misaligned_load                                    2048       0  avgt       1244.256          ns/op
>> VectorAlignment.VectorAlignmentNoSuperWord.bench100F_misaligned_load                                    2048       0  avgt       1268.847          ns/op
>> VectorAlignment.VectorAlignmentNoSuperWord.bench100I_misaligned_load                                    2048       0  avgt       2465.870          ns/op
>> VectorAlignment.VectorAlign...
>
>> aarch64 asimd: vectorizing the misaligned cases leads to a clear performance win compared to non-vectorization. However, we can see that the vectorized misaligned cases are consistently a bit slower than the vectorized aligned cases.
> 
> Hi @eme64 , thanks for your perf data! I also tried your new benchmark on some of the latest `aarch64` machines using `asimd`. Here is part of the results:
> 
>   VectorAlignment.VectorAlignmentSuperWord.bench000B_control                                   2048       0  avgt        152.831          ns/op
>   VectorAlignment.VectorAlignmentSuperWord.bench000C_control                                   2048       0  avgt        285.819          ns/op
>   VectorAlignment.VectorAlignmentSuperWord.bench000D_control                                   2048       0  avgt        749.996          ns/op
>   VectorAlignment.VectorAlignmentSuperWord.bench000F_control                                   2048       0  avgt        396.433          ns/op
>   VectorAlignment.VectorAlignmentSuperWord.bench000I_control                                   2048       0  avgt        560.767          ns/op
>   VectorAlignment.VectorAlignmentSuperWord.bench000L_control                                   2048       0  avgt       1131.909          ns/op
>   VectorAlignment.VectorAlignmentSuperWord.bench000S_control                                   2048       0  avgt        285.215          ns/op
>   VectorAlignment.VectorAlignmentSuperWord.bench001_control                                    2048       0  avgt        562.436          ns/op
>   VectorAlignment.VectorAlignmentSuperWord.bench100B_misaligned_load                           2048       0  avgt        152.459          ns/op
>   VectorAlignment.VectorAlignmentSuperWord.bench100C_misaligned_load                           2048       0  avgt        290.888          ns/op
>   VectorAlignment.VectorAlignmentSuperWord.bench100D_misaligned_load                           2048       0  avgt        754.443          ns/op
>   VectorAlignment.VectorAlignmentSuperWord.bench100F_misaligned_load                           2048       0  avgt        386.633          ns/op
>   VectorAlignment.VectorAlignmentSuperWord.bench100I_misaligned_load                           2048       0  avgt        560.587          ns/op
>   VectorAlignment.VectorAlignmentSuperWord.bench100L_misaligned_load                           2048       0  avgt       1134.492          ns/op
>   VectorAlignment.VectorAlignmentSuperWord.bench100S_misaligned_load                           2048   ...

@fg1417 perfect, thanks for looking into that!
Is there something you still want me to change on this RFE?
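
For readers without the benchmark source at hand, the difference between a "control" case and a "misaligned_load" case can be sketched roughly as below. This is only an illustration under assumptions: the class and method names are hypothetical, and the loop bodies are not copied from the VectorAlignment benchmark. The array length 2048 mirrors the COUNT column above. In the second loop the load is shifted by one element relative to the store, so, assuming both arrays start at the same alignment, the two accesses cannot both be vector-aligned at once.

```java
public class AlignmentSketch {
    // Control case (hypothetical): load and store use the same index,
    // so both accesses have the same relative alignment.
    static void control(int[] src, int[] dst) {
        for (int i = 0; i < src.length - 1; i++) {
            dst[i] = src[i] + 1;
        }
    }

    // Misaligned-load case (hypothetical): the load is offset by one
    // element relative to the store.
    static void misalignedLoad(int[] src, int[] dst) {
        for (int i = 0; i < src.length - 1; i++) {
            dst[i] = src[i + 1] + 1;
        }
    }

    public static void main(String[] args) {
        int[] src = new int[2048];
        for (int i = 0; i < src.length; i++) {
            src[i] = i;
        }
        int[] a = new int[2048];
        int[] b = new int[2048];
        control(src, a);
        misalignedLoad(src, b);
        // Print one element of each result so the loops are not dead code.
        System.out.println(a[0] + " " + b[0]);
    }
}
```

Timing loops like these by hand would not give reliable numbers; the figures quoted above come from JMH runs of the actual benchmark.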

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14096#issuecomment-1597213651

