RFR: 8308606: C2 SuperWord: remove alignment checks when not required [v6]
Emanuel Peter
epeter at openjdk.org
Mon Jun 19 13:44:10 UTC 2023
On Fri, 16 Jun 2023 03:30:58 GMT, Fei Gao <fgao at openjdk.org> wrote:
>> I'm collecting the new benchmark results here, so that we see the effect of misaligned load-stores.
>> I have a series of control cases (aligned), and a series of misaligned cases.
>>
>> -------------
>> Machine: 11th Gen Intel® Core™ i7-11850H @ 2.50GHz × 16. With AVX512 support.
>>
>> With patch:
>>
>> Benchmark                                                             (COUNT)  (seed)  Mode  Cnt     Score  Error  Units
>> VectorAlignment.VectorAlignmentNoSuperWord.bench000B_control             2048       0  avgt       2465.281         ns/op
>> VectorAlignment.VectorAlignmentNoSuperWord.bench000C_control             2048       0  avgt       2467.440         ns/op
>> VectorAlignment.VectorAlignmentNoSuperWord.bench000D_control             2048       0  avgt       1276.895         ns/op
>> VectorAlignment.VectorAlignmentNoSuperWord.bench000F_control             2048       0  avgt       1313.390         ns/op
>> VectorAlignment.VectorAlignmentNoSuperWord.bench000I_control             2048       0  avgt       2465.260         ns/op
>> VectorAlignment.VectorAlignmentNoSuperWord.bench000L_control             2048       0  avgt       2469.814         ns/op
>> VectorAlignment.VectorAlignmentNoSuperWord.bench000S_control             2048       0  avgt       2466.305         ns/op
>> VectorAlignment.VectorAlignmentNoSuperWord.bench001_control              2048       0  avgt       2470.130         ns/op
>> VectorAlignment.VectorAlignmentNoSuperWord.bench100B_misaligned_load     2048       0  avgt       2463.569         ns/op
>> VectorAlignment.VectorAlignmentNoSuperWord.bench100C_misaligned_load     2048       0  avgt       2467.426         ns/op
>> VectorAlignment.VectorAlignmentNoSuperWord.bench100D_misaligned_load     2048       0  avgt       1244.256         ns/op
>> VectorAlignment.VectorAlignmentNoSuperWord.bench100F_misaligned_load     2048       0  avgt       1268.847         ns/op
>> VectorAlignment.VectorAlignmentNoSuperWord.bench100I_misaligned_load     2048       0  avgt       2465.870         ns/op
>> VectorAlignment.VectorAlign...
>
>> aarch64 asimd: vectorizing the misaligned cases leads to a clear performance win compared to non-vectorization. However, we can see that the vectorized misaligned cases are consistently a bit slower than the vectorized aligned cases.
>
> Hi @eme64, thanks for your perf data! I also ran your new benchmark on some recent `aarch64` machines using `asimd`. Here is part of the results:
>
> VectorAlignment.VectorAlignmentSuperWord.bench000B_control          2048  0  avgt   152.831  ns/op
> VectorAlignment.VectorAlignmentSuperWord.bench000C_control          2048  0  avgt   285.819  ns/op
> VectorAlignment.VectorAlignmentSuperWord.bench000D_control          2048  0  avgt   749.996  ns/op
> VectorAlignment.VectorAlignmentSuperWord.bench000F_control          2048  0  avgt   396.433  ns/op
> VectorAlignment.VectorAlignmentSuperWord.bench000I_control          2048  0  avgt   560.767  ns/op
> VectorAlignment.VectorAlignmentSuperWord.bench000L_control          2048  0  avgt  1131.909  ns/op
> VectorAlignment.VectorAlignmentSuperWord.bench000S_control          2048  0  avgt   285.215  ns/op
> VectorAlignment.VectorAlignmentSuperWord.bench001_control           2048  0  avgt   562.436  ns/op
> VectorAlignment.VectorAlignmentSuperWord.bench100B_misaligned_load  2048  0  avgt   152.459  ns/op
> VectorAlignment.VectorAlignmentSuperWord.bench100C_misaligned_load  2048  0  avgt   290.888  ns/op
> VectorAlignment.VectorAlignmentSuperWord.bench100D_misaligned_load  2048  0  avgt   754.443  ns/op
> VectorAlignment.VectorAlignmentSuperWord.bench100F_misaligned_load  2048  0  avgt   386.633  ns/op
> VectorAlignment.VectorAlignmentSuperWord.bench100I_misaligned_load  2048  0  avgt   560.587  ns/op
> VectorAlignment.VectorAlignmentSuperWord.bench100L_misaligned_load  2048  0  avgt  1134.492  ns/op
> VectorAlignment.VectorAlignmentSuperWord.bench100S_misaligned_load 2048 ...
@fg1417 perfect, thanks for looking into that!
Is there something you still want me to change on this RFE?
-------------
PR Comment: https://git.openjdk.org/jdk/pull/14096#issuecomment-1597213651