RFR: JDK-8308994: C2: Re-implement experimental post loop vectorization

Emanuel Peter epeter at openjdk.org
Wed Jun 28 11:09:24 UTC 2023


On Wed, 28 Jun 2023 10:06:45 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> ## TL;DR
>> 
>> This patch completely re-implements C2's experimental post loop vectorization for better stability, maintainability and performance. Compared with the original implementation, the new implementation adds a standalone loop phase to C2's ideal loop phases and can vectorize more post loops. The original implementation and all code related to multi-versioned post loops are deleted in this patch. More details can be found in the document posted as a reply in this pull request.
>
> src/hotspot/share/opto/vmaskloop.cpp line 854:
> 
>> 852:       }
>> 853:     }
>> 854:   }
> 
> What happens if you have an int and a float slice? You don't seem to separate them here but just thread them together.
> 
> `./java -Xcomp -XX:-TieredCompilation -XX:+TraceNewVectors -XX:+TraceLoopOpts -XX:+UnlockExperimentalVMOptions -XX:+UseMaskedLoop -XX:+TraceMaskedLoop -XX:CompileCommand=compileonly,Test::test0 -XX:+TraceSuperWord Test.java`
> 
> 
> 
> public class Test {
>     static int RANGE = 1024;
> 
>     public static void main(String[] strArr) {
>         float a[] = new float[RANGE];
>         int b[] = new int[RANGE];
>         short c[] = new short[RANGE];
>         test0(a, b, c);
>     }
> 
>     static void test0(float[] a, int[] b, short[] c) {
>         for (int i = 0; i < RANGE; i++) {
>             a[i] ++;
>             b[i] ++;
>             c[i] ++;
>         }
>     }
> }
> 
> 
> It seems the memory state is now passed between the int and float `StoreVectorMasked`:
> 
> 
> Duplicated vector nodes with lane size = 4
> Offset = 0
>  3524  StoreVectorMasked  === 479 484 475 3525 3510  [[ 3527 ]]  @float[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; mismatched  Memory: @float[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; !orig=3519,[472],[151],829 !jvms: Test::test0 @ bci:15 (line 13)
>  3525  AddVF  === _ 3526 3517  [[ 3524 ]]  #vectorz[16]:{float} !orig=3518,[473],[130] !jvms: Test::test0 @ bci:14 (line 13)
>  3526  LoadVectorMasked  === 502 484 475 3510  [[ 3525 ]]  @float[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; mismatched #vectorz[16]:{float} !orig=3516,[474],[128] !jvms: Test::test0 @ bci:12 (line 13)
>  3527  StoreVectorMasked  === 479 3524 471 3528 3510  [[ 3519 ]]  @int[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=7; mismatched  Memory: @int[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=7; !orig=3523,[461],[207],823 !jvms: Test::test0 @ bci:22 (line 14)
>  3528  AddVI  === _ 3529 3521  [[ 3527 ]]  #vectorz[16]:{int} !orig=3522,[462],[186] !jvms: Test::test0 @ bci:21 (line 14)
>  3529  LoadVectorMasked  === 502 480 471 3510  [[ 3528 ]]  @int[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=7; mismatched #vectorz[16]:{int} !orig=3520,[470],[185] !jvms: Test::test0 @ bci:19 (line 14)
> Offset = 1
>  3519  StoreVectorMasked  === 479 3527 3530 3518 3511  [[ 493 484 3523 ]]  @float[int:>=0] (java/lang/Cloneable,java/io/Serial...

![image](https://github.com/openjdk/jdk/assets/32593061/a00e4973-2faf-428e-9794-48abb945e815)

That indeed looks like a mix-up of the int and float memory slices. I am not sure whether it has any bad consequences, but it should be fixed.
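
One way to probe for visible consequences is a result check around the same loop. The following is only a minimal sketch: the class name `SliceCheck` and the helper `verify` are hypothetical and not part of the patch or its tests. It can only show that the computed values are right or wrong; it says nothing about whether the memory graph itself is well-formed. To exercise the masked post loop it would need to be run with the flags from the command above, with the `compileonly` pattern adapted to the new class name.

```java
public class SliceCheck {
    static final int RANGE = 1024;

    // Same loop shape as Test::test0 in the review comment:
    // three independent arrays, hence three separate memory slices.
    static void test0(float[] a, int[] b, short[] c) {
        for (int i = 0; i < RANGE; i++) {
            a[i]++;
            b[i]++;
            c[i]++;
        }
    }

    // Hypothetical helper: runs the loop `rounds` times on zeroed arrays and
    // returns true only if every element was incremented exactly `rounds` times.
    static boolean verify(int rounds) {
        float[] a = new float[RANGE];
        int[] b = new int[RANGE];
        short[] c = new short[RANGE];
        for (int r = 0; r < rounds; r++) {
            test0(a, b, c);
        }
        for (int i = 0; i < RANGE; i++) {
            if (a[i] != (float) rounds || b[i] != rounds || c[i] != (short) rounds) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(verify(10) ? "PASS" : "FAIL");
    }
}
```

A passing run would not rule out a problem, since a wrong memory edge between slices may only matter once later optimizations reorder the stores.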

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1245001527

