RFR: JDK-8308994: C2: Re-implement experimental post loop vectorization
Emanuel Peter
epeter at openjdk.org
Wed Jun 28 11:09:24 UTC 2023
On Wed, 28 Jun 2023 10:06:45 GMT, Emanuel Peter <epeter at openjdk.org> wrote:
>> ## TL;DR
>>
>> This patch completely re-implements C2's experimental post loop vectorization for better stability, maintainability and performance. Compared with the original implementation, the new one adds a standalone loop phase to C2's ideal loop phases and can vectorize more post loops. The original implementation and all code related to multi-versioned post loops are deleted in this patch. More details can be found in the document posted as a reply in this pull request.
>
> src/hotspot/share/opto/vmaskloop.cpp line 854:
>
>> 852: }
>> 853: }
>> 854: }
>
> What happens if you have an int and a float slice? You don't seem to separate them here, but just thread them together.
>
> `./java -Xcomp -XX:-TieredCompilation -XX:+TraceNewVectors -XX:+TraceLoopOpts -XX:+UnlockExperimentalVMOptions -XX:+UseMaskedLoop -XX:+TraceMaskedLoop -XX:CompileCommand=compileonly,Test::test0 -XX:+TraceSuperWord Test.java`
>
>
>
> public class Test {
>     static int RANGE = 1024;
>
>     public static void main(String[] strArr) {
>         float a[] = new float[RANGE];
>         int b[] = new int[RANGE];
>         short c[] = new short[RANGE];
>         test0(a, b, c);
>     }
>
>     static void test0(float[] a, int[] b, short[] c) {
>         for (int i = 0; i < RANGE; i++) {
>             a[i] ++;
>             b[i] ++;
>             c[i] ++;
>         }
>     }
> }
>
>
> It seems the memory state is now passed between the int and float `StoreVectorMasked`:
>
>
> Duplicated vector nodes with lane size = 4
> Offset = 0
> 3524 StoreVectorMasked === 479 484 475 3525 3510 [[ 3527 ]] @float[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; mismatched Memory: @float[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; !orig=3519,[472],[151],829 !jvms: Test::test0 @ bci:15 (line 13)
> 3525 AddVF === _ 3526 3517 [[ 3524 ]] #vectorz[16]:{float} !orig=3518,[473],[130] !jvms: Test::test0 @ bci:14 (line 13)
> 3526 LoadVectorMasked === 502 484 475 3510 [[ 3525 ]] @float[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; mismatched #vectorz[16]:{float} !orig=3516,[474],[128] !jvms: Test::test0 @ bci:12 (line 13)
> 3527 StoreVectorMasked === 479 3524 471 3528 3510 [[ 3519 ]] @int[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=7; mismatched Memory: @int[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=7; !orig=3523,[461],[207],823 !jvms: Test::test0 @ bci:22 (line 14)
> 3528 AddVI === _ 3529 3521 [[ 3527 ]] #vectorz[16]:{int} !orig=3522,[462],[186] !jvms: Test::test0 @ bci:21 (line 14)
> 3529 LoadVectorMasked === 502 480 471 3510 [[ 3528 ]] @int[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=7; mismatched #vectorz[16]:{int} !orig=3520,[470],[185] !jvms: Test::test0 @ bci:19 (line 14)
> Offset = 1
> 3519 StoreVectorMasked === 479 3527 3530 3518 3511 [[ 493 484 3523 ]] @float[int:>=0] (java/lang/Cloneable,java/io/Serial...
![image](https://github.com/openjdk/jdk/assets/32593061/a00e4973-2faf-428e-9794-48abb945e815)
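That indeed looks like a mix-up of the int and float memory slices: the int `StoreVectorMasked` (3527) takes the float `StoreVectorMasked` (3524) as its memory input. I'm not sure whether this has any bad consequences, but it should be fixed.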
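
To look for visible consequences, the reproducer could be extended with a result check. The following is only a sketch, not part of the patch: the class name `SliceCheck` and the helper `runAndVerify` are made up for illustration, and the loop body is copied from `Test::test0` above. It checks computed values only, so a passing run says nothing about whether the memory graph is well-formed.

```java
class SliceCheck {
    // Kept non-final, as in the original reproducer.
    static int RANGE = 1024;

    // Same loop body as Test::test0 in the reproducer.
    static void test0(float[] a, int[] b, short[] c) {
        for (int i = 0; i < RANGE; i++) {
            a[i]++;
            b[i]++;
            c[i]++;
        }
    }

    // Hypothetical helper: runs test0 `iterations` times on zeroed arrays
    // and returns true only if every element equals the iteration count.
    // Intended for small counts that are exact in float and fit in short.
    static boolean runAndVerify(int iterations) {
        float[] a = new float[RANGE];
        int[] b = new int[RANGE];
        short[] c = new short[RANGE];
        for (int n = 0; n < iterations; n++) {
            test0(a, b, c);
        }
        for (int i = 0; i < RANGE; i++) {
            if (a[i] != (float) iterations
                    || b[i] != iterations
                    || c[i] != (short) iterations) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        if (!runAndVerify(100)) {
            throw new AssertionError("wrong result after vectorization");
        }
        System.out.println("results match");
    }
}
```

It can presumably be run with the same flags as the command above, with the `compileonly` target changed to `SliceCheck::test0`.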
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1245001527