RFR: 8290910: Wrong memory state is picked in SuperWord::co_locate_pack() [v3]

Fei Gao fgao at openjdk.org
Tue Sep 6 02:10:49 UTC 2022


On Tue, 6 Sep 2022 02:03:06 GMT, Fei Gao <fgao at openjdk.org> wrote:

>> After [JDK-8283091](https://bugs.openjdk.org/browse/JDK-8283091), the loop below can be vectorized partially.
>> Statement 1 can be vectorized but statement 2 can't.
>> 
>> // int[] iArr; long[] lArrFld; int i1,i2;
>> for (i1 = 6; i1 < 227; i1++) {
>>   iArr[i1] += lArrFld[i1]++; // statement 1
>>   iArr[i1 + 1] -= (i2++); // statement 2
>> }
>> 
>> 
>> But we got incorrect results because the vector packs of iArr are
>> scheduled incorrectly like:
>> 
>> ...
>> load_vector XMM1,[R8 + #16 + R11 << #2]
>> movl    RDI, [R8 + #20 + R11 << #2] # int
>> load_vector XMM2,[R9 + #8 + R11 << #3]
>> subl    RDI, R11    # int
>> vpaddq  XMM3,XMM2,XMM0  ! add packedL
>> store_vector [R9 + #8 + R11 << #3],XMM3
>> vector_cast_l2x  XMM2,XMM2  !
>> vpaddd  XMM1,XMM2,XMM1  ! add packedI
>> addl    RDI, #228   # int
>> movl    [R8 + #20 + R11 << #2], RDI # int
>> movl    RBX, [R8 + #24 + R11 << #2] # int
>> subl    RBX, R11    # int
>> addl    RBX, #227   # int
>> movl    [R8 + #24 + R11 << #2], RBX # int
>> ...
>> movl    RBX, [R8 + #40 + R11 << #2] # int
>> subl    RBX, R11    # int
>> addl    RBX, #223   # int
>> movl    [R8 + #40 + R11 << #2], RBX # int
>> movl    RDI, [R8 + #44 + R11 << #2] # int
>> subl    RDI, R11    # int
>> addl    RDI, #222   # int
>> movl    [R8 + #44 + R11 << #2], RDI # int
>> store_vector [R8 + #16 + R11 << #2],XMM1
>> ...
>> 
>> simplified as:
>> 
>> load_vector iArr in statement 1
>> unvectorized loads/stores in statement 2
>> store_vector iArr in statement 1
>> 
>> We cannot pick the memory state from the first load for LoadI pack
>> here, as the LoadI vector operation must load the new values in memory
>> after iArr writes `iArr[i1 + 1] - (i2++)` to `iArr[i1 + 1]`(statement 2).
>> We must take the memory state of the last load where we have assigned
>> new values `iArr[i1 + 1] - (i2++)` to the iArr array.
>> 
>> In [JDK-8240281](https://bugs.openjdk.org/browse/JDK-8240281), we picked the memory state of the first load[1]. Different
>> from the scenario in [JDK-8240281](https://bugs.openjdk.org/browse/JDK-8240281), the store, which is dependent on an
>> earlier load here, is in a pack to be scheduled and the LoadI pack
>> depends on the last_mem. As designed[2], to schedule the StoreI pack,
>> all memory operations in another single pack should be moved in the same
>> direction. We know that the store in the pack depends on one of loads in
>> the LoadI pack, so the LoadI pack should be scheduled before the StoreI
>> pack. And the LoadI pack depends on the last_mem, so the last_mem must
>> be scheduled before the LoadI pack and also before the store pack.
>> Therefore, we need to take the memory state of the last load for the
>> LoadI pack here.
>> 
>> To fix it, the pack adds additional checks while picking the memory state
>> of the first load. When the store locates in a pack and the load pack
>> relies on the last_mem, we shouldn't choose the memory state of the
>> first load but choose the memory state of the last load.
>> 
>> [1]https://github.com/openjdk/jdk/blob/0ae834105740f7cf73fe96be22e0f564ad29b18d/src/hotspot/share/opto/superword.cpp#L2380
>> [2]https://github.com/openjdk/jdk/blob/0ae834105740f7cf73fe96be22e0f564ad29b18d/src/hotspot/share/opto/superword.cpp#L2232
>
> Fei Gao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision:
> 
>  - Fix the interleaving cases using as index offset and add new reduced case from JDK-8293216
>    
>    Change-Id: Ia20009e262e49ef0d6096133f00acad614b4a1dc
>  - Merge branch 'master' into fg8290910
>    
>    Change-Id: I2393a1f4f744b2ed258803c82f3198c2f2e5a8ac
>  - Code style change: add one space
>    
>    Change-Id: I2794060ac0f9dbe006e32f202111ee08f09d96a1
>  - 8290910: Wrong memory state is picked in SuperWord::co_locate_pack()
>    
>    After JDK-8283091, the loop below can be vectorized partially.
>    Statement 1 can be vectorized but statement 2 can't.
>    ```
>    // int[] iArr; long[] lArrFld; int i1,i2;
>    for (i1 = 6; i1 < 227; i1++) {
>      iArr[i1] += lArrFld[i1]++; // statement 1
>      iArr[i1 + 1] -= (i2++); // statement 2
>    }
>    ```
>    
>    But we got incorrect results because the vector packs of iArr are
>    scheduled incorrectly like:
>    ```
>    ...
>    load_vector XMM1,[R8 + #16 + R11 << #2]
>    movl    RDI, [R8 + #20 + R11 << #2] # int
>    load_vector XMM2,[R9 + #8 + R11 << #3]
>    subl    RDI, R11    # int
>    vpaddq  XMM3,XMM2,XMM0  ! add packedL
>    store_vector [R9 + #8 + R11 << #3],XMM3
>    vector_cast_l2x  XMM2,XMM2  !
>    vpaddd  XMM1,XMM2,XMM1  ! add packedI
>    addl    RDI, #228   # int
>    movl    [R8 + #20 + R11 << #2], RDI # int
>    movl    RBX, [R8 + #24 + R11 << #2] # int
>    subl    RBX, R11    # int
>    addl    RBX, #227   # int
>    movl    [R8 + #24 + R11 << #2], RBX # int
>    ...
>    movl    RBX, [R8 + #40 + R11 << #2] # int
>    subl    RBX, R11    # int
>    addl    RBX, #223   # int
>    movl    [R8 + #40 + R11 << #2], RBX # int
>    movl    RDI, [R8 + #44 + R11 << #2] # int
>    subl    RDI, R11    # int
>    addl    RDI, #222   # int
>    movl    [R8 + #44 + R11 << #2], RDI # int
>    store_vector [R8 + #16 + R11 << #2],XMM1
>    ...
>    ```
>    simplified as:
>    ```
>    load_vector iArr in statement 1
>    unvectorized loads/stores in statement 2
>    store_vector iArr in statement 1
>    ```
>    We cannot pick the memory state from the first load for LoadI pack
>    here, as the LoadI vector operation must load the new values in memory
>    after iArr writes 'iArr[i1 + 1] - (i2++)' to 'iArr[i1 + 1]'(statement 2).
>    We must take the memory state of the last load where we have assigned
>    new values ('iArr[i1 + 1] - (i2++)') to the iArr array.
>    
>    In JDK-8240281, we picked the memory state of the first load. Different
>    from the scenario in JDK-8240281, the store, which is dependent on an
>    earlier load here, is in a pack to be scheduled and the LoadI pack
>    depends on the last_mem. As designed[2], to schedule the StoreI pack,
>    all memory operations in another single pack should be moved in the same
>    direction. We know that the store in the pack depends on one of loads in
>    the LoadI pack, so the LoadI pack should be scheduled before the StoreI
>    pack. And the LoadI pack depends on the last_mem, so the last_mem must
>    be scheduled before the LoadI pack and also before the store pack.
>    Therefore, we need to take the memory state of the last load for the
>    LoadI pack here.
>    
>    To fix it, the pack adds additional checks while picking the memory state
>    of the first load. When the store locates in a pack and the load pack
>    relies on the last_mem, we shouldn't choose the memory state of the
>    first load but choose the memory state of the last load.
>    
>    [1]https://github.com/openjdk/jdk/blob/0ae834105740f7cf73fe96be22e0f564ad29b18d/src/hotspot/share/opto/superword.cpp#L2380
>    [2]https://github.com/openjdk/jdk/blob/0ae834105740f7cf73fe96be22e0f564ad29b18d/src/hotspot/share/opto/superword.cpp#L2232
>    
>    Jira: ENTLLT-5482
>    Change-Id: I341d10b91957b60a1b4aff8116723e54083a5fb8
>    CustomizedGitHooks: yes

I also updated the testcase by adding new interleaving cases with different offset, and adding the reduced testcase from [JDK-8293216](https://bugs.openjdk.org/browse/JDK-8293216), which is a dup of the problem.

-------------

PR: https://git.openjdk.org/jdk/pull/9898


More information about the hotspot-compiler-dev mailing list