RFR: 8290910: Wrong memory state is picked in SuperWord::co_locate_pack() [v3]

Fei Gao fgao at openjdk.org
Tue Sep 6 02:03:06 UTC 2022


> After [JDK-8283091](https://bugs.openjdk.org/browse/JDK-8283091), the loop below can be partially vectorized:
> statement 1 can be vectorized, but statement 2 cannot.
> 
> // int[] iArr; long[] lArrFld; int i1,i2;
> for (i1 = 6; i1 < 227; i1++) {
>   iArr[i1] += lArrFld[i1]++; // statement 1
>   iArr[i1 + 1] -= (i2++); // statement 2
> }
> 
> 
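A minimal standalone reproducer sketch for the loop above, assuming it is wrapped in a static method that C2 compiles after enough calls. The class name `CoLocatePackRepro`, the array size of 300, the per-call array allocation and the initial values are illustrative choices, not taken from the PR's regression test. The first call normally runs in the interpreter and serves as the reference; a later mismatch would indicate a miscompilation. Whether C2 actually vectorizes the loop depends on the JDK build, CPU and flags, so a clean run is not proof that the bug is absent.

```java
import java.util.Arrays;

public class CoLocatePackRepro {
    // The loop from the report, run on freshly initialized arrays; returns iArr.
    static int[] run() {
        int[] iArr = new int[300];
        long[] lArrFld = new long[300];
        for (int k = 0; k < iArr.length; k++) {
            iArr[k] = k;
            lArrFld[k] = 2L * k;
        }
        int i1, i2 = 0;
        for (i1 = 6; i1 < 227; i1++) {
            iArr[i1] += lArrFld[i1]++; // statement 1
            iArr[i1 + 1] -= (i2++);    // statement 2
        }
        return iArr;
    }

    public static void main(String[] args) {
        int[] reference = run(); // first call: normally interpreted
        // Assumption: 20,000 calls are enough for C2 to compile run().
        for (int iter = 0; iter < 20_000; iter++) {
            int[] got = run();
            if (!Arrays.equals(reference, got)) {
                throw new AssertionError("mismatch at iteration " + iter);
            }
        }
        System.out.println("OK");
    }
}
```

On a JDK that includes the fix it should print `OK`; the single-file launcher (`java CoLocatePackRepro.java`) is enough to run it.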
> However, the results are wrong because the vector packs of iArr are
> scheduled incorrectly, as follows:
> 
> ...
> load_vector XMM1,[R8 + #16 + R11 << #2]
> movl    RDI, [R8 + #20 + R11 << #2] # int
> load_vector XMM2,[R9 + #8 + R11 << #3]
> subl    RDI, R11    # int
> vpaddq  XMM3,XMM2,XMM0  ! add packedL
> store_vector [R9 + #8 + R11 << #3],XMM3
> vector_cast_l2x  XMM2,XMM2  !
> vpaddd  XMM1,XMM2,XMM1  ! add packedI
> addl    RDI, #228   # int
> movl    [R8 + #20 + R11 << #2], RDI # int
> movl    RBX, [R8 + #24 + R11 << #2] # int
> subl    RBX, R11    # int
> addl    RBX, #227   # int
> movl    [R8 + #24 + R11 << #2], RBX # int
> ...
> movl    RBX, [R8 + #40 + R11 << #2] # int
> subl    RBX, R11    # int
> addl    RBX, #223   # int
> movl    [R8 + #40 + R11 << #2], RBX # int
> movl    RDI, [R8 + #44 + R11 << #2] # int
> subl    RDI, R11    # int
> addl    RDI, #222   # int
> movl    [R8 + #44 + R11 << #2], RDI # int
> store_vector [R8 + #16 + R11 << #2],XMM1
> ...
> 
> In simplified form:
> 
> load_vector iArr in statement 1
> unvectorized loads/stores in statement 2
> store_vector iArr in statement 1
> 
> We cannot pick the memory state of the first load for the LoadI pack
> here, because the LoadI vector operation must read the new values that
> statement 2 writes to memory, i.e. it must execute after `iArr[i1 + 1] - (i2++)`
> has been stored to `iArr[i1 + 1]`. We must instead take the memory state
> of the last load, which already reflects the new values `iArr[i1 + 1] - (i2++)`
> assigned to the iArr array.
> 
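The ordering problem can be modelled without a JIT. The sketch below is a simplification assuming a hypothetical vector length of 4 and a trip count that is a multiple of 4 (so it runs 6..226 instead of 6..227). It replays statement 1 as a pack and statement 2 as scalar operations, once with the pack's load placed before the scalar stores (the schedule shown above) and once after them. The `lArrFld[i1]++` side effect is dropped because it does not affect iArr. Class and method names are made up for illustration.

```java
import java.util.Arrays;

public class ScheduleModel {
    static final int VL = 4; // hypothetical number of int lanes per pack

    // Reference: the scalar loop as written (minus the lArrFld increment).
    static int[] sequential(int[] init, long[] l, int from, int to) {
        int[] a = init.clone();
        int i2 = 0;
        for (int i = from; i < to; i++) {
            a[i] += l[i];     // statement 1
            a[i + 1] -= i2++; // statement 2
        }
        return a;
    }

    // Statement 1 packed VL lanes at a time, statement 2 left scalar.
    // loadAfterStores == false reproduces "load_vector; scalar stores; store_vector".
    static int[] packed(int[] init, long[] l, int from, int to, boolean loadAfterStores) {
        if ((to - from) % VL != 0) throw new IllegalArgumentException("trip count not a multiple of VL");
        int[] a = init.clone();
        int i2 = 0;
        int[] v = new int[VL];
        for (int i = from; i < to; i += VL) {
            if (!loadAfterStores) for (int k = 0; k < VL; k++) v[k] = a[i + k]; // LoadI pack (early)
            for (int k = 0; k < VL; k++) a[i + k + 1] -= i2++;                  // scalar statement 2
            if (loadAfterStores) for (int k = 0; k < VL; k++) v[k] = a[i + k];  // LoadI pack (late)
            for (int k = 0; k < VL; k++) a[i + k] = (int) (v[k] + l[i + k]);   // add + StoreI pack
        }
        return a;
    }

    public static void main(String[] args) {
        int[] a = new int[300];
        long[] l = new long[300];
        for (int k = 0; k < a.length; k++) { a[k] = k; l[k] = 2L * k; }
        int[] ref = sequential(a, l, 6, 226); // 220 iterations = 55 packs of 4
        System.out.println("late load matches: " + Arrays.equals(ref, packed(a, l, 6, 226, true)));   // true
        System.out.println("early load matches: " + Arrays.equals(ref, packed(a, l, 6, 226, false))); // false
    }
}
```

In the early-load variant, lane 2 of the first pack reads `iArr[8]` before statement 2 has decremented it, so the stored value is 24 instead of 23.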
> In [JDK-8240281](https://bugs.openjdk.org/browse/JDK-8240281), we picked the memory state of the first load[1]. Unlike
> the scenario in [JDK-8240281](https://bugs.openjdk.org/browse/JDK-8240281), here the store that depends on an
> earlier load is in a pack that is to be scheduled, and the LoadI pack
> depends on the last_mem. As designed[2], to schedule the StoreI pack,
> all memory operations belonging to another single pack must be moved in
> the same direction. Since the store in the StoreI pack depends on one of
> the loads in the LoadI pack, the LoadI pack must be scheduled before the
> StoreI pack. And since the LoadI pack depends on the last_mem, the last_mem
> must be scheduled before the LoadI pack, and hence also before the StoreI pack.
> Therefore, we need to take the memory state of the last load for the
> LoadI pack here.
> 
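The ordering argument can be checked on a toy dependence graph. The sketch below uses Kahn's algorithm as a stand-in for the SuperWord scheduler (it is not how HotSpot schedules packs) and hypothetical node names: `first_mem` for the memory state seen by the first load, `last_mem` for the state after the scalar statement-2 stores. If the LoadI pack hangs off `first_mem`, the graph permits the broken order; if it hangs off `last_mem`, the only valid order puts `last_mem` before both packs.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PackOrder {
    // Kahn's algorithm: returns one valid order of a small DAG whose edges mean
    // "must be scheduled before". Ties are broken by insertion order.
    static List<String> topoSort(Map<String, List<String>> succ) {
        Map<String, Integer> indeg = new LinkedHashMap<>();
        for (String n : succ.keySet()) indeg.putIfAbsent(n, 0);
        for (List<String> targets : succ.values())
            for (String t : targets) indeg.merge(t, 1, Integer::sum);
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : indeg.entrySet())
            if (e.getValue() == 0) ready.add(e.getKey());
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String n = ready.poll();
            order.add(n);
            for (String t : succ.getOrDefault(n, List.of()))
                if (indeg.merge(t, -1, Integer::sum) == 0) ready.add(t);
        }
        if (order.size() != indeg.size()) throw new IllegalStateException("cycle");
        return order;
    }

    // LoadI pack wired to the first load's memory state.
    static Map<String, List<String>> firstMemGraph() {
        Map<String, List<String>> g = new LinkedHashMap<>();
        g.put("first_mem", List.of("LoadI pack", "last_mem"));
        g.put("LoadI pack", List.of("StoreI pack")); // packed store uses a packed load
        g.put("last_mem", List.of());
        g.put("StoreI pack", List.of());
        return g;
    }

    // LoadI pack wired to the last load's memory state.
    static Map<String, List<String>> lastMemGraph() {
        Map<String, List<String>> g = new LinkedHashMap<>();
        g.put("first_mem", List.of("last_mem"));
        g.put("last_mem", List.of("LoadI pack"));
        g.put("LoadI pack", List.of("StoreI pack"));
        g.put("StoreI pack", List.of());
        return g;
    }

    public static void main(String[] args) {
        System.out.println(topoSort(firstMemGraph())); // [first_mem, LoadI pack, last_mem, StoreI pack]
        System.out.println(topoSort(lastMemGraph()));  // [first_mem, last_mem, LoadI pack, StoreI pack]
    }
}
```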
> To fix this, the patch adds extra checks when picking the memory state
> of the first load. When the store is located in a pack and the load pack
> relies on the last_mem, we should not choose the memory state of the
> first load, but that of the last load instead.
> 
> [1] https://github.com/openjdk/jdk/blob/0ae834105740f7cf73fe96be22e0f564ad29b18d/src/hotspot/share/opto/superword.cpp#L2380
> [2] https://github.com/openjdk/jdk/blob/0ae834105740f7cf73fe96be22e0f564ad29b18d/src/hotspot/share/opto/superword.cpp#L2232
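The fix described above boils down to a small decision rule. The following Java model mirrors that prose only; the real change is C++ in `superword.cpp`, and the class name, method name, parameters and string labels here are hypothetical.

```java
import java.util.List;

public class MemStatePick {
    // Hypothetical model: memInputs holds the memory state feeding each load of
    // the LoadI pack, in program order. Not HotSpot's real types or API.
    static String pickMemState(List<String> memInputs,
                               boolean storeIsInPack,
                               boolean loadPackDependsOnLastMem) {
        if (memInputs.isEmpty()) throw new IllegalArgumentException("empty pack");
        if (storeIsInPack && loadPackDependsOnLastMem) {
            // New check: the dependent store moves with its pack, so the vector
            // load must use the memory state that the last load sees.
            return memInputs.get(memInputs.size() - 1);
        }
        // Previous behaviour (JDK-8240281): the first load's memory state.
        return memInputs.get(0);
    }

    public static void main(String[] args) {
        List<String> mems = List.of("mem0", "mem1", "mem2", "mem3");
        System.out.println(pickMemState(mems, true, true));  // mem3
        System.out.println(pickMemState(mems, false, true)); // mem0
        System.out.println(pickMemState(mems, true, false)); // mem0
    }
}
```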

Fei Gao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision:

 - Fix the interleaving cases using as index offset and add new reduced case from JDK-8293216
   
   Change-Id: Ia20009e262e49ef0d6096133f00acad614b4a1dc
 - Merge branch 'master' into fg8290910
   
   Change-Id: I2393a1f4f744b2ed258803c82f3198c2f2e5a8ac
 - Code style change: add one space
   
   Change-Id: I2794060ac0f9dbe006e32f202111ee08f09d96a1
 - 8290910: Wrong memory state is picked in SuperWord::co_locate_pack()
   
   Jira: ENTLLT-5482
   Change-Id: I341d10b91957b60a1b4aff8116723e54083a5fb8
   CustomizedGitHooks: yes

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/9898/files
  - new: https://git.openjdk.org/jdk/pull/9898/files/01d64113..c733f039

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=9898&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9898&range=01-02

  Stats: 89280 lines in 1321 files changed: 36710 ins; 41329 del; 11241 mod
  Patch: https://git.openjdk.org/jdk/pull/9898.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/9898/head:pull/9898

PR: https://git.openjdk.org/jdk/pull/9898


More information about the hotspot-compiler-dev mailing list