RFR: 8290910: Wrong memory state is picked in SuperWord::co_locate_pack() [v3]
Fei Gao
fgao at openjdk.org
Tue Sep 6 02:03:06 UTC 2022
> After [JDK-8283091](https://bugs.openjdk.org/browse/JDK-8283091), the loop below can be vectorized partially.
> Statement 1 can be vectorized but statement 2 can't.
>
> // int[] iArr; long[] lArrFld; int i1,i2;
> for (i1 = 6; i1 < 227; i1++) {
> iArr[i1] += lArrFld[i1]++; // statement 1
> iArr[i1 + 1] -= (i2++); // statement 2
> }
>
>
> But we got incorrect results because the vector packs of iArr are
> scheduled incorrectly like:
>
> ...
> load_vector XMM1,[R8 + #16 + R11 << #2]
> movl RDI, [R8 + #20 + R11 << #2] # int
> load_vector XMM2,[R9 + #8 + R11 << #3]
> subl RDI, R11 # int
> vpaddq XMM3,XMM2,XMM0 ! add packedL
> store_vector [R9 + #8 + R11 << #3],XMM3
> vector_cast_l2x XMM2,XMM2 !
> vpaddd XMM1,XMM2,XMM1 ! add packedI
> addl RDI, #228 # int
> movl [R8 + #20 + R11 << #2], RDI # int
> movl RBX, [R8 + #24 + R11 << #2] # int
> subl RBX, R11 # int
> addl RBX, #227 # int
> movl [R8 + #24 + R11 << #2], RBX # int
> ...
> movl RBX, [R8 + #40 + R11 << #2] # int
> subl RBX, R11 # int
> addl RBX, #223 # int
> movl [R8 + #40 + R11 << #2], RBX # int
> movl RDI, [R8 + #44 + R11 << #2] # int
> subl RDI, R11 # int
> addl RDI, #222 # int
> movl [R8 + #44 + R11 << #2], RDI # int
> store_vector [R8 + #16 + R11 << #2],XMM1
> ...
>
> simplified as:
>
> load_vector iArr in statement 1
> unvectorized loads/stores in statement 2
> store_vector iArr in statement 1
>
> We cannot pick the memory state from the first load for LoadI pack
> here, as the LoadI vector operation must load the new values in memory
> after iArr writes `iArr[i1 + 1] - (i2++)` to `iArr[i1 + 1]`(statement 2).
> We must take the memory state of the last load where we have assigned
> new values `iArr[i1 + 1] - (i2++)` to the iArr array.
>
> In [JDK-8240281](https://bugs.openjdk.org/browse/JDK-8240281), we picked the memory state of the first load[1]. Different
> from the scenario in [JDK-8240281](https://bugs.openjdk.org/browse/JDK-8240281), the store, which is dependent on an
> earlier load here, is in a pack to be scheduled and the LoadI pack
> depends on the last_mem. As designed[2], to schedule the StoreI pack,
> all memory operations in another single pack should be moved in the same
> direction. We know that the store in the pack depends on one of loads in
> the LoadI pack, so the LoadI pack should be scheduled before the StoreI
> pack. And the LoadI pack depends on the last_mem, so the last_mem must
> be scheduled before the LoadI pack and also before the store pack.
> Therefore, we need to take the memory state of the last load for the
> LoadI pack here.
>
> To fix it, the pack adds additional checks while picking the memory state
> of the first load. When the store locates in a pack and the load pack
> relies on the last_mem, we shouldn't choose the memory state of the
> first load but choose the memory state of the last load.
>
> [1]https://github.com/openjdk/jdk/blob/0ae834105740f7cf73fe96be22e0f564ad29b18d/src/hotspot/share/opto/superword.cpp#L2380
> [2]https://github.com/openjdk/jdk/blob/0ae834105740f7cf73fe96be22e0f564ad29b18d/src/hotspot/share/opto/superword.cpp#L2232
Fei Gao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision:
- Fix the interleaving cases using as index offset and add new reduced case from JDK-8293216
Change-Id: Ia20009e262e49ef0d6096133f00acad614b4a1dc
- Merge branch 'master' into fg8290910
Change-Id: I2393a1f4f744b2ed258803c82f3198c2f2e5a8ac
- Code style change: add one space
Change-Id: I2794060ac0f9dbe006e32f202111ee08f09d96a1
- 8290910: Wrong memory state is picked in SuperWord::co_locate_pack()
After JDK-8283091, the loop below can be vectorized partially.
Statement 1 can be vectorized but statement 2 can't.
```
// int[] iArr; long[] lArrFld; int i1,i2;
for (i1 = 6; i1 < 227; i1++) {
iArr[i1] += lArrFld[i1]++; // statement 1
iArr[i1 + 1] -= (i2++); // statement 2
}
```
But we got incorrect results because the vector packs of iArr are
scheduled incorrectly like:
```
...
load_vector XMM1,[R8 + #16 + R11 << #2]
movl RDI, [R8 + #20 + R11 << #2] # int
load_vector XMM2,[R9 + #8 + R11 << #3]
subl RDI, R11 # int
vpaddq XMM3,XMM2,XMM0 ! add packedL
store_vector [R9 + #8 + R11 << #3],XMM3
vector_cast_l2x XMM2,XMM2 !
vpaddd XMM1,XMM2,XMM1 ! add packedI
addl RDI, #228 # int
movl [R8 + #20 + R11 << #2], RDI # int
movl RBX, [R8 + #24 + R11 << #2] # int
subl RBX, R11 # int
addl RBX, #227 # int
movl [R8 + #24 + R11 << #2], RBX # int
...
movl RBX, [R8 + #40 + R11 << #2] # int
subl RBX, R11 # int
addl RBX, #223 # int
movl [R8 + #40 + R11 << #2], RBX # int
movl RDI, [R8 + #44 + R11 << #2] # int
subl RDI, R11 # int
addl RDI, #222 # int
movl [R8 + #44 + R11 << #2], RDI # int
store_vector [R8 + #16 + R11 << #2],XMM1
...
```
simplified as:
```
load_vector iArr in statement 1
unvectorized loads/stores in statement 2
store_vector iArr in statement 1
```
We cannot pick the memory state from the first load for LoadI pack
here, as the LoadI vector operation must load the new values in memory
after iArr writes 'iArr[i1 + 1] - (i2++)' to 'iArr[i1 + 1]'(statement 2).
We must take the memory state of the last load where we have assigned
new values ('iArr[i1 + 1] - (i2++)') to the iArr array.
In JDK-8240281, we picked the memory state of the first load. Different
from the scenario in JDK-8240281, the store, which is dependent on an
earlier load here, is in a pack to be scheduled and the LoadI pack
depends on the last_mem. As designed[2], to schedule the StoreI pack,
all memory operations in another single pack should be moved in the same
direction. We know that the store in the pack depends on one of loads in
the LoadI pack, so the LoadI pack should be scheduled before the StoreI
pack. And the LoadI pack depends on the last_mem, so the last_mem must
be scheduled before the LoadI pack and also before the store pack.
Therefore, we need to take the memory state of the last load for the
LoadI pack here.
To fix it, the pack adds additional checks while picking the memory state
of the first load. When the store locates in a pack and the load pack
relies on the last_mem, we shouldn't choose the memory state of the
first load but choose the memory state of the last load.
[1]https://github.com/openjdk/jdk/blob/0ae834105740f7cf73fe96be22e0f564ad29b18d/src/hotspot/share/opto/superword.cpp#L2380
[2]https://github.com/openjdk/jdk/blob/0ae834105740f7cf73fe96be22e0f564ad29b18d/src/hotspot/share/opto/superword.cpp#L2232
Jira: ENTLLT-5482
Change-Id: I341d10b91957b60a1b4aff8116723e54083a5fb8
CustomizedGitHooks: yes
-------------
Changes:
- all: https://git.openjdk.org/jdk/pull/9898/files
- new: https://git.openjdk.org/jdk/pull/9898/files/01d64113..c733f039
Webrevs:
- full: https://webrevs.openjdk.org/?repo=jdk&pr=9898&range=02
- incr: https://webrevs.openjdk.org/?repo=jdk&pr=9898&range=01-02
Stats: 89280 lines in 1321 files changed: 36710 ins; 41329 del; 11241 mod
Patch: https://git.openjdk.org/jdk/pull/9898.diff
Fetch: git fetch https://git.openjdk.org/jdk pull/9898/head:pull/9898
PR: https://git.openjdk.org/jdk/pull/9898
More information about the hotspot-compiler-dev
mailing list