RFR: 8290910: Wrong memory state is picked in SuperWord::co_locate_pack() [v2]
Vladimir Kozlov
kvn at openjdk.org
Tue Aug 30 03:36:05 UTC 2022
On Wed, 17 Aug 2022 08:17:26 GMT, Fei Gao <fgao at openjdk.org> wrote:
>> After [JDK-8283091](https://bugs.openjdk.org/browse/JDK-8283091), the loop below can be vectorized partially.
>> Statement 1 can be vectorized but statement 2 can't.
>>
>> // int[] iArr; long[] lArrFld; int i1,i2;
>> for (i1 = 6; i1 < 227; i1++) {
>> iArr[i1] += lArrFld[i1]++; // statement 1
>> iArr[i1 + 1] -= (i2++); // statement 2
>> }
>>
>>
>> But we got incorrect results because the vector packs of iArr are
>> scheduled incorrectly like:
>>
>> ...
>> load_vector XMM1,[R8 + #16 + R11 << #2]
>> movl RDI, [R8 + #20 + R11 << #2] # int
>> load_vector XMM2,[R9 + #8 + R11 << #3]
>> subl RDI, R11 # int
>> vpaddq XMM3,XMM2,XMM0 ! add packedL
>> store_vector [R9 + #8 + R11 << #3],XMM3
>> vector_cast_l2x XMM2,XMM2 !
>> vpaddd XMM1,XMM2,XMM1 ! add packedI
>> addl RDI, #228 # int
>> movl [R8 + #20 + R11 << #2], RDI # int
>> movl RBX, [R8 + #24 + R11 << #2] # int
>> subl RBX, R11 # int
>> addl RBX, #227 # int
>> movl [R8 + #24 + R11 << #2], RBX # int
>> ...
>> movl RBX, [R8 + #40 + R11 << #2] # int
>> subl RBX, R11 # int
>> addl RBX, #223 # int
>> movl [R8 + #40 + R11 << #2], RBX # int
>> movl RDI, [R8 + #44 + R11 << #2] # int
>> subl RDI, R11 # int
>> addl RDI, #222 # int
>> movl [R8 + #44 + R11 << #2], RDI # int
>> store_vector [R8 + #16 + R11 << #2],XMM1
>> ...
>>
>> simplified as:
>>
>> load_vector iArr in statement 1
>> unvectorized loads/stores in statement 2
>> store_vector iArr in statement 1
>>
>> We cannot pick the memory state from the first load for LoadI pack
>> here, as the LoadI vector operation must load the new values in memory
>> after iArr writes `iArr[i1 + 1] - (i2++)` to `iArr[i1 + 1]`(statement 2).
>> We must take the memory state of the last load where we have assigned
>> new values `iArr[i1 + 1] - (i2++)` to the iArr array.
>>
>> In [JDK-8240281](https://bugs.openjdk.org/browse/JDK-8240281), we picked the memory state of the first load[1]. Different
>> from the scenario in [JDK-8240281](https://bugs.openjdk.org/browse/JDK-8240281), the store, which is dependent on an
>> earlier load here, is in a pack to be scheduled and the LoadI pack
>> depends on the last_mem. As designed[2], to schedule the StoreI pack,
>> all memory operations in another single pack should be moved in the same
>> direction. We know that the store in the pack depends on one of loads in
>> the LoadI pack, so the LoadI pack should be scheduled before the StoreI
>> pack. And the LoadI pack depends on the last_mem, so the last_mem must
>> be scheduled before the LoadI pack and also before the store pack.
>> Therefore, we need to take the memory state of the last load for the
>> LoadI pack here.
>>
>> To fix it, the pack adds additional checks while picking the memory state
>> of the first load. When the store locates in a pack and the load pack
>> relies on the last_mem, we shouldn't choose the memory state of the
>> first load but choose the memory state of the last load.
>>
>> [1]https://github.com/openjdk/jdk/blob/0ae834105740f7cf73fe96be22e0f564ad29b18d/src/hotspot/share/opto/superword.cpp#L2380
>> [2]https://github.com/openjdk/jdk/blob/0ae834105740f7cf73fe96be22e0f564ad29b18d/src/hotspot/share/opto/superword.cpp#L2232
>
> Fei Gao has updated the pull request incrementally with one additional commit since the last revision:
>
> Code style change: add one space
>
> Change-Id: I2794060ac0f9dbe006e32f202111ee08f09d96a1
Can you show assembler after this fix?
Would be interest to see results for other interleaving cases:
a[i-1] += <expr1>; // similar to your case
a[i] += <expr2>;
a[i+1] += <expr1>;
a[i] += <expr2>;
a[i] += <expr1>;
a[i-1] += <expr2>;
Also when `+2` is used instead of `+1`. Or `+<pack_size>`.
-------------
PR: https://git.openjdk.org/jdk/pull/9898
More information about the hotspot-compiler-dev
mailing list