RFR: 8290910: Wrong memory state is picked in SuperWord::co_locate_pack() [v2]

Vladimir Kozlov kvn at openjdk.org
Tue Aug 30 03:36:05 UTC 2022


On Wed, 17 Aug 2022 08:17:26 GMT, Fei Gao <fgao at openjdk.org> wrote:

>> After [JDK-8283091](https://bugs.openjdk.org/browse/JDK-8283091), the loop below can be vectorized partially.
>> Statement 1 can be vectorized but statement 2 can't.
>> 
>> // int[] iArr; long[] lArrFld; int i1,i2;
>> for (i1 = 6; i1 < 227; i1++) {
>>   iArr[i1] += lArrFld[i1]++; // statement 1
>>   iArr[i1 + 1] -= (i2++); // statement 2
>> }
>> 
>> 
>> But we got incorrect results because the vector packs of iArr are
>> scheduled incorrectly like:
>> 
>> ...
>> load_vector XMM1,[R8 + #16 + R11 << #2]
>> movl    RDI, [R8 + #20 + R11 << #2] # int
>> load_vector XMM2,[R9 + #8 + R11 << #3]
>> subl    RDI, R11    # int
>> vpaddq  XMM3,XMM2,XMM0  ! add packedL
>> store_vector [R9 + #8 + R11 << #3],XMM3
>> vector_cast_l2x  XMM2,XMM2  !
>> vpaddd  XMM1,XMM2,XMM1  ! add packedI
>> addl    RDI, #228   # int
>> movl    [R8 + #20 + R11 << #2], RDI # int
>> movl    RBX, [R8 + #24 + R11 << #2] # int
>> subl    RBX, R11    # int
>> addl    RBX, #227   # int
>> movl    [R8 + #24 + R11 << #2], RBX # int
>> ...
>> movl    RBX, [R8 + #40 + R11 << #2] # int
>> subl    RBX, R11    # int
>> addl    RBX, #223   # int
>> movl    [R8 + #40 + R11 << #2], RBX # int
>> movl    RDI, [R8 + #44 + R11 << #2] # int
>> subl    RDI, R11    # int
>> addl    RDI, #222   # int
>> movl    [R8 + #44 + R11 << #2], RDI # int
>> store_vector [R8 + #16 + R11 << #2],XMM1
>> ...
>> 
>> simplified as:
>> 
>> load_vector iArr in statement 1
>> unvectorized loads/stores in statement 2
>> store_vector iArr in statement 1
>> 
>> We cannot pick the memory state from the first load for LoadI pack
>> here, as the LoadI vector operation must load the new values in memory
>> after iArr writes `iArr[i1 + 1] - (i2++)` to `iArr[i1 + 1]`(statement 2).
>> We must take the memory state of the last load where we have assigned
>> new values `iArr[i1 + 1] - (i2++)` to the iArr array.
>> 
>> In [JDK-8240281](https://bugs.openjdk.org/browse/JDK-8240281), we picked the memory state of the first load[1]. Different
>> from the scenario in [JDK-8240281](https://bugs.openjdk.org/browse/JDK-8240281), the store, which is dependent on an
>> earlier load here, is in a pack to be scheduled and the LoadI pack
>> depends on the last_mem. As designed[2], to schedule the StoreI pack,
>> all memory operations in another single pack should be moved in the same
>> direction. We know that the store in the pack depends on one of loads in
>> the LoadI pack, so the LoadI pack should be scheduled before the StoreI
>> pack. And the LoadI pack depends on the last_mem, so the last_mem must
>> be scheduled before the LoadI pack and also before the store pack.
>> Therefore, we need to take the memory state of the last load for the
>> LoadI pack here.
>> 
>> To fix it, the pack adds additional checks while picking the memory state
>> of the first load. When the store locates in a pack and the load pack
>> relies on the last_mem, we shouldn't choose the memory state of the
>> first load but choose the memory state of the last load.
>> 
>> [1]https://github.com/openjdk/jdk/blob/0ae834105740f7cf73fe96be22e0f564ad29b18d/src/hotspot/share/opto/superword.cpp#L2380
>> [2]https://github.com/openjdk/jdk/blob/0ae834105740f7cf73fe96be22e0f564ad29b18d/src/hotspot/share/opto/superword.cpp#L2232
>
> Fei Gao has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Code style change: add one space
>   
>   Change-Id: I2794060ac0f9dbe006e32f202111ee08f09d96a1

Can you show assembler after this fix?

Would be interest to see results for other interleaving cases:

    a[i-1] += <expr1>; // similar to your case
    a[i] += <expr2>;


    a[i+1] += <expr1>;
    a[i] += <expr2>;


    a[i] += <expr1>;
    a[i-1] += <expr2>;


Also when `+2` is used instead of `+1`.  Or `+<pack_size>`.

-------------

PR: https://git.openjdk.org/jdk/pull/9898


More information about the hotspot-compiler-dev mailing list