RFR: 8318446: C2: implement StoreNode::Ideal_merge_stores

Tue Jan 16 15:13:35 UTC 2024

On Wed, 25 Oct 2023 14:59:07 GMT, Quan Anh Mai <qamai at openjdk.org> wrote:

>> @merykitty do you have examples for both? Maybe stores to fields already works. Merging loads and stores may be out of scope. That sounds a little much like SLP. We can still try to do that in a future RFE. We could even try to use (masked) vector instructions.
>
> @eme64 I have tried your patch, it seems that there are some limitations:
> 
> - The stores are not merged if the order is not right (e.g `a[2] = 2; a[1] = 1;`)
> - The stores are not merged if they are floating point constants.
> - The stores are not merged if they are consecutive fields in an object. E.g:
> 
> 
>     class Point {
>         int x; int y;
>     }
> 
>     p.x = 1;
>     p.y = 2; // Cannot merge into mov [p.x], 0x200000001
> 
> 
> Regarding the final point, fields may be of different types with different sizes and there may be padding between them. This means that for load-store sequence merges, I think SLP cannot handle these cases.
> 
> Thanks.

@merykitty I just looked at this project again today.

About the limitations: Yes, this is deliberately limited for now. We could make it much smarter and create a kind of straight-line SLP algorithm that could even allow for different element sizes and padding in between (using masked loads/stores). Maybe that would be worth attempting.

For now, this is just to satisfy the limited requirements of library folks who do not want to see everybody using Unsafe to merge stores.
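
To illustrate the kind of pattern this is aimed at, here is a minimal sketch (class and method names are hypothetical, not taken from the PR): four adjacent byte stores that together write one int in little-endian order, compared against a single wide store through a byte-array view VarHandle. It only shows that the two forms leave the array in the same state; it does not show what C2 actually emits.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical illustration class; not part of the PR.
final class MergeStoresSketch {
    // Views a byte[] as little-endian ints, so one set() writes 4 bytes.
    private static final VarHandle INT_LE =
        MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.LITTLE_ENDIAN);

    // Four adjacent byte stores in ascending address order.
    static void storeBytewise(byte[] a, int offset, int v) {
        a[offset + 0] = (byte) v;
        a[offset + 1] = (byte) (v >> 8);
        a[offset + 2] = (byte) (v >> 16);
        a[offset + 3] = (byte) (v >> 24);
    }

    // The same effect expressed as one 4-byte store.
    static void storeWide(byte[] a, int offset, int v) {
        INT_LE.set(a, offset, v);
    }

    // Returns true if both variants produce identical array contents.
    static boolean sameResult(int v) {
        byte[] x = new byte[8];
        byte[] y = new byte[8];
        storeBytewise(x, 2, v);
        storeWide(y, 2, v);
        return Arrays.equals(x, y);
    }

    public static void main(String[] args) {
        System.out.println(sameResult(0x12345678));
    }
}
```

The bytewise form is what one would write without Unsafe or a VarHandle; the wide form is the single store one would hope the compiler produces from it.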

About field stores: I see that stores to different fields apparently are not in a chain, but rather independent:

    static void test3(Point p) {
        p.x = 1;
        p.y = 2;
    }

    40  StoreI  === 28 7 39 21  [[ 16 ]]  @Test$Point+12 *, name=x, idx=4;  Memory: @Test$Point+12 *, name=x, idx=4; !jvms: Test::test3 @ bci:2 (line 36)
    44  StoreI  === 28 7 43 41  [[ 16 ]]  @Test$Point+16 *, name=y, idx=5;  Memory: @Test$Point+16 *, name=y, idx=5; !jvms: Test::test3 @ bci:7 (line 37)

I should be able to allow for that quite easily: the stores can either be in a chain or have the same memory state as input.
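
For reference, the merged constant from the quoted comment (`0x200000001`) can be reproduced with a small sketch. It assumes a little-endian layout with `x` directly followed by `y`, as the `+12` / `+16` offsets in the dump suggest; the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration class; not part of the PR.
final class PackedFieldStoreSketch {
    // Combines two adjacent 32-bit values into one 64-bit value,
    // with lowField at the lower address (little-endian assumption).
    static long pack(int lowField, int highField) {
        return ((long) highField << 32) | (lowField & 0xFFFF_FFFFL);
    }

    // Writes the packed value as one 8-byte little-endian store and
    // reads back the 4-byte value at the given byte offset (0 or 4).
    static int readBack(long packed, int byteOffset) {
        ByteBuffer buf = ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN);
        buf.putLong(0, packed);
        return buf.getInt(byteOffset);
    }

    public static void main(String[] args) {
        long packed = pack(1, 2);                       // p.x = 1; p.y = 2
        System.out.println(Long.toHexString(packed));   // prints "200000001"
        System.out.println(readBack(packed, 0));        // prints "1" (x)
        System.out.println(readBack(packed, 4));        // prints "2" (y)
    }
}
```

On a big-endian platform the two halves would have to be swapped, so any such merge has to take byte order into account.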

@merykitty @cl4es @RogerRiggs @vnkozlov I wonder if you think that the approach of this PR is good, and if you have any suggestions about it?

- Is a separate phase ok?
- Is this PR in a sweet spot that reaches the goals of the library folks, but is not too complex?
- Would you prefer a more general solution, like a straight-line SLP algorithm, that can merge (even vectorize) any load / store sequences, even merge accesses with different element sizes and with gaps/padding?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16245#issuecomment-1893927494
PR Comment: https://git.openjdk.org/jdk/pull/16245#issuecomment-1893940205