[External] : Re: RFC : Approach to handle Allocation Merges in C2 Scalar Replacement

Thu Mar 31 00:26:14 UTC 2022

Actually it is more complicated. May be this approach is incorrect.

Originally it was assumed that we can access directly to p0 and p1 after merge. C2 does not allow it - such access is 
not dominated by its definition. p0.x is illegal in C2 after merge point because we don't know which path is taken 
before merge and p0 could be not defined. Yes, we guard such access with (s == id_base0) check but it does not help 
domination checks in C2.

if (…)   p0 = …
else     p1 = …

s = # selector phi #

if (s == id_base0)      x0 = p0.x
elseif (s == id_base1)  x1 = p1.x

x = phi(x0, x1)

May be instead of placing p0|1.x loads after merge with check you can do split "through phi" transformation.
You do exactly what you do today but placing load clones on corresponding input control edges of merge.
You should know (I think) which path correspond to which id_base so you don't need id_base check.
And replace old phi and load with new phi which merges these new loads.

if (…)   { p0 = …; x0 = p0.x; }
else     { p1 = …; x1 = p1.x; }

x = phi(x0, x1)

Thanks,
Vladimir K

On 3/30/22 4:54 PM, Vladimir Kozlov wrote:
> I was not able to reproduce the issue with my test and your patch.
> 
> Also try to comment out/remove next your new lines to keep orig_load_mem:
> 
> +  // Create unique input for "memory" of Load
> +  Node* orig_load_mem   = original_load->in(LoadNode::Memory);
> +  new_load->replace_edge(orig_load_mem, orig_load_mem->is_Phi() ? orig_load_mem->in(idx) : orig_load_mem);
> 
> 
> Thanks,
> Vladimir K
> 
> On 3/30/22 2:32 PM, Vladimir Kozlov wrote:
>> On 3/30/22 2:28 PM, Vladimir Kozlov wrote:
>>> Hi Cesar,
>>>
>>> Can you also send me your test file TestSimpleMerge.java. I need to reproduce your failure.
>>>
>>> Currently I see wrong memeory edge in LoadI [231] node because it points to memory [122] on path which is not fully 
>>> dominating (on code branch).
>>
>> The memory should be this Phi:
>>
>>   182  Phi  ===  105  122  43  [[ 18 ]]  #memory  Memory: @TestSimpleMerge$Point+12 *, name=x, idx=6; !jvms: 
>> TestSimpleMerge::test @ bci:22 (line 14)
>>
>> May be we got some partial split through Phi transformation which is causing the issue.
>>
>>>
>>> Thanks,
>>> Vladimir K
>>>
>>> [1]
>>>
>>>   41  Initialize  ===  33  1  44  1  1  40  90  [[ 42  43 ]]  !jvms: TestSimpleMerge::test @ bci:0 (line 9)
>>>   43  Proj  ===  41  [[ 133  132  131  123  179  180  181  182  53  53  53  53  233 ]] #2  Memory: @rawptr:BotPTR, 
>>> idx=Raw; !jvms: TestSimpleMerge::test @ bci:0 (line 9)
>>>   101  IfFalse  ===  99  [[ 105 ]] #0 !jvms: TestSimpleMerge::test @ bci:10 (line 11)
>>>
>>>   120  Initialize  ===  112  1  123  1  1  119  171  [[ 121  122 ]]  !jvms: TestSimpleMerge::test @ bci:13 (line 12)
>>>   122  Proj  ===  120  [[ 182  181  180  179  231 ]] #2  Memory: @rawptr:BotPTR, idx=Raw; !jvms: 
>>> TestSimpleMerge::test @ bci:13 (line 12)
>>>   121  Proj  ===  120  [[ 105  124 ]] #0 !jvms: TestSimpleMerge::test @ bci:13 (line 12)
>>>
>>>   221  ConI  ===  0  [[ 220  225 ]]  #int:0
>>>   222  ConI  ===  0  [[ 220 ]]  #int:1
>>>
>>>   105  Region  ===  105  121  101  [[ 105  220  178  179  180  181  182  227 ]]  !jvms: TestSimpleMerge::test @ 
>>> bci:22 (line 14)
>>>
>>>   220  Phi  ===  105  221  222  [[ 225 ]]  #int
>>>   225  CmpI  === _  220  221  [[ 226 ]]
>>>   226  Bool  === _  225  [[ 227 ]] [eq]
>>>   227  If  ===  105  226  [[ 228  229 ]] P=0.000000, C=0.000000
>>>
>>>   229  IfFalse  ===  227  [[ 233  223 ]] #0
>>>   233  LoadI  ===  229  43  232  [[ 224 ]]  @TestSimpleMerge$Point+12 *, name=x, idx=6; #int !orig=[207] !jvms: 
>>> TestSimpleMerge$Point::calc @ bci:1 (line 5) TestSimpleMerge::test @ bci:23 (line 14)
>>>
>>>   228  IfTrue  ===  227  [[ 231  223 ]] #1
>>>   231  LoadI  ===  228  122  230  [[ 224 ]]  @TestSimpleMerge$Point+12 *, name=x, idx=6; #int !orig=[207] !jvms: 
>>> TestSimpleMerge$Point::calc @ bci:1 (line 5) TestSimpleMerge::test @ bci:23 (line 14)
>>>
>>>   223  Region  ===  223  228  229  [[ 223  224  216 ]]
>>>   224  Phi  ===  223  231  233  [[ 216 ]]  #int
>>>   216  Return  ===  223  178  18  8  9 returns 224  [[ 0 ]]
>>>
>>> On 3/29/22 5:34 PM, Cesar Soares Lucas wrote:
>>>> Hi, Vladimir.
>>>>
>>>> Thank you for getting back to me. What you describe is exactly what I understand and what I tried to implement. I 
>>>> also noticed the issue with the nodes floating up and I even tried to prevent that.. but I think the root of the 
>>>> problem is somewhere else.
>>>>
>>>> Please take a look at this printout: https://www.dropbox.com/s/ao30zadovwqswlp/splitted.txt?dl=0 
>>>> <https://urldefense.com/v3/__https://www.dropbox.com/s/ao30zadovwqswlp/splitted.txt?dl=0__;!!ACWV5N9M2RV99hQ!ZJvmUIbT0x_YJJPA3ap6R-qkOaEsH3YOGwu47flogbZamqV6nWVO2Kfx8uUYnVTddI45Zg$> 
>>>> . C2 tells me that this graph has bad dominance: “Use 228 isn’t dominated by def 231”. This error message is result 
>>>> of this verification: https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/loopnode.cpp#L5361 
>>>> <https://urldefense.com/v3/__https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/loopnode.cpp*L5361__;Iw!!ACWV5N9M2RV99hQ!ZJvmUIbT0x_YJJPA3ap6R-qkOaEsH3YOGwu47flogbZamqV6nWVO2Kfx8uUYnVSGiICGJw$> 
>>>>
>>>>
>>>> I agree and disagree with the message. I’m not sure if it’s my limited knowledge of C2 IR or some wording problem in 
>>>> the message: why would 228 need to be dominated by 231 - after all 228 doesn’t use 231? The opposite makes more 
>>>> sense to me: 228 needs to dominate 231 – and that is true AFAICT. Nonetheless, I agree with the message that there 
>>>> is something wrong in the graph: this graph allows for execution to reach node 231 without node 124 being executed!
>>>>
>>>> I created this DRAFT PR so that you can look at the code if you want: https://github.com/openjdk/jdk/pull/8023 
>>>> <https://urldefense.com/v3/__https://github.com/openjdk/jdk/pull/8023__;!!ACWV5N9M2RV99hQ!ZJvmUIbT0x_YJJPA3ap6R-qkOaEsH3YOGwu47flogbZamqV6nWVO2Kfx8uUYnVQl2jEXlA$> 
>>>> .
>>>>
>>>> Please let me know what you think. TIA!
>>>>
>>>> Cesar
>>>>
>>>> *From: *Vladimir Kozlov <vladimir.kozlov at oracle.com>
>>>> *Date: *Monday, March 28, 2022 at 10:48 AM
>>>> *To: *Cesar Soares Lucas <Divino.Cesar at microsoft.com>, Vladimir Ivanov <vladimir.x.ivanov at oracle.com>, hotspot 
>>>> compiler <hotspot-compiler-dev at openjdk.java.net>
>>>> *Cc: *Brian Stafford <Brian.Stafford at microsoft.com>, Martijn Verburg <Martijn.Verburg at microsoft.com>
>>>> *Subject: *Re: [External] : Re: RFC : Approach to handle Allocation Merges in C2 Scalar Replacement
>>>>
>>>> Hi Cesar
>>>>
>>>> On 3/17/22 12:15 PM, Cesar Soares Lucas wrote:
>>>>> Hi all,
>>>>>
>>>>> I’ve been experimenting with the idea explained by Vladimir Ivanov and I was able to implement it and have a few
>>>>> non-trivial merges untangled and scalar replaced. However, I found an obstacle on the road. I was able to overcome it,
>>>>> but I’d very much appreciate your opinion about the matter. Assume we start off with the piece of code below.
>>>>>
>>>>> if (…)     p0 = …
>>>>>
>>>>> else       p1 = …
>>>>>
>>>>> p = phi(p0, p1)
>>>>>
>>>>> return p.x
>>>>>
>>>>> The idea proposed suggests, AFAIU, that we change the code to be something like the below **where we use the bases
>>>>> directly**.
>>>>>
>>>>> if (…)   p0 = …
>>>>>
>>>>> else     p1 = …
>>>>>
>>>>> s = # selector phi #
>>>>>
>>>>> if (s == id_base0)           x0 = p0.x
>>>>>
>>>>> elseif (s == id_base1)    x1 = p1.x
>>>>>
>>>>> else                                  halt()
>>>>>
>>>>> x = phi(x0, x1)
>>>>>
>>>>> return x
>>>>>
>>>>> The problem here is that the definition of “p0” and/or “p1” doesn’t dominate their use in the loads “x0 = p0.x” / 
>>>>> “x1 =
>>>>> p1.x”, therefore certain IR verification steps in C2 fail. I was able to work around the issue by adding Phi nodes 
>>>>> where
>>>>> all inputs are the same (excluding the ctrl), thus creating a “fake” definition for the value. Please see code below.
>>>>
>>>> You have original `p = phi(p0, p1)` which is attached to Region node which merges paths with p0 and p1. `Selector ID
>>>> Phi(id_p0, id_p1)` should be attached to the same Region and guarantee that p0 and p1 are defined at that point. The
>>>> only issue could be is that load nodes don't have control edges and depend on memory edges to schedule. So they could
>>>> float up. I think they should be attached (control edge) to id checks (s == id_base*) to prevent them move up as new
>>>> special case [1]. At least as first approach for experiment.
>>>>
>>>> Thanks,
>>>> Vladimir K
>>>>
>>>> [1] 
>>>> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fopenjdk%2Fjdk%2Fblob%2Fmaster%2Fsrc%2Fhotspot%2Fshare%2Fopto%2Fmemnode.cpp%23L1737&data=04%7C01%7CDivino.Cesar%40microsoft.com%7Cd6152056b06744f7359e08da10e33a44%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637840865379084345%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=HaPBCqckab%2BpOcIoYEccL04u1o1pqqaUjsYMCslfP7Y%3D&reserved=0 
>>>> <https://urldefense.com/v3/__https://nam06.safelinks.protection.outlook.com/?url=https*3A*2F*2Fgithub.com*2Fopenjdk*2Fjdk*2Fblob*2Fmaster*2Fsrc*2Fhotspot*2Fshare*2Fopto*2Fmemnode.cpp*23L1737&data=04*7C01*7CDivino.Cesar*40microsoft.com*7Cd6152056b06744f7359e08da10e33a44*7C72f988bf86f141af91ab2d7cd011db47*7C1*7C0*7C637840865379084345*7CUnknown*7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0*3D*7C3000&sdata=HaPBCqckab*2BpOcIoYEccL04u1o1pqqaUjsYMCslfP7Y*3D&reserved=0__;JSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUl!!ACWV5N9M2RV99hQ!ZJvmUIbT0x_YJJPA3ap6R-qkOaEsH3YOGwu47flogbZamqV6nWVO2Kfx8uUYnVTydJtLxg$> 
>>>>
>>>>
>>>>>
>>>>> if (…)   p0 = …
>>>>>
>>>>> else     p1 = …
>>>>>
>>>>> phi0 = phi(p0, p0)
>>>>>
>>>>> phi1 = Phi(p1, p1)
>>>>>
>>>>> s = # selector phi #
>>>>>
>>>>> if (s == id_base0)           x0 = phi0.x
>>>>>
>>>>> elseif (s == id_base1)    x1 = phi1.x
>>>>>
>>>>> else                                  halt()
>>>>>
>>>>> x = phi(x0, x1)
>>>>>
>>>>> return x
>>>>>
>>>>> Transforming the code to look like the above silenced the verifications ( :-] ) and at the end EA+SR was able to 
>>>>> remove
>>>>> the allocations **and get rid of all the Phi’s**.
>>>>>
>>>>> So, my question is: is this workaround acceptable? Do you have a suggestion of a better solution?
>>>>>
>>>>> TIA, Cesar
>>>>>
>>>>> *From: *hotspot-compiler-dev <hotspot-compiler-dev-retn at openjdk.java.net> on behalf of Cesar Soares Lucas
>>>>> <Divino.Cesar at microsoft.com>
>>>>> *Date: *Tuesday, February 22, 2022 at 10:01 AM
>>>>> *To: *Vladimir Ivanov <vladimir.x.ivanov at oracle.com>, Vladimir Kozlov <vladimir.kozlov at oracle.com>, hotspot compiler
>>>>> <hotspot-compiler-dev at openjdk.java.net>
>>>>> *Cc: *Brian Stafford <Brian.Stafford at microsoft.com>, Martijn Verburg <Martijn.Verburg at microsoft.com>
>>>>> *Subject: *Re: [External] : Re: RFC : Approach to handle Allocation Merges in C2 Scalar Replacement
>>>>>
>>>>> Hi Vladimir,
>>>>>
>>>>> I think I've enough feedback to start working on a prototype and see how things play out. Thank you for sharing your
>>>>> ideas/expertise!
>>>>>
>>>>>
>>>>> Cheers,
>>>>> Cesar
>>>>>
>>>>> ________________________________________
>>>>> From: Vladimir Ivanov <vladimir.x.ivanov at oracle.com>
>>>>> Sent: February 21, 2022 10:26 AM
>>>>> To: Cesar Soares Lucas; Vladimir Kozlov; hotspot compiler
>>>>> Cc: Brian Stafford; Martijn Verburg
>>>>> Subject: Re: [External] : Re: RFC : Approach to handle Allocation Merges in C2 Scalar Replacement
>>>>>
>>>>> Hi Cesar,
>>>>>
>>>>>> I with the selector-based base splitting idea using some examples and I think I understand the mechanics well. 
>>>>>> It's conceptually simple and quite effective, I like it! I've another question, though. From your previous message 
>>>>>> I had understood that we needed the clustering because split_unique_types can't handle multiple bases at once, 
>>>>>> which makes sense to me. _This time I
>>>>> don't understand why we need to do the clustering before doing the selector-based transformation_. AFAIU this
>>>>> transformation won't be creating any memory slices but rather just assigning IDs to different bases and then creating
>>>>> specialized memory operations conditionally using the different bases.
>>>>>
>>>>> Yes, that's a valid simplification. It does sound like introducing
>>>>> selector ID doesn't require a separate memory graph (and can be
>>>>> performed before split_unique_types()), but I don't have a good
>>>>> understanding how complicated simplifying selector-based IR shapes would
>>>>> be compared to original (memory op-based) ones. It's definitely worth
>>>>> experimenting with.
>>>>>
>>>>> Best regards,
>>>>> Vladimir Ivanov
>>>>>
>>>>>> ________________________________________
>>>>>> From: Vladimir Ivanov <vladimir.x.ivanov at oracle.com>
>>>>>> Sent: February 17, 2022 7:37 AM
>>>>>> To: Vladimir Kozlov; Cesar Soares Lucas; hotspot compiler
>>>>>> Cc: Brian Stafford; Martijn Verburg
>>>>>> Subject: Re: RFC : Approach to handle Allocation Merges in C2 Scalar Replacement
>>>>>>
>>>>>>
>>>>>>> But depending on the actual shape of the base pointer (here it's
>>>>>>> Phi(obj1,obj2)), it may become infeasible (from performance perspective)
>>>>>>> or even impossible (e.g., for complex loop variant conditions) to
>>>>>>> separate instance slices.
>>>>>>
>>>>>> I'd like to correct myself. After thinking more about the problem, I see
>>>>>> a universal way to scalarize clusters of interfering non-escaping
>>>>>> allocations even in presence of accesses with indeterminate base.
>>>>>>
>>>>>> The crucial property for such cluster of interacting non-escaping
>>>>>> allocations is for any memory access its base is either part of the
>>>>>> cluster or not (it's a static property for every access). Then, when it
>>>>>> is part of the cluster, then the number of possible base values at
>>>>>> runtime is finite and is a subset of the cluster.
>>>>>>
>>>>>> So, memory graph for the cluster can be scalarized as follows:
>>>>>>
>>>>>>      (1) separate memory graph for the cluster
>>>>>>
>>>>>>      (2) on the newly constructed graph:
>>>>>>
>>>>>>         (a) replace every base pointer with an ID ("selector id") and
>>>>>> recreate data graph for selector IDs from base pointers graph;
>>>>>>
>>>>>>         (b) for accesses with indeterminate base pointer, replace them
>>>>>> with a merge of states from relevant allocations selected by "selector id"
>>>>>>
>>>>>>         (c) after 2b, all memory accesses from the cluster memory graph
>>>>>> should have fixed base pointing at a particular allocation
>>>>>>
>>>>>>      (3) scalarize individual allocations from the cluster one by one
>>>>>> (since they are independent now)
>>>>>>
>>>>>>         - additional transformation (like, split-through-phi) should
>>>>>> simplify the graph by reducing the number of cases at merge points to
>>>>>> care about (ideally, leaving only a single one);
>>>>>>
>>>>>>
>>>>>>
>>>>>> As an illustration:
>>>>>>
>>>>>>      val = Load (AddP (Phi R base1 base2) offset) mem # load A.i
>>>>>>
>>>>>> ==(pseudo-code)==>
>>>>>>
>>>>>>      selector = (Phi R #base1 #base2)
>>>>>>
>>>>>>      if (selector == #base1) {
>>>>>>        val1 = Load (base1 + offset) mem
>>>>>>      } else if (selector == #base2) {
>>>>>>        val2 = Load (base2 + offset) mem
>>>>>>      } else {
>>>>>>        halt(); // impossible
>>>>>>      }
>>>>>>
>>>>>>      val = Phi(R1, val1, val2)
>>>>>>
>>>>>> ======
>>>>>>
>>>>>>      Store (AddP (Phi R base1 base2) offset) mem v # store A.i v
>>>>>>
>>>>>> ==(pseudo-code)==>
>>>>>>
>>>>>>      selector = (Phi R #base1 #base2)
>>>>>>
>>>>>>      if (selector == #base1) {
>>>>>>        Store base1 mem v ==> mem1
>>>>>>      } else if (selector == #base2) {
>>>>>>        Store base2 mem v ==> mem2
>>>>>>      } else {
>>>>>>        halt(); // impossible
>>>>>>      }
>>>>>>
>>>>>>      mem = Phi(R1, mem1, mem2);
>>>>>>
>>>>>>
>>>>>> Best regards,
>>>>>> Vladimir Ivanov
>>>>>>
>>>>>>>
>>>>>>> So, gradual approach seems the way to go:
>>>>>>>      (1) split memory graph for clusters of interacting non-escaping
>>>>>>> allocations;
>>>>>>>
>>>>>>>      (2) perform adhoc transformation targeted at untangling aliasing
>>>>>>> accesses (e.g, split-through-phi);
>>>>>>>
>>>>>>>      (3) extract unique instance types where possible, thus making the
>>>>>>> corresponding allocation scalar replaceable
>>>>>>>
>>>>>>> =====================================================================
>>>>>>>
>>>>>>> Also, one observation: interacting allocations don't have to be of the
>>>>>>> same class.
>>>>>>>
>>>>>>>        static int testPolluted(int r, boolean b) {
>>>>>>>            A obj1 = new B1(r);
>>>>>>>            A obj2 = new B2(r);
>>>>>>>
>>>>>>>            A obj = (b ? obj1 : obj2);
>>>>>>>            for (int i = 1; i < r; i++) {
>>>>>>>                obj.i++;
>>>>>>>            }
>>>>>>>
>>>>>>>            return obj1.i + obj2.i;
>>>>>>>        }
>>>>>>>
>>>>>>> Requiring unique instance types to perform SR (and not enhancing SR to
>>>>>>> handle aliasing allocations case) helps avoid some challenges in
>>>>>>> instance rematerialization logic at safepoints, because actual shape of
>>>>>>> the scalarized object (its class and exact set of fields) becomes a
>>>>>>> runtime property.
>>>>>>>
>>>>>>> Best regards,
>>>>>>> Vladimir Ivanov
>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Otherwise, I believe wide memory states/merges would pose serious
>>>>>>>>>> problems. E.g., when a load is split through a phi, you need to pick
>>>>>>>>>> correct memory states for the new loads above the phi which quickly
>>>>>>>>>> becomes quite a challenging task.
>>>>>>>>>
>>>>>>>>> Especially in loops. Actually I did tried such approach before and I
>>>>>>>>> agree that it is hard.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Vladimir K
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Best regards,
>>>>>>>>>> Vladimir Ivanov
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> public int ex1(boolean cond, int first, int second) {
>>>>>>>>>>>        p0 = Allocate(...);
>>>>>>>>>>>        ...
>>>>>>>>>>>        p0.x = first;
>>>>>>>>>>>        p0.y = second;
>>>>>>>>>>>
>>>>>>>>>>>        if (cond) {
>>>>>>>>>>>            p1 = Allocate(...);
>>>>>>>>>>>            ...
>>>>>>>>>>>            p1.x = second;
>>>>>>>>>>>            p1.y = first;
>>>>>>>>>>>        }
>>>>>>>>>>>
>>>>>>>>>>>        p = phi(p0, p1) // unused and will be removed
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>        return phi(p0.x,p1.x) - phi(p0.y, p1.y);
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Vladimir K
>>>>>>>>>>>
>>>>>>>>>>> On 2/10/22 11:29 AM, Vladimir Ivanov wrote:
>>>>>>>>>>>> (BCCing hotspot-dev and moving the discussion to
>>>>>>>>>>>> hotspot-compiler-dev.)
>>>>>>>>>>>>
>>>>>>>>>>>> Hi Cesar,
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks for looking into enhancing EA.
>>>>>>>>>>>>
>>>>>>>>>>>> Overall, the proposal looks reasonable.
>>>>>>>>>>>>
>>>>>>>>>>>> I suggest to look more closely at split_unique_types().
>>>>>>>>>>>> It introduces a dedicated class of alias categories for fields of
>>>>>>>>>>>> the allocation being eliminated and clones memory graph. I don't
>>>>>>>>>>>> see why it shouldn't work for multiple allocations.
>>>>>>>>>>>>
>>>>>>>>>>>> Moreover, split_unique_types() will break if you start optimizing
>>>>>>>>>>>> multiple allocations at once. The notion of unique alias should be
>>>>>>>>>>>> adjusted and cover the union of unique aliases for all interacting
>>>>>>>>>>>> allocations.
>>>>>>>>>>>>
>>>>>>>>>>>> Seems like you need to enhance SR to work on non-intersecting
>>>>>>>>>>>> clusters of allocations.
>>>>>>>>>>>>
>>>>>>>>>>>> One thing to take care of: scalar replacement relies on
>>>>>>>>>>>> TypeOopPtr::instance_id().
>>>>>>>>>>>>
>>>>>>>>>>>>      // If not InstanceTop or InstanceBot, indicates that this is
>>>>>>>>>>>>      // a particular instance of this type which is distinct.
>>>>>>>>>>>>      // This is the node index of the allocation node creating this
>>>>>>>>>>>> instance.
>>>>>>>>>>>>      int           _instance_id;
>>>>>>>>>>>>
>>>>>>>>>>>> It'll break when multiple allocations are in play.
>>>>>>>>>>>>
>>>>>>>>>>>> Best regards,
>>>>>>>>>>>> Vladimir Ivanov
>>>>>>>>>>>>
>>>>>>>>>>>> On 09.02.2022 04:45, Cesar Soares Lucas wrote:
>>>>>>>>>>>>> Hi there again!
>>>>>>>>>>>>>
>>>>>>>>>>>>> Can you please give me feedback on the following approach to at
>>>>>>>>>>>>> least partially
>>>>>>>>>>>>> address [1], the scalar replacement allocation merge issue?
>>>>>>>>>>>>>
>>>>>>>>>>>>> The problem that I am trying to solve arises when allocations are
>>>>>>>>>>>>> merged after a
>>>>>>>>>>>>> control flow split. The code below shows _one example_ of such a
>>>>>>>>>>>>> situation.
>>>>>>>>>>>>>
>>>>>>>>>>>>> public int ex1(boolean cond, int x, int y) {
>>>>>>>>>>>>>        Point p = new Point(x, y);
>>>>>>>>>>>>>        if (cond)
>>>>>>>>>>>>>            p = new Point(y, x);
>>>>>>>>>>>>>        // Allocations for p are merged here.
>>>>>>>>>>>>>        return p.calc();
>>>>>>>>>>>>> }
>>>>>>>>>>>>>
>>>>>>>>>>>>> Assuming the method calls on "p" are inlined then the allocations
>>>>>>>>>>>>> will not
>>>>>>>>>>>>> escape the method. The C2 IR for this method will look like this:
>>>>>>>>>>>>>
>>>>>>>>>>>>> public int ex1(boolean cond, int first, int second) {
>>>>>>>>>>>>>        p0 = Allocate(...);
>>>>>>>>>>>>>        ...
>>>>>>>>>>>>>        p0.x = first;
>>>>>>>>>>>>>        p0.y = second;
>>>>>>>>>>>>>
>>>>>>>>>>>>>        if (cond) {
>>>>>>>>>>>>>            p1 = Allocate(...);
>>>>>>>>>>>>>            ...
>>>>>>>>>>>>>            p1.x = second;
>>>>>>>>>>>>>            p1.y = first;
>>>>>>>>>>>>>        }
>>>>>>>>>>>>>
>>>>>>>>>>>>>        p = phi(p0, p1)
>>>>>>>>>>>>>
>>>>>>>>>>>>>        return p.x - p.y;
>>>>>>>>>>>>> }
>>>>>>>>>>>>>
>>>>>>>>>>>>> However, one of the constraints implemented here [2],
>>>>>>>>>>>>> specifically the third
>>>>>>>>>>>>> one, will prevent the objects from being scalar replaced.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The approach that I'm considering for solving the problem is to
>>>>>>>>>>>>> replace the Phi
>>>>>>>>>>>>> node `p = phi(p0, p1)` with new Phi nodes for each of the fields
>>>>>>>>>>>>> of the objects
>>>>>>>>>>>>> in the original Phi. The IR for `ex1` would look something like
>>>>>>>>>>>>> this after the
>>>>>>>>>>>>> transformation:
>>>>>>>>>>>>>
>>>>>>>>>>>>> public int ex1(boolean cond, int first, int second) {
>>>>>>>>>>>>>        p0 = Allocate(...);
>>>>>>>>>>>>>        ...
>>>>>>>>>>>>>        p0.x = first;
>>>>>>>>>>>>>        p0.y = second;
>>>>>>>>>>>>>
>>>>>>>>>>>>>        if (cond) {
>>>>>>>>>>>>>            p1 = Allocate(...);
>>>>>>>>>>>>>            ...
>>>>>>>>>>>>>            p1.x = second;
>>>>>>>>>>>>>            p1.y = first;
>>>>>>>>>>>>>        }
>>>>>>>>>>>>>
>>>>>>>>>>>>>        pX = phi(first, second)
>>>>>>>>>>>>>        pY = phi(second, first)
>>>>>>>>>>>>>
>>>>>>>>>>>>>        return pX - pY;
>>>>>>>>>>>>> }
>>>>>>>>>>>>>
>>>>>>>>>>>>> I understand that this transformation might not be applicable for
>>>>>>>>>>>>> all cases and
>>>>>>>>>>>>> that it's not as simple as illustrated above. Also, it seems to
>>>>>>>>>>>>> me that much of
>>>>>>>>>>>>> what I'd have to implement is already implemented in other steps
>>>>>>>>>>>>> of the Scalar
>>>>>>>>>>>>> Replacement pipeline (which is a good thing). To work around these
>>>>>>>>>>>>> implementation details I plan to use as much of the existing code
>>>>>>>>>>>>> as possible.
>>>>>>>>>>>>> The algorithm for the transformation would be like this:
>>>>>>>>>>>>>
>>>>>>>>>>>>> split_phis(phi)
>>>>>>>>>>>>>        # If output of phi escapes, or something uses its identity, etc
>>>>>>>>>>>>>        # then we can't remove it. The conditions here might
>>>>>>>>>>>>> possible be the
>>>>>>>>>>>>>        # same as the ones implemented in
>>>>>>>>>>>>> `PhaseMacroExpand::can_eliminate_allocation`
>>>>>>>>>>>>>        if cant_remove_phi_output(phi)
>>>>>>>>>>>>>            return ;
>>>>>>>>>>>>>
>>>>>>>>>>>>>        # Collect a set of tuples(F,U) containing nodes U that uses
>>>>>>>>>>>>> field F
>>>>>>>>>>>>>        # member of the object resulting from `phi`.
>>>>>>>>>>>>>        fields_used = collect_fields_used_after_phi(phi)
>>>>>>>>>>>>>
>>>>>>>>>>>>>        foreach field in fields_used
>>>>>>>>>>>>>            producers = {}
>>>>>>>>>>>>>
>>>>>>>>>>>>>            # Create a list with the last Store for each field
>>>>>>>>>>>>> "field" on the
>>>>>>>>>>>>>            # scope of each of the Phi input objects.
>>>>>>>>>>>>>            foreach o in phi.inputs
>>>>>>>>>>>>>                # The function called below might re-use a lot of
>>>>>>>>>>>>> the code/logic in `PhaseMacroExpand::scalar_replacement`
>>>>>>>>>>>>>                producers += last_store_to_o_field(0, field)
>>>>>>>>>>>>>            # Create a new phi node whose inputs are the Store's to
>>>>>>>>>>>>> 'field'
>>>>>>>>>>>>>            field_phi = create_new_phi(producers)
>>>>>>>>>>>>>
>>>>>>>>>>>>>            update_consumers(field, field_phi)
>>>>>>>>>>>>>
>>>>>>>>>>>>> The implementation that I envisioned would be as a "pre-process"
>>>>>>>>>>>>> [3] step just
>>>>>>>>>>>>> after EA but before the constraint checks in
>>>>>>>>>>>>> `adjust_scalar_replaceable_state`
>>>>>>>>>>>>> [2]. If we agree that the overall Scalar Replacement
>>>>>>>>>>>>> implementation goes through
>>>>>>>>>>>>> the following major phases:
>>>>>>>>>>>>>
>>>>>>>>>>>>>        1. Identify the Escape Status of objects.
>>>>>>>>>>>>>        2. Adjust object Escape and/or Scalar Replacement status
>>>>>>>>>>>>> based on a set of constraints.
>>>>>>>>>>>>>        3. Make call to Split_unique_types [4].
>>>>>>>>>>>>>        4 Iterate over object and array allocations.
>>>>>>>>>>>>>            4.1 Check if allocation can be eliminated.
>>>>>>>>>>>>>            4.2 Perform scalar replacement. Replace uses of object
>>>>>>>>>>>>> in Safepoints.
>>>>>>>>>>>>>            4.3 Process users of CheckCastPP other than Safepoint:
>>>>>>>>>>>>> AddP, ArrayCopy and CastP2X.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The transformation that I am proposing would change the overall
>>>>>>>>>>>>> flow to look
>>>>>>>>>>>>> like this:
>>>>>>>>>>>>>
>>>>>>>>>>>>>        1. Identify the Escape Status of objects.
>>>>>>>>>>>>>        2. ----> New: "Split phi functions" <----
>>>>>>>>>>>>>        2. Adjust object Escape and/or Scalar Replacement status
>>>>>>>>>>>>> based on a set of constraints.
>>>>>>>>>>>>>        3. Make call to Split_unique_types [14].
>>>>>>>>>>>>>        4 Iterate over object and array allocations.
>>>>>>>>>>>>>            4.1 ----> Moved to split_phi: "Check if allocation can
>>>>>>>>>>>>> be eliminated" <----
>>>>>>>>>>>>>            4.2 Perform scalar replacement. Replace uses of object
>>>>>>>>>>>>> in Safepoints.
>>>>>>>>>>>>>            4.3 Process users of CheckCastPP other than Safepoint:
>>>>>>>>>>>>> AddP, ArrayCopy and CastP2X.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Please let me know what you think and thank you for taking the
>>>>>>>>>>>>> time to review
>>>>>>>>>>>>> this!
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>> Cesar
>>>>>>>>>>>>>
>>>>>>>>>>>>> Notes:
>>>>>>>>>>>>>
>>>>>>>>>>>>>        [1] I am not sure yet how this approach will play with the
>>>>>>>>>>>>> case of a merge
>>>>>>>>>>>>>            with NULL.
>>>>>>>>>>>>>        [2]
>>>>>>>>>>>>> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Furldefense.com%2Fv3%2F__https%3A%2F%2Fnam06.safelinks.protection.outlook.com%2F%3Furl%3Dhttps*3A*2F*2Fgithub.com*2Fopenjdk*2Fjdk*2Fblob*2F2f71a6b39ed6bb869b4eb3e81bc1d87f4b3328ff*2Fsrc*2Fhotspot*2Fshare*2Fopto*2Fescape.cpp*23L1809%26amp%3Bdata%3D04*7C01*7CDivino.Cesar*40microsoft.com*7Ce13b0a3de30e43d7ac4408d9f22b75c5*7C72f988bf86f141af91ab2d7cd011db47*7C1*7C0*7C637807090747649306*7CUnknown*7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0*3D*7C3000%26amp%3Bsdata%3DWruWqIgGVIzIFIG0gD0*2FEIQm40euLx6FZMexotJVrlE*3D%26amp%3Breserved%3D0__%3BJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUl!!ACWV5N9M2RV99hQ!aWncAwAPjDG2SCD4yFD_Rj6ecIB4iHjD2gepnv2TCHRO5RJznILvC8eoFUe4wgtAURqRWDc%24&data=04%7C01%7CDivino.Cesar%40microsoft.com%7Cd6152056b06744f7359e08da10e33a44%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637840865379134331%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=2uggNczR4bPxDv15gREOFA65mta6qrilwlZxblgQcYc%3D&reserved=0 
>>>>
>>>>
>>>>
>>>> <https://urldefense.com/v3/__https://nam06.safelinks.protection.outlook.com/?url=https*3A*2F*2Furldefense.com*2Fv3*2F__https*3A*2F*2Fnam06.safelinks.protection.outlook.com*2F*3Furl*3Dhttps*3A*2F*2Fgithub.com*2Fopenjdk*2Fjdk*2Fblob*2F2f71a6b39ed6bb869b4eb3e81bc1d87f4b3328ff*2Fsrc*2Fhotspot*2Fshare*2Fopto*2Fescape.cpp*23L1809*26amp*3Bdata*3D04*7C01*7CDivino.Cesar*40microsoft.com*7Ce13b0a3de30e43d7ac4408d9f22b75c5*7C72f988bf86f141af91ab2d7cd011db47*7C1*7C0*7C637807090747649306*7CUnknown*7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0*3D*7C3000*26amp*3Bsdata*3DWruWqIgGVIzIFIG0gD0*2FEIQm40euLx6FZMexotJVrlE*3D*26amp*3Breserved*3D0__*3BJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUl!!ACWV5N9M2RV99hQ!aWncAwAPjDG2SCD4yFD_Rj6ecIB4iHjD2gepnv2TCHRO5RJznILvC8eoFUe4wgtAURqRWDc*24&data=04*7C01*7CDivino.Cesar*40microsoft.com*7Cd6152056b06744f7359e08da10e33a44*7C72f988bf86f141af91ab2d7cd011db47*7C1*7C0*7C637840865379134331*7CUnknown*7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0*3D*7C3000&sdata=2uggNczR4bPxDv15gREOFA65mta6qrilwlZxblgQcYc*3D&reserved=0__;JSUlJSUlJSUlJSUqKioqKioqKioqKioqJSUlKioqKioqKioqKioqJSUlKiolJSUlJSUlJSUlJSUlJSUlJSU!!ACWV5N9M2RV99hQ!ZJvmUIbT0x_YJJPA3ap6R-qkOaEsH3YOGwu47flogbZamqV6nWVO2Kfx8uUYnVSj83WW6w$> 
>>>>
>>>>> <https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Furldefense.com%2Fv3%2F__https%3A%2F%2Fnam06.safelinks.protection.outlook.com%2F%3Furl%3Dhttps*3A*2F*2Furldefense.com*2Fv3*2F__https*3A*2F*2Fnam06.safelinks.protection.outlook.com*2F*3Furl*3Dhttps*3A*2F*2Fgithub.com*2Fopenjdk*2Fjdk*2Fblob*2F2f71a6b39ed6bb869b4eb3e81bc1d87f4b3328ff*2Fsrc*2Fhotspot*2Fshare*2Fopto*2Fescape.cpp*23L1809*26amp*3Bdata*3D04*7C01*7CDivino.Cesar*40microsoft.com*7Ce13b0a3de30e43d7ac4408d9f22b75c5*7C72f988bf86f141af91ab2d7cd011db47*7C1*7C0*7C637807090747649306*7CUnknown*7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0*3D*7C3000*26amp*3Bsdata*3DWruWqIgGVIzIFIG0gD0*2FEIQm40euLx6FZMexotJVrlE*3D*26amp*3Breserved*3D0__*3BJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUl!!ACWV5N9M2RV99hQ!aWncAwAPjDG2SCD4yFD_Rj6ecIB4iHjD2gepnv2TCHRO5RJznILvC8eoFUe4wgtAURqRWDc*24%26amp%3Bdata%3D04*7C01*7Cdivino.cesar*40microsoft.com*7C8c1737b992e847651e8d08d9f62d649b*7C72f988bf86f141af91ab2d7cd011db47*7C1*7C0*7C637811497098895122*7CUnknown*7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0*3D*7C3000%26amp%3Bsdata%3D8hLm9BcYVRF810gFqf2*2B*2Fd28jzqU2k7htD*2FDLqR9vAM*3D%26amp%3Breserved%3D0__%3BJSUlJSUlJSUlJSUqKioqKioqKioqKioqJSUlKioqKioqKioqKioqJSUlKiolJSUlJSUlJSUlJSUlJSUlJSUlJSU!!ACWV5N9M2RV99hQ!c3mFGaPH2PWpbeiHR44WrjfJKF0Ymg6aPQ8EOEMK0t5iYjdL4XwgsEvifrlDVrer2pCrdQ%24&data=04%7C01%7CDivino.Cesar%40microsoft.com%7Cd6152056b06744f7359e08da10e33a44%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637840865379134331%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=MX%2BMB2A%2FyA1sLbSbXPp0uIsckxM5OOc%2Fg%2BqPAMwSgAw%3D&reserved=0 
>>>>
>>>>
>>>>
>>>> <https://urldefense.com/v3/__https://nam06.safelinks.protection.outlook.com/?url=https*3A*2F*2Furldefense.com*2Fv3*2F__https*3A*2F*2Fnam06.safelinks.protection.outlook.com*2F*3Furl*3Dhttps*3A*2F*2Furldefense.com*2Fv3*2F__https*3A*2F*2Fnam06.safelinks.protection.outlook.com*2F*3Furl*3Dhttps*3A*2F*2Fgithub.com*2Fopenjdk*2Fjdk*2Fblob*2F2f71a6b39ed6bb869b4eb3e81bc1d87f4b3328ff*2Fsrc*2Fhotspot*2Fshare*2Fopto*2Fescape.cpp*23L1809*26amp*3Bdata*3D04*7C01*7CDivino.Cesar*40microsoft.com*7Ce13b0a3de30e43d7ac4408d9f22b75c5*7C72f988bf86f141af91ab2d7cd011db47*7C1*7C0*7C637807090747649306*7CUnknown*7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0*3D*7C3000*26amp*3Bsdata*3DWruWqIgGVIzIFIG0gD0*2FEIQm40euLx6FZMexotJVrlE*3D*26amp*3Breserved*3D0__*3BJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUl!!ACWV5N9M2RV99hQ!aWncAwAPjDG2SCD4yFD_Rj6ecIB4iHjD2gepnv2TCHRO5RJznILvC8eoFUe4wgtAURqRWDc*24*26amp*3Bdata*3D04*7C01*7Cdivino.cesar*40microsoft.com*7C8c1737b992e847651e8d08d9f62d649b*7C72f988bf86f141af91ab2d7cd011db47*7C1*7C0*7C637811497098895122*7CUnknown*7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0*3D*7C3000*26amp*3Bsdata*3D8hLm9BcYVRF810gFqf2*2B*2Fd28jzqU2k7htD*2FDLqR9vAM*3D*26amp*3Breserved*3D0__*3BJSUlJSUlJSUlJSUqKioqKioqKioqKioqJSUlKioqKioqKioqKioqJSUlKiolJSUlJSUlJSUlJSUlJSUlJSUlJSU!!ACWV5N9M2RV99hQ!c3mFGaPH2PWpbeiHR44WrjfJKF0Ymg6aPQ8EOEMK0t5iYjdL4XwgsEvifrlDVrer2pCrdQ*24&data=04*7C01*7CDivino.Cesar*40microsoft.com*7Cd6152056b06744f7359e08da10e33a44*7C72f988bf86f141af91ab2d7cd011db47*7C1*7C0*7C637840865379134331*7CUnknown*7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0*3D*7C3000&sdata=MX*2BMB2A*2FyA1sLbSbXPp0uIsckxM5OOc*2Fg*2BqPAMwSgAw*3D&reserved=0__;JSUlJSUlJSUlJSUqKioqKioqKioqKioqKioqKioqKioqKioqKioqKioqKioqKioqKioqKioqKioqKioqJSUlKioqKioqKioqKioqJSUlKioqKiUlJSUlJSUlJSUlJSUlJSUlJSUlJSU!!ACWV5N9M2RV99hQ!ZJvmUIbT0x_YJJPA3ap6R-qkOaEsH3YOGwu47flogbZamqV6nWVO2Kfx8uUYnVQOQ4zHVg$>> 
>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>        [3] Another option would be to "patch" the current
>>>>>>>>>>>>> implementation to be able
>>>>>>>>>>>>>            to handle the merges. I am not certain that the "patch"
>>>>>>>>>>>>> approach would be
>>>>>>>>>>>>>            better, however, the "pre-process" approach is certainly
>>>>>>>>>>>>> much easier to test
>>>>>>>>>>>>>            and more readable.
>>>>>>>>>>>>>
>>>>>>>>>>>>>        [4] I cannot say I understand 100% the effects of executing
>>>>>>>>>>>>>            split_unique_types(). Would the transformation that I am
>>>>>>>>>>>>> proposing need to
>>>>>>>>>>>>>            be after the call to split_unique_types?
>>>>>
>>>>