[External] : Re: RFC : Approach to handle Allocation Merges in C2 Scalar Replacement

Wed Mar 30 23:54:23 UTC 2022

I was not able to reproduce the issue with my test and your patch.

Also try to comment out/remove next your new lines to keep orig_load_mem:

+  // Create unique input for "memory" of Load
+  Node* orig_load_mem   = original_load->in(LoadNode::Memory);
+  new_load->replace_edge(orig_load_mem, orig_load_mem->is_Phi() ? orig_load_mem->in(idx) : orig_load_mem);

Thanks,
Vladimir K

On 3/30/22 2:32 PM, Vladimir Kozlov wrote:
> On 3/30/22 2:28 PM, Vladimir Kozlov wrote:
>> Hi Cesar,
>>
>> Can you also send me your test file TestSimpleMerge.java. I need to reproduce your failure.
>>
>> Currently I see wrong memeory edge in LoadI [231] node because it points to memory [122] on path which is not fully 
>> dominating (on code branch).
> 
> The memory should be this Phi:
> 
>   182  Phi  ===  105  122  43  [[ 18 ]]  #memory  Memory: @TestSimpleMerge$Point+12 *, name=x, idx=6; !jvms: 
> TestSimpleMerge::test @ bci:22 (line 14)
> 
> May be we got some partial split through Phi transformation which is causing the issue.
> 
>>
>> Thanks,
>> Vladimir K
>>
>> [1]
>>
>>   41  Initialize  ===  33  1  44  1  1  40  90  [[ 42  43 ]]  !jvms: TestSimpleMerge::test @ bci:0 (line 9)
>>   43  Proj  ===  41  [[ 133  132  131  123  179  180  181  182  53  53  53  53  233 ]] #2  Memory: @rawptr:BotPTR, 
>> idx=Raw; !jvms: TestSimpleMerge::test @ bci:0 (line 9)
>>   101  IfFalse  ===  99  [[ 105 ]] #0 !jvms: TestSimpleMerge::test @ bci:10 (line 11)
>>
>>   120  Initialize  ===  112  1  123  1  1  119  171  [[ 121  122 ]]  !jvms: TestSimpleMerge::test @ bci:13 (line 12)
>>   122  Proj  ===  120  [[ 182  181  180  179  231 ]] #2  Memory: @rawptr:BotPTR, idx=Raw; !jvms: TestSimpleMerge::test 
>> @ bci:13 (line 12)
>>   121  Proj  ===  120  [[ 105  124 ]] #0 !jvms: TestSimpleMerge::test @ bci:13 (line 12)
>>
>>   221  ConI  ===  0  [[ 220  225 ]]  #int:0
>>   222  ConI  ===  0  [[ 220 ]]  #int:1
>>
>>   105  Region  ===  105  121  101  [[ 105  220  178  179  180  181  182  227 ]]  !jvms: TestSimpleMerge::test @ bci:22 
>> (line 14)
>>
>>   220  Phi  ===  105  221  222  [[ 225 ]]  #int
>>   225  CmpI  === _  220  221  [[ 226 ]]
>>   226  Bool  === _  225  [[ 227 ]] [eq]
>>   227  If  ===  105  226  [[ 228  229 ]] P=0.000000, C=0.000000
>>
>>   229  IfFalse  ===  227  [[ 233  223 ]] #0
>>   233  LoadI  ===  229  43  232  [[ 224 ]]  @TestSimpleMerge$Point+12 *, name=x, idx=6; #int !orig=[207] !jvms: 
>> TestSimpleMerge$Point::calc @ bci:1 (line 5) TestSimpleMerge::test @ bci:23 (line 14)
>>
>>   228  IfTrue  ===  227  [[ 231  223 ]] #1
>>   231  LoadI  ===  228  122  230  [[ 224 ]]  @TestSimpleMerge$Point+12 *, name=x, idx=6; #int !orig=[207] !jvms: 
>> TestSimpleMerge$Point::calc @ bci:1 (line 5) TestSimpleMerge::test @ bci:23 (line 14)
>>
>>   223  Region  ===  223  228  229  [[ 223  224  216 ]]
>>   224  Phi  ===  223  231  233  [[ 216 ]]  #int
>>   216  Return  ===  223  178  18  8  9 returns 224  [[ 0 ]]
>>
>> On 3/29/22 5:34 PM, Cesar Soares Lucas wrote:
>>> Hi, Vladimir.
>>>
>>> Thank you for getting back to me. What you describe is exactly what I understand and what I tried to implement. I 
>>> also noticed the issue with the nodes floating up and I even tried to prevent that.. but I think the root of the 
>>> problem is somewhere else.
>>>
>>> Please take a look at this printout: https://www.dropbox.com/s/ao30zadovwqswlp/splitted.txt?dl=0 
>>> <https://urldefense.com/v3/__https://www.dropbox.com/s/ao30zadovwqswlp/splitted.txt?dl=0__;!!ACWV5N9M2RV99hQ!ZJvmUIbT0x_YJJPA3ap6R-qkOaEsH3YOGwu47flogbZamqV6nWVO2Kfx8uUYnVTddI45Zg$> 
>>> . C2 tells me that this graph has bad dominance: “Use 228 isn’t dominated by def 231”. This error message is result 
>>> of this verification: https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/loopnode.cpp#L5361 
>>> <https://urldefense.com/v3/__https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/loopnode.cpp*L5361__;Iw!!ACWV5N9M2RV99hQ!ZJvmUIbT0x_YJJPA3ap6R-qkOaEsH3YOGwu47flogbZamqV6nWVO2Kfx8uUYnVSGiICGJw$> 
>>>
>>>
>>> I agree and disagree with the message. I’m not sure if it’s my limited knowledge of C2 IR or some wording problem in 
>>> the message: why would 228 need to be dominated by 231 - after all 228 doesn’t use 231? The opposite makes more sense 
>>> to me: 228 needs to dominate 231 – and that is true AFAICT. Nonetheless, I agree with the message that there is 
>>> something wrong in the graph: this graph allows for execution to reach node 231 without node 124 being executed!
>>>
>>> I created this DRAFT PR so that you can look at the code if you want: https://github.com/openjdk/jdk/pull/8023 
>>> <https://urldefense.com/v3/__https://github.com/openjdk/jdk/pull/8023__;!!ACWV5N9M2RV99hQ!ZJvmUIbT0x_YJJPA3ap6R-qkOaEsH3YOGwu47flogbZamqV6nWVO2Kfx8uUYnVQl2jEXlA$> 
>>> .
>>>
>>> Please let me know what you think. TIA!
>>>
>>> Cesar
>>>
>>> *From: *Vladimir Kozlov <vladimir.kozlov at oracle.com>
>>> *Date: *Monday, March 28, 2022 at 10:48 AM
>>> *To: *Cesar Soares Lucas <Divino.Cesar at microsoft.com>, Vladimir Ivanov <vladimir.x.ivanov at oracle.com>, hotspot 
>>> compiler <hotspot-compiler-dev at openjdk.java.net>
>>> *Cc: *Brian Stafford <Brian.Stafford at microsoft.com>, Martijn Verburg <Martijn.Verburg at microsoft.com>
>>> *Subject: *Re: [External] : Re: RFC : Approach to handle Allocation Merges in C2 Scalar Replacement
>>>
>>> Hi Cesar
>>>
>>> On 3/17/22 12:15 PM, Cesar Soares Lucas wrote:
>>>> Hi all,
>>>>
>>>> I’ve been experimenting with the idea explained by Vladimir Ivanov and I was able to implement it and have a few
>>>> non-trivial merges untangled and scalar replaced. However, I found an obstacle on the road. I was able to overcome it,
>>>> but I’d very much appreciate your opinion about the matter. Assume we start off with the piece of code below.
>>>>
>>>> if (…)     p0 = …
>>>>
>>>> else       p1 = …
>>>>
>>>> p = phi(p0, p1)
>>>>
>>>> return p.x
>>>>
>>>> The idea proposed suggests, AFAIU, that we change the code to be something like the below **where we use the bases
>>>> directly**.
>>>>
>>>> if (…)   p0 = …
>>>>
>>>> else     p1 = …
>>>>
>>>> s = # selector phi #
>>>>
>>>> if (s == id_base0)           x0 = p0.x
>>>>
>>>> elseif (s == id_base1)    x1 = p1.x
>>>>
>>>> else                                  halt()
>>>>
>>>> x = phi(x0, x1)
>>>>
>>>> return x
>>>>
>>>> The problem here is that the definition of “p0” and/or “p1” doesn’t dominate their use in the loads “x0 = p0.x” / “x1 =
>>>> p1.x”, therefore certain IR verification steps in C2 fail. I was able to work around the issue by adding Phi nodes 
>>>> where
>>>> all inputs are the same (excluding the ctrl), thus creating a “fake” definition for the value. Please see code below.
>>>
>>> You have original `p = phi(p0, p1)` which is attached to Region node which merges paths with p0 and p1. `Selector ID
>>> Phi(id_p0, id_p1)` should be attached to the same Region and guarantee that p0 and p1 are defined at that point. The
>>> only issue could be is that load nodes don't have control edges and depend on memory edges to schedule. So they could
>>> float up. I think they should be attached (control edge) to id checks (s == id_base*) to prevent them move up as new
>>> special case [1]. At least as first approach for experiment.
>>>
>>> Thanks,
>>> Vladimir K
>>>
>>> [1] 
>>> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fopenjdk%2Fjdk%2Fblob%2Fmaster%2Fsrc%2Fhotspot%2Fshare%2Fopto%2Fmemnode.cpp%23L1737&data=04%7C01%7CDivino.Cesar%40microsoft.com%7Cd6152056b06744f7359e08da10e33a44%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637840865379084345%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=HaPBCqckab%2BpOcIoYEccL04u1o1pqqaUjsYMCslfP7Y%3D&reserved=0 
>>> <https://urldefense.com/v3/__https://nam06.safelinks.protection.outlook.com/?url=https*3A*2F*2Fgithub.com*2Fopenjdk*2Fjdk*2Fblob*2Fmaster*2Fsrc*2Fhotspot*2Fshare*2Fopto*2Fmemnode.cpp*23L1737&data=04*7C01*7CDivino.Cesar*40microsoft.com*7Cd6152056b06744f7359e08da10e33a44*7C72f988bf86f141af91ab2d7cd011db47*7C1*7C0*7C637840865379084345*7CUnknown*7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0*3D*7C3000&sdata=HaPBCqckab*2BpOcIoYEccL04u1o1pqqaUjsYMCslfP7Y*3D&reserved=0__;JSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUl!!ACWV5N9M2RV99hQ!ZJvmUIbT0x_YJJPA3ap6R-qkOaEsH3YOGwu47flogbZamqV6nWVO2Kfx8uUYnVTydJtLxg$> 
>>>
>>>
>>>>
>>>> if (…)   p0 = …
>>>>
>>>> else     p1 = …
>>>>
>>>> phi0 = phi(p0, p0)
>>>>
>>>> phi1 = Phi(p1, p1)
>>>>
>>>> s = # selector phi #
>>>>
>>>> if (s == id_base0)           x0 = phi0.x
>>>>
>>>> elseif (s == id_base1)    x1 = phi1.x
>>>>
>>>> else                                  halt()
>>>>
>>>> x = phi(x0, x1)
>>>>
>>>> return x
>>>>
>>>> Transforming the code to look like the above silenced the verifications ( :-] ) and at the end EA+SR was able to remove
>>>> the allocations **and get rid of all the Phi’s**.
>>>>
>>>> So, my question is: is this workaround acceptable? Do you have a suggestion of a better solution?
>>>>
>>>> TIA, Cesar
>>>>
>>>> *From: *hotspot-compiler-dev <hotspot-compiler-dev-retn at openjdk.java.net> on behalf of Cesar Soares Lucas
>>>> <Divino.Cesar at microsoft.com>
>>>> *Date: *Tuesday, February 22, 2022 at 10:01 AM
>>>> *To: *Vladimir Ivanov <vladimir.x.ivanov at oracle.com>, Vladimir Kozlov <vladimir.kozlov at oracle.com>, hotspot compiler
>>>> <hotspot-compiler-dev at openjdk.java.net>
>>>> *Cc: *Brian Stafford <Brian.Stafford at microsoft.com>, Martijn Verburg <Martijn.Verburg at microsoft.com>
>>>> *Subject: *Re: [External] : Re: RFC : Approach to handle Allocation Merges in C2 Scalar Replacement
>>>>
>>>> Hi Vladimir,
>>>>
>>>> I think I've enough feedback to start working on a prototype and see how things play out. Thank you for sharing your
>>>> ideas/expertise!
>>>>
>>>>
>>>> Cheers,
>>>> Cesar
>>>>
>>>> ________________________________________
>>>> From: Vladimir Ivanov <vladimir.x.ivanov at oracle.com>
>>>> Sent: February 21, 2022 10:26 AM
>>>> To: Cesar Soares Lucas; Vladimir Kozlov; hotspot compiler
>>>> Cc: Brian Stafford; Martijn Verburg
>>>> Subject: Re: [External] : Re: RFC : Approach to handle Allocation Merges in C2 Scalar Replacement
>>>>
>>>> Hi Cesar,
>>>>
>>>>> I with the selector-based base splitting idea using some examples and I think I understand the mechanics well. It's 
>>>>> conceptually simple and quite effective, I like it! I've another question, though. From your previous message I had 
>>>>> understood that we needed the clustering because split_unique_types can't handle multiple bases at once, which 
>>>>> makes sense to me. _This time I
>>>> don't understand why we need to do the clustering before doing the selector-based transformation_. AFAIU this
>>>> transformation won't be creating any memory slices but rather just assigning IDs to different bases and then creating
>>>> specialized memory operations conditionally using the different bases.
>>>>
>>>> Yes, that's a valid simplification. It does sound like introducing
>>>> selector ID doesn't require a separate memory graph (and can be
>>>> performed before split_unique_types()), but I don't have a good
>>>> understanding how complicated simplifying selector-based IR shapes would
>>>> be compared to original (memory op-based) ones. It's definitely worth
>>>> experimenting with.
>>>>
>>>> Best regards,
>>>> Vladimir Ivanov
>>>>
>>>>> ________________________________________
>>>>> From: Vladimir Ivanov <vladimir.x.ivanov at oracle.com>
>>>>> Sent: February 17, 2022 7:37 AM
>>>>> To: Vladimir Kozlov; Cesar Soares Lucas; hotspot compiler
>>>>> Cc: Brian Stafford; Martijn Verburg
>>>>> Subject: Re: RFC : Approach to handle Allocation Merges in C2 Scalar Replacement
>>>>>
>>>>>
>>>>>> But depending on the actual shape of the base pointer (here it's
>>>>>> Phi(obj1,obj2)), it may become infeasible (from performance perspective)
>>>>>> or even impossible (e.g., for complex loop variant conditions) to
>>>>>> separate instance slices.
>>>>>
>>>>> I'd like to correct myself. After thinking more about the problem, I see
>>>>> a universal way to scalarize clusters of interfering non-escaping
>>>>> allocations even in presence of accesses with indeterminate base.
>>>>>
>>>>> The crucial property for such cluster of interacting non-escaping
>>>>> allocations is for any memory access its base is either part of the
>>>>> cluster or not (it's a static property for every access). Then, when it
>>>>> is part of the cluster, then the number of possible base values at
>>>>> runtime is finite and is a subset of the cluster.
>>>>>
>>>>> So, memory graph for the cluster can be scalarized as follows:
>>>>>
>>>>>      (1) separate memory graph for the cluster
>>>>>
>>>>>      (2) on the newly constructed graph:
>>>>>
>>>>>         (a) replace every base pointer with an ID ("selector id") and
>>>>> recreate data graph for selector IDs from base pointers graph;
>>>>>
>>>>>         (b) for accesses with indeterminate base pointer, replace them
>>>>> with a merge of states from relevant allocations selected by "selector id"
>>>>>
>>>>>         (c) after 2b, all memory accesses from the cluster memory graph
>>>>> should have fixed base pointing at a particular allocation
>>>>>
>>>>>      (3) scalarize individual allocations from the cluster one by one
>>>>> (since they are independent now)
>>>>>
>>>>>         - additional transformation (like, split-through-phi) should
>>>>> simplify the graph by reducing the number of cases at merge points to
>>>>> care about (ideally, leaving only a single one);
>>>>>
>>>>>
>>>>>
>>>>> As an illustration:
>>>>>
>>>>>      val = Load (AddP (Phi R base1 base2) offset) mem # load A.i
>>>>>
>>>>> ==(pseudo-code)==>
>>>>>
>>>>>      selector = (Phi R #base1 #base2)
>>>>>
>>>>>      if (selector == #base1) {
>>>>>        val1 = Load (base1 + offset) mem
>>>>>      } else if (selector == #base2) {
>>>>>        val2 = Load (base2 + offset) mem
>>>>>      } else {
>>>>>        halt(); // impossible
>>>>>      }
>>>>>
>>>>>      val = Phi(R1, val1, val2)
>>>>>
>>>>> ======
>>>>>
>>>>>      Store (AddP (Phi R base1 base2) offset) mem v # store A.i v
>>>>>
>>>>> ==(pseudo-code)==>
>>>>>
>>>>>      selector = (Phi R #base1 #base2)
>>>>>
>>>>>      if (selector == #base1) {
>>>>>        Store base1 mem v ==> mem1
>>>>>      } else if (selector == #base2) {
>>>>>        Store base2 mem v ==> mem2
>>>>>      } else {
>>>>>        halt(); // impossible
>>>>>      }
>>>>>
>>>>>      mem = Phi(R1, mem1, mem2);
>>>>>
>>>>>
>>>>> Best regards,
>>>>> Vladimir Ivanov
>>>>>
>>>>>>
>>>>>> So, gradual approach seems the way to go:
>>>>>>      (1) split memory graph for clusters of interacting non-escaping
>>>>>> allocations;
>>>>>>
>>>>>>      (2) perform adhoc transformation targeted at untangling aliasing
>>>>>> accesses (e.g, split-through-phi);
>>>>>>
>>>>>>      (3) extract unique instance types where possible, thus making the
>>>>>> corresponding allocation scalar replaceable
>>>>>>
>>>>>> =====================================================================
>>>>>>
>>>>>> Also, one observation: interacting allocations don't have to be of the
>>>>>> same class.
>>>>>>
>>>>>>        static int testPolluted(int r, boolean b) {
>>>>>>            A obj1 = new B1(r);
>>>>>>            A obj2 = new B2(r);
>>>>>>
>>>>>>            A obj = (b ? obj1 : obj2);
>>>>>>            for (int i = 1; i < r; i++) {
>>>>>>                obj.i++;
>>>>>>            }
>>>>>>
>>>>>>            return obj1.i + obj2.i;
>>>>>>        }
>>>>>>
>>>>>> Requiring unique instance types to perform SR (and not enhancing SR to
>>>>>> handle aliasing allocations case) helps avoid some challenges in
>>>>>> instance rematerialization logic at safepoints, because actual shape of
>>>>>> the scalarized object (its class and exact set of fields) becomes a
>>>>>> runtime property.
>>>>>>
>>>>>> Best regards,
>>>>>> Vladimir Ivanov
>>>>>>
>>>>>>>>>
>>>>>>>>> Otherwise, I believe wide memory states/merges would pose serious
>>>>>>>>> problems. E.g., when a load is split through a phi, you need to pick
>>>>>>>>> correct memory states for the new loads above the phi which quickly
>>>>>>>>> becomes quite a challenging task.
>>>>>>>>
>>>>>>>> Especially in loops. Actually I did tried such approach before and I
>>>>>>>> agree that it is hard.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Vladimir K
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Best regards,
>>>>>>>>> Vladimir Ivanov
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> public int ex1(boolean cond, int first, int second) {
>>>>>>>>>>        p0 = Allocate(...);
>>>>>>>>>>        ...
>>>>>>>>>>        p0.x = first;
>>>>>>>>>>        p0.y = second;
>>>>>>>>>>
>>>>>>>>>>        if (cond) {
>>>>>>>>>>            p1 = Allocate(...);
>>>>>>>>>>            ...
>>>>>>>>>>            p1.x = second;
>>>>>>>>>>            p1.y = first;
>>>>>>>>>>        }
>>>>>>>>>>
>>>>>>>>>>        p = phi(p0, p1) // unused and will be removed
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>        return phi(p0.x,p1.x) - phi(p0.y, p1.y);
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Vladimir K
>>>>>>>>>>
>>>>>>>>>> On 2/10/22 11:29 AM, Vladimir Ivanov wrote:
>>>>>>>>>>> (BCCing hotspot-dev and moving the discussion to
>>>>>>>>>>> hotspot-compiler-dev.)
>>>>>>>>>>>
>>>>>>>>>>> Hi Cesar,
>>>>>>>>>>>
>>>>>>>>>>> Thanks for looking into enhancing EA.
>>>>>>>>>>>
>>>>>>>>>>> Overall, the proposal looks reasonable.
>>>>>>>>>>>
>>>>>>>>>>> I suggest to look more closely at split_unique_types().
>>>>>>>>>>> It introduces a dedicated class of alias categories for fields of
>>>>>>>>>>> the allocation being eliminated and clones memory graph. I don't
>>>>>>>>>>> see why it shouldn't work for multiple allocations.
>>>>>>>>>>>
>>>>>>>>>>> Moreover, split_unique_types() will break if you start optimizing
>>>>>>>>>>> multiple allocations at once. The notion of unique alias should be
>>>>>>>>>>> adjusted and cover the union of unique aliases for all interacting
>>>>>>>>>>> allocations.
>>>>>>>>>>>
>>>>>>>>>>> Seems like you need to enhance SR to work on non-intersecting
>>>>>>>>>>> clusters of allocations.
>>>>>>>>>>>
>>>>>>>>>>> One thing to take care of: scalar replacement relies on
>>>>>>>>>>> TypeOopPtr::instance_id().
>>>>>>>>>>>
>>>>>>>>>>>      // If not InstanceTop or InstanceBot, indicates that this is
>>>>>>>>>>>      // a particular instance of this type which is distinct.
>>>>>>>>>>>      // This is the node index of the allocation node creating this
>>>>>>>>>>> instance.
>>>>>>>>>>>      int           _instance_id;
>>>>>>>>>>>
>>>>>>>>>>> It'll break when multiple allocations are in play.
>>>>>>>>>>>
>>>>>>>>>>> Best regards,
>>>>>>>>>>> Vladimir Ivanov
>>>>>>>>>>>
>>>>>>>>>>> On 09.02.2022 04:45, Cesar Soares Lucas wrote:
>>>>>>>>>>>> Hi there again!
>>>>>>>>>>>>
>>>>>>>>>>>> Can you please give me feedback on the following approach to at
>>>>>>>>>>>> least partially
>>>>>>>>>>>> address [1], the scalar replacement allocation merge issue?
>>>>>>>>>>>>
>>>>>>>>>>>> The problem that I am trying to solve arises when allocations are
>>>>>>>>>>>> merged after a
>>>>>>>>>>>> control flow split. The code below shows _one example_ of such a
>>>>>>>>>>>> situation.
>>>>>>>>>>>>
>>>>>>>>>>>> public int ex1(boolean cond, int x, int y) {
>>>>>>>>>>>>        Point p = new Point(x, y);
>>>>>>>>>>>>        if (cond)
>>>>>>>>>>>>            p = new Point(y, x);
>>>>>>>>>>>>        // Allocations for p are merged here.
>>>>>>>>>>>>        return p.calc();
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>> Assuming the method calls on "p" are inlined then the allocations
>>>>>>>>>>>> will not
>>>>>>>>>>>> escape the method. The C2 IR for this method will look like this:
>>>>>>>>>>>>
>>>>>>>>>>>> public int ex1(boolean cond, int first, int second) {
>>>>>>>>>>>>        p0 = Allocate(...);
>>>>>>>>>>>>        ...
>>>>>>>>>>>>        p0.x = first;
>>>>>>>>>>>>        p0.y = second;
>>>>>>>>>>>>
>>>>>>>>>>>>        if (cond) {
>>>>>>>>>>>>            p1 = Allocate(...);
>>>>>>>>>>>>            ...
>>>>>>>>>>>>            p1.x = second;
>>>>>>>>>>>>            p1.y = first;
>>>>>>>>>>>>        }
>>>>>>>>>>>>
>>>>>>>>>>>>        p = phi(p0, p1)
>>>>>>>>>>>>
>>>>>>>>>>>>        return p.x - p.y;
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>> However, one of the constraints implemented here [2],
>>>>>>>>>>>> specifically the third
>>>>>>>>>>>> one, will prevent the objects from being scalar replaced.
>>>>>>>>>>>>
>>>>>>>>>>>> The approach that I'm considering for solving the problem is to
>>>>>>>>>>>> replace the Phi
>>>>>>>>>>>> node `p = phi(p0, p1)` with new Phi nodes for each of the fields
>>>>>>>>>>>> of the objects
>>>>>>>>>>>> in the original Phi. The IR for `ex1` would look something like
>>>>>>>>>>>> this after the
>>>>>>>>>>>> transformation:
>>>>>>>>>>>>
>>>>>>>>>>>> public int ex1(boolean cond, int first, int second) {
>>>>>>>>>>>>        p0 = Allocate(...);
>>>>>>>>>>>>        ...
>>>>>>>>>>>>        p0.x = first;
>>>>>>>>>>>>        p0.y = second;
>>>>>>>>>>>>
>>>>>>>>>>>>        if (cond) {
>>>>>>>>>>>>            p1 = Allocate(...);
>>>>>>>>>>>>            ...
>>>>>>>>>>>>            p1.x = second;
>>>>>>>>>>>>            p1.y = first;
>>>>>>>>>>>>        }
>>>>>>>>>>>>
>>>>>>>>>>>>        pX = phi(first, second)
>>>>>>>>>>>>        pY = phi(second, first)
>>>>>>>>>>>>
>>>>>>>>>>>>        return pX - pY;
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>> I understand that this transformation might not be applicable for
>>>>>>>>>>>> all cases and
>>>>>>>>>>>> that it's not as simple as illustrated above. Also, it seems to
>>>>>>>>>>>> me that much of
>>>>>>>>>>>> what I'd have to implement is already implemented in other steps
>>>>>>>>>>>> of the Scalar
>>>>>>>>>>>> Replacement pipeline (which is a good thing). To work around these
>>>>>>>>>>>> implementation details I plan to use as much of the existing code
>>>>>>>>>>>> as possible.
>>>>>>>>>>>> The algorithm for the transformation would be like this:
>>>>>>>>>>>>
>>>>>>>>>>>> split_phis(phi)
>>>>>>>>>>>>        # If output of phi escapes, or something uses its identity, etc
>>>>>>>>>>>>        # then we can't remove it. The conditions here might
>>>>>>>>>>>> possible be the
>>>>>>>>>>>>        # same as the ones implemented in
>>>>>>>>>>>> `PhaseMacroExpand::can_eliminate_allocation`
>>>>>>>>>>>>        if cant_remove_phi_output(phi)
>>>>>>>>>>>>            return ;
>>>>>>>>>>>>
>>>>>>>>>>>>        # Collect a set of tuples(F,U) containing nodes U that uses
>>>>>>>>>>>> field F
>>>>>>>>>>>>        # member of the object resulting from `phi`.
>>>>>>>>>>>>        fields_used = collect_fields_used_after_phi(phi)
>>>>>>>>>>>>
>>>>>>>>>>>>        foreach field in fields_used
>>>>>>>>>>>>            producers = {}
>>>>>>>>>>>>
>>>>>>>>>>>>            # Create a list with the last Store for each field
>>>>>>>>>>>> "field" on the
>>>>>>>>>>>>            # scope of each of the Phi input objects.
>>>>>>>>>>>>            foreach o in phi.inputs
>>>>>>>>>>>>                # The function called below might re-use a lot of
>>>>>>>>>>>> the code/logic in `PhaseMacroExpand::scalar_replacement`
>>>>>>>>>>>>                producers += last_store_to_o_field(0, field)
>>>>>>>>>>>>            # Create a new phi node whose inputs are the Store's to
>>>>>>>>>>>> 'field'
>>>>>>>>>>>>            field_phi = create_new_phi(producers)
>>>>>>>>>>>>
>>>>>>>>>>>>            update_consumers(field, field_phi)
>>>>>>>>>>>>
>>>>>>>>>>>> The implementation that I envisioned would be as a "pre-process"
>>>>>>>>>>>> [3] step just
>>>>>>>>>>>> after EA but before the constraint checks in
>>>>>>>>>>>> `adjust_scalar_replaceable_state`
>>>>>>>>>>>> [2]. If we agree that the overall Scalar Replacement
>>>>>>>>>>>> implementation goes through
>>>>>>>>>>>> the following major phases:
>>>>>>>>>>>>
>>>>>>>>>>>>        1. Identify the Escape Status of objects.
>>>>>>>>>>>>        2. Adjust object Escape and/or Scalar Replacement status
>>>>>>>>>>>> based on a set of constraints.
>>>>>>>>>>>>        3. Make call to Split_unique_types [4].
>>>>>>>>>>>>        4 Iterate over object and array allocations.
>>>>>>>>>>>>            4.1 Check if allocation can be eliminated.
>>>>>>>>>>>>            4.2 Perform scalar replacement. Replace uses of object
>>>>>>>>>>>> in Safepoints.
>>>>>>>>>>>>            4.3 Process users of CheckCastPP other than Safepoint:
>>>>>>>>>>>> AddP, ArrayCopy and CastP2X.
>>>>>>>>>>>>
>>>>>>>>>>>> The transformation that I am proposing would change the overall
>>>>>>>>>>>> flow to look
>>>>>>>>>>>> like this:
>>>>>>>>>>>>
>>>>>>>>>>>>        1. Identify the Escape Status of objects.
>>>>>>>>>>>>        2. ----> New: "Split phi functions" <----
>>>>>>>>>>>>        2. Adjust object Escape and/or Scalar Replacement status
>>>>>>>>>>>> based on a set of constraints.
>>>>>>>>>>>>        3. Make call to Split_unique_types [14].
>>>>>>>>>>>>        4 Iterate over object and array allocations.
>>>>>>>>>>>>            4.1 ----> Moved to split_phi: "Check if allocation can
>>>>>>>>>>>> be eliminated" <----
>>>>>>>>>>>>            4.2 Perform scalar replacement. Replace uses of object
>>>>>>>>>>>> in Safepoints.
>>>>>>>>>>>>            4.3 Process users of CheckCastPP other than Safepoint:
>>>>>>>>>>>> AddP, ArrayCopy and CastP2X.
>>>>>>>>>>>>
>>>>>>>>>>>> Please let me know what you think and thank you for taking the
>>>>>>>>>>>> time to review
>>>>>>>>>>>> this!
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Regards,
>>>>>>>>>>>> Cesar
>>>>>>>>>>>>
>>>>>>>>>>>> Notes:
>>>>>>>>>>>>
>>>>>>>>>>>>        [1] I am not sure yet how this approach will play with the
>>>>>>>>>>>> case of a merge
>>>>>>>>>>>>            with NULL.
>>>>>>>>>>>>        [2]
>>>>>>>>>>>> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Furldefense.com%2Fv3%2F__https%3A%2F%2Fnam06.safelinks.protection.outlook.com%2F%3Furl%3Dhttps*3A*2F*2Fgithub.com*2Fopenjdk*2Fjdk*2Fblob*2F2f71a6b39ed6bb869b4eb3e81bc1d87f4b3328ff*2Fsrc*2Fhotspot*2Fshare*2Fopto*2Fescape.cpp*23L1809%26amp%3Bdata%3D04*7C01*7CDivino.Cesar*40microsoft.com*7Ce13b0a3de30e43d7ac4408d9f22b75c5*7C72f988bf86f141af91ab2d7cd011db47*7C1*7C0*7C637807090747649306*7CUnknown*7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0*3D*7C3000%26amp%3Bsdata%3DWruWqIgGVIzIFIG0gD0*2FEIQm40euLx6FZMexotJVrlE*3D%26amp%3Breserved%3D0__%3BJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUl!!ACWV5N9M2RV99hQ!aWncAwAPjDG2SCD4yFD_Rj6ecIB4iHjD2gepnv2TCHRO5RJznILvC8eoFUe4wgtAURqRWDc%24&data=04%7C01%7CDivino.Cesar%40microsoft.com%7Cd6152056b06744f7359e08da10e33a44%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637840865379134331%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=2uggNczR4bPxDv15gREOFA65mta6qrilwlZxblgQcYc%3D&reserved=0 
>>>
>>>
>>> <https://urldefense.com/v3/__https://nam06.safelinks.protection.outlook.com/?url=https*3A*2F*2Furldefense.com*2Fv3*2F__https*3A*2F*2Fnam06.safelinks.protection.outlook.com*2F*3Furl*3Dhttps*3A*2F*2Fgithub.com*2Fopenjdk*2Fjdk*2Fblob*2F2f71a6b39ed6bb869b4eb3e81bc1d87f4b3328ff*2Fsrc*2Fhotspot*2Fshare*2Fopto*2Fescape.cpp*23L1809*26amp*3Bdata*3D04*7C01*7CDivino.Cesar*40microsoft.com*7Ce13b0a3de30e43d7ac4408d9f22b75c5*7C72f988bf86f141af91ab2d7cd011db47*7C1*7C0*7C637807090747649306*7CUnknown*7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0*3D*7C3000*26amp*3Bsdata*3DWruWqIgGVIzIFIG0gD0*2FEIQm40euLx6FZMexotJVrlE*3D*26amp*3Breserved*3D0__*3BJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUl!!ACWV5N9M2RV99hQ!aWncAwAPjDG2SCD4yFD_Rj6ecIB4iHjD2gepnv2TCHRO5RJznILvC8eoFUe4wgtAURqRWDc*24&data=04*7C01*7CDivino.Cesar*40microsoft.com*7Cd6152056b06744f7359e08da10e33a44*7C72f988bf86f141af91ab2d7cd011db47*7C1*7C0*7C637840865379134331*7CUnknown*7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0*3D*7C3000&sdata=2uggNczR4bPxDv15gREOFA65mta6qrilwlZxblgQcYc*3D&reserved=0__;JSUlJSUlJSUlJSUqKioqKioqKioqKioqJSUlKioqKioqKioqKioqJSUlKiolJSUlJSUlJSUlJSUlJSUlJSU!!ACWV5N9M2RV99hQ!ZJvmUIbT0x_YJJPA3ap6R-qkOaEsH3YOGwu47flogbZamqV6nWVO2Kfx8uUYnVSj83WW6w$> 
>>>
>>>> <https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Furldefense.com%2Fv3%2F__https%3A%2F%2Fnam06.safelinks.protection.outlook.com%2F%3Furl%3Dhttps*3A*2F*2Furldefense.com*2Fv3*2F__https*3A*2F*2Fnam06.safelinks.protection.outlook.com*2F*3Furl*3Dhttps*3A*2F*2Fgithub.com*2Fopenjdk*2Fjdk*2Fblob*2F2f71a6b39ed6bb869b4eb3e81bc1d87f4b3328ff*2Fsrc*2Fhotspot*2Fshare*2Fopto*2Fescape.cpp*23L1809*26amp*3Bdata*3D04*7C01*7CDivino.Cesar*40microsoft.com*7Ce13b0a3de30e43d7ac4408d9f22b75c5*7C72f988bf86f141af91ab2d7cd011db47*7C1*7C0*7C637807090747649306*7CUnknown*7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0*3D*7C3000*26amp*3Bsdata*3DWruWqIgGVIzIFIG0gD0*2FEIQm40euLx6FZMexotJVrlE*3D*26amp*3Breserved*3D0__*3BJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUl!!ACWV5N9M2RV99hQ!aWncAwAPjDG2SCD4yFD_Rj6ecIB4iHjD2gepnv2TCHRO5RJznILvC8eoFUe4wgtAURqRWDc*24%26amp%3Bdata%3D04*7C01*7Cdivino.cesar*40microsoft.com*7C8c1737b992e847651e8d08d9f62d649b*7C72f988bf86f141af91ab2d7cd011db47*7C1*7C0*7C637811497098895122*7CUnknown*7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0*3D*7C3000%26amp%3Bsdata%3D8hLm9BcYVRF810gFqf2*2B*2Fd28jzqU2k7htD*2FDLqR9vAM*3D%26amp%3Breserved%3D0__%3BJSUlJSUlJSUlJSUqKioqKioqKioqKioqJSUlKioqKioqKioqKioqJSUlKiolJSUlJSUlJSUlJSUlJSUlJSUlJSU!!ACWV5N9M2RV99hQ!c3mFGaPH2PWpbeiHR44WrjfJKF0Ymg6aPQ8EOEMK0t5iYjdL4XwgsEvifrlDVrer2pCrdQ%24&data=04%7C01%7CDivino.Cesar%40microsoft.com%7Cd6152056b06744f7359e08da10e33a44%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637840865379134331%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=MX%2BMB2A%2FyA1sLbSbXPp0uIsckxM5OOc%2Fg%2BqPAMwSgAw%3D&reserved=0 
>>>
>>>
>>> <https://urldefense.com/v3/__https://nam06.safelinks.protection.outlook.com/?url=https*3A*2F*2Furldefense.com*2Fv3*2F__https*3A*2F*2Fnam06.safelinks.protection.outlook.com*2F*3Furl*3Dhttps*3A*2F*2Furldefense.com*2Fv3*2F__https*3A*2F*2Fnam06.safelinks.protection.outlook.com*2F*3Furl*3Dhttps*3A*2F*2Fgithub.com*2Fopenjdk*2Fjdk*2Fblob*2F2f71a6b39ed6bb869b4eb3e81bc1d87f4b3328ff*2Fsrc*2Fhotspot*2Fshare*2Fopto*2Fescape.cpp*23L1809*26amp*3Bdata*3D04*7C01*7CDivino.Cesar*40microsoft.com*7Ce13b0a3de30e43d7ac4408d9f22b75c5*7C72f988bf86f141af91ab2d7cd011db47*7C1*7C0*7C637807090747649306*7CUnknown*7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0*3D*7C3000*26amp*3Bsdata*3DWruWqIgGVIzIFIG0gD0*2FEIQm40euLx6FZMexotJVrlE*3D*26amp*3Breserved*3D0__*3BJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUl!!ACWV5N9M2RV99hQ!aWncAwAPjDG2SCD4yFD_Rj6ecIB4iHjD2gepnv2TCHRO5RJznILvC8eoFUe4wgtAURqRWDc*24*26amp*3Bdata*3D04*7C01*7Cdivino.cesar*40microsoft.com*7C8c1737b992e847651e8d08d9f62d649b*7C72f988bf86f141af91ab2d7cd011db47*7C1*7C0*7C637811497098895122*7CUnknown*7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0*3D*7C3000*26amp*3Bsdata*3D8hLm9BcYVRF810gFqf2*2B*2Fd28jzqU2k7htD*2FDLqR9vAM*3D*26amp*3Breserved*3D0__*3BJSUlJSUlJSUlJSUqKioqKioqKioqKioqJSUlKioqKioqKioqKioqJSUlKiolJSUlJSUlJSUlJSUlJSUlJSUlJSU!!ACWV5N9M2RV99hQ!c3mFGaPH2PWpbeiHR44WrjfJKF0Ymg6aPQ8EOEMK0t5iYjdL4XwgsEvifrlDVrer2pCrdQ*24&data=04*7C01*7CDivino.Cesar*40microsoft.com*7Cd6152056b06744f7359e08da10e33a44*7C72f988bf86f141af91ab2d7cd011db47*7C1*7C0*7C637840865379134331*7CUnknown*7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0*3D*7C3000&sdata=MX*2BMB2A*2FyA1sLbSbXPp0uIsckxM5OOc*2Fg*2BqPAMwSgAw*3D&reserved=0__;JSUlJSUlJSUlJSUqKioqKioqKioqKioqKioqKioqKioqKioqKioqKioqKioqKioqKioqKioqKioqKioqJSUlKioqKioqKioqKioqJSUlKioqKiUlJSUlJSUlJSUlJSUlJSUlJSUlJSU!!ACWV5N9M2RV99hQ!ZJvmUIbT0x_YJJPA3ap6R-qkOaEsH3YOGwu47flogbZamqV6nWVO2Kfx8uUYnVQOQ4zHVg$>> 
>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>        [3] Another option would be to "patch" the current
>>>>>>>>>>>> implementation to be able
>>>>>>>>>>>>            to handle the merges. I am not certain that the "patch"
>>>>>>>>>>>> approach would be
>>>>>>>>>>>>            better, however, the "pre-process" approach is certainly
>>>>>>>>>>>> much easier to test
>>>>>>>>>>>>            and more readable.
>>>>>>>>>>>>
>>>>>>>>>>>>        [4] I cannot say I understand 100% the effects of executing
>>>>>>>>>>>>            split_unique_types(). Would the transformation that I am
>>>>>>>>>>>> proposing need to
>>>>>>>>>>>>            be after the call to split_unique_types?
>>>>
>>>