[External] : Re: RFC : Approach to handle Allocation Merges in C2 Scalar Replacement

Fri Feb 11 21:03:54 UTC 2022

On 2/11/22 12:14 PM, Cesar Soares Lucas wrote:
> Thank you all for the feedback. Very much appreciated!
> 
> What I like most about the idea of replacing phi(p0, p1) with several phi(Store(p0.x), Store(p1.x), etc. is that the 
> code becomes "detached" from the way that p0 and p1 were created. I think this might be helpful when eventually we try 
> to bring partial escape analysis.
> 
> I understand that split_unique_types() and SR will break if we have multiple allocations since they are dependent on the 
> Allocate ID. However, my understanding of this method isn't deep: I know what it does but I don't really know why it is 
> necessary on a deep level. If you could elaborate on that, it would be very helpful to me.

split_unique_types() created clean memory slices related only to such objects (by cloning related nodes).

Actually Roland suggested before to always run split_unique_types(), even if non-escaping object is not scalar 
replaceable. I think this is what Vladimir I. is also suggesting.

It will significantly simplify the work we have to do in IGVN.

> 
>  > As you correctly pointed in your original post, EA and SR heavily depend on _instance_id which is Allocation node's 
> _idx value.
>  > And the solution for this is either remove reference to merge phi. Or introduce new "virtual allocation" which fields 
> are
>  > populated by loads - but it is again needs to split through phi.
> 
> Vladimir K., are you suggesting something like this:

Yes, that is what I had in mind but it could be more complicated to implement.
The idea is to have new _instance_id after merge. It may not be useful for this simple case but may help in others.
We would still need to improve split_unique_types to find values to store in its fields.

> 
>   public int ex1(boolean cond, int first, int second) {
>        p0 = ...;
>        p0.x = first;
>        p0.y = second;
>        if (cond) {
>            p1 = ...
>            p1.x = second;
>            p1.y = first;
>        }
>        p = VirtualAllocate(...);
>        p.x = phi(first, second);
>        p.y = phi(second, first);
>        return p.x - p.y;
>   }
> 
> Would the phi/merges for individual fields cause problems in split_unique_types/SR?

Split_unique_types/SR work on allocation's references and not on what is stored in its fields.

Thanks,
Vladimir

> 
> 
> Regards,
> Cesar
> 
> ------------------------------------------------------------------------------------------------------------------------
> *From:* Vladimir Kozlov <vladimir.kozlov at oracle.com>
> *Sent:* February 11, 2022 10:51 AM
> *To:* Vladimir Ivanov <vladimir.x.ivanov at oracle.com>; Cesar Soares Lucas <Divino.Cesar at microsoft.com>; hotspot compiler 
> <hotspot-compiler-dev at openjdk.java.net>
> *Cc:* Brian Stafford <Brian.Stafford at microsoft.com>; Martijn Verburg <Martijn.Verburg at microsoft.com>
> *Subject:* Re: RFC : Approach to handle Allocation Merges in C2 Scalar Replacement
> On 2/11/22 10:24 AM, Vladimir Ivanov wrote:
>> 
>>> For this simple case we can teach C2's IGVN to split fields loads through Phi so that phi(p0, p1) is not used and
>>> allocations as well. We can do that because we know that allocations and phi do not escape.
>> 
>> Seems like split_unique_types() pass is a prerequisite to make such transformations predictable on a larger scale. It
>> simplifies the memory graph and untangles instance-related memory operations from the rest of the graph.
> 
> Can you elaborate on your suggestion?
> 
> As you correctly pointed in your original post, EA and SR heavily depend on _instance_id which is Allocation node's _idx
> value. And the solution for this is either remove reference to merge phi. Or introduce new "virtual allocation" which
> fields are populated by loads - but it is again needs to split through phi.
> 
>> 
>> Otherwise, I believe wide memory states/merges would pose serious problems. E.g., when a load is split through a phi,
>> you need to pick correct memory states for the new loads above the phi which quickly becomes quite a challenging task.
> 
> Especially in loops. Actually I did tried such approach before and I agree that it is hard.
> 
> Thanks,
> Vladimir K
> 
>> 
>> Best regards,
>> Vladimir Ivanov
>> 
>>>
>>> public int ex1(boolean cond, int first, int second) {
>>>      p0 = Allocate(...);
>>>      ...
>>>      p0.x = first;
>>>      p0.y = second;
>>>
>>>      if (cond) {
>>>          p1 = Allocate(...);
>>>          ...
>>>          p1.x = second;
>>>          p1.y = first;
>>>      }
>>>
>>>      p = phi(p0, p1) // unused and will be removed
>>>
>>>
>>>      return phi(p0.x,p1.x) - phi(p0.y, p1.y);
>>> }
>>>
>>> Thanks,
>>> Vladimir K
>>>
>>> On 2/10/22 11:29 AM, Vladimir Ivanov wrote:
>>>> (BCCing hotspot-dev and moving the discussion to hotspot-compiler-dev.)
>>>>
>>>> Hi Cesar,
>>>>
>>>> Thanks for looking into enhancing EA.
>>>>
>>>> Overall, the proposal looks reasonable.
>>>>
>>>> I suggest to look more closely at split_unique_types().
>>>> It introduces a dedicated class of alias categories for fields of the allocation being eliminated and clones memory
>>>> graph. I don't see why it shouldn't work for multiple allocations.
>>>>
>>>> Moreover, split_unique_types() will break if you start optimizing multiple allocations at once. The notion of unique
>>>> alias should be adjusted and cover the union of unique aliases for all interacting allocations.
>>>>
>>>> Seems like you need to enhance SR to work on non-intersecting clusters of allocations.
>>>>
>>>> One thing to take care of: scalar replacement relies on TypeOopPtr::instance_id().
>>>>
>>>>    // If not InstanceTop or InstanceBot, indicates that this is
>>>>    // a particular instance of this type which is distinct.
>>>>    // This is the node index of the allocation node creating this instance.
>>>>    int           _instance_id;
>>>>
>>>> It'll break when multiple allocations are in play.
>>>>
>>>> Best regards,
>>>> Vladimir Ivanov
>>>>
>>>> On 09.02.2022 04:45, Cesar Soares Lucas wrote:
>>>>> Hi there again!
>>>>>
>>>>> Can you please give me feedback on the following approach to at least partially
>>>>> address [1], the scalar replacement allocation merge issue?
>>>>>
>>>>> The problem that I am trying to solve arises when allocations are merged after a
>>>>> control flow split. The code below shows _one example_ of such a situation.
>>>>>
>>>>> public int ex1(boolean cond, int x, int y) {
>>>>>      Point p = new Point(x, y);
>>>>>      if (cond)
>>>>>          p = new Point(y, x);
>>>>>      // Allocations for p are merged here.
>>>>>      return p.calc();
>>>>> }
>>>>>
>>>>> Assuming the method calls on "p" are inlined then the allocations will not
>>>>> escape the method. The C2 IR for this method will look like this:
>>>>>
>>>>> public int ex1(boolean cond, int first, int second) {
>>>>>      p0 = Allocate(...);
>>>>>      ...
>>>>>      p0.x = first;
>>>>>      p0.y = second;
>>>>>
>>>>>      if (cond) {
>>>>>          p1 = Allocate(...);
>>>>>          ...
>>>>>          p1.x = second;
>>>>>          p1.y = first;
>>>>>      }
>>>>>
>>>>>      p = phi(p0, p1)
>>>>>
>>>>>      return p.x - p.y;
>>>>> }
>>>>>
>>>>> However, one of the constraints implemented here [2], specifically the third
>>>>> one, will prevent the objects from being scalar replaced.
>>>>>
>>>>> The approach that I'm considering for solving the problem is to replace the Phi
>>>>> node `p = phi(p0, p1)` with new Phi nodes for each of the fields of the objects
>>>>> in the original Phi. The IR for `ex1` would look something like this after the
>>>>> transformation:
>>>>>
>>>>> public int ex1(boolean cond, int first, int second) {
>>>>>      p0 = Allocate(...);
>>>>>      ...
>>>>>      p0.x = first;
>>>>>      p0.y = second;
>>>>>
>>>>>      if (cond) {
>>>>>          p1 = Allocate(...);
>>>>>          ...
>>>>>          p1.x = second;
>>>>>          p1.y = first;
>>>>>      }
>>>>>
>>>>>      pX = phi(first, second)
>>>>>      pY = phi(second, first)
>>>>>
>>>>>      return pX - pY;
>>>>> }
>>>>>
>>>>> I understand that this transformation might not be applicable for all cases and
>>>>> that it's not as simple as illustrated above. Also, it seems to me that much of
>>>>> what I'd have to implement is already implemented in other steps of the Scalar
>>>>> Replacement pipeline (which is a good thing). To work around these
>>>>> implementation details I plan to use as much of the existing code as possible.
>>>>> The algorithm for the transformation would be like this:
>>>>>
>>>>> split_phis(phi)
>>>>>      # If output of phi escapes, or something uses its identity, etc
>>>>>      # then we can't remove it. The conditions here might possible be the
>>>>>      # same as the ones implemented in `PhaseMacroExpand::can_eliminate_allocation`
>>>>>      if cant_remove_phi_output(phi)
>>>>>          return ;
>>>>>
>>>>>      # Collect a set of tuples(F,U) containing nodes U that uses field F
>>>>>      # member of the object resulting from `phi`.
>>>>>      fields_used = collect_fields_used_after_phi(phi)
>>>>>
>>>>>      foreach field in fields_used
>>>>>          producers = {}
>>>>>
>>>>>          # Create a list with the last Store for each field "field" on the
>>>>>          # scope of each of the Phi input objects.
>>>>>          foreach o in phi.inputs
>>>>>              # The function called below might re-use a lot of the code/logic in `PhaseMacroExpand::scalar_replacement`
>>>>>              producers += last_store_to_o_field(0, field)
>>>>>          # Create a new phi node whose inputs are the Store's to 'field'
>>>>>          field_phi = create_new_phi(producers)
>>>>>
>>>>>          update_consumers(field, field_phi)
>>>>>
>>>>> The implementation that I envisioned would be as a "pre-process" [3] step just
>>>>> after EA but before the constraint checks in `adjust_scalar_replaceable_state`
>>>>> [2]. If we agree that the overall Scalar Replacement implementation goes through
>>>>> the following major phases:
>>>>>
>>>>>      1. Identify the Escape Status of objects.
>>>>>      2. Adjust object Escape and/or Scalar Replacement status based on a set of constraints.
>>>>>      3. Make call to Split_unique_types [4].
>>>>>      4 Iterate over object and array allocations.
>>>>>          4.1 Check if allocation can be eliminated.
>>>>>          4.2 Perform scalar replacement. Replace uses of object in Safepoints.
>>>>>          4.3 Process users of CheckCastPP other than Safepoint: AddP, ArrayCopy and CastP2X.
>>>>>
>>>>> The transformation that I am proposing would change the overall flow to look
>>>>> like this:
>>>>>
>>>>>      1. Identify the Escape Status of objects.
>>>>>      2. ----> New: "Split phi functions" <----
>>>>>      2. Adjust object Escape and/or Scalar Replacement status based on a set of constraints.
>>>>>      3. Make call to Split_unique_types [14].
>>>>>      4 Iterate over object and array allocations.
>>>>>          4.1 ----> Moved to split_phi: "Check if allocation can be eliminated" <----
>>>>>          4.2 Perform scalar replacement. Replace uses of object in Safepoints.
>>>>>          4.3 Process users of CheckCastPP other than Safepoint: AddP, ArrayCopy and CastP2X.
>>>>>
>>>>> Please let me know what you think and thank you for taking the time to review
>>>>> this!
>>>>>
>>>>>
>>>>> Regards,
>>>>> Cesar
>>>>>
>>>>> Notes:
>>>>>
>>>>>      [1] I am not sure yet how this approach will play with the case of a merge
>>>>>          with NULL.
>>>>>      [2] 
>>>>> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fopenjdk%2Fjdk%2Fblob%2F2f71a6b39ed6bb869b4eb3e81bc1d87f4b3328ff%2Fsrc%2Fhotspot%2Fshare%2Fopto%2Fescape.cpp%23L1809&data=04%7C01%7CDivino.Cesar%40microsoft.com%7Cc2580fe523c640d628d708d9ed8f81f3%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637802022912055025%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=eNNCJjkdjWPcmIxl0VVqkpjdRcaCCSDB5Focx0R9EuY%3D&reserved=0 
> <https://urldefense.com/v3/__https://nam06.safelinks.protection.outlook.com/?url=https*3A*2F*2Fgithub.com*2Fopenjdk*2Fjdk*2Fblob*2F2f71a6b39ed6bb869b4eb3e81bc1d87f4b3328ff*2Fsrc*2Fhotspot*2Fshare*2Fopto*2Fescape.cpp*23L1809&data=04*7C01*7CDivino.Cesar*40microsoft.com*7Cc2580fe523c640d628d708d9ed8f81f3*7C72f988bf86f141af91ab2d7cd011db47*7C1*7C0*7C637802022912055025*7CUnknown*7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0*3D*7C3000&sdata=eNNCJjkdjWPcmIxl0VVqkpjdRcaCCSDB5Focx0R9EuY*3D&reserved=0__;JSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSU!!ACWV5N9M2RV99hQ!e1Tzr2hFook2sJaiA9-MXwtcKvodRKFQqkwhQ8dvjTwf0ynm2nYg54zUTBwUJuxvji6W0Q$>
>>>>>
>>>>>      [3] Another option would be to "patch" the current implementation to be able
>>>>>          to handle the merges. I am not certain that the "patch" approach would be
>>>>>          better, however, the "pre-process" approach is certainly much easier to test
>>>>>          and more readable.
>>>>>
>>>>>      [4] I cannot say I understand 100% the effects of executing
>>>>>          split_unique_types(). Would the transformation that I am proposing need to
>>>>>          be after the call to split_unique_types?