[External] : Re: RFC : Approach to handle Allocation Merges in C2 Scalar Replacement

Fri Feb 11 22:34:48 UTC 2022

> Actually Roland suggested before to always run split_unique_types(), 
> even if non-escaping object is not scalar replaceable.
I second Roland suggestion. Irrespective of SR enhancements being 
discussed, it's beneficial to introduce unique instance slices for 
non-escaping allocations which aren't scalar replaceable.

(I experimented a bit with that idea before [1] and got some promising 
results.)

Best regards,
Vladimir Ivanov

[1] https://github.com/iwanowww/jdk/tree/split_unique_types

>>
>>  > As you correctly pointed in your original post, EA and SR heavily 
>> depend on _instance_id which is Allocation node's _idx value.
>>  > And the solution for this is either remove reference to merge phi. 
>> Or introduce new "virtual allocation" which fields are
>>  > populated by loads - but it is again needs to split through phi.
>>
>> Vladimir K., are you suggesting something like this:
> 
> Yes, that is what I had in mind but it could be more complicated to 
> implement.
> The idea is to have new _instance_id after merge. It may not be useful 
> for this simple case but may help in others.
> We would still need to improve split_unique_types to find values to 
> store in its fields.
> 
>>
>>   public int ex1(boolean cond, int first, int second) {
>>        p0 = ...;
>>        p0.x = first;
>>        p0.y = second;
>>        if (cond) {
>>            p1 = ...
>>            p1.x = second;
>>            p1.y = first;
>>        }
>>        p = VirtualAllocate(...);
>>        p.x = phi(first, second);
>>        p.y = phi(second, first);
>>        return p.x - p.y;
>>   }
>>
>> Would the phi/merges for individual fields cause problems in 
>> split_unique_types/SR?
> 
> Split_unique_types/SR work on allocation's references and not on what is 
> stored in its fields.
> 
> Thanks,
> Vladimir
> 
>>
>>
>> Regards,
>> Cesar
>>
>> ------------------------------------------------------------------------------------------------------------------------ 
>>
>> *From:* Vladimir Kozlov <vladimir.kozlov at oracle.com>
>> *Sent:* February 11, 2022 10:51 AM
>> *To:* Vladimir Ivanov <vladimir.x.ivanov at oracle.com>; Cesar Soares 
>> Lucas <Divino.Cesar at microsoft.com>; hotspot compiler 
>> <hotspot-compiler-dev at openjdk.java.net>
>> *Cc:* Brian Stafford <Brian.Stafford at microsoft.com>; Martijn Verburg 
>> <Martijn.Verburg at microsoft.com>
>> *Subject:* Re: RFC : Approach to handle Allocation Merges in C2 Scalar 
>> Replacement
>> On 2/11/22 10:24 AM, Vladimir Ivanov wrote:
>>>
>>>> For this simple case we can teach C2's IGVN to split fields loads 
>>>> through Phi so that phi(p0, p1) is not used and
>>>> allocations as well. We can do that because we know that allocations 
>>>> and phi do not escape.
>>>
>>> Seems like split_unique_types() pass is a prerequisite to make such 
>>> transformations predictable on a larger scale. It
>>> simplifies the memory graph and untangles instance-related memory 
>>> operations from the rest of the graph.
>>
>> Can you elaborate on your suggestion?
>>
>> As you correctly pointed in your original post, EA and SR heavily 
>> depend on _instance_id which is Allocation node's _idx
>> value. And the solution for this is either remove reference to merge 
>> phi. Or introduce new "virtual allocation" which
>> fields are populated by loads - but it is again needs to split through 
>> phi.
>>
>>>
>>> Otherwise, I believe wide memory states/merges would pose serious 
>>> problems. E.g., when a load is split through a phi,
>>> you need to pick correct memory states for the new loads above the 
>>> phi which quickly becomes quite a challenging task.
>>
>> Especially in loops. Actually I did tried such approach before and I 
>> agree that it is hard.
>>
>> Thanks,
>> Vladimir K
>>
>>>
>>> Best regards,
>>> Vladimir Ivanov
>>>
>>>>
>>>> public int ex1(boolean cond, int first, int second) {
>>>>      p0 = Allocate(...);
>>>>      ...
>>>>      p0.x = first;
>>>>      p0.y = second;
>>>>
>>>>      if (cond) {
>>>>          p1 = Allocate(...);
>>>>          ...
>>>>          p1.x = second;
>>>>          p1.y = first;
>>>>      }
>>>>
>>>>      p = phi(p0, p1) // unused and will be removed
>>>>
>>>>
>>>>      return phi(p0.x,p1.x) - phi(p0.y, p1.y);
>>>> }
>>>>
>>>> Thanks,
>>>> Vladimir K
>>>>
>>>> On 2/10/22 11:29 AM, Vladimir Ivanov wrote:
>>>>> (BCCing hotspot-dev and moving the discussion to 
>>>>> hotspot-compiler-dev.)
>>>>>
>>>>> Hi Cesar,
>>>>>
>>>>> Thanks for looking into enhancing EA.
>>>>>
>>>>> Overall, the proposal looks reasonable.
>>>>>
>>>>> I suggest to look more closely at split_unique_types().
>>>>> It introduces a dedicated class of alias categories for fields of 
>>>>> the allocation being eliminated and clones memory
>>>>> graph. I don't see why it shouldn't work for multiple allocations.
>>>>>
>>>>> Moreover, split_unique_types() will break if you start optimizing 
>>>>> multiple allocations at once. The notion of unique
>>>>> alias should be adjusted and cover the union of unique aliases for 
>>>>> all interacting allocations.
>>>>>
>>>>> Seems like you need to enhance SR to work on non-intersecting 
>>>>> clusters of allocations.
>>>>>
>>>>> One thing to take care of: scalar replacement relies on 
>>>>> TypeOopPtr::instance_id().
>>>>>
>>>>>    // If not InstanceTop or InstanceBot, indicates that this is
>>>>>    // a particular instance of this type which is distinct.
>>>>>    // This is the node index of the allocation node creating this 
>>>>> instance.
>>>>>    int           _instance_id;
>>>>>
>>>>> It'll break when multiple allocations are in play.
>>>>>
>>>>> Best regards,
>>>>> Vladimir Ivanov
>>>>>
>>>>> On 09.02.2022 04:45, Cesar Soares Lucas wrote:
>>>>>> Hi there again!
>>>>>>
>>>>>> Can you please give me feedback on the following approach to at 
>>>>>> least partially
>>>>>> address [1], the scalar replacement allocation merge issue?
>>>>>>
>>>>>> The problem that I am trying to solve arises when allocations are 
>>>>>> merged after a
>>>>>> control flow split. The code below shows _one example_ of such a 
>>>>>> situation.
>>>>>>
>>>>>> public int ex1(boolean cond, int x, int y) {
>>>>>>      Point p = new Point(x, y);
>>>>>>      if (cond)
>>>>>>          p = new Point(y, x);
>>>>>>      // Allocations for p are merged here.
>>>>>>      return p.calc();
>>>>>> }
>>>>>>
>>>>>> Assuming the method calls on "p" are inlined then the allocations 
>>>>>> will not
>>>>>> escape the method. The C2 IR for this method will look like this:
>>>>>>
>>>>>> public int ex1(boolean cond, int first, int second) {
>>>>>>      p0 = Allocate(...);
>>>>>>      ...
>>>>>>      p0.x = first;
>>>>>>      p0.y = second;
>>>>>>
>>>>>>      if (cond) {
>>>>>>          p1 = Allocate(...);
>>>>>>          ...
>>>>>>          p1.x = second;
>>>>>>          p1.y = first;
>>>>>>      }
>>>>>>
>>>>>>      p = phi(p0, p1)
>>>>>>
>>>>>>      return p.x - p.y;
>>>>>> }
>>>>>>
>>>>>> However, one of the constraints implemented here [2], specifically 
>>>>>> the third
>>>>>> one, will prevent the objects from being scalar replaced.
>>>>>>
>>>>>> The approach that I'm considering for solving the problem is to 
>>>>>> replace the Phi
>>>>>> node `p = phi(p0, p1)` with new Phi nodes for each of the fields 
>>>>>> of the objects
>>>>>> in the original Phi. The IR for `ex1` would look something like 
>>>>>> this after the
>>>>>> transformation:
>>>>>>
>>>>>> public int ex1(boolean cond, int first, int second) {
>>>>>>      p0 = Allocate(...);
>>>>>>      ...
>>>>>>      p0.x = first;
>>>>>>      p0.y = second;
>>>>>>
>>>>>>      if (cond) {
>>>>>>          p1 = Allocate(...);
>>>>>>          ...
>>>>>>          p1.x = second;
>>>>>>          p1.y = first;
>>>>>>      }
>>>>>>
>>>>>>      pX = phi(first, second)
>>>>>>      pY = phi(second, first)
>>>>>>
>>>>>>      return pX - pY;
>>>>>> }
>>>>>>
>>>>>> I understand that this transformation might not be applicable for 
>>>>>> all cases and
>>>>>> that it's not as simple as illustrated above. Also, it seems to me 
>>>>>> that much of
>>>>>> what I'd have to implement is already implemented in other steps 
>>>>>> of the Scalar
>>>>>> Replacement pipeline (which is a good thing). To work around these
>>>>>> implementation details I plan to use as much of the existing code 
>>>>>> as possible.
>>>>>> The algorithm for the transformation would be like this:
>>>>>>
>>>>>> split_phis(phi)
>>>>>>      # If output of phi escapes, or something uses its identity, etc
>>>>>>      # then we can't remove it. The conditions here might possible 
>>>>>> be the
>>>>>>      # same as the ones implemented in 
>>>>>> `PhaseMacroExpand::can_eliminate_allocation`
>>>>>>      if cant_remove_phi_output(phi)
>>>>>>          return ;
>>>>>>
>>>>>>      # Collect a set of tuples(F,U) containing nodes U that uses 
>>>>>> field F
>>>>>>      # member of the object resulting from `phi`.
>>>>>>      fields_used = collect_fields_used_after_phi(phi)
>>>>>>
>>>>>>      foreach field in fields_used
>>>>>>          producers = {}
>>>>>>
>>>>>>          # Create a list with the last Store for each field 
>>>>>> "field" on the
>>>>>>          # scope of each of the Phi input objects.
>>>>>>          foreach o in phi.inputs
>>>>>>              # The function called below might re-use a lot of the 
>>>>>> code/logic in `PhaseMacroExpand::scalar_replacement`
>>>>>>              producers += last_store_to_o_field(0, field)
>>>>>>          # Create a new phi node whose inputs are the Store's to 
>>>>>> 'field'
>>>>>>          field_phi = create_new_phi(producers)
>>>>>>
>>>>>>          update_consumers(field, field_phi)
>>>>>>
>>>>>> The implementation that I envisioned would be as a "pre-process" 
>>>>>> [3] step just
>>>>>> after EA but before the constraint checks in 
>>>>>> `adjust_scalar_replaceable_state`
>>>>>> [2]. If we agree that the overall Scalar Replacement 
>>>>>> implementation goes through
>>>>>> the following major phases:
>>>>>>
>>>>>>      1. Identify the Escape Status of objects.
>>>>>>      2. Adjust object Escape and/or Scalar Replacement status 
>>>>>> based on a set of constraints.
>>>>>>      3. Make call to Split_unique_types [4].
>>>>>>      4 Iterate over object and array allocations.
>>>>>>          4.1 Check if allocation can be eliminated.
>>>>>>          4.2 Perform scalar replacement. Replace uses of object in 
>>>>>> Safepoints.
>>>>>>          4.3 Process users of CheckCastPP other than Safepoint: 
>>>>>> AddP, ArrayCopy and CastP2X.
>>>>>>
>>>>>> The transformation that I am proposing would change the overall 
>>>>>> flow to look
>>>>>> like this:
>>>>>>
>>>>>>      1. Identify the Escape Status of objects.
>>>>>>      2. ----> New: "Split phi functions" <----
>>>>>>      2. Adjust object Escape and/or Scalar Replacement status 
>>>>>> based on a set of constraints.
>>>>>>      3. Make call to Split_unique_types [14].
>>>>>>      4 Iterate over object and array allocations.
>>>>>>          4.1 ----> Moved to split_phi: "Check if allocation can be 
>>>>>> eliminated" <----
>>>>>>          4.2 Perform scalar replacement. Replace uses of object in 
>>>>>> Safepoints.
>>>>>>          4.3 Process users of CheckCastPP other than Safepoint: 
>>>>>> AddP, ArrayCopy and CastP2X.
>>>>>>
>>>>>> Please let me know what you think and thank you for taking the 
>>>>>> time to review
>>>>>> this!
>>>>>>
>>>>>>
>>>>>> Regards,
>>>>>> Cesar
>>>>>>
>>>>>> Notes:
>>>>>>
>>>>>>      [1] I am not sure yet how this approach will play with the 
>>>>>> case of a merge
>>>>>>          with NULL.
>>>>>>      [2] 
>>>>>> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fopenjdk%2Fjdk%2Fblob%2F2f71a6b39ed6bb869b4eb3e81bc1d87f4b3328ff%2Fsrc%2Fhotspot%2Fshare%2Fopto%2Fescape.cpp%23L1809&data=04%7C01%7CDivino.Cesar%40microsoft.com%7Cc2580fe523c640d628d708d9ed8f81f3%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637802022912055025%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=eNNCJjkdjWPcmIxl0VVqkpjdRcaCCSDB5Focx0R9EuY%3D&reserved=0 
>>
>> <https://urldefense.com/v3/__https://nam06.safelinks.protection.outlook.com/?url=https*3A*2F*2Fgithub.com*2Fopenjdk*2Fjdk*2Fblob*2F2f71a6b39ed6bb869b4eb3e81bc1d87f4b3328ff*2Fsrc*2Fhotspot*2Fshare*2Fopto*2Fescape.cpp*23L1809&data=04*7C01*7CDivino.Cesar*40microsoft.com*7Cc2580fe523c640d628d708d9ed8f81f3*7C72f988bf86f141af91ab2d7cd011db47*7C1*7C0*7C637802022912055025*7CUnknown*7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0*3D*7C3000&sdata=eNNCJjkdjWPcmIxl0VVqkpjdRcaCCSDB5Focx0R9EuY*3D&reserved=0__;JSUlJSUlJSUlJSUlJSUlJSUlJSUlJSUlJSU!!ACWV5N9M2RV99hQ!e1Tzr2hFook2sJaiA9-MXwtcKvodRKFQqkwhQ8dvjTwf0ynm2nYg54zUTBwUJuxvji6W0Q$> 
>>
>>>>>>
>>>>>>      [3] Another option would be to "patch" the current 
>>>>>> implementation to be able
>>>>>>          to handle the merges. I am not certain that the "patch" 
>>>>>> approach would be
>>>>>>          better, however, the "pre-process" approach is certainly 
>>>>>> much easier to test
>>>>>>          and more readable.
>>>>>>
>>>>>>      [4] I cannot say I understand 100% the effects of executing
>>>>>>          split_unique_types(). Would the transformation that I am 
>>>>>> proposing need to
>>>>>>          be after the call to split_unique_types?