RFC : Approach to handle Allocation Merges in C2 Scalar Replacement

Vladimir Ivanov vladimir.x.ivanov at oracle.com
Thu Feb 10 19:29:45 UTC 2022


(BCCing hotspot-dev and moving the discussion to hotspot-compiler-dev.)

Hi Cesar,

Thanks for looking into enhancing EA.

Overall, the proposal looks reasonable.

I suggest looking more closely at split_unique_types().
It introduces a dedicated class of alias categories for the fields of 
the allocation being eliminated and clones the memory graph. I don't 
see why it shouldn't work for multiple allocations.

That said, split_unique_types() will break if you start optimizing 
multiple allocations at once: the notion of a unique alias should be 
adjusted to cover the union of unique aliases for all interacting 
allocations.

Seems like you need to enhance SR to work on non-intersecting clusters 
of allocations.
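
For illustration, here is a hypothetical shape of such non-intersecting 
clusters (the `Point` class with `calc()` returning `x - y` is invented 
here). No object flows between the two merges, so each cluster could in 
principle be handled independently:

```java
// Hypothetical example of two non-intersecting allocation clusters.
// Point and calc() = x - y are assumptions made for illustration.
class Point {
    final int x, y;
    Point(int x, int y) { this.x = x; this.y = y; }
    int calc() { return x - y; }
}

class Clusters {
    // Cluster {a0, a1} and cluster {b0, b1} never interact: no object
    // flows from one merge into the other, so scalar replacement could
    // process each cluster on its own.
    static int ex2(boolean c1, boolean c2) {
        Point a = c1 ? new Point(4, 1) : new Point(1, 4);
        Point b = c2 ? new Point(10, 2) : new Point(2, 10);
        return a.calc() + b.calc();
    }
}
```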

One thing to take care of: scalar replacement relies on 
TypeOopPtr::instance_id().

   // If not InstanceTop or InstanceBot, indicates that this is
   // a particular instance of this type which is distinct.
   // This is the node index of the allocation node creating this instance.
   int           _instance_id;

It'll break when multiple allocations are in play.

Best regards,
Vladimir Ivanov

On 09.02.2022 04:45, Cesar Soares Lucas wrote:
> Hi there again!
> 
> Can you please give me feedback on the following approach to at least partially
> address [1], the scalar replacement allocation merge issue?
> 
> The problem that I am trying to solve arises when allocations are merged after a
> control flow split. The code below shows _one example_ of such a situation.
> 
> public int ex1(boolean cond, int x, int y) {
>      Point p = new Point(x, y);
>      if (cond)
>          p = new Point(y, x);
>      // Allocations for p are merged here.
>      return p.calc();
> }
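> 
> For reference, a self-contained runnable version of this example (the
> `Point` class and `calc()` returning `x - y` are my assumptions, chosen
> to match the IR shown below):

```java
// Self-contained version of ex1; Point and calc() = x - y are assumed
// to match the IR in the post (return p.x - p.y).
class Ex1 {
    static class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
        int calc() { return x - y; }
    }

    static int ex1(boolean cond, int x, int y) {
        Point p = new Point(x, y);
        if (cond)
            p = new Point(y, x);
        // Allocations for p are merged here.
        return p.calc();
    }
}
```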
> 
> Assuming the method calls on "p" are inlined, the allocations will not
> escape the method. The C2 IR for this method will look like this:
> 
> public int ex1(boolean cond, int first, int second) {
>      p0 = Allocate(...);
>      ...
>      p0.x = first;
>      p0.y = second;
> 
>      if (cond) {
>          p1 = Allocate(...);
>          ...
>          p1.x = second;
>          p1.y = first;
>      }
> 
>      p = phi(p0, p1)
> 
>      return p.x - p.y;
> }
> 
> However, one of the constraints implemented here [2], specifically the third
> one, will prevent the objects from being scalar replaced.
> 
> The approach that I'm considering for solving the problem is to replace the Phi
> node `p = phi(p0, p1)` with new Phi nodes for each of the fields of the objects
> in the original Phi. The IR for `ex1` would look something like this after the
> transformation:
> 
> public int ex1(boolean cond, int first, int second) {
>      p0 = Allocate(...);
>      ...
>      p0.x = first;
>      p0.y = second;
> 
>      if (cond) {
>          p1 = Allocate(...);
>          ...
>          p1.x = second;
>          p1.y = first;
>      }
> 
>      pX = phi(first, second)
>      pY = phi(second, first)
> 
>      return pX - pY;
> }
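> 
> To sanity-check the intended semantics, here is a standalone Java sketch
> (class and method names are mine, not from C2) comparing the allocating
> version with a hand-"scalar-replaced" one, where each per-field phi is
> emulated by a ternary selecting by the same condition as the object phi:

```java
// Sketch only: withAlloc models the original IR; scalarReplaced models
// the proposed transformation with one phi (here: ternary) per field.
class SplitDemo {
    static class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    static int withAlloc(boolean cond, int first, int second) {
        Point p = new Point(first, second);
        if (cond) p = new Point(second, first);
        return p.x - p.y;                 // p = phi(p0, p1)
    }

    static int scalarReplaced(boolean cond, int first, int second) {
        int pX = cond ? second : first;   // pX = phi(first, second)
        int pY = cond ? first : second;   // pY = phi(second, first)
        return pX - pY;                   // no allocation remains
    }
}
```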
> 
> I understand that this transformation might not be applicable in all cases and
> that it's not as simple as illustrated above. Also, it seems to me that much of
> what I'd have to implement is already implemented in other steps of the Scalar
> Replacement pipeline (which is a good thing). To work around these
> implementation details I plan to use as much of the existing code as possible.
> The algorithm for the transformation would be like this:
> 
> split_phis(phi)
>      # If the output of phi escapes, or something uses its identity, etc.,
>      # then we can't remove it. The conditions here might possibly be the
>      # same as those implemented in `PhaseMacroExpand::can_eliminate_allocation`
>      if cant_remove_phi_output(phi)
>          return ;
> 
>      # Collect a set of tuples (F, U) containing nodes U that use field F
>      # of the object resulting from `phi`.
>      fields_used = collect_fields_used_after_phi(phi)
> 
>      foreach field in fields_used
>          producers = {}
> 
>          # Create a list with the last Store to "field" in the scope of
>          # each of the Phi's input objects.
>          foreach o in phi.inputs
>              # The function called below might reuse a lot of the code/logic in `PhaseMacroExpand::scalar_replacement`
>              producers += last_store_to_o_field(o, field)
>          
>          # Create a new phi node whose inputs are the Stores to 'field'
>          field_phi = create_new_phi(producers)
> 
>          update_consumers(field, field_phi)
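> 
> To make the data flow of this pseudocode concrete, here is a toy Java
> simulation (all names invented; each phi input is modeled simply as a
> map from field name to the value last stored to it):

```java
import java.util.*;

// Toy simulation of the proposed split_phis step, data flow only.
// Each phi input (allocation) is a map: field name -> last stored value.
class SplitPhisSketch {
    // For every used field, gather the per-field phi's inputs from the
    // last store to that field in each of the object phi's inputs
    // (corresponds to last_store_to_o_field / create_new_phi above).
    static Map<String, List<Integer>> splitPhis(
            List<Map<String, Integer>> phiInputs, Set<String> fieldsUsed) {
        Map<String, List<Integer>> fieldPhis = new HashMap<>();
        for (String field : fieldsUsed) {
            List<Integer> producers = new ArrayList<>();
            for (Map<String, Integer> alloc : phiInputs)
                producers.add(alloc.get(field));   // last store to field
            fieldPhis.put(field, producers);       // inputs of new phi
        }
        return fieldPhis;
    }
}
```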
> 
> The implementation I envisioned would be a "pre-process" [3] step just
> after EA but before the constraint checks in `adjust_scalar_replaceable_state`
> [2]. Assume the overall Scalar Replacement implementation goes through
> the following major phases:
> 
>      1. Identify the Escape Status of objects.
>      2. Adjust object Escape and/or Scalar Replacement status based on a set of constraints.
>      3. Call split_unique_types() [4].
>      4. Iterate over object and array allocations.
>          4.1 Check if allocation can be eliminated.
>          4.2 Perform scalar replacement. Replace uses of object in Safepoints.
>          4.3 Process users of CheckCastPP other than Safepoint: AddP, ArrayCopy and CastP2X.
> 
> The transformation that I am proposing would change the overall flow to look
> like this:
> 
>      1. Identify the Escape Status of objects.
>      2. ----> New: "Split phi functions" <----
>      3. Adjust object Escape and/or Scalar Replacement status based on a set of constraints.
>      4. Call split_unique_types() [4].
>      5. Iterate over object and array allocations.
>          5.1 ----> Moved to split_phis: "Check if allocation can be eliminated" <----
>          5.2 Perform scalar replacement. Replace uses of object in Safepoints.
>          5.3 Process users of CheckCastPP other than Safepoint: AddP, ArrayCopy and CastP2X.
> 
> Please let me know what you think and thank you for taking the time to review
> this!
> 
> 
> Regards,
> Cesar
> 
> Notes:
> 
>      [1] I am not sure yet how this approach will play with the case of a merge
>          with NULL.
>   
>      [2] https://github.com/openjdk/jdk/blob/2f71a6b39ed6bb869b4eb3e81bc1d87f4b3328ff/src/hotspot/share/opto/escape.cpp#L1809
> 
>      [3] Another option would be to "patch" the current implementation to be able
>          to handle the merges. I am not certain the "patch" approach would be
>          better; however, the "pre-process" approach is certainly much easier to
>          test and more readable.
> 
>      [4] I cannot say I understand 100% the effects of executing
>          split_unique_types(). Would the transformation that I am proposing need to
>          be after the call to split_unique_types?
