RFC : Approach to handle Allocation Merges in C2 Scalar Replacement

Cesar Soares Lucas Divino.Cesar at microsoft.com
Wed Feb 9 01:45:57 UTC 2022


Hi there again!

Can you please give me feedback on the following approach to at least partially
address [1], the scalar replacement allocation merge issue? 

The problem that I am trying to solve arises when allocations are merged after a
control flow split. The code below shows _one example_ of such a situation. 

public int ex1(boolean cond, int x, int y) {
    Point p = new Point(x, y);
    if (cond)
        p = new Point(y, x);
    // Allocations for p are merged here.
    return p.calc();
}
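
The Point class itself is not shown here; for completeness, one plausible
definition (an assumption on my part, chosen to match the IR further below
where calc() reduces to "p.x - p.y") would be:

class Point {
    int x, y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    int calc() {
        return x - y;
    }
}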

Assuming the method calls on "p" are inlined, the allocations will not
escape the method. The C2 IR for this method will look like this: 

public int ex1(boolean cond, int first, int second) {
    p0 = Allocate(...); 
    ...
    p0.x = first;
    p0.y = second;

    if (cond) {
        p1 = Allocate(...);
        ...
        p1.x = second;
        p1.y = first;
    }

    p = phi(p0, p1)

    return p.x - p.y;
}

However, one of the constraints implemented here [2], specifically the third
one, will prevent the objects from being scalar replaced.  

The approach that I'm considering for solving the problem is to replace the Phi
node `p = phi(p0, p1)` with new Phi nodes for each of the fields of the objects
in the original Phi. The IR for `ex1` would look something like this after the
transformation: 

public int ex1(boolean cond, int first, int second) {
    p0 = Allocate(...); 
    ...
    p0.x = first;
    p0.y = second;

    if (cond) {
        p1 = Allocate(...);
        ...
        p1.x = second;
        p1.y = first;
    }

    pX = phi(first, second)
    pY = phi(second, first)

    return pX - pY;
}
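
Conceptually, once the field Phis are in place and both allocations are
removed, ex1 would behave as if it had been written like this at the source
level (a sketch only; the names pX/pY just mirror the field Phis above and
the real transformation happens on the IR):

// ex1 after both allocations are scalar replaced.
public int ex1(boolean cond, int x, int y) {
    int pX, pY;
    if (cond) {
        pX = y; pY = x;
    } else {
        pX = x; pY = y;
    }
    // No allocation remains; only the field values reach the use.
    return pX - pY;
}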

I understand that this transformation might not be applicable in all cases and
that it's not as simple as illustrated above. Also, it seems to me that much of
what I'd have to implement already exists in other steps of the Scalar
Replacement pipeline (which is a good thing). To deal with these implementation
details I plan to reuse as much of the existing code as possible. The algorithm
for the transformation would look like this: 

split_phis(phi)
    # If the output of the phi escapes, or something uses its identity, etc.,
    # then we can't remove it. The conditions here might possibly be the
    # same as the ones implemented in `PhaseMacroExpand::can_eliminate_allocation`
    if cant_remove_phi_output(phi)
        return ;

    # Collect a set of tuples (F, U) containing nodes U that use field F,
    # a member of the object resulting from `phi`.
    fields_used = collect_fields_used_after_phi(phi)

    foreach field in fields_used 
        producers = {}

        # Create a list with the last Store to "field" in the scope of
        # each of the Phi's input objects.
        foreach o in phi.inputs
            # The function called below might reuse a lot of the code/logic in `PhaseMacroExpand::scalar_replacement`
            producers += last_store_to_o_field(o, field)
        
        # Create a new Phi node whose inputs are the Stores to 'field'
        field_phi = create_new_phi(producers)

        update_consumers(field, field_phi)
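
To illustrate the first check, `cant_remove_phi_output`: if anything observes
the identity of the merged object, the Phi output cannot simply be removed. A
made-up example of such a case:

public boolean ex1Identity(boolean cond, int x, int y, Object other) {
    Point p = new Point(x, y);
    if (cond)
        p = new Point(y, x);
    // The merged reference itself is used in an identity comparison,
    // so the Phi producing "p" cannot be removed.
    return p == other;
}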

The implementation that I envisioned would be a "pre-process" [3] step just
after EA but before the constraint checks in `adjust_scalar_replaceable_state`
[2]. Assuming we agree that the overall Scalar Replacement implementation goes
through the following major phases: 

    1. Identify the Escape Status of objects. 
    2. Adjust object Escape and/or Scalar Replacement status based on a set of constraints. 
    3. Make a call to split_unique_types [4]. 
    4. Iterate over object and array allocations. 
        4.1 Check if allocation can be eliminated.  
        4.2 Perform scalar replacement. Replace uses of object in Safepoints. 
        4.3 Process users of CheckCastPP other than Safepoint: AddP, ArrayCopy and CastP2X. 

The transformation that I am proposing would change the overall flow to look
like this: 

    1. Identify the Escape Status of objects. 
    2. ----> New: "Split phi functions" <---- 
    3. Adjust object Escape and/or Scalar Replacement status based on a set of constraints. 
    4. Make a call to split_unique_types [4]. 
    5. Iterate over object and array allocations. 
        5.1 ----> Moved to split_phis: "Check if allocation can be eliminated" <---- 
        5.2 Perform scalar replacement. Replace uses of object in Safepoints. 
        5.3 Process users of CheckCastPP other than Safepoint: AddP, ArrayCopy and CastP2X. 

Please let me know what you think and thank you for taking the time to review
this! 


Regards, 
Cesar 

Notes: 

    [1] I am not sure yet how this approach will play with the case of a merge
        with NULL. 
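
        For reference, such a merge could come from code like the following
        (a made-up example of such a merge, not taken from the text above):

        public int ex2(boolean cond, int x, int y) {
            Point p = null;
            if (cond)
                p = new Point(x, y);
            // The merge here is phi(NULL, p1).
            return (p != null) ? p.calc() : 0;
        }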
 
    [2] https://github.com/openjdk/jdk/blob/2f71a6b39ed6bb869b4eb3e81bc1d87f4b3328ff/src/hotspot/share/opto/escape.cpp#L1809 

    [3] Another option would be to "patch" the current implementation so that
        it can handle the merges. I am not certain that the "patch" approach
        would be better; however, the "pre-process" approach is certainly much
        easier to test and more readable. 

    [4] I cannot say I fully understand the effects of executing
        split_unique_types(). Would the transformation that I am proposing need
        to happen after the call to split_unique_types? 

