RFC : Approach to handle Allocation Merges in C2 Scalar Replacement

Wed Apr 20 23:35:11 UTC 2022

hi, Cesar,

I'd like to update what I learned recently. please help me do fact-check
and correct me.

I found that 'ConnectionGraph::split_unique_types' is the key for Scalar
Replacement in HotSpot. It enforces a property 'unique typing' for a
java object. If a java object is 'unique typing', its all uses are
exclusively refer to that java object.

Once a java object satisfies the property, c2 creates a new
instance-specific alias. split_unique_types() and LoadNode, PhiNode
ideal() functions split out all relevant nodes with the refined
adr_type, or the new alias.

A phi(o1, o2) original has adr_type alias<X>. X is a general MyPair*.
After some transformations, we can have distinct alias<Y> and
alias<Y+1>. They are both instance-specific aliases. In our previous
discussion, the splitting I called Ivanov/Divino Approach :) not only
gets rid of phi(o1, o2) but also makes o1 and o2 unique typing. In terms
of alias analysis, memssa involves from <X> to <Y> & <Y+1>.

Here is my new thinking on this. If you agree with my understanding of
'unique typing' property so far. Let's take a step back and review it. I
feel it's too strong and it's really complex to enforce it. All we need
is more precise alias analysis.

In your example, phi(o1, o2) jeopardizes the unique typing property for
both o1 and o2. That's why EA marks them NSR and give up.  However, the
phi node actually does NOT block scalar replacement. We still can handle
'LOADI o3.x' in 'PhaseMacroExpand::process_users_of_allocation()' and
even deoptimization as long as we have better alias info.

Let's assume we have a phi(o1, o2), and we know that o1 has alias<Y> and
o2 has alias<Y+1>. splitting phi beforehand is not required. We can
leverage the fact that Java is a strong-typed OO language, so memory
locations of o1 and o2 don't overlap. We know which java object we are
dealing with in scalar replacement. It is alloc->_idx! we can take a
path at phi(o1, o2).

Here is the simplest case.  we can allow SR to deal with NSR objects
which have been dead before reaching any safepoint. In your example
escapeAnalysisFails(), 'o' is not live at the exiting safepoint.
Therefore, it's not c2's responsibility to save its state.
process_users_of_allocation(alloc) can simplify
from
(LoadI, memphi(o1, o2), AddP(phi(o1, o2), #offset))
to
(LoadI, o1, AddP(o1, #offset)) for o1. It will do the similar change for
the 2nd java object.

In a nutshell, it's the same splitting approach we're talking about. It
doesn't require explicit selector_id and we only do that on demand.

thanks,
--lx

On 4/4/22 9:34 PM, Cesar Soares Lucas wrote:
> *CAUTION*: This email originated from outside of the organization. Do
> not click links or open attachments unless you can confirm the sender
> and know the content is safe.
> 
> 
> Hi, Xin Liu.
> 
>  
> 
> Thank you for asking these questions and sharing your ideas!
> 
>  
> 
> You understand is correct. I’m trying to make objects that currently are
> NonEscape but NSR become Scalar Replaceable. These objects are marked as
> NSR because they are connected in a Phi node.
> 
>  
> 
> You understood Vladimir selector idea correctly (AFAIU). The problem
> with that idea is that we can’t directly access the objects after the
> Region node merging the control branches that define such objects.
> However, after playing for a while with this Selector idea I found out
> that it seems we don’t really need it: if we generalize
> split_through_phi enough we can handle many cases that cause objects to
> be marked as NSR.
> 
>  
> 
> I’ve observed the CastPP nodes. I did some experiments to identify the
> most frequent node types that come after Phi nodes merging object
> allocation. ***Roughly the numbers are***: ~70% CallStaticJava, 6%
> Allocate, 3% CmpP, 3% CastPP, etc.
> 
>  
> 
> The split_through_phi idea works great (AFAIU) if we are floating up
> nodes that don’t have control inputs, unfortunately often nodes do and
> that’s a bummer. However, as I mentioned above, looks like that in most
> of the cases the nodes that consume the merge Phi _/and/_ have control
> input, are CallStaticJava “Uncommon Trap” and I’ve an idea to “split
> through phi” these nodes.
> 
>  
> 
> Thanks again for the question and sorry for the long text.
> 
> Cesar
> 
>  
> 
>  
> 
> On 4/4/22, 4:10 PM, "Liu, Xin" <xxinliu at amazon.com> wrote:
> 
> hi, Cesar
> 
>  
> 
> I am trying to catch up your conversation. Allow me to repeat the
> 
> problem. You are improving NonEscape but NSR objects, tripped by
> 
> merging. The typical form is like the example from "Control Flow
> 
> Merges".
> 
> https://cr.openjdk.java.net/~cslucas/escape-analysis/EscapeAnalysis.html
> <https://cr.openjdk.java.net/~cslucas/escape-analysis/EscapeAnalysis.html>
> 
>  
> 
> Those two JavaObjects in your example 'escapeAnalysisFails' are NSR
> 
> because they intertwine and will hinder split_unique_types. In Ivanov's
> 
> approach, we insert an explicit selector to split JOs at uses. Because
> 
> uses are separate, we then can proceed with split_unique_types() for
> 
> them individually. (please correct me if I misunderstand here)
> 
>  
> 
> here is the original control flow.
> 
>  
> 
> B0--
> 
> o1 = new MyPair(0,0)
> 
> cond
> 
> ----
> 
> | \
> 
> |  B1--------------------
> 
> |  | o2 = new MyPair(x, y)
> 
> |  -----------------------
> 
> |   /
> 
> B2-------------
> 
> o3 = phi(o1, o2)
> 
> x = o3.x;
> 
> ---------------
> 
>  
> 
>  
> 
> here is after?
> 
>  
> 
> B0--
> 
> o1 = new MyPair(0,0)
> 
> cond
> 
> ----
> 
> | \
> 
> |  B1--------------------
> 
> |  | o2 = new MyPair(x, y)
> 
> |  -----------------------
> 
> |   /
> 
> B2-------------
> 
> selector = phi(o1, o2)
> 
> cmp(select, 0)
> 
> ---------------
> 
> |              \
> 
> --------       --------
> 
> x1 = o1.x|       x2 = o2.x
> 
> ---------      -------
> 
> |             /
> 
> ---------------
> 
> x3 = phi(x1, x2)
> 
> ---------------
> 
>  
> 
> Besides the fixed form Load/Store(PHI(base1, base2), ADDP), I'd like to
> 
> report that C2 sometimes insert CastPP in between. Object
> 
> 'Integer(65536)' in the following example is also non-escape but NSR.
> 
> there's a CastPP to make sure the object is not NULL. The more general
> 
> case is that the object is returned from a inlined function called.
> 
>  
> 
> public class MergingAlloc {
> 
> ...
> 
>     public static Integer zero = Integer.valueOf(0);
> 
>  
> 
>     public static int testBoxingObject(boolean cond) {
> 
>         Integer i = zero;
> 
>  
> 
>         if (cond) {
> 
>             i = new Integer(65536);
> 
>         }
> 
>  
> 
>         return i; // i.intValue();
> 
>     }
> 
>  
> 
>     public static void main(String[] args) {
> 
>         MyPair p = new MyPair(0, 0);
> 
>         escapeAnalysisFails(true, 1, 0);
> 
>         testBoxingObject(true);
> 
>     }
> 
> }
> 
>  
> 
>  
> 
> I though that LoadNode::split_through_phi() should split the LoadI of
> 
> i.intValue() in the Iterative GVN before Escape Analysis but current
> 
> it's not.  I wonder if it's possible to make
> 
> LoadNode::split_through_phi() or PhaseIdealLoop::split_thru_phi() more
> 
> general. if so, it will fit better in C2 design. i.e. we evolve code in
> 
> local scope. In this case, splitting through a phi node of multiple
> 
> objects is beneficial when the result disambiguate memories.
> 
>  
> 
> In your example, ideally split_through_phi() should be able to produce
> 
> simpler code. currently, split_through_phi only works for load node and
> 
> there are a few constraints.
> 
>  
> 
> B0-------------
> 
> o1 = new MyPair(0,0)
> 
> x1 = o1.x
> 
> cond
> 
> ----------------
> 
> |   \
> 
> |  B1--------------------
> 
> |  | o2 = new MyPair(x, y)
> 
> |  | x2 = o2.x;
> 
> |  -----------------------
> 
> |   /
> 
> -------------
> 
> x3 = phi(x1, x2)
> 
> ---------------
> 
>  
> 
>  
> 
> thanks,
> 
> --lx
>