RFR(L): 8005071: Incremental inlining for JSR 292

Tue Dec 18 12:20:43 PST 2012

>> A() dominates B(). During parsing, we build a complete graph with CallNodes for A() and B(). Let's say we decide to inline A() after parsing so we build a subgraph with A()'s inputs as inputs to parsing and then we replace B()'s outputs with the resulting new subgraph.
> 
> I assume you mean "replace A()'s outputs".

I mean: we replace B()'s outputs with the outputs of the new subgraph.

> Top inputs should be processed correctly. If we crash, it indicates that we are missing checks for top inputs in some ideal nodes methods (Ideal(), Value()).
> 
> Also kill_dead_call_outputs() works only if final_ctl->is_top(), which contradicts your case.
> 
>> 
>> That doesn't happen with parse-time inlining because A()'s subgraph is built inside the main graph: all results from A()'s inlining are propagated forward to B() by the gvn, the input state for B() is consistent, and we don't try to inline B() because this is a dead path.
> 
> We don't inline B() if control is top (stopped()). So it is the same situation as with kill_dead_call_outputs().

Let's say we have this:

r = A()

for () {
   B(r);
}

We inline A() and its outputs are dead. If we then move on to B(), we'll see a non-dead Region as control input and a non-dead Phi as memory input, so we start inlining B(), but r is top and that confuses the parser. That's what I meant by inconsistent inputs. I found that this would happen in many different ways when there's a lot of incremental inlining going on, so it seemed to me that it makes more sense to clean the graph rather than make the parser robust to strange input states.
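
To make the loop case concrete, here is a toy model in Java. It is only a sketch and not C2 code: the Ctrl class, the graph shape and both functions are made up for illustration. It assumes A()'s normal control output ("A.out") has just become dead, and compares a purely local rule (a node dies only once all of its control predecessors are known dead) with a forward reachability walk from the start node. The reachability walk is a stand-in for "some cleanup that looks beyond a single node"; it is not the cleanup from the webrev.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class DeadLoopSketch {
    /** Hypothetical toy control node; it does not correspond to any C2 class. */
    static final class Ctrl {
        final String name;
        final List<Ctrl> preds = new ArrayList<>();
        final List<Ctrl> succs = new ArrayList<>();
        Ctrl(String name) { this.name = name; }
    }

    static void edge(Ctrl from, Ctrl to) {
        from.succs.add(to);
        to.preds.add(from);
    }

    /** Builds: start -> A.out -> loopHead -> B -> loopHead (back edge), loopHead -> exit. */
    static List<Ctrl> buildGraph() {
        Ctrl start = new Ctrl("start");
        Ctrl aOut = new Ctrl("A.out");
        Ctrl loopHead = new Ctrl("loopHead");
        Ctrl b = new Ctrl("B");
        Ctrl exit = new Ctrl("exit");
        edge(start, aOut);
        edge(aOut, loopHead);
        edge(loopHead, b);
        edge(b, loopHead);
        edge(loopHead, exit);
        return Arrays.asList(start, aOut, loopHead, b, exit);
    }

    /** Local rule: a node is dead only when every control predecessor is already dead. */
    static Set<String> localRule(List<Ctrl> nodes, Ctrl deadSeed) {
        Set<Ctrl> dead = new HashSet<>();
        dead.add(deadSeed);
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Ctrl n : nodes) {
                if (!dead.contains(n) && !n.preds.isEmpty() && dead.containsAll(n.preds)) {
                    dead.add(n);
                    changed = true;
                }
            }
        }
        return names(dead);
    }

    /** Reachability: walk forward from start without passing through deadSeed; the rest is dead. */
    static Set<String> reachability(List<Ctrl> nodes, Ctrl start, Ctrl deadSeed) {
        Set<Ctrl> live = new HashSet<>();
        Deque<Ctrl> work = new ArrayDeque<>();
        if (start != deadSeed) {
            live.add(start);
            work.push(start);
        }
        while (!work.isEmpty()) {
            Ctrl n = work.pop();
            for (Ctrl s : n.succs) {
                if (s != deadSeed && live.add(s)) {
                    work.push(s);
                }
            }
        }
        Set<Ctrl> dead = new HashSet<>(nodes);
        dead.removeAll(live);
        return names(dead);
    }

    /** Sorted node names, so the printed result is deterministic. */
    static Set<String> names(Set<Ctrl> nodes) {
        Set<String> out = new TreeSet<>();
        for (Ctrl n : nodes) {
            out.add(n.name);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Ctrl> g = buildGraph();
        System.out.println("local rule:   " + localRule(g, g.get(1)));               // [A.out]
        System.out.println("reachability: " + reachability(g, g.get(0), g.get(1))); // [A.out, B, exit, loopHead]
    }
}
```

In this toy, the local rule leaves loopHead and B looking alive because the back edge from B keeps loopHead's predecessor set from ever being all dead. That is the same shape as the "non-dead Region" above.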

>> I don't think PhaseRemoveUseless helps in this case. What we would need in the case of post parse inlining would be to inline A(), apply an igvn to propagate the results from the inlining forward to B(), then inline B(). But even an igvn is not sufficient if for instance, B() is inside a loop.
> 
> Maybe IGVN is not enough (it does not have dominator information and works on small subgraphs) but PhaseIdealLoop can do that (it was the reason I called it before EA). Which brings up the question: why do you call PhaseIdealLoop only when live nodes reach the limit?

Because I found it to be expensive when we do a lot of inlining incrementally: we have to rebuild the CFG structures again and again. So I tried to do it only once there's no more decrease in the graph size with IGVN.

>> That's why I use GraphKit::kill_dead_call_outputs(). I don't understand why it's dangerous since it operates in well defined conditions. In the example above, we have a complete graph with a DirectCall to A(). We are done with parsing for A() and the subgraph for A() uses A()'s inputs but it is not yet connected to the rest of the graph by its outputs. We want to find calls that A()'s control outputs dominate and we work on a complete graph, the subgraph for A() doesn't really matter.
> 
> I don't see the big need for new "dead code cleanup" code when we have already several implementations. We may need to adjust them to work for your case (PhaseIdealLoop work without inner loops).

I found PhaseIdealLoop to be very expensive if we need to run it a lot. That's why I developed the custom cleanup code: it is triggered only along the dead outputs and only works on the part of the graph that is being changed, so it's less expensive. According to profiling I've done of the compiler threads when a lot of incremental inlining is happening, it's indeed no longer a bottleneck.

Roland.