SIGSEGV in C2 compiler

Tom Rodriguez tom.rodriguez at oracle.com
Tue Feb 8 11:22:06 PST 2011


On Feb 8, 2011, at 10:14 AM, Denis Lila wrote:

> Hi Tom.
> 
>> So, as long as it's possible for the data to die, but that data's
>> control to stay alive, I think this will be a problem, regardless
>> of ConvI2L's behaviour.
> 
> I should have probably asked this before I sent that huge e-mail:
> 
> Is it a legal state for a data input of a phi to be dead but that data's
> control to be alive?

No it's not.  Data is only allowed to go dead as a result of control going dead.  That's why this is a bug in how ConvI2L is used and not a bug with handling of Phis.  It would be nicer if we detected this during GVN itself but it's not entirely simple to do so.  Maybe processing of Phis could be delayed until all other control nodes are processed and then Phis with top inputs would be flagged as an error.  That would require significant reworking of the worklist processing I think.

tom

> 
> To me it doesn't seem so. In that case, can this happen with any node
> other than a range checked ConvI2L whose input is always out of range?
> 
> Thank you,
> Denis.
> 
> ----- Original Message -----
>>> Ping me directly if you'd like help restructuring the loop to avoid
>>> the
>>> problem.
>> 
>> Thank you, but I already know how to do that. The reason I worked on
>> this
>> is because I am interested in compilers, and I would like to work on
>> hotspot. This seemed like a good place to start.
>> 
>>> I'm sorry I didn't follow up on this with you, I lost track of it
>>> over
>>> the Christmas holiday. This appears to be the same issue as 6675699
>>> where a ConvI2L with a constrained type on it has it's input
>>> replaced
>>> with a constant outside that range. The identity transformation
>>> converts that to top which causes a phi collapse and the control
>>> flow
>>> to be killed. We've kind of band aided this in a few other places
>>> but
>>> it appears that loop unswitching can create similar issues.
>> 
>> I've spent a lot of time thinking about this and staring at graphs,
>> and
>> here's what I think: (now, before this, I had never even read anything
>> about
>> compilers, let alone worked on one, so please, if anything doesn't
>> make
>> sense, don't go easy on me ;-) )
>> 
>> (Before I begin, I refer to a lot of node indexes below. Here's the
>> graphs
>> they were taken from:
>> http://icedtea.classpath.org/~dlila/ecjGraph.xml. I used
>> IdealGraphVisualizer from src/share/tools)
>> 
>> That bug's evaluation says "We need to remove the band aid fixes for
>> 4781451 and 4799512 and disallow ConvI2L from being initialized with
>> a narrower type than bottom". I'm not sure I understand why. I mean,
>> in this case it made sense to me that node 147 should be initialized
>> with type 1..maxint because, like the comment where this is done says,
>> we can simply assume that our input is a valid array index because if
>> it's not a runtime check will take care of that case and the index
>> won't
>> be used.
>> 
>> What I think is going wrong is that when we do subsume_node(k, i) (in
>> phaseX.cpp:1108 - in this case that would be subsume_node(357, 296))
>> we
>> are losing information about one of the uses of 296. Before the
>> subsume
>> the uses of 296 are 357 (the phi) and 295. After the subsume they are
>> 259 and 295. When compute_lca_of_uses(296, 314) is called, Node *c in
>> the
>> loop becomes 259 in the 2nd iteration, and when we call get_ctrl(259)
>> we
>> get 226. 226 is dominated by 350, and the idom of 350 is 286 (the if
>> that
>> is executed before the loops - the one generated by the loop
>> splitting)
>> <i>because the controls from both split loops to 350 are still
>> alive</i>.
>> 286 dominates the early control of 296 (which is 314), so it thinks
>> this
>> is a bad graph.
>> 
>> If 357 had not been removed, when c.idx==357 in
>> compute_lca_of_uses(296...),
>> because 357 is a phi node, we know to "skip" over the region and set
>> the
>> "use" variable to the control input to the region that corresponds to
>> the
>> data input in question to the phi. In this case 296 is the data input,
>> the phi
>> is 357, and the corresponding control input to the region is 349,
>> which is
>> the control from the split loop where 296 is computed. So "use" would
>> be set
>> to node 349, and nothing would go wrong because 314 dominates both 349
>> and 303.
>> 
>> So, what I meant when I said "we are losing information about one of
>> the uses
>> of 296" is that we are losing the information that, if the value
>> produced by
>> node 296 is used by node 259, the control path taken _must_ have gone
>> through
>> 349, and it _cannot_ have gone through 217 (which is control path from
>> the i!=8
>> unswitched loop (the one that never runs)).
>> 
>> It seems to me like this problem isn't specific to ConvI2L, but can
>> happen
>> any time that a data input to a phi node dies, but the control input
>> to that
>> phi's region that corresponds to the dead data doesn't die. I'm not
>> sure if
>> this state of affairs can be caused by anything other than a ConvI2L,
>> but even
>> so, I think this should get its own fix, because in
>> PhiNode::unique_input, when
>> we do
>> if (un == NULL || un == this || phase->type(un) == Type::TOP) {
>> continue; // ignore if top, or in(i) and "this" are in a data cycle
>> }
>> we're not stating an invariant such as "a phi's data input dies if and
>> only if
>> that data's corresponding control dies". So, as long as it's possible
>> for the
>> data to die, but that data's control to stay alive, I think this will
>> be a
>> problem, regardless of ConvI2L's behaviour.
>> 
>> I have two potential fixes in mind:
>> 1. Don't subsume a phi node with its unique input if it has dead
>> inputs but
>> the dead inputs' controls are not dead.
>> 2. For every node N that is used by a phi node P that is killed, save
>> a pointer
>> to P->in(0) and a number j such that P->in(j) == N. That will allow us
>> to correctly
>> compute "use" in compute_lca_of_uses even if c isn't a phi.
>> 
>> Like I said above, I'm not sure if any of this makes any sense at all
>> so I
>> would very much appreciate any criticism.
>> 
>> Thank you,
>> Denis.
>> 
>> ----- Original Message -----
>> 
>>> 
>>> Pop 147 ConvI2L === _ 20 [[ 153 ]] Type:long:1..maxint:www !jvms:
>>> Test::main @ bci:32
>>> &lt; long:1..maxint:www &lt; 147 ConvI2L === _ _ [[]] [3000147]
>> ...



More information about the hotspot-compiler-dev mailing list