RFR: 8350563: C2 compilation fails because PhaseCCP does not reach a fixpoint [v7]

Tue Apr 1 10:09:15 UTC 2025

On Mon, 31 Mar 2025 16:08:57 GMT, Liam Miller-Cushon <cushon at openjdk.org> wrote:

>> Hello, please consider this fix for [JDK-8350563](https://bugs.openjdk.org/browse/JDK-8350563) contributed by my colleague Matthias Ernst.
>> 
>> https://github.com/openjdk/jdk/pull/22856 introduced a new `Value()` optimization for the pattern `AndIL(Con, Mask)`.
>> This optimization can look through CastNodes, and therefore requires additional logic in CCP to push these
>> transitive uses to the worklist.
>> 
>> The optimization is closely related to analogous optimizations for SHIFT nodes, and we also extend the existing logic for
>> CCP worklist handling: the current logic is "if the shift input to a SHIFT node changes, push indirect AND node uses to the CCP worklist".
>> We extend it by adding "if the (new) type of a node is an IntegerType that `is_con, ...` to the predicate.
>
> Liam Miller-Cushon has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Explicitly check for OP_Con instead of TypeInteger::is_con.
>   
>    322  Phi  === 303 119 255  [[ 399 388 351 751 366 377 ]]  #int:-256..127 !jvms: Integer::parseInt @ bci:151 (line 625)
>   
>    While this Phi dumps as "#int:-256..127", `phase->type(expr)` returns a type that is_con -256.

Thanks Matthias for having a look at the issue and proposing a fix! While this fix seems to work, I think we should address it slightly differently with an explicit bailout, though. Let's step back a bit:

CCP first sets all types to top and then tries to widen them (i.e. an optimistic approach) while IGVN does the opposite: We start by setting all types to bottom and then try to narrow them (i.e. a pessimistic approach). 

The assert we've faced in CCP complains that we tried to narrow some type again which is against the rules of CCP - we can only widen types. 

Now when CCP runs, we start with every type of every node at top. When visiting `AndI` at some point, we see what you reported above:

> What I observe for the Integer.parseInt reproducer is that expr dumps as a phi node with type #int:-256...127, but phase->type(expr) returns a type that is_con() with value -256.

That is perfectly fine. What happened here is that only one input of the phi with type `#int:-256` is non-top. The other inputs are still top (i.e. not processed in CCP, yet). Therefore, the phi's type is set to `#int:-256`. Note that the `TypeNode::_type` field of the phi is still set to the type we had before CCP, i.e. ` #int:-256...127` . In CCP, we use `PhaseValues::_types` which are set to top in the beginning and we leave `TypeNode::_type` unchanged during the analysis.

As a consequence this can happen when having a phi and only looking at the currently tracked CCP types:

> In consequence, the AND(phi-node, mask) gets optimized to zero.

Let's look at the output of the failure:

  304  ConI  === 0  [[ 506 ]]  #int:255
  996  CastII  === 461 453  [[ 557 546 535 524 1034 506 ]]  #int:-256..127 extra types: {0:int:-256} strong dependency !orig=[478] !jvms: Integer::parseInt @ bci:144 (line 550)
  506  AndI  === _ 996 304  [[ 507 ]]  !jvms: Integer::parseInt @ bci:170 (line 552)
told = int:0
tnew = top

it looks like we first optimized `AndI` to zero (i.e. `told`) and then set it to top again in a later `Value()` call in CCP (i.e. `tnew`). This is a violation of the rules for CCP. When we suddenly see top again, it suggests that we prematurely applied an optimization while one of the involved inputs was actually still top. This looks wrong and we should have waited until all the involved inputs are non-top.

When looking at the code, we check that `mask` is an integer type and thus non-top here:

https://github.com/openjdk/jdk/blob/f25f701652900d02858c905f4cd0bb43208c13d5/src/hotspot/share/opto/mulnode.cpp#L2255-L2260

But it looks like we miss that for `expr` when it is a cast node (which is `996 CastII` in the failing test). We pass `expr` to `AndIL_min_trailing_zeros()` and then uncast it and only then check if it is a proper integer type:

https://github.com/openjdk/jdk/blob/f25f701652900d02858c905f4cd0bb43208c13d5/src/hotspot/share/opto/mulnode.cpp#L2180-L2185

So, if the type of `996 CastII` in CCP is still top, we skip it with `uncast()` and then check the phi above which has first the constant type `#int:-256`. We can apply the optimization to return type zero. When later updating the type of the phi to `#int:-256...127`, we can no longer apply the optimization and fall back to `MulNode::Value()` where we return top because the input `996 CastII` is still top:

https://github.com/openjdk/jdk/blob/f25f701652900d02858c905f4cd0bb43208c13d5/src/hotspot/share/opto/mulnode.cpp#L185-L187

We find top which is narrower than type zero and we fail with the assert.

Long story short, you should check for `expr` being top before uncasting it. This was hard to see and is only a problem in CCP. 

I suggest to add the small reproducer as additional test case.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23871#issuecomment-2768852982