RFR: 8371716: C2 compilation fails with "Missed optimization opportunity in PhaseIterGVN"

Fri Nov 14 20:06:19 UTC 2025

This bug was originally found and reported as a Valhalla problem. It quickly became apparent it has no reason to be Valhalla-specific, while I couldn't prove so. Roland managed to make a mainline reproducer. The explanation details my Valhalla investigation, but it has nothing to do with value classes anyway.

# Analysis
## Obervationally
### IGVN
During IGVN, in `PhiNode::Value`, a `PhiNode` has 2 inputs. Their types are:

in(1): java/lang/Object * (speculative=compiler/valhalla/inlinetypes/MyValue2 (compiler/valhalla/inlinetypes/MyInterface):exact * (inline_depth=4))
in(2): java/lang/Object * (speculative=null)

We compute the join (HS' meet):
https://github.com/openjdk/valhalla/blob/412ec882767d3ee1792d1e0f98da54ff800c60ce/src/hotspot/share/opto/cfgnode.cpp#L1310-L1317

t=java/lang/Object * (speculative=compiler/valhalla/inlinetypes/MyValue2 (compiler/valhalla/inlinetypes/MyInterface):exact *)

But the current `_type` (of the `PhiNode` as a `TypeNode`) is

_type=java/lang/Object * (speculative=compiler/valhalla/inlinetypes/MyValue3 (compiler/valhalla/inlinetypes/MyInterface):exact *)

We filter `t` by `_type`
https://github.com/openjdk/valhalla/blob/412ec882767d3ee1792d1e0f98da54ff800c60ce/src/hotspot/share/opto/cfgnode.cpp#L1332
and we get

ft=java/lang/Object *

which is what we return. After the end of `Value`, the returned becomes the new `PhiNode`'s `_type`.
https://github.com/openjdk/valhalla/blob/412ec882767d3ee1792d1e0f98da54ff800c60ce/src/hotspot/share/opto/phaseX.cpp#L2150-L2164
and
https://github.com/openjdk/valhalla/blob/412ec882767d3ee1792d1e0f98da54ff800c60ce/src/hotspot/share/opto/node.cpp#L1127-L1133

### Verification
On verification, `in(1)`, `in(2)` have the same value, so does `t`. But this time

_type=java/lang/Object *

and so after filtering `t` by (new) `_type` and we get

ft=java/lang/Object * (speculative=compiler/valhalla/inlinetypes/MyValue2 (compiler/valhalla/inlinetypes/MyInterface):exact *)

which is retuned. Verification gets angry because the new `ft` is not the same as the previous one.

## But why?!
### Details on type computation
In short, we are doing

t = typeof(in(1)) / typeof(in(2))
ft  = t /\ _type (* IGVN *)
ft' = t /\ ft    (* Verification *)

and observing that `ft != ft'`. It seems our lattice doesn't ensure `(a /\ b) /\ b = a /\ b` which is problematic for this kind of verfication that will just "try again and see if something change".

To me, the surprising fact was that the intersection

java/lang/Object * (speculative=compiler/valhalla/inlinetypes/MyValue2 (compiler/valhalla/inlinetypes/MyInterface):exact *)
/\
_type=java/lang/Object * (speculative=compiler/valhalla/inlinetypes/MyValue3 (compiler/valhalla/inlinetypes/MyInterface):exact *)
~>
java/lang/Object *

What happened to the speculative type? Both `MyValue2` and `MyValue3` are inheriting `MyAbstract` (and implementing `MyInterface`). So the code correctly find that the intersection of these speculative type is

compiler/valhalla/inlinetypes/MyAbstract (compiler/valhalla/inlinetypes/MyInterface):AnyNull * (flat in array),iid=top

The interesting part is that it's `AnyNull`: indeed, what else is a `MyValue2` and `MyValue3` at the same time? And then, `above_centerline` decides it's not useful enough (too precise, too clone from HS' top/normal bottom) and remove the speculative type.
https://github.com/openjdk/valhalla/blob/412ec882767d3ee1792d1e0f98da54ff800c60ce/src/hotspot/share/opto/type.cpp#L2886-L2888
But on the verification run, `compiler/valhalla/inlinetypes/MyValue2 (compiler/valhalla/inlinetypes/MyInterface):exact *` is intersected with the speculative type of `java/lang/Object *`, which is unknown (HS' bottom/normal top), so we are simply getting `MyValue2`. If we did not discard `AnyNull` using `above_centerline`, we would have the intersection of `MyValue2` and `AnyNull`, giving `AnyNull`, which is indeed stable.

## Ok, but the types are weird?

Indeed, they are! How can we get a speculative type `MyValue3` on the `PhiNode` when inputs are both `Object`, and one is speculated to be a `MyValue2`? This comes from incremental inlining. It seems that we have some profiling information on the returned type of a callee, that happens to be `MyValue3`, which propagate to the `PhiNode`. Later, the callee is inlined, and we get new type information (`MyValue2`) from its body (from the returned type of a callee of our callee, if I remember well), that reaches the input of our `PhiNode`.

# Reproducing

## In Valhalla

This crash is quite rare because:
1. it needs a specific speculative type setup, which depends heavily on timing
2. if `PhiNode::Value` is called a second time, it will stabilize the `_type` field before verification.

To limitate the influence of 2., I've tested with an additional assert that would immediately do

const Type* ft_ = t->filter_speculative(ft);

in `PhiNode::Value` and compare `ft` and `ft_`. Indeed, we are never sure a run of `Value` is not the last one: it should always be legal to stop anywhere (even if in a particular case), it was going to run further.

With this extra check, the crash a bit more common, but still pretty rare. Tests that have been witness to crash then at least once:
- `compiler/valhalla/inlinetypes/TestCallingConvention.java`
- `compiler/valhalla/inlinetypes/TestIntrinsics.java`
- `compiler/valhalla/inlinetypes/TestArrays.java`
- `compiler/valhalla/inlinetypes/TestBasicFunctionality.java`

All in `compiler/valhalla/inlinetypes` while I was also testing with mainline tests. Suspicious, uh.

## In mainline

With the aforementioned extra check, I've tried to see if it could happen on mainline since the involved code seems not to be valhalla-specific. Yet, nothing failed.

Fortunately, Roland crafted an example that reproduces in mainline (and Valhalla)! You'll find it as a test here.

# Fixing?

I think changing the type system would be quite risky: it is all over the place. Also, fixing would require not to drop the speculative type when `above_centerline`. This might not be desirable. On top of the complexity and the associated risk, a too specific speculative type is rather useless. If we keep the too specific type around, we probably should ignore it where we make use of it. That's distributing the effort and open the door to inconsistencies. If we should ignore it, we might just as well drop it immediately. It is also dubious whether ordering requirement are meaningful for speculative type: they are not sound or complete. Moreover, one could argue that in the abstract, we don't even need a lattice or anything like that. A single poset whose functions are sound approximation of the concrete is enough. It is not uncommon that in the abstract world that `a /\ b` is not smaller than `a`.

For instance, in the co-interval domain (the whole universe `E` minus an interval), if we take `a = E \ [1, 2]` and `b = E \ [3, 4]`, the concrete intersection would be `E \ [1, 2] \ [3, 4]` which is not allowed in our domain since it has 2 holes. If we want a sound approximation of that, we must remove at least one hole. We can then take `E \ [1, 2] = a`, `E \ [3, 4] = b` or even `E` (and of course, a lot of other things, with smaller holes than `a` or `b`). Whichever our abstract domain chooses, there is never both `a /\ b < a` and `a /\ b < b`. Indeed, this poset (like many real-world domains) is not a lattice, which doesn't keep us from speaking about soundness. An interesting and related fact is that there is no best abstraction of `E \ [1, 2] \ [3, 4]`.

Maybe this digression about soundness rather hints that we should not compare speculative types during verification.

Finally, as a simple solution, one could simply run `filter_speculative` twice, that should be enough as the second filter will simply select the non empty speculative type if there is only one, and this one won't be `above_centerline`, or it would not exist as a speculative type already. To try to be a bit less aggressive, we can rather do that in case where we know it cannot be useful. If `ft` obtained from `filter_speculative` has no speculative type, but `t` has, we can know that it might be because it has been dropped, and computing `t->filter_speculative(ft)` could pick the speculative type of `t`. The speculative type can still be removed if the non-speculative type of `ft` is exact and non null for instance, but we've still reached a fixpoint, so it's correct, but a little bit too much work. This solution means that we are basically throwing away the speculative type in `_type` in case of clash. That sounds not shocking to me: if we get (hopefully) more accurate information o
 n the inputs, it seems reasonable, or at least not shocking, to give up the previous speculative type if it clashes since it's not known to be correct. One can also phrase it as "we keep the speculative type of `_type` only if we don't have anything going against it".

This PR includes the last solution.

See https://github.com/openjdk/valhalla/pull/1717 for some previous discussion, when it was not proven to happen in mainline yet.

Thanks,
Marc

-------------

Commit messages:
 - Filter twice

Changes: https://git.openjdk.org/jdk/pull/28331/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28331&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8371716
  Stats: 160 lines in 2 files changed: 160 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/28331.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28331/head:pull/28331

PR: https://git.openjdk.org/jdk/pull/28331