RFR: 8350864: C2: verify structural invariants of the Ideal graph [v3]

Emanuel Peter epeter at openjdk.org
Mon Aug 25 15:00:08 UTC 2025


On Thu, 14 Aug 2025 10:43:08 GMT, Marc Chevalier <mchevalier at openjdk.org> wrote:

>> Some crashes are consequences of earlier misshaped ideal graphs, which could be detected earlier, closer to the source, before the possibly many transformations that lead to the crash.
>> 
>> Let's verify that the ideal graph is well-shaped earlier then! I propose here such a feature. This runs after IGVN, because at this point, the graph, should be cleaned up for any weirdness happening earlier or during IGVN.
>> 
>> This feature is enabled with the develop flag `VerifyIdealStructuralInvariants`. Open to renaming. No problem with me! This feature is only available in debug builds, and most of the code is even not compiled in product, since it uses some debug-only functions, such as `Node::dump` or `Node::Name`.
>> 
>> For now, only local checks are implemented: they are checks that only look at a node and its neighborhood, wherever it happens in the graph. Typically: under a `If` node, we have a `IfTrue` and a `IfFalse`. To ease development, each check is implemented in its own class, independently of the others. Nevertheless, one needs to do always the same kind of things: checking there is an output of such type, checking there is N inputs, that the k-th input has such type... To ease writing such checks, in a readable way, and in a less error-prone way than pile of copy-pasted code that manually traverse the graph, I propose a set of compositional helpers to write patterns that can be matched against the ideal graph. Since these patterns are... patterns, so not related to a specific graph, they can be allocated once and forever. When used, one provides the node (called center) around which one want to check if the pattern holds.
>> 
>> On top of making the description of pattern easier, these helpers allows nice printing in case of error, by showing the path from the center to the violating node. For instance (made up for the purpose of showing the formatting), a violation with a path climbing only inputs:
>> 
>> 1 failure for node
>>  211  OuterStripMinedLoopEnd  === 215 39  [[ 212 198 ]] P=0,948966, C=23799,000000
>> At node
>>     209  CountedLoopEnd  === 182 208  [[ 210 197 ]] [lt] P=0,948966, C=23799,000000 !orig=[196] !jvms: StringLatin1::equals @ bci:12 (line 100)
>>   From path:
>>     [center] 211  OuterStripMinedLoopEnd  === 215 39  [[ 212 198 ]] P=0,948966, C=23799,000000
>>       <-(0)- 215  SafePoint  === 210 1 7 1 1 216 37 54 185  [[ 211 ]]  SafePoint  !orig=186 !jvms: StringLatin1::equals @ bci:29 (line 100)
>>       <-(0)- 210  IfFalse  === 209  [[ 21...
>
> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Benoît's comments

Wow, very nice work @marc-chevalier !

Cool that you tried a pattern-matching approach. I really do wonder if we could use that more widely 😊

src/hotspot/share/opto/graphInvariants.cpp line 32:

> 30: 
> 31: void LocalGraphInvariant::LazyReachableCFGNodes::fill() {
> 32:   precond(live_nodes.size() == 0);

Maybe I missed something here: where do the `precond` and `postcond` come from?

src/hotspot/share/opto/graphInvariants.cpp line 57:

> 55: }
> 56: 
> 57: void print_path(const Node_List& steps, const GrowableArray<int>& path, stringStream& ss) {

Totally optional: you could make this a private static method of `GraphInvariantChecker`. Just to group it to where it belongs.

src/hotspot/share/opto/graphInvariants.cpp line 89:

> 87: }
> 88: 
> 89: struct Pattern : ResourceObj {

Some comments at the classes could be nice. Especially with `Bind` I did not have any idea what it could be used for.

src/hotspot/share/opto/graphInvariants.cpp line 105:

> 103:     return true;
> 104:   }
> 105:   const Node*& _binding;

Does this need to be public?

src/hotspot/share/opto/graphInvariants.cpp line 133:

> 131:     return true;
> 132:   }
> 133:   GrowableArray<Pattern*> _checks;

Does this need to be public?

src/hotspot/share/opto/graphInvariants.cpp line 176:

> 174:   }
> 175:   const uint _expect_req;
> 176: };

Looks like code duplication. Could you make it more general with a callback? You could still create alias structs that already have the correct callback/lambdas.

src/hotspot/share/opto/graphInvariants.cpp line 207:

> 205:   }
> 206:   bool (Node::*_type_check)() const;
> 207: };

You could probably generalize this with a callback approach. And then one concrete implentation is the one that does the type check. Just an idea.

src/hotspot/share/opto/graphInvariants.cpp line 270:

> 268:                 new HasNOutputs(2),
> 269:                 new AtSingleOutputOfType(&Node::is_IfTrue, new True()),
> 270:                 new AtSingleOutputOfType(&Node::is_IfFalse, new True()))) {

I would suggest that you append the word `Pattern` to all `Patterns` - at least in most cases this will make it a bit easier to see what you have at the use-site. I'm looking at `new True()` and wonder what might be passed here... if it was called `TruePattern`, it would be immediately clear.

src/hotspot/share/opto/graphInvariants.cpp line 279:

> 277:       return CheckResult::NOT_APPLICABLE;
> 278:     }
> 279:     CheckResult r = PatternBasedCheck::check(center, reachable_cfg_nodes, steps, path, ss);

Could this not be solved with a `OrPattern`?

Or::make( <not is_If ...,
  <the And::make from above>
)

Not sure that's worth it...

src/hotspot/share/opto/graphInvariants.cpp line 287:

> 285:       }
> 286:     }
> 287:     return r;

Also this could probably be handled with a pattern wrapping mechanism, right?
`FailOnlyForLiveNodes( <the pattern from above> )`

src/hotspot/share/opto/graphInvariants.cpp line 301:

> 299:                     And::make(
> 300:                         new NodeClass(&Node::is_Region),
> 301:                         new Bind(region_node))))) {

This sort of binding is kinda cool! Never thought of it before. Could be really cool for general pattern matching.
We would have to find a solution if there would be multiple bindings though ... I think that's not possible with your patterns, right? Is that a fundamental constraint?

src/hotspot/share/opto/graphInvariants.cpp line 309:

> 307:     if (!center->is_Phi()) {
> 308:       return CheckResult::NOT_APPLICABLE;
> 309:     }

Could do this via `Or`?

src/hotspot/share/opto/graphInvariants.cpp line 319:

> 317:       return CheckResult::FAILED;
> 318:     }
> 319:     return CheckResult::VALID;

Another funky idea: could probably be handled with some callback, some "terminal" check you do on the bound variable. Not sure if worth it.

src/hotspot/share/opto/graphInvariants.cpp line 323:

> 321: };
> 322: 
> 323: struct ControlSuccessor : LocalGraphInvariant {

A quick comment above would prevent me from having to reverse-engineer the code below ;)

src/hotspot/share/opto/graphInvariants.cpp line 332:

> 330:     }
> 331: 
> 332:     Node_List ctrl_succ;

Do we need a `ResouceMark` for this?

src/hotspot/share/opto/graphInvariants.cpp line 338:

> 336:       if (out->is_CFG()) {
> 337:         cfg_out++;
> 338:         ctrl_succ.push(out);

Seems you do these in a pair. So why do you need `cfg_out` at all? Can you not take the length/size of `ctrl_succ`? After all, it counts duplicates too (hope that is intended).

src/hotspot/share/opto/graphInvariants.cpp line 399:

> 397:   }
> 398:   CheckResult check(const Node* center, LazyReachableCFGNodes& reachable_cfg_nodes, Node_List& steps, GrowableArray<int>& path, stringStream& ss) const override {
> 399:     if (!center->is_Region() && !center->is_Start() && !center->is_Root()) {

If you allow non-Regions to play here, then I'd just call it `SelfLoopPattern`.

src/hotspot/share/opto/graphInvariants.cpp line 413:

> 411:       ss.print_cr("%s nodes' 0-th input must be itself or nullptr (for a copy Region).", center->Name());
> 412:       return CheckResult::FAILED;
> 413:     }

Absolutely subjective: checking `self != center` is more about `self`, checking `center != self` is more about `center`. So I would use `self != center` :rofl: 
Suggestion:

    if (self != center || (center->is_Region() && self == nullptr)) {
      ss.print_cr("%s nodes' 0-th input must be itself or nullptr (for a copy Region).", center->Name());
      return CheckResult::FAILED;
    }

src/hotspot/share/opto/graphInvariants.cpp line 417:

> 415:     if (self == nullptr) {
> 416:       // Must be a copy Region
> 417:       Node_List non_null_inputs;

ResouceMark?

src/hotspot/share/opto/graphInvariants.cpp line 447:

> 445:                     And::make(
> 446:                         new NodeClass(&Node::is_IfTrue),
> 447:                         new HasAtLeastNInputs(1),

Can an `IfTrue` have more than 1 input?

src/hotspot/share/opto/graphInvariants.cpp line 452:

> 450:                             And::make(
> 451:                                 new NodeClass(&Node::is_BaseCountedLoopEnd),
> 452:                                 new Bind(counted_loop))))))) {}

Ah, another check and Bind! Why not allow `Bind<BaseCountedLoopEndNode*>`, so we can bind it with the cast?

src/hotspot/share/opto/graphInvariants.cpp line 468:

> 466:     }
> 467:     assert(counted_loop != nullptr, "sanity");
> 468:     if (is_long) {

Why did you cache the value? Seems `is_long` is only used once ... and `center` should not change pointers around.

src/hotspot/share/opto/graphInvariants.cpp line 469:

> 467:     assert(counted_loop != nullptr, "sanity");
> 468:     if (is_long) {
> 469:       if (counted_loop->is_CountedLoopEnd()) {

Sounds like head/tail confusion here. Call it `counted_loop_end`.

src/hotspot/share/opto/graphInvariants.cpp line 562:

> 560:   }
> 561: 
> 562:   VectorSet enqueued;

I would move the ResourceMark to the beginning of the allocations, and do the fast bail-out first.

src/hotspot/share/opto/graphInvariants.cpp line 575:

> 573:   // For CFG-related errors, we will compute the set of reachable CFG nodes and decide whether to keep
> 574:   // the issue if the problematic node is reachable. This set of reachable node is thus computed lazily
> 575:   // (and it seems not to happen often in practice), and shared across checks.

Suggestion:

  // Sometimes, we get weird structures in dead code that will be cleaned up later. It typically happens
  // when data dies, but control is not cleaned up right away, possibly kept alive by an unreachable loop.
  // Since we don't want to eagerly traverse the whole graph to remove dead code in IGVN, we can accept
  // weird structures in dead code.
  // For CFG-related errors, we will compute the set of reachable CFG nodes and decide whether to keep
  // the issue if the problematic node is reachable. This set of reachable nodes is thus computed lazily
  // (and it seems not to happen often in practice), and shared across checks.

src/hotspot/share/opto/graphInvariants.cpp line 585:

> 583:       if (in != nullptr && !enqueued.test_set(in->_idx)) {
> 584:         worklist.push(in);
> 585:       }

Why not make a `Unique_Node_List`? It would already have a `VectorSet` included, and you could just push without checking if we already pushed the node. Very nice for BFS traversals.
You would then not even pop nodes, but just traverse over the worklist, as it grows.

src/hotspot/share/opto/graphInvariants.cpp line 611:

> 609:       center->dump();
> 610:       tty->print_cr("%s", ss.base());
> 611:       ss.reset();

Do you really want to use the `ttyLocker` here? I thought we were trying to get away from it because it sometimes leads to lock-priority issues / dead-locks.
Why not just use yet another `stringStream ss3`, and do it all via that one?

src/hotspot/share/opto/graphInvariants.hpp line 35:

> 33: class LocalGraphInvariant : public ResourceObj {
> 34: public:
> 35:   static constexpr int OutputStep = -1;

Can you please add a quick comment what this is for? After all, it is a public static constant ;)

src/hotspot/share/opto/graphInvariants.hpp line 37:

> 35:   static constexpr int OutputStep = -1;
> 36: 
> 37:   struct LazyReachableCFGNodes {

You could add a comment here. What I was surprised by: that you do a whole graph traversal the first time we call `is_node_dead`. I thought you would just visit a subgraph every time, and fill out the `live_nodes` gradually.

You could also give an explanation why it needs to be lazy. Is it possible that we never call `is_node_dead`?

src/hotspot/share/opto/graphInvariants.hpp line 41:

> 39:   private:
> 40:     void fill();
> 41:     Unique_Node_List live_nodes;

I think the hotspot convention is to have fields with an underscore `_live_nodes`. Especially if they are private.

src/hotspot/share/opto/graphInvariants.hpp line 56:

> 54:    *
> 55:    * If the check fails steps and path must be filled with the path from the center to the failing node (where it's relevant to show).
> 56:    * Given a list of node

Suggestion:

   * Given a list of nodes

src/hotspot/share/opto/graphInvariants.hpp line 64:

> 62:    *   - a non-negative integer p for each step such that N{i-1} has Ni as p-th input (we need to follow an input edge)
> 63:    *   - the OUTPUT_STEP value in case N{i-1} has Ni as an output (we need to follow an output edge)
> 64:    * The list are reversed to allow to easily fill them lazily on failure.

Suggestion:

   * The lists are reversed to allow to easily fill them lazily on failure.

src/hotspot/share/opto/graphInvariants.hpp line 70:

> 68:    *
> 69:    * The parameter [live_nodes] is used to share the lazily computed set of CFG nodes reachable from root. This is because some
> 70:    * checks don't apply to dead code, suppress their error if a violation is detected in dead code.

Does that mean we only cache the result if it is reachable, but not if it is not reachable? Does that mean we may check reachability for non-reachable nodes many many times?

src/hotspot/share/opto/phaseX.hpp line 615:

> 613:   Node* _verify_window[_verify_window_size];
> 614:   void verify_step(Node* n);
> 615:   GraphInvariantChecker* _invariant_checker;

Why do you allocate it separately, and not have it in-place?

-------------

Changes requested by epeter (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/26362#pullrequestreview-3151474721
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2298159885
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2298174539
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2298187255
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2298212967
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2298191417
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2298197676
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2298203251
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2298212389
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2298236694
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2298239978
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2298244614
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2298253028
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2298257576
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2298264410
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2298262968
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2298271376
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2298277114
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2298286692
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2298289038
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2298302608
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2298298634
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2298309408
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2298306579
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2298319813
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2298324046
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2298329951
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2298336112
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2298115554
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2298169221
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2298117370
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2298122016
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2298123512
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2298129493
PR Review Comment: https://git.openjdk.org/jdk/pull/26362#discussion_r2298149486


More information about the hotspot-compiler-dev mailing list