RFR: 8327110: Refactor create_bool_from_template_assertion_predicate() to separate class and fix identical cloning cases used for Loop Unswitching and Split If

Thu Mar 21 13:44:23 UTC 2024

On Thu, 14 Mar 2024 07:10:30 GMT, Christian Hagedorn <chagedorn at openjdk.org> wrote:

> This is a follow-up to the previous refactoring done in https://github.com/openjdk/jdk/pull/18080. The patch starts to replace the usages of `create_bool_from_template_assertion_predicate()` by providing a refactored and fixed cloning algorithm.
> 
> #### How `create_bool_from_template_assertion_predicate()` Works
> Currently, the algorithm in `create_bool_from_template_assertion_predicate()` uses an iterative DFS walk to find all nodes of a Template Assertion Predicate Expression in order to clone them. We do the following:
> 1. Follow all inputs if they could be a node that's part of a Template Assertion Predicate (compares opcodes):
> https://github.com/openjdk/jdk/blob/326c91e1a28ec70822ef927ee9ab17f79aa6d35c/src/hotspot/share/opto/loopTransform.cpp#L1513
> 
> 2. Once we find an `OpaqueLoopInit` or `OpaqueLoopStride` node, we start backtracking in the DFS. While doing so, we start to clone all nodes on the path from the `OpaqueLoop*Nodes` node to the start node and already update the graph. This logic is quite complex and difficult to understand since we do everything simultaneously. This was one of the reasons, I've originally  tried to refactor this method in https://github.com/openjdk/jdk/pull/16877 because I needed to extend it for the full fix of Assertion Predicates in JDK-8288981.
> 
> #### Missing Visited Set
> The current implementation of `create_bool_from_template_assertion_predicate()` does not use a visited set. This means that whenever we find a diamond shape, we could visit a node twice and re-discover all paths above this diamond again:
> 
> 
>     ...
>      |
>      E
>      |
>      D
>     / \
>    B   C
>     \ /
>      A
> 
> DFS walk: A -> B -> D -> E -> ... -> C -> D -> E -> ...
> 
> With each diamond, the number of revisits of each node above doubles.
> 
> #### Endless DFS in Edge-Cases
> In most cases, we would normally just stop quite quickly once we follow a data node that is not part of a Template Assertion Predicate Expression because the node opcode is different. However, in the test cases, we create a long chain of data nodes with many diamonds that could all be part of a Template Assertion Predicate Expression (i.e. `is_part_of_template_assertion_predicate_bool()` would return true to follow the inputs in a DFS walk). As a result, the DFS revisits a lot of nodes, especially higher up in the graph, exponentially many times and compilation is stuck for a long time (running the test cases result in a test timeout because background compilation is disabled).
> 
> #### New DFS Implem...

This looks fantastic, great work :)

src/hotspot/share/opto/loopnode.hpp line 1937:

> 1935:   }
> 1936: 
> 1937:   // Create a copy of the provided data node collection by doing the following:

By "collection" you mean the `_data_nodes`, right? maybe some renaming could be helpful?

src/hotspot/share/opto/loopopts.cpp line 4542:

> 4540: 
> 4541: void DataNodeGraph::transform_opaque_node(TransformStrategyForOpaqueLoopNodes& transform_strategy, Node* node) {
> 4542:   const uint next_idx = _phase->C->unique();

Does this have any use?

src/hotspot/share/opto/predicates.cpp line 185:

> 183: // The class takes a node filter to decide which input nodes to follow and a target node predicate to start backtracking
> 184: // from. All nodes found on all paths from source->target(s) returned in a Unique_Node_List (without duplicates).
> 185: class DataNodesOnPathToTargets : public StackObj {

Suggestion:

class DataNodesOnPathsToTargets : public StackObj {

Just a nit: but you do want all paths.

src/hotspot/share/opto/predicates.cpp line 189:

> 187: 
> 188:   NodeCheck _node_filter; // Node filter function to decide if we should process a node or not while searching for targets.
> 189:   NodeCheck _is_target_node; // Function to decide if a node is a target node (i.e. where we should start backtracking).

There should be some remark that all target nodes must pass the filter.

src/hotspot/share/opto/predicates.cpp line 218:

> 216:         pop_target_node_and_collect_predecessor();
> 217:       } else if (!push_next_unvisited_input()) {
> 218:         // All inputs visited. Continue backtracking.

`push_next_unvisited_input`:
A `enum` with 2 values would make it more explicit than a bool: `HasMoreUnvisitedInputs` and `AllInputsAreVisited`.
Then it makes more sense at the use site, because you can just check for `AllInputsAreVisited`, and it is immediately clear that you can continue with backtacking.

src/hotspot/share/opto/predicates.cpp line 263:

> 261:     for (uint i = next_unvisited_input_index; i < current->req(); i++) {
> 262:       Node* input = current->in(i);
> 263:       if (_node_filter(input)) {

you could check that if the filter does not pass, then also the target-node criterion fails.

src/hotspot/share/opto/predicates.cpp line 287:

> 285:   // Push the next unvisited node in the DFS order with index 1 since this node needs to visit all its inputs.
> 286:   void push_unvisited_node(Node* next_to_visit) {
> 287:     _stack.push(next_to_visit, 1);

Add assert that it was not visited yet!

src/hotspot/share/opto/predicates.cpp line 301:

> 299:     }
> 300:   }
> 301: };

I think this is now correct. But it is 100 lines to perform and explain this DFS with backtracking.

On the other hand doing 2 BFS would just be 20+ lines.

Unique_Node_List collected;

Unique_Node_List input_traversal;
input_traversal.push(start_node);
for (int i = 0; i < input_traversal.length(); i++) {
  Node* n = input_traversal.at(i);
  for (int j = 1; j < n->req(); j++) {
    Node* input = n->in(j);
    if (_is_target_node(input)) {
      collected.push(input); // mark as target, where we start backtracking.
    } else if(_filter(input)) {
      input_traversal.push(input); // continue BFS.
    }
  }
}
assert(!collected.is_empty(), "must find some targets");
for (int i = 0; i < collected.length(); i++) {
  Node* n = collected.at(i);
  for (output : n->fastout()) { // pseudocode
    if (input_traversal.contains(output)) {
      collected.push(output); // backtrack through nodes of input traversal
    }
  }
}
assert(collected.contains(start_node), "must find start node again");

src/hotspot/share/opto/predicates.cpp line 306:

> 304: Opaque4Node* TemplateAssertionPredicateExpression::clone(TransformStrategyForOpaqueLoopNodes& transform_strategy,
> 305:                                                          Node* new_ctrl, PhaseIdealLoop* phase) {
> 306:   ResourceMark rm;

The ResourceMark makes me a bit nervous, in combination of a non-constant `transform_strategy`.
Could the `transform_strategy` be a constant reference, maybe by making its functions also const?

-------------

Changes requested by epeter (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/18293#pullrequestreview-1952137313
PR Review Comment: https://git.openjdk.org/jdk/pull/18293#discussion_r1533856371
PR Review Comment: https://git.openjdk.org/jdk/pull/18293#discussion_r1533863020
PR Review Comment: https://git.openjdk.org/jdk/pull/18293#discussion_r1533878355
PR Review Comment: https://git.openjdk.org/jdk/pull/18293#discussion_r1533886598
PR Review Comment: https://git.openjdk.org/jdk/pull/18293#discussion_r1533884251
PR Review Comment: https://git.openjdk.org/jdk/pull/18293#discussion_r1533888249
PR Review Comment: https://git.openjdk.org/jdk/pull/18293#discussion_r1533895635
PR Review Comment: https://git.openjdk.org/jdk/pull/18293#discussion_r1533915387
PR Review Comment: https://git.openjdk.org/jdk/pull/18293#discussion_r1533923221