RFR: 8298935: fix cyclic dependency bug in create_pack logic in SuperWord::find_adjacent_refs [v22]

Thu Mar 9 16:43:02 UTC 2023

On Thu, 9 Mar 2023 09:47:04 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Looks reasonable.  The one thing I don't understand is new method `find_dependence()`.
>> May be because I don't know what data `DepPreds` is operating on.
>> 
>> Do I understand it correctly?:
>> 1. All nodes in one pack are independent
>> 2. Using `DepPreds` looks through all inputs for each node in pack and put them on work list if they are in the same block and in the same depth range. Does not matter if they are in an other pack or not?
>> 3. Go through these inputs and put their inputs on work list if  they satisfy conditions. 
>> 4. If we find input which is a node in the pack - we got dependence, return this pack's node.
>> 5. We check an input only once because we use Unique_Node_List.
>
>> The one thing I don't understand is new method `find_dependence()`.
> 
> @vnkozlov Now to your questions about `find_dependence()`.
> 
> I need a method that can check if a `pack` is `independent`, ie that all members are mutually `independent`. I want to filter with it, and verify at the end.
> 
> All it checks if any of the `pack`-members have a `DepPreds` path to any other `pack`-member. For that, I could also check `independent(s1, s2)` on every pair, but that would be squared many BFS traversal, one per `independent` call. Note that the depth range is used inside `independence`: The idea is to start at the "deeper" node, and BFS up the DAG, but no further than the depth of the "shallower" node. Because if we go further up, we will never find the "shallower" node. The BFS traversal is a bit convoluted, and implemented with recursive function calls to `independent_path` and checking `visited_test` to ensure nodes are not visited more than once.
> 
> I now generalized this `independence` query to the `pack`-level:
>  - Find `min_d`, of the "shallowest" node in the `pack` (no need to ever traverse further up than that depth).
>  - Instead of BFS-ing from one start node, directly start at all nodes from the `pack`.
>  - I mark all nodes from the `pack` as `visited`, and only those. This gives me an easy query to check if a node is from the `pack`.
>  - The `worklist` (`Unique_Node_List`) is my BFS data-structure. It is unique so that we only visit every node once. And it simultaneously works as the BFS queue, as `j` iterates over them linearly.
>  - I BFS traverse upward, only inside the basic block (`in_bb`), and only up to `min_d`. If I ever find a node `visit_test(pred)`, then I know that I have encountered a node from the `pack` from below. This implies that there must be a path from one of the nodes in the `pack` up to this node (return it). There is thus a dependence between two `si, sj` in the `pack`. If I never find a path, then I know that there cannot be a dependence, the `pack` is `independent`.
> 
> I hope I have clarified things. Except maybe this point:
>> Does not matter if they are in an other pack or not?
> 
> `find_dependence` only checks if all the members of a `pack` are mutually `independent` in the DAG. One might now wonder: do other `packs` not have an impact also? Might another pack not create additional dependencies "across" the vector elements?
> 
> In the [paper](https://groups.csail.mit.edu/cag/slp/SLP-PLDI-2000.pdf), they have this example:
> 
> <img src="https://user-images.githubusercontent.com/32593061/223974322-1b8d032a-fa8c-4990-8904-076d00b81d05.png" width="50%">
> 
> Quote:
> 
> 3.7 Scheduling
> Dependence analysis before packing ensures that statements within a group can be executed
> safely in parallel. However, it may be the case that executing two groups produces a dependence
> violation. An example of this is shown in Figure 6. Here, dependence edges are drawn between
> groups if a statement in one group is dependent on a statement in the other. As long as there
> are no cycles in this dependence graph, all groups can be scheduled such that no violations
> occur. However, a cycle indicates that the set of chosen groups is invalid and at least one group
> will need to be eliminated. Although experimental data has shown this case to be extremely rare,
> care must be taken to ensure correctness.
> 
> 
> The idea is this: before `schedule`, we must ensure that `packs` are `independent`. **But:** `independence` on the `pack` level is **not** sufficient, we also need to ensure that the `packs` (groups) are acyclic before we `schedule`.
> 
> From the comments in `schedule -> co_locate_pack`, it seems we are assuming that there are no cycles (see point 5 in the list there).
> 
> So is there a **4th Bug** lurking here?
> 
> Maybe. If so, I'd say we should fix that in a separate RFE, since all my examples failed because of `dependence` in the `pack`, and not because of cyclic dependencies between `independent` packs.
> 
> If you want, I could implement a verification code, that at least checks that there is no cyclic dependency between the `packs`. I had actually implemented this earlier, but then decided against it, in favour of `find_dependence` - because it is much less code.
> https://github.com/openjdk/jdk/blob/5e01c9f5aa39c94bc70069df02e0043678fa69c7/src/hotspot/share/opto/superword.cpp#L2290-L2293
> 
> But here an **argument why there is no such bug**: In `SuperWord::profitable`, we check that all inputs are also vectors. And in `SuperWord::is_vector_use`, we check that the corresponding "lanes" match. We thus have the same dependencies inside a "lane" of a vector as between the vectors. This is a limitation to our SLP that the paper does not directly require, as far as I see. If we have `n` lanes / elements in a vector, then we basically expect to find `n` `isomorphic` subgraphs, that are completely `independent` from each other.
> 
> In the example from the paper, we see that the vectors do not match (`q,r,s` / `k1,k2,s`), and that values permute between vector lanes (`x,y,z` / `y,k3,k4`).
> 
> **However**: if we were ever to implement `permutations`, or `ExtractNode` or `PackNode`, this would change. We might allow one lane to be unpacked, and re-packed to another. Or a direct permutation does that.
> 
> https://github.com/openjdk/jdk/blob/a44082b61f22dcdee115697f34d39c1d8382a15d/src/hotspot/share/opto/superword.cpp#L1367-L1376
> 
> https://github.com/openjdk/jdk/blob/a44082b61f22dcdee115697f34d39c1d8382a15d/src/hotspot/share/opto/superword.cpp#L2400-L2417

@eme64 Thank you for explaining code in such details. This confirmed by understanding how `find_dependence()` works. Very good.
Please, don't any more investigations in this PR. I think it is solid enough already.  File separate RFEs and look on them later.

-------------

PR: https://git.openjdk.org/jdk/pull/12350