RFR: 8327978: C2 SuperWord: Fix compilation time regression in dependency graph traversal after JDK-8325651 [v2]

Thu Apr 4 05:07:24 UTC 2024

> In [JDK-8325651](https://bugs.openjdk.org/browse/JDK-8325651) / https://github.com/openjdk/jdk/pull/17812 I refactored the dependency graph. It seems I made a typo, and missed a single `!`, which broke `VLoopDependencyGraph::compute_depth` (formerly `SuperWord::compute_max_depth`).
> 
> The consequence was that all nodes in the dependency graph had the same depth `1`. A node is supposed to have a higher depth than all its inputs, except for Phi nodes, which have depth 0, as they are at the beginning of the loop's basic block, i.e. they are at the beginning of the DAG.
> 
> **Details**
> 
> Well, it is a bit more complicated. I had not just forgotten about the `!`. Before the change, we used to iterate over the body multiple times, until the depth computation is stable. When I saw this, I assumed this was not necessary, since the `body` is already ordered, such that `def` is before `use`. So I reduced it to a single pass over the `body`.
> 
> But this assumption was wrong: I added some assertion code, which detected that something was wrong with the ordering in the `body`. In the failing example, I saw that we had a `Load` and a `Store` with the same memory state. Given the edges, our ordering algorithm for the `body` could schedule `Load` before `Store` or `Store` before `Load`. But that is incorrect: our assumption is that in such cases `Loads` always happen before `Stores`.
> 
> Therefore, I had to change the traversal order in `VLoopBody::construct`, so that we visit `Load` before `Store`. With this, I now know that the `body` order is correct for both the data dependency and the memory dependency. Therefore, I only need to iterate over the `body` once in `VLoopDependencyGraph::compute_depth`.
> 
> **More Backgroud / Details**
> 
> This bug was reported because there were timeouts with `TestAlignVectorFuzzer.java`. This fix seems to improve the compile time drastically for the example below. It seems to be an example with a large dependency graph, where we still attempt to create some packs. This means there is a large amount of `independence` checks on the dependency graph. If those are not pruned well, then they visit many more nodes than necessary.
> 
> Why did I not catch this earlier? I had a compile time benchmark for [JDK-8325651](https://bugs.openjdk.org/browse/JDK-8325651) / https://github.com/openjdk/jdk/pull/17812, but it seems it was not sensitive enough. It has a dense graph, but never actually created any packs. My new benchmark creates packs, which unlocks more checks d...

Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:

  For Vladimir: check is_Store first

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/18532/files
  - new: https://git.openjdk.org/jdk/pull/18532/files/cd6c401c..87892b7c

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=18532&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18532&range=00-01

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/18532.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18532/head:pull/18532

PR: https://git.openjdk.org/jdk/pull/18532