RFR: 8327978: C2 SuperWord: Fix compilation time regression in dependency graph traversal after JDK-8325651 [v5]

Christian Hagedorn chagedorn at openjdk.org
Thu Apr 4 13:14:01 UTC 2024


On Thu, 4 Apr 2024 12:55:24 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> In [JDK-8325651](https://bugs.openjdk.org/browse/JDK-8325651) / https://github.com/openjdk/jdk/pull/17812 I refactored the dependency graph. It seems I made a typo, and missed a single `!`, which broke `VLoopDependencyGraph::compute_depth` (formerly `SuperWord::compute_max_depth`).
>> 
>> The consequence was that all nodes in the dependency graph had the same depth `1`. A node is supposed to have a higher depth than all its inputs, except for Phi nodes, which have depth 0, as they are at the beginning of the loop's basic block, i.e. they are at the beginning of the DAG.
>> 
>> **Details**
>> 
>> Well, it is a bit more complicated. I had not just forgotten about the `!`. Before the change, we used to iterate over the body multiple times, until the depth computation is stable. When I saw this, I assumed this was not necessary, since the `body` is already ordered, such that `def` is before `use`. So I reduced it to a single pass over the `body`.
>> 
>> But this assumption was wrong: I added some assertion code, which detected that something was wrong with the ordering in the `body`. In the failing example, I saw that we had a `Load` and a `Store` with the same memory state. Given the edges, our ordering algorithm for the `body` could schedule `Load` before `Store` or `Store` before `Load`. But that is incorrect: our assumption is that in such cases `Loads` always happen before `Stores`.
>> 
>> Therefore, I had to change the traversal order in `VLoopBody::construct`, so that we visit `Load` before `Store`. With this, I now know that the `body` order is correct for both the data dependency and the memory dependency. Therefore, I only need to iterate over the `body` once in `VLoopDependencyGraph::compute_depth`.
>> 
>> **More Backgroud / Details**
>> 
>> This bug was reported because there were timeouts with `TestAlignVectorFuzzer.java`. This fix seems to improve the compile time drastically for the example below. It seems to be an example with a large dependency graph, where we still attempt to create some packs. This means there is a large amount of `independence` checks on the dependency graph. If those are not pruned well, then they visit many more nodes than necessary.
>> 
>> Why did I not catch this earlier? I had a compile time benchmark for [JDK-8325651](https://bugs.openjdk.org/browse/JDK-8325651) / https://github.com/openjdk/jdk/pull/17812, but it seems it was not sensitive enough. It has a dense graph, but never actually created any packs. My new benchmark creates ...
>
> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
> 
>   extract find_max_pred_depth

You have a whitespace error in the test file.

src/hotspot/share/opto/vectorization.cpp line 299:

> 297: // assume that the depth of all preds is already computed when we compute the depth of use.
> 298: void VLoopDependencyGraph::compute_depth() {
> 299:   auto find_max_pred_depth = [&] (const Node* n) {

I would move this code out to a separate method. Having a lambda here makes it hard to read `compute_depth()` and you don't really need to capture anything.

test/hotspot/jtreg/compiler/loopopts/superword/TestLargeCompilation.java line 31:

> 29:  * @summary Test compile time for large compilation, where SuperWord takes especially much time.
> 30:  * @requires vm.compiler2.enabled
> 31:  * @run main/othervm -XX:+UnlockDiagnosticVMOptions -XX:RepeatCompilation=5 -XX:LoopUnrollLimit=1000 -Xbatch

You can also use `main/othervm/timeout=30` or an even lower timeout. Then you might be able to get rid of `RepeatCompilation`.

-------------

PR Review: https://git.openjdk.org/jdk/pull/18532#pullrequestreview-1979852854
PR Review Comment: https://git.openjdk.org/jdk/pull/18532#discussion_r1551658809
PR Review Comment: https://git.openjdk.org/jdk/pull/18532#discussion_r1551656544


More information about the hotspot-compiler-dev mailing list