RFR: 8325651: C2 SuperWord: refactor the dependency graph

Emanuel Peter epeter at openjdk.org
Wed Feb 28 13:22:15 UTC 2024


On Mon, 12 Feb 2024 16:24:30 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

> Subtask of https://github.com/openjdk/jdk/pull/16620.
> 
> **Goal**
> 
> - Make the dependency graph, make it a module of `VLoopAnalyzer` -> `VLoopDependencyGraph`.
> - Refactoring: replace linked-list edges with a compact array for each node.
> - No behavioral change to vectorization.
> 
> **Benchmark**
> 
> I have a large loop body (extra large with hand-unrolling and ` -XX:LoopUnrollLimit=1000`).
> All stores are connected to all previous loads, effectively creating an `O(n^2)` size graph,
> ensuring that we spend a lot of time on the dependency graph compared to other components.
> 
> Measured on `linux-x64` and turbo disabled.
> 
> Measuring Compile time difference:
> `/oracle-work/jdk-fork2/build/linux-x64/jdk/bin/java -XX:CompileCommand=compileonly,TestGraph::test* -XX:+CITime -XX:+UnlockDiagnosticVMOptions -XX:RepeatCompilation=20 -XX:LoopUnrollLimit=1000 -Xbatch TestGraph.java`
> 
> TestGraph.java
> 
> public class TestGraph {
>     static int RANGE = 100_000;
> 
>     public static void main(String[] args) {
>         int[] a = new int[RANGE];
>         int[] b = new int[RANGE];
>         for (int i = 0; i < 10_000; i++) {
>             test1(a, b, i % 100);
>         }
>     }
> 
>     static void test1(int[] a, int[] b, int offset) {
>         for (int i = 0; i < RANGE/16-200; i++) {
>             a[i * 16 +  0] = b[i * 16 +  0 + offset];
>             a[i * 16 +  1] = b[i * 16 +  1 + offset];
>             a[i * 16 +  2] = b[i * 16 +  2 + offset];
>             a[i * 16 +  3] = b[i * 16 +  3 + offset];
>             a[i * 16 +  4] = b[i * 16 +  4 + offset];
>             a[i * 16 +  5] = b[i * 16 +  5 + offset];
>             a[i * 16 +  6] = b[i * 16 +  6 + offset];
>             a[i * 16 +  7] = b[i * 16 +  7 + offset];
>             a[i * 16 +  8] = b[i * 16 +  8 + offset];
>             a[i * 16 +  9] = b[i * 16 +  9 + offset];
>             a[i * 16 + 10] = b[i * 16 + 10 + offset];
>             a[i * 16 + 11] = b[i * 16 + 11 + offset];
>             a[i * 16 + 12] = b[i * 16 + 12 + offset];
>             a[i * 16 + 13] = b[i * 16 + 13 + offset];
>             a[i * 16 + 14] = b[i * 16 + 14 + offset];
>             a[i * 16 + 15] = b[i * 16 + 15 + offset];
>         }
>     }
> }
> 
> 
> 
> Before:
> 
>     C2 Compile Time:       14.588 s
> ...
>          IdealLoop:            13.670 s
>            AutoVectorize:      11.703 s```
> 
> After:
> 
>     C2 Compile Time:       14.468 s
> ...
>          IdealLoop:            13.595 s
>            AutoVectorize:      11.539 s
> 
> 
> Memory usage: `-XX:CompileCommand=memstat,TestGraph::test*,print`
> Before: `8_225_680 - 8_258_408` byt...

src/hotspot/share/opto/superword.cpp line 459:

> 457: 
> 458:   // compute function depth(Node*)
> 459:   compute_max_depth();

Note: moved to `VLoopAnalyzer::setup_submodules_helper`

src/hotspot/share/opto/superword.cpp line 756:

> 754: // Add dependence edges to load/store nodes for memory dependence
> 755: //    A.out()->DependNode.in(1) and DependNode.out()->B.prec(x)
> 756: void SuperWord::dependence_graph() {

Note: new code is in `VLoopDependencyGraph::construct`. The general idea is the same, but I refactored it a bit.

src/hotspot/share/opto/superword.cpp line 788:

> 786:     // Create a sink for the slice
> 787:     DepMem* slice_sink = _dg.make_node(nullptr);
> 788:     _dg.make_edge(slice_sink, _dg.tail());

Note: as far as I can tell, there is no reason to create these `sink` nodes. They are later never referenced again.

src/hotspot/share/opto/superword.cpp line 3064:

> 3062:     }
> 3063:     ct++;
> 3064:   } while (again);

Note: moved to `VLoopDependencyGraph::compute_depth`. Simplified it, we only need a single pass.

src/hotspot/share/opto/superword.cpp line 3815:

> 3813:     _dep_next = nullptr;
> 3814:   }
> 3815:   next();

Note: this was to set up the iteration over the `n->in(i)` (limited range, based on node type), and iteration over the `dep` edges (stored in a linked list).
I don't think the `is_Mem` case ever occurs (added an assert in my new code).

New code in: `VLoopDependencyGraph::PredsIterator::PredsIterator`

src/hotspot/share/opto/superword.cpp line 3850:

> 3848:     _dep_next = nullptr;
> 3849:   }
> 3850:   next();

Note: `DepSuccs` was never used

src/hotspot/share/opto/superword.hpp line 73:

> 71:   DepMem* _succ;
> 72:   DepEdge* _next_in;   // list of in edges, null terminated
> 73:   DepEdge* _next_out;  // list of out edges, null terminated

Note: the `out / succ` edges were never used.

src/hotspot/share/opto/superword.hpp line 293:

> 291:     return TraceSuperWord ||
> 292:            _vloop.vtrace().is_trace(TraceAutoVectorizationTag::SW_DEPENDENCE_GRAPH);
> 293:   }

Noe: replaced with `DEPENDENCY_GRAPH`

src/hotspot/share/opto/superword.hpp line 367:

> 365:   // Max expression (DAG) depth from beginning of the block for each node
> 366:   int depth(Node* n) const                   { return _node_info.adr_at(bb_idx(n))->_depth; }
> 367:   void set_depth(Node* n, int d)             { int i = bb_idx(n); grow_node_info(i); _node_info.adr_at(i)->_depth = d; }

Note: bonus: `depth` and `set_depth` is not private in `VLoopDependencyGraph`.

src/hotspot/share/opto/vectorization.hpp line 510:

> 508:     MemNode* _node; // Corresponding ideal node
> 509:     const uint _extra_pred_edges_length;
> 510:     int* _extra_pred_edges; // extra def-edges, mapping to bb_idx

Note: I'm now using an array instead of a linked list for the nodes.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17812#discussion_r1505819042
PR Review Comment: https://git.openjdk.org/jdk/pull/17812#discussion_r1505731065
PR Review Comment: https://git.openjdk.org/jdk/pull/17812#discussion_r1505732210
PR Review Comment: https://git.openjdk.org/jdk/pull/17812#discussion_r1505733586
PR Review Comment: https://git.openjdk.org/jdk/pull/17812#discussion_r1505736605
PR Review Comment: https://git.openjdk.org/jdk/pull/17812#discussion_r1505737280
PR Review Comment: https://git.openjdk.org/jdk/pull/17812#discussion_r1505797283
PR Review Comment: https://git.openjdk.org/jdk/pull/17812#discussion_r1505798923
PR Review Comment: https://git.openjdk.org/jdk/pull/17812#discussion_r1505820489
PR Review Comment: https://git.openjdk.org/jdk/pull/17812#discussion_r1505828381


More information about the hotspot-compiler-dev mailing list