RFR: 8325651: C2 SuperWord: refactor the dependency graph [v7]
Emanuel Peter
epeter at openjdk.org
Fri Mar 8 07:48:22 UTC 2024
> Subtask of https://github.com/openjdk/jdk/pull/16620.
>
> **Goal**
>
> - Make the dependency graph, make it a module of `VLoopAnalyzer` -> `VLoopDependencyGraph`.
> - Refactoring: replace linked-list edges with a compact array for each node.
> - No behavioral change to vectorization.
>
> **Benchmark**
>
> I have a large loop body (extra large with hand-unrolling and ` -XX:LoopUnrollLimit=1000`).
> All stores are connected to all previous loads, effectively creating an `O(n^2)` size graph,
> ensuring that we spend a lot of time on the dependency graph compared to other components.
>
> Measured on `linux-x64` and turbo disabled.
>
> Measuring Compile time difference:
> `/oracle-work/jdk-fork2/build/linux-x64/jdk/bin/java -XX:CompileCommand=compileonly,TestGraph::test* -XX:+CITime -XX:+UnlockDiagnosticVMOptions -XX:RepeatCompilation=20 -XX:LoopUnrollLimit=1000 -Xbatch TestGraph.java`
>
> TestGraph.java
>
> public class TestGraph {
> static int RANGE = 100_000;
>
> public static void main(String[] args) {
> int[] a = new int[RANGE];
> int[] b = new int[RANGE];
> for (int i = 0; i < 10_000; i++) {
> test1(a, b, i % 100);
> }
> }
>
> static void test1(int[] a, int[] b, int offset) {
> for (int i = 0; i < RANGE/16-200; i++) {
> a[i * 16 + 0] = b[i * 16 + 0 + offset];
> a[i * 16 + 1] = b[i * 16 + 1 + offset];
> a[i * 16 + 2] = b[i * 16 + 2 + offset];
> a[i * 16 + 3] = b[i * 16 + 3 + offset];
> a[i * 16 + 4] = b[i * 16 + 4 + offset];
> a[i * 16 + 5] = b[i * 16 + 5 + offset];
> a[i * 16 + 6] = b[i * 16 + 6 + offset];
> a[i * 16 + 7] = b[i * 16 + 7 + offset];
> a[i * 16 + 8] = b[i * 16 + 8 + offset];
> a[i * 16 + 9] = b[i * 16 + 9 + offset];
> a[i * 16 + 10] = b[i * 16 + 10 + offset];
> a[i * 16 + 11] = b[i * 16 + 11 + offset];
> a[i * 16 + 12] = b[i * 16 + 12 + offset];
> a[i * 16 + 13] = b[i * 16 + 13 + offset];
> a[i * 16 + 14] = b[i * 16 + 14 + offset];
> a[i * 16 + 15] = b[i * 16 + 15 + offset];
> }
> }
> }
>
>
>
> Before:
>
> C2 Compile Time: 14.588 s
> ...
> IdealLoop: 13.670 s
> AutoVectorize: 11.703 s```
>
> After:
>
> C2 Compile Time: 14.468 s
> ...
> IdealLoop: 13.595 s
> AutoVectorize: 11.539 s
>
>
> Memory usage: `-XX:CompileCommand=memstat,TestGraph::test*,print`
> Before: `8_225_680 - 8_258_408` byt...
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
rm trailing whitespaces from applied suggestion
-------------
Changes:
- all: https://git.openjdk.org/jdk/pull/17812/files
- new: https://git.openjdk.org/jdk/pull/17812/files/c3915bd1..cf4996b9
Webrevs:
- full: https://webrevs.openjdk.org/?repo=jdk&pr=17812&range=06
- incr: https://webrevs.openjdk.org/?repo=jdk&pr=17812&range=05-06
Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod
Patch: https://git.openjdk.org/jdk/pull/17812.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/17812/head:pull/17812
PR: https://git.openjdk.org/jdk/pull/17812
More information about the hotspot-compiler-dev
mailing list