RFR: 8325651: C2 SuperWord: refactor the dependency graph [v8]
Christian Hagedorn
chagedorn at openjdk.org
Mon Mar 11 07:09:00 UTC 2024
On Fri, 8 Mar 2024 07:59:20 GMT, Emanuel Peter <epeter at openjdk.org> wrote:
>> Subtask of https://github.com/openjdk/jdk/pull/16620.
>>
>> **Goal**
>>
>> - Make the dependency graph, make it a module of `VLoopAnalyzer` -> `VLoopDependencyGraph`.
>> - Refactoring: replace linked-list edges with a compact array for each node.
>> - No behavioral change to vectorization.
>>
>> **Benchmark**
>>
>> I have a large loop body (extra large with hand-unrolling and ` -XX:LoopUnrollLimit=1000`).
>> All stores are connected to all previous loads, effectively creating an `O(n^2)` size graph,
>> ensuring that we spend a lot of time on the dependency graph compared to other components.
>>
>> Measured on `linux-x64` and turbo disabled.
>>
>> Measuring Compile time difference:
>> `/oracle-work/jdk-fork2/build/linux-x64/jdk/bin/java -XX:CompileCommand=compileonly,TestGraph::test* -XX:+CITime -XX:+UnlockDiagnosticVMOptions -XX:RepeatCompilation=20 -XX:LoopUnrollLimit=1000 -Xbatch TestGraph.java`
>>
>> TestGraph.java
>>
>> public class TestGraph {
>> static int RANGE = 100_000;
>>
>> public static void main(String[] args) {
>> int[] a = new int[RANGE];
>> int[] b = new int[RANGE];
>> for (int i = 0; i < 10_000; i++) {
>> test1(a, b, i % 100);
>> }
>> }
>>
>> static void test1(int[] a, int[] b, int offset) {
>> for (int i = 0; i < RANGE/16-200; i++) {
>> a[i * 16 + 0] = b[i * 16 + 0 + offset];
>> a[i * 16 + 1] = b[i * 16 + 1 + offset];
>> a[i * 16 + 2] = b[i * 16 + 2 + offset];
>> a[i * 16 + 3] = b[i * 16 + 3 + offset];
>> a[i * 16 + 4] = b[i * 16 + 4 + offset];
>> a[i * 16 + 5] = b[i * 16 + 5 + offset];
>> a[i * 16 + 6] = b[i * 16 + 6 + offset];
>> a[i * 16 + 7] = b[i * 16 + 7 + offset];
>> a[i * 16 + 8] = b[i * 16 + 8 + offset];
>> a[i * 16 + 9] = b[i * 16 + 9 + offset];
>> a[i * 16 + 10] = b[i * 16 + 10 + offset];
>> a[i * 16 + 11] = b[i * 16 + 11 + offset];
>> a[i * 16 + 12] = b[i * 16 + 12 + offset];
>> a[i * 16 + 13] = b[i * 16 + 13 + offset];
>> a[i * 16 + 14] = b[i * 16 + 14 + offset];
>> a[i * 16 + 15] = b[i * 16 + 15 + offset];
>> }
>> }
>> }
>>
>>
>>
>> Before:
>>
>> C2 Compile Time: 14.588 s
>> ...
>> IdealLoop: 13.670 s
>> AutoVectorize: 11.703 s```
>>
>> After:
>>
>> C2 Compile Time: 14.468 s
>> ...
>> IdealLoop: 13.595 s
>> ...
>
> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 24 additional commits since the last revision:
>
> - Merge branch 'master' into JDK-8325651
> - rm trailing whitespaces from applied suggestion
> - Apply from Christian's suggestions
>
> Co-authored-by: Christian Hagedorn <christian.hagedorn at oracle.com>
> - remove body() accessor from VLoopDependencyGraph, use field directly
> - _depth -> _depths for Christian
> - add_node change for Christian
> - missing string Extra -> Memory change
> - rename extra -> memory
> - typo
> - fix depth of Phi node
> - ... and 14 more: https://git.openjdk.org/jdk/compare/75213358...d89119e1
Thanks for the updates! Looks good.
-------------
Marked as reviewed by chagedorn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/17812#pullrequestreview-1927050290
More information about the hotspot-compiler-dev
mailing list