RFR: JDK-8277178: Reduce the priority of data dependent nodes when OptoScheduling enabled

SUN Guoyun duke at openjdk.java.net
Tue Nov 16 09:16:59 UTC 2021


When doing global/local code motion (GCM/LCM), we should consider not only the height (latency) of nodes but also whether there is a data dependency between them. When two nodes have a data dependency and the first node has a long latency, another node with no data dependency on it can be scheduled between the two. For example, take the following Java code:
<pre><code class="java">
    public static final double fval = 2.00;
    public static double[] A = new double[N];
    public static int[] B = new int[N];

    public static void testP() {
        for (int i = 0; i < N; i++) {
            A[i] += A[i] * fval;
            B[i] += B[i] + 2;
        }
    }
</code></pre>

With `-XX:+OptoScheduling` on AArch64, the generated sequence is:
<pre><code class="shell">
190     B15: #	out( B15 B16 ) <- in( B14 B15 ) Loop( B15-B15 inner main of N118 strip mined) Freq: 9.9999e+11
190     sxtw  R13, R15	# i2l
194 +   add R14, R17, R13, LShiftL #3	# ptr
198     ldrd  V16, [R14, #16]	# double
19c +   fmuld   V18, V16, V17
1a0 +   faddd   V16, V18, V16
1a4     strd  V16, [R14, #16]	# double
1a8 +   add R13, R0, R13, LShiftL #2	# ptr
1ac +   ldrw  R1, [R13, #16]	# int
1b0 +   addw  R14, R1, R1
1b4 +   addw R1, R14, #2
1b8 +   addw R15, R15, #1
1bc     strw  R1, [R13, #16]	# int
1c0 +   cmpw  R15, R12
1c4     blt B15 	// counted loop end  P=1.000000 C=40960.000000
</code></pre>

A more efficient sequence would be:
<pre><code class="shell">
190     B15: #	out( B15 B16 ) <- in( B14 B15 ) Loop( B15-B15 inner main of N118 strip mined) Freq: 9.9999e+11
190     sxtw  R13, R14	# i2l
194     add R15, R17, R13, LShiftL #3	# ptr
198     add R13, R0, R13, LShiftL #2	# ptr
19c     ldrd  V16, [R15, #16]	# double
1a0     ldrw  R2, [R13, #16]	# int
1a4     fmuld   V18, V16, V17
1a8     addw  R1, R2, R2
1ac     faddd   V16, V18, V16
1b0     strd  V16, [R15, #16]	# double
1b4     addw R1, R1, #2
1b8     strw  R1, [R13, #16]	# int
1bc     addw R14, R14, #1
1c0     cmpw  R14, R12
1c4     blt B15 	// counted loop end  P=1.000000 C=40960.000000
</code></pre>
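
To illustrate why interleaving the two independent chains helps, here is a minimal toy model, not the code in this patch. It assumes a single-issue, in-order pipeline in which an instruction stalls until all of its inputs are ready. The class name `StallModel`, the instruction indices, and the latencies (1 cycle for integer ops and stores, 4 cycles for loads and floating-point ops) are hypothetical values chosen for illustration, not measurements of any real AArch64 core. The loop counter update, compare and branch are left out.

```java
public class StallModel {
    // Loop-body instructions from the example, indexed 0..10:
    //  0 sxtw        1 add (double ptr)   2 ldrd    3 fmuld   4 faddd   5 strd
    //  6 add (int ptr)   7 ldrw   8 addw (R1+R1)   9 addw (+2)   10 strw
    // Hypothetical result latencies in cycles (assumed, not measured).
    static final int[] LATENCY = {1, 1, 4, 4, 4, 1, 1, 4, 1, 1, 1};

    // DEPS[i] lists the instructions whose results instruction i consumes.
    static final int[][] DEPS = {
        {}, {0}, {1}, {2}, {2, 3}, {1, 4}, {0}, {6}, {7}, {8}, {6, 9}
    };

    // Returns the cycle at which the pipeline is free after issuing every
    // instruction in the given order, one per cycle at most, stalling
    // whenever an input is not yet available.
    static int cycles(int[] order) {
        int[] ready = new int[LATENCY.length]; // cycle each result becomes available
        int clock = 0;                         // earliest cycle the next insn may issue
        for (int idx : order) {
            int issue = clock;
            for (int dep : DEPS[idx]) {
                issue = Math.max(issue, ready[dep]); // wait for the producer
            }
            ready[idx] = issue + LATENCY[idx];
            clock = issue + 1;
        }
        return clock;
    }

    public static void main(String[] args) {
        // Order of the sequence generated with -XX:+OptoScheduling today.
        int[] chainByChain = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        // Order of the proposed sequence: the int chain fills the double chain's stalls.
        int[] interleaved = {0, 1, 6, 2, 7, 3, 8, 4, 5, 9, 10};
        System.out.println("chain by chain: " + cycles(chainByChain) + " cycles");
        System.out.println("interleaved:    " + cycles(interleaved) + " cycles");
    }
}
```

Under these assumed latencies the model reports 23 cycles for the chain-by-chain order and 18 cycles for the interleaved order. The absolute numbers mean nothing for real hardware; the point is only that independent instructions placed between a long-latency producer and its consumer absorb stall cycles.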

The same problem exists on the MIPS architecture. This patch fixes it. Please help review it.
Thanks

-------------

Commit messages:
 - 8277178: Reduce the priority of data dependent nodes when OptoScheduling enabled

Changes: https://git.openjdk.java.net/jdk/pull/6407/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6407&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8277178
  Stats: 41 lines in 2 files changed: 12 ins; 28 del; 1 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6407.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6407/head:pull/6407

PR: https://git.openjdk.java.net/jdk/pull/6407


More information about the hotspot-compiler-dev mailing list