RFR: JDK-8277178: Reduce the priority of data dependent nodes when OptoScheduling enabled [v4]

Thu Jan 13 07:48:23 UTC 2022

On Thu, 13 Jan 2022 07:15:10 GMT, SUN Guoyun <duke at openjdk.java.net> wrote:

>> When doing GCM/LCM, we should consider not only the height (latency) of nodes but also whether there is a data dependency between them. When two nodes have a data dependency and the latency of the first node is large, a node without that dependency can be scheduled between the two. For example, for this Java code:
>> <pre><code class="java">
>>     public static final double fval = 2.00;
>>     public static double[] A = new double[N];
>>     public static int[] B = new int[N];
>> 
>>     public static void testP() {
>>         for (int i = 0; i < N; i++) {
>>             A[i] += A[i] * fval;
>>             B[i] += B[i] + 2;
>>         }
>>     }
>> </code></pre>
>> 
>> With `-XX:+OptoScheduling` on aarch64, the generated sequence is:
>> <pre><code class="shell">
>> 190     B15: #	out( B15 B16 ) <- in( B14 B15 ) Loop( B15-B15 inner main of N118 strip mined) Freq: 9.9999e+11
>> 190     sxtw  R13, R15	# i2l
>> 194 +   add R14, R17, R13, LShiftL #3	# ptr
>> 198     ldrd  V16, [R14, #16]	# double
>> 19c +   fmuld   V18, V16, V17
>> 1a0 +   faddd   V16, V18, V16
>> 1a4     strd  V16, [R14, #16]	# double
>> 1a8 +   add R13, R0, R13, LShiftL #2	# ptr
>> 1ac +   ldrw  R1, [R13, #16]	# int
>> 1b0 +   addw  R14, R1, R1
>> 1b4 +   addw R1, R14, #2
>> 1b8 +   addw R15, R15, #1
>> 1bc     strw  R1, [R13, #16]	# int
>> 1c0 +   cmpw  R15, R12
>> 1c4     blt B15 	// counted loop end  P=1.000000 C=40960.000000
>> </code></pre>
>> 
>> A more efficient sequence would be:
>> <pre><code class="shell">
>> 190     B15: #	out( B15 B16 ) <- in( B14 B15 ) Loop( B15-B15 inner main of N118 strip mined) Freq: 9.9999e+11
>> 190     sxtw  R13, R14	# i2l
>> 194     add R15, R17, R13, LShiftL #3	# ptr
>> 198     add R13, R0, R13, LShiftL #2	# ptr
>> 19c     ldrd  V16, [R15, #16]	# double
>> 1a0     ldrw  R2, [R13, #16]	# int
>> 1a4     fmuld   V18, V16, V17
>> 1a8     addw  R1, R2, R2
>> 1ac     faddd   V16, V18, V16
>> 1b0     strd  V16, [R15, #16]	# double
>> 1b4     addw R1, R1, #2
>> 1b8     strw  R1, [R13, #16]	# int
>> 1bc     addw R14, R14, #1
>> 1c0     cmpw  R14, R12
>> 1c4     blt B15 	// counted loop end  P=1.000000 C=40960.000000
>> </code></pre>
>> 
>> The same problem exists on the MIPS architecture. This patch fixes it; please help review it.
>> Thanks
>
> SUN Guoyun has updated the pull request incrementally with one additional commit since the last revision:
> 
>   8277178: Reduce the priority of data dependent nodes when OptoScheduling enabled
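
To make the idea in the description concrete, here is a toy list scheduler. It is only a sketch under simplifying assumptions, not the C2 code in the patch: latencies are ignored, the loop increment, compare, and branch are left out, and the class name, method names, and instruction labels are made up for illustration. Among the ready instructions it prefers one that does not consume the result of the instruction scheduled immediately before it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: not C2's scheduler, and latencies are ignored.
class ToyScheduler {
    // Maps each (hypothetical) instruction label to the labels whose results it consumes.
    // Insertion order mirrors the original, unscheduled instruction order.
    static Map<String, List<String>> loopBody() {
        Map<String, List<String>> deps = new LinkedHashMap<>();
        deps.put("sxtw", List.of());
        deps.put("add_A", List.of("sxtw"));
        deps.put("ldrd", List.of("add_A"));
        deps.put("fmuld", List.of("ldrd"));
        deps.put("faddd", List.of("fmuld", "ldrd"));
        deps.put("strd", List.of("faddd", "add_A"));
        deps.put("add_B", List.of("sxtw"));
        deps.put("ldrw", List.of("add_B"));
        deps.put("addw_1", List.of("ldrw"));
        deps.put("addw_2", List.of("addw_1"));
        deps.put("strw", List.of("addw_2", "add_B"));
        return deps;
    }

    static List<String> schedule(Map<String, List<String>> deps) {
        List<String> order = new ArrayList<>();
        String last = null; // instruction scheduled in the previous step
        while (order.size() < deps.size()) {
            String fallback = null; // first ready instruction, used if all depend on 'last'
            String choice = null;
            for (Map.Entry<String, List<String>> e : deps.entrySet()) {
                String node = e.getKey();
                // Skip instructions already scheduled or still waiting on an input.
                if (order.contains(node) || !order.containsAll(e.getValue())) {
                    continue;
                }
                if (fallback == null) {
                    fallback = node;
                }
                // Prefer a ready instruction that does not read the previous result.
                if (last == null || !e.getValue().contains(last)) {
                    choice = node;
                    break;
                }
            }
            if (choice == null) {
                choice = fallback;
            }
            if (choice == null) {
                throw new IllegalStateException("cyclic dependencies");
            }
            order.add(choice);
            last = choice;
        }
        return order;
    }
}
```

Calling `ToyScheduler.schedule(ToyScheduler.loopBody())` interleaves the double and int chains instead of emitting them back to back, which is the shape of the "more efficient" sequence in the description. The real heuristic also has to weigh latency, which this sketch does not model.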

test/micro/org/openjdk/bench/vm/compiler/InstructionScheduling.java line 36:

> 34: 
> 35:     @Benchmark
> 36:     public void testMethod(){

`(){` => `() {`

test/micro/org/openjdk/bench/vm/compiler/InstructionScheduling.java line 40:

> 38:             D[i] += D[i] * fval;
> 39:             D[i] += D[i] / fval;
> 40:             I[i] += I[i] * 2;

Should we define a variable for `2` like the one for `2.00`?
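
A rough sketch of what I mean, with the JMH annotations left out so it compiles on its own. `ival`, `N`, its value, and the class name are placeholders of mine, not names taken from the patch:

```java
// Standalone sketch; the real benchmark would keep its JMH annotations.
class InstructionSchedulingSketch {
    static final int N = 1024;          // placeholder size, not from the patch
    static final double fval = 2.00;
    static final int ival = 2;          // hypothetical named constant replacing the literal 2
    static double[] D = new double[N];
    static int[] I = new int[N];

    static void testMethod() {
        for (int i = 0; i < N; i++) {
            D[i] += D[i] * fval;
            D[i] += D[i] / fval;
            I[i] += I[i] * ival;
        }
    }
}
```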

-------------

PR: https://git.openjdk.java.net/jdk/pull/6407