RFR: JDK-8277178: Reduce the priority of data dependent nodes when OptoScheduling enabled [v6]
Vladimir Kozlov
kvn at openjdk.java.net
Thu Jan 20 16:01:56 UTC 2022
On Thu, 13 Jan 2022 09:02:06 GMT, SUN Guoyun <duke at openjdk.java.net> wrote:
>> When doing GCM/LCM, we should consider not only the height (latency) of nodes but also whether there is a data dependency between them. When two nodes have a data dependency and the latency of the first node is large, an independent node can be inserted between the two. For example:
>> For this Java code:
>> <pre><code class="java">
>> public static final double fval = 2.00;
>> public static double[] A = new double[N];
>> public static int[] B = new int[N];
>>
>> public static void testP() {
>>     for (int i = 0; i < N; i++) {
>>         A[i] += A[i] * fval;
>>         B[i] += B[i] + 2;
>>     }
>> }
>> </code></pre>
>>
>> When `-XX:+OptoScheduling` is used on AArch64, the generated sequence is:
>> <pre><code class="shell">
>> 190 B15: # out( B15 B16 ) <- in( B14 B15 ) Loop( B15-B15 inner main of N118 strip mined) Freq: 9.9999e+11
>> 190 sxtw R13, R15 # i2l
>> 194 + add R14, R17, R13, LShiftL #3 # ptr
>> 198 ldrd V16, [R14, #16] # double
>> 19c + fmuld V18, V16, V17
>> 1a0 + faddd V16, V18, V16
>> 1a4 strd V16, [R14, #16] # double
>> 1a8 + add R13, R0, R13, LShiftL #2 # ptr
>> 1ac + ldrw R1, [R13, #16] # int
>> 1b0 + addw R14, R1, R1
>> 1b4 + addw R1, R14, #2
>> 1b8 + addw R15, R15, #1
>> 1bc strw R1, [R13, #16] # int
>> 1c0 + cmpw R15, R12
>> 1c4 blt B15 // counted loop end P=1.000000 C=40960.000000
>> </code></pre>
>>
>> A more efficient sequence would be:
>> <pre><code class="shell">
>> 190 B15: # out( B15 B16 ) <- in( B14 B15 ) Loop( B15-B15 inner main of N118 strip mined) Freq: 9.9999e+11
>> 190 sxtw R13, R14 # i2l
>> 194 add R15, R17, R13, LShiftL #3 # ptr
>> 198 add R13, R0, R13, LShiftL #2 # ptr
>> 19c ldrd V16, [R15, #16] # double
>> 1a0 ldrw R2, [R13, #16] # int
>> 1a4 fmuld V18, V16, V17
>> 1a8 addw R1, R2, R2
>> 1ac faddd V16, V18, V16
>> 1b0 strd V16, [R15, #16] # double
>> 1b4 addw R1, R1, #2
>> 1b8 strw R1, [R13, #16] # int
>> 1bc addw R14, R14, #1
>> 1c0 cmpw R14, R12
>> 1c4 blt B15 // counted loop end P=1.000000 C=40960.000000
>> </code></pre>
>>
>> This problem also exists on the MIPS architecture. This patch fixes it. Please help review it.
>> Thanks
>
> SUN Guoyun has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision:
>
> 8277178: Reduce the priority of data dependent nodes when OptoScheduling enabled
I asked our performance experts, and they had already observed the variations I saw on AArch64. It seems they are not caused by your changes.
You did not answer my suggestion about adding a `C->do_scheduling()` check.
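
For readers following the thread, the heuristic under review can be pictured with a toy list scheduler. This is only a sketch, not the HotSpot LCM code: the class `ToyScheduler`, the node names, and the latencies are all hypothetical, and the priority rule (prefer a ready node that does not consume the result of the node just scheduled, then break ties by height) is a simplified stand-in for what the patch does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ToyScheduler {
    // A toy instruction: a name, an assumed latency in cycles, and the
    // indices of the nodes whose results it consumes.
    static final class Node {
        final String name; final int latency; final int[] inputs;
        Node(String name, int latency, int... inputs) {
            this.name = name; this.latency = latency; this.inputs = inputs;
        }
    }

    // Height = longest latency path from a node to the end of the block.
    // Nodes are assumed to be listed in topological order.
    static int[] heights(List<Node> nodes) {
        int[] h = new int[nodes.size()];
        for (int i = nodes.size() - 1; i >= 0; i--) {
            h[i] = Math.max(h[i], nodes.get(i).latency);
            for (int in : nodes.get(i).inputs) {
                h[in] = Math.max(h[in], h[i] + nodes.get(in).latency);
            }
        }
        return h;
    }

    static boolean ready(Node n, boolean[] done) {
        for (int in : n.inputs) if (!done[in]) return false;
        return true;
    }

    static boolean dependsOn(Node n, int producer) {
        for (int in : n.inputs) if (in == producer) return true;
        return false;
    }

    // Greedy list scheduling. With penalizeDependents set, a ready node that
    // consumes the result of the previously scheduled node loses to any ready
    // node that does not; height is the tie-breaker.
    static List<String> schedule(List<Node> nodes, boolean penalizeDependents) {
        int n = nodes.size();
        int[] h = heights(nodes);
        boolean[] done = new boolean[n];
        List<String> order = new ArrayList<>();
        int last = -1;
        for (int step = 0; step < n; step++) {
            int best = -1;
            boolean bestDep = false;
            for (int i = 0; i < n; i++) {
                if (done[i] || !ready(nodes.get(i), done)) continue;
                boolean dep = penalizeDependents && dependsOn(nodes.get(i), last);
                boolean better = best < 0
                        || (dep != bestDep ? !dep : h[i] > h[best]);
                if (better) { best = i; bestDep = dep; }
            }
            done[best] = true;
            order.add(nodes.get(best).name);
            last = best;
        }
        return order;
    }

    // Loosely modelled on the loop body quoted above; latencies are made up.
    static List<Node> loopBody() {
        return Arrays.asList(
            new Node("loadD", 4),        // 0
            new Node("fmul", 4, 0),      // 1
            new Node("fadd", 4, 1, 0),   // 2
            new Node("storeD", 1, 2),    // 3
            new Node("loadI", 4),        // 4
            new Node("addI", 1, 4),      // 5
            new Node("addI2", 1, 5),     // 6
            new Node("storeI", 1, 6));   // 7
    }

    public static void main(String[] args) {
        System.out.println("height only: " + schedule(loopBody(), false));
        System.out.println("penalized:   " + schedule(loopBody(), true));
    }
}
```

With these assumed latencies, the height-only order is loadD, fmul, loadI, fadd, addI, addI2, storeD, storeI, while the penalized order is loadD, loadI, fmul, addI, fadd, addI2, storeD, storeI, so the two dependency chains are interleaved from the start.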
-------------
PR: https://git.openjdk.java.net/jdk/pull/6407