RFR: 8284981: Support the vectorization of some counting-down loops in SLP [v2]

Fei Gao fgao at openjdk.java.net
Sat Apr 30 03:45:41 UTC 2022


On Fri, 29 Apr 2022 23:40:33 GMT, Vladimir Kozlov <kvn at openjdk.org> wrote:

>> Fei Gao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:
>> 
>>  - Add an IR testcase
>>    
>>    Change-Id: If67d200754ed5a579510b46041b2ba8c3c4db22e
>>  - Merge branch 'master' into fg8284981
>>    
>>    Change-Id: I1bc92486ecc0da8917131cc55e9c5694d3c3eae5
>>  - 8284981: Support the vectorization of some counting-down loops in SLP
>>    
>>    SLP can vectorize basic counting-down or counting-up loops. But
>>    for the counting-down loop below, in which array index scale
>>    is negative and index starts from a constant value, SLP can't
>>    succeed in vectorizing.
>>    
>>    ```
>>      private static final int SIZE = 2345;
>>      private static int[] a = new int[SIZE];
>>      private static int[] b = new int[SIZE];
>>    
>>      public static void bar() {
>>          for (int i = 1000; i > 0; i--) {
>>              b[SIZE - i] = a[SIZE - i];
>>          }
>>      }
>>    ```
>>    
>>    Generally, it's necessary to find adjacent memory operations, i.e.
>>    load/store, after unrolling in SLP. Constructing SWPointers[1] for
>>    all memory operations is a key step to determine if these memory
>>    operations are adjacent. To construct a SWPointer successfully,
>>    SLP should first recognize the pattern of the memory address and
>>    normalize it. The address pattern of the memory operations in the
>>    case above can be visualized as:
>>    
>>                 Phi
>>                 /
>>      ConL   ConvI2L
>>         \    /
>>          SubL   ConI
>>            \     /
>>            LShiftL
>>    
>>    which is equivalent to `(N - (long) i) << 2`. SLP recursively
>>    resolves the address mode by SWPointer::scaled_iv_plus_offset().
>>    When arriving at the `SubL` node, it accepts `SubI` only and finally
>>    rejects the pattern of the case above[2]. In this way, SLP can't
>>    construct effective SWPointers for these memory operations and
>>    the process of vectorization breaks off.
>>    
>>    The pattern like `(N - (long) i) << 2` is formal and easy to
>>    resolve. We add the pattern of SubL in the patch to vectorize
>>    counting-down loops like the case above.
>>    
>>    After the patch, generated loop code for above case is like below on
>>    aarch64:
>>    ```
>>    LOOP: mov   w10, w12
>>          sxtw  x12, w10
>>          neg   x0, x12
>>          lsl   x0, x0, #2
>>          add   x1, x17, x0
>>          ldr   q16, [x1, x2]
>>          add   x0, x18, x0
>>          str   q16, [x0, x2]
>>          ldr   q16, [x1, x13]
>>          str   q16, [x0, x13]
>>          ldr   q16, [x1, x14]
>>          str   q16, [x0, x14]
>>          ldr   q16, [x1, x15]
>>          sub   x12, x11, x12
>>          lsl   x12, x12, #2
>>          add   x3, x17, x12
>>          str   q16, [x0, x15]
>>          ldr   q16, [x3, x2]
>>          add   x12, x18, x12
>>          str   q16, [x12, x2]
>>          ldr   q16, [x1, x16]
>>          str   q16, [x0, x16]
>>          ldr   q16, [x3, x14]
>>          str   q16, [x12, x14]
>>          ldr   q16, [x3, x15]
>>          str   q16, [x12, x15]
>>          sub   w12, w10, #0x20
>>          cmp   w12, #0x1f
>>          b.gt  LOOP
>>    ```
>>    
>>    This patch also works on x86 simd machines. We tested full jtreg on both
>>    aarch64 and x86 platforms. All tests passed.
>>    
>>    [1] https://github.com/openjdk/jdk/blob/b56df2808d79dcc1e2d954fe38dd84228c683e8b/src/hotspot/share/opto/superword.cpp#L3826
>>    [2] https://github.com/openjdk/jdk/blob/b56df2808d79dcc1e2d954fe38dd84228c683e8b/src/hotspot/share/opto/superword.cpp#L3953
>>    
>>    Change-Id: Ifcd8f8351ec5b4f7676e6ef134d279a67358b0fb
>
> Tobias testing finished clean.  You can integrate.

@vnkozlov @TobiHartmann , thanks for your review and test work :)

-------------

PR: https://git.openjdk.java.net/jdk/pull/8289


More information about the hotspot-compiler-dev mailing list