RFR: 8284981: Support the vectorization of some counting-down loops in SLP [v2]
Fei Gao
fgao at openjdk.java.net
Sat Apr 30 03:45:41 UTC 2022
On Fri, 29 Apr 2022 23:40:33 GMT, Vladimir Kozlov <kvn at openjdk.org> wrote:
>> Fei Gao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:
>>
>> - Add an IR testcase
>>
>> Change-Id: If67d200754ed5a579510b46041b2ba8c3c4db22e
>> - Merge branch 'master' into fg8284981
>>
>> Change-Id: I1bc92486ecc0da8917131cc55e9c5694d3c3eae5
>> - 8284981: Support the vectorization of some counting-down loops in SLP
>>
>> SLP can vectorize basic counting-down or counting-up loops. But
>> for the counting-down loop below, in which the array index scale
>> is negative and the index starts from a constant value, SLP fails
>> to vectorize.
>>
>> ```
>> private static final int SIZE = 2345;
>> private static int[] a = new int[SIZE];
>> private static int[] b = new int[SIZE];
>>
>> public static void bar() {
>>     for (int i = 1000; i > 0; i--) {
>>         b[SIZE - i] = a[SIZE - i];
>>     }
>> }
>> ```
>>
>> Generally, it's necessary to find adjacent memory operations, i.e.
>> load/store, after unrolling in SLP. Constructing SWPointers[1] for
>> all memory operations is a key step to determine if these memory
>> operations are adjacent. To construct a SWPointer successfully,
>> SLP should first recognize the pattern of the memory address and
>> normalize it. The address pattern of the memory operations in the
>> case above can be visualized as:
>>
>>             Phi
>>             /
>>   ConL  ConvI2L
>>      \   /
>>      SubL   ConI
>>         \   /
>>        LShiftL
>>
>> which is equivalent to `(N - (long) i) << 2`. SLP recursively
>> resolves the address mode in SWPointer::scaled_iv_plus_offset().
>> For subtraction it accepts only `SubI`, so when it reaches the
>> `SubL` node it rejects the pattern of the case above[2]. As a
>> result, SLP can't construct valid SWPointers for these memory
>> operations and vectorization is abandoned.
>>
>> A pattern like `(N - (long) i) << 2` is regular and easy to
>> resolve. This patch adds handling for the SubL pattern so that
>> counting-down loops like the case above can be vectorized.
>>
>> After the patch, the loop code generated for the case above on
>> aarch64 looks like this:
>> ```
>> LOOP: mov w10, w12
>>       sxtw x12, w10
>>       neg x0, x12
>>       lsl x0, x0, #2
>>       add x1, x17, x0
>>       ldr q16, [x1, x2]
>>       add x0, x18, x0
>>       str q16, [x0, x2]
>>       ldr q16, [x1, x13]
>>       str q16, [x0, x13]
>>       ldr q16, [x1, x14]
>>       str q16, [x0, x14]
>>       ldr q16, [x1, x15]
>>       sub x12, x11, x12
>>       lsl x12, x12, #2
>>       add x3, x17, x12
>>       str q16, [x0, x15]
>>       ldr q16, [x3, x2]
>>       add x12, x18, x12
>>       str q16, [x12, x2]
>>       ldr q16, [x1, x16]
>>       str q16, [x0, x16]
>>       ldr q16, [x3, x14]
>>       str q16, [x12, x14]
>>       ldr q16, [x3, x15]
>>       str q16, [x12, x15]
>>       sub w12, w10, #0x20
>>       cmp w12, #0x1f
>>       b.gt LOOP
>> ```
>>
>> This patch also works on x86 SIMD machines. We ran the full jtreg
>> suite on both aarch64 and x86 platforms, and all tests passed.
>>
>> [1] https://github.com/openjdk/jdk/blob/b56df2808d79dcc1e2d954fe38dd84228c683e8b/src/hotspot/share/opto/superword.cpp#L3826
>> [2] https://github.com/openjdk/jdk/blob/b56df2808d79dcc1e2d954fe38dd84228c683e8b/src/hotspot/share/opto/superword.cpp#L3953
>>
>> Change-Id: Ifcd8f8351ec5b4f7676e6ef134d279a67358b0fb
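
The SubL handling described in the quoted commit message can be pictured with a toy model. The Java sketch below is not HotSpot's SWPointer code: the class `SubLSketch`, its `Node` and `Linear` types, and the `acceptSubL` flag are all hypothetical names made up for illustration. It only shows the idea of recursively reducing `(N - (long) i) << 2` to the form `scale * i + offset`, and how rejecting the long subtraction makes the whole resolution fail.

```java
/** Toy model only; the names here are illustrative and do not exist in HotSpot. */
public final class SubLSketch {
    /** Node kinds named after the C2 nodes in the graph from the commit message. */
    enum Kind { IV, CON, CONV_I2L, SUB_L, LSHIFT_L }

    static final class Node {
        final Kind kind;
        final long value;   // only meaningful for CON
        final Node left;
        final Node right;

        Node(Kind kind, long value, Node left, Node right) {
            this.kind = kind;
            this.value = value;
            this.left = left;
            this.right = right;
        }

        static Node iv() { return new Node(Kind.IV, 0, null, null); }
        static Node con(long v) { return new Node(Kind.CON, v, null, null); }
        static Node convI2L(Node in) { return new Node(Kind.CONV_I2L, 0, in, null); }
        static Node subL(Node a, Node b) { return new Node(Kind.SUB_L, 0, a, b); }
        static Node lshiftL(Node a, Node shift) { return new Node(Kind.LSHIFT_L, 0, a, shift); }
    }

    /** Normalized address expression: scale * iv + offset (in bytes). */
    static final class Linear {
        final long scale;
        final long offset;

        Linear(long scale, long offset) {
            this.scale = scale;
            this.offset = offset;
        }

        long at(long iv) { return scale * iv + offset; }
    }

    /**
     * Recursively normalizes an address expression.
     * Returns null when a node is not recognized; acceptSubL == false
     * models the behavior before the patch (assumption of this sketch).
     */
    static Linear resolve(Node n, boolean acceptSubL) {
        switch (n.kind) {
            case IV:
                return new Linear(1, 0);
            case CON:
                return new Linear(0, n.value);
            case CONV_I2L:
                return resolve(n.left, acceptSubL);
            case SUB_L: {
                if (!acceptSubL) {
                    return null; // long subtraction rejected: no pointer can be built
                }
                Linear a = resolve(n.left, acceptSubL);
                Linear b = resolve(n.right, acceptSubL);
                if (a == null || b == null) {
                    return null;
                }
                return new Linear(a.scale - b.scale, a.offset - b.offset);
            }
            case LSHIFT_L: {
                Linear a = resolve(n.left, acceptSubL);
                if (a == null || n.right.kind != Kind.CON) {
                    return null;
                }
                int shift = (int) n.right.value;
                return new Linear(a.scale << shift, a.offset << shift);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // (2345 - (long) i) << 2, with N = SIZE = 2345 from the example loop
        Node addr = Node.lshiftL(
                Node.subL(Node.con(2345), Node.convI2L(Node.iv())),
                Node.con(2));

        Linear l = resolve(addr, true);
        System.out.println(l.scale + " * i + " + l.offset); // -4 * i + 9380
        // Decrementing i by one moves the address forward by one int (4 bytes).
        System.out.println(l.at(999) - l.at(1000));         // 4
        System.out.println(resolve(addr, false));           // null
    }
}
```

With the subtraction accepted, the example address normalizes to a scale of -4 bytes per unit of `i`, so successive iterations of the counting-down loop touch adjacent `int` elements, which is the property the adjacency check needs.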
>
> Tobias's testing finished clean. You can integrate.
@vnkozlov @TobiHartmann , thanks for your review and test work :)
-------------
PR: https://git.openjdk.java.net/jdk/pull/8289