RFR: 8300258: C2: vectorization fails on simple ByteBuffer loop
Roland Westrelin
roland at openjdk.org
Tue Feb 21 08:26:59 UTC 2023
On Fri, 10 Feb 2023 14:55:04 GMT, Emanuel Peter <epeter at openjdk.org> wrote:
>> The loop that doesn't vectorize is:
>>
>>
>> public static void testByteLong4(byte[] dest, long[] src, int start, int stop) {
>>     for (int i = start; i < stop; i++) {
>>         UNSAFE.putLongUnaligned(dest, 8 * i + baseOffset, src[i]);
>>     }
>> }
>>
>>
>> It's from a micro-benchmark in the panama
>> repo. `SuperWord::find_adjacent_refs()` prevents it from vectorizing
>> because it cannot properly align the loop, and, per the
>> comment in the code:
>>
>>
>> // Can't allow vectorization of unaligned memory accesses with the
>> // same type since it could be overlapped accesses to the same array.
>>
>>
>> The test for "same type" is implemented by looking at the memory
>> operation type which in this case is overly conservative as the loop
>> above is reading and writing with long loads/stores but from and to
>> arrays of different types that can't overlap. Actually, with such
>> mismatched accesses, it's also likely an incorrect test (reading and
>> writing could be to the same array with loads/stores that use
>> different operand sizes), even though I couldn't write a test case that
>> would trigger an incorrect execution.
>>
>> As a fix, I propose implementing the "same type" test by looking at
>> memory aliases instead.
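The following minimal Java sketch shows how the two checks differ. It uses a hypothetical `MemOp` record, which is not a C2 type. `opType` is the operand type of the load or store. `slice` stands in for its memory alias class, assumed here to follow the array type.

For the `testByteLong4` accesses, the type check reports a possible conflict and the slice check does not. For a mismatched int load and float store on the same `int[]`, it is the other way round.

```java
// Toy model (hypothetical names, not C2's data structures) contrasting two
// ways of deciding whether two memory operations may conflict.
public class AliasCheckToy {
    // opType: the operand type the load/store itself uses.
    // slice: the alias class of the memory it touches (assumed: the array type).
    record MemOp(String name, String opType, String slice) {}

    // "Same type" test: compares the memory-operation types.
    static boolean sameType(MemOp a, MemOp b) {
        return a.opType().equals(b.opType());
    }

    // Proposed test: compares the memory alias slices.
    static boolean sameSlice(MemOp a, MemOp b) {
        return a.slice().equals(b.slice());
    }

    public static void main(String[] args) {
        // testByteLong4: long load from long[], long store into byte[].
        MemOp load = new MemOp("load src[i]", "long", "long[]");
        MemOp store = new MemOp("putLongUnaligned(dest)", "long", "byte[]");
        System.out.println(sameType(load, store) + " " + sameSlice(load, store)); // prints "true false"

        // Mismatched access: int load and float store on the same int[].
        MemOp loadI = new MemOp("load data[i]", "int", "int[]");
        MemOp storeF = new MemOp("putFloat(data)", "float", "int[]");
        System.out.println(sameType(loadI, storeF) + " " + sameSlice(loadI, storeF)); // prints "false true"
    }
}
```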
>
> I have an example that fails before and after your fix.
>
> Idea:
> Load a value from the array as `int` and store it as `float` with Unsafe. This creates a cyclic dependency that `independent(s1, s2)` on two adjacent memops does not detect (hence the `+8` in `putFloat`).
>
> Reduced2.java
>
> import jdk.internal.misc.Unsafe;
>
> public class Reduced2 {
>     static int N = 50;
>     static int gold[] = new int[N];
>
>     static Unsafe unsafe = Unsafe.getUnsafe();
>
>     public static void main(String[] strArr) {
>         init(gold);
>         test(gold);
>         for (int i = 0; i < 10_000; i++) {
>             int[] data = new int[N];
>             init(data);
>             test(data);
>             verify(data, gold);
>         }
>     }
>
>     static void test(int[] data) {
>         for (int i = 2; i < N-2; i++) {
>             int v = data[i];
>             unsafe.putFloat(data, unsafe.ARRAY_BYTE_BASE_OFFSET + 4 * i + 8, v + 5);
>         }
>     }
>
>     static void init(int[] data) {
>         for (int j = 0; j < N; j++) {
>             data[j] = j;
>         }
>     }
>
>     static void verify(int[] data, int[] gold) {
>         for (int i = 0; i < N; i++) {
>             if (data[i] != gold[i]) {
>                 throw new RuntimeException(" Invalid result: dataI[" + i + "]: " + data[i] + " != " + gold[i]);
>             }
>         }
>     }
> }
>
>
> Launch it with:
> `./java -XX:-TieredCompilation -Xbatch --add-modules java.base --add-exports java.base/jdk.internal.misc=ALL-UNNAMED Reduced2.java`
> The check in `verify()` fails. Add `-Xint` to the command line and it passes.
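The failure can be illustrated by simulating the loop in `Reduced2.test` in plain Java, without Unsafe or C2. This sketch assumes the byte and int array base offsets are equal, so the float stored at byte offset `4 * i + 8` overwrites element `i + 2`.

Iterations `i` and `i + 1` never touch each other's elements, so a pairwise check on adjacent memops sees them as independent. A hypothetical 4-lane pack that loads all lanes before storing any, however, spans the distance-2 dependency and reads stale values. The `vectorized` helper is a simplified lane-wise model, not how C2 actually emits code.

```java
import java.util.Arrays;

// Plain-Java simulation of Reduced2.test. Assumption: byte and int array base
// offsets coincide, so the float at byte offset 4 * i + 8 lands in element i + 2.
public class CyclicDepToy {
    static final int N = 50;

    static int[] init() {
        int[] d = new int[N];
        for (int j = 0; j < N; j++) d[j] = j;
        return d;
    }

    // Scalar semantics: one float store per iteration, in program order.
    static int[] scalar() {
        int[] d = init();
        for (int i = 2; i < N - 2; i++) {
            d[i + 2] = Float.floatToRawIntBits((float) (d[i] + 5));
        }
        return d;
    }

    // Hypothetical lane-wise execution: per chunk, load all lanes, then store
    // all lanes; leftover iterations run as a scalar tail.
    static int[] vectorized(int lanes) {
        int[] d = init();
        int i = 2;
        for (; i + lanes <= N - 2; i += lanes) {
            int[] v = new int[lanes];
            for (int l = 0; l < lanes; l++) v[l] = d[i + l];
            for (int l = 0; l < lanes; l++) {
                d[i + l + 2] = Float.floatToRawIntBits((float) (v[l] + 5));
            }
        }
        for (; i < N - 2; i++) {
            d[i + 2] = Float.floatToRawIntBits((float) (d[i] + 5));
        }
        return d;
    }

    public static void main(String[] args) {
        // The dependency distance is 2 elements: widths up to 2 match the
        // scalar result, wider packs read stale values.
        for (int lanes : new int[] {1, 2, 4}) {
            System.out.println(lanes + " lanes: " + Arrays.equals(scalar(), vectorized(lanes)));
        }
        // prints "1 lanes: true", "2 lanes: true", "4 lanes: false"
    }
}
```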
>
> This happens with your patch:
> `StoreF` is looked at first -> `best_align_to_mem_ref`.
> Then we look at `StoreI`. We see a misalignment with `StoreF`, so we reject the `StoreI`.
> We then go on to remove all packs that have the same type (`int`).
> The `StoreF` is never removed.
> Eventually, in `extend_packlist`, the packs extend `use -> def` all the way back up, via `StoreF <- ConvI2F <- AddI <- LoadI`.
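The sequence above can be replayed on a toy graph. The `Node` record, `filterByType`, and `extend` are hypothetical simplifications, not C2's data structures or functions. The sketch also assumes the rejected `int` memop is the loop's `LoadI`. Filtering by element type keeps `StoreF`, and following `use -> def` edges from it brings `LoadI` back into the pack set.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy replay of the pack filtering and use -> def extension described above.
// Names mirror the review; the logic is a hypothetical simplification.
public class PackFilterToy {
    // name, element type, alias slice (null for non-memory nodes), input (def)
    record Node(String name, String veltType, String slice, Node def) {}

    static final Node LOAD_I = new Node("LoadI", "int", "int[]", null);
    static final Node ADD_I = new Node("AddI", "int", null, LOAD_I);
    static final Node CONV = new Node("ConvI2F", "float", null, ADD_I);
    static final Node STORE_F = new Node("StoreF", "float", "int[]", CONV);

    // Drop every memop whose element type matches the rejected one.
    static Set<Node> filterByType(List<Node> memops, Node rejected) {
        Set<Node> kept = new LinkedHashSet<>();
        for (Node m : memops) {
            if (!m.veltType().equals(rejected.veltType())) kept.add(m);
        }
        return kept;
    }

    // Extend use -> def from every surviving pack, adding each def reached.
    static Set<String> extend(Set<Node> packs) {
        Set<String> result = new LinkedHashSet<>();
        for (Node n : packs) {
            for (Node cur = n; cur != null; cur = cur.def()) result.add(cur.name());
        }
        return result;
    }

    public static void main(String[] args) {
        // StoreF is the alignment reference; LoadI is misaligned and rejected.
        Set<Node> packs = filterByType(List.of(STORE_F, LOAD_I), LOAD_I);
        System.out.println(extend(packs)); // prints "[StoreF, ConvI2F, AddI, LoadI]"
    }
}
```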
>
> With `master`, I get this:
> `find_adjacent_refs` does not check the misalignment between `StoreF` and `LoadI` because they have different `velt_type`s, so it schedules them and produces wrong results.
>
> **I see two solutions**
> Disallow taking `MemNodes` back in with `extend_packlist`. I am doing that in my change here (https://github.com/openjdk/jdk/pull/12350).
> But probably more straightforward: filter out all memops and packs that are in the same slice (instead of those with the same velt type).
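Both proposed fixes can be sketched on the same kind of toy graph. Again, the names are hypothetical and this is not C2 code. `extendWithoutMem` stops the `use -> def` walk at memory nodes, and `filterBySlice` removes every memop in the rejected node's alias slice. The sketch only shows which nodes each variant leaves in the pack set. It does not model C2's independence or scheduling checks, which decide whether the result is correct.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy comparison of the two fixes proposed above (hypothetical simplification).
public class PackFixesToy {
    // name, element type, alias slice (null for non-memory nodes), input (def)
    record Node(String name, String veltType, String slice, Node def) {
        boolean isMem() { return slice != null; }
    }

    static final Node LOAD_I = new Node("LoadI", "int", "int[]", null);
    static final Node ADD_I = new Node("AddI", "int", null, LOAD_I);
    static final Node CONV = new Node("ConvI2F", "float", null, ADD_I);
    static final Node STORE_F = new Node("StoreF", "float", "int[]", CONV);

    // Original filtering: drop memops with the rejected node's element type.
    static Set<Node> filterByType(List<Node> memops, Node rejected) {
        Set<Node> kept = new LinkedHashSet<>();
        for (Node m : memops) if (!m.veltType().equals(rejected.veltType())) kept.add(m);
        return kept;
    }

    // Solution 1 (sketch): never pull memory nodes back in while extending.
    static Set<String> extendWithoutMem(Set<Node> packs) {
        Set<String> result = new LinkedHashSet<>();
        for (Node n : packs) {
            result.add(n.name());
            for (Node cur = n.def(); cur != null && !cur.isMem(); cur = cur.def()) {
                result.add(cur.name());
            }
        }
        return result;
    }

    // Solution 2 (sketch): filter by alias slice instead of element type.
    static Set<Node> filterBySlice(List<Node> memops, Node rejected) {
        Set<Node> kept = new LinkedHashSet<>();
        for (Node m : memops) if (!m.slice().equals(rejected.slice())) kept.add(m);
        return kept;
    }

    public static void main(String[] args) {
        List<Node> memops = List.of(STORE_F, LOAD_I);
        // Type filtering keeps StoreF; solution 1 stops the walk before LoadI.
        System.out.println(extendWithoutMem(filterByType(memops, LOAD_I))); // prints "[StoreF, ConvI2F, AddI]"
        // Solution 2 drops StoreF too, leaving nothing to extend from.
        System.out.println(filterBySlice(memops, LOAD_I).isEmpty()); // prints "true"
    }
}
```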
@eme64 thanks for taking the time to do that. I now believe you're right, and I changed two more same-element-type checks to same-slice checks. I also added your test case and made the cleanup you suggested.
-------------
PR: https://git.openjdk.org/jdk/pull/12440
More information about the hotspot-compiler-dev
mailing list