RFR: 8300258: C2: vectorization fails on simple ByteBuffer loop [v2]

Emanuel Peter epeter at openjdk.org
Wed Feb 22 10:36:07 UTC 2023


On Tue, 21 Feb 2023 08:26:59 GMT, Roland Westrelin <roland at openjdk.org> wrote:

>> The loop that doesn't vectorize is:
>> 
>> 
>> public static void testByteLong4(byte[] dest, long[] src, int start, int stop) {
>>     for (int i = start; i < stop; i++) {
>>         UNSAFE.putLongUnaligned(dest, 8 * i + baseOffset, src[i]);
>>     }
>> }
>> 
>> 
>> It's from a micro-benchmark in the panama
>> repo. `SuperWord::find_adjacent_refs() `prevents it from vectorizing
>> because it finds it cannot properly align the loop and, from the
>> comment in the code, that:
>> 
>> 
>> // Can't allow vectorization of unaligned memory accesses with the
>> // same type since it could be overlapped accesses to the same array.
>> 
>> 
>> The test for "same type" is implemented by looking at the memory
>> operation type which in this case is overly conservative as the loop
>> above is reading and writing with long loads/stores but from and to
>> arrays of different types that can't overlap. Actually, with such
>> mismatched accesses, it's also likely an incorrect test (reading and
>> writing could be to the same array with loads/stores that use
>> different operand size) eventhough I couldn't write a test case that
>> would trigger an incorrect execution.
>> 
>> As a fix, I propose implementing the "same type" test by looking at
>> memory aliases instead.
>
> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision:
> 
>  - comments
>  - extra test
>  - more
>  - Merge branch 'master' into JDK-8300258
>  - review
>  - more
>  - fix & test

I played a bit around, trying to find an example to trigger the assert. It is not very easy.

I found another related problem though (which may be the reason why I cannot find that assert-trigger-example for you now).
https://github.com/openjdk/jdk/blob/6751978122afc5102db84c7f040a0b0ab76e70b6/src/hotspot/share/opto/superword.cpp#L1268-L1271

https://github.com/openjdk/jdk/blob/6751978122afc5102db84c7f040a0b0ab76e70b6/src/hotspot/share/opto/superword.cpp#L629-L641

`isomorphic` would allow memops from different slices to be considered for alignment at the same time. I think we could an additional restriction that only allows memops to be isomorphic if they are also from the same slice.

For this example, I see that the `LoadF` for the `char` and the `byte` array are processed at the same time, for `mem_ref` `466  LoadF  === 500 503 467  [[ 465 ]]  @byte[int:>=0]`.

So far I have not yet been able to reproduce a bug, but I think it should be possible.


./java -XX:-TieredCompilation -Xbatch --add-modules java.base --add-exports java.base/jdk.internal.misc=ALL-UNNAMED -XX:CompileCommand=compileonly,Test::test -XX:CompileCommand=printcompilation,Test::test -XX:+TraceSuperWord Test.java



import jdk.internal.misc.Unsafe;

public class Test {
    static int N = 100;
    static int M = 100*8;
    static byte goldB[] = new byte[M];
    static char goldC[] = new char[M];

    static Unsafe unsafe = Unsafe.getUnsafe();

    public static void main(String[] strArr) {
        init(goldB,goldC);
        test(goldB,goldC);
        for (int i = 0; i < 10_000; i++){
            byte[] dataB = new byte[M];
            char[] dataC = new char[M];
            init(dataB,dataC);
            test(dataB,dataC);
            verify(dataB, goldB);
            verify(dataC, goldC);
        }
    }

    static void test(byte[] dataB, char[] dataC) {
        for (int i = 2; i < N-2; i++) {
            float w = unsafe.getFloat(dataC, unsafe.ARRAY_CHAR_BASE_OFFSET + i*4);
            unsafe.putFloat(dataC, unsafe.ARRAY_CHAR_BASE_OFFSET + i*4, w + 5);
            float v = unsafe.getFloat(dataB, unsafe.ARRAY_BYTE_BASE_OFFSET + i*4);
            unsafe.putInt(dataB, unsafe.ARRAY_BYTE_BASE_OFFSET + i*4, (int)(v + 5));
        }
    }

    static void init(byte[] dataB, char[] dataC) {
        for (int j = 0; j < M; j++) {
            dataB[j] = (byte)j;
            dataC[j] = (char)j;
        }
    }

    static void verify(byte[] data, byte[] gold) {
        for (int i = 0; i < M; i++) {
            if (data[i] != gold[i]) {
                throw new RuntimeException(" Invalid result: data[" + i + "]: " + data[i] + " != " + gold[i]);
            }
        }
    }

    static void verify(char[] data, char[] gold) {
        for (int i = 0; i < M; i++) {
            if (data[i] != gold[i]) {
                throw new RuntimeException(" Invalid result: data[" + i + "]: " + data[i] + " != " + gold[i]);
            }
        }
    }
}


I will look into this again once we are done fixing this bug here, and mine in https://github.com/openjdk/jdk/pull/12350.
I think we should be able to break things like this: We have `StoreF` for two different memory slices. They get aligned to the same memref, and pass for this one. But the second one that we smuggle along actually has a conflict on its own slice. But we only do the checks for the first slice. Currently, I cannot find a reproducer without triggering either my bug or the one here at the same time.

So I am noting this here so that I can remember later what to investigate.

-------------

PR: https://git.openjdk.org/jdk/pull/12440


More information about the hotspot-compiler-dev mailing list