RFR: 8300258: C2: vectorization fails on simple ByteBuffer loop [v2]
Emanuel Peter
epeter at openjdk.org
Wed Feb 22 10:36:07 UTC 2023
On Tue, 21 Feb 2023 08:26:59 GMT, Roland Westrelin <roland at openjdk.org> wrote:
>> The loop that doesn't vectorize is:
>>
>>
>> public static void testByteLong4(byte[] dest, long[] src, int start, int stop) {
>> for (int i = start; i < stop; i++) {
>> UNSAFE.putLongUnaligned(dest, 8 * i + baseOffset, src[i]);
>> }
>> }
>>
>>
>> It's from a micro-benchmark in the panama
>> repo. `SuperWord::find_adjacent_refs() `prevents it from vectorizing
>> because it finds it cannot properly align the loop and, from the
>> comment in the code, that:
>>
>>
>> // Can't allow vectorization of unaligned memory accesses with the
>> // same type since it could be overlapped accesses to the same array.
>>
>>
>> The test for "same type" is implemented by looking at the memory
>> operation type which in this case is overly conservative as the loop
>> above is reading and writing with long loads/stores but from and to
>> arrays of different types that can't overlap. Actually, with such
>> mismatched accesses, it's also likely an incorrect test (reading and
>> writing could be to the same array with loads/stores that use
>> different operand size) eventhough I couldn't write a test case that
>> would trigger an incorrect execution.
>>
>> As a fix, I propose implementing the "same type" test by looking at
>> memory aliases instead.
>
> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision:
>
> - comments
> - extra test
> - more
> - Merge branch 'master' into JDK-8300258
> - review
> - more
> - fix & test
I played a bit around, trying to find an example to trigger the assert. It is not very easy.
I found another related problem though (which may be the reason why I cannot find that assert-trigger-example for you now).
https://github.com/openjdk/jdk/blob/6751978122afc5102db84c7f040a0b0ab76e70b6/src/hotspot/share/opto/superword.cpp#L1268-L1271
https://github.com/openjdk/jdk/blob/6751978122afc5102db84c7f040a0b0ab76e70b6/src/hotspot/share/opto/superword.cpp#L629-L641
`isomorphic` would allow memops from different slices to be considered for alignment at the same time. I think we could an additional restriction that only allows memops to be isomorphic if they are also from the same slice.
For this example, I see that the `LoadF` for the `char` and the `byte` array are processed at the same time, for `mem_ref` `466 LoadF === 500 503 467 [[ 465 ]] @byte[int:>=0]`.
So far I have not yet been able to reproduce a bug, but I think it should be possible.
./java -XX:-TieredCompilation -Xbatch --add-modules java.base --add-exports java.base/jdk.internal.misc=ALL-UNNAMED -XX:CompileCommand=compileonly,Test::test -XX:CompileCommand=printcompilation,Test::test -XX:+TraceSuperWord Test.java
import jdk.internal.misc.Unsafe;
public class Test {
static int N = 100;
static int M = 100*8;
static byte goldB[] = new byte[M];
static char goldC[] = new char[M];
static Unsafe unsafe = Unsafe.getUnsafe();
public static void main(String[] strArr) {
init(goldB,goldC);
test(goldB,goldC);
for (int i = 0; i < 10_000; i++){
byte[] dataB = new byte[M];
char[] dataC = new char[M];
init(dataB,dataC);
test(dataB,dataC);
verify(dataB, goldB);
verify(dataC, goldC);
}
}
static void test(byte[] dataB, char[] dataC) {
for (int i = 2; i < N-2; i++) {
float w = unsafe.getFloat(dataC, unsafe.ARRAY_CHAR_BASE_OFFSET + i*4);
unsafe.putFloat(dataC, unsafe.ARRAY_CHAR_BASE_OFFSET + i*4, w + 5);
float v = unsafe.getFloat(dataB, unsafe.ARRAY_BYTE_BASE_OFFSET + i*4);
unsafe.putInt(dataB, unsafe.ARRAY_BYTE_BASE_OFFSET + i*4, (int)(v + 5));
}
}
static void init(byte[] dataB, char[] dataC) {
for (int j = 0; j < M; j++) {
dataB[j] = (byte)j;
dataC[j] = (char)j;
}
}
static void verify(byte[] data, byte[] gold) {
for (int i = 0; i < M; i++) {
if (data[i] != gold[i]) {
throw new RuntimeException(" Invalid result: data[" + i + "]: " + data[i] + " != " + gold[i]);
}
}
}
static void verify(char[] data, char[] gold) {
for (int i = 0; i < M; i++) {
if (data[i] != gold[i]) {
throw new RuntimeException(" Invalid result: data[" + i + "]: " + data[i] + " != " + gold[i]);
}
}
}
}
I will look into this again once we are done fixing this bug here, and mine in https://github.com/openjdk/jdk/pull/12350.
I think we should be able to break things like this: We have `StoreF` for two different memory slices. They get aligned to the same memref, and pass for this one. But the second one that we smuggle along actually has a conflict on its own slice. But we only do the checks for the first slice. Currently, I cannot find a reproducer without triggering either my bug or the one here at the same time.
So I am noting this here so that I can remember later what to investigate.
-------------
PR: https://git.openjdk.org/jdk/pull/12440
More information about the hotspot-compiler-dev
mailing list