RFR: 8300258: C2: vectorization fails on simple ByteBuffer loop
Emanuel Peter
epeter at openjdk.org
Fri Feb 10 14:57:42 UTC 2023
On Mon, 6 Feb 2023 14:15:19 GMT, Roland Westrelin <roland at openjdk.org> wrote:
> The loop that doesn't vectorize is:
>
>
>     public static void testByteLong4(byte[] dest, long[] src, int start, int stop) {
>         for (int i = start; i < stop; i++) {
>             UNSAFE.putLongUnaligned(dest, 8 * i + baseOffset, src[i]);
>         }
>     }
>
>
> It's from a micro-benchmark in the panama
> repo. `SuperWord::find_adjacent_refs()` prevents it from vectorizing
> because it finds it cannot properly align the loop and, from the
> comment in the code, that:
>
>
> // Can't allow vectorization of unaligned memory accesses with the
> // same type since it could be overlapped accesses to the same array.
>
>
> The test for "same type" is implemented by looking at the memory
> operation type which in this case is overly conservative as the loop
> above is reading and writing with long loads/stores but from and to
> arrays of different types that can't overlap. Actually, with such
> mismatched accesses, it's also likely an incorrect test (reading and
> writing could be to the same array with loads/stores that use
> different operand sizes) even though I couldn't write a test case that
> would trigger an incorrect execution.
>
> As a fix, I propose implementing the "same type" test by looking at
> memory aliases instead.
I have an example that fails before and after your fix.
Reduced2.java
import jdk.internal.misc.Unsafe;

public class Reduced2 {
    static int N = 50;
    static int gold[] = new int[N];
    static Unsafe unsafe = Unsafe.getUnsafe();

    public static void main(String[] strArr) {
        init(gold);
        test(gold);
        for (int i = 0; i < 10_000; i++) {
            int[] data = new int[N];
            init(data);
            test(data);
            verify(data, gold);
        }
    }

    static void test(int[] data) {
        for (int i = 2; i < N-2; i++) {
            int v = data[i];
            unsafe.putFloat(data, unsafe.ARRAY_BYTE_BASE_OFFSET + 4 * i + 8, v + 5);
        }
    }

    static void init(int[] data) {
        for (int j = 0; j < N; j++) {
            data[j] = j;
        }
    }

    static void verify(int[] data, int[] gold) {
        for (int i = 0; i < N; i++) {
            if (data[i] != gold[i]) {
                throw new RuntimeException(" Invalid result: dataI[" + i + "]: " + data[i] + " != " + gold[i]);
            }
        }
    }
}
Launch it with:
`./java -XX:-TieredCompilation -Xbatch --add-modules java.base --add-exports java.base/jdk.internal.misc=ALL-UNNAMED Reduced2.java`
The assert hits. Add `-Xint` to the command line and it passes.
This happens with your patch:
`StoreF` is looked at first -> `best_align_to_mem_ref`.
Then we look at `LoadI`. We see a misalignment with `StoreF`, so we reject the `LoadI`.
We then go on to remove all packs that have the same type (`int`).
The `StoreF` is never removed.
Eventually, in `extend_packlist`, the packs extend `use -> def` all the way back up, via `StoreF <- ConvI2F <- AddI <- LoadI`.
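
To illustrate why packing this chain is unsafe, here is a rough simulation in plain Java. It is only a sketch and not what C2 actually emits. The class `OverlapSketch` and its methods are hypothetical names, and it assumes the byte and int array base offsets are equal, so that the `putFloat` in `test` lands on int slot `i + 2`. `scalar` replays the loop one iteration at a time. `packed` mimics a vector of `width` lanes by doing all loads of a pack before any of its stores.

```java
import java.util.Arrays;

public class OverlapSketch {
    static final int N = 50;

    // Same initial contents as init() in Reduced2.
    static int[] init() {
        int[] data = new int[N];
        for (int j = 0; j < N; j++) {
            data[j] = j;
        }
        return data;
    }

    // Scalar order: iteration i reads data[i] and stores the float bits of
    // (data[i] + 5) two int slots further on (assumed: +8 bytes == +2 ints).
    static void scalar(int[] data) {
        for (int i = 2; i < N - 2; i++) {
            data[i + 2] = Float.floatToRawIntBits((float) (data[i] + 5));
        }
    }

    // Simulated pack of `width` lanes: all loads happen before any store.
    static void packed(int[] data, int width) {
        int i = 2;
        for (; i + width <= N - 2; i += width) {
            int[] lanes = new int[width];
            for (int k = 0; k < width; k++) {
                lanes[k] = data[i + k];
            }
            for (int k = 0; k < width; k++) {
                data[i + k + 2] = Float.floatToRawIntBits((float) (lanes[k] + 5));
            }
        }
        // Leftover iterations run in scalar order.
        for (; i < N - 2; i++) {
            data[i + 2] = Float.floatToRawIntBits((float) (data[i] + 5));
        }
    }

    public static void main(String[] args) {
        int[] expected = init();
        scalar(expected);
        for (int width : new int[] {2, 4}) {
            int[] actual = init();
            packed(actual, width);
            System.out.println("width " + width + " matches scalar: "
                    + Arrays.equals(expected, actual));
        }
    }
}
```

With 2 lanes the simulated pack agrees with the scalar loop, because no lane reads a slot that the same pack writes. With 4 lanes the third lane reads `data[i + 2]` before the first lane's store reaches it, so the results diverge (first mismatch at index 6). That is the read-after-write dependence at distance 2 that the `StoreF <- ConvI2F <- AddI <- LoadI` packs ignore.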
-------------
PR: https://git.openjdk.org/jdk/pull/12440