RFR: 8323582: C2 SuperWord AlignVector: misaligned vector memory access with unaligned native memory

Emanuel Peter epeter at openjdk.org
Mon Jan 27 06:52:06 UTC 2025


Note: the approach with Predicates and Multiversioning prepares us well for Runtime Checks for Aliasing Analysis, see more below.

**Background**

With `-XX:+AlignVector`, all vector loads/stores must be aligned. We try to statically determine if we can always align the vectors. One condition is that the address `base` is already aligned. For arrays, we know that this always holds, because they are `ObjectAlignmentInBytes` aligned. But with native memory, the `base` is just some arbitrarily aligned pointer.
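To make the alignment condition concrete, here is a minimal sketch of what "the `base` is aligned" means for a raw address. The class name `AlignmentCheckSketch`, the helper `isAligned`, and the alignment value 8 are hypothetical illustrations, not code from the patch; the actual check lives inside C2.

```java
final class AlignmentCheckSketch {
    // Returns true if 'address' is a multiple of 'alignment'.
    // Assumes 'alignment' is a power of two (as object alignment values are).
    static boolean isAligned(long address, long alignment) {
        return (address & (alignment - 1)) == 0;
    }

    public static void main(String[] args) {
        long base = 0x1000L;          // hypothetical, nicely aligned pointer
        long alignment = 8;           // hypothetical alignment requirement in bytes
        System.out.println(isAligned(base, alignment));     // prints "true"
        System.out.println(isAligned(base + 1, alignment)); // prints "false"
    }
}
```

An array base always passes such a check, whereas a native pointer offset by a single byte does not.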

**Problem**

So far, we have just naively assumed that the `base` is always `ObjectAlignmentInBytes` aligned. But that does not hold for `native` memory segments: the `base` can also be unaligned. I constructed such an example, and with `-XX:+AlignVector -XX:+VerifyAlignVector` it hits the verification code.


    MemorySegment nativeAligned = Arena.ofAuto().allocate(RANGE * 4 + 1);
    MemorySegment nativeUnaligned = nativeAligned.asSlice(1);
    test3(nativeUnaligned);


When compiling the test method, we assume that the `nativeUnaligned.address()` is aligned - but it is not!

    static void test3(MemorySegment ms) {
        for (int i = 0; i < RANGE; i++) {
            long adr = i * 4L;
            int v = ms.get(ELEMENT_LAYOUT, adr);
            ms.set(ELEMENT_LAYOUT, adr, (int)(v + 1));
        }
    }


**Solution: Runtime Checks - Predicate and Multiversioning**

Of course, we could simply forbid vectorization whenever the `base` is `native`. But that would cause regressions: in most cases the `base` is in fact aligned, and we currently vectorize those cases. Since we cannot statically determine whether the `base` is aligned, we need a runtime check.

I came up with two options for where to place the runtime checks:
- A new "auto vectorization" Parse Predicate:
  - This only works when predicates are available.
  - If we fail the predicate, then we recompile without the predicate. That means we cannot add a check to the predicate any more, and we would have to do multiversioning at that point if we still want to have a vectorized loop.
- Multiversion the loop:
  - Create 2 copies of the loop (fast and slow loops).
  - The `fast_loop` can make speculative alignment assumptions, and add the corresponding check to the `multiversion_if`, which decides which loop we take.
  - In the `slow_loop`, we make no assumptions, which means we cannot vectorize, but we still compile - so even unaligned `base`s would end up with reasonably fast code.
  - We "stall" the `slow_loop` from optimizing until we have fully vectorized the `fast_loop`, and know that we actually are adding runtime checks to the `multiversion_if`, and we really need the `slow_loop`.
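The multiversioning idea can be sketched at the Java source level as follows. This is only an illustration of the shape of the transformation, not the C2 implementation: the class `MultiversionSketch`, the constant `ASSUMED_ALIGNMENT`, and the simulated `baseAddress` parameter are all hypothetical, and in reality the two loop copies are created on the compiler's IR.

```java
final class MultiversionSketch {
    // Hypothetical alignment requirement in bytes; the real value depends on
    // the platform and the VM flags.
    static final long ASSUMED_ALIGNMENT = 8;

    // Increments every element of 'data'. 'baseAddress' simulates the native
    // base pointer. Returns true if the fast version of the loop was taken.
    static boolean incrementAll(long baseAddress, int[] data) {
        // Stand-in for the multiversion_if: the speculative alignment check.
        if ((baseAddress & (ASSUMED_ALIGNMENT - 1)) == 0) {
            // fast_loop: the compiler may assume an aligned base here and
            // vectorize with aligned loads/stores.
            for (int i = 0; i < data.length; i++) {
                data[i] += 1;
            }
            return true;
        } else {
            // slow_loop: no alignment assumption, so no vectorization under
            // AlignVector, but still compiled code.
            for (int i = 0; i < data.length; i++) {
                data[i] += 1;
            }
            return false;
        }
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3};
        System.out.println(incrementAll(64L, data)); // aligned base: fast version
        System.out.println(incrementAll(65L, data)); // unaligned base: slow version
    }
}
```

Both versions compute the same result; only the assumptions the compiler is allowed to make differ.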

Hence, the goal is that we compile like this:
- First with predicate: if we are lucky we never see an unaligned `base`.
- If we fail the check at the predicate: deopt, next time do not use the predicate for that loop.
- When we recompile, we find no predicate, and instead multiversion the loop, so that we can compile both for aligned (vectorize) and unaligned (not vectorize) `base`.
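The intended sequence of compilations can be modeled as a small state machine. This is a rough simulation under stated assumptions: the class `CompilationStrategySketch`, the `Mode` enum, the returned strings, and the 8-byte alignment are all hypothetical and only mirror the three steps listed above.

```java
final class CompilationStrategySketch {
    enum Mode { PREDICATE, MULTIVERSION }

    // Current compilation state of the (single, simulated) loop.
    private Mode mode = Mode.PREDICATE;

    Mode mode() {
        return mode;
    }

    // Simulates one execution of the loop with the given base address and
    // reports what would happen.
    String run(long baseAddress) {
        boolean aligned = (baseAddress & 7) == 0; // hypothetical 8-byte requirement
        if (mode == Mode.PREDICATE) {
            if (aligned) {
                return "vectorized loop (predicate held)";
            }
            // Predicate failed: deoptimize, and recompile without the
            // predicate, which leads to multiversioning.
            mode = Mode.MULTIVERSION;
            return "deopt";
        }
        // After recompilation: both loop versions exist side by side.
        return aligned ? "fast_loop" : "slow_loop";
    }

    public static void main(String[] args) {
        CompilationStrategySketch loop = new CompilationStrategySketch();
        System.out.println(loop.run(64L)); // aligned, predicate holds
        System.out.println(loop.run(65L)); // unaligned, predicate fails
        System.out.println(loop.run(65L)); // now handled by the slow_loop
        System.out.println(loop.run(64L)); // aligned bases still get the fast_loop
    }
}
```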


**Future Work: Runtime Check for Aliasing Analysis**

See: [JDK-8324751](https://bugs.openjdk.org/browse/JDK-8324751): C2 SuperWord: Aliasing Analysis runtime check
This whole infrastructure with "auto vectorization" Parse Predicate and Multiversioning can be used when we implement Runtime Checks for Aliasing Analysis: We speculate that there is no aliasing. If the runtime check fails, we deopt at the predicate, or take the `slow_loop` for Multiversioning.
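For illustration only, a "no aliasing" runtime check for two memory ranges of equal length might look roughly like the sketch below. The class `AliasingCheckSketch` and the method `noOverlap` are hypothetical; the actual check for JDK-8324751 is not specified here and may look quite different. The sketch also ignores arithmetic overflow of the addresses.

```java
final class AliasingCheckSketch {
    // Returns true if the byte ranges [startA, startA + lengthInBytes) and
    // [startB, startB + lengthInBytes) do not overlap.
    // Assumes lengthInBytes >= 0 and that the sums do not overflow.
    static boolean noOverlap(long startA, long startB, long lengthInBytes) {
        return startA + lengthInBytes <= startB
            || startB + lengthInBytes <= startA;
    }

    public static void main(String[] args) {
        System.out.println(noOverlap(0L, 16L, 16L)); // prints "true": ranges are adjacent
        System.out.println(noOverlap(0L, 8L, 16L));  // prints "false": ranges overlap
    }
}
```

If such a check passes, the speculation holds and the vectorized code can run; if it fails, we would deopt at the predicate or take the `slow_loop`.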

-------------

Commit messages:
 - remove multiversion mark if we break the structure
 - register opaque with igvn
 - copyright and rm CFG check
 - IR rules for all cases
 - 3 test versions
 - test changed to unaligned ints
 - stub for slicing
 - add Verify/AlignVector runs to test
 - refactor verify
 - rm TODO
 - ... and 52 more: https://git.openjdk.org/jdk/compare/16dcf15a...c53985f6

Changes: https://git.openjdk.org/jdk/pull/22016/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22016&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8323582
  Stats: 1074 lines in 27 files changed: 951 ins; 28 del; 95 mod
  Patch: https://git.openjdk.org/jdk/pull/22016.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/22016/head:pull/22016

PR: https://git.openjdk.org/jdk/pull/22016

