RFR: 8323582: C2 SuperWord AlignVector: misaligned vector memory access with unaligned native memory

Roland Westrelin roland at openjdk.org
Tue Feb 18 09:35:16 UTC 2025


On Mon, 11 Nov 2024 14:40:09 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

> Note: the approach with Predicates and Multiversioning prepares us well for Runtime Checks for Aliasing Analysis, see more below.
> 
> **Background**
> 
> With `-XX:+AlignVector`, all vector loads/stores must be aligned. We try to statically determine if we can always align the vectors. One condition is that the address `base` is already aligned. For arrays, we know that this always holds, because they are `ObjectAlignmentInBytes` aligned. But with native memory, the `base` is just some arbitrarily aligned pointer.
> 
> **Problem**
> 
> So far, we have just naively assumed that the `base` is always `ObjectAlignmentInBytes` aligned. But that does not hold for `native` memory segments: the `base` can also be unaligned. I had constructed such an example, and with `-XX:+AlignVector -XX:+VerifyAlignVector` this example hits the verification code.
> 
> 
> MemorySegment nativeAligned = Arena.ofAuto().allocate(RANGE * 4 + 1);
> MemorySegment nativeUnaligned = nativeAligned.asSlice(1);
> test3(nativeUnaligned);
> 
> 
> When compiling the test method, we assume that the `nativeUnaligned.address()` is aligned - but it is not!
> 
>     static void test3(MemorySegment ms) {
>         for (int i = 0; i < RANGE; i++) {
>             long adr = i * 4L;
>             int v = ms.get(ELEMENT_LAYOUT, adr);
>             ms.set(ELEMENT_LAYOUT, adr, (int)(v + 1));
>         }
>     }
> 
> 
> **Solution: Runtime Checks - Predicate and Multiversioning**
> 
> Of course we could just forbid cases where we have a `native` base from vectorizing. But that would lead to regressions currently - in most cases we do get aligned `base`s, and we currently vectorize those. We cannot statically determine if the `base` is aligned, we need a runtime check.
> 
> I came up with 2 options where to place the runtime checks:
> - A new "auto vectorization" Parse Predicate:
>   - This only works when predicates are available.
>   - If we fail the predicate, then we recompile without the predicate. That means we cannot add a check to the predicate any more, and we would have to do multiversioning at that point if we still want to have a vectorized loop.
> - Multiversion the loop:
>   - Create 2 copies of the loop (fast and slow loops).
>   - The `fast_loop` can make speculative alignment assumptions, and add the corresponding check to the `multiversion_if` which decides which loop we take
>   - In the `slow_loop`, we make no assumption which means we can not vectorize, but we still compile - so even unaligned `base`s would end up with reasonably fast code.
>   - We "stall" the `...

Would it make sense to add verification code that makes sure that whenever a loop is flagged as multi version, c2 can find the multi version guard (and maybe whenever there's a multi version guard, loops that are guarded are indeed flagged correctly)?

src/hotspot/share/opto/loopTransform.cpp line 751:

> 749:         // Peeling also destroys the connection of the main loop
> 750:         // to the multiversion_if.
> 751:         cl->set_no_multiversion();

Would we want to change the multiversion guard at this point so it constant folds and the slow version is removed?

src/hotspot/share/opto/loopUnswitch.cpp line 513:

> 511: 
> 512:   // Create new Region.
> 513:   RegionNode* region = new RegionNode(1);

So we create a new `Region` every time a new condition is added?

src/hotspot/share/opto/loopnode.cpp line 1097:

> 1095:     // PhaseIdealLoop::add_parse_predicate only checks trap limits per method, so
> 1096:     // we do a custom check here.
> 1097:     if (!C->too_many_traps(cloned_sfpt->jvms()->method(), cloned_sfpt->jvms()->bci(), Deoptimization::Reason_auto_vectorization_check)) {

Isn't that done by `add_parse_predicate`?

src/hotspot/share/opto/traceAutoVectorizationTag.hpp line 32:

> 30: 
> 31: #define COMPILER_TRACE_AUTO_VECTORIZATION_TAG(flags) \
> 32:   flags(POINTER_PARSING,            "Trace VPointer/MemPointer parsing") \

Has anything changed here? I stared at it a few times and couldn't figure out what has.

-------------

PR Review: https://git.openjdk.org/jdk/pull/22016#pullrequestreview-2622881581
PR Review Comment: https://git.openjdk.org/jdk/pull/22016#discussion_r1959338954
PR Review Comment: https://git.openjdk.org/jdk/pull/22016#discussion_r1959344256
PR Review Comment: https://git.openjdk.org/jdk/pull/22016#discussion_r1959347164
PR Review Comment: https://git.openjdk.org/jdk/pull/22016#discussion_r1959349092


More information about the hotspot-dev mailing list