RFR: 8323582: C2 SuperWord AlignVector: misaligned vector memory access with unaligned native memory

Mon Feb 17 15:28:13 UTC 2025

On Mon, 17 Feb 2025 14:16:59 GMT, Roland Westrelin <roland at openjdk.org> wrote:

>> Note: the approach with Predicates and Multiversioning prepares us well for Runtime Checks for Aliasing Analysis, see more below.
>> 
>> **Background**
>> 
>> With `-XX:+AlignVector`, all vector loads/stores must be aligned. We try to statically determine if we can always align the vectors. One condition is that the address `base` is already aligned. For arrays, we know that this always holds, because they are `ObjectAlignmentInBytes` aligned. But with native memory, the `base` is just some arbitrarily aligned pointer.
>> 
>> **Problem**
>> 
>> So far, we have just naively assumed that the `base` is always `ObjectAlignmentInBytes` aligned. But that does not hold for `native` memory segments: the `base` can also be unaligned. I constructed such an example, and with `-XX:+AlignVector -XX:+VerifyAlignVector` it hits the verification code.
>> 
>> 
>>     MemorySegment nativeAligned = Arena.ofAuto().allocate(RANGE * 4 + 1);
>>     MemorySegment nativeUnaligned = nativeAligned.asSlice(1);
>>     test3(nativeUnaligned);
>> 
>> 
>> When compiling the test method, we assume that the `nativeUnaligned.address()` is aligned - but it is not!
>> 
>>     static void test3(MemorySegment ms) {
>>         for (int i = 0; i < RANGE; i++) {
>>             long adr = i * 4L;
>>             int v = ms.get(ELEMENT_LAYOUT, adr);
>>             ms.set(ELEMENT_LAYOUT, adr, (int)(v + 1));
>>         }
>>     }
>> 
>> 
>> **Solution: Runtime Checks - Predicate and Multiversioning**
>> 
>> Of course we could just forbid vectorization whenever we have a `native` base. But that would currently lead to regressions: in most cases we do get aligned `base`s, and we currently vectorize those. Since we cannot statically determine whether the `base` is aligned, we need a runtime check.
>> 
>> I came up with 2 options where to place the runtime checks:
>> - A new "auto vectorization" Parse Predicate:
>>   - This only works when predicates are available.
>>   - If we fail the predicate, then we recompile without the predicate. That means we cannot add a check to the predicate any more, and we would have to do multiversioning at that point if we still want to have a vectorized loop.
>> - Multiversion the loop:
>>   - Create 2 copies of the loop (fast and slow loops).
>>   - The `fast_loop` can make speculative alignment assumptions, and add the corresponding check to the `multiversion_if`, which decides which loop we take.
>>   - In the `slow_loop`, we make no assumptions, which means we cannot vectorize, but we still compile - so even ...
>
> What are the architectures affected by this? Isn't it the case that x86 and aarch64 are unaffected by this? Is the motivation to use this as a way to do prep work for alias analysis?
> 
> Do you intend to use a single deoptimization reason for all vectorization related predicates? (that is, when you take care of aliasing, are you going to use the same reason for aliasing and alignment checks)
> 
> I went over the code and it looks reasonable to me. I intend to do a more careful review later.

@rwestrel Thanks for having a first look!

> What are the architectures affected by this? Isn't it the case that x86 and aarch64 are unaffected by this?

Yes, x86 and aarch64 are unaffected, as far as I know. However, we can simulate strict alignment with `-XX:+AlignVector`. In that mode the code should behave correctly, but it currently fails with `-XX:+VerifyAlignVector`. It would be nice if that was not the case, so that we can write tests with arbitrary alignment, and turn on those flags freely.
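
As a rough illustration of the difference between an aligned and an unaligned native `base`, here is a minimal Java sketch. The class name `NativeBaseAlignment`, the helper `isAligned`, and the 8-byte alignment are assumptions made for this example and are not part of the patch. It uses the `java.lang.foreign` API as finalized in JDK 22.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;

public class NativeBaseAlignment {
    // Hypothetical helper: true if the segment's base address is a multiple
    // of `bytes`. `bytes` is assumed to be a power of two.
    static boolean isAligned(MemorySegment ms, long bytes) {
        return (ms.address() & (bytes - 1)) == 0;
    }

    public static void main(String[] args) {
        // Request 8-byte alignment explicitly, so the base is known to be aligned.
        MemorySegment nativeAligned = Arena.ofAuto().allocate(1024 * 4 + 8, 8);
        // Slicing at offset 1 shifts the base address by one byte.
        MemorySegment nativeUnaligned = nativeAligned.asSlice(1);

        System.out.println("aligned base:   " + isAligned(nativeAligned, 8));
        System.out.println("unaligned base: " + isAligned(nativeUnaligned, 8));
    }
}
```

A test that passes `nativeUnaligned` to a vectorizable loop while running with `-XX:+AlignVector -XX:+VerifyAlignVector` exercises exactly the case where the static assumption about the `base` is wrong.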

> Is the motivation to use this as a way to do prep work for alias analysis?

I see this as a bug-fix AND preparation for future work. I suppose I might not have fixed this bug here since our platforms are not really affected, but I might as well fix it now since I can re-use most of the code later.

> Do you intend to use a single deoptimization reason for all vectorization related predicates? (that is, when you take care of aliasing, are you going to use the same reason for aliasing and alignment checks)

Yes, that is currently the plan. In principle we could separate them, but I would leave that for later, if there is any desire to do so. For now, I think it's ok to just go with a single "auto-vectorization" reason.
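
To make the idea of one combined check more concrete, here is a source-level analogy in Java. It is only a sketch: C2 performs multiversioning on its IR, not on Java source, and the names `MultiversionSketch`, `speculativeChecksPass`, `fastLoop`, `slowLoop`, and the 8-byte `VECTOR_ALIGNMENT` are all hypothetical. The single boolean stands in for the one "auto-vectorization" condition, to which an aliasing check could later be added.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

public class MultiversionSketch {
    static final int RANGE = 1024;
    static final long VECTOR_ALIGNMENT = 8; // assumed alignment requirement

    // Stand-in for the combined speculative check. Today it only covers
    // alignment of the base; an aliasing check could be and-ed in later.
    static boolean speculativeChecksPass(MemorySegment ms) {
        return (ms.address() & (VECTOR_ALIGNMENT - 1)) == 0;
    }

    // Analogy of the multiversion_if: pick the loop copy at runtime.
    static String increment(MemorySegment ms) {
        if (speculativeChecksPass(ms)) {
            fastLoop(ms); // copy that may assume an aligned base
            return "fast";
        }
        slowLoop(ms);     // copy that makes no assumptions
        return "slow";
    }

    static void fastLoop(MemorySegment ms) { body(ms); }

    static void slowLoop(MemorySegment ms) { body(ms); }

    // Same loop body as test3, using an unaligned-capable int layout.
    private static void body(MemorySegment ms) {
        for (int i = 0; i < RANGE; i++) {
            long adr = i * 4L;
            int v = ms.get(ValueLayout.JAVA_INT_UNALIGNED, adr);
            ms.set(ValueLayout.JAVA_INT_UNALIGNED, adr, v + 1);
        }
    }

    public static void main(String[] args) {
        MemorySegment aligned = Arena.ofAuto().allocate(RANGE * 4 + 8, 8);
        System.out.println(increment(aligned));            // aligned base
        System.out.println(increment(aligned.asSlice(1))); // unaligned base
    }
}
```

Both copies compute the same result; the only difference is what the compiler would be allowed to assume inside each one.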

Does that sound reasonable?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/22016#issuecomment-2663434802