RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v6]

Mon Aug 11 13:43:27 UTC 2025

On Mon, 11 Aug 2025 00:08:11 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> This is a big patch, but about 3.5k lines are tests. And a large part of the VM changes is comments / proofs.
>> 
>> I am adding a dynamic (runtime) aliasing check to the auto-vectorizer (SuperWord). We use the infrastructure from https://github.com/openjdk/jdk/pull/22016:
>> - Use the auto-vectorization `predicate` when available: we speculate that there is no aliasing, else we trap and re-compile without the predicate.
>> - If the predicate is not available, we use `multiversioning`, i.e. we have a `fast_loop` where there is no aliasing, and hence vectorization. And a `slow_loop` if the check fails, with no vectorization.
>> 
>> --------------------------
>> 
>> **Where to start reviewing**
>> 
>> - `src/hotspot/share/opto/mempointer.hpp`:
>>   - Read the class comment for `MemPointerRawSummand`.
>>   - Familiarize yourself with the `MemPointer Linearity Corrolary`. We need it for the proofs of the aliasing runtime checks.
>> 
>> - `src/hotspot/share/opto/vectorization.cpp`:
>>   - Read the explanations and proofs above `VPointer::can_make_speculative_aliasing_check_with`. It explains how the aliasing runtime check works.
>> 
>> - `src/hotspot/share/opto/vtransform.hpp`:
>>   - Understand the difference between weak and strong edges.
>> 
>> If you need to see some examples, then look at the tests:
>> - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasing.java`: simple array cases. IR rules that check for vectors and in somecases if we used multiversioning.
>> - `test/micro/org/openjdk/bench/vm/compiler/VectorAliasing.java`: the miro-benchmarks I show below. Simple array cases.
>> - `test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegmentAliasing.java`: a bit advanced, but similar cases.
>> - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasingFuzzer.java`: very large and rather compliex. Generates random loops, some with and some without aliasing at runtime. IR verification, but mostly currently only for array cases, MemorySegment cases have some issues (see comments).
>> --------------------------
>> 
>> **Details**
>> 
>> Most fundamentally:
>> - I had to refactor / extend `MemPointer` so that we have access to `MemPointerRawSummand`s.
>> - These raw summands us to reconstruct the `VPointer` at any `iv` value with `VPointer::make_pointer_expression(Node* iv_value)`.
>>    - With the raw summands, a pointer may look like this: `p = base + ConvI2L(x + 2) + ConvI2L(y + 2)`
>>   - With "regular" summands, this gets simplified to `p = base + 4L +ConvI2L(x) + Conv...
>
> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Update test/hotspot/jtreg/compiler/loopopts/superword/TestAliasingFuzzer.java
>   
>   Co-authored-by: Manuel Hässig <manuel at haessig.org>

Thank you for addressing my comments so far. Here goes another round :)

src/hotspot/share/opto/mempointer.cpp line 85:

> 83:     }
> 84:     // Bail out if scale is NaN.
> 85:     if (scale.is_NaN()) {

If I understand correctly, then a summand cannot be NaN anymore? Do you still bail out somewhere in raw summands if you encounter NaN?

src/hotspot/share/opto/vectorization.cpp line 494:

> 492: //
> 493: // This would allow situations where for some iv p1 is lower than p2, and for
> 494: // other iv p1 is higher than p2. This is not very useful inpractice. We can

Suggestion:

// other iv p1 is higher than p2. This is not very useful in practice. We can

src/hotspot/share/opto/vectorization.cpp line 598:

> 596: //   If iv_stride <= 0, i.e. last <= iv <= init:
> 597: //     (iv - init) * scale_1 >= (iv - init) * iv_scale
> 598: //     (iv - last) * scale_1 <= (iv - last) * iv_scale                   (NEG-STRIDE)

Suggestion:

//   If iv_stride >= 0, i.e. init <= iv <= last:
//     (iv - init) * iv_scale_1 <= (iv - init) * iv_scale2
//     (iv - last) * iv_scale_1 >= (iv - last) * iv_scale2                   (POS-STRIDE)
//   If iv_stride <= 0, i.e. last <= iv <= init:
//     (iv - init) * iv_scale_1 >= (iv - init) * iv_scale2
//     (iv - last) * iv_scale_1 <= (iv - last) * iv_scale2                   (NEG-STRIDE)

If I am not massively confused, the `iv_scale`s should be like this.

src/hotspot/share/opto/vectorization.cpp line 604:

> 602: //       p1(init) + size1 <= p2(init)       (if iv_stride >= 0)  |    p2(last) + size2 <= p1(last)      (if iv_stride >= 0)   |
> 603: //       p1(last) + size1 <= p2(last)       (if iv_stride <= 0)  |    p2(init) + size2 <= p1(init)      (if iv_stride <= 0)   |
> 604: //       ----- is equivalent to -----                            |    ----- is equivalent to -----                            |

Suggestion:

//       ---- are equivalent to -----                            |    ---- are equivalent to -----                            |

This confused me a bit 😅

src/hotspot/share/opto/vectorization.cpp line 625:

> 623: //       <= size1 + p1(init) - init * iv_scale2 + iv * iv_scale2 |    <= size2 + p2(last) - init * iv_scale1 + iv * iv_scale1 |
> 624: //          -- assumption --                                     |       -- assumption --                                     |
> 625: //       <=         p2(init) - init * iv_scale2 + iv * iv_scale2 |    <=         p1(last) - init * iv_scale1 + iv * iv_scale1 |

Suggestion:

//        = size1 + p1(init) - init * iv_scale1 + iv * iv_scale1 |     = size2 + p2(last) - last * iv_scale2 + iv * iv_scale2 |
//                           ------ apply (POS-STRIDE) --------- |                        ------ apply (POS-STRIDE) --------- |
//       <= size1 + p1(init) - init * iv_scale2 + iv * iv_scale2 |    <= size2 + p2(last) - last * iv_scale1 + iv * iv_scale1 |
//          -- assumption --                                     |       -- assumption --                                     |
//       <=         p2(init) - init * iv_scale2 + iv * iv_scale2 |    <=         p1(last) - last * iv_scale1 + iv * iv_scale1 |

`LINEAR-FORM-LAST: p1(iv) = p1(last) - last * iv_scale1 + iv * iv_scale1`

src/hotspot/share/opto/vectorization.cpp line 639:

> 637: //       <= size1 + p1(last) - init * iv_scale2 + iv * iv_scale2 |    <= size2 + p2(init) - init * iv_scale1 + iv * iv_scale1 |
> 638: //          -- assumption --                                     |       -- assumption --                                     |
> 639: //       <=         p2(last) - init * iv_scale2 + iv * iv_scale2 |    <=         p1(init) - init * iv_scale1 + iv * iv_scale1 |

Suggestion:

//        = size1 + p1(last) - last * iv_scale1 + iv * iv_scale1 |     = size2 + p2(init) - init * iv_scale2 + iv * iv_scale2 |
//                           ------ apply (NEG-STRIDE) --------- |                        ------ apply (NEG-STRIDE) --------- |
//       <= size1 + p1(last) - last * iv_scale2 + iv * iv_scale2 |    <= size2 + p2(init) - init * iv_scale1 + iv * iv_scale1 |
//          -- assumption --                                     |       -- assumption --                                     |
//       <=         p2(last) - last * iv_scale2 + iv * iv_scale2 |    <=         p1(init) - init * iv_scale1 + iv * iv_scale1 |

src/hotspot/share/opto/vectorization.cpp line 742:

> 740: //              a solution that also works when the loop is not entered:
> 741: //
> 742: //              k = (init - stride - 1) / abs(stride)

Suggestion:

//              k = (init - limit - 1) / abs(stride)

Where does `stride` come from? If I did not miss anything, this should be `limit`.

src/hotspot/share/opto/vectorization.cpp line 743:

> 741: //
> 742: //              k = (init - stride - 1) / abs(stride)
> 743: //              last = MAX(init, init + k * stride)

Suggestion:

//              last = MIN(init, init + k * stride)

This should be `MIN` otherwise this does not clamp to zero.

src/hotspot/share/opto/vectorization.cpp line 752:

> 750: //     If stride < 0:
> 751: //       k = (init - stride - 1) / abs(stride)
> 752: //       last = MAX(init, init + k * stride)

Suggestion:

//   LAST(init, stride, limit)
//     If stride > 0:
//       k = (limit - init - 1) / abs(stride)
//       last = MAX(init, init + k * stride)
//     If stride < 0:
//       k = (init - limit - 1) / abs(stride)
//       last = MIN(init, init + k * stride)

src/hotspot/share/opto/vectorization.cpp line 853:

> 851:   // For the computation of main_init, we also need the pre_limit, and so we need
> 852:   // to check that this value is pre-loop invariant. In the case of non-equal iv_scales,
> 853:   // we also need toe main_limit in the aliasing check, and so this value must then

Suggestion:

  // we also need the main_limit in the aliasing check, and so this value must then

src/hotspot/share/opto/vectorization.cpp line 895:

> 893:   Node* diffL = (stride > 0) ? new SubLNode(limitL, initL)
> 894:                              : new SubLNode(initL, limitL);
> 895:   Node* diffL_m1 = new AddLNode(diffL, igvn.longcon(-1));

Out of curiosity, why did you choose `AddL(diff, -1)` over `SubL(diff, 1)`?

src/hotspot/share/opto/vectorization.cpp line 1026:

> 1024:     if (vp1.iv_scale() > vp2.iv_scale()) {
> 1025:       swap(p1_init, p2_init);
> 1026:       swap(size1, size2);

Shouldn't we perform this swap before calling `make_last()`, since `make_last()` assumes `iv_scale1 < iv_scale2`?

test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java line 62:

> 60:                 for (String sac : List.of("-XX:-UseAutoVectorizationSpeculativeAliasingChecks", "-XX:+UseAutoVectorizationSpeculativeAliasingChecks")) {
> 61:                     TestFramework.runWithFlags("--add-modules", "java.base", "--add-exports", "java.base/jdk.internal.misc=ALL-UNNAMED",
> 62:                                                "-XX:+UnlockExperimentalVMOptions", av, coh, sac);

This might be a good fit for `Scenarios`. I find it easier to determine which cases failed.

-------------

Changes requested by mhaessig (Committer).

PR Review: https://git.openjdk.org/jdk/pull/24278#pullrequestreview-3104618375
PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2265986953
PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2266065094
PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2266095980
PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2266124115
PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2266266510
PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2266282080
PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2266464492
PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2266514460
PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2266519259
PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2266449265
PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2266475956
PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2266556670
PR Review Comment: https://git.openjdk.org/jdk/pull/24278#discussion_r2266809686