RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v18]

Fri Aug 22 07:41:05 UTC 2025

On Wed, 20 Aug 2025 12:31:11 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> TODO work that arose during review process / recent merges with master:
>> 
>> - Vladimir asked for a benchmark where the predicate is disabled, with only multiversioning. Show that peak performance is identical but compilation time is a bit higher. Investigation ongoing.
>> - See if we can harden some of the IR rules in `TestAliasingFuzzer.java` after JDK-8356176. Probably file a follow-up RFE.
>> 
>> ---------------
>> 
>> This is a big patch, but about 3.5k lines are tests. And a large part of the VM changes is comments / proofs.
>> 
>> I am adding a dynamic (runtime) aliasing check to the auto-vectorizer (SuperWord). We use the infrastructure from https://github.com/openjdk/jdk/pull/22016:
>> - Use the auto-vectorization `predicate` when available: we speculate that there is no aliasing, else we trap and re-compile without the predicate.
>> - If the predicate is not available, we use `multiversioning`, i.e. we have a `fast_loop` where there is no aliasing, and hence vectorization. And a `slow_loop` if the check fails, with no vectorization.
>> 
>> --------------------------
>> 
>> **Where to start reviewing**
>> 
>> - `src/hotspot/share/opto/mempointer.hpp`:
>>   - Read the class comment for `MemPointerRawSummand`.
>>   - Familiarize yourself with the `MemPointer Linearity Corollary`. We need it for the proofs of the aliasing runtime checks.
>> 
>> - `src/hotspot/share/opto/vectorization.cpp`:
>>   - Read the explanations and proofs above `VPointer::can_make_speculative_aliasing_check_with`. It explains how the aliasing runtime check works.
>> 
>> - `src/hotspot/share/opto/vtransform.hpp`:
>>   - Understand the difference between weak and strong edges.
>> 
>> If you need to see some examples, then look at the tests:
>> - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasing.java`: simple array cases. IR rules that check for vectors and in some cases whether we used multiversioning.
>> - `test/micro/org/openjdk/bench/vm/compiler/VectorAliasing.java`: the micro-benchmarks I show below. Simple array cases.
>> - `test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegmentAliasing.java`: a bit advanced, but similar cases.
>> - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasingFuzzer.java`: very large and rather complex. Generates random loops, some with and some without aliasing at runtime. IR verification, but currently mostly for array cases only; MemorySegment cases have some issues (see comments).
>> 
>> --------------------------
>> 
>> **Details**
>> 
>> Most fundamentally:
>> - I had to...
>
> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
> 
>   disable flag if not possible

Here are the comparisons on different platforms.

my avx512 laptop:
<img width="2408" height="539" alt="image" src="https://github.com/user-attachments/assets/d02edc19-cfa8-42ad-952a-1d9a67d3d067" />

linux-x64:
<img width="1755" height="539" alt="image" src="https://github.com/user-attachments/assets/1332620f-4ced-4e76-9d28-d0cbfa769c1e" />

macosx-x64:
<img width="1745" height="540" alt="image" src="https://github.com/user-attachments/assets/88e7e8bc-40c2-46a7-ad77-5d67f38006c9" />

linux-aarch64:
<img width="1749" height="562" alt="image" src="https://github.com/user-attachments/assets/7adca3e7-f9c9-4a7e-80f8-56c98ad61691" />

macosx-aarch64:
<img width="1762" height="537" alt="image" src="https://github.com/user-attachments/assets/9f556483-a15a-488d-b3d6-e7f79a023e95" />

Strangely, the aliasing cases on `patch` and `no_predicate` can show regressions, but do not always. For example, compare macosx-x64 (regression with long only) and macosx-aarch64 (regression with byte and int only). Still, every platform shows some kind of regression.

**Still**: the regression is in the 10-30% range, and only for the edge case of aliasing. All other cases (no aliasing) have massive speedups. So overall this is still a massive win.

And yet: I would like to at least understand what the issue is here. I have no explanation at all right now. What I have tried so far:
- Looked at the assembly. It looks extremely similar; at least the main-loop looks basically identical. I checked with `perfasm` attached to the JMH benchmark, see results [here](https://github.com/openjdk/jdk/pull/24278#issuecomment-3201092650) and [here](https://github.com/openjdk/jdk/pull/24278#issuecomment-3213283462).
- Artificially avoided vectorization of the fast-loop, just to check if there may be an issue with `vzeroupper` / AVX->SSE transition. No effect.
- Played with assembly level loop-alignment (address of instructions, OptoLoopAlignment). No effect.
- Might it be the runtime check and related branch misprediction? But I can increase the iterations in the main-loop, and it has no effect on the performance difference. We only execute the runtime check once per loop, so its cost should fade away as the iteration count increases. But it does not fade away.
- Ran `perf stat`: it tells me that I have some issue with `backend_bound` and `bad_speculation`, see [here](https://github.com/openjdk/jdk/pull/24278#issuecomment-3201092650). But I cannot really find out more details on my machine. I'm also not sure if the reporting is correct here.
- It is also not noise in the benchmark: all other results are quite sharp, and behave as expected.

To summarize what I'm comparing here:
- `not_profitable` (like before this PR): does not vectorize. All we get is a scalar loop for all cases.
- `patch` and `no_predicate`: for aliasing cases, we eventually compile with multiversioning. Here, we get a fast-path (vectorized loop) and a slow-path (scalar loop). A runtime check determines which branch we take. In the aliasing case, we always take the slow-path. We would expect that performance to be identical to `not_profitable`, but we see that this is not always the case.
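
For readers who want a picture of the multiversioned shape, here is a rough source-level analogy in Java. It is only a sketch: C2 does this on its IR, not on Java source, and the real aliasing check is more general than the one here. The class `MultiversionSketch` and the methods `noAliasing` and `copy` are hypothetical names made up for illustration, and the check assumes the simplification that two ranges can only alias if they lie in the same array and overlap.

```java
// Hypothetical source-level analogy of multiversioning.
// Not the C2 implementation: names and the check itself are assumptions.
final class MultiversionSketch {

    // Assumed simplification of the runtime check: ranges alias only if
    // they are in the same array and the index ranges overlap.
    static boolean noAliasing(byte[] a, int aOff, byte[] b, int bOff, int len) {
        return a != b || aOff + len <= bOff || bOff + len <= aOff;
    }

    // Copies len bytes from a[aOff..] to b[bOff..], front to back.
    // Returns true if the fast path was taken, false for the slow path.
    static boolean copy(byte[] a, int aOff, byte[] b, int bOff, int len) {
        // The check runs once, before the loop, not once per iteration.
        if (noAliasing(a, aOff, b, bOff, len)) {
            // "fast_loop": no aliasing, so a compiler could vectorize this loop.
            for (int i = 0; i < len; i++) {
                b[bOff + i] = a[aOff + i];
            }
            return true;
        }
        // "slow_loop": possible aliasing, so the scalar order must be preserved.
        for (int i = 0; i < len; i++) {
            b[bOff + i] = a[aOff + i];
        }
        return false;
    }
}
```

In this analogy, the aliasing benchmark cases correspond to calls that always return `false`: the slow loop body is the same scalar loop that `not_profitable` would run, which is why one would naively expect the same performance.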

@vnkozlov Any other ideas what I could look into here?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24278#issuecomment-3213393035