RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v18]

Thu Aug 21 06:08:05 UTC 2025

On Wed, 20 Aug 2025 12:31:11 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> TODO work that arose during review process / recent merges with master:
>> 
>> - Vladimir asked for a benchmark where the predicate is disabled and only multiversioning is used. Show that peak performance is identical but compilation time is a bit higher. Investigation ongoing.
>> - See if we can harden some of the IR rules in `TestAliasingFuzzer.java` after JDK-8356176. Probably file a follow-up RFE.
>> 
>> ---------------
>> 
>> This is a big patch, but about 3.5k lines are tests. And a large part of the VM changes is comments / proofs.
>> 
>> I am adding a dynamic (runtime) aliasing check to the auto-vectorizer (SuperWord). We use the infrastructure from https://github.com/openjdk/jdk/pull/22016:
>> - Use the auto-vectorization `predicate` when available: we speculate that there is no aliasing, else we trap and re-compile without the predicate.
>> - If the predicate is not available, we use `multiversioning`, i.e. we have a `fast_loop` where there is no aliasing and hence vectorization, and a `slow_loop` without vectorization for when the check fails.
>> 
>> --------------------------
>> 
>> **Where to start reviewing**
>> 
>> - `src/hotspot/share/opto/mempointer.hpp`:
>>   - Read the class comment for `MemPointerRawSummand`.
>>   - Familiarize yourself with the `MemPointer Linearity Corrolary`. We need it for the proofs of the aliasing runtime checks.
>> 
>> - `src/hotspot/share/opto/vectorization.cpp`:
>>   - Read the explanations and proofs above `VPointer::can_make_speculative_aliasing_check_with`. It explains how the aliasing runtime check works.
>> 
>> - `src/hotspot/share/opto/vtransform.hpp`:
>>   - Understand the difference between weak and strong edges.
>> 
>> If you need to see some examples, then look at the tests:
>> - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasing.java`: simple array cases. IR rules that check for vectors and, in some cases, whether we used multiversioning.
>> - `test/micro/org/openjdk/bench/vm/compiler/VectorAliasing.java`: the micro-benchmarks I show below. Simple array cases.
>> - `test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegmentAliasing.java`: a bit advanced, but similar cases.
>> - `test/hotspot/jtreg/compiler/loopopts/superword/TestAliasingFuzzer.java`: very large and rather complex. Generates random loops, some with and some without aliasing at runtime. IR verification is currently mostly limited to array cases; the MemorySegment cases have some issues (see comments).
>> --------------------------
>> 
>> **Details**
>> 
>> Most fundamentally:
>> - I had to...
>
> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
> 
>   disable flag if not possible

> > I've also investigated the performance issue with the aliasing case that uses multiversioning. So far I could not figure out the cause of the 10% performance regression, see the detailed analysis attempt [#24278 (comment)](https://github.com/openjdk/jdk/pull/24278#issuecomment-3201092650)
> 
> Is it possible that it always goes into the slow path?

Yes, the aliasing case would always take the slow path. But that should be as fast as the scalar performance before the patch, and match the performance of `not_profitable`, where we do not vectorize. The strange thing is that we do enter the slow path, yet the performance is somehow 10% lower than before. As I showed, the scalar code in the main loop that we execute is basically the same. Something must be causing the 10% difference...
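
To make the expectation concrete, here is a rough source-level analogy of the multiversioned loop. It is only a sketch under assumptions: C2 does this on the IR, not in Java source, and the names `MultiversionSketch`, `noAliasing` and `copyPlusOne` are made up for illustration. The overlap test is a simplified stand-in for the real aliasing runtime check.

```java
public class MultiversionSketch {
    // Hypothetical runtime aliasing check: true if the region written through
    // `a` and the region read through `b` cannot overlap.
    // Simplification: ignores int overflow of aOff + n and bOff + n.
    static boolean noAliasing(int[] a, int aOff, int[] b, int bOff, int n) {
        if (a != b) {
            return true; // distinct Java arrays never share memory
        }
        return aOff + n <= bOff || bOff + n <= aOff; // disjoint index ranges
    }

    // Computes a[aOff + i] = b[bOff + i] + 1 and reports which version ran.
    static String copyPlusOne(int[] a, int aOff, int[] b, int bOff, int n) {
        if (noAliasing(a, aOff, b, bOff, n)) {
            // "fast_loop": no aliasing, so iterations are independent and
            // this version is the one that could be vectorized.
            for (int i = 0; i < n; i++) {
                a[aOff + i] = b[bOff + i] + 1;
            }
            return "fast_loop";
        } else {
            // "slow_loop": possible aliasing, keep strict scalar order.
            for (int i = 0; i < n; i++) {
                a[aOff + i] = b[bOff + i] + 1;
            }
            return "slow_loop";
        }
    }
}
```

In this picture, the aliasing benchmark corresponds to a call like `copyPlusOne(x, 1, x, 0, 8)`, which always lands in the `slow_loop` branch. That branch is the same scalar loop we had before the patch, which is why I would expect the same performance.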

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24278#issuecomment-3209120343