RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v10]

Tue Aug 19 14:55:53 UTC 2025

On Mon, 18 Aug 2025 14:50:20 GMT, Vladimir Kozlov <kvn at openjdk.org> wrote:

>>> @eme64 did you measure how much C2 compilation time changed with these changes (all optimizations enabled)?
>> 
>> I did not. I don't think it would take much extra time in almost all cases. The extra analysis is not that costly compared to unrolling that we do in all cases already. What might cost more: if we deopt because of the runtime check, and recompile with multiversioning. That could essentially double C2 compile time for those cases.
>> 
>> Do you think it is worth it to benchmark now, or should we just rely on @robcasloz 's occasional benchmarking and address the issues if they come up?
>> 
>> If you want me to do C2 time benchmarking: should I just show a few specific micro-benchmarks, or do you want to have statistics collected on larger benchmark suites?
>
>> Do you think it is worth it to benchmark now, or should we just rely on @robcasloz 's occasional benchmarking and address the issues if they come up?
> 
> I am fine with using Roberto's benchmarking later. Just keep an eye on it.

@vnkozlov I ran some more benchmarks:

<img width="1595" height="405" alt="image" src="https://github.com/user-attachments/assets/3e526698-80fd-4632-84a8-b467196fbf30" />

Columns:
- `not_profitable` - `-XX:AutoVectorizationOverrideProfitability=0`. Serves as baseline scalar performance. Unrolling is the same as if we vectorized.
- `no_sw` - `-XX:-UseSuperWord`. Can mess with unrolling factor, and thus gets worse performance.
- `patch` - no flags. Overall best performance - except for `bench_copy_array_B_differentIndex_alias` and `bench_copy_array_I_differentIndex_alias` - need to investigate ⚠ 
- `no_predicate` - `-XX:-UseAutoVectorizationPredicate`. Same performance as `patch`, we just always use multiversioning immediately. In a separate benchmark, I can show that this requires more C2 compile time and produces larger code - so less desirable.
- `no_multiversioning` - `-XX:-LoopMultiversioning`: struggles with mixed cases. As soon as it encounters an aliasing case, the predicate leads to deopt, and then we recompile without predicate, and so do not vectorize any more - you get scalar performance.
- `no_rt_check` - `-XX:-UseAutoVectorizationSpeculativeAliasingChecks`: behavior as before patch - no vectorization if a runtime check is required.
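
To make the aliasing cases in the list above concrete, here is a minimal source-level sketch of the idea behind a speculative aliasing check. It is not what C2 emits: `AliasingSketch`, `mayOverlap` and `copyChecked` are hypothetical names, the overlap test is deliberately conservative, and `System.arraycopy` only stands in for a fast path that reads several elements before writing them.

```java
import java.util.Arrays;

// Illustration only: C2 does this on compiled loop code, not in Java source.
// All names here are hypothetical and not part of the patch.
final class AliasingSketch {

    // Reference semantics: element-by-element forward copy.
    static void copyScalar(byte[] a, int offA, byte[] b, int offB, int n) {
        for (int i = 0; i < n; i++) {
            b[i + offB] = a[i + offA];
        }
    }

    // Conservative check: true if both accesses hit the same array
    // and the index ranges [off, off + n) intersect.
    static boolean mayOverlap(byte[] a, int offA, byte[] b, int offB, int n) {
        return a == b && offA < offB + n && offB < offA + n;
    }

    // Take the fast path only when the check proves the ranges are disjoint.
    static void copyChecked(byte[] a, int offA, byte[] b, int offB, int n) {
        if (mayOverlap(a, offA, b, offB, n)) {
            copyScalar(a, offA, b, offB, n);        // slow path keeps scalar semantics
        } else {
            System.arraycopy(a, offA, b, offB, n);  // stand-in for the vectorized loop
        }
    }

    public static void main(String[] args) {
        byte[] x = {1, 2, 3, 4, 5, 6, 7, 8};
        copyChecked(x, 0, x, 1, 4);                 // overlapping: scalar path
        System.out.println(Arrays.toString(x));     // [1, 1, 1, 1, 1, 6, 7, 8]

        byte[] y = {1, 2, 3, 4, 5, 6, 7, 8};
        copyChecked(y, 0, y, 4, 4);                 // disjoint: fast path
        System.out.println(Arrays.toString(y));     // [1, 2, 3, 4, 1, 2, 3, 4]
    }
}
```

In the overlapping call the scalar loop re-reads values it has just written, so a path that loads a block of elements up front would produce a different result. That is why the fast path has to be guarded by a check.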

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24278#issuecomment-3201092650