RFR: 8357530: C2 SuperWord: Diagnostic flag AutoVectorizationOverrideProfitability
Vladimir Kozlov
kvn at openjdk.org
Fri May 23 15:13:52 UTC 2025
On Thu, 22 May 2025 08:54:42 GMT, Emanuel Peter <epeter at openjdk.org> wrote:
> I'm adding a diagnostic flag `AutoVectorizationOverrideProfitability`. The goal is that, with it, we can systematically benchmark our Auto Vectorization profitability heuristics. In all cases, we run Auto Vectorization, including packing. The flag takes three values; an example invocation follows the list below.
> - `0`: abort vectorization, as if it were not profitable.
> - `1`: default; use the profitability heuristics to decide whether to vectorize.
> - `2`: always vectorize when possible, even if the profitability heuristics say it is not profitable.
>
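> For example, a benchmark run that forces vectorization could look like this (a sketch; `MyBenchmark` is just a placeholder, and since this is a diagnostic flag it needs to be unlocked first):
>
>     java -XX:+UnlockDiagnosticVMOptions \
>          -XX:AutoVectorizationOverrideProfitability=2 \
>          MyBenchmark
>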
> In the future, we may change our heuristics; for example, we may introduce a cost model [JDK-8340093](https://bugs.openjdk.org/browse/JDK-8340093). But at any rate, we need this flag so that we can override these profitability heuristics, even if just for benchmarking.
>
> I have not yet gone through all of `SuperWord` to check whether there are other decisions that could go under this flag. If we find any later, we can still add them.
>
> Below, I show how it helps to benchmark some of the reduction cases we have been working on.
>
> And if you want a small test to experiment with, I have one at the end for you.
>
> **Note to reviewer:** This patch should not make any behavioral difference; i.e., with the default `AutoVectorizationOverrideProfitability=1`, the behavior should be the same as before this patch.
>
> --------------------------------------
>
> **Use-Case: Investigating Reduction Heuristics**
>
> A while back, I wrote a comprehensive benchmark for reductions: https://github.com/openjdk/jdk/pull/21032. I saw that some cases might be profitable, but we disable vectorization because of a heuristic.
>
> This heuristic was added a long time ago. The observation at the time was that simple add and mul reductions were not profitable.
> - https://bugs.openjdk.org/browse/JDK-8078563
> - https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2015-April/017740.html
> From the comments, it becomes clear that "simple reductions" were considered not profitable; that is why we check whether there are more work vectors than reduction vectors. But I am not sure why 2-element reductions are deemed never profitable. Maybe that fit the benchmarks at the time, but now that we move reductions out of the loop, it probably no longer makes sense, at least for int/long.
>
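> For illustration, here is roughly what I mean by a "simple" reduction versus one with additional element-wise work (a sketch; `a` and `b` are just placeholder arrays):
>
>     int sum = 0;
>     for (int i = 0; i < a.length; i++) {
>         sum += a[i];          // only the reduction itself, no extra work
>     }
>
>     int dot = 0;
>     for (int i = 0; i < a.length; i++) {
>         dot += a[i] * b[i];   // the mul contributes extra element-wise work
>     }
>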
> But in the meantime, I have added an improvement where we move int/long reductions out of the loop. We can do that because int/long reductions can be reordered; see https://github.com/openjdk/jdk/pull/13056. We cannot do that with float/double reductions,...
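>
> To illustrate what moving the reduction out of the loop means, here is a rough analogy using the Vector API (a sketch only; SuperWord transforms the C2 IR directly, and this assumes `a.length` is a multiple of the species length):
>
>     import jdk.incubator.vector.*;
>
>     static final VectorSpecies<Integer> SPECIES = IntVector.SPECIES_PREFERRED;
>
>     static int sum(int[] a) {
>         IntVector acc = IntVector.zero(SPECIES);      // vector accumulator stays in the loop
>         for (int i = 0; i < a.length; i += SPECIES.length()) {
>             acc = acc.add(IntVector.fromArray(SPECIES, a, i));
>         }
>         return acc.reduceLanes(VectorOperators.ADD);  // one reduction, after the loop
>     }
>
> This reorders the additions across lanes, which is fine for int/long but would change the result for float/double.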
This looks fine. One suggestion I have for a separate RFE is to use UL (Unified Logging) for such outputs.
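
For example, something like this (a sketch only; the `superword` tag name is hypothetical):

    java -Xlog:superword=debug ...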
-------------
Marked as reviewed by kvn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/25387#pullrequestreview-2864779943