RFR: 8357530: C2 SuperWord: Diagnostic flag AutoVectorizationOverrideProfitability [v2]

Emanuel Peter epeter at openjdk.org
Mon May 26 05:59:27 UTC 2025


> I'm adding a diagnostic flag `AutoVectorizationOverrideProfitability`. The goal is that with it, we can systematically benchmark our Auto Vectorization profitability heuristics. In all cases, we run Auto Vectorization, including packing.
> - `0`: abort vectorization, as if it were not profitable.
> - `1`: default, use the profitability heuristics to determine if we should vectorize.
> - `2`: always vectorize when possible, even if the profitability heuristic says it is not profitable.
> 
> In the future, we may change our heuristics. We may for example introduce a cost model [JDK-8340093](https://bugs.openjdk.org/browse/JDK-8340093). But at any rate, we need this flag, so that we can override these profitability heuristics, even if just for benchmarking.
> 
> I did not yet go through all of `SuperWord` to check if there may be other decisions that could go under this flag. If we find any later, we can still add them.
> 
> Below, I'm showing how it helps to benchmark some of the reduction cases we have been working on.
> 
> And if you want a small test to experiment with, I have one at the end for you.
> 
> **Note to reviewer:** This patch should not make any behavioral difference, i.e. with the default `AutoVectorizationOverrideProfitability=1` the behavior should be as before this patch.
> 
> --------------------------------------
> 
> **Use-Case: investigate Reduction Heuristics**
> 
> A while back, I wrote a comprehensive benchmark for Reductions https://github.com/openjdk/jdk/pull/21032. I saw that some cases might be profitable, but we disable vectorization because of a heuristic.
> 
> This heuristic was added a long time ago. The observation at the time was that simple add and mul reductions were not profitable.
> - https://bugs.openjdk.org/browse/JDK-8078563
> - https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2015-April/017740.html
> From the comments, it becomes clear that "simple reductions" are not profitable; that is why we check whether there are more work vectors than reduction vectors. But I'm not sure why 2-element reductions are always deemed not profitable. Maybe it fit the benchmarks at the time, but now that we move reductions out of the loop, this probably no longer makes sense, at least for int/long.
> 
> In the meantime, I have added an improvement that moves int/long reductions out of the loop. We can do that because int/long reductions can be reordered. See https://github.com/openjdk/jdk/pull/13056 . We cannot do that with float/double reductions,...
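As a sketch of the kind of loop discussed above (not taken from the PR itself), here is a minimal int add-reduction. With little other "work" in the loop, the existing heuristic would deem it not profitable; since `AutoVectorizationOverrideProfitability` is a diagnostic flag, overriding that decision would presumably require unlocking diagnostic options first, e.g. `java -XX:+UnlockDiagnosticVMOptions -XX:AutoVectorizationOverrideProfitability=2 ReductionSketch`. The class and method names are made up for illustration.

```java
// Hypothetical sketch: a simple int add-reduction of the kind SuperWord
// considers. Because int addition can be reordered, C2 can move the
// reduction out of the loop; the profitability heuristic may still
// reject vectorization, which the diagnostic flag can override.
public class ReductionSketch {
    // Simple add reduction over an int array.
    static int sumReduction(int[] a) {
        int sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i]; // accumulation pattern recognized as a reduction
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] a = new int[1000];
        for (int i = 0; i < a.length; i++) {
            a[i] = i;
        }
        // 0 + 1 + ... + 999 = 499500
        System.out.println(sumReduction(a));
    }
}
```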

Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:

  Apply suggestions from code review
  
  Co-authored-by: Tobias Hartmann <tobias.hartmann at oracle.com>

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/25387/files
  - new: https://git.openjdk.org/jdk/pull/25387/files/54c626e0..3f2c2698

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=25387&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25387&range=00-01

  Stats: 4 lines in 2 files changed: 0 ins; 0 del; 4 mod
  Patch: https://git.openjdk.org/jdk/pull/25387.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/25387/head:pull/25387

PR: https://git.openjdk.org/jdk/pull/25387


More information about the hotspot-compiler-dev mailing list