RFR: 8340093: C2 SuperWord: implement cost model [v3]

Emanuel Peter epeter at openjdk.org
Mon Nov 3 14:12:40 UTC 2025


> Note: this looks like a large change, but only about 400-500 lines are VM changes; the remaining ~2.5k lines come from new tests.
> 
> Finally: after a long list of refactorings, we can implement the Cost-Model. The refactorings and this implementation were first PoC'd here: https://github.com/openjdk/jdk/pull/20964
> 
> Main goal:
> - Carefully allow vectorization of reduction cases that lead to speedups, and prevent those that do not (or that may cause regressions).
> - Open up future vectorization opportunities that introduce expensive vector nodes, which are profitable in some cases but not in others.
> 
> **Why cost-model?**
> 
> Usually, vectorization leads to speedups because we replace multiple scalar operations with a single vector operation. The scalar and vector operations have a very similar per-instruction cost, so going from 4 scalar ops to a single vector op may yield a 4x speedup. This is a bit simplistic, but it is the general idea.
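> 
> For illustration only (this loop is not taken from this PR or its tests), this is the kind of element-wise loop SuperWord targets, where e.g. 4 scalar int adds become a single 4-lane vector add:
> 
> ```java
> // Illustrative sketch, not code from this PR: an element-wise loop where
> // SuperWord can replace several scalar adds with a single vector add.
> static void addLoop(int[] a, int[] b, int[] c) {
>     for (int i = 0; i < c.length; i++) {
>         c[i] = a[i] + b[i];
>     }
> }
> ```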
> 
> But: some vector ops are expensive. Sometimes a vector op can be more expensive than the multiple scalar ops it replaces; this is the case for some reduction ops. Or we may introduce a vector op that has no corresponding scalar op at all (e.g. a shuffle). This defeats simple heuristics that only look at individual operations.
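> 
> A reduction loop is the classic example; again a minimal sketch, not taken from this PR:
> 
> ```java
> // Illustrative sketch, not code from this PR: a sum reduction. The loads
> // and adds vectorize cheaply, but the cross-lane vector reduction that
> // accumulates into `sum` can cost more than the scalar adds it replaces.
> static int sumLoop(int[] a) {
>     int sum = 0;
>     for (int i = 0; i < a.length; i++) {
>         sum += a[i];
>     }
>     return sum;
> }
> ```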
> 
> Weighing the total cost of the scalar loop against that of the vector loop gives us a more "holistic" approach: there may be some expensive vector ops, but the other, cheaper vector ops can still make the transformation profitable overall.
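> 
> A purely hypothetical example with made-up cost units: if the scalar loop body, unrolled to cover one vector's worth of iterations, costs 12 (12 scalar ops at cost 1 each), and the corresponding vector loop body costs 8 (one expensive vector reduction at cost 5 plus three cheap vector ops at cost 1 each), then the vector loop still wins overall, even though the reduction op alone is more expensive than the scalar ops it replaces.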
> 
> **Implementation**
> 
> Items:
> - New `VTransform::is_profitable`: performs the cost-model check and some other cost-related checks (a rough sketch of the core idea follows after this list).
>   - `VLoopAnalyzer::cost`: scalar loop cost
>   - `VTransformGraph::cost`: vector loop cost
> - The old reduction heuristic used `_num_work_vecs` and `_num_reductions` to detect "simple" reductions, where the only "work" vector is the reduction itself. Such "simple" reductions were not considered profitable and were rejected. I was able to lift those restrictions.
> - Adapted existing tests.
> - Wrote a new comprehensive test, matching the related JMH benchmark, which we use below.
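> 
> The core decision boils down to comparing two summed costs. Below is a hypothetical Java sketch of that rule only; the real code is C++ (`VLoopAnalyzer::cost`, `VTransformGraph::cost`, `VTransform::is_profitable`), and the class and method below are made up for illustration:
> 
> ```java
> import java.util.List;
> 
> // Hypothetical sketch, not the actual C++ implementation: sum per-node costs
> // for the scalar loop body and the proposed vector loop body, and only
> // vectorize if the vector body is cheaper.
> class CostModelSketch {
>     static boolean isProfitable(List<Integer> scalarNodeCosts, List<Integer> vectorNodeCosts) {
>         int scalarCost = scalarNodeCosts.stream().mapToInt(Integer::intValue).sum();
>         int vectorCost = vectorNodeCosts.stream().mapToInt(Integer::intValue).sum();
>         return vectorCost < scalarCost;
>     }
> }
> ```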
> 
> **Testing**
> Regular correctness testing and performance testing, in addition to the JMH micro benchmarks below.
> 
> ------------------------------
> 
> **Some History**
> 
> I have been bothered for a long time by "simple" reductions not vectorizing. It was also part of [my JVMLS2025 presentation](https://inside.java/2025/08/16/jvmls-hotspot-auto-vectorization/).
> 
> During JDK9, reductions were first vectorized, but then restricted for...

Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:

  More comments for SirYwell

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/27803/files
  - new: https://git.openjdk.org/jdk/pull/27803/files/22dab5a4..d79df4fc

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=27803&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27803&range=01-02

  Stats: 6 lines in 2 files changed: 5 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/27803.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27803/head:pull/27803

PR: https://git.openjdk.org/jdk/pull/27803

