RFR: 8367389: C2 SuperWord: refactor VTransform to model the whole loop instead of just the basic block

Emanuel Peter epeter at openjdk.org
Fri Sep 12 14:20:34 UTC 2025


I'm working on cost-modeling, and am integrating some smaller changes from this proof-of-concept PR:
https://github.com/openjdk/jdk/pull/20964
[See plan overview.](https://bugs.openjdk.org/browse/JDK-8340093)

This is a pure refactoring - no change in behaviour. I'm presenting it like this because it will make reviews easier.

------------------------------

**Goals**
- VTransform models **all nodes in the loop**, not just the basic block (this enables later `VTransform::optimize` steps, such as moving reductions out of the loop)
- Remove `_nodes` from the vector vtnodes.

**Details**
- Remove: `AUTO_VECTORIZATION2_AFTER_REORDER`, `apply_memops_reordering_with_schedule`, `print_memops_schedule`.
  - Instead of reordering the scalar memops, we create the new memory graph during `VTransform::apply`. That is why the `VTransformApplyState` now needs to track the memory states.
- Refactor `VLoopMemorySlices`: map not only memory slices with phis (which have stores in the loop), but also those with only loads (no phi).
- Create vtnodes for all nodes in the loop (not just the basic block), as well as for inputs (already done) and outputs (new). Because the output nodes are now mapped too, `apply` naturally connects the uses after the loop to their inputs from the loop (which may be new nodes after the transformation).
- `_mem_ref_for_main_loop_alignment` -> `_vpointer_for_main_loop_alignment`. Instead of tracking the memory node to later have access to its `VPointer`, we take it directly. That removes one more use of `_nodes` for vector vtnodes.
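
To illustrate the idea of building the memory graph on the fly during `apply`, here is a minimal toy sketch. It is not the HotSpot code: `MemNode`, `ApplyState`, and the free function `apply` are hypothetical stand-ins, and the real `VTransformApplyState` also has to deal with phis, the backedge, and uses after the loop. The sketch only shows the core bookkeeping: each memop takes the current memory state of its slice as input, stores update that state, and a load-only slice keeps its entry state.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a C2 memory node (hypothetical, not HotSpot's Node class).
struct MemNode {
  std::string name;
  int slice;                        // index of the memory slice (alias class)
  bool is_store;                    // stores produce a new memory state
  const MemNode* mem_in = nullptr;  // memory input, wired up during apply

  MemNode(std::string n, int s, bool st)
    : name(std::move(n)), slice(s), is_store(st) {}
};

// Hypothetical analogue of the memory-state tracking in the apply state:
// one "current memory state" per slice.
class ApplyState {
  std::vector<const MemNode*> _memory_state;
public:
  explicit ApplyState(std::vector<const MemNode*> initial)
    : _memory_state(std::move(initial)) {}

  const MemNode* memory_state(int slice) const { return _memory_state[slice]; }
  void set_memory_state(int slice, const MemNode* n) { _memory_state[slice] = n; }
};

// Walk the memops in schedule order. Instead of reordering an existing
// memory graph, each memop is wired to the current state of its slice.
void apply(std::vector<MemNode>& schedule, ApplyState& state) {
  for (MemNode& n : schedule) {
    n.mem_in = state.memory_state(n.slice);
    if (n.is_store) {
      // Later memops on this slice now depend on this store.
      state.set_memory_state(n.slice, &n);
    }
  }
}
```

In this toy model, a load scheduled after a store on the same slice ends up with the store as its memory input, while a slice that only has loads never changes its state, which is why the load-only slices also need an entry state to start from.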

I also made a lot of annotations in the code below, for easier review.

**Suggested order for review**
- Removal of `VTransformGraph::apply_memops_reordering_with_schedule` -> sets up the need to build the memory graph on the fly.
- Old and new code for `VLoopMemorySlices` -> we now also track load-only slices.
- `build_scalar_vtnodes_for_non_packed_nodes`, `build_inputs_for_scalar_vtnodes`, `build_uses_after_loop`, `apply_vtn_inputs_to_node` (used in `apply`), `apply_backedge`, `fix_memory_state_uses_after_loop`
- `VTransformApplyState`: how it now tracks the memory state.
- `VTransformVectorNode` -> removal of `_nodes` (Big Win!)
- Then look at all the other details.

-------------

Commit messages:
 - fix documentation
 - mem_ref -> vpointer
 - wip rm nodes
 - control dependency
 - phi cleanup
 - apply_backedge
 - hook inputs
 - apply
 - wip init memory state
 - small improvement
 - ... and 6 more: https://git.openjdk.org/jdk/compare/2826d170...3ec3ea2a

Changes: https://git.openjdk.org/jdk/pull/27208/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27208&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8367389
  Stats: 690 lines in 10 files changed: 363 ins; 243 del; 84 mod
  Patch: https://git.openjdk.org/jdk/pull/27208.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27208/head:pull/27208

PR: https://git.openjdk.org/jdk/pull/27208

