RFR: 8366702: C2 SuperWord: refactor VTransform vector nodes

Fri Sep 5 13:51:25 UTC 2025

On Tue, 2 Sep 2025 15:30:06 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

> I'm working on cost-modeling, and am integrating some smaller changes from this proof-of-concept PR:
> https://github.com/openjdk/jdk/pull/20964
> [See plan overfiew.](https://bugs.openjdk.org/browse/JDK-8340093)
> 
> This is a pure refactoring - no change in behaviour. I'm presenting it like this because it will make reviews easier.
> 
> ---------------------------------
> 
> I have to say: I'm very sorry for this refactoring. I took some decisions in https://github.com/openjdk/jdk/pull/19719 that I'm now partially undoing. I moved too much logic from `SuperWord::output` (now called `SuperWordVTransformBuilder::make_vector_vtnode_for_pack`) to the `VTransform...Node::apply`. https://github.com/openjdk/jdk/pull/19719  was a roughly 1.5k line change, and I took about a 0.3k misstep that I'm now correcting here ;)
> 
> I had accidentially made the `VTransformGraph` too close to the `PackSet`, and not close enough to the future vectorized C2 Graph. And that makes some future changes hard.
> 
> My vision:
> - VLoop / VLoopAnalyzer look at the scalar loop and prepare it for SuperWord
> - SuperWord creates the `PackSet`: some nodes are packed, all others are scalar.
> - `SuperWordVTransformBuilder` converts the `PackSet` into the `VTransformGraph`
> - The `VTransformGraph` very closely represents the C2 vectorized loop after vectorization
>   - It does not need to know which `nodes` it packs, it rather just needs to know how to generate the new vector nodes
>   - That means it is straight-forward to compute cost
>   - And it also makes optimizations on that graph easier
>   - And the `apply` methods are simpler too
> 
> ----------------------------------
> 
> So therefore, the main goal was to make the `VTransform...Node::apply` calls simpler again. And move the logic back to `SuperWordVTransformBuilder::make_vector_vtnode_for_pack`.
> 
> One important step to making the the `VTransformGraph` less of a `PackSet` is to remove reliance on `nodes` for the vector nodes.
> 
> What I did:
> - Moving a lot of the logic in `VTransformElementWiseVectorNode::apply` to `SuperWordVTransformBuilder::make_vector_vtnode_for_pack`.
>   - Will make it easier to optimize and compute cost in future RFE's.
> - `VTransformVectorNodePrototype`: packs a lot of the info for `VTransformVectorNode`.
>   - pass info about `bt`, `vlen`, `sopc` instead of the `pack` -> allows us to eventually remove the dependency on `nodes`.
> - New vector nodes, they are special cases I split away from `VTransformElementWiseVectorNode`:
>   - `VTransformReinterpretVectorN...

src/hotspot/share/opto/superwordVTransformBuilder.cpp line 91:

> 89:       init_req_with_scalar(p0,   vtn, MemNode::Control);
> 90:       init_req_with_scalar(p0,   vtn, MemNode::Address);
> 91:       init_req_with_vector(pack, vtn, MemNode::ValueIn);

I'm also adding control to the load/store vectors. That allows us to load control without access to the `nodes` in `VTransformLoadVectorNode::apply` and `VTransformStoreVectorNode::apply`:
https://github.com/openjdk/jdk/blob/05ee280048757e6ac095bf7e28708dce258635bf/src/hotspot/share/opto/vtransform.cpp#L877
https://github.com/openjdk/jdk/blob/05ee280048757e6ac095bf7e28708dce258635bf/src/hotspot/share/opto/vtransform.cpp#L906

src/hotspot/share/opto/superwordVTransformBuilder.cpp line 119:

> 117:     } else {
> 118:       init_all_req_with_vectors(pack, vtn);
> 119:     }

I'm mostly flattening the control flow here.
There is also a new else case that just does `init_all_req_with_vectors(pack, vtn);` this applies to the new nodes that I split away from `ElementWiseVector`:
- `VTransformReinterpretVectorNode`
- `VTransformElementWiseLongOpWithCastToIntVectorNode`
- `VTransformCmpVectorNode`

I also adapted the logic for `CMove`, to integrate the special handling logic from `VTransformElementWiseVectorNode::apply`, so now the inputs are differently permuted already at this stage, and they are now already the same as the generated `BlendVector` will once have them.

src/hotspot/share/opto/superwordVTransformBuilder.cpp line 196:

> 194:     vtn = new (_vtransform.arena()) VTransformElementWiseVectorNode(_vtransform, p0->req(), prototype, vopc);
> 195:   } else if (VectorNode::is_scalar_op_that_returns_int_but_vector_op_returns_long(sopc)) {
> 196:     vtn = new (_vtransform.arena()) VTransformElementWiseLongOpWithCastToIntVectorNode(_vtransform, prototype);

Cases moved from `VTransformElementWiseVectorNode::apply`.

src/hotspot/share/opto/vtransform.cpp line 108:

> 106: #ifndef PRODUCT
> 107:   if (_trace._info) {
> 108:     print_schedule();

Verbose is often too much, but it is nice to see which `VTransformNode`s are generated, and to see their order after scheduling.

src/hotspot/share/opto/vtransform.cpp line 163:

> 161:       VTransformMemVectorNode* vtn = vtnodes.at(i)->isa_MemVector();
> 162:       if (vtn == nullptr) { continue; }
> 163:       const VPointer& vp = vtn->vpointer();

We can check for `MemVector` directly, and then we know that they all represent `Mem` nodes and they all have a `vpointer`.

src/hotspot/share/opto/vtransform.cpp line 798:

> 796:     // Handled by Bool / VTransformBoolVectorNode, so we do not generate any nodes here.
> 797:     return VTransformApplyResult::make_empty();
> 798:   }

Moved to `VTransformCmpVectorNode` -> has empty apply.

src/hotspot/share/opto/vtransform.cpp line 801:

> 799:     vn = VectorNode::make(vopc, in1, in2, vt); // unary and binary
> 800:   } else {
> 801:     vn = VectorNode::make(vopc, in1, in2, in3, vt); // ternary

Moved to `SuperWordVTransformBuilder::build_inputs_for_vector_vtnodes`, to simplify the logic here.

src/hotspot/share/opto/vtransform.cpp line 880:

> 878:   // first has the correct memory state, determined by VTransformGraph::apply_memops_reordering_with_schedule
> 879:   Node* mem  = first->in(MemNode::Memory);
> 880:   Node* adr  = apply_state.transformed_node(in_req(MemNode::Address));

There is still minimal reliance on `nodes` / `first`: but only for `mem` state. And currently, we cannot remove that yet, because we rely on the memory graph being reordered before vectorization, see `VTransformGraph::apply_memops_reordering_with_schedule`.

In a future RFE, I will refactor scheduling, so that we build the memory graph during apply.
See step 3 in [plan overfiew.](https://bugs.openjdk.org/browse/JDK-8340093)

src/hotspot/share/opto/vtransform.cpp line 909:

> 907:   // first has the correct memory state, determined by VTransformGraph::apply_memops_reordering_with_schedule
> 908:   Node* mem  = first->in(MemNode::Memory);
> 909:   Node* adr  = apply_state.transformed_node(in_req(MemNode::Address));

There is still minimal reliance on nodes / first: but only for mem state. And currently, we cannot remove that yet, because we rely on the memory graph being reordered before vectorization, see VTransformGraph::apply_memops_reordering_with_schedule.

In a future RFE, I will refactor scheduling, so that we build the memory graph during apply.
See step 3 in [plan overfiew.](https://bugs.openjdk.org/browse/JDK-8340093)

src/hotspot/share/opto/vtransform.cpp line 933:

> 931:   phase->register_new_node(vn, apply_state.vloop().cl());
> 932:   phase->igvn()._worklist.push(vn);
> 933:   VectorNode::trace_new_vector(vn, "AutoVectorization");

Removing the argument here allows us yet another removal of dependency on the old scalar graph. We only needed it for using the same control as the old graph - but that is not necessary, we can just use the CountedLoop as control, which is good enough.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27056#discussion_r2325037712
PR Review Comment: https://git.openjdk.org/jdk/pull/27056#discussion_r2325051629
PR Review Comment: https://git.openjdk.org/jdk/pull/27056#discussion_r2325054470
PR Review Comment: https://git.openjdk.org/jdk/pull/27056#discussion_r2325058939
PR Review Comment: https://git.openjdk.org/jdk/pull/27056#discussion_r2325065437
PR Review Comment: https://git.openjdk.org/jdk/pull/27056#discussion_r2325066973
PR Review Comment: https://git.openjdk.org/jdk/pull/27056#discussion_r2325068600
PR Review Comment: https://git.openjdk.org/jdk/pull/27056#discussion_r2325076487
PR Review Comment: https://git.openjdk.org/jdk/pull/27056#discussion_r2325077256
PR Review Comment: https://git.openjdk.org/jdk/pull/27056#discussion_r2325079910