RFR: 8366702: C2 SuperWord: refactor VTransform vector nodes
Emanuel Peter
epeter at openjdk.org
Fri Sep 5 13:51:25 UTC 2025
On Tue, 2 Sep 2025 15:30:06 GMT, Emanuel Peter <epeter at openjdk.org> wrote:
> I'm working on cost-modeling, and am integrating some smaller changes from this proof-of-concept PR:
> https://github.com/openjdk/jdk/pull/20964
> [See plan overfiew.](https://bugs.openjdk.org/browse/JDK-8340093)
>
> This is a pure refactoring - no change in behaviour. I'm presenting it like this because it will make reviews easier.
>
> ---------------------------------
>
> I have to say: I'm very sorry for this refactoring. I took some decisions in https://github.com/openjdk/jdk/pull/19719 that I'm now partially undoing. I moved too much logic from `SuperWord::output` (now called `SuperWordVTransformBuilder::make_vector_vtnode_for_pack`) to the `VTransform...Node::apply`. https://github.com/openjdk/jdk/pull/19719 was a roughly 1.5k line change, and I took about a 0.3k misstep that I'm now correcting here ;)
>
> I had accidentially made the `VTransformGraph` too close to the `PackSet`, and not close enough to the future vectorized C2 Graph. And that makes some future changes hard.
>
> My vision:
> - VLoop / VLoopAnalyzer look at the scalar loop and prepare it for SuperWord
> - SuperWord creates the `PackSet`: some nodes are packed, all others are scalar.
> - `SuperWordVTransformBuilder` converts the `PackSet` into the `VTransformGraph`
> - The `VTransformGraph` very closely represents the C2 vectorized loop after vectorization
> - It does not need to know which `nodes` it packs, it rather just needs to know how to generate the new vector nodes
> - That means it is straight-forward to compute cost
> - And it also makes optimizations on that graph easier
> - And the `apply` methods are simpler too
>
> ----------------------------------
>
> So therefore, the main goal was to make the `VTransform...Node::apply` calls simpler again. And move the logic back to `SuperWordVTransformBuilder::make_vector_vtnode_for_pack`.
>
> One important step to making the the `VTransformGraph` less of a `PackSet` is to remove reliance on `nodes` for the vector nodes.
>
> What I did:
> - Moving a lot of the logic in `VTransformElementWiseVectorNode::apply` to `SuperWordVTransformBuilder::make_vector_vtnode_for_pack`.
> - Will make it easier to optimize and compute cost in future RFE's.
> - `VTransformVectorNodePrototype`: packs a lot of the info for `VTransformVectorNode`.
> - pass info about `bt`, `vlen`, `sopc` instead of the `pack` -> allows us to eventually remove the dependency on `nodes`.
> - New vector nodes, they are special cases I split away from `VTransformElementWiseVectorNode`:
> - `VTransformReinterpretVectorN...
src/hotspot/share/opto/superwordVTransformBuilder.cpp line 91:
> 89: init_req_with_scalar(p0, vtn, MemNode::Control);
> 90: init_req_with_scalar(p0, vtn, MemNode::Address);
> 91: init_req_with_vector(pack, vtn, MemNode::ValueIn);
I'm also adding control to the load/store vectors. That allows us to load control without access to the `nodes` in `VTransformLoadVectorNode::apply` and `VTransformStoreVectorNode::apply`:
https://github.com/openjdk/jdk/blob/05ee280048757e6ac095bf7e28708dce258635bf/src/hotspot/share/opto/vtransform.cpp#L877
https://github.com/openjdk/jdk/blob/05ee280048757e6ac095bf7e28708dce258635bf/src/hotspot/share/opto/vtransform.cpp#L906
src/hotspot/share/opto/superwordVTransformBuilder.cpp line 119:
> 117: } else {
> 118: init_all_req_with_vectors(pack, vtn);
> 119: }
I'm mostly flattening the control flow here.
There is also a new else case that just does `init_all_req_with_vectors(pack, vtn);` this applies to the new nodes that I split away from `ElementWiseVector`:
- `VTransformReinterpretVectorNode`
- `VTransformElementWiseLongOpWithCastToIntVectorNode`
- `VTransformCmpVectorNode`
I also adapted the logic for `CMove`, to integrate the special handling logic from `VTransformElementWiseVectorNode::apply`, so now the inputs are differently permuted already at this stage, and they are now already the same as the generated `BlendVector` will once have them.
src/hotspot/share/opto/superwordVTransformBuilder.cpp line 196:
> 194: vtn = new (_vtransform.arena()) VTransformElementWiseVectorNode(_vtransform, p0->req(), prototype, vopc);
> 195: } else if (VectorNode::is_scalar_op_that_returns_int_but_vector_op_returns_long(sopc)) {
> 196: vtn = new (_vtransform.arena()) VTransformElementWiseLongOpWithCastToIntVectorNode(_vtransform, prototype);
Cases moved from `VTransformElementWiseVectorNode::apply`.
src/hotspot/share/opto/vtransform.cpp line 108:
> 106: #ifndef PRODUCT
> 107: if (_trace._info) {
> 108: print_schedule();
Verbose is often too much, but it is nice to see which `VTransformNode`s are generated, and to see their order after scheduling.
src/hotspot/share/opto/vtransform.cpp line 163:
> 161: VTransformMemVectorNode* vtn = vtnodes.at(i)->isa_MemVector();
> 162: if (vtn == nullptr) { continue; }
> 163: const VPointer& vp = vtn->vpointer();
We can check for `MemVector` directly, and then we know that they all represent `Mem` nodes and they all have a `vpointer`.
src/hotspot/share/opto/vtransform.cpp line 798:
> 796: // Handled by Bool / VTransformBoolVectorNode, so we do not generate any nodes here.
> 797: return VTransformApplyResult::make_empty();
> 798: }
Moved to `VTransformCmpVectorNode` -> has empty apply.
src/hotspot/share/opto/vtransform.cpp line 801:
> 799: vn = VectorNode::make(vopc, in1, in2, vt); // unary and binary
> 800: } else {
> 801: vn = VectorNode::make(vopc, in1, in2, in3, vt); // ternary
Moved to `SuperWordVTransformBuilder::build_inputs_for_vector_vtnodes`, to simplify the logic here.
src/hotspot/share/opto/vtransform.cpp line 880:
> 878: // first has the correct memory state, determined by VTransformGraph::apply_memops_reordering_with_schedule
> 879: Node* mem = first->in(MemNode::Memory);
> 880: Node* adr = apply_state.transformed_node(in_req(MemNode::Address));
There is still minimal reliance on `nodes` / `first`: but only for `mem` state. And currently, we cannot remove that yet, because we rely on the memory graph being reordered before vectorization, see `VTransformGraph::apply_memops_reordering_with_schedule`.
In a future RFE, I will refactor scheduling, so that we build the memory graph during apply.
See step 3 in [plan overfiew.](https://bugs.openjdk.org/browse/JDK-8340093)
src/hotspot/share/opto/vtransform.cpp line 909:
> 907: // first has the correct memory state, determined by VTransformGraph::apply_memops_reordering_with_schedule
> 908: Node* mem = first->in(MemNode::Memory);
> 909: Node* adr = apply_state.transformed_node(in_req(MemNode::Address));
There is still minimal reliance on nodes / first: but only for mem state. And currently, we cannot remove that yet, because we rely on the memory graph being reordered before vectorization, see VTransformGraph::apply_memops_reordering_with_schedule.
In a future RFE, I will refactor scheduling, so that we build the memory graph during apply.
See step 3 in [plan overfiew.](https://bugs.openjdk.org/browse/JDK-8340093)
src/hotspot/share/opto/vtransform.cpp line 933:
> 931: phase->register_new_node(vn, apply_state.vloop().cl());
> 932: phase->igvn()._worklist.push(vn);
> 933: VectorNode::trace_new_vector(vn, "AutoVectorization");
Removing the argument here allows us yet another removal of dependency on the old scalar graph. We only needed it for using the same control as the old graph - but that is not necessary, we can just use the CountedLoop as control, which is good enough.
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/27056#discussion_r2325037712
PR Review Comment: https://git.openjdk.org/jdk/pull/27056#discussion_r2325051629
PR Review Comment: https://git.openjdk.org/jdk/pull/27056#discussion_r2325054470
PR Review Comment: https://git.openjdk.org/jdk/pull/27056#discussion_r2325058939
PR Review Comment: https://git.openjdk.org/jdk/pull/27056#discussion_r2325065437
PR Review Comment: https://git.openjdk.org/jdk/pull/27056#discussion_r2325066973
PR Review Comment: https://git.openjdk.org/jdk/pull/27056#discussion_r2325068600
PR Review Comment: https://git.openjdk.org/jdk/pull/27056#discussion_r2325076487
PR Review Comment: https://git.openjdk.org/jdk/pull/27056#discussion_r2325077256
PR Review Comment: https://git.openjdk.org/jdk/pull/27056#discussion_r2325079910
More information about the hotspot-compiler-dev
mailing list