RFR: 8369448: C2 SuperWord: refactor VTransform to do move_unordered_reduction_out_of_loop during VTransform::optimize

Wed Oct 8 23:06:22 UTC 2025

On Wed, 8 Oct 2025 19:42:38 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

> I'm working on cost-modeling, and am integrating some smaller changes from this proof-of-concept PR:
> https://github.com/openjdk/jdk/pull/20964
> [See plan overview.](https://bugs.openjdk.org/browse/JDK-8340093)
> 
> This is a pure refactoring - no change in behaviour. I'm presenting it like this because it will make reviews easier.
> 
> This should be the last one before Cost Modeling, which will enable us to vectorize more reductions 😊 
> 
> --------------------------
> 
> **Goal:** we need to perform `move_reduction_out_of_loop` during auto vectorization, not after it. This will allow a following RFE to cost-model the loop after the expensive reduction nodes have been moved out of it.
> 
> **Details**
> Reduction nodes are expensive, and require many instructions in the backend. In some cases, this means that the vectorized reduction is more expensive than the scalar reduction. We would have to find other operations to vectorize, so that the instruction count goes down sufficiently. There are cases where the reduction is not profitable before `move_reduction_out_of_loop`, but profitable after.
> 
> Since we now modify `VTransformNode`s during `VTransform::optimize` (think of it as the IGVN for `VTransform`), some nodes can become dead, and so we need to take care of that with `is_alive`. We must also schedule only alive nodes, since dead nodes may not have a coherent state.
> 
> **Future Work**
> - Cost Modeling [JDK-8340093](https://bugs.openjdk.org/browse/JDK-8340093)
> - Other optimizations that lower the cost of the vectorized loop, and enable vectorization to be profitable.
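
To illustrate the goal in the quoted description, here is a minimal standalone C++ sketch of the idea, not the HotSpot implementation. The 4-lane `Vec` type and all function names are hypothetical. It contrasts a loop that does one horizontal reduction per vector iteration with a loop that accumulates element-wise and reduces once after the loop.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 4-lane int vector; stands in for a hardware vector register.
constexpr std::size_t kLanes = 4;
using Vec = std::array<int, kLanes>;

// Horizontal add across lanes: models the expensive reduction node.
int reduce_add(const Vec& v) {
  int sum = 0;
  for (int lane : v) sum += lane;
  return sum;
}

// Before: one horizontal reduction per vector iteration, inside the loop.
int sum_with_reduction_in_loop(const std::vector<int>& a) {
  assert(a.size() % kLanes == 0);  // this sketch ignores the scalar tail
  int acc = 0;
  for (std::size_t i = 0; i < a.size(); i += kLanes) {
    Vec v{a[i], a[i + 1], a[i + 2], a[i + 3]};
    acc += reduce_add(v);
  }
  return acc;
}

// After: cheap element-wise accumulation inside the loop,
// and a single horizontal reduction after the loop.
int sum_with_reduction_after_loop(const std::vector<int>& a) {
  assert(a.size() % kLanes == 0);  // this sketch ignores the scalar tail
  Vec acc{};  // all lanes start at zero
  for (std::size_t i = 0; i < a.size(); i += kLanes) {
    for (std::size_t lane = 0; lane < kLanes; ++lane) {
      acc[lane] += a[i + lane];
    }
  }
  return reduce_add(acc);
}
```

Both functions return the same sum for integer inputs, but the second one executes the reduction only once, which is why the loop body can look much cheaper to a cost model after the transformation.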

src/hotspot/share/opto/loopnode.cpp line 5298:

> 5296:       }
> 5297:     }
> 5298:   }

Note: instead of performing the optimization after auto vectorization, we now perform it during auto vectorization.

src/hotspot/share/opto/loopopts.cpp line 4607:

> 4605: // reordering of operations (for example float addition/multiplication require
> 4606: // strict order).
> 4607: void PhaseIdealLoop::move_unordered_reduction_out_of_loop(IdealLoopTree* loop) {

Note: moved to `VTransformReductionVectorNode::optimize_move_non_strict_order_reductions_out_of_loop`

src/hotspot/share/opto/vectornode.cpp line 297:

> 295: // Return the scalar opcode for the specified vector opcode
> 296: // and basic type.
> 297: int VectorNode::scalar_opcode(int sopc, BasicType bt) {

Note: no longer needed. We used to have to go back from vectorized reduction to scalar op to get the corresponding element-wise accumulation instruction. Now that we move the reduction out of the loop during auto vectorization, we still have access to the scalar node.

src/hotspot/share/opto/vectornode.cpp line 1615:

> 1613: }
> 1614: 
> 1615: bool ReductionNode::auto_vectorization_requires_strict_order(int vopc) {

Note: we need to know which reductions we can move out of the loop, and we can only do that with those that do not require strict order.
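
As a small illustration of why strict order matters for float addition, here is a standalone C++ sketch (function names are hypothetical, and it is unrelated to the actual HotSpot code). It sums the same four floats once strictly left to right and once with two independent lane accumulators, which is the reordering that lane-wise accumulation implies.

```cpp
#include <array>
#include <cstddef>

// Strict order: ((a0 + a1) + a2) + a3, as a scalar loop would compute it.
float sum_strict_order(const std::array<float, 4>& a) {
  float acc = 0.0f;
  for (float x : a) acc += x;
  return acc;
}

// Reordered: two lane accumulators, combined once at the end:
// (a0 + a2) + (a1 + a3).
float sum_two_lanes(const std::array<float, 4>& a) {
  float lane0 = 0.0f;
  float lane1 = 0.0f;
  for (std::size_t i = 0; i < a.size(); i += 2) {
    lane0 += a[i];
    lane1 += a[i + 1];
  }
  return lane0 + lane1;
}
```

For the input `{1e20f, 1.0f, -1e20f, 1.0f}` with IEEE 754 single precision, the strict-order sum is `1.0f` (the first `1.0f` is absorbed by `1e20f`), while the two-lane sum is `2.0f`. The results differ, so such a reduction cannot be reordered without changing behaviour.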

src/hotspot/share/opto/vtransform.cpp line 43:

> 41:   )
> 42: 
> 43: void VTransformGraph::optimize(VTransform& vtransform) {

Note: this is similar to IGVN optimization. But we are a bit lazy, and don't care about notification / worklist.
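
To make the "no worklist" approach concrete, here is a toy C++ sketch under my own assumptions; `ToyNode`, its fields, and `optimize_until_stable` are hypothetical and do not mirror the real `VTransformGraph` code. It sweeps over all nodes repeatedly until a full sweep makes no progress, and skips dead nodes in the way the description uses `is_alive`.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical toy node: not the real VTransformNode.
struct ToyNode {
  std::vector<std::size_t> inputs;  // indices of the nodes this node uses
  bool is_sink = false;             // e.g. a store: must be kept
  bool alive = true;
};

// Sweep over all nodes until nothing changes. No notification, no worklist:
// every sweep simply revisits every node that is still alive.
// Returns the number of sweeps performed.
int optimize_until_stable(std::vector<ToyNode>& nodes) {
  int sweeps = 0;
  bool progress = true;
  while (progress) {
    progress = false;
    ++sweeps;
    // Count uses, considering only edges that come from alive nodes.
    std::vector<int> uses(nodes.size(), 0);
    for (const ToyNode& n : nodes) {
      if (!n.alive) continue;
      for (std::size_t in : n.inputs) ++uses[in];
    }
    // The only "optimization" in this toy: an unused non-sink node dies.
    for (std::size_t i = 0; i < nodes.size(); ++i) {
      ToyNode& n = nodes[i];
      if (!n.alive) continue;  // dead nodes are never revisited
      if (!n.is_sink && uses[i] == 0) {
        n.alive = false;
        progress = true;
      }
    }
  }
  return sweeps;
}
```

The trade-off in this sketch is extra sweeps in exchange for simplicity: a node that only becomes unused after another node dies is found on the next sweep rather than through a worklist notification.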

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27704#discussion_r2415178920
PR Review Comment: https://git.openjdk.org/jdk/pull/27704#discussion_r2415178137
PR Review Comment: https://git.openjdk.org/jdk/pull/27704#discussion_r2415180713
PR Review Comment: https://git.openjdk.org/jdk/pull/27704#discussion_r2415181617
PR Review Comment: https://git.openjdk.org/jdk/pull/27704#discussion_r2415182559