RFR: 8369448: C2 SuperWord: refactor VTransform to do move_unordered_reduction_out_of_loop during VTransform::optimize [v2]

Christian Hagedorn chagedorn at openjdk.org
Mon Oct 13 09:13:13 UTC 2025


On Thu, 9 Oct 2025 21:46:36 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> I'm working on cost-modeling, and am integrating some smaller changes from this proof-of-concept PR:
>> https://github.com/openjdk/jdk/pull/20964
>> [See plan overfiew.](https://bugs.openjdk.org/browse/JDK-8340093)
>> 
>> This is a pure refactoring - no change in behaviour. I'm presenting it like this because it will make reviews easier.
>> 
>> This should be the last one before Cost Modeling, which will enable us to vectorize more reductions 😊 
>> 
>> --------------------------
>> 
>> **Goal:** we need to do the `move_reduction_out_of_loop` already during auto vectorization, and not after. This will allow us to cost-model the loop after the expensive reduction nodes are removed from the loop in a following RFE.
>> 
>> **Details**
>> Reduction nodes are expensive, and require many instructions in the backend. In some cases, this means that the vectorized reduction is more expensive than the scalar reduction. We would have to find other operations to vectorize, so that the instruction count goes down sufficiently. There are cases where the reduction is not profitable before `move_reduction_out_of_loop`, but profitable after.
>> 
>> Since we now modify `VTransformNode`s during `VTransform::optimize` (think of it as the IGVN for `VTransform`), some nodes can become dead, and so we need to take care of that with `is_alive`. And we must only schedule alive nodes, others may not have a coherent state.
>> 
>> **Future Work**
>> - Cost Modeling [JDK-8340093](https://bugs.openjdk.org/browse/JDK-8340093)
>> - Other optimizations that lower the cost of the vectorized loop, and enable vectorization to be profitable.
>
> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
> 
>   For Vladimir K7

Nice refactoring! Some small comments, otherwise, it looks good to me, too!

src/hotspot/share/opto/vtransform.cpp line 46:

> 44:   TRACE_OPTIMIZE( tty->print_cr("\nVTransformGraph::optimize"); )
> 45: 
> 46:   while (true) {

Could we also just do `while (progress)`? You always seem to check `!progress` at the very end of the loop.

src/hotspot/share/opto/vtransform.cpp line 97:

> 95: 
> 96:   collect_nodes_without_strong_in_edges(stack);
> 97:   int num_alive_nodes = count_alive_vtnodes();

Suggestion:

  const int num_alive_nodes = count_alive_vtnodes();

src/hotspot/share/opto/vtransform.cpp line 1070:

> 1068: //       outside the loop, and instead cheaper element-wise vector accumulations
> 1069: //       are performed inside the loop.
> 1070: bool VTransformReductionVectorNode::optimize_move_non_strict_order_reductions_out_of_loop(const VLoopAnalyzer& vloop_analyzer, VTransform& vtransform) {

Any particular reason you chose the additional `optimize` prefix? I think the intent is already clear without it.

src/hotspot/share/opto/vtransform.cpp line 1074:

> 1072:   uint vlen    = vector_length();
> 1073:   BasicType bt = element_basic_type();
> 1074:   int ropc     = vector_reduction_opcode();

Can probably be made `const` for good measure:

Suggestion:

  const int sopc     = scalar_opcode();
  const uint vlen    = vector_length();
  const BasicType bt = element_basic_type();
  const int ropc     = vector_reduction_opcode();


And could they also be moved down to the definition of `vopc`?

src/hotspot/share/opto/vtransform.cpp line 1113:

> 1111:   VTransformReductionVectorNode* last_red    = phi->in_req(2)->isa_ReductionVector();
> 1112:   VTransformReductionVectorNode* current_red = last_red;
> 1113:   while (true) {

The method is already quite big. IIUC, this only does some checking and we do not need to bookkeep for further down. Therefore, I suggest to extract this to a "is_looping_back_to_phi" method or something like that.

src/hotspot/share/opto/vtransform.cpp line 1175:

> 1173:   // Create a vector of identity values.
> 1174:   Node* identity = ReductionNode::make_identity_con_scalar(phase->igvn(), sopc, bt);
> 1175:   phase->set_ctrl(identity, phase->C->root());

Any particular reason why you are no longer using `set_root_as_ctrl()`?

-------------

PR Review: https://git.openjdk.org/jdk/pull/27704#pullrequestreview-3330490894
PR Review Comment: https://git.openjdk.org/jdk/pull/27704#discussion_r2425592796
PR Review Comment: https://git.openjdk.org/jdk/pull/27704#discussion_r2425602123
PR Review Comment: https://git.openjdk.org/jdk/pull/27704#discussion_r2425609143
PR Review Comment: https://git.openjdk.org/jdk/pull/27704#discussion_r2425611531
PR Review Comment: https://git.openjdk.org/jdk/pull/27704#discussion_r2425640049
PR Review Comment: https://git.openjdk.org/jdk/pull/27704#discussion_r2425644744


More information about the hotspot-compiler-dev mailing list