RFR: 8358521: Optimize vector operations by reassociating broadcasted inputs [v3]

Xiaohong Gong xgong at openjdk.org
Mon Feb 23 07:07:13 UTC 2026


On Thu, 12 Feb 2026 05:18:36 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> Hi all,
>> 
>> This patch optimizes SIMD kernels that make heavy use of broadcasted inputs through the following reassociating ideal transformations.
>> 
>> 
>>  VectorOperation (VectorBroadcast INP1) (VectorBroadcast INP2) =>
>>                             VectorBroadcast (ScalarOperation INP1 INP2)
>> 
>>  VectorOperation (VectorBroadcast INP1) (VectorOperation (VectorBroadcast INP2) INP3) =>
>>                              VectorOperation INP3 (VectorOperation (VectorBroadcast INP1) (VectorBroadcast INP2))
>> 
>> 
>> The idea is to push broadcasts across the vector operation and replace the vector operation with an equivalent, cheaper scalar variant. Currently, the patch handles the most common vector operations.
>> 
>> Following are the performance numbers for the benchmark included with this patch on the latest-generation x86 targets:
>> 
>> **AMD Turin (2.1GHz)**
>> [Benchmark results (image): https://github.com/user-attachments/assets/3f5087bf-0e14-4c56-b0c2-3d23253bad54]
>> 
>> **Intel Granite Rapids (2.1GHz)**
>> [Benchmark results (image): https://github.com/user-attachments/assets/c8481f86-4db2-4c4e-bd65-51542c59fe63]
>> 
>> 
>> 
>> Kindly review and share your feedback.
>> 
>> Best Regards,
>> Jatin
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Review comments resolution
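The two rules in the quoted description can be checked on a toy model. The sketch below is not HotSpot code: `Vec`, `broadcast`, `vadd`, `rule1_holds` and `rule2_holds` are hypothetical names, the "vector" is assumed to be four `uint32_t` lanes, and lane-wise addition stands in for a generic VectorOperation. It shows that each rewrite produces the same lanes as the original expression, even when the scalar sum wraps, while using fewer vector operations.

```cpp
#include <array>
#include <cassert>  // used by the assert-based checks in the usage note
#include <cstdint>

// Hypothetical toy model, not HotSpot code: a "vector" is four 32-bit lanes.
// uint32_t is used so overflow wraps (like Java int) instead of being UB.
constexpr int kLanes = 4;
using Vec = std::array<uint32_t, kLanes>;

// VectorBroadcast: replicate one scalar into every lane.
Vec broadcast(uint32_t s) {
  Vec v;
  v.fill(s);
  return v;
}

// VectorOperation: lane-wise addition (wrapping).
Vec vadd(const Vec& a, const Vec& b) {
  Vec r;
  for (int i = 0; i < kLanes; i++) {
    r[i] = a[i] + b[i];
  }
  return r;
}

// Rule 1: VectorOperation (VectorBroadcast a) (VectorBroadcast b)
//      => VectorBroadcast (ScalarOperation a b)
bool rule1_holds(uint32_t a, uint32_t b) {
  Vec before = vadd(broadcast(a), broadcast(b));  // two broadcasts, one vector add
  Vec after = broadcast(a + b);                   // one scalar add, one broadcast
  return before == after;
}

// Rule 2: VectorOperation (VectorBroadcast a) (VectorOperation (VectorBroadcast b) x)
//      => VectorOperation x (VectorBroadcast (ScalarOperation a b))
bool rule2_holds(uint32_t a, uint32_t b, const Vec& x) {
  Vec before = vadd(broadcast(a), vadd(broadcast(b), x));  // two vector adds
  Vec after = vadd(x, broadcast(a + b));                   // one vector add
  return before == after;
}
```

For example, `assert(rule1_holds(0xFFFFFFF0u, 0x20u))` passes even though `a + b` overflows, because wrapping addition is both associative and commutative. In the rewritten form of rule 2, the inner vector operation on two broadcasts is replaced by a scalar operation plus a single broadcast.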

src/hotspot/share/opto/vectornode.cpp line 1193:

> 1191: }
> 1192: 
> 1193: bool VectorNode::can_push_broadcasts_across_vector_operation(BasicType bt) {

Would it be better to add a comment describing this method?
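If this predicate is meant to restrict the transformation by element type (an assumption, since its body is not quoted in this thread), the requested comment could explain why. Wrapping integer arithmetic reassociates exactly, while floating-point addition rounds at every step, so regrouping can change the result. The helper names `int_reassociation_exact` and `float_reassociation_exact` below are hypothetical and exist only to illustrate this.

```cpp
#include <cassert>  // used by the assert-based checks in the usage note
#include <cstdint>

// Integer lanes: modular (wrapping) addition is associative and commutative,
// so b1 + (b2 + x) == x + (b1 + b2) for every input, including on overflow.
bool int_reassociation_exact(uint32_t b1, uint32_t b2, uint32_t x) {
  return b1 + (b2 + x) == x + (b1 + b2);
}

// Floating-point lanes: each addition is rounded, so the same regrouping
// can produce a different value.
bool float_reassociation_exact(float b1, float b2, float x) {
  return b1 + (b2 + x) == x + (b1 + b2);
}
```

With `b1 = 1.0f`, `b2 = 1e20f` and `x = -1e20f`, the left grouping yields `1.0f`. The right grouping yields `0.0f`, because `1.0f + 1e20f` rounds back to `1e20f`.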

src/hotspot/share/opto/vectornode.cpp line 1331:

> 1329:       return create_reassociated_node(this, in(1), in(2), in1_2, in1_1, phase);
> 1330:     }
> 1331:   }

These two parts are duplicated. How about merging them like this:

Suggestion:

  Node* in1 = in(1);
  Node* in2 = in(2);

  // Move the broadcast operand to the left to simplify the following reassociation
  if (in2->Opcode() == Op_Replicate) {
    swap(in1, in2);
  }

  if (in1->Opcode() == Op_Replicate && in2->Opcode() == Opcode()) {
    ...
  }
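The effect of the swap can be shown on a hypothetical toy IR. `ToyNode`, `Op` and `matches_reassociation` are illustrative names and are not HotSpot's `Node` API. After the swap, a single check accepts both `Op(Replicate, Op(...))` and `Op(Op(...), Replicate)`. One caveat: exchanging the operands is only sound when the vector operation is commutative (for example add, mul, and, or, xor), so a comment stating that would help if non-commutative operations could ever reach this path.

```cpp
#include <cassert>  // used by the assert-based checks in the usage note
#include <utility>

// Hypothetical toy IR, not HotSpot's Node class.
enum Op { Op_Input, Op_Replicate, Op_AddV };

struct ToyNode {
  Op op;
  const ToyNode* in1;
  const ToyNode* in2;
};

// Returns true if `n` has a Replicate operand on either side and an operand
// with the same opcode as `n` on the other side. As in the suggestion, the
// inner operands are not inspected here.
bool matches_reassociation(const ToyNode& n) {
  const ToyNode* in1 = n.in1;
  const ToyNode* in2 = n.in2;
  // Canonicalize: move the broadcast operand to the left.
  // Sound only because the operation is assumed to be commutative.
  if (in2->op == Op_Replicate) {
    std::swap(in1, in2);
  }
  return in1->op == Op_Replicate && in2->op == n.op;
}
```

Both operand orders now reach the same code path, which is what removes the duplicated block.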

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25617#discussion_r2839339938
PR Review Comment: https://git.openjdk.org/jdk/pull/25617#discussion_r2839336882


More information about the core-libs-dev mailing list