RFR: 8358521: Optimize vector operations by reassociating broadcasted inputs
Xiaohong Gong
xgong at openjdk.org
Tue Feb 10 02:42:40 UTC 2026
On Tue, 3 Jun 2025 14:54:16 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:
> Hi all,
>
> This patch optimizes SIMD kernels making heavy use of broadcasted inputs through following reassociating ideal transformations.
>
>
> VectorOperation (VectorBroadcast INP1, VectorBroadcast INP2) =>
> VectorBroadcast (ScalarOpration INP1, INP2)
>
> VectorOperation (VectorBroadcast INP1) (VectorOperation (VectorBroadcast INP2) INP3) =>
> VectorOperation INP3 (VectorOperation (VectorBroadcast INP1) (VectorOperation INP2))
>
>
> The idea is to push broadcasts across the vector operation and replace the vector with an equivalent, cheaper scalar variant. Currently, patch handles most common vector operations.
>
> Following are the performance number of benchmark included with this patch on latest generation x86 targets:-
>
> **AMD Turin (2.1GHz)**
> <img width="1122" height="355" alt="image" src="https://github.com/user-attachments/assets/3f5087bf-0e14-4c56-b0c2-3d23253bad54" />
>
> **Intel Granite Rapids (2.1GHz)**
> <img width="1105" height="325" alt="image" src="https://github.com/user-attachments/assets/c8481f86-4db2-4c4e-bd65-51542c59fe63" />
>
>
>
> Kindly review and share your feedback.
>
> Best Regards,
> Jatin
Good work @jatin-bhateja ! I left some comments below. Thanks!
src/hotspot/share/opto/vectornode.cpp line 390:
> 388: case T_BOOLEAN:
> 389: case T_CHAR:
> 390: assert(!enable_assertions, "boolean and char are signed, not implemented for Min");
Suggestion:
assert(!enable_assertions, "boolean and char are unsigned, not implemented for Min");
src/hotspot/share/opto/vectornode.cpp line 411:
> 409: case T_BOOLEAN:
> 410: case T_CHAR:
> 411: assert(!enable_assertions, "boolean and char are signed, not implemented for Max");
ditto
src/hotspot/share/opto/vectornode.cpp line 1317:
> 1315: }
> 1316: return nullptr;
> 1317: }
Reassociation can be a separate optimization for vector nodes to me. Do you think it's better to split it as a separate change following the broadcast optimization? We can add specific tests for it.
src/hotspot/share/opto/vectornode.cpp line 1358:
> 1356: return new ReplicateNode(sop, vect_type());
> 1357: }
> 1358: }
Is it better to wrap this transformation as a method?
test/hotspot/jtreg/compiler/vectorapi/TestVectorBroadcastReassociations.java line 136:
> 134: * ======================= */
> 135:
> 136: static final VectorSpecies<Long> LSP = LongVector.SPECIES_256;
Why not using the `SPECIES_PREFERRED` instead like the int species? The max vector size for AArch64 NEON (asimd) is 128-bit. We have to add another condition `MaxVectorSize >= 32` for following IR tests if using `SPECIES_256`.
-------------
PR Review: https://git.openjdk.org/jdk/pull/25617#pullrequestreview-3776329907
PR Review Comment: https://git.openjdk.org/jdk/pull/25617#discussion_r2785411161
PR Review Comment: https://git.openjdk.org/jdk/pull/25617#discussion_r2785421927
PR Review Comment: https://git.openjdk.org/jdk/pull/25617#discussion_r2785464918
PR Review Comment: https://git.openjdk.org/jdk/pull/25617#discussion_r2785469004
PR Review Comment: https://git.openjdk.org/jdk/pull/25617#discussion_r2785485050
More information about the core-libs-dev
mailing list