RFR: 8358521: Optimize vector operations by reassociating broadcasted inputs [v4]
Jatin Bhateja
jbhateja at openjdk.org
Wed Feb 25 11:24:00 UTC 2026
On Tue, 24 Feb 2026 08:59:27 GMT, Emanuel Peter <epeter at openjdk.org> wrote:
>> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Review comments resolutions
>
> src/hotspot/share/opto/subnode.hpp line 524:
>
>> 522: if (c != nullptr) {
>> 523: C->add_expensive_node(this);
>> 524: }
>
> Can you explain a bit more about this change? What is the flag used for, and why did you need to change it?
Comment above it mentions the reason, We don't attach control inputs while creating SqrtVD node in VectorNode::make, similar check is added for SqrtF before calling add_expensive_node.
https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/subnode.hpp#L539
While converting SqrtVD with broadcasted input to SqrtD we don't receive any control input. is_expensive called from add_expensive_node expects a control input.
> src/hotspot/share/opto/vectornode.cpp line 1363:
>
>> 1361: if (sop) {
>> 1362: sop = phase->transform(sop);
>> 1363: return new ReplicateNode(sop, vect_type());
>
> Suppose we have an `add` operation on a `byte` vector. The operations are truncated. But the `AddI` operation would overflow the `byte` range. Does the `Replicate` node ensure that truncation?
Yes, replicate will only read lower 8 / 16 bits of integral result.
> src/hotspot/share/opto/vectornode.cpp line 1363:
>
>> 1361: if (sop) {
>> 1362: sop = phase->transform(sop);
>> 1363: return new ReplicateNode(sop, vect_type());
>
> How sure are you that the truncating vector operations are correctly optimized with the non-truncating scalar operations? Is it guaranteed that `Replicate` does the truncation?
Replicate will always pick exact sized value.
> src/hotspot/share/opto/vectornode.cpp line 2104:
>
>> 2102: return VectorNode::degenerate_vector_rotate(in(1), in(2), true, vlen, bt, phase);
>> 2103: }
>> 2104: return VectorNode::Ideal(phase, can_reshape);
>
> Would it not make sense to do the `Ideal` before the possible degeneration? Just in case we can optimize away the rotation, by pushing it through the broadcast?
Reverting for now, we are not handling Rotates currently.
> test/hotspot/jtreg/compiler/vectorapi/TestVectorBroadcastReassociations.java line 60:
>
>> 58: .add(IntVector.broadcast(ISP, ib))
>> 59: .reduceLanes(VectorOperators.ADD);
>> 60: }
>
> Why do you have the `VECTOR_SIZE_ANY`? Do we not get a replicate with the maximal size?
>
> I don't think it is wise to take an ADD reduction here, because that may also produce a ADD_I node. And then you're not sure if the IR rule finds that one or the one generated from the broadcast optimization.
Since I am using SPECIES_PREFERRED" Double and Float vector lane wise operations are 128-bit wide at UseAVX=1 for sake of lane size compatibility with integral vector.
> test/hotspot/jtreg/compiler/vectorapi/TestVectorBroadcastReassociations.java line 136:
>
>> 134: * ======================= */
>> 135:
>> 136: static final VectorSpecies<Long> LSP = LongVector.SPECIES_PREFERRED;
>
> Why don't you have the logical operators for long tested below?
Done
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/25617#discussion_r2851743023
PR Review Comment: https://git.openjdk.org/jdk/pull/25617#discussion_r2851743785
PR Review Comment: https://git.openjdk.org/jdk/pull/25617#discussion_r2851746255
PR Review Comment: https://git.openjdk.org/jdk/pull/25617#discussion_r2851743928
PR Review Comment: https://git.openjdk.org/jdk/pull/25617#discussion_r2851744539
PR Review Comment: https://git.openjdk.org/jdk/pull/25617#discussion_r2851745355
More information about the core-libs-dev
mailing list