RFR: 8356760: VectorAPI: Optimize VectorMask.fromLong for all-true/all-false cases [v3]

Jatin Bhateja jbhateja at openjdk.org
Fri Jul 4 12:03:39 UTC 2025


On Fri, 4 Jul 2025 10:53:55 GMT, erifan <duke at openjdk.org> wrote:

>> src/hotspot/share/opto/vectorIntrinsics.cpp line 707:
>> 
>>> 705:     elem_bt = converted_elem_bt;
>>> 706:     bits = gvn().longcon((bits_type->get_con() & 1L) == 0L ? 0L : -1L);
>>> 707:   } else if (!arch_supports_vector(opc, num_elem, elem_bt, checkFlags, true /*has_scalar_args*/)) {
>> 
>> I think it's appropriate to make this change as part of the VectorLongToMaskNode::Ideal routine, so that the transformation also gets an opportunity to apply during the iterative GVN pass.
>
> Originally I also tried to implement it in IGVN, but later moved it to the intrinsic, for two reasons:
> 
> 1. Implementing in intrinsic is relatively simpler and has better performance because it saves the process of generating `VectorLongToMaskNode`.
> 2. Implementing in intrinsic can support more cases. Some architectures (such as aarch64 `NEON`) currently do not support the generation of `VectorLongToMaskNode`, but do support `MaskAll` or `Replicate` nodes. If implemented in IGVN, this optimization doesn't work for NEON, whereas implementing it in the intrinsic covers such cases.

Hi @erifan,
A few follow-up queries:

> Implementing in intrinsic is relatively simpler and has better performance because it saves the process of generating VectorLongToMaskNode.

What if, during iterative GVN, a constant -1 propagates through the IR graph and becomes the input of a VectorLongToMaskNode? The intrinsic-level check has already run by then, so you won't be able to create a MaskAll true in that case.
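To make the concern concrete, here is a toy, self-contained Java model of the situation. It is not HotSpot code: the node types `Con`, `Param`, `LongToMask` and `MaskAll`, and the helpers `intrinsify`, `substitute` and `ideal`, are hypothetical names invented for this sketch. It only shows that a check done once at intrinsification time misses a constant that appears later, while an Ideal-style rewrite still catches it.

```java
// Toy IR model (hypothetical names, not HotSpot classes).
class IdealSketch {
    interface Node {}
    record Con(long value) implements Node {}        // compile-time constant
    record Param(String name) implements Node {}     // value not yet known
    record LongToMask(Node in) implements Node {}    // stands in for VectorLongToMaskNode
    record MaskAll(boolean set) implements Node {}   // stands in for a MaskAll node

    static boolean isAllOrNone(Node n) {
        return n instanceof Con c && (c.value() == 0L || c.value() == -1L);
    }

    // Intrinsic-time check: runs exactly once, when the call is expanded.
    static Node intrinsify(Node bits) {
        if (isAllOrNone(bits)) {
            return new MaskAll(((Con) bits).value() != 0L);
        }
        return new LongToMask(bits);
    }

    // Models a later phase discovering that a parameter is a constant.
    static Node substitute(Node n, String name, long value) {
        if (n instanceof Param p && p.name().equals(name)) {
            return new Con(value);
        }
        if (n instanceof LongToMask m) {
            return new LongToMask(substitute(m.in(), name, value));
        }
        return n;
    }

    // Ideal-style rewrite: can be re-applied whenever an input changes.
    static Node ideal(Node n) {
        if (n instanceof LongToMask m && isAllOrNone(m.in())) {
            return new MaskAll(((Con) m.in()).value() != 0L);
        }
        return n;
    }
}
```

In this model, `intrinsify(new Param("bits"))` produces a `LongToMask`; once `substitute` turns the parameter into the constant -1, only `ideal` can still turn the node into `MaskAll(true)`.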

> Implementing in intrinsic can support more cases. Because some architectures (such as aarch64 NEON) currently do not support the generation of VectorLongToMaskNode, but support MaskAll or Replicate nodes, if implemented in IGVN, then this optimization doesn't work for NEON. But implementing in Intrinsic can cover such cases.

Do you see any advantage in doing this at the intrinsic layer over handling it entirely in the Java implementation, by simply changing the mode passed to fromBitsCoerced from the existing MODE_BITS_COERCED_LONG_TO_MASK to MODE_BROADCAST for a 0 or -1 input?

https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorMask.java#L243
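A rough sketch of what that Java-level selection could look like, written as a standalone scalar model rather than against the real `VectorSupport` entry point. The class `FromLongModeSketch`, its helpers, and the numeric values given to the two mode constants are assumptions made for illustration only. The broadcast path uses the low bit of the input, mirroring the `(bits & 1L) == 0L ? 0L : -1L` expression in the quoted diff.

```java
import java.util.Arrays;

// Scalar model only; names and constant values are assumptions, not JDK code.
class FromLongModeSketch {
    static final int MODE_BROADCAST = 0;                   // assumed value
    static final int MODE_BITS_COERCED_LONG_TO_MASK = 1;   // assumed value

    // Pick the broadcast mode only for the all-false (0) and all-true (-1) inputs.
    static int chooseMode(long bits) {
        return (bits == 0L || bits == -1L)
                ? MODE_BROADCAST
                : MODE_BITS_COERCED_LONG_TO_MASK;
    }

    // Long-to-mask semantics: lane i is set iff bit i of the input is set.
    static boolean[] lanesFromLong(long bits, int laneCount) {
        boolean[] lanes = new boolean[laneCount];
        for (int i = 0; i < laneCount; i++) {
            lanes[i] = ((bits >>> i) & 1L) != 0L;
        }
        return lanes;
    }

    // Broadcast semantics: every lane takes the value of the low bit.
    static boolean[] lanesFromBroadcast(long bits, int laneCount) {
        boolean[] lanes = new boolean[laneCount];
        Arrays.fill(lanes, (bits & 1L) != 0L);
        return lanes;
    }
}
```

For inputs 0 and -1 the two lane models agree for any lane count up to 64, which is what would make the mode switch safe in this simplified picture; any other input keeps the long-to-mask mode.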

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25793#discussion_r2185179706


More information about the hotspot-compiler-dev mailing list