RFR: 8286941: Add mask IR for partial vector operations for ARM SVE [v9]

Xiaohong Gong xgong at openjdk.org
Fri Nov 28 07:41:09 UTC 2025


On Fri, 28 Nov 2025 07:19:45 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> @XiaohongGong Yes, I was able to find a simple reproducer.
>> 
>> 
>> // java -Xbatch -XX:CompileCommand=compileonly,Test*::test -XX:CompileCommand=printcompilation,Test*::test -XX:+PrintIdeal TestOptimizeLoadVector.java
>> 
>> import jdk.incubator.vector.VectorSpecies;
>> import jdk.incubator.vector.IntVector;
>> 
>> public class Test1 {
>> 
>>     static final VectorSpecies<Integer> SPECIES =
>>                 IntVector.SPECIES_256;
>> 
>>     static void test(int[] a) {
>>         // The LOAD below can be optimized away, and be replaced by the value of v1:
>>         // LoadVectorNode::Ideal calls LoadNode::Ideal, which looks at the memory
>>         // input and skips and independent stores, finding a store that matches the
>>         // exact location. And this store stores the value of v1, so we can replace
>>         // the LOAD, and just use v1 directly. Hence, the example below should have
>>         // Only a single load, and 3 stores.
>>         // HOWEVER: if we somehow exit too early in LoadVectorNode::Ideal, we may
>>         // never reach LoadNode::Ideal and miss the optimization.
>>         // This happens on aarch64 SVE with 256bits, when we return true for
>>         // Matcher::vector_needs_partial_operations, but then do nothing when calling
>>         // VectorNode::try_to_gen_masked_vector. We just return nullptr instantly,
>>         // rather than trying the other optimizations that LoadNode::Ideal has to
>>         // offer.
>>         IntVector v1 = IntVector.fromArray(SPECIES, a, 0 * SPECIES.length());
>>         v1.intoArray(a, 1 * SPECIES.length()); // STORE of v1
>>         v1.intoArray(a, 2 * SPECIES.length()); // independent STORE - no overlap with STORE above and LOAD below.
>>         IntVector v2 = IntVector.fromArray(SPECIES, a, 1 * SPECIES.length()); // LOAD - is it replaced with v1?
>>         v2.intoArray(a, 3 * SPECIES.length());
>>     }
>> 
>>     public static void main(String[] args) {
>>         int[] a = new int[1000];
>>         for (int i = 0; i < 10_000; i++) {
>>             test(a);
>> 	}
>>     }
>> }
>> 
>> 
>> I'll see if we can do similar things for the other cases.
>
> We can continue the conversation in:
> https://bugs.openjdk.org/browse/JDK-8371603

> Yes, exactly. I think instead of:
> 
> ```
> return VectorNode::try_to_gen_masked_vector(phase, this, vt);
> ```
> 
> we should do
> 
> ```
> Node* progress = VectorNode::try_to_gen_masked_vector(phase, this, vt);
> if (progress != nullptr) { return progress; }
> ```
> 
> That should be correct, right?

Yes, I think so.

> Of course the naming here is a bit confusing, and suggests that this may not be correct. Because `vector_needs_partial_operations` would suggest we _always_ need to do partial operations. And so then we would expect that `try_to_gen_masked_vector` would have to _always_ succeed. And so maybe that is why the reviewers did not think that we should continue with `LoadNode::Ideal` if it fails, I suppose? So I think the names should be changed to `maybe_vector_needs_partial_operations` and `transform_to_partial_vector_if_needed`, or similar. What do you think?

`vector_needs_partial_operations` is used to check whether we need consider the partial vector lanes for the specified op. The partial vector lanes means the vector size of the op is lower than the MaxVectorSize, hence the higher lanes should be ignored. 

I agree with your point.  I'v created a JBS to fix it and will create a patch soon. Thank you so much for the debugging, testing and any input!

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/9037#discussion_r2570704667


More information about the hotspot-compiler-dev mailing list