RFR: 8286941: Add mask IR for partial vector operations for ARM SVE [v9]

Emanuel Peter epeter at openjdk.org
Fri Nov 28 07:23:13 UTC 2025


On Fri, 28 Nov 2025 01:33:44 GMT, Xiaohong Gong <xgong at openjdk.org> wrote:

>> src/hotspot/share/opto/vectornode.cpp line 996:
>> 
>>> 994:   }
>>> 995:   return LoadNode::Ideal(phase, can_reshape);
>>> 996: }
>> 
>> @XiaohongGong Extremely late review 😉 
>> 
>> Does this not prevent us from doing the `LoadNode::Ideal` optimizations for the cases where `vector_needs_partial_operations` returns true?
>> 
>> See also: https://bugs.openjdk.org/browse/JDK-8371603
>
> If `vector_needs_partial_operations` returns true, then the original `LoadVectorNode` is either transformed to a `LoadVectorMaskedNode` or `nullptr` is returned. So it seems `LoadNode::Ideal` is not called if `try_to_gen_masked_vector` returns `nullptr`, and some optimizations are missed? That would be an issue.

@XiaohongGong Yes, I was able to find a simple reproducer.


// java --add-modules=jdk.incubator.vector -Xbatch -XX:CompileCommand=compileonly,Test*::test -XX:CompileCommand=printcompilation,Test*::test -XX:+PrintIdeal TestOptimizeLoadVector.java

import jdk.incubator.vector.VectorSpecies;
import jdk.incubator.vector.IntVector;

public class Test1 {

    static final VectorSpecies<Integer> SPECIES =
                IntVector.SPECIES_256;

    static void test(int[] a) {
        // The LOAD below can be optimized away, and be replaced by the value of v1:
        // LoadVectorNode::Ideal calls LoadNode::Ideal, which looks at the memory
        // input and skips any independent stores, finding a store that matches the
        // exact location. And this store stores the value of v1, so we can replace
        // the LOAD, and just use v1 directly. Hence, the example below should have
        // only a single load, and 3 stores.
        // HOWEVER: if we somehow exit too early in LoadVectorNode::Ideal, we may
        // never reach LoadNode::Ideal and miss the optimization.
        // This happens on aarch64 SVE with 256 bits, when we return true for
        // Matcher::vector_needs_partial_operations, but then do nothing when calling
        // VectorNode::try_to_gen_masked_vector. We just return nullptr instantly,
        // rather than trying the other optimizations that LoadNode::Ideal has to
        // offer.
        IntVector v1 = IntVector.fromArray(SPECIES, a, 0 * SPECIES.length());
        v1.intoArray(a, 1 * SPECIES.length()); // STORE of v1
        v1.intoArray(a, 2 * SPECIES.length()); // independent STORE - no overlap with STORE above and LOAD below.
        IntVector v2 = IntVector.fromArray(SPECIES, a, 1 * SPECIES.length()); // LOAD - is it replaced with v1?
        v2.intoArray(a, 3 * SPECIES.length());
    }

    public static void main(String[] args) {
        int[] a = new int[1000];
        for (int i = 0; i < 10_000; i++) {
            test(a);
        }
    }
}
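
For illustration only, here is a toy model in plain Java of the memory-chain walk described in the comment in the reproducer. This is not HotSpot code: `ForwardingSketch`, `Store`, and `findForwardedValue` are hypothetical names, and the real logic in `LoadNode::Ideal` is considerably more involved. The sketch assumes stores are described by an offset and a length in ints, and it only shows the idea of skipping independent stores until a store to the exact location is found.

```java
import java.util.ArrayList;
import java.util.List;

public class ForwardingSketch {
    // Hypothetical stand-in for a vector store on the memory chain:
    // it writes `length` ints starting at `offset`, and `value` names
    // the vector that is stored.
    static final class Store {
        final int offset;
        final int length;
        final String value;

        Store(int offset, int length, String value) {
            this.offset = offset;
            this.length = length;
            this.value = value;
        }
    }

    // Walk the chain from the newest store to the oldest. Skip stores that
    // do not overlap the load, forward the stored value on an exact match,
    // and give up (return null) on a partial overlap.
    static String findForwardedValue(List<Store> chain, int offset, int length) {
        for (int i = chain.size() - 1; i >= 0; i--) {
            Store s = chain.get(i);
            boolean disjoint = s.offset + s.length <= offset
                            || offset + length <= s.offset;
            if (disjoint) {
                continue; // independent store, look further up the chain
            }
            if (s.offset == offset && s.length == length) {
                return s.value; // exact match, the load can be replaced
            }
            return null; // partial overlap, no forwarding in this sketch
        }
        return null; // no matching store found
    }

    public static void main(String[] args) {
        int len = 8; // number of int lanes in a 256-bit vector
        List<Store> chain = new ArrayList<>();
        chain.add(new Store(1 * len, len, "v1")); // STORE of v1
        chain.add(new Store(2 * len, len, "v1")); // independent STORE
        // LOAD at 1 * len: skips the independent store, matches the first one.
        System.out.println(findForwardedValue(chain, 1 * len, len)); // prints "v1"
    }
}
```

If the walk is never started, as happens when we return `nullptr` before reaching `LoadNode::Ideal`, the LOAD simply stays in the graph.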


I'll see if we can do similar things for the other cases.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/9037#discussion_r2570673883


More information about the hotspot-compiler-dev mailing list