RFR: 8286941: Add mask IR for partial vector operations for ARM SVE [v9]

Emanuel Peter epeter at openjdk.org
Fri Nov 28 07:23:14 UTC 2025


On Fri, 28 Nov 2025 07:19:15 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> If `vector_needs_partial_operations` returns true, then the original `LoadVectorNode` is either transformed to a `LoadVectorMaskedNode` or `nullptr`. So it seems `LoadNode::Ideal` is not called if `try_to_gen_masked_vector` returns `nullptr` and some optmizations are missing? That would be an issue.
>
> @XiaohongGong Yes, I was able to find a simple reproducer.
> 
> 
> // java -Xbatch -XX:CompileCommand=compileonly,Test*::test -XX:CompileCommand=printcompilation,Test*::test -XX:+PrintIdeal TestOptimizeLoadVector.java
> 
> import jdk.incubator.vector.VectorSpecies;
> import jdk.incubator.vector.IntVector;
> 
> public class Test1 {
> 
>     static final VectorSpecies<Integer> SPECIES =
>                 IntVector.SPECIES_256;
> 
>     static void test(int[] a) {
>         // The LOAD below can be optimized away, and be replaced by the value of v1:
>         // LoadVectorNode::Ideal calls LoadNode::Ideal, which looks at the memory
>         // input and skips and independent stores, finding a store that matches the
>         // exact location. And this store stores the value of v1, so we can replace
>         // the LOAD, and just use v1 directly. Hence, the example below should have
>         // Only a single load, and 3 stores.
>         // HOWEVER: if we somehow exit too early in LoadVectorNode::Ideal, we may
>         // never reach LoadNode::Ideal and miss the optimization.
>         // This happens on aarch64 SVE with 256bits, when we return true for
>         // Matcher::vector_needs_partial_operations, but then do nothing when calling
>         // VectorNode::try_to_gen_masked_vector. We just return nullptr instantly,
>         // rather than trying the other optimizations that LoadNode::Ideal has to
>         // offer.
>         IntVector v1 = IntVector.fromArray(SPECIES, a, 0 * SPECIES.length());
>         v1.intoArray(a, 1 * SPECIES.length()); // STORE of v1
>         v1.intoArray(a, 2 * SPECIES.length()); // independent STORE - no overlap with STORE above and LOAD below.
>         IntVector v2 = IntVector.fromArray(SPECIES, a, 1 * SPECIES.length()); // LOAD - is it replaced with v1?
>         v2.intoArray(a, 3 * SPECIES.length());
>     }
> 
>     public static void main(String[] args) {
>         int[] a = new int[1000];
>         for (int i = 0; i < 10_000; i++) {
>             test(a);
> 	}
>     }
> }
> 
> 
> I'll see if we can do similar things for the other cases.

We can continue the conversation in:
https://bugs.openjdk.org/browse/JDK-8371603

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/9037#discussion_r2570674757


More information about the hotspot-compiler-dev mailing list