RFR: 8325520: Vector loads with offsets incorrectly compiled

Emanuel Peter epeter at openjdk.org
Tue Apr 23 10:31:31 UTC 2024


On Mon, 18 Mar 2024 12:20:34 GMT, Damon Fenacci <dfenacci at openjdk.org> wrote:

> # Issue
> When loading multiple vectors using offsets or masks (e.g. `LongVector::fromArray(LongVector.SPECIES_256, storage, 0, offsets, 0` or `LongVector::fromArray(LongVector.SPECIES_256, storage, 0, longMask)`) there is an error in the C2 compiled code that makes different vectors be treated as equal even though they are not.
> 
> # Causes
> On vector-capable platforms, vector loads with masks and offsets (for Long, Integer, Float and Double) create specific nodes in the ideal graph (i.e. `LoadVectorGather`, `LoadVectorMasked`, `LoadVectorGatherMasked`). Vector loads without mask or offsets are mapped as `LoadVector` nodes instead.
> While running GVN loops, we can get to the situation in which there are multiple loads from the same address. If successive loads are deemed to be identical to the first one, they might get folded, which is what happens in the problematic examples of this issue. The check for identity happens in
> https://github.com/openjdk/jdk/blob/481c866df87c693a90a1da20e131e5654b084ddd/src/hotspot/share/opto/memnode.cpp#L1253
> This version of `Identity` is defined in `LoadNode` but it is also the one used by the subclasses `LoadVectorGather`, `LoadVectorMasked` and `LoadVectorGatherMasked`. Although this definition of `Identity` is enough for `LoadVector` nodes it is not sufficient for `LoadVectorGather`, `LoadVectorMasked` and `LoadVectorGatherMasked` ones, as the value being loaded also depends on the offsets and mask (different offsets and/or masks load completely different values). This is the reason why these nodes get folded even if they shouldn't.
> 
> # Solution
> `LoadVectorGather`, `LoadVectorMasked` and `LoadVectorGatherMasked` need their own version of the `Identity` method, which specialize `LoadVector::Identity` by restricting the results to nodes that also have the same offsets and masks.
> 
> The same issue exists for _StoreVector_ nodes (i.e. `StoreVectorScatter`, `StoreVectorMasked` and `StoreVectorScatterMasked`). So, `Identity` has to be redefined there as well.
> 
> Regression tests for all versions of `Load/StoreVectorGather/Masked` have been added too.

I've randomly thought about this change again (while trying to sleep yesterday...). And started worrying about this kind of case:


import jdk.incubator.vector.*;

public class Test {
    private static final VectorSpecies<Integer> I_SPECIES = IntVector.SPECIES_MAX;

    public static void main(String[] args) {
        // create mask
        boolean[] intMask = new boolean[I_SPECIES.length()];
        for (int i = 0; i < intMask.length; i++) {
            intMask[i] = (i % 2 == 0);
        }
        int[] intArray1 = new int[I_SPECIES.length()];
        int[] intArray2 = new int[I_SPECIES.length()];
        int[] intArray3 = new int[I_SPECIES.length()];
        for (int i = 0; i < intArray1.length; i++) {
            intArray1[i] = i;
            intArray2[i] = -2 * i;
            intArray3[i] = 0;
        }

        for (int i = 0; i < 10_000; i++) {
            test1(intMask, intArray1, intArray2, intArray3);
        }

        for (int i = 0; i < intArray1.length; i++) {
            System.out.println("i: " + i + " " + intArray1[i] + " " +intArray2[i] + " " + intArray3[i]);
        }
    }

    static void test1(boolean[] intMask, int[] intArray1, int[] intArray2, int[] intArray3) {
        VectorMask<Integer> intVectorMask = VectorMask.fromArray(I_SPECIES, intMask, 0);

        // Load values:    0   1   2   3   4   5 ...
        IntVector a = IntVector.fromArray(I_SPECIES, intArray1, 0);

        // Store, but only every second value:
	// Store:          0   x   2   x   4   x ...
	// Already there:  0  -2  -4  -6  -8 -10 ...
	// Result:         0  -2   2  -6   4 -10 ...
        a.intoArray(intArray2, 0, intVectorMask);

        // Load, but we cannot just take the value from vector a, since we mask it, and where the mask is
	// off, we must have zero.
	// Load:           0   0   2   0   4   0 ...
        IntVector b = IntVector.fromArray(I_SPECIES, intArray2, 0, intVectorMask);
        b.intoArray(intArray3, 0);
    }
}


This is not a bug yet, I think, but it could be if you do the change I suggested with `store_Opcode()`.
Currently, if we have:

 2035  StoreVectorMasked  === 934 7 2031 1979 1956  |893  [[ 2034 2041 2066 ]]  @int[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact[0] *, idx=7; mismatched  Memory: @int[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact[0] *, idx=7; !jvms: IntVector::intoArray0Template @ bci:50 (line 3556) IntMaxVector::intoArray0 @ bci:9 (line 911) IntVector::intoArray @ bci:53 (line 3269) Test::test1 @ bci:26 (line 34)
 2041  LoadVectorMasked  === 934 2035 2031 1956  |893  [[ 1767 2066 1750 ]]  @int[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact[0] *, idx=7; mismatched #vectorz[16]:{int} !jvms: IntVector::fromArray0Template @ bci:53 (line 3455) IntMaxVector::fromArray0 @ bci:11 (line 874) IntVector::fromArray @ bci:32 (line 2986) Test::test1 @ bci:36 (line 35)

The load does not succeed in  `MemNode::can_see_stored_value`, because the load's `store_Opcode() == Op_StoreVector`, and not `Op_StoreVectorMasked`. But if you were to implement `LoadVectorMasked::store_Opcode() const { return Op_StoreVectorMasked; }`, then you have to be careful:
A masked load does not necessarily return the same as the masked store's input value. That input value is not yet masked, but the loaded value needs to be masked.

But it seems to me that you can actually never have a successful `MemNode::can_see_stored_value` case for masked operations, with your current code. It would always fail the `store_Opcode() == st->Opcode()` check. And for that gives the correct result, but it is still a bit strange that we don't override the `store_Opcode` for the masked/offset vector stores.

I don't know which way you want to go now. I these options:
- Keep disallowing masked load/store "look-throughs".
  - Do that by having the "incorrect" `store_Opcode` as now. The downside is that the "offset only" case does not manage to do the look-through, even though that would be correct.
  - OR: have the correct `store_Opcode`, which allows the look-through for the "offset only" case. But then explicitly check for the masked cases, and disallow those.
- Implement a special look-through, where you apply the mask with some blend/select/masked operation on the store input value, which simulates the masked load (i.e. you need to put zeros where the mask is off).

Not sure if this is all very clear, feel free to ask.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18347#issuecomment-2071956396


More information about the hotspot-compiler-dev mailing list