RFR: 8325520: Vector loads with offsets incorrectly compiled

Thu Apr 25 08:18:30 UTC 2024

On Tue, 23 Apr 2024 10:28:57 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

> The load does not succeed in `MemNode::can_see_stored_value`, because the load's `store_Opcode() == Op_StoreVector`, and not `Op_StoreVectorMasked`. But if you were to implement `LoadVectorMasked::store_Opcode() const { return Op_StoreVectorMasked; }`, then you have to be careful: A masked load does not necessarily return the same as the masked store's input value. That input value is not yet masked, but the loaded value needs to be masked.
> 
> But it seems to me that you can actually never have a successful `MemNode::can_see_stored_value` case for masked operations, with your current code. It would always fail the `store_Opcode() == st->Opcode()` check. And for that gives the correct result, but it is still a bit strange that we don't override the `store_Opcode` for the masked/offset vector stores.
> 
> I don't know which way you want to go now. I these options:
> 
> * Keep disallowing masked load/store "look-throughs".
>   
>   * Do that by having the "incorrect" `store_Opcode` as now. The downside is that the "offset only" case does not manage to do the look-through, even though that would be correct.
>   * OR: have the correct `store_Opcode`, which allows the look-through for the "offset only" case. But then explicitly check for the masked cases, and disallow those.
> * Implement a special look-through, where you apply the mask with some blend/select/masked operation on the store input value, which simulates the masked load (i.e. you need to put zeros where the mask is off).
> 
> Not sure if this is all very clear, feel free to ask.

Yep it is actually very clear, thanks a lot @eme64! Actually your example is with masks, but there is a similar problem with offsets as well. Think of what happens if the offset array is shorter than the length of the array or if there are duplicate indices in the offset array: the result of a load-store or a store-load sequence doesn’t produce the same original array or vector.
So actually to avoid folding in those 2 cases (load->store and store->load) for now I made specific `store_Opcode` implementations that are never the same as a Store opcode, so that the `store_Opcode() == st->Opcode()` and `val->as_Load()->store_Opcode() == Opcode()` is always false (I’ve used the opcode of the `LoadNode` itself but possibly there is a better solution).
As you hint one could also think of a solution involving substituting load/store masked vector nodes with “simple” vector mask nodes and load/store gather/scatter vector nodes with “simple” vector shuffle nodes (if these exist) but I’m not sure how much this would bring and anyway for this issue I would keep the fix simple. What do  you think?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18347#issuecomment-2076627767