RFR: 8325520: Vector loads with offsets incorrectly compiled
Emanuel Peter
epeter at openjdk.org
Mon Apr 15 07:27:01 UTC 2024
On Mon, 18 Mar 2024 12:20:34 GMT, Damon Fenacci <dfenacci at openjdk.org> wrote:
> # Issue
> When loading multiple vectors using offsets or masks (e.g. `LongVector::fromArray(LongVector.SPECIES_256, storage, 0, offsets, 0` or `LongVector::fromArray(LongVector.SPECIES_256, storage, 0, longMask)`) there is an error in the C2 compiled code that makes different vectors be treated as equal even though they are not.
>
> # Causes
> On vector-capable platforms, vector loads with masks and offsets (for Long, Integer, Float and Double) create specific nodes in the ideal graph (i.e. `LoadVectorGather`, `LoadVectorMasked`, `LoadVectorGatherMasked`). Vector loads without mask or offsets are mapped as `LoadVector` nodes instead.
> While running GVN loops, we can get to the situation in which there are multiple loads from the same address. If successive loads are deemed to be identical to the first one, they might get folded, which is what happens in the problematic examples of this issue. The check for identity happens in
> https://github.com/openjdk/jdk/blob/481c866df87c693a90a1da20e131e5654b084ddd/src/hotspot/share/opto/memnode.cpp#L1253
> This version of `Identity` is defined in `LoadNode` but it is also the one used by the subclasses `LoadVectorGather`, `LoadVectorMasked` and `LoadVectorGatherMasked`. Although this definition of `Identity` is enough for `LoadVector` nodes it is not sufficient for `LoadVectorGather`, `LoadVectorMasked` and `LoadVectorGatherMasked` ones, as the value being loaded also depends on the offsets and mask (different offsets and/or masks load completely different values). This is the reason why these nodes get folded even if they shouldn't.
>
> # Solution
> `LoadVectorGather`, `LoadVectorMasked` and `LoadVectorGatherMasked` need their own version of the `Identity` method, which specialize `LoadVector::Identity` by restricting the results to nodes that also have the same offsets and masks.
>
> The same issue exists for _StoreVector_ nodes (i.e. `StoreVectorScatter`, `StoreVectorMasked` and `StoreVectorScatterMasked`). So, `Identity` has to be redefined there as well.
>
> Regression tests for all versions of `Load/StoreVectorGather/Masked` have been added too.
Can you please fix the whitespace issues, it is a bit difficult to read now on GitHub ;)
Issues with indentation
Looking at `LoadVectorGatherNode::Identity`, I see that you first call `LoadVectorNode::Identity`, which actually is not defined, so it goes back to `LoadNode::Identity`. So far so good.
This calls `MemNode::can_see_stored_value` to find a corresponding store which writes to the same address. And then it replaces our load with the input value of that store, so we can avoid the load.
It seems that your code would now disallow such a case, because you always check that the `Ideal` node you get back is of the same type as the `this` node. Am I right about that? Is that intended?
I think you checks should also not be done **after** we already create a new node, but ideally before we create the new node. That is the normal pattern I see everywhere. So then you would probably have to dig into `MemNode::can_see_stored_value` and other places, to see where you would need to do your additional checks. There are already some checks for vector-type there, so that would be one intuitive starting-point.
I see that `LoadVectorNode` has no additional slots, but then `LoadVectorMaskedNode` and `LoadVectorGatherMaskedNode` simply do `add_req`. Why not just compare these extra slots (as many as there are) in `LoadVectorNode::Identity`, as well as checking that their `Opcode` is the same? Would that not be simpler and accomplish the same?
src/hotspot/share/opto/vectornode.cpp line 1148:
> 1146: Node* StoreVectorMaskedNode::Identity(PhaseGVN* phase) {
> 1147: Node* value = StoreVectorNode::Identity(phase);
> 1148: if ((value != this) && (value->is_StoreVectorMasked()) && (in(MemNode::ValueIn + 1)->eqv_uncast(value->in(MemNode::ValueIn + 1)))) {
`MemNode::ValueIn + 1` is there no direct value for it? If not: maybe create one?
src/hotspot/share/opto/vectornode.cpp line 1166:
> 1164: if ((value != this) && (value->is_LoadVectorGatherMasked()) &&
> 1165: (in(MemNode::ValueIn)->eqv_uncast(value->in(MemNode::ValueIn))) &&
> 1166: (in(MemNode::OopStore)->eqv_uncast(value->in(MemNode::OopStore)))) {
Suggestion:
if ((value != this) && (value->is_LoadVectorGatherMasked()) &&
(in(MemNode::ValueIn)->eqv_uncast(value->in(MemNode::ValueIn))) &&
(in(MemNode::OopStore)->eqv_uncast(value->in(MemNode::OopStore)))) {
src/hotspot/share/opto/vectornode.cpp line 1176:
> 1174: if ((value != this) && (value->is_StoreVectorScatterMasked()) &&
> 1175: (in(MemNode::OopStore)->eqv_uncast(value->in(MemNode::OopStore))) &&
> 1176: (in(MemNode::OffsetsMask)->eqv_uncast(value->in(MemNode::OffsetsMask)))) {
Suggestion:
if ((value != this) && (value->is_StoreVectorScatterMasked()) &&
(in(MemNode::OopStore)->eqv_uncast(value->in(MemNode::OopStore))) &&
(in(MemNode::OffsetsMask)->eqv_uncast(value->in(MemNode::OffsetsMask)))) {
test/hotspot/jtreg/compiler/vectorapi/VectorGatherMaskFoldingTest.java line 43:
> 41: * @modules jdk.incubator.vector
> 42: *
> 43: * @run main/othervm -XX:UseAVX=3 compiler.vectorapi.VectorLoadGatherFoldingTest
I would also make sure that you don't require platform specific things here. This could also run on any other platform that supports the Vector API, right? And you will need at least one run without any flags.
test/hotspot/jtreg/compiler/vectorapi/VectorGatherMaskFoldingTest.java line 43:
> 41: * @modules jdk.incubator.vector
> 42: *
> 43: * @run main/othervm compiler.vectorapi.VectorGatherMaskFoldingTest
Suggestion:
* @run main compiler.vectorapi.VectorGatherMaskFoldingTest
I think you only need the `othervm` if you also use flags in the `@run` statement.
test/hotspot/jtreg/compiler/vectorapi/VectorGatherMaskFoldingTest.java line 46:
> 44: */
> 45:
> 46: public class VectorLoadGatherFoldingTest {
I assume you will also do a similar test for `Store`?
It would for example be nice to see IR tests that verify that some Stores/Loads are folded together (where it is ok), and others are not folded, because they have different masks.
Would it make sense to make this a IR test?
-------------
PR Review: https://git.openjdk.org/jdk/pull/18347#pullrequestreview-1951243369
Changes requested by epeter (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/18347#pullrequestreview-1962674696
PR Review: https://git.openjdk.org/jdk/pull/18347#pullrequestreview-1962719711
PR Comment: https://git.openjdk.org/jdk/pull/18347#issuecomment-2011673208
PR Review Comment: https://git.openjdk.org/jdk/pull/18347#discussion_r1533442547
PR Review Comment: https://git.openjdk.org/jdk/pull/18347#discussion_r1540697985
PR Review Comment: https://git.openjdk.org/jdk/pull/18347#discussion_r1540697640
PR Review Comment: https://git.openjdk.org/jdk/pull/18347#discussion_r1533465824
PR Review Comment: https://git.openjdk.org/jdk/pull/18347#discussion_r1540692865
PR Review Comment: https://git.openjdk.org/jdk/pull/18347#discussion_r1533464469
More information about the hotspot-compiler-dev
mailing list