RFR: 8282555: Missing memory edge when spilling MoveF2I, MoveD2L etc
Emanuel Peter
duke at openjdk.java.net
Mon May 2 18:44:48 UTC 2022
On Mon, 21 Mar 2022 11:02:35 GMT, Emanuel Peter <duke at openjdk.java.net> wrote:
> Update:
> After the inputs from @jatin-bhateja, and verifying with @vnkozlov and @TobiHartmann , I have implemented a much simpler fix:
> Whenever there is no pre-allocated space before the inputs for the memory edge, we simply add the memory edge after the inputs.
>
> This is a bit of an ad-hoc fix, but it is much simpler than the other two options. Changing the `.ad` files requires much more work. Adding `stackSlot` to `MatchNode::needs_ideal_memory_edge` would also be an ad-hoc fix.
>
> The added test still fails with other changes in mainline, and passes with my new fix. Ran it 50 times to verify.
> Ran larger test suite, all passed.
>
> ---------
>
> In `PhaseChaitin::fixup_spills` we decide if we need a memory edge when reading from a spilled register.
> Unfortunately, for `MoveF2I`, `MoveD2L` etc we do not add such memory edges.
> This can lead to reversed scheduling, where we read from a `stackSlot` before we wrote to it, leading to wrong results.
> (This happens intermittently, but the regression test did reproduce it at about a 10% rate)
>
> In `PhaseChaitin::fixup_spills` we decided if such a memory edge needs to be added by comparing `oper_input_base()` of the node before spilling and after spilling. If `oper_input_base()` of the `mach` node (before spilling) is 1, this means that node does not have a memory edge yet. And if `oper_input_base()` of the `cisc` node (after spilling) is 2, this means it needs a memory edge. In all spill cases I could find, the value is 1 and 2 respectively, except for MoveF2I etc, there it is 1 and 1 respectively, thus the memory edge was omitted.
>
> The values of `oper_input_base()` are determined in `InstructForm::oper_input_base`, where we query `MatchNode::needs_ideal_memory_edge`. This function checks if there is an `_opType` in the recursive match structure of this mach node, that matches one of a list of nodes (`StoreI, StoreF, ... LoadI, StoreF, ... etc.`). Unfortunately, MoveF2I etc do not have such a match. Instead of a `StoreF/LoadF`, they used `stackSlotF` (which is not recognized in `MatchNode::needs_ideal_memory_edge`). So it thinks there is no need for a memory edge.
>
> We saw 2 options to fix this issue:
> 1) add `stackSlotI/L/P/D/F` to `MatchNode::needs_ideal_memory_edge`. However, this seems to be an inconsistent solution. The other items in that list are nodes, `stackSlot` is not. And other operations (like `addI, testI, etc.`) all use `LoadI/StoreI`, which is more generic (for heap and stack).
> 2) Change the arguments and match rules to not use `stackSlot`, but `memory` arguments and `LoadI/StoreI` nodes. This is a consistent and more generic solution (the MoveF2I operation could now be used not just for stack spilling but also reading/writing from/to memory).
>
> I picked option 2.
> Further, I now assume that we can always add such a memory edge when reading from a spilled register. This assumption did not get violated in my more extensive testing.
>
> While the regression test only failed about 10% due to this bug, the assert I added verifying that we add these memory edges did trigger 100% before I applied the fix in x86_64.ad. This means the memory edge was missing every time, just the scheduling varied and we were lucky most of the time.
>
> There are a few open points for discussion:
>
> - `loadSSI/L/P/F/D` still uses `stackSlot`. I have never observed that this operation gets its register spilled. But I still wonder if we should not have this operation use `memory/LoadI` instead of `stackSlotI`. I think we might even be able to simply remove `loadSSI` because it is already covered by what `loadI` does. (Update: Tests suggest `loadSSX` can be removed from `x86_64.ad`)
> - So far I have only applied my fix to `x86_64.ad` -> we probably want to apply it to all platforms.
> - Other platforms use `stackSlot` more often, for example in `x86_32.ad`. It may well be that some of these operation could also be spilled, which would probably also lead to missing memory edges. I wonder if we should maybe remove all occurances of `stackSlot` in the `ad` files, or if we should still add `stackSlot` to `MatchNode::needs_ideal_memory_edge` to ensure we alway can add the memory edges.
I was out of the office, picking this back up again.
I am not sure I understand what you (@jatin-bhateja) are concretely suggesting. The simplest solution that follows what you are saying is this:
We add an `else` case after
https://github.com/openjdk/jdk/blob/cc598e03de39dd6e8d7e208a69d85b6a9cd0062f/src/hotspot/share/opto/chaitin.cpp#L1727-L1730
with `cisc->add_prec(src);` and a comment saying:
"In some rare cases, there is no space for a memory edge before the inputs. We always need a memory edge from src to cisc, else we might schedule cisc before src, loading from a spill location before storing the spill."
@TobiHartmann thought that maybe you agreed with my "Option 1" above, to explicitly add `stackSlotI/L/P/D/F` to `MatchNode::needs_ideal_memory_edge`. In the cases where we do not have a `Load/Store`, but only `stackSlot`, this would still match and add the required memory edge.
@jatin-bhateja @vnkozlov Both solutions seem ad-hoc to me, they are a bit ugly because they make a special case for a rare occasion. On the other hand, we avoid having to edit all the platform-specific code (which would be required if we were to eliminate `stackSlot` from `MoveF2I` etc, or remove `stackSlot` completely, if even possible).
-------------
PR: https://git.openjdk.java.net/jdk/pull/7889
More information about the hotspot-compiler-dev
mailing list