RFR: 8282555: Missing memory edge when spilling MoveF2I, MoveD2L etc
Dean Long
dlong at openjdk.java.net
Mon May 2 18:44:40 UTC 2022
On Mon, 21 Mar 2022 11:02:35 GMT, Emanuel Peter <duke at openjdk.java.net> wrote:
> Update:
> After the inputs from @jatin-bhateja, and verifying with @vnkozlov and @TobiHartmann , I have implemented a much simpler fix:
> Whenever there is no pre-allocated space before the inputs for the memory edge, we simply add the memory edge after the inputs.
>
> This is a bit of an ad-hoc fix, but it is much simpler than the other two options. Changing the `.ad` files requires much more work. Adding `stackSlot` to `MatchNode::needs_ideal_memory_edge` would also be an ad-hoc fix.
>
> The added test still fails with other changes in mainline, and passes with my new fix. Ran it 50 times to verify.
> Ran larger test suite, all passed.
>
> ---------
>
> In `PhaseChaitin::fixup_spills` we decide if we need a memory edge when reading from a spilled register.
> Unfortunately, for `MoveF2I`, `MoveD2L` etc we do not add such memory edges.
> This can lead to reversed scheduling, where we read from a `stackSlot` before we wrote to it, leading to wrong results.
> (This happens intermittently, but the regression test did reproduce it at about a 10% rate)
>
> In `PhaseChaitin::fixup_spills` we decided if such a memory edge needs to be added by comparing `oper_input_base()` of the node before spilling and after spilling. If `oper_input_base()` of the `mach` node (before spilling) is 1, this means that node does not have a memory edge yet. And if `oper_input_base()` of the `cisc` node (after spilling) is 2, this means it needs a memory edge. In all spill cases I could find, the value is 1 and 2 respectively, except for MoveF2I etc, there it is 1 and 1 respectively, thus the memory edge was omitted.
>
> The values of `oper_input_base()` are determined in `InstructForm::oper_input_base`, where we query `MatchNode::needs_ideal_memory_edge`. This function checks if there is an `_opType` in the recursive match structure of this mach node, that matches one of a list of nodes (`StoreI, StoreF, ... LoadI, StoreF, ... etc.`). Unfortunately, MoveF2I etc do not have such a match. Instead of a `StoreF/LoadF`, they used `stackSlotF` (which is not recognized in `MatchNode::needs_ideal_memory_edge`). So it thinks there is no need for a memory edge.
>
> We saw 2 options to fix this issue:
> 1) add `stackSlotI/L/P/D/F` to `MatchNode::needs_ideal_memory_edge`. However, this seems to be an inconsistent solution. The other items in that list are nodes, `stackSlot` is not. And other operations (like `addI, testI, etc.`) all use `LoadI/StoreI`, which is more generic (for heap and stack).
> 2) Change the arguments and match rules to not use `stackSlot`, but `memory` arguments and `LoadI/StoreI` nodes. This is a consistent and more generic solution (the MoveF2I operation could now be used not just for stack spilling but also reading/writing from/to memory).
>
> I picked option 2.
> Further, I now assume that we can always add such a memory edge when reading from a spilled register. This assumption did not get violated in my more extensive testing.
>
> While the regression test only failed about 10% due to this bug, the assert I added verifying that we add these memory edges did trigger 100% before I applied the fix in x86_64.ad. This means the memory edge was missing every time, just the scheduling varied and we were lucky most of the time.
>
> There are a few open points for discussion:
>
> - `loadSSI/L/P/F/D` still uses `stackSlot`. I have never observed that this operation gets its register spilled. But I still wonder if we should not have this operation use `memory/LoadI` instead of `stackSlotI`. I think we might even be able to simply remove `loadSSI` because it is already covered by what `loadI` does. (Update: Tests suggest `loadSSX` can be removed from `x86_64.ad`)
> - So far I have only applied my fix to `x86_64.ad` -> we probably want to apply it to all platforms.
> - Other platforms use `stackSlot` more often, for example in `x86_32.ad`. It may well be that some of these operation could also be spilled, which would probably also lead to missing memory edges. I wonder if we should maybe remove all occurances of `stackSlot` in the `ad` files, or if we should still add `stackSlot` to `MatchNode::needs_ideal_memory_edge` to ensure we alway can add the memory edges.
I wish I better understood the CISC spilling feature, and why we have special treatment for stack slots. If you can rewrite the rules using Loads, does that mean we can get rid of all special stackSlot and sReg logic?
Does InstructForm::needs_anti_dependence_check() come into play at all?
-------------
PR: https://git.openjdk.java.net/jdk/pull/7889
More information about the hotspot-compiler-dev
mailing list