RFR: 8282555: Missing memory edge when spilling MoveF2I, MoveD2L etc [v5]
Emanuel Peter
duke at openjdk.java.net
Wed May 4 07:34:02 UTC 2022
> Update:
> After the inputs from @jatin-bhateja, and verifying with @vnkozlov and @TobiHartmann , I have implemented a much simpler fix:
> Whenever there is no pre-allocated space before the inputs for the memory edge, we simply add the memory edge after the inputs.
>
> This is a bit of an ad-hoc fix, but it is much simpler than the other two options. Changing the `.ad` files requires much more work. Adding `stackSlot` to `MatchNode::needs_ideal_memory_edge` would also be an ad-hoc fix.
>
> The added test still fails with other changes in mainline, and passes with my new fix. Ran it 50 times to verify.
> Ran larger test suite, all passed.
>
> ---------
>
> In `PhaseChaitin::fixup_spills` we decide if we need a memory edge when reading from a spilled register.
> Unfortunately, for `MoveF2I`, `MoveD2L` etc we do not add such memory edges.
> This can lead to reversed scheduling, where we read from a `stackSlot` before we wrote to it, leading to wrong results.
> (This happens intermittently, but the regression test did reproduce it at about a 10% rate)
>
> In `PhaseChaitin::fixup_spills` we decided if such a memory edge needs to be added by comparing `oper_input_base()` of the node before spilling and after spilling. If `oper_input_base()` of the `mach` node (before spilling) is 1, this means that node does not have a memory edge yet. And if `oper_input_base()` of the `cisc` node (after spilling) is 2, this means it needs a memory edge. In all spill cases I could find, the value is 1 and 2 respectively, except for MoveF2I etc, there it is 1 and 1 respectively, thus the memory edge was omitted.
>
> The values of `oper_input_base()` are determined in `InstructForm::oper_input_base`, where we query `MatchNode::needs_ideal_memory_edge`. This function checks if there is an `_opType` in the recursive match structure of this mach node, that matches one of a list of nodes (`StoreI, StoreF, ... LoadI, StoreF, ... etc.`). Unfortunately, MoveF2I etc do not have such a match. Instead of a `StoreF/LoadF`, they used `stackSlotF` (which is not recognized in `MatchNode::needs_ideal_memory_edge`). So it thinks there is no need for a memory edge.
>
> We saw 2 options to fix this issue:
> 1) add `stackSlotI/L/P/D/F` to `MatchNode::needs_ideal_memory_edge`. However, this seems to be an inconsistent solution. The other items in that list are nodes, `stackSlot` is not. And other operations (like `addI, testI, etc.`) all use `LoadI/StoreI`, which is more generic (for heap and stack).
> 2) Change the arguments and match rules to not use `stackSlot`, but `memory` arguments and `LoadI/StoreI` nodes. This is a consistent and more generic solution (the MoveF2I operation could now be used not just for stack spilling but also reading/writing from/to memory).
>
> I picked option 2.
> Further, I now assume that we can always add such a memory edge when reading from a spilled register. This assumption did not get violated in my more extensive testing.
>
> While the regression test only failed about 10% due to this bug, the assert I added verifying that we add these memory edges did trigger 100% before I applied the fix in x86_64.ad. This means the memory edge was missing every time, just the scheduling varied and we were lucky most of the time.
>
> There are a few open points for discussion:
>
> - `loadSSI/L/P/F/D` still uses `stackSlot`. I have never observed that this operation gets its register spilled. But I still wonder if we should not have this operation use `memory/LoadI` instead of `stackSlotI`. I think we might even be able to simply remove `loadSSI` because it is already covered by what `loadI` does. (Update: Tests suggest `loadSSX` can be removed from `x86_64.ad`)
> - So far I have only applied my fix to `x86_64.ad` -> we probably want to apply it to all platforms.
> - Other platforms use `stackSlot` more often, for example in `x86_32.ad`. It may well be that some of these operation could also be spilled, which would probably also lead to missing memory edges. I wonder if we should maybe remove all occurances of `stackSlot` in the `ad` files, or if we should still add `stackSlot` to `MatchNode::needs_ideal_memory_edge` to ensure we alway can add the memory edges.
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
compare ptr with nullptr, not zero
-------------
Changes:
- all: https://git.openjdk.java.net/jdk/pull/7889/files
- new: https://git.openjdk.java.net/jdk/pull/7889/files/664b63f1..2a7ebce6
Webrevs:
- full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7889&range=04
- incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7889&range=03-04
Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
Patch: https://git.openjdk.java.net/jdk/pull/7889.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/7889/head:pull/7889
PR: https://git.openjdk.java.net/jdk/pull/7889
More information about the hotspot-compiler-dev
mailing list