RFR: 8282555: Missing memory edge when spilling MoveF2I, MoveD2L etc [v3]
Tobias Hartmann
thartmann at openjdk.java.net
Tue May 3 07:33:14 UTC 2022
On Tue, 3 May 2022 07:30:12 GMT, Emanuel Peter <duke at openjdk.java.net> wrote:
>> Update:
>> After the inputs from @jatin-bhateja, and verifying with @vnkozlov and @TobiHartmann , I have implemented a much simpler fix:
>> Whenever there is no pre-allocated space before the inputs for the memory edge, we simply add the memory edge after the inputs.
>>
>> This is a bit of an ad-hoc fix, but it is much simpler than the other two options. Changing the `.ad` files requires much more work. Adding `stackSlot` to `MatchNode::needs_ideal_memory_edge` would also be an ad-hoc fix.
>>
>> The added test still fails with other changes in mainline, and passes with my new fix. Ran it 50 times to verify.
>> Ran larger test suite, all passed.
>>
>> ---------
>>
>> In `PhaseChaitin::fixup_spills` we decide if we need a memory edge when reading from a spilled register.
>> Unfortunately, for `MoveF2I`, `MoveD2L` etc we do not add such memory edges.
>> This can lead to reversed scheduling, where we read from a `stackSlot` before we wrote to it, leading to wrong results.
>> (This happens intermittently, but the regression test did reproduce it at about a 10% rate)
>>
>> In `PhaseChaitin::fixup_spills` we decided if such a memory edge needs to be added by comparing `oper_input_base()` of the node before spilling and after spilling. If `oper_input_base()` of the `mach` node (before spilling) is 1, this means that node does not have a memory edge yet. And if `oper_input_base()` of the `cisc` node (after spilling) is 2, this means it needs a memory edge. In all spill cases I could find, the value is 1 and 2 respectively, except for MoveF2I etc, there it is 1 and 1 respectively, thus the memory edge was omitted.
>>
>> The values of `oper_input_base()` are determined in `InstructForm::oper_input_base`, where we query `MatchNode::needs_ideal_memory_edge`. This function checks if there is an `_opType` in the recursive match structure of this mach node, that matches one of a list of nodes (`StoreI, StoreF, ... LoadI, StoreF, ... etc.`). Unfortunately, MoveF2I etc do not have such a match. Instead of a `StoreF/LoadF`, they used `stackSlotF` (which is not recognized in `MatchNode::needs_ideal_memory_edge`). So it thinks there is no need for a memory edge.
>>
>> We saw 2 options to fix this issue:
>> 1) add `stackSlotI/L/P/D/F` to `MatchNode::needs_ideal_memory_edge`. However, this seems to be an inconsistent solution. The other items in that list are nodes, `stackSlot` is not. And other operations (like `addI, testI, etc.`) all use `LoadI/StoreI`, which is more generic (for heap and stack).
>> 2) Change the arguments and match rules to not use `stackSlot`, but `memory` arguments and `LoadI/StoreI` nodes. This is a consistent and more generic solution (the MoveF2I operation could now be used not just for stack spilling but also reading/writing from/to memory).
>>
>> I picked option 2.
>> Further, I now assume that we can always add such a memory edge when reading from a spilled register. This assumption did not get violated in my more extensive testing.
>>
>> While the regression test only failed about 10% due to this bug, the assert I added verifying that we add these memory edges did trigger 100% before I applied the fix in x86_64.ad. This means the memory edge was missing every time, just the scheduling varied and we were lucky most of the time.
>>
>> There are a few open points for discussion:
>>
>> - `loadSSI/L/P/F/D` still uses `stackSlot`. I have never observed that this operation gets its register spilled. But I still wonder if we should not have this operation use `memory/LoadI` instead of `stackSlotI`. I think we might even be able to simply remove `loadSSI` because it is already covered by what `loadI` does. (Update: Tests suggest `loadSSX` can be removed from `x86_64.ad`)
>> - So far I have only applied my fix to `x86_64.ad` -> we probably want to apply it to all platforms.
>> - Other platforms use `stackSlot` more often, for example in `x86_32.ad`. It may well be that some of these operation could also be spilled, which would probably also lead to missing memory edges. I wonder if we should maybe remove all occurances of `stackSlot` in the `ad` files, or if we should still add `stackSlot` to `MatchNode::needs_ideal_memory_edge` to ensure we alway can add the memory edges.
>
> Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision:
>
> - Adapted comments again
> - compacted comment, added test without -Djdk.test.lib.random.seed=42
Marked as reviewed by thartmann (Reviewer).
-------------
PR: https://git.openjdk.java.net/jdk/pull/7889
More information about the hotspot-compiler-dev
mailing list