RFR: 8282555: Missing memory edge when spilling MoveF2I, MoveD2L etc
Emanuel Peter
duke at openjdk.java.net
Mon May 2 18:44:42 UTC 2022
On Mon, 21 Mar 2022 22:38:39 GMT, Dean Long <dlong at openjdk.org> wrote:
>> Update:
>> After the inputs from @jatin-bhateja, and verifying with @vnkozlov and @TobiHartmann , I have implemented a much simpler fix:
>> Whenever there is no pre-allocated space before the inputs for the memory edge, we simply add the memory edge after the inputs.
>>
>> This is a bit of an ad-hoc fix, but it is much simpler than the other two options. Changing the `.ad` files requires much more work. Adding `stackSlot` to `MatchNode::needs_ideal_memory_edge` would also be an ad-hoc fix.
>>
>> The added test still fails with other changes in mainline, and passes with my new fix. Ran it 50 times to verify.
>> Ran larger test suite, all passed.
>>
>> ---------
>>
>> In `PhaseChaitin::fixup_spills` we decide if we need a memory edge when reading from a spilled register.
>> Unfortunately, for `MoveF2I`, `MoveD2L` etc we do not add such memory edges.
>> This can lead to reversed scheduling, where we read from a `stackSlot` before we wrote to it, leading to wrong results.
>> (This happens intermittently, but the regression test did reproduce it at about a 10% rate)
>>
>> In `PhaseChaitin::fixup_spills` we decided if such a memory edge needs to be added by comparing `oper_input_base()` of the node before spilling and after spilling. If `oper_input_base()` of the `mach` node (before spilling) is 1, this means that node does not have a memory edge yet. And if `oper_input_base()` of the `cisc` node (after spilling) is 2, this means it needs a memory edge. In all spill cases I could find, the value is 1 and 2 respectively, except for MoveF2I etc, there it is 1 and 1 respectively, thus the memory edge was omitted.
>>
>> The values of `oper_input_base()` are determined in `InstructForm::oper_input_base`, where we query `MatchNode::needs_ideal_memory_edge`. This function checks if there is an `_opType` in the recursive match structure of this mach node, that matches one of a list of nodes (`StoreI, StoreF, ... LoadI, StoreF, ... etc.`). Unfortunately, MoveF2I etc do not have such a match. Instead of a `StoreF/LoadF`, they used `stackSlotF` (which is not recognized in `MatchNode::needs_ideal_memory_edge`). So it thinks there is no need for a memory edge.
>>
>> We saw 2 options to fix this issue:
>> 1) add `stackSlotI/L/P/D/F` to `MatchNode::needs_ideal_memory_edge`. However, this seems to be an inconsistent solution. The other items in that list are nodes, `stackSlot` is not. And other operations (like `addI, testI, etc.`) all use `LoadI/StoreI`, which is more generic (for heap and stack).
>> 2) Change the arguments and match rules to not use `stackSlot`, but `memory` arguments and `LoadI/StoreI` nodes. This is a consistent and more generic solution (the MoveF2I operation could now be used not just for stack spilling but also reading/writing from/to memory).
>>
>> I picked option 2.
>> Further, I now assume that we can always add such a memory edge when reading from a spilled register. This assumption did not get violated in my more extensive testing.
>>
>> While the regression test only failed about 10% due to this bug, the assert I added verifying that we add these memory edges did trigger 100% before I applied the fix in x86_64.ad. This means the memory edge was missing every time, just the scheduling varied and we were lucky most of the time.
>>
>> There are a few open points for discussion:
>>
>> - `loadSSI/L/P/F/D` still uses `stackSlot`. I have never observed that this operation gets its register spilled. But I still wonder if we should not have this operation use `memory/LoadI` instead of `stackSlotI`. I think we might even be able to simply remove `loadSSI` because it is already covered by what `loadI` does. (Update: Tests suggest `loadSSX` can be removed from `x86_64.ad`)
>> - So far I have only applied my fix to `x86_64.ad` -> we probably want to apply it to all platforms.
>> - Other platforms use `stackSlot` more often, for example in `x86_32.ad`. It may well be that some of these operation could also be spilled, which would probably also lead to missing memory edges. I wonder if we should maybe remove all occurances of `stackSlot` in the `ad` files, or if we should still add `stackSlot` to `MatchNode::needs_ideal_memory_edge` to ensure we alway can add the memory edges.
>
> I wish I better understood the CISC spilling feature, and why we have special treatment for stack slots. If you can rewrite the rules using Loads, does that mean we can get rid of all special stackSlot and sReg logic?
>
> Does InstructForm::needs_anti_dependence_check() come into play at all?
@dean-long it seems like I can at least remove all `stackSlotX` occurances from `x86_64.ad`, and the tests seem to still pass. That is of course no guarantee, I do not know if the tests sufficiently cover this area of the code.
On other platforms this may be more involved, I don't know if special logic may be necessary.
@dean-long
Investigating where `stackSlot` and `sReg` is named / relevant:
of course in the ad files.
I'm 99% sure they can be removed from `x86_64.ad`.
With the other platforms I'm not sure if they need some special handling. But I think there is a good chance it could work to just use `memory` instead of `stackSlot`.
And then in the rest of the code. Would be a bit of work to remove it all, but if it does not do anything other than what we can handle with other code, then why keep it around?
`./src/hotspot/share/opto/machnode.hpp`
// Parameters needed to support MEMORY_INTERFACE access to stackSlot
virtual int disp (PhaseRegAlloc *ra_, const Node *node, int idx) const;
`MachOper::disp` --> `return 0x00`
only used in `ad_x86_peephole.cpp` , lines 130 loadLNode::peephole and 63 loadINode::peephole .
Ah, maybe this is for the memory offset calculation. The displacement?
`./src/hotspot/share/adlc/archDesc.cpp`
in `ArchDesc::initBaseOpTypes`
// !!!!! Update - when adding a new sReg/stackSlot type
// Create operand types "sReg[IPFDL]" for stack slot registers
opForm = constructOperand("sRegI", false);
opForm->_constraint = new Constraint("ALLOC_IN_RC", "stack_slots");
opForm = constructOperand("sRegP", false);
opForm->_constraint = new Constraint("ALLOC_IN_RC", "stack_slots");
opForm = constructOperand("sRegF", false);
opForm->_constraint = new Constraint("ALLOC_IN_RC", "stack_slots");
opForm = constructOperand("sRegD", false);
opForm->_constraint = new Constraint("ALLOC_IN_RC", "stack_slots");
opForm = constructOperand("sRegL", false);
opForm->_constraint = new Constraint("ALLOC_IN_RC", "stack_slots");
Defining the sReg operand types
`./src/hotspot/share/adlc/output_c.cpp`
in `OutputReduceOp::map`
// operand stackSlot does not have a match rule, but produces a stackSlot
if( oper.is_user_name_for_sReg() != Form::none ) reduce = oper.reduce_result();
could get rid off, is special casing for sReg
else if( _operand->is_user_name_for_sReg() != Form::none ) {
// The only non-constant allowed access to disp is an operand sRegX in a stackSlotX
assert( op->ideal_to_sReg_type(type) != Form::none, "StackSlots access displacements using 'sRegs'");
_may_reloc = false;
} else {
assert( false, "fatal(); Only stackSlots can access a non-constant using 'disp'");
}
also seems special casing for sReg. The assert in the else case would have to say that only constants can be accessed using 'disp'.
`./src/hotspot/share/adlc/formssel.cpp`
`OperandForm::is_user_name_for_sReg`
maps `_ident` from eg `stackSlotI` to ideal types, eg `Form::idealI`
int MatchRule::is_ideal_copy() const {
if (is_chain_rule(_AD.globalNames()) &&
_lChild && strncmp(_lChild->_opType, "stackSlot", 9) == 0) {
return 1;
}
return 0;
}
probably matches all `stackSlotX`. Hmm let's see where this is used, seems specific.
but I think this is just a parallel case to `is_ideal_load` which would apply for `LoadI` etc, where `is_ideal_copy` seems to be for stackSlot.
---- follow up search for `sRegX` :
`./src/hotspot/cpu/x86/x86_64.ad`:`operand stackSlotI(sRegI reg)`
definition of stackSlotX in ad files (same in other ad files)
`./src/hotspot/share/adlc/archDesc.cpp`: `if (strcmp(rootOp,"sRegI")==0) continue;`
in `ArchDesc::inspectOperands `: special casing them out of some logic -> idk understand this does ???
`./src/hotspot/share/adlc/archDesc.cpp`: `opForm = constructOperand("sRegI", false);`
defining the sReg operand types (see above already)
`./src/hotspot/share/adlc/forms.cpp`: `if (strcmp(name,"sRegI")==0) return Form::idealI;`
in `Form::ideal_to_sReg_type` -> see below for usage
------ follow up `Form::ideal_to_sReg_type`
`./src/hotspot/share/adlc/output_c.cpp`
special case, can eliminate -> generates special code, probably other path leads to same result (with is_base_register)
`./src/hotspot/share/adlc/output_h.cpp`:
same
`./src/hotspot/share/adlc/formssel.cpp`:
same
`./src/hotspot/share/adlc/forms.cpp`:
definition
------ follow up `is_user_name_for_sReg`
`./src/hotspot/share/adlc/output_c.cpp`
is all special caseing, I think it can be removed
`./src/hotspot/share/adlc/output_h.cpp`
same (edited)
@dean-long
Of course the question is if we should really remove all this logic and specification in ad files under this bug.
Maybe we can do something minimal now, and file separate RFEs to:
1) remove stackSlot from each platform's ad file -> RFE per platform
2) once stackSlot does not occur in any ad file, we should be able to remove the logic from the code
In this we may have to pay attention if there is a performance regression. Not sure if this could happen in this case, but we'd have to make sure anyway (thanks @TobiHartmann for bringing this up).
-------------
PR: https://git.openjdk.java.net/jdk/pull/7889
More information about the hotspot-compiler-dev
mailing list