RFR: 8319690: [AArch64] C2 compilation hits offset_ok_for_immed: assert "c2 compiler bug" [v3]
Fei Gao
fgao at openjdk.org
Fri May 31 15:17:13 UTC 2024
On Wed, 29 May 2024 18:41:40 GMT, Dean Long <dlong at openjdk.org> wrote:
>> Fei Gao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision:
>>
>> - Add the assertion back and merge matchrules with a better predicate
>> - Merge branch 'master' into fg8319690
>> - Remove unused immIOffset/immLOffset
>> - Merge branch 'master' into fg8319690
>> - 8319690: [AArch64] C2 compilation hits offset_ok_for_immed: assert "c2 compiler bug"
>>
>> On LP64 systems, if the heap can be moved into low virtual
>> address space (below 4GB) and the heap size is smaller than the
>> interesting threshold of 4 GB, we can use unscaled decoding
>> pattern for narrow klass decoding. It means that a generic field
>> reference can be decoded by:
>> ```
>> cast<64> (32-bit compressed reference) + field_offset
>> ```
>>
>> When the `field_offset` is an immediate, on aarch64 platform, the
>> unscaled decoding pattern can match perfectly with a direct
>> addressing mode, i.e., `base_plus_offset`, supported by LDR/STR
>> instructions. But for certain data width, not all immediates can
>> be encoded in the instruction field of LDR/STR[1]. The ranges are
>> different as data widths vary.
>>
>> For example, when we try to load a value of long type at offset of
>> `1030`, the address expression is `(AddP (DecodeN base) 1030)`.
>> Before the patch, the expression was matching with
>> `operand indOffIN()`. But, for 64-bit LDR/STR, signed immediate
>> byte offset must be in the range -256 to 255 or positive immediate
>> byte offset must be a multiple of 8 in the range 0 to 32760[2].
>> `1030` can't be encoded in the instruction field. So, after
>> matching, when we do checking for instruction encoding, the
>> assertion would fail.
>>
>> In this patch, we're going to filter out invalid immediates
>> when deciding if current addressing mode can be matched as
>> `base_plus_offset`. We introduce `indOffIN4/indOffLN4` and
>> `indOffIN8/indOffLN8` for 32-bit data type and 64-bit data
>> type separately in the patch. E.g., for `memory4`, we remove
>> the generic `indOffIN/indOffLN`, which matches wrong unscaled
>> immediate range, and replace them with `indOffIN4/indOffLN4`
>> instead.
>>
>> Since 8-bit and 16-bit LDR/STR instructions also support the
>> unscaled decoding pattern, we add the addressing mode in the
>>...
>
> src/hotspot/cpu/aarch64/aarch64.ad line 5193:
>
>> 5191: constraint(ALLOC_IN_RC(ptr_reg));
>> 5192: match(AddP reg off);
>> 5193: match(AddP (DecodeN regn) off);
>
> I'm surprised this works. If we match on "DecodeN regn", is it really safe to use $reg instead?
Thanks for your review, @dean-long .
Yes, based on the current implementation of our ADL compiler, even if we match on "DecodeN regn", using `$reg` is safe and perhaps even must.
When ADLC is parsing operand interface from `indOffIX`, it always fetches useful information from the **first** match rule `match(AddP reg off)` and does not care about others, even though we have multiple match rules.
See https://github.com/openjdk/jdk/blob/1e04ee6d57d5fe84e1d202b16e8d13dc13c002ff/src/hotspot/share/adlc/formssel.cpp#L2461 and https://github.com/openjdk/jdk/blob/1e04ee6d57d5fe84e1d202b16e8d13dc13c002ff/src/hotspot/share/adlc/output_c.cpp#L3025.
It searches `reg` in `match(AddP reg off);` and finds that `reg` is the `first` one in all components, which is like `regn` is the `first` in `match(AddP (DecodeN regn) off);`. Then it concludes that the **first** operand starting from `oper_input_base()` is the base address input. In the stage of `emit()`, the node structure has been reduced into like:
Load === ctrl mem reg val
Load === ctrl mem regn val
`off` is saved on Operand field.
The final JVM code will be shown as:
void loadLNode::emit(C2_MacroAssembler* masm, PhaseRegAlloc* ra_) const {
// Start at oper_input_base() and count operands
unsigned idx0 = 2;
unsigned idx1 = 2; // mem
{
#line 2914 "/home/feigao02/chelsea/jdk_src/src/hotspot/cpu/aarch64/aarch64.ad"
Register dst_reg = as_Register(opnd_array(0)->reg(ra_,this)/* dst */);
loadStore(masm, &MacroAssembler::ldr, dst_reg, opnd_array(1)->opcode(),
as_Register(opnd_array(1)->base(ra_,this,idx1)), opnd_array(1)->index(ra_,this,idx1), opnd_array(1)->scale(), opnd_array(1)->disp(ra_,this,idx1), 8);
#line 999999
}
}
virtual int base(PhaseRegAlloc *ra_, const Node *node, int idx) const {
// Replacement variable: reg
return (int)ra_->get_encode(node->in(idx));
}
To be honest, `$reg` here is a little confusing but, IMO, it may represent a relative index. WDYT? Thanks.
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/16991#discussion_r1622527758
More information about the hotspot-compiler-dev
mailing list