RFR: 8319690: [AArch64] C2 compilation hits offset_ok_for_immed: assert "c2 compiler bug" [v2]
Andrew Haley
aph at openjdk.org
Fri Dec 8 08:44:19 UTC 2023
On Thu, 7 Dec 2023 14:52:31 GMT, Andrew Haley <aph at openjdk.org> wrote:
>> Fei Gao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:
>>
>> - Remove unused immIOffset/immLOffset
>> - Merge branch 'master' into fg8319690
>> - 8319690: [AArch64] C2 compilation hits offset_ok_for_immed: assert "c2 compiler bug"
>>
>> On LP64 systems, if the heap can be moved into low virtual
>> address space (below 4GB) and the heap size is smaller than the
>> interesting threshold of 4 GB, we can use unscaled decoding
>> pattern for narrow klass decoding. It means that a generic field
>> reference can be decoded by:
>> ```
>> cast<64> (32-bit compressed reference) + field_offset
>> ```
>>
>> When the `field_offset` is an immediate, on aarch64 platform, the
>> unscaled decoding pattern can match perfectly with a direct
>> addressing mode, i.e., `base_plus_offset`, supported by LDR/STR
>> instructions. But for certain data width, not all immediates can
>> be encoded in the instruction field of LDR/STR[1]. The ranges are
>> different as data widths vary.
>>
>> For example, when we try to load a value of long type at offset of
>> `1030`, the address expression is `(AddP (DecodeN base) 1030)`.
>> Before the patch, the expression was matching with
>> `operand indOffIN()`. But, for 64-bit LDR/STR, signed immediate
>> byte offset must be in the range -256 to 255 or positive immediate
>> byte offset must be a multiple of 8 in the range 0 to 32760[2].
>> `1030` can't be encoded in the instruction field. So, after
>> matching, when we do checking for instruction encoding, the
>> assertion would fail.
>>
>> In this patch, we're going to filter out invalid immediates
>> when deciding if current addressing mode can be matched as
>> `base_plus_offset`. We introduce `indOffIN4/indOffLN4` and
>> `indOffIN8/indOffLN8` for 32-bit data type and 64-bit data
>> type separately in the patch. E.g., for `memory4`, we remove
>> the generic `indOffIN/indOffLN`, which matches wrong unscaled
>> immediate range, and replace them with `indOffIN4/indOffLN4`
>> instead.
>>
>> Since 8-bit and 16-bit LDR/STR instructions also support the
>> unscaled decoding pattern, we add the addressing mode in the
>> lists of `memory1` and `memory2` by introducing
>> `indOffIN1/indOffLN1` and `indOffIN2/indOffLN2`.
>>
>> ...
>
> I think this patch is excessive for the problem and introduces a lot of code dupiication. Maybe it would be simpler, smaller, and faster to check for what we need:
>
>
> diff --git a/src/hotspot/cpu/aarch64/aarch64.ad b/src/hotspot/cpu/aarch64/aarch64.ad
> index 233f9b6af7c..ea842912ce9 100644
> --- a/src/hotspot/cpu/aarch64/aarch64.ad
> +++ b/src/hotspot/cpu/aarch64/aarch64.ad
> @@ -5911,7 +5911,8 @@ operand indIndexN(iRegN reg, iRegL lreg)
>
> operand indOffIN(iRegN reg, immIOffset off)
> %{
> - predicate(CompressedOops::shift() == 0);
> + predicate(CompressedOops::shift() == 0
> + && Address::offset_ok_for_immed(n->in(3)->find_int_con(min_jint), exact_log2(sizeof(jint))));
> constraint(ALLOC_IN_RC(ptr_reg));
> match(AddP (DecodeN reg) off);
> op_cost(0);
> @@ -5926,7 +5927,8 @@ operand indOffIN(iRegN reg, immIOffset off)
>
> operand indOffLN(iRegN reg, immLoffset off)
> %{
> - predicate(CompressedOops::shift() == 0);
> + predicate(CompressedOops::shift() == 0
> + && Address::offset_ok_for_immed(n->in(3)->find_long_con(min_jint), exact_log2(sizeof(jlong))));
> constraint(ALLOC_IN_RC(ptr_reg));
> match(AddP (DecodeN reg) off);
> op_cost(0);
> @theRealAph , your patch only works if when `indOffIN` is used in `memory4` and `indOffLN` is used in `memory8`, right? Introducing new operands like `indOffIN4` is consistent with how the code currently works with `indOffI4`.
Yes, it is, but clearly that does not scale, leading to a great profusion of operand kinds.
> In fact I think the new `indOffIN<size>` could be folded into the existing `indOffI<size>` by using multiple `match` lines and a better predicate.
Sure, that would be better still. Best of all would IMO be for each kind of memory access to check its offset operand by calling `ok_for_immed`, but let's see what folding and the use of better predicates does.
-------------
PR Comment: https://git.openjdk.org/jdk/pull/16991#issuecomment-1846776665
More information about the hotspot-compiler-dev
mailing list