RFR: 8319690: [AArch64] C2 compilation hits offset_ok_for_immed: assert "c2 compiler bug" [v2]
Andrew Haley
aph at openjdk.org
Thu Dec 7 14:55:02 UTC 2023
On Thu, 7 Dec 2023 06:42:49 GMT, Fei Gao <fgao at openjdk.org> wrote:
>> On LP64 systems, if the heap can be moved into low virtual address space (below 4GB) and the heap size is smaller than the interesting threshold of 4 GB, we can use unscaled decoding pattern for narrow klass decoding. It means that a generic field reference can be decoded by:
>>
>> cast<64> (32-bit compressed reference) + field_offset
>>
>>
>> When the `field_offset` is an immediate, on aarch64 platform, the unscaled decoding pattern can match perfectly with a direct addressing mode, i.e., `base_plus_offset`, supported by `LDR/STR` instructions. But for certain data width, not all immediates can be encoded in the instruction field of `LDR/STR` [[1]](https://github.com/openjdk/jdk/blob/8db7bad992a0f31de9c7e00c2657c18670539102/src/hotspot/cpu/aarch64/assembler_aarch64.inline.hpp#L33). The ranges are different as data widths vary.
>>
>> For example, when we try to load a value of long type at offset of `1030`, the address expression is `(AddP (DecodeN base) 1030)`. Before the patch, the expression was matching with `operand indOffIN()`. But, for 64-bit `LDR/STR`, signed immediate byte offset must be in the range -256 to 255 or positive immediate byte offset must be a multiple of 8 in the range 0 to 32760 [[2]](https://developer.arm.com/documentation/ddi0602/2023-09/Base-Instructions/LDR--immediate---Load-Register--immediate--?lang=en). `1030` can't be encoded in the instruction field. So, after matching, when we do checking for instruction encoding, the assertion would fail.
>>
>> In this patch, we're going to filter out invalid immediates when deciding if current addressing mode can be matched as `base_plus_offset`. We introduce `indOffIN4/indOffLN4` and `indOffIN8/indOffLN8` for 32-bit data type and 64-bit data type separately in the patch. E.g., for `memory4`, we remove the generic `indOffIN/indOffLN`, which matches wrong unscaled immediate range, and replace them with `indOffIN4/indOffLN4` instead.
>>
>> Since 8-bit and 16-bit `LDR/STR` instructions also support the unscaled decoding pattern, we add the addressing mode in the lists of `memory1` and `memory2` by introducing `indOffIN1/indOffLN1` and `indOffIN2/indOffLN2`.
>>
>> We also remove unused operands `indOffI/indOffl/indOffIN/indOffLN` to avoid misuse.
>>
>> Tier 1-3 passed on aarch64.
>
> Fei Gao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:
>
> - Remove unused immIOffset/immLOffset
> - Merge branch 'master' into fg8319690
> - 8319690: [AArch64] C2 compilation hits offset_ok_for_immed: assert "c2 compiler bug"
>
> On LP64 systems, if the heap can be moved into low virtual
> address space (below 4GB) and the heap size is smaller than the
> interesting threshold of 4 GB, we can use unscaled decoding
> pattern for narrow klass decoding. It means that a generic field
> reference can be decoded by:
> ```
> cast<64> (32-bit compressed reference) + field_offset
> ```
>
> When the `field_offset` is an immediate, on aarch64 platform, the
> unscaled decoding pattern can match perfectly with a direct
> addressing mode, i.e., `base_plus_offset`, supported by LDR/STR
> instructions. But for certain data width, not all immediates can
> be encoded in the instruction field of LDR/STR[1]. The ranges are
> different as data widths vary.
>
> For example, when we try to load a value of long type at offset of
> `1030`, the address expression is `(AddP (DecodeN base) 1030)`.
> Before the patch, the expression was matching with
> `operand indOffIN()`. But, for 64-bit LDR/STR, signed immediate
> byte offset must be in the range -256 to 255 or positive immediate
> byte offset must be a multiple of 8 in the range 0 to 32760[2].
> `1030` can't be encoded in the instruction field. So, after
> matching, when we do checking for instruction encoding, the
> assertion would fail.
>
> In this patch, we're going to filter out invalid immediates
> when deciding if current addressing mode can be matched as
> `base_plus_offset`. We introduce `indOffIN4/indOffLN4` and
> `indOffIN8/indOffLN8` for 32-bit data type and 64-bit data
> type separately in the patch. E.g., for `memory4`, we remove
> the generic `indOffIN/indOffLN`, which matches wrong unscaled
> immediate range, and replace them with `indOffIN4/indOffLN4`
> instead.
>
> Since 8-bit and 16-bit LDR/STR instructions also support the
> unscaled decoding pattern, we add the addressing mode in the
> lists of `memory1` and `memory2` by introducing
> `indOffIN1/indOffLN1` and `indOffIN2/indOffLN2`.
>
> We also remove unused operands `indOffI/indOffl/indOffIN/indOffLN`
> to avoid misuse.
>
> ...
I think this patch is excessive for the problem and introduces a lot of code dupiication. Maybe it would be simpler, smaller, and faster to check for what we need:
diff --git a/src/hotspot/cpu/aarch64/aarch64.ad b/src/hotspot/cpu/aarch64/aarch64.ad
index 233f9b6af7c..ea842912ce9 100644
--- a/src/hotspot/cpu/aarch64/aarch64.ad
+++ b/src/hotspot/cpu/aarch64/aarch64.ad
@@ -5911,7 +5911,8 @@ operand indIndexN(iRegN reg, iRegL lreg)
operand indOffIN(iRegN reg, immIOffset off)
%{
- predicate(CompressedOops::shift() == 0);
+ predicate(CompressedOops::shift() == 0
+ && Address::offset_ok_for_immed(n->in(3)->find_int_con(min_jint), exact_log2(sizeof(jint))));
constraint(ALLOC_IN_RC(ptr_reg));
match(AddP (DecodeN reg) off);
op_cost(0);
@@ -5926,7 +5927,8 @@ operand indOffIN(iRegN reg, immIOffset off)
operand indOffLN(iRegN reg, immLoffset off)
%{
- predicate(CompressedOops::shift() == 0);
+ predicate(CompressedOops::shift() == 0
+ && Address::offset_ok_for_immed(n->in(3)->find_long_con(min_jint), exact_log2(sizeof(jlong))));
constraint(ALLOC_IN_RC(ptr_reg));
match(AddP (DecodeN reg) off);
op_cost(0);
-------------
PR Comment: https://git.openjdk.org/jdk/pull/16991#issuecomment-1845483889
More information about the hotspot-compiler-dev
mailing list