RFR: 8319690: [AArch64] C2 compilation hits offset_ok_for_immed: assert "c2 compiler bug" [v2]

Wed Jan 31 11:24:07 UTC 2024

On Thu, 7 Dec 2023 06:42:49 GMT, Fei Gao <fgao at openjdk.org> wrote:

>> On LP64 systems, if the heap can be moved into low virtual address space (below 4GB) and the heap size is smaller than the interesting threshold of 4 GB, we can use unscaled decoding pattern for narrow klass decoding. It means that a generic field reference can be decoded by:
>> 
>> cast<64> (32-bit compressed reference) + field_offset
>> 
>> 
>> When the `field_offset` is an immediate, on aarch64 platform, the unscaled decoding pattern can match perfectly with a direct addressing mode, i.e., `base_plus_offset`, supported by `LDR/STR` instructions. But for certain data width, not all immediates can be encoded in the instruction field of `LDR/STR` [[1]](https://github.com/openjdk/jdk/blob/8db7bad992a0f31de9c7e00c2657c18670539102/src/hotspot/cpu/aarch64/assembler_aarch64.inline.hpp#L33). The ranges are different as data widths vary.
>> 
>> For example, when we try to load a value of long type at offset of `1030`, the address expression is `(AddP (DecodeN base) 1030)`. Before the patch, the expression was matching with `operand indOffIN()`. But, for 64-bit `LDR/STR`, signed immediate byte offset must be in the range -256 to 255 or positive immediate byte offset must be a multiple of 8 in the range 0 to 32760 [[2]](https://developer.arm.com/documentation/ddi0602/2023-09/Base-Instructions/LDR--immediate---Load-Register--immediate--?lang=en). `1030` can't be encoded in the instruction field. So, after matching, when we do checking for instruction encoding, the assertion would fail.
>> 
>> In this patch, we're going to filter out invalid immediates when deciding if current addressing mode can be matched as `base_plus_offset`. We introduce `indOffIN4/indOffLN4` and `indOffIN8/indOffLN8` for 32-bit data type and 64-bit data type separately in the patch. E.g., for `memory4`, we remove the generic `indOffIN/indOffLN`, which matches wrong unscaled immediate range, and replace them with `indOffIN4/indOffLN4` instead.
>> 
>> Since 8-bit and 16-bit `LDR/STR` instructions also support the unscaled decoding pattern, we add the addressing mode in the lists of `memory1` and `memory2` by introducing `indOffIN1/indOffLN1` and `indOffIN2/indOffLN2`.
>> 
>> We also remove unused operands `indOffI/indOffl/indOffIN/indOffLN` to avoid misuse.
>> 
>> Tier 1-3 passed on aarch64.
>
> Fei Gao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:
> 
>  - Remove unused immIOffset/immLOffset
>  - Merge branch 'master' into fg8319690
>  - 8319690: [AArch64] C2 compilation hits offset_ok_for_immed: assert "c2 compiler bug"
>    
>    On LP64 systems, if the heap can be moved into low virtual
>    address space (below 4GB) and the heap size is smaller than the
>    interesting threshold of 4 GB, we can use unscaled decoding
>    pattern for narrow klass decoding. It means that a generic field
>    reference can be decoded by:
>    ```
>    cast<64> (32-bit compressed reference) + field_offset
>    ```
>    
>    When the `field_offset` is an immediate, on aarch64 platform, the
>    unscaled decoding pattern can match perfectly with a direct
>    addressing mode, i.e., `base_plus_offset`, supported by LDR/STR
>    instructions. But for certain data width, not all immediates can
>    be encoded in the instruction field of LDR/STR[1]. The ranges are
>    different as data widths vary.
>    
>    For example, when we try to load a value of long type at offset of
>    `1030`, the address expression is `(AddP (DecodeN base) 1030)`.
>    Before the patch, the expression was matching with
>    `operand indOffIN()`. But, for 64-bit LDR/STR, signed immediate
>    byte offset must be in the range -256 to 255 or positive immediate
>    byte offset must be a multiple of 8 in the range 0 to 32760[2].
>    `1030` can't be encoded in the instruction field. So, after
>    matching, when we do checking for instruction encoding, the
>    assertion would fail.
>    
>    In this patch, we're going to filter out invalid immediates
>    when deciding if current addressing mode can be matched as
>    `base_plus_offset`. We introduce `indOffIN4/indOffLN4` and
>    `indOffIN8/indOffLN8` for 32-bit data type and 64-bit data
>    type separately in the patch. E.g., for `memory4`, we remove
>    the generic `indOffIN/indOffLN`, which matches wrong unscaled
>    immediate range, and replace them with `indOffIN4/indOffLN4`
>    instead.
>    
>    Since 8-bit and 16-bit LDR/STR instructions also support the
>    unscaled decoding pattern, we add the addressing mode in the
>    lists of `memory1` and `memory2` by introducing
>    `indOffIN1/indOffLN1` and `indOffIN2/indOffLN2`.
>    
>    We also remove unused operands `indOffI/indOffl/indOffIN/indOffLN`
>    to avoid misuse.
>    
> ...

I'm withdrawing my objection to this PR.

I've had a good look at some alternatives, and none of them are any better. I've reluctantly concluded that, given the design of C2, there's no better way to fix it.

My apologies to @fg1417 . Yoy were right.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16991#issuecomment-1918910080