RFR: 8319690: [AArch64] C2 compilation hits offset_ok_for_immed: assert "c2 compiler bug" [v2]

Thu Dec 7 06:42:49 UTC 2023

> On LP64 systems, if the heap can be moved into low virtual address space (below 4GB) and the heap size is smaller than the interesting threshold of 4 GB, we can use unscaled decoding pattern for narrow klass decoding. It means that a generic field reference can be decoded by:
> 
> cast<64> (32-bit compressed reference) + field_offset
> 
> 
> When the `field_offset` is an immediate, on aarch64 platform, the unscaled decoding pattern can match perfectly with a direct addressing mode, i.e., `base_plus_offset`, supported by `LDR/STR` instructions. But for certain data width, not all immediates can be encoded in the instruction field of `LDR/STR` [[1]](https://github.com/openjdk/jdk/blob/8db7bad992a0f31de9c7e00c2657c18670539102/src/hotspot/cpu/aarch64/assembler_aarch64.inline.hpp#L33). The ranges are different as data widths vary.
> 
> For example, when we try to load a value of long type at offset of `1030`, the address expression is `(AddP (DecodeN base) 1030)`. Before the patch, the expression was matching with `operand indOffIN()`. But, for 64-bit `LDR/STR`, signed immediate byte offset must be in the range -256 to 255 or positive immediate byte offset must be a multiple of 8 in the range 0 to 32760 [[2]](https://developer.arm.com/documentation/ddi0602/2023-09/Base-Instructions/LDR--immediate---Load-Register--immediate--?lang=en). `1030` can't be encoded in the instruction field. So, after matching, when we do checking for instruction encoding, the assertion would fail.
> 
> In this patch, we're going to filter out invalid immediates when deciding if current addressing mode can be matched as `base_plus_offset`. We introduce `indOffIN4/indOffLN4` and `indOffIN8/indOffLN8` for 32-bit data type and 64-bit data type separately in the patch. E.g., for `memory4`, we remove the generic `indOffIN/indOffLN`, which matches wrong unscaled immediate range, and replace them with `indOffIN4/indOffLN4` instead.
> 
> Since 8-bit and 16-bit `LDR/STR` instructions also support the unscaled decoding pattern, we add the addressing mode in the lists of `memory1` and `memory2` by introducing `indOffIN1/indOffLN1` and `indOffIN2/indOffLN2`.
> 
> We also remove unused operands `indOffI/indOffl/indOffIN/indOffLN` to avoid misuse.
> 
> Tier 1-3 passed on aarch64.

Fei Gao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:

 - Remove unused immIOffset/immLOffset
 - Merge branch 'master' into fg8319690
 - 8319690: [AArch64] C2 compilation hits offset_ok_for_immed: assert "c2 compiler bug"

   On LP64 systems, if the heap can be moved into low virtual
   address space (below 4GB) and the heap size is smaller than the
   interesting threshold of 4 GB, we can use unscaled decoding
   pattern for narrow klass decoding. It means that a generic field
   reference can be decoded by:
   ```
   cast<64> (32-bit compressed reference) + field_offset
   ```

   When the `field_offset` is an immediate, on aarch64 platform, the
   unscaled decoding pattern can match perfectly with a direct
   addressing mode, i.e., `base_plus_offset`, supported by LDR/STR
   instructions. But for certain data width, not all immediates can
   be encoded in the instruction field of LDR/STR[1]. The ranges are
   different as data widths vary.

   For example, when we try to load a value of long type at offset of
   `1030`, the address expression is `(AddP (DecodeN base) 1030)`.
   Before the patch, the expression was matching with
   `operand indOffIN()`. But, for 64-bit LDR/STR, signed immediate
   byte offset must be in the range -256 to 255 or positive immediate
   byte offset must be a multiple of 8 in the range 0 to 32760[2].
   `1030` can't be encoded in the instruction field. So, after
   matching, when we do checking for instruction encoding, the
   assertion would fail.

   In this patch, we're going to filter out invalid immediates
   when deciding if current addressing mode can be matched as
   `base_plus_offset`. We introduce `indOffIN4/indOffLN4` and
   `indOffIN8/indOffLN8` for 32-bit data type and 64-bit data
   type separately in the patch. E.g., for `memory4`, we remove
   the generic `indOffIN/indOffLN`, which matches wrong unscaled
   immediate range, and replace them with `indOffIN4/indOffLN4`
   instead.

   Since 8-bit and 16-bit LDR/STR instructions also support the
   unscaled decoding pattern, we add the addressing mode in the
   lists of `memory1` and `memory2` by introducing
   `indOffIN1/indOffLN1` and `indOffIN2/indOffLN2`.

   We also remove unused operands `indOffI/indOffl/indOffIN/indOffLN`
   to avoid misuse.

   Tier 1-3 passed on aarch64.

   [1] https://github.com/openjdk/jdk/blob/8db7bad992a0f31de9c7e00c2657c18670539102/src/hotspot/cpu/aarch64/assembler_aarch64.inline.hpp#L33
   [2] https://developer.arm.com/documentation/ddi0602/2023-09/Base-Instructions/LDR--immediate---Load-Register--immediate--?lang=en

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/16991/files
  - new: https://git.openjdk.org/jdk/pull/16991/files/1895cf31..a7bfe267

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=16991&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16991&range=00-01

  Stats: 36013 lines in 403 files changed: 11716 ins; 22816 del; 1481 mod
  Patch: https://git.openjdk.org/jdk/pull/16991.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/16991/head:pull/16991

PR: https://git.openjdk.org/jdk/pull/16991