RFR: 8319690: [AArch64] C2 compilation hits offset_ok_for_immed: assert "c2 compiler bug" [v2]
Fei Gao
fgao at openjdk.org
Thu Dec 7 06:42:49 UTC 2023
> On LP64 systems, if the heap can be moved into low virtual address space (below 4GB) and the heap size is smaller than the interesting threshold of 4 GB, we can use unscaled decoding pattern for narrow klass decoding. It means that a generic field reference can be decoded by:
>
> cast<64> (32-bit compressed reference) + field_offset
>
>
> When the `field_offset` is an immediate, on aarch64 platform, the unscaled decoding pattern can match perfectly with a direct addressing mode, i.e., `base_plus_offset`, supported by `LDR/STR` instructions. But for certain data width, not all immediates can be encoded in the instruction field of `LDR/STR` [[1]](https://github.com/openjdk/jdk/blob/8db7bad992a0f31de9c7e00c2657c18670539102/src/hotspot/cpu/aarch64/assembler_aarch64.inline.hpp#L33). The ranges are different as data widths vary.
>
> For example, when we try to load a value of long type at offset of `1030`, the address expression is `(AddP (DecodeN base) 1030)`. Before the patch, the expression was matching with `operand indOffIN()`. But, for 64-bit `LDR/STR`, signed immediate byte offset must be in the range -256 to 255 or positive immediate byte offset must be a multiple of 8 in the range 0 to 32760 [[2]](https://developer.arm.com/documentation/ddi0602/2023-09/Base-Instructions/LDR--immediate---Load-Register--immediate--?lang=en). `1030` can't be encoded in the instruction field. So, after matching, when we do checking for instruction encoding, the assertion would fail.
>
> In this patch, we're going to filter out invalid immediates when deciding if current addressing mode can be matched as `base_plus_offset`. We introduce `indOffIN4/indOffLN4` and `indOffIN8/indOffLN8` for 32-bit data type and 64-bit data type separately in the patch. E.g., for `memory4`, we remove the generic `indOffIN/indOffLN`, which matches wrong unscaled immediate range, and replace them with `indOffIN4/indOffLN4` instead.
>
> Since 8-bit and 16-bit `LDR/STR` instructions also support the unscaled decoding pattern, we add the addressing mode in the lists of `memory1` and `memory2` by introducing `indOffIN1/indOffLN1` and `indOffIN2/indOffLN2`.
>
> We also remove unused operands `indOffI/indOffl/indOffIN/indOffLN` to avoid misuse.
>
> Tier 1-3 passed on aarch64.
Fei Gao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:
- Remove unused immIOffset/immLOffset
- Merge branch 'master' into fg8319690
- 8319690: [AArch64] C2 compilation hits offset_ok_for_immed: assert "c2 compiler bug"
On LP64 systems, if the heap can be moved into low virtual
address space (below 4GB) and the heap size is smaller than the
interesting threshold of 4 GB, we can use unscaled decoding
pattern for narrow klass decoding. It means that a generic field
reference can be decoded by:
```
cast<64> (32-bit compressed reference) + field_offset
```
When the `field_offset` is an immediate, on aarch64 platform, the
unscaled decoding pattern can match perfectly with a direct
addressing mode, i.e., `base_plus_offset`, supported by LDR/STR
instructions. But for certain data width, not all immediates can
be encoded in the instruction field of LDR/STR[1]. The ranges are
different as data widths vary.
For example, when we try to load a value of long type at offset of
`1030`, the address expression is `(AddP (DecodeN base) 1030)`.
Before the patch, the expression was matching with
`operand indOffIN()`. But, for 64-bit LDR/STR, signed immediate
byte offset must be in the range -256 to 255 or positive immediate
byte offset must be a multiple of 8 in the range 0 to 32760[2].
`1030` can't be encoded in the instruction field. So, after
matching, when we do checking for instruction encoding, the
assertion would fail.
In this patch, we're going to filter out invalid immediates
when deciding if current addressing mode can be matched as
`base_plus_offset`. We introduce `indOffIN4/indOffLN4` and
`indOffIN8/indOffLN8` for 32-bit data type and 64-bit data
type separately in the patch. E.g., for `memory4`, we remove
the generic `indOffIN/indOffLN`, which matches wrong unscaled
immediate range, and replace them with `indOffIN4/indOffLN4`
instead.
Since 8-bit and 16-bit LDR/STR instructions also support the
unscaled decoding pattern, we add the addressing mode in the
lists of `memory1` and `memory2` by introducing
`indOffIN1/indOffLN1` and `indOffIN2/indOffLN2`.
We also remove unused operands `indOffI/indOffl/indOffIN/indOffLN`
to avoid misuse.
Tier 1-3 passed on aarch64.
[1] https://github.com/openjdk/jdk/blob/8db7bad992a0f31de9c7e00c2657c18670539102/src/hotspot/cpu/aarch64/assembler_aarch64.inline.hpp#L33
[2] https://developer.arm.com/documentation/ddi0602/2023-09/Base-Instructions/LDR--immediate---Load-Register--immediate--?lang=en
-------------
Changes:
- all: https://git.openjdk.org/jdk/pull/16991/files
- new: https://git.openjdk.org/jdk/pull/16991/files/1895cf31..a7bfe267
Webrevs:
- full: https://webrevs.openjdk.org/?repo=jdk&pr=16991&range=01
- incr: https://webrevs.openjdk.org/?repo=jdk&pr=16991&range=00-01
Stats: 36013 lines in 403 files changed: 11716 ins; 22816 del; 1481 mod
Patch: https://git.openjdk.org/jdk/pull/16991.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/16991/head:pull/16991
PR: https://git.openjdk.org/jdk/pull/16991
More information about the hotspot-compiler-dev
mailing list