RFR: 8319690: [AArch64] C2 compilation hits offset_ok_for_immed: assert "c2 compiler bug" [v3]
Fei Gao
fgao at openjdk.org
Thu Jun 20 10:01:22 UTC 2024
On Thu, 13 Jun 2024 23:18:04 GMT, Dean Long <dlong at openjdk.org> wrote:
> Something like this:
>
> ```
> // Special operand allowing iRegN to be treated as iRegP
> operand iRegN2P(iRegN regn) %{
> predicate(CompressedOops::shift() == 0);
>
> op_cost(0);
>
> match(DecodeN regn);
>
> format %{ "decode($regn)" %}
>
> interface(REG_INTER)
> %}
>
> // Pointer Register
> operand iRegP()
> %{
> constraint(ALLOC_IN_RC(ptr_reg));
> [...]
> match(iRegN2P);
> [...]
> ```
>
> I tried using an opclass like iRegIorL2I, but adlc can't seem to handle an opclass before all operands are defined.
Thanks for your comments, @dean-long .
That's worth a try. And then I found a problem.
When we try to define a relationship like this:
// Pointer Register
operand iRegP()
%{
constraint(ALLOC_IN_RC(ptr_reg));
[...]
match(iRegN2P);
[...]
The following DFA logic will be accordingly:
void State::_sub_Op_DecodeN(const Node *n){
...
if( STATE__VALID_CHILD(_kids[0], IREGN) &&
(
n->bottom_type()->is_ptr()->ptr() != TypePtr::NotNull &&
n->bottom_type()->is_ptr()->ptr() != TypePtr::Constant
) ) {
unsigned int c = _kids[0]->_cost[IREGN]+INSN_COST * 3;
if (STATE__NOT_YET_VALID(IREGP) || _cost[IREGP] > c) {
DFA_PRODUCTION(IREGP, decodeHeapOop_rule, c)
}
...
}
...
if( STATE__VALID_CHILD(_kids[0], IREGN) &&
CompressedOops::shift() == 0
) {
unsigned int c = _kids[0]->_cost[IREGN];
DFA_PRODUCTION(IREGN2P, iRegN2P_rule, c)
if (STATE__NOT_YET_VALID(IREGP) || _cost[IREGP] > c) {
DFA_PRODUCTION(IREGP, iRegP_rule, c)
}
...
It means that, most of time, when `CompressedOops::shift() == 0` happens, for all output operands accept `iRegP`, it will choose `iRegP_rule` because of its low cost. But, sometimes, we may expect an instruction not a last machine operand.
Assuming we have part of ideal graph like:
...
1 LoadN
/ \
... 2 DecodeN 3 ConL
\ /
4 AddP
|
5 LoadN
|
6 DecodeN ...
\ /
7 CmpP
```
We may expect `2 DecodeN - 1 LoadN` can be reduced as a last machine operand and `6 DecodeN - 5 LoadN` can be reduced as an instruct, even though `6 DecodeN - 5 LoadN` does nothing in the final compiled code.
If we choose different rules for similar tree structures, we need more information. Detecting more information here would make the DFA condition more complex. I guess that's probably the same reason why we need an opclass to build a relationship for `iRegIorL2I` and `iRegI`. WDYT?
If merging these two rules looks confusing here, how about just keeping them separate as before? After implementing the scheme suggested by @theRealAph , operands with duplicate logic will be reduced significantly.
Thanks!
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/16991#discussion_r1647318308
More information about the hotspot-compiler-dev
mailing list