RFR: 8275202: C2: optimize out more redundant conditions

Thu Oct 5 12:26:13 UTC 2023

On Mon, 21 Aug 2023 08:43:39 GMT, Roland Westrelin <roland at openjdk.org> wrote:

>> This change adds a new loop opts pass to optimize redundant conditions
>> such as the second one in:
>> 
>> 
>> if (i < 10) {
>>   if (i < 42) {
>> 
>> 
>> In the branch of the first if, the type of i can be narrowed down to
>> [min_jint, 9] which can then be used to constant fold the second
>> condition.
>>   
>> The compiler already keeps track of type[n] for every node in the
>> current compilation unit. That's not sufficient to optimize the
>> snippet above though because the type of i can only be narrowed in
>> some sections of the control flow (that is a subset of all
>> controls). The solution is to build a new table that tracks the type
>> of n at every control c
>> 
>> 
>> type'[n, root] = type[n] // initialized from igvn's type table
>> type'[n, c] = type[n, idom(c)]
>> 
>> 
>> This pass iterates over the CFG looking for conditions such as:
>> 
>> 
>> if (i < 10) {
>> 
>> 
>> that allows narrowing the type of i and updates the type' table
>> accordingly.
>> 
>> At a region r:
>> 
>> 
>> type'[n, r] = meet(type'[n, r->in(1)], type'[n, r->in(2)]...)
>> 
>> 
>> For a Phi phi at a region r:
>> 
>> 
>> type'[phi, r] = meet(type'[phi->in(1), r->in(1)], type'[phi->in(2), r->in(2)]...)
>> 
>> 
>> Once a type is narrowed, uses are enqueued and their types are
>> computed by calling the Value() methods. If a use's type is narrowed,
>> it's recorded at c in the type' table. Value() methods retrieve types
>> from the type table, not the type' table. To address that issue while
>> leaving Value() methods unchanged, before calling Value() at c, the
>> type table is updated so:
>> 
>> 
>> type[n] = type'[n, c]
>> 
>> 
>> An exception is for Phi::Value which needs to retrieve the type of
>> nodes are various controls: there, a new type(Node* n, Node* c)
>> method is used.
>> 
>> For most n and c, type'[n, c] is likely the same as type[n], the type
>> recorded in the global igvn table (that is there shouldn't be many
>> nodes at only a few control for which we can narrow the type down). As
>> a consequence, the types'[n, c] table is implemented with:
>> 
>> - At c, narrowed down types are stored in a GrowableArray. Each entry
>>   records the previous type at idom(c) and the narrowed down type at
>>   c.
>> 
>> - The GrowableArray of type updates is recorded in a hash table
>>   indexed by c. If there's no update at c, there's no entry in the
>>   hash table.
>> 
>> This pass operates in 2 steps:
>> 
>> - it first iterates over the graph looking for conditions that narrow
>>   the types of some nodes and propagate type updates to uses ...
>
> Comment to keep alive.

> @rwestrel this looks like an awesome idea. I gave the code a quick glance, and have a few questions:

Thanks for looking at this. I have an updated patch coming but I'll make sure to look at your comments before I send it.

> **How powerful is this?** Can we make inferences from `expr(i) > 0` about `i` itself? I guess not really, that would require us to propagate type information "upward" through `expr` to `i`.
> 
> Assume we have`expr1(i) > 0` and `expr2(i) < 0`:
> 
>     * And let's assume that together this implied that `i` was a constant at c. Your algorithm would not detect this, right?

No
It can do:

if (2 * i < 0) {
  if (i >= 0 && i <= 10) {
    // i in [0, 10] => 2 * i in [0, 20] which is impossible so this branch is unreachable
  }
}

>     * Alternatively, let's assume this meant that `i` was TOP at c. We would not directly detect this, but we would somehow detect that one of the two expressions is not TOP at c, and therefore we could collapse c? This is very powerful for dead code elimination.

Yes, I think that's correct.

> **How expensive is this?** Worst case, you cannot get around having a basically quadratic runtime and memory requirement, right? So basically `O(C * D)`, where C is number of control nodes, and D is number of data nodes. I'm trying to think of such a worst-case example: Can there be a program where we have lots of expressions of `i`, and lots of comparisons on `i`, and therefore have to update all expressions of `i` at every comparison of `i`? This worst-case is probably far off of the "average" program, right? But still, one wonders what the distribution is - even the occasional program/method that hits quadratic runtime could be problematically expensive.
> 
> If you don't think the runtime is indeed worst-case quadratic, then I'd be interested in hearing the argument ;)

I think you're right about this.
I've seen a case where the type of a single node is repeatedly narrowed and that node has many uses. That method has no loop so the compiler spends almost no time in loop opts without the patch and quite a bit of time with the patch. I found that method while looking at compilation times for CTW on java.base. Working around the issue improves compilation time of that method but made no difference on the total time spent on java.base. So it's true that there could be pathological cases but they seem to be uncommon.

> **Do you think this really solves all the other bugs?** If we put this behind a product flag, then we do need to fix all other bugs that are linked as duplicate separately. But that may not be easy, because they basically happen because separate type info about expressions of `i` are suddenly combined at some data node, but not visible to the control -> inconsistent graph.

So maybe this should be behind a diagnostic flag. The reason for having a flag is to diagnose issues that this could cause: new crashes, incorrect execution or maybe compilation time becoming to expensive and maybe work around those issues temporarily.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14586#issuecomment-1748787521