RFR: 8303466: C2: failed: malformed control flow. Limit type made precise with MaxL/MinL [v3]

Tue Apr 11 09:07:40 UTC 2023

On Tue, 11 Apr 2023 08:00:35 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> **Context**
>> 
>> During `PhaseIdealLoop::do_unroll`, we hack the loop-limit, and subtract `stride` from it. We have to prevent underflow on that subtract. Currently, we do this with a `CMoveI`. The problem with this: `CMoveI` is not smart enough to generate a precise type. For example, there are many cases where the input types get better, and underflow is not possible anymore. But the `CMoveI` does not detect this, and still has type `min_jint..hi`.
>> 
>> We have the same issue in `PhaseIdealLoop::adjust_limit`, where we use `CMoveL` to implement long max/min. The types are not as precise as they could and should be.
>> 
>> **Problem**
>> 
>> The imprecise type is used for the zero-trip-guard. It does not fold to false, even though the data-path into the post loop does constant fold to `TOP`. The graph breaks, and assert `malformed control flow` triggers.
>> 
>> Details: In these cases, we have the super-unrolled main-loop (SuperWord'ed, then further unrolled) directly leading to a vectorized post-loop. The effect is that there is no `region/phi` merging main-exit and main-zero-trip-guard. So the types are already more narrow here. It may be possible that the values are such that we find out that we should never enter the vectorized post-loop. But if data finds out and control does not, we get a broken graph.
>> Note: we have pre-loop. Then a main-loop and vectorized post loop. Then we merge the main-zero-trip-guard. And at the end we have the scalar post loop.
>> 
>> I have already recently fixed a bug around this `CMoveI`. https://github.com/openjdk/jdk/commit/5a4945c0d95423d0ab07762c915e9cb4d3c66abb I would now like to have a more satisfactory fix, that properly propagates the types.
>> 
>> **Solution**
>> 
>> `PhaseIdealLoop::adjust_limit` already converts the limit from int to long, and does all computations in long, including taking max/min with a `CMoveL`. I now use the so far unused `MaxL/MinL`. I implemented some missing `Value/Identity` components for it. Since `MaxL/MinL` is not implemented in the backend, I just expand it in macro-expansion to a `CMoveL`. At that point the loop-opts are over, and it is most likely ok that we do not make the types more precise after this.
>> 
>> I take the same approach for `PhaseIdealLoop::do_unroll`: convert limits to long, do subtraction in long, take `MinL/MaxL` to clamp it to the int-range (prevent subtraction underflow).
>> 
>> **Discussion**
>> 
>> This solution seems much cleaner to me, and I hope that we will see less bugs because of imprecise types in the limit computation, which were often due to the `CMove` not being smart enough to analyze all inputs (it would have to recognize a multitude of patterns, for the Cmp inputs and the direct inputs to the CMove - we currently do not do that, but just take the union of the input types - this is very inprecise).
>> 
>> There is a bit of an overhead here: We use longs even though we only want to have int values. But I think we should prefer a clean implementation here, with correct type computation. The performance impact is probably non-existent on 64-bit machines anyway.
>> 
>> **Caveat**
>> 
>> I found some cases with the same assert `malformed control flow` that are most likely skeleton/assertion predicate bugs [JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981). Some of those cases were new patterns, for example where we PreMainPost a main loop.
>> 
>> I hope that this fix here at least reduces the frequency of failures significantly.
>> 
>> **Testing**
>> 
>> I added 2 regression tests. Our fuzzer seems to spit out examples regularly, so that gives us extra coverage.
>> 
>> Tested up to `tier5` and stress testing. Performance testing **running...**
>> 
>> **Future Work**
>> 
>> We should implement `MaxL/MinL` in the backend. We should also use them during parsing. This would also allow to `SuperWord` the instruction, on the platforms that support it.
>> 
>> Should we add such an assert during IGVN? I think after IGVN, we should never have a `MultiBranchNode` that does not have the required number of outputs, right? We could add it to `VerifyIterativeGVN`.
>
> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
> 
>   stride_l should be longcon

src/hotspot/share/opto/addnode.cpp line 1272:

> 1270:   const TypeLong* t2 = phase->type(in(2))->is_long();
> 1271: 
> 1272:   // Can we determine minimum statically?

Suggestion:

  // Can we determine maximum statically?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13269#discussion_r1162513651