[9] RFR (S): 8148754: C2 loop unrolling fails due to unexpected graph shape

Wed Mar 16 17:59:13 UTC 2016

Hi Vladimir,

I've spent more time on this issue. Please find my findings below.

On 02/25/2016 03:53 AM, Vladimir Kozlov wrote:
> So it is again _major_progress problem.
> I have to spend more time on this. It is not simple.
>
> We may add an other state when Ideal transformation could be executed. 
> For example, after all loop opts:
>
> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/0fc557e05fc0/src/share/vm/opto/compile.cpp#l2286 
>
>
> Or more states to specify which Ideal transformations and loop 
> optimizations could be executed in which state.

I think adding more states is necessary, adding a single state is not 
sufficient as... (see below)

>
> The main problem from your description is elimination of Opaque1 on 
> which loop optimizations relies.
>
> We can simply remove Opaque1Node::Identity(PhaseGVN* phase) because 
> PhaseMacroExpand::expand_macro_nodes() will remove them after all loop 
> opts.

...there are even more places where Opaque1 nodes are removed, than 
we've initially assumed.

The two I'm concerned about are
- Compile::cleanup_loop_predicates()
http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/3256d4204291/src/share/vm/opto/compile.cpp#l1907
- IdealLoopTree::remove_main_post_loops()
http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/3256d4204291/src/share/vm/opto/loopTransform.cpp#l2469

> On other hand we may do want to execute some simple loop optimizations 
> even after Opaque, CastII and CastI2L are optimized out. For example, 
> removing empty loops or one iteration loops (pre-loops). But 
> definitely not ones which use cloning or other aggressive optimizations.

Yes, I agree that we want to execute some loop optimizations even 
afterwards.

So I've added three states: LOOP_OPTS_FULL, LOOP_OPTS_LIMITED, and 
LOOP_OPTS_INHIBITED. These states indicate which loop optimizations are 
allowed, Major_progress indicates only if loop optimizations have made 
progress (but not if loop optimizations are expected to be performed).

>
> Inline_Warm() is not used since InlineWarmCalls for very long time. 
> The code could be very rotten by now. So removing set_major_progress 
> from it is fine.

OK.

>
> It is also fine to remove it from inline_incrementally since it will 
> be restored by skip_loop_opts code (and cleared if method is empty or 
> set if there are expensive nodes).

OK.

>
> LoopNode::Ideal() change seems also fine. LoopNode is created only in 
> loop opts (RootNode has own Ideal()) so if it has TOP input it will be 
> removed by RegionNode::Ideal most likely.

OK.

>
> Which leaves remove_useless_bool() code only and I have concern about 
> it. It could happened after CCP phase and we may want to execute loop 
> opts after it. I am actually want to set major progress after CCP 
> unconditionally since some If nodes could be folded by it.

Yes, that makes sense and I did it.

>
> As you can see it is not simple :(

No, it's not simple at all. I did a prototype that implements all we 
discussed above. Here is the code:
http://cr.openjdk.java.net/~zmajo/code/8148754/webrev/

The code is not yet RFR quality, but I've sent it out because I'd like 
to have your feedback on how to continue.

The code fixes the current problem with the unexpected graph shape. But 
it is likely to also solve similar problems that are triggered also by 
an unexpected graph shape, for example any of the asserts in 
PhaseIdealLoop::do_range_check:
http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/3256d4204291/src/share/vm/opto/loopTransform.cpp#l2124

I evaluated performance of the prototype. Peformance improves in a 
number of cases by 1-4%:
Octane-Mandreel
Octane-Richards
Octane-Splay

Unfortunately, there is also a performance regression with 
SPECjvm2008-MonteCarlo-G1 (3-5%). Finding the cause of that regression 
is likely to take a at least a week, but most likely even more.

So my question is: Should I spend more time on this prototype and fix 
the performance regression?

A different solution would be check the complete graph shape. That is 
also done at other places, e.g., in SuperWord::get_pre_loop_end()
http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/3256d4204291/src/share/vm/opto/superword.cpp#l3076

Here is the webrev for the second solution:
http://cr.openjdk.java.net/~zmajo/8148754/webrev.03/

The second solution does not regress. I've tested it with:
- JPRT;
- local testing (linux-86_64) with the failing test case;
- executing all hotspot tests locally, all tests pass that pass with an 
unmodified build.

Can you please let me know which solution you prefer:
- (1) the prototype with the regression solved or
- (2) checking the graph shape?

We could also fix this issue with pushing (2) for now (as this issue is 
a "critical" nightly failure). I could then spend more time on (1) later 
in a different bug.

Thank you and best regards,

Zoltan

> Thanks,
> Vladimir
>
> On 2/22/16 6:22 AM, Zoltán Majó wrote:
>> Hi Vladimir,
>>
>>
>> thank you for the feedback!
>>
>> On 02/16/2016 01:11 AM, Vladimir Kozlov wrote:
>>> Zoltan,
>>>
>>> It should not be "main" loop if peeling happened. See do_peeling():
>>>
>>>     if (cl->is_main_loop()) {
>>>       cl->set_normal_loop();
>>>
>>> Split-if optimization should not split through loop's phi. And
>>> generally not through loop's head since it is not making code better -
>>> split through backedge moves code into loop again. Making loop body
>>> more complicated as this case shows.
>>
>> I did more investigation to understand what causes the invalid graph
>> shape to appear. It seems that the invalid graph shape appears because
>> the compiler uses the Compile:: _major_progress inconsistently. Here are
>> some details.
>>
>> - If _major_progress *is set*, the compiler expects more loop
>> optimizations to happen. Therefore, certain transformations on the graph
>> are not allowed so that the graph is in a shape that can be processed by
>> loop optimizations. See:
>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/2c3c43037e14/src/share/vm/opto/convertnode.cpp#l253 
>>
>>
>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/2c3c43037e14/src/share/vm/opto/castnode.cpp#l251 
>>
>>
>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/2c3c43037e14/src/share/vm/opto/loopnode.cpp#l950 
>>
>>
>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/2c3c43037e14/src/share/vm/opto/opaquenode.cpp#l37 
>>
>>
>>
>> - If _major_progress *is not set*, the compiler is allowed to perform
>> all possible transformations (because it does not have to care about
>> future loop optimizations).
>>
>> The crash reported for the current issue appears because _major_progress
>> *can be accidentally set again* after the compiler decided to stop
>> performing loop optimizations. As a result, invalid graph shapes appear.
>>
>> Here are details about how this happens for both failures I've been
>> studying:
>> https://bugs.openjdk.java.net/browse/JDK-8148754?focusedCommentId=13901941&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13901941 
>>
>>
>>
>> I would propose to change the compiler to use _major_progress
>> consistently. (This goes into the same direction as Tobias's recent work
>> on JDK-8144487.)
>>
>> I propose that _major_progress:
>> - can be SET when the compiler is initialized (because loop
>> optimizations are expected to happen afterwards);
>> - can be SET/RESET in the scope of loop optimizations (because we want
>> to see if loop optimizations made progress);
>> - cannot be SET/RESET by neither incremental inlining nor IGVN (even if
>> the IGVN is performed in the scope of loop optimizations).
>>
>> Here is the updated webrev:
>> http://cr.openjdk.java.net/~zmajo/8148754/webrev.02/
>>
>> Performance evaluation:
>> - The proposed webrev does not cause performance regressions for
>> SPECjvm2008, SPECjbb2005, and Octane.
>>
>> Testing:
>> - all hotspot JTREG tests on all supported platforms;
>> - JPRT;
>> - failing test case.
>>
>> Thank you and best regards,
>>
>>
>> Zoltan
>>
>>
>>>
>>> Bailout unrolling is fine but performance may suffer because in some
>>> cases loop unrolling is better then split-if.
>>
>>
>>>
>>> Thanks,
>>> Vladimir
>>>
>>> On 2/15/16 7:22 AM, Zoltán Majó wrote:
>>>> Hi,
>>>>
>>>>
>>>> please review the patch for 8148754.
>>>>
>>>> https://bugs.openjdk.java.net/browse/JDK-8148754
>>>>
>>>> Problem: Compilation fails when the C2 compiler attempts loop 
>>>> unrolling.
>>>> The cause of the failure is that the loop unrolling optimization 
>>>> expects
>>>> a well-defined graph shape at the entry control of a 'CountedLoopNode'
>>>> ('IfTrue'/'IfFalse' preceeded by 'If' preceeded by 'Bool' preceeded by
>>>> 'CmpI').
>>>>
>>>>
>>>> Solution: I investigated several different instances of the same
>>>> failure. It turns out that the shape of the graph at a loop's entry
>>>> control is often different from the way loop unrolling expects it 
>>>> to be
>>>> (please find some examples in the bug's JBS issue). The various graph
>>>> shapes are a result of previously performed transformations, e.g.,
>>>> split-if optimization and loop peeling.
>>>>
>>>> Loop unrolling requires the above mentioned graph shape so that it can
>>>> adjust the zero-trip guard of the loop. With the unexpected graph
>>>> shapes, it is not possible to perform loop unrolling. However, the 
>>>> graph
>>>> is still in a valid state (except for loop unrolling) and can be 
>>>> used to
>>>> produce correct code.
>>>>
>>>> I propose that (1) we check if an unexpected graph shape is 
>>>> encountered
>>>> and (2) bail out of loop unrolling if it is (but not fail in the
>>>> compiler in such cases).
>>>>
>>>> The failure was triggered by Aleksey's Indify String Concatenation
>>>> changes but the generated bytecodes are valid. So this seems to be a
>>>> compiler issue that was previously there but was not yet triggered.
>>>>
>>>>
>>>> Webrev:
>>>> http://cr.openjdk.java.net/~zmajo/8148754/webrev.00/
>>>>
>>>> Testing:
>>>> - JPRT;
>>>> - local testing (linux-86_64) with the failing test case;
>>>> - executed all hotspot tests locally, all tests pass that pass with an
>>>> unmodified build.
>>>>
>>>> Thank you!
>>>>
>>>> Best regards,
>>>>
>>>>
>>>> Zoltan
>>>>
>>