[9] RFR (S): 8148754: C2 loop unrolling fails due to unexpected graph shape

Thu Mar 17 16:09:19 UTC 2016

Hi Vladimir,

thank you for the feedback.

On 03/16/2016 09:52 PM, Vladimir Kozlov wrote:
>> Can you please let me know which solution you prefer:
>> - (1) the prototype with the regression solved or
>> - (2) checking the graph shape?
>
> I agree that we should do (2) now. One suggestion I have is to prepare 
> a separate method to do these checks and use it in other places which 
> you pointed - superword and do_range_check.

OK, I updated the patch for (2) so that the check of the graph's shape 
is performed in a separate method. Here is the webrev:
http://cr.openjdk.java.net/~zmajo/8148754/webrev.04/

I've tested the updated webrev with JPRT and also by executing the 
failing test. Both pass. I will soon start RBT testing as well (will let 
you know if failures have appeared).

> Yes, you can work later on (1) solution if you have a bug and not RFE 
> - we should stabilize C2 as you know and these changes may have some 
> side effects we don't know about yet. But I like it because we can 
> explicitly specify which optimizations are allowed.

OK. I filed 8152110: "Stabilize C2 loop optimizations" and will continue 
work in the scope of that bug:
https://bugs.openjdk.java.net/browse/JDK-8152110
>
>> The two I'm concerned about are
>> - Compile::cleanup_loop_predicates()
>
> Yes, this one should be marked LOOP_OPTS_LIMITED.
>
>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/3256d4204291/src/share/vm/opto/compile.cpp#l1907 
>>
>> - IdealLoopTree::remove_main_post_loops()
>
> This one is fine because the loop goes away.

I'll take of these once I continue work on 8152110.

Thank you!

Best regards,

Zoltan

>
> Thanks,
> Vladimir
>
> On 3/16/16 10:59 AM, Zoltán Majó wrote:
>> Hi Vladimir,
>>
>>
>> I've spent more time on this issue. Please find my findings below.
>>
>> On 02/25/2016 03:53 AM, Vladimir Kozlov wrote:
>>> So it is again _major_progress problem.
>>> I have to spend more time on this. It is not simple.
>>>
>>> We may add an other state when Ideal transformation could be 
>>> executed. For example, after all loop opts:
>>>
>>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/0fc557e05fc0/src/share/vm/opto/compile.cpp#l2286 
>>>
>>>
>>> Or more states to specify which Ideal transformations and loop 
>>> optimizations could be executed in which state.
>>
>> I think adding more states is necessary, adding a single state is not 
>> sufficient as... (see below)
>>
>>>
>>> The main problem from your description is elimination of Opaque1 on 
>>> which loop optimizations relies.
>>>
>>> We can simply remove Opaque1Node::Identity(PhaseGVN* phase) because 
>>> PhaseMacroExpand::expand_macro_nodes() will remove them after all 
>>> loop opts.
>>
>> ...there are even more places where Opaque1 nodes are removed, than 
>> we've initially assumed.
>>
>> The two I'm concerned about are
>> - Compile::cleanup_loop_predicates()
>
> Yes, this one should be marked LOOP_OPTS_LIMITED.
>
>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/3256d4204291/src/share/vm/opto/compile.cpp#l1907 
>>
>> - IdealLoopTree::remove_main_post_loops()
>
> This one is fine because the loop goes away.
>
>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/3256d4204291/src/share/vm/opto/loopTransform.cpp#l2469 
>>
>>
>>> On other hand we may do want to execute some simple loop 
>>> optimizations even after Opaque, CastII and CastI2L are optimized 
>>> out. For example, removing empty loops or one iteration loops 
>>> (pre-loops).
>>> But definitely not ones which use cloning or other aggressive 
>>> optimizations.
>>
>> Yes, I agree that we want to execute some loop optimizations even 
>> afterwards.
>>
>> So I've added three states: LOOP_OPTS_FULL, LOOP_OPTS_LIMITED, and 
>> LOOP_OPTS_INHIBITED. These states indicate which loop optimizations 
>> are allowed, Major_progress indicates only if loop optimizations
>> have made progress (but not if loop optimizations are expected to be 
>> performed).
>>
>>>
>>> Inline_Warm() is not used since InlineWarmCalls for very long time. 
>>> The code could be very rotten by now. So removing set_major_progress 
>>> from it is fine.
>>
>> OK.
>>
>>>
>>> It is also fine to remove it from inline_incrementally since it will 
>>> be restored by skip_loop_opts code (and cleared if method is empty 
>>> or set if there are expensive nodes).
>>
>> OK.
>>
>>>
>>> LoopNode::Ideal() change seems also fine. LoopNode is created only 
>>> in loop opts (RootNode has own Ideal()) so if it has TOP input it 
>>> will be removed by RegionNode::Ideal most likely.
>>
>> OK.
>>
>>>
>>> Which leaves remove_useless_bool() code only and I have concern 
>>> about it. It could happened after CCP phase and we may want to 
>>> execute loop opts after it. I am actually want to set major progress
>>> after CCP unconditionally since some If nodes could be folded by it.
>>
>> Yes, that makes sense and I did it.
>>
>>>
>>> As you can see it is not simple :(
>>
>> No, it's not simple at all. I did a prototype that implements all we 
>> discussed above. Here is the code:
>> http://cr.openjdk.java.net/~zmajo/code/8148754/webrev/
>>
>> The code is not yet RFR quality, but I've sent it out because I'd 
>> like to have your feedback on how to continue.
>>
>> The code fixes the current problem with the unexpected graph shape. 
>> But it is likely to also solve similar problems that are triggered 
>> also by an unexpected graph shape, for example any of the asserts
>> in PhaseIdealLoop::do_range_check:
>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/3256d4204291/src/share/vm/opto/loopTransform.cpp#l2124 
>>
>>
>> I evaluated performance of the prototype. Peformance improves in a 
>> number of cases by 1-4%:
>> Octane-Mandreel
>> Octane-Richards
>> Octane-Splay
>>
>> Unfortunately, there is also a performance regression with 
>> SPECjvm2008-MonteCarlo-G1 (3-5%). Finding the cause of that 
>> regression is likely to take a at least a week, but most likely even 
>> more.
>>
>> So my question is: Should I spend more time on this prototype and fix 
>> the performance regression?
>>
>> A different solution would be check the complete graph shape. That is 
>> also done at other places, e.g., in SuperWord::get_pre_loop_end()
>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/3256d4204291/src/share/vm/opto/superword.cpp#l3076 
>>
>>
>> Here is the webrev for the second solution:
>> http://cr.openjdk.java.net/~zmajo/8148754/webrev.03/
>>
>> The second solution does not regress. I've tested it with:
>> - JPRT;
>> - local testing (linux-86_64) with the failing test case;
>> - executing all hotspot tests locally, all tests pass that pass with 
>> an unmodified build.
>>
>> Can you please let me know which solution you prefer:
>> - (1) the prototype with the regression solved or
>> - (2) checking the graph shape?
>>
>> We could also fix this issue with pushing (2) for now (as this issue 
>> is a "critical" nightly failure). I could then spend more time on (1) 
>> later in a different bug.
>>
>> Thank you and best regards,
>>
>>
>> Zoltan
>>
>>> Thanks,
>>> Vladimir
>>>
>>> On 2/22/16 6:22 AM, Zoltán Majó wrote:
>>>> Hi Vladimir,
>>>>
>>>>
>>>> thank you for the feedback!
>>>>
>>>> On 02/16/2016 01:11 AM, Vladimir Kozlov wrote:
>>>>> Zoltan,
>>>>>
>>>>> It should not be "main" loop if peeling happened. See do_peeling():
>>>>>
>>>>>     if (cl->is_main_loop()) {
>>>>>       cl->set_normal_loop();
>>>>>
>>>>> Split-if optimization should not split through loop's phi. And
>>>>> generally not through loop's head since it is not making code 
>>>>> better -
>>>>> split through backedge moves code into loop again. Making loop body
>>>>> more complicated as this case shows.
>>>>
>>>> I did more investigation to understand what causes the invalid graph
>>>> shape to appear. It seems that the invalid graph shape appears because
>>>> the compiler uses the Compile:: _major_progress inconsistently. 
>>>> Here are
>>>> some details.
>>>>
>>>> - If _major_progress *is set*, the compiler expects more loop
>>>> optimizations to happen. Therefore, certain transformations on the 
>>>> graph
>>>> are not allowed so that the graph is in a shape that can be 
>>>> processed by
>>>> loop optimizations. See:
>>>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/2c3c43037e14/src/share/vm/opto/convertnode.cpp#l253 
>>>>
>>>>
>>>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/2c3c43037e14/src/share/vm/opto/castnode.cpp#l251 
>>>>
>>>>
>>>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/2c3c43037e14/src/share/vm/opto/loopnode.cpp#l950 
>>>>
>>>>
>>>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/2c3c43037e14/src/share/vm/opto/opaquenode.cpp#l37 
>>>>
>>>>
>>>>
>>>> - If _major_progress *is not set*, the compiler is allowed to perform
>>>> all possible transformations (because it does not have to care about
>>>> future loop optimizations).
>>>>
>>>> The crash reported for the current issue appears because 
>>>> _major_progress
>>>> *can be accidentally set again* after the compiler decided to stop
>>>> performing loop optimizations. As a result, invalid graph shapes 
>>>> appear.
>>>>
>>>> Here are details about how this happens for both failures I've been
>>>> studying:
>>>> https://bugs.openjdk.java.net/browse/JDK-8148754?focusedCommentId=13901941&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13901941 
>>>>
>>>>
>>>>
>>>> I would propose to change the compiler to use _major_progress
>>>> consistently. (This goes into the same direction as Tobias's recent 
>>>> work
>>>> on JDK-8144487.)
>>>>
>>>> I propose that _major_progress:
>>>> - can be SET when the compiler is initialized (because loop
>>>> optimizations are expected to happen afterwards);
>>>> - can be SET/RESET in the scope of loop optimizations (because we want
>>>> to see if loop optimizations made progress);
>>>> - cannot be SET/RESET by neither incremental inlining nor IGVN 
>>>> (even if
>>>> the IGVN is performed in the scope of loop optimizations).
>>>>
>>>> Here is the updated webrev:
>>>> http://cr.openjdk.java.net/~zmajo/8148754/webrev.02/
>>>>
>>>> Performance evaluation:
>>>> - The proposed webrev does not cause performance regressions for
>>>> SPECjvm2008, SPECjbb2005, and Octane.
>>>>
>>>> Testing:
>>>> - all hotspot JTREG tests on all supported platforms;
>>>> - JPRT;
>>>> - failing test case.
>>>>
>>>> Thank you and best regards,
>>>>
>>>>
>>>> Zoltan
>>>>
>>>>
>>>>>
>>>>> Bailout unrolling is fine but performance may suffer because in some
>>>>> cases loop unrolling is better then split-if.
>>>>
>>>>
>>>>>
>>>>> Thanks,
>>>>> Vladimir
>>>>>
>>>>> On 2/15/16 7:22 AM, Zoltán Majó wrote:
>>>>>> Hi,
>>>>>>
>>>>>>
>>>>>> please review the patch for 8148754.
>>>>>>
>>>>>> https://bugs.openjdk.java.net/browse/JDK-8148754
>>>>>>
>>>>>> Problem: Compilation fails when the C2 compiler attempts loop 
>>>>>> unrolling.
>>>>>> The cause of the failure is that the loop unrolling optimization 
>>>>>> expects
>>>>>> a well-defined graph shape at the entry control of a 
>>>>>> 'CountedLoopNode'
>>>>>> ('IfTrue'/'IfFalse' preceeded by 'If' preceeded by 'Bool' 
>>>>>> preceeded by
>>>>>> 'CmpI').
>>>>>>
>>>>>>
>>>>>> Solution: I investigated several different instances of the same
>>>>>> failure. It turns out that the shape of the graph at a loop's entry
>>>>>> control is often different from the way loop unrolling expects it 
>>>>>> to be
>>>>>> (please find some examples in the bug's JBS issue). The various 
>>>>>> graph
>>>>>> shapes are a result of previously performed transformations, e.g.,
>>>>>> split-if optimization and loop peeling.
>>>>>>
>>>>>> Loop unrolling requires the above mentioned graph shape so that 
>>>>>> it can
>>>>>> adjust the zero-trip guard of the loop. With the unexpected graph
>>>>>> shapes, it is not possible to perform loop unrolling. However, 
>>>>>> the graph
>>>>>> is still in a valid state (except for loop unrolling) and can be 
>>>>>> used to
>>>>>> produce correct code.
>>>>>>
>>>>>> I propose that (1) we check if an unexpected graph shape is 
>>>>>> encountered
>>>>>> and (2) bail out of loop unrolling if it is (but not fail in the
>>>>>> compiler in such cases).
>>>>>>
>>>>>> The failure was triggered by Aleksey's Indify String Concatenation
>>>>>> changes but the generated bytecodes are valid. So this seems to be a
>>>>>> compiler issue that was previously there but was not yet triggered.
>>>>>>
>>>>>>
>>>>>> Webrev:
>>>>>> http://cr.openjdk.java.net/~zmajo/8148754/webrev.00/
>>>>>>
>>>>>> Testing:
>>>>>> - JPRT;
>>>>>> - local testing (linux-86_64) with the failing test case;
>>>>>> - executed all hotspot tests locally, all tests pass that pass 
>>>>>> with an
>>>>>> unmodified build.
>>>>>>
>>>>>> Thank you!
>>>>>>
>>>>>> Best regards,
>>>>>>
>>>>>>
>>>>>> Zoltan
>>>>>>
>>>>
>>