[9] RFR (S): 8148754: C2 loop unrolling fails due to unexpected graph shape

Thu Mar 17 17:12:45 UTC 2016

On 3/17/16 9:09 AM, Zoltán Majó wrote:
> Hi Vladimir,
>
>
> thank you for the feedback.
>
> On 03/16/2016 09:52 PM, Vladimir Kozlov wrote:
>>> Can you please let me know which solution you prefer:
>>> - (1) the prototype with the regression solved or
>>> - (2) checking the graph shape?
>>
>> I agree that we should do (2) now. One suggestion I have is to prepare a separate method to do these checks and use it in other places which you pointed - superword and do_range_check.
>
> OK, I updated the patch for (2) so that the check of the graph's shape is performed in a separate method. Here is the webrev:
> http://cr.openjdk.java.net/~zmajo/8148754/webrev.04/

Zoltan, I don't see changes in do_range_check().
Opcode() is virtual function. Use is_*() query methods was originally in SuperWord::get_pre_loop_end().
I don't like is_adjustable_loop_entry() name, especially since you negate it in checks. Consider is_canonical_main_loop_entry().
Also move assert(cl->is_main_loop(), "") into it.

Thanks,
Vladimir

>
> I've tested the updated webrev with JPRT and also by executing the failing test. Both pass. I will soon start RBT testing as well (will let you know if failures have appeared).
>
>> Yes, you can work later on (1) solution if you have a bug and not RFE - we should stabilize C2 as you know and these changes may have some side effects we don't know about yet. But I like it because
>> we can explicitly specify which optimizations are allowed.
>
> OK. I filed 8152110: "Stabilize C2 loop optimizations" and will continue work in the scope of that bug:
> https://bugs.openjdk.java.net/browse/JDK-8152110
>>
>>> The two I'm concerned about are
>>> - Compile::cleanup_loop_predicates()
>>
>> Yes, this one should be marked LOOP_OPTS_LIMITED.
>>
>>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/3256d4204291/src/share/vm/opto/compile.cpp#l1907
>>> - IdealLoopTree::remove_main_post_loops()
>>
>> This one is fine because the loop goes away.
>
> I'll take of these once I continue work on 8152110.
>
> Thank you!
>
> Best regards,
>
>
> Zoltan
>
>>
>> Thanks,
>> Vladimir
>>
>> On 3/16/16 10:59 AM, Zoltán Majó wrote:
>>> Hi Vladimir,
>>>
>>>
>>> I've spent more time on this issue. Please find my findings below.
>>>
>>> On 02/25/2016 03:53 AM, Vladimir Kozlov wrote:
>>>> So it is again _major_progress problem.
>>>> I have to spend more time on this. It is not simple.
>>>>
>>>> We may add an other state when Ideal transformation could be executed. For example, after all loop opts:
>>>>
>>>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/0fc557e05fc0/src/share/vm/opto/compile.cpp#l2286
>>>>
>>>> Or more states to specify which Ideal transformations and loop optimizations could be executed in which state.
>>>
>>> I think adding more states is necessary, adding a single state is not sufficient as... (see below)
>>>
>>>>
>>>> The main problem from your description is elimination of Opaque1 on which loop optimizations relies.
>>>>
>>>> We can simply remove Opaque1Node::Identity(PhaseGVN* phase) because PhaseMacroExpand::expand_macro_nodes() will remove them after all loop opts.
>>>
>>> ...there are even more places where Opaque1 nodes are removed, than we've initially assumed.
>>>
>>> The two I'm concerned about are
>>> - Compile::cleanup_loop_predicates()
>>
>> Yes, this one should be marked LOOP_OPTS_LIMITED.
>>
>>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/3256d4204291/src/share/vm/opto/compile.cpp#l1907
>>> - IdealLoopTree::remove_main_post_loops()
>>
>> This one is fine because the loop goes away.
>>
>>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/3256d4204291/src/share/vm/opto/loopTransform.cpp#l2469
>>>
>>>> On other hand we may do want to execute some simple loop optimizations even after Opaque, CastII and CastI2L are optimized out. For example, removing empty loops or one iteration loops (pre-loops).
>>>> But definitely not ones which use cloning or other aggressive optimizations.
>>>
>>> Yes, I agree that we want to execute some loop optimizations even afterwards.
>>>
>>> So I've added three states: LOOP_OPTS_FULL, LOOP_OPTS_LIMITED, and LOOP_OPTS_INHIBITED. These states indicate which loop optimizations are allowed, Major_progress indicates only if loop optimizations
>>> have made progress (but not if loop optimizations are expected to be performed).
>>>
>>>>
>>>> Inline_Warm() is not used since InlineWarmCalls for very long time. The code could be very rotten by now. So removing set_major_progress from it is fine.
>>>
>>> OK.
>>>
>>>>
>>>> It is also fine to remove it from inline_incrementally since it will be restored by skip_loop_opts code (and cleared if method is empty or set if there are expensive nodes).
>>>
>>> OK.
>>>
>>>>
>>>> LoopNode::Ideal() change seems also fine. LoopNode is created only in loop opts (RootNode has own Ideal()) so if it has TOP input it will be removed by RegionNode::Ideal most likely.
>>>
>>> OK.
>>>
>>>>
>>>> Which leaves remove_useless_bool() code only and I have concern about it. It could happened after CCP phase and we may want to execute loop opts after it. I am actually want to set major progress
>>>> after CCP unconditionally since some If nodes could be folded by it.
>>>
>>> Yes, that makes sense and I did it.
>>>
>>>>
>>>> As you can see it is not simple :(
>>>
>>> No, it's not simple at all. I did a prototype that implements all we discussed above. Here is the code:
>>> http://cr.openjdk.java.net/~zmajo/code/8148754/webrev/
>>>
>>> The code is not yet RFR quality, but I've sent it out because I'd like to have your feedback on how to continue.
>>>
>>> The code fixes the current problem with the unexpected graph shape. But it is likely to also solve similar problems that are triggered also by an unexpected graph shape, for example any of the asserts
>>> in PhaseIdealLoop::do_range_check:
>>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/3256d4204291/src/share/vm/opto/loopTransform.cpp#l2124
>>>
>>> I evaluated performance of the prototype. Peformance improves in a number of cases by 1-4%:
>>> Octane-Mandreel
>>> Octane-Richards
>>> Octane-Splay
>>>
>>> Unfortunately, there is also a performance regression with SPECjvm2008-MonteCarlo-G1 (3-5%). Finding the cause of that regression is likely to take a at least a week, but most likely even more.
>>>
>>> So my question is: Should I spend more time on this prototype and fix the performance regression?
>>>
>>> A different solution would be check the complete graph shape. That is also done at other places, e.g., in SuperWord::get_pre_loop_end()
>>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/3256d4204291/src/share/vm/opto/superword.cpp#l3076
>>>
>>> Here is the webrev for the second solution:
>>> http://cr.openjdk.java.net/~zmajo/8148754/webrev.03/
>>>
>>> The second solution does not regress. I've tested it with:
>>> - JPRT;
>>> - local testing (linux-86_64) with the failing test case;
>>> - executing all hotspot tests locally, all tests pass that pass with an unmodified build.
>>>
>>> Can you please let me know which solution you prefer:
>>> - (1) the prototype with the regression solved or
>>> - (2) checking the graph shape?
>>>
>>> We could also fix this issue with pushing (2) for now (as this issue is a "critical" nightly failure). I could then spend more time on (1) later in a different bug.
>>>
>>> Thank you and best regards,
>>>
>>>
>>> Zoltan
>>>
>>>> Thanks,
>>>> Vladimir
>>>>
>>>> On 2/22/16 6:22 AM, Zoltán Majó wrote:
>>>>> Hi Vladimir,
>>>>>
>>>>>
>>>>> thank you for the feedback!
>>>>>
>>>>> On 02/16/2016 01:11 AM, Vladimir Kozlov wrote:
>>>>>> Zoltan,
>>>>>>
>>>>>> It should not be "main" loop if peeling happened. See do_peeling():
>>>>>>
>>>>>>     if (cl->is_main_loop()) {
>>>>>>       cl->set_normal_loop();
>>>>>>
>>>>>> Split-if optimization should not split through loop's phi. And
>>>>>> generally not through loop's head since it is not making code better -
>>>>>> split through backedge moves code into loop again. Making loop body
>>>>>> more complicated as this case shows.
>>>>>
>>>>> I did more investigation to understand what causes the invalid graph
>>>>> shape to appear. It seems that the invalid graph shape appears because
>>>>> the compiler uses the Compile:: _major_progress inconsistently. Here are
>>>>> some details.
>>>>>
>>>>> - If _major_progress *is set*, the compiler expects more loop
>>>>> optimizations to happen. Therefore, certain transformations on the graph
>>>>> are not allowed so that the graph is in a shape that can be processed by
>>>>> loop optimizations. See:
>>>>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/2c3c43037e14/src/share/vm/opto/convertnode.cpp#l253
>>>>>
>>>>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/2c3c43037e14/src/share/vm/opto/castnode.cpp#l251
>>>>>
>>>>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/2c3c43037e14/src/share/vm/opto/loopnode.cpp#l950
>>>>>
>>>>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/2c3c43037e14/src/share/vm/opto/opaquenode.cpp#l37
>>>>>
>>>>>
>>>>> - If _major_progress *is not set*, the compiler is allowed to perform
>>>>> all possible transformations (because it does not have to care about
>>>>> future loop optimizations).
>>>>>
>>>>> The crash reported for the current issue appears because _major_progress
>>>>> *can be accidentally set again* after the compiler decided to stop
>>>>> performing loop optimizations. As a result, invalid graph shapes appear.
>>>>>
>>>>> Here are details about how this happens for both failures I've been
>>>>> studying:
>>>>> https://bugs.openjdk.java.net/browse/JDK-8148754?focusedCommentId=13901941&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13901941
>>>>>
>>>>>
>>>>> I would propose to change the compiler to use _major_progress
>>>>> consistently. (This goes into the same direction as Tobias's recent work
>>>>> on JDK-8144487.)
>>>>>
>>>>> I propose that _major_progress:
>>>>> - can be SET when the compiler is initialized (because loop
>>>>> optimizations are expected to happen afterwards);
>>>>> - can be SET/RESET in the scope of loop optimizations (because we want
>>>>> to see if loop optimizations made progress);
>>>>> - cannot be SET/RESET by neither incremental inlining nor IGVN (even if
>>>>> the IGVN is performed in the scope of loop optimizations).
>>>>>
>>>>> Here is the updated webrev:
>>>>> http://cr.openjdk.java.net/~zmajo/8148754/webrev.02/
>>>>>
>>>>> Performance evaluation:
>>>>> - The proposed webrev does not cause performance regressions for
>>>>> SPECjvm2008, SPECjbb2005, and Octane.
>>>>>
>>>>> Testing:
>>>>> - all hotspot JTREG tests on all supported platforms;
>>>>> - JPRT;
>>>>> - failing test case.
>>>>>
>>>>> Thank you and best regards,
>>>>>
>>>>>
>>>>> Zoltan
>>>>>
>>>>>
>>>>>>
>>>>>> Bailout unrolling is fine but performance may suffer because in some
>>>>>> cases loop unrolling is better then split-if.
>>>>>
>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Vladimir
>>>>>>
>>>>>> On 2/15/16 7:22 AM, Zoltán Majó wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>>
>>>>>>> please review the patch for 8148754.
>>>>>>>
>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8148754
>>>>>>>
>>>>>>> Problem: Compilation fails when the C2 compiler attempts loop unrolling.
>>>>>>> The cause of the failure is that the loop unrolling optimization expects
>>>>>>> a well-defined graph shape at the entry control of a 'CountedLoopNode'
>>>>>>> ('IfTrue'/'IfFalse' preceeded by 'If' preceeded by 'Bool' preceeded by
>>>>>>> 'CmpI').
>>>>>>>
>>>>>>>
>>>>>>> Solution: I investigated several different instances of the same
>>>>>>> failure. It turns out that the shape of the graph at a loop's entry
>>>>>>> control is often different from the way loop unrolling expects it to be
>>>>>>> (please find some examples in the bug's JBS issue). The various graph
>>>>>>> shapes are a result of previously performed transformations, e.g.,
>>>>>>> split-if optimization and loop peeling.
>>>>>>>
>>>>>>> Loop unrolling requires the above mentioned graph shape so that it can
>>>>>>> adjust the zero-trip guard of the loop. With the unexpected graph
>>>>>>> shapes, it is not possible to perform loop unrolling. However, the graph
>>>>>>> is still in a valid state (except for loop unrolling) and can be used to
>>>>>>> produce correct code.
>>>>>>>
>>>>>>> I propose that (1) we check if an unexpected graph shape is encountered
>>>>>>> and (2) bail out of loop unrolling if it is (but not fail in the
>>>>>>> compiler in such cases).
>>>>>>>
>>>>>>> The failure was triggered by Aleksey's Indify String Concatenation
>>>>>>> changes but the generated bytecodes are valid. So this seems to be a
>>>>>>> compiler issue that was previously there but was not yet triggered.
>>>>>>>
>>>>>>>
>>>>>>> Webrev:
>>>>>>> http://cr.openjdk.java.net/~zmajo/8148754/webrev.00/
>>>>>>>
>>>>>>> Testing:
>>>>>>> - JPRT;
>>>>>>> - local testing (linux-86_64) with the failing test case;
>>>>>>> - executed all hotspot tests locally, all tests pass that pass with an
>>>>>>> unmodified build.
>>>>>>>
>>>>>>> Thank you!
>>>>>>>
>>>>>>> Best regards,
>>>>>>>
>>>>>>>
>>>>>>> Zoltan
>>>>>>>
>>>>>
>>>
>