[9] RFR (S): 8148754: C2 loop unrolling fails due to unexpected graph shape

Fri Mar 18 17:14:21 UTC 2016

Very nice!

Thanks,
Vladimir

On 3/18/16 10:11 AM, Zoltán Majó wrote:
> Hi Vladimir,
>
>
> On 03/17/2016 06:12 PM, Vladimir Kozlov wrote:
>> On 3/17/16 9:09 AM, Zoltán Majó wrote:
>>> Hi Vladimir,
>>>
>>>
>>> thank you for the feedback.
>>>
>>> On 03/16/2016 09:52 PM, Vladimir Kozlov wrote:
>>>>> Can you please let me know which solution you prefer:
>>>>> - (1) the prototype with the regression solved or
>>>>> - (2) checking the graph shape?
>>>>
>>>> I agree that we should do (2) now. One suggestion I have is to prepare a separate method to do these checks and use
>>>> it in other places which you pointed - superword and do_range_check.
>>>
>>> OK, I updated the patch for (2) so that the check of the graph's shape is performed in a separate method. Here is the
>>> webrev:
>>> http://cr.openjdk.java.net/~zmajo/8148754/webrev.04/
>>
>> Zoltan, I don't see changes in do_range_check().
>
> I'm sorry, I forgot to add the check to do_range_check().
>
>> Opcode() is virtual function. Use is_*() query methods was originally in SuperWord::get_pre_loop_end().
>
> OK, I changed the checks to use is_* query methods instead of Opcode().
>
>> I don't like is_adjustable_loop_entry() name, especially since you negate it in checks. Consider
>> is_canonical_main_loop_entry().
>
> OK, I've updated the name.
>
>> Also move assert(cl->is_main_loop(), "") into it.
>
> Done.
>
> Here is the updated webrev:
> http://cr.openjdk.java.net/~zmajo/8148754/webrev.05/
>
> I've tested with JPRT and with locally executing the failing test. Both pass. I've started RBT testing.
>
> Thank you and best regards,
>
>
> Zoltan
>
>>
>> Thanks,
>> Vladimir
>>
>>>
>>> I've tested the updated webrev with JPRT and also by executing the failing test. Both pass. I will soon start RBT
>>> testing as well (will let you know if failures have appeared).
>>>
>>>> Yes, you can work later on (1) solution if you have a bug and not RFE - we should stabilize C2 as you know and these
>>>> changes may have some side effects we don't know about yet. But I like it because
>>>> we can explicitly specify which optimizations are allowed.
>>>
>>> OK. I filed 8152110: "Stabilize C2 loop optimizations" and will continue work in the scope of that bug:
>>> https://bugs.openjdk.java.net/browse/JDK-8152110
>>>>
>>>>> The two I'm concerned about are
>>>>> - Compile::cleanup_loop_predicates()
>>>>
>>>> Yes, this one should be marked LOOP_OPTS_LIMITED.
>>>>
>>>>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/3256d4204291/src/share/vm/opto/compile.cpp#l1907
>>>>> - IdealLoopTree::remove_main_post_loops()
>>>>
>>>> This one is fine because the loop goes away.
>>>
>>> I'll take of these once I continue work on 8152110.
>>>
>>> Thank you!
>>>
>>> Best regards,
>>>
>>>
>>> Zoltan
>>>
>>>>
>>>> Thanks,
>>>> Vladimir
>>>>
>>>> On 3/16/16 10:59 AM, Zoltán Majó wrote:
>>>>> Hi Vladimir,
>>>>>
>>>>>
>>>>> I've spent more time on this issue. Please find my findings below.
>>>>>
>>>>> On 02/25/2016 03:53 AM, Vladimir Kozlov wrote:
>>>>>> So it is again _major_progress problem.
>>>>>> I have to spend more time on this. It is not simple.
>>>>>>
>>>>>> We may add an other state when Ideal transformation could be executed. For example, after all loop opts:
>>>>>>
>>>>>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/0fc557e05fc0/src/share/vm/opto/compile.cpp#l2286
>>>>>>
>>>>>> Or more states to specify which Ideal transformations and loop optimizations could be executed in which state.
>>>>>
>>>>> I think adding more states is necessary, adding a single state is not sufficient as... (see below)
>>>>>
>>>>>>
>>>>>> The main problem from your description is elimination of Opaque1 on which loop optimizations relies.
>>>>>>
>>>>>> We can simply remove Opaque1Node::Identity(PhaseGVN* phase) because PhaseMacroExpand::expand_macro_nodes() will
>>>>>> remove them after all loop opts.
>>>>>
>>>>> ...there are even more places where Opaque1 nodes are removed, than we've initially assumed.
>>>>>
>>>>> The two I'm concerned about are
>>>>> - Compile::cleanup_loop_predicates()
>>>>
>>>> Yes, this one should be marked LOOP_OPTS_LIMITED.
>>>>
>>>>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/3256d4204291/src/share/vm/opto/compile.cpp#l1907
>>>>> - IdealLoopTree::remove_main_post_loops()
>>>>
>>>> This one is fine because the loop goes away.
>>>>
>>>>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/3256d4204291/src/share/vm/opto/loopTransform.cpp#l2469
>>>>>
>>>>>> On other hand we may do want to execute some simple loop optimizations even after Opaque, CastII and CastI2L are
>>>>>> optimized out. For example, removing empty loops or one iteration loops (pre-loops).
>>>>>> But definitely not ones which use cloning or other aggressive optimizations.
>>>>>
>>>>> Yes, I agree that we want to execute some loop optimizations even afterwards.
>>>>>
>>>>> So I've added three states: LOOP_OPTS_FULL, LOOP_OPTS_LIMITED, and LOOP_OPTS_INHIBITED. These states indicate which
>>>>> loop optimizations are allowed, Major_progress indicates only if loop optimizations
>>>>> have made progress (but not if loop optimizations are expected to be performed).
>>>>>
>>>>>>
>>>>>> Inline_Warm() is not used since InlineWarmCalls for very long time. The code could be very rotten by now. So
>>>>>> removing set_major_progress from it is fine.
>>>>>
>>>>> OK.
>>>>>
>>>>>>
>>>>>> It is also fine to remove it from inline_incrementally since it will be restored by skip_loop_opts code (and
>>>>>> cleared if method is empty or set if there are expensive nodes).
>>>>>
>>>>> OK.
>>>>>
>>>>>>
>>>>>> LoopNode::Ideal() change seems also fine. LoopNode is created only in loop opts (RootNode has own Ideal()) so if
>>>>>> it has TOP input it will be removed by RegionNode::Ideal most likely.
>>>>>
>>>>> OK.
>>>>>
>>>>>>
>>>>>> Which leaves remove_useless_bool() code only and I have concern about it. It could happened after CCP phase and we
>>>>>> may want to execute loop opts after it. I am actually want to set major progress
>>>>>> after CCP unconditionally since some If nodes could be folded by it.
>>>>>
>>>>> Yes, that makes sense and I did it.
>>>>>
>>>>>>
>>>>>> As you can see it is not simple :(
>>>>>
>>>>> No, it's not simple at all. I did a prototype that implements all we discussed above. Here is the code:
>>>>> http://cr.openjdk.java.net/~zmajo/code/8148754/webrev/
>>>>>
>>>>> The code is not yet RFR quality, but I've sent it out because I'd like to have your feedback on how to continue.
>>>>>
>>>>> The code fixes the current problem with the unexpected graph shape. But it is likely to also solve similar problems
>>>>> that are triggered also by an unexpected graph shape, for example any of the asserts
>>>>> in PhaseIdealLoop::do_range_check:
>>>>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/3256d4204291/src/share/vm/opto/loopTransform.cpp#l2124
>>>>>
>>>>> I evaluated performance of the prototype. Peformance improves in a number of cases by 1-4%:
>>>>> Octane-Mandreel
>>>>> Octane-Richards
>>>>> Octane-Splay
>>>>>
>>>>> Unfortunately, there is also a performance regression with SPECjvm2008-MonteCarlo-G1 (3-5%). Finding the cause of
>>>>> that regression is likely to take a at least a week, but most likely even more.
>>>>>
>>>>> So my question is: Should I spend more time on this prototype and fix the performance regression?
>>>>>
>>>>> A different solution would be check the complete graph shape. That is also done at other places, e.g., in
>>>>> SuperWord::get_pre_loop_end()
>>>>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/3256d4204291/src/share/vm/opto/superword.cpp#l3076
>>>>>
>>>>> Here is the webrev for the second solution:
>>>>> http://cr.openjdk.java.net/~zmajo/8148754/webrev.03/
>>>>>
>>>>> The second solution does not regress. I've tested it with:
>>>>> - JPRT;
>>>>> - local testing (linux-86_64) with the failing test case;
>>>>> - executing all hotspot tests locally, all tests pass that pass with an unmodified build.
>>>>>
>>>>> Can you please let me know which solution you prefer:
>>>>> - (1) the prototype with the regression solved or
>>>>> - (2) checking the graph shape?
>>>>>
>>>>> We could also fix this issue with pushing (2) for now (as this issue is a "critical" nightly failure). I could then
>>>>> spend more time on (1) later in a different bug.
>>>>>
>>>>> Thank you and best regards,
>>>>>
>>>>>
>>>>> Zoltan
>>>>>
>>>>>> Thanks,
>>>>>> Vladimir
>>>>>>
>>>>>> On 2/22/16 6:22 AM, Zoltán Majó wrote:
>>>>>>> Hi Vladimir,
>>>>>>>
>>>>>>>
>>>>>>> thank you for the feedback!
>>>>>>>
>>>>>>> On 02/16/2016 01:11 AM, Vladimir Kozlov wrote:
>>>>>>>> Zoltan,
>>>>>>>>
>>>>>>>> It should not be "main" loop if peeling happened. See do_peeling():
>>>>>>>>
>>>>>>>>     if (cl->is_main_loop()) {
>>>>>>>>       cl->set_normal_loop();
>>>>>>>>
>>>>>>>> Split-if optimization should not split through loop's phi. And
>>>>>>>> generally not through loop's head since it is not making code better -
>>>>>>>> split through backedge moves code into loop again. Making loop body
>>>>>>>> more complicated as this case shows.
>>>>>>>
>>>>>>> I did more investigation to understand what causes the invalid graph
>>>>>>> shape to appear. It seems that the invalid graph shape appears because
>>>>>>> the compiler uses the Compile:: _major_progress inconsistently. Here are
>>>>>>> some details.
>>>>>>>
>>>>>>> - If _major_progress *is set*, the compiler expects more loop
>>>>>>> optimizations to happen. Therefore, certain transformations on the graph
>>>>>>> are not allowed so that the graph is in a shape that can be processed by
>>>>>>> loop optimizations. See:
>>>>>>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/2c3c43037e14/src/share/vm/opto/convertnode.cpp#l253
>>>>>>>
>>>>>>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/2c3c43037e14/src/share/vm/opto/castnode.cpp#l251
>>>>>>>
>>>>>>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/2c3c43037e14/src/share/vm/opto/loopnode.cpp#l950
>>>>>>>
>>>>>>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/2c3c43037e14/src/share/vm/opto/opaquenode.cpp#l37
>>>>>>>
>>>>>>>
>>>>>>> - If _major_progress *is not set*, the compiler is allowed to perform
>>>>>>> all possible transformations (because it does not have to care about
>>>>>>> future loop optimizations).
>>>>>>>
>>>>>>> The crash reported for the current issue appears because _major_progress
>>>>>>> *can be accidentally set again* after the compiler decided to stop
>>>>>>> performing loop optimizations. As a result, invalid graph shapes appear.
>>>>>>>
>>>>>>> Here are details about how this happens for both failures I've been
>>>>>>> studying:
>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8148754?focusedCommentId=13901941&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13901941
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> I would propose to change the compiler to use _major_progress
>>>>>>> consistently. (This goes into the same direction as Tobias's recent work
>>>>>>> on JDK-8144487.)
>>>>>>>
>>>>>>> I propose that _major_progress:
>>>>>>> - can be SET when the compiler is initialized (because loop
>>>>>>> optimizations are expected to happen afterwards);
>>>>>>> - can be SET/RESET in the scope of loop optimizations (because we want
>>>>>>> to see if loop optimizations made progress);
>>>>>>> - cannot be SET/RESET by neither incremental inlining nor IGVN (even if
>>>>>>> the IGVN is performed in the scope of loop optimizations).
>>>>>>>
>>>>>>> Here is the updated webrev:
>>>>>>> http://cr.openjdk.java.net/~zmajo/8148754/webrev.02/
>>>>>>>
>>>>>>> Performance evaluation:
>>>>>>> - The proposed webrev does not cause performance regressions for
>>>>>>> SPECjvm2008, SPECjbb2005, and Octane.
>>>>>>>
>>>>>>> Testing:
>>>>>>> - all hotspot JTREG tests on all supported platforms;
>>>>>>> - JPRT;
>>>>>>> - failing test case.
>>>>>>>
>>>>>>> Thank you and best regards,
>>>>>>>
>>>>>>>
>>>>>>> Zoltan
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Bailout unrolling is fine but performance may suffer because in some
>>>>>>>> cases loop unrolling is better then split-if.
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Vladimir
>>>>>>>>
>>>>>>>> On 2/15/16 7:22 AM, Zoltán Majó wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> please review the patch for 8148754.
>>>>>>>>>
>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8148754
>>>>>>>>>
>>>>>>>>> Problem: Compilation fails when the C2 compiler attempts loop unrolling.
>>>>>>>>> The cause of the failure is that the loop unrolling optimization expects
>>>>>>>>> a well-defined graph shape at the entry control of a 'CountedLoopNode'
>>>>>>>>> ('IfTrue'/'IfFalse' preceeded by 'If' preceeded by 'Bool' preceeded by
>>>>>>>>> 'CmpI').
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Solution: I investigated several different instances of the same
>>>>>>>>> failure. It turns out that the shape of the graph at a loop's entry
>>>>>>>>> control is often different from the way loop unrolling expects it to be
>>>>>>>>> (please find some examples in the bug's JBS issue). The various graph
>>>>>>>>> shapes are a result of previously performed transformations, e.g.,
>>>>>>>>> split-if optimization and loop peeling.
>>>>>>>>>
>>>>>>>>> Loop unrolling requires the above mentioned graph shape so that it can
>>>>>>>>> adjust the zero-trip guard of the loop. With the unexpected graph
>>>>>>>>> shapes, it is not possible to perform loop unrolling. However, the graph
>>>>>>>>> is still in a valid state (except for loop unrolling) and can be used to
>>>>>>>>> produce correct code.
>>>>>>>>>
>>>>>>>>> I propose that (1) we check if an unexpected graph shape is encountered
>>>>>>>>> and (2) bail out of loop unrolling if it is (but not fail in the
>>>>>>>>> compiler in such cases).
>>>>>>>>>
>>>>>>>>> The failure was triggered by Aleksey's Indify String Concatenation
>>>>>>>>> changes but the generated bytecodes are valid. So this seems to be a
>>>>>>>>> compiler issue that was previously there but was not yet triggered.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Webrev:
>>>>>>>>> http://cr.openjdk.java.net/~zmajo/8148754/webrev.00/
>>>>>>>>>
>>>>>>>>> Testing:
>>>>>>>>> - JPRT;
>>>>>>>>> - local testing (linux-86_64) with the failing test case;
>>>>>>>>> - executed all hotspot tests locally, all tests pass that pass with an
>>>>>>>>> unmodified build.
>>>>>>>>>
>>>>>>>>> Thank you!
>>>>>>>>>
>>>>>>>>> Best regards,
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Zoltan
>>>>>>>>>
>>>>>>>
>>>>>
>>>
>