[9] RFR (S): 8148754: C2 loop unrolling fails due to unexpected graph shape

Fri Mar 18 17:11:34 UTC 2016

Hi Vladimir,

On 03/17/2016 06:12 PM, Vladimir Kozlov wrote:
> On 3/17/16 9:09 AM, Zoltán Majó wrote:
>> Hi Vladimir,
>>
>>
>> thank you for the feedback.
>>
>> On 03/16/2016 09:52 PM, Vladimir Kozlov wrote:
>>>> Can you please let me know which solution you prefer:
>>>> - (1) the prototype with the regression solved or
>>>> - (2) checking the graph shape?
>>>
>>> I agree that we should do (2) now. One suggestion I have is to 
>>> prepare a separate method to do these checks and use it in other 
>>> places which you pointed - superword and do_range_check.
>>
>> OK, I updated the patch for (2) so that the check of the graph's 
>> shape is performed in a separate method. Here is the webrev:
>> http://cr.openjdk.java.net/~zmajo/8148754/webrev.04/
>
> Zoltan, I don't see changes in do_range_check().

I'm sorry, I forgot to add the check to do_range_check().

> Opcode() is virtual function. Use is_*() query methods was originally 
> in SuperWord::get_pre_loop_end().

OK, I changed the checks to use is_* query methods instead of Opcode().

> I don't like is_adjustable_loop_entry() name, especially since you 
> negate it in checks. Consider is_canonical_main_loop_entry().

OK, I've updated the name.

> Also move assert(cl->is_main_loop(), "") into it.

Done.

Here is the updated webrev:
http://cr.openjdk.java.net/~zmajo/8148754/webrev.05/

I've tested with JPRT and with locally executing the failing test. Both 
pass. I've started RBT testing.

Thank you and best regards,

Zoltan

>
> Thanks,
> Vladimir
>
>>
>> I've tested the updated webrev with JPRT and also by executing the 
>> failing test. Both pass. I will soon start RBT testing as well (will 
>> let you know if failures have appeared).
>>
>>> Yes, you can work later on (1) solution if you have a bug and not 
>>> RFE - we should stabilize C2 as you know and these changes may have 
>>> some side effects we don't know about yet. But I like it because
>>> we can explicitly specify which optimizations are allowed.
>>
>> OK. I filed 8152110: "Stabilize C2 loop optimizations" and will 
>> continue work in the scope of that bug:
>> https://bugs.openjdk.java.net/browse/JDK-8152110
>>>
>>>> The two I'm concerned about are
>>>> - Compile::cleanup_loop_predicates()
>>>
>>> Yes, this one should be marked LOOP_OPTS_LIMITED.
>>>
>>>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/3256d4204291/src/share/vm/opto/compile.cpp#l1907 
>>>>
>>>> - IdealLoopTree::remove_main_post_loops()
>>>
>>> This one is fine because the loop goes away.
>>
>> I'll take of these once I continue work on 8152110.
>>
>> Thank you!
>>
>> Best regards,
>>
>>
>> Zoltan
>>
>>>
>>> Thanks,
>>> Vladimir
>>>
>>> On 3/16/16 10:59 AM, Zoltán Majó wrote:
>>>> Hi Vladimir,
>>>>
>>>>
>>>> I've spent more time on this issue. Please find my findings below.
>>>>
>>>> On 02/25/2016 03:53 AM, Vladimir Kozlov wrote:
>>>>> So it is again _major_progress problem.
>>>>> I have to spend more time on this. It is not simple.
>>>>>
>>>>> We may add an other state when Ideal transformation could be 
>>>>> executed. For example, after all loop opts:
>>>>>
>>>>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/0fc557e05fc0/src/share/vm/opto/compile.cpp#l2286 
>>>>>
>>>>>
>>>>> Or more states to specify which Ideal transformations and loop 
>>>>> optimizations could be executed in which state.
>>>>
>>>> I think adding more states is necessary, adding a single state is 
>>>> not sufficient as... (see below)
>>>>
>>>>>
>>>>> The main problem from your description is elimination of Opaque1 
>>>>> on which loop optimizations relies.
>>>>>
>>>>> We can simply remove Opaque1Node::Identity(PhaseGVN* phase) 
>>>>> because PhaseMacroExpand::expand_macro_nodes() will remove them 
>>>>> after all loop opts.
>>>>
>>>> ...there are even more places where Opaque1 nodes are removed, than 
>>>> we've initially assumed.
>>>>
>>>> The two I'm concerned about are
>>>> - Compile::cleanup_loop_predicates()
>>>
>>> Yes, this one should be marked LOOP_OPTS_LIMITED.
>>>
>>>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/3256d4204291/src/share/vm/opto/compile.cpp#l1907 
>>>>
>>>> - IdealLoopTree::remove_main_post_loops()
>>>
>>> This one is fine because the loop goes away.
>>>
>>>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/3256d4204291/src/share/vm/opto/loopTransform.cpp#l2469 
>>>>
>>>>
>>>>> On other hand we may do want to execute some simple loop 
>>>>> optimizations even after Opaque, CastII and CastI2L are optimized 
>>>>> out. For example, removing empty loops or one iteration loops 
>>>>> (pre-loops).
>>>>> But definitely not ones which use cloning or other aggressive 
>>>>> optimizations.
>>>>
>>>> Yes, I agree that we want to execute some loop optimizations even 
>>>> afterwards.
>>>>
>>>> So I've added three states: LOOP_OPTS_FULL, LOOP_OPTS_LIMITED, and 
>>>> LOOP_OPTS_INHIBITED. These states indicate which loop optimizations 
>>>> are allowed, Major_progress indicates only if loop optimizations
>>>> have made progress (but not if loop optimizations are expected to 
>>>> be performed).
>>>>
>>>>>
>>>>> Inline_Warm() is not used since InlineWarmCalls for very long 
>>>>> time. The code could be very rotten by now. So removing 
>>>>> set_major_progress from it is fine.
>>>>
>>>> OK.
>>>>
>>>>>
>>>>> It is also fine to remove it from inline_incrementally since it 
>>>>> will be restored by skip_loop_opts code (and cleared if method is 
>>>>> empty or set if there are expensive nodes).
>>>>
>>>> OK.
>>>>
>>>>>
>>>>> LoopNode::Ideal() change seems also fine. LoopNode is created only 
>>>>> in loop opts (RootNode has own Ideal()) so if it has TOP input it 
>>>>> will be removed by RegionNode::Ideal most likely.
>>>>
>>>> OK.
>>>>
>>>>>
>>>>> Which leaves remove_useless_bool() code only and I have concern 
>>>>> about it. It could happened after CCP phase and we may want to 
>>>>> execute loop opts after it. I am actually want to set major progress
>>>>> after CCP unconditionally since some If nodes could be folded by it.
>>>>
>>>> Yes, that makes sense and I did it.
>>>>
>>>>>
>>>>> As you can see it is not simple :(
>>>>
>>>> No, it's not simple at all. I did a prototype that implements all 
>>>> we discussed above. Here is the code:
>>>> http://cr.openjdk.java.net/~zmajo/code/8148754/webrev/
>>>>
>>>> The code is not yet RFR quality, but I've sent it out because I'd 
>>>> like to have your feedback on how to continue.
>>>>
>>>> The code fixes the current problem with the unexpected graph shape. 
>>>> But it is likely to also solve similar problems that are triggered 
>>>> also by an unexpected graph shape, for example any of the asserts
>>>> in PhaseIdealLoop::do_range_check:
>>>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/3256d4204291/src/share/vm/opto/loopTransform.cpp#l2124 
>>>>
>>>>
>>>> I evaluated performance of the prototype. Peformance improves in a 
>>>> number of cases by 1-4%:
>>>> Octane-Mandreel
>>>> Octane-Richards
>>>> Octane-Splay
>>>>
>>>> Unfortunately, there is also a performance regression with 
>>>> SPECjvm2008-MonteCarlo-G1 (3-5%). Finding the cause of that 
>>>> regression is likely to take a at least a week, but most likely 
>>>> even more.
>>>>
>>>> So my question is: Should I spend more time on this prototype and 
>>>> fix the performance regression?
>>>>
>>>> A different solution would be check the complete graph shape. That 
>>>> is also done at other places, e.g., in SuperWord::get_pre_loop_end()
>>>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/3256d4204291/src/share/vm/opto/superword.cpp#l3076 
>>>>
>>>>
>>>> Here is the webrev for the second solution:
>>>> http://cr.openjdk.java.net/~zmajo/8148754/webrev.03/
>>>>
>>>> The second solution does not regress. I've tested it with:
>>>> - JPRT;
>>>> - local testing (linux-86_64) with the failing test case;
>>>> - executing all hotspot tests locally, all tests pass that pass 
>>>> with an unmodified build.
>>>>
>>>> Can you please let me know which solution you prefer:
>>>> - (1) the prototype with the regression solved or
>>>> - (2) checking the graph shape?
>>>>
>>>> We could also fix this issue with pushing (2) for now (as this 
>>>> issue is a "critical" nightly failure). I could then spend more 
>>>> time on (1) later in a different bug.
>>>>
>>>> Thank you and best regards,
>>>>
>>>>
>>>> Zoltan
>>>>
>>>>> Thanks,
>>>>> Vladimir
>>>>>
>>>>> On 2/22/16 6:22 AM, Zoltán Majó wrote:
>>>>>> Hi Vladimir,
>>>>>>
>>>>>>
>>>>>> thank you for the feedback!
>>>>>>
>>>>>> On 02/16/2016 01:11 AM, Vladimir Kozlov wrote:
>>>>>>> Zoltan,
>>>>>>>
>>>>>>> It should not be "main" loop if peeling happened. See do_peeling():
>>>>>>>
>>>>>>>     if (cl->is_main_loop()) {
>>>>>>>       cl->set_normal_loop();
>>>>>>>
>>>>>>> Split-if optimization should not split through loop's phi. And
>>>>>>> generally not through loop's head since it is not making code 
>>>>>>> better -
>>>>>>> split through backedge moves code into loop again. Making loop body
>>>>>>> more complicated as this case shows.
>>>>>>
>>>>>> I did more investigation to understand what causes the invalid graph
>>>>>> shape to appear. It seems that the invalid graph shape appears 
>>>>>> because
>>>>>> the compiler uses the Compile:: _major_progress inconsistently. 
>>>>>> Here are
>>>>>> some details.
>>>>>>
>>>>>> - If _major_progress *is set*, the compiler expects more loop
>>>>>> optimizations to happen. Therefore, certain transformations on 
>>>>>> the graph
>>>>>> are not allowed so that the graph is in a shape that can be 
>>>>>> processed by
>>>>>> loop optimizations. See:
>>>>>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/2c3c43037e14/src/share/vm/opto/convertnode.cpp#l253 
>>>>>>
>>>>>>
>>>>>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/2c3c43037e14/src/share/vm/opto/castnode.cpp#l251 
>>>>>>
>>>>>>
>>>>>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/2c3c43037e14/src/share/vm/opto/loopnode.cpp#l950 
>>>>>>
>>>>>>
>>>>>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/2c3c43037e14/src/share/vm/opto/opaquenode.cpp#l37 
>>>>>>
>>>>>>
>>>>>>
>>>>>> - If _major_progress *is not set*, the compiler is allowed to 
>>>>>> perform
>>>>>> all possible transformations (because it does not have to care about
>>>>>> future loop optimizations).
>>>>>>
>>>>>> The crash reported for the current issue appears because 
>>>>>> _major_progress
>>>>>> *can be accidentally set again* after the compiler decided to stop
>>>>>> performing loop optimizations. As a result, invalid graph shapes 
>>>>>> appear.
>>>>>>
>>>>>> Here are details about how this happens for both failures I've been
>>>>>> studying:
>>>>>> https://bugs.openjdk.java.net/browse/JDK-8148754?focusedCommentId=13901941&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13901941 
>>>>>>
>>>>>>
>>>>>>
>>>>>> I would propose to change the compiler to use _major_progress
>>>>>> consistently. (This goes into the same direction as Tobias's 
>>>>>> recent work
>>>>>> on JDK-8144487.)
>>>>>>
>>>>>> I propose that _major_progress:
>>>>>> - can be SET when the compiler is initialized (because loop
>>>>>> optimizations are expected to happen afterwards);
>>>>>> - can be SET/RESET in the scope of loop optimizations (because we 
>>>>>> want
>>>>>> to see if loop optimizations made progress);
>>>>>> - cannot be SET/RESET by neither incremental inlining nor IGVN 
>>>>>> (even if
>>>>>> the IGVN is performed in the scope of loop optimizations).
>>>>>>
>>>>>> Here is the updated webrev:
>>>>>> http://cr.openjdk.java.net/~zmajo/8148754/webrev.02/
>>>>>>
>>>>>> Performance evaluation:
>>>>>> - The proposed webrev does not cause performance regressions for
>>>>>> SPECjvm2008, SPECjbb2005, and Octane.
>>>>>>
>>>>>> Testing:
>>>>>> - all hotspot JTREG tests on all supported platforms;
>>>>>> - JPRT;
>>>>>> - failing test case.
>>>>>>
>>>>>> Thank you and best regards,
>>>>>>
>>>>>>
>>>>>> Zoltan
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Bailout unrolling is fine but performance may suffer because in 
>>>>>>> some
>>>>>>> cases loop unrolling is better then split-if.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Vladimir
>>>>>>>
>>>>>>> On 2/15/16 7:22 AM, Zoltán Majó wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>>
>>>>>>>> please review the patch for 8148754.
>>>>>>>>
>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8148754
>>>>>>>>
>>>>>>>> Problem: Compilation fails when the C2 compiler attempts loop 
>>>>>>>> unrolling.
>>>>>>>> The cause of the failure is that the loop unrolling 
>>>>>>>> optimization expects
>>>>>>>> a well-defined graph shape at the entry control of a 
>>>>>>>> 'CountedLoopNode'
>>>>>>>> ('IfTrue'/'IfFalse' preceeded by 'If' preceeded by 'Bool' 
>>>>>>>> preceeded by
>>>>>>>> 'CmpI').
>>>>>>>>
>>>>>>>>
>>>>>>>> Solution: I investigated several different instances of the same
>>>>>>>> failure. It turns out that the shape of the graph at a loop's 
>>>>>>>> entry
>>>>>>>> control is often different from the way loop unrolling expects 
>>>>>>>> it to be
>>>>>>>> (please find some examples in the bug's JBS issue). The various 
>>>>>>>> graph
>>>>>>>> shapes are a result of previously performed transformations, e.g.,
>>>>>>>> split-if optimization and loop peeling.
>>>>>>>>
>>>>>>>> Loop unrolling requires the above mentioned graph shape so that 
>>>>>>>> it can
>>>>>>>> adjust the zero-trip guard of the loop. With the unexpected graph
>>>>>>>> shapes, it is not possible to perform loop unrolling. However, 
>>>>>>>> the graph
>>>>>>>> is still in a valid state (except for loop unrolling) and can 
>>>>>>>> be used to
>>>>>>>> produce correct code.
>>>>>>>>
>>>>>>>> I propose that (1) we check if an unexpected graph shape is 
>>>>>>>> encountered
>>>>>>>> and (2) bail out of loop unrolling if it is (but not fail in the
>>>>>>>> compiler in such cases).
>>>>>>>>
>>>>>>>> The failure was triggered by Aleksey's Indify String Concatenation
>>>>>>>> changes but the generated bytecodes are valid. So this seems to 
>>>>>>>> be a
>>>>>>>> compiler issue that was previously there but was not yet 
>>>>>>>> triggered.
>>>>>>>>
>>>>>>>>
>>>>>>>> Webrev:
>>>>>>>> http://cr.openjdk.java.net/~zmajo/8148754/webrev.00/
>>>>>>>>
>>>>>>>> Testing:
>>>>>>>> - JPRT;
>>>>>>>> - local testing (linux-86_64) with the failing test case;
>>>>>>>> - executed all hotspot tests locally, all tests pass that pass 
>>>>>>>> with an
>>>>>>>> unmodified build.
>>>>>>>>
>>>>>>>> Thank you!
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>>
>>>>>>>>
>>>>>>>> Zoltan
>>>>>>>>
>>>>>>
>>>>
>>