[9] RFR (S): 8148754: C2 loop unrolling fails due to unexpected graph shape

Mon Mar 21 08:39:03 UTC 2016

Hi Chris,

On 03/18/2016 07:50 PM, Christian Thalinger wrote:
> Wait.  is_canonical_main_loop_entry is a static method:
> +  static bool is_canonical_main_loop_entry(CountedLoopNode* cl);
> but we are calling it as virtual:
> +  if (!_phase->is_canonical_main_loop_entry(cl)) {
> ?

thank you for catching that!

There are two ways the C++ standard allows calling a static method m() 
of class C:
- (1) using C::m() and
- (2) using obj.m() (where obj is an object of type C).

The effect is the same, if obj is non-null. For more details, see 
Section 9.4 of
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3337.pdf

But I agree with you, (1) looks much better.

Here is the updated webrev:
http://cr.openjdk.java.net/~zmajo/8148754/webrev.06/

I'm pushing this today as there are no functional changes relative to 
the previous webrev. RBT result also look good.

Thank you!

Best regards,

Zoltan

>
>> On Mar 18, 2016, at 7:11 AM, Zoltán Majó <zoltan.majo at oracle.com 
>> <mailto:zoltan.majo at oracle.com>> wrote:
>>
>> Hi Vladimir,
>>
>>
>> On 03/17/2016 06:12 PM, Vladimir Kozlov wrote:
>>> On 3/17/16 9:09 AM, Zoltán Majó wrote:
>>>> Hi Vladimir,
>>>>
>>>>
>>>> thank you for the feedback.
>>>>
>>>> On 03/16/2016 09:52 PM, Vladimir Kozlov wrote:
>>>>>> Can you please let me know which solution you prefer:
>>>>>> - (1) the prototype with the regression solved or
>>>>>> - (2) checking the graph shape?
>>>>>
>>>>> I agree that we should do (2) now. One suggestion I have is to 
>>>>> prepare a separate method to do these checks and use it in other 
>>>>> places which you pointed - superword and do_range_check.
>>>>
>>>> OK, I updated the patch for (2) so that the check of the graph's 
>>>> shape is performed in a separate method. Here is the webrev:
>>>> http://cr.openjdk.java.net/~zmajo/8148754/webrev.04/ 
>>>> <http://cr.openjdk.java.net/%7Ezmajo/8148754/webrev.04/>
>>>
>>> Zoltan, I don't see changes in do_range_check().
>>
>> I'm sorry, I forgot to add the check to do_range_check().
>>
>>> Opcode() is virtual function. Use is_*() query methods was 
>>> originally in SuperWord::get_pre_loop_end().
>>
>> OK, I changed the checks to use is_* query methods instead of Opcode().
>>
>>> I don't like is_adjustable_loop_entry() name, especially since you 
>>> negate it in checks. Consider is_canonical_main_loop_entry().
>>
>> OK, I've updated the name.
>>
>>> Also move assert(cl->is_main_loop(), "") into it.
>>
>> Done.
>>
>> Here is the updated webrev:
>> http://cr.openjdk.java.net/~zmajo/8148754/webrev.05/ 
>> <http://cr.openjdk.java.net/%7Ezmajo/8148754/webrev.05/>
>>
>> I've tested with JPRT and with locally executing the failing test. 
>> Both pass. I've started RBT testing.
>>
>> Thank you and best regards,
>>
>>
>> Zoltan
>>
>>>
>>> Thanks,
>>> Vladimir
>>>
>>>>
>>>> I've tested the updated webrev with JPRT and also by executing the 
>>>> failing test. Both pass. I will soon start RBT testing as well 
>>>> (will let you know if failures have appeared).
>>>>
>>>>> Yes, you can work later on (1) solution if you have a bug and not 
>>>>> RFE - we should stabilize C2 as you know and these changes may 
>>>>> have some side effects we don't know about yet. But I like it because
>>>>> we can explicitly specify which optimizations are allowed.
>>>>
>>>> OK. I filed 8152110: "Stabilize C2 loop optimizations" and will 
>>>> continue work in the scope of that bug:
>>>> https://bugs.openjdk.java.net/browse/JDK-8152110
>>>>>
>>>>>> The two I'm concerned about are
>>>>>> - Compile::cleanup_loop_predicates()
>>>>>
>>>>> Yes, this one should be marked LOOP_OPTS_LIMITED.
>>>>>
>>>>>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/3256d4204291/src/share/vm/opto/compile.cpp#l1907
>>>>>> - IdealLoopTree::remove_main_post_loops()
>>>>>
>>>>> This one is fine because the loop goes away.
>>>>
>>>> I'll take of these once I continue work on 8152110.
>>>>
>>>> Thank you!
>>>>
>>>> Best regards,
>>>>
>>>>
>>>> Zoltan
>>>>
>>>>>
>>>>> Thanks,
>>>>> Vladimir
>>>>>
>>>>> On 3/16/16 10:59 AM, Zoltán Majó wrote:
>>>>>> Hi Vladimir,
>>>>>>
>>>>>>
>>>>>> I've spent more time on this issue. Please find my findings below.
>>>>>>
>>>>>> On 02/25/2016 03:53 AM, Vladimir Kozlov wrote:
>>>>>>> So it is again _major_progress problem.
>>>>>>> I have to spend more time on this. It is not simple.
>>>>>>>
>>>>>>> We may add an other state when Ideal transformation could be 
>>>>>>> executed. For example, after all loop opts:
>>>>>>>
>>>>>>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/0fc557e05fc0/src/share/vm/opto/compile.cpp#l2286
>>>>>>>
>>>>>>> Or more states to specify which Ideal transformations and loop 
>>>>>>> optimizations could be executed in which state.
>>>>>>
>>>>>> I think adding more states is necessary, adding a single state is 
>>>>>> not sufficient as... (see below)
>>>>>>
>>>>>>>
>>>>>>> The main problem from your description is elimination of Opaque1 
>>>>>>> on which loop optimizations relies.
>>>>>>>
>>>>>>> We can simply remove Opaque1Node::Identity(PhaseGVN* phase) 
>>>>>>> because PhaseMacroExpand::expand_macro_nodes() will remove them 
>>>>>>> after all loop opts.
>>>>>>
>>>>>> ...there are even more places where Opaque1 nodes are removed, 
>>>>>> than we've initially assumed.
>>>>>>
>>>>>> The two I'm concerned about are
>>>>>> - Compile::cleanup_loop_predicates()
>>>>>
>>>>> Yes, this one should be marked LOOP_OPTS_LIMITED.
>>>>>
>>>>>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/3256d4204291/src/share/vm/opto/compile.cpp#l1907
>>>>>> - IdealLoopTree::remove_main_post_loops()
>>>>>
>>>>> This one is fine because the loop goes away.
>>>>>
>>>>>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/3256d4204291/src/share/vm/opto/loopTransform.cpp#l2469
>>>>>>
>>>>>>> On other hand we may do want to execute some simple loop 
>>>>>>> optimizations even after Opaque, CastII and CastI2L are 
>>>>>>> optimized out. For example, removing empty loops or one 
>>>>>>> iteration loops (pre-loops).
>>>>>>> But definitely not ones which use cloning or other aggressive 
>>>>>>> optimizations.
>>>>>>
>>>>>> Yes, I agree that we want to execute some loop optimizations even 
>>>>>> afterwards.
>>>>>>
>>>>>> So I've added three states: LOOP_OPTS_FULL, LOOP_OPTS_LIMITED, 
>>>>>> and LOOP_OPTS_INHIBITED. These states indicate which loop 
>>>>>> optimizations are allowed, Major_progress indicates only if loop 
>>>>>> optimizations
>>>>>> have made progress (but not if loop optimizations are expected to 
>>>>>> be performed).
>>>>>>
>>>>>>>
>>>>>>> Inline_Warm() is not used since InlineWarmCalls for very long 
>>>>>>> time. The code could be very rotten by now. So removing 
>>>>>>> set_major_progress from it is fine.
>>>>>>
>>>>>> OK.
>>>>>>
>>>>>>>
>>>>>>> It is also fine to remove it from inline_incrementally since it 
>>>>>>> will be restored by skip_loop_opts code (and cleared if method 
>>>>>>> is empty or set if there are expensive nodes).
>>>>>>
>>>>>> OK.
>>>>>>
>>>>>>>
>>>>>>> LoopNode::Ideal() change seems also fine. LoopNode is created 
>>>>>>> only in loop opts (RootNode has own Ideal()) so if it has TOP 
>>>>>>> input it will be removed by RegionNode::Ideal most likely.
>>>>>>
>>>>>> OK.
>>>>>>
>>>>>>>
>>>>>>> Which leaves remove_useless_bool() code only and I have concern 
>>>>>>> about it. It could happened after CCP phase and we may want to 
>>>>>>> execute loop opts after it. I am actually want to set major progress
>>>>>>> after CCP unconditionally since some If nodes could be folded by it.
>>>>>>
>>>>>> Yes, that makes sense and I did it.
>>>>>>
>>>>>>>
>>>>>>> As you can see it is not simple :(
>>>>>>
>>>>>> No, it's not simple at all. I did a prototype that implements all 
>>>>>> we discussed above. Here is the code:
>>>>>> http://cr.openjdk.java.net/~zmajo/code/8148754/webrev/
>>>>>>
>>>>>> The code is not yet RFR quality, but I've sent it out because I'd 
>>>>>> like to have your feedback on how to continue.
>>>>>>
>>>>>> The code fixes the current problem with the unexpected graph 
>>>>>> shape. But it is likely to also solve similar problems that are 
>>>>>> triggered also by an unexpected graph shape, for example any of 
>>>>>> the asserts
>>>>>> in PhaseIdealLoop::do_range_check:
>>>>>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/3256d4204291/src/share/vm/opto/loopTransform.cpp#l2124
>>>>>>
>>>>>> I evaluated performance of the prototype. Peformance improves in 
>>>>>> a number of cases by 1-4%:
>>>>>> Octane-Mandreel
>>>>>> Octane-Richards
>>>>>> Octane-Splay
>>>>>>
>>>>>> Unfortunately, there is also a performance regression with 
>>>>>> SPECjvm2008-MonteCarlo-G1 (3-5%). Finding the cause of that 
>>>>>> regression is likely to take a at least a week, but most likely 
>>>>>> even more.
>>>>>>
>>>>>> So my question is: Should I spend more time on this prototype and 
>>>>>> fix the performance regression?
>>>>>>
>>>>>> A different solution would be check the complete graph shape. 
>>>>>> That is also done at other places, e.g., in 
>>>>>> SuperWord::get_pre_loop_end()
>>>>>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/3256d4204291/src/share/vm/opto/superword.cpp#l3076
>>>>>>
>>>>>> Here is the webrev for the second solution:
>>>>>> http://cr.openjdk.java.net/~zmajo/8148754/webrev.03/
>>>>>>
>>>>>> The second solution does not regress. I've tested it with:
>>>>>> - JPRT;
>>>>>> - local testing (linux-86_64) with the failing test case;
>>>>>> - executing all hotspot tests locally, all tests pass that pass 
>>>>>> with an unmodified build.
>>>>>>
>>>>>> Can you please let me know which solution you prefer:
>>>>>> - (1) the prototype with the regression solved or
>>>>>> - (2) checking the graph shape?
>>>>>>
>>>>>> We could also fix this issue with pushing (2) for now (as this 
>>>>>> issue is a "critical" nightly failure). I could then spend more 
>>>>>> time on (1) later in a different bug.
>>>>>>
>>>>>> Thank you and best regards,
>>>>>>
>>>>>>
>>>>>> Zoltan
>>>>>>
>>>>>>> Thanks,
>>>>>>> Vladimir
>>>>>>>
>>>>>>> On 2/22/16 6:22 AM, Zoltán Majó wrote:
>>>>>>>> Hi Vladimir,
>>>>>>>>
>>>>>>>>
>>>>>>>> thank you for the feedback!
>>>>>>>>
>>>>>>>> On 02/16/2016 01:11 AM, Vladimir Kozlov wrote:
>>>>>>>>> Zoltan,
>>>>>>>>>
>>>>>>>>> It should not be "main" loop if peeling happened. See 
>>>>>>>>> do_peeling():
>>>>>>>>>
>>>>>>>>>    if (cl->is_main_loop()) {
>>>>>>>>>      cl->set_normal_loop();
>>>>>>>>>
>>>>>>>>> Split-if optimization should not split through loop's phi. And
>>>>>>>>> generally not through loop's head since it is not making code 
>>>>>>>>> better -
>>>>>>>>> split through backedge moves code into loop again. Making loop 
>>>>>>>>> body
>>>>>>>>> more complicated as this case shows.
>>>>>>>>
>>>>>>>> I did more investigation to understand what causes the invalid 
>>>>>>>> graph
>>>>>>>> shape to appear. It seems that the invalid graph shape appears 
>>>>>>>> because
>>>>>>>> the compiler uses the Compile:: _major_progress inconsistently. 
>>>>>>>> Here are
>>>>>>>> some details.
>>>>>>>>
>>>>>>>> - If _major_progress *is set*, the compiler expects more loop
>>>>>>>> optimizations to happen. Therefore, certain transformations on 
>>>>>>>> the graph
>>>>>>>> are not allowed so that the graph is in a shape that can be 
>>>>>>>> processed by
>>>>>>>> loop optimizations. See:
>>>>>>>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/2c3c43037e14/src/share/vm/opto/convertnode.cpp#l253
>>>>>>>>
>>>>>>>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/2c3c43037e14/src/share/vm/opto/castnode.cpp#l251
>>>>>>>>
>>>>>>>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/2c3c43037e14/src/share/vm/opto/loopnode.cpp#l950
>>>>>>>>
>>>>>>>> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/file/2c3c43037e14/src/share/vm/opto/opaquenode.cpp#l37
>>>>>>>>
>>>>>>>>
>>>>>>>> - If _major_progress *is not set*, the compiler is allowed to 
>>>>>>>> perform
>>>>>>>> all possible transformations (because it does not have to care 
>>>>>>>> about
>>>>>>>> future loop optimizations).
>>>>>>>>
>>>>>>>> The crash reported for the current issue appears because 
>>>>>>>> _major_progress
>>>>>>>> *can be accidentally set again* after the compiler decided to stop
>>>>>>>> performing loop optimizations. As a result, invalid graph 
>>>>>>>> shapes appear.
>>>>>>>>
>>>>>>>> Here are details about how this happens for both failures I've been
>>>>>>>> studying:
>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8148754?focusedCommentId=13901941&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13901941
>>>>>>>>
>>>>>>>>
>>>>>>>> I would propose to change the compiler to use _major_progress
>>>>>>>> consistently. (This goes into the same direction as Tobias's 
>>>>>>>> recent work
>>>>>>>> on JDK-8144487.)
>>>>>>>>
>>>>>>>> I propose that _major_progress:
>>>>>>>> - can be SET when the compiler is initialized (because loop
>>>>>>>> optimizations are expected to happen afterwards);
>>>>>>>> - can be SET/RESET in the scope of loop optimizations (because 
>>>>>>>> we want
>>>>>>>> to see if loop optimizations made progress);
>>>>>>>> - cannot be SET/RESET by neither incremental inlining nor IGVN 
>>>>>>>> (even if
>>>>>>>> the IGVN is performed in the scope of loop optimizations).
>>>>>>>>
>>>>>>>> Here is the updated webrev:
>>>>>>>> http://cr.openjdk.java.net/~zmajo/8148754/webrev.02/
>>>>>>>>
>>>>>>>> Performance evaluation:
>>>>>>>> - The proposed webrev does not cause performance regressions for
>>>>>>>> SPECjvm2008, SPECjbb2005, and Octane.
>>>>>>>>
>>>>>>>> Testing:
>>>>>>>> - all hotspot JTREG tests on all supported platforms;
>>>>>>>> - JPRT;
>>>>>>>> - failing test case.
>>>>>>>>
>>>>>>>> Thank you and best regards,
>>>>>>>>
>>>>>>>>
>>>>>>>> Zoltan
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Bailout unrolling is fine but performance may suffer because 
>>>>>>>>> in some
>>>>>>>>> cases loop unrolling is better then split-if.
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Vladimir
>>>>>>>>>
>>>>>>>>> On 2/15/16 7:22 AM, Zoltán Majó wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> please review the patch for 8148754.
>>>>>>>>>>
>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8148754
>>>>>>>>>>
>>>>>>>>>> Problem: Compilation fails when the C2 compiler attempts loop 
>>>>>>>>>> unrolling.
>>>>>>>>>> The cause of the failure is that the loop unrolling 
>>>>>>>>>> optimization expects
>>>>>>>>>> a well-defined graph shape at the entry control of a 
>>>>>>>>>> 'CountedLoopNode'
>>>>>>>>>> ('IfTrue'/'IfFalse' preceeded by 'If' preceeded by 'Bool' 
>>>>>>>>>> preceeded by
>>>>>>>>>> 'CmpI').
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Solution: I investigated several different instances of the same
>>>>>>>>>> failure. It turns out that the shape of the graph at a loop's 
>>>>>>>>>> entry
>>>>>>>>>> control is often different from the way loop unrolling 
>>>>>>>>>> expects it to be
>>>>>>>>>> (please find some examples in the bug's JBS issue). The 
>>>>>>>>>> various graph
>>>>>>>>>> shapes are a result of previously performed transformations, 
>>>>>>>>>> e.g.,
>>>>>>>>>> split-if optimization and loop peeling.
>>>>>>>>>>
>>>>>>>>>> Loop unrolling requires the above mentioned graph shape so 
>>>>>>>>>> that it can
>>>>>>>>>> adjust the zero-trip guard of the loop. With the unexpected graph
>>>>>>>>>> shapes, it is not possible to perform loop unrolling. 
>>>>>>>>>> However, the graph
>>>>>>>>>> is still in a valid state (except for loop unrolling) and can 
>>>>>>>>>> be used to
>>>>>>>>>> produce correct code.
>>>>>>>>>>
>>>>>>>>>> I propose that (1) we check if an unexpected graph shape is 
>>>>>>>>>> encountered
>>>>>>>>>> and (2) bail out of loop unrolling if it is (but not fail in the
>>>>>>>>>> compiler in such cases).
>>>>>>>>>>
>>>>>>>>>> The failure was triggered by Aleksey's Indify String 
>>>>>>>>>> Concatenation
>>>>>>>>>> changes but the generated bytecodes are valid. So this seems 
>>>>>>>>>> to be a
>>>>>>>>>> compiler issue that was previously there but was not yet 
>>>>>>>>>> triggered.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Webrev:
>>>>>>>>>> http://cr.openjdk.java.net/~zmajo/8148754/webrev.00/
>>>>>>>>>>
>>>>>>>>>> Testing:
>>>>>>>>>> - JPRT;
>>>>>>>>>> - local testing (linux-86_64) with the failing test case;
>>>>>>>>>> - executed all hotspot tests locally, all tests pass that 
>>>>>>>>>> pass with an
>>>>>>>>>> unmodified build.
>>>>>>>>>>
>>>>>>>>>> Thank you!
>>>>>>>>>>
>>>>>>>>>> Best regards,
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Zoltan
>