[9] RFR(S): 8034216: assert(false) failed: infinite loop in PhaseIterGVN::optimize

Thu Apr 10 15:12:43 UTC 2014

Hi Vladimir,

Here is an updated version of the castPP node removal solution.

http://cr.openjdk.java.net/~anoll/8034216/webrev.04/
I executed specjvm98 with these changes on my local desktop and I could 
not see a performance regression.

Best,
Albert

On 04/10/2014 07:46 AM, Albert wrote:
> Hi Vladimir,
>
> thanks a lot for looking into this again. I will prepare a patch with 
> which removes CastPP node (this was Roland's idea). I will also run 
> SpecJVM98 to see if there
> is an impact on performance.
>
> Best,
> Albert
>
>
> On 04/09/2014 09:07 PM, Vladimir Kozlov wrote:
>> Hi Albert,
>>
>> I am continue thinking about this problem.
>> Your suggestion to process CastPP node first could be less intrusive 
>> than other solutions. It will do the same thing as current code if 
>> such nodes are first on worklist. It would be still nice to continue 
>> investigate the possibility to keep these nodes in graph after CCP 
>> but it should not be very high priority (file RFE).
>>
>> Note, you can't iterate the _worklist the way you do because remove() 
>> will put last element into current slot so you have to check it again.
>> Also using remove(uint i) version will be faster.
>>
>> Thanks,
>> Vladimir
>>
>> On 4/7/14 7:04 PM, Vladimir Kozlov wrote:
>>> Hi Albert,
>>>
>>> Can you push the cleanup first under a different bug?
>>> We may need to continue discussion how to fix the problem but I think
>>> cleanup can be pushed now. Please, send a separate webrev for it under
>>> new bug id.
>>>
>>> About the problem.
>>>
>>> It looks like 6916062 all over again :(  See attached mail thread 
>>> when I
>>> did investigation:
>>>
>>> [217]   CheckCastPP  . .  # java/util/ArrayList:NotNull*
>>> [229]   CheckCastPP  . .  #
>>> java/util/Collections$UnmodifiableCollection:NotNull*
>>> [292]   Phi . [217] [229] # java/lang/Object:NotNull*
>>> [11637] CastPP . [292]    # _type = java/util/Collection*  but
>>> igvn.type(cast) = java/util/Collection:NotNull*
>>> [11638] AddP . [11637][11637] + 8 # igvn.type(addp)
>>> java/util/Collection:NotNull+8 *
>>> [11639] LoadNKlass . . [11638]
>>>
>>> and we hit the assert in the 6916062 fix code:
>>>
>>> http://hg.openjdk.java.net/hsx/hsx19/baseline/rev/9a19ee0490e0
>>>
>>>
>>> Types are converged during CCP phase when all CastPP nodes are still
>>> present. Unfortunately, after CastPP nodes are removed, types could
>>> degradate after your change:
>>>
>>> http://cr.openjdk.java.net/~anoll/8034216/webrev.01/
>>>
>>> In your case cached type for AddP (NotNull) will be replaced by input's
>>> type because this is what AddPNode::Value() returns.
>>>
>>> I afraid this will affect types more than we want.
>>>
>>> Note, the only information about NotNull is located in the cached type
>>> because corresponding CastPP was also removed.
>>>
>>>
>>> Removing CastPP nodes first will replace cached type of AddP to even
>>> more general j.l.Object (from Phi node).
>>>
>>> May be we should investigate if the next performance statement about
>>> jvm98 is still true and solve it differently (in PhiNode::Ideal() for
>>> example):
>>>
>>> // I tried to leave the CastPP's in.  This makes the graph more 
>>> accurate in
>>> // some sense; we get to keep around the knowledge that an oop is 
>>> not-null
>>> // after some test.  Alas, the CastPP's interfere with GVN (some 
>>> values are
>>> // the regular oop, some are the CastPP of the oop, all merge at Phi's
>>> which
>>> // cannot collapse, etc).  This cost us 10% on SpecJVM, even when I 
>>> removed
>>> // some of the more trivial cases in the optimizer.  Removing more 
>>> useless
>>> // Phi's started allowing Loads to illegally float above null 
>>> checks.  I
>>> gave
>>> // up on this approach.  CNC 10/20/2000
>>>
>>> Regards,
>>> Vladimir
>>>
>>> On 4/4/14 8:09 AM, Albert wrote:
>>>> Hi Vladimir,
>>>>
>>>> thanks for looking at this.  Please see comments inline:
>>>>
>>>> On 04/04/2014 04:03 AM, Vladimir Kozlov wrote:
>>>>> On 4/3/14 4:14 PM, Vladimir Kozlov wrote:
>>>>>> Hi Albert,
>>>>>>
>>>>>> Thank you for doing this investigation. I hope it was good C2
>>>>>> learning exercise :)
>>>>>>
>>>> Yes, I learned a lot. Many thanks to you and Roland for teaching me.
>>>>
>>>>>> I prefer the second solution, webrev.01. I don't like that we are
>>>>>> loosing more concrete type with first solution.
>>>>>
>>>>> Actually it would be more correct and complete to have a solution
>>>>> which does not put a load node on worklist again when there is no
>>>>> progress with type. Even with webrev.01 we loose NotNull information.
>>>> I have one general question here: Are types expected to converge at 
>>>> this
>>>> point if everything would work as expected?
>>>> Why do we loose NotNull information if types converge to
>>>> java/util/Spliterator:NotNull+8 * ?
>>>>
>>>> I think that your suggestion, not pushing a node on the worklist if we
>>>> don't make progress, will result in the type:
>>>> java/util/Spliterator for the LoadKlass node -  so the NotNull
>>>> information would be lost in that case. If you think that
>>>> this is the solution we should go for I'll look more into it.
>>>>
>>>> Here is a new webrev of the cleanup. I included all your suggestion. I
>>>> deleted the two lines
>>>> 1123     // Move users of node to worklist
>>>> 1124     add_users_to_worklist( k );
>>>>
>>>> accidentally. Thanks for catching this.
>>>> http://cr.openjdk.java.net/~anoll/8034216/webrev.03/
>>>>
>>>>>
>>>>> Thanks,
>>>>> Vladimir
>>>>>
>>>>>>
>>>>>> It would be nice to detect such situations and bailout 
>>>>>> compilation in
>>>>>> product VM and throw assert in debug VM.
>>>>>>
>>>>>> The clean up is nice, go for it. Note that you should put new 
>>>>>> methods
>>>>>> under #ifndef PRODUCT instead of using
>>>>>> PRODUCT_RETURN because they are called only in NOT_PRODUCT().
>>>>>>
>>>>>> Why you removed next code?:
>>>>>>
>>>>>> 1123     // Move users of node to worklist
>>>>>> 1124     add_users_to_worklist( k );
>>>>>>
>>>>>> Next code should be under #ifdef ASSERT because TraceIterativeGVN is
>>>>>> develop flag. So you don't need for NOT_PRODUCT()
>>>>>> and DEBUG_ONLY() in the code:
>>>>>>
>>>>>> +     if (TraceIterativeGVN && Verbose) {
>>>>>> +       tty->print("  Pop ");
>>>>>> +       NOT_PRODUCT(n->dump();)
>>>>>> +       DEBUG_ONLY(if( (num_processed++ % 100) == 0 )
>>>>>> _worklist.print_set();)
>>>>>> +     }
>>>>>>
>>>>>> For next I also think is better to use #ifdef ASSERT:
>>>>>>
>>>>>> +       DEBUG_ONLY(n->dump(4);)
>>>>>> +       DEBUG_ONLY(_worklist.dump();)
>>>>>> +       assert(false, "infinite loop in PhaseIterGVN::optimize");
>>>>>>
>>>>>> And #ifndef PRODUCT for next:
>>>>>>
>>>>>> +       NOT_PRODUCT(trace_PhaseIterGVN(n, nn, oldtype));
>>>>>> +       NOT_PRODUCT(if (nn != n) verify_step((Node*) NULL);)  //
>>>>>> ignore n, it might be subsumed
>>>>>>
>>>>>>
>>>>>> On 4/3/14 5:40 AM, Albert wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> could I get reviews for this patch?
>>>>>>>
>>>>>>> Bug:
>>>>>>> ------
>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8034216
>>>>>>>
>>>>>>>
>>>>>>> Problem:
>>>>>>> ------------
>>>>>>> The endless loop occurs, since there are 2 nodes of type 
>>>>>>> LoadKlass in
>>>>>>> the worklist that keep pushing themselves
>>>>>>> onto the worklist. The current implementation can only handle 1 
>>>>>>> node
>>>>>>> that keeps pushing itself on the worklist.
>>>>>>> The endless loop occurs in the IGVN pass that comes after CCP 
>>>>>>> phase.
>>>>>>>
>>>>>>> This piece of code in memnode.cpp is responsible for the endless 
>>>>>>> loop:
>>>>>>>
>>>>>>> Node *MemNode::Ideal_common(PhaseGVN *phase, bool can_reshape) {
>>>>>>>    ...
>>>>>>>    if (can_reshape && igvn != NULL &&
>>>>>>>        (igvn->_worklist.member(address) ||
>>>>>>>         igvn->_worklist.size() > 0 && (t_adr != adr_type())) ) {
>>>>>>>      // The address's base and type may change when the address is
>>>>>>> processed.
>>>>>>>      // Delay this mem node transformation until the address is
>>>>>>> processed.
>>>>>>>      phase->is_IterGVN()->_worklist.push(this);
>>>>>>>      return NodeSentinel; // caller will return NULL
>>>>>>>    }
>>>>>>>    ...
>>>>>>> }
>>>>>>> As seen above, we keep pushing the current node onto the 
>>>>>>> worklist if
>>>>>>> t_adr != adr_type().
>>>>>>> Unfortunately the types do not converge. See the following node 
>>>>>>> dump:
>>>>>>>
>>>>>>>     5559 272 b 4 java.util.concurrent.CountedCompleter::exec (6 
>>>>>>> bytes)
>>>>>>>   2054 CheckCastPP === 4086 2023 [[ 2098 2064 2064 ]]
>>>>>>> #java/util/Spliterators$DoubleArraySpliterator:NotNull:exact *
>>>>>>> Oop:java/util/Spliterators$DoubleArraySpliterator:NotNull:exact *
>>>>>>> !jvms:
>>>>>>> AbstractTask::compute @ bci:6 CountedCompleter::exec @ bci:-1
>>>>>>>   2044 CheckCastPP === 4090 2023 [[ 2098 2077 2077 ]]
>>>>>>> #java/util/ArrayList$ArrayListSpliterator:NotNull:exact *
>>>>>>> Oop:java/util/ArrayList$ArrayListSpliterator:NotNull:exact * !jvms:
>>>>>>> AbstractTask::compute @ bci:6 CountedCompleter::exec @ bci:-1
>>>>>>>   2095 Region === 2095 2085 2072 [[ 2095 2120 2096 2097 2098 
>>>>>>> 2099 ]]
>>>>>>> !jvms: AbstractTask::compute @ bci:6 CountedCompleter::exec @ 
>>>>>>> bci:-1
>>>>>>>   2169 CmpL === _ 2099 2164 [[ 2170 ]] !jvms: 
>>>>>>> AbstractTask::compute @
>>>>>>> bci:29 CountedCompleter::exec @ bci:-1
>>>>>>>   2121 IfTrue === 2120 [[ 2125 ]] #1 !jvms:
>>>>>>> AbstractTask::getTargetSize
>>>>>>> @ bci:8 AbstractTask::compute @ bci:14 CountedCompleter::exec @ 
>>>>>>> bci:-1
>>>>>>>   2122 IfFalse === 2120 [[ 2125 2162 ]] #0 !jvms:
>>>>>>> AbstractTask::getTargetSize @ bci:8 AbstractTask::compute @ bci:14
>>>>>>> CountedCompleter::exec @ bci:-1
>>>>>>>   2098 Phi === 2095 2044 2054 [[ 2229 ]] 
>>>>>>> #java/lang/Object:NotNull *
>>>>>>> Oop:java/lang/Object:NotNull * !orig=2181 !jvms:
>>>>>>> AbstractTask::compute @
>>>>>>> bci:6 CountedCompleter::exec @ bci:-1
>>>>>>>   2170 Bool === _ 2169 [[ 2171 ]] [le] !jvms: 
>>>>>>> AbstractTask::compute @
>>>>>>> bci:29 CountedCompleter::exec @ bci:-1
>>>>>>>   2125 Region === 2125 2122 2121 [[ 2125 2171 2163 2164 ]] !jvms:
>>>>>>> AbstractTask::getTargetSize @ bci:24 AbstractTask::compute @ bci:14
>>>>>>> CountedCompleter::exec @ bci:-1
>>>>>>>   22 ConL === 0 [[ 24 2037 81 2230 288 ]] #long:8
>>>>>>>   2229 CastPP === 2173 2098 [[ 2240 2240 2230 2230 2237 ]]
>>>>>>> #java/util/Spliterator * Interface:java/util/Spliterator * !jvms:
>>>>>>> AbstractTask::compute @ bci:33 CountedCompleter::exec @ bci:-1
>>>>>>>   3 Start === 3 0 [[ 3 5 6 7 8 9 10 ]] #{0:control, 1:abIO, 
>>>>>>> 2:memory,
>>>>>>> 3:rawptr:BotPTR, 4:return_address,
>>>>>>> 5:java/util/concurrent/CountedCompleter:NotNull *}
>>>>>>>   2171 If === 2125 2170 [[ 2172 2173 ]] P=0.500000, C=-1.000000 
>>>>>>> !jvms:
>>>>>>> AbstractTask::compute @ bci:29 CountedCompleter::exec @ bci:-1
>>>>>>>   2230 AddP === _ 2229 2229 22 [[ 2231 ]]
>>>>>>> Interface:java/util/Spliterator+8 * !jvms: AbstractTask::compute @
>>>>>>> bci:33 CountedCompleter::exec @ bci:-1
>>>>>>>   7 Parm === 3 [[ 103 289 110 123 25 82 2231 2077 47 2064 2057 
>>>>>>> 2038 65
>>>>>>> 2031 75 2023 ]] Memory Memory: @BotPTR *+bot, idx=Bot; !jvms:
>>>>>>> CountedCompleter::exec @ bci:-1
>>>>>>>   2173 IfFalse === 2171 [[ 4079 2231 2229 ]] #0 !orig=[2195] !jvms:
>>>>>>> AbstractTask::compute @ bci:29 CountedCompleter::exec @ bci:-1
>>>>>>>   2231 LoadKlass === 2173 7 2230 [[ 2232 ]] @java/lang/Object+8 *,
>>>>>>> idx=4; #klass java/util/Spliterator: 0x0000000002252c98 *
>>>>>>> Interface:klass java/util/Spliterator: 0x0000000002252c98 * !jvms
>>>>>>>
>>>>>>> t_adr: is the cached type of the Address input (2230 AddP):
>>>>>>> java/util/Spliterator:NotNull
>>>>>>> adr_type(): is the bottom type of the address input (2230 AddP):
>>>>>>> java/util/Spliterator
>>>>>>>
>>>>>>> The bottom_type of the AddP node is determined by the bottom 
>>>>>>> type of
>>>>>>> it's address input node ( 2229 CastPP).
>>>>>>> The bottom type of the CastPP node is set in the CCP phase:
>>>>>>>
>>>>>>> Node *PhaseCCP::transform_once( Node *n ) {
>>>>>>>    ...
>>>>>>>    Node *nn = n->Ideal_DU_postCCP(this);
>>>>>>>    ...
>>>>>>> }
>>>>>>>
>>>>>>> In the concrete execution, the CCP phase sets the bottom type of 
>>>>>>> the
>>>>>>> castPP node equal to the type of its input (in(1)).
>>>>>>> in(1) of the castPP node is a phi-node. Since the two types are 
>>>>>>> equal,
>>>>>>> the castPP node is redundant and could be removed
>>>>>>> in the subsequent IGVN phase. However, the castPP node is not 
>>>>>>> removed,
>>>>>>> since another node in the worklist (a redundant
>>>>>>> region node) is processed prior to the castPP node. Removing the
>>>>>>> redundant region node results in a change of in(1) of the
>>>>>>> castPP node. As a result, the type of in(1) of the castPP node 
>>>>>>> changes
>>>>>>> as well and the castPP node is not removed. Not
>>>>>>> removing the castPP node makes the types not converge as expected.
>>>>>>>
>>>>>>>
>>>>>>> Solution:
>>>>>>> -----------
>>>>>>> Add a pass after CCP that removes the redundant castPP node. As a
>>>>>>> result, the types converge to
>>>>>>> java/lang/Object:NotNull+8 * and the endless loop does not occur.
>>>>>>>
>>>>>>> Here is the webrev:
>>>>>>> http://cr.openjdk.java.net/~anoll/8034216/webrev.00/
>>>>>>>
>>>>>>>
>>>>>>> Alternative solution:
>>>>>>> --------------------------
>>>>>>> Instead of adding the extra pass over the IR, we could raise the
>>>>>>> bottom
>>>>>>> type of a type node
>>>>>>> in IGVN. As a result, the types converge to
>>>>>>> java/util/Spliterator:NotNull+8 * and the endless loop
>>>>>>> does not occur.
>>>>>>>
>>>>>>> Here is the webrev:
>>>>>>> http://cr.openjdk.java.net/~anoll/8034216/webrev.01/
>>>>>>>
>>>>>>>
>>>>>>> Cleanup:
>>>>>>> ------------
>>>>>>> In addition, I have worked on refactoring/cleaning up the affected
>>>>>>> code.
>>>>>>> This code cleanup comes
>>>>>>> in a separate webrev to make it easier to figure out the actual
>>>>>>> changes
>>>>>>> that fix the bug. The code
>>>>>>> cleanup is optional.
>>>>>>>
>>>>>>> Here is the webrev:
>>>>>>> http://cr.openjdk.java.net/~anoll/8034216/webrev.02/
>>>>>>>
>>>>>>>
>>>>>>> Testing:
>>>>>>> ----------
>>>>>>> Both solutions successfully passed JPRT and CTW (rt.jar).
>>>>>>>
>>>>>>> Many thanks in advance,
>>>>>>> Albert
>>>>
>