RFR(M): 8080289: Intermediate writes in a loop not eliminated by optimizer

Wed Jun 17 04:46:22 UTC 2015

Why would removing the redundant st3 be an incorrect optimization?

sent from my phone
On Jun 16, 2015 3:05 PM, "Vladimir Kozlov" <vladimir.kozlov at oracle.com>
wrote:

> On 6/11/15 3:39 AM, Roland Westrelin wrote:
>
>> Thanks for looking at this, Vladimir.
>>
>>  We don't call previous node as 'user' - we call them definitions or
>>> inputs. Your comment change in memnode.cpp sound strange because of that.
>>> The original statement was correct:
>>> // If anybody other than 'this' uses 'mem', we cannot fold 'mem' away.
>>>
>>> in your case it will be:
>>>
>>> // If anybody other than 'this' uses 'st', we cannot fold 'st' away.
>>>
>>
>> Right.
>>
>>  Also code does not seems right. The code should go through input memory
>>> chain and remove all preceding similar stores - 'this' node remains and we
>>> change its memory input which makes previous stores dead. So you can't do
>>> 'prev = st’.
>>>
>>
>> That’s what I think the code does. That is if you have:
>>
>> st1->st2->st3->st4
>>
>
> I assume st4 is first store and st1 is last. Right?
>
>
>> and st3 is redundant with st1, the chain should become:
>>
>> st1->st2->st4
>>
>
> I am not sure it is correct optimization. On some machines result of st3
> could be visible before result of st2. And you change it.
> I am suggesting not do that. Do you need that for stores move from loop?
>
>
>> so we need to change the memory input of st2 when we find st3 can be
>> removed. In the code, at that point, this=st1, st = st3 and prev=st2.
>>
>
> In this case the code should be:
>
>    if (st->in(MemNode::Address)->eqv_uncast(address) &&
>      ...
>    } else {
>      prev = st;
>    }
>
> to update 'prev' with 'st' only if 'st' is not removed.
> Also, I think, st->in(MemNode::Memory) could be put in local var since it
> is used several times in this code.
>
>
>>  You need to set improved = true since 'this' will not change. We also
>>> use 'make_progress' variable's name in such cases.
>>>
>>
>> In the example above, if we remove st2, we modify this, right?
>>
>
> We need to call Ideal() again if store inputs are changed. So if st2 is
> removed then inputs of st1 are changed so we need to rerun Ideal(). This
> allow to avoid having your new loop in the Ideal().
>
>
>>  I try_move_store_before_loop() add check (n_loop->_head == n_ctrl) to
>>> make sure it is not other control node inside loop. Then you can check
>>> Phi's control as (mem->in(0) == n_ctrl).
>>>
>>> I don't understand verification code "Check that store's control does
>>> post dominate loop entry". In first comment you said "Store has to be first
>>> in the loop body" - I understand this as Store's control should be loop's
>>> head. You can't allow store to be on one of branches (it will not post
>>> dominate) but your check code allow that. Also the check done only in debug
>>> VM.
>>>
>>> If you really want to accept cases when a store is placed after diamond
>>> control then you need checks in product to make sure that it is really post
>>> dominate head. For that I think you need to go from loopend to loophead
>>> through idom links and see if you meet n_ctrl.
>>>
>>
>>
>> My check code starts from the loop, follows every path and make sure
>> there’s no path that leaves the loop without going through n. With example
>> 1 below:
>>
>> for (int i = 0; i < 10; i++) {
>>    if (some_condition) {
>>      array[idx] = 999;
>>    } else {
>>    }
>> }
>>
>> We’ll find a path from the head that doesn’t go through the store and
>> that exits the loop. What the comment doesn’t say is that with example 2
>> below:
>>
>> for (int i = 0; i < 10; i++) {
>>    if (some_condition) {
>>      uncommon_trap();
>>    }
>>    array[idx] = 999;
>> }
>>
>> my verification code would find the early exit as well.
>>
>> It’s verification code only because if we have example 1 above, then we
>> have a memory Phi to merge both branches of the if. So the pattern that we
>> look for in PhaseIdealLoop::try_move_store_before_loop() won’t match: the
>> loop’s memory Phi backedge won’t be the store. If we have example 2 above,
>> then the loop’s memory Phi doesn’t have a single memory use. So I don’t
>> think we need to check that the store post dominate the loop head in
>> product. That’s my reasoning anyway and the verification code is there to
>> verify it.
>>
>
> I missed 'mem->in(LoopNode::LoopBackControl) == n' condition. Which reduce
> cases only to one store to this address in the loop - good.
>
> How you check in product VM that there are no other exists from a loop
> (your example 2)? Is it guarded by mem->outcnt() == 1 check?
>
>
>> This said, my verification code doesn’t work for infinite loops. It would
>> need to check whether we reach the tail of the loop maybe?
>>
>>  I don't see assert(n->in(0) in try_move_store_before_loop() but you have
>>> it in try_move_store_after_loop().
>>>
>>
>> Ok.
>>
>>
>>> Why you need next assert?:
>>> +         assert(n_loop != address_loop, "address is loop varying”);
>>>
>>
>> I wonder about that too ;-)
>> I’ll remove it.
>>
>>  Should you check phi == NULL instead of assert to make sure you have
>>> only one Phi node?
>>>
>>
>> Can there be more than one memory Phi for a particular slice that has
>> in(0) == n_loop->_head?
>> I would have expected that to be impossible.
>>
>
> BOTTOM (all slices) Phi?
>
> Thanks,
> Vladimir
>
>
>>
>>> conflict between assert and following check:
>>> +               assert(new_loop->_child != NULL, "");
>>> +               if (new_loop->_child == NULL) new_loop->_body.push(st);
>>>
>>
>> Thanks for spotting that. I’ll remove the assert.
>>
>> Roland.
>>
>>
>>> Thanks,
>>> Vladimir
>>>
>>> On 6/10/15 8:03 AM, Roland Westrelin wrote:
>>>
>>>> http://cr.openjdk.java.net/~roland/8080289/webrev.00/
>>>>
>>>> Sink stores out of loops when possible:
>>>>
>>>> for (int i = 0; i < 1000; i++) {
>>>>      // Some stuff that doesn’t prevent the optimization
>>>>      array[idx] = i;
>>>> }
>>>>
>>>> becomes:
>>>>
>>>> for (int i = 0; i < 1000; i++) {
>>>>      // Some stuff
>>>> }
>>>> array[idx] = 999;
>>>>
>>>> Or move stores before the loop when possible:
>>>>
>>>> for (int i = 0; i < 1000; i++) {
>>>>      array[idx] = 999;
>>>>      // Some stuff that doesn’t prevent the optimization
>>>> }
>>>>
>>>> becomes:
>>>>
>>>> array[idx] = 999;
>>>> for (int i = 0; i < 1000; i++) {
>>>>      // Some stuff
>>>> }
>>>>
>>>> The change in memnode.cpp is useful to clean up code generated from
>>>> test_after_5 because the stores are moved out of the loop only after the
>>>> loop is split and unrolled. That code removes duplicate stores.
>>>>
>>>> Roland.
>>>>
>>>>
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20150617/db49c2fc/attachment.html>