RFR(M): 8080289: Intermediate writes in a loop not eliminated by optimizer

Tue Jun 16 19:05:49 UTC 2015

On 6/11/15 3:39 AM, Roland Westrelin wrote:
> Thanks for looking at this, Vladimir.
>
>> We don't call previous node as 'user' - we call them definitions or inputs. Your comment change in memnode.cpp sound strange because of that. The original statement was correct:
>> // If anybody other than 'this' uses 'mem', we cannot fold 'mem' away.
>>
>> in your case it will be:
>>
>> // If anybody other than 'this' uses 'st', we cannot fold 'st' away.
>
> Right.
>
>> Also code does not seems right. The code should go through input memory chain and remove all preceding similar stores - 'this' node remains and we change its memory input which makes previous stores dead. So you can't do 'prev = st’.
>
> That’s what I think the code does. That is if you have:
>
> st1->st2->st3->st4

I assume st4 is first store and st1 is last. Right?

>
> and st3 is redundant with st1, the chain should become:
>
> st1->st2->st4

I am not sure it is correct optimization. On some machines result of st3 
could be visible before result of st2. And you change it.
I am suggesting not do that. Do you need that for stores move from loop?

>
> so we need to change the memory input of st2 when we find st3 can be removed. In the code, at that point, this=st1, st = st3 and prev=st2.

In this case the code should be:

    if (st->in(MemNode::Address)->eqv_uncast(address) &&
      ...
    } else {
      prev = st;
    }

to update 'prev' with 'st' only if 'st' is not removed.
Also, I think, st->in(MemNode::Memory) could be put in local var since 
it is used several times in this code.

>
>> You need to set improved = true since 'this' will not change. We also use 'make_progress' variable's name in such cases.
>
> In the example above, if we remove st2, we modify this, right?

We need to call Ideal() again if store inputs are changed. So if st2 is 
removed then inputs of st1 are changed so we need to rerun Ideal(). This 
allow to avoid having your new loop in the Ideal().

>
>> I try_move_store_before_loop() add check (n_loop->_head == n_ctrl) to make sure it is not other control node inside loop. Then you can check Phi's control as (mem->in(0) == n_ctrl).
>>
>> I don't understand verification code "Check that store's control does post dominate loop entry". In first comment you said "Store has to be first in the loop body" - I understand this as Store's control should be loop's head. You can't allow store to be on one of branches (it will not post dominate) but your check code allow that. Also the check done only in debug VM.
>>
>> If you really want to accept cases when a store is placed after diamond control then you need checks in product to make sure that it is really post dominate head. For that I think you need to go from loopend to loophead through idom links and see if you meet n_ctrl.
>
>
> My check code starts from the loop, follows every path and make sure there’s no path that leaves the loop without going through n. With example 1 below:
>
> for (int i = 0; i < 10; i++) {
>    if (some_condition) {
>      array[idx] = 999;
>    } else {
>    }
> }
>
> We’ll find a path from the head that doesn’t go through the store and that exits the loop. What the comment doesn’t say is that with example 2 below:
>
> for (int i = 0; i < 10; i++) {
>    if (some_condition) {
>      uncommon_trap();
>    }
>    array[idx] = 999;
> }
>
> my verification code would find the early exit as well.
>
> It’s verification code only because if we have example 1 above, then we have a memory Phi to merge both branches of the if. So the pattern that we look for in PhaseIdealLoop::try_move_store_before_loop() won’t match: the loop’s memory Phi backedge won’t be the store. If we have example 2 above, then the loop’s memory Phi doesn’t have a single memory use. So I don’t think we need to check that the store post dominate the loop head in product. That’s my reasoning anyway and the verification code is there to verify it.

I missed 'mem->in(LoopNode::LoopBackControl) == n' condition. Which 
reduce cases only to one store to this address in the loop - good.

How you check in product VM that there are no other exists from a loop 
(your example 2)? Is it guarded by mem->outcnt() == 1 check?

>
> This said, my verification code doesn’t work for infinite loops. It would need to check whether we reach the tail of the loop maybe?
>
>> I don't see assert(n->in(0) in try_move_store_before_loop() but you have it in try_move_store_after_loop().
>
> Ok.
>
>>
>> Why you need next assert?:
>> +         assert(n_loop != address_loop, "address is loop varying”);
>
> I wonder about that too ;-)
> I’ll remove it.
>
>> Should you check phi == NULL instead of assert to make sure you have only one Phi node?
>
> Can there be more than one memory Phi for a particular slice that has in(0) == n_loop->_head?
> I would have expected that to be impossible.

BOTTOM (all slices) Phi?

Thanks,
Vladimir

>
>>
>> conflict between assert and following check:
>> +               assert(new_loop->_child != NULL, "");
>> +               if (new_loop->_child == NULL) new_loop->_body.push(st);
>
> Thanks for spotting that. I’ll remove the assert.
>
> Roland.
>
>>
>> Thanks,
>> Vladimir
>>
>> On 6/10/15 8:03 AM, Roland Westrelin wrote:
>>> http://cr.openjdk.java.net/~roland/8080289/webrev.00/
>>>
>>> Sink stores out of loops when possible:
>>>
>>> for (int i = 0; i < 1000; i++) {
>>>      // Some stuff that doesn’t prevent the optimization
>>>      array[idx] = i;
>>> }
>>>
>>> becomes:
>>>
>>> for (int i = 0; i < 1000; i++) {
>>>      // Some stuff
>>> }
>>> array[idx] = 999;
>>>
>>> Or move stores before the loop when possible:
>>>
>>> for (int i = 0; i < 1000; i++) {
>>>      array[idx] = 999;
>>>      // Some stuff that doesn’t prevent the optimization
>>> }
>>>
>>> becomes:
>>>
>>> array[idx] = 999;
>>> for (int i = 0; i < 1000; i++) {
>>>      // Some stuff
>>> }
>>>
>>> The change in memnode.cpp is useful to clean up code generated from test_after_5 because the stores are moved out of the loop only after the loop is split and unrolled. That code removes duplicate stores.
>>>
>>> Roland.
>>>
>