RFR(M): 8080289: Intermediate writes in a loop not eliminated by optimizer

Roland Westrelin roland.westrelin at oracle.com
Wed Jun 17 08:35:47 UTC 2015


>> That’s what I think the code does. That is if you have:
>> 
>> st1->st2->st3->st4
> 
> I assume st4 is first store and st1 is last. Right?

Program order is:
st4
st3
st2
st1

>> and st3 is redundant with st1, the chain should become:
>> 
>> st1->st2->st4
> 
> I am not sure it is correct optimization. On some machines result of st3 could be visible before result of st2. And you change it.
> I am suggesting not do that. Do you need that for stores move from loop?

It’s not required. It cleans up the graph in some cases like this:

    static void test_after_5(int idx) {
        for (int i = 0; i < 1000; i++) {
            array[idx] = i;
            array[idx+1] = i;
            array[idx+2] = i;
            array[idx+3] = i;
            array[idx+4] = i;
            array[idx+5] = i;
        }
    }

all stores are sunk out of the loop but that happens after iteration splitting and so there are multiple redundant copies of each store that are not collapsed. 

This said, we currently reorder the stores even if it’s less aggressive than what I’m proposing. With program:

st4
st3
st2
st1

If st1, st3 and st4 are on one slice and st2 is on another and if st1 and st3 store to the same address we optimize st3 out:

st4
st2
st1

so st3=st1 may only be visible after st2.

Also, the way I read the first table in this:

http://gee.cs.oswego.edu/dl/jmm/cookbook.html

it’s allowed to reorder normal stores with normal stores

>> so we need to change the memory input of st2 when we find st3 can be removed. In the code, at that point, this=st1, st = st3 and prev=st2.
> 
> In this case the code should be:
> 
>   if (st->in(MemNode::Address)->eqv_uncast(address) &&
>     ...
>   } else {
>     prev = st;
>   }
> 
> to update 'prev' with 'st' only if 'st' is not removed.

You’re right.

> Also, I think, st->in(MemNode::Memory) could be put in local var since it is used several times in this code.
> 
>> 
>>> You need to set improved = true since 'this' will not change. We also use 'make_progress' variable's name in such cases.
>> 
>> In the example above, if we remove st2, we modify this, right?
> 
> We need to call Ideal() again if store inputs are changed. So if st2 is removed then inputs of st1 are changed so we need to rerun Ideal(). This allow to avoid having your new loop in the Ideal().

Sorry, I don’t understand this. Are you saying there’s no need for a loop at all? Or are you saying that as soon as there’s a similar store that is found we should return from Ideal that will be called again to maybe find other similar stores?

>> We’ll find a path from the head that doesn’t go through the store and that exits the loop. What the comment doesn’t say is that with example 2 below:
>> 
>> for (int i = 0; i < 10; i++) {
>>   if (some_condition) {
>>     uncommon_trap();
>>   }
>>   array[idx] = 999;
>> }
>> 
>> my verification code would find the early exit as well.
>> 
>> It’s verification code only because if we have example 1 above, then we have a memory Phi to merge both branches of the if. So the pattern that we look for in PhaseIdealLoop::try_move_store_before_loop() won’t match: the loop’s memory Phi backedge won’t be the store. If we have example 2 above, then the loop’s memory Phi doesn’t have a single memory use. So I don’t think we need to check that the store post dominate the loop head in product. That’s my reasoning anyway and the verification code is there to verify it.
> 
> I missed 'mem->in(LoopNode::LoopBackControl) == n' condition. Which reduce cases only to one store to this address in the loop - good.
> 
> How you check in product VM that there are no other exists from a loop (your example 2)? Is it guarded by mem->outcnt() == 1 check?

Yes.

>>> Should you check phi == NULL instead of assert to make sure you have only one Phi node?
>> 
>> Can there be more than one memory Phi for a particular slice that has in(0) == n_loop->_head?
>> I would have expected that to be impossible.
> 
> BOTTOM (all slices) Phi?

Wouldn’t there be a MergeMem between the store and the Phi then?

For the record, the webrev:

http://cr.openjdk.java.net/~roland/8080289/webrev.00/

Roland.



More information about the hotspot-compiler-dev mailing list