RFR(M): 7197327: 40% regression on 8 b41 comp 8 b40 on specjvm2008.mpegaudio on oob

Mon Jan 21 15:33:24 PST 2013

On 1/21/13 2:53 PM, Vladimir Kozlov wrote:
> Roland,
>
> What is effect on refworkload?
>
> I would also use it for trigonometric and math (sqrt, log) intrinsics
> (as separate RFE after checking performance effect).
>
> I like you factoring of sorting code in sources.
>
> Why ciConstantPoolCache::_keys is array of intptr_t and not int type
> elements? We'r wasting space in 64bit VM and do conversion each time we
> access elements.
>
> In growableArray.hpp call f(value, key) only once.
>
> In loopnode.cpp you miss 'return;' in second "if (stop_early)".
>
> It is very expensive to do build_loop_* when stop_early is true. Could
> you check before that if you have Expensive nodes with the same data
> inputs instead of just number of such nodes? Your list sorting should be
> cheaper. May be instead of flat list you should create list of pairs
> which have same data inputs, it will simplify processing code.

And you should remove control edge if there are no pairs after 
incremental inlining finished (no new Expensive node added).

>
> Anyway, I think new process_expensive_nodes() method is too complex for
> cases which are very very rare. I think you should narrow cases for this
> optimization:

Also finding best control (move from loop) for Expensive nodes could be 
done in build_loop_early() method if data inputs dominate loop and the 
loop head is the control edge of Expensive node.

Thanks,
Vladimir

>
>   c1 = get_control(n1);
>   c2 = get_control(n2);
>
>   if (is_dominator(c1, c2)) {
>     c2 = c1;
>   } else if (is_dominator(c2, c1)) {
>     c1 = c2;
>   } else if (c1->is_Proj() && c1->in(0)->is_If() &&
>              is_dominator(c1->in(0), c2)) {
>     c1 = c2 = c1->in(0);
>   } else if (c2->is_Proj() && c2->in(0)->is_If() &&
>              is_dominator(c2->in(0), c1)) {
>     c1 = c2 = c2->in(0);
>   }
>   if (n1->in(0) != c1) {
>     n1->set_req(0, c1);
>   }
>   if (n2->in(0) != c2) {
>     n2->set_req(0, c2);
>   }
>
> Note, when you skip UNC you don't check min_dom_depth but some data
> inputs may depend on this UNC. So it may be not safe.
>
> Thanks,
> Vladimir
>
> On 1/21/13 2:59 AM, Roland Westrelin wrote:
>> One of the method has the following structure:
>>
>> for ( .. ) {
>>    ..
>>    if ( ..) {
>>      ..
>>    } else {
>>      if ( .. ) {
>>        if ( .. ) {
>>          ..
>>        } else {
>>          if ( .. ) {
>>            ..
>>          } else {
>>            Math.pow(x, y);
>>          }
>>        }
>>      } else {
>>        if ( .. ) {
>>          ..
>>        } else {
>>          Math.pow(x, y);
>>        }
>>    }
>>    ..
>> }
>>
>> Both Math.pow(x,y) have the same inputs and so a single PowDNode is
>> kept and it's scheduled in the else of the outer most if. So the pow
>> computation is executed independently of the other if conditions, more
>> frequently and because the computation is expensive there's a
>> noticeable performance regression.
>>
>> The fix consists in:
>>
>> 1) setting the control input of the expensive nodes (PowDNode,
>> ExpDNode) to prevent IGVN to freely common nodes
>> 2) During the loop optimization pass, consider each expensive node
>> and, when possible, modify the control input to allow optimization by
>> the IGVN while making sure it's not executed more frequently
>>
>> http://cr.openjdk.java.net/~roland/7197327/
>>
>> To test it, given it's quite rare to have 2 PowDNodes with the same
>> inputs, I applied the same technique to a bunch of other nodes:
>>
>> http://cr.openjdk.java.net/~roland/7197327/webrev.test/
>>
>> Roland.
>>