RFR(M): 7197327: 40% regression on 8 b41 comp 8 b40 on specjvm2008.mpegaudio on oob

Mon Jan 21 14:53:08 PST 2013

Roland,

What is the effect on refworkload?

I would also use it for the trigonometric and other math (sqrt, log)
intrinsics (as a separate RFE, after checking the performance effect).

I like your factoring of the sorting code in the sources.

Why is ciConstantPoolCache::_keys an array of intptr_t and not of int
elements? We're wasting space in the 64-bit VM and doing a conversion
each time we access elements.

In growableArray.hpp call f(value, key) only once.
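
Assuming the comment refers to a sorted-lookup helper, the idea is to
evaluate the comparator once per probe, keep the result, and branch on
its sign. This is a minimal standalone sketch, not the code in the
webrev; `find_sorted`, `compare_int` and the `calls` counter are
hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical comparator convention: negative if value < key, zero if equal,
// positive if value > key.
typedef int (*CompareFn)(const int& value, const int& key);

static int calls = 0;  // counts comparator invocations, for illustration only

static int compare_int(const int& value, const int& key) {
  ++calls;
  return (value > key) - (value < key);
}

// Binary search over a sorted vector. Returns the index of `key` when present
// (found = true), otherwise the position where it would be inserted.
static int find_sorted(const std::vector<int>& data, int key, CompareFn f, bool& found) {
  found = false;
  int lo = 0;
  int hi = static_cast<int>(data.size()) - 1;
  while (lo <= hi) {
    int mid = lo + (hi - lo) / 2;
    int diff = f(data[mid], key);  // evaluate once, then branch on the stored result
    if (diff > 0) {
      hi = mid - 1;
    } else if (diff < 0) {
      lo = mid + 1;
    } else {
      found = true;
      return mid;
    }
  }
  return lo;
}
```

Searching {2, 3, 5, 7, 11, 13} for 7 returns index 3 after three
comparator calls; searching for 6 returns the insertion point 3 with
found == false, also after three calls.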

In loopnode.cpp you are missing a 'return;' in the second "if (stop_early)".

It is very expensive to run build_loop_* when stop_early is true. Could
you first check whether there are Expensive nodes with the same data
inputs, instead of just counting such nodes? Sorting your list should be
cheaper. Maybe, instead of a flat list, you should create a list of pairs
that have the same data inputs; it would simplify the processing code.
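
One way to read this suggestion: sort the expensive nodes by opcode and
data inputs, then collect runs of identical entries. If no run has two
or more nodes, nothing can be commoned and the build_loop_* work can be
skipped. This is a minimal sketch on a made-up `ExpNode` type (node id,
opcode and data-input ids, with the control input deliberately left
out); it is not C2's Node class, and `group_candidates` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical, simplified expensive node: opcode plus data-input ids.
// The control input is excluded on purpose: it is what we may change.
struct ExpNode {
  int id;
  int opcode;
  std::vector<int> data_inputs;
};

// Orders by opcode, then lexicographically by data inputs, so nodes that
// could be commoned become adjacent after sorting.
static bool less_by_data_inputs(const ExpNode* a, const ExpNode* b) {
  if (a->opcode != b->opcode) return a->opcode < b->opcode;
  return a->data_inputs < b->data_inputs;
}

static bool same_data_inputs(const ExpNode* a, const ExpNode* b) {
  return a->opcode == b->opcode && a->data_inputs == b->data_inputs;
}

// Returns groups of two or more nodes sharing opcode and data inputs.
// An empty result means there is nothing to common.
static std::vector<std::vector<const ExpNode*>> group_candidates(
    std::vector<const ExpNode*> nodes) {
  std::sort(nodes.begin(), nodes.end(), less_by_data_inputs);
  std::vector<std::vector<const ExpNode*>> groups;
  size_t i = 0;
  while (i < nodes.size()) {
    size_t j = i + 1;
    while (j < nodes.size() && same_data_inputs(nodes[i], nodes[j])) ++j;
    if (j - i >= 2) {
      groups.emplace_back(nodes.begin() + i, nodes.begin() + j);
    }
    i = j;  // continue with the next run
  }
  return groups;
}
```

With the pairs (or larger groups) built up front, the later processing
only has to look at nodes that are actually candidates for commoning.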

Anyway, I think the new process_expensive_nodes() method is too complex
for cases which are very rare. I think you should narrow the cases for
this optimization:

  c1 = get_control(n1);
  c2 = get_control(n2);

  if (is_dominator(c1, c2)) {
    c2 = c1;
  } else if (is_dominator(c2, c1)) {
    c1 = c2;
  } else if (c1->is_Proj() && c1->in(0)->is_If() &&
             is_dominator(c1->in(0), c2)) {
    c1 = c2 = c1->in(0);
  } else if (c2->is_Proj() && c2->in(0)->is_If() &&
             is_dominator(c2->in(0), c1)) {
    c1 = c2 = c2->in(0);
  }
  if (n1->in(0) != c1) {
    n1->set_req(0, c1);
  }
  if (n2->in(0) != c2) {
    n2->set_req(0, c2);
  }
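
The narrowed rule above can be exercised outside the VM with a toy
dominator tree. In this sketch `Ctrl`, `is_dominator` and
`common_control` are hypothetical stand-ins for C2's control nodes and
the is_dominator() check used above; a projection of an If is modeled
by a non-null `if_node`, and a null result means both nodes keep their
current control.

```cpp
#include <cassert>

// Toy control node (hypothetical): `idom` is its immediate dominator;
// `if_node` is non-null only for a projection (branch) of an If.
struct Ctrl {
  const char* name;
  Ctrl* idom;
  Ctrl* if_node;
};

// True if `a` dominates `b` (reflexively), by walking b's idom chain.
static bool is_dominator(const Ctrl* a, const Ctrl* b) {
  for (const Ctrl* c = b; c != nullptr; c = c->idom) {
    if (c == a) return true;
  }
  return false;
}

// Mirrors the four cases of the narrowed rule: returns the control both
// nodes should share, or nullptr when none applies.
static Ctrl* common_control(Ctrl* c1, Ctrl* c2) {
  if (is_dominator(c1, c2)) return c1;
  if (is_dominator(c2, c1)) return c2;
  if (c1->if_node != nullptr && is_dominator(c1->if_node, c2)) return c1->if_node;
  if (c2->if_node != nullptr && is_dominator(c2->if_node, c1)) return c2->if_node;
  return nullptr;
}
```

With an If whose true projection is `t` and a block `inner_f` under its
false projection, common_control(&t, &inner_f) yields the If; two
non-projection blocks in different branches yield nullptr and are left
alone.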

Note that when you skip an uncommon trap (UNC) you don't check
min_dom_depth, but some data inputs may depend on this UNC, so it may
not be safe.

Thanks,
Vladimir

On 1/21/13 2:59 AM, Roland Westrelin wrote:
> One of the methods has the following structure:
>
> for ( .. ) {
>    ..
>    if ( ..) {
>      ..
>    } else {
>      if ( .. ) {
>        if ( .. ) {
>          ..
>        } else {
>          if ( .. ) {
>            ..
>          } else {
>            Math.pow(x, y);
>          }
>        }
>      } else {
>        if ( .. ) {
>          ..
>        } else {
>          Math.pow(x, y);
>        }
>      }
>    }
>    ..
> }
>
> Both Math.pow(x, y) calls have the same inputs, so only a single PowDNode is kept, and it's scheduled in the else branch of the outermost if. As a result, the pow computation is executed regardless of the inner if conditions, i.e. more frequently than in the source, and because the computation is expensive there's a noticeable performance regression.
>
> The fix consists of:
>
> 1) setting the control input of the expensive nodes (PowDNode, ExpDNode) to prevent IGVN from freely commoning them
> 2) during the loop optimization pass, considering each expensive node and, when possible, modifying its control input so that IGVN can common it, while making sure it's not executed more frequently
>
> http://cr.openjdk.java.net/~roland/7197327/
>
> To test it, given it's quite rare to have 2 PowDNodes with the same inputs, I applied the same technique to a bunch of other nodes:
>
> http://cr.openjdk.java.net/~roland/7197327/webrev.test/
>
> Roland.
>
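
Step 1) of the quoted fix depends on value numbering treating the
control input as part of a node's identity. The sketch below is a toy
model, not HotSpot's GVN: `Node` and `ValueTable` are hypothetical and
keyed on (opcode, control, data inputs), so two PowD-like nodes with the
same data inputs are commoned only once they share a control input,
which is what step 2) arranges when it is safe.

```cpp
#include <cassert>
#include <map>
#include <tuple>
#include <vector>

// Toy IR node (hypothetical): opcode, control input id (0 = no control),
// and the ids of its data inputs.
struct Node {
  int opcode;
  int control;
  std::vector<int> inputs;
};

// Toy value-numbering table (hypothetical). The lookup key includes the
// control input, so nodes that differ only in control are never merged.
class ValueTable {
  typedef std::tuple<int, int, std::vector<int>> Key;
  std::map<Key, const Node*> table_;

 public:
  // Returns the canonical node equivalent to n, registering n if it is new.
  const Node* find_or_insert(const Node* n) {
    Key key(n->opcode, n->control, n->inputs);
    auto it = table_.find(key);
    if (it != table_.end()) {
      return it->second;  // an equivalent node already exists: common it
    }
    table_.emplace(key, n);
    return n;
  }
};
```

Registering two nodes with opcode 1 and inputs {3, 4} under controls 10
and 20 keeps both; a third node with the same inputs under control 10
is mapped back to the first.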