RFR(M): 7197327: 40% regression on 8 b41 comp 8 b40 on specjvm2008.mpegaudio on oob
Vladimir Kozlov
vladimir.kozlov at oracle.com
Mon Jan 21 14:53:08 PST 2013
Roland,
What is effect on refworkload?
I would also use it for trigonometric and math (sqrt, log) intrinsics
(as separate RFE after checking performance effect).
I like you factoring of sorting code in sources.
Why ciConstantPoolCache::_keys is array of intptr_t and not int type
elements? We'r wasting space in 64bit VM and do conversion each time we
access elements.
In growableArray.hpp call f(value, key) only once.
In loopnode.cpp you miss 'return;' in second "if (stop_early)".
It is very expensive to do build_loop_* when stop_early is true. Could
you check before that if you have Expensive nodes with the same data
inputs instead of just number of such nodes? Your list sorting should be
cheaper. May be instead of flat list you should create list of pairs
which have same data inputs, it will simplify processing code.
Anyway, I think new process_expensive_nodes() method is too complex for
cases which are very very rare. I think you should narrow cases for this
optimization:
c1 = get_control(n1);
c2 = get_control(n2);
if (is_dominator(c1, c2)) {
c2 = c1;
} else if (is_dominator(c2, c1)) {
c1 = c2;
} else if (c1->is_Proj() && c1->in(0)->is_If() &&
is_dominator(c1->in(0), c2)) {
c1 = c2 = c1->in(0);
} else if (c2->is_Proj() && c2->in(0)->is_If() &&
is_dominator(c2->in(0), c1)) {
c1 = c2 = c2->in(0);
}
if (n1->in(0) != c1) {
n1->set_req(0, c1);
}
if (n2->in(0) != c2) {
n2->set_req(0, c2);
}
Note, when you skip UNC you don't check min_dom_depth but some data
inputs may depend on this UNC. So it may be not safe.
Thanks,
Vladimir
On 1/21/13 2:59 AM, Roland Westrelin wrote:
> One of the method has the following structure:
>
> for ( .. ) {
> ..
> if ( ..) {
> ..
> } else {
> if ( .. ) {
> if ( .. ) {
> ..
> } else {
> if ( .. ) {
> ..
> } else {
> Math.pow(x, y);
> }
> }
> } else {
> if ( .. ) {
> ..
> } else {
> Math.pow(x, y);
> }
> }
> ..
> }
>
> Both Math.pow(x,y) have the same inputs and so a single PowDNode is kept and it's scheduled in the else of the outer most if. So the pow computation is executed independently of the other if conditions, more frequently and because the computation is expensive there's a noticeable performance regression.
>
> The fix consists in:
>
> 1) setting the control input of the expensive nodes (PowDNode, ExpDNode) to prevent IGVN to freely common nodes
> 2) During the loop optimization pass, consider each expensive node and, when possible, modify the control input to allow optimization by the IGVN while making sure it's not executed more frequently
>
> http://cr.openjdk.java.net/~roland/7197327/
>
> To test it, given it's quite rare to have 2 PowDNodes with the same inputs, I applied the same technique to a bunch of other nodes:
>
> http://cr.openjdk.java.net/~roland/7197327/webrev.test/
>
> Roland.
>
More information about the hotspot-compiler-dev
mailing list