RFR(M): 7197327: 40% regression on 8 b41 comp 8 b40 on specjvm2008.mpegaudio on oob

Tue Jan 22 00:39:04 PST 2013

Hi Vladimir,

Thanks for reviewing this.

> What is effect on ref workload?

I haven't tried yet.

> I would also use it for trigonometric and math (sqrt, log) intrinsics (as separate RFE after checking performance effect).

Ok. I will file an RFE.

> I like you factoring of sorting code in sources.

It doesn't look like you like how it's used in the new code so I guess I'll file an RFE for the clean-up if it's no longer related to this change.

> Why ciConstantPoolCache::_keys is array of intptr_t and not int type elements? We'r wasting space in 64bit VM and do conversion each time we access elements.

I'll file an RFE for this one as well.

> In growableArray.hpp call f(value, key) only once.
> 
> In loopnode.cpp you miss 'return;' in second "if (stop_early)".
> 
> It is very expensive to do build_loop_* when stop_early is true. Could you check before that if you have Expensive nodes with the same data inputs instead of just number of such nodes? Your list sorting should be cheaper. May be instead of flat list you should create list of pairs which have same data inputs, it will simplify processing code.

I don't understand what pairs you are referring to. There can be more than 2 identical expensive nodes in a method, right?

> Anyway, I think new process_expensive_nodes() method is too complex for cases which are very very rare. I think you should narrow cases for this optimization:

Are you suggesting I should drop all the process_expensive_nodes() and replace it by the code below that would be applied 2 by 2 to pair of nodes with same data inputs?

> c1 = get_control(n1);
> c2 = get_control(n2);
> 
> if (is_dominator(c1, c2)) {
>   c2 = c1;
> } else if (is_dominator(c2, c1)) {
>   c1 = c2;
> } else if (c1->is_Proj() && c1->in(0)->is_If() &&
>            is_dominator(c1->in(0), c2)) {
>   c1 = c2 = c1->in(0);
> } else if (c2->is_Proj() && c2->in(0)->is_If() &&
>            is_dominator(c2->in(0), c1)) {
>   c1 = c2 = c2->in(0);
> }
> if (n1->in(0) != c1) {
>   n1->set_req(0, c1);
> }
> if (n2->in(0) != c2) {
>   n2->set_req(0, c2);
> }

My code has 3 steps:

1- Move the node up the dominator tree along paths with same frequency
2- Handle If with same computation in both branches
3- Check for computation that dominates another computation

1- and 2- is done in a loop until no more progress.

You code above does 2- and 3-. Are you suggesting 1- can be dropped? Or that there's no need for 1- and 2- as separate steps in a loop?

I wrote the code in process_expensive_nodes() so that it optimizes this correctly:

    static double m1(double a, double b) {
        double r1 = 0, r2 = 0, r3 = 0;

        for(int i = 0; i < 10000; i++) {
            if (a > 0) {
                if (b > 0) {
                    if (b > a) {
                        r3 = Math.pow(a,b);
                        r2 = Math.exp(r3);
                    } else {
                        r3 = Math.pow(a,b);
                        if (b > a) {
                            m2();
                        }
                        r2 = Math.exp(r3);
                    }
                } else {
                    r3 = Math.pow(a,b);
                    r2 = Math.exp(r3);
                }
            } else {
                r3 = Math.pow(a, b);
                r2 = Math.exp(r3);
            }
        }
        r3 = Math.pow(a, b);
        r2 = Math.exp(r3);
        return r1 + r2 + r3;
    }

Roland.