RFR 8080325 SuperWord loop unrolling analysis

Thu Jun 4 00:25:27 UTC 2015

Thank you, Michael, for this contribution.

First, I am fine with such approach - call superword only to collect 
data about loop.

It could be useful for superword optimization to have a pass over loop's 
nodes to determine if it could be vectorize as separate phase. And avoid 
to do that in SLP analysis.

Make SuperWordLoopUnrollAnalysis flag's default value to 'false' and set 
it to true only in vm_version_x86.cpp (#ifdef COMPILER2) so that you 
don't need to modify setting on all platforms.

In flag description say what 'slp' means (Superword Level Parallelism).

Code style:

if (cl->is_reduction_loop() == false) phase->mark_reductions(this);

should be:

if (!cl->is_reduction_loop()) {
   phase->mark_reductions(this);
}

An other one: cl->has_passed_slp() == false. We use ! for such cases.

There are following checks after new code which prevent unrolling. Why 
you do analysis before them without affecting their decision? And you 
use the result of analysis only later at the end of method. Why not do 
analysis there then?

(_local_loop_unroll_factor > 4) check should be done inside 
policy_slp_max_unroll() to have only one check. Actually all next code 
(lines 668-692) could be done at the end of policy_slp_max_unroll().
And you don't need to return bool then (I changed name too):

  665     if (LoopMaxUnroll > _local_loop_unroll_factor) {
  666       // Once policy_slp_analysis succeeds, mark the loop with the
  667       // maximal unroll factor so that we minimize analysis passes
  668       policy_unroll_slp_analysis(cl, phase);
  693     }
  694   }
  695
  696   // Check for initial stride being a small enough constant
  697   if (abs(cl->stride_con()) > (1<<2)*future_unroll_ct) return false;

I think slp analysis code should be in superword.cpp - SWPointer should 
be used only there. Just add an other method to SuperWord class.

Instead of a->Afree() and next:
  810       Arena *a = Thread::current()->resource_area();
  812       size_t ignored_size = _body.size()*sizeof(int*);
  813       int *ignored_loop_nodes = (int*)a->Amalloc_D(ignored_size);
  814       Node_Stack nstack((int)ignored_size);

use:

    ResourceMark rm;
    size_t ignored_size = _body.size();
    int *ignored_loop_nodes = NEW_RESOURCE_ARRAY(int, ignored_size);
    Node_Stack nstack((int)ignored_size);

Node_Stack should take number of nodes and not bytes.

I am concern about nstack.clear() because you have poped all nodes on it.

Thanks,
Vladimir

On 5/13/15 6:26 PM, Berg, Michael C wrote:
> Hi Folks,
>
> We (Intel) would like to contribute superword loop unrolling analysis to
> facilitate more default unrolling and larger SIMD vector mapping.
> Please review this patch and comment as needed:
>
> Bug-id: https://bugs.openjdk.java.net/browse/JDK-8080325
>
> webrev:
>
> http://cr.openjdk.java.net/~kvn/8080325/webrev/
>
> The design finds the highest common vector supported and implemented on
> a given machine and applies that to unrolling, iff it is greater than
> the default. If the user gates unrolling we will still obey the user
> directive. It’s light weight, when we get to the analysis part, if we
> fail, we stop further tries. If we succeed we stop further tries. We
> generally always analyze only once. We then gate the unroll factor by
> extending the size of the unroll segment so that the iterative tries
> will keep unrolling, obtaining larger unrolls of a targeted loop. I see
> no negative behavior wrt to performance, and a modest positive swing in
> default behavior up to 1.5x for some micros.
>
> Vladimir Koslov has offered to sponsor this patch.
>