RFR 8080325 SuperWord loop unrolling analysis
Vladimir Kozlov
vladimir.kozlov at oracle.com
Thu Jun 4 00:25:27 UTC 2015
Thank you, Michael, for this contribution.
First, I am fine with such approach - call superword only to collect
data about loop.
It could be useful for superword optimization to have a pass over loop's
nodes to determine if it could be vectorize as separate phase. And avoid
to do that in SLP analysis.
Make SuperWordLoopUnrollAnalysis flag's default value to 'false' and set
it to true only in vm_version_x86.cpp (#ifdef COMPILER2) so that you
don't need to modify setting on all platforms.
In flag description say what 'slp' means (Superword Level Parallelism).
Code style:
if (cl->is_reduction_loop() == false) phase->mark_reductions(this);
should be:
if (!cl->is_reduction_loop()) {
phase->mark_reductions(this);
}
An other one: cl->has_passed_slp() == false. We use ! for such cases.
There are following checks after new code which prevent unrolling. Why
you do analysis before them without affecting their decision? And you
use the result of analysis only later at the end of method. Why not do
analysis there then?
(_local_loop_unroll_factor > 4) check should be done inside
policy_slp_max_unroll() to have only one check. Actually all next code
(lines 668-692) could be done at the end of policy_slp_max_unroll().
And you don't need to return bool then (I changed name too):
665 if (LoopMaxUnroll > _local_loop_unroll_factor) {
666 // Once policy_slp_analysis succeeds, mark the loop with the
667 // maximal unroll factor so that we minimize analysis passes
668 policy_unroll_slp_analysis(cl, phase);
693 }
694 }
695
696 // Check for initial stride being a small enough constant
697 if (abs(cl->stride_con()) > (1<<2)*future_unroll_ct) return false;
I think slp analysis code should be in superword.cpp - SWPointer should
be used only there. Just add an other method to SuperWord class.
Instead of a->Afree() and next:
810 Arena *a = Thread::current()->resource_area();
812 size_t ignored_size = _body.size()*sizeof(int*);
813 int *ignored_loop_nodes = (int*)a->Amalloc_D(ignored_size);
814 Node_Stack nstack((int)ignored_size);
use:
ResourceMark rm;
size_t ignored_size = _body.size();
int *ignored_loop_nodes = NEW_RESOURCE_ARRAY(int, ignored_size);
Node_Stack nstack((int)ignored_size);
Node_Stack should take number of nodes and not bytes.
I am concern about nstack.clear() because you have poped all nodes on it.
Thanks,
Vladimir
On 5/13/15 6:26 PM, Berg, Michael C wrote:
> Hi Folks,
>
> We (Intel) would like to contribute superword loop unrolling analysis to
> facilitate more default unrolling and larger SIMD vector mapping.
> Please review this patch and comment as needed:
>
> Bug-id: https://bugs.openjdk.java.net/browse/JDK-8080325
>
> webrev:
>
> http://cr.openjdk.java.net/~kvn/8080325/webrev/
>
> The design finds the highest common vector supported and implemented on
> a given machine and applies that to unrolling, iff it is greater than
> the default. If the user gates unrolling we will still obey the user
> directive. It’s light weight, when we get to the analysis part, if we
> fail, we stop further tries. If we succeed we stop further tries. We
> generally always analyze only once. We then gate the unroll factor by
> extending the size of the unroll segment so that the iterative tries
> will keep unrolling, obtaining larger unrolls of a targeted loop. I see
> no negative behavior wrt to performance, and a modest positive swing in
> default behavior up to 1.5x for some micros.
>
> Vladimir Koslov has offered to sponsor this patch.
>
More information about the hotspot-compiler-dev
mailing list