RFR 8080325 SuperWord loop unrolling analysis
Berg, Michael C
michael.c.berg at intel.com
Fri Jun 5 20:03:42 UTC 2015
Ok, will do, I will roll it into any comments I get from Roland. I have already changed my copy and will have in the final webrev.
-Michael
-----Original Message-----
From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
Sent: Friday, June 05, 2015 12:55 PM
To: Berg, Michael C; hotspot-compiler-dev at openjdk.java.net
Subject: Re: RFR 8080325 SuperWord loop unrolling analysis
Looks good. I would invert not_slp variable to positive is_slp -
if(is_slp) looks more readable.
Thanks,
Vladimir
On 6/4/15 9:45 PM, Berg, Michael C wrote:
> Vladimir, please find the following webrev with pretty much the full list of changes made.
> I made some improvements too. For instance I only allow the analysis to take place when we are trying to unroll beyond the default.
>
> http://cr.openjdk.java.net/~mcberg/8080325/webrev.01/
>
> Regards,
> -Michael
>
> -----Original Message-----
> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> Sent: Wednesday, June 03, 2015 5:25 PM
> To: Berg, Michael C; hotspot-compiler-dev at openjdk.java.net
> Subject: Re: RFR 8080325 SuperWord loop unrolling analysis
>
> Thank you, Michael, for this contribution.
>
> First, I am fine with such approach - call superword only to collect data about loop.
>
> It could be useful for superword optimization to have a pass over loop's nodes to determine if it could be vectorize as separate phase. And avoid to do that in SLP analysis.
>
> Make SuperWordLoopUnrollAnalysis flag's default value to 'false' and set it to true only in vm_version_x86.cpp (#ifdef COMPILER2) so that you don't need to modify setting on all platforms.
>
> In flag description say what 'slp' means (Superword Level Parallelism).
>
> Code style:
>
> if (cl->is_reduction_loop() == false) phase->mark_reductions(this);
>
> should be:
>
> if (!cl->is_reduction_loop()) {
> phase->mark_reductions(this);
> }
>
> An other one: cl->has_passed_slp() == false. We use ! for such cases.
>
> There are following checks after new code which prevent unrolling. Why you do analysis before them without affecting their decision? And you use the result of analysis only later at the end of method. Why not do analysis there then?
>
> (_local_loop_unroll_factor > 4) check should be done inside
> policy_slp_max_unroll() to have only one check. Actually all next code (lines 668-692) could be done at the end of policy_slp_max_unroll().
> And you don't need to return bool then (I changed name too):
>
> 665 if (LoopMaxUnroll > _local_loop_unroll_factor) {
> 666 // Once policy_slp_analysis succeeds, mark the loop with the
> 667 // maximal unroll factor so that we minimize analysis passes
> 668 policy_unroll_slp_analysis(cl, phase);
> 693 }
> 694 }
> 695
> 696 // Check for initial stride being a small enough constant
> 697 if (abs(cl->stride_con()) > (1<<2)*future_unroll_ct) return false;
>
> I think slp analysis code should be in superword.cpp - SWPointer should be used only there. Just add an other method to SuperWord class.
>
> Instead of a->Afree() and next:
> 810 Arena *a = Thread::current()->resource_area();
> 812 size_t ignored_size = _body.size()*sizeof(int*);
> 813 int *ignored_loop_nodes = (int*)a->Amalloc_D(ignored_size);
> 814 Node_Stack nstack((int)ignored_size);
>
> use:
>
> ResourceMark rm;
> size_t ignored_size = _body.size();
> int *ignored_loop_nodes = NEW_RESOURCE_ARRAY(int, ignored_size);
> Node_Stack nstack((int)ignored_size);
>
> Node_Stack should take number of nodes and not bytes.
>
> I am concern about nstack.clear() because you have poped all nodes on it.
>
> Thanks,
> Vladimir
>
> On 5/13/15 6:26 PM, Berg, Michael C wrote:
>> Hi Folks,
>>
>> We (Intel) would like to contribute superword loop unrolling analysis
>> to facilitate more default unrolling and larger SIMD vector mapping.
>> Please review this patch and comment as needed:
>>
>> Bug-id: https://bugs.openjdk.java.net/browse/JDK-8080325
>>
>> webrev:
>>
>> http://cr.openjdk.java.net/~kvn/8080325/webrev/
>>
>> The design finds the highest common vector supported and implemented
>> on a given machine and applies that to unrolling, iff it is greater
>> than the default. If the user gates unrolling we will still obey the
>> user directive. It's light weight, when we get to the analysis part,
>> if we fail, we stop further tries. If we succeed we stop further
>> tries. We generally always analyze only once. We then gate the unroll
>> factor by extending the size of the unroll segment so that the
>> iterative tries will keep unrolling, obtaining larger unrolls of a
>> targeted loop. I see no negative behavior wrt to performance, and a
>> modest positive swing in default behavior up to 1.5x for some micros.
>>
>> Vladimir Koslov has offered to sponsor this patch.
>>
More information about the hotspot-compiler-dev
mailing list