(Resend with description) Re: Request for review (XS): 6837146 Should perform unswitch before maximally unroll in loop transformation
Changpeng Fang
Changpeng.Fang at Sun.COM
Tue May 5 12:29:06 PDT 2009
With the modified version of Test1, I saw a 3% improvement on x86 with
my change.
While the improvement itself is not impressive, I am just thinking the
logic should be
that unswitch should be performed before fully unroll.
I remember that we can not eliminate redundant store if someone else
uses the memory
state of the first store. However, sometime, that memory state usage
is not real use, it is
just created during unrolling (the actual usage should be the Phi Node).
I have an initial
idea to detect such wrong memory state usage by dependence analysis, but
I am afraid
this will schedule most loads to the beginning of the loop in some case.
Thanks,
Changpeng
On 05/05/09 11:49, Tom Rodriguez wrote:
> I think your test case is going to be dominated by allocation since
> the work is so small. Don't reallocate the array each time and make
> the unknown into a static field. Maybe this:
>
> class Test1 {
>
> static int unknown;
>
> public static void init(int src[], int dst[]) {
> // initialize the arrays
> for (int i =0; i<4; i++) {
> dst[i] = i;
> if(unknown >= 5)
> src[i] = i; }
> }
>
> static int[] src = new int[4];
> static int[] dst = new int[4];
>
> public static void test() {
> init(src, dst);
> }
>
> public static void main(String[] args) {
> long start = System.currentTimeMillis();
> for (int i=0; i< 2000000; i++) {
> test();
> }
> long end = System.currentTimeMillis();
> System.out.println(end-start);
> }
> }
>
> The opto for that is kind of interesting because we end up unswitching
> and unrolling the 0 to 20000000 loop and produce code like this:
>
> 0e0 B10: # B10 B11 <- B9 B10 Loop: B10-B10 inner stride:
> not constant main of N249 Freq: 999988
> 0e0 + STW #0,[R_L4 + #12]
> 0e4 + STW R_L1,[R_L4 + #16]
> 0e8 + STW R_L3,[R_L4 + #20]
> 0ec + STW R_G4,[R_L4 + #24]
> 0f0 + STW R_L1,[R_L4 + #16]
> 0f4 + STW R_L3,[R_L4 + #20]
> 0f8 + STW R_G4,[R_L4 + #24]
> ....
>
> Surprisingly we don't eliminate the redundant stores though I believe
> we have code does that. The other unswitched loop is more interesting
> with a similar problem:
>
> 300 B32: # B32 B33 <- B31 B32 Loop: B32-B32 inner stride:
> not constant main of N220 Freq: 0.476827
> 300 + STW #0,[R_L4 + #12]
> 304 + STW #0,[R_L7 + #12]
> 308 + STW R_L1,[R_L4 + #16]
> 30c + STW R_L1,[R_L7 + #16]
> 310 + STW R_L3,[R_L4 + #20]
> 314 + STW R_L3,[R_L7 + #20]
> 318 + STW R_G4,[R_L4 + #24]
> 31c + STW R_G4,[R_L7 + #24]
> 320 + STW #0,[R_L4 + #12]
> 324 + STW #0,[R_L7 + #12]
> 328 + STW R_L1,[R_L4 + #16]
> 32c + STW R_L1,[R_L7 + #16]
> 330 + STW R_L3,[R_L4 + #20]
> 334 + STW R_L3,[R_L7 + #20]
> 338 + STW R_G4,[R_L4 + #24]
> 33c + STW R_G4,[R_L7 + #24]
>
> We can't know that L4 and L7 point to different objects but since
> we're storing the same value we could in fact eliminate the later
> stores. Neither of these pieces of code are that representative I
> think but they it's surprising we do badly on them.
>
> I think the version of you test below might give you measurable results.
>
> class Test1 {
>
> static int unknown;
>
> public static void init(int src[], int dst[]) {
> // initialize the arrays
> for (int i =0; i<4; i++) {
> dst[i] = i;
> if(unknown >= 5)
> src[i] = i; }
> }
>
> static int[] src = new int[4];
> static int[] dst = new int[4];
>
> public static void test() {
> init(src, dst);
> }
>
> public static void main(String[] args) {
> long start = System.currentTimeMillis();
> for (int i=0; i< 2000000; i++) {
> unknown = i % 10;
> test();
> }
> long end = System.currentTimeMillis();
> System.out.println(end-start);
> }
> }
>
> tom
>
> On May 5, 2009, at 11:23 AM, Changpeng Fang wrote:
>
>>
>> On 05/04/09 16:34, Tom Rodriguez wrote:
>>> This seems ok. Was there an example that motivated this change?
>>>
>> For the following Test1.java with -XX:CompileOnly=Test1.init, you
>> should see
>> the loop is not unswitched. However, I could not observe perform
>> difference
>> with my proposed change even though the loop is unswitched after the
>> change.
>>
>> Thanks,
>>
>> Changpeng
>>
>> class Test1 {
>>
>> public static void init(int src[], int dst[], int unknown) {
>> // initialize the arrays
>> for (int i =0; i<4; i++) {
>> dst[i] = i;
>> if(unknown >= 5)
>> src[i] = i; }
>> }
>>
>> public static void test() {
>> int[] src = new int[4];
>> int[] dst = new int[4];
>>
>> init(src, dst, 5);
>> }
>>
>> public static void main(String[] args) {
>> long start = System.currentTimeMillis();
>> for (int i=0; i< 2000000; i++) {
>> test();
>> }
>> long end = System.currentTimeMillis();
>> System.out.println(end-start);
>> }
>> }
>>
>>
>>
>>
>>
>>
>>
>>
>>> On May 4, 2009, at 3:49 PM, Changpeng Fang wrote:
>>>
>>>> http://cr.openjdk.java.net/~cfang/6837146/webrev.00/
>>>>
>>>> Problem Summary:
>>>> The concern is the ordering of maximally unroll and unswitch of a
>>>> loop in loop transformation.
>>>> Current implementation is that maximally unroll is performed before
>>>> loop unswitch. The problem
>>>> is if a loop is maximally unrolled (fully unrolled), it will never
>>>> be unswitched. This will leave
>>>> many conditional statements (blocks) in the code which should not
>>>> appear if the loop can be
>>>> unswitched.
>>>>
>>>> Proposed Solution:
>>>> Change the ordering of unswitch and maximally_unroll in loop
>>>> transformation. After unswitch,
>>>> the loop should still be able to be maximally unrolled.
>>>>
>>>> Thanks,
>>>>
>>>> Changpeng
>>>>
>>>>
>>>>
>>>> On 05/04/09 15:05, Changpeng Fang wrote:
>>>>> http://cr.openjdk.java.net/~cfang/6837146/webrev.00/
>>>>>
>>>>> Summary: Change the ordering of do_unswitch and
>>>>> do_maximally_unroll in loop transformation.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Changpeng
>>>>>
>>>>
>>>
>>
>
More information about the hotspot-compiler-dev
mailing list