(Resend with description) Re: Request for review (XS): 6837146 Should perform unswitch before maximally unroll in loop transformation
Tom Rodriguez
Thomas.Rodriguez at Sun.COM
Tue May 5 11:49:18 PDT 2009
I think your test case is going to be dominated by allocation since
the work is so small. Don't reallocate the array each time and make
the unknown into a static field. Maybe this:
class Test1 {
static int unknown;
public static void init(int src[], int dst[]) {
// initialize the arrays
for (int i =0; i<4; i++) {
dst[i] = i;
if(unknown >= 5)
src[i] = i; }
}
static int[] src = new int[4];
static int[] dst = new int[4];
public static void test() {
init(src, dst);
}
public static void main(String[] args) {
long start = System.currentTimeMillis();
for (int i=0; i< 2000000; i++) {
test();
}
long end = System.currentTimeMillis();
System.out.println(end-start);
}
}
The opto for that is kind of interesting because we end up unswitching
and unrolling the 0 to 20000000 loop and produce code like this:
0e0 B10: # B10 B11 <- B9 B10 Loop: B10-B10 inner stride:
not constant main of N249 Freq: 999988
0e0 + STW #0,[R_L4 + #12]
0e4 + STW R_L1,[R_L4 + #16]
0e8 + STW R_L3,[R_L4 + #20]
0ec + STW R_G4,[R_L4 + #24]
0f0 + STW R_L1,[R_L4 + #16]
0f4 + STW R_L3,[R_L4 + #20]
0f8 + STW R_G4,[R_L4 + #24]
....
Surprisingly we don't eliminate the redundant stores though I believe
we have code does that. The other unswitched loop is more interesting
with a similar problem:
300 B32: # B32 B33 <- B31 B32 Loop: B32-B32 inner stride:
not constant main of N220 Freq: 0.476827
300 + STW #0,[R_L4 + #12]
304 + STW #0,[R_L7 + #12]
308 + STW R_L1,[R_L4 + #16]
30c + STW R_L1,[R_L7 + #16]
310 + STW R_L3,[R_L4 + #20]
314 + STW R_L3,[R_L7 + #20]
318 + STW R_G4,[R_L4 + #24]
31c + STW R_G4,[R_L7 + #24]
320 + STW #0,[R_L4 + #12]
324 + STW #0,[R_L7 + #12]
328 + STW R_L1,[R_L4 + #16]
32c + STW R_L1,[R_L7 + #16]
330 + STW R_L3,[R_L4 + #20]
334 + STW R_L3,[R_L7 + #20]
338 + STW R_G4,[R_L4 + #24]
33c + STW R_G4,[R_L7 + #24]
We can't know that L4 and L7 point to different objects but since
we're storing the same value we could in fact eliminate the later
stores. Neither of these pieces of code are that representative I
think but they it's surprising we do badly on them.
I think the version of you test below might give you measurable results.
class Test1 {
static int unknown;
public static void init(int src[], int dst[]) {
// initialize the arrays
for (int i =0; i<4; i++) {
dst[i] = i;
if(unknown >= 5)
src[i] = i; }
}
static int[] src = new int[4];
static int[] dst = new int[4];
public static void test() {
init(src, dst);
}
public static void main(String[] args) {
long start = System.currentTimeMillis();
for (int i=0; i< 2000000; i++) {
unknown = i % 10;
test();
}
long end = System.currentTimeMillis();
System.out.println(end-start);
}
}
tom
On May 5, 2009, at 11:23 AM, Changpeng Fang wrote:
>
> On 05/04/09 16:34, Tom Rodriguez wrote:
>> This seems ok. Was there an example that motivated this change?
>>
> For the following Test1.java with -XX:CompileOnly=Test1.init, you
> should see
> the loop is not unswitched. However, I could not observe perform
> difference
> with my proposed change even though the loop is unswitched after the
> change.
>
> Thanks,
>
> Changpeng
>
> class Test1 {
>
> public static void init(int src[], int dst[], int unknown) {
> // initialize the arrays
> for (int i =0; i<4; i++) {
> dst[i] = i;
> if(unknown >= 5)
> src[i] = i; }
> }
>
> public static void test() {
> int[] src = new int[4];
> int[] dst = new int[4];
>
> init(src, dst, 5);
> }
>
> public static void main(String[] args) {
> long start = System.currentTimeMillis();
> for (int i=0; i< 2000000; i++) {
> test();
> }
> long end = System.currentTimeMillis();
> System.out.println(end-start);
> }
> }
>
>
>
>
>
>
>
>
>> On May 4, 2009, at 3:49 PM, Changpeng Fang wrote:
>>
>>> http://cr.openjdk.java.net/~cfang/6837146/webrev.00/
>>>
>>> Problem Summary:
>>> The concern is the ordering of maximally unroll and unswitch of a
>>> loop in loop transformation.
>>> Current implementation is that maximally unroll is performed
>>> before loop unswitch. The problem
>>> is if a loop is maximally unrolled (fully unrolled), it will never
>>> be unswitched. This will leave
>>> many conditional statements (blocks) in the code which should not
>>> appear if the loop can be
>>> unswitched.
>>>
>>> Proposed Solution:
>>> Change the ordering of unswitch and maximally_unroll in loop
>>> transformation. After unswitch,
>>> the loop should still be able to be maximally unrolled.
>>>
>>> Thanks,
>>>
>>> Changpeng
>>>
>>>
>>>
>>> On 05/04/09 15:05, Changpeng Fang wrote:
>>>> http://cr.openjdk.java.net/~cfang/6837146/webrev.00/
>>>>
>>>> Summary: Change the ordering of do_unswitch and
>>>> do_maximally_unroll in loop transformation.
>>>>
>>>> Thanks,
>>>>
>>>> Changpeng
>>>>
>>>
>>
>
More information about the hotspot-compiler-dev
mailing list