Request for reviews (XXL): 5091921: Sign flip issues in loop optimizer

Tom Rodriguez tom.rodriguez at oracle.com
Tue May 3 19:55:49 PDT 2011


On May 3, 2011, at 6:37 PM, Vladimir Kozlov wrote:

> Thank you very much, Tom
> 
> Tom Rodriguez wrote:
>> Overall it looks good, I think.  I believe I understand it for the most part, but I'm assuming that the tests you've added are getting us good coverage too.  There are still some perf regressions too, right? 
> 
> Yes, mostly SciMark Sparse and LU; I don't see an effect on jbb (in refworkload). And I'm expecting performance results from aurora done by the perf team. I talked with Dave and the perf team, and we accept the performance loss with this fix. I am also looking at the generated code, and we may recover some of the lost performance with lcm and igvn optimizations for CMove and Min/Max nodes.
> 
>> Would it be possible to factor out some of the big new pieces of code in loopTransform.cpp?  I know it's not always easy, since there's often a lot of state required, but if there's anything you can do, that would be great.  You don't have to do it as part of this change if you want to do it later.
> 
> Yes, I will do that cleanup later. I did a lot of testing with the current code and I don't want to modify it just before the push.

That's what I figured.

> 
>> I have a vague memory of the need for more optimizations around min and max nodes to reduce some of the overhead.  Presumably we don't need that because we're using predicates?
> 
> We still need to do more optimizations for min and max even with predicates, since not all RCE is done by predicates. Predicates currently do RCE only for array range checks.

Ok.

> 
>> loopTransform.cpp:
>> symetrical -> symmetrical
>> fliped -> flipped
>> guaranty -> guarantee
> 
> Corrected.
> 
>> What did you mean here?
>> +             // Use not adjusted previous limit as old limit.
> 
> I am trying to use the original (not adjusted) limit to avoid nested CMove nodes. After the first unroll the new limit is:
> 
> adj_limit1 = orig_limit - stride;
> new_limit1 = (orig_limit < adj_limit1) ? MININT : adj_limit1;
> 
> If I do a simple adjustment, the new limit for the next unroll is:
> 
> adj_limit2 = new_limit1 - stride;
> new_limit2 = (new_limit1 < adj_limit2) ? MININT : adj_limit2;
> 
> which would introduce nested CMove nodes, so I do the following instead:
> 
> adj_limit2 = adj_limit1 - stride;
> new_limit2 = (orig_limit < adj_limit2) ? MININT : adj_limit2;
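The two schemes above can be sketched in Java (method names and values are mine, not from the patch; assuming a positive stride, where `limit - stride` can wrap below MIN_INT):

```java
public class UnrollLimits {
    // Chained scheme: adjust the previous *clamped* limit; the compare
    // consumes the prior CMove's result, producing nested CMove nodes.
    static int chained(int prevLimit, int stride) {
        int adj = prevLimit - stride;
        return (prevLimit < adj) ? Integer.MIN_VALUE : adj;
    }

    // Flat scheme: adjust the previous *unclamped* limit and always
    // compare against the original limit, so each unroll adds one
    // independent CMove instead of nesting them.
    static int flat(int origLimit, int prevAdj, int stride) {
        int adj = prevAdj - stride;
        return (origLimit < adj) ? Integer.MIN_VALUE : adj;
    }

    public static void main(String[] args) {
        int orig = 100, stride = 2;
        int adj1 = orig - stride;                            // 98
        int lim1 = (orig < adj1) ? Integer.MIN_VALUE : adj1; // 98
        // In the common (non-overflow) case both schemes agree:
        System.out.println(chained(lim1, stride));           // 96
        System.out.println(flat(orig, adj1, stride));        // 96
    }
}
```

Both schemes produce the same safe value; the flat one just keeps the CMove chain depth constant across repeated unrolls.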

Use "original" or "unadjusted" instead.  "Not adjusted" doesn't sound right.

> 
>> loopnode.hpp:
>> What do you mean by Normal below?  That it has the normal meaning?  Just leave it out.
>> +   enum { Normal=0, Init=1, Limit=2, Stride=3 };
> 
> Removed. It was copy-paste typo :)
> 
>> macro.cpp:
>> It's not really on the macro list is it?
> 
> It IS on the macro list. What is the problem with having it on the macro list?
> 
>  LoopLimitNode( Compile* C, Node *init, Node *limit, Node *stride ) : Node(0,init,limit,stride) {
>    // Put it on the Macro nodes list to optimize during macro nodes expansion.
>    init_flags(Flag_is_macro);
>    C->add_macro_node(this);
>  }

I didn't notice that part.  It just doesn't seem very much like the other macros.  Why is it a macro?  It's not expanded by macro.cpp.  Is it just so that it's reprocessed by GVN at the end?  Why isn't it just expanded by macro.cpp then?  I'm a bit dubious of guards like this in the first place:

+   // Delay following optimizations until all loop optimizations
+   // done to keep Ideal graph simple.
+   if (!can_reshape || phase->C->major_progress())
+     return NULL;

It seems more like there should be a more general way to ask what phase we are in instead.  Anyway, I'm fine if you leave it as is.  It just seems odd.  Maybe macro is the wrong name for what that list has become.

tom

> 
> Thank you,
> Vladimir
> 
>> tom
>> On Apr 28, 2011, at 4:04 PM, Vladimir Kozlov wrote:
>>> http://cr.openjdk.java.net/~kvn/5091921/webrev
>>> 
>>> Fixed 5091921: Sign flip issues in loop optimizer
>>> 
>>> The loop optimizer in the Server VM does not take integer overflow into account when it generates code for loop limit calculations during counted loop construction, unrolling, and range check elimination. As a result, the generated code behaves incorrectly when an integer overflow happens.
>>> 
>>> 1. Added a limit check for counted loops which, if it fails, causes recompilation of the method without the counted loop. It is based on the loop predication code we already have, but we don't need to copy it after the corresponding counted loop is created (this is why a new flag is passed to the loop predicate copy methods).
>>> 
>>> if (limit >= max_int - stride)
>>>   uncommon_trap
>>> counted_loop init, limit, stride
>>> 
>>> 2. Long arithmetic is used to calculate the exact limit only when it is needed: for empty loop elimination and predicated range check elimination. A new ideal macro node, LoopLimitNode, is used, but a corresponding mach node is created only for 32-bit x86 to get better assembler code. SPARC and x64 have long registers, so the generated assembler is good enough without a specialized mach node. LoopLimitNode transformation is also delayed until all loop optimizations are done (by marking LoopLimitNode as a macro node).
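The long-arithmetic idea can be sketched as follows. This is an illustration only, not the actual LoopLimitNode code: `exactLimit` is a made-up name, and it assumes an up-counting loop `for (i = init; i < limit; i += stride)` with `init < limit`, `stride > 0`, and a representable result.

```java
public class ExactLimit {
    // Exact final value of the trip counter, computed in long so that
    // the intermediate (limit - init + stride - 1) cannot overflow int.
    static int exactLimit(int init, int limit, int stride) {
        long span  = (long) limit - init + stride - 1; // widen before adding
        long trips = span / stride;                    // ceil((limit-init)/stride)
        return (int) (init + trips * stride);          // value of i at loop exit
    }
}
```

Doing the intermediate arithmetic in 64 bits is what makes the specialized mach node worthwhile on 32-bit x86, while SPARC and x64 handle the widened operations natively.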
>>> 
>>> 3. The new limit after unroll is calculated as:
>>> 
>>> new_limit = limit-stride
>>> new_limit = (limit < new_limit) ? MININT : new_limit;
>>> 
>>> instead of the current expression:
>>> 
>>> new_limit = init + (((limit-init)/stride)&(-2))*stride
>>> 
>>> Added other checks to avoid limit value overflow during unroll. Loops with a trip count of up to 3 are now also fully unrolled.
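The clamping expression in item 3 can be demonstrated directly (a sketch with hypothetical values; `newUnrollLimit` is a made-up name): when `limit - stride` wraps past MIN_INT, the compare detects it and the CMove clamps the result.

```java
public class UnrollClamp {
    // New unrolled-limit computation: if (limit - stride) wraps below
    // MIN_INT, then (limit < adjusted) holds and the result is clamped
    // to MIN_INT instead of a wrapped-around large positive value.
    static int newUnrollLimit(int limit, int stride) {
        int adjusted = limit - stride; // may wrap for limit near MIN_INT
        return (limit < adjusted) ? Integer.MIN_VALUE : adjusted;
    }
}
```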
>>> 
>>> 4. Added an underflow check for normal compares in loops which are optimized out using the range check elimination code (I also refactored the code):
>>> 
>>> MIN_INT <= scale*I+offset < limit
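A small Java illustration of the underflow being guarded against (method names and values are made up): when `scale*i + offset` wraps below MIN_INT in int arithmetic, a naive in-int compare can wrongly report the value as in range, while the long-arithmetic check matches the condition above.

```java
public class RangeCheck {
    // Naive in-int compare: scale*i + offset may underflow and wrap to
    // a large positive value, making "v < limit" pass when it should not.
    static boolean inRangeNaive(int scale, int i, int offset, int limit) {
        int v = scale * i + offset;          // wraps on underflow
        return v < limit;
    }

    // Exact check in long arithmetic: MIN_INT <= scale*i + offset < limit.
    static boolean inRangeExact(int scale, int i, int offset, int limit) {
        long v = (long) scale * i + offset;  // no wrap in 64 bits
        return v >= Integer.MIN_VALUE && v < limit;
    }
}
```

For example, `-2 * 2000000000` is mathematically -4000000000, below MIN_INT, but in int arithmetic it wraps to +294967296 and slips under a limit of 300000000.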
>>> 
>>> ----------------------------------------------
>>> These changes are put under flags to help debug the associated problems. I also allowed printing of inlining decisions in the product VM, since PrintInlining is a diagnostic flag now. New regression tests are added based on the associated bug reports.
>>> 
>>> These changes cause performance regressions in benchmarks. Some of the regression cases cannot be avoided, but some we will address in the future.
>>> 
>>> And finally, I want to thank the Java VM team from HP, who provided nice test cases and even suggested a fix. I used their idea for the unroll limit fix.



More information about the hotspot-compiler-dev mailing list