RFR(L): 8186027: C2: loop strip mining

Fri Oct 27 21:19:45 UTC 2017

First observations.

src/hotspot/share/opto/c2_globals.hpp

We have uint and int types for flags now. Don't use uintx, which is 64-bit.

src/hotspot/share/runtime/arguments.cpp

I agree that UseCountedLoopSafepoints should enable strip mining by default. I am concern about 
enabling UseCountedLoopSafepoints by default. I will look on performance data late. But for 
regular/nightly testing we need to add special testing with it on and off.

src/hotspot/share/opto/loopnode.hpp

Should we just make _loop_flags field type uint (32-bit) since we hit 16-bit limit?

There is confusion (because you did not have enough bits?) about which loops are marked as 
strip_mined. I thought it is only inner loop but it looks like out (skeleton) loop also marked as 
such. I would suggest to mark them differently.

I was thinking may be we should create new Loop node subclass for outer loop. Then you don't need 
special flag for it and it will be obvious what they are in Ideal Graph. The same for outer loop end 
node.

src/hotspot/share/opto/superword.cpp

Where next change come from?

+      if (t2->Opcode() == Op_AddI && t2 == _lp->as_CountedLoop()->incr()) continue; // don't mess 
with the iv

Thanks,
Vladimir

On 10/27/17 8:50 AM, Vladimir Kozlov wrote:
> I ran pre-integration testing with latest webrev.01 and it passed.
> But, give me more time to look though changes.
> 
> Thanks,
> Vladimir
> 
> On 10/25/17 7:29 AM, Roland Westrelin wrote:
>>
>> Hi Vladimir,
>>
>> Thanks for looking at this.
>>
>>> Did you consider less intrusive approach by adding branch over
>>> SafePoint with masking on index variable?
>>>
>>>     int mask = LoopStripMiningMask * inc; // simplified
>>>     for (int i = start; i < stop; i += inc) {
>>>        // body
>>>        if (i & mask != 0) continue;
>>>        safepoint;
>>>     }
>>>
>>> Or may be doing it inside .ad file in new SafePoint node
>>> implementation so that ideal graph is not affected.
>>
>> We're looking for the best trade off between latency and thoughput: we
>> want the safepoint poll overhead to be entirely eliminated even when the
>> safepoint doesn't trigger.
>>
>>> I am concern that suggested changes may affect Range Check elimination
>>> (you changed limit to variable value/flag) in addition to complexity
>>> of changes which may affect stability of C2.
>>
>> The CountedLoop that is created with my patch is strictly identical to
>> the CountedLoop created today with -UseCountedLoopSafepoints. Bounds are
>> not changed at that time. They are left as they are today. The
>> difference, with loop strip mining, is that the counted loop has a
>> skeleton outer loop. The bounds of the counted loop are adjusted once
>> loop opts are over. If the counted loop has a predicate, the predicate
>> is moved out of loop just as it is today. The only difference with
>> today, is that the predicate should be moved out of the outer loop. If a
>> pre and post loop needs to be created, then the only difference with
>> today is that the clones need to be moved out of the outer loop and
>> logic that locate the pre from the main loop need to account for the
>> outer loop.
>>
>> It's obviously a complex change so if your primary concern is stability
>> then loop strip mining can be disabled by default. Assuming strip mining
>> off, then that patch is mostly some code refactoring and some logic that
>> never triggers.
>>
>> Roland.
>>