RFR(L): 8186027: C2: loop strip mining

Vladimir Kozlov vladimir.kozlov at oracle.com
Mon Nov 27 18:19:55 UTC 2017


Second pre-integration re-run finished and there were timeouts. It looks like it depends on which hosts were used to run the tests.

I agree to enable it by default on the subset of GCs which are sensitive to pauses: G1 and Shenandoah.

Roland, please prepare the changeset.

I am not sure we need a separate CR for setting the defaults. It is simple to file a bug and change only the defaults if we need to.

Thanks,
Vladimir

On 11/24/17 1:22 AM, Thomas Schatzl wrote:
> Hi Nils,
> 
> On Thu, 2017-11-23 at 22:59 +0100, Nils Eliasson wrote:
>> Hi,
>>
>>
>> On 2017-11-23 15:18, Roland Westrelin wrote:
>>> Hi Vladimir,
>>>
>>>> I am running testing again. But if this repeats, the presence of
>>>> this Sparse.small regression suggests to me that maybe we should
>>>> keep this optimization off by default - keep
>>>> UseCountedLoopSafepoints false.
>>>>
>>>> We may switch it on later with additional changes which address
>>>> regressions.
>>>>
>>>> What do you think?
>>>
>>> If the inner loop runs for a small number of iterations and the
>>> compiler can't statically prove it, I don't see a way to remove the
>>> overhead of loop strip mining entirely. So I'm not optimistic the
>>> regression can be fixed.
>>
>> Agreed. In other words: loop strip mining adds a guarantee that
>> time-to-safepoint won't be too long, and that guarantee has a small cost.
>>
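A rough Java-level sketch of the trade-off being discussed (the real
transformation is done on C2's ideal graph; STRIP_LEN and body() below are
purely illustrative names, not anything in the VM): with
+UseCountedLoopSafepoints a counted loop keeps its safepoint poll, and strip
mining splits the loop so the poll is only paid once per strip of at most
LoopStripMiningIter iterations.

    class StripMiningSketch {
        static final int STRIP_LEN = 1000;   // stands in for -XX:LoopStripMiningIter=1000

        // With -UseCountedLoopSafepoints the poll is removed from counted
        // loops entirely, which is what allows long time-to-safepoint.
        // With +UseCountedLoopSafepoints (and no strip mining) a poll is
        // kept on every back edge:
        static void plainCountedLoop(int limit) {
            for (int i = 0; i < limit; i++) {
                body(i);                      // conceptually: poll each iteration
            }
        }

        // After strip mining (sketch): the inner loop runs at most STRIP_LEN
        // iterations with no poll; the outer loop keeps the poll, so
        // time-to-safepoint stays bounded at a small per-strip cost.
        static void stripMinedLoop(int limit) {
            for (int i = 0; i < limit; ) {
                int next = Math.min(i + STRIP_LEN, limit);
                for (; i < next; i++) {       // no poll inside the strip
                    body(i);
                }
                // safepoint poll conceptually sits on this outer back edge
            }
        }

        static void body(int i) { /* loop body */ }
    }

When the inner loop only runs for a few iterations, the outer-loop bookkeeping
executes almost as often as the body, which is the overhead Roland describes
above.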
>> The current situation is that we get some extra performance with
>> UseCountedLoopSafepoints off by default, but we let some users have a bad
>> experience when they run into long times-to-safepoint or
>> failures (https://bugs.openjdk.java.net/browse/JDK-5014723). I would rather
>> turn the tables and have loop strip mining on, and let the power users
>> experiment with turning it off for an uncertain performance boost.
>>
>>> If loop strip mining defaults to false, would there be any
>>> regular testing on your side?
>>
>> We would have to add some.
>>>
>>> It seems to me that it would make sense to enable loop strip mining
>>> depending on what GC is used: it makes little sense for Parallel GC,
>>> but we'll want it enabled for Shenandoah, for instance. Where does
>>> G1 fit? I can't really say and I don't have a strong opinion. But
>>> as I understand it, G1 was made the default under the assumption that
>>> users would be OK trading throughput for better latency. Maybe
>>> that same reasoning applies to loop strip mining?
>>
>> Scimark.sparse.small shows a regression, but having a long
>> time-to-safepoint has a throughput cost in some settings, like in the
>> companion benchmark scimark.sparse.large. Numbers using G1:
>>
>> -XX:-UseCountedLoopSafepoints (default) ~86 ops/m
>> -XX:+UseCountedLoopSafepoints ~106 ops/m
>> -XX:+UseCountedLoopSafepoints -XX:LoopStripMiningIter=1000 ~111 ops/m
>>
>> I would prefer having it on by default, at least with G1. Let's ask the
>> G1 GC team for their opinion.
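For context on the numbers and the regression discussed above: scimark.sparse
is, roughly, a CSR sparse matrix-vector multiply, so the hot inner loop's trip
count is the number of nonzeros per row - small and only known at run time,
which is exactly the case Roland describes where the strip-mining overhead
cannot be removed. A simplified sketch of that kernel shape (illustrative,
not the benchmark's actual source):

    // CSR sparse mat-vec in the style of scimark.sparse (sketch only).
    static void sparseMatVec(double[] y, double[] val, int[] row,
                             int[] col, double[] x) {
        int m = row.length - 1;
        for (int r = 0; r < m; r++) {
            double sum = 0.0;
            // Inner counted loop: trip count = nonzeros in row r,
            // typically just a few iterations and not statically known.
            for (int i = row[r]; i < row[r + 1]; i++) {
                sum += x[col[i]] * val[i];
            }
            y[r] = sum;
        }
    }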
> 
> Our perf team has been using -XX:+UseCountedLoopSafepoints for _all_
> collectors for some time now. When asked, they replied that predictability
> of results is very important to them too.
> 
> We also closed out some perf regressions (e.g. JDK-8177704) that were due
> to the problems with -XX:-UseCountedLoopSafepoints after you posted these
> results, in agreement with the perf team.
> 
> So I am all for making it the default for G1 (and I am sure others agree),
> if not for all GCs. However, I recommend having a separate CR for
> changing the defaults. That makes it easier to revert in case things go
> wrong.
> 
> Thanks,
>    Thomas
> 

