RFR(L): 8186027: C2: loop strip mining

Thomas Schatzl thomas.schatzl at oracle.com
Fri Nov 24 09:22:23 UTC 2017


Hi Nils,

On Thu, 2017-11-23 at 22:59 +0100, Nils Eliasson wrote:
> Hi,
> 
> 
> On 2017-11-23 15:18, Roland Westrelin wrote:
> > Hi Vladimir,
> > 
> > > I am running testing again. But if this repeats, the presence
> > > of this Sparse.small regression suggests to me that maybe we
> > > should keep this optimization off by default - keep
> > > UseCountedLoopSafepoints false.
> > > 
> > > We may switch it on later with additional changes which address
> > > regressions.
> > > 
> > > What do you think?
> > 
> > If the inner loop runs for a small number of iterations and the
> > compiler can't statically prove it, I don't see a way to remove the
> > overhead of loop strip mining entirely. So I'm not optimistic the
> > regression can be fixed.
> 
> Agreed. In other words: loop strip mining adds a guarantee that
> time-to-safepoint won't be too long, and that guarantee has a small
> cost.
> 
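
For reference, loop strip mining conceptually splits a long counted
loop into an outer loop over fixed-size strips, with a safepoint poll
only between strips. A rough Java-level sketch follows; the real
transformation happens on C2's ideal graph, and safepointPoll() below
is a hypothetical stand-in for the poll instruction the JIT emits:

    class StripMiningSketch {
        // Hypothetical stand-in for the JIT-emitted safepoint poll;
        // a real poll is a single test instruction, not a method call.
        static void safepointPoll() {}

        // Original counted loop: with -XX:-UseCountedLoopSafepoints
        // there is no poll in the body, so a pending safepoint has to
        // wait until the whole loop exits.
        static long plain(int[] a) {
            long sum = 0;
            for (int i = 0; i < a.length; i++)
                sum += a[i];
            return sum;
        }

        // Strip-mined form: the inner loop is still poll-free, but it
        // runs at most STRIP iterations before the outer loop reaches
        // a poll. The outer-loop bookkeeping is the overhead that
        // cannot be removed when a.length is small and unknown at
        // compile time.
        static long stripMined(int[] a) {
            final int STRIP = 1000; // LoopStripMiningIter, as in the numbers below
            long sum = 0;
            for (int i = 0; i < a.length; ) {
                int limit = Math.min(i + STRIP, a.length);
                for (; i < limit; i++) // short, poll-free inner loop
                    sum += a[i];
                safepointPoll();
            }
            return sum;
        }
    }
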
> The current situation is that we get some extra performance with
> UseCountedLoopSafepoints off by default, but leave some users with a
> bad experience when they encounter long times-to-safepoint or
> outright failures (https://bugs.openjdk.java.net/browse/JDK-5014723).
> I would rather turn the tables: have loop strip mining on, and let
> the power users experiment with turning it off for an uncertain
> performance boost.
> 
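
To make that failure mode concrete, here is a minimal, self-contained
sketch of the kind of stall JDK-5014723 describes. Class name, array
size and timing details are illustrative; run it with the current
default -XX:-UseCountedLoopSafepoints to see the effect:

    public class LongTimeToSafepoint {
        static volatile long blackhole;

        public static void main(String[] args) throws Exception {
            long[] a = new long[1 << 20];
            Thread t = new Thread(() -> {
                long sum = 0;
                // Counted int loop: compiled without safepoint polls
                // in its body when UseCountedLoopSafepoints is off.
                for (int i = 0; i < Integer.MAX_VALUE; i++)
                    sum += a[i & (a.length - 1)];
                blackhole = sum;
            });
            t.start();
            Thread.sleep(1000); // let the loop get compiled and hot

            long start = System.nanoTime();
            System.gc(); // needs a safepoint: stalls until the loop exits
            System.out.println("GC had to wait " +
                (System.nanoTime() - start) / 1_000_000 + " ms");
        }
    }
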
> > If loop strip mining defaults to false, would there be any
> > regular testing on your side?
> 
> We would have to add some.
> > 
> > It seems to me that it would make sense to enable loop strip mining
> > depending on what GC is used: it makes little sense for parallel gc
> > but we'll want it enabled for Shenandoah for instance. Where does
> > G1 fit? I can't really say and I don't have a strong opinion. But
> > as I understand, G1 was made default under the assumption that
> > users would be ok trading throughput for better latency. Maybe
> > that same reasoning applies to loop strip mining?
> 
> Scimark.sparse.small shows a regression, but a long
> time-to-safepoint has a throughput cost in some settings, like the
> companion benchmark scimark.sparse.large. Numbers using G1:
> 
> -XX:-UseCountedLoopSafepoints (default) ~86 ops/m
> -XX:+UseCountedLoopSafepoints ~106 ops/m
> -XX:+UseCountedLoopSafepoints -XX:LoopStripMiningIter=1000 ~111 ops/m
> 
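
A plausible reading of why the same kernel moves in opposite
directions: if I remember the SciMark sparse kernel right, its hot
loop is a compressed-row sparse matrix-vector multiply, roughly the
following (names approximate):

    // The inner trip count is the number of nonzeros in row r: small
    // and data-dependent, so C2 cannot prove it short, and the
    // strip-mining bookkeeping is paid on every row - consistent with
    // the sparse.small regression. On the large data set, per the
    // numbers above, bounding time-to-safepoint more than pays for
    // that overhead.
    static void matmult(double[] y, double[] val, int[] row,
                        int[] col, double[] x) {
        for (int r = 0; r < y.length; r++) {
            double sum = 0.0;
            for (int i = row[r]; i < row[r + 1]; i++)
                sum += x[col[i]] * val[i];
            y[r] = sum;
        }
    }
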
> I would prefer having it on by default, at least with G1. Let's ask
> the G1 GC team for their opinion.

our perf team has been using -XX:+UseCountedLoopSafepoints for _all_
collectors for some time now. When asked, they replied that
predictability of results is very important to them too.

After you posted these results, and in agreement with the perf team,
we also closed out some perf regressions (e.g. JDK-8177704) that were
due to the problems with -XX:-UseCountedLoopSafepoints.

So I am all for making it the default for G1 (and I am sure others
agree), if not for all GCs. However, I recommend a separate CR for
changing the defaults; that makes it easier to revert in case things
go wrong.

Thanks,
  Thomas



