High termination times pre-concurrent cycle in G1
Vitaly Davidovich
vitalyd at gmail.com
Fri Sep 2 14:46:01 UTC 2016
This particular run hits to-space exhaustion, followed by a full GC that
reclaims a bunch of humongous arrays and reduces heap occupancy
substantially (of course, you paid 23s for that).
There's a humongous allocation that triggers a mixed cycle (~60s into the
run), which is followed by the start of concurrent marking. That
concurrent mark never completes because to-space exhaustion hits, followed
by a full GC; the concurrent mark is aborted right after that.
Since you've reduced ParallelGCThreads to 2, you should explicitly bump
ConcGCThreads, or else you're running with just 1, which doesn't look like
it's keeping up.
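
For example, a minimal sketch (the counts below are just a starting point,
not a recommendation, and assume your other flags stay the same):

    java -XX:+UseG1GC -XX:ParallelGCThreads=2 -XX:ConcGCThreads=2 ...

By default G1 sizes ConcGCThreads at roughly (ParallelGCThreads + 2) / 4,
so with ParallelGCThreads=2 you end up with a single concurrent marking
thread.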
On Fri, Sep 2, 2016 at 6:10 AM, William Good <bkgood at gmail.com> wrote:
> Jenny,
>
> There's a log attached with those options.
>
> Termination times are lower with 2 workers, but eden is much smaller (~2
> GB instead of ~7 GB) and evac pauses are more frequent. So it's a nice
> improvement but not ideal. It also appears that in the 2-worker case
> work distribution degenerates over time (termination attempts
> increase), which mirrors what I perceive in the cases with higher
> worker counts.
>
> Thanks for your help so far!
>
> William
>
> On Fri, Sep 2, 2016 at 1:34 AM, yu.zhang at oracle.com <yu.zhang at oracle.com>
> wrote:
> > William,
> >
> > I think it might be related to humongous objects. Can you do some
> > experiments with:
> >
> > -XX:+UnlockExperimentalVMOptions -XX:+G1TraceEagerReclaimHumongousObjects
> >
> > Can you remove -XX:G1LogLevel=finest?
> >
> > Based on the output, you can adjust the heap region size.
> > But I think the termination time is lower with 2 worker threads.
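> >
> > For example, something like (the 16m region size below is only an
> > illustration; G1HeapRegionSize must be a power of two between 1m and
> > 32m, sized so that the large arrays stop qualifying as humongous,
> > i.e. stay under half a region):
> >
> > -XX:+UnlockExperimentalVMOptions
> > -XX:+G1TraceEagerReclaimHumongousObjects -XX:G1HeapRegionSize=16m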
> >
> > Thanks
> > Jenny
> >
> > On 09/01/2016 08:12 AM, William Good wrote:
> >>
> >> Yep, the marking cycle seems to help. I just don't know why. Objects
> >> in old regions should die very infrequently, as everything produced
> >> either survives indefinitely or is a byproduct of loading or
> >> evaluation (a very fast operation, especially when compared to
> >> frequency of evac pause * number of survivor regions). Thus a mark
> >> cycle shouldn't reveal much to be collected in old regions, and my
> >> understanding is that all the survivor spaces are marked+evac'd on
> >> each evac pause.
> >>
> >> Tried first with 12 workers (happens to be the number of physical
> >> cores on my machine) and got the same pathological behavior. Then
> >> tried with 2 and still see large termination time increases. Log file
> >> attached.
> >>
> >> William
> >>
> >> On Wed, Aug 31, 2016 at 8:18 AM, yu.zhang at oracle.com
> >> <yu.zhang at oracle.com> wrote:
> >>>
> >>> It seems that after marking (cleanup), the termination time drops.
> >>> Maybe that is why you need a very low IHOP, so that you can have more
> >>> marking cycles.
> >>>
> >>> The work distribution seems fine. But system time is high. Maybe some
> >>> lock contention.
> >>>
> >>> I would agree to try lowering the gc threads, -XX:ParallelGCThreads=<n>
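> >>>
> >>> For example, -XX:ParallelGCThreads=2 (the concrete value here is just
> >>> an illustration; you would tune it against your pause logs).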
> >>>
> >>> Jenny
> >>>
> >>>
> >>> On 08/30/2016 04:08 PM, Vitaly Davidovich wrote:
> >>>
> >>> William,
> >>>
> >>> Have you tried running with a lower number of parallel workers than
> >>> the current 18?
> >>>
> >>> On Tuesday, August 30, 2016, William Good <bkgood at gmail.com> wrote:
> >>>>
> >>>> I've been experiencing an issue in a production application using G1
> >>>> for quite some time over a handful of 1.8.0 builds. The application is
> >>>> relatively simple: it spends about 60s reading some parameters from
> >>>> files on disk, and then starts serving web requests, which merge some
> >>>> input with those parameters, perform some computation, and return a
> >>>> result. We're aiming to keep max total request time (as seen by remote
> >>>> hosts) below 100 ms but from previous experience with parnew and cms
> >>>> (and g1 on previous projects, for that matter), I didn't anticipate
> >>>> this being a problem.
> >>>>
> >>>> The symptom is ever-increasing time spent in evacuation pauses, in
> >>>> which high parallel worker termination times stick out. With the
> >>>> recommended set of G1 settings (max heap size and pause time target),
> >>>> they increase sharply until I start seeing 500ms+ pause times and have
> >>>> to kill the JVM.
> >>>>
> >>>> I found some time ago that first forcing a bunch of full GCs with
> >>>> System.gc() at the phase (load -> serve) change and then forcing
> >>>> frequent concurrent cycles with -XX:InitiatingHeapOccupancyPercent=1
> >>>> seems to mitigate the problem. I'd prefer not to have to do either, as
> >>>> the former makes redeployments very slow and the latter adds a couple
> >>>> of neighboring 40ms remark and cleanup pauses that aren't good for
> >>>> request time targets.
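> >>>>
> >>>> (For context, the phase-change hook is roughly the sketch below; the
> >>>> method name and loop count are hypothetical, not the app's real code,
> >>>> and System.gc() only forces a full collection when the JVM runs
> >>>> without -XX:+DisableExplicitGC:)
> >>>>
> >>>>     // Hypothetical hook run once loading finishes, before request
> >>>>     // serving starts: force a few full GCs to compact the heap.
> >>>>     static void onLoadComplete() {
> >>>>         for (int i = 0; i < 3; i++) {
> >>>>             System.gc(); // full, compacting GC on this configuration
> >>>>         }
> >>>>     }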
> >>>>
> >>>> I'm attaching a log file that details a short run, with the phase
> >>>> change at about 60s from start. After a few evacuation pauses, one
> >>>> lasts 160ms, with 100-120ms spent in parallel workers'
> >>>> 'termination'. After this, a concurrent cycle runs and everything goes
> >>>> back to normal. java params are at the top of the file.
> >>>>
> >>>> Generally this happens over a much longer period of time (especially
> >>>> if I haven't set the low -XX:InitiatingHeapOccupancyPercent value) and
> >>>> over many different
> >>>> builds of 1.8.0. This was b101. It's running alone on a fairly hefty
> >>>> dual-socket Xeon box with 128GB of RAM on CentOS 7.
> >>>>
> >>>> I'd be more than happy to hear any ideas on what's going on here and
> >>>> how it could be fixed.
> >>>>
> >>>> Best,
> >>>> William
> >>>
> >>>
> >>>
> >>> --
> >>> Sent from my phone
> >>>
> >>>
> >
>