Repeated ParNews when Young Gen is Empty?

Thomas Schatzl thomas.schatzl at oracle.com
Thu Jul 11 02:40:39 PDT 2013


Hi,

On Wed, 2013-07-10 at 23:20 +0000, Andrew Colombi wrote:
> Thanks for the suggestions.
> 
> > CR opened with Oracle
> No CR with Oracle yet.  I'll see what we can do.
> 
> > turn on per-GC-worker stats per phase
> Is this something I can turn on now?  A quick scan of the GC options I
> know of doesn't turn up anything like that.

  There is nothing available in regular builds; you have to go to
src/share/vm/utilities/taskqueue.hpp and define TASKQUEUE_STATS to 1.

Then use -XX:+PrintGCDetails and -XX:+ParallelGCVerbose to print
per-thread timing statistics that look, for example, like the following:

     elapsed  --strong roots-- -------termination-------
thr     ms        ms       %       ms       %   attempts
--- --------- --------- ------ --------- ------ --------
  0      1.06      0.77  73.20      0.07   6.63        1
  1      1.12      0.82  72.92      0.00   0.00        1
  2      1.13      0.71  62.81      0.00   0.35        1

I.e. it shows the total elapsed time per thread, the strong-root
processing time (e.g. evacuation) and the task termination time.

Compared to e.g. G1 it is primitive, though.
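For reference, a rough sketch of how to enable this (the source path is
the one in JDK 7 HotSpot; the rebuild step and the jar name are
placeholders, not something from this thread):

```shell
# Sketch, not a recipe: force the per-worker stats on, then rebuild hotspot.
# In product builds the header defaults TASKQUEUE_STATS to 0, so edit
#   src/share/vm/utilities/taskqueue.hpp
# to read:
#   #define TASKQUEUE_STATS 1
# Then run the instrumented build with verbose per-thread output:
java -XX:+PrintGCDetails -XX:+ParallelGCVerbose -jar yourapp.jar
```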

> > trying ParallelGC
> Just to be clear, are you recommending we try the throughput collector,
> e.g. -XX:+UseParallelOldGC.  I'm up for that, given how bad things are
> with the current configuration.
> 
> > trying the latest JVM
> Definitely a good idea.  We'll try the latest.  If we were to roll back,
> is there any particular version you would recommend?
> 
> -Andrew
> 
> On 7/10/13 3:44 PM, "Srinivas Ramakrishna" <ysr1729 at gmail.com> wrote:
> 
> >[Just some off the cuff suggestions here, no solutions.]
> >
> >Yikes, this looks to me like a pretty serious bug. Has a CR been
> >opened with Oracle for this problem?
> >Could you downrev yr version of the JVM to see if you can determine
> >when the problem may have started?
> >
> >The growing par new times are definitely concerning. I'd suggest that
> >at the very least, if not already the case,
> >we should be able to turn on per-GC-worker stats per phase for ParNew
> >in much the same way that Parallel and G1 provide.

AFAIK the hack above is all there is.

> >
> >You might also try ParallelGC to see if you can replicate the growing
> >minor gc problem. Given how bad it is, my guess is
> >that it stems from some linear single-threaded root-scanning issue and
> >so (at least with an instrumented JVM -- which
> >you might be able to request from Oracle or build on yr own) could be
> >tracked down quickly.

The changes above should show that, IMO.

> >
> >Also (if possible) try the latest JVM to see if the problem is a known
> >one that has already been fixed perhaps.
> >
> >
> >On Wed, Jul 10, 2013 at 3:02 PM, Andrew Colombi <acolombi at palantir.com>
> >wrote:
> >> Thanks for the response and help.  I've done some more investigation and
> >> learning, and I have another _fascinating_ log from production.  First,
> >> here are some things we've done.
> >>
> >> * We're going to enable -XX:+PrintTLAB as a way of learning more about
> >> how the application is allocating memory in Eden.
> >> * We're examining areas of the code base that might be allocating large
> >> objects (we haven't found anything egregious, e.g exceeding 10~100MB
> >> allocation).  Nevertheless, we have a few changes that will reduce the
> >> size of these objects, and we're deploying the change this week.
> >>
> >> Now check out this event (more of the log is below, since it's pretty
> >> damn interesting):
> >>
> >> 30779.759: [GC 30779.760: [ParNew: 12397330K->125395K(13824000K),
> >> 122.6032130 secs] 33858511K->21587730K(75010048K), 122.6041920 secs]
> >> [Times: user=2811.37 sys=1.10, real=122.59 secs]
> >> 30914.319: [GC 30914.320: [ParNew
> >> 30914.319: [GC 30914.320: [ParNew: 12413753K->247108K(13824000K),
> >> 132.8863570 secs] 33876089K->21710570K(75010048K), 132.8874170 secs]
> >> [Times: user=3050.21 sys=0.74, real=132.86 secs]
> >> 31047.212: [GC 31047.212: [ParNew: 247347K->312540K(13824000K),
> >> 147.9675020 secs] 21710809K->21777393K(75010048K), 147.9681870 secs]
> >> [Times: user=3398.88 sys=0.64, real=147.94 secs]
> >>
> >> Notable things:
> >>
> >> * The first ParNew took 2811s of user time, and 122s of wall-clock.  My
> >> understanding is that the copy collector's performance is primarily
> >>bound
> >> by the number of objects that end up getting copied to survivor or
> >> tenured.  Looking at these numbers, approximately 100MB survived the
> >> ParNew collection.  100MB surviving hardly seems cause for a 122s pause.

Note that finding space in the tenured generation may be an issue here,
as CMS's old generation uses free lists. Using -XX:PrintFLSStatistics=1
as suggested earlier may help in finding such issues.
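Incidentally, the [Times: ...] lines above already carry a hint:
user=2811.37 over real=122.59 is a ratio of about 23, i.e. roughly 23
CPUs were busy for the entire pause. That suggests all workers running
(or spinning, e.g. in termination) rather than one thread stuck in a
long serial phase. A quick sketch for pulling that ratio out of a log
(the gc.log filename is an assumption):

```shell
# Extract user/real from every [Times: ...] line and compute the ratio;
# a ratio near the worker-thread count for the whole pause means all
# workers were busy (or busy-waiting) throughout.
grep -o 'user=[0-9.]* sys=[0-9.]*, real=[0-9.]*' gc.log |
  awk -F'[=, ]+' '{ printf "user=%s real=%s ratio=%.1f\n", $2, $6, $2/$6 }'
```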

> >> * Then it prints an empty ParNew line.  What's that about?  Feels like
> >> the garbage collector is either: i) hitting a race (two threads are racing
> >> to initiate the ParNew, so they both print the ParNew line), ii)

The message is printed after serializing the collection requests, so
there should be no race. This is odd.

> >> following an
> >> unusual sequence of branches, and it's a bug that it accidentally prints
> >> the second ParNew.  In either case, I'm going to try to track it down in
> >> the hotspot source.
> >> * Another ParNew hits, which takes even longer, but otherwise looks
> >> similar to the first.
> >> * Third ParNew, and most interesting: the GC reports that Young Gen
> >> *GROWS* during GC.  Garbage collection begins at 247MB (why? did I
> >> really allocate a 12GB object? doesn't seem likely), and ends at
> >> 312MB.  That's fascinating.

Maybe there is something odd with TLAB sizing. Can you add
-XX:+PrintTLAB?
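Put together, the diagnostic flags suggested in this thread would look
something like this (a sketch only; the CMS/ParNew baseline and the jar
name are assumptions standing in for Andrew's actual options):

```shell
# Diagnostic additions discussed in this thread:
#   -XX:+PrintTLAB            per-thread TLAB sizing/refill stats each GC
#   -XX:PrintFLSStatistics=1  CMS free-list space stats (fragmentation)
java -XX:+UseConcMarkSweepGC -XX:+UseParNewGC \
     -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
     -XX:+PrintTLAB -XX:PrintFLSStatistics=1 \
     -jar yourapp.jar
```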

> >> On 7/1/13 2:42 PM, "Bernd Eckenfels" <bernd.eckenfels at googlemail.com>
> >> wrote:
> >>
> >>>Am 01.07.2013, 22:44 Uhr, schrieb Andrew Colombi
> >>><acolombi at palantir.com>:
> >>>> My question is, Why would it do three ParNews, only 300ms apart from
> >>>> each other, when the young gen is essentially empty?  Here are three
> >>>> hypotheses that I have:
> >>>> * Is the application trying to allocate something giant, e.g. a 1
> >>>> billion element double[]? Is there a way I can test for this, i.e.
> >>>> some JVM-level logging that would indicate very large objects being
> >>>> allocated.
> >>>
> >>>That was a suspicion of mine as well. (And I don't know a good tool
> >>>for the Sun VM (with IBM you can trace it).)

There has been some discussion here recently; the best option seemed to
be bytecode rewriting, but for some reason it did not work in some cases
(on G1).

Thread starts here:
http://mail.openjdk.java.net/pipermail/hotspot-gc-use/2013-June/001555.html
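In the meantime, one crude way to hunt for giant arrays is a periodic
class histogram of the running VM (the pid is a placeholder; "[D" is the
VM's internal name for double[]):

```shell
# Take a class histogram of the live heap and keep the largest classes;
# a multi-GB "[D" (double[]) or "[B" (byte[]) entry points at huge arrays.
jmap -histo <pid> | head -n 20
```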


Hth,
  Thomas



More information about the hotspot-gc-use mailing list