Repeated ParNews when Young Gen is Empty?

Andrew Colombi acolombi at palantir.com
Wed Jul 10 15:02:59 PDT 2013


Thanks for the response and help.  I've done some more investigation and
learning, and I have another _fascinating_ log from production.  First,
here are some things we've done.

* We're going to enable -XX:+PrintTLAB as a way of learning more about how
the application is allocating memory in Eden.
* We're examining areas of the code base that might be allocating large
objects (we haven't found anything egregious, e.g. nothing exceeding a
10-100MB allocation; see the sketch after this list for the kind of thing
we're hunting).  Nevertheless, we have a few changes that will reduce the
size of these objects, and we're deploying them this week.
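
For reference, here's the kind of allocation we're hunting.  This is a
minimal, hypothetical sketch (not from our code base); a single array this
size can't fit in any TLAB, so HotSpot allocates it directly in Eden, or
straight into the old gen if it exceeds -XX:PretenureSizeThreshold when
that flag is set:

    // LargeAllocDemo.java -- hypothetical illustration only.
    public class LargeAllocDemo {
        public static void main(String[] args) {
            // ~800MB in one shot: 100M doubles * 8 bytes each.
            double[] big = new double[100_000_000];
            System.out.println("allocated " + big.length + " doubles");
        }
    }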

Now check out this event (more of the log is below, since it's pretty damn
interesting):

30779.759: [GC 30779.760: [ParNew: 12397330K->125395K(13824000K),
122.6032130 secs] 33858511K->21587730K(75010048K), 122.6041920 secs]
[Times: user=2811.37 sys=1.10, real=122.59 secs]
30914.319: [GC 30914.320: [ParNew
30914.319: [GC 30914.320: [ParNew: 12413753K->247108K(13824000K),
132.8863570 secs] 33876089K->21710570K(75010048K), 132.8874170 secs]
[Times: user=3050.21 sys=0.74, real=132.86 secs]
31047.212: [GC 31047.212: [ParNew: 247347K->312540K(13824000K),
147.9675020 secs] 21710809K->21777393K(75010048K), 147.9681870 secs]
[Times: user=3398.88 sys=0.64, real=147.94 secs]

Notable things:

* The first ParNew took 2811s of user time, and 122s of wall-clock.  My
understanding is that the copy collector's performance is primarily bound
by the number of objects that end up getting copied to survivor or
tenured.  Looking at these numbers, approximately 125MB survived the
ParNew collection.  125MB surviving hardly seems cause for a 122s pause
(see the back-of-envelope math after this list).
* Then it prints an empty ParNew line.  What's that about?  It feels like
the garbage collector is either i) hitting a race (two threads race to
initiate the ParNew, so both print the ParNew header), or ii) following an
unusual sequence of branches, and printing the second ParNew header is a
bug.  Either way, I'm going to try to track it down in the hotspot
source.
* Another ParNew hits, which takes even longer, but otherwise looks
similar to the first.
* Third ParNew, and most interesting: the GC reports that Young Gen
*GROWS* during GC.  The collection begins with only 247MB of young gen
occupied (why collect at all? did I really try to allocate a 12GB object?
doesn't seem likely), and ends at 312MB.  That's fascinating.
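
Here's the back-of-envelope math on that first ParNew line, reading each
clause as <used before>-><used after>(<capacity>).  The numbers are taken
straight from the log above:

    // ParNewMath.java -- sanity check of the first ParNew line above.
    public class ParNewMath {
        public static void main(String[] args) {
            long youngBeforeK = 12_397_330, youngAfterK = 125_395;
            long heapBeforeK  = 33_858_511, heapAfterK  = 21_587_730;
            // Old gen occupancy is whole-heap minus young gen.
            long oldBeforeK = heapBeforeK - youngBeforeK;
            long oldAfterK  = heapAfterK  - youngAfterK;
            long promotedK  = oldAfterK - oldBeforeK;   // ~1.1MB promoted
            long copiedK    = youngAfterK + promotedK;  // ~123MB copied total
            double secs = 122.59;
            System.out.printf("copied ~%dMB in %.2fs -> ~%.1f MB/s%n",
                    copiedK / 1024, secs, copiedK / 1024.0 / secs);
        }
    }

Roughly 123MB copied in 122s is about 1 MB/s, orders of magnitude below
any plausible copy rate, so the time must be going somewhere other than
copying.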

My next step is to learn what I can by examining the ParNew source.  If
anyone has ever seen young gen occupancy grow during a collection, or
understands why it happens, I would be very grateful for your guidance.

Thanks for your help,
-Andrew

Here are more stats about my VM and additional log output:

java version "1.7.0_21"
Java(TM) SE Runtime Environment (build 1.7.0_21-b11)
Java HotSpot(TM) 64-Bit Server VM (build 23.21-b01, mixed mode)

Here are all the GC-relevant parameters we are setting:

-Dsun.rmi.dgc.client.gcInterval=3600000
-Dsun.rmi.dgc.server.gcInterval=3600000
-Xms74752m
-Xmx74752m
-Xmn15000m
-XX:PermSize=192m
-XX:MaxPermSize=1500m
-XX:CMSInitiatingOccupancyFraction=60
-XX:+UseConcMarkSweepGC
-XX:+UseParNewGC
-XX:+CMSParallelRemarkEnabled
-XX:+ExplicitGCInvokesConcurrent
-verbose:gc
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-XX:+PrintGCDateStamps // I removed the date stamps from the output above
to keep it slightly more concise
-Xloggc:gc.log
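
For the next maintenance window we're planning to append the diagnostics
mentioned in this thread (PrintTLAB from our plan above, and
PrintFLSStatistics per Bernd's suggestion below); both should be available
in 1.7.0_21:

-XX:+PrintTLAB
-XX:PrintFLSStatistics=2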

And some GC logs.  Notice how the pause time grows roughly exponentially
over the first five collections, then keeps climbing:


30137.445: [GC 30137.446: [ParNew: 12904920K->601316K(13824000K),
0.1975450 secs] 33841556K->21539328K(75010048K), 0.1982140 secs] [Times:
user=3.82 sys=0.02, real=0.19 secs]
30160.854: [GC 30160.854: [ParNew: 12889316K->93588K(13824000K), 1.5997950
secs] 33827328K->21534997K(75010048K), 1.6004450 secs] [Times: user=35.92
sys=0.02, real=1.60 secs]
30186.369: [GC 30186.369: [ParNew: 12381622K->61970K(13824000K), 5.2605870
secs] 33823030K->21505459K(75010048K), 5.2612450 secs] [Times: user=119.75
sys=0.06, real=5.26 secs]
30214.193: [GC 30214.194: [ParNew: 12349970K->69808K(13824000K),
10.6501520 secs] 33793459K->21515427K(75010048K), 10.6509060 secs] [Times:
user=243.13 sys=0.10, real=10.65 secs]
30245.569: [GC 30245.569: [ParNew: 12357808K->52428K(13824000K),
32.4167510 secs] 33803427K->21504964K(75010048K), 32.4175410 secs] [Times:
user=740.98 sys=0.34, real=32.41 secs]
30294.965: [GC 30294.966: [ParNew: 12340428K->39537K(13824000K),
51.0611270 secs] 33792964K->21492074K(75010048K), 51.0619680 secs] [Times:
user=1169.93 sys=0.38, real=51.05 secs]
30365.735: [GC 30365.736: [ParNew
30365.735: [GC 30365.736: [ParNew: 12327537K->45619K(13824000K),
64.2732840 secs] 33780074K->21501245K(75010048K), 64.2740740 secs] [Times:
user=1472.58 sys=0.43, real=64.27 secs]
30448.941: [GC 30448.941: [ParNew: 12333619K->62322K(13824000K),
78.4998780 secs] 33789245K->21519995K(75010048K), 78.5007800 secs] [Times:
user=1799.07 sys=0.50, real=78.48 secs]
30541.600: [GC 30541.601: [ParNew: 12350322K->93647K(13824000K),
95.1860020 secs] 33807995K->21552580K(75010048K), 95.1869510 secs] [Times:
user=2181.58 sys=0.71, real=95.17 secs]
30655.141: [GC 30655.142: [ParNew
30655.141: [GC 30655.142: [ParNew: 12381662K->109330K(13824000K),
111.0219700 secs] 33840595K->21570511K(75010048K), 111.0229110 secs]
[Times: user=2546.12 sys=0.73, real=111.00 secs]
30779.759: [GC 30779.760: [ParNew: 12397330K->125395K(13824000K),
122.6032130 secs] 33858511K->21587730K(75010048K), 122.6041920 secs]
[Times: user=2811.37 sys=1.10, real=122.59 secs]
30914.319: [GC 30914.320: [ParNew
30914.319: [GC 30914.320: [ParNew: 12413753K->247108K(13824000K),
132.8863570 secs] 33876089K->21710570K(75010048K), 132.8874170 secs]
[Times: user=3050.21 sys=0.74, real=132.86 secs]
31047.212: [GC 31047.212: [ParNew: 247347K->312540K(13824000K),
147.9675020 secs] 21710809K->21777393K(75010048K), 147.9681870 secs]
[Times: user=3398.88 sys=0.64, real=147.94 secs]
31202.704: [GC 31202.705: [ParNew
31202.704: [GC 31202.705: [ParNew: 12600540K->1536000K(13824000K),
139.7664350 secs] 34065393K->23473563K(75010048K), 139.7675110 secs]
[Times: user=3206.88 sys=0.86, real=139.75 secs]
31353.548: [GC 31353.549: [ParNew: 13824000K->442901K(13824000K),
0.8626320 secs] 35761563K->23802063K(75010048K), 0.8634580 secs] [Times:
user=15.23 sys=0.12, real=0.86 secs]
31372.225: [GC 31372.226: [ParNew: 12730901K->329727K(13824000K),
0.1372260 secs] 36090063K->23688888K(75010048K), 0.1381430 secs] [Times:
user=2.49 sys=0.02, real=0.14 secs]
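
In case anyone wants to reproduce the analysis, here's the quick-and-dirty
parser I'm using to pull the user/real times out of gc.log.  It's a
minimal sketch: it assumes the -XX:+PrintGCTimeStamps format shown above
(no date stamps) and a log file named gc.log in the working directory:

    // PauseTimes.java -- extract user/real times from ParNew log lines.
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class PauseTimes {
        private static final Pattern TIMES =
            Pattern.compile("\\[Times: user=([0-9.]+) sys=[0-9.]+, real=([0-9.]+)");

        public static void main(String[] args) throws IOException {
            for (String line : Files.readAllLines(
                    Paths.get("gc.log"), StandardCharsets.UTF_8)) {
                Matcher m = TIMES.matcher(line);
                if (m.find()) {
                    double user = Double.parseDouble(m.group(1));
                    double real = Double.parseDouble(m.group(2));
                    // A steady user/real ratio near the GC thread count
                    // means every GC thread is busy for the whole pause.
                    System.out.printf("real=%.2fs user=%.2fs ratio=%.1f%n",
                            real, user, user / real);
                }
            }
        }
    }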




On 7/1/13 2:42 PM, "Bernd Eckenfels" <bernd.eckenfels at googlemail.com>
wrote:

>On 01.07.2013 at 22:44, Andrew Colombi <acolombi at palantir.com> wrote:
>> My question is, Why would it do three ParNews, only 300ms apart from
>> each other, when the young gen is essentially empty?  Here are three
>> hypotheses that I have:
>> * Is the application trying to allocate something giant, e.g. a 1
>> billion element double[]?  Is there a way I can test for this, i.e. some
>> JVM-level logging that would indicate very large objects being
>> allocated?
>
>That was a suspicion of mine as well.  (And I don't know a good tool for
>the Sun VM; with the IBM VM you can trace it.)
>
>> * Is there an explicit System.gc() in 3rd party code? (Our code is
>> clean.) We're going to disable explicit GC in our next maintenance
>> period.  But this theory doesn't explain concurrent mode failure.
>
>I think System.gc() will also not trigger three ParNews in a row.
>
>> * Maybe a third explanation is fragmentation? Is ParNew compacted on
>> every collection?  I've read that CMS tenured gen can suffer from
>> fragmentation.
>
>ParNew is a copying collector, so it compacts automatically.  But the
>promoted objects might of course fragment the old generation, due to the
>PLABs.  Your log is from 13h of uptime; do you see this before/after as
>well?
>
>Because there was no follow-up, I will just mention some more candidates
>to look out for: the changes around CMSWaitDuration (RFE 7189971), which I
>think have been merged into 1.7.0.
>
>And maybe enabling more diagnostics can help:
>
>-XX:PrintFLSStatistics=2
>
>Greetings
>Bernd