Repeated ParNews when Young Gen is Empty?

Fri Jul 19 15:51:46 PDT 2013

Andrew,

Regarding the growth in used in the young gen after a young GC,
it might be badly sized promotion buffers.  With parallel young
collections, each GC worker thread will have a private buffer
for copying the live objects (into a survivor space in the
young gen).  At the end of the GC I think the buffers are filled
with dummy objects.  Very badly sized promotion buffers could
lead to more used space in the young gen after a GC.
Try

-XX:+PrintPLAB

I think that will print out the promotion buffer sizes including
the amount of waste.

Jon

On 7/19/2013 2:23 PM, Andrew Colombi wrote:
> Jon,
>
> Here is a bit more from that same log:
>
> 49378.517: [GC 49378.517: [ParNew: 7894655K->332202K(13824000K), 0.1304950
> secs] 56021115K->48458661K(75010048K), 0.1314370 secs] [Times: user=2.00
> sys=0.01, real=0.13 secs]
> 49378.735: [GC 49378.736: [ParNew: 669513K->342976K(13824000K), 0.0849560
> secs] 48795972K->48469435K(75010048K), 0.0859460 secs] [Times: user=1.84
> sys=0.03, real=0.09 secs]
> 49378.850: [GC 49378.851: [ParNew: 514163K->312532K(13824000K), 0.1114820
> secs] 48640622K->48471080K(75010048K), 0.1122890 secs] [Times: user=2.21
> sys=0.02, real=0.12 secs]
> 49379.000: [GC 49379.000: [ParNew: 529899K->247436K(13824000K), 0.1099240
> secs]49379.110: [CMS2013-06-26T17:14:08.834-0400: 49395.671:
> [CMS-concurrent-mark: 16.629/17.168 secs] [Times: user=104.18 sys=1.86,
> real=17.16 secs]
> 227750K->31607742K(61186048K), 129.9298170 secs]
> 48688447K->31607742K(75010048K), [CMS Perm : 150231K->147875K(250384K)],
> 130.0405700 secs] [Times: user=209.80 sys=1.83, real=130.02 secs]
> 49510.844: [GC [1 CMS-initial-mark: 46419131K(61186048K)]
> 46881938K(75010048K), 0.1073960 secs] [Times: user=0.11 sys=0.00, real=0.11
> secs]
> 49510.952: [CMS-concurrent-mark-start]
> 49515.315: [GC 49515.316: [ParNew: 12288000K->130669K(13824000K), 0.0827050
> secs] 58707131K->46549801K(75010048K), 0.0838760 secs] [Times: user=1.63
> sys=0.01, real=0.09 secs]
> 49528.241: [CMS-concurrent-mark: 16.811/17.288 secs] [Times: user=184.48
> sys=21.43, real=17.29 secs]
> 49528.241: [CMS-concurrent-preclean-start]
> 49529.795: [CMS-concurrent-preclean: 1.549/1.554 secs] [Times: user=8.39
> sys=1.75, real=1.55 secs]
> 49529.795: [CMS-concurrent-abortable-preclean-start]
> 49534.261: [GC 49534.262: [ParNew: 12418669K->199314K(13824000K), 0.1248450
> secs] 58837801K->46618445K(75010048K), 0.1258850 secs] [Times: user=1.83
> sys=0.01, real=0.12 secs]
> me 2013-06-26T17:16:29.282-0400: 49536.120:
> [CMS-concurrent-abortable-preclean: 6.164/6.325 secs] [Times: user=29.18
> sys=6.79, real=6.33 secs]
> 49536.127: [GC[YG occupancy: 1158498 K (13824000 K)]49536.127: [Rescan
> (parallel) , 0.6845350 secs]49536.812: [weak refs processing, 0.0027360
> secs]49536.815: [scrub string table, 0.0026210 secs] [1 CMS-remark:
> 46419131K(61186048K)] 47577630K(75010048K), 0.6903830 secs] [Times:
> user=14.71 sys=0.08, real=0.69 secs]
> 49536.818: [CMS-concurrent-sweep-start]
> 49550.868: [CMS-concurrent-sweep: 14.026/14.049 secs] [Times: user=68.18
> sys=16.72, real=14.05 secs]
> 49550.868: [CMS-concurrent-reset-start]
> 49551.105: [CMS-concurrent-reset: 0.237/0.237 secs] [Times: user=1.31
> sys=0.32, real=0.24 secs]
>
> But I'd also point your attention to the log that I shared later on in this
> thread, where we observed Young Gen _grow_ during ParNew collection, snippet
> below.  Notice the last collection actually grows during collection, and the
> spurious "ParNew" line is part of the actual log, though I don't understand
> why.
>
> 30779.759: [GC 30779.760: [ParNew: 12397330K->125395K(13824000K),
> 122.6032130 secs] 33858511K->21587730K(75010048K), 122.6041920 secs] [Times:
> user=2811.37 sys=1.10, real=122.59 secs]
> 30914.319: [GC 30914.320: [ParNew
> 30914.319: [GC 30914.320: [ParNew: 12413753K->247108K(13824000K),
> 132.8863570 secs] 33876089K->21710570K(75010048K), 132.8874170 secs] [Times:
> user=3050.21 sys=0.74, real=132.86 secs]
> 31047.212: [GC 31047.212: [ParNew: 247347K->312540K(13824000K), 147.9675020
> secs] 21710809K->21777393K(75010048K), 147.9681870 secs] [Times:
> user=3398.88 sys=0.64, real=147.94 secs]
>
> We're still struggling with this.  We've opened an issue with Oracle support
> through our support contract.  I will keep the thread updated as we learn
> new, interesting things.
>
> -Andrew
>
> From:  Jon Masamitsu <jon.masamitsu at oracle.com>
> Organization:  Oracle Corporation
> Date:  Friday, July 19, 2013 1:32 PM
> To:  "hotspot-gc-use at openjdk.java.net" <hotspot-gc-use at openjdk.java.net>
> Subject:  Re: Repeated ParNews when Young Gen is Empty?
>
> What is the ParNew behavior like after the "concurrent mode failure"?
>
> Jon
>
> On 7/1/2013 1:44 PM, Andrew Colombi wrote:
>> Hi,
>>
>> I've been investigating some big, slow stop the world GCs, and came upon this
>> curious pattern of rapid, repeated ParNews on an almost empty Young Gen.
>> We're using - XX:+UseConcMarkSweepGC -XX:+UseParNewGC
>> -XX:+CMSParallelRemarkEnabled.  Here is the log:
>>
>> 49355.202: [GC 49355.202: [ParNew: 12499734K->276251K(13824000K), 0.1382160
>> secs] 45603872K->33380389K(75010048K), 0.1392380 secs] [Times: user=1.89
>> sys=0.02, real=0.14 secs]
>> 49370.274: [GC [1 CMS-initial-mark: 48126459K(61186048K)]
>> 56007160K(75010048K), 8.2281560 secs] [Times: user=8.22 sys=0.00, real=8.23
>> secs]
>> 49378.503: [CMS-concurrent-mark-start]
>> 49378.517: [GC 49378.517: [ParNew: 7894655K->332202K(13824000K), 0.1304950
>> secs] 56021115K->48458661K(75010048K), 0.1314370 secs] [Times: user=2.00
>> sys=0.01, real=0.13 secs]
>> 49378.735: [GC 49378.736: [ParNew: 669513K->342976K(13824000K), 0.0849560
>> secs] 48795972K->48469435K(75010048K), 0.0859460 secs] [Times: user=1.84
>> sys=0.03, real=0.09 secs]
>> 49378.850: [GC 49378.851: [ParNew: 514163K->312532K(13824000K), 0.1114820
>> secs] 48640622K->48471080K(75010048K), 0.1122890 secs] [Times: user=2.21
>> sys=0.02, real=0.12 secs]
>> 49379.000: [GC 49379.000: [ParNew: 529899K->247436K(13824000K), 0.1099240
>> secs]49379.110: [CMS2013-06-26T17:14:08.834-0400: 49395.671:
>> [CMS-concurrent-mark: 16.629/17.168 secs] [Times: user=104.18 sys=1.86,
>> real=17.16 secs]
>>   (concurrent mode failure): 48227750K->31607742K(61186048K), 129.9298170 secs]
>> 48688447K->31607742K(75010048K), [CMS Perm : 150231K->147875K(250384K)],
>> 130.0405700 secs] [Times: user=209.80 sys=1.83, real=130.02 secs]
>>
>> By my read, it starts with a typical ParNew that cleans about 12GB (of a 13GB
>> young gen).  Then CMS begins, and then the next three ParNews start feeling
>> strange.  First it does a ParNew at 49378.517 that hits at only 7.8GB occupied
>> of 13GB available. Then at  49378.736 and 49378.851 it does two ParNews when
>> young gen only has 660MB and 514MB occupied, respectively.  Then really bad
>> stuff happens: we hit a concurrent mode failure.  This stops the world for 2
>> minutes and clears about 17GB of data, almost all of which was in the CMS
>> tenured gen.  Notice there are still 12GB free in CMS!
>>
>> My question is, Why would it do three ParNews, only 300ms apart from each
>> other, when the young gen is essentially empty?  Here are three hypotheses
>> that I have:
>> * Is the application trying to allocate something giant, e.g. a 1 billion
>> element double[]? Is there a way I can test for this, i.e. some JVM level
>> logging that would indicate very large objects being allocated.
>> * Is there an explicit System.gc() in 3rd party code? (Our code is clean.)
>> We're going to disable explicit GC in our next maintenance period.  But this
>> theory doesn't explain concurrent mode failure.
>> * Maybe a third explanation is fragmentation? Is ParNew compacted on every
>> collection?  I've read that CMS tenured gen can suffer from fragmentation.
>>
>> Some details of the installation.  Here is the Java version.
>>
>> java version "1.7.0_21"
>> Java(TM) SE Runtime Environment (build 1.7.0_21-b11)
>> Java HotSpot(TM) 64-Bit Server VM (build 23.21-b01, mixed mode)
>>
>> Here are all the GC relevant parameters we are setting:
>>
>> -Dsun.rmi.dgc.client.gcInterval=3600000
>> -Dsun.rmi.dgc.server.gcInterval=3600000
>> -Xms74752m
>> -Xmx74752m
>> -Xmn15000m
>> -XX:PermSize=192m
>> -XX:MaxPermSize=1500m
>> -XX:CMSInitiatingOccupancyFraction=60
>> -XX:+UseConcMarkSweepGC
>> -XX:+UseParNewGC
>> -XX:+CMSParallelRemarkEnabled
>> -XX:+ExplicitGCInvokesConcurrent
>> -verbose:gc
>> -XX:+PrintGCDetails
>> -XX:+PrintGCTimeStamps
>> -XX:+PrintGCDateStamps // I removed this from the output above to make it
>> slightly more concise
>> -Xloggc:gc.log
>>
>> Any thoughts or recommendations would be welcome,
>> -Andrew
>>
>>
>>   
>> _______________________________________________
>> hotspot-gc-use mailing list
>> hotspot-gc-use at openjdk.java.nethttp://mail.openjdk.java.net/mailman/listinfo/h
>> otspot-gc-use
>> <https://urldefense.proofpoint.com/v1/url?u=http://mail.openjdk.java.net/mailm
>> an/listinfo/hotspot-gc-use&k=fDZpZZQMmYwf27OU23GmAQ%3D%3D%0A&r=XSv4n1CNq0shECU
>> adR229oCRIwnwFZHnIJXszDSM7n4%3D%0A&m=AoKioLSCElHhkeKGB3Hh00BAmtDw%2BK%2FzjC3u0
>> 7rQI3k%3D%0A&s=99757a8204b83fe8a9294b91a51d0cd1f289000588db1a77d2ccc9fd609bcfc
>> f>
>
>
>