Repeated ParNews when Young Gen is Empty?
Jon Masamitsu
jon.masamitsu at oracle.com
Sat Jul 20 08:24:10 PDT 2013
Andrew,
I'm having second thoughts about promotion buffers being
relevant. I'll have to think about it some more. Sorry for the half-baked idea.
Jon
On 7/19/2013 3:51 PM, Jon Masamitsu wrote:
> Andrew,
>
> Regarding the growth in used space in the young gen after a young GC,
> it might be badly sized promotion buffers. With parallel young
> collections, each GC worker thread will have a private buffer
> for copying the live objects (into a survivor space in the
> young gen). At the end of the GC I think the buffers are filled
> with dummy objects. Very badly sized promotion buffers could
> lead to more used space in the young gen after a GC.
> Try
>
> -XX:+PrintPLAB
>
> I think that will print out the promotion buffer sizes including
> the amount of waste.
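>
> As a minimal sketch of where that flag would sit in the launch command (the
> main class name here is hypothetical; the other options are taken from the
> flag list later in this thread):
>
>    java -Xms74752m -Xmx74752m -Xmn15000m \
>         -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled \
>         -XX:+PrintPLAB -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
>         -Xloggc:gc.log com.example.Main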
>
> Jon
>
> On 7/19/2013 2:23 PM, Andrew Colombi wrote:
>> Jon,
>>
>> Here is a bit more from that same log:
>>
>> 49378.517: [GC 49378.517: [ParNew: 7894655K->332202K(13824000K), 0.1304950
>> secs] 56021115K->48458661K(75010048K), 0.1314370 secs] [Times: user=2.00
>> sys=0.01, real=0.13 secs]
>> 49378.735: [GC 49378.736: [ParNew: 669513K->342976K(13824000K), 0.0849560
>> secs] 48795972K->48469435K(75010048K), 0.0859460 secs] [Times: user=1.84
>> sys=0.03, real=0.09 secs]
>> 49378.850: [GC 49378.851: [ParNew: 514163K->312532K(13824000K), 0.1114820
>> secs] 48640622K->48471080K(75010048K), 0.1122890 secs] [Times: user=2.21
>> sys=0.02, real=0.12 secs]
>> 49379.000: [GC 49379.000: [ParNew: 529899K->247436K(13824000K), 0.1099240
>> secs]49379.110: [CMS2013-06-26T17:14:08.834-0400: 49395.671:
>> [CMS-concurrent-mark: 16.629/17.168 secs] [Times: user=104.18 sys=1.86,
>> real=17.16 secs]
>> (concurrent mode failure): 48227750K->31607742K(61186048K), 129.9298170 secs]
>> 48688447K->31607742K(75010048K), [CMS Perm : 150231K->147875K(250384K)],
>> 130.0405700 secs] [Times: user=209.80 sys=1.83, real=130.02 secs]
>> 49510.844: [GC [1 CMS-initial-mark: 46419131K(61186048K)]
>> 46881938K(75010048K), 0.1073960 secs] [Times: user=0.11 sys=0.00, real=0.11
>> secs]
>> 49510.952: [CMS-concurrent-mark-start]
>> 49515.315: [GC 49515.316: [ParNew: 12288000K->130669K(13824000K), 0.0827050
>> secs] 58707131K->46549801K(75010048K), 0.0838760 secs] [Times: user=1.63
>> sys=0.01, real=0.09 secs]
>> 49528.241: [CMS-concurrent-mark: 16.811/17.288 secs] [Times: user=184.48
>> sys=21.43, real=17.29 secs]
>> 49528.241: [CMS-concurrent-preclean-start]
>> 49529.795: [CMS-concurrent-preclean: 1.549/1.554 secs] [Times: user=8.39
>> sys=1.75, real=1.55 secs]
>> 49529.795: [CMS-concurrent-abortable-preclean-start]
>> 49534.261: [GC 49534.262: [ParNew: 12418669K->199314K(13824000K), 0.1248450
>> secs] 58837801K->46618445K(75010048K), 0.1258850 secs] [Times: user=1.83
>> sys=0.01, real=0.12 secs]
>> me 2013-06-26T17:16:29.282-0400: 49536.120:
>> [CMS-concurrent-abortable-preclean: 6.164/6.325 secs] [Times: user=29.18
>> sys=6.79, real=6.33 secs]
>> 49536.127: [GC[YG occupancy: 1158498 K (13824000 K)]49536.127: [Rescan
>> (parallel) , 0.6845350 secs]49536.812: [weak refs processing, 0.0027360
>> secs]49536.815: [scrub string table, 0.0026210 secs] [1 CMS-remark:
>> 46419131K(61186048K)] 47577630K(75010048K), 0.6903830 secs] [Times:
>> user=14.71 sys=0.08, real=0.69 secs]
>> 49536.818: [CMS-concurrent-sweep-start]
>> 49550.868: [CMS-concurrent-sweep: 14.026/14.049 secs] [Times: user=68.18
>> sys=16.72, real=14.05 secs]
>> 49550.868: [CMS-concurrent-reset-start]
>> 49551.105: [CMS-concurrent-reset: 0.237/0.237 secs] [Times: user=1.31
>> sys=0.32, real=0.24 secs]
>>
>> But I'd also direct your attention to the log that I shared later in this
>> thread, where we observed the young gen _grow_ during a ParNew collection;
>> snippet below. Notice that in the last collection the young-gen occupancy
>> actually grows during the collection (247347K -> 312540K), and the spurious
>> truncated "ParNew" line is part of the actual log, though I don't understand
>> why.
>>
>> 30779.759: [GC 30779.760: [ParNew: 12397330K->125395K(13824000K),
>> 122.6032130 secs] 33858511K->21587730K(75010048K), 122.6041920 secs] [Times:
>> user=2811.37 sys=1.10, real=122.59 secs]
>> 30914.319: [GC 30914.320: [ParNew
>> 30914.319: [GC 30914.320: [ParNew: 12413753K->247108K(13824000K),
>> 132.8863570 secs] 33876089K->21710570K(75010048K), 132.8874170 secs] [Times:
>> user=3050.21 sys=0.74, real=132.86 secs]
>> 31047.212: [GC 31047.212: [ParNew: 247347K->312540K(13824000K), 147.9675020
>> secs] 21710809K->21777393K(75010048K), 147.9681870 secs] [Times:
>> user=3398.88 sys=0.64, real=147.94 secs]
>>
>> We're still struggling with this. We've opened an issue with Oracle support
>> through our support contract. I will keep the thread updated as we learn
>> new, interesting things.
>>
>> -Andrew
>>
>> From: Jon Masamitsu <jon.masamitsu at oracle.com>
>> Organization: Oracle Corporation
>> Date: Friday, July 19, 2013 1:32 PM
>> To: "hotspot-gc-use at openjdk.java.net" <hotspot-gc-use at openjdk.java.net>
>> Subject: Re: Repeated ParNews when Young Gen is Empty?
>>
>> What is the ParNew behavior like after the "concurrent mode failure"?
>>
>> Jon
>>
>> On 7/1/2013 1:44 PM, Andrew Colombi wrote:
>>> Hi,
>>>
>>> I've been investigating some big, slow stop-the-world GCs, and came upon this
>>> curious pattern of rapid, repeated ParNews on an almost empty young gen.
>>> We're using -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
>>> -XX:+CMSParallelRemarkEnabled. Here is the log:
>>>
>>> 49355.202: [GC 49355.202: [ParNew: 12499734K->276251K(13824000K), 0.1382160
>>> secs] 45603872K->33380389K(75010048K), 0.1392380 secs] [Times: user=1.89
>>> sys=0.02, real=0.14 secs]
>>> 49370.274: [GC [1 CMS-initial-mark: 48126459K(61186048K)]
>>> 56007160K(75010048K), 8.2281560 secs] [Times: user=8.22 sys=0.00, real=8.23
>>> secs]
>>> 49378.503: [CMS-concurrent-mark-start]
>>> 49378.517: [GC 49378.517: [ParNew: 7894655K->332202K(13824000K), 0.1304950
>>> secs] 56021115K->48458661K(75010048K), 0.1314370 secs] [Times: user=2.00
>>> sys=0.01, real=0.13 secs]
>>> 49378.735: [GC 49378.736: [ParNew: 669513K->342976K(13824000K), 0.0849560
>>> secs] 48795972K->48469435K(75010048K), 0.0859460 secs] [Times: user=1.84
>>> sys=0.03, real=0.09 secs]
>>> 49378.850: [GC 49378.851: [ParNew: 514163K->312532K(13824000K), 0.1114820
>>> secs] 48640622K->48471080K(75010048K), 0.1122890 secs] [Times: user=2.21
>>> sys=0.02, real=0.12 secs]
>>> 49379.000: [GC 49379.000: [ParNew: 529899K->247436K(13824000K), 0.1099240
>>> secs]49379.110: [CMS2013-06-26T17:14:08.834-0400: 49395.671:
>>> [CMS-concurrent-mark: 16.629/17.168 secs] [Times: user=104.18 sys=1.86,
>>> real=17.16 secs]
>>> (concurrent mode failure): 48227750K->31607742K(61186048K), 129.9298170 secs]
>>> 48688447K->31607742K(75010048K), [CMS Perm : 150231K->147875K(250384K)],
>>> 130.0405700 secs] [Times: user=209.80 sys=1.83, real=130.02 secs]
>>>
>>> By my read, it starts with a typical ParNew that cleans about 12GB (of a 13GB
>>> young gen). Then CMS begins, and the next three ParNews start looking strange.
>>> First there is a ParNew at 49378.517 that triggers with only 7.8GB of the 13GB
>>> young gen occupied. Then at 49378.736 and 49378.851 there are two more ParNews
>>> when the young gen has only 660MB and 514MB occupied, respectively. Then really
>>> bad stuff happens: we hit a concurrent mode failure. This stops the world for
>>> about two minutes and clears about 17GB of data, almost all of which was in the
>>> CMS tenured gen. Notice there are still 12GB free in CMS!
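>>>
>>> (To spell out the arithmetic behind those figures from the concurrent mode
>>> failure line: old-gen used fell 48227750K -> 31607742K, a drop of 16620008K;
>>> whole-heap used fell 48688447K -> 31607742K, a drop of 17080705K, which is
>>> the ~17GB cleared. Free old-gen space just before the failure was
>>> 61186048K - 48227750K = 12958298K, the ~12GB still free in CMS.)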
>>>
>>> My question is: why would it do three ParNews, only 300ms apart from each
>>> other, when the young gen is essentially empty? Here are three hypotheses:
>>> * Is the application trying to allocate something giant, e.g. a 1 billion
>>> element double[]? Is there any JVM-level logging that would indicate very
>>> large objects being allocated? (See the sketch after this list.)
>>> * Is there an explicit System.gc() in 3rd-party code? (Our code is clean.)
>>> We're going to disable explicit GC in our next maintenance period. But this
>>> theory doesn't explain the concurrent mode failure.
>>> * Maybe a third explanation is fragmentation? Does ParNew compact the young
>>> gen on every collection? I've read that the CMS tenured gen can suffer from
>>> fragmentation.
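>>>
>>> As a rough sketch of the first hypothesis (the class name and the array size
>>> below are hypothetical, purely for illustration), an allocation like this asks
>>> for roughly 8GB in a single object. With a ~12GB eden that already holds
>>> 7.8GB, a request that size cannot be satisfied from eden's free space, so it
>>> could plausibly force extra young collections, or be placed directly in the
>>> tenured gen, even though the young gen looks nearly empty afterwards:
>>>
>>> // Hypothetical illustration only; needs a large heap (e.g. -Xmx10g) to run.
>>> public class GiantAllocation {
>>>     public static void main(String[] args) {
>>>         // 1_000_000_000 elements * 8 bytes/element is roughly 8GB,
>>>         // requested as one contiguous object.
>>>         double[] giant = new double[1_000_000_000];
>>>         System.out.println("allocated " + giant.length + " doubles");
>>>     }
>>> }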
>>>
>>> Some details of the installation. Here is the Java version.
>>>
>>> java version "1.7.0_21"
>>> Java(TM) SE Runtime Environment (build 1.7.0_21-b11)
>>> Java HotSpot(TM) 64-Bit Server VM (build 23.21-b01, mixed mode)
>>>
>>> Here are all the GC relevant parameters we are setting:
>>>
>>> -Dsun.rmi.dgc.client.gcInterval=3600000
>>> -Dsun.rmi.dgc.server.gcInterval=3600000
>>> -Xms74752m
>>> -Xmx74752m
>>> -Xmn15000m
>>> -XX:PermSize=192m
>>> -XX:MaxPermSize=1500m
>>> -XX:CMSInitiatingOccupancyFraction=60
>>> -XX:+UseConcMarkSweepGC
>>> -XX:+UseParNewGC
>>> -XX:+CMSParallelRemarkEnabled
>>> -XX:+ExplicitGCInvokesConcurrent
>>> -verbose:gc
>>> -XX:+PrintGCDetails
>>> -XX:+PrintGCTimeStamps
>>> -XX:+PrintGCDateStamps // I removed this from the output above to make it
>>> slightly more concise
>>> -Xloggc:gc.log
>>>
>>> Any thoughts or recommendations would be welcome,
>>> -Andrew
>>>
>>>
>>>
>>
>>