Repeated ParNews when Young Gen is Empty?
Srinivas Ramakrishna
ysr1729 at gmail.com
Mon Jul 22 18:03:34 PDT 2013
Jon, wasted space in promotion buffers was also my first thought. The
code for sizing promotion buffers is rather simplistic and assumes a
uniformly sized promotion volume handled by each thread. So there
could definitely be wasted space in the promotion buffers of, for
example, all threads but one, with very skewed promotion volumes
across the threads, or perhaps very skewed object sizes leading to
much more waste than usual.
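
Just to make the worst case concrete, here's a rough back-of-envelope
sketch (the thread count and buffer size below are made-up numbers for
illustration, not what your JVM is actually using):

    // Hypothetical numbers, for illustration only.
    public class PlabWasteSketch {
        public static void main(String[] args) {
            int gcWorkerThreads = 16;          // stand-in for the GC worker thread count
            long plabBytes = 4L * 1024 * 1024; // assumed per-thread promotion buffer size

            // Worst case: one thread does essentially all of the copying, so the
            // other threads' buffers are retired nearly empty (padded with dummy
            // objects) and show up as "used" space after the minor GC.
            long worstCaseWastePerGc = (gcWorkerThreads - 1) * plabBytes;
            System.out.printf("worst-case wasted buffer space per GC: ~%d MB%n",
                    worstCaseWastePerGc / (1024 * 1024));
        }
    }

With those made-up numbers that's ~60 MB of "used" space per minor GC
that isn't live data; with badly sized buffers the real figure could be
much larger.
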
The quick back-to-back minor gc's without the young gen filling up,
and the extremely long gc's, seem to point perhaps to some highly
skewed work distribution causing serialization of the work.
Andrew, does the problem persist until a Full GC happens? Does the
problem (of back-to-back long minor gc's) always start when there's a
minor gc that happens before the young gen is full?
-- ramki
On Sat, Jul 20, 2013 at 8:24 AM, Jon Masamitsu <jon.masamitsu at oracle.com> wrote:
> Andrew,
>
> I'm having second thoughts about promotion buffers being
> relevant. I'll have to think about it some more. Sorry for the
> half-baked idea.
>
> Jon
>
>
> On 7/19/2013 3:51 PM, Jon Masamitsu wrote:
>> Andrew,
>>
>> Regarding the growth in used space in the young gen after a young GC,
>> it might be badly sized promotion buffers. With parallel young
>> collections, each GC worker thread will have a private buffer
>> for copying the live objects (into a survivor space in the
>> young gen). At the end of the GC I think the buffers are filled
>> with dummy objects. Very badly sized promotion buffers could
>> lead to more used space in the young gen after a GC.
>> Try
>>
>> -XX:+PrintPLAB
>>
>> I think that will print out the promotion buffer sizes including
>> the amount of waste.
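>>
>> For example, tacked onto the options you're already running with
>> (YourApp below is just a placeholder for your main class), it might
>> look something like:
>>
>>   java -Xms74752m -Xmx74752m -Xmn15000m \
>>        -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled \
>>        -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintPLAB \
>>        -Xloggc:gc.log YourApp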
>>
>> Jon
>>
>> On 7/19/2013 2:23 PM, Andrew Colombi wrote:
>>> Jon,
>>>
>>> Here is a bit more from that same log:
>>>
>>> 49378.517: [GC 49378.517: [ParNew: 7894655K->332202K(13824000K), 0.1304950
>>> secs] 56021115K->48458661K(75010048K), 0.1314370 secs] [Times: user=2.00
>>> sys=0.01, real=0.13 secs]
>>> 49378.735: [GC 49378.736: [ParNew: 669513K->342976K(13824000K), 0.0849560
>>> secs] 48795972K->48469435K(75010048K), 0.0859460 secs] [Times: user=1.84
>>> sys=0.03, real=0.09 secs]
>>> 49378.850: [GC 49378.851: [ParNew: 514163K->312532K(13824000K), 0.1114820
>>> secs] 48640622K->48471080K(75010048K), 0.1122890 secs] [Times: user=2.21
>>> sys=0.02, real=0.12 secs]
>>> 49379.000: [GC 49379.000: [ParNew: 529899K->247436K(13824000K), 0.1099240
>>> secs]49379.110: [CMS2013-06-26T17:14:08.834-0400: 49395.671:
>>> [CMS-concurrent-mark: 16.629/17.168 secs] [Times: user=104.18 sys=1.86,
>>> real=17.16 secs]
>>> (concurrent mode failure): 48227750K->31607742K(61186048K), 129.9298170 secs]
>>> 48688447K->31607742K(75010048K), [CMS Perm : 150231K->147875K(250384K)],
>>> 130.0405700 secs] [Times: user=209.80 sys=1.83, real=130.02 secs]
>>> 49510.844: [GC [1 CMS-initial-mark: 46419131K(61186048K)]
>>> 46881938K(75010048K), 0.1073960 secs] [Times: user=0.11 sys=0.00, real=0.11
>>> secs]
>>> 49510.952: [CMS-concurrent-mark-start]
>>> 49515.315: [GC 49515.316: [ParNew: 12288000K->130669K(13824000K), 0.0827050
>>> secs] 58707131K->46549801K(75010048K), 0.0838760 secs] [Times: user=1.63
>>> sys=0.01, real=0.09 secs]
>>> 49528.241: [CMS-concurrent-mark: 16.811/17.288 secs] [Times: user=184.48
>>> sys=21.43, real=17.29 secs]
>>> 49528.241: [CMS-concurrent-preclean-start]
>>> 49529.795: [CMS-concurrent-preclean: 1.549/1.554 secs] [Times: user=8.39
>>> sys=1.75, real=1.55 secs]
>>> 49529.795: [CMS-concurrent-abortable-preclean-start]
>>> 49534.261: [GC 49534.262: [ParNew: 12418669K->199314K(13824000K), 0.1248450
>>> secs] 58837801K->46618445K(75010048K), 0.1258850 secs] [Times: user=1.83
>>> sys=0.01, real=0.12 secs]
>>> CMS: abort preclean due to time 2013-06-26T17:16:29.282-0400: 49536.120:
>>> [CMS-concurrent-abortable-preclean: 6.164/6.325 secs] [Times: user=29.18
>>> sys=6.79, real=6.33 secs]
>>> 49536.127: [GC[YG occupancy: 1158498 K (13824000 K)]49536.127: [Rescan
>>> (parallel) , 0.6845350 secs]49536.812: [weak refs processing, 0.0027360
>>> secs]49536.815: [scrub string table, 0.0026210 secs] [1 CMS-remark:
>>> 46419131K(61186048K)] 47577630K(75010048K), 0.6903830 secs] [Times:
>>> user=14.71 sys=0.08, real=0.69 secs]
>>> 49536.818: [CMS-concurrent-sweep-start]
>>> 49550.868: [CMS-concurrent-sweep: 14.026/14.049 secs] [Times: user=68.18
>>> sys=16.72, real=14.05 secs]
>>> 49550.868: [CMS-concurrent-reset-start]
>>> 49551.105: [CMS-concurrent-reset: 0.237/0.237 secs] [Times: user=1.31
>>> sys=0.32, real=0.24 secs]
>>>
>>> But I'd also direct your attention to the log that I shared later in this
>>> thread, where we observed the young gen _grow_ during a ParNew collection,
>>> snippet below. Notice that in the last collection the occupancy actually
>>> grows during the collection, and the spurious "ParNew" line is part of the
>>> actual log, though I don't understand why.
>>>
>>> 30779.759: [GC 30779.760: [ParNew: 12397330K->125395K(13824000K),
>>> 122.6032130 secs] 33858511K->21587730K(75010048K), 122.6041920 secs] [Times:
>>> user=2811.37 sys=1.10, real=122.59 secs]
>>> 30914.319: [GC 30914.320: [ParNew
>>> 30914.319: [GC 30914.320: [ParNew: 12413753K->247108K(13824000K),
>>> 132.8863570 secs] 33876089K->21710570K(75010048K), 132.8874170 secs] [Times:
>>> user=3050.21 sys=0.74, real=132.86 secs]
>>> 31047.212: [GC 31047.212: [ParNew: 247347K->312540K(13824000K), 147.9675020
>>> secs] 21710809K->21777393K(75010048K), 147.9681870 secs] [Times:
>>> user=3398.88 sys=0.64, real=147.94 secs]
>>>
>>> We're still struggling with this. We've opened an issue with Oracle support
>>> through our support contract. I will keep the thread updated as we learn
>>> new, interesting things.
>>>
>>> -Andrew
>>>
>>> From: Jon Masamitsu <jon.masamitsu at oracle.com>
>>> Organization: Oracle Corporation
>>> Date: Friday, July 19, 2013 1:32 PM
>>> To: "hotspot-gc-use at openjdk.java.net" <hotspot-gc-use at openjdk.java.net>
>>> Subject: Re: Repeated ParNews when Young Gen is Empty?
>>>
>>> What is the ParNew behavior like after the "concurrent mode failure"?
>>>
>>> Jon
>>>
>>> On 7/1/2013 1:44 PM, Andrew Colombi wrote:
>>>> Hi,
>>>>
>>>> I've been investigating some big, slow stop-the-world GCs, and came upon this
>>>> curious pattern of rapid, repeated ParNews on an almost empty young gen.
>>>> We're using -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
>>>> -XX:+CMSParallelRemarkEnabled. Here is the log:
>>>>
>>>> 49355.202: [GC 49355.202: [ParNew: 12499734K->276251K(13824000K), 0.1382160
>>>> secs] 45603872K->33380389K(75010048K), 0.1392380 secs] [Times: user=1.89
>>>> sys=0.02, real=0.14 secs]
>>>> 49370.274: [GC [1 CMS-initial-mark: 48126459K(61186048K)]
>>>> 56007160K(75010048K), 8.2281560 secs] [Times: user=8.22 sys=0.00, real=8.23
>>>> secs]
>>>> 49378.503: [CMS-concurrent-mark-start]
>>>> 49378.517: [GC 49378.517: [ParNew: 7894655K->332202K(13824000K), 0.1304950
>>>> secs] 56021115K->48458661K(75010048K), 0.1314370 secs] [Times: user=2.00
>>>> sys=0.01, real=0.13 secs]
>>>> 49378.735: [GC 49378.736: [ParNew: 669513K->342976K(13824000K), 0.0849560
>>>> secs] 48795972K->48469435K(75010048K), 0.0859460 secs] [Times: user=1.84
>>>> sys=0.03, real=0.09 secs]
>>>> 49378.850: [GC 49378.851: [ParNew: 514163K->312532K(13824000K), 0.1114820
>>>> secs] 48640622K->48471080K(75010048K), 0.1122890 secs] [Times: user=2.21
>>>> sys=0.02, real=0.12 secs]
>>>> 49379.000: [GC 49379.000: [ParNew: 529899K->247436K(13824000K), 0.1099240
>>>> secs]49379.110: [CMS2013-06-26T17:14:08.834-0400: 49395.671:
>>>> [CMS-concurrent-mark: 16.629/17.168 secs] [Times: user=104.18 sys=1.86,
>>>> real=17.16 secs]
>>>> (concurrent mode failure): 48227750K->31607742K(61186048K), 129.9298170 secs]
>>>> 48688447K->31607742K(75010048K), [CMS Perm : 150231K->147875K(250384K)],
>>>> 130.0405700 secs] [Times: user=209.80 sys=1.83, real=130.02 secs]
>>>>
>>>> By my read, it starts with a typical ParNew that cleans about 12GB (of a 13GB
>>>> young gen). Then CMS begins, and the next three ParNews start looking strange.
>>>> First there is a ParNew at 49378.517 that hits when only 7.8GB of the 13GB
>>>> available is occupied. Then at 49378.736 and 49378.851 it does two ParNews
>>>> when the young gen has only 660MB and 514MB occupied, respectively. Then
>>>> really bad stuff happens: we hit a concurrent mode failure. This stops the
>>>> world for 2 minutes and clears about 17GB of data, almost all of which was in
>>>> the CMS tenured gen. Notice there are still 12GB free in CMS!
>>>>
>>>> My question is: why would it do three ParNews, only 300ms apart from each
>>>> other, when the young gen is essentially empty? Here are three hypotheses
>>>> that I have:
>>>> * Is the application trying to allocate something giant, e.g. a 1 billion
>>>> element double[]? Is there a way I can test for this, i.e. some JVM-level
>>>> logging that would indicate very large objects being allocated? (A
>>>> hypothetical sketch follows this list.)
>>>> * Is there an explicit System.gc() in 3rd party code? (Our code is clean.)
>>>> We're going to disable explicit GC in our next maintenance period. But this
>>>> theory doesn't explain concurrent mode failure.
>>>> * Maybe a third explanation is fragmentation? Is the young gen compacted on
>>>> every ParNew collection? I've read that the CMS tenured gen can suffer from
>>>> fragmentation.
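>>>>
>>>> To make hypothesis one concrete, here's a purely hypothetical sketch
>>>> (not code from our app): my possibly-wrong understanding is that an
>>>> allocation request that doesn't fit in what's left of eden triggers a
>>>> minor GC, so a single giant request could kick off a ParNew well before
>>>> the young gen is full.
>>>>
>>>>     public class GiantAllocation {
>>>>         public static void main(String[] args) {
>>>>             // ~8 GB in a single request (1 billion doubles); it needs a very
>>>>             // large heap to succeed at all, and if eden doesn't have that much
>>>>             // room left, the allocation-failure path kicks off a young GC first.
>>>>             double[] giant = new double[1_000_000_000];
>>>>             System.out.println(giant.length);
>>>>         }
>>>>     }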
>>>>
>>>> Some details of the installation. Here is the Java version.
>>>>
>>>> java version "1.7.0_21"
>>>> Java(TM) SE Runtime Environment (build 1.7.0_21-b11)
>>>> Java HotSpot(TM) 64-Bit Server VM (build 23.21-b01, mixed mode)
>>>>
>>>> Here are all the GC relevant parameters we are setting:
>>>>
>>>> -Dsun.rmi.dgc.client.gcInterval=3600000
>>>> -Dsun.rmi.dgc.server.gcInterval=3600000
>>>> -Xms74752m
>>>> -Xmx74752m
>>>> -Xmn15000m
>>>> -XX:PermSize=192m
>>>> -XX:MaxPermSize=1500m
>>>> -XX:CMSInitiatingOccupancyFraction=60
>>>> -XX:+UseConcMarkSweepGC
>>>> -XX:+UseParNewGC
>>>> -XX:+CMSParallelRemarkEnabled
>>>> -XX:+ExplicitGCInvokesConcurrent
>>>> -verbose:gc
>>>> -XX:+PrintGCDetails
>>>> -XX:+PrintGCTimeStamps
>>>> -XX:+PrintGCDateStamps // I removed this from the output above to make it
>>>> slightly more concise
>>>> -Xloggc:gc.log
>>>>
>>>> Any thoughts or recommendations would be welcome,
>>>> -Andrew
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> hotspot-gc-use mailing list
>>>> hotspot-gc-use at openjdk.java.net
>>>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
>>>
>>>
>> _______________________________________________
>> hotspot-gc-use mailing list
>> hotspot-gc-use at openjdk.java.net
>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
>
> _______________________________________________
> hotspot-gc-use mailing list
> hotspot-gc-use at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use