CMS collection keep working during holiday
Y Srinivas Ramakrishna
Y.S.Ramakrishna at Sun.COM
Wed Oct 8 09:10:38 UTC 2008
Hi Kirk --
> Nice analysis. Maybe I've missed it but I've not seen anything that
> rules out a simple object leak. Until the leak has been ruled out
> would you prefer to keep objects out of tenured rather than quickly
> promoting them to a space that is already under enormous pressure?
> Do you see extra copying as even being relevant to this problem?
The extra copying is relevant only insofar as avoiding it will
reduce the scavenge pauses to approximately 1/5 (or less) of
their current value. However, you are right that this will not
change the promotion _rate_ into the tenured gen by much
(it will increase by a tad, but not by much). Thus, tenuring
after aging objects for a single scavenge appears to be a win-win.
Now, the reason I suggested the quick tenuring was that
I suspected the difference between CMS and mark-compact
collections is showing up because the new gen is treated
as a source of roots into the old gen, and perhaps quicker
tenuring will reduce the number of "false roots", if you will
(see my reference to Reference objects and to cycles
spanning the generations; the idea is to avoid
carrying around such potential garbage).
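
To make the cross-generation cycle scenario concrete, here is a tiny
sketch (illustrative only; the class names are made up, and the comments
describe the generational placement I am assuming, not anything the code
itself guarantees):

    class Node {
        Node peer;                      // back-pointer that closes the cycle
    }

    class CrossGenCycle {
        // Imagine this object has been around long enough to have been
        // tenured into the old generation.
        static Node longLived = new Node();

        static void makeDeadCycle() {
            Node young = new Node();    // freshly allocated, still in the young gen
            longLived.peer = young;
            young.peer = longLived;
            longLived = null;           // drop the last external reference
            // The two-node cycle is now garbage, but it straddles the old gen
            // (the formerly long-lived node) and the young gen (the fresh node).
            // A full mark-compact collection reclaims both nodes.  A CMS old-gen
            // collection, which scans the young gen as a source of roots, keeps
            // the tenured node alive; and since scavenges treat old-to-young
            // references as roots, the young node in turn survives scavenges
            // until it is eventually promoted -- only once both halves are in
            // the old gen can a CMS cycle reclaim them.
        }
    }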
You are right, though, that perhaps it's a simple object leak.
What makes it strange, and somewhat unlikely to be a simple
memory leak, is that a mark-compact collection reclaims the
space whereas a concurrent collection does not (even accounting
for some small amount of floating garbage due to concurrent
marking -- of which there should be little in the present case
of long inter-scavenge durations and frequent back-to-back CMS
collections).
As regards pressure on the tenured gen: the logs actually indicate
that the promotion rate into the old gen is not that great (note
that each scavenge promotes only about 10-11 MB or so of data
into the old gen, and scavenges happen only about once every 6 minutes,
making for a promotion rate of about 30 KB/s, which is fairly low).
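(As a rough check of that arithmetic: taking roughly 10.5 MB promoted
per scavenge and one scavenge every ~6 minutes, we get
10.5 * 1024 KB / 360 s ~= 30 KB/s.)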
As you will have noted, the CMS cycles are essentially "busy work"
cycles (in that the work is unnecessary and does not accomplish
anything, and thus must not be mistaken for a genuine need for a lot
of collection work). It is true, however, that there is some "pressure"
in the sense that the occupancy, as described by Ken, keeps
increasing over the course of the day ...
-- ramki
>
> Regards,
> Kirk
>
> Y Srinivas Ramakrishna wrote:
> > Hi Ken --
> >
> > You do see of course that in the gc.zip that you made available,
> > there were 4 promotion failures, followed by compacting collections,
> > as listed below in chronological order:-
> >
> > 163116.821: [GC 163116.821: [ParNew: 873856K->174720K(873856K), 0.4830286 secs] 1508484K->961405K(4019584K), 0.4831137 secs] [Times: user=1.59 sys=0.01, real=0.49 secs]
> > ...
> > 165310.759: [GC 165310.759: [ParNew (promotion failed): 873856K->873856K(873856K), 0.8919148 secs]165311.651: [CMS: 1911965K->785641K(3145728K), 2.9784355 secs] 2733287K->785641K(4019584K), 3.8704903 secs] [Times: user=4.80 sys=0.16, real=3.88 secs]
> > ...
> > 231411.554: [GC 231411.554: [ParNew: 873856K->174720K(873856K), 0.5915618 secs] 1676990K->1122182K(4019584K), 0.5916525 secs] [Times: user=1.45 sys=0.00, real=0.59 secs]
> > ...
> > 233946.300: [GC 233946.300: [ParNew (promotion failed): 873856K->873856K(873856K), 0.9428034 secs]233947.243: [CMS: 2172134K->570964K(3145728K), 2.7655578 secs] 2969591K->570964K(4019584K), 3.7084973 secs] [Times: user=5.17 sys=0.09, real=3.70 secs]
> > ...
> > 247864.029: [GC 247864.029: [ParNew (promotion failed): 873856K->873855K(873856K), 0.8641325 secs]247864.893: [CMS: 1845477K->499108K(3145728K), 2.2010342 secs] 2628538K->499108K(4019584K), 3.0652834 secs] [Times: user=4.30 sys=0.06, real=3.06 secs]
> > ...
> > 248187.856: [GC 248187.856: [ParNew: 699136K->174719K(873856K), 0.3926926 secs] 1050016K->610565K(4019584K), 0.3927797 secs] [Times: user=1.45 sys=0.03, real=0.39 secs]
> > ...
> > 265611.034: [GC 265611.034: [ParNew (promotion failed): 873856K->873856K(873856K), 0.6459451 secs]265611.680: [CMS: 1854348K->502128K(3145728K), 2.2556597 secs] 2714667K->502128K(4019584K), 2.9017069 secs] [Times: user=4.19 sys=0.08, real=2.89 secs]
> > ...
> > 502712.808: [GC 502712.808: [ParNew: 873856K->174720K(873856K), 1.2972687 secs] 3806995K->3167193K(4019584K), 1.2973536 secs] [Times: user=1.75 sys=0.00, real=1.30 secs]
> > ...
> > 504379.249: [GC 504379.249: [ParNew: 873856K->873856K(873856K), 0.0000241 secs]504379.249: [CMS504379.358: [CMS-concurrent-abortable-preclean: 0.209/2.471 secs] [Times: user=1.28 sys=0.00, real=2.48 secs] (concurrent mode failure): 2938816K->424257K(3145728K), 5.2334306 secs] 3812672K->424257K(4019584K), 5.2335621 secs] [Times: user=5.11 sys=0.02, real=5.23 secs]
> >
> > In other words, CMS often does not manage to clear the allocated
> > memory. But the CMS collection preceding the promotion failure
> > indicates heap usage only a little more than what you see after
> > the mark-compact collection. There is almost certainly something
> > fishy going on here. It may be related to Reference objects and their
> > referents being split across generations, but it's not completely clear.
> >
> > [Perhaps you can check whether your application makes use of Soft/Weak
> > References in some funny way, so as to cause the Reference and referent
> > to be split between the old and new generations?]
> >
> > Now one thing we know about the difference between CMS and a full mark-compact
> > collection is that CMS treats the young gen as a source of roots for the
> > CMS and perm gen collection, whereas this is not so for the latter.
> > In pathological cases, a dead cycle spanning the young gen and the
> > old gen would remain uncollected by CMS but be collected by the mark-compact
> > collection. Now, I don't think that would be happening in your case, but I
> > do see that the frequency of ParNew collections in your case is fairly low --
> > about 1 every 6 minutes or so. With a tenuring threshold of 11, it's possible
> > for a dead cycle to live for a full hour before it gets promoted into the
> > old generation, whence it would be eligible for collection.
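> >
> > (Roughly: with a tenuring threshold of 11 and one scavenge every ~6
> > minutes, an object can survive 11 x 6 = 66 minutes -- about an hour --
> > in the young gen before it is finally promoted.)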
> >
> > In some cases, the CMS initiating occupancy fraction has not been hit
> > (and a CMS collection not even attempted) before a promotion failure
> > occurs. This would appear to call for more frequent CMS collections
> > following ParNew collections, even something drastic like:-
> >
> > CMSInitiatingOccupancyFraction=25
> >
> > This would of course cause the back-to-back CMS collections that
> > would peg your CPU; in order to avoid that, you might disable the
> > CMS "timeout" via:-
> >
> > CMSWaitDuration=1200000
> >
> > i.e. make it 20 minutes (= 1,200,000 ms).
> >
> > In addition, I find that your scavenges are unnecessarily copying
> > survivors back and forth 11 times or so; I'd just scrunch that down
> > to:-
> >
> > MaxTenuringThreshold=1 (or at most 2)
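> >
> > Taken together, the suggested settings would look something like this
> > on the command line (the "..." stands for whatever heap sizes and other
> > flags you are already running with; I am assuming you already select
> > CMS via -XX:+UseConcMarkSweepGC):
> >
> >   java ... -XX:CMSInitiatingOccupancyFraction=25 \
> >            -XX:CMSWaitDuration=1200000 \
> >            -XX:MaxTenuringThreshold=1 \
> >            ...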
> >
> > This may or may not address the problem you are seeing here, but
> > it's at least a start towards eliminating some of
> > the suspects, and of course towards eliminating the "busy work"
> > that CMS is doing during the holidays.
> >
> > -- ramki
> >
> > ----- Original Message -----
> > From: "Ken-- at newsgroupstats.hk" <dragonken at gmail.com>
> > Date: Tuesday, October 7, 2008 11:14 pm
> > Subject: Re: CMS collection keep working during holiday
> > To: hotspot-gc-dev at openjdk.java.net
> >
> >
> >
> >> Thanks for your quick response. I will do it now, but it requires 1-2
> >> days to reproduce the result.
> >>
> >> It seems that my case is very special.
> >>
> >> The application is simply a stock quote client/server that stores daily
> >> stock quote and chart data in memory. The live objects need around 80 MB
> >> of memory at day open and 500 MB at day end.
> >>
> >>
> >>
> >>
> >> y.s.ramakrishna wrote:
> >>
> >>> Could you run with the additional option +PrintReferenceGC and see if
> >>> it sheds additional light on the problem? (Please share the resulting
> >>> GC logs.)
> >>>
> >>> If at all possible, you might try rev'ing down to 6.0 and seeing
> >>> whether the same behaviour reproduces.
> >>>
> >>> I understand that some or all of this may not be possible in a
> >>> production setting.
> >>>
> >>> -- ramki
> >>>
> >>> ----- Original Message -----
> >>> From: "Ken-- at newsgroupstats.hk" <dragonken at gmail.com>
> >>> Date: Tuesday, October 7, 2008 10:17 pm
> >>> Subject: Re: CMS collection keep working during holiday
> >>> To: hotspot-gc-dev at openjdk.java.net
> >>>
> >>>
> >>>
> >>>> Follow up to the jconsole above. This GIF was captured today. You can
> >>>> see the first CMS after a FULL GC is working. The 2nd one is working
> >>>> too, but the memory freed from the 2nd generation is much less than
> >>>> what the first CMS freed. The memory freed by the 3rd CMS is smaller
> >>>> still.
> >>>>
> >>>> The 5th CMS cannot free enough memory to bring occupancy in the old
> >>>> gen to less than 70%. So, continuous non-stop CMS collections happen
> >>>> again until a Full GC (triggered by a Concurrent Mode Failure).
> >>>>
> >>>> http://www.nabble.com/file/p19872463/jconsole_20081008.jpg
> >>>>
> >>>>
> >>>>
> >>>
> >>
> >>
> >
> >
>