Promotion failures: indication of CMS fragmentation?

Taras Tielkes taras.tielkes at gmail.com
Sun Apr 15 09:41:02 PDT 2012


Hi Chi,

I've sent you a decent chunk of the gc.log file off-list (hopefully
not too large).

For completeness, we're running with the following options (ignoring
the diagnostic ones):
-----
-server
-Xms5g
-Xmx5g
-Xmn800m
-XX:PermSize=256m
-XX:MaxPermSize=256m
-XX:SurvivorRatio=4
-XX:+UseConcMarkSweepGC
-XX:+UseParNewGC
-XX:+DisableExplicitGC
-XX:+UseCMSInitiatingOccupancyOnly
-XX:+CMSClassUnloadingEnabled
-XX:CMSInitiatingOccupancyFraction=68
-----
Platform is Java 6u29 running on Linux 2.6 x64.
Hardware is 2x quad-core Xeons, but pretty old ones (pre-Nehalem, no QPI).

The gc logs will (typically) show big peaks at the start and end of the
working day - this is the nature of the domain our application targets.

I would expect the live set to be below 1G (usually even below 600M).
However, we can experience temporary bursts of higher-volume allocation
of longer-living objects.

We'll set up a jstat log for this machine. I do have historical jstat
logs for one of the other machines, but that one is still running with
a smaller new gen, and smaller survivor spaces. If there's any other
relevant data that I can collect, let me know.
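
For the record, the plan is to run something like the following in the
background (as the application user), so we can correlate it with gc.log
afterwards:
-----
jstat -gcutil -t <pid> 5000 >> jstat-gcutil.log
-----
(the -t column adds a timestamp to line up against the GC log; the pid,
interval and file name above are just placeholders for now).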

Kind regards,
Taras

On Sun, Apr 15, 2012 at 6:15 PM, Chi Ho Kwok <chkwok at digibites.nl> wrote:
> Hi Taras,
>
> Sure thing. Just the previous CMS should be enough; it doesn't matter if
> there are 10 or 1000 ParNews between that and the failure.
>
> As for the jstat failure: it looks like jstat searches
> /tmp/hsperfdata_[username] for the pid by default, so maybe something
> like -J-Djava.io.tmpdir=[path, like /app/client/program/tomcat/temp] can
> help; and from what I've seen, jstat has to run as the same user as the
> process, or as root. Historical data is nice to have, but even just
> staring at it for 15 minutes should give you a hint about the old gen
> usage.
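>
> To be concrete, assuming the tomcat temp dir mentioned above, the
> invocation would look roughly like:
>
> jstat -J-Djava.io.tmpdir=/app/client/program/tomcat/temp -gcutil <pid> 5000
>
> (path, pid and interval are of course placeholders for your setup).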
>
> If the collection starts at 68% and the heap fills to 80%+ before it's
> done even when the system isn't busy, it's probably wise to lower the
> initiating occupancy fraction or increase the CMS thread count so it
> completes faster. We run with -XX:ParallelCMSThreads=3 on an 8 hw thread
> server; the default (2) was too slow for us, as we run with 76% and it
> still takes 15s on average for CMS to scan and clean the old gen (while
> the old gen grows up to 80% full). Much longer than that can mean a
> promotion failure during request spikes.
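>
> Applied to your flags above, that would mean either dropping
> -XX:CMSInitiatingOccupancyFraction=68 to something lower (e.g. 60), or
> adding something like -XX:ParallelCMSThreads=3 and checking whether the
> concurrent phases get shorter.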
>
>
> Chi Ho Kwok
>
>
> On Sun, Apr 15, 2012 at 5:08 PM, Taras Tielkes <taras.tielkes at gmail.com>
> wrote:
>>
>> Hi Chi,
>>
>> Is it o.k. if I send this off-list to you directly? If so, how much
>> more do you need? Just enough to cover the previous CMS?
>> We're running with  -XX:CMSInitiatingOccupancyFraction=68 and
>> -XX:+UseCMSInitiatingOccupancyOnly, by the way.
>>
>> I do have shell access, however, on that particular machine we're
>> experiencing the "process id not found" issue with jstat.
>> I think this can be worked around by fiddling with temp directory
>> options, but we haven't tried that yet.
>> Regarding the jstat output, I assume this would be most valuable to
>> have for the exact moment when the promotion failure happens, correct?
>> If so, we can try to set up jstat to run in the background
>> continuously, to have more diagnostic data in the future.
>>
>> Kind regards,
>> Taras
>>
>> On Sun, Apr 15, 2012 at 2:48 PM, Chi Ho Kwok <chkwok at digibites.nl> wrote:
>> > Hi Taras,
>> >
>> > Can you send me a larger chunk of the log? I'm interested in seeing when
>> > the last CMS was run and what it freed. Maybe it's kicking in too late:
>> > the full GC triggered by the promotion failure only found 600M of live
>> > data, the rest was garbage. If that's the cause, lowering
>> > -XX:CMSInitiatingOccupancyFraction can help.
>> >
>> > Also, do you have shell access to that machine? If so, try running
>> > jstat,
>> > you can see the usage of all generations live as it happens.
>> >
>> >
>> > Chi Ho Kwok
>> >
>> > On Sun, Apr 15, 2012 at 2:34 PM, Taras Tielkes <taras.tielkes at gmail.com>
>> > wrote:
>> >>
>> >> Hi Chi, Srinivas,
>> >>
>> >> Optimizing the cost of ParNew (by lowering MTT) would be nice, but for
>> >> now my priority is still to minimize the promotion failures.
>> >>
>> >> For example, on the machine running CMS with the "larger" young gen
>> >> and survivor spaces (-Xmx5g -Xmn800m -XX:SurvivorRatio=4), I've just
>> >> seen a promotion failure again. Below is a snippet of gc.log showing
>> >> this.
>> >> To put this into perspective, this is the first promotion failure on
>> >> that machine in a couple of weeks. Still, zero failures would beat a
>> >> single failure, since the clients connecting to this application will
>> >> only wait a few seconds before timing out and terminating the
>> >> connection. In addition, the promotion failures occur at peak usage
>> >> moments.
>> >>
>> >> Apart from trying to eliminate the promotion failure pauses, my main
>> >> goal is to learn how to understand the root cause in a case like this.
>> >> Any suggestions for things to try or read up on are appreciated.
>> >>
>> >> Kind regards,
>> >> Taras
>> >> ------------------------------------------------
>> >> 2012-04-13T17:44:27.777+0200: 3296255.045: [GC 3296255.046: [ParNew
>> >> Desired survivor size 69894144 bytes, new threshold 15 (max 15)
>> >> - age   1:    3684448 bytes,    3684448 total
>> >> - age   2:     824984 bytes,    4509432 total
>> >> - age   3:     885120 bytes,    5394552 total
>> >> - age   4:     756568 bytes,    6151120 total
>> >> - age   5:     696880 bytes,    6848000 total
>> >> - age   6:     890688 bytes,    7738688 total
>> >> - age   7:    2631184 bytes,   10369872 total
>> >> - age   8:     719976 bytes,   11089848 total
>> >> - age   9:     724944 bytes,   11814792 total
>> >> - age  10:     750360 bytes,   12565152 total
>> >> - age  11:     934944 bytes,   13500096 total
>> >> - age  12:     521080 bytes,   14021176 total
>> >> - age  13:     543392 bytes,   14564568 total
>> >> - age  14:     906616 bytes,   15471184 total
>> >> - age  15:     504008 bytes,   15975192 total
>> >> : 568932K->22625K(682688K), 0.0410180 secs]
>> >> 3077079K->2531413K(5106368K), 0.0416940 secs] [Times: user=0.30
>> >> sys=0.01, real=0.05 secs]
>> >> 2012-04-13T17:44:33.893+0200: 3296261.162: [GC 3296261.162: [ParNew
>> >> Desired survivor size 69894144 bytes, new threshold 15 (max 15)
>> >> - age   1:    2975896 bytes,    2975896 total
>> >> - age   2:     742592 bytes,    3718488 total
>> >> - age   3:     812864 bytes,    4531352 total
>> >> - age   4:     873488 bytes,    5404840 total
>> >> - age   5:     746128 bytes,    6150968 total
>> >> - age   6:     685192 bytes,    6836160 total
>> >> - age   7:     888376 bytes,    7724536 total
>> >> - age   8:    2621688 bytes,   10346224 total
>> >> - age   9:     715608 bytes,   11061832 total
>> >> - age  10:     723336 bytes,   11785168 total
>> >> - age  11:     749856 bytes,   12535024 total
>> >> - age  12:     914632 bytes,   13449656 total
>> >> - age  13:     520944 bytes,   13970600 total
>> >> - age  14:     543224 bytes,   14513824 total
>> >> - age  15:     906040 bytes,   15419864 total
>> >> : 568801K->22726K(682688K), 0.0447800 secs]
>> >> 3077589K->2532008K(5106368K), 0.0454710 secs] [Times: user=0.33
>> >> sys=0.00, real=0.05 secs]
>> >> 2012-04-13T17:44:40.231+0200: 3296267.499: [GC 3296267.500: [ParNew
>> >> (1: promotion failure size = 16)  (2: promotion failure size = 56)
>> >> (4: promotion failure
>> >> size = 342)  (5: promotion failure size = 1026)  (6: promotion failure
>> >> size = 278)  (promotion failed)
>> >> Desired survivor size 69894144 bytes, new threshold 15 (max 15)
>> >> - age   1:    2436840 bytes,    2436840 total
>> >> - age   2:    1625136 bytes,    4061976 total
>> >> - age   3:     691664 bytes,    4753640 total
>> >> - age   4:     799992 bytes,    5553632 total
>> >> - age   5:     858344 bytes,    6411976 total
>> >> - age   6:     730200 bytes,    7142176 total
>> >> - age   7:     680072 bytes,    7822248 total
>> >> - age   8:     885960 bytes,    8708208 total
>> >> - age   9:    2618544 bytes,   11326752 total
>> >> - age  10:     709168 bytes,   12035920 total
>> >> - age  11:     714576 bytes,   12750496 total
>> >> - age  12:     734976 bytes,   13485472 total
>> >> - age  13:     905048 bytes,   14390520 total
>> >> - age  14:     520320 bytes,   14910840 total
>> >> - age  15:     543056 bytes,   15453896 total
>> >> : 568902K->568678K(682688K), 0.3130580 secs]3296267.813: [CMS:
>> >> 2510091K->573489K(4423680K), 7.7481330 secs]
>> >> 3078184K->573489K(5106368K), [CMS Perm : 144002K->
>> >> 143970K(262144K)], 8.0619690 secs] [Times: user=8.35 sys=0.01,
>> >> real=8.06
>> >> secs]
>> >> 2012-04-13T17:44:51.337+0200: 3296278.606: [GC 3296278.606: [ParNew
>> >> Desired survivor size 69894144 bytes, new threshold 15 (max 15)
>> >> - age   1:   33717528 bytes,   33717528 total
>> >> : 546176K->43054K(682688K), 0.0515990 secs]
>> >> 1119665K->616543K(5106368K), 0.0523550 secs] [Times: user=0.34
>> >> sys=0.00, real=0.05 secs]
>> >> ------------------------------------------------
>> >>
>> >> On Tue, Mar 20, 2012 at 10:12 PM, Srinivas Ramakrishna
>> >> <ysr1729 at gmail.com> wrote:
>> >> > As Chi-ho noted, about 3-4 MB of data does get promoted per scavenge,
>> >> > after having sloshed around in your survivor spaces some 15 times. I'd
>> >> > venture that whatever winnowing of young objects was to occur has in
>> >> > fact occurred already within the first 3-4 scavenges that an object
>> >> > has survived, after which the drop-off in population is less sharp.
>> >> > So I'd suggest lowering the MTT to about 3, while leaving the survivor
>> >> > ratio intact. That should reduce your copying costs and bring down
>> >> > your scavenge pauses further, while not adversely affecting your
>> >> > promotion rates (and, concomitantly, the fragmentation).
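>> >> >
>> >> > (In flag terms, that would be something like running with
>> >> > -XX:MaxTenuringThreshold=3 instead of the current max of 15, while
>> >> > keeping -XX:SurvivorRatio=4 as it is.)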
>> >> >
>> >> > One thing about the stats below was a bit puzzling: you'd expect the
>> >> > volume at age X in scavenge N to be no less than the volume at age
>> >> > X+1 in scavenge N+1, but occasionally that natural invariant does not
>> >> > appear to hold -- indicating perhaps that either ages or populations
>> >> > are not being correctly tracked.
>> >> >
>> >> > I don't know if anyone else has noticed that in their tenuring
>> >> > distributions as well....
>> >> >
>> >> > -- ramki
>> >> >
>> >> > On Tue, Mar 20, 2012 at 9:36 AM, Taras Tielkes
>> >> > <taras.tielkes at gmail.com>
>> >> > wrote:
>> >> >> Hi,
>> >> >>
>> >> >> I've collected -XX:+PrintTenuringDistribution data from a node in
>> >> >> our
>> >> >> production environment, running -Xmx5g -Xmn400m -XX:SurvivorRatio=8.
>> >> >> On one other production node, we've configured a larger new gen, and
>> >> >> larger survivor spaces (-Xmx5g -Xmn800m -XX:SurvivorRatio=4).
>> >> >> This node has -XX:+PrintTenuringDistribution logging as well.
>> >> >>
>> >> >> The node running the larger new gen and survivor spaces has not run
>> >> >> into a promotion failure yet, while the ones still running the old
>> >> >> config have hit a few.
>> >> >> The promotion failures are typically experienced during high-load
>> >> >> periods, which makes sense, as allocation and promotion spike in
>> >> >> those periods as well.
>> >> >>
>> >> >> The inherent nature of the application implies relatively long
>> >> >> sessions (up to a few hours), retaining a fair amount of state for up
>> >> >> to an hour. I believe this is the main reason for the relatively high
>> >> >> promotion rate we're experiencing.
>> >> >>
>> >> >>
>> >> >> Here's a fragment of gc log from one of the nodes running the older
>> >> >> (smaller) new gen, including a promotion failure:
>> >> >> -------------------------
>> >> >> 2012-03-15T18:32:17.785+0100: 796604.225: [GC 796604.225: [ParNew
>> >> >> Desired survivor size 20971520 bytes, new threshold 8 (max 15)
>> >> >> - age   1:    2927728 bytes,    2927728 total
>> >> >> - age   2:    2428512 bytes,    5356240 total
>> >> >> - age   3:    2696376 bytes,    8052616 total
>> >> >> - age   4:    2623576 bytes,   10676192 total
>> >> >> - age   5:    3365576 bytes,   14041768 total
>> >> >> - age   6:    2792272 bytes,   16834040 total
>> >> >> - age   7:    2233008 bytes,   19067048 total
>> >> >> - age   8:    2263824 bytes,   21330872 total
>> >> >> : 358709K->29362K(368640K), 0.0461460 secs]
>> >> >> 3479492K->3151874K(5201920K), 0.0467320 secs] [Times: user=0.34
>> >> >> sys=0.01, real=0.05 secs]
>> >> >> 2012-03-15T18:32:21.546+0100: 796607.986: [GC 796607.986: [ParNew
>> >> >> (0:
>> >> >> promotion failure size = 25)  (1: promotion failure size = 25)  (2:
>> >> >> promotion failure size = 25)  (3: promotion failure size = 25)  (4:
>> >> >> promotion failure size = 25)  (5
>> >> >> : promotion failure size = 25)  (6: promotion failure size = 341)
>> >> >>  (7:
>> >> >> promotion failure size = 25)  (promotion failed)
>> >> >> Desired survivor size 20971520 bytes, new threshold 8 (max 15)
>> >> >> - age   1:    3708208 bytes,    3708208 total
>> >> >> - age   2:    2174384 bytes,    5882592 total
>> >> >> - age   3:    2383256 bytes,    8265848 total
>> >> >> - age   4:    2689912 bytes,   10955760 total
>> >> >> - age   5:    2621832 bytes,   13577592 total
>> >> >> - age   6:    3360440 bytes,   16938032 total
>> >> >> - age   7:    2784136 bytes,   19722168 total
>> >> >> - age   8:    2220232 bytes,   21942400 total
>> >> >> : 357042K->356456K(368640K), 0.2734100 secs]796608.259: [CMS:
>> >> >> 3124189K->516640K(4833280K), 6.8127070 secs]
>> >> >> 3479554K->516640K(5201920K), [CMS Perm : 142423K->142284K(262144K)],
>> >> >> 7.0867850 secs] [Times: user=7.32 sys=0.07, real=7.09 secs]
>> >> >> 2012-03-15T18:32:30.279+0100: 796616.719: [GC 796616.720: [ParNew
>> >> >> Desired survivor size 20971520 bytes, new threshold 1 (max 15)
>> >> >> - age   1:   29721456 bytes,   29721456 total
>> >> >> : 327680K->40960K(368640K), 0.0403130 secs]
>> >> >> 844320K->557862K(5201920K), 0.0409070 secs] [Times: user=0.27
>> >> >> sys=0.01, real=0.04 secs]
>> >> >> 2012-03-15T18:32:32.701+0100: 796619.141: [GC 796619.141: [ParNew
>> >> >> Desired survivor size 20971520 bytes, new threshold 15 (max 15)
>> >> >> - age   1:   10310176 bytes,   10310176 total
>> >> >> -------------------------
>> >> >>
>> >> >> For contrast, here's a gc log fragment from the single node running
>> >> >> the larger new gen and larger survivor spaces:
>> >> >> (the fragment is from the same point in time, with the nodes
>> >> >> experiencing equal load)
>> >> >> -------------------------
>> >> >> 2012-03-15T18:32:12.067+0100: 797119.336: [GC 797119.336: [ParNew
>> >> >> Desired survivor size 69894144 bytes, new threshold 15 (max 15)
>> >> >> - age   1:    5611536 bytes,    5611536 total
>> >> >> - age   2:    3731888 bytes,    9343424 total
>> >> >> - age   3:    3450672 bytes,   12794096 total
>> >> >> - age   4:    3314744 bytes,   16108840 total
>> >> >> - age   5:    3459888 bytes,   19568728 total
>> >> >> - age   6:    3334712 bytes,   22903440 total
>> >> >> - age   7:    3671960 bytes,   26575400 total
>> >> >> - age   8:    3841608 bytes,   30417008 total
>> >> >> - age   9:    2035392 bytes,   32452400 total
>> >> >> - age  10:    1975056 bytes,   34427456 total
>> >> >> - age  11:    2021344 bytes,   36448800 total
>> >> >> - age  12:    1520752 bytes,   37969552 total
>> >> >> - age  13:    1494176 bytes,   39463728 total
>> >> >> - age  14:    2355136 bytes,   41818864 total
>> >> >> - age  15:    1279000 bytes,   43097864 total
>> >> >> : 603473K->61640K(682688K), 0.0756570 secs]
>> >> >> 3373284K->2832383K(5106368K), 0.0762090 secs] [Times: user=0.56
>> >> >> sys=0.00, real=0.08 secs]
>> >> >> 2012-03-15T18:32:18.200+0100: 797125.468: [GC 797125.469: [ParNew
>> >> >> Desired survivor size 69894144 bytes, new threshold 15 (max 15)
>> >> >> - age   1:    6101320 bytes,    6101320 total
>> >> >> - age   2:    4446776 bytes,   10548096 total
>> >> >> - age   3:    3701384 bytes,   14249480 total
>> >> >> - age   4:    3438488 bytes,   17687968 total
>> >> >> - age   5:    3295360 bytes,   20983328 total
>> >> >> - age   6:    3403320 bytes,   24386648 total
>> >> >> - age   7:    3323368 bytes,   27710016 total
>> >> >> - age   8:    3665760 bytes,   31375776 total
>> >> >> - age   9:    2427904 bytes,   33803680 total
>> >> >> - age  10:    1418656 bytes,   35222336 total
>> >> >> - age  11:    1955192 bytes,   37177528 total
>> >> >> - age  12:    2006064 bytes,   39183592 total
>> >> >> - age  13:    1520768 bytes,   40704360 total
>> >> >> - age  14:    1493728 bytes,   42198088 total
>> >> >> - age  15:    2354376 bytes,   44552464 total
>> >> >> : 607816K->62650K(682688K), 0.0779270 secs]
>> >> >> 3378559K->2834643K(5106368K), 0.0784690 secs] [Times: user=0.58
>> >> >> sys=0.00, real=0.08 secs]
>> >> >> -------------------------
>> >> >>
>> >> >> Questions:
>> >> >>
>> >> >> 1) From the tenuring distributions, it seems that the application
>> >> >> benefits from larger new gen and survivor spaces.
>> >> >> The next thing we'll try is to run with -Xmn1g -XX:SurvivorRatio=2,
>> >> >> and see if the ParNew times are still acceptable.
>> >> >> Does this seem a sensible approach in this context?
>> >> >> Are there other variables beyond ParNew times that limit scaling the
>> >> >> new gen to a large size?
>> >> >>
>> >> >> 2) Given the object age demographics inherent to our application, we
>> >> >> cannot expect to see the majority of data get collected in the new
>> >> >> gen.
>> >> >>
>> >> >> Our approach to fighting the promotion failures consists of three
>> >> >> aspects:
>> >> >> a) Lower the overall allocation rate of our application (by
>> >> >> improving
>> >> >> wasteful hotspots), to decrease overall ParNew collection frequency.
>> >> >> b) Configure the new gen and survivor spaces as large as possible,
>> >> >> keeping an eye on ParNew times and overall new/tenured ratio.
>> >> >> c) Try to refactor the data structures that form the bulk of
>> >> >> promoted
>> >> >> data, to retain only the strictly required subgraphs.
>> >> >>
>> >> >> Is there anything else I can try or measure, in order to better
>> >> >> understand the problem?
>> >> >>
>> >> >> Thanks in advance,
>> >> >> Taras
>> >> >>
>> >> >>
>> >> >> On Wed, Feb 22, 2012 at 10:51 AM, Taras Tielkes
>> >> >> <taras.tielkes at gmail.com> wrote:
>> >> >>> (this time properly responding to the list alias)
>> >> >>> Hi Srinivas,
>> >> >>>
>> >> >>> We're running 1.6.0 u29 on Linux x64. My understanding is that
>> >> >>> CompressedOops is enabled by default since u23.
>> >> >>>
>> >> >>> At least this page seems to support that:
>> >> >>>
>> >> >>>
>> >> >>> http://docs.oracle.com/javase/7/docs/technotes/guides/vm/performance-enhancements-7.html
>> >> >>>
>> >> >>> Regarding the other remarks (also from Todd and Chi), I'll comment
>> >> >>> later. The first thing on my list is to collect
>> >> >>> PrintTenuringDistribution data now.
>> >> >>>
>> >> >>> Kind regards,
>> >> >>> Taras
>> >> >>>
>> >> >>>> On Wed, Feb 22, 2012 at 12:40 AM, Srinivas Ramakrishna
>> >> >>>> <ysr1729 at gmail.com> wrote:
>> >> >>>>> I agree that premature promotion is almost always the first and most
>> >> >>>>> important thing to fix when running into fragmentation or overload
>> >> >>>>> issues with CMS. However, I can also imagine long-lived objects with
>> >> >>>>> a highly non-stationary size distribution causing problems for CMS
>> >> >>>>> despite best efforts to tune against premature promotion.
>> >> >>>>>
>> >> >>>>> I didn't think Taras was running with MTT=0, although MTT > 0 is no
>> >> >>>>> recipe for avoiding premature promotion with bursty loads that can
>> >> >>>>> overflow the survivor spaces -- as you say, large survivor spaces
>> >> >>>>> with a low TargetSurvivorRatio, so as to leave plenty of space to
>> >> >>>>> absorb/accommodate spiking/bursty loads, are definitely a "best
>> >> >>>>> practice" for CMS (and possibly for other concurrent collectors as
>> >> >>>>> well).
>> >> >>>>>
>> >> >>>>> One thing Taras can do to see if premature promotion might be an
>> >> >>>>> issue is to
>> >> >>>>> look at the tenuring
>> >> >>>>> threshold in his case. A rough proxy (if
>> >> >>>>> PrintTenuringDistribution
>> >> >>>>> is not
>> >> >>>>> enabled) is to look at the
>> >> >>>>> promotion volume per scavenge. It may be possible, if premature
>> >> >>>>> promotion is
>> >> >>>>> a cause, to see
>> >> >>>>> some kind of medium-term correlation between high promotion
>> >> >>>>> volume
>> >> >>>>> and
>> >> >>>>> eventual promotion
>> >> >>>>> failure despite frequent CMS collections.
>> >> >>>>>
>> >> >>>>> One other point which may or may not be relevant. I see that
>> >> >>>>> Taras
>> >> >>>>> is not
>> >> >>>>> using CompressedOops...
>> >> >>>>> Using that alone would greatly decrease memory pressure and
>> >> >>>>> provide
>> >> >>>>> more
>> >> >>>>> breathing room to CMS,
>> >> >>>>> which is also almost always a good idea.
>> >> >>>>>
>> >> >>>>> -- ramki
>> >> >>>>>
>> >> >>>>> On Tue, Feb 21, 2012 at 10:16 AM, Chi Ho Kwok
>> >> >>>>> <chkwok at digibites.nl>
>> >> >>>>> wrote:
>> >> >>>>>>
>> >> >>>>>> Hi Taras,
>> >> >>>>>>
>> >> >>>>>> I think you may want to look into sizing the new and especially
>> >> >>>>>> the
>> >> >>>>>> survivor spaces differently. We run something similar to what
>> >> >>>>>> you
>> >> >>>>>> described,
>> >> >>>>>> high volume request processing with large dataset loading, and
>> >> >>>>>> what
>> >> >>>>>> we've
>> >> >>>>>> seen at the start is that the survivor spaces are completely
>> >> >>>>>> overloaded,
>> >> >>>>>> causing premature promotions.
>> >> >>>>>>
>> >> >>>>>> We've configured our vm with the following goals/guidelines:
>> >> >>>>>>
>> >> >>>>>> - old space is for semi-permanent data, living for at least 30s,
>> >> >>>>>>   average ~10 minutes
>> >> >>>>>> - new space contains only temporary and just-loaded data
>> >> >>>>>> - surviving objects from new should never reach old in 1 gc, so
>> >> >>>>>>   the survivor spaces must never be 100% full
>> >> >>>>>>
>> >> >>>>>> With jstat -gcutil `pidof java` 2000, we see things like:
>> >> >>>>>>
>> >> >>>>>>   S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT
>> >> >>>>>>  70.20   0.00  19.65  57.60  59.90 124808 29474.299  2498  191.110 29665.409
>> >> >>>>>>  70.20   0.00  92.89  57.60  59.90 124808 29474.299  2498  191.110 29665.409
>> >> >>>>>>  70.20   0.00  93.47  57.60  59.90 124808 29474.299  2498  191.110 29665.409
>> >> >>>>>>   0.00  65.69  78.07  58.09  59.90 124809 29474.526  2498  191.110 29665.636
>> >> >>>>>>  84.97   0.00  48.19  58.57  59.90 124810 29474.774  2498  191.110 29665.884
>> >> >>>>>>  84.97   0.00  81.30  58.57  59.90 124810 29474.774  2498  191.110 29665.884
>> >> >>>>>>   0.00  62.64  27.22  59.12  59.90 124811 29474.992  2498  191.110 29666.102
>> >> >>>>>>   0.00  62.64  54.47  59.12  59.90 124811 29474.992  2498  191.110 29666.102
>> >> >>>>>>  75.68   0.00   6.80  59.53  59.90 124812 29475.228  2498  191.110 29666.338
>> >> >>>>>>  75.68   0.00  23.38  59.53  59.90 124812 29475.228  2498  191.110 29666.338
>> >> >>>>>>  75.68   0.00  27.72  59.53  59.90 124812 29475.228  2498  191.110 29666.338
>> >> >>>>>>
>> >> >>>>>> If you follow the lines, you can see Eden fill up to 100% on line 4:
>> >> >>>>>> surviving objects are copied into S1, S0 is emptied, and 0.49% is
>> >> >>>>>> added to Old. On line 5, another GC happened, with Eden->S0,
>> >> >>>>>> S1->Old, etc. No objects are ever transferred from Eden to Old,
>> >> >>>>>> unless there's a huge peak of requests.
>> >> >>>>>>
>> >> >>>>>> This is with a 32GB heap, -Xmn1200M, SurvivorRatio 2 (600MB Eden,
>> >> >>>>>> 300MB S0, 300MB S1), MaxTenuringThreshold 1 (whatever is still
>> >> >>>>>> alive in S0/1 on the second GC is copied to old; don't wait, web
>> >> >>>>>> requests are quite bursty). With about 1 collection every 2-5
>> >> >>>>>> seconds, objects promoted to Old must live for at least 4-10
>> >> >>>>>> seconds; as that's longer than an average request (50ms-1s), none
>> >> >>>>>> of the temporary data ever makes it into Old, which is much more
>> >> >>>>>> expensive to collect. It works even with a higher-than-default
>> >> >>>>>> CMSInitiatingOccupancyFraction=76, to optimize for the space
>> >> >>>>>> available for the large data cache we have.
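>> >> >>>>>>
>> >> >>>>>> (For reference, the settings described above would look roughly
>> >> >>>>>> like: -Xmx32g -Xmn1200m -XX:SurvivorRatio=2
>> >> >>>>>> -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=76;
>> >> >>>>>> the rest of our flag set is omitted here.)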
>> >> >>>>>>
>> >> >>>>>>
>> >> >>>>>> With your config of 400MB total new -- roughly 320MB Eden and 40MB
>> >> >>>>>> per survivor space at SurvivorRatio 8 -- and no explicit tenuring
>> >> >>>>>> threshold, I think loads of new objects get copied from Eden to Old
>> >> >>>>>> directly, causing trouble for the CMS. You can use jstat to get
>> >> >>>>>> live stats and tweak until it doesn't happen. If you can't make
>> >> >>>>>> changes on live that easily, try doubling the new size indeed, with
>> >> >>>>>> a 400 Eden, 200 S0, 200 S1 and MaxTenuringThreshold 1 setting. It's
>> >> >>>>>> probably overkill, but it should solve the problem if it is caused
>> >> >>>>>> by premature promotion.
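>> >> >>>>>>
>> >> >>>>>> (Something like -Xmn800m -XX:SurvivorRatio=2
>> >> >>>>>> -XX:MaxTenuringThreshold=1 on your current 5g heap would give that
>> >> >>>>>> 400MB Eden + 200MB + 200MB survivor layout.)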
>> >> >>>>>>
>> >> >>>>>>
>> >> >>>>>> Chi Ho Kwok
>> >> >>>>>>
>> >> >>>>>>
>> >> >>>>>> On Tue, Feb 21, 2012 at 5:55 PM, Taras Tielkes
>> >> >>>>>> <taras.tielkes at gmail.com>
>> >> >>>>>> wrote:
>> >> >>>>>>>
>> >> >>>>>>> Hi,
>> >> >>>>>>>
>> >> >>>>>>> We've removed the "-XX:+CMSScavengeBeforeRemark" setting from
>> >> >>>>>>> 50%
>> >> >>>>>>> of
>> >> >>>>>>> our production nodes.
>> >> >>>>>>> After running for a few weeks, it seems that there's no impact from
>> >> >>>>>>> removing this option. That's good, since it means we can remove it
>> >> >>>>>>> from the other nodes as well, simplifying our overall JVM
>> >> >>>>>>> configuration ;-)
>> >> >>>>>>>
>> >> >>>>>>> However, we're still seeing promotion failures on all nodes,
>> >> >>>>>>> once
>> >> >>>>>>> every day or so.
>> >> >>>>>>>
>> >> >>>>>>> There's still the "Magic 1026": this accounts for ~60% of the
>> >> >>>>>>> promotion failures that we're seeing (single ParNew thread,
>> >> >>>>>>> 1026 failure size):
>> >> >>>>>>> --------------------
>> >> >>>>>>> 2012-02-06T09:13:51.806+0100: 328095.085: [GC 328095.086:
>> >> >>>>>>> [ParNew:
>> >> >>>>>>> 359895K->29357K(368640K), 0.0429070 secs]
>> >> >>>>>>> 3471021K->3143476K(5201920K), 0.0434950 secs] [Times: user=0.32
>> >> >>>>>>> sys=0.00, real=0.04 secs]
>> >> >>>>>>> 2012-02-06T09:13:55.922+0100: 328099.201: [GC 328099.201:
>> >> >>>>>>> [ParNew:
>> >> >>>>>>> 357037K->31817K(368640K), 0.0429130 secs]
>> >> >>>>>>> 3471156K->3148946K(5201920K), 0.0434930 secs] [Times: user=0.31
>> >> >>>>>>> sys=0.00, real=0.04 secs]
>> >> >>>>>>> 2012-02-06T09:13:59.044+0100: 328102.324: [GC 328102.324:
>> >> >>>>>>> [ParNew
>> >> >>>>>>> (promotion failure size = 1026)  (promotion failed):
>> >> >>>>>>> 359497K->368640K(368640K), 0.2226790 secs]328102.547: [CMS:
>> >> >>>>>>> 3125609K->451515K(4833280K), 5.6225880 secs] 3476626K->4515
>> >> >>>>>>> 15K(5201920K), [CMS Perm : 124373K->124353K(262144K)],
>> >> >>>>>>> 5.8459380
>> >> >>>>>>> secs]
>> >> >>>>>>> [Times: user=6.20 sys=0.01, real=5.85 secs]
>> >> >>>>>>> 2012-02-06T09:14:05.243+0100: 328108.522: [GC 328108.523:
>> >> >>>>>>> [ParNew:
>> >> >>>>>>> 327680K->40960K(368640K), 0.0319160 secs]
>> >> >>>>>>> 779195K->497658K(5201920K),
>> >> >>>>>>> 0.0325360 secs] [Times: user=0.21 sys=0.01, real=0.03 secs]
>> >> >>>>>>> 2012-02-06T09:14:07.836+0100: 328111.116: [GC 328111.116:
>> >> >>>>>>> [ParNew:
>> >> >>>>>>> 368640K->32785K(368640K), 0.0744670 secs]
>> >> >>>>>>> 825338K->520234K(5201920K),
>> >> >>>>>>> 0.0750390 secs] [Times: user=0.40 sys=0.02, real=0.08 secs]
>> >> >>>>>>> --------------------
>> >> >>>>>>> Given the 1026 word size, I'm wondering if I should be hunting for
>> >> >>>>>>> an overuse of BufferedInputStream/BufferedOutputStream, since both
>> >> >>>>>>> have 8192 as a default buffer size.
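>> >> >>>>>>>
>> >> >>>>>>> (The arithmetic seems to fit: assuming an 8-byte heap word and a
>> >> >>>>>>> 16-byte array header with CompressedOops, a byte[8192] is
>> >> >>>>>>> 16 + 8192 = 8208 bytes, i.e. exactly 1026 words.)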
>> >> >>>>>>>
>> >> >>>>>>> The second group of promotion failures looks like this (multiple
>> >> >>>>>>> ParNew threads, small failure sizes):
>> >> >>>>>>> --------------------
>> >> >>>>>>> 2012-02-06T09:50:15.773+0100: 328756.964: [GC 328756.964:
>> >> >>>>>>> [ParNew:
>> >> >>>>>>> 356116K->29934K(368640K), 0.0461100 secs]
>> >> >>>>>>> 3203863K->2880162K(5201920K), 0.0468870 secs] [Times: user=0.34
>> >> >>>>>>> sys=0.01, real=0.05 secs]
>> >> >>>>>>> 2012-02-06T09:50:19.153+0100: 328760.344: [GC 328760.344:
>> >> >>>>>>> [ParNew:
>> >> >>>>>>> 357614K->30359K(368640K), 0.0454680 secs]
>> >> >>>>>>> 3207842K->2882892K(5201920K), 0.0462280 secs] [Times: user=0.33
>> >> >>>>>>> sys=0.01, real=0.05 secs]
>> >> >>>>>>> 2012-02-06T09:50:22.658+0100: 328763.849: [GC 328763.849:
>> >> >>>>>>> [ParNew
>> >> >>>>>>> (1:
>> >> >>>>>>> promotion failure size = 25)  (4: promotion failure size = 25)
>> >> >>>>>>>  (6:
>> >> >>>>>>> promotion failure size = 25)  (7: promotion failure size = 144)
>> >> >>>>>>> (promotion failed): 358039K->358358
>> >> >>>>>>> K(368640K), 0.2148680 secs]328764.064: [CMS:
>> >> >>>>>>> 2854709K->446750K(4833280K), 5.8368270 secs]
>> >> >>>>>>> 3210572K->446750K(5201920K), [CMS Perm :
>> >> >>>>>>> 124670K->124644K(262144K)],
>> >> >>>>>>> 6.0525230 secs] [Times: user=6.32 sys=0.00, real=6.05 secs]
>> >> >>>>>>> 2012-02-06T09:50:29.896+0100: 328771.086: [GC 328771.087:
>> >> >>>>>>> [ParNew:
>> >> >>>>>>> 327680K->22569K(368640K), 0.0227080 secs]
>> >> >>>>>>> 774430K->469319K(5201920K),
>> >> >>>>>>> 0.0235020 secs] [Times: user=0.16 sys=0.00, real=0.02 secs]
>> >> >>>>>>> 2012-02-06T09:50:31.076+0100: 328772.266: [GC 328772.267:
>> >> >>>>>>> [ParNew:
>> >> >>>>>>> 350249K->22264K(368640K), 0.0235480 secs]
>> >> >>>>>>> 796999K->469014K(5201920K),
>> >> >>>>>>> 0.0243000 secs] [Times: user=0.18 sys=0.01, real=0.02 secs]
>> >> >>>>>>> --------------------
>> >> >>>>>>>
>> >> >>>>>>> We're going to try to double the new size on a single node, to
>> >> >>>>>>> see
>> >> >>>>>>> the
>> >> >>>>>>> effects of that.
>> >> >>>>>>>
>> >> >>>>>>> Beyond this experiment, is there any additional data I can collect
>> >> >>>>>>> to better understand the nature of the promotion failures?
>> >> >>>>>>> Am I at the point where I need to start collecting free list
>> >> >>>>>>> statistics?
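>> >> >>>>>>>
>> >> >>>>>>> (If I understand correctly, that would mean running with something
>> >> >>>>>>> like -XX:PrintFLSStatistics=1.)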
>> >> >>>>>>>
>> >> >>>>>>> Thanks,
>> >> >>>>>>> Taras
>> >> >>>>>>
>> >> >>>>>>
>> >> >>>>>> _______________________________________________
>> >> >>>>>> hotspot-gc-use mailing list
>> >> >>>>>> hotspot-gc-use at openjdk.java.net
>> >> >>>>>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
>> >> >>>>>>
>> >> >>>>>
>> >> >> _______________________________________________
>> >> >> hotspot-gc-use mailing list
>> >> >> hotspot-gc-use at openjdk.java.net
>> >> >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
>> >> _______________________________________________
>> >> hotspot-gc-use mailing list
>> >> hotspot-gc-use at openjdk.java.net
>> >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
>> >
>> >
>> _______________________________________________
>> hotspot-gc-use mailing list
>> hotspot-gc-use at openjdk.java.net
>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
>
>

