Promotion failures: indication of CMS fragmentation?
Srinivas Ramakrishna
ysr1729 at gmail.com
Tue Feb 21 15:40:52 PST 2012
I agree that premature promotion is almost always the first and most
important thing to fix when running into fragmentation or overload issues
with CMS. However, I can also imagine long-lived objects with a highly
non-stationary size distribution causing problems for CMS despite best
efforts to tune against premature promotion.

I didn't think Taras was running with MTT=0, although MTT > 0 is no
guarantee against premature promotion when bursty loads overflow the
survivor spaces. As you say, large survivor spaces with a low
TargetSurvivorRatio -- leaving plenty of room to absorb/accommodate
spiking/bursty loads -- are definitely a "best practice" for CMS (and
possibly for other concurrent collectors as well).
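
For concreteness only (the numbers below are placeholders, not a sizing
recommendation for Taras's heap), that kind of setup is expressed with
flags along these lines:

    -Xmn1200m -XX:SurvivorRatio=2    # 600m eden plus two 300m survivor spaces
    -XX:TargetSurvivorRatio=30       # a deliberately low target leaves headroom for bursts
    -XX:MaxTenuringThreshold=15      # let objects age in the survivors instead of tenuring early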

One thing Taras can do to see whether premature promotion might be an issue
is to look at the tenuring threshold in his case. A rough proxy (if
PrintTenuringDistribution is not enabled) is the promotion volume per
scavenge. If premature promotion is a cause, it may be possible to see some
kind of medium-term correlation between high promotion volume and eventual
promotion failure, despite frequent CMS collections.
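
As a worked example of that proxy, take the first ParNew entry quoted
further down in the thread:

    [ParNew: 359895K->29357K(368640K), ...] 3471021K->3143476K(5201920K)

    young gen shrank by:    359895K - 29357K    = 330538K
    whole heap shrank by:   3471021K - 3143476K = 327545K
    promoted that scavenge: 330538K - 327545K   = 2993K   (about 3 MB)

Scavenges that promote far more than that would be the ones to check for a
correlation with the later promotion failures.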

One other point, which may or may not be relevant: I see that Taras is not
using CompressedOops... Using that alone would greatly decrease memory
pressure and provide more breathing room for CMS, and it is almost always a
good idea.
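
For reference, the flag is:

    -XX:+UseCompressedOops

It applies to 64-bit JVMs with heaps small enough for compressed references
(roughly below 32 GB); on recent JVM versions, running
"java -XX:+PrintFlagsFinal -version | grep UseCompressedOops" shows whether
it is already enabled by default.
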
-- ramki
On Tue, Feb 21, 2012 at 10:16 AM, Chi Ho Kwok <chkwok at digibites.nl> wrote:
> Hi Taras,
>
> I think you may want to look into sizing the new and especially the
> survivor spaces differently. We run something similar to what you
> described, high-volume request processing with large dataset loading, and
> what we saw at the start was that the survivor spaces were completely
> overloaded, causing premature promotions.
>
> We've configured our VM with the following goals/guidelines:
>
>    - old space is for semi-permanent data, living for at least 30s,
>      average ~10 minutes
>    - new space contains only temporary and just-loaded data
>    - surviving objects from new should never reach old in one GC, so the
>      survivor space may never be 100% full
>
> With jstat -gcutil `pidof java` 2000, we see things like:
>
>   S0     S1     E      O      P     YGC     YGCT    FGC    FGCT      GCT
>  70.20   0.00  19.65  57.60  59.90 124808 29474.299  2498  191.110 29665.409
>  70.20   0.00  92.89  57.60  59.90 124808 29474.299  2498  191.110 29665.409
>  70.20   0.00  93.47  57.60  59.90 124808 29474.299  2498  191.110 29665.409
>   0.00  65.69  78.07  58.09  59.90 124809 29474.526  2498  191.110 29665.636
>  84.97   0.00  48.19  58.57  59.90 124810 29474.774  2498  191.110 29665.884
>  84.97   0.00  81.30  58.57  59.90 124810 29474.774  2498  191.110 29665.884
>   0.00  62.64  27.22  59.12  59.90 124811 29474.992  2498  191.110 29666.102
>   0.00  62.64  54.47  59.12  59.90 124811 29474.992  2498  191.110 29666.102
>  75.68   0.00   6.80  59.53  59.90 124812 29475.228  2498  191.110 29666.338
>  75.68   0.00  23.38  59.53  59.90 124812 29475.228  2498  191.110 29666.338
>  75.68   0.00  27.72  59.53  59.90 124812 29475.228  2498  191.110 29666.338
>
> If you follow the lines, you can see Eden fill up to 100% and get
> collected on line 4: the surviving objects are copied into S1, S0 is
> emptied, and 0.49% is added to Old. On line 5 another GC happened, this
> time Eden->S0 and S1->Old, etc. No objects are ever transferred directly
> from Eden to Old, unless there's a huge peak of requests.
>
> This is with a 32GB heap, -Xmn1200M, SurvivorRatio 2 (600MB Eden, 300MB
> S0, 300MB S1) and MaxTenuringThreshold 1 (whatever is still alive in S0/S1
> at the second GC is copied to Old; don't wait, web requests are quite
> bursty). With about one collection every 2-5 seconds, objects promoted to
> Old must have lived for at least 4-10 seconds; as that's longer than an
> average request (50ms-1s), none of the temporary data ever makes it into
> Old, which is much more expensive to collect. It works even with a
> higher-than-default CMSInitiatingOccupancyFraction=76, to optimize the
> space available for the large data cache we have.
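>
> In flag form that's roughly the following (the sizes in parentheses follow
> from -Xmn and SurvivorRatio rather than being set directly, and the exact
> spelling here is approximate):
>
>    -Xmx32g -Xmn1200m -XX:SurvivorRatio=2    (600MB Eden, 2x 300MB survivors)
>    -XX:MaxTenuringThreshold=1
>    -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
>    -XX:CMSInitiatingOccupancyFraction=76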
>
>
> With your config of 400MB total new, with 350MB Eden, 25MB S0, 25MB S1
> (SurvivorRatio 8) and no tenuring threshold, I think loads of new objects
> get copied from Eden to Old directly, causing trouble for CMS. You can use
> jstat to get live stats and tweak until it doesn't happen. If you can't
> make changes on live systems that easily, try doubling the new size
> indeed, with a 400MB Eden, 200MB S0, 200MB S1 and MaxTenuringThreshold 1
> setting. It's probably overkill, but it should solve the problem if it is
> caused by premature promotion.
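>
> In flag form that suggestion would be roughly:
>
>    -Xmn800m -XX:SurvivorRatio=2    (400MB Eden, 2x 200MB survivors)
>    -XX:MaxTenuringThreshold=1
>
> keeping the rest of your settings as they are, and watching jstat again to
> confirm that the survivor spaces stop overflowing.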
>
>
> Chi Ho Kwok
>
>
> On Tue, Feb 21, 2012 at 5:55 PM, Taras Tielkes <taras.tielkes at gmail.com>wrote:
>
>> Hi,
>>
>> We've removed the "-XX:+CMSScavengeBeforeRemark" setting from 50% of
>> our production nodes.
>> After running for a few weeks, it seems that there's no impact from
>> removing this option.
>> That's good, since it seems we can remove it from the other nodes as
>> well, simplifying our overall JVM configuration ;-)
>>
>> However, we're still seeing promotion failures on all nodes, once
>> every day or so.
>>
>> There's still the "Magic 1026": this accounts for ~60% of the
>> promotion failures that we're seeing (single ParNew thread,
>> 1026 failure size):
>> --------------------
>> 2012-02-06T09:13:51.806+0100: 328095.085: [GC 328095.086: [ParNew:
>> 359895K->29357K(368640K), 0.0429070 secs]
>> 3471021K->3143476K(5201920K), 0.0434950 secs] [Times: user=0.32
>> sys=0.00, real=0.04 secs]
>> 2012-02-06T09:13:55.922+0100: 328099.201: [GC 328099.201: [ParNew:
>> 357037K->31817K(368640K), 0.0429130 secs]
>> 3471156K->3148946K(5201920K), 0.0434930 secs] [Times: user=0.31
>> sys=0.00, real=0.04 secs]
>> 2012-02-06T09:13:59.044+0100: 328102.324: [GC 328102.324: [ParNew
>> (promotion failure size = 1026) (promotion failed):
>> 359497K->368640K(368640K), 0.2226790 secs]328102.547: [CMS:
>> 3125609K->451515K(4833280K), 5.6225880 secs]
>> 3476626K->451515K(5201920K), [CMS Perm : 124373K->124353K(262144K)], 5.8459380 secs]
>> [Times: user=6.20 sys=0.01, real=5.85 secs]
>> 2012-02-06T09:14:05.243+0100: 328108.522: [GC 328108.523: [ParNew:
>> 327680K->40960K(368640K), 0.0319160 secs] 779195K->497658K(5201920K),
>> 0.0325360 secs] [Times: user=0.21 sys=0.01, real=0.03 secs]
>> 2012-02-06T09:14:07.836+0100: 328111.116: [GC 328111.116: [ParNew:
>> 368640K->32785K(368640K), 0.0744670 secs] 825338K->520234K(5201920K),
>> 0.0750390 secs] [Times: user=0.40 sys=0.02, real=0.08 secs]
>> --------------------
>> Given the 1026-word failure size, I'm wondering if I should be hunting
>> for an overuse of BufferedInputStream/BufferedOutputStream, since both
>> have 8192 as a default buffer size.
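>>
>> (Rough arithmetic behind that hunch: 1026 words * 8 bytes/word = 8208
>> bytes, which is about what an 8192-byte buffer array occupies once the
>> array object header is added; the exact header size depends on JVM
>> version and flags, so the match is suggestive rather than exact.)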
>>
>> The second group of promotion failures looks like this (multiple ParNew
>> threads, small failure sizes):
>> --------------------
>> 2012-02-06T09:50:15.773+0100: 328756.964: [GC 328756.964: [ParNew:
>> 356116K->29934K(368640K), 0.0461100 secs]
>> 3203863K->2880162K(5201920K), 0.0468870 secs] [Times: user=0.34
>> sys=0.01, real=0.05 secs]
>> 2012-02-06T09:50:19.153+0100: 328760.344: [GC 328760.344: [ParNew:
>> 357614K->30359K(368640K), 0.0454680 secs]
>> 3207842K->2882892K(5201920K), 0.0462280 secs] [Times: user=0.33
>> sys=0.01, real=0.05 secs]
>> 2012-02-06T09:50:22.658+0100: 328763.849: [GC 328763.849: [ParNew (1:
>> promotion failure size = 25) (4: promotion failure size = 25) (6:
>> promotion failure size = 25) (7: promotion failure size = 144)
>> (promotion failed): 358039K->358358K(368640K), 0.2148680 secs]
>> 328764.064: [CMS:
>> 2854709K->446750K(4833280K), 5.8368270 secs]
>> 3210572K->446750K(5201920K), [CMS Perm : 124670K->124644K(262144K)],
>> 6.0525230 secs] [Times: user=6.32 sys=0.00, real=6.05 secs]
>> 2012-02-06T09:50:29.896+0100: 328771.086: [GC 328771.087: [ParNew:
>> 327680K->22569K(368640K), 0.0227080 secs] 774430K->469319K(5201920K),
>> 0.0235020 secs] [Times: user=0.16 sys=0.00, real=0.02 secs]
>> 2012-02-06T09:50:31.076+0100: 328772.266: [GC 328772.267: [ParNew:
>> 350249K->22264K(368640K), 0.0235480 secs] 796999K->469014K(5201920K),
>> 0.0243000 secs] [Times: user=0.18 sys=0.01, real=0.02 secs]
>> --------------------
>>
>> We're going to try to double the new size on a single node, to see the
>> effects of that.
>>
>> Beyond this experiment, is there any additional data I can collect to
>> better understand the nature of the promotion failures?
>> Am I at the point where I should start collecting free list statistics?
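>>
>> (If it comes to that, I believe the relevant option is
>> -XX:PrintFLSStatistics=1, which makes CMS print free list space and
>> fragmentation statistics into the GC log; given the extra log volume,
>> we'd probably enable it on a single node first.)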
>>
>> Thanks,
>> Taras
>>
>