Promotion failures: indication of CMS fragmentation?

Taras Tielkes taras.tielkes at gmail.com
Wed Feb 22 01:51:19 PST 2012


(this time properly responding to the list alias)

Hi Srinivas,

We're running 1.6.0 u29 on Linux x64. My understanding is that
CompressedOops is enabled by default since u23.

At least this page seems to support that:
http://docs.oracle.com/javase/7/docs/technotes/guides/vm/performance-enhancements-7.html

Regarding the other remarks (also from Todd and Chi), I'll comment
later. The first thing on my list is to collect
PrintTenuringDistribution data now.
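
For reference, my plan is simply to add something like the following to our
existing GC logging setup (assuming -XX:+PrintGCDetails is already in place;
the exact log file location will differ per node):

  -XX:+PrintTenuringDistribution -XX:+PrintGCDateStamps -Xloggc:gc.log

and to double-check the CompressedOops default on u29 with:

  java -XX:+PrintFlagsFinal -version | grep UseCompressedOops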

Kind regards,
Taras

On Wed, Feb 22, 2012 at 10:50 AM, Taras Tielkes <taras.tielkes at gmail.com> wrote:
>
> On Wed, Feb 22, 2012 at 12:40 AM, Srinivas Ramakrishna
> <ysr1729 at gmail.com> wrote:
>> I agree that premature promotions are almost always the first and most
>> important thing to fix when running into fragmentation or overload issues
>> with CMS. However, I can also imagine long-lived objects with a highly
>> non-stationary size distribution causing problems for CMS despite best
>> efforts to tune against premature promotion.
>>
>> I didn't think Taras was running with MTT=0, although MTT > 0 is no recipe
>> for avoiding premature promotion with bursty loads that can overflow the
>> survivor spaces. As you say, large survivor spaces with a low
>> TargetSurvivorRatio -- so as to leave plenty of space to absorb/accommodate
>> spiking/bursty loads -- are definitely a "best practice" for CMS (and
>> possibly for other concurrent collectors as well).
>>
>> One thing Taras can do to see if premature promotion might be an issue is to
>> look at the tenuring
>> threshold in his case. A rough proxy (if PrintTenuringDistribution is not
>> enabled) is to look at the
>> promotion volume per scavenge. It may be possible, if premature promotion is
>> a cause, to see
>> some kind of medium-term correlation between high promotion volume and
>> eventual promotion
>> failure despite frequent CMS collections.
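>>
>> As a rough illustration of that proxy, using one of the ParNew lines from
>> earlier in this thread:
>>
>>   [ParNew: 359895K->29357K(368640K), ...] 3471021K->3143476K(5201920K)
>>
>> the volume promoted in that scavenge is approximately
>> (3143476 - 3471021) - (29357 - 359895) = 2993K, i.e. roughly 3MB promoted
>> into the old gen for that single scavenge.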
>>
>> One other point which may or may not be relevant. I see that Taras is not
>> using CompressedOops...
>> Using that alone would greatly decrease memory pressure and provide more
>> breathing room to CMS,
>> which is also almost always a good idea.
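>>
>> (Concretely: running with -XX:+UseCompressedOops, if the JDK in use doesn't
>> already enable it by default.)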
>>
>> -- ramki
>>
>> On Tue, Feb 21, 2012 at 10:16 AM, Chi Ho Kwok <chkwok at digibites.nl> wrote:
>>>
>>> Hi Taras,
>>>
>>> I think you may want to look into sizing the new and especially the
>>> survivor spaces differently. We run something similar to what you described,
>>> high volume request processing with large dataset loading, and what we've
>>> seen at the start is that the survivor spaces are completely overloaded,
>>> causing premature promotions.
>>>
>>> We've configured our VM with the following goals/guidelines:
>>>
>>> - old space is for semi-permanent data, living for at least 30s, on average ~10 minutes
>>> - new space contains only temporary and just-loaded data
>>> - surviving objects from new should never reach old in 1 GC, so the survivor
>>>   spaces should never be 100% full
>>>
>>> With jstat -gcutil `pidof java` 2000, we see things like:
>>>
>>>   S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT
>>>  70.20   0.00  19.65  57.60  59.90 124808 29474.299  2498  191.110 29665.409
>>>  70.20   0.00  92.89  57.60  59.90 124808 29474.299  2498  191.110 29665.409
>>>  70.20   0.00  93.47  57.60  59.90 124808 29474.299  2498  191.110 29665.409
>>>   0.00  65.69  78.07  58.09  59.90 124809 29474.526  2498  191.110 29665.636
>>>  84.97   0.00  48.19  58.57  59.90 124810 29474.774  2498  191.110 29665.884
>>>  84.97   0.00  81.30  58.57  59.90 124810 29474.774  2498  191.110 29665.884
>>>   0.00  62.64  27.22  59.12  59.90 124811 29474.992  2498  191.110 29666.102
>>>   0.00  62.64  54.47  59.12  59.90 124811 29474.992  2498  191.110 29666.102
>>>  75.68   0.00   6.80  59.53  59.90 124812 29475.228  2498  191.110 29666.338
>>>  75.68   0.00  23.38  59.53  59.90 124812 29475.228  2498  191.110 29666.338
>>>  75.68   0.00  27.72  59.53  59.90 124812 29475.228  2498  191.110 29666.338
>>>
>>> If you follow the lines, you can see Eden fill up to 100% by line 4,
>>> surviving objects get copied into S1, S0 is collected, and 0.49% is added
>>> to Old. On line 5, another GC happens, with Eden->S0, S1->Old, etc. No
>>> objects are ever transferred from Eden to Old, unless there's a huge peak
>>> of requests.
>>>
>>> This is with a 32GB heap, Xmn1200M, SurvivorRatio 2 (600MB Eden, 300MB
>>> S0, 300MB S1), MaxTenuringThreshold 1 (whatever is still alive in S0/1 on
>>> the second GC is copied to old; don't wait, web requests are quite bursty).
>>> With about 1 collection every 2-5 seconds, objects promoted to Old must live
>>> for at least 4-10 seconds; as that's longer than an average request (50ms-1s),
>>> none of the temporary data ever makes it into Old, which is much more
>>> expensive to collect. It works even with a higher-than-default
>>> CMSInitiatingOccupancyFraction=76 to optimize for the space available for the
>>> large data cache we have.
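>>>
>>> In flag form, that setup is roughly the following (the exact spelling on
>>> our side may differ slightly):
>>>
>>>   -Xmx32g -Xmn1200m -XX:SurvivorRatio=2 -XX:MaxTenuringThreshold=1
>>>   -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=76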
>>>
>>>
>>> With your config of 400MB total new, with 350MB Eden, 25MB S0, 25MB S1
>>> (SurvivorRatio 8), no tenuring threshold, I think loads of new objects get
>>> copied from Eden to Old directly, causing trouble for the CMS. You can use
>>> jstat to get live stats and tweak until it doesn't happen. If you can't make
>>> changes on the live system that easily, try doubling the new size indeed, with
>>> a 400MB Eden, 200MB S0, 200MB S1 and MaxTenuringThreshold 1 setting. It's
>>> probably overkill, but it should solve the problem if it is caused by
>>> premature promotion.
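>>>
>>> In flag terms, that suggestion would be something like -Xmn800m
>>> -XX:SurvivorRatio=2 -XX:MaxTenuringThreshold=1, adjusted to however the new
>>> generation size is currently being set on your side.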
>>>
>>>
>>> Chi Ho Kwok
>>>
>>>
>>> On Tue, Feb 21, 2012 at 5:55 PM, Taras Tielkes <taras.tielkes at gmail.com>
>>> wrote:
>>>>
>>>> Hi,
>>>>
>>>> We've removed the "-XX:+CMSScavengeBeforeRemark" setting from 50% of
>>>> our production nodes.
>>>> After running for a few weeks, it seems that there's no impact from
>>>> removing this option.
>>>> Which is good, since it seems we can remove it from the other nodes as
>>>> well, simplifying our overall JVM configuration ;-)
>>>>
>>>> However, we're still seeing promotion failures on all nodes, once
>>>> every day or so.
>>>>
>>>> There's still the "Magic 1026": this accounts for ~60% of the
>>>> promotion failures that we're seeing (single ParNew thread,
>>>> 1026 failure size):
>>>> --------------------
>>>> 2012-02-06T09:13:51.806+0100: 328095.085: [GC 328095.086: [ParNew:
>>>> 359895K->29357K(368640K), 0.0429070 secs]
>>>> 3471021K->3143476K(5201920K), 0.0434950 secs] [Times: user=0.32
>>>> sys=0.00, real=0.04 secs]
>>>> 2012-02-06T09:13:55.922+0100: 328099.201: [GC 328099.201: [ParNew:
>>>> 357037K->31817K(368640K), 0.0429130 secs]
>>>> 3471156K->3148946K(5201920K), 0.0434930 secs] [Times: user=0.31
>>>> sys=0.00, real=0.04 secs]
>>>> 2012-02-06T09:13:59.044+0100: 328102.324: [GC 328102.324: [ParNew
>>>> (promotion failure size = 1026)  (promotion failed):
>>>> 359497K->368640K(368640K), 0.2226790 secs]328102.547: [CMS:
>>>> 3125609K->451515K(4833280K), 5.6225880 secs]
>>>> 3476626K->451515K(5201920K), [CMS Perm : 124373K->124353K(262144K)], 5.8459380 secs]
>>>> [Times: user=6.20 sys=0.01, real=5.85 secs]
>>>> 2012-02-06T09:14:05.243+0100: 328108.522: [GC 328108.523: [ParNew:
>>>> 327680K->40960K(368640K), 0.0319160 secs] 779195K->497658K(5201920K),
>>>> 0.0325360 secs] [Times: user=0.21 sys=0.01, real=0.03 secs]
>>>> 2012-02-06T09:14:07.836+0100: 328111.116: [GC 328111.116: [ParNew:
>>>> 368640K->32785K(368640K), 0.0744670 secs] 825338K->520234K(5201920K),
>>>> 0.0750390 secs] [Times: user=0.40 sys=0.02, real=0.08 secs]
>>>> --------------------
>>>> Given the 1026 word size, I'm wondering if I should be hunting for an
>>>> overuse of BufferedInputStream/BufferedOutputStream, since both have
>>>> 8192 as a default buffer size.
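>>>>
>>>> (Sanity check on that hunch: 1026 heap words * 8 bytes = 8208 bytes, which
>>>> is an 8192-byte byte[] plus a 16-byte array header (assuming compressed
>>>> oops), so the default-sized stream buffers are at least consistent with
>>>> the failure size.)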
>>>>
>>>> The second group of promotion failures looks like this (multiple ParNew
>>>> threads, small failure sizes):
>>>> --------------------
>>>> 2012-02-06T09:50:15.773+0100: 328756.964: [GC 328756.964: [ParNew:
>>>> 356116K->29934K(368640K), 0.0461100 secs]
>>>> 3203863K->2880162K(5201920K), 0.0468870 secs] [Times: user=0.34
>>>> sys=0.01, real=0.05 secs]
>>>> 2012-02-06T09:50:19.153+0100: 328760.344: [GC 328760.344: [ParNew:
>>>> 357614K->30359K(368640K), 0.0454680 secs]
>>>> 3207842K->2882892K(5201920K), 0.0462280 secs] [Times: user=0.33
>>>> sys=0.01, real=0.05 secs]
>>>> 2012-02-06T09:50:22.658+0100: 328763.849: [GC 328763.849: [ParNew (1:
>>>> promotion failure size = 25)  (4: promotion failure size = 25)  (6:
>>>> promotion failure size = 25)  (7: promotion failure size = 144)
>>>> (promotion failed): 358039K->358358K(368640K), 0.2148680 secs]328764.064: [CMS:
>>>> 2854709K->446750K(4833280K), 5.8368270 secs]
>>>> 3210572K->446750K(5201920K), [CMS Perm : 124670K->124644K(262144K)],
>>>> 6.0525230 secs] [Times: user=6.32 sys=0.00, real=6.05 secs]
>>>> 2012-02-06T09:50:29.896+0100: 328771.086: [GC 328771.087: [ParNew:
>>>> 327680K->22569K(368640K), 0.0227080 secs] 774430K->469319K(5201920K),
>>>> 0.0235020 secs] [Times: user=0.16 sys=0.00, real=0.02 secs]
>>>> 2012-02-06T09:50:31.076+0100: 328772.266: [GC 328772.267: [ParNew:
>>>> 350249K->22264K(368640K), 0.0235480 secs] 796999K->469014K(5201920K),
>>>> 0.0243000 secs] [Times: user=0.18 sys=0.01, real=0.02 secs]
>>>> --------------------
>>>>
>>>> We're going to try to double the new size on a single node, to see the
>>>> effects of that.
>>>>
>>>> Beyond this experiment, is there any additional data I can collect to
>>>> better understand the nature of the promotion failures?
>>>> Am I at the point where I need to start collecting free list statistics?
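>>>>
>>>> (I assume "free list statistics" here would mean running with something
>>>> like -XX:PrintFLSStatistics=1 and watching the free list / fragmentation
>>>> numbers around each GC.)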
>>>>
>>>> Thanks,
>>>> Taras
>>>
>>>
>>

