Identifying concurrent mode failures caused by fragmentation

Jon Masamitsu jon.masamitsu at oracle.com
Tue Nov 1 13:50:29 UTC 2011


Jon,

Incremental CMS (iCMS) was written for a specific use case: 1 or 2 hardware
threads, where concurrent activity by CMS would look like a stop-the-world
pause (with only 1 hardware thread) or a high tax on the CPU cycles (with
2 hardware threads).  It has higher overhead and is also less efficient
at identifying garbage.  The latter is because iCMS spreads the concurrent
work out over a longer period, so objects it identified as live early in
the cycle may already be dead by the time the sweep runs, and they won't
be reclaimed until the next cycle.  It's worth testing with regular CMS
instead of iCMS.
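
As a concrete sketch of that test (based on the parameter list quoted
below - only the incremental flags change, everything else stays as-is):

  -XX:+UseConcMarkSweepGC
  # drop: -XX:+CMSIncrementalMode
  # drop: -XX:+CMSIncrementalPacing
  # drop: -XX:CMSIncrementalSafetyFactor=30
  #       (the safety factor only has an effect in incremental mode)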

BTW, for a 6g heap your young gen might be on the small side - the
ParNew entry in your log shows a capacity of only 76672K, about 75m.
A larger young gen allows more objects to die in the young gen
and puts less pressure on the old (CMS) gen (i.e. fewer objects
get promoted).  Next time you want to play with your GC
settings, try a larger young gen.  Not sure if iCMS pushed you
toward a smaller young gen.  I personally don't have much
experience with iCMS, but with regular CMS I would expect you to
get better throughput with a larger young gen.  As usual, the
devil is in the details.
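
For example (a sketch only - the right size depends on your allocation
and survival rates, so treat 1024m as a placeholder to benchmark against
your own workload, not a recommendation):

  -Xmn1024m

or equivalently:

  -XX:NewSize=1024m
  -XX:MaxNewSize=1024m

-Xmn sets both the initial and maximum young gen size, so with your
fixed -Xms/-Xmx heap the young gen size stays constant.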

Jon

On 11/01/11 05:58, Jon Bright wrote:
> Jon,
>
> Indeed, the problem appears to have gone away with today's update to
> u26.  (We plan to migrate further, but we're fairly conservative about
> rolling out new versions, and we already had u26 in use elsewhere.)
>
> With regard to your (and Kris') question on incremental mode: I started
> out by reading the tuning guide at
>
> 	http://www.oracle.com/technetwork/java/javase/gc-tuning-6-140523.html#icms
>
> and followed that up by reading various other pages and your blog (which
> was very helpful in terms of giving a sense of how to think about GC -
> thank you!).
>
> Whilst I was fairly ambivalent about incremental mode (we have at least
> 4 logical CPUs in each machine), we'd been using it in the past and I
> didn't see anything specifically mentioning that it was obsolete.  Is
> there a better reference on this subject?
>
> I'll certainly now try a few benchmarking/test runs with incremental
> mode turned off and roll that out if all is well.
>
> Thanks!
>
> Jon
>
> On 01.11.2011 03:43, Jon Masamitsu wrote:
>> Jon,
>>
>> I haven't looked at the longer log but in general I've found the
>> information in the GC logs inadequate to figure out if the
>> problem is fragmentation. But more importantly, there has
>> been some good work in recent versions of hotspot so that
>> we're more successful at combating fragmentation. Try
>> the latest release and see if it helps (u26 should be good
>> enough).
>>
>> Jon
>>
>> On 10/31/11 06:06, Jon Bright wrote:
>>> Hi,
>>>
>>> We have an application running with a 6GB heap (complete parameters
>>> below). Mostly it has a fairly low turnover of memory use, but on
>>> occasion, it will come under some pressure as it reloads a large
>>> in-memory data set from a database.
>>>
>>> Sometimes in this situation, we'll see a concurrent mode failure.
>>> Here's one failure:
>>>
>>> 20021.464: [GC 20021.465: [ParNew: 13093K->3939K(76672K), 0.0569240
>>> secs]20021.522: [CMS20023.747: [CMS-concurrent-mark: 11.403/29.029
>>> secs] [Times: user=41.11 sys=1.03, real=29.03 secs]
>>> (concurrent mode failure): 3873922K->2801744K(6206272K), 30.7900180
>>> secs] 3886215K->2801744K(6282944K), [CMS Perm :
>>> 142884K->142834K(524288K)] icms_dc=33 , 30.8473830 secs] [Times:
>>> user=30.26 sys=0.71, real=30.85 secs]
>>> Total time for which application threads were stopped: 30.8484460 seconds
>>>
>>> (I've attached a lengthier log including the previous and subsequent
>>> CMS collection.)
>>>
>>> Am I correct in thinking that this failure can basically only be
>>> caused by fragmentation? Both young and old seem to have plenty of
>>> space. There doesn't seem to be any sign that the tenured generation
>>> would run out of space before CMS completes. Fragmentation is the only
>>> remaining cause that occurs to me.
>>>
>>> We're running with 1.6.0_11, although this will be upgraded to
>>> 1.6.0_26 tomorrow. I realise our current version is ancient - I'm not
>>> really looking for help on the problem itself, just for advice on
>>> whether the log line above indicates fragmentation.
>>>
>>> Thanks
>>>
>>> Jon Bright
>>>
>>>
>>>
>>> The parameters we have set are:
>>>
>>> -server
>>> -Xmx6144M
>>> -Xms6144M
>>> -XX:MaxPermSize=512m
>>> -XX:PermSize=512m
>>> -XX:+UseConcMarkSweepGC
>>> -XX:+CMSIncrementalMode
>>> -XX:+CMSIncrementalPacing
>>> -XX:SoftRefLRUPolicyMSPerMB=3
>>> -XX:CMSIncrementalSafetyFactor=30
>>> -XX:+PrintGCDetails
>>> -XX:+PrintGCApplicationStoppedTime
>>> -XX:+PrintGCApplicationConcurrentTime
>>> -XX:+PrintGCTimeStamps
>>> -Xloggc:/home/tbmx/log/gc_`date +%Y%m%d%H%M`.log
>>>
>>>
>>>
_______________________________________________
hotspot-gc-use mailing list
hotspot-gc-use at openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
