Promotion Failed when the Old Generation Usage is very low.

Wed Apr 25 23:45:58 PDT 2012

Hi Srinivas,

Thanks very much for your response. I'm very glad to have an expert to talk to about this GC issue.

For my question #2, your answer was that a 'concurrent mode failure' causes old generation compaction. I have attached a piece of GC log from our client's production system. It contains 4 ParNew GCs:
the 1st one, at time 298550.966, has both a Promotion Failed and a Concurrent Mode Failure; the 2nd and 3rd are OK; the 4th one, at 298786.902, has a Promotion Failed
again, even though the old generation had only 2615456K used out of 10387456K and had already been compacted by the 1st collection. This confuses me a lot.
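
To put a number on "very low", here is a minimal Java sketch of the arithmetic. The class and method names are only illustrative; the two sizes are the ones quoted above, and the 60% figure is the -XX:CMSInitiatingOccupancyFraction value from the settings in my earlier mail.

```java
final class OldGenOccupancy {
    /** Returns used/capacity as a percentage. */
    static double occupancyPercent(long usedKb, long capacityKb) {
        if (capacityKb <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        return 100.0 * usedKb / capacityKb;
    }

    public static void main(String[] args) {
        long usedKb = 2_615_456L;      // old gen used before the 4th collection (from the log)
        long capacityKb = 10_387_456L; // old gen capacity (from the log)
        int initiatingFraction = 60;   // -XX:CMSInitiatingOccupancyFraction=60 (from the settings)

        double pct = occupancyPercent(usedKb, capacityKb);
        System.out.printf("old gen occupancy: %.1f%%%n", pct);
        System.out.println("below initiating fraction: " + (pct < initiatingFraction));
    }
}
```

So the old generation was only about a quarter full when the promotion failed.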

Regards,
Bond

>>> Srinivas Ramakrishna <ysr1729 at gmail.com> 4/25/2012 3:01 PM >>>
Bond, you are apparently using an MP box. I'd suggest losing the
"incremental" options entirely and dropping the max tenuring threshold
to 8 or so. I'd make use of the size of the young gen and the survivor
spaces to control promotion into the old gen, which I would size at two
times your application footprint plus the size of the young gen as a
starting point and refine from there. There have been some suggestions
on this alias from Chi-Ho Kwok etc. on the importance of reducing
promotion of very young objects into the old generation to prevent
fragmentation. Longer-lived objects typically imply (for most but not
all applications) relatively stable and less non-stationary
distributions, which CMS block inventorying heuristics prefer.
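
As a rough illustration of that starting point (old gen = 2 x application footprint + young gen), here is a small Java sketch. The names are hypothetical, and the footprint value is an assumption: it uses the old gen occupancy right after the compacting collection in your first excerpt (2319271K) as a stand-in for the real footprint, which you would need to measure yourself. The young gen size is your -Xmn2000m setting expressed in KB.

```java
final class OldGenSizing {
    /** Starting-point old gen size in KB: 2 x footprint + young gen. */
    static long initialOldGenKb(long footprintKb, long youngGenKb) {
        return Math.addExact(Math.multiplyExact(2L, footprintKb), youngGenKb);
    }

    public static void main(String[] args) {
        long footprintKb = 2_319_271L;  // ASSUMPTION: post-compaction occupancy used as footprint proxy
        long youngGenKb = 2000L * 1024; // -Xmn2000m from the posted settings

        long oldGenKb = initialOldGenKb(footprintKb, youngGenKb);
        System.out.println("starting old gen size (KB): " + oldGenKb);
        System.out.println("starting total heap (KB):   " + (oldGenKb + youngGenKb));
    }
}
```

Treat the result only as a first guess to refine against observed promotion rates.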

more inline below...

On Tue, Apr 24, 2012 at 2:49 AM, Bond Chen <Bond.Chen at lombardrisk.com>wrote:

> Hi ,
>
> We're suffering from very frequent promotion failures and concurrent mode
> failures, which cause very long GC pauses (5 seconds to 1000 seconds, sometimes
> even more). Attached are the '1st promotion failed' and '49th promotion failed'
> excerpts from gc.log.
>
> 1. The '1st promotion failed' was caused by old generation usage being too
> high, leaving not enough space for promotion. But at the '49th promotion
> failed', only 2615456K out of 10387456K was used. What happened?
>

either a large object allocation or fragmentation or more likely both.
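
One rough way to look for fragmentation in the "Statistics for BinaryTreeDictionary" blocks is to compare the largest chunk with the total free space. The sketch below is only an illustration under stated assumptions: the class name and the ratio are my own, the parser expects the field labels exactly as printed in your excerpts (with or without the "> " quoting prefix), and since the log does not label which space each block describes, the code does not guess.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FreeListStats {
    final long totalFree; // "Total Free Space" as printed
    final long maxChunk;  // "Max Chunk Size" as printed

    FreeListStats(long totalFree, long maxChunk) {
        this.totalFree = totalFree;
        this.maxChunk = maxChunk;
    }

    /** Largest chunk as a fraction of total free space; 1.0 means one contiguous block. */
    double largestChunkFraction() {
        return totalFree == 0 ? 0.0 : (double) maxChunk / totalFree;
    }

    // Matches a "Total Free Space" line followed by a "Max Chunk Size" line,
    // tolerating an optional "> " mail-quoting prefix on the second line.
    private static final Pattern BLOCK = Pattern.compile(
            "Total Free Space:\\s*(\\d+)\\s*(?:>\\s*)?Max\\s+Chunk Size:\\s*(\\d+)");

    static List<FreeListStats> parse(String log) {
        List<FreeListStats> out = new ArrayList<>();
        Matcher m = BLOCK.matcher(log);
        while (m.find()) {
            out.add(new FreeListStats(Long.parseLong(m.group(1)), Long.parseLong(m.group(2))));
        }
        return out;
    }

    public static void main(String[] args) {
        // Two blocks copied from the posted excerpts.
        String log = "Total Free Space: 6834133\n"
                + "Max   Chunk Size: 97353\n"
                + "Number of Blocks: 4773\n"
                + "Total Free Space: 236997970\n"
                + "Max   Chunk Size: 236997970\n"
                + "Number of Blocks: 1\n";
        for (FreeListStats s : parse(log)) {
            System.out.printf("free=%d maxChunk=%d fraction=%.4f%n",
                    s.totalFree, s.maxChunk, s.largestChunkFraction());
        }
    }
}
```

A fraction near 1.0 means the free space in that dictionary is essentially one block; a small fraction with thousands of blocks suggests the free space is scattered. A promotion failure with a fraction of 1.0 would point more toward a large allocation than toward fragmentation, though this ratio alone cannot settle it.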

>
> 2. Does CMS compact the old generation when it throws 'Concurrent Mode
> Failure', i.e. move all objects together and leave only one free block? Or
> does only a 'Full GC' do this?
>

concurrent mode failure results in compaction, yes.

>
> 3. When does a 'Promotion failure' cause a 'Concurrent Mode Failure', and
> when a 'Full GC'?
>

it's just a notional difference. Both should be called "concurrent mode
failure". I think the newer messages say "concurrent mode interrupted"
and "full gc" respectively. In the latter case there is no ongoing
concurrent cycle that was interrupted. From the standpoint of the effect
on the application (long pause for gc) and of the state of the heap
after gc (fully compacted) there is little difference. For historical
reasons, "concurrent mode failure" usually results in longer pauses
because the concurrent collector first completes its ongoing phase
before bailing to compaction, whereas in the latter case there is no
such delay, so it is usually less painful.
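
If you want to tally the two cases in your logs, a minimal Java sketch along these lines might do. The class and enum names are hypothetical, and it only looks for the two literal markers that appear in the excerpts you posted, "(promotion failed)" and "(concurrent mode failure)"; other JVM versions may word these differently.

```java
final class PromotionFailureKind {
    enum Kind {
        CONCURRENT_CYCLE_INTERRUPTED, // promotion failed while a concurrent cycle was running
        NO_CONCURRENT_CYCLE,          // promotion failed with no cycle in progress
        NO_PROMOTION_FAILURE          // record shows no promotion failure at all
    }

    /** Classifies one complete GC log record (which may span several lines). */
    static Kind classify(String record) {
        if (!record.contains("(promotion failed)")) {
            return Kind.NO_PROMOTION_FAILURE;
        }
        return record.contains("(concurrent mode failure)")
                ? Kind.CONCURRENT_CYCLE_INTERRUPTED
                : Kind.NO_CONCURRENT_CYCLE;
    }

    public static void main(String[] args) {
        // Abbreviated from the two posted excerpts.
        String first = "[ParNew (promotion failed): 2007040K->2007040K(2007040K), 48.9558338 secs]"
                + " (concurrent mode failure): 8681490K->2319271K(10387456K), 44.6304362 secs]";
        String fortyNinth = "[ParNew (promotion failed): 2007039K->2007040K(2007040K), 4.5565939 secs]"
                + "[CMS: 2615456K->1813239K(10387456K), 19.2232319 secs]";
        System.out.println(classify(first));
        System.out.println(classify(fortyNinth));
    }
}
```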

-- ramki

> Regards,
> Bond
>
> /****parameter ***/
> ### New JVM Parameter
> #Below line changed per RH recommendation 15 Dec 2009
> #export RUN_ARGS=" -d64 -server -Xms2048M -Xmx12144M -XX:PermSize=512m
> -XX:MaxPermSize=512m -Xss1024k "
> export RUN_ARGS=" -server -d64 -Xms2048M -Xmx12144M -XX:PermSize=512m
> -XX:MaxPermSize=512m -Xss1024k "
>
> export RUN_ARGS=" $RUN_ARGS -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
> -XX:+CMSParallelRemarkEnabled -XX:+UseTLAB -XX:+CMSIncrementalMode "
>
> #Below line commented per RH recommendation 15 Dec 2009
> #export RUN_ARGS=" $RUN_ARGS -XX:+UseCMSCompactAtFullCollection "
>
> #Below line changed per RH recommendation 15 Dec 2009
> #export RUN_ARGS=" $RUN_ARGS -XX:+CMSIncrementalPacing
> -XX:CMSIncrementalDutyCycleMin=0 -XX:CMSIncrementalDutyCycle=10
> -XX:MaxTenuringThreshold=0 "
> export RUN_ARGS=" $RUN_ARGS -XX:+CMSIncrementalPacing
> -XX:CMSIncrementalDutyCycleMin=0 -XX:CMSIncrementalDutyCycle=10
> -XX:MaxTenuringThreshold=32 "
>
> #Below 2 lines added per RH recommendation 15 Dec 2009
> #export RUN_ARGS=" -XX:ParallelGCThreads=13 "
> #export RUN_ARGS=" -XX:SurvivorRatio=48 "
>
> #Below 2 lines added per RH recommendation 16 Dec 2009
> RUN_ARGS=" $RUN_ARGS -XX:ParallelGCThreads=13 "
> RUN_ARGS=" $RUN_ARGS -XX:SurvivorRatio=48 "
>
> ### set for cluster monitor  added on 25-Jun-2011
> export RUN_ARGS="$RUN_ARGS -Djboss.cluster.monitor.switch=y";
> export RUN_ARGS="$RUN_ARGS -Djboss.cluster.number=2";
>
> #Below line changed with RELEASE_2009_1_SP10.2 on 26 Feb 2010
> #export RUN_ARGS=" $RUN_ARGS -XX:CMSInitiatingOccupancyFraction=70
> -XX:+PrintTenuringDistribution -Xmn512m -XX:+UseLargePages
> -XX:LargePageSizeInBytes=64k "
> export RUN_ARGS=" $RUN_ARGS -XX:CMSInitiatingOccupancyFraction=60
> -Xmn2000m -XX:+UseLargePages -XX:LargePageSizeInBytes=64k "
>
> #Below line added with RELEASE_2009_1_SP10.2 on 26 Feb 2010
> export RUN_ARGS=" $RUN_ARGS -XX:+CMSClassUnloadingEnabled
> -XX:+ExplicitGCInvokesConcurrent -XX:+AggressiveOpts "
>
> export RUN_ARGS=" $RUN_ARGS -XX:+PrintGCDetails
> -XX:+PrintGCApplicationStoppedTime -Xloggc:./gc_${start_ts}.log "
>
> export RUN_ARGS=" $RUN_ARGS  -Dsun.rmi.dgc.server.gcInterval=18000000
> -Dsun.rmi.dgc.client.gcInterval=18000000 -verbose:gc"
>
>
>
> /***parameter
>
>
>
>
>
>
>
> /** the 1st promotion failed **/
>
> 169682.980: [GC Before GC:
> Statistics for BinaryTreeDictionary:
> ------------------------------------
> Total Free Space: 7127332
> Max   Chunk Size: 6041118
> Number of Blocks: 1785
> Av.  Block  Size: 3992
> Tree      Height: 24
> Before GC:
> Statistics for BinaryTreeDictionary:
> ------------------------------------
> Total Free Space: 6834133
> Max   Chunk Size: 97353
> Number of Blocks: 4773
> Av.  Block  Size: 1431
> Tree      Height: 27
> 169682.981: [ParNew (promotion failed): 2007040K->2007040K(2007040K),
> 48.9558338 secs]169731.937: [CMS169741.903: [CMS-concurrent-sweep:
> 10.823/99.414 secs] [Times: user=127.09 sys=25.97, real=99.41 secs]
>  (concurrent mode failure): 8681490K->2319271K(10387456K), 44.6304362
> secs] 10395485K->2319271K(12394496K), [CMS Perm :
> 291584K->290856K(524288K)]After GC:
> Statistics for BinaryTreeDictionary:
> ------------------------------------
> Total Free Space: 1032711195
> Max   Chunk Size: 1032711195
> Number of Blocks: 1
> Av.  Block  Size: 1032711195
> Tree      Height: 1
> After GC:
> Statistics for BinaryTreeDictionary:
> ------------------------------------
> Total Free Space: 0
> Max   Chunk Size: 0
> Number of Blocks: 0
> Tree      Height: 0
>  icms_dc=16 , 93.5876901 secs] [Times: user=97.28 sys=21.58, real=93.59
> secs]
>
> /** the 1st promotion failed **/
>
>
>
> /** the 49th promotion failed ***/
> 298786.901: [GC Before GC:
> Statistics for BinaryTreeDictionary:
> ------------------------------------
> Total Free Space: 236997970
> Max   Chunk Size: 236997970
> Number of Blocks: 1
> Av.  Block  Size: 236997970
> Tree      Height: 1
> Before GC:
> Statistics for BinaryTreeDictionary:
> ------------------------------------
> Total Free Space: 0
> Max   Chunk Size: 0
> Number of Blocks: 0
> Tree      Height: 0
> 298786.902: [ParNew (promotion failed): 2007039K->2007040K(2007040K),
> 4.5565939 secs]298791.458: [CMS: 2615456K->1813239K(10387456K), 19.2232319
> secs] 4346089K->1813239K(12394496K), [CMS Perm :
> 299206K->299126K(524288K)]After GC:
> Statistics for BinaryTreeDictionary:
> ------------------------------------
> Total Free Space: 1097483360
> Max   Chunk Size: 1097483360
> Number of Blocks: 1
> Av.  Block  Size: 1097483360
> Tree      Height: 1
> After GC:
> Statistics for BinaryTreeDictionary:
> ------------------------------------
> Total Free Space: 0
> Max   Chunk Size: 0
> Number of Blocks: 0
> Tree      Height: 0
>  icms_dc=0 , 23.7805042 secs] [Times: user=25.47 sys=0.02, real=23.78 secs]
> Total time for which application threads were stopped: 23.7861234 seconds
> /** the 49th promotion failed ***/
>
> This e-mail together with any attachments (the "Message") is confidential
> and may contain privileged information. If you are not the intended
> recipient (or have received this e-mail in error) please notify the sender
> immediately and delete this Message from your system.  Any unauthorized
> copying, disclosure, distribution or use of this Message is strictly
> forbidden.
>
> _______________________________________________
> hotspot-gc-use mailing list
> hotspot-gc-use at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: gc_twoNeibouringPromotionFailure.log
Type: application/octet-stream
Size: 7272 bytes
Desc: not available
Url : http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120426/92573cf1/gc_twoNeibouringPromotionFailure-0001.log