CMS Promotion Failures

Brian Williams brian.williams at mayalane.com
Wed Nov 17 06:32:51 PST 2010


Hi Jon,
We just received GC detail from one machine, a 16-way x4600.  The promotion failure occurs 10 hours into the process life.

-- application startup ---
2010-11-15T22:34:47.439-0800: [GC [ParNew: 1780514K->209664K(1887488K), 1.1993542 secs] 1780514K->369453K(24956160K), 1.1996619 secs] [Times: user=5.17 sys=1.05, real=1.20 secs] 
2010-11-15T22:35:51.297-0800: [GC [ParNew: 1887488K->209664K(1887488K), 0.5736733 secs] 2047277K->610880K(24956160K), 0.5739673 secs] [Times: user=2.65 sys=0.44, real=0.57 secs] 
2010-11-15T22:37:13.893-0800: [GC [ParNew: 1887488K->209664K(1887488K), 0.5714056 secs] 2288704K->838848K(24956160K), 0.5716780 secs] [Times: user=3.32 sys=0.24, real=0.57 secs] 
2010-11-15T22:38:28.112-0800: [GC [ParNew: 1887488K->209664K(1887488K), 0.6964940 secs] 2516672K->1117914K(24956160K), 0.6967518 secs] [Times: user=4.06 sys=0.31, real=0.70 secs] 
2010-11-15T22:40:02.015-0800: [GC [ParNew: 1887488K->209664K(1887488K), 0.6151412 secs] 2795738K->1369924K(24956160K), 0.6154426 secs] [Times: user=3.50 sys=0.28, real=0.62 secs] 

...  50 CMS cycles pass ...

2010-11-16T10:05:46.116-0800: [GC [ParNew: 1887488K->209664K(1887488K), 0.4683483 secs] 8156115K->6706120K(24956160K), 0.4686710 secs] [Times: user=4.09 sys=0.25, real=0.47 secs] 
2010-11-16T10:06:15.535-0800: [GC [ParNew: 1887488K->201665K(1887488K), 0.3743882 secs] 8383944K->6904794K(24956160K), 0.3746896 secs] [Times: user=2.88 sys=0.32, real=0.37 secs] 
2010-11-16T10:06:25.861-0800: [GC [ParNew: 1879489K->209664K(1887488K), 0.6735419 secs] 8582618K->7164756K(24956160K), 0.6738153 secs] [Times: user=5.83 sys=0.68, real=0.67 secs] 
2010-11-16T10:06:26.537-0800: [GC [1 CMS-initial-mark: 6955092K(23068672K)] 7172883K(24956160K), 0.1812929 secs] [Times: user=0.18 sys=0.00, real=0.18 secs] 
2010-11-16T10:06:28.457-0800: [CMS-concurrent-mark: 1.708/1.738 secs] [Times: user=8.80 sys=0.16, real=1.74 secs] 
2010-11-16T10:06:28.768-0800: [CMS-concurrent-preclean: 0.279/0.311 secs] [Times: user=0.75 sys=0.05, real=0.31 secs] 
2010-11-16T10:06:54.938-0800: [GC [ParNew: 1887488K->208834K(1887488K), 0.3598649 secs] 8842580K->7358032K(24956160K), 0.3601703 secs] [Times: user=3.07 sys=0.11, real=0.36 secs] 
2010-11-16T10:06:57.753-0800: [CMS-concurrent-abortable-preclean: 28.096/28.986 secs] [Times: user=44.84 sys=2.12, real=28.99 secs] 
2010-11-16T10:06:57.755-0800: [GC[YG occupancy: 1050375 K (1887488 K)]2010-11-16T10:06:57.755-0800: [GC [ParNew (promotion failed): 1050375K->1051204K(1887488K), 0.9133199 secs] 8199573K->8337584K(24956160K), 0.9136117 secs] [Times: user=3.90 sys=0.01, real=0.91 secs] 
[Rescan (parallel) , 0.7407982 secs][weak refs processing, 0.0022770 secs] [1 CMS-remark: 7286379K(23068672K)] 8337584K(24956160K), 1.6572438 secs] [Times: user=7.01 sys=0.06, real=1.66 secs] 
2010-11-16T10:07:01.679-0800: [Full GC [CMS2010-11-16T10:07:42.463-0800: [CMS-concurrent-sweep: 43.044/43.050 secs] [Times: user=48.80 sys=0.72, real=43.05 secs] 
 (concurrent mode failure): 6616900K->2119438K(23068672K), 61.3282450 secs] 8504388K->2119438K(24956160K), [CMS Perm : 49108K->48394K(82008K)], 61.3285709 secs] [Times: user=61.33 sys=0.01, real=61.33 secs] 
2010-11-16T10:08:08.462-0800: [GC [ParNew: 1677824K->157105K(1887488K), 0.0797175 secs] 3797262K->2276544K(24956160K), 0.0800104 secs] [Times: user=1.00 sys=0.00, real=0.08 secs] 
2010-11-16T10:08:36.240-0800: [GC [ParNew: 1834929K->136334K(1887488K), 0.1978673 secs] 3954368K->2386916K(24956160K), 0.1981614 secs] [Times: user=1.72 sys=0.04, real=0.20 secs] 

The average amount data promoted per ParNew is 187m and we are looking into why it is so large.  If you have any insight into this particular promotion failure, we would appreciate it. 

Thanks,
Brian


On Nov 15, 2010, at 9:31 PM, Jon Masamitsu wrote:

>  Brian,
> 
> Ramki and Shaun have addressed most of your questions I think.
> Just wanted to know what type of platform (how many hardware
> threads)  you're using.  Also what is CMS doing when the
> promotion failures are happening (concurrent marking,
> preclean cleaning or sweeping)?
> 
> Jon
> 



More information about the hotspot-gc-use mailing list