understanding GC logs

Tue Mar 30 21:34:34 PDT 2010

On 03/30/10 07:30, Shaun Hennessy wrote:
>
> A couple of question related to the GC logs and promotion failure  
> messages
>
> I am running 6u17.
>
> rgv[2]: -server
> argv[3]: -Xms14000m
> argv[4]: -Xmx14000m
> argv[5]: -XX:PermSize=800m
> argv[6]: -XX:NewSize=5600m
> argv[7]: -XX:MaxNewSize=5600m
> argv[8]: -XX:MaxPermSize=800m
> argv[9]: -XX:+DisableExplicitGC
> argv[10]: -XX:+UseConcMarkSweepGC
> argv[11]: -XX:+UseParNewGC
> argv[12]: -XX:+UseCMSCompactAtFullCollection
> argv[13]: -XX:+CMSClassUnloadingEnabled
> argv[28]: -verbose:gc
> argv[29]: -XX:+PrintGCDetails
> argv[30]: -XX:+PrintGCDateStamps
>
>
>
> 4112.744: [GC 4112.744: [ParNew: 4940531K->530787K(5017600K),  
> 0.1455641 secs] 11878708K->7540012K(13619200K), 0.1457559 secs]  
> [Times: user=1.38 sys=0.02, real=0.15 secs]
> 4113.780: [GC 4113.780: [ParNew: 4831587K->372801K(5017600K),  
> 0.2093305 secs] 11840812K->7551390K(13619200K), 0.2095270 secs]  
> [Times: user=1.34 sys=0.07, real=0.21 secs]
>  [Times: user=0.10 sys=0.00, real=0.11 secs]
> 2010-03-24T16:31:56.490-0400: 4114.097: [CMS-concurrent-mark-start]
> 4115.261: [GC 4115.261: [ParNew: 4674075K->364108K(5017600K),  
> 0.0755017 secs] 11852663K->7542736K(13619200K), 0.0756880 secs]  
> [Times: user=0.93 sys=0.00, real=0.08 secs]
> 4115.338: [GC 4115.338: [ParNew: 420064K->323310K(5017600K),  
> 0.1112115 secs] 7598693K->7587370K(13619200K), 0.1113667 secs]  
> [Times: user=0.98 sys=0.02, real=0.11 secs]
> 2010-03-24T16:31:58.647-0400: 4116.254: [CMS-concurrent-mark:  
> 1.909/2.157 secs] [Times: user=31.47 sys=1.55, real=2.16 secs]
> 2010-03-24T16:31:58.647-0400: 4116.254: [CMS-concurrent-preclean- 
> start]
> 2010-03-24T16:31:58.798-0400: 4116.405: [CMS-concurrent-preclean:  
> 0.149/0.151 secs] [Times: user=2.29 sys=0.12, real=0.15 secs]
> 2010-03-24T16:31:58.799-0400: 4116.406: [CMS-concurrent-abortable- 
> preclean-start]
> 4116.460: [GC 4116.460: [ParNew: 4624110K->301464K(5017600K),  
> 0.0914784 secs] 11888170K->7617401K(13619200K), 0.0916679 secs]  
> [Times: user=0.89 sys=0.03, real=0.09 secs]
> 2010-03-24T16:31:59.494-0400: 4117.101: [CMS-concurrent-abortable- 
> preclean: 0.596/0.695 secs] [Times: user=9.88 sys=0.60, real=0.70  
> secs]
> [YG occupancy: 2756990 K (5017600 K)]4117.102: [Rescan (parallel) ,  
> 0.4648394 secs]4117.567: [weak refs processing, 0.0028851  
> secs]4117.570: [class unloading, 0.0240174 secs]4117.594: [scrub  
> symbol & string tables, 0.0898531 secs] [Times: user=1.72 sys=0.37,  
> real=0.58 secs]
> 2010-03-24T16:32:00.079-0400: 4117.686: [CMS-concurrent-sweep-start]
> 4118.116: [GC 4118.116: [ParNew: 4602264K->305089K(5017600K),  
> 0.0712571 secs] 11891816K->7620802K(13619200K), 0.0714474 secs]  
> [Times: user=0.75 sys=0.00, real=0.07 secs]
> 4119.117: [GC 4119.117: [ParNew: 4605889K->263281K(5017600K),  
> 0.0842051 secs] 11665429K->7368947K(13619200K), 0.0843955 secs]  
> [Times: user=0.79 sys=0.01, real=0.08 secs]
> 4125.941: [GC 4125.941: [ParNew: 4936868K->708251K(5017600K),  
> 0.1426036 secs] 9789305K->5612975K(13619200K), 0.1427944 secs]  
> [Times: user=1.56 sys=0.01, real=0.14 secs]
> 4126.893: [GC 4126.894: [ParNew: 5009051K->485783K(5017600K),  
> 0.2210054 secs] 9536611K->5247528K(13619200K), 0.2211964 secs]  
> [Times: user=1.58 sys=0.04, real=0.22 secs]
> 4128.102: [GC 4128.102: [ParNew: 4786583K->455386K(5017600K),  
> 0.0748814 secs] 8588693K->4257495K(13619200K), 0.0750694 secs]  
> [Times: user=0.94 sys=0.00, real=0.08 secs]
> 2010-03-24T16:32:11.951-0400: 4129.558: [CMS-concurrent-sweep:  
> 10.777/11.872 secs] [Times: user=149.77 sys=7.25, real=11.87 secs]
> 2010-03-24T16:32:11.951-0400: 4129.558: [CMS-concurrent-reset-start]
> 2010-03-24T16:32:11.984-0400: 4129.591: [CMS-concurrent-reset:  
> 0.033/0.033 secs] [Times: user=0.04 sys=0.00, real=0.03 secs]  
> 4140.537: [GC 4140.537: [ParNew: 4756186K->539705K(5017600K),  
> 0.0873384 secs] 6572247K->2355767K(13619200K), 0.0875281 secs]  
> [Times: user=1.10 sys=0.00, re
> al=0.09 secs]
>
>
> 1) I no longer seem to get any "CMS-initial-mark"  - is this a  
> change since 6u12?

I'm running

Java(TM) SE Runtime Environment (build 1.6.0_17-b04)
Java HotSpot(TM) Server VM (build 14.3-b01, mixed mode)

and I see entries such as

0.869: [GC [1 CMS-initial-mark: 22901K(229376K)] 28264K(258880K),  
0.0561592 secs] [Times: user=0.05 sys=0.01, real=0.06 secs]

> 2) The rescan portion than is the only non-concurrent correct?  So  
> from the above the application was only STW for 0.58 sec.

The initial-mark is also STW.

The rescan is part of the remark which is STW.  From my run

1.270: [GC[YG occupancy: 15860 K (29504 K)]1.270: [Rescan (parallel) ,  
0.1002467 secs]1.370: [weak refs processing, 0.0000167 secs] [1 CMS- 
remark: 22901K(229376K)] 38761K(258880K), 0.1004598 secs] [Times:  
user=0.11 sys=0.03, real=0.10 secs]

In your entry yes it is .058 sec.

> 3) This may have been a chance from 1.5 to 1.6, but this line also  
> used to display a CMS-remark did it not?

Yes, see my example above.

> 4) Is there a way to have my minor collections also display the full  
> date stamp (ie  2010-03-24T16:31:58.799-0400)
>
>

When I run with -XX:+PrintGCDateStamps I see entries such as

2010-03-30T16:06:55.297-0700: 2.602: [GC 2.602: [ParNew: 29504K- 
 >3264K(29504K), 0.2457302 secs] 52405K->38187K(258880K), 0.2460394  
secs] [Times: user=0.41 sys=0.02, real=0.25 secs]

I don't know why you're not seeing that.
>
> 1270.736: [GC 1270.736: [ParNew: 4872340K->574345K(5017600K),  
> 0.1967262 secs] 7123984K->2972742K(13619200K), 0.1969106 secs]  
> [Times: user=1.45 sys=0.05, re
> al=0.20 secs]
> 1272.024: [GC 1272.024: [ParNew: 4875542K->653139K(5017600K),  
> 0.1334760 secs] 7273939K->3051536K(13619200K), 0.1336582 secs]  
> [Times: user=1.54 sys=0.01, re
> al=0.13 secs]
> 1272.158: [GC 1272.159: [ParNew: 681949K->563105K(5017600K),  
> 0.2187865 secs] 3080347K->3158904K(13619200K), 0.2189362 secs]  
> [Times: user=1.48 sys=0.06, rea
> l=0.22 secs]
> 1273.398: [GC 1273.398: [ParNew: 4863905K->535051K(5017600K),  
> 0.1196808 secs] 7459704K->3130850K(13619200K), 0.1198694 secs]  
> [Times: user=1.51 sys=0.00, re
> al=0.12 secs]
> 1274.461: [GC 1274.461: [ParNew: 4835851K->399744K(5017600K),  
> 0.2861376 secs] 7431650K->3249248K(13619200K), 0.2863296 secs]  
> [Times: user=1.61 sys=0.09, re
> al=0.29 secs]
>
> 5) Why did the middle minor collection occur?  A big allocation?
>

That seems rather suspicious.   It may be a side effect of using JNI  
critical sections.  I don't
know if this is such a case but its the best behavior.

>
>
> - Promotion Failure
> 4896.478: [GC 4896.478: [ParNew: 4894353K->587864K(5017600K),  
> 0.4789909 secs] 8473688K->4268560K(13619200K), 0.4791812 secs]  
> [Times: user=1.00 sys=0.61, real=0.48 secs]
> 4897.812: [GC 4897.812: [ParNew: 4888664K->545903K(5017600K),  
> 0.4105613 secs] 8569360K->4326583K(13619200K), 0.4107560 secs]  
> [Times: user=1.06 sys=0.55, real=0.41 secs]
> 4899.057: [GC 4899.058: [ParNew: 4846703K->638966K(5017600K),  
> 0.2759734 secs] 8627383K->4496987K(13619200K), 0.2761637 secs]  
> [Times: user=1.13 sys=0.36, real=0.28 secs]
> 4900.101: [GC 4900.101: [ParNew: 4939768K->630721K(5017600K),  
> 0.5117751 secs] 8797789K->4607020K(13619200K), 0.5119662 secs]  
> [Times: user=0.84 sys=0.66, real=0.51 secs]
> 4900.615: [GC 4900.615: [ParNew: 651487K->487288K(5017600K),  
> 0.0780183 secs] 4627786K->4463587K(13619200K), 0.0781687 secs]  
> [Times: user=0.96 sys=0.00, real=0.08 secs]
> 4901.581: [GC 4901.581: [ParNew (promotion failed): 4788088K- 
> >4780999K(5017600K), 2.8947499 secs]4904.476: [CMS: 4003090K- 
> >1530872K(8601600K), 7.5122451 secs] 8764387K->1530872K(13619200K),  
> [CMS Perm : 671102K->671102K(819200K)], 10.4072102 secs] [Times:  
> user=11.03 sys=1.09, real=10.41 secs]
> 4913.024: [GC 4913.024: [ParNew: 4300800K->316807K(5017600K),  
> 0.0615917 secs] 5831672K->1847679K(13619200K), 0.0617857 secs]  
> [Times: user=0.74 sys=0.00, real=0.06 secs]
> 4914.015: [GC 4914.015: [ParNew: 4617607K->475077K(5017600K),  
> 0.0771389 secs] 6148479K->2005949K(13619200K), 0.0773290 secs]  
> [Times: user=0.95 sys=0.00, real=0.08 secs]
> 4914.908: [GC 4914.908: [ParNew: 4775877K->586339K(5017600K),  
> 0.0857102 secs] 6306749K->2117211K(13619200K), 0.0859046 secs]  
> [Times: user=1.06 sys=0.00, real=0.09 secs]
> 4915.816: [GC 4915.816: [ParNew: 4887139K->476398K(5017600K),  
> 0.1841627 secs] 6418011K->2152868K(13619200K), 0.1843556 secs]  
> [Times: user=1.32 sys=0.07, real=0.18 secs]
> 6) So here i had a promotion failure, this is due to fragmentation  
> of the tenured generation rather than lack of space?

Fragmentation is the likely problem.  When 6u20 is released try it.   
It does a better job
of keeping fragmentation down.

> 7) Do we need 1 contiguous space in tenured big enough to hold the  
> complete list/size of all objects being promoted, or
>    do multiple spaces get used& the pieces don't all fit?

The free space in the tenured generation is kept in a free list so  
there are multiple chunks.
Don't need 1 contiguous chunk for all the promotions.

> 8) What exactly is occurring during this promotion failed  
> collection?  Based on the next example I assume
>   it's a (successful) scavenge.  What exactly is this - which  
> thread(s) serial / ParallelGCThreads?,
>   STW?, are we simply compacting the tenured gen or are we can  
> actually GC the tenured?

A promotion failure is a scavenge that does not succeed because there  
is not enough
space in the old gen to do all the  needed promotions.  The scavenge  
is in essence
unwound and then a full STW compaction of the entire heap is done.

>
>
>
> promotion failed, and full GC
> 50786.124: [GC 50786.124: [ParNew: 4606713K->338518K(5017600K),  
> 0.0961884 secs] 12303455K->8081859K(13619200K), 0.0963907 secs]  
> [Times: user=0.91 sys=0.01, real=0.10 secs]
> 50787.373: [GC 50787.373: [ParNew: 4639318K->272229K(5017600K),  
> 0.0749353 secs] 12382659K->8053730K(13619200K), 0.0751408 secs]  
> [Times: user=0.75 sys=0.00, real=0.08 secs]
> 50788.483: [GC 50788.483: [ParNew: 4573029K->393397K(5017600K),  
> 0.0837182 secs] 12354530K->8185595K(13619200K), 0.0839321 secs]  
> [Times: user=1.03 sys=0.00, real=0.08 secs]
> 50789.590: [GC 50789.590: [ParNew (promotion failed): 4694264K- 
> >4612345K(5017600K), 1.5974678 secs] 12486461K- 
> >12447305K(13619200K), 1.5976765 secs] [Times : user=2.38 sys=0.20,  
> real=1.60 secs]
> GC locker: Trying a full collection because scavenge failed
> 50791.188: [Full GC 50791.188: [CMS: 7834959K->1227325K(8601600K),  
> 6.7102106 secs] 12447305K->1227325K(13619200K), [CMS Perm : 670478K- 
> >670478K(819200K)], 6.7103417 secs] [Times: user=6.71 sys=0.00,  
> real=6.71 secs]
> 50798.982: [GC 50798.982: [ParNew: 4300800K->217359K(5017600K),  
> 0.0364557 secs] 5528125K->1444685K(13619200K), 0.0366630 secs]  
> [Times: user=0.44 sys=0.00, real=0.04 secs]
> 50800.246: [GC 50800.246: [ParNew: 4518167K->198753K(5017600K),  
> 0.0368620 secs] 5745493K->1426078K(13619200K), 0.0370604 secs]  
> [Times: user=0.46 sys=0.01, real=0.04 secs]
> 9) Probably once I understand what the scavenge is doing will help  
> me understand this case, but logic seems
>  simply enough - fullgc on promotionfailure&scavenge failed.

Yes, full STW compaction.

>
>
>
> promotion and concurrent mode failures
> 53494.424: [GC 53494.424: [ParNew: 4979001K->716800K(5017600K),  
> 0.2120290 secs] 12583633K->8434774K(13619200K), 0.2122200 secs]  
> [Times: user=2.12 sys=0.03, real=0.21 secs]
> 53496.131: [GC 53496.131: [ParNew: 5017600K->605278K(5017600K),  
> 0.2761710 secs] 12735574K->8578720K(13619200K), 0.2763597 secs]  
> [Times: user=1.94 sys=0.08, real=0.28 secs]
>  [Times: user=0.16 sys=0.00, real=0.16 secs]
> 2010-03-25T06:14:58.961-0400: 53496.568: [CMS-concurrent-mark-start]
> 53497.688: [GC 53497.688: [ParNew: 4906078K->545999K(5017600K),  
> 0.0989930 secs] 12879520K->8519441K(13619200K), 0.0991785 secs]  
> [Times: user=1.21 sys=0.02, real=0.10 secs]
> 2010-03-25T06:15:00.188-0400: 53497.795: [CMS-concurrent-mark:  
> 1.107/1.227 secs] [Times: user=15.14 sys=0.42, real=1.23 secs]
> 2010-03-25T06:15:00.188-0400: 53497.795: [CMS-concurrent-preclean- 
> start]
> 2010-03-25T06:15:00.233-0400: 53497.840: [CMS-concurrent-preclean:  
> 0.043/0.045 secs] [Times: user=0.31 sys=0.01, real=0.04 secs]
> 2010-03-25T06:15:00.233-0400: 53497.840: [CMS-concurrent-abortable- 
> preclean-start]
> 2010-03-25T06:15:00.794-0400: 53498.401: [CMS-concurrent-abortable- 
> preclean: 0.541/0.560 secs] [Times: user=6.11 sys=0.22, real=0.56  
> secs]
> [YG occupancy: 3222128 K (5017600 K)]53498.402: [Rescan (parallel) ,  
> 0.4447462 secs]53498.847: [weak refs processing, 0.0028967  
> secs]53498.850: [class unloading, 0.0248904 secs]53498.875: [scrub  
> symbol & string tables, 0.0896937 secs] [Times: user=1.79 sys=0.35,  
> real=0.56 secs]
> 2010-03-25T06:15:01.360-0400: 53498.967: [CMS-concurrent-sweep- 
> start] 53499.350: [GC 53499.350: [ParNew (promotion failed):  
> 4846799K->4718254K(5017600K), 5.3142493 secs]53504.664:  
> [CMS2010-03-25T06:15:11.506-0400: 53509.113:
> [CMS-concurrent-sweep: 4.825/10.146 secs] [Times: user=16.61  
> sys=2.94, real=10.15 secs]
>  (concurrent mode failure): 8087820K->1346631K(8601600K), 11.0573075  
> secs] 12820241K->1346631K(13619200K), [CMS Perm : 670478K- 
> >670478K(819200K)], 16.37177 19 secs] [Times: user=17.62 sys=2.66,  
> real=16.37 secs]
> 53516.713: [GC 53516.714: [ParNew: 4300800K->283359K(5017600K),  
> 0.0498000 secs] 5647431K->1629990K(13619200K), 0.0499965 secs]  
> [Times: user=0.62 sys=0.00, real=0.05 secs]
> 53517.743: [GC 53517.743: [ParNew: 4584343K->340302K(5017600K),  
> 0.0544853 secs] 5930975K->1686933K(13619200K), 0.0546710 secs]  
> [Times: user=0.68 sys=0.00, real=0.05 secs]
> 10)  I think it's just the system is allocating at such at high rate  
> at this point in time ( and we don't use InitiatingOccupancy on this  
> app)
>   so we get close to full on tenured, minor collection came - no  
> room in tenured ---- so even though we don't say "Full GC" in this  
> one,
>   don't you get a Full GC as part of any concurrent-mode-failure?

The promotion failure that happens leads to the concurrent mode failure.
>
>
> promotion failed, scavenge failed, concurrent mode failure
> 86833.016: [GC 86833.017: [ParNew: 4769273K->453398K(5017600K),  
> 0.1316717 secs] 12418197K->8169164K(13619200K), 0.1319220 secs]  
> [Times: user=1.22 sys=0.03, real=0.13 secs]
>  [Times: user=0.14 sys=0.00, real=0.15 secs]
> 2010-03-25T15:30:37.688-0400: 86833.295: [CMS-concurrent-mark-start]
> 86834.751: [GC 86834.751: [ParNew: 4754198K->513298K(5017600K),  
> 0.1250485 secs] 12469964K->8281014K(13619200K), 0.1252553 secs]  
> [Times: user=1.38 sys=0.01, real=0.13 secs]
> 2010-03-25T15:30:39.310-0400: 86834.917: [CMS-concurrent-mark:  
> 1.453/1.621 secs] [Times: user=21.57 sys=1.15, real=1.62 secs]
> 2010-03-25T15:30:39.310-0400: 86834.917: [CMS-concurrent-preclean- 
> start]
> 2010-03-25T15:30:39.650-0400: 86835.258: [CMS-concurrent-preclean:  
> 0.337/0.341 secs] [Times: user=5.30 sys=0.18, real=0.34 secs]
> 2010-03-25T15:30:39.651-0400: 86835.258: [CMS-concurrent-abortable- 
> preclean-start]
> 2010-03-25T15:30:39.864-0400: 86835.471: [CMS-concurrent-abortable- 
> preclean: 0.211/0.214 secs] [Times: user=3.16 sys=0.19, real=0.21  
> secs]
> [YG occupancy: 3329361 K (5017600 K)]86835.500: [Rescan (parallel) ,  
> 0.3868448 secs]86835.887: [weak refs processing, 0.0030042  
> secs]86835.890: [class unloading, 0.0250008 secs]86835.915: [scrub  
> symbol & string tables, 0.0904210 secs] [Times: user=1.85 sys=0.29,  
> real=0.51 secs]
> 2010-03-25T15:30:40.401-0400: 86836.008: [CMS-concurrent-sweep-start]
> 86836.421: [GC 86836.422: [ParNew: 4814154K->680591K(5017600K),  
> 0.2031305 secs] 12581870K->8543701K(13619200K), 0.2033332 secs]  
> [Times: user=1.88 sys=0.04, real=0.20 secs]
> 86836.627: [GC 86836.627: [ParNew (promotion failed): 720747K- 
> >511306K(5017600K), 1.3076955 secs] 8583857K->8560580K(13619200K),  
> 1.3078889 secs] [Times: user=2.66 sys=0.78, real=1.31 secs]
> GC locker: Trying a full collection because scavenge failed
> 86837.935: [Full GC 86837.935: [CMS2010-03-25T15:30:46.850-0400:  
> 86842.457: [CMS-concurrent-sweep: 4.926/6.449 secs] [Times:  
> user=15.24 sys=1.19, real=6.45 secs]
>  (concurrent mode failure): 8049273K->1356962K(8601600K), 9.6514031  
> secs] 8560580K->1356962K(13619200K), [CMS Perm : 670523K- 
> >670523K(819200K)], 9.6515260 secs] [Times: user=9.65 sys=0.00,  
> real=9.65 secs]
> 86848.669: [GC 86848.669: [ParNew: 4301133K->201781K(5017600K),  
> 0.0452702 secs] 5658095K->1558743K(13619200K), 0.0454738 secs]  
> [Times: user=0.57 sys=0.00, real=0.05 secs]
>
> 11) - So here our scavenge failed, - this is what gave us the "Full  
> GC" log message -- the concurrent mode failure
>   was really just a coincidence/timing?  The full gc (triggered by  
> the promotion failure) aborts the tenured CMS collection
> does it not?

The "GC locker" message says that after a JNI critical section was  
exited the GC wanted to
do a scavenge but did not think there was enough room in  the old gen  
so it does a full
STW compaction.  Because a CMS concurrent collection was in progress,  
it is aborted
and that abortion causes the concurrent mode failure to be printed.   
Not a coincidence.
Just telling you that the CMS concurrent collection could not be  
completed for some
reason.

>
>
> thanks,
> Shaun
>
>
>
>
> _______________________________________________
> hotspot-gc-use mailing list
> hotspot-gc-use at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20100330/27351cfc/attachment-0001.html