From bdimitri at cisco.com Tue Mar 1 01:38:34 2011
From: bdimitri at cisco.com (Bogdan Dimitriu)
Date: Tue, 01 Mar 2011 09:38:34 +0000
Subject: Hung JVM consuming 100% CPU
Message-ID: <4D6CBE9A.6040304@cisco.com>

Hi guys,

We're having a problem with garbage collection as described here:
http://forums.oracle.com/forums/message.jspa?messageID=9345173 (I
apologise if posting links is not the right policy, but I prefer not to
duplicate data).

We are going to try an upgrade to JRE 6u24 soon, but reading the release
notes for each of the versions since 6u20, I don't have much hope of
this upgrade fixing the problem.

I have searched a bit on the Java bugs database and I've come across
something that looks similar to the problem I am experiencing:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6692906, but I'm not
convinced this is exactly the same issue. This bug it seems will be
fixed in 6u25 (which I've read will be released late March or early April).

The reason I'm leaning towards thinking this is a JVM bug is the fact
that the JVM can stay in the hung state (as described on the forum) for
2+ hours until we kill the process.

I was hoping to get an idea about this from the source :), so any hints
will be greatly appreciated.

Thanks,
Bogdan

From jon.masamitsu at oracle.com Tue Mar 1 08:29:47 2011
From: jon.masamitsu at oracle.com (Jon Masamitsu)
Date: Tue, 01 Mar 2011 08:29:47 -0800
Subject: Hung JVM consuming 100% CPU
In-Reply-To: <4D6CBE9A.6040304@cisco.com>
References: <4D6CBE9A.6040304@cisco.com>
Message-ID: <4D6D1EFB.70606@oracle.com>

For 6692906 to be the problem there needs to be a CMS concurrent phase
in progress (marking, precleaning or sweeping) and a minor collection
running (with UseParNewGC in use). From the fragment of the gc log
I could not tell for sure (maybe it was in the ... removed) but I
don't think a concurrent phase was in progress, so I would say it is
not 6692906. Did you try -XX:-UseParNewGC as was suggested? Your minor
pauses are not particularly long so maybe you could afford to try it.
6692906 will not happen without UseParNewGC. Note you need to turn off
UseParNewGC explicitly as it is on by default for CMS.

Look back through the log for any other ParNew (promotion failed) and
see what happens in those cases (if you find one). 2+ hours is too
long.

The gentleman who would know best about this code is out of the office
until the end of the week. I'll talk to him about this to see if he
remembers a recent fix that I don't.

On 03/01/11 01:38, Bogdan Dimitriu wrote:
> Hi guys,
>
> We're having a problem with garbage collection as described here:
> http://forums.oracle.com/forums/message.jspa?messageID=9345173 (I
> apologise if posting links is not the right policy, but I prefer not to
> duplicate data).
>
> We are going to try an upgrade to JRE 6u24 soon, but reading the release
> notes for each of the versions since 6u20, I don't have much hope of
> this upgrade fixing the problem.
>
> I have searched a bit on the Java bugs database and I've come across
> something that looks similar to the problem I am experiencing:
> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6692906, but I'm not
> convinced this is exactly the same issue. This bug it seems will be
> fixed in 6u25 (which I've read will be released late March or early April).
>
> The reason I'm leaning towards thinking this is a JVM bug is the fact
> that the JVM can stay in the hung state (as described on the forum) for
> 2+ hours until we kill the process.
> > I was hoping to get an idea about this from the source :), so any hints > will be greatly appreciated. > > Thanks, > Bogdan > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use From justin at techadvise.com Tue Mar 1 09:40:40 2011 From: justin at techadvise.com (Justin Ellison) Date: Tue, 1 Mar 2011 11:40:40 -0600 Subject: Hung JVM consuming 100% CPU In-Reply-To: <4D6CBE9A.6040304@cisco.com> References: <4D6CBE9A.6040304@cisco.com> Message-ID: Bogdan, Make sure you're not actually seeing this bug: http://www.exploringbinary.com/fpupdater-fixes-the-java-2-2250738585072012e-308-bug/ It sounds strikingly similar. Justin On Tue, Mar 1, 2011 at 3:38 AM, Bogdan Dimitriu wrote: > Hi guys, > > We're having a problem with garbage collection as described here: > http://forums.oracle.com/forums/message.jspa?messageID=9345173 (I > apologise if posting links is not the right policy, but I prefer not to > duplicate data). > > We are going to try an upgrade to JRE 6u24 soon, but reading the release > notes for each of the versions since 6u20, I don't have much hope of > this upgrade fixing the problem. > > I have searched a bit on the Java bugs database and I've come across > something that looks similar to the problem I am experiencing: > http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6692906, but I'm not > convinced this is exactly the same issue. This bug it seems will be > fixed in 6u25 (which I've read will be released late March or early April). > > The reason I'm leaning towards thinking this is a JVM bug is the fact > that the JVM can stay in the hung state (as described on the forum) for > 2+ hours until we kill the process. > > I was hoping to get an idea about this from the source :), so any hints > will be greatly appreciated. > > Thanks, > Bogdan > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > From jon.masamitsu at oracle.com Tue Mar 1 13:12:40 2011 From: jon.masamitsu at oracle.com (Jon Masamitsu) Date: Tue, 01 Mar 2011 13:12:40 -0800 Subject: Hung JVM consuming 100% CPU In-Reply-To: <3BA76B61-E7DA-49E1-AEFE-819214B3C729@kodewerk.com> References: <4D6CBE9A.6040304@cisco.com> <4D6D1EFB.70606@oracle.com> <3BA76B61-E7DA-49E1-AEFE-819214B3C729@kodewerk.com> Message-ID: <4D6D6148.7090906@oracle.com> On 03/01/11 09:26, Charles K Pepperdine wrote: > Tenured is 524288K of which 478087K is occupied, greater than 90% I've > just been recently poking about in the source trying to sort out how > logs are being printed. I've seen that partial fragments of the log > are printed and that the " [CMS" fragment is printed only after a CMS > has been triggered. So, the problem is bounded by the log messages > (sans a stray pointer bug). > > My question is, shouldn't a 90% occupancy of tenured trigger a full GC? The CMS concurrent collections are fast from what I've seen (on the order of the time between ParNew collections). The rate at which objects are getting promoted is also low (maybe 3m per ParNew collection). CMS thinks it can wait to start a concurrent collection. The fact that a "promotion failure" happened makes it look like fragmentation. 
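One way to check that theory directly, rather than inferring it from the promotion failure, is to have CMS print its free list statistics, e.g. by adding -XX:+PrintGCDetails -XX:PrintFLSStatistics=1 to the command line. If the old gen shows plenty of total free space but the maximum free chunk size keeps shrinking between collections, the promotion failure is fragmentation rather than a genuine lack of space. (The exact shape of the PrintFLSStatistics output varies between releases, so treat the numbers as indicative.)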
>
> Regards,
> Kirk
>
>
> On Mar 1, 2011, at 5:29 PM, Jon Masamitsu wrote:
>
>> For 6692906 to be the problem there needs to be a
>> CMS concurrent phase in progress (marking, precleaning or
>> sweeping) and a minor collection running (with
>> UseParNewGC in use). From the fragment of the gc log
>> I could not tell for sure (maybe it was in the ... removed)
>> but I don't think a concurrent phase was in progress,
>> so I would say it is not 6692906. Did you try
>> -XX:-UseParNewGC as was suggested? Your minor
>> pauses are not particularly long so maybe you
>> could afford to try it. 6692906 will not happen
>> without UseParNewGC. Note you need to turn off
>> UseParNewGC explicitly as it is on by default for CMS.
>>
>> Look back through the log for any other
>> ParNew (promotion failed) and see what happens
>> in those cases (if you find one). 2+ hours is too
>> long.
>>
>> The gentleman who would know best about this code
>> is out of the office until the end of the week. I'll talk
>> to him about this to see if he remembers a recent
>> fix that I don't.
>>
>> On 03/01/11 01:38, Bogdan Dimitriu wrote:
>>> Hi guys,
>>>
>>> We're having a problem with garbage collection as described here:
>>> http://forums.oracle.com/forums/message.jspa?messageID=9345173 (I
>>> apologise if posting links is not the right policy, but I prefer not to
>>> duplicate data).
>>>
>>> We are going to try an upgrade to JRE 6u24 soon, but reading the release
>>> notes for each of the versions since 6u20, I don't have much hope of
>>> this upgrade fixing the problem.
>>>
>>> I have searched a bit on the Java bugs database and I've come across
>>> something that looks similar to the problem I am experiencing:
>>> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6692906, but I'm not
>>> convinced this is exactly the same issue. This bug it seems will be
>>> fixed in 6u25 (which I've read will be released late March or early
>>> April).
>>>
>>> The reason I'm leaning towards thinking this is a JVM bug is the fact
>>> that the JVM can stay in the hung state (as described on the forum) for
>>> 2+ hours until we kill the process.
>>>
>>> I was hoping to get an idea about this from the source :), so any hints
>>> will be greatly appreciated.
>>>
>>> Thanks,
>>> Bogdan
>>> _______________________________________________
>>> hotspot-gc-use mailing list
>>> hotspot-gc-use at openjdk.java.net
>>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
>> _______________________________________________
>> hotspot-gc-use mailing list
>> hotspot-gc-use at openjdk.java.net
>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use

From jon.masamitsu at oracle.com Tue Mar 1 14:44:39 2011
From: jon.masamitsu at oracle.com (Jon Masamitsu)
Date: Tue, 01 Mar 2011 14:44:39 -0800
Subject: Hung JVM consuming 100% CPU
In-Reply-To: <919F96FA-E09E-47BA-819C-0DFF2D9CF5F9@kodewerk.com>
References: <4D6CBE9A.6040304@cisco.com> <4D6D1EFB.70606@oracle.com> <3BA76B61-E7DA-49E1-AEFE-819214B3C729@kodewerk.com> <4D6D6148.7090906@oracle.com> <919F96FA-E09E-47BA-819C-0DFF2D9CF5F9@kodewerk.com>
Message-ID: <4D6D76D7.9070505@oracle.com>

On 03/01/11 14:00, Charles K Pepperdine wrote:
>
> ...
>> The CMS concurrent collections are fast from what I've seen (on the
>> order of the time between ParNew collections).
>> The rate at which objects are getting
>> promoted is also low (maybe 3m per ParNew collection). CMS thinks it
>> can wait to start a concurrent collection. The fact that a "promotion failure"
>> happened makes it look like fragmentation.
>
> Even with occupancy greater than 90%? I could very well be wrong about
> this but I thought that promotion failure meant just that, the young
> gen collector wasn't able to stuff objects up into tenured space. That
> didn't preclude fragmentation but in this case occupancy is 90%, which
> IIRC is well above the "OMG, I can't keep up, let's trigger a parallel
> collection" threshold.
>
> I don't see CMSInitiatingOccupancyFraction on the command line.

-Xmx896m -Xss128k -XX:NewSize=384M -XX:MaxPermSize=96m
-XX:+UseParNewGC -XX:+UseConcMarkSweepGC

so CMS isn't using occupancy of the tenured gen to start a cycle. That is the
default behavior.

From justin at techadvise.com Wed Mar 2 09:22:07 2011
From: justin at techadvise.com (Justin Ellison)
Date: Wed, 2 Mar 2011 11:22:07 -0600
Subject: Hung JVM consuming 100% CPU
In-Reply-To: <4D6E0A06.2040301@cisco.com>
References: <4D6CBE9A.6040304@cisco.com> <4D6E0A06.2040301@cisco.com>
Message-ID:

On Wed, Mar 2, 2011 at 3:12 AM, Bogdan Dimitriu wrote:
> Hi Justin,
> I'm quite sure it's not the good old FP bug, because it's the VM Thread that
> seems to be hung (have a look at the native thread dump mentioned on the
> forum thread, we found out that was the spinning thread using 'top') and not
> one of the application's threads.
>
> Bogdan

Yeah, sorry for the noise. It was one of those emails you wish you could
take back 30 seconds after sending it. I didn't read enough of your issue
before I sent that.

My apologies, carry on :)

Justin

From jon.masamitsu at oracle.com Wed Mar 2 10:45:22 2011
From: jon.masamitsu at oracle.com (Jon Masamitsu)
Date: Wed, 02 Mar 2011 10:45:22 -0800
Subject: Hung JVM consuming 100% CPU
In-Reply-To:
References: <4D6CBE9A.6040304@cisco.com> <4D6D1EFB.70606@oracle.com> <3BA76B61-E7DA-49E1-AEFE-819214B3C729@kodewerk.com> <4D6D6148.7090906@oracle.com> <919F96FA-E09E-47BA-819C-0DFF2D9CF5F9@kodewerk.com> <4D6D76D7.9070505@oracle.com>
Message-ID: <4D6E9042.4000201@oracle.com>

Kirk, you're right! The MinFreeRatio and MaxFreeRatio do result in an
initiating occupancy as you describe. Does seem like a concurrent
collection should have started.

On 3/1/2011 9:06 PM, Charles K Pepperdine wrote:
> Hi Jon,
>
>> I don't see CMSInitiatingOccupancyFraction on the command line.
>>
>> -Xmx896m -Xss128k -XX:NewSize=384M -XX:MaxPermSize=96m
>> -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
>>
>> so CMS isn't using occupancy of the tenured gen to start a cycle.
>> That is the
>> default behavior.
>
> Ok, I'm trying to understand what you are saying. It was my
> understanding that CMS is initiated when tenured reaches a value of
> 68% = current occupancy + the anticipated promotions. So are you
> saying that setting the initiating occupancy fraction doesn't override
> that value, it completely changes how CMS is triggered? Is the
> flag UseCMSInitiatingOccupancyOnly only valid when you
> use CMSInitiatingOccupancyFraction or does that flag also use
> anticipated promotions as part of its calculation?
>
> TIA,
> Kirk
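For concreteness, here is roughly how the default trigger falls out of the free-ratio flags, going by ConcurrentMarkSweepGeneration::init_initiating_occupancy() in the 6uXX sources (worth double-checking there):

  initiating occupancy = (100 - MinHeapFreeRatio) + CMSTriggerRatio * MinHeapFreeRatio / 100
                       = (100 - 40) + 80 * 40 / 100
                       = 92%

with the default MinHeapFreeRatio=40 and CMSTriggerRatio=80. To take the guesswork out of when CMS kicks in, the trigger can also be pinned explicitly, e.g.

  -XX:CMSInitiatingOccupancyFraction=65 -XX:+UseCMSInitiatingOccupancyOnly

where 65 is only an illustrative value; UseCMSInitiatingOccupancyOnly tells CMS to initiate cycles based on that occupancy alone and to ignore its runtime estimates.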
From bdimitri at cisco.com Wed Mar 2 01:12:38 2011
From: bdimitri at cisco.com (Bogdan Dimitriu)
Date: Wed, 02 Mar 2011 09:12:38 +0000
Subject: Hung JVM consuming 100% CPU
In-Reply-To:
References: <4D6CBE9A.6040304@cisco.com>
Message-ID: <4D6E0A06.2040301@cisco.com>

Hi Justin,

I'm quite sure it's not the good old FP bug, because it's the VM Thread
that seems to be hung (have a look at the native thread dump mentioned
on the forum thread, we found out that was the spinning thread using
'top') and not one of the application's threads.

Bogdan

On 01/03/11 17:40, Justin Ellison wrote:
> Bogdan,
>
> Make sure you're not actually seeing this bug:
> http://www.exploringbinary.com/fpupdater-fixes-the-java-2-2250738585072012e-308-bug/
>
> It sounds strikingly similar.
>
> Justin
>
> On Tue, Mar 1, 2011 at 3:38 AM, Bogdan Dimitriu wrote:
>> Hi guys,
>>
>> We're having a problem with garbage collection as described here:
>> http://forums.oracle.com/forums/message.jspa?messageID=9345173 (I
>> apologise if posting links is not the right policy, but I prefer not to
>> duplicate data).
>>
>> We are going to try an upgrade to JRE 6u24 soon, but reading the release
>> notes for each of the versions since 6u20, I don't have much hope of
>> this upgrade fixing the problem.
>>
>> I have searched a bit on the Java bugs database and I've come across
>> something that looks similar to the problem I am experiencing:
>> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6692906, but I'm not
>> convinced this is exactly the same issue. This bug it seems will be
>> fixed in 6u25 (which I've read will be released late March or early April).
>>
>> The reason I'm leaning towards thinking this is a JVM bug is the fact
>> that the JVM can stay in the hung state (as described on the forum) for
>> 2+ hours until we kill the process.
>>
>> I was hoping to get an idea about this from the source :), so any hints
>> will be greatly appreciated.
>>
>> Thanks,
>> Bogdan
>> _______________________________________________
>> hotspot-gc-use mailing list
>> hotspot-gc-use at openjdk.java.net
>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use

From jon.masamitsu at oracle.com Wed Mar 2 11:56:19 2011
From: jon.masamitsu at oracle.com (Jon Masamitsu)
Date: Wed, 02 Mar 2011 11:56:19 -0800
Subject: Hung JVM consuming 100% CPU
In-Reply-To:
References: <4D6CBE9A.6040304@cisco.com> <4D6D1EFB.70606@oracle.com> <3BA76B61-E7DA-49E1-AEFE-819214B3C729@kodewerk.com> <4D6D6148.7090906@oracle.com> <919F96FA-E09E-47BA-819C-0DFF2D9CF5F9@kodewerk.com> <4D6D76D7.9070505@oracle.com> <4D6E9042.4000201@oracle.com>
Message-ID: <4D6EA0E3.8080408@oracle.com>

Kirk,

Just checked: the default value of MinHeapFreeRatio results in an
initiating occupancy of 92%, so that answers that (see
ConcurrentMarkSweepGeneration::init_initiating_occupancy()).

Hopefully not to get too boring, but I think the problem has to do with
a partially initialized object. I've seen other parts of the collector
run into such objects but this may be a new one. Have to consult about
this.

Jon

PS. Don't think it is the mix of objects. More likely a timing issue
plus some circumstance we have not run into before.

On 3/2/2011 11:31 AM, Charles K Pepperdine wrote:
> Hi Jon,
>
> That's a relief, I thought I was going to have to rework a section of
> slides!
>
> So, it seems like an unexpected glut of objects were promoted and the
> cms-abortable-preclean wasn't able to get rid of anything grey. Could
> it be that fragmentation was bad enough and free space was low enough
> to make compaction difficult if not impossible? Certainly enough free
> space to avoid OOME. How would one simulate that? Maps of Maps of Maps
> of Maps ad nauseam with holes punched into them containing randomly
> sized data?
>
> TIA,
> Kirk
>
> PS: need to figure out why I'm half subscribed...
>
> On Mar 2, 2011, at 7:45 PM, Jon Masamitsu wrote:
>
>> Kirk, you're right! The MinFreeRatio and MaxFreeRatio do result in an
>> initiating occupancy as you describe. Does seem like a concurrent
>> collection should have started.
>>
>> On 3/1/2011 9:06 PM, Charles K Pepperdine wrote:
>>> Hi Jon,
>>>
>>>> I don't see CMSInitiatingOccupancyFraction on the command line.
>>>>
>>>> -Xmx896m -Xss128k -XX:NewSize=384M -XX:MaxPermSize=96m
>>>> -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
>>>>
>>>> so CMS isn't using occupancy of the tenured gen to start a cycle.
>>>> That is the
>>>> default behavior.
>>>
>>> Ok, I'm trying to understand what you are saying. It was my
>>> understanding that CMS is initiated when tenured reaches a value of
>>> 68% = current occupancy + the anticipated promotions. So are you
>>> saying that setting the initiating occupancy fraction doesn't
>>> override that value, it completely changes how CMS is triggered? Is
>>> the flag UseCMSInitiatingOccupancyOnly only valid when you
>>> use CMSInitiatingOccupancyFraction or does that flag also use
>>> anticipated promotions as part of its calculation?
>>>
>>> TIA,
>>> Kirk

From bdimitri at cisco.com Thu Mar 3 01:18:58 2011
From: bdimitri at cisco.com (Bogdan Dimitriu)
Date: Thu, 03 Mar 2011 09:18:58 +0000
Subject: Hung JVM consuming 100% CPU
In-Reply-To: <4D6E1397.9080601@cisco.com>
References: <4D6CBE9A.6040304@cisco.com> <4D6D1EFB.70606@oracle.com> <4D6E1397.9080601@cisco.com>
Message-ID: <4D6F5D02.3080507@cisco.com>

Hi Jon,

I haven't tried the "-XX:-UseParNewGC" option as I wasn't very sure of
its effects. On the web it seems to be pretty much only mentioned in
the context of JDK 1.4.2, but now that you mention it, I guess it's
probably because it is on by default in JVM 5 & 6.

I have looked back through the logs for messages about ParNew promotion
failures. The message only appears once, when the JVM gets into the
hung state. The problem is though that I only have the GC logs for the
last 10 hours, as somehow the first part of the log is consistently
corrupted on all the servers. I had to use the Linux "strings" command
to get rid of the corrupted first part, so unfortunately I have the
logs only for the last 10 hours of the entire run (in the example that
I'll paste below the application had been running for about 48 hours).
I suspect this is related to the fact that the JVM overwrites the
existing file, so in the next release of the application I will modify
the startup scripts to create a new file each time it starts up.

Actually I'm not sure whether the concurrent phase is in progress - it
might be, as CMS is mentioned in the last line of the log (which is
incomplete, suggesting this is where it got stuck).
Here is the last part of the log, including a Full GC run: --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 172810.835: [GC 172810.836: [ParNew: 335013K->17207K(353920K), 0.0256640 secs] 812472K->494666K(878208K), 0.0262000 secs] [Times: user=0.10 sys=0.00, real=0.03 secs] 172818.346: [GC 172818.346: [ParNew: 331831K->25313K(353920K), 0.0372040 secs] 809290K->502772K(878208K), 0.0377570 secs] [Times: user=0.12 sys=0.00, real=0.04 secs] 172827.274: [GC 172827.274: [ParNew: 339937K->26650K(353920K), 0.0547470 secs] 817396K->506506K(878208K), 0.0552930 secs] [Times: user=0.27 sys=0.00, real=0.06 secs] 172835.992: [GC 172835.993: [ParNew: 341274K->24233K(353920K), 0.0399470 secs] 821130K->507267K(878208K), 0.0404820 secs] [Times: user=0.08 sys=0.00, real=0.04 secs] 172836.036: [GC [1 CMS-initial-mark: 483033K(524288K)] 507697K(878208K), 0.0157910 secs] [Times: user=0.01 sys=0.00, real=0.02 secs] 172836.052: [CMS-concurrent-mark-start] 172837.283: [CMS-concurrent-mark: 1.116/1.230 secs] [Times: user=2.56 sys=0.10, real=1.23 secs] 172837.283: [CMS-concurrent-preclean-start] 172837.289: [CMS-concurrent-preclean: 0.003/0.006 secs] [Times: user=0.00 sys=0.01, real=0.00 secs] 172837.289: [CMS-concurrent-abortable-preclean-start] CMS: abort preclean due to time 172842.290: [CMS-concurrent-abortable-preclean: 0.928/5.001 secs] [Times: user=2.44 sys=0.40, real=5.00 secs] 172842.293: [GC[YG occupancy: 272804 K (353920 K)]172842.293: [Rescan (parallel) , 0.1720890 secs]172842.465: [weak refs processing, 0.0513100 secs] [1 CMS-remark: 483033K(524288K)] 755838K(878208K), 0.2239370 secs] [Times: user=1.12 sys=0.01, real=0.22 secs] 172842.517: [CMS-concurrent-sweep-start] 172843.138: [CMS-concurrent-sweep: 0.581/0.620 secs] [Times: user=0.76 sys=0.03, real=0.62 secs] 172843.138: [CMS-concurrent-reset-start] 172843.145: [CMS-concurrent-reset: 0.007/0.007 secs] [Times: user=0.01 sys=0.00, real=0.01 secs] 172844.526: [GC 172844.527: [ParNew: 338857K->19496K(353920K), 0.0274480 secs] 668134K->348772K(878208K), 0.0279560 secs] [Times: user=0.08 sys=0.00, real=0.03 secs] 172856.952: [GC 172856.952: [ParNew: 334120K->24285K(353920K), 0.0336270 secs] 663396K->354183K(878208K), 0.0341980 secs] [Times: user=0.12 sys=0.00, real=0.04 secs] 172863.470: [GC 172863.470: [ParNew: 338909K->26540K(353920K), 0.0318580 secs] 668807K->357471K(878208K), 0.0323820 secs] [Times: user=0.13 sys=0.00, real=0.03 secs] 172869.969: [GC 172869.969: [ParNew: 341164K->21134K(353920K), 0.0284350 secs] 672095K->352661K(878208K), 0.0289620 secs] [Times: user=0.08 sys=0.00, real=0.03 secs] 172875.900: [GC 172875.900: [ParNew: 335758K->26017K(353920K), 0.0351950 secs] 667285K->358464K(878208K), 0.0357110 secs] [Times: user=0.11 sys=0.00, real=0.03 secs] 172886.284: [GC 172886.284: [ParNew: 340641K->30161K(353920K), 0.0392460 secs] 673088K->363526K(878208K), 0.0398300 secs] [Times: user=0.14 sys=0.01, real=0.04 secs] 172894.582: [GC 172894.583: [ParNew: 344785K->33738K(353920K), 0.0528800 secs] 678150K->371746K(878208K), 0.0534410 secs] [Times: user=0.17 sys=0.00, real=0.05 secs] 172902.573: [GC 172902.574: [ParNew: 348362K->20566K(353920K), 0.0350850 secs] 686370K->360603K(878208K), 0.0356060 secs] [Times: user=0.10 sys=0.00, real=0.03 secs] 172911.092: [GC 172911.093: [ParNew: 335190K->25539K(353920K), 0.0291210 secs] 675227K->365577K(878208K), 0.0296190 secs] [Times: user=0.10 sys=0.00, 
real=0.03 secs] 172920.086: [GC 172920.086: [ParNew: 340163K->27850K(353920K), 0.0364810 secs] 680201K->369749K(878208K), 0.0369720 secs] [Times: user=0.13 sys=0.00, real=0.04 secs] 172926.834: [GC 172926.835: [ParNew: 342474K->19027K(353920K), 0.0212320 secs] 684373K->362527K(878208K), 0.0217080 secs] [Times: user=0.07 sys=0.01, real=0.02 secs] 172932.864: [GC 172932.864: [ParNew: 333651K->17634K(353920K), 0.0290070 secs] 677151K->361134K(878208K), 0.0295560 secs] [Times: user=0.11 sys=0.00, real=0.03 secs] 172941.276: [GC 172941.276: [ParNew: 332258K->23914K(353920K), 0.0309830 secs] 675758K->368230K(878208K), 0.0314980 secs] [Times: user=0.12 sys=0.00, real=0.03 secs] 172950.214: [GC 172950.214: [ParNew: 338538K->29813K(353920K), 0.0443870 secs] 682854K->376094K(878208K), 0.0448970 secs] [Times: user=0.16 sys=0.01, real=0.04 secs] 172960.301: [GC 172960.301: [ParNew: 344437K->22418K(353920K), 0.0328660 secs] 690718K->372184K(878208K), 0.0333630 secs] [Times: user=0.10 sys=0.00, real=0.04 secs] 172970.265: [GC 172970.265: [ParNew: 337042K->21191K(353920K), 0.0282100 secs] 686808K->370957K(878208K), 0.0286870 secs] [Times: user=0.09 sys=0.00, real=0.03 secs] 172979.430: [GC 172979.430: [ParNew: 335815K->25613K(353920K), 0.0337320 secs] 685581K->376256K(878208K), 0.0342430 secs] [Times: user=0.12 sys=0.01, real=0.03 secs] 172985.633: [GC 172985.633: [ParNew: 340237K->33418K(353920K), 0.0697150 secs] 690880K->386230K(878208K), 0.0702150 secs] [Times: user=0.30 sys=0.16, real=0.07 secs] 172991.103: [GC 172991.104: [ParNew: 348042K->13141K(353920K), 0.0245680 secs] 700854K->368781K(878208K), 0.0250600 secs] [Times: user=0.11 sys=0.00, real=0.02 secs] 172997.990: [GC 172997.990: [ParNew: 327765K->20332K(353920K), 0.0275460 secs] 683405K->375972K(878208K), 0.0280460 secs] [Times: user=0.11 sys=0.00, real=0.03 secs] 173006.103: [GC 173006.103: [ParNew: 334955K->25609K(353920K), 0.0350700 secs] 690595K->381249K(878208K), 0.0355930 secs] [Times: user=0.12 sys=0.01, real=0.04 secs] 173014.988: [GC 173014.988: [ParNew: 340233K->28945K(353920K), 0.0394950 secs] 695873K->388856K(878208K), 0.0399710 secs] [Times: user=0.18 sys=0.01, real=0.04 secs] 173024.762: [GC 173024.763: [ParNew: 343569K->20790K(353920K), 0.0285920 secs] 703480K->382552K(878208K), 0.0291690 secs] [Times: user=0.10 sys=0.00, real=0.03 secs] 173032.275: [GC 173032.275: [ParNew: 335414K->22651K(353920K), 0.0314880 secs] 697176K->384412K(878208K), 0.0319920 secs] [Times: user=0.09 sys=0.01, real=0.03 secs] 173040.422: [GC 173040.422: [ParNew: 337275K->28884K(353920K), 0.0364200 secs] 699036K->391736K(878208K), 0.0369290 secs] [Times: user=0.12 sys=0.00, real=0.03 secs] 173047.318: [GC 173047.318: [ParNew: 338931K->29454K(353920K), 0.0452430 secs] 701782K->394091K(878208K), 0.0457590 secs] [Times: user=0.18 sys=0.09, real=0.05 secs] 173055.117: [GC 173055.117: [ParNew: 344078K->16092K(353920K), 0.0269200 secs] 708715K->382187K(878208K), 0.0274180 secs] [Times: user=0.09 sys=0.01, real=0.03 secs] 173063.941: [GC 173063.941: [ParNew: 330716K->22134K(353920K), 0.0276290 secs] 696811K->388229K(878208K), 0.0281150 secs] [Times: user=0.11 sys=0.00, real=0.03 secs] 173072.327: [GC 173072.328: [ParNew: 336758K->28665K(353920K), 0.0321540 secs] 702853K->394760K(878208K), 0.0327300 secs] [Times: user=0.11 sys=0.00, real=0.03 secs] 173081.576: [GC 173081.577: [ParNew: 343289K->21232K(353920K), 0.0226000 secs] 709384K->388820K(878208K), 0.0231110 secs] [Times: user=0.08 sys=0.00, real=0.03 secs] 173091.248: [GC 173091.248: [ParNew: 
335856K->20493K(353920K), 0.0307700 secs] 703444K->388081K(878208K), 0.0312920 secs] [Times: user=0.10 sys=0.00, real=0.04 secs] 173100.070: [GC 173100.070: [ParNew: 335117K->23963K(353920K), 0.0368090 secs] 702705K->392639K(878208K), 0.0373010 secs] [Times: user=0.12 sys=0.00, real=0.04 secs] 173108.493: [GC 173108.493: [ParNew: 337029K->24101K(353920K), 0.0345020 secs] 705704K->393929K(878208K), 0.0350420 secs] [Times: user=0.10 sys=0.03, real=0.04 secs] 173111.886: [GC 173111.886: [ParNew: 338725K->15160K(353920K), 0.0256680 secs] 708553K->384989K(878208K), 0.0261560 secs] [Times: user=0.09 sys=0.00, real=0.02 secs] 173118.311: [GC 173118.311: [ParNew: 329783K->25394K(353920K), 0.0327300 secs] 699611K->395883K(878208K), 0.0332530 secs] [Times: user=0.13 sys=0.00, real=0.03 secs] 173125.560: [GC 173125.560: [ParNew: 340018K->28397K(353920K), 0.0391560 secs] 710507K->401633K(878208K), 0.0396540 secs] [Times: user=0.15 sys=0.00, real=0.04 secs] 173133.837: [GC 173133.837: [ParNew: 343021K->29626K(353920K), 0.0398970 secs] 716257K->406532K(878208K), 0.0403770 secs] [Times: user=0.17 sys=0.01, real=0.04 secs] 173143.335: [GC 173143.335: [ParNew: 344248K->22141K(353920K), 0.0265350 secs] 721154K->400591K(878208K), 0.0270290 secs] [Times: user=0.09 sys=0.00, real=0.03 secs] 173151.453: [GC 173151.453: [ParNew: 336765K->20814K(353920K), 0.0258770 secs] 715215K->399264K(878208K), 0.0263840 secs] [Times: user=0.10 sys=0.00, real=0.02 secs] 173159.168: [GC 173159.168: [ParNew: 335438K->25731K(353920K), 0.0357620 secs] 713888K->405274K(878208K), 0.0362750 secs] [Times: user=0.12 sys=0.00, real=0.04 secs] 173168.286: [GC 173168.286: [ParNew: 340354K->20134K(353920K), 0.0225600 secs] 719897K->400643K(878208K), 0.0230880 secs] [Times: user=0.06 sys=0.01, real=0.02 secs] 173172.296: [GC 173172.296: [ParNew: 334758K->15053K(353920K), 0.0196610 secs] 715267K->395562K(878208K), 0.0201770 secs] [Times: user=0.09 sys=0.00, real=0.03 secs] 173179.419: [GC 173179.419: [ParNew: 329677K->22239K(353920K), 0.0269060 secs] 710186K->403528K(878208K), 0.0274240 secs] [Times: user=0.12 sys=0.00, real=0.02 secs] 173189.624: [GC 173189.624: [ParNew: 336863K->29527K(353920K), 0.0354300 secs] 718152K->411721K(878208K), 0.0359290 secs] [Times: user=0.13 sys=0.00, real=0.04 secs] 173199.332: [GC 173199.332: [ParNew: 344151K->22652K(353920K), 0.0313600 secs] 726345K->407117K(878208K), 0.0318730 secs] [Times: user=0.09 sys=0.00, real=0.03 secs] 173206.234: [GC 173206.235: [ParNew: 337276K->21251K(353920K), 0.0276060 secs] 721741K->405716K(878208K), 0.0281380 secs] [Times: user=0.11 sys=0.00, real=0.03 secs] 173213.212: [GC 173213.213: [ParNew: 335875K->23874K(353920K), 0.0343490 secs] 720340K->410581K(878208K), 0.0348690 secs] [Times: user=0.13 sys=0.00, real=0.04 secs] 173221.142: [GC 173221.142: [ParNew: 338498K->30114K(353920K), 0.0336290 secs] 725205K->417820K(878208K), 0.0341060 secs] [Times: user=0.13 sys=0.01, real=0.04 secs] 173229.352: [GC 173229.352: [ParNew: 344734K->22047K(353920K), 0.0212680 secs] 732440K->410996K(878208K), 0.0217690 secs] [Times: user=0.08 sys=0.00, real=0.02 secs] 173233.123: [GC 173233.123: [ParNew: 336671K->28934K(353920K), 0.0662500 secs] 725620K->417883K(878208K), 0.0667880 secs] [Times: user=0.25 sys=0.17, real=0.07 secs] 173238.572: [GC 173238.572: [ParNew: 343558K->15084K(353920K), 0.0414360 secs] 732507K->411434K(878208K), 0.0419820 secs] [Times: user=0.13 sys=0.00, real=0.04 secs] 173244.972: [GC 173244.972: [ParNew: 329702K->21166K(353920K), 0.0324990 secs] 726052K->417517K(878208K), 
0.0330620 secs] [Times: user=0.10 sys=0.00, real=0.03 secs] 173251.914: [GC 173251.914: [ParNew: 335790K->25859K(353920K), 0.0349920 secs] 732141K->422209K(878208K), 0.0356000 secs] [Times: user=0.12 sys=0.00, real=0.03 secs] 173258.950: [GC 173258.951: [ParNew: 340483K->19646K(353920K), 0.0291690 secs] 736833K->417832K(878208K), 0.0296990 secs] [Times: user=0.09 sys=0.00, real=0.03 secs] 173265.537: [GC 173265.538: [ParNew: 334270K->19549K(353920K), 0.0252880 secs] 732456K->417735K(878208K), 0.0258010 secs] [Times: user=0.10 sys=0.00, real=0.02 secs] 173272.752: [GC 173272.752: [ParNew: 334173K->23934K(353920K), 0.0389720 secs] 732359K->423181K(878208K), 0.0395380 secs] [Times: user=0.11 sys=0.00, real=0.04 secs] 173277.439: [GC 173277.439: [ParNew: 338558K->25186K(353920K), 0.0416090 secs] 737805K->427348K(878208K), 0.0421310 secs] [Times: user=0.17 sys=0.01, real=0.04 secs] 173279.881: [GC 173279.881: [ParNew: 339810K->23951K(353920K), 0.0278470 secs] 741972K->426113K(878208K), 0.0283570 secs] [Times: user=0.10 sys=0.00, real=0.03 secs] 173282.774: [GC 173282.774: [ParNew: 338575K->20398K(353920K), 0.0363580 secs] 740737K->426564K(878208K), 0.0368540 secs] [Times: user=0.12 sys=0.00, real=0.04 secs] 173285.342: [GC 173285.342: [ParNew: 335022K->16019K(353920K), 0.0337310 secs] 741188K->426121K(878208K), 0.0342230 secs] [Times: user=0.13 sys=0.00, real=0.03 secs] 173287.877: [GC 173287.878: [ParNew: 330643K->12775K(353920K), 0.0176100 secs] 740745K->423452K(878208K), 0.0181520 secs] [Times: user=0.05 sys=0.00, real=0.02 secs] 173290.154: [GC 173290.155: [ParNew: 327399K->11274K(353920K), 0.0203460 secs] 738076K->422256K(878208K), 0.0208680 secs] [Times: user=0.05 sys=0.00, real=0.02 secs] 173292.536: [GC 173292.536: [ParNew: 325898K->12115K(353920K), 0.0220890 secs] 736880K->423752K(878208K), 0.0225970 secs] [Times: user=0.06 sys=0.00, real=0.03 secs] 173293.987: [GC 173293.987: [ParNew: 326739K->14877K(353920K), 0.0247410 secs] 738376K->427058K(878208K), 0.0252260 secs] [Times: user=0.09 sys=0.00, real=0.03 secs] 173296.868: [GC 173296.869: [ParNew: 329501K->17131K(353920K), 0.0286210 secs] 741682K->430685K(878208K), 0.0291230 secs] [Times: user=0.10 sys=0.00, real=0.03 secs] 173299.613: [GC 173299.613: [ParNew: 331755K->15907K(353920K), 0.0280770 secs] 745309K->431159K(878208K), 0.0286330 secs] [Times: user=0.10 sys=0.00, real=0.03 secs] 173302.694: [GC 173302.695: [ParNew: 330531K->17876K(353920K), 0.0331540 secs] 745783K->434465K(878208K), 0.0337160 secs] [Times: user=0.10 sys=0.00, real=0.03 secs] 173305.183: [GC 173305.183: [ParNew: 332500K->17103K(353920K), 0.0294830 secs] 749089K->436924K(878208K), 0.0300120 secs] [Times: user=0.09 sys=0.00, real=0.03 secs] 173307.693: [GC 173307.693: [ParNew: 331727K->12777K(353920K), 0.0277740 secs] 751548K->435194K(878208K), 0.0282400 secs] [Times: user=0.09 sys=0.00, real=0.02 secs] 173310.303: [GC 173310.303: [ParNew: 327401K->11467K(353920K), 0.0236230 secs] 749818K->435600K(878208K), 0.0241380 secs] [Times: user=0.08 sys=0.01, real=0.02 secs] 173313.261: [GC 173313.262: [ParNew: 326091K->9428K(353920K), 0.0213220 secs] 750224K->435412K(878208K), 0.0218490 secs] [Times: user=0.08 sys=0.00, real=0.03 secs] 173316.133: [GC 173316.134: [ParNew: 324052K->7805K(353920K), 0.0129430 secs] 750036K->433987K(878208K), 0.0134990 secs] [Times: user=0.05 sys=0.00, real=0.02 secs] 173318.989: [GC 173318.990: [ParNew: 322429K->7789K(353920K), 0.0186160 secs] 748611K->434251K(878208K), 0.0191230 secs] [Times: user=0.05 sys=0.00, real=0.02 secs] 173321.568: 
[GC 173321.569: [ParNew: 322413K->8907K(353920K), 0.0162900 secs] 748875K->435745K(878208K), 0.0167790 secs] [Times: user=0.06 sys=0.00, real=0.02 secs] 173324.187: [GC 173324.187: [ParNew: 323531K->8944K(353920K), 0.0156860 secs] 750369K->436376K(878208K), 0.0161730 secs] [Times: user=0.06 sys=0.00, real=0.01 secs] 173327.256: [GC 173327.256: [ParNew: 323568K->10076K(353920K), 0.0221340 secs] 751000K->437761K(878208K), 0.0226440 secs] [Times: user=0.06 sys=0.00, real=0.02 secs] 173329.671: [GC 173329.671: [ParNew: 324700K->13122K(353920K), 0.0263700 secs] 752385K->441961K(878208K), 0.0268870 secs] [Times: user=0.07 sys=0.00, real=0.03 secs] 173332.531: [GC 173332.532: [ParNew: 327746K->12210K(353920K), 0.0232720 secs] 756585K->442348K(878208K), 0.0237890 secs] [Times: user=0.07 sys=0.00, real=0.02 secs] 173337.361: [GC 173337.361: [ParNew: 326834K->16932K(353920K), 0.0282830 secs] 756972K->448294K(878208K), 0.0288680 secs] [Times: user=0.10 sys=0.01, real=0.03 secs] 173345.542: [GC 173345.542: [ParNew: 331556K->23804K(353920K), 0.0418770 secs] 762918K->456600K(878208K), 0.0423920 secs] [Times: user=0.16 sys=0.00, real=0.04 secs] 173354.129: [GC 173354.130: [ParNew: 338428K->18752K(353920K), 0.0295520 secs] 771224K->451833K(878208K), 0.0300530 secs] [Times: user=0.08 sys=0.00, real=0.03 secs] 173359.146: [GC 173359.147: [ParNew: 333376K->20265K(353920K), 0.0283780 secs] 766457K->453672K(878208K), 0.0288950 secs] [Times: user=0.11 sys=0.00, real=0.03 secs] 173368.127: [GC 173368.127: [ParNew: 334882K->28277K(353920K), 0.0354640 secs] 768289K->462519K(878208K), 0.0359880 secs] [Times: user=0.13 sys=0.00, real=0.03 secs] 173378.520: [GC 173378.520: [ParNew: 342901K->22067K(353920K), 0.0296420 secs] 777143K->457756K(878208K), 0.0301390 secs] [Times: user=0.09 sys=0.00, real=0.03 secs] 173387.164: [GC 173387.164: [ParNew: 336691K->22012K(353920K), 0.0267320 secs] 772380K->457701K(878208K), 0.0272360 secs] [Times: user=0.11 sys=0.01, real=0.03 secs] 173394.740: [GC 173394.740: [ParNew: 336631K->26339K(353920K), 0.0388220 secs] 772320K->465461K(878208K), 0.0393170 secs] [Times: user=0.14 sys=0.00, real=0.04 secs] 173400.862: [GC 173400.862: [ParNew: 340963K->24924K(353920K), 0.0303880 secs] 780085K->465928K(878208K), 0.0308870 secs] [Times: user=0.13 sys=0.00, real=0.03 secs] 173410.809: [GC 173410.809: [ParNew: 339548K->17041K(353920K), 0.0252030 secs] 780552K->458044K(878208K), 0.0256960 secs] [Times: user=0.06 sys=0.00, real=0.03 secs] 173416.966: [GC 173416.966: [ParNew: 331665K->19489K(353920K), 0.0358120 secs] 772668K->461299K(878208K), 0.0363700 secs] [Times: user=0.11 sys=0.00, real=0.03 secs] 173424.717: [GC 173424.717: [ParNew: 334113K->28375K(353920K), 0.0369290 secs] 775923K->470795K(878208K), 0.0374130 secs] [Times: user=0.11 sys=0.00, real=0.04 secs] 173433.182: [GC 173433.182: [ParNew: 342999K->30523K(353920K), 0.0353430 secs] 785419K->475898K(878208K), 0.0358290 secs] [Times: user=0.16 sys=0.00, real=0.04 secs] 173442.796: [GC 173442.796: [ParNew: 345147K->21722K(353920K), 0.0315550 secs] 790522K->470412K(878208K), 0.0320370 secs] [Times: user=0.11 sys=0.00, real=0.03 secs] 173449.770: [GC 173449.771: [ParNew: 336346K->19023K(353920K), 0.0298400 secs] 785036K->467713K(878208K), 0.0303510 secs] [Times: user=0.08 sys=0.01, real=0.03 secs] 173457.125: [GC 173457.126: [ParNew: 333647K->29658K(353920K), 0.0358550 secs] 782337K->479044K(878208K), 0.0363040 secs] [Times: user=0.12 sys=0.00, real=0.04 secs] 173465.254: [GC 173465.254: [ParNew: 344282K->21370K(353920K), 0.0570640 secs] 
793668K->477175K(878208K), 0.0575510 secs] [Times: user=0.17 sys=0.00, real=0.05 secs] 173474.393: [GC 173474.393: [ParNew: 335994K->21657K(353920K), 0.0272450 secs] 791799K->477462K(878208K), 0.0277450 secs] [Times: user=0.08 sys=0.01, real=0.03 secs] 173479.220: [GC 173479.221: [ParNew: 336281K->22488K(353920K), 0.0314980 secs] 792086K->478293K(878208K), 0.0319760 secs] [Times: user=0.10 sys=0.00, real=0.03 secs] 173487.947: [GC 173487.947: [ParNew: 337112K->30437K(353920K), 0.0347540 secs] 792917K->487554K(878208K), 0.0353030 secs] [Times: user=0.14 sys=0.01, real=0.04 secs] 173495.006: [GC 173495.006: [ParNew: 345061K->28495K(353920K), 0.0532260 secs] 802178K->490768K(878208K), 0.0537490 secs] [Times: user=0.25 sys=0.01, real=0.06 secs] 173503.977: [GC 173503.977: [ParNew: 343119K->22143K(353920K), 0.0355340 secs] 805392K->487338K(878208K), 0.0360720 secs] [Times: user=0.09 sys=0.00, real=0.04 secs] 173512.551: [GC 173512.552: [ParNew: 336767K->21732K(353920K), 0.0286980 secs] 801962K->486927K(878208K), 0.0291940 secs] [Times: user=0.11 sys=0.00, real=0.03 secs] 173519.010: [GC 173519.010: [ParNew: 336356K->36349K(353920K), 0.0708280 secs] 801551K->502664K(878208K), 0.0713080 secs] [Times: user=0.28 sys=0.18, real=0.07 secs] 173525.439: [GC 173525.439: [ParNew: 350973K->19075K(353920K), 0.0544040 secs] 817288K->496267K(878208K), 0.0548920 secs] [Times: user=0.24 sys=0.00, real=0.06 secs] 173533.166: [GC 173533.166: [ParNew: 333698K->16951K(353920K), 0.0235020 secs] 810890K->494143K(878208K), 0.0239960 secs] [Times: user=0.08 sys=0.00, real=0.03 secs] 173539.189: [GC 173539.189: [ParNew: 331575K->19185K(353920K), 0.0236660 secs] 808767K->496377K(878208K), 0.0241110 secs] [Times: user=0.09 sys=0.00, real=0.02 secs] 173544.518: [GC 173544.518: [ParNew: 333809K->24147K(353920K), 0.0287630 secs] 811001K->501339K(878208K), 0.0292370 secs] [Times: user=0.14 sys=0.00, real=0.02 secs] 173552.911: [GC 173552.911: [ParNew (promotion failed): 338771K->343936K(353920K), 0.1432700 secs]173553.054: [CMS --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- On 01/03/11 16:29, Jon Masamitsu wrote: > For 6692906 to be the problem there needs to be a > CMS concurrent phase in progress (marking, precleaning or > sweeping) and a minor collection running (with > UseParNewGC in use). From the fragment of the gc log > I could not tell for sure (maybe it was in the ... removed) > but I don't think a concurrent phase was in progress > so I would say it is not 6692906). Did you try > -XX:-UseParNewGC as was suggested? Your minor > pauses are not particularly long so maybe you > could afford to try it. 6692906 will not happen > without UseParNewGC. Note you need to turn of > UseParNewGC as it is the default for CMS. > > Look back through the log for any other > ParNew (promotion failed) and see what happens > in those cases (in you find one). 2+ hours is too > long. > > The gentleman who would know best about this code > is out of the office until the end of the week. I'll talk > to him about this to see if he remembers a recent > fix that I don't. > > > On 03/01/11 01:38, Bogdan Dimitriu wrote: >> Hi guys, >> >> We're having a problem with garbage collection as described here: >> http://forums.oracle.com/forums/message.jspa?messageID=9345173 (I >> apologise if posting links is not the right policy, but I prefer not to >> duplicate data). 
>>
>> We are going to try an upgrade to JRE 6u24 soon, but reading the release
>> notes for each of the versions since 6u20, I don't have much hope of
>> this upgrade fixing the problem.
>>
>> I have searched a bit on the Java bugs database and I've come across
>> something that looks similar to the problem I am experiencing:
>> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6692906, but I'm not
>> convinced this is exactly the same issue. This bug it seems will be
>> fixed in 6u25 (which I've read will be released late March or early
>> April).
>>
>> The reason I'm leaning towards thinking this is a JVM bug is the fact
>> that the JVM can stay in the hung state (as described on the forum) for
>> 2+ hours until we kill the process.
>>
>> I was hoping to get an idea about this from the source :), so any hints
>> will be greatly appreciated.
>>
>> Thanks,
>> Bogdan
>> _______________________________________________
>> hotspot-gc-use mailing list
>> hotspot-gc-use at openjdk.java.net
>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
> _______________________________________________
> hotspot-gc-use mailing list
> hotspot-gc-use at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use

From y.s.ramakrishna at oracle.com Thu Mar 3 03:01:21 2011
From: y.s.ramakrishna at oracle.com (Y. Srinivas Ramakrishna)
Date: Thu, 03 Mar 2011 03:01:21 -0800
Subject: Hung JVM consuming 100% CPU
In-Reply-To: <4D6CBE9A.6040304@cisco.com>
References: <4D6CBE9A.6040304@cisco.com>
Message-ID: <4D6F7501.5070805@oracle.com>

Hi Bogdan --

I agree with Jon that this is not 6692906. Rather it's a case of a bug
in object size computation causing the GC to go awry. There _were_ a
few related bugs fixed in this area not long ago (in 6u23, I believe),
so I believe you'll find that the upgrade to 6u23 or higher would fix
this problem. If you find that the latest 6uXX update (6u24?) still
exhibits this problem, please file a new bug via your Java support
channel and let us know.

thanks!
-- ramki

On 3/1/2011 1:38 AM, Bogdan Dimitriu wrote:
> Hi guys,
>
> We're having a problem with garbage collection as described here:
> http://forums.oracle.com/forums/message.jspa?messageID=9345173 (I
> apologise if posting links is not the right policy, but I prefer not to
> duplicate data).
>
> We are going to try an upgrade to JRE 6u24 soon, but reading the release
> notes for each of the versions since 6u20, I don't have much hope of
> this upgrade fixing the problem.
>
> I have searched a bit on the Java bugs database and I've come across
> something that looks similar to the problem I am experiencing:
> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6692906, but I'm not
> convinced this is exactly the same issue. This bug it seems will be
> fixed in 6u25 (which I've read will be released late March or early April).
>
> The reason I'm leaning towards thinking this is a JVM bug is the fact
> that the JVM can stay in the hung state (as described on the forum) for
> 2+ hours until we kill the process.
>
> I was hoping to get an idea about this from the source :), so any hints
> will be greatly appreciated.
>
> Thanks,
> Bogdan
> _______________________________________________
> hotspot-gc-use mailing list
> hotspot-gc-use at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use

From bdimitri at cisco.com Thu Mar 3 04:31:15 2011
From: bdimitri at cisco.com (Bogdan Dimitriu)
Date: Thu, 03 Mar 2011 12:31:15 +0000
Subject: Hung JVM consuming 100% CPU
In-Reply-To: <4D6F7501.5070805@oracle.com>
References: <4D6CBE9A.6040304@cisco.com> <4D6F7501.5070805@oracle.com>
Message-ID: <4D6F8A13.1030903@cisco.com>

Thanks Srinivas,

We will upgrade our servers to JRE 6u24 soon and I will let you know
after a while if we are still seeing the problem.

If it is still a problem, do you reckon that fiddling with the
-XX:CMSInitiatingOccupancyFraction option could improve the
fragmentation issue and hence make compaction less likely (assuming the
bug is in the compaction code)? We've got some CPU cycles to spare, so
we could afford to lower this value a bit.

Bogdan

On 03/03/11 11:01, Y. Srinivas Ramakrishna wrote:
> Hi Bogdan --
>
> I agree with Jon that this is not 6692906.
> Rather it's a case of a bug in object size computation
> causing the GC to go awry. There _were_ a few related
> bugs fixed in this area not long ago (in 6u23, I believe),
> so I believe you'll find that the upgrade to 6u23
> or higher would fix this problem.
> If you find that the latest 6uXX update (6u24?) still
> exhibits this problem, please file a new bug via your
> Java support channel and let us know.
>
> thanks!
> -- ramki
>
> On 3/1/2011 1:38 AM, Bogdan Dimitriu wrote:
>> Hi guys,
>>
>> We're having a problem with garbage collection as described here:
>> http://forums.oracle.com/forums/message.jspa?messageID=9345173 (I
>> apologise if posting links is not the right policy, but I prefer not to
>> duplicate data).
>>
>> We are going to try an upgrade to JRE 6u24 soon, but reading the release
>> notes for each of the versions since 6u20, I don't have much hope of
>> this upgrade fixing the problem.
>>
>> I have searched a bit on the Java bugs database and I've come across
>> something that looks similar to the problem I am experiencing:
>> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6692906, but I'm not
>> convinced this is exactly the same issue. This bug it seems will be
>> fixed in 6u25 (which I've read will be released late March or early
>> April).
>>
>> The reason I'm leaning towards thinking this is a JVM bug is the fact
>> that the JVM can stay in the hung state (as described on the forum) for
>> 2+ hours until we kill the process.
>>
>> I was hoping to get an idea about this from the source :), so any hints
>> will be greatly appreciated.
>>
>> Thanks,
>> Bogdan
>> _______________________________________________
>> hotspot-gc-use mailing list
>> hotspot-gc-use at openjdk.java.net
>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use

From aaisinzon at guidewire.com Thu Mar 3 08:55:59 2011
From: aaisinzon at guidewire.com (Alex Aisinzon)
Date: Thu, 3 Mar 2011 08:55:59 -0800
Subject: A few questions about G1
Message-ID:

Hi all

I noted that, in Sun JVM 1.6 update 24, G1 no longer needs the
-XX:+UnlockExperimentalVMOptions option, or at least when testing with
the -version flag. I guess G1 is more mature and will give it a try as
part of some tests with the 1.6 JVM.

I had some questions about G1:

* Region size: our product does not generate large objects.
Nevertheless, some of our customers will configure it in a way that
results in the allocation of large objects.
The typical example is that our product will be integrated with a
document management system and that some very large documents (10s of
MB or even 100s of MB) will go through the JVM that supports our
product. I wonder what happens if the region size is smaller than these
large objects and wonder if one can size up the region size
pre-emptively to avoid possible negative effects. It may also make
sense to instrument the JVM to output in the GC log the size of the
object that caused a region to become full, as that would help identify
such issues.

* Pause time: with Sun JVM 1.5, I had experimented with ParallelGC,
ParallelOldGC and CMS. The first two showed long full collections. CMS
was mostly better but resulted in some rare even longer collections,
which I attributed to fragmentation, which was in line with our
application's memory behavior. I wonder what customer/benchmark
experience is so far with G1 and whether I can hope to avoid these long
pauses with G1.

* Possible tunings to minimize pause times: what would be the best
tunings to explore with G1, considering the objective of containing
pause times. From reading
http://www.oracle.com/technetwork/java/javase/tech/g1-intro-jsp-135488.html,
it seems that MaxGCPauseMillis would be a good tuning. Are there other
tunings I should consider to test G1 more specifically on the pause
time dimension?

Thanks in advance

Alex Aisinzon

From bhorowit at gmail.com Thu Mar 3 09:54:06 2011
From: bhorowit at gmail.com (Ben Horowitz)
Date: Thu, 3 Mar 2011 09:54:06 -0800
Subject: Java library + JVMTI agent for getting GC pause times?
Message-ID:

Hi all,

Does anyone know of a Java library/JVMTI agent pair that allows a Java
application access to GC pause times soon after GC pauses occur? It'd
be useful to incorporate this information into the logs written by my
Java application for easier correlation with other application
behavior. Of course, we get this information from GC log files, but
correlating multiple log files is time consuming.

I'm thinking of writing such a library/agent pair myself, but don't
want to duplicate work that's already been done.

Thanks,
Ben

From y.s.ramakrishna at oracle.com Thu Mar 3 09:57:11 2011
From: y.s.ramakrishna at oracle.com (Y. Srinivas Ramakrishna)
Date: Thu, 03 Mar 2011 09:57:11 -0800
Subject: Hung JVM consuming 100% CPU
In-Reply-To: <4D6F8A13.1030903@cisco.com>
References: <4D6CBE9A.6040304@cisco.com> <4D6F7501.5070805@oracle.com> <4D6F8A13.1030903@cisco.com>
Message-ID: <4D6FD677.7030405@oracle.com>

On 3/3/2011 4:31 AM, Bogdan Dimitriu wrote:
> Thanks Srinivas,
>
> We will upgrade our servers to JRE 6u24 soon and I will let you know
> after a while if we are still seeing the problem.

OK, thanks!

>
> If it is still a problem, do you reckon that fiddling with the
> -XX:CMSInitiatingOccupancyFraction option could improve the
> fragmentation issue and hence make compaction less likely (assuming
> the bug is in the compaction code)? We've got some CPU cycles to
> spare, so we could afford to lower this value a bit.

The bug I am thinking about is actually not in compaction code; rather
it's in ParNew+CMS scavenge code that causes promoted objects to be
corrupted.
Compaction is then unlucky to run into those objects (because they are
still alive). So avoiding the compaction is not necessarily going to
avoid the issue.

Nonetheless, avoiding a situation where compaction is needed is
probably worthwhile, because I am assuming that you are using CMS so as
to avoid the kind of costly GC pauses that compaction entails. Thus,
tuning to avoid compaction has independent value, I think.

PS: By the way, I think it might be worthwhile to send in the core file
to Oracle, in any case, for analysis to see if the crash in question
has the rough shape of the bug(s) that we were thinking about...

-- ramki

>
> Bogdan
>
> On 03/03/11 11:01, Y. Srinivas Ramakrishna wrote:
>> Hi Bogdan --
>>
>> I agree with Jon that this is not 6692906.
>> Rather it's a case of a bug in object size computation
>> causing the GC to go awry. There _were_ a few related
>> bugs fixed in this area not long ago (in 6u23, I believe),
>> so I believe you'll find that the upgrade to 6u23
>> or higher would fix this problem.
>> If you find that the latest 6uXX update (6u24?) still
>> exhibits this problem, please file a new bug via your
>> Java support channel and let us know.
>>
>> thanks!
>> -- ramki
>>
>> On 3/1/2011 1:38 AM, Bogdan Dimitriu wrote:
>>> Hi guys,
>>>
>>> We're having a problem with garbage collection as described here:
>>> http://forums.oracle.com/forums/message.jspa?messageID=9345173 (I
>>> apologise if posting links is not the right policy, but I prefer not to
>>> duplicate data).
>>>
>>> We are going to try an upgrade to JRE 6u24 soon, but reading the release
>>> notes for each of the versions since 6u20, I don't have much hope of
>>> this upgrade fixing the problem.
>>>
>>> I have searched a bit on the Java bugs database and I've come across
>>> something that looks similar to the problem I am experiencing:
>>> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6692906, but I'm not
>>> convinced this is exactly the same issue. This bug it seems will be
>>> fixed in 6u25 (which I've read will be released late March or early
>>> April).
>>>
>>> The reason I'm leaning towards thinking this is a JVM bug is the fact
>>> that the JVM can stay in the hung state (as described on the forum) for
>>> 2+ hours until we kill the process.
>>>
>>> I was hoping to get an idea about this from the source :), so any hints
>>> will be greatly appreciated.
>>>
>>> Thanks,
>>> Bogdan
>>> _______________________________________________
>>> hotspot-gc-use mailing list
>>> hotspot-gc-use at openjdk.java.net
>>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use

From aaisinzon at guidewire.com Sun Mar 13 07:51:39 2011
From: aaisinzon at guidewire.com (Alex Aisinzon)
Date: Sun, 13 Mar 2011 07:51:39 -0700
Subject: A few questions about G1
Message-ID:

Hi all

I never heard back on this. Is there something wrong with the mailing list?

Thanks in advance

Alex Aisinzon

From: Alex Aisinzon
Sent: Thursday, March 03, 2011 8:56 AM
To: hotspot-gc-use at openjdk.java.net
Subject: A few questions about G1

Hi all

I noted that, in Sun JVM 1.6 update 24, G1 no longer needs the
-XX:+UnlockExperimentalVMOptions option, or at least when testing with
the -version flag. I guess G1 is more mature and will give it a try as
part of some tests with the 1.6 JVM.

I had some questions about G1:

* Region size: our product does not generate large objects.
Nevertheless, some of our customers will configure it in a way that
results in the allocation of large objects.
The typical example is that our product will be integrated with a
document management system and that some very large documents (10s of
MB or even 100s of MB) will go through the JVM that supports our
product. I wonder what happens if the region size is smaller than these
large objects and wonder if one can size up the region size
pre-emptively to avoid possible negative effects. It may also make
sense to instrument the JVM to output in the GC log the size of the
object that caused a region to become full, as that would help identify
such issues.

* Pause time: with Sun JVM 1.5, I had experimented with ParallelGC,
ParallelOldGC and CMS. The first two showed long full collections. CMS
was mostly better but resulted in some rare even longer collections,
which I attributed to fragmentation, which was in line with our
application's memory behavior. I wonder what customer/benchmark
experience is so far with G1 and whether I can hope to avoid these long
pauses with G1.

* Possible tunings to minimize pause times: what would be the best
tunings to explore with G1, considering the objective of containing
pause times. From reading
http://www.oracle.com/technetwork/java/javase/tech/g1-intro-jsp-135488.html,
it seems that MaxGCPauseMillis would be a good tuning. Are there other
tunings I should consider to test G1 more specifically on the pause
time dimension?

Thanks in advance

Alex Aisinzon

From dalibor.topic at oracle.com Sun Mar 13 08:57:06 2011
From: dalibor.topic at oracle.com (Dalibor Topic)
Date: Sun, 13 Mar 2011 16:57:06 +0100
Subject: A few questions about G1
In-Reply-To:
References:
Message-ID: <4D7CE952.3030404@oracle.com>

On 3/13/11 3:51 PM, Alex Aisinzon wrote:
> Hi all
>
> I never heard back on this. Is there something wrong with the mailing list?

Hi Alex,

It sounds like you're asking about G1's Early Access status and are
thinking about using it in production. I'd agree that the disappearance
of the flag is a bit confusing, but the intention is that it's still an
experimental feature. As the release notes for Java SE 6 Update 21 say
at http://www.oracle.com/technetwork/java/javase/6u21-156341.html ,
"G1 continues to be available in Early Access [..]"

The release notes for later releases, including those for Java SE 6
Update 24, don't mention G1. Accordingly, its status is still the same -
it's available in Early Access, so usage in production settings is
neither recommended nor supported.

cheers,
dalibor topic

--
Oracle
Dalibor Topic | Java F/OSS Ambassador
Phone: +494023646738 | Mobile: +491772664192
Oracle Java Platform Group

ORACLE Deutschland B.V. & Co. KG | Nagelsweg 55 | 20097 Hamburg

ORACLE Deutschland B.V. & Co. KG
Hauptverwaltung: Riesstr. 25, D-80992 München
Registergericht: Amtsgericht München, HRA 95603

Komplementärin: ORACLE Deutschland Verwaltung B.V.
Rijnzathe 6, 3454PV De Meern, Niederlande
Handelsregister der Handelskammer Midden-Niederlande, Nr. 30143697
Geschäftsführer: Jürgen Kunz, Marcel van de Molen, Alexander van der Ven

Green Oracle: Oracle is committed to developing practices and products
that help protect the environment
Thanks in advance

Alex Aisinzon

From dalibor.topic at oracle.com  Sun Mar 13 08:57:06 2011
From: dalibor.topic at oracle.com (Dalibor Topic)
Date: Sun, 13 Mar 2011 16:57:06 +0100
Subject: A few questions about G1
In-Reply-To: 
References: 
Message-ID: <4D7CE952.3030404@oracle.com>

On 3/13/11 3:51 PM, Alex Aisinzon wrote:
> Hi all
>
> I never heard back on this. Is there something wrong with the mailing list?

Hi Alex,

It sounds like you're asking about G1's Early Access status and are thinking about using it in production. I'd agree that the disappearance of the flag is a bit confusing, but the intention is that G1 is still an experimental feature. As the release notes for Java SE 6 Update 21 say at http://www.oracle.com/technetwork/java/javase/6u21-156341.html , "G1 continues to be available in Early Access [..]"

The release notes for later releases, including those for Java SE 6 Update 24, don't mention G1. Accordingly, its status is unchanged: it is available in Early Access, so usage in production settings is neither recommended nor supported.

cheers,
dalibor topic

-- 
Oracle
Dalibor Topic | Java F/OSS Ambassador
Phone: +494023646738 | Mobile: +491772664192
Oracle Java Platform Group

ORACLE Deutschland B.V. & Co. KG | Nagelsweg 55 | 20097 Hamburg

ORACLE Deutschland B.V. & Co. KG
Head office: Riesstr. 25, D-80992 München
Registration court: Amtsgericht München, HRA 95603
General partner: ORACLE Deutschland Verwaltung B.V.
Rijnzathe 6, 3454PV De Meern, Netherlands
Commercial register of the Chamber of Commerce Midden-Niederlande, Nr. 30143697
Managing directors: Jürgen Kunz, Marcel van de Molen, Alexander van der Ven

Green Oracle: Oracle is committed to developing practices and products that help protect the environment

From y.s.ramakrishna at oracle.com  Mon Mar 14 00:26:17 2011
From: y.s.ramakrishna at oracle.com (Y. Srinivas Ramakrishna)
Date: Mon, 14 Mar 2011 00:26:17 -0700
Subject: A few questions about G1
In-Reply-To: 
References: 
Message-ID: <4D7DC319.9040502@oracle.com>

Hi Alex --

On 3/13/2011 7:51 AM, Alex Aisinzon wrote:
...
> I guess G1 is more mature and will give it a try as part of some tests
> with the 1.6 JVM.

Definitely more mature, in both performance and reliability. Try the 6u25 EA or 7 EA builds.

> I had some questions about G1:
>
> * Region size: our product does not generate large objects.
> Nevertheless, some of our customers will configure it in a way that
> results in the allocation of large objects. The typical example is
> integration with a document management system, where very large
> documents (tens or even hundreds of MB) pass through the JVM that
> supports our product. I wonder what happens when the region size is
> smaller than these large objects ...

G1 does treat large objects specially (see the ISMM 2004 paper). In typical programs the birth and death rates of large objects are quite small, so I would not expect G1, with its default region size, to have any issues, and I would not start off assuming the region size needs adjustment. If you do run into issues with the default, you can revisit it on this alias. A small illustration of how G1 treats such objects follows below.
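For the record, and as a rough sketch of what "specially" means: as I understand the current implementation, an object larger than about half a region is treated as "humongous" and is allocated directly into its own contiguous run of regions, outside the young generation. A trivial example of the kind of allocation your document-management scenario would produce (the class name and sizes are made up purely for illustration):

    public class HumongousAllocDemo {
        public static void main(String[] args) {
            // Illustrative only: a 64 MB buffer standing in for a
            // "document blob". With G1's default region size, this is
            // far larger than half a region, so G1 should allocate it
            // as a humongous object in a contiguous run of regions
            // rather than in a young (eden) region.
            byte[] documentBlob = new byte[64 * 1024 * 1024];
            System.out.println("allocated " + documentBlob.length + " bytes");
        }
    }

Such allocations bypass the usual fast-path (TLAB) allocation, so you would notice them if they were frequent; but as I said, in most programs they are rare.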
> * Pause time: with Sun JVM 1.5, I experimented with ParallelGC,
> ParallelOldGC and CMS. The first two showed long full collections.
> CMS was mostly better, but produced some rare, even longer
> collections, which I attributed to fragmentation ...

G1 is compacting, so the expectation is indeed that the long-term fragmentation that is sometimes difficult to avoid with CMS will be much less of an issue. Our experience with G1 in the field is limited, but we do expect it to cause far fewer fragmentation problems. Please try it out and let us know what you see.

> * Possible tunings to minimize pause times: what would be the best
> tunings to explore with G1, given the objective of containing pause
> times? ...

I believe most of our internal testing has shown that, if your GC pause-time and MMU settings are reasonable, G1 does a fairly good job of keeping pauses within the requested band. Sometimes those targets are difficult to meet, however, and in those cases you may need to explicitly size and shape the heap to avoid concurrent mode failure. Here our experience is again somewhat limited, so reports from the field will help us refine our understanding and tuning of G1 as we go forward.

One thing to keep in mind is that G1's heuristics typically do well when there is some spare concurrency available on your platform for its concurrent remembered-set refinement and concurrent marking work, and, as with CMS, some extra room in the old generation to give the concurrent collection enough headroom to finish. If you are tuning the heap explicitly, a larger heap with a smaller promotion rate into the old generation favours G1, just as it does CMS.

I'll let those on our team more familiar with G1 performance and tuning chime in with further detail, and correct any inaccuracies in my understanding above.

thanks!
-- ramki

From aaisinzon at guidewire.com  Sat Mar 26 06:46:05 2011
From: aaisinzon at guidewire.com (Alex Aisinzon)
Date: Sat, 26 Mar 2011 06:46:05 -0700
Subject: G1 feedback
Message-ID: 

Hi all

I experimented with G1 on Sun JDK 1.6 update 24 and ran two long (8-hour) tests with it:

With "-server -XX:+UseG1GC -XX:+UseCompressedOops -Xms24576m -Xmx24576m", most pauses were very short, but four pauses were around 7 seconds and four around 34 seconds.

I then added the objective of keeping the longest pause around 1 second and used "-server -XX:+UseG1GC -XX:MaxGCPauseMillis=1000 -XX:+UseCompressedOops -Xms24576m -Xmx24576m". Most pauses were a little above 1 second, except for one pause at 8 seconds and one at 57 seconds.

The server is a dual X5570 (8 cores total) with 48GB of RAM. Its average CPU utilization was around 60-65%, so the machine was not over-committed.

Everything would be perfect if it were not for the 7-, 34-, 8- and 57-second pauses. What would you recommend I do to reduce these longer pauses, or to gain insight into what happened, so that these very rare but long pauses can be avoided in the future?

Thanks in advance

Alex A
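P.S. In case it helps with diagnosis, this is the extra logging I can enable on the next run to capture more detail around the outliers (this reflects my understanding of the relevant 6u24 switches; the gc.log path and the MyServer entry point are placeholders):

    java -server -XX:+UseG1GC -XX:MaxGCPauseMillis=1000 \
         -XX:+UseCompressedOops -Xms24576m -Xmx24576m \
         -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
         -XX:+PrintGCApplicationStoppedTime \
         -Xloggc:gc.log MyServer

PrintGCApplicationStoppedTime should show whether the long stalls are actual GC pauses or some other stop-the-world event, and the timestamped detail around the 34- and 57-second events should help narrow down which G1 phase is involved.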