From Y.S.Ramakrishna at Sun.COM Tue Jan 1 12:46:26 2008
From: Y.S.Ramakrishna at Sun.COM (Y Srinivas Ramakrishna)
Date: Tue, 01 Jan 2008 12:46:26 -0800
Subject: Perplexing GC Time Growth
In-Reply-To:
References:
Message-ID:

It's probably a combination of card-scanning times and allocation slow-down (but probably more of the former).

We've had some internal instrumentation of card-scanning times in the JVM which unfortunately has not made it into the JVM code proper because the instrumentation is not lightweight enough to be enabled in production. Perhaps a spin on a test system with the card-scanning times explicitly called out might shed light.

Basically what happens with CMS is that allocation is from free lists, and lacking something like big bag of pages (BBOP) allocation, this has traditionally tended to scatter the allocated objects over a large number of pages. This increases card-scanning times, although one would normally expect that this would eventually stabilize.

Do the scavenge times increase suddenly after a specific event, or do they just creep up slowly after each scavenge? The complete GC log would be useful to look at to answer that question.

-- ramki

----- Original Message -----
From: Jason Vasquez
Date: Monday, December 31, 2007 10:21 am
Subject: Perplexing GC Time Growth
To: hotspot-gc-use at openjdk.java.net

> Hi all,
>
> I'm having a perplexing problem -- the garbage collector appears to be functioning well, with a nice object/garbage lifecycle, yet minor GC times increase over the life of the process inexplicably. We are working with telephony hardware with this application, so keeping GC pauses very low is quite important. (keeping well below 100 ms would be ideal)
>
> Here is the current configuration we are using:
>
> -server \
> -Xloggc:garbage.log \
> -XX:+PrintGCDetails \
> -Dsun.rmi.dgc.server.gcInterval=3600000 \
> -Dsun.rmi.dgc.client.gcInterval=3600000 \
> -XX:ParallelGCThreads=8 \
> -XX:+UseParNewGC \
> -XX:+UseConcMarkSweepGC \
> -XX:+PrintGCTimeStamps \
> -XX:-TraceClassUnloading \
> -XX:+AggressiveOpts \
> -Xmx512M \
> -Xms512M \
> -Xmn128M \
> -XX:MaxTenuringThreshold=6 \
> -XX:+ExplicitGCInvokesConcurrent
>
> A large number of our bigger objects size-wise live for approximately 4-5 minutes, thus the larger young generation, and tenuring threshold. This seems to be successful in filtering most objects before they reach the tenured gen. (8 core x86 server, running 1.6.0_03-b05 on 32-bit Linux, kernel rev 2.6.18)
>
> Here is a representative snippet of our garbage log:
>
> 487.135: [GC 487.135: [ParNew: 112726K->7290K(118016K), 0.0218110 secs] 134494K->29058K(511232K), 0.0220520 secs]
> 557.294: [GC 557.294: [ParNew: 112250K->7976K(118016K), 0.0204220 secs] 134018K->29744K(511232K), 0.0206690 secs]
> 607.025: [GC 607.025: [ParNew: 112936K->7831K(118016K), 0.0231230 secs] 134704K->30003K(511232K), 0.0233670 secs]
> 672.522: [GC 672.522: [ParNew: 112791K->7361K(118016K), 0.0253620 secs] 134963K->29533K(511232K), 0.0256080 secs]
> ...
> 4006.635: [GC 4006.635: [ParNew: 112983K->7386K(118016K), 0.0385960 > secs] 141969K->36608K(511232K), 0.0388460 secs] > 4083.066: [GC 4083.066: [ParNew: 112346K->8439K(118016K), 0.0365940 > secs] 141568K->37661K(511232K), 0.0368340 secs] > 4158.457: [GC 4158.457: [ParNew: 113399K->7152K(118016K), 0.0360540 > secs] 142621K->36374K(511232K), 0.0362990 secs] > 4228.312: [GC 4228.313: [ParNew: 112112K->8738K(118016K), 0.0362510 > secs] 141334K->38083K(511232K), 0.0365050 secs] > 4293.800: [GC 4293.800: [ParNew: 113698K->8294K(118016K), 0.0368700 > secs] 143043K->37917K(511232K), 0.0371160 secs] > ... > 10489.555: [GC 10489.556: [ParNew: 112701K->7770K(118016K), 0.0639770 > > secs] 151966K->47156K(511232K), 0.0642210 secs] > 10562.544: [GC 10562.544: [ParNew: 112730K->9267K(118016K), 0.0625900 > > secs] 152116K->48772K(511232K), 0.0628470 secs] > 10622.558: [GC 10622.558: [ParNew: 114227K->8361K(118016K), 0.0675730 > > secs] 153732K->48381K(511232K), 0.0678220 secs] > 10678.842: [GC 10678.842: [ParNew: 113056K->7214K(118016K), 0.0669330 > > secs] 153076K->47234K(511232K), 0.0671800 secs] > ... > 177939.062: [GC 177939.062: [ParNew: 112608K->8620K(118016K), > 0.7681440 secs] 466132K->362144K(511232K), 0.7684030 secs] > 178005.483: [GC 178005.483: [ParNew: 113449K->7731K(118016K), > 0.7677300 secs] 466973K->361893K(511232K), 0.7679890 secs] > 178069.658: [GC 178069.658: [ParNew: 112670K->6814K(118016K), > 0.7700020 secs] 466832K->360976K(511232K), 0.7702590 secs] > 178133.513: [GC 178133.513: [ParNew: 111717K->7920K(118016K), > 0.7702920 secs] 465879K->362082K(511232K), 0.7705560 secs] > > As you can see, the gc times continue to increase over time, on the > order of about 10-20ms per hour. CMS runs are spaced very far apart, > > in fact, since most objects die before reaching the tenured > generation, the CMS is triggered more by RMI DGC runs then by heap > growth. (We were getting serial GCs, apparently due to RMI DGC > before adding -XX:+ExplicitGCInvokesConcurrent) > > Here's some representative output from `jstat -gcutil -t -h10 2s`: > > Timestamp S0 S1 E O P YGC YGCT FGC FGCT GCT > 11067.6 55.74 0.00 89.32 9.73 59.84 168 7.471 8 0.280 7.751 > 11069.6 55.74 0.00 93.65 9.73 59.84 168 7.471 8 0.280 7.751 > 11071.6 55.74 0.00 99.34 9.73 59.84 168 7.471 8 0.280 7.751 > 11073.5 0.00 62.22 2.89 9.76 59.84 169 7.537 8 0.280 7.816 > 11075.6 0.00 62.22 4.80 9.76 59.84 169 7.537 8 0.280 7.816 > > Survivor spaces continue to sit at about 50-65% occupancy, which seems > > fairly good to my eye. Eden fills approximately every 70 seconds, > triggering minor GCs. > > > Any ideas? This is becoming quite frustrating for us -- our > application uptime is pretty horrible with the too-frequent scheduled > > restarts we are being forced to run. > > Thanks for any assistance you might be able to offer, > -Jason Vasquez > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use From jason at mugfu.com Wed Jan 2 14:22:43 2008 From: jason at mugfu.com (Jason Vasquez) Date: Wed, 2 Jan 2008 17:22:43 -0500 Subject: Perplexing GC Time Growth In-Reply-To: <477C09ED.9020400@Sun.COM> References: <9BADD5B8-F9DA-4656-843B-7D44FF36963A@mugfu.com> <477C09ED.9020400@Sun.COM> Message-ID: <22133221-306B-41D3-AE57-155876104354@mugfu.com> I've enabled the flag and am starting the test now. It takes a few hours before we can see definite trending, but I'll let you know what we see. 
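One low-overhead way to watch for that trend while the run is still in progress is to pull the per-scavenge pause and the post-GC heap occupancy straight out of garbage.log. A minimal sketch -- the class name and the exact regular expression are illustrative, assuming the -XX:+PrintGCTimeStamps / -XX:+PrintGCDetails format shown in the log excerpts earlier in this thread:

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Prints "elapsed-seconds  ParNew-pause-seconds  post-GC-heap-KB" for each
// minor collection found in the log named on the command line.
public class ParNewTrend {
    // e.g. "487.135: [GC 487.135: [ParNew: 112726K->7290K(118016K), 0.0218110 secs] 134494K->29058K(511232K), 0.0220520 secs]"
    private static final Pattern LINE = Pattern.compile(
            "^(\\d+\\.\\d+): \\[GC .*\\[ParNew: .*?, ([\\d.]+) secs\\] \\d+K->(\\d+)K\\(\\d+K\\)");

    public static void main(String[] args) throws Exception {
        BufferedReader in = new BufferedReader(new FileReader(args[0]));
        String line;
        while ((line = in.readLine()) != null) {
            Matcher m = LINE.matcher(line);
            if (m.find()) {
                System.out.println(m.group(1) + " " + m.group(2) + " " + m.group(3));
            }
        }
        in.close();
    }
}

Running "java ParNewTrend garbage.log" every hour or so should show whether the pause column creeps up smoothly or jumps after a specific event, and whether the post-GC occupancy column climbs with it.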
And yes, we do a fair bit of classloading/reflection work in the application. Thanks, jason On Jan 2, 2008, at 17:02 , Y.S.Ramakrishna at Sun.COM wrote: > Very cool log, showing a near linear and monotonic growth > not only in scavenge times, but also in initial mark and > remark times, see attached plots produced by the gc histogram > tool (from Tony Printezis, John Coomes et al). Then, looking > also at the log file, one also sees that the live data following > a GC is also increasing monotonically. > > While we think of potential reasons for this, or mull over > appropriate sensors that can lay bare the root cause here, > could you, on a quick hunch, do a quick experiment and tell > me if adding the options -XX:+CMSClassUnloadingEnabled makes > any difference to the observations above. [Does your application > do lots of class loading or reflection or string interning?] > > thanks. > -- ramki > > Jason Vasquez wrote: >> I've attached a representative garbage log. To my eye, I don't >> see anything specific that seems to indicate an event, but >> hopefully more experienced eyes will see something different :) >> Thanks, >> -jason >> On Jan 1, 2008, at 15:46 , Y Srinivas Ramakrishna wrote: >>> >>> It's probably a combination of card-scanning times and allocation >>> slow-down >>> (but probably more of the former). >>> >>> We've had some internal instrumentation of card-scanning times in >>> the JVM which >>> unfortunately has not made into the JVM code proper because the >>> instrumentation >>> is not as lightweight as to be enabled in production. Perhaps a >>> spin on a test system >>> with the card-scanning times explicitly called out might shed light. >>> >>> Basically what happens with CMS is that allocation is from free >>> lists, and lacking >>> something like big bag of pages (BBOP) allocation, this has >>> traditionally tended to >>> scatter the allocated objects over a large number of pages. This >>> increases card-scanning >>> times, although one would normally expect that this would eventually >>> stabilize. >>> >>> Do the scavenge times increase suddenly after a specific event or >>> do they just >>> creep up slowly after each scavenge? The complete GC log would be >>> useful to >>> look at to answer that question. >>> >>> -- ramki >>> >>> >>> ----- Original Message ----- >>> From: Jason Vasquez >>> Date: Monday, December 31, 2007 10:21 am >>> Subject: Perplexing GC Time Growth >>> To: hotspot-gc-use at openjdk.java.net >>> >>> >>>> Hi all, >>>> >>>> I'm having a perplexing problem -- the garbage collector appears >>>> to be >>>> >>>> functioning well, with a nice object/garbage lifecycle, yet minor >>>> GC >>>> >>>> times increase over the life of the process inexplicably. We are >>>> working with telephony hardware with this application, so keeping >>>> GC >>>> >>>> pauses very low is quite important. 
(keeping well below 100 ms >>>> would >>>> >>>> be ideal) >>>> >>>> Here is the current configuration we are using: >>>> >>>> -server \ >>>> -Xloggc:garbage.log \ >>>> -XX:+PrintGCDetails \ >>>> -Dsun.rmi.dgc.server.gcInterval=3600000 \ >>>> -Dsun.rmi.dgc.client.gcInterval=3600000 \ >>>> -XX:ParallelGCThreads=8 \ >>>> -XX:+UseParNewGC \ >>>> -XX:+UseConcMarkSweepGC \ >>>> -XX:+PrintGCTimeStamps \ >>>> -XX:-TraceClassUnloading \ >>>> -XX:+AggressiveOpts \ >>>> -Xmx512M \ >>>> -Xms512M \ >>>> -Xmn128M \ >>>> -XX:MaxTenuringThreshold=6 \ >>>> -XX:+ExplicitGCInvokesConcurrent >>>> >>>> A large number of our bigger objects size-wise live for >>>> approximately >>>> >>>> 4-5 minutes, thus the larger young generation, and tenuring >>>> threshold. >>>> >>>> This seems to be successful in filtering most objects before they >>>> reach the tenured gen. (8 core x86 server, running 1.6.0_03-b05 >>>> on 32- >>>> >>>> bit Linux, kernel rev 2.6.18) >>>> >>>> Here is a representative snippet of our garbage log: >>>> >>>> 487.135: [GC 487.135: [ParNew: 112726K->7290K(118016K), 0.0218110 >>>> secs] 134494K->29058K(511232K), 0.0220520 secs] >>>> 557.294: [GC 557.294: [ParNew: 112250K->7976K(118016K), 0.0204220 >>>> secs] 134018K->29744K(511232K), 0.0206690 secs] >>>> 607.025: [GC 607.025: [ParNew: 112936K->7831K(118016K), 0.0231230 >>>> secs] 134704K->30003K(511232K), 0.0233670 secs] >>>> 672.522: [GC 672.522: [ParNew: 112791K->7361K(118016K), 0.0253620 >>>> secs] 134963K->29533K(511232K), 0.0256080 secs] >>>> ... >>>> 4006.635: [GC 4006.635: [ParNew: 112983K->7386K(118016K), 0.0385960 >>>> secs] 141969K->36608K(511232K), 0.0388460 secs] >>>> 4083.066: [GC 4083.066: [ParNew: 112346K->8439K(118016K), 0.0365940 >>>> secs] 141568K->37661K(511232K), 0.0368340 secs] >>>> 4158.457: [GC 4158.457: [ParNew: 113399K->7152K(118016K), 0.0360540 >>>> secs] 142621K->36374K(511232K), 0.0362990 secs] >>>> 4228.312: [GC 4228.313: [ParNew: 112112K->8738K(118016K), 0.0362510 >>>> secs] 141334K->38083K(511232K), 0.0365050 secs] >>>> 4293.800: [GC 4293.800: [ParNew: 113698K->8294K(118016K), 0.0368700 >>>> secs] 143043K->37917K(511232K), 0.0371160 secs] >>>> ... >>>> 10489.555: [GC 10489.556: [ParNew: 112701K->7770K(118016K), >>>> 0.0639770 >>>> >>>> secs] 151966K->47156K(511232K), 0.0642210 secs] >>>> 10562.544: [GC 10562.544: [ParNew: 112730K->9267K(118016K), >>>> 0.0625900 >>>> >>>> secs] 152116K->48772K(511232K), 0.0628470 secs] >>>> 10622.558: [GC 10622.558: [ParNew: 114227K->8361K(118016K), >>>> 0.0675730 >>>> >>>> secs] 153732K->48381K(511232K), 0.0678220 secs] >>>> 10678.842: [GC 10678.842: [ParNew: 113056K->7214K(118016K), >>>> 0.0669330 >>>> >>>> secs] 153076K->47234K(511232K), 0.0671800 secs] >>>> ... >>>> 177939.062: [GC 177939.062: [ParNew: 112608K->8620K(118016K), >>>> 0.7681440 secs] 466132K->362144K(511232K), 0.7684030 secs] >>>> 178005.483: [GC 178005.483: [ParNew: 113449K->7731K(118016K), >>>> 0.7677300 secs] 466973K->361893K(511232K), 0.7679890 secs] >>>> 178069.658: [GC 178069.658: [ParNew: 112670K->6814K(118016K), >>>> 0.7700020 secs] 466832K->360976K(511232K), 0.7702590 secs] >>>> 178133.513: [GC 178133.513: [ParNew: 111717K->7920K(118016K), >>>> 0.7702920 secs] 465879K->362082K(511232K), 0.7705560 secs] >>>> >>>> As you can see, the gc times continue to increase over time, on the >>>> order of about 10-20ms per hour. 
CMS runs are spaced very far >>>> apart, >>>> >>>> in fact, since most objects die before reaching the tenured >>>> generation, the CMS is triggered more by RMI DGC runs then by heap >>>> growth. (We were getting serial GCs, apparently due to RMI DGC >>>> before adding -XX:+ExplicitGCInvokesConcurrent) >>>> >>>> Here's some representative output from `jstat -gcutil -t -h10 >>>> 2s`: >>>> >>>> Timestamp S0 S1 E O P YGC YGCT FGC FGCT GCT >>>> 11067.6 55.74 0.00 89.32 9.73 59.84 168 7.471 8 0.280 7.751 >>>> 11069.6 55.74 0.00 93.65 9.73 59.84 168 7.471 8 0.280 7.751 >>>> 11071.6 55.74 0.00 99.34 9.73 59.84 168 7.471 8 0.280 7.751 >>>> 11073.5 0.00 62.22 2.89 9.76 59.84 169 7.537 8 0.280 7.816 >>>> 11075.6 0.00 62.22 4.80 9.76 59.84 169 7.537 8 0.280 7.816 >>>> >>>> Survivor spaces continue to sit at about 50-65% occupancy, which >>>> seems >>>> >>>> fairly good to my eye. Eden fills approximately every 70 seconds, >>>> triggering minor GCs. >>>> >>>> >>>> Any ideas? This is becoming quite frustrating for us -- our >>>> application uptime is pretty horrible with the too-frequent >>>> scheduled >>>> >>>> restarts we are being forced to run. >>>> >>>> Thanks for any assistance you might be able to offer, >>>> -Jason Vasquez >>>> _______________________________________________ >>>> hotspot-gc-use mailing list >>>> hotspot-gc-use at openjdk.java.net >>>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > From Y.S.Ramakrishna at Sun.COM Wed Jan 2 14:58:32 2008 From: Y.S.Ramakrishna at Sun.COM (Y.S.Ramakrishna at Sun.COM) Date: Wed, 02 Jan 2008 14:58:32 -0800 Subject: Perplexing GC Time Growth In-Reply-To: <22133221-306B-41D3-AE57-155876104354@mugfu.com> References: <9BADD5B8-F9DA-4656-843B-7D44FF36963A@mugfu.com> <477C09ED.9020400@Sun.COM> <22133221-306B-41D3-AE57-155876104354@mugfu.com> Message-ID: <477C1718.70004@Sun.COM> Jason Vasquez wrote: > I've enabled the flag and am starting the test now. It takes a few > hours before we can see definite trending, but I'll let you know what > we see. > > And yes, we do a fair bit of classloading/reflection work in the > application. On the hunch that it was the accumulation of classes in the perm gen and the system dictionary that was the root cause of the troubles here (which the explicit class unloading will now hopefully take care of -- recall that cms does not by default unload classes to keep cms remark pause times low), you might want to monitor the size of the perm gen (with and without the CMSClassUnloadingEnabled option) by means of "jstat -gc ..." (which will show perm gen utilization under the "PU" column). This (lack of class unloading w/CMS unless explicitly requested) will hopefully be much less of an issue going forward as a result of work currently being done towards 6543076 (some of which occurred under 6634032 recently discussed on this list); until that time you might find the feature in 6541037 (available as of 6u4 in a production JVM) useful. I look forward to the results of yr test; thanks! -- ramki > > Thanks, > jason > > > On Jan 2, 2008, at 17:02 , Y.S.Ramakrishna at Sun.COM wrote: > >> Very cool log, showing a near linear and monotonic growth >> not only in scavenge times, but also in initial mark and >> remark times, see attached plots produced by the gc histogram >> tool (from Tony Printezis, John Coomes et al). Then, looking >> also at the log file, one also sees that the live data following >> a GC is also increasing monotonically. 
>> >> While we think of potential reasons for this, or mull over >> appropriate sensors that can lay bare the root cause here, >> could you, on a quick hunch, do a quick experiment and tell >> me if adding the options -XX:+CMSClassUnloadingEnabled makes >> any difference to the observations above. [Does your application >> do lots of class loading or reflection or string interning?] >> >> thanks. >> -- ramki >> >> Jason Vasquez wrote: >>> I've attached a representative garbage log. To my eye, I don't >>> see anything specific that seems to indicate an event, but >>> hopefully more experienced eyes will see something different :) >>> Thanks, >>> -jason >>> On Jan 1, 2008, at 15:46 , Y Srinivas Ramakrishna wrote: >>>> It's probably a combination of card-scanning times and allocation >>>> slow-down >>>> (but probably more of the former). >>>> >>>> We've had some internal instrumentation of card-scanning times in >>>> the JVM which >>>> unfortunately has not made into the JVM code proper because the >>>> instrumentation >>>> is not as lightweight as to be enabled in production. Perhaps a >>>> spin on a test system >>>> with the card-scanning times explicitly called out might shed light. >>>> >>>> Basically what happens with CMS is that allocation is from free >>>> lists, and lacking >>>> something like big bag of pages (BBOP) allocation, this has >>>> traditionally tended to >>>> scatter the allocated objects over a large number of pages. This >>>> increases card-scanning >>>> times, although one would normally expect that this would eventually >>>> stabilize. >>>> >>>> Do the scavenge times increase suddenly after a specific event or >>>> do they just >>>> creep up slowly after each scavenge? The complete GC log would be >>>> useful to >>>> look at to answer that question. >>>> >>>> -- ramki >>>> >>>> >>>> ----- Original Message ----- >>>> From: Jason Vasquez >>>> Date: Monday, December 31, 2007 10:21 am >>>> Subject: Perplexing GC Time Growth >>>> To: hotspot-gc-use at openjdk.java.net >>>> >>>> >>>>> Hi all, >>>>> >>>>> I'm having a perplexing problem -- the garbage collector appears >>>>> to be >>>>> >>>>> functioning well, with a nice object/garbage lifecycle, yet minor >>>>> GC >>>>> >>>>> times increase over the life of the process inexplicably. We are >>>>> working with telephony hardware with this application, so keeping >>>>> GC >>>>> >>>>> pauses very low is quite important. (keeping well below 100 ms >>>>> would >>>>> >>>>> be ideal) >>>>> >>>>> Here is the current configuration we are using: >>>>> >>>>> -server \ >>>>> -Xloggc:garbage.log \ >>>>> -XX:+PrintGCDetails \ >>>>> -Dsun.rmi.dgc.server.gcInterval=3600000 \ >>>>> -Dsun.rmi.dgc.client.gcInterval=3600000 \ >>>>> -XX:ParallelGCThreads=8 \ >>>>> -XX:+UseParNewGC \ >>>>> -XX:+UseConcMarkSweepGC \ >>>>> -XX:+PrintGCTimeStamps \ >>>>> -XX:-TraceClassUnloading \ >>>>> -XX:+AggressiveOpts \ >>>>> -Xmx512M \ >>>>> -Xms512M \ >>>>> -Xmn128M \ >>>>> -XX:MaxTenuringThreshold=6 \ >>>>> -XX:+ExplicitGCInvokesConcurrent >>>>> >>>>> A large number of our bigger objects size-wise live for >>>>> approximately >>>>> >>>>> 4-5 minutes, thus the larger young generation, and tenuring >>>>> threshold. >>>>> >>>>> This seems to be successful in filtering most objects before they >>>>> reach the tenured gen. 
(8 core x86 server, running 1.6.0_03-b05 >>>>> on 32- >>>>> >>>>> bit Linux, kernel rev 2.6.18) >>>>> >>>>> Here is a representative snippet of our garbage log: >>>>> >>>>> 487.135: [GC 487.135: [ParNew: 112726K->7290K(118016K), 0.0218110 >>>>> secs] 134494K->29058K(511232K), 0.0220520 secs] >>>>> 557.294: [GC 557.294: [ParNew: 112250K->7976K(118016K), 0.0204220 >>>>> secs] 134018K->29744K(511232K), 0.0206690 secs] >>>>> 607.025: [GC 607.025: [ParNew: 112936K->7831K(118016K), 0.0231230 >>>>> secs] 134704K->30003K(511232K), 0.0233670 secs] >>>>> 672.522: [GC 672.522: [ParNew: 112791K->7361K(118016K), 0.0253620 >>>>> secs] 134963K->29533K(511232K), 0.0256080 secs] >>>>> ... >>>>> 4006.635: [GC 4006.635: [ParNew: 112983K->7386K(118016K), 0.0385960 >>>>> secs] 141969K->36608K(511232K), 0.0388460 secs] >>>>> 4083.066: [GC 4083.066: [ParNew: 112346K->8439K(118016K), 0.0365940 >>>>> secs] 141568K->37661K(511232K), 0.0368340 secs] >>>>> 4158.457: [GC 4158.457: [ParNew: 113399K->7152K(118016K), 0.0360540 >>>>> secs] 142621K->36374K(511232K), 0.0362990 secs] >>>>> 4228.312: [GC 4228.313: [ParNew: 112112K->8738K(118016K), 0.0362510 >>>>> secs] 141334K->38083K(511232K), 0.0365050 secs] >>>>> 4293.800: [GC 4293.800: [ParNew: 113698K->8294K(118016K), 0.0368700 >>>>> secs] 143043K->37917K(511232K), 0.0371160 secs] >>>>> ... >>>>> 10489.555: [GC 10489.556: [ParNew: 112701K->7770K(118016K), >>>>> 0.0639770 >>>>> >>>>> secs] 151966K->47156K(511232K), 0.0642210 secs] >>>>> 10562.544: [GC 10562.544: [ParNew: 112730K->9267K(118016K), >>>>> 0.0625900 >>>>> >>>>> secs] 152116K->48772K(511232K), 0.0628470 secs] >>>>> 10622.558: [GC 10622.558: [ParNew: 114227K->8361K(118016K), >>>>> 0.0675730 >>>>> >>>>> secs] 153732K->48381K(511232K), 0.0678220 secs] >>>>> 10678.842: [GC 10678.842: [ParNew: 113056K->7214K(118016K), >>>>> 0.0669330 >>>>> >>>>> secs] 153076K->47234K(511232K), 0.0671800 secs] >>>>> ... >>>>> 177939.062: [GC 177939.062: [ParNew: 112608K->8620K(118016K), >>>>> 0.7681440 secs] 466132K->362144K(511232K), 0.7684030 secs] >>>>> 178005.483: [GC 178005.483: [ParNew: 113449K->7731K(118016K), >>>>> 0.7677300 secs] 466973K->361893K(511232K), 0.7679890 secs] >>>>> 178069.658: [GC 178069.658: [ParNew: 112670K->6814K(118016K), >>>>> 0.7700020 secs] 466832K->360976K(511232K), 0.7702590 secs] >>>>> 178133.513: [GC 178133.513: [ParNew: 111717K->7920K(118016K), >>>>> 0.7702920 secs] 465879K->362082K(511232K), 0.7705560 secs] >>>>> >>>>> As you can see, the gc times continue to increase over time, on the >>>>> order of about 10-20ms per hour. CMS runs are spaced very far >>>>> apart, >>>>> >>>>> in fact, since most objects die before reaching the tenured >>>>> generation, the CMS is triggered more by RMI DGC runs then by heap >>>>> growth. (We were getting serial GCs, apparently due to RMI DGC >>>>> before adding -XX:+ExplicitGCInvokesConcurrent) >>>>> >>>>> Here's some representative output from `jstat -gcutil -t -h10 >>>>> 2s`: >>>>> >>>>> Timestamp S0 S1 E O P YGC YGCT FGC FGCT GCT >>>>> 11067.6 55.74 0.00 89.32 9.73 59.84 168 7.471 8 0.280 7.751 >>>>> 11069.6 55.74 0.00 93.65 9.73 59.84 168 7.471 8 0.280 7.751 >>>>> 11071.6 55.74 0.00 99.34 9.73 59.84 168 7.471 8 0.280 7.751 >>>>> 11073.5 0.00 62.22 2.89 9.76 59.84 169 7.537 8 0.280 7.816 >>>>> 11075.6 0.00 62.22 4.80 9.76 59.84 169 7.537 8 0.280 7.816 >>>>> >>>>> Survivor spaces continue to sit at about 50-65% occupancy, which >>>>> seems >>>>> >>>>> fairly good to my eye. Eden fills approximately every 70 seconds, >>>>> triggering minor GCs. 
>>>>> >>>>> >>>>> Any ideas? This is becoming quite frustrating for us -- our >>>>> application uptime is pretty horrible with the too-frequent >>>>> scheduled >>>>> >>>>> restarts we are being forced to run. >>>>> >>>>> Thanks for any assistance you might be able to offer, >>>>> -Jason Vasquez >>>>> _______________________________________________ >>>>> hotspot-gc-use mailing list >>>>> hotspot-gc-use at openjdk.java.net >>>>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use From jason at mugfu.com Wed Jan 2 19:21:33 2008 From: jason at mugfu.com (Jason Vasquez) Date: Wed, 2 Jan 2008 22:21:33 -0500 Subject: Perplexing GC Time Growth In-Reply-To: <477C1718.70004@Sun.COM> References: <9BADD5B8-F9DA-4656-843B-7D44FF36963A@mugfu.com> <477C09ED.9020400@Sun.COM> <22133221-306B-41D3-AE57-155876104354@mugfu.com> <477C1718.70004@Sun.COM> Message-ID: <07B7B19B-9EEA-4AB2-8CB1-3FCF1A2524F3@mugfu.com> Too bad, I had hoped for some improvement as well. Attached is the gc log from the early part of the latest run. I don't have the charting utility, but looks similar as to before from what I can see. As requested earlier, here are the current JVM arguments, as dumped by -XX:+PrintCommandLineFlags: -XX:+AggressiveOpts -XX:+CMSClassUnloadingEnabled -XX: +ExplicitGCInvokesConcurrent -XX:+ManagementServer - XX:MaxHeapSize=536870912 -XX:MaxNewSize =100663296 -XX:MaxTenuringThreshold=6 -XX:NewSize=100663296 - XX:ParallelGCThreads=8 -XX:+PrintCommandLineFlags -XX:+PrintGC -XX: +PrintGCDetails - XX:+PrintGCTimeStamps -XX:-TraceClassUnloading -XX:+UseConcMarkSweepGC -XX:+UseParNewGC (Running on 8 core, 32-bit Linux machine) Thanks! -jason -------------- next part -------------- A non-text attachment was scrubbed... Name: garbage.log.gz Type: application/x-gzip Size: 6656 bytes Desc: not available Url : http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20080102/fac1d735/attachment.bin -------------- next part -------------- On Jan 2, 2008, at 17:58 , Y.S.Ramakrishna at Sun.COM wrote: > Jason Vasquez wrote: >> I've enabled the flag and am starting the test now. It takes a >> few hours before we can see definite trending, but I'll let you >> know what we see. >> And yes, we do a fair bit of classloading/reflection work in the >> application. > > On the hunch that it was the accumulation of classes in the perm gen > and the system dictionary that was the root cause > of the troubles here (which the explicit class unloading will now > hopefully take care of -- recall that cms does not by default unload > classes to keep cms remark pause times low), you might want to monitor > the size of the perm gen (with and without the > CMSClassUnloadingEnabled > option) by means of "jstat -gc ..." (which will show perm gen > utilization under the "PU" column). > > This (lack of class unloading w/CMS unless explicitly requested) > will hopefully be much less of an issue going forward as > a result of work currently being done towards 6543076 > (some of which occurred under 6634032 recently discussed > on this list); until that time you might find the feature > in 6541037 (available as of 6u4 in a production JVM) useful. > > I look forward to the results of yr test; thanks! 
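To make that perm gen monitoring concrete: since the process already runs with -XX:+ManagementServer, the same numbers jstat reports can also be polled in-process through java.lang.management. A minimal sketch (the class name is hypothetical; the pool names are whatever getName() reports under the CMS collector, typically "CMS Perm Gen" and "CMS Old Gen"):

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;

// Logs the used bytes of the perm gen and old gen pools every 10 seconds,
// so growth across CMS cycles is easy to spot next to the GC log.
public class PermGenWatcher {
    public static void main(String[] args) throws InterruptedException {
        while (true) {
            for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
                String name = pool.getName();
                if (name.contains("Perm Gen") || name.contains("Old Gen")) {
                    System.out.printf("%d %s used=%dK%n",
                            System.currentTimeMillis(), name,
                            pool.getUsage().getUsed() / 1024);
                }
            }
            Thread.sleep(10000);
        }
    }
}

Comparing the "Perm Gen" line before and after each CMS cycle, with and without -XX:+CMSClassUnloadingEnabled, shows directly whether classes are actually being unloaded.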
> -- ramki > >> Thanks, >> jason >> On Jan 2, 2008, at 17:02 , Y.S.Ramakrishna at Sun.COM wrote: >>> Very cool log, showing a near linear and monotonic growth >>> not only in scavenge times, but also in initial mark and >>> remark times, see attached plots produced by the gc histogram >>> tool (from Tony Printezis, John Coomes et al). Then, looking >>> also at the log file, one also sees that the live data following >>> a GC is also increasing monotonically. >>> >>> While we think of potential reasons for this, or mull over >>> appropriate sensors that can lay bare the root cause here, >>> could you, on a quick hunch, do a quick experiment and tell >>> me if adding the options -XX:+CMSClassUnloadingEnabled makes >>> any difference to the observations above. [Does your application >>> do lots of class loading or reflection or string interning?] >>> >>> thanks. >>> -- ramki >>> >>> Jason Vasquez wrote: >>>> I've attached a representative garbage log. To my eye, I don't >>>> see anything specific that seems to indicate an event, but >>>> hopefully more experienced eyes will see something different :) >>>> Thanks, >>>> -jason >>>> On Jan 1, 2008, at 15:46 , Y Srinivas Ramakrishna wrote: >>>>> It's probably a combination of card-scanning times and >>>>> allocation slow-down >>>>> (but probably more of the former). >>>>> >>>>> We've had some internal instrumentation of card-scanning times >>>>> in the JVM which >>>>> unfortunately has not made into the JVM code proper because the >>>>> instrumentation >>>>> is not as lightweight as to be enabled in production. Perhaps a >>>>> spin on a test system >>>>> with the card-scanning times explicitly called out might shed >>>>> light. >>>>> >>>>> Basically what happens with CMS is that allocation is from free >>>>> lists, and lacking >>>>> something like big bag of pages (BBOP) allocation, this has >>>>> traditionally tended to >>>>> scatter the allocated objects over a large number of pages. >>>>> This increases card-scanning >>>>> times, although one would normally expect that this would >>>>> eventually >>>>> stabilize. >>>>> >>>>> Do the scavenge times increase suddenly after a specific event >>>>> or do they just >>>>> creep up slowly after each scavenge? The complete GC log would >>>>> be useful to >>>>> look at to answer that question. >>>>> >>>>> -- ramki >>>>> >>>>> >>>>> ----- Original Message ----- >>>>> From: Jason Vasquez >>>>> Date: Monday, December 31, 2007 10:21 am >>>>> Subject: Perplexing GC Time Growth >>>>> To: hotspot-gc-use at openjdk.java.net >>>>> >>>>> >>>>>> Hi all, >>>>>> >>>>>> I'm having a perplexing problem -- the garbage collector >>>>>> appears to be >>>>>> >>>>>> functioning well, with a nice object/garbage lifecycle, yet >>>>>> minor GC >>>>>> >>>>>> times increase over the life of the process inexplicably. We are >>>>>> working with telephony hardware with this application, so >>>>>> keeping GC >>>>>> >>>>>> pauses very low is quite important. 
(keeping well below 100 >>>>>> ms would >>>>>> >>>>>> be ideal) >>>>>> >>>>>> Here is the current configuration we are using: >>>>>> >>>>>> -server \ >>>>>> -Xloggc:garbage.log \ >>>>>> -XX:+PrintGCDetails \ >>>>>> -Dsun.rmi.dgc.server.gcInterval=3600000 \ >>>>>> -Dsun.rmi.dgc.client.gcInterval=3600000 \ >>>>>> -XX:ParallelGCThreads=8 \ >>>>>> -XX:+UseParNewGC \ >>>>>> -XX:+UseConcMarkSweepGC \ >>>>>> -XX:+PrintGCTimeStamps \ >>>>>> -XX:-TraceClassUnloading \ >>>>>> -XX:+AggressiveOpts \ >>>>>> -Xmx512M \ >>>>>> -Xms512M \ >>>>>> -Xmn128M \ >>>>>> -XX:MaxTenuringThreshold=6 \ >>>>>> -XX:+ExplicitGCInvokesConcurrent >>>>>> >>>>>> A large number of our bigger objects size-wise live for >>>>>> approximately >>>>>> >>>>>> 4-5 minutes, thus the larger young generation, and tenuring >>>>>> threshold. >>>>>> >>>>>> This seems to be successful in filtering most objects before they >>>>>> reach the tenured gen. (8 core x86 server, running 1.6.0_03- >>>>>> b05 on 32- >>>>>> >>>>>> bit Linux, kernel rev 2.6.18) >>>>>> >>>>>> Here is a representative snippet of our garbage log: >>>>>> >>>>>> 487.135: [GC 487.135: [ParNew: 112726K->7290K(118016K), 0.0218110 >>>>>> secs] 134494K->29058K(511232K), 0.0220520 secs] >>>>>> 557.294: [GC 557.294: [ParNew: 112250K->7976K(118016K), 0.0204220 >>>>>> secs] 134018K->29744K(511232K), 0.0206690 secs] >>>>>> 607.025: [GC 607.025: [ParNew: 112936K->7831K(118016K), 0.0231230 >>>>>> secs] 134704K->30003K(511232K), 0.0233670 secs] >>>>>> 672.522: [GC 672.522: [ParNew: 112791K->7361K(118016K), 0.0253620 >>>>>> secs] 134963K->29533K(511232K), 0.0256080 secs] >>>>>> ... >>>>>> 4006.635: [GC 4006.635: [ParNew: 112983K->7386K(118016K), >>>>>> 0.0385960 >>>>>> secs] 141969K->36608K(511232K), 0.0388460 secs] >>>>>> 4083.066: [GC 4083.066: [ParNew: 112346K->8439K(118016K), >>>>>> 0.0365940 >>>>>> secs] 141568K->37661K(511232K), 0.0368340 secs] >>>>>> 4158.457: [GC 4158.457: [ParNew: 113399K->7152K(118016K), >>>>>> 0.0360540 >>>>>> secs] 142621K->36374K(511232K), 0.0362990 secs] >>>>>> 4228.312: [GC 4228.313: [ParNew: 112112K->8738K(118016K), >>>>>> 0.0362510 >>>>>> secs] 141334K->38083K(511232K), 0.0365050 secs] >>>>>> 4293.800: [GC 4293.800: [ParNew: 113698K->8294K(118016K), >>>>>> 0.0368700 >>>>>> secs] 143043K->37917K(511232K), 0.0371160 secs] >>>>>> ... >>>>>> 10489.555: [GC 10489.556: [ParNew: 112701K->7770K(118016K), >>>>>> 0.0639770 >>>>>> >>>>>> secs] 151966K->47156K(511232K), 0.0642210 secs] >>>>>> 10562.544: [GC 10562.544: [ParNew: 112730K->9267K(118016K), >>>>>> 0.0625900 >>>>>> >>>>>> secs] 152116K->48772K(511232K), 0.0628470 secs] >>>>>> 10622.558: [GC 10622.558: [ParNew: 114227K->8361K(118016K), >>>>>> 0.0675730 >>>>>> >>>>>> secs] 153732K->48381K(511232K), 0.0678220 secs] >>>>>> 10678.842: [GC 10678.842: [ParNew: 113056K->7214K(118016K), >>>>>> 0.0669330 >>>>>> >>>>>> secs] 153076K->47234K(511232K), 0.0671800 secs] >>>>>> ... 
>>>>>> 177939.062: [GC 177939.062: [ParNew: 112608K->8620K(118016K), >>>>>> 0.7681440 secs] 466132K->362144K(511232K), 0.7684030 secs] >>>>>> 178005.483: [GC 178005.483: [ParNew: 113449K->7731K(118016K), >>>>>> 0.7677300 secs] 466973K->361893K(511232K), 0.7679890 secs] >>>>>> 178069.658: [GC 178069.658: [ParNew: 112670K->6814K(118016K), >>>>>> 0.7700020 secs] 466832K->360976K(511232K), 0.7702590 secs] >>>>>> 178133.513: [GC 178133.513: [ParNew: 111717K->7920K(118016K), >>>>>> 0.7702920 secs] 465879K->362082K(511232K), 0.7705560 secs] >>>>>> >>>>>> As you can see, the gc times continue to increase over time, on >>>>>> the >>>>>> order of about 10-20ms per hour. CMS runs are spaced very far >>>>>> apart, >>>>>> >>>>>> in fact, since most objects die before reaching the tenured >>>>>> generation, the CMS is triggered more by RMI DGC runs then by >>>>>> heap >>>>>> growth. (We were getting serial GCs, apparently due to RMI DGC >>>>>> before adding -XX:+ExplicitGCInvokesConcurrent) >>>>>> >>>>>> Here's some representative output from `jstat -gcutil -t -h10 >>>>>> 2s`: >>>>>> >>>>>> Timestamp S0 S1 E O P YGC YGCT FGC FGCT GCT >>>>>> 11067.6 55.74 0.00 89.32 9.73 59.84 168 7.471 8 0.280 7.751 >>>>>> 11069.6 55.74 0.00 93.65 9.73 59.84 168 7.471 8 0.280 7.751 >>>>>> 11071.6 55.74 0.00 99.34 9.73 59.84 168 7.471 8 0.280 7.751 >>>>>> 11073.5 0.00 62.22 2.89 9.76 59.84 169 7.537 8 0.280 7.816 >>>>>> 11075.6 0.00 62.22 4.80 9.76 59.84 169 7.537 8 0.280 7.816 >>>>>> >>>>>> Survivor spaces continue to sit at about 50-65% occupancy, >>>>>> which seems >>>>>> >>>>>> fairly good to my eye. Eden fills approximately every 70 >>>>>> seconds, >>>>>> triggering minor GCs. >>>>>> >>>>>> >>>>>> Any ideas? This is becoming quite frustrating for us -- our >>>>>> application uptime is pretty horrible with the too-frequent >>>>>> scheduled >>>>>> >>>>>> restarts we are being forced to run. >>>>>> >>>>>> Thanks for any assistance you might be able to offer, >>>>>> -Jason Vasquez >>>>>> _______________________________________________ >>>>>> hotspot-gc-use mailing list >>>>>> hotspot-gc-use at openjdk.java.net >>>>>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > From jason at mugfu.com Thu Jan 3 02:12:58 2008 From: jason at mugfu.com (Jason Vasquez) Date: Thu, 3 Jan 2008 05:12:58 -0500 Subject: Perplexing GC Time Growth In-Reply-To: <477C1718.70004@Sun.COM> References: <9BADD5B8-F9DA-4656-843B-7D44FF36963A@mugfu.com> <477C09ED.9020400@Sun.COM> <22133221-306B-41D3-AE57-155876104354@mugfu.com> <477C1718.70004@Sun.COM> Message-ID: One other thing of interest to note, beyond the classloading/ reflection work, we do quite a bit of JNI interaction as well. Not sure how that would affect GC times, but wanted to make sure that information was in the mix as well. I have done some cursory audits of that code, it appears that all heap-allocated objects are being freed properly from the native side of things, I think we should be OK in that regard. -jason On Jan 2, 2008, at 17:58 , Y.S.Ramakrishna at Sun.COM wrote: > Jason Vasquez wrote: >> I've enabled the flag and am starting the test now. It takes a >> few hours before we can see definite trending, but I'll let you >> know what we see. >> And yes, we do a fair bit of classloading/reflection work in the >> application. 
> > On the hunch that it was the accumulation of classes in the perm gen > and the system dictionary that was the root cause > of the troubles here (which the explicit class unloading will now > hopefully take care of -- recall that cms does not by default unload > classes to keep cms remark pause times low), you might want to monitor > the size of the perm gen (with and without the > CMSClassUnloadingEnabled > option) by means of "jstat -gc ..." (which will show perm gen > utilization under the "PU" column). > > This (lack of class unloading w/CMS unless explicitly requested) > will hopefully be much less of an issue going forward as > a result of work currently being done towards 6543076 > (some of which occurred under 6634032 recently discussed > on this list); until that time you might find the feature > in 6541037 (available as of 6u4 in a production JVM) useful. > > I look forward to the results of yr test; thanks! > -- ramki > >> Thanks, >> jason >> On Jan 2, 2008, at 17:02 , Y.S.Ramakrishna at Sun.COM wrote: >>> Very cool log, showing a near linear and monotonic growth >>> not only in scavenge times, but also in initial mark and >>> remark times, see attached plots produced by the gc histogram >>> tool (from Tony Printezis, John Coomes et al). Then, looking >>> also at the log file, one also sees that the live data following >>> a GC is also increasing monotonically. >>> >>> While we think of potential reasons for this, or mull over >>> appropriate sensors that can lay bare the root cause here, >>> could you, on a quick hunch, do a quick experiment and tell >>> me if adding the options -XX:+CMSClassUnloadingEnabled makes >>> any difference to the observations above. [Does your application >>> do lots of class loading or reflection or string interning?] >>> >>> thanks. >>> -- ramki >>> >>> Jason Vasquez wrote: >>>> I've attached a representative garbage log. To my eye, I don't >>>> see anything specific that seems to indicate an event, but >>>> hopefully more experienced eyes will see something different :) >>>> Thanks, >>>> -jason >>>> On Jan 1, 2008, at 15:46 , Y Srinivas Ramakrishna wrote: >>>>> It's probably a combination of card-scanning times and >>>>> allocation slow-down >>>>> (but probably more of the former). >>>>> >>>>> We've had some internal instrumentation of card-scanning times >>>>> in the JVM which >>>>> unfortunately has not made into the JVM code proper because the >>>>> instrumentation >>>>> is not as lightweight as to be enabled in production. Perhaps a >>>>> spin on a test system >>>>> with the card-scanning times explicitly called out might shed >>>>> light. >>>>> >>>>> Basically what happens with CMS is that allocation is from free >>>>> lists, and lacking >>>>> something like big bag of pages (BBOP) allocation, this has >>>>> traditionally tended to >>>>> scatter the allocated objects over a large number of pages. >>>>> This increases card-scanning >>>>> times, although one would normally expect that this would >>>>> eventually >>>>> stabilize. >>>>> >>>>> Do the scavenge times increase suddenly after a specific event >>>>> or do they just >>>>> creep up slowly after each scavenge? The complete GC log would >>>>> be useful to >>>>> look at to answer that question. 
>>>>> >>>>> -- ramki >>>>> >>>>> >>>>> ----- Original Message ----- >>>>> From: Jason Vasquez >>>>> Date: Monday, December 31, 2007 10:21 am >>>>> Subject: Perplexing GC Time Growth >>>>> To: hotspot-gc-use at openjdk.java.net >>>>> >>>>> >>>>>> Hi all, >>>>>> >>>>>> I'm having a perplexing problem -- the garbage collector >>>>>> appears to be >>>>>> >>>>>> functioning well, with a nice object/garbage lifecycle, yet >>>>>> minor GC >>>>>> >>>>>> times increase over the life of the process inexplicably. We are >>>>>> working with telephony hardware with this application, so >>>>>> keeping GC >>>>>> >>>>>> pauses very low is quite important. (keeping well below 100 >>>>>> ms would >>>>>> >>>>>> be ideal) >>>>>> >>>>>> Here is the current configuration we are using: >>>>>> >>>>>> -server \ >>>>>> -Xloggc:garbage.log \ >>>>>> -XX:+PrintGCDetails \ >>>>>> -Dsun.rmi.dgc.server.gcInterval=3600000 \ >>>>>> -Dsun.rmi.dgc.client.gcInterval=3600000 \ >>>>>> -XX:ParallelGCThreads=8 \ >>>>>> -XX:+UseParNewGC \ >>>>>> -XX:+UseConcMarkSweepGC \ >>>>>> -XX:+PrintGCTimeStamps \ >>>>>> -XX:-TraceClassUnloading \ >>>>>> -XX:+AggressiveOpts \ >>>>>> -Xmx512M \ >>>>>> -Xms512M \ >>>>>> -Xmn128M \ >>>>>> -XX:MaxTenuringThreshold=6 \ >>>>>> -XX:+ExplicitGCInvokesConcurrent >>>>>> >>>>>> A large number of our bigger objects size-wise live for >>>>>> approximately >>>>>> >>>>>> 4-5 minutes, thus the larger young generation, and tenuring >>>>>> threshold. >>>>>> >>>>>> This seems to be successful in filtering most objects before they >>>>>> reach the tenured gen. (8 core x86 server, running 1.6.0_03- >>>>>> b05 on 32- >>>>>> >>>>>> bit Linux, kernel rev 2.6.18) >>>>>> >>>>>> Here is a representative snippet of our garbage log: >>>>>> >>>>>> 487.135: [GC 487.135: [ParNew: 112726K->7290K(118016K), 0.0218110 >>>>>> secs] 134494K->29058K(511232K), 0.0220520 secs] >>>>>> 557.294: [GC 557.294: [ParNew: 112250K->7976K(118016K), 0.0204220 >>>>>> secs] 134018K->29744K(511232K), 0.0206690 secs] >>>>>> 607.025: [GC 607.025: [ParNew: 112936K->7831K(118016K), 0.0231230 >>>>>> secs] 134704K->30003K(511232K), 0.0233670 secs] >>>>>> 672.522: [GC 672.522: [ParNew: 112791K->7361K(118016K), 0.0253620 >>>>>> secs] 134963K->29533K(511232K), 0.0256080 secs] >>>>>> ... >>>>>> 4006.635: [GC 4006.635: [ParNew: 112983K->7386K(118016K), >>>>>> 0.0385960 >>>>>> secs] 141969K->36608K(511232K), 0.0388460 secs] >>>>>> 4083.066: [GC 4083.066: [ParNew: 112346K->8439K(118016K), >>>>>> 0.0365940 >>>>>> secs] 141568K->37661K(511232K), 0.0368340 secs] >>>>>> 4158.457: [GC 4158.457: [ParNew: 113399K->7152K(118016K), >>>>>> 0.0360540 >>>>>> secs] 142621K->36374K(511232K), 0.0362990 secs] >>>>>> 4228.312: [GC 4228.313: [ParNew: 112112K->8738K(118016K), >>>>>> 0.0362510 >>>>>> secs] 141334K->38083K(511232K), 0.0365050 secs] >>>>>> 4293.800: [GC 4293.800: [ParNew: 113698K->8294K(118016K), >>>>>> 0.0368700 >>>>>> secs] 143043K->37917K(511232K), 0.0371160 secs] >>>>>> ... >>>>>> 10489.555: [GC 10489.556: [ParNew: 112701K->7770K(118016K), >>>>>> 0.0639770 >>>>>> >>>>>> secs] 151966K->47156K(511232K), 0.0642210 secs] >>>>>> 10562.544: [GC 10562.544: [ParNew: 112730K->9267K(118016K), >>>>>> 0.0625900 >>>>>> >>>>>> secs] 152116K->48772K(511232K), 0.0628470 secs] >>>>>> 10622.558: [GC 10622.558: [ParNew: 114227K->8361K(118016K), >>>>>> 0.0675730 >>>>>> >>>>>> secs] 153732K->48381K(511232K), 0.0678220 secs] >>>>>> 10678.842: [GC 10678.842: [ParNew: 113056K->7214K(118016K), >>>>>> 0.0669330 >>>>>> >>>>>> secs] 153076K->47234K(511232K), 0.0671800 secs] >>>>>> ... 
>>>>>> 177939.062: [GC 177939.062: [ParNew: 112608K->8620K(118016K), 0.7681440 secs] 466132K->362144K(511232K), 0.7684030 secs]
>>>>>> 178005.483: [GC 178005.483: [ParNew: 113449K->7731K(118016K), 0.7677300 secs] 466973K->361893K(511232K), 0.7679890 secs]
>>>>>> 178069.658: [GC 178069.658: [ParNew: 112670K->6814K(118016K), 0.7700020 secs] 466832K->360976K(511232K), 0.7702590 secs]
>>>>>> 178133.513: [GC 178133.513: [ParNew: 111717K->7920K(118016K), 0.7702920 secs] 465879K->362082K(511232K), 0.7705560 secs]
>>>>>>
>>>>>> As you can see, the gc times continue to increase over time, on the order of about 10-20ms per hour. CMS runs are spaced very far apart, in fact, since most objects die before reaching the tenured generation, the CMS is triggered more by RMI DGC runs then by heap growth. (We were getting serial GCs, apparently due to RMI DGC before adding -XX:+ExplicitGCInvokesConcurrent)
>>>>>>
>>>>>> Here's some representative output from `jstat -gcutil -t -h10 2s`:
>>>>>>
>>>>>> Timestamp S0 S1 E O P YGC YGCT FGC FGCT GCT
>>>>>> 11067.6 55.74 0.00 89.32 9.73 59.84 168 7.471 8 0.280 7.751
>>>>>> 11069.6 55.74 0.00 93.65 9.73 59.84 168 7.471 8 0.280 7.751
>>>>>> 11071.6 55.74 0.00 99.34 9.73 59.84 168 7.471 8 0.280 7.751
>>>>>> 11073.5 0.00 62.22 2.89 9.76 59.84 169 7.537 8 0.280 7.816
>>>>>> 11075.6 0.00 62.22 4.80 9.76 59.84 169 7.537 8 0.280 7.816
>>>>>>
>>>>>> Survivor spaces continue to sit at about 50-65% occupancy, which seems fairly good to my eye. Eden fills approximately every 70 seconds, triggering minor GCs.
>>>>>>
>>>>>> Any ideas? This is becoming quite frustrating for us -- our application uptime is pretty horrible with the too-frequent scheduled restarts we are being forced to run.
>>>>>>
>>>>>> Thanks for any assistance you might be able to offer,
>>>>>> -Jason Vasquez
>>>>>> _______________________________________________
>>>>>> hotspot-gc-use mailing list
>>>>>> hotspot-gc-use at openjdk.java.net
>>>>>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
>>>
>> _______________________________________________
>> hotspot-gc-use mailing list
>> hotspot-gc-use at openjdk.java.net
>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
>

From jason at mugfu.com Thu Jan 3 13:42:52 2008
From: jason at mugfu.com (Jason Vasquez)
Date: Thu, 3 Jan 2008 16:42:52 -0500
Subject: Perplexing GC Time Growth
In-Reply-To: <477D2529.2030507@Sun.COM>
References: <9BADD5B8-F9DA-4656-843B-7D44FF36963A@mugfu.com> <477C09ED.9020400@Sun.COM> <22133221-306B-41D3-AE57-155876104354@mugfu.com> <477C1718.70004@Sun.COM> <07B7B19B-9EEA-4AB2-8CB1-3FCF1A2524F3@mugfu.com> <477D2529.2030507@Sun.COM>
Message-ID: <20940539-5615-412D-AD4C-28F32A4D6BFA@mugfu.com>

During the latest test run, I did observe PU/PC for a bit -- although I did not notice a decrease in PU after a CMS (unfortunately I was only able to observe through one CMS). Late in the test, here are the current values:

PC: 28824.0
PU: 16582.6

Doesn't actually seem like that large of a number to me. PU was approximately 14157 several hours earlier.

1) Yes, we do have a test setup where we can simulate full load and perform experiments, I would be more than happy to gather whatever data would be helpful.

2) Unfortunately, no official support contract at this point.
I'll see what can be done if it becomes absolutely necessary, but we're not a large company at this point :) -jason On Jan 3, 2008, at 13:10 , Y.S.Ramakrishna at Sun.COM wrote: > Hi Jason -- thanks for the new data. I was wondering if you also > got this data:- > > >>> ... you might want to monitor >>> the size of the perm gen (with and without the >>> CMSClassUnloadingEnabled >>> option) by means of "jstat -gc ..." (which will show perm gen >>> utilization under the "PU" column). > > > And two follow-up questions: > > (1) do you have a test set-up on which some experiments could be > tried? > If so, i will be happy to suggest some follow-up data gathering > that might aid diagnosis. > > (2) do you have a Sun support contract that we could leverage to get > our support engineers looking at this issue? (If so, can you send > me either your contract #, or at least your company name so we can > dig up your support contract info). > > thanks. > -- ramki From jason at mugfu.com Thu Jan 3 13:44:35 2008 From: jason at mugfu.com (Jason Vasquez) Date: Thu, 3 Jan 2008 16:44:35 -0500 Subject: Perplexing GC Time Growth In-Reply-To: <477D2F43.5010603@Sun.COM> References: <9BADD5B8-F9DA-4656-843B-7D44FF36963A@mugfu.com> <477C09ED.9020400@Sun.COM> <22133221-306B-41D3-AE57-155876104354@mugfu.com> <477C1718.70004@Sun.COM> <477D2F43.5010603@Sun.COM> Message-ID: I'll try doing more histogram audits on the next run. We can try serial or parallel GC -- the system will work, but the quality of the application will be horrible -- fortunately doesn't really matter too much in this environment :) Thanks -jason On Jan 3, 2008, at 13:53 , Y.S.Ramakrishna at Sun.COM wrote: > Jason Vasquez wrote: >> One other thing of interest to note, beyond the classloading/ >> reflection work, we do quite a bit of JNI interaction as well. Not >> sure how that would affect GC times, but wanted to make sure that >> information was in the mix as well. I have done some cursory >> audits of that code, it appears that all heap-allocated objects are >> being freed properly from the native side of things, I think we >> should be OK in that regard. > > You might want to run a jmap -histo audit on a periodic basis (perhaps > each time a CMS cycle completes) to > see what might be accumulating in the heap since there is a definite > growth in the program heap footprint. > > It is of course possible that some form of JNI handles are piling > up and also causing leakage in the Java heap. > > The weight of the evidence (the increase in scavenge times, initial > mark times, remark times and heap usage) seems to point towards an > increase in the #roots. Let me see if we can get a suitably > instrumented > JVM to you (or at least see if there's existing instrumentation there > which can be enabled to gather information). > > I am also wondering if it might be possible to run with for > example -XX:+UseSerialGC or -XX:+UseParallelGC to see if the > same increase in scavenge times is noted. (If it's feasible to > run with these collectors without badly compromising the fidelity > of the testing set-up.) > > thanks. > -- ramki From jason at mugfu.com Wed Jan 2 10:42:28 2008 From: jason at mugfu.com (Jason Vasquez) Date: Wed, 2 Jan 2008 13:42:28 -0500 Subject: Perplexing GC Time Growth In-Reply-To: References: Message-ID: <9BADD5B8-F9DA-4656-843B-7D44FF36963A@mugfu.com> I've attached a representative garbage log. 
To my eye, I don't see anything specific that seems to indicate an event, but hopefully more experienced eyes will see something different :) Thanks, -jason -------------- next part -------------- A non-text attachment was scrubbed... Name: garbage.log.gz Type: application/x-gzip Size: 33347 bytes Desc: not available Url : http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20080102/ed0257a8/attachment.bin -------------- next part -------------- On Jan 1, 2008, at 15:46 , Y Srinivas Ramakrishna wrote: > > It's probably a combination of card-scanning times and allocation > slow-down > (but probably more of the former). > > We've had some internal instrumentation of card-scanning times in > the JVM which > unfortunately has not made into the JVM code proper because the > instrumentation > is not as lightweight as to be enabled in production. Perhaps a spin > on a test system > with the card-scanning times explicitly called out might shed light. > > Basically what happens with CMS is that allocation is from free > lists, and lacking > something like big bag of pages (BBOP) allocation, this has > traditionally tended to > scatter the allocated objects over a large number of pages. This > increases card-scanning > times, although one would normally expect that this would eventually > stabilize. > > Do the scavenge times increase suddenly after a specific event or do > they just > creep up slowly after each scavenge? The complete GC log would be > useful to > look at to answer that question. > > -- ramki > > > ----- Original Message ----- > From: Jason Vasquez > Date: Monday, December 31, 2007 10:21 am > Subject: Perplexing GC Time Growth > To: hotspot-gc-use at openjdk.java.net > > >> Hi all, >> >> I'm having a perplexing problem -- the garbage collector appears to >> be >> >> functioning well, with a nice object/garbage lifecycle, yet minor GC >> >> times increase over the life of the process inexplicably. We are >> working with telephony hardware with this application, so keeping GC >> >> pauses very low is quite important. (keeping well below 100 ms would >> >> be ideal) >> >> Here is the current configuration we are using: >> >> -server \ >> -Xloggc:garbage.log \ >> -XX:+PrintGCDetails \ >> -Dsun.rmi.dgc.server.gcInterval=3600000 \ >> -Dsun.rmi.dgc.client.gcInterval=3600000 \ >> -XX:ParallelGCThreads=8 \ >> -XX:+UseParNewGC \ >> -XX:+UseConcMarkSweepGC \ >> -XX:+PrintGCTimeStamps \ >> -XX:-TraceClassUnloading \ >> -XX:+AggressiveOpts \ >> -Xmx512M \ >> -Xms512M \ >> -Xmn128M \ >> -XX:MaxTenuringThreshold=6 \ >> -XX:+ExplicitGCInvokesConcurrent >> >> A large number of our bigger objects size-wise live for approximately >> >> 4-5 minutes, thus the larger young generation, and tenuring >> threshold. >> >> This seems to be successful in filtering most objects before they >> reach the tenured gen. (8 core x86 server, running 1.6.0_03-b05 on >> 32- >> >> bit Linux, kernel rev 2.6.18) >> >> Here is a representative snippet of our garbage log: >> >> 487.135: [GC 487.135: [ParNew: 112726K->7290K(118016K), 0.0218110 >> secs] 134494K->29058K(511232K), 0.0220520 secs] >> 557.294: [GC 557.294: [ParNew: 112250K->7976K(118016K), 0.0204220 >> secs] 134018K->29744K(511232K), 0.0206690 secs] >> 607.025: [GC 607.025: [ParNew: 112936K->7831K(118016K), 0.0231230 >> secs] 134704K->30003K(511232K), 0.0233670 secs] >> 672.522: [GC 672.522: [ParNew: 112791K->7361K(118016K), 0.0253620 >> secs] 134963K->29533K(511232K), 0.0256080 secs] >> ... 
>> 4006.635: [GC 4006.635: [ParNew: 112983K->7386K(118016K), 0.0385960 >> secs] 141969K->36608K(511232K), 0.0388460 secs] >> 4083.066: [GC 4083.066: [ParNew: 112346K->8439K(118016K), 0.0365940 >> secs] 141568K->37661K(511232K), 0.0368340 secs] >> 4158.457: [GC 4158.457: [ParNew: 113399K->7152K(118016K), 0.0360540 >> secs] 142621K->36374K(511232K), 0.0362990 secs] >> 4228.312: [GC 4228.313: [ParNew: 112112K->8738K(118016K), 0.0362510 >> secs] 141334K->38083K(511232K), 0.0365050 secs] >> 4293.800: [GC 4293.800: [ParNew: 113698K->8294K(118016K), 0.0368700 >> secs] 143043K->37917K(511232K), 0.0371160 secs] >> ... >> 10489.555: [GC 10489.556: [ParNew: 112701K->7770K(118016K), 0.0639770 >> >> secs] 151966K->47156K(511232K), 0.0642210 secs] >> 10562.544: [GC 10562.544: [ParNew: 112730K->9267K(118016K), 0.0625900 >> >> secs] 152116K->48772K(511232K), 0.0628470 secs] >> 10622.558: [GC 10622.558: [ParNew: 114227K->8361K(118016K), 0.0675730 >> >> secs] 153732K->48381K(511232K), 0.0678220 secs] >> 10678.842: [GC 10678.842: [ParNew: 113056K->7214K(118016K), 0.0669330 >> >> secs] 153076K->47234K(511232K), 0.0671800 secs] >> ... >> 177939.062: [GC 177939.062: [ParNew: 112608K->8620K(118016K), >> 0.7681440 secs] 466132K->362144K(511232K), 0.7684030 secs] >> 178005.483: [GC 178005.483: [ParNew: 113449K->7731K(118016K), >> 0.7677300 secs] 466973K->361893K(511232K), 0.7679890 secs] >> 178069.658: [GC 178069.658: [ParNew: 112670K->6814K(118016K), >> 0.7700020 secs] 466832K->360976K(511232K), 0.7702590 secs] >> 178133.513: [GC 178133.513: [ParNew: 111717K->7920K(118016K), >> 0.7702920 secs] 465879K->362082K(511232K), 0.7705560 secs] >> >> As you can see, the gc times continue to increase over time, on the >> order of about 10-20ms per hour. CMS runs are spaced very far apart, >> >> in fact, since most objects die before reaching the tenured >> generation, the CMS is triggered more by RMI DGC runs then by heap >> growth. (We were getting serial GCs, apparently due to RMI DGC >> before adding -XX:+ExplicitGCInvokesConcurrent) >> >> Here's some representative output from `jstat -gcutil -t -h10 >> 2s`: >> >> Timestamp S0 S1 E O P YGC YGCT FGC FGCT GCT >> 11067.6 55.74 0.00 89.32 9.73 59.84 168 7.471 8 0.280 7.751 >> 11069.6 55.74 0.00 93.65 9.73 59.84 168 7.471 8 0.280 7.751 >> 11071.6 55.74 0.00 99.34 9.73 59.84 168 7.471 8 0.280 7.751 >> 11073.5 0.00 62.22 2.89 9.76 59.84 169 7.537 8 0.280 7.816 >> 11075.6 0.00 62.22 4.80 9.76 59.84 169 7.537 8 0.280 7.816 >> >> Survivor spaces continue to sit at about 50-65% occupancy, which >> seems >> >> fairly good to my eye. Eden fills approximately every 70 seconds, >> triggering minor GCs. >> >> >> Any ideas? This is becoming quite frustrating for us -- our >> application uptime is pretty horrible with the too-frequent scheduled >> >> restarts we are being forced to run. 
>> >> Thanks for any assistance you might be able to offer, >> -Jason Vasquez >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use From Y.S.Ramakrishna at Sun.COM Wed Jan 2 14:02:21 2008 From: Y.S.Ramakrishna at Sun.COM (Y.S.Ramakrishna at Sun.COM) Date: Wed, 02 Jan 2008 14:02:21 -0800 Subject: Perplexing GC Time Growth In-Reply-To: <9BADD5B8-F9DA-4656-843B-7D44FF36963A@mugfu.com> References: <9BADD5B8-F9DA-4656-843B-7D44FF36963A@mugfu.com> Message-ID: <477C09ED.9020400@Sun.COM> Very cool log, showing a near linear and monotonic growth not only in scavenge times, but also in initial mark and remark times, see attached plots produced by the gc histogram tool (from Tony Printezis, John Coomes et al). Then, looking also at the log file, one also sees that the live data following a GC is also increasing monotonically. While we think of potential reasons for this, or mull over appropriate sensors that can lay bare the root cause here, could you, on a quick hunch, do a quick experiment and tell me if adding the options -XX:+CMSClassUnloadingEnabled makes any difference to the observations above. [Does your application do lots of class loading or reflection or string interning?] thanks. -- ramki Jason Vasquez wrote: > I've attached a representative garbage log. To my eye, I don't see > anything specific that seems to indicate an event, but hopefully more > experienced eyes will see something different :) > > Thanks, > -jason > > > > > > On Jan 1, 2008, at 15:46 , Y Srinivas Ramakrishna wrote: > >> >> It's probably a combination of card-scanning times and allocation >> slow-down >> (but probably more of the former). >> >> We've had some internal instrumentation of card-scanning times in the >> JVM which >> unfortunately has not made into the JVM code proper because the >> instrumentation >> is not as lightweight as to be enabled in production. Perhaps a spin >> on a test system >> with the card-scanning times explicitly called out might shed light. >> >> Basically what happens with CMS is that allocation is from free lists, >> and lacking >> something like big bag of pages (BBOP) allocation, this has >> traditionally tended to >> scatter the allocated objects over a large number of pages. This >> increases card-scanning >> times, although one would normally expect that this would eventually >> stabilize. >> >> Do the scavenge times increase suddenly after a specific event or do >> they just >> creep up slowly after each scavenge? The complete GC log would be >> useful to >> look at to answer that question. >> >> -- ramki >> >> >> ----- Original Message ----- >> From: Jason Vasquez >> Date: Monday, December 31, 2007 10:21 am >> Subject: Perplexing GC Time Growth >> To: hotspot-gc-use at openjdk.java.net >> >> >>> Hi all, >>> >>> I'm having a perplexing problem -- the garbage collector appears to be >>> >>> functioning well, with a nice object/garbage lifecycle, yet minor GC >>> >>> times increase over the life of the process inexplicably. We are >>> working with telephony hardware with this application, so keeping GC >>> >>> pauses very low is quite important. 
(keeping well below 100 ms would >>> >>> be ideal) >>> >>> Here is the current configuration we are using: >>> >>> -server \ >>> -Xloggc:garbage.log \ >>> -XX:+PrintGCDetails \ >>> -Dsun.rmi.dgc.server.gcInterval=3600000 \ >>> -Dsun.rmi.dgc.client.gcInterval=3600000 \ >>> -XX:ParallelGCThreads=8 \ >>> -XX:+UseParNewGC \ >>> -XX:+UseConcMarkSweepGC \ >>> -XX:+PrintGCTimeStamps \ >>> -XX:-TraceClassUnloading \ >>> -XX:+AggressiveOpts \ >>> -Xmx512M \ >>> -Xms512M \ >>> -Xmn128M \ >>> -XX:MaxTenuringThreshold=6 \ >>> -XX:+ExplicitGCInvokesConcurrent >>> >>> A large number of our bigger objects size-wise live for approximately >>> >>> 4-5 minutes, thus the larger young generation, and tenuring threshold. >>> >>> This seems to be successful in filtering most objects before they >>> reach the tenured gen. (8 core x86 server, running 1.6.0_03-b05 on 32- >>> >>> bit Linux, kernel rev 2.6.18) >>> >>> Here is a representative snippet of our garbage log: >>> >>> 487.135: [GC 487.135: [ParNew: 112726K->7290K(118016K), 0.0218110 >>> secs] 134494K->29058K(511232K), 0.0220520 secs] >>> 557.294: [GC 557.294: [ParNew: 112250K->7976K(118016K), 0.0204220 >>> secs] 134018K->29744K(511232K), 0.0206690 secs] >>> 607.025: [GC 607.025: [ParNew: 112936K->7831K(118016K), 0.0231230 >>> secs] 134704K->30003K(511232K), 0.0233670 secs] >>> 672.522: [GC 672.522: [ParNew: 112791K->7361K(118016K), 0.0253620 >>> secs] 134963K->29533K(511232K), 0.0256080 secs] >>> ... >>> 4006.635: [GC 4006.635: [ParNew: 112983K->7386K(118016K), 0.0385960 >>> secs] 141969K->36608K(511232K), 0.0388460 secs] >>> 4083.066: [GC 4083.066: [ParNew: 112346K->8439K(118016K), 0.0365940 >>> secs] 141568K->37661K(511232K), 0.0368340 secs] >>> 4158.457: [GC 4158.457: [ParNew: 113399K->7152K(118016K), 0.0360540 >>> secs] 142621K->36374K(511232K), 0.0362990 secs] >>> 4228.312: [GC 4228.313: [ParNew: 112112K->8738K(118016K), 0.0362510 >>> secs] 141334K->38083K(511232K), 0.0365050 secs] >>> 4293.800: [GC 4293.800: [ParNew: 113698K->8294K(118016K), 0.0368700 >>> secs] 143043K->37917K(511232K), 0.0371160 secs] >>> ... >>> 10489.555: [GC 10489.556: [ParNew: 112701K->7770K(118016K), 0.0639770 >>> >>> secs] 151966K->47156K(511232K), 0.0642210 secs] >>> 10562.544: [GC 10562.544: [ParNew: 112730K->9267K(118016K), 0.0625900 >>> >>> secs] 152116K->48772K(511232K), 0.0628470 secs] >>> 10622.558: [GC 10622.558: [ParNew: 114227K->8361K(118016K), 0.0675730 >>> >>> secs] 153732K->48381K(511232K), 0.0678220 secs] >>> 10678.842: [GC 10678.842: [ParNew: 113056K->7214K(118016K), 0.0669330 >>> >>> secs] 153076K->47234K(511232K), 0.0671800 secs] >>> ... >>> 177939.062: [GC 177939.062: [ParNew: 112608K->8620K(118016K), >>> 0.7681440 secs] 466132K->362144K(511232K), 0.7684030 secs] >>> 178005.483: [GC 178005.483: [ParNew: 113449K->7731K(118016K), >>> 0.7677300 secs] 466973K->361893K(511232K), 0.7679890 secs] >>> 178069.658: [GC 178069.658: [ParNew: 112670K->6814K(118016K), >>> 0.7700020 secs] 466832K->360976K(511232K), 0.7702590 secs] >>> 178133.513: [GC 178133.513: [ParNew: 111717K->7920K(118016K), >>> 0.7702920 secs] 465879K->362082K(511232K), 0.7705560 secs] >>> >>> As you can see, the gc times continue to increase over time, on the >>> order of about 10-20ms per hour. CMS runs are spaced very far apart, >>> >>> in fact, since most objects die before reaching the tenured >>> generation, the CMS is triggered more by RMI DGC runs then by heap >>> growth. 
(We were getting serial GCs, apparently due to RMI DGC >>> before adding -XX:+ExplicitGCInvokesConcurrent) >>> >>> Here's some representative output from `jstat -gcutil -t -h10 2s`: >>> >>> Timestamp S0 S1 E O P YGC YGCT FGC FGCT GCT >>> 11067.6 55.74 0.00 89.32 9.73 59.84 168 7.471 8 0.280 7.751 >>> 11069.6 55.74 0.00 93.65 9.73 59.84 168 7.471 8 0.280 7.751 >>> 11071.6 55.74 0.00 99.34 9.73 59.84 168 7.471 8 0.280 7.751 >>> 11073.5 0.00 62.22 2.89 9.76 59.84 169 7.537 8 0.280 7.816 >>> 11075.6 0.00 62.22 4.80 9.76 59.84 169 7.537 8 0.280 7.816 >>> >>> Survivor spaces continue to sit at about 50-65% occupancy, which seems >>> >>> fairly good to my eye. Eden fills approximately every 70 seconds, >>> triggering minor GCs. >>> >>> >>> Any ideas? This is becoming quite frustrating for us -- our >>> application uptime is pretty horrible with the too-frequent scheduled >>> >>> restarts we are being forced to run. >>> >>> Thanks for any assistance you might be able to offer, >>> -Jason Vasquez >>> _______________________________________________ >>> hotspot-gc-use mailing list >>> hotspot-gc-use at openjdk.java.net >>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > -------------- next part -------------- A non-text attachment was scrubbed... Name: scavenge.png Type: image/png Size: 28214 bytes Desc: not available Url : http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20080102/8683ca91/attachment.png -------------- next part -------------- A non-text attachment was scrubbed... Name: cms.png Type: image/png Size: 17026 bytes Desc: not available Url : http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20080102/8683ca91/attachment-0001.png From Keith.Holdaway at sas.com Thu Jan 3 07:34:17 2008 From: Keith.Holdaway at sas.com (Keith Holdaway) Date: Thu, 3 Jan 2008 10:34:17 -0500 Subject: Negative durations? In-Reply-To: References: <9BADD5B8-F9DA-4656-843B-7D44FF36963A@mugfu.com> <477C09ED.9020400@Sun.COM> <22133221-306B-41D3-AE57-155876104354@mugfu.com> <477C1718.70004@Sun.COM> Message-ID: <304E9E55F6A4BE4B910C2437D4D1B49608FC799162@MERCMBX14.na.sas.com> I have Google'd, but found no explanation for negative GC pauses: 8829.092: [GC 426933K->375154K(517760K), 0.0117606 secs] 8830.373: [GC 427634K->376403K(517760K), 0.0833584 secs] 8830.569: [GC 428883K->377941K(517760K), -0.6576383 secs] 8831.110: [GC 430421K->379175K(517760K), 0.0161026 secs] 8831.548: [GC 431628K->379968K(517760K), 0.0134666 secs] 8831.942: [GC 432448K->379701K(517760K), -0.1804611 secs] 8832.718: [GC 432180K->382348K(517760K), 0.0836352 secs] 8832.303: [GC 434828K->382927K(517760K), 0.6898266 secs] 8833.482: [GC 435407K->384775K(517760K), -0.1111267 secs] Keith R Holdaway Java Development Technologies SAS... The Power to Know Carpe Diem ... From Y.S.Ramakrishna at Sun.COM Thu Jan 3 15:56:46 2008 From: Y.S.Ramakrishna at Sun.COM (Y.S.Ramakrishna at Sun.COM) Date: Thu, 03 Jan 2008 15:56:46 -0800 Subject: Negative durations? In-Reply-To: <304E9E55F6A4BE4B910C2437D4D1B49608FC799162@MERCMBX14.na.sas.com> References: <9BADD5B8-F9DA-4656-843B-7D44FF36963A@mugfu.com> <477C09ED.9020400@Sun.COM> <22133221-306B-41D3-AE57-155876104354@mugfu.com> <477C1718.70004@Sun.COM> <304E9E55F6A4BE4B910C2437D4D1B49608FC799162@MERCMBX14.na.sas.com> Message-ID: <477D763E.9000607@Sun.COM> If on windows, you could try disabling NTP sync to see if the "problem" goes away... Cross-posting to the runtime list for possible further comment. 
You clearly see time-stamps also stepping back (for example between line n-1 and n-2 below). -- ramki Keith Holdaway wrote: > I have Google'd, but found no explanation for negative GC pauses: > > 8829.092: [GC 426933K->375154K(517760K), 0.0117606 secs] > 8830.373: [GC 427634K->376403K(517760K), 0.0833584 secs] > 8830.569: [GC 428883K->377941K(517760K), -0.6576383 secs] > 8831.110: [GC 430421K->379175K(517760K), 0.0161026 secs] > 8831.548: [GC 431628K->379968K(517760K), 0.0134666 secs] > 8831.942: [GC 432448K->379701K(517760K), -0.1804611 secs] > 8832.718: [GC 432180K->382348K(517760K), 0.0836352 secs] > 8832.303: [GC 434828K->382927K(517760K), 0.6898266 secs] > 8833.482: [GC 435407K->384775K(517760K), -0.1111267 secs] > > > Keith R Holdaway > Java Development Technologies > > SAS... The Power to Know > > Carpe Diem ... > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use From Y.S.Ramakrishna at Sun.COM Thu Jan 3 16:10:11 2008 From: Y.S.Ramakrishna at Sun.COM (Y.S.Ramakrishna at Sun.COM) Date: Thu, 03 Jan 2008 16:10:11 -0800 Subject: Perplexing GC Time Growth In-Reply-To: <477C09ED.9020400@Sun.COM> References: <9BADD5B8-F9DA-4656-843B-7D44FF36963A@mugfu.com> <477C09ED.9020400@Sun.COM> Message-ID: <477D7963.1000804@Sun.COM> Hi Jason -- Y.S.Ramakrishna at Sun.COM wrote: ... > While we think of potential reasons for this, or mull over > appropriate sensors that can lay bare the root cause here, > could you, on a quick hunch, do a quick experiment and tell > me if adding the options -XX:+CMSClassUnloadingEnabled makes > any difference to the observations above. [Does your application > do lots of class loading or reflection or string interning?] I think i screwed up a bit here. I was assuming you were using the (newest) OpenJDK build. As it happens, in 6u3 and older you need to specify both +CMSClassUnloadingEnabled _and_ +CMSPermGenSweepingEnabled, else you do not get the desired effect. Sorry this was fixed after 6u3, so the additional flag did not do anything in your case. Let me know if the additional flag (please use both) made a difference, and apologies for the bad user interface here (set right as of 6u4 and in OpenJDK). -- ramki > From David.Holmes at Sun.COM Thu Jan 3 16:17:28 2008 From: David.Holmes at Sun.COM (David Holmes - Sun Microsystems) Date: Fri, 04 Jan 2008 10:17:28 +1000 Subject: Negative durations? In-Reply-To: <477D763E.9000607@Sun.COM> References: <9BADD5B8-F9DA-4656-843B-7D44FF36963A@mugfu.com> <477C09ED.9020400@Sun.COM> <22133221-306B-41D3-AE57-155876104354@mugfu.com> <477C1718.70004@Sun.COM> <304E9E55F6A4BE4B910C2437D4D1B49608FC799162@MERCMBX14.na.sas.com> <477D763E.9000607@Sun.COM> Message-ID: <477D7B18.4010304@sun.com> Possible problems depend on the OS, as the elapsed_counter used underneath these "timings" is totally different on each platform: Solaris: uses gethrtime() via getTimeNanos() which has code to ensure the returned value is monotonic even if gethrtime() is not (it should be but it has had bugs due to TSC usage on x86 MP systems). So negative values should not been seen on Solaris, but you might see zero values. Linux: uses gettimeofday() - so if you have a bad clock and are running (x)ntpd then you might see lots of adjustments that cause apparent negative intervals. I don't recall if gettimeofday can be subject to TSC problems, but you might try booting with the notsc boot option. 
(It all depends on which Linux kernel version you have.) Windows: uses QueryPerformanceCounter if available, else GetSystemTimeAsFileTime. If QPC is used then you need to ensure Windows is using a stable time source for your platform eg. not using the TSC on MP systems; and not using the TSC if you have "speed-step" or equivalent dynamic CPU frequency adjusting mechanisms. Add /usepmtimer to the boot options in boot.ini to avoid TSC use. I don't know what problems GetSystemTimeAsFileTime might encounter - I suspect it would be susceptible to NTP adjustments as well. (For gory details of clocks and timers on Windows see: http://blogs.sun.com/dholmes/entry/inside_the_hotspot_vm_clocks) Cheers, David Holmes Y.S.Ramakrishna at Sun.COM said the following on 4/01/08 09:56 AM: > If on windows, you could try disabling NTP sync to see if > the "problem" goes away... Cross-posting to the runtime > list for possible further comment. > > You clearly see time-stamps also stepping back (for example > between line n-1 and n-2 below). > > -- ramki > > Keith Holdaway wrote: >> I have Google'd, but found no explanation for negative GC pauses: >> >> 8829.092: [GC 426933K->375154K(517760K), 0.0117606 secs] >> 8830.373: [GC 427634K->376403K(517760K), 0.0833584 secs] >> 8830.569: [GC 428883K->377941K(517760K), -0.6576383 secs] >> 8831.110: [GC 430421K->379175K(517760K), 0.0161026 secs] >> 8831.548: [GC 431628K->379968K(517760K), 0.0134666 secs] >> 8831.942: [GC 432448K->379701K(517760K), -0.1804611 secs] >> 8832.718: [GC 432180K->382348K(517760K), 0.0836352 secs] >> 8832.303: [GC 434828K->382927K(517760K), 0.6898266 secs] >> 8833.482: [GC 435407K->384775K(517760K), -0.1111267 secs] >> >> >> Keith R Holdaway >> Java Development Technologies >> >> SAS... The Power to Know >> >> Carpe Diem ... >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > From neojia at gmail.com Thu Jan 3 16:00:37 2008 From: neojia at gmail.com (Neo Jia) Date: Thu, 3 Jan 2008 16:00:37 -0800 Subject: Negative durations? In-Reply-To: <304E9E55F6A4BE4B910C2437D4D1B49608FC799162@MERCMBX14.na.sas.com> References: <9BADD5B8-F9DA-4656-843B-7D44FF36963A@mugfu.com> <477C09ED.9020400@Sun.COM> <22133221-306B-41D3-AE57-155876104354@mugfu.com> <477C1718.70004@Sun.COM> <304E9E55F6A4BE4B910C2437D4D1B49608FC799162@MERCMBX14.na.sas.com> Message-ID: <5d649bdb0801031600j7d36c5e9k4049421726346cf3@mail.gmail.com> Is that because you are going the call minor GC but failed and switch to a full GC? I remember I have seen something like this before. Thanks, Neo On Jan 3, 2008 7:34 AM, Keith Holdaway wrote: > I have Google'd, but found no explanation for negative GC pauses: > > 8829.092: [GC 426933K->375154K(517760K), 0.0117606 secs] > 8830.373: [GC 427634K->376403K(517760K), 0.0833584 secs] > 8830.569: [GC 428883K->377941K(517760K), -0.6576383 secs] > 8831.110: [GC 430421K->379175K(517760K), 0.0161026 secs] > 8831.548: [GC 431628K->379968K(517760K), 0.0134666 secs] > 8831.942: [GC 432448K->379701K(517760K), -0.1804611 secs] > 8832.718: [GC 432180K->382348K(517760K), 0.0836352 secs] > 8832.303: [GC 434828K->382927K(517760K), 0.6898266 secs] > 8833.482: [GC 435407K->384775K(517760K), -0.1111267 secs] > > > Keith R Holdaway > Java Development Technologies > > SAS... The Power to Know > > Carpe Diem ... 
> > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > -- I would remember that if researchers were not ambitious probably today we haven't the technology we are using! From ted at tedneward.com Thu Jan 3 16:35:14 2008 From: ted at tedneward.com (Ted Neward) Date: Thu, 3 Jan 2008 16:35:14 -0800 Subject: Negative durations? In-Reply-To: <304E9E55F6A4BE4B910C2437D4D1B49608FC799162@MERCMBX14.na.sas.com> References: <9BADD5B8-F9DA-4656-843B-7D44FF36963A@mugfu.com> <477C09ED.9020400@Sun.COM> <22133221-306B-41D3-AE57-155876104354@mugfu.com> <477C1718.70004@Sun.COM> <304E9E55F6A4BE4B910C2437D4D1B49608FC799162@MERCMBX14.na.sas.com> Message-ID: <01b101c84e69$af86d180$0e947480$@com> It's just Really Really Fast(TM)? :-) What flags did you run this with? Can you reproduce it on demand? Ted Neward Java, .NET, XML Services Consulting, Teaching, Speaking, Writing http://www.tedneward.com > -----Original Message----- > From: hotspot-gc-dev-bounces at openjdk.java.net [mailto:hotspot-gc-dev- > bounces at openjdk.java.net] On Behalf Of Keith Holdaway > Sent: Thursday, January 03, 2008 7:34 AM > To: hotspot-gc-use at openjdk.java.net > Subject: Negative durations? > > I have Google'd, but found no explanation for negative GC pauses: > > 8829.092: [GC 426933K->375154K(517760K), 0.0117606 secs] > 8830.373: [GC 427634K->376403K(517760K), 0.0833584 secs] > 8830.569: [GC 428883K->377941K(517760K), -0.6576383 secs] > 8831.110: [GC 430421K->379175K(517760K), 0.0161026 secs] > 8831.548: [GC 431628K->379968K(517760K), 0.0134666 secs] > 8831.942: [GC 432448K->379701K(517760K), -0.1804611 secs] > 8832.718: [GC 432180K->382348K(517760K), 0.0836352 secs] > 8832.303: [GC 434828K->382927K(517760K), 0.6898266 secs] > 8833.482: [GC 435407K->384775K(517760K), -0.1111267 secs] > > > Keith R Holdaway > Java Development Technologies > > SAS... The Power to Know > > Carpe Diem ... > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > No virus found in this incoming message. > Checked by AVG Free Edition. > Version: 7.5.516 / Virus Database: 269.17.13/1208 - Release Date: > 1/3/2008 3:52 PM > No virus found in this outgoing message. Checked by AVG Free Edition. Version: 7.5.516 / Virus Database: 269.17.13/1208 - Release Date: 1/3/2008 3:52 PM From Y.S.Ramakrishna at Sun.COM Thu Jan 3 16:40:07 2008 From: Y.S.Ramakrishna at Sun.COM (Y.S.Ramakrishna at Sun.COM) Date: Thu, 03 Jan 2008 16:40:07 -0800 Subject: Negative durations? In-Reply-To: <5d649bdb0801031600j7d36c5e9k4049421726346cf3@mail.gmail.com> References: <9BADD5B8-F9DA-4656-843B-7D44FF36963A@mugfu.com> <477C09ED.9020400@Sun.COM> <22133221-306B-41D3-AE57-155876104354@mugfu.com> <477C1718.70004@Sun.COM> <304E9E55F6A4BE4B910C2437D4D1B49608FC799162@MERCMBX14.na.sas.com> <5d649bdb0801031600j7d36c5e9k4049421726346cf3@mail.gmail.com> Message-ID: <477D8067.8050802@Sun.COM> Neo Jia wrote: > Is that because you are going the call minor GC but failed and switch > to a full GC? I remember I have seen something like this before. No that's printed as GC-- (when using the parallel gc collector). This is different, see the follow-up from David Holmes. 
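To make the time-source point concrete: an interval measured against the wall clock can come out negative if ntpd steps the clock between the two readings, while a monotonic source cannot go backwards. A minimal user-level sketch for Linux follows (illustrative only; this is not the JVM's own timer code):

#include <sys/time.h>   // gettimeofday
#include <time.h>       // clock_gettime, CLOCK_MONOTONIC
#include <cstdio>

// Wall-clock seconds; subject to NTP step adjustments, so a later reading
// can be smaller than an earlier one.
static double wall_seconds() {
    timeval tv;
    gettimeofday(&tv, 0);
    return tv.tv_sec + tv.tv_usec / 1e6;
}

// Seconds from a monotonic source; never steps backwards, so intervals
// computed from it are always non-negative.
static double mono_seconds() {
    timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

int main() {
    double w0 = wall_seconds(), m0 = mono_seconds();
    // ... the work being timed (think of a GC pause) happens here ...
    double w1 = wall_seconds(), m1 = mono_seconds();
    // If the clock was stepped in between, the first difference can be negative.
    std::printf("wall: %f s  monotonic: %f s\n", w1 - w0, m1 - m0);
    return 0;
}

A negative "secs" value in the GC log would be the same effect: the pause duration was computed from a time source that was adjusted mid-pause.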
-- ramki > > Thanks, > Neo > > On Jan 3, 2008 7:34 AM, Keith Holdaway wrote: >> I have Google'd, but found no explanation for negative GC pauses: >> >> 8829.092: [GC 426933K->375154K(517760K), 0.0117606 secs] >> 8830.373: [GC 427634K->376403K(517760K), 0.0833584 secs] >> 8830.569: [GC 428883K->377941K(517760K), -0.6576383 secs] >> 8831.110: [GC 430421K->379175K(517760K), 0.0161026 secs] >> 8831.548: [GC 431628K->379968K(517760K), 0.0134666 secs] >> 8831.942: [GC 432448K->379701K(517760K), -0.1804611 secs] >> 8832.718: [GC 432180K->382348K(517760K), 0.0836352 secs] >> 8832.303: [GC 434828K->382927K(517760K), 0.6898266 secs] >> 8833.482: [GC 435407K->384775K(517760K), -0.1111267 secs] >> >> >> Keith R Holdaway >> Java Development Technologies >> >> SAS... The Power to Know >> >> Carpe Diem ... >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> > > > From Y.S.Ramakrishna at Sun.COM Fri Jan 4 10:21:38 2008 From: Y.S.Ramakrishna at Sun.COM (Y Srinivas Ramakrishna) Date: Fri, 04 Jan 2008 10:21:38 -0800 Subject: Perplexing GC Time Growth In-Reply-To: <8CDFE9CE1B9A344E9AA47C27B7C58DD8168FE8@MSMAIL.cboent.cboe.com> References: <8CDFE9CE1B9A344E9AA47C27B7C58DD8168FE8@MSMAIL.cboent.cboe.com> Message-ID: Hi Paul -- [I am posting my response to the *-use list where the original discussion started.] As the scavenge time plots which i posted earlier indicated, the scavenges increase in a linear fashion and are not affected by remarks. As I also indicated to Jason in a late follow-up yesterday, I had screwed up in the specification for class unloading which, pre-6u4, need two flags to be effective: XX:+CMSClassUnloadingEnabled -XX:+CMSPermGenSweepingEnabled. Let us see what Jason says about the effect of those two flags. --ramki ----- Original Message ----- From: "Ciciora, Paul" Date: Friday, January 4, 2008 7:30 am Subject: Perplexing GC Time Growth To: hotspot-gc-dev at openjdk.java.net > If you don't mind me asking, what do the times look like after a remark > cycle? Also have you run with PrintTenuringDistribution? It's a nice > way to make sure you have the optimal number of slots. > From jason at mugfu.com Sat Jan 5 06:46:40 2008 From: jason at mugfu.com (Jason Vasquez) Date: Sat, 5 Jan 2008 09:46:40 -0500 Subject: Perplexing GC Time Growth In-Reply-To: References: <9BADD5B8-F9DA-4656-843B-7D44FF36963A@mugfu.com> <477C09ED.9020400@Sun.COM> <477D7963.1000804@Sun.COM> Message-ID: <9437345E-852D-4F45-81DE-AF7B68E8B339@mugfu.com> Hi Ramki, Sorry for the delay, I'm on vacation visiting some family over a long weekend, trying not to draw the wrath of my wife for stealing time away hopping on the VPN to check things out :) Anyway, I've attached the latest garbage log, unfortunately, it doesn't seem to have changed things much (although I can see that the flags are now taking affect) -jason -------------- next part -------------- A non-text attachment was scrubbed... Name: garbage.log.gz Type: application/x-gzip Size: 44157 bytes Desc: not available Url : http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20080105/707b85a4/attachment.bin -------------- next part -------------- On Jan 4, 2008, at 14:54 , Y Srinivas Ramakrishna wrote: > Hi Jason -- did you get a chance to check what happens to scavenge > times with both the class unloading options specified as stated below. > > thanks! 
> -- ramki > > ----- Original Message ----- > From: Y.S.Ramakrishna at Sun.COM > Date: Thursday, January 3, 2008 4:10 pm > Subject: Re: Perplexing GC Time Growth > To: Jason Vasquez > Cc: hotspot-gc-use at openjdk.java.net > > >> Hi Jason -- >> >> Y.S.Ramakrishna at Sun.COM wrote: >> ... >>> While we think of potential reasons for this, or mull over >>> appropriate sensors that can lay bare the root cause here, >>> could you, on a quick hunch, do a quick experiment and tell >>> me if adding the options -XX:+CMSClassUnloadingEnabled makes >>> any difference to the observations above. [Does your application >>> do lots of class loading or reflection or string interning?] >> >> I think i screwed up a bit here. I was assuming you were using >> the (newest) OpenJDK build. As it happens, in 6u3 and older >> you need to specify both +CMSClassUnloadingEnabled _and_ >> +CMSPermGenSweepingEnabled, else you do not get the desired effect. >> Sorry this was fixed after 6u3, so the additional flag did not >> do anything in your case. >> >> Let me know if the additional flag (please use both) made >> a difference, and apologies for the bad user interface here >> (set right as of 6u4 and in OpenJDK). >> >> -- ramki >> >>> >> From jason at mugfu.com Tue Jan 8 08:16:33 2008 From: jason at mugfu.com (Jason Vasquez) Date: Tue, 8 Jan 2008 11:16:33 -0500 Subject: Perplexing GC Time Growth In-Reply-To: <9437345E-852D-4F45-81DE-AF7B68E8B339@mugfu.com> References: <9BADD5B8-F9DA-4656-843B-7D44FF36963A@mugfu.com> <477C09ED.9020400@Sun.COM> <477D7963.1000804@Sun.COM> <9437345E-852D-4F45-81DE-AF7B68E8B339@mugfu.com> Message-ID: Well, interesting finds yesterday -- You had me thinking about gcroots again -- I inspected the gc roots via jhat after a 24 hour run of the application, and noticed some surprising results. The jhat output was quite large and ended up crashing the browser after it consumed >2gb RAM :) dumped it out via curl, and stopped after the HTML output capped 350MB over the slow VPN link. Appears with the limited output, there were >1.5million JNI Local References hanging around. (and likely more if I had been able to obtain the full output) Looks like the count sits idle at about 5 local refs, and grows very rapidly during normal running of the application (in tens of thousands after a couple minutes) I peppered the native code with DeleteLocalRef directives anywhere a local reference was declared, and the number of JNI local refs now stays steady at 52-53 during full load. (!) The JNI documentation seems to indicate that I shouldn't need to do that for these type of references, but in any case, that is an extreme difference. This makes sense for the linear increase in GC times, since the number of JNI Local Refs (and, by extension, gc roots) was likely growing at a quite rapid and liner pace. I've started a new run in our load test environment now, with high hopes of success this time, I'll let you know how it turns out. I had temporarily disabled the PermGen sweep at the beginning of the test, and am seeing some very slow growth in the perm gen, I might still need that flag back before we're all done, but this is a good start. 
Thanks for your assistance - I'm hoping for the best, -jaosn On Jan 5, 2008, at 09:46 , Jason Vasquez wrote: > Hi Ramki, > > Sorry for the delay, I'm on vacation visiting some family over a > long weekend, trying not to draw the wrath of my wife for stealing > time away hopping on the VPN to check things out :) > > Anyway, I've attached the latest garbage log, unfortunately, it > doesn't seem to have changed things much (although I can see that > the flags are now taking affect) > > -jason > > > > > > On Jan 4, 2008, at 14:54 , Y Srinivas Ramakrishna wrote: > >> Hi Jason -- did you get a chance to check what happens to scavenge >> times with both the class unloading options specified as stated >> below. >> >> thanks! >> -- ramki >> >> ----- Original Message ----- >> From: Y.S.Ramakrishna at Sun.COM >> Date: Thursday, January 3, 2008 4:10 pm >> Subject: Re: Perplexing GC Time Growth >> To: Jason Vasquez >> Cc: hotspot-gc-use at openjdk.java.net >> >> >>> Hi Jason -- >>> >>> Y.S.Ramakrishna at Sun.COM wrote: >>> ... >>>> While we think of potential reasons for this, or mull over >>>> appropriate sensors that can lay bare the root cause here, >>>> could you, on a quick hunch, do a quick experiment and tell >>>> me if adding the options -XX:+CMSClassUnloadingEnabled makes >>>> any difference to the observations above. [Does your application >>>> do lots of class loading or reflection or string interning?] >>> >>> I think i screwed up a bit here. I was assuming you were using >>> the (newest) OpenJDK build. As it happens, in 6u3 and older >>> you need to specify both +CMSClassUnloadingEnabled _and_ >>> +CMSPermGenSweepingEnabled, else you do not get the desired effect. >>> Sorry this was fixed after 6u3, so the additional flag did not >>> do anything in your case. >>> >>> Let me know if the additional flag (please use both) made >>> a difference, and apologies for the bad user interface here >>> (set right as of 6u4 and in OpenJDK). >>> >>> -- ramki >>> >>>> >>> > From Stephen.Bohne at Sun.COM Tue Jan 8 13:40:03 2008 From: Stephen.Bohne at Sun.COM (Steve Bohne) Date: Tue, 08 Jan 2008 16:40:03 -0500 Subject: Perplexing GC Time Growth In-Reply-To: References: <9BADD5B8-F9DA-4656-843B-7D44FF36963A@mugfu.com> <477C09ED.9020400@Sun.COM> <477D7963.1000804@Sun.COM> <9437345E-852D-4F45-81DE-AF7B68E8B339@mugfu.com> Message-ID: <4783EDB3.4070509@sun.com> Jason, Jason Vasquez wrote: > Well, interesting finds yesterday -- > > You had me thinking about gcroots again -- I inspected the gc roots > via jhat after a 24 hour run of the application, and noticed some > surprising results. The jhat output was quite large and ended up > crashing the browser after it consumed >2gb RAM :) dumped it out via > curl, and stopped after the HTML output capped 350MB over the slow VPN > link. Appears with the limited output, there were >1.5million JNI > Local References hanging around. (and likely more if I had been able > to obtain the full output) > > Looks like the count sits idle at about 5 local refs, and grows very > rapidly during normal running of the application (in tens of thousands > after a couple minutes) I peppered the native code with > DeleteLocalRef directives anywhere a local reference was declared, and > the number of JNI local refs now stays steady at 52-53 during full > load. (!) The JNI documentation seems to indicate that I shouldn't > need to do that for these type of references, but in any case, that is > an extreme difference. 
Normally, local references should be deleted automatically when a native method returns to Java. One explanation for this behavior might involve some thread sitting in a native method and generating local references for a very long/indefinite amount of time, without returning to Java. Is this a possibility? Thanks, Steve > > This makes sense for the linear increase in GC times, since the number > of JNI Local Refs (and, by extension, gc roots) was likely growing at > a quite rapid and liner pace. > > I've started a new run in our load test environment now, with high > hopes of success this time, I'll let you know how it turns out. I had > temporarily disabled the PermGen sweep at the beginning of the test, > and am seeing some very slow growth in the perm gen, I might still > need that flag back before we're all done, but this is a good start. > > Thanks for your assistance - I'm hoping for the best, > -jaosn > > > On Jan 5, 2008, at 09:46 , Jason Vasquez wrote: > >> Hi Ramki, >> >> Sorry for the delay, I'm on vacation visiting some family over a >> long weekend, trying not to draw the wrath of my wife for stealing >> time away hopping on the VPN to check things out :) >> >> Anyway, I've attached the latest garbage log, unfortunately, it >> doesn't seem to have changed things much (although I can see that >> the flags are now taking affect) >> >> -jason >> >> >> >> >> >> On Jan 4, 2008, at 14:54 , Y Srinivas Ramakrishna wrote: >> >>> Hi Jason -- did you get a chance to check what happens to scavenge >>> times with both the class unloading options specified as stated >>> below. >>> >>> thanks! >>> -- ramki >>> >>> ----- Original Message ----- >>> From: Y.S.Ramakrishna at Sun.COM >>> Date: Thursday, January 3, 2008 4:10 pm >>> Subject: Re: Perplexing GC Time Growth >>> To: Jason Vasquez >>> Cc: hotspot-gc-use at openjdk.java.net >>> >>> >>>> Hi Jason -- >>>> >>>> Y.S.Ramakrishna at Sun.COM wrote: >>>> ... >>>>> While we think of potential reasons for this, or mull over >>>>> appropriate sensors that can lay bare the root cause here, >>>>> could you, on a quick hunch, do a quick experiment and tell >>>>> me if adding the options -XX:+CMSClassUnloadingEnabled makes >>>>> any difference to the observations above. [Does your application >>>>> do lots of class loading or reflection or string interning?] >>>> I think i screwed up a bit here. I was assuming you were using >>>> the (newest) OpenJDK build. As it happens, in 6u3 and older >>>> you need to specify both +CMSClassUnloadingEnabled _and_ >>>> +CMSPermGenSweepingEnabled, else you do not get the desired effect. >>>> Sorry this was fixed after 6u3, so the additional flag did not >>>> do anything in your case. >>>> >>>> Let me know if the additional flag (please use both) made >>>> a difference, and apologies for the bad user interface here >>>> (set right as of 6u4 and in OpenJDK). 
>>>> >>>> -- ramki >>>> > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use From jason at mugfu.com Tue Jan 8 14:03:22 2008 From: jason at mugfu.com (Jason Vasquez) Date: Tue, 8 Jan 2008 17:03:22 -0500 Subject: Perplexing GC Time Growth In-Reply-To: <4783EDB3.4070509@sun.com> References: <9BADD5B8-F9DA-4656-843B-7D44FF36963A@mugfu.com> <477C09ED.9020400@Sun.COM> <477D7963.1000804@Sun.COM> <9437345E-852D-4F45-81DE-AF7B68E8B339@mugfu.com> <4783EDB3.4070509@sun.com> Message-ID: On Jan 8, 2008, at 16:40 , Steve Bohne wrote: > Jason Vasquez wrote: >> I peppered the native code with DeleteLocalRef directives anywhere >> a local reference was declared, and the number of JNI local refs >> now stays steady at 52-53 during full load. (!) The JNI >> documentation seems to indicate that I shouldn't need to do that >> for these type of references, but in any case, that is an extreme >> difference. > > Normally, local references should be deleted automatically when a > native method returns to Java. One explanation for this behavior > might involve some thread sitting in a native method and generating > local references for a very long/indefinite amount of time, without > returning to Java. Is this a possibility? These references are generated from a separate native thread calling up into Java. The access pattern is a bit dodgy (legacy code here, folks :)): void foo() { ... //ctl is some other object with a jenv member jobject job = ctl->jenv->CallStaticObjectMethod( ctl->fooCls, ctl->getInstanceMid ); ctl->jenv->CallVoidMethod( job, ctl->newEventMid, fooId, eventType, state, data ); ctl->jenv->DeleteLocalRef(job); //newly added ... } For C++ objects that have jenv as a local member or variable, it seems that the local refs are cleaned up normally. In access patterns of the sort listed above, 'job' doesn't seem to be deleted, even though 'foo()' enters and exits quite quickly. This whole pattern is very nasty to me, and ripe for refactoring, but for now, DeleteLocalRef seems to be the solution. -jason From jason at mugfu.com Wed Jan 9 03:44:44 2008 From: jason at mugfu.com (Jason Vasquez) Date: Wed, 9 Jan 2008 06:44:44 -0500 Subject: Perplexing GC Time Growth In-Reply-To: References: <9BADD5B8-F9DA-4656-843B-7D44FF36963A@mugfu.com> <477C09ED.9020400@Sun.COM> <477D7963.1000804@Sun.COM> <9437345E-852D-4F45-81DE-AF7B68E8B339@mugfu.com> Message-ID: Thought you may be interested in the latest GC log -- this is just past the 7.5 hour mark, GC times are still holding steady at 20ms (grand improvement) Perm Gen sweep/Class unloading is enabled here as well, btw. There appears thre might be a remaining small leak troubling the old gen a bit, but it's growth is relatively minor, I'll have a look there next now that the other noise has subsided :) -------------- next part -------------- A non-text attachment was scrubbed... Name: garbage.log.gz Type: application/x-gzip Size: 13957 bytes Desc: not available Url : http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20080109/4d96201d/attachment.bin -------------- next part -------------- Anyway, thanks again for your consideration of this issue, seems as if it might be in hand now. 
-jason On Jan 8, 2008, at 11:16 , Jason Vasquez wrote: > Well, interesting finds yesterday -- > > You had me thinking about gcroots again -- I inspected the gc roots > via jhat after a 24 hour run of the application, and noticed some > surprising results. The jhat output was quite large and ended up > crashing the browser after it consumed >2gb RAM :) dumped it out > via curl, and stopped after the HTML output capped 350MB over the > slow VPN link. Appears with the limited output, there were > >1.5million JNI Local References hanging around. (and likely more if > I had been able to obtain the full output) > > Looks like the count sits idle at about 5 local refs, and grows very > rapidly during normal running of the application (in tens of > thousands after a couple minutes) I peppered the native code with > DeleteLocalRef directives anywhere a local reference was declared, > and the number of JNI local refs now stays steady at 52-53 during > full load. (!) The JNI documentation seems to indicate that I > shouldn't need to do that for these type of references, but in any > case, that is an extreme difference. > > This makes sense for the linear increase in GC times, since the > number of JNI Local Refs (and, by extension, gc roots) was likely > growing at a quite rapid and liner pace. > > I've started a new run in our load test environment now, with high > hopes of success this time, I'll let you know how it turns out. I > had temporarily disabled the PermGen sweep at the beginning of the > test, and am seeing some very slow growth in the perm gen, I might > still need that flag back before we're all done, but this is a good > start. > > Thanks for your assistance - I'm hoping for the best, > -jaosn > > > On Jan 5, 2008, at 09:46 , Jason Vasquez wrote: > >> Hi Ramki, >> >> Sorry for the delay, I'm on vacation visiting some family over a >> long weekend, trying not to draw the wrath of my wife for stealing >> time away hopping on the VPN to check things out :) >> >> Anyway, I've attached the latest garbage log, unfortunately, it >> doesn't seem to have changed things much (although I can see that >> the flags are now taking affect) >> >> -jason >> >> >> >> >> >> On Jan 4, 2008, at 14:54 , Y Srinivas Ramakrishna wrote: >> >>> Hi Jason -- did you get a chance to check what happens to scavenge >>> times with both the class unloading options specified as stated >>> below. >>> >>> thanks! >>> -- ramki >>> >>> ----- Original Message ----- >>> From: Y.S.Ramakrishna at Sun.COM >>> Date: Thursday, January 3, 2008 4:10 pm >>> Subject: Re: Perplexing GC Time Growth >>> To: Jason Vasquez >>> Cc: hotspot-gc-use at openjdk.java.net >>> >>> >>>> Hi Jason -- >>>> >>>> Y.S.Ramakrishna at Sun.COM wrote: >>>> ... >>>>> While we think of potential reasons for this, or mull over >>>>> appropriate sensors that can lay bare the root cause here, >>>>> could you, on a quick hunch, do a quick experiment and tell >>>>> me if adding the options -XX:+CMSClassUnloadingEnabled makes >>>>> any difference to the observations above. [Does your application >>>>> do lots of class loading or reflection or string interning?] >>>> >>>> I think i screwed up a bit here. I was assuming you were using >>>> the (newest) OpenJDK build. As it happens, in 6u3 and older >>>> you need to specify both +CMSClassUnloadingEnabled _and_ >>>> +CMSPermGenSweepingEnabled, else you do not get the desired effect. >>>> Sorry this was fixed after 6u3, so the additional flag did not >>>> do anything in your case. 
>>>> >>>> Let me know if the additional flag (please use both) made >>>> a difference, and apologies for the bad user interface here >>>> (set right as of 6u4 and in OpenJDK). >>>> >>>> -- ramki >>>> >>>>> >>>> >> > From Steve.Goldman at Sun.COM Wed Jan 9 06:00:25 2008 From: Steve.Goldman at Sun.COM (steve goldman) Date: Wed, 09 Jan 2008 09:00:25 -0500 Subject: Perplexing GC Time Growth In-Reply-To: References: <9BADD5B8-F9DA-4656-843B-7D44FF36963A@mugfu.com> <477C09ED.9020400@Sun.COM> <477D7963.1000804@Sun.COM> <9437345E-852D-4F45-81DE-AF7B68E8B339@mugfu.com> <4783EDB3.4070509@sun.com> Message-ID: <4784D379.2070707@sun.com> Jason Vasquez wrote: > On Jan 8, 2008, at 16:40 , Steve Bohne wrote: >> Jason Vasquez wrote: >>> I peppered the native code with DeleteLocalRef directives anywhere >>> a local reference was declared, and the number of JNI local refs >>> now stays steady at 52-53 during full load. (!) The JNI >>> documentation seems to indicate that I shouldn't need to do that >>> for these type of references, but in any case, that is an extreme >>> difference. >> Normally, local references should be deleted automatically when a >> native method returns to Java. One explanation for this behavior >> might involve some thread sitting in a native method and generating >> local references for a very long/indefinite amount of time, without >> returning to Java. Is this a possibility? > > These references are generated from a separate native thread calling > up into Java. The access pattern is a bit dodgy (legacy code here, > folks :)): > > void foo() { > ... > //ctl is some other object with a jenv member > jobject job = ctl->jenv->CallStaticObjectMethod( ctl->fooCls, > ctl->getInstanceMid ); > ctl->jenv->CallVoidMethod( job, > ctl->newEventMid, > fooId, > eventType, > state, > data ); > ctl->jenv->DeleteLocalRef(job); //newly added > ... > } > > For C++ objects that have jenv as a local member or variable, it seems > that the local refs are cleaned up normally. In access patterns of > the sort listed above, 'job' doesn't seem to be deleted, even though > 'foo()' enters and exits quite quickly. This call direction is the opposite of what Steve was refering to. Local references created while in native are automatically deleted when returning to Java, they are not removed when returning from Java to native. So if you C++ code looks effectively like for (;; ) { foo(); } you will in fact never free local references without the DeleteLocalRef. -- Steve From jamesnichols3 at gmail.com Thu Jan 10 15:54:50 2008 From: jamesnichols3 at gmail.com (James Nichols) Date: Thu, 10 Jan 2008 18:54:50 -0500 Subject: Rapid fluctuation of tenuring threshold Message-ID: <83a51e120801101554h4a9d7f51yc827d506d21fcd2@mail.gmail.com> Hello, My application has a fairly high streaming dataflow that creates a lot of short-lived objects. I've made the young generation fairly large and have a pretty big survivor space, but I still have a # of objects that end up in the old generation. As a result of this, my old generation size ramps up slowly over time until these objects are dead, then they all get cleaned up and there is a big drop in old generation usage, from about 2.5GBs to about 1GB. Most of the time it takes about 80 minutes to get up to 2.5GBs, but occasionally it happens much more rapidly, as fast at every 8 minutes. My workload does fluctuate over time, but not enough to explain this change in garbage collection behavior. I'm seeing some very odd behavior in the JVM's tuning of the tenuring threshold. 
Attached is a chart that plots the old generation over time (the blue line with a moving average in yellow). Each dot is a garbage collection (usually the young generation). On this chart, I also plotted the tenuring threshold in red, with a moving average in black. Each dot represents what the threshold was for that particular garbage collection. You can see around time 11,500 minutes the peaks/valleys become much more frequent as more data is ending up int the old generation. During this time, the moving average of the tenuring threshold drops substantially. This also happens during other periods where the old generation is filled up quickly. I'm not sure if this is a cause or an effect. I leaning towards the tenuring threshold being the cause because the objects ending up in the old generation are typically reclaimed in a very short period of time. I've analyzed the distribution of the tenuring threshold over a substantial period of time in my application. It appears that 25% of the time the tenuring threshold is 1, 60% of the time it's 16, and the other 15% is distributed somewhere in the middle. I'm a bit puzzled why the threshold would so rapidly change between 1 and 16, and wonder if I'm getting hosed by the auto-tuning of this threshold. Looking at the log, it rapidly goes from 1 to 16, very rarely stopping intermediately. Is it reasonable to think that something is wrong with the auto tuning to make it stay at 1 and cause the behavior I noted above? I'm running jdk 1.5.0_12 on RedHat linux and my application server is JBoss 4.0.5. My GC settings are below... I can send the gc.dat log if you want it, it's pretty big so I won't email it to the list. I've signed up to the mailing list, but please CC me on any replies. Thanks, James -server -Xms5170m -Xmx5170m -XX:NewSize=1536M -XX:MaxNewSize=1536M -XX:PermSize=512M -XX:MaxPermSize=512M -XX:MaxTenuringThreshold=30 -XX:SurvivorRatio=10 -XX:+ScavengeBeforeFullGC -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:ParallelGCThreads=3 -XX:+CMSParallelRemarkEnabled -XX:+CMSClassUnloadingEnabled -XX:+CMSPermGenSweepingEnabled -XX:+DisableExplicitGC -XX:+PrintTenuringDistribution -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC -XX:+PrintGCApplicationConcurrentTime -XX:+PrintGCApplicationStoppedTime -Xloggc:gc.dat -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20080110/1a9501cb/attachment.html -------------- next part -------------- A non-text attachment was scrubbed... Name: tenuring.png Type: image/png Size: 43555 bytes Desc: not available Url : http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20080110/1a9501cb/attachment.png From Y.S.Ramakrishna at Sun.COM Fri Jan 11 14:51:11 2008 From: Y.S.Ramakrishna at Sun.COM (Y Srinivas Ramakrishna) Date: Fri, 11 Jan 2008 14:51:11 -0800 Subject: Rapid fluctuation of tenuring threshold In-Reply-To: <83a51e120801101554h4a9d7f51yc827d506d21fcd2@mail.gmail.com> References: <83a51e120801101554h4a9d7f51yc827d506d21fcd2@mail.gmail.com> Message-ID: Hi James -- thanks for all of the data; the graphs as well as the logs. Basically, the adaptive tenuring policy (for fixed survivor spaces) appears to be working pretty well here. When there is space in the survivor spaces to accomodate the survivors the threshold is loosened and as soon as there is a spike in the survivors the threshold is tightened down. 
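Roughly, the policy behaves like the following sketch (a paraphrase for illustration, not the actual HotSpot code): after each scavenge the collector sums the surviving bytes in each age bucket and picks the smallest age at which the running total overflows the targeted fraction of a survivor space, capped at the maximum threshold. A burst of age-1 survivors large enough to blow that budget on its own therefore drives the threshold straight down to 1, and once the burst passes the threshold rides back up to the cap, which is the 1-to-16 oscillation visible in the plot.

#include <cstddef>

// Paraphrased sketch of the adaptive tenuring computation; illustrative only.
// bytes_by_age[a] = bytes of surviving objects that have been copied a times.
int compute_tenuring_threshold(const size_t bytes_by_age[], int max_threshold,
                               size_t survivor_capacity, int target_survivor_pct) {
    size_t desired = survivor_capacity * target_survivor_pct / 100;
    size_t total = 0;
    int age = 1;
    while (age < max_threshold) {
        total += bytes_by_age[age];
        if (total > desired) break;   // objects at or above this age get promoted
        ++age;
    }
    return age;
}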
Interestingly, what appears to be happening is that occasionally your program goes through what I shall call "survival spikes", where the death rate of very recently allocated objects falls sharply. (You see this as a sudden spike in the population of age 1 objects in the PrintTenuringDistribution output.) However, this quickly passes and these objects then die fairly quickly thereafter. When the death rate falls suddenly in this manner, you see the adaptive tenuring policy clamp down the threshold in an effort to prevent survivor space overflow (which can cause premature promotion and nepotism). When this happens there is also an increase in promotion rate (which you can observe in your plot at a fine scale). My guess is that you have a system on which occasionally some threads are getting stalled or temporarily starved. When this happens, the objects being processed by these threads tend to have their lifetimes elongated, and this causes a temporary fall in the death rate. These threads probably then get to execute and quickly do the processing that makes these objects dead, and things return to normal. Either this, or there is some other burstiness in the processing of objects that gives them these bursty dynamics. I suspect that the "streaming" is not streaming as well as you expect and the streams may be getting occasionally stalled. So that's something you want to check in your system. As far as tuning GC so as to avoid any unintended performance consequences from this, what you want to do is prevent survivor space overflow under any circumstances, because overflow can lead to premature promotion, and that in turn can lead to nepotism, so that there will be a sudden spike in promotions, and thence in scavenge pause times, as well as more pressure on the concurrent collector (just as you noticed). The increased activity of the concurrent collector can in turn stall the streaming threads some more. (Which makes me ask what your platform is -- #cpus -- and whether you have tried incremental CMS to see if it makes the behaviour less bursty and smoother.) From the logs, I see some age 1 objects (i.e. objects that have been allocated since the last scavenge) are getting tenured. This clearly indicates that the survivor spaces are under-sized to deal with the occasional heavier load (or stall of the processing threads). I would recommend lowering the max tenuring threshold to 4 (this avoids the unnecessary copying of long-lived objects that you see, which increases scavenge pause times) and increasing the survivor space size by decreasing the survivor ratio to something like 4 (this would avoid the occasional survivor space overflow, which can be very bad as I described above). In order to compensate for the smaller survivor ratio (so as to leave the size of Eden the same, and hence your inter-scavenge period unchanged), I would make a commensurate increase in NewSize. In order to leave CMS cycles running at the same rate as before, you may also need to add space appropriately to the old generation (by increasing -Xmx/-Xms). That's all for now from me, but I am sure others on the alias will be able to provide further tips (or alternate theories for the apparent burstiness in behaviour).
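Worked through against the flags posted earlier in this thread, that advice comes out roughly as follows (illustrative arithmetic only; the exact numbers would need validating against the live workload). With -XX:NewSize=1536M and -XX:SurvivorRatio=10, eden is about 1536M * 10/12 = 1280M with two 128M survivor spaces. Dropping to -XX:SurvivorRatio=4 while keeping eden at 1280M means a young generation of about 1280M * 6/4 = 1920M (eden plus two 320M survivors), so -XX:NewSize/-XX:MaxNewSize would grow to roughly 1920M, and -Xms/-Xmx would need to grow by the same 384M (to about 5554m) to leave the old generation, and hence the CMS cycle frequency, where it was. Together with -XX:MaxTenuringThreshold=4 in place of 30, that is one concrete reading of the suggestion above.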
[Note however that the tenuring threshold oscillating from 16 to 1 and back to 16 is more or less normal in this case, especially the transition from 16 to 1 given the suddenness with which the new object death rate falls -- it is this suddenness that makes me suspect some kind of a stall in one or a group of threads that would normally be processing these surviving objects. Do you have any application level queue length or other metrics that would tell you whether something like this was happening?] How many threads do you have in your application, how many cpu's on your platform, and do you observe any fluctuation in the cpu consumed by various threads -- in particular are the downward spikes in tenuring threshold preceded by stalling of some heavily working threads in the application? That's the direction in which I would point your investigation. PS: I am attaching the survivor space occupancy %ge plot that you attached in your most recent email to you since it is a nice illustration of how the reduction in tenuring threshold is still occasionally unable to avoid survivor space overflow (to the extent of 1 in 9.5 scavenges on average according to a rough calculation). -- ramki ----- Original Message ----- From: James Nichols Date: Friday, January 11, 2008 11:11 am Subject: Rapid fluctuation of tenuring threshold To: hotspot-gc-use at openjdk.java.net > Hello, > > My application has a fairly high streaming dataflow that creates a lot > of > short-lived objects. I've made the young generation fairly large and > have a > pretty big survivor space, but I still have a # of objects that end up > in > the old generation. As a result of this, my old generation size ramps > up > slowly over time until these objects are dead, then they all get > cleaned up > and there is a big drop in old generation usage, from about 2.5GBs to > about > 1GB. Most of the time it takes about 80 minutes to get up to 2.5GBs, > but > occasionally it happens much more rapidly, as fast at every 8 minutes. > My > workload does fluctuate over time, but not enough to explain this > change in > garbage collection behavior. I'm seeing some very odd behavior in the > JVM's > tuning of the tenuring threshold. > > Attached is a chart that plots the old generation over time (the blue > line > with a moving average in yellow). Each dot is a garbage collection (usually > the young generation). On this chart, I also plotted the tenuring threshold > in red, with a moving average in black. Each dot represents what the > threshold was for that particular garbage collection. You can see around > time 11,500 minutes the peaks/valleys become much more frequent as > more data > is ending up int the old generation. During this time, the moving average > of the tenuring threshold drops substantially. This also happens during > other periods where the old generation is filled up quickly. I'm not > sure > if this is a cause or an effect. I leaning towards the tenuring threshold > being the cause because the objects ending up in the old generation are > typically reclaimed in a very short period of time. > > I've analyzed the distribution of the tenuring threshold over a substantial > period of time in my application. It appears that 25% of the time the > tenuring threshold is 1, 60% of the time it's 16, and the other 15% is > distributed somewhere in the middle. I'm a bit puzzled why the threshold > would so rapidly change between 1 and 16, and wonder if I'm getting > hosed by > the auto-tuning of this threshold. 
Looking at the log, it rapidly > goes from > 1 to 16, very rarely stopping intermediately. Is it reasonable to think > that something is wrong with the auto tuning to make it stay at 1 and > cause > the behavior I noted above? > > I'm running jdk 1.5.0_12 on RedHat linux and my application server is > JBoss > 4.0.5. My GC settings are below... I can send the gc.dat log if you want > it, it's pretty big so I won't email it to the list. I've signed up > to the > mailing list, but please CC me on any replies. > > Thanks, James > > -server -Xms5170m -Xmx5170m > -XX:NewSize=1536M -XX:MaxNewSize=1536M > -XX:PermSize=512M -XX:MaxPermSize=512M > -XX:MaxTenuringThreshold=30 -XX:SurvivorRatio=10 > -XX:+ScavengeBeforeFullGC > -XX:+UseConcMarkSweepGC > -XX:+UseParNewGC -XX:ParallelGCThreads=3 > -XX:+CMSParallelRemarkEnabled -XX:+CMSClassUnloadingEnabled > -XX:+CMSPermGenSweepingEnabled -XX:+DisableExplicitGC > -XX:+PrintTenuringDistribution -XX:+PrintGCDetails -XX:+PrintGCTimeStamps > -XX:+PrintHeapAtGC > -XX:+PrintGCApplicationConcurrentTime -XX:+PrintGCApplicationStoppedTime > -Xloggc:gc.dat > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use From Y.S.Ramakrishna at Sun.COM Fri Jan 11 15:00:45 2008 From: Y.S.Ramakrishna at Sun.COM (Y Srinivas Ramakrishna) Date: Fri, 11 Jan 2008 15:00:45 -0800 Subject: Rapid fluctuation of tenuring threshold In-Reply-To: References: <83a51e120801101554h4a9d7f51yc827d506d21fcd2@mail.gmail.com> Message-ID: > PS: I am attaching the survivor space occupancy %ge plot > that you attached in your most recent email to you since > it is a nice illustration of how the reduction in tenuring threshold > is still occasionally unable to avoid survivor space overflow > (to the extent of 1 in 9.5 scavenges on average according to > a rough calculation). I have attached your plot this time, for the benefit of others on the list who may be following this thread. -------------- next part -------------- A non-text attachment was scrubbed... Name: tenuring_survivor.png Type: image/png Size: 72108 bytes Desc: not available Url : http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20080111/3d99e264/attachment.png From chris.guan at oocl.com Sun Jan 13 22:54:33 2008 From: chris.guan at oocl.com (chris.guan at oocl.com) Date: Mon, 14 Jan 2008 14:54:33 +0800 Subject: Strange issue about OOM Message-ID: <586EF14A9B93B945B69DF313EEE8B9A50244CCDD@shamail3.corp.oocl.com> Dear all, We have encountered an out-of-memory issue: the server continually triggers GC and full GC even though memory is only lightly used, especially the young generation. Please help. Also, what is the theory for when a GC is triggered? Many thanks. We are not calling System.gc(). This occurred while performing a 10 VU test, and has not been reproduced. The server parameters: -server -Xms512M -Xmx512M -verbose:gc -XX:PermSize=64M -XX:MaxPermSize=64M.
The log: 25185.728: [GC 25185.728: [DefNew: 163242K->163242K(169600K), 0.0000587 secs]25185.728: [Tenured: 234502K->252491K(349568K), 1.3021262 secs] 397745K->252491K(519168K), 1.3029282 secs] Total time for which application threads were stopped: 1.3040612 seconds Application time: 23.7854558 seconds 25210.819: [GC 25210.820: [DefNew: 86804K->1788K(169600K), 0.0486894 secs]25210.868: [Tenured: 266146K->241318K(349568K), 1.2399896 secs] 339295K->241318K(519168K), 1.2888920 secs] 25212.108: [Full GC 25212.109: [Tenured[Unloading class sun.reflect.GeneratedConstructorAccessor314] [Unloading class sun.reflect.GeneratedMethodAccessor759] [Unloading class sun.reflect.GeneratedConstructorAccessor315] [Unloading class sun.reflect.GeneratedConstructorAccessor268] [Unloading class sun.reflect.GeneratedConstructorAccessor269] [Unloading class sun.reflect.GeneratedConstructorAccessor316] [Unloading class sun.reflect.GeneratedConstructorAccessor313] : 241318K->226997K(349568K), 1.5769614 secs] 241318K->226997K(519168K), [Perm : 27510K->27498K(65536K)], 1.5771121 secs] Total time for which application threads were stopped: 2.8672106 seconds Application time: 0.0086622 seconds 25213.699: [GC 25213.699: [DefNew: 756K->82K(169600K), 0.0078670 secs]25213.707: [Tenured[Unloading class sun.reflect.GeneratedConstructorAccessor274] : 226997K->221430K(349568K), 2.0173917 secs] 227753K->221430K(519168K), 2.0254397 secs] 25215.724: [Full GC 25215.724: [Tenured: 221430K->221422K(349568K), 1.2257687 secs] 221430K->221422K(519168K), [Perm : 27496K->27496K(65536K)], 1.2259138 secs] Total time for which application threads were stopped: 3.2556802 seconds Application time: 21.5991046 seconds 25238.563: [GC 25238.563: [DefNew: 18728K->296K(169600K), 0.0169044 secs]25238.580: [Tenured: 221422K->221605K(349568K), 1.3133015 secs] 240150K->221605K(519168K), 1.3303972 secs] 25239.893: [Full GC 25239.894: [Tenured: 221605K->220330K(349568K), 1.2038988 secs] 221605K->220330K(519168K), [Perm : 27497K->27497K(65536K)], 1.2040751 secs] Total time for which application threads were stopped: 2.5488005 seconds Application time: 0.0188518 seconds 25241.130: [GC 25241.130: [DefNew: 1990K->59K(169600K), 0.0071533 secs]25241.137: [Tenured: 220330K->220344K(349568K), 1.2219109 secs] 222321K->220344K(519168K), 1.2292561 secs] 25242.359: [Full GC 25242.359: [Tenured: 220344K->220335K(349568K), 1.2026312 secs] 220344K->220335K(519168K), [Perm : 27497K->27497K(65536K)], 1.2027791 secs] Total time for which application threads were stopped: 2.4439904 seconds Application time: 24.2665427 seconds 25267.833: [GC 25267.833: [DefNew: 24971K->255K(169600K), 0.0081354 secs]25267.841: [Tenured: 220335K->220542K(349568K), 1.2416907 secs] 245306K->220542K(519168K), 1.2500197 secs] 25269.083: [Full GC 25269.083: [Tenured: 220542K->220317K(349568K), 1.2465385 secs] 220542K->220317K(519168K), [Perm : 27497K->27497K(65536K)], 1.2466957 secs] Total time for which application threads were stopped: 2.5006234 seconds Application time: 9.1710434 seconds 25279.513: [GC 25279.513: [DefNew: 11184K->124K(169600K), 0.0191095 secs]25279.532: [Tenured: 220317K->220247K(349568K), 1.2142263 secs] 231502K->220247K(519168K), 1.2335457 secs] 25280.746: [Full GC 25280.746: [Tenured: 220247K->220237K(349568K), 1.2042132 secs] 220247K->220237K(519168K), [Perm : 27497K->27497K(65536K)], 1.2043556 secs] Total time for which application threads were stopped: 2.4500521 seconds Application time: 28.5731236 seconds 25310.536: [GC 25310.536: [DefNew: 20905K->185K(169600K), 0.0174417 
secs]25310.554: [Tenured: 220237K->220422K(349568K), 1.2108142 secs] 241142K->220422K(519168K), 1.2284657 secs] 25311.765: [Full GC 25311.765: [Tenured: 220422K->220264K(349568K), 1.2223217 secs] 220422K->220264K(519168K), [Perm : 27497K->27497K(65536K)], 1.2224701 secs] Total time for which application threads were stopped: 2.4631855 seconds Application time: 0.0368244 seconds 25313.039: [GC 25313.039: [DefNew: 2421K->91K(169600K), 0.0079322 secs]25313.047: [Tenured: 220264K->220182K(349568K), 1.3464051 secs] 222685K->220182K(519168K), 1.3545552 secs] 25314.393: [Full GC 25314.394: [Tenured: 220182K->220172K(349568K), 1.2893013 secs] 220182K->220172K(519168K), [Perm : 27497K->27497K(65536K)], 1.2894685 secs] Total time for which application threads were stopped: 2.6585330 seconds Application time: 45.1138619 seconds 25360.803: [GC 25360.803: [DefNew: 29023K->222K(169600K), 0.0078027 secs]25360.811: [Tenured: 220172K->220388K(349568K), 1.2135921 secs] 249195K->220388K(519168K), 1.2215896 secs] 25362.025: [Full GC 25362.025: [Tenured: 220388K->220294K(349568K), 1.1972240 secs] 220388K->220294K(519168K), [Perm : 27497K->27497K(65536K)], 1.1973704 secs] Total time for which application threads were stopped: 2.4229267 seconds Application time: 31.0328978 seconds 25394.270: [GC 25394.270: [DefNew: 19657K->162K(169600K), 0.0088962 secs]25394.279: [Tenured: 220294K->220251K(349568K), 1.3206682 secs] 239951K->220251K(519168K), 1.3297797 secs] 25395.599: [Full GC 25395.600: [Tenured: 220251K->220241K(349568K), 1.1989645 secs] 220251K->220241K(519168K), [Perm : 27497K->27497K(65536K)], 1.1991104 secs] Total time for which application threads were stopped: 2.5428534 seconds Application time: 92.3087423 seconds 25489.129: [GC 25489.130: [DefNew: 54368K->315K(169600K), 0.0116064 secs]25489.141: [Tenured: 220241K->220556K(349568K), 1.5860020 secs] 274610K->220556K(519168K), 1.5978559 secs] 25490.727: [Full GC 25490.728: [Tenured: 220556K->220393K(349568K), 1.7381398 secs] 220556K->220393K(519168K), [Perm : 27497K->27497K(65536K)], 1.7382875 secs] Total time for which application threads were stopped: 3.3583027 seconds Application time: 5.9312480 seconds 25498.413: [GC 25498.413: [DefNew: 5241K->93K(169600K), 0.0172377 secs]25498.430: [Tenured: 220393K->220184K(349568K), 1.2922102 secs] 225635K->220184K(519168K), 1.3096568 secs] 25499.722: [Full GC 25499.722: [Tenured: 220184K->220175K(349568K), 1.1997437 secs] 220184K->220175K(519168K), [Perm : 27497K->27497K(65536K)], 1.1999031 secs] Total time for which application threads were stopped: 2.5211851 seconds Application time: 30.6028562 seconds 25531.528: [GC 25531.528: [DefNew: 17776K->156K(169600K), 0.0171998 secs]25531.545: [Tenured: 220175K->220330K(349568K), 1.2259878 secs] 237951K->220330K(519168K), 1.2433891 secs] 25532.771: [Full GC 25532.771: [Tenured: 220330K->220239K(349568K), 1.6592153 secs] 220330K->220239K(519168K), [Perm : 27499K->27498K(65536K)], 1.6593672 secs] Total time for which application threads were stopped: 2.9038367 seconds Application time: 60.4048509 seconds 25594.836: [GC 25594.837: [DefNew: 33874K->231K(169600K), 0.0168536 secs]25594.853: [Tenured: 220239K->220323K(349568K), 1.2180811 secs] 254113K->220323K(519168K), 1.2351280 secs] 25596.072: [Full GC 25596.072: [Tenured: 220323K->220313K(349568K), 1.2081214 secs] 220323K->220313K(519168K), [Perm : 27498K->27498K(65536K)], 1.2082667 secs] Total time for which application threads were stopped: 2.4445252 seconds Application time: 197.3583592 seconds 25794.639: [GC 25794.640: 
[DefNew: 106524K->606K(169600K), 0.0115235 secs]25794.651: [Tenured: 220313K->220918K(349568K), 1.2305807 secs] 326837K->220918K(519168K), 1.2423031 secs] 25795.882: [Full GC 25795.882: [Tenured: 220918K->220684K(349568K), 1.4731407 secs] 220918K->220684K(519168K), [Perm : 27499K->27498K(65536K)], 1.4732974 secs] Total time for which application threads were stopped: 2.7168052 seconds Application time: 164.9563502 seconds 25962.313: [GC 25962.313: [DefNew: 82927K->507K(169600K), 0.0091128 secs]25962.322: [Tenured: 220684K->220600K(349568K), 1.2012755 secs] 303612K->220600K(519168K), 1.2105846 secs] 25963.523: [Full GC 25963.524: [Tenured: 220600K->220590K(349568K), 1.2976048 secs] 220600K->220590K(519168K), [Perm : 27498K->27498K(65536K)], 1.2977544 secs] Total time for which application threads were stopped: 2.5095478 seconds Application time: 318.2895362 seconds 26283.112: [GC 26283.112: [DefNew: 164480K->164480K(169600K), 0.0000494 secs]26283.112: [Tenured: 220590K->221495K(349568K), 1.3880562 secs] 385070K->221495K(519168K), 1.3883772 secs] Total time for which application threads were stopped: 1.3895031 seconds Application time: 318.6411712 seconds 26603.143: [GC 26603.143: [DefNew: 164480K->164480K(169600K), 0.0000486 secs]26603.143: [Tenured: 221495K->222391K(349568K), 1.3624147 secs] 385975K->222391K(519168K), 1.3626644 secs] Total time for which application threads were stopped: 1.3641826 seconds Application time: 318.5726610 seconds 26923.080: [GC 26923.080: [DefNew: 164480K->164480K(169600K), 0.0000489 secs]26923.080: [Tenured: 222391K->221000K(349568K), 1.3836587 secs] 386871K->221000K(519168K), 1.3839017 secs] Total time for which application threads were stopped: 1.3854997 seconds Application time: 319.1556307 seconds 27243.621: [GC 27243.621: [DefNew: 164480K->164480K(169600K), 0.0000463 secs]27243.621: [Tenured: 221000K->221899K(349568K), 1.7850120 secs] 385480K->221899K(519168K), 1.7852453 secs] Total time for which application threads were stopped: 1.7865089 seconds Application time: 318.6705844 seconds 27564.079: [GC 27564.079: [DefNew: 164480K->164480K(169600K), 0.0000428 secs]27564.079: [Tenured: 221899K->222801K(349568K), 1.5890202 secs] 386379K->222801K(519168K), 1.5892684 secs] Total time for which application threads were stopped: 1.5909515 seconds Application time: 318.5994769 seconds 27884.269: [GC 27884.269: [DefNew: 164480K->164480K(169600K), 0.0000471 secs]27884.269: [Tenured: 222801K->223693K(349568K), 1.6433190 secs] 387281K->223693K(519168K), 1.6435445 secs] Total time for which application threads were stopped: 1.6447412 seconds Application time: 94.5616049 seconds 27980.476: [GC 27980.476: [DefNew: 51783K->327K(169600K), 0.0179721 secs]27980.494: [Tenured: 223693K->220424K(349568K), 1.2211616 secs] 275476K->220424K(519168K), 1.2393261 secs] 27981.715: [Full GC 27981.715: [Tenured: 220424K->220413K(349568K), 1.2103792 secs] 220424K->220413K(519168K), [Perm : 27498K->27498K(65536K)], 1.2105386 secs] Total time for which application threads were stopped: 2.4509644 seconds Application time: 32.3719363 seconds 28015.299: [GC 28015.299: [DefNew: 18543K->164K(169600K), 0.0073310 secs]28015.306: [Tenured: 220413K->220577K(349568K), 1.2056721 secs] 238957K->220577K(519168K), 1.2131937 secs] 28016.512: [Full GC 28016.512: [Tenured: 220577K->220252K(349568K), 1.2023572 secs] 220577K->220252K(519168K), [Perm : 27499K->27498K(65536K)], 1.2025188 secs] Total time for which application threads were stopped: 2.4167372 seconds -------- 08/01/03 17:32:28 Stop process 
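[Editorial aside -- nothing below is from the original message. One quick way to see what the collections above cost in total is to sum the "Total time for which application threads were stopped" lines (presumably printed because -XX:+PrintGCApplicationStoppedTime, or an equivalent, is enabled). A minimal sketch; the log file name and the parsing regex are assumptions, not anything from this thread:

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sums the stop times reported in lines such as
// "Total time for which application threads were stopped: 1.3040612 seconds".
public class StoppedTimeTotal {
    public static void main(String[] args) throws Exception {
        Pattern stopped = Pattern.compile("stopped: ([0-9.]+) seconds");
        double total = 0.0;
        int count = 0;
        BufferedReader in = new BufferedReader(new FileReader(args[0]));
        String line;
        while ((line = in.readLine()) != null) {
            Matcher m = stopped.matcher(line);
            if (m.find()) {
                total += Double.parseDouble(m.group(1));
                count++;
            }
        }
        in.close();
        System.out.println(count + " stops, " + total + " seconds stopped in total");
    }
}

Run as, for example, "java StoppedTimeTotal gc.log"; in the excerpt above nearly every pause is over a second, so the total adds up quickly.]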
Thanks & Regards Chris Guan Tel: 24056255 IMPORTANT NOTICE Email from OOCL is confidential and may be legally privileged. If it is not intended for you, please delete it immediately unread. The internet cannot guarantee that this communication is free of viruses, interception or interference and anyone who communicates with us by email is taken to accept the risks in doing so. Without limitation, OOCL and its affiliates accept no liability whatsoever and howsoever arising in connection with the use of this email. Under no circumstances shall this email constitute a binding agreement to carry or for provision of carriage services by OOCL, which is subject to the availability of carrier's equipment and vessels and the terms and conditions of OOCL's standard bill of lading which is also available at http://www.oocl.com. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20080114/345d85e6/attachment.html From Y.S.Ramakrishna at Sun.COM Mon Jan 14 11:51:10 2008 From: Y.S.Ramakrishna at Sun.COM (Y Srinivas Ramakrishna) Date: Mon, 14 Jan 2008 11:51:10 -0800 Subject: Strange isssue about OOM In-Reply-To: <586EF14A9B93B945B69DF313EEE8B9A50244CCDD@shamail3.corp.oocl.com> References: <586EF14A9B93B945B69DF313EEE8B9A50244CCDD@shamail3.corp.oocl.com> Message-ID: What version of the JDK are you running? You might want to try -XX:+HandlePromotionFailure if using a version lower than JDK 6.0. See CR 6206427. -- ramki ----- Original Message ----- From: chris.guan at oocl.com Date: Monday, January 14, 2008 11:37 am Subject: Strange isssue about OOM To: hotspot-gc-use at openjdk.java.net Cc: steve.zhuang at oocl.com > Dear, > > Encounter one issue about out of memory, server continued trigger gc > and full gc while the memory is less used, eapesicially the [young]. > Please help.Also, what is the theary to trigger gc?Many thanks. > > Not calling System.gc(). Performing 10 VU's test. Not reproduced. > > The server parameter: -server -Xms512M -Xmx512M -verbose:gc > -XX:PermSize=64M -XX:MaxPermSize=64M. 
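[Editorial illustration -- not part of the original reply. On a pre-6.0 JVM (the thread later establishes 1.4.2_13), the suggestion above amounts to adding a single flag to the quoted server parameters, roughly:

-server -Xms512M -Xmx512M -verbose:gc \
-XX:PermSize=64M -XX:MaxPermSize=64M \
-XX:+HandlePromotionFailure

Why it appears to matter for this log: without the flag, the collector only attempts a minor collection when the free space in the tenured generation is large enough to absorb a worst-case promotion from the young generation. Here the young generation is 169600K, while the tenured generation has only about 349568K - ~220000K = ~129000K free, so that guarantee essentially never holds; entries such as "[DefNew: 163242K->163242K(169600K), 0.0000587 secs]" followed immediately by a Tenured collection show the scavenge being skipped and the whole collection being done in the old generation instead. Ramki spells this mechanism out later in the thread; the numbers above are simply read off the quoted log.]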
>
> The log:
> [... the GC log quoted here in the original is identical, entry for entry, to the log reproduced in full above; all but its final entry is omitted ...]
> 28015.299: [GC 28015.299: [DefNew: 18543K->164K(169600K), 0.0073310 >
secs]28015.306: [Tenured: 220413K->220577K(349568K), 1.2056721 secs] > 238957K->220577K(519168K), 1.2131937 secs] > 28016.512: [Full GC 28016.512: [Tenured: 220577K->220252K(349568K), > 1.2023572 secs] 220577K->220252K(519168K), [Perm : > 27499K->27498K(65536K)], 1.2025188 secs] > Total time for which application threads were stopped: 2.4167372 seconds > > -------- > 08/01/03 17:32:28 Stop process > > > > > > > > Thanks & Regards > Chris Guan > Tel: 24056255 > > > > > > > > IMPORTANT NOTICE > Email from OOCL is confidential and may be legally privileged. If it > is not intended for you, please delete it immediately unread. The > internet cannot guarantee that this communication is free of viruses, > interception or interference and anyone who communicates with us by > email is taken to accept the risks in doing so. Without limitation, > OOCL and its affiliates accept no liability whatsoever and howsoever > arising in connection with the use of this email. Under no > circumstances shall this email constitute a binding agreement to carry > or for provision of carriage services by OOCL, which is subject to the > availability of carrier's equipment and vessels and the terms and > conditions of OOCL's standard bill of lading which is also available > at http://www.oocl.com. > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use From tony.printezis at sun.com Mon Jan 14 10:55:27 2008 From: tony.printezis at sun.com (Tony Printezis) Date: Mon, 14 Jan 2008 13:55:27 -0500 Subject: Strange isssue about OOM In-Reply-To: <586EF14A9B93B945B69DF313EEE8B9A50244CCDD@shamail3.corp.oocl.com> References: <586EF14A9B93B945B69DF313EEE8B9A50244CCDD@shamail3.corp.oocl.com> Message-ID: <478BB01F.4080506@sun.com> Could you please tell us what version of the VM you're using? I.e., java -version. Thanks. Tony chris.guan at oocl.com wrote: > > Dear, > > Encounter one issue about out of memory, server continued trigger gc > and full gc while the memory is less used, eapesicially the [young]. > Please help.Also, what is the theary to trigger gc?Many thanks. > > Not calling System.gc(). Performing 10 VU's test. Not reproduced. > > The server parameter: -server -Xms512M -Xmx512M -verbose:gc > -XX:PermSize=64M -XX:MaxPermSize=64M. 
>
> The log:
> [... the GC log quoted here is identical to the log shown in full earlier in the thread; all but its final entry is omitted ...]
> 28015.299: [GC 28015.299: [DefNew: 18543K->164K(169600K), 0.0073310 >
secs]28015.306: [Tenured: 220413K->220577K(349568K), 1.2056721 secs] > 238957K->220577K(519168K), 1.2131937 secs] > 28016.512: [Full GC 28016.512: [Tenured: 220577K->220252K(349568K), > 1.2023572 secs] 220577K->220252K(519168K), [Perm : > 27499K->27498K(65536K)], 1.2025188 secs] > Total time for which application threads were stopped: 2.4167372 seconds > > -------- > 08/01/03 17:32:28 Stop process > > > > > > /Thanks & Regards/ > /Chris Guan/ > /Tel: 24056255/ > > > > > > IMPORTANT NOTICE > Email from OOCL is confidential and may be legally privileged. If it > is not intended for you, please delete it immediately unread. The > internet cannot guarantee that this communication is free of viruses, > interception or interference and anyone who communicates with us by > email is taken to accept the risks in doing so. Without limitation, > OOCL and its affiliates accept no liability whatsoever and howsoever > arising in connection with the use of this email. Under no > circumstances shall this email constitute a binding agreement to carry > or for provision of carriage services by OOCL, which is subject to the > availability of carrier's equipment and vessels and the terms and > conditions of OOCL's standard bill of lading which is also available > at http://www.oocl.com. > > ------------------------------------------------------------------------ > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > -- ---------------------------------------------------------------------- | Tony Printezis, Staff Engineer | Sun Microsystems Inc. | | | MS BUR02-311 | | e-mail: tony.printezis at sun.com | 35 Network Drive | | office: +1 781 442 0998 (x20998) | Burlington, MA01803-0902, USA | ---------------------------------------------------------------------- e-mail client: Thunderbird (Solaris) From jamesnichols3 at gmail.com Tue Jan 15 10:27:46 2008 From: jamesnichols3 at gmail.com (James Nichols) Date: Tue, 15 Jan 2008 13:27:46 -0500 Subject: Rapid fluctuation of tenuring threshold In-Reply-To: References: <83a51e120801101554h4a9d7f51yc827d506d21fcd2@mail.gmail.com> Message-ID: <83a51e120801151027s2cf2405am5697254bad10aef4@mail.gmail.com> Hello Ramki, Thank you very much for your response. Your insights are very useful. I am actively digging into the system behavior during these periods when the death rate falls suddenly. I have several hundred threads in the application. 200 or so of these threads make outbound webservice calls every 5 minutes or so. I suspect that something is going on with these threads which is causing these issues. Many of these remote web service endpoints are on the same network, so it's likely that something is occuring which stalls many threads at the same time. I don't have a very good application layer metric to track this but am working to implement one. I have a 4CPU machine running the application and haven't tried running the incremental collector. The CPU usage on the machine isn't very high, as most of the threads are network bound, but I do see occasional spikes. I haven't had much luck correlating these spikes with the promotion issues, but will try. I've been testing out the new surviror space related tunings, but it will be several days until I'm ready to deploy these settings onto the production enviroment but will report back if these settings produce better results. Initial results on a test environment look very promising. 
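[Editorial sketch -- James does not list the new settings in this message. Applying the advice quoted below (MaxTenuringThreshold of 4, SurvivorRatio of 4, a commensurately larger NewSize so that eden stays the same size, and a correspondingly larger heap so the old generation is unchanged) to the flags quoted further down might look roughly like this; every number is a back-of-the-envelope assumption, not a value taken from the thread:

-Xms5554m -Xmx5554m \
-XX:NewSize=1920M -XX:MaxNewSize=1920M \
-XX:SurvivorRatio=4 -XX:MaxTenuringThreshold=4

The arithmetic: with SurvivorRatio=10 and a 1536M young generation, eden is about 1280M and each survivor about 128M; keeping eden at 1280M with SurvivorRatio=4 gives roughly 320M survivors and a 1920M young generation, and growing the heap by the same 384M leaves the old generation the size it was. The remaining flags would stay as posted.]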
Thanks, Jim On 1/11/08, Y Srinivas Ramakrishna wrote: > > > Hi James -- thanks for all of the data; the graphs as well as the logs. > Basically, the adaptive tenuring policy (for fixed survivor spaces) > appears > to be working pretty well here. > When there is space in the survivor spaces to accomodate the survivors > the threshold is loosened and as soon as there is a spike in the survivors > the threshold is tightened down. Interestingly what appears to be > happening > is that occasionally your program is going through these what i shall call > "survival spikes" where the death rate of very recently allocated objects > falls sharply. (You see this as a sudden spike in the population of > age 1 objects in the PrintTenuringDistribution output.) > However this quickly passes and these objects then > die fairly quickly thereafter. When the death rate falls suddenly in this > manner, you see the adaptive tenuring policy clamp down the threshold > in an effort to prevent survivor space overflow (which can cause premature > promotion and nepotism). When this happens there is also an increase in > promotiom rate (which you can observe in your plot at a fine scale). > > My guess is that you have a system on which occasionally some threads > are getting stalled or are getting temporarily starved. When this > happens the objects being processed by these threads tend to have their > lifetimes elongated and this causes a temporary fall in the death rate. > These threads probably then get to execute and quickly do the > processing that makes these objects dead and things then return to > normal. Either this, or there is some other burstiness in the processing > of objects that gives them this bursty dynamics. I suspect that the > "streaming" is not streaming as well as you expect and the > streams may be getting ocassionally stalled. So that's something > you want to check in your system. > > As far as tuning GC so as to avoid any unintended performance consequences > from this, what you want to do is to prevent survivor space overflow under > any > circumstances, because overflow can lead to premature promotion > and that in turn can lead to nepotism so that there will be a sudden spike > in promotions, and thence in scavenge pause times as well as more > pressure on the concurrent collector (just as you noticed). The increased > activity of the concurrent collector can in turn stall the streaming > threads > some more. (Which makes me ask what your platform is -- #cpus -- > and whether you have tried incremental cms to see if it makes the > behaviour less bursty and more smooth.) > > From the logs, i see some age 1 objects (i.e. objects that have been > allocated since the last scavenge) are getting tenured. This clearly > indicates that the survivor spaces are under-sized to deal with the > occasional heavier load (or stall of the processing threads). > I would recommend loweing the max tenuring threshold to 4 > (this avoids the unnecessary copying of long-lived objects that > you see which increases scavenge pause times) and > increasing the survivor space size by decreasing the survivor > ratio to something like 4 (this would avoid the occasional > survivor space overflow which can be very bad as i described > above). In order to compensate for the the smaller survivor ratio (so as > to leave the size of Eden the same, so as to leave your inter-scavenge > period unchanged), i would make a commensurate > increase in NewSize. 
In order to leave CMS cycles running > at the same rate as before you may also need to add space > appropriately to the old generation (by increasing Xmx Xms). > > That's all for now from me, but i am sure others on the alias > will be able to provide further tips (or alternate theories for the > apparent burstiness in behaviour). [Note however that the tenuring > threshold oscillating from 16 to 1 and back to 16 is more or less > normal in this case, especially the transition from 16 to 1 given the > suddenness with which the new object death rate falls -- it is this > suddenness that makes me suspect some kind of a stall in > one or a group of threads that would normally be processing > these surviving objects. Do you have any application level > queue length or other metrics that would tell you whether > something like this was happening?] > > How many threads do you have in your application, how many > cpu's on your platform, and do you observe any fluctuation in > the cpu consumed by various threads -- in particular are the > downward spikes in tenuring threshold preceded by stalling of > some heavily working threads in the application? That's the > direction in which I would point your investigation. > > PS: I am attaching the survivor space occupancy %ge plot > that you attached in your most recent email to you since > it is a nice illustration of how the reduction in tenuring threshold > is still occasionally unable to avoid survivor space overflow > (to the extent of 1 in 9.5 scavenges on average according to > a rough calculation). > > -- ramki > > ----- Original Message ----- > From: James Nichols > Date: Friday, January 11, 2008 11:11 am > Subject: Rapid fluctuation of tenuring threshold > To: hotspot-gc-use at openjdk.java.net > > > > Hello, > > > > My application has a fairly high streaming dataflow that creates a lot > > of > > short-lived objects. I've made the young generation fairly large and > > have a > > pretty big survivor space, but I still have a # of objects that end up > > in > > the old generation. As a result of this, my old generation size ramps > > up > > slowly over time until these objects are dead, then they all get > > cleaned up > > and there is a big drop in old generation usage, from about 2.5GBs to > > about > > 1GB. Most of the time it takes about 80 minutes to get up to 2.5GBs, > > but > > occasionally it happens much more rapidly, as fast at every 8 minutes. > > My > > workload does fluctuate over time, but not enough to explain this > > change in > > garbage collection behavior. I'm seeing some very odd behavior in the > > JVM's > > tuning of the tenuring threshold. > > > > Attached is a chart that plots the old generation over time (the blue > > line > > with a moving average in yellow). Each dot is a garbage collection > (usually > > the young generation). On this chart, I also plotted the tenuring > threshold > > in red, with a moving average in black. Each dot represents what the > > threshold was for that particular garbage collection. You can see > around > > time 11,500 minutes the peaks/valleys become much more frequent as > > more data > > is ending up int the old generation. During this time, the moving > average > > of the tenuring threshold drops substantially. This also happens during > > other periods where the old generation is filled up quickly. I'm not > > sure > > if this is a cause or an effect. 
I leaning towards the tenuring > threshold > > being the cause because the objects ending up in the old generation are > > typically reclaimed in a very short period of time. > > > > I've analyzed the distribution of the tenuring threshold over a > substantial > > period of time in my application. It appears that 25% of the time the > > tenuring threshold is 1, 60% of the time it's 16, and the other 15% is > > distributed somewhere in the middle. I'm a bit puzzled why the > threshold > > would so rapidly change between 1 and 16, and wonder if I'm getting > > hosed by > > the auto-tuning of this threshold. Looking at the log, it rapidly > > goes from > > 1 to 16, very rarely stopping intermediately. Is it reasonable to think > > that something is wrong with the auto tuning to make it stay at 1 and > > cause > > the behavior I noted above? > > > > I'm running jdk 1.5.0_12 on RedHat linux and my application server is > > JBoss > > 4.0.5. My GC settings are below... I can send the gc.dat log if you > want > > it, it's pretty big so I won't email it to the list. I've signed up > > to the > > mailing list, but please CC me on any replies. > > > > Thanks, James > > > > -server -Xms5170m -Xmx5170m > > -XX:NewSize=1536M -XX:MaxNewSize=1536M > > -XX:PermSize=512M -XX:MaxPermSize=512M > > -XX:MaxTenuringThreshold=30 -XX:SurvivorRatio=10 > > -XX:+ScavengeBeforeFullGC > > -XX:+UseConcMarkSweepGC > > -XX:+UseParNewGC -XX:ParallelGCThreads=3 > > -XX:+CMSParallelRemarkEnabled -XX:+CMSClassUnloadingEnabled > > -XX:+CMSPermGenSweepingEnabled -XX:+DisableExplicitGC > > -XX:+PrintTenuringDistribution -XX:+PrintGCDetails > -XX:+PrintGCTimeStamps > > -XX:+PrintHeapAtGC > > -XX:+PrintGCApplicationConcurrentTime -XX:+PrintGCApplicationStoppedTime > > -Xloggc:gc.dat > > _______________________________________________ > > hotspot-gc-use mailing list > > hotspot-gc-use at openjdk.java.net > > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20080115/2e5f5ab4/attachment.html From chris.guan at oocl.com Mon Jan 14 17:59:43 2008 From: chris.guan at oocl.com (chris.guan at oocl.com) Date: Tue, 15 Jan 2008 09:59:43 +0800 Subject: Strange isssue about OOM References: <586EF14A9B93B945B69DF313EEE8B9A50244CCDD@shamail3.corp.oocl.com> <478BB01F.4080506@sun.com> Message-ID: <586EF14A9B93B945B69DF313EEE8B9A50244CF6C@shamail3.corp.oocl.com> Dear all, Java version "1.4.2_13"; Oracle Application Server Containers for J2EE 10g (10.1.2.2.0) ; No System.gc(); WebService; Thanks & Regards Chris Guan Tel: 24056255 -----Original Message----- From: Tony Printezis [mailto:tony.printezis at sun.com] Sent: Tuesday, January 15, 2008 2:55 AM To: CHRIS GUAN (ISDC-ISD-OOCLL/SHA) Cc: hotspot-gc-use at openjdk.java.net; STEVE ZHUANG (ISDC-ISD-OOCLL/SHA) Subject: Re: Strange isssue about OOM Could you please tell us what version of the VM you're using? I.e., java -version. Thanks. Tony chris.guan at oocl.com wrote: > > Dear, > > Encounter one issue about out of memory, server continued trigger gc > and full gc while the memory is less used, eapesicially the [young]. > Please help.Also, what is the theary to trigger gc?Many thanks. > > Not calling System.gc(). Performing 10 VU's test. Not reproduced. > > The server parameter: -server -Xms512M -Xmx512M -verbose:gc > -XX:PermSize=64M -XX:MaxPermSize=64M. 
>
> The log:
> [... the GC log quoted here is identical to the log shown in full earlier in the thread; all but its final entry is omitted ...]
> 28015.299: [GC 28015.299: [DefNew: 18543K->164K(169600K), 0.0073310 >
secs]28015.306: [Tenured: 220413K->220577K(349568K), 1.2056721 secs] > 238957K->220577K(519168K), 1.2131937 secs] > 28016.512: [Full GC 28016.512: [Tenured: 220577K->220252K(349568K), > 1.2023572 secs] 220577K->220252K(519168K), [Perm : > 27499K->27498K(65536K)], 1.2025188 secs] > Total time for which application threads were stopped: 2.4167372 > seconds > > -------- > 08/01/03 17:32:28 Stop process > > /Thanks & Regards/ > /Chris Guan/ > /Tel: 24056255/ > > ---------------------------------------------------------------------- > -- > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > -- ---------------------------------------------------------------------- | Tony Printezis, Staff Engineer | Sun Microsystems Inc. | | | MS BUR02-311 | | e-mail: tony.printezis at sun.com | 35 Network Drive | | office: +1 781 442 0998 (x20998) | Burlington, MA01803-0902, USA | ---------------------------------------------------------------------- e-mail client: Thunderbird (Solaris)

From Y.S.Ramakrishna at Sun.COM Tue Jan 15 11:20:22 2008 From: Y.S.Ramakrishna at Sun.COM (Y Srinivas Ramakrishna) Date: Tue, 15 Jan 2008 11:20:22 -0800 Subject: Strange isssue about OOM In-Reply-To: <586EF14A9B93B945B69DF313EEE8B9A50244CF73@shamail3.corp.oocl.com> References: <586EF14A9B93B945B69DF313EEE8B9A50244CCDD@shamail3.corp.oocl.com> <586EF14A9B93B945B69DF313EEE8B9A50244CF73@shamail3.corp.oocl.com> Message-ID: 

Hi Chris -- > Thanks for your help. > > Java version "1.4.2_13"; > Oracle Application Server Containers for J2EE 10g (10.1.2.2.0) ; > No System.gc(); > WebService; > > Would you please explain "HandlePromotionFailure"?Where to see CR > 6206427.
>

In older versions of the JVM (such as the one you are using), we didn't have a mechanism to recover from a partial scavenge if the scavenge could not be completed because of lack of sufficient space in the old generation to promote tenured objects into. So we used to be very conservative in starting a scavenge and probably conservative in precipitating a full collection if we felt the next scavenge would not succeed. This conservatism could cause us to waste free space in the old generation. See for example:- http://java.sun.com/docs/hotspot/gc1.4.2/#4.4.2.%20Young%20Generation%20Guarantee|outline

In later JVM's, we added the ability to recover from a partially completed scavenge and to compact the entire heap. This allowed us to be less pessimistic about going into a scavenge, and would waste less space in the old generation even when you run with large young generations. See:- http://java.sun.com/docs/hotspot/gc5.0/gc_tuning_5.html#0.0.0.0.%20Young%20Generation%20Guarantee%7Coutline

I'd actually recommend upgrading to a more recent JVM where you'd see performance improvements, but if that is not possible, then with 1.4.2_13, you should use -XX:+HandlePromotionFailure which should allow you to make more efficient use of the heap by avoiding the issue described above.

As to the question of why you need to do, for example, the following scavenge :- > > 25313.039: [GC 25313.039: [DefNew: 2421K->91K(169600K), 0.0079322 > > secs]25313.047: [Tenured: 220264K->220182K(349568K), 1.3464051 secs] My only guess is that the application may be requesting the allocation of a large object. But that's a guess. (Yes, we should ideally have that information be available from the gc logs.)

-- ramki

From chris.guan at oocl.com Tue Jan 15 23:31:59 2008 From: chris.guan at oocl.com (chris.guan at oocl.com) Date: Wed, 16 Jan 2008 15:31:59 +0800 Subject: Strange isssue about OOM References: <586EF14A9B93B945B69DF313EEE8B9A50244CCDD@shamail3.corp.oocl.com> <586EF14A9B93B945B69DF313EEE8B9A50244CF73@shamail3.corp.oocl.com> Message-ID: <586EF14A9B93B945B69DF313EEE8B9A50244D8B2@shamail3.corp.oocl.com>

Dear all,

Does anybody understand what triggers these GC and Full GC events? I am confused by:

1. The [Young] generation is barely used when the GC is called:

25213.699: [GC 25213.699: [DefNew: 756K->82K(169600K), 0.0078670 secs]25213.707: [Tenured[Unloading class sun.reflect.GeneratedConstructorAccessor274] : 226997K->221430K(349568K), 2.0173917 secs] 227753K->221430K(519168K), 2.0254397 secs].
2. Frequent full GCs:

27980.476: [GC 27980.476: [DefNew: 51783K->327K(169600K), 0.0179721 secs]27980.494: [Tenured: 223693K->220424K(349568K), 1.2211616 secs] 275476K->220424K(519168K), 1.2393261 secs]
27981.715: [Full GC 27981.715: [Tenured: 220424K->220413K(349568K), 1.2103792 secs] 220424K->220413K(519168K), [Perm : 27498K->27498K(65536K)], 1.2105386 secs]
Total time for which application threads were stopped: 2.4509644 seconds
Application time: 32.3719363 seconds
28015.299: [GC 28015.299: [DefNew: 18543K->164K(169600K), 0.0073310 secs]28015.306: [Tenured: 220413K->220577K(349568K), 1.2056721 secs] 238957K->220577K(519168K), 1.2131937 secs]
28016.512: [Full GC 28016.512: [Tenured: 220577K->220252K(349568K), 1.2023572 secs] 220577K->220252K(519168K), [Perm : 27499K->27498K(65536K)], 1.2025188 secs]
Total time for which application threads were stopped: 2.4167372 seconds

Thanks & Regards
Chris Guan
Tel: 24056255

-----Original Message----- From: Y Srinivas Ramakrishna [mailto:Y.S.Ramakrishna at Sun.COM] Sent: Wednesday, January 16, 2008 3:20 AM To: CHRIS GUAN (ISDC-ISD-OOCLL/SHA) Cc: hotspot-gc-use at openjdk.java.net Subject: Re: RE: Strange isssue about OOM

Hi Chris -- > Thanks for your help. > > Java version "1.4.2_13"; > Oracle Application Server Containers for J2EE 10g (10.1.2.2.0) ; No > System.gc(); WebService; > > Would you please explain "HandlePromotionFailure"?Where to see CR > 6206427. >

In older versions of the JVM (such as the one you are using), we didn't have a mechanism to recover from a partial scavenge if the scavenge could not be completed because of lack of sufficient space in the old generation to promote tenured objects into. So we used to be very conservative in starting a scavenge and probably conservative in precipitating a full collection if we felt the next scavenge would not succeed. This conservatism could cause us to waste free space in the old generation. See for example:- http://java.sun.com/docs/hotspot/gc1.4.2/#4.4.2.%20Young%20Generation%20Guarantee|outline

In later JVM's, we added the ability to recover from a partially completed scavenge and to compact the entire heap. This allowed us to be less pessimistic about going into a scavenge, and would waste less space in the old generation even when you run with large young generations. See:- http://java.sun.com/docs/hotspot/gc5.0/gc_tuning_5.html#0.0.0.0.%20Young%20Generation%20Guarantee%7Coutline

I'd actually recommend upgrading to a more recent JVM where you'd see performance improvements, but if that is not possible, then with 1.4.2_13, you should use -XX:+HandlePromotionFailure which should allow you to make more efficient use of the heap by avoiding the issue described above.

As to the question of why you need to do, for example, the following scavenge :- > > 25313.039: [GC 25313.039: [DefNew: 2421K->91K(169600K), 0.0079322 > > secs]25313.047: [Tenured: 220264K->220182K(349568K), 1.3464051 secs] My only guess is that the application may be requesting the allocation of a large object. But that's a guess. (Yes, we should ideally have that information be available from the gc logs.)

-- ramki
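A minimal sketch of how the recommendation above would look on a 1.4.2_13 launch line. The main class name and the heap and logging options here are stand-ins chosen only for illustration; the application's real options are not shown in this thread:

    java -Xms512m -Xmx512m \
         -verbose:gc -XX:+PrintGCDetails \
         -XX:+HandlePromotionFailure \
         com.example.Main

With the flag enabled, a scavenge is attempted even when the old generation cannot guarantee room for a worst-case promotion; if promotion does fail, the collector recovers and falls back to a compacting full collection, which is the behavior ramki describes as being on by default in later JVMs.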
From steve.zhuang at oocl.com Wed Jan 16 06:14:05 2008 From: steve.zhuang at oocl.com (steve.zhuang at oocl.com) Date: Wed, 16 Jan 2008 22:14:05 +0800 Subject: Strange isssue about OOM References: <586EF14A9B93B945B69DF313EEE8B9A50244CCDD@shamail3.corp.oocl.com> <586EF14A9B93B945B69DF313EEE8B9A50244CF73@shamail3.corp.oocl.com> <586EF14A9B93B945B69DF313EEE8B9A50244D51E@shamail3.corp.oocl.com> Message-ID: <586EF14A9B93B945B69DF313EEE8B9A50248D834@shamail3.corp.oocl.com>

Hi Ramki,

Thanks for your explanation of the "HandlePromotionFailure" option. From the gc log,

25210.819: [GC 25210.820: [DefNew: 86804K->1788K(169600K), 0.0486894 secs]25210.868: [Tenured: 266146K->241318K(349568K), 1.2399896 secs] 339295K->241318K(519168K), 1.2888920 secs]
25212.108: [Full GC 25212.109: [Tenured[Unloading class sun.reflect.GeneratedConstructorAccessor314] [Unloading class sun.reflect.GeneratedMethodAccessor759] [Unloading class sun.reflect.GeneratedConstructorAccessor315] [Unloading class sun.reflect.GeneratedConstructorAccessor268] [Unloading class sun.reflect.GeneratedConstructorAccessor269] [Unloading class sun.reflect.GeneratedConstructorAccessor316] [Unloading class sun.reflect.GeneratedConstructorAccessor313] : 241318K->226997K(349568K), 1.5769614 secs] 241318K->226997K(519168K), [Perm : 27510K->27498K(65536K)], 1.5771121 secs]

I still have the issues below:

1. Why does a partial scavenge happen while the young generation still has a lot of free memory? Your guess is that the application may be requesting the allocation of a large object, but I think it is impossible for our application to have such a large object (about 169 - 86 = 83 MB). Is there any known issue with this kind of unusual GC? And could the ratio setting between the young generation and the tenured generation improve it? Our current ratio is the default 1:2 (Young : Tenured).

2. We can see that some classes are unloaded in the next full GC, and it seems the JVM has some CR on this. Is it possibly related to this OOM? Or, if it is related to an application issue, how could we avoid it?

3. We have studied the JVM doc at http://java.sun.com/docs/hotspot/gc5.0/gc_tuning_5.html See: 5.2.3 Out-of-Memory Exceptions The throughput collector will throw an out-of-memory exception if too much time is being spent doing garbage collection. For example, if the JVM is spending more than 98% of the total time doing garbage collection and is recovering less than 2% of the heap, it will throw an out-of-memory exception. The implementation of this feature has changed in 1.5. The policy is the same but there may be slight differences in behavior due to the new implementation.
And our gc log:

25213.699: [GC 25213.699: [DefNew: 756K->82K(169600K), 0.0078670 secs]25213.707: [Tenured[Unloading class sun.reflect.GeneratedConstructorAccessor274] : 226997K->221430K(349568K), 2.0173917 secs] 227753K->221430K(519168K), 2.0254397 secs]
25215.724: [Full GC 25215.724: [Tenured: 221430K->221422K(349568K), 1.2257687 secs] 221430K->221422K(519168K), [Perm : 27496K->27496K(65536K)], 1.2259138 secs]

It seems our gc log shows behavior like "if the JVM is spending more than 98% of the total time doing garbage collection and is recovering less than 2% of the heap". Is this when the server threw the OOM?

Thanks.

Best wishes,
Steve Zhuang

-----Original Message----- From: Y Srinivas Ramakrishna [mailto:Y.S.Ramakrishna at Sun.COM] Sent: Wednesday, January 16, 2008 3:51 AM To: CHRIS GUAN (ISDC-ISD-OOCLL/SHA) Subject: Re: RE: Strange isssue about OOM

I forgot to send a pointer to the CR:- http://bugs.sun.com/view_bug.do?bug_id=6206427 There really is not all that much to see in the bug report though except that newer jvm's do this by default so you do not have to explicitly enable the feature.

-- ramki

----- Original Message ----- From: Y Srinivas Ramakrishna Date: Tuesday, January 15, 2008 11:22 am Subject: Re: RE: Strange isssue about OOM To: chris.guan at oocl.com Cc: hotspot-gc-use at openjdk.java.net

> Hi Chris -- > > > Thanks for your help. > > > > Java version "1.4.2_13"; > > Oracle Application Server Containers for J2EE 10g (10.1.2.2.0) ; No > > System.gc(); WebService; > > > > Would you please explain "HandlePromotionFailure"?Where to see CR > > 6206427. > > > > In older versions of the JVM (such as the one you are using), we > didn't have a mechanism to recover from a partial scavenge if the > scavenge could not be completed because of lack of sufficient space in > the old generation to promote tenured objects into. So we used to be > very conservative in starting a scavenge and probably conservative in > precipitating a full collection if we felt the next scavenge would not > succeed. This conservatism could cause us to waste free space in the > old generation. See for example:- > > http://java.sun.com/docs/hotspot/gc1.4.2/#4.4.2.%20Young%20Generation%20Guarantee|outline > > In later JVM's, we added the ability to recover from a partially > completed scavenge and to compact the entire heap. This allowed us to > be less pessimistic about going into a scavenge, and would waste less > space in the old generation even when you run with large young > generations. See:- > > http://java.sun.com/docs/hotspot/gc5.0/gc_tuning_5.html#0.0.0.0.%20Young%20Generation%20Guarantee%7Coutline > > I'd actually recommend upgrading to a more recent JVM where you'd see > performance improvements, but if that is not possible, then with > 1.4.2_13, you should use -XX:+HandlePromotionFailure which should > allow you to make more efficient use of the heap by avoiding the issue > described above. > > As to the question of why you need to do, for example, the following > scavenge :- > > > > 25313.039: [GC 25313.039: [DefNew: 2421K->91K(169600K), 0.0079322 > > > secs]25313.047: [Tenured: 220264K->220182K(349568K), 1.3464051 > > > secs] > > My only guess is that the application may be requesting the allocation > of a large object. But that's a guess. (Yes, we should ideally have > that information be available from the gc logs.)
> > -- ramki > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use

From Jon.Masamitsu at Sun.COM Wed Jan 16 13:21:17 2008 From: Jon.Masamitsu at Sun.COM (Jon Masamitsu) Date: Wed, 16 Jan 2008 13:21:17 -0800 Subject: Strange isssue about OOM In-Reply-To: <586EF14A9B93B945B69DF313EEE8B9A50248D834@shamail3.corp.oocl.com> References: <586EF14A9B93B945B69DF313EEE8B9A50244CCDD@shamail3.corp.oocl.com> <586EF14A9B93B945B69DF313EEE8B9A50244CF73@shamail3.corp.oocl.com> <586EF14A9B93B945B69DF313EEE8B9A50244D51E@shamail3.corp.oocl.com> <586EF14A9B93B945B69DF313EEE8B9A50248D834@shamail3.corp.oocl.com> Message-ID: <478E754D.5050108@Sun.COM>

Steve, The section 5.2.3 applies to the throughput collector (i.e., -XX:+UseParallelGC). The snippet of the gc log that you included does not look like a log from the throughput collector so I don't think it is an issue of too much time spent in GC. Jon

steve.zhuang at oocl.com wrote: > avoid it? > > 3. We have studied JVM doc on > http://java.sun.com/docs/hotspot/gc5.0/gc_tuning_5.html > See: 5.2.3 Out-of-Memory Exceptions > The throughput collector will throw an out-of-memory exception if too > much time is being spent doing garbage collection. For example, if the > JVM is spending more than 98% of the total time doing garbage collection > and is recovering less than 2% of the heap, it will throw an > out-of-memory expection. The implementation of this feature has changed > in 1.5. The policy is the same but there may be slight differences in > behavior due to the new implementation. > > And our gc log: > 25213.699: [GC 25213.699: [DefNew: 756K->82K(169600K), 0.0078670 > secs]25213.707: [Tenured[Unloading class > sun.reflect.GeneratedConstructorAccessor274] > : 226997K->221430K(349568K), 2.0173917 secs] 227753K->221430K(519168K), > 2.0254397 secs] > 25215.724: [Full GC 25215.724: [Tenured: 221430K->221422K(349568K), > 1.2257687 secs] 221430K->221422K(519168K), [Perm : > 27496K->27496K(65536K)], 1.2259138 secs] > > Seems our gc log reveals a behavior like "if the JVM is spending more > than 98% of the total time doing garbage collection and is recovering > less than 2% of the heap", is this the time server threw the OOM? > > Thanks. > > Best wishes, > Steve Zhuang > >
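Jon's observation can be checked directly from the collector names in the log: a [DefNew: ...]/[Tenured: ...] pair comes from the serial collector, so the 98%/2% overhead policy quoted from the 5.0 tuning guide does not apply to this log. The throughput collector that the policy does apply to is the one selected with -XX:+UseParallelGC, as Jon notes. A minimal sketch of such a launch line, with the main class and heap sizes invented purely for illustration:

    java -XX:+UseParallelGC \
         -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
         -Xms512m -Xmx512m \
         com.example.Main

With that collector, young collections are reported as [PSYoungGen: ...] rather than [DefNew: ...], which is an easy way to confirm which collector a given log actually came from.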