From jon at siliconcircus.com Tue Nov 1 05:58:09 2011 From: jon at siliconcircus.com (Jon Bright) Date: Tue, 01 Nov 2011 13:58:09 +0100 Subject: Identifying concurrent mode failures caused by fragmentation In-Reply-To: <4EAF5CCD.5030004@oracle.com> References: <4EAE9D6E.1060807@siliconcircus.com> <4EAF5CCD.5030004@oracle.com> Message-ID: <4EAFECE1.7040806@siliconcircus.com> Jon, Indeed, the problem appears to have gone away with today's update to u26. (We plan to migrate further, but we're fairly conservative about rolling out new versions, and we already had u26 in use elsewhere.) With regard to your (and Kris') question on incremental mode: I started out by reading the tuning guide at http://www.oracle.com/technetwork/java/javase/gc-tuning-6-140523.html#icms and followed that up by reading various other pages and your blog (which was very helpful in terms of giving a sense of how to think about GC - thank you!). Whilst I was fairly ambivalent about incremental mode (we have at least 4 logical CPUs in each machine), we'd been using it in the past and I didn't see anything specifically mentioning that it was obsolete. Is there a better reference on this subject? I'll certainly now try a few benchmarking/test runs with incremental mode turned off and roll that out if all is well. Thanks! Jon On 01.11.2011 03:43, Jon Masamitsu wrote: > Jon, > > I haven't looked at the longer log but in general I've found the > information in the GC logs inadequate to figure out if the > problem is fragmentation. But more important, there has > been some good work in recent versions of hotspot so that > we're more successful at combating fragmentation. Try > the latest release and see if it helps (u26 should be good > enough). > > Jon > > On 10/31/11 06:06, Jon Bright wrote: >> Hi, >> >> We have an application running with a 6GB heap (complete parameters >> below). 
Mostly it has a fairly low turnover of memory use, but on >> occasion, it will come under some pressure as it reloads a large >> in-memory data set from a database. >> >> Sometimes in this situation, we'll see a concurrent mode failure. >> Here's one failure: >> >> 20021.464: [GC 20021.465: [ParNew: 13093K->3939K(76672K), 0.0569240 >> secs]20021.522: [CMS20023.747: [CMS-concurrent-mark: 11.403/29.029 >> secs] [Times: user=41.11 sys=1.03, real=29.03 secs] >> (concurrent mode failure): 3873922K->2801744K(6206272K), 30.7900180 >> secs] 3886215K->2801744K(6282944K), [CMS Perm : >> 142884K->142834K(524288K)] icms_dc=33 , 30.8473830 secs] [Times: >> user=30.26 sys=0.71, real=30.85 secs] >> Total time for which application threads were stopped: 30.8484460 seconds >> >> (I've attached a lengthier log including the previous and subsequent >> CMS collection.) >> >> Am I correct in thinking that this failure can basically only be >> caused by fragmentation? Both young and old seem to have plenty of >> space. There doesn't seem to be any sign that the tenured generation >> would run out of space before CMS completes. Fragmentation is the only >> remaining cause that occurs to me. >> >> We're running with 1.6.0_11, although this will be upgraded to >> 1.6.0_26 tomorrow. I realise our current version is ancient - I'm not >> really looking for help on the problem itself, just for advice on >> whether the log line above indicates fragmentation. 
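When scanning logs like this by hand, a quick sanity check is to pull out the old-gen `before->after(capacity)` triple and compare occupancy to capacity. A throwaway sketch of that check (class and method names here are mine, not part of any existing tool):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CmsLogParse {
    // Matches heap transitions of the form "3873922K->2801744K(6206272K)".
    static final Pattern TRANSITION =
            Pattern.compile("(\\d+)K->(\\d+)K\\((\\d+)K\\)");

    // Returns {before, after, capacity} in KB for the first transition on
    // the line, or null if there is none. On a full concurrent-mode-failure
    // line the first match is the ParNew transition, so pass in just the
    // old-gen fragment.
    public static long[] firstTransition(String logLine) {
        Matcher m = TRANSITION.matcher(logLine);
        if (!m.find()) {
            return null;
        }
        return new long[] {
                Long.parseLong(m.group(1)),
                Long.parseLong(m.group(2)),
                Long.parseLong(m.group(3))
        };
    }

    public static void main(String[] args) {
        // Old-gen fragment of the failure line quoted above: roughly 3.7G
        // live out of a 6.2G capacity before the collection, so raw space
        // was not the problem -- consistent with suspecting fragmentation.
        long[] t = firstTransition("3873922K->2801744K(6206272K)");
        System.out.println(t[0] + "K -> " + t[1] + "K of " + t[2] + "K");
    }
}
```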
>> >> Thanks >> >> Jon Bright >> >> >> >> The parameters we have set are: >> >> -server >> -Xmx6144M >> -Xms6144M >> -XX:MaxPermSize=512m >> -XX:PermSize=512m >> -XX:+UseConcMarkSweepGC >> -XX:+CMSIncrementalMode >> -XX:+CMSIncrementalPacing >> -XX:SoftRefLRUPolicyMSPerMB=3 >> -XX:CMSIncrementalSafetyFactor=30 >> -XX:+PrintGCDetails >> -XX:+PrintGCApplicationStoppedTime >> -XX:+PrintGCApplicationConcurrentTime >> -XX:+PrintGCTimeStamps >> -Xloggc:/home/tbmx/log/gc_`date +%Y%m%d%H%M`.log >> >> >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use From jon at siliconcircus.com Tue Nov 1 06:01:38 2011 From: jon at siliconcircus.com (Jon Bright) Date: Tue, 01 Nov 2011 14:01:38 +0100 Subject: Identifying concurrent mode failures caused by fragmentation In-Reply-To: References: <4EAE9D6E.1060807@siliconcircus.com> Message-ID: <4EAFEDB2.2030800@siliconcircus.com> Hi Kris, Many thanks for PrintFLSStatistics - it looks like just the sort of thing I'm after. I'm mostly reading the GC logs manually anyway, so breaking the parsing tools isn't a big deal for me. As I mentioned in my reply to Jon, we'll probably turn off incremental mode - I hadn't realised it was obsolete. Thanks for the hint. Jon On 01.11.2011 04:21, Krystal Mok wrote: > Hi Jon, > > It might be helpful to set -XX:PrintFLSStatistics to a value greater > than zero, to get the stats of FreeListSpace so that you'd know the size > of the biggest fragment. The GC log produced by -XX:+PrintGCDetails > doesn't give enough information on fragmentation. 
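Once free-list statistics are in the log, the two numbers worth watching are the total free space and the largest free chunk in the CMS generation. One crude derived indicator (my own convention, not something the JVM prints) is how far the largest chunk falls short of the total free space:

```java
public class FlsFragmentation {
    // Illustrative metric: 0.0 means the free space is one contiguous
    // chunk; values near 1.0 mean it is shattered into many small pieces,
    // so a large promotion can fail even when total free space looks ample.
    public static double fragmentation(long totalFreeWords, long maxChunkWords) {
        if (totalFreeWords == 0) {
            return 0.0;
        }
        return 1.0 - (double) maxChunkWords / (double) totalFreeWords;
    }

    public static void main(String[] args) {
        // Hypothetical readings taken from two FLS dumps.
        System.out.println(fragmentation(1_000_000, 950_000) < 0.1);  // mostly contiguous
        System.out.println(fragmentation(1_000_000, 50_000) > 0.9);   // badly fragmented
    }
}
```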
> > Here's an example of using -XX:PrintFLSStatistics=1: > https://gist.github.com/1329783 > It does make the GC log messier, and some of the GC log parsing tools > won't cope with this, but you get to know how bad the fragmentation is. > > Anyway, it looks like you're using CMS in incremental mode. This mode > should be obsolete in JDK6 already. Is there a good reason for you to be > using it? If not, I'd suggest turning it off, though, no matter if > you're upgrading your JDK or not. > > Regards, > Kris Mok > > On Mon, Oct 31, 2011 at 9:06 PM, Jon Bright > wrote: > > Hi, > > We have an application running with a 6GB heap (complete parameters > below). Mostly it has a fairly low turnover of memory use, but on > occasion, it will come under some pressure as it reloads a large > in-memory data set from a database. > > Sometimes in this situation, we'll see a concurrent mode failure. > Here's one failure: > > 20021.464: [GC 20021.465: [ParNew: 13093K->3939K(76672K), 0.0569240 > secs]20021.522: [CMS20023.747: [CMS-concurrent-mark: 11.403/29.029 > secs] [Times: user=41.11 sys=1.03, real=29.03 secs] > (concurrent mode failure): 3873922K->2801744K(6206272K), > 30.7900180 secs] 3886215K->2801744K(6282944K), [CMS Perm : > 142884K->142834K(524288K)] icms_dc=33 , 30.8473830 secs] [Times: > user=30.26 sys=0.71, real=30.85 secs] > Total time for which application threads were stopped: 30.8484460 > seconds > > (I've attached a lengthier log including the previous and subsequent > CMS collection.) > > Am I correct in thinking that this failure can basically only be > caused by fragmentation? Both young and old seem to have plenty of > space. There doesn't seem to be any sign that the tenured generation > would run out of space before CMS completes. Fragmentation is the > only remaining cause that occurs to me. > > We're running with 1.6.0_11, although this will be upgraded to > 1.6.0_26 tomorrow. 
I realise our current version is ancient - I'm > not really looking for help on the problem itself, just for advice > on whether the log line above indicates fragmentation. > > Thanks > > Jon Bright > > > > The parameters we have set are: > > -server > -Xmx6144M > -Xms6144M > -XX:MaxPermSize=512m > -XX:PermSize=512m > -XX:+UseConcMarkSweepGC > -XX:+CMSIncrementalMode > -XX:+CMSIncrementalPacing > -XX:SoftRefLRUPolicyMSPerMB=3 > -XX:__CMSIncrementalSafetyFactor=30 > -XX:+PrintGCDetails > -XX:+__PrintGCApplicationStoppedTime > -XX:+__PrintGCApplicationConcurrentTi__me > -XX:+PrintGCTimeStamps > -Xloggc:/home/tbmx/log/gc_`__date +%Y%m%d%H%M`.log > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > From jon.masamitsu at oracle.com Tue Nov 1 06:50:29 2011 From: jon.masamitsu at oracle.com (Jon Masamitsu) Date: Tue, 01 Nov 2011 06:50:29 -0700 Subject: Identifying concurrent mode failures caused by fragmentation In-Reply-To: <4EAFECE1.7040806@siliconcircus.com> References: <4EAE9D6E.1060807@siliconcircus.com> <4EAF5CCD.5030004@oracle.com> <4EAFECE1.7040806@siliconcircus.com> Message-ID: <4EAFF925.9000005@oracle.com> Jon, Incremental CMS (iCMS) was written for a specific use case - 1 or 2 hardware threads where concurrent activity by CMS would look like a STW (if only 1 hardware thread) or a high tax on the cpu cycles (2 hardware threads). It has a higher overhead and also is less efficient in terms of identifying garbage. The latter is because iCMS spreads out the concurrent work so that objects that it has identified as live earlier may actually be dead when the dead objects are swept up. It's worth testing with regular CMS instead of iCMS. BTW, for a 6g heap your young gen might be on the small side. A larger young gen allows more objects to die in the young gen and puts less pressure on the old (CMS) gen (i.e. 
fewer objects get promoted). Next time you want to play with your GC settings, try a larger young gen. Not sure if iCMS pushed you toward a smaller young gen. I personally don't have much experience with iCMS but with regular CMS, I would expect you to get better throughput with a larger young gen. As usual the devil is in the details. Jon On 11/01/11 05:58, Jon Bright wrote: > Jon, > > Indeed, the problem appears to have gone away with today's update to > u26. (We plan to migrate further, but we're fairly conservative about > rolling out new versions, and we already had u26 in use elsewhere.) > > With regard to your (and Kris') question on incremental mode: I started > out by reading the tuning guide at > > http://www.oracle.com/technetwork/java/javase/gc-tuning-6-140523.html#icms > > and followed that up by reading various other pages and your blog (which > was very helpful in terms of giving a sense of how to think about GC - > thank you!). > > Whilst I was fairly ambivalent about incremental mode (we have at least > 4 logical CPUs in each machine), we'd been using it in the past and I > didn't see anything specifically mentioning that it was obsolete. Is > there a better reference on this subject? > > I'll certainly now try a few benchmarking/test runs with incremental > mode turned off and roll that out if all is well. > > Thanks! > > Jon > > On 01.11.2011 03:43, Jon Masamitsu wrote: >> Jon, >> >> I haven't looked at the longer log but in general I've found the >> information in the GC logs inadequate to figure out if the >> problem is fragmentation. But more important, there has >> been some good work in recent versions of hotspot so that >> we're more successful at combating fragmentation. Try >> the latest release and see if it helps (u26 should be good >> enough). >> >> Jon >> >> On 10/31/11 06:06, Jon Bright wrote: >>> Hi, >>> >>> We have an application running with a 6GB heap (complete parameters >>> below). 
Mostly it has a fairly low turnover of memory use, but on >>> occasion, it will come under some pressure as it reloads a large >>> in-memory data set from a database. >>> >>> Sometimes in this situation, we'll see a concurrent mode failure. >>> Here's one failure: >>> >>> 20021.464: [GC 20021.465: [ParNew: 13093K->3939K(76672K), 0.0569240 >>> secs]20021.522: [CMS20023.747: [CMS-concurrent-mark: 11.403/29.029 >>> secs] [Times: user=41.11 sys=1.03, real=29.03 secs] >>> (concurrent mode failure): 3873922K->2801744K(6206272K), 30.7900180 >>> secs] 3886215K->2801744K(6282944K), [CMS Perm : >>> 142884K->142834K(524288K)] icms_dc=33 , 30.8473830 secs] [Times: >>> user=30.26 sys=0.71, real=30.85 secs] >>> Total time for which application threads were stopped: 30.8484460 seconds >>> >>> (I've attached a lengthier log including the previous and subsequent >>> CMS collection.) >>> >>> Am I correct in thinking that this failure can basically only be >>> caused by fragmentation? Both young and old seem to have plenty of >>> space. There doesn't seem to be any sign that the tenured generation >>> would run out of space before CMS completes. Fragmentation is the only >>> remaining cause that occurs to me. >>> >>> We're running with 1.6.0_11, although this will be upgraded to >>> 1.6.0_26 tomorrow. I realise our current version is ancient - I'm not >>> really looking for help on the problem itself, just for advice on >>> whether the log line above indicates fragmentation. 
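For reference, Jon's two suggestions in this message (regular CMS instead of iCMS, and a larger young gen) would change the original flag list roughly as follows. The -Xmn value is purely illustrative, something to benchmark rather than a recommendation:

```
-server
-Xmx6144M
-Xms6144M
-Xmn1g                                # illustrative; benchmark before rollout
-XX:MaxPermSize=512m
-XX:PermSize=512m
-XX:+UseConcMarkSweepGC
# dropped: -XX:+CMSIncrementalMode
# dropped: -XX:+CMSIncrementalPacing
# dropped: -XX:CMSIncrementalSafetyFactor=30   (only meaningful under iCMS)
-XX:SoftRefLRUPolicyMSPerMB=3
```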
>>> >>> Thanks >>> >>> Jon Bright >>> >>> >>> >>> The parameters we have set are: >>> >>> -server >>> -Xmx6144M >>> -Xms6144M >>> -XX:MaxPermSize=512m >>> -XX:PermSize=512m >>> -XX:+UseConcMarkSweepGC >>> -XX:+CMSIncrementalMode >>> -XX:+CMSIncrementalPacing >>> -XX:SoftRefLRUPolicyMSPerMB=3 >>> -XX:CMSIncrementalSafetyFactor=30 >>> -XX:+PrintGCDetails >>> -XX:+PrintGCApplicationStoppedTime >>> -XX:+PrintGCApplicationConcurrentTime >>> -XX:+PrintGCTimeStamps >>> -Xloggc:/home/tbmx/log/gc_`date +%Y%m%d%H%M`.log >>> >>> >>> >>> _______________________________________________ >>> hotspot-gc-use mailing list >>> hotspot-gc-use at openjdk.java.net >>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use From mchr3k at gmail.com Sat Nov 5 15:29:10 2011 From: mchr3k at gmail.com (Martin Hare Robertson) Date: Sat, 5 Nov 2011 22:29:10 +0000 Subject: Perf Impact of CMSClassUnloadingEnabled Message-ID: Hi, I recently encountered an interesting GC issue with a Tomcat application. I came up with a simple repro scenario which I posted to StackOverflow: http://stackoverflow.com/questions/8017193/when-does-the-perm-gen-get-collected To solve this issue I have been encouraged to use -XX:+CMSClassUnloadingEnabled. I currently use the following GC configuration. -XX:+UseMembar -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:CMSInitiatingOccupancyFraction=80 -XX:+UseCMSInitiatingOccupancyOnly Is enabling CMSClassUnloadingEnabled likely to have a negative perf impact? If not, why is it disabled by default? Thanks Martin -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20111105/5d3c6043/attachment.html From jon.masamitsu at oracle.com Mon Nov 7 07:44:17 2011 From: jon.masamitsu at oracle.com (Jon Masamitsu) Date: Mon, 07 Nov 2011 07:44:17 -0800 Subject: Perf Impact of CMSClassUnloadingEnabled In-Reply-To: References: Message-ID: <4EB7FCD1.4090607@oracle.com> Doing class unloading with CMS will often increase the remark pause times and so is not on by default. On 11/5/2011 3:29 PM, Martin Hare Robertson wrote: > Hi, > > I recently encountered an interesting GC issue with a Tomcat application. I > came up with a simple repro scenario which I posted to StackOverflow: > http://stackoverflow.com/questions/8017193/when-does-the-perm-gen-get-collected > > To solve this issue I have been encouraged to use > -XX:+CMSClassUnloadingEnabled. > I currently use the following GC configuration. > > -XX:+UseMembar > -XX:+UseConcMarkSweepGC > -XX:+UseParNewGC > -XX:CMSInitiatingOccupancyFraction=80 > -XX:+UseCMSInitiatingOccupancyOnly > > Is enabling CMSClassUnloadingEnabled likely to have a negative perf impact? > If not, why is it disabled by default? > > Thanks > > Martin > > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20111107/b39b7256/attachment.html From Andreas.Loew at oracle.com Mon Nov 7 10:33:17 2011 From: Andreas.Loew at oracle.com (Andreas Loew) Date: Mon, 07 Nov 2011 19:33:17 +0100 Subject: Perf Impact of CMSClassUnloadingEnabled In-Reply-To: <4EB7FCD1.4090607@oracle.com> References: <4EB7FCD1.4090607@oracle.com> Message-ID: <4EB8246D.8000202@oracle.com> Hi Jon, sorry, a follow-up question from my side: as it shouldn't be normal even for a Java EE app to constantly drop references to classloaders or individual classes that then need to be GC'ed: To what extent does your statement about increased remark pauses still apply in case the PermGen / set of loaded classes has stayed completely constant between initial mark and remark (which should be the usual case)? And wouldn't there also be a distinction between PermGen and Old Gen? Many thanks & best regards, Andreas -- Andreas Loew Senior Java Architect Oracle Advanced Customer Services Germany On 07.11.2011 16:44, Jon Masamitsu wrote: > Doing class unloading with CMS will often increase the remark pause times > and so is not on by default. > > On 11/5/2011 3:29 PM, Martin Hare Robertson wrote: >> Hi, >> >> I recently encountered an interesting GC issue with a Tomcat application. I >> came up with a simple repro scenario which I posted to StackOverflow: >> http://stackoverflow.com/questions/8017193/when-does-the-perm-gen-get-collected >> >> To solve this issue I have been encouraged to use >> -XX:+CMSClassUnloadingEnabled. >> I currently use the following GC configuration. >> >> -XX:+UseMembar >> -XX:+UseConcMarkSweepGC >> -XX:+UseParNewGC >> -XX:CMSInitiatingOccupancyFraction=80 >> -XX:+UseCMSInitiatingOccupancyOnly >> >> Is enabling CMSClassUnloadingEnabled likely to have a negative perf impact? >> If not, why is it disabled by default? 
>> >> Thanks >> >> Martin From jon.masamitsu at oracle.com Tue Nov 8 07:58:51 2011 From: jon.masamitsu at oracle.com (Jon Masamitsu) Date: Tue, 08 Nov 2011 07:58:51 -0800 Subject: Perf Impact of CMSClassUnloadingEnabled In-Reply-To: <4EB8246D.8000202@oracle.com> References: <4EB7FCD1.4090607@oracle.com> <4EB8246D.8000202@oracle.com> Message-ID: <4EB951BB.3000600@oracle.com> Andreas, Hotspot maintains a list of classes that are loaded in the Dictionary (dictionary.hpp/cpp). This list does not keep classes alive. After marking (when we know what classes are dead), we walk the list and remove dead classes. Hotspot does not keep information that says classes have not been unloaded, so the list is always walked. Jon On 11/07/11 10:33, Andreas Loew wrote: > Hi Jon, > > sorry, a follow-up question from my side: As it shouldn't be the most > normal thing even for a Java EE app to constantly dereference > classloaders or single classes that need to be GC'ed: > > In how far does your statement about increased remark pauses still > apply in case the PermGen / set of loaded classes has stayed > completely constant between initial mark and remark (which should be > the usual case)? > > And wouldn't there a also be a distinction between PermGen and Old Gen? > > Many thanks & best regards, > > Andreas > > -- > Andreas Loew > Senior Java Architect > Oracle Advanced Customer Services Germany > > > Am 07.11.2011 16:44, schrieb Jon Masamitsu: >> Doing class unloading with CMS will often increase the remark pause >> times >> and so is not on by default. >> >> On 11/5/2011 3:29 PM, Martin Hare Robertson wrote: >>> Hi, >>> >>> I recently encountered an interesting GC issue with a Tomcat >>> application. I >>> came up with a simple repro scenario which I posted to StackOverflow: >>> http://stackoverflow.com/questions/8017193/when-does-the-perm-gen-get-collected >>> >>> >>> To solve this issue I have been encouraged to use >>> -XX:+CMSClassUnloadingEnabled. 
>>> I currently use the following GC configuration. >>> >>> -XX:+UseMembar >>> -XX:+UseConcMarkSweepGC >>> -XX:+UseParNewGC >>> -XX:CMSInitiatingOccupancyFraction=80 >>> -XX:+UseCMSInitiatingOccupancyOnly >>> >>> Is enabling CMSClassUnloadingEnabled likely to have a negative perf >>> impact? >>> If not, why is it disabled by default? >>> >>> Thanks >>> >>> Martin From Andreas.Loew at oracle.com Tue Nov 8 08:13:58 2011 From: Andreas.Loew at oracle.com (Andreas Loew) Date: Tue, 08 Nov 2011 17:13:58 +0100 Subject: Perf Impact of CMSClassUnloadingEnabled In-Reply-To: <4EB951BB.3000600@oracle.com> References: <4EB7FCD1.4090607@oracle.com> <4EB8246D.8000202@oracle.com> <4EB951BB.3000600@oracle.com> Message-ID: <4EB95546.5000006@oracle.com> Hi Jon, many thanks for your reply :-) The behavior you mention indeed seems a little "unfortunate"... ;-) Will this change as part of the efforts to completely remove PermGen (part of the "HotRockit" initiative) following the example of JRockit? Thanks again & best regards, Andreas -- Andreas Loew Senior Java Architect Oracle Advanced Customer Services Germany Am 08.11.2011 16:58, schrieb Jon Masamitsu: > Andreas, > > Hotspot maintains a list of classes that are loaded in the > Dictionary (dictionary.hpp/cpp). This list does not keep > classes alive. After marking (when we know what classes > are dead), we walk the list and remove dead classes. > Hotspot does not keep information that says classes have > not been unloaded, so the list is always walked. > > Jon > > On 11/07/11 10:33, Andreas Loew wrote: >> Hi Jon, >> >> sorry, a follow-up question from my side: As it shouldn't be the most >> normal thing even for a Java EE app to constantly dereference >> classloaders or single classes that need to be GC'ed: >> >> In how far does your statement about increased remark pauses still >> apply in case the PermGen / set of loaded classes has stayed >> completely constant between initial mark and remark (which should be >> the usual case)? 
>> >> And wouldn't there a also be a distinction between PermGen and Old Gen? >> >> Many thanks & best regards, >> >> Andreas >> >> -- >> Andreas Loew >> Senior Java Architect >> Oracle Advanced Customer Services Germany >> >> >> Am 07.11.2011 16:44, schrieb Jon Masamitsu: >>> Doing class unloading with CMS will often increase the remark pause >>> times >>> and so is not on by default. >>> >>> On 11/5/2011 3:29 PM, Martin Hare Robertson wrote: >>>> Hi, >>>> >>>> I recently encountered an interesting GC issue with a Tomcat >>>> application. I >>>> came up with a simple repro scenario which I posted to StackOverflow: >>>> http://stackoverflow.com/questions/8017193/when-does-the-perm-gen-get-collected >>>> >>>> >>>> To solve this issue I have been encouraged to use >>>> -XX:+CMSClassUnloadingEnabled. >>>> I currently use the following GC configuration. >>>> >>>> -XX:+UseMembar >>>> -XX:+UseConcMarkSweepGC >>>> -XX:+UseParNewGC >>>> -XX:CMSInitiatingOccupancyFraction=80 >>>> -XX:+UseCMSInitiatingOccupancyOnly >>>> >>>> Is enabling CMSClassUnloadingEnabled likely to have a negative perf >>>> impact? >>>> If not, why is it disabled by default? >>>> >>>> Thanks >>>> >>>> Martin -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20111108/2467fd8f/attachment.html From jon.masamitsu at oracle.com Tue Nov 8 09:28:42 2011 From: jon.masamitsu at oracle.com (Jon Masamitsu) Date: Tue, 08 Nov 2011 09:28:42 -0800 Subject: Perf Impact of CMSClassUnloadingEnabled In-Reply-To: <4EB95546.5000006@oracle.com> References: <4EB7FCD1.4090607@oracle.com> <4EB8246D.8000202@oracle.com> <4EB951BB.3000600@oracle.com> <4EB95546.5000006@oracle.com> Message-ID: <4EB966CA.1020604@oracle.com> On 11/08/11 08:13, Andreas Loew wrote: > Hi Jon, > > many thanks for your reply :-) > > The behavior you mention indeed seems a little "unfortunate"... 
;-) > Will this change as part of the efforts to completely remove PermGen > (part of the "HotRockit" initiative) following the example of JRockit? It will be the case after perm gen removal that we will readily know if classes have been unloaded so will be able to conditionally skip the walk of the Dictionary for purposes of purging dead classes. Jon > > Thanks again & best regards, > > Andreas > > -- > Andreas Loew > Senior Java Architect > Oracle Advanced Customer Services Germany > > > Am 08.11.2011 16:58, schrieb Jon Masamitsu: >> Andreas, >> >> Hotspot maintains a list of classes that are loaded in the >> Dictionary (dictionary.hpp/cpp). This list does not keep >> classes alive. After marking (when we know what classes >> are dead), we walk the list and remove dead classes. >> Hotspot does not keep information that says classes have >> not been unloaded, so the list is always walked. >> >> Jon >> >> On 11/07/11 10:33, Andreas Loew wrote: >>> Hi Jon, >>> >>> sorry, a follow-up question from my side: As it shouldn't be the >>> most normal thing even for a Java EE app to constantly dereference >>> classloaders or single classes that need to be GC'ed: >>> >>> In how far does your statement about increased remark pauses still >>> apply in case the PermGen / set of loaded classes has stayed >>> completely constant between initial mark and remark (which should be >>> the usual case)? >>> >>> And wouldn't there a also be a distinction between PermGen and Old Gen? >>> >>> Many thanks & best regards, >>> >>> Andreas >>> >>> -- >>> Andreas Loew >>> Senior Java Architect >>> Oracle Advanced Customer Services Germany >>> >>> >>> Am 07.11.2011 16:44, schrieb Jon Masamitsu: >>>> Doing class unloading with CMS will often increase the remark pause >>>> times >>>> and so is not on by default. >>>> >>>> On 11/5/2011 3:29 PM, Martin Hare Robertson wrote: >>>>> Hi, >>>>> >>>>> I recently encountered an interesting GC issue with a Tomcat >>>>> application. 
I >>>>> came up with a simple repro scenario which I posted to StackOverflow: >>>>> http://stackoverflow.com/questions/8017193/when-does-the-perm-gen-get-collected >>>>> >>>>> >>>>> To solve this issue I have been encouraged to use >>>>> -XX:+CMSClassUnloadingEnabled. >>>>> I currently use the following GC configuration. >>>>> >>>>> -XX:+UseMembar >>>>> -XX:+UseConcMarkSweepGC >>>>> -XX:+UseParNewGC >>>>> -XX:CMSInitiatingOccupancyFraction=80 >>>>> -XX:+UseCMSInitiatingOccupancyOnly >>>>> >>>>> Is enabling CMSClassUnloadingEnabled likely to have a negative >>>>> perf impact? >>>>> If not, why is it disabled by default? >>>>> >>>>> Thanks >>>>> >>>>> Martin -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20111108/72ec8a46/attachment.html From ysr1729 at gmail.com Tue Nov 8 10:20:46 2011 From: ysr1729 at gmail.com (Srinivas Ramakrishna) Date: Tue, 8 Nov 2011 10:20:46 -0800 Subject: Perf Impact of CMSClassUnloadingEnabled In-Reply-To: <4EB951BB.3000600@oracle.com> References: <4EB7FCD1.4090607@oracle.com> <4EB8246D.8000202@oracle.com> <4EB951BB.3000600@oracle.com> Message-ID: Right, and this walk is done single-threaded today (perhaps it could be parallelized without too much effort?). May be moot with the upcoming changes around perm gen though.... -- ramki On Tue, Nov 8, 2011 at 7:58 AM, Jon Masamitsu wrote: > Andreas, > > Hotspot maintains a list of classes that are loaded in the > Dictionary (dictionary.hpp/cpp). This list does not keep > classes alive. After marking (when we know what classes > are dead), we walk the list and remove dead classes. > Hotspot does not keep information that says classes have > not been unloaded, so the list is always walked. 
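The unconditional walk Jon describes can be sketched as follows (Java purely as illustration; the real code is C++ in HotSpot's dictionary.cpp, and these names are mine):

```java
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

// Illustrative model of the system-dictionary purge described above.
public class DictionaryWalk {
    static class Klass {
        final String name;
        boolean markedLive;   // set during the CMS marking phase
        Klass(String name, boolean markedLive) {
            this.name = name;
            this.markedLive = markedLive;
        }
    }

    // After marking, every entry is visited even if nothing died -- the VM
    // keeps no "no classes were unloaded" shortcut, so the full walk runs
    // on every class-unloading remark.
    static int purgeDead(List<Klass> dictionary) {
        int removed = 0;
        for (Iterator<Klass> it = dictionary.iterator(); it.hasNext(); ) {
            if (!it.next().markedLive) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        List<Klass> dict = new LinkedList<>();
        dict.add(new Klass("java/lang/String", true));
        dict.add(new Klass("com/example/LeakedProxy$1", false)); // hypothetical dead class
        System.out.println(purgeDead(dict)); // one dead class unlinked
        System.out.println(dict.size());
    }
}
```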
> > Jon > > On 11/07/11 10:33, Andreas Loew wrote: > > Hi Jon, > > > > sorry, a follow-up question from my side: As it shouldn't be the most > > normal thing even for a Java EE app to constantly dereference > > classloaders or single classes that need to be GC'ed: > > > > In how far does your statement about increased remark pauses still > > apply in case the PermGen / set of loaded classes has stayed > > completely constant between initial mark and remark (which should be > > the usual case)? > > > > And wouldn't there a also be a distinction between PermGen and Old Gen? > > > > Many thanks & best regards, > > > > Andreas > > > > -- > > Andreas Loew > > Senior Java Architect > > Oracle Advanced Customer Services Germany > > > > > > Am 07.11.2011 16:44, schrieb Jon Masamitsu: > >> Doing class unloading with CMS will often increase the remark pause > >> times > >> and so is not on by default. > >> > >> On 11/5/2011 3:29 PM, Martin Hare Robertson wrote: > >>> Hi, > >>> > >>> I recently encountered an interesting GC issue with a Tomcat > >>> application. I > >>> came up with a simple repro scenario which I posted to StackOverflow: > >>> > http://stackoverflow.com/questions/8017193/when-does-the-perm-gen-get-collected > >>> > >>> > >>> To solve this issue I have been encouraged to use > >>> -XX:+CMSClassUnloadingEnabled. > >>> I currently use the following GC configuration. > >>> > >>> -XX:+UseMembar > >>> -XX:+UseConcMarkSweepGC > >>> -XX:+UseParNewGC > >>> -XX:CMSInitiatingOccupancyFraction=80 > >>> -XX:+UseCMSInitiatingOccupancyOnly > >>> > >>> Is enabling CMSClassUnloadingEnabled likely to have a negative perf > >>> impact? > >>> If not, why is it disabled by default? > >>> > >>> Thanks > >>> > >>> Martin > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20111108/add51560/attachment-0001.html From ysr1729 at gmail.com Fri Nov 11 14:31:21 2011 From: ysr1729 at gmail.com (Srinivas Ramakrishna) Date: Fri, 11 Nov 2011 14:31:21 -0800 Subject: Class histogram output and chopping off long thin tails Message-ID: I am posting this to hotspot-gc-use, but the idea is that it also be posted to -dev (but given how the lists are arranged, I am posting directly to the one and not the other to avoid double copies to those who are in the intersection of the two lists, while covering those in the union of the two). I've noticed recently in my use of the class histogram feature that, in typical cases, I am interested in the top few types of objects and not in the long thin tail. I am not sure how typical my use or experience is, but it would appear to me (based on my limited experience of late) that if we limited the histogram output to the top "N" (for say N = 40 or so) classes by default, it would likely satisfy 80-90% of use cases. For the remaining 10% of use cases, one would provide a complete dump, or a dump with more entries than available by default. I wanted to run this suggestion by everyone and see whether this would have some traction wrt such a request. I am guessing that this may be especially useful when dealing with very large applications that may have many different types of objects in the heap and might present a very long thin (and in many cases uninteresting) tail. (There may be other ways of restricting the output, for example by cutting off output below a certain population or volume threshold, but simply displaying the top N most voluminous or populous classes would seem to be the simplest....) Comments? -- ramki -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20111111/6510c3e1/attachment.html From Peter.B.Kessler at Oracle.COM Fri Nov 11 15:19:12 2011 From: Peter.B.Kessler at Oracle.COM (Peter B. Kessler) Date: Fri, 11 Nov 2011 15:19:12 -0800 Subject: Class histogram output and chopping off long thin tails In-Reply-To: References: Message-ID: <4EBDAD70.6040009@Oracle.COM> This seems like a reasonable request. In fact, I thought there *was* a way not to print classes that had fewer than N bytes (or instances), but I don't see it (or any traces of it :-). The other way I've wanted to filter PrintClassHistogram is to only print objects of a particular class (or probably package). E.g., java.util.Hashtable and java.util.Hashtable$Entry, when I'm looking for a "leak" like that. Knowing me, I probably kludged those together with grep or awk. ... peter Srinivas Ramakrishna wrote: > I am posting this to hotspot-gc-use, but the idea is that it also post to -dev (but given how > the lists are arranged, I am posting directly to the one and not the other to avoid double copies > to those who are in the intersection of the two kists, while covering those in the union of the two). > > I've noticed recently in my use of the the class histogram feature, that in typical cases I am interested > in the top few types of objects and not in the long thin tail. I am not sure how typical my use or > experience is, but it would appear to me (based on my limited experience of late) that if we limited > the histogram output to the top "N" (for say N = 40 or so) classes by default, it would likely satisfy > 80-90% of use cases. For the remaining 10% of use cases, one would provide a complete dump, > or a dump with more entries than available by default. > > I wanted to run this suggestion by everyone and see whether this would have some traction > wrt such a request. 
> > I am guessing that this may be especially useful when dealing with very large applications that > may have many different types of objects in the heap and might present a very long thin (and in > many cases uninteresting) tail. (There may be other ways of restricting the output, for example > by cutting off output below a certain population or volume threshold, but simply displaying the > top N most voluminous or populous classes would seem to be the simplest....) > > Comments? > -- ramki From tony.printezis at oracle.com Mon Nov 14 07:13:55 2011 From: tony.printezis at oracle.com (Tony Printezis) Date: Mon, 14 Nov 2011 10:13:55 -0500 Subject: Class histogram output and chopping off long thin tails In-Reply-To: References: Message-ID: <4EC13033.6010901@oracle.com> Ramki, First, which version of the class histogram are you referring to? I assume it's the one we generate from within the JVM which goes to the GC log? If you were using jmap you could just pipe the output to head or similar. Is your concern mainly to keep the class histogram output reasonably compact? FWIW, and I don't know how common this scenario is, I once tracked down a leak by noticing that there were 2 instances of a particular class instead of 1 (I was replacing one instance with a newly-allocated one, but the original one ended up being queued up for finalization and held on to a lot of space). If we only dumped the top N classes I would have missed this piece of information. Maybe adding a new -XX parameter :-) to set N would be a good compromise? Tony On 11/11/2011 5:31 PM, Srinivas Ramakrishna wrote: > > I am posting this to hotspot-gc-use, but the idea is that it also post > to -dev (but given how > the lists are arranged, I am posting directly to the one and not the > other to avoid double copies > to those who are in the intersection of the two kists, while covering > those in the union of the two). 
> > I've noticed recently in my use of the the class histogram feature, > that in typical cases I am interested > in the top few types of objects and not in the long thin tail. I am > not sure how typical my use or > experience is, but it would appear to me (based on my limited > experience of late) that if we limited > the histogram output to the top "N" (for say N = 40 or so) classes by > default, it would likely satisfy > 80-90% of use cases. For the remaining 10% of use cases, one would > provide a complete dump, > or a dump with more entries than available by default. > > I wanted to run this suggestion by everyone and see whether this would > have some traction > wrt such a request. > > I am guessing that this may be especially useful when dealing with > very large applications that > may have many different types of objects in the heap and might present > a very long thin (and in > many cases uninteresting) tail. (There may be other ways of > restricting the output, for example > by cutting off output below a certain population or volume threshold, > but simply displaying the > top N most voluminous or populous classes would seem to be the > simplest....) > > Comments? > -- ramki > > > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20111114/86c75dd7/attachment.html From stefan.karlsson at oracle.com Mon Nov 14 07:51:56 2011 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Mon, 14 Nov 2011 16:51:56 +0100 Subject: Class histogram output and chopping off long thin tails In-Reply-To: References: Message-ID: <4EC1391C.5020505@oracle.com> Hi Ramki, On 11/11/2011 11:31 PM, Srinivas Ramakrishna wrote: > > I am posting this to hotspot-gc-use, but the idea is that it also post > to -dev (but given how > the lists are arranged, I am posting directly to the one and not the > other to avoid double copies > to those who are in the intersection of the two kists, while covering > those in the union of the two). > > I've noticed recently in my use of the the class histogram feature, > that in typical cases I am interested > in the top few types of objects and not in the long thin tail. I am > not sure how typical my use or > experience is, but it would appear to me (based on my limited > experience of late) that if we limited > the histogram output to the top "N" (for say N = 40 or so) classes by > default, it would likely satisfy > 80-90% of use cases. For the remaining 10% of use cases, one would > provide a complete dump, > or a dump with more entries than available by default. > > I wanted to run this suggestion by everyone and see whether this would > have some traction > wrt such a request. We have this feature in JRockit's class histogram infrastructure, so maybe one could see this as a convergence "project"? 
For example: $ jrcmd 14991 print_object_summary 14991: --------- Detailed Heap Statistics: --------- 33.3% 79k 1005 +79k [C 22.3% 53k 456 +53k java/lang/Class 14.2% 34k 10 +34k [B 9.7% 23k 995 +23k java/lang/String 5.4% 12k 304 +12k [Ljava/lang/Object; 2.5% 5k 76 +5k java/lang/reflect/Method 1.3% 3k 157 +3k [Ljava/lang/Class; 1.3% 2k 49 +2k [Ljava/lang/String; 1.1% 2k 4 +2k [Ljrockit/vm/FCECache$FCE; 0.9% 2k 32 +2k java/lang/reflect/Field 0.7% 1k 20 +1k [Ljava/util/HashMap$Entry; 0.6% 1k 10 +1k java/lang/Thread 0.6% 1k 62 +1k java/util/Hashtable$Entry 0.5% 1k 5 +1k [I 239kB total --- --------- End of Detailed Heap Statistics --- where the cutoff is explained by: $ jrcmd 14991 help print_object_summary ... cutoff - classes that represent less than this percentage of total live objects (measured in size) will not be displayed. Currently the percentage should be multiplied by 1000 so 1.5%% would be 1500 (int, 500) cutoffpointsto - like cutoff but for points-to information (int, 500) increaseonly - set if you only want to display the classes that increased since the last listing (bool, false) ... StefanK > > I am guessing that this may be especially useful when dealing with > very large applications that > may have many different types of objects in the heap and might present > a very long thin (and in > many cases uninteresting) tail. (There may be other ways of > restricting the output, for example > by cutting off output below a certain population or volume threshold, > but simply displaying the > top N most voluminous or populous classes would seem to be the > simplest....) > > Comments? > -- ramki > > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use -------------- next part -------------- An HTML attachment was scrubbed... 
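JRockit's `cutoff` semantics (drop classes below a given fraction of total live size) can be approximated on a HotSpot histogram with a two-pass awk. A sketch with hypothetical sample data, using a 1.5% threshold (JRockit's `1500`, i.e. percentage × 1000):

```shell
# Hypothetical histogram rows (rank, #instances, #bytes, class name)
cat > /tmp/histo.txt <<'EOF'
   1:          1005          81412  [C
   2:           456          54720  java.lang.Class
   3:           995          23880  java.lang.String
   4:             5            120  [I
EOF

# Pass 1 sums the #bytes column; pass 2 prints rows at or above 1.5% of it
awk 'NR==FNR { total += $3; next }
     $3 >= total * 0.015' /tmp/histo.txt /tmp/histo.txt
```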
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20111114/b1536712/attachment.html From ysr1729 at gmail.com Mon Nov 14 11:37:17 2011 From: ysr1729 at gmail.com (Srinivas Ramakrishna) Date: Mon, 14 Nov 2011 11:37:17 -0800 Subject: Class histogram output and chopping off long thin tails In-Reply-To: <4EC13033.6010901@oracle.com> References: <4EC13033.6010901@oracle.com> Message-ID: On Mon, Nov 14, 2011 at 7:13 AM, Tony Printezis wrote: > Ramki, > > First, which version of the class histogram are you referring to? I assume > it's the one we generate from within the JVM which goes to the GC log? If > you were using jmap you could just pipe the output to head or similar. > Right -- the former. > > Is your concern mainly to keep the class histogram output reasonably > compact? FWIW, and I don't know how common this scenario is, I once tracked > down a leak by noticing that they were 2 instances of a particular class > instead of 1 (I was replacing once instance with a newly-allocated one, but > the original one ended up being queued up for finalization and held on to a > lot of space). If we only dumped the top N classes I would have missed this > piece of information. > Sure. I can imagine there are cases where the skinny tail is interesting and indeed vital. My guess (as i indicated in the email) was that perhaps the common use case was in the top part of the histogram, and the objective as you stated was compactness :-) > > Maybe adding a new -XX parameter :-) to set N would be a good compromise? > Sure. That's what i was suggesting, plus that the default be to favor compactness (because of my guesstimate on how the use-cases fell in practice, a guesstimate that could be wrong since it was based on subjective experience rather than a survey :-) thanks! 
-- ramki > > Tony > > > On 11/11/2011 5:31 PM, Srinivas Ramakrishna wrote: > > > I am posting this to hotspot-gc-use, but the idea is that it also post to > -dev (but given how > the lists are arranged, I am posting directly to the one and not the other > to avoid double copies > to those who are in the intersection of the two kists, while covering > those in the union of the two). > > I've noticed recently in my use of the the class histogram feature, that > in typical cases I am interested > in the top few types of objects and not in the long thin tail. I am not > sure how typical my use or > experience is, but it would appear to me (based on my limited experience > of late) that if we limited > the histogram output to the top "N" (for say N = 40 or so) classes by > default, it would likely satisfy > 80-90% of use cases. For the remaining 10% of use cases, one would provide > a complete dump, > or a dump with more entries than available by default. > > I wanted to run this suggestion by everyone and see whether this would > have some traction > wrt such a request. > > I am guessing that this may be especially useful when dealing with very > large applications that > may have many different types of objects in the heap and might present a > very long thin (and in > many cases uninteresting) tail. (There may be other ways of restricting > the output, for example > by cutting off output below a certain population or volume threshold, but > simply displaying the > top N most voluminous or populous classes would seem to be the > simplest....) > > Comments? > -- ramki > > > > > _______________________________________________ > hotspot-gc-use mailing listhotspot-gc-use at openjdk.java.nethttp://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20111114/47cba37e/attachment-0001.html From ysr1729 at gmail.com Mon Nov 14 11:40:46 2011 From: ysr1729 at gmail.com (Srinivas Ramakrishna) Date: Mon, 14 Nov 2011 11:40:46 -0800 Subject: Class histogram output and chopping off long thin tails In-Reply-To: <4EC1391C.5020505@oracle.com> References: <4EC1391C.5020505@oracle.com> Message-ID: Hi Stefan -- yes, +1 for that. With of course the option to have the entire listing if the user so chose. I love the +increaseonly option. From my limited experience, it would likely be a big hit (although lacking more experience in its behaviour, I am mildly concerned about short-term noise/volatility confusing the user). -- ramki On Mon, Nov 14, 2011 at 7:51 AM, Stefan Karlsson wrote: > ** > Hi Ramki, > > > On 11/11/2011 11:31 PM, Srinivas Ramakrishna wrote: > > > I am posting this to hotspot-gc-use, but the idea is that it also post to > -dev (but given how > the lists are arranged, I am posting directly to the one and not the other > to avoid double copies > to those who are in the intersection of the two kists, while covering > those in the union of the two). > > I've noticed recently in my use of the the class histogram feature, that > in typical cases I am interested > in the top few types of objects and not in the long thin tail. I am not > sure how typical my use or > experience is, but it would appear to me (based on my limited experience > of late) that if we limited > the histogram output to the top "N" (for say N = 40 or so) classes by > default, it would likely satisfy > 80-90% of use cases. For the remaining 10% of use cases, one would provide > a complete dump, > or a dump with more entries than available by default. > > I wanted to run this suggestion by everyone and see whether this would > have some traction > wrt such a request. 
> > > We have this feature in JRockit's class histogram infrastructure, so maybe > one could see this as a convergence "project"? > > For example: > $ jrcmd 14991 print_object_summary > 14991: > > --------- Detailed Heap Statistics: --------- > 33.3% 79k 1005 +79k [C > 22.3% 53k 456 +53k java/lang/Class > 14.2% 34k 10 +34k [B > 9.7% 23k 995 +23k java/lang/String > 5.4% 12k 304 +12k [Ljava/lang/Object; > 2.5% 5k 76 +5k java/lang/reflect/Method > 1.3% 3k 157 +3k [Ljava/lang/Class; > 1.3% 2k 49 +2k [Ljava/lang/String; > 1.1% 2k 4 +2k [Ljrockit/vm/FCECache$FCE; > 0.9% 2k 32 +2k java/lang/reflect/Field > 0.7% 1k 20 +1k [Ljava/util/HashMap$Entry; > 0.6% 1k 10 +1k java/lang/Thread > 0.6% 1k 62 +1k java/util/Hashtable$Entry > 0.5% 1k 5 +1k [I > 239kB total --- > > --------- End of Detailed Heap Statistics --- > > where the cut off is as explained with: > $ jrcmd 14991 help print_object_summary > ... > cutoff - classes that represent less than this > percentage of totallive objects (measured in > size) will not be displayed. > Currently the percentage should be multiplied > by 1000 so 1.5%% would be 1500 (int, 500) > cutoffpointsto - like cutoff but for points-to > information > (int, 500) > increaseonly - set if you only want to display the > classes > thatincreased since the last listing (bool, > false) > ... > > StefanK > > > I am guessing that this may be especially useful when dealing with very > large applications that > may have many different types of objects in the heap and might present a > very long thin (and in > many cases uninteresting) tail. (There may be other ways of restricting > the output, for example > by cutting off output below a certain population or volume threshold, but > simply displaying the > top N most voluminous or populous classes would seem to be the > simplest....) > > Comments? 
> -- ramki > > > > _______________________________________________ > hotspot-gc-use mailing listhotspot-gc-use at openjdk.java.nethttp://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20111114/c5587b07/attachment.html From tony.printezis at oracle.com Mon Nov 14 14:10:17 2011 From: tony.printezis at oracle.com (Tony Printezis) Date: Mon, 14 Nov 2011 17:10:17 -0500 Subject: Class histogram output and chopping off long thin tails In-Reply-To: References: <4EC13033.6010901@oracle.com> Message-ID: <4EC191C9.6010907@oracle.com> I'll be happy if we provided a parameter to limit the histogram output. But, I would personally recommend that the default value for this is "unbounded" for the reasons I described in my previous e-mail... Tony On 11/14/2011 02:37 PM, Srinivas Ramakrishna wrote: > > > On Mon, Nov 14, 2011 at 7:13 AM, Tony Printezis > > wrote: > > Ramki, > > First, which version of the class histogram are you referring to? > I assume it's the one we generate from within the JVM which goes > to the GC log? If you were using jmap you could just pipe the > output to head or similar. > > > Right -- the former. > > > Is your concern mainly to keep the class histogram output > reasonably compact? FWIW, and I don't know how common this > scenario is, I once tracked down a leak by noticing that they were > 2 instances of a particular class instead of 1 (I was replacing > once instance with a newly-allocated one, but the original one > ended up being queued up for finalization and held on to a lot of > space). If we only dumped the top N classes I would have missed > this piece of information. > > > Sure. I can imagine there are cases where the skinny tail is > interesting and indeed vital. 
My guess (as i indicated in the email) > was that perhaps the > common use case was in the top part of the histogram, and the > objective as you stated was compactness :-) > > > Maybe adding a new -XX parameter :-) to set N would be a good > compromise? > > > Sure. That's what i was suggesting, plus that the default be to favor > compactness (because of my guesstimate on how the use-cases fell in > practice, > a guesstimate that could be wrong since it was based on subjective > experience rather than a survey :-) > > thanks! > -- ramki > > > Tony > > > On 11/11/2011 5:31 PM, Srinivas Ramakrishna wrote: >> >> I am posting this to hotspot-gc-use, but the idea is that it also >> post to -dev (but given how >> the lists are arranged, I am posting directly to the one and not >> the other to avoid double copies >> to those who are in the intersection of the two kists, while >> covering those in the union of the two). >> >> I've noticed recently in my use of the the class histogram >> feature, that in typical cases I am interested >> in the top few types of objects and not in the long thin tail. I >> am not sure how typical my use or >> experience is, but it would appear to me (based on my limited >> experience of late) that if we limited >> the histogram output to the top "N" (for say N = 40 or so) >> classes by default, it would likely satisfy >> 80-90% of use cases. For the remaining 10% of use cases, one >> would provide a complete dump, >> or a dump with more entries than available by default. >> >> I wanted to run this suggestion by everyone and see whether this >> would have some traction >> wrt such a request. >> >> I am guessing that this may be especially useful when dealing >> with very large applications that >> may have many different types of objects in the heap and might >> present a very long thin (and in >> many cases uninteresting) tail. 
(There may be other ways of >> restricting the output, for example >> by cutting off output below a certain population or volume >> threshold, but simply displaying the >> top N most voluminous or populous classes would seem to be the >> simplest....) >> >> Comments? >> -- ramki >> >> >> >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20111114/99072f66/attachment.html From ysr1729 at gmail.com Mon Nov 14 14:22:13 2011 From: ysr1729 at gmail.com (Srinivas Ramakrishna) Date: Mon, 14 Nov 2011 14:22:13 -0800 Subject: Class histogram output and chopping off long thin tails In-Reply-To: <4EC191C9.6010907@oracle.com> References: <4EC13033.6010901@oracle.com> <4EC191C9.6010907@oracle.com> Message-ID: OK, sounds good to me. thanks! -- ramki On Mon, Nov 14, 2011 at 2:10 PM, Tony Printezis wrote: > I'll be happy if we provided a parameter to limit the histogram output. > But, I would personally recommend that the default value for this is > "unbounded" for the reasons I described in my previous e-mail... > > Tony > > > On 11/14/2011 02:37 PM, Srinivas Ramakrishna wrote: > > > > On Mon, Nov 14, 2011 at 7:13 AM, Tony Printezis > wrote: > >> Ramki, >> >> First, which version of the class histogram are you referring to? I >> assume it's the one we generate from within the JVM which goes to the GC >> log? If you were using jmap you could just pipe the output to head or >> similar. >> > > Right -- the former. > > >> >> Is your concern mainly to keep the class histogram output reasonably >> compact? 
FWIW, and I don't know how common this scenario is, I once tracked >> down a leak by noticing that they were 2 instances of a particular class >> instead of 1 (I was replacing once instance with a newly-allocated one, but >> the original one ended up being queued up for finalization and held on to a >> lot of space). If we only dumped the top N classes I would have missed this >> piece of information. >> > > Sure. I can imagine there are cases where the skinny tail is interesting > and indeed vital. My guess (as i indicated in the email) was that perhaps > the > common use case was in the top part of the histogram, and the objective as > you stated was compactness :-) > > >> >> Maybe adding a new -XX parameter :-) to set N would be a good compromise? >> > > Sure. That's what i was suggesting, plus that the default be to favor > compactness (because of my guesstimate on how the use-cases fell in > practice, > a guesstimate that could be wrong since it was based on subjective > experience rather than a survey :-) > > thanks! > -- ramki > > >> >> Tony >> >> >> On 11/11/2011 5:31 PM, Srinivas Ramakrishna wrote: >> >> >> I am posting this to hotspot-gc-use, but the idea is that it also post to >> -dev (but given how >> the lists are arranged, I am posting directly to the one and not the >> other to avoid double copies >> to those who are in the intersection of the two kists, while covering >> those in the union of the two). >> >> I've noticed recently in my use of the the class histogram feature, that >> in typical cases I am interested >> in the top few types of objects and not in the long thin tail. I am not >> sure how typical my use or >> experience is, but it would appear to me (based on my limited experience >> of late) that if we limited >> the histogram output to the top "N" (for say N = 40 or so) classes by >> default, it would likely satisfy >> 80-90% of use cases. 
For the remaining 10% of use cases, one would >> provide a complete dump, >> or a dump with more entries than available by default. >> >> I wanted to run this suggestion by everyone and see whether this would >> have some traction >> wrt such a request. >> >> I am guessing that this may be especially useful when dealing with very >> large applications that >> may have many different types of objects in the heap and might present a >> very long thin (and in >> many cases uninteresting) tail. (There may be other ways of restricting >> the output, for example >> by cutting off output below a certain population or volume threshold, but >> simply displaying the >> top N most voluminous or populous classes would seem to be the >> simplest....) >> >> Comments? >> -- ramki >> >> >> >> >> _______________________________________________ >> hotspot-gc-use mailing listhotspot-gc-use at openjdk.java.nethttp://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20111114/a6a6d5e5/attachment.html From rhelbing at icubic.de Wed Nov 16 07:02:35 2011 From: rhelbing at icubic.de (Ralf Helbing) Date: Wed, 16 Nov 2011 16:02:35 +0100 Subject: GC Parameters for low-latency Message-ID: <4EC3D08B.4060309@icubic.de> dear mailing list, we try to achieve low latencies despite using a huge heap (10G) and many logical cores (64). VM is 1.7u1. Ideally, we would let GC ergonomics decide what is best, giving only a low pause time goal (50ms). -Xss2m -Xmx10000M -XX:PermSize=256m -XX:+UseAdaptiveGCBoundary -XX:+UseAdaptiveSizePolicy -XX:+UseConcMarkSweepGC -XX:MaxGCPauseMillis=100 -XX:ParallelGCThreads=12 -XX:+BindGCTaskThreadsToCPUs -XX:+UseGCTaskAffinity -XX:+UseCompressedOops -XX:+DoEscapeAnalysis Whenever we use adaptive sizes, the VM will crash in GenCollect*, as soon as some serious allocations start. 
I already filed a bug for this (7112413). Assuming a small newsize helps maintaining a low pause time goal, I can set the newsize, too. Say I set it to 100MB, it will increase later anyway, again yielding frequent pause times in over 1s by the time the newsize is around 1G. What am I doing wrong here? From jon.masamitsu at oracle.com Wed Nov 16 07:22:07 2011 From: jon.masamitsu at oracle.com (Jon Masamitsu) Date: Wed, 16 Nov 2011 07:22:07 -0800 Subject: GC Parameters for low-latency In-Reply-To: <4EC3D08B.4060309@icubic.de> References: <4EC3D08B.4060309@icubic.de> Message-ID: <4EC3D51F.6020509@oracle.com> Do not use UseAdaptiveSizePolicy with CMS. The implementation for CMS is incomplete. Never use UseAdaptiveGCBoundary. There are known problems with that option. UseAdaptiveSizePolicy should only be used with UseParallelGC and UseParallelOldGC. On 11/16/2011 7:02 AM, Ralf Helbing wrote: > dear mailing list, > > we try to achieve low latencies despite using a huge heap (10G) and many > logical cores (64). > VM is 1.7u1. Ideally, we would let GC ergonomics decide what is best, > giving only a low pause time goal (50ms). > > -Xss2m > -Xmx10000M > -XX:PermSize=256m > -XX:+UseAdaptiveGCBoundary > -XX:+UseAdaptiveSizePolicy > -XX:+UseConcMarkSweepGC > -XX:MaxGCPauseMillis=100 > -XX:ParallelGCThreads=12 > > -XX:+BindGCTaskThreadsToCPUs > -XX:+UseGCTaskAffinity > > -XX:+UseCompressedOops > -XX:+DoEscapeAnalysis > > Whenever we use adaptive sizes, the VM will crash in GenCollect*, as > soon as some serious allocations start. I already filed a bug for this > (7112413). > > Assuming a small newsize helps maintaining a low pause time goal, I can > set the newsize, too. Say I set it to 100MB, it will increase later > anyway, again yielding frequent pause times in over 1s by the time the > newsize is around 1G. > > What am I doing wrong here? 
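Condensing Jon's advice into a command line: drop both adaptive flags and size the young generation explicitly. This is a sketch only; the `-Xmn` value and the application jar are illustrative assumptions, not tuned recommendations:

```shell
# Adaptive sizing removed per Jon's advice (-XX:+UseAdaptiveSizePolicy and
# -XX:+UseAdaptiveGCBoundary are not supported with CMS); the young gen is
# fixed with -Xmn instead. 512m is a placeholder to be found by benchmarking,
# and app.jar is a hypothetical application.
java -Xss2m -Xmx10000M -Xmn512m \
     -XX:PermSize=256m \
     -XX:+UseConcMarkSweepGC \
     -XX:ParallelGCThreads=12 \
     -XX:+UseCompressedOops \
     -jar app.jar
```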
> _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use From michal at frajt.eu Wed Nov 16 15:45:49 2011 From: michal at frajt.eu (Michal Frajt) Date: Thu, 17 Nov 2011 00:45:49 +0100 Subject: Survivor space class historgram print (know your bad garbage) Message-ID: <01e201cca4b9$de88e400$9b9aac00$@frajt.eu> Hi, Is there a way to print a survivor space class histogram on every minor collection run? Tony once provided us a special jdk build containing this feature but it never got integrated as a print flag into the main hotspot version. It was very useful for understanding and identifying objects promoted to the old gen. Additionally, we were looking to get a class histogram print for the eden space, but there was no easy way to implement it. The eden space class histogram would help identify the garbage that drives minor collection runs. It could also be used to check the impact of scalar replacement. Would it be possible to reimplement both histogram prints? Thanks, Michal -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20111117/c73ef862/attachment.html From ysr1729 at gmail.com Wed Nov 16 16:01:50 2011 From: ysr1729 at gmail.com (Srinivas Ramakrishna) Date: Wed, 16 Nov 2011 16:01:50 -0800 Subject: Survivor space class historgram print (know your bad garbage) In-Reply-To: <01e201cca4b9$de88e400$9b9aac00$@frajt.eu> References: <01e201cca4b9$de88e400$9b9aac00$@frajt.eu> Message-ID: AFAIR, it was not integrated at that time because of the performance impact even when the feature was turned off. I believe it would be possible to refactor the code, at some cost, to get this to work without that performance impact, but that didn't get done. It might be time to revisit that code and do the requisite refactoring. Tony et al? 
-- ramki On Wed, Nov 16, 2011 at 3:45 PM, Michal Frajt wrote: > Hi,**** > > ** ** > > Is there a way to get printed a survivor space class histogram on every > minor collection run? Tony once provided us a special jdk build containing > this feature but it got never integrated as a print flag into the main > hotspot version. It was very useful for understanding and identifying > promoted objects to the old gen. Additionally we were looking to get a > class histogram print for the eden space but there was no easy way to > implement it. The eden space class histogram would help to identify garbage > invoking minor collection runs. It could be as well used to check the > impact of the scalar replacements.**** > > ** ** > > Would it be possible to reimplement both histogram prints?**** > > ** ** > > Thanks,**** > > Michal**** > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20111116/77872637/attachment.html From tony.printezis at oracle.com Thu Nov 17 05:10:56 2011 From: tony.printezis at oracle.com (Tony Printezis) Date: Thu, 17 Nov 2011 08:10:56 -0500 Subject: Survivor space class historgram print (know your bad garbage) In-Reply-To: References: <01e201cca4b9$de88e400$9b9aac00$@frajt.eu> Message-ID: <4EC507E0.1080905@oracle.com> No current plans to integrate this. But this is something we could consider as part of our ongoing effort to support Mission Control. Tony On 11/16/2011 7:01 PM, Srinivas Ramakrishna wrote: > AFAR, it was not integrated at that time because of the performance > impact even when the feature was turned off. > I believe it would be possible to refactor the code, at some cost, to > get this to work without that performance > impact, but that didn't get done. 
It might be time to revisit that > code and do the requisite refactoring. Tony et al? > > -- ramki > > > On Wed, Nov 16, 2011 at 3:45 PM, Michal Frajt > wrote: > > Hi, > > Is there a way to get printed a survivor space class histogram on > every minor collection run? Tony once provided us a special jdk > build containing this feature but it got never integrated as a > print flag into the main hotspot version. It was very useful for > understanding and identifying promoted objects to the old gen. > Additionally we were looking to get a class histogram print for > the eden space but there was no easy way to implement it. The eden > space class histogram would help to identify garbage invoking > minor collection runs. It could be as well used to check the > impact of the scalar replacements. > > Would it be possible to reimplement both histogram prints? > > Thanks, > > Michal > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20111117/d8942979/attachment.html From ysr1729 at gmail.com Thu Nov 17 08:53:32 2011 From: ysr1729 at gmail.com (Srinivas Ramakrishna) Date: Thu, 17 Nov 2011 08:53:32 -0800 Subject: Survivor space class historgram print (know your bad garbage) In-Reply-To: <4EC507E0.1080905@oracle.com> References: <01e201cca4b9$de88e400$9b9aac00$@frajt.eu> <4EC507E0.1080905@oracle.com> Message-ID: Yes, that seems like a good place to expose such functionality. The question of refactoring to allow the stats gathering to happen at low cost when disabled of course still remains, I guess. thanks! 
-- ramki On Thu, Nov 17, 2011 at 5:10 AM, Tony Printezis wrote: > No current plans to integrate this. But this is something we could > consider as part of our ongoing effort to support Mission Control. > > Tony > > > On 11/16/2011 7:01 PM, Srinivas Ramakrishna wrote: > > AFAR, it was not integrated at that time because of the performance impact > even when the feature was turned off. > I believe it would be possible to refactor the code, at some cost, to get > this to work without that performance > impact, but that didn't get done. It might be time to revisit that code > and do the requisite refactoring. Tony et al? > > -- ramki > > > On Wed, Nov 16, 2011 at 3:45 PM, Michal Frajt wrote: > >> Hi, >> >> >> >> Is there a way to get printed a survivor space class histogram on every >> minor collection run? Tony once provided us a special jdk build containing >> this feature but it got never integrated as a print flag into the main >> hotspot version. It was very useful for understanding and identifying >> promoted objects to the old gen. Additionally we were looking to get a >> class histogram print for the eden space but there was no easy way to >> implement it. The eden space class histogram would help to identify garbage >> invoking minor collection runs. It could be as well used to check the >> impact of the scalar replacements. >> >> >> >> Would it be possible to reimplement both histogram prints? >> >> >> >> Thanks, >> >> Michal >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> > > > _______________________________________________ > hotspot-gc-use mailing listhotspot-gc-use at openjdk.java.nethttp://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20111117/b5b8b509/attachment.html

From knoguchi at yahoo-inc.com Tue Nov 22 13:06:49 2011
From: knoguchi at yahoo-inc.com (Koji Noguchi)
Date: Tue, 22 Nov 2011 13:06:49 -0800
Subject: Is CMS cycle can collect finalize objects
In-Reply-To: <4DB5A34A.5040108@oracle.com>
Message-ID:

This is from an old thread in 2011 April, but we're still seeing the same
problem with (nio) Socket instances not getting collected by CMS.

Opened http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7113118

Thanks,
Koji

(From
http://mail.openjdk.java.net/pipermail/hotspot-gc-use/2011-April/subject.html)
On 4/25/11 8:37 AM, "Y. Srinivas Ramakrishna" wrote:
> On 4/25/2011 9:10 AM, Bharath Mundlapudi wrote:
> > Hi Ramki,
> >
> > Thanks for the detailed explanation. I was trying to run some tests for
> > your questions. Here are the answers to some of your questions.
> >
> >>> What are the symptoms?
> > java.net.SocksSocketImpl objects are not getting cleaned up after a CMS
> > cycle. I see the direct correlation to java.lang.ref.Finalizer objects.
> > Over time, this fills up the old generation and CMS goes into a loop
> > occupying one full core. But when we trigger Full GC, these objects are
> > garbage collected.
>
> OK, thanks.
>
> > You mentioned that a CMS cycle does clean up these objects provided we
> > enable class unloading. Are you suggesting -XX:+ClassUnloading or
> > -XX:+CMSClassUnloadingEnabled? I have tried the latter and didn't
> > succeed. Our perm gen is relatively constant; by enabling this, are we
> > introducing performance overhead? We have room for CPU cycles and perm
> > gen is relatively small, so this may be fine. Just that we want to see
> > these objects GC'ed in the CMS cycle.
> >
> > Do you have any suggestion w.r.t. which flags I should be using to
> > trigger this?
>
> For the issue you are seeing, the -XX:+CMSClassUnloadingEnabled flag will
> not make a difference in the accumulation of the socket objects
> because there is no "projection" as far as I can tell of these
> into the perm gen, especially since, as you say, there is no class
> loading going on (since your perm gen size remains constant after
> start-up).
>
> However, keeping class unloading enabled via this flag should
> hopefully not have much of an impact on your pause times given that
> the perm gen is small. The typical effect you will see if class
> unloading is enabled is that the CMS remark pause times are a bit
> longer (if you enable PrintGCDetails you will see messages
> such as "scrub string table" and "scrub symbol table", "code cache"
> etc.). By comparing the CMS-remark pause details and times with
> and without enabling class unloading you will get a good idea
> of its impact. In some cases, even though you pay a small price
> in terms of increased CMS-remark pause times, you will make up
> for that in terms of faster scavenges etc., so it might well
> be worthwhile.
>
> In the very near future, we may end up turning that on
> by default for CMS because the savings from leaving it off
> by default are much smaller now and it can often lead to
> other issues if class unloading is turned off.
>
> So the bottom line is: it will not affect the accumulation of
> your socket objects, but it's a good idea to keep class
> unloading by CMS enabled anyway.
>
> >>> What does jmap -finalizerinfo on your process show?
> >>> What does -XX:+PrintClassHistogram show as accumulating in the heap?
> >>> (Are they one specific type of Finalizer objects or all varieties?)
> >
> > jmap -histo shows the above class keeps accumulating. In fact,
> > finalizerinfo doesn't show any objects on this process.
>
> OK, that shows that the objects are somehow not discovered by
> CMS as being eligible for finalization.
Although one can imagine
> a one-cycle delay (because of floating garbage) with CMS finding
> these objects to be unreachable and hence eligible for finalization,
> continuing accumulation of these objects over a period of time
> (and presumably many CMS cycles) seems strange and almost
> definitely a CMS bug, especially as you find that a full STW
> GC does indeed reclaim them.
>
> >>> Did the problem start in 6u21? Or are those the only versions
> >>> you tested and found that there was an issue?
> > We have seen this problem in 6u21. We were on 6u12 earlier and didn't
> > run into this problem. But we can't say this is particular to a build,
> > since lots of things have changed.
>
> Can you boil down this behavior into a test case that you are able
> to share with us? If so, please file a bug with the test case
> and send me the CR id and I'll take a look.
>
> Oh, and before you do that, can you please check the latest public
> release (6u24 or 6u25?) to see if the problem still reproduces?
>
> thanks, and sorry I could not be of more help without a bug
> report or a test case.
>
> -- ramki
>
> > Thanks in anticipation,
> > -Bharath

From jon.masamitsu at oracle.com Wed Nov 23 05:31:29 2011
From: jon.masamitsu at oracle.com (Jon Masamitsu)
Date: Wed, 23 Nov 2011 05:31:29 -0800
Subject: Is CMS cycle can collect finalize objects
In-Reply-To:
References:
Message-ID: <4ECCF5B1.7040408@oracle.com>

Koji,

There is no engineer assigned to this CR and no progress has been
made on it as far as I can tell. I'd suggest you pursue this through
your Oracle support contacts.

Jon

On 11/22/2011 1:06 PM, Koji Noguchi wrote:
> This is from an old thread in 2011 April but we're still seeing the same
> problem with (nio) Socket instances not getting collected by CMS.
> > Opened http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7113118
> >
> > Thanks,
> > Koji
> >
> > [quoted April 2011 thread trimmed; see Koji's post above]

_______________________________________________
hotspot-gc-use mailing list
hotspot-gc-use at openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use

From rednaxelafx at gmail.com Wed Nov 23 05:43:00 2011
From: rednaxelafx at gmail.com (Krystal Mok)
Date: Wed, 23 Nov 2011 21:43:00 +0800
Subject: Is CMS cycle can collect finalize objects
In-Reply-To: <4ECCF5B1.7040408@oracle.com>
References: <4ECCF5B1.7040408@oracle.com>
Message-ID:

Hi,

I submitted a patch recently to mitigate the specific CMS problem caused by
excessive SocksSocketImpl objects, by trying to avoid creating them in the
first place. [1] That doesn't solve the general case if there really is a
problem with CMS and finalization. Since we've hit the same problem here,
we might investigate further on CMS. I'll report back if we make progress
on it.

Regards,
Kris Mok
Software Engineer, Taobao (http://www.taobao.com)

[1]: http://mail.openjdk.java.net/pipermail/nio-dev/2011-November/001480.html

On Wed, Nov 23, 2011 at 9:31 PM, Jon Masamitsu wrote:
> Koji,
>
> There is no engineer assigned to this CR and no progress has been
> made on it as far as I can tell. I'd suggest you pursue this through
> your Oracle support contacts.
>
> Jon
>
> On 11/22/2011 1:06 PM, Koji Noguchi wrote:
> > [Koji's message and the quoted April 2011 thread trimmed]

_______________________________________________
hotspot-gc-use mailing list
hotspot-gc-use at openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20111123/f74f8ced/attachment-0001.html

From knoguchi at yahoo-inc.com Wed Nov 23 14:14:56 2011
From: knoguchi at yahoo-inc.com (Koji Noguchi)
Date: Wed, 23 Nov 2011 14:14:56 -0800
Subject: Is CMS cycle can collect finalize objects
In-Reply-To:
Message-ID:

Thanks Kris and Jon.

On 11/23/11 5:43 AM, "Krystal Mok" wrote:
> [1]: http://mail.openjdk.java.net/pipermail/nio-dev/2011-November/001480.html
>
Yes, I believe we're hitting the same issue and that nio change would solve
at least the problem we're facing.

On Wed, Nov 23, 2011 at 9:31 PM, Jon Masamitsu wrote:
> I'd suggest you pursue this through your Oracle support contacts.
>
Thanks. I'll try that.

Koji

[full copies of Kris's and Jon's messages and the quoted April 2011 thread trimmed]

_______________________________________________
hotspot-gc-use mailing list
hotspot-gc-use at openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20111123/a72c14a8/attachment.html

From ysr1729 at gmail.com Wed Nov 23 15:11:38 2011
From: ysr1729 at gmail.com (Srinivas Ramakrishna)
Date: Wed, 23 Nov 2011 15:11:38 -0800
Subject: Is CMS cycle can collect finalize objects
In-Reply-To:
References:
Message-ID:

Hi Koji --

Thanks for the test case, that should definitely help with the
identification of the problem. I'll see if I can find some spare time to
pursue it one of these days (but can't promise), so please do open that
Oracle support ticket to get the requisite resource allocated for the
official investigation.

Thanks again for boiling it down to a simple test case, and I'll update if
I identify the root cause...

-- ramki

On Tue, Nov 22, 2011 at 1:06 PM, Koji Noguchi wrote:
> This is from an old thread in 2011 April but we're still seeing the same
> problem with (nio) Socket instances not getting collected by CMS.
>
> [rest of the quoted thread trimmed]

_______________________________________________
hotspot-gc-use mailing list
hotspot-gc-use at openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20111123/150dd794/attachment-0001.html

From rednaxelafx at gmail.com Thu Nov 24 01:52:13 2011
From: rednaxelafx at gmail.com (Krystal Mok)
Date: Thu, 24 Nov 2011 17:52:13 +0800
Subject: Is CMS cycle can collect finalize objects
In-Reply-To:
References:
Message-ID:

Hi Koji and Ramki,

I had a look at the repro test case in Bug 7113118. I don't think the test
case is showing the same problem as the original one caused by
SocksSocketImpl objects. The way this test case is behaving is exactly what
the VM arguments told it to do.

I ran the test case on a 64-bit Linux HotSpot Server VM, on JDK 6 update
29. My .hotspotrc is at [1]. SurvivorRatio doesn't need to be set
explicitly, because when CMS is in use and MaxTenuringThreshold is 0 (or
AlwaysTenure is true), the SurvivorRatio will automatically be set to 1024
by ergonomics.
UsePSAdaptiveSurvivorSizePolicy has no effect when CMS is in use, so it's
omitted from my configuration, too. By using -XX:+PrintReferenceGC, the GC
log will show when and how many finalizable objects are discovered. The VM
arguments given force all surviving objects from minor collections to be
promoted into the old generation, and none of the minor collections had a
chance to discover any ready-to-be-collected FinalReferences, so minor GC
logs aren't of interest in this case. All of the minor GC log lines show
"[FinalReference, 0 refs, xxx secs]".

A part of the GC log can be found at [2]. This log shows two CMS collection
cycles, with dozens of minor collections in between.

* Before the first of these two CMS collections, the Java heap used is
971914K, and then the CMS occupancy threshold is crossed so a CMS
collection cycle starts;
* During the re-mark phase of the first CMS collection, 46400
FinalReferences were discovered;
* After the first CMS collection, the Java heap used is still high, at
913771K, because the finalizable objects need another old generation
collection to be collected (either CMS or full GC is fine);
* During the re-mark phase of the second CMS collection, 3000
FinalReferences were discovered; these are from objects promoted by the
minor collections in between;
* After the second CMS collection, the Java heap used goes down to 61747K,
as the finalizable objects discovered during the first CMS collection are
indeed finalized and then collected during the second CMS collection.

This behavior looks normal to me -- it's what the VM arguments were telling
the VM to do. The reason that the Java heap usage was swinging up and down
is that the actual live data set was very low, but
CMSInitiatingOccupancyFraction was set too high, so concurrent collections
are started too late. If the initiating threshold were set to a smaller
value, say 20, then the test case would behave quite reasonably.
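The tuning point above can be sketched as a command line. This is only an illustration, not the configuration from the gist in [1]: the heap size and the FinalizerTest class name are placeholders, while the -XX options themselves are standard HotSpot flags named in this thread.

```shell
# Sketch of a CMS configuration that starts concurrent cycles earlier, as
# suggested above. Heap size and main class are hypothetical placeholders.
JVM_FLAGS="-XX:+UseConcMarkSweepGC \
-XX:CMSInitiatingOccupancyFraction=20 \
-XX:+UseCMSInitiatingOccupancyOnly \
-XX:+PrintGCDetails -XX:+PrintReferenceGC"

# Print the resulting command rather than running it, since the test class
# and JDK installation depend on the local setup:
echo java -Xmx1g $JVM_FLAGS FinalizerTest
```

Without -XX:+UseCMSInitiatingOccupancyOnly, HotSpot may still use its own heuristics to decide when to start a cycle, so the two flags are usually paired when a fixed threshold is wanted.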
We'd need another test case to study, because this one doesn't really repro the problem. After we applied the patch to SocketAdaptor [3], we don't see this kind of CMS/finalization problem in production anymore. Should we hit one of these again, I'll try to get more info from our production site and see if I can trace down the real problem. - Kris [1]: https://gist.github.com/1390876#file_.hotspotrc [2]: https://gist.github.com/1390876#file_gc.partial.log [3]: http://mail.openjdk.java.net/pipermail/nio-dev/2011-November/001480.html On Thu, Nov 24, 2011 at 7:11 AM, Srinivas Ramakrishna wrote: > Hi Koji -- > > Thanks for the test case, that should definitely help with the > dentification of the problem. I'll see if > i can find some spare time to pursue it one of these days (but can't > promise), so please > do open that Oracle support ticket to get the requisite resource allocated > for the official > investigation. > > Thanks again for boiling it down to a simple test case, and i'll update if > i identify the > root cause... > > -- ramki > > > On Tue, Nov 22, 2011 at 1:06 PM, Koji Noguchi wrote: > >> This is from an old thread in 2011 April but we're still seeing the same >> problem with (nio) Socket instances not getting collecting by CMS. >> >> Opened http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7113118 >> >> Thanks, >> Koji >> >> >> (From >> >> http://mail.openjdk.java.net/pipermail/hotspot-gc-use/2011-April/subject.htm >> l) >> On 4/25/11 8:37 AM, "Y. Srinivas Ramakrishna" > >> >> wrote: >> > On 4/25/2011 9:10 AM, Bharath Mundlapudi wrote: >> > > Hi Ramki, >> > > >> > > Thanks for the detailed explanation. I was trying to >> > > run some tests for your questions. Here are the answers to some of >> your >> > > questions. >> > > >> > >>> What are the symptoms? >> > > java.net.SocksSocketImpl objects are not getting cleaned up after a >> CMS >> > cycle. I see the direct >> > > correlation to java.lang.ref.Finalizer objects. 
Overtime, this fills >> up >> > > the old generation and CMS going in loop occupying complete one core. >> > > But when we trigger Full GC, these objects are garbage collected. >> > >> > OK, thanks. >> > >> > > >> > > You >> > > mentioned that CMS cycle does cleanup these objects provided we >> enable >> > > class unloading. Are you suggesting -XX:+ClassUnloading or >> > > -XX:+CMSClassUnloadingEnabled? I have tried with later and >> > > >> > > didn't >> > > succeed. Our pern gen is relatively constant, by enabling this, >> are we >> > > introducing performance overhead? We have room for CPU cycles and >> perm >> > > gen is relatively small, so this may be fine. Just that we want to see >> > > these objects should GC'ed in CMS cycle. >> > > >> > > >> > > Do you have any suggestion w.r.t. to which flags should i be using to >> > trigger this? >> > >> > For the issue you are seeing the -XX:+CMSClassUnloadingFlag will >> > not make a difference in the accumulation of the socket objects >> > because there is no "projection" as far as i can tell of these >> > into the perm gen, esepcially since as you say there is no class >> > loading going on (since your perm gen size remains constant after >> > start-up). >> > >> >> > However, keeping class unloading enabled via this flag should >> > hopefully not have much of an impact on your pause times given that >> > the perm gen is small. The typical effect you will see if class >> > unloading is enabled is that the CMS remark pause times are a bit >> > longer (if you enable PrintGCDetails you will see messages >> > such as "scrub string table" and "scrub symbol table", "code cache" >> > etc. BY comparing the CMS-remark pause details and times with >> > and without enabling class unloading you will get a good idea >> > of its impact. 
In some cases, eben though you pay a small price >> > in terms of increased CMS-remark pause times, you will make up >> > for that in terms of faster scavenges etc., so it might well >> > be worthwhile. >> > >> > In the very near future, we may end up turning that on >> > by default for CMS because the savings from leaving it off >> > by default are much smaller now and it can often lead to >> > other issues if class unloading is turned off. >> > >> > So bottom line is: it will not affect the accumulation of >> > your socket objects, but it's a good idea to keep class >> > unloading by CMS enabled anyway. >> > >> > > >> > > >> > >>> What does jmap -finalizerinfo on your process show? >> > >>> What does -XX:+PrintClassHistogram show as accumulating in the >> > heap? >> > >>> (Are they one specific type of Finalizer objects or all >> > varieties?) >> > > >> > > Jmap -histo shows the above class is keep accumulating. Infact, >> > > finalizerinfo doesn't show any objects on this process. >> > >> > OK, that shows that the objects are somehow not discovered by >> > CMS as being eligible for finalization. Although one can imagine >> > a one cycle delay (because of floating garbage) with CMS finding >> > these objects to be unreachable and hence eligible for finalization, >> > continuing accumulation of these objects over a period of time >> > (and presumably many CMS cycles) seems strange and almost >> > definitely a CMS bug especially as you find that a full STW >> > gc does indeed reclaim them. >> > >> > > >> > > >> > > >> > >>> Did the problem start in 6u21? Or are those the only versions >> > >>> you tested and found that there was an issue? >> > > We >> > > have seen this problem in 6u21. We were on 6u12 earlier and didn't >> run >> > > into this problem. But can't say this is a build particular, since >> lots >> > > of things have changed. >> > >> > Can you boil down this behavior into a test case that you are able >> > to share with us? 
>> > If so, please file a bug with the test case >> > and send me the CR id and I'll take a look. >> > >> > Oh, and before you do that, can you please check the latest public >> > release (6u24 or 6u25?) to see if the problem still reproduces? >> > >> > thanks, and sorry I could not be of more help without a bug >> > report or a test case. >> > >> > -- ramki >> > >> > > >> > > Thanks in anticipation, >> > > -Bharath >> > > >> >> >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20111124/5211e23d/attachment.html From ysr1729 at gmail.com Thu Nov 24 13:45:12 2011 From: ysr1729 at gmail.com (Srinivas Ramakrishna) Date: Thu, 24 Nov 2011 13:45:12 -0800 Subject: Is CMS cycle can collect finalize objects In-Reply-To: References: Message-ID: Hi Kris, thanks for running the test case and figuring that out, and saving us further investigation of the submitted test case from Koji. Hopefully you or Koji will be able to find a simple test case that illustrates the real issue. thanks! -- ramki On Thu, Nov 24, 2011 at 1:52 AM, Krystal Mok wrote: > Hi Koji and Ramki, > > I had a look at the repro test case in Bug 7113118. I don't think the test > case is showing the same problem as the original one caused by > SocksSocketImpl objects. The way this test case is behaving is exactly what > the VM arguments told it to do. > > I ran the test case on a 64-bit Linux HotSpot Server VM, on JDK 6 update > 29. > > My .hotspotrc is at [1]. 
> > SurvivorRatio doesn't need to be set explicitly, because when CMS is in > use and MaxTenuringThreshold is 0 (or AlwaysTenure is true), the > SurvivorRatio will automatically be set to 1024 by ergonomics. > UsePSAdaptiveSurvivorSizePolicy has no effect when CMS is in use, so it's > omitted from my configuration, too. > > By using -XX:+PrintReferenceGC, the gc log will show when and how many > finalizable object are discovered. > > The VM arguments given force all surviving object from minor collections > to be promoted into the old generation, and none of the minor collections > had a chance to discovery any ready-to-be-collected FinalReferences, so > minor GC logs aren't of interest in this case. All of the minor GC log > lines show "[FinalReference, 0 refs, xxx secs]". > > A part of the GC log can be found at [2]. This log shows two CMS > collections cycles, in between dozens of minor collections. > > * Before the first of these two CMS collections, the Java heap used > is 971914K, and then the CMS occupancy threshold is crossed so a CMS > collection cycle starts; > * During the re-mark phase of the first CMS collection, 46400 > FinalReferences were discovered; > * After the first CMS collection, the Java heap used is still high, > at 913771K, because the finalizable objects need another old generation > collection to be collected (either CMS or full GC is fine); > * During the re-mark phase of the second CMS collection, 3000 > FinalReferences were discovered, these are from promoted objects from the > minor collections in between; > * After the second CMS collection, the Java heap used goes down to 61747K, > as the finalizable objects discovered from the first CMS collection are > indeed finalized and then collected during the second CMS collection. > > This behavior looks normal to me -- it's what the VM arguments were > telling the VM to do. 
> The reason that the Java heap used size was swing up and down is because > the actual live data set was very low, but > the CMSInitiatingOccupancyFraction was set too high so concurrent > collections are started too late. If the initiating threshold were set to a > smaller value, say 20, then the test case would behave quite reasonably. > > We'd need another test case to study, because this one doesn't really > repro the problem. > > After we applied the patch to SocketAdaptor [3], we don't see this kind of > CMS/finalization problem in production anymore. Should we hit one of these > again, I'll try to get more info from our production site and see if I can > trace down the real problem. > > - Kris > > [1]: https://gist.github.com/1390876#file_.hotspotrc > [2]: https://gist.github.com/1390876#file_gc.partial.log > [3]: > http://mail.openjdk.java.net/pipermail/nio-dev/2011-November/001480.html > > > On Thu, Nov 24, 2011 at 7:11 AM, Srinivas Ramakrishna wrote: > >> Hi Koji -- >> >> Thanks for the test case, that should definitely help with the >> dentification of the problem. I'll see if >> i can find some spare time to pursue it one of these days (but can't >> promise), so please >> do open that Oracle support ticket to get the requisite resource >> allocated for the official >> investigation. >> >> Thanks again for boiling it down to a simple test case, and i'll update >> if i identify the >> root cause... >> >> -- ramki >> >> >> On Tue, Nov 22, 2011 at 1:06 PM, Koji Noguchi wrote: >> >>> This is from an old thread in 2011 April but we're still seeing the same >>> problem with (nio) Socket instances not getting collecting by CMS. >>> >>> Opened http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7113118 >>> >>> Thanks, >>> Koji >>> >>> >>> (From >>> >>> http://mail.openjdk.java.net/pipermail/hotspot-gc-use/2011-April/subject.htm >>> l) >>> On 4/25/11 8:37 AM, "Y. 
Srinivas Ramakrishna" < >>> y.s.ramakrishna at oracle.com>> >>> wrote: >>> > On 4/25/2011 9:10 AM, Bharath Mundlapudi wrote: >>> > > Hi Ramki, >>> > > >>> > > Thanks for the detailed explanation. I was trying to >>> > > run some tests for your questions. Here are the answers to some of >>> your >>> > > questions. >>> > > >>> > >>> What are the symptoms? >>> > > java.net.SocksSocketImpl objects are not getting cleaned up after a >>> CMS >>> > cycle. I see the direct >>> > > correlation to java.lang.ref.Finalizer objects. Overtime, this fills >>> up >>> > > the old generation and CMS going in loop occupying complete one core. >>> > > But when we trigger Full GC, these objects are garbage collected. >>> > >>> > OK, thanks. >>> > >>> > > >>> > > You >>> > > mentioned that CMS cycle does cleanup these objects provided we >>> enable >>> > > class unloading. Are you suggesting -XX:+ClassUnloading or >>> > > -XX:+CMSClassUnloadingEnabled? I have tried with later and >>> > > >>> > > didn't >>> > > succeed. Our pern gen is relatively constant, by enabling this, >>> are we >>> > > introducing performance overhead? We have room for CPU cycles and >>> perm >>> > > gen is relatively small, so this may be fine. Just that we want to >>> see >>> > > these objects should GC'ed in CMS cycle. >>> > > >>> > > >>> > > Do you have any suggestion w.r.t. to which flags should i be using to >>> > trigger this? >>> > >>> > For the issue you are seeing the -XX:+CMSClassUnloadingFlag will >>> > not make a difference in the accumulation of the socket objects >>> > because there is no "projection" as far as i can tell of these >>> > into the perm gen, esepcially since as you say there is no class >>> > loading going on (since your perm gen size remains constant after >>> > start-up). >>> > >>> >>> > However, keeping class unloading enabled via this flag should >>> > hopefully not have much of an impact on your pause times given that >>> > the perm gen is small. 
The typical effect you will see if class >>> > unloading is enabled is that the CMS remark pause times are a bit >>> > longer (if you enable PrintGCDetails you will see messages >>> > such as "scrub string table" and "scrub symbol table", "code cache" >>> > etc. BY comparing the CMS-remark pause details and times with >>> > and without enabling class unloading you will get a good idea >>> > of its impact. In some cases, eben though you pay a small price >>> > in terms of increased CMS-remark pause times, you will make up >>> > for that in terms of faster scavenges etc., so it might well >>> > be worthwhile. >>> > >>> > In the very near future, we may end up turning that on >>> > by default for CMS because the savings from leaving it off >>> > by default are much smaller now and it can often lead to >>> > other issues if class unloading is turned off. >>> > >>> > So bottom line is: it will not affect the accumulation of >>> > your socket objects, but it's a good idea to keep class >>> > unloading by CMS enabled anyway. >>> > >>> > > >>> > > >>> > >>> What does jmap -finalizerinfo on your process show? >>> > >>> What does -XX:+PrintClassHistogram show as accumulating in the >>> > heap? >>> > >>> (Are they one specific type of Finalizer objects or all >>> > varieties?) >>> > > >>> > > Jmap -histo shows the above class is keep accumulating. Infact, >>> > > finalizerinfo doesn't show any objects on this process. >>> > >>> > OK, that shows that the objects are somehow not discovered by >>> > CMS as being eligible for finalization. Although one can imagine >>> > a one cycle delay (because of floating garbage) with CMS finding >>> > these objects to be unreachable and hence eligible for finalization, >>> > continuing accumulation of these objects over a period of time >>> > (and presumably many CMS cycles) seems strange and almost >>> > definitely a CMS bug especially as you find that a full STW >>> > gc does indeed reclaim them. 
>>> > >>> > > >>> > > >>> > > >>> > >>> Did the problem start in 6u21? Or are those the only versions >>> > >>> you tested and found that there was an issue? >>> > > We >>> > > have seen this problem in 6u21. We were on 6u12 earlier and didn't >>> run >>> > > into this problem. But can't say this is a build particular, since >>> lots >>> > > of things have changed. >>> > >>> > Can you boil down this behavior into a test case that you are able >>> > to share with us? >>> > If so, please file a bug with the test case >>> > and send me the CR id and I'll take a look. >>> > >>> > Oh, and before you do that, can you please check the latest public >>> > release (6u24 or 6u25?) to see if the problem still reproduces? >>> > >>> > thanks, and sorry I could not be of more help without a bug >>> > report or a test case. >>> > >>> > -- ramki >>> > >>> > > >>> > > Thanks in anticipation, >>> > > -Bharath >>> > > >>> >>> >>> >>> _______________________________________________ >>> hotspot-gc-use mailing list >>> hotspot-gc-use at openjdk.java.net >>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>> >> >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20111124/d68f9df8/attachment.html From fancyerii at gmail.com Mon Nov 28 01:25:04 2011 From: fancyerii at gmail.com (Li Li) Date: Mon, 28 Nov 2011 17:25:04 +0800 Subject: what about Azul's Zing JVM? Message-ID: hi everybody, I read an article today about Azul's Zing JVM. It is said that this jvm is pauseless. In my application, our machine is about 48GB and about 25GB memory is given to jvm(by -Xmx). But it will occasionally pause 1-2 seconds. So when I saw this, I want to know whether it's so good as they say. 
And I googled and found a related question in stackoverflow: http://stackoverflow.com/questions/4491260/explanation-of-azuls-pauseless-garbage-collector after reading, I am still confusing. Anyone would give more detail explanations about it? thanks -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20111128/328ae759/attachment.html From fweimer at bfk.de Mon Nov 28 01:33:11 2011 From: fweimer at bfk.de (Florian Weimer) Date: Mon, 28 Nov 2011 09:33:11 +0000 Subject: what about Azul's Zing JVM? In-Reply-To: (Li Li's message of "Mon, 28 Nov 2011 17:25:04 +0800") References: Message-ID: <82mxbgvbfs.fsf@mid.bfk.de> * Li Li: > after reading, I am still confusing. Anyone would give more detail > explanations about it? thanks I think you should ask on one of Azul's mailing lists. As far as I understand it, the MRI VM has a similar garbage collector, and source code has been published, so you could have a look at it and ask on the MRI mailing list (but I don't know if it is still active). -- Florian Weimer BFK edv-consulting GmbH http://www.bfk.de/ Kriegsstra?e 100 tel: +49-721-96201-1 D-76133 Karlsruhe fax: +49-721-96201-99 From vitalyd at gmail.com Mon Nov 28 06:10:47 2011 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Mon, 28 Nov 2011 09:10:47 -0500 Subject: what about Azul's Zing JVM? In-Reply-To: References: Message-ID: As a gross oversimplification their GC is concurrent to mutator (java) threads but is mostly pauseless (they still pause at times but only very briefly) because they use read barriers. This means that if a mutator thread reads memory that's been relocated, they trap this condition at read time, fix up the pointer (mutator does this itself), and continue on. Last I heard this approach required azul's os support for bulk in/mapping of pagetable entries, and required a Linux patch for x86 to do the same (but it wasn't accepted into mainline kernel). 
What's interesting is whether hotspot has any plans to do something similar? On Nov 28, 2011 4:27 AM, "Li Li" wrote: > hi everybody, > I read an article today about Azul's Zing JVM. It is said that this > jvm is pauseless. > In my application, our machine is about 48GB and about 25GB memory is > given to jvm(by -Xmx). But it will occasionally pause 1-2 seconds. > So when I saw this, I want to know whether it's so good as they say. > And I googled and found a related question in stackoverflow: > http://stackoverflow.com/questions/4491260/explanation-of-azuls-pauseless-garbage-collector > after reading, I am still confusing. Anyone would give more detail > explanations about it? thanks > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20111128/e968e276/attachment.html From gaberger at cisco.com Mon Nov 28 06:54:35 2011 From: gaberger at cisco.com (Gary Berger) Date: Mon, 28 Nov 2011 09:54:35 -0500 Subject: what about Azul's Zing JVM? In-Reply-To: Message-ID: Yes, Zing is a concurrent mark-and-compact GC, compaction being the somewhat harder. Originally the C4 collector ran on custom Vega processors and than leveraged hypervisor based memory mapping features (EPT) to flip pages, now they have figured out how to do it on bare Linux with a kernel module.. Would be great to get these changes in the upstream kernel. Gil gave a great talk about Zing 5 at QCON http://bit.ly/vN5xQ0 .:|:.:|:. Gary Berger | Architect, Office of the CTO, DSSG | Cisco Systems| One Penn Plaza | New York, NY 10119 | Phone: 917.288.8691 From: Vitaly Davidovich Date: Mon, 28 Nov 2011 09:10:47 -0500 To: Li Li Cc: hotspot-gc-use Subject: Re: what about Azul's Zing JVM? 
As a gross oversimplification their GC is concurrent to mutator (java) threads but is mostly pauseless (they still pause at times but only very briefly) because they use read barriers. This means that if a mutator thread reads memory that's been relocated, they trap this condition at read time, fix up the pointer (mutator does this itself), and continue on. Last I heard this approach required azul's os support for bulk in/mapping of pagetable entries, and required a Linux patch for x86 to do the same (but it wasn't accepted into mainline kernel). What's interesting is whether hotspot has any plans to do something similar? On Nov 28, 2011 4:27 AM, "Li Li" wrote: > hi everybody, > I read an article today about Azul's Zing JVM. It is said that this jvm is > pauseless. > In my application, our machine is about 48GB and about 25GB memory is > given to jvm(by -Xmx). But it will occasionally pause 1-2 seconds. > So when I saw this, I want to know whether it's so good as they say. And I > googled and found a related question in stackoverflow: > http://stackoverflow.com/questions/4491260/explanation-of-azuls-pauseless-garb > age-collector > after reading, I am still confusing. Anyone would give more detail > explanations about it? thanks > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > _______________________________________________ hotspot-gc-use mailing list hotspot-gc-use at openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20111128/df17bcc1/attachment.html From kirk at kodewerk.com Wed Nov 23 23:28:11 2011 From: kirk at kodewerk.com (Charles K Pepperdine) Date: Thu, 24 Nov 2011 08:28:11 +0100 Subject: Is CMS cycle can collect finalize objects In-Reply-To: <4ECCF5B1.7040408@oracle.com> References: <4ECCF5B1.7040408@oracle.com> Message-ID: <55A131FE-6724-4C1A-B535-7CEA69751B26@kodewerk.com> Hi Jon, If I can solve the problem locally, what is the chance of getting it into a build? Regards, Kirk On Nov 23, 2011, at 2:31 PM, Jon Masamitsu wrote: > Koji, > > There is no engineer assigned to this CR and no progress has been > made on it as far as I can tell. I'd suggest you pursue this through > your Oracle support contacts. > > Jon > > > > On 11/22/2011 1:06 PM, Koji Noguchi wrote: >> This is from an old thread in 2011 April but we're still seeing the same >> problem with (nio) Socket instances not getting collecting by CMS. >> >> Opened http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7113118 >> >> Thanks, >> Koji >> >> >> (From >> http://mail.openjdk.java.net/pipermail/hotspot-gc-use/2011-April/subject.htm >> l) >> On 4/25/11 8:37 AM, "Y. Srinivas Ramakrishna"> >> wrote: >>> On 4/25/2011 9:10 AM, Bharath Mundlapudi wrote: >>>> Hi Ramki, >>>> >>>> Thanks for the detailed explanation. I was trying to >>>> run some tests for your questions. Here are the answers to some of your >>>> questions. >>>> >>>>>> What are the symptoms? >>>> java.net.SocksSocketImpl objects are not getting cleaned up after a CMS >>> cycle. I see the direct >>>> correlation to java.lang.ref.Finalizer objects. Overtime, this fills up >>>> the old generation and CMS going in loop occupying complete one core. >>>> But when we trigger Full GC, these objects are garbage collected. >>> OK, thanks. >>> >>>> You >>>> mentioned that CMS cycle does cleanup these objects provided we enable >>>> class unloading. 
Are you suggesting -XX:+ClassUnloading or >>>> -XX:+CMSClassUnloadingEnabled? I have tried with later and >>>> >>>> didn't >>>> succeed. Our pern gen is relatively constant, by enabling this, are we >>>> introducing performance overhead? We have room for CPU cycles and perm >>>> gen is relatively small, so this may be fine. Just that we want to see >>>> these objects should GC'ed in CMS cycle. >>>> >>>> >>>> Do you have any suggestion w.r.t. to which flags should i be using to >>> trigger this? >>> >>> For the issue you are seeing the -XX:+CMSClassUnloadingFlag will >>> not make a difference in the accumulation of the socket objects >>> because there is no "projection" as far as i can tell of these >>> into the perm gen, esepcially since as you say there is no class >>> loading going on (since your perm gen size remains constant after >>> start-up). >>> >>> However, keeping class unloading enabled via this flag should >>> hopefully not have much of an impact on your pause times given that >>> the perm gen is small. The typical effect you will see if class >>> unloading is enabled is that the CMS remark pause times are a bit >>> longer (if you enable PrintGCDetails you will see messages >>> such as "scrub string table" and "scrub symbol table", "code cache" >>> etc. BY comparing the CMS-remark pause details and times with >>> and without enabling class unloading you will get a good idea >>> of its impact. In some cases, eben though you pay a small price >>> in terms of increased CMS-remark pause times, you will make up >>> for that in terms of faster scavenges etc., so it might well >>> be worthwhile. >>> >>> In the very near future, we may end up turning that on >>> by default for CMS because the savings from leaving it off >>> by default are much smaller now and it can often lead to >>> other issues if class unloading is turned off. 
>>> >>> So bottom line is: it will not affect the accumulation of >>> your socket objects, but it's a good idea to keep class >>> unloading by CMS enabled anyway. >>> >>>> >>>>>> What does jmap -finalizerinfo on your process show? >>>>>> What does -XX:+PrintClassHistogram show as accumulating in the >>> heap? >>>>>> (Are they one specific type of Finalizer objects or all >>> varieties?) >>>> Jmap -histo shows the above class is keep accumulating. Infact, >>>> finalizerinfo doesn't show any objects on this process. >>> OK, that shows that the objects are somehow not discovered by >>> CMS as being eligible for finalization. Although one can imagine >>> a one cycle delay (because of floating garbage) with CMS finding >>> these objects to be unreachable and hence eligible for finalization, >>> continuing accumulation of these objects over a period of time >>> (and presumably many CMS cycles) seems strange and almost >>> definitely a CMS bug especially as you find that a full STW >>> gc does indeed reclaim them. >>> >>>> >>>> >>>>>> Did the problem start in 6u21? Or are those the only versions >>>>>> you tested and found that there was an issue? >>>> We >>>> have seen this problem in 6u21. We were on 6u12 earlier and didn't run >>>> into this problem. But can't say this is a build particular, since lots >>>> of things have changed. >>> Can you boil down this behavior into a test case that you are able >>> to share with us? >>> If so, please file a bug with the test case >>> and send me the CR id and I'll take a look. >>> >>> Oh, and before you do that, can you please check the latest public >>> release (6u24 or 6u25?) to see if the problem still reproduces? >>> >>> thanks, and sorry I could not be of more help without a bug >>> report or a test case. 
>>> >>> -- ramki >>> >>>> Thanks in anticipation, >>>> -Bharath >>>> >> >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use

From java at java4.info Mon Nov 28 11:18:17 2011
From: java at java4.info (Florian Binder)
Date: Mon, 28 Nov 2011 20:18:17 +0100
Subject: G1 discovers same garbage again?
Message-ID: <4ED3DE79.5050801@java4.info>

Hi everybody,

I have a java application with 20gb (large-table) memory and using the g1 garbage collector. The application calculates some ratios the whole time with 10 threads (high cpu load). This is done without producing any garbage. About two times a minute a request is sent which produces a little bit of garbage. Since we are working with realtime data we are interested in very short stop-the-world pauses. Therefore we used the CMS gc in the past, until we ran into problems with fragmentation. Therefore I am now trying the g1.

This seemed to work very well at first. The stw-pauses were, except for the cleanup pause, very short. This brings me to my first question: Is this normal, and are there any parameters to influence the cleanup process? I thought this phase should be short because by that point the counting has just finished, the role of the bitmaps is switched and the next candidate garbage regions are determined -- all things which should be very fast. So what is taking the time?

The second reason for my email is the crazy behaviour after a few hours: After the startup of the server it uses about 13.5 gb old-gen memory and generates eden garbage very slowly. Since the newly allocated memory is mostly garbage, the (young) garbage collections are very fast and g1 decides to grow the eden space.
This works 4 times, until eden space has more than about 3.5 gb memory. After this the gc is doing many more collections, and during the collections it discovers new garbage (probably the old one again). Eden memory usage jumps between 0 and 3.5gb even though I am sure the java-application is not producing more garbage than before. I assume that during a collection it runs into the old garbage and collects it again. Is this possible? Or is there an overflow since eden space uses more than 3.5 gb?

Thanks and regards,
Flo

Some useful information:
$ java -version
java version "1.6.0_29"
Java(TM) SE Runtime Environment (build 1.6.0_29-b11)
Java HotSpot(TM) 64-Bit Server VM (build 20.4-b02, mixed mode)

Startup Parameters:
-Xms20g -Xmx20g
-verbose:gc \
-XX:+UnlockExperimentalVMOptions \
-XX:+UseG1GC \
-XX:+PrintGCDetails \
-XX:+PrintGCDateStamps \
-XX:+UseLargePages \
-XX:+PrintFlagsFinal \
-XX:-TraceClassUnloading \

$ cat /proc/meminfo | grep Huge
HugePages_Total: 11264
HugePages_Free: 1015
HugePages_Rsvd: 32
Hugepagesize: 2048 kB

A few screen-shots of the jconsole memory-view:
http://java4.info/g1/1h.png
http://java4.info/g1/all.png
http://java4.info/g1/eden_1h.png
http://java4.info/g1/eden_all.png
http://java4.info/g1/oldgen_all.png

The sysout and syserr logfile with the gc logging and PrintFlagsFinal output:
http://java4.info/g1/out_err.log.gz

From tony.printezis at oracle.com Tue Nov 29 09:29:22 2011
From: tony.printezis at oracle.com (Tony Printezis)
Date: Tue, 29 Nov 2011 12:29:22 -0500
Subject: G1 discovers same garbage again?
In-Reply-To: <4ED3DE79.5050801@java4.info>
References: <4ED3DE79.5050801@java4.info>
Message-ID: <4ED51672.6030105@oracle.com>

Hi Florian,

See inline.

On 11/28/2011 2:18 PM, Florian Binder wrote:
> Hi everybody,
>
> I have a java application with 20gb (large-table) memory and using the
> g1 garbage collector.

Quick clarification: I saw that you use a 20G heap from the parameters you showed below. Do you know what's your live data size?
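As an aside, the /proc/meminfo figures quoted above can be sanity-checked with a little arithmetic, assuming the standard meaning of the HugePages fields (pages in use = HugePages_Total - HugePages_Free, each Hugepagesize kB); the class name is invented:

```java
// Sanity-checking the HugePages figures quoted in this thread:
// 11264 total pages, 1015 free, 2048 kB each.
public class HugePageCheck {
    static long usedKb(long totalPages, long freePages, long pageSizeKb) {
        return (totalPages - freePages) * pageSizeKb;
    }

    public static void main(String[] args) {
        long inUseKb = usedKb(11264, 1015, 2048);
        System.out.printf("huge pages in use: %d kB (~%.1f GiB)%n",
                inUseKb, inUseKb / (1024.0 * 1024.0));
        // Roughly 20 GiB of the 22 GiB huge-page pool is in use, which is
        // consistent with the -Xms20g heap being fully backed by large pages.
    }
}
```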
> The application calculates the whole time with 10 threads some ratios > (high cpu load). This is done without producing any garbage. About two > times a minute a request is sent which produces a little bit of garbage. > Since we are working with realtime data we are interested in very short > stop-the-world pauses. Therefore we have used the CMS gc in the past > until we have got problems with fragmentation now. Since you don't produce much garbage how come you have fragmentation? Do you keep the results for all the requests you serve? > Therefore I am trying the g1. > > This seemed to work very well at first. The stw-pauses were, except the > cleanup pause, Out of curiosity: how long are the cleanup pauses? > very short. This leads me to my first question: > Is this normal and are there any parameters to influence the > cleanup-process? I don't think there's much you can do in the app to influence the cleanup duration. During this pause we do some, ahem, cleanup of our data structures, and for large heaps I have also seen the cleanup pauses take longer than I thought they would. I know this is not going to help you in the short term, but we have plans to do the cleanup work concurrently (or at least mostly-concurrently) in the future. > I thought this phase should be short because there is > just finished the counting, the role of the bitmaps is switched and the > next possible garbage regions are determined. All things, which should be > very fast. So what is taking the time? Most likely, the remembered set scrubbing phase... > The second cause for my email is the crazy behaviour after a few hours: > After the startup of the server it uses about 13.5 gb old-gen memory and > generates eden-garbage very slowly. Since the newly allocated memory is > mostly garbage the (young) garbage collections are very fast and g1 > decides to grow the eden space. This works 4 times until eden space > has more than about 3.5 gb memory. 
> After this the gc is making much more > collections and while the collections it discovers new garbage (probably > the old one again). I'm not quite sure what you mean by "it discovers new garbage". For young GCs, G1 (and our other GCs) will reclaim any young objects that it discovers to be dead (more accurately: that it does not discover to be live). > Eden memory usage jumps between 0 and 3.5gb even > though I am sure the java-application is not making more than before. Well, that's not good. :-) Can you try to explicitly set the young gen size with -Xmn3g, say, to see what happens? Tony > I > assume that it runs during a collection in the old garbage and collects > it again. Is this possible? Or is there an overflow since eden space > uses more than 3.5 gb? > > Thanks and regards, > Flo > > Some useful information: > $ java -version > java version "1.6.0_29" > Java(TM) SE Runtime Environment (build 1.6.0_29-b11) > Java HotSpot(TM) 64-Bit Server VM (build 20.4-b02, mixed mode) > > Startup Parameters: > -Xms20g -Xmx20g > -verbose:gc \ > -XX:+UnlockExperimentalVMOptions \ > -XX:+UseG1GC \ > -XX:+PrintGCDetails \ > -XX:+PrintGCDateStamps \ > -XX:+UseLargePages \ > -XX:+PrintFlagsFinal \ > -XX:-TraceClassUnloading \ > > $ cat /proc/meminfo | grep Huge > HugePages_Total: 11264 > HugePages_Free: 1015 > HugePages_Rsvd: 32 > Hugepagesize: 2048 kB > > A few screen-shots of the jconsole memory-view: > http://java4.info/g1/1h.png > http://java4.info/g1/all.png > http://java4.info/g1/eden_1h.png > http://java4.info/g1/eden_all.png > http://java4.info/g1/oldgen_all.png > > The sysout and syserr logfile with the gc logging and PrintFlagsFinal output: > http://java4.info/g1/out_err.log.gz > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use From knoguchi at yahoo-inc.com Tue Nov 29 14:06:42 2011 From: knoguchi at yahoo-inc.com (Koji Noguchi) Date: Tue, 29 Nov 
2011 14:06:42 -0800 Subject: Is CMS cycle can collect finalize objects In-Reply-To: Message-ID: Thanks Krystal for your update. I don't know why I'm getting a different result than yours. > * After the second CMS collection, the Java heap used goes down to 61747K, > In my case, it stays above 800MBytes... Attached is the memory footprint with CMS (-XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=95 -XX:+UseConcMarkSweepGC) and without (fullgc). It was interesting to see 1. FullGC case eventually stabilizing to having each FullGC release half of the heap (500M) due to finalizers requiring two GCs. 2. CMS case still stayed above 800M, but there were a few times when the memory footprint dropped. In any case, I'm pretty sure your SocketAdaptor [3] patch would work around the CMS issue I'm facing. So this is no longer urgent to me as long as that change gets into a future java version. Thanks again for all your inputs. Koji On 11/24/11 1:45 PM, "Srinivas Ramakrishna" wrote: Hi Kris, thanks for running the test case and figuring that out, and saving us further investigation of the submitted test case from Koji. Hopefully you or Koji will be able to find a simple test case that illustrates the real issue. thanks! -- ramki On Thu, Nov 24, 2011 at 1:52 AM, Krystal Mok wrote: Hi Koji and Ramki, I had a look at the repro test case in Bug 7113118. I don't think the test case is showing the same problem as the original one caused by SocksSocketImpl objects. The way this test case is behaving is exactly what the VM arguments told it to do. I ran the test case on a 64-bit Linux HotSpot Server VM, on JDK 6 update 29. My .hotspotrc is at [1]. SurvivorRatio doesn't need to be set explicitly, because when CMS is in use and MaxTenuringThreshold is 0 (or AlwaysTenure is true), the SurvivorRatio will automatically be set to 1024 by ergonomics. UsePSAdaptiveSurvivorSizePolicy has no effect when CMS is in use, so it's omitted from my configuration, too. 
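The SurvivorRatio=1024 ergonomics described above can be put into rough numbers. A minimal sketch, assuming the commonly documented generation-sizing rule that each survivor space gets approximately youngGen / (SurvivorRatio + 2); this is an approximation for illustration, not the exact HotSpot sizing code:

```java
// Sketch of DefNew/ParNew space sizing: each survivor space is roughly
// youngGen / (SurvivorRatio + 2). With the SurvivorRatio=1024 that ergonomics
// picks when MaxTenuringThreshold=0, the survivors all but vanish, so every
// object surviving a minor GC is promoted straight into the old generation.
// The formula is an approximation for illustration only.
public class SurvivorSizing {
    static long survivorBytes(long youngGenBytes, int survivorRatio) {
        return youngGenBytes / (survivorRatio + 2L);
    }

    public static void main(String[] args) {
        long young = 64L * 1024 * 1024; // a 64 MB young gen, chosen for the example
        System.out.println("SurvivorRatio=8    -> " + survivorBytes(young, 8));    // ~6.4 MB per survivor
        System.out.println("SurvivorRatio=1024 -> " + survivorBytes(young, 1024)); // ~64 KB per survivor
    }
}
```

With the ratio forced to 1024 each survivor space shrinks to almost nothing, which is why MaxTenuringThreshold=0 effectively turns every minor collection into pure promotion, exactly the behavior the test case's VM arguments request.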
By using -XX:+PrintReferenceGC, the gc log will show when and how many finalizable objects are discovered. The VM arguments given force all surviving objects from minor collections to be promoted into the old generation, and none of the minor collections had a chance to discover any ready-to-be-collected FinalReferences, so minor GC logs aren't of interest in this case. All of the minor GC log lines show "[FinalReference, 0 refs, xxx secs]". A part of the GC log can be found at [2]. This log shows two CMS collection cycles, with dozens of minor collections in between. * Before the first of these two CMS collections, the Java heap used is 971914K, and then the CMS occupancy threshold is crossed so a CMS collection cycle starts; * During the re-mark phase of the first CMS collection, 46400 FinalReferences were discovered; * After the first CMS collection, the Java heap used is still high, at 913771K, because the finalizable objects need another old generation collection to be collected (either CMS or full GC is fine); * During the re-mark phase of the second CMS collection, 3000 FinalReferences were discovered; these are from promoted objects from the minor collections in between; * After the second CMS collection, the Java heap used goes down to 61747K, as the finalizable objects discovered during the first CMS collection are indeed finalized and then collected during the second CMS collection. This behavior looks normal to me -- it's what the VM arguments were telling the VM to do. The reason the Java heap used size was swinging up and down is that the actual live data set was very low, but the CMSInitiatingOccupancyFraction was set too high, so concurrent collections are started too late. If the initiating threshold were set to a smaller value, say 20, then the test case would behave quite reasonably. We'd need another test case to study, because this one doesn't really repro the problem. 
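The initiating-occupancy arithmetic behind that last point can be sketched in a few lines. The 1 GB old-gen capacity below is an assumed figure for illustration; the log excerpt in the thread only gives used sizes, not the capacity:

```java
// Back-of-the-envelope for UseCMSInitiatingOccupancyOnly: a concurrent cycle
// starts once old-gen occupancy crosses capacity * fraction / 100.
// The ~1 GB capacity here is an assumption for illustration, not from the log.
public class CmsTrigger {
    static long initiatingOccupancyK(long capacityK, int fraction) {
        return capacityK * fraction / 100;
    }

    public static void main(String[] args) {
        long oldGenK = 1000000;  // assumed ~1 GB old gen
        long liveK = 61747;      // live set after a clean cycle, per the quoted log
        // At 95%, nearly the whole heap fills with dead finalizable objects
        // before a cycle starts; at 20%, cycles begin soon after the live set
        // is exceeded, so the heap stays close to its true live size.
        System.out.println("trigger at 95%: " + initiatingOccupancyK(oldGenK, 95) + "K");
        System.out.println("trigger at 20%: " + initiatingOccupancyK(oldGenK, 20) + "K");
        System.out.println("live data:      " + liveK + "K");
    }
}
```

This is why lowering the fraction to 20, as suggested above, keeps the used-heap curve near the live set instead of letting it swing between ~60M and ~950M.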
After we applied the patch to SocketAdaptor [3], we don't see this kind of CMS/finalization problem in production anymore. Should we hit one of these again, I'll try to get more info from our production site and see if I can trace down the real problem. - Kris [1]: https://gist.github.com/1390876#file_.hotspotrc [2]: https://gist.github.com/1390876#file_gc.partial.log [3]: http://mail.openjdk.java.net/pipermail/nio-dev/2011-November/001480.html On Thu, Nov 24, 2011 at 7:11 AM, Srinivas Ramakrishna wrote: Hi Koji -- Thanks for the test case, that should definitely help with the identification of the problem. I'll see if i can find some spare time to pursue it one of these days (but can't promise), so please do open that Oracle support ticket to get the requisite resource allocated for the official investigation. Thanks again for boiling it down to a simple test case, and i'll update if i identify the root cause... -- ramki On Tue, Nov 22, 2011 at 1:06 PM, Koji Noguchi wrote: This is from an old thread in 2011 April but we're still seeing the same problem with (nio) Socket instances not getting collected by CMS. Opened http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7113118 Thanks, Koji (From http://mail.openjdk.java.net/pipermail/hotspot-gc-use/2011-April/subject.html) On 4/25/11 8:37 AM, "Y. Srinivas Ramakrishna" > wrote: > On 4/25/2011 9:10 AM, Bharath Mundlapudi wrote: > > Hi Ramki, > > > > Thanks for the detailed explanation. I was trying to > > run some tests for your questions. Here are the answers to some of your > > questions. > > > >>> What are the symptoms? > > java.net.SocksSocketImpl objects are not getting cleaned up after a CMS > cycle. I see the direct > > correlation to java.lang.ref.Finalizer objects. Over time, this fills up > > the old generation and CMS goes into a loop occupying one complete core. > > But when we trigger Full GC, these objects are garbage collected. > > OK, thanks. 
> > > > > You > > mentioned that CMS cycle does clean up these objects provided we enable > > class unloading. Are you suggesting -XX:+ClassUnloading or > > -XX:+CMSClassUnloadingEnabled? I have tried with the latter and > > didn't > > succeed. Our perm gen is relatively constant; by enabling this, are we > > introducing performance overhead? We have room for CPU cycles and perm > > gen is relatively small, so this may be fine. Just that we want to see > > these objects GC'ed in the CMS cycle. > > > > > > Do you have any suggestion w.r.t. which flags should i be using to > trigger this? > > For the issue you are seeing the -XX:+CMSClassUnloadingEnabled flag will > not make a difference in the accumulation of the socket objects > because there is no "projection" as far as i can tell of these > into the perm gen, especially since as you say there is no class > loading going on (since your perm gen size remains constant after > start-up). > > However, keeping class unloading enabled via this flag should > hopefully not have much of an impact on your pause times given that > the perm gen is small. The typical effect you will see if class > unloading is enabled is that the CMS remark pause times are a bit > longer (if you enable PrintGCDetails you will see messages > such as "scrub string table" and "scrub symbol table", "code cache", > etc.). By comparing the CMS-remark pause details and times with > and without enabling class unloading you will get a good idea > of its impact. In some cases, even though you pay a small price > in terms of increased CMS-remark pause times, you will make up > for that in terms of faster scavenges etc., so it might well > be worthwhile. > > In the very near future, we may end up turning that on > by default for CMS because the savings from leaving it off > by default are much smaller now and it can often lead to > other issues if class unloading is turned off. 
> > So bottom line is: it will not affect the accumulation of > your socket objects, but it's a good idea to keep class > unloading by CMS enabled anyway. > > > > > > >>> What does jmap -finalizerinfo on your process show? > >>> What does -XX:+PrintClassHistogram show as accumulating in the > heap? > >>> (Are they one specific type of Finalizer objects or all > varieties?) > > > > Jmap -histo shows the above class keeps accumulating. In fact, > > finalizerinfo doesn't show any objects on this process. > > OK, that shows that the objects are somehow not discovered by > CMS as being eligible for finalization. Although one can imagine > a one cycle delay (because of floating garbage) with CMS finding > these objects to be unreachable and hence eligible for finalization, > continuing accumulation of these objects over a period of time > (and presumably many CMS cycles) seems strange and almost > definitely a CMS bug, especially as you find that a full STW > gc does indeed reclaim them. > > > > > > > > >>> Did the problem start in 6u21? Or are those the only versions > >>> you tested and found that there was an issue? > > We > > have seen this problem in 6u21. We were on 6u12 earlier and didn't run > > into this problem. But we can't say this is particular to a build, since lots > > of things have changed. > > Can you boil down this behavior into a test case that you are able > to share with us? > If so, please file a bug with the test case > and send me the CR id and I'll take a look. > > Oh, and before you do that, can you please check the latest public > release (6u24 or 6u25?) to see if the problem still reproduces? > > thanks, and sorry I could not be of more help without a bug > report or a test case. 
> > -- ramki > > > > > Thanks in anticipation, > > -Bharath > > _______________________________________________ hotspot-gc-use mailing list hotspot-gc-use at openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use _______________________________________________ hotspot-gc-use mailing list hotspot-gc-use at openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20111129/edb6e31c/attachment-0001.html -------------- next part -------------- A non-text attachment was scrubbed... Name: cmsAndFullGCTesting.png Type: application/octet-stream Size: 46276 bytes Desc: cmsAndFullGCTesting.png Url : http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20111129/edb6e31c/cmsAndFullGCTesting-0001.png From ysr1729 at gmail.com Tue Nov 29 22:16:41 2011 From: ysr1729 at gmail.com (Srinivas Ramakrishna) Date: Tue, 29 Nov 2011 22:16:41 -0800 Subject: Is CMS cycle can collect finalize objects In-Reply-To: References: Message-ID: Who knows, maybe this is related to the other CMS CR that Stefan just sent out a review request for. If I understand correctly then, the behaviour should be good if you turn off parallel marking in CMS, viz. -XX:-CMSConcurrentMTEnabled (or whatever the flag is called now). Are you able to check that? Adding Stefan to the cc, just in case. -- ramki On Tue, Nov 29, 2011 at 2:06 PM, Koji Noguchi wrote: > Thanks Krystal for your update. > > I don't know why I'm getting a different result than yours. > > > > * After the second CMS collection, the Java heap used goes down > to 61747K, > > > In my case, it stays above 800MBytes... > > Attached is the memory footprint with > CMS(-XX:+UseCMSInitiatingOccupancyOnly > -XX:CMSInitiatingOccupancyFraction=95 -XX:+UseConcMarkSweepGC) and > without(fullgc). > > It was interesting to see > > 1. 
FullGC case eventually stabilizing to having each FullGC releasing > half of the heap (500M) due to finalizer requiring two GCs. > 2. CMS case still stayed above 800M but there were a few times when > memory footprint dropped. > > > In any cases, I?m pretty sure your SocketAdaptor [3] patch would > workaround the CMS issue I?m facing. So this is no longer urgent to me as > long as that change gets into a future java version. > > Thanks again for all your inputs. > > Koji > > > On 11/24/11 1:45 PM, "Srinivas Ramakrishna" wrote: > > Hi Kris, thanks for running the test case and figuring that out, and > saving us further investigation of > the submitted test case from Koji. > > Hopefully you or Koji will be able to find a simple test case that > illustrates the real issue. > > thanks! > -- ramki > > > On Thu, Nov 24, 2011 at 1:52 AM, Krystal Mok > wrote: > > Hi Koji and Ramki, > > I had a look at the repro test case in Bug 7113118. I don't think the test > case is showing the same problem as the original one caused by > SocksSocketImpl objects. The way this test case is behaving is exactly what > the VM arguments told it to do. > > I ran the test case on a 64-bit Linux HotSpot Server VM, on JDK 6 update > 29. > > My .hotspotrc is at [1]. > > SurvivorRatio doesn't need to be set explicitly, because when CMS is in > use and MaxTenuringThreshold is 0 (or AlwaysTenure is true), the > SurvivorRatio will automatically be set to 1024 by ergonomics. > UsePSAdaptiveSurvivorSizePolicy has no effect when CMS is in use, so it's > omitted from my configuration, too. > > By using -XX:+PrintReferenceGC, the gc log will show when and how many > finalizable object are discovered. > > The VM arguments given force all surviving object from minor collections > to be promoted into the old generation, and none of the minor collections > had a chance to discovery any ready-to-be-collected FinalReferences, so > minor GC logs aren't of interest in this case. 
All of the minor GC log > lines show "[FinalReference, 0 refs, xxx secs]". > > A part of the GC log can be found at [2]. This log shows two CMS > collections cycles, in between dozens of minor collections. > > * Before the first of these two CMS collections, the Java heap used > is 971914K, and then the CMS occupancy threshold is crossed so a CMS > collection cycle starts; > * During the re-mark phase of the first CMS collection, 46400 > FinalReferences were discovered; > * After the first CMS collection, the Java heap used is still high, > at 913771K, because the finalizable objects need another old generation > collection to be collected (either CMS or full GC is fine); > * During the re-mark phase of the second CMS collection, 3000 > FinalReferences were discovered, these are from promoted objects from the > minor collections in between; > * After the second CMS collection, the Java heap used goes down to 61747K, > as the finalizable objects discovered from the first CMS collection are > indeed finalized and then collected during the second CMS collection. > > This behavior looks normal to me -- it's what the VM arguments were > telling the VM to do. > The reason that the Java heap used size was swing up and down is because > the actual live data set was very low, but > the CMSInitiatingOccupancyFraction was set too high so concurrent > collections are started too late. If the initiating threshold were set to a > smaller value, say 20, then the test case would behave quite reasonably. > > We'd need another test case to study, because this one doesn't really > repro the problem. > > After we applied the patch to SocketAdaptor [3], we don't see this kind of > CMS/finalization problem in production anymore. Should we hit one of these > again, I'll try to get more info from our production site and see if I can > trace down the real problem. 
> > - Kris > > [1]: https://gist.github.com/1390876#file_.hotspotrc > [2]: https://gist.github.com/1390876#file_gc.partial.log > [3]: > http://mail.openjdk.java.net/pipermail/nio-dev/2011-November/001480.html > > > On Thu, Nov 24, 2011 at 7:11 AM, Srinivas Ramakrishna > wrote: > > Hi Koji -- > > Thanks for the test case, that should definitely help with the > dentification of the problem. I'll see if > i can find some spare time to pursue it one of these days (but can't > promise), so please > do open that Oracle support ticket to get the requisite resource allocated > for the official > investigation. > > Thanks again for boiling it down to a simple test case, and i'll update if > i identify the > root cause... > > -- ramki > > > On Tue, Nov 22, 2011 at 1:06 PM, Koji Noguchi > wrote: > > This is from an old thread in 2011 April but we're still seeing the same > problem with (nio) Socket instances not getting collecting by CMS. > > Opened http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7113118 > > Thanks, > Koji > > > (From > > http://mail.openjdk.java.net/pipermail/hotspot-gc-use/2011-April/subject.htm > l) > On 4/25/11 8:37 AM, "Y. Srinivas Ramakrishna" > > > wrote: > > On 4/25/2011 9:10 AM, Bharath Mundlapudi wrote: > > > Hi Ramki, > > > > > > Thanks for the detailed explanation. I was trying to > > > run some tests for your questions. Here are the answers to some of your > > > questions. > > > > > >>> What are the symptoms? > > > java.net.SocksSocketImpl objects are not getting cleaned up after a CMS > > cycle. I see the direct > > > correlation to java.lang.ref.Finalizer objects. Overtime, this fills up > > > the old generation and CMS going in loop occupying complete one core. > > > But when we trigger Full GC, these objects are garbage collected. > > > > OK, thanks. > > > > > > > > You > > > mentioned that CMS cycle does cleanup these objects provided we > enable > > > class unloading. 
Are you suggesting -XX:+ClassUnloading or > > > -XX:+CMSClassUnloadingEnabled? I have tried with later and > > > > > > didn't > > > succeed. Our pern gen is relatively constant, by enabling this, are > we > > > introducing performance overhead? We have room for CPU cycles and > perm > > > gen is relatively small, so this may be fine. Just that we want to see > > > these objects should GC'ed in CMS cycle. > > > > > > > > > Do you have any suggestion w.r.t. to which flags should i be using to > > trigger this? > > > > For the issue you are seeing the -XX:+CMSClassUnloadingFlag will > > not make a difference in the accumulation of the socket objects > > because there is no "projection" as far as i can tell of these > > into the perm gen, esepcially since as you say there is no class > > loading going on (since your perm gen size remains constant after > > start-up). > > > > > However, keeping class unloading enabled via this flag should > > hopefully not have much of an impact on your pause times given that > > the perm gen is small. The typical effect you will see if class > > unloading is enabled is that the CMS remark pause times are a bit > > longer (if you enable PrintGCDetails you will see messages > > such as "scrub string table" and "scrub symbol table", "code cache" > > etc. BY comparing the CMS-remark pause details and times with > > and without enabling class unloading you will get a good idea > > of its impact. In some cases, eben though you pay a small price > > in terms of increased CMS-remark pause times, you will make up > > for that in terms of faster scavenges etc., so it might well > > be worthwhile. > > > > In the very near future, we may end up turning that on > > by default for CMS because the savings from leaving it off > > by default are much smaller now and it can often lead to > > other issues if class unloading is turned off. 
> > > > So bottom line is: it will not affect the accumulation of > > your socket objects, but it's a good idea to keep class > > unloading by CMS enabled anyway. > > > > > > > > > > >>> What does jmap -finalizerinfo on your process show? > > >>> What does -XX:+PrintClassHistogram show as accumulating in the > > heap? > > >>> (Are they one specific type of Finalizer objects or all > > varieties?) > > > > > > Jmap -histo shows the above class is keep accumulating. Infact, > > > finalizerinfo doesn't show any objects on this process. > > > > OK, that shows that the objects are somehow not discovered by > > CMS as being eligible for finalization. Although one can imagine > > a one cycle delay (because of floating garbage) with CMS finding > > these objects to be unreachable and hence eligible for finalization, > > continuing accumulation of these objects over a period of time > > (and presumably many CMS cycles) seems strange and almost > > definitely a CMS bug especially as you find that a full STW > > gc does indeed reclaim them. > > > > > > > > > > > > > >>> Did the problem start in 6u21? Or are those the only versions > > >>> you tested and found that there was an issue? > > > We > > > have seen this problem in 6u21. We were on 6u12 earlier and didn't > run > > > into this problem. But can't say this is a build particular, since lots > > > of things have changed. > > > > Can you boil down this behavior into a test case that you are able > > to share with us? > > If so, please file a bug with the test case > > and send me the CR id and I'll take a look. > > > > Oh, and before you do that, can you please check the latest public > > release (6u24 or 6u25?) to see if the problem still reproduces? > > > > thanks, and sorry I could not be of more help without a bug > > report or a test case. 
> > > > -- ramki > > > > > > > Thanks in anticipation, > > > -Bharath > > > > > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20111129/10bd293b/attachment.html From java at java4.info Wed Nov 30 00:35:45 2011 From: java at java4.info (Florian Binder) Date: Wed, 30 Nov 2011 09:35:45 +0100 Subject: G1 discovers same garbage again? In-Reply-To: <4ED51672.6030105@oracle.com> References: <4ED3DE79.5050801@java4.info> <4ED51672.6030105@oracle.com> Message-ID: <4ED5EAE1.2020102@java4.info> Hi Tony, first of all thank you for your answer. See inline. On 29.11.2011 18:29, Tony Printezis wrote: > Hi Florian, > > See inline. > > On 11/28/2011 2:18 PM, Florian Binder wrote: >> Hi everybody, >> >> I have a java application with 20gb (large-table) memory and using the >> g1 garbage collector. > > Quick clarification: I saw that you use a 20G heap from the parameters > you showed below. Do you know what's your live data size? At this time I have about 14gb of live data, but it is growing day by day. > >> The application calculates the whole time with 10 threads some ratios >> (high cpu load). This is done without producing any garbage. About two >> times a minute a request is sent which produces a little bit of garbage. >> Since we are working with realtime data we are interested in very short >> stop-the-world pauses. Therefore we have used the CMS gc in the past >> until we have got problems with fragmentation now. > > Since you don't produce much garbage how come you have fragmentation? 
> Do you keep the results for all the requests you serve? This data is held for one day; every night it is dropped and reinitialized. We have a lot of different servers with big memory and have had problems with fragmentation on a few of them. This is why I am experimenting with g1 in general. I am not sure if we had fragmentation on this one. Today I tried the g1 with another server which surely had a problem with a fragmented heap, but this one did not start with g1. I got several different exceptions (NoClassDefFound, NullPointerException or even a jvm-crash ;-)). But I think I will write you another email especially for this, because it is started with a lot of special parameters (e.g. -Xms39G -Xmx39G -XX:+UseCompressedOops -XX:ObjectAlignmentInBytes=16 -XX:+UseLargePages). > >> Therefore I am trying the g1. >> >> This seemed to work very well at first. The stw-pauses were, except the >> cleanup pause, > > Out of curiosity: how long are the cleanup pauses? I think they were about 150ms. This is acceptable for me, but in proportion to the garbage collections of 30ms it is very long, and therefore I was wondering. > >> very short. This leads me to my first question: >> Is this normal and are there any parameters to influence the >> cleanup-process? > > I don't think there's much you can do in the app to influence the > cleanup duration. During this pause we do some, ahem, cleanup of our > data structures and for large heaps I have also seen the cleanup > pauses to take longer than I thought they would take. I know this is > not going to help you in the short term but we have plans to do the > cleanup work concurrently (or at least mostly-concurrently) in the > future. Sounds good :-) > >> I thought this phase should be short because there is >> just finished the counting, the role of the bitmaps is switched and the >> next possible garbage regions are determined. All things, which should be >> very fast. So what is taking the time? 
> > Most likely, the remembered set scrubbing phase... > >> The second cause for my email is the crazy behaviour after a few hours: >> After the startup of the server it uses about 13.5 gb old-gen memory and >> generates very slowly eden-garbage. Since the new allocated memory is >> mostly garbage the (young) garbage collections are very fast and g1 >> decides to grow up the eden space. This works 4 times until eden space >> has more than about 3.5 gb memory. After this the gc is making much more >> collections and while the collections it discovers new garbage (probably >> the old one again). > > I'm not quite sure what you mean by "it discovers new garbage". For > young GCs, G1 (and our other GCs) will reclaim any young objects that > will discover to be dead (more accurately: that it will not discover > to be live). > >> Eden memory usage jumps between 0 and 3.5gb even >> though I am sure the java-application is not making more than before. > > Well, that's not good. :-) Can you try to explicitly set the young gen > size with -Xmn3g say, to see what happens? With "it discovers new garbage" I mean that during the garbage collection the eden space usage jumps up to 3gb. Then it cleans up the whole garbage (eden usage is 0) and a few seconds later the eden usage jumps up again. You can see this in the 1h eden-space snapshot: http://java4.info/g1/eden_1h.png Since the jumps are between 0 and the last max eden usage (of about 3.5gb), I assume that it discovers the same garbage it cleaned up the last time and collects it again. I am sure the application is not making more garbage than before. Have you ever heard of problems like this? After I wrote the last email, I saw that it calmed itself after a few hours. But it is nevertheless very curious and produces a lot of unnecessary pauses. Flo > > Tony > >> I >> assume that it runs during a collection in the old garbage and collects >> it again. Is this possible? 
Or is there an overflow since eden space >> uses more than 3.5 gb? >> >> Thanks and regards, >> Flo >> >> Some useful information: >> $ java -version >> java version "1.6.0_29" >> Java(TM) SE Runtime Environment (build 1.6.0_29-b11) >> Java HotSpot(TM) 64-Bit Server VM (build 20.4-b02, mixed mode) >> >> Startup Parameters: >> -Xms20g -Xmx20g >> -verbose:gc \ >> -XX:+UnlockExperimentalVMOptions \ >> -XX:+UseG1GC \ >> -XX:+PrintGCDetails \ >> -XX:+PrintGCDateStamps \ >> -XX:+UseLargePages \ >> -XX:+PrintFlagsFinal \ >> -XX:-TraceClassUnloading \ >> >> $ cat /proc/meminfo | grep Huge >> HugePages_Total: 11264 >> HugePages_Free: 1015 >> HugePages_Rsvd: 32 >> Hugepagesize: 2048 kB >> >> A few screen-shots of the jconsole memory-view: >> http://java4.info/g1/1h.png >> http://java4.info/g1/all.png >> http://java4.info/g1/eden_1h.png >> http://java4.info/g1/eden_all.png >> http://java4.info/g1/oldgen_all.png >> >> The sysout end syserr logfile with the gc logging and PrintFinalFlags: >> http://java4.info/g1/out_err.log.gz >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use From stefan.karlsson at oracle.com Wed Nov 30 01:02:33 2011 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Wed, 30 Nov 2011 10:02:33 +0100 Subject: Is CMS cycle can collect finalize objects In-Reply-To: References: Message-ID: <4ED5F129.7040208@oracle.com> On 11/30/2011 07:16 AM, Srinivas Ramakrishna wrote: > Who knows, may be this is related to the other CMS CR that Stefan just > sent out a review request for. If I understand correctly then, > the behaviour should be good if you turn off parallel marking in CMS, > viz -XX:-CMSConcurrentMTEnabled (or whatever > the flag is called now). Are you able to check that? If it's the same bug -XX:-CMSConcurrentMTEnabled should fix it. I instrumented and ran the small reproducer in the bug report for bug 7113118 . 
I agree with Krystal's earlier assessment of that reproducer. We actually do discover most Finalizers, but we need a second GC to clean them out. StefanK > > Adding Stefan to the cc, just in case. > -- ramki > > On Tue, Nov 29, 2011 at 2:06 PM, Koji Noguchi > wrote: > > Thanks Krystal for your update. > > I don?t know why I?m getting a different result than yours. > > > > * After the second CMS collection, the Java heap used goes down > to 61747K, > > > In my case, it stays above 800MBytes... > > Attached is the memory footprint with > CMS(-XX:+UseCMSInitiatingOccupancyOnly > -XX:CMSInitiatingOccupancyFraction=95 -XX:+UseConcMarkSweepGC) and > without(fullgc). > > It was interesting to see > > 1. FullGC case eventually stabilizing to having each FullGC > releasing half of the heap (500M) due to finalizer requiring > two GCs. > 2. CMS case still stayed above 800M but there were a few times > when memory footprint dropped. > > > In any cases, I?m pretty sure your SocketAdaptor [3] patch would > workaround the CMS issue I?m facing. So this is no longer urgent > to me as long as that change gets into a future java version. > > Thanks again for all your inputs. > > Koji > > > On 11/24/11 1:45 PM, "Srinivas Ramakrishna" > wrote: > > Hi Kris, thanks for running the test case and figuring that > out, and saving us further investigation of > the submitted test case from Koji. > > Hopefully you or Koji will be able to find a simple test case > that illustrates the real issue. > > thanks! > -- ramki > > > On Thu, Nov 24, 2011 at 1:52 AM, Krystal Mok > > wrote: > > Hi Koji and Ramki, > > I had a look at the repro test case in Bug 7113118. I > don't think the test case is showing the same problem as > the original one caused by SocksSocketImpl objects. The > way this test case is behaving is exactly what the VM > arguments told it to do. > > I ran the test case on a 64-bit Linux HotSpot Server VM, > on JDK 6 update 29. > > My .hotspotrc is at [1]. 
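StefanK's point that finalizable objects "need a second GC to clean them out" can be sketched with a toy class (a hypothetical example, not the reproducer from bug 7113118): the first collection only discovers the dead object and hands it to the finalizer thread; a later collection is what actually frees its memory.

```java
// Sketch of the two-GC life cycle of a finalizable object.
public class TwoGcFinalize {
    static volatile boolean finalized = false;

    static class Resource {
        @Override
        protected void finalize() {
            finalized = true;  // runs on the finalizer thread after GC #1 discovers us
        }
    }

    // Returns true once the finalizer has run; the memory is reclaimable
    // only by a collection that happens *after* finalization, hence "second GC".
    static boolean demo() throws InterruptedException {
        Object r = new Resource();
        r = null;  // drop the last strong reference

        // GC #1 discovers the dead Resource and enqueues its FinalReference.
        // Finalization timing is not guaranteed, so retry with a bound.
        for (int i = 0; i < 100 && !finalized; i++) {
            System.gc();
            System.runFinalization();
            Thread.sleep(10);
        }

        // GC #2 (or any later cycle) is the one that can actually reclaim
        // the object's memory, now that finalize() has completed.
        System.gc();
        return finalized;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("finalized=" + demo());
    }
}
```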
> > SurvivorRatio doesn't need to be set explicitly, because > when CMS is in use and MaxTenuringThreshold is 0 (or > AlwaysTenure is true), the SurvivorRatio will > automatically be set to 1024 by ergonomics. > UsePSAdaptiveSurvivorSizePolicy has no effect when CMS is > in use, so it's omitted from my configuration, too. > > By using -XX:+PrintReferenceGC, the gc log will show when > and how many finalizable object are discovered. > > The VM arguments given force all surviving object from > minor collections to be promoted into the old generation, > and none of the minor collections had a chance to > discovery any ready-to-be-collected FinalReferences, so > minor GC logs aren't of interest in this case. All of the > minor GC log lines show "[FinalReference, 0 refs, xxx secs]". > > A part of the GC log can be found at [2]. This log shows > two CMS collections cycles, in between dozens of minor > collections. > > * Before the first of these two CMS collections, the Java > heap used is 971914K, and then the CMS occupancy threshold > is crossed so a CMS collection cycle starts; > * During the re-mark phase of the first CMS > collection, 46400 FinalReferences were discovered; > * After the first CMS collection, the Java heap used is > still high, at 913771K, because the finalizable objects > need another old generation collection to be collected > (either CMS or full GC is fine); > * During the re-mark phase of the second CMS collection, > 3000 FinalReferences were discovered, these are from > promoted objects from the minor collections in between; > * After the second CMS collection, the Java heap used goes > down to 61747K, as the finalizable objects discovered from > the first CMS collection are indeed finalized and then > collected during the second CMS collection. > > This behavior looks normal to me -- it's what the VM > arguments were telling the VM to do. 
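Krystal's figures above can be sanity-checked with quick arithmetic. The old-generation size here is an assumption (roughly 1000 MB, inferred from the 971914K occupancy at which the cycle started; the thread does not state it exactly):

```java
// Back-of-the-envelope check of the CMS initiating threshold.
// Assumption: ~1000 MB old generation, as suggested by the posted log.
public class OccupancyCheck {
    // KB of old-gen occupancy at which a CMS cycle is initiated.
    static long initiatingOccupancyKb(long oldGenKb, int fraction) {
        return oldGenKb * fraction / 100;
    }

    public static void main(String[] args) {
        long oldGenKb = 1000L * 1024;  // assumed old-gen capacity, in KB
        // With -XX:CMSInitiatingOccupancyFraction=95 the cycle starts at
        // ~972800K -- right where the log shows 971914K used.
        System.out.println(initiatingOccupancyKb(oldGenKb, 95) + "K");
        // A smaller setting such as 20 would start cycles at ~204800K,
        // much closer to the ~62 MB live set, so collections would begin
        // long before dead finalizable objects pile up near the top.
        System.out.println(initiatingOccupancyKb(oldGenKb, 20) + "K");
    }
}
```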
> The reason that the Java heap used size was swing up and > down is because the actual live data set was very low, but > the CMSInitiatingOccupancyFraction was set too high so > concurrent collections are started too late. If the > initiating threshold were set to a smaller value, say 20, > then the test case would behave quite reasonably. > > We'd need another test case to study, because this one > doesn't really repro the problem. > > After we applied the patch to SocketAdaptor [3], we don't > see this kind of CMS/finalization problem in production > anymore. Should we hit one of these again, I'll try to get > more info from our production site and see if I can trace > down the real problem. > > - Kris > > [1]: https://gist.github.com/1390876#file_.hotspotrc > [2]: https://gist.github.com/1390876#file_gc.partial.log > [3]: > http://mail.openjdk.java.net/pipermail/nio-dev/2011-November/001480.html > > > On Thu, Nov 24, 2011 at 7:11 AM, Srinivas Ramakrishna > > wrote: > > Hi Koji -- > > Thanks for the test case, that should definitely help > with the dentification of the problem. I'll see if > i can find some spare time to pursue it one of these > days (but can't promise), so please > do open that Oracle support ticket to get the > requisite resource allocated for the official > investigation. > > Thanks again for boiling it down to a simple test > case, and i'll update if i identify the > root cause... > > -- ramki > > > On Tue, Nov 22, 2011 at 1:06 PM, Koji Noguchi > > wrote: > > This is from an old thread in 2011 April but we're > still seeing the same > problem with (nio) Socket instances not getting > collecting by CMS. > > Opened > http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7113118 > > Thanks, > Koji > > > (From > http://mail.openjdk.java.net/pipermail/hotspot-gc-use/2011-April/subject.htm > l) > On 4/25/11 8:37 AM, "Y. 
Srinivas Ramakrishna" > > > wrote: > > On 4/25/2011 9:10 AM, Bharath Mundlapudi wrote: > > > Hi Ramki, > > > > > > Thanks for the detailed explanation. I was > trying to > > > run some tests for your questions. Here are the > answers to some of your > > > questions. > > > > > >>> What are the symptoms? > > > java.net.SocksSocketImpl objects are not > getting cleaned up after a CMS > > cycle. I see the direct > > > correlation to java.lang.ref.Finalizer objects. > Overtime, this fills up > > > the old generation and CMS going in loop > occupying complete one core. > > > But when we trigger Full GC, these objects are > garbage collected. > > > > OK, thanks. > > > > > > > > You > > > mentioned that CMS cycle does cleanup these > objects provided we enable > > > class unloading. Are you suggesting > -XX:+ClassUnloading or > > > -XX:+CMSClassUnloadingEnabled? I have tried > with later and > > > > > > didn't > > > succeed. Our pern gen is relatively > constant, by enabling this, are we > > > introducing performance overhead? We have > room for CPU cycles and perm > > > gen is relatively small, so this may be fine. > Just that we want to see > > > these objects should GC'ed in CMS cycle. > > > > > > > > > Do you have any suggestion w.r.t. to which > flags should i be using to > > trigger this? > > > > For the issue you are seeing the > -XX:+CMSClassUnloadingFlag will > > not make a difference in the accumulation of the > socket objects > > because there is no "projection" as far as i can > tell of these > > into the perm gen, esepcially since as you say > there is no class > > loading going on (since your perm gen size > remains constant after > > start-up). > > > > > However, keeping class unloading enabled via this > flag should > > hopefully not have much of an impact on your > pause times given that > > the perm gen is small. 
The typical effect you > will see if class > > unloading is enabled is that the CMS remark pause > times are a bit > > longer (if you enable PrintGCDetails you will see > messages > > such as "scrub string table" and "scrub symbol > table", "code cache" > > etc. BY comparing the CMS-remark pause details > and times with > > and without enabling class unloading you will get > a good idea > > of its impact. In some cases, eben though you pay > a small price > > in terms of increased CMS-remark pause times, you > will make up > > for that in terms of faster scavenges etc., so it > might well > > be worthwhile. > > > > In the very near future, we may end up turning > that on > > by default for CMS because the savings from > leaving it off > > by default are much smaller now and it can often > lead to > > other issues if class unloading is turned off. > > > > So bottom line is: it will not affect the > accumulation of > > your socket objects, but it's a good idea to keep > class > > unloading by CMS enabled anyway. > > > > > > > > > > >>> What does jmap -finalizerinfo on your process > show? > > >>> What does -XX:+PrintClassHistogram show as > accumulating in the > > heap? > > >>> (Are they one specific type of Finalizer > objects or all > > varieties?) > > > > > > Jmap -histo shows the above class is keep > accumulating. Infact, > > > finalizerinfo doesn't show any objects on this > process. > > > > OK, that shows that the objects are somehow not > discovered by > > CMS as being eligible for finalization. Although > one can imagine > > a one cycle delay (because of floating garbage) > with CMS finding > > these objects to be unreachable and hence > eligible for finalization, > > continuing accumulation of these objects over a > period of time > > (and presumably many CMS cycles) seems strange > and almost > > definitely a CMS bug especially as you find that > a full STW > > gc does indeed reclaim them. > > > > > > > > > > > > > >>> Did the problem start in 6u21? 
Or are those > the only versions > > >>> you tested and found that there was an issue? > > > We > > > have seen this problem in 6u21. We were on > 6u12 earlier and didn't run > > > into this problem. But can't say this is a > build particular, since lots > > > of things have changed. > > > > Can you boil down this behavior into a test case > that you are able > > to share with us? > > If so, please file a bug with the test case > > and send me the CR id and I'll take a look. > > > > Oh, and before you do that, can you please check > the latest public > > release (6u24 or 6u25?) to see if the problem > still reproduces? > > > > thanks, and sorry I could not be of more help > without a bug > > report or a test case. > > > > -- ramki > > > > > > > > Thanks in anticipation, > > > -Bharath > > > > > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > > > > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20111130/2c7360c3/attachment-0001.html From rednaxelafx at gmail.com Wed Nov 30 01:45:19 2011 From: rednaxelafx at gmail.com (Krystal Mok) Date: Wed, 30 Nov 2011 17:45:19 +0800 Subject: Is CMS cycle can collect finalize objects In-Reply-To: <4ED5F129.7040208@oracle.com> References: <4ED5F129.7040208@oracle.com> Message-ID: Hi Stefan and Ramki, I've been looking at 7112034 since it was sent for review. 
I'll see if I can get one of our production site machines that encountered the problem try out -XX:-CMSConcurrentMTEnabled without the SocketAdaptor patch. Will report back later. Thanks a lot for the fix! - Kris On Wed, Nov 30, 2011 at 5:02 PM, Stefan Karlsson wrote: > ** > On 11/30/2011 07:16 AM, Srinivas Ramakrishna wrote: > > Who knows, may be this is related to the other CMS CR that Stefan just > sent out a review request for. If I understand correctly then, > the behaviour should be good if you turn off parallel marking in CMS, viz > -XX:-CMSConcurrentMTEnabled (or whatever > the flag is called now). Are you able to check that? > > > If it's the same bug -XX:-CMSConcurrentMTEnabled should fix it. > > I instrumented and ran the small reproducer in the bug report for bug > 7113118 . I > agree with Krystal's earlier assessment of that reproducer. We actually do > discover most Finalizers, but we need a second GC to clean them out. > > StefanK > > > > Adding Stefan to the cc, just in case. > -- ramki > > On Tue, Nov 29, 2011 at 2:06 PM, Koji Noguchi wrote: > >> Thanks Krystal for your update. >> >> I don?t know why I?m getting a different result than yours. >> >> >> > * After the second CMS collection, the Java heap used goes down >> to 61747K, >> > >> In my case, it stays above 800MBytes... >> >> Attached is the memory footprint with >> CMS(-XX:+UseCMSInitiatingOccupancyOnly >> -XX:CMSInitiatingOccupancyFraction=95 -XX:+UseConcMarkSweepGC) and >> without(fullgc). >> >> It was interesting to see >> >> 1. FullGC case eventually stabilizing to having each FullGC releasing >> half of the heap (500M) due to finalizer requiring two GCs. >> 2. CMS case still stayed above 800M but there were a few times when >> memory footprint dropped. >> >> >> In any cases, I?m pretty sure your SocketAdaptor [3] patch would >> workaround the CMS issue I?m facing. So this is no longer urgent to me as >> long as that change gets into a future java version. 
>> >> Thanks again for all your inputs. >> >> Koji >> >> >> On 11/24/11 1:45 PM, "Srinivas Ramakrishna" wrote: >> >> Hi Kris, thanks for running the test case and figuring that out, and >> saving us further investigation of >> the submitted test case from Koji. >> >> Hopefully you or Koji will be able to find a simple test case that >> illustrates the real issue. >> >> thanks! >> -- ramki >> >> >> On Thu, Nov 24, 2011 at 1:52 AM, Krystal Mok >> wrote: >> >> Hi Koji and Ramki, >> >> I had a look at the repro test case in Bug 7113118. I don't think the >> test case is showing the same problem as the original one caused by >> SocksSocketImpl objects. The way this test case is behaving is exactly what >> the VM arguments told it to do. >> >> I ran the test case on a 64-bit Linux HotSpot Server VM, on JDK 6 update >> 29. >> >> My .hotspotrc is at [1]. >> >> SurvivorRatio doesn't need to be set explicitly, because when CMS is in >> use and MaxTenuringThreshold is 0 (or AlwaysTenure is true), the >> SurvivorRatio will automatically be set to 1024 by ergonomics. >> UsePSAdaptiveSurvivorSizePolicy has no effect when CMS is in use, so it's >> omitted from my configuration, too. >> >> By using -XX:+PrintReferenceGC, the gc log will show when and how many >> finalizable object are discovered. >> >> The VM arguments given force all surviving object from minor collections >> to be promoted into the old generation, and none of the minor collections >> had a chance to discovery any ready-to-be-collected FinalReferences, so >> minor GC logs aren't of interest in this case. All of the minor GC log >> lines show "[FinalReference, 0 refs, xxx secs]". >> >> A part of the GC log can be found at [2]. This log shows two CMS >> collections cycles, in between dozens of minor collections. 
>> >> * Before the first of these two CMS collections, the Java heap used >> is 971914K, and then the CMS occupancy threshold is crossed so a CMS >> collection cycle starts; >> * During the re-mark phase of the first CMS collection, 46400 >> FinalReferences were discovered; >> * After the first CMS collection, the Java heap used is still high, >> at 913771K, because the finalizable objects need another old generation >> collection to be collected (either CMS or full GC is fine); >> * During the re-mark phase of the second CMS collection, 3000 >> FinalReferences were discovered, these are from promoted objects from the >> minor collections in between; >> * After the second CMS collection, the Java heap used goes down >> to 61747K, as the finalizable objects discovered from the first CMS >> collection are indeed finalized and then collected during the second CMS >> collection. >> >> This behavior looks normal to me -- it's what the VM arguments were >> telling the VM to do. >> The reason that the Java heap used size was swing up and down is because >> the actual live data set was very low, but >> the CMSInitiatingOccupancyFraction was set too high so concurrent >> collections are started too late. If the initiating threshold were set to a >> smaller value, say 20, then the test case would behave quite reasonably. >> >> We'd need another test case to study, because this one doesn't really >> repro the problem. >> >> After we applied the patch to SocketAdaptor [3], we don't see this kind >> of CMS/finalization problem in production anymore. Should we hit one of >> these again, I'll try to get more info from our production site and see if >> I can trace down the real problem. 
>> >> - Kris >> >> [1]: https://gist.github.com/1390876#file_.hotspotrc >> [2]: https://gist.github.com/1390876#file_gc.partial.log >> [3]: >> http://mail.openjdk.java.net/pipermail/nio-dev/2011-November/001480.html >> >> >> On Thu, Nov 24, 2011 at 7:11 AM, Srinivas Ramakrishna >> wrote: >> >> Hi Koji -- >> >> Thanks for the test case, that should definitely help with the >> dentification of the problem. I'll see if >> i can find some spare time to pursue it one of these days (but can't >> promise), so please >> do open that Oracle support ticket to get the requisite resource >> allocated for the official >> investigation. >> >> Thanks again for boiling it down to a simple test case, and i'll update >> if i identify the >> root cause... >> >> -- ramki >> >> >> On Tue, Nov 22, 2011 at 1:06 PM, Koji Noguchi >> wrote: >> >> This is from an old thread in 2011 April but we're still seeing the same >> problem with (nio) Socket instances not getting collecting by CMS. >> >> Opened http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7113118 >> >> Thanks, >> Koji >> >> >> (From >> >> http://mail.openjdk.java.net/pipermail/hotspot-gc-use/2011-April/subject.htm >> l) >> On 4/25/11 8:37 AM, "Y. Srinivas Ramakrishna" < >> y.s.ramakrishna at oracle.com>> >> wrote: >> > On 4/25/2011 9:10 AM, Bharath Mundlapudi wrote: >> > > Hi Ramki, >> > > >> > > Thanks for the detailed explanation. I was trying to >> > > run some tests for your questions. Here are the answers to some of >> your >> > > questions. >> > > >> > >>> What are the symptoms? >> > > java.net.SocksSocketImpl objects are not getting cleaned up after a >> CMS >> > cycle. I see the direct >> > > correlation to java.lang.ref.Finalizer objects. Overtime, this fills >> up >> > > the old generation and CMS going in loop occupying complete one core. >> > > But when we trigger Full GC, these objects are garbage collected. >> > >> > OK, thanks. 
>> > >> > > >> > > You >> > > mentioned that CMS cycle does cleanup these objects provided we >> enable >> > > class unloading. Are you suggesting -XX:+ClassUnloading or >> > > -XX:+CMSClassUnloadingEnabled? I have tried with later and >> > > >> > > didn't >> > > succeed. Our pern gen is relatively constant, by enabling this, >> are we >> > > introducing performance overhead? We have room for CPU cycles and >> perm >> > > gen is relatively small, so this may be fine. Just that we want to see >> > > these objects should GC'ed in CMS cycle. >> > > >> > > >> > > Do you have any suggestion w.r.t. to which flags should i be using to >> > trigger this? >> > >> > For the issue you are seeing the -XX:+CMSClassUnloadingFlag will >> > not make a difference in the accumulation of the socket objects >> > because there is no "projection" as far as i can tell of these >> > into the perm gen, esepcially since as you say there is no class >> > loading going on (since your perm gen size remains constant after >> > start-up). >> > >> >> > However, keeping class unloading enabled via this flag should >> > hopefully not have much of an impact on your pause times given that >> > the perm gen is small. The typical effect you will see if class >> > unloading is enabled is that the CMS remark pause times are a bit >> > longer (if you enable PrintGCDetails you will see messages >> > such as "scrub string table" and "scrub symbol table", "code cache" >> > etc. BY comparing the CMS-remark pause details and times with >> > and without enabling class unloading you will get a good idea >> > of its impact. In some cases, eben though you pay a small price >> > in terms of increased CMS-remark pause times, you will make up >> > for that in terms of faster scavenges etc., so it might well >> > be worthwhile. 
>> > >> > In the very near future, we may end up turning that on >> > by default for CMS because the savings from leaving it off >> > by default are much smaller now and it can often lead to >> > other issues if class unloading is turned off. >> > >> > So bottom line is: it will not affect the accumulation of >> > your socket objects, but it's a good idea to keep class >> > unloading by CMS enabled anyway. >> > >> > > >> > > >> > >>> What does jmap -finalizerinfo on your process show? >> > >>> What does -XX:+PrintClassHistogram show as accumulating in the >> > heap? >> > >>> (Are they one specific type of Finalizer objects or all >> > varieties?) >> > > >> > > Jmap -histo shows the above class is keep accumulating. Infact, >> > > finalizerinfo doesn't show any objects on this process. >> > >> > OK, that shows that the objects are somehow not discovered by >> > CMS as being eligible for finalization. Although one can imagine >> > a one cycle delay (because of floating garbage) with CMS finding >> > these objects to be unreachable and hence eligible for finalization, >> > continuing accumulation of these objects over a period of time >> > (and presumably many CMS cycles) seems strange and almost >> > definitely a CMS bug especially as you find that a full STW >> > gc does indeed reclaim them. >> > >> > > >> > > >> > > >> > >>> Did the problem start in 6u21? Or are those the only versions >> > >>> you tested and found that there was an issue? >> > > We >> > > have seen this problem in 6u21. We were on 6u12 earlier and didn't >> run >> > > into this problem. But can't say this is a build particular, since >> lots >> > > of things have changed. >> > >> > Can you boil down this behavior into a test case that you are able >> > to share with us? >> > If so, please file a bug with the test case >> > and send me the CR id and I'll take a look. >> > >> > Oh, and before you do that, can you please check the latest public >> > release (6u24 or 6u25?) 
to see if the problem still reproduces? >> > >> > thanks, and sorry I could not be of more help without a bug >> > report or a test case. >> > >> > -- ramki >> > >> > > >> > > Thanks in anticipation, >> > > -Bharath >> > > >> >> >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> >> >> >> > > _______________________________________________ > hotspot-gc-use mailing listhotspot-gc-use at openjdk.java.nethttp://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20111130/762ebad7/attachment-0001.html From knoguchi at yahoo-inc.com Wed Nov 30 16:03:31 2011 From: knoguchi at yahoo-inc.com (Koji Noguchi) Date: Wed, 30 Nov 2011 16:03:31 -0800 Subject: Is CMS cycle can collect finalize objects In-Reply-To: <4ED5F129.7040208@oracle.com> Message-ID: Thanks everyone! > the behaviour should be good if you turn off parallel marking in CMS, viz -XX:-CMSConcurrentMTEnabled > Jon also pinged me offline pointing out the same. And yes, this does seem to solve the issue I?m observing. Attaching the result I got from disabling parallel marking for my simple test. Red: Regular CMS with CMSConcurrentMTEnabled Green: CMS with CMSConcurrentMT disabled Blue: FullGC You can see that with CMSConcurrentMT disabled, it is successfully collecting all the stale objects on every other CMS. 
As a side note, > FullGC case eventually stabilizing to having each FullGC releasing half of the heap (500M) due to finalizer requiring two GCs. > >From the graph it doesn?t seem like CMS+ ?XX:-CMSConcurrentMTEnabled (green) is hitting this, but this is just a matter of time. It is slowly getting closer to this state. I would just need to run the test 100 times longer. So, (i) My simple test: ?XX:-CMSConcurrentMTEnabled does fix the issue. (ii) Single node test on my actual server(namenode): ?XX:-CMSConcurrentMTEnabled also seem to fix the issue, I would continue to run for couple more days to confirm. (iii) Test on production. : Haven?t done this yet but I?m optimistic on this option fixing the issue. Thanks again for everyone who helped ! It has bugged me for such a long time. I cannot wait to try this option on a real cluster soon. Koji On 11/30/11 1:02 AM, "Stefan Karlsson" wrote: On 11/30/2011 07:16 AM, Srinivas Ramakrishna wrote: Who knows, may be this is related to the other CMS CR that Stefan just sent out a review request for. If I understand correctly then, the behaviour should be good if you turn off parallel marking in CMS, viz -XX:-CMSConcurrentMTEnabled (or whatever the flag is called now). Are you able to check that? If it's the same bug -XX:-CMSConcurrentMTEnabled should fix it. I instrumented and ran the small reproducer in the bug report for bug 7113118 . I agree with Krystal's earlier assessment of that reproducer. We actually do discover most Finalizers, but we need a second GC to clean them out. StefanK Adding Stefan to the cc, just in case. -- ramki On Tue, Nov 29, 2011 at 2:06 PM, Koji Noguchi wrote: Thanks Krystal for your update. I don?t know why I?m getting a different result than yours. > * After the second CMS collection, the Java heap used goes down to 61747K, > In my case, it stays above 800MBytes... 
Attached is the memory footprint with CMS(-XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=95 -XX:+UseConcMarkSweepGC) and without(fullgc). It was interesting to see 1. FullGC case eventually stabilizing to having each FullGC releasing half of the heap (500M) due to finalizer requiring two GCs. 2. CMS case still stayed above 800M but there were a few times when memory footprint dropped. 3. In any cases, I?m pretty sure your SocketAdaptor [3] patch would workaround the CMS issue I?m facing. So this is no longer urgent to me as long as that change gets into a future java version. Thanks again for all your inputs. Koji On 11/24/11 1:45 PM, "Srinivas Ramakrishna" > wrote: Hi Kris, thanks for running the test case and figuring that out, and saving us further investigation of the submitted test case from Koji. Hopefully you or Koji will be able to find a simple test case that illustrates the real issue. thanks! -- ramki On Thu, Nov 24, 2011 at 1:52 AM, Krystal Mok > wrote: Hi Koji and Ramki, I had a look at the repro test case in Bug 7113118. I don't think the test case is showing the same problem as the original one caused by SocksSocketImpl objects. The way this test case is behaving is exactly what the VM arguments told it to do. I ran the test case on a 64-bit Linux HotSpot Server VM, on JDK 6 update 29. My .hotspotrc is at [1]. SurvivorRatio doesn't need to be set explicitly, because when CMS is in use and MaxTenuringThreshold is 0 (or AlwaysTenure is true), the SurvivorRatio will automatically be set to 1024 by ergonomics. UsePSAdaptiveSurvivorSizePolicy has no effect when CMS is in use, so it's omitted from my configuration, too. By using -XX:+PrintReferenceGC, the gc log will show when and how many finalizable object are discovered. 
The VM arguments given force all surviving object from minor collections to be promoted into the old generation, and none of the minor collections had a chance to discovery any ready-to-be-collected FinalReferences, so minor GC logs aren't of interest in this case. All of the minor GC log lines show "[FinalReference, 0 refs, xxx secs]". A part of the GC log can be found at [2]. This log shows two CMS collections cycles, in between dozens of minor collections. * Before the first of these two CMS collections, the Java heap used is 971914K, and then the CMS occupancy threshold is crossed so a CMS collection cycle starts; * During the re-mark phase of the first CMS collection, 46400 FinalReferences were discovered; * After the first CMS collection, the Java heap used is still high, at 913771K, because the finalizable objects need another old generation collection to be collected (either CMS or full GC is fine); * During the re-mark phase of the second CMS collection, 3000 FinalReferences were discovered, these are from promoted objects from the minor collections in between; * After the second CMS collection, the Java heap used goes down to 61747K, as the finalizable objects discovered from the first CMS collection are indeed finalized and then collected during the second CMS collection. This behavior looks normal to me -- it's what the VM arguments were telling the VM to do. The reason that the Java heap used size was swing up and down is because the actual live data set was very low, but the CMSInitiatingOccupancyFraction was set too high so concurrent collections are started too late. If the initiating threshold were set to a smaller value, say 20, then the test case would behave quite reasonably. We'd need another test case to study, because this one doesn't really repro the problem. After we applied the patch to SocketAdaptor [3], we don't see this kind of CMS/finalization problem in production anymore. 
Should we hit one of these again, I'll try to get more info from our production site and see if I can trace down the real problem. - Kris [1]: https://gist.github.com/1390876#file_.hotspotrc [2]: https://gist.github.com/1390876#file_gc.partial.log [3]: http://mail.openjdk.java.net/pipermail/nio-dev/2011-November/001480.html On Thu, Nov 24, 2011 at 7:11 AM, Srinivas Ramakrishna > wrote: Hi Koji -- Thanks for the test case, that should definitely help with the dentification of the problem. I'll see if i can find some spare time to pursue it one of these days (but can't promise), so please do open that Oracle support ticket to get the requisite resource allocated for the official investigation. Thanks again for boiling it down to a simple test case, and i'll update if i identify the root cause... -- ramki On Tue, Nov 22, 2011 at 1:06 PM, Koji Noguchi > wrote: This is from an old thread in 2011 April but we're still seeing the same problem with (nio) Socket instances not getting collecting by CMS. Opened http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7113118 Thanks, Koji (From http://mail.openjdk.java.net/pipermail/hotspot-gc-use/2011-April/subject.htm l) On 4/25/11 8:37 AM, "Y. Srinivas Ramakrishna" > wrote: > On 4/25/2011 9:10 AM, Bharath Mundlapudi wrote: > > Hi Ramki, > > > > Thanks for the detailed explanation. I was trying to > > run some tests for your questions. Here are the answers to some of your > > questions. > > > >>> What are the symptoms? > > java.net.SocksSocketImpl objects are not getting cleaned up after a CMS > cycle. I see the direct > > correlation to java.lang.ref.Finalizer objects. Overtime, this fills up > > the old generation and CMS going in loop occupying complete one core. > > But when we trigger Full GC, these objects are garbage collected. > > OK, thanks. > > > > > You > > mentioned that CMS cycle does cleanup these objects provided we enable > > class unloading. 
> > Are you suggesting -XX:+ClassUnloading or -XX:+CMSClassUnloadingEnabled?
> > I have tried with the latter and didn't succeed. Our perm gen is
> > relatively constant; by enabling this, are we introducing performance
> > overhead? We have room for CPU cycles and perm gen is relatively small,
> > so this may be fine. It's just that we want to see these objects get
> > GC'ed in a CMS cycle.
> >
> > Do you have any suggestion w.r.t. which flags I should be using to
> > trigger this?
>
> For the issue you are seeing, -XX:+CMSClassUnloadingEnabled will not make
> a difference in the accumulation of the socket objects, because there is
> no "projection" of these into the perm gen, as far as I can tell,
> especially since, as you say, there is no class loading going on (your
> perm gen size remains constant after start-up).
>
> However, keeping class unloading enabled via this flag should hopefully
> not have much of an impact on your pause times, given that the perm gen is
> small. The typical effect you will see if class unloading is enabled is
> that the CMS remark pause times are a bit longer (if you enable
> PrintGCDetails you will see messages such as "scrub string table", "scrub
> symbol table", "code cache", etc.). By comparing the CMS-remark pause
> details and times with and without class unloading enabled, you will get a
> good idea of its impact. In some cases, even though you pay a small price
> in terms of increased CMS-remark pause times, you will make up for that in
> terms of faster scavenges etc., so it might well be worthwhile.
>
> In the very near future, we may end up turning that on by default for
> CMS, because the savings from leaving it off by default are much smaller
> now, and turning class unloading off can often lead to other issues.
>
> So the bottom line is: it will not affect the accumulation of your socket
> objects, but it's a good idea to keep class unloading by CMS enabled
> anyway.
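The comparison Ramki suggests -- measuring CMS-remark pauses with and without
class unloading -- could be run with command lines along these lines (a
sketch only; app.jar, heap sizing, and log destinations are placeholders, not
the poster's actual configuration):

```
# Run 1: CMS with class unloading enabled; remark lines will include
# "scrub string table" / "scrub symbol table" detail.
java -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled \
     -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -jar app.jar

# Run 2: identical except class unloading left at its default (off for
# CMS in this era); then compare the "CMS-remark" pause times of the two logs.
java -XX:+UseConcMarkSweepGC \
     -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -jar app.jar
```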
> >>> What does jmap -finalizerinfo on your process show?
> >>> What does -XX:+PrintClassHistogram show as accumulating in the heap?
> >>> (Are they one specific type of Finalizer objects or all varieties?)
> >
> > jmap -histo shows the above class keeps accumulating. In fact,
> > -finalizerinfo doesn't show any objects on this process.
>
> OK, that shows that the objects are somehow not discovered by CMS as
> being eligible for finalization. Although one can imagine a one-cycle
> delay (because of floating garbage) with CMS finding these objects to be
> unreachable and hence eligible for finalization, continuing accumulation
> of these objects over a period of time (and presumably many CMS cycles)
> seems strange and almost definitely a CMS bug, especially as you find
> that a full STW GC does indeed reclaim them.
>
> >>> Did the problem start in 6u21? Or are those the only versions you
> >>> tested and found that there was an issue?
> > We have seen this problem in 6u21. We were on 6u12 earlier and didn't
> > run into this problem. But we can't say this is particular to a build,
> > since lots of things have changed.
>
> Can you boil down this behavior into a test case that you are able to
> share with us? If so, please file a bug with the test case and send me
> the CR id and I'll take a look.
>
> Oh, and before you do that, can you please check the latest public
> release (6u24 or 6u25?) to see if the problem still reproduces?
>
> thanks, and sorry I could not be of more help without a bug report or a
> test case.
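The diagnostics discussed above can be gathered from a running JVM with the
standard JDK tools (the pid is a placeholder for the target process id):

```
# Histogram of heap objects, without forcing a collection -- useful for
# watching SocksSocketImpl / Finalizer counts grow across CMS cycles.
jmap -histo <pid>

# Count of objects waiting in the finalization queue.
jmap -finalizerinfo <pid>

# Note: "jmap -histo:live" forces a full GC first, which (per the report)
# would itself reclaim the accumulating objects and hide the problem.
```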
> -- ramki
>
> > Thanks in anticipation,
> > -Bharath

_______________________________________________
hotspot-gc-use mailing list
hotspot-gc-use at openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cmsAndWithoutConcurrentMTAndFullGC.png
Type: application/octet-stream
Size: 46041 bytes
Desc: cmsAndWithoutConcurrentMTAndFullGC.png
Url : http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20111130/011b9c65/cmsAndWithoutConcurrentMTAndFullGC-0001.png