From ryebrye at gmail.com Tue May 7 07:09:30 2013 From: ryebrye at gmail.com (Ryan Gardner) Date: Tue, 7 May 2013 10:09:30 -0400 Subject: Is "-XX:G1OldCSetRegionLiveThresholdPercent" a flag in java 7? Message-ID: In the slides posted for the G1 tuning session at Java One 2012 here: http://www.myexpospace.com/JavaOne2012/SessionFiles/CON6583_PDF_6583_0001.pdf I see "-XX:G1OldCSetRegionLiveThresholdPercent" being listed as one of the options to try to tune long mixed GC's I tried using this on Java 1.7.0_21 but it comes back as being an unrecognized vm option Is there another secret flag I need to enable to try to tune these bits more? -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20130507/d3fb4a74/attachment.html From jesper.wilhelmsson at oracle.com Tue May 7 08:57:54 2013 From: jesper.wilhelmsson at oracle.com (Jesper Wilhelmsson) Date: Tue, 07 May 2013 17:57:54 +0200 Subject: Is "-XX:G1OldCSetRegionLiveThresholdPercent" a flag in java 7? In-Reply-To: References: Message-ID: <51892482.5090501@oracle.com> Hi Ryan, -XX:G1OldCSetRegionLiveThresholdPercent has been replaced by -XX:G1MixedGCLiveThresholdPercent It is also an experimental option which means you should only use it if you know what you are doing. To enable experimental options use -XX:+UnlockExperimentalVMOptions as shown in some of the examples in the presentation. Hth, /Jesper Ryan Gardner skrev 7/5/13 4:09 PM: > In the slides posted for the G1 tuning session at Java One 2012 here: > > http://www.myexpospace.com/JavaOne2012/SessionFiles/CON6583_PDF_6583_0001.pdf > > I see "-XX:G1OldCSetRegionLiveThresholdPercent" being listed as one of the > options to try to tune long mixed GC's > > I tried using this on Java 1.7.0_21 but it comes back as being an unrecognized > vm option > > Is there another secret flag I need to enable to try to tune these bits more? > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > From monica.beckwith at oracle.com Tue May 7 09:54:39 2013 From: monica.beckwith at oracle.com (Monica Beckwith) Date: Tue, 07 May 2013 11:54:39 -0500 Subject: Is "-XX:G1OldCSetRegionLiveThresholdPercent" a flag in java 7? In-Reply-To: <51892482.5090501@oracle.com> References: <51892482.5090501@oracle.com> Message-ID: <518931CF.8060702@oracle.com> Ryan, We have also renamed a couple of other flags. You will find them in this CR: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=8001424 Thanks! Monica On 5/7/2013 10:57 AM, Jesper Wilhelmsson wrote: > Hi Ryan, > > -XX:G1OldCSetRegionLiveThresholdPercent has been replaced by > -XX:G1MixedGCLiveThresholdPercent > > It is also an experimental option which means you should only use it if you know > what you are doing. To enable experimental options use > -XX:+UnlockExperimentalVMOptions as shown in some of the examples in the > presentation. > > Hth, > /Jesper > > > Ryan Gardner skrev 7/5/13 4:09 PM: >> In the slides posted for the G1 tuning session at Java One 2012 here: >> >> http://www.myexpospace.com/JavaOne2012/SessionFiles/CON6583_PDF_6583_0001.pdf >> >> I see "-XX:G1OldCSetRegionLiveThresholdPercent" being listed as one of the >> options to try to tune long mixed GC's >> >> I tried using this on Java 1.7.0_21 but it comes back as being an unrecognized >> vm option >> >> Is there another secret flag I need to enable to try to tune these bits more? >> >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use -- Oracle Monica Beckwith | Principal Member of Technical Staff VOIP: +15124011274 Oracle Java Performance Green Oracle Oracle is committed to developing practices and products that help protect the environment -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20130507/abaa46dd/attachment.html -------------- next part -------------- A non-text attachment was scrubbed... Name: oracle_sig_logo.gif Type: image/gif Size: 658 bytes Desc: not available Url : http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20130507/abaa46dd/oracle_sig_logo.gif -------------- next part -------------- A non-text attachment was scrubbed... Name: green-for-email-sig_0.gif Type: image/gif Size: 356 bytes Desc: not available Url : http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20130507/abaa46dd/green-for-email-sig_0.gif From the.6th.month at gmail.com Thu May 16 21:19:22 2013 From: the.6th.month at gmail.com (the.6th.month at gmail.com) Date: Fri, 17 May 2013 12:19:22 +0800 Subject: unexpected full gc time spike Message-ID: hi, all: We just had a situation that I don't quite understand with CMS gc. When I examined the gc log, I found that there was a cms gc which resulted in a parnew promotion failure and concurrent mode failure at the same time, and then the full gc lasted for slightly over three minutes. Here is the gc log: 2013-05-17T10:12:55.983+0800: 45168.774: [CMS-concurrent-mark: 7.056/7.860 secs] [Times: user=14.90 sys=0.45, real=7.86 secs] 2013-05-17T10:12:55.984+0800: 45168.775: [CMS-concurrent-preclean-start] 2013-05-17T10:12:56.753+0800: 45169.544: [CMS-concurrent-preclean: 0.676/0.770 secs] [Times: user=0.83 sys=0.15, real=0.77 secs] 2013-05-17T10:12:56.753+0800: 45169.544: [CMS-concurrent-abortable-preclean-start] 2013-05-17T10:12:58.460+0800: 45171.251: [GC 45171.252: [ParNew (promotion failed) Desired survivor size 67108864 bytes, new threshold 1 (max 6) - age 1: 70527216 bytes, 70527216 total : 917504K->917504K(917504K), 177.3558880 secs]45348.608: [CMS CMS: abort preclean due to time 2013-05-17T10:15:56.197+0800: 45348.989: [CMS-concurrent-abortable-preclean: 2.037/179.444 secs] [Times: user=44.72 sys=13.59, real=179.45 secs] (concurrent mode failure): 3017323K->2177093K(3047424K), 16.5528620 secs] 3879476K->2177093K(3964928K), [CMS Perm : 91333K->91008K(262144K)], 193.9097970 secs] [Times: user=58.79 sys=13.55, real=193.91 secs] the usual cms full gc time was roughly 100ms-400ms, but this time it lasted for 193 seconds. I understand that when there's a parnew gc happens during cms and to space is not large enough to hold all survived objects, or the remaining space in old gen cannot cope with memory allocation in old gen, full gc happens. But I don't understand why it hangs so long. I am using oracle jdk 1.6.0_37, and the jvm options we use are: -Xms4000m -Xmx4000m -Xmn1G -Xss256k -XX:PermSize=256m -XX:SurvivorRatio=6 -XX:MaxTenuringThreshold=6 -XX:+DisableExplicitGC -Xnoclassgc -Xverify:none -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:+UseCMSCompactAtFullCollection -XX:+UseFastAccessorMethods -XX:+CMSPermGenSweepingEnabled -XX:+CMSClassUnloadingEnabled -XX:+UseCompressedOops -XX:CMSInitiatingOccupancyFraction=90 -XX:+UseCMSInitiatingOccupancyOnly Could it be a bug that results in the long full gc in case of promotion failure or something else? Could anyone offer me some help, and I really appreciate your help. Looking forward to any reply. All the best, Leon -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20130517/ac6e67ea/attachment.html From ysr1729 at gmail.com Thu May 16 21:38:44 2013 From: ysr1729 at gmail.com (Srinivas Ramakrishna) Date: Thu, 16 May 2013 21:38:44 -0700 Subject: unexpected full gc time spike In-Reply-To: References: Message-ID: Hi Leon -- Yes, there are a couple of performance bugs related to promotion failure handling with ParNew+CMS that can cause this time to balloon. Here the unwind of the failed promotion took 177 s. I have at least a partial fix for this which I had written up a few months ago but never quite got around to collecting sufficient performance data to submit it as an official patch. I'll try and revive that patch and submit it... May be someone else can check if it helps sufficiently in the performance with promotion failure. -- ramki On Thu, May 16, 2013 at 9:19 PM, the.6th.month at gmail.com wrote: > hi, all: > We just had a situation that I don't quite understand with CMS gc. When I > examined the gc log, I found that there was a cms gc which resulted in a > parnew promotion failure and concurrent mode failure at the same time, and > then the full gc lasted for slightly over three minutes. Here is the gc log: > 2013-05-17T10:12:55.983+0800: 45168.774: [CMS-concurrent-mark: 7.056/7.860 > secs] [Times: user=14.90 sys=0.45, real=7.86 secs] > 2013-05-17T10:12:55.984+0800: 45168.775: [CMS-concurrent-preclean-start] > 2013-05-17T10:12:56.753+0800: 45169.544: [CMS-concurrent-preclean: > 0.676/0.770 secs] [Times: user=0.83 sys=0.15, real=0.77 secs] > 2013-05-17T10:12:56.753+0800: 45169.544: > [CMS-concurrent-abortable-preclean-start] > 2013-05-17T10:12:58.460+0800: 45171.251: [GC 45171.252: [ParNew (promotion > failed) > Desired survivor size 67108864 bytes, new threshold 1 (max 6) > - age 1: 70527216 bytes, 70527216 total > : 917504K->917504K(917504K), 177.3558880 secs]45348.608: [CMS CMS: abort > preclean due to time 2013-05-17T10:15:56.197+0800: 45348.989: > [CMS-concurrent-abortable-preclean: 2.037/179.444 secs] [Times: user=44.72 > sys=13.59, real=179.45 secs] > (concurrent mode failure): 3017323K->2177093K(3047424K), 16.5528620 secs] > 3879476K->2177093K(3964928K), [CMS Perm : 91333K->91008K(262144K)], > 193.9097970 secs] [Times: user=58.79 sys=13.55, real=193.91 secs] > > the usual cms full gc time was roughly 100ms-400ms, but this time it lasted > for 193 seconds. I understand that when there's a parnew gc happens during > cms and to space is not large enough to hold all survived objects, or the > remaining space in old gen cannot cope with memory allocation in old gen, > full gc happens. But I don't understand why it hangs so long. > I am using oracle jdk 1.6.0_37, and the jvm options we use are: > -Xms4000m -Xmx4000m -Xmn1G -Xss256k -XX:PermSize=256m -XX:SurvivorRatio=6 > -XX:MaxTenuringThreshold=6 -XX:+DisableExplicitGC -Xnoclassgc -Xverify:none > -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled > -XX:+UseCMSCompactAtFullCollection -XX:+UseFastAccessorMethods > -XX:+CMSPermGenSweepingEnabled -XX:+CMSClassUnloadingEnabled > -XX:+UseCompressedOops -XX:CMSInitiatingOccupancyFraction=90 > -XX:+UseCMSInitiatingOccupancyOnly > > Could it be a bug that results in the long full gc in case of promotion > failure or something else? Could anyone offer me some help, and I really > appreciate your help. > > Looking forward to any reply. > > All the best, > Leon > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > From the.6th.month at gmail.com Thu May 16 21:39:56 2013 From: the.6th.month at gmail.com (the.6th.month at gmail.com) Date: Fri, 17 May 2013 12:39:56 +0800 Subject: unexpected full gc time spike In-Reply-To: References: Message-ID: thanks very much indeed, hope we can see your patch soon On 17 May 2013 12:38, Srinivas Ramakrishna wrote: > Hi Leon -- > > Yes, there are a couple of performance bugs related to promotion > failure handling with ParNew+CMS that can cause this time to balloon. > Here the unwind of the failed promotion took 177 s. I have at least a > partial fix for this which I had written up a few months ago but never > quite got around to collecting sufficient performance data to submit > it as an official patch. > > I'll try and revive that patch and submit it... May be someone else > can check if it helps sufficiently in the performance with promotion > failure. > > -- ramki > > > On Thu, May 16, 2013 at 9:19 PM, the.6th.month at gmail.com > wrote: > > hi, all: > > We just had a situation that I don't quite understand with CMS gc. When I > > examined the gc log, I found that there was a cms gc which resulted in a > > parnew promotion failure and concurrent mode failure at the same time, > and > > then the full gc lasted for slightly over three minutes. Here is the gc > log: > > 2013-05-17T10:12:55.983+0800: 45168.774: [CMS-concurrent-mark: > 7.056/7.860 > > secs] [Times: user=14.90 sys=0.45, real=7.86 secs] > > 2013-05-17T10:12:55.984+0800: 45168.775: [CMS-concurrent-preclean-start] > > 2013-05-17T10:12:56.753+0800: 45169.544: [CMS-concurrent-preclean: > > 0.676/0.770 secs] [Times: user=0.83 sys=0.15, real=0.77 secs] > > 2013-05-17T10:12:56.753+0800: 45169.544: > > [CMS-concurrent-abortable-preclean-start] > > 2013-05-17T10:12:58.460+0800: 45171.251: [GC 45171.252: [ParNew > (promotion > > failed) > > Desired survivor size 67108864 bytes, new threshold 1 (max 6) > > - age 1: 70527216 bytes, 70527216 total > > : 917504K->917504K(917504K), 177.3558880 secs]45348.608: [CMS CMS: abort > > preclean due to time 2013-05-17T10:15:56.197+0800: 45348.989: > > [CMS-concurrent-abortable-preclean: 2.037/179.444 secs] [Times: > user=44.72 > > sys=13.59, real=179.45 secs] > > (concurrent mode failure): 3017323K->2177093K(3047424K), 16.5528620 > secs] > > 3879476K->2177093K(3964928K), [CMS Perm : 91333K->91008K(262144K)], > > 193.9097970 secs] [Times: user=58.79 sys=13.55, real=193.91 secs] > > > > the usual cms full gc time was roughly 100ms-400ms, but this time it > lasted > > for 193 seconds. I understand that when there's a parnew gc happens > during > > cms and to space is not large enough to hold all survived objects, or the > > remaining space in old gen cannot cope with memory allocation in old gen, > > full gc happens. But I don't understand why it hangs so long. > > I am using oracle jdk 1.6.0_37, and the jvm options we use are: > > -Xms4000m -Xmx4000m -Xmn1G -Xss256k -XX:PermSize=256m -XX:SurvivorRatio=6 > > -XX:MaxTenuringThreshold=6 -XX:+DisableExplicitGC -Xnoclassgc > -Xverify:none > > -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled > > -XX:+UseCMSCompactAtFullCollection -XX:+UseFastAccessorMethods > > -XX:+CMSPermGenSweepingEnabled -XX:+CMSClassUnloadingEnabled > > -XX:+UseCompressedOops -XX:CMSInitiatingOccupancyFraction=90 > > -XX:+UseCMSInitiatingOccupancyOnly > > > > Could it be a bug that results in the long full gc in case of promotion > > failure or something else? Could anyone offer me some help, and I really > > appreciate your help. > > > > Looking forward to any reply. > > > > All the best, > > Leon > > > > > > _______________________________________________ > > hotspot-gc-use mailing list > > hotspot-gc-use at openjdk.java.net > > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20130517/1fdd6f2a/attachment.html From the.6th.month at gmail.com Thu May 16 22:06:54 2013 From: the.6th.month at gmail.com (the.6th.month at gmail.com) Date: Fri, 17 May 2013 13:06:54 +0800 Subject: unexpected full gc time spike In-Reply-To: References: Message-ID: hi, Ramki: btw, could you possibly explain what the bugs are and how those bugs affect the fallback fullgc time? I am really curious about the reason. thanks very much. all the best, Leon On 17 May 2013 12:39, "the.6th.month at gmail.com" wrote: > thanks very much indeed, hope we can see your patch soon > > > On 17 May 2013 12:38, Srinivas Ramakrishna wrote: > >> Hi Leon -- >> >> Yes, there are a couple of performance bugs related to promotion >> failure handling with ParNew+CMS that can cause this time to balloon. >> Here the unwind of the failed promotion took 177 s. I have at least a >> partial fix for this which I had written up a few months ago but never >> quite got around to collecting sufficient performance data to submit >> it as an official patch. >> >> I'll try and revive that patch and submit it... May be someone else >> can check if it helps sufficiently in the performance with promotion >> failure. >> >> -- ramki >> >> >> On Thu, May 16, 2013 at 9:19 PM, the.6th.month at gmail.com >> wrote: >> > hi, all: >> > We just had a situation that I don't quite understand with CMS gc. When >> I >> > examined the gc log, I found that there was a cms gc which resulted in a >> > parnew promotion failure and concurrent mode failure at the same time, >> and >> > then the full gc lasted for slightly over three minutes. Here is the gc >> log: >> > 2013-05-17T10:12:55.983+0800: 45168.774: [CMS-concurrent-mark: >> 7.056/7.860 >> > secs] [Times: user=14.90 sys=0.45, real=7.86 secs] >> > 2013-05-17T10:12:55.984+0800: 45168.775: [CMS-concurrent-preclean-start] >> > 2013-05-17T10:12:56.753+0800: 45169.544: [CMS-concurrent-preclean: >> > 0.676/0.770 secs] [Times: user=0.83 sys=0.15, real=0.77 secs] >> > 2013-05-17T10:12:56.753+0800: 45169.544: >> > [CMS-concurrent-abortable-preclean-start] >> > 2013-05-17T10:12:58.460+0800: 45171.251: [GC 45171.252: [ParNew >> (promotion >> > failed) >> > Desired survivor size 67108864 bytes, new threshold 1 (max 6) >> > - age 1: 70527216 bytes, 70527216 total >> > : 917504K->917504K(917504K), 177.3558880 secs]45348.608: [CMS CMS: abort >> > preclean due to time 2013-05-17T10:15:56.197+0800: 45348.989: >> > [CMS-concurrent-abortable-preclean: 2.037/179.444 secs] [Times: >> user=44.72 >> > sys=13.59, real=179.45 secs] >> > (concurrent mode failure): 3017323K->2177093K(3047424K), 16.5528620 >> secs] >> > 3879476K->2177093K(3964928K), [CMS Perm : 91333K->91008K(262144K)], >> > 193.9097970 secs] [Times: user=58.79 sys=13.55, real=193.91 secs] >> > >> > the usual cms full gc time was roughly 100ms-400ms, but this time it >> lasted >> > for 193 seconds. I understand that when there's a parnew gc happens >> during >> > cms and to space is not large enough to hold all survived objects, or >> the >> > remaining space in old gen cannot cope with memory allocation in old >> gen, >> > full gc happens. But I don't understand why it hangs so long. >> > I am using oracle jdk 1.6.0_37, and the jvm options we use are: >> > -Xms4000m -Xmx4000m -Xmn1G -Xss256k -XX:PermSize=256m >> -XX:SurvivorRatio=6 >> > -XX:MaxTenuringThreshold=6 -XX:+DisableExplicitGC -Xnoclassgc >> -Xverify:none >> > -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled >> > -XX:+UseCMSCompactAtFullCollection -XX:+UseFastAccessorMethods >> > -XX:+CMSPermGenSweepingEnabled -XX:+CMSClassUnloadingEnabled >> > -XX:+UseCompressedOops -XX:CMSInitiatingOccupancyFraction=90 >> > -XX:+UseCMSInitiatingOccupancyOnly >> > >> > Could it be a bug that results in the long full gc in case of promotion >> > failure or something else? Could anyone offer me some help, and I really >> > appreciate your help. >> > >> > Looking forward to any reply. >> > >> > All the best, >> > Leon >> > >> > >> > _______________________________________________ >> > hotspot-gc-use mailing list >> > hotspot-gc-use at openjdk.java.net >> > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> > >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20130517/4695f7cc/attachment-0001.html From ysr1729 at gmail.com Fri May 17 09:37:42 2013 From: ysr1729 at gmail.com (Srinivas Ramakrishna) Date: Fri, 17 May 2013 09:37:42 -0700 Subject: unexpected full gc time spike In-Reply-To: References: Message-ID: Hi Leon -- Here's the history of that discussion, starting with this email (follow subject thread): http://mail.openjdk.java.net/pipermail/hotspot-gc-use/2012-October/001370.html On Thu, May 16, 2013 at 10:06 PM, the.6th.month at gmail.com wrote: > hi, Ramki: > btw, could you possibly explain what the bugs are and how those bugs affect > the fallback fullgc time? I am really curious about the reason. > thanks very much. > > all the best, > Leon > > On 17 May 2013 12:39, "the.6th.month at gmail.com" > wrote: >> >> thanks very much indeed, hope we can see your patch soon >> >> >> On 17 May 2013 12:38, Srinivas Ramakrishna wrote: >>> >>> Hi Leon -- >>> >>> Yes, there are a couple of performance bugs related to promotion >>> failure handling with ParNew+CMS that can cause this time to balloon. >>> Here the unwind of the failed promotion took 177 s. I have at least a >>> partial fix for this which I had written up a few months ago but never >>> quite got around to collecting sufficient performance data to submit >>> it as an official patch. >>> >>> I'll try and revive that patch and submit it... May be someone else >>> can check if it helps sufficiently in the performance with promotion >>> failure. >>> >>> -- ramki >>> >>> >>> On Thu, May 16, 2013 at 9:19 PM, the.6th.month at gmail.com >>> wrote: >>> > hi, all: >>> > We just had a situation that I don't quite understand with CMS gc. When >>> > I >>> > examined the gc log, I found that there was a cms gc which resulted in >>> > a >>> > parnew promotion failure and concurrent mode failure at the same time, >>> > and >>> > then the full gc lasted for slightly over three minutes. Here is the gc >>> > log: >>> > 2013-05-17T10:12:55.983+0800: 45168.774: [CMS-concurrent-mark: >>> > 7.056/7.860 >>> > secs] [Times: user=14.90 sys=0.45, real=7.86 secs] >>> > 2013-05-17T10:12:55.984+0800: 45168.775: >>> > [CMS-concurrent-preclean-start] >>> > 2013-05-17T10:12:56.753+0800: 45169.544: [CMS-concurrent-preclean: >>> > 0.676/0.770 secs] [Times: user=0.83 sys=0.15, real=0.77 secs] >>> > 2013-05-17T10:12:56.753+0800: 45169.544: >>> > [CMS-concurrent-abortable-preclean-start] >>> > 2013-05-17T10:12:58.460+0800: 45171.251: [GC 45171.252: [ParNew >>> > (promotion >>> > failed) >>> > Desired survivor size 67108864 bytes, new threshold 1 (max 6) >>> > - age 1: 70527216 bytes, 70527216 total >>> > : 917504K->917504K(917504K), 177.3558880 secs]45348.608: [CMS CMS: >>> > abort >>> > preclean due to time 2013-05-17T10:15:56.197+0800: 45348.989: >>> > [CMS-concurrent-abortable-preclean: 2.037/179.444 secs] [Times: >>> > user=44.72 >>> > sys=13.59, real=179.45 secs] >>> > (concurrent mode failure): 3017323K->2177093K(3047424K), 16.5528620 >>> > secs] >>> > 3879476K->2177093K(3964928K), [CMS Perm : 91333K->91008K(262144K)], >>> > 193.9097970 secs] [Times: user=58.79 sys=13.55, real=193.91 secs] >>> > >>> > the usual cms full gc time was roughly 100ms-400ms, but this time it >>> > lasted >>> > for 193 seconds. I understand that when there's a parnew gc happens >>> > during >>> > cms and to space is not large enough to hold all survived objects, or >>> > the >>> > remaining space in old gen cannot cope with memory allocation in old >>> > gen, >>> > full gc happens. But I don't understand why it hangs so long. >>> > I am using oracle jdk 1.6.0_37, and the jvm options we use are: >>> > -Xms4000m -Xmx4000m -Xmn1G -Xss256k -XX:PermSize=256m >>> > -XX:SurvivorRatio=6 >>> > -XX:MaxTenuringThreshold=6 -XX:+DisableExplicitGC -Xnoclassgc >>> > -Xverify:none >>> > -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled >>> > -XX:+UseCMSCompactAtFullCollection -XX:+UseFastAccessorMethods >>> > -XX:+CMSPermGenSweepingEnabled -XX:+CMSClassUnloadingEnabled >>> > -XX:+UseCompressedOops -XX:CMSInitiatingOccupancyFraction=90 >>> > -XX:+UseCMSInitiatingOccupancyOnly >>> > >>> > Could it be a bug that results in the long full gc in case of promotion >>> > failure or something else? Could anyone offer me some help, and I >>> > really >>> > appreciate your help. >>> > >>> > Looking forward to any reply. >>> > >>> > All the best, >>> > Leon >>> > >>> > >>> > _______________________________________________ >>> > hotspot-gc-use mailing list >>> > hotspot-gc-use at openjdk.java.net >>> > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>> > >> >> > From ysr1729 at gmail.com Fri May 17 10:05:16 2013 From: ysr1729 at gmail.com (Srinivas Ramakrishna) Date: Fri, 17 May 2013 10:05:16 -0700 Subject: unexpected full gc time spike In-Reply-To: References: Message-ID: Looks like the search functionality of bugs.sun.com is no longer available. I tried searching the new bugzilla portal for the bug I had submitted around that time, but that doesn't bring up the bug when i use the normal search terms, so I do not know if the bug report is still in review or not, and whether it ever made it into the set of hotspot/gc bugs or not, but the Review ID i recvd was:- "Your Report (Review ID: 2391561) - Promotion failure code does not scale " I'll try and dig up the (raw, tentative) patch and send it in soon. -- ramki On Fri, May 17, 2013 at 9:37 AM, Srinivas Ramakrishna wrote: > Hi Leon -- > > Here's the history of that discussion, starting with this email > (follow subject thread): > > http://mail.openjdk.java.net/pipermail/hotspot-gc-use/2012-October/001370.html > > On Thu, May 16, 2013 at 10:06 PM, the.6th.month at gmail.com > wrote: >> hi, Ramki: >> btw, could you possibly explain what the bugs are and how those bugs affect >> the fallback fullgc time? I am really curious about the reason. >> thanks very much. >> >> all the best, >> Leon >> >> On 17 May 2013 12:39, "the.6th.month at gmail.com" >> wrote: >>> >>> thanks very much indeed, hope we can see your patch soon >>> >>> >>> On 17 May 2013 12:38, Srinivas Ramakrishna wrote: >>>> >>>> Hi Leon -- >>>> >>>> Yes, there are a couple of performance bugs related to promotion >>>> failure handling with ParNew+CMS that can cause this time to balloon. >>>> Here the unwind of the failed promotion took 177 s. I have at least a >>>> partial fix for this which I had written up a few months ago but never >>>> quite got around to collecting sufficient performance data to submit >>>> it as an official patch. >>>> >>>> I'll try and revive that patch and submit it... May be someone else >>>> can check if it helps sufficiently in the performance with promotion >>>> failure. >>>> >>>> -- ramki >>>> >>>> >>>> On Thu, May 16, 2013 at 9:19 PM, the.6th.month at gmail.com >>>> wrote: >>>> > hi, all: >>>> > We just had a situation that I don't quite understand with CMS gc. When >>>> > I >>>> > examined the gc log, I found that there was a cms gc which resulted in >>>> > a >>>> > parnew promotion failure and concurrent mode failure at the same time, >>>> > and >>>> > then the full gc lasted for slightly over three minutes. Here is the gc >>>> > log: >>>> > 2013-05-17T10:12:55.983+0800: 45168.774: [CMS-concurrent-mark: >>>> > 7.056/7.860 >>>> > secs] [Times: user=14.90 sys=0.45, real=7.86 secs] >>>> > 2013-05-17T10:12:55.984+0800: 45168.775: >>>> > [CMS-concurrent-preclean-start] >>>> > 2013-05-17T10:12:56.753+0800: 45169.544: [CMS-concurrent-preclean: >>>> > 0.676/0.770 secs] [Times: user=0.83 sys=0.15, real=0.77 secs] >>>> > 2013-05-17T10:12:56.753+0800: 45169.544: >>>> > [CMS-concurrent-abortable-preclean-start] >>>> > 2013-05-17T10:12:58.460+0800: 45171.251: [GC 45171.252: [ParNew >>>> > (promotion >>>> > failed) >>>> > Desired survivor size 67108864 bytes, new threshold 1 (max 6) >>>> > - age 1: 70527216 bytes, 70527216 total >>>> > : 917504K->917504K(917504K), 177.3558880 secs]45348.608: [CMS CMS: >>>> > abort >>>> > preclean due to time 2013-05-17T10:15:56.197+0800: 45348.989: >>>> > [CMS-concurrent-abortable-preclean: 2.037/179.444 secs] [Times: >>>> > user=44.72 >>>> > sys=13.59, real=179.45 secs] >>>> > (concurrent mode failure): 3017323K->2177093K(3047424K), 16.5528620 >>>> > secs] >>>> > 3879476K->2177093K(3964928K), [CMS Perm : 91333K->91008K(262144K)], >>>> > 193.9097970 secs] [Times: user=58.79 sys=13.55, real=193.91 secs] >>>> > >>>> > the usual cms full gc time was roughly 100ms-400ms, but this time it >>>> > lasted >>>> > for 193 seconds. I understand that when there's a parnew gc happens >>>> > during >>>> > cms and to space is not large enough to hold all survived objects, or >>>> > the >>>> > remaining space in old gen cannot cope with memory allocation in old >>>> > gen, >>>> > full gc happens. But I don't understand why it hangs so long. >>>> > I am using oracle jdk 1.6.0_37, and the jvm options we use are: >>>> > -Xms4000m -Xmx4000m -Xmn1G -Xss256k -XX:PermSize=256m >>>> > -XX:SurvivorRatio=6 >>>> > -XX:MaxTenuringThreshold=6 -XX:+DisableExplicitGC -Xnoclassgc >>>> > -Xverify:none >>>> > -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled >>>> > -XX:+UseCMSCompactAtFullCollection -XX:+UseFastAccessorMethods >>>> > -XX:+CMSPermGenSweepingEnabled -XX:+CMSClassUnloadingEnabled >>>> > -XX:+UseCompressedOops -XX:CMSInitiatingOccupancyFraction=90 >>>> > -XX:+UseCMSInitiatingOccupancyOnly >>>> > >>>> > Could it be a bug that results in the long full gc in case of promotion >>>> > failure or something else? Could anyone offer me some help, and I >>>> > really >>>> > appreciate your help. >>>> > >>>> > Looking forward to any reply. >>>> > >>>> > All the best, >>>> > Leon >>>> > >>>> > >>>> > _______________________________________________ >>>> > hotspot-gc-use mailing list >>>> > hotspot-gc-use at openjdk.java.net >>>> > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>>> > >>> >>> >> From the.6th.month at gmail.com Fri May 17 10:14:01 2013 From: the.6th.month at gmail.com (the.6th.month at gmail.com) Date: Sat, 18 May 2013 01:14:01 +0800 Subject: unexpected full gc time spike In-Reply-To: References: Message-ID: thanks ramki, looking forward to it Leon On 18 May 2013 01:05, "Srinivas Ramakrishna" wrote: > Looks like the search functionality of bugs.sun.com is no longer > available. I tried searching the new bugzilla portal for the bug I had > submitted around that time, but that doesn't bring up the bug when i > use the normal search terms, so I do not know if the bug report is > still in review or not, and whether it ever made it into the set of > hotspot/gc bugs or not, but the Review ID i recvd was:- > > "Your Report (Review ID: 2391561) - Promotion failure code does not > scale " > > I'll try and dig up the (raw, tentative) patch and send it in soon. > > -- ramki > > > On Fri, May 17, 2013 at 9:37 AM, Srinivas Ramakrishna > wrote: > > Hi Leon -- > > > > Here's the history of that discussion, starting with this email > > (follow subject thread): > > > > > http://mail.openjdk.java.net/pipermail/hotspot-gc-use/2012-October/001370.html > > > > On Thu, May 16, 2013 at 10:06 PM, the.6th.month at gmail.com > > wrote: > >> hi, Ramki: > >> btw, could you possibly explain what the bugs are and how those bugs > affect > >> the fallback fullgc time? I am really curious about the reason. > >> thanks very much. > >> > >> all the best, > >> Leon > >> > >> On 17 May 2013 12:39, "the.6th.month at gmail.com" < > the.6th.month at gmail.com> > >> wrote: > >>> > >>> thanks very much indeed, hope we can see your patch soon > >>> > >>> > >>> On 17 May 2013 12:38, Srinivas Ramakrishna wrote: > >>>> > >>>> Hi Leon -- > >>>> > >>>> Yes, there are a couple of performance bugs related to promotion > >>>> failure handling with ParNew+CMS that can cause this time to balloon. > >>>> Here the unwind of the failed promotion took 177 s. I have at least a > >>>> partial fix for this which I had written up a few months ago but never > >>>> quite got around to collecting sufficient performance data to submit > >>>> it as an official patch. > >>>> > >>>> I'll try and revive that patch and submit it... May be someone else > >>>> can check if it helps sufficiently in the performance with promotion > >>>> failure. > >>>> > >>>> -- ramki > >>>> > >>>> > >>>> On Thu, May 16, 2013 at 9:19 PM, the.6th.month at gmail.com > >>>> wrote: > >>>> > hi, all: > >>>> > We just had a situation that I don't quite understand with CMS gc. > When > >>>> > I > >>>> > examined the gc log, I found that there was a cms gc which resulted > in > >>>> > a > >>>> > parnew promotion failure and concurrent mode failure at the same > time, > >>>> > and > >>>> > then the full gc lasted for slightly over three minutes. Here is > the gc > >>>> > log: > >>>> > 2013-05-17T10:12:55.983+0800: 45168.774: [CMS-concurrent-mark: > >>>> > 7.056/7.860 > >>>> > secs] [Times: user=14.90 sys=0.45, real=7.86 secs] > >>>> > 2013-05-17T10:12:55.984+0800: 45168.775: > >>>> > [CMS-concurrent-preclean-start] > >>>> > 2013-05-17T10:12:56.753+0800: 45169.544: [CMS-concurrent-preclean: > >>>> > 0.676/0.770 secs] [Times: user=0.83 sys=0.15, real=0.77 secs] > >>>> > 2013-05-17T10:12:56.753+0800: 45169.544: > >>>> > [CMS-concurrent-abortable-preclean-start] > >>>> > 2013-05-17T10:12:58.460+0800: 45171.251: [GC 45171.252: [ParNew > >>>> > (promotion > >>>> > failed) > >>>> > Desired survivor size 67108864 bytes, new threshold 1 (max 6) > >>>> > - age 1: 70527216 bytes, 70527216 total > >>>> > : 917504K->917504K(917504K), 177.3558880 secs]45348.608: [CMS CMS: > >>>> > abort > >>>> > preclean due to time 2013-05-17T10:15:56.197+0800: 45348.989: > >>>> > [CMS-concurrent-abortable-preclean: 2.037/179.444 secs] [Times: > >>>> > user=44.72 > >>>> > sys=13.59, real=179.45 secs] > >>>> > (concurrent mode failure): 3017323K->2177093K(3047424K), 16.5528620 > >>>> > secs] > >>>> > 3879476K->2177093K(3964928K), [CMS Perm : 91333K->91008K(262144K)], > >>>> > 193.9097970 secs] [Times: user=58.79 sys=13.55, real=193.91 secs] > >>>> > > >>>> > the usual cms full gc time was roughly 100ms-400ms, but this time it > >>>> > lasted > >>>> > for 193 seconds. I understand that when there's a parnew gc happens > >>>> > during > >>>> > cms and to space is not large enough to hold all survived objects, > or > >>>> > the > >>>> > remaining space in old gen cannot cope with memory allocation in old > >>>> > gen, > >>>> > full gc happens. But I don't understand why it hangs so long. > >>>> > I am using oracle jdk 1.6.0_37, and the jvm options we use are: > >>>> > -Xms4000m -Xmx4000m -Xmn1G -Xss256k -XX:PermSize=256m > >>>> > -XX:SurvivorRatio=6 > >>>> > -XX:MaxTenuringThreshold=6 -XX:+DisableExplicitGC -Xnoclassgc > >>>> > -Xverify:none > >>>> > -XX:+UseParNewGC -XX:+UseConcMarkSweepGC > -XX:+CMSParallelRemarkEnabled > >>>> > -XX:+UseCMSCompactAtFullCollection -XX:+UseFastAccessorMethods > >>>> > -XX:+CMSPermGenSweepingEnabled -XX:+CMSClassUnloadingEnabled > >>>> > -XX:+UseCompressedOops -XX:CMSInitiatingOccupancyFraction=90 > >>>> > -XX:+UseCMSInitiatingOccupancyOnly > >>>> > > >>>> > Could it be a bug that results in the long full gc in case of > promotion > >>>> > failure or something else? Could anyone offer me some help, and I > >>>> > really > >>>> > appreciate your help. > >>>> > > >>>> > Looking forward to any reply. > >>>> > > >>>> > All the best, > >>>> > Leon > >>>> > > >>>> > > >>>> > _______________________________________________ > >>>> > hotspot-gc-use mailing list > >>>> > hotspot-gc-use at openjdk.java.net > >>>> > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > >>>> > > >>> > >>> > >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20130518/6633aaca/attachment-0001.html From jon.masamitsu at oracle.com Fri May 17 10:29:48 2013 From: jon.masamitsu at oracle.com (Jon Masamitsu) Date: Fri, 17 May 2013 10:29:48 -0700 Subject: unexpected full gc time spike In-Reply-To: References: Message-ID: <5196690C.6040900@oracle.com> I found it 8005060: Promotion failure code does not scale On 5/17/2013 10:05 AM, Srinivas Ramakrishna wrote: > Looks like the search functionality of bugs.sun.com is no longer > available. I tried searching the new bugzilla portal for the bug I had > submitted around that time, but that doesn't bring up the bug when i > use the normal search terms, so I do not know if the bug report is > still in review or not, and whether it ever made it into the set of > hotspot/gc bugs or not, but the Review ID i recvd was:- > > "Your Report (Review ID: 2391561) - Promotion failure code does not scale " > > I'll try and dig up the (raw, tentative) patch and send it in soon. > > -- ramki > > > On Fri, May 17, 2013 at 9:37 AM, Srinivas Ramakrishna wrote: >> Hi Leon -- >> >> Here's the history of that discussion, starting with this email >> (follow subject thread): >> >> http://mail.openjdk.java.net/pipermail/hotspot-gc-use/2012-October/001370.html >> >> On Thu, May 16, 2013 at 10:06 PM, the.6th.month at gmail.com >> wrote: >>> hi, Ramki: >>> btw, could you possibly explain what the bugs are and how those bugs affect >>> the fallback fullgc time? I am really curious about the reason. >>> thanks very much. >>> >>> all the best, >>> Leon >>> >>> On 17 May 2013 12:39, "the.6th.month at gmail.com" >>> wrote: >>>> thanks very much indeed, hope we can see your patch soon >>>> >>>> >>>> On 17 May 2013 12:38, Srinivas Ramakrishna wrote: >>>>> Hi Leon -- >>>>> >>>>> Yes, there are a couple of performance bugs related to promotion >>>>> failure handling with ParNew+CMS that can cause this time to balloon. >>>>> Here the unwind of the failed promotion took 177 s. I have at least a >>>>> partial fix for this which I had written up a few months ago but never >>>>> quite got around to collecting sufficient performance data to submit >>>>> it as an official patch. >>>>> >>>>> I'll try and revive that patch and submit it... May be someone else >>>>> can check if it helps sufficiently in the performance with promotion >>>>> failure. >>>>> >>>>> -- ramki >>>>> >>>>> >>>>> On Thu, May 16, 2013 at 9:19 PM, the.6th.month at gmail.com >>>>> wrote: >>>>>> hi, all: >>>>>> We just had a situation that I don't quite understand with CMS gc. When >>>>>> I >>>>>> examined the gc log, I found that there was a cms gc which resulted in >>>>>> a >>>>>> parnew promotion failure and concurrent mode failure at the same time, >>>>>> and >>>>>> then the full gc lasted for slightly over three minutes. Here is the gc >>>>>> log: >>>>>> 2013-05-17T10:12:55.983+0800: 45168.774: [CMS-concurrent-mark: >>>>>> 7.056/7.860 >>>>>> secs] [Times: user=14.90 sys=0.45, real=7.86 secs] >>>>>> 2013-05-17T10:12:55.984+0800: 45168.775: >>>>>> [CMS-concurrent-preclean-start] >>>>>> 2013-05-17T10:12:56.753+0800: 45169.544: [CMS-concurrent-preclean: >>>>>> 0.676/0.770 secs] [Times: user=0.83 sys=0.15, real=0.77 secs] >>>>>> 2013-05-17T10:12:56.753+0800: 45169.544: >>>>>> [CMS-concurrent-abortable-preclean-start] >>>>>> 2013-05-17T10:12:58.460+0800: 45171.251: [GC 45171.252: [ParNew >>>>>> (promotion >>>>>> failed) >>>>>> Desired survivor size 67108864 bytes, new threshold 1 (max 6) >>>>>> - age 1: 70527216 bytes, 70527216 total >>>>>> : 917504K->917504K(917504K), 177.3558880 secs]45348.608: [CMS CMS: >>>>>> abort >>>>>> preclean due to time 2013-05-17T10:15:56.197+0800: 45348.989: >>>>>> [CMS-concurrent-abortable-preclean: 2.037/179.444 secs] [Times: >>>>>> user=44.72 >>>>>> sys=13.59, real=179.45 secs] >>>>>> (concurrent mode failure): 3017323K->2177093K(3047424K), 16.5528620 >>>>>> secs] >>>>>> 3879476K->2177093K(3964928K), [CMS Perm : 91333K->91008K(262144K)], >>>>>> 193.9097970 secs] [Times: user=58.79 sys=13.55, real=193.91 secs] >>>>>> >>>>>> the usual cms full gc time was roughly 100ms-400ms, but this time it >>>>>> lasted >>>>>> for 193 seconds. I understand that when there's a parnew gc happens >>>>>> during >>>>>> cms and to space is not large enough to hold all survived objects, or >>>>>> the >>>>>> remaining space in old gen cannot cope with memory allocation in old >>>>>> gen, >>>>>> full gc happens. But I don't understand why it hangs so long. >>>>>> I am using oracle jdk 1.6.0_37, and the jvm options we use are: >>>>>> -Xms4000m -Xmx4000m -Xmn1G -Xss256k -XX:PermSize=256m >>>>>> -XX:SurvivorRatio=6 >>>>>> -XX:MaxTenuringThreshold=6 -XX:+DisableExplicitGC -Xnoclassgc >>>>>> -Xverify:none >>>>>> -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled >>>>>> -XX:+UseCMSCompactAtFullCollection -XX:+UseFastAccessorMethods >>>>>> -XX:+CMSPermGenSweepingEnabled -XX:+CMSClassUnloadingEnabled >>>>>> -XX:+UseCompressedOops -XX:CMSInitiatingOccupancyFraction=90 >>>>>> -XX:+UseCMSInitiatingOccupancyOnly >>>>>> >>>>>> Could it be a bug that results in the long full gc in case of promotion >>>>>> failure or something else? Could anyone offer me some help, and I >>>>>> really >>>>>> appreciate your help. >>>>>> >>>>>> Looking forward to any reply. >>>>>> >>>>>> All the best, >>>>>> Leon >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> hotspot-gc-use mailing list >>>>>> hotspot-gc-use at openjdk.java.net >>>>>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>>>>> >>>> > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20130517/a80f8f34/attachment.html From the.6th.month at gmail.com Sat May 18 02:05:49 2013 From: the.6th.month at gmail.com (the.6th.month at gmail.com) Date: Sat, 18 May 2013 17:05:49 +0800 Subject: unexpected full gc time spike In-Reply-To: <5196690C.6040900@oracle.com> References: <5196690C.6040900@oracle.com> Message-ID: Hi, Jon & Ramki: Sorry I can't get access to that page, the browser says the webpage is currently unavailable. But I did look through the whole mail thread regarding this issue. I am wondering if I get it right. If each thread failed fast and started to fall back to single-threaded full gc immediately, there wouldn't be such a long pause. But under the current mechanism, there's no such a global flag to coordinate threads to fall back, and each thread tends to retry the allocation of new plab which could result in highly active locking contention and hence the extremely long pause. Is that correct? Leon On 18 May 2013 01:29, Jon Masamitsu wrote: > I found it > > 8005060: Promotion failure code does not scale > > > > On 5/17/2013 10:05 AM, Srinivas Ramakrishna wrote: > > Looks like the search functionality of bugs.sun.com is no longer > available. I tried searching the new bugzilla portal for the bug I had > submitted around that time, but that doesn't bring up the bug when i > use the normal search terms, so I do not know if the bug report is > still in review or not, and whether it ever made it into the set of > hotspot/gc bugs or not, but the Review ID i recvd was:- > > "Your Report (Review ID: 2391561) - Promotion failure code does not scale " > > I'll try and dig up the (raw, tentative) patch and send it in soon. > > -- ramki > > > On Fri, May 17, 2013 at 9:37 AM, Srinivas Ramakrishna wrote: > > Hi Leon -- > > Here's the history of that discussion, starting with this email > (follow subject thread): > http://mail.openjdk.java.net/pipermail/hotspot-gc-use/2012-October/001370.html > > On Thu, May 16, 2013 at 10:06 PM, the.6th.month at gmail.com wrote: > > hi, Ramki: > btw, could you possibly explain what the bugs are and how those bugs affect > the fallback fullgc time? I am really curious about the reason. > thanks very much. > > all the best, > Leon > > On 17 May 2013 12:39, "the.6th.month at gmail.com" > wrote: > > thanks very much indeed, hope we can see your patch soon > > > On 17 May 2013 12:38, Srinivas Ramakrishna wrote: > > Hi Leon -- > > Yes, there are a couple of performance bugs related to promotion > failure handling with ParNew+CMS that can cause this time to balloon. > Here the unwind of the failed promotion took 177 s. I have at least a > partial fix for this which I had written up a few months ago but never > quite got around to collecting sufficient performance data to submit > it as an official patch. > > I'll try and revive that patch and submit it... May be someone else > can check if it helps sufficiently in the performance with promotion > failure. > > -- ramki > > > On Thu, May 16, 2013 at 9:19 PM, the.6th.month at gmail.com wrote: > > hi, all: > We just had a situation that I don't quite understand with CMS gc. When > I > examined the gc log, I found that there was a cms gc which resulted in > a > parnew promotion failure and concurrent mode failure at the same time, > and > then the full gc lasted for slightly over three minutes. Here is the gc > log: > 2013-05-17T10:12:55.983+0800: 45168.774: [CMS-concurrent-mark: > 7.056/7.860 > secs] [Times: user=14.90 sys=0.45, real=7.86 secs] > 2013-05-17T10:12:55.984+0800: 45168.775: > [CMS-concurrent-preclean-start] > 2013-05-17T10:12:56.753+0800: 45169.544: [CMS-concurrent-preclean: > 0.676/0.770 secs] [Times: user=0.83 sys=0.15, real=0.77 secs] > 2013-05-17T10:12:56.753+0800: 45169.544: > [CMS-concurrent-abortable-preclean-start] > 2013-05-17T10:12:58.460+0800: 45171.251: [GC 45171.252: [ParNew > (promotion > failed) > Desired survivor size 67108864 bytes, new threshold 1 (max 6) > - age 1: 70527216 bytes, 70527216 total > : 917504K->917504K(917504K), 177.3558880 secs]45348.608: [CMS CMS: > abort > preclean due to time 2013-05-17T10:15:56.197+0800: 45348.989: > [CMS-concurrent-abortable-preclean: 2.037/179.444 secs] [Times: > user=44.72 > sys=13.59, real=179.45 secs] > (concurrent mode failure): 3017323K->2177093K(3047424K), 16.5528620 > secs] > 3879476K->2177093K(3964928K), [CMS Perm : 91333K->91008K(262144K)], > 193.9097970 secs] [Times: user=58.79 sys=13.55, real=193.91 secs] > > the usual cms full gc time was roughly 100ms-400ms, but this time it > lasted > for 193 seconds. I understand that when there's a parnew gc happens > during > cms and to space is not large enough to hold all survived objects, or > the > remaining space in old gen cannot cope with memory allocation in old > gen, > full gc happens. But I don't understand why it hangs so long. > I am using oracle jdk 1.6.0_37, and the jvm options we use are: > -Xms4000m -Xmx4000m -Xmn1G -Xss256k -XX:PermSize=256m > -XX:SurvivorRatio=6 > -XX:MaxTenuringThreshold=6 -XX:+DisableExplicitGC -Xnoclassgc > -Xverify:none > -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled > -XX:+UseCMSCompactAtFullCollection -XX:+UseFastAccessorMethods > -XX:+CMSPermGenSweepingEnabled -XX:+CMSClassUnloadingEnabled > -XX:+UseCompressedOops -XX:CMSInitiatingOccupancyFraction=90 > -XX:+UseCMSInitiatingOccupancyOnly > > Could it be a bug that results in the long full gc in case of promotion > failure or something else? Could anyone offer me some help, and I > really > appreciate your help. > > Looking forward to any reply. > > All the best, > Leon > > > _______________________________________________ > hotspot-gc-use mailing listhotspot-gc-use at openjdk.java.nethttp://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > _______________________________________________ > hotspot-gc-use mailing listhotspot-gc-use at openjdk.java.nethttp://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20130518/9c873d4f/attachment.html From bengt.rutisson at oracle.com Sun May 19 22:39:40 2013 From: bengt.rutisson at oracle.com (Bengt Rutisson) Date: Mon, 20 May 2013 07:39:40 +0200 Subject: unexpected full gc time spike In-Reply-To: References: <5196690C.6040900@oracle.com> Message-ID: <5199B71C.5070807@oracle.com> Hi Leon, Here is the link to the bug that should be available to you: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=8005060 Ramki, The bug report made it in to the hotspot/gc bug set just a couple of days after you filed it, in December 2012. It has been classified as an enhancement and since we are in a bug fixing phase right now we don't have anybody assigned to fixing this right now. I haven't been following this thread closely enough to have a strong opinion about whether it is a bug or an enhancement. Let me know if you think we should update the the report to be a bug rather than an enhancement. If you have a patch it would be great to try to get it out for review. Sounds like a good thing to fix no matter how we define the bug report :) Thanks, Bengt On 5/18/13 11:05 AM, the.6th.month at gmail.com wrote: > Hi, Jon & Ramki: > Sorry I can't get access to that page, the browser says the webpage is > currently unavailable. But I did look through the whole mail thread > regarding this issue. I am wondering if I get it right. If each thread > failed fast and started to fall back to single-threaded full gc > immediately, there wouldn't be such a long pause. But under the > current mechanism, there's no such a global flag to coordinate threads > to fall back, and each thread tends to retry the allocation of new > plab which could result in highly active locking contention and hence > the extremely long pause. > Is that correct? > > Leon > > > On 18 May 2013 01:29, Jon Masamitsu > wrote: > > I found it > > 8005060: Promotion failure code does not scale > > > > > On 5/17/2013 10:05 AM, Srinivas Ramakrishna wrote: >> Looks like the search functionality ofbugs.sun.com is no longer >> available. I tried searching the new bugzilla portal for the bug I had >> submitted around that time, but that doesn't bring up the bug when i >> use the normal search terms, so I do not know if the bug report is >> still in review or not, and whether it ever made it into the set of >> hotspot/gc bugs or not, but the Review ID i recvd was:- >> >> "Your Report (Review ID: 2391561) - Promotion failure code does not scale " >> >> I'll try and dig up the (raw, tentative) patch and send it in soon. >> >> -- ramki >> >> >> On Fri, May 17, 2013 at 9:37 AM, Srinivas Ramakrishna wrote: >>> Hi Leon -- >>> >>> Here's the history of that discussion, starting with this email >>> (follow subject thread): >>> >>> http://mail.openjdk.java.net/pipermail/hotspot-gc-use/2012-October/001370.html >>> >>> On Thu, May 16, 2013 at 10:06 PM,the.6th.month at gmail.com >>> wrote: >>>> hi, Ramki: >>>> btw, could you possibly explain what the bugs are and how those bugs affect >>>> the fallback fullgc time? I am really curious about the reason. >>>> thanks very much. >>>> >>>> all the best, >>>> Leon >>>> >>>> On 17 May 2013 12:39,"the.6th.month at gmail.com" >>>> wrote: >>>>> thanks very much indeed, hope we can see your patch soon >>>>> >>>>> >>>>> On 17 May 2013 12:38, Srinivas Ramakrishna wrote: >>>>>> Hi Leon -- >>>>>> >>>>>> Yes, there are a couple of performance bugs related to promotion >>>>>> failure handling with ParNew+CMS that can cause this time to balloon. >>>>>> Here the unwind of the failed promotion took 177 s. I have at least a >>>>>> partial fix for this which I had written up a few months ago but never >>>>>> quite got around to collecting sufficient performance data to submit >>>>>> it as an official patch. >>>>>> >>>>>> I'll try and revive that patch and submit it... May be someone else >>>>>> can check if it helps sufficiently in the performance with promotion >>>>>> failure. >>>>>> >>>>>> -- ramki >>>>>> >>>>>> >>>>>> On Thu, May 16, 2013 at 9:19 PM,the.6th.month at gmail.com >>>>>> wrote: >>>>>>> hi, all: >>>>>>> We just had a situation that I don't quite understand with CMS gc. When >>>>>>> I >>>>>>> examined the gc log, I found that there was a cms gc which resulted in >>>>>>> a >>>>>>> parnew promotion failure and concurrent mode failure at the same time, >>>>>>> and >>>>>>> then the full gc lasted for slightly over three minutes. Here is the gc >>>>>>> log: >>>>>>> 2013-05-17T10:12:55.983+0800: 45168.774: [CMS-concurrent-mark: >>>>>>> 7.056/7.860 >>>>>>> secs] [Times: user=14.90 sys=0.45, real=7.86 secs] >>>>>>> 2013-05-17T10:12:55.984+0800: 45168.775: >>>>>>> [CMS-concurrent-preclean-start] >>>>>>> 2013-05-17T10:12:56.753+0800: 45169.544: [CMS-concurrent-preclean: >>>>>>> 0.676/0.770 secs] [Times: user=0.83 sys=0.15, real=0.77 secs] >>>>>>> 2013-05-17T10:12:56.753+0800: 45169.544: >>>>>>> [CMS-concurrent-abortable-preclean-start] >>>>>>> 2013-05-17T10:12:58.460+0800: 45171.251: [GC 45171.252: [ParNew >>>>>>> (promotion >>>>>>> failed) >>>>>>> Desired survivor size 67108864 bytes, new threshold 1 (max 6) >>>>>>> - age 1: 70527216 bytes, 70527216 total >>>>>>> : 917504K->917504K(917504K), 177.3558880 secs]45348.608: [CMS CMS: >>>>>>> abort >>>>>>> preclean due to time 2013-05-17T10:15:56.197+0800: 45348.989: >>>>>>> [CMS-concurrent-abortable-preclean: 2.037/179.444 secs] [Times: >>>>>>> user=44.72 >>>>>>> sys=13.59, real=179.45 secs] >>>>>>> (concurrent mode failure): 3017323K->2177093K(3047424K), 16.5528620 >>>>>>> secs] >>>>>>> 3879476K->2177093K(3964928K), [CMS Perm : 91333K->91008K(262144K)], >>>>>>> 193.9097970 secs] [Times: user=58.79 sys=13.55, real=193.91 secs] >>>>>>> >>>>>>> the usual cms full gc time was roughly 100ms-400ms, but this time it >>>>>>> lasted >>>>>>> for 193 seconds. I understand that when there's a parnew gc happens >>>>>>> during >>>>>>> cms and to space is not large enough to hold all survived objects, or >>>>>>> the >>>>>>> remaining space in old gen cannot cope with memory allocation in old >>>>>>> gen, >>>>>>> full gc happens. But I don't understand why it hangs so long. >>>>>>> I am using oracle jdk 1.6.0_37, and the jvm options we use are: >>>>>>> -Xms4000m -Xmx4000m -Xmn1G -Xss256k -XX:PermSize=256m >>>>>>> -XX:SurvivorRatio=6 >>>>>>> -XX:MaxTenuringThreshold=6 -XX:+DisableExplicitGC -Xnoclassgc >>>>>>> -Xverify:none >>>>>>> -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled >>>>>>> -XX:+UseCMSCompactAtFullCollection -XX:+UseFastAccessorMethods >>>>>>> -XX:+CMSPermGenSweepingEnabled -XX:+CMSClassUnloadingEnabled >>>>>>> -XX:+UseCompressedOops -XX:CMSInitiatingOccupancyFraction=90 >>>>>>> -XX:+UseCMSInitiatingOccupancyOnly >>>>>>> >>>>>>> Could it be a bug that results in the long full gc in case of promotion >>>>>>> failure or something else? Could anyone offer me some help, and I >>>>>>> really >>>>>>> appreciate your help. >>>>>>> >>>>>>> Looking forward to any reply. >>>>>>> >>>>>>> All the best, >>>>>>> Leon >>>>>>> >>>>>>> >>>>>>> _______________________________________________ >>>>>>> hotspot-gc-use mailing list >>>>>>> hotspot-gc-use at openjdk.java.net >>>>>>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>>>>>> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20130520/a7064d63/attachment.html From java at java4.info Thu May 23 05:25:27 2013 From: java at java4.info (Florian Binder) Date: Thu, 23 May 2013 14:25:27 +0200 Subject: Missing memory Message-ID: <519E0AB7.3010202@java4.info> Hi all, I am running a jboss application with an embedded h2-database using the CMS-Collector. It uses the following memory configuration: -Xms8G -Xmx8G -Xmn2G After running a while I got the following interesting issue: After a young collection the application uses only 3172435K (8178944K). But In the statistics for the BinaryTreeDictionary I see only 1976982 words (~ 16MB) of Total Free Space. So I am wondering about the 2GB which are not used and not in the free list space. Might it be in a TLAB or PLAB or where? The annoying problem of this occurs during the next young collection where it does not have enough space in the old generation and fails with "promotion failed", which results in a 17s stw collection. After this collection I have 446204324 of Total Free Space, which seems correct. A concurrent collection is not running due to less usage of the old generation. I am running it on an 8 core machine with Java HotSpot(TM) 64-Bit Server VM (build 23.7-b01, mixed mode) (1.7.0_17). Detailed information can be found below. Thank you for your help, Flo ############ The startup parameter are: ############ -server \ -Xms8G -Xmx8G \ -Xmn2G \ -XX:MaxPermSize=256m \ -verbose:gc \ -XX:+PrintGC \ -XX:+PrintGCDateStamps \ -XX:+PrintGCDetails \ -XX:+UseConcMarkSweepGC \ -XX:CMSInitiatingOccupancyFraction=80 \ -XX:+PrintFlagsFinal \ -XX:PrintFLSStatistics=1 \ -XX:+PrintTenuringDistribution \ -XX:+PrintGCApplicationConcurrentTime \ -XX:+PrintGCApplicationStoppedTime \ -XX:+UseLargePages \ -XX:LargePageSizeInBytes=4m \ ############ The relevant gc-log snippet: ############ 2013-05-23T01:04:57.536-0400: [GC Before GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 4459853 Max Chunk Size: 2117113 Number of Blocks: 10 Av. Block Size: 445985 Tree Height: 6 Before GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 6837777 Max Chunk Size: 6832640 Number of Blocks: 6 Av. Block Size: 1139629 Tree Height: 5 [ParNew Desired survivor size 107347968 bytes, new threshold 2 (max 6) - age 1: 59250760 bytes, 59250760 total - age 2: 72435232 bytes, 131685992 total : 1887488K->206257K(1887488K), 0,1177960 secs] 4788275K->3172435K(8178944K)After GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 1976982 Max Chunk Size: 1969801 Number of Blocks: 2 Av. Block Size: 988491 Tree Height: 2 After GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 6837777 Max Chunk Size: 6832640 Number of Blocks: 6 Av. Block Size: 1139629 Tree Height: 5 , 0,1179200 secs] [Times: user=0,80 sys=0,00, real=0,12 secs] Total time for which application threads were stopped: 0,1186510 seconds Application time: 0,7920070 seconds 2013-05-23T01:04:58.447-0400: [GC Before GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 1976982 Max Chunk Size: 1969801 Number of Blocks: 2 Av. Block Size: 988491 Tree Height: 2 Before GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 6837777 Max Chunk Size: 6832640 Number of Blocks: 6 Av. Block Size: 1139629 Tree Height: 5 [ParNew (promotion failed) Desired survivor size 107347968 bytes, new threshold 2 (max 6) - age 1: 57903280 bytes, 57903280 total - age 2: 52076168 bytes, 109979448 total : 1884081K->1878750K(1887488K), 2,5295040 secs][CMSCMS: Large block 0x000000071b3bb2e0 : 3020224K->2805356K(6291456K), 15,0643760 secs] 4850259K->2805356K(8178944K), [CMS Perm : 80257K->79027K(133764K)]After GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 446204324 Max Chunk Size: 446204324 Number of Blocks: 1 Av. Block Size: 446204324 Tree Height: 1 After GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 0 Max Chunk Size: 0 Number of Blocks: 0 Tree Height: 0 , 17,5940190 secs] [Times: user=18,00 sys=0,78, real=17,59 secs] From Philip.Lee at smartstream-stp.com Thu May 23 05:47:14 2013 From: Philip.Lee at smartstream-stp.com (Philip Lee) Date: Thu, 23 May 2013 12:47:14 +0000 Subject: Missing memory In-Reply-To: <519E0AB7.3010202@java4.info> References: <519E0AB7.3010202@java4.info> Message-ID: <62EC155DFFC99A4EB1D3BBC6CD0A395D01BFF1EDC1@briexch0002.sst.stp> Hi, We have seen a few problems using the CMS collector with JBoss one of which was that the CMS collector does not collect objects that have finalizer() methods. The version of JBoss that we were using (5.1) made heavy use of objects with finalizer() methods within the implementation of its VFS component. We ended up switching to the parallel collector which gave us maximum pause times of around 5s on a 5G heap. - Phil ________________________________________ From: hotspot-gc-use-bounces at openjdk.java.net [hotspot-gc-use-bounces at openjdk.java.net] on behalf of Florian Binder [java at java4.info] Sent: 23 May 2013 13:25 To: hotspot-gc-use at openjdk.java.net Subject: Missing memory Hi all, I am running a jboss application with an embedded h2-database using the CMS-Collector. It uses the following memory configuration: -Xms8G -Xmx8G -Xmn2G After running a while I got the following interesting issue: After a young collection the application uses only 3172435K (8178944K). But In the statistics for the BinaryTreeDictionary I see only 1976982 words (~ 16MB) of Total Free Space. So I am wondering about the 2GB which are not used and not in the free list space. Might it be in a TLAB or PLAB or where? The annoying problem of this occurs during the next young collection where it does not have enough space in the old generation and fails with "promotion failed", which results in a 17s stw collection. After this collection I have 446204324 of Total Free Space, which seems correct. A concurrent collection is not running due to less usage of the old generation. I am running it on an 8 core machine with Java HotSpot(TM) 64-Bit Server VM (build 23.7-b01, mixed mode) (1.7.0_17). Detailed information can be found below. Thank you for your help, Flo ############ The startup parameter are: ############ -server \ -Xms8G -Xmx8G \ -Xmn2G \ -XX:MaxPermSize=256m \ -verbose:gc \ -XX:+PrintGC \ -XX:+PrintGCDateStamps \ -XX:+PrintGCDetails \ -XX:+UseConcMarkSweepGC \ -XX:CMSInitiatingOccupancyFraction=80 \ -XX:+PrintFlagsFinal \ -XX:PrintFLSStatistics=1 \ -XX:+PrintTenuringDistribution \ -XX:+PrintGCApplicationConcurrentTime \ -XX:+PrintGCApplicationStoppedTime \ -XX:+UseLargePages \ -XX:LargePageSizeInBytes=4m \ ############ The relevant gc-log snippet: ############ 2013-05-23T01:04:57.536-0400: [GC Before GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 4459853 Max Chunk Size: 2117113 Number of Blocks: 10 Av. Block Size: 445985 Tree Height: 6 Before GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 6837777 Max Chunk Size: 6832640 Number of Blocks: 6 Av. Block Size: 1139629 Tree Height: 5 [ParNew Desired survivor size 107347968 bytes, new threshold 2 (max 6) - age 1: 59250760 bytes, 59250760 total - age 2: 72435232 bytes, 131685992 total : 1887488K->206257K(1887488K), 0,1177960 secs] 4788275K->3172435K(8178944K)After GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 1976982 Max Chunk Size: 1969801 Number of Blocks: 2 Av. Block Size: 988491 Tree Height: 2 After GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 6837777 Max Chunk Size: 6832640 Number of Blocks: 6 Av. Block Size: 1139629 Tree Height: 5 , 0,1179200 secs] [Times: user=0,80 sys=0,00, real=0,12 secs] Total time for which application threads were stopped: 0,1186510 seconds Application time: 0,7920070 seconds 2013-05-23T01:04:58.447-0400: [GC Before GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 1976982 Max Chunk Size: 1969801 Number of Blocks: 2 Av. Block Size: 988491 Tree Height: 2 Before GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 6837777 Max Chunk Size: 6832640 Number of Blocks: 6 Av. Block Size: 1139629 Tree Height: 5 [ParNew (promotion failed) Desired survivor size 107347968 bytes, new threshold 2 (max 6) - age 1: 57903280 bytes, 57903280 total - age 2: 52076168 bytes, 109979448 total : 1884081K->1878750K(1887488K), 2,5295040 secs][CMSCMS: Large block 0x000000071b3bb2e0 : 3020224K->2805356K(6291456K), 15,0643760 secs] 4850259K->2805356K(8178944K), [CMS Perm : 80257K->79027K(133764K)]After GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 446204324 Max Chunk Size: 446204324 Number of Blocks: 1 Av. Block Size: 446204324 Tree Height: 1 After GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 0 Max Chunk Size: 0 Number of Blocks: 0 Tree Height: 0 , 17,5940190 secs] [Times: user=18,00 sys=0,78, real=17,59 secs] _______________________________________________ hotspot-gc-use mailing list hotspot-gc-use at openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use ________________________________ The information in this email is confidential and may be legally privileged. It is intended solely for the addressee. Access to this email by anyone else is unauthorised. If you are not the intended recipient, any disclosure, copying, distribution or any action taken or omitted to be taken in reliance on it, is prohibited and may be unlawful. SmartStream Technologies Ltd. is a company incorporated in England and Wales. Registered office: St Helen's, 1 Undershaft, London, EC3A 8EE. Registration No. 2285524 From java at java4.info Thu May 23 09:36:10 2013 From: java at java4.info (Florian Binder) Date: Thu, 23 May 2013 18:36:10 +0200 Subject: Missing memory In-Reply-To: <62EC155DFFC99A4EB1D3BBC6CD0A395D01BFF1EDC1@briexch0002.sst.stp> References: <519E0AB7.3010202@java4.info> <62EC155DFFC99A4EB1D3BBC6CD0A395D01BFF1EDC1@briexch0002.sst.stp> Message-ID: <519E457A.8070509@java4.info> Ok, I found it: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 97824599 Max Chunk Size: 67277447 Number of Blocks: 9063 Av. Block Size: 10793 Tree Height: 67 Statistics for IndexedFreeLists: -------------------------------- Total Free Space: 322670900 Max Chunk Size: 256 Number of Blocks: 53951619 Av. Block Size: 5 free=420495499 frag=0,9744 They are in the IndexedFreeLists. There seem to be a lot of very small objects in the old generation, which are removed soon :( /Flo Am 23.05.2013 14:47, schrieb Philip Lee: > Hi, > > We have seen a few problems using the CMS collector with JBoss one of which was that the CMS collector does not collect objects that have finalizer() methods. The version of JBoss that we were using (5.1) made heavy use of objects with finalizer() methods within the implementation of its VFS component. > > We ended up switching to the parallel collector which gave us maximum pause times of around 5s on a 5G heap. > > - Phil > ________________________________________ > From: hotspot-gc-use-bounces at openjdk.java.net [hotspot-gc-use-bounces at openjdk.java.net] on behalf of Florian Binder [java at java4.info] > Sent: 23 May 2013 13:25 > To: hotspot-gc-use at openjdk.java.net > Subject: Missing memory > > Hi all, > > I am running a jboss application with an embedded h2-database using the > CMS-Collector. > It uses the following memory configuration: > -Xms8G -Xmx8G -Xmn2G > > After running a while I got the following interesting issue: > After a young collection the application uses only 3172435K (8178944K). > But In the statistics for the BinaryTreeDictionary I see only 1976982 > words (~ 16MB) of Total Free Space. So I am wondering about the 2GB > which are not used and not in the free list space. Might it be in a TLAB > or PLAB or where? > > The annoying problem of this occurs during the next young collection > where it does not have enough space in the old generation and fails with > "promotion failed", which results in a 17s stw collection. After this > collection I have 446204324 of Total Free Space, which seems correct. > A concurrent collection is not running due to less usage of the old > generation. > > I am running it on an 8 core machine with > Java HotSpot(TM) 64-Bit Server VM (build 23.7-b01, mixed mode) (1.7.0_17). > Detailed information can be found below. > > Thank you for your help, > Flo > > > ############ The startup parameter are: ############ > -server \ > -Xms8G -Xmx8G \ > -Xmn2G \ > -XX:MaxPermSize=256m \ > -verbose:gc \ > -XX:+PrintGC \ > -XX:+PrintGCDateStamps \ > -XX:+PrintGCDetails \ > -XX:+UseConcMarkSweepGC \ > -XX:CMSInitiatingOccupancyFraction=80 \ > -XX:+PrintFlagsFinal \ > -XX:PrintFLSStatistics=1 \ > -XX:+PrintTenuringDistribution \ > -XX:+PrintGCApplicationConcurrentTime \ > -XX:+PrintGCApplicationStoppedTime \ > -XX:+UseLargePages \ > -XX:LargePageSizeInBytes=4m \ > > ############ The relevant gc-log snippet: ############ > 2013-05-23T01:04:57.536-0400: [GC Before GC: > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: 4459853 > Max Chunk Size: 2117113 > Number of Blocks: 10 > Av. Block Size: 445985 > Tree Height: 6 > Before GC: > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: 6837777 > Max Chunk Size: 6832640 > Number of Blocks: 6 > Av. Block Size: 1139629 > Tree Height: 5 > [ParNew > Desired survivor size 107347968 bytes, new threshold 2 (max 6) > - age 1: 59250760 bytes, 59250760 total > - age 2: 72435232 bytes, 131685992 total > : 1887488K->206257K(1887488K), 0,1177960 secs] > 4788275K->3172435K(8178944K)After GC: > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: 1976982 > Max Chunk Size: 1969801 > Number of Blocks: 2 > Av. Block Size: 988491 > Tree Height: 2 > After GC: > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: 6837777 > Max Chunk Size: 6832640 > Number of Blocks: 6 > Av. Block Size: 1139629 > Tree Height: 5 > , 0,1179200 secs] [Times: user=0,80 sys=0,00, real=0,12 secs] > Total time for which application threads were stopped: 0,1186510 seconds > Application time: 0,7920070 seconds > 2013-05-23T01:04:58.447-0400: [GC Before GC: > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: 1976982 > Max Chunk Size: 1969801 > Number of Blocks: 2 > Av. Block Size: 988491 > Tree Height: 2 > Before GC: > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: 6837777 > Max Chunk Size: 6832640 > Number of Blocks: 6 > Av. Block Size: 1139629 > Tree Height: 5 > [ParNew (promotion failed) > Desired survivor size 107347968 bytes, new threshold 2 (max 6) > - age 1: 57903280 bytes, 57903280 total > - age 2: 52076168 bytes, 109979448 total > : 1884081K->1878750K(1887488K), 2,5295040 secs][CMSCMS: Large block > 0x000000071b3bb2e0 > : 3020224K->2805356K(6291456K), 15,0643760 secs] > 4850259K->2805356K(8178944K), [CMS Perm : 80257K->79027K(133764K)]After GC: > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: 446204324 > Max Chunk Size: 446204324 > Number of Blocks: 1 > Av. Block Size: 446204324 > Tree Height: 1 > After GC: > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: 0 > Max Chunk Size: 0 > Number of Blocks: 0 > Tree Height: 0 > , 17,5940190 secs] [Times: user=18,00 sys=0,78, real=17,59 secs] > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > ________________________________ > The information in this email is confidential and may be legally privileged. It is intended solely for the addressee. Access to this email by anyone else is unauthorised. If you are not the intended recipient, any disclosure, copying, distribution or any action taken or omitted to be taken in reliance on it, is prohibited and may be unlawful. > SmartStream Technologies Ltd. is a company incorporated in England and Wales. Registered office: St Helen's, 1 Undershaft, London, EC3A 8EE. Registration No. 2285524 > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20130523/a9bbb202/attachment-0001.html From martin.makundi at koodaripalvelut.com Sat May 25 21:32:37 2013 From: martin.makundi at koodaripalvelut.com (Martin Makundi) Date: Sun, 26 May 2013 07:32:37 +0300 Subject: Bug in G1GC it performs Full GC when code cache is full resulting in overkill Message-ID: Hi! For long time, we have had about 10gb more memory than what we need, but about 1-3 times per day in production, the G1GC performs Full GC without any apparent reason. Recently we installed Appdynamics profiler which shows also Code Cache memory levels. To our surprise we noticed, that every time the code cache becomes almost full, G1GC performs a Full GC, which we ofcourse consider an overkill because the Full GC takes nearly 60 seconds every time with our memory size! Is this a bug in G1GC or is there a configuration option to disable such behavior? See profiler snapshots at: http://eisler.vps.kotisivut.com/logs/g1gc-code-cache-full-gc-bug-illustration.png The issue is not an isolated occurrence, it occurs daily. Similar posts can be found on the web where G1GC performs Full GC with no apparent reason: http://grokbase.com/t/openjdk/hotspot-gc-use/1192sy84j5/g1c-strange-full-gc-behavior http://grokbase.com/p/openjdk/hotspot-gc-use/123ydf9c92/puzzling-why-is-a-full-gc-triggered-here ** Martin From chunt at salesforce.com Sun May 26 07:21:34 2013 From: chunt at salesforce.com (Charlie Hunt) Date: Sun, 26 May 2013 07:21:34 -0700 Subject: Bug in G1GC it performs Full GC when code cache is full resulting in overkill In-Reply-To: References: Message-ID: <3807DE76-D6CA-451F-AC72-771332825905@salesforce.com> Which version of the JDK/JRE are you using? One of the links you referenced below was using JDK 6, where there is no official support for G1. The other link suggests it could have been RMI DGC or a System.gc(). Sent from my iPhone On May 25, 2013, at 11:43 PM, "Martin Makundi" wrote: > it occurs daily. From martin.makundi at koodaripalvelut.com Sun May 26 08:20:36 2013 From: martin.makundi at koodaripalvelut.com (Martin Makundi) Date: Sun, 26 May 2013 18:20:36 +0300 Subject: Bug in G1GC it performs Full GC when code cache is full resulting in overkill In-Reply-To: <3807DE76-D6CA-451F-AC72-771332825905@salesforce.com> References: <3807DE76-D6CA-451F-AC72-771332825905@salesforce.com> Message-ID: Sorry, forgot to mention, using: java version "1.7.0_21" Java(TM) SE Runtime Environment (build 1.7.0_21-b11) Java HotSpot(TM) 64-Bit Server VM (build 23.21-b01, mixed mode) Linux version 3.0.1.stk64 (dfn at localhost.localdomain) (gcc version 4.5.1 20100924 (Red Hat 4.5.1-4) (GCC) ) #1 SMP Sat Aug 13 12:53:46 EDT 2011 -Dclassworlds.conf=/usr/share/maven/maven/bin/m2.conf -Dmaven.home=/usr/share/maven/maven -Duser.timezone=EET -XX:+AggressiveOpts -XX:+DisableExplicitGC -XX:+ParallelRefProcEnabled -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintHeapAtGC -XX:+UseAdaptiveSizePolicy -XX:+UseCompressedOops -XX:+UseFastAccessorMethods -XX:+UseG1GC -XX:+UseGCOverheadLimit -XX:+UseNUMA -XX:+UseStringCache -XX:CMSInitiatingOccupancyFraction=70 -XX:GCPauseIntervalMillis=10000 -XX:InitiatingHeapOccupancyPercent=0 -XX:MaxGCPauseMillis=500 -XX:MaxPermSize=512m -XX:PermSize=512m -XX:ReservedCodeCacheSize=48m -Xloggc:gc.log -Xmaxf1 -Xms30G -Xmx30G -Xnoclassgc -Xss4096k ** Martin 2013/5/26 Charlie Hunt : > Which version of the JDK/JRE are you using? > > One of the links you referenced below was using JDK 6, where there is no official support for G1. The other link suggests it could have been RMI DGC or a System.gc(). > > > > Sent from my iPhone > > On May 25, 2013, at 11:43 PM, "Martin Makundi" wrote: > >> it occurs daily. From darius.ski at gmail.com Mon May 27 15:51:37 2013 From: darius.ski at gmail.com (Darius D.) Date: Tue, 28 May 2013 01:51:37 +0300 Subject: Bug in G1GC it performs Full GC when code cache is full resulting in overkill In-Reply-To: References: <3807DE76-D6CA-451F-AC72-771332825905@salesforce.com> Message-ID: Hi, since I see a reference to my old post with G1 problems, I felt the need to share our success story with G1 collector. We have been using G1 in production since ~1.7_04, the main reason being that CMS was generating way too many FullGCs (in retrospective this is already a hint what the real problem was ...). It turned out that we were getting Full GCs due to humongous object allocation failures, our application generates JSON objects sized in quite a few megabytes and due to unfortunate design of web framework this was causing plenty of reallocation during character encode phase spraying heap in progress. Once proper instrumentation was in place for G1GC in mid-late 2012, we were seeing humongous allocation failures of some 30+ megabytes or so. No wonder that in a busy heap there was not enough continuous space for object this large ( remember realloc chain burned 16, 8, 4, 2 etc megabyte sized chunks down to half of G1HeapRegionSize). So we set out to fix it: 1) We got performance patch merged into our open source web framework that slashed reallocations and rewrote our own code that was generating JSON String to limit reallocation. 2) Tuning -XX:G1HeapRegionSize once we saw proper explanation of G1GC in Monica's JavaOne presentation. For whatever reason for ~7-8GB heap we were getting thousands of G1GC heap regions and we've arrived to -XX:G1HeapRegionSize=16m after some testing. The fact is after all this tuning G1GC has been performing amazingly for us { -XX:G1HeapRegionSize=16m -XX:InitiatingHeapOccupancyPercent=33 -XX:MaxGCPauseMillis=250 -XX:+UseG1GC -Xmx7168m -Xms7168m -XX:MaxPermSize=768m -XX:ReservedCodeCacheSize=128m }. We haven't seen FullGC in production for months now and we have to work really hard to generate them in testing too. Our GC pauses are sub 0.3s, giving us desired web app performance. Keep up great work! :) Darius. On Sun, May 26, 2013 at 6:20 PM, Martin Makundi wrote: > Sorry, forgot to mention, using: > > java version "1.7.0_21" > Java(TM) SE Runtime Environment (build 1.7.0_21-b11) > Java HotSpot(TM) 64-Bit Server VM (build 23.21-b01, mixed mode) > > Linux version 3.0.1.stk64 (dfn at localhost.localdomain) (gcc version > 4.5.1 20100924 (Red Hat 4.5.1-4) (GCC) ) #1 SMP Sat Aug 13 12:53:46 > EDT 2011 > > -Dclassworlds.conf=/usr/share/maven/maven/bin/m2.conf > -Dmaven.home=/usr/share/maven/maven > -Duser.timezone=EET > -XX:+AggressiveOpts > -XX:+DisableExplicitGC > -XX:+ParallelRefProcEnabled > -XX:+PrintGCDateStamps > -XX:+PrintGCDetails > -XX:+PrintHeapAtGC > -XX:+UseAdaptiveSizePolicy > -XX:+UseCompressedOops > -XX:+UseFastAccessorMethods > -XX:+UseG1GC > -XX:+UseGCOverheadLimit > -XX:+UseNUMA > -XX:+UseStringCache > -XX:CMSInitiatingOccupancyFraction=70 > -XX:GCPauseIntervalMillis=10000 > -XX:InitiatingHeapOccupancyPercent=0 > -XX:MaxGCPauseMillis=500 > -XX:MaxPermSize=512m > -XX:PermSize=512m > -XX:ReservedCodeCacheSize=48m > -Xloggc:gc.log > -Xmaxf1 > -Xms30G > -Xmx30G > -Xnoclassgc > -Xss4096k > > > ** > Martin > > 2013/5/26 Charlie Hunt : >> Which version of the JDK/JRE are you using? >> >> One of the links you referenced below was using JDK 6, where there is no official support for G1. The other link suggests it could have been RMI DGC or a System.gc(). >> >> >> >> Sent from my iPhone >> >> On May 25, 2013, at 11:43 PM, "Martin Makundi" wrote: >> >>> it occurs daily. > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use From martin.makundi at koodaripalvelut.com Mon May 27 19:09:08 2013 From: martin.makundi at koodaripalvelut.com (Martin Makundi) Date: Tue, 28 May 2013 05:09:08 +0300 Subject: Bug in G1GC it performs Full GC when code cache is full resulting in overkill In-Reply-To: References: <3807DE76-D6CA-451F-AC72-771332825905@salesforce.com> Message-ID: Hi! We actually recorded this bug on 1.7._u06 and upgraded to 1.7.0_21-b11 just a couple of days ago and mistakenly reported wrong version to be in use when bug in effect. We haven't seen any Full GC's since upgrade, so it seems like this bug is fixed. The code cache memory profiles look much nicer now too in the profiler. If the situation changes, I will report back. Until then, I consider this problem solved with the upgrade. ** Martin 2013/5/28 Darius D. > Hi, > > since I see a reference to my old post with G1 problems, I felt the > need to share our success story with G1 collector. > > We have been using G1 in production since ~1.7_04, the main reason > being that CMS was generating way too many FullGCs (in retrospective > this is already a hint what the real problem was ...). > > It turned out that we were getting Full GCs due to humongous object > allocation failures, our application generates JSON objects sized in > quite a few megabytes and due to unfortunate design of web framework > this was causing plenty of reallocation during character encode phase > spraying heap in progress. Once proper instrumentation was in place > for G1GC in mid-late 2012, we were seeing humongous allocation > failures of some 30+ megabytes or so. No wonder that in a busy heap > there was not enough continuous space for object this large ( remember > realloc chain burned 16, 8, 4, 2 etc megabyte sized chunks down to > half of G1HeapRegionSize). > > So we set out to fix it: > > 1) We got performance patch merged into our open source web framework > that slashed reallocations and rewrote our own code that was > generating JSON String to limit reallocation. > 2) Tuning -XX:G1HeapRegionSize once we saw proper explanation of G1GC > in Monica's JavaOne presentation. For whatever reason for ~7-8GB heap > we were getting thousands of G1GC heap regions and we've arrived to > -XX:G1HeapRegionSize=16m after some testing. > > The fact is after all this tuning G1GC has been performing amazingly > for us { -XX:G1HeapRegionSize=16m > -XX:InitiatingHeapOccupancyPercent=33 -XX:MaxGCPauseMillis=250 > -XX:+UseG1GC -Xmx7168m -Xms7168m -XX:MaxPermSize=768m > -XX:ReservedCodeCacheSize=128m }. We haven't seen FullGC in production > for months now and we have to work really hard to generate them in > testing too. Our GC pauses are sub 0.3s, giving us desired web app > performance. > > > Keep up great work! :) > > > Darius. > > > On Sun, May 26, 2013 at 6:20 PM, Martin Makundi > wrote: > > Sorry, forgot to mention, using: > > > > java version "1.7.0_21" > > Java(TM) SE Runtime Environment (build 1.7.0_21-b11) > > Java HotSpot(TM) 64-Bit Server VM (build 23.21-b01, mixed mode) > > > > Linux version 3.0.1.stk64 (dfn at localhost.localdomain) (gcc version > > 4.5.1 20100924 (Red Hat 4.5.1-4) (GCC) ) #1 SMP Sat Aug 13 12:53:46 > > EDT 2011 > > > > -Dclassworlds.conf=/usr/share/maven/maven/bin/m2.conf > > -Dmaven.home=/usr/share/maven/maven > > -Duser.timezone=EET > > -XX:+AggressiveOpts > > -XX:+DisableExplicitGC > > -XX:+ParallelRefProcEnabled > > -XX:+PrintGCDateStamps > > -XX:+PrintGCDetails > > -XX:+PrintHeapAtGC > > -XX:+UseAdaptiveSizePolicy > > -XX:+UseCompressedOops > > -XX:+UseFastAccessorMethods > > -XX:+UseG1GC > > -XX:+UseGCOverheadLimit > > -XX:+UseNUMA > > -XX:+UseStringCache > > -XX:CMSInitiatingOccupancyFraction=70 > > -XX:GCPauseIntervalMillis=10000 > > -XX:InitiatingHeapOccupancyPercent=0 > > -XX:MaxGCPauseMillis=500 > > -XX:MaxPermSize=512m > > -XX:PermSize=512m > > -XX:ReservedCodeCacheSize=48m > > -Xloggc:gc.log > > -Xmaxf1 > > -Xms30G > > -Xmx30G > > -Xnoclassgc > > -Xss4096k > > > > > > ** > > Martin > > > > 2013/5/26 Charlie Hunt : > >> Which version of the JDK/JRE are you using? > >> > >> One of the links you referenced below was using JDK 6, where there is > no official support for G1. The other link suggests it could have been RMI > DGC or a System.gc(). > >> > >> > >> > >> Sent from my iPhone > >> > >> On May 25, 2013, at 11:43 PM, "Martin Makundi" < > martin.makundi at koodaripalvelut.com> wrote: > >> > >>> it occurs daily. > > _______________________________________________ > > hotspot-gc-use mailing list > > hotspot-gc-use at openjdk.java.net > > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20130528/a7619ee6/attachment.html From chunt at salesforce.com Tue May 28 10:39:24 2013 From: chunt at salesforce.com (Charlie Hunt) Date: Tue, 28 May 2013 12:39:24 -0500 Subject: Bug in G1GC it performs Full GC when code cache is full resulting in overkill In-Reply-To: References: <3807DE76-D6CA-451F-AC72-771332825905@salesforce.com> Message-ID: Hi Martin, There's a few cmd line options in your list that you likely don't need. We'll address in a different email. Do you have GC logs you can share that exhibit the "unexpected Full GC" with G1 ? At a minimum several GC events before the Full GC event, and a couple after. thanks, charlie ... On May 26, 2013, at 10:20 AM, Martin Makundi wrote: > Sorry, forgot to mention, using: > > java version "1.7.0_21" > Java(TM) SE Runtime Environment (build 1.7.0_21-b11) > Java HotSpot(TM) 64-Bit Server VM (build 23.21-b01, mixed mode) > > Linux version 3.0.1.stk64 (dfn at localhost.localdomain) (gcc version > 4.5.1 20100924 (Red Hat 4.5.1-4) (GCC) ) #1 SMP Sat Aug 13 12:53:46 > EDT 2011 > > -Dclassworlds.conf=/usr/share/maven/maven/bin/m2.conf > -Dmaven.home=/usr/share/maven/maven > -Duser.timezone=EET > -XX:+AggressiveOpts > -XX:+DisableExplicitGC > -XX:+ParallelRefProcEnabled > -XX:+PrintGCDateStamps > -XX:+PrintGCDetails > -XX:+PrintHeapAtGC > -XX:+UseAdaptiveSizePolicy > -XX:+UseCompressedOops > -XX:+UseFastAccessorMethods > -XX:+UseG1GC > -XX:+UseGCOverheadLimit > -XX:+UseNUMA > -XX:+UseStringCache > -XX:CMSInitiatingOccupancyFraction=70 > -XX:GCPauseIntervalMillis=10000 > -XX:InitiatingHeapOccupancyPercent=0 > -XX:MaxGCPauseMillis=500 > -XX:MaxPermSize=512m > -XX:PermSize=512m > -XX:ReservedCodeCacheSize=48m > -Xloggc:gc.log > -Xmaxf1 > -Xms30G > -Xmx30G > -Xnoclassgc > -Xss4096k > > > ** > Martin > > 2013/5/26 Charlie Hunt : >> Which version of the JDK/JRE are you using? >> >> One of the links you referenced below was using JDK 6, where there is no official support for G1. The other link suggests it could have been RMI DGC or a System.gc(). >> >> >> >> Sent from my iPhone >> >> On May 25, 2013, at 11:43 PM, "Martin Makundi" wrote: >> >>> it occurs daily. From chunt at salesforce.com Tue May 28 11:00:48 2013 From: chunt at salesforce.com (Charlie Hunt) Date: Tue, 28 May 2013 13:00:48 -0500 Subject: Bug in G1GC it performs Full GC when code cache is full resulting in overkill In-Reply-To: References: <3807DE76-D6CA-451F-AC72-771332825905@salesforce.com> Message-ID: Hi Martin, On the subject of cmd line options ... Here's a list of options that I think look a bit questionable, and I'd like to understand why you feel the need to set them: -XX:+UseFastAccessorMethod (the default is disabled) -XX:+UseNUMA (Are you running a JVM that spans NUMA memory nodes? Or, do you have multiple JVMs running on a NUMA or non-NUMA system?) -XX:+UseStringCache (Do you have evidence that this helps? And, do you know what it does?) -XX:CMSInitiatingOccupancyFraction=70 (This is applicable to CMS GC, and not applicable to G1 GC) -XX:GCPauseIntervalMillis=10000 (Would like to understand the justification for setting this, and to a 10 second value. This will impact G1.) -XX:InitiatingHeapOccupancyPercent=0 (You realize this will force G1's concurrent cycle to run continuously?) -Xmaxf1 (I've never seen this used before, can you share what you it does and what you expect it to do?) -noclassgc (This is rarely needed and haven't seen an app that required it for quite some time) These you don't need to set as they are the default with 1.7.0_21 when you specify -XX:+UseG1GC, hence you can remove them: -XX:+UseAdaptiveSizePolicy -XX:+UseCompressedOops (this gets auto-enabled based on the size of the Java with 64-bit JVMs --- and you might realize slightly better performance if you can run with -Xmx26g / -Xms26g, that should give you zero base compressed oops) -XX:+ UseGCOverheadLimit -XX:+ReservedCodeCacheSize=48, that is the default for 7u21. You might consider setting it higher if you have the available space, and more importantly if you think you're running out of code space. thanks, charlie ... On May 26, 2013, at 10:20 AM, Martin Makundi wrote: > Sorry, forgot to mention, using: > > java version "1.7.0_21" > Java(TM) SE Runtime Environment (build 1.7.0_21-b11) > Java HotSpot(TM) 64-Bit Server VM (build 23.21-b01, mixed mode) > > Linux version 3.0.1.stk64 (dfn at localhost.localdomain) (gcc version > 4.5.1 20100924 (Red Hat 4.5.1-4) (GCC) ) #1 SMP Sat Aug 13 12:53:46 > EDT 2011 > > -Dclassworlds.conf=/usr/share/maven/maven/bin/m2.conf > -Dmaven.home=/usr/share/maven/maven > -Duser.timezone=EET > -XX:+AggressiveOpts > -XX:+DisableExplicitGC > -XX:+ParallelRefProcEnabled > -XX:+PrintGCDateStamps > -XX:+PrintGCDetails > -XX:+PrintHeapAtGC > -XX:+UseAdaptiveSizePolicy > -XX:+UseCompressedOops > -XX:+UseFastAccessorMethods > -XX:+UseG1GC > -XX:+UseGCOverheadLimit > -XX:+UseNUMA > -XX:+UseStringCache > -XX:CMSInitiatingOccupancyFraction=70 > -XX:GCPauseIntervalMillis=10000 > -XX:InitiatingHeapOccupancyPercent=0 > -XX:MaxGCPauseMillis=500 > -XX:MaxPermSize=512m > -XX:PermSize=512m > -XX:ReservedCodeCacheSize=48m > -Xloggc:gc.log > -Xmaxf1 > -Xms30G > -Xmx30G > -Xnoclassgc > -Xss4096k > > > ** > Martin > > 2013/5/26 Charlie Hunt : >> Which version of the JDK/JRE are you using? >> >> One of the links you referenced below was using JDK 6, where there is no official support for G1. The other link suggests it could have been RMI DGC or a System.gc(). >> >> >> >> Sent from my iPhone >> >> On May 25, 2013, at 11:43 PM, "Martin Makundi" wrote: >> >>> it occurs daily. From martin.makundi at koodaripalvelut.com Tue May 28 11:35:06 2013 From: martin.makundi at koodaripalvelut.com (Martin Makundi) Date: Tue, 28 May 2013 21:35:06 +0300 Subject: Bug in G1GC it performs Full GC when code cache is full resulting in overkill In-Reply-To: References: <3807DE76-D6CA-451F-AC72-771332825905@salesforce.com> Message-ID: Hi! > On the subject of cmd line options ... Thanks for the detailed feedback, here is what we based our decisions upon: > Here's a list of options that I think look a bit questionable, and I'd like to understand why you feel the need to set them: They are not very clearly documented, so there are a lot of 'shotgun' options. > -XX:+UseFastAccessorMethod (the default is disabled) Fast sounds good, the description of it is "Use optimized versions of GetField" which sounds good. I see no harm in this. > -XX:+UseNUMA (Are you running a JVM that spans NUMA memory nodes? Or, do you have multiple JVMs running on a NUMA or non-NUMA system?) Single JVM 64bit Linux, I do not know the technical details, but switched on based on this sentence: NUMA Performance Metrics When evaluated against the SPEC JBB 2005 benchmark on an 8-chip Opteron machine, NUMA-aware systems showed the following performance increases: 32 bit ? About 30 percent increase in performance with NUMA-aware allocator 64 bit ? About 40 percent increase in performance with NUMA-aware allocator > -XX:+UseStringCache (Do you have evidence that this helps? And, do you know what it does?) I assume it is some sort of string interning solution. Don't know exactly what it does, but our application uses high amount of redundant strings, smaller memory footprint is a good idea. Again, very little documentation about this available but seems straightforward. Haven't benchmarked it personally. > -XX:CMSInitiatingOccupancyFraction=70 (This is applicable to CMS GC, and not applicable to G1 GC) Again, not documented thoroughly where it applies and where not, jvm gave no warning/error about it so we assumed it's valid. > -XX:GCPauseIntervalMillis=10000 (Would like to understand the justification for setting this, and to a 10 second value. This will impact G1.) I understand what matters is the ratio MaxGCPauseMillis/GCPauseIntervalMillis and a larger GCPauseIntervalMillis makes it less aggressive and thus less overhead? > -XX:InitiatingHeapOccupancyPercent=0 (You realize this will force G1's concurrent cycle to run continuously?) Yes, that's what we figured out, we don't want it to sit lazy and end up in a situation where it is required to do a Full GC. This switch was specifically chosen in a situation we had a memory leak and tried to aggressively fight against it before we found the root cause. Maybe we should try without this switch now, and see what effect it has. > -Xmaxf1 (I've never seen this used before, can you share what you it does and what you expect it to do?) Again, referring to previous memory leak issues, we did not want the application to fight with other applications for available memory. Xmaxf1 keeps memory reservation fixed to initial value which is equal to maximum value. > -noclassgc (This is rarely needed and haven't seen an app that required it for quite some time) Jvm 1.6 stopped the world for couple of minutes several times per day while unloading classes, so we used noclassgc to disable that. We do not know if this is necessary for latest 1.7 to avoid class unload pause, but we continued to use this switch and found no harm in it. Can't afford testing that in production ;) > These you don't need to set as they are the default with 1.7.0_21 when you specify -XX:+UseG1GC, hence you can remove them: Some of them are set explicitly just to keep track amidst jvm upgrades. > -XX:+UseAdaptiveSizePolicy > -XX:+UseCompressedOops (this gets auto-enabled based on the size of the Java with 64-bit JVMs --- and you might realize slightly better performance if you can run with -Xmx26g / -Xms26g, that should give you zero base compressed oops) Thanks, good to know, will try that. Is it exactly 26g or bits more or bits less? > -XX:+ UseGCOverheadLimit > -XX:+ReservedCodeCacheSize=48, that is the default for 7u21. You might consider setting it higher if you have the available space, and more importantly if you think you're running out of code space. For our sun jvm linux 64bit 48m is maximum, jvm won't start if higher value. ** Martin > > > On May 26, 2013, at 10:20 AM, Martin Makundi wrote: > >> Sorry, forgot to mention, using: >> >> java version "1.7.0_21" >> Java(TM) SE Runtime Environment (build 1.7.0_21-b11) >> Java HotSpot(TM) 64-Bit Server VM (build 23.21-b01, mixed mode) >> >> Linux version 3.0.1.stk64 (dfn at localhost.localdomain) (gcc version >> 4.5.1 20100924 (Red Hat 4.5.1-4) (GCC) ) #1 SMP Sat Aug 13 12:53:46 >> EDT 2011 >> >> -Dclassworlds.conf=/usr/share/maven/maven/bin/m2.conf >> -Dmaven.home=/usr/share/maven/maven >> -Duser.timezone=EET >> -XX:+AggressiveOpts >> -XX:+DisableExplicitGC >> -XX:+ParallelRefProcEnabled >> -XX:+PrintGCDateStamps >> -XX:+PrintGCDetails >> -XX:+PrintHeapAtGC >> -XX:+UseAdaptiveSizePolicy >> -XX:+UseCompressedOops >> -XX:+UseFastAccessorMethods >> -XX:+UseG1GC >> -XX:+UseGCOverheadLimit >> -XX:+UseNUMA >> -XX:+UseStringCache >> -XX:CMSInitiatingOccupancyFraction=70 >> -XX:GCPauseIntervalMillis=10000 >> -XX:InitiatingHeapOccupancyPercent=0 >> -XX:MaxGCPauseMillis=500 >> -XX:MaxPermSize=512m >> -XX:PermSize=512m >> -XX:ReservedCodeCacheSize=48m >> -Xloggc:gc.log >> -Xmaxf1 >> -Xms30G >> -Xmx30G >> -Xnoclassgc >> -Xss4096k >> >> >> ** >> Martin >> >> 2013/5/26 Charlie Hunt : >>> Which version of the JDK/JRE are you using? >>> >>> One of the links you referenced below was using JDK 6, where there is no official support for G1. The other link suggests it could have been RMI DGC or a System.gc(). >>> >>> >>> >>> Sent from my iPhone >>> >>> On May 25, 2013, at 11:43 PM, "Martin Makundi" wrote: >>> >>>> it occurs daily. > From martin.makundi at koodaripalvelut.com Wed May 29 04:10:21 2013 From: martin.makundi at koodaripalvelut.com (Martin Makundi) Date: Wed, 29 May 2013 14:10:21 +0300 Subject: Bug in G1GC it performs Full GC when code cache is full resulting in overkill In-Reply-To: References: <3807DE76-D6CA-451F-AC72-771332825905@salesforce.com> Message-ID: Hi! These changes resulted in two Full GCs already within 8 hours from deployment: - memory reduction to 26g - removed InitiatingHeapOccupancyPercent=0 Neither change had noticeable effect on performance, we will first put back InitiatingHeapOccupancyPercent=0 to see if it makes a difference. ** Martin 2013/5/28 Martin Makundi : > Hi! > >> On the subject of cmd line options ... > > Thanks for the detailed feedback, here is what we based our decisions upon: > >> Here's a list of options that I think look a bit questionable, and I'd like to understand why you feel the need to set them: > > They are not very clearly documented, so there are a lot of 'shotgun' options. > >> -XX:+UseFastAccessorMethod (the default is disabled) > > Fast sounds good, the description of it is "Use optimized versions of > GetField" which sounds good. I see no harm in this. > >> -XX:+UseNUMA (Are you running a JVM that spans NUMA memory nodes? Or, do you have multiple JVMs running on a NUMA or non-NUMA system?) > > Single JVM 64bit Linux, I do not know the technical details, but > switched on based on this sentence: > NUMA Performance Metrics > > When evaluated against the SPEC JBB 2005 benchmark on an 8-chip > Opteron machine, NUMA-aware systems showed the following performance > increases: > 32 bit ? About 30 percent increase in performance with NUMA-aware allocator > 64 bit ? About 40 percent increase in performance with NUMA-aware allocator > >> -XX:+UseStringCache (Do you have evidence that this helps? And, do you know what it does?) > > I assume it is some sort of string interning solution. Don't know > exactly what it does, but our application uses high amount of > redundant strings, smaller memory footprint is a good idea. Again, > very little documentation about this available but seems > straightforward. Haven't benchmarked it personally. > >> -XX:CMSInitiatingOccupancyFraction=70 (This is applicable to CMS GC, and not applicable to G1 GC) > > Again, not documented thoroughly where it applies and where not, jvm > gave no warning/error about it so we assumed it's valid. > >> -XX:GCPauseIntervalMillis=10000 (Would like to understand the justification for setting this, and to a 10 second value. This will impact G1.) > > I understand what matters is the ratio > MaxGCPauseMillis/GCPauseIntervalMillis and a larger > GCPauseIntervalMillis makes it less aggressive and thus less overhead? > >> -XX:InitiatingHeapOccupancyPercent=0 (You realize this will force G1's concurrent cycle to run continuously?) > > Yes, that's what we figured out, we don't want it to sit lazy and end > up in a situation where it is required to do a Full GC. This switch > was specifically chosen in a situation we had a memory leak and tried > to aggressively fight against it before we found the root cause. Maybe > we should try without this switch now, and see what effect it has. > >> -Xmaxf1 (I've never seen this used before, can you share what you it does and what you expect it to do?) > > Again, referring to previous memory leak issues, we did not want the > application to fight with other applications for available memory. > Xmaxf1 keeps memory reservation fixed to initial value which is equal > to maximum value. > >> -noclassgc (This is rarely needed and haven't seen an app that required it for quite some time) > > Jvm 1.6 stopped the world for couple of minutes several times per day > while unloading classes, so we used noclassgc to disable that. We do > not know if this is necessary for latest 1.7 to avoid class unload > pause, but we continued to use this switch and found no harm in it. > Can't afford testing that in production ;) > >> These you don't need to set as they are the default with 1.7.0_21 when you specify -XX:+UseG1GC, hence you can remove them: > > Some of them are set explicitly just to keep track amidst jvm upgrades. > >> -XX:+UseAdaptiveSizePolicy >> -XX:+UseCompressedOops (this gets auto-enabled based on the size of the Java with 64-bit JVMs --- and you might realize slightly better performance if you can run with -Xmx26g / -Xms26g, that should give you zero base compressed oops) > > Thanks, good to know, will try that. Is it exactly 26g or bits more or > bits less? > >> -XX:+ UseGCOverheadLimit >> -XX:+ReservedCodeCacheSize=48, that is the default for 7u21. You might consider setting it higher if you have the available space, and more importantly if you think you're running out of code space. > > For our sun jvm linux 64bit 48m is maximum, jvm won't start if higher value. > > > ** > Martin > >> >> >> On May 26, 2013, at 10:20 AM, Martin Makundi wrote: >> >>> Sorry, forgot to mention, using: >>> >>> java version "1.7.0_21" >>> Java(TM) SE Runtime Environment (build 1.7.0_21-b11) >>> Java HotSpot(TM) 64-Bit Server VM (build 23.21-b01, mixed mode) >>> >>> Linux version 3.0.1.stk64 (dfn at localhost.localdomain) (gcc version >>> 4.5.1 20100924 (Red Hat 4.5.1-4) (GCC) ) #1 SMP Sat Aug 13 12:53:46 >>> EDT 2011 >>> >>> -Dclassworlds.conf=/usr/share/maven/maven/bin/m2.conf >>> -Dmaven.home=/usr/share/maven/maven >>> -Duser.timezone=EET >>> -XX:+AggressiveOpts >>> -XX:+DisableExplicitGC >>> -XX:+ParallelRefProcEnabled >>> -XX:+PrintGCDateStamps >>> -XX:+PrintGCDetails >>> -XX:+PrintHeapAtGC >>> -XX:+UseAdaptiveSizePolicy >>> -XX:+UseCompressedOops >>> -XX:+UseFastAccessorMethods >>> -XX:+UseG1GC >>> -XX:+UseGCOverheadLimit >>> -XX:+UseNUMA >>> -XX:+UseStringCache >>> -XX:CMSInitiatingOccupancyFraction=70 >>> -XX:GCPauseIntervalMillis=10000 >>> -XX:InitiatingHeapOccupancyPercent=0 >>> -XX:MaxGCPauseMillis=500 >>> -XX:MaxPermSize=512m >>> -XX:PermSize=512m >>> -XX:ReservedCodeCacheSize=48m >>> -Xloggc:gc.log >>> -Xmaxf1 >>> -Xms30G >>> -Xmx30G >>> -Xnoclassgc >>> -Xss4096k >>> >>> >>> ** >>> Martin >>> >>> 2013/5/26 Charlie Hunt : >>>> Which version of the JDK/JRE are you using? >>>> >>>> One of the links you referenced below was using JDK 6, where there is no official support for G1. The other link suggests it could have been RMI DGC or a System.gc(). >>>> >>>> >>>> >>>> Sent from my iPhone >>>> >>>> On May 25, 2013, at 11:43 PM, "Martin Makundi" wrote: >>>> >>>>> it occurs daily. >> From chunt at salesforce.com Wed May 29 08:10:13 2013 From: chunt at salesforce.com (Charlie Hunt) Date: Wed, 29 May 2013 10:10:13 -0500 Subject: Bug in G1GC it performs Full GC when code cache is full resulting in overkill In-Reply-To: References: <3807DE76-D6CA-451F-AC72-771332825905@salesforce.com> Message-ID: <5C892669-3515-4DCC-BB72-96AE89EBE5F8@salesforce.com> A bit of constructive criticism ;-) It would be good practice to set one option at a time and measure its performance to determine whether it improves performance rather than choosing an option because of something you read in text. In short, always measure and reason about whether what you've observed for an improvement or regression makes sense. And, also run multiple times to get a sense of noise versus real improvement or regression. Addl comments embedded below. hths, charlie ... On May 28, 2013, at 1:35 PM, Martin Makundi wrote: > Hi! > >> On the subject of cmd line options ... > > Thanks for the detailed feedback, here is what we based our decisions upon: > >> Here's a list of options that I think look a bit questionable, and I'd like to understand why you feel the need to set them: > > They are not very clearly documented, so there are a lot of 'shotgun' options. > >> -XX:+UseFastAccessorMethod (the default is disabled) > > Fast sounds good, the description of it is "Use optimized versions of > GetField" which sounds good. I see no harm in this. These would be JNI operations. A quick at the HotSpot source suggests UseFastAccessorMethods is mostly confined to interpreter operations. > >> -XX:+UseNUMA (Are you running a JVM that spans NUMA memory nodes? Or, do you have multiple JVMs running on a NUMA or non-NUMA system?) > > Single JVM 64bit Linux, I do not know the technical details, but > switched on based on this sentence: > NUMA Performance Metrics > > When evaluated against the SPEC JBB 2005 benchmark on an 8-chip > Opteron machine, NUMA-aware systems showed the following performance > increases: > 32 bit ? About 30 percent increase in performance with NUMA-aware allocator > 64 bit ? About 40 percent increase in performance with NUMA-aware allocator A bit of missing context here ... the underlying system should be a NUMA system. IIRC, on that particular 8-chip AMD system, there could be as much as two "hops" if you will to access memory on a given node. Key point is that you should use -XX:+UseNUMA only when you are deploying a JVM that spans NUMA nodes. If you're on a system that is not a NUMA architecture, then you shouldn't use it. If you have multiple JVMs on a NUMA system, it would be a better practice to bind those JVMs to a NUMA node (CPU & memory node), unless the two JVMs are so disparate that it doesn't make sense to give an entire NUMA node to one JVM. > >> -XX:+UseStringCache (Do you have evidence that this helps? And, do you know what it does?) > > I assume it is some sort of string interning solution. Don't know > exactly what it does, but our application uses high amount of > redundant strings, smaller memory footprint is a good idea. Again, > very little documentation about this available but seems > straightforward. Haven't benchmarked it personally. I won't go into the details of what it does. I don't think I can say what it does without possibly being at risk of binding separation agreement. I'll just say that you should measure the perf difference with it off versus on if you think it might help. > >> -XX:CMSInitiatingOccupancyFraction=70 (This is applicable to CMS GC, and not applicable to G1 GC) > > Again, not documented thoroughly where it applies and where not, jvm > gave no warning/error about it so we assumed it's valid. There's always the HotSpot source code ;-) It's also quite well documented in various slide ware on the internet. It's also quite well documented in the Java Performance book. :-) > >> -XX:GCPauseIntervalMillis=10000 (Would like to understand the justification for setting this, and to a 10 second value. This will impact G1.) > > I understand what matters is the ratio > MaxGCPauseMillis/GCPauseIntervalMillis and a larger > GCPauseIntervalMillis makes it less aggressive and thus less overhead? That's the intention. But, in practice in work I've done with G1, I rarely find I need to set GCPauseIntervalMillis different from the default. > >> -XX:InitiatingHeapOccupancyPercent=0 (You realize this will force G1's concurrent cycle to run continuously?) > > Yes, that's what we figured out, we don't want it to sit lazy and end > up in a situation where it is required to do a Full GC. This switch > was specifically chosen in a situation we had a memory leak and tried > to aggressively fight against it before we found the root cause. Maybe > we should try without this switch now, and see what effect it has. Having GC logs to see what available head room you have between the initiating of a G1 concurrent cycle and available regions / heap space would be most appropriate. > >> -Xmaxf1 (I've never seen this used before, can you share what you it does and what you expect it to do?) > > Again, referring to previous memory leak issues, we did not want the > application to fight with other applications for available memory. > Xmaxf1 keeps memory reservation fixed to initial value which is equal > to maximum value. Ok > >> -noclassgc (This is rarely needed and haven't seen an app that required it for quite some time) > > Jvm 1.6 stopped the world for couple of minutes several times per day > while unloading classes, so we used noclassgc to disable that. We do > not know if this is necessary for latest 1.7 to avoid class unload > pause, but we continued to use this switch and found no harm in it. > Can't afford testing that in production ;) Haven't seen a case where unloading classes cause a several minute pause. Are you sure your system is not swapping? And, do you have GC logs you can share that illustrate the behavior and that -noclassgc fixed it? > >> These you don't need to set as they are the default with 1.7.0_21 when you specify -XX:+UseG1GC, hence you can remove them: > > Some of them are set explicitly just to keep track amidst jvm upgrades. You can do as you wish. ;-) I tend to like to keep the list of JVM options a short as possible and when migrating to newer versions doing a dump of -XX:+PrintFlagsFinal to get the defaults, and then also checking the default values after selecting the collector I'm gonna use, i.e. -XX:+UseG1GC, and if I'm gonna use -XX:+AggressiveOpts because I know those will also set other options too. That prevents some other command line option changing default values and not noticing it. > >> -XX:+UseAdaptiveSizePolicy >> -XX:+UseCompressedOops (this gets auto-enabled based on the size of the Java with 64-bit JVMs --- and you might realize slightly better performance if you can run with -Xmx26g / -Xms26g, that should give you zero base compressed oops) > > Thanks, good to know, will try that. Is it exactly 26g or bits more or > bits less? Not exactly 26g, but in that area. 26g almost always gives you zero base. I haven't seen on that hasn't. > >> -XX:+ UseGCOverheadLimit >> -XX:ReservedCodeCacheSize=48, that is the default for 7u21. You might consider setting it higher if you have the available space, and more importantly if you think you're running out of code space. > > For our sun jvm linux 64bit 48m is maximum, jvm won't start if higher value. If you can't go larger than -XX:ReservedCodeCacheSize=48m, that may suggest you have memory constraints and may also suggest you don't have enough swap space defined, and you may be experiencing swapping during JVM execution. I've got a Linux system that has 32 GB of RAM, I can set ReservedCodeCacheSize=256m with no issues, even with -Xms30g and -Xms30g. > > > ** > Martin > >> >> >> On May 26, 2013, at 10:20 AM, Martin Makundi wrote: >> >>> Sorry, forgot to mention, using: >>> >>> java version "1.7.0_21" >>> Java(TM) SE Runtime Environment (build 1.7.0_21-b11) >>> Java HotSpot(TM) 64-Bit Server VM (build 23.21-b01, mixed mode) >>> >>> Linux version 3.0.1.stk64 (dfn at localhost.localdomain) (gcc version >>> 4.5.1 20100924 (Red Hat 4.5.1-4) (GCC) ) #1 SMP Sat Aug 13 12:53:46 >>> EDT 2011 >>> >>> -Dclassworlds.conf=/usr/share/maven/maven/bin/m2.conf >>> -Dmaven.home=/usr/share/maven/maven >>> -Duser.timezone=EET >>> -XX:+AggressiveOpts >>> -XX:+DisableExplicitGC >>> -XX:+ParallelRefProcEnabled >>> -XX:+PrintGCDateStamps >>> -XX:+PrintGCDetails >>> -XX:+PrintHeapAtGC >>> -XX:+UseAdaptiveSizePolicy >>> -XX:+UseCompressedOops >>> -XX:+UseFastAccessorMethods >>> -XX:+UseG1GC >>> -XX:+UseGCOverheadLimit >>> -XX:+UseNUMA >>> -XX:+UseStringCache >>> -XX:CMSInitiatingOccupancyFraction=70 >>> -XX:GCPauseIntervalMillis=10000 >>> -XX:InitiatingHeapOccupancyPercent=0 >>> -XX:MaxGCPauseMillis=500 >>> -XX:MaxPermSize=512m >>> -XX:PermSize=512m >>> -XX:ReservedCodeCacheSize=48m >>> -Xloggc:gc.log >>> -Xmaxf1 >>> -Xms30G >>> -Xmx30G >>> -Xnoclassgc >>> -Xss4096k >>> >>> >>> ** >>> Martin >>> >>> 2013/5/26 Charlie Hunt : >>>> Which version of the JDK/JRE are you using? >>>> >>>> One of the links you referenced below was using JDK 6, where there is no official support for G1. The other link suggests it could have been RMI DGC or a System.gc(). >>>> >>>> >>>> >>>> Sent from my iPhone >>>> >>>> On May 25, 2013, at 11:43 PM, "Martin Makundi" wrote: >>>> >>>>> it occurs daily. >> From martin.makundi at koodaripalvelut.com Wed May 29 08:49:46 2013 From: martin.makundi at koodaripalvelut.com (Martin Makundi) Date: Wed, 29 May 2013 18:49:46 +0300 Subject: Bug in G1GC it performs Full GC when code cache is full resulting in overkill In-Reply-To: <5C892669-3515-4DCC-BB72-96AE89EBE5F8@salesforce.com> References: <3807DE76-D6CA-451F-AC72-771332825905@salesforce.com> <5C892669-3515-4DCC-BB72-96AE89EBE5F8@salesforce.com> Message-ID: Hi! > A bit of constructive criticism ;-) It would be good practice to set one option at a time and measure its performance to determine whether it improves performance rather than choosing an option because of something you read in text. In short, always measure and reason about whether what you've observed for an improvement or regression makes sense. And, also run multiple times to get a sense of noise versus real improvement or regression. Thanks. That's one of the reasons we never changed our options. Once we found someting that works very well, we know that its always n! work to test changes and the system was running very nice indeed before the previous tweak ;) >>> -XX:+UseFastAccessorMethod (the default is disabled) >> >> Fast sounds good, the description of it is "Use optimized versions of >> GetField" which sounds good. I see no harm in this. > > These would be JNI operations. > > A quick at the HotSpot source suggests UseFastAccessorMethods > is mostly confined to interpreter operations. Thanks for the info. Doesn't say much to me, but does not seem to harm anything. Will try setting it off at some point in time. >>> -XX:+UseNUMA (Are you running a JVM that spans NUMA memory nodes? >>> Or, do you have multiple JVMs running on a NUMA or non-NUMA system?) >> > > Key point is that you should use -XX:+UseNUMA only when you are > deploying a JVM that spans NUMA nodes. Thanks for the info. Doesn't say much to me, but does not seem to harm anything. Will try setting it off at some point in time. >>> -XX:+UseStringCache (Do you have evidence that this helps? And, do you know what it does?) >> >> I assume it is some sort of string interning solution. Don't know >> exactly what it does, but our application uses high amount of >> redundant strings, smaller memory footprint is a good idea. Again, >> very little documentation about this available but seems >> straightforward. Haven't benchmarked it personally. > > I won't go into the details of what it does. I don't think I can say what it does without possibly being at risk of binding separation agreement. > > I'll just say that you should measure the perf difference with it off versus on if you think it might help. No visible impact on performance, really, in production. If we test it with a bogus test case, well.. results are as informative as our tests are close to the production. Which is unlikely. >>> -XX:CMSInitiatingOccupancyFraction=70 (This is applicable to CMS GC, and not applicable to G1 GC) >> >> Again, not documented thoroughly where it applies and where not, jvm >> gave no warning/error about it so we assumed it's valid. > > There's always the HotSpot source code ;-) > > It's also quite well documented in various slide ware on the internet. > It's also quite well documented in the Java Performance book. :-) Uh.. does it say somewhere that Do not use XX:CMSInitiatingOccupancyFraction with G1GC? ;) I know performance tuning is your bread and butter but is not ours... is more like we are just driving the car and you are the mechanic...different perspective.. just trying to fill'er'up to go..leaded or unleaded... ;) >>> -XX:InitiatingHeapOccupancyPercent=0 (You realize this will force G1's concurrent cycle to run continuously?) >> >> Yes, that's what we figured out, we don't want it to sit lazy and end >> up in a situation where it is required to do a Full GC. This switch >> was specifically chosen in a situation we had a memory leak and tried >> to aggressively fight against it before we found the root cause. Maybe >> we should try without this switch now, and see what effect it has. > > Having GC logs to see what available head room you have between the > initiating of a G1 concurrent cycle and available regions / heap space would be most appropriate. Hmm.. I don't thoroughly understand the logs either, but, here is a snap: 2013-05-29T17:28:56.119+0300: 38905.407: [GC pause (young), 0.29261000 secs] [Parallel Time: 288.8 ms] [GC Worker Start (ms): 38905407.6 38905407.7 38905407.7 38905407.9 Avg: 38905407.7, Min: 38905407.6, Max: 38905407.9, Diff: 0.3] [Ext Root Scanning (ms): 22.8 16.3 18.1 22.0 Avg: 19.8, Min: 16.3, Max: 22.8, Diff: 6.6] [SATB Filtering (ms): 0.0 0.1 0.0 0.0 Avg: 0.0, Min: 0.0, Max: 0.1, Diff: 0.1] [Update RS (ms): 31.9 37.3 35.1 33.3 Avg: 34.4, Min: 31.9, Max: 37.3, Diff: 5.5] [Processed Buffers : 102 106 119 104 Sum: 431, Avg: 107, Min: 102, Max: 119, Diff: 17] [Scan RS (ms): 0.0 0.0 0.1 0.0 Avg: 0.0, Min: 0.0, Max: 0.1, Diff: 0.1] [Object Copy (ms): 228.2 229.1 229.5 227.3 Avg: 228.5, Min: 227.3, Max: 229.5, Diff: 2.2] [Termination (ms): 0.0 0.0 0.0 0.0 Avg: 0.0, Min: 0.0, Max: 0.0, Diff: 0.0] [Termination Attempts : 4 1 11 4 Sum: 20, Avg: 5, Min: 1, Max: 11, Diff: 10] [GC Worker End (ms): 38905690.5 38905690.5 38905690.5 38905690.5 Avg: 38905690.5, Min: 38905690.5, Max: 38905690.5, Diff: 0.0] [GC Worker (ms): 282.9 282.8 282.8 282.6 Avg: 282.8, Min: 282.6, Max: 282.9, Diff: 0.3] [GC Worker Other (ms): 5.9 6.0 6.0 6.2 Avg: 6.0, Min: 5.9, Max: 6.2, Diff: 0.3] [Complete CSet Marking: 0.0 ms] [Clear CT: 0.1 ms] [Other: 3.7 ms] [Choose CSet: 0.0 ms] [Ref Proc: 2.8 ms] [Ref Enq: 0.1 ms] [Free CSet: 0.3 ms] [Eden: 48M(5032M)->0B(5032M) Survivors: 288M->288M Heap: 15790M(26624M)->15741M(26624M)] [Times: user=1.14 sys=0.00, real=0.29 secs] Heap after GC invocations=575 (full 157): garbage-first heap total 27262976K, used 16119181K [0x0000000160000000, 0x00000007e0000000, 0x00000007e0000000) region size 8192K, 36 young (294912K), 36 survivors (294912K) compacting perm gen total 524288K, used 164479K [0x00000007e0000000, 0x0000000800000000, 0x0000000800000000) the space 524288K, 31% used [0x00000007e0000000, 0x00000007ea09ff68, 0x00000007ea0a0000, 0x0000000800000000) No shared spaces configured. } {Heap before GC invocations=575 (full 157): garbage-first heap total 27262976K, used 16119181K [0x0000000160000000, 0x00000007e0000000, 0x00000007e0000000) region size 8192K, 37 young (303104K), 36 survivors (294912K) compacting perm gen total 524288K, used 164479K [0x00000007e0000000, 0x0000000800000000, 0x0000000800000000) the space 524288K, 31% used [0x00000007e0000000, 0x00000007ea09ff68, 0x00000007ea0a0000, 0x0000000800000000) No shared spaces configured. 2013-05-29T17:28:56.413+0300: 38905.701: [Full GC 15742M->14497M(26624M), 56.7731320 secs] That's the third Full GC today after the change to 26G and change from occupancypercent=0. Tomorrow will be trying again with occupancypercent=0 >>> -noclassgc (This is rarely needed and haven't seen an app that required it for quite some time) >> >> Jvm 1.6 stopped the world for couple of minutes several times per day >> while unloading classes, so we used noclassgc to disable that. We do >> not know if this is necessary for latest 1.7 to avoid class unload >> pause, but we continued to use this switch and found no harm in it. >> Can't afford testing that in production ;) > > Haven't seen a case where unloading classes cause a several minute pause. > Are you sure your system is not swapping? And, do you have GC logs you > can share that illustrate the behavior and that -noclassgc fixed it? We deleted swap partition long time ago, we simply do not risk swapping at all. We had this class unloading problem several times per day like half a year ago, and fixed it with noclasssgc, that was a no-brainer, single parameter that made the difference. It is also discussed here (they do not discuss noclassgc though, we figured that out somehow) http://stackoverflow.com/questions/2833983/meaning-of-the-unloading-class-messages >>> -XX:+ UseGCOverheadLimit >>> -XX:ReservedCodeCacheSize=48, that is the default for 7u21. You might consider setting it higher if you have the available space, and more importantly if you think you're running out of code space. >> >> For our sun jvm linux 64bit 48m is maximum, jvm won't start if higher value. > > If you can't go larger than -XX:ReservedCodeCacheSize=48m, that may suggest you have memory constraints and may also suggest you don't have enough swap space defined, and you may be experiencing swapping during JVM execution. I've got a Linux system that has 32 GB of RAM, I can set ReservedCodeCacheSize=256m with no issues, even with -Xms30g and -Xms30g. It is also documented that 48m is maximum http://www.oracle.com/technetwork/java/javase/tech/vmoptions-jsp-140102.html "maximum code cache size. [Solaris 64-bit, amd64, and -server x86: 48m" ** Martin > > >>> >>> On May 26, 2013, at 10:20 AM, Martin Makundi wrote: >>> >>>> Sorry, forgot to mention, using: >>>> >>>> java version "1.7.0_21" >>>> Java(TM) SE Runtime Environment (build 1.7.0_21-b11) >>>> Java HotSpot(TM) 64-Bit Server VM (build 23.21-b01, mixed mode) >>>> >>>> Linux version 3.0.1.stk64 (dfn at localhost.localdomain) (gcc version >>>> 4.5.1 20100924 (Red Hat 4.5.1-4) (GCC) ) #1 SMP Sat Aug 13 12:53:46 >>>> EDT 2011 >>>> >>>> -Dclassworlds.conf=/usr/share/maven/maven/bin/m2.conf >>>> -Dmaven.home=/usr/share/maven/maven >>>> -Duser.timezone=EET >>>> -XX:+AggressiveOpts >>>> -XX:+DisableExplicitGC >>>> -XX:+ParallelRefProcEnabled >>>> -XX:+PrintGCDateStamps >>>> -XX:+PrintGCDetails >>>> -XX:+PrintHeapAtGC >>>> -XX:+UseAdaptiveSizePolicy >>>> -XX:+UseCompressedOops >>>> -XX:+UseFastAccessorMethods >>>> -XX:+UseG1GC >>>> -XX:+UseGCOverheadLimit >>>> -XX:+UseNUMA >>>> -XX:+UseStringCache >>>> -XX:CMSInitiatingOccupancyFraction=70 >>>> -XX:GCPauseIntervalMillis=10000 >>>> -XX:InitiatingHeapOccupancyPercent=0 >>>> -XX:MaxGCPauseMillis=500 >>>> -XX:MaxPermSize=512m >>>> -XX:PermSize=512m >>>> -XX:ReservedCodeCacheSize=48m >>>> -Xloggc:gc.log >>>> -Xmaxf1 >>>> -Xms30G >>>> -Xmx30G >>>> -Xnoclassgc >>>> -Xss4096k >>>> >>>> >>>> ** >>>> Martin >>>> >>>> 2013/5/26 Charlie Hunt : >>>>> Which version of the JDK/JRE are you using? >>>>> >>>>> One of the links you referenced below was using JDK 6, where there is no official support for G1. The other link suggests it could have been RMI DGC or a System.gc(). >>>>> >>>>> >>>>> >>>>> Sent from my iPhone >>>>> >>>>> On May 25, 2013, at 11:43 PM, "Martin Makundi" wrote: >>>>> >>>>>> it occurs daily. >>> > From chunt at salesforce.com Wed May 29 09:31:28 2013 From: chunt at salesforce.com (Charlie Hunt) Date: Wed, 29 May 2013 11:31:28 -0500 Subject: Fwd: Bug in G1GC it performs Full GC when code cache is full resulting in overkill References: <9A54ECD2-B470-4797-9EFB-E81DD91C6B3F@salesforce.com> Message-ID: forgot to cc hotspot-gc-use ... will try to remember on future replies. Begin forwarded message: > From: Charlie Hunt > Subject: Re: Bug in G1GC it performs Full GC when code cache is full resulting in overkill > Date: May 29, 2013 11:28:35 AM CDT > To: Martin Makundi > > Couple comments below. > > charlie ... > > On May 29, 2013, at 10:49 AM, Martin Makundi wrote: > >> Hi! >> >>> A bit of constructive criticism ;-) It would be good practice to set one option at a time and measure its performance to determine whether it improves performance rather than choosing an option because of something you read in text. In short, always measure and reason about whether what you've observed for an improvement or regression makes sense. And, also run multiple times to get a sense of noise versus real improvement or regression. >> >> >> Thanks. That's one of the reasons we never changed our options. Once >> we found someting that works very well, we know that its always n! >> work to test changes and the system was running very nice indeed >> before the previous tweak ;) >> >>>>> -XX:+UseFastAccessorMethod (the default is disabled) >>>> >>>> Fast sounds good, the description of it is "Use optimized versions of >>>> GetField" which sounds good. I see no harm in this. >>> >>> These would be JNI operations. >>> >>> A quick at the HotSpot source suggests UseFastAccessorMethods >>> is mostly confined to interpreter operations. >> >> Thanks for the info. Doesn't say much to me, but does not seem to harm >> anything. Will try setting it off at some point in time. >> >>>>> -XX:+UseNUMA (Are you running a JVM that spans NUMA memory nodes? >>>>> Or, do you have multiple JVMs running on a NUMA or non-NUMA system?) >>>> >>> >>> Key point is that you should use -XX:+UseNUMA only when you are >>> deploying a JVM that spans NUMA nodes. >> >> Thanks for the info. Doesn't say much to me, but does not seem to harm >> anything. Will try setting it off at some point in time. > > You either have a NUMA system and deploying a single JVM on it, or you're on a non-NUMA system. > > Come to think of it, you're on a Linux system. I don't recall the exact details of how the numa-allocator works on Linux. I was first thinking of it in terms of how it's handled on Solaris with Solaris lgroups. I won't go into that. :-) On Linux it may just do round robin .. would have to go look again at the HotSpot source to see what it does on Linux. Depending on what it does, you may not see any difference with it. However, that could change in the future and you could be caught off guard with such a change. ;-) > >> >>>>> -XX:+UseStringCache (Do you have evidence that this helps? And, do you know what it does?) >>>> >>>> I assume it is some sort of string interning solution. Don't know >>>> exactly what it does, but our application uses high amount of >>>> redundant strings, smaller memory footprint is a good idea. Again, >>>> very little documentation about this available but seems >>>> straightforward. Haven't benchmarked it personally. >>> >>> I won't go into the details of what it does. I don't think I can say what it does without possibly being at risk of binding separation agreement. >>> >>> I'll just say that you should measure the perf difference with it off versus on if you think it might help. >> >> No visible impact on performance, really, in production. If we test it >> with a bogus test case, well.. results are as informative as our tests >> are close to the production. Which is unlikely. >> >>>>> -XX:CMSInitiatingOccupancyFraction=70 (This is applicable to CMS GC, and not applicable to G1 GC) >>>> >>>> Again, not documented thoroughly where it applies and where not, jvm >>>> gave no warning/error about it so we assumed it's valid. >>> >>> There's always the HotSpot source code ;-) >>> >>> It's also quite well documented in various slide ware on the internet. >>> It's also quite well documented in the Java Performance book. :-) >> >> Uh.. does it say somewhere that Do not use >> XX:CMSInitiatingOccupancyFraction with G1GC? ;) I know performance >> tuning is your bread and butter but is not ours... is more like we are >> just driving the car and you are the mechanic...different >> perspective.. just trying to fill'er'up to go..leaded or unleaded... >> ;) > > Well, uh, the command line options says "CMS" in it. Isn't that enough to imply that it's CMS specific? Additionally, if the description says, "The percent of old generation space occupancy at which the first CMS garbage collection cycle should start. Subsequent starts of the CMS cycle are determined at a HotSpot ergonomically computed occupancy.", isn't that enough to imply it's CMS GC specific? That description comes directly from Java Performance. > > To use your analogy, would you put diesel fuel in your gasoline powered vehicle when the label at the pump says "diesel fuel"? > >> >>>>> -XX:InitiatingHeapOccupancyPercent=0 (You realize this will force G1's concurrent cycle to run continuously?) >>>> >>>> Yes, that's what we figured out, we don't want it to sit lazy and end >>>> up in a situation where it is required to do a Full GC. This switch >>>> was specifically chosen in a situation we had a memory leak and tried >>>> to aggressively fight against it before we found the root cause. Maybe >>>> we should try without this switch now, and see what effect it has. >>> >>> Having GC logs to see what available head room you have between the >>> initiating of a G1 concurrent cycle and available regions / heap space would be most appropriate. >> >> Hmm.. I don't thoroughly understand the logs either, but, here is a snap: >> >> 2013-05-29T17:28:56.119+0300: 38905.407: [GC pause (young), 0.29261000 secs] >> [Parallel Time: 288.8 ms] >> [GC Worker Start (ms): 38905407.6 38905407.7 38905407.7 38905407.9 >> Avg: 38905407.7, Min: 38905407.6, Max: 38905407.9, Diff: 0.3] >> [Ext Root Scanning (ms): 22.8 16.3 18.1 22.0 >> Avg: 19.8, Min: 16.3, Max: 22.8, Diff: 6.6] >> [SATB Filtering (ms): 0.0 0.1 0.0 0.0 >> Avg: 0.0, Min: 0.0, Max: 0.1, Diff: 0.1] >> [Update RS (ms): 31.9 37.3 35.1 33.3 >> Avg: 34.4, Min: 31.9, Max: 37.3, Diff: 5.5] >> [Processed Buffers : 102 106 119 104 >> Sum: 431, Avg: 107, Min: 102, Max: 119, Diff: 17] >> [Scan RS (ms): 0.0 0.0 0.1 0.0 >> Avg: 0.0, Min: 0.0, Max: 0.1, Diff: 0.1] >> [Object Copy (ms): 228.2 229.1 229.5 227.3 >> Avg: 228.5, Min: 227.3, Max: 229.5, Diff: 2.2] >> [Termination (ms): 0.0 0.0 0.0 0.0 >> Avg: 0.0, Min: 0.0, Max: 0.0, Diff: 0.0] >> [Termination Attempts : 4 1 11 4 >> Sum: 20, Avg: 5, Min: 1, Max: 11, Diff: 10] >> [GC Worker End (ms): 38905690.5 38905690.5 38905690.5 38905690.5 >> Avg: 38905690.5, Min: 38905690.5, Max: 38905690.5, Diff: 0.0] >> [GC Worker (ms): 282.9 282.8 282.8 282.6 >> Avg: 282.8, Min: 282.6, Max: 282.9, Diff: 0.3] >> [GC Worker Other (ms): 5.9 6.0 6.0 6.2 >> Avg: 6.0, Min: 5.9, Max: 6.2, Diff: 0.3] >> [Complete CSet Marking: 0.0 ms] >> [Clear CT: 0.1 ms] >> [Other: 3.7 ms] >> [Choose CSet: 0.0 ms] >> [Ref Proc: 2.8 ms] >> [Ref Enq: 0.1 ms] >> [Free CSet: 0.3 ms] >> [Eden: 48M(5032M)->0B(5032M) Survivors: 288M->288M Heap: >> 15790M(26624M)->15741M(26624M)] >> [Times: user=1.14 sys=0.00, real=0.29 secs] >> Heap after GC invocations=575 (full 157): >> garbage-first heap total 27262976K, used 16119181K >> [0x0000000160000000, 0x00000007e0000000, 0x00000007e0000000) >> region size 8192K, 36 young (294912K), 36 survivors (294912K) >> compacting perm gen total 524288K, used 164479K [0x00000007e0000000, >> 0x0000000800000000, 0x0000000800000000) >> the space 524288K, 31% used [0x00000007e0000000, >> 0x00000007ea09ff68, 0x00000007ea0a0000, 0x0000000800000000) >> No shared spaces configured. >> } >> {Heap before GC invocations=575 (full 157): >> garbage-first heap total 27262976K, used 16119181K >> [0x0000000160000000, 0x00000007e0000000, 0x00000007e0000000) >> region size 8192K, 37 young (303104K), 36 survivors (294912K) >> compacting perm gen total 524288K, used 164479K [0x00000007e0000000, >> 0x0000000800000000, 0x0000000800000000) >> the space 524288K, 31% used [0x00000007e0000000, >> 0x00000007ea09ff68, 0x00000007ea0a0000, 0x0000000800000000) >> No shared spaces configured. >> 2013-05-29T17:28:56.413+0300: 38905.701: [Full GC >> 15742M->14497M(26624M), 56.7731320 secs] >> >> That's the third Full GC today after the change to 26G and change from >> occupancypercent=0. Tomorrow will be trying again with >> occupancypercent=0 > > I saw your follow-up post with additional GC logs. Thanks! That'll really help! > >> >>>>> -noclassgc (This is rarely needed and haven't seen an app that required it for quite some time) >>>> >>>> Jvm 1.6 stopped the world for couple of minutes several times per day >>>> while unloading classes, so we used noclassgc to disable that. We do >>>> not know if this is necessary for latest 1.7 to avoid class unload >>>> pause, but we continued to use this switch and found no harm in it. >>>> Can't afford testing that in production ;) >>> >>> Haven't seen a case where unloading classes cause a several minute pause. >>> Are you sure your system is not swapping? And, do you have GC logs you >>> can share that illustrate the behavior and that -noclassgc fixed it? >> >> We deleted swap partition long time ago, we simply do not risk swapping at all. > > You may need some additional swap, or you'll need additional memory for backing reserved space even though the application may not use it. > >> >> We had this class unloading problem several times per day like half a >> year ago, and fixed it with noclasssgc, that was a no-brainer, single >> parameter that made the difference. > > Ok, if you're convinced it fixes your issue, then use it. :-) Usually class unloading issues generally implies perm gen size needs increases, or initial perm gen size could use increasing as an alternative. > >> >> It is also discussed here (they do not discuss noclassgc though, we >> figured that out somehow) >> http://stackoverflow.com/questions/2833983/meaning-of-the-unloading-class-messages >> >>>>> -XX:+ UseGCOverheadLimit >>>>> -XX:ReservedCodeCacheSize=48, that is the default for 7u21. You might consider setting it higher if you have the available space, and more importantly if you think you're running out of code space. >>>> >>>> For our sun jvm linux 64bit 48m is maximum, jvm won't start if higher value. >>> >>> If you can't go larger than -XX:ReservedCodeCacheSize=48m, that may suggest you have memory constraints and may also suggest you don't have enough swap space defined, and you may be experiencing swapping during JVM execution. I've got a Linux system that has 32 GB of RAM, I can set ReservedCodeCacheSize=256m with no issues, even with -Xms30g and -Xms30g. >> >> It is also documented that 48m is maximum >> http://www.oracle.com/technetwork/java/javase/tech/vmoptions-jsp-140102.html >> "maximum code cache size. [Solaris 64-bit, amd64, and -server x86: >> 48m" >> > > The documentation means that the command line option sets the maximum code cache size, not that it is the absolute maximum you can set. Rather it sets the default maximum code cache size. It's not any different than setting -Xmx, there is a default -Xmx value if you don't set one, and you can specify a larger one using -Xmx. > > Fwiw, here's the output from 7u21 on my Linux x64 system with 32 GB of RAM, notice I also specified +AlwaysPreTouch which forces every page to be touched as part of the command execution to illustrate the memory has been reserved, committed and touched: > $ java -Xmx26g -Xms26g -XX:+AlwaysPreTouch -XX:ReservedCodeCacheSize=256m -version > java version "1.7.0_21" > Java(TM) SE Runtime Environment (build 1.7.0_21-b11) > Java HotSpot(TM) 64-Bit Server VM (build 23.21-b01, mixed mode) > > >> ** >> Martin >> >>> >>> >>>>> >>>>> On May 26, 2013, at 10:20 AM, Martin Makundi wrote: >>>>> >>>>>> Sorry, forgot to mention, using: >>>>>> >>>>>> java version "1.7.0_21" >>>>>> Java(TM) SE Runtime Environment (build 1.7.0_21-b11) >>>>>> Java HotSpot(TM) 64-Bit Server VM (build 23.21-b01, mixed mode) >>>>>> >>>>>> Linux version 3.0.1.stk64 (dfn at localhost.localdomain) (gcc version >>>>>> 4.5.1 20100924 (Red Hat 4.5.1-4) (GCC) ) #1 SMP Sat Aug 13 12:53:46 >>>>>> EDT 2011 >>>>>> >>>>>> -Dclassworlds.conf=/usr/share/maven/maven/bin/m2.conf >>>>>> -Dmaven.home=/usr/share/maven/maven >>>>>> -Duser.timezone=EET >>>>>> -XX:+AggressiveOpts >>>>>> -XX:+DisableExplicitGC >>>>>> -XX:+ParallelRefProcEnabled >>>>>> -XX:+PrintGCDateStamps >>>>>> -XX:+PrintGCDetails >>>>>> -XX:+PrintHeapAtGC >>>>>> -XX:+UseAdaptiveSizePolicy >>>>>> -XX:+UseCompressedOops >>>>>> -XX:+UseFastAccessorMethods >>>>>> -XX:+UseG1GC >>>>>> -XX:+UseGCOverheadLimit >>>>>> -XX:+UseNUMA >>>>>> -XX:+UseStringCache >>>>>> -XX:CMSInitiatingOccupancyFraction=70 >>>>>> -XX:GCPauseIntervalMillis=10000 >>>>>> -XX:InitiatingHeapOccupancyPercent=0 >>>>>> -XX:MaxGCPauseMillis=500 >>>>>> -XX:MaxPermSize=512m >>>>>> -XX:PermSize=512m >>>>>> -XX:ReservedCodeCacheSize=48m >>>>>> -Xloggc:gc.log >>>>>> -Xmaxf1 >>>>>> -Xms30G >>>>>> -Xmx30G >>>>>> -Xnoclassgc >>>>>> -Xss4096k >>>>>> >>>>>> >>>>>> ** >>>>>> Martin >>>>>> >>>>>> 2013/5/26 Charlie Hunt : >>>>>>> Which version of the JDK/JRE are you using? >>>>>>> >>>>>>> One of the links you referenced below was using JDK 6, where there is no official support for G1. The other link suggests it could have been RMI DGC or a System.gc(). >>>>>>> >>>>>>> >>>>>>> >>>>>>> Sent from my iPhone >>>>>>> >>>>>>> On May 25, 2013, at 11:43 PM, "Martin Makundi" wrote: >>>>>>> >>>>>>>> it occurs daily. >>>>> >>> > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20130529/9978f7a5/attachment-0001.html From martin.makundi at koodaripalvelut.com Wed May 29 09:47:17 2013 From: martin.makundi at koodaripalvelut.com (Martin Makundi) Date: Wed, 29 May 2013 19:47:17 +0300 Subject: Bug in G1GC it performs Full GC when code cache is full resulting in overkill In-Reply-To: References: <9A54ECD2-B470-4797-9EFB-E81DD91C6B3F@salesforce.com> Message-ID: Hi! > To use your analogy, would you put diesel fuel in your gasoline powered > vehicle when the label at the pump says "diesel fuel"? Well, eh.. to use your analogy, we are like the old lady who distinguishes cars by color only =) You are looking down on us from a high tower...was it 12 years JVM tuning? ..huh..we are yet just toddlers... ;) > You may need some additional swap, or you'll need additional memory for > backing reserved space even though the application may not use it. We had memory fixed so swapping should not be an issue if the problem is inside jvm..which seems to be. We have free memory on linux side so physical memory as such is not a problem. >> We had this class unloading problem several times per day like half a >> year ago, and fixed it with noclasssgc, that was a no-brainer, single >> parameter that made the difference. > > Ok, if you're convinced it fixes your issue, then use it. :-) Usually class > unloading issues generally implies perm gen size needs increases, or initial > perm gen size could use increasing as an alternative. Also permgen was measured at that time and it was not an issue, there was plenty of permgen space, but probably some unreferenced loaded classes. I read somewhere that if your code uses lots of reflection it might generate some moss...anyways, works now so I quit guessing =) > -XX:ReservedCodeCacheSize=48, that is the default for 7u21. You might > consider setting it higher if you have the available space, and more > importantly if you think you're running out of code space. > > For our sun jvm linux 64bit 48m is maximum, jvm won't start if higher value. > > > If you can't go larger than -XX:ReservedCodeCacheSize=48m, that may suggest > you have memory constraints and may also suggest you don't have enough swap > space defined, and you may be experiencing swapping during JVM execution. > I've got a Linux system that has 32 GB of RAM, I can set > ReservedCodeCacheSize=256m with no issues, even with -Xms30g and -Xms30g. > > > It is also documented that 48m is maximum > > http://www.oracle.com/technetwork/java/javase/tech/vmoptions-jsp-140102.html > > "maximum code cache size. [Solaris 64-bit, amd64, and -server x86: > > 48m" > > > > The documentation means that the command line option sets the maximum code > cache size, not that it is the absolute maximum you can set. Rather it sets > the default maximum code cache size. It's not any different than setting > -Xmx, there is a default -Xmx value if you don't set one, and you can > specify a larger one using -Xmx. > > Fwiw, here's the output from 7u21 on my Linux x64 system with 32 GB of RAM, > notice I also specified +AlwaysPreTouch which forces every page to be > touched as part of the command execution to illustrate the memory has been > reserved, committed and touched: > $ java -Xmx26g -Xms26g -XX:+AlwaysPreTouch -XX:ReservedCodeCacheSize=256m > -version > java version "1.7.0_21" > Java(TM) SE Runtime Environment (build 1.7.0_21-b11) > Java HotSpot(TM) 64-Bit Server VM (build 23.21-b01, mixed mode) Ok, you are right, it seems to work. What do you recommend for code cache size or how to find a good value for it? ** Martin > > > > > > On May 26, 2013, at 10:20 AM, Martin Makundi wrote: > > > Sorry, forgot to mention, using: > > > java version "1.7.0_21" > > Java(TM) SE Runtime Environment (build 1.7.0_21-b11) > > Java HotSpot(TM) 64-Bit Server VM (build 23.21-b01, mixed mode) > > > Linux version 3.0.1.stk64 (dfn at localhost.localdomain) (gcc version > > 4.5.1 20100924 (Red Hat 4.5.1-4) (GCC) ) #1 SMP Sat Aug 13 12:53:46 > > EDT 2011 > > > -Dclassworlds.conf=/usr/share/maven/maven/bin/m2.conf > > -Dmaven.home=/usr/share/maven/maven > > -Duser.timezone=EET > > -XX:+AggressiveOpts > > -XX:+DisableExplicitGC > > -XX:+ParallelRefProcEnabled > > -XX:+PrintGCDateStamps > > -XX:+PrintGCDetails > > -XX:+PrintHeapAtGC > > -XX:+UseAdaptiveSizePolicy > > -XX:+UseCompressedOops > > -XX:+UseFastAccessorMethods > > -XX:+UseG1GC > > -XX:+UseGCOverheadLimit > > -XX:+UseNUMA > > -XX:+UseStringCache > > -XX:CMSInitiatingOccupancyFraction=70 > > -XX:GCPauseIntervalMillis=10000 > > -XX:InitiatingHeapOccupancyPercent=0 > > -XX:MaxGCPauseMillis=500 > > -XX:MaxPermSize=512m > > -XX:PermSize=512m > > -XX:ReservedCodeCacheSize=48m > > -Xloggc:gc.log > > -Xmaxf1 > > -Xms30G > > -Xmx30G > > -Xnoclassgc > > -Xss4096k > > > > ** > > Martin > > > 2013/5/26 Charlie Hunt : > > Which version of the JDK/JRE are you using? > > > One of the links you referenced below was using JDK 6, where there is no > official support for G1. The other link suggests it could have been RMI DGC > or a System.gc(). > > > > > Sent from my iPhone > > > On May 25, 2013, at 11:43 PM, "Martin Makundi" > wrote: > > > it occurs daily. > > > > > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > From chunt at salesforce.com Wed May 29 10:25:35 2013 From: chunt at salesforce.com (Charlie Hunt) Date: Wed, 29 May 2013 12:25:35 -0500 Subject: Bug in G1GC it performs Full GC when code cache is full resulting in overkill In-Reply-To: References: <9A54ECD2-B470-4797-9EFB-E81DD91C6B3F@salesforce.com> Message-ID: <6208C497-21E4-4A26-B825-13F23E702898@salesforce.com> On May 29, 2013, at 11:47 AM, Martin Makundi wrote: > Hi! > >> To use your analogy, would you put diesel fuel in your gasoline powered >> vehicle when the label at the pump says "diesel fuel"? > > Well, eh.. to use your analogy, we are like the old lady who > distinguishes cars by color only =) You are looking down on us from a > high tower...was it 12 years JVM tuning? ..huh..we are yet just > toddlers... ;) > >> You may need some additional swap, or you'll need additional memory for >> backing reserved space even though the application may not use it. > > We had memory fixed so swapping should not be an issue if the problem > is inside jvm..which seems to be. > > We have free memory on linux side so physical memory as such is not a problem. Having free memory is one thing. How much free memory do you have? Do you have 2x more than your Java heap size, including -Xmx and what you're specifying for MaxPermSize? > >>> We had this class unloading problem several times per day like half a >>> year ago, and fixed it with noclasssgc, that was a no-brainer, single >>> parameter that made the difference. >> >> Ok, if you're convinced it fixes your issue, then use it. :-) Usually class >> unloading issues generally implies perm gen size needs increases, or initial >> perm gen size could use increasing as an alternative. > > Also permgen was measured at that time and it was not an issue, there > was plenty of permgen space, but probably some unreferenced loaded > classes. I read somewhere that if your code uses lots of reflection > it might generate some moss...anyways, works now so I quit guessing =) Question is did you set both -XX:PermSize and -XX:MaxPermSize to the same value? If not, if perm gen needs to expand from the initial size, or gets close to that initial size, the JVM may attempt to unload classes to free up space prior to expanding perm gen. But, if you'd rather use -noclassgc, then go ahead. :-) > >> -XX:ReservedCodeCacheSize=48, that is the default for 7u21. You might >> consider setting it higher if you have the available space, and more >> importantly if you think you're running out of code space. >> >> For our sun jvm linux 64bit 48m is maximum, jvm won't start if higher value. >> >> >> If you can't go larger than -XX:ReservedCodeCacheSize=48m, that may suggest >> you have memory constraints and may also suggest you don't have enough swap >> space defined, and you may be experiencing swapping during JVM execution. >> I've got a Linux system that has 32 GB of RAM, I can set >> ReservedCodeCacheSize=256m with no issues, even with -Xms30g and -Xms30g. >> >> >> It is also documented that 48m is maximum >> >> http://www.oracle.com/technetwork/java/javase/tech/vmoptions-jsp-140102.html >> >> "maximum code cache size. [Solaris 64-bit, amd64, and -server x86: >> >> 48m" >> >> >> >> The documentation means that the command line option sets the maximum code >> cache size, not that it is the absolute maximum you can set. Rather it sets >> the default maximum code cache size. It's not any different than setting >> -Xmx, there is a default -Xmx value if you don't set one, and you can >> specify a larger one using -Xmx. >> >> Fwiw, here's the output from 7u21 on my Linux x64 system with 32 GB of RAM, >> notice I also specified +AlwaysPreTouch which forces every page to be >> touched as part of the command execution to illustrate the memory has been >> reserved, committed and touched: >> $ java -Xmx26g -Xms26g -XX:+AlwaysPreTouch -XX:ReservedCodeCacheSize=256m >> -version >> java version "1.7.0_21" >> Java(TM) SE Runtime Environment (build 1.7.0_21-b11) >> Java HotSpot(TM) 64-Bit Server VM (build 23.21-b01, mixed mode) > > Ok, you are right, it seems to work. What do you recommend for code > cache size or how to find a good value for it? There's not magic number of what to increase it too. It's probably more magical than suggesting a Java heap configuration size to use. We don't know how much code you've got and how much of it will be executed enough times to compile, or require a de-opt and re-opt. However, you can monitor the occupancy of code cache in a couple different ways. There's a JMX MBean for the code cache where you can get the occupancy and size of code cache. There's also a plug-in for VisualVM that monitors code cache size and occupancy, you get a copy of the plug-in and install it into VisualVM. The web site for it is: https://java.net/projects/memorypoolview. If you're monitoring code cache occupancy in production today, you should probably put it on your short list. In JDK 7u21, code cache flushing is enabled by default. So when code cache approaches getting full, it will attempt to flush the oldest compilations to make available space. I noticed some recent commits for JDK 8 that improve the behavior of code cache flushing. If your application requires a huge amount of code cache space, flushing may not be the best option. Ideally, you'd like to not have to rely on flushing. You can disable code cache flushing by using -XX:-UseCodeCacheFlushing. But, realize that if you run out of code cache space, JIT compilation will be halted. If you monitor code cache with code cache flushing enabled and you see code cache occupancy grow near capacity and drop back, it's a symptom that code cache flushing is taken place and new (JIT compilation) activations are occurring. If you disable code cache flushing, and you see code cache occupancy grow near capacity and you've noticed application throughput slows down, or your response times have increased, you can expect that code cache space has been exhausted. If you have code cache flushing disabled, if you run out of code cache, you'll get a message in your log that says: CodeCache is full. Compiler has been disabled. Try increasing the code cache size using -XX:ReservedCodeCacheSize= > ** > Martin > > >> >> >> >> >> >> On May 26, 2013, at 10:20 AM, Martin Makundi wrote: >> >> >> Sorry, forgot to mention, using: >> >> >> java version "1.7.0_21" >> >> Java(TM) SE Runtime Environment (build 1.7.0_21-b11) >> >> Java HotSpot(TM) 64-Bit Server VM (build 23.21-b01, mixed mode) >> >> >> Linux version 3.0.1.stk64 (dfn at localhost.localdomain) (gcc version >> >> 4.5.1 20100924 (Red Hat 4.5.1-4) (GCC) ) #1 SMP Sat Aug 13 12:53:46 >> >> EDT 2011 >> >> >> -Dclassworlds.conf=/usr/share/maven/maven/bin/m2.conf >> >> -Dmaven.home=/usr/share/maven/maven >> >> -Duser.timezone=EET >> >> -XX:+AggressiveOpts >> >> -XX:+DisableExplicitGC >> >> -XX:+ParallelRefProcEnabled >> >> -XX:+PrintGCDateStamps >> >> -XX:+PrintGCDetails >> >> -XX:+PrintHeapAtGC >> >> -XX:+UseAdaptiveSizePolicy >> >> -XX:+UseCompressedOops >> >> -XX:+UseFastAccessorMethods >> >> -XX:+UseG1GC >> >> -XX:+UseGCOverheadLimit >> >> -XX:+UseNUMA >> >> -XX:+UseStringCache >> >> -XX:CMSInitiatingOccupancyFraction=70 >> >> -XX:GCPauseIntervalMillis=10000 >> >> -XX:InitiatingHeapOccupancyPercent=0 >> >> -XX:MaxGCPauseMillis=500 >> >> -XX:MaxPermSize=512m >> >> -XX:PermSize=512m >> >> -XX:ReservedCodeCacheSize=48m >> >> -Xloggc:gc.log >> >> -Xmaxf1 >> >> -Xms30G >> >> -Xmx30G >> >> -Xnoclassgc >> >> -Xss4096k >> >> >> >> ** >> >> Martin >> >> >> 2013/5/26 Charlie Hunt : >> >> Which version of the JDK/JRE are you using? >> >> >> One of the links you referenced below was using JDK 6, where there is no >> official support for G1. The other link suggests it could have been RMI DGC >> or a System.gc(). >> >> >> >> >> Sent from my iPhone >> >> >> On May 25, 2013, at 11:43 PM, "Martin Makundi" >> wrote: >> >> >> it occurs daily. >> >> >> >> >> >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20130529/dd5f29d1/attachment-0001.html From john.cuthbertson at oracle.com Wed May 29 10:35:27 2013 From: john.cuthbertson at oracle.com (John Cuthbertson) Date: Wed, 29 May 2013 10:35:27 -0700 Subject: Bug in G1GC it performs Full GC when code cache is full resulting in overkill In-Reply-To: References: <3807DE76-D6CA-451F-AC72-771332825905@salesforce.com> <5C892669-3515-4DCC-BB72-96AE89EBE5F8@salesforce.com> Message-ID: <51A63C5F.9040808@oracle.com> Hi Martin, I'm going to fill in bit more detail to Charlie's replies.... On 5/29/2013 8:49 AM, Martin Makundi wrote: > Hi! > >> A bit of constructive criticism ;-) It would be good practice to set one option at a time and measure its performance to determine whether it improves performance rather than choosing an option because of something you read in text. In short, always measure and reason about whether what you've observed for an improvement or regression makes sense. And, also run multiple times to get a sense of noise versus real improvement or regression. > > Thanks. That's one of the reasons we never changed our options. Once > we found someting that works very well, we know that its always n! > work to test changes and the system was running very nice indeed > before the previous tweak ;) > >>>> -XX:+UseFastAccessorMethod (the default is disabled) >>> Fast sounds good, the description of it is "Use optimized versions of >>> GetField" which sounds good. I see no harm in this. >> These would be JNI operations. >> >> A quick at the HotSpot source suggests UseFastAccessorMethods >> is mostly confined to interpreter operations. > Thanks for the info. Doesn't say much to me, but does not seem to harm > anything. Will try setting it off at some point in time. > >>>> -XX:+UseNUMA (Are you running a JVM that spans NUMA memory nodes? >>>> Or, do you have multiple JVMs running on a NUMA or non-NUMA system?) >> Key point is that you should use -XX:+UseNUMA only when you are >> deploying a JVM that spans NUMA nodes. > Thanks for the info. Doesn't say much to me, but does not seem to harm > anything. Will try setting it off at some point in time. The fast accessor methods flag creates specialized (i.e. short and optimized) interpreter entry points for accessor methods (those that just return the value in one of the object's fields). In most applications the bulk of the execution time is spent executing JIT compiled code; only a few percent is typically spent in Hotspot's interpreter. The JIT compiler will always try to inline accessor methods into their caller. So, unless your application is spending a ton of time interpreting, this flag should make no difference. >>>> -XX:CMSInitiatingOccupancyFraction=70 (This is applicable to CMS GC, and not applicable to G1 GC) >>> Again, not documented thoroughly where it applies and where not, jvm >>> gave no warning/error about it so we assumed it's valid. >> There's always the HotSpot source code ;-) >> >> It's also quite well documented in various slide ware on the internet. >> It's also quite well documented in the Java Performance book. :-) > Uh.. does it say somewhere that Do not use > XX:CMSInitiatingOccupancyFraction with G1GC? ;) I know performance > tuning is your bread and butter but is not ours... is more like we are > just driving the car and you are the mechanic...different > perspective.. just trying to fill'er'up to go..leaded or unleaded... > ;) The G1 equivalent of this flag is InitiatingHeapOccupancyPercent (IHOP for short). Actually both G1 and CMS accept and observe IHOP. CMSInitiatingOccupancyFraction was superseded by IHOP. The JVM still accepts the old flag name - but it is CMS only and doesn't affect G1. >>>> -XX:InitiatingHeapOccupancyPercent=0 (You realize this will force G1's concurrent cycle to run continuously?) >>> Yes, that's what we figured out, we don't want it to sit lazy and end >>> up in a situation where it is required to do a Full GC. This switch >>> was specifically chosen in a situation we had a memory leak and tried >>> to aggressively fight against it before we found the root cause. Maybe >>> we should try without this switch now, and see what effect it has. >> Having GC logs to see what available head room you have between the >> initiating of a G1 concurrent cycle and available regions / heap space would be most appropriate. > Hmm.. I don't thoroughly understand the logs either, but, here is a snap: > > 2013-05-29T17:28:56.119+0300: 38905.407: [GC pause (young), 0.29261000 secs] > [Parallel Time: 288.8 ms] > [GC Worker Start (ms): 38905407.6 38905407.7 38905407.7 38905407.9 > Avg: 38905407.7, Min: 38905407.6, Max: 38905407.9, Diff: 0.3] > [Ext Root Scanning (ms): 22.8 16.3 18.1 22.0 > Avg: 19.8, Min: 16.3, Max: 22.8, Diff: 6.6] > [SATB Filtering (ms): 0.0 0.1 0.0 0.0 > Avg: 0.0, Min: 0.0, Max: 0.1, Diff: 0.1] > [Update RS (ms): 31.9 37.3 35.1 33.3 > Avg: 34.4, Min: 31.9, Max: 37.3, Diff: 5.5] > [Processed Buffers : 102 106 119 104 > Sum: 431, Avg: 107, Min: 102, Max: 119, Diff: 17] > [Scan RS (ms): 0.0 0.0 0.1 0.0 > Avg: 0.0, Min: 0.0, Max: 0.1, Diff: 0.1] > [Object Copy (ms): 228.2 229.1 229.5 227.3 > Avg: 228.5, Min: 227.3, Max: 229.5, Diff: 2.2] > [Termination (ms): 0.0 0.0 0.0 0.0 > Avg: 0.0, Min: 0.0, Max: 0.0, Diff: 0.0] > [Termination Attempts : 4 1 11 4 > Sum: 20, Avg: 5, Min: 1, Max: 11, Diff: 10] > [GC Worker End (ms): 38905690.5 38905690.5 38905690.5 38905690.5 > Avg: 38905690.5, Min: 38905690.5, Max: 38905690.5, Diff: 0.0] > [GC Worker (ms): 282.9 282.8 282.8 282.6 > Avg: 282.8, Min: 282.6, Max: 282.9, Diff: 0.3] > [GC Worker Other (ms): 5.9 6.0 6.0 6.2 > Avg: 6.0, Min: 5.9, Max: 6.2, Diff: 0.3] > [Complete CSet Marking: 0.0 ms] > [Clear CT: 0.1 ms] > [Other: 3.7 ms] > [Choose CSet: 0.0 ms] > [Ref Proc: 2.8 ms] > [Ref Enq: 0.1 ms] > [Free CSet: 0.3 ms] > [Eden: 48M(5032M)->0B(5032M) Survivors: 288M->288M Heap: > 15790M(26624M)->15741M(26624M)] > [Times: user=1.14 sys=0.00, real=0.29 secs] > Heap after GC invocations=575 (full 157): > garbage-first heap total 27262976K, used 16119181K > [0x0000000160000000, 0x00000007e0000000, 0x00000007e0000000) > region size 8192K, 36 young (294912K), 36 survivors (294912K) > compacting perm gen total 524288K, used 164479K [0x00000007e0000000, > 0x0000000800000000, 0x0000000800000000) > the space 524288K, 31% used [0x00000007e0000000, > 0x00000007ea09ff68, 0x00000007ea0a0000, 0x0000000800000000) > No shared spaces configured. > } > {Heap before GC invocations=575 (full 157): > garbage-first heap total 27262976K, used 16119181K > [0x0000000160000000, 0x00000007e0000000, 0x00000007e0000000) > region size 8192K, 37 young (303104K), 36 survivors (294912K) > compacting perm gen total 524288K, used 164479K [0x00000007e0000000, > 0x0000000800000000, 0x0000000800000000) > the space 524288K, 31% used [0x00000007e0000000, > 0x00000007ea09ff68, 0x00000007ea0a0000, 0x0000000800000000) > No shared spaces configured. > 2013-05-29T17:28:56.413+0300: 38905.701: [Full GC > 15742M->14497M(26624M), 56.7731320 secs] > > That's the third Full GC today after the change to 26G and change from > occupancypercent=0. Tomorrow will be trying again with > occupancypercent=0 What did you set the IHOP value to? > >>>> -noclassgc (This is rarely needed and haven't seen an app that required it for quite some time) >>> Jvm 1.6 stopped the world for couple of minutes several times per day >>> while unloading classes, so we used noclassgc to disable that. We do >>> not know if this is necessary for latest 1.7 to avoid class unload >>> pause, but we continued to use this switch and found no harm in it. >>> Can't afford testing that in production ;) >> Haven't seen a case where unloading classes cause a several minute pause. >> Are you sure your system is not swapping? And, do you have GC logs you >> can share that illustrate the behavior and that -noclassgc fixed it? > We deleted swap partition long time ago, we simply do not risk swapping at all. > > We had this class unloading problem several times per day like half a > year ago, and fixed it with noclasssgc, that was a no-brainer, single > parameter that made the difference. > > It is also discussed here (they do not discuss noclassgc though, we > figured that out somehow) > http://stackoverflow.com/questions/2833983/meaning-of-the-unloading-class-messages G1 only performs class unloading during a full GC. But if you're not running out of perm space or compiled code cache - you can leave this flag. > >>>> -XX:+ UseGCOverheadLimit >>>> -XX:ReservedCodeCacheSize=48, that is the default for 7u21. You might consider setting it higher if you have the available space, and more importantly if you think you're running out of code space. >>> For our sun jvm linux 64bit 48m is maximum, jvm won't start if higher value. >> If you can't go larger than -XX:ReservedCodeCacheSize=48m, that may suggest you have memory constraints and may also suggest you don't have enough swap space defined, and you may be experiencing swapping during JVM execution. I've got a Linux system that has 32 GB of RAM, I can set ReservedCodeCacheSize=256m with no issues, even with -Xms30g and -Xms30g. > It is also documented that 48m is maximum > http://www.oracle.com/technetwork/java/javase/tech/vmoptions-jsp-140102.html > "maximum code cache size. [Solaris 64-bit, amd64, and -server x86: > 48m" > > That's the default max code cache size. When the JIT compiler compiles a Java method it places the generated code into the code cache. When there's no more room in the code cache, a warning is issued and JIT compilation is stopped. You can set it higher. IIRC there was time in the past when the size was limited in order to use short branches in compiled code. I don't think we've had that restriction for a while. HTHs JohnC From darius.ski at gmail.com Wed May 29 12:11:24 2013 From: darius.ski at gmail.com (Darius D.) Date: Wed, 29 May 2013 22:11:24 +0300 Subject: Bug in G1GC it performs Full GC when code cache is full resulting in overkill In-Reply-To: <51A63C5F.9040808@oracle.com> References: <3807DE76-D6CA-451F-AC72-771332825905@salesforce.com> <5C892669-3515-4DCC-BB72-96AE89EBE5F8@salesforce.com> <51A63C5F.9040808@oracle.com> Message-ID: Hi, I'd strongly suggest that Martin should add -XX:+PrintAdaptiveSizePolicy to his JVM options. In our case that was what we needed to solve the mystery of FullGCs with gigabytes of heap free. Actually with some minor googling around i've found: https://forums.oracle.com/forums/thread.jspa?messageID=10869877 I suspect it could be same story as ours, "humongous allocation request failed" is bad for JVM health, FullGC will occur immediately. Remember, any allocation that is larger than half of G1GC region size will get allocated as "humongous" object on heap, that does not care about regions etc. In our case we were failing to allocate 32 megabytes with over 50% of heap free! Best regards, Darius. On Wed, May 29, 2013 at 8:35 PM, John Cuthbertson wrote: > Hi Martin, > > I'm going to fill in bit more detail to Charlie's replies.... > > On 5/29/2013 8:49 AM, Martin Makundi wrote: >> Hi! >> >>> A bit of constructive criticism ;-) It would be good practice to set one option at a time and measure its performance to determine whether it improves performance rather than choosing an option because of something you read in text. In short, always measure and reason about whether what you've observed for an improvement or regression makes sense. And, also run multiple times to get a sense of noise versus real improvement or regression. >> >> Thanks. That's one of the reasons we never changed our options. Once >> we found someting that works very well, we know that its always n! >> work to test changes and the system was running very nice indeed >> before the previous tweak ;) >> >>>>> -XX:+UseFastAccessorMethod (the default is disabled) >>>> Fast sounds good, the description of it is "Use optimized versions of >>>> GetField" which sounds good. I see no harm in this. >>> These would be JNI operations. >>> >>> A quick at the HotSpot source suggests UseFastAccessorMethods >>> is mostly confined to interpreter operations. >> Thanks for the info. Doesn't say much to me, but does not seem to harm >> anything. Will try setting it off at some point in time. >> >>>>> -XX:+UseNUMA (Are you running a JVM that spans NUMA memory nodes? >>>>> Or, do you have multiple JVMs running on a NUMA or non-NUMA system?) >>> Key point is that you should use -XX:+UseNUMA only when you are >>> deploying a JVM that spans NUMA nodes. >> Thanks for the info. Doesn't say much to me, but does not seem to harm >> anything. Will try setting it off at some point in time. > > The fast accessor methods flag creates specialized (i.e. short and > optimized) interpreter entry points for accessor methods (those that > just return the value in one of the object's fields). In most > applications the bulk of the execution time is spent executing JIT > compiled code; only a few percent is typically spent in Hotspot's > interpreter. The JIT compiler will always try to inline accessor methods > into their caller. So, unless your application is spending a ton of time > interpreting, this flag should make no difference. > >>>>> -XX:CMSInitiatingOccupancyFraction=70 (This is applicable to CMS GC, and not applicable to G1 GC) >>>> Again, not documented thoroughly where it applies and where not, jvm >>>> gave no warning/error about it so we assumed it's valid. >>> There's always the HotSpot source code ;-) >>> >>> It's also quite well documented in various slide ware on the internet. >>> It's also quite well documented in the Java Performance book. :-) >> Uh.. does it say somewhere that Do not use >> XX:CMSInitiatingOccupancyFraction with G1GC? ;) I know performance >> tuning is your bread and butter but is not ours... is more like we are >> just driving the car and you are the mechanic...different >> perspective.. just trying to fill'er'up to go..leaded or unleaded... >> ;) > > The G1 equivalent of this flag is InitiatingHeapOccupancyPercent (IHOP > for short). Actually both G1 and CMS accept and observe IHOP. > CMSInitiatingOccupancyFraction was superseded by IHOP. The JVM still > accepts the old flag name - but it is CMS only and doesn't affect G1. > >>>>> -XX:InitiatingHeapOccupancyPercent=0 (You realize this will force G1's concurrent cycle to run continuously?) >>>> Yes, that's what we figured out, we don't want it to sit lazy and end >>>> up in a situation where it is required to do a Full GC. This switch >>>> was specifically chosen in a situation we had a memory leak and tried >>>> to aggressively fight against it before we found the root cause. Maybe >>>> we should try without this switch now, and see what effect it has. >>> Having GC logs to see what available head room you have between the >>> initiating of a G1 concurrent cycle and available regions / heap space would be most appropriate. >> Hmm.. I don't thoroughly understand the logs either, but, here is a snap: >> >> 2013-05-29T17:28:56.119+0300: 38905.407: [GC pause (young), 0.29261000 secs] >> [Parallel Time: 288.8 ms] >> [GC Worker Start (ms): 38905407.6 38905407.7 38905407.7 38905407.9 >> Avg: 38905407.7, Min: 38905407.6, Max: 38905407.9, Diff: 0.3] >> [Ext Root Scanning (ms): 22.8 16.3 18.1 22.0 >> Avg: 19.8, Min: 16.3, Max: 22.8, Diff: 6.6] >> [SATB Filtering (ms): 0.0 0.1 0.0 0.0 >> Avg: 0.0, Min: 0.0, Max: 0.1, Diff: 0.1] >> [Update RS (ms): 31.9 37.3 35.1 33.3 >> Avg: 34.4, Min: 31.9, Max: 37.3, Diff: 5.5] >> [Processed Buffers : 102 106 119 104 >> Sum: 431, Avg: 107, Min: 102, Max: 119, Diff: 17] >> [Scan RS (ms): 0.0 0.0 0.1 0.0 >> Avg: 0.0, Min: 0.0, Max: 0.1, Diff: 0.1] >> [Object Copy (ms): 228.2 229.1 229.5 227.3 >> Avg: 228.5, Min: 227.3, Max: 229.5, Diff: 2.2] >> [Termination (ms): 0.0 0.0 0.0 0.0 >> Avg: 0.0, Min: 0.0, Max: 0.0, Diff: 0.0] >> [Termination Attempts : 4 1 11 4 >> Sum: 20, Avg: 5, Min: 1, Max: 11, Diff: 10] >> [GC Worker End (ms): 38905690.5 38905690.5 38905690.5 38905690.5 >> Avg: 38905690.5, Min: 38905690.5, Max: 38905690.5, Diff: 0.0] >> [GC Worker (ms): 282.9 282.8 282.8 282.6 >> Avg: 282.8, Min: 282.6, Max: 282.9, Diff: 0.3] >> [GC Worker Other (ms): 5.9 6.0 6.0 6.2 >> Avg: 6.0, Min: 5.9, Max: 6.2, Diff: 0.3] >> [Complete CSet Marking: 0.0 ms] >> [Clear CT: 0.1 ms] >> [Other: 3.7 ms] >> [Choose CSet: 0.0 ms] >> [Ref Proc: 2.8 ms] >> [Ref Enq: 0.1 ms] >> [Free CSet: 0.3 ms] >> [Eden: 48M(5032M)->0B(5032M) Survivors: 288M->288M Heap: >> 15790M(26624M)->15741M(26624M)] >> [Times: user=1.14 sys=0.00, real=0.29 secs] >> Heap after GC invocations=575 (full 157): >> garbage-first heap total 27262976K, used 16119181K >> [0x0000000160000000, 0x00000007e0000000, 0x00000007e0000000) >> region size 8192K, 36 young (294912K), 36 survivors (294912K) >> compacting perm gen total 524288K, used 164479K [0x00000007e0000000, >> 0x0000000800000000, 0x0000000800000000) >> the space 524288K, 31% used [0x00000007e0000000, >> 0x00000007ea09ff68, 0x00000007ea0a0000, 0x0000000800000000) >> No shared spaces configured. >> } >> {Heap before GC invocations=575 (full 157): >> garbage-first heap total 27262976K, used 16119181K >> [0x0000000160000000, 0x00000007e0000000, 0x00000007e0000000) >> region size 8192K, 37 young (303104K), 36 survivors (294912K) >> compacting perm gen total 524288K, used 164479K [0x00000007e0000000, >> 0x0000000800000000, 0x0000000800000000) >> the space 524288K, 31% used [0x00000007e0000000, >> 0x00000007ea09ff68, 0x00000007ea0a0000, 0x0000000800000000) >> No shared spaces configured. >> 2013-05-29T17:28:56.413+0300: 38905.701: [Full GC >> 15742M->14497M(26624M), 56.7731320 secs] >> >> That's the third Full GC today after the change to 26G and change from >> occupancypercent=0. Tomorrow will be trying again with >> occupancypercent=0 > > What did you set the IHOP value to? > >> >>>>> -noclassgc (This is rarely needed and haven't seen an app that required it for quite some time) >>>> Jvm 1.6 stopped the world for couple of minutes several times per day >>>> while unloading classes, so we used noclassgc to disable that. We do >>>> not know if this is necessary for latest 1.7 to avoid class unload >>>> pause, but we continued to use this switch and found no harm in it. >>>> Can't afford testing that in production ;) >>> Haven't seen a case where unloading classes cause a several minute pause. >>> Are you sure your system is not swapping? And, do you have GC logs you >>> can share that illustrate the behavior and that -noclassgc fixed it? >> We deleted swap partition long time ago, we simply do not risk swapping at all. >> >> We had this class unloading problem several times per day like half a >> year ago, and fixed it with noclasssgc, that was a no-brainer, single >> parameter that made the difference. >> >> It is also discussed here (they do not discuss noclassgc though, we >> figured that out somehow) >> http://stackoverflow.com/questions/2833983/meaning-of-the-unloading-class-messages > > G1 only performs class unloading during a full GC. But if you're not > running out of perm space or compiled code cache - you can leave this flag. > >> >>>>> -XX:+ UseGCOverheadLimit >>>>> -XX:ReservedCodeCacheSize=48, that is the default for 7u21. You might consider setting it higher if you have the available space, and more importantly if you think you're running out of code space. >>>> For our sun jvm linux 64bit 48m is maximum, jvm won't start if higher value. >>> If you can't go larger than -XX:ReservedCodeCacheSize=48m, that may suggest you have memory constraints and may also suggest you don't have enough swap space defined, and you may be experiencing swapping during JVM execution. I've got a Linux system that has 32 GB of RAM, I can set ReservedCodeCacheSize=256m with no issues, even with -Xms30g and -Xms30g. >> It is also documented that 48m is maximum >> http://www.oracle.com/technetwork/java/javase/tech/vmoptions-jsp-140102.html >> "maximum code cache size. [Solaris 64-bit, amd64, and -server x86: >> 48m" >> >> > > That's the default max code cache size. When the JIT compiler compiles a > Java method it places the generated code into the code cache. When > there's no more room in the code cache, a warning is issued and JIT > compilation is stopped. You can set it higher. IIRC there was time in > the past when the size was limited in order to use short branches in > compiled code. I don't think we've had that restriction for a while. > > HTHs > > JohnC > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use From martin.makundi at koodaripalvelut.com Thu May 30 11:37:33 2013 From: martin.makundi at koodaripalvelut.com (Martin Makundi) Date: Thu, 30 May 2013 21:37:33 +0300 Subject: Bug in G1GC it performs Full GC when code cache is full resulting in overkill In-Reply-To: <6208C497-21E4-4A26-B825-13F23E702898@salesforce.com> References: <9A54ECD2-B470-4797-9EFB-E81DD91C6B3F@salesforce.com> <6208C497-21E4-4A26-B825-13F23E702898@salesforce.com> Message-ID: Hi! > Having free memory is one thing. How much free memory do you have? Do you > have 2x more than your Java heap size, including -Xmx and what you're > specifying for MaxPermSize? Do we need 2x more (or just 2x) memory relative to java heap size? Why? Currently we have 40gb ram and 26-30gb allocated to java (fixed xm size). The rest is for system needs. > Question is did you set both -XX:PermSize and -XX:MaxPermSize to the same > value? If not, if perm gen needs to expand from the initial size, or gets > close to that initial size, the JVM may attempt to unload classes to free up > space prior to expanding perm gen. Yes, exactly for that reason we have set both equal. > But, if you'd rather use -noclassgc, then go ahead. :-) We had to use that too, permsize alone didn't do the job. > -XX:ReservedCodeCacheSize=48, that is the default for 7u21. > > There's not magic number of what to increase it too. It's probably more > magical than suggesting a Java heap configuration size to use. We don't know > how much code you've got and how much of it will be executed enough times to > compile, or require a de-opt and re-opt. Ok. We set it to 256m and code cache usage is now approximately 20-25% for most of the time. Bit over-sized but luckily we aren't in 8bit hardware anymore so we can afford it ;) > However, you can monitor the occupancy of code cache in a couple different > ways. There's a JMX MBean for the code cache where you can get the > occupancy and size of code cache. There's also a plug-in for VisualVM that > monitors code cache size and occupancy, you get a copy of the plug-in and > install it into VisualVM. The web site for it is: > https://java.net/projects/memorypoolview. If you're monitoring code cache > occupancy in production today, you should probably put it on your short > list. We have quite nice view to our server stats from new relic and appdynamics monitors. Will try 128m though... ** Martin > > > > > > > > On May 26, 2013, at 10:20 AM, Martin Makundi wrote: > > > > Sorry, forgot to mention, using: > > > > java version "1.7.0_21" > > > Java(TM) SE Runtime Environment (build 1.7.0_21-b11) > > > Java HotSpot(TM) 64-Bit Server VM (build 23.21-b01, mixed mode) > > > > Linux version 3.0.1.stk64 (dfn at localhost.localdomain) (gcc version > > > 4.5.1 20100924 (Red Hat 4.5.1-4) (GCC) ) #1 SMP Sat Aug 13 12:53:46 > > > EDT 2011 > > > > -Dclassworlds.conf=/usr/share/maven/maven/bin/m2.conf > > > -Dmaven.home=/usr/share/maven/maven > > > -Duser.timezone=EET > > > -XX:+AggressiveOpts > > > -XX:+DisableExplicitGC > > > -XX:+ParallelRefProcEnabled > > > -XX:+PrintGCDateStamps > > > -XX:+PrintGCDetails > > > -XX:+PrintHeapAtGC > > > -XX:+UseAdaptiveSizePolicy > > > -XX:+UseCompressedOops > > > -XX:+UseFastAccessorMethods > > > -XX:+UseG1GC > > > -XX:+UseGCOverheadLimit > > > -XX:+UseNUMA > > > -XX:+UseStringCache > > > -XX:CMSInitiatingOccupancyFraction=70 > > > -XX:GCPauseIntervalMillis=10000 > > > -XX:InitiatingHeapOccupancyPercent=0 > > > -XX:MaxGCPauseMillis=500 > > > -XX:MaxPermSize=512m > > > -XX:PermSize=512m > > > -XX:ReservedCodeCacheSize=48m > > > -Xloggc:gc.log > > > -Xmaxf1 > > > -Xms30G > > > -Xmx30G > > > -Xnoclassgc > > > -Xss4096k > > > > > ** > > > Martin > > > > 2013/5/26 Charlie Hunt : > > > Which version of the JDK/JRE are you using? > > > > One of the links you referenced below was using JDK 6, where there is no > > official support for G1. The other link suggests it could have been RMI DGC > > or a System.gc(). > > > > > > Sent from my iPhone > > > > On May 25, 2013, at 11:43 PM, "Martin Makundi" > > wrote: > > > > it occurs daily. > > > > > > > > _______________________________________________ > > hotspot-gc-use mailing list > > hotspot-gc-use at openjdk.java.net > > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > > From martin.makundi at koodaripalvelut.com Thu May 30 11:54:04 2013 From: martin.makundi at koodaripalvelut.com (Martin Makundi) Date: Thu, 30 May 2013 21:54:04 +0300 Subject: Bug in G1GC it performs Full GC when code cache is full resulting in overkill In-Reply-To: References: <3807DE76-D6CA-451F-AC72-771332825905@salesforce.com> <5C892669-3515-4DCC-BB72-96AE89EBE5F8@salesforce.com> <51A63C5F.9040808@oracle.com> Message-ID: Hi! > I'd strongly suggest that Martin should add > -XX:+PrintAdaptiveSizePolicy to his JVM options. In our case that was > what we needed to solve the mystery of FullGCs with gigabytes of heap > free. Thanks, will add that tomorrow. > Actually with some minor googling around i've found: > > https://forums.oracle.com/forums/thread.jspa?messageID=10869877 > > I suspect it could be same story as ours, "humongous allocation > request failed" is bad for JVM health, FullGC will occur immediately. > > Remember, any allocation that is larger than half of G1GC region size > will get allocated as "humongous" object on heap, that does not care > about regions etc. In our case we were failing to allocate 32 > megabytes with over 50% of heap free! Any solution to such problem or is it a bug in g1gc? Is there a way to log what code is performing the memory allocation if that happens to be the case? ** Martin > > > Best regards, > > Darius. > > > > On Wed, May 29, 2013 at 8:35 PM, John Cuthbertson > wrote: >> Hi Martin, >> >> I'm going to fill in bit more detail to Charlie's replies.... >> >> On 5/29/2013 8:49 AM, Martin Makundi wrote: >>> Hi! >>> >>>> A bit of constructive criticism ;-) It would be good practice to set one option at a time and measure its performance to determine whether it improves performance rather than choosing an option because of something you read in text. In short, always measure and reason about whether what you've observed for an improvement or regression makes sense. And, also run multiple times to get a sense of noise versus real improvement or regression. >>> >>> Thanks. That's one of the reasons we never changed our options. Once >>> we found someting that works very well, we know that its always n! >>> work to test changes and the system was running very nice indeed >>> before the previous tweak ;) >>> >>>>>> -XX:+UseFastAccessorMethod (the default is disabled) >>>>> Fast sounds good, the description of it is "Use optimized versions of >>>>> GetField" which sounds good. I see no harm in this. >>>> These would be JNI operations. >>>> >>>> A quick at the HotSpot source suggests UseFastAccessorMethods >>>> is mostly confined to interpreter operations. >>> Thanks for the info. Doesn't say much to me, but does not seem to harm >>> anything. Will try setting it off at some point in time. >>> >>>>>> -XX:+UseNUMA (Are you running a JVM that spans NUMA memory nodes? >>>>>> Or, do you have multiple JVMs running on a NUMA or non-NUMA system?) >>>> Key point is that you should use -XX:+UseNUMA only when you are >>>> deploying a JVM that spans NUMA nodes. >>> Thanks for the info. Doesn't say much to me, but does not seem to harm >>> anything. Will try setting it off at some point in time. >> >> The fast accessor methods flag creates specialized (i.e. short and >> optimized) interpreter entry points for accessor methods (those that >> just return the value in one of the object's fields). In most >> applications the bulk of the execution time is spent executing JIT >> compiled code; only a few percent is typically spent in Hotspot's >> interpreter. The JIT compiler will always try to inline accessor methods >> into their caller. So, unless your application is spending a ton of time >> interpreting, this flag should make no difference. >> >>>>>> -XX:CMSInitiatingOccupancyFraction=70 (This is applicable to CMS GC, and not applicable to G1 GC) >>>>> Again, not documented thoroughly where it applies and where not, jvm >>>>> gave no warning/error about it so we assumed it's valid. >>>> There's always the HotSpot source code ;-) >>>> >>>> It's also quite well documented in various slide ware on the internet. >>>> It's also quite well documented in the Java Performance book. :-) >>> Uh.. does it say somewhere that Do not use >>> XX:CMSInitiatingOccupancyFraction with G1GC? ;) I know performance >>> tuning is your bread and butter but is not ours... is more like we are >>> just driving the car and you are the mechanic...different >>> perspective.. just trying to fill'er'up to go..leaded or unleaded... >>> ;) >> >> The G1 equivalent of this flag is InitiatingHeapOccupancyPercent (IHOP >> for short). Actually both G1 and CMS accept and observe IHOP. >> CMSInitiatingOccupancyFraction was superseded by IHOP. The JVM still >> accepts the old flag name - but it is CMS only and doesn't affect G1. >> >>>>>> -XX:InitiatingHeapOccupancyPercent=0 (You realize this will force G1's concurrent cycle to run continuously?) >>>>> Yes, that's what we figured out, we don't want it to sit lazy and end >>>>> up in a situation where it is required to do a Full GC. This switch >>>>> was specifically chosen in a situation we had a memory leak and tried >>>>> to aggressively fight against it before we found the root cause. Maybe >>>>> we should try without this switch now, and see what effect it has. >>>> Having GC logs to see what available head room you have between the >>>> initiating of a G1 concurrent cycle and available regions / heap space would be most appropriate. >>> Hmm.. I don't thoroughly understand the logs either, but, here is a snap: >>> >>> 2013-05-29T17:28:56.119+0300: 38905.407: [GC pause (young), 0.29261000 secs] >>> [Parallel Time: 288.8 ms] >>> [GC Worker Start (ms): 38905407.6 38905407.7 38905407.7 38905407.9 >>> Avg: 38905407.7, Min: 38905407.6, Max: 38905407.9, Diff: 0.3] >>> [Ext Root Scanning (ms): 22.8 16.3 18.1 22.0 >>> Avg: 19.8, Min: 16.3, Max: 22.8, Diff: 6.6] >>> [SATB Filtering (ms): 0.0 0.1 0.0 0.0 >>> Avg: 0.0, Min: 0.0, Max: 0.1, Diff: 0.1] >>> [Update RS (ms): 31.9 37.3 35.1 33.3 >>> Avg: 34.4, Min: 31.9, Max: 37.3, Diff: 5.5] >>> [Processed Buffers : 102 106 119 104 >>> Sum: 431, Avg: 107, Min: 102, Max: 119, Diff: 17] >>> [Scan RS (ms): 0.0 0.0 0.1 0.0 >>> Avg: 0.0, Min: 0.0, Max: 0.1, Diff: 0.1] >>> [Object Copy (ms): 228.2 229.1 229.5 227.3 >>> Avg: 228.5, Min: 227.3, Max: 229.5, Diff: 2.2] >>> [Termination (ms): 0.0 0.0 0.0 0.0 >>> Avg: 0.0, Min: 0.0, Max: 0.0, Diff: 0.0] >>> [Termination Attempts : 4 1 11 4 >>> Sum: 20, Avg: 5, Min: 1, Max: 11, Diff: 10] >>> [GC Worker End (ms): 38905690.5 38905690.5 38905690.5 38905690.5 >>> Avg: 38905690.5, Min: 38905690.5, Max: 38905690.5, Diff: 0.0] >>> [GC Worker (ms): 282.9 282.8 282.8 282.6 >>> Avg: 282.8, Min: 282.6, Max: 282.9, Diff: 0.3] >>> [GC Worker Other (ms): 5.9 6.0 6.0 6.2 >>> Avg: 6.0, Min: 5.9, Max: 6.2, Diff: 0.3] >>> [Complete CSet Marking: 0.0 ms] >>> [Clear CT: 0.1 ms] >>> [Other: 3.7 ms] >>> [Choose CSet: 0.0 ms] >>> [Ref Proc: 2.8 ms] >>> [Ref Enq: 0.1 ms] >>> [Free CSet: 0.3 ms] >>> [Eden: 48M(5032M)->0B(5032M) Survivors: 288M->288M Heap: >>> 15790M(26624M)->15741M(26624M)] >>> [Times: user=1.14 sys=0.00, real=0.29 secs] >>> Heap after GC invocations=575 (full 157): >>> garbage-first heap total 27262976K, used 16119181K >>> [0x0000000160000000, 0x00000007e0000000, 0x00000007e0000000) >>> region size 8192K, 36 young (294912K), 36 survivors (294912K) >>> compacting perm gen total 524288K, used 164479K [0x00000007e0000000, >>> 0x0000000800000000, 0x0000000800000000) >>> the space 524288K, 31% used [0x00000007e0000000, >>> 0x00000007ea09ff68, 0x00000007ea0a0000, 0x0000000800000000) >>> No shared spaces configured. >>> } >>> {Heap before GC invocations=575 (full 157): >>> garbage-first heap total 27262976K, used 16119181K >>> [0x0000000160000000, 0x00000007e0000000, 0x00000007e0000000) >>> region size 8192K, 37 young (303104K), 36 survivors (294912K) >>> compacting perm gen total 524288K, used 164479K [0x00000007e0000000, >>> 0x0000000800000000, 0x0000000800000000) >>> the space 524288K, 31% used [0x00000007e0000000, >>> 0x00000007ea09ff68, 0x00000007ea0a0000, 0x0000000800000000) >>> No shared spaces configured. >>> 2013-05-29T17:28:56.413+0300: 38905.701: [Full GC >>> 15742M->14497M(26624M), 56.7731320 secs] >>> >>> That's the third Full GC today after the change to 26G and change from >>> occupancypercent=0. Tomorrow will be trying again with >>> occupancypercent=0 >> >> What did you set the IHOP value to? >> >>> >>>>>> -noclassgc (This is rarely needed and haven't seen an app that required it for quite some time) >>>>> Jvm 1.6 stopped the world for couple of minutes several times per day >>>>> while unloading classes, so we used noclassgc to disable that. We do >>>>> not know if this is necessary for latest 1.7 to avoid class unload >>>>> pause, but we continued to use this switch and found no harm in it. >>>>> Can't afford testing that in production ;) >>>> Haven't seen a case where unloading classes cause a several minute pause. >>>> Are you sure your system is not swapping? And, do you have GC logs you >>>> can share that illustrate the behavior and that -noclassgc fixed it? >>> We deleted swap partition long time ago, we simply do not risk swapping at all. >>> >>> We had this class unloading problem several times per day like half a >>> year ago, and fixed it with noclasssgc, that was a no-brainer, single >>> parameter that made the difference. >>> >>> It is also discussed here (they do not discuss noclassgc though, we >>> figured that out somehow) >>> http://stackoverflow.com/questions/2833983/meaning-of-the-unloading-class-messages >> >> G1 only performs class unloading during a full GC. But if you're not >> running out of perm space or compiled code cache - you can leave this flag. >> >>> >>>>>> -XX:+ UseGCOverheadLimit >>>>>> -XX:ReservedCodeCacheSize=48, that is the default for 7u21. You might consider setting it higher if you have the available space, and more importantly if you think you're running out of code space. >>>>> For our sun jvm linux 64bit 48m is maximum, jvm won't start if higher value. >>>> If you can't go larger than -XX:ReservedCodeCacheSize=48m, that may suggest you have memory constraints and may also suggest you don't have enough swap space defined, and you may be experiencing swapping during JVM execution. I've got a Linux system that has 32 GB of RAM, I can set ReservedCodeCacheSize=256m with no issues, even with -Xms30g and -Xms30g. >>> It is also documented that 48m is maximum >>> http://www.oracle.com/technetwork/java/javase/tech/vmoptions-jsp-140102.html >>> "maximum code cache size. [Solaris 64-bit, amd64, and -server x86: >>> 48m" >>> >>> >> >> That's the default max code cache size. When the JIT compiler compiles a >> Java method it places the generated code into the code cache. When >> there's no more room in the code cache, a warning is issued and JIT >> compilation is stopped. You can set it higher. IIRC there was time in >> the past when the size was limited in order to use short branches in >> compiled code. I don't think we've had that restriction for a while. >> >> HTHs >> >> JohnC >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use From monica.beckwith at oracle.com Thu May 30 13:55:35 2013 From: monica.beckwith at oracle.com (Monica Beckwith) Date: Thu, 30 May 2013 15:55:35 -0500 Subject: Bug in G1GC it performs Full GC when code cache is full resulting in overkill In-Reply-To: References: <3807DE76-D6CA-451F-AC72-771332825905@salesforce.com> <5C892669-3515-4DCC-BB72-96AE89EBE5F8@salesforce.com> <51A63C5F.9040808@oracle.com> Message-ID: <51A7BCC7.4050700@oracle.com> +1 to enabling PrintAdaptiveSizePolicy. Darius, As you have already mentioned - Any object with size greater or equal to a half region is called a "humongous" object (H-obj). The max region size for G1 is 32M. So yes, even if you set your region size to max, your 32M object will be considered humongous. Now, there are a couple of things that we should be aware of with respect to humongous regions (H-region)/objects - 1. The H-obj allocation will happen directly into the old generation 1. There will be a check for marking threshold (IHOP), and a concurrent cycle will be initiated if necessary 2. The H-regions are not included in an evacuation pause, since it's just going to increase the copying expense. 1. But if the H-obj(s) are dead, they get freed at the end of the multi-phased concurrent marking cycle. So, I think, if you have to work with H-objs and increasing the region size doesn't help (as is your case), then maybe you should try limiting your nursery so as to allow more space for the old generation, so as to sustain your H-objs till they die or if they are a part of your live data set, then it's all the more necessary to be able to fit them in your old gen. -Monica On 5/30/2013 1:54 PM, Martin Makundi wrote: > Hi! > >> I'd strongly suggest that Martin should add >> -XX:+PrintAdaptiveSizePolicy to his JVM options. In our case that was >> what we needed to solve the mystery of FullGCs with gigabytes of heap >> free. > Thanks, will add that tomorrow. > >> Actually with some minor googling around i've found: >> >> https://forums.oracle.com/forums/thread.jspa?messageID=10869877 >> >> I suspect it could be same story as ours, "humongous allocation >> request failed" is bad for JVM health, FullGC will occur immediately. >> >> Remember, any allocation that is larger than half of G1GC region size >> will get allocated as "humongous" object on heap, that does not care >> about regions etc. In our case we were failing to allocate 32 >> megabytes with over 50% of heap free! > Any solution to such problem or is it a bug in g1gc? Is there a way to > log what code is performing the memory allocation if that happens to > be the case? > > ** > Martin >> >> Best regards, >> >> Darius. >> >> >> >> On Wed, May 29, 2013 at 8:35 PM, John Cuthbertson >> wrote: >>> Hi Martin, >>> >>> I'm going to fill in bit more detail to Charlie's replies.... >>> >>> On 5/29/2013 8:49 AM, Martin Makundi wrote: >>>> Hi! >>>> >>>>> A bit of constructive criticism ;-) It would be good practice to set one option at a time and measure its performance to determine whether it improves performance rather than choosing an option because of something you read in text. In short, always measure and reason about whether what you've observed for an improvement or regression makes sense. And, also run multiple times to get a sense of noise versus real improvement or regression. >>>> Thanks. That's one of the reasons we never changed our options. Once >>>> we found someting that works very well, we know that its always n! >>>> work to test changes and the system was running very nice indeed >>>> before the previous tweak ;) >>>> >>>>>>> -XX:+UseFastAccessorMethod (the default is disabled) >>>>>> Fast sounds good, the description of it is "Use optimized versions of >>>>>> GetField" which sounds good. I see no harm in this. >>>>> These would be JNI operations. >>>>> >>>>> A quick at the HotSpot source suggests UseFastAccessorMethods >>>>> is mostly confined to interpreter operations. >>>> Thanks for the info. Doesn't say much to me, but does not seem to harm >>>> anything. Will try setting it off at some point in time. >>>> >>>>>>> -XX:+UseNUMA (Are you running a JVM that spans NUMA memory nodes? >>>>>>> Or, do you have multiple JVMs running on a NUMA or non-NUMA system?) >>>>> Key point is that you should use -XX:+UseNUMA only when you are >>>>> deploying a JVM that spans NUMA nodes. >>>> Thanks for the info. Doesn't say much to me, but does not seem to harm >>>> anything. Will try setting it off at some point in time. >>> The fast accessor methods flag creates specialized (i.e. short and >>> optimized) interpreter entry points for accessor methods (those that >>> just return the value in one of the object's fields). In most >>> applications the bulk of the execution time is spent executing JIT >>> compiled code; only a few percent is typically spent in Hotspot's >>> interpreter. The JIT compiler will always try to inline accessor methods >>> into their caller. So, unless your application is spending a ton of time >>> interpreting, this flag should make no difference. >>> >>>>>>> -XX:CMSInitiatingOccupancyFraction=70 (This is applicable to CMS GC, and not applicable to G1 GC) >>>>>> Again, not documented thoroughly where it applies and where not, jvm >>>>>> gave no warning/error about it so we assumed it's valid. >>>>> There's always the HotSpot source code ;-) >>>>> >>>>> It's also quite well documented in various slide ware on the internet. >>>>> It's also quite well documented in the Java Performance book. :-) >>>> Uh.. does it say somewhere that Do not use >>>> XX:CMSInitiatingOccupancyFraction with G1GC? ;) I know performance >>>> tuning is your bread and butter but is not ours... is more like we are >>>> just driving the car and you are the mechanic...different >>>> perspective.. just trying to fill'er'up to go..leaded or unleaded... >>>> ;) >>> The G1 equivalent of this flag is InitiatingHeapOccupancyPercent (IHOP >>> for short). Actually both G1 and CMS accept and observe IHOP. >>> CMSInitiatingOccupancyFraction was superseded by IHOP. The JVM still >>> accepts the old flag name - but it is CMS only and doesn't affect G1. >>> >>>>>>> -XX:InitiatingHeapOccupancyPercent=0 (You realize this will force G1's concurrent cycle to run continuously?) >>>>>> Yes, that's what we figured out, we don't want it to sit lazy and end >>>>>> up in a situation where it is required to do a Full GC. This switch >>>>>> was specifically chosen in a situation we had a memory leak and tried >>>>>> to aggressively fight against it before we found the root cause. Maybe >>>>>> we should try without this switch now, and see what effect it has. >>>>> Having GC logs to see what available head room you have between the >>>>> initiating of a G1 concurrent cycle and available regions / heap space would be most appropriate. >>>> Hmm.. I don't thoroughly understand the logs either, but, here is a snap: >>>> >>>> 2013-05-29T17:28:56.119+0300: 38905.407: [GC pause (young), 0.29261000 secs] >>>> [Parallel Time: 288.8 ms] >>>> [GC Worker Start (ms): 38905407.6 38905407.7 38905407.7 38905407.9 >>>> Avg: 38905407.7, Min: 38905407.6, Max: 38905407.9, Diff: 0.3] >>>> [Ext Root Scanning (ms): 22.8 16.3 18.1 22.0 >>>> Avg: 19.8, Min: 16.3, Max: 22.8, Diff: 6.6] >>>> [SATB Filtering (ms): 0.0 0.1 0.0 0.0 >>>> Avg: 0.0, Min: 0.0, Max: 0.1, Diff: 0.1] >>>> [Update RS (ms): 31.9 37.3 35.1 33.3 >>>> Avg: 34.4, Min: 31.9, Max: 37.3, Diff: 5.5] >>>> [Processed Buffers : 102 106 119 104 >>>> Sum: 431, Avg: 107, Min: 102, Max: 119, Diff: 17] >>>> [Scan RS (ms): 0.0 0.0 0.1 0.0 >>>> Avg: 0.0, Min: 0.0, Max: 0.1, Diff: 0.1] >>>> [Object Copy (ms): 228.2 229.1 229.5 227.3 >>>> Avg: 228.5, Min: 227.3, Max: 229.5, Diff: 2.2] >>>> [Termination (ms): 0.0 0.0 0.0 0.0 >>>> Avg: 0.0, Min: 0.0, Max: 0.0, Diff: 0.0] >>>> [Termination Attempts : 4 1 11 4 >>>> Sum: 20, Avg: 5, Min: 1, Max: 11, Diff: 10] >>>> [GC Worker End (ms): 38905690.5 38905690.5 38905690.5 38905690.5 >>>> Avg: 38905690.5, Min: 38905690.5, Max: 38905690.5, Diff: 0.0] >>>> [GC Worker (ms): 282.9 282.8 282.8 282.6 >>>> Avg: 282.8, Min: 282.6, Max: 282.9, Diff: 0.3] >>>> [GC Worker Other (ms): 5.9 6.0 6.0 6.2 >>>> Avg: 6.0, Min: 5.9, Max: 6.2, Diff: 0.3] >>>> [Complete CSet Marking: 0.0 ms] >>>> [Clear CT: 0.1 ms] >>>> [Other: 3.7 ms] >>>> [Choose CSet: 0.0 ms] >>>> [Ref Proc: 2.8 ms] >>>> [Ref Enq: 0.1 ms] >>>> [Free CSet: 0.3 ms] >>>> [Eden: 48M(5032M)->0B(5032M) Survivors: 288M->288M Heap: >>>> 15790M(26624M)->15741M(26624M)] >>>> [Times: user=1.14 sys=0.00, real=0.29 secs] >>>> Heap after GC invocations=575 (full 157): >>>> garbage-first heap total 27262976K, used 16119181K >>>> [0x0000000160000000, 0x00000007e0000000, 0x00000007e0000000) >>>> region size 8192K, 36 young (294912K), 36 survivors (294912K) >>>> compacting perm gen total 524288K, used 164479K [0x00000007e0000000, >>>> 0x0000000800000000, 0x0000000800000000) >>>> the space 524288K, 31% used [0x00000007e0000000, >>>> 0x00000007ea09ff68, 0x00000007ea0a0000, 0x0000000800000000) >>>> No shared spaces configured. >>>> } >>>> {Heap before GC invocations=575 (full 157): >>>> garbage-first heap total 27262976K, used 16119181K >>>> [0x0000000160000000, 0x00000007e0000000, 0x00000007e0000000) >>>> region size 8192K, 37 young (303104K), 36 survivors (294912K) >>>> compacting perm gen total 524288K, used 164479K [0x00000007e0000000, >>>> 0x0000000800000000, 0x0000000800000000) >>>> the space 524288K, 31% used [0x00000007e0000000, >>>> 0x00000007ea09ff68, 0x00000007ea0a0000, 0x0000000800000000) >>>> No shared spaces configured. >>>> 2013-05-29T17:28:56.413+0300: 38905.701: [Full GC >>>> 15742M->14497M(26624M), 56.7731320 secs] >>>> >>>> That's the third Full GC today after the change to 26G and change from >>>> occupancypercent=0. Tomorrow will be trying again with >>>> occupancypercent=0 >>> What did you set the IHOP value to? >>> >>>>>>> -noclassgc (This is rarely needed and haven't seen an app that required it for quite some time) >>>>>> Jvm 1.6 stopped the world for couple of minutes several times per day >>>>>> while unloading classes, so we used noclassgc to disable that. We do >>>>>> not know if this is necessary for latest 1.7 to avoid class unload >>>>>> pause, but we continued to use this switch and found no harm in it. >>>>>> Can't afford testing that in production ;) >>>>> Haven't seen a case where unloading classes cause a several minute pause. >>>>> Are you sure your system is not swapping? And, do you have GC logs you >>>>> can share that illustrate the behavior and that -noclassgc fixed it? >>>> We deleted swap partition long time ago, we simply do not risk swapping at all. >>>> >>>> We had this class unloading problem several times per day like half a >>>> year ago, and fixed it with noclasssgc, that was a no-brainer, single >>>> parameter that made the difference. >>>> >>>> It is also discussed here (they do not discuss noclassgc though, we >>>> figured that out somehow) >>>> http://stackoverflow.com/questions/2833983/meaning-of-the-unloading-class-messages >>> G1 only performs class unloading during a full GC. But if you're not >>> running out of perm space or compiled code cache - you can leave this flag. >>> >>>>>>> -XX:+ UseGCOverheadLimit >>>>>>> -XX:ReservedCodeCacheSize=48, that is the default for 7u21. You might consider setting it higher if you have the available space, and more importantly if you think you're running out of code space. >>>>>> For our sun jvm linux 64bit 48m is maximum, jvm won't start if higher value. >>>>> If you can't go larger than -XX:ReservedCodeCacheSize=48m, that may suggest you have memory constraints and may also suggest you don't have enough swap space defined, and you may be experiencing swapping during JVM execution. I've got a Linux system that has 32 GB of RAM, I can set ReservedCodeCacheSize=256m with no issues, even with -Xms30g and -Xms30g. >>>> It is also documented that 48m is maximum >>>> http://www.oracle.com/technetwork/java/javase/tech/vmoptions-jsp-140102.html >>>> "maximum code cache size. [Solaris 64-bit, amd64, and -server x86: >>>> 48m" >>>> >>>> >>> That's the default max code cache size. When the JIT compiler compiles a >>> Java method it places the generated code into the code cache. When >>> there's no more room in the code cache, a warning is issued and JIT >>> compilation is stopped. You can set it higher. IIRC there was time in >>> the past when the size was limited in order to use short branches in >>> compiled code. I don't think we've had that restriction for a while. >>> >>> HTHs >>> >>> JohnC >>> _______________________________________________ >>> hotspot-gc-use mailing list >>> hotspot-gc-use at openjdk.java.net >>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use -- Oracle Monica Beckwith | Principal Member of Technical Staff VOIP: +15124011274 Oracle Java Performance Green Oracle Oracle is committed to developing practices and products that help protect the environment -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20130530/d988a187/attachment-0001.html -------------- next part -------------- A non-text attachment was scrubbed... Name: oracle_sig_logo.gif Type: image/gif Size: 658 bytes Desc: not available Url : http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20130530/d988a187/oracle_sig_logo-0001.gif -------------- next part -------------- A non-text attachment was scrubbed... Name: green-for-email-sig_0.gif Type: image/gif Size: 356 bytes Desc: not available Url : http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20130530/d988a187/green-for-email-sig_0-0001.gif From darius.ski at gmail.com Thu May 30 15:32:55 2013 From: darius.ski at gmail.com (Darius D.) Date: Fri, 31 May 2013 01:32:55 +0300 Subject: Bug in G1GC it performs Full GC when code cache is full resulting in overkill In-Reply-To: <51A7BCC7.4050700@oracle.com> References: <3807DE76-D6CA-451F-AC72-771332825905@salesforce.com> <5C892669-3515-4DCC-BB72-96AE89EBE5F8@salesforce.com> <51A63C5F.9040808@oracle.com> <51A7BCC7.4050700@oracle.com> Message-ID: Hi, Monica, thanks a lot for Your additional insight about H-Objects, as i've already mentioned in this thread, your great JavaOne presentation about G1GC was key in solving our problem. You are right that it is all about "fitting" H-Object in old gen. In our case no matter how high (within confines of what server memory allowed us) we set the heap size, we were still getting H-Obj alloc failures. Actually even drastically cutting H-Object count we were still getting some Full GCs, only after we increased region size step by step to 16M they ceased. It was subtle fragmentation issue, made worse by rather small default region size. The following happened: 1) Big web request came in, 1-1.5GB of various allocations were done to generate that Json string, triggering young gc in progress. 2) After young gc heap was a bit fragmented cause there was temporary, but still "live" data. All that data now got into fresh survivor/old regions. The "health" of heap now really depends on how long ago last mixed GC ran. 3) Into such "sprayed" heap we start to allocate our big object. I have no idea how a set of regions is chosen for humongous region, but i think we were generating a total of ~30 humongous objects ( "generating" as in resizing StringBuffer somewhere deep in web framework till 30M fits) and that was too much for G1GC to cope. 4) Reducing allocation rate is not enough unfortunately, those small ones that slip are really dangerous - they are immediately allocated in OldGen, fragmenting the heap further. 5) It is now a race between big web requests and next mixed gc. We could reliably reproduce Full GCs in testing by generating several well timed big requests :) Getting region size up from default (actually I got a question, why G1GC is aiming for thousands of regions, what are the drawbacks of larger than "default" region) is more about reducing fragmentation by keeping those large but temporary objects where they belong - in nursery where G1GC can collect them efficiently. So we have 3 important todo and tunables in avoiding H-Object caused FullGC: 1) Code changes - any profiler that can record every allocation stack trace will help, set it to record each alloc above 1/2 heap region size and limit them as much as possible 2) IHOP should be tuned to allow mixed GCs frequently enough even if your app is behaving perfectly ( stable old gen + temporaries in nursery with little promotion going on ). 3) G1 region size can be increased to reduce heap fragmentation caused by H-Objects if they are temporary (by reducing them to ordinary objects allocated in young gen) Darius. On Thu, May 30, 2013 at 11:55 PM, Monica Beckwith < monica.beckwith at oracle.com> wrote: > +1 to enabling PrintAdaptiveSizePolicy. > Darius, > As you have already mentioned - Any object with size greater or equal to a > half region is called a "humongous" object (H-obj). The max region size for > G1 is 32M. So yes, even if you set your region size to max, your 32M object > will be considered humongous. > Now, there are a couple of things that we should be aware of with respect > to humongous regions (H-region)/objects - > > 1. The H-obj allocation will happen directly into the old generation > 1. There will be a check for marking threshold (IHOP), and a > concurrent cycle will be initiated if necessary > 2. The H-regions are not included in an evacuation pause, since it's > just going to increase the copying expense. > 1. But if the H-obj(s) are dead, they get freed at the end of the > multi-phased concurrent marking cycle. > > So, I think, if you have to work with H-objs and increasing the region > size doesn't help (as is your case), then maybe you should try limiting > your nursery so as to allow more space for the old generation, so as to > sustain your H-objs till they die or if they are a part of your live data > set, then it's all the more necessary to be able to fit them in your old > gen. > > -Monica > > > On 5/30/2013 1:54 PM, Martin Makundi wrote: > > Hi! > > > I'd strongly suggest that Martin should add > -XX:+PrintAdaptiveSizePolicy to his JVM options. In our case that was > what we needed to solve the mystery of FullGCs with gigabytes of heap > free. > > Thanks, will add that tomorrow. > > > Actually with some minor googling around i've found: > https://forums.oracle.com/forums/thread.jspa?messageID=10869877 > > I suspect it could be same story as ours, "humongous allocation > request failed" is bad for JVM health, FullGC will occur immediately. > > Remember, any allocation that is larger than half of G1GC region size > will get allocated as "humongous" object on heap, that does not care > about regions etc. In our case we were failing to allocate 32 > megabytes with over 50% of heap free! > > Any solution to such problem or is it a bug in g1gc? Is there a way to > log what code is performing the memory allocation if that happens to > be the case? > > ** > Martin > > Best regards, > > Darius. > > > > On Wed, May 29, 2013 at 8:35 PM, John Cuthbertson wrote: > > Hi Martin, > > I'm going to fill in bit more detail to Charlie's replies.... > > On 5/29/2013 8:49 AM, Martin Makundi wrote: > > Hi! > > > A bit of constructive criticism ;-) It would be good practice to set one option at a time and measure its performance to determine whether it improves performance rather than choosing an option because of something you read in text. In short, always measure and reason about whether what you've observed for an improvement or regression makes sense. And, also run multiple times to get a sense of noise versus real improvement or regression. > > Thanks. That's one of the reasons we never changed our options. Once > we found someting that works very well, we know that its always n! > work to test changes and the system was running very nice indeed > before the previous tweak ;) > > > -XX:+UseFastAccessorMethod (the default is disabled) > > Fast sounds good, the description of it is "Use optimized versions of > GetField" which sounds good. I see no harm in this. > > These would be JNI operations. > > A quick at the HotSpot source suggests UseFastAccessorMethods > is mostly confined to interpreter operations. > > Thanks for the info. Doesn't say much to me, but does not seem to harm > anything. Will try setting it off at some point in time. > > > -XX:+UseNUMA (Are you running a JVM that spans NUMA memory nodes? > Or, do you have multiple JVMs running on a NUMA or non-NUMA system?) > > Key point is that you should use -XX:+UseNUMA only when you are > deploying a JVM that spans NUMA nodes. > > Thanks for the info. Doesn't say much to me, but does not seem to harm > anything. Will try setting it off at some point in time. > > The fast accessor methods flag creates specialized (i.e. short and > optimized) interpreter entry points for accessor methods (those that > just return the value in one of the object's fields). In most > applications the bulk of the execution time is spent executing JIT > compiled code; only a few percent is typically spent in Hotspot's > interpreter. The JIT compiler will always try to inline accessor methods > into their caller. So, unless your application is spending a ton of time > interpreting, this flag should make no difference. > > > -XX:CMSInitiatingOccupancyFraction=70 (This is applicable to CMS GC, and not applicable to G1 GC) > > Again, not documented thoroughly where it applies and where not, jvm > gave no warning/error about it so we assumed it's valid. > > There's always the HotSpot source code ;-) > > It's also quite well documented in various slide ware on the internet. > It's also quite well documented in the Java Performance book. :-) > > Uh.. does it say somewhere that Do not use > XX:CMSInitiatingOccupancyFraction with G1GC? ;) I know performance > tuning is your bread and butter but is not ours... is more like we are > just driving the car and you are the mechanic...different > perspective.. just trying to fill'er'up to go..leaded or unleaded... > ;) > > The G1 equivalent of this flag is InitiatingHeapOccupancyPercent (IHOP > for short). Actually both G1 and CMS accept and observe IHOP. > CMSInitiatingOccupancyFraction was superseded by IHOP. The JVM still > accepts the old flag name - but it is CMS only and doesn't affect G1. > > > -XX:InitiatingHeapOccupancyPercent=0 (You realize this will force G1's concurrent cycle to run continuously?) > > Yes, that's what we figured out, we don't want it to sit lazy and end > up in a situation where it is required to do a Full GC. This switch > was specifically chosen in a situation we had a memory leak and tried > to aggressively fight against it before we found the root cause. Maybe > we should try without this switch now, and see what effect it has. > > Having GC logs to see what available head room you have between the > initiating of a G1 concurrent cycle and available regions / heap space would be most appropriate. > > Hmm.. I don't thoroughly understand the logs either, but, here is a snap: > > 2013-05-29T17:28:56.119+0300: 38905.407: [GC pause (young), 0.29261000 secs] > [Parallel Time: 288.8 ms] > [GC Worker Start (ms): 38905407.6 38905407.7 38905407.7 38905407.9 > Avg: 38905407.7, Min: 38905407.6, Max: 38905407.9, Diff: 0.3] > [Ext Root Scanning (ms): 22.8 16.3 18.1 22.0 > Avg: 19.8, Min: 16.3, Max: 22.8, Diff: 6.6] > [SATB Filtering (ms): 0.0 0.1 0.0 0.0 > Avg: 0.0, Min: 0.0, Max: 0.1, Diff: 0.1] > [Update RS (ms): 31.9 37.3 35.1 33.3 > Avg: 34.4, Min: 31.9, Max: 37.3, Diff: 5.5] > [Processed Buffers : 102 106 119 104 > Sum: 431, Avg: 107, Min: 102, Max: 119, Diff: 17] > [Scan RS (ms): 0.0 0.0 0.1 0.0 > Avg: 0.0, Min: 0.0, Max: 0.1, Diff: 0.1] > [Object Copy (ms): 228.2 229.1 229.5 227.3 > Avg: 228.5, Min: 227.3, Max: 229.5, Diff: 2.2] > [Termination (ms): 0.0 0.0 0.0 0.0 > Avg: 0.0, Min: 0.0, Max: 0.0, Diff: 0.0] > [Termination Attempts : 4 1 11 4 > Sum: 20, Avg: 5, Min: 1, Max: 11, Diff: 10] > [GC Worker End (ms): 38905690.5 38905690.5 38905690.5 38905690.5 > Avg: 38905690.5, Min: 38905690.5, Max: 38905690.5, Diff: 0.0] > [GC Worker (ms): 282.9 282.8 282.8 282.6 > Avg: 282.8, Min: 282.6, Max: 282.9, Diff: 0.3] > [GC Worker Other (ms): 5.9 6.0 6.0 6.2 > Avg: 6.0, Min: 5.9, Max: 6.2, Diff: 0.3] > [Complete CSet Marking: 0.0 ms] > [Clear CT: 0.1 ms] > [Other: 3.7 ms] > [Choose CSet: 0.0 ms] > [Ref Proc: 2.8 ms] > [Ref Enq: 0.1 ms] > [Free CSet: 0.3 ms] > [Eden: 48M(5032M)->0B(5032M) Survivors: 288M->288M Heap: > 15790M(26624M)->15741M(26624M)] > [Times: user=1.14 sys=0.00, real=0.29 secs] > Heap after GC invocations=575 (full 157): > garbage-first heap total 27262976K, used 16119181K > [0x0000000160000000, 0x00000007e0000000, 0x00000007e0000000) > region size 8192K, 36 young (294912K), 36 survivors (294912K) > compacting perm gen total 524288K, used 164479K [0x00000007e0000000, > 0x0000000800000000, 0x0000000800000000) > the space 524288K, 31% used [0x00000007e0000000, > 0x00000007ea09ff68, 0x00000007ea0a0000, 0x0000000800000000) > No shared spaces configured. > } > {Heap before GC invocations=575 (full 157): > garbage-first heap total 27262976K, used 16119181K > [0x0000000160000000, 0x00000007e0000000, 0x00000007e0000000) > region size 8192K, 37 young (303104K), 36 survivors (294912K) > compacting perm gen total 524288K, used 164479K [0x00000007e0000000, > 0x0000000800000000, 0x0000000800000000) > the space 524288K, 31% used [0x00000007e0000000, > 0x00000007ea09ff68, 0x00000007ea0a0000, 0x0000000800000000) > No shared spaces configured. > 2013-05-29T17:28:56.413+0300: 38905.701: [Full GC > 15742M->14497M(26624M), 56.7731320 secs] > > That's the third Full GC today after the change to 26G and change from > occupancypercent=0. Tomorrow will be trying again with > occupancypercent=0 > > What did you set the IHOP value to? > > > -noclassgc (This is rarely needed and haven't seen an app that required it for quite some time) > > Jvm 1.6 stopped the world for couple of minutes several times per day > while unloading classes, so we used noclassgc to disable that. We do > not know if this is necessary for latest 1.7 to avoid class unload > pause, but we continued to use this switch and found no harm in it. > Can't afford testing that in production ;) > > Haven't seen a case where unloading classes cause a several minute pause. > Are you sure your system is not swapping? And, do you have GC logs you > can share that illustrate the behavior and that -noclassgc fixed it? > > We deleted swap partition long time ago, we simply do not risk swapping at all. > > We had this class unloading problem several times per day like half a > year ago, and fixed it with noclasssgc, that was a no-brainer, single > parameter that made the difference. > > It is also discussed here (they do not discuss noclassgc though, we > figured that out somehow)http://stackoverflow.com/questions/2833983/meaning-of-the-unloading-class-messages > > G1 only performs class unloading during a full GC. But if you're not > running out of perm space or compiled code cache - you can leave this flag. > > > -XX:+ UseGCOverheadLimit > -XX:ReservedCodeCacheSize=48, that is the default for 7u21. You might consider setting it higher if you have the available space, and more importantly if you think you're running out of code space. > > For our sun jvm linux 64bit 48m is maximum, jvm won't start if higher value. > > If you can't go larger than -XX:ReservedCodeCacheSize=48m, that may suggest you have memory constraints and may also suggest you don't have enough swap space defined, and you may be experiencing swapping during JVM execution. I've got a Linux system that has 32 GB of RAM, I can set ReservedCodeCacheSize=256m with no issues, even with -Xms30g and -Xms30g. > > It is also documented that 48m is maximumhttp://www.oracle.com/technetwork/java/javase/tech/vmoptions-jsp-140102.html > "maximum code cache size. [Solaris 64-bit, amd64, and -server x86: > 48m" > > > > That's the default max code cache size. When the JIT compiler compiles a > Java method it places the generated code into the code cache. When > there's no more room in the code cache, a warning is issued and JIT > compilation is stopped. You can set it higher. IIRC there was time in > the past when the size was limited in order to use short branches in > compiled code. I don't think we've had that restriction for a while. > > HTHs > > JohnC > _______________________________________________ > hotspot-gc-use mailing listhotspot-gc-use at openjdk.java.nethttp://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > _______________________________________________ > hotspot-gc-use mailing listhotspot-gc-use at openjdk.java.nethttp://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > _______________________________________________ > hotspot-gc-use mailing listhotspot-gc-use at openjdk.java.nethttp://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > > -- > [image: Oracle] > Monica Beckwith | Principal Member of Technical Staff > VOIP: +15124011274 > Oracle Java Performance > > [image: Green Oracle] Oracle is > committed to developing practices and products that help protect the > environment > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20130531/01047367/attachment-0001.html -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 356 bytes Desc: not available Url : http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20130531/01047367/attachment-0002.gif -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 658 bytes Desc: not available Url : http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20130531/01047367/attachment-0003.gif From martin.makundi at koodaripalvelut.com Thu May 30 20:41:20 2013 From: martin.makundi at koodaripalvelut.com (Martin Makundi) Date: Fri, 31 May 2013 06:41:20 +0300 Subject: Bug in G1GC it performs Full GC when code cache is full resulting in overkill In-Reply-To: <51A7BCC7.4050700@oracle.com> References: <3807DE76-D6CA-451F-AC72-771332825905@salesforce.com> <5C892669-3515-4DCC-BB72-96AE89EBE5F8@salesforce.com> <51A63C5F.9040808@oracle.com> <51A7BCC7.4050700@oracle.com> Message-ID: Hi! > So, I think, if you have to work with H-objs and increasing the region > size doesn't help (as is your case), then maybe you should try limiting your > nursery so as to allow more space for the old generation, so as to sustain > your H-objs till they die or if they are a part of your live data set, then > it's all the more necessary to be able to fit them in your old gen. 1. I will post my logs from todays results. 2. How can this ("limiting your nursery") be achieved in order to reach the goal? 3. Is there a way to adjust "region size" (what region??)? 4. Isn't g1gc with adaptivesizepolicy supposed to handle all this automatically, i.e, it's a bug in the algorithm that it fails in these situations? > Monica, thanks a lot for Your additional insight about H-Objects, as > i've already mentioned in this thread, your great JavaOne presentation > about G1GC was key in solving our problem. Thanks for the hint, googled http://www.myexpospace.com/JavaOne2012/SessionFiles/CON6583_PDF_6583_0001.pdf Also http://www.slideshare.net/C2B2/g1-garbage-collector-big-heaps-and-low-pauses was nice. > 1) Code changes - any profiler that can record every allocation > stack trace will help, set it to record each alloc above 1/2 heap > region size and limit them as much as possible What profiler can do such, and moreover, what profiler can do that in production enviroment with low overhead so that it can be run in production? > 2) IHOP should be tuned to allow mixed GCs frequently enough > even if your app is behaving perfectly ( stable old gen + > temporaries in nursery with little promotion going on ). How to do that? We have set InitiatingHeapOccupancyPercent=0 and either this or default neither rids us of Full GC's. > 3) G1 region size can be increased to reduce heap fragmentation > caused by H-Objects if they are temporary (by reducing them to > ordinary objects allocated in young gen) Region size can be adjusted with G1HeapRegionSize? How will it reflect to adaptivesizepolicy, shouldnt adaptivesizepolicy automatically handle that for me? Shouldn't g1gc be able to handle different size regions simultaneously for best fit? How about tuning parameter G1MixedGCLiveThresholdPercent? Can we see from the logs how it is performing? Should I enable -XX:+PrintGC with "fine" mode? How about tuning parameter G1MixedGCCountTarget? Can we see from the logs how it is performing? Does the parameter G1MixedGCCountTarget have any effect when InitiatingHeapOccupancyPercent=0? Can it 'run out'? What is defautl value for G1MixedGCCountTarget and what is maximum value? How about tuning parameter G1HeapWastePercent ? Can we see from the logs how it is performing? How will it reflect to adaptivesizepolicy? Anybody know if azul jvm handles these issues automatically? ** Martin > > -Monica > > > On 5/30/2013 1:54 PM, Martin Makundi wrote: > > Hi! > > I'd strongly suggest that Martin should add > -XX:+PrintAdaptiveSizePolicy to his JVM options. In our case that was > what we needed to solve the mystery of FullGCs with gigabytes of heap > free. > > Thanks, will add that tomorrow. > > Actually with some minor googling around i've found: > > https://forums.oracle.com/forums/thread.jspa?messageID=10869877 > > I suspect it could be same story as ours, "humongous allocation > request failed" is bad for JVM health, FullGC will occur immediately. > > Remember, any allocation that is larger than half of G1GC region size > will get allocated as "humongous" object on heap, that does not care > about regions etc. In our case we were failing to allocate 32 > megabytes with over 50% of heap free! > > Any solution to such problem or is it a bug in g1gc? Is there a way to > log what code is performing the memory allocation if that happens to > be the case? > > ** > Martin > > Best regards, > > Darius. > > > > On Wed, May 29, 2013 at 8:35 PM, John Cuthbertson > wrote: > > Hi Martin, > > I'm going to fill in bit more detail to Charlie's replies.... > > On 5/29/2013 8:49 AM, Martin Makundi wrote: > > Hi! > > A bit of constructive criticism ;-) It would be good practice to set one > option at a time and measure its performance to determine whether it > improves performance rather than choosing an option because of something you > read in text. In short, always measure and reason about whether what you've > observed for an improvement or regression makes sense. And, also run > multiple times to get a sense of noise versus real improvement or > regression. > > Thanks. That's one of the reasons we never changed our options. Once > we found someting that works very well, we know that its always n! > work to test changes and the system was running very nice indeed > before the previous tweak ;) > > -XX:+UseFastAccessorMethod (the default is disabled) > > Fast sounds good, the description of it is "Use optimized versions of > GetField" which sounds good. I see no harm in this. > > These would be JNI operations. > > A quick at the HotSpot source suggests UseFastAccessorMethods > is mostly confined to interpreter operations. > > Thanks for the info. Doesn't say much to me, but does not seem to harm > anything. Will try setting it off at some point in time. > > -XX:+UseNUMA (Are you running a JVM that spans NUMA memory nodes? > Or, do you have multiple JVMs running on a NUMA or non-NUMA system?) > > Key point is that you should use -XX:+UseNUMA only when you are > deploying a JVM that spans NUMA nodes. > > Thanks for the info. Doesn't say much to me, but does not seem to harm > anything. Will try setting it off at some point in time. > > The fast accessor methods flag creates specialized (i.e. short and > optimized) interpreter entry points for accessor methods (those that > just return the value in one of the object's fields). In most > applications the bulk of the execution time is spent executing JIT > compiled code; only a few percent is typically spent in Hotspot's > interpreter. The JIT compiler will always try to inline accessor methods > into their caller. So, unless your application is spending a ton of time > interpreting, this flag should make no difference. > > -XX:CMSInitiatingOccupancyFraction=70 (This is applicable to CMS GC, and > not applicable to G1 GC) > > Again, not documented thoroughly where it applies and where not, jvm > gave no warning/error about it so we assumed it's valid. > > There's always the HotSpot source code ;-) > > It's also quite well documented in various slide ware on the internet. > It's also quite well documented in the Java Performance book. :-) > > Uh.. does it say somewhere that Do not use > XX:CMSInitiatingOccupancyFraction with G1GC? ;) I know performance > tuning is your bread and butter but is not ours... is more like we are > just driving the car and you are the mechanic...different > perspective.. just trying to fill'er'up to go..leaded or unleaded... > ;) > > The G1 equivalent of this flag is InitiatingHeapOccupancyPercent (IHOP > for short). Actually both G1 and CMS accept and observe IHOP. > CMSInitiatingOccupancyFraction was superseded by IHOP. The JVM still > accepts the old flag name - but it is CMS only and doesn't affect G1. > > -XX:InitiatingHeapOccupancyPercent=0 (You realize this will force G1's > concurrent cycle to run continuously?) > > Yes, that's what we figured out, we don't want it to sit lazy and end > up in a situation where it is required to do a Full GC. This switch > was specifically chosen in a situation we had a memory leak and tried > to aggressively fight against it before we found the root cause. Maybe > we should try without this switch now, and see what effect it has. > > Having GC logs to see what available head room you have between the > initiating of a G1 concurrent cycle and available regions / heap space > would be most appropriate. > > Hmm.. I don't thoroughly understand the logs either, but, here is a snap: > > 2013-05-29T17:28:56.119+0300: 38905.407: [GC pause (young), 0.29261000 > secs] > [Parallel Time: 288.8 ms] > [GC Worker Start (ms): 38905407.6 38905407.7 38905407.7 > 38905407.9 > Avg: 38905407.7, Min: 38905407.6, Max: 38905407.9, Diff: 0.3] > [Ext Root Scanning (ms): 22.8 16.3 18.1 22.0 > Avg: 19.8, Min: 16.3, Max: 22.8, Diff: 6.6] > [SATB Filtering (ms): 0.0 0.1 0.0 0.0 > Avg: 0.0, Min: 0.0, Max: 0.1, Diff: 0.1] > [Update RS (ms): 31.9 37.3 35.1 33.3 > Avg: 34.4, Min: 31.9, Max: 37.3, Diff: 5.5] > [Processed Buffers : 102 106 119 104 > Sum: 431, Avg: 107, Min: 102, Max: 119, Diff: 17] > [Scan RS (ms): 0.0 0.0 0.1 0.0 > Avg: 0.0, Min: 0.0, Max: 0.1, Diff: 0.1] > [Object Copy (ms): 228.2 229.1 229.5 227.3 > Avg: 228.5, Min: 227.3, Max: 229.5, Diff: 2.2] > [Termination (ms): 0.0 0.0 0.0 0.0 > Avg: 0.0, Min: 0.0, Max: 0.0, Diff: 0.0] > [Termination Attempts : 4 1 11 4 > Sum: 20, Avg: 5, Min: 1, Max: 11, Diff: 10] > [GC Worker End (ms): 38905690.5 38905690.5 38905690.5 > 38905690.5 > Avg: 38905690.5, Min: 38905690.5, Max: 38905690.5, Diff: 0.0] > [GC Worker (ms): 282.9 282.8 282.8 282.6 > Avg: 282.8, Min: 282.6, Max: 282.9, Diff: 0.3] > [GC Worker Other (ms): 5.9 6.0 6.0 6.2 > Avg: 6.0, Min: 5.9, Max: 6.2, Diff: 0.3] > [Complete CSet Marking: 0.0 ms] > [Clear CT: 0.1 ms] > [Other: 3.7 ms] > [Choose CSet: 0.0 ms] > [Ref Proc: 2.8 ms] > [Ref Enq: 0.1 ms] > [Free CSet: 0.3 ms] > [Eden: 48M(5032M)->0B(5032M) Survivors: 288M->288M Heap: > 15790M(26624M)->15741M(26624M)] > [Times: user=1.14 sys=0.00, real=0.29 secs] > Heap after GC invocations=575 (full 157): > garbage-first heap total 27262976K, used 16119181K > [0x0000000160000000, 0x00000007e0000000, 0x00000007e0000000) > region size 8192K, 36 young (294912K), 36 survivors (294912K) > compacting perm gen total 524288K, used 164479K [0x00000007e0000000, > 0x0000000800000000, 0x0000000800000000) > the space 524288K, 31% used [0x00000007e0000000, > 0x00000007ea09ff68, 0x00000007ea0a0000, 0x0000000800000000) > No shared spaces configured. > } > {Heap before GC invocations=575 (full 157): > garbage-first heap total 27262976K, used 16119181K > [0x0000000160000000, 0x00000007e0000000, 0x00000007e0000000) > region size 8192K, 37 young (303104K), 36 survivors (294912K) > compacting perm gen total 524288K, used 164479K [0x00000007e0000000, > 0x0000000800000000, 0x0000000800000000) > the space 524288K, 31% used [0x00000007e0000000, > 0x00000007ea09ff68, 0x00000007ea0a0000, 0x0000000800000000) > No shared spaces configured. > 2013-05-29T17:28:56.413+0300: 38905.701: [Full GC > 15742M->14497M(26624M), 56.7731320 secs] > > That's the third Full GC today after the change to 26G and change from > occupancypercent=0. Tomorrow will be trying again with > occupancypercent=0 > > What did you set the IHOP value to? > > -noclassgc (This is rarely needed and haven't seen an app that required it > for quite some time) > > Jvm 1.6 stopped the world for couple of minutes several times per day > while unloading classes, so we used noclassgc to disable that. We do > not know if this is necessary for latest 1.7 to avoid class unload > pause, but we continued to use this switch and found no harm in it. > Can't afford testing that in production ;) > > Haven't seen a case where unloading classes cause a several minute pause. > Are you sure your system is not swapping? And, do you have GC logs you > can share that illustrate the behavior and that -noclassgc fixed it? > > We deleted swap partition long time ago, we simply do not risk swapping at > all. > > We had this class unloading problem several times per day like half a > year ago, and fixed it with noclasssgc, that was a no-brainer, single > parameter that made the difference. > > It is also discussed here (they do not discuss noclassgc though, we > figured that out somehow) > > http://stackoverflow.com/questions/2833983/meaning-of-the-unloading-class-messages > > G1 only performs class unloading during a full GC. But if you're not > running out of perm space or compiled code cache - you can leave this > flag. > > -XX:+ UseGCOverheadLimit > -XX:ReservedCodeCacheSize=48, that is the default for 7u21. You might > consider setting it higher if you have the available space, and more > importantly if you think you're running out of code space. > > For our sun jvm linux 64bit 48m is maximum, jvm won't start if higher > value. > > If you can't go larger than -XX:ReservedCodeCacheSize=48m, that may > suggest you have memory constraints and may also suggest you don't have > enough swap space defined, and you may be experiencing swapping during JVM > execution. I've got a Linux system that has 32 GB of RAM, I can set > ReservedCodeCacheSize=256m with no issues, even with -Xms30g and -Xms30g. > > It is also documented that 48m is maximum > > http://www.oracle.com/technetwork/java/javase/tech/vmoptions-jsp-140102.html > "maximum code cache size. [Solaris 64-bit, amd64, and -server x86: > 48m" > > > That's the default max code cache size. When the JIT compiler compiles a > Java method it places the generated code into the code cache. When > there's no more room in the code cache, a warning is issued and JIT > compilation is stopped. You can set it higher. IIRC there was time in > the past when the size was limited in order to use short branches in > compiled code. I don't think we've had that restriction for a while. > > HTHs > > JohnC > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > > -- > > Monica Beckwith | Principal Member of Technical Staff > VOIP: +15124011274 > Oracle Java Performance > > Oracle is committed to developing practices and products that help protect > the environment