From wajih.ahmed at gmail.com  Wed Sep 12 13:26:09 2018
From: wajih.ahmed at gmail.com (Wajih Ahmed)
Date: Wed, 12 Sep 2018 09:26:09 -0400
Subject: G1GC fine tuning under heavy load
Message-ID:

Hello,

I have an application running on two nodes in a Kubernetes cluster. It is handling about 70 million requests per day. I have noticed a gradual decline in throughput, so much so that in about 7 days the throughput falls about 50%. A large percentage of this decline is in the first hour, followed by a gradual decline. This graph shows the pattern.

Some of the decline I can attribute to the application and use case itself. As the database starts growing rapidly, the system comes under memory and CPU pressure, and the database itself is also a Java application. So perhaps ignoring the decline of the first hour is prudent, but I am still interested in seeing whether I can tune the JVM of the app so that the throughput is more linear after the first hour.

I am also providing a gceasy.io report that will give the required information about GC activity. You will see I have done some rudimentary tuning already.

What I am curious about is whether the young gen size needs to be reduced by tuning G1NewSizePercent to reduce the duration of the pauses, in particular the object copy stage.

Secondly, what GCEasy is calling "consecutive full gc" don't appear to be full GCs. It might be CMS (initial-mark) activity, which accounts for most of the GC activity and has some long pause times. Would increasing InitiatingHeapOccupancyPercent be recommended to reduce this activity and give the application more time?

Any other advice will be helpful as I start to learn and unfold the mysteries of GC tuning :-)

Just in case you don't want to open the PDF report, these are my JVM args:

-XX:G1MixedGCCountTarget=12 -XX:InitialHeapSize=7516192768
-XX:MaxGCPauseMillis=200 -XX:MaxHeapSize=7516192768
-XX:MetaspaceSize=268435456 -XX:+PrintAdaptiveSizePolicy -XX:+PrintGC
-XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
-XX:+PrintPromotionFailure -XX:+PrintTenuringDistribution
-XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseG1GC
-XX:-UseNUMA

Regards

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From david.weeda at det.nsw.edu.au  Thu Sep 13 00:12:51 2018
From: david.weeda at det.nsw.edu.au (David Weeda)
Date: Thu, 13 Sep 2018 00:12:51 +0000
Subject: G1GC fine tuning under heavy load
In-Reply-To:
References:
Message-ID:

Hello Ahmed,

What we have done is follow the documentation at
https://www.oracle.com/technetwork/articles/java/g1gc-1984535.html

We noted the following comment. To us it meant that ideally your heap size should be a power of 2 as well. We settled on 2 x 8GB.

-XX:G1HeapRegionSize=n
Sets the size of a G1 region. The value will be a power of two and can range from 1MB to 32MB. The goal is to have around 2048 regions based on the minimum Java heap size.

From our Java trace:

J [0.013s][info ][gc,heap] Heap region size: 4M
J [0.128s][info ][gc     ] Using G1

Regards,
David Weeda
SAP Technical Architect

From: hotspot-gc-use On Behalf Of Wajih Ahmed
Sent: Wednesday, 12 September 2018 11:26 PM
To: hotspot-gc-use at openjdk.java.net
Subject: G1GC fine tuning under heavy load

> [...]

**********************************************************************
This message is intended for the addressee named and may contain privileged information or confidential information or both. If you are not the intended recipient please delete it and notify the sender.
**********************************************************************

From wolfgang.pedot at finkzeit.at  Thu Sep 13 07:21:59 2018
From: wolfgang.pedot at finkzeit.at (Wolfgang Pedot)
Date: Thu, 13 Sep 2018 09:21:59 +0200
Subject: G1GC fine tuning under heavy load
In-Reply-To:
References:
Message-ID: <68b777a3-e416-3176-280c-2f8354f78d43@finkzeit.at>

Hello,

Am 12.09.2018 um 15:26 schrieb Wajih Ahmed:
> What I am curious about is whether the young gen size needs to be
> reduced by tuning G1NewSizePercent to reduce the duration of the
> pauses, in particular the object copy stage.

What I have learned tuning G1 is that it is usually best to trust G1's ergonomics and not directly influence the size allocated to the different generations. What you can do to test with a smaller NewGen is to reduce the pause-time target. From your report I would say that the 200ms you currently have configured is met pretty much all of the time, so the NewGen is sized correctly.

regards
Wolfgang

From stefan.johansson at oracle.com  Thu Sep 13 12:39:33 2018
From: stefan.johansson at oracle.com (Stefan Johansson)
Date: Thu, 13 Sep 2018 14:39:33 +0200
Subject: G1GC fine tuning under heavy load
In-Reply-To:
References:
Message-ID:

Hi,

Could you please provide the GC logs from the run as well? The reports give a good overview, but some details from the logs might help us give better advice. It will also help to confirm that there are no Full GCs occurring, as you say. Some more comments inline.

On 2018-09-12 15:26, Wajih Ahmed wrote:
> [...]
>
> What I am curious about is whether the young gen size needs to be
> reduced by tuning G1NewSizePercent to reduce the duration of the
> pauses, in particular the object copy stage.

This is a very hard question to answer. A smaller young gen of course means fewer regions to collect, but since the GCs will occur more frequently, fewer objects will have time to die, so it might be that a larger young gen is quicker to collect for some applications. And since long pause times don't seem to be the biggest problem, I wouldn't start the tuning here.

> Secondly, what GCEasy is calling "consecutive full gc" don't appear
> to be full GCs. It might be CMS (initial-mark) activity, which
> accounts for most of the GC activity and has some long pause times.
> Would increasing InitiatingHeapOccupancyPercent be recommended to
> reduce this activity and give the application more time?

Looking at the report, it looks like the old generation grows over time, and it might be that a lot of it is live, so the concurrent cycles don't free up that much and you are still above the limit afterwards. If this is the case, setting a higher InitiatingHeapOccupancyPercent could help.

Would also be helpful to know what version of Java you are running.
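For illustration, the change could look like this on the command line. The value 60 and the jar name are placeholders, not numbers derived from your report; validate any new setting against the GC logs:

```shell
# Sketch only: the posted heap settings with IHOP raised above the 45% default.
# 60 is an assumed starting point, app.jar a placeholder for the application.
java -XX:+UseG1GC \
     -Xms7516192768 -Xmx7516192768 \
     -XX:MaxGCPauseMillis=200 \
     -XX:InitiatingHeapOccupancyPercent=60 \
     -jar app.jar
```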
Cheers,
Stefan

> [...]

_______________________________________________
hotspot-gc-use mailing list
hotspot-gc-use at openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use

From wajih.ahmed at gmail.com  Thu Sep 13 21:34:20 2018
From: wajih.ahmed at gmail.com (Wajih Ahmed)
Date: Thu, 13 Sep 2018 17:34:20 -0400
Subject: G1GC fine tuning under heavy load
In-Reply-To:
References:
Message-ID:

The link to the file is
https://drive.google.com/open?id=1Y8ZvxxHy078xPjJFy26nMhxeTaWgIcwn
and I hope I have the correct file corresponding to the report, but even if it is not, it will exhibit the same pattern, as it could be from one of the servers in the cluster.

Regards,

On Thu, Sep 13, 2018 at 8:40 AM Stefan Johansson
<stefan.johansson at oracle.com> wrote:
> [...]

From stefan.johansson at oracle.com  Fri Sep 14 08:55:41 2018
From: stefan.johansson at oracle.com (Stefan Johansson)
Date: Fri, 14 Sep 2018 10:55:41 +0200
Subject: G1GC fine tuning under heavy load
In-Reply-To:
References:
Message-ID:

Thanks for the log,

After looking through the log I think the suggestion to increase InitiatingHeapOccupancyPercent (IHOP) stands. Your old generation is growing slowly, and after a while it gets above the concurrent cycle threshold, which is 45% of the heap (~3225M for a 7G heap). When the concurrent cycle finishes and the Mixed collections try to reclaim old generation space, only a few megabytes are reclaimed before the ergonomics decide not to do more Mixed collections. The reason for this is that most of the old generation data is still live, roughly 90%.
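As a quick sanity check, that threshold can be recomputed from the heap size posted earlier in the thread (shell arithmetic only; no assumptions beyond the posted -XX:MaxHeapSize):

```shell
# 45% (the default InitiatingHeapOccupancyPercent) of MaxHeapSize=7516192768,
# expressed in megabytes.
echo $(( 7516192768 * 45 / 100 / 1024 / 1024 ))
# prints 3225 -- the "~3225M for a 7G heap" above
```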
If you want the Mixed collections to reclaim some more space you can lower the G1HeapWastePercent value, which by default is 5. This will generate a few more Mixed collections that can reclaim some additional space, and it might be worth it since so much in old is live.

If you raise the IHOP a bit you should be able to avoid the back-to-back concurrent cycles and also be able to reclaim some more space when the Mixed collections are actually running.

One thing that you should investigate is why the old generation is growing over time. If this is expected, you will run into the same problem later on when it has grown to the new IHOP value.

One additional point: if you have the possibility, I would suggest trying to run with a later version of Java. There have been some great improvements to G1, and the adaptive IHOP feature could very well help you avoid doing this kind of tuning.

Hope this helps,
Stefan

On 2018-09-13 23:34, Wajih Ahmed wrote:
> [...]

_______________________________________________
hotspot-gc-use mailing list
hotspot-gc-use at openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use