From prasanthmathialagan at gmail.com  Fri Jun  5 16:44:14 2020
From: prasanthmathialagan at gmail.com (Prasanth Mathialagan)
Date: Fri, 5 Jun 2020 09:44:14 -0700
Subject: Increased CPU time with G1GC
In-Reply-To:
References:
Message-ID:

Hi,
We recently switched our Java application from CMS to G1. Since then we
have observed increased CPU time (user CPU) and increased latency for
requests.

Observations

 * The count of GC pauses remains the same between CMS and G1, and so
   does the pause time.
 * My initial suspicion was that the application threads were competing
   with GC threads for CPU cycles. But I don't see any indication of
   increased concurrent time in the GC logs.

I suspect that the overhead associated with read/write barriers could be
the reason for the increased CPU cycles, but I want to confirm that. Are
there any GC flags that print statistics about read/write barriers? Or
is there a way to debug this?

java -version

openjdk version "1.8.0_222"
OpenJDK Runtime Environment Corretto-8.222.10.1 (build 1.8.0_222-b10)
OpenJDK 64-Bit Server VM Corretto-8.222.10.1 (build 25.222-b10, mixed mode)

These are the command line flags I find in the GC logs that the
application uses.

-XX:+UseG1GC -XX:CICompilerCount=3 -XX:CompressedClassSpaceSize=931135488
-XX:ConcGCThreads=1 -XX:G1HeapRegionSize=4194304
-XX:InitialCodeCacheSize=402653184 -XX:InitialHeapSize=8589934592
-XX:InitialTenuringThreshold=6 -XX:InitiatingHeapOccupancyPercent=50
-XX:MarkStackSize=4194304 -XX:MaxGCPauseMillis=200
-XX:MaxHeapSize=8589934592 -XX:MaxMetaspaceSize=939524096
-XX:MaxNewSize=5150605312 -XX:MaxTenuringThreshold=6
-XX:MetaspaceSize=268435456 -XX:MinHeapDeltaBytes=4194304
-XX:+ParallelRefProcEnabled -XX:+PrintAdaptiveSizePolicy
-XX:+PrintClassHistogram -XX:+PrintGC -XX:+PrintGCApplicationStoppedTime
-XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
-XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1
-XX:PrintSafepointStatisticsTimeout=1000 -XX:+PrintTenuringDistribution
-XX:ReservedCodeCacheSize=402653184 -XX:+ScavengeBeforeFullGC
-XX:SoftRefLRUPolicyMSPerMB=2048 -XX:StackShadowPages=20
-XX:ThreadStackSize=512 -XX:+TieredCompilation -XX:+UseBiasedLocking
-XX:+UseCompressedClassPointers -XX:+UseCompressedOops
-XX:+UseFastAccessorMethods -XX:+UseLargePages -XX:+UseTLAB

Let me know if I need to provide any other information.

Regards,
Prasanth

From thomas.schatzl at oracle.com  Mon Jun  8 09:02:33 2020
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Mon, 8 Jun 2020 11:02:33 +0200
Subject: Increased CPU time with G1GC
In-Reply-To:
References:
Message-ID:

Hi Prasanth,

On 05.06.20 18:44, Prasanth Mathialagan wrote:
> Hi,
> We recently switched our Java application from CMS to G1. Since then we
> have observed increased CPU time (user CPU) and increased latency for
> requests.
> [...]
> I suspect that the overhead associated with read/write barriers could be
> the reason for the increased CPU cycles, but I want to confirm that. Are
> there any GC flags that print statistics about read/write barriers? Or
> is there a way to debug this?

The significantly larger write barriers (there are almost no read
barriers in G1) can have the effect you describe, although I would not
expect a direct impact on latency. There is no statistics option that
can be enabled to show the actual impact of the write barriers: they are
too small to measure by themselves without huge overhead. Tracing
throughput deficiencies back to the barriers is mostly done by
eliminating all other causes.
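To make the cost concrete, here is a conceptual sketch in plain Java of
what a G1-style post-write barrier has to do after every reference
store. This is an illustrative model, not HotSpot's actual code; the
class, method names, and table sizes below are invented for the example,
only the region size matches the -XX:G1HeapRegionSize value above.

    // Illustrative model only -- not HotSpot source. It shows why G1's
    // post-write barrier is heavier than CMS's unconditional card mark:
    // it filters the store, then conditionally dirties a card and hands
    // it off for concurrent refinement.
    final class G1PostWriteBarrierSketch {
        static final int LOG_REGION_SIZE = 22;   // log2(4 MB), as in the flags above
        static final int LOG_CARD_SIZE = 9;      // 512-byte cards
        static final byte CLEAN = 0, DIRTY = 1;
        static final byte[] cardTable = new byte[1 << 16]; // toy size for the sketch

        // Conceptually runs after every reference store "field = newValue";
        // fieldAddr and newValueAddr stand in for the objects' addresses.
        static void postWriteBarrier(long fieldAddr, long newValueAddr) {
            if (newValueAddr == 0)
                return;                          // storing null: nothing to remember
            if (((fieldAddr ^ newValueAddr) >>> LOG_REGION_SIZE) == 0)
                return;                          // same region: no remembered-set entry needed
            int card = (int) (fieldAddr >>> LOG_CARD_SIZE) & (cardTable.length - 1);
            if (cardTable[card] == CLEAN) {
                cardTable[card] = DIRTY;         // mark the card
                enqueueForRefinement(card);      // hand off to the refinement threads
            }
        }

        static void enqueueForRefinement(int card) {
            // In the VM this pushes onto a per-thread dirty card queue;
            // omitted in this sketch.
        }

        public static void main(String[] args) {
            postWriteBarrier(0x12345678L, 0x0ABCDEF0L); // a cross-"region" store
            System.out.println("cross-region store recorded");
        }
    }

Each step is only a few instructions, but it runs on every reference
store, which is where extra user CPU relative to CMS's single-store card
mark typically comes from.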
> java -version
>
> openjdk version "1.8.0_222"

In later JDKs the set of applications where G1 improves upon CMS
broadens. There will also always be some applications where CMS is very
hard to beat for a given throughput/latency goal, particularly ones
where the application and its options were previously tuned for CMS.

> OpenJDK Runtime Environment Corretto-8.222.10.1 (build 1.8.0_222-b10)
>
> OpenJDK 64-Bit Server VM Corretto-8.222.10.1 (build 25.222-b10, mixed mode)
>
> These are the command line flags I find in the GC logs that the
> application uses.

Thanks. Some thoughts on the options. I am not sure whether you spent
time tuning them for G1, but if not, it might be useful to reconsider
some of the GC-specific ones.

> -XX:+UseG1GC
> -XX:CICompilerCount=3
> -XX:CompressedClassSpaceSize=931135488
> -XX:ConcGCThreads=1

Not sure it makes a lot of sense to slow down concurrent operation like
that, but it might help eke out the last bit of throughput. Note that in
jdk8 the scalability of marking in G1 isn't that great, but that
typically only has an impact in the tens of threads.

> -XX:G1HeapRegionSize=4194304

That should be selected automatically given the initial/max heap size.

> -XX:InitialCodeCacheSize=402653184
> -XX:InitialHeapSize=8589934592
> -XX:InitialTenuringThreshold=6
> -XX:InitiatingHeapOccupancyPercent=50

If you increase the number of concurrent GC threads, you might be able
to increase this one to decrease the frequency of (old gen) collections.
50% seems pretty low on an 8 GB heap. (That also applies to CMS, I
think.)

> -XX:MarkStackSize=4194304

Curious about the reason for that? AFAIK even in jdk8, while G1 reserves
a lot of memory for the mark stack, it will not be allocated by the OS
unless actually used; I think the other collectors are the same.

> -XX:MaxGCPauseMillis=200

Default.

> -XX:MaxHeapSize=8589934592
> -XX:MaxMetaspaceSize=939524096
> -XX:MaxNewSize=5150605312
> -XX:MaxTenuringThreshold=6

Not sure that potentially pushing objects into the old gen prematurely
is a good idea, but I assume you tested that.

> -XX:MetaspaceSize=268435456
> -XX:MinHeapDeltaBytes=4194304
> -XX:+ParallelRefProcEnabled
[... lots of Print options ...]
> -XX:ReservedCodeCacheSize=402653184
> -XX:+ScavengeBeforeFullGC

That last one never had any effect in G1, afair.

> -XX:SoftRefLRUPolicyMSPerMB=2048
> -XX:StackShadowPages=20
> -XX:ThreadStackSize=512
> -XX:+TieredCompilation
> -XX:+UseBiasedLocking
> -XX:+UseCompressedClassPointers
> -XX:+UseCompressedOops
> -XX:+UseFastAccessorMethods
> -XX:+UseLargePages
> -XX:+UseTLAB

Given that you set the initial and max heap size to the same value and
use large pages, I recommend adding -XX:+AlwaysPreTouch.
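For example (the jar name is a placeholder; -Xms8g/-Xmx8g are just the
readable forms of the Initial/MaxHeapSize values above):

    java -Xms8g -Xmx8g -XX:+UseG1GC -XX:+UseLargePages -XX:+AlwaysPreTouch \
         [... the remaining flags as above ...] -jar yourapp.jar

That way the whole heap is touched, and with large pages actually
backed, during JVM startup rather than page fault by page fault while
the application is handling load.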
> Let me know if I need to provide any other information.

Sorry for not being of more help.

Thanks,
  Thomas

From robberphex at gmail.com  Sun Jun 21 11:59:58 2020
From: robberphex at gmail.com (Robert Lu)
Date: Sun, 21 Jun 2020 19:59:58 +0800
Subject: How could self link help GC?
Message-ID:

Hi,

In java.util.concurrent.LinkedBlockingQueue#dequeue
(https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/util/concurrent/LinkedBlockingQueue.java#L217):

    private E dequeue() {
        // assert takeLock.isHeldByCurrentThread();
        // assert head.item == null;
        Node<E> h = head;
        Node<E> first = h.next;
        h.next = h; // help GC
        head = first;
        E x = first.item;
        first.item = null;
        return x;
    }

Why does h.next = h help GC?

--
Robert Lu
About me: https://www.robberphex.com/about-me

From dhd at exnet.com  Sun Jun 21 16:18:30 2020
From: dhd at exnet.com (Damon Hart-Davis)
Date: Sun, 21 Jun 2020 17:18:30 +0100
Subject: How could self link help GC?
In-Reply-To:
References:
Message-ID: <03F8B2B0-79C0-4088-BEE7-37579DF843C1@exnet.com>

By avoiding spurious pointers through h.next keeping the 'next' item
alive longer than necessary.

Rgds

Damon

> On 21 Jun 2020, at 12:59, Robert Lu wrote:
>
> Why does h.next = h help GC?
> [...]

From robberphex at gmail.com  Mon Jun 22 03:40:08 2020
From: robberphex at gmail.com (Robert Lu)
Date: Mon, 22 Jun 2020 11:40:08 +0800
Subject: How could self link help GC?
In-Reply-To: <03F8B2B0-79C0-4088-BEE7-37579DF843C1@exnet.com>
References: <03F8B2B0-79C0-4088-BEE7-37579DF843C1@exnet.com>
Message-ID:

Hi, Damon.

But once dequeued, the old node is a dead object. A pointer (h.next)
from a dead object to another object, alive or dead, makes no difference
to the GC. So I think h.next = h is meaningless.

And why isn't it h.next = null?

On Mon, Jun 22, 2020 at 12:18 AM Damon Hart-Davis wrote:
> By avoiding spurious pointers through h.next keeping the 'next' item
> alive longer than necessary.
> [...]

--
Robert Lu
About me: https://www.robberphex.com/about-me
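The catch in the "dead object" reasoning is generational collection.
Here is a sketch of the scenario Damon describes, under the assumption
of a generational collector that uses a card table to find old-to-young
references; the class and method names are invented for illustration.

    // Illustrative only -- not JDK or VM code. Young collections do not
    // trace the whole old generation; old-to-young references recorded
    // via the card table act as extra roots. So a *dead* old-gen node
    // can still keep a young node alive:
    //
    //   old gen:  h --next--> first  (young gen)   <-- "nepotism"
    //
    // The GC does not know h is dead until the old gen is collected.
    // The self-link removes h's only outgoing edge, so a dead h cannot
    // drag its successor along in the meantime.
    final class NepotismSketch {
        static final class Node {
            Object item;
            Node next;
        }

        // What dequeue's "h.next = h; // help GC" accomplishes:
        static void severDeadEdge(Node h) {
            h.next = h; // h now references only itself; nothing younger is retained
        }
    }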
From ecki at zusammenkunft.net  Mon Jun 22 04:33:44 2020
From: ecki at zusammenkunft.net (Bernd Eckenfels)
Date: Mon, 22 Jun 2020 04:33:44 +0000
Subject: How could self link help GC?
In-Reply-To:
References: <03F8B2B0-79C0-4088-BEE7-37579DF843C1@exnet.com>
Message-ID:

It is probably not needed here, since the slot is freed up quite quickly
(at the end of the method) and the rest of the queue is kept alive
anyway. Setting it to self instead of null is typically done to reduce
the risk of NPEs (but it might, on the other hand, increase the risk of
livelocks).

Not sure if it's a good idea to remove the assignment for concurrency
reasons, but it does not seem to be really beneficial for the GC.

Gruss
Bernd
--
http://bernd.eckenfels.net

Von: hotspot-gc-use im Auftrag von Robert Lu
Gesendet: Monday, June 22, 2020 5:40:08 AM
An: Damon Hart-Davis
Cc: hotspot-gc-use at openjdk.java.net
Betreff: Re: How could self link help GC?

> Hi, Damon.
>
> But once dequeued, the old node is a dead object. A pointer (h.next)
> from a dead object to another object, alive or dead, makes no difference
> to the GC. So I think h.next = h is meaningless.
>
> And why isn't it h.next = null?
> [...]
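Bernd's NPE point is visible in how LinkedBlockingQueue's own iterator
walks the list. Below is a simplified paraphrase of that traversal
logic, not the verbatim JDK source; the surrounding class is reduced to
the bare minimum for the example.

    // Simplified paraphrase of the iterator's traversal step in
    // LinkedBlockingQueue (not the exact JDK source). A self-linked node
    // (s == p) means p was dequeued while the iterator still held it, so
    // the iterator rejoins the live list at the head instead of walking
    // off a stale node.
    final class SelfLinkTraversalSketch {
        static final class Node {
            Object item;
            Node next;
        }

        Node head = new Node(); // dummy head node, as in LinkedBlockingQueue

        Node succ(Node p) {
            Node s = p.next;
            return (s == p) ? head.next  // p was removed: restart from the live list
                            : s;         // null here still means "end of queue"
        }
    }

With h.next = null, "this node was removed" and "end of queue" would be
indistinguishable; the self-link keeps every next pointer non-null and
flags the removed node explicitly.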