From todd at cloudera.com Tue Jul 6 13:27:36 2010 From: todd at cloudera.com (Todd Lipcon) Date: Tue, 6 Jul 2010 13:27:36 -0700 Subject: G1GC Full GCs Message-ID: Hi all, I work on HBase, a distributed database written in Java. We generally run on large heaps (8GB+), and our object lifetime distribution has proven pretty problematic for garbage collection (we manage a multi-GB LRU cache inside the process, so in CMS we tenure a lot of byte arrays which later get collected). In Java6, we generally run with the CMS collector, tuning down CMSInitiatingOccupancyFraction and constraining MaxNewSize to achieve fairly low pause GC, but after a week or two of uptime we often run into full heap compaction which takes several minutes and wreaks havoc on the system. Needless to say, we've been watching the development of the G1 GC with anticipation for the last year or two. Finally it seems in the latest build of JDK7 it's stable enough for actual use (previously it would segfault within an hour or so). However, in my testing I'm seeing a fair amount of 8-10 second Full GC pauses. The flags I'm passing are: -Xmx8000m -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:GCPauseIntervalMillis=80 Most of the pauses I see in the GC log are around 10-20ms as expected: 2010-07-05T22:43:19.849-0700: 1680.079: [GC pause (partial), 0.01209000 secs] [Parallel Time: 10.5 ms] [Update RS (Start) (ms): 1680080.2 1680080.1 1680080.2 1680079.9 1680080.0 1680080.2 1680080.1 1680080.1 1680080.0 1680080.1 1680080.0 1680079.9 1680081.5] [Update RS (ms): 1.4 2.0 2.2 1.8 1.7 1.4 2.5 2.2 1.9 2.5 1.7 1.7 0.1 Avg: 1.8, Min: 0.1, Max: 2.5] [Processed Buffers : 8 1 3 1 1 7 3 2 6 2 7 8 3 Sum: 52, Avg: 4, Min: 1, Max: 8] [Ext Root Scanning (ms): 0.7 0.5 0.5 0.3 0.5 0.7 0.4 0.5 0.4 0.5 0.3 0.3 0.0 Avg: 0.4, Min: 0.0, Max: 0.7] [Mark Stack Scanning (ms): 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 Avg: 0.0, Min: 0.0, Max: 0.0] [Scan RS (ms): 0.9 0.4 0.1 0.7 0.7 0.9 0.0 0.1 0.6 0.0 0.8 0.8 0.9 Avg: 0.5, Min: 0.0, Max: 0.9] [Object Copy (ms): 7.2 7.2 7.3 7.3 7.1 7.1 7.0 7.2 7.1 6.9 7.1 7.1 7.0 Avg: 7.1, Min: 6.9, Max: 7.3] [Termination (ms): 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 Avg: 0.0, Min: 0.0, Max: 0.0] [Other: 0.6 ms] [Clear CT: 0.5 ms] [Other: 1.1 ms] [Choose CSet: 0.0 ms] [ 7677M->7636M(8000M)] [Times: user=0.12 sys=0.00, real=0.01 secs] But every 5-10 minutes I see a GC pause that lasts 10-15 seconds: [todd at monster01 logs]$ grep 'Full GC' gc-hbase.log | tail 2010-07-06T12:50:41.216-0700: 52521.446: [Full GC 7934M->4865M(8000M), 9.8907800 secs] 2010-07-06T12:55:39.802-0700: 52820.032: [Full GC 7930M->4964M(8000M), 9.9025520 secs] 2010-07-06T13:02:26.872-0700: 53227.102: [Full GC 7934M->4882M(8000M), 10.1232190 secs] 2010-07-06T13:09:41.049-0700: 53661.279: [Full GC 7938M->5002M(8000M), 10.4997760 secs] 2010-07-06T13:18:51.531-0700: 54211.761: [Full GC 7938M->4962M(8000M), 11.0497380 secs] These pauses are pretty unacceptable for soft real time operation. Am I missing some tuning that should be done for G1GC for applications like this? Is 20ms out of 80ms too aggressive a target for the garbage rates we're generating? My actual live heap usage should be very stable around 5GB - the application very carefully accounts its live object set at around 60% of the max heap (as you can see in the logs above). At this point we are considering doing crazy things like ripping out our main memory consumers into a custom slab allocator, and manually reference count the byte array slices. But, if we can get G1GC to work for us, it will save a lot of engineering on the application side! Thanks -Todd -- Todd Lipcon Software Engineer, Cloudera -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20100706/9ad3f8a9/attachment.html From jon.masamitsu at oracle.com Tue Jul 6 14:09:15 2010 From: jon.masamitsu at oracle.com (Jon Masamitsu) Date: Tue, 06 Jul 2010 14:09:15 -0700 Subject: G1GC Full GCs In-Reply-To: References: Message-ID: <4C339B7B.4000807@oracle.com> Todd, Could you send a segment of the GC logs from the beginning through the first dozen or so full GC's? Exactly which version of the JVM are you using? java -version will tell us. Do you have a test setup where you could do some experiments? Can you send the set of CMS flags you use? It might tell us something about the GC behavior of you application. Might not tell us anything but it's worth a look. Jon On 07/06/10 13:27, Todd Lipcon wrote: > Hi all, > > I work on HBase, a distributed database written in Java. We generally > run on large heaps (8GB+), and our object lifetime distribution has > proven pretty problematic for garbage collection (we manage a multi-GB > LRU cache inside the process, so in CMS we tenure a lot of byte arrays > which later get collected). > > In Java6, we generally run with the CMS collector, tuning down > CMSInitiatingOccupancyFraction and constraining MaxNewSize to achieve > fairly low pause GC, but after a week or two of uptime we often run > into full heap compaction which takes several minutes and wreaks havoc > on the system. > > Needless to say, we've been watching the development of the G1 GC with > anticipation for the last year or two. Finally it seems in the latest > build of JDK7 it's stable enough for actual use (previously it would > segfault within an hour or so). However, in my testing I'm seeing a > fair amount of 8-10 second Full GC pauses. > > The flags I'm passing are: > > -Xmx8000m -XX:+UseG1GC -XX:MaxGCPauseMillis=20 > -XX:GCPauseIntervalMillis=80 > > Most of the pauses I see in the GC log are around 10-20ms as expected: > > 2010-07-05T22:43:19.849-0700: 1680.079: [GC pause (partial), > 0.01209000 secs] > [Parallel Time: 10.5 ms] > [Update RS (Start) (ms): 1680080.2 1680080.1 1680080.2 > 1680079.9 1680080.0 1680080.2 1680080.1 1680080.1 1680080.0 > 1680080.1 1680080.0 1680079.9 1680081.5] > [Update RS (ms): 1.4 2.0 2.2 1.8 1.7 1.4 2.5 2.2 1.9 > 2.5 1.7 1.7 0.1 > Avg: 1.8, Min: 0.1, Max: 2.5] > [Processed Buffers : 8 1 3 1 1 7 3 2 6 2 7 8 3 > Sum: 52, Avg: 4, Min: 1, Max: 8] > [Ext Root Scanning (ms): 0.7 0.5 0.5 0.3 0.5 0.7 0.4 0.5 > 0.4 0.5 0.3 0.3 0.0 > Avg: 0.4, Min: 0.0, Max: 0.7] > [Mark Stack Scanning (ms): 0.0 0.0 0.0 0.0 0.0 0.0 0.0 > 0.0 0.0 0.0 0.0 0.0 0.0 > Avg: 0.0, Min: 0.0, Max: 0.0] > [Scan RS (ms): 0.9 0.4 0.1 0.7 0.7 0.9 0.0 0.1 0.6 0.0 > 0.8 0.8 0.9 > Avg: 0.5, Min: 0.0, Max: 0.9] > [Object Copy (ms): 7.2 7.2 7.3 7.3 7.1 7.1 7.0 7.2 7.1 > 6.9 7.1 7.1 7.0 > Avg: 7.1, Min: 6.9, Max: 7.3] > [Termination (ms): 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 > 0.0 0.0 0.0 0.0 > Avg: 0.0, Min: 0.0, Max: 0.0] > [Other: 0.6 ms] > [Clear CT: 0.5 ms] > [Other: 1.1 ms] > [Choose CSet: 0.0 ms] > [ 7677M->7636M(8000M)] > [Times: user=0.12 sys=0.00, real=0.01 secs] > > But every 5-10 minutes I see a GC pause that lasts 10-15 seconds: > [todd at monster01 logs]$ grep 'Full GC' gc-hbase.log | tail > 2010-07-06T12:50:41.216-0700: 52521.446: [Full GC 7934M->4865M(8000M), > 9.8907800 secs] > 2010-07-06T12:55:39.802-0700: 52820.032: [Full GC 7930M->4964M(8000M), > 9.9025520 secs] > 2010-07-06T13:02:26.872-0700: 53227.102: [Full GC 7934M->4882M(8000M), > 10.1232190 secs] > 2010-07-06T13:09:41.049-0700: 53661.279: [Full GC 7938M->5002M(8000M), > 10.4997760 secs] > 2010-07-06T13:18:51.531-0700: 54211.761: [Full GC 7938M->4962M(8000M), > 11.0497380 secs] > > These pauses are pretty unacceptable for soft real time operation. > > Am I missing some tuning that should be done for G1GC for applications > like this? Is 20ms out of 80ms too aggressive a target for the garbage > rates we're generating? > > My actual live heap usage should be very stable around 5GB - the > application very carefully accounts its live object set at around 60% > of the max heap (as you can see in the logs above). > > At this point we are considering doing crazy things like ripping out > our main memory consumers into a custom slab allocator, and manually > reference count the byte array slices. But, if we can get G1GC to work > for us, it will save a lot of engineering on the application side! > > Thanks > -Todd > > -- > Todd Lipcon > Software Engineer, Cloudera > ------------------------------------------------------------------------ > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20100706/892ea9ad/attachment.html From y.s.ramakrishna at oracle.com Tue Jul 6 14:12:12 2010 From: y.s.ramakrishna at oracle.com (Y. S. Ramakrishna) Date: Tue, 06 Jul 2010 14:12:12 -0700 Subject: G1GC Full GCs In-Reply-To: References: Message-ID: <4C339C2C.6030605@oracle.com> Did you try doubling the heap size? You might want to post a full log so we can see what's happening between those full collections. Also, If you have comparable CMS logs all the better, as a known starting point. The full gc's almost look like the heap got too full, so it must mean that incremental collection is not keeping up with the rate of garbage generation. Also, what's the JDK build you are running? -- ramki On 07/06/10 13:27, Todd Lipcon wrote: > Hi all, > > I work on HBase, a distributed database written in Java. We generally > run on large heaps (8GB+), and our object lifetime distribution has > proven pretty problematic for garbage collection (we manage a multi-GB > LRU cache inside the process, so in CMS we tenure a lot of byte arrays > which later get collected). > > In Java6, we generally run with the CMS collector, tuning down > CMSInitiatingOccupancyFraction and constraining MaxNewSize to achieve > fairly low pause GC, but after a week or two of uptime we often run into > full heap compaction which takes several minutes and wreaks havoc on the > system. > > Needless to say, we've been watching the development of the G1 GC with > anticipation for the last year or two. Finally it seems in the latest > build of JDK7 it's stable enough for actual use (previously it would > segfault within an hour or so). However, in my testing I'm seeing a fair > amount of 8-10 second Full GC pauses. > > The flags I'm passing are: > > -Xmx8000m -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:GCPauseIntervalMillis=80 > > Most of the pauses I see in the GC log are around 10-20ms as expected: > > 2010-07-05T22:43:19.849-0700: 1680.079: [GC pause (partial), 0.01209000 > secs] > [Parallel Time: 10.5 ms] > [Update RS (Start) (ms): 1680080.2 1680080.1 1680080.2 > 1680079.9 1680080.0 1680080.2 1680080.1 1680080.1 1680080.0 > 1680080.1 1680080.0 1680079.9 1680081.5] > [Update RS (ms): 1.4 2.0 2.2 1.8 1.7 1.4 2.5 2.2 1.9 2.5 > 1.7 1.7 0.1 > Avg: 1.8, Min: 0.1, Max: 2.5] > [Processed Buffers : 8 1 3 1 1 7 3 2 6 2 7 8 3 > Sum: 52, Avg: 4, Min: 1, Max: 8] > [Ext Root Scanning (ms): 0.7 0.5 0.5 0.3 0.5 0.7 0.4 0.5 > 0.4 0.5 0.3 0.3 0.0 > Avg: 0.4, Min: 0.0, Max: 0.7] > [Mark Stack Scanning (ms): 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 > 0.0 0.0 0.0 0.0 0.0 > Avg: 0.0, Min: 0.0, Max: 0.0] > [Scan RS (ms): 0.9 0.4 0.1 0.7 0.7 0.9 0.0 0.1 0.6 0.0 > 0.8 0.8 0.9 > Avg: 0.5, Min: 0.0, Max: 0.9] > [Object Copy (ms): 7.2 7.2 7.3 7.3 7.1 7.1 7.0 7.2 7.1 > 6.9 7.1 7.1 7.0 > Avg: 7.1, Min: 6.9, Max: 7.3] > [Termination (ms): 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 > 0.0 0.0 0.0 0.0 > Avg: 0.0, Min: 0.0, Max: 0.0] > [Other: 0.6 ms] > [Clear CT: 0.5 ms] > [Other: 1.1 ms] > [Choose CSet: 0.0 ms] > [ 7677M->7636M(8000M)] > [Times: user=0.12 sys=0.00, real=0.01 secs] > > But every 5-10 minutes I see a GC pause that lasts 10-15 seconds: > [todd at monster01 logs]$ grep 'Full GC' gc-hbase.log | tail > 2010-07-06T12:50:41.216-0700: 52521.446: [Full GC 7934M->4865M(8000M), > 9.8907800 secs] > 2010-07-06T12:55:39.802-0700: 52820.032: [Full GC 7930M->4964M(8000M), > 9.9025520 secs] > 2010-07-06T13:02:26.872-0700: 53227.102: [Full GC 7934M->4882M(8000M), > 10.1232190 secs] > 2010-07-06T13:09:41.049-0700: 53661.279: [Full GC 7938M->5002M(8000M), > 10.4997760 secs] > 2010-07-06T13:18:51.531-0700: 54211.761: [Full GC 7938M->4962M(8000M), > 11.0497380 secs] > > These pauses are pretty unacceptable for soft real time operation. > > Am I missing some tuning that should be done for G1GC for applications > like this? Is 20ms out of 80ms too aggressive a target for the garbage > rates we're generating? > > My actual live heap usage should be very stable around 5GB - the > application very carefully accounts its live object set at around 60% of > the max heap (as you can see in the logs above). > > At this point we are considering doing crazy things like ripping out our > main memory consumers into a custom slab allocator, and manually > reference count the byte array slices. But, if we can get G1GC to work > for us, it will save a lot of engineering on the application side! > > Thanks > -Todd > > -- > Todd Lipcon > Software Engineer, Cloudera > > > ------------------------------------------------------------------------ > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use From todd at cloudera.com Tue Jul 6 14:24:27 2010 From: todd at cloudera.com (Todd Lipcon) Date: Tue, 6 Jul 2010 14:24:27 -0700 Subject: G1GC Full GCs In-Reply-To: <4C339B7B.4000807@oracle.com> References: <4C339B7B.4000807@oracle.com> Message-ID: On Tue, Jul 6, 2010 at 2:09 PM, Jon Masamitsu wrote: > Todd, > > Could you send a segment of the GC logs from the beginning > through the first dozen or so full GC's? > Sure, I just put it online at: http://cloudera-todd.s3.amazonaws.com/gc-log-g1gc.txt > > Exactly which version of the JVM are you using? > > java -version > > will tell us. > > Latest as of last night: [todd at monster01 ~]$ ./jdk1.7.0/jre/bin/java -version java version "1.7.0-ea" Java(TM) SE Runtime Environment (build 1.7.0-ea-b99) Java HotSpot(TM) 64-Bit Server VM (build 19.0-b03, mixed mode) > Do you have a test setup where you could do some experiments? > > Sure, I have a five node cluster here where I do lots of testing, happy to try different builds/options/etc (though I probably don't have time to apply patches and rebuild the JDK myself) > Can you send the set of CMS flags you use? It might tell > us something about the GC behavior of you application. > Might not tell us anything but it's worth a look. > Different customers have found different flags to work well for them. One user uses the following: -Xmx12000m -XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC \ -XX:NewSize=64m -XX:MaxNewSize=64m -XX:CMSInitiatingOccupancyFraction=40 \ Another uses: -XX:+DoEscapeAnalysis -XX:+AggressiveOpts -XX:+UseConcMarkSweepGC -XX:NewSize=64m -XX:MaxNewSize=64m -XX:CMSInitiatingOccupancyFraction=88 -verbose:gc -XX:+PrintGCDetails The particular tuning options probably depend on the actual cache workload of the user. I tend to recommend CMSInitiatingOccupancyFraction around 75 or so, since the software maintains about 60% heap usage. I also think a NewSize slightly larger would improve things a bit, but if it gets more than 256m or so, the ParNew pauses start to be too long for a lot of use cases. Regarding CMS logs, I can probably restart this test later this afternoon on CMS and run it for a couple hours, but it isn't likely to hit the multi-minute compaction that quickly. It happens more in the wild. -Todd > > On 07/06/10 13:27, Todd Lipcon wrote: > > Hi all, > > I work on HBase, a distributed database written in Java. We generally run > on large heaps (8GB+), and our object lifetime distribution has proven > pretty problematic for garbage collection (we manage a multi-GB LRU cache > inside the process, so in CMS we tenure a lot of byte arrays which later get > collected). > > In Java6, we generally run with the CMS collector, tuning down > CMSInitiatingOccupancyFraction and constraining MaxNewSize to achieve fairly > low pause GC, but after a week or two of uptime we often run into full heap > compaction which takes several minutes and wreaks havoc on the system. > > Needless to say, we've been watching the development of the G1 GC with > anticipation for the last year or two. Finally it seems in the latest build > of JDK7 it's stable enough for actual use (previously it would segfault > within an hour or so). However, in my testing I'm seeing a fair amount of > 8-10 second Full GC pauses. > > The flags I'm passing are: > > -Xmx8000m -XX:+UseG1GC -XX:MaxGCPauseMillis=20 > -XX:GCPauseIntervalMillis=80 > > Most of the pauses I see in the GC log are around 10-20ms as expected: > > 2010-07-05T22:43:19.849-0700: 1680.079: [GC pause (partial), 0.01209000 > secs] > [Parallel Time: 10.5 ms] > [Update RS (Start) (ms): 1680080.2 1680080.1 1680080.2 1680079.9 > 1680080.0 1680080.2 1680080.1 1680080.1 1680080.0 1680080.1 1680080.0 > 1680079.9 1680081.5] > [Update RS (ms): 1.4 2.0 2.2 1.8 1.7 1.4 2.5 2.2 1.9 2.5 > 1.7 1.7 0.1 > Avg: 1.8, Min: 0.1, Max: 2.5] > [Processed Buffers : 8 1 3 1 1 7 3 2 6 2 7 8 3 > Sum: 52, Avg: 4, Min: 1, Max: 8] > [Ext Root Scanning (ms): 0.7 0.5 0.5 0.3 0.5 0.7 0.4 0.5 0.4 > 0.5 0.3 0.3 0.0 > Avg: 0.4, Min: 0.0, Max: 0.7] > [Mark Stack Scanning (ms): 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 > 0.0 0.0 0.0 0.0 0.0 > Avg: 0.0, Min: 0.0, Max: 0.0] > [Scan RS (ms): 0.9 0.4 0.1 0.7 0.7 0.9 0.0 0.1 0.6 0.0 0.8 > 0.8 0.9 > Avg: 0.5, Min: 0.0, Max: 0.9] > [Object Copy (ms): 7.2 7.2 7.3 7.3 7.1 7.1 7.0 7.2 7.1 6.9 > 7.1 7.1 7.0 > Avg: 7.1, Min: 6.9, Max: 7.3] > [Termination (ms): 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 > 0.0 0.0 0.0 > Avg: 0.0, Min: 0.0, Max: 0.0] > [Other: 0.6 ms] > [Clear CT: 0.5 ms] > [Other: 1.1 ms] > [Choose CSet: 0.0 ms] > [ 7677M->7636M(8000M)] > [Times: user=0.12 sys=0.00, real=0.01 secs] > > But every 5-10 minutes I see a GC pause that lasts 10-15 seconds: > [todd at monster01 logs]$ grep 'Full GC' gc-hbase.log | tail > 2010-07-06T12:50:41.216-0700: 52521.446: [Full GC 7934M->4865M(8000M), > 9.8907800 secs] > 2010-07-06T12:55:39.802-0700: 52820.032: [Full GC 7930M->4964M(8000M), > 9.9025520 secs] > 2010-07-06T13:02:26.872-0700: 53227.102: [Full GC 7934M->4882M(8000M), > 10.1232190 secs] > 2010-07-06T13:09:41.049-0700: 53661.279: [Full GC 7938M->5002M(8000M), > 10.4997760 secs] > 2010-07-06T13:18:51.531-0700: 54211.761: [Full GC 7938M->4962M(8000M), > 11.0497380 secs] > > These pauses are pretty unacceptable for soft real time operation. > > Am I missing some tuning that should be done for G1GC for applications > like this? Is 20ms out of 80ms too aggressive a target for the garbage rates > we're generating? > > My actual live heap usage should be very stable around 5GB - the > application very carefully accounts its live object set at around 60% of the > max heap (as you can see in the logs above). > > At this point we are considering doing crazy things like ripping out our > main memory consumers into a custom slab allocator, and manually reference > count the byte array slices. But, if we can get G1GC to work for us, it will > save a lot of engineering on the application side! > > Thanks > -Todd > > -- > Todd Lipcon > Software Engineer, Cloudera > > ------------------------------ > > _______________________________________________ > hotspot-gc-use mailing listhotspot-gc-use at openjdk.java.nethttp://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > -- Todd Lipcon Software Engineer, Cloudera -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20100706/b169796f/attachment.html From ryanobjc at gmail.com Tue Jul 6 14:40:28 2010 From: ryanobjc at gmail.com (Ryan Rawson) Date: Tue, 6 Jul 2010 14:40:28 -0700 Subject: G1GC Full GCs In-Reply-To: <4C339C2C.6030605@oracle.com> References: <4C339C2C.6030605@oracle.com> Message-ID: I also work with Todd on this systems (I am one of the other people with the alternate CMS config) and doubling the heap size from 8 GB to 16GB is a little insane... we'd like to have some amount of reasonable memory efficiency here... The thing is the more we can get out of our ram for this block cache, the better performing our systems are. Also a lot of the settings are self tuning, so if we up the Xmx the size of the block cache is scaled as well. -ryan On Tue, Jul 6, 2010 at 2:12 PM, Y. S. Ramakrishna wrote: > Did you try doubling the heap size? You might want to post a full > log so we can see what's happening between those full collections. > Also, If you have comparable CMS logs > all the better, as a known starting point. The full gc's almost > look like the heap got too full, so it must mean that incremental > collection is not keeping up with the rate of garbage generation. > Also, what's the JDK build you are running? > > -- ramki > > On 07/06/10 13:27, Todd Lipcon wrote: >> Hi all, >> >> I work on HBase, a distributed database written in Java. We generally >> run on large heaps (8GB+), and our object lifetime distribution has >> proven pretty problematic for garbage collection (we manage a multi-GB >> LRU cache inside the process, so in CMS we tenure a lot of byte arrays >> which later get collected). >> >> In Java6, we generally run with the CMS collector, tuning down >> CMSInitiatingOccupancyFraction and constraining MaxNewSize to achieve >> fairly low pause GC, but after a week or two of uptime we often run into >> full heap compaction which takes several minutes and wreaks havoc on the >> system. >> >> Needless to say, we've been watching the development of the G1 GC with >> anticipation for the last year or two. Finally it seems in the latest >> build of JDK7 it's stable enough for actual use (previously it would >> segfault within an hour or so). However, in my testing I'm seeing a fair >> amount of 8-10 second Full GC pauses. >> >> The flags I'm passing are: >> >> -Xmx8000m -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:GCPauseIntervalMillis=80 >> >> Most of the pauses I see in the GC log are around 10-20ms as expected: >> >> 2010-07-05T22:43:19.849-0700: 1680.079: [GC pause (partial), 0.01209000 >> secs] >> ? ?[Parallel Time: ?10.5 ms] >> ? ? ? [Update RS (Start) (ms): ?1680080.2 ?1680080.1 ?1680080.2 >> ?1680079.9 ?1680080.0 ?1680080.2 ?1680080.1 ?1680080.1 ?1680080.0 >> ?1680080.1 ?1680080.0 ?1680079.9 ?1680081.5] >> ? ? ? [Update RS (ms): ?1.4 ?2.0 ?2.2 ?1.8 ?1.7 ?1.4 ?2.5 ?2.2 ?1.9 ?2.5 >> ?1.7 ?1.7 ?0.1 >> ? ? ? ?Avg: ? 1.8, Min: ? 0.1, Max: ? 2.5] >> ? ? ? ? ?[Processed Buffers : 8 1 3 1 1 7 3 2 6 2 7 8 3 >> ? ? ? ? ? Sum: 52, Avg: 4, Min: 1, Max: 8] >> ? ? ? [Ext Root Scanning (ms): ?0.7 ?0.5 ?0.5 ?0.3 ?0.5 ?0.7 ?0.4 ?0.5 >> ?0.4 ?0.5 ?0.3 ?0.3 ?0.0 >> ? ? ? ?Avg: ? 0.4, Min: ? 0.0, Max: ? 0.7] >> ? ? ? [Mark Stack Scanning (ms): ?0.0 ?0.0 ?0.0 ?0.0 ?0.0 ?0.0 ?0.0 ?0.0 >> ?0.0 ?0.0 ?0.0 ?0.0 ?0.0 >> ? ? ? ?Avg: ? 0.0, Min: ? 0.0, Max: ? 0.0] >> ? ? ? [Scan RS (ms): ?0.9 ?0.4 ?0.1 ?0.7 ?0.7 ?0.9 ?0.0 ?0.1 ?0.6 ?0.0 >> ?0.8 ?0.8 ?0.9 >> ? ? ? ?Avg: ? 0.5, Min: ? 0.0, Max: ? 0.9] >> ? ? ? [Object Copy (ms): ?7.2 ?7.2 ?7.3 ?7.3 ?7.1 ?7.1 ?7.0 ?7.2 ?7.1 >> ?6.9 ?7.1 ?7.1 ?7.0 >> ? ? ? ?Avg: ? 7.1, Min: ? 6.9, Max: ? 7.3] >> ? ? ? [Termination (ms): ?0.0 ?0.0 ?0.0 ?0.0 ?0.0 ?0.0 ?0.0 ?0.0 ?0.0 >> ?0.0 ?0.0 ?0.0 ?0.0 >> ? ? ? ?Avg: ? 0.0, Min: ? 0.0, Max: ? 0.0] >> ? ? ? [Other: ? 0.6 ms] >> ? ?[Clear CT: ? 0.5 ms] >> ? ?[Other: ? 1.1 ms] >> ? ? ? [Choose CSet: ? 0.0 ms] >> ? ?[ 7677M->7636M(8000M)] >> ?[Times: user=0.12 sys=0.00, real=0.01 secs] >> >> But every 5-10 minutes I see a GC pause that lasts 10-15 seconds: >> [todd at monster01 logs]$ grep 'Full GC' gc-hbase.log | tail >> 2010-07-06T12:50:41.216-0700: 52521.446: [Full GC 7934M->4865M(8000M), >> 9.8907800 secs] >> 2010-07-06T12:55:39.802-0700: 52820.032: [Full GC 7930M->4964M(8000M), >> 9.9025520 secs] >> 2010-07-06T13:02:26.872-0700: 53227.102: [Full GC 7934M->4882M(8000M), >> 10.1232190 secs] >> 2010-07-06T13:09:41.049-0700: 53661.279: [Full GC 7938M->5002M(8000M), >> 10.4997760 secs] >> 2010-07-06T13:18:51.531-0700: 54211.761: [Full GC 7938M->4962M(8000M), >> 11.0497380 secs] >> >> These pauses are pretty unacceptable for soft real time operation. >> >> Am I missing some tuning that should be done for G1GC for applications >> like this? Is 20ms out of 80ms too aggressive a target for the garbage >> rates we're generating? >> >> My actual live heap usage should be very stable around 5GB - the >> application very carefully accounts its live object set at around 60% of >> the max heap (as you can see in the logs above). >> >> At this point we are considering doing crazy things like ripping out our >> main memory consumers into a custom slab allocator, and manually >> reference count the byte array slices. But, if we can get G1GC to work >> for us, it will save a lot of engineering on the application side! >> >> Thanks >> -Todd >> >> -- >> Todd Lipcon >> Software Engineer, Cloudera >> >> >> ------------------------------------------------------------------------ >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > From y.s.ramakrishna at oracle.com Tue Jul 6 16:57:32 2010 From: y.s.ramakrishna at oracle.com (Y. S. Ramakrishna) Date: Tue, 06 Jul 2010 16:57:32 -0700 Subject: G1GC Full GCs In-Reply-To: References: <4C339B7B.4000807@oracle.com> Message-ID: <4C33C2EC.9090105@oracle.com> Two questions: (1) do you enable class unloading with CMS? (I do not see that below in yr option list, but wondered.) (2) does your application load classes, or intern a lot of strings? If i am reading the logs right, G1 appears to reclaim less and less of the heap in each cycle until a full collection intervenes, and I have no real explanation for this behaviour except that perhaps there's something in the perm gen that keeps stuff in the remainder of the heap artificially live. G1 does not incrementally collect the young gen, so this is plausible. But CMS does not either by default and I do not see that option in the CMS options list you gave below. It would be instructive to see what the comparable CMS logs look like. May be then you could start with the same heap shapes for the two and see if you can get to the bottom of the full gc (which as i understand you get to more quickly w/G1 than you did w/CMS). -- ramki On 07/06/10 14:24, Todd Lipcon wrote: > On Tue, Jul 6, 2010 at 2:09 PM, Jon Masamitsu > wrote: > > Todd, > > Could you send a segment of the GC logs from the beginning > through the first dozen or so full GC's? > > > Sure, I just put it online at: > > http://cloudera-todd.s3.amazonaws.com/gc-log-g1gc.txt > > > > Exactly which version of the JVM are you using? > > java -version > > will tell us. > > > Latest as of last night: > > [todd at monster01 ~]$ ./jdk1.7.0/jre/bin/java -version > java version "1.7.0-ea" > Java(TM) SE Runtime Environment (build 1.7.0-ea-b99) > Java HotSpot(TM) 64-Bit Server VM (build 19.0-b03, mixed mode) > > > Do you have a test setup where you could do some experiments? > > > Sure, I have a five node cluster here where I do lots of testing, happy > to try different builds/options/etc (though I probably don't have time > to apply patches and rebuild the JDK myself) > > > Can you send the set of CMS flags you use? It might tell > us something about the GC behavior of you application. > Might not tell us anything but it's worth a look. > > > Different customers have found different flags to work well for them. > One user uses the following: > > > -Xmx12000m -XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC \ > -XX:NewSize=64m -XX:MaxNewSize=64m -XX:CMSInitiatingOccupancyFraction=40 \ > > > Another uses: > > > -XX:+DoEscapeAnalysis -XX:+AggressiveOpts -XX:+UseConcMarkSweepGC > -XX:NewSize=64m -XX:MaxNewSize=64m > -XX:CMSInitiatingOccupancyFraction=88 -verbose:gc -XX:+PrintGCDetails > > > > > > > > The particular tuning options probably depend on the actual cache > workload of the user. I tend to recommend CMSInitiatingOccupancyFraction > around 75 or so, since the software maintains about 60% heap usage. I > also think a NewSize slightly larger would improve things a bit, but if > it gets more than 256m or so, the ParNew pauses start to be too long for > a lot of use cases. > > Regarding CMS logs, I can probably restart this test later this > afternoon on CMS and run it for a couple hours, but it isn't likely to > hit the multi-minute compaction that quickly. It happens more in the wild. > > -Todd > > > > On 07/06/10 13:27, Todd Lipcon wrote: >> Hi all, >> >> I work on HBase, a distributed database written in Java. We >> generally run on large heaps (8GB+), and our object lifetime >> distribution has proven pretty problematic for garbage collection >> (we manage a multi-GB LRU cache inside the process, so in CMS we >> tenure a lot of byte arrays which later get collected). >> >> In Java6, we generally run with the CMS collector, tuning down >> CMSInitiatingOccupancyFraction and constraining MaxNewSize to >> achieve fairly low pause GC, but after a week or two of uptime we >> often run into full heap compaction which takes several minutes >> and wreaks havoc on the system. >> >> Needless to say, we've been watching the development of the G1 GC >> with anticipation for the last year or two. Finally it seems in >> the latest build of JDK7 it's stable enough for actual use >> (previously it would segfault within an hour or so). However, in >> my testing I'm seeing a fair amount of 8-10 second Full GC pauses. >> >> The flags I'm passing are: >> >> -Xmx8000m -XX:+UseG1GC -XX:MaxGCPauseMillis=20 >> -XX:GCPauseIntervalMillis=80 >> >> Most of the pauses I see in the GC log are around 10-20ms as expected: >> >> 2010-07-05T22:43:19.849-0700: 1680.079: [GC pause (partial), >> 0.01209000 secs] >> [Parallel Time: 10.5 ms] >> [Update RS (Start) (ms): 1680080.2 1680080.1 1680080.2 >> 1680079.9 1680080.0 1680080.2 1680080.1 1680080.1 1680080.0 >> 1680080.1 1680080.0 1680079.9 1680081.5] >> [Update RS (ms): 1.4 2.0 2.2 1.8 1.7 1.4 2.5 2.2 >> 1.9 2.5 1.7 1.7 0.1 >> Avg: 1.8, Min: 0.1, Max: 2.5] >> [Processed Buffers : 8 1 3 1 1 7 3 2 6 2 7 8 3 >> Sum: 52, Avg: 4, Min: 1, Max: 8] >> [Ext Root Scanning (ms): 0.7 0.5 0.5 0.3 0.5 0.7 0.4 >> 0.5 0.4 0.5 0.3 0.3 0.0 >> Avg: 0.4, Min: 0.0, Max: 0.7] >> [Mark Stack Scanning (ms): 0.0 0.0 0.0 0.0 0.0 0.0 >> 0.0 0.0 0.0 0.0 0.0 0.0 0.0 >> Avg: 0.0, Min: 0.0, Max: 0.0] >> [Scan RS (ms): 0.9 0.4 0.1 0.7 0.7 0.9 0.0 0.1 0.6 >> 0.0 0.8 0.8 0.9 >> Avg: 0.5, Min: 0.0, Max: 0.9] >> [Object Copy (ms): 7.2 7.2 7.3 7.3 7.1 7.1 7.0 7.2 >> 7.1 6.9 7.1 7.1 7.0 >> Avg: 7.1, Min: 6.9, Max: 7.3] >> [Termination (ms): 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 >> 0.0 0.0 0.0 0.0 0.0 >> Avg: 0.0, Min: 0.0, Max: 0.0] >> [Other: 0.6 ms] >> [Clear CT: 0.5 ms] >> [Other: 1.1 ms] >> [Choose CSet: 0.0 ms] >> [ 7677M->7636M(8000M)] >> [Times: user=0.12 sys=0.00, real=0.01 secs] >> >> But every 5-10 minutes I see a GC pause that lasts 10-15 seconds: >> [todd at monster01 logs]$ grep 'Full GC' gc-hbase.log | tail >> 2010-07-06T12:50:41.216-0700: 52521.446: [Full GC >> 7934M->4865M(8000M), 9.8907800 secs] >> 2010-07-06T12:55:39.802-0700: 52820.032: [Full GC >> 7930M->4964M(8000M), 9.9025520 secs] >> 2010-07-06T13:02:26.872-0700: 53227.102: [Full GC >> 7934M->4882M(8000M), 10.1232190 secs] >> 2010-07-06T13:09:41.049-0700: 53661.279: [Full GC >> 7938M->5002M(8000M), 10.4997760 secs] >> 2010-07-06T13:18:51.531-0700: 54211.761: [Full GC >> 7938M->4962M(8000M), 11.0497380 secs] >> >> These pauses are pretty unacceptable for soft real time operation. >> >> Am I missing some tuning that should be done for G1GC for >> applications like this? Is 20ms out of 80ms too aggressive a >> target for the garbage rates we're generating? >> >> My actual live heap usage should be very stable around 5GB - the >> application very carefully accounts its live object set at around >> 60% of the max heap (as you can see in the logs above). >> >> At this point we are considering doing crazy things like ripping >> out our main memory consumers into a custom slab allocator, and >> manually reference count the byte array slices. But, if we can get >> G1GC to work for us, it will save a lot of engineering on the >> application side! >> >> Thanks >> -Todd >> >> -- >> Todd Lipcon >> Software Engineer, Cloudera >> ------------------------------------------------------------------------ >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> > > > > > -- > Todd Lipcon > Software Engineer, Cloudera > > > ------------------------------------------------------------------------ > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use From todd at cloudera.com Tue Jul 6 17:41:14 2010 From: todd at cloudera.com (Todd Lipcon) Date: Tue, 6 Jul 2010 17:41:14 -0700 Subject: G1GC Full GCs In-Reply-To: <4C33C2EC.9090105@oracle.com> References: <4C339B7B.4000807@oracle.com> <4C33C2EC.9090105@oracle.com> Message-ID: On Tue, Jul 6, 2010 at 4:57 PM, Y. S. Ramakrishna < y.s.ramakrishna at oracle.com> wrote: > Two questions: > > (1) do you enable class unloading with CMS? (I do not see that > below in yr option list, but wondered.) > > We don't enable class unloading. > (2) does your application load classes, or intern a lot of strings? > > We do a little bit of class loading at startup - we have some basic informational web UIs that are in Jetty servlets, and I think Jetty uses its own ClassLoader. It's nothing fancy, though, and nothing should get unloaded/reloaded. Perm usually stabilizes around 30MB if I recall correctly when I use CMS. > If i am reading the logs right, G1 appears to reclaim less > and less of the heap in each cycle until a full collection > intervenes, and I have no real explanation for this behaviour > except that perhaps there's something in the perm gen that > keeps stuff in the remainder of the heap artificially live. > G1 does not incrementally collect the young gen, so this is > plausible. But CMS does not either by default and I do not see > that option in the CMS options list you gave below. We don't enable permgen collection with CMS - the options I gave were from production instances. > It would > be instructive to see what the comparable CMS logs look like. > May be then you could start with the same heap shapes for the > two and see if you can get to the bottom of the full gc (which > as i understand you get to more quickly w/G1 than you did > w/CMS). > Yep, I'll kick off some CMS tests this evening and get back to you with those logs. Thanks for the help, all. -Todd > > On 07/06/10 14:24, Todd Lipcon wrote: > >> On Tue, Jul 6, 2010 at 2:09 PM, Jon Masamitsu > jon.masamitsu at oracle.com>> wrote: >> >> Todd, >> >> Could you send a segment of the GC logs from the beginning >> through the first dozen or so full GC's? >> >> >> Sure, I just put it online at: >> >> http://cloudera-todd.s3.amazonaws.com/gc-log-g1gc.txt >> >> >> Exactly which version of the JVM are you using? >> >> java -version >> >> will tell us. >> >> >> Latest as of last night: >> >> [todd at monster01 ~]$ ./jdk1.7.0/jre/bin/java -version >> java version "1.7.0-ea" >> Java(TM) SE Runtime Environment (build 1.7.0-ea-b99) >> Java HotSpot(TM) 64-Bit Server VM (build 19.0-b03, mixed mode) >> >> Do you have a test setup where you could do some experiments? >> >> >> Sure, I have a five node cluster here where I do lots of testing, happy to >> try different builds/options/etc (though I probably don't have time to apply >> patches and rebuild the JDK myself) >> >> Can you send the set of CMS flags you use? It might tell >> us something about the GC behavior of you application. >> Might not tell us anything but it's worth a look. >> >> >> Different customers have found different flags to work well for them. One >> user uses the following: >> >> >> -Xmx12000m -XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC \ >> -XX:NewSize=64m -XX:MaxNewSize=64m -XX:CMSInitiatingOccupancyFraction=40 \ >> >> >> Another uses: >> >> >> -XX:+DoEscapeAnalysis -XX:+AggressiveOpts -XX:+UseConcMarkSweepGC >> -XX:NewSize=64m -XX:MaxNewSize=64m >> -XX:CMSInitiatingOccupancyFraction=88 -verbose:gc -XX:+PrintGCDetails >> >> >> >> >> >> >> >> The particular tuning options probably depend on the actual cache workload >> of the user. I tend to recommend CMSInitiatingOccupancyFraction around 75 or >> so, since the software maintains about 60% heap usage. I also think a >> NewSize slightly larger would improve things a bit, but if it gets more than >> 256m or so, the ParNew pauses start to be too long for a lot of use cases. >> >> Regarding CMS logs, I can probably restart this test later this afternoon >> on CMS and run it for a couple hours, but it isn't likely to hit the >> multi-minute compaction that quickly. It happens more in the wild. >> >> -Todd >> >> >> >> On 07/06/10 13:27, Todd Lipcon wrote: >> >>> Hi all, >>> >>> I work on HBase, a distributed database written in Java. We >>> generally run on large heaps (8GB+), and our object lifetime >>> distribution has proven pretty problematic for garbage collection >>> (we manage a multi-GB LRU cache inside the process, so in CMS we >>> tenure a lot of byte arrays which later get collected). >>> >>> In Java6, we generally run with the CMS collector, tuning down >>> CMSInitiatingOccupancyFraction and constraining MaxNewSize to >>> achieve fairly low pause GC, but after a week or two of uptime we >>> often run into full heap compaction which takes several minutes >>> and wreaks havoc on the system. >>> >>> Needless to say, we've been watching the development of the G1 GC >>> with anticipation for the last year or two. Finally it seems in >>> the latest build of JDK7 it's stable enough for actual use >>> (previously it would segfault within an hour or so). However, in >>> my testing I'm seeing a fair amount of 8-10 second Full GC pauses. >>> >>> The flags I'm passing are: >>> >>> -Xmx8000m -XX:+UseG1GC -XX:MaxGCPauseMillis=20 >>> -XX:GCPauseIntervalMillis=80 >>> >>> Most of the pauses I see in the GC log are around 10-20ms as expected: >>> >>> 2010-07-05T22:43:19.849-0700: 1680.079: [GC pause (partial), >>> 0.01209000 secs] >>> [Parallel Time: 10.5 ms] >>> [Update RS (Start) (ms): 1680080.2 1680080.1 1680080.2 >>> 1680079.9 1680080.0 1680080.2 1680080.1 1680080.1 1680080.0 >>> 1680080.1 1680080.0 1680079.9 1680081.5] >>> [Update RS (ms): 1.4 2.0 2.2 1.8 1.7 1.4 2.5 2.2 >>> 1.9 2.5 1.7 1.7 0.1 >>> Avg: 1.8, Min: 0.1, Max: 2.5] >>> [Processed Buffers : 8 1 3 1 1 7 3 2 6 2 7 8 3 >>> Sum: 52, Avg: 4, Min: 1, Max: 8] >>> [Ext Root Scanning (ms): 0.7 0.5 0.5 0.3 0.5 0.7 0.4 >>> 0.5 0.4 0.5 0.3 0.3 0.0 >>> Avg: 0.4, Min: 0.0, Max: 0.7] >>> [Mark Stack Scanning (ms): 0.0 0.0 0.0 0.0 0.0 0.0 >>> 0.0 0.0 0.0 0.0 0.0 0.0 0.0 >>> Avg: 0.0, Min: 0.0, Max: 0.0] >>> [Scan RS (ms): 0.9 0.4 0.1 0.7 0.7 0.9 0.0 0.1 0.6 >>> 0.0 0.8 0.8 0.9 >>> Avg: 0.5, Min: 0.0, Max: 0.9] >>> [Object Copy (ms): 7.2 7.2 7.3 7.3 7.1 7.1 7.0 7.2 >>> 7.1 6.9 7.1 7.1 7.0 >>> Avg: 7.1, Min: 6.9, Max: 7.3] >>> [Termination (ms): 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 >>> 0.0 0.0 0.0 0.0 0.0 >>> Avg: 0.0, Min: 0.0, Max: 0.0] >>> [Other: 0.6 ms] >>> [Clear CT: 0.5 ms] >>> [Other: 1.1 ms] >>> [Choose CSet: 0.0 ms] >>> [ 7677M->7636M(8000M)] >>> [Times: user=0.12 sys=0.00, real=0.01 secs] >>> >>> But every 5-10 minutes I see a GC pause that lasts 10-15 seconds: >>> [todd at monster01 logs]$ grep 'Full GC' gc-hbase.log | tail >>> 2010-07-06T12:50:41.216-0700: 52521.446: [Full GC >>> 7934M->4865M(8000M), 9.8907800 secs] >>> 2010-07-06T12:55:39.802-0700: 52820.032: [Full GC >>> 7930M->4964M(8000M), 9.9025520 secs] >>> 2010-07-06T13:02:26.872-0700: 53227.102: [Full GC >>> 7934M->4882M(8000M), 10.1232190 secs] >>> 2010-07-06T13:09:41.049-0700: 53661.279: [Full GC >>> 7938M->5002M(8000M), 10.4997760 secs] >>> 2010-07-06T13:18:51.531-0700: 54211.761: [Full GC >>> 7938M->4962M(8000M), 11.0497380 secs] >>> >>> These pauses are pretty unacceptable for soft real time operation. >>> >>> Am I missing some tuning that should be done for G1GC for >>> applications like this? Is 20ms out of 80ms too aggressive a >>> target for the garbage rates we're generating? >>> >>> My actual live heap usage should be very stable around 5GB - the >>> application very carefully accounts its live object set at around >>> 60% of the max heap (as you can see in the logs above). >>> >>> At this point we are considering doing crazy things like ripping >>> out our main memory consumers into a custom slab allocator, and >>> manually reference count the byte array slices. But, if we can get >>> G1GC to work for us, it will save a lot of engineering on the >>> application side! >>> >>> Thanks >>> -Todd >>> >>> -- Todd Lipcon >>> Software Engineer, Cloudera >>> >>> ------------------------------------------------------------------------ >>> >>> _______________________________________________ >>> hotspot-gc-use mailing list >>> hotspot-gc-use at openjdk.java.net >> hotspot-gc-use at openjdk.java.net> >>> >>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>> >>> >> >> >> >> >> -- >> Todd Lipcon >> Software Engineer, Cloudera >> >> >> ------------------------------------------------------------------------ >> >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> > -- Todd Lipcon Software Engineer, Cloudera -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20100706/595fba5a/attachment.html From todd at cloudera.com Wed Jul 7 00:49:49 2010 From: todd at cloudera.com (Todd Lipcon) Date: Wed, 7 Jul 2010 00:49:49 -0700 Subject: G1GC Full GCs In-Reply-To: References: <4C339B7B.4000807@oracle.com> <4C33C2EC.9090105@oracle.com> Message-ID: > It would >> be instructive to see what the comparable CMS logs look like. >> May be then you could start with the same heap shapes for the >> two and see if you can get to the bottom of the full gc (which >> as i understand you get to more quickly w/G1 than you did >> w/CMS). >> > > Yep, I'll kick off some CMS tests this evening and get back to you with > those logs. > > The CMS logs are up at: http://cloudera-todd.s3.amazonaws.com/cms-log.txt The actual application activity starts at 00:15:50 or so - this is when I started up the benchmark clients. Before that, the process was essentially idle. If it would be helpful to rerun with more verbose options like tenuring output, I can do so. Also happy to provide heap dumps - this is just a benchmark with artificial data. Thanks -Todd -- Todd Lipcon Software Engineer, Cloudera -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20100707/0dbebbb4/attachment.html From todd at cloudera.com Wed Jul 7 08:45:39 2010 From: todd at cloudera.com (Todd Lipcon) Date: Wed, 7 Jul 2010 08:45:39 -0700 Subject: G1GC Full GCs In-Reply-To: References: <4C339B7B.4000807@oracle.com> <4C33C2EC.9090105@oracle.com> Message-ID: On Wed, Jul 7, 2010 at 12:49 AM, Todd Lipcon wrote: > > It would >>> be instructive to see what the comparable CMS logs look like. >>> May be then you could start with the same heap shapes for the >>> two and see if you can get to the bottom of the full gc (which >>> as i understand you get to more quickly w/G1 than you did >>> w/CMS). >>> >> >> Yep, I'll kick off some CMS tests this evening and get back to you with >> those logs. >> >> > The CMS logs are up at: > > http://cloudera-todd.s3.amazonaws.com/cms-log.txt > > The actual application activity starts at 00:15:50 or so - this is when I > started up the benchmark clients. Before that, the process was essentially > idle. If it would be helpful to rerun with more verbose options like > tenuring output, I can do so. Also happy to provide heap dumps - this is > just a benchmark with artificial data. > Overnight I saw one "concurrent mode failure". Here's the section of logs leading up to it: 2010-07-07T07:56:26.953-0700: 28489.370: [CMS-concurrent-mark: 1.880/2.253 secs] [Times: user=16.08 sys=0.65, real=2.25 secs] 2010-07-07T07:56:26.953-0700: 28489.370: [CMS-concurrent-preclean-start] 2010-07-07T07:56:26.978-0700: 28489.395: [GC 28489.395: [ParNew: 59004K->6528K(59008K), 0.0186670 secs] 6371506K->6323755K(8382080K), 0.0187590 secs] [Times: user=0.16 sys=0.01, real=0.02 secs] 2010-07-07T07:56:27.060-0700: 28489.477: [GC 28489.477: [ParNew: 59008K->6527K(59008K), 0.0169360 secs] 6376235K->6330593K(8382080K), 0.0170350 secs] [Times: user=0.17 sys=0.00, real=0.02 secs] 2010-07-07T07:56:27.141-0700: 28489.558: [GC 28489.558: [ParNew: 57956K->6528K(59008K), 0.0184100 secs] 6382021K->6337204K(8382080K), 0.0185230 secs] [Times: user=0.17 sys=0.00, real=0.01 secs] 2010-07-07T07:56:27.218-0700: 28489.634: [GC 28489.634: [ParNew: 58929K->6528K(59008K), 0.2090130 secs] 6389605K->6345481K(8382080K), 0.2091010 secs] [Times: user=2.28 sys=0.01, real=0.21 secs] 2010-07-07T07:56:27.478-0700: 28489.894: [GC 28489.895: [ParNew: 58968K->6528K(59008K), 0.0175530 secs] 6397922K->6352489K(8382080K), 0.0176710 secs] [Times: user=0.17 sys=0.00, real=0.02 secs] 2010-07-07T07:56:27.546-0700: 28489.963: [GC 28489.963: [ParNew: 59004K->6528K(59008K), 0.0163740 secs] 6404965K->6357350K(8382080K), 0.0164930 secs] [Times: user=0.15 sys=0.00, real=0.02 secs] 2010-07-07T07:56:27.613-0700: 28490.030: [GC 28490.030: [ParNew: 58967K->6528K(59008K), 0.0234590 secs] 6409789K->6361980K(8382080K), 0.0235860 secs] [Times: user=0.22 sys=0.00, real=0.02 secs] 2010-07-07T07:56:27.696-0700: 28490.113: [GC 28490.113: [ParNew: 58974K->6528K(59008K), 0.0132980 secs] 6414427K->6364893K(8382080K), 0.0133890 secs] [Times: user=0.15 sys=0.01, real=0.01 secs] 2010-07-07T07:56:27.786-0700: 28490.203: [GC 28490.203: [ParNew (promotion failed): 59008K->59008K(59008K), 0.0179250 secs]28490.221: [CMS2010-07-07T07:56:27.901-0700: 28490.317: [CMS-concurrent-preclean: 0.556/0.947 secs] [Times: user=5.76 sys=0.26, real=0.95 secs] (concurrent mode failure): 6359176K->4206871K(8323072K), 17.4366220 secs] 6417373K->4206871K(8382080K), [CMS Perm : 18609K->18565K(31048K)], 17.4546890 secs] [Times: user=11.17 sys=0.09, real=17.45 secs] I've interpreted pauses like this as being caused by fragmentation, since the young gen is 64M, and the old gen here has about 2G free. If there's something I'm not understanding about CMS, and I can tune it more smartly to avoid these longer pauses, I'm happy to try. Thanks -Todd -- Todd Lipcon Software Engineer, Cloudera -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20100707/fc8977df/attachment-0001.html From y.s.ramakrishna at oracle.com Wed Jul 7 11:28:56 2010 From: y.s.ramakrishna at oracle.com (Y. S. Ramakrishna) Date: Wed, 07 Jul 2010 11:28:56 -0700 Subject: G1GC Full GCs In-Reply-To: References: <4C339B7B.4000807@oracle.com> <4C33C2EC.9090105@oracle.com> Message-ID: <4C34C768.5050204@oracle.com> On 07/07/10 08:45, Todd Lipcon wrote: ... > > Overnight I saw one "concurrent mode failure". ... > 2010-07-07T07:56:27.786-0700: 28490.203: [GC 28490.203: [ParNew > (promotion failed): 59008K->59008K(59008K), 0.0179250 secs]28490.221: > [CMS2010-07-07T07:56:27.901-0700: 28490.317: [CMS-concurrent-preclean: > 0.556/0.947 secs] [Times: > user=5.76 sys=0.26, real=0.95 secs] > (concurrent mode failure): 6359176K->4206871K(8323072K), 17.4366220 > secs] 6417373K->4206871K(8382080K), [CMS Perm : 18609K->18565K(31048K)], > 17.4546890 secs] [Times: user=11.17 sys=0.09, real=17.45 secs] > > I've interpreted pauses like this as being caused by fragmentation, > since the young gen is 64M, and the old gen here has about 2G free. If > there's something I'm not understanding about CMS, and I can tune it > more smartly to avoid these longer pauses, I'm happy to try. Yes the old gen must be fragmented. I'll look at the data you have made available (for CMS). The CMS log you uploaded does not have the suffix leading into the concurrent mode failure ypu display above (it stops less than 2500 s into the run). If you could include the entire log leading into the concurrent mode failures, it would be a great help. Do you have large arrays in your application? The shape of the promotion graph for CMS is somewhat jagged, indicating _perhaps_ that. Yes, +PrintTenuringDistribution would shed a bit more light. As regards fragmentation, it can be tricky to tune against, but we can try once we understand a bit more about the object sizes and demographics. I am sure you don't have an easily shared test case, so we can reproduce both the CMS fragmentation and the G1 full gc issues locally for quickest progress on this? -- ramki From todd at cloudera.com Wed Jul 7 11:56:46 2010 From: todd at cloudera.com (Todd Lipcon) Date: Wed, 7 Jul 2010 11:56:46 -0700 Subject: G1GC Full GCs In-Reply-To: <4C34C768.5050204@oracle.com> References: <4C339B7B.4000807@oracle.com> <4C33C2EC.9090105@oracle.com> <4C34C768.5050204@oracle.com> Message-ID: On Wed, Jul 7, 2010 at 11:28 AM, Y. S. Ramakrishna < y.s.ramakrishna at oracle.com> wrote: > > > On 07/07/10 08:45, Todd Lipcon wrote: > ... > > >> Overnight I saw one "concurrent mode failure". >> > ... > > 2010-07-07T07:56:27.786-0700: 28490.203: [GC 28490.203: [ParNew (promotion >> failed): 59008K->59008K(59008K), 0.0179250 secs]28490.221: >> [CMS2010-07-07T07:56:27.901-0700: 28490.317: [CMS-concurrent-preclean: >> 0.556/0.947 secs] [Times: >> user=5.76 sys=0.26, real=0.95 secs] (concurrent mode failure): >> 6359176K->4206871K(8323072K), 17.4366220 secs] 6417373K->4206871K(8382080K), >> [CMS Perm : 18609K->18565K(31048K)], 17.4546890 secs] [Times: user=11.17 >> sys=0.09, real=17.45 secs] >> I've interpreted pauses like this as being caused by fragmentation, since >> the young gen is 64M, and the old gen here has about 2G free. If there's >> something I'm not understanding about CMS, and I can tune it more smartly to >> avoid these longer pauses, I'm happy to try. >> > > Yes the old gen must be fragmented. I'll look at the data you have > made available (for CMS). The CMS log you uploaded does not have the > suffix leading into the concurrent mode failure ypu display above > (it stops less than 2500 s into the run). If you could include > the entire log leading into the concurrent mode failures, it would > be a great help. Just uploaded the full log from the entire 11-hour run, all the way up through the 218-second GC pause which caused the server to get kicked out of the cluster (since it stopped heartbeating to the master) http://cloudera-todd.s3.amazonaws.com/cms-full-gc-log.txt.gz Do you have large arrays in your > application? The primary heap consumers in the application are: - RPC buffers - in this case I'm configured for 40 RPC handlers, each of which is usually handling a byte[] around 2-3MB for a "put". These buffers then get passed along into the memstore: - Memstore - this is allocated 40% of the heap, and it's made up of some hundreds of separate ConcurrentSkipListMaps. The values of the map are small objects which contain offsets into to the byte[]s passed in above. So, typically this is about 2GB of heap, corresponding to around a million of the offset containers, and maybe 100 thousand of the actual byte arrays. These memstores are always being "flushed" to disk (basically we take one of the maps and write it out, then drop references to the map to let GC free up memory) - LRU block cache - this is a large ConcurrentHashMap, where a CachedBlock is basically a wrapper for a ByteBuffer. These ByteBuffers represent around 64KB each. Typically this is allocated 20% of the heap, so on the order of 20,000 entries in the map here. Eviction is done by manually accounting heap usage, and when it gets too high, we remove blocks from the cache. So to answer your question simply: there shouldn't be any byte arrays floating around larger than 2MB, though there are a fair number at that size and a fair number at 64KB. Can I use jmap or another program to do any useful analysis? > The shape of the promotion graph for CMS is somewhat > jagged, indicating _perhaps_ that. Yes, +PrintTenuringDistribution > would shed a bit more light. I'll restart the test with this option on and collect some more logs for you guys. > As regards fragmentation, it can be > tricky to tune against, but we can try once we understand a bit > more about the object sizes and demographics. > > I am sure you don't have an easily shared test case, so we > can reproduce both the CMS fragmentation and the G1 full gc > issues locally for quickest progress on this? > > Well, the project itself is open source, but to really get serious load going into it you need beefy machines/disks. I'm running my tests on a 5-node cluster of dual quad core Nehalems, 24G RAM, 12 disks each. I can try to set up a mocked workload (eg skip actual disk IO) from the same codebase, but it would be a fair bit of work and I don't think I can get to it this month (leaving for vacation next week) If it's useful to look at the source, here are some pointers to the relevant RAM consumers: Cache: http://github.com/apache/hbase/blob/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/LruBlockCache.java MemStore: http://github.com/apache/hbase/blob/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/MemStore.java Wrapper class held by memstore: http://github.com/apache/hbase/blob/trunk/src/main/java/org/apache/hadoop/hbase/KeyValue.java The class used by RPC to receive "Put" requests: http://github.com/apache/hbase/blob/trunk/src/main/java/org/apache/hadoop/hbase/client/Put.java Thanks again for all the help, it's much appreciated. -Todd -- Todd Lipcon Software Engineer, Cloudera -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20100707/302a5d04/attachment.html From y.s.ramakrishna at oracle.com Wed Jul 7 17:09:13 2010 From: y.s.ramakrishna at oracle.com (Y. S. Ramakrishna) Date: Wed, 07 Jul 2010 17:09:13 -0700 Subject: G1GC Full GCs In-Reply-To: References: <4C339B7B.4000807@oracle.com> <4C33C2EC.9090105@oracle.com> <4C34C768.5050204@oracle.com> Message-ID: <4C351729.9080706@oracle.com> I plotted the heap used _after_ each gc (scavenge or full) etc., attached; and if you stand away from the plot, tilt your head to the right and squint at the plot, you'll see what looks, at least to me, like a slow leak. (The tell-tale slowly-rising lower envelope of your carrier-wave, if you will pardon a telecom term.) Leaks can of course exacerbate fragmentation in non-moving collectors such as CMS, but also possibly in regionalized lazily evacuating collectors such as G1. Perhaps jmap -histo:live or +PrintClassHistogram[AfterFullGC] will help you get to the bottom of yr leak. Once the leak is plugged perhaps we could come back to the G1 tuning effort? (We have some guesses as to what might be happening and the best G1 minds are chewing on the info you provided so far, for which thanks!) -- ramki On 07/07/10 11:56, Todd Lipcon wrote: > On Wed, Jul 7, 2010 at 11:28 AM, Y. S. Ramakrishna > > wrote: > > > > On 07/07/10 08:45, Todd Lipcon wrote: > ... > > > Overnight I saw one "concurrent mode failure". > > ... > > 2010-07-07T07:56:27.786-0700: 28490.203: [GC 28490.203: [ParNew > (promotion failed): 59008K->59008K(59008K), 0.0179250 > secs]28490.221: [CMS2010-07-07T07:56:27.901-0700: 28490.317: > [CMS-concurrent-preclean: 0.556/0.947 secs] [Times: > user=5.76 sys=0.26, real=0.95 secs] (concurrent mode failure): > 6359176K->4206871K(8323072K), 17.4366220 secs] > 6417373K->4206871K(8382080K), [CMS Perm : > 18609K->18565K(31048K)], 17.4546890 secs] [Times: user=11.17 > sys=0.09, real=17.45 secs] > I've interpreted pauses like this as being caused by > fragmentation, since the young gen is 64M, and the old gen here > has about 2G free. If there's something I'm not understanding > about CMS, and I can tune it more smartly to avoid these longer > pauses, I'm happy to try. > > > Yes the old gen must be fragmented. I'll look at the data you have > made available (for CMS). The CMS log you uploaded does not have the > suffix leading into the concurrent mode failure ypu display above > (it stops less than 2500 s into the run). If you could include > the entire log leading into the concurrent mode failures, it would > be a great help. > > > Just uploaded the full log from the entire 11-hour run, all the way up > through the 218-second GC pause which caused the server to get kicked > out of the cluster (since it stopped heartbeating to the master) > > http://cloudera-todd.s3.amazonaws.com/cms-full-gc-log.txt.gz > > > Do you have large arrays in your > application? > > > The primary heap consumers in the application are: > - RPC buffers - in this case I'm configured for 40 RPC handlers, each of > which is usually handling a byte[] around 2-3MB for a "put". These > buffers then get passed along into the memstore: > - Memstore - this is allocated 40% of the heap, and it's made up of some > hundreds of separate ConcurrentSkipListMaps. The values of the map are > small objects which contain offsets into to the byte[]s passed in above. > So, typically this is about 2GB of heap, corresponding to around a > million of the offset containers, and maybe 100 thousand of the actual > byte arrays. > > These memstores are always being "flushed" to disk (basically we take > one of the maps and write it out, then drop references to the map to let > GC free up memory) > > - LRU block cache - this is a large > ConcurrentHashMap, where a CachedBlock is basically > a wrapper for a ByteBuffer. These ByteBuffers represent around 64KB > each. Typically this is allocated 20% of the heap, so on the order of > 20,000 entries in the map here. > > Eviction is done by manually accounting heap usage, and when it gets too > high, we remove blocks from the cache. > > So to answer your question simply: there shouldn't be any byte arrays > floating around larger than 2MB, though there are a fair number at that > size and a fair number at 64KB. Can I use jmap or another program to do > any useful analysis? > > > The shape of the promotion graph for CMS is somewhat > jagged, indicating _perhaps_ that. Yes, +PrintTenuringDistribution > would shed a bit more light. > > > I'll restart the test with this option on and collect some more logs for > you guys. > > > As regards fragmentation, it can be > tricky to tune against, but we can try once we understand a bit > more about the object sizes and demographics. > > I am sure you don't have an easily shared test case, so we > can reproduce both the CMS fragmentation and the G1 full gc > issues locally for quickest progress on this? > > Well, the project itself is open source, but to really get serious load > going into it you need beefy machines/disks. I'm running my tests on a > 5-node cluster of dual quad core Nehalems, 24G RAM, 12 disks each. I can > try to set up a mocked workload (eg skip actual disk IO) from the same > codebase, but it would be a fair bit of work and I don't think I can get > to it this month (leaving for vacation next week) > > If it's useful to look at the source, here are some pointers to the > relevant RAM consumers: > > Cache: > http://github.com/apache/hbase/blob/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/LruBlockCache.java > > MemStore: > http://github.com/apache/hbase/blob/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/MemStore.java > > Wrapper class held by memstore: > http://github.com/apache/hbase/blob/trunk/src/main/java/org/apache/hadoop/hbase/KeyValue.java > > The class used by RPC to receive "Put" requests: > http://github.com/apache/hbase/blob/trunk/src/main/java/org/apache/hadoop/hbase/client/Put.java > > Thanks again for all the help, it's much appreciated. > -Todd > -- > Todd Lipcon > Software Engineer, Cloudera > > > ------------------------------------------------------------------------ > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use -------------- next part -------------- A non-text attachment was scrubbed... Name: cms-full-useda.gif Type: image/gif Size: 50693 bytes Desc: not available Url : http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20100707/86cfe7f5/attachment-0001.gif From todd at cloudera.com Wed Jul 7 17:18:29 2010 From: todd at cloudera.com (Todd Lipcon) Date: Wed, 7 Jul 2010 17:18:29 -0700 Subject: G1GC Full GCs In-Reply-To: <4C351729.9080706@oracle.com> References: <4C339B7B.4000807@oracle.com> <4C33C2EC.9090105@oracle.com> <4C34C768.5050204@oracle.com> <4C351729.9080706@oracle.com> Message-ID: On Wed, Jul 7, 2010 at 5:09 PM, Y. S. Ramakrishna < y.s.ramakrishna at oracle.com> wrote: > > I plotted the heap used _after_ each gc (scavenge or full) etc., > attached; and if you stand away from the plot, tilt your head to the right > and > squint at the plot, you'll see what looks, at least to me, like a slow > leak. > (The tell-tale slowly-rising lower envelope of your carrier-wave, > if you will pardon a telecom term.) Leaks can of course exacerbate > fragmentation in non-moving collectors such as CMS, but also possibly > in regionalized lazily evacuating collectors such as G1. > Hi Ramki, Looking at the graph you attached, it appears that the low-water mark stabilizes at somewhere between 4.5G and 5G. The configuration I'm running is to allocate 40% of the heap to Memstore and 20% of the heap to the LRU cache. For an 8G heap, this is 4.8GB. So, for this application it's somewhat expected that, as it runs, it will accumulate more and more data until it reaches this threshold. The data is, of course, not *permanent*, but it's reasonably long-lived, so it makes sense to me that it should go into the old generation. If you like, I can tune down those percentages to 20/20 instead of 20/40, and I think we'll see the same pattern, just stabilized around 3.2GB. This will probably delay the full GCs, but still eventually hit them. It's also way lower than we can really go - customers won't like "throwing away" 60% of the allocated heap to GC! > > Perhaps jmap -histo:live or +PrintClassHistogram[AfterFullGC] > will help you get to the bottom of yr leak. Once the leak is plugged > perhaps we could come back to the G1 tuning effort? (We have some > guesses as to what might be happening and the best G1 minds are > chewing on the info you provided so far, for which thanks!) > I can try running with those options and see what I see, but I've already spent some time looking at heap dumps, and not found any leaks, so I'm pretty sure it's not the issue. -Todd > > On 07/07/10 11:56, Todd Lipcon wrote: > >> On Wed, Jul 7, 2010 at 11:28 AM, Y. S. Ramakrishna < >> y.s.ramakrishna at oracle.com > wrote: >> >> >> >> On 07/07/10 08:45, Todd Lipcon wrote: >> ... >> >> >> Overnight I saw one "concurrent mode failure". >> >> ... >> >> 2010-07-07T07:56:27.786-0700: 28490.203: [GC 28490.203: [ParNew >> (promotion failed): 59008K->59008K(59008K), 0.0179250 >> secs]28490.221: [CMS2010-07-07T07:56:27.901-0700: 28490.317: >> [CMS-concurrent-preclean: 0.556/0.947 secs] [Times: >> user=5.76 sys=0.26, real=0.95 secs] (concurrent mode failure): >> 6359176K->4206871K(8323072K), 17.4366220 secs] >> 6417373K->4206871K(8382080K), [CMS Perm : >> 18609K->18565K(31048K)], 17.4546890 secs] [Times: user=11.17 >> sys=0.09, real=17.45 secs] >> I've interpreted pauses like this as being caused by >> fragmentation, since the young gen is 64M, and the old gen here >> has about 2G free. If there's something I'm not understanding >> about CMS, and I can tune it more smartly to avoid these longer >> pauses, I'm happy to try. >> >> >> Yes the old gen must be fragmented. I'll look at the data you have >> made available (for CMS). The CMS log you uploaded does not have the >> suffix leading into the concurrent mode failure ypu display above >> (it stops less than 2500 s into the run). If you could include >> the entire log leading into the concurrent mode failures, it would >> be a great help. >> >> Just uploaded the full log from the entire 11-hour run, all the way up >> through the 218-second GC pause which caused the server to get kicked out of >> the cluster (since it stopped heartbeating to the master) >> >> http://cloudera-todd.s3.amazonaws.com/cms-full-gc-log.txt.gz >> >> >> Do you have large arrays in your >> application? >> >> The primary heap consumers in the application are: >> - RPC buffers - in this case I'm configured for 40 RPC handlers, each of >> which is usually handling a byte[] around 2-3MB for a "put". These buffers >> then get passed along into the memstore: >> - Memstore - this is allocated 40% of the heap, and it's made up of some >> hundreds of separate ConcurrentSkipListMaps. The values of the map are small >> objects which contain offsets into to the byte[]s passed in above. So, >> typically this is about 2GB of heap, corresponding to around a million of >> the offset containers, and maybe 100 thousand of the actual byte arrays. >> >> These memstores are always being "flushed" to disk (basically we take one >> of the maps and write it out, then drop references to the map to let GC free >> up memory) >> >> - LRU block cache - this is a large ConcurrentHashMap, >> where a CachedBlock is basically a wrapper for a ByteBuffer. These >> ByteBuffers represent around 64KB each. Typically this is allocated 20% of >> the heap, so on the order of 20,000 entries in the map here. >> >> Eviction is done by manually accounting heap usage, and when it gets too >> high, we remove blocks from the cache. >> >> So to answer your question simply: there shouldn't be any byte arrays >> floating around larger than 2MB, though there are a fair number at that size >> and a fair number at 64KB. Can I use jmap or another program to do any >> useful analysis? >> >> The shape of the promotion graph for CMS is somewhat >> jagged, indicating _perhaps_ that. Yes, +PrintTenuringDistribution >> would shed a bit more light. >> >> I'll restart the test with this option on and collect some more logs for >> you guys. >> >> As regards fragmentation, it can be >> tricky to tune against, but we can try once we understand a bit >> more about the object sizes and demographics. >> >> I am sure you don't have an easily shared test case, so we >> can reproduce both the CMS fragmentation and the G1 full gc >> issues locally for quickest progress on this? >> >> Well, the project itself is open source, but to really get serious load >> going into it you need beefy machines/disks. I'm running my tests on a >> 5-node cluster of dual quad core Nehalems, 24G RAM, 12 disks each. I can try >> to set up a mocked workload (eg skip actual disk IO) from the same codebase, >> but it would be a fair bit of work and I don't think I can get to it this >> month (leaving for vacation next week) >> >> If it's useful to look at the source, here are some pointers to the >> relevant RAM consumers: >> >> Cache: >> >> http://github.com/apache/hbase/blob/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/LruBlockCache.java >> >> MemStore: >> >> http://github.com/apache/hbase/blob/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/MemStore.java >> >> Wrapper class held by memstore: >> >> http://github.com/apache/hbase/blob/trunk/src/main/java/org/apache/hadoop/hbase/KeyValue.java >> >> The class used by RPC to receive "Put" requests: >> >> http://github.com/apache/hbase/blob/trunk/src/main/java/org/apache/hadoop/hbase/client/Put.java >> >> Thanks again for all the help, it's much appreciated. >> -Todd >> -- >> Todd Lipcon >> Software Engineer, Cloudera >> >> >> ------------------------------------------------------------------------ >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> > -- Todd Lipcon Software Engineer, Cloudera -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20100707/5f462dca/attachment.html From y.s.ramakrishna at oracle.com Wed Jul 7 17:26:39 2010 From: y.s.ramakrishna at oracle.com (Y. S. Ramakrishna) Date: Wed, 07 Jul 2010 17:26:39 -0700 Subject: G1GC Full GCs In-Reply-To: References: <4C339B7B.4000807@oracle.com> <4C33C2EC.9090105@oracle.com> <4C34C768.5050204@oracle.com> <4C351729.9080706@oracle.com> Message-ID: <4C351B3F.9070104@oracle.com> On 07/07/10 17:18, Todd Lipcon wrote: ... > Looking at the graph you attached, it appears that the low-water mark > stabilizes at somewhere between 4.5G and 5G. The configuration I'm > running is to allocate 40% of the heap to Memstore and 20% of the heap > to the LRU cache. For an 8G heap, this is 4.8GB. So, for this > application it's somewhat expected that, as it runs, it will accumulate > more and more data until it reaches this threshold. The data is, of > course, not *permanent*, but it's reasonably long-lived, so it makes > sense to me that it should go into the old generation. Ah, i see. In that case, i think you could try using a slightly larger old gen. If the old gen stabilizes at 4.2 GB, we should allow as much for slop. i.e. make the old gen 8.4 GB (or whatever is the measured stable old gen occupancy), then add to that the young gen size, and use that for the whole heap. I would be even more aggressive and grant more to the old gen -- as i said earlier perhaps double the old gen from its present size. If that doesn;t work we know that something is amiss in the way we are going at this. If it works, we can iterate downwards from a config that we know works, down to what may be considered an acceptable space overhead for GC. > > If you like, I can tune down those percentages to 20/20 instead of > 20/40, and I think we'll see the same pattern, just stabilized around > 3.2GB. This will probably delay the full GCs, but still eventually hit > them. It's also way lower than we can really go - customers won't like > "throwing away" 60% of the allocated heap to GC! I understand that sentiment. I want us to get to a state where we are able to completely avoid the creeping fragmentation, if possible. There are other ways to tune for this, but they are more labour-intensive and tricky, and I would not want to go into that lightly. You might want to contact your Java support for help with that. > > > Perhaps jmap -histo:live or +PrintClassHistogram[AfterFullGC] > will help you get to the bottom of yr leak. Once the leak is plugged > perhaps we could come back to the G1 tuning effort? (We have some > guesses as to what might be happening and the best G1 minds are > chewing on the info you provided so far, for which thanks!) > > > I can try running with those options and see what I see, but I've > already spent some time looking at heap dumps, and not found any leaks, > so I'm pretty sure it's not the issue. OK, in that case it's not worth doing, since you've already ruled out leaks. I'll think some more about this meanwhile. thanks. -- ramki From todd at cloudera.com Wed Jul 7 17:32:38 2010 From: todd at cloudera.com (Todd Lipcon) Date: Wed, 7 Jul 2010 17:32:38 -0700 Subject: G1GC Full GCs In-Reply-To: <4C351B3F.9070104@oracle.com> References: <4C339B7B.4000807@oracle.com> <4C33C2EC.9090105@oracle.com> <4C34C768.5050204@oracle.com> <4C351729.9080706@oracle.com> <4C351B3F.9070104@oracle.com> Message-ID: On Wed, Jul 7, 2010 at 5:26 PM, Y. S. Ramakrishna < y.s.ramakrishna at oracle.com> wrote: > > > On 07/07/10 17:18, Todd Lipcon wrote: > ... > > Looking at the graph you attached, it appears that the low-water mark >> stabilizes at somewhere between 4.5G and 5G. The configuration I'm running >> is to allocate 40% of the heap to Memstore and 20% of the heap to the LRU >> cache. For an 8G heap, this is 4.8GB. So, for this application it's somewhat >> expected that, as it runs, it will accumulate more and more data until it >> reaches this threshold. The data is, of course, not *permanent*, but it's >> reasonably long-lived, so it makes sense to me that it should go into the >> old generation. >> > > Ah, i see. In that case, i think you could try using a slightly larger old > gen. If the old gen stabilizes at 4.2 GB, we should allow as much for slop. > i.e. make the old gen 8.4 GB (or whatever is the measured stable > old gen occupancy), then add to that the young gen size, and use > that for the whole heap. I would be even more aggressive > and grant more to the old gen -- as i said earlier perhaps > double the old gen from its present size. If that doesn;t work > we know that something is amiss in the way we are going at this. > If it works, we can iterate downwards from a config that we know > works, down to what may be considered an acceptable space overhead > for GC. > > OK, I can try some tests with cache configured for only 40% heap usage. Should I run these tests with CMS or G1? > > >> If you like, I can tune down those percentages to 20/20 instead of 20/40, >> and I think we'll see the same pattern, just stabilized around 3.2GB. This >> will probably delay the full GCs, but still eventually hit them. It's also >> way lower than we can really go - customers won't like "throwing away" 60% >> of the allocated heap to GC! >> > > I understand that sentiment. I want us to get to a state where we are able > to completely avoid the creeping fragmentation, if possible. There are > other ways to tune for this, but they are more labour-intensive and tricky, > and I would not want to go into that lightly. You might want to contact > your Java support for help with that. > > Yep, we've considered various solutions involving managing our own ref-counted slices of a single pre-allocated byte array - essentially writing our own slab allocator. In theory this should make all of the GCable objects constrained to a small number of sizes, and thus prevent fragmentation, but it's quite a project to undertake :) Regarding Java support, as an open source project we have no such luxury. Projects like HBase and Hadoop, though, are pretty visible to users as "big Java apps", so getting them working well on the GC front does good things for Java adoption in the database/distributed systems community, I think. -Todd -- Todd Lipcon Software Engineer, Cloudera -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20100707/7eadf068/attachment-0001.html From y.s.ramakrishna at oracle.com Thu Jul 8 09:46:24 2010 From: y.s.ramakrishna at oracle.com (Y. S. Ramakrishna) Date: Thu, 08 Jul 2010 09:46:24 -0700 Subject: G1GC Full GCs In-Reply-To: References: <4C339B7B.4000807@oracle.com> <4C33C2EC.9090105@oracle.com> <4C34C768.5050204@oracle.com> <4C351729.9080706@oracle.com> <4C351B3F.9070104@oracle.com> Message-ID: <4C3600E0.6050409@oracle.com> On 07/07/10 17:32, Todd Lipcon wrote: ... > OK, I can try some tests with cache configured for only 40% heap usage. > Should I run these tests with CMS or G1? I'd first try CMS, and if that works, try G1. > > > > > If you like, I can tune down those percentages to 20/20 instead > of 20/40, and I think we'll see the same pattern, just > stabilized around 3.2GB. This will probably delay the full GCs, > but still eventually hit them. It's also way lower than we can > really go - customers won't like "throwing away" 60% of the > allocated heap to GC! > > > I understand that sentiment. I want us to get to a state where we > are able > to completely avoid the creeping fragmentation, if possible. There are > other ways to tune for this, but they are more labour-intensive and > tricky, > and I would not want to go into that lightly. You might want to contact > your Java support for help with that. > > > Yep, we've considered various solutions involving managing our own > ref-counted slices of a single pre-allocated byte array - essentially > writing our own slab allocator. In theory this should make all of the > GCable objects constrained to a small number of sizes, and thus prevent > fragmentation, but it's quite a project to undertake :) That would be overdoing it. I didn't mean anything so drastic and certainly nothing so drastic at the application level. When I said "labour intensive" I meant tuning GC to avoid that kind of fragmentation would be more work. > > Regarding Java support, as an open source project we have no such > luxury. Projects like HBase and Hadoop, though, are pretty visible to > users as "big Java apps", so getting them working well on the GC front > does good things for Java adoption in the database/distributed systems > community, I think. I agree, and we certainly should. -- ramki > > -Todd > > > -- > Todd Lipcon > Software Engineer, Cloudera > > > ------------------------------------------------------------------------ > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use From todd at cloudera.com Fri Jul 9 14:53:36 2010 From: todd at cloudera.com (Todd Lipcon) Date: Fri, 9 Jul 2010 14:53:36 -0700 Subject: G1GC Full GCs In-Reply-To: <4C3600E0.6050409@oracle.com> References: <4C339B7B.4000807@oracle.com> <4C33C2EC.9090105@oracle.com> <4C34C768.5050204@oracle.com> <4C351729.9080706@oracle.com> <4C351B3F.9070104@oracle.com> <4C3600E0.6050409@oracle.com> Message-ID: On Thu, Jul 8, 2010 at 9:46 AM, Y. S. Ramakrishna < y.s.ramakrishna at oracle.com> wrote: > > > On 07/07/10 17:32, Todd Lipcon wrote: > ... > > OK, I can try some tests with cache configured for only 40% heap usage. >> Should I run these tests with CMS or G1? >> > > I'd first try CMS, and if that works, try G1. Here's a bzip2 of the full logs from another run I did with the same JDK7 build and CMS: http://cloudera-todd.s3.amazonaws.com/gc-cms-less-usage.txt.bz2 It took much longer to get to the concurrent mode failure, and the full pause was shorter (only 6 seconds), but I imagine if I kept it under load for a long time it would eventually result in the same long GC pause seen before. This was configured for 40% heap usage for the main consumers. The histogram in the logs shows: 1: 3300462 2374366648 [B 2: 5317607 170163424 org.apache.hadoop.hbase.KeyValue 3: 5200626 124815024 java.util.concurrent.ConcurrentSkipListMap$Node 4: 2594448 62266752 java.util.concurrent.ConcurrentSkipListMap$Index 5: 713 21190544 [J 6: 3534 10767416 [I 7: 710 10601456 [[B 8: 65025 9192480 [C as the top consumers. This works out to 2654MB, which is under the configured 3200M 40% mark. (the configuration is really a high-water, after which it will even reject writes before crossing) -Todd -- Todd Lipcon Software Engineer, Cloudera -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20100709/2b898137/attachment.html From justin at techadvise.com Mon Jul 12 08:29:37 2010 From: justin at techadvise.com (Justin Ellison) Date: Mon, 12 Jul 2010 10:29:37 -0500 Subject: Any pointers for tuning 1.5.0_22 for a webapp with a large tenured gen? Message-ID: Hello everyone, We've recently upgraded our Weblogic webapp to Weblogic 9.2.1 which also involved upgrading from jdk 1.4.2 to 1.5.0_22. Java 6 isn't an option with Weblogic 9.2, and the upgrade to Weblogic 10 isn't until next year, so Java 6 is not an option for me at this point in time. I had the 1.4.2 tuned to the extreme: JAVA_ARGS=-Xms2816m -Xmx2816m -XX:NewSize=384m -XX:MaxNewSize=384m -XX:CompileThreshold=3000 -Djava.net.setSoTimeout=20000 \ -XX:LargePageSizeInBytes=4m -XX:+UseMPSS -Xss128k -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled \ -Xnoclassgc -XX:ParallelGCThreads=4 -XX:MaxTenuringThreshold=8 -XX:SurvivorRatio=6 -XX:+UseCMSCompactAtFullCollection -Xloggc:gc.out \ -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -XX:MaxPermSize=92m With our upgrade, the ergonomics of the application changed some, so I just specified Xmx and Xms at 3GB, and let it run to see what happened. Remarkably, things were actually pretty good - minor GC's are faster, and major GC's only occur about once or twice an hour. However, those major GC's are taking from 12-20seconds, which I can't let our website users endure. Before I go down the road of tuning things, does anyone have any tips for me? I can afford at most 2 or 3 seconds of pause time at once, and would prefer to keep it under 2 seconds if possible. I need the 2GB of tenured to be able to cache all the objects that I need to ensure good site performance. The ergonomics look kinda cool, but I'm wondering if that large of tenured generation + my low pause requirement is just too much to ask from the throughput collector. Am I destined to go back to hand tweaking the CMS collector? Justin -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20100712/68fe162d/attachment.html From peter.schuller at infidyne.com Mon Jul 12 09:02:34 2010 From: peter.schuller at infidyne.com (Peter Schuller) Date: Mon, 12 Jul 2010 18:02:34 +0200 Subject: G1GC Full GCs In-Reply-To: References: Message-ID: > Am I missing some tuning that should be done for G1GC for applications like > this? Is 20ms out of 80ms too aggressive a target for the garbage rates > we're generating? I have never run HBase, but in an LRU stress test (I posted about it a few months ago) I specifically observed remembered set scanning costs go way up. In addition I was seeing fallbacks to full GC:s recently in a slightly different test that I also posed about to -use, and that turned out to be a result of the estimated rset scanning costs being so high that regions were never selected for eviction even though they had very little live data. I would be very interested to hear if you're having the same problem. My last post on the topic is here: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/2010-June/000652.html Including the link to the (throw-away) patch that should tell you whether this is what's happening: http://distfiles.scode.org/mlref/g1/g1_region_live_stats_hack.patch Out of personal curiosity I'd be very interested to hear whether this is what's happening to you (in a real reasonable use-case rather than a synthetic benchmark). My sense (and hotspot/g1 developers please smack me around if I am misrepresenting anything here) is that the effect I saw (with rset scanning costs) could cause perpetual memory grow (until fallback to full GC) in two ways: (1) The estimated (and possibly real) cost of rset scanning for a single region could be so high that it is never possible to select it for eviction given the asked for pause time goals. Hence, such a region effectively "leaks" until full GC. (2) The estimated (and possibly real) cost of rset scanning for regions may be so high that there are, in practice, always other regions selected for high pay-off/cost ratios, such that they end up never being collected even if theoretically a single region could be evicted within the pause time goal. These are effectively the same thing, with (1) being an extreme case of (2). In both cases, the effect should be mitigated (and have been in the case where I did my testing), but as far as I can tell not generally "fixed", by increasing the pause time goals. It is unclear to me how this is intended to be handled. The original g1 paper mentions an rset scanning thread that I may suspect would be intended to help do rset scanning in the background such that regions like these could be evicted more cheaply during the STW eviction pause; but I didn't find such a thread anywhere in the source code - but I may very well just be missing it. -- / Peter Schuller From todd at cloudera.com Mon Jul 12 09:09:49 2010 From: todd at cloudera.com (Todd Lipcon) Date: Mon, 12 Jul 2010 09:09:49 -0700 Subject: G1GC Full GCs In-Reply-To: References: Message-ID: Hi Peter, This sounds interesting, and plausible to me (though I have no clue about the codebase!) I'm leaving for a trip for the next two weeks tomorrow, though, so not sure I'll have a chance to try the patch before then. I'll certainly circle back on this towards the end of the month. Thanks again for all the continued help. -Todd On Mon, Jul 12, 2010 at 9:02 AM, Peter Schuller wrote: > > Am I missing some tuning that should be done for G1GC for applications > like > > this? Is 20ms out of 80ms too aggressive a target for the garbage rates > > we're generating? > > I have never run HBase, but in an LRU stress test (I posted about it a > few months ago) I specifically observed remembered set scanning costs > go way up. In addition I was seeing fallbacks to full GC:s recently in > a slightly different test that I also posed about to -use, and that > turned out to be a result of the estimated rset scanning costs being > so high that regions were never selected for eviction even though they > had very little live data. I would be very interested to hear if > you're having the same problem. My last post on the topic is here: > > > http://mail.openjdk.java.net/pipermail/hotspot-gc-use/2010-June/000652.html > > Including the link to the (throw-away) patch that should tell you > whether this is what's happening: > > http://distfiles.scode.org/mlref/g1/g1_region_live_stats_hack.patch > > Out of personal curiosity I'd be very interested to hear whether this > is what's happening to you (in a real reasonable use-case rather than > a synthetic benchmark). > > My sense (and hotspot/g1 developers please smack me around if I am > misrepresenting anything here) is that the effect I saw (with rset > scanning costs) could cause perpetual memory grow (until fallback to > full GC) in two ways: > > (1) The estimated (and possibly real) cost of rset scanning for a > single region could be so high that it is never possible to select it > for eviction given the asked for pause time goals. Hence, such a > region effectively "leaks" until full GC. > > (2) The estimated (and possibly real) cost of rset scanning for > regions may be so high that there are, in practice, always other > regions selected for high pay-off/cost ratios, such that they end up > never being collected even if theoretically a single region could be > evicted within the pause time goal. > > These are effectively the same thing, with (1) being an extreme case of > (2). > > In both cases, the effect should be mitigated (and have been in the > case where I did my testing), but as far as I can tell not generally > "fixed", by increasing the pause time goals. > > It is unclear to me how this is intended to be handled. The original > g1 paper mentions an rset scanning thread that I may suspect would be > intended to help do rset scanning in the background such that regions > like these could be evicted more cheaply during the STW eviction > pause; but I didn't find such a thread anywhere in the source code - > but I may very well just be missing it. > > -- > / Peter Schuller > -- Todd Lipcon Software Engineer, Cloudera -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20100712/31a11f4a/attachment.html From jon.masamitsu at oracle.com Mon Jul 12 09:22:51 2010 From: jon.masamitsu at oracle.com (Jon Masamitsu) Date: Mon, 12 Jul 2010 09:22:51 -0700 Subject: Any pointers for tuning 1.5.0_22 for a webapp with a large tenured gen? In-Reply-To: References: Message-ID: <4C3B415B.5020907@oracle.com> Justin, This alias is mainly for Openjdk issues and that would currently be jdk7. For help with tuning jdk5, check with your Sun/Oracle support contacts. They should be able to give you good tuning advice on jdk5. Jon On 7/12/10 8:29 AM, Justin Ellison wrote: > Hello everyone, > > We've recently upgraded our Weblogic webapp to Weblogic 9.2.1 which > also involved upgrading from jdk 1.4.2 to 1.5.0_22. Java 6 isn't an > option with Weblogic 9.2, and the upgrade to Weblogic 10 isn't until > next year, so Java 6 is not an option for me at this point in time. > > I had the 1.4.2 tuned to the extreme: > JAVA_ARGS=-Xms2816m -Xmx2816m -XX:NewSize=384m -XX:MaxNewSize=384m -XX:CompileThreshold=3000 -Djava.net.setSoTimeout=20000 \ > -XX:LargePageSizeInBytes=4m -XX:+UseMPSS -Xss128k -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled \ > > -Xnoclassgc -XX:ParallelGCThreads=4 -XX:MaxTenuringThreshold=8 -XX:SurvivorRatio=6 -XX:+UseCMSCompactAtFullCollection -Xloggc:gc.out \ > -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -XX:MaxPermSize=92m > With our upgrade, the ergonomics of the application changed some, so I > just specified Xmx and Xms at 3GB, and let it run to see what > happened. Remarkably, things were actually pretty good - minor GC's > are faster, and major GC's only occur about once or twice an hour. > However, those major GC's are taking from 12-20seconds, which I can't > let our website users endure. > > Before I go down the road of tuning things, does anyone have any tips > for me? I can afford at most 2 or 3 seconds of pause time at once, > and would prefer to keep it under 2 seconds if possible. I need the > 2GB of tenured to be able to cache all the objects that I need to > ensure good site performance. > > The ergonomics look kinda cool, but I'm wondering if that large of > tenured generation + my low pause requirement is just too much to ask > from the throughput collector. Am I destined to go back to hand > tweaking the CMS collector? > > Justin > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20100712/c3566fa6/attachment.html From y.s.ramakrishna at oracle.com Mon Jul 12 12:35:00 2010 From: y.s.ramakrishna at oracle.com (Y. S. Ramakrishna) Date: Mon, 12 Jul 2010 12:35:00 -0700 Subject: Any pointers for tuning 1.5.0_22 for a webapp with a large tenured gen? In-Reply-To: <4C3B415B.5020907@oracle.com> References: <4C3B415B.5020907@oracle.com> Message-ID: <4C3B6E64.4050300@oracle.com> ... >> I had the 1.4.2 tuned to the extreme: >> JAVA_ARGS=-Xms2816m -Xmx2816m -XX:NewSize=384m -XX:MaxNewSize=384m -XX:CompileThreshold=3000 -Djava.net.setSoTimeout=20000 \ >> -XX:LargePageSizeInBytes=4m -XX:+UseMPSS -Xss128k -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled \ >> >> -Xnoclassgc -XX:ParallelGCThreads=4 -XX:MaxTenuringThreshold=8 -XX:SurvivorRatio=6 -XX:+UseCMSCompactAtFullCollection -Xloggc:gc.out \ >> -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -XX:MaxPermSize=92m >> With our upgrade, the ergonomics of the application changed some, so I >> just specified Xmx and Xms at 3GB, and let it run to see what >> happened. Remarkably, things were actually pretty good - minor GC's >> are faster, and major GC's only occur about once or twice an hour. >> However, those major GC's are taking from 12-20seconds, which I can't >> let our website users endure. I understand you were getting the parallel throughput collector. Try -XX:+UseParallelOldGC to do the mahor gc's multi-threaded and may be that will suffice to get you within your GC pause threshold. >> >> Before I go down the road of tuning things, does anyone have any tips >> for me? I can afford at most 2 or 3 seconds of pause time at once, >> and would prefer to keep it under 2 seconds if possible. I need the >> 2GB of tenured to be able to cache all the objects that I need to >> ensure good site performance. >> >> The ergonomics look kinda cool, but I'm wondering if that large of >> tenured generation + my low pause requirement is just too much to ask >> from the throughput collector. Am I destined to go back to hand >> tweaking the CMS collector? If +UseParallelOldGC does not do the trick, you might have to do hand-tuning using either the throughput collector, or use CMS, and tune it -- some amount of hand-tuning of young gen size, survivor sizes and max tenuring thresholds is almost always necessary for CMS. Look at the JavaOne talk last year by Charlie Hunt and Tony Printezis for more tips (and/or attend the one they will be doing, i think, at the next JavaOne ;-). -- ramki From y.s.ramakrishna at oracle.com Mon Jul 12 12:43:47 2010 From: y.s.ramakrishna at oracle.com (Y. S. Ramakrishna) Date: Mon, 12 Jul 2010 12:43:47 -0700 Subject: G1GC Full GCs In-Reply-To: References: Message-ID: <4C3B7073.2020906@oracle.com> Hi Peter -- Yes, my guess was also that something (possibly along the lines you stated below) was preventing the selection of certain (sets of) regions for evacuation on a regular basis ... I am told there are flags that will allow you to get verbose details on what is or is not selected for inclusion in the collection set; perhaps that will help you get down to the bottom of this. Did you say you had a test case that showed this behaviour? Filing a bug with that test case may be the quickest way to get this before the right set of eyes. Over to the G1 cognoscenti. -- ramki On 07/12/10 09:02, Peter Schuller wrote: >> Am I missing some tuning that should be done for G1GC for applications like >> this? Is 20ms out of 80ms too aggressive a target for the garbage rates >> we're generating? > > I have never run HBase, but in an LRU stress test (I posted about it a > few months ago) I specifically observed remembered set scanning costs > go way up. In addition I was seeing fallbacks to full GC:s recently in > a slightly different test that I also posed about to -use, and that > turned out to be a result of the estimated rset scanning costs being > so high that regions were never selected for eviction even though they > had very little live data. I would be very interested to hear if > you're having the same problem. My last post on the topic is here: > > http://mail.openjdk.java.net/pipermail/hotspot-gc-use/2010-June/000652.html > > Including the link to the (throw-away) patch that should tell you > whether this is what's happening: > > http://distfiles.scode.org/mlref/g1/g1_region_live_stats_hack.patch > > Out of personal curiosity I'd be very interested to hear whether this > is what's happening to you (in a real reasonable use-case rather than > a synthetic benchmark). > > My sense (and hotspot/g1 developers please smack me around if I am > misrepresenting anything here) is that the effect I saw (with rset > scanning costs) could cause perpetual memory grow (until fallback to > full GC) in two ways: > > (1) The estimated (and possibly real) cost of rset scanning for a > single region could be so high that it is never possible to select it > for eviction given the asked for pause time goals. Hence, such a > region effectively "leaks" until full GC. > > (2) The estimated (and possibly real) cost of rset scanning for > regions may be so high that there are, in practice, always other > regions selected for high pay-off/cost ratios, such that they end up > never being collected even if theoretically a single region could be > evicted within the pause time goal. > > These are effectively the same thing, with (1) being an extreme case of (2). > > In both cases, the effect should be mitigated (and have been in the > case where I did my testing), but as far as I can tell not generally > "fixed", by increasing the pause time goals. > > It is unclear to me how this is intended to be handled. The original > g1 paper mentions an rset scanning thread that I may suspect would be > intended to help do rset scanning in the background such that regions > like these could be evicted more cheaply during the STW eviction > pause; but I didn't find such a thread anywhere in the source code - > but I may very well just be missing it. > From tony.printezis at oracle.com Tue Jul 13 11:43:53 2010 From: tony.printezis at oracle.com (Tony Printezis) Date: Tue, 13 Jul 2010 14:43:53 -0400 Subject: G1GC Full GCs In-Reply-To: References: Message-ID: <4C3CB3E9.4040305@oracle.com> Peter and Todd, Any chance of setting +PrintHeapAtGC and -XX:+PrintHeapAtGCExtended and sending us the log, or part of it (say between two Full GCs)? Be prepared: this will generate piles of output. But it will give us per-region information that might shed more light on the cause of the issue.... thanks, Tony, HS GC Group Peter Schuller wrote: >> Am I missing some tuning that should be done for G1GC for applications like >> this? Is 20ms out of 80ms too aggressive a target for the garbage rates >> we're generating? >> > > I have never run HBase, but in an LRU stress test (I posted about it a > few months ago) I specifically observed remembered set scanning costs > go way up. In addition I was seeing fallbacks to full GC:s recently in > a slightly different test that I also posed about to -use, and that > turned out to be a result of the estimated rset scanning costs being > so high that regions were never selected for eviction even though they > had very little live data. I would be very interested to hear if > you're having the same problem. My last post on the topic is here: > > http://mail.openjdk.java.net/pipermail/hotspot-gc-use/2010-June/000652.html > > Including the link to the (throw-away) patch that should tell you > whether this is what's happening: > > http://distfiles.scode.org/mlref/g1/g1_region_live_stats_hack.patch > > Out of personal curiosity I'd be very interested to hear whether this > is what's happening to you (in a real reasonable use-case rather than > a synthetic benchmark). > > My sense (and hotspot/g1 developers please smack me around if I am > misrepresenting anything here) is that the effect I saw (with rset > scanning costs) could cause perpetual memory grow (until fallback to > full GC) in two ways: > > (1) The estimated (and possibly real) cost of rset scanning for a > single region could be so high that it is never possible to select it > for eviction given the asked for pause time goals. Hence, such a > region effectively "leaks" until full GC. > > (2) The estimated (and possibly real) cost of rset scanning for > regions may be so high that there are, in practice, always other > regions selected for high pay-off/cost ratios, such that they end up > never being collected even if theoretically a single region could be > evicted within the pause time goal. > > These are effectively the same thing, with (1) being an extreme case of (2). > > In both cases, the effect should be mitigated (and have been in the > case where I did my testing), but as far as I can tell not generally > "fixed", by increasing the pause time goals. > > It is unclear to me how this is intended to be handled. The original > g1 paper mentions an rset scanning thread that I may suspect would be > intended to help do rset scanning in the background such that regions > like these could be evicted more cheaply during the STW eviction > pause; but I didn't find such a thread anywhere in the source code - > but I may very well just be missing it. > > From peter.schuller at infidyne.com Tue Jul 13 17:15:43 2010 From: peter.schuller at infidyne.com (Peter Schuller) Date: Wed, 14 Jul 2010 02:15:43 +0200 Subject: G1GC Full GCs In-Reply-To: <4C3CB3E9.4040305@oracle.com> References: <4C3CB3E9.4040305@oracle.com> Message-ID: Ramki/Tony, > Any chance of setting +PrintHeapAtGC and -XX:+PrintHeapAtGCExtended and > sending us the log, or part of it (say between two Full GCs)? Be prepared: > this will generate piles of output. But it will give us per-region > information that might shed more light on the cause of the issue.... thanks, So what I have in terms of data is (see footnotes for urls references in []): (a) A patch[1] that prints some additional information about estimated costs of region eviction, and disables the GC efficiency check that normally terminates selection of regions. (Note: This is a throw-away patch for debugging; it's not intended as a suggested change for inclusion.) (b) A log[2] showing the output of a test run I did just now, with both your flags above and my patch enabled (but without disabling the efficiency check). It shows fallback to full GC when the actual live set size is 252 MB, and the maximum heap size is 2 GB (in other words, ~ 12% liveness). An easy way to find the point of full gc is to search for the string 'full 1'. (c) A file[3] with the effective VM options during the test. (d) Instructions for how to run the test to reproduce it (I'll get to that at the end; it's simplified relative to previously). (e) Nature of the test. Discussion: WIth respect to region information: I originally tried it in response to your recommendation earlier, but I found I did not see the information I was after. Perhaps I was just misreading it, but I mostly just saw either 0% or 100% fullness, and never the actual liveness estimate as produced by the mark phase. In the log I am referring to in this E-Mail, you can see that the last printout of region information just before the live GC fits this pattern; I just don't see anything that looks like legitimate liveness information being printed. (I don't have time to dig back into it right now to double-check what it's printing.) If you scroll up from the point of the full gc until you find a bunch of output starting with "predict_region_elapsed_time_ms" you see some output resulting from the patch, with pretty extreme values such as: predict_region_elapsed_time_ms: 34.378642ms total, 34.021154ms rs scan (46542 cnum), 0.040069 copy time (20704 bytes), 0.317419 other time predict_region_elapsed_time_ms: 45.020866ms total, 44.653222ms rs scan (61087 cnum), 0.050225 copy time (25952 bytes), 0.317419 other time predict_region_elapsed_time_ms: 16.250033ms total, 15.887065ms rs scan (21734 cnum), 0.045550 copy time (23536 bytes), 0.317419 other time predict_region_elapsed_time_ms: 226.942877ms total, 226.559163ms rs scan (309940 cnum), 0.066296 copy time (34256 bytes), 0.317419 other time predict_region_elapsed_time_ms: 542.344828ms total, 541.954750ms rs scan (741411 cnum), 0.072659 copy time (37544 bytes), 0.317419 other time predict_region_elapsed_time_ms: 668.595597ms total, 668.220877ms rs scan (914147 cnum), 0.057301 copy time (29608 bytes), 0.317419 other time So in the most extreme case in the excerpt above, that's > half a second of estimate rset scanning time for a single region with 914147 cards to be scanned. While not all are that extreme, lots and lots of regions are very expensive and almost only due to rset scanning costs. If you scroll down a bit to the first (and ONLY) partial that happened after the statistics accumulating from the marking phase, we see more output resulting form the patch. At the end, we see: (picked region; 0.345890ms predicted; 1.317244ms remaining; 15kb marked; 15kb maxlive; 1-1% liveness) (393380 KB left in heap.) (picked region; 0.345963ms predicted; 0.971354ms remaining; 15kb marked; 15kb maxlive; 1-1% liveness) (393365 KB left in heap.) (picked region; 0.346036ms predicted; 0.625391ms remaining; 15kb marked; 15kb maxlive; 1-1% liveness) (393349 KB left in heap.) (no more marked regions; next region too expensive (adaptive; predicted 0.346036ms > remaining 0.279355ms)) So in other words, it picked a bunch of regions in order of "lowest hanging fruit". The *least* low hanging fruit picked still had liveness at 1%; in other words, there's plenty of further regions that ideally should be collected because they contain almost no garbage (ignoring the cost of collecting them). In this case, it stopped picking regions because the next region to be picked, though cheap, was the straw that broke the camel's neck and we simply exceeded the alloted time for this particular GC. However, after this partial completes, it reverts back to doing just young gc:s. In other words, even though there's *plenty* of regions with very low liveness, further partials aren't happening. By applying this part of the patch: - (adaptive_young_list_length() && + (adaptive_young_list_length() && false && // scodetodo I artificially force g1 to not fall back to doing young gc:s for efficiency reasons. When I run with that change, I don't experience the slow perpetual growth until fallback to full GC. If I remember correctly though, the rset scanning cost is in fact high, but I don't have details saved and I'm afraid I don't have time to re-run those tests right now and compare numbers. Reproducing it: I made some changes and the test case should now hopefully be easy to run assuming you have maven installed. The github project is at: http://github.com/scode/httpgctest There is a README, but the shortest possible instructions to re-produce the test that I did: git clone git://github.com/scode/httpgctest.git cd httpgctest.git git checkout 20100714_1 # grab from appropriate tag, in case I change master mvn package HTTPGCTEST_LOGGC=gc.log ./run.sh That should start the http server; then run concurrently: while [ 1 ] ; do curl 'http://localhost:9191/dropdata?ratio=0.10' ; curl 'http://localhost:9191/gendata?amount=25000' ; sleep 0.1 ; done And then just wait and observe. Nature of the test: So the test if run as above will essentially reach a steady state of equilibrium with about 25000 pieces of data in a clojure immutable map. The result is that a significant amount of new data is being allocated, but very little writing to old regions is happening. The garbage generated is very well spread out over the entire heap because it goes through all objects and drops 10% (the ratio=0.10) for each iteration, after which it adds 25000 new items. In other words; not a lot of old gen writing, but lots of writes to the young gen referencing objects in the old gen. [1] http://distfiles.scode.org/mlref/g1/g1_region_live_stats_hack.patch [2] http://distfiles.scode.org/mlref/gctest/httpgctest-g1-fullgc-20100714/gc-fullfallback.log [3] http://distfiles.scode.org/mlref/gctest/httpgctest-g1-fullgc-20100714/vmoptions.txt -- / Peter Schuller From eagle.kessler at gmail.com Wed Jul 21 10:09:25 2010 From: eagle.kessler at gmail.com (Eagle Kessler) Date: Wed, 21 Jul 2010 10:09:25 -0700 Subject: Unreasonably long ParNew, CMS pause times Message-ID: Hello all, I've got a web service that I'm seeing some very strange behavior on around it's garbage collection, and was wondering if someone here could explain why it might be happening. The service itself is fairly simple: Take in data from a few sources and merge them with the existing data in the database. It stores nearly no state while doing this, and indeed heap dumps taken 1, 24, and 72 hours after boot indicate that we have a consistent ~12mb of live data (in a 2GB heap, but I hope that's not what is causing this). The GC logs, though, don't look very happy at all. After our start up period, they settle into a fairly consistent pattern: 1041.159: [GC 1041.159: [ParNew Desired survivor size 2064384 bytes, new threshold 0 (max 0) : 516224K->0K(520256K), 0.0170322 secs] 537266K->22659K(2093120K), 0.0171330 secs] [Times: user=0.04 sys=0.01, real=0.02 secs] 1606.500: [GC 1606.500: [ParNew Desired survivor size 2064384 bytes, new threshold 0 (max 0) : 516224K->0K(520256K), 0.0173235 secs] 538883K->24214K(2093120K), 0.0174138 secs] [Times: user=0.04 sys=0.03, real=0.02 secs] 2040.773: [GC 2040.773: [ParNew Desired survivor size 2064384 bytes, new threshold 0 (max 0) : 516224K->0K(520256K), 0.0196319 secs] 540438K->25737K(2093120K), 0.0197275 secs] [Times: user=0.05 sys=0.02, real=0.02 secs] Which we would be very nice if it kept going like that. However, by the first time the CMS collector runs, things aren't working nearly as well: 214182.439: [GC 214182.439: [ParNew Desired survivor size 2064384 bytes, new threshold 0 (max 0) : 516224K->0K(520256K), 1.0146996 secs] 1297278K->782575K(2093120K), 1.0148799 secs] [Times: user=1.21 sys=0.58, real=1.01 secs] 214247.437: [GC 214247.438: [ParNew Desired survivor size 2064384 bytes, new threshold 0 (max 0) : 516224K->0K(520256K), 1.2310274 secs] 1298799K->784188K(2093120K), 1.2311534 secs] [Times: user=1.46 sys=0.69, real=1.23 secs] 214313.642: [GC 214313.642: [ParNew Desired survivor size 2064384 bytes, new threshold 0 (max 0) : 516224K->0K(520256K), 1.2183258 secs] 1300412K->785710K(2093120K), 1.2184848 secs] [Times: user=1.45 sys=0.65, real=1.22 secs] The increasing sys time is a bit worrying, but it seems like the actual GC time is rising as well, even though we aren't collecting any more young-gen garbage. At this point, CMS went off 214380.695: [GC [1 CMS-initial-mark: 787188K(1572864K)] 787195K(2093120K), 1.8929842 secs] [Times: user=1.50 sys=0.02, real=1.89 secs] 214382.589: [CMS-concurrent-mark-start] 214383.056: [CMS-concurrent-mark: 0.467/0.467 secs] [Times: user=1.81 sys=0.01, real=0.47 secs] 214383.056: [CMS-concurrent-preclean-start] 214383.064: [CMS-concurrent-preclean: 0.008/0.008 secs] [Times: user=0.01 sys=0.00, real=0.01 secs] 214383.064: [CMS-concurrent-abortable-preclean-start] CMS: abort preclean due to time 214388.133: [CMS-concurrent-abortable-preclean: 0.242/5.069 secs] [Times: user=5.02 sys=0.02, real=5.07 secs] 214388.159: [GC[YG occupancy: 51943 K (520256 K)]214388.159: [Rescan (parallel) , 1.5403455 secs]214389.699: [weak refs processing, 0.0050170 secs] [1 CMS-remark: 787188K(1572864K)] 839132K(2093120K), 1.5458536 secs] [Times: user=1.80 sys=0.71, real=1.55 secs] 214389.705: [CMS-concurrent-sweep-start] 214390.439: [CMS-concurrent-sweep: 0.671/0.734 secs] [Times: user=1.35 sys=0.00, real=0.73 secs] 214390.439: [CMS-concurrent-reset-start] 214390.621: [CMS-concurrent-reset: 0.183/0.183 secs] [Times: user=0.20 sys=0.02, real=0.18 secs] It seems like a) initial-mark shouldn't take 1.8 seconds, b) if we really do only have 12mb of live data, CMS should have collected a lot more than it did (the next ParNew collection reported ~545MB of old gen in use), and c) 50% heap usage with very little promotion seems very early for the collector to go off. The next CMS cycle is at 434,973 seconds, by which point the young gen collections are taking 3 seconds (user 3.59, sys 1.60, real 3.09). The initial mark takes 4.82 seconds (user 3.82, sys 0.02, real 4.82), and sweeps down to 1.1gb of used old gen. I haven't yet confirmed it, but given the previous heap dumps I'd guess that they will claim 12mb of live objects and 1.1gb of dead objects. The current young gen collections (at 497,601 seconds) are taking ~3.7 seconds (4.33 user, 2.03 sys) Any idea what could be going on here? We're running JDK 1.6_16. -- Eagle Kessler -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20100721/dbea6214/attachment.html From matt.fowles at gmail.com Wed Jul 21 11:09:02 2010 From: matt.fowles at gmail.com (Matt Fowles) Date: Wed, 21 Jul 2010 14:09:02 -0400 Subject: Unreasonably long ParNew, CMS pause times In-Reply-To: References: Message-ID: Eagle~ What JVM settings are you using? Can you attach a log with -XX:+PrintTenuringDistribution as well? Matt On Wed, Jul 21, 2010 at 1:09 PM, Eagle Kessler wrote: > Hello all, > I've got a web service that I'm seeing some very strange behavior on > around it's garbage collection, and was wondering if someone here could > explain why it might be happening. The service itself is fairly simple: Take > in data from a few sources and merge them with the existing data in the > database. It stores nearly no state while doing this, and indeed heap dumps > taken 1, 24, and 72 hours after boot indicate that we have a consistent > ~12mb of live data (in a 2GB heap, but I hope that's not what is causing > this). > > The GC logs, though, don't look very happy at all. After our start up > period, they settle into a fairly consistent pattern: > > 1041.159: [GC 1041.159: [ParNew > Desired survivor size 2064384 bytes, new threshold 0 (max 0) > > > : 516224K->0K(520256K), 0.0170322 secs] 537266K->22659K(2093120K), 0.0171330 secs] [Times: user=0.04 sys=0.01, real=0.02 secs] > 1606.500: [GC 1606.500: [ParNew > Desired survivor size 2064384 bytes, new threshold 0 (max 0) > > > : 516224K->0K(520256K), 0.0173235 secs] 538883K->24214K(2093120K), 0.0174138 secs] [Times: user=0.04 sys=0.03, real=0.02 secs] > 2040.773: [GC 2040.773: [ParNew > Desired survivor size 2064384 bytes, new threshold 0 (max 0) > > > : 516224K->0K(520256K), 0.0196319 secs] 540438K->25737K(2093120K), 0.0197275 secs] [Times: user=0.05 sys=0.02, real=0.02 secs] > > Which we would be very nice if it kept going like that. However, by the > first time the CMS collector runs, things aren't working nearly as well: > > 214182.439: [GC 214182.439: [ParNew > Desired survivor size 2064384 bytes, new threshold 0 (max 0) > : 516224K->0K(520256K), 1.0146996 secs] 1297278K->782575K(2093120K), 1.0148799 secs] [Times: user=1.21 sys=0.58, real=1.01 secs] > > > 214247.437: [GC 214247.438: [ParNew > Desired survivor size 2064384 bytes, new threshold 0 (max 0) > : 516224K->0K(520256K), 1.2310274 secs] 1298799K->784188K(2093120K), 1.2311534 secs] [Times: user=1.46 sys=0.69, real=1.23 secs] > > > 214313.642: [GC 214313.642: [ParNew > Desired survivor size 2064384 bytes, new threshold 0 (max 0) > : 516224K->0K(520256K), 1.2183258 secs] 1300412K->785710K(2093120K), 1.2184848 secs] [Times: user=1.45 sys=0.65, real=1.22 secs] > > The increasing sys time is a bit worrying, but it seems like the actual GC > time is rising as well, even though we aren't collecting any more young-gen > garbage. At this point, CMS went off > > 214380.695: [GC [1 CMS-initial-mark: 787188K(1572864K)] 787195K(2093120K), 1.8929842 secs] [Times: user=1.50 sys=0.02, real=1.89 secs] > > > 214382.589: [CMS-concurrent-mark-start] > 214383.056: [CMS-concurrent-mark: 0.467/0.467 secs] [Times: user=1.81 sys=0.01, real=0.47 secs] > 214383.056: [CMS-concurrent-preclean-start] > 214383.064: [CMS-concurrent-preclean: 0.008/0.008 secs] [Times: user=0.01 sys=0.00, real=0.01 secs] > > > 214383.064: [CMS-concurrent-abortable-preclean-start] > CMS: abort preclean due to time 214388.133: [CMS-concurrent-abortable-preclean: 0.242/5.069 secs] [Times: user=5.02 sys=0.02, real=5.07 secs] > 214388.159: [GC[YG occupancy: 51943 K (520256 K)]214388.159: [Rescan (parallel) , 1.5403455 secs]214389.699: [weak refs processing, 0.0050170 secs] [1 CMS-remark: 787188K(1572864K)] 839132K(2093120K), 1.5458536 secs] [Times: user=1.80 sys=0.71, real=1.55 secs] > > > 214389.705: [CMS-concurrent-sweep-start] > 214390.439: [CMS-concurrent-sweep: 0.671/0.734 secs] [Times: user=1.35 sys=0.00, real=0.73 secs] > 214390.439: [CMS-concurrent-reset-start] > 214390.621: [CMS-concurrent-reset: 0.183/0.183 secs] [Times: user=0.20 sys=0.02, real=0.18 secs] > > It seems like a) initial-mark shouldn't take 1.8 seconds, b) if we really > do only have 12mb of live data, CMS should have collected a lot more than it > did (the next ParNew collection reported ~545MB of old gen in use), and c) > 50% heap usage with very little promotion seems very early for the collector > to go off. > > The next CMS cycle is at 434,973 seconds, by which point the young gen > collections are taking 3 seconds (user 3.59, sys 1.60, real 3.09). The > initial mark takes 4.82 seconds (user 3.82, sys 0.02, real 4.82), and sweeps > down to 1.1gb of used old gen. I haven't yet confirmed it, but given the > previous heap dumps I'd guess that they will claim 12mb of live objects and > 1.1gb of dead objects. The current young gen collections (at 497,601 > seconds) are taking ~3.7 seconds (4.33 user, 2.03 sys) Any idea what could > be going on here? We're running JDK 1.6_16. > > -- > Eagle Kessler > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20100721/4018e577/attachment.html From y.s.ramakrishna at oracle.com Wed Jul 21 12:34:46 2010 From: y.s.ramakrishna at oracle.com (Y. S. Ramakrishna) Date: Wed, 21 Jul 2010 12:34:46 -0700 Subject: Unreasonably long ParNew, CMS pause times In-Reply-To: References: Message-ID: <4C474BD6.1060908@oracle.com> Yes, like what i think Matt is getting at, i'd configure sufficiently large survivor spaces. Even if you expect most of your objects to die young, you'd want survivor spaces large enough to keep at least age 1 objects in the survivor space. If as you state no medium- to ling-lived state is retained, your data is mostly short-lived and you'll be able to do without any promotion at all. Your problem here is that somehow your survivor spaces may have disappeared. (+PrintHeapAtGC will tell you, and of course examining yr JVM options should shed more light on that apparent disappearance.) -- ramki On 07/21/10 10:09, Eagle Kessler wrote: > Hello all, > I've got a web service that I'm seeing some very strange behavior on > around it's garbage collection, and was wondering if someone here could > explain why it might be happening. The service itself is fairly simple: > Take in data from a few sources and merge them with the existing data in > the database. It stores nearly no state while doing this, and indeed > heap dumps taken 1, 24, and 72 hours after boot indicate that we have a > consistent ~12mb of live data (in a 2GB heap, but I hope that's not what > is causing this). > > The GC logs, though, don't look very happy at all. After our start up > period, they settle into a fairly consistent pattern: > > 1041.159: [GC 1041.159: [ParNew > Desired survivor size 2064384 bytes, new threshold 0 (max 0) > > : 516224K->0K(520256K), 0.0170322 secs] 537266K->22659K(2093120K), 0.0171330 secs] [Times: user=0.04 sys=0.01, real=0.02 secs] > 1606.500: [GC 1606.500: [ParNew > Desired survivor size 2064384 bytes, new threshold 0 (max 0) > > : 516224K->0K(520256K), 0.0173235 secs] 538883K->24214K(2093120K), 0.0174138 secs] [Times: user=0.04 sys=0.03, real=0.02 secs] > 2040.773: [GC 2040.773: [ParNew > Desired survivor size 2064384 bytes, new threshold 0 (max 0) > > : 516224K->0K(520256K), 0.0196319 secs] 540438K->25737K(2093120K), 0.0197275 secs] [Times: user=0.05 sys=0.02, real=0.02 secs] > > Which we would be very nice if it kept going like that. However, by the > first time the CMS collector runs, things aren't working nearly as well: > > 214182.439: [GC 214182.439: [ParNew > Desired survivor size 2064384 bytes, new threshold 0 (max 0) > : 516224K->0K(520256K), 1.0146996 secs] 1297278K->782575K(2093120K), 1.0148799 secs] [Times: user=1.21 sys=0.58, real=1.01 secs] > > 214247.437: [GC 214247.438: [ParNew > Desired survivor size 2064384 bytes, new threshold 0 (max 0) > : 516224K->0K(520256K), 1.2310274 secs] 1298799K->784188K(2093120K), 1.2311534 secs] [Times: user=1.46 sys=0.69, real=1.23 secs] > > 214313.642: [GC 214313.642: [ParNew > Desired survivor size 2064384 bytes, new threshold 0 (max 0) > : 516224K->0K(520256K), 1.2183258 secs] 1300412K->785710K(2093120K), 1.2184848 secs] [Times: user=1.45 sys=0.65, real=1.22 secs] > > The increasing sys time is a bit worrying, but it seems like the actual > GC time is rising as well, even though we aren't collecting any more > young-gen garbage. At this point, CMS went off > > 214380.695: [GC [1 CMS-initial-mark: 787188K(1572864K)] 787195K(2093120K), 1.8929842 secs] [Times: user=1.50 sys=0.02, real=1.89 secs] > > 214382.589: [CMS-concurrent-mark-start] > 214383.056: [CMS-concurrent-mark: 0.467/0.467 secs] [Times: user=1.81 sys=0.01, real=0.47 secs] > 214383.056: [CMS-concurrent-preclean-start] > 214383.064: [CMS-concurrent-preclean: 0.008/0.008 secs] [Times: user=0.01 sys=0.00, real=0.01 secs] > > 214383.064: [CMS-concurrent-abortable-preclean-start] > CMS: abort preclean due to time 214388.133: [CMS-concurrent-abortable-preclean: 0.242/5.069 secs] [Times: user=5.02 sys=0.02, real=5.07 secs] > 214388.159: [GC[YG occupancy: 51943 K (520256 K)]214388.159: [Rescan (parallel) , 1.5403455 secs]214389.699: [weak refs processing, 0.0050170 secs] [1 CMS-remark: 787188K(1572864K)] 839132K(2093120K), 1.5458536 secs] [Times: user=1.80 sys=0.71, real=1.55 secs] > > 214389.705: [CMS-concurrent-sweep-start] > 214390.439: [CMS-concurrent-sweep: 0.671/0.734 secs] [Times: user=1.35 sys=0.00, real=0.73 secs] > 214390.439: [CMS-concurrent-reset-start] > 214390.621: [CMS-concurrent-reset: 0.183/0.183 secs] [Times: user=0.20 sys=0.02, real=0.18 secs] > > It seems like a) initial-mark shouldn't take 1.8 seconds, b) if we > really do only have 12mb of live data, CMS should have collected a lot > more than it did (the next ParNew collection reported ~545MB of old gen > in use), and c) 50% heap usage with very little promotion seems very > early for the collector to go off. > > The next CMS cycle is at 434,973 seconds, by which point the young gen > collections are taking 3 seconds (user 3.59, sys 1.60, real 3.09). The > initial mark takes 4.82 seconds (user 3.82, sys 0.02, real 4.82), and > sweeps down to 1.1gb of used old gen. I haven't yet confirmed it, but > given the previous heap dumps I'd guess that they will claim 12mb of > live objects and 1.1gb of dead objects. The current young gen > collections (at 497,601 seconds) are taking ~3.7 seconds (4.33 user, > 2.03 sys) Any idea what could be going on here? We're running JDK 1.6_16. > > -- > Eagle Kessler > > > ------------------------------------------------------------------------ > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use From y.s.ramakrishna at oracle.com Wed Jul 21 12:40:28 2010 From: y.s.ramakrishna at oracle.com (Y. S. Ramakrishna) Date: Wed, 21 Jul 2010 12:40:28 -0700 Subject: Unreasonably long ParNew, CMS pause times In-Reply-To: <4C474BD6.1060908@oracle.com> References: <4C474BD6.1060908@oracle.com> Message-ID: <4C474D2C.9070103@oracle.com> oh, and the "(max 0)" seems to imply you have somehow ended up asking for MaxTenuringThreshold=0 which can be disastrous. It is possible that the default settings for CMS on certain very old JVM's did indeed result in MTT=0 (but memory is fading of that era), so if using such older vintage of JVM's, explicitly setting MTT and SurvivorRatio higher is a good idea. -- ramki On 07/21/10 12:34, Y. S. Ramakrishna wrote: > Yes, like what i think Matt is getting at, i'd configure sufficiently > large survivor spaces. Even if you expect most of your objects to die young, > you'd want survivor spaces large enough to keep at least age 1 objects in > the survivor space. If as you state no medium- to ling-lived state is > retained, your data is mostly short-lived and you'll be able to do without > any promotion at all. Your problem here is that somehow your survivor > spaces may have disappeared. (+PrintHeapAtGC will tell you, and > of course examining yr JVM options should shed more light on that apparent > disappearance.) > > -- ramki > > On 07/21/10 10:09, Eagle Kessler wrote: >> Hello all, >> I've got a web service that I'm seeing some very strange behavior on >> around it's garbage collection, and was wondering if someone here could >> explain why it might be happening. The service itself is fairly simple: >> Take in data from a few sources and merge them with the existing data in >> the database. It stores nearly no state while doing this, and indeed >> heap dumps taken 1, 24, and 72 hours after boot indicate that we have a >> consistent ~12mb of live data (in a 2GB heap, but I hope that's not what >> is causing this). >> >> The GC logs, though, don't look very happy at all. After our start up >> period, they settle into a fairly consistent pattern: >> >> 1041.159: [GC 1041.159: [ParNew >> Desired survivor size 2064384 bytes, new threshold 0 (max 0) >> >> : 516224K->0K(520256K), 0.0170322 secs] 537266K->22659K(2093120K), 0.0171330 secs] [Times: user=0.04 sys=0.01, real=0.02 secs] >> 1606.500: [GC 1606.500: [ParNew >> Desired survivor size 2064384 bytes, new threshold 0 (max 0) >> >> : 516224K->0K(520256K), 0.0173235 secs] 538883K->24214K(2093120K), 0.0174138 secs] [Times: user=0.04 sys=0.03, real=0.02 secs] >> 2040.773: [GC 2040.773: [ParNew >> Desired survivor size 2064384 bytes, new threshold 0 (max 0) >> >> : 516224K->0K(520256K), 0.0196319 secs] 540438K->25737K(2093120K), 0.0197275 secs] [Times: user=0.05 sys=0.02, real=0.02 secs] >> >> Which we would be very nice if it kept going like that. However, by the >> first time the CMS collector runs, things aren't working nearly as well: >> >> 214182.439: [GC 214182.439: [ParNew >> Desired survivor size 2064384 bytes, new threshold 0 (max 0) >> : 516224K->0K(520256K), 1.0146996 secs] 1297278K->782575K(2093120K), 1.0148799 secs] [Times: user=1.21 sys=0.58, real=1.01 secs] >> >> 214247.437: [GC 214247.438: [ParNew >> Desired survivor size 2064384 bytes, new threshold 0 (max 0) >> : 516224K->0K(520256K), 1.2310274 secs] 1298799K->784188K(2093120K), 1.2311534 secs] [Times: user=1.46 sys=0.69, real=1.23 secs] >> >> 214313.642: [GC 214313.642: [ParNew >> Desired survivor size 2064384 bytes, new threshold 0 (max 0) >> : 516224K->0K(520256K), 1.2183258 secs] 1300412K->785710K(2093120K), 1.2184848 secs] [Times: user=1.45 sys=0.65, real=1.22 secs] >> >> The increasing sys time is a bit worrying, but it seems like the actual >> GC time is rising as well, even though we aren't collecting any more >> young-gen garbage. At this point, CMS went off >> >> 214380.695: [GC [1 CMS-initial-mark: 787188K(1572864K)] 787195K(2093120K), 1.8929842 secs] [Times: user=1.50 sys=0.02, real=1.89 secs] >> >> 214382.589: [CMS-concurrent-mark-start] >> 214383.056: [CMS-concurrent-mark: 0.467/0.467 secs] [Times: user=1.81 sys=0.01, real=0.47 secs] >> 214383.056: [CMS-concurrent-preclean-start] >> 214383.064: [CMS-concurrent-preclean: 0.008/0.008 secs] [Times: user=0.01 sys=0.00, real=0.01 secs] >> >> 214383.064: [CMS-concurrent-abortable-preclean-start] >> CMS: abort preclean due to time 214388.133: [CMS-concurrent-abortable-preclean: 0.242/5.069 secs] [Times: user=5.02 sys=0.02, real=5.07 secs] >> 214388.159: [GC[YG occupancy: 51943 K (520256 K)]214388.159: [Rescan (parallel) , 1.5403455 secs]214389.699: [weak refs processing, 0.0050170 secs] [1 CMS-remark: 787188K(1572864K)] 839132K(2093120K), 1.5458536 secs] [Times: user=1.80 sys=0.71, real=1.55 secs] >> >> 214389.705: [CMS-concurrent-sweep-start] >> 214390.439: [CMS-concurrent-sweep: 0.671/0.734 secs] [Times: user=1.35 sys=0.00, real=0.73 secs] >> 214390.439: [CMS-concurrent-reset-start] >> 214390.621: [CMS-concurrent-reset: 0.183/0.183 secs] [Times: user=0.20 sys=0.02, real=0.18 secs] >> >> It seems like a) initial-mark shouldn't take 1.8 seconds, b) if we >> really do only have 12mb of live data, CMS should have collected a lot >> more than it did (the next ParNew collection reported ~545MB of old gen >> in use), and c) 50% heap usage with very little promotion seems very >> early for the collector to go off. >> >> The next CMS cycle is at 434,973 seconds, by which point the young gen >> collections are taking 3 seconds (user 3.59, sys 1.60, real 3.09). The >> initial mark takes 4.82 seconds (user 3.82, sys 0.02, real 4.82), and >> sweeps down to 1.1gb of used old gen. I haven't yet confirmed it, but >> given the previous heap dumps I'd guess that they will claim 12mb of >> live objects and 1.1gb of dead objects. The current young gen >> collections (at 497,601 seconds) are taking ~3.7 seconds (4.33 user, >> 2.03 sys) Any idea what could be going on here? We're running JDK 1.6_16. >> >> -- >> Eagle Kessler >> >> >> ------------------------------------------------------------------------ >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use From eagle.kessler at gmail.com Wed Jul 21 13:41:18 2010 From: eagle.kessler at gmail.com (Eagle Kessler) Date: Wed, 21 Jul 2010 13:41:18 -0700 Subject: Unreasonably long ParNew, CMS pause times In-Reply-To: <4C474D2C.9070103@oracle.com> References: <4C474BD6.1060908@oracle.com> <4C474D2C.9070103@oracle.com> Message-ID: Checking the configs, it looks like we are explicitly setting MTT to 0: # Min, max, total JVM size (-Xms -Xmx) JVM_SIZE="-server -Xms2g -Xmx2g" # New Generation Sizes (-XX:NewSize -XX:MaxNewSize) JVM_SIZE_NEW="-XX:NewSize=512m -XX:MaxNewSize=512m" # Perm Generation Sizes (-XX:PermSize -XX:MaxPermSize) JVM_SIZE_PERM="-XX:PermSize=128m -XX:MaxPermSize=128m" # Type of Garbage Collector to use JVM_GC_TYPE="-XX:+UseConcMarkSweepGC -XX:+UseParNewGC" JVM_GC_OPTS="-XX:MaxTenuringThreshold=0 -XX:SurvivorRatio=128" I agree that in this case we definitely want to be using survivor spaces (I'm unfortunately not in charge of the default GC settings yet). However, I didn't know that running without survivor spaces could cause this kind of behavior. Why does running without survivor spaces cause such a large performance issue? Regardless, I'll ask that -XX:+PrintTenuringDistribution be added to the configs, along with a non-zero MTT, and see if the issue persists. The rising ParNew times seem like they would be unrelated to the tenuring threshold, though, wouldn't they? On Wed, Jul 21, 2010 at 12:40 PM, Y. S. Ramakrishna < y.s.ramakrishna at oracle.com> wrote: > oh, and the "(max 0)" seems to imply you have somehow ended up asking for > MaxTenuringThreshold=0 which can be disastrous. It is possible that the > default settings for CMS on certain very old JVM's did indeed result in > MTT=0 (but memory is fading of that era), so if using such older vintage > of JVM's, explicitly setting MTT and SurvivorRatio higher is a good idea. > > -- ramki > > > On 07/21/10 12:34, Y. S. Ramakrishna wrote: > >> Yes, like what i think Matt is getting at, i'd configure sufficiently >> large survivor spaces. Even if you expect most of your objects to die >> young, >> you'd want survivor spaces large enough to keep at least age 1 objects in >> the survivor space. If as you state no medium- to ling-lived state is >> retained, your data is mostly short-lived and you'll be able to do without >> any promotion at all. Your problem here is that somehow your survivor >> spaces may have disappeared. (+PrintHeapAtGC will tell you, and >> of course examining yr JVM options should shed more light on that apparent >> disappearance.) >> >> -- ramki >> >> On 07/21/10 10:09, Eagle Kessler wrote: >> >>> Hello all, >>> I've got a web service that I'm seeing some very strange behavior on >>> around it's garbage collection, and was wondering if someone here could >>> explain why it might be happening. The service itself is fairly simple: Take >>> in data from a few sources and merge them with the existing data in the >>> database. It stores nearly no state while doing this, and indeed heap dumps >>> taken 1, 24, and 72 hours after boot indicate that we have a consistent >>> ~12mb of live data (in a 2GB heap, but I hope that's not what is causing >>> this). >>> >>> The GC logs, though, don't look very happy at all. After our start up >>> period, they settle into a fairly consistent pattern: >>> >>> 1041.159: [GC 1041.159: [ParNew >>> Desired survivor size 2064384 bytes, new threshold 0 (max 0) >>> >>> : 516224K->0K(520256K), 0.0170322 secs] 537266K->22659K(2093120K), >>> 0.0171330 secs] [Times: user=0.04 sys=0.01, real=0.02 secs] 1606.500: [GC >>> 1606.500: [ParNew >>> Desired survivor size 2064384 bytes, new threshold 0 (max 0) >>> >>> : 516224K->0K(520256K), 0.0173235 secs] 538883K->24214K(2093120K), >>> 0.0174138 secs] [Times: user=0.04 sys=0.03, real=0.02 secs] 2040.773: [GC >>> 2040.773: [ParNew >>> Desired survivor size 2064384 bytes, new threshold 0 (max 0) >>> >>> : 516224K->0K(520256K), 0.0196319 secs] 540438K->25737K(2093120K), >>> 0.0197275 secs] [Times: user=0.05 sys=0.02, real=0.02 secs] >>> Which we would be very nice if it kept going like that. However, by the >>> first time the CMS collector runs, things aren't working nearly as well: >>> >>> 214182.439: [GC 214182.439: [ParNew >>> Desired survivor size 2064384 bytes, new threshold 0 (max 0) >>> : 516224K->0K(520256K), 1.0146996 secs] 1297278K->782575K(2093120K), >>> 1.0148799 secs] [Times: user=1.21 sys=0.58, real=1.01 secs] >>> 214247.437: [GC 214247.438: [ParNew >>> Desired survivor size 2064384 bytes, new threshold 0 (max 0) >>> : 516224K->0K(520256K), 1.2310274 secs] 1298799K->784188K(2093120K), >>> 1.2311534 secs] [Times: user=1.46 sys=0.69, real=1.23 secs] >>> 214313.642: [GC 214313.642: [ParNew >>> Desired survivor size 2064384 bytes, new threshold 0 (max 0) >>> : 516224K->0K(520256K), 1.2183258 secs] 1300412K->785710K(2093120K), >>> 1.2184848 secs] [Times: user=1.45 sys=0.65, real=1.22 secs] >>> >>> The increasing sys time is a bit worrying, but it seems like the actual >>> GC time is rising as well, even though we aren't collecting any more >>> young-gen garbage. At this point, CMS went off >>> >>> 214380.695: [GC [1 CMS-initial-mark: 787188K(1572864K)] >>> 787195K(2093120K), 1.8929842 secs] [Times: user=1.50 sys=0.02, real=1.89 >>> secs] >>> 214382.589: [CMS-concurrent-mark-start] >>> 214383.056: [CMS-concurrent-mark: 0.467/0.467 secs] [Times: user=1.81 >>> sys=0.01, real=0.47 secs] 214383.056: [CMS-concurrent-preclean-start] >>> 214383.064: [CMS-concurrent-preclean: 0.008/0.008 secs] [Times: user=0.01 >>> sys=0.00, real=0.01 secs] >>> 214383.064: [CMS-concurrent-abortable-preclean-start] >>> CMS: abort preclean due to time 214388.133: >>> [CMS-concurrent-abortable-preclean: 0.242/5.069 secs] [Times: user=5.02 >>> sys=0.02, real=5.07 secs] 214388.159: [GC[YG occupancy: 51943 K (520256 >>> K)]214388.159: [Rescan (parallel) , 1.5403455 secs]214389.699: [weak refs >>> processing, 0.0050170 secs] [1 CMS-remark: 787188K(1572864K)] >>> 839132K(2093120K), 1.5458536 secs] [Times: user=1.80 sys=0.71, real=1.55 >>> secs] >>> 214389.705: [CMS-concurrent-sweep-start] >>> 214390.439: [CMS-concurrent-sweep: 0.671/0.734 secs] [Times: user=1.35 >>> sys=0.00, real=0.73 secs] 214390.439: [CMS-concurrent-reset-start] >>> 214390.621: [CMS-concurrent-reset: 0.183/0.183 secs] [Times: user=0.20 >>> sys=0.02, real=0.18 secs] >>> >>> It seems like a) initial-mark shouldn't take 1.8 seconds, b) if we really >>> do only have 12mb of live data, CMS should have collected a lot more than it >>> did (the next ParNew collection reported ~545MB of old gen in use), and c) >>> 50% heap usage with very little promotion seems very early for the collector >>> to go off. >>> >>> The next CMS cycle is at 434,973 seconds, by which point the young gen >>> collections are taking 3 seconds (user 3.59, sys 1.60, real 3.09). The >>> initial mark takes 4.82 seconds (user 3.82, sys 0.02, real 4.82), and sweeps >>> down to 1.1gb of used old gen. I haven't yet confirmed it, but given the >>> previous heap dumps I'd guess that they will claim 12mb of live objects and >>> 1.1gb of dead objects. The current young gen collections (at 497,601 >>> seconds) are taking ~3.7 seconds (4.33 user, 2.03 sys) Any idea what could >>> be going on here? We're running JDK 1.6_16. >>> >>> -- >>> Eagle Kessler >>> >>> >>> ------------------------------------------------------------------------ >>> >>> _______________________________________________ >>> hotspot-gc-use mailing list >>> hotspot-gc-use at openjdk.java.net >>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> > -- Eagle Kessler -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20100721/99e34714/attachment-0001.html From y.s.ramakrishna at oracle.com Wed Jul 21 14:16:28 2010 From: y.s.ramakrishna at oracle.com (Y. S. Ramakrishna) Date: Wed, 21 Jul 2010 14:16:28 -0700 Subject: Unreasonably long ParNew, CMS pause times In-Reply-To: References: <4C474BD6.1060908@oracle.com> <4C474D2C.9070103@oracle.com> Message-ID: <4C4763AC.4030508@oracle.com> On 07/21/10 13:41, Eagle Kessler wrote: > Checking the configs, it looks like we are explicitly setting MTT to 0: > > # Min, max, total JVM size (-Xms -Xmx) > JVM_SIZE="-server -Xms2g -Xmx2g" > > # New Generation Sizes (-XX:NewSize -XX:MaxNewSize) > > JVM_SIZE_NEW="-XX:NewSize=512m -XX:MaxNewSize=512m" > > # Perm Generation Sizes (-XX:PermSize -XX:MaxPermSize) > JVM_SIZE_PERM="-XX:PermSize=128m -XX:MaxPermSize=128m" > > # Type of Garbage Collector to use > > JVM_GC_TYPE="-XX:+UseConcMarkSweepGC -XX:+UseParNewGC" > > JVM_GC_OPTS="-XX:MaxTenuringThreshold=0 -XX:SurvivorRatio=128" > > I agree that in this case we definitely want to be using survivor spaces > (I'm unfortunately not in charge of the default GC settings yet). > However, I didn't know that running without survivor spaces could cause > this kind of behavior. Why does running without survivor spaces cause > such a large performance issue? See below. > > Regardless, I'll ask that -XX:+PrintTenuringDistribution be added to the > configs, along with a non-zero MTT, and see if the issue persists. The And of course SurvivorRatio a more reasonable value like 6 or 8, depending on expected survival rate etc. > rising ParNew times seem like they would be unrelated to the tenuring > threshold, though, wouldn't they? No, it's related in the sense that MTT=0 was resulting in very very short-lived data to promote into the old gen. This can cause object dempgraphics (size and age dirstribution) to be very non-stationary, and confuse the heuristics for sizing the free list inventory. It of course also places large pressure on the old gen allocator, increases fragmentation, increases mutation rates and so on. The only thing it might not affect much is the initial mark pauses, which will probably stay as they were before. -- ramki > > On Wed, Jul 21, 2010 at 12:40 PM, Y. S. Ramakrishna > > wrote: > > oh, and the "(max 0)" seems to imply you have somehow ended up > asking for > MaxTenuringThreshold=0 which can be disastrous. It is possible that the > default settings for CMS on certain very old JVM's did indeed result in > MTT=0 (but memory is fading of that era), so if using such older vintage > of JVM's, explicitly setting MTT and SurvivorRatio higher is a good > idea. > > -- ramki > > > On 07/21/10 12:34, Y. S. Ramakrishna wrote: > > Yes, like what i think Matt is getting at, i'd configure > sufficiently > large survivor spaces. Even if you expect most of your objects > to die young, > you'd want survivor spaces large enough to keep at least age 1 > objects in > the survivor space. If as you state no medium- to ling-lived > state is > retained, your data is mostly short-lived and you'll be able to > do without > any promotion at all. Your problem here is that somehow your > survivor > spaces may have disappeared. (+PrintHeapAtGC will tell you, and > of course examining yr JVM options should shed more light on > that apparent > disappearance.) > > -- ramki > > On 07/21/10 10:09, Eagle Kessler wrote: > > Hello all, > I've got a web service that I'm seeing some very strange > behavior on around it's garbage collection, and was > wondering if someone here could explain why it might be > happening. The service itself is fairly simple: Take in data > from a few sources and merge them with the existing data in > the database. It stores nearly no state while doing this, > and indeed heap dumps taken 1, 24, and 72 hours after boot > indicate that we have a consistent ~12mb of live data (in a > 2GB heap, but I hope that's not what is causing this). > > The GC logs, though, don't look very happy at all. After our > start up period, they settle into a fairly consistent pattern: > > 1041.159: [GC 1041.159: [ParNew > Desired survivor size 2064384 bytes, new threshold 0 (max 0) > > : 516224K->0K(520256K), 0.0170322 secs] > 537266K->22659K(2093120K), 0.0171330 secs] [Times: user=0.04 > sys=0.01, real=0.02 secs] 1606.500: [GC 1606.500: [ParNew > Desired survivor size 2064384 bytes, new threshold 0 (max 0) > > : 516224K->0K(520256K), 0.0173235 secs] > 538883K->24214K(2093120K), 0.0174138 secs] [Times: user=0.04 > sys=0.03, real=0.02 secs] 2040.773: [GC 2040.773: [ParNew > Desired survivor size 2064384 bytes, new threshold 0 (max 0) > > : 516224K->0K(520256K), 0.0196319 secs] > 540438K->25737K(2093120K), 0.0197275 secs] [Times: user=0.05 > sys=0.02, real=0.02 secs] > Which we would be very nice if it kept going like that. > However, by the first time the CMS collector runs, things > aren't working nearly as well: > > 214182.439: [GC 214182.439: [ParNew > Desired survivor size 2064384 bytes, new threshold 0 (max 0) > : 516224K->0K(520256K), 1.0146996 secs] > 1297278K->782575K(2093120K), 1.0148799 secs] [Times: > user=1.21 sys=0.58, real=1.01 secs] > 214247.437: [GC 214247.438: [ParNew > Desired survivor size 2064384 bytes, new threshold 0 (max 0) > : 516224K->0K(520256K), 1.2310274 secs] > 1298799K->784188K(2093120K), 1.2311534 secs] [Times: > user=1.46 sys=0.69, real=1.23 secs] > 214313.642: [GC 214313.642: [ParNew > Desired survivor size 2064384 bytes, new threshold 0 (max 0) > : 516224K->0K(520256K), 1.2183258 secs] > 1300412K->785710K(2093120K), 1.2184848 secs] [Times: > user=1.45 sys=0.65, real=1.22 secs] > > The increasing sys time is a bit worrying, but it seems like > the actual GC time is rising as well, even though we aren't > collecting any more young-gen garbage. At this point, CMS > went off > > 214380.695: [GC [1 CMS-initial-mark: 787188K(1572864K)] > 787195K(2093120K), 1.8929842 secs] [Times: user=1.50 > sys=0.02, real=1.89 secs] > 214382.589: [CMS-concurrent-mark-start] > 214383.056: [CMS-concurrent-mark: 0.467/0.467 secs] [Times: > user=1.81 sys=0.01, real=0.47 secs] 214383.056: > [CMS-concurrent-preclean-start] > 214383.064: [CMS-concurrent-preclean: 0.008/0.008 secs] > [Times: user=0.01 sys=0.00, real=0.01 secs] > 214383.064: [CMS-concurrent-abortable-preclean-start] > CMS: abort preclean due to time 214388.133: > [CMS-concurrent-abortable-preclean: 0.242/5.069 secs] > [Times: user=5.02 sys=0.02, real=5.07 secs] 214388.159: > [GC[YG occupancy: 51943 K (520256 K)]214388.159: [Rescan > (parallel) , 1.5403455 secs]214389.699: [weak refs > processing, 0.0050170 secs] [1 CMS-remark: > 787188K(1572864K)] 839132K(2093120K), 1.5458536 secs] > [Times: user=1.80 sys=0.71, real=1.55 secs] > 214389.705: [CMS-concurrent-sweep-start] > 214390.439: [CMS-concurrent-sweep: 0.671/0.734 secs] [Times: > user=1.35 sys=0.00, real=0.73 secs] 214390.439: > [CMS-concurrent-reset-start] > 214390.621: [CMS-concurrent-reset: 0.183/0.183 secs] [Times: > user=0.20 sys=0.02, real=0.18 secs] > > It seems like a) initial-mark shouldn't take 1.8 seconds, b) > if we really do only have 12mb of live data, CMS should have > collected a lot more than it did (the next ParNew collection > reported ~545MB of old gen in use), and c) 50% heap usage > with very little promotion seems very early for the > collector to go off. > > The next CMS cycle is at 434,973 seconds, by which point the > young gen collections are taking 3 seconds (user 3.59, sys > 1.60, real 3.09). The initial mark takes 4.82 seconds (user > 3.82, sys 0.02, real 4.82), and sweeps down to 1.1gb of used > old gen. I haven't yet confirmed it, but given the previous > heap dumps I'd guess that they will claim 12mb of live > objects and 1.1gb of dead objects. The current young gen > collections (at 497,601 seconds) are taking ~3.7 seconds > (4.33 user, 2.03 sys) Any idea what could be going on here? > We're running JDK 1.6_16. > > -- > Eagle Kessler > > > ------------------------------------------------------------------------ > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > > > > -- > Eagle Kessler From y.s.ramakrishna at oracle.com Wed Jul 21 16:02:37 2010 From: y.s.ramakrishna at oracle.com (Y. S. Ramakrishna) Date: Wed, 21 Jul 2010 16:02:37 -0700 Subject: Unreasonably long ParNew, CMS pause times In-Reply-To: <4C4763AC.4030508@oracle.com> References: <4C474BD6.1060908@oracle.com> <4C474D2C.9070103@oracle.com> <4C4763AC.4030508@oracle.com> Message-ID: <4C477C8D.6010508@oracle.com> Eagle, I also noticed that you have a 2G heap. Depending on what kind of machine that's on, the number of available cpu's for the JVM, and how many JVM's are sharing that machine etc., for a heap that size, you may well be able to do almost as well or perhaps even better (especially if you have no long term state) by just using the throughput collector -XX:+UseParallelOldGC I am guessing that your scavenges will be at least as fast, if not faster, and your full collections with Parallel Old of such a small heap can probably be done fairly quickly as well. The major cycles will probably be quite infrequent with large survivor spaces and little or no promotion. (And you may even be able to schedule the full collection during the middle of the night when ambient load is low etc.) -- ramki On 07/21/10 14:16, Y. S. Ramakrishna wrote: > > On 07/21/10 13:41, Eagle Kessler wrote: >> Checking the configs, it looks like we are explicitly setting MTT to 0: >> >> # Min, max, total JVM size (-Xms -Xmx) >> JVM_SIZE="-server -Xms2g -Xmx2g" >> >> # New Generation Sizes (-XX:NewSize -XX:MaxNewSize) >> >> JVM_SIZE_NEW="-XX:NewSize=512m -XX:MaxNewSize=512m" >> >> # Perm Generation Sizes (-XX:PermSize -XX:MaxPermSize) >> JVM_SIZE_PERM="-XX:PermSize=128m -XX:MaxPermSize=128m" >> >> # Type of Garbage Collector to use >> >> JVM_GC_TYPE="-XX:+UseConcMarkSweepGC -XX:+UseParNewGC" >> >> JVM_GC_OPTS="-XX:MaxTenuringThreshold=0 -XX:SurvivorRatio=128" >> >> I agree that in this case we definitely want to be using survivor spaces >> (I'm unfortunately not in charge of the default GC settings yet). >> However, I didn't know that running without survivor spaces could cause >> this kind of behavior. Why does running without survivor spaces cause >> such a large performance issue? > > See below. > >> Regardless, I'll ask that -XX:+PrintTenuringDistribution be added to the >> configs, along with a non-zero MTT, and see if the issue persists. The > > And of course SurvivorRatio a more reasonable value like 6 or 8, > depending on expected survival rate etc. > >> rising ParNew times seem like they would be unrelated to the tenuring >> threshold, though, wouldn't they? > > No, it's related in the sense that MTT=0 was resulting in very very short-lived > data to promote into the old gen. This can cause object dempgraphics (size and > age dirstribution) to be very non-stationary, and confuse the heuristics > for sizing the free list inventory. It of course also places large pressure > on the old gen allocator, increases fragmentation, increases mutation rates > and so on. > > The only thing it might not affect much is the initial mark pauses, > which will probably stay as they were before. > > -- ramki > >> On Wed, Jul 21, 2010 at 12:40 PM, Y. S. Ramakrishna >> > wrote: >> >> oh, and the "(max 0)" seems to imply you have somehow ended up >> asking for >> MaxTenuringThreshold=0 which can be disastrous. It is possible that the >> default settings for CMS on certain very old JVM's did indeed result in >> MTT=0 (but memory is fading of that era), so if using such older vintage >> of JVM's, explicitly setting MTT and SurvivorRatio higher is a good >> idea. >> >> -- ramki >> >> >> On 07/21/10 12:34, Y. S. Ramakrishna wrote: >> >> Yes, like what i think Matt is getting at, i'd configure >> sufficiently >> large survivor spaces. Even if you expect most of your objects >> to die young, >> you'd want survivor spaces large enough to keep at least age 1 >> objects in >> the survivor space. If as you state no medium- to ling-lived >> state is >> retained, your data is mostly short-lived and you'll be able to >> do without >> any promotion at all. Your problem here is that somehow your >> survivor >> spaces may have disappeared. (+PrintHeapAtGC will tell you, and >> of course examining yr JVM options should shed more light on >> that apparent >> disappearance.) >> >> -- ramki >> >> On 07/21/10 10:09, Eagle Kessler wrote: >> >> Hello all, >> I've got a web service that I'm seeing some very strange >> behavior on around it's garbage collection, and was >> wondering if someone here could explain why it might be >> happening. The service itself is fairly simple: Take in data >> from a few sources and merge them with the existing data in >> the database. It stores nearly no state while doing this, >> and indeed heap dumps taken 1, 24, and 72 hours after boot >> indicate that we have a consistent ~12mb of live data (in a >> 2GB heap, but I hope that's not what is causing this). >> >> The GC logs, though, don't look very happy at all. After our >> start up period, they settle into a fairly consistent pattern: >> >> 1041.159: [GC 1041.159: [ParNew >> Desired survivor size 2064384 bytes, new threshold 0 (max 0) >> >> : 516224K->0K(520256K), 0.0170322 secs] >> 537266K->22659K(2093120K), 0.0171330 secs] [Times: user=0.04 >> sys=0.01, real=0.02 secs] 1606.500: [GC 1606.500: [ParNew >> Desired survivor size 2064384 bytes, new threshold 0 (max 0) >> >> : 516224K->0K(520256K), 0.0173235 secs] >> 538883K->24214K(2093120K), 0.0174138 secs] [Times: user=0.04 >> sys=0.03, real=0.02 secs] 2040.773: [GC 2040.773: [ParNew >> Desired survivor size 2064384 bytes, new threshold 0 (max 0) >> >> : 516224K->0K(520256K), 0.0196319 secs] >> 540438K->25737K(2093120K), 0.0197275 secs] [Times: user=0.05 >> sys=0.02, real=0.02 secs] >> Which we would be very nice if it kept going like that. >> However, by the first time the CMS collector runs, things >> aren't working nearly as well: >> >> 214182.439: [GC 214182.439: [ParNew >> Desired survivor size 2064384 bytes, new threshold 0 (max 0) >> : 516224K->0K(520256K), 1.0146996 secs] >> 1297278K->782575K(2093120K), 1.0148799 secs] [Times: >> user=1.21 sys=0.58, real=1.01 secs] >> 214247.437: [GC 214247.438: [ParNew >> Desired survivor size 2064384 bytes, new threshold 0 (max 0) >> : 516224K->0K(520256K), 1.2310274 secs] >> 1298799K->784188K(2093120K), 1.2311534 secs] [Times: >> user=1.46 sys=0.69, real=1.23 secs] >> 214313.642: [GC 214313.642: [ParNew >> Desired survivor size 2064384 bytes, new threshold 0 (max 0) >> : 516224K->0K(520256K), 1.2183258 secs] >> 1300412K->785710K(2093120K), 1.2184848 secs] [Times: >> user=1.45 sys=0.65, real=1.22 secs] >> >> The increasing sys time is a bit worrying, but it seems like >> the actual GC time is rising as well, even though we aren't >> collecting any more young-gen garbage. At this point, CMS >> went off >> >> 214380.695: [GC [1 CMS-initial-mark: 787188K(1572864K)] >> 787195K(2093120K), 1.8929842 secs] [Times: user=1.50 >> sys=0.02, real=1.89 secs] >> 214382.589: [CMS-concurrent-mark-start] >> 214383.056: [CMS-concurrent-mark: 0.467/0.467 secs] [Times: >> user=1.81 sys=0.01, real=0.47 secs] 214383.056: >> [CMS-concurrent-preclean-start] >> 214383.064: [CMS-concurrent-preclean: 0.008/0.008 secs] >> [Times: user=0.01 sys=0.00, real=0.01 secs] >> 214383.064: [CMS-concurrent-abortable-preclean-start] >> CMS: abort preclean due to time 214388.133: >> [CMS-concurrent-abortable-preclean: 0.242/5.069 secs] >> [Times: user=5.02 sys=0.02, real=5.07 secs] 214388.159: >> [GC[YG occupancy: 51943 K (520256 K)]214388.159: [Rescan >> (parallel) , 1.5403455 secs]214389.699: [weak refs >> processing, 0.0050170 secs] [1 CMS-remark: >> 787188K(1572864K)] 839132K(2093120K), 1.5458536 secs] >> [Times: user=1.80 sys=0.71, real=1.55 secs] >> 214389.705: [CMS-concurrent-sweep-start] >> 214390.439: [CMS-concurrent-sweep: 0.671/0.734 secs] [Times: >> user=1.35 sys=0.00, real=0.73 secs] 214390.439: >> [CMS-concurrent-reset-start] >> 214390.621: [CMS-concurrent-reset: 0.183/0.183 secs] [Times: >> user=0.20 sys=0.02, real=0.18 secs] >> >> It seems like a) initial-mark shouldn't take 1.8 seconds, b) >> if we really do only have 12mb of live data, CMS should have >> collected a lot more than it did (the next ParNew collection >> reported ~545MB of old gen in use), and c) 50% heap usage >> with very little promotion seems very early for the >> collector to go off. >> >> The next CMS cycle is at 434,973 seconds, by which point the >> young gen collections are taking 3 seconds (user 3.59, sys >> 1.60, real 3.09). The initial mark takes 4.82 seconds (user >> 3.82, sys 0.02, real 4.82), and sweeps down to 1.1gb of used >> old gen. I haven't yet confirmed it, but given the previous >> heap dumps I'd guess that they will claim 12mb of live >> objects and 1.1gb of dead objects. The current young gen >> collections (at 497,601 seconds) are taking ~3.7 seconds >> (4.33 user, 2.03 sys) Any idea what could be going on here? >> We're running JDK 1.6_16. >> >> -- >> Eagle Kessler >> >> >> ------------------------------------------------------------------------ >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> >> >> >> -- >> Eagle Kessler > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use From eagle.kessler at gmail.com Thu Jul 22 11:36:50 2010 From: eagle.kessler at gmail.com (Eagle Kessler) Date: Thu, 22 Jul 2010 11:36:50 -0700 Subject: Unreasonably long ParNew, CMS pause times In-Reply-To: <4C4763AC.4030508@oracle.com> References: <4C474BD6.1060908@oracle.com> <4C474D2C.9070103@oracle.com> <4C4763AC.4030508@oracle.com> Message-ID: We've switched to the throughput collector with a max pause time goal, and that seems to be behaving well. I'm still curious about the behavior that I'm seeing with CMS, though. I ran some tests with -XX:+PrintTenuringDistribution and without setting MaxTenuringThreshold, and I'm still seeing the behavior. Both of these tests were executed under constant load: 512mb heap (128mb young gen), MTT=0: 62.537: [GC 62.537: [ParNew Desired survivor size 491520 bytes, new threshold 0 (max 0) : 129152K->0K(130112K), 0.0064317 secs] 143985K->16058K(523328K), 0.0065156 secs] [Times: user=0.03 sys=0.01, real=0.01 secs] 95.633: [GC 95.633: [ParNew Desired survivor size 491520 bytes, new threshold 0 (max 0) : 129152K->0K(130112K), 0.0030186 secs] 145210K->16614K(523328K), 0.0030929 secs] [Times: user=0.01 sys=0.00, real=0.00 secs] 127.882: [GC 127.882: [ParNew Desired survivor size 491520 bytes, new threshold 0 (max 0) : 129152K->0K(130112K), 0.0042069 secs] 145766K->17285K(523328K), 0.0042783 secs] [Times: user=0.01 sys=0.02, real=0.00 secs] 158.086: [GC 158.086: [ParNew Desired survivor size 491520 bytes, new threshold 0 (max 0) : 129152K->0K(130112K), 0.0045986 secs] 146437K->17955K(523328K), 0.0046896 secs] [Times: user=0.02 sys=0.02, real=0.01 secs] Rising to 5021.415: [GC 5021.415: [ParNew Desired survivor size 491520 bytes, new threshold 0 (max 0) : 129152K->0K(130112K), 0.0604269 secs] 225694K->97083K(523328K), 0.0605133 secs] [Times: user=0.10 sys=0.61, real=0.06 secs] 5053.611: [GC 5053.611: [ParNew Desired survivor size 491520 bytes, new threshold 0 (max 0) : 129152K->0K(130112K), 0.0569357 secs] 226235K->97636K(523328K), 0.0570316 secs] [Times: user=0.10 sys=0.60, real=0.05 secs] 5084.401: [GC 5084.401: [ParNew Desired survivor size 491520 bytes, new threshold 0 (max 0) : 129152K->0K(130112K), 0.0591064 secs] 226788K->98088K(523328K), 0.0591840 secs] [Times: user=0.09 sys=0.62, real=0.06 secs] Similarly, 512mb heap, 128mb young gen, and no MTT: 83.708: [GC 83.708: [ParNew Desired survivor size 3702784 bytes, new threshold 15 (max 15) - age 1: 626008 bytes, 626008 total - age 2: 736336 bytes, 1362344 total : 120669K->1759K(123840K), 0.0034469 secs] 135502K->16593K(517056K), 0.0035234 secs] [Times: user=0.02 sys=0.01, real=0.01 secs] 109.627: [GC 109.627: [ParNew Desired survivor size 3702784 bytes, new threshold 15 (max 15) - age 1: 634688 bytes, 634688 total - age 2: 268624 bytes, 903312 total - age 3: 736016 bytes, 1639328 total : 118367K->2275K(123840K), 0.0036776 secs] 133201K->17109K(517056K), 0.0037455 secs] [Times: user=0.02 sys=0.01, real=0.01 secs] 137.495: [GC 137.495: [ParNew Desired survivor size 3702784 bytes, new threshold 15 (max 15) - age 1: 479728 bytes, 479728 total - age 2: 269064 bytes, 748792 total - age 3: 267904 bytes, 1016696 total - age 4: 735952 bytes, 1752648 total : 118883K->2396K(123840K), 0.0040557 secs] 133717K->17230K(517056K), 0.0041302 secs] [Times: user=0.02 sys=0.01, real=0.01 secs] 165.090: [GC 165.090: [ParNew Desired survivor size 3702784 bytes, new threshold 15 (max 15) - age 1: 403856 bytes, 403856 total - age 2: 267736 bytes, 671592 total - age 3: 268616 bytes, 940208 total - age 4: 267904 bytes, 1208112 total - age 5: 729072 bytes, 1937184 total : 119004K->2668K(123840K), 0.0046744 secs] 133838K->17501K(517056K), 0.0047473 secs] [Times: user=0.04 sys=0.02, real=0.00 secs] Rising to 4981.917: [GC 4981.918: [ParNew Desired survivor size 3702784 bytes, new threshold 14 (max 15) - age 1: 533872 bytes, 533872 total - age 2: 269336 bytes, 803208 total - age 3: 268048 bytes, 1071256 total - age 4: 268272 bytes, 1339528 total - age 5: 265880 bytes, 1605408 total - age 6: 241704 bytes, 1847112 total - age 7: 241112 bytes, 2088224 total - age 8: 239680 bytes, 2327904 total - age 9: 233632 bytes, 2561536 total - age 10: 231040 bytes, 2792576 total - age 11: 231040 bytes, 3023616 total - age 12: 232256 bytes, 3255872 total - age 13: 232256 bytes, 3488128 total - age 14: 231040 bytes, 3719168 total : 122652K->4440K(123840K), 0.0654827 secs] 173869K->56113K(517056K), 0.0655671 secs] [Times: user=0.17 sys=0.62, real=0.07 secs] 5009.679: [GC 5009.679: [ParNew Desired survivor size 3702784 bytes, new threshold 14 (max 15) - age 1: 673136 bytes, 673136 total - age 2: 271704 bytes, 944840 total - age 3: 269232 bytes, 1214072 total - age 4: 268088 bytes, 1482160 total - age 5: 264528 bytes, 1746688 total - age 6: 244280 bytes, 1990968 total - age 7: 238320 bytes, 2229288 total - age 8: 241112 bytes, 2470400 total - age 9: 233416 bytes, 2703816 total - age 10: 231040 bytes, 2934856 total - age 11: 231040 bytes, 3165896 total - age 12: 231040 bytes, 3396936 total - age 13: 232256 bytes, 3629192 total - age 14: 232256 bytes, 3861448 total : 121048K->5058K(123840K), 0.0675964 secs] 172721K->56957K(517056K), 0.0677232 secs] [Times: user=0.17 sys=0.66, real=0.06 secs] 5037.742: [GC 5037.742: [ParNew Desired survivor size 3702784 bytes, new threshold 14 (max 15) - age 1: 582296 bytes, 582296 total - age 2: 268128 bytes, 850424 total - age 3: 271528 bytes, 1121952 total - age 4: 269192 bytes, 1391144 total - age 5: 264952 bytes, 1656096 total - age 6: 242496 bytes, 1898592 total - age 7: 240752 bytes, 2139344 total - age 8: 238320 bytes, 2377664 total - age 9: 234776 bytes, 2612440 total - age 10: 231040 bytes, 2843480 total - age 11: 231040 bytes, 3074520 total - age 12: 231040 bytes, 3305560 total - age 13: 231040 bytes, 3536600 total - age 14: 232256 bytes, 3768856 total : 121666K->5991K(123840K), 0.0649960 secs] 173565K->58116K(517056K), 0.0650795 secs] [Times: user=0.17 sys=0.64, real=0.06 secs] I'll look into why 232k lives through seven minutes of collections, but in both cases a full collection (triggered through JMX) brought the heap down to ~12mb and removed the "accumulated" ParNew time. Any idea why I'm seeing the slow increase in ParNew collection times and/or more tests that I should be running to diagnose it? I can provide the full logs if you'd like, I've abbreviated them here to avoid sending long messages to the entire list. On Wed, Jul 21, 2010 at 2:16 PM, Y. S. Ramakrishna < y.s.ramakrishna at oracle.com> wrote: > > > On 07/21/10 13:41, Eagle Kessler wrote: > >> Checking the configs, it looks like we are explicitly setting MTT to 0: >> >> # Min, max, total JVM size (-Xms -Xmx) >> JVM_SIZE="-server -Xms2g -Xmx2g" >> >> # New Generation Sizes (-XX:NewSize -XX:MaxNewSize) >> >> JVM_SIZE_NEW="-XX:NewSize=512m -XX:MaxNewSize=512m" >> >> # Perm Generation Sizes (-XX:PermSize -XX:MaxPermSize) >> JVM_SIZE_PERM="-XX:PermSize=128m -XX:MaxPermSize=128m" >> >> # Type of Garbage Collector to use >> >> JVM_GC_TYPE="-XX:+UseConcMarkSweepGC -XX:+UseParNewGC" >> >> JVM_GC_OPTS="-XX:MaxTenuringThreshold=0 -XX:SurvivorRatio=128" >> >> I agree that in this case we definitely want to be using survivor spaces >> (I'm unfortunately not in charge of the default GC settings yet). However, I >> didn't know that running without survivor spaces could cause this kind of >> behavior. Why does running without survivor spaces cause such a large >> performance issue? >> > > See below. > > > >> Regardless, I'll ask that -XX:+PrintTenuringDistribution be added to the >> configs, along with a non-zero MTT, and see if the issue persists. The >> > > And of course SurvivorRatio a more reasonable value like 6 or 8, > depending on expected survival rate etc. > > > rising ParNew times seem like they would be unrelated to the tenuring >> threshold, though, wouldn't they? >> > > No, it's related in the sense that MTT=0 was resulting in very very > short-lived > data to promote into the old gen. This can cause object dempgraphics (size > and > age dirstribution) to be very non-stationary, and confuse the heuristics > for sizing the free list inventory. It of course also places large pressure > on the old gen allocator, increases fragmentation, increases mutation rates > and so on. > > The only thing it might not affect much is the initial mark pauses, > which will probably stay as they were before. > > -- ramki > > >> On Wed, Jul 21, 2010 at 12:40 PM, Y. S. Ramakrishna < >> y.s.ramakrishna at oracle.com > wrote: >> >> oh, and the "(max 0)" seems to imply you have somehow ended up >> asking for >> MaxTenuringThreshold=0 which can be disastrous. It is possible that the >> default settings for CMS on certain very old JVM's did indeed result in >> MTT=0 (but memory is fading of that era), so if using such older >> vintage >> of JVM's, explicitly setting MTT and SurvivorRatio higher is a good >> idea. >> >> -- ramki >> >> >> On 07/21/10 12:34, Y. S. Ramakrishna wrote: >> >> Yes, like what i think Matt is getting at, i'd configure >> sufficiently >> large survivor spaces. Even if you expect most of your objects >> to die young, >> you'd want survivor spaces large enough to keep at least age 1 >> objects in >> the survivor space. If as you state no medium- to ling-lived >> state is >> retained, your data is mostly short-lived and you'll be able to >> do without >> any promotion at all. Your problem here is that somehow your >> survivor >> spaces may have disappeared. (+PrintHeapAtGC will tell you, and >> of course examining yr JVM options should shed more light on >> that apparent >> disappearance.) >> >> -- ramki >> >> On 07/21/10 10:09, Eagle Kessler wrote: >> >> Hello all, >> I've got a web service that I'm seeing some very strange >> behavior on around it's garbage collection, and was >> wondering if someone here could explain why it might be >> happening. The service itself is fairly simple: Take in data >> from a few sources and merge them with the existing data in >> the database. It stores nearly no state while doing this, >> and indeed heap dumps taken 1, 24, and 72 hours after boot >> indicate that we have a consistent ~12mb of live data (in a >> 2GB heap, but I hope that's not what is causing this). >> >> The GC logs, though, don't look very happy at all. After our >> start up period, they settle into a fairly consistent pattern: >> >> 1041.159: [GC 1041.159: [ParNew >> Desired survivor size 2064384 bytes, new threshold 0 (max 0) >> >> : 516224K->0K(520256K), 0.0170322 secs] >> 537266K->22659K(2093120K), 0.0171330 secs] [Times: user=0.04 >> sys=0.01, real=0.02 secs] 1606.500: [GC 1606.500: [ParNew >> Desired survivor size 2064384 bytes, new threshold 0 (max 0) >> >> : 516224K->0K(520256K), 0.0173235 secs] >> 538883K->24214K(2093120K), 0.0174138 secs] [Times: user=0.04 >> sys=0.03, real=0.02 secs] 2040.773: [GC 2040.773: [ParNew >> Desired survivor size 2064384 bytes, new threshold 0 (max 0) >> >> : 516224K->0K(520256K), 0.0196319 secs] >> 540438K->25737K(2093120K), 0.0197275 secs] [Times: user=0.05 >> sys=0.02, real=0.02 secs] >> Which we would be very nice if it kept going like that. >> However, by the first time the CMS collector runs, things >> aren't working nearly as well: >> >> 214182.439: [GC 214182.439: [ParNew >> Desired survivor size 2064384 bytes, new threshold 0 (max 0) >> : 516224K->0K(520256K), 1.0146996 secs] >> 1297278K->782575K(2093120K), 1.0148799 secs] [Times: >> user=1.21 sys=0.58, real=1.01 secs] >> 214247.437: [GC 214247.438: [ParNew >> Desired survivor size 2064384 bytes, new threshold 0 (max 0) >> : 516224K->0K(520256K), 1.2310274 secs] >> 1298799K->784188K(2093120K), 1.2311534 secs] [Times: >> user=1.46 sys=0.69, real=1.23 secs] >> 214313.642: [GC 214313.642: [ParNew >> Desired survivor size 2064384 bytes, new threshold 0 (max 0) >> : 516224K->0K(520256K), 1.2183258 secs] >> 1300412K->785710K(2093120K), 1.2184848 secs] [Times: >> user=1.45 sys=0.65, real=1.22 secs] >> >> The increasing sys time is a bit worrying, but it seems like >> the actual GC time is rising as well, even though we aren't >> collecting any more young-gen garbage. At this point, CMS >> went off >> >> 214380.695: [GC [1 CMS-initial-mark: 787188K(1572864K)] >> 787195K(2093120K), 1.8929842 secs] [Times: user=1.50 >> sys=0.02, real=1.89 secs] >> 214382.589: [CMS-concurrent-mark-start] >> 214383.056: [CMS-concurrent-mark: 0.467/0.467 secs] [Times: >> user=1.81 sys=0.01, real=0.47 secs] 214383.056: >> [CMS-concurrent-preclean-start] >> 214383.064: [CMS-concurrent-preclean: 0.008/0.008 secs] >> [Times: user=0.01 sys=0.00, real=0.01 secs] >> 214383.064: [CMS-concurrent-abortable-preclean-start] >> CMS: abort preclean due to time 214388.133: >> [CMS-concurrent-abortable-preclean: 0.242/5.069 secs] >> [Times: user=5.02 sys=0.02, real=5.07 secs] 214388.159: >> [GC[YG occupancy: 51943 K (520256 K)]214388.159: [Rescan >> (parallel) , 1.5403455 secs]214389.699: [weak refs >> processing, 0.0050170 secs] [1 CMS-remark: >> 787188K(1572864K)] 839132K(2093120K), 1.5458536 secs] >> [Times: user=1.80 sys=0.71, real=1.55 secs] >> 214389.705: [CMS-concurrent-sweep-start] >> 214390.439: [CMS-concurrent-sweep: 0.671/0.734 secs] [Times: >> user=1.35 sys=0.00, real=0.73 secs] 214390.439: >> [CMS-concurrent-reset-start] >> 214390.621: [CMS-concurrent-reset: 0.183/0.183 secs] [Times: >> user=0.20 sys=0.02, real=0.18 secs] >> >> It seems like a) initial-mark shouldn't take 1.8 seconds, b) >> if we really do only have 12mb of live data, CMS should have >> collected a lot more than it did (the next ParNew collection >> reported ~545MB of old gen in use), and c) 50% heap usage >> with very little promotion seems very early for the >> collector to go off. >> >> The next CMS cycle is at 434,973 seconds, by which point the >> young gen collections are taking 3 seconds (user 3.59, sys >> 1.60, real 3.09). The initial mark takes 4.82 seconds (user >> 3.82, sys 0.02, real 4.82), and sweeps down to 1.1gb of used >> old gen. I haven't yet confirmed it, but given the previous >> heap dumps I'd guess that they will claim 12mb of live >> objects and 1.1gb of dead objects. The current young gen >> collections (at 497,601 seconds) are taking ~3.7 seconds >> (4.33 user, 2.03 sys) Any idea what could be going on here? >> We're running JDK 1.6_16. >> >> -- Eagle Kessler >> >> >> >> ------------------------------------------------------------------------ >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> >> >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> >> >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> >> >> >> -- >> Eagle Kessler >> > -- Eagle Kessler -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20100722/643576e8/attachment-0001.html From matt.fowles at gmail.com Thu Jul 22 12:10:04 2010 From: matt.fowles at gmail.com (Matt Fowles) Date: Thu, 22 Jul 2010 15:10:04 -0400 Subject: Unreasonably long ParNew, CMS pause times In-Reply-To: References: <4C474BD6.1060908@oracle.com> <4C474D2C.9070103@oracle.com> <4C4763AC.4030508@oracle.com> Message-ID: Eagle~ If I had to guess, I would attribute the increase in young collection times to fragmentation in the old gen making promotion more expensive. Given your tenuring distribution, I would just set MTT to 2. Matt On Thu, Jul 22, 2010 at 2:36 PM, Eagle Kessler wrote: > We've switched to the throughput collector with a max pause time goal, and > that seems to be behaving well. I'm still curious about the behavior that > I'm seeing with CMS, though. > > I ran some tests with -XX:+PrintTenuringDistribution and without setting > MaxTenuringThreshold, and I'm still seeing the behavior. Both of these tests > were executed under constant load: > > 512mb heap (128mb young gen), MTT=0: > 62.537: [GC 62.537: [ParNew > Desired survivor size 491520 bytes, new threshold 0 (max 0) > : 129152K->0K(130112K), 0.0064317 secs] 143985K->16058K(523328K), 0.0065156 > secs] [Times: user=0.03 sys=0.01, real=0.01 secs] > 95.633: [GC 95.633: [ParNew > Desired survivor size 491520 bytes, new threshold 0 (max 0) > : 129152K->0K(130112K), 0.0030186 secs] 145210K->16614K(523328K), 0.0030929 > secs] [Times: user=0.01 sys=0.00, real=0.00 secs] > 127.882: [GC 127.882: [ParNew > Desired survivor size 491520 bytes, new threshold 0 (max 0) > : 129152K->0K(130112K), 0.0042069 secs] 145766K->17285K(523328K), 0.0042783 > secs] [Times: user=0.01 sys=0.02, real=0.00 secs] > 158.086: [GC 158.086: [ParNew > Desired survivor size 491520 bytes, new threshold 0 (max 0) > : 129152K->0K(130112K), 0.0045986 secs] 146437K->17955K(523328K), 0.0046896 > secs] [Times: user=0.02 sys=0.02, real=0.01 secs] > > Rising to > 5021.415: [GC 5021.415: [ParNew > Desired survivor size 491520 bytes, new threshold 0 (max 0) > : 129152K->0K(130112K), 0.0604269 secs] 225694K->97083K(523328K), 0.0605133 > secs] [Times: user=0.10 sys=0.61, real=0.06 secs] > 5053.611: [GC 5053.611: [ParNew > Desired survivor size 491520 bytes, new threshold 0 (max 0) > : 129152K->0K(130112K), 0.0569357 secs] 226235K->97636K(523328K), 0.0570316 > secs] [Times: user=0.10 sys=0.60, real=0.05 secs] > 5084.401: [GC 5084.401: [ParNew > Desired survivor size 491520 bytes, new threshold 0 (max 0) > : 129152K->0K(130112K), 0.0591064 secs] 226788K->98088K(523328K), 0.0591840 > secs] [Times: user=0.09 sys=0.62, real=0.06 secs] > > > Similarly, 512mb heap, 128mb young gen, and no MTT: > 83.708: [GC 83.708: [ParNew > Desired survivor size 3702784 bytes, new threshold 15 (max 15) > - age 1: 626008 bytes, 626008 total > - age 2: 736336 bytes, 1362344 total > : 120669K->1759K(123840K), 0.0034469 secs] 135502K->16593K(517056K), > 0.0035234 secs] [Times: user=0.02 sys=0.01, real=0.01 secs] > 109.627: [GC 109.627: [ParNew > Desired survivor size 3702784 bytes, new threshold 15 (max 15) > - age 1: 634688 bytes, 634688 total > - age 2: 268624 bytes, 903312 total > - age 3: 736016 bytes, 1639328 total > : 118367K->2275K(123840K), 0.0036776 secs] 133201K->17109K(517056K), > 0.0037455 secs] [Times: user=0.02 sys=0.01, real=0.01 secs] > 137.495: [GC 137.495: [ParNew > Desired survivor size 3702784 bytes, new threshold 15 (max 15) > - age 1: 479728 bytes, 479728 total > - age 2: 269064 bytes, 748792 total > - age 3: 267904 bytes, 1016696 total > - age 4: 735952 bytes, 1752648 total > : 118883K->2396K(123840K), 0.0040557 secs] 133717K->17230K(517056K), > 0.0041302 secs] [Times: user=0.02 sys=0.01, real=0.01 secs] > 165.090: [GC 165.090: [ParNew > Desired survivor size 3702784 bytes, new threshold 15 (max 15) > - age 1: 403856 bytes, 403856 total > - age 2: 267736 bytes, 671592 total > - age 3: 268616 bytes, 940208 total > - age 4: 267904 bytes, 1208112 total > - age 5: 729072 bytes, 1937184 total > : 119004K->2668K(123840K), 0.0046744 secs] 133838K->17501K(517056K), > 0.0047473 secs] [Times: user=0.04 sys=0.02, real=0.00 secs] > > Rising to > 4981.917: [GC 4981.918: [ParNew > Desired survivor size 3702784 bytes, new threshold 14 (max 15) > - age 1: 533872 bytes, 533872 total > - age 2: 269336 bytes, 803208 total > - age 3: 268048 bytes, 1071256 total > - age 4: 268272 bytes, 1339528 total > - age 5: 265880 bytes, 1605408 total > - age 6: 241704 bytes, 1847112 total > - age 7: 241112 bytes, 2088224 total > - age 8: 239680 bytes, 2327904 total > - age 9: 233632 bytes, 2561536 total > - age 10: 231040 bytes, 2792576 total > - age 11: 231040 bytes, 3023616 total > - age 12: 232256 bytes, 3255872 total > - age 13: 232256 bytes, 3488128 total > - age 14: 231040 bytes, 3719168 total > : 122652K->4440K(123840K), 0.0654827 secs] 173869K->56113K(517056K), > 0.0655671 secs] [Times: user=0.17 sys=0.62, real=0.07 secs] > 5009.679: [GC 5009.679: [ParNew > Desired survivor size 3702784 bytes, new threshold 14 (max 15) > - age 1: 673136 bytes, 673136 total > - age 2: 271704 bytes, 944840 total > - age 3: 269232 bytes, 1214072 total > - age 4: 268088 bytes, 1482160 total > - age 5: 264528 bytes, 1746688 total > - age 6: 244280 bytes, 1990968 total > - age 7: 238320 bytes, 2229288 total > - age 8: 241112 bytes, 2470400 total > - age 9: 233416 bytes, 2703816 total > - age 10: 231040 bytes, 2934856 total > - age 11: 231040 bytes, 3165896 total > - age 12: 231040 bytes, 3396936 total > - age 13: 232256 bytes, 3629192 total > - age 14: 232256 bytes, 3861448 total > : 121048K->5058K(123840K), 0.0675964 secs] 172721K->56957K(517056K), > 0.0677232 secs] [Times: user=0.17 sys=0.66, real=0.06 secs] > 5037.742: [GC 5037.742: [ParNew > Desired survivor size 3702784 bytes, new threshold 14 (max 15) > - age 1: 582296 bytes, 582296 total > - age 2: 268128 bytes, 850424 total > - age 3: 271528 bytes, 1121952 total > - age 4: 269192 bytes, 1391144 total > - age 5: 264952 bytes, 1656096 total > - age 6: 242496 bytes, 1898592 total > - age 7: 240752 bytes, 2139344 total > - age 8: 238320 bytes, 2377664 total > - age 9: 234776 bytes, 2612440 total > - age 10: 231040 bytes, 2843480 total > - age 11: 231040 bytes, 3074520 total > - age 12: 231040 bytes, 3305560 total > - age 13: 231040 bytes, 3536600 total > - age 14: 232256 bytes, 3768856 total > : 121666K->5991K(123840K), 0.0649960 secs] 173565K->58116K(517056K), > 0.0650795 secs] [Times: user=0.17 sys=0.64, real=0.06 secs] > > > I'll look into why 232k lives through seven minutes of collections, but in > both cases a full collection (triggered through JMX) brought the heap down > to ~12mb and removed the "accumulated" ParNew time. Any idea why I'm seeing > the slow increase in ParNew collection times and/or more tests that I should > be running to diagnose it? I can provide the full logs if you'd like, I've > abbreviated them here to avoid sending long messages to the entire list. > > > > On Wed, Jul 21, 2010 at 2:16 PM, Y. S. Ramakrishna < > y.s.ramakrishna at oracle.com> wrote: > >> >> >> On 07/21/10 13:41, Eagle Kessler wrote: >> >>> Checking the configs, it looks like we are explicitly setting MTT to 0: >>> >>> # Min, max, total JVM size (-Xms -Xmx) >>> JVM_SIZE="-server -Xms2g -Xmx2g" >>> >>> # New Generation Sizes (-XX:NewSize -XX:MaxNewSize) >>> >>> JVM_SIZE_NEW="-XX:NewSize=512m -XX:MaxNewSize=512m" >>> >>> # Perm Generation Sizes (-XX:PermSize -XX:MaxPermSize) >>> JVM_SIZE_PERM="-XX:PermSize=128m -XX:MaxPermSize=128m" >>> >>> # Type of Garbage Collector to use >>> >>> JVM_GC_TYPE="-XX:+UseConcMarkSweepGC -XX:+UseParNewGC" >>> >>> JVM_GC_OPTS="-XX:MaxTenuringThreshold=0 -XX:SurvivorRatio=128" >>> >>> I agree that in this case we definitely want to be using survivor spaces >>> (I'm unfortunately not in charge of the default GC settings yet). However, I >>> didn't know that running without survivor spaces could cause this kind of >>> behavior. Why does running without survivor spaces cause such a large >>> performance issue? >>> >> >> See below. >> >> >> >>> Regardless, I'll ask that -XX:+PrintTenuringDistribution be added to the >>> configs, along with a non-zero MTT, and see if the issue persists. The >>> >> >> And of course SurvivorRatio a more reasonable value like 6 or 8, >> depending on expected survival rate etc. >> >> >> rising ParNew times seem like they would be unrelated to the tenuring >>> threshold, though, wouldn't they? >>> >> >> No, it's related in the sense that MTT=0 was resulting in very very >> short-lived >> data to promote into the old gen. This can cause object dempgraphics (size >> and >> age dirstribution) to be very non-stationary, and confuse the heuristics >> for sizing the free list inventory. It of course also places large >> pressure >> on the old gen allocator, increases fragmentation, increases mutation >> rates >> and so on. >> >> The only thing it might not affect much is the initial mark pauses, >> which will probably stay as they were before. >> >> -- ramki >> >> >>> On Wed, Jul 21, 2010 at 12:40 PM, Y. S. Ramakrishna < >>> y.s.ramakrishna at oracle.com > wrote: >>> >>> oh, and the "(max 0)" seems to imply you have somehow ended up >>> asking for >>> MaxTenuringThreshold=0 which can be disastrous. It is possible that >>> the >>> default settings for CMS on certain very old JVM's did indeed result >>> in >>> MTT=0 (but memory is fading of that era), so if using such older >>> vintage >>> of JVM's, explicitly setting MTT and SurvivorRatio higher is a good >>> idea. >>> >>> -- ramki >>> >>> >>> On 07/21/10 12:34, Y. S. Ramakrishna wrote: >>> >>> Yes, like what i think Matt is getting at, i'd configure >>> sufficiently >>> large survivor spaces. Even if you expect most of your objects >>> to die young, >>> you'd want survivor spaces large enough to keep at least age 1 >>> objects in >>> the survivor space. If as you state no medium- to ling-lived >>> state is >>> retained, your data is mostly short-lived and you'll be able to >>> do without >>> any promotion at all. Your problem here is that somehow your >>> survivor >>> spaces may have disappeared. (+PrintHeapAtGC will tell you, and >>> of course examining yr JVM options should shed more light on >>> that apparent >>> disappearance.) >>> >>> -- ramki >>> >>> On 07/21/10 10:09, Eagle Kessler wrote: >>> >>> Hello all, >>> I've got a web service that I'm seeing some very strange >>> behavior on around it's garbage collection, and was >>> wondering if someone here could explain why it might be >>> happening. The service itself is fairly simple: Take in data >>> from a few sources and merge them with the existing data in >>> the database. It stores nearly no state while doing this, >>> and indeed heap dumps taken 1, 24, and 72 hours after boot >>> indicate that we have a consistent ~12mb of live data (in a >>> 2GB heap, but I hope that's not what is causing this). >>> >>> The GC logs, though, don't look very happy at all. After our >>> start up period, they settle into a fairly consistent pattern: >>> >>> 1041.159: [GC 1041.159: [ParNew >>> Desired survivor size 2064384 bytes, new threshold 0 (max 0) >>> >>> : 516224K->0K(520256K), 0.0170322 secs] >>> 537266K->22659K(2093120K), 0.0171330 secs] [Times: user=0.04 >>> sys=0.01, real=0.02 secs] 1606.500: [GC 1606.500: [ParNew >>> Desired survivor size 2064384 bytes, new threshold 0 (max 0) >>> >>> : 516224K->0K(520256K), 0.0173235 secs] >>> 538883K->24214K(2093120K), 0.0174138 secs] [Times: user=0.04 >>> sys=0.03, real=0.02 secs] 2040.773: [GC 2040.773: [ParNew >>> Desired survivor size 2064384 bytes, new threshold 0 (max 0) >>> >>> : 516224K->0K(520256K), 0.0196319 secs] >>> 540438K->25737K(2093120K), 0.0197275 secs] [Times: user=0.05 >>> sys=0.02, real=0.02 secs] >>> Which we would be very nice if it kept going like that. >>> However, by the first time the CMS collector runs, things >>> aren't working nearly as well: >>> >>> 214182.439: [GC 214182.439: [ParNew >>> Desired survivor size 2064384 bytes, new threshold 0 (max 0) >>> : 516224K->0K(520256K), 1.0146996 secs] >>> 1297278K->782575K(2093120K), 1.0148799 secs] [Times: >>> user=1.21 sys=0.58, real=1.01 secs] >>> 214247.437: [GC 214247.438: [ParNew >>> Desired survivor size 2064384 bytes, new threshold 0 (max 0) >>> : 516224K->0K(520256K), 1.2310274 secs] >>> 1298799K->784188K(2093120K), 1.2311534 secs] [Times: >>> user=1.46 sys=0.69, real=1.23 secs] >>> 214313.642: [GC 214313.642: [ParNew >>> Desired survivor size 2064384 bytes, new threshold 0 (max 0) >>> : 516224K->0K(520256K), 1.2183258 secs] >>> 1300412K->785710K(2093120K), 1.2184848 secs] [Times: >>> user=1.45 sys=0.65, real=1.22 secs] >>> >>> The increasing sys time is a bit worrying, but it seems like >>> the actual GC time is rising as well, even though we aren't >>> collecting any more young-gen garbage. At this point, CMS >>> went off >>> >>> 214380.695: [GC [1 CMS-initial-mark: 787188K(1572864K)] >>> 787195K(2093120K), 1.8929842 secs] [Times: user=1.50 >>> sys=0.02, real=1.89 secs] >>> 214382.589: [CMS-concurrent-mark-start] >>> 214383.056: [CMS-concurrent-mark: 0.467/0.467 secs] [Times: >>> user=1.81 sys=0.01, real=0.47 secs] 214383.056: >>> [CMS-concurrent-preclean-start] >>> 214383.064: [CMS-concurrent-preclean: 0.008/0.008 secs] >>> [Times: user=0.01 sys=0.00, real=0.01 secs] >>> 214383.064: [CMS-concurrent-abortable-preclean-start] >>> CMS: abort preclean due to time 214388.133: >>> [CMS-concurrent-abortable-preclean: 0.242/5.069 secs] >>> [Times: user=5.02 sys=0.02, real=5.07 secs] 214388.159: >>> [GC[YG occupancy: 51943 K (520256 K)]214388.159: [Rescan >>> (parallel) , 1.5403455 secs]214389.699: [weak refs >>> processing, 0.0050170 secs] [1 CMS-remark: >>> 787188K(1572864K)] 839132K(2093120K), 1.5458536 secs] >>> [Times: user=1.80 sys=0.71, real=1.55 secs] >>> 214389.705: [CMS-concurrent-sweep-start] >>> 214390.439: [CMS-concurrent-sweep: 0.671/0.734 secs] [Times: >>> user=1.35 sys=0.00, real=0.73 secs] 214390.439: >>> [CMS-concurrent-reset-start] >>> 214390.621: [CMS-concurrent-reset: 0.183/0.183 secs] [Times: >>> user=0.20 sys=0.02, real=0.18 secs] >>> >>> It seems like a) initial-mark shouldn't take 1.8 seconds, b) >>> if we really do only have 12mb of live data, CMS should have >>> collected a lot more than it did (the next ParNew collection >>> reported ~545MB of old gen in use), and c) 50% heap usage >>> with very little promotion seems very early for the >>> collector to go off. >>> >>> The next CMS cycle is at 434,973 seconds, by which point the >>> young gen collections are taking 3 seconds (user 3.59, sys >>> 1.60, real 3.09). The initial mark takes 4.82 seconds (user >>> 3.82, sys 0.02, real 4.82), and sweeps down to 1.1gb of used >>> old gen. I haven't yet confirmed it, but given the previous >>> heap dumps I'd guess that they will claim 12mb of live >>> objects and 1.1gb of dead objects. The current young gen >>> collections (at 497,601 seconds) are taking ~3.7 seconds >>> (4.33 user, 2.03 sys) Any idea what could be going on here? >>> We're running JDK 1.6_16. >>> >>> -- Eagle Kessler >>> >>> >>> >>> ------------------------------------------------------------------------ >>> >>> _______________________________________________ >>> hotspot-gc-use mailing list >>> hotspot-gc-use at openjdk.java.net >>> >>> >>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>> >>> _______________________________________________ >>> hotspot-gc-use mailing list >>> hotspot-gc-use at openjdk.java.net >>> >>> >>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>> >>> >>> >>> >>> -- >>> Eagle Kessler >>> >> > > > -- > Eagle Kessler > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20100722/a0380a5a/attachment-0001.html From todd at cloudera.com Tue Jul 27 15:08:03 2010 From: todd at cloudera.com (Todd Lipcon) Date: Tue, 27 Jul 2010 15:08:03 -0700 Subject: G1GC Full GCs In-Reply-To: References: <4C3CB3E9.4040305@oracle.com> Message-ID: Hi all, Back from my vacation and took some time yesterday and today to build a fresh JDK 7 with some additional debug printouts from Peter's patch. What I found was a bit different - the rset scanning estimates are low, but I consistently am seeing "Other time" estimates in the >40ms range. Given my pause time goal of 20ms, these estimates are I think excluding most of the regions from collectability. I haven't been able to dig around yet to figure out where the long estimate for "other time" is coming from - in the collections logged it sometimes shows fairly high "Other" but the "Choose CSet" component is very short. I'll try to add some more debug info to the verbose logging and rerun some tests over the next couple of days. At the moment I'm giving the JRockit VM a try to see how its deterministic GC stacks up against G1 and CMS. Thanks -Todd On Tue, Jul 13, 2010 at 5:15 PM, Peter Schuller wrote: > Ramki/Tony, > > > Any chance of setting +PrintHeapAtGC and -XX:+PrintHeapAtGCExtended and > > sending us the log, or part of it (say between two Full GCs)? Be > prepared: > > this will generate piles of output. But it will give us per-region > > information that might shed more light on the cause of the issue.... > thanks, > > So what I have in terms of data is (see footnotes for urls references in > []): > > (a) A patch[1] that prints some additional information about estimated > costs of region eviction, and disables the GC efficiency check that > normally terminates selection of regions. (Note: This is a throw-away > patch for debugging; it's not intended as a suggested change for > inclusion.) > > (b) A log[2] showing the output of a test run I did just now, with > both your flags above and my patch enabled (but without disabling the > efficiency check). It shows fallback to full GC when the actual live > set size is 252 MB, and the maximum heap size is 2 GB (in other words, > ~ 12% liveness). An easy way to find the point of full gc is to search > for the string 'full 1'. > > (c) A file[3] with the effective VM options during the test. > > (d) Instructions for how to run the test to reproduce it (I'll get to > that at the end; it's simplified relative to previously). > > (e) Nature of the test. > > Discussion: > > WIth respect to region information: I originally tried it in response > to your recommendation earlier, but I found I did not see the > information I was after. Perhaps I was just misreading it, but I > mostly just saw either 0% or 100% fullness, and never the actual > liveness estimate as produced by the mark phase. In the log I am > referring to in this E-Mail, you can see that the last printout of > region information just before the live GC fits this pattern; I just > don't see anything that looks like legitimate liveness information > being printed. (I don't have time to dig back into it right now to > double-check what it's printing.) > > If you scroll up from the point of the full gc until you find a bunch > of output starting with "predict_region_elapsed_time_ms" you see some > output resulting from the patch, with pretty extreme values such as: > > predict_region_elapsed_time_ms: 34.378642ms total, 34.021154ms rs scan > (46542 cnum), 0.040069 copy time (20704 bytes), 0.317419 other time > predict_region_elapsed_time_ms: 45.020866ms total, 44.653222ms rs scan > (61087 cnum), 0.050225 copy time (25952 bytes), 0.317419 other time > predict_region_elapsed_time_ms: 16.250033ms total, 15.887065ms rs scan > (21734 cnum), 0.045550 copy time (23536 bytes), 0.317419 other time > predict_region_elapsed_time_ms: 226.942877ms total, 226.559163ms rs > scan (309940 cnum), 0.066296 copy time (34256 bytes), 0.317419 other > time > predict_region_elapsed_time_ms: 542.344828ms total, 541.954750ms rs > scan (741411 cnum), 0.072659 copy time (37544 bytes), 0.317419 other > time > predict_region_elapsed_time_ms: 668.595597ms total, 668.220877ms rs > scan (914147 cnum), 0.057301 copy time (29608 bytes), 0.317419 other > time > > So in the most extreme case in the excerpt above, that's > half a > second of estimate rset scanning time for a single region with 914147 > cards to be scanned. While not all are that extreme, lots and lots of > regions are very expensive and almost only due to rset scanning costs. > > If you scroll down a bit to the first (and ONLY) partial that happened > after the statistics accumulating from the marking phase, we see more > output resulting form the patch. At the end, we see: > > (picked region; 0.345890ms predicted; 1.317244ms remaining; 15kb > marked; 15kb maxlive; 1-1% liveness) > (393380 KB left in heap.) > (picked region; 0.345963ms predicted; 0.971354ms remaining; 15kb > marked; 15kb maxlive; 1-1% liveness) > (393365 KB left in heap.) > (picked region; 0.346036ms predicted; 0.625391ms remaining; 15kb > marked; 15kb maxlive; 1-1% liveness) > (393349 KB left in heap.) > (no more marked regions; next region too expensive (adaptive; > predicted 0.346036ms > remaining 0.279355ms)) > > So in other words, it picked a bunch of regions in order of "lowest > hanging fruit". The *least* low hanging fruit picked still had > liveness at 1%; in other words, there's plenty of further regions that > ideally should be collected because they contain almost no garbage > (ignoring the cost of collecting them). > > In this case, it stopped picking regions because the next region to be > picked, though cheap, was the straw that broke the camel's neck and we > simply exceeded the alloted time for this particular GC. > > However, after this partial completes, it reverts back to doing just > young gc:s. In other words, even though there's *plenty* of regions > with very low liveness, further partials aren't happening. > > By applying this part of the patch: > > - (adaptive_young_list_length() && > + (adaptive_young_list_length() && false && // scodetodo > > I artificially force g1 to not fall back to doing young gc:s for > efficiency reasons. When I run with that change, I don't experience > the slow perpetual growth until fallback to full GC. If I remember > correctly though, the rset scanning cost is in fact high, but I don't > have details saved and I'm afraid I don't have time to re-run those > tests right now and compare numbers. > > Reproducing it: > > I made some changes and the test case should now hopefully be easy to > run assuming you have maven installed. The github project is at: > > http://github.com/scode/httpgctest > > There is a README, but the shortest possible instructions to > re-produce the test that I did: > > git clone git://github.com/scode/httpgctest.git > cd httpgctest.git > git checkout 20100714_1 # grab from appropriate tag, in case I > change master > mvn package > HTTPGCTEST_LOGGC=gc.log ./run.sh > > That should start the http server; then run concurrently: > > while [ 1 ] ; do curl 'http://localhost:9191/dropdata?ratio=0.10' ; > curl 'http://localhost:9191/gendata?amount=25000' ; sleep 0.1 ; done > > And then just wait and observe. > > Nature of the test: > > So the test if run as above will essentially reach a steady state of > equilibrium with about 25000 pieces of data in a clojure immutable > map. The result is that a significant amount of new data is being > allocated, but very little writing to old regions is happening. The > garbage generated is very well spread out over the entire heap because > it goes through all objects and drops 10% (the ratio=0.10) for each > iteration, after which it adds 25000 new items. > > In other words; not a lot of old gen writing, but lots of writes to > the young gen referencing objects in the old gen. > > [1] http://distfiles.scode.org/mlref/g1/g1_region_live_stats_hack.patch > [2] > http://distfiles.scode.org/mlref/gctest/httpgctest-g1-fullgc-20100714/gc-fullfallback.log > [3] > http://distfiles.scode.org/mlref/gctest/httpgctest-g1-fullgc-20100714/vmoptions.txt > > -- > / Peter Schuller > -- Todd Lipcon Software Engineer, Cloudera -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20100727/b3f9fccb/attachment.html From peter.schuller at infidyne.com Fri Jul 30 13:47:31 2010 From: peter.schuller at infidyne.com (Peter Schuller) Date: Fri, 30 Jul 2010 22:47:31 +0200 Subject: G1GC Full GCs In-Reply-To: References: <4C3CB3E9.4040305@oracle.com> Message-ID: > I consistently am seeing "Other time" estimates in the >40ms range. Given my > pause time goal of 20ms, these estimates are I think excluding most of the > regions from collectability. I haven't been able to dig around yet to figure > out where the long estimate for "other time" is coming from - in the > collections logged it sometimes shows fairly high "Other" but the "Choose > CSet" component is very short. (The following is wannabe speculation based on limited understanding of the code, please take it with a grain of salt.) My first thought here is swapping. My reading is that other time is going to be the collection set selection time plus the collection set free time (or at least intended to be). I think (am I wrong?) that this should be really low under normal circumstances since no "bulk" work is done really; in particular the *per-region* cost should be low. If the cost of these operations *per region* ended up being predicted to > 40ms, I wonder if this was not due to swapping? Additionally: As far as I can tell the estimated 'other' cost is based on a history of the cost from previous GC:s and completely independent of the particular region being evaluated. Anyways, I suspect you've already confirmed that the system is not actively swapping at the time of the fallback to full GC. But here is one low-confidence hypothesis (it would be really great to hear from one of the gc devs whether it is even remotely plausible): * At some point in time, there was swapping happening affecting GC operations such that the work done do gather stats and select regions was slow (makes some sense since that should touch lots of distinct regions and you don't need a lot of those memory accesses swapping to accumulate quite a bit of time). * This screwed up the 'other' cost history and thus the prediction, possibly for both young and non-young regions. * I believe young collections would never be entirely prevented due to pause time goals, so here the cost history and thus predictions would always have time to recover and you would not notice any effect looking at the behavior of the system down the line. * Non-young "other" cost was so high that non-young regions were never selected. This in turn meant that additional cost history for the "other" category was never recorded, preventing recovery from the temporary swap storm. * The end result is that no non-young regions are ever collected, and you end up falling back to full GC once the young collections have "leaked" enough garbage. Thoughts, anyone? -- / Peter Schuller From peter.schuller at infidyne.com Fri Jul 30 13:56:05 2010 From: peter.schuller at infidyne.com (Peter Schuller) Date: Fri, 30 Jul 2010 22:56:05 +0200 Subject: G1GC Full GCs In-Reply-To: References: <4C3CB3E9.4040305@oracle.com> Message-ID: > I consistently am seeing "Other time" estimates in the >40ms range. Given my > pause time goal of 20ms, these estimates are I think excluding most of the Btw, to test the hypothesis: When you say "constantly", are the times in fact so consistent that it's either exactly the same or almost, possibly being consistent with my proposed hypothesis that the non-young "other" time is stuck? If the young other time is not stuck I guess one might see some variation (I seem to get < 1 ms on my machine) but not a lot at all in comparison to 40ms. If you're seeing variation like 40-42 all the time, and it never decreasing significantly after it reached the 40ms range, that would be consistent with the hypothesis I believe. -- / Peter Schuller From todd at cloudera.com Fri Jul 30 14:11:52 2010 From: todd at cloudera.com (Todd Lipcon) Date: Fri, 30 Jul 2010 14:11:52 -0700 Subject: G1GC Full GCs In-Reply-To: References: <4C3CB3E9.4040305@oracle.com> Message-ID: On Fri, Jul 30, 2010 at 1:56 PM, Peter Schuller wrote: > > I consistently am seeing "Other time" estimates in the >40ms range. Given > my > > pause time goal of 20ms, these estimates are I think excluding most of > the > > Btw, to test the hypothesis: When you say "constantly", are the times > in fact so consistent that it's either exactly the same or almost, > possibly being consistent with my proposed hypothesis that the > non-young "other" time is stuck? If the young other time is not stuck > I guess one might see some variation (I seem to get < 1 ms on my > machine) but not a lot at all in comparison to 40ms. If you're seeing > variation like 40-42 all the time, and it never decreasing > significantly after it reached the 40ms range, that would be > consistent with the hypothesis I believe. > > Hi Peter, There shouldn't be any swapping during the tests - I've got RAM fairly carefully allocated and I believe swappiness was tuned down on those machines, though I will double check to be certain. I'll try to read through your full email in detail while looking at the source and the G1 paper -- right now it's a bit above my head :) FWIW, my tests on JRockit JRRT's gcprio:deterministic collector didn't go much better - eventually it fell back to a full compaction which lasted 45 seconds or so. HBase must be doing something that's really hard for GCs to deal with - either on the heuristics front or on the allocation pattern front. -Todd -- Todd Lipcon Software Engineer, Cloudera -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20100730/77163736/attachment.html From peter.schuller at infidyne.com Fri Jul 30 14:39:54 2010 From: peter.schuller at infidyne.com (Peter Schuller) Date: Fri, 30 Jul 2010 23:39:54 +0200 Subject: G1GC Full GCs In-Reply-To: References: <4C3CB3E9.4040305@oracle.com> Message-ID: > There shouldn't be any swapping during the tests - I've got RAM fairly > carefully allocated and I believe swappiness was tuned down on those > machines, though I will double check to be certain. Does HBase mmap() significant amounts of memory for I/O purposes? I'm not very familiar with HBase and a quick Googling didn't yield an answer. With extensive mmap():ed I/O, excessive swapping of the application seems to be a common problem even with significant memory margins, sometimes even with swapiness turned down to 0. I've seen it happen under several circumstances, and based on reports on the cassandra-user mailing list during the past couple of months it seems I'm not alone. To be sure I recommend checking actual swapping history (or at least check that the absolute amount of memory swapped out is reasonable over time). > I'll try to read through your full email in detail while looking at the > source and the G1 paper -- right now it's a bit above my head :) Well, just to re-iterate though I have really only begun looking at it myself and my ramblings may be completely off the mark. > FWIW, my tests on JRockit JRRT's gcprio:deterministic collector didn't go > much better - eventually it fell back to a full compaction which lasted 45 > seconds or so. HBase must be doing something that's really hard for GCs to > deal with - either on the heuristics front or on the allocation pattern > front. Interesting. I don't know a lot about JRockit's implementation since not a lot of information seems to be available. I did my LRU micro-benchmark with a ~20-30 GB heap and JRockit. I could definitely press it hard enough to cause a fallback, but that seemed to be directly as a result of high allocation rates simply exceeding the forward progress made by the GC (based on blackbox observation anyway). (The other problem was that the compaction pauses were never able to complete; it seems compaction is O(n) with respect to the number of objects being compacted, and I was unable to make it compact less than 1% per GC (because the command line option only accepted integral percents), and with my object count the 1% was enough to hit the pause time requirement so compaction was aborted every time. LIkely this would have poor results over time as fragmentation becomes significant.). Does HBase go into periodic modes of very high allocation rate, or is it fairly constant over time? I'm thinking that perhaps the concurrent marking is just not triggered early enough and if large bursts of allocations happen when the heap is relatively full, that might be the triggering factor? -- / Peter Schuller From todd at cloudera.com Fri Jul 30 14:44:54 2010 From: todd at cloudera.com (Todd Lipcon) Date: Fri, 30 Jul 2010 14:44:54 -0700 Subject: G1GC Full GCs In-Reply-To: References: <4C3CB3E9.4040305@oracle.com> Message-ID: On Fri, Jul 30, 2010 at 2:39 PM, Peter Schuller wrote: > > There shouldn't be any swapping during the tests - I've got RAM fairly > > carefully allocated and I believe swappiness was tuned down on those > > machines, though I will double check to be certain. > > Does HBase mmap() significant amounts of memory for I/O purposes? I'm > not very familiar with HBase and a quick Googling didn't yield an > answer. > > Nope, we don't currently use memory mapped IO at all - all of our IO is via sockets, actually. > With extensive mmap():ed I/O, excessive swapping of the application > seems to be a common problem even with significant memory margins, > sometimes even with swapiness turned down to 0. I've seen it happen > under several circumstances, and based on reports on the > cassandra-user mailing list during the past couple of months it seems > I'm not alone. > > To be sure I recommend checking actual swapping history (or at least > check that the absolute amount of memory swapped out is reasonable > over time). > > > I'll try to read through your full email in detail while looking at the > > source and the G1 paper -- right now it's a bit above my head :) > > Well, just to re-iterate though I have really only begun looking at it > myself and my ramblings may be completely off the mark. > > > FWIW, my tests on JRockit JRRT's gcprio:deterministic collector didn't go > > much better - eventually it fell back to a full compaction which lasted > 45 > > seconds or so. HBase must be doing something that's really hard for GCs > to > > deal with - either on the heuristics front or on the allocation pattern > > front. > > Interesting. I don't know a lot about JRockit's implementation since > not a lot of information seems to be available. I did my LRU > micro-benchmark with a ~20-30 GB heap and JRockit. I could definitely > press it hard enough to cause a fallback, but that seemed to be > directly as a result of high allocation rates simply exceeding the > forward progress made by the GC (based on blackbox observation > anyway). > > (The other problem was that the compaction pauses were never able to > complete; it seems compaction is O(n) with respect to the number of > objects being compacted, and I was unable to make it compact less than > 1% per GC (because the command line option only accepted integral > percents), and with my object count the 1% was enough to hit the pause > time requirement so compaction was aborted every time. LIkely this > would have poor results over time as fragmentation becomes > significant.). > Yep, I've seen JRRT also "abort compaction" on most compactions. I couldn't quite figure out how to tell it that it was fine to pause more often for compaction, so long as each pause was short. > > Does HBase go into periodic modes of very high allocation rate, or is > it fairly constant over time? I'm thinking that perhaps the concurrent > marking is just not triggered early enough and if large bursts of > allocations happen when the heap is relatively full, that might be the > triggering factor? > Yes, this may be part of it - we have certain operations ("compactions") where we end up churning through memory and then dropping a bunch of references in a fairly short period. But we're still talking about allocation rates in the <40M/sec range, as we're disk limited. I haven't actually tried to correlate the GC pauses with the application logs to see if one causes the other, but I do imagine that this could be confusing the heuristics. -Todd -- Todd Lipcon Software Engineer, Cloudera -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20100730/690d8b0e/attachment.html From peter.schuller at infidyne.com Fri Jul 30 14:58:53 2010 From: peter.schuller at infidyne.com (Peter Schuller) Date: Fri, 30 Jul 2010 23:58:53 +0200 Subject: G1GC Full GCs In-Reply-To: References: <4C3CB3E9.4040305@oracle.com> Message-ID: > Yep, I've seen JRRT also "abort compaction" on most compactions. I couldn't > quite figure out how to tell it that it was fine to pause more often for > compaction, so long as each pause was short. FWIW, I got the impression at the time (but I don't remember why; I think I was half-guessing based on assumptions about what it does and several iterations through the documentation) that it was fundamentally only *able* to do compaction during the stop-the-world pause after a concurrent mark phase. I.e., I don't think you can make it spread the work out (but I can most definitely be wrong). -- / Peter Schuller