From ryebrye at gmail.com Mon Dec 3 06:56:08 2012 From: ryebrye at gmail.com (Ryan Gardner) Date: Mon, 3 Dec 2012 09:56:08 -0500 Subject: G1 to-space-overflow on one server (but not on another identical server under identical load) In-Reply-To: References: <50B8F940.3040001@oracle.com> Message-ID: I spent some more time tuning this application over the weekend, and not being familiar with G1 I kind of fiddled around with some options until I came to what seems like a pretty good solution. (There MAY be a better way of doing it - but this seems to work very well for this application) I set IHOP at 20%, the reserved space at 25%, I played around with different region sizes but settled on just using the default, and I increased Xms and Xmx to 72G and a marvelous thing started to happen... The GC pauses started to be reported as almost all being 'young' - and taking about 250ms to collect an impressive amount of garbage. The GCViewer app says that I'm seeing "GC Performance" of 72816 MB / sec and a throughput of 98.75% - the max pause I'm seeing is a tad higher than I'd like for this application at around 600ms - but those are happening about every 10 minutes or so and the average pause is 170ms every 15 seconds - which is excellent. I've noticed that as I hit the app with more load, those 'worst case' pauses will happen more frequently but still be around 600ms. I'm not an expert on G1 by any means, but it seems to me that what might be happening when I take the app to a 72GB heap is that it uses a bigger region size, and the way SOLR uses its caches, things die after those caches are no longer needed - so they die at about the same time, but "the same time" is about a 2 minute duration - so after 2 minutes those regions can be cleared out since they are all now dead... (It's doing a roughly 200ms pause every 15 or 16 seconds - with a few 600 ms pauses sprinkled in between but usually about every 10 or so minutes apart) Any thoughts on what I could do to make its young collections faster or more spread out? (I'd even be willing to trade 6x150ms pauses for 1x600 ms pause if there's a way to tune it for such tradeoffs) I can provide the GC log for the 72GB heap G1 config if it would make it easier to see what's going on (this is the same app with the same load as the earlier one that had one of the servers experience a 30 second pause under the same load) Ryan On Fri, Nov 30, 2012 at 7:03 PM, Ryan Gardner wrote: > > On Fri, Nov 30, 2012 at 1:21 PM, John Cuthbertson < > john.cuthbertson at oracle.com> wrote: > >> ** >> Hi Ryan, >> >> I would definitely be interested in both logs. I would be interested in >> plotting any differences in the promotion rates or start times and >> durations of marking cycles. Can you also give me details of your >> command-line flags? >> > > > Yes, I can. I've created a zip with both GC logs - solr4-gc.log and > solr5-gc.log - along with a README.txt explaining what's going on with them.
> > It's hosted here: http://www.filedropper.com/solrgclogs (I can put > it somewhere else if that place won't work - it's about 2 megs when zipped > up) > > It's running in tomcat with the following CATALINA_OPTS: > > export CATALINA_OPTS="-Djava.awt.headless=true -Dfile.encoding=UTF-8 > -server \ > -Xms24G -Xmx24G \ > -XX:+UseG1GC -XX:+AggressiveOpts -XX:MaxGCPauseMillis=150 \ > -XX:+ExplicitGCInvokesConcurrent \ > -XX:+PrintGCDetails -XX:+PrintGCDateStamps > -Xloggc:$CATALINA_BASE/logs/gc.log \ > -Dcom.sun.management.jmxremote \ > -Dcom.sun.management.jmxremote.port=39650 \ > -Dcom.sun.management.jmxremote.ssl=false \ > -Dcom.sun.management.jmxremote.authenticate=true \ > > -Dcom.sun.management.jmxremote.password.file=$CATALINA_HOME/conf/jmxremote.password > \ > > -Dcom.sun.management.jmxremote.access.file=$CATALINA_HOME/conf/jmxremote.access > \ > -Dsolr.data.dir=/db/solr > -Dsolr.directoryFactory=solr.NIOFSDirectoryFactory > -Dsolr.replication.enable.master=false > -Dsolr.replication.enable.slave=false" > > >> >> There are a couple of experiments to try: >> >> * Lowering your IHOP (InitiatingHeapOccupancyPercent - default value is >> 45) to start the marking cycle a little bit earlier >> > > The percent when it has been starting seems to be a lot higher (it seems > to move around based on when it starts the concurrent phase) - Is there a > G1 equivalent of the "UseCMSInitiatingOccupancyOnly" flag? > > >> * Increase the value of G1ReservePercent (default value 10). >> >> Also do you know approximately if your application creates large arrays? >> If so then you could be running into issues with "holes" in the heap >> associated with these (in G1 parlance) humongous objects. This can be >> reduced by explicitly setting the value for G1HeapRegionSize. By my guess >> your region size will be either 8m or 16m (based upon the size of your >> heap). Increasing the value can increase the packing density since some >> humongous objects are no longer humongous; reducing it can increase the >> number of humongous objects but reduce the size and number of "holes" since >> the humongous objects may occupy a larger number of regions but the wasted >> space at the end of these objects is smaller. >> > > The application is basically an out-of-the-box version of SOLR 4.0 with > some configuration tweaks. It has a handful of relatively small-sized > caches that live for about 2 minutes before being unused, and while they > are alive some of those caches churn through a large number of objects. I > have plenty of heap overhead - I can run some tests with both a larger and > smaller G1HeapRegionSize setting to see what influence it has on the GC > performance of the app. > > I've also noticed that my young gen collections are rather long - I was > hoping that G1's adaptive sizing would realize that the young gen GCs are > taking more time than my target pause time and size down the young gen area > - but it doesn't seem to be. Is setting Xmn or G1 > > > One other thing I've noticed is that enabling ParallelRefProcessing has > had a big positive impact on the GC performance (the average GC remark GC > ref-proc pause with default settings was 0.46 seconds - with it parallel > the average is 0.05 seconds). I've since enabled that, and I've started > looking at enabling -XX:+UseLargePages and a couple of other options to get > it to work better.
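(For reference, the two switches mentioned in the paragraph above would look roughly like this if appended to the CATALINA_OPTS block shown earlier - a sketch only, not a tested configuration, and -XX:+UseLargePages additionally needs large/huge pages configured at the operating-system level:)

  # Sketch: append the reference-processing and large-pages flags to the options above.
  # -XX:+ParallelRefProcEnabled parallelizes the "GC ref-proc" part of the remark pause.
  export CATALINA_OPTS="$CATALINA_OPTS -XX:+ParallelRefProcEnabled -XX:+UseLargePages"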
> > I've been hesitant about telling G1 an Xmn to use since it seemed in > earlier tests that it will not use any adaptive sizing for the young > generation if it has an Xmn, and I was assuming that ergonomics would find a > better long-term young gen than I did, but the roughly 5GB eden seems to > be yielding young gen pauses of around 700 or 900ms - which is a lot higher > than our pause target for this app. (We have plenty of spare CPU available > on the server and would prefer to push the objects into old gen and let the > concurrent phases deal with it - if we can keep the overall pauses low we > can even increase the heap size if we need to - the servers have 120GB of > RAM in them currently but we need to leave at least 64GB of RAM unallocated > for the OS to be able to cache the index files it uses) > > > >> After looking at your logs I may have a few more suggestions. >> >> BTW we have a couple of CRs (and ideas) to reduce the duration of pauses >> that experience a to-space overflow/exhaustion. I'm not sure they're public >> yet but they are: >> >> 8003235 G1: >> Parallelize displaced header restoration during evacuation failures >> 8003237 G1: Reduce unnecessary (and failing) allocation attempts >> when handling an evacuation failure >> > The first one should address the high "other" time; the second should help >> to reduce the object copy time >> >> Cool :) I can't find any details on those - hopefully they'll work their > way into a version I can use soon :) > > >> Thanks. >> >> JohnC >> >> >> On 11/29/12 19:29, Ryan Gardner wrote: >> >> We're doing some load testing of an instance of a solr search that has a >> pretty frequent replication... the object allocation rates are all over the >> map, and tuning it with CMS was very difficult and we weren't able to meet >> our latency targets - so we decided to try G1. >> >> I'm running some tests now, and I have two identical servers that are >> getting identical traffic. On one server, here's an excerpt from the gc >> log: >> >> Here are the VM arguments that I'm using: >> -XX:+PrintGCDateStamps -XX:+PrintGCDetails >> -XX:+ExplicitGCInvokesConcurrent -XX:MaxGCPauseMillis=150 >> -XX:+AggressiveOpts -XX:+UseG1GC -Xmx24G -Xms24G >> >> Is there some option I can set to stack the deck in favor of avoiding a >> "to-space-overflow"?
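(To make the question above concrete: the experiments discussed in this exchange - starting marking earlier than the default 45% occupancy, keeping a larger evacuation reserve, and pinning the region size explicitly - would be spelled roughly as below. This is a sketch with illustrative values, not a recommendation; the 20/25 figures are simply the ones reported to have worked for this application, and JAVA_OPTS is just a placeholder variable name:)

  # Sketch (illustrative values only): lower the marking trigger, raise the
  # evacuation reserve, and set the region size explicitly instead of letting
  # it be derived from the heap size.
  JAVA_OPTS="-Xms24G -Xmx24G -XX:+UseG1GC -XX:MaxGCPauseMillis=150"
  JAVA_OPTS="$JAVA_OPTS -XX:InitiatingHeapOccupancyPercent=20"
  JAVA_OPTS="$JAVA_OPTS -XX:G1ReservePercent=25"
  JAVA_OPTS="$JAVA_OPTS -XX:G1HeapRegionSize=16m"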
>> >> If anyone is interested, I can provide the full GC logs from both of >> the servers - >> >> 2012-11-30T01:42:21.857+0000: 4571.106: [GC pause (young), 0.08561600 >> secs] >> [Parallel Time: 62.0 ms] >> [GC Worker Start (ms): 4571107.0 4571107.0 4571107.0 4571107.0 >> 4571107.1 4571107.1 4571107.2 4571107.2 4571107.2 4571107.3 >> 4571107.3 4571107.3 4571107.4 4571107.4 4571107.4 4571107.5 >> 4571107.5 4571107.5 >> Avg: 4571107.2, Min: 4571107.0, Max: 4571107.5, Diff: 0.5] >> [Ext Root Scanning (ms): 1.2 1.2 1.1 1.1 1.1 1.0 1.0 1.0 >> 1.0 1.0 0.9 1.2 0.8 0.9 1.0 0.9 0.8 0.8 >> Avg: 1.0, Min: 0.8, Max: 1.2, Diff: 0.5] >> [Update RS (ms): 8.1 8.3 8.1 8.1 8.0 8.1 8.2 7.9 8.1 8.1 >> 8.2 7.9 8.0 8.0 7.8 8.1 8.1 8.9 >> Avg: 8.1, Min: 7.8, Max: 8.9, Diff: 1.1] >> [Processed Buffers : 13 14 15 13 15 11 17 15 11 14 14 11 12 13 >> 13 11 11 11 >> Sum: 234, Avg: 13, Min: 11, Max: 17, Diff: 6] >> [Scan RS (ms): 0.4 0.3 0.5 0.5 0.7 0.6 0.6 0.7 0.6 0.5 >> 0.5 0.5 0.7 0.6 0.6 0.4 0.6 0.1 >> Avg: 0.5, Min: 0.1, Max: 0.7, Diff: 0.6] >> [Object Copy (ms): 47.2 47.1 46.9 47.0 47.0 48.6 46.9 47.0 >> 46.9 46.9 47.0 46.9 47.0 47.7 47.1 47.0 46.9 48.3 >> Avg: 47.2, Min: 46.9, Max: 48.6, Diff: 1.7] >> [Termination (ms): 1.9 1.8 1.9 1.9 1.9 0.3 1.8 1.8 1.9 >> 1.9 1.8 1.8 1.9 1.2 1.7 1.9 1.8 0.0 >> Avg: 1.6, Min: 0.0, Max: 1.9, Diff: 1.9] >> [Termination Attempts : 9 8 14 13 16 16 8 1 17 20 12 14 18 11 18 >> 18 17 18 >> Sum: 248, Avg: 13, Min: 1, Max: 20, Diff: 19] >> [GC Worker End (ms): 4571166.1 4571165.7 4571165.7 4571166.0 >> 4571166.0 4571165.9 4571165.8 4571166.0 4571165.7 4571165.7 >> 4571165.7 4571165.7 4571165.7 4571165.9 4571166.1 4571166.2 >> 4571165.7 4571165.9 >> Avg: 4571165.8, Min: 4571165.7, Max: 4571166.2, Diff: 0.5] >> [GC Worker (ms): 59.1 58.7 58.7 58.9 58.9 58.8 58.6 58.8 >> 58.5 58.4 58.4 58.4 58.3 58.6 58.7 58.7 58.2 58.4 >> Avg: 58.6, Min: 58.2, Max: 59.1, Diff: 0.9] >> [GC Worker Other (ms): 3.3 3.4 3.4 3.4 3.4 3.5 3.6 3.6 3.6 >> 3.7 3.7 3.7 3.7 3.8 3.8 3.8 3.9 3.9 >> Avg: 3.6, Min: 3.3, Max: 3.9, Diff: 0.5] >> [Clear CT: 2.3 ms] >> [Other: 21.2 ms] >> [Choose CSet: 0.2 ms] >> [Ref Proc: 5.8 ms] >> [Ref Enq: 0.1 ms] >> [Free CSet: 13.6 ms] >> [Eden: 6728M(6728M)->0B(4472M) Survivors: 376M->616M Heap: >> 10758M(24576M)->4264M(24576M)] >> [Times: user=1.08 sys=0.00, real=0.08 secs] >> 2012-11-30T01:42:27.493+0000: 4576.742: [GC pause (young), 0.06992900 >> secs] >> [Parallel Time: 52.4 ms] >> [GC Worker Start (ms): 4576742.7 4576742.8 4576742.8 4576742.8 >> 4576742.9 4576742.9 4576743.0 4576743.0 4576743.1 4576743.1 >> 4576743.1 4576743.2 4576743.2 4576743.2 4576743.3 4576743.3 >> 4576743.3 4576743.3 >> Avg: 4576743.1, Min: 4576742.7, Max: 4576743.3, Diff: 0.6] >> [Ext Root Scanning (ms): 1.4 1.3 1.4 1.1 1.0 1.2 1.1 1.0 >> 1.2 0.9 1.1 1.1 1.2 1.1 1.0 1.0 0.8 0.7 >> Avg: 1.1, Min: 0.7, Max: 1.4, Diff: 0.7] >> [Update RS (ms): 0.8 0.7 0.6 0.8 0.7 2.4 0.7 0.8 0.7 1.1 >> 0.7 0.8 0.4 0.4 0.4 0.7 0.8 1.2 >> Avg: 0.8, Min: 0.4, Max: 2.4, Diff: 2.0] >> [Processed Buffers : 2 2 1 7 4 1 6 2 1 1 2 1 4 1 1 1 1 1 >> Sum: 39, Avg: 2, Min: 1, Max: 7, Diff: 6] >> [Scan RS (ms): 0.1 0.4 0.3 0.3 0.3 0.1 0.4 0.2 0.0 0.0 >> 0.1 0.0 0.2 0.3 0.4 0.1 0.2 0.0 >> Avg: 0.2, Min: 0.0, Max: 0.4, Diff: 0.3] >> [Object Copy (ms): 47.2 47.4 47.3 47.3 47.4 45.7 47.4 47.4 >> 47.2 47.4 47.4 47.1 47.5 47.2 47.2 47.1 47.4 47.0 >> Avg: 47.2, Min: 45.7, Max: 47.5, Diff: 1.8] >> [Termination (ms): 0.3 0.1 0.2 0.2 0.3 0.3 0.0 0.0 0.2 >> 0.0 0.1 0.3 0.0 0.2 0.2 0.3 0.1 0.3 >> Avg: 0.2, Min: 0.0, Max: 0.3, Diff: 0.3] >> 
[Termination Attempts : 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 >> Sum: 18, Avg: 1, Min: 1, Max: 1, Diff: 0] >> [GC Worker End (ms): 4576792.6 4576792.6 4576792.6 4576792.7 >> 4576792.8 4576792.7 4576792.8 4576792.9 4576792.8 4576792.7 >> 4576792.6 4576792.6 4576792.8 4576792.6 4576792.6 4576792.6 >> 4576792.6 4576792.6 >> Avg: 4576792.7, Min: 4576792.6, Max: 4576792.9, Diff: 0.3] >> [GC Worker (ms): 49.8 49.8 49.8 49.9 49.9 49.8 49.8 49.8 >> 49.7 49.5 49.5 49.4 49.6 49.3 49.3 49.3 49.3 49.3 >> Avg: 49.6, Min: 49.3, Max: 49.9, Diff: 0.6] >> [GC Worker Other (ms): 2.6 2.6 2.6 2.7 2.7 2.8 2.8 2.9 3.0 >> 3.0 3.0 3.0 3.1 3.1 3.1 3.1 3.2 3.2 >> Avg: 2.9, Min: 2.6, Max: 3.2, Diff: 0.6] >> [Clear CT: 1.6 ms] >> [Other: 15.9 ms] >> [Choose CSet: 0.2 ms] >> [Ref Proc: 3.5 ms] >> [Ref Enq: 0.1 ms] >> [Free CSet: 10.0 ms] >> [Eden: 4472M(4472M)->0B(16704M) Survivors: 616M->288M Heap: >> 8736M(24576M)->4279M(24576M)] >> [Times: user=0.92 sys=0.01, real=0.07 secs] >> 2012-11-30T01:43:06.366+0000: 4615.615: [GC pause (young) (to-space >> overflow), 7.55314600 secs] >> [Parallel Time: 5994.5 ms] >> [GC Worker Start (ms): 4615616.3 4615616.4 4615616.4 4615616.4 >> 4615616.4 4615616.4 4615616.5 4615616.5 4615616.5 4615616.6 >> 4615616.6 4615616.7 4615616.7 4615616.7 4615616.8 4615616.8 >> 4615616.8 4615616.9 >> Avg: 4615616.6, Min: 4615616.3, Max: 4615616.9, Diff: 0.6] >> [Ext Root Scanning (ms): 1.2 1.3 1.3 1.2 1.2 1.2 1.6 1.0 >> 1.1 1.0 0.9 0.9 1.4 1.1 0.9 0.9 0.9 1.0 >> Avg: 1.1, Min: 0.9, Max: 1.6, Diff: 0.8] >> [Update RS (ms): 9.8 10.1 9.5 9.5 10.0 9.6 10.4 9.5 9.8 >> 9.7 10.9 9.7 9.1 9.9 10.2 9.7 9.5 9.6 >> Avg: 9.8, Min: 9.1, Max: 10.9, Diff: 1.8] >> [Processed Buffers : 16 17 14 19 14 13 14 14 15 18 13 13 12 14 >> 13 12 14 14 >> Sum: 259, Avg: 14, Min: 12, Max: 19, Diff: 7] >> [Scan RS (ms): 1.1 0.6 1.3 1.3 0.8 1.0 0.1 1.3 1.0 1.1 >> 0.1 1.2 1.3 0.7 0.5 1.0 1.2 0.9 >> Avg: 0.9, Min: 0.1, Max: 1.3, Diff: 1.2] >> [Object Copy (ms): 5979.6 5978.9 5979.2 5979.1 5979.5 5979.3 >> 5978.7 5979.6 5979.2 5979.5 5979.2 5979.3 5978.9 5979.0 5979.1 >> 5979.2 5979.0 5979.5 >> Avg: 5979.2, Min: 5978.7, Max: 5979.6, Diff: 1.0] >> [Termination (ms): 0.1 0.7 0.2 0.5 0.1 0.3 0.6 0.2 0.3 >> 0.0 0.2 0.2 0.6 0.5 0.4 0.3 0.6 0.0 >> Avg: 0.3, Min: 0.0, Max: 0.7, Diff: 0.7] >> [Termination Attempts : 3 1 1 3 3 5 3 4 4 4 4 1 3 3 3 5 6 4 >> Sum: 60, Avg: 3, Min: 1, Max: 6, Diff: 5] >> [GC Worker End (ms): 4621608.7 4621608.0 4621608.0 4621608.7 >> 4621608.4 4621608.5 4621608.0 4621608.3 4621608.6 4621608.0 >> 4621608.2 4621608.0 4621608.0 4621608.1 4621608.3 4621608.4 >> 4621608.0 4621608.1 >> Avg: 4621608.2, Min: 4621608.0, Max: 4621608.7, Diff: 0.7] >> [GC Worker (ms): 5992.4 5991.6 5991.6 5992.3 5992.0 5992.1 >> 5991.5 5991.8 5992.1 5991.4 5991.5 5991.3 5991.3 5991.3 5991.6 >> 5991.6 5991.2 5991.2 >> Avg: 5991.7, Min: 5991.2, Max: 5992.4, Diff: 1.2] >> [GC Worker Other (ms): 2.9 2.9 3.0 3.0 3.0 3.0 3.0 3.0 3.1 >> 3.1 3.2 3.2 3.2 3.3 3.3 3.3 3.4 3.5 >> Avg: 3.1, Min: 2.9, Max: 3.5, Diff: 0.6] >> [Clear CT: 3.3 ms] >> [Other: 1555.4 ms] >> [Choose CSet: 0.4 ms] >> [Ref Proc: 57.3 ms] >> [Ref Enq: 0.3 ms] >> [Free CSet: 19.4 ms] >> [Eden: 16704M(16704M)->0B(2784M) Survivors: 288M->2128M Heap: >> 21362M(24576M)->22239M(24576M)] >> [Times: user=35.37 sys=0.33, real=7.55 secs] >> 2012-11-30T01:43:16.740+0000: 4625.989: [GC pause (young) (to-space >> overflow) (initial-mark), 23.61614000 secs] >> [Parallel Time: 21781.4 ms] >> [GC Worker Start (ms): 4625989.9 4625990.0 4625990.1 4625990.1 >> 4625990.2 4625990.2 4625990.2 4625990.3 
4625990.3 4625990.3 >> 4625990.4 4625990.4 4625990.4 4625990.4 4625990.5 4625990.5 >> 4625990.5 4625990.5 >> Avg: 4625990.3, Min: 4625989.9, Max: 4625990.5, Diff: 0.6] >> [Ext Root Scanning (ms): 1.7 2.0 1.4 1.2 1.4 1.4 1.4 1.4 >> 1.1 1.2 1.2 1.0 1.1 1.1 1.1 0.9 1.1 1.1 >> Avg: 1.3, Min: 0.9, Max: 2.0, Diff: 1.1] >> [Update RS (ms): 0.0 0.0 0.0 0.0 0.0 0.0 0.4 0.4 0.5 0.6 >> 0.6 1.4 0.5 0.9 0.6 0.7 0.5 0.7 >> Avg: 0.4, Min: 0.0, Max: 1.4, Diff: 1.4] >> [Processed Buffers : 0 0 0 0 0 0 16 4 11 22 30 39 30 26 9 62 21 9 >> Sum: 279, Avg: 15, Min: 0, Max: 62, Diff: 62] >> [Scan RS (ms): 0.1 0.1 0.1 0.1 0.1 0.1 0.8 0.6 0.6 0.5 >> 0.6 0.0 0.8 0.4 0.5 0.7 0.6 0.4 >> Avg: 0.4, Min: 0.0, Max: 0.8, Diff: 0.7] >> [Object Copy (ms): 21776.6 21776.1 21776.5 21776.6 21776.2 >> 21776.4 21775.0 21774.7 21775.3 21775.3 21775.0 21774.8 21775.4 >> 21774.6 21775.2 21774.6 21774.7 21775.0 >> Avg: 21775.5, Min: 21774.6, Max: 21776.6, Diff: 2.1] >> [Termination (ms): 0.0 0.3 0.4 0.4 0.5 0.3 0.7 1.0 0.6 >> 0.4 0.6 0.7 0.2 1.0 0.5 1.0 1.0 0.7 >> Avg: 0.6, Min: 0.0, Max: 1.0, Diff: 1.0] >> [Termination Attempts : 1 1 1 2 2 1 1 1 3 1 1 1 1 1 1 2 2 1 >> Sum: 24, Avg: 1, Min: 1, Max: 3, Diff: 2] >> [GC Worker End (ms): 4647768.4 4647768.5 4647768.5 4647768.4 >> 4647768.4 4647768.7 4647768.8 4647768.5 4647768.5 4647768.4 >> 4647768.7 4647768.5 4647768.4 4647768.4 4647768.6 4647768.6 >> 4647768.4 4647768.4 >> Avg: 4647768.5, Min: 4647768.4, Max: 4647768.8, Diff: 0.3] >> [GC Worker (ms): 21778.5 21778.5 21778.4 21778.3 21778.3 >> 21778.5 21778.5 21778.2 21778.2 21778.1 21778.3 21778.1 21778.0 >> 21778.0 21778.1 21778.1 21777.9 21777.9 >> Avg: 21778.2, Min: 21777.9, Max: 21778.5, Diff: 0.6] >> [GC Worker Other (ms): 2.9 3.0 3.0 3.1 3.1 3.2 3.2 3.2 3.3 >> 3.3 3.3 3.3 3.4 3.4 3.4 3.5 3.5 3.5 >> Avg: 3.3, Min: 2.9, Max: 3.5, Diff: 0.6] >> [Clear CT: 1.5 ms] >> [Other: 1833.3 ms] >> [Choose CSet: 0.0 ms] >> [Ref Proc: 0.8 ms] >> [Ref Enq: 0.0 ms] >> [Free CSet: 3.6 ms] >> [Eden: 1384M(2784M)->0B(4912M) Survivors: 2128M->0B Heap: >> 23623M(24576M)->23623M(24576M)] >> [Times: user=33.81 sys=0.54, real=23.62 secs] >> 2012-11-30T01:43:40.357+0000: 4649.606: [GC >> concurrent-root-region-scan-start] >> 2012-11-30T01:43:40.357+0000: 4649.606: [GC >> concurrent-root-region-scan-end, 0.0000690] >> 2012-11-30T01:43:40.357+0000: 4649.606: [GC concurrent-mark-start] >> 2012-11-30T01:43:40.358+0000: 4649.607: [GC pause (young), 1.66905500 >> secs] >> [Parallel Time: 1667.5 ms] >> [GC Worker Start (ms): 4649607.0 4649607.0 4649607.1 4649607.2 >> 4649607.2 4649607.3 4649607.3 4649607.3 4649607.4 4649607.8 >> 4649607.4 4649607.4 4649607.5 4649607.5 4649607.6 4649607.6 >> 4649607.6 4649607.7 >> Avg: 4649607.4, Min: 4649607.0, Max: 4649607.8, Diff: 0.8] >> [Ext Root Scanning (ms): 1.6 2.1 2.3 1.5 2.1 1.6 2.0 1.9 >> 1.8 1.7 1.9 2.1 1.6 1.7 1.4 1.6 1.2 1.3 >> Avg: 1.7, Min: 1.2, Max: 2.3, Diff: 1.1] >> [SATB Filtering (ms): 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 >> 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 >> Avg: 0.0, Min: 0.0, Max: 0.0, Diff: 0.0] >> [Update RS (ms): 1662.9 1662.8 1662.5 1663.0 1662.6 1663.1 >> 1662.6 1662.7 1662.9 1662.5 1662.6 1662.4 1662.9 1662.8 1663.1 >> 1662.8 1663.2 1663.1 >> Avg: 1662.8, Min: 1662.4, Max: 1663.2, Diff: 0.7] >> [Processed Buffers : 10947 10965 17033 10676 10631 10692 10524 >> 10706 16833 10703 10751 12889 10735 18332 18581 19972 16982 19856 >> Sum: 247808, Avg: 13767, Min: 10524, Max: 19972, Diff: 9448] >> [Scan RS (ms): 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 >> 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 >> Avg: 0.0, 
Min: 0.0, Max: 0.0, Diff: 0.0] >> [Object Copy (ms): 0.5 0.1 0.1 0.3 0.2 0.1 0.1 0.1 0.0 >> 0.1 0.2 0.1 0.1 0.1 0.1 0.1 0.1 0.1 >> Avg: 0.1, Min: 0.0, Max: 0.5, Diff: 0.5] >> [Termination (ms): 0.0 0.0 0.1 0.1 0.0 0.1 0.1 0.1 0.0 >> 0.0 0.1 0.1 0.1 0.1 0.1 0.0 0.1 0.0 >> Avg: 0.0, Min: 0.0, Max: 0.1, Diff: 0.1] >> [Termination Attempts : 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 >> Sum: 18, Avg: 1, Min: 1, Max: 1, Diff: 0] >> [GC Worker End (ms): 4651272.1 4651272.1 4651272.2 4651272.1 >> 4651272.1 4651272.2 4651272.1 4651272.1 4651272.1 4651272.1 >> 4651272.1 4651272.1 4651272.1 4651272.2 4651272.1 4651272.1 >> 4651272.1 4651272.1 >> Avg: 4651272.1, Min: 4651272.1, Max: 4651272.2, Diff: 0.0] >> [GC Worker (ms): 1665.1 1665.1 1665.0 1665.0 1664.9 1664.9 >> 1664.8 1664.8 1664.8 1664.4 1664.8 1664.7 1664.7 1664.6 1664.6 >> 1664.5 1664.5 1664.5 >> Avg: 1664.8, Min: 1664.4, Max: 1665.1, Diff: 0.8] >> [GC Worker Other (ms): 2.4 2.4 2.5 2.6 2.6 2.6 2.7 2.7 2.7 >> 3.2 2.8 2.8 2.8 2.9 3.0 3.0 3.0 3.1 >> Avg: 2.8, Min: 2.4, Max: 3.2, Diff: 0.8] >> [Complete CSet Marking: 0.0 ms] >> [Clear CT: 0.5 ms] >> [Other: 1.1 ms] >> [Choose CSet: 0.0 ms] >> [Ref Proc: 0.8 ms] >> [Ref Enq: 0.0 ms] >> [Free CSet: 0.2 ms] >> [Eden: 0B(4912M)->0B(4912M) Survivors: 0B->0B Heap: >> 23623M(24576M)->23623M(24576M)] >> >> ------------------------------ >> >> _______________________________________________ >> hotspot-gc-use mailing listhotspot-gc-use at openjdk.java.nethttp://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20121203/afd98cf3/attachment-0001.html From azuryy_yu at 126.com Sun Dec 9 01:23:21 2012 From: azuryy_yu at 126.com (=?GBK?B?09q377ar?=) Date: Sun, 9 Dec 2012 17:23:21 +0800 (CST) Subject: A Slow Yong GC question Message-ID: <3b019f4c.4165.13b7ef9d88d.Coremail.azuryy_yu@126.com> hi, I found sometimes yong gc is very slow, it paused more than 10 seconds. GC log: 1383.531: [GC 1383.531: [ParNew: 235968K->26176K(235968K), 14.1752210 secs] 1518208K->1356662K(4168128K), 14.6084490 secs] [Times: user=9.77 sys=0.14, real=14.61 secs] Total time for which application threads were stopped: 14.6241190 seconds jstat : (jstat -gc -h 10 20010 1000) S0C S1C S0U S1U EC EU OC OU PC PU YGC YGCT FGC FGCT GCT 26176.0 26176.0 26176.0 26176.0 209792.0 209792.0 3932160.0 1315008.8 163840.0 22007.9 434 330.795 14 3.943 334.738 26176.0 26176.0 26176.0 26176.0 209792.0 209792.0 3932160.0 1315008.8 163840.0 22007.9 434 330.795 14 3.943 334.738 26176.0 26176.0 26176.0 0.0 209792.0 0.0 3932160.0 1330486.2 163840.0 22007.9 434 344.970 14 3.943 348.913 Eden, S0, S1 are all full, then yong gc paused more than 14 seconds. any body give me some advice, appreciate much. 
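(For context on the capacities above: with the -Xmn256m and -XX:SurvivorRatio=8 flags listed further down, the young generation splits roughly as follows - a back-of-the-envelope check, with the VM rounding the actual sizes slightly:)

  # Young gen layout implied by -Xmn256m -XX:SurvivorRatio=8 (eden:s0:s1 = 8:1:1):
  #   each survivor space : 256m / 10           = 25.6m  ~ 26176K   (S0C / S1C above)
  #   eden                : 256m * 8/10         = 204.8m ~ 209792K  (EC above)
  #   ParNew capacity     : eden + one survivor = 209792K + 26176K = 235968K
  # which matches the 235968K capacity in the ParNew line of the GC log above.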
1) I used: java version "1.7.0_09" Java(TM) SE Runtime Environment (build 1.7.0_09-b05) Java HotSpot(TM) 64-Bit Server VM (build 23.5-b02, mixed mode) 2) My JVM options as follow(yong objects have a short life generally, so -Xmn256m): -Xmx4g -Xms4g -Xmn256m -server -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -XX:+PrintGCApplicationStoppedTime -XX:-HeapDumpOnOutOfMemoryError -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=9 -XX:GCLogFileSize=20m -XX:+DisableExplicitGC -XX:+UseCompressedOops -XX:PermSize=160m -XX:MaxPermSize=160m -XX:GCTimeRatio=19 -XX:SoftRefLRUPolicyMSPerMB=0 -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=6 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSCompactAtFullCollection -XX:CMSFullGCsBeforeCompaction=0 -XX:+CMSClassUnloadingEnabled -XX:CMSMaxAbortablePrecleanTime=300 -XX:+CMSScavengeBeforeRemark -XX:ErrorFile=/usr/local/hase/logs/rs_error_gc.log -Xloggc:/usr/local/hbase/logs/rs_gc.log -XX:HeapDumpPath=/usr/local/hbase/logs/rs_error.hprof -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20121209/5b1c93ff/attachment.html From pks_chennai at yahoo.com Sun Dec 9 08:33:30 2012 From: pks_chennai at yahoo.com (Pravin Sinha) Date: Mon, 10 Dec 2012 00:33:30 +0800 (SGT) Subject: G1GC options. Message-ID: <1355070810.97330.YahooMailNeo@web193301.mail.sg3.yahoo.com> Hi, Our application uses more than 50% of the allocated memory and hence it makes more sense to move to G1GC from current CMS. Could somebody please point out a document on all the options which can be used when G1GC is enabled? Thanks, Pravin -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20121210/0894148f/attachment.html From ysr1729 at gmail.com Wed Dec 12 09:09:38 2012 From: ysr1729 at gmail.com (Srinivas Ramakrishna) Date: Wed, 12 Dec 2012 09:09:38 -0800 Subject: A Slow Yong GC question In-Reply-To: <3b019f4c.4165.13b7ef9d88d.Coremail.azuryy_yu@126.com> References: <3b019f4c.4165.13b7ef9d88d.Coremail.azuryy_yu@126.com> Message-ID: Monitor your paging rate; perhaps there are other processes on the system that are causing the young gen and parts of the old gen to page? You might also try -XX:+AlwaysPreTouch in case this happens when the application footprint is growing and physical pages are allocated for the first time. (That would typically show up as a more-or-less periodic increase in scavenge pauses as we cross into new physical pages in the old gen.) -- ramki On Sun, Dec 9, 2012 at 1:23 AM, ??? wrote: > hi, > I found sometimes yong gc is very slow, it paused more than 10 seconds. 
> GC log: > 1383.531: [GC 1383.531: [ParNew: 235968K->26176K(235968K), 14.1752210 > secs] 1518208K->1356662K(4168128K), 14.6084490 secs] [Times: user=9.77 > sys=0.14, real=14.61 secs] > Total time for which application threads were stopped: 14.6241190 seconds > > jstat : (jstat -gc -h 10 20010 1000) > > S0C S1C S0U S1U EC EU OC > OU PC PU YGC YGCT FGC FGCT GCT > 26176.0 26176.0 26176.0 26176.0 209792.0 209792.0 3932160.0 > 1315008.8 163840.0 22007.9 434 330.795 14 3.943 334.738 > 26176.0 26176.0 26176.0 26176.0 209792.0 209792.0 3932160.0 > 1315008.8 163840.0 22007.9 434 330.795 14 3.943 334.738 > 26176.0 26176.0 26176.0 0.0 209792.0 0.0 3932160.0 > 1330486.2 163840.0 22007.9 434 344.970 14 3.943 348.913 > > Eden, S0, S1 are all full, then yong gc paused more than 14 seconds. any > body give me some advice, appreciate much. > > 1) I used: > java version "1.7.0_09" > Java(TM) SE Runtime Environment (build 1.7.0_09-b05) > Java HotSpot(TM) 64-Bit Server VM (build 23.5-b02, mixed mode) > > 2) My JVM options as follow(yong objects have a short life generally, so > -Xmn256m): > -Xmx4g -Xms4g -Xmn256m -server -verbose:gc -XX:+PrintGCTimeStamps > -XX:+PrintGCDetails -XX:+PrintGCApplicationStoppedTime > -XX:-HeapDumpOnOutOfMemoryError -XX:+UseGCLogFileRotation > -XX:NumberOfGCLogFiles=9 -XX:GCLogFileSize=20m -XX:+DisableExplicitGC > -XX:+UseCompressedOops -XX:PermSize=160m -XX:MaxPermSize=160m > -XX:GCTimeRatio=19 -XX:SoftRefLRUPolicyMSPerMB=0 -XX:SurvivorRatio=8 > -XX:MaxTenuringThreshold=6 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC > -XX:+CMSParallelRemarkEnabled -XX:CMSInitiatingOccupancyFraction=70 > -XX:+UseCMSCompactAtFullCollection -XX:CMSFullGCsBeforeCompaction=0 > -XX:+CMSClassUnloadingEnabled -XX:CMSMaxAbortablePrecleanTime=300 > -XX:+CMSScavengeBeforeRemark > -XX:ErrorFile=/usr/local/hase/logs/rs_error_gc.log > -Xloggc:/usr/local/hbase/logs/rs_gc.log > -XX:HeapDumpPath=/usr/local/hbase/logs/rs_error.hprof > > > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20121212/8fe03247/attachment.html From bernd-2012 at eckenfels.net Wed Dec 12 09:20:19 2012 From: bernd-2012 at eckenfels.net (Bernd Eckenfels) Date: Wed, 12 Dec 2012 18:20:19 +0100 Subject: A Slow Yong GC question In-Reply-To: References: <3b019f4c.4165.13b7ef9d88d.Coremail.azuryy_yu@126.com> Message-ID: Am 12.12.2012, 18:09 Uhr, schrieb Srinivas Ramakrishna : > Monitor your paging rate; perhaps there are other processes on the system > that are causing the young gen and parts of the old gen to page? I wonder how many threads and CPUs are used here. The real time is double the user time, so this points to possible swapping or CPU hogging. On the other hand the user=9s is huge by itself for a 256mb heap. I guess it would be helpfull to have a look at the collected verbose GC log, how other collections typically behave. Is there off-heap allocations involved? Gruss Bernd -- http://bernd.eckenfels.net From jesper.wilhelmsson at oracle.com Thu Dec 13 03:58:32 2012 From: jesper.wilhelmsson at oracle.com (Jesper Wilhelmsson) Date: Thu, 13 Dec 2012 12:58:32 +0100 Subject: G1GC options. 
In-Reply-To: <1355070810.97330.YahooMailNeo@web193301.mail.sg3.yahoo.com> References: <1355070810.97330.YahooMailNeo@web193301.mail.sg3.yahoo.com> Message-ID: <50C9C2E8.5040703@oracle.com> Hi, You can find a list of supported G1 options here: http://www.oracle.com/technetwork/java/javase/tech/vmoptions-jsp-140102.html#G1Options Let us know if you encounter any issues with G1. /Jesper On 9/12/12 5:33 PM, Pravin Sinha wrote: > Hi, > > Our application uses more than 50% of the allocated memory and hence > it makes more sense to move to G1GC from current CMS. Could somebody > please point out a document on all the options which can be used when > G1GC is enabled? > > Thanks, > Pravin > > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20121213/74e7010e/attachment.html From pks_chennai at yahoo.com Sun Dec 16 23:12:28 2012 From: pks_chennai at yahoo.com (Pravin Sinha) Date: Mon, 17 Dec 2012 15:12:28 +0800 (SGT) Subject: G1GC Old generation taking too much time. Message-ID: <1355728348.31358.YahooMailNeo@web193301.mail.sg3.yahoo.com> Hi, We are facing a weird problem when G1GC is enabled. The Old Gen collections took about 50 min and the process was completely unresponsive. This is what we have in our deployment: * Max heap consumption was around 12.5G * we had set for the JVM: -d64 -Xms15360m -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=9080 -Dcom.sun.management.jmxremote.local.only=false -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:GCPauseIntervalMillis=400 -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -Xloggc:/loc_to_log. We are running it on a 32 core box and the number of ParallelGC threads is 23 (I think using the 5/8 rule?). Could you please suggest something to try to avoid this long GC time? Regards, Pravin -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20121217/2c32eb59/attachment.html From taigetco at gmail.com Mon Dec 17 06:44:41 2012 From: taigetco at gmail.com (Matthew Young) Date: Mon, 17 Dec 2012 22:44:41 +0800 Subject: how to CMS gc logs Message-ID: Hi all, Currently I am working on a promotion failure with CMS, using the command line flags -XX:+CMSDumpAtPromotionFailure -XX:+CMSPrintChunksInDump -XX:+CMSPrintObjectsInDump; below is a gc log segment, (0: promotion failure size = 10) (1: promotion failure size = 14) //question1, is the size unit bytes or blocks? 0x039c0000: free block of size 16384: Next: 0x00000000 Prev: 0x00000000 [can't coalesce] //question2, what does this mean? Should I look into the JVM source code to find the answer? If not, where can I get references? size bfrsurp surplus desired prvSwep bfrSwep count cBirths cDeaths sBirths sDeaths //question3, can anyone explain these column headings? thanks -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20121217/5237e70b/attachment.html From ysr1729 at gmail.com Mon Dec 17 10:21:58 2012 From: ysr1729 at gmail.com (Srinivas Ramakrishna) Date: Mon, 17 Dec 2012 10:21:58 -0800 Subject: how to CMS gc logs In-Reply-To: References: Message-ID: Hi Matthew -- On Mon, Dec 17, 2012 at 6:44 AM, Matthew Young wrote: > Hi all, > > Currently I am working on a promotion failure with CMS, using the > command line flags -XX:+CMSDumpAtPromotionFailure -XX:+CMSPrintChunksInDump > -XX:+CMSPrintObjectsInDump; below is a gc log segment, > > (0: promotion failure size = 10) (1: promotion failure size = 14) > //question1, is the size unit bytes or blocks? > I believe it's "heap words" (4-byte units in the case of the 32-bit JVM or the compressed-refs 64-bit JVM, and 8 bytes in the case of the regular 64-bit JVM). Thus, 10 above means 40 bytes and 14 means 56 bytes (given that it appears you are using a 32-bit JVM below). > 0x039c0000: free block of size 16384: > Next: 0x00000000 Prev: 0x00000000 [can't coalesce] > //question2, what does this mean? Should I look into the JVM source code to > find the answer? If not, where can I get references? > CMS GC uses a few blocks in the old generation to keep some book-keeping data structures around when objects are promoted during a scavenge. This is such a block. You can consider it CMS overhead. I don't know if this is explicitly called out in any of the verbose heap options, but perhaps should be. Please look at "PromotionInfo" and "SpoolBlock" for more information and details. > size bfrsurp surplus desired > prvSwep bfrSwep count cBirths > cDeaths sBirths sDeaths > //question3, can anyone explain these column headings? > This is a recounting of the census of historical statistics of various kinds kept by CMS, and used to forecast demand for objects of various sizes, and drives free block list population and free block coalescing decisions. More information on these columns can be found in the header file allocationStats.hpp -- ramki > thanks > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20121217/182f9ffc/attachment.html From monica.beckwith at oracle.com Mon Dec 17 10:46:54 2012 From: monica.beckwith at oracle.com (Monica Beckwith) Date: Mon, 17 Dec 2012 12:46:54 -0600 Subject: G1GC Old generation taking too much time. In-Reply-To: <1355728348.31358.YahooMailNeo@web193301.mail.sg3.yahoo.com> References: <1355728348.31358.YahooMailNeo@web193301.mail.sg3.yahoo.com> Message-ID: <50CF689E.7040809@oracle.com> Hi Pravin, I work on G1 GC performance, and would like to look at your log. Please feel free to forward it to me at the above 'from' email address. Also, please send along your java version string. Thanks! Monica Beckwith On 12/17/2012 1:12 AM, Pravin Sinha wrote: > Hi, > > We are facing a weird problem when G1GC is enabled. The Old Gen > collections took about 50 min and the process was completely unresponsive.
> This is what we have in our deployment: > * Max heap consumption was around 12.5G > * we had set for the JVM: > -d64 -Xms15360m -Dcom.sun.management.jmxremote > -Dcom.sun.management.jmxremote.port=9080 > -Dcom.sun.management.jmxremote.local.only=false > -Dcom.sun.management.jmxremote.authenticate=false > -Dcom.sun.management.jmxremote.ssl=false -XX:+UseG1GC > -XX:MaxGCPauseMillis=200 -XX:GCPauseIntervalMillis=400 > -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -Xloggc:/loc_to_log. > > We are running it on a 32 core box and the number of ParallelGC threads > is 23 (I think using the 5/8 rule?). Could you please suggest something > to try to avoid this long GC time? > > Regards, > Pravin > > > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use -- Monica Beckwith | Java Performance Engineer | Oracle, Texas | VOIP: +1 512 401 1274 -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20121217/d8618841/attachment-0001.html
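(A note on the "5/8 rule" Pravin mentions above: assuming the default was not overridden, the HotSpot heuristic of this era sizes ParallelGCThreads as all CPUs up to 8, plus 5/8 of the CPUs beyond 8, which is where 23 comes from on a 32-core box:)

  # Default ParallelGCThreads heuristic (when ncpus > 8):
  #   ParallelGCThreads = 8 + (ncpus - 8) * 5/8
  #   with 32 cores:      8 + (32 - 8) * 5/8 = 8 + 15 = 23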