From daubman at gmail.com Mon Dec 2 14:14:36 2013 From: daubman at gmail.com (Aaron Daubman) Date: Mon, 2 Dec 2013 17:14:36 -0500 Subject: Troubleshooting a ~40-second minor collection In-Reply-To: References: Message-ID: Sorry - the 4.5M attached .tgz for GC logs may have blocked this message. I am re-sending without the logs - if anybody is interested in reviewing them please let me know and I can send them directly. Also, I should have mentioned - this is 64bit JDK 7u25 running on Centos 6.4. Thanks again, Aaron On Mon, Dec 2, 2013 at 5:11 PM, Aaron Daubman wrote: > Hi again GC geniuses, > > I have been regularly seeing this long (10s of seconds) minor collection > time now, and this time I have plenty of logs... > > Last time, the only response I received was that it was likely related to > some part of memory that GC tried to inspect having been swapped out. > That seemed to make sense, and so I set about configuring HugePages on > these linux servers to allow JVM heap to be locked as active mem (we have > plenty of ram on these 64G servers). > > That seems to be working correctly: > grep Huge /proc/meminfo > AnonHugePages: 0 kB > HugePages_Total: 10240 > HugePages_Free: 2464 > HugePages_Rsvd: 1293 > HugePages_Surp: 0 > Hugepagesize: 2048 kB > > These are the JVM flags I am using, LargePages to UseSHM is what has been > added recently to try and prevent the heap from being swapped out: > -XX:+UseG1GC > -XX:MaxGCPauseMillis=50 > -Xms16G > -Xmx16G > *-XX:+UseLargePages* > *-XX:LargePageSizeInBytes=2m* > *-XX:+UseSHM* > -XX:+AggressiveOpts > -XX:+UseFastAccessorMethods > -XX:+UseStringCache > -XX:+OptimizeStringConcat > -XX:+PrintGCDateStamps > -XX:+PrintGCTimeStamps > -XX:+PrintGCDetails > -XX:+PrintAdaptiveSizePolicy > > I am attaching a gzipped tar file of GC logs from four of the servers > seeing these tens-of-seconds minor pauses. Any help in explaining what may > be causing them would be much appreciated. > > The situation in which this occurs is when we copy at high rate data to > the server using netcat, tar and gzip (I guess it is a known issue that tar > will pollute pagecache*, but using largepages for the JVM should protect me > from swapping due to cache pollution, no?) > > Thanks as always! > Aaron > > * > http://www.mysqlperformanceblog.com/2010/04/02/fadvise-may-be-not-what-you-expect/ > > > On Mon, Oct 14, 2013 at 9:36 PM, Aaron Daubman wrote: > >> Hi All, >> >> I have unfortunately lost my GC log file (server restarted shortly after >> the event) but have AppDynamics stats. >> >> I am running Solr 3.6.1 in a jetty 9 container under jdk 1.7u25. >> >> Max heap is 16G. <7G were used at the time of the event. Typical heap >> usage is ~28%. >> >> There were around 10 minor collection events (fairly typical) during the >> minute the event occurred. The event was an almost 40-second max minor >> collection time. >> >> Around that time JVM Heap utilization was only between 27%-31% utilized - >> I cannot remember the last time we had a major collection, and I have also >> never seen such a long minor collection time. There is nothing I can see >> about traffic to the JVM that appeared abnormal. >> >> I did see some long external (JDBC) query times about this time as well, >> but thought they were more likely a symptom of the minor collection pause, >> rather than a cause. >> >> AppDynamics monitors Code Cache, G1 Eden Space, G1 Old Gen, G1 Perm Gen, >> and G1 Survivor - the max of which at the time was Perm Gen at only 60%. 
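(Coming back to the large-pages setup near the top of this message, a quick arithmetic sanity check on the pool size - these numbers are only derived from the figures shown above, not from any log: a 16G heap at 2048 kB per huge page needs 16384 MB / 2 MB = 8192 pages, so a HugePages_Total of 10240 (about 20 GB) covers the heap with roughly 2048 pages (about 4 GB) to spare for anything else that requests large pages.)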
>> >> Is there anything else I can do (without the GC log file) to try and >> determine the cause of the unexpected 40s minor collection pause time? >> >> Thanks, >> Aaron >> > >
-------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20131202/d5469c21/attachment.html

From bernd-2013 at eckenfels.net Mon Dec 2 14:21:12 2013 From: bernd-2013 at eckenfels.net (Bernd Eckenfels) Date: Mon, 02 Dec 2013 23:21:12 +0100 Subject: Troubleshooting a ~40-second minor collection In-Reply-To: References: Message-ID:

Hello Aaron,

another rough guess is that when you "copy high rate" you have a lot of system interrupt time and context switches (especially when the network or disk drivers are misbehaving).

I wonder if this can really slow down the GC so much, but it would be the next thing I would investigate.

Is this a NUMA machine? Is the JVM process spread over multiple nodes?

Gruss Bernd

Am 02.12.2013, 23:14 Uhr, schrieb Aaron Daubman :

From daubman at gmail.com Mon Dec 2 14:26:43 2013 From: daubman at gmail.com (Aaron Daubman) Date: Mon, 2 Dec 2013 17:26:43 -0500 Subject: Troubleshooting a ~40-second minor collection In-Reply-To: References: Message-ID:

Hi Bernd,

Thanks for the info. This is a NUMA machine; however, as part of setting up hugepages, I have disabled NUMA (numa=off) in grub.conf (and have also disabled transparent huge page support).

The JVM process is the only significant process (aside from the high-rate data copy tar/nc/pigz) running on this 32-core, 2-node 64G RAM box. The tar process is limited to using one CPU (close to 100%) but leaving 31 others free for the JVM - load average on the box is fairly low.

The JVM process is spread fairly evenly over the nodes - watching htop I can see CPU jumping around among the 32 cores.

Do you know what I might look at to see network/disk driver misbehavior?

Thanks! Aaron

On Mon, Dec 2, 2013 at 5:21 PM, Bernd Eckenfels wrote:

> Hello Aaron,
>
> another rough guess is that when you "copy high rate" you have a lot
> of system interrupt time and context switches (especially when the network
> or disk drivers are misbehaving).
>
> I wonder if this can really slow down the GC so much, but it would be
> the next thing I would investigate.
>
> Is this a NUMA machine? Is the JVM process spread over multiple nodes?
>
> Gruss
> Bernd
>
>
> Am 02.12.2013, 23:14 Uhr, schrieb Aaron Daubman :
>
> _______________________________________________
> hotspot-gc-use mailing list
> hotspot-gc-use at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
>
-------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20131202/d4cdb40d/attachment.html

From bernd-2013 at eckenfels.net Mon Dec 2 14:53:29 2013 From: bernd-2013 at eckenfels.net (Bernd Eckenfels) Date: Mon, 02 Dec 2013 23:53:29 +0100 Subject: Troubleshooting a ~40-second minor collection In-Reply-To: References: Message-ID:

Hello,

Hmm, switching NUMA off at Linux boot means that there is no NUMA optimization done by the kernel; the hardware is still NUMA (so the system most likely behaves worse, as there is no optimization for local memory regions). (Maybe it would help to bind the java process to only one node (that node's CPUs and memory).) This is especially good if some other stuff (other JVM or DB) can be bound to the other node.
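For example, something along these lines - untested, and the node number is only an illustration, so check the real layout first:

numactl --hardware                      # shows which CPUs and how much memory belong to each node
numactl --cpunodebind=0 --membind=0 \
  java -Xms16G -Xmx16G <your usual flags>

That keeps both the JVM threads and its heap allocations on node 0, so the GC workers only ever touch node-local memory, and it leaves node 1 free for the copy job or anything else.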
As for your question on how to go with it, I would check for a large number of hardware interrupts (hi,%irq) or context switches (compared to idle times and %soft interrupts), not so sure if there is an easy way to see if interrupt optimizations are active/needed by the drivers. (mpstat -P ALL, vmstat, /proc/interrupts). I havent been into hardware lately, but I would say >2k cs/s is something to observe closer. For network cards for example ethtool can be used to tune it (see for example http://serverfault.com/questions/241421/napi-vs-adaptive-interrupts). But I guess it is only a problem when you have mulitple GE interfaces (or faster). Gruss Bernd Am 02.12.2013, 23:26 Uhr, schrieb Aaron Daubman : > Hi Bernd, > > Thanks for the info. > This is a numa machine, however, as part of setting up hugepages, I have > disabled numa (numa=off) in grub.conf (and have also disabled transparent > huge page support). > > The JVM process is the only significant process (aside from the high-rate > data copy tar/nc/pigz) running on this 32-core, 2-node 64G RAM box. The > tar > process is limited to using one CPU (close to 100%) but leaving 31 others > free for the JVM - load average on the box is fairly low. > > The JVM process is spread fairly evenly over the nodes - watching htop I > can see CPU jumping around among the 32 cores. > > Do you know what I might look at to see network/disk driver missbehavior? > > Thanks! > Aaron > > > On Mon, Dec 2, 2013 at 5:21 PM, Bernd Eckenfels > wrote: > >> Hello Aaron, >> >> another rough guess is, that when you "copy high rate" that you have a >> lot >> of system interrupt time and conext switches (especially when the >> network >> or disk drivers are missbehaving). >> >> I wonder if if this can really slow down the GC so much, but it would be >> the next thing I would investigate. >> >> Is this a NUMA machine? Is the JVM process spread over multiple nodes? >> >> Gruss >> Bernd >> >> >> Am 02.12.2013, 23:14 Uhr, schrieb Aaron Daubman : >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> -- http://bernd.eckenfels.net From Eric.Caspole at amd.com Mon Dec 2 15:19:32 2013 From: Eric.Caspole at amd.com (Caspole, Eric) Date: Mon, 2 Dec 2013 23:19:32 +0000 Subject: Troubleshooting a ~40-second minor collection In-Reply-To: References: Message-ID: If you know the problem is happening and you are logged into the machine you can try doing "perf top" or "perf record" during the problem time and see the kernel profile of what is happening. I think you would also want to do it when the problem is not happening to have something to compare to. Also dstat is handy in these situations to see the cpu, i/o, interrupts etc as time goes by. http://dag.wiee.rs/home-made/dstat/ Regards, Eric -----Original Message----- From: hotspot-gc-use-bounces at openjdk.java.net [mailto:hotspot-gc-use-bounces at openjdk.java.net] On Behalf Of Bernd Eckenfels Sent: Monday, December 02, 2013 5:53 PM To: hotspot-gc-use at openjdk.java.net Subject: Re: Troubleshooting a ~40-second minor collection Hello, Hmm, switching numa off in linux boot means that there is no numa optimization done by the kernel, the hardware is still numa (so the system most likely behaves worse as there is no optimization for local memory regions). (Maybe it would help to bind the java process to only one node (odes CPUs and memory)). 
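(A sketch of the kind of profiling session Eric describes above - the recording length and file name are only examples:

perf top                                         # live view of where kernel + user CPU time is going
perf record -a -g -o copy-window.data sleep 30   # sample the whole system for 30s during the copy
perf report -i copy-window.data                  # browse the profile afterwards
dstat 1                                          # cpu, disk, net, paging and interrupts once per second

Taking one capture while the tar/netcat transfer is running and one during quiet steady state gives the comparison he suggests.)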
This is especially good if some other stuff (other JVM or DB) can be bound to the other node. As for your question on how to go with it, I would check for a large number of hardware interrupts (hi,%irq) or context switches (compared to idle times and %soft interrupts), not so sure if there is an easy way to see if interrupt optimizations are active/needed by the drivers. (mpstat -P ALL, vmstat, /proc/interrupts). I havent been into hardware lately, but I would say >2k cs/s is something to observe closer. For network cards for example ethtool can be used to tune it (see for example http://serverfault.com/questions/241421/napi-vs-adaptive-interrupts). But I guess it is only a problem when you have mulitple GE interfaces (or faster). Gruss Bernd Am 02.12.2013, 23:26 Uhr, schrieb Aaron Daubman : > Hi Bernd, > > Thanks for the info. > This is a numa machine, however, as part of setting up hugepages, I > have disabled numa (numa=off) in grub.conf (and have also disabled > transparent huge page support). > > The JVM process is the only significant process (aside from the > high-rate data copy tar/nc/pigz) running on this 32-core, 2-node 64G > RAM box. The tar process is limited to using one CPU (close to 100%) > but leaving 31 others free for the JVM - load average on the box is > fairly low. > > The JVM process is spread fairly evenly over the nodes - watching htop > I can see CPU jumping around among the 32 cores. > > Do you know what I might look at to see network/disk driver missbehavior? > > Thanks! > Aaron > > > On Mon, Dec 2, 2013 at 5:21 PM, Bernd Eckenfels > wrote: > >> Hello Aaron, >> >> another rough guess is, that when you "copy high rate" that you have >> a lot of system interrupt time and conext switches (especially when >> the network or disk drivers are missbehaving). >> >> I wonder if if this can really slow down the GC so much, but it would >> be the next thing I would investigate. >> >> Is this a NUMA machine? Is the JVM process spread over multiple nodes? >> >> Gruss >> Bernd >> >> >> Am 02.12.2013, 23:14 Uhr, schrieb Aaron Daubman : >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> -- http://bernd.eckenfels.net _______________________________________________ hotspot-gc-use mailing list hotspot-gc-use at openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use From daubman at gmail.com Mon Dec 2 20:32:38 2013 From: daubman at gmail.com (Aaron Daubman) Date: Mon, 2 Dec 2013 23:32:38 -0500 Subject: Troubleshooting a ~40-second minor collection In-Reply-To: References: Message-ID: > (Maybe it would help to bind the java process to only one node > (odes CPUs and memory)). This is especially good if some other stuff > (other JVM or DB) can be bound to the other node. > Unfortunately only this one large JVM (Jetty / Solr) runs on this system, so using numactl to bind to one node would waste half the compute resources =( Also, in steady state (23 hours a day) this appears to work fine - it's only during this network copy of the solr index out to the nodes that we see any GC issue. > As for your question on how to go with it, I would check for a large > number of hardware interrupts (hi,%irq) or context switches (compared to > idle times and %soft interrupts), not so sure if there is an easy way to > see if interrupt optimizations are active/needed by the drivers. (mpstat > -P ALL, vmstat, /proc/interrupts). 
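(For reference, the sampling suggested in the quoted text could look something like this - the one-second interval is just an example:

mpstat -P ALL 1                 # per-CPU %irq, %soft and %idle, once per second
vmstat 1                        # the "in" and "cs" columns are interrupts and context switches per second
watch -d cat /proc/interrupts   # highlights which devices the interrupts come from

again comparing the copy window against steady state.)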
I havent been into hardware lately, but > I would say >2k cs/s is something to observe closer. > Hmm... we have cacti monitoring these hosts, and I do see that we jump from a steady state of ~2k CS (I think per minute, but that could be per second) up to 10k CS during the data transfer. Could you explain (or point me to docs) why high context switching like this would lead to long minor collections? Also, I used GCViewer to open some of the logs, and if it is actually parsing things correctly, my max GC pause time is actually 1-3s, so I must have been seen accumulated time (measured in ms/min) as up to 40s/min, but I guess this is not from a single pause/gc event. Would it help to narrow the GC logs I have to around the time of the long minor GC events and include them? > For network cards for example ethtool can be used to tune it (see for > example > http://serverfault.com/questions/241421/napi-vs-adaptive-interrupts). But > I guess it is only a problem when you have mulitple GE interfaces (or > faster). > Hmm... all of these servers have dual-bonded GE interfaces, I wonder if something is up there. (they actually have bonded internal and bonded external as two bonded pairs, this traffic would be going over the bonded internal interface). Thanks again, Aaron -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20131202/14333aaa/attachment.html From daubman at gmail.com Mon Dec 2 21:12:55 2013 From: daubman at gmail.com (Aaron Daubman) Date: Tue, 3 Dec 2013 00:12:55 -0500 Subject: Troubleshooting a ~40-second minor collection In-Reply-To: References: Message-ID: > > Would it help to narrow the GC logs I have to around the time of the long > minor GC events and include them? > I've attached (trying one more time with higher bz2 compression to get through the 100K limit) the log for the 12minute period where the two largest pauses occurred and Context Switching became high on one of the servers. Over this 12 minutes GCViewer reports there were a total of 243 pauses causing 10.2s total pause time. Max pause was 3.45s and the max pause interval was 5.5s The log starts off close to the largest 3.45s collection time: 2013-12-02T16:28:37.838-0500: 2060.900: [GC pause (young) 2060.900: [G1Ergonomics (CSet Construction) start choosing CSet, predicted base time: 5.92 ms, remaining time: 44.08 ms, target pause time: 50.00 ms] 2064.332: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 1238 regions, survivors: 8 regions, predicted young region time: 13.50 ms] 2064.332: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 1238 regions, survivors: 8 regions, old: 0 regions, predicted pause time: 19.42 ms, target pause time: 50.00 ms] , 3.45061000 secs] It finishes near the second largest time: 2013-12-02T16:40:43.116-0500: 2786.177: [GC pause (young) 2786.177: [G1Ergonomics (CSet Construction) start choosing CSet, predicted base time: 5.88 ms, remaining time: 44.12 ms, target pause time: 50.00 ms] 2786.177: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 1196 regions, survivors: 4 regions, predicted young region time: 21.35 ms] 2788.769: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 1196 regions, survivors: 4 regions, old: 0 regions, predicted pause time: 27.23 ms, target pause time: 50.00 ms] , 2.60856700 secs] This was all from minor (young) collections. Thanks again, Aaron -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20131203/65bbd192/attachment-0001.html -------------- next part -------------- A non-text attachment was scrubbed... Name: GCLogFile_solr02.log.bz2 Type: application/x-bzip2 Size: 57386 bytes Desc: not available Url : http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20131203/65bbd192/GCLogFile_solr02.log-0001.bz2 From bengt.rutisson at oracle.com Mon Dec 2 23:46:38 2013 From: bengt.rutisson at oracle.com (Bengt Rutisson) Date: Tue, 03 Dec 2013 08:46:38 +0100 Subject: Troubleshooting a ~40-second minor collection In-Reply-To: References: Message-ID: <529D8C5E.1090401@oracle.com> Hi Aaron, I just had a quick look at the GC log and one thing that sticks out is the user vs. real time data for the long GCs that you mention. On 2013-12-03 06:12, Aaron Daubman wrote: > > Would it help to narrow the GC logs I have to around the time of > the long minor GC events and include them? > > > I've attached (trying one more time with higher bz2 compression to get > through the 100K limit) the log for the 12minute period where the two > largest pauses occurred and Context Switching became high on one of > the servers. > Over this 12 minutes GCViewer reports there were a total of 243 pauses > causing 10.2s total pause time. > Max pause was 3.45s and the max pause interval was 5.5s > > The log starts off close to the largest 3.45s collection time: > 2013-12-02T16:28:37.838-0500: 2060.900: [GC pause (young) 2060.900: > [G1Ergonomics (CSet Construction) start choosing CSet, predicted base > time: 5.92 ms, remaining time: 44.08 ms, target pause time: 50.00 ms] > 2064.332: [G1Ergonomics (CSet Construction) add young regions to > CSet, eden: 1238 regions, survivors: 8 regions, predicted young region > time: 13.50 ms] > 2064.332: [G1Ergonomics (CSet Construction) finish choosing CSet, > eden: 1238 regions, survivors: 8 regions, old: 0 regions, predicted > pause time: 19.42 ms, target pause time: 50.00 ms] > , 3.45061000 secs] This GC has has this user/real time info: [Times: user=0.17 sys=0.00, real=3.45 secs] That means that during 3.45 seconds all of the VM threads only got scheduled to actually run on the CPUs for 0.17 seconds. So, it seems like the OS has scheduled the VM threads out for most of that period. The reason for that is most likely that the system is running too many other applications at the same time. > > It finishes near the second largest time: > 2013-12-02T16:40:43.116-0500: 2786.177: [GC pause (young) 2786.177: > [G1Ergonomics (CSet Construction) start choosing CSet, predicted base > time: 5.88 ms, remaining time: 44.12 ms, target pause time: 50.00 ms] > 2786.177: [G1Ergonomics (CSet Construction) add young regions to > CSet, eden: 1196 regions, survivors: 4 regions, predicted young region > time: 21.35 ms] > 2788.769: [G1Ergonomics (CSet Construction) finish choosing CSet, > eden: 1196 regions, survivors: 4 regions, old: 0 regions, predicted > pause time: 27.23 ms, target pause time: 50.00 ms] > , 2.60856700 secs] The user/real time info for this GC is: [Times: user=0.14 sys=0.00, real=2.61 secs] So, the issue seem to be the same. For your normal young collections the user/real time looks more like this: [Times: user=0.12 sys=0.00, real=0.01 secs] That makes more sense. You are using 23 GC threads, so if they get scheduled to run in parallel the user time is higher than the real time. Hths, Bengt > > This was all from minor (young) collections. 
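(To put numbers on that: user=0.17s spread over the 23 parallel GC workers is only about 0.17/23 = 7 ms of CPU time each, against 3.45 s of wall clock - the GC threads existed but hardly got to run. The 23 is simply the default parallel GC thread count on a 32-core box, assuming the usual HotSpot heuristic of 8 plus 5/8 of the cores beyond 8, i.e. 8 + 24*5/8 = 23.)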
> > Thanks again, > Aaron > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20131203/6a5dc78b/attachment.html From daubman at gmail.com Tue Dec 3 04:07:08 2013 From: daubman at gmail.com (Aaron Daubman) Date: Tue, 3 Dec 2013 07:07:08 -0500 Subject: Troubleshooting a ~40-second minor collection In-Reply-To: <529D8C5E.1090401@oracle.com> References: <529D8C5E.1090401@oracle.com> Message-ID: Hi Bengt, Thanks for the pointers - it feels like there may be some hints here as to what to look at next! > This GC has has this user/real time info: > > [Times: user=0.17 sys=0.00, real=3.45 secs] > > That means that during 3.45 seconds all of the VM threads only got > scheduled to actually run on the CPUs for 0.17 seconds. So, it seems like > the OS has scheduled the VM threads out for most of that period. The reason > for that is most likely that the system is running too many other > applications at the same time. > When I see these longer pause times its during the high-rate (~100M/s) file copy and un-tar onto this system. This uses at most 2 out of the 32 cores on this server. So I am still very confused as to what would prevent the VM threads from being scheduled... anybody have any ideas? (The server runs at a load average of ~2 on a 32 core system) Would high context switching or some type of I/O wait have any impact on long real vs. user GC time here? I'm having a hard time imagining how that would happen... especially since all of the memory should be active now... Thanks again, Aaron -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20131203/9084b252/attachment.html From bartosz.markocki at gmail.com Tue Dec 3 04:49:58 2013 From: bartosz.markocki at gmail.com (Bartek Markocki) Date: Tue, 3 Dec 2013 13:49:58 +0100 Subject: Spikes in duration of the minor GC caused by some unknown JVM activity before GC Message-ID: Hello all, We have been performing performance tests of a new release of one of our application. What caught our attention was sporadic spikes in the minor GC times. By default the application uses ParallelGC so we were unable to distinguish between enlarged minor GC times and something else. Therefore we switched to ParNew and added printing safepoint statistics together with tracing safepoint cleanup time. Based on the above we got data that shows that the duration of the STW is not driven by the duration of the minor GC but something else. 
Here is the extract from the GC log file: 2013-12-02T02:16:29.993-0600: 45476.110: Total time for which application threads were stopped: 0.0304740 seconds 2013-12-02T02:16:30.772-0600: 45476.889: Application time: 0.7794150 seconds {Heap before GC invocations=53652 (full 13): par new generation total 629120K, used 565117K [0x538f0000, 0x7e390000, 0x7e390000) eden space 559232K, 100% used [0x538f0000, 0x75b10000, 0x75b10000) from space 69888K, 8% used [0x75b10000, 0x760cf500, 0x79f50000) to space 69888K, 0% used [0x79f50000, 0x79f50000, 0x7e390000) tenured generation total 1398144K, used 1077208K [0x7e390000, 0xd38f0000, 0xd38f0000) the space 1398144K, 77% used [0x7e390000, 0xbff86080, 0xbff86200, 0xd38f0000) compacting perm gen total 262144K, used 97801K [0xd38f0000, 0xe38f0000, 0xf38f0000) the space 262144K, 37% used [0xd38f0000, 0xd9872690, 0xd9872800, 0xe38f0000) No shared spaces configured. 2013-12-02T02:16:30.787-0600: 45476.904: [GC2013-12-02T02:16:32.776-0600: 45478.893: [ParNew Desired survivor size 35782656 bytes, new threshold 15 (max 15) - age 1: 1856072 bytes, 1856072 total - age 2: 170120 bytes, 2026192 total - age 3: 232696 bytes, 2258888 total - age 4: 180136 bytes, 2439024 total - age 5: 235120 bytes, 2674144 total - age 6: 242976 bytes, 2917120 total - age 7: 231728 bytes, 3148848 total - age 8: 149976 bytes, 3298824 total - age 9: 117904 bytes, 3416728 total - age 10: 126936 bytes, 3543664 total - age 11: 126624 bytes, 3670288 total - age 12: 114256 bytes, 3784544 total - age 13: 146760 bytes, 3931304 total - age 14: 163808 bytes, 4095112 total - age 15: 171664 bytes, 4266776 total : 565117K->5057K(629120K), 0.0629070 secs] 1642325K->1082393K(2027264K), 2.0523080 secs] [Times: user=0.07 sys=0.00, real=2.05 secs] Heap after GC invocations=53653 (full 13): par new generation total 629120K, used 5057K [0x538f0000, 0x7e390000, 0x7e390000) eden space 559232K, 0% used [0x538f0000, 0x538f0000, 0x75b10000) from space 69888K, 7% used [0x79f50000, 0x7a4404b8, 0x7e390000) to space 69888K, 0% used [0x75b10000, 0x75b10000, 0x79f50000) tenured generation total 1398144K, used 1077336K [0x7e390000, 0xd38f0000, 0xd38f0000) the space 1398144K, 77% used [0x7e390000, 0xbffa6080, 0xbffa6200, 0xd38f0000) compacting perm gen total 262144K, used 97801K [0xd38f0000, 0xe38f0000, 0xf38f0000) the space 262144K, 37% used [0xd38f0000, 0xd9872690, 0xd9872800, 0xe38f0000) No shared spaces configured. } 2013-12-02T02:16:32.839-0600: 45478.957: Total time for which application threads were stopped: 2.0675060 seconds Please notice the difference in the times of start of the STW and GC. And here is the output from the safepoint statistics: 45476.086: [deflating idle monitors, 0.0010070 secs] 45476.087: [updating inline caches, 0.0000000 secs] 45476.087: [compilation policy safepoint handler, 0.0005410 secs] 45476.088: [sweeping nmethods, 0.0000020 secs] vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count 45476.078: GenCollectForAllocation [ 294 3 6 ] [ 4 1 6 1 21 ] 3 45476.902: [deflating idle monitors, 0.0010780 secs] 45476.903: [updating inline caches, 0.0000010 secs] 45476.903: [compilation policy safepoint handler, 0.0005200 secs] 45476.904: [sweeping nmethods, 0.0000020 secs] vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count 45476.891: GenCollectForAllocation [ 294 2 3 ] [ 9 3 13 1 2052 ] 2 So if my reading is correct we miss around 1.9 seconds - before the GC started. 
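(To make that explicit: the GC log line opens at 45476.904 but the ParNew section inside it is stamped 45478.893, i.e. 45478.893 - 45476.904 = 1.989 s pass inside the safepoint before the copying phase even begins, while ParNew itself reports only 0.0629 s and the spin/block/sync/cleanup columns of the safepoint record add up to well under 30 ms of the 2052 ms vmop time.)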
Based on the above it is unclear to me were this time was spent. The test was running for 40 hours. Over those 40 hours we had 26 incidents where the difference > 200ms. The above incident is the longest. Here they are: 3534.445: GenCollectForAllocation [ 307 0 2 ] [ 0 0 0 1 1935 ] 0 6403.822: GenCollectForAllocation [ 294 2 4 ] [ 0 0 0 1 1683 ] 1 9146.663: GenCollectForAllocation [ 292 2 3 ] [ 24 0 25 2 1773 ] 2 12351.684: GenCollectForAllocation [ 293 4 10 ] [ 2 0 2 1 2044 ] 2 15746.592: GenCollectForAllocation [ 294 1 2 ] [ 0 2 2 1 1697 ] 0 16574.963: GenCollectForAllocation [ 295 1 1 ] [ 0 0 0 1 1224 ] 0 18337.686: GenCollectForAllocation [ 293 1 2 ] [ 1 0 1 1 418 ] 0 19142.518: GenCollectForAllocation [ 295 0 1 ] [ 0 0 0 1 1626 ] 0 20563.826: GenCollectForAllocation [ 296 6 6 ] [ 7 2 9 1 233 ] 4 22611.752: GenCollectForAllocation [ 294 4 4 ] [ 0 0 0 1 1584 ] 1 26043.520: GenCollectForAllocation [ 295 2 6 ] [ 6 4 11 1 1883 ] 1 29584.480: GenCollectForAllocation [ 292 3 5 ] [ 0 0 0 1 1788 ] 2 33119.441: GenCollectForAllocation [ 293 2 4 ] [ 3 0 3 1 1853 ] 2 34800.660: GenCollectForAllocation [ 294 2 4 ] [ 0 0 0 1 725 ] 0 36444.246: GenCollectForAllocation [ 293 1 0 ] [ 0 0 4 1 1815 ] 0 36656.730: GenCollectForAllocation [ 294 1 3 ] [ 0 0 0 1 905 ] 0 39751.609: GenCollectForAllocation [ 294 2 4 ] [ 3 0 3 1 2207 ] 1 41836.305: GenCollectForAllocation [ 293 2 2 ] [ 1 0 1 1 286 ] 1 43323.840: GenCollectForAllocation [ 293 0 1 ] [ 0 0 0 1 2006 ] 0 45476.891: GenCollectForAllocation [ 294 2 3 ] [ 9 3 13 1 2052 ] 2 46288.453: GenCollectForAllocation [ 295 0 2 ] [ 0 4 5 1 211 ] 0 47016.430: GenCollectForAllocation [ 294 4 4 ] [ 0 0 0 1 2408 ] 0 48662.230: GenCollectForAllocation [ 293 1 4 ] [ 0 0 0 1 315 ] 1 48907.250: GenCollectForAllocation [ 296 3 6 ] [ 0 0 0 1 421 ] 0 50662.195: GenCollectForAllocation [ 294 3 4 ] [ 0 0 0 1 2043 ] 1 54128.828: GenCollectForAllocation [ 295 2 2 ] [ 1 1 3 2 2660 ] 2 57729.141: GenCollectForAllocation [ 298 1 4 ] [ 0 0 0 1 1926 ] 0 We are on java7 u40, 32bit version on RHEL5.5. The JVM settings are: -server -Xmx2048m -Xms2048m -XX:PermSize=256m -XX:MaxPermSize=512m -Dsun.rmi.dgc.client.gcInterval=1800000 -Dsun.rmi.dgc.server.gcInterval=1800000 -XX:+DisableExplicitGC -XX:+UseParNewGC -XX:MaxGCPauseMillis=3000 -verbose:gc -Xloggc:... -XX:+PrintGCDetails -XX:+PrintHeapAtGC -XX:+PrintGCDateStamps -XX:+PrintTenuringDistribution -XX:+TraceSafepointCleanupTime -XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1 -XX:+PrintGCApplicationConcurrentTime -XX:+PrintGCApplicationStoppedTime -Djava.net.preferIPv4Stack=true The question of the day is: what else can we do to diagnose the above? Thanks in advance, Bartek -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20131203/ba6b7c60/attachment.html From vitalyd at gmail.com Tue Dec 3 06:26:45 2013 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Tue, 3 Dec 2013 09:26:45 -0500 Subject: Spikes in duration of the minor GC caused by some unknown JVM activity before GC In-Reply-To: References: Message-ID: Is anything else running on this host that consumes nontrivial cpu time? Sent from my phone On Dec 3, 2013 7:51 AM, "Bartek Markocki" wrote: > Hello all, > > We have been performing performance tests of a new release of one of our > application. What caught our attention was sporadic spikes in the minor GC > times. 
> > By default the application uses ParallelGC so we were unable to > distinguish between enlarged minor GC times and something else. Therefore > we switched to ParNew and added printing safepoint statistics together with > tracing safepoint cleanup time. > > Based on the above we got data that shows that the duration of the STW is > not driven by the duration of the minor GC but something else. > > Here is the extract from the GC log file: > 2013-12-02T02:16:29.993-0600: 45476.110: Total time for which application > threads were stopped: 0.0304740 seconds > 2013-12-02T02:16:30.772-0600: 45476.889: Application time: 0.7794150 > seconds > {Heap before GC invocations=53652 (full 13): > par new generation total 629120K, used 565117K [0x538f0000, 0x7e390000, > 0x7e390000) > eden space 559232K, 100% used [0x538f0000, 0x75b10000, 0x75b10000) > from space 69888K, 8% used [0x75b10000, 0x760cf500, 0x79f50000) > to space 69888K, 0% used [0x79f50000, 0x79f50000, 0x7e390000) > tenured generation total 1398144K, used 1077208K [0x7e390000, > 0xd38f0000, 0xd38f0000) > the space 1398144K, 77% used [0x7e390000, 0xbff86080, 0xbff86200, > 0xd38f0000) > compacting perm gen total 262144K, used 97801K [0xd38f0000, 0xe38f0000, > 0xf38f0000) > the space 262144K, 37% used [0xd38f0000, 0xd9872690, 0xd9872800, > 0xe38f0000) > No shared spaces configured. > 2013-12-02T02:16:30.787-0600: 45476.904: [GC2013-12-02T02:16:32.776-0600: > 45478.893: [ParNew > Desired survivor size 35782656 bytes, new threshold 15 (max 15) > - age 1: 1856072 bytes, 1856072 total > - age 2: 170120 bytes, 2026192 total > - age 3: 232696 bytes, 2258888 total > - age 4: 180136 bytes, 2439024 total > - age 5: 235120 bytes, 2674144 total > - age 6: 242976 bytes, 2917120 total > - age 7: 231728 bytes, 3148848 total > - age 8: 149976 bytes, 3298824 total > - age 9: 117904 bytes, 3416728 total > - age 10: 126936 bytes, 3543664 total > - age 11: 126624 bytes, 3670288 total > - age 12: 114256 bytes, 3784544 total > - age 13: 146760 bytes, 3931304 total > - age 14: 163808 bytes, 4095112 total > - age 15: 171664 bytes, 4266776 total > : 565117K->5057K(629120K), 0.0629070 secs] 1642325K->1082393K(2027264K), > 2.0523080 secs] [Times: user=0.07 sys=0.00, real=2.05 secs] > Heap after GC invocations=53653 (full 13): > par new generation total 629120K, used 5057K [0x538f0000, 0x7e390000, > 0x7e390000) > eden space 559232K, 0% used [0x538f0000, 0x538f0000, 0x75b10000) > from space 69888K, 7% used [0x79f50000, 0x7a4404b8, 0x7e390000) > to space 69888K, 0% used [0x75b10000, 0x75b10000, 0x79f50000) > tenured generation total 1398144K, used 1077336K [0x7e390000, > 0xd38f0000, 0xd38f0000) > the space 1398144K, 77% used [0x7e390000, 0xbffa6080, 0xbffa6200, > 0xd38f0000) > compacting perm gen total 262144K, used 97801K [0xd38f0000, 0xe38f0000, > 0xf38f0000) > the space 262144K, 37% used [0xd38f0000, 0xd9872690, 0xd9872800, > 0xe38f0000) > No shared spaces configured. > } > 2013-12-02T02:16:32.839-0600: 45478.957: Total time for which application > threads were stopped: 2.0675060 seconds > > Please notice the difference in the times of start of the STW and GC. 
> > And here is the output from the safepoint statistics: > 45476.086: [deflating idle monitors, 0.0010070 secs] > 45476.087: [updating inline caches, 0.0000000 secs] > 45476.087: [compilation policy safepoint handler, 0.0005410 secs] > 45476.088: [sweeping nmethods, 0.0000020 secs] > vmop [threads: total initially_running > wait_to_block] [time: spin block sync cleanup vmop] page_trap_count > 45476.078: GenCollectForAllocation [ 294 > 3 6 ] [ 4 1 6 1 21 ] 3 > 45476.902: [deflating idle monitors, 0.0010780 secs] > 45476.903: [updating inline caches, 0.0000010 secs] > 45476.903: [compilation policy safepoint handler, 0.0005200 secs] > 45476.904: [sweeping nmethods, 0.0000020 secs] > vmop [threads: total initially_running > wait_to_block] [time: spin block sync cleanup vmop] page_trap_count > 45476.891: GenCollectForAllocation [ 294 > 2 3 ] [ 9 3 13 1 2052 ] 2 > > So if my reading is correct we miss around 1.9 seconds - before the GC > started. > Based on the above it is unclear to me were this time was spent. > > The test was running for 40 hours. > Over those 40 hours we had 26 incidents where the difference > 200ms. > The above incident is the longest. > Here they are: > 3534.445: GenCollectForAllocation [ 307 > 0 2 ] [ 0 0 0 1 1935 ] 0 > 6403.822: GenCollectForAllocation [ 294 > 2 4 ] [ 0 0 0 1 1683 ] 1 > 9146.663: GenCollectForAllocation [ 292 > 2 3 ] [ 24 0 25 2 1773 ] 2 > 12351.684: GenCollectForAllocation [ 293 > 4 10 ] [ 2 0 2 1 2044 ] 2 > 15746.592: GenCollectForAllocation [ 294 > 1 2 ] [ 0 2 2 1 1697 ] 0 > 16574.963: GenCollectForAllocation [ 295 > 1 1 ] [ 0 0 0 1 1224 ] 0 > 18337.686: GenCollectForAllocation [ 293 > 1 2 ] [ 1 0 1 1 418 ] 0 > 19142.518: GenCollectForAllocation [ 295 > 0 1 ] [ 0 0 0 1 1626 ] 0 > 20563.826: GenCollectForAllocation [ 296 > 6 6 ] [ 7 2 9 1 233 ] 4 > 22611.752: GenCollectForAllocation [ 294 > 4 4 ] [ 0 0 0 1 1584 ] 1 > 26043.520: GenCollectForAllocation [ 295 > 2 6 ] [ 6 4 11 1 1883 ] 1 > 29584.480: GenCollectForAllocation [ 292 > 3 5 ] [ 0 0 0 1 1788 ] 2 > 33119.441: GenCollectForAllocation [ 293 > 2 4 ] [ 3 0 3 1 1853 ] 2 > 34800.660: GenCollectForAllocation [ 294 > 2 4 ] [ 0 0 0 1 725 ] 0 > 36444.246: GenCollectForAllocation [ 293 > 1 0 ] [ 0 0 4 1 1815 ] 0 > 36656.730: GenCollectForAllocation [ 294 > 1 3 ] [ 0 0 0 1 905 ] 0 > 39751.609: GenCollectForAllocation [ 294 > 2 4 ] [ 3 0 3 1 2207 ] 1 > 41836.305: GenCollectForAllocation [ 293 > 2 2 ] [ 1 0 1 1 286 ] 1 > 43323.840: GenCollectForAllocation [ 293 > 0 1 ] [ 0 0 0 1 2006 ] 0 > 45476.891: GenCollectForAllocation [ 294 > 2 3 ] [ 9 3 13 1 2052 ] 2 > 46288.453: GenCollectForAllocation [ 295 > 0 2 ] [ 0 4 5 1 211 ] 0 > 47016.430: GenCollectForAllocation [ 294 > 4 4 ] [ 0 0 0 1 2408 ] 0 > 48662.230: GenCollectForAllocation [ 293 > 1 4 ] [ 0 0 0 1 315 ] 1 > 48907.250: GenCollectForAllocation [ 296 > 3 6 ] [ 0 0 0 1 421 ] 0 > 50662.195: GenCollectForAllocation [ 294 > 3 4 ] [ 0 0 0 1 2043 ] 1 > 54128.828: GenCollectForAllocation [ 295 > 2 2 ] [ 1 1 3 2 2660 ] 2 > 57729.141: GenCollectForAllocation [ 298 > 1 4 ] [ 0 0 0 1 1926 ] 0 > > We are on java7 u40, 32bit version on RHEL5.5. > The JVM settings are: > -server -Xmx2048m -Xms2048m -XX:PermSize=256m -XX:MaxPermSize=512m > -Dsun.rmi.dgc.client.gcInterval=1800000 > -Dsun.rmi.dgc.server.gcInterval=1800000 -XX:+DisableExplicitGC > -XX:+UseParNewGC -XX:MaxGCPauseMillis=3000 > -verbose:gc -Xloggc:... 
-XX:+PrintGCDetails -XX:+PrintHeapAtGC > -XX:+PrintGCDateStamps -XX:+PrintTenuringDistribution > -XX:+TraceSafepointCleanupTime -XX:+PrintSafepointStatistics > -XX:PrintSafepointStatisticsCount=1 -XX:+PrintGCApplicationConcurrentTime > -XX:+PrintGCApplicationStoppedTime > -Djava.net.preferIPv4Stack=true > > The question of the day is: what else can we do to diagnose the above? > > Thanks in advance, > Bartek > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20131203/90c0bc5f/attachment-0001.html From vitalyd at gmail.com Tue Dec 3 06:43:11 2013 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Tue, 3 Dec 2013 09:43:11 -0500 Subject: Troubleshooting a ~40-second minor collection In-Reply-To: References: <529D8C5E.1090401@oracle.com> Message-ID: Just a theory here ... If that copying is bringing data into memory, it will pollute cpu caches. When those utilities are context switched out and in, it's possible they get run on a core different from before, further polluting caches there (although on a non busy 32 core system, I don't know think kernel scheduler should do that). When gc threads run, it's possible they'll stall heavily due to cache misses - this will manifest itself as low user time, low system time, and high real time. Sent from my phone On Dec 3, 2013 7:08 AM, "Aaron Daubman" wrote: > Hi Bengt, > > Thanks for the pointers - it feels like there may be some hints here as to > what to look at next! > > >> This GC has has this user/real time info: >> >> [Times: user=0.17 sys=0.00, real=3.45 secs] >> >> That means that during 3.45 seconds all of the VM threads only got >> scheduled to actually run on the CPUs for 0.17 seconds. So, it seems like >> the OS has scheduled the VM threads out for most of that period. The reason >> for that is most likely that the system is running too many other >> applications at the same time. >> > > > When I see these longer pause times its during the high-rate (~100M/s) > file copy and un-tar onto this system. This uses at most 2 out of the 32 > cores on this server. So I am still very confused as to what would prevent > the VM threads from being scheduled... anybody have any ideas? (The server > runs at a load average of ~2 on a 32 core system) > > Would high context switching or some type of I/O wait have any impact on > long real vs. user GC time here? I'm having a hard time imagining how that > would happen... especially since all of the memory should be active now... > > Thanks again, > Aaron > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20131203/064fe39e/attachment.html From bartosz.markocki at gmail.com Tue Dec 3 11:11:14 2013 From: bartosz.markocki at gmail.com (Bartek Markocki) Date: Tue, 3 Dec 2013 20:11:14 +0100 Subject: Spikes in duration of the minor GC caused by some unknown JVM activity before GC In-Reply-To: References: Message-ID: yes, we have 4 instances on the box (bare-metal - 16 cores). The workload is distributed fairly equally. 
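(For what it is worth, assuming none of the four instances sets -XX:ParallelGCThreads explicitly - it is not in the flag list above - each JVM sizes its parallel GC worker pool from the 16 hardware threads on its own; by the usual HotSpot default that is about 13 workers per instance, so roughly 4 * 13 = 52 GC threads competing for 16 cores whenever collections overlap. The value each instance actually picked can be checked with java -XX:+PrintFlagsFinal -version | grep ParallelGCThreads.)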
I checked all the instances and the issue can be found on all of them however the remaining instances showed the 'gaps' no longer than 200ms and the amount of occurrences is much lower (1 to 4). I also checked the sar statistics for the box and during the entire test they look very similar, i.e, CPU usage ~74%(user), ~10%(system), 0.15%(iowait) proc/s ~ 350 cswch/s ~70000 no swapping fault/s ~9000 majflt/s~0.02 runq-sz ~35 plist-sz ~ 2900 ldavg-1 ~ 34 ldavg-5 ~ 35 ldavg-15 ~ 37 Unfortunately the sar is set up to gather statistics with 10min bucket, so it is not the most accurate source of the data. Bartek On Tue, Dec 3, 2013 at 3:26 PM, Vitaly Davidovich wrote: > Is anything else running on this host that consumes nontrivial cpu time? > > Sent from my phone > On Dec 3, 2013 7:51 AM, "Bartek Markocki" > wrote: > >> Hello all, >> >> We have been performing performance tests of a new release of one of our >> application. What caught our attention was sporadic spikes in the minor GC >> times. >> >> By default the application uses ParallelGC so we were unable to >> distinguish between enlarged minor GC times and something else. Therefore >> we switched to ParNew and added printing safepoint statistics together with >> tracing safepoint cleanup time. >> >> Based on the above we got data that shows that the duration of the STW is >> not driven by the duration of the minor GC but something else. >> >> Here is the extract from the GC log file: >> 2013-12-02T02:16:29.993-0600: 45476.110: Total time for which application >> threads were stopped: 0.0304740 seconds >> 2013-12-02T02:16:30.772-0600: 45476.889: Application time: 0.7794150 >> seconds >> {Heap before GC invocations=53652 (full 13): >> par new generation total 629120K, used 565117K [0x538f0000, >> 0x7e390000, 0x7e390000) >> eden space 559232K, 100% used [0x538f0000, 0x75b10000, 0x75b10000) >> from space 69888K, 8% used [0x75b10000, 0x760cf500, 0x79f50000) >> to space 69888K, 0% used [0x79f50000, 0x79f50000, 0x7e390000) >> tenured generation total 1398144K, used 1077208K [0x7e390000, >> 0xd38f0000, 0xd38f0000) >> the space 1398144K, 77% used [0x7e390000, 0xbff86080, 0xbff86200, >> 0xd38f0000) >> compacting perm gen total 262144K, used 97801K [0xd38f0000, 0xe38f0000, >> 0xf38f0000) >> the space 262144K, 37% used [0xd38f0000, 0xd9872690, 0xd9872800, >> 0xe38f0000) >> No shared spaces configured. 
>> 2013-12-02T02:16:30.787-0600: 45476.904: [GC2013-12-02T02:16:32.776-0600: >> 45478.893: [ParNew >> Desired survivor size 35782656 bytes, new threshold 15 (max 15) >> - age 1: 1856072 bytes, 1856072 total >> - age 2: 170120 bytes, 2026192 total >> - age 3: 232696 bytes, 2258888 total >> - age 4: 180136 bytes, 2439024 total >> - age 5: 235120 bytes, 2674144 total >> - age 6: 242976 bytes, 2917120 total >> - age 7: 231728 bytes, 3148848 total >> - age 8: 149976 bytes, 3298824 total >> - age 9: 117904 bytes, 3416728 total >> - age 10: 126936 bytes, 3543664 total >> - age 11: 126624 bytes, 3670288 total >> - age 12: 114256 bytes, 3784544 total >> - age 13: 146760 bytes, 3931304 total >> - age 14: 163808 bytes, 4095112 total >> - age 15: 171664 bytes, 4266776 total >> : 565117K->5057K(629120K), 0.0629070 secs] 1642325K->1082393K(2027264K), >> 2.0523080 secs] [Times: user=0.07 sys=0.00, real=2.05 secs] >> Heap after GC invocations=53653 (full 13): >> par new generation total 629120K, used 5057K [0x538f0000, 0x7e390000, >> 0x7e390000) >> eden space 559232K, 0% used [0x538f0000, 0x538f0000, 0x75b10000) >> from space 69888K, 7% used [0x79f50000, 0x7a4404b8, 0x7e390000) >> to space 69888K, 0% used [0x75b10000, 0x75b10000, 0x79f50000) >> tenured generation total 1398144K, used 1077336K [0x7e390000, >> 0xd38f0000, 0xd38f0000) >> the space 1398144K, 77% used [0x7e390000, 0xbffa6080, 0xbffa6200, >> 0xd38f0000) >> compacting perm gen total 262144K, used 97801K [0xd38f0000, 0xe38f0000, >> 0xf38f0000) >> the space 262144K, 37% used [0xd38f0000, 0xd9872690, 0xd9872800, >> 0xe38f0000) >> No shared spaces configured. >> } >> 2013-12-02T02:16:32.839-0600: 45478.957: Total time for which application >> threads were stopped: 2.0675060 seconds >> >> Please notice the difference in the times of start of the STW and GC. >> >> And here is the output from the safepoint statistics: >> 45476.086: [deflating idle monitors, 0.0010070 secs] >> 45476.087: [updating inline caches, 0.0000000 secs] >> 45476.087: [compilation policy safepoint handler, 0.0005410 secs] >> 45476.088: [sweeping nmethods, 0.0000020 secs] >> vmop [threads: total initially_running >> wait_to_block] [time: spin block sync cleanup vmop] page_trap_count >> 45476.078: GenCollectForAllocation [ 294 >> 3 6 ] [ 4 1 6 1 21 ] 3 >> 45476.902: [deflating idle monitors, 0.0010780 secs] >> 45476.903: [updating inline caches, 0.0000010 secs] >> 45476.903: [compilation policy safepoint handler, 0.0005200 secs] >> 45476.904: [sweeping nmethods, 0.0000020 secs] >> vmop [threads: total initially_running >> wait_to_block] [time: spin block sync cleanup vmop] page_trap_count >> 45476.891: GenCollectForAllocation [ 294 >> 2 3 ] [ 9 3 13 1 2052 ] 2 >> >> So if my reading is correct we miss around 1.9 seconds - before the GC >> started. >> Based on the above it is unclear to me were this time was spent. >> >> The test was running for 40 hours. >> Over those 40 hours we had 26 incidents where the difference > 200ms. >> The above incident is the longest. 
>> Here they are: >> 3534.445: GenCollectForAllocation [ 307 >> 0 2 ] [ 0 0 0 1 1935 ] 0 >> 6403.822: GenCollectForAllocation [ 294 >> 2 4 ] [ 0 0 0 1 1683 ] 1 >> 9146.663: GenCollectForAllocation [ 292 >> 2 3 ] [ 24 0 25 2 1773 ] 2 >> 12351.684: GenCollectForAllocation [ 293 >> 4 10 ] [ 2 0 2 1 2044 ] 2 >> 15746.592: GenCollectForAllocation [ 294 >> 1 2 ] [ 0 2 2 1 1697 ] 0 >> 16574.963: GenCollectForAllocation [ 295 >> 1 1 ] [ 0 0 0 1 1224 ] 0 >> 18337.686: GenCollectForAllocation [ 293 >> 1 2 ] [ 1 0 1 1 418 ] 0 >> 19142.518: GenCollectForAllocation [ 295 >> 0 1 ] [ 0 0 0 1 1626 ] 0 >> 20563.826: GenCollectForAllocation [ 296 >> 6 6 ] [ 7 2 9 1 233 ] 4 >> 22611.752: GenCollectForAllocation [ 294 >> 4 4 ] [ 0 0 0 1 1584 ] 1 >> 26043.520: GenCollectForAllocation [ 295 >> 2 6 ] [ 6 4 11 1 1883 ] 1 >> 29584.480: GenCollectForAllocation [ 292 >> 3 5 ] [ 0 0 0 1 1788 ] 2 >> 33119.441: GenCollectForAllocation [ 293 >> 2 4 ] [ 3 0 3 1 1853 ] 2 >> 34800.660: GenCollectForAllocation [ 294 >> 2 4 ] [ 0 0 0 1 725 ] 0 >> 36444.246: GenCollectForAllocation [ 293 >> 1 0 ] [ 0 0 4 1 1815 ] 0 >> 36656.730: GenCollectForAllocation [ 294 >> 1 3 ] [ 0 0 0 1 905 ] 0 >> 39751.609: GenCollectForAllocation [ 294 >> 2 4 ] [ 3 0 3 1 2207 ] 1 >> 41836.305: GenCollectForAllocation [ 293 >> 2 2 ] [ 1 0 1 1 286 ] 1 >> 43323.840: GenCollectForAllocation [ 293 >> 0 1 ] [ 0 0 0 1 2006 ] 0 >> 45476.891: GenCollectForAllocation [ 294 >> 2 3 ] [ 9 3 13 1 2052 ] 2 >> 46288.453: GenCollectForAllocation [ 295 >> 0 2 ] [ 0 4 5 1 211 ] 0 >> 47016.430: GenCollectForAllocation [ 294 >> 4 4 ] [ 0 0 0 1 2408 ] 0 >> 48662.230: GenCollectForAllocation [ 293 >> 1 4 ] [ 0 0 0 1 315 ] 1 >> 48907.250: GenCollectForAllocation [ 296 >> 3 6 ] [ 0 0 0 1 421 ] 0 >> 50662.195: GenCollectForAllocation [ 294 >> 3 4 ] [ 0 0 0 1 2043 ] 1 >> 54128.828: GenCollectForAllocation [ 295 >> 2 2 ] [ 1 1 3 2 2660 ] 2 >> 57729.141: GenCollectForAllocation [ 298 >> 1 4 ] [ 0 0 0 1 1926 ] 0 >> >> We are on java7 u40, 32bit version on RHEL5.5. >> The JVM settings are: >> -server -Xmx2048m -Xms2048m -XX:PermSize=256m -XX:MaxPermSize=512m >> -Dsun.rmi.dgc.client.gcInterval=1800000 >> -Dsun.rmi.dgc.server.gcInterval=1800000 -XX:+DisableExplicitGC >> -XX:+UseParNewGC -XX:MaxGCPauseMillis=3000 >> -verbose:gc -Xloggc:... -XX:+PrintGCDetails -XX:+PrintHeapAtGC >> -XX:+PrintGCDateStamps -XX:+PrintTenuringDistribution >> -XX:+TraceSafepointCleanupTime -XX:+PrintSafepointStatistics >> -XX:PrintSafepointStatisticsCount=1 -XX:+PrintGCApplicationConcurrentTime >> -XX:+PrintGCApplicationStoppedTime >> -Djava.net.preferIPv4Stack=true >> >> The question of the day is: what else can we do to diagnose the above? >> >> Thanks in advance, >> Bartek >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20131203/c86107fd/attachment.html From vitalyd at gmail.com Tue Dec 3 12:06:58 2013 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Tue, 3 Dec 2013 15:06:58 -0500 Subject: Spikes in duration of the minor GC caused by some unknown JVM activity before GC In-Reply-To: References: Message-ID: Well, the reason I asked is because of this line: 565117K->5057K(629120K), 0.0629070 secs] 1642325K->1082393K(2027264K), 2.0523080 secs] [Times: user=0.07 sys=0.00, real=2.05 secs] Typically, when you have user time completely out of whack with real time (and low sys time), you have (1) gc threads not getting enough cpu time due to other things being runnable, (2) severe stalls incurred by gc threads due to something like cache misses or other hardware hazards, or (3) hard page faults. You have 4 instances running - only 1 user java thread? Also, JVM will allocate a bunch of gc worker threads, and since you have 4 JVMs running with each not knowing about the others, machine may get oversubscribed. Are the other pauses also showing low user and sys time but high real time (relative to other two)? Can you see if several of these JVMs are doing gcs around same time when this pauses happen? the sar output shows kind of high context switch count. Sent from my phone On Dec 3, 2013 2:11 PM, "Bartek Markocki" wrote: > yes, we have 4 instances on the box (bare-metal - 16 cores). The workload > is distributed fairly equally. > > I checked all the instances and the issue can be found on all of them > however the remaining instances showed the 'gaps' no longer than 200ms and > the amount of occurrences is much lower (1 to 4). > > I also checked the sar statistics for the box and during the entire test > they look very similar, i.e, > CPU usage ~74%(user), ~10%(system), 0.15%(iowait) > proc/s ~ 350 > cswch/s ~70000 > no swapping > fault/s ~9000 > majflt/s~0.02 > runq-sz ~35 > plist-sz ~ 2900 > ldavg-1 ~ 34 > ldavg-5 ~ 35 > ldavg-15 ~ 37 > > Unfortunately the sar is set up to gather statistics with 10min bucket, so > it is not the most accurate source of the data. > > Bartek > > > On Tue, Dec 3, 2013 at 3:26 PM, Vitaly Davidovich wrote: > >> Is anything else running on this host that consumes nontrivial cpu time? >> >> Sent from my phone >> On Dec 3, 2013 7:51 AM, "Bartek Markocki" >> wrote: >> >>> Hello all, >>> >>> We have been performing performance tests of a new release of one of our >>> application. What caught our attention was sporadic spikes in the minor GC >>> times. >>> >>> By default the application uses ParallelGC so we were unable to >>> distinguish between enlarged minor GC times and something else. Therefore >>> we switched to ParNew and added printing safepoint statistics together with >>> tracing safepoint cleanup time. >>> >>> Based on the above we got data that shows that the duration of the STW >>> is not driven by the duration of the minor GC but something else. 
>>> >>> Here is the extract from the GC log file: >>> 2013-12-02T02:16:29.993-0600: 45476.110: Total time for which >>> application threads were stopped: 0.0304740 seconds >>> 2013-12-02T02:16:30.772-0600: 45476.889: Application time: 0.7794150 >>> seconds >>> {Heap before GC invocations=53652 (full 13): >>> par new generation total 629120K, used 565117K [0x538f0000, >>> 0x7e390000, 0x7e390000) >>> eden space 559232K, 100% used [0x538f0000, 0x75b10000, 0x75b10000) >>> from space 69888K, 8% used [0x75b10000, 0x760cf500, 0x79f50000) >>> to space 69888K, 0% used [0x79f50000, 0x79f50000, 0x7e390000) >>> tenured generation total 1398144K, used 1077208K [0x7e390000, >>> 0xd38f0000, 0xd38f0000) >>> the space 1398144K, 77% used [0x7e390000, 0xbff86080, 0xbff86200, >>> 0xd38f0000) >>> compacting perm gen total 262144K, used 97801K [0xd38f0000, >>> 0xe38f0000, 0xf38f0000) >>> the space 262144K, 37% used [0xd38f0000, 0xd9872690, 0xd9872800, >>> 0xe38f0000) >>> No shared spaces configured. >>> 2013-12-02T02:16:30.787-0600: 45476.904: >>> [GC2013-12-02T02:16:32.776-0600: 45478.893: [ParNew >>> Desired survivor size 35782656 bytes, new threshold 15 (max 15) >>> - age 1: 1856072 bytes, 1856072 total >>> - age 2: 170120 bytes, 2026192 total >>> - age 3: 232696 bytes, 2258888 total >>> - age 4: 180136 bytes, 2439024 total >>> - age 5: 235120 bytes, 2674144 total >>> - age 6: 242976 bytes, 2917120 total >>> - age 7: 231728 bytes, 3148848 total >>> - age 8: 149976 bytes, 3298824 total >>> - age 9: 117904 bytes, 3416728 total >>> - age 10: 126936 bytes, 3543664 total >>> - age 11: 126624 bytes, 3670288 total >>> - age 12: 114256 bytes, 3784544 total >>> - age 13: 146760 bytes, 3931304 total >>> - age 14: 163808 bytes, 4095112 total >>> - age 15: 171664 bytes, 4266776 total >>> : 565117K->5057K(629120K), 0.0629070 secs] 1642325K->1082393K(2027264K), >>> 2.0523080 secs] [Times: user=0.07 sys=0.00, real=2.05 secs] >>> Heap after GC invocations=53653 (full 13): >>> par new generation total 629120K, used 5057K [0x538f0000, 0x7e390000, >>> 0x7e390000) >>> eden space 559232K, 0% used [0x538f0000, 0x538f0000, 0x75b10000) >>> from space 69888K, 7% used [0x79f50000, 0x7a4404b8, 0x7e390000) >>> to space 69888K, 0% used [0x75b10000, 0x75b10000, 0x79f50000) >>> tenured generation total 1398144K, used 1077336K [0x7e390000, >>> 0xd38f0000, 0xd38f0000) >>> the space 1398144K, 77% used [0x7e390000, 0xbffa6080, 0xbffa6200, >>> 0xd38f0000) >>> compacting perm gen total 262144K, used 97801K [0xd38f0000, >>> 0xe38f0000, 0xf38f0000) >>> the space 262144K, 37% used [0xd38f0000, 0xd9872690, 0xd9872800, >>> 0xe38f0000) >>> No shared spaces configured. >>> } >>> 2013-12-02T02:16:32.839-0600: 45478.957: Total time for which >>> application threads were stopped: 2.0675060 seconds >>> >>> Please notice the difference in the times of start of the STW and GC. 
>>> >>> And here is the output from the safepoint statistics: >>> 45476.086: [deflating idle monitors, 0.0010070 secs] >>> 45476.087: [updating inline caches, 0.0000000 secs] >>> 45476.087: [compilation policy safepoint handler, 0.0005410 secs] >>> 45476.088: [sweeping nmethods, 0.0000020 secs] >>> vmop [threads: total initially_running >>> wait_to_block] [time: spin block sync cleanup vmop] page_trap_count >>> 45476.078: GenCollectForAllocation [ 294 >>> 3 6 ] [ 4 1 6 1 21 ] 3 >>> 45476.902: [deflating idle monitors, 0.0010780 secs] >>> 45476.903: [updating inline caches, 0.0000010 secs] >>> 45476.903: [compilation policy safepoint handler, 0.0005200 secs] >>> 45476.904: [sweeping nmethods, 0.0000020 secs] >>> vmop [threads: total initially_running >>> wait_to_block] [time: spin block sync cleanup vmop] page_trap_count >>> 45476.891: GenCollectForAllocation [ 294 >>> 2 3 ] [ 9 3 13 1 2052 ] 2 >>> >>> So if my reading is correct we miss around 1.9 seconds - before the GC >>> started. >>> Based on the above it is unclear to me were this time was spent. >>> >>> The test was running for 40 hours. >>> Over those 40 hours we had 26 incidents where the difference > 200ms. >>> The above incident is the longest. >>> Here they are: >>> 3534.445: GenCollectForAllocation [ 307 >>> 0 2 ] [ 0 0 0 1 1935 ] 0 >>> 6403.822: GenCollectForAllocation [ 294 >>> 2 4 ] [ 0 0 0 1 1683 ] 1 >>> 9146.663: GenCollectForAllocation [ 292 >>> 2 3 ] [ 24 0 25 2 1773 ] 2 >>> 12351.684: GenCollectForAllocation [ 293 >>> 4 10 ] [ 2 0 2 1 2044 ] 2 >>> 15746.592: GenCollectForAllocation [ 294 >>> 1 2 ] [ 0 2 2 1 1697 ] 0 >>> 16574.963: GenCollectForAllocation [ 295 >>> 1 1 ] [ 0 0 0 1 1224 ] 0 >>> 18337.686: GenCollectForAllocation [ 293 >>> 1 2 ] [ 1 0 1 1 418 ] 0 >>> 19142.518: GenCollectForAllocation [ 295 >>> 0 1 ] [ 0 0 0 1 1626 ] 0 >>> 20563.826: GenCollectForAllocation [ 296 >>> 6 6 ] [ 7 2 9 1 233 ] 4 >>> 22611.752: GenCollectForAllocation [ 294 >>> 4 4 ] [ 0 0 0 1 1584 ] 1 >>> 26043.520: GenCollectForAllocation [ 295 >>> 2 6 ] [ 6 4 11 1 1883 ] 1 >>> 29584.480: GenCollectForAllocation [ 292 >>> 3 5 ] [ 0 0 0 1 1788 ] 2 >>> 33119.441: GenCollectForAllocation [ 293 >>> 2 4 ] [ 3 0 3 1 1853 ] 2 >>> 34800.660: GenCollectForAllocation [ 294 >>> 2 4 ] [ 0 0 0 1 725 ] 0 >>> 36444.246: GenCollectForAllocation [ 293 >>> 1 0 ] [ 0 0 4 1 1815 ] 0 >>> 36656.730: GenCollectForAllocation [ 294 >>> 1 3 ] [ 0 0 0 1 905 ] 0 >>> 39751.609: GenCollectForAllocation [ 294 >>> 2 4 ] [ 3 0 3 1 2207 ] 1 >>> 41836.305: GenCollectForAllocation [ 293 >>> 2 2 ] [ 1 0 1 1 286 ] 1 >>> 43323.840: GenCollectForAllocation [ 293 >>> 0 1 ] [ 0 0 0 1 2006 ] 0 >>> 45476.891: GenCollectForAllocation [ 294 >>> 2 3 ] [ 9 3 13 1 2052 ] 2 >>> 46288.453: GenCollectForAllocation [ 295 >>> 0 2 ] [ 0 4 5 1 211 ] 0 >>> 47016.430: GenCollectForAllocation [ 294 >>> 4 4 ] [ 0 0 0 1 2408 ] 0 >>> 48662.230: GenCollectForAllocation [ 293 >>> 1 4 ] [ 0 0 0 1 315 ] 1 >>> 48907.250: GenCollectForAllocation [ 296 >>> 3 6 ] [ 0 0 0 1 421 ] 0 >>> 50662.195: GenCollectForAllocation [ 294 >>> 3 4 ] [ 0 0 0 1 2043 ] 1 >>> 54128.828: GenCollectForAllocation [ 295 >>> 2 2 ] [ 1 1 3 2 2660 ] 2 >>> 57729.141: GenCollectForAllocation [ 298 >>> 1 4 ] [ 0 0 0 1 1926 ] 0 >>> >>> We are on java7 u40, 32bit version on RHEL5.5. 
>>> The JVM settings are: >>> -server -Xmx2048m -Xms2048m -XX:PermSize=256m -XX:MaxPermSize=512m >>> -Dsun.rmi.dgc.client.gcInterval=1800000 >>> -Dsun.rmi.dgc.server.gcInterval=1800000 -XX:+DisableExplicitGC >>> -XX:+UseParNewGC -XX:MaxGCPauseMillis=3000 >>> -verbose:gc -Xloggc:... -XX:+PrintGCDetails -XX:+PrintHeapAtGC >>> -XX:+PrintGCDateStamps -XX:+PrintTenuringDistribution >>> -XX:+TraceSafepointCleanupTime -XX:+PrintSafepointStatistics >>> -XX:PrintSafepointStatisticsCount=1 -XX:+PrintGCApplicationConcurrentTime >>> -XX:+PrintGCApplicationStoppedTime >>> -Djava.net.preferIPv4Stack=true >>> >>> The question of the day is: what else can we do to diagnose the above? >>> >>> Thanks in advance, >>> Bartek >>> >>> _______________________________________________ >>> hotspot-gc-use mailing list >>> hotspot-gc-use at openjdk.java.net >>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>> >>> > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20131203/df0ce46a/attachment.html From carlmeyer at hotmail.com Mon Dec 2 14:57:17 2013 From: carlmeyer at hotmail.com (Carl Meyer) Date: Tue, 3 Dec 2013 08:57:17 +1000 Subject: g1 notes with adobe coldfusion In-Reply-To: References: , Message-ID: Hello, I have been using garbage first on some Adobe ColdFusion servers which run Java 7 (natively 1.7.0_15; this particular gc log is from 1.7.0_25). Can I get some commentary on the gc logs? When I look at Java via Jconsole or Jvisualvm I notice the heap is more occupied than with the previous collector, UseParallelGC. This has led to a desirable outcome: users generally notice an overall improvement in performance, which can likely be attributed to their objects residing in memory longer before being referenced again, plus at most one short full gc pause per day. When looking at the GC logs I wonder why I see some threads that appear to be out of sync.
Here is a log sample: {Heap before GC invocations=1712 (full 65): garbage-first heap total 10366976K, used 5739193K [0x0000000567400000, 0x00000007e0000000, 0x00000007e0000000) region size 2048K, 1012 young (2072576K), 25 survivors (51200K) compacting perm gen total 317440K, used 315881K [0x00000007e0000000, 0x00000007f3600000, 0x0000000800000000) the space 317440K, 99% used [0x00000007e0000000, 0x00000007f347a740, 0x00000007f347a800, 0x00000007f3600000)No shared spaces configured.947848.595: [GC pause (young), 0.29434754 secs] [Parallel Time: 240.9 ms] [GC Worker Start (ms): 947848595.4 947848595.5 947848595.6 947848595.7 Avg: 947848595.6, Min: 947848595.4, Max: 947848595.7, Diff: 0.2] [Ext Root Scanning (ms): 81.5 106.7 109.5 114.4 Avg: 103.0, Min: 81.5, Max: 114.4, Diff: 32.9] [Update RS (ms): 44.7 11.9 3.2 11.7 Avg: 17.9, Min: 3.2, Max: 44.7, Diff: 41.4] [Processed Buffers : 40 12 13 7 Sum: 72, Avg: 18, Min: 7, Max: 40, Diff: 33] [Scan RS (ms): 7.0 6.7 7.0 6.6 Avg: 6.9, Min: 6.6, Max: 7.0, Diff: 0.4] [Object Copy (ms): 82.1 89.7 95.3 82.2 Avg: 87.3, Min: 82.1, Max: 95.3, Diff: 13.2] [Termination (ms): 0.0 0.1 0.0 0.0 Avg: 0.0, Min: 0.0, Max: 0.1, Diff: 0.1] [Termination Attempts : 1 4 5 2 Sum: 12, Avg: 3, Min: 1, Max: 5, Diff: 4] [GC Worker End (ms): 947848810.8 947848810.7 947848810.8 947848810.9 Avg: 947848810.8, Min: 947848810.7, Max: 947848810.9, Diff: 0.1] [GC Worker (ms): 215.3 215.2 215.2 215.2 Avg: 215.2, Min: 215.2, Max: 215.3, Diff: 0.2] [GC Worker Other (ms): 25.7 25.8 25.9 25.9 Avg: 25.8, Min: 25.7, Max: 25.9, Diff: 0.2] [Clear CT: 0.7 ms] [Other: 52.7 ms] [Choose CSet: 0.2 ms] [Ref Proc: 41.6 ms] [Ref Enq: 0.3 ms] [Free CSet: 9.8 ms] [Eden: 1974M(1974M)->0B(1972M) Survivors: 50M->52M Heap: 5604M(10124M)->3632M(10124M)] [Times: user=0.87 sys=0.06, real=0.30 secs] Heap after GC invocations=1713 (full 65): garbage-first heap total 10366976K, used 3719541K [0x0000000567400000, 0x00000007e0000000, 0x00000007e0000000) region size 2048K, 26 young (53248K), 26 survivors (53248K) compacting perm gen total 317440K, used 315881K [0x00000007e0000000, 0x00000007f3600000, 0x0000000800000000) the space 317440K, 99% used [0x00000007e0000000, 0x00000007f347a740, 0x00000007f347a800, 0x00000007f3600000)No shared spaces configured.} Do the differences here (bold) indicate some more G1 tuning is needed to run optimally? [Update RS (ms): 44.7 11.9 3.2 11.7 Avg: 17.9, Min: 3.2, Max: 44.7, Diff: 41.4] [Processed Buffers : 40 12 13 7 Sum: 72, Avg: 18, Min: 7, Max: 40, Diff: 33] [Other: 52.7 ms] [Choose CSet: 0.2 ms] [Ref Proc: 41.6 ms] [Ref Enq: 0.3 ms] [Free CSet: 9.8 ms] Thanks in advance, Carl. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20131203/a5e0455b/attachment.html From charlesjhunt at gmail.com Tue Dec 3 14:01:14 2013 From: charlesjhunt at gmail.com (charlie hunt) Date: Tue, 3 Dec 2013 16:01:14 -0600 Subject: g1 notes with adobe coldfusion In-Reply-To: References: , Message-ID: Hi Carl, A couple things strike me as they could (possibly) use fine tuning. But, to first to address your ?threads out of sync? ? the difference you?re seeing in min and max Update RS and Processed Buffers do not necessarily mean that the threads were out of sync. It looks like one of the threads spent quite a bit more time updating remembered sets than the others, and that same thread processed a lot more buffers than the others. 
Hence, we can make some sense out of why one thread took quite a bit longer than the other threads. The GC threads will attempt work stealing. And, based on the termination times, and termination attempts, it doesn?t look like a particular GC thread or threads were not in a termination protocol for very long while waiting for another to complete. The Ref Proc time in the Other time looks pretty high. This suggests your app doing some significant reference processing work. You should consider adding -XX:+ParallelRefProcEnabled to reduce that Ref Proc time. I also noticed in the PrintHeapAtGC output you?ve had 65 full GCs. You might consider investigating where these are coming from, and also look for ?to-space overflow? in the GC logs. When fine tuning G1 GC, I?ve generally found that using -XX:+PrintGCDetails and -XX:+PrintAdaptiveSizePolicy along with -XX:+PrintGC[Date|Time]Stamps to be the most useful, (more so than +PrintHeapAtGC). If you haven?t already done it, if you do a search for tuning G1 GC you will probably find some helpful G1 tuning presentations from Monica Beckwith, John Cuthbertson and I. And, you will also probably find a couple good articles on InfoQ by Monica. Ooh, almost forgot ? if you can move to 7u40, you?ll probably see even better results. hths, charlie ? On Dec 2, 2013, at 4:57 PM, Carl Meyer wrote: > > Hello, > > I have been using garbage first on some Adobe ColdFusion servers which runs Java 7 (1.7.0_15 natively this particular gc log runs 1.7.0_25). Can I get some commentary on the gc logs? > > When I look at Java via Jconsole or Jvisualvm I notice the heap is more occupied than previous gc - UseParallelGC. This has lead to desirable outcome with the users generally noticing overall improvement in performance which can likely be attributed to their objects residing in memory longer to be referred again and one full gc pause for short duration per day or less than one per day. > > When looking at GC logs I wonder why do a see some threads are out of sync. Here is a log sample: > > {Heap before GC invocations=1712 (full 65): > garbage-first heap total 10366976K, used 5739193K [0x0000000567400000, 0x00000007e0000000, 0x00000007e0000000) > region size 2048K, 1012 young (2072576K), 25 survivors (51200K) > compacting perm gen total 317440K, used 315881K [0x00000007e0000000, 0x00000007f3600000, 0x0000000800000000) > the space 317440K, 99% used [0x00000007e0000000, 0x00000007f347a740, 0x00000007f347a800, 0x00000007f3600000) > No shared spaces configured. 
> 947848.595: [GC pause (young), 0.29434754 secs] > [Parallel Time: 240.9 ms] > [GC Worker Start (ms): 947848595.4 947848595.5 947848595.6 947848595.7 > Avg: 947848595.6, Min: 947848595.4, Max: 947848595.7, Diff: 0.2] > [Ext Root Scanning (ms): 81.5 106.7 109.5 114.4 > Avg: 103.0, Min: 81.5, Max: 114.4, Diff: 32.9] > [Update RS (ms): 44.7 11.9 3.2 11.7 > Avg: 17.9, Min: 3.2, Max: 44.7, Diff: 41.4] > [Processed Buffers : 40 12 13 7 > Sum: 72, Avg: 18, Min: 7, Max: 40, Diff: 33] > [Scan RS (ms): 7.0 6.7 7.0 6.6 > Avg: 6.9, Min: 6.6, Max: 7.0, Diff: 0.4] > [Object Copy (ms): 82.1 89.7 95.3 82.2 > Avg: 87.3, Min: 82.1, Max: 95.3, Diff: 13.2] > [Termination (ms): 0.0 0.1 0.0 0.0 > Avg: 0.0, Min: 0.0, Max: 0.1, Diff: 0.1] > [Termination Attempts : 1 4 5 2 > Sum: 12, Avg: 3, Min: 1, Max: 5, Diff: 4] > [GC Worker End (ms): 947848810.8 947848810.7 947848810.8 947848810.9 > Avg: 947848810.8, Min: 947848810.7, Max: 947848810.9, Diff: 0.1] > [GC Worker (ms): 215.3 215.2 215.2 215.2 > Avg: 215.2, Min: 215.2, Max: 215.3, Diff: 0.2] > [GC Worker Other (ms): 25.7 25.8 25.9 25.9 > Avg: 25.8, Min: 25.7, Max: 25.9, Diff: 0.2] > [Clear CT: 0.7 ms] > [Other: 52.7 ms] > [Choose CSet: 0.2 ms] > [Ref Proc: 41.6 ms] > [Ref Enq: 0.3 ms] > [Free CSet: 9.8 ms] > [Eden: 1974M(1974M)->0B(1972M) Survivors: 50M->52M Heap: 5604M(10124M)->3632M(10124M)] > [Times: user=0.87 sys=0.06, real=0.30 secs] > Heap after GC invocations=1713 (full 65): > garbage-first heap total 10366976K, used 3719541K [0x0000000567400000, 0x00000007e0000000, 0x00000007e0000000) > region size 2048K, 26 young (53248K), 26 survivors (53248K) > compacting perm gen total 317440K, used 315881K [0x00000007e0000000, 0x00000007f3600000, 0x0000000800000000) > the space 317440K, 99% used [0x00000007e0000000, 0x00000007f347a740, 0x00000007f347a800, 0x00000007f3600000) > No shared spaces configured. > } > > Do the differences here (bold) indicate some more G1 tuning is needed to run optimally? > > [Update RS (ms): 44.7 11.9 3.2 11.7 > Avg: 17.9, Min: 3.2, Max: 44.7, Diff: 41.4] > [Processed Buffers : 40 12 13 7 > Sum: 72, Avg: 18, Min: 7, Max: 40, Diff: 33] > > [Other: 52.7 ms] > [Choose CSet: 0.2 ms] > [Ref Proc: 41.6 ms] > [Ref Enq: 0.3 ms] > [Free CSet: 9.8 ms] > > > Thanks in advance, Carl. > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20131203/98a185a5/attachment-0001.html From bartosz.markocki at gmail.com Fri Dec 6 06:59:44 2013 From: bartosz.markocki at gmail.com (Bartek Markocki) Date: Fri, 6 Dec 2013 15:59:44 +0100 Subject: Spikes in duration of the minor GC caused by some unknown JVM activity before GC In-Reply-To: References: Message-ID: Hi Vitaly, Thanks for the hint about the real time vs user+sys. I rechecked the test setup and environment. The more I dug the more things I was finding and that resulted in the conclusion there were too many moving parts so anyone can draw a meaningful conclusion from the test. So the test will be redone in a more 'predictable' environment. Despite the above I have found one more peculiar behavior which I do not fully understand. There were other box with other application involved in the test. RHEL 5.5 and java7 u10 64bit running CMS, two instances on the box. 
Both instances got promotion failure due to humongous allocation (~1M array). They did not get failure during the same time. When I looked closely at the performance of the Full GC I have found this fellow: 53786.315: [GC 53786.315: [ParNew (0: promotion failure size = 3) (1: promotion failure size = 5) (2: promotion failure size = 4) (3: promotion failure size = 5) (4: promotion failure size = 3) (5: promotion failure size = 3) (6: promotion failure size = 5) (7: promotion failure size = 5) (8: promotion failure size = 5) (9: promotion failure size = 3) (10: promotion failure size = 5) (11: promotion failure size = 4) (12: promotion failure size = 3) (promotion failed): 1887398K->1811380K(1887488K), 67.8065550 secs]53854.122: [CMS: 1042845K->786493K(1048576K), 1.9985940 secs] 2835614K->786493K(2936064K), [CMS Perm : 44209K->44206K(73852K)], 69.8061140 secs] [Times: user=70.20 sys=15.68, real=69.80 secs] Other promotion failures looked similar to this one: 47287.350: [GC 47287.351: [ParNew (0: promotion failure size = 3) (1: promotion failure size = 4) (2: promotion failure size = 4) (3: promotion failure size = 5) (4: promotion failure size = 5) (5: promotion failure size = 4) (6: promotion failure size = 5) (7: promotion failure size = 5) (8: promotion failure size = 3) (9: promotion failure size = 3) (10: promotion failure size = 3) (11: promotion failure size = 5) (12: promotion failure size = 4) (promotion failed): 1887488K->1836533K(1887488K), 0.8193500 secs]47288.170: [CMS: 1033291K->779944K(1048576K), 2.1294690 secs] 2778468K->779944K(2936064K), [CMS Perm : 44209K->44209K(73852K)], 2.9497890 secs] [Times: user=4.35 sys=0.08, real=2.95 secs] I checked (for the time of the incident): 1. cpu utilization - no increase (stable on ~40%) 2. swap - no swapping 3. iostat,, sar - no significant change comparing to the time before and after the incident 4. performance of the other instance on the box in terms of # transactions, duration (including max) of the transactions and did not find a sign of any slowdown And even if I ignore the value for the real time, I cannot find explanation for the value of the sys time. Does it ring any bell to you? Is there anything else I can check to understand that? Thanks a lot! Bartek On Tue, Dec 3, 2013 at 9:06 PM, Vitaly Davidovich wrote: > Well, the reason I asked is because of this line: > > 565117K->5057K(629120K), 0.0629070 secs] 1642325K->1082393K(2027264K), > 2.0523080 secs] [Times: user=0.07 sys=0.00, real=2.05 secs] > > Typically, when you have user time completely out of whack with real time > (and low sys time), you have (1) gc threads not getting enough cpu time due > to other things being runnable, (2) severe stalls incurred by gc threads > due to something like cache misses or other hardware hazards, or (3) hard > page faults. > > You have 4 instances running - only 1 user java thread? Also, JVM will > allocate a bunch of gc worker threads, and since you have 4 JVMs running > with each not knowing about the others, machine may get oversubscribed. > > Are the other pauses also showing low user and sys time but high real time > (relative to other two)? Can you see if several of these JVMs are doing gcs > around same time when this pauses happen? > > the sar output shows kind of high context switch count. > > Sent from my phone > On Dec 3, 2013 2:11 PM, "Bartek Markocki" > wrote: > >> yes, we have 4 instances on the box (bare-metal - 16 cores). The workload >> is distributed fairly equally. 
>> >> I checked all the instances and the issue can be found on all of them >> however the remaining instances showed the 'gaps' no longer than 200ms and >> the amount of occurrences is much lower (1 to 4). >> >> I also checked the sar statistics for the box and during the entire test >> they look very similar, i.e, >> CPU usage ~74%(user), ~10%(system), 0.15%(iowait) >> proc/s ~ 350 >> cswch/s ~70000 >> no swapping >> fault/s ~9000 >> majflt/s~0.02 >> runq-sz ~35 >> plist-sz ~ 2900 >> ldavg-1 ~ 34 >> ldavg-5 ~ 35 >> ldavg-15 ~ 37 >> >> Unfortunately the sar is set up to gather statistics with 10min bucket, >> so it is not the most accurate source of the data. >> >> Bartek >> >> >> On Tue, Dec 3, 2013 at 3:26 PM, Vitaly Davidovich wrote: >> >>> Is anything else running on this host that consumes nontrivial cpu time? >>> >>> Sent from my phone >>> On Dec 3, 2013 7:51 AM, "Bartek Markocki" >>> wrote: >>> >>>> Hello all, >>>> >>>> We have been performing performance tests of a new release of one of >>>> our application. What caught our attention was sporadic spikes in the minor >>>> GC times. >>>> >>>> By default the application uses ParallelGC so we were unable to >>>> distinguish between enlarged minor GC times and something else. Therefore >>>> we switched to ParNew and added printing safepoint statistics together with >>>> tracing safepoint cleanup time. >>>> >>>> Based on the above we got data that shows that the duration of the STW >>>> is not driven by the duration of the minor GC but something else. >>>> >>>> Here is the extract from the GC log file: >>>> 2013-12-02T02:16:29.993-0600: 45476.110: Total time for which >>>> application threads were stopped: 0.0304740 seconds >>>> 2013-12-02T02:16:30.772-0600: 45476.889: Application time: 0.7794150 >>>> seconds >>>> {Heap before GC invocations=53652 (full 13): >>>> par new generation total 629120K, used 565117K [0x538f0000, >>>> 0x7e390000, 0x7e390000) >>>> eden space 559232K, 100% used [0x538f0000, 0x75b10000, 0x75b10000) >>>> from space 69888K, 8% used [0x75b10000, 0x760cf500, 0x79f50000) >>>> to space 69888K, 0% used [0x79f50000, 0x79f50000, 0x7e390000) >>>> tenured generation total 1398144K, used 1077208K [0x7e390000, >>>> 0xd38f0000, 0xd38f0000) >>>> the space 1398144K, 77% used [0x7e390000, 0xbff86080, 0xbff86200, >>>> 0xd38f0000) >>>> compacting perm gen total 262144K, used 97801K [0xd38f0000, >>>> 0xe38f0000, 0xf38f0000) >>>> the space 262144K, 37% used [0xd38f0000, 0xd9872690, 0xd9872800, >>>> 0xe38f0000) >>>> No shared spaces configured. 
>>>> 2013-12-02T02:16:30.787-0600: 45476.904: >>>> [GC2013-12-02T02:16:32.776-0600: 45478.893: [ParNew >>>> Desired survivor size 35782656 bytes, new threshold 15 (max 15) >>>> - age 1: 1856072 bytes, 1856072 total >>>> - age 2: 170120 bytes, 2026192 total >>>> - age 3: 232696 bytes, 2258888 total >>>> - age 4: 180136 bytes, 2439024 total >>>> - age 5: 235120 bytes, 2674144 total >>>> - age 6: 242976 bytes, 2917120 total >>>> - age 7: 231728 bytes, 3148848 total >>>> - age 8: 149976 bytes, 3298824 total >>>> - age 9: 117904 bytes, 3416728 total >>>> - age 10: 126936 bytes, 3543664 total >>>> - age 11: 126624 bytes, 3670288 total >>>> - age 12: 114256 bytes, 3784544 total >>>> - age 13: 146760 bytes, 3931304 total >>>> - age 14: 163808 bytes, 4095112 total >>>> - age 15: 171664 bytes, 4266776 total >>>> : 565117K->5057K(629120K), 0.0629070 secs] >>>> 1642325K->1082393K(2027264K), 2.0523080 secs] [Times: user=0.07 sys=0.00, >>>> real=2.05 secs] >>>> Heap after GC invocations=53653 (full 13): >>>> par new generation total 629120K, used 5057K [0x538f0000, >>>> 0x7e390000, 0x7e390000) >>>> eden space 559232K, 0% used [0x538f0000, 0x538f0000, 0x75b10000) >>>> from space 69888K, 7% used [0x79f50000, 0x7a4404b8, 0x7e390000) >>>> to space 69888K, 0% used [0x75b10000, 0x75b10000, 0x79f50000) >>>> tenured generation total 1398144K, used 1077336K [0x7e390000, >>>> 0xd38f0000, 0xd38f0000) >>>> the space 1398144K, 77% used [0x7e390000, 0xbffa6080, 0xbffa6200, >>>> 0xd38f0000) >>>> compacting perm gen total 262144K, used 97801K [0xd38f0000, >>>> 0xe38f0000, 0xf38f0000) >>>> the space 262144K, 37% used [0xd38f0000, 0xd9872690, 0xd9872800, >>>> 0xe38f0000) >>>> No shared spaces configured. >>>> } >>>> 2013-12-02T02:16:32.839-0600: 45478.957: Total time for which >>>> application threads were stopped: 2.0675060 seconds >>>> >>>> Please notice the difference in the times of start of the STW and GC. >>>> >>>> And here is the output from the safepoint statistics: >>>> 45476.086: [deflating idle monitors, 0.0010070 secs] >>>> 45476.087: [updating inline caches, 0.0000000 secs] >>>> 45476.087: [compilation policy safepoint handler, 0.0005410 secs] >>>> 45476.088: [sweeping nmethods, 0.0000020 secs] >>>> vmop [threads: total initially_running >>>> wait_to_block] [time: spin block sync cleanup vmop] page_trap_count >>>> 45476.078: GenCollectForAllocation [ 294 >>>> 3 6 ] [ 4 1 6 1 21 ] 3 >>>> 45476.902: [deflating idle monitors, 0.0010780 secs] >>>> 45476.903: [updating inline caches, 0.0000010 secs] >>>> 45476.903: [compilation policy safepoint handler, 0.0005200 secs] >>>> 45476.904: [sweeping nmethods, 0.0000020 secs] >>>> vmop [threads: total initially_running >>>> wait_to_block] [time: spin block sync cleanup vmop] page_trap_count >>>> 45476.891: GenCollectForAllocation [ 294 >>>> 2 3 ] [ 9 3 13 1 2052 ] 2 >>>> >>>> So if my reading is correct we miss around 1.9 seconds - before the GC >>>> started. >>>> Based on the above it is unclear to me were this time was spent. >>>> >>>> The test was running for 40 hours. >>>> Over those 40 hours we had 26 incidents where the difference > 200ms. >>>> The above incident is the longest. 
>>>> Here they are: >>>> 3534.445: GenCollectForAllocation [ 307 >>>> 0 2 ] [ 0 0 0 1 1935 ] 0 >>>> 6403.822: GenCollectForAllocation [ 294 >>>> 2 4 ] [ 0 0 0 1 1683 ] 1 >>>> 9146.663: GenCollectForAllocation [ 292 >>>> 2 3 ] [ 24 0 25 2 1773 ] 2 >>>> 12351.684: GenCollectForAllocation [ 293 >>>> 4 10 ] [ 2 0 2 1 2044 ] 2 >>>> 15746.592: GenCollectForAllocation [ 294 >>>> 1 2 ] [ 0 2 2 1 1697 ] 0 >>>> 16574.963: GenCollectForAllocation [ 295 >>>> 1 1 ] [ 0 0 0 1 1224 ] 0 >>>> 18337.686: GenCollectForAllocation [ 293 >>>> 1 2 ] [ 1 0 1 1 418 ] 0 >>>> 19142.518: GenCollectForAllocation [ 295 >>>> 0 1 ] [ 0 0 0 1 1626 ] 0 >>>> 20563.826: GenCollectForAllocation [ 296 >>>> 6 6 ] [ 7 2 9 1 233 ] 4 >>>> 22611.752: GenCollectForAllocation [ 294 >>>> 4 4 ] [ 0 0 0 1 1584 ] 1 >>>> 26043.520: GenCollectForAllocation [ 295 >>>> 2 6 ] [ 6 4 11 1 1883 ] 1 >>>> 29584.480: GenCollectForAllocation [ 292 >>>> 3 5 ] [ 0 0 0 1 1788 ] 2 >>>> 33119.441: GenCollectForAllocation [ 293 >>>> 2 4 ] [ 3 0 3 1 1853 ] 2 >>>> 34800.660: GenCollectForAllocation [ 294 >>>> 2 4 ] [ 0 0 0 1 725 ] 0 >>>> 36444.246: GenCollectForAllocation [ 293 >>>> 1 0 ] [ 0 0 4 1 1815 ] 0 >>>> 36656.730: GenCollectForAllocation [ 294 >>>> 1 3 ] [ 0 0 0 1 905 ] 0 >>>> 39751.609: GenCollectForAllocation [ 294 >>>> 2 4 ] [ 3 0 3 1 2207 ] 1 >>>> 41836.305: GenCollectForAllocation [ 293 >>>> 2 2 ] [ 1 0 1 1 286 ] 1 >>>> 43323.840: GenCollectForAllocation [ 293 >>>> 0 1 ] [ 0 0 0 1 2006 ] 0 >>>> 45476.891: GenCollectForAllocation [ 294 >>>> 2 3 ] [ 9 3 13 1 2052 ] 2 >>>> 46288.453: GenCollectForAllocation [ 295 >>>> 0 2 ] [ 0 4 5 1 211 ] 0 >>>> 47016.430: GenCollectForAllocation [ 294 >>>> 4 4 ] [ 0 0 0 1 2408 ] 0 >>>> 48662.230: GenCollectForAllocation [ 293 >>>> 1 4 ] [ 0 0 0 1 315 ] 1 >>>> 48907.250: GenCollectForAllocation [ 296 >>>> 3 6 ] [ 0 0 0 1 421 ] 0 >>>> 50662.195: GenCollectForAllocation [ 294 >>>> 3 4 ] [ 0 0 0 1 2043 ] 1 >>>> 54128.828: GenCollectForAllocation [ 295 >>>> 2 2 ] [ 1 1 3 2 2660 ] 2 >>>> 57729.141: GenCollectForAllocation [ 298 >>>> 1 4 ] [ 0 0 0 1 1926 ] 0 >>>> >>>> We are on java7 u40, 32bit version on RHEL5.5. >>>> The JVM settings are: >>>> -server -Xmx2048m -Xms2048m -XX:PermSize=256m -XX:MaxPermSize=512m >>>> -Dsun.rmi.dgc.client.gcInterval=1800000 >>>> -Dsun.rmi.dgc.server.gcInterval=1800000 -XX:+DisableExplicitGC >>>> -XX:+UseParNewGC -XX:MaxGCPauseMillis=3000 >>>> -verbose:gc -Xloggc:... -XX:+PrintGCDetails -XX:+PrintHeapAtGC >>>> -XX:+PrintGCDateStamps -XX:+PrintTenuringDistribution >>>> -XX:+TraceSafepointCleanupTime -XX:+PrintSafepointStatistics >>>> -XX:PrintSafepointStatisticsCount=1 -XX:+PrintGCApplicationConcurrentTime >>>> -XX:+PrintGCApplicationStoppedTime >>>> -Djava.net.preferIPv4Stack=true >>>> >>>> The question of the day is: what else can we do to diagnose the above? >>>> >>>> Thanks in advance, >>>> Bartek >>>> >>>> _______________________________________________ >>>> hotspot-gc-use mailing list >>>> hotspot-gc-use at openjdk.java.net >>>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>>> >>>> >> -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20131206/7140983f/attachment.html From stoth at miami-holdings.com Tue Dec 10 16:16:04 2013 From: stoth at miami-holdings.com (Steven Toth) Date: Wed, 11 Dec 2013 00:16:04 +0000 Subject: Seeking assistance with long garbage collection pauses with G1GC In-Reply-To: <0FF288A8F3C0E44A83B55C4E260BF0032AFC0B4F@DNY2I1EXC03.miamiholdings.corp> References: <0FF288A8F3C0E44A83B55C4E260BF0032AFC0B4F@DNY2I1EXC03.miamiholdings.corp> Message-ID: <0FF288A8F3C0E44A83B55C4E260BF0032AFD04E3@DNY2I1EXC03.miamiholdings.corp> Hello, We've been struggling with long pauses with the G1GC garbage collector for several weeks now and was hoping to get some assistance. We have a Java app running in a standalone JVM on RHEL. The app listens for data on one or more sockets, queues the data, and has scheduled threads pulling the data off the queue and persisting it. The data is wide, over 700 data elements per record, though all of the data elements are small Strings, Integers, or Longs. The app runs smoothly for periods of time, sometimes 30 minutes to an hour, but then we experience one or more long garbage collection pauses. The logs indicate the majority of the pause time is spent in the Object Copy time. The long pauses also have high sys time relative to the other shorter collections. Here are the JVM details: java version "1.7.0_03" Java(TM) SE Runtime Environment (build 1.7.0_03-b04) Java HotSpot(TM) 64-Bit Server VM (build 22.1-b02, mixed mode) Here are the JVM options: -XX:MaxPermSize=256m -XX:PermSize=256m -Xms3G -Xmx3G -XX:+UseG1GC -XX:G1HeapRegionSize=32M -XX:-UseGCOverheadLimit \ -Xloggc:logs/gc-STAT5-collector.log -XX:+PrintGCDetails -XX:+PrintGCDateStamps \ -XX:+PrintGCTimeStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=8 -XX:GCLogFileSize=10M -XX:+PrintGCApplicationStoppedTime \ -XX:MaxNewSize=1G -XX:NewSize=1G \ -XX:+PrintGCApplicationStoppedTime -XX:+PrintTenuringDistribution -XX:+PrintAdaptiveSizePolicy After several iterations of experimenting with an assortment of options (including no options other than -Xms and -Xmx) the aforementioned options have given us the best performance with the fewest amount of long pauses. However we're still experiencing several dozen garbage collections a day that range from 1-5 seconds. The process is taskset to 4 cores (all on the same socket), but is barely using 2 of them. All of the processes on this box are pinned to their own cores (with 0 and 1 unused). The machine has plenty of free memory (20+G) and top shows the process using 2.5G of RES memory. A day's worth of garbage collection logs are attached, but here is an example of the GC log output with high Object Copy and sys time. There are numerous GC events comparable to the example below with near identical Eden/Survivors/Heap sizes that take well under 100 millis whereas this example took over 2 seconds. [Object Copy (ms): 2090.4 2224.0 2484.0 2160.1 1603.9 2071.2 887.8 1608.1 1992.0 2030.5 1692.5 1583.9 2140.3 1703.0 2174.0 1949.5 1941.1 2190.1 2153.3 1604.1 1930.8 1892.6 1651.9 [Eden: 1017M(1017M)->0B(1016M) Survivors: 7168K->8192K Heap: 1062M(3072M)->47M(3072M)] [Times: user=2.24 sys=7.22, real=2.49 secs] Any help would be greatly appreciated. Thanks. -Steve ****Confidentiality Note**** This e-mail may contain confidential and or privileged information and is solely for the use of the sender's intended recipient(s). 
Any review, dissemination, copying, printing or other use of this e-mail by any other persons or entities is prohibited. If you have received this e-mail in error, please contact the sender immediately by reply email and delete the material from any computer. -------------- next part -------------- A non-text attachment was scrubbed... Name: gc-STAT5-collector.log.zip Type: application/x-zip-compressed Size: 10776 bytes Desc: gc-STAT5-collector.log.zip Url : http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20131211/d479c4b7/gc-STAT5-collector.log.zip From daubman at gmail.com Thu Dec 12 08:24:24 2013 From: daubman at gmail.com (Aaron Daubman) Date: Thu, 12 Dec 2013 11:24:24 -0500 Subject: Seeking assistance with long garbage collection pauses with G1GC In-Reply-To: <0FF288A8F3C0E44A83B55C4E260BF0032AFD04E3@DNY2I1EXC03.miamiholdings.corp> References: <0FF288A8F3C0E44A83B55C4E260BF0032AFC0B4F@DNY2I1EXC03.miamiholdings.corp> <0FF288A8F3C0E44A83B55C4E260BF0032AFD04E3@DNY2I1EXC03.miamiholdings.corp> Message-ID: Steve, This is a shot in the dark (and hopefully other more seasoned GC troubleshooters will chime in), but is this on a numa system? If so, you might trying numactl rather than taskset to make the memory allocation processor aware as well: See section: '4.1.2.1. Setting CPU Affinity with taskset' here: https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Performance_Tuning_Guide/main-cpu.html#s-cpu-tuning And also: http://users.sdsc.edu/~glockwood/comp/affinity.php#define:numactl HTH, Aaron On Tue, Dec 10, 2013 at 7:16 PM, Steven Toth wrote: > Hello, > > We've been struggling with long pauses with the G1GC garbage collector for > several weeks now and was hoping to get some assistance. > > We have a Java app running in a standalone JVM on RHEL. The app listens > for data on one or more sockets, queues the data, and has scheduled threads > pulling the data off the queue and persisting it. The data is wide, over > 700 data elements per record, though all of the data elements are small > Strings, Integers, or Longs. > > The app runs smoothly for periods of time, sometimes 30 minutes to an > hour, but then we experience one or more long garbage collection pauses. > The logs indicate the majority of the pause time is spent in the Object > Copy time. The long pauses also have high sys time relative to the other > shorter collections. > > Here are the JVM details: > > java version "1.7.0_03" > Java(TM) SE Runtime Environment (build 1.7.0_03-b04) Java HotSpot(TM) > 64-Bit Server VM (build 22.1-b02, mixed mode) > > Here are the JVM options: > > -XX:MaxPermSize=256m -XX:PermSize=256m -Xms3G -Xmx3G -XX:+UseG1GC > -XX:G1HeapRegionSize=32M -XX:-UseGCOverheadLimit \ > -Xloggc:logs/gc-STAT5-collector.log -XX:+PrintGCDetails > -XX:+PrintGCDateStamps \ -XX:+PrintGCTimeStamps -XX:+UseGCLogFileRotation > -XX:NumberOfGCLogFiles=8 -XX:GCLogFileSize=10M > -XX:+PrintGCApplicationStoppedTime \ -XX:MaxNewSize=1G -XX:NewSize=1G \ > -XX:+PrintGCApplicationStoppedTime -XX:+PrintTenuringDistribution > -XX:+PrintAdaptiveSizePolicy > > After several iterations of experimenting with an assortment of options > (including no options other than -Xms and -Xmx) the aforementioned options > have given us the best performance with the fewest amount of long pauses. > However we're still experiencing several dozen garbage collections a day > that range from 1-5 seconds. > > The process is taskset to 4 cores (all on the same socket), but is barely > using 2 of them. 
All of the processes on this box are pinned to their own > cores (with 0 and 1 unused). The machine has plenty of free memory (20+G) > and top shows the process using 2.5G of RES memory. > > A day's worth of garbage collection logs are attached, but here is an > example of the GC log output with high Object Copy and sys time. There are > numerous GC events comparable to the example below with near identical > Eden/Survivors/Heap sizes that take well under 100 millis whereas this > example took over 2 seconds. > > [Object Copy (ms): 2090.4 2224.0 2484.0 2160.1 1603.9 2071.2 887.8 1608.1 > 1992.0 2030.5 1692.5 1583.9 2140.3 1703.0 2174.0 1949.5 1941.1 2190.1 > 2153.3 1604.1 1930.8 1892.6 1651.9 > > [Eden: 1017M(1017M)->0B(1016M) Survivors: 7168K->8192K Heap: > 1062M(3072M)->47M(3072M)] > > [Times: user=2.24 sys=7.22, real=2.49 secs] > > Any help would be greatly appreciated. > > Thanks. > > -Steve > > > ****Confidentiality Note**** This e-mail may contain confidential and or > privileged information and is solely for the use of the sender's intended > recipient(s). Any review, dissemination, copying, printing or other use of > this e-mail by any other persons or entities is prohibited. If you have > received this e-mail in error, please contact the sender immediately by > reply email and delete the material from any computer. > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20131212/6d30855e/attachment.html From monica.b at servergy.com Thu Dec 12 08:46:19 2013 From: monica.b at servergy.com (Monica Beckwith) Date: Thu, 12 Dec 2013 10:46:19 -0600 Subject: Seeking assistance with long garbage collection pauses with G1GC In-Reply-To: <0FF288A8F3C0E44A83B55C4E260BF0032AFD04E3@DNY2I1EXC03.miamiholdings.corp> References: <0FF288A8F3C0E44A83B55C4E260BF0032AFC0B4F@DNY2I1EXC03.miamiholdings.corp> <0FF288A8F3C0E44A83B55C4E260BF0032AFD04E3@DNY2I1EXC03.miamiholdings.corp> Message-ID: <52A9E85B.9070607@servergy.com> Re-trying... (I got a delivery failure wrt the gc-use alias)... Hi Steve - You need to limit your GC threads to the number of cores that are available for that process. (So, in your case, you can go up-to 4 cores) -Monica On 12/10/13 6:16 PM, Steven Toth wrote: > Hello, > > We've been struggling with long pauses with the G1GC garbage collector for several weeks now and was hoping to get some assistance. > > We have a Java app running in a standalone JVM on RHEL. The app listens for data on one or more sockets, queues the data, and has scheduled threads pulling the data off the queue and persisting it. The data is wide, over 700 data elements per record, though all of the data elements are small Strings, Integers, or Longs. > > The app runs smoothly for periods of time, sometimes 30 minutes to an hour, but then we experience one or more long garbage collection pauses. The logs indicate the majority of the pause time is spent in the Object Copy time. The long pauses also have high sys time relative to the other shorter collections. 
> > Here are the JVM details: > > java version "1.7.0_03" > Java(TM) SE Runtime Environment (build 1.7.0_03-b04) Java HotSpot(TM) 64-Bit Server VM (build 22.1-b02, mixed mode) > > Here are the JVM options: > > -XX:MaxPermSize=256m -XX:PermSize=256m -Xms3G -Xmx3G -XX:+UseG1GC -XX:G1HeapRegionSize=32M -XX:-UseGCOverheadLimit \ -Xloggc:logs/gc-STAT5-collector.log -XX:+PrintGCDetails -XX:+PrintGCDateStamps \ -XX:+PrintGCTimeStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=8 -XX:GCLogFileSize=10M -XX:+PrintGCApplicationStoppedTime \ -XX:MaxNewSize=1G -XX:NewSize=1G \ -XX:+PrintGCApplicationStoppedTime -XX:+PrintTenuringDistribution -XX:+PrintAdaptiveSizePolicy > > After several iterations of experimenting with an assortment of options (including no options other than -Xms and -Xmx) the aforementioned options have given us the best performance with the fewest amount of long pauses. However we're still experiencing several dozen garbage collections a day that range from 1-5 seconds. > > The process is taskset to 4 cores (all on the same socket), but is barely using 2 of them. All of the processes on this box are pinned to their own cores (with 0 and 1 unused). The machine has plenty of free memory (20+G) and top shows the process using 2.5G of RES memory. > > A day's worth of garbage collection logs are attached, but here is an example of the GC log output with high Object Copy and sys time. There are numerous GC events comparable to the example below with near identical Eden/Survivors/Heap sizes that take well under 100 millis whereas this example took over 2 seconds. > > [Object Copy (ms): 2090.4 2224.0 2484.0 2160.1 1603.9 2071.2 887.8 1608.1 1992.0 2030.5 1692.5 1583.9 2140.3 1703.0 2174.0 1949.5 1941.1 2190.1 2153.3 1604.1 1930.8 1892.6 1651.9 > > [Eden: 1017M(1017M)->0B(1016M) Survivors: 7168K->8192K Heap: 1062M(3072M)->47M(3072M)] > > [Times: user=2.24 sys=7.22, real=2.49 secs] > > Any help would be greatly appreciated. > > Thanks. > > -Steve > > > ****Confidentiality Note**** This e-mail may contain confidential and or privileged information and is solely for the use of the sender's intended recipient(s). Any review, dissemination, copying, printing or other use of this e-mail by any other persons or entities is prohibited. If you have received this e-mail in error, please contact the sender immediately by reply email and delete the material from any computer. > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20131212/4259eb49/attachment.html From charlesjhunt at gmail.com Thu Dec 12 08:50:30 2013 From: charlesjhunt at gmail.com (charlie hunt) Date: Thu, 12 Dec 2013 10:50:30 -0600 Subject: Seeking assistance with long garbage collection pauses with G1GC In-Reply-To: <0FF288A8F3C0E44A83B55C4E260BF0032AFD04E3@DNY2I1EXC03.miamiholdings.corp> References: <0FF288A8F3C0E44A83B55C4E260BF0032AFC0B4F@DNY2I1EXC03.miamiholdings.corp> <0FF288A8F3C0E44A83B55C4E260BF0032AFD04E3@DNY2I1EXC03.miamiholdings.corp> Message-ID: Fyi, G1 was not officially supported on until JDK 1.7.0_04, aka 7u4. Not only are there many improvements in 7u4 vs 7u3, but many improvements since 7u4. I?d recommend you work with 7u40 or 7u45. All the above said, copy times look incredibly high for a 3 gb Java heap. 
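(For concreteness, the system-side suspects raised so far in this thread can be checked with stock tools; the pid and node numbers below are placeholders, and the THP path varies by kernel - some RHEL 6 kernels expose it as /sys/kernel/mm/redhat_transparent_hugepage/enabled:

cat /sys/kernel/mm/transparent_hugepage/enabled    # is THP enabled at all?
vmstat 1                                           # watch the si/so columns for swap activity during a long pause
pidstat -w -p <jvm-pid> 1                          # context-switch rate of the JVM process

On the JVM side, capping the GC workers to the four cores the process is actually allowed to use, as Monica suggested, would be -XX:ParallelGCThreads=4, and if the box is NUMA, Aaron's numactl suggestion would look roughly like "numactl --cpunodebind=0 --membind=0 java ..." in place of taskset.)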
Depending on your version of RHEL, if transparent huge pages are an available feature on your version RHEL, disable it. You might be seeing huge page coalescing which is contributing to your high sys time. Alternatively you may be paging / swapping, or possibly having high thread context switching. You might also need to throttle back the number GC threads. hths, charlie ? On Dec 10, 2013, at 6:16 PM, Steven Toth wrote: > Hello, > > We've been struggling with long pauses with the G1GC garbage collector for several weeks now and was hoping to get some assistance. > > We have a Java app running in a standalone JVM on RHEL. The app listens for data on one or more sockets, queues the data, and has scheduled threads pulling the data off the queue and persisting it. The data is wide, over 700 data elements per record, though all of the data elements are small Strings, Integers, or Longs. > > The app runs smoothly for periods of time, sometimes 30 minutes to an hour, but then we experience one or more long garbage collection pauses. The logs indicate the majority of the pause time is spent in the Object Copy time. The long pauses also have high sys time relative to the other shorter collections. > > Here are the JVM details: > > java version "1.7.0_03" > Java(TM) SE Runtime Environment (build 1.7.0_03-b04) Java HotSpot(TM) 64-Bit Server VM (build 22.1-b02, mixed mode) > > Here are the JVM options: > > -XX:MaxPermSize=256m -XX:PermSize=256m -Xms3G -Xmx3G -XX:+UseG1GC -XX:G1HeapRegionSize=32M -XX:-UseGCOverheadLimit \ -Xloggc:logs/gc-STAT5-collector.log -XX:+PrintGCDetails -XX:+PrintGCDateStamps \ -XX:+PrintGCTimeStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=8 -XX:GCLogFileSize=10M -XX:+PrintGCApplicationStoppedTime \ -XX:MaxNewSize=1G -XX:NewSize=1G \ -XX:+PrintGCApplicationStoppedTime -XX:+PrintTenuringDistribution -XX:+PrintAdaptiveSizePolicy > > After several iterations of experimenting with an assortment of options (including no options other than -Xms and -Xmx) the aforementioned options have given us the best performance with the fewest amount of long pauses. However we're still experiencing several dozen garbage collections a day that range from 1-5 seconds. > > The process is taskset to 4 cores (all on the same socket), but is barely using 2 of them. All of the processes on this box are pinned to their own cores (with 0 and 1 unused). The machine has plenty of free memory (20+G) and top shows the process using 2.5G of RES memory. > > A day's worth of garbage collection logs are attached, but here is an example of the GC log output with high Object Copy and sys time. There are numerous GC events comparable to the example below with near identical Eden/Survivors/Heap sizes that take well under 100 millis whereas this example took over 2 seconds. > > [Object Copy (ms): 2090.4 2224.0 2484.0 2160.1 1603.9 2071.2 887.8 1608.1 1992.0 2030.5 1692.5 1583.9 2140.3 1703.0 2174.0 1949.5 1941.1 2190.1 2153.3 1604.1 1930.8 1892.6 1651.9 > > [Eden: 1017M(1017M)->0B(1016M) Survivors: 7168K->8192K Heap: 1062M(3072M)->47M(3072M)] > > [Times: user=2.24 sys=7.22, real=2.49 secs] > > Any help would be greatly appreciated. > > Thanks. > > -Steve > > > ****Confidentiality Note**** This e-mail may contain confidential and or privileged information and is solely for the use of the sender's intended recipient(s). Any review, dissemination, copying, printing or other use of this e-mail by any other persons or entities is prohibited. 
If you have received this e-mail in error, please contact the sender immediately by reply email and delete the material from any computer. > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use From yu.zhang at oracle.com Thu Dec 12 12:35:46 2013 From: yu.zhang at oracle.com (YU ZHANG) Date: Thu, 12 Dec 2013 12:35:46 -0800 Subject: Seeking assistance with long garbage collection pauses with G1GC References: 0FF288A8F3C0E44A83B55C4E260BF0032AFC0B4F@DNY2I1EXC03.miamiholdings.corp Message-ID: <52AA1E22.1060907@oracle.com> Steve, >[Times: user=2.24 sys=7.22, real=2.49 secs] The real and user times are about the same, so there is not much parallel activity, and the system time is high. As Charlie said, your system might be swapping or doing other kernel work. -- Thanks, Jenny From wolfgang.pedot at finkzeit.at Wed Dec 18 08:51:59 2013 From: wolfgang.pedot at finkzeit.at (Wolfgang Pedot) Date: Wed, 18 Dec 2013 17:51:59 +0100 Subject: G1 trace humongous allocations Message-ID: <52B1D2AF.6010107@finkzeit.at> Hi, is there a way to trace humongous allocations when using G1? I have a productive application (7u45, ~16GB heap size) and every now and again there are humongous allocations ranging from ~6-60MB. I know this from the output of -XX:+G1PrintRegionLivenessInfo. For example:
### HUMS 0x000000073a000000-0x000000073a800000 8388608 8388608 0 3424136.5
### HUMC 0x000000073a800000-0x000000073b000000 8388608 8388608 0 637263.3
### HUMC 0x000000073b000000-0x000000073b800000 8388608 8388608 0 276982.8
### HUMC 0x000000073b800000-0x000000073c000000 8388608 8388608 0 10190518.0
### HUMC 0x000000073c000000-0x000000073c800000 8388608 8388608 0 20046222.4
### HUMC 0x000000073c800000-0x000000073d000000 6580624 6580624 0 1077287.7
This is taken from the Post-Marking Phase output of the GC log; usually a similar block repeats a couple of times, indicating that either something big has been copied or it is a repeated request. On a normal day there are one or two region dumps containing HUMS/HUMC entries, so they are fairly rare, and judging from the logs they are all rather short-lived. I can pinpoint the allocations to a time window of 30-90 min but so far I have been unsuccessful in finding anything specific in the logs, and there are hundreds of sessions on several different interfaces that could cause this. It's not really an issue, but I don't see why my code should produce objects/arrays this large, so I would like to know which code, or at least which thread, allocated them - it could also be a library. I can't go ahead and dump the heap every half hour in the hope of catching such an object while it is alive; is there a way G1 could log which thread allocated the object, or a stack trace? kind regards Wolfgang Pedot From lucmolinari at gmail.com Wed Dec 18 10:58:09 2013 From: lucmolinari at gmail.com (Luciano Molinari) Date: Wed, 18 Dec 2013 16:58:09 -0200 Subject: YGC time increasing suddenly Message-ID: Hi everybody, We have a standalone Java app that receives requests through RMI and almost all the objects created by it are short-lived (< ~100ms). This app is running on a 24-core server with 16 GB RAM (Red Hat Linux). During our performance tests (10k requests/second) we started to face a problem where the throughput decreases suddenly just a few minutes after the app was started. So, I started to investigate GC behaviour and to make some adjustments (increase memory, use CMS...)
and now we are able to run our app properly for about 35 minutes. At this point the time spent during young collections grows sharply although no Full GC is executed (old gen is only ~4% full). I've done tests with many different parameters, but currently I'm using the following ones: java -server -verbose:gc -XX:+PrintGCDetails -XX:+PrintTenuringDistribution -XX:+PrintGCTimeStamps -XX:PrintFLSStatistics=1 -XX:SurvivorRatio=4 -XX:ParallelGCThreads=8 -XX:PermSize=256m -XX:+UseParNewGC -XX:MaxPermSize=256m -Xms7g -Xmx7g -XX:NewSize=4608m -XX:MaxNewSize=4608m -XX:MaxTenuringThreshold=15 -Dsun.rmi.dgc.client.gcInterval=3600000 -Dsun.rmi.dgc.server.gcInterval=3600000 -Djava.rmi.server.hostname=IP_ADDRESS If I use this same configuration (without CMS) the same problem occurs after 20minutes, so it doesn't seem to be related to CMS. Actually, as I mentioned above, CMS (Full GC) isn't executed during the tests. Some logs I've collected: 1992.748: [ParNew Desired survivor size 402653184 bytes, new threshold 15 (max 15) - age 1: 9308728 bytes, 9308728 total - age 2: 3448 bytes, 9312176 total - age 3: 1080 bytes, 9313256 total - age 4: 32 bytes, 9313288 total - age 5: 34768 bytes, 9348056 total - age 6: 32 bytes, 9348088 total - age 15: 2712 bytes, 9350800 total : 3154710K->10313K(3932160K), 0.0273150 secs] 3215786K->71392K(6553600K) //14 YGC happened during this window 2021.165: [ParNew Desired survivor size 402653184 bytes, new threshold 15 (max 15) - age 1: 9459544 bytes, 9459544 total - age 2: 3648200 bytes, 13107744 total - age 3: 3837976 bytes, 16945720 total - age 4: 3472448 bytes, 20418168 total - age 5: 3586896 bytes, 24005064 total - age 6: 3475560 bytes, 27480624 total - age 7: 3520952 bytes, 31001576 total - age 8: 3612088 bytes, 34613664 total - age 9: 3355160 bytes, 37968824 total - age 10: 3823032 bytes, 41791856 total - age 11: 3304576 bytes, 45096432 total - age 12: 3671288 bytes, 48767720 total - age 13: 3558696 bytes, 52326416 total - age 14: 3805744 bytes, 56132160 total - age 15: 3429672 bytes, 59561832 total : 3230658K->77508K(3932160K), 0.1143860 secs] 3291757K->142447K(6553600K) Besides the longer time to perform collection, I also realized that all 15 ages started to have larger values. I must say I'm a little confused about this scenario. Does anyone have some tip? Thanks in advance, -- Luciano -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20131218/0a43e1a3/attachment.html From bernd-2013 at eckenfels.net Wed Dec 18 11:40:57 2013 From: bernd-2013 at eckenfels.net (Bernd Eckenfels) Date: Wed, 18 Dec 2013 20:40:57 +0100 Subject: YGC time increasing suddenly In-Reply-To: References: Message-ID: Hello, I would look at the finalizer queue first. And if that does not cut it, take a heapdump and inspect it for unexpected large dominators (maybe cached softreferences - not sure about RMI DGC havent seen problems with it, but it sure can be a problem if it only cleans up once an hour.). How often do you see YGC at the beginning and then over time? It looks like every 2s? You might want to resize YGC by larger factors (but with the yg already at 4g I guess something else is a problem here). You claim that most of the data only lives for 100ms, that does not match with the age-size distribution (not at the beginning not at the end). 
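Concretely, the first two suggestions map onto the JDK's own tools; <pid> is a placeholder, and note that the "live" dump option triggers a full collection of its own:

jmap -finalizerinfo <pid>                        # how many objects are queued for finalization
jmap -dump:live,format=b,file=heap.hprof <pid>   # heap dump; open in MAT or VisualVM and check the dominator tree

A plain "jmap -histo <pid>" taken a few minutes apart is also a cheap way to see which classes keep growing between young collections.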
Gruss Bernd Am 18.12.2013, 19:58 Uhr, schrieb Luciano Molinari : > Hi everybody, > > We have a standalone Java app that receives requests through RMI and > almost > all the objects created by it are short (< ~100ms) lived objects. > This app is running on a 24 cores server with 16 GB RAM (Red Hat Linux). > During our performance tests (10k requests/second) we started to face a > problem where the throughput decreases suddenly just a few minutes > after the app was started. > So, I started to investigate GC behaviour and to make some adjustments > (increase memory, use CMS...) and now we are able to run our app > properly for about 35 minutes. At this point the time spent during young > collections grows sharply although no Full GC is executed (old gen is > only > ~4% full). > > I've done tests with many different parameters, but currently I'm using > the > following ones: > java -server -verbose:gc -XX:+PrintGCDetails > -XX:+PrintTenuringDistribution > -XX:+PrintGCTimeStamps -XX:PrintFLSStatistics=1 -XX:SurvivorRatio=4 > -XX:ParallelGCThreads=8 -XX:PermSize=256m -XX:+UseParNewGC > -XX:MaxPermSize=256m -Xms7g -Xmx7g -XX:NewSize=4608m -XX:MaxNewSize=4608m > -XX:MaxTenuringThreshold=15 -Dsun.rmi.dgc.client.gcInterval=3600000 > -Dsun.rmi.dgc.server.gcInterval=3600000 > -Djava.rmi.server.hostname=IP_ADDRESS > > If I use this same configuration (without CMS) the same problem occurs > after 20minutes, so it doesn't seem to be related to CMS. Actually, as I > mentioned above, CMS (Full GC) isn't executed during the tests. > > Some logs I've collected: > > 1992.748: [ParNew > Desired survivor size 402653184 bytes, new threshold 15 (max 15) > - age 1: 9308728 bytes, 9308728 total > - age 2: 3448 bytes, 9312176 total > - age 3: 1080 bytes, 9313256 total > - age 4: 32 bytes, 9313288 total > - age 5: 34768 bytes, 9348056 total > - age 6: 32 bytes, 9348088 total > - age 15: 2712 bytes, 9350800 total > : 3154710K->10313K(3932160K), 0.0273150 secs] 3215786K->71392K(6553600K) > > //14 YGC happened during this window > > 2021.165: [ParNew > Desired survivor size 402653184 bytes, new threshold 15 (max 15) > - age 1: 9459544 bytes, 9459544 total > - age 2: 3648200 bytes, 13107744 total > - age 3: 3837976 bytes, 16945720 total > - age 4: 3472448 bytes, 20418168 total > - age 5: 3586896 bytes, 24005064 total > - age 6: 3475560 bytes, 27480624 total > - age 7: 3520952 bytes, 31001576 total > - age 8: 3612088 bytes, 34613664 total > - age 9: 3355160 bytes, 37968824 total > - age 10: 3823032 bytes, 41791856 total > - age 11: 3304576 bytes, 45096432 total > - age 12: 3671288 bytes, 48767720 total > - age 13: 3558696 bytes, 52326416 total > - age 14: 3805744 bytes, 56132160 total > - age 15: 3429672 bytes, 59561832 total > : 3230658K->77508K(3932160K), 0.1143860 secs] 3291757K->142447K(6553600K) > > Besides the longer time to perform collection, I also realized that all > 15 > ages started to have larger values. > > I must say I'm a little confused about this scenario. Does anyone have > some > tip? > > Thanks in advance, -- http://bernd.eckenfels.net From wolfgang.pedot at finkzeit.at Wed Dec 18 11:57:20 2013 From: wolfgang.pedot at finkzeit.at (Wolfgang Pedot) Date: Wed, 18 Dec 2013 20:57:20 +0100 Subject: YGC time increasing suddenly In-Reply-To: References: Message-ID: <52B1FE20.3010909@finkzeit.at> Hi, this is the first time I write an answer on this mailing-list so this could be totally useless but here goes: Your survivor-space seems to be quite empty, is the usage that low on all collects during your test? 
If so you could increase your survivor-ratio to gain more eden-space and if not many objects die in survivor you could also reduce the tenuring threshold. Total survivor usage has grown 6-fold from first to last GC and survivor space needs to be copied on each young gc. I admit it should probably not take that long to copy 60MB though... Here is a young-gc from one of my logs for comparison: 30230.123: [ParNew Desired survivor size 524261784 bytes, new threshold 12 (max 15) - age 1: 113917760 bytes, 113917760 total - age 2: 86192768 bytes, 200110528 total - age 3: 59060992 bytes, 259171520 total - age 4: 59319272 bytes, 318490792 total - age 5: 45307432 bytes, 363798224 total - age 6: 29478464 bytes, 393276688 total - age 7: 27440744 bytes, 420717432 total - age 8: 27947680 bytes, 448665112 total - age 9: 27294496 bytes, 475959608 total - age 10: 32830144 bytes, 508789752 total - age 11: 7490968 bytes, 516280720 total - age 12: 10723104 bytes, 527003824 total - age 13: 4549808 bytes, 531553632 total : 4306611K->731392K(4388608K), 0.1433810 secs] 10422356K->6878961K(14116608K) This is with MaxNewSize 5500m and a Survivor-Ratio of 8. You can see that GC-time is higher than yours (6core 3.33GHz Xeon), survivor-usage is way higher though. Hope I could help Wolfgang Am 18.12.2013 19:58, schrieb Luciano Molinari: > Hi everybody, > > We have a standalone Java app that receives requests through RMI and > almost all the objects created by it are short (< ~100ms) lived objects. > This app is running on a 24 cores server with 16 GB RAM (Red Hat Linux). > During our performance tests (10k requests/second) we started to face a > problem where the throughput decreases suddenly just a few minutes > after the app was started. > So, I started to investigate GC behaviour and to make some adjustments > (increase memory, use CMS...) and now we are able to run our app > properly for about 35 minutes. At this point the time spent during young > collections grows sharply although no Full GC is executed (old gen is > only ~4% full). > > I've done tests with many different parameters, but currently I'm using > the following ones: > java -server -verbose:gc -XX:+PrintGCDetails > -XX:+PrintTenuringDistribution -XX:+PrintGCTimeStamps > -XX:PrintFLSStatistics=1 -XX:SurvivorRatio=4 > -XX:ParallelGCThreads=8 -XX:PermSize=256m -XX:+UseParNewGC > -XX:MaxPermSize=256m -Xms7g -Xmx7g -XX:NewSize=4608m -XX:MaxNewSize=4608m > -XX:MaxTenuringThreshold=15 -Dsun.rmi.dgc.client.gcInterval=3600000 > -Dsun.rmi.dgc.server.gcInterval=3600000 > -Djava.rmi.server.hostname=IP_ADDRESS > > If I use this same configuration (without CMS) the same problem occurs > after 20minutes, so it doesn't seem to be related to CMS. Actually, as I > mentioned above, CMS (Full GC) isn't executed during the tests. 
> > Some logs I've collected: > > 1992.748: [ParNew > Desired survivor size 402653184 bytes, new threshold 15 (max 15) > - age 1: 9308728 bytes, 9308728 total > - age 2: 3448 bytes, 9312176 total > - age 3: 1080 bytes, 9313256 total > - age 4: 32 bytes, 9313288 total > - age 5: 34768 bytes, 9348056 total > - age 6: 32 bytes, 9348088 total > - age 15: 2712 bytes, 9350800 total > : 3154710K->10313K(3932160K), 0.0273150 secs] 3215786K->71392K(6553600K) > > //14 YGC happened during this window > > 2021.165: [ParNew > Desired survivor size 402653184 bytes, new threshold 15 (max 15) > - age 1: 9459544 bytes, 9459544 total > - age 2: 3648200 bytes, 13107744 total > - age 3: 3837976 bytes, 16945720 total > - age 4: 3472448 bytes, 20418168 total > - age 5: 3586896 bytes, 24005064 total > - age 6: 3475560 bytes, 27480624 total > - age 7: 3520952 bytes, 31001576 total > - age 8: 3612088 bytes, 34613664 total > - age 9: 3355160 bytes, 37968824 total > - age 10: 3823032 bytes, 41791856 total > - age 11: 3304576 bytes, 45096432 total > - age 12: 3671288 bytes, 48767720 total > - age 13: 3558696 bytes, 52326416 total > - age 14: 3805744 bytes, 56132160 total > - age 15: 3429672 bytes, 59561832 total > : 3230658K->77508K(3932160K), 0.1143860 secs] 3291757K->142447K(6553600K) > > Besides the longer time to perform collection, I also realized that all > 15 ages started to have larger values. > > I must say I'm a little confused about this scenario. Does anyone have > some tip? > > Thanks in advance, > -- > Luciano > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > From charlesjhunt at gmail.com Wed Dec 18 11:58:21 2013 From: charlesjhunt at gmail.com (charlie hunt) Date: Wed, 18 Dec 2013 13:58:21 -0600 Subject: G1 trace humongous allocations In-Reply-To: <52B1D2AF.6010107@finkzeit.at> References: <52B1D2AF.6010107@finkzeit.at> Message-ID: <6438E555-0DF5-4A1E-8E80-1681C50E894B@gmail.com> AFAIK, there is not a JVM command line option that will dump the information you are looking. Although I haven?t tried it, I have heard that there is a JFR (Java Flight Recorder) event for what the JVM in general considers large object allocations, (i.e. allocations to go on what?s called the ?slow path?). You can use JFR, which is part of Java Mission Control, you?ll need to be on JDK 7u40+, iirc. To use JFR / Mission Control in production environments you have to have a license, or some kind of purchased support for it. But, it is free to use in non-production environments. I have yet to use JFR / Mission Control. But, it?s on my short list. So, I wouldn?t be the person to ask how to use Mission Control and JFR to collect the information you?re looking for. hths, charlie ... On Dec 18, 2013, at 10:51 AM, Wolfgang Pedot wrote: > Hi, > > is there a way to trace humongous allocations when using G1? I have a > productive application (7u45, ~16GB Heap-Size) and every now and again > there are humongous allocations ranging from ~6-60MB. I know this from > the output of -XX:+G1PrintRegionLivenessInfo. 
For example: > > ### HUMS 0x000000073a000000-0x000000073a800000 8388608 8388608 > 0 3424136.5 > ### HUMC 0x000000073a800000-0x000000073b000000 8388608 8388608 > 0 637263.3 > ### HUMC 0x000000073b000000-0x000000073b800000 8388608 8388608 > 0 276982.8 > ### HUMC 0x000000073b800000-0x000000073c000000 8388608 8388608 > 0 10190518.0 > ### HUMC 0x000000073c000000-0x000000073c800000 8388608 8388608 > 0 20046222.4 > ### HUMC 0x000000073c800000-0x000000073d000000 6580624 6580624 > 0 1077287.7 > > This is taken from the Post-Marking Phase output of the GC-Log, usually > a similar block repeats a couple of times indicating that either > something big has been copied or its a repeated request. On a normal day > there are one or two region-dumps containing HUMS/HUMC entries so they > are fairly rare and judging from the logs they are all rather > short-lived. I can pinpoint the allocations to a timewindow of 30-90min > but so far have been unsuccessful to find something specific in the logs > and there are hundrets of sessions on several different interfaces that > could cause this. Its not really an issue but I dont see why my code > should produce objects/arrays this large so I would like to know what > code or at least thread did as it could also be a library. > > I cant go ahead and dump the heap every half hour in the hope to catch > such an object while its alive, is there a way G1 could log which thread > allocated the object or a stacktrace? > > kind regards > Wolfgang Pedot > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use From rednaxelafx at gmail.com Wed Dec 18 12:37:30 2013 From: rednaxelafx at gmail.com (Krystal Mok) Date: Wed, 18 Dec 2013 12:37:30 -0800 Subject: G1 trace humongous allocations In-Reply-To: <6438E555-0DF5-4A1E-8E80-1681C50E894B@gmail.com> References: <52B1D2AF.6010107@finkzeit.at> <6438E555-0DF5-4A1E-8E80-1681C50E894B@gmail.com> Message-ID: Hi guys, I don't see a probe point for large object allocation in the OpenJDK 7u code. If there is one, than it should have been in http://hg.openjdk.java.net/jdk7u/jdk7u/hotspot/file/0025a2a965c8/src/share/vm/gc_interface/collectedHeap.inline.hpp The probe points that JFR uses should be all in the OpenJDK, so my guess is that there isn't an event for large object allocation per se. That said, I haven't used JFR in JDK7u45 yet, so take my words with a grain of salt. There's supposed to be a imprecise object allocation profiler in JFR, which records that type and stack trace of the object to be allocated when the TLAB has to refill. But I just did a quick check in OpenJDK7u code and didn't see the probe point for that, so it probably hasn't made it in yet. If you can build OpenJDK yourself, then there is a patch that my colleague and I did when I was in Taobao, specifically to log the kind of event you have in mind. The patch is here, under TAOBAOJDK-006: http://jvm.taobao.org/index.php/Sc_jvm_customization . The usage is mentioned here (from page 47): http://www.slideshare.net/RednaxelaFX/jvm-taobao The patch is made against HotSpot Express 20, which corresponds to JDK6u25. It may need some tweaks to work on OpenJDK7u. Anyway, you can always use other alternatives, like using the memory profiler that comes with VisualVM, which will instrument your bytecode to record type and stack trace info. These alternatives come with some overhead, but it might be worth it just for catching the problem you're having. 
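(If the oversized buffers are built by code you control, an even cruder alternative that needs no tooling at all is to funnel the suspect allocations through a small helper and log a stack trace above a threshold. The class below is purely hypothetical; the 4 MB cut-off assumes the usual G1 rule that an object of half a region size or more becomes humongous, which with 8 MB regions on a 16 GB heap means roughly 4 MB and up.)

import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper, not part of any library mentioned in this thread.
// Logs a stack trace whenever application code asks for a buffer large
// enough to end up as a G1 humongous object.
public final class LargeAllocLogger {

    // Half of an assumed 8 MB region size; adjust to the real region size.
    private static final int THRESHOLD_BYTES = 4 * 1024 * 1024;

    private static final AtomicLong HITS = new AtomicLong();

    private LargeAllocLogger() {}

    public static byte[] newBuffer(int size) {
        if (size >= THRESHOLD_BYTES) {
            new Throwable("humongous candidate #" + HITS.incrementAndGet()
                    + ": " + size + " bytes on thread "
                    + Thread.currentThread().getName()).printStackTrace();
        }
        return new byte[size];
    }
}

Routing the handful of large byte[]/StringBuilder allocation sites through such a wrapper will not catch allocations made inside third-party libraries, but it narrows things down without instrumenting the whole application.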
HTH, - Kris On Wed, Dec 18, 2013 at 11:58 AM, charlie hunt wrote: > AFAIK, there is not a JVM command line option that will dump the > information you are looking. > > Although I haven?t tried it, I have heard that there is a JFR (Java Flight > Recorder) event for what the JVM in general considers large object > allocations, (i.e. allocations to go on what?s called the ?slow path?). > You can use JFR, which is part of Java Mission Control, you?ll need to be > on JDK 7u40+, iirc. > > To use JFR / Mission Control in production environments you have to have a > license, or some kind of purchased support for it. But, it is free to use > in non-production environments. > > I have yet to use JFR / Mission Control. But, it?s on my short list. So, > I wouldn?t be the person to ask how to use Mission Control and JFR to > collect the information you?re looking for. > > hths, > > charlie ... > > On Dec 18, 2013, at 10:51 AM, Wolfgang Pedot > wrote: > > > Hi, > > > > is there a way to trace humongous allocations when using G1? I have a > > productive application (7u45, ~16GB Heap-Size) and every now and again > > there are humongous allocations ranging from ~6-60MB. I know this from > > the output of -XX:+G1PrintRegionLivenessInfo. For example: > > > > ### HUMS 0x000000073a000000-0x000000073a800000 8388608 8388608 > > 0 3424136.5 > > ### HUMC 0x000000073a800000-0x000000073b000000 8388608 8388608 > > 0 637263.3 > > ### HUMC 0x000000073b000000-0x000000073b800000 8388608 8388608 > > 0 276982.8 > > ### HUMC 0x000000073b800000-0x000000073c000000 8388608 8388608 > > 0 10190518.0 > > ### HUMC 0x000000073c000000-0x000000073c800000 8388608 8388608 > > 0 20046222.4 > > ### HUMC 0x000000073c800000-0x000000073d000000 6580624 6580624 > > 0 1077287.7 > > > > This is taken from the Post-Marking Phase output of the GC-Log, usually > > a similar block repeats a couple of times indicating that either > > something big has been copied or its a repeated request. On a normal day > > there are one or two region-dumps containing HUMS/HUMC entries so they > > are fairly rare and judging from the logs they are all rather > > short-lived. I can pinpoint the allocations to a timewindow of 30-90min > > but so far have been unsuccessful to find something specific in the logs > > and there are hundrets of sessions on several different interfaces that > > could cause this. Its not really an issue but I dont see why my code > > should produce objects/arrays this large so I would like to know what > > code or at least thread did as it could also be a library. > > > > I cant go ahead and dump the heap every half hour in the hope to catch > > such an object while its alive, is there a way G1 could log which thread > > allocated the object or a stacktrace? > > > > kind regards > > Wolfgang Pedot > > > > _______________________________________________ > > hotspot-gc-use mailing list > > hotspot-gc-use at openjdk.java.net > > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20131218/ae8ea48f/attachment.html From charlesjhunt at gmail.com Wed Dec 18 13:53:48 2013 From: charlesjhunt at gmail.com (charlie hunt) Date: Wed, 18 Dec 2013 15:53:48 -0600 Subject: G1 trace humongous allocations In-Reply-To: References: <52B1D2AF.6010107@finkzeit.at> <6438E555-0DF5-4A1E-8E80-1681C50E894B@gmail.com> Message-ID: One could argue there isn't an explicit probe point for large object allocations. But, there is one for object allocations outside of a TLAB. And, this is the one that I had in mind. You can find it in the source file Kris mentioned above, http://hg.openjdk.java.net/jdk7u/jdk7u/hotspot/file/0025a2a965c8/src/share/vm/gc_interface/collectedHeap.inline.hpp on line 154 in the method CollectedHeap::common_mem_allocate_noinit(KlassHandle klass, size_t size, TRAPS). The probe is this piece of code: AllocTracer::send_allocation_outside_tlab_event(klass, size * HeapWordSize); Based on the trace.xml http://hg.openjdk.java.net/jdk7u/jdk7u/hotspot/file/0025a2a965c8/src/share/vm/trace/trace.xml, on line 347, you should find the "AllocObjectOutsideTLAB" event name. So, if you use JFR, and subscribe to that AllocObjectOutsideTLAB event, you will should see object allocations that occur outside a TLAB, and that event will include the size of the allocation, and include a stack trace along with class and thread involved. Though it's not *exactly* what Wolfgang may be looking for, it narrows down the possibilities and the intrusiveness of getting the information won't be nearly as great as with a profiler such as VisualVM. However, remember the license restrictions on JFR/Mission Control. In the past I've done as Kris suggested too where I've built a custom / modified version of OpenJDK to print this information. But, hacking OpenJDK and building OpenJDK, and then feeling safe about running that modified version is not for everyone. ;-) Hence, I didn't mention that path. hths, charlie ... On Wed, Dec 18, 2013 at 2:37 PM, Krystal Mok wrote: > Hi guys, > > I don't see a probe point for large object allocation in the OpenJDK 7u > code. If there is one, than it should have been in > http://hg.openjdk.java.net/jdk7u/jdk7u/hotspot/file/0025a2a965c8/src/share/vm/gc_interface/collectedHeap.inline.hpp > > The probe points that JFR uses should be all in the OpenJDK, so my guess > is that there isn't an event for large object allocation per se. That said, > I haven't used JFR in JDK7u45 yet, so take my words with a grain of salt. > > There's supposed to be a imprecise object allocation profiler in JFR, > which records that type and stack trace of the object to be allocated when > the TLAB has to refill. But I just did a quick check in OpenJDK7u code and > didn't see the probe point for that, so it probably hasn't made it in yet. > > If you can build OpenJDK yourself, then there is a patch that my colleague > and I did when I was in Taobao, specifically to log the kind of event you > have in mind. > The patch is here, under TAOBAOJDK-006: > http://jvm.taobao.org/index.php/Sc_jvm_customization . > The usage is mentioned here (from page 47): > http://www.slideshare.net/RednaxelaFX/jvm-taobao > The patch is made against HotSpot Express 20, which corresponds to > JDK6u25. It may need some tweaks to work on OpenJDK7u. > > Anyway, you can always use other alternatives, like using the memory > profiler that comes with VisualVM, which will instrument your bytecode to > record type and stack trace info. 
These alternatives come with some > overhead, but it might be worth it just for catching the problem you're > having. > > HTH, > - Kris > > > On Wed, Dec 18, 2013 at 11:58 AM, charlie hunt wrote: > >> AFAIK, there is not a JVM command line option that will dump the >> information you are looking. >> >> Although I haven?t tried it, I have heard that there is a JFR (Java >> Flight Recorder) event for what the JVM in general considers large object >> allocations, (i.e. allocations to go on what?s called the ?slow path?). >> You can use JFR, which is part of Java Mission Control, you?ll need to be >> on JDK 7u40+, iirc. >> >> To use JFR / Mission Control in production environments you have to have >> a license, or some kind of purchased support for it. But, it is free to >> use in non-production environments. >> >> I have yet to use JFR / Mission Control. But, it?s on my short list. >> So, I wouldn?t be the person to ask how to use Mission Control and JFR to >> collect the information you?re looking for. >> >> hths, >> >> charlie ... >> >> On Dec 18, 2013, at 10:51 AM, Wolfgang Pedot >> wrote: >> >> > Hi, >> > >> > is there a way to trace humongous allocations when using G1? I have a >> > productive application (7u45, ~16GB Heap-Size) and every now and again >> > there are humongous allocations ranging from ~6-60MB. I know this from >> > the output of -XX:+G1PrintRegionLivenessInfo. For example: >> > >> > ### HUMS 0x000000073a000000-0x000000073a800000 8388608 8388608 >> > 0 3424136.5 >> > ### HUMC 0x000000073a800000-0x000000073b000000 8388608 8388608 >> > 0 637263.3 >> > ### HUMC 0x000000073b000000-0x000000073b800000 8388608 8388608 >> > 0 276982.8 >> > ### HUMC 0x000000073b800000-0x000000073c000000 8388608 8388608 >> > 0 10190518.0 >> > ### HUMC 0x000000073c000000-0x000000073c800000 8388608 8388608 >> > 0 20046222.4 >> > ### HUMC 0x000000073c800000-0x000000073d000000 6580624 6580624 >> > 0 1077287.7 >> > >> > This is taken from the Post-Marking Phase output of the GC-Log, usually >> > a similar block repeats a couple of times indicating that either >> > something big has been copied or its a repeated request. On a normal day >> > there are one or two region-dumps containing HUMS/HUMC entries so they >> > are fairly rare and judging from the logs they are all rather >> > short-lived. I can pinpoint the allocations to a timewindow of 30-90min >> > but so far have been unsuccessful to find something specific in the logs >> > and there are hundrets of sessions on several different interfaces that >> > could cause this. Its not really an issue but I dont see why my code >> > should produce objects/arrays this large so I would like to know what >> > code or at least thread did as it could also be a library. >> > >> > I cant go ahead and dump the heap every half hour in the hope to catch >> > such an object while its alive, is there a way G1 could log which thread >> > allocated the object or a stacktrace? >> > >> > kind regards >> > Wolfgang Pedot >> > >> > _______________________________________________ >> > hotspot-gc-use mailing list >> > hotspot-gc-use at openjdk.java.net >> > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20131218/6f176b53/attachment.html From rednaxelafx at gmail.com Wed Dec 18 14:32:48 2013 From: rednaxelafx at gmail.com (Krystal Mok) Date: Wed, 18 Dec 2013 14:32:48 -0800 Subject: G1 trace humongous allocations In-Reply-To: References: <52B1D2AF.6010107@finkzeit.at> <6438E555-0DF5-4A1E-8E80-1681C50E894B@gmail.com> Message-ID: Thanks for the tip, Charlie! More comments inline: On Wed, Dec 18, 2013 at 1:53 PM, charlie hunt wrote: > One could argue there isn't an explicit probe point for large object > allocations. But, there is one for object allocations outside of a TLAB. > And, this is the one that I had in mind. You can find it in the source > file Kris mentioned above, > http://hg.openjdk.java.net/jdk7u/jdk7u/hotspot/file/0025a2a965c8/src/share/vm/gc_interface/collectedHeap.inline.hpp on > line 154 in the > method CollectedHeap::common_mem_allocate_noinit(KlassHandle klass, size_t > size, TRAPS). The probe is this piece of > code: AllocTracer::send_allocation_outside_tlab_event(klass, size * > HeapWordSize); > > Thanks. I totally missed this one, because it's not in the function that I expected it to be...oops. So the other use of AllocTracer is AllocTracer::send_allocation_in_new_tlab_event(klass, new_tlab_size * HeapWordSize, size * HeapWordSize); is in collectedHeap.cpp, line 270: http://hg.openjdk.java.net/jdk7u/jdk7u/hotspot/file/0025a2a965c8/src/share/vm/gc_interface/collectedHeap.cpp This is the probe point that I had in mind when I mentioned about the imprecise object allocation profiler. It's imprecise for "normal" small objects because most allocation goes through the fast path, where as this probe point is on the slow path of having to refill the TLAB. But, since large objects tend to be allocated outside of the TLAB, and they're always allocated through the slow path, Charlie is right that you should narrow your focus on allocation events outside of TLAB, and it should actually be pretty precise (meaning you won't miss anything). > Based on the trace.xml > http://hg.openjdk.java.net/jdk7u/jdk7u/hotspot/file/0025a2a965c8/src/share/vm/trace/trace.xml, > on line 347, you should find the "AllocObjectOutsideTLAB" event name. > > So, if you use JFR, and subscribe to that AllocObjectOutsideTLAB event, > you will should see object allocations that occur outside a TLAB, and that > event will include the size of the allocation, and include a stack trace > along with class and thread involved. > > Though it's not *exactly* what Wolfgang may be looking for, it narrows > down the possibilities and the intrusiveness of getting the information > won't be nearly as great as with a profiler such as VisualVM. However, > remember the license restrictions on JFR/Mission Control. > > In the past I've done as Kris suggested too where I've built a custom / > modified version of OpenJDK to print this information. But, hacking > OpenJDK and building OpenJDK, and then feeling safe about running that > modified version is not for everyone. ;-) Hence, I didn't mention that > path. > > Yep. The "feeling safe" part is hard for a lot of people, even for this kind of small modification. It is possible to screw things up even with a change this small, though, e.g. taking the wrong lock at the wrong time when printing information...so people scared do have a point ;-) - Kris > hths, > > charlie ... 
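(For reference, kicking off such a recording from the command line on 7u40+ looks roughly like the line below. This is a sketch from memory rather than a verified incantation: the "profile" settings template is assumed to have the TLAB allocation events enabled, and the commercial-features licensing caveat mentioned above still applies in production.)

java -XX:+UnlockCommercialFeatures -XX:+FlightRecorder \
     -XX:StartFlightRecording=settings=profile,duration=10m,filename=alloc.jfr \
     ...

The resulting alloc.jfr can then be opened in Mission Control and filtered on the allocation-outside-TLAB events to see the size, thread and stack trace of each such allocation.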
> > > > On Wed, Dec 18, 2013 at 2:37 PM, Krystal Mok wrote: > >> Hi guys, >> >> I don't see a probe point for large object allocation in the OpenJDK 7u >> code. If there is one, than it should have been in >> http://hg.openjdk.java.net/jdk7u/jdk7u/hotspot/file/0025a2a965c8/src/share/vm/gc_interface/collectedHeap.inline.hpp >> >> The probe points that JFR uses should be all in the OpenJDK, so my guess >> is that there isn't an event for large object allocation per se. That said, >> I haven't used JFR in JDK7u45 yet, so take my words with a grain of salt. >> >> There's supposed to be a imprecise object allocation profiler in JFR, >> which records that type and stack trace of the object to be allocated when >> the TLAB has to refill. But I just did a quick check in OpenJDK7u code and >> didn't see the probe point for that, so it probably hasn't made it in yet. >> >> If you can build OpenJDK yourself, then there is a patch that my >> colleague and I did when I was in Taobao, specifically to log the kind of >> event you have in mind. >> The patch is here, under TAOBAOJDK-006: >> http://jvm.taobao.org/index.php/Sc_jvm_customization . >> The usage is mentioned here (from page 47): >> http://www.slideshare.net/RednaxelaFX/jvm-taobao >> The patch is made against HotSpot Express 20, which corresponds to >> JDK6u25. It may need some tweaks to work on OpenJDK7u. >> >> Anyway, you can always use other alternatives, like using the memory >> profiler that comes with VisualVM, which will instrument your bytecode to >> record type and stack trace info. These alternatives come with some >> overhead, but it might be worth it just for catching the problem you're >> having. >> >> HTH, >> - Kris >> >> >> On Wed, Dec 18, 2013 at 11:58 AM, charlie hunt wrote: >> >>> AFAIK, there is not a JVM command line option that will dump the >>> information you are looking. >>> >>> Although I haven?t tried it, I have heard that there is a JFR (Java >>> Flight Recorder) event for what the JVM in general considers large object >>> allocations, (i.e. allocations to go on what?s called the ?slow path?). >>> You can use JFR, which is part of Java Mission Control, you?ll need to be >>> on JDK 7u40+, iirc. >>> >>> To use JFR / Mission Control in production environments you have to have >>> a license, or some kind of purchased support for it. But, it is free to >>> use in non-production environments. >>> >>> I have yet to use JFR / Mission Control. But, it?s on my short list. >>> So, I wouldn?t be the person to ask how to use Mission Control and JFR to >>> collect the information you?re looking for. >>> >>> hths, >>> >>> charlie ... >>> >>> On Dec 18, 2013, at 10:51 AM, Wolfgang Pedot >>> wrote: >>> >>> > Hi, >>> > >>> > is there a way to trace humongous allocations when using G1? I have a >>> > productive application (7u45, ~16GB Heap-Size) and every now and again >>> > there are humongous allocations ranging from ~6-60MB. I know this from >>> > the output of -XX:+G1PrintRegionLivenessInfo. 
For example: >>> > >>> > ### HUMS 0x000000073a000000-0x000000073a800000 8388608 8388608 >>> > 0 3424136.5 >>> > ### HUMC 0x000000073a800000-0x000000073b000000 8388608 8388608 >>> > 0 637263.3 >>> > ### HUMC 0x000000073b000000-0x000000073b800000 8388608 8388608 >>> > 0 276982.8 >>> > ### HUMC 0x000000073b800000-0x000000073c000000 8388608 8388608 >>> > 0 10190518.0 >>> > ### HUMC 0x000000073c000000-0x000000073c800000 8388608 8388608 >>> > 0 20046222.4 >>> > ### HUMC 0x000000073c800000-0x000000073d000000 6580624 6580624 >>> > 0 1077287.7 >>> > >>> > This is taken from the Post-Marking Phase output of the GC-Log, usually >>> > a similar block repeats a couple of times indicating that either >>> > something big has been copied or its a repeated request. On a normal >>> day >>> > there are one or two region-dumps containing HUMS/HUMC entries so they >>> > are fairly rare and judging from the logs they are all rather >>> > short-lived. I can pinpoint the allocations to a timewindow of 30-90min >>> > but so far have been unsuccessful to find something specific in the >>> logs >>> > and there are hundrets of sessions on several different interfaces that >>> > could cause this. Its not really an issue but I dont see why my code >>> > should produce objects/arrays this large so I would like to know what >>> > code or at least thread did as it could also be a library. >>> > >>> > I cant go ahead and dump the heap every half hour in the hope to catch >>> > such an object while its alive, is there a way G1 could log which >>> thread >>> > allocated the object or a stacktrace? >>> > >>> > kind regards >>> > Wolfgang Pedot >>> > >>> > _______________________________________________ >>> > hotspot-gc-use mailing list >>> > hotspot-gc-use at openjdk.java.net >>> > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>> >>> _______________________________________________ >>> hotspot-gc-use mailing list >>> hotspot-gc-use at openjdk.java.net >>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20131218/3d6bf1ea/attachment-0001.html From charlesjhunt at gmail.com Wed Dec 18 15:06:36 2013 From: charlesjhunt at gmail.com (charlie hunt) Date: Wed, 18 Dec 2013 17:06:36 -0600 Subject: G1 trace humongous allocations In-Reply-To: References: <52B1D2AF.6010107@finkzeit.at> <6438E555-0DF5-4A1E-8E80-1681C50E894B@gmail.com> Message-ID: <9DCA111F-2C9A-431A-809D-4CE24F2E0B14@gmail.com> Short additional comment below. On Dec 18, 2013, at 4:32 PM, Krystal Mok wrote: > Thanks for the tip, Charlie! > More comments inline: > > On Wed, Dec 18, 2013 at 1:53 PM, charlie hunt wrote: > One could argue there isn't an explicit probe point for large object allocations. But, there is one for object allocations outside of a TLAB. And, this is the one that I had in mind. You can find it in the source file Kris mentioned above, http://hg.openjdk.java.net/jdk7u/jdk7u/hotspot/file/0025a2a965c8/src/share/vm/gc_interface/collectedHeap.inline.hpp on line 154 in the method CollectedHeap::common_mem_allocate_noinit(KlassHandle klass, size_t size, TRAPS). The probe is this piece of code: AllocTracer::send_allocation_outside_tlab_event(klass, size * HeapWordSize); > > Thanks. I totally missed this one, because it's not in the function that I expected it to be...oops. 
> > So the other use of AllocTracer is AllocTracer::send_allocation_in_new_tlab_event(klass, new_tlab_size * HeapWordSize, size * HeapWordSize); is in collectedHeap.cpp, line 270: http://hg.openjdk.java.net/jdk7u/jdk7u/hotspot/file/0025a2a965c8/src/share/vm/gc_interface/collectedHeap.cpp > This is the probe point that I had in mind when I mentioned about the imprecise object allocation profiler. It's imprecise for "normal" small objects because most allocation goes through the fast path, where as this probe point is on the slow path of having to refill the TLAB. Yep, iirc, the so called fast code path is (buried) in the (JIT) compiler code. And, I?m not smart enough to understand that (compiler) stuff. ;-) > > But, since large objects tend to be allocated outside of the TLAB, and they're always allocated through the slow path, Charlie is right that you should narrow your focus on allocation events outside of TLAB, and it should actually be pretty precise (meaning you won't miss anything). > > Based on the trace.xml http://hg.openjdk.java.net/jdk7u/jdk7u/hotspot/file/0025a2a965c8/src/share/vm/trace/trace.xml, on line 347, you should find the "AllocObjectOutsideTLAB" event name. > > So, if you use JFR, and subscribe to that AllocObjectOutsideTLAB event, you will should see object allocations that occur outside a TLAB, and that event will include the size of the allocation, and include a stack trace along with class and thread involved. > > Though it's not *exactly* what Wolfgang may be looking for, it narrows down the possibilities and the intrusiveness of getting the information won't be nearly as great as with a profiler such as VisualVM. However, remember the license restrictions on JFR/Mission Control. > > In the past I've done as Kris suggested too where I've built a custom / modified version of OpenJDK to print this information. But, hacking OpenJDK and building OpenJDK, and then feeling safe about running that modified version is not for everyone. ;-) Hence, I didn't mention that path. > > Yep. The "feeling safe" part is hard for a lot of people, even for this kind of small modification. It is possible to screw things up even with a change this small, though, e.g. taking the wrong lock at the wrong time when printing information...so people scared do have a point ;-) > > - Kris > > hths, > > charlie ... > > > > On Wed, Dec 18, 2013 at 2:37 PM, Krystal Mok wrote: > Hi guys, > > I don't see a probe point for large object allocation in the OpenJDK 7u code. If there is one, than it should have been in http://hg.openjdk.java.net/jdk7u/jdk7u/hotspot/file/0025a2a965c8/src/share/vm/gc_interface/collectedHeap.inline.hpp > > The probe points that JFR uses should be all in the OpenJDK, so my guess is that there isn't an event for large object allocation per se. That said, I haven't used JFR in JDK7u45 yet, so take my words with a grain of salt. > > There's supposed to be a imprecise object allocation profiler in JFR, which records that type and stack trace of the object to be allocated when the TLAB has to refill. But I just did a quick check in OpenJDK7u code and didn't see the probe point for that, so it probably hasn't made it in yet. > > If you can build OpenJDK yourself, then there is a patch that my colleague and I did when I was in Taobao, specifically to log the kind of event you have in mind. > The patch is here, under TAOBAOJDK-006: http://jvm.taobao.org/index.php/Sc_jvm_customization . 
> The usage is mentioned here (from page 47): http://www.slideshare.net/RednaxelaFX/jvm-taobao > The patch is made against HotSpot Express 20, which corresponds to JDK6u25. It may need some tweaks to work on OpenJDK7u. > > Anyway, you can always use other alternatives, like using the memory profiler that comes with VisualVM, which will instrument your bytecode to record type and stack trace info. These alternatives come with some overhead, but it might be worth it just for catching the problem you're having. > > HTH, > - Kris > > > On Wed, Dec 18, 2013 at 11:58 AM, charlie hunt wrote: > AFAIK, there is not a JVM command line option that will dump the information you are looking. > > Although I haven?t tried it, I have heard that there is a JFR (Java Flight Recorder) event for what the JVM in general considers large object allocations, (i.e. allocations to go on what?s called the ?slow path?). You can use JFR, which is part of Java Mission Control, you?ll need to be on JDK 7u40+, iirc. > > To use JFR / Mission Control in production environments you have to have a license, or some kind of purchased support for it. But, it is free to use in non-production environments. > > I have yet to use JFR / Mission Control. But, it?s on my short list. So, I wouldn?t be the person to ask how to use Mission Control and JFR to collect the information you?re looking for. > > hths, > > charlie ... > > On Dec 18, 2013, at 10:51 AM, Wolfgang Pedot wrote: > > > Hi, > > > > is there a way to trace humongous allocations when using G1? I have a > > productive application (7u45, ~16GB Heap-Size) and every now and again > > there are humongous allocations ranging from ~6-60MB. I know this from > > the output of -XX:+G1PrintRegionLivenessInfo. For example: > > > > ### HUMS 0x000000073a000000-0x000000073a800000 8388608 8388608 > > 0 3424136.5 > > ### HUMC 0x000000073a800000-0x000000073b000000 8388608 8388608 > > 0 637263.3 > > ### HUMC 0x000000073b000000-0x000000073b800000 8388608 8388608 > > 0 276982.8 > > ### HUMC 0x000000073b800000-0x000000073c000000 8388608 8388608 > > 0 10190518.0 > > ### HUMC 0x000000073c000000-0x000000073c800000 8388608 8388608 > > 0 20046222.4 > > ### HUMC 0x000000073c800000-0x000000073d000000 6580624 6580624 > > 0 1077287.7 > > > > This is taken from the Post-Marking Phase output of the GC-Log, usually > > a similar block repeats a couple of times indicating that either > > something big has been copied or its a repeated request. On a normal day > > there are one or two region-dumps containing HUMS/HUMC entries so they > > are fairly rare and judging from the logs they are all rather > > short-lived. I can pinpoint the allocations to a timewindow of 30-90min > > but so far have been unsuccessful to find something specific in the logs > > and there are hundrets of sessions on several different interfaces that > > could cause this. Its not really an issue but I dont see why my code > > should produce objects/arrays this large so I would like to know what > > code or at least thread did as it could also be a library. > > > > I cant go ahead and dump the heap every half hour in the hope to catch > > such an object while its alive, is there a way G1 could log which thread > > allocated the object or a stacktrace? 
> > > > kind regards > > Wolfgang Pedot > > > > _______________________________________________ > > hotspot-gc-use mailing list > > hotspot-gc-use at openjdk.java.net > > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20131218/23822e7d/attachment.html From lanson.zheng at gmail.com Wed Dec 18 23:53:06 2013 From: lanson.zheng at gmail.com (Guoqin Zheng) Date: Wed, 18 Dec 2013 23:53:06 -0800 Subject: How CMS interleave with ParNew Message-ID: Hey folks, While search for how CMS works in detail, I happened to an article saying usuablly before the inital-mark phase and remark phase, there is young gen collection happening and the initial-mark/remark also scan young gen space. So 1. Can you help me understand how the ParNew collection work with CMS to improve the performance. 2. Why the young gen space needs to be scanned 3. If the remark takes very long, what does that mean? 4. Finally, is there a official/or good docs talking about CMS in detail? Thanks, G. Zheng -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20131218/d021b0bc/attachment.html From wolfgang.pedot at finkzeit.at Thu Dec 19 05:35:06 2013 From: wolfgang.pedot at finkzeit.at (Wolfgang Pedot) Date: Thu, 19 Dec 2013 14:35:06 +0100 Subject: G1 trace humongous allocations In-Reply-To: References: <52B1D2AF.6010107@finkzeit.at> <6438E555-0DF5-4A1E-8E80-1681C50E894B@gmail.com> Message-ID: <52B2F60A.6050806@finkzeit.at> Hi, thanks for your input, I did not know about JFR up to now. I ran some tests in my debugger and the information it can provide is really nice. The licensing is unfortunate (need to check out the pricing) but at least it helps me collect more detailed information in the test-environment. thanks again Wolfgang From lucmolinari at gmail.com Thu Dec 19 09:00:47 2013 From: lucmolinari at gmail.com (Luciano Molinari) Date: Thu, 19 Dec 2013 15:00:47 -0200 Subject: YGC time increasing suddenly In-Reply-To: <52B1FE20.3010909@finkzeit.at> References: <52B1FE20.3010909@finkzeit.at> Message-ID: Bernd and Wolfgang, thanks for your quick answers. I took some time to answer them because I was running some tests based on your comments. *Bernd:* I would look at the finalizer queue first. *A: *From what I could find in the code, it doesn't seem to have explicit finalizers. Is there any way to check this queue?I found some articles about the problems finalize() method may cause, but I wasn't able to find something related to monitoring this queue. *Bernd:* And if that does not cut it, take a heapdump and inspect it for unexpected large dominators (maybe cached softreferences - not sure about RMI DGC havent seen problems with it, but it sure can be a problem if it only cleans up once an hour.). *A:* Regarding RMI, I ran some tests replacing it by JeroMQ but unfortunately I got the same results. About heapdump, Eclipse MAT shows almost nothing (only ~50mb) because the majority of objects are unreachable. *Bernd:* How often do you see YGC at the beginning and then over time? It looks like every 2s? 
You might want to resize YGC by larger factors (but with the yg already at 4g I guess something else is a problem here). *A*: After I start my tests, YGC occurs once or twice every 3 seconds, as the following log shows: jstat -gcutil 29331 3000 S0 S1 E O P YGC YGCT FGC FGCT GCT 1.40 0.00 89.74 2.13 11.86 602 12.126 1 0.086 12.212 1.64 0.00 66.92 2.13 11.86 604 12.166 1 0.086 12.252 1.38 0.00 41.10 2.13 11.86 606 12.204 1 0.086 12.290 1.47 0.00 10.86 2.13 11.86 608 12.244 1 0.086 12.330 0.00 1.47 89.35 2.13 11.86 609 12.265 1 0.086 12.351 0.00 1.51 62.11 2.13 11.86 611 12.305 1 0.086 12.391 0.00 1.38 32.83 2.14 11.86 613 12.346 1 0.086 12.432 0.00 0.96 11.06 2.21 11.86 615 12.386 1 0.086 12.472 0.97 0.00 72.35 2.22 11.86 616 12.406 1 0.086 12.492 It keeps this rate during the whole time, the only difference is that collections start to last longer. *Bernd:* You claim that most of the data only lives for 100ms, that does not match with the age-size distribution (not at the beginning not at the end). *A:* I said that for 2 reasons. Firstly, you can see by the log bellow that most transactions last < 25 ms: | interval | number of transactions | % | |------------------------+---------------------------+-------------------------| | 0 ms <= n < 25 ms : 7487644 : 97.704 | | 25 ms <= n < 50 ms : 137146 : 1.790 | | 50 ms <= n < 75 ms : 26422 : 0.345 | | 75 ms <= n < 100 ms : 8086 : 0.106 | | 100 ms <= n < 200 ms : 4081 : 0.053 | | 200 ms <= n < 500 ms : 216 : 0.003 | | 500 ms <= n < 1000 ms : 0 : 0.000 | And secondly, very few objects are promoted to old gen. *Wolfgang*, what you said about survivor also seems to make sense, but I ran some tests with survivorRation=8 and survivorRation=16 and the results were pretty much the same. I also collected some data using "sar -B" and vmstat commands in order to try to find out something else. sar -B 12:58:33 PM pgpgin/s pgpgout/s fault/s majflt/s 12:58:43 PM 0.00 5.19 16.98 0.00 12:58:53 PM 0.00 6.80 20.70 0.00 12:59:03 PM 0.00 12.81 16.72 0.00 12:59:13 PM 0.00 3.60 17.98 0.00 12:59:23 PM 0.00 14.81 118.42 0.00 12:59:33 PM 0.00 11.20 90.70 0.00 12:59:43 PM 0.00 5.20 662.60 0.00 (here GC started to take longer) 12:59:53 PM 0.00 5.20 1313.10 0.00 01:00:03 PM 0.00 20.42 960.66 0.00 01:00:13 PM 0.00 17.18 620.78 0.00 01:00:23 PM 0.00 3.60 725.93 0.00 01:00:33 PM 0.00 15.18 465.13 0.00 01:00:33 PM pgpgin/s pgpgout/s fault/s majflt/s 01:00:43 PM 0.00 12.01 508.31 0.00 01:00:53 PM 0.00 6.00 588.50 0.00 01:01:03 PM 0.00 20.00 660.80 0.00 01:01:13 PM 0.00 6.79 553.05 0.00 Page faults start to increase along with the degradation problem, but I'm not 100% sure about this relation, mainly because there's a lot of free memory, as vmstat shows bellow. However, I saw some people saying that page faults may occur even when there is free memory. 
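(On the finalizer-queue question earlier in this mail: the queue length is exposed through the standard management API, so it can be polled alongside jstat. A minimal in-process sketch:)

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

// Prints the number of objects waiting for finalization once per second.
// A count that keeps growing would point at a finalizer backlog.
public class FinalizerQueueWatch {
    public static void main(String[] args) throws InterruptedException {
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        while (true) {
            System.out.println("objects pending finalization: "
                    + mem.getObjectPendingFinalizationCount());
            Thread.sleep(1000L);
        }
    }
}

The same number is also readable remotely over JMX as the ObjectPendingFinalizationCount attribute of the java.lang:type=Memory MBean, which is the easier route for a standalone server application.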
vmstat procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu ------ r b swpd free buff cache si so bi bo in cs us sy id wa st 34 0 0 10803608 196472 925120 0 0 0 4 64804 109417 49 7 45 0 0 17 0 0 10802604 196472 925120 0 0 0 14 66130 111493 52 7 41 0 0 22 0 0 10795060 196472 925120 0 0 0 12 65331 110577 49 7 45 0 0 20 0 0 10758080 196472 925120 0 0 0 4 65222 111041 48 7 45 0 0 23 0 0 10712208 196472 925120 0 0 0 7 64759 110016 49 7 45 0 0 8 0 0 10682828 196472 925140 0 0 0 33 64780 109899 49 7 44 0 0 17 0 0 10655280 196472 925140 0 0 0 5 64321 109619 50 7 44 0 0 procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu ------ r b swpd free buff cache si so bi bo in cs us sy id wa st 17 0 0 10636300 196472 925140 0 0 0 12 64574 108885 50 7 44 0 0 4 0 0 10614888 196472 925140 0 0 0 5 63384 107379 49 7 44 0 0 18 0 0 10595172 196472 925140 0 0 0 14 65450 110004 50 7 43 0 0 28 0 0 10576420 196472 925140 0 0 0 4 64720 109119 48 7 45 0 0 29 0 0 10554908 196472 925140 0 0 0 25 64051 108606 51 7 42 0 0 33 0 0 10537584 196472 925140 0 0 0 11 64501 109765 50 7 43 0 0 24 0 0 10521128 196472 925140 0 0 0 5 64439 109538 51 7 42 0 0 It seems that vmstat doesn't show anything problematic. Any other advice? Thanks again. On Wed, Dec 18, 2013 at 5:57 PM, Wolfgang Pedot wrote: > Hi, > > this is the first time I write an answer on this mailing-list so this > could be totally useless but here goes: > > Your survivor-space seems to be quite empty, is the usage that low on all > collects during your test? If so you could increase your survivor-ratio to > gain more eden-space and if not many objects die in survivor you could also > reduce the tenuring threshold. Total survivor usage has grown 6-fold from > first to last GC and survivor space needs to be copied on each young gc. I > admit it should probably not take that long to copy 60MB though... > > Here is a young-gc from one of my logs for comparison: > > 30230.123: [ParNew > Desired survivor size 524261784 bytes, new threshold 12 (max 15) > - age 1: 113917760 bytes, 113917760 total > - age 2: 86192768 bytes, 200110528 total > - age 3: 59060992 bytes, 259171520 total > - age 4: 59319272 bytes, 318490792 total > - age 5: 45307432 bytes, 363798224 total > - age 6: 29478464 bytes, 393276688 total > - age 7: 27440744 bytes, 420717432 total > - age 8: 27947680 bytes, 448665112 total > - age 9: 27294496 bytes, 475959608 total > - age 10: 32830144 bytes, 508789752 total > - age 11: 7490968 bytes, 516280720 total > - age 12: 10723104 bytes, 527003824 total > - age 13: 4549808 bytes, 531553632 total > : 4306611K->731392K(4388608K), 0.1433810 secs] > 10422356K->6878961K(14116608K) > > This is with MaxNewSize 5500m and a Survivor-Ratio of 8. You can see that > GC-time is higher than yours (6core 3.33GHz Xeon), survivor-usage is way > higher though. > > Hope I could help > Wolfgang > > > Am 18.12.2013 19:58, schrieb Luciano Molinari: > >> Hi everybody, >> >> We have a standalone Java app that receives requests through RMI and >> almost all the objects created by it are short (< ~100ms) lived objects. >> This app is running on a 24 cores server with 16 GB RAM (Red Hat Linux). >> During our performance tests (10k requests/second) we started to face a >> problem where the throughput decreases suddenly just a few minutes >> after the app was started. >> So, I started to investigate GC behaviour and to make some adjustments >> (increase memory, use CMS...) and now we are able to run our app >> properly for about 35 minutes. 
At this point the time spent during young >> collections grows sharply although no Full GC is executed (old gen is >> only ~4% full). >> >> I've done tests with many different parameters, but currently I'm using >> the following ones: >> java -server -verbose:gc -XX:+PrintGCDetails >> -XX:+PrintTenuringDistribution -XX:+PrintGCTimeStamps >> -XX:PrintFLSStatistics=1 -XX:SurvivorRatio=4 >> -XX:ParallelGCThreads=8 -XX:PermSize=256m -XX:+UseParNewGC >> -XX:MaxPermSize=256m -Xms7g -Xmx7g -XX:NewSize=4608m -XX:MaxNewSize=4608m >> -XX:MaxTenuringThreshold=15 -Dsun.rmi.dgc.client.gcInterval=3600000 >> -Dsun.rmi.dgc.server.gcInterval=3600000 >> -Djava.rmi.server.hostname=IP_ADDRESS >> >> If I use this same configuration (without CMS) the same problem occurs >> after 20minutes, so it doesn't seem to be related to CMS. Actually, as I >> mentioned above, CMS (Full GC) isn't executed during the tests. >> >> Some logs I've collected: >> >> 1992.748: [ParNew >> Desired survivor size 402653184 bytes, new threshold 15 (max 15) >> - age 1: 9308728 bytes, 9308728 total >> - age 2: 3448 bytes, 9312176 total >> - age 3: 1080 bytes, 9313256 total >> - age 4: 32 bytes, 9313288 total >> - age 5: 34768 bytes, 9348056 total >> - age 6: 32 bytes, 9348088 total >> - age 15: 2712 bytes, 9350800 total >> : 3154710K->10313K(3932160K), 0.0273150 secs] 3215786K->71392K(6553600K) >> >> //14 YGC happened during this window >> >> 2021.165: [ParNew >> Desired survivor size 402653184 bytes, new threshold 15 (max 15) >> - age 1: 9459544 bytes, 9459544 total >> - age 2: 3648200 bytes, 13107744 total >> - age 3: 3837976 bytes, 16945720 total >> - age 4: 3472448 bytes, 20418168 total >> - age 5: 3586896 bytes, 24005064 total >> - age 6: 3475560 bytes, 27480624 total >> - age 7: 3520952 bytes, 31001576 total >> - age 8: 3612088 bytes, 34613664 total >> - age 9: 3355160 bytes, 37968824 total >> - age 10: 3823032 bytes, 41791856 total >> - age 11: 3304576 bytes, 45096432 total >> - age 12: 3671288 bytes, 48767720 total >> - age 13: 3558696 bytes, 52326416 total >> - age 14: 3805744 bytes, 56132160 total >> - age 15: 3429672 bytes, 59561832 total >> : 3230658K->77508K(3932160K), 0.1143860 secs] 3291757K->142447K(6553600K) >> >> Besides the longer time to perform collection, I also realized that all >> 15 ages started to have larger values. >> >> I must say I'm a little confused about this scenario. Does anyone have >> some tip? >> >> Thanks in advance, >> -- >> Luciano >> >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> > -- Luciano Davoglio Molinari -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20131219/4ce1be43/attachment-0001.html From ivan.mamontov at gmail.com Thu Dec 19 09:14:20 2013 From: ivan.mamontov at gmail.com (=?UTF-8?B?0JzQsNC80L7QvdGC0L7QsiDQmNCy0LDQvQ==?=) Date: Thu, 19 Dec 2013 21:14:20 +0400 Subject: YGC time increasing suddenly In-Reply-To: References: <52B1FE20.3010909@finkzeit.at> Message-ID: Hi, I had the same problem as you describe, but first I need to know one thing: - How many threads have been created at runtime? I am writing a letter describing the same issue and its solution. 2013/12/19 Luciano Molinari > Bernd and Wolfgang, thanks for your quick answers. I took some time to > answer them because I was running some tests based on your comments. 
> > *Bernd:* I would look at the finalizer queue first. > *A: *From what I could find in the code, it doesn't seem to have explicit > finalizers. Is there any way to check this queue?I found some articles > about the problems finalize() method may cause, but I wasn't able to find > something related to monitoring this queue. > > *Bernd:* And if that does not cut it, take a heapdump and inspect it for > unexpected large dominators (maybe cached softreferences - not sure > about RMI DGC havent seen problems with it, but it sure can be a problem > if it only cleans up once an hour.). > *A:* Regarding RMI, I ran some tests replacing it by JeroMQ but > unfortunately I got the same results. About heapdump, Eclipse MAT shows > almost nothing (only ~50mb) because the majority of objects are unreachable. > > *Bernd:* How often do you see YGC at the beginning and then over time? It > looks like every 2s? You might want to resize YGC by larger factors > (but with the yg already at 4g I guess something else is a problem here). > *A*: After I start my tests, YGC occurs once or twice every 3 seconds, as > the following log shows: > jstat -gcutil 29331 3000 > S0 S1 E O P YGC YGCT FGC FGCT GCT > 1.40 0.00 89.74 2.13 11.86 602 12.126 1 0.086 12.212 > 1.64 0.00 66.92 2.13 11.86 604 12.166 1 0.086 12.252 > 1.38 0.00 41.10 2.13 11.86 606 12.204 1 0.086 12.290 > 1.47 0.00 10.86 2.13 11.86 608 12.244 1 0.086 12.330 > 0.00 1.47 89.35 2.13 11.86 609 12.265 1 0.086 12.351 > 0.00 1.51 62.11 2.13 11.86 611 12.305 1 0.086 12.391 > 0.00 1.38 32.83 2.14 11.86 613 12.346 1 0.086 12.432 > 0.00 0.96 11.06 2.21 11.86 615 12.386 1 0.086 12.472 > 0.97 0.00 72.35 2.22 11.86 616 12.406 1 0.086 12.492 > It keeps this rate during the whole time, the only difference is that > collections start to last longer. > > *Bernd:* You claim that most of the data only lives for 100ms, that does > not match with the age-size distribution (not at the beginning not at the > end). > *A:* I said that for 2 reasons. Firstly, you can see by the log bellow > that most transactions last < 25 ms: > > | interval | number of transactions | % | > |------------------------+---------------------------+-------------------------| > | 0 ms <= n < 25 ms : 7487644 : 97.704 | > | 25 ms <= n < 50 ms : 137146 : 1.790 | > | 50 ms <= n < 75 ms : 26422 : 0.345 | > | 75 ms <= n < 100 ms : 8086 : 0.106 | > | 100 ms <= n < 200 ms : 4081 : 0.053 | > | 200 ms <= n < 500 ms : 216 : 0.003 | > | 500 ms <= n < 1000 ms : 0 : 0.000 | > > And secondly, very few objects are promoted to old gen. > > *Wolfgang*, what you said about survivor also seems to make sense, but I > ran some tests with survivorRation=8 and survivorRation=16 and the > results were pretty much the same. > > I also collected some data using "sar -B" and vmstat commands in order to > try to find out something else. 
> > sar -B > > 12:58:33 PM pgpgin/s pgpgout/s fault/s majflt/s > 12:58:43 PM 0.00 5.19 16.98 0.00 > 12:58:53 PM 0.00 6.80 20.70 0.00 > 12:59:03 PM 0.00 12.81 16.72 0.00 > 12:59:13 PM 0.00 3.60 17.98 0.00 > 12:59:23 PM 0.00 14.81 118.42 0.00 > 12:59:33 PM 0.00 11.20 90.70 0.00 > 12:59:43 PM 0.00 5.20 662.60 0.00 (here GC started to > take longer) > 12:59:53 PM 0.00 5.20 1313.10 0.00 > 01:00:03 PM 0.00 20.42 960.66 0.00 > 01:00:13 PM 0.00 17.18 620.78 0.00 > 01:00:23 PM 0.00 3.60 725.93 0.00 > 01:00:33 PM 0.00 15.18 465.13 0.00 > 01:00:33 PM pgpgin/s pgpgout/s fault/s majflt/s > 01:00:43 PM 0.00 12.01 508.31 0.00 > 01:00:53 PM 0.00 6.00 588.50 0.00 > 01:01:03 PM 0.00 20.00 660.80 0.00 > 01:01:13 PM 0.00 6.79 553.05 0.00 > > Page faults start to increase along with the degradation problem, but I'm > not 100% sure about this relation, mainly because there's a lot of free > memory, as vmstat shows bellow. However, I saw some people saying that > page faults may occur even when there is free memory. > > vmstat > > procs -----------memory---------- ---swap-- -----io---- --system-- ----- > cpu------ > r b swpd free buff cache si so bi bo in cs us syid > wa st > 34 0 0 10803608 196472 925120 0 0 0 4 64804 109417 49 > 7 45 0 0 > 17 0 0 10802604 196472 925120 0 0 0 14 66130 111493 52 > 7 41 0 0 > 22 0 0 10795060 196472 925120 0 0 0 12 65331 110577 49 > 7 45 0 0 > 20 0 0 10758080 196472 925120 0 0 0 4 65222 111041 48 > 7 45 0 0 > 23 0 0 10712208 196472 925120 0 0 0 7 64759 110016 49 > 7 45 0 0 > 8 0 0 10682828 196472 925140 0 0 0 33 64780 109899 49 > 7 44 0 0 > 17 0 0 10655280 196472 925140 0 0 0 5 64321 109619 50 > 7 44 0 0 > procs -----------memory---------- ---swap-- -----io---- --system-- ----- > cpu------ > r b swpd free buff cache si so bi bo in cs us syid > wa st > 17 0 0 10636300 196472 925140 0 0 0 12 64574 108885 50 > 7 44 0 0 > 4 0 0 10614888 196472 925140 0 0 0 5 63384 107379 49 > 7 44 0 0 > 18 0 0 10595172 196472 925140 0 0 0 14 65450 110004 50 > 7 43 0 0 > 28 0 0 10576420 196472 925140 0 0 0 4 64720 109119 48 > 7 45 0 0 > 29 0 0 10554908 196472 925140 0 0 0 25 64051 108606 51 > 7 42 0 0 > 33 0 0 10537584 196472 925140 0 0 0 11 64501 109765 50 > 7 43 0 0 > 24 0 0 10521128 196472 925140 0 0 0 5 64439 109538 51 > 7 42 0 0 > > It seems that vmstat doesn't show anything problematic. > > Any other advice? > > Thanks again. > > > On Wed, Dec 18, 2013 at 5:57 PM, Wolfgang Pedot < > wolfgang.pedot at finkzeit.at> wrote: > >> Hi, >> >> this is the first time I write an answer on this mailing-list so this >> could be totally useless but here goes: >> >> Your survivor-space seems to be quite empty, is the usage that low on all >> collects during your test? If so you could increase your survivor-ratio to >> gain more eden-space and if not many objects die in survivor you could also >> reduce the tenuring threshold. Total survivor usage has grown 6-fold from >> first to last GC and survivor space needs to be copied on each young gc. I >> admit it should probably not take that long to copy 60MB though... 
>> >> Here is a young-gc from one of my logs for comparison: >> >> 30230.123: [ParNew >> Desired survivor size 524261784 bytes, new threshold 12 (max 15) >> - age 1: 113917760 bytes, 113917760 total >> - age 2: 86192768 bytes, 200110528 total >> - age 3: 59060992 bytes, 259171520 total >> - age 4: 59319272 bytes, 318490792 total >> - age 5: 45307432 bytes, 363798224 total >> - age 6: 29478464 bytes, 393276688 total >> - age 7: 27440744 bytes, 420717432 total >> - age 8: 27947680 bytes, 448665112 total >> - age 9: 27294496 bytes, 475959608 total >> - age 10: 32830144 bytes, 508789752 total >> - age 11: 7490968 bytes, 516280720 total >> - age 12: 10723104 bytes, 527003824 total >> - age 13: 4549808 bytes, 531553632 total >> : 4306611K->731392K(4388608K), 0.1433810 secs] >> 10422356K->6878961K(14116608K) >> >> This is with MaxNewSize 5500m and a Survivor-Ratio of 8. You can see that >> GC-time is higher than yours (6core 3.33GHz Xeon), survivor-usage is way >> higher though. >> >> Hope I could help >> Wolfgang >> >> >> Am 18.12.2013 19:58, schrieb Luciano Molinari: >> >>> Hi everybody, >>> >>> We have a standalone Java app that receives requests through RMI and >>> almost all the objects created by it are short (< ~100ms) lived objects. >>> This app is running on a 24 cores server with 16 GB RAM (Red Hat Linux). >>> During our performance tests (10k requests/second) we started to face a >>> problem where the throughput decreases suddenly just a few minutes >>> after the app was started. >>> So, I started to investigate GC behaviour and to make some adjustments >>> (increase memory, use CMS...) and now we are able to run our app >>> properly for about 35 minutes. At this point the time spent during young >>> collections grows sharply although no Full GC is executed (old gen is >>> only ~4% full). >>> >>> I've done tests with many different parameters, but currently I'm using >>> the following ones: >>> java -server -verbose:gc -XX:+PrintGCDetails >>> -XX:+PrintTenuringDistribution -XX:+PrintGCTimeStamps >>> -XX:PrintFLSStatistics=1 -XX:SurvivorRatio=4 >>> -XX:ParallelGCThreads=8 -XX:PermSize=256m -XX:+UseParNewGC >>> -XX:MaxPermSize=256m -Xms7g -Xmx7g -XX:NewSize=4608m -XX:MaxNewSize=4608m >>> -XX:MaxTenuringThreshold=15 -Dsun.rmi.dgc.client.gcInterval=3600000 >>> -Dsun.rmi.dgc.server.gcInterval=3600000 >>> -Djava.rmi.server.hostname=IP_ADDRESS >>> >>> If I use this same configuration (without CMS) the same problem occurs >>> after 20minutes, so it doesn't seem to be related to CMS. Actually, as I >>> mentioned above, CMS (Full GC) isn't executed during the tests. 
>>> >>> Some logs I've collected: >>> >>> 1992.748: [ParNew >>> Desired survivor size 402653184 bytes, new threshold 15 (max 15) >>> - age 1: 9308728 bytes, 9308728 total >>> - age 2: 3448 bytes, 9312176 total >>> - age 3: 1080 bytes, 9313256 total >>> - age 4: 32 bytes, 9313288 total >>> - age 5: 34768 bytes, 9348056 total >>> - age 6: 32 bytes, 9348088 total >>> - age 15: 2712 bytes, 9350800 total >>> : 3154710K->10313K(3932160K), 0.0273150 secs] 3215786K->71392K(6553600K) >>> >>> //14 YGC happened during this window >>> >>> 2021.165: [ParNew >>> Desired survivor size 402653184 bytes, new threshold 15 (max 15) >>> - age 1: 9459544 bytes, 9459544 total >>> - age 2: 3648200 bytes, 13107744 total >>> - age 3: 3837976 bytes, 16945720 total >>> - age 4: 3472448 bytes, 20418168 total >>> - age 5: 3586896 bytes, 24005064 total >>> - age 6: 3475560 bytes, 27480624 total >>> - age 7: 3520952 bytes, 31001576 total >>> - age 8: 3612088 bytes, 34613664 total >>> - age 9: 3355160 bytes, 37968824 total >>> - age 10: 3823032 bytes, 41791856 total >>> - age 11: 3304576 bytes, 45096432 total >>> - age 12: 3671288 bytes, 48767720 total >>> - age 13: 3558696 bytes, 52326416 total >>> - age 14: 3805744 bytes, 56132160 total >>> - age 15: 3429672 bytes, 59561832 total >>> : 3230658K->77508K(3932160K), 0.1143860 secs] 3291757K->142447K(6553600K) >>> >>> Besides the longer time to perform collection, I also realized that all >>> 15 ages started to have larger values. >>> >>> I must say I'm a little confused about this scenario. Does anyone have >>> some tip? >>> >>> Thanks in advance, >>> -- >>> Luciano >>> >>> >>> _______________________________________________ >>> hotspot-gc-use mailing list >>> hotspot-gc-use at openjdk.java.net >>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>> >>> >> > > > -- > Luciano Davoglio Molinari > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > -- ? ?????????, ???????? ?.?. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20131219/f8ba42ed/attachment.html From ivan.mamontov at gmail.com Thu Dec 19 09:26:11 2013 From: ivan.mamontov at gmail.com (=?UTF-8?B?0JzQsNC80L7QvdGC0L7QsiDQmNCy0LDQvQ==?=) Date: Thu, 19 Dec 2013 21:26:11 +0400 Subject: YGC time increasing suddenly In-Reply-To: References: <52B1FE20.3010909@finkzeit.at> Message-ID: I am very interested in the number of created threads since the beginning of the application. If you use an executor, you can tell a static sequence number, for example *coreLoadExecutor-3-thread-3* 2013/12/19 ???????? ???? > Hi, > > I had the same problem as you describe, but first I need to know one > thing: > > - How many threads have been created at runtime? > > > I am writing a letter describing the same issue and its solution. > > 2013/12/19 Luciano Molinari > >> Bernd and Wolfgang, thanks for your quick answers. I took some time to >> answer them because I was running some tests based on your comments. >> >> *Bernd:* I would look at the finalizer queue first. >> *A: *From what I could find in the code, it doesn't seem to have >> explicit finalizers. Is there any way to check this queue?I found some >> articles about the problems finalize() method may cause, but I wasn't able >> to find something related to monitoring this queue. 
>> >> *Bernd:* And if that does not cut it, take a heapdump and inspect it >> for unexpected large dominators (maybe cached softreferences - not sure >> about RMI DGC havent seen problems with it, but it sure can be a >> problem if it only cleans up once an hour.). >> *A:* Regarding RMI, I ran some tests replacing it by JeroMQ but >> unfortunately I got the same results. About heapdump, Eclipse MAT shows >> almost nothing (only ~50mb) because the majority of objects are unreachable. >> >> *Bernd:* How often do you see YGC at the beginning and then over time? >> It looks like every 2s? You might want to resize YGC by larger factors >> (but with the yg already at 4g I guess something else is a problem here). >> *A*: After I start my tests, YGC occurs once or twice every 3 seconds, >> as the following log shows: >> jstat -gcutil 29331 3000 >> S0 S1 E O P YGC YGCT FGC FGCT GCT >> 1.40 0.00 89.74 2.13 11.86 602 12.126 1 0.086 12.212 >> 1.64 0.00 66.92 2.13 11.86 604 12.166 1 0.086 12.252 >> 1.38 0.00 41.10 2.13 11.86 606 12.204 1 0.086 12.290 >> 1.47 0.00 10.86 2.13 11.86 608 12.244 1 0.086 12.330 >> 0.00 1.47 89.35 2.13 11.86 609 12.265 1 0.086 12.351 >> 0.00 1.51 62.11 2.13 11.86 611 12.305 1 0.086 12.391 >> 0.00 1.38 32.83 2.14 11.86 613 12.346 1 0.086 12.432 >> 0.00 0.96 11.06 2.21 11.86 615 12.386 1 0.086 12.472 >> 0.97 0.00 72.35 2.22 11.86 616 12.406 1 0.086 12.492 >> It keeps this rate during the whole time, the only difference is that >> collections start to last longer. >> >> *Bernd:* You claim that most of the data only lives for 100ms, that does >> not match with the age-size distribution (not at the beginning not at the >> end). >> *A:* I said that for 2 reasons. Firstly, you can see by the log bellow >> that most transactions last < 25 ms: >> >> | interval | number of transactions | % | >> |------------------------+---------------------------+-------------------------| >> | 0 ms <= n < 25 ms : 7487644 : 97.704 | >> | 25 ms <= n < 50 ms : 137146 : 1.790 | >> | 50 ms <= n < 75 ms : 26422 : 0.345 | >> | 75 ms <= n < 100 ms : 8086 : 0.106 | >> | 100 ms <= n < 200 ms : 4081 : 0.053 | >> | 200 ms <= n < 500 ms : 216 : 0.003 | >> | 500 ms <= n < 1000 ms : 0 : 0.000 >> | >> >> And secondly, very few objects are promoted to old gen. >> >> *Wolfgang*, what you said about survivor also seems to make sense, but I >> ran some tests with survivorRation=8 and survivorRation=16 and the >> results were pretty much the same. >> >> I also collected some data using "sar -B" and vmstat commands in order >> to try to find out something else. >> >> sar -B >> >> 12:58:33 PM pgpgin/s pgpgout/s fault/s majflt/s >> 12:58:43 PM 0.00 5.19 16.98 0.00 >> 12:58:53 PM 0.00 6.80 20.70 0.00 >> 12:59:03 PM 0.00 12.81 16.72 0.00 >> 12:59:13 PM 0.00 3.60 17.98 0.00 >> 12:59:23 PM 0.00 14.81 118.42 0.00 >> 12:59:33 PM 0.00 11.20 90.70 0.00 >> 12:59:43 PM 0.00 5.20 662.60 0.00 (here GC started to >> take longer) >> 12:59:53 PM 0.00 5.20 1313.10 0.00 >> 01:00:03 PM 0.00 20.42 960.66 0.00 >> 01:00:13 PM 0.00 17.18 620.78 0.00 >> 01:00:23 PM 0.00 3.60 725.93 0.00 >> 01:00:33 PM 0.00 15.18 465.13 0.00 >> 01:00:33 PM pgpgin/s pgpgout/s fault/s majflt/s >> 01:00:43 PM 0.00 12.01 508.31 0.00 >> 01:00:53 PM 0.00 6.00 588.50 0.00 >> 01:01:03 PM 0.00 20.00 660.80 0.00 >> 01:01:13 PM 0.00 6.79 553.05 0.00 >> >> Page faults start to increase along with the degradation problem, but I'm >> not 100% sure about this relation, mainly because there's a lot of free >> memory, as vmstat shows bellow. 
However, I saw some people saying that >> page faults may occur even when there is free memory. >> >> vmstat >> >> procs -----------memory---------- ---swap-- -----io---- --system-- ----- >> cpu------ >> r b swpd free buff cache si so bi bo in cs us syid >> wa st >> 34 0 0 10803608 196472 925120 0 0 0 4 64804 109417 49 >> 7 45 0 0 >> 17 0 0 10802604 196472 925120 0 0 0 14 66130 111493 52 >> 7 41 0 0 >> 22 0 0 10795060 196472 925120 0 0 0 12 65331 110577 49 >> 7 45 0 0 >> 20 0 0 10758080 196472 925120 0 0 0 4 65222 111041 48 >> 7 45 0 0 >> 23 0 0 10712208 196472 925120 0 0 0 7 64759 110016 49 >> 7 45 0 0 >> 8 0 0 10682828 196472 925140 0 0 0 33 64780 109899 49 >> 7 44 0 0 >> 17 0 0 10655280 196472 925140 0 0 0 5 64321 109619 50 >> 7 44 0 0 >> procs -----------memory---------- ---swap-- -----io---- --system-- ----- >> cpu------ >> r b swpd free buff cache si so bi bo in cs us syid >> wa st >> 17 0 0 10636300 196472 925140 0 0 0 12 64574 108885 50 >> 7 44 0 0 >> 4 0 0 10614888 196472 925140 0 0 0 5 63384 107379 49 >> 7 44 0 0 >> 18 0 0 10595172 196472 925140 0 0 0 14 65450 110004 50 >> 7 43 0 0 >> 28 0 0 10576420 196472 925140 0 0 0 4 64720 109119 48 >> 7 45 0 0 >> 29 0 0 10554908 196472 925140 0 0 0 25 64051 108606 51 >> 7 42 0 0 >> 33 0 0 10537584 196472 925140 0 0 0 11 64501 109765 50 >> 7 43 0 0 >> 24 0 0 10521128 196472 925140 0 0 0 5 64439 109538 51 >> 7 42 0 0 >> >> It seems that vmstat doesn't show anything problematic. >> >> Any other advice? >> >> Thanks again. >> >> >> On Wed, Dec 18, 2013 at 5:57 PM, Wolfgang Pedot < >> wolfgang.pedot at finkzeit.at> wrote: >> >>> Hi, >>> >>> this is the first time I write an answer on this mailing-list so this >>> could be totally useless but here goes: >>> >>> Your survivor-space seems to be quite empty, is the usage that low on >>> all collects during your test? If so you could increase your survivor-ratio >>> to gain more eden-space and if not many objects die in survivor you could >>> also reduce the tenuring threshold. Total survivor usage has grown 6-fold >>> from first to last GC and survivor space needs to be copied on each young >>> gc. I admit it should probably not take that long to copy 60MB though... >>> >>> Here is a young-gc from one of my logs for comparison: >>> >>> 30230.123: [ParNew >>> Desired survivor size 524261784 bytes, new threshold 12 (max 15) >>> - age 1: 113917760 bytes, 113917760 total >>> - age 2: 86192768 bytes, 200110528 total >>> - age 3: 59060992 bytes, 259171520 total >>> - age 4: 59319272 bytes, 318490792 total >>> - age 5: 45307432 bytes, 363798224 total >>> - age 6: 29478464 bytes, 393276688 total >>> - age 7: 27440744 bytes, 420717432 total >>> - age 8: 27947680 bytes, 448665112 total >>> - age 9: 27294496 bytes, 475959608 total >>> - age 10: 32830144 bytes, 508789752 total >>> - age 11: 7490968 bytes, 516280720 total >>> - age 12: 10723104 bytes, 527003824 total >>> - age 13: 4549808 bytes, 531553632 total >>> : 4306611K->731392K(4388608K), 0.1433810 secs] >>> 10422356K->6878961K(14116608K) >>> >>> This is with MaxNewSize 5500m and a Survivor-Ratio of 8. You can see >>> that GC-time is higher than yours (6core 3.33GHz Xeon), survivor-usage is >>> way higher though. >>> >>> Hope I could help >>> Wolfgang >>> >>> >>> Am 18.12.2013 19:58, schrieb Luciano Molinari: >>> >>>> Hi everybody, >>>> >>>> We have a standalone Java app that receives requests through RMI and >>>> almost all the objects created by it are short (< ~100ms) lived objects. 
>>>> This app is running on a 24 cores server with 16 GB RAM (Red Hat Linux). >>>> During our performance tests (10k requests/second) we started to face a >>>> problem where the throughput decreases suddenly just a few minutes >>>> after the app was started. >>>> So, I started to investigate GC behaviour and to make some adjustments >>>> (increase memory, use CMS...) and now we are able to run our app >>>> properly for about 35 minutes. At this point the time spent during young >>>> collections grows sharply although no Full GC is executed (old gen is >>>> only ~4% full). >>>> >>>> I've done tests with many different parameters, but currently I'm using >>>> the following ones: >>>> java -server -verbose:gc -XX:+PrintGCDetails >>>> -XX:+PrintTenuringDistribution -XX:+PrintGCTimeStamps >>>> -XX:PrintFLSStatistics=1 -XX:SurvivorRatio=4 >>>> -XX:ParallelGCThreads=8 -XX:PermSize=256m -XX:+UseParNewGC >>>> -XX:MaxPermSize=256m -Xms7g -Xmx7g -XX:NewSize=4608m >>>> -XX:MaxNewSize=4608m >>>> -XX:MaxTenuringThreshold=15 -Dsun.rmi.dgc.client.gcInterval=3600000 >>>> -Dsun.rmi.dgc.server.gcInterval=3600000 >>>> -Djava.rmi.server.hostname=IP_ADDRESS >>>> >>>> If I use this same configuration (without CMS) the same problem occurs >>>> after 20minutes, so it doesn't seem to be related to CMS. Actually, as I >>>> mentioned above, CMS (Full GC) isn't executed during the tests. >>>> >>>> Some logs I've collected: >>>> >>>> 1992.748: [ParNew >>>> Desired survivor size 402653184 bytes, new threshold 15 (max 15) >>>> - age 1: 9308728 bytes, 9308728 total >>>> - age 2: 3448 bytes, 9312176 total >>>> - age 3: 1080 bytes, 9313256 total >>>> - age 4: 32 bytes, 9313288 total >>>> - age 5: 34768 bytes, 9348056 total >>>> - age 6: 32 bytes, 9348088 total >>>> - age 15: 2712 bytes, 9350800 total >>>> : 3154710K->10313K(3932160K), 0.0273150 secs] 3215786K->71392K(6553600K) >>>> >>>> //14 YGC happened during this window >>>> >>>> 2021.165: [ParNew >>>> Desired survivor size 402653184 bytes, new threshold 15 (max 15) >>>> - age 1: 9459544 bytes, 9459544 total >>>> - age 2: 3648200 bytes, 13107744 total >>>> - age 3: 3837976 bytes, 16945720 total >>>> - age 4: 3472448 bytes, 20418168 total >>>> - age 5: 3586896 bytes, 24005064 total >>>> - age 6: 3475560 bytes, 27480624 total >>>> - age 7: 3520952 bytes, 31001576 total >>>> - age 8: 3612088 bytes, 34613664 total >>>> - age 9: 3355160 bytes, 37968824 total >>>> - age 10: 3823032 bytes, 41791856 total >>>> - age 11: 3304576 bytes, 45096432 total >>>> - age 12: 3671288 bytes, 48767720 total >>>> - age 13: 3558696 bytes, 52326416 total >>>> - age 14: 3805744 bytes, 56132160 total >>>> - age 15: 3429672 bytes, 59561832 total >>>> : 3230658K->77508K(3932160K), 0.1143860 secs] >>>> 3291757K->142447K(6553600K) >>>> >>>> Besides the longer time to perform collection, I also realized that all >>>> 15 ages started to have larger values. >>>> >>>> I must say I'm a little confused about this scenario. Does anyone have >>>> some tip? >>>> >>>> Thanks in advance, >>>> -- >>>> Luciano >>>> >>>> >>>> _______________________________________________ >>>> hotspot-gc-use mailing list >>>> hotspot-gc-use at openjdk.java.net >>>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>>> >>>> >>> >> >> >> -- >> Luciano Davoglio Molinari >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> > > > -- > ? ?????????, > ???????? ?.?. 
> -- Thanks, Ivan -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20131219/d8fc81a1/attachment-0001.html From lucmolinari at gmail.com Thu Dec 19 10:05:43 2013 From: lucmolinari at gmail.com (Luciano Molinari) Date: Thu, 19 Dec 2013 16:05:43 -0200 Subject: YGC time increasing suddenly In-Reply-To: References: <52B1FE20.3010909@finkzeit.at> Message-ID: Hi, I used thread dump to find out that...I have ~315 threads.. It would be great to read your letter and check if the solution can be applied to my case. Thanks. On Thu, Dec 19, 2013 at 3:26 PM, ???????? ???? wrote: > I am very interested in the number of created threads since the beginning > of the application. > If you use an executor, you can tell a static sequence number, for example > *coreLoadExecutor-3-thread-3* > > > 2013/12/19 ???????? ???? > >> Hi, >> >> I had the same problem as you describe, but first I need to know one >> thing: >> >> - How many threads have been created at runtime? >> >> >> I am writing a letter describing the same issue and its solution. >> >> 2013/12/19 Luciano Molinari >> >>> Bernd and Wolfgang, thanks for your quick answers. I took some time to >>> answer them because I was running some tests based on your comments. >>> >>> *Bernd:* I would look at the finalizer queue first. >>> *A: *From what I could find in the code, it doesn't seem to have >>> explicit finalizers. Is there any way to check this queue?I found some >>> articles about the problems finalize() method may cause, but I wasn't able >>> to find something related to monitoring this queue. >>> >>> *Bernd:* And if that does not cut it, take a heapdump and inspect it >>> for unexpected large dominators (maybe cached softreferences - not >>> sure about RMI DGC havent seen problems with it, but it sure can be a >>> problem if it only cleans up once an hour.). >>> *A:* Regarding RMI, I ran some tests replacing it by JeroMQ but >>> unfortunately I got the same results. About heapdump, Eclipse MAT shows >>> almost nothing (only ~50mb) because the majority of objects are unreachable. >>> >>> *Bernd:* How often do you see YGC at the beginning and then over time? >>> It looks like every 2s? You might want to resize YGC by larger >>> factors (but with the yg already at 4g I guess something else is a >>> problem here). >>> *A*: After I start my tests, YGC occurs once or twice every 3 seconds, >>> as the following log shows: >>> jstat -gcutil 29331 3000 >>> S0 S1 E O P YGC YGCT FGC FGCT GCT >>> 1.40 0.00 89.74 2.13 11.86 602 12.126 1 0.086 >>> 12.212 >>> 1.64 0.00 66.92 2.13 11.86 604 12.166 1 0.086 >>> 12.252 >>> 1.38 0.00 41.10 2.13 11.86 606 12.204 1 0.086 >>> 12.290 >>> 1.47 0.00 10.86 2.13 11.86 608 12.244 1 0.086 >>> 12.330 >>> 0.00 1.47 89.35 2.13 11.86 609 12.265 1 0.086 >>> 12.351 >>> 0.00 1.51 62.11 2.13 11.86 611 12.305 1 0.086 >>> 12.391 >>> 0.00 1.38 32.83 2.14 11.86 613 12.346 1 0.086 >>> 12.432 >>> 0.00 0.96 11.06 2.21 11.86 615 12.386 1 0.086 >>> 12.472 >>> 0.97 0.00 72.35 2.22 11.86 616 12.406 1 0.086 >>> 12.492 >>> It keeps this rate during the whole time, the only difference is that >>> collections start to last longer. >>> >>> *Bernd:* You claim that most of the data only lives for 100ms, that >>> does not match with the age-size distribution (not at the beginning not at >>> the end). >>> *A:* I said that for 2 reasons. 
Firstly, you can see by the log bellow >>> that most transactions last < 25 ms: >>> >>> | interval | number of transactions | % | >>> |------------------------+---------------------------+-------------------------| >>> | 0 ms <= n < 25 ms : 7487644 : 97.704 >>> | >>> | 25 ms <= n < 50 ms : 137146 : 1.790 | >>> | 50 ms <= n < 75 ms : 26422 : 0.345 | >>> | 75 ms <= n < 100 ms : 8086 : 0.106 >>> | >>> | 100 ms <= n < 200 ms : 4081 : 0.053 >>> | >>> | 200 ms <= n < 500 ms : 216 : 0.003 >>> | >>> | 500 ms <= n < 1000 ms : 0 : 0.000 >>> | >>> >>> And secondly, very few objects are promoted to old gen. >>> >>> *Wolfgang*, what you said about survivor also seems to make sense, but >>> I ran some tests with survivorRation=8 and survivorRation=16 and the >>> results were pretty much the same. >>> >>> I also collected some data using "sar -B" and vmstat commands in order >>> to try to find out something else. >>> >>> sar -B >>> >>> 12:58:33 PM pgpgin/s pgpgout/s fault/s majflt/s >>> 12:58:43 PM 0.00 5.19 16.98 0.00 >>> 12:58:53 PM 0.00 6.80 20.70 0.00 >>> 12:59:03 PM 0.00 12.81 16.72 0.00 >>> 12:59:13 PM 0.00 3.60 17.98 0.00 >>> 12:59:23 PM 0.00 14.81 118.42 0.00 >>> 12:59:33 PM 0.00 11.20 90.70 0.00 >>> 12:59:43 PM 0.00 5.20 662.60 0.00 (here GC started to >>> take longer) >>> 12:59:53 PM 0.00 5.20 1313.10 0.00 >>> 01:00:03 PM 0.00 20.42 960.66 0.00 >>> 01:00:13 PM 0.00 17.18 620.78 0.00 >>> 01:00:23 PM 0.00 3.60 725.93 0.00 >>> 01:00:33 PM 0.00 15.18 465.13 0.00 >>> 01:00:33 PM pgpgin/s pgpgout/s fault/s majflt/s >>> 01:00:43 PM 0.00 12.01 508.31 0.00 >>> 01:00:53 PM 0.00 6.00 588.50 0.00 >>> 01:01:03 PM 0.00 20.00 660.80 0.00 >>> 01:01:13 PM 0.00 6.79 553.05 0.00 >>> >>> Page faults start to increase along with the degradation problem, but >>> I'm not 100% sure about this relation, mainly because there's a lot of free >>> memory, as vmstat shows bellow. However, I saw some people saying that >>> page faults may occur even when there is free memory. >>> >>> vmstat >>> >>> procs -----------memory---------- ---swap-- -----io---- --system-- ----- >>> cpu------ >>> r b swpd free buff cache si so bi bo in cs us syid >>> wa st >>> 34 0 0 10803608 196472 925120 0 0 0 4 64804 109417 >>> 49 7 45 0 0 >>> 17 0 0 10802604 196472 925120 0 0 0 14 66130 111493 >>> 52 7 41 0 0 >>> 22 0 0 10795060 196472 925120 0 0 0 12 65331 110577 >>> 49 7 45 0 0 >>> 20 0 0 10758080 196472 925120 0 0 0 4 65222 111041 >>> 48 7 45 0 0 >>> 23 0 0 10712208 196472 925120 0 0 0 7 64759 110016 >>> 49 7 45 0 0 >>> 8 0 0 10682828 196472 925140 0 0 0 33 64780 109899 >>> 49 7 44 0 0 >>> 17 0 0 10655280 196472 925140 0 0 0 5 64321 109619 >>> 50 7 44 0 0 >>> procs -----------memory---------- ---swap-- -----io---- --system-- ----- >>> cpu------ >>> r b swpd free buff cache si so bi bo in cs us syid >>> wa st >>> 17 0 0 10636300 196472 925140 0 0 0 12 64574 108885 >>> 50 7 44 0 0 >>> 4 0 0 10614888 196472 925140 0 0 0 5 63384 107379 >>> 49 7 44 0 0 >>> 18 0 0 10595172 196472 925140 0 0 0 14 65450 110004 >>> 50 7 43 0 0 >>> 28 0 0 10576420 196472 925140 0 0 0 4 64720 109119 >>> 48 7 45 0 0 >>> 29 0 0 10554908 196472 925140 0 0 0 25 64051 108606 >>> 51 7 42 0 0 >>> 33 0 0 10537584 196472 925140 0 0 0 11 64501 109765 >>> 50 7 43 0 0 >>> 24 0 0 10521128 196472 925140 0 0 0 5 64439 109538 >>> 51 7 42 0 0 >>> >>> It seems that vmstat doesn't show anything problematic. >>> >>> Any other advice? >>> >>> Thanks again. 
>>> >>> >>> On Wed, Dec 18, 2013 at 5:57 PM, Wolfgang Pedot < >>> wolfgang.pedot at finkzeit.at> wrote: >>> >>>> Hi, >>>> >>>> this is the first time I write an answer on this mailing-list so this >>>> could be totally useless but here goes: >>>> >>>> Your survivor-space seems to be quite empty, is the usage that low on >>>> all collects during your test? If so you could increase your survivor-ratio >>>> to gain more eden-space and if not many objects die in survivor you could >>>> also reduce the tenuring threshold. Total survivor usage has grown 6-fold >>>> from first to last GC and survivor space needs to be copied on each young >>>> gc. I admit it should probably not take that long to copy 60MB though... >>>> >>>> Here is a young-gc from one of my logs for comparison: >>>> >>>> 30230.123: [ParNew >>>> Desired survivor size 524261784 bytes, new threshold 12 (max 15) >>>> - age 1: 113917760 bytes, 113917760 total >>>> - age 2: 86192768 bytes, 200110528 total >>>> - age 3: 59060992 bytes, 259171520 total >>>> - age 4: 59319272 bytes, 318490792 total >>>> - age 5: 45307432 bytes, 363798224 total >>>> - age 6: 29478464 bytes, 393276688 total >>>> - age 7: 27440744 bytes, 420717432 total >>>> - age 8: 27947680 bytes, 448665112 total >>>> - age 9: 27294496 bytes, 475959608 total >>>> - age 10: 32830144 bytes, 508789752 total >>>> - age 11: 7490968 bytes, 516280720 total >>>> - age 12: 10723104 bytes, 527003824 total >>>> - age 13: 4549808 bytes, 531553632 total >>>> : 4306611K->731392K(4388608K), 0.1433810 secs] >>>> 10422356K->6878961K(14116608K) >>>> >>>> This is with MaxNewSize 5500m and a Survivor-Ratio of 8. You can see >>>> that GC-time is higher than yours (6core 3.33GHz Xeon), survivor-usage is >>>> way higher though. >>>> >>>> Hope I could help >>>> Wolfgang >>>> >>>> >>>> Am 18.12.2013 19:58, schrieb Luciano Molinari: >>>> >>>>> Hi everybody, >>>>> >>>>> We have a standalone Java app that receives requests through RMI and >>>>> almost all the objects created by it are short (< ~100ms) lived >>>>> objects. >>>>> This app is running on a 24 cores server with 16 GB RAM (Red Hat >>>>> Linux). >>>>> During our performance tests (10k requests/second) we started to face a >>>>> problem where the throughput decreases suddenly just a few minutes >>>>> after the app was started. >>>>> So, I started to investigate GC behaviour and to make some adjustments >>>>> (increase memory, use CMS...) and now we are able to run our app >>>>> properly for about 35 minutes. At this point the time spent during >>>>> young >>>>> collections grows sharply although no Full GC is executed (old gen is >>>>> only ~4% full). >>>>> >>>>> I've done tests with many different parameters, but currently I'm using >>>>> the following ones: >>>>> java -server -verbose:gc -XX:+PrintGCDetails >>>>> -XX:+PrintTenuringDistribution -XX:+PrintGCTimeStamps >>>>> -XX:PrintFLSStatistics=1 -XX:SurvivorRatio=4 >>>>> -XX:ParallelGCThreads=8 -XX:PermSize=256m -XX:+UseParNewGC >>>>> -XX:MaxPermSize=256m -Xms7g -Xmx7g -XX:NewSize=4608m >>>>> -XX:MaxNewSize=4608m >>>>> -XX:MaxTenuringThreshold=15 -Dsun.rmi.dgc.client.gcInterval=3600000 >>>>> -Dsun.rmi.dgc.server.gcInterval=3600000 >>>>> -Djava.rmi.server.hostname=IP_ADDRESS >>>>> >>>>> If I use this same configuration (without CMS) the same problem occurs >>>>> after 20minutes, so it doesn't seem to be related to CMS. Actually, as >>>>> I >>>>> mentioned above, CMS (Full GC) isn't executed during the tests. 
>>>>> >>>>> Some logs I've collected: >>>>> >>>>> 1992.748: [ParNew >>>>> Desired survivor size 402653184 bytes, new threshold 15 (max 15) >>>>> - age 1: 9308728 bytes, 9308728 total >>>>> - age 2: 3448 bytes, 9312176 total >>>>> - age 3: 1080 bytes, 9313256 total >>>>> - age 4: 32 bytes, 9313288 total >>>>> - age 5: 34768 bytes, 9348056 total >>>>> - age 6: 32 bytes, 9348088 total >>>>> - age 15: 2712 bytes, 9350800 total >>>>> : 3154710K->10313K(3932160K), 0.0273150 secs] >>>>> 3215786K->71392K(6553600K) >>>>> >>>>> //14 YGC happened during this window >>>>> >>>>> 2021.165: [ParNew >>>>> Desired survivor size 402653184 bytes, new threshold 15 (max 15) >>>>> - age 1: 9459544 bytes, 9459544 total >>>>> - age 2: 3648200 bytes, 13107744 total >>>>> - age 3: 3837976 bytes, 16945720 total >>>>> - age 4: 3472448 bytes, 20418168 total >>>>> - age 5: 3586896 bytes, 24005064 total >>>>> - age 6: 3475560 bytes, 27480624 total >>>>> - age 7: 3520952 bytes, 31001576 total >>>>> - age 8: 3612088 bytes, 34613664 total >>>>> - age 9: 3355160 bytes, 37968824 total >>>>> - age 10: 3823032 bytes, 41791856 total >>>>> - age 11: 3304576 bytes, 45096432 total >>>>> - age 12: 3671288 bytes, 48767720 total >>>>> - age 13: 3558696 bytes, 52326416 total >>>>> - age 14: 3805744 bytes, 56132160 total >>>>> - age 15: 3429672 bytes, 59561832 total >>>>> : 3230658K->77508K(3932160K), 0.1143860 secs] >>>>> 3291757K->142447K(6553600K) >>>>> >>>>> Besides the longer time to perform collection, I also realized that all >>>>> 15 ages started to have larger values. >>>>> >>>>> I must say I'm a little confused about this scenario. Does anyone have >>>>> some tip? >>>>> >>>>> Thanks in advance, >>>>> -- >>>>> Luciano >>>>> >>>>> >>>>> _______________________________________________ >>>>> hotspot-gc-use mailing list >>>>> hotspot-gc-use at openjdk.java.net >>>>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>>>> >>>>> >>>> >>> >>> >>> -- >>> Luciano Davoglio Molinari >>> >>> _______________________________________________ >>> hotspot-gc-use mailing list >>> hotspot-gc-use at openjdk.java.net >>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>> >>> >> >> >> -- >> ? ?????????, >> ???????? ?.?. >> > > > > -- > Thanks, > Ivan > -- Luciano Davoglio Molinari -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20131219/d417cefc/attachment.html From bernd-2013 at eckenfels.net Thu Dec 19 11:26:19 2013 From: bernd-2013 at eckenfels.net (Bernd Eckenfels) Date: Thu, 19 Dec 2013 20:26:19 +0100 Subject: YGC time increasing suddenly In-Reply-To: References: <52B1FE20.3010909@finkzeit.at> Message-ID: <73B04D98-28B2-42F4-8050-EF4137D7E082@eckenfels.net> You can see the number of threads created since VM start with JMX in java.lang:type=Threading TotalStartedThreadCount and PeakThreadCount BTW: is that a NUMA System? Is it using huge pages? Greetings Bernd From jon.masamitsu at oracle.com Thu Dec 19 12:54:37 2013 From: jon.masamitsu at oracle.com (Jon Masamitsu) Date: Thu, 19 Dec 2013 12:54:37 -0800 Subject: YGC time increasing suddenly In-Reply-To: References: <52B1FE20.3010909@finkzeit.at> Message-ID: <52B35D0D.9090307@oracle.com> On 12/19/2013 09:00 AM, Luciano Molinari wrote: > Bernd and Wolfgang, thanks for your quick answers. I took some time to > answer them because I was running some tests based on your comments. > > *Bernd:* I would look at the finalizer queue first. 
> *A: *From what I could find in the code, it doesn't seem to have > explicit finalizers. Is there any way to check this queue?I found some > articles about the problems finalize() method may cause, but I wasn't > able to find something related to monitoring this queue. Try -XX:+PrintReferenceGC Output looks like 0.247: [GC (Allocation Failure) 0.247: [ParNew0.521: [SoftReference, 0 refs, 0.0000775 secs]0.521: [WeakReference, 6 refs, 0.0000239 secs]0.521: [FinalReference, 6 refs, 0.0000551 secs]0.521: [PhantomReference, 0 refs, 0.0000745 secs]0.521: [JNI Weak Reference, 0.0000217 secs]: 34944K->4352K(39296K), 0.2746585 secs] 34944K->23967K(126720K), 0.2747496 secs] [Times: user=0.77 sys=0.10, real=0.27 secs] Jon > > *Bernd:* And if that does not cut it, take a heapdump and inspect it > for unexpected large dominators (maybe cached softreferences - not > sure about RMI DGC havent seen problems with it, but it sure can be a > problem if it only cleans up once an hour.). > *A:* Regarding RMI, I ran some tests replacing it by JeroMQ but > unfortunately I got the same results. About heapdump, Eclipse MAT > shows almost nothing (only ~50mb) because the majority of objects are > unreachable. > > *Bernd:* How often do you see YGC at the beginning and then over time? > It looks like every 2s? You might want to resize YGC by larger > factors (but with the yg already at 4g I guess something else is a > problem here). > *A*: After I start my tests, YGC occurs once or twice every 3 seconds, > as the following log shows: > jstat -gcutil 29331 3000 > S0 S1 E O P YGC YGCT FGC FGCT GCT > 1.40 0.00 89.74 2.13 11.86 602 12.126 1 0.086 12.212 > 1.64 0.00 66.92 2.13 11.86 604 12.166 1 0.086 12.252 > 1.38 0.00 41.10 2.13 11.86 606 12.204 1 0.086 12.290 > 1.47 0.00 10.86 2.13 11.86 608 12.244 1 0.086 12.330 > 0.00 1.47 89.35 2.13 11.86 609 12.265 1 0.086 12.351 > 0.00 1.51 62.11 2.13 11.86 611 12.305 1 0.086 12.391 > 0.00 1.38 32.83 2.14 11.86 613 12.346 1 0.086 12.432 > 0.00 0.96 11.06 2.21 11.86 615 12.386 1 0.086 12.472 > 0.97 0.00 72.35 2.22 11.86 616 12.406 1 0.086 12.492 > It keeps this rate during the whole time, the only difference is that > collections start to last longer. > > *Bernd:* You claim that most of the data only lives for 100ms, that > does not match with the age-size distribution (not at the beginning > not at the end). > *A:* I said that for 2 reasons. Firstly, you can see by the log bellow > that most transactions last < 25 ms: > > | interval | number of transactions | % | > |------------------------+---------------------------+-------------------------| > | 0 ms <= n < 25 ms : 7487644 : 97.704 | > | 25 ms <= n < 50 ms : 137146 : 1.790 | > | 50 ms <= n < 75 ms : 26422 : 0.345 | > | 75 ms <= n < 100 ms : 8086 : 0.106 | > | 100 ms <= n < 200 ms : 4081 : 0.053 | > | 200 ms <= n < 500 ms : 216 : 0.003 | > | 500 ms <= n < 1000 ms : 0 : 0.000 | > > And secondly, very few objects are promoted to old gen. > > *Wolfgang*, what you said about survivor also seems to make sense, but > I ran some tests with survivorRation=8 and survivorRation=16 and the > results were pretty much the same. > > I also collected some data using "sar -B" and vmstat commands in order > to try to find out something else. 
> > sar -B > > 12:58:33 PM pgpgin/s pgpgout/s fault/s majflt/s > 12:58:43 PM 0.00 5.19 16.98 0.00 > 12:58:53 PM 0.00 6.80 20.70 0.00 > 12:59:03 PM 0.00 12.81 16.72 0.00 > 12:59:13 PM 0.00 3.60 17.98 0.00 > 12:59:23 PM 0.00 14.81 118.42 0.00 > 12:59:33 PM 0.00 11.20 90.70 0.00 > 12:59:43 PM 0.00 5.20 662.60 0.00 (here GC started > to take longer) > 12:59:53 PM 0.00 5.20 1313.10 0.00 > 01:00:03 PM 0.00 20.42 960.66 0.00 > 01:00:13 PM 0.00 17.18 620.78 0.00 > 01:00:23 PM 0.00 3.60 725.93 0.00 > 01:00:33 PM 0.00 15.18 465.13 0.00 > 01:00:33 PM pgpgin/s pgpgout/s fault/s majflt/s > 01:00:43 PM 0.00 12.01 508.31 0.00 > 01:00:53 PM 0.00 6.00 588.50 0.00 > 01:01:03 PM 0.00 20.00 660.80 0.00 > 01:01:13 PM 0.00 6.79 553.05 0.00 > > Page faults start to increase along with the degradation problem, but > I'm not 100% sure about this relation, mainly because there's a lot of > free memory, as vmstat shows bellow. However, I saw some people saying > that page faults may occur even when there is free memory. > > vmstat > > procs -----------memory---------- ---swap-- -----io---- --system-- > -----cpu------ > r b swpd free buff cache si so bi bo in cs us sy id wa st > 34 0 0 10803608 196472 925120 0 0 0 4 64804 109417 > 49 7 45 0 0 > 17 0 0 10802604 196472 925120 0 0 0 14 66130 111493 > 52 7 41 0 0 > 22 0 0 10795060 196472 925120 0 0 0 12 65331 110577 > 49 7 45 0 0 > 20 0 0 10758080 196472 925120 0 0 0 4 65222 111041 > 48 7 45 0 0 > 23 0 0 10712208 196472 925120 0 0 0 7 64759 110016 > 49 7 45 0 0 > 8 0 0 10682828 196472 925140 0 0 0 33 64780 109899 > 49 7 44 0 0 > 17 0 0 10655280 196472 925140 0 0 0 5 64321 109619 > 50 7 44 0 0 > procs -----------memory---------- ---swap-- -----io---- --system-- > -----cpu------ > r b swpd free buff cache si so bi bo in cs us sy id wa st > 17 0 0 10636300 196472 925140 0 0 0 12 64574 108885 > 50 7 44 0 0 > 4 0 0 10614888 196472 925140 0 0 0 5 63384 107379 > 49 7 44 0 0 > 18 0 0 10595172 196472 925140 0 0 0 14 65450 110004 > 50 7 43 0 0 > 28 0 0 10576420 196472 925140 0 0 0 4 64720 109119 > 48 7 45 0 0 > 29 0 0 10554908 196472 925140 0 0 0 25 64051 108606 > 51 7 42 0 0 > 33 0 0 10537584 196472 925140 0 0 0 11 64501 109765 > 50 7 43 0 0 > 24 0 0 10521128 196472 925140 0 0 0 5 64439 109538 > 51 7 42 0 0 > > It seems that vmstat doesn't show anything problematic. > > Any other advice? > > Thanks again. > > > On Wed, Dec 18, 2013 at 5:57 PM, Wolfgang Pedot > > wrote: > > Hi, > > this is the first time I write an answer on this mailing-list so > this could be totally useless but here goes: > > Your survivor-space seems to be quite empty, is the usage that low > on all collects during your test? If so you could increase your > survivor-ratio to gain more eden-space and if not many objects die > in survivor you could also reduce the tenuring threshold. Total > survivor usage has grown 6-fold from first to last GC and survivor > space needs to be copied on each young gc. I admit it should > probably not take that long to copy 60MB though... 
> > Here is a young-gc from one of my logs for comparison: > > 30230.123: [ParNew > Desired survivor size 524261784 bytes, new threshold 12 (max 15) > - age 1: 113917760 bytes, 113917760 total > - age 2: 86192768 bytes, 200110528 total > - age 3: 59060992 bytes, 259171520 total > - age 4: 59319272 bytes, 318490792 total > - age 5: 45307432 bytes, 363798224 total > - age 6: 29478464 bytes, 393276688 total > - age 7: 27440744 bytes, 420717432 total > - age 8: 27947680 bytes, 448665112 total > - age 9: 27294496 bytes, 475959608 total > - age 10: 32830144 bytes, 508789752 total > - age 11: 7490968 bytes, 516280720 total > - age 12: 10723104 bytes, 527003824 total > - age 13: 4549808 bytes, 531553632 total > : 4306611K->731392K(4388608K), 0.1433810 secs] > 10422356K->6878961K(14116608K) > > This is with MaxNewSize 5500m and a Survivor-Ratio of 8. You can > see that GC-time is higher than yours (6core 3.33GHz Xeon), > survivor-usage is way higher though. > > Hope I could help > Wolfgang > > > Am 18.12.2013 19:58, schrieb Luciano Molinari: > > Hi everybody, > > We have a standalone Java app that receives requests through > RMI and > almost all the objects created by it are short (< ~100ms) > lived objects. > This app is running on a 24 cores server with 16 GB RAM (Red > Hat Linux). > During our performance tests (10k requests/second) we started > to face a > problem where the throughput decreases suddenly just a few minutes > after the app was started. > So, I started to investigate GC behaviour and to make some > adjustments > (increase memory, use CMS...) and now we are able to run our app > properly for about 35 minutes. At this point the time spent > during young > collections grows sharply although no Full GC is executed (old > gen is > only ~4% full). > > I've done tests with many different parameters, but currently > I'm using > the following ones: > java -server -verbose:gc -XX:+PrintGCDetails > -XX:+PrintTenuringDistribution -XX:+PrintGCTimeStamps > -XX:PrintFLSStatistics=1 -XX:SurvivorRatio=4 > -XX:ParallelGCThreads=8 -XX:PermSize=256m -XX:+UseParNewGC > -XX:MaxPermSize=256m -Xms7g -Xmx7g -XX:NewSize=4608m > -XX:MaxNewSize=4608m > -XX:MaxTenuringThreshold=15 > -Dsun.rmi.dgc.client.gcInterval=3600000 > -Dsun.rmi.dgc.server.gcInterval=3600000 > -Djava.rmi.server.hostname=IP_ADDRESS > > If I use this same configuration (without CMS) the same > problem occurs > after 20minutes, so it doesn't seem to be related to CMS. > Actually, as I > mentioned above, CMS (Full GC) isn't executed during the tests. 
> > Some logs I've collected: > > 1992.748: [ParNew > Desired survivor size 402653184 bytes, new threshold 15 (max 15) > - age 1: 9308728 bytes, 9308728 total > - age 2: 3448 bytes, 9312176 total > - age 3: 1080 bytes, 9313256 total > - age 4: 32 bytes, 9313288 total > - age 5: 34768 bytes, 9348056 total > - age 6: 32 bytes, 9348088 total > - age 15: 2712 bytes, 9350800 total > : 3154710K->10313K(3932160K), 0.0273150 secs] > 3215786K->71392K(6553600K) > > //14 YGC happened during this window > > 2021.165: [ParNew > Desired survivor size 402653184 bytes, new threshold 15 (max 15) > - age 1: 9459544 bytes, 9459544 total > - age 2: 3648200 bytes, 13107744 total > - age 3: 3837976 bytes, 16945720 total > - age 4: 3472448 bytes, 20418168 total > - age 5: 3586896 bytes, 24005064 total > - age 6: 3475560 bytes, 27480624 total > - age 7: 3520952 bytes, 31001576 total > - age 8: 3612088 bytes, 34613664 total > - age 9: 3355160 bytes, 37968824 total > - age 10: 3823032 bytes, 41791856 total > - age 11: 3304576 bytes, 45096432 total > - age 12: 3671288 bytes, 48767720 total > - age 13: 3558696 bytes, 52326416 total > - age 14: 3805744 bytes, 56132160 total > - age 15: 3429672 bytes, 59561832 total > : 3230658K->77508K(3932160K), 0.1143860 secs] > 3291757K->142447K(6553600K) > > Besides the longer time to perform collection, I also realized > that all > 15 ages started to have larger values. > > I must say I'm a little confused about this scenario. Does > anyone have > some tip? > > Thanks in advance, > -- > Luciano > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > > > > > -- > Luciano Davoglio Molinari > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20131219/02f92f9b/attachment.html From yaoshengzhe at gmail.com Thu Dec 19 14:16:06 2013 From: yaoshengzhe at gmail.com (yao) Date: Thu, 19 Dec 2013 14:16:06 -0800 Subject: G1 GC clean up time is too long Message-ID: Hi All, We have a real time application build on top of HBase (a distributed big data store), our original CMS GC works great most of time with only 0 - 2 Full GCs per node everyday. This is good performance considering our read/write traffic. Now, it's java 7 and we want to see the power of G1 GC. We tried G1 GC on two of our production but did not see a big improvement over CMS and we would like to hear feedbacks from you if there are any G1 GC tuning chances. Here is our observation: G1 GC cleanup time is around 0.75 secs and there are also some mix pauses taking more than 5 secs when application is in a stable state*. *Due to these high STW pauses, we believe it is the reason why G1 doesn't do better job than CMS (their performance is pretty close). 
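Before diving into the logs, one lightweight way to compare accumulated stop-the-world time between the CMS nodes and the G1 nodes is to poll the per-collector counters exposed by the GarbageCollectorMXBeans (roughly the same numbers jstat reports as YGCT/FGCT). A small sketch, assuming it runs inside the application JVM; the class name and the one-minute interval are only illustrative:

    import java.lang.management.GarbageCollectorMXBean;
    import java.lang.management.ManagementFactory;

    /** Logs cumulative GC count and time per collector once a minute. */
    public class GcPauseAccounting {
        public static void main(String[] args) throws InterruptedException {
            while (true) {
                for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                    // With G1 the beans are typically named "G1 Young Generation" and
                    // "G1 Old Generation"; with ParNew/CMS they are "ParNew" and
                    // "ConcurrentMarkSweep".
                    System.out.printf("%-25s collections=%d totalTimeMs=%d%n",
                            gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
                }
                Thread.sleep(60_000L);
            }
        }
    }

Sampling the deltas on both a CMS node and a G1 node gives a collector-level view of where the extra pause time is going, independent of how the logs are parsed.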
*Here is the java 7 version we use on our G1 experiment machines:* java version "1.7.0_40" Java(TM) SE Runtime Environment (build 1.7.0_40-b43) Java HotSpot(TM) 64-Bit Server VM (build 24.0-b56, mixed mode) and *G1 parameters*: -XX:+UnlockExperimentalVMOptions -XX:+UseG1GC -Xmx67714m -Xms67714m -server -XX:MaxGCPauseMillis=150 -XX:G1HeapRegionSize=32m -XX:InitiatingHeapOccupancyPercent=30 -XX:G1ReservePercent=20 Explanations for some G1 params we set * G1HeapRegionSize=32m* We have some map/reduce jobs reading from HBase everyday and HBase will send data in a batch fashion to mapper tasks; therefore, a larger temporary object (up to 15MB) will be created during that time and that's why we set region size to 32MB to avoid humongous allocation. *InitiatingHeapOccupancyPercent=30 and * *G1ReservePercent=20* We've observed a frequent Full GC (every one hour) before explicitly setting these two parameters. After reading from various GC tuning posts, we decided to reduce InitiatingHeapOccupancyPercent to 30 and increase G1ReservePercent to 20, this reduce full GC to about 3~5 times per day per node. Given above background, we are still keep working on G1 GC tuning and hope to beat CMS on our application. For now, G1 is no worse than CMS on our production machines except full GC happens a little more on G1 machines. Here is the Pause time (STW time, including: initial mark, young, mix and clean up) graph, *x-axis*: gc start time, *y-axis*: pause time in milliseconds. Pauses larger than 2000 milliseconds are due to mix pause and some spikes are due to GC cleanup time. [image: Inline image 1] CMS vs G1, Red Curve is CMS and Green Curve is G1, *x-axis*: gc start time, *y-axis*: read latency 99 percentile in milliseconds [image: Inline image 1] A close look, CMS vs G1, Red Curve is CMS and Green Curve is G1, *x-axis*: gc start time, *y-axis*: read latency 99 percentile in milliseconds [image: Inline image 2] *Typical G1 long clean up log* 2013-12-19T14:37:48.949-0500: 926630.478: [GC pause (young) (initial-mark) Desired survivor size 268435456 bytes, new threshold 3 (max 15) - age 1: 95847832 bytes, 95847832 total - age 2: 101567360 bytes, 197415192 total - age 3: 102072520 bytes, 299487712 total 926630.478: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 32605, predicted base time: 49.35 ms, remaining time: 100.65 ms, target pause time: 150\ .00 ms] 926630.478: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 107 regions, survivors: 14 regions, predicted young region time: 47.90 ms] 926630.479: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 107 regions, survivors: 14 regions, old: 0 regions, predicted pause time: 97.25 ms, target pause\ time: 150.00 ms] , 0.1023900 secs] [Parallel Time: 74.1 ms, GC Workers: 18] [GC Worker Start (ms): Min: 926630478.9, Avg: 926630479.3, Max: 926630479.6, Diff: 0.7] [Ext Root Scanning (ms): Min: 4.4, Avg: 4.9, Max: 6.5, Diff: 2.2, Sum: 88.3] [Update RS (ms): Min: 10.6, Avg: 12.3, Max: 18.7, Diff: 8.1, Sum: 221.4] [Processed Buffers: Min: 10, Avg: 18.7, Max: 25, Diff: 15, Sum: 337] [Scan RS (ms): Min: 5.8, Avg: 12.1, Max: 12.9, Diff: 7.1, Sum: 218.2] [Object Copy (ms): Min: 43.0, Avg: 43.4, Max: 43.7, Diff: 0.8, Sum: 781.3] [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.5] [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.8] [GC Worker Total (ms): Min: 72.4, Avg: 72.8, Max: 73.1, Diff: 0.7, Sum: 1310.4] [GC Worker End (ms): Min: 926630552.0, Avg: 926630552.1, Max: 
926630552.1, Diff: 0.1] [Code Root Fixup: 0.0 ms] [Clear CT: 11.1 ms] [Other: 17.2 ms] [Choose CSet: 0.1 ms] [Ref Proc: 7.9 ms] [Ref Enq: 1.8 ms] [Free CSet: 0.6 ms] [Eden: 3424.0M(3424.0M)->0.0B(3328.0M) Survivors: 448.0M->416.0M Heap: 51.9G(66.2G)->48.7G(66.2G)] [Times: user=1.46 sys=0.00, real=0.10 secs] 2013-12-19T14:37:49.051-0500: 926630.581: [GC concurrent-root-region-scan-start] 2013-12-19T14:37:49.064-0500: 926630.593: [GC concurrent-root-region-scan-end, 0.0126390 secs] 2013-12-19T14:37:49.064-0500: 926630.593: [GC concurrent-mark-start] 2013-12-19T14:37:55.287-0500: 926636.816: [GC concurrent-mark-end, 6.2228410 secs] 2013-12-19T14:37:55.291-0500: 926636.821: [GC remark 2013-12-19T14:37:55.295-0500: 926636.824: [GC ref-proc, 0.0086540 secs], 0.0767770 secs] [Times: user=1.09 sys=0.00, real=0.08 secs] 2013-12-19T14:37:55.375-0500: 926636.905: [*GC cleanup 50G->50G(66G), 0.7325340 secs*] [Times: user=12.98 sys=0.00, real=0.73 secs] *Typical G1 Full GC log*2013-12-19T14:23:13.558-0500: 925755.088: [GC pause (mixed) Desired survivor size 234881024 bytes, new threshold 4 (max 15) - age 1: 85160928 bytes, 85160928 total - age 2: 71237536 bytes, 156398464 total - age 3: 70904176 bytes, 227302640 total - age 4: 78106288 bytes, 305408928 total 925755.088: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 43149, predicted base time: 49.67 ms, remaining time: 100.33 ms, target pause time: 150.00 ms] 925755.088: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 92 regions, survivors: 13 regions, predicted young region time: 35.76 ms] 925755.094: [G1Ergonomics (CSet Construction) finish adding old regions to CSet, reason: predicted time is too high, predicted time: 4.02 ms, remaining time: 0.00 ms, old: 89 regions, min: 89 regions] 925755.094: [G1Ergonomics (CSet Construction) added expensive regions to CSet, reason: old CSet region num not reached min, old: 89 regions, expensive: 64 regions, min: 89 regions, remaining time: 0.00 ms] 925755.094: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 92 regions, survivors: 13 regions, old: 89 regions, predicted pause time: 418.65 ms, target pause time: 150.00 ms] 925755.459: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: region allocation request failed, allocation request: 4007128 bytes] 925755.459: [G1Ergonomics (Heap Sizing) expand the heap, requested expansion amount: 4007128 bytes, attempted expansion amount: 33554432 bytes] 925755.459: [G1Ergonomics (Heap Sizing) did not expand the heap, reason: heap expansion operation failed] 925756.116: [G1Ergonomics (Concurrent Cycles) do not request concurrent cycle initiation, reason: still doing mixed collections, occupancy: 70598524928 bytes, allocation request: 0 bytes, threshold: 21310419750 bytes (30.00 %), source: end of GC] 925756.116: [G1Ergonomics (Mixed GCs) continue mixed GCs, reason: candidate old regions available, candidate old regions: 617 regions, reclaimable: 9897360192 bytes (13.93 %), threshold: 10.00 %] * (to-space exhausted), 1.0285640 secs]* [Parallel Time: 821.1 ms, GC Workers: 18] [GC Worker Start (ms): Min: 925755094.0, Avg: 925755094.4, Max: 925755094.7, Diff: 0.7] [Ext Root Scanning (ms): Min: 4.6, Avg: 5.1, Max: 6.5, Diff: 1.8, Sum: 92.7] [Update RS (ms): Min: 12.4, Avg: 15.6, Max: 27.8, Diff: 15.3, Sum: 280.1] [Processed Buffers: Min: 2, Avg: 22.6, Max: 32, Diff: 30, Sum: 406] [Scan RS (ms): Min: 168.6, Avg: 180.7, Max: 184.4, Diff: 15.8, Sum: 3252.4] [Object Copy (ms): Min: 616.5, Avg: 618.1, Max: 618.9, Diff: 
2.5, Sum: 11125.9] [Termination (ms): Min: 0.0, Avg: 0.3, Max: 0.6, Diff: 0.5, Sum: 5.3] [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.5] [GC Worker Total (ms): Min: 819.5, Avg: 819.8, Max: 820.2, Diff: 0.7, Sum: 14756.9] [GC Worker End (ms): Min: 925755914.2, Avg: 925755914.2, Max: 925755914.2, Diff: 0.1] [Code Root Fixup: 0.0 ms] [Clear CT: 13.5 ms] [Other: 194.0 ms] [Choose CSet: 5.6 ms] [Ref Proc: 16.9 ms] [Ref Enq: 0.9 ms] [Free CSet: 4.3 ms] * [Eden: 2944.0M(2944.0M)->0.0B(2944.0M) Survivors: 416.0M->416.0M Heap: 64.3G(66.2G)->65.9G(66.2G)]* * [Times: user=8.96 sys=0.21, real=1.03 secs]* 925756.121: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: allocation request failed, allocation request: 17960 bytes] 925756.121: [G1Ergonomics (Heap Sizing) expand the heap, requested expansion amount: 33554432 bytes, attempted expansion amount: 33554432 bytes] 925756.121: [G1Ergonomics (Heap Sizing) did not expand the heap, reason: heap expansion operation failed] *2013-12-19T14:23:14.592-0500: 925756.121: [Full GC 65G->43G(66G), 45.3188580 secs]* [Eden: 0.0B(2944.0M)->0.0B(9248.0M) Survivors: 416.0M->0.0B Heap: 65.9G(66.2G)->43.1G(66.2G)] [Times: user=88.57 sys=0.81, real=45.31 secs] Thanks Shengzhe -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20131219/d4a010d0/attachment-0001.html -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 26874 bytes Desc: not available Url : http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20131219/d4a010d0/attachment-0003.png -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 16958 bytes Desc: not available Url : http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20131219/d4a010d0/attachment-0004.png -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 13653 bytes Desc: not available Url : http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20131219/d4a010d0/attachment-0005.png From ysr1729 at gmail.com Fri Dec 20 00:54:28 2013 From: ysr1729 at gmail.com (Srinivas Ramakrishna) Date: Fri, 20 Dec 2013 00:54:28 -0800 Subject: How CMS interleave with ParNew In-Reply-To: References: Message-ID: Hi G. Zheng --- On Wed, Dec 18, 2013 at 11:53 PM, Guoqin Zheng wrote: > Hey folks, > > While search for how CMS works in detail, I happened to an article saying > usuablly before the inital-mark phase and remark phase, there is young gen > collection happening and the initial-mark/remark also scan young gen space. > So > > 1. Can you help me understand how the ParNew collection work with CMS > to improve the performance. > > ParNew collections are minor collections. CMS collects just the old gen and optionally the perm gen. A concurrent CMS collection is "interrupted" by ParNew minor collections. > > 1. > 2. Why the young gen space needs to be scanned > > Since CMS scans and collects just the old gen and perm gen, it treats the young gen as a source of roots, and scans the young gen in its entirety. > > 1. If the remark takes very long, what does that mean? > > It typically means that there is a lot of concurrent mutation happening in the old gen (perhaps as a result of objects being promoted rapidly into the old gen). 
In this case, the concurrent scanning and precleaning of the CMS generation (which is designed to catch up with mutations in the old gen) isn't able to keep up with the rate of mutations happening there. > > 1. Finally, is there a official/or good docs talking about CMS in > detail? > > The paper by Printezis and Detlefs in ISMM 2000 was the starting point of the HotSpot CMS implementation, with a few modifications along the way. HTHS. -- ramki > Thanks, > > G. Zheng > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20131220/17869384/attachment.html From thomas.schatzl at oracle.com Fri Dec 20 01:07:07 2013 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Fri, 20 Dec 2013 10:07:07 +0100 Subject: G1 GC clean up time is too long In-Reply-To: References: Message-ID: <1387530427.2736.36.camel@cirrus> Hi, On Thu, 2013-12-19 at 14:16 -0800, yao wrote: > Hi All, > > We have a real time application build on top of HBase (a distributed > big data store), our original CMS GC works great most of time with > only 0 - 2 Full GCs per node everyday. This is good performance > considering our read/write traffic. Now, it's java 7 and we want to > see the power of G1 GC. We tried G1 GC on two of our production but > did not see a big improvement over CMS and we would like to hear > feedbacks from you if there are any G1 GC tuning chances. Thanks for trying G1 :) Expect lots of perf improvements in the future. > > Here is our observation: G1 GC cleanup time is around 0.75 secs and > there are also some mix pauses taking more than 5 secs when > application is in a stable state. Due to these high STW pauses, we > believe it is the reason why G1 doesn't do better job than CMS (their > performance is pretty close). > > Here is the java 7 version we use on our G1 experiment machines: > java version "1.7.0_40" > Java(TM) SE Runtime Environment (build 1.7.0_40-b43) > Java HotSpot(TM) 64-Bit Server VM (build 24.0-b56, mixed > > mode) > and G1 parameters: > > -XX:+UnlockExperimentalVMOptions > -XX:+UseG1GC > -Xmx67714m > -Xms67714m > -server > -XX:MaxGCPauseMillis=150 > -XX:G1HeapRegionSize=32m > -XX:InitiatingHeapOccupancyPercent=30 > -XX:G1ReservePercent=20 > > > Explanations for some G1 params we set > > G1HeapRegionSize=32m > We have some map/reduce jobs reading from HBase everyday > and HBase will send data in a batch fashion to mapper tasks; > therefore, a larger temporary object (up to 15MB) will be created > during that time and that's why we set region size to 32MB to avoid > humongous allocation. How frequent are those large objects, and how many of these large objects do you expect to have at the same time relative to live heap size? G1 can handle a few humongous objects well, but only degrades with larger amounts. If the humongous object survives a few collections, it's better to try to keep them as part of the old generation. Particularly if marking is frequent, they will be collected as part of that. Note that I think the default heap region size for your 66G heap is 32MB anyway (at least in recent jdk8, not sure if that change has been backported to 7). > InitiatingHeapOccupancyPercent=30 and G1ReservePercent=20 > > We've observed a frequent Full GC (every one hour) before > explicitly setting these two parameters. 
After reading from various GC > tuning posts, we decided to reduce > InitiatingHeapOccupancyPercent to 30 and increase G1ReservePercent to > 20, this reduce full GC to about 3~5 times per day per node. InitiatingHeapOccupancyPercent should (as a rule of thumb) be set to a value slightly higher than average live data, so that marking is not running constantly, burning CPU cycles. There may be situations where this is appropriate, e.g. really tight heaps. E.g. given the Full GC log output of >2013-12-19T14:23:14.592-0500: 925756.121: [Full GC 65G->43G(66G), >45.3188580 secs] > [Eden: 0.0B(2944.0M)->0.0B(9248.0M) Survivors: 416.0M->0.0B Heap: > 65.9G(66.2G)->43.1G(66.2G)] > This would mean a value of at least around 66 (43G/65G), typically at least a few percent higher. G1ReservePercent acts as a buffer for promotion spikes, so if you have a load that has frequent promotion spikes (and a tight heap), this is the way to handle them. > > Given above background, we are still keep working on G1 GC tuning and > hope to beat CMS on our application. For now, G1 is no worse than CMS > on our production machines except full GC happens a little more on G1 > machines. > > Here is the Pause time (STW time, including: initial mark, young, mix > and clean up) graph, x-axis: gc start time, y-axis: pause time in > milliseconds. There will be many optimizations in the future :) > Pauses larger than 2000 milliseconds are due to mix pause and some > spikes are due to GC cleanup time. > Typical G1 long clean up log > > 2013-12-19T14:37:55.375-0500: 926636.905: [GC cleanup 50G->50G(66G), > 0.7325340 secs] > [Times: user=12.98 sys=0.00, real=0.73 secs] Hmmm. Could you run with -XX:+UnlockDiagnosticVMOptions -XX: +G1SummarizeConcMark that prints some information about the phases of the GC Cleanup (unfortunately only at VM exit if I read the current code correctly)? > > Typical G1 Full GC log I guess it is known by now that full GC and the way there (to-space-exhaustion of any kind) are too slow. ;) The only way to fix this is to try to avoid full GC altogether. The main knobs to turn here are G1MixedGCLiveThresholdPercent, G1HeapWastePercent. G1MixedGCLiveThresholdPercent indicates at what region occupancy an old generation region is even considered for collection. The default value is 65%, i.e. G1 only considers regions that are less than 65% occupied for evacuation. Assuming you have a live set of 43G (I just took the heap size at the end of the Full GC), in case of a heap that is evenly fragmented, you are set up for an average heap size of 66G (43G/0.65). Which is exactly what your maximum heap size is, which means that you are bound to have full GCs (if your heap is "perfectly" fragmented) in the long term with that setting. The other variable is G1HeapWastePercent: it indicates how much wasted space G1 can leave without collecting it after considering all candidates (selected above). That's the 10% threshold in the > 925756.116: [G1Ergonomics (Mixed GCs) continue mixed GCs, reason: >candidate old regions available, candidate old regions: 617 regions, >reclaimable: 9897360192 bytes (13.93 %), threshold: 10.00 %] message. So in the worst case, the 10% means that G1 leaves untouched at most 10% of the heap of 65% occupied regions. I.e. at a region size of 32M and a max heap of 66G, it leaves 6.6G/.65 = 10.1G = (rounded to regions) ~320 regions. 
So in this case, your target old gen heap size is around 76G (43G/.65 + (66G*0.1)/.65) - which is beyond the max heap, not even counting some space for the young gen (eden and survivor) and additional safety buffer for promotion spikes. My suggestion is to increase G1MixedGCLiveThresholdPercent and decrease G1HeapWastePercent appropriately to avoid mixed GCs with to-space exhaustion and full GCs. Note that these are worst-case calculations of course, but they explain G1 running into Full GCs. Depending on how much static and dense data you have, you can modify these values appropriately. This rule-of-thumb calculation should give you a stable starting point in most cases though. > 2013-12-19T14:23:13.558-0500: 925755.088: [GC pause (mixed) >[...] > 925756.116: [G1Ergonomics (Concurrent Cycles) do not request > concurrent cycle initiation, reason: still doing mixed collections, > occupancy: 70598524928 bytes, allocation request: 0 bytes, threshold: > 21310419750 bytes (30.00 %), source: end of GC] > 925756.116: [G1Ergonomics (Mixed GCs) continue mixed GCs, reason: > candidate old regions available, candidate old regions: 617 regions, > reclaimable: 9897360192 bytes (13.93 %), threshold: 10.00 %] > (to-space exhausted), 1.0285640 secs] > [Parallel Time: 821.1 ms, GC Workers: 18] > [GC Worker Start (ms): Min: 925755094.0, Avg: 925755094.4, Max: > 925755094.7, Diff: 0.7] > [Ext Root Scanning (ms): Min: 4.6, Avg: 5.1, Max: 6.5, Diff: > 1.8, Sum: 92.7] > [Update RS (ms): Min: 12.4, Avg: 15.6, Max: 27.8, Diff: 15.3, > Sum: 280.1] > [Processed Buffers: Min: 2, Avg: 22.6, Max: 32, Diff: 30, > Sum: 406] > [Scan RS (ms): Min: 168.6, Avg: 180.7, Max: 184.4, Diff: 15.8, > Sum: 3252.4] > [Object Copy (ms): Min: 616.5, Avg: 618.1, Max: 618.9, Diff: > 2.5, Sum: 11125.9] > [Termination (ms): Min: 0.0, Avg: 0.3, Max: 0.6, Diff: 0.5, Sum: > 5.3] > [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, > Sum: 0.5] > [GC Worker Total (ms): Min: 819.5, Avg: 819.8, Max: 820.2, Diff: > 0.7, Sum: 14756.9] > [GC Worker End (ms): Min: 925755914.2, Avg: 925755914.2, Max: > 925755914.2, Diff: 0.1] > [Code Root Fixup: 0.0 ms] > [Clear CT: 13.5 ms] > [Other: 194.0 ms] Just fyi, the missing time in this phase is taken by some additional fixup of the to-space-exhaustion. > [Choose CSet: 5.6 ms] > [Ref Proc: 16.9 ms] > [Ref Enq: 0.9 ms] > [Free CSet: 4.3 ms] Hth, Thomas From ysr1729 at gmail.com Fri Dec 20 01:26:37 2013 From: ysr1729 at gmail.com (Srinivas Ramakrishna) Date: Fri, 20 Dec 2013 01:26:37 -0800 Subject: YGC time increasing suddenly In-Reply-To: References: Message-ID: Hi Luciano -- Look at the rate of promotion. It sounds as if the rate of promotion and of survival suddenly jumps up where you start seeing longer pauses. If you believe the promotion and survivor volume should not increase like that, one possibility is "nepotism", where objects that may have been prematurely promoted and then died, cause the objects they are referring to, to be kept alive. Try to do the following experiment. (1) In the one case, let the system run as it has always run. Notice the increase in young gc time. Let the system continue to run until a full gc occurs, and see if the young gc time drops (2) In a second case, let the system run util you see the increased young gc times. At this time induce an explicit full gc (via jconsole) and see if the young gc times fall. (3) Repeat (2), but with -XX:+ExplicitGCInvokesConcurrent Finally, one question, do you see the increased in young gc times after an explicit full gc? 
Or are you saying that the minor gc at 1992 seconds that you show below is the last "quick" minor GC, and the next one onwards is slow? If you share a complete GC log, we might be able to help. The age 15 cohort in the last quick minor GC probably holds the clue and once promoted, allows nepotism to set in. If that is the case, then -XX:+NeverPromote (if the flag still exists) might be worth experimenting with (although it carries its own dangers, and isn't really a fix, just a means to figuring out If this might be the issue). Basically, I think something like a member of a linked list is getting into the old generation before it does and then proceeds to keep its successor alive, and so on. A full gc may or may not fix that kind of issue. You could try to clear the links of list members when you pull them out of the list and that might reduce the occurrence of such phenomena. I vaguely recall this behaviour with LinkedBlockingQueue's because of which Doug Lea was forced to hold his nose and clear the references when removing members from the list to prevent this kind of false retention. (Ah, I found a reference: http://thread.gmane.org/gmane.comp.java.jsr.166-concurrency/5758 and https://bugs.openjdk.java.net/browse/JDK-6805775 ) -- ramki On Wed, Dec 18, 2013 at 10:58 AM, Luciano Molinari wrote: > Hi everybody, > > We have a standalone Java app that receives requests through RMI and > almost all the objects created by it are short (< ~100ms) lived objects. > This app is running on a 24 cores server with 16 GB RAM (Red Hat Linux). > During our performance tests (10k requests/second) we started to face a > problem where the throughput decreases suddenly just a few minutes > after the app was started. > So, I started to investigate GC behaviour and to make some adjustments > (increase memory, use CMS...) and now we are able to run our app > properly for about 35 minutes. At this point the time spent during young > collections grows sharply although no Full GC is executed (old gen is only > ~4% full). > > I've done tests with many different parameters, but currently I'm using > the following ones: > java -server -verbose:gc -XX:+PrintGCDetails > -XX:+PrintTenuringDistribution -XX:+PrintGCTimeStamps > -XX:PrintFLSStatistics=1 -XX:SurvivorRatio=4 > -XX:ParallelGCThreads=8 -XX:PermSize=256m -XX:+UseParNewGC > -XX:MaxPermSize=256m -Xms7g -Xmx7g -XX:NewSize=4608m -XX:MaxNewSize=4608m > -XX:MaxTenuringThreshold=15 -Dsun.rmi.dgc.client.gcInterval=3600000 > -Dsun.rmi.dgc.server.gcInterval=3600000 > -Djava.rmi.server.hostname=IP_ADDRESS > > If I use this same configuration (without CMS) the same problem occurs > after 20minutes, so it doesn't seem to be related to CMS. Actually, as I > mentioned above, CMS (Full GC) isn't executed during the tests. 
> > Some logs I've collected: > > 1992.748: [ParNew > Desired survivor size 402653184 bytes, new threshold 15 (max 15) > - age 1: 9308728 bytes, 9308728 total > - age 2: 3448 bytes, 9312176 total > - age 3: 1080 bytes, 9313256 total > - age 4: 32 bytes, 9313288 total > - age 5: 34768 bytes, 9348056 total > - age 6: 32 bytes, 9348088 total > - age 15: 2712 bytes, 9350800 total > : 3154710K->10313K(3932160K), 0.0273150 secs] 3215786K->71392K(6553600K) > > //14 YGC happened during this window > > 2021.165: [ParNew > Desired survivor size 402653184 bytes, new threshold 15 (max 15) > - age 1: 9459544 bytes, 9459544 total > - age 2: 3648200 bytes, 13107744 total > - age 3: 3837976 bytes, 16945720 total > - age 4: 3472448 bytes, 20418168 total > - age 5: 3586896 bytes, 24005064 total > - age 6: 3475560 bytes, 27480624 total > - age 7: 3520952 bytes, 31001576 total > - age 8: 3612088 bytes, 34613664 total > - age 9: 3355160 bytes, 37968824 total > - age 10: 3823032 bytes, 41791856 total > - age 11: 3304576 bytes, 45096432 total > - age 12: 3671288 bytes, 48767720 total > - age 13: 3558696 bytes, 52326416 total > - age 14: 3805744 bytes, 56132160 total > - age 15: 3429672 bytes, 59561832 total > : 3230658K->77508K(3932160K), 0.1143860 secs] 3291757K->142447K(6553600K) > > Besides the longer time to perform collection, I also realized that all 15 > ages started to have larger values. > > I must say I'm a little confused about this scenario. Does anyone have > some tip? > > Thanks in advance, > -- > Luciano > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20131220/90b6f3c2/attachment.html From thomas.schatzl at oracle.com Fri Dec 20 02:13:13 2013 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Fri, 20 Dec 2013 11:13:13 +0100 Subject: G1 GC clean up time is too long In-Reply-To: <1387530427.2736.36.camel@cirrus> References: <1387530427.2736.36.camel@cirrus> Message-ID: <1387534393.2736.43.camel@cirrus> Some correction I think: On Fri, 2013-12-20 at 10:07 +0100, Thomas Schatzl wrote: > Hi, > > On Thu, 2013-12-19 at 14:16 -0800, yao wrote: > > Hi All, > > > The other variable is G1HeapWastePercent: it indicates how much wasted > space G1 can leave without collecting it after considering all > candidates (selected above). > > That's the 10% threshold in the > > > 925756.116: [G1Ergonomics (Mixed GCs) continue mixed GCs, reason: > >candidate old regions available, candidate old regions: 617 regions, > >reclaimable: 9897360192 bytes (13.93 %), threshold: 10.00 %] > > message. > > So in the worst case, the 10% means that G1 leaves untouched at most 10% > of the heap of 65% occupied regions. > > I.e. at a region size of 32M and a max heap of 66G, it leaves 6.6G/.65 = > 10.1G = (rounded to regions) ~320 regions. > > So in this case, your target old gen heap size is around 76G (43G/.65 + > (66G*0.1)/.65) - which is beyond the max heap, not even counting some > space for the young gen (eden and survivor) and additional safety buffer > for promotion spikes. I do not think (without looking at the code again) you need to add the space occupied after all mixed gcs to the maximum old gen heap size. At some point it seems I got that calculation wrong... 
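Put differently, once that extra term is dropped, the dominant number in the rule of thumb is just the live set divided by the live-threshold fraction. A rough back-of-the-envelope sketch using the figures assumed earlier in the thread (43G live, 66G max heap); the candidate threshold values are only examples:

    /** Back-of-the-envelope check: old-gen footprint implied by the live threshold. */
    public class MixedGcThresholdCheck {
        public static void main(String[] args) {
            double liveSetGb = 43.0;   // assumed live data, taken from the Full GC log above
            double maxHeapGb = 66.0;   // -Xmx
            for (int liveThresholdPercent : new int[] {65, 75, 85}) {
                // Regions above this occupancy are never chosen for mixed GCs, so in
                // the worst case the old gen levels off around live/threshold.
                double impliedOldGenGb = liveSetGb / (liveThresholdPercent / 100.0);
                System.out.printf("threshold=%d%% -> old gen settles near %.1f GB (Xmx=%.0f GB)%n",
                        liveThresholdPercent, impliedOldGenGb, maxHeapGb);
            }
        }
    }

With the default 65% the implied footprint already matches -Xmx, which is consistent with the full GCs seen above; at 75-85% there is headroom again.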
Anyway, the summary of all this is that default G1MixedGCLiveThresholdPercent value seems to be too low for this live heap (assuming that it is typically 43G). I.e. the resulting old gen heap size is typically determined by the G1MixedGCLiveThresholdPercent only. Note that this assumes that there are results from marking available right when the old gen reaches this object distribution - so depending on when marking starts and ends, and the first mixed GC starts, more space may be required in reality. Thomas From yu.zhang at oracle.com Fri Dec 20 11:08:05 2013 From: yu.zhang at oracle.com (YU ZHANG) Date: Fri, 20 Dec 2013 11:08:05 -0800 Subject: G1 GC clean up time is too long In-Reply-To: <1387534393.2736.43.camel@cirrus> References: <1387530427.2736.36.camel@cirrus> <1387534393.2736.43.camel@cirrus> Message-ID: <52B49595.2010108@oracle.com> Yao, Thanks for the logs. I agree with Thomas' analysis. In the gc log, there are lot of 'reclaimable percentage not over threshold' Please try -XX:G1HeapWastePercent=5% It should add more old regions for mixed gc, also triggers more mixed gc after concurrent gcs. And -XX:InitiatingHeapOccupancyPercent=65 I think these 2 should help getting rid of to-space exhausted and full gc. You have only 8 to-space exhausted. If this is not good enough, you can try increasing G1MixedGCLiveThresholdPercent (add more old regions to mixed gc) or decreasing G1HeapWastePercent(not much allocation spikes, so give more rooms to gc) Thanks, Jenny On 12/20/2013 2:13 AM, Thomas Schatzl wrote: > Some correction I think: > > On Fri, 2013-12-20 at 10:07 +0100, Thomas Schatzl wrote: >> Hi, >> >> On Thu, 2013-12-19 at 14:16 -0800, yao wrote: >>> Hi All, >>> >> The other variable is G1HeapWastePercent: it indicates how much wasted >> space G1 can leave without collecting it after considering all >> candidates (selected above). >> >> That's the 10% threshold in the >> >>> 925756.116: [G1Ergonomics (Mixed GCs) continue mixed GCs, reason: >>> candidate old regions available, candidate old regions: 617 regions, >>> reclaimable: 9897360192 bytes (13.93 %), threshold: 10.00 %] >> message. >> >> So in the worst case, the 10% means that G1 leaves untouched at most 10% >> of the heap of 65% occupied regions. >> >> I.e. at a region size of 32M and a max heap of 66G, it leaves 6.6G/.65 = >> 10.1G = (rounded to regions) ~320 regions. >> >> So in this case, your target old gen heap size is around 76G (43G/.65 + >> (66G*0.1)/.65) - which is beyond the max heap, not even counting some >> space for the young gen (eden and survivor) and additional safety buffer >> for promotion spikes. > I do not think (without looking at the code again) you need to add the > space occupied after all mixed gcs to the maximum old gen heap size. At > some point it seems I got that calculation wrong... > > Anyway, the summary of all this is that default > G1MixedGCLiveThresholdPercent value seems to be too low for this live > heap (assuming that it is typically 43G). > > I.e. the resulting old gen heap size is typically determined by the > G1MixedGCLiveThresholdPercent only. Note that this assumes that there > are results from marking available right when the old gen reaches this > object distribution - so depending on when marking starts and ends, and > the first mixed GC starts, more space may be required in reality. 
> > Thomas > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
From lucmolinari at gmail.com Fri Dec 20 11:21:33 2013 From: lucmolinari at gmail.com (Luciano Molinari) Date: Fri, 20 Dec 2013 17:21:33 -0200 Subject: YGC time increasing suddenly In-Reply-To: References: Message-ID: Hello everybody, *Bernd*, Yes, yesterday I found out that my system is NUMA: # numactl --hardware available: 2 nodes (0-1) node 0 size: 8056 MB node 0 free: 4312 MB node 1 size: 8080 MB node 1 free: 1543 MB node distances: node 0 1 0: 10 20 1: 20 10 Until today I wasn't using huge pages, but today I ran a test locking my app in only one NUMA node (Wolfgang's tip) and using huge pages. As the app has less resources in this case, I decreased the request rate to 7k/s. The app ran properly for about ~35min and then the same problem appeared. What's the best way to set up the JVM considering NUMA? Disable NUMA? Run the JVM with -XX:+UseNUMA? I'll run a test with this parameter. I collected the number of threads created since VM start: 1451. However, the number of live threads was kept at about 300 threads. *Jon*, I ran a test using the -XX:+PrintReferenceGC parameter and there's a link to the GC output at the end of this e-mail. I couldn't find anything too weird... maybe I didn't analyze it properly. *Srinivas*, Yes, as soon as I start to see longer pauses the promotion rate rapidly increases. As was pointed out in one of the previous e-mails, and as I checked today using JConsole, survivor space is almost empty while YGC time is OK, with just a few MB in use. But when YGC time increases, survivor also starts to become full, which must explain the increasing promotion rate. Neither young gc time nor the promotion rate drops after a full gc. In the log you commented on, the GC that took place at 1992 seconds was the last "quick" minor GC. Below there's a link to the gc log and jstat output with the whole information. I'll try to test -XX:+NeverPromote. *GC/Jstat Output* https://drive.google.com/file/d/0B-jeSacHbFsJQ3VUZHlLalRuQzA/edit?usp=sharing *Print GC Jconsole* https://drive.google.com/file/d/0B-jeSacHbFsJcTFnb1pxMENVYzg/edit?usp=sharing I really appreciate the support and time of all of you. Regards, -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20131220/3f9d17ee/attachment-0001.html
From daubman at gmail.com Fri Dec 20 12:27:29 2013 From: daubman at gmail.com (Aaron Daubman) Date: Fri, 20 Dec 2013 15:27:29 -0500 Subject: YGC time increasing suddenly In-Reply-To: References: Message-ID: Luciano, Until today I wasn't using huge pages, but today I ran a test locking my > app in only one NUMA node (Wolfgang's tip) and using > huge pages. As the app has less resources in this case, I decreased the > request rate to 7k/s. The app ran properly for > about ~35min and then the same problem appeared. > Interesting that I ran the exact same tests last night - enabling hugepages and shared memory on a NUMA system and then binding the JVM to a single socket. Apologies if all this is already common knowledge (I've had to do a lot of reading on this recently at least ;-)) Note that hugepage allocation must be done post-boot in order to actually allocate the desired number of hugepages on the socket you will be working with.
I decided to try out the hugeadm utility, and found this to work for me: /usr/bin/numactl --cpunodebind=0 --membind=0 hugeadm --pool-pages-min 2M:12G --obey-mempolicy --pool-pages-max 2M:12G --create-group-mounts --set-recommended-shmmax --set-shm-group echonest --set-recommended-min_free_kbytes This would allocate 12G of 2M hugepages on socket0. Note that if you use /etc/grub.conf or even sysctl to allocate hugepages, they will be evenly split among sockets, and so you will have less usable than desired if locking a process to a single socket. I found I was actually able to handle slightly higher load at similar response times using half the systems resources, so I am essentially wasting money on this dual-socket system =( Some reading I found useful for this: https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt "Interaction of Task Memory Policy with Huge Page Allocation/Freeing" https://www.kernel.org/doc/Documentation/sysctl/vm.txt https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Performance_Tuning_Guide/main-cpu.html#s-cpu-tuning "4.1.2.2. Controlling NUMA Policy with numactl" http://users.sdsc.edu/~glockwood/comp/affinity.php#define:numactl on hugeadm: https://lwn.net/Articles/376606/ I verified this was working as-desired by: # cat /sys/devices/system/node/node*/meminfo | fgrep Huge Node 0 HugePages_Total: 6144 Node 0 HugePages_Free: 1541 Node 0 HugePages_Surp: 0 Node 1 HugePages_Total: 0 Node 1 HugePages_Free: 0 Node 1 HugePages_Surp: 0 > What's the best way to setup > the JVM considering NUMA?Disable NUMA?Run JVM with -XX:+UseNUMA?I'll run a > test with this parameter. > +UseNUMA should definitely help if you are going to run on a multi-socket NUMA system and if you (IIRC default enabled for your config) enable +UseParallelGC: http://docs.oracle.com/javase/7/docs/technotes/guides/vm/performance-enhancements-7.html#numa As a side question, I am trying to stick with G1, which still lacks NUMA compatibility, does anybody know why this still is or if it will be coming anytime soon? https://bugs.openjdk.java.net/browse/JDK-7005859 http://openjdk.java.net/jeps/157 Another side question that may potentially be useful: I have found that I need to add -XX:+UseSHM in addition to -XX:+UseLargePages in order to actually use LargePages - does this make sense / is this expected? I could find very little documentation / reference to UseSHM. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20131220/041938f1/attachment.html From ysr1729 at gmail.com Fri Dec 20 16:38:31 2013 From: ysr1729 at gmail.com (Srinivas Ramakrishna) Date: Fri, 20 Dec 2013 16:38:31 -0800 Subject: YGC time increasing suddenly In-Reply-To: References: Message-ID: Hi Luciano -- On Fri, Dec 20, 2013 at 11:14 AM, Luciano Molinari wrote: > ... > Yes, as soon as I start to see longer pauses the promotion rate rapidly > increases. > > As it was pointed out in one of the previous e-mail and I checked today > using JConsole, survivor space is almost empty while YGC time is OK, with > just a few MB in use. But when YGC time increase, survivor also starts to > become full, which must explain the increasing promotion rate. > > Both young gc and promotion rate doesn't drop after full gc. In the log > you commented the GC that took place at 1992 seconds was the last "quick" > minor GC. I attached a gc log and jstat to this e-mail with the whole > information. > > I'll try to test -XX:+NeverPromote. 
> Thanks for the complete logs. I am pretty sure NeverPromote won't help at all. As you can see there's almost a phase change in the survival behaviour of your program at around 2290 - 2291 s, where the survival rate suddenly sky-rockets. Since the behaviour appears to be very deterministic, I'd venture that perhaps there's something happening in your program at that point which causes many more objects to be retained. Perhaps this is a leak of some kind, or something being inadvertently retained? There wasn't any ParNew data after the first CMS gc to see if that made any difference to survival (as to my original theory of this being a case of premature promotion causing nepotism). Eyeballing your GC logs, it doesn't immediately seem to me to be a case of that. This is one place where the promotion census tool would have helped. Maybe you could get close enough, though, by comparing class histogram dumps or heap dumps before the onset of the high survival rates with those after the onset. I am almost sure this is a program issue and not an artifact of GC. To test that hypothesis, you could, as I think Bernd suggested earlier, run with a young gen and heap twice as large (or half as large) as you have now, and see if the survival rate jumps up at around the same time as it does now. If the "jump time" remains invariant under heap reshaping (to larger values) then it's very likely your application itself and not an artifact of GC. Thanks for sharing, all the best, and let us know how this turned out! -- ramki > I really appreciate the support and time of all of you. > > Regards, > > > On Fri, Dec 20, 2013 at 7:26 AM, Srinivas Ramakrishna wrote: >> Hi Luciano -- >> >> Look at the rate of promotion. It sounds as if the rate of promotion and >> of survival suddenly jumps up where you start >> seeing longer pauses. If you believe the promotion and survivor volume >> should not increase like that, one possibility >> is "nepotism", where objects that may have been prematurely promoted and >> then died, cause the objects they are >> referring to, to be kept alive. Try to do the following experiment. >> >> (1) In the one case, let the system run as it has always run. Notice the >> increase in young gc time. Let the system >> continue to run until a full gc occurs, and see if the young gc time >> drops >> >> (2) In a second case, let the system run until you see the increased young >> gc times. At this time induce an explicit full gc >> (via jconsole) and see if the young gc times fall. >> >> (3) Repeat (2), but with -XX:+ExplicitGCInvokesConcurrent >> >> Finally, one question, do you see the increase in young gc times after >> an explicit full gc? Or are you saying that the >> minor gc at 1992 seconds that you show below is the last "quick" minor >> GC, and the next one onwards is slow? >> >> If you share a complete GC log, we might be able to help. The age 15 >> cohort in the last quick minor GC probably holds the >> clue and once promoted, allows nepotism to set in. If that is the case, >> then -XX:+NeverPromote (if >> the flag still exists) might be worth experimenting with (although it >> carries its own dangers, and isn't really a fix, just >> a means of figuring out if this might be the issue). >> >> Basically, I think something like a member of a linked list is getting >> into the old generation before it dies >> and then proceeds to keep its successor alive, and so on. A full gc may >> or may not fix that kind of issue.
>> You could try to clear the links of list members when you pull them out >> of the list and that might reduce the >> occurrence of such phenomena. I vaguely recall this behaviour with >> LinkedBlockingQueue's because of which >> Doug Lea was forced to hold his nose and clear the references when >> removing members from the list to >> prevent this kind of false retention. (Ah, I found a reference: >> http://thread.gmane.org/gmane.comp.java.jsr.166-concurrency/5758 >> and https://bugs.openjdk.java.net/browse/JDK-6805775 ) >> >> -- ramki >> >> >> On Wed, Dec 18, 2013 at 10:58 AM, Luciano Molinari > > wrote: >> >>> Hi everybody, >>> >>> We have a standalone Java app that receives requests through RMI and >>> almost all the objects created by it are short (< ~100ms) lived objects. >>> This app is running on a 24 cores server with 16 GB RAM (Red Hat Linux). >>> During our performance tests (10k requests/second) we started to face a >>> problem where the throughput decreases suddenly just a few minutes >>> after the app was started. >>> So, I started to investigate GC behaviour and to make some adjustments >>> (increase memory, use CMS...) and now we are able to run our app >>> properly for about 35 minutes. At this point the time spent during young >>> collections grows sharply although no Full GC is executed (old gen is only >>> ~4% full). >>> >>> I've done tests with many different parameters, but currently I'm using >>> the following ones: >>> java -server -verbose:gc -XX:+PrintGCDetails >>> -XX:+PrintTenuringDistribution -XX:+PrintGCTimeStamps >>> -XX:PrintFLSStatistics=1 -XX:SurvivorRatio=4 >>> -XX:ParallelGCThreads=8 -XX:PermSize=256m -XX:+UseParNewGC >>> -XX:MaxPermSize=256m -Xms7g -Xmx7g -XX:NewSize=4608m -XX:MaxNewSize=4608m >>> -XX:MaxTenuringThreshold=15 -Dsun.rmi.dgc.client.gcInterval=3600000 >>> -Dsun.rmi.dgc.server.gcInterval=3600000 >>> -Djava.rmi.server.hostname=IP_ADDRESS >>> >>> If I use this same configuration (without CMS) the same problem occurs >>> after 20minutes, so it doesn't seem to be related to CMS. Actually, as I >>> mentioned above, CMS (Full GC) isn't executed during the tests. 
>>> >>> Some logs I've collected: >>> >>> 1992.748: [ParNew >>> Desired survivor size 402653184 bytes, new threshold 15 (max 15) >>> - age 1: 9308728 bytes, 9308728 total >>> - age 2: 3448 bytes, 9312176 total >>> - age 3: 1080 bytes, 9313256 total >>> - age 4: 32 bytes, 9313288 total >>> - age 5: 34768 bytes, 9348056 total >>> - age 6: 32 bytes, 9348088 total >>> - age 15: 2712 bytes, 9350800 total >>> : 3154710K->10313K(3932160K), 0.0273150 secs] 3215786K->71392K(6553600K) >>> >>> //14 YGC happened during this window >>> >>> 2021.165: [ParNew >>> Desired survivor size 402653184 bytes, new threshold 15 (max 15) >>> - age 1: 9459544 bytes, 9459544 total >>> - age 2: 3648200 bytes, 13107744 total >>> - age 3: 3837976 bytes, 16945720 total >>> - age 4: 3472448 bytes, 20418168 total >>> - age 5: 3586896 bytes, 24005064 total >>> - age 6: 3475560 bytes, 27480624 total >>> - age 7: 3520952 bytes, 31001576 total >>> - age 8: 3612088 bytes, 34613664 total >>> - age 9: 3355160 bytes, 37968824 total >>> - age 10: 3823032 bytes, 41791856 total >>> - age 11: 3304576 bytes, 45096432 total >>> - age 12: 3671288 bytes, 48767720 total >>> - age 13: 3558696 bytes, 52326416 total >>> - age 14: 3805744 bytes, 56132160 total >>> - age 15: 3429672 bytes, 59561832 total >>> : 3230658K->77508K(3932160K), 0.1143860 secs] 3291757K->142447K(6553600K) >>> >>> Besides the longer time to perform collection, I also realized that all >>> 15 ages started to have larger values. >>> >>> I must say I'm a little confused about this scenario. Does anyone have >>> some tip? >>> >>> Thanks in advance, >>> -- >>> Luciano >>> >>> _______________________________________________ >>> hotspot-gc-use mailing list >>> hotspot-gc-use at openjdk.java.net >>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>> >>> >> > > > -- > Luciano Davoglio Molinari > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20131220/e66039df/attachment.html From ivan.mamontov at gmail.com Sat Dec 21 02:24:47 2013 From: ivan.mamontov at gmail.com (=?UTF-8?B?0JzQsNC80L7QvdGC0L7QsiDQmNCy0LDQvQ==?=) Date: Sat, 21 Dec 2013 14:24:47 +0400 Subject: Minor GC Degradation Over Time Message-ID: Hi, During trends analysis of performance we discovered that garbage collecting time on JMVs steadily grows over long time. Minor GC pauses increase from 0.1 sec to 1 sec. This chart shows that GC time grows over time: https://drive.google.com/file/d/0B3vklY0UY_wHODRkSWgyRjNhTms/edit?usp=sharing This also causes some average query latency degradation. We didn't observe frequency increase, we observed large increase of time which it takes to perform minor GC. This problem happens regardless of OS(Debian 6-7 64bit, RHEL 5.9 64bit), JVM version(from 1.6.30 to 1.7.45), application or hardware. Only one thing unites them: servlet container SpringDM. There is a defect in this container - it creates a new thread for each request, so after we had fixed it we had stable GC time: https://drive.google.com/file/d/0B3vklY0UY_wHWVRJSU5QTnJOcG8/edit?usp=sharing Right now I am trying to research what exactly is the root cause of this problem. Below are some facts and figures to help clarify what happens: After some period of time(which depends on load), minor GC time grows over time. 
https://drive.google.com/file/d/0B3vklY0UY_wHRjh3eE9NN21FNWc/edit?usp=sharing - First of all I turned off all built-in GC ergonomics in the JVM (UsePSAdaptiveSurvivorSizePolicy, UseAdaptiveGenerationSizePolicyAtMinorCollection, UseAdaptiveGenerationSizePolicyAtMajorCollection, AdaptiveSizeThroughPutPolicy, UseAdaptiveSizePolicyFootprintGoal, UseAdaptiveSizePolicy, UseCMSBestFit), but the result was the same. - As a next step I disabled all the TLAB/PLAB resizing: -XX:MinTLABSize=2m -XX:-ResizeTLAB -XX:TLABSize=2m -XX:-ResizePLAB, but no changes. - I think I have excluded a memory problem from the list of possibilities; all combinations of the AlwaysPreTouch and UseLargePages parameters were tested. - I left nothing to chance with the GC threads either: -XX:+BindGCTaskThreadsToCPUs -XX:+UseGCTaskAffinity -XX:ParallelGCThreads=8 -XX:ConcGCThreads=8 - My last resort was to build my own JVM, so all critical sections were covered with GCTraceTime. Almost immediately I found the only place where time grows over time: void ParNewGeneration::collect(bool full, bool clear_all_soft_refs, size_t size, bool is_tlab) { ... if (n_workers > 1) { GenCollectedHeap::StrongRootsScope srs(gch); workers->run_task(&tsk); // <-- time grows here } else { ... So the question is: how and why does this happen? How can the number of created threads affect a stop-the-world parallel collection over time (fragmentation)? Maybe someone knows or can suggest how to identify the problem. Of course I can find it (I hope), but I think it will take longer without any advice. I have done tests with different JVM parameters, but currently I'm using the following ones: -XX:PermSize=256m \ -XX:MaxPermSize=256m \ -Xms4096m -Xmx4096m \ -XX:MaxTenuringThreshold=5 \ -XX:NewRatio=2 \ -XX:MinTLABSize=2m \ -XX:-ResizeTLAB \ -XX:TLABSize=2m \ -XX:-PrintTLAB \ -verbose:gc \ -XX:+UseConcMarkSweepGC -XX:+PrintGCDetails \ -XX:+PrintGCCause \ -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps \ -XX:-ResizeOldPLAB \ -XX:-ResizePLAB \ -XX:ParallelGCThreads=8 \ -XX:ConcGCThreads=8 \ -XX:-UseAdaptiveSizePolicy \ -XX:-UseCMSBestFit \ -XX:+AlwaysPreTouch \ -XX:-UseLargePages \ -XX:+UseCondCardMark \ -XX:-UsePSAdaptiveSurvivorSizePolicy \ -XX:-UseAdaptiveGenerationSizePolicyAtMinorCollection \ -XX:-UseAdaptiveGenerationSizePolicyAtMajorCollection \ -XX:AdaptiveSizeThroughPutPolicy=1 \ -XX:-UseAdaptiveSizePolicyFootprintGoal \ -XX:+DisableExplicitGC \ -XX:+UseXMMForArrayCopy \ -XX:+UseUnalignedLoadStores \ -XX:-UseBiasedLocking \ -XX:+BindGCTaskThreadsToCPUs \ -XX:+UseGCTaskAffinity \ -- Thanks, Ivan -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20131221/23ef80bd/attachment-0001.html
From lucmolinari at gmail.com Sat Dec 21 08:15:07 2013 From: lucmolinari at gmail.com (Luciano Molinari) Date: Sat, 21 Dec 2013 14:15:07 -0200 Subject: YGC time increasing suddenly In-Reply-To: References: Message-ID: Hi Aaron, Thanks for sharing your experience and these links. I'm also not familiar with NUMA systems, so I'm studying them as well. I'll run some tests based on your comments. Regards On Fri, Dec 20, 2013 at 6:27 PM, Aaron Daubman wrote: > Luciano, > > Until today I wasn't using huge pages, but today I ran a test locking my >> app in only one NUMA node (Wolfgang's tip) and using >> huge pages. As the app has less resources in this case, I decreased the >> request rate to 7k/s. The app ran properly for >> about ~35min and then the same problem appeared.
>> > > Interesting that I ran the exact same tests last night - enabling > hugepages and shared memory on a NUMA system and then binding the JVM to a > single socket. Apologies if all this is already common knowledge (i've had > to do a lot of reading on this recently at least ;-)) > > Note that hugepage allocation must be done post-boot in order to actually > allocate the desired number of hugepages on the socket you will be working > with. > I decided to try out the hugeadm utility, and found this to work for me: > > /usr/bin/numactl --cpunodebind=0 --membind=0 hugeadm --pool-pages-min > 2M:12G --obey-mempolicy --pool-pages-max 2M:12G --create-group-mounts > --set-recommended-shmmax --set-shm-group echonest > --set-recommended-min_free_kbytes > > This would allocate 12G of 2M hugepages on socket0. > Note that if you use /etc/grub.conf or even sysctl to allocate hugepages, > they will be evenly split among sockets, and so you will have less usable > than desired if locking a process to a single socket. > > I found I was actually able to handle slightly higher load at similar > response times using half the systems resources, so I am essentially > wasting money on this dual-socket system =( > > Some reading I found useful for this: > https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt > "Interaction of Task Memory Policy with Huge Page Allocation/Freeing" > > https://www.kernel.org/doc/Documentation/sysctl/vm.txt > > > https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Performance_Tuning_Guide/main-cpu.html#s-cpu-tuning > "4.1.2.2. Controlling NUMA Policy with numactl" > > http://users.sdsc.edu/~glockwood/comp/affinity.php#define:numactl > > on hugeadm: > https://lwn.net/Articles/376606/ > > I verified this was working as-desired by: > # cat /sys/devices/system/node/node*/meminfo | fgrep Huge > Node 0 HugePages_Total: 6144 > Node 0 HugePages_Free: 1541 > Node 0 HugePages_Surp: 0 > Node 1 HugePages_Total: 0 > Node 1 HugePages_Free: 0 > Node 1 HugePages_Surp: 0 > > > >> What's the best way to setup >> the JVM considering NUMA?Disable NUMA?Run JVM with -XX:+UseNUMA?I'll run a >> test with this parameter. >> > > +UseNUMA should definitely help if you are going to run on a multi-socket > NUMA system and if you (IIRC default enabled for your config) enable > +UseParallelGC: > > http://docs.oracle.com/javase/7/docs/technotes/guides/vm/performance-enhancements-7.html#numa > > As a side question, I am trying to stick with G1, which still lacks NUMA > compatibility, does anybody know why this still is or if it will be coming > anytime soon? > https://bugs.openjdk.java.net/browse/JDK-7005859 > http://openjdk.java.net/jeps/157 > > Another side question that may potentially be useful: > I have found that I need to add -XX:+UseSHM in addition to > -XX:+UseLargePages in order to actually use LargePages - does this make > sense / is this expected? I could find very little documentation / > reference to UseSHM. > > > -- Luciano Davoglio Molinari -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20131221/56124b82/attachment.html From lucmolinari at gmail.com Sat Dec 21 08:15:45 2013 From: lucmolinari at gmail.com (Luciano Molinari) Date: Sat, 21 Dec 2013 14:15:45 -0200 Subject: YGC time increasing suddenly In-Reply-To: References: Message-ID: Hi Srinivas, Yes, I agree with you that the behaviour is really awkward. 
I captured some heap dumps and imported them in Eclipse Mat, but using the option "Keep unreachable objects". The objects that use most of the space are Strings and some Hibernate objects (mainly org.hibernate.engine.query. NativeSQLQueryPlan). I'll try to change -XX:StringTableSize parameter to check if I have better results. I noticed that many Strings in the heap are equals..shouldn't JVM keep only different Strings and re-use the equals? Regarding hibernate, I have some native queries where I can use native JDBC, although I really don't know how much this will help. What do you mean by "promotion census tool"? I'll also run a test increasing Heap Space and NewSize. Thanks again. Regards, -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20131221/7aef8f8e/attachment.html From bernd-2013 at eckenfels.net Sat Dec 21 11:45:46 2013 From: bernd-2013 at eckenfels.net (Bernd Eckenfels) Date: Sat, 21 Dec 2013 20:45:46 +0100 Subject: YGC time increasing suddenly In-Reply-To: References: Message-ID: Am 21.12.2013, 17:15 Uhr, schrieb Luciano Molinari : > I noticed that many Strings in the heap are equals..shouldn't JVM keep > only different Strings and re-use the equals? When a string is interned (all constants in your classes are interned) then there is only one instance for a string. However if you (or a library) construct a new string from bytes/characters then it will not be automatically interned and therefore created with multiple instances (basically if you use "new String" you get what you ask for :) You can use .intern() on a String to get its unique instance, but be sure to do this only for the most common/often used strings. This does essentially a hash lookup, so it is most often not a good idea to do it (unless you know you keep them around for long and/or equal() them very often. (For example it sometimes make sense to intern() all element names of Dom Trees but not the (too many different) values). Gruss Bernd -- http://bernd.eckenfels.net From lanson.zheng at gmail.com Sun Dec 22 18:14:21 2013 From: lanson.zheng at gmail.com (Guoqin Zheng) Date: Sun, 22 Dec 2013 18:14:21 -0800 Subject: How CMS interleave with ParNew In-Reply-To: References: Message-ID: Srinivas, Thank you very much for this detailed response. One more question, why CMS treats the young gen as the source of root references rather then the ordinary roots? In the other collectors, the roots usually comes from the stack, static variable, thread, JNI, etc. Why there is such difference? Thanks, G. Zheng On Fri, Dec 20, 2013 at 12:54 AM, Srinivas Ramakrishna wrote: > Hi G. Zheng --- > > > On Wed, Dec 18, 2013 at 11:53 PM, Guoqin Zheng wrote: > >> Hey folks, >> >> While search for how CMS works in detail, I happened to an article saying >> usuablly before the inital-mark phase and remark phase, there is young gen >> collection happening and the initial-mark/remark also scan young gen space. >> So >> >> 1. Can you help me understand how the ParNew collection work with CMS >> to improve the performance. >> >> ParNew collections are minor collections. CMS collects just the old gen > and optionally the perm gen. A concurrent CMS > collection is "interrupted" by ParNew minor collections. > >> >> 1. >> 2. Why the young gen space needs to be scanned >> >> Since CMS scans and collects just the old gen and perm gen, it treats the > young gen as a source of roots, > and scans the young gen in its entirety. > > >> >> 1. 
If the remark takes very long, what does that mean? >> >> It typically means that there is a lot of concurrent mutation happening > in the old gen (perhaps as a result of > objects being promoted rapidly into the old gen). In this case, the > concurrent scanning and precleaning of the > CMS generation (which is designed to catch up with mutations in the old > gen) isn't able to keep up with > the rate of mutations happening there. > > >> >> 1. Finally, is there a official/or good docs talking about CMS in >> detail? >> >> > The paper by Printezis and Detlefs in ISMM 2000 was the starting point of > the > HotSpot CMS implementation, with a few modifications along the way. > > HTHS. > -- ramki > > >> Thanks, >> >> G. Zheng >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20131222/ac6a159a/attachment.html From lanson.zheng at gmail.com Mon Dec 23 00:31:08 2013 From: lanson.zheng at gmail.com (Guoqin Zheng) Date: Mon, 23 Dec 2013 00:31:08 -0800 Subject: Take heapdump on GC? Message-ID: Hi, I have an application that fills up young gen in seconds. I want to take a heap dump on young gen collection, so that I can see what eats up the young gen so fast. How can I take such dump just before GC? Thanks, -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20131223/40ba72a2/attachment.html From Andreas.Mueller at mgm-tp.com Mon Dec 23 09:17:35 2013 From: Andreas.Mueller at mgm-tp.com (=?iso-8859-1?Q?Andreas_M=FCller?=) Date: Mon, 23 Dec 2013 17:17:35 +0000 Subject: Benchmarking hotspot's garbage collectors Message-ID: <46FF8393B58AD84D95E444264805D98FBDE0E504@edata01.mgm-edv.de> Hi all, please find a compilation of procedures and first results of GC benchmarking on my most recent blog post: http://blog.mgm-tp.com/2013/12/garbage-collection-tuning-part2/ This covers hotspot's 4 main garbage collectors ParallelGC, ParNewGC, CMS and G1 in mainly Java 7u45 and focuses on the basics like GC throughput and NewSize dependence. Comments welcome. Best regards Andreas M?ller mgm technology partners GmbH Frankfurter Ring 105a 80807 M?nchen Tel. +49 (89) 35 86 80-633 Fax +49 (89) 35 86 80-288 E-Mail Andreas.Mueller at mgm-tp.com Innovation Implemented. Sitz der Gesellschaft: M?nchen Gesch?ftsf?hrer: Hamarz Mehmanesh Handelsregister: AG M?nchen HRB 105068 -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20131223/835c260f/attachment.html From bernd-2013 at eckenfels.net Mon Dec 23 14:04:01 2013 From: bernd-2013 at eckenfels.net (Bernd Eckenfels) Date: Mon, 23 Dec 2013 23:04:01 +0100 Subject: Take heapdump on GC? In-Reply-To: References: Message-ID: Am 23.12.2013, 09:31 Uhr, schrieb Guoqin Zheng : > I have an application that fills up young gen in seconds. I want to take > a > heap dump on young gen collection, so that I can see what eats up the > young > gen so fast. How can I take such dump just before GC? Filling up YoungGen in seconds is not that unusual (especially when the YG is too small). If you want to catch the app in the act I guess it is the best to do it programmatically by waiting a specified time after the last YGC happened. 
The last YGC can be determined from JMX as well as triggering the heapdump. I don't think there is a hook which can be called before the YGC starts. Poll the following tabular data entry to see the time of the last YGC: java.lang:type=GarbageCollector,name=PS Scavenge # LastGcInfo # endTime Schedule a thread to trigger in x seconds afterwards with: com.sun.management:type=HotSpotDiagnostic # dumpHeap(filename, false); On the other hand, it might be easier to simply take a few heap histograms every second or something. Greetings Bernd -- http://bernd.eckenfels.net
From bernd.eckenfels at googlemail.com Mon Dec 23 16:25:08 2013 From: bernd.eckenfels at googlemail.com (Bernd Eckenfels) Date: Tue, 24 Dec 2013 01:25:08 +0100 Subject: Benchmarking hotspot's garbage collectors In-Reply-To: <46FF8393B58AD84D95E444264805D98FBDE0E504@edata01.mgm-edv.de> References: <46FF8393B58AD84D95E444264805D98FBDE0E504@edata01.mgm-edv.de> Message-ID: Hello Andreas, thank you for that work! I must say, it is still not a good signal for me to go with G1. Not even the auto tuning and robustness in ill-behaved applications seem to be present. That combined with the performance impact. Granted, the tests have been with a smaller heap, but still... BTW: I have added two comments to the article, a nitpick and a request for more details on the used GC configuration and clarification on the types of GC flag combinations tested. For example CMS+ParNew would be the thing I would like to see. Bernd Am 23.12.2013, 18:17 Uhr, schrieb Andreas Müller : > http://blog.mgm-tp.com/2013/12/garbage-collection-tuning-part2/
From bernd-2013 at eckenfels.net Mon Dec 23 18:04:47 2013 From: bernd-2013 at eckenfels.net (Bernd Eckenfels) Date: Tue, 24 Dec 2013 03:04:47 +0100 Subject: How CMS interleave with ParNew In-Reply-To: References: Message-ID: Am 23.12.2013, 03:14 Uhr, schrieb Guoqin Zheng : > One more question, why CMS treats the young gen as the source of root > references rather then the ordinary roots? Just an uneducated guess: after a scavenge you can be sure that all surviving objects do have a root. That way you don't need to mark (and especially traverse) them again. Of course the traditional roots have to be traversed anyway to see which old gen objects they point to. Greetings Bernd -- http://bernd.eckenfels.net
From lanson.zheng at gmail.com Mon Dec 23 22:03:53 2013 From: lanson.zheng at gmail.com (Guoqin Zheng) Date: Mon, 23 Dec 2013 22:03:53 -0800 Subject: Take heapdump on GC? In-Reply-To: References: Message-ID: Thank you, Bernd! I have 1G of space for the young gen, but it still fills up quickly. That is why I want to take a heap dump. Anyway, I will try the second option you mentioned. On Mon, Dec 23, 2013 at 2:04 PM, Bernd Eckenfels wrote: > Am 23.12.2013, 09:31 Uhr, schrieb Guoqin Zheng : > > I have an application that fills up young gen in seconds. I want to take > > a > > heap dump on young gen collection, so that I can see what eats up the > > young > > gen so fast. How can I take such dump just before GC? > > Filling up YoungGen in seconds is not that unusual (especially when the YG > is too small). If you want to catch the app in the act I guess it is the > best to do it programmatically by waiting a specified time after the last > YGC happened. The last YGC can be determined from JMX as well as > triggering the heapdump. I dont think there is a hook which can be called > before the YGC starts.
> > Poll the following tabular data entry to see the time od the last YGC: > > java.lang:type=GarbageCollector,name=PS Scavenge # LastGcInfo # endTime > > Schedule a thread to trigger in x seconds afterwards with: > > com.sun.management:type=HotSpotDiagnostic # dumpHeap(filename, false); > > On the other hand, it might be easier to simply take a vew heap > historgrams every second or something. > > Greetings > Bernd > -- > http://bernd.eckenfels.net > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20131223/4b33b99c/attachment.html From ysr1729 at gmail.com Tue Dec 24 01:19:52 2013 From: ysr1729 at gmail.com (Srinivas Ramakrishna) Date: Tue, 24 Dec 2013 01:19:52 -0800 Subject: How CMS interleave with ParNew In-Reply-To: References: Message-ID: Hi Guoqin -- On Sun, Dec 22, 2013 at 6:14 PM, Guoqin Zheng wrote: > Srinivas, > > Thank you very much for this detailed response. > > One more question, why CMS treats the young gen as the source of root > references rather then the ordinary roots? In the other collectors, the > roots usually comes from the stack, static variable, thread, JNI, etc. Why > there is such difference? > CMS does not do any marking of objects in the minor generation, It takes them as live, and (marks/traverses and) collects only the objects in the old generation (and permanent generation, with certain caveats). It must thus treat young gen objects as roots, as it does not do any reachability analysis of young gen objects. Like G1, it could have traversed and marked the young generation objects as well, but for historical reasons it didn't. If it were to traverse and mark young gen objects, it would also have had to worry about dealing with concurrent mutation of young gen objects as well, where mutation rate is typically much higher. I can't recall the history of the design trade-offs here, but it's possible that the cost of dealing with concurrent mutation of young gen objects with an incremental update barrier of the kind CMS uses may have trumped the gains from not treating all of them as potential roots into the old gen. (G1's SATB barrier makes it much more immune to such young gen mutation.) HTHS. -- ramki -- ramki > > Thanks, > > G. Zheng > > > On Fri, Dec 20, 2013 at 12:54 AM, Srinivas Ramakrishna wrote: > >> Hi G. Zheng --- >> >> >> On Wed, Dec 18, 2013 at 11:53 PM, Guoqin Zheng wrote: >> >>> Hey folks, >>> >>> While search for how CMS works in detail, I happened to an article >>> saying usuablly before the inital-mark phase and remark phase, there is >>> young gen collection happening and the initial-mark/remark also scan young >>> gen space. So >>> >>> 1. Can you help me understand how the ParNew collection work with >>> CMS to improve the performance. >>> >>> ParNew collections are minor collections. CMS collects just the old gen >> and optionally the perm gen. A concurrent CMS >> collection is "interrupted" by ParNew minor collections. >> >>> >>> 1. >>> 2. Why the young gen space needs to be scanned >>> >>> Since CMS scans and collects just the old gen and perm gen, it treats >> the young gen as a source of roots, >> and scans the young gen in its entirety. >> >> >>> >>> 1. If the remark takes very long, what does that mean? 
>>> It typically means that there is a lot of concurrent mutation happening >> in the old gen (perhaps as a result of >> objects being promoted rapidly into the old gen). In this case, the >> concurrent scanning and precleaning of the >> CMS generation (which is designed to catch up with mutations in the old >> gen) isn't able to keep up with >> the rate of mutations happening there. >> >>> >>> 1. Finally, is there a official/or good docs talking about CMS in >>> detail? >>> >> The paper by Printezis and Detlefs in ISMM 2000 was the starting point of >> the >> HotSpot CMS implementation, with a few modifications along the way. >> >> HTHS. >> -- ramki >> >>> Thanks, >>> >>> G. Zheng >>> >>> _______________________________________________ >>> hotspot-gc-use mailing list >>> hotspot-gc-use at openjdk.java.net >>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20131224/e842e320/attachment.html
From thomas.schatzl at oracle.com Tue Dec 24 02:17:02 2013 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Tue, 24 Dec 2013 11:17:02 +0100 Subject: Minor GC Degradation Over Time In-Reply-To: References: Message-ID: <1387880222.7002.5.camel@cirrus> On Sat, 2013-12-21 at 14:24 +0400, Ivan Mamontov wrote: > Hi, > > During trends analysis of performance we discovered that garbage > collecting time on JMVs steadily grows over long time. > > Minor GC pauses increase from 0.1 sec to 1 sec. This chart shows that > GC time grows over time: > > https://drive.google.com/file/d/0B3vklY0UY_wHODRkSWgyRjNhTms/edit?usp=sharing > > This also causes some average query latency degradation. > > We didn't observe frequency increase, we observed large increase of > time which it takes to perform minor GC. > > This problem happens regardless of OS(Debian 6-7 64bit, RHEL 5.9 > 64bit), JVM version(from 1.6.30 to 1.7.45), application or hardware. > > Only one thing unites them: servlet container SpringDM. There is a > defect in this container - it creates a new thread for each request, > so after we had fixed it we had stable GC time: > >https://drive.google.com/file/d/0B3vklY0UY_wHWVRJSU5QTnJOcG8/edit?usp=sharing > > Right now I am trying to research what exactly is the root cause of > this problem. All Java live threads need to be scanned for references into the heap at GC (stacks etc). So maybe the container leaked threads (or managed them using something like soft references which only occasionally get cleared up)? Then GC time automatically increases. This needs to be a *lot* of threads though to make an impact. Thomas
From alexey.ragozin at gmail.com Tue Dec 24 12:33:40 2013 From: alexey.ragozin at gmail.com (Alexey Ragozin) Date: Wed, 25 Dec 2013 00:33:40 +0400 Subject: Take heapdump on GC? Message-ID: Hi, Take a look at https://github.com/aragozin/jvm-tools/blob/master/sjk-core/COMMANDS.md#hh-command This tool can show you a class histogram of the garbage in the heap (internally it is using jmap). Not a heapdump, but class histograms are still quite useable. Another command which can help you is https://github.com/aragozin/jvm-tools/blob/master/sjk-core/COMMANDS.md#ttop-command Besides CPU usage, it shows the allocation rate per thread, so you can identify the threads producing the most garbage.
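[Editor's note: a minimal, hypothetical sketch of the programmatic approach Bernd outlines earlier in this thread (watch the young collector through the platform MXBeans, wait a while after the last young GC, then call HotSpotDiagnosticMXBean.dumpHeap). The collector-name matching and the 4-second delay are assumptions to adapt, not values given by the posters; requires JDK 7+ for getPlatformMXBean.]

    import com.sun.management.HotSpotDiagnosticMXBean;
    import java.lang.management.GarbageCollectorMXBean;
    import java.lang.management.ManagementFactory;

    /** Editor's sketch: dump the heap shortly before the *next* young GC is expected. */
    public class DumpNearFullEden {
        public static void main(String[] args) throws Exception {
            GarbageCollectorMXBean youngGc = null;
            for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                // crude heuristic: skip the old/full collectors ("PS MarkSweep",
                // "ConcurrentMarkSweep", "G1 Old Generation", ...)
                if (!gc.getName().contains("MarkSweep") && !gc.getName().contains("Old")) {
                    youngGc = gc;
                }
            }
            if (youngGc == null) {
                System.err.println("no young-generation collector found");
                return;
            }
            HotSpotDiagnosticMXBean diag =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);

            long delayAfterYgcMs = 4000;   // assumption: most of the usual eden fill time
            long lastCount = youngGc.getCollectionCount();
            while (true) {
                Thread.sleep(100);
                long count = youngGc.getCollectionCount();
                if (count > lastCount) {              // a young GC just finished
                    Thread.sleep(delayAfterYgcMs);    // let eden fill up again
                    diag.dumpHeap("eden-nearly-full.hprof", false /* include unreachable */);
                    break;
                }
            }
        }
    }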
Regards, Alexey On Wed, Dec 25, 2013 at 12:00 AM, wrote: > Message: 4 > Date: Mon, 23 Dec 2013 22:03:53 -0800 > From: Guoqin Zheng > Subject: Re: Take heapdump on GC? > To: Bernd Eckenfels > Cc: "hotspot-gc-use at openjdk.java.net" > > Message-ID: > > Content-Type: text/plain; charset="iso-8859-1" > > Thank you, Bernd! > > Since I have 1G space for young gen, but it still fills up quickly. That is > why I want to take a heap dump. > > Anyway, I will try the second option you mentioned. > > > > On Mon, Dec 23, 2013 at 2:04 PM, Bernd Eckenfels > wrote: > >> Am 23.12.2013, 09:31 Uhr, schrieb Guoqin Zheng : >> >> > I have an application that fills up young gen in seconds. I want to take >> > a >> > heap dump on young gen collection, so that I can see what eats up the >> > young >> > gen so fast. How can I take such dump just before GC? >> >> Filling up YoungGen in seconds is not that unusual (especially when the YG >> is too small). If you want to catch the app in the act I guess it is the >> best to do it programmatically by waiting a specified time after the last >> YGC happened. The last YGC can be determined from JMX as well as >> triggering the heapdump. I dont think there is a hook which can be called >> before the YGC starts. >> >> Poll the following tabular data entry to see the time od the last YGC: >> >> java.lang:type=GarbageCollector,name=PS Scavenge # LastGcInfo # endTime >> >> Schedule a thread to trigger in x seconds afterwards with: >> >> com.sun.management:type=HotSpotDiagnostic # dumpHeap(filename, false); >> >> On the other hand, it might be easier to simply take a vew heap >> historgrams every second or something. >> >> Greetings >> Bernd >> -- >> http://bernd.eckenfels.net >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> From yaoshengzhe at gmail.com Tue Dec 31 17:27:36 2013 From: yaoshengzhe at gmail.com (yao) Date: Tue, 31 Dec 2013 17:27:36 -0800 Subject: G1 GC clean up time is too long In-Reply-To: <52B5037C.8010704@servergy.com> References: <52B5037C.8010704@servergy.com> Message-ID: Hi Folks, Sorry for reporting GC performance result late, we are in the code freeze period for the holiday season and cannot do any production related deployment. First, I'd like to say thank you to Jenny, Monica and Thomas. Your suggestions are really helpful and help us to understand G1 GC behavior. We did NOT observe any full GCs after adjusting suggested parameters. That is really awesome, we tried these new parameters on Dec 26 and full GC disappeared since then (at least until I am writing this email, at 3:37pm EST, Dec 30). G1 parameters: *-XX:MaxGCPauseMillis=100 *-XX:G1HeapRegionSize=32m *-XX:InitiatingHeapOccupancyPercent=65 *-XX:G1ReservePercent=20 *-XX:G1HeapWastePercent=5 -XX:G1MixedGCLiveThresholdPercent=75* We've reduced MaxGCPauseMillis to 100 since our real-time system is focus on low pause, if system cannot give response in 50 milliseconds, it's totally useless for the client. However, current read latency 99 percentile is still slightly higher than CMS machines but they are pretty close (14 millis vs 12 millis). One thing we can do now is to increase heap size for G1 machines, for now, the heap size for G1 is only 90 percent of those CMS machines. This is because we observed our server process is killed by OOM killer on G1 machines and we decided to decrease heap size on G1 machines. 
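[Editor's note: for readability, the flag set quoted above corresponds to a command line along the following lines. The heap-size values are placeholders, not figures given by the poster, and G1MixedGCLiveThresholdPercent is an experimental flag that may additionally require -XX:+UnlockExperimentalVMOptions.]

    java -Xms<heap> -Xmx<heap> \
         -XX:+UseG1GC \
         -XX:MaxGCPauseMillis=100 \
         -XX:G1HeapRegionSize=32m \
         -XX:InitiatingHeapOccupancyPercent=65 \
         -XX:G1ReservePercent=20 \
         -XX:G1HeapWastePercent=5 \
         -XX:+UnlockExperimentalVMOptions -XX:G1MixedGCLiveThresholdPercent=75 \
         ... <GC logging flags and application>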
Since G1ReservePercent was increased, we think it should be safe to increase G1 heap to be same as CMS machine. We believe it could make G1 machine give us better performance because 40 percent of heap will be used for block cache. Thanks -Shengzhe G1 Logs 2013-12-30T08:25:26.727-0500: 308692.158: [GC pause (young) Desired survivor size 234881024 bytes, new threshold 14 (max 15) - age 1: 16447904 bytes, 16447904 total - age 2: 30614384 bytes, 47062288 total - age 3: 16122104 bytes, 63184392 total - age 4: 16542280 bytes, 79726672 total - age 5: 14249520 bytes, 93976192 total - age 6: 15187728 bytes, 109163920 total - age 7: 15073808 bytes, 124237728 total - age 8: 17903552 bytes, 142141280 total - age 9: 17031280 bytes, 159172560 total - age 10: 16854792 bytes, 176027352 total - age 11: 19192480 bytes, 195219832 total - age 12: 20491176 bytes, 215711008 total - age 13: 16367528 bytes, 232078536 total - age 14: 15536120 bytes, 247614656 total 308692.158: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 32768, predicted base time: 38.52 ms, remaining time: 61.48 ms, target pause time: 100.00 ms] 308692.158: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 91 regions, survivors: 14 regions, predicted young region time: 27.76 ms] 308692.158: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 91 regions, survivors: 14 regions, old: 0 regions, predicted pause time: 66.28 ms, target pause time: 100.00 ms] 308692.233: [G1Ergonomics (Concurrent Cycles) request concurrent cycle initiation, reason: occupancy higher than threshold, occupancy: 52143587328 bytes, allocation request: 0 bytes, threshold: 46172576125 bytes (65.00 %), source: end of GC] , 0.0749020 secs] [Parallel Time: 53.9 ms, GC Workers: 18] [GC Worker Start (ms): Min: 308692158.6, Avg: 308692159.0, Max: 308692159.4, Diff: 0.8] [Ext Root Scanning (ms): Min: 3.9, Avg: 4.5, Max: 6.4, Diff: 2.4, Sum: 81.9] [Update RS (ms): Min: 10.2, Avg: 11.6, Max: 12.2, Diff: 2.0, Sum: 209.0] [Processed Buffers: Min: 15, Avg: 22.5, Max: 31, Diff: 16, Sum: 405] [Scan RS (ms): Min: 7.8, Avg: 8.0, Max: 8.3, Diff: 0.5, Sum: 144.3] [Object Copy (ms): Min: 28.3, Avg: 28.4, Max: 28.5, Diff: 0.2, Sum: 510.7] [Termination (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 1.2] [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.5] [GC Worker Total (ms): Min: 52.3, Avg: 52.6, Max: 53.1, Diff: 0.8, Sum: 947.5] [GC Worker End (ms): Min: 308692211.6, Avg: 308692211.7, Max: 308692211.7, Diff: 0.1] [Code Root Fixup: 0.0 ms] [Clear CT: 9.8 ms] [Other: 11.1 ms] [Choose CSet: 0.0 ms] [Ref Proc: 2.4 ms] [Ref Enq: 0.4 ms] [Free CSet: 1.1 ms] [Eden: 2912.0M(2912.0M)->0.0B(3616.0M) Survivors: 448.0M->416.0M Heap: 51.7G(66.2G)->48.9G(66.2G)] [Times: user=1.07 sys=0.01, real=0.08 secs] 308697.312: [G1Ergonomics (Concurrent Cycles) initiate concurrent cycle, reason: concurrent cycle initiation requested] 2013-12-30T08:25:31.881-0500: 308697.312: [GC pause (young) (initial-mark) Desired survivor size 268435456 bytes, new threshold 15 (max 15) - age 1: 17798336 bytes, 17798336 total - age 2: 15275456 bytes, 33073792 total - age 3: 27940176 bytes, 61013968 total - age 4: 15716648 bytes, 76730616 total - age 5: 16474656 bytes, 93205272 total - age 6: 14249232 bytes, 107454504 total - age 7: 15187536 bytes, 122642040 total - age 8: 15073808 bytes, 137715848 total - age 9: 17362752 bytes, 155078600 total - age 10: 17031280 bytes, 172109880 total - age 11: 16854792 bytes, 188964672 total - age 12: 19124800 bytes, 
208089472 total - age 13: 20491176 bytes, 228580648 total - age 14: 16367528 bytes, 244948176 total 308697.313: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 31028, predicted base time: 37.87 ms, remaining time: 62.13 ms, target pause time: 100.00 ms] 308697.313: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 113 regions, survivors: 13 regions, predicted young region time: 27.99 ms] 308697.313: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 113 regions, survivors: 13 regions, old: 0 regions, predicted pause time: 65.86 ms, target pause time: 100.00 ms] , 0.0724890 secs] [Parallel Time: 51.9 ms, GC Workers: 18] [GC Worker Start (ms): Min: 308697313.3, Avg: 308697313.7, Max: 308697314.0, Diff: 0.6] [Ext Root Scanning (ms): Min: 4.3, Avg: 5.7, Max: 16.7, Diff: 12.3, Sum: 101.8] [Update RS (ms): Min: 0.0, Avg: 9.3, Max: 10.4, Diff: 10.4, Sum: 166.9] [Processed Buffers: Min: 0, Avg: 22.0, Max: 30, Diff: 30, Sum: 396] [Scan RS (ms): Min: 6.4, Avg: 8.5, Max: 13.0, Diff: 6.5, Sum: 152.3] [Object Copy (ms): Min: 22.5, Avg: 27.1, Max: 27.7, Diff: 5.2, Sum: 487.0] [Termination (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 1.0] [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.6] [GC Worker Total (ms): Min: 50.2, Avg: 50.5, Max: 50.9, Diff: 0.6, Sum: 909.5] [GC Worker End (ms): Min: 308697364.2, Avg: 308697364.2, Max: 308697364.3, Diff: 0.1] [Code Root Fixup: 0.0 ms] [Clear CT: 9.9 ms] [Other: 10.8 ms] [Choose CSet: 0.0 ms] [Ref Proc: 2.8 ms] [Ref Enq: 0.4 ms] [Free CSet: 0.9 ms] [Eden: 3616.0M(3616.0M)->0.0B(3520.0M) Survivors: 416.0M->448.0M Heap: 52.5G(66.2G)->49.0G(66.2G)] [Times: user=1.01 sys=0.00, real=0.07 secs] 2013-12-30T08:25:31.954-0500: 308697.385: [GC concurrent-root-region-scan-start] 2013-12-30T08:25:31.967-0500: 308697.398: [GC concurrent-root-region-scan-end, 0.0131710 secs] 2013-12-30T08:25:31.967-0500: 308697.398: [GC concurrent-mark-start] 2013-12-30T08:25:36.566-0500: 308701.997: [GC concurrent-mark-end, 4.5984140 secs] 2013-12-30T08:25:36.570-0500: 308702.002: [GC remark 2013-12-30T08:25:36.573-0500: 308702.004: [GC ref-proc, 0.0126990 secs], 0.0659540 secs] [Times: user=0.87 sys=0.00, real=0.06 secs] 2013-12-30T08:25:36.641-0500: 308702.072: [GC cleanup 52G->52G(66G), 0.5487830 secs] [Times: user=9.66 sys=0.06, real=0.54 secs] 2013-12-30T08:25:37.190-0500: 308702.622: [GC concurrent-cleanup-start] 2013-12-30T08:25:37.190-0500: 308702.622: [GC concurrent-cleanup-end, 0.0000480 secs] -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20131231/d5e2a1bf/attachment.html