From taras.tielkes at gmail.com  Sun Jan 13 12:07:13 2013
From: taras.tielkes at gmail.com (Taras Tielkes)
Date: Sun, 13 Jan 2013 21:07:13 +0100
Subject: Monitoring finalization activity
Message-ID: 

Hi,

Are there any (semi-)public counters available to track how much work is
being performed with regard to finalization? I'm mainly interested in
finalized instance counts by class, rather than the current size of the
finalizer queue.

Thanks,
-tt

From Andreas.Loew at oracle.com  Sun Jan 13 14:17:56 2013
From: Andreas.Loew at oracle.com (Andreas Loew)
Date: Sun, 13 Jan 2013 23:17:56 +0100
Subject: Monitoring finalization activity
In-Reply-To: 
References: 
Message-ID: <50F33294.1050308@oracle.com>

Hi Taras,

you should be able to use BTrace (i.e. dynamic bytecode instrumentation)
and register a probe that fires on entry to the finalize() methods of the
classes you want to monitor (the probe can then do the counting):

http://kenai.com/projects/btrace/pages/Home
http://kenai.com/projects/btrace/pages/UserGuide

Hope this helps & best regards,
Andreas

On 13.01.2013 21:07, Taras Tielkes wrote:
> Hi,
>
> Are there any (semi-)public counters available to track how much
> work is being performed with regard to finalization? I'm mainly
> interested in finalized instance counts by class, rather than the
> current size of the finalizer queue.
>
> Thanks,
> -tt

--
Andreas Loew | Senior Java Architect
ACS Principal Service Delivery Engineer
ORACLE Germany
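As a rough illustration of the approach Andreas describes, a minimal BTrace
script might look something like the following. This is an untested sketch:
"com.example.MyResource" is a placeholder for whichever class you want to
monitor, and one @OnMethod probe per class would be needed to get
per-class counts.

import com.sun.btrace.annotations.BTrace;
import com.sun.btrace.annotations.OnMethod;
import com.sun.btrace.annotations.OnTimer;
import static com.sun.btrace.BTraceUtils.println;
import static com.sun.btrace.BTraceUtils.str;
import static com.sun.btrace.BTraceUtils.strcat;

@BTrace
public class FinalizeCounter {
    // running count of finalize() invocations for the probed class
    private static long count;

    // fires whenever com.example.MyResource.finalize() is entered
    @OnMethod(clazz = "com.example.MyResource", method = "finalize")
    public static void onFinalize() {
        count++;
    }

    // print the running total every 5 seconds
    @OnTimer(5000)
    public static void report() {
        println(strcat("MyResource.finalize() calls so far: ", str(count)));
    }
}

It would be attached roughly as "btrace 12345 FinalizeCounter.java", where
12345 is the pid of the target JVM.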
From michal.warecki at gmail.com  Mon Jan 14 09:21:41 2013
From: michal.warecki at gmail.com (Michał Warecki)
Date: Mon, 14 Jan 2013 18:21:41 +0100
Subject: CMS lazy sweeping
Message-ID: 

Hi All,

I'm trying to understand the mark-sweep algorithm. Before I dive into the
implementation of CMS in OpenJDK, I want to ask a few questions.

Does CMS in OpenJDK use lazy sweeping with a block-structured heap? If not,
why? If it does, each block could carry the class information for the
objects allocated in it, so the object headers would not need to store the
class reference themselves. I think this would improve the cache hit rate
and reduce heap size. Is there any information about this?

Thanks,
MW

From bartosz.markocki at gmail.com  Fri Jan 18 05:10:30 2013
From: bartosz.markocki at gmail.com (Bartek Markocki)
Date: Fri, 18 Jan 2013 14:10:30 +0100
Subject: Spikes in the duration of the ParNew collections
Message-ID: 

Hello all,

During tests of a new version of our application we found that some of the
ParNew times spike to 170ms (avg 10ms) - Java6 update 38, 64bit, -server
with CMS.

Of course the first thing that came to our mind was a spike in the
allocation rate resulting in a spike in the amount of surviving objects
and/or a spike in the promotion rate. Unfortunately the collection(s) in
question did not show any abnormality in this respect. To make things even
more interesting, as shown in the attached extract from the GC log, some of
those long-lasting ParNew collections showed a smaller promotion volume
than the average (21k per collection).

Before re-running the test we enabled -XX:+PrintSafepointStatistics and
-XX:+TraceSafepointCleanupTime to get a better understanding of the STW
times. As a result we found that almost all of the time goes into the
collection itself:

28253.076: GenCollectForAllocation [ 382 0 0 ] [ 0 0 0 3 170 ] 0

Additionally we noticed that the user-to-real time ratio for a normal
ParNew collection is between 4 and 8. For the collection in question it
jumps to 12 (we have 16 cores) - so not only did the collection last
longer, but more CPU was used.

For your review, I attached an extract from stdout and the GC log for the
collection in question.

We also reran the test with the collector changed to ParallelOld and did
not notice comparable spikes in the young generation times.
After that we tried Java7 update 10 with CMS and found that the issue is
still there (spikes in ParNew times), although it is less noticeable, i.e.,
the max ParNew time was 113ms.

The question of the day is: why is this happening? What else can we
do/check/test to make our application run CMS on Java 6?

Thanks in advance,
Bartek


$ java -version
java version "1.6.0_38"
Java(TM) SE Runtime Environment (build 1.6.0_38-b05)
Java HotSpot(TM) 64-Bit Server VM (build 20.13-b02, mixed mode)

$ less /etc/redhat-release
Red Hat Enterprise Linux Server release 5.5 (Tikanga)

JVM options
-server -Xms2g -Xmx2g -XX:PermSize=64m -XX:+UseConcMarkSweepGC
-XX:+UseParNewGC -XX:CMSInitiatingOccupancyFraction=50
-XX:+UseCMSInitiatingOccupancyOnly -XX:NewSize=1700m
-XX:MaxNewSize=1700m -verbose:gc -XX:+PrintGCDetails
-XX:+PrintGCApplicationStoppedTime -XX:+PrintFlagsFinal
-Xloggc:/apps/gcLog.log -XX:+PrintGCDateStamps
-XX:+PrintGCApplicationConcurrentTime -XX:PrintCMSStatistics=3
-XX:+PrintCMSInitiationStatistics -XX:+PrintAdaptiveSizePolicy
-XX:+PrintGCTaskTimeStamps -XX:+PrintSharedSpaces
-XX:+PrintTenuringDistribution -XX:+PrintVMQWaitTime
-XX:+PrintHeapAtGC -XX:+PrintSafepointStatistics
-XX:PrintSafepointStatisticsCount=10 -XX:+TraceSafepointCleanupTime
-XX:PrintFLSStatistics=2 -XX:+PrintReferenceGC

-------------- next part --------------
A non-text attachment was scrubbed...
Name: gcLog.log.gz
Type: application/x-gzip
Size: 1402 bytes
Url: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20130118/4c612d80/gcLog.log.gz
-------------- next part --------------
A non-text attachment was scrubbed...
Name: stdout.log.gz
Type: application/x-gzip
Size: 829 bytes
Url: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20130118/4c612d80/stdout.log.gz

From aaisinzon at guidewire.com  Wed Jan 23 15:18:28 2013
From: aaisinzon at guidewire.com (Alexandre Aisinzon)
Date: Wed, 23 Jan 2013 23:18:28 +0000
Subject: JE caching
Message-ID: <43E49E6EC0E84F41B98C68AB6D7820C417A4C743@sm-ex-01-vm.guidewire.com>

Hi all,

https://forums.oracle.com/forums/thread.jspa?messageID=10017916 indicates
that one should explicitly add the compressed references parameter because:
"JE cache sizing does not take into account compressed oops unless it is
explicitly specified using XX:+UseCompressedOops".

I am not clear on what JE caching is. Can someone elaborate on what this
capability is?

Best

Alex A
From bernd.eckenfels at googlemail.com  Wed Jan 23 15:43:45 2013
From: bernd.eckenfels at googlemail.com (Bernd Eckenfels)
Date: Thu, 24 Jan 2013 00:43:45 +0100
Subject: JE caching
In-Reply-To: <43E49E6EC0E84F41B98C68AB6D7820C417A4C743@sm-ex-01-vm.guidewire.com>
References: <43E49E6EC0E84F41B98C68AB6D7820C417A4C743@sm-ex-01-vm.guidewire.com>
Message-ID: 

On 24.01.2013 00:18, Alexandre Aisinzon wrote:
> I am not clear on what JE caching is. Can someone elaborate on what this
> capability is?

It sounds like JE = Berkeley DB Java Edition (that is what the forum topic
is about), so this seems to be the BDB JE database cache. It is documented
in the API:

http://docs.oracle.com/cd/E17277_02/html/java/com/sleepycat/je/EnvironmentMutableConfig.html#setCachePercent

Regards,
Bernd

--
https://plus.google.com/u/1/108084227682171831683/about
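For context, the cache Bernd refers to is sized per BDB JE Environment via
EnvironmentMutableConfig. A minimal sketch of setting it explicitly might
look like this; the environment path is a placeholder and 60% is just an
example value, not a recommendation:

import java.io.File;

import com.sleepycat.je.Environment;
import com.sleepycat.je.EnvironmentConfig;

public class JeCacheExample {
    public static void main(String[] args) {
        EnvironmentConfig config = new EnvironmentConfig();
        config.setAllowCreate(true);
        // Size the JE cache as a percentage of the JVM max heap (-Xmx);
        // setCacheSize(bytes) can be used instead for an absolute limit.
        config.setCachePercent(60);

        Environment env = new Environment(new File("/path/to/je-env"), config);
        // ... use the environment ...
        env.close();
    }
}

As the forum note Alexandre quotes says, JE's internal accounting of how
much data fits into that budget only assumes compressed references when
-XX:+UseCompressedOops is passed explicitly on the command line.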
From ryebrye at gmail.com  Thu Jan 24 16:29:22 2013
From: ryebrye at gmail.com (Ryan Gardner)
Date: Thu, 24 Jan 2013 19:29:22 -0500
Subject: Any suggestions for number of cores / heap size for g1 or cms?
In-Reply-To: 
References: 
Message-ID: 

We have a non-CPU-intensive app that is licensed based on CPU cores.

We want to use as much heap as is reasonable without having large pauses.

I've only deployed G1 on machines with lots of cores - how well does it
work on fewer cores? If we had a live set of 32GB and a heap of 72GB with a
relatively small object allocation rate, would G1 have low pause times with
4 or 8 physical cores?

It doesn't matter too much how long the concurrent phase is - it's just the
pauses that matter.

Any tips/suggestions?

From jon.masamitsu at oracle.com  Thu Jan 24 21:13:03 2013
From: jon.masamitsu at oracle.com (Jon Masamitsu)
Date: Thu, 24 Jan 2013 21:13:03 -0800
Subject: Spikes in the duration of the ParNew collections
In-Reply-To: 
References: 
Message-ID: <5102145F.8020200@oracle.com>

If you have a test setup where you can run some experiments, try
-XX:-UseParNewGC. There have been instances in the past where flaws in the
partitioning for parallelism have caused some dramatic increases in the
ParNew times. This setting will use the serial young generation collector.
It will be slower but perhaps without the spiking.

If that removes the spiking, it gives us some information about the cause,
but probably not enough to pinpoint the problem. If I were attacking this
I'd try to profile the VM to see which methods are consuming all that time.

Jon

On 1/18/2013 5:10 AM, Bartek Markocki wrote:
> During tests of a new version of our application we found that some of
> the ParNew times spike to 170ms (avg 10ms) - Java6 update 38, 64bit,
> -server with CMS.
>
> Of course the first thing that came to our mind was a spike in the
> allocation rate resulting in a spike in the amount of surviving objects
> and/or a spike in the promotion rate. Unfortunately the collection(s) in
> question did not show any abnormality in this respect.
From dhd at exnet.com  Thu Jan 24 23:48:27 2013
From: dhd at exnet.com (Damon Hart-Davis)
Date: Fri, 25 Jan 2013 07:48:27 +0000
Subject: Any suggestions for number of cores / heap size for g1 or cms?
In-Reply-To: 
References: 
Message-ID: <3563F6F7-5268-4AAE-870F-B8391E7B867A@exnet.com>

Indeed, how well does it do on a single core?

Rgds

Damon

On 25 Jan 2013, at 00:29, Ryan Gardner wrote:
> I've only deployed G1 on machines with lots of cores - how well does it
> work on fewer cores? If we had a live set of 32GB and a heap of 72GB with
> a relatively small object allocation rate, would G1 have low pause times
> with 4 or 8 physical cores?

From bartosz.markocki at gmail.com  Fri Jan 25 08:51:31 2013
From: bartosz.markocki at gmail.com (Bartek Markocki)
Date: Fri, 25 Jan 2013 17:51:31 +0100
Subject: Spikes in the duration of the ParNew collections
In-Reply-To: <5102145F.8020200@oracle.com>
References: <5102145F.8020200@oracle.com>
Message-ID: 

Yes, we do have a test setup; however, it takes about 24h to observe this
behavior. For the moment the team decided to use the ParallelOld collector
and run a 10-day-long test to observe the application's behavior over the
long run. Right now we are 4 days into the test. After we finish we will
try to rerun the test with -XX:-UseParNewGC and let you know.

One way or another, I reviewed the current test GC logs (from 6 identical
instances) and in one of them found the following:

{Heap before GC invocations=42414 (full 12):
 PSYoungGen      total 1738432K, used 1738032K [0x0000000795c00000, 0x0000000800000000, 0x0000000800000000)
  eden space 1736064K, 100% used [0x0000000795c00000,0x00000007ffb60000,0x00000007ffb60000)
  from space 2368K, 83% used [0x00000007ffb60000,0x00000007ffd4c208,0x00000007ffdb0000)
  to   space 2304K, 0% used [0x00000007ffdc0000,0x00000007ffdc0000,0x0000000800000000)
 ParOldGen       total 356352K, used 282406K [0x0000000780000000, 0x0000000795c00000, 0x0000000795c00000)
  object space 356352K, 79% used [0x0000000780000000,0x00000007913c9b58,0x0000000795c00000)
 PSPermGen       total 65792K, used 34884K [0x000000077ae00000, 0x000000077ee40000, 0x0000000780000000)
  object space 65792K, 53% used [0x000000077ae00000,0x000000077d011390,0x000000077ee40000)
2013-01-24T16:58:33.374-0600: 289746.531: [GC
AdaptiveSizePolicy::compute_survivor_space_size_and_thresh:  survived: 1803504  promoted: 106496  overflow: false
AdaptiveSizeStart: 289746.646 collection: 42414
avg_survived_padded_avg: 2347456.750000  avg_promoted_padded_avg: 187448.734375  avg_pretenured_padded_avg: 3911.287598  tenuring_thresh: 1  target_size: 2359296
Desired survivor size 2359296 bytes, new threshold 1 (max 15)
PSAdaptiveSizePolicy::compute_generation_free_space: costs minor_time: 0.009063 major_cost: 0.000009 mutator_cost: 0.990928 throughput_goal: 0.990000 live_space: 319016672 free_space: 2039283712 old_promo_size: 336265216 old_eden_size: 1703018496 desired_promo_size: 336265216 desired_eden_size: 1703018496
AdaptiveSizePolicy::survivor space sizes: collection: 42414 (2359296, 2424832) -> (2359296, 2359296)
AdaptiveSizeStop: collection: 42414
[PSYoungGen: 1738032K->1761K(1738496K)] 2020439K->284272K(2094848K), 0.1150200 secs] [Times: user=1.25 sys=0.00, real=0.11 secs]
Heap after GC invocations=42414 (full 12):
 PSYoungGen      total 1738496K, used 1761K [0x0000000795c00000, 0x0000000800000000, 0x0000000800000000)
  eden space 1736192K, 0% used [0x0000000795c00000,0x0000000795c00000,0x00000007ffb80000)
  from space 2304K, 76% used [0x00000007ffdc0000,0x00000007fff784f0,0x0000000800000000)
  to   space 2304K, 0% used [0x00000007ffb80000,0x00000007ffb80000,0x00000007ffdc0000)
 ParOldGen       total 356352K, used 282510K [0x0000000780000000, 0x0000000795c00000, 0x0000000795c00000)
  object space 356352K, 79% used [0x0000000780000000,0x00000007913e3b58,0x0000000795c00000)
 PSPermGen       total 65792K, used 34884K [0x000000077ae00000, 0x000000077ee40000, 0x0000000780000000)
  object space 65792K, 53% used [0x000000077ae00000,0x000000077d011390,0x000000077ee40000)
}

Again, the allocation rate and the amounts of survived and promoted objects
are comparable to other scavenges; however, this time the collection took
115ms, whereas the average for the others is 11ms.

Unfortunately we were not able to keep -XX:+PrintGCTaskTimeStamps enabled
(for better visibility into what took so long), as this caused the JVM to
crash consistently. I have already reported this as bug 2426776, but it is
still under internal review by Oracle.

There are two additional things:

1. While preparing to send the first email I found this post
http://mail.openjdk.java.net/pipermail/hotspot-gc-use/2012-January/001006.html
where Ramki said 'There was an old issue wrt monitor deflation that was
fixed a few releases ago'. As we are on the latest update (38) I don't
expect this to apply here, but do you know the bug id for it, so I can
eliminate it with full confidence?

2. While monitoring the running application I noticed that we constantly
have about 2.5k objects waiting to be finalized. The objects become
eligible for finalization in batches (min: 600, max: 2000 objects). They
are instances of the java.util.zip.Deflater class and are finalized quite
quickly (below 2 sec - the refresh interval of my monitoring tool). Do you
think this might be related? I made this observation only recently (with
the ParallelOld collector), so for the moment I am not able to correlate it
with the high ParNew times.

Thank you for looking at our problem.
Bartek

On Fri, Jan 25, 2013 at 6:13 AM, Jon Masamitsu wrote:
> If you have a test setup where you can run some experiments, try
> -XX:-UseParNewGC. There have been instances in the past where flaws in
> the partitioning for parallelism have caused some dramatic increases in
> the ParNew times. This setting will use the serial young generation
> collector. It will be slower but perhaps without the spiking.
>
> If that removes the spiking, it gives us some information about the
> cause, but probably not enough to pinpoint the problem. If I were
> attacking this I'd try to profile the VM to see which methods are
> consuming all that time.
>
> Jon
From ysr1729 at gmail.com  Fri Jan 25 12:22:06 2013
From: ysr1729 at gmail.com (Srinivas Ramakrishna)
Date: Fri, 25 Jan 2013 12:22:06 -0800
Subject: Spikes in the duration of the ParNew collections
In-Reply-To: 
References: <5102145F.8020200@oracle.com>
Message-ID: 

Hi Bartek --

On Fri, Jan 25, 2013 at 8:51 AM, Bartek Markocki wrote:
> 1. While preparing to send the first email I found this post
> http://mail.openjdk.java.net/pipermail/hotspot-gc-use/2012-January/001006.html
> where Ramki said 'There was an old issue wrt monitor deflation that was
> fixed a few releases ago'. As we are on the latest update (38) I don't
> expect this to apply here, but do you know the bug id for it, so I can
> eliminate it with full confidence?

That bug involved time outside of the actual collection time, during which
the threads were paused. If I understand your problem correctly, it's that
the collection times themselves are spiky. If that understanding is
correct, then your problem would not be related to the above email.

> 2. While monitoring the running application I noticed that we constantly
> have about 2.5k objects waiting to be finalized. The objects become
> eligible for finalization in batches (min: 600, max: 2000 objects). They
> are instances of the java.util.zip.Deflater class and are finalized quite
> quickly (below 2 sec - the refresh interval of my monitoring tool). Do
> you think this might be related?

Yes, I'd look to see if "PrintReferenceGC" times indicate any diffs, if you
haven't already done so. (I haven't followed the preceding part of the
thread very closely though.) If/when you find that the spikiness is
specific to CMS+ParNew, one can do a few more experiments. Bear in mind
that promotion policies are slightly different between the two, and ParNew
isn't as adaptive as ParallelOld is (in terms of resizing/reshaping the
heap). If the experiments are under controlled conditions and you see
spikes with a steady workload, one might be able to more quickly pinpoint
the culprit(s). (For the case of CMS+ParNew, for example, the dynamic
sizing of the local promotion buffer lists would be one place to shine a
light on.)

-- ramki
From taras.tielkes at gmail.com  Sat Jan 26 06:51:39 2013
From: taras.tielkes at gmail.com (Taras Tielkes)
Date: Sat, 26 Jan 2013 15:51:39 +0100
Subject: Monitoring finalization activity
In-Reply-To: <50F33294.1050308@oracle.com>
References: <50F33294.1050308@oracle.com>
Message-ID: 

Hi,

We've enabled -XX:+PrintReferenceGC, which at least gives totals per
reference type cleared per GC. It seems that the tracing JVM options for
seeing which classes are actually put on the finalizer queue are not
available in product JVM builds.

Is it possible to get the same data somehow through the JMX GC APIs
(com.sun.management.GcInfo etc.)?

Thanks,
-tt

On Sun, Jan 13, 2013 at 11:17 PM, Andreas Loew wrote:
> you should be able to use BTrace (i.e. dynamic bytecode instrumentation)
> and register a probe that fires on entry to the finalize() methods of the
> classes you want to monitor (the probe can then do the counting)
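As far as I know, the standard GC MXBeans do not expose per-class
finalization counts; the closest built-in counter is
MemoryMXBean.getObjectPendingFinalizationCount(), which only reports the
current queue length. The per-collection data behind
com.sun.management.GcInfo can be polled roughly like this (HotSpot-specific
API; a sketch, not a complete tool):

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

import com.sun.management.GcInfo;

public class GcInfoPoller {
    public static void main(String[] args) throws InterruptedException {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        while (true) {
            // approximate number of objects currently waiting for finalization
            System.out.println("pending finalization: "
                    + memory.getObjectPendingFinalizationCount());

            for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                // the HotSpot beans also implement the com.sun.management subinterface
                if (gc instanceof com.sun.management.GarbageCollectorMXBean) {
                    GcInfo info = ((com.sun.management.GarbageCollectorMXBean) gc).getLastGcInfo();
                    if (info != null) {
                        System.out.println(gc.getName() + ": last GC id=" + info.getId()
                                + " duration=" + info.getDuration());
                    }
                }
            }
            Thread.sleep(5000);
        }
    }
}

Per-class finalized-instance counts would still need instrumentation along
the lines of the BTrace suggestion earlier in this thread.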
From pasthelod at gmail.com  Sun Jan 27 00:34:48 2013
From: pasthelod at gmail.com (Pas)
Date: Sun, 27 Jan 2013 09:34:48 +0100
Subject: Sudden permanent increase in Minor (ParNew) GC time, that only a stop-the-world System.gc() alleviates
Message-ID: 

Hello,

Long story short, Minor GC times jump from ~30ms to more than a second (and
increase to about 5 seconds), and only an explicit parallel Full GC can
whack it out of this madness. Interestingly, it looks like this bug/feature
manifests when a big ~100+ MB byte[] object gets allocated, thus triggering
a CMS cycle. (The CMS cycle itself runs fine, but the young gen collections
take forever.)

http://pastebin.com/RcBkCEEE (of course, if someone's interested, here's
the full 50 MB of the otherwise rather predictable log, 2.9MB compressed:
http://zomg.hu/work/wtf-gc.log.xz )

We're running the stock Oracle 1.6.0_37 64-bit JVM, on a new 8-core Xeon
E3-something with plenty of RAM for the heap, with the following options:

-Xmx5128M
(-Xms5128M, though the linked gclog is without this)
-XX:NewSize=300m
-XX:MaxNewSize=300m
-XX:PermSize=64m
-XX:MaxPermSize=192m

-XX:+UseParNewGC
-XX:ParallelGCThreads=2
-XX:MaxTenuringThreshold=4
-XX:SurvivorRatio=3

-XX:+UseConcMarkSweepGC
-XX:+UnlockDiagnosticVMOptions
-XX:+CMSScavengeBeforeRemark
-XX:CMSInitiatingOccupancyFraction=65

-XX:+PrintGC
-XX:+PrintGCDetails
-XX:+PrintTenuringDistribution
-XX:+PrintGCDateStamps
-XX:PrintFLSStatistics=1
-Xloggc:/logs/gc.log
-verbose:gc

Has anyone experienced similar issues?

Thanks for your time,
Pas

From ysr1729 at gmail.com  Sun Jan 27 12:00:24 2013
From: ysr1729 at gmail.com (Srinivas Ramakrishna)
Date: Sun, 27 Jan 2013 12:00:24 -0800
Subject: Sudden permanent increase in Minor (ParNew) GC time, that only a stop-the-world System.gc() alleviates
In-Reply-To: 
References: 
Message-ID: 

Check to see if the pause time increase correlates to a jump in the
promotion volume per scavenge. Should be easy to get from your GC logs
(which I haven't looked at).

-- ramki

On Sun, Jan 27, 2013 at 12:34 AM, Pas wrote:
> Long story short, Minor GC times jump from ~30ms to more than a second
> (and increase to about 5 seconds), and only an explicit parallel Full GC
> can whack it out of this madness.
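For reference, the promotion volume per scavenge can be read from the
-XX:+PrintGCDetails output itself: for a line of the shape
"[ParNew: Yb->Ya(...), ...] Hb->Ha(...)", old-gen occupancy before the
scavenge is Hb - Yb, occupancy after is Ha - Ya, and the difference between
the two is approximately what that scavenge promoted. A rough parsing
sketch under that assumption (with -XX:+PrintTenuringDistribution the
ParNew entry can be split across lines and may need joining first; this is
an untested illustration, not a polished tool):

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PromotionPerScavenge {
    // Captures young-gen and whole-heap before->after occupancies of a ParNew entry,
    // i.e. the Yb, Ya, Hb, Ha values described above (all in K).
    private static final Pattern PARNEW = Pattern.compile(
            "\\[ParNew: (\\d+)K->(\\d+)K\\(\\d+K\\)[^\\]]*\\] (\\d+)K->(\\d+)K\\(\\d+K\\)");

    public static void main(String[] args) throws Exception {
        BufferedReader in = new BufferedReader(new FileReader(args[0]));
        for (String line; (line = in.readLine()) != null; ) {
            Matcher m = PARNEW.matcher(line);
            if (!m.find()) {
                continue;
            }
            long youngBefore = Long.parseLong(m.group(1));
            long youngAfter = Long.parseLong(m.group(2));
            long heapBefore = Long.parseLong(m.group(3));
            long heapAfter = Long.parseLong(m.group(4));
            // growth of the non-young part of the heap across the scavenge, in KB
            long promotedKb = (heapAfter - youngAfter) - (heapBefore - youngBefore);
            System.out.println(promotedKb + "K promoted");
        }
        in.close();
    }
}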
From bartosz.markocki at gmail.com  Mon Jan 28 08:27:03 2013
From: bartosz.markocki at gmail.com (Bartek Markocki)
Date: Mon, 28 Jan 2013 17:27:03 +0100
Subject: Spikes in the duration of the ParNew collections
In-Reply-To: 
References: <5102145F.8020200@oracle.com>
Message-ID: 

Hi Ramki,

See my comments inline:

On Fri, Jan 25, 2013 at 9:22 PM, Srinivas Ramakrishna wrote:
> That bug involved time outside of the actual collection time, during
> which the threads were paused. If I understand your problem correctly,
> it's that the collection times themselves are spiky. If that
> understanding is correct, then your problem would not be related to the
> above email.

You got it correctly; we are talking about time spent inside the collection
here.

> Yes, I'd look to see if "PrintReferenceGC" times indicate any diffs, if
> you haven't already done so.

Unfortunately there are no significant diffs - in terms of time as well as
the amount of refs :(

The only peculiar thing (at least to me) that I noticed around the
collection in question comes from FLSStatistics. The binary tree before and
after looks exactly the same:

Statistics for BinaryTreeDictionary:
------------------------------------
Total Free Space: 30637977
Max Chunk Size: 30637977
Number of Blocks: 1
Av. Block Size: 30637977
Tree Height: 1

However, the indexed free list before and after shows a change:

Before:
Statistics for IndexedFreeLists:
--------------------------------
Total Free Space: 60314
Max Chunk Size: 211
Number of Blocks: 1498
Av. Block Size: 40
free=30698291 frag=0.0039

After:
--------------------------------
Total Free Space: 60296
Max Chunk Size: 211
Number of Blocks: 1501
Av. Block Size: 40
free=30698273 frag=0.0039

So the free space decreased by 18 (heap words, if I am correct), while the
number of blocks increased by 3. So I assume one bigger block got split
into a couple of smaller ones - non-intuitive behavior, at least at first
glance.

The other odd thing about this collection is the size of the promoted
object(s): 144 bytes, where normally we promote around 21k.
> If/when you find that the spikiness is specific to CMS+ParNew, one can
> do a few more experiments. Bear in mind that promotion policies are
> slightly different between the two, and ParNew isn't as adaptive as
> ParallelOld is (in terms of resizing/reshaping the heap). If the
> experiments are under controlled conditions and you see spikes with a
> steady workload, one might be able to more quickly pinpoint the
> culprit(s). (For the case of CMS+ParNew, for example, the dynamic sizing
> of the local promotion buffer lists would be one place to shine a light
> on.)

Understood. As I wrote previously, we are in the middle of a 10-day-long
test. Once the test is done, I will rerun it with (at least two instances
that have):
-XX:-UseParNewGC
-XX:+PrintOldPLAB
-XX:+PrintPLAB
(and the old settings).

Thanks for your help!
Bartek

From taras.tielkes at gmail.com  Mon Jan 28 13:11:35 2013
From: taras.tielkes at gmail.com (Taras Tielkes)
Date: Mon, 28 Jan 2013 22:11:35 +0100
Subject: java 1.7.0u4 GarbageCollectionNotificationInfo API
Message-ID: 

Hi,

I'm playing around with the new(ish) GarbageCollectionNotificationInfo API.
We're using ParNew+CMS in all our systems, and my first goal is a
comparison between -XX:+PrintGCDetails -verbose:gc output and the actual
data coming through the notification API. I'm using Java 1.7.0u6 for the
experiments.

So far, I have a number of questions:

1) Duration times
The javadoc for gcInfo.getDuration() describes the returned value as
expressed in milliseconds. However, the values differ from the GC logs by
several orders of magnitude. How are they calculated? On a 1-core Linux x64
VM, the values actually look like microseconds, but on Win32 machines I
still can't find any resemblance to the GC log timings. Apart from the
unit, what should the value represent? Real time or user time?

2) CMS events with cause "No GC"
How exactly do the phases of CMS map to the notifications emitted for the
CMS collector? I sometimes get events with cause "No GC". Does this
indicate a background CMS cycle being initiated by hitting the occupancy
fraction threshold?

3) Eden/Survivor
It seems that the MemoryUsage API treats Eden and Survivor separately, i.e.
survivor is not a subset of eden. This is different from the GC log
presentation. Is my understanding correct?

In general, I think it would be useful to have a code sample for the GC
notification API that generates output as close as possible to
-XX:+PrintGCDetails -verbose:gc, as far as the data required to do so is
available. The API looks quite promising; it seems it could really benefit
from a bit of documentation love :)

Thanks,
-tt
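In the absence of an official sample, a minimal listener for this API might
look like the following (it relies on the Oracle/HotSpot-specific
com.sun.management classes; a sketch that just prints the raw fields rather
than reproducing the -XX:+PrintGCDetails format):

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

import javax.management.Notification;
import javax.management.NotificationEmitter;
import javax.management.NotificationListener;
import javax.management.openmbean.CompositeData;

import com.sun.management.GarbageCollectionNotificationInfo;
import com.sun.management.GcInfo;

public class GcNotificationLogger {
    public static void install() {
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            // the platform GC MXBeans are NotificationEmitters in the Oracle/HotSpot JDK
            NotificationEmitter emitter = (NotificationEmitter) bean;
            emitter.addNotificationListener(new NotificationListener() {
                public void handleNotification(Notification n, Object handback) {
                    if (!GarbageCollectionNotificationInfo.GARBAGE_COLLECTION_NOTIFICATION
                            .equals(n.getType())) {
                        return;
                    }
                    GarbageCollectionNotificationInfo info =
                            GarbageCollectionNotificationInfo.from((CompositeData) n.getUserData());
                    GcInfo gc = info.getGcInfo();
                    System.out.println(info.getGcName() + " (" + info.getGcAction()
                            + ", cause=" + info.getGcCause() + "): duration=" + gc.getDuration()
                            + " before=" + gc.getMemoryUsageBeforeGc()
                            + " after=" + gc.getMemoryUsageAfterGc());
                }
            }, null, null);
        }
    }
}

Comparing getDuration() and the per-pool MemoryUsage maps printed here
against the corresponding -XX:+PrintGCDetails lines should at least make
the unit question easier to pin down.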
Should be easy to get from yr gc logs (which > i haven't looked at). > > -- ramki > > > > On Sun, Jan 27, 2013 at 12:34 AM, Pas wrote: > >> Hello, >> >> Long story short, Minor GC times jump from ~30ms to more than a second >> (and increase to about 5 seconds), and only an explicit paralell Full GC >> can whack it out of this madness. Interestingly it looks like this >> bug/feature manifests when a big ~100+ MB byte[] object gets allocated, >> thus triggering a CMS initial-sweep. (The CMS runs fine though, but the >> young gen collections take forever.) >> >> http://pastebin.com/RcBkCEEE (of course, if someone's interested I >> here's the full 50 MBs of the otherwise rather predictable log, 2.9MB >> compressed http://zomg.hu/work/wtf-gc.log.xz ) >> >> We're running the stock Oracle 1.6.0_37 64bit JVM, on a 8 core new Xeon >> E3-something with plenty of RAM for the heap, with the following options: >> >> -Xmx5128M >> (-Xms5128M, though the linked gclog is without this) >> -XX:NewSize=300m >> -XX:MaxNewSize=300m >> -XX:PermSize=64m >> -XX:MaxPermSize=192m >> >> -XX:+UseParNewGC >> -XX:ParallelGCThreads=2 >> -XX:MaxTenuringThreshold=4 >> -XX:SurvivorRatio=3 >> >> -XX:+UseConcMarkSweepGC >> -XX:+UnlockDiagnosticVMOptions >> -XX:+CMSScavengeBeforeRemark >> -XX:CMSInitiatingOccupancyFraction=65 >> >> -XX:+PrintGC >> -XX:+PrintGCDetails >> -XX:+PrintTenuringDistribution >> -XX:+PrintGCDateStamps >> -XX:PrintFLSStatistics=1 >> -Xloggc:/logs/gc.log >> -verbose:gc >> >> Has anyone experienced similar issues? >> >> Thanks for your time, >> Pas >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20130129/20e639c4/attachment.html