From fweimer at bfk.de  Mon Mar  1 00:31:25 2010
From: fweimer at bfk.de (Florian Weimer)
Date: Mon, 01 Mar 2010 08:31:25 +0000
Subject: g1: remembered set scan costs very high w/ LRU cache type of behavior
In-Reply-To: <5a1151761002271518q40d95865o311699ef66764d32@mail.gmail.com> (Peter Schuller's message of "Sun, 28 Feb 2010 00:18:31 +0100")
References: <5a1151761002271048n798d8e54of9981e88e2b447a8@mail.gmail.com> <4B896ED3.3040107@sun.com> <5a1151761002271219j3f241f9aj6bdd5b2a724ceebe@mail.gmail.com> <4B898ABB.4000409@sun.com> <5a1151761002271518q40d95865o311699ef66764d32@mail.gmail.com>
Message-ID: <82635gzbmq.fsf@mid.bfk.de>

* Peter Schuller:

>> The problem you're seeing may be addressed by some
>> work in progress that is being done under CR
>>
>> 6923991: G1: improve scalability of RSet scanning
>>
>> I don't think that work has been released yet.
>
> Well, if this commit constitutes the work:
>
> http://hg.openjdk.java.net/jdk7/hotspot/hotspot/rev/0414c1049f15
>
> Then I suspect I have it, because the JDK comes from:
>
> jdk-7-ea-bin-b84-linux-x64-18_feb_2010.bin
>
> Which is 7 days after the commit happened.

I don't think so; 0414c1049f15 does not seem to have been promoted to
the master repository yet.

--
Florian Weimer
BFK edv-consulting GmbH       http://www.bfk.de/
Kriegsstraße 100              tel: +49-721-96201-1
D-76133 Karlsruhe             fax: +49-721-96201-99

From libic at dsrg.mff.cuni.cz  Mon Mar  8 07:38:53 2010
From: libic at dsrg.mff.cuni.cz (Peter Libic)
Date: Mon, 08 Mar 2010 16:38:53 +0100
Subject: Counting allocations and object sizes
Message-ID: <4B951A0D.4060101@dsrg.mff.cuni.cz>

Hi,

I'd like to count allocations and object sizes in the VM, with the
lowest possible overhead and perturbation, which leads me to
instrumenting the VM itself. After studying the sources, I thought all
allocations should be intercepted by at least some of the functions
listed at the bottom of this mail. Unfortunately, these functions are
not sufficient, as the following example shows: I ran the code below
with the lines tagged V1 and V2 alternately commented out, and in both
cases I got exactly the same counts. That means the allocations of
TstCls objects happen somewhere else, and I don't know where :).

I'm running the class like this:

    java -XX:+UseParallelGC -verbose:gc -Xint DifferentAllocs

Could someone please tell me where I should look for the code that
allocates these objects?

Thanks a lot!
Peter Libic

PS: I'm not quite sure whether this is the correct list to ask on; I'm
sorry if it is not.
============================================
import java.util.Random; // added: Random (used in main) lives in java.util

public class DifferentAllocs {
    public static void main(String[] args) {
        Object[] arr = new Object[1024];
        System.out.println("Starting test - ALLOCATE");
        test(arr);
        System.out.println("Finished test - ALLOCATE");
        // read back a random element so the allocations cannot be optimized away
        System.out.println(arr[(new Random()).nextInt(1004)]);
    }

    public static boolean test(Object[] arr) {
        TstCls s, t, u;
        int i = 0;
        s = new TstCls();
        t = new TstCls();
        u = t; /*u=new TstCls();*/
        arr[0] = s; arr[1] = t; arr[2] = u;
        for (i = 3; i < 1003; i++) {
            /*V1*/ arr[i] = new TstCls();
            /*V2*/ //arr[i] = u;
        }
        return s.equals(t) || t.equals(u);
    }
}

class TstCls {
    public int val1;
    public long val2;
    public long getV() { return val1 + val2; }
    @Override
    public String toString() { return "123425345"; }
}
============================================

Intercepted functions:

hotspot/src/share/vm/gc_interface/collectedHeap.inline.hpp:
  CollectedHeap::post_allocation_setup_common
  CollectedHeap::post_allocation_setup_no_klass_install
  CollectedHeap::post_allocation_install_obj_klass
  CollectedHeap::post_allocation_setup_obj
  CollectedHeap::post_allocation_setup_array
  CollectedHeap::common_mem_allocate_noinit
  CollectedHeap::common_mem_allocate_init
  CollectedHeap::common_permanent_mem_allocate_noinit
  CollectedHeap::common_permanent_mem_allocate_init
  CollectedHeap::allocate_from_tlab
  CollectedHeap::init_obj
  CollectedHeap::obj_allocate
  CollectedHeap::array_allocate
  CollectedHeap::large_typearray_allocate
  CollectedHeap::permanent_obj_allocate
  CollectedHeap::permanent_obj_allocate_no_klass_install
  CollectedHeap::permanent_array_allocate

hotspot/src/share/vm/gc_interface/collectedHeap.cpp:
  CollectedHeap::allocate_from_tlab_slow

hotspot/src/share/vm/gc_implementation/parallelScavenge/parallelScavengeHeap.cpp:
  ParallelScavengeHeap::mem_allocate
  ParallelScavengeHeap::failed_mem_allocate
  ParallelScavengeHeap::permanent_mem_allocate
  ParallelScavengeHeap::failed_permanent_mem_allocate
  ParallelScavengeHeap::allocate_new_tlab

From mbien at fh-landshut.de  Sat Mar  6 16:57:55 2010
From: mbien at fh-landshut.de (Michael Bien)
Date: Sun, 07 Mar 2010 01:57:55 +0100
Subject: performance impact of JNI GetCritical*
Message-ID: <4B92FA13.3070903@fh-landshut.de>

Hello,

I have a few performance/best-practice related questions regarding
heap<->C data transfers and GC implications.

How optimized are the Get/ReleasePrimitiveArrayCritical JNI functions?
Do they disable GC, or do they only pin the array at its specific
address? If they disabled GC until release, I suppose this would have a
pretty severe impact, especially for concurrent GCs.

Could you recommend best practices? Let's say I have a short Java array
of length 12. Should I copy it to a direct NIO buffer and use the
buffer as the vehicle, use Get/ReleaseCritical, or something else?

best regards,

--
Michael Bien
http://michael-bien.com/
From ercan.canlier at gmail.com  Thu Mar 11 11:05:05 2010
From: ercan.canlier at gmail.com (ercan canlier)
Date: Thu, 11 Mar 2010 21:05:05 +0200
Subject: About Heap Profiling
Message-ID:

Hi,

I am having a problem when I try to run the jmap heap profile command on
OpenJDK. The output of java -version on my system is:

[ercanlier at ercanlier ~]$ java -version
java version "1.6.0_17"
OpenJDK Runtime Environment (IcedTea6 1.7) (fedora-34.b17.fc12-i386)
OpenJDK Server VM (build 14.0-b16, mixed mode)

After running jps on the console, I try to profile the heap usage of my
application, but I am stuck on the error: Could not find symbol
"gHotSpotVMTypeEntryTypeNameOffset"

The full output of jmap is:

[ercanlier at ercanlier ~]$ jmap -heap 3774
Attaching to process ID 3774, please wait...
sun.jvm.hotspot.debugger.NoSuchSymbolException: Could not find symbol "gHotSpotVMTypeEntryTypeNameOffset" in any of the known library names (libjvm.so, libjvm_g.so, gamma_g)
        at sun.jvm.hotspot.HotSpotTypeDataBase.lookupInProcess(HotSpotTypeDataBase.java:390)
        at sun.jvm.hotspot.HotSpotTypeDataBase.getLongValueFromProcess(HotSpotTypeDataBase.java:371)
        at sun.jvm.hotspot.HotSpotTypeDataBase.readVMTypes(HotSpotTypeDataBase.java:102)
        at sun.jvm.hotspot.HotSpotTypeDataBase.<init>(HotSpotTypeDataBase.java:85)
        at sun.jvm.hotspot.bugspot.BugSpotAgent.setupVM(BugSpotAgent.java:568)
        at sun.jvm.hotspot.bugspot.BugSpotAgent.go(BugSpotAgent.java:494)
        at sun.jvm.hotspot.bugspot.BugSpotAgent.attach(BugSpotAgent.java:332)
        at sun.jvm.hotspot.tools.Tool.start(Tool.java:163)
        at sun.jvm.hotspot.tools.HeapSummary.main(HeapSummary.java:39)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:616)
        at sun.tools.jmap.JMap.runTool(JMap.java:196)
        at sun.tools.jmap.JMap.main(JMap.java:128)
Debugger attached successfully.
sun.jvm.hotspot.tools.HeapSummary requires a java VM process/core!

My OS is Fedora 12; the server which hosts the application runs Red Hat,
and the Java versions are the same on both. Because of this I am unable
to profile the application in any detail.

Thanks in advance.

Regards.

--
ERCAN CANLIER

From John.Coomes at sun.com  Thu Mar 11 13:12:44 2010
From: John.Coomes at sun.com (John Coomes)
Date: Thu, 11 Mar 2010 13:12:44 -0800
Subject: About Heap Profiling
In-Reply-To:
References:
Message-ID: <19353.23756.364376.795664@sun.com>

ercan canlier (ercan.canlier at gmail.com) wrote:
> After running jps on the console, I try to profile the heap usage of my
> application, but I am stuck on the error: Could not find symbol
> "gHotSpotVMTypeEntryTypeNameOffset"
> [...]

This is sun/oracle bug 6932270, which was recently fixed. Not sure when
it'll make it into an update. See the following for more info:

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6932270
https://bugzilla.redhat.com/show_bug.cgi?id=541548

You can download and install the Sun JDK to get a working version right
away; go to http://java.sun.com/javase/ and click on the downloads tab.
-John

From Ryan.Highley at sabre-holdings.com  Fri Mar 12 08:52:43 2010
From: Ryan.Highley at sabre-holdings.com (Highley, Ryan)
Date: Fri, 12 Mar 2010 10:52:43 -0600
Subject: Discouraging CMS Due To Fragmentation
Message-ID: <32DB14FBADE66B439D6A98D2B01D27EC1297CAD9@sgtulmsp02.Global.ad.sabre.com>

Hello all,

My company has a cache-based application using a well-known Java memory
clustering framework for handling the usual failover and load balancing
concerns, ensuring cache state is maintained when losing a node or two.

Naturally, the application's garbage collector choice has been explored
during conversations with the support staff for this framework. The
framework's support staff has insisted any application using their
framework should use the ParallelOld collector, due to issues regarding
using CMS and its inherent memory fragmentation. The latest exchange is
below, with my questions first and the response following. The
framework's name has been removed.

Questions: Are there specific outstanding reported Sun JVM CMS bugs
that are the basis for the requirement to use ParallelOld? If so, what
are they, as other applications may also be subject to the same issue?
If not, what is so fundamentally different about [the framework]'s
memory usage and garbage collector interactions that makes CMS a bad
choice?

From [the framework's support staff]: This is not specific but in Java
in general. Sun itself recommends staying away from CMS if possible due
to possible fragmentation issues. In other words, with CMS you are just
delaying the problem and will possibly hit a very long pauseful
collection at some point.

In my now several years of reading Sun GC white papers, tuning guides,
articles, blog posts, message threads (here and elsewhere), and the
like, I have never heard any assertion from Sun discouraging CMS use
altogether simply due to the memory fragmentation inherent to CMS'
design. Everything I have seen, and everything we have implemented in
successfully tuning several other applications, has shown that with
CMS, memory fragmentation is a concern to be managed through proper
promotion tuning, heap sizing and shaping, and setting a reasonable CMS
initiating occupancy fraction, so as to be fairly certain a promotion
failure is at best extremely unlikely.

However, that nagging possibility that I've missed something along the
way still exists.

Can I please get a definitive response from one (or more) of the Sun
garbage collection software engineers on this list on this matter?

Thank you for your attention,

Ryan

From Y.S.Ramakrishna at Sun.COM  Fri Mar 12 09:08:47 2010
From: Y.S.Ramakrishna at Sun.COM (Y. Srinivas Ramakrishna)
Date: Fri, 12 Mar 2010 09:08:47 -0800
Subject: Discouraging CMS Due To Fragmentation
In-Reply-To: <32DB14FBADE66B439D6A98D2B01D27EC1297CAD9@sgtulmsp02.Global.ad.sabre.com>
References: <32DB14FBADE66B439D6A98D2B01D27EC1297CAD9@sgtulmsp02.Global.ad.sabre.com>
Message-ID: <4B9A751F.6090501@sun.com>

I agree with your stance below: fragmentation is a concern that needs
to be managed by means of tuning promotion rates so as to promote only
medium- and long-lived objects (which usually, but not always, have a
stationary or quasi-stationary object size and lifetime distribution).
Jon Masamitsu has a blog that describes this in some detail, and Tony
Printezis and Charlie Hunt have a JavaOne talk in which they provide
tips for such tuning, and for dealing with possible long-term
fragmentation. See also CR 6631166, which fixes some fragmentation
issues and which certain customers have found quite effective in
reducing fragmentation. There are indeed many, many customers who use
CMS successfully despite theoretical concerns regarding fragmentation.

All that having been said, there is probably a class of applications,
especially those that store medium- and long-lived string objects, with
which CMS has historically performed poorly. (I do not know whether
such customers have tried CMS post-6631166, however.) The reason is
that if the size and lifetime distributions of medium-lived objects are
long/fat-tailed or flat or, as usually happens in some cases of
programs involving long-lived strings, very non-stationary, then the
coalescing and splitting heuristics used by CMS turn out to be much
less effective. You might want to try the G1 garbage collector for such
applications, because G1 will not suffer from such concerns related to
fragmentation.

-- ramki

Highley, Ryan wrote:
> Hello all,
>
> My company has a cache-based application using a well-known Java memory
> clustering framework for handling the usual failover and load balancing
> concerns, ensuring cache state is maintained when losing a node or two.
> [...]
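For readers following the tuning approach described above, a typical
starting point combines an explicit initiating occupancy with survivor
space tuning, so that mostly medium- and long-lived objects get
promoted. The sizes and thresholds below are illustrative placeholders
rather than recommendations -- they have to be derived from
-XX:+PrintTenuringDistribution output for the actual application, and
MyCacheApp is a stand-in for the real main class:

    java -XX:+UseConcMarkSweepGC \
         -XX:CMSInitiatingOccupancyFraction=70 \
         -XX:+UseCMSInitiatingOccupancyOnly \
         -Xms2g -Xmx2g -Xmn512m \
         -XX:SurvivorRatio=6 -XX:MaxTenuringThreshold=8 \
         -XX:+PrintGCDetails -XX:+PrintTenuringDistribution \
         MyCacheApp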
From Ryan.Highley at sabre-holdings.com  Tue Mar 16 09:58:20 2010
From: Ryan.Highley at sabre-holdings.com (Highley, Ryan)
Date: Tue, 16 Mar 2010 11:58:20 -0500
Subject: Discouraging CMS Due To Fragmentation
In-Reply-To: <4B9A751F.6090501@sun.com>
References: <32DB14FBADE66B439D6A98D2B01D27EC1297CAD9@sgtulmsp02.Global.ad.sabre.com> <4B9A751F.6090501@sun.com>
Message-ID: <32DB14FBADE66B439D6A98D2B01D27EC12A90E37@sgtulmsp02.Global.ad.sabre.com>

Ramki,

Thank you for your prompt response and suggestions.

As was mentioned in the Dev mailing list thread, the app instances we
tend to deal with typically remain up for weeks, if not months, at a
time. For most, the allocation patterns are fairly constant and do not
require "forcing" a compacting collector to run periodically. However,
there are a few requiring exactly that.

We will definitely be giving G1 a try, but we won't be able to deploy
it to production until it's in a GA release. (The PTBs get nervous when
the deployment documentation describes compiling a JVM. ;) )

CR 6631166 also looks extremely promising for many of our applications.

Ryan

-----Original Message-----
From: Y.S.Ramakrishna at Sun.COM [mailto:Y.S.Ramakrishna at Sun.COM]
Sent: Friday, March 12, 2010 11:09 AM
To: Highley, Ryan
Cc: hotspot-gc-use at openjdk.java.net
Subject: Re: Discouraging CMS Due To Fragmentation

[...]
From linuxhippy at gmail.com  Thu Mar 18 10:20:20 2010
From: linuxhippy at gmail.com (Clemens Eisserer)
Date: Thu, 18 Mar 2010 18:20:20 +0100
Subject: performance impact of JNI GetCritical*
In-Reply-To: <4B92FA13.3070903@fh-landshut.de>
References: <4B92FA13.3070903@fh-landshut.de>
Message-ID: <194f62551003181020n30b122ccjc092aeecd8e32a26@mail.gmail.com>

Hi Michael,

> How optimized are the Get/ReleasePrimitiveArrayCritical JNI functions?
> Do they disable GC or do they only pin the array at the specific address?

For moving GCs they actually stop the whole GC process until all arrays
have been released by ReleasePrimitiveArrayCritical. So basically the
cost is some thread synchronization (to let the GC know, across all
threads, not to run now), plus the impact on the GC. So for small
arrays there's quite a high overhead. CMS itself is non-moving (old
gen), so in principle there could be some optimizations there, but I
have no idea whether this is actually done.

> lets say I have a short java array of length 12. Should I copy it to a
> direct NIO buffer and use the buffer as vehicle, use Get/ReleaseCritical
> or something else?

For very short arrays this is quite likely beneficial (at least if
you're on JDK 7; older JDK 6 releases do get/releasecritical + memcpy
in JNI code).

Just to be curious, what's your actual use case?
- Clemens

From mbien at fh-landshut.de  Thu Mar 18 11:18:39 2010
From: mbien at fh-landshut.de (Michael Bien)
Date: Thu, 18 Mar 2010 19:18:39 +0100
Subject: performance impact of JNI GetCritical*
In-Reply-To: <194f62551003181020n30b122ccjc092aeecd8e32a26@mail.gmail.com>
References: <4B92FA13.3070903@fh-landshut.de> <194f62551003181020n30b122ccjc092aeecd8e32a26@mail.gmail.com>
Message-ID: <4BA26E7F.1030506@fh-landshut.de>

Thanks for the answer Clemens,

comments inline...

On 03/18/2010 06:20 PM, Clemens Eisserer wrote:
> [...]
> Just to be curious, what's your actual use case?

The use case was to optimize the generated code of JOGL 2 / JOAL / JOCL
etc. for some corner cases. I feared that putting load on GetCritical*
could cause GC-dependent problems. That's why I am already using
ThreadLocal direct NIO buffers in the high-level API of JOCL to prevent
this situation.

- michael

--
Michael Bien
http://michael-bien.com/
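As an illustration of the thread-local direct-buffer pattern Michael
describes, here is a minimal, self-contained sketch; the class name,
the 12-element capacity, and the surrounding API are invented for the
example (the real JOCL code is more general):

    import java.nio.ByteBuffer;
    import java.nio.ByteOrder;
    import java.nio.ShortBuffer;

    final class TransferBuffers {
        // One direct buffer per thread, reused for every call, so the
        // native side never needs a Get/ReleasePrimitiveArrayCritical
        // section and the GC is never held off by a critical region.
        private static final ThreadLocal<ByteBuffer> BUF =
            new ThreadLocal<ByteBuffer>() {
                @Override protected ByteBuffer initialValue() {
                    // 12 shorts = 24 bytes; native byte order for the C side
                    return ByteBuffer.allocateDirect(12 * 2)
                                     .order(ByteOrder.nativeOrder());
                }
            };

        /** Copies src (at most 12 elements here) into this thread's direct
         *  buffer and returns it, ready to be handed to a native function. */
        static ShortBuffer vehicleFor(short[] src) {
            ShortBuffer sb = BUF.get().asShortBuffer(); // position 0, limit 12
            sb.put(src, 0, src.length);
            sb.flip();                                  // ready for reading
            return sb;
        }
    }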
From Y.S.Ramakrishna at Sun.COM  Thu Mar 18 11:49:44 2010
From: Y.S.Ramakrishna at Sun.COM (Y. Srinivas Ramakrishna)
Date: Thu, 18 Mar 2010 11:49:44 -0700
Subject: performance impact of JNI GetCritical*
In-Reply-To: <4BA26E7F.1030506@fh-landshut.de>
References: <4B92FA13.3070903@fh-landshut.de> <194f62551003181020n30b122ccjc092aeecd8e32a26@mail.gmail.com> <4BA26E7F.1030506@fh-landshut.de>
Message-ID: <4BA275C8.6050600@Sun.COM>

Hi Clemens, Michael --

On 03/18/10 11:18, Michael Bien wrote:
>> For moving GCs they actually stop the whole GC process until all
>> arrays have been released by ReleasePrimitiveArrayCritical.
>> [...]
>> CMS itself is non-moving (old gen), so in principle there could be
>> some optimizations there, but I have no idea whether this is
>> actually done.

CMS GC is indeed permitted even when in JNI critical sections, but
since most objects are allocated in Eden, which uses copying
collection, the performance effect of long-lived JNI CS can indeed be
disastrous even when you are using CMS. In fact, it might be worse in a
configuration using CMS, because when Eden is full and, on account of a
long-lived JNI CS, we do not scavenge, allocations happen from the old
gen -- and as we know, allocation out of CMS' free lists can be much
slower (although, of course, we can theoretically withstand
much-longer-lived CS, because CMS GC can concurrently collect the
garbage produced by these allocations).

Bottom line: CMS does not give you any practical performance advantage
wrt the over(ab?)use of JNI CS.

>>> lets say I have a short java array of length 12. Should I copy it to a
>>> direct NIO buffer and use the buffer as vehicle, use Get/ReleaseCritical
>>> or something else?
>>
>> For very short arrays this is quite likely beneficial (at least if
>> you're on JDK 7; older JDK 6 releases do get/releasecritical +
>> memcpy in JNI code).

Right; this is a fact that I learnt much to my delight recently. I
requested that this improvement be backported to JDK 6, but I do not
know if/when that might happen.

-- ramki
From Y.S.Ramakrishna at Sun.COM  Thu Mar 18 12:18:58 2010
From: Y.S.Ramakrishna at Sun.COM (Y. Srinivas Ramakrishna)
Date: Thu, 18 Mar 2010 12:18:58 -0700
Subject: performance impact of JNI GetCritical*
In-Reply-To: <4BA275C8.6050600@Sun.COM>
References: <4B92FA13.3070903@fh-landshut.de> <194f62551003181020n30b122ccjc092aeecd8e32a26@mail.gmail.com> <4BA26E7F.1030506@fh-landshut.de> <4BA275C8.6050600@Sun.COM>
Message-ID: <4BA27CA2.60707@Sun.COM>

On 03/18/10 11:49, Y. Srinivas Ramakrishna wrote:
> Right; this is a fact that I learnt much to my delight recently. I
> requested that this improvement be backported to JDK 6, but I do not
> know if/when that might happen.

Let me clarify/correct my comment a bit.

What I meant to say was that NIO performance/behaviour improved a lot
in JDK 7 because it stopped using JNI CS, using unsafe instead. (This
also had the pleasant side-effect of working around a particularly
egregious CMS bug related to JNI CS recently discussed on this alias,
and still unfortunately not fixed, which is what initially brought this
change to my attention.)

As regards Michael's original question of whether NIO or JNI CS is
better suited for short arrays, at the risk of stating the obvious, I
think the answer is that how much better NIO is (in JDK 7) would likely
depend on the frequency of copying and the size of these arrays: if JNI
CS are very short-lived and infrequent and the arrays are as small as
12 bytes, the difference would likely be negligible. As the frequency
of use (and degree of concurrent use) and/or the array size increases,
NIO would likely become increasingly better than JNI CS (in JDK 7 at
least). At least that's my hunch.

-- ramki

From charlie.hunt at oracle.com  Thu Mar 18 12:53:14 2010
From: charlie.hunt at oracle.com (charlie hunt)
Date: Thu, 18 Mar 2010 14:53:14 -0500
Subject: performance impact of JNI GetCritical*
In-Reply-To: <4BA27CA2.60707@Sun.COM>
References: <4B92FA13.3070903@fh-landshut.de> <194f62551003181020n30b122ccjc092aeecd8e32a26@mail.gmail.com> <4BA26E7F.1030506@fh-landshut.de> <4BA275C8.6050600@Sun.COM> <4BA27CA2.60707@Sun.COM>
Message-ID: <4BA284AA.9080903@oracle.com>

Y. Srinivas Ramakrishna wrote:
> [...]

Fwiw, the enhancement Ramki mentioned of using unsafe in NIO is being
integrated into JDK 6u20 (a future release; JDK 6u18 is the most
current release).

charlie ...
From stack at duboce.net  Fri Mar 19 20:21:03 2010
From: stack at duboce.net (Stack)
Date: Fri, 19 Mar 2010 20:21:03 -0700
Subject: Using CMS, any chance of forewarning a serial full GC is imminent?
Message-ID: <7c962aed1003192021t2d53858fx796cba7414a90fe7@mail.gmail.com>

Our app, a distributed database, usually does fine running CMS, but
when we trip a full serial GC, it's disruptive (speaking
euphemistically).

Monitoring the running application, watching the logs, or patching into
the JVM-TI, is there any indicator that you know of that would give us
forewarning of an imminent full serial GC? If we had this, we could
take evasive action.

Thanks,
St.Ack

P.S. G1 is what we really need, but going by the responses up on this
list and by how easily our application crashes recent releases, it
looks like it's going to be a good while before it'll work for our case
(we're an open source database, so talking to our Sun/Oracle vendor is
not an option).

From stack at duboce.net  Fri Mar 19 20:31:26 2010
From: stack at duboce.net (Stack)
Date: Fri, 19 Mar 2010 20:31:26 -0700
Subject: Using CMS, any chance of forewarning a serial full GC is imminent?
In-Reply-To: <7c962aed1003192021t2d53858fx796cba7414a90fe7@mail.gmail.com>
References: <7c962aed1003192021t2d53858fx796cba7414a90fe7@mail.gmail.com>
Message-ID: <7c962aed1003192031j31dc4c6blb82f2dd42d954795@mail.gmail.com>

Or, running CMS, is there a way to trigger a full serial GC? If we
could make it run explicitly at preordained times, this would be an
improvement over it happening at maximum-embarrassment time.

Thanks,
St.Ack

On Fri, Mar 19, 2010 at 8:21 PM, Stack wrote:
> [...]

From Jon.Masamitsu at Sun.COM  Fri Mar 19 20:43:08 2010
From: Jon.Masamitsu at Sun.COM (Jon Masamitsu)
Date: Fri, 19 Mar 2010 20:43:08 -0700
Subject: Using CMS, any chance of forewarning a serial full GC is imminent?
In-Reply-To: <7c962aed1003192031j31dc4c6blb82f2dd42d954795@mail.gmail.com>
References: <7c962aed1003192021t2d53858fx796cba7414a90fe7@mail.gmail.com> <7c962aed1003192031j31dc4c6blb82f2dd42d954795@mail.gmail.com>
Message-ID: <22DC392D-5732-4D3B-8D0A-E7BFD7787995@sun.com>

System.gc() will cause CMS to do a full collection. Does that help? As
long as you have not turned off explicit GCs, nor set flags so that CMS
does System.gc() concurrently, you can compact the tenured (CMS) gen
that way.

On Mar 19, 2010, at 8:31 PM, Stack wrote:
> Or, running CMS, is there a way to trigger a full serial GC? If we
> could make it run explicitly at preordained times, this would be an
> improvement over it happening at maximum-embarrassment time.
> [...]
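A minimal sketch of the "preordained times" idea that Jon's answer
enables. It assumes the caveats above hold (-XX:+DisableExplicitGC is
not set, and -XX:+ExplicitGCInvokesConcurrent is not set, so
System.gc() really performs a stop-the-world compacting collection);
the daily schedule and the load-draining step are placeholders:

    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    final class ScheduledCompaction {
        static void install() {
            ScheduledExecutorService timer =
                Executors.newSingleThreadScheduledExecutor();
            timer.scheduleAtFixedRate(new Runnable() {
                public void run() {
                    // Application-specific: drain load off this node first,
                    // then compact the tenured generation at a quiet time.
                    System.gc();
                }
            }, 1, 24, TimeUnit.HOURS);
        }
    }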
From Y.S.Ramakrishna at Sun.COM  Sat Mar 20 19:18:31 2010
From: Y.S.Ramakrishna at Sun.COM (Y. Srinivas Ramakrishna)
Date: Sat, 20 Mar 2010 19:18:31 -0700
Subject: Using CMS, any chance of forewarning a serial full GC is imminent?
In-Reply-To: <7c962aed1003192021t2d53858fx796cba7414a90fe7@mail.gmail.com>
References: <7c962aed1003192021t2d53858fx796cba7414a90fe7@mail.gmail.com>
Message-ID: <4BA581F7.1080904@sun.com>

Hello St.Ack --

Stack wrote:
> Monitoring the running application, watching the logs, or patching into
> the JVM-TI, is there any indicator that you know of that would give us
> forewarning of an imminent full serial GC? If we had this, we could
> take evasive action.

Probably the most important input factors are the statistics internally
maintained by the CMS collector regarding the population spread of free
blocks, their expected historical demand, and their recent demand, some
combination of which should, at least theoretically, yield a suitable
statistical predictor of the imminence of promotion failure.
Unfortunately, such a predictor has not been synthesized by us yet (and
it would probably be a challenge if we are to avoid false positives; we
would almost certainly need help from an expert statistician to
synthesize such a predictor).

If properly designed (and that's a big if), such a predictor could be
exported via a suitable GC MBean and could be polled to automate the
triggering of evasive action (i.e., move load to another node and
trigger a GC on this node; it would be an interesting distributed
coordination problem to avoid situations where a large majority of
nodes decide at roughly the same time that a full GC is imminent and
try to push their load to a neighbour that is ill-prepared to handle
it).

Maybe you can tell us -- when promotion failure occurs in your current
distributed system -- whether it is an isolated incident on one node,
or whether it has a more catastrophic quality where multiple nodes
succumb to the problem at about the same time, causing a promotion
failure contagion, as it were, to spread rapidly through the entire
system.

> P.S. G1 is what we really need, but going by the responses up on this
> list and by how easily our application crashes recent releases, it
> looks like it's going to be a good while before it'll work for our
> case (we're an open source database, so talking to our Sun/Oracle
> vendor is not an option).

Do give G1 a try in 6u20 though. The reliability has been much improved
since 6u18. Relatedly, and out of curiosity, focusing on a single
node/JVM of your system, what is the rough frequency of promotion
failure that you see?

-- ramki
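No such exported predictor exists, as the reply above says, but a crude
occupancy-based early warning can be assembled today from the platform
MBeans. A sketch, with the 80% threshold, the pool-name match, and the
reaction all left as assumptions -- and note that occupancy alone says
nothing about the fragmentation that actually triggers promotion
failure:

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryPoolMXBean;
    import java.lang.management.MemoryType;

    final class OldGenWatcher {
        static MemoryPoolMXBean oldGen() {
            for (MemoryPoolMXBean p : ManagementFactory.getMemoryPoolMXBeans()) {
                // the pool is named "CMS Old Gen" under -XX:+UseConcMarkSweepGC
                if (p.getType() == MemoryType.HEAP
                        && p.getName().contains("Old Gen")) {
                    return p;
                }
            }
            throw new IllegalStateException("no old gen pool found");
        }

        public static void main(String[] args) throws InterruptedException {
            MemoryPoolMXBean old = oldGen();
            // assumes this pool supports usage thresholds and reports a max
            old.setUsageThreshold((long) (old.getUsage().getMax() * 0.8));
            while (true) {
                if (old.isUsageThresholdExceeded()) {
                    // evasive action: shed load, then request a compacting GC
                }
                Thread.sleep(1000);
            }
        }
    }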
From peter.schuller at infidyne.com  Sun Mar 21 06:12:59 2010
From: peter.schuller at infidyne.com (Peter Schuller)
Date: Sun, 21 Mar 2010 14:12:59 +0100
Subject: G1 heap growth very aggressive in spite of low heap usage
Message-ID: <5a1151761003210612j5cd8c3afh57d94b75b5c24c55@mail.gmail.com>

Hello,

what is the intended behavior of G1 with respect to heap size? I was
playing around with a fairly simple test[1] designed for something
else when I realized that G1 was expanding my heap in ways that I find
unexpected[2]. Reading g1CollectedHeap.cpp, there seem to be attempts
to adhere to MaxHeapFreeRatio and MinHeapFreeRatio. However, even when
I run with (on purpose) extreme options of 10/25 minimum/maximum, heap
expansion happens as in [2].

My expected behavior would be for the heap size to remain within some
reasonable percentage of actual memory usage (assuming I only specify
the max heap size and leave the minimum heap size untouched), and this
is desirable behavior in part because concurrent marking only seems to
trigger as a function of heap usage relative to the heap size. In many
situations it is desirable for the JVM heap size to be a reasonable
indicator of actual memory requirements/use. If I am only using a
fraction of the heap size (5% in this case), the delay in concurrent
marking will mean that my application (unless it *only* generates
ephemeral garbage) will grow unreasonably until concurrent marking is
triggered; after it completes, heap use would go down very, very
significantly. The end result is that (1) I actually need a lot more
memory than I otherwise would, and (2) it is difficult to monitor the
real memory requirements of the application.

I do understand that maximum throughput tends to be achieved by
postponing concurrent marking to maximize yield, but if that were my
aim, I would specify -Xms == -Xmx. With a small -Xms and a large -Xmx,
I would want the heap to expand only as "necessary" (though I
understand that is difficult to define), resulting in a much more
reasonable trigger threshold for concurrent marking.

The test is a fairly simple web server that I use 'ab' to throw
traffic at. It spawns a number of threads per request, each of which
tries to produce a non-trivial stack depth and then sleeps for a
random period. The intent was to use this to try to see what kind of
overhead was incurred for root set scanning with many threads (though
few active) during young generation pauses.

I then added some generation of permanent data that I could trigger by
submitting HTTP requests. The sudden growth seen in [2], when it
spikes up from 100-200 MB to 3-4 GB, is when I submit some requests to
trigger generation of permanent data. The actual amount is very
limited, however, and as you can see it very quickly spikes the memory
use far beyond what any reasonable 'min free' percentage might be
designed to allow.
The options I used for the run from which [2] comes are:

-XX:+PrintGC
-XX:+UnlockExperimentalVMOptions
-XX:+UnlockDiagnosticVMOptions
-XX:+UseG1GC
-XX:MaxGCPauseMillis=25
-XX:GCPauseIntervalMillis=50
-XX:+G1ParallelRSetUpdatingEnabled
-XX:+G1ParallelRSetScanningEnabled
-Xmx4G
-XX:G1ConfidencePercent=100
-XX:MinHeapFreeRatio=10
-XX:MaxHeapFreeRatio=25

This was run on a recent checkout of the bsdport of openjdk7:

changeset:   192:9f250d0d1b40
tag:         tip
parent:      187:417d1a0aa480
parent:      191:6b1069f53fbc
user:        Greg Lewis
date:        Sat Mar 20 11:11:42 2010 -0700
summary:     Merge from main OpenJDK repository

I can provide an executable jar file if someone wants to test it but
is not set up to build clojure/leiningen projects.

[1] http://github.com/scode/httpgctest

[2]:

Here is an excerpt. Pauses are 0.25-5 seconds apart, depending on
whether I am generating permanent data at the time.

[GC pause (young) 45M->35M(110M), 0.0482680 secs]
[GC pause (young) 49M->42M(110M), 0.0428210 secs]
[GC pause (young) 53M->46M(110M), 0.0238750 secs]
[GC pause (young) 59M->51M(110M), 0.0312480 secs]
[GC pause (young) 63M->58M(110M), 0.0404710 secs]
[GC pause (young) 74M->62M(110M), 0.0262080 secs]
[GC pause (young) 75M->69M(220M), 0.0509200 secs]
[GC pause (young) 85M->77M(440M), 0.0478070 secs]
[GC pause (young) 95M->86M(880M), 0.0560680 secs]
[GC pause (young) 102M->96M(1524M), 0.0618900 secs]
[GC pause (young) 112M->102M(2039M), 0.0404470 secs]
[GC pause (young) 116M->111M(2451M), 0.0546120 secs]
[GC pause (young) 126M->120M(2780M), 0.0441630 secs]
[GC pause (young) 137M->121M(3044M), 0.0201850 secs]
[GC pause (young) 136M->122M(3255M), 0.0113520 secs]
[GC pause (young) 134M->122M(3424M), 0.0111860 secs]
[GC pause (young) 133M->122M(3559M), 0.0089910 secs]
[GC pause (young) 133M->122M(3667M), 0.0079770 secs]
[GC pause (young) 140M->122M(3753M), 0.0074160 secs]
[GC pause (young) 144M->136M(3753M), 0.0738920 secs]
[GC pause (young) 146M->141M(3753M), 0.0363720 secs]
[GC pause (young) 154M->147M(3753M), 0.0256830 secs]
[GC pause (young) 158M->148M(3753M), 0.0116360 secs]
[GC pause (young) 157M->149M(3753M), 0.0172780 secs]
[GC pause (young) 157M->154M(3753M), 0.0273040 secs]
[GC pause (young) 168M->163M(3753M), 0.0526970 secs]
[GC pause (young) 179M->172M(3822M), 0.0547310 secs]
[GC pause (young) 190M->180M(3877M), 0.0337350 secs]
[GC pause (young) 195M->189M(3921M), 0.0530290 secs]
[GC pause (young) 206M->199M(3956M), 0.0598740 secs]
[GC pause (young) 218M->206M(3984M), 0.0320840 secs]
[GC pause (young) 222M->207M(4007M), 0.0140790 secs]
[GC pause (young) 221M->207M(4025M), 0.0084930 secs]
[GC pause (young) 218M->207M(4040M), 0.0066170 secs]
[GC pause (young) 219M->207M(4052M), 0.0069570 secs]
[GC pause (young) 223M->207M(4061M), 0.0067710 secs]
[GC pause (young) 227M->207M(4061M), 0.0071300 secs]
[GC pause (young) 231M->207M(4061M), 0.0070520 secs]

--
/ Peter Schuller

From tony.printezis at sun.com  Mon Mar 22 08:35:37 2010
From: tony.printezis at sun.com (Tony Printezis)
Date: Mon, 22 Mar 2010 11:35:37 -0400
Subject: G1 heap growth very aggressive in spite of low heap usage
In-Reply-To: <5a1151761003210612j5cd8c3afh57d94b75b5c24c55@mail.gmail.com>
References: <5a1151761003210612j5cd8c3afh57d94b75b5c24c55@mail.gmail.com>
Message-ID: <4BA78E49.30906@sun.com>

Peter,

Yes, right now G1 expands the heap a bit too aggressively.
It checks the overall GC overhead and, if it's higher than the goal,
it'd expand the heap, hoping that it will reach the required GC
overhead (typically, the larger the heap, the lower the GC overhead).
I'd guess that any micro-benchmark that tries to stress the GC (i.e.,
do a lot of allocations, not much else) would cause G1 to expand the
heap aggressively, given that such benchmarks typically do mostly GC
and not much else (I can't tell whether this is the case for your test,
as you don't have -XX:+PrintGCTimeStamps enabled to see how close
together the GCs are).

Setting the max heap size with -Xmx will control how much G1 will
expand the heap.

Tony

Peter Schuller wrote:
> Hello,
>
> what is the intended behavior of G1 with respect to heap size? I was
> playing around with a fairly simple test[1] designed for something
> else when I realized that G1 was expanding my heap in ways that I
> find unexpected[2].
> [...]
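(For reference, the timestamped GC logging Tony refers to only requires
adding flags along these lines to the command used for the test:

    java -XX:+UnlockExperimentalVMOptions -XX:+UseG1GC \
         -XX:+PrintGC -XX:+PrintGCTimeStamps -XX:+PrintGCDetails ...

-XX:+PrintGCTimeStamps prefixes each pause line with the seconds since
JVM start, which is what makes the pause spacing visible.)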
From peter.schuller at infidyne.com  Mon Mar 22 12:27:02 2010
From: peter.schuller at infidyne.com (Peter Schuller)
Date: Mon, 22 Mar 2010 20:27:02 +0100
Subject: G1 heap growth very aggressive in spite of low heap usage
In-Reply-To: <4BA78E49.30906@sun.com>
References: <5a1151761003210612j5cd8c3afh57d94b75b5c24c55@mail.gmail.com> <4BA78E49.30906@sun.com>
Message-ID: <5a1151761003221227h452c9dbaqc39fdf4c7e95ff8c@mail.gmail.com>

Hello,

> Yes, right now G1 expands the heap a bit too aggressively. It checks
> the overall GC overhead and, if it's higher than the goal, it'd expand
> the heap, hoping that it will reach the required GC overhead
> (typically, the larger the heap, the lower the GC overhead).

Interesting. Looking at G1CollectorPolicy::expansion_amount(), this
seems to be determined by the average pause time ratio.

Based on my understanding of G1, it seems to me that this would be
fundamentally prone to "false positives". What can happen, and seems to
happen in this case, is that purely young GCs end up triggering heap
growth. This is in spite of the young generation / region count being
determined by the desired pause time (normally) rather than by any lack
of heap space.

I don't know what G1 does if there is too little free heap for the
preferred number of young generation regions, but I would expect the
primary effects of increasing the heap size, in terms of GC overhead,
to be:

(1) Decreasing the frequency of concurrent marks and the overhead
associated with them.
For non-young evacuation and concurrent marking cost that would likely not matter since a larger heap would always lead to better throughput (and greater GC efficiency), but because the logic is applied also based on the cost of young generation collections the effects on heap size are probably likely to be very non-expected for any application that has temporary bursts of allocation (which I think is very much realistic even in production code). On time stamps: I can provide a sample runt with PrintGCTimeStamps, but the short answer is that they happened relatively frequently (several times per second) and the growth exhibited was not preceeded by a concurrent mark or non-young evacuations. > Setting the max heap size with -Xmx will control how much G1 will expand the > heap. Understood. However for general-purpose use I am very interested in seeing the JVM self-regulate it's memory use in such a way that a non-developer can look at the memory use of a JVM (or for that matter the heap free/total:s) and draw some kind of reasonable ballpark conclusion on memory demands. Currently one can fairly easily trigger extreme heap growth to the point of multiple orders of magnitude. And since concurrent marking and non-young evacuations won't happen until the heap size is significantly exhausted, that effectively means that your program may end up seemingly "needing" orders of magnitude more memory than what it actually does. -- / Peter Schuller From peter.schuller at infidyne.com Mon Mar 22 13:53:52 2010 From: peter.schuller at infidyne.com (Peter Schuller) Date: Mon, 22 Mar 2010 21:53:52 +0100 Subject: G1 heap growth very aggressive in spite of low heap usage In-Reply-To: <5a1151761003221227h452c9dbaqc39fdf4c7e95ff8c@mail.gmail.com> References: <5a1151761003210612j5cd8c3afh57d94b75b5c24c55@mail.gmail.com> <4BA78E49.30906@sun.com> <5a1151761003221227h452c9dbaqc39fdf4c7e95ff8c@mail.gmail.com> Message-ID: <5a1151761003221353y5d312b7br40afbe6454274d42@mail.gmail.com> >> Yes, right now G1 expands the heap a bit too aggressively. It checks the >> overall GC overhead and, if it's higher than the goal, it'd expand the heap >> hoping that it will reach the required GC overhead (typically, the larger >> the heap, the lower the GC overhead). > > Interesting. Looking at G1CollectorPolicy::expansion_amount(), this > seems to be determined by average pause time ratio. For the record, changing G1GCPercent to 100 (from the default of 10) did indeed eliminate this behavior. (Should this option be enabled in non-development builds so that it can be tweaked on an out-of-the-box JVM?) -- / Peter Schuller From tony.printezis at sun.com Mon Mar 22 15:02:01 2010 From: tony.printezis at sun.com (Tony Printezis) Date: Mon, 22 Mar 2010 18:02:01 -0400 Subject: G1 heap growth very aggressive in spite of low heap usage In-Reply-To: <5a1151761003221227h452c9dbaqc39fdf4c7e95ff8c@mail.gmail.com> References: <5a1151761003210612j5cd8c3afh57d94b75b5c24c55@mail.gmail.com> <4BA78E49.30906@sun.com> <5a1151761003221227h452c9dbaqc39fdf4c7e95ff8c@mail.gmail.com> Message-ID: <4BA7E8D9.3020200@sun.com> Peter, Peter Schuller wrote: > I don't know what G1 does if there is too little free heap for the > preferred amount of young generation regions, but I would expect that > the primary effect of increasing heap size, in terms of GC overhead, > would be that of: > > (1) Decreasing the frequency of concurrent marks and the overhead > associated with it. 
> (2) Increasing the pay-off of partial GCs after such a mark, due to > (presumably) a larger average free ratio in selected regions. > There's another advantage of increasing heap size. The larger the heap size is, the larger the young gen can grow (assuming the prediction heuristics determine that the pause times will not go over the desired goal) and, as a result, the less frequent collections will be. This will also decrease the overall GC overhead. > But under normal circumstances, when not actually running out of heap > space, would this policy ever be expected to be effective in any > significant percentage of cases? > Yes. See above. But I agree that it's a bit too aggressive as it is right now. Tony > While I can understand that spending too much time on GC means you > want to cut down on GC overhead *in general*, there seems to me to be > a very weak relation between the cost of non-young evacuations > (directly and indirectly through marking) and the cost of the young > generation collections. > > If there is an excessive cost to young generation collections due to > excessive promotion into old generations, would not that rather be an > indication that the pause time goal and desired GC overhead are simply > incompatible given the workload of the application? > > If so, a way out might be to accept the added overhead (but perhaps > provide diagnostic feedback). > > Increasing the young generation size in an attempt to increase > efficiency to the detriment of collection pause time could be an > option, but it would only work if the application exhibits behavior > consistent with the generational hypothesis, so it is not a very safe > bet. Probably in an ideal world this would be a knob (which to > prefer). > > Ideally, maybe heap expansion would primarily be triggered by a high > cost of non-young collections. However, I also understand that it is > difficult, if not impossible, to measure the overhead of concurrent > marking, even if the cost of non-young region evacuations could be measured. > > Another observation is that it may be advantageous to only expand the > heap when an allocation fails (as a result of > G1CollectedHeap::expand_and_allocate() perhaps?), or at least not > until the lack of heap space is in some way affecting the cost of > young generation collections (or e.g. at the start of concurrent > marking, where we expect to start incurring costs associated with the > heap size being too small). > > Thoughts? > > >> I'd guess that any micro benchmark >> that tries to stress the GC (i.e., do a lot of allocations, not much else) >> would cause G1 to expand the heap aggressively, given that such benchmarks >> typically do mostly GC and not much else (can't tell whether this is the >> case for your test as you don't have -XX:+PrintGCTimeStamps enabled to see >> how close together the GCs are). >> > > Yes, but this also highlights, I think, an issue with the ratio of > time spent, since it specifically does not take into account the > allocation rate of the mutator. For non-young evacuation and > concurrent marking cost that would likely not matter, since a larger > heap would always lead to better throughput (and greater GC > efficiency), but because the logic is also applied based on the cost > of young generation collections, the effects on heap size are > likely to be quite unexpected for any application that has temporary > bursts of allocation (which I think is very much realistic even in > production code).
> > On time stamps: I can provide a sample run with PrintGCTimeStamps, > but the short answer is that they happened relatively frequently > (several times per second) and the growth exhibited was not preceded > by a concurrent mark or non-young evacuations. > > >> Setting the max heap size with -Xmx will control how much G1 will expand the >> heap. >> > > Understood. However, for general-purpose use I am very interested in > seeing the JVM self-regulate its memory use in such a way that a > non-developer can look at the memory use of a JVM (or for that matter > the heap free/total figures) and draw some kind of reasonable ballpark > conclusion on memory demands. > > Currently one can fairly easily trigger extreme heap growth to the > point of multiple orders of magnitude. And since concurrent marking > and non-young evacuations won't happen until the heap size is > significantly exhausted, that effectively means that your program may > end up seemingly "needing" orders of magnitude more memory than > it actually does. > > From tony.printezis at sun.com Mon Mar 22 15:15:34 2010 From: tony.printezis at sun.com (Tony Printezis) Date: Mon, 22 Mar 2010 18:15:34 -0400 Subject: G1 heap growth very aggressive in spite of low heap usage In-Reply-To: <5a1151761003221353y5d312b7br40afbe6454274d42@mail.gmail.com> References: <5a1151761003210612j5cd8c3afh57d94b75b5c24c55@mail.gmail.com> <4BA78E49.30906@sun.com> <5a1151761003221227h452c9dbaqc39fdf4c7e95ff8c@mail.gmail.com> <5a1151761003221353y5d312b7br40afbe6454274d42@mail.gmail.com> Message-ID: <4BA7EC06.9090300@sun.com> Peter, We have recently made some changes to G1 to re-use existing cmd line parameters where appropriate instead of introducing new ones. In this case I think we should actually observe GCTimeRatio instead of making G1GCPercent a product flag. I opened "6937160: G1: should observe GCTimeRatio" to track this. Tony Peter Schuller wrote: >>> Yes, right now G1 expands the heap a bit too aggressively. It checks the >>> overall GC overhead and, if it's higher than the goal, it'd expand the heap >>> hoping that it will reach the required GC overhead (typically, the larger >>> the heap, the lower the GC overhead). >>> >> Interesting. Looking at G1CollectorPolicy::expansion_amount(), this >> seems to be determined by average pause time ratio. >> > > For the record, changing G1GCPercent to 100 (from the default of 10) > did indeed eliminate this behavior. > > (Should this option be enabled in non-development builds so that it > can be tweaked on an out-of-the-box JVM?) > > From ercan.canlier at gmail.com Tue Mar 23 02:57:13 2010 From: ercan.canlier at gmail.com (ercan canlier) Date: Tue, 23 Mar 2010 11:57:13 +0200 Subject: Heap Size in 32 bit Redhat Message-ID: Hi there, The OS of our Java-based application is RHEL 5. It has 4GB of physical RAM. Before we installed the kernel modules to enable PAE, the memory commands were showing us 3GB; after successfully patching in the PAE module, the visible memory is now 4GB. The version of Java is: java version "1.6.0" OpenJDK Runtime Environment (build 1.6.0-b09) OpenJDK Server VM (build 1.6.0-b09, mixed mode) After I installed the debuginfo packages on my local OS, Fedora 12, I could successfully run the jmap command to profile the application. But on RHEL 5 I couldn't find the proper packages to see what is going on in my application.
I would like to know the current heap size and any other information about the options I currently use, shown above: -Xmx3800m -Xms3800m -Xmn2g -Xss128k -XX:+UseParallelGC -XX:ParallelGCThreads=20 -XX:+UseConcMarkSweepGC I use these parameters and the application is running... Can you tell me the maximum heap size on a 32-bit OS with the PAE module? As far as I know, people say that on a 32-bit system you only see 3GB of physical memory, but if you are using Linux and install the PAE kernel modules you can see more than 3GB; I assume the same applies to the heap size. I don't know the details; is there anybody who can guide me further? thanks -- ERCAN CANLIER -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20100323/de21fcdb/attachment.html From linuxhippy at gmail.com Tue Mar 23 08:21:30 2010 From: linuxhippy at gmail.com (Clemens Eisserer) Date: Tue, 23 Mar 2010 16:21:30 +0100 Subject: Heap Size in 32 bit Redhat In-Reply-To: References: Message-ID: <194f62551003230821h1cd4b6f5mb73f89ae70bbebbc@mail.gmail.com> Hi, As far as I know there's still the 3GB/1GB user/kernel-space split even when using PAE. PAE doesn't expand the memory available per-process, but enables the OS to use up to 64GB, with each process still limited to its 32-bit address range (with some percentage reserved for the kernel). I would suggest using a 64-bit JVM with CompressedOops; usually that performs very well :) - Clemens From david.tavoularis at mycom-int.com Wed Mar 24 08:41:38 2010 From: david.tavoularis at mycom-int.com (David Tavoularis) Date: Wed, 24 Mar 2010 16:41:38 +0100 Subject: Strange long pauses between 2 Young GCs Message-ID: Hi, In my application, I noticed very strange pauses (15s+19s+25s+30s+59s+28s = 2min56) between 2 Young GCs (on Java6u13 64-bits / Solaris 10) with ParallelGC+ParallelOldGC collectors. Has anyone encountered this kind of pause, and does anyone have an explanation (see extract of GC logs at the end of the mail)? My issue is that some JMS messages have been sent to DeadMessageQueue because the process was not "responding".
I do not think that my server was swapping (30GB free RAM) and usually, a Full GC takes between 2s and 10s (WallTime="real"), between 3s and 60s (CpuTime="user") Thanks in advance -- David JVM options : -server -Xms13000m -Xmx13000m -XX:+UseParallelGC -XX:+AggressiveHeap -XX:GCHeapFreeLimit=5 -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime -XX:+PrintGCDateStamps -Xloggc:/path/to/gc/logs -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/path/to/heap/dumps -XX:NewSize=2048m -XX:MaxNewSize=2048m -XX:+UseParallelOldGC -XX:MaxPermSize=256m 2010-03-24T05:54:23.467+0100: 47058.915: [GC Desired survivor size 181665792 bytes, new threshold 1 (max 15) [PSYoungGen: 1792448K->16384K(1763776K)] 2472857K->702761K(12978624K), 0.1969989 secs] [Times: user=0.29 sys=0.07, real=0.20 secs] Heap after GC invocations=3606 (full 0): PSYoungGen total 1763776K, used 16384K [0xfffffffeec800000, 0xffffffff6c800000, 0xffffffff6c800000) eden space 1747392K, 0% used [0xfffffffeec800000,0xfffffffeec800000,0xffffffff57270000) from space 16384K, 100% used [0xffffffff57270000,0xffffffff58270000,0xffffffff58270000) to space 177408K, 0% used [0xffffffff61ac0000,0xffffffff61ac0000,0xffffffff6c800000) ParOldGen total 11214848K, used 686377K [0xfffffffc40000000, 0xfffffffeec800000, 0xfffffffeec800000) object space 11214848K, 6% used [0xfffffffc40000000,0xfffffffc69e4a480,0xfffffffeec800000) PSPermGen total 73728K, used 56019K [0xfffffffc30000000, 0xfffffffc34800000, 0xfffffffc40000000) object space 73728K, 75% used [0xfffffffc30000000,0xfffffffc336b4c30,0xfffffffc34800000) } Total time for which application threads were stopped: 0.2054411 seconds Application time: 2.1629642 seconds Total time for which application threads were stopped: 0.0089378 seconds Application time: 0.0000853 seconds Total time for which application threads were stopped: 0.0012767 seconds Application time: 0.0001092 seconds Total time for which application threads were stopped: 0.0010313 seconds Application time: 0.0000721 seconds Total time for which application threads were stopped: 0.0010318 seconds Application time: 0.3016250 seconds Total time for which application threads were stopped: 0.0087502 seconds Application time: 15.0009372 seconds Total time for which application threads were stopped: 0.0074670 seconds Application time: 18.9151668 seconds Total time for which application threads were stopped: 0.0230399 seconds Application time: 0.0001326 seconds Total time for which application threads were stopped: 0.0012976 seconds Application time: 0.0000646 seconds Total time for which application threads were stopped: 0.0010412 seconds Application time: 25.4543868 seconds Total time for which application threads were stopped: 0.0087742 seconds Application time: 0.0001073 seconds Total time for which application threads were stopped: 0.0013600 seconds Application time: 0.0001158 seconds Total time for which application threads were stopped: 0.0049206 seconds Application time: 0.0025729 seconds Total time for which application threads were stopped: 0.0012530 seconds Application time: 0.0001156 seconds Total time for which application threads were stopped: 0.0009663 seconds Application time: 0.0007467 seconds Total time for which application threads were stopped: 0.0010364 seconds Application time: 0.0001326 seconds Total time for which application threads were stopped: 0.0009763 seconds Application time: 29.9838922 seconds Total time for 
which application threads were stopped: 0.0075727 seconds Application time: 4.4557342 seconds Total time for which application threads were stopped: 0.0187998 seconds Application time: 0.0000837 seconds Total time for which application threads were stopped: 0.0012345 seconds Application time: 0.0000634 seconds Total time for which application threads were stopped: 0.0010140 seconds Application time: 59.1763410 seconds Total time for which application threads were stopped: 0.0198018 seconds Application time: 0.0001009 seconds Total time for which application threads were stopped: 0.0011658 seconds Application time: 0.0000614 seconds Total time for which application threads were stopped: 0.0010269 seconds Application time: 27.9028562 seconds Total time for which application threads were stopped: 0.0080420 seconds Application time: 0.0001698 seconds Total time for which application threads were stopped: 0.0012317 seconds Application time: 0.1080374 seconds Total time for which application threads were stopped: 0.0073232 seconds Application time: 0.1215694 seconds Total time for which application threads were stopped: 0.0078813 seconds Application time: 0.0000801 seconds Total time for which application threads were stopped: 0.0014642 seconds Application time: 0.0001074 seconds Total time for which application threads were stopped: 0.0012944 seconds Application time: 0.0000676 seconds Total time for which application threads were stopped: 0.0012189 seconds Application time: 0.0192653 seconds Total time for which application threads were stopped: 0.0021554 seconds Application time: 0.0001444 seconds Total time for which application threads were stopped: 0.0013011 seconds Application time: 0.2620606 seconds Total time for which application threads were stopped: 0.0034052 seconds Application time: 0.8994294 seconds {Heap before GC invocations=3607 (full 0): PSYoungGen total 1763776K, used 1763776K [0xfffffffeec800000, 0xffffffff6c800000, 0xffffffff6c800000) eden space 1747392K, 100% used [0xfffffffeec800000,0xffffffff57270000,0xffffffff57270000) from space 16384K, 100% used [0xffffffff57270000,0xffffffff58270000,0xffffffff58270000) to space 177408K, 0% used [0xffffffff61ac0000,0xffffffff61ac0000,0xffffffff6c800000) ParOldGen total 11214848K, used 686377K [0xfffffffc40000000, 0xfffffffeec800000, 0xfffffffeec800000) object space 11214848K, 6% used [0xfffffffc40000000,0xfffffffc69e4a480,0xfffffffeec800000) PSPermGen total 73728K, used 56020K [0xfffffffc30000000, 0xfffffffc34800000, 0xfffffffc40000000) object space 73728K, 75% used [0xfffffffc30000000,0xfffffffc336b5330,0xfffffffc34800000) 2010-03-24T05:57:29.620+0100: 47245.068: [GC $/usr/jdk/instances/jdk1.6.0/jre/bin/sparcv9/java -version java version "1.6.0_13" Java(TM) SE Runtime Environment (build 1.6.0_13-b03) Java HotSpot(TM) 64-Bit Server VM (build 11.3-b02, mixed mode) $ uname -a SunOS XXX 5.10 Generic_120011-14 sun4u sparc SUNW,Sun-Fire -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20100324/963b3303/attachment.html From Y.S.Ramakrishna at Sun.COM Wed Mar 24 10:13:04 2010 From: Y.S.Ramakrishna at Sun.COM (Y. 
Srinivas Ramakrishna) Date: Wed, 24 Mar 2010 10:13:04 -0700 Subject: Strange long pauses between 2 Young GCs In-Reply-To: References: Message-ID: <4BAA4820.7050805@sun.com> David Tavoularis wrote: > Hi, > > In my application, I noticed very strange pauses (15s+19s+25s+30s+59s+28s = > 2min56) between 2 Young GCs (on Java6u13 64-bits / Solaris 10) with Those are not pauses of the application. Rather, those are the durations for which the application runs (is not paused). The pauses are the short ones in between those. They are likely related to biased-lock revocations or deoptimization or something similar. If you use -XX:+PrintSafepointStatistics (especially with the latest hs18 JVM) you would know what those very short stoppages in between are for. > ParallelGC+ParallelOldGC collectors. > Has anyone encountered this kind of pause, and does anyone have an explanation (see extract > of GC logs at the end of the mail)? > > My issue is that some JMS messages have been sent to DeadMessageQueue because > the process was not "responding". Hmm, that seems strange. What does mpstat/prstat indicate during those times? You might want to run this on the Sun Studio Collector (perf analyzer) and see if it shows up the issues. If this is on a production system, and this occurs at some time that lasts a while, you may be able to collect the performance data during that period of time. But first a simple mpstat/prstat data may be useful before a SS collector experiment. -- ramki > > I do not think that my server was swapping (30GB free RAM) and usually, a Full > GC takes between 2s and 10s (WallTime="real"), between 3s and 60s (CpuTime="user") > > Thanks in advance > > -- > David > > JVM options : > -server > -Xms13000m > -Xmx13000m > -XX:+UseParallelGC > -XX:+AggressiveHeap > -XX:GCHeapFreeLimit=5 > -XX:+PrintGCDetails > -XX:+PrintGCTimeStamps > -XX:+PrintHeapAtGC > -XX:+PrintTenuringDistribution > -XX:+PrintGCApplicationStoppedTime > -XX:+PrintGCApplicationConcurrentTime > -XX:+PrintGCDateStamps > -Xloggc:/path/to/gc/logs > -XX:+HeapDumpOnOutOfMemoryError > -XX:HeapDumpPath=/path/to/heap/dumps > -XX:NewSize=2048m > -XX:MaxNewSize=2048m > -XX:+UseParallelOldGC > -XX:MaxPermSize=256m > > 2010-03-24T*05:54:23*.467+0100: 47058.915: [GC > Desired survivor size 181665792 bytes, new threshold 1 (max 15) > [PSYoungGen: 1792448K->16384K(1763776K)] 2472857K->702761K(12978624K), 0.1969989 > secs] [Times: user=0.29 sys=0.07, real=0.20 secs] > Heap after GC invocations=3606 (full 0): > PSYoungGen total 1763776K, used 16384K [0xfffffffeec800000, 0xffffffff6c800000, > 0xffffffff6c800000) > eden space 1747392K, 0% used > [0xfffffffeec800000,0xfffffffeec800000,0xffffffff57270000) > from space 16384K, 100% used > [0xffffffff57270000,0xffffffff58270000,0xffffffff58270000) > to space 177408K, 0% used [0xffffffff61ac0000,0xffffffff61ac0000,0xffffffff6c800000) > ParOldGen total 11214848K, used 686377K [0xfffffffc40000000, 0xfffffffeec800000, > 0xfffffffeec800000) > object space 11214848K, 6% used > [0xfffffffc40000000,0xfffffffc69e4a480,0xfffffffeec800000) > PSPermGen total 73728K, used 56019K [0xfffffffc30000000, 0xfffffffc34800000, > 0xfffffffc40000000) > object space 73728K, 75% used > [0xfffffffc30000000,0xfffffffc336b4c30,0xfffffffc34800000) > } > Total time for which application threads were stopped: 0.2054411 seconds > Application time: 2.1629642 seconds > Total time for which application threads were stopped: 0.0089378 seconds > Application time: 0.0000853 seconds > Total time for which application threads were stopped: 0.0012767 seconds >
Application time: 0.0001092 seconds > Total time for which application threads were stopped: 0.0010313 seconds > Application time: 0.0000721 seconds > Total time for which application threads were stopped: 0.0010318 seconds > Application time: 0.3016250 seconds > Total time for which application threads were stopped: 0.0087502 seconds > *Application time: 15.0009372 seconds* > Total time for which application threads were stopped: 0.0074670 seconds > *Application time: 18.9151668 seconds* > Total time for which application threads were stopped: 0.0230399 seconds > Application time: 0.0001326 seconds > Total time for which application threads were stopped: 0.0012976 seconds > Application time: 0.0000646 seconds > Total time for which application threads were stopped: 0.0010412 seconds > *Application time: 25.4543868 seconds* > Total time for which application threads were stopped: 0.0087742 seconds > Application time: 0.0001073 seconds > Total time for which application threads were stopped: 0.0013600 seconds > Application time: 0.0001158 seconds > Total time for which application threads were stopped: 0.0049206 seconds > Application time: 0.0025729 seconds > Total time for which application threads were stopped: 0.0012530 seconds > Application time: 0.0001156 seconds > Total time for which application threads were stopped: 0.0009663 seconds > Application time: 0.0007467 seconds > Total time for which application threads were stopped: 0.0010364 seconds > Application time: 0.0001326 seconds > Total time for which application threads were stopped: 0.0009763 seconds > *Application time: 29.9838922 seconds* > Total time for which application threads were stopped: 0.0075727 seconds > Application time: 4.4557342 seconds > Total time for which application threads were stopped: 0.0187998 seconds > Application time: 0.0000837 seconds > Total time for which application threads were stopped: 0.0012345 seconds > Application time: 0.0000634 seconds > Total time for which application threads were stopped: 0.0010140 seconds > *Application time: 59.1763410 seconds* > Total time for which application threads were stopped: 0.0198018 seconds > Application time: 0.0001009 seconds > Total time for which application threads were stopped: 0.0011658 seconds > Application time: 0.0000614 seconds > Total time for which application threads were stopped: 0.0010269 seconds > *Application time: 27.9028562 seconds* > Total time for which application threads were stopped: 0.0080420 seconds > Application time: 0.0001698 seconds > Total time for which application threads were stopped: 0.0012317 seconds > Application time: 0.1080374 seconds > Total time for which application threads were stopped: 0.0073232 seconds > Application time: 0.1215694 seconds > Total time for which application threads were stopped: 0.0078813 seconds > Application time: 0.0000801 seconds > Total time for which application threads were stopped: 0.0014642 seconds > Application time: 0.0001074 seconds > Total time for which application threads were stopped: 0.0012944 seconds > Application time: 0.0000676 seconds > Total time for which application threads were stopped: 0.0012189 seconds > Application time: 0.0192653 seconds > Total time for which application threads were stopped: 0.0021554 seconds > Application time: 0.0001444 seconds > Total time for which application threads were stopped: 0.0013011 seconds > Application time: 0.2620606 seconds > Total time for which application threads were stopped: 0.0034052 seconds > Application time: 0.8994294 
seconds > {Heap before GC invocations=3607 (full 0): > PSYoungGen total 1763776K, used 1763776K [0xfffffffeec800000, > 0xffffffff6c800000, 0xffffffff6c800000) > eden space 1747392K, 100% used > [0xfffffffeec800000,0xffffffff57270000,0xffffffff57270000) > from space 16384K, 100% used > [0xffffffff57270000,0xffffffff58270000,0xffffffff58270000) > to space 177408K, 0% used [0xffffffff61ac0000,0xffffffff61ac0000,0xffffffff6c800000) > ParOldGen total 11214848K, used 686377K [0xfffffffc40000000, 0xfffffffeec800000, > 0xfffffffeec800000) > object space 11214848K, 6% used > [0xfffffffc40000000,0xfffffffc69e4a480,0xfffffffeec800000) > PSPermGen total 73728K, used 56020K [0xfffffffc30000000, 0xfffffffc34800000, > 0xfffffffc40000000) > object space 73728K, 75% used > [0xfffffffc30000000,0xfffffffc336b5330,0xfffffffc34800000) > 2010-03-24T*05:57:29*.620+0100: 47245.068: [GC > > > $/usr/jdk/instances/jdk1.6.0/jre/bin/sparcv9/java -version > java version "1.6.0_13" > Java(TM) SE Runtime Environment (build 1.6.0_13-b03) > Java HotSpot(TM) 64-Bit Server VM (build 11.3-b02, mixed mode) > > $ uname -a > SunOS XXX 5.10 Generic_120011-14 sun4u sparc SUNW,Sun-Fire > > > > ------------------------------------------------------------------------ > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use From david.tavoularis at mycom-int.com Thu Mar 25 00:54:53 2010 From: david.tavoularis at mycom-int.com (David Tavoularis) Date: Thu, 25 Mar 2010 08:54:53 +0100 Subject: Strange long pauses between 2 Young GCs In-Reply-To: <4BAA4820.7050805@sun.com> References: <4BAA4820.7050805@sun.com> Message-ID: Hi Ramki, Thanks for your answers. I misunderstood these messages and I thought that GC could have been responsible for the JMS messages sent to DeadMessageQueue. But this is not the case, as there were no significant pauses (less than 0.2s) between the 2 Young GCs. > Hmm, that seems strange. What does mpstat/prstat indicate during those times? They were not running. This is a production server, so I cannot launch them. This issue happened only yesterday, so I will try to investigate it with the JMS provider logs. In conclusion, there was no hotspot/GC issue. Thanks again. -- David On Wed, 24 Mar 2010 18:13:04 +0100, Y. Srinivas Ramakrishna wrote: > David Tavoularis wrote: >> Hi, >> >> In my application, I noticed very strange pauses (15s+19s+25s+30s+59s+28s = >> 2min56) between 2 Young GCs (on Java6u13 64-bits / Solaris 10) with > > Those are not pauses of the application. Rather, those are the durations for > which the application runs (is not paused). The pauses are the short ones > in between those. They are likely related to biased-lock revocations or > deoptimization or something similar. If you use -XX:+PrintSafepointStatistics (especially > with the latest hs18 JVM) you would know what those very short stoppages > in between are for. > >> ParallelGC+ParallelOldGC collectors. >> Has anyone encountered this kind of pause, and does anyone have an explanation (see extract >> of GC logs at the end of the mail)? >> >> My issue is that some JMS messages have been sent to DeadMessageQueue because >> the process was not "responding". > > Hmm, that seems strange. What does mpstat/prstat indicate during those times? > You might want to run this on the Sun Studio Collector (perf analyzer) and > see if it shows up the issues.
If this is on a production system, and > this occurs at some time that lasts a while, you may be able to collect the > performance data during that period of time. > > But first a simple mpstat/prstat data may be useful before a SS collector experiment. > > -- ramki > >> >> I do not think that my server was swapping (30GB free RAM) and usually, a Full >> GC takes between 2s and 10s (WallTime="real"), between 3s and 60s (CpuTime="user") >> >> Thanks in advance >> >> -- >> David >> >> JVM options : >> -server >> -Xms13000m >> -Xmx13000m >> -XX:+UseParallelGC >> -XX:+AggressiveHeap >> -XX:GCHeapFreeLimit=5 >> -XX:+PrintGCDetails >> -XX:+PrintGCTimeStamps >> -XX:+PrintHeapAtGC >> -XX:+PrintTenuringDistribution >> -XX:+PrintGCApplicationStoppedTime >> -XX:+PrintGCApplicationConcurrentTime >> -XX:+PrintGCDateStamps >> -Xloggc:/path/to/gc/logs >> -XX:+HeapDumpOnOutOfMemoryError >> -XX:HeapDumpPath=/path/to/heap/dumps >> -XX:NewSize=2048m >> -XX:MaxNewSize=2048m >> -XX:+UseParallelOldGC >> -XX:MaxPermSize=256m >> >> 2010-03-24T*05:54:23*.467+0100: 47058.915: [GC >> Desired survivor size 181665792 bytes, new threshold 1 (max 15) >> [PSYoungGen: 1792448K->16384K(1763776K)] 2472857K->702761K(12978624K), 0.1969989 >> secs] [Times: user=0.29 sys=0.07, real=0.20 secs] >> Heap after GC invocations=3606 (full 0): >> PSYoungGen total 1763776K, used 16384K [0xfffffffeec800000, 0xffffffff6c800000, >> 0xffffffff6c800000) >> eden space 1747392K, 0% used >> [0xfffffffeec800000,0xfffffffeec800000,0xffffffff57270000) >> from space 16384K, 100% used >> [0xffffffff57270000,0xffffffff58270000,0xffffffff58270000) >> to space 177408K, 0% used [0xffffffff61ac0000,0xffffffff61ac0000,0xffffffff6c800000) >> ParOldGen total 11214848K, used 686377K [0xfffffffc40000000, 0xfffffffeec800000, >> 0xfffffffeec800000) >> object space 11214848K, 6% used >> [0xfffffffc40000000,0xfffffffc69e4a480,0xfffffffeec800000) >> PSPermGen total 73728K, used 56019K [0xfffffffc30000000, 0xfffffffc34800000, >> 0xfffffffc40000000) >> object space 73728K, 75% used >> [0xfffffffc30000000,0xfffffffc336b4c30,0xfffffffc34800000) >> } >> Total time for which application threads were stopped: 0.2054411 seconds >> Application time: 2.1629642 seconds >> Total time for which application threads were stopped: 0.0089378 seconds >> Application time: 0.0000853 seconds >> Total time for which application threads were stopped: 0.0012767 seconds >> Application time: 0.0001092 seconds >> Total time for which application threads were stopped: 0.0010313 seconds >> Application time: 0.0000721 seconds >> Total time for which application threads were stopped: 0.0010318 seconds >> Application time: 0.3016250 seconds >> Total time for which application threads were stopped: 0.0087502 seconds >> *Application time: 15.0009372 seconds* >> Total time for which application threads were stopped: 0.0074670 seconds >> *Application time: 18.9151668 seconds* >> Total time for which application threads were stopped: 0.0230399 seconds >> Application time: 0.0001326 seconds >> Total time for which application threads were stopped: 0.0012976 seconds >> Application time: 0.0000646 seconds >> Total time for which application threads were stopped: 0.0010412 seconds >> *Application time: 25.4543868 seconds* >> Total time for which application threads were stopped: 0.0087742 seconds >> Application time: 0.0001073 seconds >> Total time for which application threads were stopped: 0.0013600 seconds >> Application time: 0.0001158 seconds >> Total time for which application threads were stopped: 
0.0049206 seconds >> Application time: 0.0025729 seconds >> Total time for which application threads were stopped: 0.0012530 seconds >> Application time: 0.0001156 seconds >> Total time for which application threads were stopped: 0.0009663 seconds >> Application time: 0.0007467 seconds >> Total time for which application threads were stopped: 0.0010364 seconds >> Application time: 0.0001326 seconds >> Total time for which application threads were stopped: 0.0009763 seconds >> *Application time: 29.9838922 seconds* >> Total time for which application threads were stopped: 0.0075727 seconds >> Application time: 4.4557342 seconds >> Total time for which application threads were stopped: 0.0187998 seconds >> Application time: 0.0000837 seconds >> Total time for which application threads were stopped: 0.0012345 seconds >> Application time: 0.0000634 seconds >> Total time for which application threads were stopped: 0.0010140 seconds >> *Application time: 59.1763410 seconds* >> Total time for which application threads were stopped: 0.0198018 seconds >> Application time: 0.0001009 seconds >> Total time for which application threads were stopped: 0.0011658 seconds >> Application time: 0.0000614 seconds >> Total time for which application threads were stopped: 0.0010269 seconds >> *Application time: 27.9028562 seconds* >> Total time for which application threads were stopped: 0.0080420 seconds >> Application time: 0.0001698 seconds >> Total time for which application threads were stopped: 0.0012317 seconds >> Application time: 0.1080374 seconds >> Total time for which application threads were stopped: 0.0073232 seconds >> Application time: 0.1215694 seconds >> Total time for which application threads were stopped: 0.0078813 seconds >> Application time: 0.0000801 seconds >> Total time for which application threads were stopped: 0.0014642 seconds >> Application time: 0.0001074 seconds >> Total time for which application threads were stopped: 0.0012944 seconds >> Application time: 0.0000676 seconds >> Total time for which application threads were stopped: 0.0012189 seconds >> Application time: 0.0192653 seconds >> Total time for which application threads were stopped: 0.0021554 seconds >> Application time: 0.0001444 seconds >> Total time for which application threads were stopped: 0.0013011 seconds >> Application time: 0.2620606 seconds >> Total time for which application threads were stopped: 0.0034052 seconds >> Application time: 0.8994294 seconds >> {Heap before GC invocations=3607 (full 0): >> PSYoungGen total 1763776K, used 1763776K [0xfffffffeec800000, >> 0xffffffff6c800000, 0xffffffff6c800000) >> eden space 1747392K, 100% used >> [0xfffffffeec800000,0xffffffff57270000,0xffffffff57270000) >> from space 16384K, 100% used >> [0xffffffff57270000,0xffffffff58270000,0xffffffff58270000) >> to space 177408K, 0% used [0xffffffff61ac0000,0xffffffff61ac0000,0xffffffff6c800000) >> ParOldGen total 11214848K, used 686377K [0xfffffffc40000000, 0xfffffffeec800000, >> 0xfffffffeec800000) >> object space 11214848K, 6% used >> [0xfffffffc40000000,0xfffffffc69e4a480,0xfffffffeec800000) >> PSPermGen total 73728K, used 56020K [0xfffffffc30000000, 0xfffffffc34800000, >> 0xfffffffc40000000) >> object space 73728K, 75% used >> [0xfffffffc30000000,0xfffffffc336b5330,0xfffffffc34800000) >> 2010-03-24T*05:57:29*.620+0100: 47245.068: [GC >> >> >> $/usr/jdk/instances/jdk1.6.0/jre/bin/sparcv9/java -version >> java version "1.6.0_13" >> Java(TM) SE Runtime Environment (build 1.6.0_13-b03) >> Java HotSpot(TM) 64-Bit Server VM 
(build 11.3-b02, mixed mode) >> >> $ uname -a >> SunOS XXX 5.10 Generic_120011-14 sun4u sparc SUNW,Sun-Fire >> >> >> >> ------------------------------------------------------------------------ >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > From shaun.hennessy at alcatel-lucent.com Thu Mar 25 10:40:33 2010 From: shaun.hennessy at alcatel-lucent.com (Shaun Hennessy) Date: Thu, 25 Mar 2010 13:40:33 -0400 Subject: CMS & DefaultMaxTenuringThreshold/SurvivorRatio In-Reply-To: <4B50DE95.4060901@Sun.COM> References: <4AC0EEAE.5010705@Sun.COM> <4AC145AE.30804@sun.com> <4AC148B6.7010608@Sun.COM> <4AC1493A.2030004@sun.com> <4B4B3ECB.5090105@alcatel-lucent.com> <4B4B937C.4080907@alcatel-lucent.com> <4B4B9FBC.4040103@sun.com> <4B4BA6AF.5080300@sun.com> <4B50C9BF.8060202@alcatel-lucent.com> <4B50DE95.4060901@Sun.COM> Message-ID: <4BABA011.7020801@alcatel-lucent.com> An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20100325/55b72557/attachment.html From shaun.hennessy at alcatel-lucent.com Tue Mar 30 07:30:52 2010 From: shaun.hennessy at alcatel-lucent.com (Shaun Hennessy) Date: Tue, 30 Mar 2010 10:30:52 -0400 Subject: understanding GC logs In-Reply-To: <4BABA011.7020801@alcatel-lucent.com> References: <4AC0EEAE.5010705@Sun.COM> <4AC145AE.30804@sun.com> <4AC148B6.7010608@Sun.COM> <4AC1493A.2030004@sun.com> <4B4B3ECB.5090105@alcatel-lucent.com> <4B4B937C.4080907@alcatel-lucent.com> <4B4B9FBC.4040103@sun.com> <4B4BA6AF.5080300@sun.com> <4B50C9BF.8060202@alcatel-lucent.com> <4B50DE95.4060901@Sun.COM> <4BABA011.7020801@alcatel-lucent.com> Message-ID: <4BB20B1C.4020608@alcatel-lucent.com> An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20100330/8a135a8a/attachment.html From Ryan.Highley at sabre-holdings.com Tue Mar 30 12:54:13 2010 From: Ryan.Highley at sabre-holdings.com (Highley, Ryan) Date: Tue, 30 Mar 2010 14:54:13 -0500 Subject: G1 Logging and Phases Message-ID: <32DB14FBADE66B439D6A98D2B01D27EC12E2DB86@sgtulmsp02.Global.ad.sabre.com> Hello, I have a few validation questions based on the following G1 GC log excerpt. 19.298: [GC pause (young) (initial-mark) 2141M->1957M(4000M)20.433: [GC concurrent-mark-start] , 1.1346080 secs] 20.721: [GC pause (young)20.761: [GC concurrent-mark-end, 0.2501590 sec] 2300M->2119M(4000M), 1.3597880 secs] 22.081: [GC remark, 0.0012060 secs] 22.083: [GC concurrent-count-start] 22.452: [GC pause (young) 2469M->2119M(4000M), 0.3101740 secs] 22.976: [GC concurrent-count-end, 0.8934030] 23.063: [GC cleanup 2435M->639M(4000M), 0.0142190 secs] 23.078: [GC concurrent-cleanup-start] 23.086: [GC concurrent-cleanup-end, 0.0083300] 23.108: [GC pause (young) 660M->338M(4000M), 0.2031580 secs] 23.632: [GC pause (partial) 668M->346M(4000M), 0.3303010 secs] From what I read in concurrentMarkThread.cpp and g1CollectorPolicy.cpp in the current OpenJDK 7 G1 source code, could I please get the following confirmed or denied to ensure my understanding is correct? The "GC pause" lines are indeed stop-the-world pauses as blatantly marked. :) The "GC pause (young)" lines are only evacuating young gen regions. The "GC pause (young) (initial-mark)" line is both evacuating young gen regions and prepping for a full collection with initial marking of roots. The "remark" line is also a stop-the-world pause, i.e.
the G1 "Final Marking Pause" noted in the G1 whitepaper. The "cleanup" line is also a stop-the-world pause and finalizes the live object counts and sizes. The heap allocated size should be the ending size reported on the "cleanup" line plus any allocations and/or GC activity that occur during the concurrent cleanup phase. All "concurrent" lines run concurrently with application threads, similar to the CMS behavior, and in a set of one or more ConcurrentMarkThread instances. Thank you for your help, Ryan -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20100330/d6ca80a1/attachment.html From jon.masamitsu at oracle.com Tue Mar 30 21:34:34 2010 From: jon.masamitsu at oracle.com (Jon Masamitsu) Date: Tue, 30 Mar 2010 21:34:34 -0700 Subject: understanding GC logs In-Reply-To: <4BB20B1C.4020608@alcatel-lucent.com> References: <4AC0EEAE.5010705@Sun.COM> <4AC145AE.30804@sun.com> <4AC148B6.7010608@Sun.COM> <4AC1493A.2030004@sun.com> <4B4B3ECB.5090105@alcatel-lucent.com> <4B4B937C.4080907@alcatel-lucent.com> <4B4B9FBC.4040103@sun.com> <4B4BA6AF.5080300@sun.com> <4B50C9BF.8060202@alcatel-lucent.com> <4B50DE95.4060901@Sun.COM> <4BABA011.7020801@alcatel-lucent.com> <4BB20B1C.4020608@alcatel-lucent.com> Message-ID: <148A7A57-B616-4CA4-9571-0F0216B0F650@oracle.com> On 03/30/10 07:30, Shaun Hennessy wrote: > > A couple of question related to the GC logs and promotion failure > messages > > I am running 6u17. > > rgv[2]: -server > argv[3]: -Xms14000m > argv[4]: -Xmx14000m > argv[5]: -XX:PermSize=800m > argv[6]: -XX:NewSize=5600m > argv[7]: -XX:MaxNewSize=5600m > argv[8]: -XX:MaxPermSize=800m > argv[9]: -XX:+DisableExplicitGC > argv[10]: -XX:+UseConcMarkSweepGC > argv[11]: -XX:+UseParNewGC > argv[12]: -XX:+UseCMSCompactAtFullCollection > argv[13]: -XX:+CMSClassUnloadingEnabled > argv[28]: -verbose:gc > argv[29]: -XX:+PrintGCDetails > argv[30]: -XX:+PrintGCDateStamps > > > > 4112.744: [GC 4112.744: [ParNew: 4940531K->530787K(5017600K), > 0.1455641 secs] 11878708K->7540012K(13619200K), 0.1457559 secs] > [Times: user=1.38 sys=0.02, real=0.15 secs] > 4113.780: [GC 4113.780: [ParNew: 4831587K->372801K(5017600K), > 0.2093305 secs] 11840812K->7551390K(13619200K), 0.2095270 secs] > [Times: user=1.34 sys=0.07, real=0.21 secs] > [Times: user=0.10 sys=0.00, real=0.11 secs] > 2010-03-24T16:31:56.490-0400: 4114.097: [CMS-concurrent-mark-start] > 4115.261: [GC 4115.261: [ParNew: 4674075K->364108K(5017600K), > 0.0755017 secs] 11852663K->7542736K(13619200K), 0.0756880 secs] > [Times: user=0.93 sys=0.00, real=0.08 secs] > 4115.338: [GC 4115.338: [ParNew: 420064K->323310K(5017600K), > 0.1112115 secs] 7598693K->7587370K(13619200K), 0.1113667 secs] > [Times: user=0.98 sys=0.02, real=0.11 secs] > 2010-03-24T16:31:58.647-0400: 4116.254: [CMS-concurrent-mark: > 1.909/2.157 secs] [Times: user=31.47 sys=1.55, real=2.16 secs] > 2010-03-24T16:31:58.647-0400: 4116.254: [CMS-concurrent-preclean- > start] > 2010-03-24T16:31:58.798-0400: 4116.405: [CMS-concurrent-preclean: > 0.149/0.151 secs] [Times: user=2.29 sys=0.12, real=0.15 secs] > 2010-03-24T16:31:58.799-0400: 4116.406: [CMS-concurrent-abortable- > preclean-start] > 4116.460: [GC 4116.460: [ParNew: 4624110K->301464K(5017600K), > 0.0914784 secs] 11888170K->7617401K(13619200K), 0.0916679 secs] > [Times: user=0.89 sys=0.03, real=0.09 secs] > 2010-03-24T16:31:59.494-0400: 4117.101: [CMS-concurrent-abortable- > preclean: 0.596/0.695 secs] [Times: user=9.88 sys=0.60, real=0.70 > secs] > [YG 
occupancy: 2756990 K (5017600 K)]4117.102: [Rescan (parallel) , > 0.4648394 secs]4117.567: [weak refs processing, 0.0028851 > secs]4117.570: [class unloading, 0.0240174 secs]4117.594: [scrub > symbol & string tables, 0.0898531 secs] [Times: user=1.72 sys=0.37, > real=0.58 secs] > 2010-03-24T16:32:00.079-0400: 4117.686: [CMS-concurrent-sweep-start] > 4118.116: [GC 4118.116: [ParNew: 4602264K->305089K(5017600K), > 0.0712571 secs] 11891816K->7620802K(13619200K), 0.0714474 secs] > [Times: user=0.75 sys=0.00, real=0.07 secs] > 4119.117: [GC 4119.117: [ParNew: 4605889K->263281K(5017600K), > 0.0842051 secs] 11665429K->7368947K(13619200K), 0.0843955 secs] > [Times: user=0.79 sys=0.01, real=0.08 secs] > 4125.941: [GC 4125.941: [ParNew: 4936868K->708251K(5017600K), > 0.1426036 secs] 9789305K->5612975K(13619200K), 0.1427944 secs] > [Times: user=1.56 sys=0.01, real=0.14 secs] > 4126.893: [GC 4126.894: [ParNew: 5009051K->485783K(5017600K), > 0.2210054 secs] 9536611K->5247528K(13619200K), 0.2211964 secs] > [Times: user=1.58 sys=0.04, real=0.22 secs] > 4128.102: [GC 4128.102: [ParNew: 4786583K->455386K(5017600K), > 0.0748814 secs] 8588693K->4257495K(13619200K), 0.0750694 secs] > [Times: user=0.94 sys=0.00, real=0.08 secs] > 2010-03-24T16:32:11.951-0400: 4129.558: [CMS-concurrent-sweep: > 10.777/11.872 secs] [Times: user=149.77 sys=7.25, real=11.87 secs] > 2010-03-24T16:32:11.951-0400: 4129.558: [CMS-concurrent-reset-start] > 2010-03-24T16:32:11.984-0400: 4129.591: [CMS-concurrent-reset: > 0.033/0.033 secs] [Times: user=0.04 sys=0.00, real=0.03 secs] > 4140.537: [GC 4140.537: [ParNew: 4756186K->539705K(5017600K), > 0.0873384 secs] 6572247K->2355767K(13619200K), 0.0875281 secs] > [Times: user=1.10 sys=0.00, real=0.09 secs] > > > 1) I no longer seem to get any "CMS-initial-mark" - is this a > change since 6u12? I'm running Java(TM) SE Runtime Environment (build 1.6.0_17-b04) Java HotSpot(TM) Server VM (build 14.3-b01, mixed mode) and I see entries such as 0.869: [GC [1 CMS-initial-mark: 22901K(229376K)] 28264K(258880K), 0.0561592 secs] [Times: user=0.05 sys=0.01, real=0.06 secs] > 2) The rescan portion then is the only non-concurrent, correct? So > from the above the application was only STW for 0.58 sec. The initial-mark is also STW. The rescan is part of the remark which is STW. From my run 1.270: [GC[YG occupancy: 15860 K (29504 K)]1.270: [Rescan (parallel) , 0.1002467 secs]1.370: [weak refs processing, 0.0000167 secs] [1 CMS-remark: 22901K(229376K)] 38761K(258880K), 0.1004598 secs] [Times: user=0.11 sys=0.03, real=0.10 secs] In your entry, yes, it is 0.58 sec. > 3) This may have been a change from 1.5 to 1.6, but this line also > used to display a CMS-remark did it not? Yes, see my example above. > 4) Is there a way to have my minor collections also display the full > date stamp (ie 2010-03-24T16:31:58.799-0400) > > When I run with -XX:+PrintGCDateStamps I see entries such as 2010-03-30T16:06:55.297-0700: 2.602: [GC 2.602: [ParNew: 29504K->3264K(29504K), 0.2457302 secs] 52405K->38187K(258880K), 0.2460394 secs] [Times: user=0.41 sys=0.02, real=0.25 secs] I don't know why you're not seeing that.
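(For reference, the entries above came from a toy setup of my own; an invocation along these lines, with MyApp standing in as a placeholder for your main class, should show the same date-stamp behavior: java -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:gc.log MyApp )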
> > 1270.736: [GC 1270.736: [ParNew: 4872340K->574345K(5017600K), > 0.1967262 secs] 7123984K->2972742K(13619200K), 0.1969106 secs] > [Times: user=1.45 sys=0.05, real=0.20 secs] > 1272.024: [GC 1272.024: [ParNew: 4875542K->653139K(5017600K), > 0.1334760 secs] 7273939K->3051536K(13619200K), 0.1336582 secs] > [Times: user=1.54 sys=0.01, real=0.13 secs] > 1272.158: [GC 1272.159: [ParNew: 681949K->563105K(5017600K), > 0.2187865 secs] 3080347K->3158904K(13619200K), 0.2189362 secs] > [Times: user=1.48 sys=0.06, real=0.22 secs] > 1273.398: [GC 1273.398: [ParNew: 4863905K->535051K(5017600K), > 0.1196808 secs] 7459704K->3130850K(13619200K), 0.1198694 secs] > [Times: user=1.51 sys=0.00, real=0.12 secs] > 1274.461: [GC 1274.461: [ParNew: 4835851K->399744K(5017600K), > 0.2861376 secs] 7431650K->3249248K(13619200K), 0.2863296 secs] > [Times: user=1.61 sys=0.09, real=0.29 secs] > > 5) Why did the middle minor collection occur? A big allocation? > That seems rather suspicious. It may be a side effect of using JNI critical sections. I don't know if this is such a case, but that's my best guess. > > > - Promotion Failure > 4896.478: [GC 4896.478: [ParNew: 4894353K->587864K(5017600K), > 0.4789909 secs] 8473688K->4268560K(13619200K), 0.4791812 secs] > [Times: user=1.00 sys=0.61, real=0.48 secs] > 4897.812: [GC 4897.812: [ParNew: 4888664K->545903K(5017600K), > 0.4105613 secs] 8569360K->4326583K(13619200K), 0.4107560 secs] > [Times: user=1.06 sys=0.55, real=0.41 secs] > 4899.057: [GC 4899.058: [ParNew: 4846703K->638966K(5017600K), > 0.2759734 secs] 8627383K->4496987K(13619200K), 0.2761637 secs] > [Times: user=1.13 sys=0.36, real=0.28 secs] > 4900.101: [GC 4900.101: [ParNew: 4939768K->630721K(5017600K), > 0.5117751 secs] 8797789K->4607020K(13619200K), 0.5119662 secs] > [Times: user=0.84 sys=0.66, real=0.51 secs] > 4900.615: [GC 4900.615: [ParNew: 651487K->487288K(5017600K), > 0.0780183 secs] 4627786K->4463587K(13619200K), 0.0781687 secs] > [Times: user=0.96 sys=0.00, real=0.08 secs] > 4901.581: [GC 4901.581: [ParNew (promotion failed): 4788088K->4780999K(5017600K), 2.8947499 secs]4904.476: [CMS: 4003090K->1530872K(8601600K), 7.5122451 secs] 8764387K->1530872K(13619200K), [CMS Perm : 671102K->671102K(819200K)], 10.4072102 secs] [Times: > user=11.03 sys=1.09, real=10.41 secs] > 4913.024: [GC 4913.024: [ParNew: 4300800K->316807K(5017600K), > 0.0615917 secs] 5831672K->1847679K(13619200K), 0.0617857 secs] > [Times: user=0.74 sys=0.00, real=0.06 secs] > 4914.015: [GC 4914.015: [ParNew: 4617607K->475077K(5017600K), > 0.0771389 secs] 6148479K->2005949K(13619200K), 0.0773290 secs] > [Times: user=0.95 sys=0.00, real=0.08 secs] > 4914.908: [GC 4914.908: [ParNew: 4775877K->586339K(5017600K), > 0.0857102 secs] 6306749K->2117211K(13619200K), 0.0859046 secs] > [Times: user=1.06 sys=0.00, real=0.09 secs] > 4915.816: [GC 4915.816: [ParNew: 4887139K->476398K(5017600K), > 0.1841627 secs] 6418011K->2152868K(13619200K), 0.1843556 secs] > [Times: user=1.32 sys=0.07, real=0.18 secs] > 6) So here I had a promotion failure - this is due to fragmentation > of the tenured generation rather than lack of space? Fragmentation is the likely problem. When 6u20 is released try it. It does a better job of keeping fragmentation down. > 7) Do we need 1 contiguous space in tenured big enough to hold the > complete list/size of all objects being promoted, or > do multiple spaces get used & the pieces don't all fit? The free space in the tenured generation is kept in a free list so there are multiple chunks.
You don't need 1 contiguous chunk for all the promotions. > 8) What exactly is occurring during this promotion failed > collection? Based on the next example I assume > it's a (successful) scavenge. What exactly is this - which > thread(s) serial / ParallelGCThreads?, > STW?, are we simply compacting the tenured gen or can we > actually GC the tenured? A promotion failure is a scavenge that does not succeed because there is not enough space in the old gen to do all the needed promotions. The scavenge is in essence unwound and then a full STW compaction of the entire heap is done. > > > > promotion failed, and full GC > 50786.124: [GC 50786.124: [ParNew: 4606713K->338518K(5017600K), > 0.0961884 secs] 12303455K->8081859K(13619200K), 0.0963907 secs] > [Times: user=0.91 sys=0.01, real=0.10 secs] > 50787.373: [GC 50787.373: [ParNew: 4639318K->272229K(5017600K), > 0.0749353 secs] 12382659K->8053730K(13619200K), 0.0751408 secs] > [Times: user=0.75 sys=0.00, real=0.08 secs] > 50788.483: [GC 50788.483: [ParNew: 4573029K->393397K(5017600K), > 0.0837182 secs] 12354530K->8185595K(13619200K), 0.0839321 secs] > [Times: user=1.03 sys=0.00, real=0.08 secs] > 50789.590: [GC 50789.590: [ParNew (promotion failed): 4694264K->4612345K(5017600K), 1.5974678 secs] 12486461K->12447305K(13619200K), 1.5976765 secs] [Times: user=2.38 sys=0.20, > real=1.60 secs] > GC locker: Trying a full collection because scavenge failed > 50791.188: [Full GC 50791.188: [CMS: 7834959K->1227325K(8601600K), > 6.7102106 secs] 12447305K->1227325K(13619200K), [CMS Perm : 670478K->670478K(819200K)], 6.7103417 secs] [Times: user=6.71 sys=0.00, > real=6.71 secs] > 50798.982: [GC 50798.982: [ParNew: 4300800K->217359K(5017600K), > 0.0364557 secs] 5528125K->1444685K(13619200K), 0.0366630 secs] > [Times: user=0.44 sys=0.00, real=0.04 secs] > 50800.246: [GC 50800.246: [ParNew: 4518167K->198753K(5017600K), > 0.0368620 secs] 5745493K->1426078K(13619200K), 0.0370604 secs] > [Times: user=0.46 sys=0.01, real=0.04 secs] > 9) Probably once I understand what the scavenge is doing it will help > me understand this case, but the logic seems > simple enough - full GC on promotion failure & scavenge failed. Yes, full STW compaction.
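If you want to watch a promotion failure happen in isolation, a toy program along these lines (a sketch I am making up here, not anything from your application) will usually provoke "ParNew (promotion failed)" on a small CMS heap, because it promotes byte arrays of mixed sizes and then frees every other one, leaving variable-sized holes in the old gen free lists:

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical demo: deliberately fragments the CMS old generation.
// Try something like:
//   java -Xmx64m -Xmn8m -XX:+UseConcMarkSweepGC -XX:+PrintGCDetails FragmentOldGen
public class FragmentOldGen {
    public static void main(String[] args) {
        Random random = new Random(42);
        List<byte[]> retained = new ArrayList<byte[]>();
        while (true) {
            // Allocate arrays of mixed sizes; the survivors get promoted.
            for (int i = 0; i < 1000; i++) {
                retained.add(new byte[random.nextInt(64 * 1024)]);
            }
            // Drop every other array, punching variable-sized holes
            // into the old generation's free lists.
            for (int i = retained.size() - 1; i >= 0; i -= 2) {
                retained.remove(i);
            }
        }
    }
}

Exactly when it fails depends on the sizing, but the pattern (mixed object sizes with interleaved frees) is the essence of the fragmentation problem.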
> > > promotion and concurrent mode failures > 53494.424: [GC 53494.424: [ParNew: 4979001K->716800K(5017600K), > 0.2120290 secs] 12583633K->8434774K(13619200K), 0.2122200 secs] > [Times: user=2.12 sys=0.03, real=0.21 secs] > 53496.131: [GC 53496.131: [ParNew: 5017600K->605278K(5017600K), > 0.2761710 secs] 12735574K->8578720K(13619200K), 0.2763597 secs] > [Times: user=1.94 sys=0.08, real=0.28 secs] > [Times: user=0.16 sys=0.00, real=0.16 secs] > 2010-03-25T06:14:58.961-0400: 53496.568: [CMS-concurrent-mark-start] > 53497.688: [GC 53497.688: [ParNew: 4906078K->545999K(5017600K), > 0.0989930 secs] 12879520K->8519441K(13619200K), 0.0991785 secs] > [Times: user=1.21 sys=0.02, real=0.10 secs] > 2010-03-25T06:15:00.188-0400: 53497.795: [CMS-concurrent-mark: > 1.107/1.227 secs] [Times: user=15.14 sys=0.42, real=1.23 secs] > 2010-03-25T06:15:00.188-0400: 53497.795: [CMS-concurrent-preclean-start] > 2010-03-25T06:15:00.233-0400: 53497.840: [CMS-concurrent-preclean: > 0.043/0.045 secs] [Times: user=0.31 sys=0.01, real=0.04 secs] > 2010-03-25T06:15:00.233-0400: 53497.840: [CMS-concurrent-abortable-preclean-start] > 2010-03-25T06:15:00.794-0400: 53498.401: [CMS-concurrent-abortable-preclean: 0.541/0.560 secs] [Times: user=6.11 sys=0.22, real=0.56 > secs] > [YG occupancy: 3222128 K (5017600 K)]53498.402: [Rescan (parallel) , > 0.4447462 secs]53498.847: [weak refs processing, 0.0028967 > secs]53498.850: [class unloading, 0.0248904 secs]53498.875: [scrub > symbol & string tables, 0.0896937 secs] [Times: user=1.79 sys=0.35, > real=0.56 secs] > 2010-03-25T06:15:01.360-0400: 53498.967: [CMS-concurrent-sweep-start] 53499.350: [GC 53499.350: [ParNew (promotion failed): > 4846799K->4718254K(5017600K), 5.3142493 secs]53504.664: > [CMS2010-03-25T06:15:11.506-0400: 53509.113: > [CMS-concurrent-sweep: 4.825/10.146 secs] [Times: user=16.61 > sys=2.94, real=10.15 secs] > (concurrent mode failure): 8087820K->1346631K(8601600K), 11.0573075 > secs] 12820241K->1346631K(13619200K), [CMS Perm : 670478K->670478K(819200K)], 16.3717719 secs] [Times: user=17.62 sys=2.66, > real=16.37 secs] > 53516.713: [GC 53516.714: [ParNew: 4300800K->283359K(5017600K), > 0.0498000 secs] 5647431K->1629990K(13619200K), 0.0499965 secs] > [Times: user=0.62 sys=0.00, real=0.05 secs] > 53517.743: [GC 53517.743: [ParNew: 4584343K->340302K(5017600K), > 0.0544853 secs] 5930975K->1686933K(13619200K), 0.0546710 secs] > [Times: user=0.68 sys=0.00, real=0.05 secs] > 10) I think it's just that the system is allocating at such a high rate > at this point in time (and we don't use InitiatingOccupancy on this > app) > so we get close to full on tenured, a minor collection came - no > room in tenured - so even though we don't say "Full GC" in this > one, > don't you get a Full GC as part of any concurrent-mode-failure? The promotion failure that happens leads to the concurrent mode failure.
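Since you mention that you don't set InitiatingOccupancy: one common mitigation (a general suggestion, not something I have tuned for your workload) is to make CMS start its cycles earlier and at a deterministic occupancy, so the concurrent collection has a chance to finish before the old gen fills up, e.g.

-XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly

The 70 here is only an example; you want a value low enough that a cycle started at that occupancy completes before your allocation bursts consume the remaining headroom.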
> > promotion failed, scavenge failed, concurrent mode failure > 86833.016: [GC 86833.017: [ParNew: 4769273K->453398K(5017600K), > 0.1316717 secs] 12418197K->8169164K(13619200K), 0.1319220 secs] > [Times: user=1.22 sys=0.03, real=0.13 secs] > [Times: user=0.14 sys=0.00, real=0.15 secs] > 2010-03-25T15:30:37.688-0400: 86833.295: [CMS-concurrent-mark-start] > 86834.751: [GC 86834.751: [ParNew: 4754198K->513298K(5017600K), > 0.1250485 secs] 12469964K->8281014K(13619200K), 0.1252553 secs] > [Times: user=1.38 sys=0.01, real=0.13 secs] > 2010-03-25T15:30:39.310-0400: 86834.917: [CMS-concurrent-mark: > 1.453/1.621 secs] [Times: user=21.57 sys=1.15, real=1.62 secs] > 2010-03-25T15:30:39.310-0400: 86834.917: [CMS-concurrent-preclean-start] > 2010-03-25T15:30:39.650-0400: 86835.258: [CMS-concurrent-preclean: > 0.337/0.341 secs] [Times: user=5.30 sys=0.18, real=0.34 secs] > 2010-03-25T15:30:39.651-0400: 86835.258: [CMS-concurrent-abortable-preclean-start] > 2010-03-25T15:30:39.864-0400: 86835.471: [CMS-concurrent-abortable-preclean: 0.211/0.214 secs] [Times: user=3.16 sys=0.19, real=0.21 > secs] > [YG occupancy: 3329361 K (5017600 K)]86835.500: [Rescan (parallel) , > 0.3868448 secs]86835.887: [weak refs processing, 0.0030042 > secs]86835.890: [class unloading, 0.0250008 secs]86835.915: [scrub > symbol & string tables, 0.0904210 secs] [Times: user=1.85 sys=0.29, > real=0.51 secs] > 2010-03-25T15:30:40.401-0400: 86836.008: [CMS-concurrent-sweep-start] > 86836.421: [GC 86836.422: [ParNew: 4814154K->680591K(5017600K), > 0.2031305 secs] 12581870K->8543701K(13619200K), 0.2033332 secs] > [Times: user=1.88 sys=0.04, real=0.20 secs] > 86836.627: [GC 86836.627: [ParNew (promotion failed): 720747K->511306K(5017600K), 1.3076955 secs] 8583857K->8560580K(13619200K), > 1.3078889 secs] [Times: user=2.66 sys=0.78, real=1.31 secs] > GC locker: Trying a full collection because scavenge failed > 86837.935: [Full GC 86837.935: [CMS2010-03-25T15:30:46.850-0400: > 86842.457: [CMS-concurrent-sweep: 4.926/6.449 secs] [Times: > user=15.24 sys=1.19, real=6.45 secs] > (concurrent mode failure): 8049273K->1356962K(8601600K), 9.6514031 > secs] 8560580K->1356962K(13619200K), [CMS Perm : 670523K->670523K(819200K)], 9.6515260 secs] [Times: user=9.65 sys=0.00, > real=9.65 secs] > 86848.669: [GC 86848.669: [ParNew: 4301133K->201781K(5017600K), > 0.0452702 secs] 5658095K->1558743K(13619200K), 0.0454738 secs] > [Times: user=0.57 sys=0.00, real=0.05 secs] > > 11) - So here our scavenge failed, - this is what gave us the "Full > GC" log message -- the concurrent mode failure > was really just a coincidence/timing? The full GC (triggered by > the promotion failure) aborts the tenured CMS collection > does it not? The "GC locker" message says that after a JNI critical section was exited the GC wanted to do a scavenge but did not think there was enough room in the old gen, so it did a full STW compaction. Because a CMS concurrent collection was in progress, it is aborted, and that abortion causes the concurrent mode failure to be printed. Not a coincidence. Just telling you that the CMS concurrent collection could not be completed for some reason. > > > thanks, > Shaun > > > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20100330/27351cfc/attachment-0001.html From volkan at hatem.net Tue Mar 30 22:50:00 2010 From: volkan at hatem.net (Volkan Hatem) Date: Wed, 31 Mar 2010 08:50:00 +0300 Subject: Detecting the generation an object belongs to Message-ID: Hi, How can I detect to which generation an object belongs? (regular Java, JVMTI, ...?) Is it safe to assume that all generations are allocated contiguously? Hence, would detecting the starting offset & size of a generation give me what I'm looking for? Where can I find more information about how the GC interacts with the JVM? In other words, how can I implement agents which interact with the GC the way JVMTI does? This will require more than detecting the start/stop of a GC collection: what the GC has detected as reachable/unreachable, the decision to promote an object, etc. Thanks, -volkan From tony.printezis at oracle.com Wed Mar 31 08:52:20 2010 From: tony.printezis at oracle.com (Tony Printezis) Date: Wed, 31 Mar 2010 11:52:20 -0400 Subject: G1 Logging and Phases In-Reply-To: <32DB14FBADE66B439D6A98D2B01D27EC12E2DB86@sgtulmsp02.Global.ad.sabre.com> References: <32DB14FBADE66B439D6A98D2B01D27EC12E2DB86@sgtulmsp02.Global.ad.sabre.com> Message-ID: <4BB36FB4.9040903@oracle.com> Ryan, Hi. See inline. Highley, Ryan wrote: > > Hello, > > I have a few validation questions based on the following G1 GC log > excerpt. > > 19.298: [GC pause (young) (initial-mark) 2141M->1957M(4000M)20.433: > [GC concurrent-mark-start] > > , 1.1346080 secs] > > 20.721: [GC pause (young)20.761: [GC concurrent-mark-end, 0.2501590 sec] > > 2300M->2119M(4000M), 1.3597880 secs] > > 22.081: [GC remark, 0.0012060 secs] > > 22.083: [GC concurrent-count-start] > > 22.452: [GC pause (young) 2469M->2119M(4000M), 0.3101740 secs] > > 22.976: [GC concurrent-count-end, 0.8934030] > > 23.063: [GC cleanup 2435M->639M(4000M), 0.0142190 secs] > > 23.078: [GC concurrent-cleanup-start] > > 23.086: [GC concurrent-cleanup-end, 0.0083300] > > 23.108: [GC pause (young) 660M->338M(4000M), 0.2031580 secs] > > 23.632: [GC pause (partial) 668M->346M(4000M), 0.3303010 secs] > > From what I read in concurrentMarkThread.cpp and g1CollectorPolicy.cpp > in the current OpenJDK 7 G1 source code, could I please get the > following confirmed or denied to ensure my understanding is correct? > > The "GC pause" lines are indeed stop-the-world pauses as blatantly > marked. :) > Yes. > > The "GC pause (young)" lines are only evacuating young gen regions. > Yes. > > The "GC pause (young) (initial-mark)" line is both evacuating young > gen regions and prepping for a full collection with initial marking of > roots. > > Yes, provided that by a "full collection" you mean a marking cycle (by "full collections" we refer to GCs that collect the entire heap in a STW phase). > > The "remark" line is also a stop-the-world pause, i.e. the G1 "Final > Marking Pause" noted in the G1 whitepaper. > Yes. > > The "cleanup" line is also a stop-the-world pause and finalizes the > live object counts and sizes. > Yes. And it also reclaims regions that have no live objects.
>
> All "concurrent" lines run concurrently with application threads,
> similar to the CMS behavior, and in a set of one or more
> ConcurrentMarkThread instances.
>
There's only one ConcurrentMarkThread (which is the "controller" thread
if you want), which spawns parallel workers.

Tony
>
>
> Thank you for your help,
>
> Ryan
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> hotspot-gc-use mailing list
> hotspot-gc-use at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
>

From Vishal.Bhasin at sabre-holdings.com  Wed Mar 31 09:20:12 2010
From: Vishal.Bhasin at sabre-holdings.com (Bhasin, Vishal)
Date: Wed, 31 Mar 2010 11:20:12 -0500
Subject: G1 Logging and Phases
In-Reply-To: <4BB36FB4.9040903@oracle.com>
References: <32DB14FBADE66B439D6A98D2B01D27EC12E2DB86@sgtulmsp02.Global.ad.sabre.com>
	<4BB36FB4.9040903@oracle.com>
Message-ID: <3FA4C6109D066541B52E1AF02D28A914093FEF6E@sgtulmsp04.Global.ad.sabre.com>

Tony,

So, based on your response - GC pause (young), GC pause (partial), GC
remark & GC cleanup are stop-the-world; are there any other lines in the
log (GC activity) that would also be stop-the-world? Thanks!
_______________________________________________
hotspot-gc-use mailing list
hotspot-gc-use at openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use

From tony.printezis at oracle.com  Wed Mar 31 09:20:45 2010
From: tony.printezis at oracle.com (Tony Printezis)
Date: Wed, 31 Mar 2010 12:20:45 -0400
Subject: G1 Logging and Phases
In-Reply-To: <3FA4C6109D066541B52E1AF02D28A914093FEF6E@sgtulmsp04.Global.ad.sabre.com>
References: <32DB14FBADE66B439D6A98D2B01D27EC12E2DB86@sgtulmsp02.Global.ad.sabre.com>
	<4BB36FB4.9040903@oracle.com>
	<3FA4C6109D066541B52E1AF02D28A914093FEF6E@sgtulmsp04.Global.ad.sabre.com>
Message-ID: <4BB3765D.2000304@oracle.com>

Yes.

Tony

Bhasin, Vishal wrote:
> Tony,
>
> So, based on your response - GC pause (young), GC pause (partial), GC
> remark & GC cleanup are stop-the-world; are there any other lines in
> the log (GC activity) that would also be stop-the-world? Thanks!
>
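[Editorial note: taking Tony's confirmations in this thread together, the
stop-the-world/concurrent split for G1 logs of this vintage can be summarized
mechanically. A small sketch under that assumption; the helper class is
hypothetical, not HotSpot code.]

============================================
// Sketch: classifies G1 log lines from this era of HotSpot as STW or
// concurrent, following the answers above: every "[GC pause ...]",
// "[GC remark ...]" and "[GC cleanup ...]" entry is a stop-the-world
// pause, while "[GC concurrent-...]" entries run alongside the
// application threads.
public final class G1LineKind {
    public static boolean isStopTheWorld(String logLine) {
        if (logLine.contains("[GC concurrent-")) {
            return false; // concurrent mark / count / cleanup phases
        }
        return logLine.contains("[GC pause")
                || logLine.contains("[GC remark")
                || logLine.contains("[GC cleanup");
    }

    public static void main(String[] args) {
        System.out.println(isStopTheWorld(
            "23.632: [GC pause (partial) 668M->346M(4000M), 0.3303010 secs]")); // true
        System.out.println(isStopTheWorld(
            "23.078: [GC concurrent-cleanup-start]")); // false
    }
}
============================================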
From shaun.hennessy at alcatel-lucent.com  Wed Mar 31 10:45:11 2010
From: shaun.hennessy at alcatel-lucent.com (Shaun Hennessy)
Date: Wed, 31 Mar 2010 13:45:11 -0400
Subject: understanding GC logs
In-Reply-To: <148A7A57-B616-4CA4-9571-0F0216B0F650@oracle.com>
References: <4AC0EEAE.5010705@Sun.COM> <4AC145AE.30804@sun.com>
	<4AC148B6.7010608@Sun.COM> <4AC1493A.2030004@sun.com>
	<4B4B3ECB.5090105@alcatel-lucent.com> <4B4B937C.4080907@alcatel-lucent.com>
	<4B4B9FBC.4040103@sun.com> <4B4BA6AF.5080300@sun.com>
	<4B50C9BF.8060202@alcatel-lucent.com> <4B50DE95.4060901@Sun.COM>
	<4BABA011.7020801@alcatel-lucent.com> <4BB20B1C.4020608@alcatel-lucent.com>
	<148A7A57-B616-4CA4-9571-0F0216B0F650@oracle.com>
Message-ID: <4BB38A27.8050408@alcatel-lucent.com>

An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20100331/34a87f81/attachment-0001.html

From jon.masamitsu at oracle.com  Wed Mar 31 12:21:13 2010
From: jon.masamitsu at oracle.com (Jon Masamitsu)
Date: Wed, 31 Mar 2010 12:21:13 -0700
Subject: understanding GC logs
In-Reply-To: <4BB38A27.8050408@alcatel-lucent.com>
References: <4AC0EEAE.5010705@Sun.COM> <4AC145AE.30804@sun.com>
	<4AC148B6.7010608@Sun.COM> <4AC1493A.2030004@sun.com>
	<4B4B3ECB.5090105@alcatel-lucent.com> <4B4B937C.4080907@alcatel-lucent.com>
	<4B4B9FBC.4040103@sun.com> <4B4BA6AF.5080300@sun.com>
	<4B50C9BF.8060202@alcatel-lucent.com> <4B50DE95.4060901@Sun.COM>
	<4BABA011.7020801@alcatel-lucent.com> <4BB20B1C.4020608@alcatel-lucent.com>
	<148A7A57-B616-4CA4-9571-0F0216B0F650@oracle.com>
	<4BB38A27.8050408@alcatel-lucent.com>
Message-ID: <4BB3A0A9.8090306@oracle.com>

Shaun,

I tried redirecting the output with -Xloggc: and I still see
everything I'm expecting.

Who else out there is seeing the problems (1, 2, 3 below) with
the GC logging that Shaun is seeing?

Jon

On 03/31/10 10:45, Shaun Hennessy wrote:
> Hmm, just looking around here -- I have 3 different applications all
> using CMS --
> from glancing at some automated logs and a collection of machines I
> can find, it seems
> that all applications on all machines
> 1) Don't log full date timestamps on minor collections
> 2) No longer log the initial-mark
> 3) No longer log the "CMS-remark" in that entry.
>
> All run 6u17 now, and have slightly different parameters...
> I'm pretty sure 3) was also true for me even before 6u17,  I thought
> 2) did
> work in 6u12, and unsure on 1).   I'll see if I can roll back to confirm
> how 6u12 behaves for me and try stripping our options....
>
> We are mostly an x64 AMD (4140/444/4600) shop -- but I did check an
> Intel 4170
> and a SPARC v440 and the same problems seem to exist:
>
> Here's a typical example from a different app:
>
> # uname -a
> SunOS parking 5.10 Generic_139556-08 i86pc i386 i86pc
>
> 16:44:57,699 INFO  [ServerInfo] Java version: 1.6.0_17,Sun
> Microsystems Inc.
> 16:44:57,699 INFO  [ServerInfo] Java VM: Java HotSpot(TM) 64-Bit
> Server VM 14.3-b01,Sun Microsystems Inc.
> 16:44:57,699 INFO  [ServerInfo] OS-System: SunOS 5.10,amd64
>
> argv[0]: ../../jre/bin/amd64/java
> argv[1]: -server
> argv[2]: -DSAM
> argv[3]: -XX:ThreadStackSize=512
> argv[4]: -Xms16384m
> argv[5]: -Xmx16384m
> argv[6]: -XX:PermSize=900m
> argv[7]: -XX:NewSize=4096m
> argv[8]: -XX:MaxNewSize=4096m
> argv[9]: -XX:MaxPermSize=900m
> argv[10]: -XX:+DisableExplicitGC
> argv[11]: -XX:+UseConcMarkSweepGC
> argv[12]: -XX:CMSInitiatingOccupancyFraction=75
> argv[13]: -XX:+UseParNewGC
> argv[14]: -XX:+UseCMSCompactAtFullCollection
> argv[15]: -XX:+CMSClassUnloadingEnabled
> argv[16]: -XX:CMSInitiatingPermOccupancyFraction=95
> argv[17]:
> -Djavax.xml.stream.XMLOutputFactory=com.bea.xml.stream.XMLOutputFactoryBase
> argv[18]:
> -Djavax.xml.stream.XMLInputFactory=com.bea.xml.stream.MXParserFactory
> argv[19]:
> -Djavax.xml.stream.XMLEventFactory=com.bea.xml.stream.EventFactory
> argv[20]: -Dsam.thread.dump=true
> argv[21]: -Xdebug
> argv[22]: -Xnoagent
> argv[23]: -Djava.compiler=NONE
> argv[24]: -XX:+PrintClassHistogram
> argv[25]: -XX:+HeapDumpOnOutOfMemoryError
> argv[26]: -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=5005
> argv[27]: -Dcom.sun.management.jmxremote
> argv[28]: -Dcom.sun.management.jmxremote.authenticate=false
> argv[29]: -Dcom.sun.management.jmxremote.ssl=false
> argv[30]: -Dcom.sun.management.jmxremote.port=9999
> argv[31]: -verbose:gc
> argv[32]: -XX:+PrintGCDetails
> argv[33]: -XX:+PrintGCDateStamps
> argv[34]:
> -Djava.security.policy=../../nms/jboss/server/default/conf/server.policy
> argv[35]: -Dcom.timetra.nms.propertyFile=../../nms/config/nms-server.xml
> argv[36]:
> -Dcom.timetra.nms.propertyFileMetrics=../../nms/config/nms-metrics.xml
> argv[37]:
> -Dcom.timetra.nms.propertyJaasConfigFile=../../nms/config/SamJaasLogin.config
> argv[38]: -Djava.endorsed.dirs=../../nms/jboss/lib/endorsed
> argv[39]: -Ddrools.compiler=JANINO
> argv[40]: -Dmap.dao.descriptions.directory=../../nms/config/map
> argv[41]:
> -Xloggc:../../nms/log/server/GC_logs/GC_trace_032610-16:44:56.log
> argv[42]: -classpath
> argv[43]:
> :../../nms/lib/jdom/jdom.jar:../../nms/lib/installer/nms_installer.jar:../../nms/jboss/bin/run.jar
> argv[44]:
> com.timetra.nms.installer.appservertools.ApplicationServerStarter
> argv[45]: start
> argv[46]: default
> argv[47]: 138.120.183.17
>
>
>
> Jon Masamitsu wrote:
>> On 03/30/10 07:30, Shaun Hennessy wrote:
>>> A couple of questions related to the GC logs and promotion failure
>>> messages.
>>>
>>> I am running 6u17.
>>>
>>> argv[2]: -server
>>> argv[3]: -Xms14000m
>>> argv[4]: -Xmx14000m
>>> argv[5]: -XX:PermSize=800m
>>> argv[6]: -XX:NewSize=5600m
>>> argv[7]: -XX:MaxNewSize=5600m
>>> argv[8]: -XX:MaxPermSize=800m
>>> argv[9]: -XX:+DisableExplicitGC
>>> argv[10]: -XX:+UseConcMarkSweepGC
>>> argv[11]: -XX:+UseParNewGC
>>> argv[12]: -XX:+UseCMSCompactAtFullCollection
>>> argv[13]: -XX:+CMSClassUnloadingEnabled
>>> argv[28]: -verbose:gc
>>> argv[29]: -XX:+PrintGCDetails
>>> argv[30]: -XX:+PrintGCDateStamps
>>>
>>>
>>>
>>> 4112.744: [GC 4112.744: [ParNew: 4940531K->530787K(5017600K),
>>> 0.1455641 secs] 11878708K->7540012K(13619200K), 0.1457559 secs]
>>> [Times: user=1.38 sys=0.02, real=0.15 secs]
>>> 4113.780: [GC 4113.780: [ParNew: 4831587K->372801K(5017600K),
>>> 0.2093305 secs] 11840812K->7551390K(13619200K), 0.2095270 secs]
>>> [Times: user=1.34 sys=0.07, real=0.21 secs]
>>> [Times: user=0.10 sys=0.00, real=0.11 secs]
>>> 2010-03-24T16:31:56.490-0400: 4114.097: [CMS-concurrent-mark-start]
>>> 4115.261: [GC 4115.261: [ParNew: 4674075K->364108K(5017600K),
>>> 0.0755017 secs] 11852663K->7542736K(13619200K), 0.0756880 secs]
>>> [Times: user=0.93 sys=0.00, real=0.08 secs]
>>> 4115.338: [GC 4115.338: [ParNew: 420064K->323310K(5017600K),
>>> 0.1112115 secs] 7598693K->7587370K(13619200K), 0.1113667 secs]
>>> [Times: user=0.98 sys=0.02, real=0.11 secs]
2010-03-24T16:32:11.951-0400: 4129.558: [CMS-concurrent-reset-start] >>> 2010-03-24T16:32:11.984-0400: 4129.591: [CMS-concurrent-reset: >>> 0.033/0.033 secs] [Times: user=0.04 sys=0.00, real=0.03 secs] >>> 4140.537: [GC 4140.537: [ParNew: 4756186K->539705K(5017600K), >>> 0.0873384 secs] 6572247K->2355767K(13619200K), 0.0875281 secs] >>> [Times: user=1.10 sys=0.00, re >>> al=0.09 secs] >>> >>> >>> 1) I no longer seem to get any "|CMS-initial-mark" - is this a >>> change since 6u12? >>> | >> >> I'm running >> >> Java(TM) SE Runtime Environment (build 1.6.0_17-b04) >> Java HotSpot(TM) Server VM (build 14.3-b01, mixed mode) >> >> and I see entries such as >> >> 0.869: [GC [1 CMS-initial-mark: 22901K(229376K)] 28264K(258880K), >> 0.0561592 secs] [Times: user=0.05 sys=0.01, real=0.06 secs] >> >>> |2) The rescan portion than is the only non-concurrent correct? So >>> from the above the application was only STW for 0.58 sec. >>> | >> >> The initial-mark is also STW. >> >> The rescan is part of the remark which is STW. From my run >> >> 1.270: [GC[YG occupancy: 15860 K (29504 K)]1.270: [Rescan (parallel) >> , 0.1002467 secs]1.370: [weak refs processing, 0.0000167 secs] [1 >> CMS-remark: 22901K(229376K)] 38761K(258880K), 0.1004598 secs] [Times: >> user=0.11 sys=0.03, real=0.10 secs] >> >> >> In your entry yes it is .058 sec. >> >>> |3) This may have been a chance from 1.5 to 1.6, but this line also >>> used to display a CMS-remark did it not?| >> >> Yes, see my example above. >> >>> |4) Is there a way to have my minor collections also display the >>> full date stamp (ie |2010-03-24T16:31:58.799-0400) >>> >>> >> >> When I run with -XX:+PrintGCDateStamps I see entries such as >> >> 2010-03-30T16:06:55.297-0700: 2.602: [GC 2.602: [ParNew: >> 29504K->3264K(29504K), 0.2457302 secs] 52405K->38187K(258880K), >> 0.2460394 secs] [Times: user=0.41 sys=0.02, real=0.25 secs] >> >> I don't know why you're not seeing that. >>> >>> 1270.736: [GC 1270.736: [ParNew: 4872340K->574345K(5017600K), >>> 0.1967262 secs] 7123984K->2972742K(13619200K), 0.1969106 secs] >>> [Times: user=1.45 sys=0.05, re >>> al=0.20 secs] >>> 1272.024: [GC 1272.024: [ParNew: 4875542K->653139K(5017600K), >>> 0.1334760 secs] 7273939K->3051536K(13619200K), 0.1336582 secs] >>> [Times: user=1.54 sys=0.01, re >>> al=0.13 secs] >>> *1272.158: [GC 1272.159: [ParNew: 681949K->563105K(5017600K), >>> 0.2187865 secs] 3080347K->3158904K(13619200K), 0.2189362 secs] >>> [Times: user=1.48 sys=0.06, rea >>> l=0.22 secs]* >>> 1273.398: [GC 1273.398: [ParNew: 4863905K->535051K(5017600K), >>> 0.1196808 secs] 7459704K->3130850K(13619200K), 0.1198694 secs] >>> [Times: user=1.51 sys=0.00, re >>> al=0.12 secs] >>> 1274.461: [GC 1274.461: [ParNew: 4835851K->399744K(5017600K), >>> 0.2861376 secs] 7431650K->3249248K(13619200K), 0.2863296 secs] >>> [Times: user=1.61 sys=0.09, re >>> al=0.29 secs] >>> >>> 5) Why did the middle minor collection occur? A big allocation? >>> >> >> That seems rather suspicious. It may be a side effect of using JNI >> critical sections. I don't >> know if this is such a case but its the best behavior. 
>>
>>>
>>>
>>> - Promotion Failure
>>> 4896.478: [GC 4896.478: [ParNew: 4894353K->587864K(5017600K),
>>> 0.4789909 secs] 8473688K->4268560K(13619200K), 0.4791812 secs]
>>> [Times: user=1.00 sys=0.61, real=0.48 secs]
>>> 4897.812: [GC 4897.812: [ParNew: 4888664K->545903K(5017600K),
>>> 0.4105613 secs] 8569360K->4326583K(13619200K), 0.4107560 secs]
>>> [Times: user=1.06 sys=0.55, real=0.41 secs]
>>> 4899.057: [GC 4899.058: [ParNew: 4846703K->638966K(5017600K),
>>> 0.2759734 secs] 8627383K->4496987K(13619200K), 0.2761637 secs]
>>> [Times: user=1.13 sys=0.36, real=0.28 secs]
>>> 4900.101: [GC 4900.101: [ParNew: 4939768K->630721K(5017600K),
>>> 0.5117751 secs] 8797789K->4607020K(13619200K), 0.5119662 secs]
>>> [Times: user=0.84 sys=0.66, real=0.51 secs]
>>> 4900.615: [GC 4900.615: [ParNew: 651487K->487288K(5017600K),
>>> 0.0780183 secs] 4627786K->4463587K(13619200K), 0.0781687 secs]
>>> [Times: user=0.96 sys=0.00, real=0.08 secs]
>>> *4901.581: [GC 4901.581: [ParNew (promotion failed):
>>> 4788088K->4780999K(5017600K), 2.8947499 secs]4904.476: [CMS:
>>> 4003090K->1530872K(8601600K), 7.5122451 secs]
>>> 8764387K->1530872K(13619200K), [CMS Perm :
>>> 671102K->671102K(819200K)], 10.4072102 secs] [Times: user=11.03
>>> sys=1.09, real=10.41 secs]*
>>> 4913.024: [GC 4913.024: [ParNew: 4300800K->316807K(5017600K),
>>> 0.0615917 secs] 5831672K->1847679K(13619200K), 0.0617857 secs]
>>> [Times: user=0.74 sys=0.00, real=0.06 secs]
>>> 4914.015: [GC 4914.015: [ParNew: 4617607K->475077K(5017600K),
>>> 0.0771389 secs] 6148479K->2005949K(13619200K), 0.0773290 secs]
>>> [Times: user=0.95 sys=0.00, real=0.08 secs]
>>> 4914.908: [GC 4914.908: [ParNew: 4775877K->586339K(5017600K),
>>> 0.0857102 secs] 6306749K->2117211K(13619200K), 0.0859046 secs]
>>> [Times: user=1.06 sys=0.00, real=0.09 secs]
>>> 4915.816: [GC 4915.816: [ParNew: 4887139K->476398K(5017600K),
>>> 0.1841627 secs] 6418011K->2152868K(13619200K), 0.1843556 secs]
>>> [Times: user=1.32 sys=0.07, real=0.18 secs]
>>> 6) So here I had a promotion failure; this is due to fragmentation
>>> of the tenured generation rather than lack of space?
>>
>> Fragmentation is the likely problem.  When 6u20 is released, try it.
>> It does a better job
>> of keeping fragmentation down.
>>
>>> 7) Do we need 1 contiguous space in tenured big enough to hold the
>>> complete list/size of all objects being promoted, or
>>> do multiple spaces get used when the pieces don't all fit?
>>
>> The free space in the tenured generation is kept in a free list, so
>> there are multiple chunks.
>> You don't need 1 contiguous chunk for all the promotions.
>>
>>> 8) What exactly is occurring during this promotion failed
>>> collection?  Based on the next example I assume
>>> it's a (successful) scavenge.  What exactly is this - which
>>> thread(s), serial / ParallelGCThreads?,
>>> STW?, are we simply compacting the tenured gen or can we
>>> actually GC the tenured gen?
>>
>> A promotion failure is a scavenge that does not succeed because there
>> is not enough
>> space in the old gen to do all the needed promotions.  The scavenge
>> is in essence
>> unwound and then a full STW compaction of the entire heap is done.
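[Editorial note: the free-list answer above can be connected back to the
logs, because each scavenge line contains enough information to estimate how
much was promoted: the young generation shrinks by more than the whole heap
does, and the difference went into the old gen. A minimal sketch under that
assumption; the class name is hypothetical and the line format is the one
shown in these logs.]

============================================
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: estimates promotion volume per ParNew scavenge.
// Promoted = (young freed) - (total heap freed).
public class PromotionEstimator {
    private static final Pattern SCAVENGE = Pattern.compile(
        "\\[ParNew: (\\d+)K->(\\d+)K\\(\\d+K\\),[^\\]]*\\] (\\d+)K->(\\d+)K");

    public static void main(String[] args) {
        String line = "4112.744: [GC 4112.744: [ParNew: 4940531K->530787K(5017600K), "
                + "0.1455641 secs] 11878708K->7540012K(13619200K), 0.1457559 secs]";
        Matcher m = SCAVENGE.matcher(line);
        if (m.find()) {
            long youngFreed = Long.parseLong(m.group(1)) - Long.parseLong(m.group(2));
            long heapFreed = Long.parseLong(m.group(3)) - Long.parseLong(m.group(4));
            // 4409744K - 4338696K = 71048K (~69 MB) promoted by this scavenge;
            // that is the demand the old gen's free lists must absorb.
            System.out.println("promoted ~" + (youngFreed - heapFreed) + "K");
        }
    }
}
============================================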
>>
>>>
>>>
>>> promotion failed, and full GC
>>> 50786.124: [GC 50786.124: [ParNew: 4606713K->338518K(5017600K),
>>> 0.0961884 secs] 12303455K->8081859K(13619200K), 0.0963907 secs]
>>> [Times: user=0.91 sys=0.01, real=0.10 secs]
>>> 50787.373: [GC 50787.373: [ParNew: 4639318K->272229K(5017600K),
>>> 0.0749353 secs] 12382659K->8053730K(13619200K), 0.0751408 secs]
>>> [Times: user=0.75 sys=0.00, real=0.08 secs]
>>> 50788.483: [GC 50788.483: [ParNew: 4573029K->393397K(5017600K),
>>> 0.0837182 secs] 12354530K->8185595K(13619200K), 0.0839321 secs]
>>> [Times: user=1.03 sys=0.00, real=0.08 secs]
>>> 50789.590: [GC 50789.590: [ParNew (promotion failed):
>>> 4694264K->4612345K(5017600K), 1.5974678 secs]
>>> 12486461K->12447305K(13619200K), 1.5976765 secs] [Times: user=2.38
>>> sys=0.20, real=1.60 secs]
>>> GC locker: Trying a full collection because scavenge failed
>>> 50791.188: [Full GC 50791.188: [CMS: 7834959K->1227325K(8601600K),
>>> 6.7102106 secs] 12447305K->1227325K(13619200K), [CMS Perm :
>>> 670478K->670478K(819200K)], 6.7103417 secs] [Times: user=6.71
>>> sys=0.00, real=6.71 secs]
>>> 50798.982: [GC 50798.982: [ParNew: 4300800K->217359K(5017600K),
>>> 0.0364557 secs] 5528125K->1444685K(13619200K), 0.0366630 secs]
>>> [Times: user=0.44 sys=0.00, real=0.04 secs]
>>> 50800.246: [GC 50800.246: [ParNew: 4518167K->198753K(5017600K),
>>> 0.0368620 secs] 5745493K->1426078K(13619200K), 0.0370604 secs]
>>> [Times: user=0.46 sys=0.01, real=0.04 secs]
>>> 9) Probably once I understand what the scavenge is doing it will help
>>> me understand this case, but the logic seems
>>> simple enough - full GC on promotion failure & scavenge failed.
>>
>> Yes, full STW compaction.
>>
>>>
>>>
>>>
>>> promotion and concurrent mode failures
>>> 53494.424: [GC 53494.424: [ParNew: 4979001K->716800K(5017600K),
>>> 0.2120290 secs] 12583633K->8434774K(13619200K), 0.2122200 secs]
>>> [Times: user=2.12 sys=0.03, real=0.21 secs]
>>> 53496.131: [GC 53496.131: [ParNew: 5017600K->605278K(5017600K),
>>> 0.2761710 secs] 12735574K->8578720K(13619200K), 0.2763597 secs]
>>> [Times: user=1.94 sys=0.08, real=0.28 secs]
>>> [Times: user=0.16 sys=0.00, real=0.16 secs]
>>> 2010-03-25T06:14:58.961-0400: 53496.568: [CMS-concurrent-mark-start]
>>> 53497.688: [GC 53497.688: [ParNew: 4906078K->545999K(5017600K),
>>> 0.0989930 secs] 12879520K->8519441K(13619200K), 0.0991785 secs]
>>> [Times: user=1.21 sys=0.02, real=0.10 secs]
>>> 2010-03-25T06:15:00.188-0400: 53497.795: [CMS-concurrent-mark:
>>> 1.107/1.227 secs] [Times: user=15.14 sys=0.42, real=1.23 secs]
>>> 2010-03-25T06:15:00.188-0400: 53497.795: [CMS-concurrent-preclean-start]
>>> 2010-03-25T06:15:00.233-0400: 53497.840: [CMS-concurrent-preclean:
>>> 0.043/0.045 secs] [Times: user=0.31 sys=0.01, real=0.04 secs]
>>> 2010-03-25T06:15:00.233-0400: 53497.840:
>>> [CMS-concurrent-abortable-preclean-start]
>>> 2010-03-25T06:15:00.794-0400: 53498.401:
>>> [CMS-concurrent-abortable-preclean: 0.541/0.560 secs] [Times:
>>> user=6.11 sys=0.22, real=0.56 secs]
>>> [YG occupancy: 3222128 K (5017600 K)]53498.402: [Rescan (parallel) ,
>>> 0.4447462 secs]53498.847: [weak refs processing, 0.0028967
>>> secs]53498.850: [class unloading, 0.0248904 secs]53498.875: [scrub
>>> symbol & string tables, 0.0896937 secs] [Times: user=1.79 sys=0.35,
>>> real=0.56 secs]
>>> 2010-03-25T06:15:01.360-0400: 53498.967:
>>> [CMS-concurrent-sweep-start] 53499.350: [GC 53499.350: [ParNew
>>> (promotion failed): 4846799K->4718254K(5017600K), 5.3142493
>>> secs]53504.664: [CMS2010-03-25T06:15:11.506-0400: 53509.113:
>>> [CMS-concurrent-sweep: 4.825/10.146 secs] [Times: user=16.61
>>> sys=2.94, real=10.15 secs]
>>> (concurrent mode failure): 8087820K->1346631K(8601600K), 11.0573075
>>> secs] 12820241K->1346631K(13619200K), [CMS Perm :
>>> 670478K->670478K(819200K)], 16.3717719 secs] [Times: user=17.62
>>> sys=2.66, real=16.37 secs]
>>> 53516.713: [GC 53516.714: [ParNew: 4300800K->283359K(5017600K),
>>> 0.0498000 secs] 5647431K->1629990K(13619200K), 0.0499965 secs]
>>> [Times: user=0.62 sys=0.00, real=0.05 secs]
>>> 53517.743: [GC 53517.743: [ParNew: 4584343K->340302K(5017600K),
>>> 0.0544853 secs] 5930975K->1686933K(13619200K), 0.0546710 secs]
>>> [Times: user=0.68 sys=0.00, real=0.05 secs]
>>> 10) I think it's just that the system is allocating at such a high
>>> rate at this point in time (and we don't use InitiatingOccupancy on
>>> this app)
>>> that we get close to full on tenured; a minor collection came - no
>>> room in tenured - so even though we don't say "Full GC" in this one,
>>> don't you get a Full GC as part of any concurrent mode failure?
>>
>> The promotion failure that happens leads to the concurrent mode failure.
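[Editorial note: for readers wondering how the InitiatingOccupancy knob
mentioned in question 10 relates to these numbers, the arithmetic is simple.
A sketch follows; the class name is hypothetical, the 8601600K old-gen
capacity is taken from the logs above, and 75 matches the
-XX:CMSInitiatingOccupancyFraction=75 setting shown earlier in this thread
for a different application. Starting the concurrent cycle earlier leaves
headroom so promotions during the cycle are less likely to end in the
concurrent mode failure shown here.]

============================================
// Sketch: back-of-the-envelope math for CMSInitiatingOccupancyFraction.
public class CmsInitiatingMath {
    public static void main(String[] args) {
        long oldGenKB = 8601600;  // old gen capacity from the logs above
        int fraction = 75;        // -XX:CMSInitiatingOccupancyFraction=75
        long threshold = oldGenKB * fraction / 100; // 6451200K
        System.out.println("CMS cycle would start above ~" + threshold
                + "K old-gen occupancy");
    }
}
============================================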
>>>
>>>
>>> promotion failed, scavenge failed, concurrent mode failure
>>> 86833.016: [GC 86833.017: [ParNew: 4769273K->453398K(5017600K),
>>> 0.1316717 secs] 12418197K->8169164K(13619200K), 0.1319220 secs]
>>> [Times: user=1.22 sys=0.03, real=0.13 secs]
>>> [Times: user=0.14 sys=0.00, real=0.15 secs]
>>> 2010-03-25T15:30:37.688-0400: 86833.295: [CMS-concurrent-mark-start]
>>> 86834.751: [GC 86834.751: [ParNew: 4754198K->513298K(5017600K),
>>> 0.1250485 secs] 12469964K->8281014K(13619200K), 0.1252553 secs]
>>> [Times: user=1.38 sys=0.01, real=0.13 secs]
>>> 2010-03-25T15:30:39.310-0400: 86834.917: [CMS-concurrent-mark:
>>> 1.453/1.621 secs] [Times: user=21.57 sys=1.15, real=1.62 secs]
>>> 2010-03-25T15:30:39.310-0400: 86834.917: [CMS-concurrent-preclean-start]
>>> 2010-03-25T15:30:39.650-0400: 86835.258: [CMS-concurrent-preclean:
>>> 0.337/0.341 secs] [Times: user=5.30 sys=0.18, real=0.34 secs]
>>> 2010-03-25T15:30:39.651-0400: 86835.258:
>>> [CMS-concurrent-abortable-preclean-start]
>>> 2010-03-25T15:30:39.864-0400: 86835.471:
>>> [CMS-concurrent-abortable-preclean: 0.211/0.214 secs] [Times:
>>> user=3.16 sys=0.19, real=0.21 secs]
>>> [YG occupancy: 3329361 K (5017600 K)]86835.500: [Rescan (parallel) ,
>>> 0.3868448 secs]86835.887: [weak refs processing, 0.0030042
>>> secs]86835.890: [class unloading, 0.0250008 secs]86835.915: [scrub
>>> symbol & string tables, 0.0904210 secs] [Times: user=1.85 sys=0.29,
>>> real=0.51 secs]
>>> 2010-03-25T15:30:40.401-0400: 86836.008: [CMS-concurrent-sweep-start]
>>> 86836.421: [GC 86836.422: [ParNew: 4814154K->680591K(5017600K),
>>> 0.2031305 secs] 12581870K->8543701K(13619200K), 0.2033332 secs]
>>> [Times: user=1.88 sys=0.04, real=0.20 secs]
>>> 86836.627: [GC 86836.627: [ParNew (promotion failed):
>>> 720747K->511306K(5017600K), 1.3076955 secs]
>>> 8583857K->8560580K(13619200K), 1.3078889 secs] [Times: user=2.66
>>> sys=0.78, real=1.31 secs]
>>> GC locker: Trying a full collection because scavenge failed
>>> 86837.935: [Full GC 86837.935: [CMS2010-03-25T15:30:46.850-0400:
>>> 86842.457: [CMS-concurrent-sweep: 4.926/6.449 secs] [Times:
>>> user=15.24 sys=1.19, real=6.45 secs]
>>> (concurrent mode failure): 8049273K->1356962K(8601600K), 9.6514031
>>> secs] 8560580K->1356962K(13619200K), [CMS Perm :
>>> 670523K->670523K(819200K)], 9.6515260 secs] [Times: user=9.65
>>> sys=0.00, real=9.65 secs]
>>> 86848.669: [GC 86848.669: [ParNew: 4301133K->201781K(5017600K),
>>> 0.0452702 secs] 5658095K->1558743K(13619200K), 0.0454738 secs]
>>> [Times: user=0.57 sys=0.00, real=0.05 secs]
>>>
>>> 11) - So here our scavenge failed, - this is what gave us the "Full
>>> GC" log message -- the concurrent mode failure
>>> was really just a coincidence/timing?  The full gc (triggered by
>>> the promotion failure) aborts the tenured CMS collection,
>>> does it not?
>>
>> The "GC locker" message says that after a JNI critical section was
>> exited the GC wanted to
>> do a scavenge but did not think there was enough room in the old gen
>> so it does a full
>> STW compaction.  Because a CMS concurrent collection was in progress,
>> it is aborted
>> and that abortion causes the concurrent mode failure to be printed.
>> Not a coincidence.
>> Just telling you that the CMS concurrent collection could not be
>> completed for some
>> reason.
>>
>>
>>>
>>>
>>> thanks,
>>> Shaun
>>>
>>>
>>>
>>> ------------------------------------------------------------------------
>>>
>>> _______________________________________________
>>> hotspot-gc-use mailing list
>>> hotspot-gc-use at openjdk.java.net
>>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
>>>
>>
>