From y.s.ramakrishna at oracle.com Tue Oct 5 23:06:50 2010
From: y.s.ramakrishna at oracle.com (Y. Srinivas Ramakrishna)
Date: Tue, 05 Oct 2010 23:06:50 -0700
Subject: HandlePromotionFailure flag?
Message-ID: <4CAC11FA.70203@oracle.com>

Is there anyone on this alias using 6.0 or later who explicitly turns off the above flag in a production setting, i.e. uses -XX:-HandlePromotionFailure. Let me know if you do so, along with the reason why you chose to do so.

I am considering removing the ability to disable this flag, at least when using CMS (if not more broadly).

thanks!
-- ramki

From shaun.shen at oracle.com Wed Oct 6 12:04:04 2010
From: shaun.shen at oracle.com (shaun.shen at oracle.com)
Date: Wed, 6 Oct 2010 12:04:04 -0700 (PDT)
Subject: Auto Reply: hotspot-gc-use Digest, Vol 32, Issue 1
Message-ID: <09759dad-80c0-4e2d-a1e6-f8cbbe09325b@default>

Thank you for your message. I am on training from 5 to 8 Oct and can't reply you right now.
- For MCS, pls contact Attaporn (attaporn.thongkiatcharoen at oracle.com, +65 93575992)
- Or call me +65 9878 6375 if needed.
Cheers, Shaun

From higuava at gmail.com Thu Oct 14 10:42:47 2010
From: higuava at gmail.com (Hi Guava)
Date: Thu, 14 Oct 2010 13:42:47 -0400
Subject: Different Full GCs?
Message-ID:

I've seen different full GC messages and I don't quite understand them:

13011:1474.283: [Full GC 1474.283: [CMS: 3333048K->1021496K(12540352K), 6.3444520 secs] 3363388K->1021496K(12582848K), [CMS Perm : 30528K->30483K(30656K)], 6.3447880 secs]
34822:4101.808: [Full GC 4101.808: [ParNew: 667K->0K(42496K), 0.0210030 secs] 5953849K->5953209K(12582848K), 0.0211110 secs]
51586:25535.616: [Full GC 25535.616: [ParNew: 750K->0K(42496K), 0.0324350 secs] 5341677K->5340939K(12582848K), 0.0326130 secs]
63486:26306.646: [Full GC 26306.646: [CMS[Unloading class sun.reflect.GeneratedConstructorAccessor20]

Both GC #1 and #4 are triggered by System.gc() in our code. I believe they are the same type.
There was less memory available during #4 so it unloaded classes (soft reference?).
But full gc is the stop-the-world gc. Why does it mention CMS in the message?

GC #2 and #3 look weird to me. They were not triggered by System.gc(). They are always very short and the duration is about the same as young generation GCs. In fact, the message is exactly like young generation GCs except for the extra word "Full". What are these short full gcs? Are there different levels of full GCs?

I spent some time searching for an answer but I am still confused. Can somebody help explain and suggest some reading materials?

Thanks!

The environment:
Java HotSpot(TM) 64-Bit Server VM Version 1.5.0_19-b02
Linux Version 2.6.9-89.0.20.ELsmp
amd64

-Xms12g
-Xmx12g
-XX:+PrintGC
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-XX:+UseConcMarkSweepGC

From y.s.ramakrishna at oracle.com Thu Oct 14 11:20:20 2010
From: y.s.ramakrishna at oracle.com (Y. S. Ramakrishna)
Date: Thu, 14 Oct 2010 11:20:20 -0700
Subject: Different Full GCs?
In-Reply-To:
References:
Message-ID: <4CB749E4.2000909@oracle.com>

Hello --

On 10/14/10 10:42, Hi Guava wrote:
> I've seen different full GC messages and I don't quite understand them:
> 13011:1474.283: [Full GC 1474.283: [CMS: 3333048K->1021496K(12540352K), 6.3444520 secs] 3363388K->1021496K(12582848K), [CMS Perm : 30528K->30483K(30656K)], 6.3447880 secs]

This is a full gc; it collects the whole heap.

> 34822:4101.808: [Full GC 4101.808: [ParNew: 667K->0K(42496K), 0.0210030 secs] 5953849K->5953209K(12582848K), 0.0211110 secs]
> 51586:25535.616: [Full GC 25535.616: [ParNew: 750K->0K(42496K), 0.0324350 secs] 5341677K->5340939K(12582848K), 0.0326130 secs]

These two are not full gcs. They are mislabelled. They are likely scavenges forced by the allocation policy interacting with JNI critical sections preventing a scavenge attempt made previously. I think the labelling has been fixed in 6uXX.
> 63486:26306.646: [Full GC 26306.646: [CMS[Unloading class sun.reflect.GeneratedConstructorAccessor20]

Yes, this is also a full gc.

In 6uXX, the first and the last would be labelled with an additional "System.gc()" label, and the two middle ones would not say "Full". I don't have a bug id handy to point you to, but I might be able to dig one up after some archeology.

> Both GC #1 and #4 are triggered by System.gc() in our code. I believe
> they are the same type. There was less memory available during #4 so
> it unloaded classes (soft reference?).
> But full gc is the stop-the-world gc. Why does it mention CMS in the message?

You are right that the "CMS" is misleading in that sense. The idea was that it collects the old generation, which is typically collected by CMS. I agree that the CMS label is misleading and probably should be fixed; it's a consequence of our internal naming scheme for generation "types".

> GC #2 and #3 look weird to me. They were not triggered by System.gc().
> They are always very short and the duration is about the same as young
> generation GCs. In fact, the message is exactly like young generation
> GCs except for the extra word "Full". What are these short full gcs? Are
> there different levels of full GCs?

No, and you are right that these are just scavenges. What must have happened is that your application probably does a few short-lived JNI critical sections (JNI_Get{Array,String}Critical), which happen around the time when another thread wants to do a large allocation that will not fit in the current space available in Eden, so the allocator attempts to do a scavenge, but is prevented from doing so because of the JNI critical section. This is remembered, and when the critical section is exited, a scavenge is forced. At least that's my guess based on the messages above. (BTW, what's the size of your Eden or Young Gen? The policy should probably be a little smarter and not do those scavenges until an allocation request (would) fail.)
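The distinction drawn above (a record labelled "Full GC" whose only collector is ParNew is really a young-generation scavenge) can be checked mechanically when scanning a long JDK 5 log. The following is a minimal Java sketch, assuming the exact record layout quoted in this thread; the class FullGcClassifier and its names are hypothetical and not part of any JDK tool.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not part of any JDK tool): sorts JDK 5 CMS log records like
 * the ones quoted above into real full collections and young-generation scavenges
 * that were printed with a misleading "Full GC" prefix.
 */
class FullGcClassifier {

    enum Kind { FULL_COLLECTION, MISLABELLED_SCAVENGE, OTHER }

    // Captures the first collector named after "Full GC <timestamp>: [", e.g. "ParNew" or "CMS".
    private static final Pattern FIRST_COLLECTOR =
            Pattern.compile("\\[Full GC [0-9.]+: \\[(\\w+)");

    static Kind classify(String logLine) {
        Matcher m = FIRST_COLLECTOR.matcher(logLine);
        if (!m.find()) {
            return Kind.OTHER;  // not a "Full GC" record at all
        }
        // A record whose only collector is ParNew touched just the young generation,
        // so it is treated as a forced scavenge rather than a full collection.
        return m.group(1).equals("ParNew") ? Kind.MISLABELLED_SCAVENGE : Kind.FULL_COLLECTION;
    }

    public static void main(String[] args) {
        String[] lines = {
            "13011:1474.283: [Full GC 1474.283: [CMS: 3333048K->1021496K(12540352K), 6.3444520 secs]",
            "34822:4101.808: [Full GC 4101.808: [ParNew: 667K->0K(42496K), 0.0210030 secs]",
            "63486:26306.646: [Full GC 26306.646: [CMS[Unloading class sun.reflect.GeneratedConstructorAccessor20]"
        };
        for (String line : lines) {
            System.out.println(classify(line) + "  <-  " + line);
        }
    }
}
```

Later JVMs that no longer print the misleading label would not need this; it is only a reading aid for logs like the ones above.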
> I spent some time searching for an answer but I am still confused. Can
> somebody help explain and suggest some reading materials?

Hope that helps a bit. Try JDK 7 or 6u21 (or whatever is the latest) and see if the confusing messages are gone. If they are still there, let us know.

thanks.
-- ramki

From adamh at basis.com Thu Oct 14 11:31:23 2010
From: adamh at basis.com (Adam Hawthorne)
Date: Thu, 14 Oct 2010 14:31:23 -0400
Subject: CMS initial mark pauses
Message-ID:

Hi all,

I'm seeing a customer running CMS with some fairly long initial mark pauses, especially relative to all the other pauses. The machine is (I believe) an Itanium 6-way running HP-UX, with HP's version of HotSpot, version 1.6.0.06.

What I'm seeing is that all the remark and young gc pauses are less than 500ms, and very consistent. There are a lot of initial mark pauses that also fall in this range. That's our target, and things look really good. The problem is that there are occasional pauses of up to 1.5s, which is unacceptable for the customer. Does anyone have any ideas what can cause (seemingly random) long initial mark pauses?
Here's an example of one of the long mark pauses:

45946.930: [GC [1 CMS-initial-mark: 957254K(1598236K)] 1473999K(2151196K), 1.3505680 secs] [Times: user=1.34 sys=0.00, real=1.35 secs]

Here's an example of one of the more typical pauses:

45954.362: [GC [1 CMS-initial-mark: 963824K(1598236K)] 1001370K(2151196K), 0.1579016 secs] [Times: user=0.16 sys=0.00, real=0.16 secs]

From my understanding, initial mark pauses are supposed to be relatively short, and usually shorter than remark pauses, but I don't have a remark pause greater than 300ms.

Any help is appreciated. Thanks,

Adam

--
Adam Hawthorne
Software Engineer
BASIS International Ltd.
www.basis.com
+1.505.345.5232 Phone

From y.s.ramakrishna at oracle.com Thu Oct 14 11:56:31 2010
From: y.s.ramakrishna at oracle.com (Y. S. Ramakrishna)
Date: Thu, 14 Oct 2010 11:56:31 -0700
Subject: CMS initial mark pauses
In-Reply-To:
References:
Message-ID: <4CB7525F.9070403@oracle.com>

Hi Adam --

Do you have a fuller GC log (perhaps including PrintCMSStatistics=2) to help make a sharper diagnosis? It could be:

6412968 CMS: Long initial mark pauses

which we have unfortunately not gotten to addressing yet: CMS initial mark work is (still) done single-threaded. Usually there is little such work, so we are usually fine, but if survivor spaces are large and full and/or if the CMS triggering occupancy is such that CMS runs frequently, then you can be affected by long serial initial mark pauses because the work is non-trivial. (CMS-remark and scavenges, on the other hand, are done by several worker threads working in parallel.)
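One rough way to see a single-threaded initial mark in the log itself is to compare the CPU time in the [Times: ...] suffix with the elapsed (real) time: if (user + sys) / real stays close to 1.0 during a pause on a multi-core machine, roughly one thread was doing the work, whereas a well-parallelized phase would tend to show a clearly larger ratio. This is a heuristic, not a diagnosis. The sketch below assumes the "[Times: user=... sys=..., real=... secs]" format shown in Adam's records; PauseParallelism is a hypothetical name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: estimates how many threads were busy during a logged pause. */
class PauseParallelism {

    private static final Pattern TIMES =
            Pattern.compile("\\[Times: user=([0-9.]+) sys=([0-9.]+), real=([0-9.]+) secs\\]");

    /** Returns (user + sys) / real, or NaN if the record has no usable [Times: ...] block. */
    static double cpuPerWallSecond(String logLine) {
        Matcher m = TIMES.matcher(logLine);
        if (!m.find()) {
            return Double.NaN;
        }
        double user = Double.parseDouble(m.group(1));
        double sys = Double.parseDouble(m.group(2));
        double real = Double.parseDouble(m.group(3));
        return real > 0 ? (user + sys) / real : Double.NaN;  // avoid dividing by a 0.00 real time
    }

    public static void main(String[] args) {
        String longPause = "45946.930: [GC [1 CMS-initial-mark: 957254K(1598236K)] "
                + "1473999K(2151196K), 1.3505680 secs] [Times: user=1.34 sys=0.00, real=1.35 secs]";
        // A value near 1.0 is consistent with one thread doing all of the initial-mark work.
        System.out.printf("initial-mark CPU/wall ratio: %.2f%n", cpuPerWallSecond(longPause));
    }
}
```

Both initial-mark records quoted above have user time almost equal to real time, which fits the single-threaded explanation.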
Here's an excerpt from the "workaround" section of that bug (reproduced here because I cannot seem to get bugs.sun.com to display it):

> This is not really a viable workaround since it might lead to suboptimal
> heap configuration:
> (1) use no survivor spaces (at the risk of larger scavenge pauses, larger remark pauses,
> even concurrent mode failures)
> (2) use a sufficiently large heap so as to be able to afford to set a
> mark initiation threshold above the low water-mark (after a major
> collection cycle). This will keep init-marks riding on the coat-tails
> of scavenges.
> *** (#1 of 2): 2006-04-13 09:53:14 PDT xxxx at oracle.com
>
> Also, if using iCMS (Incremental CMS), drop the Incremental mode and revert to
> vanilla CMS.
> *** (#2 of 2): 2010-04-14 11:02:03 PDT xxxx at oracle.com

If you have support, you can try escalating it via your support channels to get this addressed, especially if the workaround/retuning doesn't do the job.

-- ramki

From higuava at gmail.com Thu Oct 14 13:14:02 2010
From: higuava at gmail.com (Hi Guava)
Date: Thu, 14 Oct 2010 16:14:02 -0400
Subject: Different Full GCs?
In-Reply-To: <4CB749E4.2000909@oracle.com>
References: <4CB749E4.2000909@oracle.com>
Message-ID:

Thanks for your response. It helped a lot. I think you are right about the mislabeled scavenges, since I don't see them under 6uXX. I also noticed an additional (System) label in the full gc messages in 6uXX.

You asked about the Eden or Young Gen size. I can only try to answer from the logs, which are from our customer's production system. I think the Young Gen size is 42496K because of "[ParNew: 750K->0K(42496K)" in the log. I think the Eden size is 42496K / 8 * 6 = 31872K, since -XX:SurvivorRatio is not set and its default is 6.

Our application usually uses a heap size between 8G and 64G. It creates a large number of short-lived objects in bursts. Should we use a large Young generation size because of this?
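The Eden estimate in the message above follows from the definition of -XX:SurvivorRatio: Eden is SurvivorRatio times the size of one survivor space, so Eden plus two survivor spaces make SurvivorRatio + 2 equal units. The Java sketch below (the class name is hypothetical) reproduces that arithmetic under the message's own assumptions, namely that 42496K is the whole young generation and that SurvivorRatio is 6. Both are worth verifying: in HotSpot's young-generation logging the capacity figure typically covers Eden plus only one survivor space, in which case the divisor would be SurvivorRatio + 1 instead.

```java
/**
 * Hypothetical helper reproducing the young-generation arithmetic. With
 * -XX:SurvivorRatio=r, Eden is r times one survivor space, so Eden plus two
 * survivor spaces is split into r + 2 equal units.
 */
class YoungGenSplit {

    /** Size of one survivor space in KB, treating youngKb as Eden plus both survivors. */
    static long survivorSpaceKb(long youngKb, int survivorRatio) {
        return youngKb / (survivorRatio + 2);  // integer division truncates any remainder
    }

    /** Eden size in KB under the same assumption. */
    static long edenKb(long youngKb, int survivorRatio) {
        return survivorSpaceKb(youngKb, survivorRatio) * survivorRatio;
    }

    public static void main(String[] args) {
        long young = 42496;  // capacity taken from "[ParNew: 750K->0K(42496K)"
        int ratio = 6;       // the SurvivorRatio value assumed in the message
        System.out.println("eden = " + edenKb(young, ratio) + "K, each survivor = "
                + survivorSpaceKb(young, ratio) + "K");
    }
}
```

With these inputs it prints eden = 31872K, each survivor = 5312K, matching the figure in the message.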
Do you have any recommendation for the Young/Tenured ratio and the Eden/Survivor ratio?

Thanks.

From y.s.ramakrishna at oracle.com Thu Oct 14 13:17:29 2010
From: y.s.ramakrishna at oracle.com (Y. S. Ramakrishna)
Date: Thu, 14 Oct 2010 13:17:29 -0700
Subject: CMS initial mark pauses
In-Reply-To: <4CB7525F.9070403@oracle.com>
References: <4CB7525F.9070403@oracle.com>
Message-ID: <4CB76559.6010708@oracle.com>

Just realized that this is on HP-UX/Itanium on a JVM built by HP. You'll of course need to go to HP to have that addressed; sorry for not reading your email carefully enough to note the platform information that you had included before writing my response below.

-- ramki

On 10/14/10 11:56, Y. S. Ramakrishna wrote:
> Hi Adam --
> ...

From y.s.ramakrishna at oracle.com Thu Oct 14 16:00:30 2010
From: y.s.ramakrishna at oracle.com (Y. S. Ramakrishna)
Date: Thu, 14 Oct 2010 16:00:30 -0700
Subject: CMS initial mark pauses
In-Reply-To:
References: <4CB7525F.9070403@oracle.com>
Message-ID: <4CB78B8E.5050700@oracle.com>

Hi Adam --

...

> I understood before that "initial" is not done in parallel. I'm curious
> - why not?

When it was first implemented, CMS did all its work single-threaded over a serial scavenger. It was incrementally parallelized over time, but because initial-mark pauses were usually not a concern (small Edens, small survivor spaces, initial mark immediately following a scavenge), it never rose high enough in priority to parallelize. Clearly we have reached a point where the old assumptions no longer hold and it's time to parallelize it. Or, better still, to move to G1, which is fully parallel and concurrent, and has other advantages as well.

> I have CMSInitiatingOccupancyFraction=50 because I was concerned about
> some finalization issues in our application, and I thought I remembered
> reference processing wasn't done in young GC's. After enabling
> PrintReferenceGC, the logs imply ParNewGC also clears references - is
> that true? If so, it may not be necessary for us to include that option
> anyway.

Yes, scavenges do process unreachable Reference objects found in the young gen. However, once these get into the old gen, you are right that you will need a CMS cycle to identify them as unreachable and to process them appropriately.
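Workaround (2) is about where the CMS initiating threshold sits relative to the old generation's low water-mark. The initial-mark records Adam posted show the old generation at roughly 60% of its 1598236K capacity, so a CMSInitiatingOccupancyFraction of 50 may already be below the occupancy that survives a CMS cycle (an assumption: those records are taken at initial mark, not right after a sweep). In that case CMS would tend to start a new cycle almost as soon as the previous one ends, and initial marks would land at arbitrary points between scavenges instead of riding on their coat-tails. A small Java sketch of the arithmetic, with a hypothetical class name:

```java
/**
 * Hypothetical check: compares old-generation occupancy taken from CMS-initial-mark
 * records with a CMSInitiatingOccupancyFraction setting.
 */
class InitiatingThresholdCheck {

    /** Percentage of the old generation in use, e.g. from "957254K(1598236K)". */
    static double occupancyPercent(long usedKb, long capacityKb) {
        return 100.0 * usedKb / capacityKb;
    }

    public static void main(String[] args) {
        int initiatingFraction = 50;   // -XX:CMSInitiatingOccupancyFraction=50 from the thread
        long capacityKb = 1598236;     // old-generation capacity in both posted records
        long[] usedAtInitialMarkKb = {957254, 963824};

        for (long used : usedAtInitialMarkKb) {
            double pct = occupancyPercent(used, capacityKb);
            System.out.printf("old gen %.1f%% full vs. threshold %d%% -> %s%n", pct, initiatingFraction,
                    pct > initiatingFraction ? "already past the trigger" : "below the trigger");
        }
    }
}
```

For the two posted records this reports about 59.9% and 60.3%, both above the 50% trigger.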
> > Here's an excerpt from the "workaround" section of that bug
> > (reproduced here because I cannot seem to get bugs.sun.com
> > to display it):
> >
> > This is not really a viable workaround since it might lead to
> > suboptimal heap configuration:
> > (1) use no survivor spaces (at the risk of larger scavenge
> > pauses, larger remark pauses, even concurrent mode failures)
> > (2) use a sufficiently large heap so as to be able to afford to
> > set a mark initiation threshold above the low water-mark (after a major
> > collection cycle). This will keep init-marks riding on the
> > coat-tails of scavenges.
> > *** (#1 of 2): 2006-04-13 09:53:14 PDT xxxx at oracle.com
>
> The customer's application appears to fit neatly in a 2.4G heap, and we
> have -Xmx4g, so I believe we might be able to apply (2) here. Is (1)
> above required along with (2), or do these workarounds address the
> problem independently? I ask because (a) this customer is already
> concerned about pause times, so I don't have a lot of room to increase
> remark and scavenge times, and (b) I'm concerned about eliminating
> survivor spaces since we've dealt with significant heap fragmentation in
> the past.

Precisely. The two are actually additive, but either by itself may not be sufficient, and as you pointed out (1) may not even be always feasible.

> One other data point is that we have a large number of mostly idle
> threads (3826 at one count), with most of the idle threads holding onto
> approximately 2MB of object data. I don't know if that would
> significantly contribute to the initial mark pause, but my intuition is
> that it would increase the time if some of that time is spent marking
> the stack locals.

Yes, that could be, but probably less significant than a large Eden or survivor space, given that when the CMS initial-mark pauses come immediately after a scavenge, the pauses are much shorter, so the larger contribution is from the large Eden.
If you pour your GC logs into GCHisto, you should probably see that the CMS initial-mark pauses increase as the most recent scavenge becomes more distant (or you could plot that via a spreadsheet and note that relationship).

-- ramki

> > Also, if using iCMS (Incremental CMS), drop the Incremental
> > mode and revert to vanilla CMS.
> > *** (#2 of 2): 2010-04-14 11:02:03 PDT xxxx at oracle.com
> >
> > If you have support, you can try escalating it via your support channels
> > to get this addressed, especially if the workaround/retuning doesn't
> > do the job.
> >
> > -- ramki
>
> My option seems to be to eliminate the CMSInitiatingOccupancyFraction=50
> and keep the -Xmx4g. Would it be prudent to set -Xms4g also?
>
> And the log excerpt from a steady-state in the application. The sigma
> on pause times for young gc and remark is 17ms and 26ms - they're like
> clockwork. The initial mark is higher, 334ms due to the large-valued
> outliers.
>
> ...

From adamh at basis.com Fri Oct 15 11:49:44 2010
From: adamh at basis.com (Adam Hawthorne)
Date: Fri, 15 Oct 2010 14:49:44 -0400
Subject: CMS initial mark pauses
In-Reply-To: <4CB78B8E.5050700@oracle.com>
References: <4CB7525F.9070403@oracle.com> <4CB78B8E.5050700@oracle.com>
Message-ID:

On Thu, Oct 14, 2010 at 19:00, Y. S. Ramakrishna wrote:
>
> Hi Adam --
>
> ...
>
>> I understood before that "initial" is not done in parallel. I'm curious -
>> why not?
>
> When it was first implemented CMS did all its work single-threaded
> over a serial scavenger. It was incrementally parallelized over time
> but because initial-mark pauses were usually not a concern (small edens,
> small survivor spaces, initial mark immediately following a scavenge)
> it never rose high enough in priority to parallelize. Clearly we have
> reached a point where the old assumptions no longer hold and it's
> time to parallelize it. Or better still to move to G1 which is fully
> parallel and concurrent, and has other advantages as well.
Thanks for the history lesson! We did mention G1 to our customer yesterday, but I'm not yet familiar enough with its tuning knobs to be confident suggesting it for a production system. We've only done minimal testing in-house, and not yet on the scale of this customer.

More generally, for ParGC and CMS, our heuristic has been to set the heap size, configure the new size, and then, if necessary, configure survivor spaces and maybe some other knobs to fulfill our customer requirements. I don't know what the equivalent settings are for G1. I'm curious whether there's a similar "recipe" for getting it configured and tuned. When we tried earlier, we didn't have much success with it. Can anyone who's spent significant time tuning it relate their experiences? Is it worth trying on 2-4 core systems with 1-4g of RAM?

>> I have CMSInitiatingOccupancyFraction=50 because I was concerned about
>> some finalization issues in our application, and I thought I remembered
>> reference processing wasn't done in young GC's. After enabling
>> PrintReferenceGC, the logs imply ParNewGC also clears references - is that
>> true? If so, it may not be necessary for us to include that option anyway.
>
> Yes, scavenges do process unreachable Reference objects found in the young
> gen. However, once these get into the old gen, you are right that you will
> need a CMS cycle to identify them as unreachable and to process them
> appropriately.

Thanks for the confirmation.

>> (1) use no survivor spaces (at the risk of larger scavenge
>> pauses, larger remark pauses, even concurrent mode failures)
>> (2) use a sufficiently large heap so as to be able to afford to
>> set a mark initiation threshold above the low water-mark (after a
>> major collection cycle). This will keep init-marks riding on the
>> coat-tails of scavenges.
>>
>> The customer's application appears to fit neatly in a 2.4G heap, and we
>> have -Xmx4g, so I believe we might be able to apply (2) here.
>> Is (1) above
>> required along with (2), or do these workarounds address the problem
>> independently? I ask because (a) this customer is already concerned about
>> pause times, so I don't have a lot of room to increase remark and scavenge
>> times, and (b) I'm concerned about eliminating survivor spaces since we've
>> dealt with significant heap fragmentation in the past.
>
> Precisely. The two are actually additive, but either by itself may not
> be sufficient, and as you pointed out (1) may not even be always feasible.

I reduced the survivor spaces in my recommendation for today but did not completely eliminate them, and I increased the old gen size. Unfortunately, the customer made a mistake in the settings that disabled -XX:+PrintGCDetails, so they failed to get new logs. They reported that their user experience was slightly worse, but without logs I can't determine whether the GCs are the problem or something else.

>> One other data point is that we have a large number of mostly idle threads
>> (3826 at one count), with most of the idle threads holding onto
>> approximately 2MB of object data. I don't know if that would significantly
>> contribute to the initial mark pause, but my intuition is that it would
>> increase the time if some of that time is spent marking the stack locals.
>
> Yes, that could be, but probably less significant than a large Eden or
> survivor space, given that when the CMS initial-mark pauses come immediately
> after a scavenge, the pauses are much shorter, so the larger contribution is
> from the large Eden. If you pour your GC logs into GCHisto, you
> should probably see that the CMS initial-mark pauses increase as
> the most recent scavenge becomes more distant (or you could plot that
> via a spreadsheet and note that relationship).

OK, I checked it in GCHisto and you were exactly right. This was immediately obvious. Thanks for your help again.
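The relationship ramki predicted can also be extracted without GCHisto by pairing each CMS-initial-mark record with the most recent ParNew record before it. The Java sketch below assumes JDK 6 logs written with -XX:+PrintGCDetails -XX:+PrintGCTimeStamps, in the single-line record shapes quoted in this thread; the class name and the two-column output are hypothetical choices, and the rows can be pasted into a spreadsheet for plotting.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical analysis helper: for every CMS-initial-mark record, reports how long
 * it came after the most recent ParNew scavenge and how long the pause was.
 */
class InitialMarkVsScavenge {

    // Leading "<seconds>: [GC" timestamp printed by -XX:+PrintGCTimeStamps.
    private static final Pattern TIMESTAMP = Pattern.compile("(\\d+\\.\\d+): \\[GC");
    // Pause length in e.g. "CMS-initial-mark: 957254K(1598236K)] 1473999K(2151196K), 1.3505680 secs".
    private static final Pattern INITIAL_MARK_PAUSE =
            Pattern.compile("CMS-initial-mark: [^\\]]*\\][^,]*, ([0-9.]+) secs");

    /** Each row is {seconds since the last scavenge started, initial-mark pause in seconds}. */
    static List<double[]> pair(List<String> logLines) {
        List<double[]> rows = new ArrayList<>();
        double lastScavenge = Double.NaN;  // no scavenge seen yet
        for (String line : logLines) {
            Matcher ts = TIMESTAMP.matcher(line);
            if (!ts.find()) {
                continue;  // not a timestamped GC record
            }
            double when = Double.parseDouble(ts.group(1));
            if (line.contains("[ParNew")) {
                lastScavenge = when;
            } else {
                Matcher im = INITIAL_MARK_PAUSE.matcher(line);
                if (im.find() && !Double.isNaN(lastScavenge)) {
                    rows.add(new double[] {when - lastScavenge, Double.parseDouble(im.group(1))});
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to a GC log written with -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
        for (double[] row : pair(Files.readAllLines(Paths.get(args[0])))) {
            System.out.printf("%.3f s after scavenge -> %.3f s pause%n", row[0], row[1]);
        }
    }
}
```

Initial-mark records that appear before the first scavenge in the log are skipped, since there is no preceding scavenge to measure from.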
From adamh at basis.com Mon Oct 18 13:49:30 2010
From: adamh at basis.com (Adam Hawthorne)
Date: Mon, 18 Oct 2010 16:49:30 -0400
Subject: CMS initial mark pauses
In-Reply-To: <4CB78B8E.5050700@oracle.com>
References: <4CB7525F.9070403@oracle.com> <4CB78B8E.5050700@oracle.com>
Message-ID:

On Thu, Oct 14, 2010 at 19:00, Y. S. Ramakrishna wrote:
>
> Hi Adam --
>
> ...
>
>> I have CMSInitiatingOccupancyFraction=50 because I was concerned about
>> some finalization issues in our application, and I thought I remembered
>> reference processing wasn't done in young GC's. After enabling
>> PrintReferenceGC, the logs imply ParNewGC also clears references - is that
>> true? If so, it may not be necessary for us to include that option anyway.
>
> Yes, scavenges do process unreachable Reference objects found in the young
> gen. However, once these get into the old gen, you are right that you will
> need a CMS cycle to identify them as unreachable and to process them
> appropriately.
> > >> Here's an excerpt from the "workaround" section of that bug >> ... >> > The customer's application appears to fit neatly in a 2.4G heap, and we >> have -Xmx4g, so I believe we might be able to apply (2) here. Is (1) above >> required along with (2), or do these workarounds address the problem >> independently? I ask because (a) this customer is already concerned about >> pause times, so I don't have a lot of room to increase remark and scavenge >> times, and (b) I'm concerned about eliminating survivor spaces since we've >> dealt with significant heap fragmentation in the past. >> > > Precisely. The two are actually additive, but either by itself may not > be sufficient, and as you pointed out (1) may not even be always feasible. > ... > -- ramki > Just a follow-up: I removed the CMSInitiatingOccupancyFraction, and I tried to fulfill the spirit of the workaround by setting the SurvivorRatio to significantly limit the survivor space size. The customer mistyped one of the logging parameters, so I wasn't able to get the logs from that day, but the report was that performance suffered significantly. I looked back at the logs and discovered that every initial-mark that followed immediately after, or even modestly soon after, a young GC was well within the pause time requirement. Only those which came more than halfway to the next young generation collection were long, just as Ramki predicted. So all I did was remove the CMSInitiatingOccupancyFraction and set the heap size to 4g, and the system was reported to be working well today. I ran some tests with G1 today, but I'll post a separate thread about that. Thanks for the help. Adam -- Adam Hawthorne Software Engineer BASIS International Ltd. www.basis.com +1.505.345.5232 Phone -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20101018/14c7289b/attachment.html From adamh at basis.com Mon Oct 18 14:01:40 2010 From: adamh at basis.com (Adam Hawthorne) Date: Mon, 18 Oct 2010 17:01:40 -0400 Subject: G1 performance Message-ID: Hi all, Over the weekend, we created a test to try to reproduce our pause time issues I posted about last week so we could be more confident in our recommendation to the customer. While I had the machine provisioned, I ran our test with G1. I'm afraid the results were quite poor for our application. I have this machine for the next week, and I'll be trying out different test configurations, but I'd like to continue to test G1 while I have the machine available. Is there more information about tuning G1? Our test box is 64-bit Linux, with 6u22 installed. I tried two different configurations. -Xms4g -Xmx4g -XX:+UnlockExperimentalVMOptions -XX:+UseG1GC -XX:NewSize=600m -XX:MaxGCPauseMillis=400 -XX:GCPauseIntervalMillis=3000 -XX:MaxPermSize=128m -server -XX\:+PrintGCDetails -XX\:+PrintGCTimeStamps -Xloggc:gc.log WithNewSize.log When that produced many long Full GC's, I tried decreasing the pause interval and removed the NewSize setting: -Xms4g -Xmx4g -XX:+UnlockExperimentalVMOptions -XX:+UseG1GC -XX:MaxGCPauseMillis=400 -XX:GCPauseIntervalMillis=2000 -XX:MaxPermSize=128m -server -XX\:+PrintGCDetails -XX\:+PrintGCTimeStamps -Xloggc:gc.log NoNewSize.log The result was that there were a lot of Full GC's, each taking about 7 seconds. Young GC's performed well (except one of the first ones). Do I just need to reduce the pause interval, assuming the pause time requirement is fixed? In contrast, CMS was able to keep all pause times below 300ms with the same test, with about 25% GC overhead. I also tried various combinations of: -XX:+G1YoungGenSize=600m -XX:+G1ParallelRSetUpdatingEnabled -XX:+G1ParallelRSetScanningEnabled and the JVM would not start with any of these options.
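One way to check whether the running VM recognizes a given `-XX` flag name is to ask `HotSpotDiagnosticMXBean.getVMOption`. It throws `IllegalArgumentException` when no option of that name exists.

The sketch below assumes a current JDK. `FlagCheck` and `flagValue` are hypothetical names. `ManagementFactory.getPlatformMXBean(Class)` was only added in Java 7. On 6u22 the bean would instead come from `ManagementFactory.newPlatformMXBeanProxy` with the object name `com.sun.management:type=HotSpotDiagnostic`. Locked experimental or diagnostic flags may not be reported the same way as product flags, so treat a miss as a hint rather than proof.

Separately, HotSpot's `+`/`-` prefix is only for boolean flags. A flag that takes a value is written `-XX:Name=value`. A form like `-XX:+G1YoungGenSize=600m` would therefore be rejected even by a VM that knows the name.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;
import java.util.Optional;

public class FlagCheck {

    // Returns the current value of a -XX flag if the running VM recognizes the
    // name, or empty if getVMOption reports that no such option exists.
    static Optional<String> flagValue(String name) {
        HotSpotDiagnosticMXBean bean =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        try {
            VMOption opt = bean.getVMOption(name);
            return Optional.of(opt.getValue());
        } catch (IllegalArgumentException notFound) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Names are given without the "-XX:" prefix and without "+" or "-".
        String[] names = args.length > 0 ? args
            : new String[] {"MaxGCPauseMillis", "G1YoungGenSize"};
        for (String n : names) {
            System.out.println(n + " -> "
                + flagValue(n).orElse("(not recognized by this VM)"));
        }
    }
}
```

Running it with the exact JVM options used for the test shows which names that particular build accepts.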
Did the names change in a recent release? If so, can someone send the new options? It would also be helpful if the following document could be updated: http://www.oracle.com/technetwork/java/javase/tech/g1-intro-jsp-135488.html If anyone is interested, I can run more tests with more logging, and I can run the test again with other Java versions. Adam -- Adam Hawthorne Software Engineer BASIS International Ltd. www.basis.com +1.505.345.5232 Phone -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20101018/98d6bb00/attachment.html From higuava at gmail.com Mon Oct 25 17:32:48 2010 From: higuava at gmail.com (Hi Guava) Date: Mon, 25 Oct 2010 20:32:48 -0400 Subject: Long young generation GC? Message-ID: The third young generation GC took 439.2720750 secs but the user and real time are only 0.08 seconds. What does it mean? 72667.213: [GC 72667.213: [ParNew: 38336K->4224K(38336K), 1.2473840 secs] 3443948K->3420569K(6549376K), 1.2474290 secs] [Times: user=0.64 sys=1.10, real=1.25 secs] 72680.531: [GC 72680.532: [ParNew: 38336K->4221K(38336K), 0.2916570 secs] 3008948K->2979033K(6549376K), 0.2916710 secs] [Times: user=0.26 sys=0.03, real=0.29 secs] 72681.425: [GC 72681.426: [ParNew: 38325K->4224K(38336K), 439.2720750 secs] 3013053K->2979055K(6549376K), 439.2720750 secs] [Times: user=0.08 sys=0.00, real=0.08 secs] Environment: Java HotSpot(TM) 64-Bit Server VM Version 1.6.0_20-b02 Linux Version 2.6.18-128.1.1.el5 on amd64 -Xms6400m -Xmx6400m -Xss256k -XX:+UseConcMarkSweepGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+UseCompressedOops From y.s.ramakrishna at oracle.com Mon Oct 25 17:49:33 2010 From: y.s.ramakrishna at oracle.com (Y. Srinivas Ramakrishna) Date: Mon, 25 Oct 2010 17:49:33 -0700 Subject: Long young generation GC? 
In-Reply-To: References: Message-ID: <4CC6259D.5050303@oracle.com> On 10/25/2010 5:32 PM, Hi Guava wrote: > The third young generation GC took 439.2720750 secs but the user and > real time are only 0.08 seconds. What does it mean? The machine may be using NTP, and the time may have been changed? JVM timestamps on Linux seem still to be based on TOD rather than on TSC. Someone in the runtime team (cc'd) may have more detail on why that might still be so. -- ramki > > 72667.213: [GC 72667.213: [ParNew: 38336K->4224K(38336K), 1.2473840 secs] > 3443948K->3420569K(6549376K), 1.2474290 secs] [Times: user=0.64 sys=1.10, > real=1.25 secs] > 72680.531: [GC 72680.532: [ParNew: 38336K->4221K(38336K), 0.2916570 secs] > 3008948K->2979033K(6549376K), 0.2916710 secs] [Times: user=0.26 sys=0.03, > real=0.29 secs] > 72681.425: [GC 72681.426: [ParNew: 38325K->4224K(38336K), 439.2720750 > secs] 3013053K->2979055K(6549376K), 439.2720750 secs] [Times: user=0.08 > sys=0.00, real=0.08 secs] > > Environment: > Java HotSpot(TM) 64-Bit Server VM Version 1.6.0_20-b02 > Linux Version 2.6.18-128.1.1.el5 on amd64 > -Xms6400m > -Xmx6400m > -Xss256k > -XX:+UseConcMarkSweepGC > -XX:+PrintGCDetails > -XX:+PrintGCTimeStamps > -XX:+UseCompressedOops > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use From y.s.ramakrishna at oracle.com Mon Oct 25 17:51:57 2010 From: y.s.ramakrishna at oracle.com (Y. Srinivas Ramakrishna) Date: Mon, 25 Oct 2010 17:51:57 -0700 Subject: Long young generation GC? In-Reply-To: <4CC6259D.5050303@oracle.com> References: <4CC6259D.5050303@oracle.com> Message-ID: <4CC6262D.6050608@oracle.com> On 10/25/2010 5:49 PM, Y. Srinivas Ramakrishna wrote: > On 10/25/2010 5:32 PM, Hi Guava wrote: >> The third young generation GC took 439.2720750 secs but the user and >> real time are only 0.08 seconds. What does it mean? 
> > The machine may be using NTP, and the time may have been changed? Seems a rather large jump, so may not be NTP (which i am told uses adjtime() to slowly accelerate the time forward or decelerate it backward), but rather an abrupt perhaps manual change in TOD. Over to the experts.... > JVM timestamps on Linux seem still to be based on TOD rather than > on TSC. Someone in the runtime team (cc'd) may have more detail on > why that might still be so. > > -- ramki > > >> >> 72667.213: [GC 72667.213: [ParNew: 38336K->4224K(38336K), 1.2473840 secs] >> 3443948K->3420569K(6549376K), 1.2474290 secs] [Times: user=0.64 sys=1.10, >> real=1.25 secs] >> 72680.531: [GC 72680.532: [ParNew: 38336K->4221K(38336K), 0.2916570 secs] >> 3008948K->2979033K(6549376K), 0.2916710 secs] [Times: user=0.26 sys=0.03, >> real=0.29 secs] >> 72681.425: [GC 72681.426: [ParNew: 38325K->4224K(38336K), 439.2720750 >> secs] 3013053K->2979055K(6549376K), 439.2720750 secs] [Times: user=0.08 >> sys=0.00, real=0.08 secs] >> >> Environment: >> Java HotSpot(TM) 64-Bit Server VM Version 1.6.0_20-b02 >> Linux Version 2.6.18-128.1.1.el5 on amd64 >> -Xms6400m >> -Xmx6400m >> -Xss256k >> -XX:+UseConcMarkSweepGC >> -XX:+PrintGCDetails >> -XX:+PrintGCTimeStamps >> -XX:+UseCompressedOops >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > From higuava at gmail.com Tue Oct 26 07:07:04 2010 From: higuava at gmail.com (Hi Guava) Date: Tue, 26 Oct 2010 10:07:04 -0400 Subject: Long young generation GC? In-Reply-To: <4CC6262D.6050608@oracle.com> References: <4CC6259D.5050303@oracle.com> <4CC6262D.6050608@oracle.com> Message-ID: Here is additional information about the machine running the JVM. It is a virtual machine running in a private cloud. Could it be something like swapping that caused problem? On Mon, Oct 25, 2010 at 8:51 PM, Y. 
Srinivas Ramakrishna wrote: > On 10/25/2010 5:49 PM, Y. Srinivas Ramakrishna wrote: >> >> On 10/25/2010 5:32 PM, Hi Guava wrote: >>> >>> The third young generation GC took 439.2720750 secs but the user and >>> real time are only 0.08 seconds. What does it mean? >> >> The machine may be using NTP, and the time may have been changed? > > Seems a rather large jump, so may not be NTP (which i am told uses > adjtime() to slowly accelerate the time forward or decelerate it backward), > but rather an abrupt perhaps manual change in TOD. > > Over to the experts.... > >> JVM timestamps on Linux seem still to be based on TOD rather than >> on TSC. Someone in the runtime team (cc'd) may have more detail on >> why that might still be so. >> >> -- ramki >> >> >>> >>> 72667.213: [GC 72667.213: [ParNew: 38336K->4224K(38336K), 1.2473840 secs] >>> 3443948K->3420569K(6549376K), 1.2474290 secs] [Times: user=0.64 sys=1.10, >>> real=1.25 secs] >>> 72680.531: [GC 72680.532: [ParNew: 38336K->4221K(38336K), 0.2916570 secs] >>> 3008948K->2979033K(6549376K), 0.2916710 secs] [Times: user=0.26 sys=0.03, >>> real=0.29 secs] >>> 72681.425: [GC 72681.426: [ParNew: 38325K->4224K(38336K), 439.2720750 >>> secs] 3013053K->2979055K(6549376K), 439.2720750 secs] [Times: user=0.08 >>> sys=0.00, real=0.08 secs] >>> >>> Environment: >>> Java HotSpot(TM) 64-Bit Server VM Version 1.6.0_20-b02 >>> Linux Version 2.6.18-128.1.1.el5 on amd64 >>> -Xms6400m >>> -Xmx6400m >>> -Xss256k >>> -XX:+UseConcMarkSweepGC >>> -XX:+PrintGCDetails >>> -XX:+PrintGCTimeStamps >>> -XX:+UseCompressedOops >>> _______________________________________________ >>> hotspot-gc-use mailing list >>> hotspot-gc-use at openjdk.java.net >>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> > > From y.s.ramakrishna at oracle.com Tue Oct 26 09:34:40 2010 From: y.s.ramakrishna at oracle.com (Y. S. Ramakrishna) Date: Tue, 26 Oct 2010 09:34:40 -0700 Subject: Long young generation GC? 
In-Reply-To: References: <4CC6259D.5050303@oracle.com> <4CC6262D.6050608@oracle.com> Message-ID: <4CC70320.4080207@oracle.com> On 10/26/10 07:07, Hi Guava wrote: > Here is additional information about the machine running the JVM. It > is a virtual machine running in a private cloud. Could it be something > like swapping that caused problem? Not swapping, but perhaps the management of "time" in a virtualized setting (by that I mean that there may be interactions between the host/hypervisor and the guest OS that could cause the JVM to observe time jumps of this sort)? I'd suggest gathering more data on its reproducibility (or otherwise) in both a VM and non-VM setting. Over to the time experts in the runtime team who may have encountered issues in VM settings previously. (I have heard of occasional such reports in virtual settings before but don't know if any of these were definitively chased down.) You might also want to check with the VM provider to see if they might know of such issues. -- ramki > > On Mon, Oct 25, 2010 at 8:51 PM, Y. Srinivas Ramakrishna > wrote: >> On 10/25/2010 5:49 PM, Y. Srinivas Ramakrishna wrote: >>> On 10/25/2010 5:32 PM, Hi Guava wrote: >>>> The third young generation GC took 439.2720750 secs but the user and >>>> real time are only 0.08 seconds. What does it mean? >>> The machine may be using NTP, and the time may have been changed? >> Seems a rather large jump, so may not be NTP (which i am told uses >> adjtime() to slowly accelerate the time forward or decelerate it backward), >> but rather an abrupt perhaps manual change in TOD. >> >> Over to the experts.... >> >>> JVM timestamps on Linux seem still to be based on TOD rather than >>> on TSC. Someone in the runtime team (cc'd) may have more detail on >>> why that might still be so.
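The distinction between time-of-day and elapsed-time clocks can be probed from inside a Java process. This sketch assumes you want to check a (virtual) machine for clock steps. It times the same interval two ways:

- with `System.currentTimeMillis()`, which follows the adjustable time-of-day clock;
- with `System.nanoTime()`, which is intended only for measuring elapsed time.

This is not how HotSpot itself times GC pauses. `ClockSkewProbe` and its members are hypothetical names. A large drift only suggests, and does not prove, that the time-of-day clock moved while the interval was running.

```java
import java.util.concurrent.TimeUnit;

public class ClockSkewProbe {

    // Elapsed time of one interval, measured with two different clocks.
    record Interval(long wallMillis, long monotonicMillis) {
        // A large positive or negative value suggests the time-of-day clock
        // was stepped or slewed while the interval was being measured.
        long driftMillis() { return wallMillis - monotonicMillis; }
    }

    // Times 'work' with the time-of-day clock (which NTP, an administrator,
    // or possibly a hypervisor can move) and with System.nanoTime().
    static Interval measure(Runnable work) {
        long wall0 = System.currentTimeMillis();
        long mono0 = System.nanoTime();
        work.run();
        long mono1 = System.nanoTime();
        long wall1 = System.currentTimeMillis();
        return new Interval(wall1 - wall0,
                            TimeUnit.NANOSECONDS.toMillis(mono1 - mono0));
    }

    public static void main(String[] args) {
        Interval i = measure(() -> {
            try {
                Thread.sleep(200);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        System.out.println("wall=" + i.wallMillis() + " ms, monotonic="
            + i.monotonicMillis() + " ms, drift=" + i.driftMillis() + " ms");
    }
}
```

Running such a probe in a loop on both the affected virtual machine and a healthy one is one way to gather the reproducibility data suggested in the reply.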
>>> >>> -- ramki >>> >>> >>>> 72667.213: [GC 72667.213: [ParNew: 38336K->4224K(38336K), 1.2473840 secs] >>>> 3443948K->3420569K(6549376K), 1.2474290 secs] [Times: user=0.64 sys=1.10, >>>> real=1.25 secs] >>>> 72680.531: [GC 72680.532: [ParNew: 38336K->4221K(38336K), 0.2916570 secs] >>>> 3008948K->2979033K(6549376K), 0.2916710 secs] [Times: user=0.26 sys=0.03, >>>> real=0.29 secs] >>>> 72681.425: [GC 72681.426: [ParNew: 38325K->4224K(38336K), 439.2720750 >>>> secs] 3013053K->2979055K(6549376K), 439.2720750 secs] [Times: user=0.08 >>>> sys=0.00, real=0.08 secs] >>>> >>>> Environment: >>>> Java HotSpot(TM) 64-Bit Server VM Version 1.6.0_20-b02 >>>> Linux Version 2.6.18-128.1.1.el5 on amd64 >>>> -Xms6400m >>>> -Xmx6400m >>>> -Xss256k >>>> -XX:+UseConcMarkSweepGC >>>> -XX:+PrintGCDetails >>>> -XX:+PrintGCTimeStamps >>>> -XX:+UseCompressedOops >>>> _______________________________________________ >>>> hotspot-gc-use mailing list >>>> hotspot-gc-use at openjdk.java.net >>>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> From higuava at gmail.com Tue Oct 26 10:49:53 2010 From: higuava at gmail.com (Hi Guava) Date: Tue, 26 Oct 2010 13:49:53 -0400 Subject: Long young generation GC? In-Reply-To: <4CC70320.4080207@oracle.com> References: <4CC6259D.5050303@oracle.com> <4CC6262D.6050608@oracle.com> <4CC70320.4080207@oracle.com> Message-ID: I now believe that this phenomenon is caused by the virtual machine. It has nothing to do with the garbage collector or JVM. I searched in the old logs and found this in all 3 old logs that I have. There are multiple virtual machines configured the same way. This problem only shows up in one of the virtual machines. By the way, the 439 seconds GC is not a perception problem. It is real. The users reported a stuck process and they found the CPUs of the virtual machine were racing during that period. Can I understand this discrepancy this way? The user, sys and real times are measured in CPU cycles.
They are short as they are supposed to be. The 439.2720750 time is the elapsed time. Since the virtual machine is doing something else or not functioning correctly, GC took 439 seconds even though there was only 0.08 seconds of cpu time. 72681.425: [GC 72681.426: [ParNew: 38325K->4224K(38336K), 439.2720750 secs] 3013053K->2979055K(6549376K), 439.2720750 secs] [Times: user=0.08 sys=0.00, real=0.08 secs] On Tue, Oct 26, 2010 at 12:34 PM, Y. S. Ramakrishna wrote: > > > On 10/26/10 07:07, Hi Guava wrote: >> >> Here is additional information about the machine running the JVM. It >> is a virtual machine running in a private cloud. Could it be something >> like swapping that caused problem? > > Not swapping, but perhaps the management of "time" perhaps in a virtualized > setting (by that i mean that there may be interactions between the > host/hypervisor and the guest OS that could cause the JVM to observe > time jumps of this sort)? I'd suggest gathering more data on its > reproducibility (or otherwise) in both a VM and non-VM setting. > > Over to the time experts in the runtime team who may have encountered > issues in VM settings previously. (I have heard of occasional such reports > in > virtual settings before but don't know if any of these were definitively > chased > down.) You might also want to check with the VM provider to see if they > might know of such issues. > > -- ramki > > >> >> On Mon, Oct 25, 2010 at 8:51 PM, Y. Srinivas Ramakrishna >> wrote: >>> >>> On 10/25/2010 5:49 PM, Y. Srinivas Ramakrishna wrote: >>>> >>>> On 10/25/2010 5:32 PM, Hi Guava wrote: >>>>> >>>>> The third young generation GC took 439.2720750 secs but the user and >>>>> real time are only 0.08 seconds. What does it mean? >>>> >>>> The machine may be using NTP, and the time may have been changed? 
>>> >>> Seems a rather large jump, so may not be NTP (which i am told uses >>> adjtime() to slowly accelerate the time forward or decelerate it >>> backward), >>> but rather an abrupt perhaps manual change in TOD. >>> >>> Over to the experts.... >>> >>>> JVM timestamps on Linux seem still to be based on TOD rather than >>>> on TSC. Someone in the runtime team (cc'd) may have more detail on >>>> why that might still be so. >>>> >>>> -- ramki >>>> >>>> >>>>> 72667.213: [GC 72667.213: [ParNew: 38336K->4224K(38336K), 1.2473840 >>>>> secs] >>>>> 3443948K->3420569K(6549376K), 1.2474290 secs] [Times: user=0.64 >>>>> sys=1.10, >>>>> real=1.25 secs] >>>>> 72680.531: [GC 72680.532: [ParNew: 38336K->4221K(38336K), 0.2916570 >>>>> secs] >>>>> 3008948K->2979033K(6549376K), 0.2916710 secs] [Times: user=0.26 >>>>> sys=0.03, >>>>> real=0.29 secs] >>>>> 72681.425: [GC 72681.426: [ParNew: 38325K->4224K(38336K), 439.2720750 >>>>> secs] 3013053K->2979055K(6549376K), 439.2720750 secs] [Times: user=0.08 >>>>> sys=0.00, real=0.08 secs] >>>>> >>>>> Environment: >>>>> Java HotSpot(TM) 64-Bit Server VM Version 1.6.0_20-b02 >>>>> Linux Version 2.6.18-128.1.1.el5 on amd64 >>>>> -Xms6400m >>>>> -Xmx6400m >>>>> -Xss256k >>>>> -XX:+UseConcMarkSweepGC >>>>> -XX:+PrintGCDetails >>>>> -XX:+PrintGCTimeStamps >>>>> -XX:+UseCompressedOops >>>>> _______________________________________________ >>>>> hotspot-gc-use mailing list >>>>> hotspot-gc-use at openjdk.java.net >>>>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>> > From y.s.ramakrishna at oracle.com Tue Oct 26 10:58:44 2010 From: y.s.ramakrishna at oracle.com (Y. S. Ramakrishna) Date: Tue, 26 Oct 2010 10:58:44 -0700 Subject: Long young generation GC? In-Reply-To: References: <4CC6259D.5050303@oracle.com> <4CC6262D.6050608@oracle.com> <4CC70320.4080207@oracle.com> Message-ID: <4CC716D4.2030005@oracle.com> "real" is elapsed time too, obtained from the OS via times(2). 
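To find other records like the 439-second one in old logs, a scan can flag any record whose reported pause disagrees with the `[Times: ... real=...]` value. Below is a minimal sketch.

It assumes one GC record per line, as the JVM writes it; the records quoted in this thread were re-wrapped by mail. It also assumes the `-XX:+PrintGCDetails` record shape shown above. The class name, the regex and the 1-second slack are illustrative choices, not a supported log format.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;

public class PauseVsRealCheck {

    // Matches the tail of a record such as
    // "..., 439.2720750 secs] [Times: user=0.08 sys=0.00, real=0.08 secs]".
    private static final Pattern TAIL = Pattern.compile(
        "([0-9.]+) secs\\] \\[Times: user=([0-9.]+) sys=([0-9.]+), real=([0-9.]+) secs\\]");

    record Timing(double reportedSecs, double userSecs, double sysSecs, double realSecs) {}

    // Extracts the overall reported pause and the user/sys/real times, if present.
    static Optional<Timing> parse(String line) {
        Matcher m = TAIL.matcher(line);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(new Timing(
            Double.parseDouble(m.group(1)), Double.parseDouble(m.group(2)),
            Double.parseDouble(m.group(3)), Double.parseDouble(m.group(4))));
    }

    // Both numbers are meant to be elapsed time, so a large gap hints that
    // the clocks behind them disagreed during that collection.
    static boolean suspicious(Timing t, double slackSecs) {
        return Math.abs(t.reportedSecs() - t.realSecs()) > slackSecs;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java PauseVsRealCheck <gc.log>");
            return;
        }
        try (Stream<String> lines = Files.lines(Paths.get(args[0]))) {
            lines.forEach(line -> parse(line)
                .filter(t -> suspicious(t, 1.0))
                .ifPresent(t -> System.out.println("suspicious: " + line)));
        }
    }
}
```

Counting how often this fires on each virtual machine may help narrow the problem down to the one host where it shows up.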
So if it's reported so small when users see much more time elapse physically, it must be the case that it's a bug in times(2) in a virtual setting. Perhaps if you can boil this down to a small and reproducible test case you can file a bug with the VM provider and with the JVM as well, the latter perhaps a shadow of the former. Over to the runtime team. -- ramki On 10/26/10 10:49, Hi Guava wrote: > I now believe that this phenomenon is caused by the virtual machine. > It has nothing to do with the garbage collector or JVM. I searched in > the old logs and found this in all 3 old logs that I have. There are > multiple virtual machines configured the same way. This problem only > shows up in one of the virtual machines. > By the way, the 639 seconds GC is not a perception problem. It is > real. The users reported stuck process and they found the CPUs of the > virtual machine was racing during that period. > Can I understand this discrepancy this way? the user, sys and real > times are measured in cpu cycles. They are short as they are supposed > to be. The 439.2720750 time is the elapsed time. Since the virtual > machine is doing something else or not functioning correctly, GC took > 439 seconds even though there was only 0.08 seconds of cpu time. > > 72681.425: [GC 72681.426: [ParNew: 38325K->4224K(38336K), 439.2720750 > secs] 3013053K->2979055K(6549376K), 439.2720750 secs] [Times: > user=0.08 sys=0.00, real=0.08 secs] > > > On Tue, Oct 26, 2010 at 12:34 PM, Y. S. Ramakrishna > wrote: >> >> On 10/26/10 07:07, Hi Guava wrote: >>> Here is additional information about the machine running the JVM. It >>> is a virtual machine running in a private cloud. Could it be something >>> like swapping that caused problem? >> Not swapping, but perhaps the management of "time" perhaps in a virtualized >> setting (by that i mean that there may be interactions between the >> host/hypervisor and the guest OS that could cause the JVM to observe >> time jumps of this sort)? 
I'd suggest gathering more data on its >> reproducibility (or otherwise) in both a VM and non-VM setting. >> >> Over to the time experts in the runtime team who may have encountered >> issues in VM settings previously. (I have heard of occasional such reports >> in >> virtual settings before but don't know if any of these were definitively >> chased >> down.) You might also want to check with the VM provider to see if they >> might know of such issues. >> >> -- ramki >> >> >>> On Mon, Oct 25, 2010 at 8:51 PM, Y. Srinivas Ramakrishna >>> wrote: >>>> On 10/25/2010 5:49 PM, Y. Srinivas Ramakrishna wrote: >>>>> On 10/25/2010 5:32 PM, Hi Guava wrote: >>>>>> The third young generation GC took 439.2720750 secs but the user and >>>>>> real time are only 0.08 seconds. What does it mean? >>>>> The machine may be using NTP, and the time may have been changed? >>>> Seems a rather large jump, so may not be NTP (which i am told uses >>>> adjtime() to slowly accelerate the time forward or decelerate it >>>> backward), >>>> but rather an abrupt perhaps manual change in TOD. >>>> >>>> Over to the experts.... >>>> >>>>> JVM timestamps on Linux seem still to be based on TOD rather than >>>>> on TSC. Someone in the runtime team (cc'd) may have more detail on >>>>> why that might still be so. 
>>>>> >>>>> -- ramki >>>>> >>>>> >>>>>> 72667.213: [GC 72667.213: [ParNew: 38336K->4224K(38336K), 1.2473840 >>>>>> secs] >>>>>> 3443948K->3420569K(6549376K), 1.2474290 secs] [Times: user=0.64 >>>>>> sys=1.10, >>>>>> real=1.25 secs] >>>>>> 72680.531: [GC 72680.532: [ParNew: 38336K->4221K(38336K), 0.2916570 >>>>>> secs] >>>>>> 3008948K->2979033K(6549376K), 0.2916710 secs] [Times: user=0.26 >>>>>> sys=0.03, >>>>>> real=0.29 secs] >>>>>> 72681.425: [GC 72681.426: [ParNew: 38325K->4224K(38336K), 439.2720750 >>>>>> secs] 3013053K->2979055K(6549376K), 439.2720750 secs] [Times: user=0.08 >>>>>> sys=0.00, real=0.08 secs] >>>>>> >>>>>> Environment: >>>>>> Java HotSpot(TM) 64-Bit Server VM Version 1.6.0_20-b02 >>>>>> Linux Version 2.6.18-128.1.1.el5 on amd64 >>>>>> -Xms6400m >>>>>> -Xmx6400m >>>>>> -Xss256k >>>>>> -XX:+UseConcMarkSweepGC >>>>>> -XX:+PrintGCDetails >>>>>> -XX:+PrintGCTimeStamps >>>>>> -XX:+UseCompressedOops >>>>>> _______________________________________________ >>>>>> hotspot-gc-use mailing list >>>>>> hotspot-gc-use at openjdk.java.net >>>>>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use From Dori.Rabin at Starhome.com Wed Oct 27 05:05:03 2010 From: Dori.Rabin at Starhome.com (Rabin Dori) Date: Wed, 27 Oct 2010 14:05:03 +0200 Subject: i would like to post to this list Message-ID: <983CFBCFF00E9A498F2703DBD7155DC7295D82561B@ISR-IT-EX-01.starhome.local> My email is : dori.rabin at starhome.com -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20101027/fbe74692/attachment.html From anthony.warden at nomura.com Wed Oct 27 05:21:04 2010 From: anthony.warden at nomura.com (anthony.warden at nomura.com) Date: Wed, 27 Oct 2010 13:21:04 +0100 Subject: i would like to post to this list In-Reply-To: <983CFBCFF00E9A498F2703DBD7155DC7295D82561B@ISR-IT-EX-01.starhome.local> References: <983CFBCFF00E9A498F2703DBD7155DC7295D82561B@ISR-IT-EX-01.starhome.local> Message-ID: <2E97E78D7F99D64DA5108FE2E9F0E8280CAB6996@LONEV3201.EUROPE.NOM> I think you just did! From: hotspot-gc-use-bounces at openjdk.java.net [mailto:hotspot-gc-use-bounces at openjdk.java.net] On Behalf Of Rabin Dori Sent: 27 October 2010 13:05 To: hotspot-gc-use at openjdk.java.net Subject: i would like to post to this list My email is : dori.rabin at starhome.com -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20101027/0cc1fbb6/attachment.html