From aaisinzon at guidewire.com Mon Apr 9 11:37:32 2012 From: aaisinzon at guidewire.com (Alex Aisinzon) Date: Mon, 9 Apr 2012 18:37:32 +0000 Subject: Code cache Message-ID: <43E49E6EC0E84F41B98C68AB6D7820C4170F7515@sm-ex-02-vm.guidewire.com> I ran performance tests on one of our apps and saw the following error message in the GC logs: Java HotSpot(TM) 64-Bit Server VM warning: CodeCache is full. Compiler has been disabled. Java HotSpot(TM) 64-Bit Server VM warning: Try increasing the code cache size using -XX:ReservedCodeCacheSize= I scaled up the code cache to 512MB (-XX:ReservedCodeCacheSize=512m) and markedly improved performance/scalability. I have a few questions: * Is there a logging option that shows how much of the code cache is really used, so that I can find the right cache size without oversizing it? * What factors play into code cache utilization? I would guess that the amount of code to compile is the dominant factor. Are there other factors, like load? I would guess that some entries in the cache may get invalidated if not used much, so load could be a factor as well. I was running on Sun JVM 1.6 update 30, 64-bit, on x86-64. Best Alex A -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120409/5782b5d6/attachment.html From dawid.weiss at gmail.com Wed Apr 11 07:24:28 2012 From: dawid.weiss at gmail.com (Dawid Weiss) Date: Wed, 11 Apr 2012 16:24:28 +0200 Subject: ParNew promotion failed, no expected OOM. Message-ID: Hi there, We are measuring certain aspects of our algorithm with a test suite which attempts to run close to the physical heap's maximum size. We do it by doing a form of binary search based on the size of data passed to the algorithm, where the lower bound is always "succeeded without an OOM" and the upper bound is "threw an OOM". This works nicely, but occasionally we experience an effective deadlock in which full GCs are repeatedly invoked; the application makes progress, but overall it's several orders of magnitude slower than usual (hours instead of seconds).
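To make the setup concrete, the harness logic is roughly the sketch below (a minimal, illustrative sketch only: the class and the runAlgorithm / findLargestInputThatFits names are made up for this mail and merely stand in for our actual test code):

    class OomBinarySearch {

        // Illustrative stand-in for the real algorithm under test: allocates
        // roughly 'size' bytes in 1 MB blocks and holds on to all of them
        // until the method returns.
        static void runAlgorithm(long size) {
            final int block = 1 << 20;
            byte[][] blocks = new byte[(int) (size / block) + 1][];
            for (int i = 0; i < blocks.length; i++) {
                blocks[i] = new byte[block];
            }
        }

        // Binary search over the input size: the lower bound always
        // "succeeded without an OOM", the upper bound always "threw an OOM".
        static long findLargestInputThatFits(long lo, long hi) {
            while (hi - lo > 1) {
                long mid = lo + (hi - lo) / 2;
                try {
                    runAlgorithm(mid);
                    lo = mid;                 // succeeded without an OOM
                } catch (OutOfMemoryError e) {
                    hi = mid;                 // threw an OOM
                }
            }
            return lo;
        }
    }

The real algorithm is plugged in where runAlgorithm is called; the search narrows in on the largest input that still completes without an OOM.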
GC logs look like this: [GC [ParNew (promotion failed): 17023K->18905K(19136K), 0.0220371 secs][CMS: 69016K->69014K(81152K), 0.1370901 secs] 86038K->86038K(100288K), [CMS Perm : 21285K->21285K(35724K)], 0.1591765 secs] [Times: user=0.20 sys=0.00, real=0.16 secs] [GC [ParNew (promotion failed): 17023K->18700K(19136K), 0.0170617 secs][CMS: 69016K->69014K(81152K), 0.1235417 secs] 86038K->86038K(100288K), [CMS Perm : 21285K->21285K(35724K)], 0.1406872 secs] [Times: user=0.18 sys=0.00, real=0.14 secs] [GC [ParNew (promotion failed): 17023K->18700K(19136K), 0.0191855 secs][CMS: 69016K->69014K(81152K), 0.1296462 secs] 86038K->86038K(100288K), [CMS Perm : 21285K->21285K(35724K)], 0.1488816 secs] [Times: user=0.18 sys=0.00, real=0.15 secs] [GC [ParNew (promotion failed): 17023K->18700K(19136K), 0.0232418 secs][CMS: 69016K->69014K(81152K), 0.1300695 secs] 86038K->86038K(100288K), [CMS Perm : 21285K->21285K(35724K)], 0.1533590 secs] [Times: user=0.20 sys=0.00, real=0.15 secs] [GC [ParNew (promotion failed): 17023K->18905K(19136K), 0.0190998 secs][CMS: 69016K->69014K(81152K), 0.1319668 secs] 86038K->86038K(100288K), [CMS Perm : 21285K->21285K(35724K)], 0.1511436 secs] [Times: user=0.18 sys=0.00, real=0.15 secs] [GC [ParNew (promotion failed): 17023K->18905K(19136K), 0.0168998 secs][CMS: 69017K->69015K(81152K), 0.1359254 secs] 86038K->86038K(100288K), [CMS Perm : 21285K->21285K(35724K)], 0.1528776 secs] [Times: user=0.20 sys=0.01, real=0.16 secs] [GC [ParNew (promotion failed): 17023K->18905K(19136K), 0.0214651 secs][CMS: 69017K->69015K(81152K), 0.1209494 secs] 86039K->86039K(100288K), [CMS Perm : 21285K->21285K(35724K)], 0.1424941 secs] [Times: user=0.18 sys=0.00, real=0.14 secs] [GC [ParNew (promotion failed): 17023K->18700K(19136K), 0.0200897 secs][CMS: 69017K->69015K(81152K), 0.1244227 secs] 86039K->86039K(100288K), [CMS Perm : 21285K->21285K(35724K)], 0.1445654 secs] [Times: user=0.18 sys=0.00, real=0.14 secs] [GC [ParNew (promotion failed): 17023K->18905K(19136K), 0.0203377 secs][CMS: 69017K->69015K(81152K), 0.1353857 secs] 86039K->86039K(100288K), [CMS Perm : 21285K->21285K(35724K)], 0.1558016 secs] [Times: user=0.19 sys=0.00, real=0.16 secs] [GC [ParNew (promotion failed): 17023K->18700K(19136K), 0.0201951 secs][CMS: 69017K->69015K(81152K), 0.1289750 secs] 86039K->86039K(100288K), [CMS Perm : 21285K->21285K(35724K)], 0.1492306 secs] [Times: user=0.19 sys=0.00, real=0.15 secs] [GC [ParNew (promotion failed): 17023K->18700K(19136K), 0.0206677 secs][CMS: 69017K->69015K(81152K), 0.1280734 secs] 86039K->86039K(100288K), [CMS Perm : 21285K->21285K(35724K)], 0.1488114 secs] [Times: user=0.18 sys=0.00, real=0.15 secs] [GC [ParNew (promotion failed): 17023K->18905K(19136K), 0.0150225 secs][CMS: 69017K->69015K(81152K), 0.1301056 secs] 86039K->86039K(100288K), [CMS Perm : 21285K->21285K(35724K)], 0.1451940 secs] [Times: user=0.19 sys=0.01, real=0.14 secs] The heap limit is intentionally left smallish and the routine where this happens is in fact computational (it does allocate sporadic objects but never releases them until finished). This behavior is easy to reproduce on my Mac (quad core), java version "1.6.0_31" Java(TM) SE Runtime Environment (build 1.6.0_31-b04-414-11M3626) Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01-414, mixed mode) I read a bit about the nature of "promotion failed" and it's clear to me (or so I think) why this is happening here. My questions are: 1) why isn't OOM being triggered by gc overhead limit? 
It should easily be falling within the default thresholds, 2) is there anything one can do to prevent situations like the above (other than manually fiddling with limits)? Thanks in advance for any pointers and feedback, Dawid From ysr1729 at gmail.com Wed Apr 11 11:24:05 2012 From: ysr1729 at gmail.com (Srinivas Ramakrishna) Date: Wed, 11 Apr 2012 11:24:05 -0700 Subject: ParNew promotion failed, no expected OOM. In-Reply-To: References: Message-ID: I believe this is missing the "gc overhead" threshold for the space limit. As I have commented in the past, I think the GC overhead limit should consider not just the space free in the whole heap, but rather the difference between the old gen capacity and the sum of the space used in the young gen and the old gen after a major GC has completed, as a percentage of the old gen capacity. It almost seems as though you have a largish object in the young gen which will not fit in the space free in the old gen, so it will never be promoted unless sufficient space clears up in the old gen, and from what you are describing, that won't happen until your program terminates its computation. I think we need to fix the space criteria for the overhead limit to deal gracefully with these kinds of situations. On an unrelated note, for such a small heap, you should probably use ParallelOldGC rather than CMS, but I realize that you didn't explicitly ask for CMS; the Mac just gave it to you because that's the default. -- ramki On Wed, Apr 11, 2012 at 7:24 AM, Dawid Weiss wrote: > Hi there, > > We are measuring certain aspects of our algorithm with a test suite > which attempts to run close to the physical heap's maximum size. We do > it by doing a form of binary search based on the size of data passed > to the algorithm, where the lower bound is always "succeeded without > an OOM" and the upper bound is "threw an OOM". This works nice but > occasionally we experience an effective deadlock in which full GCs are > repeatedly invoked, the application makes progress but overall it's > several orders of magnitude slower than usual (hours instead of > seconds).
> > GC logs look like this: > > [GC [ParNew (promotion failed): 17023K->18905K(19136K), 0.0220371 > secs][CMS: 69016K->69014K(81152K), 0.1370901 secs] > 86038K->86038K(100288K), [CMS Perm : 21285K->21285K(35724K)], > 0.1591765 secs] [Times: user=0.20 sys=0.00, real=0.16 secs] > [GC [ParNew (promotion failed): 17023K->18700K(19136K), 0.0170617 > secs][CMS: 69016K->69014K(81152K), 0.1235417 secs] > 86038K->86038K(100288K), [CMS Perm : 21285K->21285K(35724K)], > 0.1406872 secs] [Times: user=0.18 sys=0.00, real=0.14 secs] > [GC [ParNew (promotion failed): 17023K->18700K(19136K), 0.0191855 > secs][CMS: 69016K->69014K(81152K), 0.1296462 secs] > 86038K->86038K(100288K), [CMS Perm : 21285K->21285K(35724K)], > 0.1488816 secs] [Times: user=0.18 sys=0.00, real=0.15 secs] > [GC [ParNew (promotion failed): 17023K->18700K(19136K), 0.0232418 > secs][CMS: 69016K->69014K(81152K), 0.1300695 secs] > 86038K->86038K(100288K), [CMS Perm : 21285K->21285K(35724K)], > 0.1533590 secs] [Times: user=0.20 sys=0.00, real=0.15 secs] > [GC [ParNew (promotion failed): 17023K->18905K(19136K), 0.0190998 > secs][CMS: 69016K->69014K(81152K), 0.1319668 secs] > 86038K->86038K(100288K), [CMS Perm : 21285K->21285K(35724K)], > 0.1511436 secs] [Times: user=0.18 sys=0.00, real=0.15 secs] > [GC [ParNew (promotion failed): 17023K->18905K(19136K), 0.0168998 > secs][CMS: 69017K->69015K(81152K), 0.1359254 secs] > 86038K->86038K(100288K), [CMS Perm : 21285K->21285K(35724K)], > 0.1528776 secs] [Times: user=0.20 sys=0.01, real=0.16 secs] > [GC [ParNew (promotion failed): 17023K->18905K(19136K), 0.0214651 > secs][CMS: 69017K->69015K(81152K), 0.1209494 secs] > 86039K->86039K(100288K), [CMS Perm : 21285K->21285K(35724K)], > 0.1424941 secs] [Times: user=0.18 sys=0.00, real=0.14 secs] > [GC [ParNew (promotion failed): 17023K->18700K(19136K), 0.0200897 > secs][CMS: 69017K->69015K(81152K), 0.1244227 secs] > 86039K->86039K(100288K), [CMS Perm : 21285K->21285K(35724K)], > 0.1445654 secs] [Times: user=0.18 sys=0.00, real=0.14 secs] > [GC [ParNew (promotion failed): 17023K->18905K(19136K), 0.0203377 > secs][CMS: 69017K->69015K(81152K), 0.1353857 secs] > 86039K->86039K(100288K), [CMS Perm : 21285K->21285K(35724K)], > 0.1558016 secs] [Times: user=0.19 sys=0.00, real=0.16 secs] > [GC [ParNew (promotion failed): 17023K->18700K(19136K), 0.0201951 > secs][CMS: 69017K->69015K(81152K), 0.1289750 secs] > 86039K->86039K(100288K), [CMS Perm : 21285K->21285K(35724K)], > 0.1492306 secs] [Times: user=0.19 sys=0.00, real=0.15 secs] > [GC [ParNew (promotion failed): 17023K->18700K(19136K), 0.0206677 > secs][CMS: 69017K->69015K(81152K), 0.1280734 secs] > 86039K->86039K(100288K), [CMS Perm : 21285K->21285K(35724K)], > 0.1488114 secs] [Times: user=0.18 sys=0.00, real=0.15 secs] > [GC [ParNew (promotion failed): 17023K->18905K(19136K), 0.0150225 > secs][CMS: 69017K->69015K(81152K), 0.1301056 secs] > 86039K->86039K(100288K), [CMS Perm : 21285K->21285K(35724K)], > 0.1451940 secs] [Times: user=0.19 sys=0.01, real=0.14 secs] > > The heap limit is intentionally left smallish and the routine where > this happens is in fact computational (it does allocate sporadic > objects but never releases them until finished). > > This behavior is easy to reproduce on my Mac (quad core), > > java version "1.6.0_31" > Java(TM) SE Runtime Environment (build 1.6.0_31-b04-414-11M3626) > Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01-414, mixed mode) > > I read a bit about the nature of "promotion failed" and it's clear to > me (or so I think) why this is happening here. 
My questions are: > > 1) why isn't OOM being triggered by gc overhead limit? It should > easily be falling within the default thresholds, > 2) is there anything one can do to prevent situation like the above > (other than manually fiddling with limits)? > > Thanks in advance for any pointers and feedback, > > Dawid > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120411/42777acc/attachment.html From dawid.weiss at gmail.com Wed Apr 11 11:31:36 2012 From: dawid.weiss at gmail.com (Dawid Weiss) Date: Wed, 11 Apr 2012 20:31:36 +0200 Subject: ParNew promotion failed, no expected OOM. In-Reply-To: References: Message-ID: > GC has competed, as a percentage of the old gen capacity. It almost seems as > though you have a largish object in the young gen which will not fit in the space > free in the old gen, o it will never be promoted unless sufficient space clears up in the old Yes, this is exactly the case -- there is a recursive routine that builds a complex array-based data structure. The routine is recursive and I'm guessing the old gen is already filled up with other data so there is no space to fit the new array there. > I think we need to fix the space criteria for overhead limit to deal > gracefully with these kinds of situations. This would make sense even if it's really an outlier observation of mine (I'm specifically trying to reach heap boundary; not a typical use case I guess). > On an unrelated note, for such a small heap, you should probably use > ParallelOldGC rather than CMS, but I realize that you didn't explicitly ask for CMS, the mac just > gave it to you because that's the default. This happened on a mac and on ubuntu linux as well, but it's indeed of no relevance here because it's the default setting and this is what is worrying. I also figured that switching the garbage collector will be a temporary solution (I used the good old serial gc since I don't care about timings here). Thanks for confirming my suspicions. Dawid From jon.masamitsu at oracle.com Wed Apr 11 11:41:35 2012 From: jon.masamitsu at oracle.com (Jon Masamitsu) Date: Wed, 11 Apr 2012 11:41:35 -0700 Subject: ParNew promotion failed, no expected OOM. In-Reply-To: References: Message-ID: <4F85D05F.5010907@oracle.com> Dawid, I haven't look at your numbers but the OOM due to the GC overhead is thrown very conservatively. In addition to spending too much time doing GC, the policy looks at how much free space is available in the heap. It may be that there is enough free space in the heap such that the policy does not want to trigger an OOM. You see the "promotion failure" message when the GC policy thinks there is enough space in the old gen to support a young collection. It's supposed to be the exception case and I wonder a bit why you see "promotion failure" messages repeatedly instead of just seeing "Full collections" but I can see how the policy could get stuck in a situation where it keeps thinking there is enough space in the old gen but in the end there isn't. Anyway those are basically Full collections. Jon On 04/11/12 07:24, Dawid Weiss wrote: > Hi there, > > We are measuring certain aspects of our algorithm with a test suite > which attempts to run close to the physical heap's maximum size. 
We do > it by doing a form of binary search based on the size of data passed > to the algorithm, where the lower bound is always "succeeded without > an OOM" and the upper bound is "threw an OOM". This works nice but > occasionally we experience an effective deadlock in which full GCs are > repeatedly invoked, the application makes progress but overall it's > several orders of magnitude slower than usual (hours instead of > seconds). > > GC logs look like this: > > [GC [ParNew (promotion failed): 17023K->18905K(19136K), 0.0220371 > secs][CMS: 69016K->69014K(81152K), 0.1370901 secs] > 86038K->86038K(100288K), [CMS Perm : 21285K->21285K(35724K)], > 0.1591765 secs] [Times: user=0.20 sys=0.00, real=0.16 secs] > [GC [ParNew (promotion failed): 17023K->18700K(19136K), 0.0170617 > secs][CMS: 69016K->69014K(81152K), 0.1235417 secs] > 86038K->86038K(100288K), [CMS Perm : 21285K->21285K(35724K)], > 0.1406872 secs] [Times: user=0.18 sys=0.00, real=0.14 secs] > [GC [ParNew (promotion failed): 17023K->18700K(19136K), 0.0191855 > secs][CMS: 69016K->69014K(81152K), 0.1296462 secs] > 86038K->86038K(100288K), [CMS Perm : 21285K->21285K(35724K)], > 0.1488816 secs] [Times: user=0.18 sys=0.00, real=0.15 secs] > [GC [ParNew (promotion failed): 17023K->18700K(19136K), 0.0232418 > secs][CMS: 69016K->69014K(81152K), 0.1300695 secs] > 86038K->86038K(100288K), [CMS Perm : 21285K->21285K(35724K)], > 0.1533590 secs] [Times: user=0.20 sys=0.00, real=0.15 secs] > [GC [ParNew (promotion failed): 17023K->18905K(19136K), 0.0190998 > secs][CMS: 69016K->69014K(81152K), 0.1319668 secs] > 86038K->86038K(100288K), [CMS Perm : 21285K->21285K(35724K)], > 0.1511436 secs] [Times: user=0.18 sys=0.00, real=0.15 secs] > [GC [ParNew (promotion failed): 17023K->18905K(19136K), 0.0168998 > secs][CMS: 69017K->69015K(81152K), 0.1359254 secs] > 86038K->86038K(100288K), [CMS Perm : 21285K->21285K(35724K)], > 0.1528776 secs] [Times: user=0.20 sys=0.01, real=0.16 secs] > [GC [ParNew (promotion failed): 17023K->18905K(19136K), 0.0214651 > secs][CMS: 69017K->69015K(81152K), 0.1209494 secs] > 86039K->86039K(100288K), [CMS Perm : 21285K->21285K(35724K)], > 0.1424941 secs] [Times: user=0.18 sys=0.00, real=0.14 secs] > [GC [ParNew (promotion failed): 17023K->18700K(19136K), 0.0200897 > secs][CMS: 69017K->69015K(81152K), 0.1244227 secs] > 86039K->86039K(100288K), [CMS Perm : 21285K->21285K(35724K)], > 0.1445654 secs] [Times: user=0.18 sys=0.00, real=0.14 secs] > [GC [ParNew (promotion failed): 17023K->18905K(19136K), 0.0203377 > secs][CMS: 69017K->69015K(81152K), 0.1353857 secs] > 86039K->86039K(100288K), [CMS Perm : 21285K->21285K(35724K)], > 0.1558016 secs] [Times: user=0.19 sys=0.00, real=0.16 secs] > [GC [ParNew (promotion failed): 17023K->18700K(19136K), 0.0201951 > secs][CMS: 69017K->69015K(81152K), 0.1289750 secs] > 86039K->86039K(100288K), [CMS Perm : 21285K->21285K(35724K)], > 0.1492306 secs] [Times: user=0.19 sys=0.00, real=0.15 secs] > [GC [ParNew (promotion failed): 17023K->18700K(19136K), 0.0206677 > secs][CMS: 69017K->69015K(81152K), 0.1280734 secs] > 86039K->86039K(100288K), [CMS Perm : 21285K->21285K(35724K)], > 0.1488114 secs] [Times: user=0.18 sys=0.00, real=0.15 secs] > [GC [ParNew (promotion failed): 17023K->18905K(19136K), 0.0150225 > secs][CMS: 69017K->69015K(81152K), 0.1301056 secs] > 86039K->86039K(100288K), [CMS Perm : 21285K->21285K(35724K)], > 0.1451940 secs] [Times: user=0.19 sys=0.01, real=0.14 secs] > > The heap limit is intentionally left smallish and the routine where > this happens is in fact computational (it does 
allocate sporadic > objects but never releases them until finished). > > This behavior is easy to reproduce on my Mac (quad core), > > java version "1.6.0_31" > Java(TM) SE Runtime Environment (build 1.6.0_31-b04-414-11M3626) > Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01-414, mixed mode) > > I read a bit about the nature of "promotion failed" and it's clear to > me (or so I think) why this is happening here. My questions are: > > 1) why isn't OOM being triggered by gc overhead limit? It should > easily be falling within the default thresholds, > 2) is there anything one can do to prevent situation like the above > (other than manually fiddling with limits)? > > Thanks in advance for any pointers and feedback, > > Dawid > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use From dawid.weiss at gmail.com Wed Apr 11 11:53:00 2012 From: dawid.weiss at gmail.com (Dawid Weiss) Date: Wed, 11 Apr 2012 20:53:00 +0200 Subject: ParNew promotion failed, no expected OOM. In-Reply-To: <4F85D05F.5010907@oracle.com> References: <4F85D05F.5010907@oracle.com> Message-ID: > I haven't look at your numbers but the OOM due to the > GC overhead is thrown very conservatively. In addition to I realize this, but this seems like a good example of when gc overhead should fire... or so I think. There doesn't seem to be any space left at all -- 69016K->69014K(81152K) I realize these are full GCs because that's what -verbose:gc reports (I included the details because I asked for them but otherwise what you see is just FullGCs and no progress from the application itself). What's puzzling to me is that this routine only allocates memory (hard refs, there is nothing to collect) but the garbage collector _does_ drop around 2 KB on every full GC... Also, this routine is normally blazing fast and should either complete or OOM very quickly but instead stalls as if 99% of the time was spent doing full collections. I really cannot explain this. Is there any way to see which objects get _dropped_ on full GC runs? I'm curious what these dropped objects are. Dawid > spending too much time doing GC, the policy looks at how > much free space is available in the heap. It may be that > there is enough free space in the heap such that the policy > does not want to trigger an OOM. > > You see the "promotion failure" message when the GC > policy thinks there is enough space in the old gen to > support a young collection. It's supposed to be the > exception case and I wonder a bit why you see > "promotion failure" messages repeatedly instead of > just seeing "Full collections" but I can see how the > policy could get stuck in a situation where it keeps > thinking there is enough space in the old gen but > in the end there isn't. Anyway those are basically > Full collections. > > Jon > > On 04/11/12 07:24, Dawid Weiss wrote: >> Hi there, >> >> We are measuring certain aspects of our algorithm with a test suite >> which attempts to run close to the physical heap's maximum size. We do >> it by doing a form of binary search based on the size of data passed >> to the algorithm, where the lower bound is always "succeeded without >> an OOM" and the upper bound is "threw an OOM".
This works nice but >> occasionally we experience an effective deadlock in which full GCs are >> repeatedly invoked, the application makes progress but overall it's >> several orders of magnitude slower than usual (hours instead of >> seconds). >> >> GC logs look like this: >> >> [GC [ParNew (promotion failed): 17023K->18905K(19136K), 0.0220371 >> secs][CMS: 69016K->69014K(81152K), 0.1370901 secs] >> 86038K->86038K(100288K), [CMS Perm : 21285K->21285K(35724K)], >> 0.1591765 secs] [Times: user=0.20 sys=0.00, real=0.16 secs] >> [GC [ParNew (promotion failed): 17023K->18700K(19136K), 0.0170617 >> secs][CMS: 69016K->69014K(81152K), 0.1235417 secs] >> 86038K->86038K(100288K), [CMS Perm : 21285K->21285K(35724K)], >> 0.1406872 secs] [Times: user=0.18 sys=0.00, real=0.14 secs] >> [GC [ParNew (promotion failed): 17023K->18700K(19136K), 0.0191855 >> secs][CMS: 69016K->69014K(81152K), 0.1296462 secs] >> 86038K->86038K(100288K), [CMS Perm : 21285K->21285K(35724K)], >> 0.1488816 secs] [Times: user=0.18 sys=0.00, real=0.15 secs] >> [GC [ParNew (promotion failed): 17023K->18700K(19136K), 0.0232418 >> secs][CMS: 69016K->69014K(81152K), 0.1300695 secs] >> 86038K->86038K(100288K), [CMS Perm : 21285K->21285K(35724K)], >> 0.1533590 secs] [Times: user=0.20 sys=0.00, real=0.15 secs] >> [GC [ParNew (promotion failed): 17023K->18905K(19136K), 0.0190998 >> secs][CMS: 69016K->69014K(81152K), 0.1319668 secs] >> 86038K->86038K(100288K), [CMS Perm : 21285K->21285K(35724K)], >> 0.1511436 secs] [Times: user=0.18 sys=0.00, real=0.15 secs] >> [GC [ParNew (promotion failed): 17023K->18905K(19136K), 0.0168998 >> secs][CMS: 69017K->69015K(81152K), 0.1359254 secs] >> 86038K->86038K(100288K), [CMS Perm : 21285K->21285K(35724K)], >> 0.1528776 secs] [Times: user=0.20 sys=0.01, real=0.16 secs] >> [GC [ParNew (promotion failed): 17023K->18905K(19136K), 0.0214651 >> secs][CMS: 69017K->69015K(81152K), 0.1209494 secs] >> 86039K->86039K(100288K), [CMS Perm : 21285K->21285K(35724K)], >> 0.1424941 secs] [Times: user=0.18 sys=0.00, real=0.14 secs] >> [GC [ParNew (promotion failed): 17023K->18700K(19136K), 0.0200897 >> secs][CMS: 69017K->69015K(81152K), 0.1244227 secs] >> 86039K->86039K(100288K), [CMS Perm : 21285K->21285K(35724K)], >> 0.1445654 secs] [Times: user=0.18 sys=0.00, real=0.14 secs] >> [GC [ParNew (promotion failed): 17023K->18905K(19136K), 0.0203377 >> secs][CMS: 69017K->69015K(81152K), 0.1353857 secs] >> 86039K->86039K(100288K), [CMS Perm : 21285K->21285K(35724K)], >> 0.1558016 secs] [Times: user=0.19 sys=0.00, real=0.16 secs] >> [GC [ParNew (promotion failed): 17023K->18700K(19136K), 0.0201951 >> secs][CMS: 69017K->69015K(81152K), 0.1289750 secs] >> 86039K->86039K(100288K), [CMS Perm : 21285K->21285K(35724K)], >> 0.1492306 secs] [Times: user=0.19 sys=0.00, real=0.15 secs] >> [GC [ParNew (promotion failed): 17023K->18700K(19136K), 0.0206677 >> secs][CMS: 69017K->69015K(81152K), 0.1280734 secs] >> 86039K->86039K(100288K), [CMS Perm : 21285K->21285K(35724K)], >> 0.1488114 secs] [Times: user=0.18 sys=0.00, real=0.15 secs] >> [GC [ParNew (promotion failed): 17023K->18905K(19136K), 0.0150225 >> secs][CMS: 69017K->69015K(81152K), 0.1301056 secs] >> 86039K->86039K(100288K), [CMS Perm : 21285K->21285K(35724K)], >> 0.1451940 secs] [Times: user=0.19 sys=0.01, real=0.14 secs] >> >> The heap limit is intentionally left smallish and the routine where >> this happens is in fact computational (it does allocate sporadic >> objects but never releases them until finished). 
>> >> This behavior is easy to reproduce on my Mac (quad core), >> >> java version "1.6.0_31" >> Java(TM) SE Runtime Environment (build 1.6.0_31-b04-414-11M3626) >> Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01-414, mixed mode) >> >> I read a bit about the nature of "promotion failed" and it's clear to >> me (or so I think) why this is happening here. My questions are: >> >> 1) why isn't OOM being triggered by gc overhead limit? It should >> easily be falling within the default thresholds, >> 2) is there anything one can do to prevent situation like the above >> (other than manually fiddling with limits)? >> >> Thanks in advance for any pointers and feedback, >> >> Dawid >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use From jon.masamitsu at oracle.com Wed Apr 11 23:02:18 2012 From: jon.masamitsu at oracle.com (Jon Masamitsu) Date: Wed, 11 Apr 2012 23:02:18 -0700 Subject: ParNew promotion failed, no expected OOM. In-Reply-To: References: <4F85D05F.5010907@oracle.com> Message-ID: <4F866FEA.7010607@oracle.com> Dawid, I haven't used these myself but you can try the flags PrintClassHistogramBeforeFullGC PrintClassHistogramAfterFullGC and see what gets collected. Jon On 4/11/2012 11:53 AM, Dawid Weiss wrote: >> I haven't look at your numbers but the OOM due to the >> GC overhead is thrown very conservatively. In addition to > I realize this but this seems like a good example of when gc overhead > should fire... or so I > think. There doesn't seem to be any space left at all -- > > 69016K->69014K(81152K) > > I realize these are full GCs because that's what -verbose:gc reports > (I included the details because I asked for them but otherwise what > you see is just FullGCs and no progress from the application itself). > > What's puzzling to me is that this routine only allocates memory (hard > refs, there is nothing to collect) but the garbace collector _does_ > drop around 2kb on every full GC... Also, this routine is normally > blazing fast and should either complete or OOM very quickly but > instead stalls as if 99% of the time was spent doing full collections. > I really cannot explain this. > > Is there any way to see which objects get _dropped_ on full GC runs? > I'm curious what these dropped objects are. > > Dawid > > >> spending too much time doing GC, the policy looks at how >> much free space is available in the heap. It may be that >> there is enough free space in the heap such that the policy >> does not want to trigger an OOM. >> >> You see the "promotion failure" message when the GC >> policy thinks there is enough space in the old gen to >> support a young collection. It's supposed to be the >> exception case and I wonder a bit why you see >> "promotion failure" messages repeatedly instead of >> just seeing "Full collections" but I can see how the >> policy could get stuck in a situation where it keeps >> thinking there is enough space in the old gen but >> in the end there isn't. Anyway those are basically >> Full collections. >> >> Jon >> >> On 04/11/12 07:24, Dawid Weiss wrote: >>> Hi there, >>> >>> We are measuring certain aspects of our algorithm with a test suite >>> which attempts to run close to the physical heap's maximum size. 
We do >>> it by doing a form of binary search based on the size of data passed >>> to the algorithm, where the lower bound is always "succeeded without >>> an OOM" and the upper bound is "threw an OOM". This works nice but >>> occasionally we experience an effective deadlock in which full GCs are >>> repeatedly invoked, the application makes progress but overall it's >>> several orders of magnitude slower than usual (hours instead of >>> seconds). >>> >>> GC logs look like this: >>> >>> [GC [ParNew (promotion failed): 17023K->18905K(19136K), 0.0220371 >>> secs][CMS: 69016K->69014K(81152K), 0.1370901 secs] >>> 86038K->86038K(100288K), [CMS Perm : 21285K->21285K(35724K)], >>> 0.1591765 secs] [Times: user=0.20 sys=0.00, real=0.16 secs] >>> [GC [ParNew (promotion failed): 17023K->18700K(19136K), 0.0170617 >>> secs][CMS: 69016K->69014K(81152K), 0.1235417 secs] >>> 86038K->86038K(100288K), [CMS Perm : 21285K->21285K(35724K)], >>> 0.1406872 secs] [Times: user=0.18 sys=0.00, real=0.14 secs] >>> [GC [ParNew (promotion failed): 17023K->18700K(19136K), 0.0191855 >>> secs][CMS: 69016K->69014K(81152K), 0.1296462 secs] >>> 86038K->86038K(100288K), [CMS Perm : 21285K->21285K(35724K)], >>> 0.1488816 secs] [Times: user=0.18 sys=0.00, real=0.15 secs] >>> [GC [ParNew (promotion failed): 17023K->18700K(19136K), 0.0232418 >>> secs][CMS: 69016K->69014K(81152K), 0.1300695 secs] >>> 86038K->86038K(100288K), [CMS Perm : 21285K->21285K(35724K)], >>> 0.1533590 secs] [Times: user=0.20 sys=0.00, real=0.15 secs] >>> [GC [ParNew (promotion failed): 17023K->18905K(19136K), 0.0190998 >>> secs][CMS: 69016K->69014K(81152K), 0.1319668 secs] >>> 86038K->86038K(100288K), [CMS Perm : 21285K->21285K(35724K)], >>> 0.1511436 secs] [Times: user=0.18 sys=0.00, real=0.15 secs] >>> [GC [ParNew (promotion failed): 17023K->18905K(19136K), 0.0168998 >>> secs][CMS: 69017K->69015K(81152K), 0.1359254 secs] >>> 86038K->86038K(100288K), [CMS Perm : 21285K->21285K(35724K)], >>> 0.1528776 secs] [Times: user=0.20 sys=0.01, real=0.16 secs] >>> [GC [ParNew (promotion failed): 17023K->18905K(19136K), 0.0214651 >>> secs][CMS: 69017K->69015K(81152K), 0.1209494 secs] >>> 86039K->86039K(100288K), [CMS Perm : 21285K->21285K(35724K)], >>> 0.1424941 secs] [Times: user=0.18 sys=0.00, real=0.14 secs] >>> [GC [ParNew (promotion failed): 17023K->18700K(19136K), 0.0200897 >>> secs][CMS: 69017K->69015K(81152K), 0.1244227 secs] >>> 86039K->86039K(100288K), [CMS Perm : 21285K->21285K(35724K)], >>> 0.1445654 secs] [Times: user=0.18 sys=0.00, real=0.14 secs] >>> [GC [ParNew (promotion failed): 17023K->18905K(19136K), 0.0203377 >>> secs][CMS: 69017K->69015K(81152K), 0.1353857 secs] >>> 86039K->86039K(100288K), [CMS Perm : 21285K->21285K(35724K)], >>> 0.1558016 secs] [Times: user=0.19 sys=0.00, real=0.16 secs] >>> [GC [ParNew (promotion failed): 17023K->18700K(19136K), 0.0201951 >>> secs][CMS: 69017K->69015K(81152K), 0.1289750 secs] >>> 86039K->86039K(100288K), [CMS Perm : 21285K->21285K(35724K)], >>> 0.1492306 secs] [Times: user=0.19 sys=0.00, real=0.15 secs] >>> [GC [ParNew (promotion failed): 17023K->18700K(19136K), 0.0206677 >>> secs][CMS: 69017K->69015K(81152K), 0.1280734 secs] >>> 86039K->86039K(100288K), [CMS Perm : 21285K->21285K(35724K)], >>> 0.1488114 secs] [Times: user=0.18 sys=0.00, real=0.15 secs] >>> [GC [ParNew (promotion failed): 17023K->18905K(19136K), 0.0150225 >>> secs][CMS: 69017K->69015K(81152K), 0.1301056 secs] >>> 86039K->86039K(100288K), [CMS Perm : 21285K->21285K(35724K)], >>> 0.1451940 secs] [Times: user=0.19 sys=0.01, real=0.14 secs] >>> 
>>> The heap limit is intentionally left smallish and the routine where >>> this happens is in fact computational (it does allocate sporadic >>> objects but never releases them until finished). >>> >>> This behavior is easy to reproduce on my Mac (quad core), >>> >>> java version "1.6.0_31" >>> Java(TM) SE Runtime Environment (build 1.6.0_31-b04-414-11M3626) >>> Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01-414, mixed mode) >>> >>> I read a bit about the nature of "promotion failed" and it's clear to >>> me (or so I think) why this is happening here. My questions are: >>> >>> 1) why isn't OOM being triggered by gc overhead limit? It should >>> easily be falling within the default thresholds, >>> 2) is there anything one can do to prevent situation like the above >>> (other than manually fiddling with limits)? >>> >>> Thanks in advance for any pointers and feedback, >>> >>> Dawid >>> _______________________________________________ >>> hotspot-gc-use mailing list >>> hotspot-gc-use at openjdk.java.net >>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use From jon.masamitsu at oracle.com Wed Apr 11 23:28:07 2012 From: jon.masamitsu at oracle.com (Jon Masamitsu) Date: Wed, 11 Apr 2012 23:28:07 -0700 Subject: ParNew promotion failed, no expected OOM. In-Reply-To: References: Message-ID: <4F8675F7.9040205@oracle.com> Ramki, I never want to throw an OOM and then have to argue about whether the OOM was thrown prematurely. That would be a bug. As a consequence of such an approach, I accept that there will be times when it would have been more helpful if the OOM was thrown sooner. That might be a poorer quality of service but not a bug (I think). Jon On 4/11/2012 11:24 AM, Srinivas Ramakrishna wrote: > I believe this is missing the "gc overhead" threshold for the space limit. > As I have commented in the past, i think the GC overhead limit should > consider > not just the space free in the whole heap, but rather the difference > between the old gen > capacity and the sum of the space used in the young gen and the old gen > after a major > GC has competed, as a percentage of the old gen capacity. It almost seems > as though > you have a largish object in the young gen which will not fit in the space > free in the old gen, > o it will never be promoted unless sufficient space clears up in the old > gen, and from what > you are describing, that won't happen until your program terminates its > computation. > > I think we need to fix the space criteria for overhead limit to deal > gracefully > with these kinds of situations. > > On an unrelated note, for such a small heap, you should probably use > ParallelOldGC rather > than CMS, but I realize that you didn't explicitly ask for CMS, the mac > just gave it to you > because that's the default. > > -- ramki > > On Wed, Apr 11, 2012 at 7:24 AM, Dawid Weiss wrote: > >> Hi there, >> >> We are measuring certain aspects of our algorithm with a test suite >> which attempts to run close to the physical heap's maximum size. We do >> it by doing a form of binary search based on the size of data passed >> to the algorithm, where the lower bound is always "succeeded without >> an OOM" and the upper bound is "threw an OOM". 
This works nice but >> occasionally we experience an effective deadlock in which full GCs are >> repeatedly invoked, the application makes progress but overall it's >> several orders of magnitude slower than usual (hours instead of >> seconds). >> >> GC logs look like this: >> >> [GC [ParNew (promotion failed): 17023K->18905K(19136K), 0.0220371 >> secs][CMS: 69016K->69014K(81152K), 0.1370901 secs] >> 86038K->86038K(100288K), [CMS Perm : 21285K->21285K(35724K)], >> 0.1591765 secs] [Times: user=0.20 sys=0.00, real=0.16 secs] >> [GC [ParNew (promotion failed): 17023K->18700K(19136K), 0.0170617 >> secs][CMS: 69016K->69014K(81152K), 0.1235417 secs] >> 86038K->86038K(100288K), [CMS Perm : 21285K->21285K(35724K)], >> 0.1406872 secs] [Times: user=0.18 sys=0.00, real=0.14 secs] >> [GC [ParNew (promotion failed): 17023K->18700K(19136K), 0.0191855 >> secs][CMS: 69016K->69014K(81152K), 0.1296462 secs] >> 86038K->86038K(100288K), [CMS Perm : 21285K->21285K(35724K)], >> 0.1488816 secs] [Times: user=0.18 sys=0.00, real=0.15 secs] >> [GC [ParNew (promotion failed): 17023K->18700K(19136K), 0.0232418 >> secs][CMS: 69016K->69014K(81152K), 0.1300695 secs] >> 86038K->86038K(100288K), [CMS Perm : 21285K->21285K(35724K)], >> 0.1533590 secs] [Times: user=0.20 sys=0.00, real=0.15 secs] >> [GC [ParNew (promotion failed): 17023K->18905K(19136K), 0.0190998 >> secs][CMS: 69016K->69014K(81152K), 0.1319668 secs] >> 86038K->86038K(100288K), [CMS Perm : 21285K->21285K(35724K)], >> 0.1511436 secs] [Times: user=0.18 sys=0.00, real=0.15 secs] >> [GC [ParNew (promotion failed): 17023K->18905K(19136K), 0.0168998 >> secs][CMS: 69017K->69015K(81152K), 0.1359254 secs] >> 86038K->86038K(100288K), [CMS Perm : 21285K->21285K(35724K)], >> 0.1528776 secs] [Times: user=0.20 sys=0.01, real=0.16 secs] >> [GC [ParNew (promotion failed): 17023K->18905K(19136K), 0.0214651 >> secs][CMS: 69017K->69015K(81152K), 0.1209494 secs] >> 86039K->86039K(100288K), [CMS Perm : 21285K->21285K(35724K)], >> 0.1424941 secs] [Times: user=0.18 sys=0.00, real=0.14 secs] >> [GC [ParNew (promotion failed): 17023K->18700K(19136K), 0.0200897 >> secs][CMS: 69017K->69015K(81152K), 0.1244227 secs] >> 86039K->86039K(100288K), [CMS Perm : 21285K->21285K(35724K)], >> 0.1445654 secs] [Times: user=0.18 sys=0.00, real=0.14 secs] >> [GC [ParNew (promotion failed): 17023K->18905K(19136K), 0.0203377 >> secs][CMS: 69017K->69015K(81152K), 0.1353857 secs] >> 86039K->86039K(100288K), [CMS Perm : 21285K->21285K(35724K)], >> 0.1558016 secs] [Times: user=0.19 sys=0.00, real=0.16 secs] >> [GC [ParNew (promotion failed): 17023K->18700K(19136K), 0.0201951 >> secs][CMS: 69017K->69015K(81152K), 0.1289750 secs] >> 86039K->86039K(100288K), [CMS Perm : 21285K->21285K(35724K)], >> 0.1492306 secs] [Times: user=0.19 sys=0.00, real=0.15 secs] >> [GC [ParNew (promotion failed): 17023K->18700K(19136K), 0.0206677 >> secs][CMS: 69017K->69015K(81152K), 0.1280734 secs] >> 86039K->86039K(100288K), [CMS Perm : 21285K->21285K(35724K)], >> 0.1488114 secs] [Times: user=0.18 sys=0.00, real=0.15 secs] >> [GC [ParNew (promotion failed): 17023K->18905K(19136K), 0.0150225 >> secs][CMS: 69017K->69015K(81152K), 0.1301056 secs] >> 86039K->86039K(100288K), [CMS Perm : 21285K->21285K(35724K)], >> 0.1451940 secs] [Times: user=0.19 sys=0.01, real=0.14 secs] >> >> The heap limit is intentionally left smallish and the routine where >> this happens is in fact computational (it does allocate sporadic >> objects but never releases them until finished). 
>> >> This behavior is easy to reproduce on my Mac (quad core), >> >> java version "1.6.0_31" >> Java(TM) SE Runtime Environment (build 1.6.0_31-b04-414-11M3626) >> Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01-414, mixed mode) >> >> I read a bit about the nature of "promotion failed" and it's clear to >> me (or so I think) why this is happening here. My questions are: >> >> 1) why isn't OOM being triggered by gc overhead limit? It should >> easily be falling within the default thresholds, >> 2) is there anything one can do to prevent situation like the above >> (other than manually fiddling with limits)? >> >> Thanks in advance for any pointers and feedback, >> >> Dawid >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120411/73ae2f7f/attachment.html From dawid.weiss at gmail.com Wed Apr 11 23:42:21 2012 From: dawid.weiss at gmail.com (Dawid Weiss) Date: Thu, 12 Apr 2012 08:42:21 +0200 Subject: ParNew promotion failed, no expected OOM. In-Reply-To: <4F8675F7.9040205@oracle.com> References: <4F8675F7.9040205@oracle.com> Message-ID: > I never want to throw an OOM and then have to argue about whether > the OOM was thrown prematurely. That would be a bug. As a consequence I agree the tradeoff here is very subtle and there is probably no optimal setting. I'll dig deeper in a spare minute and see if I can reproduce this on a simpler example. Dawid From tanman12345 at yahoo.com Thu Apr 12 09:27:13 2012 From: tanman12345 at yahoo.com (Erwin) Date: Thu, 12 Apr 2012 09:27:13 -0700 (PDT) Subject: Need help about CMS Failure and ParNew failure In-Reply-To: <4F8675F7.9040205@oracle.com> References: <4F8675F7.9040205@oracle.com> Message-ID: <1334248033.11363.YahooMailNeo@web111103.mail.gq1.yahoo.com> Hello, I'm not an expert when it comes to analyzing GC output and was wondering if you guys could assist? We're using Solaris 10 10/08 s10s_u6wos_07b SPARC, with WAS NDE 7.0.0.9. After a restart of our JVMs, GC seems to be fine. However, after about a week, we start seeing failures in the GC log. We're getting ParNew promotion failures and concurrent mode failures. Our JVM configurations are below: Min heap - 4096 Max heap - 6016 JVM Arguments -server -Xmn1000m -XX:PermSize=512m -XX:+UseConcMarkSweepGC -XX:+HeapDumpOnOutOfMemoryError -DUseSunHttpHandler=true -Djavax.xml.soap.MessageFactory=weblogic.xml.saaj.MessageFactoryImpl -Doracle.jdbc.V8Compatible=true -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSClassUnloadingEnabled -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -XX:-TraceClassUnloading -XX:+PrintHeapAtGC -XX:MaxPermSize=694m -XX:+DisableExplicitGC -XX:+CMSParallelRemarkEnabled I'm attaching the ParNew failure as well as the CMS failure files. Hope they attach; there are 2 files in total. In case they don't, see below: the 1st sample is the ParNew failure, the 2nd is the CMS failure. - Thanks, Erwin ParNew failure sample: {Heap before GC invocations=7800 (full 529): ?par new generation?? total 921600K, used 530694K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) ? eden space 819200K,? 52% used [0xfffffffdd0000000, 0xfffffffdea241bb8, 0xfffffffe02000000) ?
from space 102400K, 100% used [0xfffffffe02000000, 0xfffffffe08400000, 0xfffffffe08400000) ? to?? space 102400K,?? 0% used [0xfffffffe08400000, 0xfffffffe08400000, 0xfffffffe0e800000) ?concurrent mark-sweep generation total 4902464K, used 2682365K [0xfffffffe0e800000, 0xffffffff39b90000, 0xffffffff48000000) ?concurrent-mark-sweep perm gen total 524288K, used 238782K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) 552370.958: [GC 552370.958: [ParNew: 530694K->93237K(921600K), 0.1858500 secs] 3213060K->2832744K(5824064K), 0.1862466 secs] [Times: user=1.21 sys=0.08, real=0.19 secs] Heap after GC invocations=7801 (full 529): ?par new generation?? total 921600K, used 93237K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, 0xfffffffe02000000) ? from space 102400K,? 91% used [0xfffffffe08400000, 0xfffffffe0df0d498, 0xfffffffe0e800000) ? to?? space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) ?concurrent mark-sweep generation total 4902464K, used 2739507K [0xfffffffe0e800000, 0xffffffff39b90000, 0xffffffff48000000) ?concurrent-mark-sweep perm gen total 524288K, used 238782K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) } {Heap before GC invocations=7801 (full 529): ?par new generation?? total 921600K, used 912377K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) ? eden space 819200K,? 99% used [0xfffffffdd0000000, 0xfffffffe01ff1320, 0xfffffffe02000000) ? from space 102400K,? 91% used [0xfffffffe08400000, 0xfffffffe0df0d498, 0xfffffffe0e800000) ? to?? space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) ?concurrent mark-sweep generation total 4902464K, used 2739507K [0xfffffffe0e800000, 0xffffffff39b90000, 0xffffffff48000000) ?concurrent-mark-sweep perm gen total 524288K, used 238795K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) 552372.849: [GC 552372.849: [ParNew (promotion failed): 912377K->869343K(921600K), 0.2641392 secs]552373.113: [CMS: 2791714K->3203612K(4902464K), 29.1902704 secs] 3651885K->3203612K(5824064K), [CMS Perm : 238795K->238246K(524288K)], 29.4609781 secs] [Times: user=30.05 sys=0.13, real=29.46 secs] Heap after GC invocations=7802 (full 530): ?par new generation?? total 921600K, used 0K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, 0xfffffffe02000000) ? from space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) ? to?? space 102400K,?? 0% used [0xfffffffe08400000, 0xfffffffe08400000, 0xfffffffe0e800000) ?concurrent mark-sweep generation total 5136384K, used 3203612K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) ?concurrent-mark-sweep perm gen total 524288K, used 238246K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) } {Heap before GC invocations=12696 (full 809): ?par new generation?? total 921600K, used 908565K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) ? eden space 819200K, 100% used [0xfffffffdd0000000, 0xfffffffe02000000, 0xfffffffe02000000) ? from space 102400K,? 87% used [0xfffffffe02000000, 0xfffffffe07745510, 0xfffffffe08400000) ? to?? space 102400K,?? 
0% used [0xfffffffe08400000, 0xfffffffe08400000, 0xfffffffe0e800000) ?concurrent mark-sweep generation total 5136384K, used 2696786K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) ?concurrent-mark-sweep perm gen total 524288K, used 241380K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) 980120.502: [GC 980120.502: [ParNew: 908565K->73974K(921600K), 0.1519646 secs] 3605352K->2817972K(6057984K), 0.1523927 secs] [Times: user=1.07 sys=0.03, real=0.15 secs] Heap after GC invocations=12697 (full 809): ?par new generation?? total 921600K, used 73974K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, 0xfffffffe02000000) ? from space 102400K,? 72% used [0xfffffffe08400000, 0xfffffffe0cc3d928, 0xfffffffe0e800000) ? to?? space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) ?concurrent mark-sweep generation total 5136384K, used 2743998K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) ?concurrent-mark-sweep perm gen total 524288K, used 241380K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) } {Heap before GC invocations=12697 (full 809): ?par new generation?? total 921600K, used 893174K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) ? eden space 819200K, 100% used [0xfffffffdd0000000, 0xfffffffe02000000, 0xfffffffe02000000) ? from space 102400K,? 72% used [0xfffffffe08400000, 0xfffffffe0cc3d928, 0xfffffffe0e800000) ? to?? space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) ?concurrent mark-sweep generation total 5136384K, used 2743998K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) ?concurrent-mark-sweep perm gen total 524288K, used 241411K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) 980130.777: [GC 980130.777: [ParNew (promotion failed): 893174K->913391K(921600K), 0.5914616 secs]980131.368: [CMS: 2778416K->2710999K(5136384K), 27.7981960 secs] 3637172K->2710999K(6057984K), [CMS Perm : 241411K->240494K(524288K)], 28.3902578 secs] [Times: user=29.37 sys=0.08, real=28.39 secs] Heap after GC invocations=12698 (full 810): ?par new generation?? total 921600K, used 0K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, 0xfffffffe02000000) ? from space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) ? to?? space 102400K,?? 0% used [0xfffffffe08400000, 0xfffffffe08400000, 0xfffffffe0e800000) ?concurrent mark-sweep generation total 5136384K, used 2710999K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) ?concurrent-mark-sweep perm gen total 524288K, used 240494K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) } {Heap before GC invocations=12698 (full 810): ?par new generation?? total 921600K, used 819200K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) ? eden space 819200K, 100% used [0xfffffffdd0000000, 0xfffffffe02000000, 0xfffffffe02000000) ? from space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) ? to?? space 102400K,?? 
0% used [0xfffffffe08400000, 0xfffffffe08400000, 0xfffffffe0e800000) ?concurrent mark-sweep generation total 5136384K, used 2710999K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) ?concurrent-mark-sweep perm gen total 524288K, used 240523K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) 980171.033: [GC 980171.033: [ParNew: 819200K->102400K(921600K), 0.2144047 secs] 3530199K->2857892K(6057984K), 0.2149864 secs] [Times: user=0.99 sys=0.03, real=0.22 secs] Heap after GC invocations=12699 (full 810): ?par new generation?? total 921600K, used 102400K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, 0xfffffffe02000000) ? from space 102400K, 100% used [0xfffffffe08400000, 0xfffffffe0e800000, 0xfffffffe0e800000) ? to?? space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) ?concurrent mark-sweep generation total 5136384K, used 2755492K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) ?concurrent-mark-sweep perm gen total 524288K, used 240523K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) } CMS Failure: {Heap before GC invocations=23856 (full 1462): ?par new generation?? total 921600K, used 921600K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) ? eden space 819200K, 100% used [0xfffffffdd0000000, 0xfffffffe02000000, 0xfffffffe02000000) ? from space 102400K, 100% used [0xfffffffe02000000, 0xfffffffe08400000, 0xfffffffe08400000) ? to?? space 102400K,?? 0% used [0xfffffffe08400000, 0xfffffffe08400000, 0xfffffffe0e800000) ?concurrent mark-sweep generation total 5136384K, used 3496920K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) ?concurrent-mark-sweep perm gen total 524288K, used 243562K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) 1761014.981: [GC 1761014.981: [ParNew: 921600K->102400K(921600K), 0.3004508 secs] 4418520K->3667695K(6057984K), 0.3008667 secs] [Times: user=1.69 sys=0.10, real=0.30 secs] Heap after GC invocations=23857 (full 1462): ?par new generation?? total 921600K, used 102400K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, 0xfffffffe02000000) ? from space 102400K, 100% used [0xfffffffe08400000, 0xfffffffe0e800000, 0xfffffffe0e800000) ? to?? space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) ?concurrent mark-sweep generation total 5136384K, used 3565295K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) ?concurrent-mark-sweep perm gen total 524288K, used 243562K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) } {Heap before GC invocations=23857 (full 1462): ?par new generation?? total 921600K, used 921600K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) ? eden space 819200K, 100% used [0xfffffffdd0000000, 0xfffffffe02000000, 0xfffffffe02000000) ? from space 102400K, 100% used [0xfffffffe08400000, 0xfffffffe0e800000, 0xfffffffe0e800000) ? to?? space 102400K,?? 
0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) ?concurrent mark-sweep generation total 5136384K, used 3565295K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) ?concurrent-mark-sweep perm gen total 524288K, used 243773K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) 1761036.997: [GC 1761036.998: [ParNew: 921600K->102400K(921600K), 0.4075457 secs] 4486895K->3763683K(6057984K), 0.4079591 secs] [Times: user=3.57 sys=0.80, real=0.41 secs] Heap after GC invocations=23858 (full 1462): ?par new generation?? total 921600K, used 102400K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, 0xfffffffe02000000) ? from space 102400K, 100% used [0xfffffffe02000000, 0xfffffffe08400000, 0xfffffffe08400000) ? to?? space 102400K,?? 0% used [0xfffffffe08400000, 0xfffffffe08400000, 0xfffffffe0e800000) ?concurrent mark-sweep generation total 5136384K, used 3661283K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) ?concurrent-mark-sweep perm gen total 524288K, used 243773K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) } 1761037.410: [GC [1 CMS-initial-mark: 3661283K(5136384K)] 3763683K(6057984K), 0.0883369 secs] [Times: user=0.09 sys=0.00, real=0.09 secs] 1761037.499: [CMS-concurrent-mark-start] 1761041.447: [CMS-concurrent-mark: 3.906/3.948 secs] [Times: user=25.81 sys=1.06, real=3.95 secs] 1761041.448: [CMS-concurrent-preclean-start] 1761041.763: [CMS-concurrent-preclean: 0.312/0.315 secs] [Times: user=0.50 sys=0.02, real=0.32 secs] 1761041.763: [CMS-concurrent-abortable-preclean-start] ?CMS: abort preclean due to time 1761046.800: [CMS-concurrent-abortable-preclean: 4.720/5.036 secs] [Times: user=6.68 sys=0.18, real=5.04 secs] 1761046.808: [GC[YG occupancy: 464701 K (921600 K)]1761046.808: [Rescan (parallel) , 0.3034664 secs]1761047.112: [weak refs processing, 0.0152564 secs]1761047.128: [class unloading, 0.1518160 secs]1761047.280: [scrub symbol & string tables, 0.1332523 secs] [1 CMS-remark: 3661283K(5136384K)] 4125985K(6057984K), 0.6980401 secs] [Times: user=1.34 sys=0.70, real=0.70 secs] 1761047.507: [CMS-concurrent-sweep-start] 1761051.779: [CMS-concurrent-sweep: 4.252/4.271 secs] [Times: user=6.30 sys=0.19, real=4.27 secs] 1761051.779: [CMS-concurrent-reset-start] 1761051.837: [CMS-concurrent-reset: 0.058/0.058 secs] [Times: user=0.07 sys=0.00, real=0.06 secs] {Heap before GC invocations=23858 (full 1463): ?par new generation?? total 921600K, used 921600K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) ? eden space 819200K, 100% used [0xfffffffdd0000000, 0xfffffffe02000000, 0xfffffffe02000000) ? from space 102400K, 100% used [0xfffffffe02000000, 0xfffffffe08400000, 0xfffffffe08400000) ? to?? space 102400K,?? 0% used [0xfffffffe08400000, 0xfffffffe08400000, 0xfffffffe0e800000) ?concurrent mark-sweep generation total 5136384K, used 3514703K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) ?concurrent-mark-sweep perm gen total 524288K, used 243613K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) 1761061.144: [GC 1761061.145: [ParNew: 921600K->102400K(921600K), 0.4124278 secs] 4436303K->3717736K(6057984K), 0.4128777 secs] [Times: user=1.98 sys=0.19, real=0.41 secs] Heap after GC invocations=23859 (full 1463): ?par new generation?? total 921600K, used 102400K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, 0xfffffffe02000000) ? 
from space 102400K, 100% used [0xfffffffe08400000, 0xfffffffe0e800000, 0xfffffffe0e800000) ? to?? space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) ?concurrent mark-sweep generation total 5136384K, used 3615336K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) ?concurrent-mark-sweep perm gen total 524288K, used 243613K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) } 1761061.562: [GC [1 CMS-initial-mark: 3615336K(5136384K)] 3718604K(6057984K), 0.0660086 secs] [Times: user=0.07 sys=0.00, real=0.07 secs] 1761061.629: [CMS-concurrent-mark-start] 1761065.589: [CMS-concurrent-mark: 3.920/3.960 secs] [Times: user=26.20 sys=1.05, real=3.96 secs] 1761065.590: [CMS-concurrent-preclean-start] 1761065.883: [CMS-concurrent-preclean: 0.290/0.293 secs] [Times: user=0.54 sys=0.02, real=0.29 secs] 1761065.883: [CMS-concurrent-abortable-preclean-start] ?CMS: abort preclean due to time 1761070.950: [CMS-concurrent-abortable-preclean: 5.035/5.067 secs] [Times: user=10.70 sys=0.36, real=5.07 secs] 1761070.958: [GC[YG occupancy: 656197 K (921600 K)]1761070.959: [Rescan (parallel) , 0.5056315 secs]1761071.465: [weak refs processing, 0.0107058 secs]1761071.476: [class unloading, 0.1500832 secs]1761071.626: [scrub symbol & string tables, 0.1278517 secs] [1 CMS-remark: 3615336K(5136384K)] 4271533K(6057984K), 0.8857121 secs] [Times: user=1.77 sys=1.08, real=0.89 secs] 1761071.845: [CMS-concurrent-sweep-start] 1761075.956: [CMS-concurrent-sweep: 4.094/4.111 secs] [Times: user=7.97 sys=0.27, real=4.11 secs] 1761075.957: [CMS-concurrent-reset-start] 1761076.031: [CMS-concurrent-reset: 0.063/0.074 secs] [Times: user=0.13 sys=0.01, real=0.07 secs] {Heap before GC invocations=23859 (full 1464): ?par new generation?? total 921600K, used 921600K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) ? eden space 819200K, 100% used [0xfffffffdd0000000, 0xfffffffe02000000, 0xfffffffe02000000) ? from space 102400K, 100% used [0xfffffffe08400000, 0xfffffffe0e800000, 0xfffffffe0e800000) ? to?? space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) ?concurrent mark-sweep generation total 5136384K, used 3544377K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) ?concurrent-mark-sweep perm gen total 524288K, used 243474K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) 1761077.540: [GC 1761077.540: [ParNew: 921600K->102400K(921600K), 0.4030394 secs] 4465977K->3740399K(6057984K), 0.4034742 secs] [Times: user=3.14 sys=0.55, real=0.40 secs] Heap after GC invocations=23860 (full 1464): ?par new generation?? total 921600K, used 102400K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, 0xfffffffe02000000) ? from space 102400K, 100% used [0xfffffffe02000000, 0xfffffffe08400000, 0xfffffffe08400000) ? to?? space 102400K,?? 
0% used [0xfffffffe08400000, 0xfffffffe08400000, 0xfffffffe0e800000) ?concurrent mark-sweep generation total 5136384K, used 3637999K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) ?concurrent-mark-sweep perm gen total 524288K, used 243474K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) } 1761077.948: [GC [1 CMS-initial-mark: 3637999K(5136384K)] 3740403K(6057984K), 0.0664402 secs] [Times: user=0.06 sys=0.00, real=0.07 secs] 1761078.015: [CMS-concurrent-mark-start] 1761082.141: [CMS-concurrent-mark: 4.076/4.126 secs] [Times: user=25.56 sys=1.24, real=4.13 secs] 1761082.142: [CMS-concurrent-preclean-start] 1761082.435: [CMS-concurrent-preclean: 0.290/0.293 secs] [Times: user=0.56 sys=0.03, real=0.29 secs] 1761082.435: [CMS-concurrent-abortable-preclean-start] ?CMS: abort preclean due to time 1761087.544: [CMS-concurrent-abortable-preclean: 4.166/5.108 secs] [Times: user=6.79 sys=0.38, real=5.11 secs] 1761087.554: [GC[YG occupancy: 612230 K (921600 K)]1761087.555: [Rescan (parallel) , 0.3453344 secs]1761087.900: [weak refs processing, 0.0033384 secs]1761087.904: [class unloading, 0.1515234 secs]1761088.055: [scrub symbol & string tables, 0.1280533 secs] [1 CMS-remark: 3637999K(5136384K)] 4250230K(6057984K), 0.7189376 secs] [Times: user=1.29 sys=0.76, real=0.72 secs] 1761088.274: [CMS-concurrent-sweep-start] 1761092.543: [CMS-concurrent-sweep: 4.268/4.268 secs] [Times: user=6.72 sys=0.26, real=4.27 secs] 1761092.543: [CMS-concurrent-reset-start] 1761092.606: [CMS-concurrent-reset: 0.063/0.063 secs] [Times: user=0.11 sys=0.01, real=0.06 secs] {Heap before GC invocations=23860 (full 1465): ?par new generation?? total 921600K, used 921600K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) ? eden space 819200K, 100% used [0xfffffffdd0000000, 0xfffffffe02000000, 0xfffffffe02000000) ? from space 102400K, 100% used [0xfffffffe02000000, 0xfffffffe08400000, 0xfffffffe08400000) ? to?? space 102400K,?? 0% used [0xfffffffe08400000, 0xfffffffe08400000, 0xfffffffe0e800000) ?concurrent mark-sweep generation total 5136384K, used 3582457K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) ?concurrent-mark-sweep perm gen total 524288K, used 243634K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) 1761096.682: [GC 1761096.682: [ParNew: 921600K->102400K(921600K), 0.2843209 secs] 4504057K->3786219K(6057984K), 0.2847419 secs] [Times: user=1.81 sys=0.10, real=0.29 secs] Heap after GC invocations=23861 (full 1465): ?par new generation?? total 921600K, used 102400K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, 0xfffffffe02000000) ? from space 102400K, 100% used [0xfffffffe08400000, 0xfffffffe0e800000, 0xfffffffe0e800000) ? to?? space 102400K,?? 
0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) ?concurrent mark-sweep generation total 5136384K, used 3683819K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) ?concurrent-mark-sweep perm gen total 524288K, used 243634K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) } 1761096.971: [GC [1 CMS-initial-mark: 3683819K(5136384K)] 3786817K(6057984K), 0.0480239 secs] [Times: user=0.05 sys=0.00, real=0.05 secs] 1761097.020: [CMS-concurrent-mark-start] 1761101.145: [CMS-concurrent-mark: 4.104/4.124 secs] [Times: user=24.60 sys=1.09, real=4.13 secs] 1761101.145: [CMS-concurrent-preclean-start] 1761101.438: [CMS-concurrent-preclean: 0.290/0.292 secs] [Times: user=0.41 sys=0.02, real=0.29 secs] 1761101.438: [CMS-concurrent-abortable-preclean-start] ?CMS: abort preclean due to time 1761106.478: [CMS-concurrent-abortable-preclean: 4.694/5.040 secs] [Times: user=7.32 sys=0.23, real=5.04 secs] 1761106.486: [GC[YG occupancy: 497634 K (921600 K)]1761106.487: [Rescan (parallel) , 0.3384965 secs]1761106.825: [weak refs processing, 0.0030734 secs]1761106.829: [class unloading, 0.1503426 secs]1761106.979: [scrub symbol & string tables, 0.1273150 secs] [1 CMS-remark: 3683819K(5136384K)] 4181454K(6057984K), 0.7055549 secs] [Times: user=1.25 sys=0.64, real=0.71 secs] 1761107.193: [CMS-concurrent-sweep-start] 1761111.281: [CMS-concurrent-sweep: 4.088/4.088 secs] [Times: user=5.81 sys=0.15, real=4.09 secs] 1761111.282: [CMS-concurrent-reset-start] 1761111.349: [CMS-concurrent-reset: 0.068/0.068 secs] [Times: user=0.08 sys=0.00, real=0.07 secs] 1761111.961: [GC [1 CMS-initial-mark: 3633902K(5136384K)] 4261007K(6057984K), 0.5015835 secs] [Times: user=0.50 sys=0.00, real=0.50 secs] 1761112.463: [CMS-concurrent-mark-start] 1761116.550: [CMS-concurrent-mark: 4.036/4.087 secs] [Times: user=24.85 sys=1.09, real=4.09 secs] 1761116.551: [CMS-concurrent-preclean-start] 1761116.901: [CMS-concurrent-preclean: 0.344/0.350 secs] [Times: user=0.54 sys=0.01, real=0.35 secs] 1761116.901: [CMS-concurrent-abortable-preclean-start] {Heap before GC invocations=23861 (full 1467): ?par new generation?? total 921600K, used 921600K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) ? eden space 819200K, 100% used [0xfffffffdd0000000, 0xfffffffe02000000, 0xfffffffe02000000) ? from space 102400K, 100% used [0xfffffffe08400000, 0xfffffffe0e800000, 0xfffffffe0e800000) ? to?? space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) ?concurrent mark-sweep generation total 5136384K, used 3633902K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) ?concurrent-mark-sweep perm gen total 524288K, used 243740K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) 1761120.554: [GC 1761120.554: [ParNew: 921600K->102400K(921600K), 0.4726199 secs] 4555502K->3819626K(6057984K), 0.4732486 secs] [Times: user=3.31 sys=0.69, real=0.47 secs] Heap after GC invocations=23862 (full 1467): ?par new generation?? total 921600K, used 102400K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, 0xfffffffe02000000) ? from space 102400K, 100% used [0xfffffffe02000000, 0xfffffffe08400000, 0xfffffffe08400000) ? to?? space 102400K,?? 
0% used [0xfffffffe08400000, 0xfffffffe08400000, 0xfffffffe0e800000) ?concurrent mark-sweep generation total 5136384K, used 3717226K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) ?concurrent-mark-sweep perm gen total 524288K, used 243740K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) } ?CMS: abort preclean due to time 1761122.392: [CMS-concurrent-abortable-preclean: 4.729/5.490 secs] [Times: user=11.71 sys=0.97, real=5.49 secs] 1761122.401: [GC[YG occupancy: 177317 K (921600 K)]1761122.401: [Rescan (parallel) , 0.0250334 secs]1761122.427: [weak refs processing, 0.0002699 secs]1761122.427: [class unloading, 0.0817179 secs]1761122.509: [scrub symbol & string tables, 0.1383120 secs] [1 CMS-remark: 3717226K(5136384K)] 3894544K(6057984K), 0.3327016 secs] [Times: user=0.55 sys=0.04, real=0.33 secs] 1761122.735: [CMS-concurrent-sweep-start] 1761126.843: [CMS-concurrent-sweep: 4.042/4.108 secs] [Times: user=6.70 sys=0.39, real=4.11 secs] 1761126.844: [CMS-concurrent-reset-start] 1761126.907: [CMS-concurrent-reset: 0.063/0.063 secs] [Times: user=0.11 sys=0.00, real=0.06 secs] 1761127.142: [GC [1 CMS-initial-mark: 3701154K(5136384K)] 4056638K(6057984K), 0.2853309 secs] [Times: user=0.29 sys=0.00, real=0.29 secs] 1761127.428: [CMS-concurrent-mark-start] 1761131.876: [CMS-concurrent-mark: 4.398/4.448 secs] [Times: user=28.46 sys=1.55, real=4.45 secs] 1761131.877: [CMS-concurrent-preclean-start] 1761132.185: [CMS-concurrent-preclean: 0.305/0.308 secs] [Times: user=0.60 sys=0.05, real=0.31 secs] 1761132.186: [CMS-concurrent-abortable-preclean-start] ?CMS: abort preclean due to time 1761137.243: [CMS-concurrent-abortable-preclean: 5.029/5.058 secs] [Times: user=9.88 sys=0.42, real=5.06 secs] 1761137.248: [GC[YG occupancy: 783876 K (921600 K)]1761137.248: [Rescan (parallel) , 0.5402015 secs]1761137.789: [weak refs processing, 0.0022809 secs]1761137.791: [class unloading, 0.1556933 secs]1761137.947: [scrub symbol & string tables, 0.1291759 secs] [1 CMS-remark: 3701154K(5136384K)] 4485030K(6057984K), 0.9154842 secs] [Times: user=1.67 sys=0.97, real=0.92 secs] 1761138.164: [CMS-concurrent-sweep-start] {Heap before GC invocations=23862 (full 1468): ?par new generation?? total 921600K, used 920346K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) ? eden space 819200K,? 99% used [0xfffffffdd0000000, 0xfffffffe01ec6a48, 0xfffffffe02000000) ? from space 102400K, 100% used [0xfffffffe02000000, 0xfffffffe08400000, 0xfffffffe08400000) ? to?? space 102400K,?? 0% used [0xfffffffe08400000, 0xfffffffe08400000, 0xfffffffe0e800000) ?concurrent mark-sweep generation total 5136384K, used 3694838K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) ?concurrent-mark-sweep perm gen total 524288K, used 243810K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) 1761140.028: [GC 1761140.028: [ParNew: 920346K->102400K(921600K), 0.4882607 secs] 4615185K->4017461K(6057984K), 0.4886748 secs] [Times: user=2.71 sys=0.20, real=0.49 secs] Heap after GC invocations=23863 (full 1468): ?par new generation?? total 921600K, used 102400K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, 0xfffffffe02000000) ? from space 102400K, 100% used [0xfffffffe08400000, 0xfffffffe0e800000, 0xfffffffe0e800000) ? to?? space 102400K,?? 
0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) ?concurrent mark-sweep generation total 5136384K, used 3915061K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) ?concurrent-mark-sweep perm gen total 524288K, used 243810K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) } 1761142.726: [CMS-concurrent-sweep: 4.011/4.562 secs] [Times: user=12.54 sys=0.74, real=4.56 secs] 1761142.727: [CMS-concurrent-reset-start] 1761142.791: [CMS-concurrent-reset: 0.064/0.064 secs] [Times: user=0.16 sys=0.01, real=0.06 secs] 1761143.233: [GC [1 CMS-initial-mark: 3852859K(5136384K)] 4152461K(6057984K), 0.2340877 secs] [Times: user=0.23 sys=0.00, real=0.23 secs] 1761143.467: [CMS-concurrent-mark-start] 1761147.673: [CMS-concurrent-mark: 4.182/4.205 secs] [Times: user=26.19 sys=1.27, real=4.21 secs] 1761147.673: [CMS-concurrent-preclean-start] 1761147.978: [CMS-concurrent-preclean: 0.300/0.304 secs] [Times: user=0.44 sys=0.02, real=0.30 secs] 1761147.978: [CMS-concurrent-abortable-preclean-start] {Heap before GC invocations=23863 (full 1469): ?par new generation?? total 921600K, used 602663K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) ? eden space 819200K,? 61% used [0xfffffffdd0000000, 0xfffffffdee889c90, 0xfffffffe02000000) ? from space 102400K, 100% used [0xfffffffe08400000, 0xfffffffe0e800000, 0xfffffffe0e800000) ? to?? space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) ?concurrent mark-sweep generation total 5136384K, used 3852859K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) ?concurrent-mark-sweep perm gen total 524288K, used 243656K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) 1761152.969: [GC 1761152.969: [ParNew: 602663K->102400K(921600K), 0.4710031 secs]1761153.440: [CMS CMS: abort preclean due to time 1761155.705: [CMS-concurrent-abortable-preclean: 6.957/7.726 secs] [Times: user=13.02 sys=0.48, real=7.73 secs] ?(concurrent mode failure): 4005428K->3905404K(5136384K), 32.6670849 secs] 4455522K->3905404K(6057984K), [CMS Perm : 243656K->243327K(524288K)], 33.1389061 secs] [Times: user=35.38 sys=0.26, real=33.14 secs] Heap after GC invocations=23864 (full 1470): ?par new generation?? total 921600K, used 0K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, 0xfffffffe02000000) ? from space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) ? to?? space 102400K,?? 
0% used [0xfffffffe08400000, 0xfffffffe08400000, 0xfffffffe0e800000) ?concurrent mark-sweep generation total 5136384K, used 3905404K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) ?concurrent-mark-sweep perm gen total 524288K, used 243327K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) } 1761187.694: [GC [1 CMS-initial-mark: 4953978K(5136384K)] 4965714K(6057984K), 0.0131890 secs] [Times: user=0.01 sys=0.00, real=0.01 secs] 1761187.708: [CMS-concurrent-mark-start] 1761191.965: [CMS-concurrent-mark: 3.634/4.257 secs] [Times: user=32.76 sys=1.91, real=4.26 secs] 1761191.966: [CMS-concurrent-preclean-start] 1761192.543: [CMS-concurrent-preclean: 0.553/0.577 secs] [Times: user=1.56 sys=0.12, real=0.58 secs] 1761192.544: [CMS-concurrent-abortable-preclean-start] ?CMS: abort preclean due to time 1761197.612: [CMS-concurrent-abortable-preclean: 2.079/5.068 secs] [Times: user=9.11 sys=0.60, real=5.07 secs] 1761197.617: [GC[YG occupancy: 813510 K (921600 K)]1761197.618: [Rescan (parallel) , 0.7500635 secs]1761198.368: [weak refs processing, 0.0020064 secs]1761198.370: [class unloading, 0.0823783 secs]1761198.453: [scrub symbol & string tables, 0.1278387 secs] [1 CMS-remark: 4953978K(5136384K)] 5767489K(6057984K), 1.0496971 secs] [Times: user=2.69 sys=1.79, real=1.05 secs] 1761198.668: [CMS-concurrent-sweep-start] {Heap before GC invocations=23864 (full 1471): ?par new generation?? total 921600K, used 819200K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) ? eden space 819200K, 100% used [0xfffffffdd0000000, 0xfffffffe02000000, 0xfffffffe02000000) ? from space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) ? to?? space 102400K,?? 0% used [0xfffffffe08400000, 0xfffffffe08400000, 0xfffffffe0e800000) ?concurrent mark-sweep generation total 5136384K, used 4953976K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) ?concurrent-mark-sweep perm gen total 524288K, used 243422K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) 1761198.700: [GC 1761198.700: [ParNew: 819200K->819200K(921600K), 0.0000919 secs]1761198.700: [CMS1761202.072: [CMS-concurrent-sweep: 3.389/3.404 secs] [Times: user=3.60 sys=0.04, real=3.40 secs] ?(concurrent mode failure): 4953976K->3789438K(5136384K), 32.6623615 secs] 5773176K->3789438K(6057984K), [CMS Perm : 243422K->243328K(524288K)], 32.6632802 secs] [Times: user=32.58 sys=0.03, real=32.66 secs] Heap after GC invocations=23865 (full 1472): ?par new generation?? total 921600K, used 0K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, 0xfffffffe02000000) ? from space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) ? to?? space 102400K,?? 
0% used [0xfffffffe08400000, 0xfffffffe08400000, 0xfffffffe0e800000) ?concurrent mark-sweep generation total 5136384K, used 3789438K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) ?concurrent-mark-sweep perm gen total 524288K, used 243328K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) } 1761231.418: [GC [1 CMS-initial-mark: 3789438K(5136384K)] 3883471K(6057984K), 0.0609784 secs] [Times: user=0.06 sys=0.01, real=0.06 secs] 1761231.480: [CMS-concurrent-mark-start] 1761236.061: [CMS-concurrent-mark: 3.752/4.580 secs] [Times: user=34.48 sys=2.81, real=4.58 secs] 1761236.061: [CMS-concurrent-preclean-start] 1761236.428: [CMS-concurrent-preclean: 0.358/0.367 secs] [Times: user=0.46 sys=0.01, real=0.37 secs] 1761236.429: [CMS-concurrent-abortable-preclean-start] ?CMS: abort preclean due to time 1761241.488: [CMS-concurrent-abortable-preclean: 2.384/5.059 secs] [Times: user=5.30 sys=0.75, real=5.06 secs] 1761241.497: [GC[YG occupancy: 787969 K (921600 K)]1761241.497: [Rescan (parallel) , 0.5938799 secs]1761242.091: [weak refs processing, 0.0067469 secs]1761242.098: [class unloading, 0.0826078 secs]1761242.181: [scrub symbol & string tables, 0.1308434 secs] [1 CMS-remark: 3789438K(5136384K)] 4577408K(6057984K), 0.9017583 secs] [Times: user=2.66 sys=2.07, real=0.90 secs] 1761242.400: [CMS-concurrent-sweep-start] {Heap before GC invocations=23865 (full 1473): ?par new generation?? total 921600K, used 819200K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) ? eden space 819200K, 100% used [0xfffffffdd0000000, 0xfffffffe02000000, 0xfffffffe02000000) ? from space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) ? to?? space 102400K,?? 0% used [0xfffffffe08400000, 0xfffffffe08400000, 0xfffffffe0e800000) ?concurrent mark-sweep generation total 5136384K, used 3789391K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) ?concurrent-mark-sweep perm gen total 524288K, used 243406K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) 1761242.586: [GC 1761242.586: [ParNew: 819200K->102400K(921600K), 0.1871926 secs] 4608591K->3940305K(6057984K), 0.1879045 secs] [Times: user=0.93 sys=0.05, real=0.19 secs] Heap after GC invocations=23866 (full 1473): ?par new generation?? total 921600K, used 102400K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, 0xfffffffe02000000) ? from space 102400K, 100% used [0xfffffffe08400000, 0xfffffffe0e800000, 0xfffffffe0e800000) ? to?? space 102400K,?? 
0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) ?concurrent mark-sweep generation total 5136384K, used 3837905K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) ?concurrent-mark-sweep perm gen total 524288K, used 243406K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) } 1761245.857: [CMS-concurrent-sweep: 3.225/3.457 secs] [Times: user=6.21 sys=0.52, real=3.46 secs] 1761245.858: [CMS-concurrent-reset-start] 1761245.922: [CMS-concurrent-reset: 0.064/0.064 secs] [Times: user=0.08 sys=0.01, real=0.06 secs] 1761247.301: [GC [1 CMS-initial-mark: 3676150K(5136384K)] 3952072K(6057984K), 0.2229528 secs] [Times: user=0.22 sys=0.00, real=0.22 secs] 1761247.525: [CMS-concurrent-mark-start] 1761251.076: [CMS-concurrent-mark: 3.510/3.551 secs] [Times: user=23.68 sys=0.85, real=3.55 secs] 1761251.076: [CMS-concurrent-preclean-start] 1761251.375: [CMS-concurrent-preclean: 0.295/0.298 secs] [Times: user=0.72 sys=0.04, real=0.30 secs] 1761251.375: [CMS-concurrent-abortable-preclean-start] ?CMS: abort preclean due to time 1761256.460: [CMS-concurrent-abortable-preclean: 5.012/5.085 secs] [Times: user=9.93 sys=0.99, real=5.09 secs] 1761256.469: [GC[YG occupancy: 720909 K (921600 K)]1761256.469: [Rescan (parallel) , 0.4663462 secs]1761256.936: [weak refs processing, 0.0153453 secs]1761256.951: [class unloading, 0.0833874 secs]1761257.035: [scrub symbol & string tables, 0.1289153 secs] [1 CMS-remark: 3676150K(5136384K)] 4397060K(6057984K), 0.7879219 secs] [Times: user=1.55 sys=0.96, real=0.79 secs] 1761257.258: [CMS-concurrent-sweep-start] {Heap before GC invocations=23866 (full 1474): ?par new generation?? total 921600K, used 921600K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) ? eden space 819200K, 100% used [0xfffffffdd0000000, 0xfffffffe02000000, 0xfffffffe02000000) ? from space 102400K, 100% used [0xfffffffe08400000, 0xfffffffe0e800000, 0xfffffffe0e800000) ? to?? space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) ?concurrent mark-sweep generation total 5136384K, used 3669509K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) ?concurrent-mark-sweep perm gen total 524288K, used 243414K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) 1761259.137: [GC 1761259.138: [ParNew: 921600K->102400K(921600K), 0.3975686 secs] 4591109K->3894075K(6057984K), 0.3981608 secs] [Times: user=1.65 sys=0.15, real=0.40 secs] Heap after GC invocations=23867 (full 1474): ?par new generation?? total 921600K, used 102400K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, 0xfffffffe02000000) ? from space 102400K, 100% used [0xfffffffe02000000, 0xfffffffe08400000, 0xfffffffe08400000) ? to?? space 102400K,?? 0% used [0xfffffffe08400000, 0xfffffffe08400000, 0xfffffffe0e800000) ?concurrent mark-sweep generation total 5136384K, used 3791675K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) ?concurrent-mark-sweep perm gen total 524288K, used 243414K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) } -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120412/16cc56a7/attachment-0001.html -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... 
Name: CMS Failure.txt
Url: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120412/16cc56a7/CMSFailure-0001.txt
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: PARNEW Failure.txt
Url: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120412/16cc56a7/PARNEWFailure-0001.txt

From aaisinzon at guidewire.com Thu Apr 12 12:15:45 2012
From: aaisinzon at guidewire.com (Alex Aisinzon)
Date: Thu, 12 Apr 2012 19:15:45 +0000
Subject: Code cache
Message-ID: <43E49E6EC0E84F41B98C68AB6D7820C4170FA028@sm-ex-02-vm.guidewire.com>

Any feedback on this?

Best

Alex A

From: Alex Aisinzon
Sent: Monday, April 09, 2012 11:38 AM
To: 'hotspot-gc-use at openjdk.java.net'
Subject: Code cache

I ran performance tests on one of our apps and saw the following error message in the GC logs:

Java HotSpot(TM) 64-Bit Server VM warning: CodeCache is full. Compiler has been disabled.
Java HotSpot(TM) 64-Bit Server VM warning: Try increasing the code cache size using -XX:ReservedCodeCacheSize=

I scaled up the code cache to 512MB (-XX:ReservedCodeCacheSize=512m) and markedly improved performance/scalability.

I have a few questions:

* Is there a logging option that shows how much of the code cache is really used so that I find the right cache size without oversizing it?
* What factors play into the code cache utilization? I would guess that the amount of code to compile is the dominant factor. Are there other factors like load: I would guess that some entries in the cache may get invalidated if not used much and load could be a factor in this.

I was running on Sun JVM 1.6 update 30 64 bit on x86-64.

Best

Alex A
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120412/735b6e81/attachment.html

From eric.caspole at amd.com Thu Apr 12 12:26:11 2012
From: eric.caspole at amd.com (Eric Caspole)
Date: Thu, 12 Apr 2012 15:26:11 -0400
Subject: Code cache
In-Reply-To: <43E49E6EC0E84F41B98C68AB6D7820C4170FA028@sm-ex-02-vm.guidewire.com>
References: <43E49E6EC0E84F41B98C68AB6D7820C4170FA028@sm-ex-02-vm.guidewire.com>
Message-ID: <4320BF98-561E-43AC-85BC-9E291108AD9B@amd.com>

Hi Alex,
You can try -XX:+UseCodeCacheFlushing where the JVM will selectively age out some compiled code and free up code cache space. This is not on by default in JDK 6 as far as I know.
What is your application doing such that it frequently hits this problem?
Regards,
Eric

On Apr 12, 2012, at 3:15 PM, Alex Aisinzon wrote:
> Any feedback on this?
>
> Best
>
> Alex A
>
> From: Alex Aisinzon
> Sent: Monday, April 09, 2012 11:38 AM
> To: 'hotspot-gc-use at openjdk.java.net'
> Subject: Code cache
>
> I ran performance tests on one of our apps and saw the following error message in the GC logs:
>
> Java HotSpot(TM) 64-Bit Server VM warning: CodeCache is full. Compiler has been disabled.
>
> Java HotSpot(TM) 64-Bit Server VM warning: Try increasing the code cache size using -XX:ReservedCodeCacheSize=
>
> I scaled up the code cache to 512MB (-XX:ReservedCodeCacheSize=512m) and markedly improved performance/scalability.
>
> I have a few questions:
>
> * Is there a logging option that shows how much of the code cache is really used so that I find the right cache size without oversizing it?
>
> * What factors play into the code cache utilization? I would guess that the amount of code to compile is the dominant factor. Are there other factors like load: I would guess that some entries in the cache may get invalidated if not used much and load could be a factor in this.
>
> I was running on Sun JVM 1.6 update 30 64 bit on x86-64.
>
> Best
>
> Alex A
>
> _______________________________________________
> hotspot-gc-use mailing list
> hotspot-gc-use at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
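A minimal sketch of how the flags discussed in this thread could be combined on a JDK 6 command line. The 64m value and the MyApp class are placeholders rather than anything from the thread, and -XX:+UseCodeCacheFlushing only exists in sufficiently recent JDK 6 updates, so its availability should be checked against the exact JVM build:

    java -XX:ReservedCodeCacheSize=64m -XX:+UseCodeCacheFlushing -XX:+PrintCompilation MyApp

With -XX:+PrintCompilation the JVM prints one line per JIT compilation, so counting those lines up to the point where the "CodeCache is full. Compiler has been disabled." warning appears gives a rough lower bound on the code cache size the application actually needs.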
From aaisinzon at guidewire.com Thu Apr 12 13:30:33 2012
From: aaisinzon at guidewire.com (Alex Aisinzon)
Date: Thu, 12 Apr 2012 20:30:33 +0000
Subject: Code cache
In-Reply-To: <4320BF98-561E-43AC-85BC-9E291108AD9B@amd.com>
References: <43E49E6EC0E84F41B98C68AB6D7820C4170FA028@sm-ex-02-vm.guidewire.com> <4320BF98-561E-43AC-85BC-9E291108AD9B@amd.com>
Message-ID: <43E49E6EC0E84F41B98C68AB6D7820C4170FA1B8@sm-ex-02-vm.guidewire.com>

Hi Eric

I thank you for the feedback. I will give this tuning a try.

I have explored another approach: I added the option -XX:+PrintCompilation to track code compilation. This option is not well documented. I could infer that, without a larger code cache, about 11000 methods were compiled before hitting the issue. When using a much larger cache (512MB), I saw that about 14000 methods were compiled. My understanding is that the default code cache is 48MB for the platform I used (x64). A cache of 14000/11000 * 48MB, i.e. about 61MB, should therefore avoid the issue. I have started a performance test with a 64MB code cache to see if that indeed avoids the "code cache full" issue. If so, I would have a method for finding the right code cache size. I will report when I have the results. I will also report whether the -XX:+UseCodeCacheFlushing option provides results similar to the larger code cache.

As for your question on why our app is hitting this issue: our application has become heavier in its use of compiled code, so this is likely the consequence of that.

Best

Alex A

-----Original Message-----
From: Eric Caspole [mailto:eric.caspole at amd.com]
Sent: Thursday, April 12, 2012 12:26 PM
To: Alex Aisinzon
Cc: hotspot-gc-use at openjdk.java.net
Subject: Re: Code cache

Hi Alex,
You can try -XX:+UseCodeCacheFlushing where the JVM will selectively age out some compiled code and free up code cache space. This is not on by default in JDK 6 as far as I know.
What is your application doing such that it frequently hits this problem?
Regards,
Eric

On Apr 12, 2012, at 3:15 PM, Alex Aisinzon wrote:
> Any feedback on this?
>
> Best
>
> Alex A
>
> From: Alex Aisinzon
> Sent: Monday, April 09, 2012 11:38 AM
> To: 'hotspot-gc-use at openjdk.java.net'
> Subject: Code cache
>
> I ran performance tests on one of our apps and saw the following error message in the GC logs:
>
> Java HotSpot(TM) 64-Bit Server VM warning: CodeCache is full. Compiler has been disabled.
>
> Java HotSpot(TM) 64-Bit Server VM warning: Try increasing the code cache size using -XX:ReservedCodeCacheSize=
>
> I scaled up the code cache to 512MB (-XX:ReservedCodeCacheSize=512m) and markedly improved performance/scalability.
>
> I have a few questions:
>
> * Is there a logging option that shows how much of the code cache is really used so that I find the right cache size without oversizing it?
>
> * What factors play into the code cache utilization? I would guess that the amount of code to compile is the dominant factor. Are there other factors like load: I would guess that some entries in the cache may get invalidated if not used much and load could be a factor in this.
>
> I was running on Sun JVM 1.6 update 30 64 bit on x86-64.
>
> Best
>
> Alex A
>
> _______________________________________________
> hotspot-gc-use mailing list
> hotspot-gc-use at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
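On the question of how much of the code cache is really used: the JIT code cache is exposed as a non-heap memory pool through the standard java.lang.management API, so its live usage can be read from inside the running application (or remotely over JMX, e.g. in jconsole's Memory tab) rather than inferred from -XX:+PrintCompilation counts. A small sketch, assuming the pool name "Code Cache" reported by the HotSpot 1.6 VMs discussed here; the class and method names are made up for illustration:

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryPoolMXBean;
    import java.lang.management.MemoryUsage;

    public final class CodeCacheUsage {
        // Call from inside the running application, e.g. from a periodic logging task.
        public static void print() {
            for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
                // On HotSpot the JIT compiler's pool is reported under the name "Code Cache".
                if ("Code Cache".equals(pool.getName())) {
                    MemoryUsage u = pool.getUsage();
                    System.out.println("Code cache: used=" + u.getUsed() / 1024
                            + "K, committed=" + u.getCommitted() / 1024
                            + "K, max=" + u.getMax() / 1024 + "K");
                }
            }
        }
    }

The max of that pool corresponds to -XX:ReservedCodeCacheSize, so sampling it under load is another way to pick a cache size without oversizing it.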
From dawid.weiss at gmail.com Thu Apr 12 14:10:02 2012
From: dawid.weiss at gmail.com (Dawid Weiss)
Date: Thu, 12 Apr 2012 23:10:02 +0200
Subject: ParNew promotion failed, no expected OOM.
In-Reply-To:
References: <4F8675F7.9040205@oracle.com>
Message-ID:

I've spent some time trying to pinpoint the problem and provide a reproducible scenario but I temporarily accept the fact that I am defeated by the darn machine. Anyway, big thanks for the feedback, guys.

Dawid

On Thu, Apr 12, 2012 at 8:42 AM, Dawid Weiss wrote:
>> I never want to throw an OOM and then have to argue about whether
>> the OOM was thrown prematurely. That would be a bug. As a consequence
>
> I agree the tradeoff here is very subtle and there is probably no
> optimal setting. I'll dig deeper in a spare minute and see if I can
> reproduce this on a simpler example.
>
> Dawid

From alexey.ragozin at gmail.com Fri Apr 13 04:51:34 2012
From: alexey.ragozin at gmail.com (Alexey Ragozin)
Date: Fri, 13 Apr 2012 11:51:34 +0000
Subject: Need help about CMS Failure and ParNew failure
Message-ID:

Hi Erwin,

Promotion failures happen due to fragmentation of the old space, and it is normal for fragmentation to build up over time. The simplest way to fight fragmentation is to create a large old space from the start (and if you are using a JVM below 6u26, it is worth upgrading).

Concurrent mode failure means that the concurrent collection cycle is starting too late or that the heap is not large enough. Again, allocating more heap from the start is the simplest remedy.

Your logs also indicate a problem with the initial mark pause time.

I have written a simple guideline for setting up the CMS collector for minimal pauses; you can find more details at
http://blog.ragozin.info/2011/07/gc-check-list-for-data-grid-nodes.html

You can also read more about promotion failure / fragmentation at the links below:
http://blog.ragozin.info/2011/10/java-cg-hotspots-cms-and-heap.html
http://blog.ragozin.info/2011/10/cms-heap-fragmentation-follow-up-1.html
http://blog.ragozin.info/2011/11/java-gc-hotspots-cms-promotion-buffers.html

> Date: Thu, 12 Apr 2012 09:27:13 -0700 (PDT)
> From: Erwin
> Subject: Need help about CMS Failure and ParNew failure
> To: "hotspot-gc-use at openjdk.java.net"
> Message-ID: <1334248033.11363.YahooMailNeo at web111103.mail.gq1.yahoo.com>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Hello,
>
> I'm not an expert when it comes to analyzing GC output and was wondering if you guys could assist? We're using Solaris 10 10/08 s10s_u6wos_07b SPARC, with WAS NDE 7.0.0.9. After a restart of our JVMs, GC seems to be fine. However, after about a week, we start seeing failures in the GC log. We're getting ParNew and concurrent mode failures. Our JVM configurations are below:
> Min heap - 4096
> Max heap - 6016
>
> JVM Arguments
> -server -Xmn1000m -XX:PermSize=512m -XX:+UseConcMarkSweepGC -XX:+HeapDumpOnOutOfMemoryError -DUseSunHttpHandler=true -Djavax.xml.soap.MessageFactory=weblogic.xml.saaj.MessageFactoryImpl -Doracle.jdbc.V8Compatible=true -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSClassUnloadingEnabled -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -XX:-TraceClassUnloading -XX:+PrintHeapAtGC -XX:MaxPermSize=694m -XX:+DisableExplicitGC -XX:+CMSParallelRemarkEnabled
>
> I'm attaching the ParNew failure as well as CMS failure files. Hope it attaches.Total of 2 files. In case they don't see below. !st same if ParNew, 2nd is CMS failure. - Thanks, Erwin > ParNew failure sample: > {Heap before GC invocations=7800 (full 529): > ?par new generation?? total 921600K, used 530694K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) > ? eden space 819200K,? 52% used [0xfffffffdd0000000, 0xfffffffdea241bb8, 0xfffffffe02000000) > ? from space 102400K, 100% used [0xfffffffe02000000, 0xfffffffe08400000, 0xfffffffe08400000) > ? to?? space 102400K,?? 0% used [0xfffffffe08400000, 0xfffffffe08400000, 0xfffffffe0e800000) > ?concurrent mark-sweep generation total 4902464K, used 2682365K [0xfffffffe0e800000, 0xffffffff39b90000, 0xffffffff48000000) > ?concurrent-mark-sweep perm gen total 524288K, used 238782K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) > 552370.958: [GC 552370.958: [ParNew: 530694K->93237K(921600K), 0.1858500 secs] 3213060K->2832744K(5824064K), 0.1862466 secs] [Times: user=1.21 sys=0.08, real=0.19 secs] > Heap after GC invocations=7801 (full 529): > ?par new generation?? total 921600K, used 93237K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) > ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, 0xfffffffe02000000) > ? from space 102400K,? 91% used [0xfffffffe08400000, 0xfffffffe0df0d498, 0xfffffffe0e800000) > ? to?? space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) > ?concurrent mark-sweep generation total 4902464K, used 2739507K [0xfffffffe0e800000, 0xffffffff39b90000, 0xffffffff48000000) > ?concurrent-mark-sweep perm gen total 524288K, used 238782K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) > } > {Heap before GC invocations=7801 (full 529): > ?par new generation?? total 921600K, used 912377K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) > ? eden space 819200K,? 99% used [0xfffffffdd0000000, 0xfffffffe01ff1320, 0xfffffffe02000000) > ? from space 102400K,? 91% used [0xfffffffe08400000, 0xfffffffe0df0d498, 0xfffffffe0e800000) > ? to?? space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) > ?concurrent mark-sweep generation total 4902464K, used 2739507K [0xfffffffe0e800000, 0xffffffff39b90000, 0xffffffff48000000) > ?concurrent-mark-sweep perm gen total 524288K, used 238795K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) > 552372.849: [GC 552372.849: [ParNew (promotion failed): 912377K->869343K(921600K), 0.2641392 secs]552373.113: [CMS: 2791714K->3203612K(4902464K), 29.1902704 secs] 3651885K->3203612K(5824064K), [CMS Perm : 238795K->238246K(524288K)], 29.4609781 secs] [Times: user=30.05 sys=0.13, real=29.46 secs] > Heap after GC invocations=7802 (full 530): > ?par new generation?? total 921600K, used 0K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) > ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, 0xfffffffe02000000) > ? from space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) > ? to?? space 102400K,?? 0% used [0xfffffffe08400000, 0xfffffffe08400000, 0xfffffffe0e800000) > ?concurrent mark-sweep generation total 5136384K, used 3203612K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) > ?concurrent-mark-sweep perm gen total 524288K, used 238246K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) > } > {Heap before GC invocations=12696 (full 809): > ?par new generation?? 
total 921600K, used 908565K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) > ? eden space 819200K, 100% used [0xfffffffdd0000000, 0xfffffffe02000000, 0xfffffffe02000000) > ? from space 102400K,? 87% used [0xfffffffe02000000, 0xfffffffe07745510, 0xfffffffe08400000) > ? to?? space 102400K,?? 0% used [0xfffffffe08400000, 0xfffffffe08400000, 0xfffffffe0e800000) > ?concurrent mark-sweep generation total 5136384K, used 2696786K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) > ?concurrent-mark-sweep perm gen total 524288K, used 241380K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) > 980120.502: [GC 980120.502: [ParNew: 908565K->73974K(921600K), 0.1519646 secs] 3605352K->2817972K(6057984K), 0.1523927 secs] [Times: user=1.07 sys=0.03, real=0.15 secs] > Heap after GC invocations=12697 (full 809): > ?par new generation?? total 921600K, used 73974K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) > ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, 0xfffffffe02000000) > ? from space 102400K,? 72% used [0xfffffffe08400000, 0xfffffffe0cc3d928, 0xfffffffe0e800000) > ? to?? space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) > ?concurrent mark-sweep generation total 5136384K, used 2743998K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) > ?concurrent-mark-sweep perm gen total 524288K, used 241380K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) > } > {Heap before GC invocations=12697 (full 809): > ?par new generation?? total 921600K, used 893174K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) > ? eden space 819200K, 100% used [0xfffffffdd0000000, 0xfffffffe02000000, 0xfffffffe02000000) > ? from space 102400K,? 72% used [0xfffffffe08400000, 0xfffffffe0cc3d928, 0xfffffffe0e800000) > ? to?? space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) > ?concurrent mark-sweep generation total 5136384K, used 2743998K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) > ?concurrent-mark-sweep perm gen total 524288K, used 241411K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) > 980130.777: [GC 980130.777: [ParNew (promotion failed): 893174K->913391K(921600K), 0.5914616 secs]980131.368: [CMS: 2778416K->2710999K(5136384K), 27.7981960 secs] 3637172K->2710999K(6057984K), [CMS Perm : 241411K->240494K(524288K)], 28.3902578 secs] [Times: user=29.37 sys=0.08, real=28.39 secs] > Heap after GC invocations=12698 (full 810): > ?par new generation?? total 921600K, used 0K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) > ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, 0xfffffffe02000000) > ? from space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) > ? to?? space 102400K,?? 0% used [0xfffffffe08400000, 0xfffffffe08400000, 0xfffffffe0e800000) > ?concurrent mark-sweep generation total 5136384K, used 2710999K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) > ?concurrent-mark-sweep perm gen total 524288K, used 240494K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) > } > {Heap before GC invocations=12698 (full 810): > ?par new generation?? total 921600K, used 819200K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) > ? eden space 819200K, 100% used [0xfffffffdd0000000, 0xfffffffe02000000, 0xfffffffe02000000) > ? from space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) > ? to?? 
space 102400K,?? 0% used [0xfffffffe08400000, 0xfffffffe08400000, 0xfffffffe0e800000) > ?concurrent mark-sweep generation total 5136384K, used 2710999K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) > ?concurrent-mark-sweep perm gen total 524288K, used 240523K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) > 980171.033: [GC 980171.033: [ParNew: 819200K->102400K(921600K), 0.2144047 secs] 3530199K->2857892K(6057984K), 0.2149864 secs] [Times: user=0.99 sys=0.03, real=0.22 secs] > Heap after GC invocations=12699 (full 810): > ?par new generation?? total 921600K, used 102400K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) > ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, 0xfffffffe02000000) > ? from space 102400K, 100% used [0xfffffffe08400000, 0xfffffffe0e800000, 0xfffffffe0e800000) > ? to?? space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) > ?concurrent mark-sweep generation total 5136384K, used 2755492K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) > ?concurrent-mark-sweep perm gen total 524288K, used 240523K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) > } > > CMS Failure: > {Heap before GC invocations=23856 (full 1462): > ?par new generation?? total 921600K, used 921600K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) > ? eden space 819200K, 100% used [0xfffffffdd0000000, 0xfffffffe02000000, 0xfffffffe02000000) > ? from space 102400K, 100% used [0xfffffffe02000000, 0xfffffffe08400000, 0xfffffffe08400000) > ? to?? space 102400K,?? 0% used [0xfffffffe08400000, 0xfffffffe08400000, 0xfffffffe0e800000) > ?concurrent mark-sweep generation total 5136384K, used 3496920K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) > ?concurrent-mark-sweep perm gen total 524288K, used 243562K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) > 1761014.981: [GC 1761014.981: [ParNew: 921600K->102400K(921600K), 0.3004508 secs] 4418520K->3667695K(6057984K), 0.3008667 secs] [Times: user=1.69 sys=0.10, real=0.30 secs] > Heap after GC invocations=23857 (full 1462): > ?par new generation?? total 921600K, used 102400K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) > ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, 0xfffffffe02000000) > ? from space 102400K, 100% used [0xfffffffe08400000, 0xfffffffe0e800000, 0xfffffffe0e800000) > ? to?? space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) > ?concurrent mark-sweep generation total 5136384K, used 3565295K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) > ?concurrent-mark-sweep perm gen total 524288K, used 243562K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) > } > {Heap before GC invocations=23857 (full 1462): > ?par new generation?? total 921600K, used 921600K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) > ? eden space 819200K, 100% used [0xfffffffdd0000000, 0xfffffffe02000000, 0xfffffffe02000000) > ? from space 102400K, 100% used [0xfffffffe08400000, 0xfffffffe0e800000, 0xfffffffe0e800000) > ? to?? space 102400K,?? 
0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) > ?concurrent mark-sweep generation total 5136384K, used 3565295K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) > ?concurrent-mark-sweep perm gen total 524288K, used 243773K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) > 1761036.997: [GC 1761036.998: [ParNew: 921600K->102400K(921600K), 0.4075457 secs] 4486895K->3763683K(6057984K), 0.4079591 secs] [Times: user=3.57 sys=0.80, real=0.41 secs] > Heap after GC invocations=23858 (full 1462): > ?par new generation?? total 921600K, used 102400K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) > ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, 0xfffffffe02000000) > ? from space 102400K, 100% used [0xfffffffe02000000, 0xfffffffe08400000, 0xfffffffe08400000) > ? to?? space 102400K,?? 0% used [0xfffffffe08400000, 0xfffffffe08400000, 0xfffffffe0e800000) > ?concurrent mark-sweep generation total 5136384K, used 3661283K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) > ?concurrent-mark-sweep perm gen total 524288K, used 243773K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) > } > 1761037.410: [GC [1 CMS-initial-mark: 3661283K(5136384K)] 3763683K(6057984K), 0.0883369 secs] [Times: user=0.09 sys=0.00, real=0.09 secs] > 1761037.499: [CMS-concurrent-mark-start] > 1761041.447: [CMS-concurrent-mark: 3.906/3.948 secs] [Times: user=25.81 sys=1.06, real=3.95 secs] > 1761041.448: [CMS-concurrent-preclean-start] > 1761041.763: [CMS-concurrent-preclean: 0.312/0.315 secs] [Times: user=0.50 sys=0.02, real=0.32 secs] > 1761041.763: [CMS-concurrent-abortable-preclean-start] > ?CMS: abort preclean due to time 1761046.800: [CMS-concurrent-abortable-preclean: 4.720/5.036 secs] [Times: user=6.68 sys=0.18, real=5.04 secs] > 1761046.808: [GC[YG occupancy: 464701 K (921600 K)]1761046.808: [Rescan (parallel) , 0.3034664 secs]1761047.112: [weak refs processing, 0.0152564 secs]1761047.128: [class unloading, 0.1518160 secs]1761047.280: [scrub symbol & string tables, 0.1332523 secs] [1 CMS-remark: 3661283K(5136384K)] 4125985K(6057984K), 0.6980401 secs] [Times: user=1.34 sys=0.70, real=0.70 secs] > 1761047.507: [CMS-concurrent-sweep-start] > 1761051.779: [CMS-concurrent-sweep: 4.252/4.271 secs] [Times: user=6.30 sys=0.19, real=4.27 secs] > 1761051.779: [CMS-concurrent-reset-start] > 1761051.837: [CMS-concurrent-reset: 0.058/0.058 secs] [Times: user=0.07 sys=0.00, real=0.06 secs] > {Heap before GC invocations=23858 (full 1463): > ?par new generation?? total 921600K, used 921600K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) > ? eden space 819200K, 100% used [0xfffffffdd0000000, 0xfffffffe02000000, 0xfffffffe02000000) > ? from space 102400K, 100% used [0xfffffffe02000000, 0xfffffffe08400000, 0xfffffffe08400000) > ? to?? space 102400K,?? 0% used [0xfffffffe08400000, 0xfffffffe08400000, 0xfffffffe0e800000) > ?concurrent mark-sweep generation total 5136384K, used 3514703K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) > ?concurrent-mark-sweep perm gen total 524288K, used 243613K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) > 1761061.144: [GC 1761061.145: [ParNew: 921600K->102400K(921600K), 0.4124278 secs] 4436303K->3717736K(6057984K), 0.4128777 secs] [Times: user=1.98 sys=0.19, real=0.41 secs] > Heap after GC invocations=23859 (full 1463): > ?par new generation?? total 921600K, used 102400K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) > ? eden space 819200K,?? 
0% used [0xfffffffdd0000000, 0xfffffffdd0000000, 0xfffffffe02000000) > ? from space 102400K, 100% used [0xfffffffe08400000, 0xfffffffe0e800000, 0xfffffffe0e800000) > ? to?? space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) > ?concurrent mark-sweep generation total 5136384K, used 3615336K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) > ?concurrent-mark-sweep perm gen total 524288K, used 243613K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) > } > 1761061.562: [GC [1 CMS-initial-mark: 3615336K(5136384K)] 3718604K(6057984K), 0.0660086 secs] [Times: user=0.07 sys=0.00, real=0.07 secs] > 1761061.629: [CMS-concurrent-mark-start] > 1761065.589: [CMS-concurrent-mark: 3.920/3.960 secs] [Times: user=26.20 sys=1.05, real=3.96 secs] > 1761065.590: [CMS-concurrent-preclean-start] > 1761065.883: [CMS-concurrent-preclean: 0.290/0.293 secs] [Times: user=0.54 sys=0.02, real=0.29 secs] > 1761065.883: [CMS-concurrent-abortable-preclean-start] > ?CMS: abort preclean due to time 1761070.950: [CMS-concurrent-abortable-preclean: 5.035/5.067 secs] [Times: user=10.70 sys=0.36, real=5.07 secs] > 1761070.958: [GC[YG occupancy: 656197 K (921600 K)]1761070.959: [Rescan (parallel) , 0.5056315 secs]1761071.465: [weak refs processing, 0.0107058 secs]1761071.476: [class unloading, 0.1500832 secs]1761071.626: [scrub symbol & string tables, 0.1278517 secs] [1 CMS-remark: 3615336K(5136384K)] 4271533K(6057984K), 0.8857121 secs] [Times: user=1.77 sys=1.08, real=0.89 secs] > 1761071.845: [CMS-concurrent-sweep-start] > 1761075.956: [CMS-concurrent-sweep: 4.094/4.111 secs] [Times: user=7.97 sys=0.27, real=4.11 secs] > 1761075.957: [CMS-concurrent-reset-start] > 1761076.031: [CMS-concurrent-reset: 0.063/0.074 secs] [Times: user=0.13 sys=0.01, real=0.07 secs] > {Heap before GC invocations=23859 (full 1464): > ?par new generation?? total 921600K, used 921600K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) > ? eden space 819200K, 100% used [0xfffffffdd0000000, 0xfffffffe02000000, 0xfffffffe02000000) > ? from space 102400K, 100% used [0xfffffffe08400000, 0xfffffffe0e800000, 0xfffffffe0e800000) > ? to?? space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) > ?concurrent mark-sweep generation total 5136384K, used 3544377K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) > ?concurrent-mark-sweep perm gen total 524288K, used 243474K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) > 1761077.540: [GC 1761077.540: [ParNew: 921600K->102400K(921600K), 0.4030394 secs] 4465977K->3740399K(6057984K), 0.4034742 secs] [Times: user=3.14 sys=0.55, real=0.40 secs] > Heap after GC invocations=23860 (full 1464): > ?par new generation?? total 921600K, used 102400K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) > ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, 0xfffffffe02000000) > ? from space 102400K, 100% used [0xfffffffe02000000, 0xfffffffe08400000, 0xfffffffe08400000) > ? to?? space 102400K,?? 
0% used [0xfffffffe08400000, 0xfffffffe08400000, 0xfffffffe0e800000) > ?concurrent mark-sweep generation total 5136384K, used 3637999K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) > ?concurrent-mark-sweep perm gen total 524288K, used 243474K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) > } > 1761077.948: [GC [1 CMS-initial-mark: 3637999K(5136384K)] 3740403K(6057984K), 0.0664402 secs] [Times: user=0.06 sys=0.00, real=0.07 secs] > 1761078.015: [CMS-concurrent-mark-start] > 1761082.141: [CMS-concurrent-mark: 4.076/4.126 secs] [Times: user=25.56 sys=1.24, real=4.13 secs] > 1761082.142: [CMS-concurrent-preclean-start] > 1761082.435: [CMS-concurrent-preclean: 0.290/0.293 secs] [Times: user=0.56 sys=0.03, real=0.29 secs] > 1761082.435: [CMS-concurrent-abortable-preclean-start] > ?CMS: abort preclean due to time 1761087.544: [CMS-concurrent-abortable-preclean: 4.166/5.108 secs] [Times: user=6.79 sys=0.38, real=5.11 secs] > 1761087.554: [GC[YG occupancy: 612230 K (921600 K)]1761087.555: [Rescan (parallel) , 0.3453344 secs]1761087.900: [weak refs processing, 0.0033384 secs]1761087.904: [class unloading, 0.1515234 secs]1761088.055: [scrub symbol & string tables, 0.1280533 secs] [1 CMS-remark: 3637999K(5136384K)] 4250230K(6057984K), 0.7189376 secs] [Times: user=1.29 sys=0.76, real=0.72 secs] > 1761088.274: [CMS-concurrent-sweep-start] > 1761092.543: [CMS-concurrent-sweep: 4.268/4.268 secs] [Times: user=6.72 sys=0.26, real=4.27 secs] > 1761092.543: [CMS-concurrent-reset-start] > 1761092.606: [CMS-concurrent-reset: 0.063/0.063 secs] [Times: user=0.11 sys=0.01, real=0.06 secs] > {Heap before GC invocations=23860 (full 1465): > ?par new generation?? total 921600K, used 921600K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) > ? eden space 819200K, 100% used [0xfffffffdd0000000, 0xfffffffe02000000, 0xfffffffe02000000) > ? from space 102400K, 100% used [0xfffffffe02000000, 0xfffffffe08400000, 0xfffffffe08400000) > ? to?? space 102400K,?? 0% used [0xfffffffe08400000, 0xfffffffe08400000, 0xfffffffe0e800000) > ?concurrent mark-sweep generation total 5136384K, used 3582457K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) > ?concurrent-mark-sweep perm gen total 524288K, used 243634K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) > 1761096.682: [GC 1761096.682: [ParNew: 921600K->102400K(921600K), 0.2843209 secs] 4504057K->3786219K(6057984K), 0.2847419 secs] [Times: user=1.81 sys=0.10, real=0.29 secs] > Heap after GC invocations=23861 (full 1465): > ?par new generation?? total 921600K, used 102400K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) > ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, 0xfffffffe02000000) > ? from space 102400K, 100% used [0xfffffffe08400000, 0xfffffffe0e800000, 0xfffffffe0e800000) > ? to?? space 102400K,?? 
0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) > ?concurrent mark-sweep generation total 5136384K, used 3683819K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) > ?concurrent-mark-sweep perm gen total 524288K, used 243634K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) > } > 1761096.971: [GC [1 CMS-initial-mark: 3683819K(5136384K)] 3786817K(6057984K), 0.0480239 secs] [Times: user=0.05 sys=0.00, real=0.05 secs] > 1761097.020: [CMS-concurrent-mark-start] > 1761101.145: [CMS-concurrent-mark: 4.104/4.124 secs] [Times: user=24.60 sys=1.09, real=4.13 secs] > 1761101.145: [CMS-concurrent-preclean-start] > 1761101.438: [CMS-concurrent-preclean: 0.290/0.292 secs] [Times: user=0.41 sys=0.02, real=0.29 secs] > 1761101.438: [CMS-concurrent-abortable-preclean-start] > ?CMS: abort preclean due to time 1761106.478: [CMS-concurrent-abortable-preclean: 4.694/5.040 secs] [Times: user=7.32 sys=0.23, real=5.04 secs] > 1761106.486: [GC[YG occupancy: 497634 K (921600 K)]1761106.487: [Rescan (parallel) , 0.3384965 secs]1761106.825: [weak refs processing, 0.0030734 secs]1761106.829: [class unloading, 0.1503426 secs]1761106.979: [scrub symbol & string tables, 0.1273150 secs] [1 CMS-remark: 3683819K(5136384K)] 4181454K(6057984K), 0.7055549 secs] [Times: user=1.25 sys=0.64, real=0.71 secs] > 1761107.193: [CMS-concurrent-sweep-start] > 1761111.281: [CMS-concurrent-sweep: 4.088/4.088 secs] [Times: user=5.81 sys=0.15, real=4.09 secs] > 1761111.282: [CMS-concurrent-reset-start] > 1761111.349: [CMS-concurrent-reset: 0.068/0.068 secs] [Times: user=0.08 sys=0.00, real=0.07 secs] > 1761111.961: [GC [1 CMS-initial-mark: 3633902K(5136384K)] 4261007K(6057984K), 0.5015835 secs] [Times: user=0.50 sys=0.00, real=0.50 secs] > 1761112.463: [CMS-concurrent-mark-start] > 1761116.550: [CMS-concurrent-mark: 4.036/4.087 secs] [Times: user=24.85 sys=1.09, real=4.09 secs] > 1761116.551: [CMS-concurrent-preclean-start] > 1761116.901: [CMS-concurrent-preclean: 0.344/0.350 secs] [Times: user=0.54 sys=0.01, real=0.35 secs] > 1761116.901: [CMS-concurrent-abortable-preclean-start] > {Heap before GC invocations=23861 (full 1467): > ?par new generation?? total 921600K, used 921600K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) > ? eden space 819200K, 100% used [0xfffffffdd0000000, 0xfffffffe02000000, 0xfffffffe02000000) > ? from space 102400K, 100% used [0xfffffffe08400000, 0xfffffffe0e800000, 0xfffffffe0e800000) > ? to?? space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) > ?concurrent mark-sweep generation total 5136384K, used 3633902K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) > ?concurrent-mark-sweep perm gen total 524288K, used 243740K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) > 1761120.554: [GC 1761120.554: [ParNew: 921600K->102400K(921600K), 0.4726199 secs] 4555502K->3819626K(6057984K), 0.4732486 secs] [Times: user=3.31 sys=0.69, real=0.47 secs] > Heap after GC invocations=23862 (full 1467): > ?par new generation?? total 921600K, used 102400K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) > ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, 0xfffffffe02000000) > ? from space 102400K, 100% used [0xfffffffe02000000, 0xfffffffe08400000, 0xfffffffe08400000) > ? to?? space 102400K,?? 
0% used [0xfffffffe08400000, 0xfffffffe08400000, 0xfffffffe0e800000) > ?concurrent mark-sweep generation total 5136384K, used 3717226K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) > ?concurrent-mark-sweep perm gen total 524288K, used 243740K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) > } > ?CMS: abort preclean due to time 1761122.392: [CMS-concurrent-abortable-preclean: 4.729/5.490 secs] [Times: user=11.71 sys=0.97, real=5.49 secs] > 1761122.401: [GC[YG occupancy: 177317 K (921600 K)]1761122.401: [Rescan (parallel) , 0.0250334 secs]1761122.427: [weak refs processing, 0.0002699 secs]1761122.427: [class unloading, 0.0817179 secs]1761122.509: [scrub symbol & string tables, 0.1383120 secs] [1 CMS-remark: 3717226K(5136384K)] 3894544K(6057984K), 0.3327016 secs] [Times: user=0.55 sys=0.04, real=0.33 secs] > 1761122.735: [CMS-concurrent-sweep-start] > 1761126.843: [CMS-concurrent-sweep: 4.042/4.108 secs] [Times: user=6.70 sys=0.39, real=4.11 secs] > 1761126.844: [CMS-concurrent-reset-start] > 1761126.907: [CMS-concurrent-reset: 0.063/0.063 secs] [Times: user=0.11 sys=0.00, real=0.06 secs] > 1761127.142: [GC [1 CMS-initial-mark: 3701154K(5136384K)] 4056638K(6057984K), 0.2853309 secs] [Times: user=0.29 sys=0.00, real=0.29 secs] > 1761127.428: [CMS-concurrent-mark-start] > 1761131.876: [CMS-concurrent-mark: 4.398/4.448 secs] [Times: user=28.46 sys=1.55, real=4.45 secs] > 1761131.877: [CMS-concurrent-preclean-start] > 1761132.185: [CMS-concurrent-preclean: 0.305/0.308 secs] [Times: user=0.60 sys=0.05, real=0.31 secs] > 1761132.186: [CMS-concurrent-abortable-preclean-start] > ?CMS: abort preclean due to time 1761137.243: [CMS-concurrent-abortable-preclean: 5.029/5.058 secs] [Times: user=9.88 sys=0.42, real=5.06 secs] > 1761137.248: [GC[YG occupancy: 783876 K (921600 K)]1761137.248: [Rescan (parallel) , 0.5402015 secs]1761137.789: [weak refs processing, 0.0022809 secs]1761137.791: [class unloading, 0.1556933 secs]1761137.947: [scrub symbol & string tables, 0.1291759 secs] [1 CMS-remark: 3701154K(5136384K)] 4485030K(6057984K), 0.9154842 secs] [Times: user=1.67 sys=0.97, real=0.92 secs] > 1761138.164: [CMS-concurrent-sweep-start] > {Heap before GC invocations=23862 (full 1468): > ?par new generation?? total 921600K, used 920346K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) > ? eden space 819200K,? 99% used [0xfffffffdd0000000, 0xfffffffe01ec6a48, 0xfffffffe02000000) > ? from space 102400K, 100% used [0xfffffffe02000000, 0xfffffffe08400000, 0xfffffffe08400000) > ? to?? space 102400K,?? 0% used [0xfffffffe08400000, 0xfffffffe08400000, 0xfffffffe0e800000) > ?concurrent mark-sweep generation total 5136384K, used 3694838K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) > ?concurrent-mark-sweep perm gen total 524288K, used 243810K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) > 1761140.028: [GC 1761140.028: [ParNew: 920346K->102400K(921600K), 0.4882607 secs] 4615185K->4017461K(6057984K), 0.4886748 secs] [Times: user=2.71 sys=0.20, real=0.49 secs] > Heap after GC invocations=23863 (full 1468): > ?par new generation?? total 921600K, used 102400K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) > ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, 0xfffffffe02000000) > ? from space 102400K, 100% used [0xfffffffe08400000, 0xfffffffe0e800000, 0xfffffffe0e800000) > ? to?? space 102400K,?? 
0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) > ?concurrent mark-sweep generation total 5136384K, used 3915061K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) > ?concurrent-mark-sweep perm gen total 524288K, used 243810K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) > } > 1761142.726: [CMS-concurrent-sweep: 4.011/4.562 secs] [Times: user=12.54 sys=0.74, real=4.56 secs] > 1761142.727: [CMS-concurrent-reset-start] > 1761142.791: [CMS-concurrent-reset: 0.064/0.064 secs] [Times: user=0.16 sys=0.01, real=0.06 secs] > 1761143.233: [GC [1 CMS-initial-mark: 3852859K(5136384K)] 4152461K(6057984K), 0.2340877 secs] [Times: user=0.23 sys=0.00, real=0.23 secs] > 1761143.467: [CMS-concurrent-mark-start] > 1761147.673: [CMS-concurrent-mark: 4.182/4.205 secs] [Times: user=26.19 sys=1.27, real=4.21 secs] > 1761147.673: [CMS-concurrent-preclean-start] > 1761147.978: [CMS-concurrent-preclean: 0.300/0.304 secs] [Times: user=0.44 sys=0.02, real=0.30 secs] > 1761147.978: [CMS-concurrent-abortable-preclean-start] > {Heap before GC invocations=23863 (full 1469): > ?par new generation?? total 921600K, used 602663K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) > ? eden space 819200K,? 61% used [0xfffffffdd0000000, 0xfffffffdee889c90, 0xfffffffe02000000) > ? from space 102400K, 100% used [0xfffffffe08400000, 0xfffffffe0e800000, 0xfffffffe0e800000) > ? to?? space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) > ?concurrent mark-sweep generation total 5136384K, used 3852859K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) > ?concurrent-mark-sweep perm gen total 524288K, used 243656K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) > 1761152.969: [GC 1761152.969: [ParNew: 602663K->102400K(921600K), 0.4710031 secs]1761153.440: [CMS CMS: abort preclean due to time 1761155.705: [CMS-concurrent-abortable-preclean: 6.957/7.726 secs] [Times: user=13.02 sys=0.48, real=7.73 secs] > ?(concurrent mode failure): 4005428K->3905404K(5136384K), 32.6670849 secs] 4455522K->3905404K(6057984K), [CMS Perm : 243656K->243327K(524288K)], 33.1389061 secs] [Times: user=35.38 sys=0.26, real=33.14 secs] > Heap after GC invocations=23864 (full 1470): > ?par new generation?? total 921600K, used 0K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) > ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, 0xfffffffe02000000) > ? from space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) > ? to?? space 102400K,?? 
0% used [0xfffffffe08400000, 0xfffffffe08400000, 0xfffffffe0e800000) > ?concurrent mark-sweep generation total 5136384K, used 3905404K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) > ?concurrent-mark-sweep perm gen total 524288K, used 243327K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) > } > 1761187.694: [GC [1 CMS-initial-mark: 4953978K(5136384K)] 4965714K(6057984K), 0.0131890 secs] [Times: user=0.01 sys=0.00, real=0.01 secs] > 1761187.708: [CMS-concurrent-mark-start] > 1761191.965: [CMS-concurrent-mark: 3.634/4.257 secs] [Times: user=32.76 sys=1.91, real=4.26 secs] > 1761191.966: [CMS-concurrent-preclean-start] > 1761192.543: [CMS-concurrent-preclean: 0.553/0.577 secs] [Times: user=1.56 sys=0.12, real=0.58 secs] > 1761192.544: [CMS-concurrent-abortable-preclean-start] > ?CMS: abort preclean due to time 1761197.612: [CMS-concurrent-abortable-preclean: 2.079/5.068 secs] [Times: user=9.11 sys=0.60, real=5.07 secs] > 1761197.617: [GC[YG occupancy: 813510 K (921600 K)]1761197.618: [Rescan (parallel) , 0.7500635 secs]1761198.368: [weak refs processing, 0.0020064 secs]1761198.370: [class unloading, 0.0823783 secs]1761198.453: [scrub symbol & string tables, 0.1278387 secs] [1 CMS-remark: 4953978K(5136384K)] 5767489K(6057984K), 1.0496971 secs] [Times: user=2.69 sys=1.79, real=1.05 secs] > 1761198.668: [CMS-concurrent-sweep-start] > {Heap before GC invocations=23864 (full 1471): > ?par new generation?? total 921600K, used 819200K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) > ? eden space 819200K, 100% used [0xfffffffdd0000000, 0xfffffffe02000000, 0xfffffffe02000000) > ? from space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) > ? to?? space 102400K,?? 0% used [0xfffffffe08400000, 0xfffffffe08400000, 0xfffffffe0e800000) > ?concurrent mark-sweep generation total 5136384K, used 4953976K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) > ?concurrent-mark-sweep perm gen total 524288K, used 243422K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) > 1761198.700: [GC 1761198.700: [ParNew: 819200K->819200K(921600K), 0.0000919 secs]1761198.700: [CMS1761202.072: [CMS-concurrent-sweep: 3.389/3.404 secs] [Times: user=3.60 sys=0.04, real=3.40 secs] > ?(concurrent mode failure): 4953976K->3789438K(5136384K), 32.6623615 secs] 5773176K->3789438K(6057984K), [CMS Perm : 243422K->243328K(524288K)], 32.6632802 secs] [Times: user=32.58 sys=0.03, real=32.66 secs] > Heap after GC invocations=23865 (full 1472): > ?par new generation?? total 921600K, used 0K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) > ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, 0xfffffffe02000000) > ? from space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) > ? to?? space 102400K,?? 
0% used [0xfffffffe08400000, 0xfffffffe08400000, 0xfffffffe0e800000) > ?concurrent mark-sweep generation total 5136384K, used 3789438K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) > ?concurrent-mark-sweep perm gen total 524288K, used 243328K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) > } > 1761231.418: [GC [1 CMS-initial-mark: 3789438K(5136384K)] 3883471K(6057984K), 0.0609784 secs] [Times: user=0.06 sys=0.01, real=0.06 secs] > 1761231.480: [CMS-concurrent-mark-start] > 1761236.061: [CMS-concurrent-mark: 3.752/4.580 secs] [Times: user=34.48 sys=2.81, real=4.58 secs] > 1761236.061: [CMS-concurrent-preclean-start] > 1761236.428: [CMS-concurrent-preclean: 0.358/0.367 secs] [Times: user=0.46 sys=0.01, real=0.37 secs] > 1761236.429: [CMS-concurrent-abortable-preclean-start] > ?CMS: abort preclean due to time 1761241.488: [CMS-concurrent-abortable-preclean: 2.384/5.059 secs] [Times: user=5.30 sys=0.75, real=5.06 secs] > 1761241.497: [GC[YG occupancy: 787969 K (921600 K)]1761241.497: [Rescan (parallel) , 0.5938799 secs]1761242.091: [weak refs processing, 0.0067469 secs]1761242.098: [class unloading, 0.0826078 secs]1761242.181: [scrub symbol & string tables, 0.1308434 secs] [1 CMS-remark: 3789438K(5136384K)] 4577408K(6057984K), 0.9017583 secs] [Times: user=2.66 sys=2.07, real=0.90 secs] > 1761242.400: [CMS-concurrent-sweep-start] > {Heap before GC invocations=23865 (full 1473): > ?par new generation?? total 921600K, used 819200K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) > ? eden space 819200K, 100% used [0xfffffffdd0000000, 0xfffffffe02000000, 0xfffffffe02000000) > ? from space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) > ? to?? space 102400K,?? 0% used [0xfffffffe08400000, 0xfffffffe08400000, 0xfffffffe0e800000) > ?concurrent mark-sweep generation total 5136384K, used 3789391K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) > ?concurrent-mark-sweep perm gen total 524288K, used 243406K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) > 1761242.586: [GC 1761242.586: [ParNew: 819200K->102400K(921600K), 0.1871926 secs] 4608591K->3940305K(6057984K), 0.1879045 secs] [Times: user=0.93 sys=0.05, real=0.19 secs] > Heap after GC invocations=23866 (full 1473): > ?par new generation?? total 921600K, used 102400K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) > ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, 0xfffffffe02000000) > ? from space 102400K, 100% used [0xfffffffe08400000, 0xfffffffe0e800000, 0xfffffffe0e800000) > ? to?? space 102400K,?? 
0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) > ?concurrent mark-sweep generation total 5136384K, used 3837905K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) > ?concurrent-mark-sweep perm gen total 524288K, used 243406K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) > } > 1761245.857: [CMS-concurrent-sweep: 3.225/3.457 secs] [Times: user=6.21 sys=0.52, real=3.46 secs] > 1761245.858: [CMS-concurrent-reset-start] > 1761245.922: [CMS-concurrent-reset: 0.064/0.064 secs] [Times: user=0.08 sys=0.01, real=0.06 secs] > 1761247.301: [GC [1 CMS-initial-mark: 3676150K(5136384K)] 3952072K(6057984K), 0.2229528 secs] [Times: user=0.22 sys=0.00, real=0.22 secs] > 1761247.525: [CMS-concurrent-mark-start] > 1761251.076: [CMS-concurrent-mark: 3.510/3.551 secs] [Times: user=23.68 sys=0.85, real=3.55 secs] > 1761251.076: [CMS-concurrent-preclean-start] > 1761251.375: [CMS-concurrent-preclean: 0.295/0.298 secs] [Times: user=0.72 sys=0.04, real=0.30 secs] > 1761251.375: [CMS-concurrent-abortable-preclean-start] > ?CMS: abort preclean due to time 1761256.460: [CMS-concurrent-abortable-preclean: 5.012/5.085 secs] [Times: user=9.93 sys=0.99, real=5.09 secs] > 1761256.469: [GC[YG occupancy: 720909 K (921600 K)]1761256.469: [Rescan (parallel) , 0.4663462 secs]1761256.936: [weak refs processing, 0.0153453 secs]1761256.951: [class unloading, 0.0833874 secs]1761257.035: [scrub symbol & string tables, 0.1289153 secs] [1 CMS-remark: 3676150K(5136384K)] 4397060K(6057984K), 0.7879219 secs] [Times: user=1.55 sys=0.96, real=0.79 secs] > 1761257.258: [CMS-concurrent-sweep-start] > {Heap before GC invocations=23866 (full 1474): > ?par new generation?? total 921600K, used 921600K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) > ? eden space 819200K, 100% used [0xfffffffdd0000000, 0xfffffffe02000000, 0xfffffffe02000000) > ? from space 102400K, 100% used [0xfffffffe08400000, 0xfffffffe0e800000, 0xfffffffe0e800000) > ? to?? space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) > ?concurrent mark-sweep generation total 5136384K, used 3669509K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) > ?concurrent-mark-sweep perm gen total 524288K, used 243414K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) > 1761259.137: [GC 1761259.138: [ParNew: 921600K->102400K(921600K), 0.3975686 secs] 4591109K->3894075K(6057984K), 0.3981608 secs] [Times: user=1.65 sys=0.15, real=0.40 secs] > Heap after GC invocations=23867 (full 1474): > ?par new generation?? total 921600K, used 102400K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) > ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, 0xfffffffe02000000) > ? from space 102400K, 100% used [0xfffffffe02000000, 0xfffffffe08400000, 0xfffffffe08400000) > ? to?? space 102400K,?? 0% used [0xfffffffe08400000, 0xfffffffe08400000, 0xfffffffe0e800000) > ?concurrent mark-sweep generation total 5136384K, used 3791675K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) > ?concurrent-mark-sweep perm gen total 524288K, used 243414K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) > } > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120412/16cc56a7/attachment.html > -------------- next part -------------- > An embedded and charset-unspecified text was scrubbed... 
> Name: CMS Failure.txt
> Url: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120412/16cc56a7/CMSFailure.txt
> -------------- next part --------------
> An embedded and charset-unspecified text was scrubbed...
> Name: PARNEW Failure.txt
> Url: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120412/16cc56a7/PARNEWFailure.txt

From rednaxelafx at gmail.com Fri Apr 13 06:44:52 2012
From: rednaxelafx at gmail.com (Krystal Mok)
Date: Fri, 13 Apr 2012 21:44:52 +0800
Subject: Code cache
In-Reply-To: <43E49E6EC0E84F41B98C68AB6D7820C4170F7515@sm-ex-02-vm.guidewire.com>
References: <43E49E6EC0E84F41B98C68AB6D7820C4170F7515@sm-ex-02-vm.guidewire.com>
Message-ID: 

Hi Alex,

On Tue, Apr 10, 2012 at 2:37 AM, Alex Aisinzon wrote:
>
> * Is there a logging option that shows how much of the code
> cache is really used so that I find the right cache size without oversizing
> it?
>
FYI, you can use JConsole or other JMX clients to see the usage of the code
cache [1]

- Kris

[1]: http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2012-March/003353.html
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120413/199fc271/attachment.html

From alexey.ragozin at gmail.com Sun Apr 15 01:12:30 2012
From: alexey.ragozin at gmail.com (Alexey Ragozin)
Date: Sun, 15 Apr 2012 12:12:30 +0400
Subject: Need help about CMS Failure and ParNew failure
In-Reply-To: <1334429881.40648.YahooMailNeo@web111106.mail.gq1.yahoo.com>
References: <1334429881.40648.YahooMailNeo@web111106.mail.gq1.yahoo.com>
Message-ID: 

Hi,

On Sat, Apr 14, 2012 at 10:58 PM, Erwin wrote:
> Alexey,
>
> Thanks for the tips. I read your other links. Several questions for you:
> 1. Our min heap is 4096 and max is 6016. To combat heap fragmentation, we
> should try increasing max to 8gb? Our young space is 1GB by setting
> -Xmn1000m so increasing to 8gb will give old space an extra 2gb?

Correct. I would also suggest using the same size for -Xms and -Xmx.

> 2. I should also set my -XX:CMSInitiatingOccupancyFraction=70 to something
> like 60 to initiate CMS sooner and prevent CMS failure?

You should also set -XX:+UseCMSInitiatingOccupancyOnly; otherwise the JVM
may override your settings.

> 3. Which WAS NDE version has JDK 6u26? We're upgrading from 7.0.0.9 to 7.0.0.21.

I cannot help you here.

>
> Thanks,
> Erwin
>
> From: Alexey Ragozin
> To: tanman12345 at yahoo.com; hotspot-gc-use at openjdk.java.net
> Sent: Friday, April 13, 2012 6:51 AM
> Subject: Re: Need help about CMS Failure and ParNew failure
>
> Hi Erwin,
>
> Promotion failures happen due to fragmentation of the old space. It is
> normal for fragmentation to build up over time. The simplest way to fight
> fragmentation is to create a large old space from the start (if you are on
> a JVM below 6u26, it is worth upgrading).
>
> Concurrent mode failure means that the concurrent collection cycle is
> starting too late or the heap size is not enough. Again, allocating more
> heap from the start is the simplest remedy.
>
> Your logs also indicate a problem with the initial-mark pause time.
> I have written a simple guideline for setting up the CMS collector for
> minimal pauses; you can find more details at the link below:
> http://blog.ragozin.info/2011/07/gc-check-list-for-data-grid-nodes.html
>
> You can also read more about promotion failures / fragmentation at the
> links below:
> http://blog.ragozin.info/2011/10/java-cg-hotspots-cms-and-heap.html
> http://blog.ragozin.info/2011/10/cms-heap-fragmentation-follow-up-1.html
> http://blog.ragozin.info/2011/11/java-gc-hotspots-cms-promotion-buffers.html
>
>> Date: Thu, 12 Apr 2012 09:27:13 -0700 (PDT)
>> From: Erwin
>> Subject: Need help about CMS Failure and ParNew failure
>> To: "hotspot-gc-use at openjdk.java.net"
>>
>> Message-ID:
>>         <1334248033.11363.YahooMailNeo at web111103.mail.gq1.yahoo.com>
>> Content-Type: text/plain; charset="iso-8859-1"
>>
>> Hello,
>>
>> I'm not an expert when it comes to analyzing GC output and was wondering
>> if you guys could assist. We're using Solaris 10 10/08 s10s_u6wos_07b SPARC,
>> with WAS NDE 7.0.0.9. After a restart of our JVMs, GC seems to be fine.
>> However, after about a week, we start seeing failures in the GC log. We're
>> getting ParNew and concurrent mode failures. Our JVM configuration is below:
>> Min heap - 4096 MB
>> Max heap - 6016 MB
>>
>> JVM Arguments
>> -server -Xmn1000m -XX:PermSize=512m -XX:+UseConcMarkSweepGC
>> -XX:+HeapDumpOnOutOfMemoryError -DUseSunHttpHandler=true
>> -Djavax.xml.soap.MessageFactory=weblogic.xml.saaj.MessageFactoryImpl
>> -Doracle.jdbc.V8Compatible=true -XX:CMSInitiatingOccupancyFraction=70
>> -XX:+CMSClassUnloadingEnabled -verbose:gc -XX:+PrintGCTimeStamps
>> -XX:+PrintGCDetails -XX:-TraceClassUnloading -XX:+PrintHeapAtGC
>> -XX:MaxPermSize=694m -XX:+DisableExplicitGC -XX:+CMSParallelRemarkEnabled
>>
>> I'm attaching the ParNew failure as well as the CMS failure files (two
>> files in total); hopefully they come through. In case they don't, see
>> below: the first sample is the ParNew failure, the second is the CMS
>> failure. - Thanks, Erwin
>> ParNew failure sample:
>> {Heap before GC invocations=7800 (full 529):
>>  par new generation   total 921600K, used 530694K [0xfffffffdd0000000,
>> 0xfffffffe0e800000, 0xfffffffe0e800000)
>>   eden space 819200K,  52% used [0xfffffffdd0000000, 0xfffffffdea241bb8,
>> 0xfffffffe02000000)
>>   from space 102400K, 100% used [0xfffffffe02000000, 0xfffffffe08400000,
>> 0xfffffffe08400000)
>>   to   space 102400K,   0% used [0xfffffffe08400000, 0xfffffffe08400000,
>> 0xfffffffe0e800000)
>>  concurrent mark-sweep generation total 4902464K, used 2682365K
>> [0xfffffffe0e800000, 0xffffffff39b90000, 0xffffffff48000000)
>>  concurrent-mark-sweep perm gen total 524288K, used 238782K
>> [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000)
>> 552370.958: [GC 552370.958: [ParNew: 530694K->93237K(921600K), 0.1858500
>> secs] 3213060K->2832744K(5824064K), 0.1862466 secs] [Times: user=1.21
>> sys=0.08, real=0.19 secs]
>> Heap after GC invocations=7801 (full 529):
>>  par new generation   total 921600K, used 93237K [0xfffffffdd0000000,
>> 0xfffffffe0e800000, 0xfffffffe0e800000)
>>   eden space 819200K,   0% used [0xfffffffdd0000000, 0xfffffffdd0000000,
>> 0xfffffffe02000000)
>>   from space 102400K,  91% used [0xfffffffe08400000, 0xfffffffe0df0d498,
>> 0xfffffffe0e800000)
>>   to   space 102400K,  
0% used [0xfffffffe02000000, 0xfffffffe02000000, >> 0xfffffffe08400000) >> ?concurrent mark-sweep generation total 4902464K, used 2739507K >> [0xfffffffe0e800000, 0xffffffff39b90000, 0xffffffff48000000) >> ?concurrent-mark-sweep perm gen total 524288K, used 238782K >> [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) >> } >> {Heap before GC invocations=7801 (full 529): >> ?par new generation?? total 921600K, used 912377K [0xfffffffdd0000000, >> 0xfffffffe0e800000, 0xfffffffe0e800000) >> ? eden space 819200K,? 99% used [0xfffffffdd0000000, 0xfffffffe01ff1320, >> 0xfffffffe02000000) >> ? from space 102400K,? 91% used [0xfffffffe08400000, 0xfffffffe0df0d498, >> 0xfffffffe0e800000) >> ? to?? space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, >> 0xfffffffe08400000) >> ?concurrent mark-sweep generation total 4902464K, used 2739507K >> [0xfffffffe0e800000, 0xffffffff39b90000, 0xffffffff48000000) >> ?concurrent-mark-sweep perm gen total 524288K, used 238795K >> [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) >> 552372.849: [GC 552372.849: [ParNew (promotion failed): >> 912377K->869343K(921600K), 0.2641392 secs]552373.113: [CMS: >> 2791714K->3203612K(4902464K), 29.1902704 secs] 3651885K->3203612K(5824064K), >> [CMS Perm : 238795K->238246K(524288K)], 29.4609781 secs] [Times: user=30.05 >> sys=0.13, real=29.46 secs] >> Heap after GC invocations=7802 (full 530): >> ?par new generation?? total 921600K, used 0K [0xfffffffdd0000000, >> 0xfffffffe0e800000, 0xfffffffe0e800000) >> ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, >> 0xfffffffe02000000) >> ? from space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, >> 0xfffffffe08400000) >> ? to?? space 102400K,?? 0% used [0xfffffffe08400000, 0xfffffffe08400000, >> 0xfffffffe0e800000) >> ?concurrent mark-sweep generation total 5136384K, used 3203612K >> [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) >> ?concurrent-mark-sweep perm gen total 524288K, used 238246K >> [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) >> } >> {Heap before GC invocations=12696 (full 809): >> ?par new generation?? total 921600K, used 908565K [0xfffffffdd0000000, >> 0xfffffffe0e800000, 0xfffffffe0e800000) >> ? eden space 819200K, 100% used [0xfffffffdd0000000, 0xfffffffe02000000, >> 0xfffffffe02000000) >> ? from space 102400K,? 87% used [0xfffffffe02000000, 0xfffffffe07745510, >> 0xfffffffe08400000) >> ? to?? space 102400K,?? 0% used [0xfffffffe08400000, 0xfffffffe08400000, >> 0xfffffffe0e800000) >> ?concurrent mark-sweep generation total 5136384K, used 2696786K >> [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) >> ?concurrent-mark-sweep perm gen total 524288K, used 241380K >> [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) >> 980120.502: [GC 980120.502: [ParNew: 908565K->73974K(921600K), 0.1519646 >> secs] 3605352K->2817972K(6057984K), 0.1523927 secs] [Times: user=1.07 >> sys=0.03, real=0.15 secs] >> Heap after GC invocations=12697 (full 809): >> ?par new generation?? total 921600K, used 73974K [0xfffffffdd0000000, >> 0xfffffffe0e800000, 0xfffffffe0e800000) >> ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, >> 0xfffffffe02000000) >> ? from space 102400K,? 72% used [0xfffffffe08400000, 0xfffffffe0cc3d928, >> 0xfffffffe0e800000) >> ? to?? space 102400K,?? 
0% used [0xfffffffe02000000, 0xfffffffe02000000, >> 0xfffffffe08400000) >> ?concurrent mark-sweep generation total 5136384K, used 2743998K >> [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) >> ?concurrent-mark-sweep perm gen total 524288K, used 241380K >> [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) >> } >> {Heap before GC invocations=12697 (full 809): >> ?par new generation?? total 921600K, used 893174K [0xfffffffdd0000000, >> 0xfffffffe0e800000, 0xfffffffe0e800000) >> ? eden space 819200K, 100% used [0xfffffffdd0000000, 0xfffffffe02000000, >> 0xfffffffe02000000) >> ? from space 102400K,? 72% used [0xfffffffe08400000, 0xfffffffe0cc3d928, >> 0xfffffffe0e800000) >> ? to?? space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, >> 0xfffffffe08400000) >> ?concurrent mark-sweep generation total 5136384K, used 2743998K >> [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) >> ?concurrent-mark-sweep perm gen total 524288K, used 241411K >> [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) >> 980130.777: [GC 980130.777: [ParNew (promotion failed): >> 893174K->913391K(921600K), 0.5914616 secs]980131.368: [CMS: >> 2778416K->2710999K(5136384K), 27.7981960 secs] 3637172K->2710999K(6057984K), >> [CMS Perm : 241411K->240494K(524288K)], 28.3902578 secs] [Times: user=29.37 >> sys=0.08, real=28.39 secs] >> Heap after GC invocations=12698 (full 810): >> ?par new generation?? total 921600K, used 0K [0xfffffffdd0000000, >> 0xfffffffe0e800000, 0xfffffffe0e800000) >> ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, >> 0xfffffffe02000000) >> ? from space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, >> 0xfffffffe08400000) >> ? to?? space 102400K,?? 0% used [0xfffffffe08400000, 0xfffffffe08400000, >> 0xfffffffe0e800000) >> ?concurrent mark-sweep generation total 5136384K, used 2710999K >> [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) >> ?concurrent-mark-sweep perm gen total 524288K, used 240494K >> [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) >> } >> {Heap before GC invocations=12698 (full 810): >> ?par new generation?? total 921600K, used 819200K [0xfffffffdd0000000, >> 0xfffffffe0e800000, 0xfffffffe0e800000) >> ? eden space 819200K, 100% used [0xfffffffdd0000000, 0xfffffffe02000000, >> 0xfffffffe02000000) >> ? from space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, >> 0xfffffffe08400000) >> ? to?? space 102400K,?? 0% used [0xfffffffe08400000, 0xfffffffe08400000, >> 0xfffffffe0e800000) >> ?concurrent mark-sweep generation total 5136384K, used 2710999K >> [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) >> ?concurrent-mark-sweep perm gen total 524288K, used 240523K >> [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) >> 980171.033: [GC 980171.033: [ParNew: 819200K->102400K(921600K), 0.2144047 >> secs] 3530199K->2857892K(6057984K), 0.2149864 secs] [Times: user=0.99 >> sys=0.03, real=0.22 secs] >> Heap after GC invocations=12699 (full 810): >> ?par new generation?? total 921600K, used 102400K [0xfffffffdd0000000, >> 0xfffffffe0e800000, 0xfffffffe0e800000) >> ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, >> 0xfffffffe02000000) >> ? from space 102400K, 100% used [0xfffffffe08400000, 0xfffffffe0e800000, >> 0xfffffffe0e800000) >> ? to?? space 102400K,?? 
0% used [0xfffffffe02000000, 0xfffffffe02000000, >> 0xfffffffe08400000) >> ?concurrent mark-sweep generation total 5136384K, used 2755492K >> [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) >> ?concurrent-mark-sweep perm gen total 524288K, used 240523K >> [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) >> } >> >> CMS Failure: >> {Heap before GC invocations=23856 (full 1462): >> ?par new generation?? total 921600K, used 921600K [0xfffffffdd0000000, >> 0xfffffffe0e800000, 0xfffffffe0e800000) >> ? eden space 819200K, 100% used [0xfffffffdd0000000, 0xfffffffe02000000, >> 0xfffffffe02000000) >> ? from space 102400K, 100% used [0xfffffffe02000000, 0xfffffffe08400000, >> 0xfffffffe08400000) >> ? to?? space 102400K,?? 0% used [0xfffffffe08400000, 0xfffffffe08400000, >> 0xfffffffe0e800000) >> ?concurrent mark-sweep generation total 5136384K, used 3496920K >> [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) >> ?concurrent-mark-sweep perm gen total 524288K, used 243562K >> [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) >> 1761014.981: [GC 1761014.981: [ParNew: 921600K->102400K(921600K), >> 0.3004508 secs] 4418520K->3667695K(6057984K), 0.3008667 secs] [Times: >> user=1.69 sys=0.10, real=0.30 secs] >> Heap after GC invocations=23857 (full 1462): >> ?par new generation?? total 921600K, used 102400K [0xfffffffdd0000000, >> 0xfffffffe0e800000, 0xfffffffe0e800000) >> ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, >> 0xfffffffe02000000) >> ? from space 102400K, 100% used [0xfffffffe08400000, 0xfffffffe0e800000, >> 0xfffffffe0e800000) >> ? to?? space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, >> 0xfffffffe08400000) >> ?concurrent mark-sweep generation total 5136384K, used 3565295K >> [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) >> ?concurrent-mark-sweep perm gen total 524288K, used 243562K >> [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) >> } >> {Heap before GC invocations=23857 (full 1462): >> ?par new generation?? total 921600K, used 921600K [0xfffffffdd0000000, >> 0xfffffffe0e800000, 0xfffffffe0e800000) >> ? eden space 819200K, 100% used [0xfffffffdd0000000, 0xfffffffe02000000, >> 0xfffffffe02000000) >> ? from space 102400K, 100% used [0xfffffffe08400000, 0xfffffffe0e800000, >> 0xfffffffe0e800000) >> ? to?? space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, >> 0xfffffffe08400000) >> ?concurrent mark-sweep generation total 5136384K, used 3565295K >> [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) >> ?concurrent-mark-sweep perm gen total 524288K, used 243773K >> [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) >> 1761036.997: [GC 1761036.998: [ParNew: 921600K->102400K(921600K), >> 0.4075457 secs] 4486895K->3763683K(6057984K), 0.4079591 secs] [Times: >> user=3.57 sys=0.80, real=0.41 secs] >> Heap after GC invocations=23858 (full 1462): >> ?par new generation?? total 921600K, used 102400K [0xfffffffdd0000000, >> 0xfffffffe0e800000, 0xfffffffe0e800000) >> ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, >> 0xfffffffe02000000) >> ? from space 102400K, 100% used [0xfffffffe02000000, 0xfffffffe08400000, >> 0xfffffffe08400000) >> ? to?? space 102400K,?? 
0% used [0xfffffffe08400000, 0xfffffffe08400000, >> 0xfffffffe0e800000) >> ?concurrent mark-sweep generation total 5136384K, used 3661283K >> [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) >> ?concurrent-mark-sweep perm gen total 524288K, used 243773K >> [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) >> } >> 1761037.410: [GC [1 CMS-initial-mark: 3661283K(5136384K)] >> 3763683K(6057984K), 0.0883369 secs] [Times: user=0.09 sys=0.00, real=0.09 >> secs] >> 1761037.499: [CMS-concurrent-mark-start] >> 1761041.447: [CMS-concurrent-mark: 3.906/3.948 secs] [Times: user=25.81 >> sys=1.06, real=3.95 secs] >> 1761041.448: [CMS-concurrent-preclean-start] >> 1761041.763: [CMS-concurrent-preclean: 0.312/0.315 secs] [Times: user=0.50 >> sys=0.02, real=0.32 secs] >> 1761041.763: [CMS-concurrent-abortable-preclean-start] >> ?CMS: abort preclean due to time 1761046.800: >> [CMS-concurrent-abortable-preclean: 4.720/5.036 secs] [Times: user=6.68 >> sys=0.18, real=5.04 secs] >> 1761046.808: [GC[YG occupancy: 464701 K (921600 K)]1761046.808: [Rescan >> (parallel) , 0.3034664 secs]1761047.112: [weak refs processing, 0.0152564 >> secs]1761047.128: [class unloading, 0.1518160 secs]1761047.280: [scrub >> symbol & string tables, 0.1332523 secs] [1 CMS-remark: 3661283K(5136384K)] >> 4125985K(6057984K), 0.6980401 secs] [Times: user=1.34 sys=0.70, real=0.70 >> secs] >> 1761047.507: [CMS-concurrent-sweep-start] >> 1761051.779: [CMS-concurrent-sweep: 4.252/4.271 secs] [Times: user=6.30 >> sys=0.19, real=4.27 secs] >> 1761051.779: [CMS-concurrent-reset-start] >> 1761051.837: [CMS-concurrent-reset: 0.058/0.058 secs] [Times: user=0.07 >> sys=0.00, real=0.06 secs] >> {Heap before GC invocations=23858 (full 1463): >> ?par new generation?? total 921600K, used 921600K [0xfffffffdd0000000, >> 0xfffffffe0e800000, 0xfffffffe0e800000) >> ? eden space 819200K, 100% used [0xfffffffdd0000000, 0xfffffffe02000000, >> 0xfffffffe02000000) >> ? from space 102400K, 100% used [0xfffffffe02000000, 0xfffffffe08400000, >> 0xfffffffe08400000) >> ? to?? space 102400K,?? 0% used [0xfffffffe08400000, 0xfffffffe08400000, >> 0xfffffffe0e800000) >> ?concurrent mark-sweep generation total 5136384K, used 3514703K >> [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) >> ?concurrent-mark-sweep perm gen total 524288K, used 243613K >> [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) >> 1761061.144: [GC 1761061.145: [ParNew: 921600K->102400K(921600K), >> 0.4124278 secs] 4436303K->3717736K(6057984K), 0.4128777 secs] [Times: >> user=1.98 sys=0.19, real=0.41 secs] >> Heap after GC invocations=23859 (full 1463): >> ?par new generation?? total 921600K, used 102400K [0xfffffffdd0000000, >> 0xfffffffe0e800000, 0xfffffffe0e800000) >> ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, >> 0xfffffffe02000000) >> ? from space 102400K, 100% used [0xfffffffe08400000, 0xfffffffe0e800000, >> 0xfffffffe0e800000) >> ? to?? space 102400K,?? 
0% used [0xfffffffe02000000, 0xfffffffe02000000, >> 0xfffffffe08400000) >> ?concurrent mark-sweep generation total 5136384K, used 3615336K >> [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) >> ?concurrent-mark-sweep perm gen total 524288K, used 243613K >> [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) >> } >> 1761061.562: [GC [1 CMS-initial-mark: 3615336K(5136384K)] >> 3718604K(6057984K), 0.0660086 secs] [Times: user=0.07 sys=0.00, real=0.07 >> secs] >> 1761061.629: [CMS-concurrent-mark-start] >> 1761065.589: [CMS-concurrent-mark: 3.920/3.960 secs] [Times: user=26.20 >> sys=1.05, real=3.96 secs] >> 1761065.590: [CMS-concurrent-preclean-start] >> 1761065.883: [CMS-concurrent-preclean: 0.290/0.293 secs] [Times: user=0.54 >> sys=0.02, real=0.29 secs] >> 1761065.883: [CMS-concurrent-abortable-preclean-start] >> ?CMS: abort preclean due to time 1761070.950: >> [CMS-concurrent-abortable-preclean: 5.035/5.067 secs] [Times: user=10.70 >> sys=0.36, real=5.07 secs] >> 1761070.958: [GC[YG occupancy: 656197 K (921600 K)]1761070.959: [Rescan >> (parallel) , 0.5056315 secs]1761071.465: [weak refs processing, 0.0107058 >> secs]1761071.476: [class unloading, 0.1500832 secs]1761071.626: [scrub >> symbol & string tables, 0.1278517 secs] [1 CMS-remark: 3615336K(5136384K)] >> 4271533K(6057984K), 0.8857121 secs] [Times: user=1.77 sys=1.08, real=0.89 >> secs] >> 1761071.845: [CMS-concurrent-sweep-start] >> 1761075.956: [CMS-concurrent-sweep: 4.094/4.111 secs] [Times: user=7.97 >> sys=0.27, real=4.11 secs] >> 1761075.957: [CMS-concurrent-reset-start] >> 1761076.031: [CMS-concurrent-reset: 0.063/0.074 secs] [Times: user=0.13 >> sys=0.01, real=0.07 secs] >> {Heap before GC invocations=23859 (full 1464): >> ?par new generation?? total 921600K, used 921600K [0xfffffffdd0000000, >> 0xfffffffe0e800000, 0xfffffffe0e800000) >> ? eden space 819200K, 100% used [0xfffffffdd0000000, 0xfffffffe02000000, >> 0xfffffffe02000000) >> ? from space 102400K, 100% used [0xfffffffe08400000, 0xfffffffe0e800000, >> 0xfffffffe0e800000) >> ? to?? space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, >> 0xfffffffe08400000) >> ?concurrent mark-sweep generation total 5136384K, used 3544377K >> [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) >> ?concurrent-mark-sweep perm gen total 524288K, used 243474K >> [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) >> 1761077.540: [GC 1761077.540: [ParNew: 921600K->102400K(921600K), >> 0.4030394 secs] 4465977K->3740399K(6057984K), 0.4034742 secs] [Times: >> user=3.14 sys=0.55, real=0.40 secs] >> Heap after GC invocations=23860 (full 1464): >> ?par new generation?? total 921600K, used 102400K [0xfffffffdd0000000, >> 0xfffffffe0e800000, 0xfffffffe0e800000) >> ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, >> 0xfffffffe02000000) >> ? from space 102400K, 100% used [0xfffffffe02000000, 0xfffffffe08400000, >> 0xfffffffe08400000) >> ? to?? space 102400K,?? 
0% used [0xfffffffe08400000, 0xfffffffe08400000, >> 0xfffffffe0e800000) >> ?concurrent mark-sweep generation total 5136384K, used 3637999K >> [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) >> ?concurrent-mark-sweep perm gen total 524288K, used 243474K >> [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) >> } >> 1761077.948: [GC [1 CMS-initial-mark: 3637999K(5136384K)] >> 3740403K(6057984K), 0.0664402 secs] [Times: user=0.06 sys=0.00, real=0.07 >> secs] >> 1761078.015: [CMS-concurrent-mark-start] >> 1761082.141: [CMS-concurrent-mark: 4.076/4.126 secs] [Times: user=25.56 >> sys=1.24, real=4.13 secs] >> 1761082.142: [CMS-concurrent-preclean-start] >> 1761082.435: [CMS-concurrent-preclean: 0.290/0.293 secs] [Times: user=0.56 >> sys=0.03, real=0.29 secs] >> 1761082.435: [CMS-concurrent-abortable-preclean-start] >> ?CMS: abort preclean due to time 1761087.544: >> [CMS-concurrent-abortable-preclean: 4.166/5.108 secs] [Times: user=6.79 >> sys=0.38, real=5.11 secs] >> 1761087.554: [GC[YG occupancy: 612230 K (921600 K)]1761087.555: [Rescan >> (parallel) , 0.3453344 secs]1761087.900: [weak refs processing, 0.0033384 >> secs]1761087.904: [class unloading, 0.1515234 secs]1761088.055: [scrub >> symbol & string tables, 0.1280533 secs] [1 CMS-remark: 3637999K(5136384K)] >> 4250230K(6057984K), 0.7189376 secs] [Times: user=1.29 sys=0.76, real=0.72 >> secs] >> 1761088.274: [CMS-concurrent-sweep-start] >> 1761092.543: [CMS-concurrent-sweep: 4.268/4.268 secs] [Times: user=6.72 >> sys=0.26, real=4.27 secs] >> 1761092.543: [CMS-concurrent-reset-start] >> 1761092.606: [CMS-concurrent-reset: 0.063/0.063 secs] [Times: user=0.11 >> sys=0.01, real=0.06 secs] >> {Heap before GC invocations=23860 (full 1465): >> ?par new generation?? total 921600K, used 921600K [0xfffffffdd0000000, >> 0xfffffffe0e800000, 0xfffffffe0e800000) >> ? eden space 819200K, 100% used [0xfffffffdd0000000, 0xfffffffe02000000, >> 0xfffffffe02000000) >> ? from space 102400K, 100% used [0xfffffffe02000000, 0xfffffffe08400000, >> 0xfffffffe08400000) >> ? to?? space 102400K,?? 0% used [0xfffffffe08400000, 0xfffffffe08400000, >> 0xfffffffe0e800000) >> ?concurrent mark-sweep generation total 5136384K, used 3582457K >> [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) >> ?concurrent-mark-sweep perm gen total 524288K, used 243634K >> [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) >> 1761096.682: [GC 1761096.682: [ParNew: 921600K->102400K(921600K), >> 0.2843209 secs] 4504057K->3786219K(6057984K), 0.2847419 secs] [Times: >> user=1.81 sys=0.10, real=0.29 secs] >> Heap after GC invocations=23861 (full 1465): >> ?par new generation?? total 921600K, used 102400K [0xfffffffdd0000000, >> 0xfffffffe0e800000, 0xfffffffe0e800000) >> ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, >> 0xfffffffe02000000) >> ? from space 102400K, 100% used [0xfffffffe08400000, 0xfffffffe0e800000, >> 0xfffffffe0e800000) >> ? to?? space 102400K,?? 
0% used [0xfffffffe02000000, 0xfffffffe02000000, >> 0xfffffffe08400000) >> ?concurrent mark-sweep generation total 5136384K, used 3683819K >> [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) >> ?concurrent-mark-sweep perm gen total 524288K, used 243634K >> [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) >> } >> 1761096.971: [GC [1 CMS-initial-mark: 3683819K(5136384K)] >> 3786817K(6057984K), 0.0480239 secs] [Times: user=0.05 sys=0.00, real=0.05 >> secs] >> 1761097.020: [CMS-concurrent-mark-start] >> 1761101.145: [CMS-concurrent-mark: 4.104/4.124 secs] [Times: user=24.60 >> sys=1.09, real=4.13 secs] >> 1761101.145: [CMS-concurrent-preclean-start] >> 1761101.438: [CMS-concurrent-preclean: 0.290/0.292 secs] [Times: user=0.41 >> sys=0.02, real=0.29 secs] >> 1761101.438: [CMS-concurrent-abortable-preclean-start] >> ?CMS: abort preclean due to time 1761106.478: >> [CMS-concurrent-abortable-preclean: 4.694/5.040 secs] [Times: user=7.32 >> sys=0.23, real=5.04 secs] >> 1761106.486: [GC[YG occupancy: 497634 K (921600 K)]1761106.487: [Rescan >> (parallel) , 0.3384965 secs]1761106.825: [weak refs processing, 0.0030734 >> secs]1761106.829: [class unloading, 0.1503426 secs]1761106.979: [scrub >> symbol & string tables, 0.1273150 secs] [1 CMS-remark: 3683819K(5136384K)] >> 4181454K(6057984K), 0.7055549 secs] [Times: user=1.25 sys=0.64, real=0.71 >> secs] >> 1761107.193: [CMS-concurrent-sweep-start] >> 1761111.281: [CMS-concurrent-sweep: 4.088/4.088 secs] [Times: user=5.81 >> sys=0.15, real=4.09 secs] >> 1761111.282: [CMS-concurrent-reset-start] >> 1761111.349: [CMS-concurrent-reset: 0.068/0.068 secs] [Times: user=0.08 >> sys=0.00, real=0.07 secs] >> 1761111.961: [GC [1 CMS-initial-mark: 3633902K(5136384K)] >> 4261007K(6057984K), 0.5015835 secs] [Times: user=0.50 sys=0.00, real=0.50 >> secs] >> 1761112.463: [CMS-concurrent-mark-start] >> 1761116.550: [CMS-concurrent-mark: 4.036/4.087 secs] [Times: user=24.85 >> sys=1.09, real=4.09 secs] >> 1761116.551: [CMS-concurrent-preclean-start] >> 1761116.901: [CMS-concurrent-preclean: 0.344/0.350 secs] [Times: user=0.54 >> sys=0.01, real=0.35 secs] >> 1761116.901: [CMS-concurrent-abortable-preclean-start] >> {Heap before GC invocations=23861 (full 1467): >> ?par new generation?? total 921600K, used 921600K [0xfffffffdd0000000, >> 0xfffffffe0e800000, 0xfffffffe0e800000) >> ? eden space 819200K, 100% used [0xfffffffdd0000000, 0xfffffffe02000000, >> 0xfffffffe02000000) >> ? from space 102400K, 100% used [0xfffffffe08400000, 0xfffffffe0e800000, >> 0xfffffffe0e800000) >> ? to?? space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, >> 0xfffffffe08400000) >> ?concurrent mark-sweep generation total 5136384K, used 3633902K >> [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) >> ?concurrent-mark-sweep perm gen total 524288K, used 243740K >> [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) >> 1761120.554: [GC 1761120.554: [ParNew: 921600K->102400K(921600K), >> 0.4726199 secs] 4555502K->3819626K(6057984K), 0.4732486 secs] [Times: >> user=3.31 sys=0.69, real=0.47 secs] >> Heap after GC invocations=23862 (full 1467): >> ?par new generation?? total 921600K, used 102400K [0xfffffffdd0000000, >> 0xfffffffe0e800000, 0xfffffffe0e800000) >> ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, >> 0xfffffffe02000000) >> ? from space 102400K, 100% used [0xfffffffe02000000, 0xfffffffe08400000, >> 0xfffffffe08400000) >> ? to?? space 102400K,?? 
0% used [0xfffffffe08400000, 0xfffffffe08400000, >> 0xfffffffe0e800000) >> ?concurrent mark-sweep generation total 5136384K, used 3717226K >> [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) >> ?concurrent-mark-sweep perm gen total 524288K, used 243740K >> [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) >> } >> ?CMS: abort preclean due to time 1761122.392: >> [CMS-concurrent-abortable-preclean: 4.729/5.490 secs] [Times: user=11.71 >> sys=0.97, real=5.49 secs] >> 1761122.401: [GC[YG occupancy: 177317 K (921600 K)]1761122.401: [Rescan >> (parallel) , 0.0250334 secs]1761122.427: [weak refs processing, 0.0002699 >> secs]1761122.427: [class unloading, 0.0817179 secs]1761122.509: [scrub >> symbol & string tables, 0.1383120 secs] [1 CMS-remark: 3717226K(5136384K)] >> 3894544K(6057984K), 0.3327016 secs] [Times: user=0.55 sys=0.04, real=0.33 >> secs] >> 1761122.735: [CMS-concurrent-sweep-start] >> 1761126.843: [CMS-concurrent-sweep: 4.042/4.108 secs] [Times: user=6.70 >> sys=0.39, real=4.11 secs] >> 1761126.844: [CMS-concurrent-reset-start] >> 1761126.907: [CMS-concurrent-reset: 0.063/0.063 secs] [Times: user=0.11 >> sys=0.00, real=0.06 secs] >> 1761127.142: [GC [1 CMS-initial-mark: 3701154K(5136384K)] >> 4056638K(6057984K), 0.2853309 secs] [Times: user=0.29 sys=0.00, real=0.29 >> secs] >> 1761127.428: [CMS-concurrent-mark-start] >> 1761131.876: [CMS-concurrent-mark: 4.398/4.448 secs] [Times: user=28.46 >> sys=1.55, real=4.45 secs] >> 1761131.877: [CMS-concurrent-preclean-start] >> 1761132.185: [CMS-concurrent-preclean: 0.305/0.308 secs] [Times: user=0.60 >> sys=0.05, real=0.31 secs] >> 1761132.186: [CMS-concurrent-abortable-preclean-start] >> ?CMS: abort preclean due to time 1761137.243: >> [CMS-concurrent-abortable-preclean: 5.029/5.058 secs] [Times: user=9.88 >> sys=0.42, real=5.06 secs] >> 1761137.248: [GC[YG occupancy: 783876 K (921600 K)]1761137.248: [Rescan >> (parallel) , 0.5402015 secs]1761137.789: [weak refs processing, 0.0022809 >> secs]1761137.791: [class unloading, 0.1556933 secs]1761137.947: [scrub >> symbol & string tables, 0.1291759 secs] [1 CMS-remark: 3701154K(5136384K)] >> 4485030K(6057984K), 0.9154842 secs] [Times: user=1.67 sys=0.97, real=0.92 >> secs] >> 1761138.164: [CMS-concurrent-sweep-start] >> {Heap before GC invocations=23862 (full 1468): >> ?par new generation?? total 921600K, used 920346K [0xfffffffdd0000000, >> 0xfffffffe0e800000, 0xfffffffe0e800000) >> ? eden space 819200K,? 99% used [0xfffffffdd0000000, 0xfffffffe01ec6a48, >> 0xfffffffe02000000) >> ? from space 102400K, 100% used [0xfffffffe02000000, 0xfffffffe08400000, >> 0xfffffffe08400000) >> ? to?? space 102400K,?? 0% used [0xfffffffe08400000, 0xfffffffe08400000, >> 0xfffffffe0e800000) >> ?concurrent mark-sweep generation total 5136384K, used 3694838K >> [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) >> ?concurrent-mark-sweep perm gen total 524288K, used 243810K >> [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) >> 1761140.028: [GC 1761140.028: [ParNew: 920346K->102400K(921600K), >> 0.4882607 secs] 4615185K->4017461K(6057984K), 0.4886748 secs] [Times: >> user=2.71 sys=0.20, real=0.49 secs] >> Heap after GC invocations=23863 (full 1468): >> ?par new generation?? total 921600K, used 102400K [0xfffffffdd0000000, >> 0xfffffffe0e800000, 0xfffffffe0e800000) >> ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, >> 0xfffffffe02000000) >> ? from space 102400K, 100% used [0xfffffffe08400000, 0xfffffffe0e800000, >> 0xfffffffe0e800000) >> ? to?? 
space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, >> 0xfffffffe08400000) >> ?concurrent mark-sweep generation total 5136384K, used 3915061K >> [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) >> ?concurrent-mark-sweep perm gen total 524288K, used 243810K >> [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) >> } >> 1761142.726: [CMS-concurrent-sweep: 4.011/4.562 secs] [Times: user=12.54 >> sys=0.74, real=4.56 secs] >> 1761142.727: [CMS-concurrent-reset-start] >> 1761142.791: [CMS-concurrent-reset: 0.064/0.064 secs] [Times: user=0.16 >> sys=0.01, real=0.06 secs] >> 1761143.233: [GC [1 CMS-initial-mark: 3852859K(5136384K)] >> 4152461K(6057984K), 0.2340877 secs] [Times: user=0.23 sys=0.00, real=0.23 >> secs] >> 1761143.467: [CMS-concurrent-mark-start] >> 1761147.673: [CMS-concurrent-mark: 4.182/4.205 secs] [Times: user=26.19 >> sys=1.27, real=4.21 secs] >> 1761147.673: [CMS-concurrent-preclean-start] >> 1761147.978: [CMS-concurrent-preclean: 0.300/0.304 secs] [Times: user=0.44 >> sys=0.02, real=0.30 secs] >> 1761147.978: [CMS-concurrent-abortable-preclean-start] >> {Heap before GC invocations=23863 (full 1469): >> ?par new generation?? total 921600K, used 602663K [0xfffffffdd0000000, >> 0xfffffffe0e800000, 0xfffffffe0e800000) >> ? eden space 819200K,? 61% used [0xfffffffdd0000000, 0xfffffffdee889c90, >> 0xfffffffe02000000) >> ? from space 102400K, 100% used [0xfffffffe08400000, 0xfffffffe0e800000, >> 0xfffffffe0e800000) >> ? to?? space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, >> 0xfffffffe08400000) >> ?concurrent mark-sweep generation total 5136384K, used 3852859K >> [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) >> ?concurrent-mark-sweep perm gen total 524288K, used 243656K >> [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) >> 1761152.969: [GC 1761152.969: [ParNew: 602663K->102400K(921600K), >> 0.4710031 secs]1761153.440: [CMS CMS: abort preclean due to time >> 1761155.705: [CMS-concurrent-abortable-preclean: 6.957/7.726 secs] [Times: >> user=13.02 sys=0.48, real=7.73 secs] >> ?(concurrent mode failure): 4005428K->3905404K(5136384K), 32.6670849 secs] >> 4455522K->3905404K(6057984K), [CMS Perm : 243656K->243327K(524288K)], >> 33.1389061 secs] [Times: user=35.38 sys=0.26, real=33.14 secs] >> Heap after GC invocations=23864 (full 1470): >> ?par new generation?? total 921600K, used 0K [0xfffffffdd0000000, >> 0xfffffffe0e800000, 0xfffffffe0e800000) >> ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, >> 0xfffffffe02000000) >> ? from space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, >> 0xfffffffe08400000) >> ? to?? space 102400K,?? 
0% used [0xfffffffe08400000, 0xfffffffe08400000, >> 0xfffffffe0e800000) >> ?concurrent mark-sweep generation total 5136384K, used 3905404K >> [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) >> ?concurrent-mark-sweep perm gen total 524288K, used 243327K >> [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) >> } >> 1761187.694: [GC [1 CMS-initial-mark: 4953978K(5136384K)] >> 4965714K(6057984K), 0.0131890 secs] [Times: user=0.01 sys=0.00, real=0.01 >> secs] >> 1761187.708: [CMS-concurrent-mark-start] >> 1761191.965: [CMS-concurrent-mark: 3.634/4.257 secs] [Times: user=32.76 >> sys=1.91, real=4.26 secs] >> 1761191.966: [CMS-concurrent-preclean-start] >> 1761192.543: [CMS-concurrent-preclean: 0.553/0.577 secs] [Times: user=1.56 >> sys=0.12, real=0.58 secs] >> 1761192.544: [CMS-concurrent-abortable-preclean-start] >> ?CMS: abort preclean due to time 1761197.612: >> [CMS-concurrent-abortable-preclean: 2.079/5.068 secs] [Times: user=9.11 >> sys=0.60, real=5.07 secs] >> 1761197.617: [GC[YG occupancy: 813510 K (921600 K)]1761197.618: [Rescan >> (parallel) , 0.7500635 secs]1761198.368: [weak refs processing, 0.0020064 >> secs]1761198.370: [class unloading, 0.0823783 secs]1761198.453: [scrub >> symbol & string tables, 0.1278387 secs] [1 CMS-remark: 4953978K(5136384K)] >> 5767489K(6057984K), 1.0496971 secs] [Times: user=2.69 sys=1.79, real=1.05 >> secs] >> 1761198.668: [CMS-concurrent-sweep-start] >> {Heap before GC invocations=23864 (full 1471): >> ?par new generation?? total 921600K, used 819200K [0xfffffffdd0000000, >> 0xfffffffe0e800000, 0xfffffffe0e800000) >> ? eden space 819200K, 100% used [0xfffffffdd0000000, 0xfffffffe02000000, >> 0xfffffffe02000000) >> ? from space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, >> 0xfffffffe08400000) >> ? to?? space 102400K,?? 0% used [0xfffffffe08400000, 0xfffffffe08400000, >> 0xfffffffe0e800000) >> ?concurrent mark-sweep generation total 5136384K, used 4953976K >> [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) >> ?concurrent-mark-sweep perm gen total 524288K, used 243422K >> [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) >> 1761198.700: [GC 1761198.700: [ParNew: 819200K->819200K(921600K), >> 0.0000919 secs]1761198.700: [CMS1761202.072: [CMS-concurrent-sweep: >> 3.389/3.404 secs] [Times: user=3.60 sys=0.04, real=3.40 secs] >> ?(concurrent mode failure): 4953976K->3789438K(5136384K), 32.6623615 secs] >> 5773176K->3789438K(6057984K), [CMS Perm : 243422K->243328K(524288K)], >> 32.6632802 secs] [Times: user=32.58 sys=0.03, real=32.66 secs] >> Heap after GC invocations=23865 (full 1472): >> ?par new generation?? total 921600K, used 0K [0xfffffffdd0000000, >> 0xfffffffe0e800000, 0xfffffffe0e800000) >> ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, >> 0xfffffffe02000000) >> ? from space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, >> 0xfffffffe08400000) >> ? to?? space 102400K,?? 
0% used [0xfffffffe08400000, 0xfffffffe08400000, >> 0xfffffffe0e800000) >> ?concurrent mark-sweep generation total 5136384K, used 3789438K >> [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) >> ?concurrent-mark-sweep perm gen total 524288K, used 243328K >> [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) >> } >> 1761231.418: [GC [1 CMS-initial-mark: 3789438K(5136384K)] >> 3883471K(6057984K), 0.0609784 secs] [Times: user=0.06 sys=0.01, real=0.06 >> secs] >> 1761231.480: [CMS-concurrent-mark-start] >> 1761236.061: [CMS-concurrent-mark: 3.752/4.580 secs] [Times: user=34.48 >> sys=2.81, real=4.58 secs] >> 1761236.061: [CMS-concurrent-preclean-start] >> 1761236.428: [CMS-concurrent-preclean: 0.358/0.367 secs] [Times: user=0.46 >> sys=0.01, real=0.37 secs] >> 1761236.429: [CMS-concurrent-abortable-preclean-start] >> ?CMS: abort preclean due to time 1761241.488: >> [CMS-concurrent-abortable-preclean: 2.384/5.059 secs] [Times: user=5.30 >> sys=0.75, real=5.06 secs] >> 1761241.497: [GC[YG occupancy: 787969 K (921600 K)]1761241.497: [Rescan >> (parallel) , 0.5938799 secs]1761242.091: [weak refs processing, 0.0067469 >> secs]1761242.098: [class unloading, 0.0826078 secs]1761242.181: [scrub >> symbol & string tables, 0.1308434 secs] [1 CMS-remark: 3789438K(5136384K)] >> 4577408K(6057984K), 0.9017583 secs] [Times: user=2.66 sys=2.07, real=0.90 >> secs] >> 1761242.400: [CMS-concurrent-sweep-start] >> {Heap before GC invocations=23865 (full 1473): >> ?par new generation?? total 921600K, used 819200K [0xfffffffdd0000000, >> 0xfffffffe0e800000, 0xfffffffe0e800000) >> ? eden space 819200K, 100% used [0xfffffffdd0000000, 0xfffffffe02000000, >> 0xfffffffe02000000) >> ? from space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, >> 0xfffffffe08400000) >> ? to?? space 102400K,?? 0% used [0xfffffffe08400000, 0xfffffffe08400000, >> 0xfffffffe0e800000) >> ?concurrent mark-sweep generation total 5136384K, used 3789391K >> [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) >> ?concurrent-mark-sweep perm gen total 524288K, used 243406K >> [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) >> 1761242.586: [GC 1761242.586: [ParNew: 819200K->102400K(921600K), >> 0.1871926 secs] 4608591K->3940305K(6057984K), 0.1879045 secs] [Times: >> user=0.93 sys=0.05, real=0.19 secs] >> Heap after GC invocations=23866 (full 1473): >> ?par new generation?? total 921600K, used 102400K [0xfffffffdd0000000, >> 0xfffffffe0e800000, 0xfffffffe0e800000) >> ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, >> 0xfffffffe02000000) >> ? from space 102400K, 100% used [0xfffffffe08400000, 0xfffffffe0e800000, >> 0xfffffffe0e800000) >> ? to?? space 102400K,?? 
0% used [0xfffffffe02000000, 0xfffffffe02000000, >> 0xfffffffe08400000) >> ?concurrent mark-sweep generation total 5136384K, used 3837905K >> [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) >> ?concurrent-mark-sweep perm gen total 524288K, used 243406K >> [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) >> } >> 1761245.857: [CMS-concurrent-sweep: 3.225/3.457 secs] [Times: user=6.21 >> sys=0.52, real=3.46 secs] >> 1761245.858: [CMS-concurrent-reset-start] >> 1761245.922: [CMS-concurrent-reset: 0.064/0.064 secs] [Times: user=0.08 >> sys=0.01, real=0.06 secs] >> 1761247.301: [GC [1 CMS-initial-mark: 3676150K(5136384K)] >> 3952072K(6057984K), 0.2229528 secs] [Times: user=0.22 sys=0.00, real=0.22 >> secs] >> 1761247.525: [CMS-concurrent-mark-start] >> 1761251.076: [CMS-concurrent-mark: 3.510/3.551 secs] [Times: user=23.68 >> sys=0.85, real=3.55 secs] >> 1761251.076: [CMS-concurrent-preclean-start] >> 1761251.375: [CMS-concurrent-preclean: 0.295/0.298 secs] [Times: user=0.72 >> sys=0.04, real=0.30 secs] >> 1761251.375: [CMS-concurrent-abortable-preclean-start] >> ?CMS: abort preclean due to time 1761256.460: >> [CMS-concurrent-abortable-preclean: 5.012/5.085 secs] [Times: user=9.93 >> sys=0.99, real=5.09 secs] >> 1761256.469: [GC[YG occupancy: 720909 K (921600 K)]1761256.469: [Rescan >> (parallel) , 0.4663462 secs]1761256.936: [weak refs processing, 0.0153453 >> secs]1761256.951: [class unloading, 0.0833874 secs]1761257.035: [scrub >> symbol & string tables, 0.1289153 secs] [1 CMS-remark: 3676150K(5136384K)] >> 4397060K(6057984K), 0.7879219 secs] [Times: user=1.55 sys=0.96, real=0.79 >> secs] >> 1761257.258: [CMS-concurrent-sweep-start] >> {Heap before GC invocations=23866 (full 1474): >> ?par new generation?? total 921600K, used 921600K [0xfffffffdd0000000, >> 0xfffffffe0e800000, 0xfffffffe0e800000) >> ? eden space 819200K, 100% used [0xfffffffdd0000000, 0xfffffffe02000000, >> 0xfffffffe02000000) >> ? from space 102400K, 100% used [0xfffffffe08400000, 0xfffffffe0e800000, >> 0xfffffffe0e800000) >> ? to?? space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, >> 0xfffffffe08400000) >> ?concurrent mark-sweep generation total 5136384K, used 3669509K >> [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) >> ?concurrent-mark-sweep perm gen total 524288K, used 243414K >> [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) >> 1761259.137: [GC 1761259.138: [ParNew: 921600K->102400K(921600K), >> 0.3975686 secs] 4591109K->3894075K(6057984K), 0.3981608 secs] [Times: >> user=1.65 sys=0.15, real=0.40 secs] >> Heap after GC invocations=23867 (full 1474): >> ?par new generation?? total 921600K, used 102400K [0xfffffffdd0000000, >> 0xfffffffe0e800000, 0xfffffffe0e800000) >> ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, >> 0xfffffffe02000000) >> ? from space 102400K, 100% used [0xfffffffe02000000, 0xfffffffe08400000, >> 0xfffffffe08400000) >> ? to?? space 102400K,?? 0% used [0xfffffffe08400000, 0xfffffffe08400000, >> 0xfffffffe0e800000) >> ?concurrent mark-sweep generation total 5136384K, used 3791675K >> [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) >> ?concurrent-mark-sweep perm gen total 524288K, used 243414K >> [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) >> } >> -------------- next part -------------- >> An HTML attachment was scrubbed... 
>> URL:
>> http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120412/16cc56a7/attachment.html
>> -------------- next part --------------
>> An embedded and charset-unspecified text was scrubbed...
>> Name: CMS Failure.txt
>> Url:
>> http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120412/16cc56a7/CMSFailure.txt
>> -------------- next part --------------
>> An embedded and charset-unspecified text was scrubbed...
>> Name: PARNEW Failure.txt
>> Url:
>> http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120412/16cc56a7/PARNEWFailure.txt
>
>

From taras.tielkes at gmail.com Sun Apr 15 05:34:07 2012
From: taras.tielkes at gmail.com (Taras Tielkes)
Date: Sun, 15 Apr 2012 14:34:07 +0200
Subject: Promotion failures: indication of CMS fragmentation?
In-Reply-To: 
References: <4EF9FCAC.3030208@oracle.com> <4F06A270.3010701@oracle.com> <4F0DBEC4.7040907@oracle.com> <4F1ECE7B.3040502@oracle.com> <4F1F2ED7.6060308@oracle.com> <4F20F78D.9070905@oracle.com>
Message-ID: 

Hi Chi, Srinivas,

Optimizing the cost of ParNew (by lowering MTT) would be nice, but for now
my priority is still to minimize the promotion failures.

For example, on the machine running CMS with the "larger" young gen and
survivor spaces (-Xmx5g -Xmn800m -XX:SurvivorRatio=4), I've just seen a
promotion failure again. Below is a snippet of gc.log showing this.
To put this into perspective, this is the first promotion failure on that
machine in a couple of weeks. Still, zero failures would beat a single
failure, since the clients connecting to this application will only wait a
few seconds before timing out and terminating the connection. In addition,
the promotion failures are occurring at peak usage moments.

Apart from trying to eliminate the promotion failure pauses, my main goal
is to learn how to understand the root cause in a case like this. Any
suggestions for things to try or read up on are appreciated.
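
One thing I'm planning to try next, to get a more direct view of old gen
fragmentation, is the CMS free list statistics. Below is a minimal sketch of
the command line I have in mind -- assuming -XX:PrintFLSStatistics=1 is
available in the JDK 6 build we run; the heap sizing flags are just our
current values, the logging flags mirror what already shows up in gc.log,
and "MyApp" stands in for the real entry point:

    java -Xmx5g -Xmn800m -XX:SurvivorRatio=4 \
         -XX:+UseConcMarkSweepGC \
         -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
         -XX:+PrintTenuringDistribution \
         -XX:PrintFLSStatistics=1 \
         -Xloggc:gc.log MyApp

If I understand that option correctly, it makes CMS print its free list
statistics (total free space, max chunk size, number of blocks) before and
after collections. If the reported max chunk size trends down towards the
small promotion failure sizes shown in the log above while total free space
stays large, that would point at old gen fragmentation rather than the old
gen simply being full.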
Kind regards, Taras ------------------------------------------------ 2012-04-13T17:44:27.777+0200: 3296255.045: [GC 3296255.046: [ParNew Desired survivor size 69894144 bytes, new threshold 15 (max 15) - age 1: 3684448 bytes, 3684448 total - age 2: 824984 bytes, 4509432 total - age 3: 885120 bytes, 5394552 total - age 4: 756568 bytes, 6151120 total - age 5: 696880 bytes, 6848000 total - age 6: 890688 bytes, 7738688 total - age 7: 2631184 bytes, 10369872 total - age 8: 719976 bytes, 11089848 total - age 9: 724944 bytes, 11814792 total - age 10: 750360 bytes, 12565152 total - age 11: 934944 bytes, 13500096 total - age 12: 521080 bytes, 14021176 total - age 13: 543392 bytes, 14564568 total - age 14: 906616 bytes, 15471184 total - age 15: 504008 bytes, 15975192 total : 568932K->22625K(682688K), 0.0410180 secs] 3077079K->2531413K(5106368K), 0.0416940 secs] [Times: user=0.30 sys=0.01, real=0.05 secs] 2012-04-13T17:44:33.893+0200: 3296261.162: [GC 3296261.162: [ParNew Desired survivor size 69894144 bytes, new threshold 15 (max 15) - age 1: 2975896 bytes, 2975896 total - age 2: 742592 bytes, 3718488 total - age 3: 812864 bytes, 4531352 total - age 4: 873488 bytes, 5404840 total - age 5: 746128 bytes, 6150968 total - age 6: 685192 bytes, 6836160 total - age 7: 888376 bytes, 7724536 total - age 8: 2621688 bytes, 10346224 total - age 9: 715608 bytes, 11061832 total - age 10: 723336 bytes, 11785168 total - age 11: 749856 bytes, 12535024 total - age 12: 914632 bytes, 13449656 total - age 13: 520944 bytes, 13970600 total - age 14: 543224 bytes, 14513824 total - age 15: 906040 bytes, 15419864 total : 568801K->22726K(682688K), 0.0447800 secs] 3077589K->2532008K(5106368K), 0.0454710 secs] [Times: user=0.33 sys=0.00, real=0.05 secs] 2012-04-13T17:44:40.231+0200: 3296267.499: [GC 3296267.500: [ParNew (1: promotion failure size = 16) (2: promotion failure size = 56) (4: promotion failure size = 342) (5: promotion failure size = 1026) (6: promotion failure size = 278) (promotion failed) Desired survivor size 69894144 bytes, new threshold 15 (max 15) - age 1: 2436840 bytes, 2436840 total - age 2: 1625136 bytes, 4061976 total - age 3: 691664 bytes, 4753640 total - age 4: 799992 bytes, 5553632 total - age 5: 858344 bytes, 6411976 total - age 6: 730200 bytes, 7142176 total - age 7: 680072 bytes, 7822248 total - age 8: 885960 bytes, 8708208 total - age 9: 2618544 bytes, 11326752 total - age 10: 709168 bytes, 12035920 total - age 11: 714576 bytes, 12750496 total - age 12: 734976 bytes, 13485472 total - age 13: 905048 bytes, 14390520 total - age 14: 520320 bytes, 14910840 total - age 15: 543056 bytes, 15453896 total : 568902K->568678K(682688K), 0.3130580 secs]3296267.813: [CMS: 2510091K->573489K(4423680K), 7.7481330 secs] 3078184K->573489K(5106368K), [CMS Perm : 144002K-> 143970K(262144K)], 8.0619690 secs] [Times: user=8.35 sys=0.01, real=8.06 secs] 2012-04-13T17:44:51.337+0200: 3296278.606: [GC 3296278.606: [ParNew Desired survivor size 69894144 bytes, new threshold 15 (max 15) - age 1: 33717528 bytes, 33717528 total : 546176K->43054K(682688K), 0.0515990 secs] 1119665K->616543K(5106368K), 0.0523550 secs] [Times: user=0.34 sys=0.00, real=0.05 secs] ------------------------------------------------ On Tue, Mar 20, 2012 at 10:12 PM, Srinivas Ramakrishna wrote: > As Chi-ho noted, about 3-4 MB of data does get promoted per scavenge, > after having > sloshed around in your survivor spaces some 15 times. 
I'd venture that > whatever winnowing > of young objects was to ocur has in fact occured already within the > first 3-4 scavenges that > an object has survived, after which the drop-off in population is less > sharp. So I'd suggest > lowering the MTT to about 3, while leaving the survivor ratio intact. > That should reduce your > copying costs and bring down your scavenge pauses further, while not > adversely affecting > your promotion rates (and concomitantly the fragmentation). > > One thing that was a bit puzzling about the stats below was that you'd > expect the volume > of generation X in scavenge N to be no less than the volume of > generation X+1 in scavenge N+1, > but occasionally that natural invariant does not appear to hold, which > is quite puzzling -- > indicating perhaps that either ages or populations are not being > correctly tracked. > > I don't know if anyone else has noticed that in their tenuring > distributions as well.... > > -- ramki > > On Tue, Mar 20, 2012 at 9:36 AM, Taras Tielkes wrote: >> Hi, >> >> I've collected -XX:+PrintTenuringDistribution data from a node in our >> production environment, running -Xmx5g -Xmn400m -XX:SurvivorRatio=8. >> On one other production node, we've configured a larger new gen, and >> larger survivor spaces (-Xmx5g -Xmn800m -XX:SurvivorRatio=4). >> This node has -XX:+PrintTenuringDistribution logging as well. >> >> The node running the larger new gen and survivor spaces has not run >> into a promotion failure yet, while the ones still running the old >> config have hit a few. >> The promotion failures are typically experienced at high load periods, >> which makes sense, as allocation and promotion will experience a spike >> in those periods as well. >> >> The inherent nature of the application implies relatively long >> sessions (towards a few hours), retaining a fair amout of state up to >> an hour. >> I believe this is the main reason of the relatively high promotion >> rate we're experiencing. >> >> >> Here's a fragment of gc log from one of the nodes running the older >> (smaller) new gen, including a promotion failure: >> ------------------------- >> 2012-03-15T18:32:17.785+0100: 796604.225: [GC 796604.225: [ParNew >> Desired survivor size 20971520 bytes, new threshold 8 (max 15) >> - age ? 1: ? ?2927728 bytes, ? ?2927728 total >> - age ? 2: ? ?2428512 bytes, ? ?5356240 total >> - age ? 3: ? ?2696376 bytes, ? ?8052616 total >> - age ? 4: ? ?2623576 bytes, ? 10676192 total >> - age ? 5: ? ?3365576 bytes, ? 14041768 total >> - age ? 6: ? ?2792272 bytes, ? 16834040 total >> - age ? 7: ? ?2233008 bytes, ? 19067048 total >> - age ? 8: ? ?2263824 bytes, ? 21330872 total >> : 358709K->29362K(368640K), 0.0461460 secs] >> 3479492K->3151874K(5201920K), 0.0467320 secs] [Times: user=0.34 >> sys=0.01, real=0.05 secs] >> 2012-03-15T18:32:21.546+0100: 796607.986: [GC 796607.986: [ParNew (0: >> promotion failure size = 25) ?(1: promotion failure size = 25) ?(2: >> promotion failure size = 25) ?(3: promotion failure size = 25) ?(4: >> promotion failure size = 25) ?(5 >> : promotion failure size = 25) ?(6: promotion failure size = 341) ?(7: >> promotion failure size = 25) ?(promotion failed) >> Desired survivor size 20971520 bytes, new threshold 8 (max 15) >> - age ? 1: ? ?3708208 bytes, ? ?3708208 total >> - age ? 2: ? ?2174384 bytes, ? ?5882592 total >> - age ? 3: ? ?2383256 bytes, ? ?8265848 total >> - age ? 4: ? ?2689912 bytes, ? 10955760 total >> - age ? 5: ? ?2621832 bytes, ? 13577592 total >> - age ? 6: ? ?3360440 bytes, ? 
16938032 total >> - age ? 7: ? ?2784136 bytes, ? 19722168 total >> - age ? 8: ? ?2220232 bytes, ? 21942400 total >> : 357042K->356456K(368640K), 0.2734100 secs]796608.259: [CMS: >> 3124189K->516640K(4833280K), 6.8127070 secs] >> 3479554K->516640K(5201920K), [CMS Perm : 142423K->142284K(262144K)], >> 7.0867850 secs] [Times: user=7.32 sys=0.07, real=7.09 secs] >> 2012-03-15T18:32:30.279+0100: 796616.719: [GC 796616.720: [ParNew >> Desired survivor size 20971520 bytes, new threshold 1 (max 15) >> - age ? 1: ? 29721456 bytes, ? 29721456 total >> : 327680K->40960K(368640K), 0.0403130 secs] >> 844320K->557862K(5201920K), 0.0409070 secs] [Times: user=0.27 >> sys=0.01, real=0.04 secs] >> 2012-03-15T18:32:32.701+0100: 796619.141: [GC 796619.141: [ParNew >> Desired survivor size 20971520 bytes, new threshold 15 (max 15) >> - age ? 1: ? 10310176 bytes, ? 10310176 total >> ------------------------- >> >> For contrast, here's a gc log fragment from the single node running >> the larger new gen and larger survivor spaces: >> (the fragment is from the same point in time, with the nodes >> experiencing equal load) >> ------------------------- >> 2012-03-15T18:32:12.067+0100: 797119.336: [GC 797119.336: [ParNew >> Desired survivor size 69894144 bytes, new threshold 15 (max 15) >> - age ? 1: ? ?5611536 bytes, ? ?5611536 total >> - age ? 2: ? ?3731888 bytes, ? ?9343424 total >> - age ? 3: ? ?3450672 bytes, ? 12794096 total >> - age ? 4: ? ?3314744 bytes, ? 16108840 total >> - age ? 5: ? ?3459888 bytes, ? 19568728 total >> - age ? 6: ? ?3334712 bytes, ? 22903440 total >> - age ? 7: ? ?3671960 bytes, ? 26575400 total >> - age ? 8: ? ?3841608 bytes, ? 30417008 total >> - age ? 9: ? ?2035392 bytes, ? 32452400 total >> - age ?10: ? ?1975056 bytes, ? 34427456 total >> - age ?11: ? ?2021344 bytes, ? 36448800 total >> - age ?12: ? ?1520752 bytes, ? 37969552 total >> - age ?13: ? ?1494176 bytes, ? 39463728 total >> - age ?14: ? ?2355136 bytes, ? 41818864 total >> - age ?15: ? ?1279000 bytes, ? 43097864 total >> : 603473K->61640K(682688K), 0.0756570 secs] >> 3373284K->2832383K(5106368K), 0.0762090 secs] [Times: user=0.56 >> sys=0.00, real=0.08 secs] >> 2012-03-15T18:32:18.200+0100: 797125.468: [GC 797125.469: [ParNew >> Desired survivor size 69894144 bytes, new threshold 15 (max 15) >> - age ? 1: ? ?6101320 bytes, ? ?6101320 total >> - age ? 2: ? ?4446776 bytes, ? 10548096 total >> - age ? 3: ? ?3701384 bytes, ? 14249480 total >> - age ? 4: ? ?3438488 bytes, ? 17687968 total >> - age ? 5: ? ?3295360 bytes, ? 20983328 total >> - age ? 6: ? ?3403320 bytes, ? 24386648 total >> - age ? 7: ? ?3323368 bytes, ? 27710016 total >> - age ? 8: ? ?3665760 bytes, ? 31375776 total >> - age ? 9: ? ?2427904 bytes, ? 33803680 total >> - age ?10: ? ?1418656 bytes, ? 35222336 total >> - age ?11: ? ?1955192 bytes, ? 37177528 total >> - age ?12: ? ?2006064 bytes, ? 39183592 total >> - age ?13: ? ?1520768 bytes, ? 40704360 total >> - age ?14: ? ?1493728 bytes, ? 42198088 total >> - age ?15: ? ?2354376 bytes, ? 44552464 total >> : 607816K->62650K(682688K), 0.0779270 secs] >> 3378559K->2834643K(5106368K), 0.0784690 secs] [Times: user=0.58 >> sys=0.00, real=0.08 secs] >> ------------------------- >> >> Questions: >> >> 1) From the tenuring distributions, it seems that the application >> benefits from larger new gen and survivor spaces. >> The next thing we'll try is to run with -Xmn1g -XX:SurvivorRatio=2, >> and see if the ParNew times are still acceptable. >> Does this seem a sensible approach in this context? 
>> Are there other variables beyond ParNew times that limit scaling the >> new gen to a large size? >> >> 2) Given the object age demographics inherent to our application, we >> can not expect to see the majority of data get collected in the new >> gen. >> >> Our approach to fight the promotion failures consists of three aspects: >> a) Lower the overall allocation rate of our application (by improving >> wasteful hotspots), to decrease overall ParNew collection frequency. >> b) Configure the new gen and survivor spaces as large as possible, >> keeping an eye on ParNew times and overall new/tenured ratio. >> c) Try to refactor the data structures that form the bulk of promoted >> data, to retain only the strictly required subgraphs. >> >> Is there anything else I can try or measure, in order to better >> understand the problem? >> >> Thanks in advance, >> Taras >> >> >> On Wed, Feb 22, 2012 at 10:51 AM, Taras Tielkes wrote: >>> (this time properly responding to the list alias) >>> Hi Srinivas, >>> >>> We're running 1.6.0 u29 on Linux x64. My understanding is that >>> CompressedOops is enabled by default since u23. >>> >>> At least this page seems to support that: >>> http://docs.oracle.com/javase/7/docs/technotes/guides/vm/performance-enhancements-7.html >>> >>> Regarding the other remarks (also from Todd and Chi), I'll comment >>> later. The first thing on my list is to collect >>> PrintTenuringDistribution data now. >>> >>> Kind regards, >>> Taras >>> >>> On Wed, Feb 22, 2012 at 10:50 AM, Taras Tielkes wrote: >>>> Hi Srinivas, >>>> >>>> We're running 1.6.0 u29 on Linux x64. My understanding is that >>>> CompressedOops is enabled by default since u23. >>>> >>>> At least this page seems to support that: >>>> http://docs.oracle.com/javase/7/docs/technotes/guides/vm/performance-enhancements-7.html >>>> >>>> Regarding the other remarks (also from Todd and Chi), I'll comment >>>> later. The first thing on my list is to collect >>>> PrintTenuringDistribution data now. >>>> >>>> Kind regards, >>>> Taras >>>> >>>> On Wed, Feb 22, 2012 at 12:40 AM, Srinivas Ramakrishna >>>> wrote: >>>>> I agree that premature promotions are almost always the first and most >>>>> important thing to fix when running >>>>> into fragmentation or overload issues with CMS. However, I can also imagine >>>>> long-lived objects with a highly >>>>> non-stationary size distribution which can also cause problems for CMS >>>>> despite best efforts to tune against >>>>> premature promotion. >>>>> >>>>> I didn't think Treas was running with MTT=0, although MTT > 0 is no recipe >>>>> for avoiding premature promotion >>>>> with bursty loads that case overflow the survivor spaces -- as you say large >>>>> survivor spaces with a low >>>>> TargetSurvivorRatio -- so as to leave plenty of space to absorb/accommodate >>>>> spiking/bursty loads? is >>>>> definitely a "best practice" for CMS (and possibly for other concurrent >>>>> collectors as well). >>>>> >>>>> One thing Taras can do to see if premature promotion might be an issue is to >>>>> look at the tenuring >>>>> threshold in his case. A rough proxy (if PrintTenuringDistribution is not >>>>> enabled) is to look at the >>>>> promotion volume per scavenge. It may be possible, if premature promotion is >>>>> a cause, to see >>>>> some kind of medium-term correlation between high promotion volume and >>>>> eventual promotion >>>>> failure despite frequent CMS collections. >>>>> >>>>> One other point which may or may not be relevant. 
I see that Taras is not >>>>> using CompressedOops... >>>>> Using that alone would greatly decrease memory pressure and provide more >>>>> breathing room to CMS, >>>>> which is also almost always a good idea. >>>>> >>>>> -- ramki >>>>> >>>>> On Tue, Feb 21, 2012 at 10:16 AM, Chi Ho Kwok wrote: >>>>>> >>>>>> Hi Teras, >>>>>> >>>>>> I think you may want to look into sizing the new and especially the >>>>>> survivor spaces differently. We run something similar to what you described, >>>>>> high volume request processing with large dataset loading, and what we've >>>>>> seen at the start is that the survivor spaces are completely overloaded, >>>>>> causing premature promotions. >>>>>> >>>>>> We've configured our vm with the following goals/guideline: >>>>>> >>>>>> old space is for semi-permanent data, living for at least 30s, average ~10 >>>>>> minutes >>>>>> new space contains only temporary and just loaded data >>>>>> surviving objects from new should never reach old in 1 gc, so the survivor >>>>>> space may never be 100% full >>>>>> >>>>>> With jstat -gcutil `pidof java` 2000, we see things like: >>>>>> >>>>>> ? S0 ? ? S1 ? ? E ? ? ?O ? ? ?P ? ? YGC ? ? YGCT ? ?FGC ? ?FGCT ? ? GCT >>>>>> ?70.20 ? 0.00 ?19.65 ?57.60 ?59.90 124808 29474.299 ?2498 ?191.110 >>>>>> 29665.409 >>>>>> ?70.20 ? 0.00 ?92.89 ?57.60 ?59.90 124808 29474.299 ?2498 ?191.110 >>>>>> 29665.409 >>>>>> ?70.20 ? 0.00 ?93.47 ?57.60 ?59.90 124808 29474.299 ?2498 ?191.110 >>>>>> 29665.409 >>>>>> ? 0.00 ?65.69 ?78.07 ?58.09 ?59.90 124809 29474.526 ?2498 ?191.110 >>>>>> 29665.636 >>>>>> ?84.97 ? 0.00 ?48.19 ?58.57 ?59.90 124810 29474.774 ?2498 ?191.110 >>>>>> 29665.884 >>>>>> ?84.97 ? 0.00 ?81.30 ?58.57 ?59.90 124810 29474.774 ?2498 ?191.110 >>>>>> 29665.884 >>>>>> ? 0.00 ?62.64 ?27.22 ?59.12 ?59.90 124811 29474.992 ?2498 ?191.110 >>>>>> 29666.102 >>>>>> ? 0.00 ?62.64 ?54.47 ?59.12 ?59.90 124811 29474.992 ?2498 ?191.110 >>>>>> 29666.102 >>>>>> ?75.68 ? 0.00 ? 6.80 ?59.53 ?59.90 124812 29475.228 ?2498 ?191.110 >>>>>> 29666.338 >>>>>> ?75.68 ? 0.00 ?23.38 ?59.53 ?59.90 124812 29475.228 ?2498 ?191.110 >>>>>> 29666.338 >>>>>> ?75.68 ? 0.00 ?27.72 ?59.53 ?59.90 124812 29475.228 ?2498 ?191.110 >>>>>> 29666.338 >>>>>> >>>>>> If you follow the lines, you can see Eden fill up to 100% on line 4, >>>>>> surviving objects are copied into S1, S0 is collected and added 0.49% to >>>>>> Old. On line 5, another GC happened, with Eden->S0, S1->Old, etc. No objects >>>>>> is ever transferred from Eden to Old, unless there's a huge peak of >>>>>> requests. >>>>>> >>>>>> This is with a: 32GB heap, Mxn1200M, SurvivorRatio 2 (600MB Eden, 300MB >>>>>> S0, 300MB S1), MaxTenuringThreshold 1 (whatever is still alive in S0/1 on >>>>>> the second GC is copied to old, don't wait, web requests are quite bursty). >>>>>> With about 1 collection every 2-5 seconds, objects promoted to Old must live >>>>>> for at 4-10 seconds; as that's longer than an average request (50ms-1s), >>>>>> none of the temporary data ever makes it into Old, which is much more >>>>>> expensive to collect. It works even with a higher than default >>>>>> CMSInitiatingOccupancyFraction=76 to optimize for space available for the >>>>>> large data cache we have. >>>>>> >>>>>> >>>>>> With your config of 400MB Total new, with 350MB Eden, 25MB S0, 25MB S1 >>>>>> (SurvivorRatio 8), no tenuring threshold, I think loads of new objects get >>>>>> copied from Eden to Old directly, causing trouble for the CMS. You can use >>>>>> jstat to get live stats and tweak until it doesn't happen. 
If you can't make >>>>>> changes on live that easil, try doubling the new size indeed, with a 400 >>>>>> Eden, 200 S0, 200 S1 and?MaxTenuringThreshold?1 setting. It's probably >>>>>> overkill, but if should solve the problem if it is caused by premature >>>>>> promotion. >>>>>> >>>>>> >>>>>> Chi Ho Kwok >>>>>> >>>>>> >>>>>> On Tue, Feb 21, 2012 at 5:55 PM, Taras Tielkes >>>>>> wrote: >>>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> We've removed the "-XX:+CMSScavengeBeforeRemark" setting from 50% of >>>>>>> our production nodes. >>>>>>> After running for a few weeks, it seems that there's no impact from >>>>>>> removing this option. >>>>>>> Which is good, since it seems we can remove it from the other nodes as >>>>>>> well, simplifying our overall JVM configuration ;-) >>>>>>> >>>>>>> However, we're still seeing promotion failures on all nodes, once >>>>>>> every day or so. >>>>>>> >>>>>>> There's still the "Magic 1026": this accounts for ~60% of the >>>>>>> promotion failures that we're seeing (single ParNew thread thread, >>>>>>> 1026 failure size): >>>>>>> -------------------- >>>>>>> 2012-02-06T09:13:51.806+0100: 328095.085: [GC 328095.086: [ParNew: >>>>>>> 359895K->29357K(368640K), 0.0429070 secs] >>>>>>> 3471021K->3143476K(5201920K), 0.0434950 secs] [Times: user=0.32 >>>>>>> sys=0.00, real=0.04 secs] >>>>>>> 2012-02-06T09:13:55.922+0100: 328099.201: [GC 328099.201: [ParNew: >>>>>>> 357037K->31817K(368640K), 0.0429130 secs] >>>>>>> 3471156K->3148946K(5201920K), 0.0434930 secs] [Times: user=0.31 >>>>>>> sys=0.00, real=0.04 secs] >>>>>>> 2012-02-06T09:13:59.044+0100: 328102.324: [GC 328102.324: [ParNew >>>>>>> (promotion failure size = 1026) ?(promotion failed): >>>>>>> 359497K->368640K(368640K), 0.2226790 secs]328102.547: [CMS: >>>>>>> 3125609K->451515K(4833280K), 5.6225880 secs] 3476626K->4515 >>>>>>> 15K(5201920K), [CMS Perm : 124373K->124353K(262144K)], 5.8459380 secs] >>>>>>> [Times: user=6.20 sys=0.01, real=5.85 secs] >>>>>>> 2012-02-06T09:14:05.243+0100: 328108.522: [GC 328108.523: [ParNew: >>>>>>> 327680K->40960K(368640K), 0.0319160 secs] 779195K->497658K(5201920K), >>>>>>> 0.0325360 secs] [Times: user=0.21 sys=0.01, real=0.03 secs] >>>>>>> 2012-02-06T09:14:07.836+0100: 328111.116: [GC 328111.116: [ParNew: >>>>>>> 368640K->32785K(368640K), 0.0744670 secs] 825338K->520234K(5201920K), >>>>>>> 0.0750390 secs] [Times: user=0.40 sys=0.02, real=0.08 secs] >>>>>>> -------------------- >>>>>>> Given the 1026 word size, I'm wondering if I should be hunting for an >>>>>>> overuse of BufferedInputStream/BufferedOutoutStream, since both have >>>>>>> 8192 as a default buffer size. 
>>>>>>> >>>>>>> The second group of promotion failures look like this (multiple ParNew >>>>>>> threads, small failure sizes): >>>>>>> -------------------- >>>>>>> 2012-02-06T09:50:15.773+0100: 328756.964: [GC 328756.964: [ParNew: >>>>>>> 356116K->29934K(368640K), 0.0461100 secs] >>>>>>> 3203863K->2880162K(5201920K), 0.0468870 secs] [Times: user=0.34 >>>>>>> sys=0.01, real=0.05 secs] >>>>>>> 2012-02-06T09:50:19.153+0100: 328760.344: [GC 328760.344: [ParNew: >>>>>>> 357614K->30359K(368640K), 0.0454680 secs] >>>>>>> 3207842K->2882892K(5201920K), 0.0462280 secs] [Times: user=0.33 >>>>>>> sys=0.01, real=0.05 secs] >>>>>>> 2012-02-06T09:50:22.658+0100: 328763.849: [GC 328763.849: [ParNew (1: >>>>>>> promotion failure size = 25) ?(4: promotion failure size = 25) ?(6: >>>>>>> promotion failure size = 25) ?(7: promotion failure size = 144) >>>>>>> (promotion failed): 358039K->358358 >>>>>>> K(368640K), 0.2148680 secs]328764.064: [CMS: >>>>>>> 2854709K->446750K(4833280K), 5.8368270 secs] >>>>>>> 3210572K->446750K(5201920K), [CMS Perm : 124670K->124644K(262144K)], >>>>>>> 6.0525230 secs] [Times: user=6.32 sys=0.00, real=6.05 secs] >>>>>>> 2012-02-06T09:50:29.896+0100: 328771.086: [GC 328771.087: [ParNew: >>>>>>> 327680K->22569K(368640K), 0.0227080 secs] 774430K->469319K(5201920K), >>>>>>> 0.0235020 secs] [Times: user=0.16 sys=0.00, real=0.02 secs] >>>>>>> 2012-02-06T09:50:31.076+0100: 328772.266: [GC 328772.267: [ParNew: >>>>>>> 350249K->22264K(368640K), 0.0235480 secs] 796999K->469014K(5201920K), >>>>>>> 0.0243000 secs] [Times: user=0.18 sys=0.01, real=0.02 secs] >>>>>>> -------------------- >>>>>>> >>>>>>> We're going to try to double the new size on a single node, to see the >>>>>>> effects of that. >>>>>>> >>>>>>> Beyond this experiment, is there any additional data I can collect to >>>>>>> better understand the nature of the promotion failures? >>>>>>> Am I facing collecting free list statistics at this point? >>>>>>> >>>>>>> Thanks, >>>>>>> Taras >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> hotspot-gc-use mailing list >>>>>> hotspot-gc-use at openjdk.java.net >>>>>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>>>>> >>>>> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use From taras.tielkes at gmail.com Sun Apr 15 08:08:03 2012 From: taras.tielkes at gmail.com (Taras Tielkes) Date: Sun, 15 Apr 2012 17:08:03 +0200 Subject: Promotion failures: indication of CMS fragmentation? In-Reply-To: References: <4EF9FCAC.3030208@oracle.com> <4F06A270.3010701@oracle.com> <4F0DBEC4.7040907@oracle.com> <4F1ECE7B.3040502@oracle.com> <4F1F2ED7.6060308@oracle.com> <4F20F78D.9070905@oracle.com> Message-ID: Hi Chi, Is it o.k. if I send this off-list to you directly? If so, how much more do you need? Just enough to cover the previous CMS? We're running with -XX:CMSInitiatingOccupancyFraction=68 and -XX:+UseCMSInitiatingOccupancyOnly, by the way. I do have shell access, however, on that particular machine we're experiencing the "process id not found" issue with jstat. I think this can be worked around by fiddling with temp directory options, but we haven't tried that yet. Regarding the jstat output, I assume this would be most valuable to have for the exact moment when the promotion failure happens, correct? If so, we can try to set up jstat to run in the background continuously, to have more diagnostic data in the future. 
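Something along these lines is what I have in mind, once the temp directory issue is sorted out (the interval and file name are just placeholders):

nohup jstat -gcutil -t `pidof java` 5s > jstat-gcutil.log 2>&1 &

The -t column is seconds since JVM start, so it should line up with the uptime-relative timestamps in gc.log.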
Kind regards, Taras On Sun, Apr 15, 2012 at 2:48 PM, Chi Ho Kwok wrote: > Hi Teras, > > Can you send me a larger chunk of the log? I'm interested in seeing when the > last CMS was run and what it freed. Maybe it's kicking in too late, the full > GC triggered by promotion failure only found 600M live data, rest was > garbage. If that's the cause, lowering?XX:CMSInitiatingOccupancyFraction can > help. > > Also, do you have shell access to that machine? If so, try running jstat, > you can see the usage of all generations live as it happens. > > > Chi Ho Kwok > > On Sun, Apr 15, 2012 at 2:34 PM, Taras Tielkes > wrote: >> >> Hi Chi, Srinivas, >> >> Optimizing the cost of ParNew (by lowering MTT) would be nice, but for >> now my priority is still to minimize the promotion failures. >> >> For example, on the machine running CMS with the "larger" young gen >> and survivor spaces (-Xmx5g -Xmn800 -XX:SurvivorRatio=4), I've just >> seen a promotion failure again. Below is a snippet of gc.log showing >> this. >> To put this into perspective, this is a first promotion failure on >> that machine in a couple of weeks. Still, zero failures would beat a >> single failure, since the clients connecting to this application will >> only wait a few seconds before timing out and terminating the >> connection. In addition, the promotion failures are occurring in peak >> usage moments. >> >> Apart from trying to eliminate the promotion failure pauses, my main >> goal is to learn how to understand the root cause in a case like this. >> Any suggestions for things to try or read up on are appreciated. >> >> Kind regards, >> Taras >> ------------------------------------------------ >> 2012-04-13T17:44:27.777+0200: 3296255.045: [GC 3296255.046: [ParNew >> Desired survivor size 69894144 bytes, new threshold 15 (max 15) >> - age ? 1: ? ?3684448 bytes, ? ?3684448 total >> - age ? 2: ? ? 824984 bytes, ? ?4509432 total >> - age ? 3: ? ? 885120 bytes, ? ?5394552 total >> - age ? 4: ? ? 756568 bytes, ? ?6151120 total >> - age ? 5: ? ? 696880 bytes, ? ?6848000 total >> - age ? 6: ? ? 890688 bytes, ? ?7738688 total >> - age ? 7: ? ?2631184 bytes, ? 10369872 total >> - age ? 8: ? ? 719976 bytes, ? 11089848 total >> - age ? 9: ? ? 724944 bytes, ? 11814792 total >> - age ?10: ? ? 750360 bytes, ? 12565152 total >> - age ?11: ? ? 934944 bytes, ? 13500096 total >> - age ?12: ? ? 521080 bytes, ? 14021176 total >> - age ?13: ? ? 543392 bytes, ? 14564568 total >> - age ?14: ? ? 906616 bytes, ? 15471184 total >> - age ?15: ? ? 504008 bytes, ? 15975192 total >> : 568932K->22625K(682688K), 0.0410180 secs] >> 3077079K->2531413K(5106368K), 0.0416940 secs] [Times: user=0.30 >> sys=0.01, real=0.05 secs] >> 2012-04-13T17:44:33.893+0200: 3296261.162: [GC 3296261.162: [ParNew >> Desired survivor size 69894144 bytes, new threshold 15 (max 15) >> - age ? 1: ? ?2975896 bytes, ? ?2975896 total >> - age ? 2: ? ? 742592 bytes, ? ?3718488 total >> - age ? 3: ? ? 812864 bytes, ? ?4531352 total >> - age ? 4: ? ? 873488 bytes, ? ?5404840 total >> - age ? 5: ? ? 746128 bytes, ? ?6150968 total >> - age ? 6: ? ? 685192 bytes, ? ?6836160 total >> - age ? 7: ? ? 888376 bytes, ? ?7724536 total >> - age ? 8: ? ?2621688 bytes, ? 10346224 total >> - age ? 9: ? ? 715608 bytes, ? 11061832 total >> - age ?10: ? ? 723336 bytes, ? 11785168 total >> - age ?11: ? ? 749856 bytes, ? 12535024 total >> - age ?12: ? ? 914632 bytes, ? 13449656 total >> - age ?13: ? ? 520944 bytes, ? 13970600 total >> - age ?14: ? ? 543224 bytes, ? 14513824 total >> - age ?15: ? ? 
906040 bytes, ? 15419864 total >> : 568801K->22726K(682688K), 0.0447800 secs] >> 3077589K->2532008K(5106368K), 0.0454710 secs] [Times: user=0.33 >> sys=0.00, real=0.05 secs] >> 2012-04-13T17:44:40.231+0200: 3296267.499: [GC 3296267.500: [ParNew >> (1: promotion failure size = 16) ?(2: promotion failure size = 56) >> (4: promotion failure >> size = 342) ?(5: promotion failure size = 1026) ?(6: promotion failure >> size = 278) ?(promotion failed) >> Desired survivor size 69894144 bytes, new threshold 15 (max 15) >> - age ? 1: ? ?2436840 bytes, ? ?2436840 total >> - age ? 2: ? ?1625136 bytes, ? ?4061976 total >> - age ? 3: ? ? 691664 bytes, ? ?4753640 total >> - age ? 4: ? ? 799992 bytes, ? ?5553632 total >> - age ? 5: ? ? 858344 bytes, ? ?6411976 total >> - age ? 6: ? ? 730200 bytes, ? ?7142176 total >> - age ? 7: ? ? 680072 bytes, ? ?7822248 total >> - age ? 8: ? ? 885960 bytes, ? ?8708208 total >> - age ? 9: ? ?2618544 bytes, ? 11326752 total >> - age ?10: ? ? 709168 bytes, ? 12035920 total >> - age ?11: ? ? 714576 bytes, ? 12750496 total >> - age ?12: ? ? 734976 bytes, ? 13485472 total >> - age ?13: ? ? 905048 bytes, ? 14390520 total >> - age ?14: ? ? 520320 bytes, ? 14910840 total >> - age ?15: ? ? 543056 bytes, ? 15453896 total >> : 568902K->568678K(682688K), 0.3130580 secs]3296267.813: [CMS: >> 2510091K->573489K(4423680K), 7.7481330 secs] >> 3078184K->573489K(5106368K), [CMS Perm : 144002K-> >> 143970K(262144K)], 8.0619690 secs] [Times: user=8.35 sys=0.01, real=8.06 >> secs] >> 2012-04-13T17:44:51.337+0200: 3296278.606: [GC 3296278.606: [ParNew >> Desired survivor size 69894144 bytes, new threshold 15 (max 15) >> - age ? 1: ? 33717528 bytes, ? 33717528 total >> : 546176K->43054K(682688K), 0.0515990 secs] >> 1119665K->616543K(5106368K), 0.0523550 secs] [Times: user=0.34 >> sys=0.00, real=0.05 secs] >> ------------------------------------------------ >> >> On Tue, Mar 20, 2012 at 10:12 PM, Srinivas Ramakrishna >> wrote: >> > As Chi-ho noted, about 3-4 MB of data does get promoted per scavenge, >> > after having >> > sloshed around in your survivor spaces some 15 times. I'd venture that >> > whatever winnowing >> > of young objects was to ocur has in fact occured already within the >> > first 3-4 scavenges that >> > an object has survived, after which the drop-off in population is less >> > sharp. So I'd suggest >> > lowering the MTT to about 3, while leaving the survivor ratio intact. >> > That should reduce your >> > copying costs and bring down your scavenge pauses further, while not >> > adversely affecting >> > your promotion rates (and concomitantly the fragmentation). >> > >> > One thing that was a bit puzzling about the stats below was that you'd >> > expect the volume >> > of generation X in scavenge N to be no less than the volume of >> > generation X+1 in scavenge N+1, >> > but occasionally that natural invariant does not appear to hold, which >> > is quite puzzling -- >> > indicating perhaps that either ages or populations are not being >> > correctly tracked. >> > >> > I don't know if anyone else has noticed that in their tenuring >> > distributions as well.... >> > >> > -- ramki >> > >> > On Tue, Mar 20, 2012 at 9:36 AM, Taras Tielkes >> > wrote: >> >> Hi, >> >> >> >> I've collected -XX:+PrintTenuringDistribution data from a node in our >> >> production environment, running -Xmx5g -Xmn400m -XX:SurvivorRatio=8. >> >> On one other production node, we've configured a larger new gen, and >> >> larger survivor spaces (-Xmx5g -Xmn800m -XX:SurvivorRatio=4). 
>> >> This node has -XX:+PrintTenuringDistribution logging as well. >> >> >> >> The node running the larger new gen and survivor spaces has not run >> >> into a promotion failure yet, while the ones still running the old >> >> config have hit a few. >> >> The promotion failures are typically experienced at high load periods, >> >> which makes sense, as allocation and promotion will experience a spike >> >> in those periods as well. >> >> >> >> The inherent nature of the application implies relatively long >> >> sessions (towards a few hours), retaining a fair amout of state up to >> >> an hour. >> >> I believe this is the main reason of the relatively high promotion >> >> rate we're experiencing. >> >> >> >> >> >> Here's a fragment of gc log from one of the nodes running the older >> >> (smaller) new gen, including a promotion failure: >> >> ------------------------- >> >> 2012-03-15T18:32:17.785+0100: 796604.225: [GC 796604.225: [ParNew >> >> Desired survivor size 20971520 bytes, new threshold 8 (max 15) >> >> - age ? 1: ? ?2927728 bytes, ? ?2927728 total >> >> - age ? 2: ? ?2428512 bytes, ? ?5356240 total >> >> - age ? 3: ? ?2696376 bytes, ? ?8052616 total >> >> - age ? 4: ? ?2623576 bytes, ? 10676192 total >> >> - age ? 5: ? ?3365576 bytes, ? 14041768 total >> >> - age ? 6: ? ?2792272 bytes, ? 16834040 total >> >> - age ? 7: ? ?2233008 bytes, ? 19067048 total >> >> - age ? 8: ? ?2263824 bytes, ? 21330872 total >> >> : 358709K->29362K(368640K), 0.0461460 secs] >> >> 3479492K->3151874K(5201920K), 0.0467320 secs] [Times: user=0.34 >> >> sys=0.01, real=0.05 secs] >> >> 2012-03-15T18:32:21.546+0100: 796607.986: [GC 796607.986: [ParNew (0: >> >> promotion failure size = 25) ?(1: promotion failure size = 25) ?(2: >> >> promotion failure size = 25) ?(3: promotion failure size = 25) ?(4: >> >> promotion failure size = 25) ?(5 >> >> : promotion failure size = 25) ?(6: promotion failure size = 341) ?(7: >> >> promotion failure size = 25) ?(promotion failed) >> >> Desired survivor size 20971520 bytes, new threshold 8 (max 15) >> >> - age ? 1: ? ?3708208 bytes, ? ?3708208 total >> >> - age ? 2: ? ?2174384 bytes, ? ?5882592 total >> >> - age ? 3: ? ?2383256 bytes, ? ?8265848 total >> >> - age ? 4: ? ?2689912 bytes, ? 10955760 total >> >> - age ? 5: ? ?2621832 bytes, ? 13577592 total >> >> - age ? 6: ? ?3360440 bytes, ? 16938032 total >> >> - age ? 7: ? ?2784136 bytes, ? 19722168 total >> >> - age ? 8: ? ?2220232 bytes, ? 21942400 total >> >> : 357042K->356456K(368640K), 0.2734100 secs]796608.259: [CMS: >> >> 3124189K->516640K(4833280K), 6.8127070 secs] >> >> 3479554K->516640K(5201920K), [CMS Perm : 142423K->142284K(262144K)], >> >> 7.0867850 secs] [Times: user=7.32 sys=0.07, real=7.09 secs] >> >> 2012-03-15T18:32:30.279+0100: 796616.719: [GC 796616.720: [ParNew >> >> Desired survivor size 20971520 bytes, new threshold 1 (max 15) >> >> - age ? 1: ? 29721456 bytes, ? 29721456 total >> >> : 327680K->40960K(368640K), 0.0403130 secs] >> >> 844320K->557862K(5201920K), 0.0409070 secs] [Times: user=0.27 >> >> sys=0.01, real=0.04 secs] >> >> 2012-03-15T18:32:32.701+0100: 796619.141: [GC 796619.141: [ParNew >> >> Desired survivor size 20971520 bytes, new threshold 15 (max 15) >> >> - age ? 1: ? 10310176 bytes, ? 
10310176 total >> >> ------------------------- >> >> >> >> For contrast, here's a gc log fragment from the single node running >> >> the larger new gen and larger survivor spaces: >> >> (the fragment is from the same point in time, with the nodes >> >> experiencing equal load) >> >> ------------------------- >> >> 2012-03-15T18:32:12.067+0100: 797119.336: [GC 797119.336: [ParNew >> >> Desired survivor size 69894144 bytes, new threshold 15 (max 15) >> >> - age ? 1: ? ?5611536 bytes, ? ?5611536 total >> >> - age ? 2: ? ?3731888 bytes, ? ?9343424 total >> >> - age ? 3: ? ?3450672 bytes, ? 12794096 total >> >> - age ? 4: ? ?3314744 bytes, ? 16108840 total >> >> - age ? 5: ? ?3459888 bytes, ? 19568728 total >> >> - age ? 6: ? ?3334712 bytes, ? 22903440 total >> >> - age ? 7: ? ?3671960 bytes, ? 26575400 total >> >> - age ? 8: ? ?3841608 bytes, ? 30417008 total >> >> - age ? 9: ? ?2035392 bytes, ? 32452400 total >> >> - age ?10: ? ?1975056 bytes, ? 34427456 total >> >> - age ?11: ? ?2021344 bytes, ? 36448800 total >> >> - age ?12: ? ?1520752 bytes, ? 37969552 total >> >> - age ?13: ? ?1494176 bytes, ? 39463728 total >> >> - age ?14: ? ?2355136 bytes, ? 41818864 total >> >> - age ?15: ? ?1279000 bytes, ? 43097864 total >> >> : 603473K->61640K(682688K), 0.0756570 secs] >> >> 3373284K->2832383K(5106368K), 0.0762090 secs] [Times: user=0.56 >> >> sys=0.00, real=0.08 secs] >> >> 2012-03-15T18:32:18.200+0100: 797125.468: [GC 797125.469: [ParNew >> >> Desired survivor size 69894144 bytes, new threshold 15 (max 15) >> >> - age ? 1: ? ?6101320 bytes, ? ?6101320 total >> >> - age ? 2: ? ?4446776 bytes, ? 10548096 total >> >> - age ? 3: ? ?3701384 bytes, ? 14249480 total >> >> - age ? 4: ? ?3438488 bytes, ? 17687968 total >> >> - age ? 5: ? ?3295360 bytes, ? 20983328 total >> >> - age ? 6: ? ?3403320 bytes, ? 24386648 total >> >> - age ? 7: ? ?3323368 bytes, ? 27710016 total >> >> - age ? 8: ? ?3665760 bytes, ? 31375776 total >> >> - age ? 9: ? ?2427904 bytes, ? 33803680 total >> >> - age ?10: ? ?1418656 bytes, ? 35222336 total >> >> - age ?11: ? ?1955192 bytes, ? 37177528 total >> >> - age ?12: ? ?2006064 bytes, ? 39183592 total >> >> - age ?13: ? ?1520768 bytes, ? 40704360 total >> >> - age ?14: ? ?1493728 bytes, ? 42198088 total >> >> - age ?15: ? ?2354376 bytes, ? 44552464 total >> >> : 607816K->62650K(682688K), 0.0779270 secs] >> >> 3378559K->2834643K(5106368K), 0.0784690 secs] [Times: user=0.58 >> >> sys=0.00, real=0.08 secs] >> >> ------------------------- >> >> >> >> Questions: >> >> >> >> 1) From the tenuring distributions, it seems that the application >> >> benefits from larger new gen and survivor spaces. >> >> The next thing we'll try is to run with -Xmn1g -XX:SurvivorRatio=2, >> >> and see if the ParNew times are still acceptable. >> >> Does this seem a sensible approach in this context? >> >> Are there other variables beyond ParNew times that limit scaling the >> >> new gen to a large size? >> >> >> >> 2) Given the object age demographics inherent to our application, we >> >> can not expect to see the majority of data get collected in the new >> >> gen. >> >> >> >> Our approach to fight the promotion failures consists of three aspects: >> >> a) Lower the overall allocation rate of our application (by improving >> >> wasteful hotspots), to decrease overall ParNew collection frequency. >> >> b) Configure the new gen and survivor spaces as large as possible, >> >> keeping an eye on ParNew times and overall new/tenured ratio. 
>> >> c) Try to refactor the data structures that form the bulk of promoted >> >> data, to retain only the strictly required subgraphs. >> >> >> >> Is there anything else I can try or measure, in order to better >> >> understand the problem? >> >> >> >> Thanks in advance, >> >> Taras >> >> >> >> >> >> On Wed, Feb 22, 2012 at 10:51 AM, Taras Tielkes >> >> wrote: >> >>> (this time properly responding to the list alias) >> >>> Hi Srinivas, >> >>> >> >>> We're running 1.6.0 u29 on Linux x64. My understanding is that >> >>> CompressedOops is enabled by default since u23. >> >>> >> >>> At least this page seems to support that: >> >>> >> >>> http://docs.oracle.com/javase/7/docs/technotes/guides/vm/performance-enhancements-7.html >> >>> >> >>> Regarding the other remarks (also from Todd and Chi), I'll comment >> >>> later. The first thing on my list is to collect >> >>> PrintTenuringDistribution data now. >> >>> >> >>> Kind regards, >> >>> Taras >> >>> >> >>> On Wed, Feb 22, 2012 at 10:50 AM, Taras Tielkes >> >>> wrote: >> >>>> Hi Srinivas, >> >>>> >> >>>> We're running 1.6.0 u29 on Linux x64. My understanding is that >> >>>> CompressedOops is enabled by default since u23. >> >>>> >> >>>> At least this page seems to support that: >> >>>> >> >>>> http://docs.oracle.com/javase/7/docs/technotes/guides/vm/performance-enhancements-7.html >> >>>> >> >>>> Regarding the other remarks (also from Todd and Chi), I'll comment >> >>>> later. The first thing on my list is to collect >> >>>> PrintTenuringDistribution data now. >> >>>> >> >>>> Kind regards, >> >>>> Taras >> >>>> >> >>>> On Wed, Feb 22, 2012 at 12:40 AM, Srinivas Ramakrishna >> >>>> wrote: >> >>>>> I agree that premature promotions are almost always the first and >> >>>>> most >> >>>>> important thing to fix when running >> >>>>> into fragmentation or overload issues with CMS. However, I can also >> >>>>> imagine >> >>>>> long-lived objects with a highly >> >>>>> non-stationary size distribution which can also cause problems for >> >>>>> CMS >> >>>>> despite best efforts to tune against >> >>>>> premature promotion. >> >>>>> >> >>>>> I didn't think Treas was running with MTT=0, although MTT > 0 is no >> >>>>> recipe >> >>>>> for avoiding premature promotion >> >>>>> with bursty loads that case overflow the survivor spaces -- as you >> >>>>> say large >> >>>>> survivor spaces with a low >> >>>>> TargetSurvivorRatio -- so as to leave plenty of space to >> >>>>> absorb/accommodate >> >>>>> spiking/bursty loads? is >> >>>>> definitely a "best practice" for CMS (and possibly for other >> >>>>> concurrent >> >>>>> collectors as well). >> >>>>> >> >>>>> One thing Taras can do to see if premature promotion might be an >> >>>>> issue is to >> >>>>> look at the tenuring >> >>>>> threshold in his case. A rough proxy (if PrintTenuringDistribution >> >>>>> is not >> >>>>> enabled) is to look at the >> >>>>> promotion volume per scavenge. It may be possible, if premature >> >>>>> promotion is >> >>>>> a cause, to see >> >>>>> some kind of medium-term correlation between high promotion volume >> >>>>> and >> >>>>> eventual promotion >> >>>>> failure despite frequent CMS collections. >> >>>>> >> >>>>> One other point which may or may not be relevant. I see that Taras >> >>>>> is not >> >>>>> using CompressedOops... >> >>>>> Using that alone would greatly decrease memory pressure and provide >> >>>>> more >> >>>>> breathing room to CMS, >> >>>>> which is also almost always a good idea. 
>> >>>>> >> >>>>> -- ramki >> >>>>> >> >>>>> On Tue, Feb 21, 2012 at 10:16 AM, Chi Ho Kwok >> >>>>> wrote: >> >>>>>> >> >>>>>> Hi Teras, >> >>>>>> >> >>>>>> I think you may want to look into sizing the new and especially the >> >>>>>> survivor spaces differently. We run something similar to what you >> >>>>>> described, >> >>>>>> high volume request processing with large dataset loading, and what >> >>>>>> we've >> >>>>>> seen at the start is that the survivor spaces are completely >> >>>>>> overloaded, >> >>>>>> causing premature promotions. >> >>>>>> >> >>>>>> We've configured our vm with the following goals/guideline: >> >>>>>> >> >>>>>> old space is for semi-permanent data, living for at least 30s, >> >>>>>> average ~10 >> >>>>>> minutes >> >>>>>> new space contains only temporary and just loaded data >> >>>>>> surviving objects from new should never reach old in 1 gc, so the >> >>>>>> survivor >> >>>>>> space may never be 100% full >> >>>>>> >> >>>>>> With jstat -gcutil `pidof java` 2000, we see things like: >> >>>>>> >> >>>>>> ? S0 ? ? S1 ? ? E ? ? ?O ? ? ?P ? ? YGC ? ? YGCT ? ?FGC ? ?FGCT >> >>>>>> GCT >> >>>>>> ?70.20 ? 0.00 ?19.65 ?57.60 ?59.90 124808 29474.299 ?2498 ?191.110 >> >>>>>> 29665.409 >> >>>>>> ?70.20 ? 0.00 ?92.89 ?57.60 ?59.90 124808 29474.299 ?2498 ?191.110 >> >>>>>> 29665.409 >> >>>>>> ?70.20 ? 0.00 ?93.47 ?57.60 ?59.90 124808 29474.299 ?2498 ?191.110 >> >>>>>> 29665.409 >> >>>>>> ? 0.00 ?65.69 ?78.07 ?58.09 ?59.90 124809 29474.526 ?2498 ?191.110 >> >>>>>> 29665.636 >> >>>>>> ?84.97 ? 0.00 ?48.19 ?58.57 ?59.90 124810 29474.774 ?2498 ?191.110 >> >>>>>> 29665.884 >> >>>>>> ?84.97 ? 0.00 ?81.30 ?58.57 ?59.90 124810 29474.774 ?2498 ?191.110 >> >>>>>> 29665.884 >> >>>>>> ? 0.00 ?62.64 ?27.22 ?59.12 ?59.90 124811 29474.992 ?2498 ?191.110 >> >>>>>> 29666.102 >> >>>>>> ? 0.00 ?62.64 ?54.47 ?59.12 ?59.90 124811 29474.992 ?2498 ?191.110 >> >>>>>> 29666.102 >> >>>>>> ?75.68 ? 0.00 ? 6.80 ?59.53 ?59.90 124812 29475.228 ?2498 ?191.110 >> >>>>>> 29666.338 >> >>>>>> ?75.68 ? 0.00 ?23.38 ?59.53 ?59.90 124812 29475.228 ?2498 ?191.110 >> >>>>>> 29666.338 >> >>>>>> ?75.68 ? 0.00 ?27.72 ?59.53 ?59.90 124812 29475.228 ?2498 ?191.110 >> >>>>>> 29666.338 >> >>>>>> >> >>>>>> If you follow the lines, you can see Eden fill up to 100% on line >> >>>>>> 4, >> >>>>>> surviving objects are copied into S1, S0 is collected and added >> >>>>>> 0.49% to >> >>>>>> Old. On line 5, another GC happened, with Eden->S0, S1->Old, etc. >> >>>>>> No objects >> >>>>>> is ever transferred from Eden to Old, unless there's a huge peak of >> >>>>>> requests. >> >>>>>> >> >>>>>> This is with a: 32GB heap, Mxn1200M, SurvivorRatio 2 (600MB Eden, >> >>>>>> 300MB >> >>>>>> S0, 300MB S1), MaxTenuringThreshold 1 (whatever is still alive in >> >>>>>> S0/1 on >> >>>>>> the second GC is copied to old, don't wait, web requests are quite >> >>>>>> bursty). >> >>>>>> With about 1 collection every 2-5 seconds, objects promoted to Old >> >>>>>> must live >> >>>>>> for at 4-10 seconds; as that's longer than an average request >> >>>>>> (50ms-1s), >> >>>>>> none of the temporary data ever makes it into Old, which is much >> >>>>>> more >> >>>>>> expensive to collect. It works even with a higher than default >> >>>>>> CMSInitiatingOccupancyFraction=76 to optimize for space available >> >>>>>> for the >> >>>>>> large data cache we have. 
>> >>>>>> >> >>>>>> >> >>>>>> With your config of 400MB Total new, with 350MB Eden, 25MB S0, 25MB >> >>>>>> S1 >> >>>>>> (SurvivorRatio 8), no tenuring threshold, I think loads of new >> >>>>>> objects get >> >>>>>> copied from Eden to Old directly, causing trouble for the CMS. You >> >>>>>> can use >> >>>>>> jstat to get live stats and tweak until it doesn't happen. If you >> >>>>>> can't make >> >>>>>> changes on live that easil, try doubling the new size indeed, with >> >>>>>> a 400 >> >>>>>> Eden, 200 S0, 200 S1 and?MaxTenuringThreshold?1 setting. It's >> >>>>>> probably >> >>>>>> overkill, but if should solve the problem if it is caused by >> >>>>>> premature >> >>>>>> promotion. >> >>>>>> >> >>>>>> >> >>>>>> Chi Ho Kwok >> >>>>>> >> >>>>>> >> >>>>>> On Tue, Feb 21, 2012 at 5:55 PM, Taras Tielkes >> >>>>>> >> >>>>>> wrote: >> >>>>>>> >> >>>>>>> Hi, >> >>>>>>> >> >>>>>>> We've removed the "-XX:+CMSScavengeBeforeRemark" setting from 50% >> >>>>>>> of >> >>>>>>> our production nodes. >> >>>>>>> After running for a few weeks, it seems that there's no impact >> >>>>>>> from >> >>>>>>> removing this option. >> >>>>>>> Which is good, since it seems we can remove it from the other >> >>>>>>> nodes as >> >>>>>>> well, simplifying our overall JVM configuration ;-) >> >>>>>>> >> >>>>>>> However, we're still seeing promotion failures on all nodes, once >> >>>>>>> every day or so. >> >>>>>>> >> >>>>>>> There's still the "Magic 1026": this accounts for ~60% of the >> >>>>>>> promotion failures that we're seeing (single ParNew thread thread, >> >>>>>>> 1026 failure size): >> >>>>>>> -------------------- >> >>>>>>> 2012-02-06T09:13:51.806+0100: 328095.085: [GC 328095.086: [ParNew: >> >>>>>>> 359895K->29357K(368640K), 0.0429070 secs] >> >>>>>>> 3471021K->3143476K(5201920K), 0.0434950 secs] [Times: user=0.32 >> >>>>>>> sys=0.00, real=0.04 secs] >> >>>>>>> 2012-02-06T09:13:55.922+0100: 328099.201: [GC 328099.201: [ParNew: >> >>>>>>> 357037K->31817K(368640K), 0.0429130 secs] >> >>>>>>> 3471156K->3148946K(5201920K), 0.0434930 secs] [Times: user=0.31 >> >>>>>>> sys=0.00, real=0.04 secs] >> >>>>>>> 2012-02-06T09:13:59.044+0100: 328102.324: [GC 328102.324: [ParNew >> >>>>>>> (promotion failure size = 1026) ?(promotion failed): >> >>>>>>> 359497K->368640K(368640K), 0.2226790 secs]328102.547: [CMS: >> >>>>>>> 3125609K->451515K(4833280K), 5.6225880 secs] 3476626K->4515 >> >>>>>>> 15K(5201920K), [CMS Perm : 124373K->124353K(262144K)], 5.8459380 >> >>>>>>> secs] >> >>>>>>> [Times: user=6.20 sys=0.01, real=5.85 secs] >> >>>>>>> 2012-02-06T09:14:05.243+0100: 328108.522: [GC 328108.523: [ParNew: >> >>>>>>> 327680K->40960K(368640K), 0.0319160 secs] >> >>>>>>> 779195K->497658K(5201920K), >> >>>>>>> 0.0325360 secs] [Times: user=0.21 sys=0.01, real=0.03 secs] >> >>>>>>> 2012-02-06T09:14:07.836+0100: 328111.116: [GC 328111.116: [ParNew: >> >>>>>>> 368640K->32785K(368640K), 0.0744670 secs] >> >>>>>>> 825338K->520234K(5201920K), >> >>>>>>> 0.0750390 secs] [Times: user=0.40 sys=0.02, real=0.08 secs] >> >>>>>>> -------------------- >> >>>>>>> Given the 1026 word size, I'm wondering if I should be hunting for >> >>>>>>> an >> >>>>>>> overuse of BufferedInputStream/BufferedOutoutStream, since both >> >>>>>>> have >> >>>>>>> 8192 as a default buffer size. 
>> >>>>>>> >> >>>>>>> The second group of promotion failures look like this (multiple >> >>>>>>> ParNew >> >>>>>>> threads, small failure sizes): >> >>>>>>> -------------------- >> >>>>>>> 2012-02-06T09:50:15.773+0100: 328756.964: [GC 328756.964: [ParNew: >> >>>>>>> 356116K->29934K(368640K), 0.0461100 secs] >> >>>>>>> 3203863K->2880162K(5201920K), 0.0468870 secs] [Times: user=0.34 >> >>>>>>> sys=0.01, real=0.05 secs] >> >>>>>>> 2012-02-06T09:50:19.153+0100: 328760.344: [GC 328760.344: [ParNew: >> >>>>>>> 357614K->30359K(368640K), 0.0454680 secs] >> >>>>>>> 3207842K->2882892K(5201920K), 0.0462280 secs] [Times: user=0.33 >> >>>>>>> sys=0.01, real=0.05 secs] >> >>>>>>> 2012-02-06T09:50:22.658+0100: 328763.849: [GC 328763.849: [ParNew >> >>>>>>> (1: >> >>>>>>> promotion failure size = 25) ?(4: promotion failure size = 25) >> >>>>>>> ?(6: >> >>>>>>> promotion failure size = 25) ?(7: promotion failure size = 144) >> >>>>>>> (promotion failed): 358039K->358358 >> >>>>>>> K(368640K), 0.2148680 secs]328764.064: [CMS: >> >>>>>>> 2854709K->446750K(4833280K), 5.8368270 secs] >> >>>>>>> 3210572K->446750K(5201920K), [CMS Perm : >> >>>>>>> 124670K->124644K(262144K)], >> >>>>>>> 6.0525230 secs] [Times: user=6.32 sys=0.00, real=6.05 secs] >> >>>>>>> 2012-02-06T09:50:29.896+0100: 328771.086: [GC 328771.087: [ParNew: >> >>>>>>> 327680K->22569K(368640K), 0.0227080 secs] >> >>>>>>> 774430K->469319K(5201920K), >> >>>>>>> 0.0235020 secs] [Times: user=0.16 sys=0.00, real=0.02 secs] >> >>>>>>> 2012-02-06T09:50:31.076+0100: 328772.266: [GC 328772.267: [ParNew: >> >>>>>>> 350249K->22264K(368640K), 0.0235480 secs] >> >>>>>>> 796999K->469014K(5201920K), >> >>>>>>> 0.0243000 secs] [Times: user=0.18 sys=0.01, real=0.02 secs] >> >>>>>>> -------------------- >> >>>>>>> >> >>>>>>> We're going to try to double the new size on a single node, to see >> >>>>>>> the >> >>>>>>> effects of that. >> >>>>>>> >> >>>>>>> Beyond this experiment, is there any additional data I can collect >> >>>>>>> to >> >>>>>>> better understand the nature of the promotion failures? >> >>>>>>> Am I facing collecting free list statistics at this point? >> >>>>>>> >> >>>>>>> Thanks, >> >>>>>>> Taras >> >>>>>> >> >>>>>> >> >>>>>> _______________________________________________ >> >>>>>> hotspot-gc-use mailing list >> >>>>>> hotspot-gc-use at openjdk.java.net >> >>>>>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >>>>>> >> >>>>> >> >> _______________________________________________ >> >> hotspot-gc-use mailing list >> >> hotspot-gc-use at openjdk.java.net >> >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > From taras.tielkes at gmail.com Sun Apr 15 09:41:02 2012 From: taras.tielkes at gmail.com (Taras Tielkes) Date: Sun, 15 Apr 2012 18:41:02 +0200 Subject: Promotion failures: indication of CMS fragmentation? In-Reply-To: References: <4EF9FCAC.3030208@oracle.com> <4F06A270.3010701@oracle.com> <4F0DBEC4.7040907@oracle.com> <4F1ECE7B.3040502@oracle.com> <4F1F2ED7.6060308@oracle.com> <4F20F78D.9070905@oracle.com> Message-ID: Hi Chi, I've sent you a decent chunk of the gc.log file off-list (hopefully not too large). 
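To keep the arithmetic straight for the settings listed below: with -Xmn800m and SurvivorRatio=4, eden works out to Xmn * 4 / 6 (roughly 533 MB) and each survivor space to Xmn / 6 (roughly 133 MB). The 682688K young gen capacity in the ParNew lines is eden plus one survivor (about 667 MB), and the "Desired survivor size 69894144 bytes" is roughly half of one survivor space, which I assume is the default TargetSurvivorRatio of 50 at work.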
For completeness, we're running with the following options (ignoring the diagnostic ones): ----- -server -Xms5g -Xmx5g -Xmn800m -XX:PermSize=256m -XX:MaxPermSize=256m -XX:SurvivorRatio=4 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+DisableExplicitGC -XX:+UseCMSInitiatingOccupancyOnly -XX:+CMSClassUnloadingEnabled -XX:CMSInitiatingOccupancyFraction=68 ----- Platform is Java 6u29 running on Linux 2.6 x64. Hardware is 2xquad Xeons, but pretty old ones (pre-Nehalem, no QPI). The gc logs will (typically) show big peaks at the start and end of the working day - this is nature of the domain our application targets. I would expect the live set to be below 1G (usually below 600M even). However, we can experience temporary spikes of higher volume longer-living object allocation bursts. We'll set up a jstat log for this machine. I do have historical jstat logs for one of the other machines, but that one is still running with a smaller new gen, and smaller survivor spaces. If there's any other relevant data that I can collect, let me know. Kind regards, Taras On Sun, Apr 15, 2012 at 6:15 PM, Chi Ho Kwok wrote: > Hi Teras, > > Sure thing. Just the previous CMS should be enough, it doesn't matter if > there is 10 or 1000 parnew's between that and the failure. > > As for the jstat failure, it looks like it looks in > /tmp/hsperfdata_[username] for the pid by default, maybe something > like?-J-Djava.io.tmpdir=[path, like /app/client/program/tomcat/temp] can > help; and from what I've seen, running jstat as the same user as the process > or root is required. Historical data is nice to have, but even just staring > at it for 15 minutes should give you a hint for the old gen usage. > > If the collection starts at 68, takes a while and the heap fills to 80%+ > before it's done when it's not busy, it's probably wise to lower the initial > occupancy factor or increase the thread count so it completes faster. We run > with?-XX:ParallelCMSThreads=3 on a 8 hw thread server, the default (2) was > too slow for us as we run with 76%, it still takes 15s on average for CMS to > scan and clean the old gen (while old gen grows to up to 80% full), much > longer can mean a promotion failure during request spikes. > > > Chi Ho Kwok > > > On Sun, Apr 15, 2012 at 5:08 PM, Taras Tielkes > wrote: >> >> Hi Chi, >> >> Is it o.k. if I send this off-list to you directly? If so, how much >> more do you need? Just enough to cover the previous CMS? >> We're running with ?-XX:CMSInitiatingOccupancyFraction=68 and >> -XX:+UseCMSInitiatingOccupancyOnly, by the way. >> >> I do have shell access, however, on that particular machine we're >> experiencing the "process id not found" issue with jstat. >> I think this can be worked around by fiddling with temp directory >> options, but we haven't tried that yet. >> Regarding the jstat output, I assume this would be most valuable to >> have for the exact moment when the promotion failure happens, correct? >> If so, we can try to set up jstat to run in the background >> continuously, to have more diagnostic data in the future. >> >> Kind regards, >> Taras >> >> On Sun, Apr 15, 2012 at 2:48 PM, Chi Ho Kwok wrote: >> > Hi Teras, >> > >> > Can you send me a larger chunk of the log? I'm interested in seeing when >> > the >> > last CMS was run and what it freed. Maybe it's kicking in too late, the >> > full >> > GC triggered by promotion failure only found 600M live data, rest was >> > garbage. If that's the cause, lowering?XX:CMSInitiatingOccupancyFraction >> > can >> > help. 
>> > >> > Also, do you have shell access to that machine? If so, try running >> > jstat, >> > you can see the usage of all generations live as it happens. >> > >> > >> > Chi Ho Kwok >> > >> > On Sun, Apr 15, 2012 at 2:34 PM, Taras Tielkes >> > wrote: >> >> >> >> Hi Chi, Srinivas, >> >> >> >> Optimizing the cost of ParNew (by lowering MTT) would be nice, but for >> >> now my priority is still to minimize the promotion failures. >> >> >> >> For example, on the machine running CMS with the "larger" young gen >> >> and survivor spaces (-Xmx5g -Xmn800 -XX:SurvivorRatio=4), I've just >> >> seen a promotion failure again. Below is a snippet of gc.log showing >> >> this. >> >> To put this into perspective, this is a first promotion failure on >> >> that machine in a couple of weeks. Still, zero failures would beat a >> >> single failure, since the clients connecting to this application will >> >> only wait a few seconds before timing out and terminating the >> >> connection. In addition, the promotion failures are occurring in peak >> >> usage moments. >> >> >> >> Apart from trying to eliminate the promotion failure pauses, my main >> >> goal is to learn how to understand the root cause in a case like this. >> >> Any suggestions for things to try or read up on are appreciated. >> >> >> >> Kind regards, >> >> Taras >> >> ------------------------------------------------ >> >> 2012-04-13T17:44:27.777+0200: 3296255.045: [GC 3296255.046: [ParNew >> >> Desired survivor size 69894144 bytes, new threshold 15 (max 15) >> >> - age ? 1: ? ?3684448 bytes, ? ?3684448 total >> >> - age ? 2: ? ? 824984 bytes, ? ?4509432 total >> >> - age ? 3: ? ? 885120 bytes, ? ?5394552 total >> >> - age ? 4: ? ? 756568 bytes, ? ?6151120 total >> >> - age ? 5: ? ? 696880 bytes, ? ?6848000 total >> >> - age ? 6: ? ? 890688 bytes, ? ?7738688 total >> >> - age ? 7: ? ?2631184 bytes, ? 10369872 total >> >> - age ? 8: ? ? 719976 bytes, ? 11089848 total >> >> - age ? 9: ? ? 724944 bytes, ? 11814792 total >> >> - age ?10: ? ? 750360 bytes, ? 12565152 total >> >> - age ?11: ? ? 934944 bytes, ? 13500096 total >> >> - age ?12: ? ? 521080 bytes, ? 14021176 total >> >> - age ?13: ? ? 543392 bytes, ? 14564568 total >> >> - age ?14: ? ? 906616 bytes, ? 15471184 total >> >> - age ?15: ? ? 504008 bytes, ? 15975192 total >> >> : 568932K->22625K(682688K), 0.0410180 secs] >> >> 3077079K->2531413K(5106368K), 0.0416940 secs] [Times: user=0.30 >> >> sys=0.01, real=0.05 secs] >> >> 2012-04-13T17:44:33.893+0200: 3296261.162: [GC 3296261.162: [ParNew >> >> Desired survivor size 69894144 bytes, new threshold 15 (max 15) >> >> - age ? 1: ? ?2975896 bytes, ? ?2975896 total >> >> - age ? 2: ? ? 742592 bytes, ? ?3718488 total >> >> - age ? 3: ? ? 812864 bytes, ? ?4531352 total >> >> - age ? 4: ? ? 873488 bytes, ? ?5404840 total >> >> - age ? 5: ? ? 746128 bytes, ? ?6150968 total >> >> - age ? 6: ? ? 685192 bytes, ? ?6836160 total >> >> - age ? 7: ? ? 888376 bytes, ? ?7724536 total >> >> - age ? 8: ? ?2621688 bytes, ? 10346224 total >> >> - age ? 9: ? ? 715608 bytes, ? 11061832 total >> >> - age ?10: ? ? 723336 bytes, ? 11785168 total >> >> - age ?11: ? ? 749856 bytes, ? 12535024 total >> >> - age ?12: ? ? 914632 bytes, ? 13449656 total >> >> - age ?13: ? ? 520944 bytes, ? 13970600 total >> >> - age ?14: ? ? 543224 bytes, ? 14513824 total >> >> - age ?15: ? ? 906040 bytes, ? 
15419864 total >> >> : 568801K->22726K(682688K), 0.0447800 secs] >> >> 3077589K->2532008K(5106368K), 0.0454710 secs] [Times: user=0.33 >> >> sys=0.00, real=0.05 secs] >> >> 2012-04-13T17:44:40.231+0200: 3296267.499: [GC 3296267.500: [ParNew >> >> (1: promotion failure size = 16) ?(2: promotion failure size = 56) >> >> (4: promotion failure >> >> size = 342) ?(5: promotion failure size = 1026) ?(6: promotion failure >> >> size = 278) ?(promotion failed) >> >> Desired survivor size 69894144 bytes, new threshold 15 (max 15) >> >> - age ? 1: ? ?2436840 bytes, ? ?2436840 total >> >> - age ? 2: ? ?1625136 bytes, ? ?4061976 total >> >> - age ? 3: ? ? 691664 bytes, ? ?4753640 total >> >> - age ? 4: ? ? 799992 bytes, ? ?5553632 total >> >> - age ? 5: ? ? 858344 bytes, ? ?6411976 total >> >> - age ? 6: ? ? 730200 bytes, ? ?7142176 total >> >> - age ? 7: ? ? 680072 bytes, ? ?7822248 total >> >> - age ? 8: ? ? 885960 bytes, ? ?8708208 total >> >> - age ? 9: ? ?2618544 bytes, ? 11326752 total >> >> - age ?10: ? ? 709168 bytes, ? 12035920 total >> >> - age ?11: ? ? 714576 bytes, ? 12750496 total >> >> - age ?12: ? ? 734976 bytes, ? 13485472 total >> >> - age ?13: ? ? 905048 bytes, ? 14390520 total >> >> - age ?14: ? ? 520320 bytes, ? 14910840 total >> >> - age ?15: ? ? 543056 bytes, ? 15453896 total >> >> : 568902K->568678K(682688K), 0.3130580 secs]3296267.813: [CMS: >> >> 2510091K->573489K(4423680K), 7.7481330 secs] >> >> 3078184K->573489K(5106368K), [CMS Perm : 144002K-> >> >> 143970K(262144K)], 8.0619690 secs] [Times: user=8.35 sys=0.01, >> >> real=8.06 >> >> secs] >> >> 2012-04-13T17:44:51.337+0200: 3296278.606: [GC 3296278.606: [ParNew >> >> Desired survivor size 69894144 bytes, new threshold 15 (max 15) >> >> - age ? 1: ? 33717528 bytes, ? 33717528 total >> >> : 546176K->43054K(682688K), 0.0515990 secs] >> >> 1119665K->616543K(5106368K), 0.0523550 secs] [Times: user=0.34 >> >> sys=0.00, real=0.05 secs] >> >> ------------------------------------------------ >> >> >> >> On Tue, Mar 20, 2012 at 10:12 PM, Srinivas Ramakrishna >> >> wrote: >> >> > As Chi-ho noted, about 3-4 MB of data does get promoted per scavenge, >> >> > after having >> >> > sloshed around in your survivor spaces some 15 times. I'd venture >> >> > that >> >> > whatever winnowing >> >> > of young objects was to ocur has in fact occured already within the >> >> > first 3-4 scavenges that >> >> > an object has survived, after which the drop-off in population is >> >> > less >> >> > sharp. So I'd suggest >> >> > lowering the MTT to about 3, while leaving the survivor ratio intact. >> >> > That should reduce your >> >> > copying costs and bring down your scavenge pauses further, while not >> >> > adversely affecting >> >> > your promotion rates (and concomitantly the fragmentation). >> >> > >> >> > One thing that was a bit puzzling about the stats below was that >> >> > you'd >> >> > expect the volume >> >> > of generation X in scavenge N to be no less than the volume of >> >> > generation X+1 in scavenge N+1, >> >> > but occasionally that natural invariant does not appear to hold, >> >> > which >> >> > is quite puzzling -- >> >> > indicating perhaps that either ages or populations are not being >> >> > correctly tracked. >> >> > >> >> > I don't know if anyone else has noticed that in their tenuring >> >> > distributions as well.... 
>> >> > >> >> > -- ramki >> >> > >> >> > On Tue, Mar 20, 2012 at 9:36 AM, Taras Tielkes >> >> > >> >> > wrote: >> >> >> Hi, >> >> >> >> >> >> I've collected -XX:+PrintTenuringDistribution data from a node in >> >> >> our >> >> >> production environment, running -Xmx5g -Xmn400m -XX:SurvivorRatio=8. >> >> >> On one other production node, we've configured a larger new gen, and >> >> >> larger survivor spaces (-Xmx5g -Xmn800m -XX:SurvivorRatio=4). >> >> >> This node has -XX:+PrintTenuringDistribution logging as well. >> >> >> >> >> >> The node running the larger new gen and survivor spaces has not run >> >> >> into a promotion failure yet, while the ones still running the old >> >> >> config have hit a few. >> >> >> The promotion failures are typically experienced at high load >> >> >> periods, >> >> >> which makes sense, as allocation and promotion will experience a >> >> >> spike >> >> >> in those periods as well. >> >> >> >> >> >> The inherent nature of the application implies relatively long >> >> >> sessions (towards a few hours), retaining a fair amout of state up >> >> >> to >> >> >> an hour. >> >> >> I believe this is the main reason of the relatively high promotion >> >> >> rate we're experiencing. >> >> >> >> >> >> >> >> >> Here's a fragment of gc log from one of the nodes running the older >> >> >> (smaller) new gen, including a promotion failure: >> >> >> ------------------------- >> >> >> 2012-03-15T18:32:17.785+0100: 796604.225: [GC 796604.225: [ParNew >> >> >> Desired survivor size 20971520 bytes, new threshold 8 (max 15) >> >> >> - age ? 1: ? ?2927728 bytes, ? ?2927728 total >> >> >> - age ? 2: ? ?2428512 bytes, ? ?5356240 total >> >> >> - age ? 3: ? ?2696376 bytes, ? ?8052616 total >> >> >> - age ? 4: ? ?2623576 bytes, ? 10676192 total >> >> >> - age ? 5: ? ?3365576 bytes, ? 14041768 total >> >> >> - age ? 6: ? ?2792272 bytes, ? 16834040 total >> >> >> - age ? 7: ? ?2233008 bytes, ? 19067048 total >> >> >> - age ? 8: ? ?2263824 bytes, ? 21330872 total >> >> >> : 358709K->29362K(368640K), 0.0461460 secs] >> >> >> 3479492K->3151874K(5201920K), 0.0467320 secs] [Times: user=0.34 >> >> >> sys=0.01, real=0.05 secs] >> >> >> 2012-03-15T18:32:21.546+0100: 796607.986: [GC 796607.986: [ParNew >> >> >> (0: >> >> >> promotion failure size = 25) ?(1: promotion failure size = 25) ?(2: >> >> >> promotion failure size = 25) ?(3: promotion failure size = 25) ?(4: >> >> >> promotion failure size = 25) ?(5 >> >> >> : promotion failure size = 25) ?(6: promotion failure size = 341) >> >> >> ?(7: >> >> >> promotion failure size = 25) ?(promotion failed) >> >> >> Desired survivor size 20971520 bytes, new threshold 8 (max 15) >> >> >> - age ? 1: ? ?3708208 bytes, ? ?3708208 total >> >> >> - age ? 2: ? ?2174384 bytes, ? ?5882592 total >> >> >> - age ? 3: ? ?2383256 bytes, ? ?8265848 total >> >> >> - age ? 4: ? ?2689912 bytes, ? 10955760 total >> >> >> - age ? 5: ? ?2621832 bytes, ? 13577592 total >> >> >> - age ? 6: ? ?3360440 bytes, ? 16938032 total >> >> >> - age ? 7: ? ?2784136 bytes, ? 19722168 total >> >> >> - age ? 8: ? ?2220232 bytes, ? 21942400 total >> >> >> : 357042K->356456K(368640K), 0.2734100 secs]796608.259: [CMS: >> >> >> 3124189K->516640K(4833280K), 6.8127070 secs] >> >> >> 3479554K->516640K(5201920K), [CMS Perm : 142423K->142284K(262144K)], >> >> >> 7.0867850 secs] [Times: user=7.32 sys=0.07, real=7.09 secs] >> >> >> 2012-03-15T18:32:30.279+0100: 796616.719: [GC 796616.720: [ParNew >> >> >> Desired survivor size 20971520 bytes, new threshold 1 (max 15) >> >> >> - age ? 1: ? 
29721456 bytes, ? 29721456 total >> >> >> : 327680K->40960K(368640K), 0.0403130 secs] >> >> >> 844320K->557862K(5201920K), 0.0409070 secs] [Times: user=0.27 >> >> >> sys=0.01, real=0.04 secs] >> >> >> 2012-03-15T18:32:32.701+0100: 796619.141: [GC 796619.141: [ParNew >> >> >> Desired survivor size 20971520 bytes, new threshold 15 (max 15) >> >> >> - age ? 1: ? 10310176 bytes, ? 10310176 total >> >> >> ------------------------- >> >> >> >> >> >> For contrast, here's a gc log fragment from the single node running >> >> >> the larger new gen and larger survivor spaces: >> >> >> (the fragment is from the same point in time, with the nodes >> >> >> experiencing equal load) >> >> >> ------------------------- >> >> >> 2012-03-15T18:32:12.067+0100: 797119.336: [GC 797119.336: [ParNew >> >> >> Desired survivor size 69894144 bytes, new threshold 15 (max 15) >> >> >> - age ? 1: ? ?5611536 bytes, ? ?5611536 total >> >> >> - age ? 2: ? ?3731888 bytes, ? ?9343424 total >> >> >> - age ? 3: ? ?3450672 bytes, ? 12794096 total >> >> >> - age ? 4: ? ?3314744 bytes, ? 16108840 total >> >> >> - age ? 5: ? ?3459888 bytes, ? 19568728 total >> >> >> - age ? 6: ? ?3334712 bytes, ? 22903440 total >> >> >> - age ? 7: ? ?3671960 bytes, ? 26575400 total >> >> >> - age ? 8: ? ?3841608 bytes, ? 30417008 total >> >> >> - age ? 9: ? ?2035392 bytes, ? 32452400 total >> >> >> - age ?10: ? ?1975056 bytes, ? 34427456 total >> >> >> - age ?11: ? ?2021344 bytes, ? 36448800 total >> >> >> - age ?12: ? ?1520752 bytes, ? 37969552 total >> >> >> - age ?13: ? ?1494176 bytes, ? 39463728 total >> >> >> - age ?14: ? ?2355136 bytes, ? 41818864 total >> >> >> - age ?15: ? ?1279000 bytes, ? 43097864 total >> >> >> : 603473K->61640K(682688K), 0.0756570 secs] >> >> >> 3373284K->2832383K(5106368K), 0.0762090 secs] [Times: user=0.56 >> >> >> sys=0.00, real=0.08 secs] >> >> >> 2012-03-15T18:32:18.200+0100: 797125.468: [GC 797125.469: [ParNew >> >> >> Desired survivor size 69894144 bytes, new threshold 15 (max 15) >> >> >> - age ? 1: ? ?6101320 bytes, ? ?6101320 total >> >> >> - age ? 2: ? ?4446776 bytes, ? 10548096 total >> >> >> - age ? 3: ? ?3701384 bytes, ? 14249480 total >> >> >> - age ? 4: ? ?3438488 bytes, ? 17687968 total >> >> >> - age ? 5: ? ?3295360 bytes, ? 20983328 total >> >> >> - age ? 6: ? ?3403320 bytes, ? 24386648 total >> >> >> - age ? 7: ? ?3323368 bytes, ? 27710016 total >> >> >> - age ? 8: ? ?3665760 bytes, ? 31375776 total >> >> >> - age ? 9: ? ?2427904 bytes, ? 33803680 total >> >> >> - age ?10: ? ?1418656 bytes, ? 35222336 total >> >> >> - age ?11: ? ?1955192 bytes, ? 37177528 total >> >> >> - age ?12: ? ?2006064 bytes, ? 39183592 total >> >> >> - age ?13: ? ?1520768 bytes, ? 40704360 total >> >> >> - age ?14: ? ?1493728 bytes, ? 42198088 total >> >> >> - age ?15: ? ?2354376 bytes, ? 44552464 total >> >> >> : 607816K->62650K(682688K), 0.0779270 secs] >> >> >> 3378559K->2834643K(5106368K), 0.0784690 secs] [Times: user=0.58 >> >> >> sys=0.00, real=0.08 secs] >> >> >> ------------------------- >> >> >> >> >> >> Questions: >> >> >> >> >> >> 1) From the tenuring distributions, it seems that the application >> >> >> benefits from larger new gen and survivor spaces. >> >> >> The next thing we'll try is to run with -Xmn1g -XX:SurvivorRatio=2, >> >> >> and see if the ParNew times are still acceptable. >> >> >> Does this seem a sensible approach in this context? >> >> >> Are there other variables beyond ParNew times that limit scaling the >> >> >> new gen to a large size? 
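For reference, a rough sketch of how those settings translate into space sizes (HotSpot carves the young gen into an eden and two survivor spaces with eden:survivor = SurvivorRatio:1; the figures below ignore alignment and rounding):
-----
-Xmn800m -XX:SurvivorRatio=4  ->  eden ~533m, two survivors of ~133m each
-Xmn1g   -XX:SurvivorRatio=2  ->  eden  512m, two survivors of  256m each
-----
The first line is consistent with the logs above: eden plus one survivor comes to ~667m, matching the 682688K young gen capacity, and the "Desired survivor size 69894144 bytes" figure is 50% (the default TargetSurvivorRatio) of a ~133m survivor space.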
>> >> >> >> >> >> 2) Given the object age demographics inherent to our application, we >> >> >> can not expect to see the majority of data get collected in the new >> >> >> gen. >> >> >> >> >> >> Our approach to fight the promotion failures consists of three >> >> >> aspects: >> >> >> a) Lower the overall allocation rate of our application (by >> >> >> improving >> >> >> wasteful hotspots), to decrease overall ParNew collection frequency. >> >> >> b) Configure the new gen and survivor spaces as large as possible, >> >> >> keeping an eye on ParNew times and overall new/tenured ratio. >> >> >> c) Try to refactor the data structures that form the bulk of >> >> >> promoted >> >> >> data, to retain only the strictly required subgraphs. >> >> >> >> >> >> Is there anything else I can try or measure, in order to better >> >> >> understand the problem? >> >> >> >> >> >> Thanks in advance, >> >> >> Taras >> >> >> >> >> >> >> >> >> On Wed, Feb 22, 2012 at 10:51 AM, Taras Tielkes >> >> >> wrote: >> >> >>> (this time properly responding to the list alias) >> >> >>> Hi Srinivas, >> >> >>> >> >> >>> We're running 1.6.0 u29 on Linux x64. My understanding is that >> >> >>> CompressedOops is enabled by default since u23. >> >> >>> >> >> >>> At least this page seems to support that: >> >> >>> >> >> >>> >> >> >>> http://docs.oracle.com/javase/7/docs/technotes/guides/vm/performance-enhancements-7.html >> >> >>> >> >> >>> Regarding the other remarks (also from Todd and Chi), I'll comment >> >> >>> later. The first thing on my list is to collect >> >> >>> PrintTenuringDistribution data now. >> >> >>> >> >> >>> Kind regards, >> >> >>> Taras >> >> >>> >> >> >>> On Wed, Feb 22, 2012 at 10:50 AM, Taras Tielkes >> >> >>> wrote: >> >> >>>> Hi Srinivas, >> >> >>>> >> >> >>>> We're running 1.6.0 u29 on Linux x64. My understanding is that >> >> >>>> CompressedOops is enabled by default since u23. >> >> >>>> >> >> >>>> At least this page seems to support that: >> >> >>>> >> >> >>>> >> >> >>>> http://docs.oracle.com/javase/7/docs/technotes/guides/vm/performance-enhancements-7.html >> >> >>>> >> >> >>>> Regarding the other remarks (also from Todd and Chi), I'll comment >> >> >>>> later. The first thing on my list is to collect >> >> >>>> PrintTenuringDistribution data now. >> >> >>>> >> >> >>>> Kind regards, >> >> >>>> Taras >> >> >>>> >> >> >>>> On Wed, Feb 22, 2012 at 12:40 AM, Srinivas Ramakrishna >> >> >>>> wrote: >> >> >>>>> I agree that premature promotions are almost always the first and >> >> >>>>> most >> >> >>>>> important thing to fix when running >> >> >>>>> into fragmentation or overload issues with CMS. However, I can >> >> >>>>> also >> >> >>>>> imagine >> >> >>>>> long-lived objects with a highly >> >> >>>>> non-stationary size distribution which can also cause problems >> >> >>>>> for >> >> >>>>> CMS >> >> >>>>> despite best efforts to tune against >> >> >>>>> premature promotion. >> >> >>>>> >> >> >>>>> I didn't think Treas was running with MTT=0, although MTT > 0 is >> >> >>>>> no >> >> >>>>> recipe >> >> >>>>> for avoiding premature promotion >> >> >>>>> with bursty loads that case overflow the survivor spaces -- as >> >> >>>>> you >> >> >>>>> say large >> >> >>>>> survivor spaces with a low >> >> >>>>> TargetSurvivorRatio -- so as to leave plenty of space to >> >> >>>>> absorb/accommodate >> >> >>>>> spiking/bursty loads? is >> >> >>>>> definitely a "best practice" for CMS (and possibly for other >> >> >>>>> concurrent >> >> >>>>> collectors as well). 
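As a concrete, purely illustrative reading of that advice for the configuration discussed in this thread - the 40 below is an assumed example value, the HotSpot default being 50:
-----
-Xmn800m -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=40 -XX:MaxTenuringThreshold=15
-----
A lower TargetSurvivorRatio makes the adaptive tenuring threshold back off earlier, so a burst of surviving objects lands in the slack left in the survivor spaces instead of overflowing straight into the old gen.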
>> >> >>>>> >> >> >>>>> One thing Taras can do to see if premature promotion might be an >> >> >>>>> issue is to >> >> >>>>> look at the tenuring >> >> >>>>> threshold in his case. A rough proxy (if >> >> >>>>> PrintTenuringDistribution >> >> >>>>> is not >> >> >>>>> enabled) is to look at the >> >> >>>>> promotion volume per scavenge. It may be possible, if premature >> >> >>>>> promotion is >> >> >>>>> a cause, to see >> >> >>>>> some kind of medium-term correlation between high promotion >> >> >>>>> volume >> >> >>>>> and >> >> >>>>> eventual promotion >> >> >>>>> failure despite frequent CMS collections. >> >> >>>>> >> >> >>>>> One other point which may or may not be relevant. I see that >> >> >>>>> Taras >> >> >>>>> is not >> >> >>>>> using CompressedOops... >> >> >>>>> Using that alone would greatly decrease memory pressure and >> >> >>>>> provide >> >> >>>>> more >> >> >>>>> breathing room to CMS, >> >> >>>>> which is also almost always a good idea. >> >> >>>>> >> >> >>>>> -- ramki >> >> >>>>> >> >> >>>>> On Tue, Feb 21, 2012 at 10:16 AM, Chi Ho Kwok >> >> >>>>> >> >> >>>>> wrote: >> >> >>>>>> >> >> >>>>>> Hi Teras, >> >> >>>>>> >> >> >>>>>> I think you may want to look into sizing the new and especially >> >> >>>>>> the >> >> >>>>>> survivor spaces differently. We run something similar to what >> >> >>>>>> you >> >> >>>>>> described, >> >> >>>>>> high volume request processing with large dataset loading, and >> >> >>>>>> what >> >> >>>>>> we've >> >> >>>>>> seen at the start is that the survivor spaces are completely >> >> >>>>>> overloaded, >> >> >>>>>> causing premature promotions. >> >> >>>>>> >> >> >>>>>> We've configured our vm with the following goals/guideline: >> >> >>>>>> >> >> >>>>>> old space is for semi-permanent data, living for at least 30s, >> >> >>>>>> average ~10 >> >> >>>>>> minutes >> >> >>>>>> new space contains only temporary and just loaded data >> >> >>>>>> surviving objects from new should never reach old in 1 gc, so >> >> >>>>>> the >> >> >>>>>> survivor >> >> >>>>>> space may never be 100% full >> >> >>>>>> >> >> >>>>>> With jstat -gcutil `pidof java` 2000, we see things like: >> >> >>>>>> >> >> >>>>>> ? S0 ? ? S1 ? ? E ? ? ?O ? ? ?P ? ? YGC ? ? YGCT ? ?FGC ? ?FGCT >> >> >>>>>> GCT >> >> >>>>>> ?70.20 ? 0.00 ?19.65 ?57.60 ?59.90 124808 29474.299 ?2498 >> >> >>>>>> ?191.110 >> >> >>>>>> 29665.409 >> >> >>>>>> ?70.20 ? 0.00 ?92.89 ?57.60 ?59.90 124808 29474.299 ?2498 >> >> >>>>>> ?191.110 >> >> >>>>>> 29665.409 >> >> >>>>>> ?70.20 ? 0.00 ?93.47 ?57.60 ?59.90 124808 29474.299 ?2498 >> >> >>>>>> ?191.110 >> >> >>>>>> 29665.409 >> >> >>>>>> ? 0.00 ?65.69 ?78.07 ?58.09 ?59.90 124809 29474.526 ?2498 >> >> >>>>>> ?191.110 >> >> >>>>>> 29665.636 >> >> >>>>>> ?84.97 ? 0.00 ?48.19 ?58.57 ?59.90 124810 29474.774 ?2498 >> >> >>>>>> ?191.110 >> >> >>>>>> 29665.884 >> >> >>>>>> ?84.97 ? 0.00 ?81.30 ?58.57 ?59.90 124810 29474.774 ?2498 >> >> >>>>>> ?191.110 >> >> >>>>>> 29665.884 >> >> >>>>>> ? 0.00 ?62.64 ?27.22 ?59.12 ?59.90 124811 29474.992 ?2498 >> >> >>>>>> ?191.110 >> >> >>>>>> 29666.102 >> >> >>>>>> ? 0.00 ?62.64 ?54.47 ?59.12 ?59.90 124811 29474.992 ?2498 >> >> >>>>>> ?191.110 >> >> >>>>>> 29666.102 >> >> >>>>>> ?75.68 ? 0.00 ? 6.80 ?59.53 ?59.90 124812 29475.228 ?2498 >> >> >>>>>> ?191.110 >> >> >>>>>> 29666.338 >> >> >>>>>> ?75.68 ? 0.00 ?23.38 ?59.53 ?59.90 124812 29475.228 ?2498 >> >> >>>>>> ?191.110 >> >> >>>>>> 29666.338 >> >> >>>>>> ?75.68 ? 
0.00 ?27.72 ?59.53 ?59.90 124812 29475.228 ?2498 >> >> >>>>>> ?191.110 >> >> >>>>>> 29666.338 >> >> >>>>>> >> >> >>>>>> If you follow the lines, you can see Eden fill up to 100% on >> >> >>>>>> line >> >> >>>>>> 4, >> >> >>>>>> surviving objects are copied into S1, S0 is collected and added >> >> >>>>>> 0.49% to >> >> >>>>>> Old. On line 5, another GC happened, with Eden->S0, S1->Old, >> >> >>>>>> etc. >> >> >>>>>> No objects >> >> >>>>>> is ever transferred from Eden to Old, unless there's a huge peak >> >> >>>>>> of >> >> >>>>>> requests. >> >> >>>>>> >> >> >>>>>> This is with a: 32GB heap, Mxn1200M, SurvivorRatio 2 (600MB >> >> >>>>>> Eden, >> >> >>>>>> 300MB >> >> >>>>>> S0, 300MB S1), MaxTenuringThreshold 1 (whatever is still alive >> >> >>>>>> in >> >> >>>>>> S0/1 on >> >> >>>>>> the second GC is copied to old, don't wait, web requests are >> >> >>>>>> quite >> >> >>>>>> bursty). >> >> >>>>>> With about 1 collection every 2-5 seconds, objects promoted to >> >> >>>>>> Old >> >> >>>>>> must live >> >> >>>>>> for at 4-10 seconds; as that's longer than an average request >> >> >>>>>> (50ms-1s), >> >> >>>>>> none of the temporary data ever makes it into Old, which is much >> >> >>>>>> more >> >> >>>>>> expensive to collect. It works even with a higher than default >> >> >>>>>> CMSInitiatingOccupancyFraction=76 to optimize for space >> >> >>>>>> available >> >> >>>>>> for the >> >> >>>>>> large data cache we have. >> >> >>>>>> >> >> >>>>>> >> >> >>>>>> With your config of 400MB Total new, with 350MB Eden, 25MB S0, >> >> >>>>>> 25MB >> >> >>>>>> S1 >> >> >>>>>> (SurvivorRatio 8), no tenuring threshold, I think loads of new >> >> >>>>>> objects get >> >> >>>>>> copied from Eden to Old directly, causing trouble for the CMS. >> >> >>>>>> You >> >> >>>>>> can use >> >> >>>>>> jstat to get live stats and tweak until it doesn't happen. If >> >> >>>>>> you >> >> >>>>>> can't make >> >> >>>>>> changes on live that easil, try doubling the new size indeed, >> >> >>>>>> with >> >> >>>>>> a 400 >> >> >>>>>> Eden, 200 S0, 200 S1 and?MaxTenuringThreshold?1 setting. It's >> >> >>>>>> probably >> >> >>>>>> overkill, but if should solve the problem if it is caused by >> >> >>>>>> premature >> >> >>>>>> promotion. >> >> >>>>>> >> >> >>>>>> >> >> >>>>>> Chi Ho Kwok >> >> >>>>>> >> >> >>>>>> >> >> >>>>>> On Tue, Feb 21, 2012 at 5:55 PM, Taras Tielkes >> >> >>>>>> >> >> >>>>>> wrote: >> >> >>>>>>> >> >> >>>>>>> Hi, >> >> >>>>>>> >> >> >>>>>>> We've removed the "-XX:+CMSScavengeBeforeRemark" setting from >> >> >>>>>>> 50% >> >> >>>>>>> of >> >> >>>>>>> our production nodes. >> >> >>>>>>> After running for a few weeks, it seems that there's no impact >> >> >>>>>>> from >> >> >>>>>>> removing this option. >> >> >>>>>>> Which is good, since it seems we can remove it from the other >> >> >>>>>>> nodes as >> >> >>>>>>> well, simplifying our overall JVM configuration ;-) >> >> >>>>>>> >> >> >>>>>>> However, we're still seeing promotion failures on all nodes, >> >> >>>>>>> once >> >> >>>>>>> every day or so. 
>> >> >>>>>>> >> >> >>>>>>> There's still the "Magic 1026": this accounts for ~60% of the >> >> >>>>>>> promotion failures that we're seeing (single ParNew thread >> >> >>>>>>> thread, >> >> >>>>>>> 1026 failure size): >> >> >>>>>>> -------------------- >> >> >>>>>>> 2012-02-06T09:13:51.806+0100: 328095.085: [GC 328095.086: >> >> >>>>>>> [ParNew: >> >> >>>>>>> 359895K->29357K(368640K), 0.0429070 secs] >> >> >>>>>>> 3471021K->3143476K(5201920K), 0.0434950 secs] [Times: user=0.32 >> >> >>>>>>> sys=0.00, real=0.04 secs] >> >> >>>>>>> 2012-02-06T09:13:55.922+0100: 328099.201: [GC 328099.201: >> >> >>>>>>> [ParNew: >> >> >>>>>>> 357037K->31817K(368640K), 0.0429130 secs] >> >> >>>>>>> 3471156K->3148946K(5201920K), 0.0434930 secs] [Times: user=0.31 >> >> >>>>>>> sys=0.00, real=0.04 secs] >> >> >>>>>>> 2012-02-06T09:13:59.044+0100: 328102.324: [GC 328102.324: >> >> >>>>>>> [ParNew >> >> >>>>>>> (promotion failure size = 1026) ?(promotion failed): >> >> >>>>>>> 359497K->368640K(368640K), 0.2226790 secs]328102.547: [CMS: >> >> >>>>>>> 3125609K->451515K(4833280K), 5.6225880 secs] 3476626K->4515 >> >> >>>>>>> 15K(5201920K), [CMS Perm : 124373K->124353K(262144K)], >> >> >>>>>>> 5.8459380 >> >> >>>>>>> secs] >> >> >>>>>>> [Times: user=6.20 sys=0.01, real=5.85 secs] >> >> >>>>>>> 2012-02-06T09:14:05.243+0100: 328108.522: [GC 328108.523: >> >> >>>>>>> [ParNew: >> >> >>>>>>> 327680K->40960K(368640K), 0.0319160 secs] >> >> >>>>>>> 779195K->497658K(5201920K), >> >> >>>>>>> 0.0325360 secs] [Times: user=0.21 sys=0.01, real=0.03 secs] >> >> >>>>>>> 2012-02-06T09:14:07.836+0100: 328111.116: [GC 328111.116: >> >> >>>>>>> [ParNew: >> >> >>>>>>> 368640K->32785K(368640K), 0.0744670 secs] >> >> >>>>>>> 825338K->520234K(5201920K), >> >> >>>>>>> 0.0750390 secs] [Times: user=0.40 sys=0.02, real=0.08 secs] >> >> >>>>>>> -------------------- >> >> >>>>>>> Given the 1026 word size, I'm wondering if I should be hunting >> >> >>>>>>> for >> >> >>>>>>> an >> >> >>>>>>> overuse of BufferedInputStream/BufferedOutoutStream, since both >> >> >>>>>>> have >> >> >>>>>>> 8192 as a default buffer size. 
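A quick sanity check on that hunch: the failure sizes in these messages are heap words (8 bytes each on this 64-bit JVM), and, assuming the 16-byte array header of a 64-bit HotSpot running with compressed oops, a default-sized stream buffer works out to exactly that size:
-----
byte[8192]:  16 bytes header + 8192 bytes data = 8208 bytes = 1026 words
-----
So the arithmetic is at least consistent with the 8K-buffer theory; a heap dump or allocation profiling would be needed to confirm it.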
>> >> >>>>>>> >> >> >>>>>>> The second group of promotion failures look like this (multiple >> >> >>>>>>> ParNew >> >> >>>>>>> threads, small failure sizes): >> >> >>>>>>> -------------------- >> >> >>>>>>> 2012-02-06T09:50:15.773+0100: 328756.964: [GC 328756.964: >> >> >>>>>>> [ParNew: >> >> >>>>>>> 356116K->29934K(368640K), 0.0461100 secs] >> >> >>>>>>> 3203863K->2880162K(5201920K), 0.0468870 secs] [Times: user=0.34 >> >> >>>>>>> sys=0.01, real=0.05 secs] >> >> >>>>>>> 2012-02-06T09:50:19.153+0100: 328760.344: [GC 328760.344: >> >> >>>>>>> [ParNew: >> >> >>>>>>> 357614K->30359K(368640K), 0.0454680 secs] >> >> >>>>>>> 3207842K->2882892K(5201920K), 0.0462280 secs] [Times: user=0.33 >> >> >>>>>>> sys=0.01, real=0.05 secs] >> >> >>>>>>> 2012-02-06T09:50:22.658+0100: 328763.849: [GC 328763.849: >> >> >>>>>>> [ParNew >> >> >>>>>>> (1: >> >> >>>>>>> promotion failure size = 25) ?(4: promotion failure size = 25) >> >> >>>>>>> ?(6: >> >> >>>>>>> promotion failure size = 25) ?(7: promotion failure size = 144) >> >> >>>>>>> (promotion failed): 358039K->358358 >> >> >>>>>>> K(368640K), 0.2148680 secs]328764.064: [CMS: >> >> >>>>>>> 2854709K->446750K(4833280K), 5.8368270 secs] >> >> >>>>>>> 3210572K->446750K(5201920K), [CMS Perm : >> >> >>>>>>> 124670K->124644K(262144K)], >> >> >>>>>>> 6.0525230 secs] [Times: user=6.32 sys=0.00, real=6.05 secs] >> >> >>>>>>> 2012-02-06T09:50:29.896+0100: 328771.086: [GC 328771.087: >> >> >>>>>>> [ParNew: >> >> >>>>>>> 327680K->22569K(368640K), 0.0227080 secs] >> >> >>>>>>> 774430K->469319K(5201920K), >> >> >>>>>>> 0.0235020 secs] [Times: user=0.16 sys=0.00, real=0.02 secs] >> >> >>>>>>> 2012-02-06T09:50:31.076+0100: 328772.266: [GC 328772.267: >> >> >>>>>>> [ParNew: >> >> >>>>>>> 350249K->22264K(368640K), 0.0235480 secs] >> >> >>>>>>> 796999K->469014K(5201920K), >> >> >>>>>>> 0.0243000 secs] [Times: user=0.18 sys=0.01, real=0.02 secs] >> >> >>>>>>> -------------------- >> >> >>>>>>> >> >> >>>>>>> We're going to try to double the new size on a single node, to >> >> >>>>>>> see >> >> >>>>>>> the >> >> >>>>>>> effects of that. >> >> >>>>>>> >> >> >>>>>>> Beyond this experiment, is there any additional data I can >> >> >>>>>>> collect >> >> >>>>>>> to >> >> >>>>>>> better understand the nature of the promotion failures? >> >> >>>>>>> Am I facing collecting free list statistics at this point? >> >> >>>>>>> >> >> >>>>>>> Thanks, >> >> >>>>>>> Taras >> >> >>>>>> >> >> >>>>>> >> >> >>>>>> _______________________________________________ >> >> >>>>>> hotspot-gc-use mailing list >> >> >>>>>> hotspot-gc-use at openjdk.java.net >> >> >>>>>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> >>>>>> >> >> >>>>> >> >> >> _______________________________________________ >> >> >> hotspot-gc-use mailing list >> >> >> hotspot-gc-use at openjdk.java.net >> >> >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> _______________________________________________ >> >> hotspot-gc-use mailing list >> >> hotspot-gc-use at openjdk.java.net >> >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> > >> > >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > From chkwok at digibites.nl Sun Apr 15 10:11:34 2012 From: chkwok at digibites.nl (Chi Ho Kwok) Date: Sun, 15 Apr 2012 19:11:34 +0200 Subject: Promotion failures: indication of CMS fragmentation? 
In-Reply-To: References: <4EF9FCAC.3030208@oracle.com> <4F06A270.3010701@oracle.com> <4F0DBEC4.7040907@oracle.com> <4F1ECE7B.3040502@oracle.com> <4F1F2ED7.6060308@oracle.com> <4F20F78D.9070905@oracle.com> Message-ID: Hi Teras, Hmm, it looks like it failed even tho there's tons of space available, 2.4G used, 1.8G free out of 4.2G CMS old gen. Or am I reading the next line wrong? (snipped age histogram) [GC 3296267.500: [ParNew (1: promotion failure size = 16) (2: promotion failure size = 56) (4: promotion failure size = 342) (5: promotion failure size = 1026) (6: promotion failure size = 278) (promotion failed): 568902K->568678K(682688K), 0.3130580 secs]3296267.813: *[CMS: 2510091K->573489K(4423680K), 7.7481330 secs]* 3078184K->573489K(5106368K), [CMS Perm : 144002K->143970K(262144K)], 8.0619690 secs] [Times: user=8.35 sys=0.01, real=8.06 secs] Normally I'd say fragmentation, but how can it not find a spot for a 16 bytes chunk? I'm completely out of ideas now - anyone else? Here's a brute force "solution": what the app "needs" is 600M of live data in the old gen, that's what left usually after collection. Increase "safety margin" by adding memory to the old gen pool if possible by increasing total heap size, and set initial occupancy ratio to a silly low number like 45%. Hopefully, it will survive until the next software/jvm/kernel patch that requires a restart of the service or machine. I've seen something similar in our logs as well, with 19%/2.9GB free, my guess is that CMS needs a few GB to play with... Nowadays we run with a larger safety margin, doubled the heap on that machine to 32GB, I haven't seen any CMS and promotion failures since then (Jan 2010). 128265.354: [GC 128265.355: [ParNew (promotion failed): 589631K->589631K(589824K), 0.3582340 secs]128265.713: *[CMS: 12965822K->10393148K(15990784K), 20.9654520 secs]*13462337K->10393148K(16580608K), [CMS Perm : 20604K->16846K(34456K)], 21.3239890 secs] [Times: user=22.06 sys=0.09, real=21.32 secs] Regards, Chi Ho On Sun, Apr 15, 2012 at 6:41 PM, Taras Tielkes wrote: > Hi Chi, > > I've sent you a decent chunk of the gc.log file off-list (hopefully > not too large). > > For completeness, we're running with the following options (ignoring > the diagnostic ones): > ----- > -server > -Xms5g > -Xmx5g > -Xmn800m > -XX:PermSize=256m > -XX:MaxPermSize=256m > -XX:SurvivorRatio=4 > -XX:+UseConcMarkSweepGC > -XX:+UseParNewGC > -XX:+DisableExplicitGC > -XX:+UseCMSInitiatingOccupancyOnly > -XX:+CMSClassUnloadingEnabled > -XX:CMSInitiatingOccupancyFraction=68 > ----- > Platform is Java 6u29 running on Linux 2.6 x64. > Hardware is 2xquad Xeons, but pretty old ones (pre-Nehalem, no QPI). > > The gc logs will (typically) show big peaks at the start and end of > the working day - this is nature of the domain our application > targets. > > I would expect the live set to be below 1G (usually below 600M even). > However, we can experience temporary spikes of higher volume > longer-living object allocation bursts. > > We'll set up a jstat log for this machine. I do have historical jstat > logs for one of the other machines, but that one is still running with > a smaller new gen, and smaller survivor spaces. If there's any other > relevant data that I can collect, let me know. > > Kind regards, > Taras > > On Sun, Apr 15, 2012 at 6:15 PM, Chi Ho Kwok wrote: > > Hi Teras, > > > > Sure thing. Just the previous CMS should be enough, it doesn't matter if > > there is 10 or 1000 parnew's between that and the failure. 
> > > > As for the jstat failure, it looks like it looks in > > /tmp/hsperfdata_[username] for the pid by default, maybe something > > like -J-Djava.io.tmpdir=[path, like /app/client/program/tomcat/temp] can > > help; and from what I've seen, running jstat as the same user as the > process > > or root is required. Historical data is nice to have, but even just > staring > > at it for 15 minutes should give you a hint for the old gen usage. > > > > If the collection starts at 68, takes a while and the heap fills to 80%+ > > before it's done when it's not busy, it's probably wise to lower the > initial > > occupancy factor or increase the thread count so it completes faster. We > run > > with -XX:ParallelCMSThreads=3 on a 8 hw thread server, the default (2) > was > > too slow for us as we run with 76%, it still takes 15s on average for > CMS to > > scan and clean the old gen (while old gen grows to up to 80% full), much > > longer can mean a promotion failure during request spikes. > > > > > > Chi Ho Kwok > > > > > > On Sun, Apr 15, 2012 at 5:08 PM, Taras Tielkes > > wrote: > >> > >> Hi Chi, > >> > >> Is it o.k. if I send this off-list to you directly? If so, how much > >> more do you need? Just enough to cover the previous CMS? > >> We're running with -XX:CMSInitiatingOccupancyFraction=68 and > >> -XX:+UseCMSInitiatingOccupancyOnly, by the way. > >> > >> I do have shell access, however, on that particular machine we're > >> experiencing the "process id not found" issue with jstat. > >> I think this can be worked around by fiddling with temp directory > >> options, but we haven't tried that yet. > >> Regarding the jstat output, I assume this would be most valuable to > >> have for the exact moment when the promotion failure happens, correct? > >> If so, we can try to set up jstat to run in the background > >> continuously, to have more diagnostic data in the future. > >> > >> Kind regards, > >> Taras > >> > >> On Sun, Apr 15, 2012 at 2:48 PM, Chi Ho Kwok > wrote: > >> > Hi Teras, > >> > > >> > Can you send me a larger chunk of the log? I'm interested in seeing > when > >> > the > >> > last CMS was run and what it freed. Maybe it's kicking in too late, > the > >> > full > >> > GC triggered by promotion failure only found 600M live data, rest was > >> > garbage. If that's the cause, > lowering XX:CMSInitiatingOccupancyFraction > >> > can > >> > help. > >> > > >> > Also, do you have shell access to that machine? If so, try running > >> > jstat, > >> > you can see the usage of all generations live as it happens. > >> > > >> > > >> > Chi Ho Kwok > >> > > >> > On Sun, Apr 15, 2012 at 2:34 PM, Taras Tielkes < > taras.tielkes at gmail.com> > >> > wrote: > >> >> > >> >> Hi Chi, Srinivas, > >> >> > >> >> Optimizing the cost of ParNew (by lowering MTT) would be nice, but > for > >> >> now my priority is still to minimize the promotion failures. > >> >> > >> >> For example, on the machine running CMS with the "larger" young gen > >> >> and survivor spaces (-Xmx5g -Xmn800 -XX:SurvivorRatio=4), I've just > >> >> seen a promotion failure again. Below is a snippet of gc.log showing > >> >> this. > >> >> To put this into perspective, this is a first promotion failure on > >> >> that machine in a couple of weeks. Still, zero failures would beat a > >> >> single failure, since the clients connecting to this application will > >> >> only wait a few seconds before timing out and terminating the > >> >> connection. In addition, the promotion failures are occurring in peak > >> >> usage moments. 
> >> >> > >> >> Apart from trying to eliminate the promotion failure pauses, my main > >> >> goal is to learn how to understand the root cause in a case like > this. > >> >> Any suggestions for things to try or read up on are appreciated. > >> >> > >> >> Kind regards, > >> >> Taras > >> >> ------------------------------------------------ > >> >> 2012-04-13T17:44:27.777+0200: 3296255.045: [GC 3296255.046: [ParNew > >> >> Desired survivor size 69894144 bytes, new threshold 15 (max 15) > >> >> - age 1: 3684448 bytes, 3684448 total > >> >> - age 2: 824984 bytes, 4509432 total > >> >> - age 3: 885120 bytes, 5394552 total > >> >> - age 4: 756568 bytes, 6151120 total > >> >> - age 5: 696880 bytes, 6848000 total > >> >> - age 6: 890688 bytes, 7738688 total > >> >> - age 7: 2631184 bytes, 10369872 total > >> >> - age 8: 719976 bytes, 11089848 total > >> >> - age 9: 724944 bytes, 11814792 total > >> >> - age 10: 750360 bytes, 12565152 total > >> >> - age 11: 934944 bytes, 13500096 total > >> >> - age 12: 521080 bytes, 14021176 total > >> >> - age 13: 543392 bytes, 14564568 total > >> >> - age 14: 906616 bytes, 15471184 total > >> >> - age 15: 504008 bytes, 15975192 total > >> >> : 568932K->22625K(682688K), 0.0410180 secs] > >> >> 3077079K->2531413K(5106368K), 0.0416940 secs] [Times: user=0.30 > >> >> sys=0.01, real=0.05 secs] > >> >> 2012-04-13T17:44:33.893+0200: 3296261.162: [GC 3296261.162: [ParNew > >> >> Desired survivor size 69894144 bytes, new threshold 15 (max 15) > >> >> - age 1: 2975896 bytes, 2975896 total > >> >> - age 2: 742592 bytes, 3718488 total > >> >> - age 3: 812864 bytes, 4531352 total > >> >> - age 4: 873488 bytes, 5404840 total > >> >> - age 5: 746128 bytes, 6150968 total > >> >> - age 6: 685192 bytes, 6836160 total > >> >> - age 7: 888376 bytes, 7724536 total > >> >> - age 8: 2621688 bytes, 10346224 total > >> >> - age 9: 715608 bytes, 11061832 total > >> >> - age 10: 723336 bytes, 11785168 total > >> >> - age 11: 749856 bytes, 12535024 total > >> >> - age 12: 914632 bytes, 13449656 total > >> >> - age 13: 520944 bytes, 13970600 total > >> >> - age 14: 543224 bytes, 14513824 total > >> >> - age 15: 906040 bytes, 15419864 total > >> >> : 568801K->22726K(682688K), 0.0447800 secs] > >> >> 3077589K->2532008K(5106368K), 0.0454710 secs] [Times: user=0.33 > >> >> sys=0.00, real=0.05 secs] > >> >> 2012-04-13T17:44:40.231+0200: 3296267.499: [GC 3296267.500: [ParNew > >> >> (1: promotion failure size = 16) (2: promotion failure size = 56) > >> >> (4: promotion failure > >> >> size = 342) (5: promotion failure size = 1026) (6: promotion > failure > >> >> size = 278) (promotion failed) > >> >> Desired survivor size 69894144 bytes, new threshold 15 (max 15) > >> >> - age 1: 2436840 bytes, 2436840 total > >> >> - age 2: 1625136 bytes, 4061976 total > >> >> - age 3: 691664 bytes, 4753640 total > >> >> - age 4: 799992 bytes, 5553632 total > >> >> - age 5: 858344 bytes, 6411976 total > >> >> - age 6: 730200 bytes, 7142176 total > >> >> - age 7: 680072 bytes, 7822248 total > >> >> - age 8: 885960 bytes, 8708208 total > >> >> - age 9: 2618544 bytes, 11326752 total > >> >> - age 10: 709168 bytes, 12035920 total > >> >> - age 11: 714576 bytes, 12750496 total > >> >> - age 12: 734976 bytes, 13485472 total > >> >> - age 13: 905048 bytes, 14390520 total > >> >> - age 14: 520320 bytes, 14910840 total > >> >> - age 15: 543056 bytes, 15453896 total > >> >> : 568902K->568678K(682688K), 0.3130580 secs]3296267.813: [CMS: > >> >> 2510091K->573489K(4423680K), 7.7481330 secs] > >> >> 
3078184K->573489K(5106368K), [CMS Perm : 144002K-> > >> >> 143970K(262144K)], 8.0619690 secs] [Times: user=8.35 sys=0.01, > >> >> real=8.06 > >> >> secs] > >> >> 2012-04-13T17:44:51.337+0200: 3296278.606: [GC 3296278.606: [ParNew > >> >> Desired survivor size 69894144 bytes, new threshold 15 (max 15) > >> >> - age 1: 33717528 bytes, 33717528 total > >> >> : 546176K->43054K(682688K), 0.0515990 secs] > >> >> 1119665K->616543K(5106368K), 0.0523550 secs] [Times: user=0.34 > >> >> sys=0.00, real=0.05 secs] > >> >> ------------------------------------------------ > >> >> > >> >> On Tue, Mar 20, 2012 at 10:12 PM, Srinivas Ramakrishna > >> >> wrote: > >> >> > As Chi-ho noted, about 3-4 MB of data does get promoted per > scavenge, > >> >> > after having > >> >> > sloshed around in your survivor spaces some 15 times. I'd venture > >> >> > that > >> >> > whatever winnowing > >> >> > of young objects was to ocur has in fact occured already within the > >> >> > first 3-4 scavenges that > >> >> > an object has survived, after which the drop-off in population is > >> >> > less > >> >> > sharp. So I'd suggest > >> >> > lowering the MTT to about 3, while leaving the survivor ratio > intact. > >> >> > That should reduce your > >> >> > copying costs and bring down your scavenge pauses further, while > not > >> >> > adversely affecting > >> >> > your promotion rates (and concomitantly the fragmentation). > >> >> > > >> >> > One thing that was a bit puzzling about the stats below was that > >> >> > you'd > >> >> > expect the volume > >> >> > of generation X in scavenge N to be no less than the volume of > >> >> > generation X+1 in scavenge N+1, > >> >> > but occasionally that natural invariant does not appear to hold, > >> >> > which > >> >> > is quite puzzling -- > >> >> > indicating perhaps that either ages or populations are not being > >> >> > correctly tracked. > >> >> > > >> >> > I don't know if anyone else has noticed that in their tenuring > >> >> > distributions as well.... > >> >> > > >> >> > -- ramki > >> >> > > >> >> > On Tue, Mar 20, 2012 at 9:36 AM, Taras Tielkes > >> >> > > >> >> > wrote: > >> >> >> Hi, > >> >> >> > >> >> >> I've collected -XX:+PrintTenuringDistribution data from a node in > >> >> >> our > >> >> >> production environment, running -Xmx5g -Xmn400m > -XX:SurvivorRatio=8. > >> >> >> On one other production node, we've configured a larger new gen, > and > >> >> >> larger survivor spaces (-Xmx5g -Xmn800m -XX:SurvivorRatio=4). > >> >> >> This node has -XX:+PrintTenuringDistribution logging as well. > >> >> >> > >> >> >> The node running the larger new gen and survivor spaces has not > run > >> >> >> into a promotion failure yet, while the ones still running the old > >> >> >> config have hit a few. > >> >> >> The promotion failures are typically experienced at high load > >> >> >> periods, > >> >> >> which makes sense, as allocation and promotion will experience a > >> >> >> spike > >> >> >> in those periods as well. > >> >> >> > >> >> >> The inherent nature of the application implies relatively long > >> >> >> sessions (towards a few hours), retaining a fair amout of state up > >> >> >> to > >> >> >> an hour. > >> >> >> I believe this is the main reason of the relatively high promotion > >> >> >> rate we're experiencing. 
> >> >> >> > >> >> >> > >> >> >> Here's a fragment of gc log from one of the nodes running the > older > >> >> >> (smaller) new gen, including a promotion failure: > >> >> >> ------------------------- > >> >> >> 2012-03-15T18:32:17.785+0100: 796604.225: [GC 796604.225: [ParNew > >> >> >> Desired survivor size 20971520 bytes, new threshold 8 (max 15) > >> >> >> - age 1: 2927728 bytes, 2927728 total > >> >> >> - age 2: 2428512 bytes, 5356240 total > >> >> >> - age 3: 2696376 bytes, 8052616 total > >> >> >> - age 4: 2623576 bytes, 10676192 total > >> >> >> - age 5: 3365576 bytes, 14041768 total > >> >> >> - age 6: 2792272 bytes, 16834040 total > >> >> >> - age 7: 2233008 bytes, 19067048 total > >> >> >> - age 8: 2263824 bytes, 21330872 total > >> >> >> : 358709K->29362K(368640K), 0.0461460 secs] > >> >> >> 3479492K->3151874K(5201920K), 0.0467320 secs] [Times: user=0.34 > >> >> >> sys=0.01, real=0.05 secs] > >> >> >> 2012-03-15T18:32:21.546+0100: 796607.986: [GC 796607.986: [ParNew > >> >> >> (0: > >> >> >> promotion failure size = 25) (1: promotion failure size = 25) > (2: > >> >> >> promotion failure size = 25) (3: promotion failure size = 25) > (4: > >> >> >> promotion failure size = 25) (5 > >> >> >> : promotion failure size = 25) (6: promotion failure size = 341) > >> >> >> (7: > >> >> >> promotion failure size = 25) (promotion failed) > >> >> >> Desired survivor size 20971520 bytes, new threshold 8 (max 15) > >> >> >> - age 1: 3708208 bytes, 3708208 total > >> >> >> - age 2: 2174384 bytes, 5882592 total > >> >> >> - age 3: 2383256 bytes, 8265848 total > >> >> >> - age 4: 2689912 bytes, 10955760 total > >> >> >> - age 5: 2621832 bytes, 13577592 total > >> >> >> - age 6: 3360440 bytes, 16938032 total > >> >> >> - age 7: 2784136 bytes, 19722168 total > >> >> >> - age 8: 2220232 bytes, 21942400 total > >> >> >> : 357042K->356456K(368640K), 0.2734100 secs]796608.259: [CMS: > >> >> >> 3124189K->516640K(4833280K), 6.8127070 secs] > >> >> >> 3479554K->516640K(5201920K), [CMS Perm : > 142423K->142284K(262144K)], > >> >> >> 7.0867850 secs] [Times: user=7.32 sys=0.07, real=7.09 secs] > >> >> >> 2012-03-15T18:32:30.279+0100: 796616.719: [GC 796616.720: [ParNew > >> >> >> Desired survivor size 20971520 bytes, new threshold 1 (max 15) > >> >> >> - age 1: 29721456 bytes, 29721456 total > >> >> >> : 327680K->40960K(368640K), 0.0403130 secs] > >> >> >> 844320K->557862K(5201920K), 0.0409070 secs] [Times: user=0.27 > >> >> >> sys=0.01, real=0.04 secs] > >> >> >> 2012-03-15T18:32:32.701+0100: 796619.141: [GC 796619.141: [ParNew > >> >> >> Desired survivor size 20971520 bytes, new threshold 15 (max 15) > >> >> >> - age 1: 10310176 bytes, 10310176 total > >> >> >> ------------------------- > >> >> >> > >> >> >> For contrast, here's a gc log fragment from the single node > running > >> >> >> the larger new gen and larger survivor spaces: > >> >> >> (the fragment is from the same point in time, with the nodes > >> >> >> experiencing equal load) > >> >> >> ------------------------- > >> >> >> 2012-03-15T18:32:12.067+0100: 797119.336: [GC 797119.336: [ParNew > >> >> >> Desired survivor size 69894144 bytes, new threshold 15 (max 15) > >> >> >> - age 1: 5611536 bytes, 5611536 total > >> >> >> - age 2: 3731888 bytes, 9343424 total > >> >> >> - age 3: 3450672 bytes, 12794096 total > >> >> >> - age 4: 3314744 bytes, 16108840 total > >> >> >> - age 5: 3459888 bytes, 19568728 total > >> >> >> - age 6: 3334712 bytes, 22903440 total > >> >> >> - age 7: 3671960 bytes, 26575400 total > >> >> >> - age 8: 3841608 bytes, 
30417008 total > >> >> >> - age 9: 2035392 bytes, 32452400 total > >> >> >> - age 10: 1975056 bytes, 34427456 total > >> >> >> - age 11: 2021344 bytes, 36448800 total > >> >> >> - age 12: 1520752 bytes, 37969552 total > >> >> >> - age 13: 1494176 bytes, 39463728 total > >> >> >> - age 14: 2355136 bytes, 41818864 total > >> >> >> - age 15: 1279000 bytes, 43097864 total > >> >> >> : 603473K->61640K(682688K), 0.0756570 secs] > >> >> >> 3373284K->2832383K(5106368K), 0.0762090 secs] [Times: user=0.56 > >> >> >> sys=0.00, real=0.08 secs] > >> >> >> 2012-03-15T18:32:18.200+0100: 797125.468: [GC 797125.469: [ParNew > >> >> >> Desired survivor size 69894144 bytes, new threshold 15 (max 15) > >> >> >> - age 1: 6101320 bytes, 6101320 total > >> >> >> - age 2: 4446776 bytes, 10548096 total > >> >> >> - age 3: 3701384 bytes, 14249480 total > >> >> >> - age 4: 3438488 bytes, 17687968 total > >> >> >> - age 5: 3295360 bytes, 20983328 total > >> >> >> - age 6: 3403320 bytes, 24386648 total > >> >> >> - age 7: 3323368 bytes, 27710016 total > >> >> >> - age 8: 3665760 bytes, 31375776 total > >> >> >> - age 9: 2427904 bytes, 33803680 total > >> >> >> - age 10: 1418656 bytes, 35222336 total > >> >> >> - age 11: 1955192 bytes, 37177528 total > >> >> >> - age 12: 2006064 bytes, 39183592 total > >> >> >> - age 13: 1520768 bytes, 40704360 total > >> >> >> - age 14: 1493728 bytes, 42198088 total > >> >> >> - age 15: 2354376 bytes, 44552464 total > >> >> >> : 607816K->62650K(682688K), 0.0779270 secs] > >> >> >> 3378559K->2834643K(5106368K), 0.0784690 secs] [Times: user=0.58 > >> >> >> sys=0.00, real=0.08 secs] > >> >> >> ------------------------- > >> >> >> > >> >> >> Questions: > >> >> >> > >> >> >> 1) From the tenuring distributions, it seems that the application > >> >> >> benefits from larger new gen and survivor spaces. > >> >> >> The next thing we'll try is to run with -Xmn1g > -XX:SurvivorRatio=2, > >> >> >> and see if the ParNew times are still acceptable. > >> >> >> Does this seem a sensible approach in this context? > >> >> >> Are there other variables beyond ParNew times that limit scaling > the > >> >> >> new gen to a large size? > >> >> >> > >> >> >> 2) Given the object age demographics inherent to our application, > we > >> >> >> can not expect to see the majority of data get collected in the > new > >> >> >> gen. > >> >> >> > >> >> >> Our approach to fight the promotion failures consists of three > >> >> >> aspects: > >> >> >> a) Lower the overall allocation rate of our application (by > >> >> >> improving > >> >> >> wasteful hotspots), to decrease overall ParNew collection > frequency. > >> >> >> b) Configure the new gen and survivor spaces as large as possible, > >> >> >> keeping an eye on ParNew times and overall new/tenured ratio. > >> >> >> c) Try to refactor the data structures that form the bulk of > >> >> >> promoted > >> >> >> data, to retain only the strictly required subgraphs. > >> >> >> > >> >> >> Is there anything else I can try or measure, in order to better > >> >> >> understand the problem? > >> >> >> > >> >> >> Thanks in advance, > >> >> >> Taras > >> >> >> > >> >> >> > >> >> >> On Wed, Feb 22, 2012 at 10:51 AM, Taras Tielkes > >> >> >> wrote: > >> >> >>> (this time properly responding to the list alias) > >> >> >>> Hi Srinivas, > >> >> >>> > >> >> >>> We're running 1.6.0 u29 on Linux x64. My understanding is that > >> >> >>> CompressedOops is enabled by default since u23. 
> >> >> >>> > >> >> >>> At least this page seems to support that: > >> >> >>> > >> >> >>> > >> >> >>> > http://docs.oracle.com/javase/7/docs/technotes/guides/vm/performance-enhancements-7.html > >> >> >>> > >> >> >>> Regarding the other remarks (also from Todd and Chi), I'll > comment > >> >> >>> later. The first thing on my list is to collect > >> >> >>> PrintTenuringDistribution data now. > >> >> >>> > >> >> >>> Kind regards, > >> >> >>> Taras > >> >> >>> > >> >> >>> On Wed, Feb 22, 2012 at 10:50 AM, Taras Tielkes > >> >> >>> wrote: > >> >> >>>> Hi Srinivas, > >> >> >>>> > >> >> >>>> We're running 1.6.0 u29 on Linux x64. My understanding is that > >> >> >>>> CompressedOops is enabled by default since u23. > >> >> >>>> > >> >> >>>> At least this page seems to support that: > >> >> >>>> > >> >> >>>> > >> >> >>>> > http://docs.oracle.com/javase/7/docs/technotes/guides/vm/performance-enhancements-7.html > >> >> >>>> > >> >> >>>> Regarding the other remarks (also from Todd and Chi), I'll > comment > >> >> >>>> later. The first thing on my list is to collect > >> >> >>>> PrintTenuringDistribution data now. > >> >> >>>> > >> >> >>>> Kind regards, > >> >> >>>> Taras > >> >> >>>> > >> >> >>>> On Wed, Feb 22, 2012 at 12:40 AM, Srinivas Ramakrishna > >> >> >>>> wrote: > >> >> >>>>> I agree that premature promotions are almost always the first > and > >> >> >>>>> most > >> >> >>>>> important thing to fix when running > >> >> >>>>> into fragmentation or overload issues with CMS. However, I can > >> >> >>>>> also > >> >> >>>>> imagine > >> >> >>>>> long-lived objects with a highly > >> >> >>>>> non-stationary size distribution which can also cause problems > >> >> >>>>> for > >> >> >>>>> CMS > >> >> >>>>> despite best efforts to tune against > >> >> >>>>> premature promotion. > >> >> >>>>> > >> >> >>>>> I didn't think Treas was running with MTT=0, although MTT > 0 > is > >> >> >>>>> no > >> >> >>>>> recipe > >> >> >>>>> for avoiding premature promotion > >> >> >>>>> with bursty loads that case overflow the survivor spaces -- as > >> >> >>>>> you > >> >> >>>>> say large > >> >> >>>>> survivor spaces with a low > >> >> >>>>> TargetSurvivorRatio -- so as to leave plenty of space to > >> >> >>>>> absorb/accommodate > >> >> >>>>> spiking/bursty loads is > >> >> >>>>> definitely a "best practice" for CMS (and possibly for other > >> >> >>>>> concurrent > >> >> >>>>> collectors as well). > >> >> >>>>> > >> >> >>>>> One thing Taras can do to see if premature promotion might be > an > >> >> >>>>> issue is to > >> >> >>>>> look at the tenuring > >> >> >>>>> threshold in his case. A rough proxy (if > >> >> >>>>> PrintTenuringDistribution > >> >> >>>>> is not > >> >> >>>>> enabled) is to look at the > >> >> >>>>> promotion volume per scavenge. It may be possible, if premature > >> >> >>>>> promotion is > >> >> >>>>> a cause, to see > >> >> >>>>> some kind of medium-term correlation between high promotion > >> >> >>>>> volume > >> >> >>>>> and > >> >> >>>>> eventual promotion > >> >> >>>>> failure despite frequent CMS collections. > >> >> >>>>> > >> >> >>>>> One other point which may or may not be relevant. I see that > >> >> >>>>> Taras > >> >> >>>>> is not > >> >> >>>>> using CompressedOops... > >> >> >>>>> Using that alone would greatly decrease memory pressure and > >> >> >>>>> provide > >> >> >>>>> more > >> >> >>>>> breathing room to CMS, > >> >> >>>>> which is also almost always a good idea. 
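For what it's worth, whether compressed oops are actually in effect can be checked directly; a minimal sketch (PrintFlagsFinal is available on these 6u2x releases, and jinfo may need to run as the same user as the target process):
-----
java -XX:+PrintFlagsFinal -version | grep UseCompressedOops   # defaults of the installed JDK
jinfo -flag UseCompressedOops <pid>                           # the running process
-----
If the reported value is true, the point above is already covered by the default settings.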
> >> >> >>>>> > >> >> >>>>> -- ramki > >> >> >>>>> > >> >> >>>>> On Tue, Feb 21, 2012 at 10:16 AM, Chi Ho Kwok > >> >> >>>>> > >> >> >>>>> wrote: > >> >> >>>>>> > >> >> >>>>>> Hi Teras, > >> >> >>>>>> > >> >> >>>>>> I think you may want to look into sizing the new and > especially > >> >> >>>>>> the > >> >> >>>>>> survivor spaces differently. We run something similar to what > >> >> >>>>>> you > >> >> >>>>>> described, > >> >> >>>>>> high volume request processing with large dataset loading, and > >> >> >>>>>> what > >> >> >>>>>> we've > >> >> >>>>>> seen at the start is that the survivor spaces are completely > >> >> >>>>>> overloaded, > >> >> >>>>>> causing premature promotions. > >> >> >>>>>> > >> >> >>>>>> We've configured our vm with the following goals/guideline: > >> >> >>>>>> > >> >> >>>>>> old space is for semi-permanent data, living for at least 30s, > >> >> >>>>>> average ~10 > >> >> >>>>>> minutes > >> >> >>>>>> new space contains only temporary and just loaded data > >> >> >>>>>> surviving objects from new should never reach old in 1 gc, so > >> >> >>>>>> the > >> >> >>>>>> survivor > >> >> >>>>>> space may never be 100% full > >> >> >>>>>> > >> >> >>>>>> With jstat -gcutil `pidof java` 2000, we see things like: > >> >> >>>>>> > >> >> >>>>>> S0 S1 E O P YGC YGCT FGC > FGCT > >> >> >>>>>> GCT > >> >> >>>>>> 70.20 0.00 19.65 57.60 59.90 124808 29474.299 2498 > >> >> >>>>>> 191.110 > >> >> >>>>>> 29665.409 > >> >> >>>>>> 70.20 0.00 92.89 57.60 59.90 124808 29474.299 2498 > >> >> >>>>>> 191.110 > >> >> >>>>>> 29665.409 > >> >> >>>>>> 70.20 0.00 93.47 57.60 59.90 124808 29474.299 2498 > >> >> >>>>>> 191.110 > >> >> >>>>>> 29665.409 > >> >> >>>>>> 0.00 65.69 78.07 58.09 59.90 124809 29474.526 2498 > >> >> >>>>>> 191.110 > >> >> >>>>>> 29665.636 > >> >> >>>>>> 84.97 0.00 48.19 58.57 59.90 124810 29474.774 2498 > >> >> >>>>>> 191.110 > >> >> >>>>>> 29665.884 > >> >> >>>>>> 84.97 0.00 81.30 58.57 59.90 124810 29474.774 2498 > >> >> >>>>>> 191.110 > >> >> >>>>>> 29665.884 > >> >> >>>>>> 0.00 62.64 27.22 59.12 59.90 124811 29474.992 2498 > >> >> >>>>>> 191.110 > >> >> >>>>>> 29666.102 > >> >> >>>>>> 0.00 62.64 54.47 59.12 59.90 124811 29474.992 2498 > >> >> >>>>>> 191.110 > >> >> >>>>>> 29666.102 > >> >> >>>>>> 75.68 0.00 6.80 59.53 59.90 124812 29475.228 2498 > >> >> >>>>>> 191.110 > >> >> >>>>>> 29666.338 > >> >> >>>>>> 75.68 0.00 23.38 59.53 59.90 124812 29475.228 2498 > >> >> >>>>>> 191.110 > >> >> >>>>>> 29666.338 > >> >> >>>>>> 75.68 0.00 27.72 59.53 59.90 124812 29475.228 2498 > >> >> >>>>>> 191.110 > >> >> >>>>>> 29666.338 > >> >> >>>>>> > >> >> >>>>>> If you follow the lines, you can see Eden fill up to 100% on > >> >> >>>>>> line > >> >> >>>>>> 4, > >> >> >>>>>> surviving objects are copied into S1, S0 is collected and > added > >> >> >>>>>> 0.49% to > >> >> >>>>>> Old. On line 5, another GC happened, with Eden->S0, S1->Old, > >> >> >>>>>> etc. > >> >> >>>>>> No objects > >> >> >>>>>> is ever transferred from Eden to Old, unless there's a huge > peak > >> >> >>>>>> of > >> >> >>>>>> requests. > >> >> >>>>>> > >> >> >>>>>> This is with a: 32GB heap, Mxn1200M, SurvivorRatio 2 (600MB > >> >> >>>>>> Eden, > >> >> >>>>>> 300MB > >> >> >>>>>> S0, 300MB S1), MaxTenuringThreshold 1 (whatever is still alive > >> >> >>>>>> in > >> >> >>>>>> S0/1 on > >> >> >>>>>> the second GC is copied to old, don't wait, web requests are > >> >> >>>>>> quite > >> >> >>>>>> bursty). 
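Pieced together from the description above, that configuration would look roughly as follows; only the flags explicitly mentioned are listed, and whatever else runs on that machine is not shown in this thread:
-----
-Xmx32g
-Xmn1200m
-XX:SurvivorRatio=2
-XX:MaxTenuringThreshold=1
-XX:CMSInitiatingOccupancyFraction=76
-----
i.e. a ~600m eden with two ~300m survivors, and surviving objects tenured on their second scavenge at the latest.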
> >> >> >>>>>> With about 1 collection every 2-5 seconds, objects promoted to > >> >> >>>>>> Old > >> >> >>>>>> must live > >> >> >>>>>> for at 4-10 seconds; as that's longer than an average request > >> >> >>>>>> (50ms-1s), > >> >> >>>>>> none of the temporary data ever makes it into Old, which is > much > >> >> >>>>>> more > >> >> >>>>>> expensive to collect. It works even with a higher than default > >> >> >>>>>> CMSInitiatingOccupancyFraction=76 to optimize for space > >> >> >>>>>> available > >> >> >>>>>> for the > >> >> >>>>>> large data cache we have. > >> >> >>>>>> > >> >> >>>>>> > >> >> >>>>>> With your config of 400MB Total new, with 350MB Eden, 25MB S0, > >> >> >>>>>> 25MB > >> >> >>>>>> S1 > >> >> >>>>>> (SurvivorRatio 8), no tenuring threshold, I think loads of new > >> >> >>>>>> objects get > >> >> >>>>>> copied from Eden to Old directly, causing trouble for the CMS. > >> >> >>>>>> You > >> >> >>>>>> can use > >> >> >>>>>> jstat to get live stats and tweak until it doesn't happen. If > >> >> >>>>>> you > >> >> >>>>>> can't make > >> >> >>>>>> changes on live that easil, try doubling the new size indeed, > >> >> >>>>>> with > >> >> >>>>>> a 400 > >> >> >>>>>> Eden, 200 S0, 200 S1 and MaxTenuringThreshold 1 setting. It's > >> >> >>>>>> probably > >> >> >>>>>> overkill, but if should solve the problem if it is caused by > >> >> >>>>>> premature > >> >> >>>>>> promotion. > >> >> >>>>>> > >> >> >>>>>> > >> >> >>>>>> Chi Ho Kwok > >> >> >>>>>> > >> >> >>>>>> > >> >> >>>>>> On Tue, Feb 21, 2012 at 5:55 PM, Taras Tielkes > >> >> >>>>>> > >> >> >>>>>> wrote: > >> >> >>>>>>> > >> >> >>>>>>> Hi, > >> >> >>>>>>> > >> >> >>>>>>> We've removed the "-XX:+CMSScavengeBeforeRemark" setting from > >> >> >>>>>>> 50% > >> >> >>>>>>> of > >> >> >>>>>>> our production nodes. > >> >> >>>>>>> After running for a few weeks, it seems that there's no > impact > >> >> >>>>>>> from > >> >> >>>>>>> removing this option. > >> >> >>>>>>> Which is good, since it seems we can remove it from the other > >> >> >>>>>>> nodes as > >> >> >>>>>>> well, simplifying our overall JVM configuration ;-) > >> >> >>>>>>> > >> >> >>>>>>> However, we're still seeing promotion failures on all nodes, > >> >> >>>>>>> once > >> >> >>>>>>> every day or so. 
> >> >> >>>>>>> > >> >> >>>>>>> There's still the "Magic 1026": this accounts for ~60% of the > >> >> >>>>>>> promotion failures that we're seeing (single ParNew thread > >> >> >>>>>>> thread, > >> >> >>>>>>> 1026 failure size): > >> >> >>>>>>> -------------------- > >> >> >>>>>>> 2012-02-06T09:13:51.806+0100: 328095.085: [GC 328095.086: > >> >> >>>>>>> [ParNew: > >> >> >>>>>>> 359895K->29357K(368640K), 0.0429070 secs] > >> >> >>>>>>> 3471021K->3143476K(5201920K), 0.0434950 secs] [Times: > user=0.32 > >> >> >>>>>>> sys=0.00, real=0.04 secs] > >> >> >>>>>>> 2012-02-06T09:13:55.922+0100: 328099.201: [GC 328099.201: > >> >> >>>>>>> [ParNew: > >> >> >>>>>>> 357037K->31817K(368640K), 0.0429130 secs] > >> >> >>>>>>> 3471156K->3148946K(5201920K), 0.0434930 secs] [Times: > user=0.31 > >> >> >>>>>>> sys=0.00, real=0.04 secs] > >> >> >>>>>>> 2012-02-06T09:13:59.044+0100: 328102.324: [GC 328102.324: > >> >> >>>>>>> [ParNew > >> >> >>>>>>> (promotion failure size = 1026) (promotion failed): > >> >> >>>>>>> 359497K->368640K(368640K), 0.2226790 secs]328102.547: [CMS: > >> >> >>>>>>> 3125609K->451515K(4833280K), 5.6225880 secs] 3476626K->4515 > >> >> >>>>>>> 15K(5201920K), [CMS Perm : 124373K->124353K(262144K)], > >> >> >>>>>>> 5.8459380 > >> >> >>>>>>> secs] > >> >> >>>>>>> [Times: user=6.20 sys=0.01, real=5.85 secs] > >> >> >>>>>>> 2012-02-06T09:14:05.243+0100: 328108.522: [GC 328108.523: > >> >> >>>>>>> [ParNew: > >> >> >>>>>>> 327680K->40960K(368640K), 0.0319160 secs] > >> >> >>>>>>> 779195K->497658K(5201920K), > >> >> >>>>>>> 0.0325360 secs] [Times: user=0.21 sys=0.01, real=0.03 secs] > >> >> >>>>>>> 2012-02-06T09:14:07.836+0100: 328111.116: [GC 328111.116: > >> >> >>>>>>> [ParNew: > >> >> >>>>>>> 368640K->32785K(368640K), 0.0744670 secs] > >> >> >>>>>>> 825338K->520234K(5201920K), > >> >> >>>>>>> 0.0750390 secs] [Times: user=0.40 sys=0.02, real=0.08 secs] > >> >> >>>>>>> -------------------- > >> >> >>>>>>> Given the 1026 word size, I'm wondering if I should be > hunting > >> >> >>>>>>> for > >> >> >>>>>>> an > >> >> >>>>>>> overuse of BufferedInputStream/BufferedOutoutStream, since > both > >> >> >>>>>>> have > >> >> >>>>>>> 8192 as a default buffer size. 
> >> >> >>>>>>> > >> >> >>>>>>> The second group of promotion failures look like this > (multiple > >> >> >>>>>>> ParNew > >> >> >>>>>>> threads, small failure sizes): > >> >> >>>>>>> -------------------- > >> >> >>>>>>> 2012-02-06T09:50:15.773+0100: 328756.964: [GC 328756.964: > >> >> >>>>>>> [ParNew: > >> >> >>>>>>> 356116K->29934K(368640K), 0.0461100 secs] > >> >> >>>>>>> 3203863K->2880162K(5201920K), 0.0468870 secs] [Times: > user=0.34 > >> >> >>>>>>> sys=0.01, real=0.05 secs] > >> >> >>>>>>> 2012-02-06T09:50:19.153+0100: 328760.344: [GC 328760.344: > >> >> >>>>>>> [ParNew: > >> >> >>>>>>> 357614K->30359K(368640K), 0.0454680 secs] > >> >> >>>>>>> 3207842K->2882892K(5201920K), 0.0462280 secs] [Times: > user=0.33 > >> >> >>>>>>> sys=0.01, real=0.05 secs] > >> >> >>>>>>> 2012-02-06T09:50:22.658+0100: 328763.849: [GC 328763.849: > >> >> >>>>>>> [ParNew > >> >> >>>>>>> (1: > >> >> >>>>>>> promotion failure size = 25) (4: promotion failure size = > 25) > >> >> >>>>>>> (6: > >> >> >>>>>>> promotion failure size = 25) (7: promotion failure size = > 144) > >> >> >>>>>>> (promotion failed): 358039K->358358 > >> >> >>>>>>> K(368640K), 0.2148680 secs]328764.064: [CMS: > >> >> >>>>>>> 2854709K->446750K(4833280K), 5.8368270 secs] > >> >> >>>>>>> 3210572K->446750K(5201920K), [CMS Perm : > >> >> >>>>>>> 124670K->124644K(262144K)], > >> >> >>>>>>> 6.0525230 secs] [Times: user=6.32 sys=0.00, real=6.05 secs] > >> >> >>>>>>> 2012-02-06T09:50:29.896+0100: 328771.086: [GC 328771.087: > >> >> >>>>>>> [ParNew: > >> >> >>>>>>> 327680K->22569K(368640K), 0.0227080 secs] > >> >> >>>>>>> 774430K->469319K(5201920K), > >> >> >>>>>>> 0.0235020 secs] [Times: user=0.16 sys=0.00, real=0.02 secs] > >> >> >>>>>>> 2012-02-06T09:50:31.076+0100: 328772.266: [GC 328772.267: > >> >> >>>>>>> [ParNew: > >> >> >>>>>>> 350249K->22264K(368640K), 0.0235480 secs] > >> >> >>>>>>> 796999K->469014K(5201920K), > >> >> >>>>>>> 0.0243000 secs] [Times: user=0.18 sys=0.01, real=0.02 secs] > >> >> >>>>>>> -------------------- > >> >> >>>>>>> > >> >> >>>>>>> We're going to try to double the new size on a single node, > to > >> >> >>>>>>> see > >> >> >>>>>>> the > >> >> >>>>>>> effects of that. > >> >> >>>>>>> > >> >> >>>>>>> Beyond this experiment, is there any additional data I can > >> >> >>>>>>> collect > >> >> >>>>>>> to > >> >> >>>>>>> better understand the nature of the promotion failures? > >> >> >>>>>>> Am I facing collecting free list statistics at this point? 
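On the free list question: one relatively cheap next step, assuming the HotSpot build in use accepts the flag, is to have CMS print its free list statistics around each collection and watch whether the largest free chunk keeps shrinking while total free space stays high, which is the usual fragmentation signature:
-----
-XX:PrintFLSStatistics=1
-----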
> >> >> >>>>>>> > >> >> >>>>>>> Thanks, > >> >> >>>>>>> Taras > >> >> >>>>>> > >> >> >>>>>> > >> >> >>>>>> _______________________________________________ > >> >> >>>>>> hotspot-gc-use mailing list > >> >> >>>>>> hotspot-gc-use at openjdk.java.net > >> >> >>>>>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > >> >> >>>>>> > >> >> >>>>> > >> >> >> _______________________________________________ > >> >> >> hotspot-gc-use mailing list > >> >> >> hotspot-gc-use at openjdk.java.net > >> >> >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > >> >> _______________________________________________ > >> >> hotspot-gc-use mailing list > >> >> hotspot-gc-use at openjdk.java.net > >> >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > >> > > >> > > >> _______________________________________________ > >> hotspot-gc-use mailing list > >> hotspot-gc-use at openjdk.java.net > >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120415/76d0dacc/attachment-0001.html From taras.tielkes at gmail.com Tue Apr 17 04:38:54 2012 From: taras.tielkes at gmail.com (Taras Tielkes) Date: Tue, 17 Apr 2012 13:38:54 +0200 Subject: Promotion failures: indication of CMS fragmentation? In-Reply-To: References: <4EF9FCAC.3030208@oracle.com> <4F06A270.3010701@oracle.com> <4F0DBEC4.7040907@oracle.com> <4F1ECE7B.3040502@oracle.com> <4F1F2ED7.6060308@oracle.com> <4F20F78D.9070905@oracle.com> Message-ID: Hi, Perhaps it's me, but I find it hard to actually understand the error message. The promotion failure error mentions 5 different word sizes, for (I assume) 5 different ParNew threads. Which of these threads actually failed to promote the data to tenured space? The one with the largest work size? It would be nice if the message could be improved/expanded in the future to make it more easy to diagnose such events. -tt On Sun, Apr 15, 2012 at 7:11 PM, Chi Ho Kwok wrote: > Hi Teras, > > Hmm, it looks like it failed even tho there's tons of space available, 2.4G > used, 1.8G free out of 4.2G CMS old gen. Or am I reading the next line > wrong? (snipped age histogram) > > [GC 3296267.500: [ParNew (1: promotion failure size = 16) ?(2: promotion > failure size = 56) ?(4: promotion failure size = 342) ?(5: promotion failure > size = 1026) ?(6: promotion failure size = 278) ?(promotion failed): > 568902K->568678K(682688K), 0.3130580 secs]3296267.813: [CMS: > 2510091K->573489K(4423680K), 7.7481330 secs] 3078184K->573489K(5106368K), > [CMS Perm : 144002K->143970K(262144K)], 8.0619690 secs] [Times: user=8.35 > sys=0.01, real=8.06 secs] > > > Normally I'd say fragmentation, but how can it not find a spot for a 16 > bytes chunk? I'm completely out of ideas now - anyone else? > > > Here's a brute force "solution": what the app "needs" is 600M of live data > in the old gen, that's what left usually after collection. Increase "safety > margin" by?adding memory to the old gen pool if possible by increasing total > heap size, and set initial occupancy ratio to a silly low number like 45%. > Hopefully, it will survive until the next software/jvm/kernel patch that > requires a restart of the service or machine. 
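As a sketch of that brute force route, measured against the option list Taras posts a bit further down in this exchange (the 8 GB figure is purely illustrative, chosen only to show the direction of the change):
-----
current: -Xms5g -Xmx5g ... -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=68
sketch:  -Xms8g -Xmx8g ... -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=45
-----
Only the heap size and the initiating fraction change; everything else stays as posted.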
> > > I've seen something similar in our logs as well, with 19%/2.9GB free, my > guess is that CMS needs a few GB to play with... Nowadays we run with a > larger safety margin, doubled the heap on that machine to 32GB, I haven't > seen any CMS and promotion failures since then (Jan 2010). > > 128265.354: [GC 128265.355: [ParNew (promotion failed): > 589631K->589631K(589824K), 0.3582340 secs]128265.713: [CMS: > 12965822K->10393148K(15990784K), 20.9654520 secs] > 13462337K->10393148K(16580608K), [CMS Perm : 20604K->16846K(34456K)], > 21.3239890 secs] [Times: user=22.06 sys=0.09, real=21.32 secs] > > > > Regards, > > Chi Ho > > > On Sun, Apr 15, 2012 at 6:41 PM, Taras Tielkes > wrote: >> >> Hi Chi, >> >> I've sent you a decent chunk of the gc.log file off-list (hopefully >> not too large). >> >> For completeness, we're running with the following options (ignoring >> the diagnostic ones): >> ----- >> -server >> -Xms5g >> -Xmx5g >> -Xmn800m >> -XX:PermSize=256m >> -XX:MaxPermSize=256m >> -XX:SurvivorRatio=4 >> -XX:+UseConcMarkSweepGC >> -XX:+UseParNewGC >> -XX:+DisableExplicitGC >> -XX:+UseCMSInitiatingOccupancyOnly >> -XX:+CMSClassUnloadingEnabled >> -XX:CMSInitiatingOccupancyFraction=68 >> ----- >> Platform is Java 6u29 running on Linux 2.6 x64. >> Hardware is 2xquad Xeons, but pretty old ones (pre-Nehalem, no QPI). >> >> The gc logs will (typically) show big peaks at the start and end of >> the working day - this is nature of the domain our application >> targets. >> >> I would expect the live set to be below 1G (usually below 600M even). >> However, we can experience temporary spikes of higher volume >> longer-living object allocation bursts. >> >> We'll set up a jstat log for this machine. I do have historical jstat >> logs for one of the other machines, but that one is still running with >> a smaller new gen, and smaller survivor spaces. If there's any other >> relevant data that I can collect, let me know. >> >> Kind regards, >> Taras >> >> On Sun, Apr 15, 2012 at 6:15 PM, Chi Ho Kwok wrote: >> > Hi Teras, >> > >> > Sure thing. Just the previous CMS should be enough, it doesn't matter if >> > there is 10 or 1000 parnew's between that and the failure. >> > >> > As for the jstat failure, it looks like it looks in >> > /tmp/hsperfdata_[username] for the pid by default, maybe something >> > like?-J-Djava.io.tmpdir=[path, like /app/client/program/tomcat/temp] can >> > help; and from what I've seen, running jstat as the same user as the >> > process >> > or root is required. Historical data is nice to have, but even just >> > staring >> > at it for 15 minutes should give you a hint for the old gen usage. >> > >> > If the collection starts at 68, takes a while and the heap fills to 80%+ >> > before it's done when it's not busy, it's probably wise to lower the >> > initial >> > occupancy factor or increase the thread count so it completes faster. We >> > run >> > with?-XX:ParallelCMSThreads=3 on a 8 hw thread server, the default (2) >> > was >> > too slow for us as we run with 76%, it still takes 15s on average for >> > CMS to >> > scan and clean the old gen (while old gen grows to up to 80% full), much >> > longer can mean a promotion failure during request spikes. >> > >> > >> > Chi Ho Kwok >> > >> > >> > On Sun, Apr 15, 2012 at 5:08 PM, Taras Tielkes >> > wrote: >> >> >> >> Hi Chi, >> >> >> >> Is it o.k. if I send this off-list to you directly? If so, how much >> >> more do you need? Just enough to cover the previous CMS? 
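For reference, a minimal form of the jstat invocation discussed in this exchange might look like the line below; the temp path is just Chi Ho's example path and <pid> stands for the target JVM's process id:
-----
# run as the same user as the JVM (or as root)
jstat -J-Djava.io.tmpdir=/app/client/program/tomcat/temp -gcutil <pid> 2000
-----
The 2000 is the sampling interval in milliseconds, matching the jstat sample shown elsewhere in this thread.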
>> >> We're running with ?-XX:CMSInitiatingOccupancyFraction=68 and >> >> -XX:+UseCMSInitiatingOccupancyOnly, by the way. >> >> >> >> I do have shell access, however, on that particular machine we're >> >> experiencing the "process id not found" issue with jstat. >> >> I think this can be worked around by fiddling with temp directory >> >> options, but we haven't tried that yet. >> >> Regarding the jstat output, I assume this would be most valuable to >> >> have for the exact moment when the promotion failure happens, correct? >> >> If so, we can try to set up jstat to run in the background >> >> continuously, to have more diagnostic data in the future. >> >> >> >> Kind regards, >> >> Taras >> >> >> >> On Sun, Apr 15, 2012 at 2:48 PM, Chi Ho Kwok >> >> wrote: >> >> > Hi Teras, >> >> > >> >> > Can you send me a larger chunk of the log? I'm interested in seeing >> >> > when >> >> > the >> >> > last CMS was run and what it freed. Maybe it's kicking in too late, >> >> > the >> >> > full >> >> > GC triggered by promotion failure only found 600M live data, rest was >> >> > garbage. If that's the cause, >> >> > lowering?XX:CMSInitiatingOccupancyFraction >> >> > can >> >> > help. >> >> > >> >> > Also, do you have shell access to that machine? If so, try running >> >> > jstat, >> >> > you can see the usage of all generations live as it happens. >> >> > >> >> > >> >> > Chi Ho Kwok >> >> > >> >> > On Sun, Apr 15, 2012 at 2:34 PM, Taras Tielkes >> >> > >> >> > wrote: >> >> >> >> >> >> Hi Chi, Srinivas, >> >> >> >> >> >> Optimizing the cost of ParNew (by lowering MTT) would be nice, but >> >> >> for >> >> >> now my priority is still to minimize the promotion failures. >> >> >> >> >> >> For example, on the machine running CMS with the "larger" young gen >> >> >> and survivor spaces (-Xmx5g -Xmn800 -XX:SurvivorRatio=4), I've just >> >> >> seen a promotion failure again. Below is a snippet of gc.log showing >> >> >> this. >> >> >> To put this into perspective, this is a first promotion failure on >> >> >> that machine in a couple of weeks. Still, zero failures would beat a >> >> >> single failure, since the clients connecting to this application >> >> >> will >> >> >> only wait a few seconds before timing out and terminating the >> >> >> connection. In addition, the promotion failures are occurring in >> >> >> peak >> >> >> usage moments. >> >> >> >> >> >> Apart from trying to eliminate the promotion failure pauses, my main >> >> >> goal is to learn how to understand the root cause in a case like >> >> >> this. >> >> >> Any suggestions for things to try or read up on are appreciated. >> >> >> >> >> >> Kind regards, >> >> >> Taras >> >> >> ------------------------------------------------ >> >> >> 2012-04-13T17:44:27.777+0200: 3296255.045: [GC 3296255.046: [ParNew >> >> >> Desired survivor size 69894144 bytes, new threshold 15 (max 15) >> >> >> - age ? 1: ? ?3684448 bytes, ? ?3684448 total >> >> >> - age ? 2: ? ? 824984 bytes, ? ?4509432 total >> >> >> - age ? 3: ? ? 885120 bytes, ? ?5394552 total >> >> >> - age ? 4: ? ? 756568 bytes, ? ?6151120 total >> >> >> - age ? 5: ? ? 696880 bytes, ? ?6848000 total >> >> >> - age ? 6: ? ? 890688 bytes, ? ?7738688 total >> >> >> - age ? 7: ? ?2631184 bytes, ? 10369872 total >> >> >> - age ? 8: ? ? 719976 bytes, ? 11089848 total >> >> >> - age ? 9: ? ? 724944 bytes, ? 11814792 total >> >> >> - age ?10: ? ? 750360 bytes, ? 12565152 total >> >> >> - age ?11: ? ? 934944 bytes, ? 13500096 total >> >> >> - age ?12: ? ? 521080 bytes, ? 14021176 total >> >> >> - age ?13: ? ? 
543392 bytes, ? 14564568 total >> >> >> - age ?14: ? ? 906616 bytes, ? 15471184 total >> >> >> - age ?15: ? ? 504008 bytes, ? 15975192 total >> >> >> : 568932K->22625K(682688K), 0.0410180 secs] >> >> >> 3077079K->2531413K(5106368K), 0.0416940 secs] [Times: user=0.30 >> >> >> sys=0.01, real=0.05 secs] >> >> >> 2012-04-13T17:44:33.893+0200: 3296261.162: [GC 3296261.162: [ParNew >> >> >> Desired survivor size 69894144 bytes, new threshold 15 (max 15) >> >> >> - age ? 1: ? ?2975896 bytes, ? ?2975896 total >> >> >> - age ? 2: ? ? 742592 bytes, ? ?3718488 total >> >> >> - age ? 3: ? ? 812864 bytes, ? ?4531352 total >> >> >> - age ? 4: ? ? 873488 bytes, ? ?5404840 total >> >> >> - age ? 5: ? ? 746128 bytes, ? ?6150968 total >> >> >> - age ? 6: ? ? 685192 bytes, ? ?6836160 total >> >> >> - age ? 7: ? ? 888376 bytes, ? ?7724536 total >> >> >> - age ? 8: ? ?2621688 bytes, ? 10346224 total >> >> >> - age ? 9: ? ? 715608 bytes, ? 11061832 total >> >> >> - age ?10: ? ? 723336 bytes, ? 11785168 total >> >> >> - age ?11: ? ? 749856 bytes, ? 12535024 total >> >> >> - age ?12: ? ? 914632 bytes, ? 13449656 total >> >> >> - age ?13: ? ? 520944 bytes, ? 13970600 total >> >> >> - age ?14: ? ? 543224 bytes, ? 14513824 total >> >> >> - age ?15: ? ? 906040 bytes, ? 15419864 total >> >> >> : 568801K->22726K(682688K), 0.0447800 secs] >> >> >> 3077589K->2532008K(5106368K), 0.0454710 secs] [Times: user=0.33 >> >> >> sys=0.00, real=0.05 secs] >> >> >> 2012-04-13T17:44:40.231+0200: 3296267.499: [GC 3296267.500: [ParNew >> >> >> (1: promotion failure size = 16) ?(2: promotion failure size = 56) >> >> >> (4: promotion failure >> >> >> size = 342) ?(5: promotion failure size = 1026) ?(6: promotion >> >> >> failure >> >> >> size = 278) ?(promotion failed) >> >> >> Desired survivor size 69894144 bytes, new threshold 15 (max 15) >> >> >> - age ? 1: ? ?2436840 bytes, ? ?2436840 total >> >> >> - age ? 2: ? ?1625136 bytes, ? ?4061976 total >> >> >> - age ? 3: ? ? 691664 bytes, ? ?4753640 total >> >> >> - age ? 4: ? ? 799992 bytes, ? ?5553632 total >> >> >> - age ? 5: ? ? 858344 bytes, ? ?6411976 total >> >> >> - age ? 6: ? ? 730200 bytes, ? ?7142176 total >> >> >> - age ? 7: ? ? 680072 bytes, ? ?7822248 total >> >> >> - age ? 8: ? ? 885960 bytes, ? ?8708208 total >> >> >> - age ? 9: ? ?2618544 bytes, ? 11326752 total >> >> >> - age ?10: ? ? 709168 bytes, ? 12035920 total >> >> >> - age ?11: ? ? 714576 bytes, ? 12750496 total >> >> >> - age ?12: ? ? 734976 bytes, ? 13485472 total >> >> >> - age ?13: ? ? 905048 bytes, ? 14390520 total >> >> >> - age ?14: ? ? 520320 bytes, ? 14910840 total >> >> >> - age ?15: ? ? 543056 bytes, ? 15453896 total >> >> >> : 568902K->568678K(682688K), 0.3130580 secs]3296267.813: [CMS: >> >> >> 2510091K->573489K(4423680K), 7.7481330 secs] >> >> >> 3078184K->573489K(5106368K), [CMS Perm : 144002K-> >> >> >> 143970K(262144K)], 8.0619690 secs] [Times: user=8.35 sys=0.01, >> >> >> real=8.06 >> >> >> secs] >> >> >> 2012-04-13T17:44:51.337+0200: 3296278.606: [GC 3296278.606: [ParNew >> >> >> Desired survivor size 69894144 bytes, new threshold 15 (max 15) >> >> >> - age ? 1: ? 33717528 bytes, ? 
33717528 total >> >> >> : 546176K->43054K(682688K), 0.0515990 secs] >> >> >> 1119665K->616543K(5106368K), 0.0523550 secs] [Times: user=0.34 >> >> >> sys=0.00, real=0.05 secs] >> >> >> ------------------------------------------------ >> >> >> >> >> >> On Tue, Mar 20, 2012 at 10:12 PM, Srinivas Ramakrishna >> >> >> wrote: >> >> >> > As Chi-ho noted, about 3-4 MB of data does get promoted per >> >> >> > scavenge, >> >> >> > after having >> >> >> > sloshed around in your survivor spaces some 15 times. I'd venture >> >> >> > that >> >> >> > whatever winnowing >> >> >> > of young objects was to ocur has in fact occured already within >> >> >> > the >> >> >> > first 3-4 scavenges that >> >> >> > an object has survived, after which the drop-off in population is >> >> >> > less >> >> >> > sharp. So I'd suggest >> >> >> > lowering the MTT to about 3, while leaving the survivor ratio >> >> >> > intact. >> >> >> > That should reduce your >> >> >> > copying costs and bring down your scavenge pauses further, while >> >> >> > not >> >> >> > adversely affecting >> >> >> > your promotion rates (and concomitantly the fragmentation). >> >> >> > >> >> >> > One thing that was a bit puzzling about the stats below was that >> >> >> > you'd >> >> >> > expect the volume >> >> >> > of generation X in scavenge N to be no less than the volume of >> >> >> > generation X+1 in scavenge N+1, >> >> >> > but occasionally that natural invariant does not appear to hold, >> >> >> > which >> >> >> > is quite puzzling -- >> >> >> > indicating perhaps that either ages or populations are not being >> >> >> > correctly tracked. >> >> >> > >> >> >> > I don't know if anyone else has noticed that in their tenuring >> >> >> > distributions as well.... >> >> >> > >> >> >> > -- ramki >> >> >> > >> >> >> > On Tue, Mar 20, 2012 at 9:36 AM, Taras Tielkes >> >> >> > >> >> >> > wrote: >> >> >> >> Hi, >> >> >> >> >> >> >> >> I've collected -XX:+PrintTenuringDistribution data from a node in >> >> >> >> our >> >> >> >> production environment, running -Xmx5g -Xmn400m >> >> >> >> -XX:SurvivorRatio=8. >> >> >> >> On one other production node, we've configured a larger new gen, >> >> >> >> and >> >> >> >> larger survivor spaces (-Xmx5g -Xmn800m -XX:SurvivorRatio=4). >> >> >> >> This node has -XX:+PrintTenuringDistribution logging as well. >> >> >> >> >> >> >> >> The node running the larger new gen and survivor spaces has not >> >> >> >> run >> >> >> >> into a promotion failure yet, while the ones still running the >> >> >> >> old >> >> >> >> config have hit a few. >> >> >> >> The promotion failures are typically experienced at high load >> >> >> >> periods, >> >> >> >> which makes sense, as allocation and promotion will experience a >> >> >> >> spike >> >> >> >> in those periods as well. >> >> >> >> >> >> >> >> The inherent nature of the application implies relatively long >> >> >> >> sessions (towards a few hours), retaining a fair amout of state >> >> >> >> up >> >> >> >> to >> >> >> >> an hour. >> >> >> >> I believe this is the main reason of the relatively high >> >> >> >> promotion >> >> >> >> rate we're experiencing. >> >> >> >> >> >> >> >> >> >> >> >> Here's a fragment of gc log from one of the nodes running the >> >> >> >> older >> >> >> >> (smaller) new gen, including a promotion failure: >> >> >> >> ------------------------- >> >> >> >> 2012-03-15T18:32:17.785+0100: 796604.225: [GC 796604.225: [ParNew >> >> >> >> Desired survivor size 20971520 bytes, new threshold 8 (max 15) >> >> >> >> - age ? 1: ? ?2927728 bytes, ? 
?2927728 total >> >> >> >> - age ? 2: ? ?2428512 bytes, ? ?5356240 total >> >> >> >> - age ? 3: ? ?2696376 bytes, ? ?8052616 total >> >> >> >> - age ? 4: ? ?2623576 bytes, ? 10676192 total >> >> >> >> - age ? 5: ? ?3365576 bytes, ? 14041768 total >> >> >> >> - age ? 6: ? ?2792272 bytes, ? 16834040 total >> >> >> >> - age ? 7: ? ?2233008 bytes, ? 19067048 total >> >> >> >> - age ? 8: ? ?2263824 bytes, ? 21330872 total >> >> >> >> : 358709K->29362K(368640K), 0.0461460 secs] >> >> >> >> 3479492K->3151874K(5201920K), 0.0467320 secs] [Times: user=0.34 >> >> >> >> sys=0.01, real=0.05 secs] >> >> >> >> 2012-03-15T18:32:21.546+0100: 796607.986: [GC 796607.986: [ParNew >> >> >> >> (0: >> >> >> >> promotion failure size = 25) ?(1: promotion failure size = 25) >> >> >> >> ?(2: >> >> >> >> promotion failure size = 25) ?(3: promotion failure size = 25) >> >> >> >> ?(4: >> >> >> >> promotion failure size = 25) ?(5 >> >> >> >> : promotion failure size = 25) ?(6: promotion failure size = 341) >> >> >> >> ?(7: >> >> >> >> promotion failure size = 25) ?(promotion failed) >> >> >> >> Desired survivor size 20971520 bytes, new threshold 8 (max 15) >> >> >> >> - age ? 1: ? ?3708208 bytes, ? ?3708208 total >> >> >> >> - age ? 2: ? ?2174384 bytes, ? ?5882592 total >> >> >> >> - age ? 3: ? ?2383256 bytes, ? ?8265848 total >> >> >> >> - age ? 4: ? ?2689912 bytes, ? 10955760 total >> >> >> >> - age ? 5: ? ?2621832 bytes, ? 13577592 total >> >> >> >> - age ? 6: ? ?3360440 bytes, ? 16938032 total >> >> >> >> - age ? 7: ? ?2784136 bytes, ? 19722168 total >> >> >> >> - age ? 8: ? ?2220232 bytes, ? 21942400 total >> >> >> >> : 357042K->356456K(368640K), 0.2734100 secs]796608.259: [CMS: >> >> >> >> 3124189K->516640K(4833280K), 6.8127070 secs] >> >> >> >> 3479554K->516640K(5201920K), [CMS Perm : >> >> >> >> 142423K->142284K(262144K)], >> >> >> >> 7.0867850 secs] [Times: user=7.32 sys=0.07, real=7.09 secs] >> >> >> >> 2012-03-15T18:32:30.279+0100: 796616.719: [GC 796616.720: [ParNew >> >> >> >> Desired survivor size 20971520 bytes, new threshold 1 (max 15) >> >> >> >> - age ? 1: ? 29721456 bytes, ? 29721456 total >> >> >> >> : 327680K->40960K(368640K), 0.0403130 secs] >> >> >> >> 844320K->557862K(5201920K), 0.0409070 secs] [Times: user=0.27 >> >> >> >> sys=0.01, real=0.04 secs] >> >> >> >> 2012-03-15T18:32:32.701+0100: 796619.141: [GC 796619.141: [ParNew >> >> >> >> Desired survivor size 20971520 bytes, new threshold 15 (max 15) >> >> >> >> - age ? 1: ? 10310176 bytes, ? 10310176 total >> >> >> >> ------------------------- >> >> >> >> >> >> >> >> For contrast, here's a gc log fragment from the single node >> >> >> >> running >> >> >> >> the larger new gen and larger survivor spaces: >> >> >> >> (the fragment is from the same point in time, with the nodes >> >> >> >> experiencing equal load) >> >> >> >> ------------------------- >> >> >> >> 2012-03-15T18:32:12.067+0100: 797119.336: [GC 797119.336: [ParNew >> >> >> >> Desired survivor size 69894144 bytes, new threshold 15 (max 15) >> >> >> >> - age ? 1: ? ?5611536 bytes, ? ?5611536 total >> >> >> >> - age ? 2: ? ?3731888 bytes, ? ?9343424 total >> >> >> >> - age ? 3: ? ?3450672 bytes, ? 12794096 total >> >> >> >> - age ? 4: ? ?3314744 bytes, ? 16108840 total >> >> >> >> - age ? 5: ? ?3459888 bytes, ? 19568728 total >> >> >> >> - age ? 6: ? ?3334712 bytes, ? 22903440 total >> >> >> >> - age ? 7: ? ?3671960 bytes, ? 26575400 total >> >> >> >> - age ? 8: ? ?3841608 bytes, ? 30417008 total >> >> >> >> - age ? 9: ? ?2035392 bytes, ? 32452400 total >> >> >> >> - age ?10: ? 
?1975056 bytes, ? 34427456 total >> >> >> >> - age ?11: ? ?2021344 bytes, ? 36448800 total >> >> >> >> - age ?12: ? ?1520752 bytes, ? 37969552 total >> >> >> >> - age ?13: ? ?1494176 bytes, ? 39463728 total >> >> >> >> - age ?14: ? ?2355136 bytes, ? 41818864 total >> >> >> >> - age ?15: ? ?1279000 bytes, ? 43097864 total >> >> >> >> : 603473K->61640K(682688K), 0.0756570 secs] >> >> >> >> 3373284K->2832383K(5106368K), 0.0762090 secs] [Times: user=0.56 >> >> >> >> sys=0.00, real=0.08 secs] >> >> >> >> 2012-03-15T18:32:18.200+0100: 797125.468: [GC 797125.469: [ParNew >> >> >> >> Desired survivor size 69894144 bytes, new threshold 15 (max 15) >> >> >> >> - age ? 1: ? ?6101320 bytes, ? ?6101320 total >> >> >> >> - age ? 2: ? ?4446776 bytes, ? 10548096 total >> >> >> >> - age ? 3: ? ?3701384 bytes, ? 14249480 total >> >> >> >> - age ? 4: ? ?3438488 bytes, ? 17687968 total >> >> >> >> - age ? 5: ? ?3295360 bytes, ? 20983328 total >> >> >> >> - age ? 6: ? ?3403320 bytes, ? 24386648 total >> >> >> >> - age ? 7: ? ?3323368 bytes, ? 27710016 total >> >> >> >> - age ? 8: ? ?3665760 bytes, ? 31375776 total >> >> >> >> - age ? 9: ? ?2427904 bytes, ? 33803680 total >> >> >> >> - age ?10: ? ?1418656 bytes, ? 35222336 total >> >> >> >> - age ?11: ? ?1955192 bytes, ? 37177528 total >> >> >> >> - age ?12: ? ?2006064 bytes, ? 39183592 total >> >> >> >> - age ?13: ? ?1520768 bytes, ? 40704360 total >> >> >> >> - age ?14: ? ?1493728 bytes, ? 42198088 total >> >> >> >> - age ?15: ? ?2354376 bytes, ? 44552464 total >> >> >> >> : 607816K->62650K(682688K), 0.0779270 secs] >> >> >> >> 3378559K->2834643K(5106368K), 0.0784690 secs] [Times: user=0.58 >> >> >> >> sys=0.00, real=0.08 secs] >> >> >> >> ------------------------- >> >> >> >> >> >> >> >> Questions: >> >> >> >> >> >> >> >> 1) From the tenuring distributions, it seems that the application >> >> >> >> benefits from larger new gen and survivor spaces. >> >> >> >> The next thing we'll try is to run with -Xmn1g >> >> >> >> -XX:SurvivorRatio=2, >> >> >> >> and see if the ParNew times are still acceptable. >> >> >> >> Does this seem a sensible approach in this context? >> >> >> >> Are there other variables beyond ParNew times that limit scaling >> >> >> >> the >> >> >> >> new gen to a large size? >> >> >> >> >> >> >> >> 2) Given the object age demographics inherent to our application, >> >> >> >> we >> >> >> >> can not expect to see the majority of data get collected in the >> >> >> >> new >> >> >> >> gen. >> >> >> >> >> >> >> >> Our approach to fight the promotion failures consists of three >> >> >> >> aspects: >> >> >> >> a) Lower the overall allocation rate of our application (by >> >> >> >> improving >> >> >> >> wasteful hotspots), to decrease overall ParNew collection >> >> >> >> frequency. >> >> >> >> b) Configure the new gen and survivor spaces as large as >> >> >> >> possible, >> >> >> >> keeping an eye on ParNew times and overall new/tenured ratio. >> >> >> >> c) Try to refactor the data structures that form the bulk of >> >> >> >> promoted >> >> >> >> data, to retain only the strictly required subgraphs. >> >> >> >> >> >> >> >> Is there anything else I can try or measure, in order to better >> >> >> >> understand the problem? >> >> >> >> >> >> >> >> Thanks in advance, >> >> >> >> Taras >> >> >> >> >> >> >> >> >> >> >> >> On Wed, Feb 22, 2012 at 10:51 AM, Taras Tielkes >> >> >> >> wrote: >> >> >> >>> (this time properly responding to the list alias) >> >> >> >>> Hi Srinivas, >> >> >> >>> >> >> >> >>> We're running 1.6.0 u29 on Linux x64. 
My understanding is that >> >> >> >>> CompressedOops is enabled by default since u23. >> >> >> >>> >> >> >> >>> At least this page seems to support that: >> >> >> >>> >> >> >> >>> >> >> >> >>> >> >> >> >>> http://docs.oracle.com/javase/7/docs/technotes/guides/vm/performance-enhancements-7.html >> >> >> >>> >> >> >> >>> Regarding the other remarks (also from Todd and Chi), I'll >> >> >> >>> comment >> >> >> >>> later. The first thing on my list is to collect >> >> >> >>> PrintTenuringDistribution data now. >> >> >> >>> >> >> >> >>> Kind regards, >> >> >> >>> Taras >> >> >> >>> >> >> >> >>> On Wed, Feb 22, 2012 at 10:50 AM, Taras Tielkes >> >> >> >>> wrote: >> >> >> >>>> Hi Srinivas, >> >> >> >>>> >> >> >> >>>> We're running 1.6.0 u29 on Linux x64. My understanding is that >> >> >> >>>> CompressedOops is enabled by default since u23. >> >> >> >>>> >> >> >> >>>> At least this page seems to support that: >> >> >> >>>> >> >> >> >>>> >> >> >> >>>> >> >> >> >>>> http://docs.oracle.com/javase/7/docs/technotes/guides/vm/performance-enhancements-7.html >> >> >> >>>> >> >> >> >>>> Regarding the other remarks (also from Todd and Chi), I'll >> >> >> >>>> comment >> >> >> >>>> later. The first thing on my list is to collect >> >> >> >>>> PrintTenuringDistribution data now. >> >> >> >>>> >> >> >> >>>> Kind regards, >> >> >> >>>> Taras >> >> >> >>>> >> >> >> >>>> On Wed, Feb 22, 2012 at 12:40 AM, Srinivas Ramakrishna >> >> >> >>>> wrote: >> >> >> >>>>> I agree that premature promotions are almost always the first >> >> >> >>>>> and >> >> >> >>>>> most >> >> >> >>>>> important thing to fix when running >> >> >> >>>>> into fragmentation or overload issues with CMS. However, I can >> >> >> >>>>> also >> >> >> >>>>> imagine >> >> >> >>>>> long-lived objects with a highly >> >> >> >>>>> non-stationary size distribution which can also cause problems >> >> >> >>>>> for >> >> >> >>>>> CMS >> >> >> >>>>> despite best efforts to tune against >> >> >> >>>>> premature promotion. >> >> >> >>>>> >> >> >> >>>>> I didn't think Treas was running with MTT=0, although MTT > 0 >> >> >> >>>>> is >> >> >> >>>>> no >> >> >> >>>>> recipe >> >> >> >>>>> for avoiding premature promotion >> >> >> >>>>> with bursty loads that case overflow the survivor spaces -- as >> >> >> >>>>> you >> >> >> >>>>> say large >> >> >> >>>>> survivor spaces with a low >> >> >> >>>>> TargetSurvivorRatio -- so as to leave plenty of space to >> >> >> >>>>> absorb/accommodate >> >> >> >>>>> spiking/bursty loads? is >> >> >> >>>>> definitely a "best practice" for CMS (and possibly for other >> >> >> >>>>> concurrent >> >> >> >>>>> collectors as well). >> >> >> >>>>> >> >> >> >>>>> One thing Taras can do to see if premature promotion might be >> >> >> >>>>> an >> >> >> >>>>> issue is to >> >> >> >>>>> look at the tenuring >> >> >> >>>>> threshold in his case. A rough proxy (if >> >> >> >>>>> PrintTenuringDistribution >> >> >> >>>>> is not >> >> >> >>>>> enabled) is to look at the >> >> >> >>>>> promotion volume per scavenge. It may be possible, if >> >> >> >>>>> premature >> >> >> >>>>> promotion is >> >> >> >>>>> a cause, to see >> >> >> >>>>> some kind of medium-term correlation between high promotion >> >> >> >>>>> volume >> >> >> >>>>> and >> >> >> >>>>> eventual promotion >> >> >> >>>>> failure despite frequent CMS collections. >> >> >> >>>>> >> >> >> >>>>> One other point which may or may not be relevant. I see that >> >> >> >>>>> Taras >> >> >> >>>>> is not >> >> >> >>>>> using CompressedOops... 
>> >> >> >>>>> Using that alone would greatly decrease memory pressure and >> >> >> >>>>> provide >> >> >> >>>>> more >> >> >> >>>>> breathing room to CMS, >> >> >> >>>>> which is also almost always a good idea. >> >> >> >>>>> >> >> >> >>>>> -- ramki >> >> >> >>>>> >> >> >> >>>>> On Tue, Feb 21, 2012 at 10:16 AM, Chi Ho Kwok >> >> >> >>>>> >> >> >> >>>>> wrote: >> >> >> >>>>>> >> >> >> >>>>>> Hi Teras, >> >> >> >>>>>> >> >> >> >>>>>> I think you may want to look into sizing the new and >> >> >> >>>>>> especially >> >> >> >>>>>> the >> >> >> >>>>>> survivor spaces differently. We run something similar to what >> >> >> >>>>>> you >> >> >> >>>>>> described, >> >> >> >>>>>> high volume request processing with large dataset loading, >> >> >> >>>>>> and >> >> >> >>>>>> what >> >> >> >>>>>> we've >> >> >> >>>>>> seen at the start is that the survivor spaces are completely >> >> >> >>>>>> overloaded, >> >> >> >>>>>> causing premature promotions. >> >> >> >>>>>> >> >> >> >>>>>> We've configured our vm with the following goals/guideline: >> >> >> >>>>>> >> >> >> >>>>>> old space is for semi-permanent data, living for at least >> >> >> >>>>>> 30s, >> >> >> >>>>>> average ~10 >> >> >> >>>>>> minutes >> >> >> >>>>>> new space contains only temporary and just loaded data >> >> >> >>>>>> surviving objects from new should never reach old in 1 gc, so >> >> >> >>>>>> the >> >> >> >>>>>> survivor >> >> >> >>>>>> space may never be 100% full >> >> >> >>>>>> >> >> >> >>>>>> With jstat -gcutil `pidof java` 2000, we see things like: >> >> >> >>>>>> >> >> >> >>>>>> ? S0 ? ? S1 ? ? E ? ? ?O ? ? ?P ? ? YGC ? ? YGCT ? ?FGC >> >> >> >>>>>> ?FGCT >> >> >> >>>>>> GCT >> >> >> >>>>>> ?70.20 ? 0.00 ?19.65 ?57.60 ?59.90 124808 29474.299 ?2498 >> >> >> >>>>>> ?191.110 >> >> >> >>>>>> 29665.409 >> >> >> >>>>>> ?70.20 ? 0.00 ?92.89 ?57.60 ?59.90 124808 29474.299 ?2498 >> >> >> >>>>>> ?191.110 >> >> >> >>>>>> 29665.409 >> >> >> >>>>>> ?70.20 ? 0.00 ?93.47 ?57.60 ?59.90 124808 29474.299 ?2498 >> >> >> >>>>>> ?191.110 >> >> >> >>>>>> 29665.409 >> >> >> >>>>>> ? 0.00 ?65.69 ?78.07 ?58.09 ?59.90 124809 29474.526 ?2498 >> >> >> >>>>>> ?191.110 >> >> >> >>>>>> 29665.636 >> >> >> >>>>>> ?84.97 ? 0.00 ?48.19 ?58.57 ?59.90 124810 29474.774 ?2498 >> >> >> >>>>>> ?191.110 >> >> >> >>>>>> 29665.884 >> >> >> >>>>>> ?84.97 ? 0.00 ?81.30 ?58.57 ?59.90 124810 29474.774 ?2498 >> >> >> >>>>>> ?191.110 >> >> >> >>>>>> 29665.884 >> >> >> >>>>>> ? 0.00 ?62.64 ?27.22 ?59.12 ?59.90 124811 29474.992 ?2498 >> >> >> >>>>>> ?191.110 >> >> >> >>>>>> 29666.102 >> >> >> >>>>>> ? 0.00 ?62.64 ?54.47 ?59.12 ?59.90 124811 29474.992 ?2498 >> >> >> >>>>>> ?191.110 >> >> >> >>>>>> 29666.102 >> >> >> >>>>>> ?75.68 ? 0.00 ? 6.80 ?59.53 ?59.90 124812 29475.228 ?2498 >> >> >> >>>>>> ?191.110 >> >> >> >>>>>> 29666.338 >> >> >> >>>>>> ?75.68 ? 0.00 ?23.38 ?59.53 ?59.90 124812 29475.228 ?2498 >> >> >> >>>>>> ?191.110 >> >> >> >>>>>> 29666.338 >> >> >> >>>>>> ?75.68 ? 0.00 ?27.72 ?59.53 ?59.90 124812 29475.228 ?2498 >> >> >> >>>>>> ?191.110 >> >> >> >>>>>> 29666.338 >> >> >> >>>>>> >> >> >> >>>>>> If you follow the lines, you can see Eden fill up to 100% on >> >> >> >>>>>> line >> >> >> >>>>>> 4, >> >> >> >>>>>> surviving objects are copied into S1, S0 is collected and >> >> >> >>>>>> added >> >> >> >>>>>> 0.49% to >> >> >> >>>>>> Old. On line 5, another GC happened, with Eden->S0, S1->Old, >> >> >> >>>>>> etc. 
>> >> >> >>>>>> No objects >> >> >> >>>>>> is ever transferred from Eden to Old, unless there's a huge >> >> >> >>>>>> peak >> >> >> >>>>>> of >> >> >> >>>>>> requests. >> >> >> >>>>>> >> >> >> >>>>>> This is with a: 32GB heap, Mxn1200M, SurvivorRatio 2 (600MB >> >> >> >>>>>> Eden, >> >> >> >>>>>> 300MB >> >> >> >>>>>> S0, 300MB S1), MaxTenuringThreshold 1 (whatever is still >> >> >> >>>>>> alive >> >> >> >>>>>> in >> >> >> >>>>>> S0/1 on >> >> >> >>>>>> the second GC is copied to old, don't wait, web requests are >> >> >> >>>>>> quite >> >> >> >>>>>> bursty). >> >> >> >>>>>> With about 1 collection every 2-5 seconds, objects promoted >> >> >> >>>>>> to >> >> >> >>>>>> Old >> >> >> >>>>>> must live >> >> >> >>>>>> for at 4-10 seconds; as that's longer than an average request >> >> >> >>>>>> (50ms-1s), >> >> >> >>>>>> none of the temporary data ever makes it into Old, which is >> >> >> >>>>>> much >> >> >> >>>>>> more >> >> >> >>>>>> expensive to collect. It works even with a higher than >> >> >> >>>>>> default >> >> >> >>>>>> CMSInitiatingOccupancyFraction=76 to optimize for space >> >> >> >>>>>> available >> >> >> >>>>>> for the >> >> >> >>>>>> large data cache we have. >> >> >> >>>>>> >> >> >> >>>>>> >> >> >> >>>>>> With your config of 400MB Total new, with 350MB Eden, 25MB >> >> >> >>>>>> S0, >> >> >> >>>>>> 25MB >> >> >> >>>>>> S1 >> >> >> >>>>>> (SurvivorRatio 8), no tenuring threshold, I think loads of >> >> >> >>>>>> new >> >> >> >>>>>> objects get >> >> >> >>>>>> copied from Eden to Old directly, causing trouble for the >> >> >> >>>>>> CMS. >> >> >> >>>>>> You >> >> >> >>>>>> can use >> >> >> >>>>>> jstat to get live stats and tweak until it doesn't happen. If >> >> >> >>>>>> you >> >> >> >>>>>> can't make >> >> >> >>>>>> changes on live that easil, try doubling the new size indeed, >> >> >> >>>>>> with >> >> >> >>>>>> a 400 >> >> >> >>>>>> Eden, 200 S0, 200 S1 and?MaxTenuringThreshold?1 setting. It's >> >> >> >>>>>> probably >> >> >> >>>>>> overkill, but if should solve the problem if it is caused by >> >> >> >>>>>> premature >> >> >> >>>>>> promotion. >> >> >> >>>>>> >> >> >> >>>>>> >> >> >> >>>>>> Chi Ho Kwok >> >> >> >>>>>> >> >> >> >>>>>> >> >> >> >>>>>> On Tue, Feb 21, 2012 at 5:55 PM, Taras Tielkes >> >> >> >>>>>> >> >> >> >>>>>> wrote: >> >> >> >>>>>>> >> >> >> >>>>>>> Hi, >> >> >> >>>>>>> >> >> >> >>>>>>> We've removed the "-XX:+CMSScavengeBeforeRemark" setting >> >> >> >>>>>>> from >> >> >> >>>>>>> 50% >> >> >> >>>>>>> of >> >> >> >>>>>>> our production nodes. >> >> >> >>>>>>> After running for a few weeks, it seems that there's no >> >> >> >>>>>>> impact >> >> >> >>>>>>> from >> >> >> >>>>>>> removing this option. >> >> >> >>>>>>> Which is good, since it seems we can remove it from the >> >> >> >>>>>>> other >> >> >> >>>>>>> nodes as >> >> >> >>>>>>> well, simplifying our overall JVM configuration ;-) >> >> >> >>>>>>> >> >> >> >>>>>>> However, we're still seeing promotion failures on all nodes, >> >> >> >>>>>>> once >> >> >> >>>>>>> every day or so. 
>> >> >> >>>>>>> >> >> >> >>>>>>> There's still the "Magic 1026": this accounts for ~60% of >> >> >> >>>>>>> the >> >> >> >>>>>>> promotion failures that we're seeing (single ParNew thread >> >> >> >>>>>>> thread, >> >> >> >>>>>>> 1026 failure size): >> >> >> >>>>>>> -------------------- >> >> >> >>>>>>> 2012-02-06T09:13:51.806+0100: 328095.085: [GC 328095.086: >> >> >> >>>>>>> [ParNew: >> >> >> >>>>>>> 359895K->29357K(368640K), 0.0429070 secs] >> >> >> >>>>>>> 3471021K->3143476K(5201920K), 0.0434950 secs] [Times: >> >> >> >>>>>>> user=0.32 >> >> >> >>>>>>> sys=0.00, real=0.04 secs] >> >> >> >>>>>>> 2012-02-06T09:13:55.922+0100: 328099.201: [GC 328099.201: >> >> >> >>>>>>> [ParNew: >> >> >> >>>>>>> 357037K->31817K(368640K), 0.0429130 secs] >> >> >> >>>>>>> 3471156K->3148946K(5201920K), 0.0434930 secs] [Times: >> >> >> >>>>>>> user=0.31 >> >> >> >>>>>>> sys=0.00, real=0.04 secs] >> >> >> >>>>>>> 2012-02-06T09:13:59.044+0100: 328102.324: [GC 328102.324: >> >> >> >>>>>>> [ParNew >> >> >> >>>>>>> (promotion failure size = 1026) ?(promotion failed): >> >> >> >>>>>>> 359497K->368640K(368640K), 0.2226790 secs]328102.547: [CMS: >> >> >> >>>>>>> 3125609K->451515K(4833280K), 5.6225880 secs] 3476626K->4515 >> >> >> >>>>>>> 15K(5201920K), [CMS Perm : 124373K->124353K(262144K)], >> >> >> >>>>>>> 5.8459380 >> >> >> >>>>>>> secs] >> >> >> >>>>>>> [Times: user=6.20 sys=0.01, real=5.85 secs] >> >> >> >>>>>>> 2012-02-06T09:14:05.243+0100: 328108.522: [GC 328108.523: >> >> >> >>>>>>> [ParNew: >> >> >> >>>>>>> 327680K->40960K(368640K), 0.0319160 secs] >> >> >> >>>>>>> 779195K->497658K(5201920K), >> >> >> >>>>>>> 0.0325360 secs] [Times: user=0.21 sys=0.01, real=0.03 secs] >> >> >> >>>>>>> 2012-02-06T09:14:07.836+0100: 328111.116: [GC 328111.116: >> >> >> >>>>>>> [ParNew: >> >> >> >>>>>>> 368640K->32785K(368640K), 0.0744670 secs] >> >> >> >>>>>>> 825338K->520234K(5201920K), >> >> >> >>>>>>> 0.0750390 secs] [Times: user=0.40 sys=0.02, real=0.08 secs] >> >> >> >>>>>>> -------------------- >> >> >> >>>>>>> Given the 1026 word size, I'm wondering if I should be >> >> >> >>>>>>> hunting >> >> >> >>>>>>> for >> >> >> >>>>>>> an >> >> >> >>>>>>> overuse of BufferedInputStream/BufferedOutoutStream, since >> >> >> >>>>>>> both >> >> >> >>>>>>> have >> >> >> >>>>>>> 8192 as a default buffer size. 
>> >> >> >>>>>>> >> >> >> >>>>>>> The second group of promotion failures look like this >> >> >> >>>>>>> (multiple >> >> >> >>>>>>> ParNew >> >> >> >>>>>>> threads, small failure sizes): >> >> >> >>>>>>> -------------------- >> >> >> >>>>>>> 2012-02-06T09:50:15.773+0100: 328756.964: [GC 328756.964: >> >> >> >>>>>>> [ParNew: >> >> >> >>>>>>> 356116K->29934K(368640K), 0.0461100 secs] >> >> >> >>>>>>> 3203863K->2880162K(5201920K), 0.0468870 secs] [Times: >> >> >> >>>>>>> user=0.34 >> >> >> >>>>>>> sys=0.01, real=0.05 secs] >> >> >> >>>>>>> 2012-02-06T09:50:19.153+0100: 328760.344: [GC 328760.344: >> >> >> >>>>>>> [ParNew: >> >> >> >>>>>>> 357614K->30359K(368640K), 0.0454680 secs] >> >> >> >>>>>>> 3207842K->2882892K(5201920K), 0.0462280 secs] [Times: >> >> >> >>>>>>> user=0.33 >> >> >> >>>>>>> sys=0.01, real=0.05 secs] >> >> >> >>>>>>> 2012-02-06T09:50:22.658+0100: 328763.849: [GC 328763.849: >> >> >> >>>>>>> [ParNew >> >> >> >>>>>>> (1: >> >> >> >>>>>>> promotion failure size = 25) ?(4: promotion failure size = >> >> >> >>>>>>> 25) >> >> >> >>>>>>> ?(6: >> >> >> >>>>>>> promotion failure size = 25) ?(7: promotion failure size = >> >> >> >>>>>>> 144) >> >> >> >>>>>>> (promotion failed): 358039K->358358 >> >> >> >>>>>>> K(368640K), 0.2148680 secs]328764.064: [CMS: >> >> >> >>>>>>> 2854709K->446750K(4833280K), 5.8368270 secs] >> >> >> >>>>>>> 3210572K->446750K(5201920K), [CMS Perm : >> >> >> >>>>>>> 124670K->124644K(262144K)], >> >> >> >>>>>>> 6.0525230 secs] [Times: user=6.32 sys=0.00, real=6.05 secs] >> >> >> >>>>>>> 2012-02-06T09:50:29.896+0100: 328771.086: [GC 328771.087: >> >> >> >>>>>>> [ParNew: >> >> >> >>>>>>> 327680K->22569K(368640K), 0.0227080 secs] >> >> >> >>>>>>> 774430K->469319K(5201920K), >> >> >> >>>>>>> 0.0235020 secs] [Times: user=0.16 sys=0.00, real=0.02 secs] >> >> >> >>>>>>> 2012-02-06T09:50:31.076+0100: 328772.266: [GC 328772.267: >> >> >> >>>>>>> [ParNew: >> >> >> >>>>>>> 350249K->22264K(368640K), 0.0235480 secs] >> >> >> >>>>>>> 796999K->469014K(5201920K), >> >> >> >>>>>>> 0.0243000 secs] [Times: user=0.18 sys=0.01, real=0.02 secs] >> >> >> >>>>>>> -------------------- >> >> >> >>>>>>> >> >> >> >>>>>>> We're going to try to double the new size on a single node, >> >> >> >>>>>>> to >> >> >> >>>>>>> see >> >> >> >>>>>>> the >> >> >> >>>>>>> effects of that. >> >> >> >>>>>>> >> >> >> >>>>>>> Beyond this experiment, is there any additional data I can >> >> >> >>>>>>> collect >> >> >> >>>>>>> to >> >> >> >>>>>>> better understand the nature of the promotion failures? >> >> >> >>>>>>> Am I facing collecting free list statistics at this point? 
>> >> >> >>>>>>> >> >> >> >>>>>>> Thanks, >> >> >> >>>>>>> Taras >> >> >> >>>>>> >> >> >> >>>>>> >> >> >> >>>>>> _______________________________________________ >> >> >> >>>>>> hotspot-gc-use mailing list >> >> >> >>>>>> hotspot-gc-use at openjdk.java.net >> >> >> >>>>>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> >> >>>>>> >> >> >> >>>>> >> >> >> >> _______________________________________________ >> >> >> >> hotspot-gc-use mailing list >> >> >> >> hotspot-gc-use at openjdk.java.net >> >> >> >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> >> _______________________________________________ >> >> >> hotspot-gc-use mailing list >> >> >> hotspot-gc-use at openjdk.java.net >> >> >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> > >> >> > >> >> _______________________________________________ >> >> hotspot-gc-use mailing list >> >> hotspot-gc-use at openjdk.java.net >> >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> > >> > >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > From schelwa at tibco.com Tue Apr 17 08:08:34 2012 From: schelwa at tibco.com (Shivkumar Chelwa) Date: Tue, 17 Apr 2012 15:08:34 +0000 Subject: CMS Full GC Message-ID: Hi, Till date I was using JRE 6u22 with following garbage collection parameters and the CMS cycle use to kick-in appropriately (when heap reaches 75%) -server -d64 -Xms2048m -Xmx2048m -javaagent: my-instrumentation.jar -XX:MaxPermSize=256m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+DisableExplicitGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -verbose:gc -Xloggc:LB01.log -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -Djava.library=/usr/java/jre/lib/sparcv9/server/libjvm.so -Xmx8192M -Xms8192M -Xss256K But I switched to JRE 6u29 and see the CMS Full GC happening randomly. Can you please help me undercover this mystery. Here is one of the log message from gc log file. 13475.239: [Full GC 13475.239: [CMS: 4321575K->3717474K(7898752K), 54.0602376 secs] 4412277K->3717474K(8339648K), [CMS Perm : 73791K->73339K(74048K)], 54.0615557 secs] [Times: user=53.97 sy s=0.12, real=54.06 secs] Kindly help. Regards, Shiv -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120417/5eb2f8ad/attachment.html From ysr1729 at gmail.com Tue Apr 17 12:06:30 2012 From: ysr1729 at gmail.com (Srinivas Ramakrishna) Date: Tue, 17 Apr 2012 12:06:30 -0700 Subject: CMS Full GC In-Reply-To: References: Message-ID: Is it possible that you are GC'ing here to expand perm gen. Check if permgen footprint changed between the two JVM releases (when running yr application). Now, CMS should quietly expand perm gen without doing a stop-world GC, but there was a temporary regression in that functionality before it was fixed again. I can't however recall the JVM versions where the regression was introduced and then fixed. But all of this is handwaving on my part. If you run 6u22 and 6u29 both with -XX:+PrintHeapAtGC, you might have more visibility into why the GC is kicking in. A longer log would allow the community to perhaps provide suggestions as well. Which reminds me that there is a bug in the printing of GC cause (as printed by jstat) which needs to be fixed. 
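For that comparison run, the extra flag slots straight into the logging portion of the option list Shiv posted, e.g.:
-----
-verbose:gc
-Xloggc:LB01.log
-XX:+PrintGCTimeStamps
-XX:+PrintGCDetails
-XX:+PrintHeapAtGC
-----
As a side note, the posted line carries both -Xms2048m/-Xmx2048m and -Xms8192M/-Xmx8192M; the right-most pair is the one that takes effect, which matches the roughly 8 GB heap visible in the Full GC entry.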
HotSpot/GC folk, have you noticed that we never see a "perm gen allocation" as the GC cause even when that's really the reason for a full gc? (not that that should happen here where CMS is being used.) -- ramki On Tue, Apr 17, 2012 at 8:08 AM, Shivkumar Chelwa wrote: > Hi,**** > > ** ** > > Till date I was using JRE 6u22 with following garbage collection > parameters and the CMS cycle use to kick-in appropriately (when heap > reaches 75%)**** > > ** ** > > ** ** > > -server -d64 -Xms2048m -Xmx2048m -javaagent: my-instrumentation.jar > -XX:MaxPermSize=256m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC > -XX:+DisableExplicitGC -XX:CMSInitiatingOccupancyFraction=75 > -XX:+UseCMSInitiatingOccupancyOnly -verbose:gc -Xloggc:LB01.log > -XX:+PrintGCTimeStamps -XX:+PrintGCDetails > -Djava.library=/usr/java/jre/lib/sparcv9/server/libjvm.so -Xmx8192M > -Xms8192M -Xss256K**** > > ** ** > > But I switched to JRE 6u29 and see the *CMS Full GC* happening randomly. > Can you please help me undercover this mystery. Here is one of the log > message from gc log file.**** > > ** ** > > 13475.239: [*Full GC* 13475.239: [CMS: 4321575K->3717474K(7898752K), *54.0602376 > secs*] 4412277K->3717474K(8339648K), [CMS Perm : 73791K->73339K(74048K)], > 54.0615557 secs] [Times: user=53.97 sy**** > > s=0.12, real=54.06 secs]**** > > ** ** > > ** ** > > Kindly help.**** > > ** ** > > ** ** > > Regards,**** > > Shiv**** > > ** ** > > ** ** > > ** ** > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120417/36357428/attachment.html From schelwa at tibco.com Tue Apr 17 12:55:53 2012 From: schelwa at tibco.com (Shivkumar Chelwa) Date: Tue, 17 Apr 2012 19:55:53 +0000 Subject: CMS Full GC In-Reply-To: References: Message-ID: Thanks Ramki. The perm gen size is well below the max setting. Only 70-80M is being used out of 256M, so I don't think it is an issue. Will -XX:+PrintHeapAtGC prints heap only when there is Full GC? Regards, Shiv ________________________________ From: Srinivas Ramakrishna [mailto:ysr1729 at gmail.com] Sent: 17 April 2012 15:07 To: Shivkumar Chelwa Cc: hotspot-gc-use at openjdk.java.net Subject: Re: CMS Full GC Is it possible that you are GC'ing here to expand perm gen. Check if permgen footprint changed between the two JVM releases (when running yr application). Now, CMS should quietly expand perm gen without doing a stop-world GC, but there was a temporary regression in that functionality before it was fixed again. I can't however recall the JVM versions where the regression was introduced and then fixed. But all of this is handwaving on my part. If you run 6u22 and 6u29 both with -XX:+PrintHeapAtGC, you might have more visibility into why the GC is kicking in. A longer log would allow the community to perhaps provide suggestions as well. Which reminds me that there is a bug in the printing of GC cause (as printed by jstat) which needs to be fixed. HotSpot/GC folk, have you noticed that we never see a "perm gen allocation" as the GC cause even when that's really the reason for a full gc? (not that that should happen here where CMS is being used.) 
-- ramki On Tue, Apr 17, 2012 at 8:08 AM, Shivkumar Chelwa > wrote: Hi, Till date I was using JRE 6u22 with following garbage collection parameters and the CMS cycle use to kick-in appropriately (when heap reaches 75%) -server -d64 -Xms2048m -Xmx2048m -javaagent: my-instrumentation.jar -XX:MaxPermSize=256m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+DisableExplicitGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -verbose:gc -Xloggc:LB01.log -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -Djava.library=/usr/java/jre/lib/sparcv9/server/libjvm.so -Xmx8192M -Xms8192M -Xss256K But I switched to JRE 6u29 and see the CMS Full GC happening randomly. Can you please help me undercover this mystery. Here is one of the log message from gc log file. 13475.239: [Full GC 13475.239: [CMS: 4321575K->3717474K(7898752K), 54.0602376 secs] 4412277K->3717474K(8339648K), [CMS Perm : 73791K->73339K(74048K)], 54.0615557 secs] [Times: user=53.97 sy s=0.12, real=54.06 secs] Kindly help. Regards, Shiv _______________________________________________ hotspot-gc-use mailing list hotspot-gc-use at openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120417/b38a6794/attachment.html From ysr1729 at gmail.com Tue Apr 17 13:52:16 2012 From: ysr1729 at gmail.com (Srinivas Ramakrishna) Date: Tue, 17 Apr 2012 13:52:16 -0700 Subject: CMS Full GC In-Reply-To: References: Message-ID: It'll print at every gc, minor as well as major. Sent from my iPhone On Apr 17, 2012, at 12:55 PM, Shivkumar Chelwa wrote: > Thanks Ramki. The perm gen size is well below the max setting. Only 70-80M is being used out of 256M, so I don?t think it is an issue. > > Will -XX:+PrintHeapAtGC prints heap only when there is Full GC? > > Regards, > Shiv > > From: Srinivas Ramakrishna [mailto:ysr1729 at gmail.com] > Sent: 17 April 2012 15:07 > To: Shivkumar Chelwa > Cc: hotspot-gc-use at openjdk.java.net > Subject: Re: CMS Full GC > > Is it possible that you are GC'ing here to expand perm gen. Check if permgen footprint changed between the two JVM releases (when running yr application). > Now, CMS should quietly expand perm gen without doing a stop-world GC, but there was a temporary regression in that functionality before it was fixed again. > I can't however recall the JVM versions where the regression was introduced and then fixed. But all of this is handwaving on my part. > If you run 6u22 and 6u29 both with -XX:+PrintHeapAtGC, you might have more visibility into why the GC is kicking in. A longer log would allow > the community to perhaps provide suggestions as well. > > Which reminds me that there is a bug in the printing of GC cause (as printed by jstat) which needs to be fixed. HotSpot/GC folk, have you noticed that we never > see a "perm gen allocation" as the GC cause even when that's really the reason for a full gc? (not that that should happen here where CMS is being used.) 
> > -- ramki > > On Tue, Apr 17, 2012 at 8:08 AM, Shivkumar Chelwa wrote: > Hi, > > Till date I was using JRE 6u22 with following garbage collection parameters and the CMS cycle use to kick-in appropriately (when heap reaches 75%) > > > -server -d64 -Xms2048m -Xmx2048m -javaagent: my-instrumentation.jar -XX:MaxPermSize=256m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+DisableExplicitGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -verbose:gc -Xloggc:LB01.log -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -Djava.library=/usr/java/jre/lib/sparcv9/server/libjvm.so -Xmx8192M -Xms8192M -Xss256K > > But I switched to JRE 6u29 and see the CMS Full GC happening randomly. Can you please help me undercover this mystery. Here is one of the log message from gc log file. > > 13475.239: [Full GC 13475.239: [CMS: 4321575K->3717474K(7898752K), 54.0602376 secs] 4412277K->3717474K(8339648K), [CMS Perm : 73791K->73339K(74048K)], 54.0615557 secs] [Times: user=53.97 sy > s=0.12, real=54.06 secs] > > > Kindly help. > > > Regards, > Shiv > > > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120417/c388ba58/attachment-0001.html From the.6th.month at gmail.com Wed Apr 18 01:16:18 2012 From: the.6th.month at gmail.com (the.6th.month at gmail.com) Date: Wed, 18 Apr 2012 16:16:18 +0800 Subject: does UseParallelOldGC guarantee a better full gc performance Message-ID: hi all: I'm currently using jdk 6u26. I just enabled UseParallelOldGC, expecting that would enhance the full gc efficiency and decrease the mark-sweep time by using multiple-core. The JAVA_OPTS is as below: -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintTenuringDistribution -Xloggc:gc.log-server -Xms4000m -Xmx4000m -Xss256k -Xmn256m -XX:PermSize=256m -XX:+UseParallelOldGC -server -Dorg.apache.jasper.compiler.Parser.STRICT_QUOTE_ESCAPING=false as shown in jinfo output, the settings have taken effect, and the ParallelGCThreads is 4 since the jvm is running on a four-core server. But what's strange is that the mark-sweep time remains almost unchanged (at around 6-8 seconds), do I miss something here? Does anyone have the same experience or any idea about the reason behind? Thanks very much for help All the best, Leon -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120418/5574cb47/attachment.html From sbordet at intalio.com Wed Apr 18 01:24:30 2012 From: sbordet at intalio.com (Simone Bordet) Date: Wed, 18 Apr 2012 10:24:30 +0200 Subject: does UseParallelOldGC guarantee a better full gc performance In-Reply-To: References: Message-ID: Hi, On Wed, Apr 18, 2012 at 10:16, the.6th.month at gmail.com wrote: > hi all: > I'm currently using jdk 6u26. I just enabled UseParallelOldGC, expecting > that would enhance the full gc efficiency and decrease the mark-sweep time > by using multiple-core. The JAVA_OPTS is as below: > -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintTenuringDistribution > -Xloggc:gc.log-server -Xms4000m -Xmx4000m -Xss256k -Xmn256m > -XX:PermSize=256m -XX:+UseParallelOldGC? 
-server > -Dorg.apache.jasper.compiler.Parser.STRICT_QUOTE_ESCAPING=false > as shown in jinfo output, the settings have taken effect, and the > ParallelGCThreads is 4 since the jvm is running on a four-core server. > But what's strange is that the mark-sweep time remains almost unchanged (at > around 6-8 seconds), do I miss something here? Does anyone have the same > experience or any idea about the reason behind? > Thanks very much for help The young generation is fairly small for a 4GiB heap. Can we see the lines you mention from the logs ? Simon -- http://cometd.org http://intalio.com http://bordet.blogspot.com ---- Finally, no matter how good the architecture and design are, to deliver bug-free software with optimal performance and reliability, the implementation technique must be flawless.?? Victoria Livschitz From the.6th.month at gmail.com Wed Apr 18 01:58:11 2012 From: the.6th.month at gmail.com (the.6th.month at gmail.com) Date: Wed, 18 Apr 2012 16:58:11 +0800 Subject: does UseParallelOldGC guarantee a better full gc performance In-Reply-To: References: Message-ID: Hi, Simon: this is the full gc log for your concern. 2012-04-18T16:47:24.824+0800: 988.392: [GC Desired survivor size 14876672 bytes, new threshold 1 (max 15) [PSYoungGen: 236288K->8126K(247616K)] 4054802K->3830711K(4081472K), 0.0512250 secs] [Times: user=0.15 sys=0.00, real=0.05 secs] 2012-04-18T16:47:24.875+0800: 988.443: [Full GC [PSYoungGen: 8126K->0K(247616K)] [PSOldGen: 3822585K->1751429K(3833856K)] 3830711K->1751429K(4081472K) [PSPermGen: 81721K->81721K(262144K)], 6.6108630 secs] [Times: user=6.62 sys=0.00, real=6.61 secs] the full gc time is almost unchanged since I enabled paralleloldgc. Do you have any recommendation for an appropriate young gen size? Thanks All the best, Leon On 18 April 2012 16:24, Simone Bordet wrote: > Hi, > > On Wed, Apr 18, 2012 at 10:16, the.6th.month at gmail.com > wrote: > > hi all: > > I'm currently using jdk 6u26. I just enabled UseParallelOldGC, expecting > > that would enhance the full gc efficiency and decrease the mark-sweep > time > > by using multiple-core. The JAVA_OPTS is as below: > > -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintTenuringDistribution > > -Xloggc:gc.log-server -Xms4000m -Xmx4000m -Xss256k -Xmn256m > > -XX:PermSize=256m -XX:+UseParallelOldGC -server > > -Dorg.apache.jasper.compiler.Parser.STRICT_QUOTE_ESCAPING=false > > as shown in jinfo output, the settings have taken effect, and the > > ParallelGCThreads is 4 since the jvm is running on a four-core server. > > But what's strange is that the mark-sweep time remains almost unchanged > (at > > around 6-8 seconds), do I miss something here? Does anyone have the same > > experience or any idea about the reason behind? > > Thanks very much for help > > The young generation is fairly small for a 4GiB heap. > > Can we see the lines you mention from the logs ? > > Simon > -- > http://cometd.org > http://intalio.com > http://bordet.blogspot.com > ---- > Finally, no matter how good the architecture and design are, > to deliver bug-free software with optimal performance and reliability, > the implementation technique must be flawless. Victoria Livschitz > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120418/c6acd54f/attachment.html From sbordet at intalio.com Wed Apr 18 02:19:34 2012 From: sbordet at intalio.com (Simone Bordet) Date: Wed, 18 Apr 2012 11:19:34 +0200 Subject: does UseParallelOldGC guarantee a better full gc performance In-Reply-To: References: Message-ID: Hi, On Wed, Apr 18, 2012 at 10:58, the.6th.month at gmail.com wrote: > Hi, Simon: > > this is the full gc log for your concern. > 2012-04-18T16:47:24.824+0800: 988.392: [GC > Desired survivor size 14876672 bytes, new threshold 1 (max 15) > ?[PSYoungGen: 236288K->8126K(247616K)] 4054802K->3830711K(4081472K), > 0.0512250 secs] [Times: user=0.15 sys=0.00, real=0.05 secs] > > 2012-04-18T16:47:24.875+0800: 988.443: [Full GC [PSYoungGen: > 8126K->0K(247616K)] [PSOldGen: 3822585K->1751429K(3833856K)] > 3830711K->1751429K(4081472K) [PSPermGen: 81721K->81721K(262144K)], 6.6108630 > secs] [Times: user=6.62 sys=0.00, real=6.61 secs] > > the full gc time is almost unchanged since I enabled paralleloldgc. > > Do you have any recommendation for an appropriate young gen size? Usually, applications generate a lot of short lived objects that can be reclaimed very efficiently in the young generation. If you have a small young generation, these objects will be promoted in old generation, where their collection is usually more expensive. It really depends what your application does, but I would remove the -Xmn option for now (leaving the default of 1/3 of the total heap, i.e. ~1.6 GiB), and see if you get benefits. As for the times being unchanged, I do not know. My experience is that UseParallelOldGC works as expected: I frequently see 1.5-2x gains on 2 cores, and I have seen 6x gains on 8 cores. Simon -- http://cometd.org http://intalio.com http://bordet.blogspot.com ---- Finally, no matter how good the architecture and design are, to deliver bug-free software with optimal performance and reliability, the implementation technique must be flawless.?? Victoria Livschitz From the.6th.month at gmail.com Wed Apr 18 03:07:05 2012 From: the.6th.month at gmail.com (the.6th.month at gmail.com) Date: Wed, 18 Apr 2012 18:07:05 +0800 Subject: does UseParallelOldGC guarantee a better full gc performance In-Reply-To: References: Message-ID: Hi, Simon: Thanks for your reply. That's really weird, I'll look into it and give the feedback later Thanks again. All the best, Leon On 18 April 2012 17:19, Simone Bordet wrote: > Hi, > > On Wed, Apr 18, 2012 at 10:58, the.6th.month at gmail.com > wrote: > > Hi, Simon: > > > > this is the full gc log for your concern. > > 2012-04-18T16:47:24.824+0800: 988.392: [GC > > Desired survivor size 14876672 bytes, new threshold 1 (max 15) > > [PSYoungGen: 236288K->8126K(247616K)] 4054802K->3830711K(4081472K), > > 0.0512250 secs] [Times: user=0.15 sys=0.00, real=0.05 secs] > > > > 2012-04-18T16:47:24.875+0800: 988.443: [Full GC [PSYoungGen: > > 8126K->0K(247616K)] [PSOldGen: 3822585K->1751429K(3833856K)] > > 3830711K->1751429K(4081472K) [PSPermGen: 81721K->81721K(262144K)], > 6.6108630 > > secs] [Times: user=6.62 sys=0.00, real=6.61 secs] > > > > the full gc time is almost unchanged since I enabled paralleloldgc. > > > > Do you have any recommendation for an appropriate young gen size? > > Usually, applications generate a lot of short lived objects that can > be reclaimed very efficiently in the young generation. > If you have a small young generation, these objects will be promoted > in old generation, where their collection is usually more expensive. 
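Simone's advice to drop the explicit -Xmn (quoted again just below) amounts to letting the VM size the young generation itself. A sketch only, reusing the heap values posted in this thread; with no -Xmn, the default NewRatio (typically 2 for the server VM) makes the young generation roughly a third of the heap:

    -Xms4000m -Xmx4000m -Xmn256m -XX:+UseParallelOldGC        (young generation pinned at 256m, as posted)
    -Xms4000m -Xmx4000m -XX:+UseParallelOldGC                 (no -Xmn; the default ratio applies)

Note also that the parallel collectors adjust generation and survivor sizes adaptively by default (UseAdaptiveSizePolicy), so the sizes observed in the logs can drift from the starting values.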
> > It really depends what your application does, but I would remove the > -Xmn option for now (leaving the default of 1/3 of the total heap, > i.e. ~1.6 GiB), and see if you get benefits. > > As for the times being unchanged, I do not know. > My experience is that UseParallelOldGC works as expected: I frequently > see 1.5-2x gains on 2 cores, and I have seen 6x gains on 8 cores. > > Simon > -- > http://cometd.org > http://intalio.com > http://bordet.blogspot.com > ---- > Finally, no matter how good the architecture and design are, > to deliver bug-free software with optimal performance and reliability, > the implementation technique must be flawless. Victoria Livschitz > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120418/58299b7d/attachment.html From the.6th.month at gmail.com Wed Apr 18 04:03:37 2012 From: the.6th.month at gmail.com (the.6th.month at gmail.com) Date: Wed, 18 Apr 2012 19:03:37 +0800 Subject: does UseParallelOldGC guarantee a better full gc performance In-Reply-To: References: Message-ID: hi, Simon: here is another gc-log fragment about full gc, you can see that although I've configured the jvm to UseParallelOldGC and increased the younggen size to 768m, it still takes 13 seconds to finish the full gc, even worse than before. {Heap before GC invocations=109 (full 2): PSYoungGen total 705984K, used 14531K [0x00000007d0000000, 0x0000000800000000, 0x0000000800000000) eden space 629120K, 0% used [0x00000007d0000000,0x00000007d0000000,0x00000007f6660000) from space 76864K, 18% used [0x00000007f6660000,0x00000007f7490da8,0x00000007fb170000) to space 76672K, 0% used [0x00000007fb520000,0x00000007fb520000,0x0000000800000000) ParOldGen total 3309568K, used 3279215K [0x0000000706000000, 0x00000007d0000000, 0x00000007d0000000) object space 3309568K, 99% used [0x0000000706000000,0x00000007ce25bd68,0x00000007d0000000) PSPermGen total 262144K, used 79139K [0x00000006f6000000, 0x0000000706000000, 0x0000000706000000) object space 262144K, 30% used [0x00000006f6000000,0x00000006fad48e38,0x0000000706000000) 2012-04-18T18:55:53.500+0800: 767.928: [Full GC [PSYoungGen: 14531K->0K(705984K)] [ParOldGen: 3279215K->1474447K(3309568K)] 3293746K->1474447K(4015552K) [PSPermGen: 79139K->74190K(262144K)], 13.0669910 secs] [Times: user=41.91 sys=0.19, real=13.06 secs] quite counter-intuitive, huh? Leon On 18 April 2012 18:07, the.6th.month at gmail.com wrote: > Hi, Simon: > Thanks for your reply. That's really weird, I'll look into it and give the > feedback later > Thanks again. > > All the best, > Leon > > > On 18 April 2012 17:19, Simone Bordet wrote: > >> Hi, >> >> On Wed, Apr 18, 2012 at 10:58, the.6th.month at gmail.com >> wrote: >> > Hi, Simon: >> > >> > this is the full gc log for your concern. 
>> > 2012-04-18T16:47:24.824+0800: 988.392: [GC >> > Desired survivor size 14876672 bytes, new threshold 1 (max 15) >> > [PSYoungGen: 236288K->8126K(247616K)] 4054802K->3830711K(4081472K), >> > 0.0512250 secs] [Times: user=0.15 sys=0.00, real=0.05 secs] >> > >> > 2012-04-18T16:47:24.875+0800: 988.443: [Full GC [PSYoungGen: >> > 8126K->0K(247616K)] [PSOldGen: 3822585K->1751429K(3833856K)] >> > 3830711K->1751429K(4081472K) [PSPermGen: 81721K->81721K(262144K)], >> 6.6108630 >> > secs] [Times: user=6.62 sys=0.00, real=6.61 secs] >> > >> > the full gc time is almost unchanged since I enabled paralleloldgc. >> > >> > Do you have any recommendation for an appropriate young gen size? >> >> Usually, applications generate a lot of short lived objects that can >> be reclaimed very efficiently in the young generation. >> If you have a small young generation, these objects will be promoted >> in old generation, where their collection is usually more expensive. >> >> It really depends what your application does, but I would remove the >> -Xmn option for now (leaving the default of 1/3 of the total heap, >> i.e. ~1.6 GiB), and see if you get benefits. >> >> As for the times being unchanged, I do not know. >> My experience is that UseParallelOldGC works as expected: I frequently >> see 1.5-2x gains on 2 cores, and I have seen 6x gains on 8 cores. >> >> Simon >> -- >> http://cometd.org >> http://intalio.com >> http://bordet.blogspot.com >> ---- >> Finally, no matter how good the architecture and design are, >> to deliver bug-free software with optimal performance and reliability, >> the implementation technique must be flawless. Victoria Livschitz >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120418/37188e0a/attachment-0001.html From sbordet at intalio.com Wed Apr 18 06:10:08 2012 From: sbordet at intalio.com (Simone Bordet) Date: Wed, 18 Apr 2012 15:10:08 +0200 Subject: does UseParallelOldGC guarantee a better full gc performance In-Reply-To: References: Message-ID: Hi, On Wed, Apr 18, 2012 at 13:03, the.6th.month at gmail.com wrote: > hi, Simon: > here is another gc-log fragment about full gc, you can see that although > I've configured the jvm to UseParallelOldGC and increased the younggen size > to 768m, it still takes 13 seconds to finish the full gc, even worse than > before. > > {Heap before GC invocations=109 (full 2): > ?PSYoungGen????? total 705984K, used 14531K [0x00000007d0000000, > 0x0000000800000000, 0x0000000800000000) > ? eden space 629120K, 0% used > [0x00000007d0000000,0x00000007d0000000,0x00000007f6660000) > ? from space 76864K, 18% used > [0x00000007f6660000,0x00000007f7490da8,0x00000007fb170000) > ? to?? space 76672K, 0% used > [0x00000007fb520000,0x00000007fb520000,0x0000000800000000) > ?ParOldGen?????? total 3309568K, used 3279215K [0x0000000706000000, > 0x00000007d0000000, 0x00000007d0000000) > ? object space 3309568K, 99% used > [0x0000000706000000,0x00000007ce25bd68,0x00000007d0000000) > ?PSPermGen?????? total 262144K, used 79139K [0x00000006f6000000, > 0x0000000706000000, 0x0000000706000000) > ? 
object space 262144K, 30% used > [0x00000006f6000000,0x00000006fad48e38,0x0000000706000000) > 2012-04-18T18:55:53.500+0800: 767.928: [Full GC [PSYoungGen: > 14531K->0K(705984K)] [ParOldGen: 3279215K->1474447K(3309568K)] > 3293746K->1474447K(4015552K) [PSPermGen: 79139K->74190K(262144K)], > 13.0669910 secs] [Times: user=41.91 sys=0.19, real=13.06 secs] > > quite counter-intuitive, huh? Well, maybe. But it shows that the parallel collector does its work, since you had a 41.91/13.06 = 3.2x gain on your 4 cores. The rest of the analysis depends on what your application does (e.g. allocation rate and whether uses caches, references, etc.) and on a complete GC log. Simon -- http://cometd.org http://intalio.com http://bordet.blogspot.com ---- Finally, no matter how good the architecture and design are, to deliver bug-free software with optimal performance and reliability, the implementation technique must be flawless.?? Victoria Livschitz From jon.masamitsu at oracle.com Wed Apr 18 09:19:31 2012 From: jon.masamitsu at oracle.com (Jon Masamitsu) Date: Wed, 18 Apr 2012 09:19:31 -0700 Subject: does UseParallelOldGC guarantee a better full gc performance In-Reply-To: References: Message-ID: <4F8EE993.8030502@oracle.com> Leon, In this log you see as part of an entry "PSOldGen:" which says you're using the serial mark sweep. I see in your later posts that "ParOldGen:" appears in your log and that is the parallel mark sweep collector. Jon On 4/18/2012 1:58 AM, the.6th.month at gmail.com wrote: > Hi, Simon: > > this is the full gc log for your concern. > 2012-04-18T16:47:24.824+0800: 988.392: [GC > Desired survivor size 14876672 bytes, new threshold 1 (max 15) > [PSYoungGen: 236288K->8126K(247616K)] 4054802K->3830711K(4081472K), > 0.0512250 secs] [Times: user=0.15 sys=0.00, real=0.05 secs] > > 2012-04-18T16:47:24.875+0800: 988.443: [Full GC [PSYoungGen: > 8126K->0K(247616K)] [PSOldGen: 3822585K->1751429K(3833856K)] > 3830711K->1751429K(4081472K) [PSPermGen: 81721K->81721K(262144K)], > 6.6108630 secs] [Times: user=6.62 sys=0.00, real=6.61 secs] > > the full gc time is almost unchanged since I enabled paralleloldgc. > > Do you have any recommendation for an appropriate young gen size? > > Thanks > > All the best, > Leon > > > On 18 April 2012 16:24, Simone Bordet wrote: > >> Hi, >> >> On Wed, Apr 18, 2012 at 10:16, the.6th.month at gmail.com >> wrote: >>> hi all: >>> I'm currently using jdk 6u26. I just enabled UseParallelOldGC, expecting >>> that would enhance the full gc efficiency and decrease the mark-sweep >> time >>> by using multiple-core. The JAVA_OPTS is as below: >>> -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintTenuringDistribution >>> -Xloggc:gc.log-server -Xms4000m -Xmx4000m -Xss256k -Xmn256m >>> -XX:PermSize=256m -XX:+UseParallelOldGC -server >>> -Dorg.apache.jasper.compiler.Parser.STRICT_QUOTE_ESCAPING=false >>> as shown in jinfo output, the settings have taken effect, and the >>> ParallelGCThreads is 4 since the jvm is running on a four-core server. >>> But what's strange is that the mark-sweep time remains almost unchanged >> (at >>> around 6-8 seconds), do I miss something here? Does anyone have the same >>> experience or any idea about the reason behind? >>> Thanks very much for help >> The young generation is fairly small for a 4GiB heap. >> >> Can we see the lines you mention from the logs ? 
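Jon's "PSOldGen:" versus "ParOldGen:" distinction above is the quickest way to tell from a log which old-generation collector actually ran. The final flag values give the same answer up front; a minimal sketch, assuming a JDK 6 update recent enough to have -XX:+PrintFlagsFinal (it appeared around 6u21, so check your update):

    java -server -XX:+UseParallelOldGC -XX:+PrintFlagsFinal -version | egrep 'UseParallel|ParallelGCThreads'

If UseParallelOldGC prints as true but full collections still log "PSOldGen:", the log was almost certainly produced by a run that did not get the flag.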
>> >> Simon >> -- >> http://cometd.org >> http://intalio.com >> http://bordet.blogspot.com >> ---- >> Finally, no matter how good the architecture and design are, >> to deliver bug-free software with optimal performance and reliability, >> the implementation technique must be flawless. Victoria Livschitz >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120418/0fbd072d/attachment.html From the.6th.month at gmail.com Wed Apr 18 09:27:01 2012 From: the.6th.month at gmail.com (the.6th.month at gmail.com) Date: Thu, 19 Apr 2012 00:27:01 +0800 Subject: does UseParallelOldGC guarantee a better full gc performance In-Reply-To: <4F8EE993.8030502@oracle.com> References: <4F8EE993.8030502@oracle.com> Message-ID: Hi, Jon, yup,,,I know, but what is weird is the paroldgen doesn't bring about better full gc performance as seen from JMX metrics but bring unexpected swap consumption. I am gonna look into my application instead for some inspiration. Leon On 19 April 2012 00:19, Jon Masamitsu wrote: > ** > Leon, > > In this log you see as part of an entry "PSOldGen:" which says you're > using the serial mark sweep. I see in your later posts that "ParOldGen:" > appears in your log and that is the parallel mark sweep collector. > > Jon > > > On 4/18/2012 1:58 AM, the.6th.month at gmail.com wrote: > > Hi, Simon: > > this is the full gc log for your concern. > 2012-04-18T16:47:24.824+0800: 988.392: [GC > Desired survivor size 14876672 bytes, new threshold 1 (max 15) > [PSYoungGen: 236288K->8126K(247616K)] 4054802K->3830711K(4081472K), > 0.0512250 secs] [Times: user=0.15 sys=0.00, real=0.05 secs] > > 2012-04-18T16:47:24.875+0800: 988.443: [Full GC [PSYoungGen: > 8126K->0K(247616K)] [PSOldGen: 3822585K->1751429K(3833856K)] > 3830711K->1751429K(4081472K) [PSPermGen: 81721K->81721K(262144K)], > 6.6108630 secs] [Times: user=6.62 sys=0.00, real=6.61 secs] > > the full gc time is almost unchanged since I enabled paralleloldgc. > > Do you have any recommendation for an appropriate young gen size? > > Thanks > > All the best, > Leon > > > On 18 April 2012 16:24, Simone Bordet wrote: > > > Hi, > > On Wed, Apr 18, 2012 at 10:16, the.6th.month at gmail.com wrote: > > hi all: > I'm currently using jdk 6u26. I just enabled UseParallelOldGC, expecting > that would enhance the full gc efficiency and decrease the mark-sweep > > time > > by using multiple-core. The JAVA_OPTS is as below: > -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintTenuringDistribution > -Xloggc:gc.log-server -Xms4000m -Xmx4000m -Xss256k -Xmn256m > -XX:PermSize=256m -XX:+UseParallelOldGC -server > -Dorg.apache.jasper.compiler.Parser.STRICT_QUOTE_ESCAPING=false > as shown in jinfo output, the settings have taken effect, and the > ParallelGCThreads is 4 since the jvm is running on a four-core server. > But what's strange is that the mark-sweep time remains almost unchanged > > (at > > around 6-8 seconds), do I miss something here? Does anyone have the same > experience or any idea about the reason behind? > Thanks very much for help > > The young generation is fairly small for a 4GiB heap. 
> > Can we see the lines you mention from the logs ? > > Simon > --http://cometd.orghttp://intalio.comhttp://bordet.blogspot.com > ---- > Finally, no matter how good the architecture and design are, > to deliver bug-free software with optimal performance and reliability, > the implementation technique must be flawless. Victoria Livschitz > _______________________________________________ > hotspot-gc-use mailing listhotspot-gc-use at openjdk.java.nethttp://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > > _______________________________________________ > hotspot-gc-use mailing listhotspot-gc-use at openjdk.java.nethttp://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120419/ecb9a050/attachment.html From jon.masamitsu at oracle.com Wed Apr 18 10:36:31 2012 From: jon.masamitsu at oracle.com (Jon Masamitsu) Date: Wed, 18 Apr 2012 10:36:31 -0700 Subject: does UseParallelOldGC guarantee a better full gc performance In-Reply-To: References: <4F8EE993.8030502@oracle.com> Message-ID: <4F8EFB9F.5030404@oracle.com> Leon, I don't think I've actually seen logs with the same flags except changing parallel old for serial old so hard for me to say. Simon's comment > Well, maybe. But it shows that the parallel collector does its work, > since you had a 41.91/13.06 = 3.2x gain on your 4 cores. says there is a parallel speed up, however, so I'll let you investigate you application and leave it at that. Jon On 4/18/2012 9:27 AM, the.6th.month at gmail.com wrote: > Hi, Jon, > yup,,,I know, but what is weird is the paroldgen doesn't bring about better > full gc performance as seen from JMX metrics but bring unexpected swap > consumption. > I am gonna look into my application instead for some inspiration. > > Leon > > On 19 April 2012 00:19, Jon Masamitsu wrote: > >> ** >> Leon, >> >> In this log you see as part of an entry "PSOldGen:" which says you're >> using the serial mark sweep. I see in your later posts that "ParOldGen:" >> appears in your log and that is the parallel mark sweep collector. >> >> Jon >> >> >> On 4/18/2012 1:58 AM, the.6th.month at gmail.com wrote: >> >> Hi, Simon: >> >> this is the full gc log for your concern. >> 2012-04-18T16:47:24.824+0800: 988.392: [GC >> Desired survivor size 14876672 bytes, new threshold 1 (max 15) >> [PSYoungGen: 236288K->8126K(247616K)] 4054802K->3830711K(4081472K), >> 0.0512250 secs] [Times: user=0.15 sys=0.00, real=0.05 secs] >> >> 2012-04-18T16:47:24.875+0800: 988.443: [Full GC [PSYoungGen: >> 8126K->0K(247616K)] [PSOldGen: 3822585K->1751429K(3833856K)] >> 3830711K->1751429K(4081472K) [PSPermGen: 81721K->81721K(262144K)], >> 6.6108630 secs] [Times: user=6.62 sys=0.00, real=6.61 secs] >> >> the full gc time is almost unchanged since I enabled paralleloldgc. >> >> Do you have any recommendation for an appropriate young gen size? >> >> Thanks >> >> All the best, >> Leon >> >> >> On 18 April 2012 16:24, Simone Bordet wrote: >> >> >> Hi, >> >> On Wed, Apr 18, 2012 at 10:16, the.6th.month at gmail.com wrote: >> >> hi all: >> I'm currently using jdk 6u26. I just enabled UseParallelOldGC, expecting >> that would enhance the full gc efficiency and decrease the mark-sweep >> >> time >> >> by using multiple-core. 
The JAVA_OPTS is as below: >> -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintTenuringDistribution >> -Xloggc:gc.log-server -Xms4000m -Xmx4000m -Xss256k -Xmn256m >> -XX:PermSize=256m -XX:+UseParallelOldGC -server >> -Dorg.apache.jasper.compiler.Parser.STRICT_QUOTE_ESCAPING=false >> as shown in jinfo output, the settings have taken effect, and the >> ParallelGCThreads is 4 since the jvm is running on a four-core server. >> But what's strange is that the mark-sweep time remains almost unchanged >> >> (at >> >> around 6-8 seconds), do I miss something here? Does anyone have the same >> experience or any idea about the reason behind? >> Thanks very much for help >> >> The young generation is fairly small for a 4GiB heap. >> >> Can we see the lines you mention from the logs ? >> >> Simon >> --http://cometd.orghttp://intalio.comhttp://bordet.blogspot.com >> ---- >> Finally, no matter how good the architecture and design are, >> to deliver bug-free software with optimal performance and reliability, >> the implementation technique must be flawless. Victoria Livschitz >> _______________________________________________ >> hotspot-gc-use mailing listhotspot-gc-use at openjdk.java.nethttp://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> >> _______________________________________________ >> hotspot-gc-use mailing listhotspot-gc-use at openjdk.java.nethttp://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> From ysr1729 at gmail.com Wed Apr 18 13:07:19 2012 From: ysr1729 at gmail.com (Srinivas Ramakrishna) Date: Wed, 18 Apr 2012 13:07:19 -0700 Subject: does UseParallelOldGC guarantee a better full gc performance In-Reply-To: <4F8EFB9F.5030404@oracle.com> References: <4F8EE993.8030502@oracle.com> <4F8EFB9F.5030404@oracle.com> Message-ID: On Wed, Apr 18, 2012 at 10:36 AM, Jon Masamitsu wrote: > Leon, > > I don't think I've actually seen logs with the same flags except changing > parallel old for serial old so hard for me to say. Simon's comment > > > Well, maybe. But it shows that the parallel collector does its work, > > since you had a 41.91/13.06 = 3.2x gain on your 4 cores. > I think Simon's "speed up" is a bit misleading. He shows that the wall-time of 13.06 s does user time eqvt work worth 41.91 seconds, so indeed a lot of user-level work is done in those 13.06 seconds. I'd call that "intrinsic parallelism" rather than speed-up. However, that's a misleading way to define speed-up because (for all that the user cares about) all of that parallel work may be overhead of the parallel algorithm so that the bottom-line speed-up disappears. Rather, Simon and Leon, you want to compare the wall-clock pause-time seen with parallel old with that seen with serial old (which i believe is what Leon may have been referring to) which is how speed-up should be defined when comparing a parallel algorithm with a serial couterpart. Leon, in the past we observed (and you will likely find some discussion in the archives) that a particular phase called the "deferred updates" phase was taking a bulk of the time when we encountered longer pauses with parallel old. That's phase when work is done single-threaded and would exhibit lower parallelism. Typically, but not always, this would happen during the full gc pauses during which maximal compaction was forced. 
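The maximal-compaction behavior described here is controlled by a pair of HotSpot flags, HeapFirstMaximumCompactionCount and HeapMaximumCompactionInterval (defaults of roughly 3 and 20, matching the "first and every 20 subsequent" schedule described just below), and per-phase timing of the parallel compactor can be requested with PrintParallelOldGCPhaseTimes. Treat the exact flag names and defaults as assumptions to confirm with -XX:+PrintFlagsFinal on your own build; a sketch only, with intentionally huge values to push maximal compactions out of the picture while watching the phase breakdown:

    -XX:+UseParallelOldGC -XX:HeapFirstMaximumCompactionCount=1000000
    -XX:HeapMaximumCompactionInterval=1000000 -XX:+PrintParallelOldGCPhaseTimes -XX:+PrintGCDetails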
(This is done by default during the first and every 20 subsequent full collections -- or so.) We worked around that by turning off maximal compaction and letting the dense prefix alone. I believe a bug may have been filed following that discussion and it had been my intention to try and fix it (per discussion on the list). Unfortunately, other matters intervened and I was unable to get back to that work. PrintParallelGC{Task,Phase}Times (i think) will give you more visibility into the various phases etc. and might help you diagnose the performance issue. -- ramki > says there is a parallel speed up, however, so I'll let you investigate > you application > and leave it at that. > > Jon > > > On 4/18/2012 9:27 AM, the.6th.month at gmail.com wrote: > > Hi, Jon, > > yup,,,I know, but what is weird is the paroldgen doesn't bring about > better > > full gc performance as seen from JMX metrics but bring unexpected swap > > consumption. > > I am gonna look into my application instead for some inspiration. > > > > Leon > > > > On 19 April 2012 00:19, Jon Masamitsu wrote: > > > >> ** > >> Leon, > >> > >> In this log you see as part of an entry "PSOldGen:" which says you're > >> using the serial mark sweep. I see in your later posts that > "ParOldGen:" > >> appears in your log and that is the parallel mark sweep collector. > >> > >> Jon > >> > >> > >> On 4/18/2012 1:58 AM, the.6th.month at gmail.com wrote: > >> > >> Hi, Simon: > >> > >> this is the full gc log for your concern. > >> 2012-04-18T16:47:24.824+0800: 988.392: [GC > >> Desired survivor size 14876672 bytes, new threshold 1 (max 15) > >> [PSYoungGen: 236288K->8126K(247616K)] 4054802K->3830711K(4081472K), > >> 0.0512250 secs] [Times: user=0.15 sys=0.00, real=0.05 secs] > >> > >> 2012-04-18T16:47:24.875+0800: 988.443: [Full GC [PSYoungGen: > >> 8126K->0K(247616K)] [PSOldGen: 3822585K->1751429K(3833856K)] > >> 3830711K->1751429K(4081472K) [PSPermGen: 81721K->81721K(262144K)], > >> 6.6108630 secs] [Times: user=6.62 sys=0.00, real=6.61 secs] > >> > >> the full gc time is almost unchanged since I enabled paralleloldgc. > >> > >> Do you have any recommendation for an appropriate young gen size? > >> > >> Thanks > >> > >> All the best, > >> Leon > >> > >> > >> On 18 April 2012 16:24, Simone Bordet < > sbordet at intalio.com> wrote: > >> > >> > >> Hi, > >> > >> On Wed, Apr 18, 2012 at 10:16, the.6th.month at gmail.com< > the.6th.month at gmail.com> wrote: > >> > >> hi all: > >> I'm currently using jdk 6u26. I just enabled UseParallelOldGC, expecting > >> that would enhance the full gc efficiency and decrease the mark-sweep > >> > >> time > >> > >> by using multiple-core. The JAVA_OPTS is as below: > >> -XX:+PrintGCDetails -XX:+PrintGCDateStamps > -XX:+PrintTenuringDistribution > >> -Xloggc:gc.log-server -Xms4000m -Xmx4000m -Xss256k -Xmn256m > >> -XX:PermSize=256m -XX:+UseParallelOldGC -server > >> -Dorg.apache.jasper.compiler.Parser.STRICT_QUOTE_ESCAPING=false > >> as shown in jinfo output, the settings have taken effect, and the > >> ParallelGCThreads is 4 since the jvm is running on a four-core server. > >> But what's strange is that the mark-sweep time remains almost unchanged > >> > >> (at > >> > >> around 6-8 seconds), do I miss something here? Does anyone have the > same > >> experience or any idea about the reason behind? > >> Thanks very much for help > >> > >> The young generation is fairly small for a 4GiB heap. > >> > >> Can we see the lines you mention from the logs ? 
> >> > >> Simon > >> --http://cometd.orghttp://intalio.comhttp://bordet.blogspot.com > >> ---- > >> Finally, no matter how good the architecture and design are, > >> to deliver bug-free software with optimal performance and reliability, > >> the implementation technique must be flawless. Victoria Livschitz > >> _______________________________________________ > >> hotspot-gc-use mailing listhotspot-gc-use at openjdk.java.nethttp:// > mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > >> > >> > >> _______________________________________________ > >> hotspot-gc-use mailing listhotspot-gc-use at openjdk.java.nethttp:// > mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > >> > >> > >> _______________________________________________ > >> hotspot-gc-use mailing list > >> hotspot-gc-use at openjdk.java.net > >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > >> > >> > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120418/7ed96c7a/attachment.html From Peter.B.Kessler at Oracle.COM Wed Apr 18 14:52:32 2012 From: Peter.B.Kessler at Oracle.COM (Peter B. Kessler) Date: Wed, 18 Apr 2012 14:52:32 -0700 Subject: does UseParallelOldGC guarantee a better full gc performance In-Reply-To: References: <4F8EE993.8030502@oracle.com> Message-ID: <4F8F37A0.2070700@Oracle.COM> "swap consumption"? How much *physical* memory do you have on this box? (It would be nice if the GC logs included the physical memory available.) What else is running on the box? ... peter the.6th.month at gmail.com wrote: > Hi, Jon, > yup,,,I know, but what is weird is the paroldgen doesn't bring about > better full gc performance as seen from JMX metrics but bring unexpected > swap consumption. > I am gonna look into my application instead for some inspiration. > > Leon > > On 19 April 2012 00:19, Jon Masamitsu > wrote: > > __ > Leon, > > In this log you see as part of an entry "PSOldGen:" which says you're > using the serial mark sweep. I see in your later posts that > "ParOldGen:" > appears in your log and that is the parallel mark sweep collector. > > Jon > > > On 4/18/2012 1:58 AM, the.6th.month at gmail.com > wrote: >> Hi, Simon: >> >> this is the full gc log for your concern. >> 2012-04-18T16:47:24.824+0800: 988.392: [GC >> Desired survivor size 14876672 bytes, new threshold 1 (max 15) >> [PSYoungGen: 236288K->8126K(247616K)] 4054802K->3830711K(4081472K), >> 0.0512250 secs] [Times: user=0.15 sys=0.00, real=0.05 secs] >> >> 2012-04-18T16:47:24.875+0800: 988.443: [Full GC [PSYoungGen: >> 8126K->0K(247616K)] [PSOldGen: 3822585K->1751429K(3833856K)] >> 3830711K->1751429K(4081472K) [PSPermGen: 81721K->81721K(262144K)], >> 6.6108630 secs] [Times: user=6.62 sys=0.00, real=6.61 secs] >> >> the full gc time is almost unchanged since I enabled paralleloldgc. >> >> Do you have any recommendation for an appropriate young gen size? >> >> Thanks >> >> All the best, >> Leon >> >> >> On 18 April 2012 16:24, Simone Bordet wrote: >> >>> Hi, >>> >>> On Wed, Apr 18, 2012 at 10:16, the.6th.month at gmail.com >>> wrote: >>>> hi all: >>>> I'm currently using jdk 6u26. I just enabled UseParallelOldGC, expecting >>>> that would enhance the full gc efficiency and decrease the mark-sweep >>> time >>>> by using multiple-core. 
The JAVA_OPTS is as below: >>>> -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintTenuringDistribution >>>> -Xloggc:gc.log-server -Xms4000m -Xmx4000m -Xss256k -Xmn256m >>>> -XX:PermSize=256m -XX:+UseParallelOldGC -server >>>> -Dorg.apache.jasper.compiler.Parser.STRICT_QUOTE_ESCAPING=false >>>> as shown in jinfo output, the settings have taken effect, and the >>>> ParallelGCThreads is 4 since the jvm is running on a four-core server. >>>> But what's strange is that the mark-sweep time remains almost unchanged >>> (at >>>> around 6-8 seconds), do I miss something here? Does anyone have the same >>>> experience or any idea about the reason behind? >>>> Thanks very much for help >>> The young generation is fairly small for a 4GiB heap. >>> >>> Can we see the lines you mention from the logs ? >>> >>> Simon >>> -- >>> http://cometd.org >>> http://intalio.com >>> http://bordet.blogspot.com >>> ---- >>> Finally, no matter how good the architecture and design are, >>> to deliver bug-free software with optimal performance and reliability, >>> the implementation technique must be flawless. Victoria Livschitz >>> _______________________________________________ >>> hotspot-gc-use mailing list >>> hotspot-gc-use at openjdk.java.net >>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>> >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > > > ------------------------------------------------------------------------ > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use From the.6th.month at gmail.com Thu Apr 19 01:51:54 2012 From: the.6th.month at gmail.com (the.6th.month at gmail.com) Date: Thu, 19 Apr 2012 16:51:54 +0800 Subject: does UseParallelOldGC guarantee a better full gc performance In-Reply-To: References: <4F8EE993.8030502@oracle.com> <4F8EFB9F.5030404@oracle.com> Message-ID: hi, Srinivas: that explains, i do observe that no performance gain has been obtained thru par old gc via the jmx mark_sweep_time (i have a monitoring system collecting that and print out with rrdtool). hopefully that's the result of maximum compaction, but i am keen to ask whether it will bring about any negative impact on performance, like leaving lots of fragmentations unreclaimed. all th best Leon On Apr 19, 2012 4:07 AM, "Srinivas Ramakrishna" wrote: > > > On Wed, Apr 18, 2012 at 10:36 AM, Jon Masamitsu wrote: > >> Leon, >> >> I don't think I've actually seen logs with the same flags except changing >> parallel old for serial old so hard for me to say. Simon's comment >> >> > Well, maybe. But it shows that the parallel collector does its work, >> > since you had a 41.91/13.06 = 3.2x gain on your 4 cores. >> > > I think Simon's "speed up" is a bit misleading. He shows that the > wall-time of 13.06 s > does user time eqvt work worth 41.91 seconds, so indeed a lot of > user-level work is > done in those 13.06 seconds. I'd call that "intrinsic parallelism" rather > than speed-up. 
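The user-versus-real distinction drawn above can be checked mechanically across a whole log rather than for one entry. A small sketch that pulls the [Times: ...] tail out of every Full GC record, assuming each record lands on a single line in the format quoted in this thread (the class name is illustrative and the log file is passed as the first argument, e.g. gc.log):

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class FullGcParallelism {
        // matches e.g. "[Times: user=41.91 sys=0.19, real=13.06 secs]"
        private static final Pattern TIMES =
                Pattern.compile("\\[Times: user=([0-9.]+) sys=([0-9.]+), real=([0-9.]+) secs\\]");

        public static void main(String[] args) throws Exception {
            BufferedReader in = new BufferedReader(new FileReader(args[0]));
            String line;
            while ((line = in.readLine()) != null) {
                if (!line.contains("Full GC")) {
                    continue; // only full collections are of interest here
                }
                Matcher m = TIMES.matcher(line);
                if (m.find()) {
                    double user = Double.parseDouble(m.group(1));
                    double real = Double.parseDouble(m.group(3));
                    // user/real is the "intrinsic parallelism" of the pause,
                    // not the speed-up relative to the serial old collector
                    System.out.printf("real=%6.2fs user=%6.2fs user/real=%.1f%n", real, user, user / real);
                }
            }
            in.close();
        }
    }

The speed-up being discussed is a different comparison: the same workload run once with -XX:+UseParallelOldGC and once with -XX:-UseParallelOldGC, comparing the real (wall-clock) pause times.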
> However, that's a misleading way to define speed-up because > (for all that the user cares about) all of that parallel work may be > overhead of the parallel algorithm > so that the bottom-line speed-up disappears. Rather, Simon and Leon, you > want to compare > the wall-clock pause-time seen with parallel old with that seen with > serial old (which i believe > is what Leon may have been referring to) which is how speed-up should be > defined when > comparing a parallel algorithm with a serial couterpart. > > Leon, in the past we observed (and you will likely find some discussion in > the archives) that > a particular phase called the "deferred updates" phase was taking a bulk > of the time > when we encountered longer pauses with parallel old. That's phase when > work is done > single-threaded and would exhibit lower parallelism. Typically, but not > always, this > would happen during the full gc pauses during which maximal compaction was > forced. > (This is done by default during the first and every 20 subsequent full > collections -- or so.) > We worked around that by turning off maximal compaction and letting the > dense prefix > alone. > > I believe a bug may have been filed following that discussion and it had > been my intention to > try and fix it (per discussion on the list). Unfortunately, other matters > intervened and I was > unable to get back to that work. > > PrintParallelGC{Task,Phase}Times (i think) will give you more visibility > into the various phases etc. and > might help you diagnose the performance issue. > > -- ramki > > >> says there is a parallel speed up, however, so I'll let you investigate >> you application >> and leave it at that. >> >> Jon >> >> >> On 4/18/2012 9:27 AM, the.6th.month at gmail.com wrote: >> > Hi, Jon, >> > yup,,,I know, but what is weird is the paroldgen doesn't bring about >> better >> > full gc performance as seen from JMX metrics but bring unexpected swap >> > consumption. >> > I am gonna look into my application instead for some inspiration. >> > >> > Leon >> > >> > On 19 April 2012 00:19, Jon Masamitsu wrote: >> > >> >> ** >> >> Leon, >> >> >> >> In this log you see as part of an entry "PSOldGen:" which says you're >> >> using the serial mark sweep. I see in your later posts that >> "ParOldGen:" >> >> appears in your log and that is the parallel mark sweep collector. >> >> >> >> Jon >> >> >> >> >> >> On 4/18/2012 1:58 AM, the.6th.month at gmail.com wrote: >> >> >> >> Hi, Simon: >> >> >> >> this is the full gc log for your concern. >> >> 2012-04-18T16:47:24.824+0800: 988.392: [GC >> >> Desired survivor size 14876672 bytes, new threshold 1 (max 15) >> >> [PSYoungGen: 236288K->8126K(247616K)] 4054802K->3830711K(4081472K), >> >> 0.0512250 secs] [Times: user=0.15 sys=0.00, real=0.05 secs] >> >> >> >> 2012-04-18T16:47:24.875+0800: 988.443: [Full GC [PSYoungGen: >> >> 8126K->0K(247616K)] [PSOldGen: 3822585K->1751429K(3833856K)] >> >> 3830711K->1751429K(4081472K) [PSPermGen: 81721K->81721K(262144K)], >> >> 6.6108630 secs] [Times: user=6.62 sys=0.00, real=6.61 secs] >> >> >> >> the full gc time is almost unchanged since I enabled paralleloldgc. >> >> >> >> Do you have any recommendation for an appropriate young gen size? 
>> >> >> >> Thanks >> >> >> >> All the best, >> >> Leon >> >> >> >> >> >> On 18 April 2012 16:24, Simone Bordet < >> sbordet at intalio.com> wrote: >> >> >> >> >> >> Hi, >> >> >> >> On Wed, Apr 18, 2012 at 10:16, the.6th.month at gmail.com< >> the.6th.month at gmail.com> wrote: >> >> >> >> hi all: >> >> I'm currently using jdk 6u26. I just enabled UseParallelOldGC, >> expecting >> >> that would enhance the full gc efficiency and decrease the mark-sweep >> >> >> >> time >> >> >> >> by using multiple-core. The JAVA_OPTS is as below: >> >> -XX:+PrintGCDetails -XX:+PrintGCDateStamps >> -XX:+PrintTenuringDistribution >> >> -Xloggc:gc.log-server -Xms4000m -Xmx4000m -Xss256k -Xmn256m >> >> -XX:PermSize=256m -XX:+UseParallelOldGC -server >> >> -Dorg.apache.jasper.compiler.Parser.STRICT_QUOTE_ESCAPING=false >> >> as shown in jinfo output, the settings have taken effect, and the >> >> ParallelGCThreads is 4 since the jvm is running on a four-core server. >> >> But what's strange is that the mark-sweep time remains almost unchanged >> >> >> >> (at >> >> >> >> around 6-8 seconds), do I miss something here? Does anyone have the >> same >> >> experience or any idea about the reason behind? >> >> Thanks very much for help >> >> >> >> The young generation is fairly small for a 4GiB heap. >> >> >> >> Can we see the lines you mention from the logs ? >> >> >> >> Simon >> >> --http://cometd.orghttp://intalio.comhttp://bordet.blogspot.com >> >> ---- >> >> Finally, no matter how good the architecture and design are, >> >> to deliver bug-free software with optimal performance and reliability, >> >> the implementation technique must be flawless. Victoria Livschitz >> >> _______________________________________________ >> >> hotspot-gc-use mailing listhotspot-gc-use at openjdk.java.nethttp:// >> mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> >> >> >> >> _______________________________________________ >> >> hotspot-gc-use mailing listhotspot-gc-use at openjdk.java.nethttp:// >> mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> >> >> >> >> _______________________________________________ >> >> hotspot-gc-use mailing list >> >> hotspot-gc-use at openjdk.java.net >> >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> >> >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120419/f51227da/attachment-0001.html From ysr1729 at gmail.com Fri Apr 20 02:44:26 2012 From: ysr1729 at gmail.com (Srinivas Ramakrishna) Date: Fri, 20 Apr 2012 02:44:26 -0700 Subject: does UseParallelOldGC guarantee a better full gc performance In-Reply-To: References: <4F8EE993.8030502@oracle.com> <4F8EFB9F.5030404@oracle.com> Message-ID: BTW, max compaction doesn't happen every time, i think it happens in the 4th gc and then every 20th gc or so. It;s those occasional gc's that would be impacted. (And that had been our experience with generally good performance but the occasional much slower pause. Don't know if your experience is similar.) No I don't think excessive deadwood is an issue. 
What is an issue is how well this keeps up, since in general the incidence of the deferred updates phase may be affected by the number and size of the deferred objects and their oop-richness, so I am not sure how good a mitigant avoiding maximal compaction is for long-lived JVM's with churn of latge objects in the old gen. -- ramki On Thu, Apr 19, 2012 at 1:51 AM, the.6th.month at gmail.com < the.6th.month at gmail.com> wrote: > hi, Srinivas: > that explains, i do observe that no performance gain has been obtained > thru par old gc via the jmx mark_sweep_time (i have a monitoring system > collecting that and print out with rrdtool). hopefully that's the result of > maximum compaction, but i am keen to ask whether it will bring about any > negative impact on performance, like leaving lots of fragmentations > unreclaimed. > > all th best > Leon > On Apr 19, 2012 4:07 AM, "Srinivas Ramakrishna" wrote: > >> >> >> On Wed, Apr 18, 2012 at 10:36 AM, Jon Masamitsu > > wrote: >> >>> Leon, >>> >>> I don't think I've actually seen logs with the same flags except changing >>> parallel old for serial old so hard for me to say. Simon's comment >>> >>> > Well, maybe. But it shows that the parallel collector does its work, >>> > since you had a 41.91/13.06 = 3.2x gain on your 4 cores. >>> >> >> I think Simon's "speed up" is a bit misleading. He shows that the >> wall-time of 13.06 s >> does user time eqvt work worth 41.91 seconds, so indeed a lot of >> user-level work is >> done in those 13.06 seconds. I'd call that "intrinsic parallelism" rather >> than speed-up. >> However, that's a misleading way to define speed-up because >> (for all that the user cares about) all of that parallel work may be >> overhead of the parallel algorithm >> so that the bottom-line speed-up disappears. Rather, Simon and Leon, you >> want to compare >> the wall-clock pause-time seen with parallel old with that seen with >> serial old (which i believe >> is what Leon may have been referring to) which is how speed-up should be >> defined when >> comparing a parallel algorithm with a serial couterpart. >> >> Leon, in the past we observed (and you will likely find some discussion >> in the archives) that >> a particular phase called the "deferred updates" phase was taking a bulk >> of the time >> when we encountered longer pauses with parallel old. That's phase when >> work is done >> single-threaded and would exhibit lower parallelism. Typically, but not >> always, this >> would happen during the full gc pauses during which maximal compaction >> was forced. >> (This is done by default during the first and every 20 subsequent full >> collections -- or so.) >> We worked around that by turning off maximal compaction and letting the >> dense prefix >> alone. >> >> I believe a bug may have been filed following that discussion and it had >> been my intention to >> try and fix it (per discussion on the list). Unfortunately, other matters >> intervened and I was >> unable to get back to that work. >> >> PrintParallelGC{Task,Phase}Times (i think) will give you more visibility >> into the various phases etc. and >> might help you diagnose the performance issue. >> >> -- ramki >> >> >>> says there is a parallel speed up, however, so I'll let you investigate >>> you application >>> and leave it at that. 
>>> >>> Jon >>> >>> >>> On 4/18/2012 9:27 AM, the.6th.month at gmail.com wrote: >>> > Hi, Jon, >>> > yup,,,I know, but what is weird is the paroldgen doesn't bring about >>> better >>> > full gc performance as seen from JMX metrics but bring unexpected swap >>> > consumption. >>> > I am gonna look into my application instead for some inspiration. >>> > >>> > Leon >>> > >>> > On 19 April 2012 00:19, Jon Masamitsu >>> wrote: >>> > >>> >> ** >>> >> Leon, >>> >> >>> >> In this log you see as part of an entry "PSOldGen:" which says you're >>> >> using the serial mark sweep. I see in your later posts that >>> "ParOldGen:" >>> >> appears in your log and that is the parallel mark sweep collector. >>> >> >>> >> Jon >>> >> >>> >> >>> >> On 4/18/2012 1:58 AM, the.6th.month at gmail.com wrote: >>> >> >>> >> Hi, Simon: >>> >> >>> >> this is the full gc log for your concern. >>> >> 2012-04-18T16:47:24.824+0800: 988.392: [GC >>> >> Desired survivor size 14876672 bytes, new threshold 1 (max 15) >>> >> [PSYoungGen: 236288K->8126K(247616K)] 4054802K->3830711K(4081472K), >>> >> 0.0512250 secs] [Times: user=0.15 sys=0.00, real=0.05 secs] >>> >> >>> >> 2012-04-18T16:47:24.875+0800: 988.443: [Full GC [PSYoungGen: >>> >> 8126K->0K(247616K)] [PSOldGen: 3822585K->1751429K(3833856K)] >>> >> 3830711K->1751429K(4081472K) [PSPermGen: 81721K->81721K(262144K)], >>> >> 6.6108630 secs] [Times: user=6.62 sys=0.00, real=6.61 secs] >>> >> >>> >> the full gc time is almost unchanged since I enabled paralleloldgc. >>> >> >>> >> Do you have any recommendation for an appropriate young gen size? >>> >> >>> >> Thanks >>> >> >>> >> All the best, >>> >> Leon >>> >> >>> >> >>> >> On 18 April 2012 16:24, Simone Bordet < >>> sbordet at intalio.com> wrote: >>> >> >>> >> >>> >> Hi, >>> >> >>> >> On Wed, Apr 18, 2012 at 10:16, the.6th.month at gmail.com< >>> the.6th.month at gmail.com> wrote: >>> >> >>> >> hi all: >>> >> I'm currently using jdk 6u26. I just enabled UseParallelOldGC, >>> expecting >>> >> that would enhance the full gc efficiency and decrease the mark-sweep >>> >> >>> >> time >>> >> >>> >> by using multiple-core. The JAVA_OPTS is as below: >>> >> -XX:+PrintGCDetails -XX:+PrintGCDateStamps >>> -XX:+PrintTenuringDistribution >>> >> -Xloggc:gc.log-server -Xms4000m -Xmx4000m -Xss256k -Xmn256m >>> >> -XX:PermSize=256m -XX:+UseParallelOldGC -server >>> >> -Dorg.apache.jasper.compiler.Parser.STRICT_QUOTE_ESCAPING=false >>> >> as shown in jinfo output, the settings have taken effect, and the >>> >> ParallelGCThreads is 4 since the jvm is running on a four-core server. >>> >> But what's strange is that the mark-sweep time remains almost >>> unchanged >>> >> >>> >> (at >>> >> >>> >> around 6-8 seconds), do I miss something here? Does anyone have the >>> same >>> >> experience or any idea about the reason behind? >>> >> Thanks very much for help >>> >> >>> >> The young generation is fairly small for a 4GiB heap. >>> >> >>> >> Can we see the lines you mention from the logs ? >>> >> >>> >> Simon >>> >> --http://cometd.orghttp://intalio.comhttp://bordet.blogspot.com >>> >> ---- >>> >> Finally, no matter how good the architecture and design are, >>> >> to deliver bug-free software with optimal performance and reliability, >>> >> the implementation technique must be flawless. 
Victoria Livschitz >>> >> _______________________________________________ >>> >> hotspot-gc-use mailing listhotspot-gc-use at openjdk.java.nethttp:// >>> mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>> >> >>> >> >>> >> _______________________________________________ >>> >> hotspot-gc-use mailing listhotspot-gc-use at openjdk.java.nethttp:// >>> mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>> >> >>> >> >>> >> _______________________________________________ >>> >> hotspot-gc-use mailing list >>> >> hotspot-gc-use at openjdk.java.net >>> >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>> >> >>> >> >>> _______________________________________________ >>> hotspot-gc-use mailing list >>> hotspot-gc-use at openjdk.java.net >>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120420/9cb8195e/attachment.html From the.6th.month at gmail.com Fri Apr 20 08:01:39 2012 From: the.6th.month at gmail.com (the.6th.month at gmail.com) Date: Fri, 20 Apr 2012 23:01:39 +0800 Subject: does UseParallelOldGC guarantee a better full gc performance In-Reply-To: References: <4F8EE993.8030502@oracle.com> <4F8EFB9F.5030404@oracle.com> Message-ID: Hi, Srinivas: Can you explain more about "since in general the incidence of the deferred updates phase may be affected by the number and size of the deferred objects and their oop-richness". I don't quite understand what it means and if it doesn't bother you too much, can you possible give some explanations about what a deferred object means. Thanks a million. All the best, Leon On 20 April 2012 17:44, Srinivas Ramakrishna wrote: > BTW, max compaction doesn't happen every time, i think it happens in the > 4th gc and then every 20th gc or so. > It;s those occasional gc's that would be impacted. (And that had been our > experience with generally good performance > but the occasional much slower pause. Don't know if your experience is > similar.) > > No I don't think excessive deadwood is an issue. What is an issue is how > well this keeps up, > since in general the incidence of the deferred updates phase may be > affected by the number and > size of the deferred objects and their oop-richness, so I am not sure how > good a mitigant > avoiding maximal compaction is for long-lived JVM's with churn of latge > objects in the old > gen. > > -- ramki > > > On Thu, Apr 19, 2012 at 1:51 AM, the.6th.month at gmail.com < > the.6th.month at gmail.com> wrote: > >> hi, Srinivas: >> that explains, i do observe that no performance gain has been obtained >> thru par old gc via the jmx mark_sweep_time (i have a monitoring system >> collecting that and print out with rrdtool). hopefully that's the result of >> maximum compaction, but i am keen to ask whether it will bring about any >> negative impact on performance, like leaving lots of fragmentations >> unreclaimed. >> >> all th best >> Leon >> On Apr 19, 2012 4:07 AM, "Srinivas Ramakrishna" >> wrote: >> >>> >>> >>> On Wed, Apr 18, 2012 at 10:36 AM, Jon Masamitsu < >>> jon.masamitsu at oracle.com> wrote: >>> >>>> Leon, >>>> >>>> I don't think I've actually seen logs with the same flags except >>>> changing >>>> parallel old for serial old so hard for me to say. Simon's comment >>>> >>>> > Well, maybe. But it shows that the parallel collector does its work, >>>> > since you had a 41.91/13.06 = 3.2x gain on your 4 cores. 
>>>> >>> >>> I think Simon's "speed up" is a bit misleading. He shows that the >>> wall-time of 13.06 s >>> does user time eqvt work worth 41.91 seconds, so indeed a lot of >>> user-level work is >>> done in those 13.06 seconds. I'd call that "intrinsic parallelism" >>> rather than speed-up. >>> However, that's a misleading way to define speed-up because >>> (for all that the user cares about) all of that parallel work may be >>> overhead of the parallel algorithm >>> so that the bottom-line speed-up disappears. Rather, Simon and Leon, you >>> want to compare >>> the wall-clock pause-time seen with parallel old with that seen with >>> serial old (which i believe >>> is what Leon may have been referring to) which is how speed-up should be >>> defined when >>> comparing a parallel algorithm with a serial couterpart. >>> >>> Leon, in the past we observed (and you will likely find some discussion >>> in the archives) that >>> a particular phase called the "deferred updates" phase was taking a bulk >>> of the time >>> when we encountered longer pauses with parallel old. That's phase when >>> work is done >>> single-threaded and would exhibit lower parallelism. Typically, but not >>> always, this >>> would happen during the full gc pauses during which maximal compaction >>> was forced. >>> (This is done by default during the first and every 20 subsequent full >>> collections -- or so.) >>> We worked around that by turning off maximal compaction and letting the >>> dense prefix >>> alone. >>> >>> I believe a bug may have been filed following that discussion and it had >>> been my intention to >>> try and fix it (per discussion on the list). Unfortunately, other >>> matters intervened and I was >>> unable to get back to that work. >>> >>> PrintParallelGC{Task,Phase}Times (i think) will give you more visibility >>> into the various phases etc. and >>> might help you diagnose the performance issue. >>> >>> -- ramki >>> >>> >>>> says there is a parallel speed up, however, so I'll let you investigate >>>> you application >>>> and leave it at that. >>>> >>>> Jon >>>> >>>> >>>> On 4/18/2012 9:27 AM, the.6th.month at gmail.com wrote: >>>> > Hi, Jon, >>>> > yup,,,I know, but what is weird is the paroldgen doesn't bring about >>>> better >>>> > full gc performance as seen from JMX metrics but bring unexpected swap >>>> > consumption. >>>> > I am gonna look into my application instead for some inspiration. >>>> > >>>> > Leon >>>> > >>>> > On 19 April 2012 00:19, Jon Masamitsu >>>> wrote: >>>> > >>>> >> ** >>>> >> Leon, >>>> >> >>>> >> In this log you see as part of an entry "PSOldGen:" which says you're >>>> >> using the serial mark sweep. I see in your later posts that >>>> "ParOldGen:" >>>> >> appears in your log and that is the parallel mark sweep collector. >>>> >> >>>> >> Jon >>>> >> >>>> >> >>>> >> On 4/18/2012 1:58 AM, the.6th.month at gmail.com wrote: >>>> >> >>>> >> Hi, Simon: >>>> >> >>>> >> this is the full gc log for your concern. 
>>>> >> 2012-04-18T16:47:24.824+0800: 988.392: [GC >>>> >> Desired survivor size 14876672 bytes, new threshold 1 (max 15) >>>> >> [PSYoungGen: 236288K->8126K(247616K)] 4054802K->3830711K(4081472K), >>>> >> 0.0512250 secs] [Times: user=0.15 sys=0.00, real=0.05 secs] >>>> >> >>>> >> 2012-04-18T16:47:24.875+0800: 988.443: [Full GC [PSYoungGen: >>>> >> 8126K->0K(247616K)] [PSOldGen: 3822585K->1751429K(3833856K)] >>>> >> 3830711K->1751429K(4081472K) [PSPermGen: 81721K->81721K(262144K)], >>>> >> 6.6108630 secs] [Times: user=6.62 sys=0.00, real=6.61 secs] >>>> >> >>>> >> the full gc time is almost unchanged since I enabled paralleloldgc. >>>> >> >>>> >> Do you have any recommendation for an appropriate young gen size? >>>> >> >>>> >> Thanks >>>> >> >>>> >> All the best, >>>> >> Leon >>>> >> >>>> >> >>>> >> On 18 April 2012 16:24, Simone Bordet < >>>> sbordet at intalio.com> wrote: >>>> >> >>>> >> >>>> >> Hi, >>>> >> >>>> >> On Wed, Apr 18, 2012 at 10:16, the.6th.month at gmail.com< >>>> the.6th.month at gmail.com> wrote: >>>> >> >>>> >> hi all: >>>> >> I'm currently using jdk 6u26. I just enabled UseParallelOldGC, >>>> expecting >>>> >> that would enhance the full gc efficiency and decrease the mark-sweep >>>> >> >>>> >> time >>>> >> >>>> >> by using multiple-core. The JAVA_OPTS is as below: >>>> >> -XX:+PrintGCDetails -XX:+PrintGCDateStamps >>>> -XX:+PrintTenuringDistribution >>>> >> -Xloggc:gc.log-server -Xms4000m -Xmx4000m -Xss256k -Xmn256m >>>> >> -XX:PermSize=256m -XX:+UseParallelOldGC -server >>>> >> -Dorg.apache.jasper.compiler.Parser.STRICT_QUOTE_ESCAPING=false >>>> >> as shown in jinfo output, the settings have taken effect, and the >>>> >> ParallelGCThreads is 4 since the jvm is running on a four-core >>>> server. >>>> >> But what's strange is that the mark-sweep time remains almost >>>> unchanged >>>> >> >>>> >> (at >>>> >> >>>> >> around 6-8 seconds), do I miss something here? Does anyone have >>>> the same >>>> >> experience or any idea about the reason behind? >>>> >> Thanks very much for help >>>> >> >>>> >> The young generation is fairly small for a 4GiB heap. >>>> >> >>>> >> Can we see the lines you mention from the logs ? >>>> >> >>>> >> Simon >>>> >> --http://cometd.orghttp://intalio.comhttp://bordet.blogspot.com >>>> >> ---- >>>> >> Finally, no matter how good the architecture and design are, >>>> >> to deliver bug-free software with optimal performance and >>>> reliability, >>>> >> the implementation technique must be flawless. Victoria Livschitz >>>> >> _______________________________________________ >>>> >> hotspot-gc-use mailing listhotspot-gc-use at openjdk.java.nethttp:// >>>> mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>>> >> >>>> >> >>>> >> _______________________________________________ >>>> >> hotspot-gc-use mailing listhotspot-gc-use at openjdk.java.nethttp:// >>>> mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>>> >> >>>> >> >>>> >> _______________________________________________ >>>> >> hotspot-gc-use mailing list >>>> >> hotspot-gc-use at openjdk.java.net >>>> >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>>> >> >>>> >> >>>> _______________________________________________ >>>> hotspot-gc-use mailing list >>>> hotspot-gc-use at openjdk.java.net >>>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>>> >>> >>> > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120420/6cbe0357/attachment-0001.html From aaisinzon at guidewire.com Fri Apr 20 09:24:08 2012 From: aaisinzon at guidewire.com (Alex Aisinzon) Date: Fri, 20 Apr 2012 16:24:08 +0000 Subject: Code cache References: <43E49E6EC0E84F41B98C68AB6D7820C4170FA028@sm-ex-02-vm.guidewire.com> <4320BF98-561E-43AC-85BC-9E291108AD9B@amd.com> Message-ID: <43E49E6EC0E84F41B98C68AB6D7820C41710145F@sm-ex-02-vm.guidewire.com> Eric I tried -XX:+UseCodeCacheFlushing and associated performance/scalability was markedly poorer than increasing the code cache. I will stick to tuning the code cache. Best Alex A -----Original Message----- From: Alex Aisinzon Sent: Thursday, April 12, 2012 1:31 PM To: 'Eric Caspole' Cc: hotspot-gc-use at openjdk.java.net Subject: RE: Code cache Hi Eric I thank you for the feedback. I will give this tuning a try. I have explored another approach: I have added the option -XX:+PrintCompilation to track code compilation. This option is not very documented. I could infer that, without a larger code cache, about 11000 methods were compiled before hitting the issue. When using a much larger cache (512MB), I saw that about 14000 methods were compiled. My understanding is that the code cache is 48MB for the platform I used (x64). A 14000/11000*48MB aka 61MB cache is likely to avoid the issue. I have started a performance test with a 64MB code cache to see if that indeed avoids the code cache full issue. If so, I would have a method to find the right code cache size. I will report when I have the results. I will also report if -XX:+UseCodeCacheFlushing option provides similar results to the larger code cache. As for your question on why our app is hitting this issue: our applications has become heavier in its use of compiled code so this is likely the consequence of that. Best Alex A -----Original Message----- From: Eric Caspole [mailto:eric.caspole at amd.com] Sent: Thursday, April 12, 2012 12:26 PM To: Alex Aisinzon Cc: hotspot-gc-use at openjdk.java.net Subject: Re: Code cache Hi Alex, You can try -XX:+UseCodeCacheFlushing where the JVM will selectively age out some compiled code and free up code cache space. This is not on by default in JDK 6 as far as I know. What is your application doing such that it frequently hits this problem? Regards, Eric On Apr 12, 2012, at 3:15 PM, Alex Aisinzon wrote: > Any feedback on this? > > > > Best > > > > Alex A > > > > From: Alex Aisinzon > Sent: Monday, April 09, 2012 11:38 AM > To: 'hotspot-gc-use at openjdk.java.net' > Subject: Code cache > > > > I ran performance tests on one of our apps and saw the following > error message in the GC logs: > > Java HotSpot(TM) 64-Bit Server VM warning: CodeCache is full. > Compiler has been disabled. > > Java HotSpot(TM) 64-Bit Server VM warning: Try increasing the code > cache size using -XX:ReservedCodeCacheSize= > > > > I scaled up the code cache to 512MB (- > XX:ReservedCodeCacheSize=512m) and markedly improved performance/ > scalability. > > > > I have a few questions: > > * Is there a logging option that shows how much of the code > cache is really used so that I find the right cache size without > oversizing it? > > * What factors play into the code cache utilization? I > would guess that the amount of code to compile is the dominant > factor. Are there other factors like load: I would guess that some > entries in the cache may get invalidated if not used much and load > could be a factor in this. 
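As an aside on the question above about how much of the code cache is really used: besides log-based options, the occupancy can be read at runtime through the standard java.lang.management API, since HotSpot of this vintage exposes the code cache as a non-heap memory pool named "Code Cache" (the pool name is VM-specific, so treat it as an assumption to verify). A minimal sketch:

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryPoolMXBean;
    import java.lang.management.MemoryUsage;

    public class CodeCachePeek {
        public static void main(String[] args) {
            for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
                // HotSpot publishes a non-heap pool named "Code Cache"; other VMs may not.
                if (pool.getName().contains("Code Cache")) {
                    MemoryUsage u = pool.getUsage();
                    System.out.printf("code cache: used=%dK committed=%dK max=%dK peak used=%dK%n",
                            u.getUsed() / 1024, u.getCommitted() / 1024, u.getMax() / 1024,
                            pool.getPeakUsage().getUsed() / 1024);
                }
            }
        }
    }

Watching the peak used value over a full test run gives a floor for -XX:ReservedCodeCacheSize without having to oversize to 512 MB up front.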
> > > > I was running on Sun JVM 1.6 update 30 64 bit on x86-64. > > > > Best > > > > Alex A > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use From aaisinzon at guidewire.com Fri Apr 20 09:26:10 2012 From: aaisinzon at guidewire.com (Alex Aisinzon) Date: Fri, 20 Apr 2012 16:26:10 +0000 Subject: G1 evolution/maturing Message-ID: <43E49E6EC0E84F41B98C68AB6D7820C417101474@sm-ex-02-vm.guidewire.com> Hi all I still see a lot of discussions around CMS. G1 is supposed to solve some of CMS's issues/limitations, namely fragmentation. I gave G1 a try about a year ago and it seemed not yet ready. Has G1 evolved much in this last year and, if so, which release should I try with? Best Alex Aisinzon -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120420/5a21901b/attachment.html From jon.masamitsu at oracle.com Fri Apr 20 10:05:38 2012 From: jon.masamitsu at oracle.com (Jon Masamitsu) Date: Fri, 20 Apr 2012 10:05:38 -0700 Subject: G1 evolution/maturing In-Reply-To: <43E49E6EC0E84F41B98C68AB6D7820C417101474@sm-ex-02-vm.guidewire.com> References: <43E49E6EC0E84F41B98C68AB6D7820C417101474@sm-ex-02-vm.guidewire.com> Message-ID: <4F919762.8010901@oracle.com> Alex, Over the last year there has been work to make G1 more stable, move more work to the concurrent phases, simplify some code to improve performance and adjust G1 policy for choosing regions to collect. Large heaps have typically been used in measuring performance so there is that bias in the improvements (meaning we probably don't have good numbers on how much performance with smaller heaps have changed). 7u4 is the release to try. Jon On 04/20/12 09:26, Alex Aisinzon wrote: > > Hi all > > I still see a lot of discussions around CMS. G1 is supposed to solve > some of CMS's issues/limitations, namely fragmentation. > > I gave G1 a try about a year ago and it seemed not yet ready. > > Has G1 evolved much in this last year and, if so, which release should > I try with? > > Best > > Alex Aisinzon > > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120420/82dcc3b6/attachment.html From eric.caspole at amd.com Fri Apr 20 10:34:13 2012 From: eric.caspole at amd.com (Eric Caspole) Date: Fri, 20 Apr 2012 13:34:13 -0400 Subject: Code cache In-Reply-To: <43E49E6EC0E84F41B98C68AB6D7820C41710145F@sm-ex-02-vm.guidewire.com> References: <43E49E6EC0E84F41B98C68AB6D7820C4170FA028@sm-ex-02-vm.guidewire.com> <4320BF98-561E-43AC-85BC-9E291108AD9B@amd.com> <43E49E6EC0E84F41B98C68AB6D7820C41710145F@sm-ex-02-vm.guidewire.com> Message-ID: <9B919A33-BA8D-4960-995F-2191747CE157@amd.com> Yes, if your live working set size of compiled methods is bigger or very close to the code cache size then +UseCodeCacheFlushing won't really help, because it will keep trying to recompile the methods and throw them away over and over. On Apr 20, 2012, at 12:24 PM, Alex Aisinzon wrote: > Eric > > I tried -XX:+UseCodeCacheFlushing and associated performance/ > scalability was markedly poorer than increasing the code cache. > I will stick to tuning the code cache. 
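For what it's worth, the sizing estimate described in the quoted messages (scale the platform default by the ratio of compiled-method counts seen with and without a roomy cache) is plain proportion arithmetic. A throwaway sketch using only the numbers quoted in this thread, purely to spell the heuristic out:

    public class CodeCacheEstimate {
        public static void main(String[] args) {
            double defaultCacheMb = 48;          // default ReservedCodeCacheSize quoted for x64 in this thread
            double methodsAtDefault = 11000;     // methods compiled before "CodeCache is full" was hit
            double methodsUnconstrained = 14000; // methods compiled once the 512 MB cache removed the ceiling
            double estimateMb = methodsUnconstrained / methodsAtDefault * defaultCacheMb;
            // ~61 MB, which is why the test described above rounds up to -XX:ReservedCodeCacheSize=64m
            System.out.printf("estimated code cache need: ~%.0f MB%n", estimateMb);
        }
    }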
> > Best > > Alex A > > -----Original Message----- > From: Alex Aisinzon > Sent: Thursday, April 12, 2012 1:31 PM > To: 'Eric Caspole' > Cc: hotspot-gc-use at openjdk.java.net > Subject: RE: Code cache > > Hi Eric > > I thank you for the feedback. I will give this tuning a try. > I have explored another approach: I have added the option -XX: > +PrintCompilation to track code compilation. > This option is not very documented. I could infer that, without a > larger code cache, about 11000 methods were compiled before hitting > the issue. > When using a much larger cache (512MB), I saw that about 14000 > methods were compiled. > My understanding is that the code cache is 48MB for the platform I > used (x64). A 14000/11000*48MB aka 61MB cache is likely to avoid > the issue. I have started a performance test with a 64MB code cache > to see if that indeed avoids the code cache full issue. > > If so, I would have a method to find the right code cache size. > I will report when I have the results. I will also report if -XX: > +UseCodeCacheFlushing option provides similar results to the larger > code cache. > > As for your question on why our app is hitting this issue: our > applications has become heavier in its use of compiled code so this > is likely the consequence of that. > > Best > > Alex A > > -----Original Message----- > From: Eric Caspole [mailto:eric.caspole at amd.com] > Sent: Thursday, April 12, 2012 12:26 PM > To: Alex Aisinzon > Cc: hotspot-gc-use at openjdk.java.net > Subject: Re: Code cache > > Hi Alex, > You can try -XX:+UseCodeCacheFlushing where the JVM will selectively > age out some compiled code and free up code cache space. This is not > on by default in JDK 6 as far as I know. > > What is your application doing such that it frequently hits this > problem? > > Regards, > Eric > > > On Apr 12, 2012, at 3:15 PM, Alex Aisinzon wrote: > >> Any feedback on this? >> >> >> >> Best >> >> >> >> Alex A >> >> >> >> From: Alex Aisinzon >> Sent: Monday, April 09, 2012 11:38 AM >> To: 'hotspot-gc-use at openjdk.java.net' >> Subject: Code cache >> >> >> >> I ran performance tests on one of our apps and saw the following >> error message in the GC logs: >> >> Java HotSpot(TM) 64-Bit Server VM warning: CodeCache is full. >> Compiler has been disabled. >> >> Java HotSpot(TM) 64-Bit Server VM warning: Try increasing the code >> cache size using -XX:ReservedCodeCacheSize= >> >> >> >> I scaled up the code cache to 512MB (- >> XX:ReservedCodeCacheSize=512m) and markedly improved performance/ >> scalability. >> >> >> >> I have a few questions: >> >> * Is there a logging option that shows how much of the code >> cache is really used so that I find the right cache size without >> oversizing it? >> >> * What factors play into the code cache utilization? I >> would guess that the amount of code to compile is the dominant >> factor. Are there other factors like load: I would guess that some >> entries in the cache may get invalidated if not used much and load >> could be a factor in this. >> >> >> >> I was running on Sun JVM 1.6 update 30 64 bit on x86-64. 
>> >> >> >> Best >> >> >> >> Alex A >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > > From taras.tielkes at gmail.com Fri Apr 20 12:46:44 2012 From: taras.tielkes at gmail.com (Taras Tielkes) Date: Fri, 20 Apr 2012 21:46:44 +0200 Subject: Faster card marking: chances for Java 6 backport Message-ID: Hi, Are there plans to port RFE 7068625 to Java 6? Thanks, -tt From jon.masamitsu at oracle.com Fri Apr 20 15:42:21 2012 From: jon.masamitsu at oracle.com (Jon Masamitsu) Date: Fri, 20 Apr 2012 15:42:21 -0700 Subject: Faster card marking: chances for Java 6 backport In-Reply-To: References: Message-ID: <4F91E64D.1070509@oracle.com> Taras, I haven't heard any discussions about a backport. I think it's a issue that the sustaining organization would have to consider (since it's to jdk6). Jon On 4/20/2012 12:46 PM, Taras Tielkes wrote: > Hi, > > Are there plans to port RFE 7068625 to Java 6? > > Thanks, > -tt > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use From ysr1729 at gmail.com Fri Apr 20 16:20:21 2012 From: ysr1729 at gmail.com (Srinivas Ramakrishna) Date: Fri, 20 Apr 2012 16:20:21 -0700 Subject: does UseParallelOldGC guarantee a better full gc performance In-Reply-To: References: <4F8EE993.8030502@oracle.com> <4F8EFB9F.5030404@oracle.com> Message-ID: Hi Leon -- (sorry for overloading standard replicated database terminology here which may have confused you.) Here's the relevant explanation from Peter Kessler:- http://markmail.org/message/fhoffb4ksczxk26q The URL also contains the discussion earlier this year on this list that I had alluded to before. -- ramki On Fri, Apr 20, 2012 at 8:01 AM, the.6th.month at gmail.com < the.6th.month at gmail.com> wrote: > Hi, Srinivas: > Can you explain more about "since in general the incidence of the deferred > updates phase may be affected by the number and size of the deferred > objects and their oop-richness". I don't quite understand what it means and > if it doesn't bother you too much, can you possible give some explanations > about what a deferred object means. > Thanks a million. > > All the best, > Leon > > > On 20 April 2012 17:44, Srinivas Ramakrishna wrote: > >> BTW, max compaction doesn't happen every time, i think it happens in the >> 4th gc and then every 20th gc or so. >> It;s those occasional gc's that would be impacted. (And that had been our >> experience with generally good performance >> but the occasional much slower pause. Don't know if your experience is >> similar.) >> >> No I don't think excessive deadwood is an issue. What is an issue is how >> well this keeps up, >> since in general the incidence of the deferred updates phase may be >> affected by the number and >> size of the deferred objects and their oop-richness, so I am not sure how >> good a mitigant >> avoiding maximal compaction is for long-lived JVM's with churn of latge >> objects in the old >> gen. >> >> -- ramki >> >> >> On Thu, Apr 19, 2012 at 1:51 AM, the.6th.month at gmail.com < >> the.6th.month at gmail.com> wrote: >> >>> hi, Srinivas: >>> that explains, i do observe that no performance gain has been obtained >>> thru par old gc via the jmx mark_sweep_time (i have a monitoring system >>> collecting that and print out with rrdtool). 
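The "jmx mark_sweep_time" metric mentioned here comes from the standard GarbageCollectorMXBeans, which is also the easiest thing to feed into rrdtool/cacti style graphs. A minimal polling sketch; note that the bean names depend on the collectors in use ("PS Scavenge"/"PS MarkSweep" with the throughput collector, "ParNew"/"ConcurrentMarkSweep" with CMS), so treat the names as assumptions to check against your own JVM:

    import java.lang.management.GarbageCollectorMXBean;
    import java.lang.management.ManagementFactory;

    public class GcTimePoller {
        public static void main(String[] args) throws InterruptedException {
            while (true) {
                for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                    // getCollectionTime() is the cumulative wall-clock time (ms) spent in this collector
                    System.out.printf("%s: count=%d time=%dms%n",
                            gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
                }
                Thread.sleep(60000); // sample once a minute; graph the deltas between samples
            }
        }
    }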
hopefully that's the result of >>> maximum compaction, but i am keen to ask whether it will bring about any >>> negative impact on performance, like leaving lots of fragmentations >>> unreclaimed. >>> >>> all th best >>> Leon >>> On Apr 19, 2012 4:07 AM, "Srinivas Ramakrishna" >>> wrote: >>> >>>> >>>> >>>> On Wed, Apr 18, 2012 at 10:36 AM, Jon Masamitsu < >>>> jon.masamitsu at oracle.com> wrote: >>>> >>>>> Leon, >>>>> >>>>> I don't think I've actually seen logs with the same flags except >>>>> changing >>>>> parallel old for serial old so hard for me to say. Simon's comment >>>>> >>>>> > Well, maybe. But it shows that the parallel collector does its work, >>>>> > since you had a 41.91/13.06 = 3.2x gain on your 4 cores. >>>>> >>>> >>>> I think Simon's "speed up" is a bit misleading. He shows that the >>>> wall-time of 13.06 s >>>> does user time eqvt work worth 41.91 seconds, so indeed a lot of >>>> user-level work is >>>> done in those 13.06 seconds. I'd call that "intrinsic parallelism" >>>> rather than speed-up. >>>> However, that's a misleading way to define speed-up because >>>> (for all that the user cares about) all of that parallel work may be >>>> overhead of the parallel algorithm >>>> so that the bottom-line speed-up disappears. Rather, Simon and Leon, >>>> you want to compare >>>> the wall-clock pause-time seen with parallel old with that seen with >>>> serial old (which i believe >>>> is what Leon may have been referring to) which is how speed-up should >>>> be defined when >>>> comparing a parallel algorithm with a serial couterpart. >>>> >>>> Leon, in the past we observed (and you will likely find some discussion >>>> in the archives) that >>>> a particular phase called the "deferred updates" phase was taking a >>>> bulk of the time >>>> when we encountered longer pauses with parallel old. That's phase when >>>> work is done >>>> single-threaded and would exhibit lower parallelism. Typically, but not >>>> always, this >>>> would happen during the full gc pauses during which maximal compaction >>>> was forced. >>>> (This is done by default during the first and every 20 subsequent full >>>> collections -- or so.) >>>> We worked around that by turning off maximal compaction and letting the >>>> dense prefix >>>> alone. >>>> >>>> I believe a bug may have been filed following that discussion and it >>>> had been my intention to >>>> try and fix it (per discussion on the list). Unfortunately, other >>>> matters intervened and I was >>>> unable to get back to that work. >>>> >>>> PrintParallelGC{Task,Phase}Times (i think) will give you more >>>> visibility into the various phases etc. and >>>> might help you diagnose the performance issue. >>>> >>>> -- ramki >>>> >>>> >>>>> says there is a parallel speed up, however, so I'll let you investigate >>>>> you application >>>>> and leave it at that. >>>>> >>>>> Jon >>>>> >>>>> >>>>> On 4/18/2012 9:27 AM, the.6th.month at gmail.com wrote: >>>>> > Hi, Jon, >>>>> > yup,,,I know, but what is weird is the paroldgen doesn't bring about >>>>> better >>>>> > full gc performance as seen from JMX metrics but bring unexpected >>>>> swap >>>>> > consumption. >>>>> > I am gonna look into my application instead for some inspiration. >>>>> > >>>>> > Leon >>>>> > >>>>> > On 19 April 2012 00:19, Jon Masamitsu >>>>> wrote: >>>>> > >>>>> >> ** >>>>> >> Leon, >>>>> >> >>>>> >> In this log you see as part of an entry "PSOldGen:" which says >>>>> you're >>>>> >> using the serial mark sweep. 
I see in your later posts that >>>>> "ParOldGen:" >>>>> >> appears in your log and that is the parallel mark sweep collector. >>>>> >> >>>>> >> Jon >>>>> >> >>>>> >> >>>>> >> On 4/18/2012 1:58 AM, the.6th.month at gmail.com wrote: >>>>> >> >>>>> >> Hi, Simon: >>>>> >> >>>>> >> this is the full gc log for your concern. >>>>> >> 2012-04-18T16:47:24.824+0800: 988.392: [GC >>>>> >> Desired survivor size 14876672 bytes, new threshold 1 (max 15) >>>>> >> [PSYoungGen: 236288K->8126K(247616K)] >>>>> 4054802K->3830711K(4081472K), >>>>> >> 0.0512250 secs] [Times: user=0.15 sys=0.00, real=0.05 secs] >>>>> >> >>>>> >> 2012-04-18T16:47:24.875+0800: 988.443: [Full GC [PSYoungGen: >>>>> >> 8126K->0K(247616K)] [PSOldGen: 3822585K->1751429K(3833856K)] >>>>> >> 3830711K->1751429K(4081472K) [PSPermGen: 81721K->81721K(262144K)], >>>>> >> 6.6108630 secs] [Times: user=6.62 sys=0.00, real=6.61 secs] >>>>> >> >>>>> >> the full gc time is almost unchanged since I enabled paralleloldgc. >>>>> >> >>>>> >> Do you have any recommendation for an appropriate young gen size? >>>>> >> >>>>> >> Thanks >>>>> >> >>>>> >> All the best, >>>>> >> Leon >>>>> >> >>>>> >> >>>>> >> On 18 April 2012 16:24, Simone Bordet < >>>>> sbordet at intalio.com> wrote: >>>>> >> >>>>> >> >>>>> >> Hi, >>>>> >> >>>>> >> On Wed, Apr 18, 2012 at 10:16, the.6th.month at gmail.com< >>>>> the.6th.month at gmail.com> wrote: >>>>> >> >>>>> >> hi all: >>>>> >> I'm currently using jdk 6u26. I just enabled UseParallelOldGC, >>>>> expecting >>>>> >> that would enhance the full gc efficiency and decrease the >>>>> mark-sweep >>>>> >> >>>>> >> time >>>>> >> >>>>> >> by using multiple-core. The JAVA_OPTS is as below: >>>>> >> -XX:+PrintGCDetails -XX:+PrintGCDateStamps >>>>> -XX:+PrintTenuringDistribution >>>>> >> -Xloggc:gc.log-server -Xms4000m -Xmx4000m -Xss256k -Xmn256m >>>>> >> -XX:PermSize=256m -XX:+UseParallelOldGC -server >>>>> >> -Dorg.apache.jasper.compiler.Parser.STRICT_QUOTE_ESCAPING=false >>>>> >> as shown in jinfo output, the settings have taken effect, and the >>>>> >> ParallelGCThreads is 4 since the jvm is running on a four-core >>>>> server. >>>>> >> But what's strange is that the mark-sweep time remains almost >>>>> unchanged >>>>> >> >>>>> >> (at >>>>> >> >>>>> >> around 6-8 seconds), do I miss something here? Does anyone have >>>>> the same >>>>> >> experience or any idea about the reason behind? >>>>> >> Thanks very much for help >>>>> >> >>>>> >> The young generation is fairly small for a 4GiB heap. >>>>> >> >>>>> >> Can we see the lines you mention from the logs ? >>>>> >> >>>>> >> Simon >>>>> >> --http://cometd.orghttp://intalio.comhttp://bordet.blogspot.com >>>>> >> ---- >>>>> >> Finally, no matter how good the architecture and design are, >>>>> >> to deliver bug-free software with optimal performance and >>>>> reliability, >>>>> >> the implementation technique must be flawless. 
Victoria Livschitz >>>>> >> _______________________________________________ >>>>> >> hotspot-gc-use mailing listhotspot-gc-use at openjdk.java.nethttp:// >>>>> mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>>>> >> >>>>> >> >>>>> >> _______________________________________________ >>>>> >> hotspot-gc-use mailing listhotspot-gc-use at openjdk.java.nethttp:// >>>>> mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>>>> >> >>>>> >> >>>>> >> _______________________________________________ >>>>> >> hotspot-gc-use mailing list >>>>> >> hotspot-gc-use at openjdk.java.net >>>>> >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>>>> >> >>>>> >> >>>>> _______________________________________________ >>>>> hotspot-gc-use mailing list >>>>> hotspot-gc-use at openjdk.java.net >>>>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>>>> >>>> >>>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120420/d1d387f6/attachment.html From taras.tielkes at gmail.com Sun Apr 22 13:24:31 2012 From: taras.tielkes at gmail.com (Taras Tielkes) Date: Sun, 22 Apr 2012 22:24:31 +0200 Subject: Two basic questions on -verbosegc and -XX:+PrintTenuringDistribution Message-ID: Hi, We're using a time-series database to store and aggregate monitoring data from our systems, including GC behavior. I'm thinking of adding two metrics: * total allocation (in K per minute) * total promotion (in K per minute) The gc logs are the source for this data, and I'd like to verify that my understanding of the numbers is correct. Here's an example verbosegc line of output (we're running ParNew+CMS): [GC 2136581.585: [ParNew:345951K->40960K(368640K), 0.0676780 secs] 3608692K->3323692K(5201920K), 0.0680220 secs] a) The delta between the ParNew "before" and "after" is: 345951K-40960K=304991K My understanding is that the 304991K is the total of (collected in young gen + promoted to tenured gen) Since this number of composed of two things, it's not directly useful by itself. b) The delta between the overall heap "before" and "after" is: 3608692K-3323692K=285000K I assume that this is effectively the volume that was collected in this ParNew cycle. Would it be correct to calculate the total allocation rate of the running application (in a given period) from summing the total heap deltas (in a given timespan)? I do realize that it's a "collected kilobytes" metric, but I think it's close enough to be used as a "delayed" allocation number, especially when looking at a timescale of 10 minutes or more. It has the additional convenience of requiring to parse the current gc.log line only, and not needing to correlate with the preceding ParNew event. c) I take it that the difference between the two deltas (ParNew delta and total heap delta) is effectively the promotion volume? In the example above, this would give a promotion volume of (345951K-40960K)-(3608692K-3323692K)=19991K d) When looking at -XX:+PrintTenuringDistribution output, I assume the distribution reflects the situation *after* the enclosing ParNew event in the log. 
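Working through a) to c) with the numbers from the sample line above, as a small self-check (the only assumption is the one stated in the message itself, namely that "collected" is close enough to "allocated" over a window of several minutes):

    public class ParNewDeltas {
        public static void main(String[] args) {
            // [GC ... [ParNew: 345951K->40960K(368640K), ...] 3608692K->3323692K(5201920K), ...]
            long youngBefore = 345951, youngAfter = 40960;   // ParNew occupancy before/after, in K
            long heapBefore = 3608692, heapAfter = 3323692;  // whole-heap occupancy before/after, in K

            long youngDelta = youngBefore - youngAfter;      // collected in young + promoted = 304991K
            long heapDelta = heapBefore - heapAfter;         // reclaimed from the heap as a whole = 285000K
            long promoted = youngDelta - heapDelta;          // left the young gen but stayed live = 19991K

            System.out.printf("young delta=%dK, heap delta=%dK, promoted=%dK%n",
                    youngDelta, heapDelta, promoted);
        }
    }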
Thanks in advance for any corrections, -tt From rainer.jung at kippdata.de Sun Apr 22 14:00:49 2012 From: rainer.jung at kippdata.de (Rainer Jung) Date: Sun, 22 Apr 2012 23:00:49 +0200 Subject: Two basic questions on -verbosegc and -XX:+PrintTenuringDistribution In-Reply-To: References: Message-ID: <4F947181.50003@kippdata.de> On 22.04.2012 22:24, Taras Tielkes wrote: > Hi, > > We're using a time-series database to store and aggregate monitoring > data from our systems, including GC behavior. > > I'm thinking of adding two metrics: > * total allocation (in K per minute) > * total promotion (in K per minute) > > The gc logs are the source for this data, and I'd like to verify that > my understanding of the numbers is correct. > > Here's an example verbosegc line of output (we're running ParNew+CMS): > [GC 2136581.585: [ParNew:345951K->40960K(368640K), 0.0676780 secs] > 3608692K->3323692K(5201920K), 0.0680220 secs] > > a) The delta between the ParNew "before" and "after" is: 345951K-40960K=304991K > My understanding is that the 304991K is the total of (collected in > young gen + promoted to tenured gen) > Since this number of composed of two things, it's not directly useful by itself. > > b) The delta between the overall heap "before" and "after" is: > 3608692K-3323692K=285000K > I assume that this is effectively the volume that was collected in > this ParNew cycle. > Would it be correct to calculate the total allocation rate of the > running application (in a given period) from summing the total heap > deltas (in a given timespan)? > > I do realize that it's a "collected kilobytes" metric, but I think > it's close enough to be used as a "delayed" allocation number, > especially when looking at a timescale of 10 minutes or more. > It has the additional convenience of requiring to parse the current > gc.log line only, and not needing to correlate with the preceding > ParNew event. > > c) I take it that the difference between the two deltas (ParNew delta > and total heap delta) is effectively the promotion volume? > In the example above, this would give a promotion volume of > (345951K-40960K)-(3608692K-3323692K)=19991K > > d) When looking at -XX:+PrintTenuringDistribution output, I assume the > distribution reflects the situation *after* the enclosing ParNew event > in the log. Have a look at -XX:+PrintHeapAtGC. This will help you get more precise numbers. Regards, Rainer From rednaxelafx at gmail.com Sun Apr 22 19:40:41 2012 From: rednaxelafx at gmail.com (Krystal Mok) Date: Mon, 23 Apr 2012 10:40:41 +0800 Subject: Two basic questions on -verbosegc and -XX:+PrintTenuringDistribution In-Reply-To: References: Message-ID: Hi Taras, d) When looking at -XX:+PrintTenuringDistribution output, I assume the > distribution reflects the situation *after* the enclosing ParNew event > in the log. That's right. The stats are actually printed after the collection has completed. FYI, to get accurate promotion size info, you don't always have to parse the GC log. There's a PerfData counter that keeps track of the promoted size (in bytes) in a minor GC. You could use jstat to fetch the value of that counter, like this: $ jstat -J-Djstat.showUnsupported=true -snap `pgrep java` | grep sun.gc.policy.promoted= sun.gc.policy.promoted=680475760 There are a couple of other counters that can be played in conjuntion, e.g. 
sun.gc.collector.0.invocations, which shows the number of minor GCs: $ jstat -J-Djstat.showUnsupported=true -snap `pgrep java` | grep sun.gc.collector.0.invocations= sun.gc.collector.0.invocations=23 - Kris On Mon, Apr 23, 2012 at 4:24 AM, Taras Tielkes wrote: > Hi, > > We're using a time-series database to store and aggregate monitoring > data from our systems, including GC behavior. > > I'm thinking of adding two metrics: > * total allocation (in K per minute) > * total promotion (in K per minute) > > The gc logs are the source for this data, and I'd like to verify that > my understanding of the numbers is correct. > > Here's an example verbosegc line of output (we're running ParNew+CMS): > [GC 2136581.585: [ParNew:345951K->40960K(368640K), 0.0676780 secs] > 3608692K->3323692K(5201920K), 0.0680220 secs] > > a) The delta between the ParNew "before" and "after" is: > 345951K-40960K=304991K > My understanding is that the 304991K is the total of (collected in > young gen + promoted to tenured gen) > Since this number of composed of two things, it's not directly useful by > itself. > > b) The delta between the overall heap "before" and "after" is: > 3608692K-3323692K=285000K > I assume that this is effectively the volume that was collected in > this ParNew cycle. > Would it be correct to calculate the total allocation rate of the > running application (in a given period) from summing the total heap > deltas (in a given timespan)? > > I do realize that it's a "collected kilobytes" metric, but I think > it's close enough to be used as a "delayed" allocation number, > especially when looking at a timescale of 10 minutes or more. > It has the additional convenience of requiring to parse the current > gc.log line only, and not needing to correlate with the preceding > ParNew event. > > c) I take it that the difference between the two deltas (ParNew delta > and total heap delta) is effectively the promotion volume? > In the example above, this would give a promotion volume of > (345951K-40960K)-(3608692K-3323692K)=19991K > > d) When looking at -XX:+PrintTenuringDistribution output, I assume the > distribution reflects the situation *after* the enclosing ParNew event > in the log. > > Thanks in advance for any corrections, > -tt > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120423/1972a8f3/attachment.html From the.6th.month at gmail.com Sun Apr 22 21:08:01 2012 From: the.6th.month at gmail.com (the.6th.month at gmail.com) Date: Mon, 23 Apr 2012 12:08:01 +0800 Subject: Two basic questions on -verbosegc and -XX:+PrintTenuringDistribution In-Reply-To: References: Message-ID: Hi, Krystal: those perf data are pretty interesting. Can I get them from JMX metrics, I have a system running aside to collect jmx metrics and reflect them to cacti and nagios graphs All the best Leon On 23 April 2012 10:40, Krystal Mok wrote: > Hi Taras, > > d) When looking at -XX:+PrintTenuringDistribution output, I assume the >> distribution reflects the situation *after* the enclosing ParNew event >> in the log. > > > That's right. The stats are actually printed after the collection has > completed. > > FYI, to get accurate promotion size info, you don't always have to parse > the GC log. 
There's a PerfData counter that keeps track of the promoted > size (in bytes) in a minor GC. You could use jstat to fetch the value of > that counter, like this: > > $ jstat -J-Djstat.showUnsupported=true -snap `pgrep java` | grep > sun.gc.policy.promoted= > sun.gc.policy.promoted=680475760 > > There are a couple of other counters that can be played in conjuntion, > e.g. sun.gc.collector.0.invocations, which shows the number of minor GCs: > > $ jstat -J-Djstat.showUnsupported=true -snap `pgrep java` | grep > sun.gc.collector.0.invocations= > sun.gc.collector.0.invocations=23 > > - Kris > > > On Mon, Apr 23, 2012 at 4:24 AM, Taras Tielkes wrote: > >> Hi, >> >> We're using a time-series database to store and aggregate monitoring >> data from our systems, including GC behavior. >> >> I'm thinking of adding two metrics: >> * total allocation (in K per minute) >> * total promotion (in K per minute) >> >> The gc logs are the source for this data, and I'd like to verify that >> my understanding of the numbers is correct. >> >> Here's an example verbosegc line of output (we're running ParNew+CMS): >> [GC 2136581.585: [ParNew:345951K->40960K(368640K), 0.0676780 secs] >> 3608692K->3323692K(5201920K), 0.0680220 secs] >> >> a) The delta between the ParNew "before" and "after" is: >> 345951K-40960K=304991K >> My understanding is that the 304991K is the total of (collected in >> young gen + promoted to tenured gen) >> Since this number of composed of two things, it's not directly useful by >> itself. >> >> b) The delta between the overall heap "before" and "after" is: >> 3608692K-3323692K=285000K >> I assume that this is effectively the volume that was collected in >> this ParNew cycle. >> Would it be correct to calculate the total allocation rate of the >> running application (in a given period) from summing the total heap >> deltas (in a given timespan)? >> >> I do realize that it's a "collected kilobytes" metric, but I think >> it's close enough to be used as a "delayed" allocation number, >> especially when looking at a timescale of 10 minutes or more. >> It has the additional convenience of requiring to parse the current >> gc.log line only, and not needing to correlate with the preceding >> ParNew event. >> >> c) I take it that the difference between the two deltas (ParNew delta >> and total heap delta) is effectively the promotion volume? >> In the example above, this would give a promotion volume of >> (345951K-40960K)-(3608692K-3323692K)=19991K >> >> d) When looking at -XX:+PrintTenuringDistribution output, I assume the >> distribution reflects the situation *after* the enclosing ParNew event >> in the log. >> >> Thanks in advance for any corrections, >> -tt >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120423/c8db605f/attachment.html From rednaxelafx at gmail.com Sun Apr 22 21:35:15 2012 From: rednaxelafx at gmail.com (Krystal Mok) Date: Mon, 23 Apr 2012 12:35:15 +0800 Subject: Two basic questions on -verbosegc and -XX:+PrintTenuringDistribution In-Reply-To: References: Message-ID: Hi Leon, I'm afraid not. 
I'm not aware of any built-in JMX beans that expose these counters. - Kris On Mon, Apr 23, 2012 at 12:08 PM, the.6th.month at gmail.com < the.6th.month at gmail.com> wrote: > Hi, Krystal: > those perf data are pretty interesting. Can I get them from JMX metrics, I > have a system running aside to collect jmx metrics and reflect them to > cacti and nagios graphs > > All the best > Leon > > > On 23 April 2012 10:40, Krystal Mok wrote: > >> Hi Taras, >> >> d) When looking at -XX:+PrintTenuringDistribution output, I assume the >>> distribution reflects the situation *after* the enclosing ParNew event >>> in the log. >> >> >> That's right. The stats are actually printed after the collection has >> completed. >> >> FYI, to get accurate promotion size info, you don't always have to parse >> the GC log. There's a PerfData counter that keeps track of the promoted >> size (in bytes) in a minor GC. You could use jstat to fetch the value of >> that counter, like this: >> >> $ jstat -J-Djstat.showUnsupported=true -snap `pgrep java` | grep >> sun.gc.policy.promoted= >> sun.gc.policy.promoted=680475760 >> >> There are a couple of other counters that can be played in conjuntion, >> e.g. sun.gc.collector.0.invocations, which shows the number of minor GCs: >> >> $ jstat -J-Djstat.showUnsupported=true -snap `pgrep java` | grep >> sun.gc.collector.0.invocations= >> sun.gc.collector.0.invocations=23 >> >> - Kris >> >> >> On Mon, Apr 23, 2012 at 4:24 AM, Taras Tielkes wrote: >> >>> Hi, >>> >>> We're using a time-series database to store and aggregate monitoring >>> data from our systems, including GC behavior. >>> >>> I'm thinking of adding two metrics: >>> * total allocation (in K per minute) >>> * total promotion (in K per minute) >>> >>> The gc logs are the source for this data, and I'd like to verify that >>> my understanding of the numbers is correct. >>> >>> Here's an example verbosegc line of output (we're running ParNew+CMS): >>> [GC 2136581.585: [ParNew:345951K->40960K(368640K), 0.0676780 secs] >>> 3608692K->3323692K(5201920K), 0.0680220 secs] >>> >>> a) The delta between the ParNew "before" and "after" is: >>> 345951K-40960K=304991K >>> My understanding is that the 304991K is the total of (collected in >>> young gen + promoted to tenured gen) >>> Since this number of composed of two things, it's not directly useful by >>> itself. >>> >>> b) The delta between the overall heap "before" and "after" is: >>> 3608692K-3323692K=285000K >>> I assume that this is effectively the volume that was collected in >>> this ParNew cycle. >>> Would it be correct to calculate the total allocation rate of the >>> running application (in a given period) from summing the total heap >>> deltas (in a given timespan)? >>> >>> I do realize that it's a "collected kilobytes" metric, but I think >>> it's close enough to be used as a "delayed" allocation number, >>> especially when looking at a timescale of 10 minutes or more. >>> It has the additional convenience of requiring to parse the current >>> gc.log line only, and not needing to correlate with the preceding >>> ParNew event. >>> >>> c) I take it that the difference between the two deltas (ParNew delta >>> and total heap delta) is effectively the promotion volume? >>> In the example above, this would give a promotion volume of >>> (345951K-40960K)-(3608692K-3323692K)=19991K >>> >>> d) When looking at -XX:+PrintTenuringDistribution output, I assume the >>> distribution reflects the situation *after* the enclosing ParNew event >>> in the log. 
>>> >>> Thanks in advance for any corrections, >>> -tt >>> _______________________________________________ >>> hotspot-gc-use mailing list >>> hotspot-gc-use at openjdk.java.net >>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>> >> >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120423/fbd76012/attachment-0001.html From bengt.rutisson at oracle.com Mon Apr 23 00:18:02 2012 From: bengt.rutisson at oracle.com (Bengt Rutisson) Date: Mon, 23 Apr 2012 09:18:02 +0200 Subject: Faster card marking: chances for Java 6 backport In-Reply-To: <4F91E64D.1070509@oracle.com> References: <4F91E64D.1070509@oracle.com> Message-ID: <4F95022A.7060103@oracle.com> Taras, Maybe I'm being a bit picky here, but just to be clear. The change for 7068625 is for faster card scanning - not marking. I agree with Jon, I don't think this will be backported to JDK6 unless there is an explicit customer request to do so. Bengt On 2012-04-21 00:42, Jon Masamitsu wrote: > Taras, > > I haven't heard any discussions about a backport. > I think it's a issue that the sustaining organization would > have to consider (since it's to jdk6). > > Jon > > On 4/20/2012 12:46 PM, Taras Tielkes wrote: >> Hi, >> >> Are there plans to port RFE 7068625 to Java 6? >> >> Thanks, >> -tt >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use From alexey.ragozin at gmail.com Mon Apr 23 01:05:52 2012 From: alexey.ragozin at gmail.com (Alexey Ragozin) Date: Mon, 23 Apr 2012 08:05:52 +0000 Subject: Two basic questions on -verbosegc and -XX:+PrintTenuringDistribution Message-ID: Hi, If you need this information for monitoring, you can get it with JMX. Some time ago I have written tool displaying GC metrics (similar to GC log). It is using attach API and JMX to get data from JVM. It is available at http://code.google.com/p/gridkit/downloads/detail?name=gcrep.jar Usage: java -jar gcrep.jar But you probably will be more interested in sources, you can find them here http://code.google.com/p/gridkit/source/browse/branches/aragozin-sandbox/young-gc-bench/src/main/java/org/gridkit/util/monitoring/MBeanGCMonitor.java?spec=svn1461&r=1461 Regards, Alexey > Date: Sun, 22 Apr 2012 22:24:31 +0200 > From: Taras Tielkes > Subject: Two basic questions on -verbosegc and > ? ? ? ?-XX:+PrintTenuringDistribution > To: hotspot-gc-use at openjdk.java.net > Message-ID: > ? ? ? ? > Content-Type: text/plain; charset=ISO-8859-1 > > Hi, > > We're using a time-series database to store and aggregate monitoring > data from our systems, including GC behavior. > > I'm thinking of adding two metrics: > * total allocation (in K per minute) > * total promotion (in K per minute) > > The gc logs are the source for this data, and I'd like to verify that > my understanding of the numbers is correct. 
> > Here's an example verbosegc line of output (we're running ParNew+CMS): > [GC 2136581.585: [ParNew:345951K->40960K(368640K), 0.0676780 secs] > 3608692K->3323692K(5201920K), 0.0680220 secs] > > a) The delta between the ParNew "before" and "after" is: 345951K-40960K=304991K > My understanding is that the 304991K is the total of (collected in > young gen + promoted to tenured gen) > Since this number of composed of two things, it's not directly useful by itself. > > b) The delta between the overall heap "before" and "after" is: > 3608692K-3323692K=285000K > I assume that this is effectively the volume that was collected in > this ParNew cycle. > Would it be correct to calculate the total allocation rate of the > running application (in a given period) from summing the total heap > deltas (in a given timespan)? > > I do realize that it's a "collected kilobytes" metric, but I think > it's close enough to be used as a "delayed" allocation number, > especially when looking at a timescale of 10 minutes or more. > It has the additional convenience of requiring to parse the current > gc.log line only, and not needing to correlate with the preceding > ParNew event. > > c) I take it that the difference between the two deltas (ParNew delta > and total heap delta) is effectively the promotion volume? > In the example above, this would give a promotion volume of > (345951K-40960K)-(3608692K-3323692K)=19991K > > d) When looking at -XX:+PrintTenuringDistribution output, I assume the > distribution reflects the situation *after* the enclosing ParNew event > in the log. > > Thanks in advance for any corrections, > -tt From the.6th.month at gmail.com Mon Apr 23 01:13:00 2012 From: the.6th.month at gmail.com (the.6th.month at gmail.com) Date: Mon, 23 Apr 2012 16:13:00 +0800 Subject: Two basic questions on -verbosegc and -XX:+PrintTenuringDistribution In-Reply-To: References: Message-ID: Hi, Alexey looks pretty cool, so basically you are parsing LastGCInfo to get those metrics, right? Leon On 23 April 2012 16:05, Alexey Ragozin wrote: > Hi, > > If you need this information for monitoring, you can get it with JMX. > Some time ago I have written tool displaying GC metrics (similar to GC > log). It is using attach API and JMX to get data from JVM. > > It is available at > http://code.google.com/p/gridkit/downloads/detail?name=gcrep.jar > Usage: java -jar gcrep.jar > > But you probably will be more interested in sources, you can find them here > > http://code.google.com/p/gridkit/source/browse/branches/aragozin-sandbox/young-gc-bench/src/main/java/org/gridkit/util/monitoring/MBeanGCMonitor.java?spec=svn1461&r=1461 > > Regards, > Alexey > > > Date: Sun, 22 Apr 2012 22:24:31 +0200 > > From: Taras Tielkes > > Subject: Two basic questions on -verbosegc and > > -XX:+PrintTenuringDistribution > > To: hotspot-gc-use at openjdk.java.net > > Message-ID: > > < > CA+R7V78bTOkvaYgwNCPC2MfiqdV-QBtOidz3nUXjb9bwZ5FrNg at mail.gmail.com> > > Content-Type: text/plain; charset=ISO-8859-1 > > > > Hi, > > > > We're using a time-series database to store and aggregate monitoring > > data from our systems, including GC behavior. > > > > I'm thinking of adding two metrics: > > * total allocation (in K per minute) > > * total promotion (in K per minute) > > > > The gc logs are the source for this data, and I'd like to verify that > > my understanding of the numbers is correct. 
> > > > Here's an example verbosegc line of output (we're running ParNew+CMS): > > [GC 2136581.585: [ParNew:345951K->40960K(368640K), 0.0676780 secs] > > 3608692K->3323692K(5201920K), 0.0680220 secs] > > > > a) The delta between the ParNew "before" and "after" is: > 345951K-40960K=304991K > > My understanding is that the 304991K is the total of (collected in > > young gen + promoted to tenured gen) > > Since this number of composed of two things, it's not directly useful by > itself. > > > > b) The delta between the overall heap "before" and "after" is: > > 3608692K-3323692K=285000K > > I assume that this is effectively the volume that was collected in > > this ParNew cycle. > > Would it be correct to calculate the total allocation rate of the > > running application (in a given period) from summing the total heap > > deltas (in a given timespan)? > > > > I do realize that it's a "collected kilobytes" metric, but I think > > it's close enough to be used as a "delayed" allocation number, > > especially when looking at a timescale of 10 minutes or more. > > It has the additional convenience of requiring to parse the current > > gc.log line only, and not needing to correlate with the preceding > > ParNew event. > > > > c) I take it that the difference between the two deltas (ParNew delta > > and total heap delta) is effectively the promotion volume? > > In the example above, this would give a promotion volume of > > (345951K-40960K)-(3608692K-3323692K)=19991K > > > > d) When looking at -XX:+PrintTenuringDistribution output, I assume the > > distribution reflects the situation *after* the enclosing ParNew event > > in the log. > > > > Thanks in advance for any corrections, > > -tt > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120423/63fb7345/attachment.html From alexey.ragozin at gmail.com Mon Apr 23 01:51:07 2012 From: alexey.ragozin at gmail.com (Alexey Ragozin) Date: Mon, 23 Apr 2012 08:51:07 +0000 Subject: Two basic questions on -verbosegc and -XX:+PrintTenuringDistribution In-Reply-To: References: Message-ID: Exactly. I'm polling JMX regularly to catch LastGCInfo for every collection (not ideal solution though). LastGCInfo have all in need (I'm mostly interested in STW pauses and allocation / reclaim rates). There are some limitations - I cannot reproduce tenuring distribution from JMX, CMS fragmentation metrics also available only through logs (very unfortunate). Regards, Alexey On Mon, Apr 23, 2012 at 8:13 AM, the.6th.month at gmail.com wrote: >> Hi, Alexey >> looks pretty cool, so basically you are parsing LastGCInfo to get those >> metrics, right? >> >> Leon >> >> On 23 April 2012 16:05, Alexey Ragozin wrote: >>> >>> Hi, >>> >>> If you need this information for monitoring, you can get it with JMX. >>> Some time ago I have written tool displaying GC metrics (similar to GC >>> log). It is using attach API and JMX to get data from JVM. 
>>> >>> It is available at >>> http://code.google.com/p/gridkit/downloads/detail?name=gcrep.jar >>> Usage: java -jar gcrep.jar >>> >>> But you probably will be more interested in sources, you can find them >>> here >>> >>> http://code.google.com/p/gridkit/source/browse/branches/aragozin-sandbox/young-gc-bench/src/main/java/org/gridkit/util/monitoring/MBeanGCMonitor.java?spec=svn1461&r=1461 >>> >>> Regards, >>> Alexey >>> >>> > Date: Sun, 22 Apr 2012 22:24:31 +0200 >>> > From: Taras Tielkes >>> > Subject: Two basic questions on -verbosegc and >>> > ? ? ? ?-XX:+PrintTenuringDistribution >>> > To: hotspot-gc-use at openjdk.java.net >>> > Message-ID: >>> > >>> > ? >>> > Content-Type: text/plain; charset=ISO-8859-1 >>> > >>> > Hi, >>> > >>> > We're using a time-series database to store and aggregate monitoring >>> > data from our systems, including GC behavior. >>> > >>> > I'm thinking of adding two metrics: >>> > * total allocation (in K per minute) >>> > * total promotion (in K per minute) >>> > >>> > The gc logs are the source for this data, and I'd like to verify that >>> > my understanding of the numbers is correct. >>> > >>> > Here's an example verbosegc line of output (we're running ParNew+CMS): >>> > [GC 2136581.585: [ParNew:345951K->40960K(368640K), 0.0676780 secs] >>> > 3608692K->3323692K(5201920K), 0.0680220 secs] >>> > >>> > a) The delta between the ParNew "before" and "after" is: >>> > 345951K-40960K=304991K >>> > My understanding is that the 304991K is the total of (collected in >>> > young gen + promoted to tenured gen) >>> > Since this number of composed of two things, it's not directly useful by >>> > itself. >>> > >>> > b) The delta between the overall heap "before" and "after" is: >>> > 3608692K-3323692K=285000K >>> > I assume that this is effectively the volume that was collected in >>> > this ParNew cycle. >>> > Would it be correct to calculate the total allocation rate of the >>> > running application (in a given period) from summing the total heap >>> > deltas (in a given timespan)? >>> > >>> > I do realize that it's a "collected kilobytes" metric, but I think >>> > it's close enough to be used as a "delayed" allocation number, >>> > especially when looking at a timescale of 10 minutes or more. >>> > It has the additional convenience of requiring to parse the current >>> > gc.log line only, and not needing to correlate with the preceding >>> > ParNew event. >>> > >>> > c) I take it that the difference between the two deltas (ParNew delta >>> > and total heap delta) is effectively the promotion volume? >>> > In the example above, this would give a promotion volume of >>> > (345951K-40960K)-(3608692K-3323692K)=19991K >>> > >>> > d) When looking at -XX:+PrintTenuringDistribution output, I assume the >>> > distribution reflects the situation *after* the enclosing ParNew event >>> > in the log. >>> > >>> > Thanks in advance for any corrections, >>> > -tt >>> _______________________________________________ >>> hotspot-gc-use mailing list >>> hotspot-gc-use at openjdk.java.net >>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> From kbbryant61 at gmail.com Mon Apr 23 12:28:07 2012 From: kbbryant61 at gmail.com (Kobe Bryant) Date: Mon, 23 Apr 2012 12:28:07 -0700 Subject: Two basic questions on -verbosegc and -XX:+PrintTenuringDistribution In-Reply-To: References: Message-ID: sorry for interjecting myself into this interesting conversation: > There's a PerfData counter that keeps track of the promoted size (in bytes) in a minor GC. 
>You could use jstat to fetch the value of that counter, like this: does this give me the number of promoted bytes in the last minor gc? so if I have to track promotion volumes at each gc I have to keep polling this metric (and even then I might miss an update and lose information, since this info is not cumulative), correct? Also, is there a similar metric to track size harvested from tenured space at each full GC? thank you /K On Sun, Apr 22, 2012 at 7:40 PM, Krystal Mok wrote: > Hi Taras, > > d) When looking at -XX:+PrintTenuringDistribution output, I assume the >> distribution reflects the situation *after* the enclosing ParNew event >> in the log. > > > That's right. The stats are actually printed after the collection has > completed. > > FYI, to get accurate promotion size info, you don't always have to parse > the GC log. There's a PerfData counter that keeps track of the promoted > size (in bytes) in a minor GC. You could use jstat to fetch the value of > that counter, like this: > > $ jstat -J-Djstat.showUnsupported=true -snap `pgrep java` | grep > sun.gc.policy.promoted= > sun.gc.policy.promoted=680475760 > > There are a couple of other counters that can be played in conjuntion, > e.g. sun.gc.collector.0.invocations, which shows the number of minor GCs: > > $ jstat -J-Djstat.showUnsupported=true -snap `pgrep java` | grep > sun.gc.collector.0.invocations= > sun.gc.collector.0.invocations=23 > > - Kris > > > On Mon, Apr 23, 2012 at 4:24 AM, Taras Tielkes wrote: > >> Hi, >> >> We're using a time-series database to store and aggregate monitoring >> data from our systems, including GC behavior. >> >> I'm thinking of adding two metrics: >> * total allocation (in K per minute) >> * total promotion (in K per minute) >> >> The gc logs are the source for this data, and I'd like to verify that >> my understanding of the numbers is correct. >> >> Here's an example verbosegc line of output (we're running ParNew+CMS): >> [GC 2136581.585: [ParNew:345951K->40960K(368640K), 0.0676780 secs] >> 3608692K->3323692K(5201920K), 0.0680220 secs] >> >> a) The delta between the ParNew "before" and "after" is: >> 345951K-40960K=304991K >> My understanding is that the 304991K is the total of (collected in >> young gen + promoted to tenured gen) >> Since this number of composed of two things, it's not directly useful by >> itself. >> >> b) The delta between the overall heap "before" and "after" is: >> 3608692K-3323692K=285000K >> I assume that this is effectively the volume that was collected in >> this ParNew cycle. >> Would it be correct to calculate the total allocation rate of the >> running application (in a given period) from summing the total heap >> deltas (in a given timespan)? >> >> I do realize that it's a "collected kilobytes" metric, but I think >> it's close enough to be used as a "delayed" allocation number, >> especially when looking at a timescale of 10 minutes or more. >> It has the additional convenience of requiring to parse the current >> gc.log line only, and not needing to correlate with the preceding >> ParNew event. >> >> c) I take it that the difference between the two deltas (ParNew delta >> and total heap delta) is effectively the promotion volume? >> In the example above, this would give a promotion volume of >> (345951K-40960K)-(3608692K-3323692K)=19991K >> >> d) When looking at -XX:+PrintTenuringDistribution output, I assume the >> distribution reflects the situation *after* the enclosing ParNew event >> in the log. 
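On the polling question just above: since these counters only surface through the jstat machinery, one blunt option is to run the exact jstat command shown earlier from a small wrapper and pair sun.gc.policy.promoted with sun.gc.collector.0.invocations, so each sample at least reveals whether a minor GC has happened since the previous one (two GCs between samples will still only show the most recent). A rough sketch, assuming jstat is on the PATH and the target PID is passed as an argument:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;

    public class JstatCounterPoll {
        public static void main(String[] args) throws Exception {
            String pid = args[0];
            ProcessBuilder pb = new ProcessBuilder(
                    "jstat", "-J-Djstat.showUnsupported=true", "-snap", pid);
            pb.redirectErrorStream(true);
            Process p = pb.start();
            BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()));
            for (String line; (line = r.readLine()) != null; ) {
                line = line.trim();
                if (line.startsWith("sun.gc.policy.promoted=")
                        || line.startsWith("sun.gc.collector.0.invocations=")) {
                    System.out.println(line); // e.g. sun.gc.policy.promoted=680475760
                }
            }
            p.waitFor();
        }
    }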
>> >> Thanks in advance for any corrections, >> -tt >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120423/77c4d3b6/attachment.html From taras.tielkes at gmail.com Mon Apr 23 13:03:38 2012 From: taras.tielkes at gmail.com (Taras Tielkes) Date: Mon, 23 Apr 2012 22:03:38 +0200 Subject: Two basic questions on -verbosegc and -XX:+PrintTenuringDistribution In-Reply-To: References: Message-ID: Hi, Sorry to return this thread to the original question :-) The additional data from jstat -snap is indeed quite useful. However, I think the totals easily harvested from the gc logs are accurate enough for my purposes, which is measuring overall allocation rate, and overall promotion rate. Performing a few manual calculations shows that the promotion volume I calculate as the differentce of ParNew delta and total heap delta is reasonably close to the "tenuring age 15" age group from the preceding ParNew. I just want to make sure I'm not missing something obvious here. The assumption is of course that PermGen is quite stable, and that promotion and CMS failures are relatively rate. Thanks, -tt On Mon, Apr 23, 2012 at 9:28 PM, Kobe Bryant wrote: > sorry for interjecting myself into this interesting conversation: > > ? >?There's a PerfData counter that keeps track of the promoted size (in > bytes) in a minor GC. > ? >You could use jstat to fetch the value of that counter, like this: > > does this give me the number of promoted bytes in the last minor gc? so if I > have to track promotion volumes > at each gc I have to keep polling this metric (and even then I might miss an > update and lose information, since > this info is not cumulative), correct? > > Also, is there a similar metric to track size harvested from tenured space > at each full GC? > > thank you > > /K > > On Sun, Apr 22, 2012 at 7:40 PM, Krystal Mok wrote: >> >> Hi Taras, >> >>> d) When looking at -XX:+PrintTenuringDistribution output, I assume the >>> distribution reflects the situation *after* the enclosing ParNew event >>> in the log. >> >> >> That's right. The stats are actually printed after the collection has >> completed. >> >> FYI, to get accurate promotion size info, you don't always have to parse >> the GC log. There's a PerfData counter that keeps track of the promoted size >> (in bytes) in a minor GC. You could use jstat to fetch the value of that >> counter, like this: >> >> $ jstat -J-Djstat.showUnsupported=true -snap `pgrep java` | grep >> sun.gc.policy.promoted= >> sun.gc.policy.promoted=680475760 >> >> There are a couple of other counters that can be played in conjuntion, >> e.g. sun.gc.collector.0.invocations, which shows the number of minor GCs: >> >> $ jstat -J-Djstat.showUnsupported=true -snap `pgrep java` | grep >> sun.gc.collector.0.invocations= >> sun.gc.collector.0.invocations=23 >> >> - Kris >> >> >> On Mon, Apr 23, 2012 at 4:24 AM, Taras Tielkes >> wrote: >>> >>> Hi, >>> >>> We're using a time-series database to store and aggregate monitoring >>> data from our systems, including GC behavior. 
>>> >>> I'm thinking of adding two metrics: >>> * total allocation (in K per minute) >>> * total promotion (in K per minute) >>> >>> The gc logs are the source for this data, and I'd like to verify that >>> my understanding of the numbers is correct. >>> >>> Here's an example verbosegc line of output (we're running ParNew+CMS): >>> [GC 2136581.585: [ParNew:345951K->40960K(368640K), 0.0676780 secs] >>> 3608692K->3323692K(5201920K), 0.0680220 secs] >>> >>> a) The delta between the ParNew "before" and "after" is: >>> 345951K-40960K=304991K >>> My understanding is that the 304991K is the total of (collected in >>> young gen + promoted to tenured gen) >>> Since this number of composed of two things, it's not directly useful by >>> itself. >>> >>> b) The delta between the overall heap "before" and "after" is: >>> 3608692K-3323692K=285000K >>> I assume that this is effectively the volume that was collected in >>> this ParNew cycle. >>> Would it be correct to calculate the total allocation rate of the >>> running application (in a given period) from summing the total heap >>> deltas (in a given timespan)? >>> >>> I do realize that it's a "collected kilobytes" metric, but I think >>> it's close enough to be used as a "delayed" allocation number, >>> especially when looking at a timescale of 10 minutes or more. >>> It has the additional convenience of requiring to parse the current >>> gc.log line only, and not needing to correlate with the preceding >>> ParNew event. >>> >>> c) I take it that the difference between the two deltas (ParNew delta >>> and total heap delta) is effectively the promotion volume? >>> In the example above, this would give a promotion volume of >>> (345951K-40960K)-(3608692K-3323692K)=19991K >>> >>> d) When looking at -XX:+PrintTenuringDistribution output, I assume the >>> distribution reflects the situation *after* the enclosing ParNew event >>> in the log. >>> >>> Thanks in advance for any corrections, >>> -tt >>> _______________________________________________ >>> hotspot-gc-use mailing list >>> hotspot-gc-use at openjdk.java.net >>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > From taras.tielkes at gmail.com Mon Apr 23 13:07:31 2012 From: taras.tielkes at gmail.com (Taras Tielkes) Date: Mon, 23 Apr 2012 22:07:31 +0200 Subject: Faster card marking: chances for Java 6 backport In-Reply-To: <4F95022A.7060103@oracle.com> References: <4F91E64D.1070509@oracle.com> <4F95022A.7060103@oracle.com> Message-ID: Hi Bengt, Thanks for the correction - you're completely right, of course. To me, the decision process for which performance improvements are backported to the previous release stream has never been completely clear. Given that the change in question seems quite an isolated fix, I though it would make sense to ask. Thanks, -tt On Mon, Apr 23, 2012 at 9:18 AM, Bengt Rutisson wrote: > > Taras, > > Maybe I'm being a bit picky here, but just to be clear. The change for > 7068625 is for faster card scanning - not marking. > > I agree with Jon, I don't think this will be backported to JDK6 unless > there is an explicit customer request to do so. 
> > Bengt > > On 2012-04-21 00:42, Jon Masamitsu wrote: >> Taras, >> >> I haven't heard any discussions about a backport. >> I think it's a issue that the sustaining organization would >> have to consider (since it's to jdk6). >> >> Jon >> >> On 4/20/2012 12:46 PM, Taras Tielkes wrote: >>> Hi, >>> >>> Are there plans to port RFE 7068625 to Java 6? >>> >>> Thanks, >>> -tt >>> _______________________________________________ >>> hotspot-gc-use mailing list >>> hotspot-gc-use at openjdk.java.net >>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use From ysr1729 at gmail.com Mon Apr 23 13:51:12 2012 From: ysr1729 at gmail.com (Srinivas Ramakrishna) Date: Mon, 23 Apr 2012 13:51:12 -0700 Subject: Two basic questions on -verbosegc and -XX:+PrintTenuringDistribution In-Reply-To: References: Message-ID: Yes, that's right. By the way, a few years ago, John had posted an awk script on this alias that did this for you. I recently had occasion to need to use it, and found it gave a few problems with the jdk 6 logs i was processing, so I fixed a few bugs in it and extended it to summarize and plot other metrics of interest to me. I am happy to share my modifications to John;s script here and on the OpenJDK PrintGCStats project page later this week. -- ramki On Mon, Apr 23, 2012 at 1:03 PM, Taras Tielkes wrote: > Hi, > > Sorry to return this thread to the original question :-) > The additional data from jstat -snap is indeed quite useful. > > However, I think the totals easily harvested from the gc logs are > accurate enough for my purposes, which is measuring overall allocation > rate, and overall promotion rate. > Performing a few manual calculations shows that the promotion volume I > calculate as the differentce of ParNew delta and total heap delta is > reasonably close to the "tenuring age 15" age group from the preceding > ParNew. > I just want to make sure I'm not missing something obvious here. The > assumption is of course that PermGen is quite stable, and that > promotion and CMS failures are relatively rate. > > Thanks, > -tt > > On Mon, Apr 23, 2012 at 9:28 PM, Kobe Bryant wrote: > > sorry for interjecting myself into this interesting conversation: > > > > > There's a PerfData counter that keeps track of the promoted size (in > > bytes) in a minor GC. > > >You could use jstat to fetch the value of that counter, like this: > > > > does this give me the number of promoted bytes in the last minor gc? so > if I > > have to track promotion volumes > > at each gc I have to keep polling this metric (and even then I might > miss an > > update and lose information, since > > this info is not cumulative), correct? > > > > Also, is there a similar metric to track size harvested from tenured > space > > at each full GC? > > > > thank you > > > > /K > > > > On Sun, Apr 22, 2012 at 7:40 PM, Krystal Mok > wrote: > >> > >> Hi Taras, > >> > >>> d) When looking at -XX:+PrintTenuringDistribution output, I assume the > >>> distribution reflects the situation *after* the enclosing ParNew event > >>> in the log. > >> > >> > >> That's right. The stats are actually printed after the collection has > >> completed. 
> >> > >> FYI, to get accurate promotion size info, you don't always have to parse > >> the GC log. There's a PerfData counter that keeps track of the promoted > size > >> (in bytes) in a minor GC. You could use jstat to fetch the value of that > >> counter, like this: > >> > >> $ jstat -J-Djstat.showUnsupported=true -snap `pgrep java` | grep > >> sun.gc.policy.promoted= > >> sun.gc.policy.promoted=680475760 > >> > >> There are a couple of other counters that can be played in conjuntion, > >> e.g. sun.gc.collector.0.invocations, which shows the number of minor > GCs: > >> > >> $ jstat -J-Djstat.showUnsupported=true -snap `pgrep java` | grep > >> sun.gc.collector.0.invocations= > >> sun.gc.collector.0.invocations=23 > >> > >> - Kris > >> > >> > >> On Mon, Apr 23, 2012 at 4:24 AM, Taras Tielkes > > >> wrote: > >>> > >>> Hi, > >>> > >>> We're using a time-series database to store and aggregate monitoring > >>> data from our systems, including GC behavior. > >>> > >>> I'm thinking of adding two metrics: > >>> * total allocation (in K per minute) > >>> * total promotion (in K per minute) > >>> > >>> The gc logs are the source for this data, and I'd like to verify that > >>> my understanding of the numbers is correct. > >>> > >>> Here's an example verbosegc line of output (we're running ParNew+CMS): > >>> [GC 2136581.585: [ParNew:345951K->40960K(368640K), 0.0676780 secs] > >>> 3608692K->3323692K(5201920K), 0.0680220 secs] > >>> > >>> a) The delta between the ParNew "before" and "after" is: > >>> 345951K-40960K=304991K > >>> My understanding is that the 304991K is the total of (collected in > >>> young gen + promoted to tenured gen) > >>> Since this number of composed of two things, it's not directly useful > by > >>> itself. > >>> > >>> b) The delta between the overall heap "before" and "after" is: > >>> 3608692K-3323692K=285000K > >>> I assume that this is effectively the volume that was collected in > >>> this ParNew cycle. > >>> Would it be correct to calculate the total allocation rate of the > >>> running application (in a given period) from summing the total heap > >>> deltas (in a given timespan)? > >>> > >>> I do realize that it's a "collected kilobytes" metric, but I think > >>> it's close enough to be used as a "delayed" allocation number, > >>> especially when looking at a timescale of 10 minutes or more. > >>> It has the additional convenience of requiring to parse the current > >>> gc.log line only, and not needing to correlate with the preceding > >>> ParNew event. > >>> > >>> c) I take it that the difference between the two deltas (ParNew delta > >>> and total heap delta) is effectively the promotion volume? > >>> In the example above, this would give a promotion volume of > >>> (345951K-40960K)-(3608692K-3323692K)=19991K > >>> > >>> d) When looking at -XX:+PrintTenuringDistribution output, I assume the > >>> distribution reflects the situation *after* the enclosing ParNew event > >>> in the log. 
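For the sun.gc.policy.promoted and sun.gc.collector.0.invocations counters shown earlier in this message, a small poller along the following lines can record samples over time. It is a hypothetical sketch (the class name and the 10-second interval are made up), and it deliberately does not assume whether the promoted counter is cumulative or per-collection, since that is exactly the open question here; it just prints the raw counter next to the minor-GC invocation count so the two can be correlated afterwards.

import java.io.BufferedReader;
import java.io.InputStreamReader;

// Hypothetical polling helper: samples the two PerfData counters shown above
// by invoking jstat periodically for a given pid.
public class PromotedSampler {
    public static void main(String[] args) throws Exception {
        String pid = args[0];
        while (true) {
            System.out.println(sample(pid, "sun.gc.policy.promoted=")
                    + "  " + sample(pid, "sun.gc.collector.0.invocations="));
            Thread.sleep(10000); // a shorter interval reduces the chance of missing a GC
        }
    }

    // Runs: jstat -J-Djstat.showUnsupported=true -snap <pid>  and greps one counter.
    static String sample(String pid, String prefix) throws Exception {
        Process p = new ProcessBuilder(
                "jstat", "-J-Djstat.showUnsupported=true", "-snap", pid).start();
        BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()));
        String line, hit = prefix + "?";
        while ((line = r.readLine()) != null) {
            if (line.startsWith(prefix)) hit = line.trim();
        }
        p.waitFor();
        return hit;
    }
}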
> >>> > >>> Thanks in advance for any corrections, > >>> -tt > >>> _______________________________________________ > >>> hotspot-gc-use mailing list > >>> hotspot-gc-use at openjdk.java.net > >>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > >> > >> > >> > >> _______________________________________________ > >> hotspot-gc-use mailing list > >> hotspot-gc-use at openjdk.java.net > >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > >> > > > > > > _______________________________________________ > > hotspot-gc-use mailing list > > hotspot-gc-use at openjdk.java.net > > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120423/17dc0686/attachment-0001.html From Bond.Chen at lombardrisk.com Tue Apr 24 02:49:31 2012 From: Bond.Chen at lombardrisk.com (Bond Chen) Date: Tue, 24 Apr 2012 10:49:31 +0100 Subject: Promotion Failed when the Old Generation Usage is very low. Message-ID: <4F96E7AB.9AAE.00F7.0@lombardrisk.com> Hi , We're suffering high frequent promotion failed and concurrent mode failure, cause very long GC pause(5 seconds to 1000 seconds even more some time) attached the '1st promote failed' and '49th promotion failed' of gc.log 1, The '1st promote failed' caused by the old generation usage is too high, no enough space for promotion, but the '49th promotion failed', only used 2615456K out of 10387456K, what happed? 2, Does the CMS throwing 'Concurrent Mode Failure' combat the old generation? move all objects together and leave only one free block? or Only 'Full GC' does this? 3, when will 'Promotion failure' cause ''Concurrent Mode Failure' and some time 'Full GC' ? 
Regards, Bond /****parameter ***/ ### New JVM Parameter #Below line changed per RH recommendation 15 Dec 2009 #export RUN_ARGS=" -d64 -server -Xms2048M -Xmx12144M -XX:PermSize=512m -XX:MaxPermSize=512m -Xss1024k " export RUN_ARGS=" -server -d64 -Xms2048M -Xmx12144M -XX:PermSize=512m -XX:MaxPermSize=512m -Xss1024k " export RUN_ARGS=" $RUN_ARGS -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:+UseTLAB -XX:+CMSIncrementalMode " #Below line commented per RH recommendation 15 Dec 2009 #export RUN_ARGS=" $RUN_ARGS -XX:+UseCMSCompactAtFullCollection " #Below line changed per RH recommendation 15 Dec 2009 #export RUN_ARGS=" $RUN_ARGS -XX:+CMSIncrementalPacing -XX:CMSIncrementalDutyCycleMin=0 -XX:CMSIncrementalDutyCycle=10 -XX:MaxTenuringThreshold=0 " export RUN_ARGS=" $RUN_ARGS -XX:+CMSIncrementalPacing -XX:CMSIncrementalDutyCycleMin=0 -XX:CMSIncrementalDutyCycle=10 -XX:MaxTenuringThreshold=32 " #Below 2 lines added per RH recommendation 15 Dec 2009 #export RUN_ARGS=" -XX:ParallelGCThreads=13 " #export RUN_ARGS=" -XX:SurvivorRatio=48 " #Below 2 lines added per RH recommendation 16 Dec 2009 RUN_ARGS=" $RUN_ARGS -XX:ParallelGCThreads=13 " RUN_ARGS=" $RUN_ARGS -XX:SurvivorRatio=48 " ### set for cluster monitor added on 25-Jun-2011 export RUN_ARGS="$RUN_ARGS -Djboss.cluster.monitor.switch=y"; export RUN_ARGS="$RUN_ARGS -Djboss.cluster.number=2"; #Below line changed with RELEASE_2009_1_SP10.2 on 26 Feb 2010 #export RUN_ARGS=" $RUN_ARGS -XX:CMSInitiatingOccupancyFraction=70 -XX:+PrintTenuringDistribution -Xmn512m -XX:+UseLargePages -XX:LargePageSizeInBytes=64k " export RUN_ARGS=" $RUN_ARGS -XX:CMSInitiatingOccupancyFraction=60 -Xmn2000m -XX:+UseLargePages -XX:LargePageSizeInBytes=64k " #Below line added with RELEASE_2009_1_SP10.2 on 26 Feb 2010 export RUN_ARGS=" $RUN_ARGS -XX:+CMSClassUnloadingEnabled -XX:+ExplicitGCInvokesConcurrent -XX:+AggressiveOpts " export RUN_ARGS=" $RUN_ARGS -XX:+PrintGCDetails -XX:+PrintGCApplicationStoppedTime -Xloggc:./gc_${start_ts}.log " export RUN_ARGS=" $RUN_ARGS -Dsun.rmi.dgc.server.gcInterval=18000000 -Dsun.rmi.dgc.client.gcInterval=18000000 -verbose:gc" /***parameter /** the 1st promotion failed **/ 169682.980: [GC Before GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 7127332 Max Chunk Size: 6041118 Number of Blocks: 1785 Av. Block Size: 3992 Tree Height: 24 Before GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 6834133 Max Chunk Size: 97353 Number of Blocks: 4773 Av. Block Size: 1431 Tree Height: 27 169682.981: [ParNew (promotion failed): 2007040K->2007040K(2007040K), 48.9558338 secs]169731.937: [CMS169741.903: [CMS-concurrent-sweep: 10.823/99.414 secs] [Times: user=127.09 sys=25.97, real=99.41 secs] (concurrent mode failure): 8681490K->2319271K(10387456K), 44.6304362 secs] 10395485K->2319271K(12394496K), [CMS Perm : 291584K->290856K(524288K)]After GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 1032711195 Max Chunk Size: 1032711195 Number of Blocks: 1 Av. 
Block Size: 1032711195 Tree Height: 1 After GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 0 Max Chunk Size: 0 Number of Blocks: 0 Tree Height: 0 icms_dc=16 , 93.5876901 secs] [Times: user=97.28 sys=21.58, real=93.59 secs] /** the 1st promotion failed **/ /** the 49th promotion failed ***/ 298786.901: [GC Before GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 236997970 Max Chunk Size: 236997970 Number of Blocks: 1 Av. Block Size: 236997970 Tree Height: 1 Before GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 0 Max Chunk Size: 0 Number of Blocks: 0 Tree Height: 0 298786.902: [ParNew (promotion failed): 2007039K->2007040K(2007040K), 4.5565939 secs]298791.458: [CMS: 2615456K->1813239K(10387456K), 19.2232319 secs] 4346089K->1813239K(12394496K), [CMS Perm : 299206K->299126K(524288K)]After GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 1097483360 Max Chunk Size: 1097483360 Number of Blocks: 1 Av. Block Size: 1097483360 Tree Height: 1 After GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 0 Max Chunk Size: 0 Number of Blocks: 0 Tree Height: 0 icms_dc=0 , 23.7805042 secs] [Times: user=25.47 sys=0.02, real=23.78 secs] Total time for which application threads were stopped: 23.7861234 seconds /** the 49th promotion failed ***/ This e-mail together with any attachments (the "Message") is confidential and may contain privileged information. If you are not the intended recipient (or have received this e-mail in error) please notify the sender immediately and delete this Message from your system. Any unauthorized copying, disclosure, distribution or use of this Message is strictly forbidden. From taras.tielkes at gmail.com Tue Apr 24 15:08:52 2012 From: taras.tielkes at gmail.com (Taras Tielkes) Date: Wed, 25 Apr 2012 00:08:52 +0200 Subject: Two basic questions on -verbosegc and -XX:+PrintTenuringDistribution In-Reply-To: References: Message-ID: Hi, One correction to my original post. To collect a metric representing the overall "allocation rate" of our application, I should be summing the ParNew deltas, not the "overall heap" deltas from the gc log. The "overall heap" delta will reflect the amount collected from ParNew. However, the ParNew delta will also include the volume of objects promoted to tenured gen. This is a more accurate (albeit delayed) representation of allocation volume, since we don't care how objects leave the new gen - by being collected or by being promoted. If it left the new gen by promotion, it was briefly before allocated, and thus should contribute to the reported allocation volume. Cheers, -tt On Mon, Apr 23, 2012 at 10:51 PM, Srinivas Ramakrishna wrote: > Yes, that's right. > > By the way, a few years ago, John had posted an awk script on this alias > that did this for you. > I recently had occasion to need to use it, and found it gave a few problems > with the jdk 6 logs i > was processing, so I fixed a few bugs in it and extended it to summarize and > plot other metrics > of interest to me. I am happy to share my modifications to John;s script > here and on > the OpenJDK PrintGCStats project page later this week. > > -- ramki > > > On Mon, Apr 23, 2012 at 1:03 PM, Taras Tielkes > wrote: >> >> Hi, >> >> Sorry to return this thread to the original question :-) >> The additional data from jstat -snap is indeed quite useful. 
>> >> However, I think the totals easily harvested from the gc logs are >> accurate enough for my purposes, which is measuring overall allocation >> rate, and overall promotion rate. >> Performing a few manual calculations shows that the promotion volume I >> calculate as the differentce of ParNew delta and total heap delta is >> reasonably close to the "tenuring age 15" age group from the preceding >> ParNew. >> I just want to make sure I'm not missing something obvious here. The >> assumption is of course that PermGen is quite stable, and that >> promotion and CMS failures are relatively rate. >> >> Thanks, >> -tt >> >> On Mon, Apr 23, 2012 at 9:28 PM, Kobe Bryant wrote: >> > sorry for interjecting myself into this interesting conversation: >> > >> > ? >?There's a PerfData counter that keeps track of the promoted size (in >> > bytes) in a minor GC. >> > ? >You could use jstat to fetch the value of that counter, like this: >> > >> > does this give me the number of promoted bytes in the last minor gc? so >> > if I >> > have to track promotion volumes >> > at each gc I have to keep polling this metric (and even then I might >> > miss an >> > update and lose information, since >> > this info is not cumulative), correct? >> > >> > Also, is there a similar metric to track size harvested from tenured >> > space >> > at each full GC? >> > >> > thank you >> > >> > /K >> > >> > On Sun, Apr 22, 2012 at 7:40 PM, Krystal Mok >> > wrote: >> >> >> >> Hi Taras, >> >> >> >>> d) When looking at -XX:+PrintTenuringDistribution output, I assume the >> >>> distribution reflects the situation *after* the enclosing ParNew event >> >>> in the log. >> >> >> >> >> >> That's right. The stats are actually printed after the collection has >> >> completed. >> >> >> >> FYI, to get accurate promotion size info, you don't always have to >> >> parse >> >> the GC log. There's a PerfData counter that keeps track of the promoted >> >> size >> >> (in bytes) in a minor GC. You could use jstat to fetch the value of >> >> that >> >> counter, like this: >> >> >> >> $ jstat -J-Djstat.showUnsupported=true -snap `pgrep java` | grep >> >> sun.gc.policy.promoted= >> >> sun.gc.policy.promoted=680475760 >> >> >> >> There are a couple of other counters that can be played in conjuntion, >> >> e.g. sun.gc.collector.0.invocations, which shows the number of minor >> >> GCs: >> >> >> >> $ jstat -J-Djstat.showUnsupported=true -snap `pgrep java` | grep >> >> sun.gc.collector.0.invocations= >> >> sun.gc.collector.0.invocations=23 >> >> >> >> - Kris >> >> >> >> >> >> On Mon, Apr 23, 2012 at 4:24 AM, Taras Tielkes >> >> >> >> wrote: >> >>> >> >>> Hi, >> >>> >> >>> We're using a time-series database to store and aggregate monitoring >> >>> data from our systems, including GC behavior. >> >>> >> >>> I'm thinking of adding two metrics: >> >>> * total allocation (in K per minute) >> >>> * total promotion (in K per minute) >> >>> >> >>> The gc logs are the source for this data, and I'd like to verify that >> >>> my understanding of the numbers is correct. 
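For the cross-check against the oldest tenuring age mentioned above, something like the following sketch can pull the per-age lines that -XX:+PrintTenuringDistribution writes into the gc log and report the last (deepest) age bucket of each distribution. The "- age N: X bytes, Y total" line shape is an assumption that may differ slightly between JVM versions, and the class name is made up for illustration.

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical cross-check helper for -XX:+PrintTenuringDistribution output.
public class OldestAgeBucket {
    // Assumed line shape: "- age   1:    8310184 bytes,    8310184 total"
    private static final Pattern AGE = Pattern.compile("-\\s*age\\s+(\\d+):\\s+(\\d+) bytes");

    public static void main(String[] args) throws Exception {
        BufferedReader in = new BufferedReader(new FileReader(args[0]));
        String line;
        int lastAge = -1;
        long lastAgeBytes = 0;
        boolean inBlock = false;
        while ((line = in.readLine()) != null) {
            Matcher m = AGE.matcher(line);
            if (m.find()) {
                // ages print in increasing order, so this ends up holding the
                // deepest age of the current distribution
                lastAge = Integer.parseInt(m.group(1));
                lastAgeBytes = Long.parseLong(m.group(2));
                inBlock = true;
            } else if (inBlock) {
                // first non-age line after a distribution: report its oldest bucket
                System.out.println("oldest age " + lastAge + ": ~" + (lastAgeBytes / 1024) + "K");
                inBlock = false;
            }
        }
        in.close();
    }
}

Comparing these per-collection numbers with the (ParNew delta - heap delta) figure gives the kind of sanity check described above.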
>> >>> >> >>> Here's an example verbosegc line of output (we're running ParNew+CMS): >> >>> [GC 2136581.585: [ParNew:345951K->40960K(368640K), 0.0676780 secs] >> >>> 3608692K->3323692K(5201920K), 0.0680220 secs] >> >>> >> >>> a) The delta between the ParNew "before" and "after" is: >> >>> 345951K-40960K=304991K >> >>> My understanding is that the 304991K is the total of (collected in >> >>> young gen + promoted to tenured gen) >> >>> Since this number of composed of two things, it's not directly useful >> >>> by >> >>> itself. >> >>> >> >>> b) The delta between the overall heap "before" and "after" is: >> >>> 3608692K-3323692K=285000K >> >>> I assume that this is effectively the volume that was collected in >> >>> this ParNew cycle. >> >>> Would it be correct to calculate the total allocation rate of the >> >>> running application (in a given period) from summing the total heap >> >>> deltas (in a given timespan)? >> >>> >> >>> I do realize that it's a "collected kilobytes" metric, but I think >> >>> it's close enough to be used as a "delayed" allocation number, >> >>> especially when looking at a timescale of 10 minutes or more. >> >>> It has the additional convenience of requiring to parse the current >> >>> gc.log line only, and not needing to correlate with the preceding >> >>> ParNew event. >> >>> >> >>> c) I take it that the difference between the two deltas (ParNew delta >> >>> and total heap delta) is effectively the promotion volume? >> >>> In the example above, this would give a promotion volume of >> >>> (345951K-40960K)-(3608692K-3323692K)=19991K >> >>> >> >>> d) When looking at -XX:+PrintTenuringDistribution output, I assume the >> >>> distribution reflects the situation *after* the enclosing ParNew event >> >>> in the log. >> >>> >> >>> Thanks in advance for any corrections, >> >>> -tt >> >>> _______________________________________________ >> >>> hotspot-gc-use mailing list >> >>> hotspot-gc-use at openjdk.java.net >> >>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> >> >> >> >> >> >> _______________________________________________ >> >> hotspot-gc-use mailing list >> >> hotspot-gc-use at openjdk.java.net >> >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> >> > >> > >> > _______________________________________________ >> > hotspot-gc-use mailing list >> > hotspot-gc-use at openjdk.java.net >> > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> > >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > From ysr1729 at gmail.com Tue Apr 24 22:09:32 2012 From: ysr1729 at gmail.com (Srinivas Ramakrishna) Date: Tue, 24 Apr 2012 22:09:32 -0700 Subject: CMS Full GC In-Reply-To: References: Message-ID: Hi Shiv -- Which version of the JDK are you on? As I said there was a temporary regression in this behaviour (i.e. expand without full gc) with CMS, which was fixed up later. Unfortunately, can't recall the CR# or the versions of that, although i can probably dig that up from the mercurial history if needed, i don't have the sources handy at the moment. More importantly, by default CMS does not collect the perm gen in a concurrent collection cycle, so you have to explicitly enable concurrent perm gen collection via -XX:+CMSClassUnloadingEnabled (and in older versions also -XX:+CMSPermGenSweepingEnabled). 
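One way to confirm which of the flags discussed in this thread a running VM actually has in effect (as opposed to what the launch scripts intend) is to ask the VM itself. The sketch below assumes the com.sun.management.HotSpotDiagnosticMXBean that Sun/Oracle JDKs ship; the class name and the particular flag list are chosen for illustration, and a flag that a given build does not recognize is simply reported as such.

import java.lang.management.ManagementFactory;
import com.sun.management.HotSpotDiagnosticMXBean;

// Hedged sketch: print the effective values of the CMS/perm gen flags
// mentioned in this thread from inside the running JVM.
public class ShowGcFlags {
    public static void main(String[] args) throws Exception {
        HotSpotDiagnosticMXBean hs = ManagementFactory.newPlatformMXBeanProxy(
                ManagementFactory.getPlatformMBeanServer(),
                "com.sun.management:type=HotSpotDiagnostic",
                HotSpotDiagnosticMXBean.class);
        String[] flags = { "CMSClassUnloadingEnabled", "CMSPermGenSweepingEnabled",
                           "PermSize", "MaxPermSize" };
        for (String f : flags) {
            try {
                System.out.println(f + " = " + hs.getVMOption(f).getValue());
            } catch (IllegalArgumentException e) {
                System.out.println(f + " is not recognized by this JVM");
            }
        }
    }
}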
If you are stuck on a version of the JVM where the perm gen expansion regression exists, you should explicitly set both -XX:PermSize and -XX:MaxPermSize to the maximum size of perm gen. (And definitely enable perm gen collection via he flags listed in the last para.) Hopefully that should get rid of these "unwanted" full collections. -- ramki On Tue, Apr 24, 2012 at 9:09 PM, Shivkumar Chelwa wrote: > ** > > Hi Ramki,**** > > ** ** > > I enabled ?jstat ?gccause? for the application instance and found > following few GC causes in the logs.**** > > ** ** > > 1. Allocation Failure ? not sure what that means**** > 2. Permanent Generation Full ? I have few doubts here.**** > 1. The MaxPermSize is set to 256m but the gc log file displays a > different size 74240K. See the following line from gc log file.**** > > 56876.963: [Full GC 56876.963: [CMS: 4181041K->3724534K(7898752K), > 77.5881180 secs] 4211397K->3724534K(8339648K), [CMS Perm : * > 73972K->73511K(74240K)],* 77.5901936 secs] [Times: user=77.47 sys=0.19, > real=77.59 secs]**** > > 1. Why should there be a ?Full GC? for permanent generation > collection?**** > 2. The permanent generation utilization is consistently over 99% > and after ?Full GC? it comes down to 60%, why it didn?t expand the > committed memory instead of doing a full gc?**** > 3. JConsole shows following stats for ?CMS Perm Gen? sizes**** > > ** i. **Used: 74,329 > kbytes **** > > ** ii. **Committed: 74,432 > kbytes **** > > ** iii. **Max: 262,144 > kbytes **** > > ** ** > > ** ** > > These are the garbage collection setting I am using for application,**** > > ** ** > > -server -d64 -javaagent:instrumentation.jar -XX:MaxPermSize=256m > -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+DisableExplicitGC > -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly > -verbose:gc -Xloggc:/logs/LB01.log -XX:+PrintGCTimeStamps > -XX:+PrintGCDetails -Xmx8192M -Xms8192M -Xss256K**** > > ** ** > > ** ** > > There are few lines from ?jstat ?gccause? output where it displays > ?Permanent Generation Full? as the gc cause. Also attaching the gc log file > and ?jstat ?gccasue? output for reference.**** > > ** ** > > S0 S1 E O P YGC YGCT FGC FGCT > GCT LGCC GCC **** > > 0.00 20.05 5.24 52.93 99.02 1625 63.648 1 0.000 63.648 > No GC Permanent Generation Full**** > > 0.00 20.05 5.24 52.93 99.02 1625 63.648 1 0.000 63.648 > No GC Permanent Generation Full**** > > 0.00 0.00 0.00 47.15 60.00 1625 63.648 1 77.588 141.236 > Permanent Generation Full No GC **** > > 0.00 0.00 41.19 47.15 60.02 1625 63.648 1 77.588 > 141.236 Permanent Generation Full No GC **** > > ** ** > > Thanks,**** > > Shiv**** > > ** ** > ------------------------------ > > *From:* Srinivas Ramakrishna [mailto:ysr1729 at gmail.com] > *Sent:* 17 April 2012 15:07 > *To:* Shivkumar Chelwa > *Cc:* **hotspot-gc-use at openjdk.java.net** > *Subject:* Re: CMS Full GC**** > > ** ** > > Is it possible that you are GC'ing here to expand perm gen. Check if > permgen footprint changed between the two JVM releases (when running yr > application). > > Now, CMS should quietly expand perm gen without doing a stop-world GC, but > there was a temporary regression in that functionality before it was fixed > again. > I can't however recall the JVM versions where the regression was > introduced and then fixed. But all of this is handwaving on my part. > If you run 6u22 and 6u29 both with -XX:+PrintHeapAtGC, you might have more > visibility into why the GC is kicking in. 
A longer log would allow > the community to perhaps provide suggestions as well. > > Which reminds me that there is a bug in the printing of GC cause (as > printed by jstat) which needs to be fixed. HotSpot/GC folk, have you > noticed that we never > see a "perm gen allocation" as the GC cause even when that's really the > reason for a full gc? (not that that should happen here where CMS is being > used.) > > -- ramki**** > > On Tue, Apr 17, 2012 at 8:08 AM, Shivkumar Chelwa > wrote:**** > > Hi,**** > > **** > > Till date I was using JRE 6u22 with following garbage collection > parameters and the CMS cycle use to kick-in appropriately (when heap > reaches 75%)**** > > **** > > **** > > -server -d64 -Xms2048m -Xmx2048m -javaagent: my-instrumentation.jar > -XX:MaxPermSize=256m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC > -XX:+DisableExplicitGC -XX:CMSInitiatingOccupancyFraction=75 > -XX:+UseCMSInitiatingOccupancyOnly -verbose:gc -Xloggc:LB01.log > -XX:+PrintGCTimeStamps -XX:+PrintGCDetails > -Djava.library=/usr/java/jre/lib/sparcv9/server/libjvm.so -Xmx8192M > -Xms8192M -Xss256K**** > > **** > > But I switched to JRE 6u29 and see the *CMS Full GC* happening randomly. > Can you please help me undercover this mystery. Here is one of the log > message from gc log file.**** > > **** > > 13475.239: [*Full GC* 13475.239: [CMS: 4321575K->3717474K(7898752K), *54.0602376 > secs*] 4412277K->3717474K(8339648K), [CMS Perm : 73791K->73339K(74048K)], > 54.0615557 secs] [Times: user=53.97 sy**** > > s=0.12, real=54.06 secs]**** > > **** > > **** > > Kindly help.**** > > **** > > **** > > Regards,**** > > Shiv**** > > **** > > **** > > **** > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use**** > > ** ** > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120424/84e221ff/attachment-0001.html From schelwa at tibco.com Tue Apr 24 22:55:57 2012 From: schelwa at tibco.com (Shivkumar Chelwa) Date: Wed, 25 Apr 2012 05:55:57 +0000 Subject: CMS Full GC In-Reply-To: Message-ID: Using JRE 6 update 29 on Solaris 10 SPARC(64 bit) Regards, Shiv schelwa at tibco.com From: Srinivas Ramakrishna [mailto:ysr1729 at gmail.com] Sent: Tuesday, April 24, 2012 10:09 PM To: Shivkumar Chelwa Cc: hotspot-gc-use at openjdk.java.net Subject: Re: CMS Full GC Hi Shiv -- Which version of the JDK are you on? As I said there was a temporary regression in this behaviour (i.e. expand without full gc) with CMS, which was fixed up later. Unfortunately, can't recall the CR# or the versions of that, although i can probably dig that up from the mercurial history if needed, i don't have the sources handy at the moment. More importantly, by default CMS does not collect the perm gen in a concurrent collection cycle, so you have to explicitly enable concurrent perm gen collection via -XX:+CMSClassUnloadingEnabled (and in older versions also -XX:+CMSPermGenSweepingEnabled). If you are stuck on a version of the JVM where the perm gen expansion regression exists, you should explicitly set both -XX:PermSize and -XX:MaxPermSize to the maximum size of perm gen. (And definitely enable perm gen collection via he flags listed in the last para.) Hopefully that should get rid of these "unwanted" full collections. -- ramki On Tue, Apr 24, 2012 at 9:09 PM, Shivkumar Chelwa > wrote: Hi Ramki, I enabled ?jstat ?gccause? 
for the application instance and found the following GC causes in the logs:

1. Allocation Failure - not sure what that means.
2. Permanent Generation Full - I have a few doubts here:
* The MaxPermSize is set to 256m, but the gc log file displays a different size, 74240K. See the following line from the gc log file:

56876.963: [Full GC 56876.963: [CMS: 4181041K->3724534K(7898752K), 77.5881180 secs] 4211397K->3724534K(8339648K), [CMS Perm : 73972K->73511K(74240K)], 77.5901936 secs] [Times: user=77.47 sys=0.19, real=77.59 secs]

* Why should there be a "Full GC" for a permanent generation collection?
* The permanent generation utilization is consistently over 99%, and after a "Full GC" it comes down to 60%. Why didn't it expand the committed memory instead of doing a full gc?
* JConsole shows the following stats for the "CMS Perm Gen" sizes:
  i.   Used:      74,329 kbytes
  ii.  Committed: 74,432 kbytes
  iii. Max:       262,144 kbytes

These are the garbage collection settings I am using for the application:

-server -d64 -javaagent:instrumentation.jar -XX:MaxPermSize=256m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+DisableExplicitGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -verbose:gc -Xloggc:/logs/LB01.log -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -Xmx8192M -Xms8192M -Xss256K

Here are a few lines from the jstat -gccause output where it displays "Permanent Generation Full" as the gc cause. Also attaching the gc log file and the jstat -gccause output for reference.

S0     S1     E      O      P      YGC   YGCT    FGC  FGCT     GCT      LGCC                       GCC
0.00  20.05   5.24  52.93  99.02   1625  63.648   1    0.000   63.648   No GC                      Permanent Generation Full
0.00  20.05   5.24  52.93  99.02   1625  63.648   1    0.000   63.648   No GC                      Permanent Generation Full
0.00   0.00   0.00  47.15  60.00   1625  63.648   1   77.588  141.236   Permanent Generation Full  No GC
0.00   0.00  41.19  47.15  60.02   1625  63.648   1   77.588  141.236   Permanent Generation Full  No GC

Thanks,
Shiv

________________________________
From: Srinivas Ramakrishna [mailto:ysr1729 at gmail.com]
Sent: 17 April 2012 15:07
To: Shivkumar Chelwa
Cc: hotspot-gc-use at openjdk.java.net
Subject: Re: CMS Full GC

Is it possible that you are GC'ing here to expand perm gen? Check if the permgen footprint changed between the two JVM releases (when running your application). Now, CMS should quietly expand perm gen without doing a stop-world GC, but there was a temporary regression in that functionality before it was fixed again. I can't however recall the JVM versions where the regression was introduced and then fixed. But all of this is handwaving on my part. If you run 6u22 and 6u29 both with -XX:+PrintHeapAtGC, you might have more visibility into why the GC is kicking in. A longer log would allow the community to perhaps provide suggestions as well.

Which reminds me that there is a bug in the printing of GC cause (as printed by jstat) which needs to be fixed. HotSpot/GC folk, have you noticed that we never see a "perm gen allocation" as the GC cause even when that's really the reason for a full gc? (not that that should happen here where CMS is being used.)
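As an aside on the jstat -gccause output shown above: if the goal is just to catch when the perm gen is driving collections, a small watcher along the following lines can run next to the application. This is a hypothetical helper sketched for illustration (the class name and the 10-second sampling interval are made up); it relies only on the standard periodic-sampling form of the jstat -gccause option already in use here.

import java.io.BufferedReader;
import java.io.InputStreamReader;

// Hypothetical watcher: tails `jstat -gccause <pid> 10000` and flags samples
// whose cause columns mention the perm gen.
public class GcCauseWatch {
    public static void main(String[] args) throws Exception {
        Process p = new ProcessBuilder("jstat", "-gccause", args[0], "10000")
                .redirectErrorStream(true).start();
        BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()));
        String line;
        while ((line = r.readLine()) != null) {
            // the cause strings span several whitespace-separated words, so a
            // simple substring check on the whole sample line is enough
            if (line.contains("Permanent Generation Full")) {
                System.out.println("perm-gen-driven collection: " + line);
            }
        }
    }
}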
-- ramki On Tue, Apr 17, 2012 at 8:08 AM, Shivkumar Chelwa > wrote: Hi, Till date I was using JRE 6u22 with following garbage collection parameters and the CMS cycle use to kick-in appropriately (when heap reaches 75%) -server -d64 -Xms2048m -Xmx2048m -javaagent: my-instrumentation.jar -XX:MaxPermSize=256m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+DisableExplicitGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -verbose:gc -Xloggc:LB01.log -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -Djava.library=/usr/java/jre/lib/sparcv9/server/libjvm.so -Xmx8192M -Xms8192M -Xss256K But I switched to JRE 6u29 and see the CMS Full GC happening randomly. Can you please help me undercover this mystery. Here is one of the log message from gc log file. 13475.239: [Full GC 13475.239: [CMS: 4321575K->3717474K(7898752K), 54.0602376 secs] 4412277K->3717474K(8339648K), [CMS Perm : 73791K->73339K(74048K)], 54.0615557 secs] [Times: user=53.97 sy s=0.12, real=54.06 secs] Kindly help. Regards, Shiv _______________________________________________ hotspot-gc-use mailing list hotspot-gc-use at openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120425/add7f67f/attachment-0001.html From ysr1729 at gmail.com Wed Apr 25 00:01:37 2012 From: ysr1729 at gmail.com (Srinivas Ramakrishna) Date: Wed, 25 Apr 2012 00:01:37 -0700 Subject: Promotion Failed when the Old Generation Usage is very low. In-Reply-To: <4F96E7AB.9AAE.00F7.0@lombardrisk.com> References: <4F96E7AB.9AAE.00F7.0@lombardrisk.com> Message-ID: Bond, you are apparently using an MP box. I'd suggest losing the "incremental" options entirely and dropping the max tenuring threshold to 8 or so. I'd make use of the size of the young gen and the survivor spaces to control promotion into the old gen, which i would size at two times your application footprint plus the size of the young gen as a starting point and refine from there. There have been some suggestions on this alias from Chi-Ho Kwok etc. on the importance of reducing promotion of very young objects into the old generation to prevent fragmentation. LOnger-lived objects typically imply (for most but not all applications) relatively stable and less non-stationary distributions which CMS block inventorying heuristics prefer. more inline below... On Tue, Apr 24, 2012 at 2:49 AM, Bond Chen wrote: > Hi , > > We're suffering high frequent promotion failed and concurrent mode > failure, cause very long GC pause(5 seconds to 1000 seconds even more some > time) attached the '1st promote failed' and '49th promotion failed' of > gc.log > > 1, The '1st promote failed' caused by the old generation usage is too > high, no enough space for promotion, but the '49th promotion failed', only > used > 2615456K out of 10387456K, what happed? > either a large object allocation or fragmentation or more likely both. > > 2, Does the CMS throwing 'Concurrent Mode Failure' combat the old > generation? move all objects together and leave only one free block? or > Only 'Full GC' does this? > concurrent mode failure results in compaction, yes. > > 3, when will 'Promotion failure' cause ''Concurrent Mode Failure' and some > time 'Full GC' ? > it's just a notional difference. Both should be called "concurrent mode failure". I think the newer mesages say "concurrent mode interrupted" and "full gc" respectively. 
In the latter case there is not an ongoing concurrent cycle that was interrupted. From the standpoint of the effect on the application (long pause for gc) and of the state of the heap after gc (fully compacted) there is little difference. For historical reasons, "concurrent mode failure" usually results in longer pauses because an ongoing concurrent collection phase first completes an ongoing phase before bailing to compaction, whereas in the latter case there is no such delay so is usually less painful. -- ramki > Regards, > Bond > > /****parameter ***/ > ### New JVM Parameter > #Below line changed per RH recommendation 15 Dec 2009 > #export RUN_ARGS=" -d64 -server -Xms2048M -Xmx12144M -XX:PermSize=512m > -XX:MaxPermSize=512m -Xss1024k " > export RUN_ARGS=" -server -d64 -Xms2048M -Xmx12144M -XX:PermSize=512m > -XX:MaxPermSize=512m -Xss1024k " > > export RUN_ARGS=" $RUN_ARGS -XX:+UseParNewGC -XX:+UseConcMarkSweepGC > -XX:+CMSParallelRemarkEnabled -XX:+UseTLAB -XX:+CMSIncrementalMode " > > #Below line commented per RH recommendation 15 Dec 2009 > #export RUN_ARGS=" $RUN_ARGS -XX:+UseCMSCompactAtFullCollection " > > #Below line changed per RH recommendation 15 Dec 2009 > #export RUN_ARGS=" $RUN_ARGS -XX:+CMSIncrementalPacing > -XX:CMSIncrementalDutyCycleMin=0 -XX:CMSIncrementalDutyCycle=10 > -XX:MaxTenuringThreshold=0 " > export RUN_ARGS=" $RUN_ARGS -XX:+CMSIncrementalPacing > -XX:CMSIncrementalDutyCycleMin=0 -XX:CMSIncrementalDutyCycle=10 > -XX:MaxTenuringThreshold=32 " > > #Below 2 lines added per RH recommendation 15 Dec 2009 > #export RUN_ARGS=" -XX:ParallelGCThreads=13 " > #export RUN_ARGS=" -XX:SurvivorRatio=48 " > > #Below 2 lines added per RH recommendation 16 Dec 2009 > RUN_ARGS=" $RUN_ARGS -XX:ParallelGCThreads=13 " > RUN_ARGS=" $RUN_ARGS -XX:SurvivorRatio=48 " > > ### set for cluster monitor added on 25-Jun-2011 > export RUN_ARGS="$RUN_ARGS -Djboss.cluster.monitor.switch=y"; > export RUN_ARGS="$RUN_ARGS -Djboss.cluster.number=2"; > > #Below line changed with RELEASE_2009_1_SP10.2 on 26 Feb 2010 > #export RUN_ARGS=" $RUN_ARGS -XX:CMSInitiatingOccupancyFraction=70 > -XX:+PrintTenuringDistribution -Xmn512m -XX:+UseLargePages > -XX:LargePageSizeInBytes=64k " > export RUN_ARGS=" $RUN_ARGS -XX:CMSInitiatingOccupancyFraction=60 > -Xmn2000m -XX:+UseLargePages -XX:LargePageSizeInBytes=64k " > > #Below line added with RELEASE_2009_1_SP10.2 on 26 Feb 2010 > export RUN_ARGS=" $RUN_ARGS -XX:+CMSClassUnloadingEnabled > -XX:+ExplicitGCInvokesConcurrent -XX:+AggressiveOpts " > > export RUN_ARGS=" $RUN_ARGS -XX:+PrintGCDetails > -XX:+PrintGCApplicationStoppedTime -Xloggc:./gc_${start_ts}.log " > > export RUN_ARGS=" $RUN_ARGS -Dsun.rmi.dgc.server.gcInterval=18000000 > -Dsun.rmi.dgc.client.gcInterval=18000000 -verbose:gc" > > > > /***parameter > > > > > > > > /** the 1st promotion failed **/ > > 169682.980: [GC Before GC: > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: 7127332 > Max Chunk Size: 6041118 > Number of Blocks: 1785 > Av. Block Size: 3992 > Tree Height: 24 > Before GC: > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: 6834133 > Max Chunk Size: 97353 > Number of Blocks: 4773 > Av. 
Block Size: 1431 > Tree Height: 27 > 169682.981: [ParNew (promotion failed): 2007040K->2007040K(2007040K), > 48.9558338 secs]169731.937: [CMS169741.903: [CMS-concurrent-sweep: > 10.823/99.414 secs] [Times: user=127.09 sys=25.97, real=99.41 secs] > (concurrent mode failure): 8681490K->2319271K(10387456K), 44.6304362 > secs] 10395485K->2319271K(12394496K), [CMS Perm : > 291584K->290856K(524288K)]After GC: > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: 1032711195 > Max Chunk Size: 1032711195 > Number of Blocks: 1 > Av. Block Size: 1032711195 > Tree Height: 1 > After GC: > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: 0 > Max Chunk Size: 0 > Number of Blocks: 0 > Tree Height: 0 > icms_dc=16 , 93.5876901 secs] [Times: user=97.28 sys=21.58, real=93.59 > secs] > > /** the 1st promotion failed **/ > > > > /** the 49th promotion failed ***/ > 298786.901: [GC Before GC: > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: 236997970 > Max Chunk Size: 236997970 > Number of Blocks: 1 > Av. Block Size: 236997970 > Tree Height: 1 > Before GC: > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: 0 > Max Chunk Size: 0 > Number of Blocks: 0 > Tree Height: 0 > 298786.902: [ParNew (promotion failed): 2007039K->2007040K(2007040K), > 4.5565939 secs]298791.458: [CMS: 2615456K->1813239K(10387456K), 19.2232319 > secs] 4346089K->1813239K(12394496K), [CMS Perm : > 299206K->299126K(524288K)]After GC: > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: 1097483360 > Max Chunk Size: 1097483360 > Number of Blocks: 1 > Av. Block Size: 1097483360 > Tree Height: 1 > After GC: > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: 0 > Max Chunk Size: 0 > Number of Blocks: 0 > Tree Height: 0 > icms_dc=0 , 23.7805042 secs] [Times: user=25.47 sys=0.02, real=23.78 secs] > Total time for which application threads were stopped: 23.7861234 seconds > /** the 49th promotion failed ***/ > > This e-mail together with any attachments (the "Message") is confidential > and may contain privileged information. If you are not the intended > recipient (or have received this e-mail in error) please notify the sender > immediately and delete this Message from your system. Any unauthorized > copying, disclosure, distribution or use of this Message is strictly > forbidden. > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120425/fd0eec46/attachment.html From Bond.Chen at lombardrisk.com Wed Apr 25 23:45:58 2012 From: Bond.Chen at lombardrisk.com (Bond Chen) Date: Thu, 26 Apr 2012 07:45:58 +0100 Subject: Promotion Failed when the Old Generation Usage is very low. In-Reply-To: References: <4F96E7AB.9AAE.00F7.0@lombardrisk.com> Message-ID: <4F995FA6.9AAE.00F7.0@lombardrisk.com> Hi Srinivas, Thanks very much for your response, very glad to have expert can talk about GC issue. 
For my question #2, your answer is that a 'concurrent mode failure' causes old generation compaction. I have attached a piece of gc log from a real production system of one of our clients. There are 4 ParNew GCs in it: the 1st one, at time 298550.966, had a promotion failure and concurrent mode failure; the 2nd and 3rd are OK; the 4th one, at 298786.902, had a promotion failure again, even though the whole old generation only used 2615456K out of 10387456K and had already been compacted by the 1st failure. This confuses me a lot.

Regards,
Bond

>>> Srinivas Ramakrishna 4/25/2012 3:01 PM >>>
Bond, you are apparently using an MP box. I'd suggest losing the "incremental" options entirely and dropping the max tenuring threshold to 8 or so. I'd make use of the size of the young gen and the survivor spaces to control promotion into the old gen, which i would size at two times your application footprint plus the size of the young gen as a starting point and refine from there. There have been some suggestions on this alias from Chi-Ho Kwok etc. on the importance of reducing promotion of very young objects into the old generation to prevent fragmentation. LOnger-lived objects typically imply (for most but not all applications) relatively stable and less non-stationary distributions which CMS block inventorying heuristics prefer.

more inline below...

On Tue, Apr 24, 2012 at 2:49 AM, Bond Chen wrote:
> Hi ,
>
> We're suffering high frequent promotion failed and concurrent mode
> failure, cause very long GC pause(5 seconds to 1000 seconds even more some
> time) attached the '1st promote failed' and '49th promotion failed' of
> gc.log
>
> 1, The '1st promote failed' caused by the old generation usage is too
> high, no enough space for promotion, but the '49th promotion failed', only
> used 2615456K out of 10387456K, what happed?
>
either a large object allocation or fragmentation or more likely both.

> 2, Does the CMS throwing 'Concurrent Mode Failure' combat the old
> generation? move all objects together and leave only one free block? or
> Only 'Full GC' does this?
>
concurrent mode failure results in compaction, yes.

> 3, when will 'Promotion failure' cause ''Concurrent Mode Failure' and some
> time 'Full GC' ?
>
it's just a notional difference. Both should be called "concurrent mode failure". I think the newer mesages say "concurrent mode interrupted" and "full gc" respectively. In the latter case there is not an ongoing concurrent cycle that was interrupted. From the standpoint of the effect on the application (long pause for gc) and of the state of the heap after gc (fully compacted) there is little difference. For historical reasons, "concurrent mode failure" usually results in longer pauses because an ongoing concurrent collection phase first completes an ongoing phase before bailing to compaction, whereas in the latter case there is no such delay so is usually less painful.
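To illustrate the fragmentation scenario described in the answer above (low overall occupancy, yet no single free block large enough for a promotion), here is a contrived sketch. It is not taken from the client's application; the sizes, counts and suggested flags are assumptions, and depending on heap sizing and tenuring settings it may or may not actually reproduce a "promotion failed". It is only meant to show how retaining interleaved chunks can leave the CMS old gen free lists full of small holes that later, larger promotions cannot use. Try it with something like -Xmx512m -Xmn64m -XX:+UseConcMarkSweepGC -XX:+PrintGCDetails.

import java.util.ArrayList;
import java.util.List;

// Contrived fragmentation sketch, not from the thread.
public class FragmentedOldGen {
    public static void main(String[] args) {
        // Phase 1: retain ~256MB of 64KB chunks; with a small young gen most
        // of them end up promoted into the CMS old gen.
        List<byte[]> retained = new ArrayList<byte[]>();
        for (int i = 0; i < 4000; i++) {
            retained.add(new byte[64 * 1024]);
        }
        // Phase 2: drop every other chunk. The old gen is now roughly half
        // empty, but (after a CMS sweep) its free space sits in isolated
        // ~64KB holes that cannot coalesce because live chunks separate them.
        for (int i = 0; i < retained.size(); i += 2) {
            retained.set(i, null);
        }
        // Phase 3: allocate and retain objects larger than any single hole;
        // if these need to be promoted, the old gen may be unable to satisfy
        // the promotion even though its overall occupancy is low.
        List<byte[]> big = new ArrayList<byte[]>();
        for (int i = 0; i < 300; i++) {
            big.add(new byte[1024 * 1024]);
        }
        System.out.println("retained=" + retained.size() + " big=" + big.size());
    }
}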
-- ramki > Regards, > Bond > > /****parameter ***/ > ### New JVM Parameter > #Below line changed per RH recommendation 15 Dec 2009 > #export RUN_ARGS=" -d64 -server -Xms2048M -Xmx12144M -XX:PermSize=512m > -XX:MaxPermSize=512m -Xss1024k " > export RUN_ARGS=" -server -d64 -Xms2048M -Xmx12144M -XX:PermSize=512m > -XX:MaxPermSize=512m -Xss1024k " > > export RUN_ARGS=" $RUN_ARGS -XX:+UseParNewGC -XX:+UseConcMarkSweepGC > -XX:+CMSParallelRemarkEnabled -XX:+UseTLAB -XX:+CMSIncrementalMode " > > #Below line commented per RH recommendation 15 Dec 2009 > #export RUN_ARGS=" $RUN_ARGS -XX:+UseCMSCompactAtFullCollection " > > #Below line changed per RH recommendation 15 Dec 2009 > #export RUN_ARGS=" $RUN_ARGS -XX:+CMSIncrementalPacing > -XX:CMSIncrementalDutyCycleMin=0 -XX:CMSIncrementalDutyCycle=10 > -XX:MaxTenuringThreshold=0 " > export RUN_ARGS=" $RUN_ARGS -XX:+CMSIncrementalPacing > -XX:CMSIncrementalDutyCycleMin=0 -XX:CMSIncrementalDutyCycle=10 > -XX:MaxTenuringThreshold=32 " > > #Below 2 lines added per RH recommendation 15 Dec 2009 > #export RUN_ARGS=" -XX:ParallelGCThreads=13 " > #export RUN_ARGS=" -XX:SurvivorRatio=48 " > > #Below 2 lines added per RH recommendation 16 Dec 2009 > RUN_ARGS=" $RUN_ARGS -XX:ParallelGCThreads=13 " > RUN_ARGS=" $RUN_ARGS -XX:SurvivorRatio=48 " > > ### set for cluster monitor added on 25-Jun-2011 > export RUN_ARGS="$RUN_ARGS -Djboss.cluster.monitor.switch=y"; > export RUN_ARGS="$RUN_ARGS -Djboss.cluster.number=2"; > > #Below line changed with RELEASE_2009_1_SP10.2 on 26 Feb 2010 > #export RUN_ARGS=" $RUN_ARGS -XX:CMSInitiatingOccupancyFraction=70 > -XX:+PrintTenuringDistribution -Xmn512m -XX:+UseLargePages > -XX:LargePageSizeInBytes=64k " > export RUN_ARGS=" $RUN_ARGS -XX:CMSInitiatingOccupancyFraction=60 > -Xmn2000m -XX:+UseLargePages -XX:LargePageSizeInBytes=64k " > > #Below line added with RELEASE_2009_1_SP10.2 on 26 Feb 2010 > export RUN_ARGS=" $RUN_ARGS -XX:+CMSClassUnloadingEnabled > -XX:+ExplicitGCInvokesConcurrent -XX:+AggressiveOpts " > > export RUN_ARGS=" $RUN_ARGS -XX:+PrintGCDetails > -XX:+PrintGCApplicationStoppedTime -Xloggc:./gc_${start_ts}.log " > > export RUN_ARGS=" $RUN_ARGS -Dsun.rmi.dgc.server.gcInterval=18000000 > -Dsun.rmi.dgc.client.gcInterval=18000000 -verbose:gc" > > > > /***parameter > > > > > > > > /** the 1st promotion failed **/ > > 169682.980: [GC Before GC: > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: 7127332 > Max Chunk Size: 6041118 > Number of Blocks: 1785 > Av. Block Size: 3992 > Tree Height: 24 > Before GC: > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: 6834133 > Max Chunk Size: 97353 > Number of Blocks: 4773 > Av. Block Size: 1431 > Tree Height: 27 > 169682.981: [ParNew (promotion failed): 2007040K->2007040K(2007040K), > 48.9558338 secs]169731.937: [CMS169741.903: [CMS-concurrent-sweep: > 10.823/99.414 secs] [Times: user=127.09 sys=25.97, real=99.41 secs] > (concurrent mode failure): 8681490K->2319271K(10387456K), 44.6304362 > secs] 10395485K->2319271K(12394496K), [CMS Perm : > 291584K->290856K(524288K)]After GC: > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: 1032711195 > Max Chunk Size: 1032711195 > Number of Blocks: 1 > Av. 
Block Size: 1032711195 > Tree Height: 1 > After GC: > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: 0 > Max Chunk Size: 0 > Number of Blocks: 0 > Tree Height: 0 > icms_dc=16 , 93.5876901 secs] [Times: user=97.28 sys=21.58, real=93.59 > secs] > > /** the 1st promotion failed **/ > > > > /** the 49th promotion failed ***/ > 298786.901: [GC Before GC: > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: 236997970 > Max Chunk Size: 236997970 > Number of Blocks: 1 > Av. Block Size: 236997970 > Tree Height: 1 > Before GC: > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: 0 > Max Chunk Size: 0 > Number of Blocks: 0 > Tree Height: 0 > 298786.902: [ParNew (promotion failed): 2007039K->2007040K(2007040K), > 4.5565939 secs]298791.458: [CMS: 2615456K->1813239K(10387456K), 19.2232319 > secs] 4346089K->1813239K(12394496K), [CMS Perm : > 299206K->299126K(524288K)]After GC: > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: 1097483360 > Max Chunk Size: 1097483360 > Number of Blocks: 1 > Av. Block Size: 1097483360 > Tree Height: 1 > After GC: > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: 0 > Max Chunk Size: 0 > Number of Blocks: 0 > Tree Height: 0 > icms_dc=0 , 23.7805042 secs] [Times: user=25.47 sys=0.02, real=23.78 secs] > Total time for which application threads were stopped: 23.7861234 seconds > /** the 49th promotion failed ***/ > > This e-mail together with any attachments (the "Message") is confidential > and may contain privileged information. If you are not the intended > recipient (or have received this e-mail in error) please notify the sender > immediately and delete this Message from your system. Any unauthorized > copying, disclosure, distribution or use of this Message is strictly > forbidden. > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > This e-mail together with any attachments (the "Message") is confidential and may contain privileged information. If you are not the intended recipient (or have received this e-mail in error) please notify the sender immediately and delete this Message from your system. Any unauthorized copying, disclosure, distribution or use of this Message is strictly forbidden. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120426/92573cf1/attachment-0001.html -------------- next part -------------- A non-text attachment was scrubbed... Name: gc_twoNeibouringPromotionFailure.log Type: application/octet-stream Size: 7272 bytes Desc: not available Url : http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120426/92573cf1/gc_twoNeibouringPromotionFailure-0001.log From ion.savin at tora.com Thu Apr 26 01:44:50 2012 From: ion.savin at tora.com (Ion Savin) Date: Thu, 26 Apr 2012 11:44:50 +0300 Subject: -Xloggc and -XX:+Max*=n values (was: Re: Two basic questions on -verbosegc and -XX:+PrintTenuringDistribution) In-Reply-To: <4F947181.50003@kippdata.de> References: <4F947181.50003@kippdata.de> Message-ID: <4F990B02.4020004@tora.com> Hi, > Have a look at -XX:+PrintHeapAtGC. This will help you get more precise > numbers. 
Is there any flag which can be used to have the max values for heap, young and perm listed in the GC log (-Xloggc)? I know there's -XX:+PrintCommandLineFlags but the output is sent to stdout not the gc log file. Regards, Ion Savin From Martin.Hare-Robertson at metaswitch.com Thu Apr 26 06:35:03 2012 From: Martin.Hare-Robertson at metaswitch.com (Martin Hare-Robertson) Date: Thu, 26 Apr 2012 13:35:03 +0000 Subject: PermGen Collection Issues Message-ID: <01E0A60827F5E5459B77A1D0FB9B524B46F3CB68@ENFICSMBX1.datcon.co.uk> Hi, I am hitting a "java.lang.OutOfMemoryError: PermGen space" in a situation where I think a great deal of the perm gen is actually eligible for collection. I am running Tomcat 6 using a 32 bit Hotspot JVM (1.6.0_07). I have had some trouble in the past with classloader leaks when I reload webapps within Tomcat. I have therefore been doing some testing to explicitly reload my webapp many times and then use heap dumps to track down any classloader leaks. I have a script for testing the reloading of my webapp which does the following: 1) Submit 5 login requests to ensure that the webapp is loaded and working. 2) Reload the webapp. 3) Wait 30 seconds to give threads from the old webapp sufficient time to terminate. 4) Goto 1) I have fixed a number of bugs and now think that my webapp isn't leaking any classloaders. However, when I run this script I find that OutOfMemoryErrors get thrown after ~30 iterations. However, the heap dump which was made at the time when the OutOfMemoryError is thrown shows that all of the old Classloaders are only weakly referenced (according to Eclipse MAT). This suggests to me that something has gone wrong with the garbage collection that it has thrown an OutOfMemoryError when there was memory which was only weakly referenced which should have been freed. I ran jstat to record the GC activity and the perm gen ends up stuck at 100% full and the last GC cause is always one of "Permanent Generation Full" and "Last ditch collection". My GC command line options are as follows: -Xms600m -Xmx600m -XX:+UseMembar -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:CMSInitiatingOccupancyFraction=80 -XX:+UseCMSInitiatingOccupancyOnly -XX:PermSize=120m -XX:MaxPermSize=120m -XX:NewSize=128m -XX:SurvivorRatio=8 -Xss120k My issue seems similar to this old bug (http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6545719) which claims to have been fixed in 6u4. When exactly is a "java.lang.OutOfMemoryError: PermGen space" error allowed to be thrown? Presumably this occurs when a thread attempts to allocate into the perm gen and the perm gen collector is unable to free up sufficient space? When the perm gen is collected would you expect all garbage to be collected or does the collector quit early? I have attached an example graph showing the perm gen occupancy which I am seeing during a test. This seems to show only a small amount of perm gen being freed at each collection. MartinHR ___ Here is the complete jstat output from my test run: -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120426/69fe168d/attachment-0001.html -------------- next part -------------- A non-text attachment was scrubbed... Name: jstat.out Type: application/octet-stream Size: 543377 bytes Desc: jstat.out Url : http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120426/69fe168d/jstat-0001.out -------------- next part -------------- A non-text attachment was scrubbed... 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120426/69fe168d/attachment-0001.html
-------------- next part --------------
A non-text attachment was scrubbed...
Name: jstat.out
Type: application/octet-stream
Size: 543377 bytes
Desc: jstat.out
Url : http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120426/69fe168d/jstat-0001.out
-------------- next part --------------
A non-text attachment was scrubbed...
Name: perm.png
Type: image/png
Size: 2338 bytes
Desc: perm.png
Url : http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120426/69fe168d/perm-0001.png

From holger.hoffstaette at googlemail.com Thu Apr 26 15:59:40 2012
From: holger.hoffstaette at googlemail.com (Holger Hoffstätte)
Date: Fri, 27 Apr 2012 00:59:40 +0200
Subject: PermGen Collection Issues
In-Reply-To: <01E0A60827F5E5459B77A1D0FB9B524B46F3CB68@ENFICSMBX1.datcon.co.uk>
References: <01E0A60827F5E5459B77A1D0FB9B524B46F3CB68@ENFICSMBX1.datcon.co.uk>
Message-ID: <4F99D35C.3000800@googlemail.com>

On 26.04.2012 15:35, Martin Hare-Robertson wrote:
> I am hitting a "java.lang.OutOfMemoryError: PermGen space" in a situation
> where I think a great deal of the perm gen is actually eligible for
> collection.

By default CMS does not collect classes; try with -XX:+CMSClassUnloadingEnabled.

> I am running Tomcat 6 using a 32 bit Hotspot JVM (1.6.0_07). I have had

Those are fairly old too. Also note that just because your app code does not leak, many (badly written) libraries keep static state or hidden internal threads alive unless they are explicitly shut down/cleaned up.

-h

From Martin.Hare-Robertson at metaswitch.com Fri Apr 27 01:22:33 2012
From: Martin.Hare-Robertson at metaswitch.com (Martin Hare-Robertson)
Date: Fri, 27 Apr 2012 08:22:33 +0000
Subject: PermGen Collection Issues
In-Reply-To: <01E0A60827F5E5459B77A1D0FB9B524B5DF833A9@ENFICSMBX1.datcon.co.uk>
References: <01E0A60827F5E5459B77A1D0FB9B524B5DF833A9@ENFICSMBX1.datcon.co.uk>
Message-ID: <01E0A60827F5E5459B77A1D0FB9B524B5DF843CF@ENFICSMBX1.datcon.co.uk>

>> I am hitting a "java.lang.OutOfMemoryError: PermGen space" in a situation
>> where I think a great deal of the perm gen is actually eligible for
>> collection.
>
> By default CMS does not collect classes; try with
> -XX:+CMSClassUnloadingEnabled.

Running with -XX:+CMSClassUnloadingEnabled improved the results as Tomcat survived 58 webapp reloads (compared to 30 without CMSClassUnloadingEnabled) before hitting the same endless (Permanent Generation Full/Last ditch collection) GC issues and throwing "OutOfMemoryError: PermGen space".

>> I am running Tomcat 6 using a 32 bit Hotspot JVM (1.6.0_07). I have had
>
> Those are fairly old too. Also note that just because your app code does
> not leak, many (badly written) libraries keep static state or hidden
> internal threads alive unless they are explicitly shut down/cleaned up.

The heap dumps which I have taken should reveal if any libraries are doing anything to keep old webapp classloaders alive. According to Eclipse MAT the only path to a GC root for the old classloaders is through weak references.

Is there anything else I could try to fix this? Is it possible that this is a JVM/GC bug?

From Martin.Hare-Robertson at metaswitch.com Fri Apr 27 06:04:15 2012
From: Martin.Hare-Robertson at metaswitch.com (Martin Hare-Robertson)
Date: Fri, 27 Apr 2012 13:04:15 +0000
Subject: Java 7 Perm Gen
Message-ID: <01E0A60827F5E5459B77A1D0FB9B524B5DF85709@ENFICSMBX1.datcon.co.uk>

Hi,

I see that one of the plans for Java 7+ is to retire the perm gen as a separate space.

Can you confirm what the progress is of this project as of Java 7u4 and when we can expect to see the complete removal of the Perm Gen? Are the former contents of the Perm Gen all being moved out into native memory?

Martin

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120427/2f7199a2/attachment.html

From jon.masamitsu at oracle.com Fri Apr 27 10:17:45 2012
From: jon.masamitsu at oracle.com (Jon Masamitsu)
Date: Fri, 27 Apr 2012 10:17:45 -0700
Subject: Java 7 Perm Gen
In-Reply-To: <01E0A60827F5E5459B77A1D0FB9B524B5DF85709@ENFICSMBX1.datcon.co.uk>
References: <01E0A60827F5E5459B77A1D0FB9B524B5DF85709@ENFICSMBX1.datcon.co.uk>
Message-ID: <4F9AD4B9.8000404@oracle.com>

The removal of the permanent generation is planned for jdk 8. It will not go into a jdk 7 update soon. Once it's stabilized, we'll consider it for a jdk 7 update.

The vast majority of the current contents of perm gen will go into native memory. There is some that may move to the Java heap.

Jon

On 4/27/2012 6:04 AM, Martin Hare-Robertson wrote:
> Hi,
>
> I see that one of the plans for Java 7+ is to retire the perm gen as a separate space.
>
> Can you confirm what the progress is of this project as of Java 7u4 and when we can expect to see the complete removal of the Perm Gen? Are the former contents of the Perm Gen all being moved out into native memory?
>
> Martin
>
>
> _______________________________________________
> hotspot-gc-use mailing list
> hotspot-gc-use at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120427/71598783/attachment.html

From performanceguy at gmail.com Fri Apr 27 14:03:14 2012
From: performanceguy at gmail.com (John O'Brien)
Date: Fri, 27 Apr 2012 14:03:14 -0700
Subject: Question about -XX:+PrintTenuringDistribution and age not being printed.
Message-ID:

Hi everyone,

I understand that:

1) par-new has features that make it work with CMS.
2) par-scavenge does not have these features and is incompatible with CMS.
3) Otherwise they are the same core algorithm...both parallel stop the world copying collectors.

Why does PrintTenuringDistribution only print out the ages when ParNew is enabled? If they are the same algorithm then shouldn't they both print out age? par-scavenge does not print "ages" for me when PrintTenuringDistribution is on.

I use Parallel Old and Parallel Scavenge (ParNew can't be used with Parallel Old).

Ref: https://blogs.oracle.com/jonthecollector/entry/our_collectors

I searched the mailing lists and did not see anything, read some blogs and looked through some books.

Regards,
John

From jon.masamitsu at oracle.com Fri Apr 27 14:29:26 2012
From: jon.masamitsu at oracle.com (Jon Masamitsu)
Date: Fri, 27 Apr 2012 14:29:26 -0700
Subject: Question about -XX:+PrintTenuringDistribution and age not being printed.
In-Reply-To:
References:
Message-ID: <4F9B0FB6.8090803@oracle.com>

John,

On 4/27/2012 2:03 PM, John O'Brien wrote:
> Hi everyone,
>
> I understand that:
>
> 1) par-new has features that make it work with CMS.

Yes.

> 2) par-scavenge does not have these features and is incompatible with CMS.

Yes.

> 3) Otherwise they are the same core algorithm...both parallel stop the
> world copying collectors.

ParNew and Parallel Scavenge are two different implementations of parallel STW collectors. They share some code but much is different. ParallelScavenge supports UseAdaptiveSizePolicy and ParNew does not (never finished). ParallelScavenge varies the tenuring threshold to keep the survivor spaces from overflowing and also varies the sizes of the survivor spaces relative to eden, while ParNew has a fixed ratio between eden and the survivor sizes. It's hard to keep track of the differences.
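(To see the difference described above first-hand, a small allocation loop run once under each collector is enough. Everything below is an illustrative sketch: the class name, buffer sizes and iteration counts are made up, and the flags are the standard HotSpot 6/7 options already mentioned in this thread.)

// Suggested runs (heap size is arbitrary):
//   java -Xmx64m -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
//        -XX:MaxTenuringThreshold=3 -XX:+PrintTenuringDistribution TenuringDemo
//   java -Xmx64m -XX:+UseParallelGC -XX:+UseParallelOldGC
//        -XX:MaxTenuringThreshold=3 -XX:+PrintTenuringDistribution TenuringDemo
// The first run prints the per-age tables; the second does not, which is the
// behaviour being asked about in this thread.
import java.util.ArrayDeque;
import java.util.Deque;

public class TenuringDemo {
    public static void main(String[] args) {
        Deque<byte[]> window = new ArrayDeque<byte[]>();
        for (int i = 0; i < 2000000; i++) {
            window.addLast(new byte[1024]);   // freshly allocated, mostly short-lived
            if (window.size() > 2000) {       // keep a small sliding window alive so
                window.removeFirst();         // some objects age through the survivors
            }
        }
        System.out.println("kept " + window.size() + " buffers");
    }
}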
> Why does PrintTenuringDistribution only print out the ages when ParNew
> is enabled?
> If they are the same algorithm then shouldn't they both print out age?
> par-scavenge does not print "ages" for me when
> PrintTenuringDistribution is on.

Not the same algorithm.

> I use Parallel Old and Parallel Scavenge (ParNew can't be used with
> Parallel Old).

Correct.

Jon

> Ref: https://blogs.oracle.com/jonthecollector/entry/our_collectors
>
> I searched the mailing lists and did not see anything, read some
> blogs and looked through some books.
>
>
> Regards,
> John
> _______________________________________________
> hotspot-gc-use mailing list
> hotspot-gc-use at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use

From comp_ at gmx.net Fri Apr 27 14:57:30 2012
From: comp_ at gmx.net (Ion Savin)
Date: Sat, 28 Apr 2012 00:59:30 +0300
Subject: heap expanded after young gen gc?
Message-ID: <4F9B164A.5070204@gmx.net>

Hi,

From the GC log below it seems that the heap gets expanded after young gen collection also (131008K total at 0.030 got expanded to 262080K total at 0.184). I was under the impression that this happens only during Full GC (which might be triggered by the need to resize).

0.030: [GC [PSYoungGen: 128K->64K(192K)] 128K->128K(131008K), 0.0005310 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
0.184: [GC [PSYoungGen: 138K->64K(192K)] 261932K->261881K(262080K), 0.0021120 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
0.186: [Full GC [PSYoungGen: 64K->0K(192K)] [PSOldGen: 261817K->122K(130816K)] 261881K->122K(131008K) [PSPermGen: 2552K->2552K(21248K)], 0.0110530 secs] [Times: user=0.01 sys=0.01, real=0.01 secs]
0.297: [GC [PSYoungGen: 2K->32K(192K)] 261854K->261883K(262080K), 0.0013380 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]

If expansion can happen after a young gen collection, how is the -XX:MinHeapFreeRatio (and -XX:MaxHeapFreeRatio) flag interpreted, given that after a young gen GC objects might get promoted to the tenured generation, filling up the heap above the min ratio?

The heap is sized like this: -Xms128m -Xmx256m -Xmn256k

And the app is just generating collectable junk in a loop. What I would expect to happen is for the heap to fill up to 128m, a full gc to happen and, since the old gen free space is over the default 40% min heap free before expansion, no heap resize to happen.

Please advise. Thank you!

Regards,
Ion Savin

From jobrien at ieee.org Fri Apr 27 15:30:13 2012
From: jobrien at ieee.org (John O'Brien)
Date: Fri, 27 Apr 2012 15:30:13 -0700
Subject: Question about -XX:+PrintTenuringDistribution and age not being printed.
In-Reply-To: <4F9B0FB6.8090803@oracle.com>
References: <4F9B0FB6.8090803@oracle.com>
Message-ID:

Thanks for the quick response today.

To finish up: using Parallel Old, UseAdaptiveSizePolicy is true by default, and with -XX:MaxTenuringThreshold=3 I expected an incompatible parameter error. I did not get it. The logs show the tenuring threshold adjusted below and above the value of 5, e.g. "new threshold 7 (max 5)". Then I decided to switch off UseAdaptiveSizePolicy and see if the threshold would be stuck at 5, but no threshold logs were printed.

My question: Does -XX:MaxTenuringThreshold=3 work with ParallelScavenge? (Seems not).

Looks like my override is to fix the size of the survivor spaces through the use of some other flags? This will also turn off UseAdaptiveSizePolicy as it is not needed.

Regards,
John
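(For reference, the manual override John is describing would look roughly like the set of options below; the young gen size and survivor ratio are illustrative placeholders, not values recommended anywhere in this thread.)

-XX:+UseParallelGC -XX:+UseParallelOldGC
-XX:-UseAdaptiveSizePolicy
-Xmn256m -XX:SurvivorRatio=6
-XX:MaxTenuringThreshold=3

With UseAdaptiveSizePolicy switched off, eden and the survivor spaces should keep the fixed ratio given by -Xmn and -XX:SurvivorRatio, so the survivor sizes stop moving between collections; whether MaxTenuringThreshold is then honored as a fixed threshold is exactly the open question in this exchange.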
On Fri, Apr 27, 2012 at 2:29 PM, Jon Masamitsu wrote:
> John,
>
>
> On 4/27/2012 2:03 PM, John O'Brien wrote:
>> Hi everyone,
>>
>> I understand that:
>>
>> 1) par-new has features that make it work with CMS.
>
> Yes.
>
>> 2) par-scavenge does not have these features and is incompatible with CMS.
> Yes.
>> 3) Otherwise they are the same core algorithm...both parallel stop the
>> world copying collectors.
>
> ParNew and Parallel Scavenge are two different implementations of
> parallel STW collectors. They share some code but much is different.
> ParallelScavenge supports UseAdaptiveSizePolicy and ParNew does not
> (never finished). ParallelScavenge varies the tenuring threshold to keep
> the survivor spaces from overflowing and also varies the sizes of the
> survivor spaces relative to eden, while ParNew has a fixed ratio between
> eden and the survivor sizes. It's hard to keep track of the differences.
>
>> Why does PrintTenuringDistribution only print out the ages when ParNew
>> is enabled?
>> If they are the same algorithm then shouldn't they both print out age?
>> par-scavenge does not print "ages" for me when
>> PrintTenuringDistribution is on.
>
> Not the same algorithm.
>
>> I use Parallel Old and Parallel Scavenge (ParNew can't be used with
>> Parallel Old).
>
> Correct.
>
> Jon
>
>> Ref: https://blogs.oracle.com/jonthecollector/entry/our_collectors
>>
>> I searched the mailing lists and did not see anything, read some
>> blogs and looked through some books.
>>
>>
>> Regards,
>> John
>> _______________________________________________
>> hotspot-gc-use mailing list
>> hotspot-gc-use at openjdk.java.net
>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
> _______________________________________________
> hotspot-gc-use mailing list
> hotspot-gc-use at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use

From jon.masamitsu at oracle.com Fri Apr 27 18:09:52 2012
From: jon.masamitsu at oracle.com (Jon Masamitsu)
Date: Fri, 27 Apr 2012 18:09:52 -0700
Subject: Question about -XX:+PrintTenuringDistribution and age not being printed.
In-Reply-To:
References: <4F9B0FB6.8090803@oracle.com>
Message-ID: <4F9B4360.4090108@oracle.com>

On 4/27/2012 3:30 PM, John O'Brien wrote:
> Thanks for the quick response today.
>
> To finish up: using Parallel Old, UseAdaptiveSizePolicy is true by
> default, and with -XX:MaxTenuringThreshold=3 I expected an incompatible
> parameter error.

Our consistency checking is not that good but in this case MaxTenuringThreshold is used by ParallelScavenge but not in the same way as ParNew. ParallelScavenge picks a tenuring threshold that it thinks will keep the survivor spaces from overflowing. If the survivor spaces don't have much in them after a scavenge, ParallelScavenge may raise the tenuring threshold to better use the survivor spaces but not above MaxTenuringThreshold.

> I did not get it. The logs show the tenuring threshold adjusted
> below and above the value of 5, e.g. "new threshold 7 (max 5)".

I haven't seen that behavior. That's a bug.

> Then I decided to switch off UseAdaptiveSizePolicy and see if the
> threshold would be stuck at 5, but no threshold logs were printed.

The logging code is also guarded by UseAdaptiveSizePolicy.

> My question: Does -XX:MaxTenuringThreshold=3 work with
> ParallelScavenge? (Seems not).
I just looked at the code and I don't see what's wrong.

Jon

> Looks like my override is to fix the size of the survivor spaces through
> the use of some other flags? This will also turn off UseAdaptiveSizePolicy
> as it is not needed.
>
> Regards,
> John
>
>
> On Fri, Apr 27, 2012 at 2:29 PM, Jon Masamitsu wrote:
>> John,
>>
>>
>> On 4/27/2012 2:03 PM, John O'Brien wrote:
>>> Hi everyone,
>>>
>>> I understand that:
>>>
>>> 1) par-new has features that make it work with CMS.
>> Yes.
>>
>>> 2) par-scavenge does not have these features and is incompatible with CMS.
>> Yes.
>>> 3) Otherwise they are the same core algorithm...both parallel stop the
>>> world copying collectors.
>> ParNew and Parallel Scavenge are two different implementations of
>> parallel STW collectors. They share some code but much is different.
>> ParallelScavenge supports UseAdaptiveSizePolicy and ParNew does not
>> (never finished). ParallelScavenge varies the tenuring threshold to keep
>> the survivor spaces from overflowing and also varies the sizes of the
>> survivor spaces relative to eden, while ParNew has a fixed ratio between
>> eden and the survivor sizes. It's hard to keep track of the differences.
>>
>>> Why does PrintTenuringDistribution only print out the ages when ParNew
>>> is enabled?
>>> If they are the same algorithm then shouldn't they both print out age?
>>> par-scavenge does not print "ages" for me when
>>> PrintTenuringDistribution is on.
>> Not the same algorithm.
>>
>>> I use Parallel Old and Parallel Scavenge (ParNew can't be used with
>>> Parallel Old).
>> Correct.
>>
>> Jon
>>
>>> Ref: https://blogs.oracle.com/jonthecollector/entry/our_collectors
>>>
>>> I searched the mailing lists and did not see anything, read some
>>> blogs and looked through some books.
>>>
>>>
>>> Regards,
>>> John
>>> _______________________________________________
>>> hotspot-gc-use mailing list
>>> hotspot-gc-use at openjdk.java.net
>>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
>> _______________________________________________
>> hotspot-gc-use mailing list
>> hotspot-gc-use at openjdk.java.net
>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
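(One further note on exchanges like this: when it is unclear which collector and threshold values a VM actually settled on, later JDK 6 update releases and JDK 7 can print the final flag values at startup, for example:)

java -XX:+UseParallelGC -XX:+UseParallelOldGC -XX:MaxTenuringThreshold=3 -XX:+PrintFlagsFinal -version

The output lists every -XX flag together with the value the VM ended up using (MaxTenuringThreshold, UseAdaptiveSizePolicy, the survivor sizing flags and so on), which makes it easier to tell a mistyped or overridden flag from one the collector is simply ignoring.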