From aaisinzon at guidewire.com Mon Apr 9 11:37:32 2012 From: aaisinzon at guidewire.com (Alex Aisinzon) Date: Mon, 9 Apr 2012 18:37:32 +0000 Subject: Code cache Message-ID: <43E49E6EC0E84F41B98C68AB6D7820C4170F7515@sm-ex-02-vm.guidewire.com> I ran performance tests on one of our apps and saw the following error message in the GC logs: Java HotSpot(TM) 64-Bit Server VM warning: CodeCache is full. Compiler has been disabled. Java HotSpot(TM) 64-Bit Server VM warning: Try increasing the code cache size using -XX:ReservedCodeCacheSize= I scaled up the code cache to 512MB (-XX:ReservedCodeCacheSize=512m) and markedly improved performance/scalability. I have a few questions: * Is there a logging option that shows how much of the code cache is really used, so that I can find the right cache size without oversizing it? * What factors play into code cache utilization? I would guess that the amount of code to compile is the dominant factor. Are there other factors, like load? I would guess that some entries in the cache may get invalidated if not used much, so load could be a factor as well. I was running on Sun JVM 1.6 update 30, 64-bit, on x86-64. Best Alex A -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120409/5782b5d6/attachment.html From dawid.weiss at gmail.com Wed Apr 11 07:24:28 2012 From: dawid.weiss at gmail.com (Dawid Weiss) Date: Wed, 11 Apr 2012 16:24:28 +0200 Subject: ParNew promotion failed, no expected OOM. Message-ID: Hi there, We are measuring certain aspects of our algorithm with a test suite which attempts to run close to the physical heap's maximum size. We do it by doing a form of binary search based on the size of data passed to the algorithm, where the lower bound is always "succeeded without an OOM" and the upper bound is "threw an OOM". This works nicely, but occasionally we experience an effective deadlock in which full GCs are repeatedly invoked; the application makes progress, but overall it's several orders of magnitude slower than usual (hours instead of seconds).
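To make the setup concrete, the harness logic is roughly the sketch below (a minimal, illustrative sketch only: the class and the runAlgorithm / findLargestInputThatFits names are made up for this mail and merely stand in for our actual test code):

    class OomBinarySearch {

        // Illustrative stand-in for the real algorithm under test: allocates
        // roughly 'size' bytes in 1 MB blocks and holds on to all of them
        // until the method returns.
        static void runAlgorithm(long size) {
            final int block = 1 << 20;
            byte[][] blocks = new byte[(int) (size / block) + 1][];
            for (int i = 0; i < blocks.length; i++) {
                blocks[i] = new byte[block];
            }
        }

        // Binary search over the input size: the lower bound always
        // "succeeded without an OOM", the upper bound always "threw an OOM".
        static long findLargestInputThatFits(long lo, long hi) {
            while (hi - lo > 1) {
                long mid = lo + (hi - lo) / 2;
                try {
                    runAlgorithm(mid);
                    lo = mid;                 // succeeded without an OOM
                } catch (OutOfMemoryError e) {
                    hi = mid;                 // threw an OOM
                }
            }
            return lo;
        }
    }

The real algorithm is plugged in where runAlgorithm is called; the search narrows in on the largest input that still completes without an OOM.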
GC logs look like this: [GC [ParNew (promotion failed): 17023K->18905K(19136K), 0.0220371 secs][CMS: 69016K->69014K(81152K), 0.1370901 secs] 86038K->86038K(100288K), [CMS Perm : 21285K->21285K(35724K)], 0.1591765 secs] [Times: user=0.20 sys=0.00, real=0.16 secs] [GC [ParNew (promotion failed): 17023K->18700K(19136K), 0.0170617 secs][CMS: 69016K->69014K(81152K), 0.1235417 secs] 86038K->86038K(100288K), [CMS Perm : 21285K->21285K(35724K)], 0.1406872 secs] [Times: user=0.18 sys=0.00, real=0.14 secs] [GC [ParNew (promotion failed): 17023K->18700K(19136K), 0.0191855 secs][CMS: 69016K->69014K(81152K), 0.1296462 secs] 86038K->86038K(100288K), [CMS Perm : 21285K->21285K(35724K)], 0.1488816 secs] [Times: user=0.18 sys=0.00, real=0.15 secs] [GC [ParNew (promotion failed): 17023K->18700K(19136K), 0.0232418 secs][CMS: 69016K->69014K(81152K), 0.1300695 secs] 86038K->86038K(100288K), [CMS Perm : 21285K->21285K(35724K)], 0.1533590 secs] [Times: user=0.20 sys=0.00, real=0.15 secs] [GC [ParNew (promotion failed): 17023K->18905K(19136K), 0.0190998 secs][CMS: 69016K->69014K(81152K), 0.1319668 secs] 86038K->86038K(100288K), [CMS Perm : 21285K->21285K(35724K)], 0.1511436 secs] [Times: user=0.18 sys=0.00, real=0.15 secs] [GC [ParNew (promotion failed): 17023K->18905K(19136K), 0.0168998 secs][CMS: 69017K->69015K(81152K), 0.1359254 secs] 86038K->86038K(100288K), [CMS Perm : 21285K->21285K(35724K)], 0.1528776 secs] [Times: user=0.20 sys=0.01, real=0.16 secs] [GC [ParNew (promotion failed): 17023K->18905K(19136K), 0.0214651 secs][CMS: 69017K->69015K(81152K), 0.1209494 secs] 86039K->86039K(100288K), [CMS Perm : 21285K->21285K(35724K)], 0.1424941 secs] [Times: user=0.18 sys=0.00, real=0.14 secs] [GC [ParNew (promotion failed): 17023K->18700K(19136K), 0.0200897 secs][CMS: 69017K->69015K(81152K), 0.1244227 secs] 86039K->86039K(100288K), [CMS Perm : 21285K->21285K(35724K)], 0.1445654 secs] [Times: user=0.18 sys=0.00, real=0.14 secs] [GC [ParNew (promotion failed): 17023K->18905K(19136K), 0.0203377 secs][CMS: 69017K->69015K(81152K), 0.1353857 secs] 86039K->86039K(100288K), [CMS Perm : 21285K->21285K(35724K)], 0.1558016 secs] [Times: user=0.19 sys=0.00, real=0.16 secs] [GC [ParNew (promotion failed): 17023K->18700K(19136K), 0.0201951 secs][CMS: 69017K->69015K(81152K), 0.1289750 secs] 86039K->86039K(100288K), [CMS Perm : 21285K->21285K(35724K)], 0.1492306 secs] [Times: user=0.19 sys=0.00, real=0.15 secs] [GC [ParNew (promotion failed): 17023K->18700K(19136K), 0.0206677 secs][CMS: 69017K->69015K(81152K), 0.1280734 secs] 86039K->86039K(100288K), [CMS Perm : 21285K->21285K(35724K)], 0.1488114 secs] [Times: user=0.18 sys=0.00, real=0.15 secs] [GC [ParNew (promotion failed): 17023K->18905K(19136K), 0.0150225 secs][CMS: 69017K->69015K(81152K), 0.1301056 secs] 86039K->86039K(100288K), [CMS Perm : 21285K->21285K(35724K)], 0.1451940 secs] [Times: user=0.19 sys=0.01, real=0.14 secs] The heap limit is intentionally left smallish and the routine where this happens is in fact computational (it does allocate sporadic objects but never releases them until finished). This behavior is easy to reproduce on my Mac (quad core), java version "1.6.0_31" Java(TM) SE Runtime Environment (build 1.6.0_31-b04-414-11M3626) Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01-414, mixed mode) I read a bit about the nature of "promotion failed" and it's clear to me (or so I think) why this is happening here. My questions are: 1) why isn't OOM being triggered by gc overhead limit? 
It should easily be falling within the default thresholds, 2) is there anything one can do to prevent situations like the above (other than manually fiddling with limits)? Thanks in advance for any pointers and feedback, Dawid From ysr1729 at gmail.com Wed Apr 11 11:24:05 2012 From: ysr1729 at gmail.com (Srinivas Ramakrishna) Date: Wed, 11 Apr 2012 11:24:05 -0700 Subject: ParNew promotion failed, no expected OOM. In-Reply-To: References: Message-ID: I believe this is missing the "gc overhead" threshold for the space limit. As I have commented in the past, I think the GC overhead limit should consider not just the space free in the whole heap, but rather the difference between the old gen capacity and the sum of the space used in the young gen and the old gen after a major GC has completed, as a percentage of the old gen capacity. It almost seems as though you have a largish object in the young gen which will not fit in the space free in the old gen, so it will never be promoted unless sufficient space clears up in the old gen, and from what you are describing, that won't happen until your program terminates its computation. I think we need to fix the space criteria for the overhead limit to deal gracefully with these kinds of situations. On an unrelated note, for such a small heap, you should probably use ParallelOldGC rather than CMS, but I realize that you didn't explicitly ask for CMS; the Mac just gave it to you because that's the default. -- ramki On Wed, Apr 11, 2012 at 7:24 AM, Dawid Weiss wrote: > Hi there, > > We are measuring certain aspects of our algorithm with a test suite > which attempts to run close to the physical heap's maximum size. We do > it by doing a form of binary search based on the size of data passed > to the algorithm, where the lower bound is always "succeeded without > an OOM" and the upper bound is "threw an OOM". This works nice but > occasionally we experience an effective deadlock in which full GCs are > repeatedly invoked, the application makes progress but overall it's > several orders of magnitude slower than usual (hours instead of > seconds).
> > GC logs look like this: > > [GC [ParNew (promotion failed): 17023K->18905K(19136K), 0.0220371 > secs][CMS: 69016K->69014K(81152K), 0.1370901 secs] > 86038K->86038K(100288K), [CMS Perm : 21285K->21285K(35724K)], > 0.1591765 secs] [Times: user=0.20 sys=0.00, real=0.16 secs] > [GC [ParNew (promotion failed): 17023K->18700K(19136K), 0.0170617 > secs][CMS: 69016K->69014K(81152K), 0.1235417 secs] > 86038K->86038K(100288K), [CMS Perm : 21285K->21285K(35724K)], > 0.1406872 secs] [Times: user=0.18 sys=0.00, real=0.14 secs] > [GC [ParNew (promotion failed): 17023K->18700K(19136K), 0.0191855 > secs][CMS: 69016K->69014K(81152K), 0.1296462 secs] > 86038K->86038K(100288K), [CMS Perm : 21285K->21285K(35724K)], > 0.1488816 secs] [Times: user=0.18 sys=0.00, real=0.15 secs] > [GC [ParNew (promotion failed): 17023K->18700K(19136K), 0.0232418 > secs][CMS: 69016K->69014K(81152K), 0.1300695 secs] > 86038K->86038K(100288K), [CMS Perm : 21285K->21285K(35724K)], > 0.1533590 secs] [Times: user=0.20 sys=0.00, real=0.15 secs] > [GC [ParNew (promotion failed): 17023K->18905K(19136K), 0.0190998 > secs][CMS: 69016K->69014K(81152K), 0.1319668 secs] > 86038K->86038K(100288K), [CMS Perm : 21285K->21285K(35724K)], > 0.1511436 secs] [Times: user=0.18 sys=0.00, real=0.15 secs] > [GC [ParNew (promotion failed): 17023K->18905K(19136K), 0.0168998 > secs][CMS: 69017K->69015K(81152K), 0.1359254 secs] > 86038K->86038K(100288K), [CMS Perm : 21285K->21285K(35724K)], > 0.1528776 secs] [Times: user=0.20 sys=0.01, real=0.16 secs] > [GC [ParNew (promotion failed): 17023K->18905K(19136K), 0.0214651 > secs][CMS: 69017K->69015K(81152K), 0.1209494 secs] > 86039K->86039K(100288K), [CMS Perm : 21285K->21285K(35724K)], > 0.1424941 secs] [Times: user=0.18 sys=0.00, real=0.14 secs] > [GC [ParNew (promotion failed): 17023K->18700K(19136K), 0.0200897 > secs][CMS: 69017K->69015K(81152K), 0.1244227 secs] > 86039K->86039K(100288K), [CMS Perm : 21285K->21285K(35724K)], > 0.1445654 secs] [Times: user=0.18 sys=0.00, real=0.14 secs] > [GC [ParNew (promotion failed): 17023K->18905K(19136K), 0.0203377 > secs][CMS: 69017K->69015K(81152K), 0.1353857 secs] > 86039K->86039K(100288K), [CMS Perm : 21285K->21285K(35724K)], > 0.1558016 secs] [Times: user=0.19 sys=0.00, real=0.16 secs] > [GC [ParNew (promotion failed): 17023K->18700K(19136K), 0.0201951 > secs][CMS: 69017K->69015K(81152K), 0.1289750 secs] > 86039K->86039K(100288K), [CMS Perm : 21285K->21285K(35724K)], > 0.1492306 secs] [Times: user=0.19 sys=0.00, real=0.15 secs] > [GC [ParNew (promotion failed): 17023K->18700K(19136K), 0.0206677 > secs][CMS: 69017K->69015K(81152K), 0.1280734 secs] > 86039K->86039K(100288K), [CMS Perm : 21285K->21285K(35724K)], > 0.1488114 secs] [Times: user=0.18 sys=0.00, real=0.15 secs] > [GC [ParNew (promotion failed): 17023K->18905K(19136K), 0.0150225 > secs][CMS: 69017K->69015K(81152K), 0.1301056 secs] > 86039K->86039K(100288K), [CMS Perm : 21285K->21285K(35724K)], > 0.1451940 secs] [Times: user=0.19 sys=0.01, real=0.14 secs] > > The heap limit is intentionally left smallish and the routine where > this happens is in fact computational (it does allocate sporadic > objects but never releases them until finished). > > This behavior is easy to reproduce on my Mac (quad core), > > java version "1.6.0_31" > Java(TM) SE Runtime Environment (build 1.6.0_31-b04-414-11M3626) > Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01-414, mixed mode) > > I read a bit about the nature of "promotion failed" and it's clear to > me (or so I think) why this is happening here. 
My questions are: > > 1) why isn't OOM being triggered by gc overhead limit? It should > easily be falling within the default thresholds, > 2) is there anything one can do to prevent situation like the above > (other than manually fiddling with limits)? > > Thanks in advance for any pointers and feedback, > > Dawid > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120411/42777acc/attachment.html From dawid.weiss at gmail.com Wed Apr 11 11:31:36 2012 From: dawid.weiss at gmail.com (Dawid Weiss) Date: Wed, 11 Apr 2012 20:31:36 +0200 Subject: ParNew promotion failed, no expected OOM. In-Reply-To: References: Message-ID: > GC has competed, as a percentage of the old gen capacity. It almost seems as > though you have a largish object in the young gen which will not fit in the space > free in the old gen, o it will never be promoted unless sufficient space clears up in the old Yes, this is exactly the case -- there is a recursive routine that builds a complex array-based data structure. The routine is recursive and I'm guessing the old gen is already filled up with other data so there is no space to fit the new array there. > I think we need to fix the space criteria for overhead limit to deal > gracefully with these kinds of situations. This would make sense even if it's really an outlier observation of mine (I'm specifically trying to reach heap boundary; not a typical use case I guess). > On an unrelated note, for such a small heap, you should probably use > ParallelOldGC rather than CMS, but I realize that you didn't explicitly ask for CMS, the mac just > gave it to you because that's the default. This happened on a mac and on ubuntu linux as well, but it's indeed of no relevance here because it's the default setting and this is what is worrying. I also figured that switching the garbage collector will be a temporary solution (I used the good old serial gc since I don't care about timings here). Thanks for confirming my suspicions. Dawid From jon.masamitsu at oracle.com Wed Apr 11 11:41:35 2012 From: jon.masamitsu at oracle.com (Jon Masamitsu) Date: Wed, 11 Apr 2012 11:41:35 -0700 Subject: ParNew promotion failed, no expected OOM. In-Reply-To: References: Message-ID: <4F85D05F.5010907@oracle.com> Dawid, I haven't look at your numbers but the OOM due to the GC overhead is thrown very conservatively. In addition to spending too much time doing GC, the policy looks at how much free space is available in the heap. It may be that there is enough free space in the heap such that the policy does not want to trigger an OOM. You see the "promotion failure" message when the GC policy thinks there is enough space in the old gen to support a young collection. It's supposed to be the exception case and I wonder a bit why you see "promotion failure" messages repeatedly instead of just seeing "Full collections" but I can see how the policy could get stuck in a situation where it keeps thinking there is enough space in the old gen but in the end there isn't. Anyway those are basically Full collections. Jon On 04/11/12 07:24, Dawid Weiss wrote: > Hi there, > > We are measuring certain aspects of our algorithm with a test suite > which attempts to run close to the physical heap's maximum size. 
We do > it by doing a form of binary search based on the size of data passed > to the algorithm, where the lower bound is always "succeeded without > an OOM" and the upper bound is "threw an OOM". This works nice but > occasionally we experience an effective deadlock in which full GCs are > repeatedly invoked, the application makes progress but overall it's > several orders of magnitude slower than usual (hours instead of > seconds). > > GC logs look like this: > > [GC [ParNew (promotion failed): 17023K->18905K(19136K), 0.0220371 > secs][CMS: 69016K->69014K(81152K), 0.1370901 secs] > 86038K->86038K(100288K), [CMS Perm : 21285K->21285K(35724K)], > 0.1591765 secs] [Times: user=0.20 sys=0.00, real=0.16 secs] > [GC [ParNew (promotion failed): 17023K->18700K(19136K), 0.0170617 > secs][CMS: 69016K->69014K(81152K), 0.1235417 secs] > 86038K->86038K(100288K), [CMS Perm : 21285K->21285K(35724K)], > 0.1406872 secs] [Times: user=0.18 sys=0.00, real=0.14 secs] > [GC [ParNew (promotion failed): 17023K->18700K(19136K), 0.0191855 > secs][CMS: 69016K->69014K(81152K), 0.1296462 secs] > 86038K->86038K(100288K), [CMS Perm : 21285K->21285K(35724K)], > 0.1488816 secs] [Times: user=0.18 sys=0.00, real=0.15 secs] > [GC [ParNew (promotion failed): 17023K->18700K(19136K), 0.0232418 > secs][CMS: 69016K->69014K(81152K), 0.1300695 secs] > 86038K->86038K(100288K), [CMS Perm : 21285K->21285K(35724K)], > 0.1533590 secs] [Times: user=0.20 sys=0.00, real=0.15 secs] > [GC [ParNew (promotion failed): 17023K->18905K(19136K), 0.0190998 > secs][CMS: 69016K->69014K(81152K), 0.1319668 secs] > 86038K->86038K(100288K), [CMS Perm : 21285K->21285K(35724K)], > 0.1511436 secs] [Times: user=0.18 sys=0.00, real=0.15 secs] > [GC [ParNew (promotion failed): 17023K->18905K(19136K), 0.0168998 > secs][CMS: 69017K->69015K(81152K), 0.1359254 secs] > 86038K->86038K(100288K), [CMS Perm : 21285K->21285K(35724K)], > 0.1528776 secs] [Times: user=0.20 sys=0.01, real=0.16 secs] > [GC [ParNew (promotion failed): 17023K->18905K(19136K), 0.0214651 > secs][CMS: 69017K->69015K(81152K), 0.1209494 secs] > 86039K->86039K(100288K), [CMS Perm : 21285K->21285K(35724K)], > 0.1424941 secs] [Times: user=0.18 sys=0.00, real=0.14 secs] > [GC [ParNew (promotion failed): 17023K->18700K(19136K), 0.0200897 > secs][CMS: 69017K->69015K(81152K), 0.1244227 secs] > 86039K->86039K(100288K), [CMS Perm : 21285K->21285K(35724K)], > 0.1445654 secs] [Times: user=0.18 sys=0.00, real=0.14 secs] > [GC [ParNew (promotion failed): 17023K->18905K(19136K), 0.0203377 > secs][CMS: 69017K->69015K(81152K), 0.1353857 secs] > 86039K->86039K(100288K), [CMS Perm : 21285K->21285K(35724K)], > 0.1558016 secs] [Times: user=0.19 sys=0.00, real=0.16 secs] > [GC [ParNew (promotion failed): 17023K->18700K(19136K), 0.0201951 > secs][CMS: 69017K->69015K(81152K), 0.1289750 secs] > 86039K->86039K(100288K), [CMS Perm : 21285K->21285K(35724K)], > 0.1492306 secs] [Times: user=0.19 sys=0.00, real=0.15 secs] > [GC [ParNew (promotion failed): 17023K->18700K(19136K), 0.0206677 > secs][CMS: 69017K->69015K(81152K), 0.1280734 secs] > 86039K->86039K(100288K), [CMS Perm : 21285K->21285K(35724K)], > 0.1488114 secs] [Times: user=0.18 sys=0.00, real=0.15 secs] > [GC [ParNew (promotion failed): 17023K->18905K(19136K), 0.0150225 > secs][CMS: 69017K->69015K(81152K), 0.1301056 secs] > 86039K->86039K(100288K), [CMS Perm : 21285K->21285K(35724K)], > 0.1451940 secs] [Times: user=0.19 sys=0.01, real=0.14 secs] > > The heap limit is intentionally left smallish and the routine where > this happens is in fact computational (it does 
allocate sporadic > objects but never releases them until finished). > > This behavior is easy to reproduce on my Mac (quad core), > > java version "1.6.0_31" > Java(TM) SE Runtime Environment (build 1.6.0_31-b04-414-11M3626) > Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01-414, mixed mode) > > I read a bit about the nature of "promotion failed" and it's clear to > me (or so I think) why this is happening here. My questions are: > > 1) why isn't OOM being triggered by gc overhead limit? It should > easily be falling within the default thresholds, > 2) is there anything one can do to prevent situation like the above > (other than manually fiddling with limits)? > > Thanks in advance for any pointers and feedback, > > Dawid > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use From dawid.weiss at gmail.com Wed Apr 11 11:53:00 2012 From: dawid.weiss at gmail.com (Dawid Weiss) Date: Wed, 11 Apr 2012 20:53:00 +0200 Subject: ParNew promotion failed, no expected OOM. In-Reply-To: <4F85D05F.5010907@oracle.com> References: <4F85D05F.5010907@oracle.com> Message-ID: > I haven't look at your numbers but the OOM due to the > GC overhead is thrown very conservatively. In addition to I realize this, but this seems like a good example of when gc overhead should fire... or so I think. There doesn't seem to be any space left at all -- 69016K->69014K(81152K) I realize these are full GCs because that's what -verbose:gc reports (I included the details because I asked for them but otherwise what you see is just FullGCs and no progress from the application itself). What's puzzling to me is that this routine only allocates memory (hard refs, there is nothing to collect) but the garbage collector _does_ drop around 2 KB on every full GC... Also, this routine is normally blazing fast and should either complete or OOM very quickly but instead stalls as if 99% of the time was spent doing full collections. I really cannot explain this. Is there any way to see which objects get _dropped_ on full GC runs? I'm curious what these dropped objects are. Dawid > spending too much time doing GC, the policy looks at how > much free space is available in the heap. It may be that > there is enough free space in the heap such that the policy > does not want to trigger an OOM. > > You see the "promotion failure" message when the GC > policy thinks there is enough space in the old gen to > support a young collection. It's supposed to be the > exception case and I wonder a bit why you see > "promotion failure" messages repeatedly instead of > just seeing "Full collections" but I can see how the > policy could get stuck in a situation where it keeps > thinking there is enough space in the old gen but > in the end there isn't. Anyway those are basically > Full collections. > > Jon > > On 04/11/12 07:24, Dawid Weiss wrote: >> Hi there, >> >> We are measuring certain aspects of our algorithm with a test suite >> which attempts to run close to the physical heap's maximum size. We do >> it by doing a form of binary search based on the size of data passed >> to the algorithm, where the lower bound is always "succeeded without >> an OOM" and the upper bound is "threw an OOM".
This works nice but >> occasionally we experience an effective deadlock in which full GCs are >> repeatedly invoked, the application makes progress but overall it's >> several orders of magnitude slower than usual (hours instead of >> seconds). >> >> GC logs look like this: >> >> [GC [ParNew (promotion failed): 17023K->18905K(19136K), 0.0220371 >> secs][CMS: 69016K->69014K(81152K), 0.1370901 secs] >> 86038K->86038K(100288K), [CMS Perm : 21285K->21285K(35724K)], >> 0.1591765 secs] [Times: user=0.20 sys=0.00, real=0.16 secs] >> [GC [ParNew (promotion failed): 17023K->18700K(19136K), 0.0170617 >> secs][CMS: 69016K->69014K(81152K), 0.1235417 secs] >> 86038K->86038K(100288K), [CMS Perm : 21285K->21285K(35724K)], >> 0.1406872 secs] [Times: user=0.18 sys=0.00, real=0.14 secs] >> [GC [ParNew (promotion failed): 17023K->18700K(19136K), 0.0191855 >> secs][CMS: 69016K->69014K(81152K), 0.1296462 secs] >> 86038K->86038K(100288K), [CMS Perm : 21285K->21285K(35724K)], >> 0.1488816 secs] [Times: user=0.18 sys=0.00, real=0.15 secs] >> [GC [ParNew (promotion failed): 17023K->18700K(19136K), 0.0232418 >> secs][CMS: 69016K->69014K(81152K), 0.1300695 secs] >> 86038K->86038K(100288K), [CMS Perm : 21285K->21285K(35724K)], >> 0.1533590 secs] [Times: user=0.20 sys=0.00, real=0.15 secs] >> [GC [ParNew (promotion failed): 17023K->18905K(19136K), 0.0190998 >> secs][CMS: 69016K->69014K(81152K), 0.1319668 secs] >> 86038K->86038K(100288K), [CMS Perm : 21285K->21285K(35724K)], >> 0.1511436 secs] [Times: user=0.18 sys=0.00, real=0.15 secs] >> [GC [ParNew (promotion failed): 17023K->18905K(19136K), 0.0168998 >> secs][CMS: 69017K->69015K(81152K), 0.1359254 secs] >> 86038K->86038K(100288K), [CMS Perm : 21285K->21285K(35724K)], >> 0.1528776 secs] [Times: user=0.20 sys=0.01, real=0.16 secs] >> [GC [ParNew (promotion failed): 17023K->18905K(19136K), 0.0214651 >> secs][CMS: 69017K->69015K(81152K), 0.1209494 secs] >> 86039K->86039K(100288K), [CMS Perm : 21285K->21285K(35724K)], >> 0.1424941 secs] [Times: user=0.18 sys=0.00, real=0.14 secs] >> [GC [ParNew (promotion failed): 17023K->18700K(19136K), 0.0200897 >> secs][CMS: 69017K->69015K(81152K), 0.1244227 secs] >> 86039K->86039K(100288K), [CMS Perm : 21285K->21285K(35724K)], >> 0.1445654 secs] [Times: user=0.18 sys=0.00, real=0.14 secs] >> [GC [ParNew (promotion failed): 17023K->18905K(19136K), 0.0203377 >> secs][CMS: 69017K->69015K(81152K), 0.1353857 secs] >> 86039K->86039K(100288K), [CMS Perm : 21285K->21285K(35724K)], >> 0.1558016 secs] [Times: user=0.19 sys=0.00, real=0.16 secs] >> [GC [ParNew (promotion failed): 17023K->18700K(19136K), 0.0201951 >> secs][CMS: 69017K->69015K(81152K), 0.1289750 secs] >> 86039K->86039K(100288K), [CMS Perm : 21285K->21285K(35724K)], >> 0.1492306 secs] [Times: user=0.19 sys=0.00, real=0.15 secs] >> [GC [ParNew (promotion failed): 17023K->18700K(19136K), 0.0206677 >> secs][CMS: 69017K->69015K(81152K), 0.1280734 secs] >> 86039K->86039K(100288K), [CMS Perm : 21285K->21285K(35724K)], >> 0.1488114 secs] [Times: user=0.18 sys=0.00, real=0.15 secs] >> [GC [ParNew (promotion failed): 17023K->18905K(19136K), 0.0150225 >> secs][CMS: 69017K->69015K(81152K), 0.1301056 secs] >> 86039K->86039K(100288K), [CMS Perm : 21285K->21285K(35724K)], >> 0.1451940 secs] [Times: user=0.19 sys=0.01, real=0.14 secs] >> >> The heap limit is intentionally left smallish and the routine where >> this happens is in fact computational (it does allocate sporadic >> objects but never releases them until finished). 
>> >> This behavior is easy to reproduce on my Mac (quad core), >> >> java version "1.6.0_31" >> Java(TM) SE Runtime Environment (build 1.6.0_31-b04-414-11M3626) >> Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01-414, mixed mode) >> >> I read a bit about the nature of "promotion failed" and it's clear to >> me (or so I think) why this is happening here. My questions are: >> >> 1) why isn't OOM being triggered by gc overhead limit? It should >> easily be falling within the default thresholds, >> 2) is there anything one can do to prevent situation like the above >> (other than manually fiddling with limits)? >> >> Thanks in advance for any pointers and feedback, >> >> Dawid >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use From jon.masamitsu at oracle.com Wed Apr 11 23:02:18 2012 From: jon.masamitsu at oracle.com (Jon Masamitsu) Date: Wed, 11 Apr 2012 23:02:18 -0700 Subject: ParNew promotion failed, no expected OOM. In-Reply-To: References: <4F85D05F.5010907@oracle.com> Message-ID: <4F866FEA.7010607@oracle.com> Dawid, I haven't used these myself but you can try the flags PrintClassHistogramBeforeFullGC PrintClassHistogramAfterFullGC and see what gets collected. Jon On 4/11/2012 11:53 AM, Dawid Weiss wrote: >> I haven't look at your numbers but the OOM due to the >> GC overhead is thrown very conservatively. In addition to > I realize this but this seems like a good example of when gc overhead > should fire... or so I > think. There doesn't seem to be any space left at all -- > > 69016K->69014K(81152K) > > I realize these are full GCs because that's what -verbose:gc reports > (I included the details because I asked for them but otherwise what > you see is just FullGCs and no progress from the application itself). > > What's puzzling to me is that this routine only allocates memory (hard > refs, there is nothing to collect) but the garbace collector _does_ > drop around 2kb on every full GC... Also, this routine is normally > blazing fast and should either complete or OOM very quickly but > instead stalls as if 99% of the time was spent doing full collections. > I really cannot explain this. > > Is there any way to see which objects get _dropped_ on full GC runs? > I'm curious what these dropped objects are. > > Dawid > > >> spending too much time doing GC, the policy looks at how >> much free space is available in the heap. It may be that >> there is enough free space in the heap such that the policy >> does not want to trigger an OOM. >> >> You see the "promotion failure" message when the GC >> policy thinks there is enough space in the old gen to >> support a young collection. It's supposed to be the >> exception case and I wonder a bit why you see >> "promotion failure" messages repeatedly instead of >> just seeing "Full collections" but I can see how the >> policy could get stuck in a situation where it keeps >> thinking there is enough space in the old gen but >> in the end there isn't. Anyway those are basically >> Full collections. >> >> Jon >> >> On 04/11/12 07:24, Dawid Weiss wrote: >>> Hi there, >>> >>> We are measuring certain aspects of our algorithm with a test suite >>> which attempts to run close to the physical heap's maximum size. 
We do >>> it by doing a form of binary search based on the size of data passed >>> to the algorithm, where the lower bound is always "succeeded without >>> an OOM" and the upper bound is "threw an OOM". This works nice but >>> occasionally we experience an effective deadlock in which full GCs are >>> repeatedly invoked, the application makes progress but overall it's >>> several orders of magnitude slower than usual (hours instead of >>> seconds). >>> >>> GC logs look like this: >>> >>> [GC [ParNew (promotion failed): 17023K->18905K(19136K), 0.0220371 >>> secs][CMS: 69016K->69014K(81152K), 0.1370901 secs] >>> 86038K->86038K(100288K), [CMS Perm : 21285K->21285K(35724K)], >>> 0.1591765 secs] [Times: user=0.20 sys=0.00, real=0.16 secs] >>> [GC [ParNew (promotion failed): 17023K->18700K(19136K), 0.0170617 >>> secs][CMS: 69016K->69014K(81152K), 0.1235417 secs] >>> 86038K->86038K(100288K), [CMS Perm : 21285K->21285K(35724K)], >>> 0.1406872 secs] [Times: user=0.18 sys=0.00, real=0.14 secs] >>> [GC [ParNew (promotion failed): 17023K->18700K(19136K), 0.0191855 >>> secs][CMS: 69016K->69014K(81152K), 0.1296462 secs] >>> 86038K->86038K(100288K), [CMS Perm : 21285K->21285K(35724K)], >>> 0.1488816 secs] [Times: user=0.18 sys=0.00, real=0.15 secs] >>> [GC [ParNew (promotion failed): 17023K->18700K(19136K), 0.0232418 >>> secs][CMS: 69016K->69014K(81152K), 0.1300695 secs] >>> 86038K->86038K(100288K), [CMS Perm : 21285K->21285K(35724K)], >>> 0.1533590 secs] [Times: user=0.20 sys=0.00, real=0.15 secs] >>> [GC [ParNew (promotion failed): 17023K->18905K(19136K), 0.0190998 >>> secs][CMS: 69016K->69014K(81152K), 0.1319668 secs] >>> 86038K->86038K(100288K), [CMS Perm : 21285K->21285K(35724K)], >>> 0.1511436 secs] [Times: user=0.18 sys=0.00, real=0.15 secs] >>> [GC [ParNew (promotion failed): 17023K->18905K(19136K), 0.0168998 >>> secs][CMS: 69017K->69015K(81152K), 0.1359254 secs] >>> 86038K->86038K(100288K), [CMS Perm : 21285K->21285K(35724K)], >>> 0.1528776 secs] [Times: user=0.20 sys=0.01, real=0.16 secs] >>> [GC [ParNew (promotion failed): 17023K->18905K(19136K), 0.0214651 >>> secs][CMS: 69017K->69015K(81152K), 0.1209494 secs] >>> 86039K->86039K(100288K), [CMS Perm : 21285K->21285K(35724K)], >>> 0.1424941 secs] [Times: user=0.18 sys=0.00, real=0.14 secs] >>> [GC [ParNew (promotion failed): 17023K->18700K(19136K), 0.0200897 >>> secs][CMS: 69017K->69015K(81152K), 0.1244227 secs] >>> 86039K->86039K(100288K), [CMS Perm : 21285K->21285K(35724K)], >>> 0.1445654 secs] [Times: user=0.18 sys=0.00, real=0.14 secs] >>> [GC [ParNew (promotion failed): 17023K->18905K(19136K), 0.0203377 >>> secs][CMS: 69017K->69015K(81152K), 0.1353857 secs] >>> 86039K->86039K(100288K), [CMS Perm : 21285K->21285K(35724K)], >>> 0.1558016 secs] [Times: user=0.19 sys=0.00, real=0.16 secs] >>> [GC [ParNew (promotion failed): 17023K->18700K(19136K), 0.0201951 >>> secs][CMS: 69017K->69015K(81152K), 0.1289750 secs] >>> 86039K->86039K(100288K), [CMS Perm : 21285K->21285K(35724K)], >>> 0.1492306 secs] [Times: user=0.19 sys=0.00, real=0.15 secs] >>> [GC [ParNew (promotion failed): 17023K->18700K(19136K), 0.0206677 >>> secs][CMS: 69017K->69015K(81152K), 0.1280734 secs] >>> 86039K->86039K(100288K), [CMS Perm : 21285K->21285K(35724K)], >>> 0.1488114 secs] [Times: user=0.18 sys=0.00, real=0.15 secs] >>> [GC [ParNew (promotion failed): 17023K->18905K(19136K), 0.0150225 >>> secs][CMS: 69017K->69015K(81152K), 0.1301056 secs] >>> 86039K->86039K(100288K), [CMS Perm : 21285K->21285K(35724K)], >>> 0.1451940 secs] [Times: user=0.19 sys=0.01, real=0.14 secs] >>> 
>>> The heap limit is intentionally left smallish and the routine where >>> this happens is in fact computational (it does allocate sporadic >>> objects but never releases them until finished). >>> >>> This behavior is easy to reproduce on my Mac (quad core), >>> >>> java version "1.6.0_31" >>> Java(TM) SE Runtime Environment (build 1.6.0_31-b04-414-11M3626) >>> Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01-414, mixed mode) >>> >>> I read a bit about the nature of "promotion failed" and it's clear to >>> me (or so I think) why this is happening here. My questions are: >>> >>> 1) why isn't OOM being triggered by gc overhead limit? It should >>> easily be falling within the default thresholds, >>> 2) is there anything one can do to prevent situation like the above >>> (other than manually fiddling with limits)? >>> >>> Thanks in advance for any pointers and feedback, >>> >>> Dawid >>> _______________________________________________ >>> hotspot-gc-use mailing list >>> hotspot-gc-use at openjdk.java.net >>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use From jon.masamitsu at oracle.com Wed Apr 11 23:28:07 2012 From: jon.masamitsu at oracle.com (Jon Masamitsu) Date: Wed, 11 Apr 2012 23:28:07 -0700 Subject: ParNew promotion failed, no expected OOM. In-Reply-To: References: Message-ID: <4F8675F7.9040205@oracle.com> Ramki, I never want to throw an OOM and then have to argue about whether the OOM was thrown prematurely. That would be a bug. As a consequence of such an approach, I accept that there will be times when it would have been more helpful if the OOM was thrown sooner. That might be a poorer quality of service but not a bug (I think). Jon On 4/11/2012 11:24 AM, Srinivas Ramakrishna wrote: > I believe this is missing the "gc overhead" threshold for the space limit. > As I have commented in the past, i think the GC overhead limit should > consider > not just the space free in the whole heap, but rather the difference > between the old gen > capacity and the sum of the space used in the young gen and the old gen > after a major > GC has competed, as a percentage of the old gen capacity. It almost seems > as though > you have a largish object in the young gen which will not fit in the space > free in the old gen, > o it will never be promoted unless sufficient space clears up in the old > gen, and from what > you are describing, that won't happen until your program terminates its > computation. > > I think we need to fix the space criteria for overhead limit to deal > gracefully > with these kinds of situations. > > On an unrelated note, for such a small heap, you should probably use > ParallelOldGC rather > than CMS, but I realize that you didn't explicitly ask for CMS, the mac > just gave it to you > because that's the default. > > -- ramki > > On Wed, Apr 11, 2012 at 7:24 AM, Dawid Weiss wrote: > >> Hi there, >> >> We are measuring certain aspects of our algorithm with a test suite >> which attempts to run close to the physical heap's maximum size. We do >> it by doing a form of binary search based on the size of data passed >> to the algorithm, where the lower bound is always "succeeded without >> an OOM" and the upper bound is "threw an OOM". 
This works nice but >> occasionally we experience an effective deadlock in which full GCs are >> repeatedly invoked, the application makes progress but overall it's >> several orders of magnitude slower than usual (hours instead of >> seconds). >> >> GC logs look like this: >> >> [GC [ParNew (promotion failed): 17023K->18905K(19136K), 0.0220371 >> secs][CMS: 69016K->69014K(81152K), 0.1370901 secs] >> 86038K->86038K(100288K), [CMS Perm : 21285K->21285K(35724K)], >> 0.1591765 secs] [Times: user=0.20 sys=0.00, real=0.16 secs] >> [GC [ParNew (promotion failed): 17023K->18700K(19136K), 0.0170617 >> secs][CMS: 69016K->69014K(81152K), 0.1235417 secs] >> 86038K->86038K(100288K), [CMS Perm : 21285K->21285K(35724K)], >> 0.1406872 secs] [Times: user=0.18 sys=0.00, real=0.14 secs] >> [GC [ParNew (promotion failed): 17023K->18700K(19136K), 0.0191855 >> secs][CMS: 69016K->69014K(81152K), 0.1296462 secs] >> 86038K->86038K(100288K), [CMS Perm : 21285K->21285K(35724K)], >> 0.1488816 secs] [Times: user=0.18 sys=0.00, real=0.15 secs] >> [GC [ParNew (promotion failed): 17023K->18700K(19136K), 0.0232418 >> secs][CMS: 69016K->69014K(81152K), 0.1300695 secs] >> 86038K->86038K(100288K), [CMS Perm : 21285K->21285K(35724K)], >> 0.1533590 secs] [Times: user=0.20 sys=0.00, real=0.15 secs] >> [GC [ParNew (promotion failed): 17023K->18905K(19136K), 0.0190998 >> secs][CMS: 69016K->69014K(81152K), 0.1319668 secs] >> 86038K->86038K(100288K), [CMS Perm : 21285K->21285K(35724K)], >> 0.1511436 secs] [Times: user=0.18 sys=0.00, real=0.15 secs] >> [GC [ParNew (promotion failed): 17023K->18905K(19136K), 0.0168998 >> secs][CMS: 69017K->69015K(81152K), 0.1359254 secs] >> 86038K->86038K(100288K), [CMS Perm : 21285K->21285K(35724K)], >> 0.1528776 secs] [Times: user=0.20 sys=0.01, real=0.16 secs] >> [GC [ParNew (promotion failed): 17023K->18905K(19136K), 0.0214651 >> secs][CMS: 69017K->69015K(81152K), 0.1209494 secs] >> 86039K->86039K(100288K), [CMS Perm : 21285K->21285K(35724K)], >> 0.1424941 secs] [Times: user=0.18 sys=0.00, real=0.14 secs] >> [GC [ParNew (promotion failed): 17023K->18700K(19136K), 0.0200897 >> secs][CMS: 69017K->69015K(81152K), 0.1244227 secs] >> 86039K->86039K(100288K), [CMS Perm : 21285K->21285K(35724K)], >> 0.1445654 secs] [Times: user=0.18 sys=0.00, real=0.14 secs] >> [GC [ParNew (promotion failed): 17023K->18905K(19136K), 0.0203377 >> secs][CMS: 69017K->69015K(81152K), 0.1353857 secs] >> 86039K->86039K(100288K), [CMS Perm : 21285K->21285K(35724K)], >> 0.1558016 secs] [Times: user=0.19 sys=0.00, real=0.16 secs] >> [GC [ParNew (promotion failed): 17023K->18700K(19136K), 0.0201951 >> secs][CMS: 69017K->69015K(81152K), 0.1289750 secs] >> 86039K->86039K(100288K), [CMS Perm : 21285K->21285K(35724K)], >> 0.1492306 secs] [Times: user=0.19 sys=0.00, real=0.15 secs] >> [GC [ParNew (promotion failed): 17023K->18700K(19136K), 0.0206677 >> secs][CMS: 69017K->69015K(81152K), 0.1280734 secs] >> 86039K->86039K(100288K), [CMS Perm : 21285K->21285K(35724K)], >> 0.1488114 secs] [Times: user=0.18 sys=0.00, real=0.15 secs] >> [GC [ParNew (promotion failed): 17023K->18905K(19136K), 0.0150225 >> secs][CMS: 69017K->69015K(81152K), 0.1301056 secs] >> 86039K->86039K(100288K), [CMS Perm : 21285K->21285K(35724K)], >> 0.1451940 secs] [Times: user=0.19 sys=0.01, real=0.14 secs] >> >> The heap limit is intentionally left smallish and the routine where >> this happens is in fact computational (it does allocate sporadic >> objects but never releases them until finished). 
>> >> This behavior is easy to reproduce on my Mac (quad core), >> >> java version "1.6.0_31" >> Java(TM) SE Runtime Environment (build 1.6.0_31-b04-414-11M3626) >> Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01-414, mixed mode) >> >> I read a bit about the nature of "promotion failed" and it's clear to >> me (or so I think) why this is happening here. My questions are: >> >> 1) why isn't OOM being triggered by gc overhead limit? It should >> easily be falling within the default thresholds, >> 2) is there anything one can do to prevent situation like the above >> (other than manually fiddling with limits)? >> >> Thanks in advance for any pointers and feedback, >> >> Dawid >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120411/73ae2f7f/attachment.html From dawid.weiss at gmail.com Wed Apr 11 23:42:21 2012 From: dawid.weiss at gmail.com (Dawid Weiss) Date: Thu, 12 Apr 2012 08:42:21 +0200 Subject: ParNew promotion failed, no expected OOM. In-Reply-To: <4F8675F7.9040205@oracle.com> References: <4F8675F7.9040205@oracle.com> Message-ID: > I never want to throw an OOM and then have to argue about whether > the OOM was thrown prematurely. That would be a bug. As a consequence I agree the tradeoff here is very subtle and there is probably no optimal setting. I'll dig deeper in a spare minute and see if I can reproduce this on a simpler example. Dawid From tanman12345 at yahoo.com Thu Apr 12 09:27:13 2012 From: tanman12345 at yahoo.com (Erwin) Date: Thu, 12 Apr 2012 09:27:13 -0700 (PDT) Subject: Need help about CMS Failure and ParNew failure In-Reply-To: <4F8675F7.9040205@oracle.com> References: <4F8675F7.9040205@oracle.com> Message-ID: <1334248033.11363.YahooMailNeo@web111103.mail.gq1.yahoo.com> Hello, I'm not an expert when it comes to analyzing GC output and was wondering if you guys could assist? We're using Solaris 10 10/08 s10s_u6wos_07b SPARC, with WAS NDE 7.0.0.9. After a restart of our JVMs, GC seems to be fine. However, after about a week, we start seeing failures in the GC log. We're getting ParNew promotion failures and concurrent mode failures. Our JVM configurations are below: Min heap - 4096 Max heap - 6016 JVM Arguments -server -Xmn1000m -XX:PermSize=512m -XX:+UseConcMarkSweepGC -XX:+HeapDumpOnOutOfMemoryError -DUseSunHttpHandler=true -Djavax.xml.soap.MessageFactory=weblogic.xml.saaj.MessageFactoryImpl -Doracle.jdbc.V8Compatible=true -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSClassUnloadingEnabled -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -XX:-TraceClassUnloading -XX:+PrintHeapAtGC -XX:MaxPermSize=694m -XX:+DisableExplicitGC -XX:+CMSParallelRemarkEnabled I'm attaching the ParNew failure as well as the CMS failure files. Hope they attach; there are 2 files in total. In case they don't, see below: the 1st sample is the ParNew failure, the 2nd is the CMS failure. - Thanks, Erwin ParNew failure sample: {Heap before GC invocations=7800 (full 529): ?par new generation?? total 921600K, used 530694K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) ? eden space 819200K,? 52% used [0xfffffffdd0000000, 0xfffffffdea241bb8, 0xfffffffe02000000) ?
from space 102400K, 100% used [0xfffffffe02000000, 0xfffffffe08400000, 0xfffffffe08400000) ? to?? space 102400K,?? 0% used [0xfffffffe08400000, 0xfffffffe08400000, 0xfffffffe0e800000) ?concurrent mark-sweep generation total 4902464K, used 2682365K [0xfffffffe0e800000, 0xffffffff39b90000, 0xffffffff48000000) ?concurrent-mark-sweep perm gen total 524288K, used 238782K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) 552370.958: [GC 552370.958: [ParNew: 530694K->93237K(921600K), 0.1858500 secs] 3213060K->2832744K(5824064K), 0.1862466 secs] [Times: user=1.21 sys=0.08, real=0.19 secs] Heap after GC invocations=7801 (full 529): ?par new generation?? total 921600K, used 93237K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, 0xfffffffe02000000) ? from space 102400K,? 91% used [0xfffffffe08400000, 0xfffffffe0df0d498, 0xfffffffe0e800000) ? to?? space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) ?concurrent mark-sweep generation total 4902464K, used 2739507K [0xfffffffe0e800000, 0xffffffff39b90000, 0xffffffff48000000) ?concurrent-mark-sweep perm gen total 524288K, used 238782K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) } {Heap before GC invocations=7801 (full 529): ?par new generation?? total 921600K, used 912377K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) ? eden space 819200K,? 99% used [0xfffffffdd0000000, 0xfffffffe01ff1320, 0xfffffffe02000000) ? from space 102400K,? 91% used [0xfffffffe08400000, 0xfffffffe0df0d498, 0xfffffffe0e800000) ? to?? space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) ?concurrent mark-sweep generation total 4902464K, used 2739507K [0xfffffffe0e800000, 0xffffffff39b90000, 0xffffffff48000000) ?concurrent-mark-sweep perm gen total 524288K, used 238795K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) 552372.849: [GC 552372.849: [ParNew (promotion failed): 912377K->869343K(921600K), 0.2641392 secs]552373.113: [CMS: 2791714K->3203612K(4902464K), 29.1902704 secs] 3651885K->3203612K(5824064K), [CMS Perm : 238795K->238246K(524288K)], 29.4609781 secs] [Times: user=30.05 sys=0.13, real=29.46 secs] Heap after GC invocations=7802 (full 530): ?par new generation?? total 921600K, used 0K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, 0xfffffffe02000000) ? from space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) ? to?? space 102400K,?? 0% used [0xfffffffe08400000, 0xfffffffe08400000, 0xfffffffe0e800000) ?concurrent mark-sweep generation total 5136384K, used 3203612K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) ?concurrent-mark-sweep perm gen total 524288K, used 238246K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) } {Heap before GC invocations=12696 (full 809): ?par new generation?? total 921600K, used 908565K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) ? eden space 819200K, 100% used [0xfffffffdd0000000, 0xfffffffe02000000, 0xfffffffe02000000) ? from space 102400K,? 87% used [0xfffffffe02000000, 0xfffffffe07745510, 0xfffffffe08400000) ? to?? space 102400K,?? 
0% used [0xfffffffe08400000, 0xfffffffe08400000, 0xfffffffe0e800000) ?concurrent mark-sweep generation total 5136384K, used 2696786K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) ?concurrent-mark-sweep perm gen total 524288K, used 241380K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) 980120.502: [GC 980120.502: [ParNew: 908565K->73974K(921600K), 0.1519646 secs] 3605352K->2817972K(6057984K), 0.1523927 secs] [Times: user=1.07 sys=0.03, real=0.15 secs] Heap after GC invocations=12697 (full 809): ?par new generation?? total 921600K, used 73974K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, 0xfffffffe02000000) ? from space 102400K,? 72% used [0xfffffffe08400000, 0xfffffffe0cc3d928, 0xfffffffe0e800000) ? to?? space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) ?concurrent mark-sweep generation total 5136384K, used 2743998K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) ?concurrent-mark-sweep perm gen total 524288K, used 241380K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) } {Heap before GC invocations=12697 (full 809): ?par new generation?? total 921600K, used 893174K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) ? eden space 819200K, 100% used [0xfffffffdd0000000, 0xfffffffe02000000, 0xfffffffe02000000) ? from space 102400K,? 72% used [0xfffffffe08400000, 0xfffffffe0cc3d928, 0xfffffffe0e800000) ? to?? space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) ?concurrent mark-sweep generation total 5136384K, used 2743998K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) ?concurrent-mark-sweep perm gen total 524288K, used 241411K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) 980130.777: [GC 980130.777: [ParNew (promotion failed): 893174K->913391K(921600K), 0.5914616 secs]980131.368: [CMS: 2778416K->2710999K(5136384K), 27.7981960 secs] 3637172K->2710999K(6057984K), [CMS Perm : 241411K->240494K(524288K)], 28.3902578 secs] [Times: user=29.37 sys=0.08, real=28.39 secs] Heap after GC invocations=12698 (full 810): ?par new generation?? total 921600K, used 0K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, 0xfffffffe02000000) ? from space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) ? to?? space 102400K,?? 0% used [0xfffffffe08400000, 0xfffffffe08400000, 0xfffffffe0e800000) ?concurrent mark-sweep generation total 5136384K, used 2710999K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) ?concurrent-mark-sweep perm gen total 524288K, used 240494K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) } {Heap before GC invocations=12698 (full 810): ?par new generation?? total 921600K, used 819200K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) ? eden space 819200K, 100% used [0xfffffffdd0000000, 0xfffffffe02000000, 0xfffffffe02000000) ? from space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) ? to?? space 102400K,?? 
0% used [0xfffffffe08400000, 0xfffffffe08400000, 0xfffffffe0e800000) ?concurrent mark-sweep generation total 5136384K, used 2710999K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) ?concurrent-mark-sweep perm gen total 524288K, used 240523K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) 980171.033: [GC 980171.033: [ParNew: 819200K->102400K(921600K), 0.2144047 secs] 3530199K->2857892K(6057984K), 0.2149864 secs] [Times: user=0.99 sys=0.03, real=0.22 secs] Heap after GC invocations=12699 (full 810): ?par new generation?? total 921600K, used 102400K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, 0xfffffffe02000000) ? from space 102400K, 100% used [0xfffffffe08400000, 0xfffffffe0e800000, 0xfffffffe0e800000) ? to?? space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) ?concurrent mark-sweep generation total 5136384K, used 2755492K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) ?concurrent-mark-sweep perm gen total 524288K, used 240523K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) } CMS Failure: {Heap before GC invocations=23856 (full 1462): ?par new generation?? total 921600K, used 921600K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) ? eden space 819200K, 100% used [0xfffffffdd0000000, 0xfffffffe02000000, 0xfffffffe02000000) ? from space 102400K, 100% used [0xfffffffe02000000, 0xfffffffe08400000, 0xfffffffe08400000) ? to?? space 102400K,?? 0% used [0xfffffffe08400000, 0xfffffffe08400000, 0xfffffffe0e800000) ?concurrent mark-sweep generation total 5136384K, used 3496920K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) ?concurrent-mark-sweep perm gen total 524288K, used 243562K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) 1761014.981: [GC 1761014.981: [ParNew: 921600K->102400K(921600K), 0.3004508 secs] 4418520K->3667695K(6057984K), 0.3008667 secs] [Times: user=1.69 sys=0.10, real=0.30 secs] Heap after GC invocations=23857 (full 1462): ?par new generation?? total 921600K, used 102400K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, 0xfffffffe02000000) ? from space 102400K, 100% used [0xfffffffe08400000, 0xfffffffe0e800000, 0xfffffffe0e800000) ? to?? space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) ?concurrent mark-sweep generation total 5136384K, used 3565295K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) ?concurrent-mark-sweep perm gen total 524288K, used 243562K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) } {Heap before GC invocations=23857 (full 1462): ?par new generation?? total 921600K, used 921600K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) ? eden space 819200K, 100% used [0xfffffffdd0000000, 0xfffffffe02000000, 0xfffffffe02000000) ? from space 102400K, 100% used [0xfffffffe08400000, 0xfffffffe0e800000, 0xfffffffe0e800000) ? to?? space 102400K,?? 
0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) ?concurrent mark-sweep generation total 5136384K, used 3565295K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) ?concurrent-mark-sweep perm gen total 524288K, used 243773K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) 1761036.997: [GC 1761036.998: [ParNew: 921600K->102400K(921600K), 0.4075457 secs] 4486895K->3763683K(6057984K), 0.4079591 secs] [Times: user=3.57 sys=0.80, real=0.41 secs] Heap after GC invocations=23858 (full 1462): ?par new generation?? total 921600K, used 102400K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, 0xfffffffe02000000) ? from space 102400K, 100% used [0xfffffffe02000000, 0xfffffffe08400000, 0xfffffffe08400000) ? to?? space 102400K,?? 0% used [0xfffffffe08400000, 0xfffffffe08400000, 0xfffffffe0e800000) ?concurrent mark-sweep generation total 5136384K, used 3661283K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) ?concurrent-mark-sweep perm gen total 524288K, used 243773K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) } 1761037.410: [GC [1 CMS-initial-mark: 3661283K(5136384K)] 3763683K(6057984K), 0.0883369 secs] [Times: user=0.09 sys=0.00, real=0.09 secs] 1761037.499: [CMS-concurrent-mark-start] 1761041.447: [CMS-concurrent-mark: 3.906/3.948 secs] [Times: user=25.81 sys=1.06, real=3.95 secs] 1761041.448: [CMS-concurrent-preclean-start] 1761041.763: [CMS-concurrent-preclean: 0.312/0.315 secs] [Times: user=0.50 sys=0.02, real=0.32 secs] 1761041.763: [CMS-concurrent-abortable-preclean-start] ?CMS: abort preclean due to time 1761046.800: [CMS-concurrent-abortable-preclean: 4.720/5.036 secs] [Times: user=6.68 sys=0.18, real=5.04 secs] 1761046.808: [GC[YG occupancy: 464701 K (921600 K)]1761046.808: [Rescan (parallel) , 0.3034664 secs]1761047.112: [weak refs processing, 0.0152564 secs]1761047.128: [class unloading, 0.1518160 secs]1761047.280: [scrub symbol & string tables, 0.1332523 secs] [1 CMS-remark: 3661283K(5136384K)] 4125985K(6057984K), 0.6980401 secs] [Times: user=1.34 sys=0.70, real=0.70 secs] 1761047.507: [CMS-concurrent-sweep-start] 1761051.779: [CMS-concurrent-sweep: 4.252/4.271 secs] [Times: user=6.30 sys=0.19, real=4.27 secs] 1761051.779: [CMS-concurrent-reset-start] 1761051.837: [CMS-concurrent-reset: 0.058/0.058 secs] [Times: user=0.07 sys=0.00, real=0.06 secs] {Heap before GC invocations=23858 (full 1463): ?par new generation?? total 921600K, used 921600K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) ? eden space 819200K, 100% used [0xfffffffdd0000000, 0xfffffffe02000000, 0xfffffffe02000000) ? from space 102400K, 100% used [0xfffffffe02000000, 0xfffffffe08400000, 0xfffffffe08400000) ? to?? space 102400K,?? 0% used [0xfffffffe08400000, 0xfffffffe08400000, 0xfffffffe0e800000) ?concurrent mark-sweep generation total 5136384K, used 3514703K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) ?concurrent-mark-sweep perm gen total 524288K, used 243613K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) 1761061.144: [GC 1761061.145: [ParNew: 921600K->102400K(921600K), 0.4124278 secs] 4436303K->3717736K(6057984K), 0.4128777 secs] [Times: user=1.98 sys=0.19, real=0.41 secs] Heap after GC invocations=23859 (full 1463): ?par new generation?? total 921600K, used 102400K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, 0xfffffffe02000000) ? 
from space 102400K, 100% used [0xfffffffe08400000, 0xfffffffe0e800000, 0xfffffffe0e800000) ? to?? space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) ?concurrent mark-sweep generation total 5136384K, used 3615336K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) ?concurrent-mark-sweep perm gen total 524288K, used 243613K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) } 1761061.562: [GC [1 CMS-initial-mark: 3615336K(5136384K)] 3718604K(6057984K), 0.0660086 secs] [Times: user=0.07 sys=0.00, real=0.07 secs] 1761061.629: [CMS-concurrent-mark-start] 1761065.589: [CMS-concurrent-mark: 3.920/3.960 secs] [Times: user=26.20 sys=1.05, real=3.96 secs] 1761065.590: [CMS-concurrent-preclean-start] 1761065.883: [CMS-concurrent-preclean: 0.290/0.293 secs] [Times: user=0.54 sys=0.02, real=0.29 secs] 1761065.883: [CMS-concurrent-abortable-preclean-start] ?CMS: abort preclean due to time 1761070.950: [CMS-concurrent-abortable-preclean: 5.035/5.067 secs] [Times: user=10.70 sys=0.36, real=5.07 secs] 1761070.958: [GC[YG occupancy: 656197 K (921600 K)]1761070.959: [Rescan (parallel) , 0.5056315 secs]1761071.465: [weak refs processing, 0.0107058 secs]1761071.476: [class unloading, 0.1500832 secs]1761071.626: [scrub symbol & string tables, 0.1278517 secs] [1 CMS-remark: 3615336K(5136384K)] 4271533K(6057984K), 0.8857121 secs] [Times: user=1.77 sys=1.08, real=0.89 secs] 1761071.845: [CMS-concurrent-sweep-start] 1761075.956: [CMS-concurrent-sweep: 4.094/4.111 secs] [Times: user=7.97 sys=0.27, real=4.11 secs] 1761075.957: [CMS-concurrent-reset-start] 1761076.031: [CMS-concurrent-reset: 0.063/0.074 secs] [Times: user=0.13 sys=0.01, real=0.07 secs] {Heap before GC invocations=23859 (full 1464): ?par new generation?? total 921600K, used 921600K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) ? eden space 819200K, 100% used [0xfffffffdd0000000, 0xfffffffe02000000, 0xfffffffe02000000) ? from space 102400K, 100% used [0xfffffffe08400000, 0xfffffffe0e800000, 0xfffffffe0e800000) ? to?? space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) ?concurrent mark-sweep generation total 5136384K, used 3544377K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) ?concurrent-mark-sweep perm gen total 524288K, used 243474K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) 1761077.540: [GC 1761077.540: [ParNew: 921600K->102400K(921600K), 0.4030394 secs] 4465977K->3740399K(6057984K), 0.4034742 secs] [Times: user=3.14 sys=0.55, real=0.40 secs] Heap after GC invocations=23860 (full 1464): ?par new generation?? total 921600K, used 102400K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, 0xfffffffe02000000) ? from space 102400K, 100% used [0xfffffffe02000000, 0xfffffffe08400000, 0xfffffffe08400000) ? to?? space 102400K,?? 
0% used [0xfffffffe08400000, 0xfffffffe08400000, 0xfffffffe0e800000) ?concurrent mark-sweep generation total 5136384K, used 3637999K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) ?concurrent-mark-sweep perm gen total 524288K, used 243474K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) } 1761077.948: [GC [1 CMS-initial-mark: 3637999K(5136384K)] 3740403K(6057984K), 0.0664402 secs] [Times: user=0.06 sys=0.00, real=0.07 secs] 1761078.015: [CMS-concurrent-mark-start] 1761082.141: [CMS-concurrent-mark: 4.076/4.126 secs] [Times: user=25.56 sys=1.24, real=4.13 secs] 1761082.142: [CMS-concurrent-preclean-start] 1761082.435: [CMS-concurrent-preclean: 0.290/0.293 secs] [Times: user=0.56 sys=0.03, real=0.29 secs] 1761082.435: [CMS-concurrent-abortable-preclean-start] ?CMS: abort preclean due to time 1761087.544: [CMS-concurrent-abortable-preclean: 4.166/5.108 secs] [Times: user=6.79 sys=0.38, real=5.11 secs] 1761087.554: [GC[YG occupancy: 612230 K (921600 K)]1761087.555: [Rescan (parallel) , 0.3453344 secs]1761087.900: [weak refs processing, 0.0033384 secs]1761087.904: [class unloading, 0.1515234 secs]1761088.055: [scrub symbol & string tables, 0.1280533 secs] [1 CMS-remark: 3637999K(5136384K)] 4250230K(6057984K), 0.7189376 secs] [Times: user=1.29 sys=0.76, real=0.72 secs] 1761088.274: [CMS-concurrent-sweep-start] 1761092.543: [CMS-concurrent-sweep: 4.268/4.268 secs] [Times: user=6.72 sys=0.26, real=4.27 secs] 1761092.543: [CMS-concurrent-reset-start] 1761092.606: [CMS-concurrent-reset: 0.063/0.063 secs] [Times: user=0.11 sys=0.01, real=0.06 secs] {Heap before GC invocations=23860 (full 1465): ?par new generation?? total 921600K, used 921600K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) ? eden space 819200K, 100% used [0xfffffffdd0000000, 0xfffffffe02000000, 0xfffffffe02000000) ? from space 102400K, 100% used [0xfffffffe02000000, 0xfffffffe08400000, 0xfffffffe08400000) ? to?? space 102400K,?? 0% used [0xfffffffe08400000, 0xfffffffe08400000, 0xfffffffe0e800000) ?concurrent mark-sweep generation total 5136384K, used 3582457K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) ?concurrent-mark-sweep perm gen total 524288K, used 243634K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) 1761096.682: [GC 1761096.682: [ParNew: 921600K->102400K(921600K), 0.2843209 secs] 4504057K->3786219K(6057984K), 0.2847419 secs] [Times: user=1.81 sys=0.10, real=0.29 secs] Heap after GC invocations=23861 (full 1465): ?par new generation?? total 921600K, used 102400K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, 0xfffffffe02000000) ? from space 102400K, 100% used [0xfffffffe08400000, 0xfffffffe0e800000, 0xfffffffe0e800000) ? to?? space 102400K,?? 
0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) ?concurrent mark-sweep generation total 5136384K, used 3683819K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) ?concurrent-mark-sweep perm gen total 524288K, used 243634K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) } 1761096.971: [GC [1 CMS-initial-mark: 3683819K(5136384K)] 3786817K(6057984K), 0.0480239 secs] [Times: user=0.05 sys=0.00, real=0.05 secs] 1761097.020: [CMS-concurrent-mark-start] 1761101.145: [CMS-concurrent-mark: 4.104/4.124 secs] [Times: user=24.60 sys=1.09, real=4.13 secs] 1761101.145: [CMS-concurrent-preclean-start] 1761101.438: [CMS-concurrent-preclean: 0.290/0.292 secs] [Times: user=0.41 sys=0.02, real=0.29 secs] 1761101.438: [CMS-concurrent-abortable-preclean-start] ?CMS: abort preclean due to time 1761106.478: [CMS-concurrent-abortable-preclean: 4.694/5.040 secs] [Times: user=7.32 sys=0.23, real=5.04 secs] 1761106.486: [GC[YG occupancy: 497634 K (921600 K)]1761106.487: [Rescan (parallel) , 0.3384965 secs]1761106.825: [weak refs processing, 0.0030734 secs]1761106.829: [class unloading, 0.1503426 secs]1761106.979: [scrub symbol & string tables, 0.1273150 secs] [1 CMS-remark: 3683819K(5136384K)] 4181454K(6057984K), 0.7055549 secs] [Times: user=1.25 sys=0.64, real=0.71 secs] 1761107.193: [CMS-concurrent-sweep-start] 1761111.281: [CMS-concurrent-sweep: 4.088/4.088 secs] [Times: user=5.81 sys=0.15, real=4.09 secs] 1761111.282: [CMS-concurrent-reset-start] 1761111.349: [CMS-concurrent-reset: 0.068/0.068 secs] [Times: user=0.08 sys=0.00, real=0.07 secs] 1761111.961: [GC [1 CMS-initial-mark: 3633902K(5136384K)] 4261007K(6057984K), 0.5015835 secs] [Times: user=0.50 sys=0.00, real=0.50 secs] 1761112.463: [CMS-concurrent-mark-start] 1761116.550: [CMS-concurrent-mark: 4.036/4.087 secs] [Times: user=24.85 sys=1.09, real=4.09 secs] 1761116.551: [CMS-concurrent-preclean-start] 1761116.901: [CMS-concurrent-preclean: 0.344/0.350 secs] [Times: user=0.54 sys=0.01, real=0.35 secs] 1761116.901: [CMS-concurrent-abortable-preclean-start] {Heap before GC invocations=23861 (full 1467): ?par new generation?? total 921600K, used 921600K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) ? eden space 819200K, 100% used [0xfffffffdd0000000, 0xfffffffe02000000, 0xfffffffe02000000) ? from space 102400K, 100% used [0xfffffffe08400000, 0xfffffffe0e800000, 0xfffffffe0e800000) ? to?? space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) ?concurrent mark-sweep generation total 5136384K, used 3633902K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) ?concurrent-mark-sweep perm gen total 524288K, used 243740K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) 1761120.554: [GC 1761120.554: [ParNew: 921600K->102400K(921600K), 0.4726199 secs] 4555502K->3819626K(6057984K), 0.4732486 secs] [Times: user=3.31 sys=0.69, real=0.47 secs] Heap after GC invocations=23862 (full 1467): ?par new generation?? total 921600K, used 102400K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, 0xfffffffe02000000) ? from space 102400K, 100% used [0xfffffffe02000000, 0xfffffffe08400000, 0xfffffffe08400000) ? to?? space 102400K,?? 
0% used [0xfffffffe08400000, 0xfffffffe08400000, 0xfffffffe0e800000) ?concurrent mark-sweep generation total 5136384K, used 3717226K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) ?concurrent-mark-sweep perm gen total 524288K, used 243740K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) } ?CMS: abort preclean due to time 1761122.392: [CMS-concurrent-abortable-preclean: 4.729/5.490 secs] [Times: user=11.71 sys=0.97, real=5.49 secs] 1761122.401: [GC[YG occupancy: 177317 K (921600 K)]1761122.401: [Rescan (parallel) , 0.0250334 secs]1761122.427: [weak refs processing, 0.0002699 secs]1761122.427: [class unloading, 0.0817179 secs]1761122.509: [scrub symbol & string tables, 0.1383120 secs] [1 CMS-remark: 3717226K(5136384K)] 3894544K(6057984K), 0.3327016 secs] [Times: user=0.55 sys=0.04, real=0.33 secs] 1761122.735: [CMS-concurrent-sweep-start] 1761126.843: [CMS-concurrent-sweep: 4.042/4.108 secs] [Times: user=6.70 sys=0.39, real=4.11 secs] 1761126.844: [CMS-concurrent-reset-start] 1761126.907: [CMS-concurrent-reset: 0.063/0.063 secs] [Times: user=0.11 sys=0.00, real=0.06 secs] 1761127.142: [GC [1 CMS-initial-mark: 3701154K(5136384K)] 4056638K(6057984K), 0.2853309 secs] [Times: user=0.29 sys=0.00, real=0.29 secs] 1761127.428: [CMS-concurrent-mark-start] 1761131.876: [CMS-concurrent-mark: 4.398/4.448 secs] [Times: user=28.46 sys=1.55, real=4.45 secs] 1761131.877: [CMS-concurrent-preclean-start] 1761132.185: [CMS-concurrent-preclean: 0.305/0.308 secs] [Times: user=0.60 sys=0.05, real=0.31 secs] 1761132.186: [CMS-concurrent-abortable-preclean-start] ?CMS: abort preclean due to time 1761137.243: [CMS-concurrent-abortable-preclean: 5.029/5.058 secs] [Times: user=9.88 sys=0.42, real=5.06 secs] 1761137.248: [GC[YG occupancy: 783876 K (921600 K)]1761137.248: [Rescan (parallel) , 0.5402015 secs]1761137.789: [weak refs processing, 0.0022809 secs]1761137.791: [class unloading, 0.1556933 secs]1761137.947: [scrub symbol & string tables, 0.1291759 secs] [1 CMS-remark: 3701154K(5136384K)] 4485030K(6057984K), 0.9154842 secs] [Times: user=1.67 sys=0.97, real=0.92 secs] 1761138.164: [CMS-concurrent-sweep-start] {Heap before GC invocations=23862 (full 1468): ?par new generation?? total 921600K, used 920346K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) ? eden space 819200K,? 99% used [0xfffffffdd0000000, 0xfffffffe01ec6a48, 0xfffffffe02000000) ? from space 102400K, 100% used [0xfffffffe02000000, 0xfffffffe08400000, 0xfffffffe08400000) ? to?? space 102400K,?? 0% used [0xfffffffe08400000, 0xfffffffe08400000, 0xfffffffe0e800000) ?concurrent mark-sweep generation total 5136384K, used 3694838K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) ?concurrent-mark-sweep perm gen total 524288K, used 243810K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) 1761140.028: [GC 1761140.028: [ParNew: 920346K->102400K(921600K), 0.4882607 secs] 4615185K->4017461K(6057984K), 0.4886748 secs] [Times: user=2.71 sys=0.20, real=0.49 secs] Heap after GC invocations=23863 (full 1468): ?par new generation?? total 921600K, used 102400K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, 0xfffffffe02000000) ? from space 102400K, 100% used [0xfffffffe08400000, 0xfffffffe0e800000, 0xfffffffe0e800000) ? to?? space 102400K,?? 
0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) ?concurrent mark-sweep generation total 5136384K, used 3915061K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) ?concurrent-mark-sweep perm gen total 524288K, used 243810K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) } 1761142.726: [CMS-concurrent-sweep: 4.011/4.562 secs] [Times: user=12.54 sys=0.74, real=4.56 secs] 1761142.727: [CMS-concurrent-reset-start] 1761142.791: [CMS-concurrent-reset: 0.064/0.064 secs] [Times: user=0.16 sys=0.01, real=0.06 secs] 1761143.233: [GC [1 CMS-initial-mark: 3852859K(5136384K)] 4152461K(6057984K), 0.2340877 secs] [Times: user=0.23 sys=0.00, real=0.23 secs] 1761143.467: [CMS-concurrent-mark-start] 1761147.673: [CMS-concurrent-mark: 4.182/4.205 secs] [Times: user=26.19 sys=1.27, real=4.21 secs] 1761147.673: [CMS-concurrent-preclean-start] 1761147.978: [CMS-concurrent-preclean: 0.300/0.304 secs] [Times: user=0.44 sys=0.02, real=0.30 secs] 1761147.978: [CMS-concurrent-abortable-preclean-start] {Heap before GC invocations=23863 (full 1469): ?par new generation?? total 921600K, used 602663K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) ? eden space 819200K,? 61% used [0xfffffffdd0000000, 0xfffffffdee889c90, 0xfffffffe02000000) ? from space 102400K, 100% used [0xfffffffe08400000, 0xfffffffe0e800000, 0xfffffffe0e800000) ? to?? space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) ?concurrent mark-sweep generation total 5136384K, used 3852859K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) ?concurrent-mark-sweep perm gen total 524288K, used 243656K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) 1761152.969: [GC 1761152.969: [ParNew: 602663K->102400K(921600K), 0.4710031 secs]1761153.440: [CMS CMS: abort preclean due to time 1761155.705: [CMS-concurrent-abortable-preclean: 6.957/7.726 secs] [Times: user=13.02 sys=0.48, real=7.73 secs] ?(concurrent mode failure): 4005428K->3905404K(5136384K), 32.6670849 secs] 4455522K->3905404K(6057984K), [CMS Perm : 243656K->243327K(524288K)], 33.1389061 secs] [Times: user=35.38 sys=0.26, real=33.14 secs] Heap after GC invocations=23864 (full 1470): ?par new generation?? total 921600K, used 0K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, 0xfffffffe02000000) ? from space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) ? to?? space 102400K,?? 
0% used [0xfffffffe08400000, 0xfffffffe08400000, 0xfffffffe0e800000) ?concurrent mark-sweep generation total 5136384K, used 3905404K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) ?concurrent-mark-sweep perm gen total 524288K, used 243327K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) } 1761187.694: [GC [1 CMS-initial-mark: 4953978K(5136384K)] 4965714K(6057984K), 0.0131890 secs] [Times: user=0.01 sys=0.00, real=0.01 secs] 1761187.708: [CMS-concurrent-mark-start] 1761191.965: [CMS-concurrent-mark: 3.634/4.257 secs] [Times: user=32.76 sys=1.91, real=4.26 secs] 1761191.966: [CMS-concurrent-preclean-start] 1761192.543: [CMS-concurrent-preclean: 0.553/0.577 secs] [Times: user=1.56 sys=0.12, real=0.58 secs] 1761192.544: [CMS-concurrent-abortable-preclean-start] ?CMS: abort preclean due to time 1761197.612: [CMS-concurrent-abortable-preclean: 2.079/5.068 secs] [Times: user=9.11 sys=0.60, real=5.07 secs] 1761197.617: [GC[YG occupancy: 813510 K (921600 K)]1761197.618: [Rescan (parallel) , 0.7500635 secs]1761198.368: [weak refs processing, 0.0020064 secs]1761198.370: [class unloading, 0.0823783 secs]1761198.453: [scrub symbol & string tables, 0.1278387 secs] [1 CMS-remark: 4953978K(5136384K)] 5767489K(6057984K), 1.0496971 secs] [Times: user=2.69 sys=1.79, real=1.05 secs] 1761198.668: [CMS-concurrent-sweep-start] {Heap before GC invocations=23864 (full 1471): ?par new generation?? total 921600K, used 819200K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) ? eden space 819200K, 100% used [0xfffffffdd0000000, 0xfffffffe02000000, 0xfffffffe02000000) ? from space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) ? to?? space 102400K,?? 0% used [0xfffffffe08400000, 0xfffffffe08400000, 0xfffffffe0e800000) ?concurrent mark-sweep generation total 5136384K, used 4953976K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) ?concurrent-mark-sweep perm gen total 524288K, used 243422K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) 1761198.700: [GC 1761198.700: [ParNew: 819200K->819200K(921600K), 0.0000919 secs]1761198.700: [CMS1761202.072: [CMS-concurrent-sweep: 3.389/3.404 secs] [Times: user=3.60 sys=0.04, real=3.40 secs] ?(concurrent mode failure): 4953976K->3789438K(5136384K), 32.6623615 secs] 5773176K->3789438K(6057984K), [CMS Perm : 243422K->243328K(524288K)], 32.6632802 secs] [Times: user=32.58 sys=0.03, real=32.66 secs] Heap after GC invocations=23865 (full 1472): ?par new generation?? total 921600K, used 0K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, 0xfffffffe02000000) ? from space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) ? to?? space 102400K,?? 
0% used [0xfffffffe08400000, 0xfffffffe08400000, 0xfffffffe0e800000) ?concurrent mark-sweep generation total 5136384K, used 3789438K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) ?concurrent-mark-sweep perm gen total 524288K, used 243328K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) } 1761231.418: [GC [1 CMS-initial-mark: 3789438K(5136384K)] 3883471K(6057984K), 0.0609784 secs] [Times: user=0.06 sys=0.01, real=0.06 secs] 1761231.480: [CMS-concurrent-mark-start] 1761236.061: [CMS-concurrent-mark: 3.752/4.580 secs] [Times: user=34.48 sys=2.81, real=4.58 secs] 1761236.061: [CMS-concurrent-preclean-start] 1761236.428: [CMS-concurrent-preclean: 0.358/0.367 secs] [Times: user=0.46 sys=0.01, real=0.37 secs] 1761236.429: [CMS-concurrent-abortable-preclean-start] ?CMS: abort preclean due to time 1761241.488: [CMS-concurrent-abortable-preclean: 2.384/5.059 secs] [Times: user=5.30 sys=0.75, real=5.06 secs] 1761241.497: [GC[YG occupancy: 787969 K (921600 K)]1761241.497: [Rescan (parallel) , 0.5938799 secs]1761242.091: [weak refs processing, 0.0067469 secs]1761242.098: [class unloading, 0.0826078 secs]1761242.181: [scrub symbol & string tables, 0.1308434 secs] [1 CMS-remark: 3789438K(5136384K)] 4577408K(6057984K), 0.9017583 secs] [Times: user=2.66 sys=2.07, real=0.90 secs] 1761242.400: [CMS-concurrent-sweep-start] {Heap before GC invocations=23865 (full 1473): ?par new generation?? total 921600K, used 819200K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) ? eden space 819200K, 100% used [0xfffffffdd0000000, 0xfffffffe02000000, 0xfffffffe02000000) ? from space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) ? to?? space 102400K,?? 0% used [0xfffffffe08400000, 0xfffffffe08400000, 0xfffffffe0e800000) ?concurrent mark-sweep generation total 5136384K, used 3789391K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) ?concurrent-mark-sweep perm gen total 524288K, used 243406K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) 1761242.586: [GC 1761242.586: [ParNew: 819200K->102400K(921600K), 0.1871926 secs] 4608591K->3940305K(6057984K), 0.1879045 secs] [Times: user=0.93 sys=0.05, real=0.19 secs] Heap after GC invocations=23866 (full 1473): ?par new generation?? total 921600K, used 102400K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, 0xfffffffe02000000) ? from space 102400K, 100% used [0xfffffffe08400000, 0xfffffffe0e800000, 0xfffffffe0e800000) ? to?? space 102400K,?? 
0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) ?concurrent mark-sweep generation total 5136384K, used 3837905K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) ?concurrent-mark-sweep perm gen total 524288K, used 243406K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) } 1761245.857: [CMS-concurrent-sweep: 3.225/3.457 secs] [Times: user=6.21 sys=0.52, real=3.46 secs] 1761245.858: [CMS-concurrent-reset-start] 1761245.922: [CMS-concurrent-reset: 0.064/0.064 secs] [Times: user=0.08 sys=0.01, real=0.06 secs] 1761247.301: [GC [1 CMS-initial-mark: 3676150K(5136384K)] 3952072K(6057984K), 0.2229528 secs] [Times: user=0.22 sys=0.00, real=0.22 secs] 1761247.525: [CMS-concurrent-mark-start] 1761251.076: [CMS-concurrent-mark: 3.510/3.551 secs] [Times: user=23.68 sys=0.85, real=3.55 secs] 1761251.076: [CMS-concurrent-preclean-start] 1761251.375: [CMS-concurrent-preclean: 0.295/0.298 secs] [Times: user=0.72 sys=0.04, real=0.30 secs] 1761251.375: [CMS-concurrent-abortable-preclean-start] ?CMS: abort preclean due to time 1761256.460: [CMS-concurrent-abortable-preclean: 5.012/5.085 secs] [Times: user=9.93 sys=0.99, real=5.09 secs] 1761256.469: [GC[YG occupancy: 720909 K (921600 K)]1761256.469: [Rescan (parallel) , 0.4663462 secs]1761256.936: [weak refs processing, 0.0153453 secs]1761256.951: [class unloading, 0.0833874 secs]1761257.035: [scrub symbol & string tables, 0.1289153 secs] [1 CMS-remark: 3676150K(5136384K)] 4397060K(6057984K), 0.7879219 secs] [Times: user=1.55 sys=0.96, real=0.79 secs] 1761257.258: [CMS-concurrent-sweep-start] {Heap before GC invocations=23866 (full 1474): ?par new generation?? total 921600K, used 921600K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) ? eden space 819200K, 100% used [0xfffffffdd0000000, 0xfffffffe02000000, 0xfffffffe02000000) ? from space 102400K, 100% used [0xfffffffe08400000, 0xfffffffe0e800000, 0xfffffffe0e800000) ? to?? space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) ?concurrent mark-sweep generation total 5136384K, used 3669509K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) ?concurrent-mark-sweep perm gen total 524288K, used 243414K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) 1761259.137: [GC 1761259.138: [ParNew: 921600K->102400K(921600K), 0.3975686 secs] 4591109K->3894075K(6057984K), 0.3981608 secs] [Times: user=1.65 sys=0.15, real=0.40 secs] Heap after GC invocations=23867 (full 1474): ?par new generation?? total 921600K, used 102400K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, 0xfffffffe02000000) ? from space 102400K, 100% used [0xfffffffe02000000, 0xfffffffe08400000, 0xfffffffe08400000) ? to?? space 102400K,?? 0% used [0xfffffffe08400000, 0xfffffffe08400000, 0xfffffffe0e800000) ?concurrent mark-sweep generation total 5136384K, used 3791675K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) ?concurrent-mark-sweep perm gen total 524288K, used 243414K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) } -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120412/16cc56a7/attachment-0001.html -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... 
Name: CMS Failure.txt
Url: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120412/16cc56a7/CMSFailure-0001.txt
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: PARNEW Failure.txt
Url: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120412/16cc56a7/PARNEWFailure-0001.txt

From aaisinzon at guidewire.com Thu Apr 12 12:15:45 2012
From: aaisinzon at guidewire.com (Alex Aisinzon)
Date: Thu, 12 Apr 2012 19:15:45 +0000
Subject: Code cache
Message-ID: <43E49E6EC0E84F41B98C68AB6D7820C4170FA028@sm-ex-02-vm.guidewire.com>

Any feedback on this?

Best

Alex A

From: Alex Aisinzon
Sent: Monday, April 09, 2012 11:38 AM
To: 'hotspot-gc-use at openjdk.java.net'
Subject: Code cache

I ran performance tests on one of our apps and saw the following error message in the GC logs:

Java HotSpot(TM) 64-Bit Server VM warning: CodeCache is full. Compiler has been disabled.
Java HotSpot(TM) 64-Bit Server VM warning: Try increasing the code cache size using -XX:ReservedCodeCacheSize=

I scaled up the code cache to 512MB (-XX:ReservedCodeCacheSize=512m) and markedly improved performance/scalability.

I have a few questions:

* Is there a logging option that shows how much of the code cache is really used so that I find the right cache size without oversizing it?
* What factors play into the code cache utilization? I would guess that the amount of code to compile is the dominant factor. Are there other factors like load: I would guess that some entries in the cache may get invalidated if not used much and load could be a factor in this.

I was running on Sun JVM 1.6 update 30 64 bit on x86-64.

Best

Alex A
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120412/735b6e81/attachment.html

From eric.caspole at amd.com Thu Apr 12 12:26:11 2012
From: eric.caspole at amd.com (Eric Caspole)
Date: Thu, 12 Apr 2012 15:26:11 -0400
Subject: Code cache
In-Reply-To: <43E49E6EC0E84F41B98C68AB6D7820C4170FA028@sm-ex-02-vm.guidewire.com>
References: <43E49E6EC0E84F41B98C68AB6D7820C4170FA028@sm-ex-02-vm.guidewire.com>
Message-ID: <4320BF98-561E-43AC-85BC-9E291108AD9B@amd.com>

Hi Alex,
You can try -XX:+UseCodeCacheFlushing where the JVM will selectively age out some compiled code and free up code cache space. This is not on by default in JDK 6 as far as I know.
What is your application doing such that it frequently hits this problem?
Regards,
Eric

On Apr 12, 2012, at 3:15 PM, Alex Aisinzon wrote:
> Any feedback on this?
>
> Best
>
> Alex A
>
> From: Alex Aisinzon
> Sent: Monday, April 09, 2012 11:38 AM
> To: 'hotspot-gc-use at openjdk.java.net'
> Subject: Code cache
>
> I ran performance tests on one of our apps and saw the following error message in the GC logs:
>
> Java HotSpot(TM) 64-Bit Server VM warning: CodeCache is full. Compiler has been disabled.
>
> Java HotSpot(TM) 64-Bit Server VM warning: Try increasing the code cache size using -XX:ReservedCodeCacheSize=
>
> I scaled up the code cache to 512MB (-XX:ReservedCodeCacheSize=512m) and markedly improved performance/scalability.
>
> I have a few questions:
>
> * Is there a logging option that shows how much of the code cache is really used so that I find the right cache size without oversizing it?
>
> * What factors play into the code cache utilization? I would guess that the amount of code to compile is the dominant factor. Are there other factors like load: I would guess that some entries in the cache may get invalidated if not used much and load could be a factor in this.
>
> I was running on Sun JVM 1.6 update 30 64 bit on x86-64.
>
> Best
>
> Alex A
>
> _______________________________________________
> hotspot-gc-use mailing list
> hotspot-gc-use at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
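A minimal sketch of how the flags discussed in this thread could be combined on a JDK 6 command line. The 64m value and the MyApp class are placeholders rather than anything from the thread, and -XX:+UseCodeCacheFlushing only exists in sufficiently recent JDK 6 updates, so its availability should be checked against the exact JVM build:

    java -XX:ReservedCodeCacheSize=64m -XX:+UseCodeCacheFlushing -XX:+PrintCompilation MyApp

With -XX:+PrintCompilation the JVM prints one line per JIT compilation, so counting those lines up to the point where the "CodeCache is full. Compiler has been disabled." warning appears gives a rough lower bound on the code cache size the application actually needs.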
From aaisinzon at guidewire.com Thu Apr 12 13:30:33 2012
From: aaisinzon at guidewire.com (Alex Aisinzon)
Date: Thu, 12 Apr 2012 20:30:33 +0000
Subject: Code cache
In-Reply-To: <4320BF98-561E-43AC-85BC-9E291108AD9B@amd.com>
References: <43E49E6EC0E84F41B98C68AB6D7820C4170FA028@sm-ex-02-vm.guidewire.com> <4320BF98-561E-43AC-85BC-9E291108AD9B@amd.com>
Message-ID: <43E49E6EC0E84F41B98C68AB6D7820C4170FA1B8@sm-ex-02-vm.guidewire.com>

Hi Eric

I thank you for the feedback. I will give this tuning a try.

I have explored another approach: I added the option -XX:+PrintCompilation to track code compilation. This option is not well documented. I could infer that, without a larger code cache, about 11000 methods were compiled before hitting the issue. When using a much larger cache (512MB), I saw that about 14000 methods were compiled. My understanding is that the default code cache is 48MB for the platform I used (x64). A cache of 14000/11000 * 48MB, i.e. about 61MB, should therefore avoid the issue. I have started a performance test with a 64MB code cache to see if that indeed avoids the "code cache full" issue. If so, I would have a method for finding the right code cache size. I will report when I have the results. I will also report whether the -XX:+UseCodeCacheFlushing option provides results similar to the larger code cache.

As for your question on why our app is hitting this issue: our application has become heavier in its use of compiled code, so this is likely the consequence of that.

Best

Alex A

-----Original Message-----
From: Eric Caspole [mailto:eric.caspole at amd.com]
Sent: Thursday, April 12, 2012 12:26 PM
To: Alex Aisinzon
Cc: hotspot-gc-use at openjdk.java.net
Subject: Re: Code cache

Hi Alex,
You can try -XX:+UseCodeCacheFlushing where the JVM will selectively age out some compiled code and free up code cache space. This is not on by default in JDK 6 as far as I know.
What is your application doing such that it frequently hits this problem?
Regards,
Eric

On Apr 12, 2012, at 3:15 PM, Alex Aisinzon wrote:
> Any feedback on this?
>
> Best
>
> Alex A
>
> From: Alex Aisinzon
> Sent: Monday, April 09, 2012 11:38 AM
> To: 'hotspot-gc-use at openjdk.java.net'
> Subject: Code cache
>
> I ran performance tests on one of our apps and saw the following error message in the GC logs:
>
> Java HotSpot(TM) 64-Bit Server VM warning: CodeCache is full. Compiler has been disabled.
>
> Java HotSpot(TM) 64-Bit Server VM warning: Try increasing the code cache size using -XX:ReservedCodeCacheSize=
>
> I scaled up the code cache to 512MB (-XX:ReservedCodeCacheSize=512m) and markedly improved performance/scalability.
>
> I have a few questions:
>
> * Is there a logging option that shows how much of the code cache is really used so that I find the right cache size without oversizing it?
>
> * What factors play into the code cache utilization? I would guess that the amount of code to compile is the dominant factor. Are there other factors like load: I would guess that some entries in the cache may get invalidated if not used much and load could be a factor in this.
>
> I was running on Sun JVM 1.6 update 30 64 bit on x86-64.
>
> Best
>
> Alex A
>
> _______________________________________________
> hotspot-gc-use mailing list
> hotspot-gc-use at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
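On the question of how much of the code cache is really used: the JIT code cache is exposed as a non-heap memory pool through the standard java.lang.management API, so its live usage can be read from inside the running application (or remotely over JMX, e.g. in jconsole's Memory tab) rather than inferred from -XX:+PrintCompilation counts. A small sketch, assuming the pool name "Code Cache" reported by the HotSpot 1.6 VMs discussed here; the class and method names are made up for illustration:

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryPoolMXBean;
    import java.lang.management.MemoryUsage;

    public final class CodeCacheUsage {
        // Call from inside the running application, e.g. from a periodic logging task.
        public static void print() {
            for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
                // On HotSpot the JIT compiler's pool is reported under the name "Code Cache".
                if ("Code Cache".equals(pool.getName())) {
                    MemoryUsage u = pool.getUsage();
                    System.out.println("Code cache: used=" + u.getUsed() / 1024
                            + "K, committed=" + u.getCommitted() / 1024
                            + "K, max=" + u.getMax() / 1024 + "K");
                }
            }
        }
    }

The max of that pool corresponds to -XX:ReservedCodeCacheSize, so sampling it under load is another way to pick a cache size without oversizing it.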
From dawid.weiss at gmail.com Thu Apr 12 14:10:02 2012
From: dawid.weiss at gmail.com (Dawid Weiss)
Date: Thu, 12 Apr 2012 23:10:02 +0200
Subject: ParNew promotion failed, no expected OOM.
In-Reply-To:
References: <4F8675F7.9040205@oracle.com>
Message-ID:

I've spent some time trying to pinpoint the problem and provide a reproducible scenario but I temporarily accept the fact that I am defeated by the darn machine. Anyway, big thanks for the feedback, guys.

Dawid

On Thu, Apr 12, 2012 at 8:42 AM, Dawid Weiss wrote:
>> I never want to throw an OOM and then have to argue about whether
>> the OOM was thrown prematurely. That would be a bug. As a consequence
>
> I agree the tradeoff here is very subtle and there is probably no
> optimal setting. I'll dig deeper in a spare minute and see if I can
> reproduce this on a simpler example.
>
> Dawid

From alexey.ragozin at gmail.com Fri Apr 13 04:51:34 2012
From: alexey.ragozin at gmail.com (Alexey Ragozin)
Date: Fri, 13 Apr 2012 11:51:34 +0000
Subject: Need help about CMS Failure and ParNew failure
Message-ID:

Hi Erwin,

Promotion failures happen due to fragmentation of the old space, and it is normal for fragmentation to build up over time. The simplest way to fight fragmentation is to create a large old space from the start (and if you are using a JVM below 6u26, it is worth upgrading).

Concurrent mode failure means that the concurrent collection cycle is starting too late or that the heap is not large enough. Again, allocating more heap from the start is the simplest remedy.

Your logs also indicate a problem with the initial mark pause time.

I have written a simple guideline for setting up the CMS collector for minimal pauses; you can find more details at
http://blog.ragozin.info/2011/07/gc-check-list-for-data-grid-nodes.html

You can also read more about promotion failure / fragmentation at the links below:
http://blog.ragozin.info/2011/10/java-cg-hotspots-cms-and-heap.html
http://blog.ragozin.info/2011/10/cms-heap-fragmentation-follow-up-1.html
http://blog.ragozin.info/2011/11/java-gc-hotspots-cms-promotion-buffers.html

> Date: Thu, 12 Apr 2012 09:27:13 -0700 (PDT)
> From: Erwin
> Subject: Need help about CMS Failure and ParNew failure
> To: "hotspot-gc-use at openjdk.java.net"
> Message-ID: <1334248033.11363.YahooMailNeo at web111103.mail.gq1.yahoo.com>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Hello,
>
> I'm not an expert when it comes to analyzing GC output and was wondering if you guys could assist? We're using Solaris 10 10/08 s10s_u6wos_07b SPARC, with WAS NDE 7.0.0.9. After a restart of our JVMs, GC seems to be fine. However, after about a week, we start seeing failures in the GC log. We're getting ParNew and concurrent mode failures. Our JVM configurations are below:
> Min heap - 4096
> Max heap - 6016
>
> JVM Arguments
> -server -Xmn1000m -XX:PermSize=512m -XX:+UseConcMarkSweepGC -XX:+HeapDumpOnOutOfMemoryError -DUseSunHttpHandler=true -Djavax.xml.soap.MessageFactory=weblogic.xml.saaj.MessageFactoryImpl -Doracle.jdbc.V8Compatible=true -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSClassUnloadingEnabled -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -XX:-TraceClassUnloading -XX:+PrintHeapAtGC -XX:MaxPermSize=694m -XX:+DisableExplicitGC -XX:+CMSParallelRemarkEnabled
>
> I'm attaching the ParNew failure as well as CMS failure files. Hope it attaches.Total of 2 files. In case they don't see below. !st same if ParNew, 2nd is CMS failure. - Thanks, Erwin > ParNew failure sample: > {Heap before GC invocations=7800 (full 529): > ?par new generation?? total 921600K, used 530694K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) > ? eden space 819200K,? 52% used [0xfffffffdd0000000, 0xfffffffdea241bb8, 0xfffffffe02000000) > ? from space 102400K, 100% used [0xfffffffe02000000, 0xfffffffe08400000, 0xfffffffe08400000) > ? to?? space 102400K,?? 0% used [0xfffffffe08400000, 0xfffffffe08400000, 0xfffffffe0e800000) > ?concurrent mark-sweep generation total 4902464K, used 2682365K [0xfffffffe0e800000, 0xffffffff39b90000, 0xffffffff48000000) > ?concurrent-mark-sweep perm gen total 524288K, used 238782K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) > 552370.958: [GC 552370.958: [ParNew: 530694K->93237K(921600K), 0.1858500 secs] 3213060K->2832744K(5824064K), 0.1862466 secs] [Times: user=1.21 sys=0.08, real=0.19 secs] > Heap after GC invocations=7801 (full 529): > ?par new generation?? total 921600K, used 93237K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) > ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, 0xfffffffe02000000) > ? from space 102400K,? 91% used [0xfffffffe08400000, 0xfffffffe0df0d498, 0xfffffffe0e800000) > ? to?? space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) > ?concurrent mark-sweep generation total 4902464K, used 2739507K [0xfffffffe0e800000, 0xffffffff39b90000, 0xffffffff48000000) > ?concurrent-mark-sweep perm gen total 524288K, used 238782K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) > } > {Heap before GC invocations=7801 (full 529): > ?par new generation?? total 921600K, used 912377K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) > ? eden space 819200K,? 99% used [0xfffffffdd0000000, 0xfffffffe01ff1320, 0xfffffffe02000000) > ? from space 102400K,? 91% used [0xfffffffe08400000, 0xfffffffe0df0d498, 0xfffffffe0e800000) > ? to?? space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) > ?concurrent mark-sweep generation total 4902464K, used 2739507K [0xfffffffe0e800000, 0xffffffff39b90000, 0xffffffff48000000) > ?concurrent-mark-sweep perm gen total 524288K, used 238795K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) > 552372.849: [GC 552372.849: [ParNew (promotion failed): 912377K->869343K(921600K), 0.2641392 secs]552373.113: [CMS: 2791714K->3203612K(4902464K), 29.1902704 secs] 3651885K->3203612K(5824064K), [CMS Perm : 238795K->238246K(524288K)], 29.4609781 secs] [Times: user=30.05 sys=0.13, real=29.46 secs] > Heap after GC invocations=7802 (full 530): > ?par new generation?? total 921600K, used 0K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) > ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, 0xfffffffe02000000) > ? from space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) > ? to?? space 102400K,?? 0% used [0xfffffffe08400000, 0xfffffffe08400000, 0xfffffffe0e800000) > ?concurrent mark-sweep generation total 5136384K, used 3203612K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) > ?concurrent-mark-sweep perm gen total 524288K, used 238246K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) > } > {Heap before GC invocations=12696 (full 809): > ?par new generation?? 
total 921600K, used 908565K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) > ? eden space 819200K, 100% used [0xfffffffdd0000000, 0xfffffffe02000000, 0xfffffffe02000000) > ? from space 102400K,? 87% used [0xfffffffe02000000, 0xfffffffe07745510, 0xfffffffe08400000) > ? to?? space 102400K,?? 0% used [0xfffffffe08400000, 0xfffffffe08400000, 0xfffffffe0e800000) > ?concurrent mark-sweep generation total 5136384K, used 2696786K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) > ?concurrent-mark-sweep perm gen total 524288K, used 241380K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) > 980120.502: [GC 980120.502: [ParNew: 908565K->73974K(921600K), 0.1519646 secs] 3605352K->2817972K(6057984K), 0.1523927 secs] [Times: user=1.07 sys=0.03, real=0.15 secs] > Heap after GC invocations=12697 (full 809): > ?par new generation?? total 921600K, used 73974K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) > ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, 0xfffffffe02000000) > ? from space 102400K,? 72% used [0xfffffffe08400000, 0xfffffffe0cc3d928, 0xfffffffe0e800000) > ? to?? space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) > ?concurrent mark-sweep generation total 5136384K, used 2743998K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) > ?concurrent-mark-sweep perm gen total 524288K, used 241380K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) > } > {Heap before GC invocations=12697 (full 809): > ?par new generation?? total 921600K, used 893174K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) > ? eden space 819200K, 100% used [0xfffffffdd0000000, 0xfffffffe02000000, 0xfffffffe02000000) > ? from space 102400K,? 72% used [0xfffffffe08400000, 0xfffffffe0cc3d928, 0xfffffffe0e800000) > ? to?? space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) > ?concurrent mark-sweep generation total 5136384K, used 2743998K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) > ?concurrent-mark-sweep perm gen total 524288K, used 241411K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) > 980130.777: [GC 980130.777: [ParNew (promotion failed): 893174K->913391K(921600K), 0.5914616 secs]980131.368: [CMS: 2778416K->2710999K(5136384K), 27.7981960 secs] 3637172K->2710999K(6057984K), [CMS Perm : 241411K->240494K(524288K)], 28.3902578 secs] [Times: user=29.37 sys=0.08, real=28.39 secs] > Heap after GC invocations=12698 (full 810): > ?par new generation?? total 921600K, used 0K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) > ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, 0xfffffffe02000000) > ? from space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) > ? to?? space 102400K,?? 0% used [0xfffffffe08400000, 0xfffffffe08400000, 0xfffffffe0e800000) > ?concurrent mark-sweep generation total 5136384K, used 2710999K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) > ?concurrent-mark-sweep perm gen total 524288K, used 240494K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) > } > {Heap before GC invocations=12698 (full 810): > ?par new generation?? total 921600K, used 819200K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) > ? eden space 819200K, 100% used [0xfffffffdd0000000, 0xfffffffe02000000, 0xfffffffe02000000) > ? from space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) > ? to?? 
space 102400K,?? 0% used [0xfffffffe08400000, 0xfffffffe08400000, 0xfffffffe0e800000) > ?concurrent mark-sweep generation total 5136384K, used 2710999K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) > ?concurrent-mark-sweep perm gen total 524288K, used 240523K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) > 980171.033: [GC 980171.033: [ParNew: 819200K->102400K(921600K), 0.2144047 secs] 3530199K->2857892K(6057984K), 0.2149864 secs] [Times: user=0.99 sys=0.03, real=0.22 secs] > Heap after GC invocations=12699 (full 810): > ?par new generation?? total 921600K, used 102400K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) > ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, 0xfffffffe02000000) > ? from space 102400K, 100% used [0xfffffffe08400000, 0xfffffffe0e800000, 0xfffffffe0e800000) > ? to?? space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) > ?concurrent mark-sweep generation total 5136384K, used 2755492K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) > ?concurrent-mark-sweep perm gen total 524288K, used 240523K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) > } > > CMS Failure: > {Heap before GC invocations=23856 (full 1462): > ?par new generation?? total 921600K, used 921600K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) > ? eden space 819200K, 100% used [0xfffffffdd0000000, 0xfffffffe02000000, 0xfffffffe02000000) > ? from space 102400K, 100% used [0xfffffffe02000000, 0xfffffffe08400000, 0xfffffffe08400000) > ? to?? space 102400K,?? 0% used [0xfffffffe08400000, 0xfffffffe08400000, 0xfffffffe0e800000) > ?concurrent mark-sweep generation total 5136384K, used 3496920K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) > ?concurrent-mark-sweep perm gen total 524288K, used 243562K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) > 1761014.981: [GC 1761014.981: [ParNew: 921600K->102400K(921600K), 0.3004508 secs] 4418520K->3667695K(6057984K), 0.3008667 secs] [Times: user=1.69 sys=0.10, real=0.30 secs] > Heap after GC invocations=23857 (full 1462): > ?par new generation?? total 921600K, used 102400K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) > ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, 0xfffffffe02000000) > ? from space 102400K, 100% used [0xfffffffe08400000, 0xfffffffe0e800000, 0xfffffffe0e800000) > ? to?? space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) > ?concurrent mark-sweep generation total 5136384K, used 3565295K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) > ?concurrent-mark-sweep perm gen total 524288K, used 243562K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) > } > {Heap before GC invocations=23857 (full 1462): > ?par new generation?? total 921600K, used 921600K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) > ? eden space 819200K, 100% used [0xfffffffdd0000000, 0xfffffffe02000000, 0xfffffffe02000000) > ? from space 102400K, 100% used [0xfffffffe08400000, 0xfffffffe0e800000, 0xfffffffe0e800000) > ? to?? space 102400K,?? 
0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) > ?concurrent mark-sweep generation total 5136384K, used 3565295K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) > ?concurrent-mark-sweep perm gen total 524288K, used 243773K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) > 1761036.997: [GC 1761036.998: [ParNew: 921600K->102400K(921600K), 0.4075457 secs] 4486895K->3763683K(6057984K), 0.4079591 secs] [Times: user=3.57 sys=0.80, real=0.41 secs] > Heap after GC invocations=23858 (full 1462): > ?par new generation?? total 921600K, used 102400K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) > ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, 0xfffffffe02000000) > ? from space 102400K, 100% used [0xfffffffe02000000, 0xfffffffe08400000, 0xfffffffe08400000) > ? to?? space 102400K,?? 0% used [0xfffffffe08400000, 0xfffffffe08400000, 0xfffffffe0e800000) > ?concurrent mark-sweep generation total 5136384K, used 3661283K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) > ?concurrent-mark-sweep perm gen total 524288K, used 243773K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) > } > 1761037.410: [GC [1 CMS-initial-mark: 3661283K(5136384K)] 3763683K(6057984K), 0.0883369 secs] [Times: user=0.09 sys=0.00, real=0.09 secs] > 1761037.499: [CMS-concurrent-mark-start] > 1761041.447: [CMS-concurrent-mark: 3.906/3.948 secs] [Times: user=25.81 sys=1.06, real=3.95 secs] > 1761041.448: [CMS-concurrent-preclean-start] > 1761041.763: [CMS-concurrent-preclean: 0.312/0.315 secs] [Times: user=0.50 sys=0.02, real=0.32 secs] > 1761041.763: [CMS-concurrent-abortable-preclean-start] > ?CMS: abort preclean due to time 1761046.800: [CMS-concurrent-abortable-preclean: 4.720/5.036 secs] [Times: user=6.68 sys=0.18, real=5.04 secs] > 1761046.808: [GC[YG occupancy: 464701 K (921600 K)]1761046.808: [Rescan (parallel) , 0.3034664 secs]1761047.112: [weak refs processing, 0.0152564 secs]1761047.128: [class unloading, 0.1518160 secs]1761047.280: [scrub symbol & string tables, 0.1332523 secs] [1 CMS-remark: 3661283K(5136384K)] 4125985K(6057984K), 0.6980401 secs] [Times: user=1.34 sys=0.70, real=0.70 secs] > 1761047.507: [CMS-concurrent-sweep-start] > 1761051.779: [CMS-concurrent-sweep: 4.252/4.271 secs] [Times: user=6.30 sys=0.19, real=4.27 secs] > 1761051.779: [CMS-concurrent-reset-start] > 1761051.837: [CMS-concurrent-reset: 0.058/0.058 secs] [Times: user=0.07 sys=0.00, real=0.06 secs] > {Heap before GC invocations=23858 (full 1463): > ?par new generation?? total 921600K, used 921600K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) > ? eden space 819200K, 100% used [0xfffffffdd0000000, 0xfffffffe02000000, 0xfffffffe02000000) > ? from space 102400K, 100% used [0xfffffffe02000000, 0xfffffffe08400000, 0xfffffffe08400000) > ? to?? space 102400K,?? 0% used [0xfffffffe08400000, 0xfffffffe08400000, 0xfffffffe0e800000) > ?concurrent mark-sweep generation total 5136384K, used 3514703K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) > ?concurrent-mark-sweep perm gen total 524288K, used 243613K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) > 1761061.144: [GC 1761061.145: [ParNew: 921600K->102400K(921600K), 0.4124278 secs] 4436303K->3717736K(6057984K), 0.4128777 secs] [Times: user=1.98 sys=0.19, real=0.41 secs] > Heap after GC invocations=23859 (full 1463): > ?par new generation?? total 921600K, used 102400K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) > ? eden space 819200K,?? 
0% used [0xfffffffdd0000000, 0xfffffffdd0000000, 0xfffffffe02000000) > ? from space 102400K, 100% used [0xfffffffe08400000, 0xfffffffe0e800000, 0xfffffffe0e800000) > ? to?? space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) > ?concurrent mark-sweep generation total 5136384K, used 3615336K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) > ?concurrent-mark-sweep perm gen total 524288K, used 243613K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) > } > 1761061.562: [GC [1 CMS-initial-mark: 3615336K(5136384K)] 3718604K(6057984K), 0.0660086 secs] [Times: user=0.07 sys=0.00, real=0.07 secs] > 1761061.629: [CMS-concurrent-mark-start] > 1761065.589: [CMS-concurrent-mark: 3.920/3.960 secs] [Times: user=26.20 sys=1.05, real=3.96 secs] > 1761065.590: [CMS-concurrent-preclean-start] > 1761065.883: [CMS-concurrent-preclean: 0.290/0.293 secs] [Times: user=0.54 sys=0.02, real=0.29 secs] > 1761065.883: [CMS-concurrent-abortable-preclean-start] > ?CMS: abort preclean due to time 1761070.950: [CMS-concurrent-abortable-preclean: 5.035/5.067 secs] [Times: user=10.70 sys=0.36, real=5.07 secs] > 1761070.958: [GC[YG occupancy: 656197 K (921600 K)]1761070.959: [Rescan (parallel) , 0.5056315 secs]1761071.465: [weak refs processing, 0.0107058 secs]1761071.476: [class unloading, 0.1500832 secs]1761071.626: [scrub symbol & string tables, 0.1278517 secs] [1 CMS-remark: 3615336K(5136384K)] 4271533K(6057984K), 0.8857121 secs] [Times: user=1.77 sys=1.08, real=0.89 secs] > 1761071.845: [CMS-concurrent-sweep-start] > 1761075.956: [CMS-concurrent-sweep: 4.094/4.111 secs] [Times: user=7.97 sys=0.27, real=4.11 secs] > 1761075.957: [CMS-concurrent-reset-start] > 1761076.031: [CMS-concurrent-reset: 0.063/0.074 secs] [Times: user=0.13 sys=0.01, real=0.07 secs] > {Heap before GC invocations=23859 (full 1464): > ?par new generation?? total 921600K, used 921600K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) > ? eden space 819200K, 100% used [0xfffffffdd0000000, 0xfffffffe02000000, 0xfffffffe02000000) > ? from space 102400K, 100% used [0xfffffffe08400000, 0xfffffffe0e800000, 0xfffffffe0e800000) > ? to?? space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) > ?concurrent mark-sweep generation total 5136384K, used 3544377K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) > ?concurrent-mark-sweep perm gen total 524288K, used 243474K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) > 1761077.540: [GC 1761077.540: [ParNew: 921600K->102400K(921600K), 0.4030394 secs] 4465977K->3740399K(6057984K), 0.4034742 secs] [Times: user=3.14 sys=0.55, real=0.40 secs] > Heap after GC invocations=23860 (full 1464): > ?par new generation?? total 921600K, used 102400K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) > ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, 0xfffffffe02000000) > ? from space 102400K, 100% used [0xfffffffe02000000, 0xfffffffe08400000, 0xfffffffe08400000) > ? to?? space 102400K,?? 
0% used [0xfffffffe08400000, 0xfffffffe08400000, 0xfffffffe0e800000) > ?concurrent mark-sweep generation total 5136384K, used 3637999K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) > ?concurrent-mark-sweep perm gen total 524288K, used 243474K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) > } > 1761077.948: [GC [1 CMS-initial-mark: 3637999K(5136384K)] 3740403K(6057984K), 0.0664402 secs] [Times: user=0.06 sys=0.00, real=0.07 secs] > 1761078.015: [CMS-concurrent-mark-start] > 1761082.141: [CMS-concurrent-mark: 4.076/4.126 secs] [Times: user=25.56 sys=1.24, real=4.13 secs] > 1761082.142: [CMS-concurrent-preclean-start] > 1761082.435: [CMS-concurrent-preclean: 0.290/0.293 secs] [Times: user=0.56 sys=0.03, real=0.29 secs] > 1761082.435: [CMS-concurrent-abortable-preclean-start] > ?CMS: abort preclean due to time 1761087.544: [CMS-concurrent-abortable-preclean: 4.166/5.108 secs] [Times: user=6.79 sys=0.38, real=5.11 secs] > 1761087.554: [GC[YG occupancy: 612230 K (921600 K)]1761087.555: [Rescan (parallel) , 0.3453344 secs]1761087.900: [weak refs processing, 0.0033384 secs]1761087.904: [class unloading, 0.1515234 secs]1761088.055: [scrub symbol & string tables, 0.1280533 secs] [1 CMS-remark: 3637999K(5136384K)] 4250230K(6057984K), 0.7189376 secs] [Times: user=1.29 sys=0.76, real=0.72 secs] > 1761088.274: [CMS-concurrent-sweep-start] > 1761092.543: [CMS-concurrent-sweep: 4.268/4.268 secs] [Times: user=6.72 sys=0.26, real=4.27 secs] > 1761092.543: [CMS-concurrent-reset-start] > 1761092.606: [CMS-concurrent-reset: 0.063/0.063 secs] [Times: user=0.11 sys=0.01, real=0.06 secs] > {Heap before GC invocations=23860 (full 1465): > ?par new generation?? total 921600K, used 921600K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) > ? eden space 819200K, 100% used [0xfffffffdd0000000, 0xfffffffe02000000, 0xfffffffe02000000) > ? from space 102400K, 100% used [0xfffffffe02000000, 0xfffffffe08400000, 0xfffffffe08400000) > ? to?? space 102400K,?? 0% used [0xfffffffe08400000, 0xfffffffe08400000, 0xfffffffe0e800000) > ?concurrent mark-sweep generation total 5136384K, used 3582457K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) > ?concurrent-mark-sweep perm gen total 524288K, used 243634K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) > 1761096.682: [GC 1761096.682: [ParNew: 921600K->102400K(921600K), 0.2843209 secs] 4504057K->3786219K(6057984K), 0.2847419 secs] [Times: user=1.81 sys=0.10, real=0.29 secs] > Heap after GC invocations=23861 (full 1465): > ?par new generation?? total 921600K, used 102400K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) > ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, 0xfffffffe02000000) > ? from space 102400K, 100% used [0xfffffffe08400000, 0xfffffffe0e800000, 0xfffffffe0e800000) > ? to?? space 102400K,?? 
0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) > ?concurrent mark-sweep generation total 5136384K, used 3683819K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) > ?concurrent-mark-sweep perm gen total 524288K, used 243634K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) > } > 1761096.971: [GC [1 CMS-initial-mark: 3683819K(5136384K)] 3786817K(6057984K), 0.0480239 secs] [Times: user=0.05 sys=0.00, real=0.05 secs] > 1761097.020: [CMS-concurrent-mark-start] > 1761101.145: [CMS-concurrent-mark: 4.104/4.124 secs] [Times: user=24.60 sys=1.09, real=4.13 secs] > 1761101.145: [CMS-concurrent-preclean-start] > 1761101.438: [CMS-concurrent-preclean: 0.290/0.292 secs] [Times: user=0.41 sys=0.02, real=0.29 secs] > 1761101.438: [CMS-concurrent-abortable-preclean-start] > ?CMS: abort preclean due to time 1761106.478: [CMS-concurrent-abortable-preclean: 4.694/5.040 secs] [Times: user=7.32 sys=0.23, real=5.04 secs] > 1761106.486: [GC[YG occupancy: 497634 K (921600 K)]1761106.487: [Rescan (parallel) , 0.3384965 secs]1761106.825: [weak refs processing, 0.0030734 secs]1761106.829: [class unloading, 0.1503426 secs]1761106.979: [scrub symbol & string tables, 0.1273150 secs] [1 CMS-remark: 3683819K(5136384K)] 4181454K(6057984K), 0.7055549 secs] [Times: user=1.25 sys=0.64, real=0.71 secs] > 1761107.193: [CMS-concurrent-sweep-start] > 1761111.281: [CMS-concurrent-sweep: 4.088/4.088 secs] [Times: user=5.81 sys=0.15, real=4.09 secs] > 1761111.282: [CMS-concurrent-reset-start] > 1761111.349: [CMS-concurrent-reset: 0.068/0.068 secs] [Times: user=0.08 sys=0.00, real=0.07 secs] > 1761111.961: [GC [1 CMS-initial-mark: 3633902K(5136384K)] 4261007K(6057984K), 0.5015835 secs] [Times: user=0.50 sys=0.00, real=0.50 secs] > 1761112.463: [CMS-concurrent-mark-start] > 1761116.550: [CMS-concurrent-mark: 4.036/4.087 secs] [Times: user=24.85 sys=1.09, real=4.09 secs] > 1761116.551: [CMS-concurrent-preclean-start] > 1761116.901: [CMS-concurrent-preclean: 0.344/0.350 secs] [Times: user=0.54 sys=0.01, real=0.35 secs] > 1761116.901: [CMS-concurrent-abortable-preclean-start] > {Heap before GC invocations=23861 (full 1467): > ?par new generation?? total 921600K, used 921600K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) > ? eden space 819200K, 100% used [0xfffffffdd0000000, 0xfffffffe02000000, 0xfffffffe02000000) > ? from space 102400K, 100% used [0xfffffffe08400000, 0xfffffffe0e800000, 0xfffffffe0e800000) > ? to?? space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) > ?concurrent mark-sweep generation total 5136384K, used 3633902K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) > ?concurrent-mark-sweep perm gen total 524288K, used 243740K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) > 1761120.554: [GC 1761120.554: [ParNew: 921600K->102400K(921600K), 0.4726199 secs] 4555502K->3819626K(6057984K), 0.4732486 secs] [Times: user=3.31 sys=0.69, real=0.47 secs] > Heap after GC invocations=23862 (full 1467): > ?par new generation?? total 921600K, used 102400K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) > ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, 0xfffffffe02000000) > ? from space 102400K, 100% used [0xfffffffe02000000, 0xfffffffe08400000, 0xfffffffe08400000) > ? to?? space 102400K,?? 
0% used [0xfffffffe08400000, 0xfffffffe08400000, 0xfffffffe0e800000) > ?concurrent mark-sweep generation total 5136384K, used 3717226K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) > ?concurrent-mark-sweep perm gen total 524288K, used 243740K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) > } > ?CMS: abort preclean due to time 1761122.392: [CMS-concurrent-abortable-preclean: 4.729/5.490 secs] [Times: user=11.71 sys=0.97, real=5.49 secs] > 1761122.401: [GC[YG occupancy: 177317 K (921600 K)]1761122.401: [Rescan (parallel) , 0.0250334 secs]1761122.427: [weak refs processing, 0.0002699 secs]1761122.427: [class unloading, 0.0817179 secs]1761122.509: [scrub symbol & string tables, 0.1383120 secs] [1 CMS-remark: 3717226K(5136384K)] 3894544K(6057984K), 0.3327016 secs] [Times: user=0.55 sys=0.04, real=0.33 secs] > 1761122.735: [CMS-concurrent-sweep-start] > 1761126.843: [CMS-concurrent-sweep: 4.042/4.108 secs] [Times: user=6.70 sys=0.39, real=4.11 secs] > 1761126.844: [CMS-concurrent-reset-start] > 1761126.907: [CMS-concurrent-reset: 0.063/0.063 secs] [Times: user=0.11 sys=0.00, real=0.06 secs] > 1761127.142: [GC [1 CMS-initial-mark: 3701154K(5136384K)] 4056638K(6057984K), 0.2853309 secs] [Times: user=0.29 sys=0.00, real=0.29 secs] > 1761127.428: [CMS-concurrent-mark-start] > 1761131.876: [CMS-concurrent-mark: 4.398/4.448 secs] [Times: user=28.46 sys=1.55, real=4.45 secs] > 1761131.877: [CMS-concurrent-preclean-start] > 1761132.185: [CMS-concurrent-preclean: 0.305/0.308 secs] [Times: user=0.60 sys=0.05, real=0.31 secs] > 1761132.186: [CMS-concurrent-abortable-preclean-start] > ?CMS: abort preclean due to time 1761137.243: [CMS-concurrent-abortable-preclean: 5.029/5.058 secs] [Times: user=9.88 sys=0.42, real=5.06 secs] > 1761137.248: [GC[YG occupancy: 783876 K (921600 K)]1761137.248: [Rescan (parallel) , 0.5402015 secs]1761137.789: [weak refs processing, 0.0022809 secs]1761137.791: [class unloading, 0.1556933 secs]1761137.947: [scrub symbol & string tables, 0.1291759 secs] [1 CMS-remark: 3701154K(5136384K)] 4485030K(6057984K), 0.9154842 secs] [Times: user=1.67 sys=0.97, real=0.92 secs] > 1761138.164: [CMS-concurrent-sweep-start] > {Heap before GC invocations=23862 (full 1468): > ?par new generation?? total 921600K, used 920346K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) > ? eden space 819200K,? 99% used [0xfffffffdd0000000, 0xfffffffe01ec6a48, 0xfffffffe02000000) > ? from space 102400K, 100% used [0xfffffffe02000000, 0xfffffffe08400000, 0xfffffffe08400000) > ? to?? space 102400K,?? 0% used [0xfffffffe08400000, 0xfffffffe08400000, 0xfffffffe0e800000) > ?concurrent mark-sweep generation total 5136384K, used 3694838K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) > ?concurrent-mark-sweep perm gen total 524288K, used 243810K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) > 1761140.028: [GC 1761140.028: [ParNew: 920346K->102400K(921600K), 0.4882607 secs] 4615185K->4017461K(6057984K), 0.4886748 secs] [Times: user=2.71 sys=0.20, real=0.49 secs] > Heap after GC invocations=23863 (full 1468): > ?par new generation?? total 921600K, used 102400K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) > ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, 0xfffffffe02000000) > ? from space 102400K, 100% used [0xfffffffe08400000, 0xfffffffe0e800000, 0xfffffffe0e800000) > ? to?? space 102400K,?? 
0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) > ?concurrent mark-sweep generation total 5136384K, used 3915061K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) > ?concurrent-mark-sweep perm gen total 524288K, used 243810K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) > } > 1761142.726: [CMS-concurrent-sweep: 4.011/4.562 secs] [Times: user=12.54 sys=0.74, real=4.56 secs] > 1761142.727: [CMS-concurrent-reset-start] > 1761142.791: [CMS-concurrent-reset: 0.064/0.064 secs] [Times: user=0.16 sys=0.01, real=0.06 secs] > 1761143.233: [GC [1 CMS-initial-mark: 3852859K(5136384K)] 4152461K(6057984K), 0.2340877 secs] [Times: user=0.23 sys=0.00, real=0.23 secs] > 1761143.467: [CMS-concurrent-mark-start] > 1761147.673: [CMS-concurrent-mark: 4.182/4.205 secs] [Times: user=26.19 sys=1.27, real=4.21 secs] > 1761147.673: [CMS-concurrent-preclean-start] > 1761147.978: [CMS-concurrent-preclean: 0.300/0.304 secs] [Times: user=0.44 sys=0.02, real=0.30 secs] > 1761147.978: [CMS-concurrent-abortable-preclean-start] > {Heap before GC invocations=23863 (full 1469): > ?par new generation?? total 921600K, used 602663K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) > ? eden space 819200K,? 61% used [0xfffffffdd0000000, 0xfffffffdee889c90, 0xfffffffe02000000) > ? from space 102400K, 100% used [0xfffffffe08400000, 0xfffffffe0e800000, 0xfffffffe0e800000) > ? to?? space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) > ?concurrent mark-sweep generation total 5136384K, used 3852859K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) > ?concurrent-mark-sweep perm gen total 524288K, used 243656K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) > 1761152.969: [GC 1761152.969: [ParNew: 602663K->102400K(921600K), 0.4710031 secs]1761153.440: [CMS CMS: abort preclean due to time 1761155.705: [CMS-concurrent-abortable-preclean: 6.957/7.726 secs] [Times: user=13.02 sys=0.48, real=7.73 secs] > ?(concurrent mode failure): 4005428K->3905404K(5136384K), 32.6670849 secs] 4455522K->3905404K(6057984K), [CMS Perm : 243656K->243327K(524288K)], 33.1389061 secs] [Times: user=35.38 sys=0.26, real=33.14 secs] > Heap after GC invocations=23864 (full 1470): > ?par new generation?? total 921600K, used 0K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) > ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, 0xfffffffe02000000) > ? from space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) > ? to?? space 102400K,?? 
0% used [0xfffffffe08400000, 0xfffffffe08400000, 0xfffffffe0e800000) > ?concurrent mark-sweep generation total 5136384K, used 3905404K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) > ?concurrent-mark-sweep perm gen total 524288K, used 243327K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) > } > 1761187.694: [GC [1 CMS-initial-mark: 4953978K(5136384K)] 4965714K(6057984K), 0.0131890 secs] [Times: user=0.01 sys=0.00, real=0.01 secs] > 1761187.708: [CMS-concurrent-mark-start] > 1761191.965: [CMS-concurrent-mark: 3.634/4.257 secs] [Times: user=32.76 sys=1.91, real=4.26 secs] > 1761191.966: [CMS-concurrent-preclean-start] > 1761192.543: [CMS-concurrent-preclean: 0.553/0.577 secs] [Times: user=1.56 sys=0.12, real=0.58 secs] > 1761192.544: [CMS-concurrent-abortable-preclean-start] > ?CMS: abort preclean due to time 1761197.612: [CMS-concurrent-abortable-preclean: 2.079/5.068 secs] [Times: user=9.11 sys=0.60, real=5.07 secs] > 1761197.617: [GC[YG occupancy: 813510 K (921600 K)]1761197.618: [Rescan (parallel) , 0.7500635 secs]1761198.368: [weak refs processing, 0.0020064 secs]1761198.370: [class unloading, 0.0823783 secs]1761198.453: [scrub symbol & string tables, 0.1278387 secs] [1 CMS-remark: 4953978K(5136384K)] 5767489K(6057984K), 1.0496971 secs] [Times: user=2.69 sys=1.79, real=1.05 secs] > 1761198.668: [CMS-concurrent-sweep-start] > {Heap before GC invocations=23864 (full 1471): > ?par new generation?? total 921600K, used 819200K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) > ? eden space 819200K, 100% used [0xfffffffdd0000000, 0xfffffffe02000000, 0xfffffffe02000000) > ? from space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) > ? to?? space 102400K,?? 0% used [0xfffffffe08400000, 0xfffffffe08400000, 0xfffffffe0e800000) > ?concurrent mark-sweep generation total 5136384K, used 4953976K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) > ?concurrent-mark-sweep perm gen total 524288K, used 243422K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) > 1761198.700: [GC 1761198.700: [ParNew: 819200K->819200K(921600K), 0.0000919 secs]1761198.700: [CMS1761202.072: [CMS-concurrent-sweep: 3.389/3.404 secs] [Times: user=3.60 sys=0.04, real=3.40 secs] > ?(concurrent mode failure): 4953976K->3789438K(5136384K), 32.6623615 secs] 5773176K->3789438K(6057984K), [CMS Perm : 243422K->243328K(524288K)], 32.6632802 secs] [Times: user=32.58 sys=0.03, real=32.66 secs] > Heap after GC invocations=23865 (full 1472): > ?par new generation?? total 921600K, used 0K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) > ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, 0xfffffffe02000000) > ? from space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) > ? to?? space 102400K,?? 
0% used [0xfffffffe08400000, 0xfffffffe08400000, 0xfffffffe0e800000) > ?concurrent mark-sweep generation total 5136384K, used 3789438K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) > ?concurrent-mark-sweep perm gen total 524288K, used 243328K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) > } > 1761231.418: [GC [1 CMS-initial-mark: 3789438K(5136384K)] 3883471K(6057984K), 0.0609784 secs] [Times: user=0.06 sys=0.01, real=0.06 secs] > 1761231.480: [CMS-concurrent-mark-start] > 1761236.061: [CMS-concurrent-mark: 3.752/4.580 secs] [Times: user=34.48 sys=2.81, real=4.58 secs] > 1761236.061: [CMS-concurrent-preclean-start] > 1761236.428: [CMS-concurrent-preclean: 0.358/0.367 secs] [Times: user=0.46 sys=0.01, real=0.37 secs] > 1761236.429: [CMS-concurrent-abortable-preclean-start] > ?CMS: abort preclean due to time 1761241.488: [CMS-concurrent-abortable-preclean: 2.384/5.059 secs] [Times: user=5.30 sys=0.75, real=5.06 secs] > 1761241.497: [GC[YG occupancy: 787969 K (921600 K)]1761241.497: [Rescan (parallel) , 0.5938799 secs]1761242.091: [weak refs processing, 0.0067469 secs]1761242.098: [class unloading, 0.0826078 secs]1761242.181: [scrub symbol & string tables, 0.1308434 secs] [1 CMS-remark: 3789438K(5136384K)] 4577408K(6057984K), 0.9017583 secs] [Times: user=2.66 sys=2.07, real=0.90 secs] > 1761242.400: [CMS-concurrent-sweep-start] > {Heap before GC invocations=23865 (full 1473): > ?par new generation?? total 921600K, used 819200K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) > ? eden space 819200K, 100% used [0xfffffffdd0000000, 0xfffffffe02000000, 0xfffffffe02000000) > ? from space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) > ? to?? space 102400K,?? 0% used [0xfffffffe08400000, 0xfffffffe08400000, 0xfffffffe0e800000) > ?concurrent mark-sweep generation total 5136384K, used 3789391K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) > ?concurrent-mark-sweep perm gen total 524288K, used 243406K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) > 1761242.586: [GC 1761242.586: [ParNew: 819200K->102400K(921600K), 0.1871926 secs] 4608591K->3940305K(6057984K), 0.1879045 secs] [Times: user=0.93 sys=0.05, real=0.19 secs] > Heap after GC invocations=23866 (full 1473): > ?par new generation?? total 921600K, used 102400K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) > ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, 0xfffffffe02000000) > ? from space 102400K, 100% used [0xfffffffe08400000, 0xfffffffe0e800000, 0xfffffffe0e800000) > ? to?? space 102400K,?? 
0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) > ?concurrent mark-sweep generation total 5136384K, used 3837905K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) > ?concurrent-mark-sweep perm gen total 524288K, used 243406K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) > } > 1761245.857: [CMS-concurrent-sweep: 3.225/3.457 secs] [Times: user=6.21 sys=0.52, real=3.46 secs] > 1761245.858: [CMS-concurrent-reset-start] > 1761245.922: [CMS-concurrent-reset: 0.064/0.064 secs] [Times: user=0.08 sys=0.01, real=0.06 secs] > 1761247.301: [GC [1 CMS-initial-mark: 3676150K(5136384K)] 3952072K(6057984K), 0.2229528 secs] [Times: user=0.22 sys=0.00, real=0.22 secs] > 1761247.525: [CMS-concurrent-mark-start] > 1761251.076: [CMS-concurrent-mark: 3.510/3.551 secs] [Times: user=23.68 sys=0.85, real=3.55 secs] > 1761251.076: [CMS-concurrent-preclean-start] > 1761251.375: [CMS-concurrent-preclean: 0.295/0.298 secs] [Times: user=0.72 sys=0.04, real=0.30 secs] > 1761251.375: [CMS-concurrent-abortable-preclean-start] > ?CMS: abort preclean due to time 1761256.460: [CMS-concurrent-abortable-preclean: 5.012/5.085 secs] [Times: user=9.93 sys=0.99, real=5.09 secs] > 1761256.469: [GC[YG occupancy: 720909 K (921600 K)]1761256.469: [Rescan (parallel) , 0.4663462 secs]1761256.936: [weak refs processing, 0.0153453 secs]1761256.951: [class unloading, 0.0833874 secs]1761257.035: [scrub symbol & string tables, 0.1289153 secs] [1 CMS-remark: 3676150K(5136384K)] 4397060K(6057984K), 0.7879219 secs] [Times: user=1.55 sys=0.96, real=0.79 secs] > 1761257.258: [CMS-concurrent-sweep-start] > {Heap before GC invocations=23866 (full 1474): > ?par new generation?? total 921600K, used 921600K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) > ? eden space 819200K, 100% used [0xfffffffdd0000000, 0xfffffffe02000000, 0xfffffffe02000000) > ? from space 102400K, 100% used [0xfffffffe08400000, 0xfffffffe0e800000, 0xfffffffe0e800000) > ? to?? space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, 0xfffffffe08400000) > ?concurrent mark-sweep generation total 5136384K, used 3669509K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) > ?concurrent-mark-sweep perm gen total 524288K, used 243414K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) > 1761259.137: [GC 1761259.138: [ParNew: 921600K->102400K(921600K), 0.3975686 secs] 4591109K->3894075K(6057984K), 0.3981608 secs] [Times: user=1.65 sys=0.15, real=0.40 secs] > Heap after GC invocations=23867 (full 1474): > ?par new generation?? total 921600K, used 102400K [0xfffffffdd0000000, 0xfffffffe0e800000, 0xfffffffe0e800000) > ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, 0xfffffffe02000000) > ? from space 102400K, 100% used [0xfffffffe02000000, 0xfffffffe08400000, 0xfffffffe08400000) > ? to?? space 102400K,?? 0% used [0xfffffffe08400000, 0xfffffffe08400000, 0xfffffffe0e800000) > ?concurrent mark-sweep generation total 5136384K, used 3791675K [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) > ?concurrent-mark-sweep perm gen total 524288K, used 243414K [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) > } > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120412/16cc56a7/attachment.html > -------------- next part -------------- > An embedded and charset-unspecified text was scrubbed... 
> Name: CMS Failure.txt
> Url: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120412/16cc56a7/CMSFailure.txt
> -------------- next part --------------
> An embedded and charset-unspecified text was scrubbed...
> Name: PARNEW Failure.txt
> Url: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120412/16cc56a7/PARNEWFailure.txt

From rednaxelafx at gmail.com Fri Apr 13 06:44:52 2012
From: rednaxelafx at gmail.com (Krystal Mok)
Date: Fri, 13 Apr 2012 21:44:52 +0800
Subject: Code cache
In-Reply-To: <43E49E6EC0E84F41B98C68AB6D7820C4170F7515@sm-ex-02-vm.guidewire.com>
References: <43E49E6EC0E84F41B98C68AB6D7820C4170F7515@sm-ex-02-vm.guidewire.com>
Message-ID: 

Hi Alex,

On Tue, Apr 10, 2012 at 2:37 AM, Alex Aisinzon wrote:
>
> * Is there a logging option that shows how much of the code
> cache is really used so that I find the right cache size without oversizing
> it?
>
FYI, you can use JConsole or other JMX clients to see the usage of the code
cache [1]

- Kris

[1]: http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2012-March/003353.html
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120413/199fc271/attachment.html

From alexey.ragozin at gmail.com Sun Apr 15 01:12:30 2012
From: alexey.ragozin at gmail.com (Alexey Ragozin)
Date: Sun, 15 Apr 2012 12:12:30 +0400
Subject: Need help about CMS Failure and ParNew failure
In-Reply-To: <1334429881.40648.YahooMailNeo@web111106.mail.gq1.yahoo.com>
References: <1334429881.40648.YahooMailNeo@web111106.mail.gq1.yahoo.com>
Message-ID: 

Hi,

On Sat, Apr 14, 2012 at 10:58 PM, Erwin wrote:
> Alexey,
>
> Thanks for the tips. I read your other links. Several questions for you:
> 1. Our min heap is 4096 and max is 6016. To combat heap fragmentation, we
> should try increasing max to 8gb? Our young space is 1GB by setting
> -Xmn1000m so increasing to 8gb will give old space an extra 2gb?

Correct. I would also suggest using the same size for -Xms and -Xmx.

> 2. I should also set my -XX:CMSInitiatingOccupancyFraction=70 to something
> like 60 to initiate CMS sooner and prevent CMS failure?

You should also set -XX:+UseCMSInitiatingOccupancyOnly; otherwise the JVM
may override your settings.

> 3. Which WAS NDE version has JDK 6u26? We're upgrading from 7.0.0.9 to 7.0.0.21.

I cannot help you here.

>
> Thanks,
> Erwin
>
> From: Alexey Ragozin
> To: tanman12345 at yahoo.com; hotspot-gc-use at openjdk.java.net
> Sent: Friday, April 13, 2012 6:51 AM
> Subject: Re: Need help about CMS Failure and ParNew failure
>
> Hi Erwin,
>
> Promotion failures happen due to fragmentation of the old space. It is
> normal for fragmentation to build up over time. The simplest way to fight
> fragmentation is to create a large old space from the start (if you are on
> a JVM below 6u26, it is worth upgrading).
>
> Concurrent mode failure means that the concurrent collection cycle is
> starting too late or the heap size is not enough. Again, allocating more
> heap from the start is the simplest remedy.
>
> Your logs also indicate a problem with the initial-mark pause time.
> I have written a simple guideline for setting up the CMS collector for
> minimal pauses; you can find more details at the link below:
> http://blog.ragozin.info/2011/07/gc-check-list-for-data-grid-nodes.html
>
> You can also read more about promotion failures / fragmentation at the
> links below:
> http://blog.ragozin.info/2011/10/java-cg-hotspots-cms-and-heap.html
> http://blog.ragozin.info/2011/10/cms-heap-fragmentation-follow-up-1.html
> http://blog.ragozin.info/2011/11/java-gc-hotspots-cms-promotion-buffers.html
>
>> Date: Thu, 12 Apr 2012 09:27:13 -0700 (PDT)
>> From: Erwin
>> Subject: Need help about CMS Failure and ParNew failure
>> To: "hotspot-gc-use at openjdk.java.net"
>>
>> Message-ID:
>>         <1334248033.11363.YahooMailNeo at web111103.mail.gq1.yahoo.com>
>> Content-Type: text/plain; charset="iso-8859-1"
>>
>> Hello,
>>
>> I'm not an expert when it comes to analyzing GC output and was wondering
>> if you guys could assist. We're using Solaris 10 10/08 s10s_u6wos_07b SPARC,
>> with WAS NDE 7.0.0.9. After a restart of our JVMs, GC seems to be fine.
>> However, after about a week, we start seeing failures in the GC log. We're
>> getting ParNew and concurrent mode failures. Our JVM configuration is below:
>> Min heap - 4096 MB
>> Max heap - 6016 MB
>>
>> JVM Arguments
>> -server -Xmn1000m -XX:PermSize=512m -XX:+UseConcMarkSweepGC
>> -XX:+HeapDumpOnOutOfMemoryError -DUseSunHttpHandler=true
>> -Djavax.xml.soap.MessageFactory=weblogic.xml.saaj.MessageFactoryImpl
>> -Doracle.jdbc.V8Compatible=true -XX:CMSInitiatingOccupancyFraction=70
>> -XX:+CMSClassUnloadingEnabled -verbose:gc -XX:+PrintGCTimeStamps
>> -XX:+PrintGCDetails -XX:-TraceClassUnloading -XX:+PrintHeapAtGC
>> -XX:MaxPermSize=694m -XX:+DisableExplicitGC -XX:+CMSParallelRemarkEnabled
>>
>> I'm attaching the ParNew failure as well as the CMS failure files (two
>> files in total); hopefully they come through. In case they don't, see
>> below: the first sample is the ParNew failure, the second is the CMS
>> failure. - Thanks, Erwin
>> ParNew failure sample:
>> {Heap before GC invocations=7800 (full 529):
>>  par new generation   total 921600K, used 530694K [0xfffffffdd0000000,
>> 0xfffffffe0e800000, 0xfffffffe0e800000)
>>   eden space 819200K,  52% used [0xfffffffdd0000000, 0xfffffffdea241bb8,
>> 0xfffffffe02000000)
>>   from space 102400K, 100% used [0xfffffffe02000000, 0xfffffffe08400000,
>> 0xfffffffe08400000)
>>   to   space 102400K,   0% used [0xfffffffe08400000, 0xfffffffe08400000,
>> 0xfffffffe0e800000)
>>  concurrent mark-sweep generation total 4902464K, used 2682365K
>> [0xfffffffe0e800000, 0xffffffff39b90000, 0xffffffff48000000)
>>  concurrent-mark-sweep perm gen total 524288K, used 238782K
>> [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000)
>> 552370.958: [GC 552370.958: [ParNew: 530694K->93237K(921600K), 0.1858500
>> secs] 3213060K->2832744K(5824064K), 0.1862466 secs] [Times: user=1.21
>> sys=0.08, real=0.19 secs]
>> Heap after GC invocations=7801 (full 529):
>>  par new generation   total 921600K, used 93237K [0xfffffffdd0000000,
>> 0xfffffffe0e800000, 0xfffffffe0e800000)
>>   eden space 819200K,   0% used [0xfffffffdd0000000, 0xfffffffdd0000000,
>> 0xfffffffe02000000)
>>   from space 102400K,  91% used [0xfffffffe08400000, 0xfffffffe0df0d498,
>> 0xfffffffe0e800000)
>>   to   space 102400K,  
0% used [0xfffffffe02000000, 0xfffffffe02000000, >> 0xfffffffe08400000) >> ?concurrent mark-sweep generation total 4902464K, used 2739507K >> [0xfffffffe0e800000, 0xffffffff39b90000, 0xffffffff48000000) >> ?concurrent-mark-sweep perm gen total 524288K, used 238782K >> [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) >> } >> {Heap before GC invocations=7801 (full 529): >> ?par new generation?? total 921600K, used 912377K [0xfffffffdd0000000, >> 0xfffffffe0e800000, 0xfffffffe0e800000) >> ? eden space 819200K,? 99% used [0xfffffffdd0000000, 0xfffffffe01ff1320, >> 0xfffffffe02000000) >> ? from space 102400K,? 91% used [0xfffffffe08400000, 0xfffffffe0df0d498, >> 0xfffffffe0e800000) >> ? to?? space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, >> 0xfffffffe08400000) >> ?concurrent mark-sweep generation total 4902464K, used 2739507K >> [0xfffffffe0e800000, 0xffffffff39b90000, 0xffffffff48000000) >> ?concurrent-mark-sweep perm gen total 524288K, used 238795K >> [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) >> 552372.849: [GC 552372.849: [ParNew (promotion failed): >> 912377K->869343K(921600K), 0.2641392 secs]552373.113: [CMS: >> 2791714K->3203612K(4902464K), 29.1902704 secs] 3651885K->3203612K(5824064K), >> [CMS Perm : 238795K->238246K(524288K)], 29.4609781 secs] [Times: user=30.05 >> sys=0.13, real=29.46 secs] >> Heap after GC invocations=7802 (full 530): >> ?par new generation?? total 921600K, used 0K [0xfffffffdd0000000, >> 0xfffffffe0e800000, 0xfffffffe0e800000) >> ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, >> 0xfffffffe02000000) >> ? from space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, >> 0xfffffffe08400000) >> ? to?? space 102400K,?? 0% used [0xfffffffe08400000, 0xfffffffe08400000, >> 0xfffffffe0e800000) >> ?concurrent mark-sweep generation total 5136384K, used 3203612K >> [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) >> ?concurrent-mark-sweep perm gen total 524288K, used 238246K >> [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) >> } >> {Heap before GC invocations=12696 (full 809): >> ?par new generation?? total 921600K, used 908565K [0xfffffffdd0000000, >> 0xfffffffe0e800000, 0xfffffffe0e800000) >> ? eden space 819200K, 100% used [0xfffffffdd0000000, 0xfffffffe02000000, >> 0xfffffffe02000000) >> ? from space 102400K,? 87% used [0xfffffffe02000000, 0xfffffffe07745510, >> 0xfffffffe08400000) >> ? to?? space 102400K,?? 0% used [0xfffffffe08400000, 0xfffffffe08400000, >> 0xfffffffe0e800000) >> ?concurrent mark-sweep generation total 5136384K, used 2696786K >> [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) >> ?concurrent-mark-sweep perm gen total 524288K, used 241380K >> [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) >> 980120.502: [GC 980120.502: [ParNew: 908565K->73974K(921600K), 0.1519646 >> secs] 3605352K->2817972K(6057984K), 0.1523927 secs] [Times: user=1.07 >> sys=0.03, real=0.15 secs] >> Heap after GC invocations=12697 (full 809): >> ?par new generation?? total 921600K, used 73974K [0xfffffffdd0000000, >> 0xfffffffe0e800000, 0xfffffffe0e800000) >> ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, >> 0xfffffffe02000000) >> ? from space 102400K,? 72% used [0xfffffffe08400000, 0xfffffffe0cc3d928, >> 0xfffffffe0e800000) >> ? to?? space 102400K,?? 
0% used [0xfffffffe02000000, 0xfffffffe02000000, >> 0xfffffffe08400000) >> ?concurrent mark-sweep generation total 5136384K, used 2743998K >> [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) >> ?concurrent-mark-sweep perm gen total 524288K, used 241380K >> [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) >> } >> {Heap before GC invocations=12697 (full 809): >> ?par new generation?? total 921600K, used 893174K [0xfffffffdd0000000, >> 0xfffffffe0e800000, 0xfffffffe0e800000) >> ? eden space 819200K, 100% used [0xfffffffdd0000000, 0xfffffffe02000000, >> 0xfffffffe02000000) >> ? from space 102400K,? 72% used [0xfffffffe08400000, 0xfffffffe0cc3d928, >> 0xfffffffe0e800000) >> ? to?? space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, >> 0xfffffffe08400000) >> ?concurrent mark-sweep generation total 5136384K, used 2743998K >> [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) >> ?concurrent-mark-sweep perm gen total 524288K, used 241411K >> [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) >> 980130.777: [GC 980130.777: [ParNew (promotion failed): >> 893174K->913391K(921600K), 0.5914616 secs]980131.368: [CMS: >> 2778416K->2710999K(5136384K), 27.7981960 secs] 3637172K->2710999K(6057984K), >> [CMS Perm : 241411K->240494K(524288K)], 28.3902578 secs] [Times: user=29.37 >> sys=0.08, real=28.39 secs] >> Heap after GC invocations=12698 (full 810): >> ?par new generation?? total 921600K, used 0K [0xfffffffdd0000000, >> 0xfffffffe0e800000, 0xfffffffe0e800000) >> ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, >> 0xfffffffe02000000) >> ? from space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, >> 0xfffffffe08400000) >> ? to?? space 102400K,?? 0% used [0xfffffffe08400000, 0xfffffffe08400000, >> 0xfffffffe0e800000) >> ?concurrent mark-sweep generation total 5136384K, used 2710999K >> [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) >> ?concurrent-mark-sweep perm gen total 524288K, used 240494K >> [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) >> } >> {Heap before GC invocations=12698 (full 810): >> ?par new generation?? total 921600K, used 819200K [0xfffffffdd0000000, >> 0xfffffffe0e800000, 0xfffffffe0e800000) >> ? eden space 819200K, 100% used [0xfffffffdd0000000, 0xfffffffe02000000, >> 0xfffffffe02000000) >> ? from space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, >> 0xfffffffe08400000) >> ? to?? space 102400K,?? 0% used [0xfffffffe08400000, 0xfffffffe08400000, >> 0xfffffffe0e800000) >> ?concurrent mark-sweep generation total 5136384K, used 2710999K >> [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) >> ?concurrent-mark-sweep perm gen total 524288K, used 240523K >> [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) >> 980171.033: [GC 980171.033: [ParNew: 819200K->102400K(921600K), 0.2144047 >> secs] 3530199K->2857892K(6057984K), 0.2149864 secs] [Times: user=0.99 >> sys=0.03, real=0.22 secs] >> Heap after GC invocations=12699 (full 810): >> ?par new generation?? total 921600K, used 102400K [0xfffffffdd0000000, >> 0xfffffffe0e800000, 0xfffffffe0e800000) >> ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, >> 0xfffffffe02000000) >> ? from space 102400K, 100% used [0xfffffffe08400000, 0xfffffffe0e800000, >> 0xfffffffe0e800000) >> ? to?? space 102400K,?? 
0% used [0xfffffffe02000000, 0xfffffffe02000000, >> 0xfffffffe08400000) >> ?concurrent mark-sweep generation total 5136384K, used 2755492K >> [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) >> ?concurrent-mark-sweep perm gen total 524288K, used 240523K >> [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) >> } >> >> CMS Failure: >> {Heap before GC invocations=23856 (full 1462): >> ?par new generation?? total 921600K, used 921600K [0xfffffffdd0000000, >> 0xfffffffe0e800000, 0xfffffffe0e800000) >> ? eden space 819200K, 100% used [0xfffffffdd0000000, 0xfffffffe02000000, >> 0xfffffffe02000000) >> ? from space 102400K, 100% used [0xfffffffe02000000, 0xfffffffe08400000, >> 0xfffffffe08400000) >> ? to?? space 102400K,?? 0% used [0xfffffffe08400000, 0xfffffffe08400000, >> 0xfffffffe0e800000) >> ?concurrent mark-sweep generation total 5136384K, used 3496920K >> [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) >> ?concurrent-mark-sweep perm gen total 524288K, used 243562K >> [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) >> 1761014.981: [GC 1761014.981: [ParNew: 921600K->102400K(921600K), >> 0.3004508 secs] 4418520K->3667695K(6057984K), 0.3008667 secs] [Times: >> user=1.69 sys=0.10, real=0.30 secs] >> Heap after GC invocations=23857 (full 1462): >> ?par new generation?? total 921600K, used 102400K [0xfffffffdd0000000, >> 0xfffffffe0e800000, 0xfffffffe0e800000) >> ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, >> 0xfffffffe02000000) >> ? from space 102400K, 100% used [0xfffffffe08400000, 0xfffffffe0e800000, >> 0xfffffffe0e800000) >> ? to?? space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, >> 0xfffffffe08400000) >> ?concurrent mark-sweep generation total 5136384K, used 3565295K >> [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) >> ?concurrent-mark-sweep perm gen total 524288K, used 243562K >> [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) >> } >> {Heap before GC invocations=23857 (full 1462): >> ?par new generation?? total 921600K, used 921600K [0xfffffffdd0000000, >> 0xfffffffe0e800000, 0xfffffffe0e800000) >> ? eden space 819200K, 100% used [0xfffffffdd0000000, 0xfffffffe02000000, >> 0xfffffffe02000000) >> ? from space 102400K, 100% used [0xfffffffe08400000, 0xfffffffe0e800000, >> 0xfffffffe0e800000) >> ? to?? space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, >> 0xfffffffe08400000) >> ?concurrent mark-sweep generation total 5136384K, used 3565295K >> [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) >> ?concurrent-mark-sweep perm gen total 524288K, used 243773K >> [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) >> 1761036.997: [GC 1761036.998: [ParNew: 921600K->102400K(921600K), >> 0.4075457 secs] 4486895K->3763683K(6057984K), 0.4079591 secs] [Times: >> user=3.57 sys=0.80, real=0.41 secs] >> Heap after GC invocations=23858 (full 1462): >> ?par new generation?? total 921600K, used 102400K [0xfffffffdd0000000, >> 0xfffffffe0e800000, 0xfffffffe0e800000) >> ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, >> 0xfffffffe02000000) >> ? from space 102400K, 100% used [0xfffffffe02000000, 0xfffffffe08400000, >> 0xfffffffe08400000) >> ? to?? space 102400K,?? 
0% used [0xfffffffe08400000, 0xfffffffe08400000, >> 0xfffffffe0e800000) >> ?concurrent mark-sweep generation total 5136384K, used 3661283K >> [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) >> ?concurrent-mark-sweep perm gen total 524288K, used 243773K >> [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) >> } >> 1761037.410: [GC [1 CMS-initial-mark: 3661283K(5136384K)] >> 3763683K(6057984K), 0.0883369 secs] [Times: user=0.09 sys=0.00, real=0.09 >> secs] >> 1761037.499: [CMS-concurrent-mark-start] >> 1761041.447: [CMS-concurrent-mark: 3.906/3.948 secs] [Times: user=25.81 >> sys=1.06, real=3.95 secs] >> 1761041.448: [CMS-concurrent-preclean-start] >> 1761041.763: [CMS-concurrent-preclean: 0.312/0.315 secs] [Times: user=0.50 >> sys=0.02, real=0.32 secs] >> 1761041.763: [CMS-concurrent-abortable-preclean-start] >> ?CMS: abort preclean due to time 1761046.800: >> [CMS-concurrent-abortable-preclean: 4.720/5.036 secs] [Times: user=6.68 >> sys=0.18, real=5.04 secs] >> 1761046.808: [GC[YG occupancy: 464701 K (921600 K)]1761046.808: [Rescan >> (parallel) , 0.3034664 secs]1761047.112: [weak refs processing, 0.0152564 >> secs]1761047.128: [class unloading, 0.1518160 secs]1761047.280: [scrub >> symbol & string tables, 0.1332523 secs] [1 CMS-remark: 3661283K(5136384K)] >> 4125985K(6057984K), 0.6980401 secs] [Times: user=1.34 sys=0.70, real=0.70 >> secs] >> 1761047.507: [CMS-concurrent-sweep-start] >> 1761051.779: [CMS-concurrent-sweep: 4.252/4.271 secs] [Times: user=6.30 >> sys=0.19, real=4.27 secs] >> 1761051.779: [CMS-concurrent-reset-start] >> 1761051.837: [CMS-concurrent-reset: 0.058/0.058 secs] [Times: user=0.07 >> sys=0.00, real=0.06 secs] >> {Heap before GC invocations=23858 (full 1463): >> ?par new generation?? total 921600K, used 921600K [0xfffffffdd0000000, >> 0xfffffffe0e800000, 0xfffffffe0e800000) >> ? eden space 819200K, 100% used [0xfffffffdd0000000, 0xfffffffe02000000, >> 0xfffffffe02000000) >> ? from space 102400K, 100% used [0xfffffffe02000000, 0xfffffffe08400000, >> 0xfffffffe08400000) >> ? to?? space 102400K,?? 0% used [0xfffffffe08400000, 0xfffffffe08400000, >> 0xfffffffe0e800000) >> ?concurrent mark-sweep generation total 5136384K, used 3514703K >> [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) >> ?concurrent-mark-sweep perm gen total 524288K, used 243613K >> [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) >> 1761061.144: [GC 1761061.145: [ParNew: 921600K->102400K(921600K), >> 0.4124278 secs] 4436303K->3717736K(6057984K), 0.4128777 secs] [Times: >> user=1.98 sys=0.19, real=0.41 secs] >> Heap after GC invocations=23859 (full 1463): >> ?par new generation?? total 921600K, used 102400K [0xfffffffdd0000000, >> 0xfffffffe0e800000, 0xfffffffe0e800000) >> ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, >> 0xfffffffe02000000) >> ? from space 102400K, 100% used [0xfffffffe08400000, 0xfffffffe0e800000, >> 0xfffffffe0e800000) >> ? to?? space 102400K,?? 
0% used [0xfffffffe02000000, 0xfffffffe02000000, >> 0xfffffffe08400000) >> ?concurrent mark-sweep generation total 5136384K, used 3615336K >> [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) >> ?concurrent-mark-sweep perm gen total 524288K, used 243613K >> [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) >> } >> 1761061.562: [GC [1 CMS-initial-mark: 3615336K(5136384K)] >> 3718604K(6057984K), 0.0660086 secs] [Times: user=0.07 sys=0.00, real=0.07 >> secs] >> 1761061.629: [CMS-concurrent-mark-start] >> 1761065.589: [CMS-concurrent-mark: 3.920/3.960 secs] [Times: user=26.20 >> sys=1.05, real=3.96 secs] >> 1761065.590: [CMS-concurrent-preclean-start] >> 1761065.883: [CMS-concurrent-preclean: 0.290/0.293 secs] [Times: user=0.54 >> sys=0.02, real=0.29 secs] >> 1761065.883: [CMS-concurrent-abortable-preclean-start] >> ?CMS: abort preclean due to time 1761070.950: >> [CMS-concurrent-abortable-preclean: 5.035/5.067 secs] [Times: user=10.70 >> sys=0.36, real=5.07 secs] >> 1761070.958: [GC[YG occupancy: 656197 K (921600 K)]1761070.959: [Rescan >> (parallel) , 0.5056315 secs]1761071.465: [weak refs processing, 0.0107058 >> secs]1761071.476: [class unloading, 0.1500832 secs]1761071.626: [scrub >> symbol & string tables, 0.1278517 secs] [1 CMS-remark: 3615336K(5136384K)] >> 4271533K(6057984K), 0.8857121 secs] [Times: user=1.77 sys=1.08, real=0.89 >> secs] >> 1761071.845: [CMS-concurrent-sweep-start] >> 1761075.956: [CMS-concurrent-sweep: 4.094/4.111 secs] [Times: user=7.97 >> sys=0.27, real=4.11 secs] >> 1761075.957: [CMS-concurrent-reset-start] >> 1761076.031: [CMS-concurrent-reset: 0.063/0.074 secs] [Times: user=0.13 >> sys=0.01, real=0.07 secs] >> {Heap before GC invocations=23859 (full 1464): >> ?par new generation?? total 921600K, used 921600K [0xfffffffdd0000000, >> 0xfffffffe0e800000, 0xfffffffe0e800000) >> ? eden space 819200K, 100% used [0xfffffffdd0000000, 0xfffffffe02000000, >> 0xfffffffe02000000) >> ? from space 102400K, 100% used [0xfffffffe08400000, 0xfffffffe0e800000, >> 0xfffffffe0e800000) >> ? to?? space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, >> 0xfffffffe08400000) >> ?concurrent mark-sweep generation total 5136384K, used 3544377K >> [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) >> ?concurrent-mark-sweep perm gen total 524288K, used 243474K >> [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) >> 1761077.540: [GC 1761077.540: [ParNew: 921600K->102400K(921600K), >> 0.4030394 secs] 4465977K->3740399K(6057984K), 0.4034742 secs] [Times: >> user=3.14 sys=0.55, real=0.40 secs] >> Heap after GC invocations=23860 (full 1464): >> ?par new generation?? total 921600K, used 102400K [0xfffffffdd0000000, >> 0xfffffffe0e800000, 0xfffffffe0e800000) >> ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, >> 0xfffffffe02000000) >> ? from space 102400K, 100% used [0xfffffffe02000000, 0xfffffffe08400000, >> 0xfffffffe08400000) >> ? to?? space 102400K,?? 
0% used [0xfffffffe08400000, 0xfffffffe08400000, >> 0xfffffffe0e800000) >> ?concurrent mark-sweep generation total 5136384K, used 3637999K >> [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) >> ?concurrent-mark-sweep perm gen total 524288K, used 243474K >> [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) >> } >> 1761077.948: [GC [1 CMS-initial-mark: 3637999K(5136384K)] >> 3740403K(6057984K), 0.0664402 secs] [Times: user=0.06 sys=0.00, real=0.07 >> secs] >> 1761078.015: [CMS-concurrent-mark-start] >> 1761082.141: [CMS-concurrent-mark: 4.076/4.126 secs] [Times: user=25.56 >> sys=1.24, real=4.13 secs] >> 1761082.142: [CMS-concurrent-preclean-start] >> 1761082.435: [CMS-concurrent-preclean: 0.290/0.293 secs] [Times: user=0.56 >> sys=0.03, real=0.29 secs] >> 1761082.435: [CMS-concurrent-abortable-preclean-start] >> ?CMS: abort preclean due to time 1761087.544: >> [CMS-concurrent-abortable-preclean: 4.166/5.108 secs] [Times: user=6.79 >> sys=0.38, real=5.11 secs] >> 1761087.554: [GC[YG occupancy: 612230 K (921600 K)]1761087.555: [Rescan >> (parallel) , 0.3453344 secs]1761087.900: [weak refs processing, 0.0033384 >> secs]1761087.904: [class unloading, 0.1515234 secs]1761088.055: [scrub >> symbol & string tables, 0.1280533 secs] [1 CMS-remark: 3637999K(5136384K)] >> 4250230K(6057984K), 0.7189376 secs] [Times: user=1.29 sys=0.76, real=0.72 >> secs] >> 1761088.274: [CMS-concurrent-sweep-start] >> 1761092.543: [CMS-concurrent-sweep: 4.268/4.268 secs] [Times: user=6.72 >> sys=0.26, real=4.27 secs] >> 1761092.543: [CMS-concurrent-reset-start] >> 1761092.606: [CMS-concurrent-reset: 0.063/0.063 secs] [Times: user=0.11 >> sys=0.01, real=0.06 secs] >> {Heap before GC invocations=23860 (full 1465): >> ?par new generation?? total 921600K, used 921600K [0xfffffffdd0000000, >> 0xfffffffe0e800000, 0xfffffffe0e800000) >> ? eden space 819200K, 100% used [0xfffffffdd0000000, 0xfffffffe02000000, >> 0xfffffffe02000000) >> ? from space 102400K, 100% used [0xfffffffe02000000, 0xfffffffe08400000, >> 0xfffffffe08400000) >> ? to?? space 102400K,?? 0% used [0xfffffffe08400000, 0xfffffffe08400000, >> 0xfffffffe0e800000) >> ?concurrent mark-sweep generation total 5136384K, used 3582457K >> [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) >> ?concurrent-mark-sweep perm gen total 524288K, used 243634K >> [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) >> 1761096.682: [GC 1761096.682: [ParNew: 921600K->102400K(921600K), >> 0.2843209 secs] 4504057K->3786219K(6057984K), 0.2847419 secs] [Times: >> user=1.81 sys=0.10, real=0.29 secs] >> Heap after GC invocations=23861 (full 1465): >> ?par new generation?? total 921600K, used 102400K [0xfffffffdd0000000, >> 0xfffffffe0e800000, 0xfffffffe0e800000) >> ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, >> 0xfffffffe02000000) >> ? from space 102400K, 100% used [0xfffffffe08400000, 0xfffffffe0e800000, >> 0xfffffffe0e800000) >> ? to?? space 102400K,?? 
0% used [0xfffffffe02000000, 0xfffffffe02000000, >> 0xfffffffe08400000) >> ?concurrent mark-sweep generation total 5136384K, used 3683819K >> [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) >> ?concurrent-mark-sweep perm gen total 524288K, used 243634K >> [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) >> } >> 1761096.971: [GC [1 CMS-initial-mark: 3683819K(5136384K)] >> 3786817K(6057984K), 0.0480239 secs] [Times: user=0.05 sys=0.00, real=0.05 >> secs] >> 1761097.020: [CMS-concurrent-mark-start] >> 1761101.145: [CMS-concurrent-mark: 4.104/4.124 secs] [Times: user=24.60 >> sys=1.09, real=4.13 secs] >> 1761101.145: [CMS-concurrent-preclean-start] >> 1761101.438: [CMS-concurrent-preclean: 0.290/0.292 secs] [Times: user=0.41 >> sys=0.02, real=0.29 secs] >> 1761101.438: [CMS-concurrent-abortable-preclean-start] >> ?CMS: abort preclean due to time 1761106.478: >> [CMS-concurrent-abortable-preclean: 4.694/5.040 secs] [Times: user=7.32 >> sys=0.23, real=5.04 secs] >> 1761106.486: [GC[YG occupancy: 497634 K (921600 K)]1761106.487: [Rescan >> (parallel) , 0.3384965 secs]1761106.825: [weak refs processing, 0.0030734 >> secs]1761106.829: [class unloading, 0.1503426 secs]1761106.979: [scrub >> symbol & string tables, 0.1273150 secs] [1 CMS-remark: 3683819K(5136384K)] >> 4181454K(6057984K), 0.7055549 secs] [Times: user=1.25 sys=0.64, real=0.71 >> secs] >> 1761107.193: [CMS-concurrent-sweep-start] >> 1761111.281: [CMS-concurrent-sweep: 4.088/4.088 secs] [Times: user=5.81 >> sys=0.15, real=4.09 secs] >> 1761111.282: [CMS-concurrent-reset-start] >> 1761111.349: [CMS-concurrent-reset: 0.068/0.068 secs] [Times: user=0.08 >> sys=0.00, real=0.07 secs] >> 1761111.961: [GC [1 CMS-initial-mark: 3633902K(5136384K)] >> 4261007K(6057984K), 0.5015835 secs] [Times: user=0.50 sys=0.00, real=0.50 >> secs] >> 1761112.463: [CMS-concurrent-mark-start] >> 1761116.550: [CMS-concurrent-mark: 4.036/4.087 secs] [Times: user=24.85 >> sys=1.09, real=4.09 secs] >> 1761116.551: [CMS-concurrent-preclean-start] >> 1761116.901: [CMS-concurrent-preclean: 0.344/0.350 secs] [Times: user=0.54 >> sys=0.01, real=0.35 secs] >> 1761116.901: [CMS-concurrent-abortable-preclean-start] >> {Heap before GC invocations=23861 (full 1467): >> ?par new generation?? total 921600K, used 921600K [0xfffffffdd0000000, >> 0xfffffffe0e800000, 0xfffffffe0e800000) >> ? eden space 819200K, 100% used [0xfffffffdd0000000, 0xfffffffe02000000, >> 0xfffffffe02000000) >> ? from space 102400K, 100% used [0xfffffffe08400000, 0xfffffffe0e800000, >> 0xfffffffe0e800000) >> ? to?? space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, >> 0xfffffffe08400000) >> ?concurrent mark-sweep generation total 5136384K, used 3633902K >> [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) >> ?concurrent-mark-sweep perm gen total 524288K, used 243740K >> [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) >> 1761120.554: [GC 1761120.554: [ParNew: 921600K->102400K(921600K), >> 0.4726199 secs] 4555502K->3819626K(6057984K), 0.4732486 secs] [Times: >> user=3.31 sys=0.69, real=0.47 secs] >> Heap after GC invocations=23862 (full 1467): >> ?par new generation?? total 921600K, used 102400K [0xfffffffdd0000000, >> 0xfffffffe0e800000, 0xfffffffe0e800000) >> ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, >> 0xfffffffe02000000) >> ? from space 102400K, 100% used [0xfffffffe02000000, 0xfffffffe08400000, >> 0xfffffffe08400000) >> ? to?? space 102400K,?? 
0% used [0xfffffffe08400000, 0xfffffffe08400000, >> 0xfffffffe0e800000) >> ?concurrent mark-sweep generation total 5136384K, used 3717226K >> [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) >> ?concurrent-mark-sweep perm gen total 524288K, used 243740K >> [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) >> } >> ?CMS: abort preclean due to time 1761122.392: >> [CMS-concurrent-abortable-preclean: 4.729/5.490 secs] [Times: user=11.71 >> sys=0.97, real=5.49 secs] >> 1761122.401: [GC[YG occupancy: 177317 K (921600 K)]1761122.401: [Rescan >> (parallel) , 0.0250334 secs]1761122.427: [weak refs processing, 0.0002699 >> secs]1761122.427: [class unloading, 0.0817179 secs]1761122.509: [scrub >> symbol & string tables, 0.1383120 secs] [1 CMS-remark: 3717226K(5136384K)] >> 3894544K(6057984K), 0.3327016 secs] [Times: user=0.55 sys=0.04, real=0.33 >> secs] >> 1761122.735: [CMS-concurrent-sweep-start] >> 1761126.843: [CMS-concurrent-sweep: 4.042/4.108 secs] [Times: user=6.70 >> sys=0.39, real=4.11 secs] >> 1761126.844: [CMS-concurrent-reset-start] >> 1761126.907: [CMS-concurrent-reset: 0.063/0.063 secs] [Times: user=0.11 >> sys=0.00, real=0.06 secs] >> 1761127.142: [GC [1 CMS-initial-mark: 3701154K(5136384K)] >> 4056638K(6057984K), 0.2853309 secs] [Times: user=0.29 sys=0.00, real=0.29 >> secs] >> 1761127.428: [CMS-concurrent-mark-start] >> 1761131.876: [CMS-concurrent-mark: 4.398/4.448 secs] [Times: user=28.46 >> sys=1.55, real=4.45 secs] >> 1761131.877: [CMS-concurrent-preclean-start] >> 1761132.185: [CMS-concurrent-preclean: 0.305/0.308 secs] [Times: user=0.60 >> sys=0.05, real=0.31 secs] >> 1761132.186: [CMS-concurrent-abortable-preclean-start] >> ?CMS: abort preclean due to time 1761137.243: >> [CMS-concurrent-abortable-preclean: 5.029/5.058 secs] [Times: user=9.88 >> sys=0.42, real=5.06 secs] >> 1761137.248: [GC[YG occupancy: 783876 K (921600 K)]1761137.248: [Rescan >> (parallel) , 0.5402015 secs]1761137.789: [weak refs processing, 0.0022809 >> secs]1761137.791: [class unloading, 0.1556933 secs]1761137.947: [scrub >> symbol & string tables, 0.1291759 secs] [1 CMS-remark: 3701154K(5136384K)] >> 4485030K(6057984K), 0.9154842 secs] [Times: user=1.67 sys=0.97, real=0.92 >> secs] >> 1761138.164: [CMS-concurrent-sweep-start] >> {Heap before GC invocations=23862 (full 1468): >> ?par new generation?? total 921600K, used 920346K [0xfffffffdd0000000, >> 0xfffffffe0e800000, 0xfffffffe0e800000) >> ? eden space 819200K,? 99% used [0xfffffffdd0000000, 0xfffffffe01ec6a48, >> 0xfffffffe02000000) >> ? from space 102400K, 100% used [0xfffffffe02000000, 0xfffffffe08400000, >> 0xfffffffe08400000) >> ? to?? space 102400K,?? 0% used [0xfffffffe08400000, 0xfffffffe08400000, >> 0xfffffffe0e800000) >> ?concurrent mark-sweep generation total 5136384K, used 3694838K >> [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) >> ?concurrent-mark-sweep perm gen total 524288K, used 243810K >> [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) >> 1761140.028: [GC 1761140.028: [ParNew: 920346K->102400K(921600K), >> 0.4882607 secs] 4615185K->4017461K(6057984K), 0.4886748 secs] [Times: >> user=2.71 sys=0.20, real=0.49 secs] >> Heap after GC invocations=23863 (full 1468): >> ?par new generation?? total 921600K, used 102400K [0xfffffffdd0000000, >> 0xfffffffe0e800000, 0xfffffffe0e800000) >> ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, >> 0xfffffffe02000000) >> ? from space 102400K, 100% used [0xfffffffe08400000, 0xfffffffe0e800000, >> 0xfffffffe0e800000) >> ? to?? 
space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, >> 0xfffffffe08400000) >> ?concurrent mark-sweep generation total 5136384K, used 3915061K >> [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) >> ?concurrent-mark-sweep perm gen total 524288K, used 243810K >> [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) >> } >> 1761142.726: [CMS-concurrent-sweep: 4.011/4.562 secs] [Times: user=12.54 >> sys=0.74, real=4.56 secs] >> 1761142.727: [CMS-concurrent-reset-start] >> 1761142.791: [CMS-concurrent-reset: 0.064/0.064 secs] [Times: user=0.16 >> sys=0.01, real=0.06 secs] >> 1761143.233: [GC [1 CMS-initial-mark: 3852859K(5136384K)] >> 4152461K(6057984K), 0.2340877 secs] [Times: user=0.23 sys=0.00, real=0.23 >> secs] >> 1761143.467: [CMS-concurrent-mark-start] >> 1761147.673: [CMS-concurrent-mark: 4.182/4.205 secs] [Times: user=26.19 >> sys=1.27, real=4.21 secs] >> 1761147.673: [CMS-concurrent-preclean-start] >> 1761147.978: [CMS-concurrent-preclean: 0.300/0.304 secs] [Times: user=0.44 >> sys=0.02, real=0.30 secs] >> 1761147.978: [CMS-concurrent-abortable-preclean-start] >> {Heap before GC invocations=23863 (full 1469): >> ?par new generation?? total 921600K, used 602663K [0xfffffffdd0000000, >> 0xfffffffe0e800000, 0xfffffffe0e800000) >> ? eden space 819200K,? 61% used [0xfffffffdd0000000, 0xfffffffdee889c90, >> 0xfffffffe02000000) >> ? from space 102400K, 100% used [0xfffffffe08400000, 0xfffffffe0e800000, >> 0xfffffffe0e800000) >> ? to?? space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, >> 0xfffffffe08400000) >> ?concurrent mark-sweep generation total 5136384K, used 3852859K >> [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) >> ?concurrent-mark-sweep perm gen total 524288K, used 243656K >> [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) >> 1761152.969: [GC 1761152.969: [ParNew: 602663K->102400K(921600K), >> 0.4710031 secs]1761153.440: [CMS CMS: abort preclean due to time >> 1761155.705: [CMS-concurrent-abortable-preclean: 6.957/7.726 secs] [Times: >> user=13.02 sys=0.48, real=7.73 secs] >> ?(concurrent mode failure): 4005428K->3905404K(5136384K), 32.6670849 secs] >> 4455522K->3905404K(6057984K), [CMS Perm : 243656K->243327K(524288K)], >> 33.1389061 secs] [Times: user=35.38 sys=0.26, real=33.14 secs] >> Heap after GC invocations=23864 (full 1470): >> ?par new generation?? total 921600K, used 0K [0xfffffffdd0000000, >> 0xfffffffe0e800000, 0xfffffffe0e800000) >> ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, >> 0xfffffffe02000000) >> ? from space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, >> 0xfffffffe08400000) >> ? to?? space 102400K,?? 
0% used [0xfffffffe08400000, 0xfffffffe08400000, >> 0xfffffffe0e800000) >> ?concurrent mark-sweep generation total 5136384K, used 3905404K >> [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) >> ?concurrent-mark-sweep perm gen total 524288K, used 243327K >> [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) >> } >> 1761187.694: [GC [1 CMS-initial-mark: 4953978K(5136384K)] >> 4965714K(6057984K), 0.0131890 secs] [Times: user=0.01 sys=0.00, real=0.01 >> secs] >> 1761187.708: [CMS-concurrent-mark-start] >> 1761191.965: [CMS-concurrent-mark: 3.634/4.257 secs] [Times: user=32.76 >> sys=1.91, real=4.26 secs] >> 1761191.966: [CMS-concurrent-preclean-start] >> 1761192.543: [CMS-concurrent-preclean: 0.553/0.577 secs] [Times: user=1.56 >> sys=0.12, real=0.58 secs] >> 1761192.544: [CMS-concurrent-abortable-preclean-start] >> ?CMS: abort preclean due to time 1761197.612: >> [CMS-concurrent-abortable-preclean: 2.079/5.068 secs] [Times: user=9.11 >> sys=0.60, real=5.07 secs] >> 1761197.617: [GC[YG occupancy: 813510 K (921600 K)]1761197.618: [Rescan >> (parallel) , 0.7500635 secs]1761198.368: [weak refs processing, 0.0020064 >> secs]1761198.370: [class unloading, 0.0823783 secs]1761198.453: [scrub >> symbol & string tables, 0.1278387 secs] [1 CMS-remark: 4953978K(5136384K)] >> 5767489K(6057984K), 1.0496971 secs] [Times: user=2.69 sys=1.79, real=1.05 >> secs] >> 1761198.668: [CMS-concurrent-sweep-start] >> {Heap before GC invocations=23864 (full 1471): >> ?par new generation?? total 921600K, used 819200K [0xfffffffdd0000000, >> 0xfffffffe0e800000, 0xfffffffe0e800000) >> ? eden space 819200K, 100% used [0xfffffffdd0000000, 0xfffffffe02000000, >> 0xfffffffe02000000) >> ? from space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, >> 0xfffffffe08400000) >> ? to?? space 102400K,?? 0% used [0xfffffffe08400000, 0xfffffffe08400000, >> 0xfffffffe0e800000) >> ?concurrent mark-sweep generation total 5136384K, used 4953976K >> [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) >> ?concurrent-mark-sweep perm gen total 524288K, used 243422K >> [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) >> 1761198.700: [GC 1761198.700: [ParNew: 819200K->819200K(921600K), >> 0.0000919 secs]1761198.700: [CMS1761202.072: [CMS-concurrent-sweep: >> 3.389/3.404 secs] [Times: user=3.60 sys=0.04, real=3.40 secs] >> ?(concurrent mode failure): 4953976K->3789438K(5136384K), 32.6623615 secs] >> 5773176K->3789438K(6057984K), [CMS Perm : 243422K->243328K(524288K)], >> 32.6632802 secs] [Times: user=32.58 sys=0.03, real=32.66 secs] >> Heap after GC invocations=23865 (full 1472): >> ?par new generation?? total 921600K, used 0K [0xfffffffdd0000000, >> 0xfffffffe0e800000, 0xfffffffe0e800000) >> ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, >> 0xfffffffe02000000) >> ? from space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, >> 0xfffffffe08400000) >> ? to?? space 102400K,?? 
0% used [0xfffffffe08400000, 0xfffffffe08400000, >> 0xfffffffe0e800000) >> ?concurrent mark-sweep generation total 5136384K, used 3789438K >> [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) >> ?concurrent-mark-sweep perm gen total 524288K, used 243328K >> [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) >> } >> 1761231.418: [GC [1 CMS-initial-mark: 3789438K(5136384K)] >> 3883471K(6057984K), 0.0609784 secs] [Times: user=0.06 sys=0.01, real=0.06 >> secs] >> 1761231.480: [CMS-concurrent-mark-start] >> 1761236.061: [CMS-concurrent-mark: 3.752/4.580 secs] [Times: user=34.48 >> sys=2.81, real=4.58 secs] >> 1761236.061: [CMS-concurrent-preclean-start] >> 1761236.428: [CMS-concurrent-preclean: 0.358/0.367 secs] [Times: user=0.46 >> sys=0.01, real=0.37 secs] >> 1761236.429: [CMS-concurrent-abortable-preclean-start] >> ?CMS: abort preclean due to time 1761241.488: >> [CMS-concurrent-abortable-preclean: 2.384/5.059 secs] [Times: user=5.30 >> sys=0.75, real=5.06 secs] >> 1761241.497: [GC[YG occupancy: 787969 K (921600 K)]1761241.497: [Rescan >> (parallel) , 0.5938799 secs]1761242.091: [weak refs processing, 0.0067469 >> secs]1761242.098: [class unloading, 0.0826078 secs]1761242.181: [scrub >> symbol & string tables, 0.1308434 secs] [1 CMS-remark: 3789438K(5136384K)] >> 4577408K(6057984K), 0.9017583 secs] [Times: user=2.66 sys=2.07, real=0.90 >> secs] >> 1761242.400: [CMS-concurrent-sweep-start] >> {Heap before GC invocations=23865 (full 1473): >> ?par new generation?? total 921600K, used 819200K [0xfffffffdd0000000, >> 0xfffffffe0e800000, 0xfffffffe0e800000) >> ? eden space 819200K, 100% used [0xfffffffdd0000000, 0xfffffffe02000000, >> 0xfffffffe02000000) >> ? from space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, >> 0xfffffffe08400000) >> ? to?? space 102400K,?? 0% used [0xfffffffe08400000, 0xfffffffe08400000, >> 0xfffffffe0e800000) >> ?concurrent mark-sweep generation total 5136384K, used 3789391K >> [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) >> ?concurrent-mark-sweep perm gen total 524288K, used 243406K >> [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) >> 1761242.586: [GC 1761242.586: [ParNew: 819200K->102400K(921600K), >> 0.1871926 secs] 4608591K->3940305K(6057984K), 0.1879045 secs] [Times: >> user=0.93 sys=0.05, real=0.19 secs] >> Heap after GC invocations=23866 (full 1473): >> ?par new generation?? total 921600K, used 102400K [0xfffffffdd0000000, >> 0xfffffffe0e800000, 0xfffffffe0e800000) >> ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, >> 0xfffffffe02000000) >> ? from space 102400K, 100% used [0xfffffffe08400000, 0xfffffffe0e800000, >> 0xfffffffe0e800000) >> ? to?? space 102400K,?? 
0% used [0xfffffffe02000000, 0xfffffffe02000000, >> 0xfffffffe08400000) >> ?concurrent mark-sweep generation total 5136384K, used 3837905K >> [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) >> ?concurrent-mark-sweep perm gen total 524288K, used 243406K >> [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) >> } >> 1761245.857: [CMS-concurrent-sweep: 3.225/3.457 secs] [Times: user=6.21 >> sys=0.52, real=3.46 secs] >> 1761245.858: [CMS-concurrent-reset-start] >> 1761245.922: [CMS-concurrent-reset: 0.064/0.064 secs] [Times: user=0.08 >> sys=0.01, real=0.06 secs] >> 1761247.301: [GC [1 CMS-initial-mark: 3676150K(5136384K)] >> 3952072K(6057984K), 0.2229528 secs] [Times: user=0.22 sys=0.00, real=0.22 >> secs] >> 1761247.525: [CMS-concurrent-mark-start] >> 1761251.076: [CMS-concurrent-mark: 3.510/3.551 secs] [Times: user=23.68 >> sys=0.85, real=3.55 secs] >> 1761251.076: [CMS-concurrent-preclean-start] >> 1761251.375: [CMS-concurrent-preclean: 0.295/0.298 secs] [Times: user=0.72 >> sys=0.04, real=0.30 secs] >> 1761251.375: [CMS-concurrent-abortable-preclean-start] >> ?CMS: abort preclean due to time 1761256.460: >> [CMS-concurrent-abortable-preclean: 5.012/5.085 secs] [Times: user=9.93 >> sys=0.99, real=5.09 secs] >> 1761256.469: [GC[YG occupancy: 720909 K (921600 K)]1761256.469: [Rescan >> (parallel) , 0.4663462 secs]1761256.936: [weak refs processing, 0.0153453 >> secs]1761256.951: [class unloading, 0.0833874 secs]1761257.035: [scrub >> symbol & string tables, 0.1289153 secs] [1 CMS-remark: 3676150K(5136384K)] >> 4397060K(6057984K), 0.7879219 secs] [Times: user=1.55 sys=0.96, real=0.79 >> secs] >> 1761257.258: [CMS-concurrent-sweep-start] >> {Heap before GC invocations=23866 (full 1474): >> ?par new generation?? total 921600K, used 921600K [0xfffffffdd0000000, >> 0xfffffffe0e800000, 0xfffffffe0e800000) >> ? eden space 819200K, 100% used [0xfffffffdd0000000, 0xfffffffe02000000, >> 0xfffffffe02000000) >> ? from space 102400K, 100% used [0xfffffffe08400000, 0xfffffffe0e800000, >> 0xfffffffe0e800000) >> ? to?? space 102400K,?? 0% used [0xfffffffe02000000, 0xfffffffe02000000, >> 0xfffffffe08400000) >> ?concurrent mark-sweep generation total 5136384K, used 3669509K >> [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) >> ?concurrent-mark-sweep perm gen total 524288K, used 243414K >> [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) >> 1761259.137: [GC 1761259.138: [ParNew: 921600K->102400K(921600K), >> 0.3975686 secs] 4591109K->3894075K(6057984K), 0.3981608 secs] [Times: >> user=1.65 sys=0.15, real=0.40 secs] >> Heap after GC invocations=23867 (full 1474): >> ?par new generation?? total 921600K, used 102400K [0xfffffffdd0000000, >> 0xfffffffe0e800000, 0xfffffffe0e800000) >> ? eden space 819200K,?? 0% used [0xfffffffdd0000000, 0xfffffffdd0000000, >> 0xfffffffe02000000) >> ? from space 102400K, 100% used [0xfffffffe02000000, 0xfffffffe08400000, >> 0xfffffffe08400000) >> ? to?? space 102400K,?? 0% used [0xfffffffe08400000, 0xfffffffe08400000, >> 0xfffffffe0e800000) >> ?concurrent mark-sweep generation total 5136384K, used 3791675K >> [0xfffffffe0e800000, 0xffffffff48000000, 0xffffffff48000000) >> ?concurrent-mark-sweep perm gen total 524288K, used 243414K >> [0xffffffff48000000, 0xffffffff68000000, 0xffffffff73800000) >> } >> -------------- next part -------------- >> An HTML attachment was scrubbed... 
>> URL:
>> http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120412/16cc56a7/attachment.html
>> -------------- next part --------------
>> An embedded and charset-unspecified text was scrubbed...
>> Name: CMS Failure.txt
>> Url:
>> http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120412/16cc56a7/CMSFailure.txt
>> -------------- next part --------------
>> An embedded and charset-unspecified text was scrubbed...
>> Name: PARNEW Failure.txt
>> Url:
>> http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120412/16cc56a7/PARNEWFailure.txt
>
>

From taras.tielkes at gmail.com Sun Apr 15 05:34:07 2012
From: taras.tielkes at gmail.com (Taras Tielkes)
Date: Sun, 15 Apr 2012 14:34:07 +0200
Subject: Promotion failures: indication of CMS fragmentation?
In-Reply-To: 
References: <4EF9FCAC.3030208@oracle.com> <4F06A270.3010701@oracle.com> <4F0DBEC4.7040907@oracle.com> <4F1ECE7B.3040502@oracle.com> <4F1F2ED7.6060308@oracle.com> <4F20F78D.9070905@oracle.com>
Message-ID: 

Hi Chi, Srinivas,

Optimizing the cost of ParNew (by lowering MTT) would be nice, but for now
my priority is still to minimize the promotion failures.

For example, on the machine running CMS with the "larger" young gen and
survivor spaces (-Xmx5g -Xmn800m -XX:SurvivorRatio=4), I've just seen a
promotion failure again. Below is a snippet of gc.log showing this.
To put this into perspective, this is the first promotion failure on that
machine in a couple of weeks. Still, zero failures would beat a single
failure, since the clients connecting to this application will only wait a
few seconds before timing out and terminating the connection. In addition,
the promotion failures are occurring at peak usage moments.

Apart from trying to eliminate the promotion failure pauses, my main goal
is to learn how to understand the root cause in a case like this. Any
suggestions for things to try or read up on are appreciated.
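
One thing I'm planning to try next, to get a more direct view of old gen
fragmentation, is the CMS free list statistics. Below is a minimal sketch of
the command line I have in mind -- assuming -XX:PrintFLSStatistics=1 is
available in the JDK 6 build we run; the heap sizing flags are just our
current values, the logging flags mirror what already shows up in gc.log,
and "MyApp" stands in for the real entry point:

    java -Xmx5g -Xmn800m -XX:SurvivorRatio=4 \
         -XX:+UseConcMarkSweepGC \
         -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
         -XX:+PrintTenuringDistribution \
         -XX:PrintFLSStatistics=1 \
         -Xloggc:gc.log MyApp

If I understand that option correctly, it makes CMS print its free list
statistics (total free space, max chunk size, number of blocks) before and
after collections. If the reported max chunk size trends down towards the
small promotion failure sizes shown in the log above while total free space
stays large, that would point at old gen fragmentation rather than the old
gen simply being full.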
Kind regards, Taras ------------------------------------------------ 2012-04-13T17:44:27.777+0200: 3296255.045: [GC 3296255.046: [ParNew Desired survivor size 69894144 bytes, new threshold 15 (max 15) - age 1: 3684448 bytes, 3684448 total - age 2: 824984 bytes, 4509432 total - age 3: 885120 bytes, 5394552 total - age 4: 756568 bytes, 6151120 total - age 5: 696880 bytes, 6848000 total - age 6: 890688 bytes, 7738688 total - age 7: 2631184 bytes, 10369872 total - age 8: 719976 bytes, 11089848 total - age 9: 724944 bytes, 11814792 total - age 10: 750360 bytes, 12565152 total - age 11: 934944 bytes, 13500096 total - age 12: 521080 bytes, 14021176 total - age 13: 543392 bytes, 14564568 total - age 14: 906616 bytes, 15471184 total - age 15: 504008 bytes, 15975192 total : 568932K->22625K(682688K), 0.0410180 secs] 3077079K->2531413K(5106368K), 0.0416940 secs] [Times: user=0.30 sys=0.01, real=0.05 secs] 2012-04-13T17:44:33.893+0200: 3296261.162: [GC 3296261.162: [ParNew Desired survivor size 69894144 bytes, new threshold 15 (max 15) - age 1: 2975896 bytes, 2975896 total - age 2: 742592 bytes, 3718488 total - age 3: 812864 bytes, 4531352 total - age 4: 873488 bytes, 5404840 total - age 5: 746128 bytes, 6150968 total - age 6: 685192 bytes, 6836160 total - age 7: 888376 bytes, 7724536 total - age 8: 2621688 bytes, 10346224 total - age 9: 715608 bytes, 11061832 total - age 10: 723336 bytes, 11785168 total - age 11: 749856 bytes, 12535024 total - age 12: 914632 bytes, 13449656 total - age 13: 520944 bytes, 13970600 total - age 14: 543224 bytes, 14513824 total - age 15: 906040 bytes, 15419864 total : 568801K->22726K(682688K), 0.0447800 secs] 3077589K->2532008K(5106368K), 0.0454710 secs] [Times: user=0.33 sys=0.00, real=0.05 secs] 2012-04-13T17:44:40.231+0200: 3296267.499: [GC 3296267.500: [ParNew (1: promotion failure size = 16) (2: promotion failure size = 56) (4: promotion failure size = 342) (5: promotion failure size = 1026) (6: promotion failure size = 278) (promotion failed) Desired survivor size 69894144 bytes, new threshold 15 (max 15) - age 1: 2436840 bytes, 2436840 total - age 2: 1625136 bytes, 4061976 total - age 3: 691664 bytes, 4753640 total - age 4: 799992 bytes, 5553632 total - age 5: 858344 bytes, 6411976 total - age 6: 730200 bytes, 7142176 total - age 7: 680072 bytes, 7822248 total - age 8: 885960 bytes, 8708208 total - age 9: 2618544 bytes, 11326752 total - age 10: 709168 bytes, 12035920 total - age 11: 714576 bytes, 12750496 total - age 12: 734976 bytes, 13485472 total - age 13: 905048 bytes, 14390520 total - age 14: 520320 bytes, 14910840 total - age 15: 543056 bytes, 15453896 total : 568902K->568678K(682688K), 0.3130580 secs]3296267.813: [CMS: 2510091K->573489K(4423680K), 7.7481330 secs] 3078184K->573489K(5106368K), [CMS Perm : 144002K-> 143970K(262144K)], 8.0619690 secs] [Times: user=8.35 sys=0.01, real=8.06 secs] 2012-04-13T17:44:51.337+0200: 3296278.606: [GC 3296278.606: [ParNew Desired survivor size 69894144 bytes, new threshold 15 (max 15) - age 1: 33717528 bytes, 33717528 total : 546176K->43054K(682688K), 0.0515990 secs] 1119665K->616543K(5106368K), 0.0523550 secs] [Times: user=0.34 sys=0.00, real=0.05 secs] ------------------------------------------------ On Tue, Mar 20, 2012 at 10:12 PM, Srinivas Ramakrishna wrote: > As Chi-ho noted, about 3-4 MB of data does get promoted per scavenge, > after having > sloshed around in your survivor spaces some 15 times. 
I'd venture that > whatever winnowing > of young objects was to ocur has in fact occured already within the > first 3-4 scavenges that > an object has survived, after which the drop-off in population is less > sharp. So I'd suggest > lowering the MTT to about 3, while leaving the survivor ratio intact. > That should reduce your > copying costs and bring down your scavenge pauses further, while not > adversely affecting > your promotion rates (and concomitantly the fragmentation). > > One thing that was a bit puzzling about the stats below was that you'd > expect the volume > of generation X in scavenge N to be no less than the volume of > generation X+1 in scavenge N+1, > but occasionally that natural invariant does not appear to hold, which > is quite puzzling -- > indicating perhaps that either ages or populations are not being > correctly tracked. > > I don't know if anyone else has noticed that in their tenuring > distributions as well.... > > -- ramki > > On Tue, Mar 20, 2012 at 9:36 AM, Taras Tielkes wrote: >> Hi, >> >> I've collected -XX:+PrintTenuringDistribution data from a node in our >> production environment, running -Xmx5g -Xmn400m -XX:SurvivorRatio=8. >> On one other production node, we've configured a larger new gen, and >> larger survivor spaces (-Xmx5g -Xmn800m -XX:SurvivorRatio=4). >> This node has -XX:+PrintTenuringDistribution logging as well. >> >> The node running the larger new gen and survivor spaces has not run >> into a promotion failure yet, while the ones still running the old >> config have hit a few. >> The promotion failures are typically experienced at high load periods, >> which makes sense, as allocation and promotion will experience a spike >> in those periods as well. >> >> The inherent nature of the application implies relatively long >> sessions (towards a few hours), retaining a fair amout of state up to >> an hour. >> I believe this is the main reason of the relatively high promotion >> rate we're experiencing. >> >> >> Here's a fragment of gc log from one of the nodes running the older >> (smaller) new gen, including a promotion failure: >> ------------------------- >> 2012-03-15T18:32:17.785+0100: 796604.225: [GC 796604.225: [ParNew >> Desired survivor size 20971520 bytes, new threshold 8 (max 15) >> - age ? 1: ? ?2927728 bytes, ? ?2927728 total >> - age ? 2: ? ?2428512 bytes, ? ?5356240 total >> - age ? 3: ? ?2696376 bytes, ? ?8052616 total >> - age ? 4: ? ?2623576 bytes, ? 10676192 total >> - age ? 5: ? ?3365576 bytes, ? 14041768 total >> - age ? 6: ? ?2792272 bytes, ? 16834040 total >> - age ? 7: ? ?2233008 bytes, ? 19067048 total >> - age ? 8: ? ?2263824 bytes, ? 21330872 total >> : 358709K->29362K(368640K), 0.0461460 secs] >> 3479492K->3151874K(5201920K), 0.0467320 secs] [Times: user=0.34 >> sys=0.01, real=0.05 secs] >> 2012-03-15T18:32:21.546+0100: 796607.986: [GC 796607.986: [ParNew (0: >> promotion failure size = 25) ?(1: promotion failure size = 25) ?(2: >> promotion failure size = 25) ?(3: promotion failure size = 25) ?(4: >> promotion failure size = 25) ?(5 >> : promotion failure size = 25) ?(6: promotion failure size = 341) ?(7: >> promotion failure size = 25) ?(promotion failed) >> Desired survivor size 20971520 bytes, new threshold 8 (max 15) >> - age ? 1: ? ?3708208 bytes, ? ?3708208 total >> - age ? 2: ? ?2174384 bytes, ? ?5882592 total >> - age ? 3: ? ?2383256 bytes, ? ?8265848 total >> - age ? 4: ? ?2689912 bytes, ? 10955760 total >> - age ? 5: ? ?2621832 bytes, ? 13577592 total >> - age ? 6: ? ?3360440 bytes, ? 
16938032 total >> - age ? 7: ? ?2784136 bytes, ? 19722168 total >> - age ? 8: ? ?2220232 bytes, ? 21942400 total >> : 357042K->356456K(368640K), 0.2734100 secs]796608.259: [CMS: >> 3124189K->516640K(4833280K), 6.8127070 secs] >> 3479554K->516640K(5201920K), [CMS Perm : 142423K->142284K(262144K)], >> 7.0867850 secs] [Times: user=7.32 sys=0.07, real=7.09 secs] >> 2012-03-15T18:32:30.279+0100: 796616.719: [GC 796616.720: [ParNew >> Desired survivor size 20971520 bytes, new threshold 1 (max 15) >> - age ? 1: ? 29721456 bytes, ? 29721456 total >> : 327680K->40960K(368640K), 0.0403130 secs] >> 844320K->557862K(5201920K), 0.0409070 secs] [Times: user=0.27 >> sys=0.01, real=0.04 secs] >> 2012-03-15T18:32:32.701+0100: 796619.141: [GC 796619.141: [ParNew >> Desired survivor size 20971520 bytes, new threshold 15 (max 15) >> - age ? 1: ? 10310176 bytes, ? 10310176 total >> ------------------------- >> >> For contrast, here's a gc log fragment from the single node running >> the larger new gen and larger survivor spaces: >> (the fragment is from the same point in time, with the nodes >> experiencing equal load) >> ------------------------- >> 2012-03-15T18:32:12.067+0100: 797119.336: [GC 797119.336: [ParNew >> Desired survivor size 69894144 bytes, new threshold 15 (max 15) >> - age ? 1: ? ?5611536 bytes, ? ?5611536 total >> - age ? 2: ? ?3731888 bytes, ? ?9343424 total >> - age ? 3: ? ?3450672 bytes, ? 12794096 total >> - age ? 4: ? ?3314744 bytes, ? 16108840 total >> - age ? 5: ? ?3459888 bytes, ? 19568728 total >> - age ? 6: ? ?3334712 bytes, ? 22903440 total >> - age ? 7: ? ?3671960 bytes, ? 26575400 total >> - age ? 8: ? ?3841608 bytes, ? 30417008 total >> - age ? 9: ? ?2035392 bytes, ? 32452400 total >> - age ?10: ? ?1975056 bytes, ? 34427456 total >> - age ?11: ? ?2021344 bytes, ? 36448800 total >> - age ?12: ? ?1520752 bytes, ? 37969552 total >> - age ?13: ? ?1494176 bytes, ? 39463728 total >> - age ?14: ? ?2355136 bytes, ? 41818864 total >> - age ?15: ? ?1279000 bytes, ? 43097864 total >> : 603473K->61640K(682688K), 0.0756570 secs] >> 3373284K->2832383K(5106368K), 0.0762090 secs] [Times: user=0.56 >> sys=0.00, real=0.08 secs] >> 2012-03-15T18:32:18.200+0100: 797125.468: [GC 797125.469: [ParNew >> Desired survivor size 69894144 bytes, new threshold 15 (max 15) >> - age ? 1: ? ?6101320 bytes, ? ?6101320 total >> - age ? 2: ? ?4446776 bytes, ? 10548096 total >> - age ? 3: ? ?3701384 bytes, ? 14249480 total >> - age ? 4: ? ?3438488 bytes, ? 17687968 total >> - age ? 5: ? ?3295360 bytes, ? 20983328 total >> - age ? 6: ? ?3403320 bytes, ? 24386648 total >> - age ? 7: ? ?3323368 bytes, ? 27710016 total >> - age ? 8: ? ?3665760 bytes, ? 31375776 total >> - age ? 9: ? ?2427904 bytes, ? 33803680 total >> - age ?10: ? ?1418656 bytes, ? 35222336 total >> - age ?11: ? ?1955192 bytes, ? 37177528 total >> - age ?12: ? ?2006064 bytes, ? 39183592 total >> - age ?13: ? ?1520768 bytes, ? 40704360 total >> - age ?14: ? ?1493728 bytes, ? 42198088 total >> - age ?15: ? ?2354376 bytes, ? 44552464 total >> : 607816K->62650K(682688K), 0.0779270 secs] >> 3378559K->2834643K(5106368K), 0.0784690 secs] [Times: user=0.58 >> sys=0.00, real=0.08 secs] >> ------------------------- >> >> Questions: >> >> 1) From the tenuring distributions, it seems that the application >> benefits from larger new gen and survivor spaces. >> The next thing we'll try is to run with -Xmn1g -XX:SurvivorRatio=2, >> and see if the ParNew times are still acceptable. >> Does this seem a sensible approach in this context? 
>> Are there other variables beyond ParNew times that limit scaling the >> new gen to a large size? >> >> 2) Given the object age demographics inherent to our application, we >> can not expect to see the majority of data get collected in the new >> gen. >> >> Our approach to fight the promotion failures consists of three aspects: >> a) Lower the overall allocation rate of our application (by improving >> wasteful hotspots), to decrease overall ParNew collection frequency. >> b) Configure the new gen and survivor spaces as large as possible, >> keeping an eye on ParNew times and overall new/tenured ratio. >> c) Try to refactor the data structures that form the bulk of promoted >> data, to retain only the strictly required subgraphs. >> >> Is there anything else I can try or measure, in order to better >> understand the problem? >> >> Thanks in advance, >> Taras >> >> >> On Wed, Feb 22, 2012 at 10:51 AM, Taras Tielkes wrote: >>> (this time properly responding to the list alias) >>> Hi Srinivas, >>> >>> We're running 1.6.0 u29 on Linux x64. My understanding is that >>> CompressedOops is enabled by default since u23. >>> >>> At least this page seems to support that: >>> http://docs.oracle.com/javase/7/docs/technotes/guides/vm/performance-enhancements-7.html >>> >>> Regarding the other remarks (also from Todd and Chi), I'll comment >>> later. The first thing on my list is to collect >>> PrintTenuringDistribution data now. >>> >>> Kind regards, >>> Taras >>> >>> On Wed, Feb 22, 2012 at 10:50 AM, Taras Tielkes wrote: >>>> Hi Srinivas, >>>> >>>> We're running 1.6.0 u29 on Linux x64. My understanding is that >>>> CompressedOops is enabled by default since u23. >>>> >>>> At least this page seems to support that: >>>> http://docs.oracle.com/javase/7/docs/technotes/guides/vm/performance-enhancements-7.html >>>> >>>> Regarding the other remarks (also from Todd and Chi), I'll comment >>>> later. The first thing on my list is to collect >>>> PrintTenuringDistribution data now. >>>> >>>> Kind regards, >>>> Taras >>>> >>>> On Wed, Feb 22, 2012 at 12:40 AM, Srinivas Ramakrishna >>>> wrote: >>>>> I agree that premature promotions are almost always the first and most >>>>> important thing to fix when running >>>>> into fragmentation or overload issues with CMS. However, I can also imagine >>>>> long-lived objects with a highly >>>>> non-stationary size distribution which can also cause problems for CMS >>>>> despite best efforts to tune against >>>>> premature promotion. >>>>> >>>>> I didn't think Treas was running with MTT=0, although MTT > 0 is no recipe >>>>> for avoiding premature promotion >>>>> with bursty loads that case overflow the survivor spaces -- as you say large >>>>> survivor spaces with a low >>>>> TargetSurvivorRatio -- so as to leave plenty of space to absorb/accommodate >>>>> spiking/bursty loads? is >>>>> definitely a "best practice" for CMS (and possibly for other concurrent >>>>> collectors as well). >>>>> >>>>> One thing Taras can do to see if premature promotion might be an issue is to >>>>> look at the tenuring >>>>> threshold in his case. A rough proxy (if PrintTenuringDistribution is not >>>>> enabled) is to look at the >>>>> promotion volume per scavenge. It may be possible, if premature promotion is >>>>> a cause, to see >>>>> some kind of medium-term correlation between high promotion volume and >>>>> eventual promotion >>>>> failure despite frequent CMS collections. >>>>> >>>>> One other point which may or may not be relevant. 
I see that Taras is not >>>>> using CompressedOops... >>>>> Using that alone would greatly decrease memory pressure and provide more >>>>> breathing room to CMS, >>>>> which is also almost always a good idea. >>>>> >>>>> -- ramki >>>>> >>>>> On Tue, Feb 21, 2012 at 10:16 AM, Chi Ho Kwok wrote: >>>>>> >>>>>> Hi Teras, >>>>>> >>>>>> I think you may want to look into sizing the new and especially the >>>>>> survivor spaces differently. We run something similar to what you described, >>>>>> high volume request processing with large dataset loading, and what we've >>>>>> seen at the start is that the survivor spaces are completely overloaded, >>>>>> causing premature promotions. >>>>>> >>>>>> We've configured our vm with the following goals/guideline: >>>>>> >>>>>> old space is for semi-permanent data, living for at least 30s, average ~10 >>>>>> minutes >>>>>> new space contains only temporary and just loaded data >>>>>> surviving objects from new should never reach old in 1 gc, so the survivor >>>>>> space may never be 100% full >>>>>> >>>>>> With jstat -gcutil `pidof java` 2000, we see things like: >>>>>> >>>>>> ? S0 ? ? S1 ? ? E ? ? ?O ? ? ?P ? ? YGC ? ? YGCT ? ?FGC ? ?FGCT ? ? GCT >>>>>> ?70.20 ? 0.00 ?19.65 ?57.60 ?59.90 124808 29474.299 ?2498 ?191.110 >>>>>> 29665.409 >>>>>> ?70.20 ? 0.00 ?92.89 ?57.60 ?59.90 124808 29474.299 ?2498 ?191.110 >>>>>> 29665.409 >>>>>> ?70.20 ? 0.00 ?93.47 ?57.60 ?59.90 124808 29474.299 ?2498 ?191.110 >>>>>> 29665.409 >>>>>> ? 0.00 ?65.69 ?78.07 ?58.09 ?59.90 124809 29474.526 ?2498 ?191.110 >>>>>> 29665.636 >>>>>> ?84.97 ? 0.00 ?48.19 ?58.57 ?59.90 124810 29474.774 ?2498 ?191.110 >>>>>> 29665.884 >>>>>> ?84.97 ? 0.00 ?81.30 ?58.57 ?59.90 124810 29474.774 ?2498 ?191.110 >>>>>> 29665.884 >>>>>> ? 0.00 ?62.64 ?27.22 ?59.12 ?59.90 124811 29474.992 ?2498 ?191.110 >>>>>> 29666.102 >>>>>> ? 0.00 ?62.64 ?54.47 ?59.12 ?59.90 124811 29474.992 ?2498 ?191.110 >>>>>> 29666.102 >>>>>> ?75.68 ? 0.00 ? 6.80 ?59.53 ?59.90 124812 29475.228 ?2498 ?191.110 >>>>>> 29666.338 >>>>>> ?75.68 ? 0.00 ?23.38 ?59.53 ?59.90 124812 29475.228 ?2498 ?191.110 >>>>>> 29666.338 >>>>>> ?75.68 ? 0.00 ?27.72 ?59.53 ?59.90 124812 29475.228 ?2498 ?191.110 >>>>>> 29666.338 >>>>>> >>>>>> If you follow the lines, you can see Eden fill up to 100% on line 4, >>>>>> surviving objects are copied into S1, S0 is collected and added 0.49% to >>>>>> Old. On line 5, another GC happened, with Eden->S0, S1->Old, etc. No objects >>>>>> is ever transferred from Eden to Old, unless there's a huge peak of >>>>>> requests. >>>>>> >>>>>> This is with a: 32GB heap, Mxn1200M, SurvivorRatio 2 (600MB Eden, 300MB >>>>>> S0, 300MB S1), MaxTenuringThreshold 1 (whatever is still alive in S0/1 on >>>>>> the second GC is copied to old, don't wait, web requests are quite bursty). >>>>>> With about 1 collection every 2-5 seconds, objects promoted to Old must live >>>>>> for at 4-10 seconds; as that's longer than an average request (50ms-1s), >>>>>> none of the temporary data ever makes it into Old, which is much more >>>>>> expensive to collect. It works even with a higher than default >>>>>> CMSInitiatingOccupancyFraction=76 to optimize for space available for the >>>>>> large data cache we have. >>>>>> >>>>>> >>>>>> With your config of 400MB Total new, with 350MB Eden, 25MB S0, 25MB S1 >>>>>> (SurvivorRatio 8), no tenuring threshold, I think loads of new objects get >>>>>> copied from Eden to Old directly, causing trouble for the CMS. You can use >>>>>> jstat to get live stats and tweak until it doesn't happen. 
If you can't make >>>>>> changes on live that easil, try doubling the new size indeed, with a 400 >>>>>> Eden, 200 S0, 200 S1 and?MaxTenuringThreshold?1 setting. It's probably >>>>>> overkill, but if should solve the problem if it is caused by premature >>>>>> promotion. >>>>>> >>>>>> >>>>>> Chi Ho Kwok >>>>>> >>>>>> >>>>>> On Tue, Feb 21, 2012 at 5:55 PM, Taras Tielkes >>>>>> wrote: >>>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> We've removed the "-XX:+CMSScavengeBeforeRemark" setting from 50% of >>>>>>> our production nodes. >>>>>>> After running for a few weeks, it seems that there's no impact from >>>>>>> removing this option. >>>>>>> Which is good, since it seems we can remove it from the other nodes as >>>>>>> well, simplifying our overall JVM configuration ;-) >>>>>>> >>>>>>> However, we're still seeing promotion failures on all nodes, once >>>>>>> every day or so. >>>>>>> >>>>>>> There's still the "Magic 1026": this accounts for ~60% of the >>>>>>> promotion failures that we're seeing (single ParNew thread thread, >>>>>>> 1026 failure size): >>>>>>> -------------------- >>>>>>> 2012-02-06T09:13:51.806+0100: 328095.085: [GC 328095.086: [ParNew: >>>>>>> 359895K->29357K(368640K), 0.0429070 secs] >>>>>>> 3471021K->3143476K(5201920K), 0.0434950 secs] [Times: user=0.32 >>>>>>> sys=0.00, real=0.04 secs] >>>>>>> 2012-02-06T09:13:55.922+0100: 328099.201: [GC 328099.201: [ParNew: >>>>>>> 357037K->31817K(368640K), 0.0429130 secs] >>>>>>> 3471156K->3148946K(5201920K), 0.0434930 secs] [Times: user=0.31 >>>>>>> sys=0.00, real=0.04 secs] >>>>>>> 2012-02-06T09:13:59.044+0100: 328102.324: [GC 328102.324: [ParNew >>>>>>> (promotion failure size = 1026) ?(promotion failed): >>>>>>> 359497K->368640K(368640K), 0.2226790 secs]328102.547: [CMS: >>>>>>> 3125609K->451515K(4833280K), 5.6225880 secs] 3476626K->4515 >>>>>>> 15K(5201920K), [CMS Perm : 124373K->124353K(262144K)], 5.8459380 secs] >>>>>>> [Times: user=6.20 sys=0.01, real=5.85 secs] >>>>>>> 2012-02-06T09:14:05.243+0100: 328108.522: [GC 328108.523: [ParNew: >>>>>>> 327680K->40960K(368640K), 0.0319160 secs] 779195K->497658K(5201920K), >>>>>>> 0.0325360 secs] [Times: user=0.21 sys=0.01, real=0.03 secs] >>>>>>> 2012-02-06T09:14:07.836+0100: 328111.116: [GC 328111.116: [ParNew: >>>>>>> 368640K->32785K(368640K), 0.0744670 secs] 825338K->520234K(5201920K), >>>>>>> 0.0750390 secs] [Times: user=0.40 sys=0.02, real=0.08 secs] >>>>>>> -------------------- >>>>>>> Given the 1026 word size, I'm wondering if I should be hunting for an >>>>>>> overuse of BufferedInputStream/BufferedOutoutStream, since both have >>>>>>> 8192 as a default buffer size. 
>>>>>>> >>>>>>> The second group of promotion failures look like this (multiple ParNew >>>>>>> threads, small failure sizes): >>>>>>> -------------------- >>>>>>> 2012-02-06T09:50:15.773+0100: 328756.964: [GC 328756.964: [ParNew: >>>>>>> 356116K->29934K(368640K), 0.0461100 secs] >>>>>>> 3203863K->2880162K(5201920K), 0.0468870 secs] [Times: user=0.34 >>>>>>> sys=0.01, real=0.05 secs] >>>>>>> 2012-02-06T09:50:19.153+0100: 328760.344: [GC 328760.344: [ParNew: >>>>>>> 357614K->30359K(368640K), 0.0454680 secs] >>>>>>> 3207842K->2882892K(5201920K), 0.0462280 secs] [Times: user=0.33 >>>>>>> sys=0.01, real=0.05 secs] >>>>>>> 2012-02-06T09:50:22.658+0100: 328763.849: [GC 328763.849: [ParNew (1: >>>>>>> promotion failure size = 25) ?(4: promotion failure size = 25) ?(6: >>>>>>> promotion failure size = 25) ?(7: promotion failure size = 144) >>>>>>> (promotion failed): 358039K->358358 >>>>>>> K(368640K), 0.2148680 secs]328764.064: [CMS: >>>>>>> 2854709K->446750K(4833280K), 5.8368270 secs] >>>>>>> 3210572K->446750K(5201920K), [CMS Perm : 124670K->124644K(262144K)], >>>>>>> 6.0525230 secs] [Times: user=6.32 sys=0.00, real=6.05 secs] >>>>>>> 2012-02-06T09:50:29.896+0100: 328771.086: [GC 328771.087: [ParNew: >>>>>>> 327680K->22569K(368640K), 0.0227080 secs] 774430K->469319K(5201920K), >>>>>>> 0.0235020 secs] [Times: user=0.16 sys=0.00, real=0.02 secs] >>>>>>> 2012-02-06T09:50:31.076+0100: 328772.266: [GC 328772.267: [ParNew: >>>>>>> 350249K->22264K(368640K), 0.0235480 secs] 796999K->469014K(5201920K), >>>>>>> 0.0243000 secs] [Times: user=0.18 sys=0.01, real=0.02 secs] >>>>>>> -------------------- >>>>>>> >>>>>>> We're going to try to double the new size on a single node, to see the >>>>>>> effects of that. >>>>>>> >>>>>>> Beyond this experiment, is there any additional data I can collect to >>>>>>> better understand the nature of the promotion failures? >>>>>>> Am I facing collecting free list statistics at this point? >>>>>>> >>>>>>> Thanks, >>>>>>> Taras >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> hotspot-gc-use mailing list >>>>>> hotspot-gc-use at openjdk.java.net >>>>>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>>>>> >>>>> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use From taras.tielkes at gmail.com Sun Apr 15 08:08:03 2012 From: taras.tielkes at gmail.com (Taras Tielkes) Date: Sun, 15 Apr 2012 17:08:03 +0200 Subject: Promotion failures: indication of CMS fragmentation? In-Reply-To: References: <4EF9FCAC.3030208@oracle.com> <4F06A270.3010701@oracle.com> <4F0DBEC4.7040907@oracle.com> <4F1ECE7B.3040502@oracle.com> <4F1F2ED7.6060308@oracle.com> <4F20F78D.9070905@oracle.com> Message-ID: Hi Chi, Is it o.k. if I send this off-list to you directly? If so, how much more do you need? Just enough to cover the previous CMS? We're running with -XX:CMSInitiatingOccupancyFraction=68 and -XX:+UseCMSInitiatingOccupancyOnly, by the way. I do have shell access, however, on that particular machine we're experiencing the "process id not found" issue with jstat. I think this can be worked around by fiddling with temp directory options, but we haven't tried that yet. Regarding the jstat output, I assume this would be most valuable to have for the exact moment when the promotion failure happens, correct? If so, we can try to set up jstat to run in the background continuously, to have more diagnostic data in the future. 
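Something along these lines is what I have in mind, once the temp directory issue is sorted out (the interval and file name are just placeholders):

nohup jstat -gcutil -t `pidof java` 5s > jstat-gcutil.log 2>&1 &

The -t column is seconds since JVM start, so it should line up with the uptime-relative timestamps in gc.log.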
Kind regards, Taras On Sun, Apr 15, 2012 at 2:48 PM, Chi Ho Kwok wrote: > Hi Teras, > > Can you send me a larger chunk of the log? I'm interested in seeing when the > last CMS was run and what it freed. Maybe it's kicking in too late, the full > GC triggered by promotion failure only found 600M live data, rest was > garbage. If that's the cause, lowering?XX:CMSInitiatingOccupancyFraction can > help. > > Also, do you have shell access to that machine? If so, try running jstat, > you can see the usage of all generations live as it happens. > > > Chi Ho Kwok > > On Sun, Apr 15, 2012 at 2:34 PM, Taras Tielkes > wrote: >> >> Hi Chi, Srinivas, >> >> Optimizing the cost of ParNew (by lowering MTT) would be nice, but for >> now my priority is still to minimize the promotion failures. >> >> For example, on the machine running CMS with the "larger" young gen >> and survivor spaces (-Xmx5g -Xmn800 -XX:SurvivorRatio=4), I've just >> seen a promotion failure again. Below is a snippet of gc.log showing >> this. >> To put this into perspective, this is a first promotion failure on >> that machine in a couple of weeks. Still, zero failures would beat a >> single failure, since the clients connecting to this application will >> only wait a few seconds before timing out and terminating the >> connection. In addition, the promotion failures are occurring in peak >> usage moments. >> >> Apart from trying to eliminate the promotion failure pauses, my main >> goal is to learn how to understand the root cause in a case like this. >> Any suggestions for things to try or read up on are appreciated. >> >> Kind regards, >> Taras >> ------------------------------------------------ >> 2012-04-13T17:44:27.777+0200: 3296255.045: [GC 3296255.046: [ParNew >> Desired survivor size 69894144 bytes, new threshold 15 (max 15) >> - age ? 1: ? ?3684448 bytes, ? ?3684448 total >> - age ? 2: ? ? 824984 bytes, ? ?4509432 total >> - age ? 3: ? ? 885120 bytes, ? ?5394552 total >> - age ? 4: ? ? 756568 bytes, ? ?6151120 total >> - age ? 5: ? ? 696880 bytes, ? ?6848000 total >> - age ? 6: ? ? 890688 bytes, ? ?7738688 total >> - age ? 7: ? ?2631184 bytes, ? 10369872 total >> - age ? 8: ? ? 719976 bytes, ? 11089848 total >> - age ? 9: ? ? 724944 bytes, ? 11814792 total >> - age ?10: ? ? 750360 bytes, ? 12565152 total >> - age ?11: ? ? 934944 bytes, ? 13500096 total >> - age ?12: ? ? 521080 bytes, ? 14021176 total >> - age ?13: ? ? 543392 bytes, ? 14564568 total >> - age ?14: ? ? 906616 bytes, ? 15471184 total >> - age ?15: ? ? 504008 bytes, ? 15975192 total >> : 568932K->22625K(682688K), 0.0410180 secs] >> 3077079K->2531413K(5106368K), 0.0416940 secs] [Times: user=0.30 >> sys=0.01, real=0.05 secs] >> 2012-04-13T17:44:33.893+0200: 3296261.162: [GC 3296261.162: [ParNew >> Desired survivor size 69894144 bytes, new threshold 15 (max 15) >> - age ? 1: ? ?2975896 bytes, ? ?2975896 total >> - age ? 2: ? ? 742592 bytes, ? ?3718488 total >> - age ? 3: ? ? 812864 bytes, ? ?4531352 total >> - age ? 4: ? ? 873488 bytes, ? ?5404840 total >> - age ? 5: ? ? 746128 bytes, ? ?6150968 total >> - age ? 6: ? ? 685192 bytes, ? ?6836160 total >> - age ? 7: ? ? 888376 bytes, ? ?7724536 total >> - age ? 8: ? ?2621688 bytes, ? 10346224 total >> - age ? 9: ? ? 715608 bytes, ? 11061832 total >> - age ?10: ? ? 723336 bytes, ? 11785168 total >> - age ?11: ? ? 749856 bytes, ? 12535024 total >> - age ?12: ? ? 914632 bytes, ? 13449656 total >> - age ?13: ? ? 520944 bytes, ? 13970600 total >> - age ?14: ? ? 543224 bytes, ? 14513824 total >> - age ?15: ? ? 
906040 bytes, ? 15419864 total >> : 568801K->22726K(682688K), 0.0447800 secs] >> 3077589K->2532008K(5106368K), 0.0454710 secs] [Times: user=0.33 >> sys=0.00, real=0.05 secs] >> 2012-04-13T17:44:40.231+0200: 3296267.499: [GC 3296267.500: [ParNew >> (1: promotion failure size = 16) ?(2: promotion failure size = 56) >> (4: promotion failure >> size = 342) ?(5: promotion failure size = 1026) ?(6: promotion failure >> size = 278) ?(promotion failed) >> Desired survivor size 69894144 bytes, new threshold 15 (max 15) >> - age ? 1: ? ?2436840 bytes, ? ?2436840 total >> - age ? 2: ? ?1625136 bytes, ? ?4061976 total >> - age ? 3: ? ? 691664 bytes, ? ?4753640 total >> - age ? 4: ? ? 799992 bytes, ? ?5553632 total >> - age ? 5: ? ? 858344 bytes, ? ?6411976 total >> - age ? 6: ? ? 730200 bytes, ? ?7142176 total >> - age ? 7: ? ? 680072 bytes, ? ?7822248 total >> - age ? 8: ? ? 885960 bytes, ? ?8708208 total >> - age ? 9: ? ?2618544 bytes, ? 11326752 total >> - age ?10: ? ? 709168 bytes, ? 12035920 total >> - age ?11: ? ? 714576 bytes, ? 12750496 total >> - age ?12: ? ? 734976 bytes, ? 13485472 total >> - age ?13: ? ? 905048 bytes, ? 14390520 total >> - age ?14: ? ? 520320 bytes, ? 14910840 total >> - age ?15: ? ? 543056 bytes, ? 15453896 total >> : 568902K->568678K(682688K), 0.3130580 secs]3296267.813: [CMS: >> 2510091K->573489K(4423680K), 7.7481330 secs] >> 3078184K->573489K(5106368K), [CMS Perm : 144002K-> >> 143970K(262144K)], 8.0619690 secs] [Times: user=8.35 sys=0.01, real=8.06 >> secs] >> 2012-04-13T17:44:51.337+0200: 3296278.606: [GC 3296278.606: [ParNew >> Desired survivor size 69894144 bytes, new threshold 15 (max 15) >> - age ? 1: ? 33717528 bytes, ? 33717528 total >> : 546176K->43054K(682688K), 0.0515990 secs] >> 1119665K->616543K(5106368K), 0.0523550 secs] [Times: user=0.34 >> sys=0.00, real=0.05 secs] >> ------------------------------------------------ >> >> On Tue, Mar 20, 2012 at 10:12 PM, Srinivas Ramakrishna >> wrote: >> > As Chi-ho noted, about 3-4 MB of data does get promoted per scavenge, >> > after having >> > sloshed around in your survivor spaces some 15 times. I'd venture that >> > whatever winnowing >> > of young objects was to ocur has in fact occured already within the >> > first 3-4 scavenges that >> > an object has survived, after which the drop-off in population is less >> > sharp. So I'd suggest >> > lowering the MTT to about 3, while leaving the survivor ratio intact. >> > That should reduce your >> > copying costs and bring down your scavenge pauses further, while not >> > adversely affecting >> > your promotion rates (and concomitantly the fragmentation). >> > >> > One thing that was a bit puzzling about the stats below was that you'd >> > expect the volume >> > of generation X in scavenge N to be no less than the volume of >> > generation X+1 in scavenge N+1, >> > but occasionally that natural invariant does not appear to hold, which >> > is quite puzzling -- >> > indicating perhaps that either ages or populations are not being >> > correctly tracked. >> > >> > I don't know if anyone else has noticed that in their tenuring >> > distributions as well.... >> > >> > -- ramki >> > >> > On Tue, Mar 20, 2012 at 9:36 AM, Taras Tielkes >> > wrote: >> >> Hi, >> >> >> >> I've collected -XX:+PrintTenuringDistribution data from a node in our >> >> production environment, running -Xmx5g -Xmn400m -XX:SurvivorRatio=8. >> >> On one other production node, we've configured a larger new gen, and >> >> larger survivor spaces (-Xmx5g -Xmn800m -XX:SurvivorRatio=4). 
>> >> This node has -XX:+PrintTenuringDistribution logging as well. >> >> >> >> The node running the larger new gen and survivor spaces has not run >> >> into a promotion failure yet, while the ones still running the old >> >> config have hit a few. >> >> The promotion failures are typically experienced at high load periods, >> >> which makes sense, as allocation and promotion will experience a spike >> >> in those periods as well. >> >> >> >> The inherent nature of the application implies relatively long >> >> sessions (towards a few hours), retaining a fair amout of state up to >> >> an hour. >> >> I believe this is the main reason of the relatively high promotion >> >> rate we're experiencing. >> >> >> >> >> >> Here's a fragment of gc log from one of the nodes running the older >> >> (smaller) new gen, including a promotion failure: >> >> ------------------------- >> >> 2012-03-15T18:32:17.785+0100: 796604.225: [GC 796604.225: [ParNew >> >> Desired survivor size 20971520 bytes, new threshold 8 (max 15) >> >> - age ? 1: ? ?2927728 bytes, ? ?2927728 total >> >> - age ? 2: ? ?2428512 bytes, ? ?5356240 total >> >> - age ? 3: ? ?2696376 bytes, ? ?8052616 total >> >> - age ? 4: ? ?2623576 bytes, ? 10676192 total >> >> - age ? 5: ? ?3365576 bytes, ? 14041768 total >> >> - age ? 6: ? ?2792272 bytes, ? 16834040 total >> >> - age ? 7: ? ?2233008 bytes, ? 19067048 total >> >> - age ? 8: ? ?2263824 bytes, ? 21330872 total >> >> : 358709K->29362K(368640K), 0.0461460 secs] >> >> 3479492K->3151874K(5201920K), 0.0467320 secs] [Times: user=0.34 >> >> sys=0.01, real=0.05 secs] >> >> 2012-03-15T18:32:21.546+0100: 796607.986: [GC 796607.986: [ParNew (0: >> >> promotion failure size = 25) ?(1: promotion failure size = 25) ?(2: >> >> promotion failure size = 25) ?(3: promotion failure size = 25) ?(4: >> >> promotion failure size = 25) ?(5 >> >> : promotion failure size = 25) ?(6: promotion failure size = 341) ?(7: >> >> promotion failure size = 25) ?(promotion failed) >> >> Desired survivor size 20971520 bytes, new threshold 8 (max 15) >> >> - age ? 1: ? ?3708208 bytes, ? ?3708208 total >> >> - age ? 2: ? ?2174384 bytes, ? ?5882592 total >> >> - age ? 3: ? ?2383256 bytes, ? ?8265848 total >> >> - age ? 4: ? ?2689912 bytes, ? 10955760 total >> >> - age ? 5: ? ?2621832 bytes, ? 13577592 total >> >> - age ? 6: ? ?3360440 bytes, ? 16938032 total >> >> - age ? 7: ? ?2784136 bytes, ? 19722168 total >> >> - age ? 8: ? ?2220232 bytes, ? 21942400 total >> >> : 357042K->356456K(368640K), 0.2734100 secs]796608.259: [CMS: >> >> 3124189K->516640K(4833280K), 6.8127070 secs] >> >> 3479554K->516640K(5201920K), [CMS Perm : 142423K->142284K(262144K)], >> >> 7.0867850 secs] [Times: user=7.32 sys=0.07, real=7.09 secs] >> >> 2012-03-15T18:32:30.279+0100: 796616.719: [GC 796616.720: [ParNew >> >> Desired survivor size 20971520 bytes, new threshold 1 (max 15) >> >> - age ? 1: ? 29721456 bytes, ? 29721456 total >> >> : 327680K->40960K(368640K), 0.0403130 secs] >> >> 844320K->557862K(5201920K), 0.0409070 secs] [Times: user=0.27 >> >> sys=0.01, real=0.04 secs] >> >> 2012-03-15T18:32:32.701+0100: 796619.141: [GC 796619.141: [ParNew >> >> Desired survivor size 20971520 bytes, new threshold 15 (max 15) >> >> - age ? 1: ? 10310176 bytes, ? 
10310176 total >> >> ------------------------- >> >> >> >> For contrast, here's a gc log fragment from the single node running >> >> the larger new gen and larger survivor spaces: >> >> (the fragment is from the same point in time, with the nodes >> >> experiencing equal load) >> >> ------------------------- >> >> 2012-03-15T18:32:12.067+0100: 797119.336: [GC 797119.336: [ParNew >> >> Desired survivor size 69894144 bytes, new threshold 15 (max 15) >> >> - age ? 1: ? ?5611536 bytes, ? ?5611536 total >> >> - age ? 2: ? ?3731888 bytes, ? ?9343424 total >> >> - age ? 3: ? ?3450672 bytes, ? 12794096 total >> >> - age ? 4: ? ?3314744 bytes, ? 16108840 total >> >> - age ? 5: ? ?3459888 bytes, ? 19568728 total >> >> - age ? 6: ? ?3334712 bytes, ? 22903440 total >> >> - age ? 7: ? ?3671960 bytes, ? 26575400 total >> >> - age ? 8: ? ?3841608 bytes, ? 30417008 total >> >> - age ? 9: ? ?2035392 bytes, ? 32452400 total >> >> - age ?10: ? ?1975056 bytes, ? 34427456 total >> >> - age ?11: ? ?2021344 bytes, ? 36448800 total >> >> - age ?12: ? ?1520752 bytes, ? 37969552 total >> >> - age ?13: ? ?1494176 bytes, ? 39463728 total >> >> - age ?14: ? ?2355136 bytes, ? 41818864 total >> >> - age ?15: ? ?1279000 bytes, ? 43097864 total >> >> : 603473K->61640K(682688K), 0.0756570 secs] >> >> 3373284K->2832383K(5106368K), 0.0762090 secs] [Times: user=0.56 >> >> sys=0.00, real=0.08 secs] >> >> 2012-03-15T18:32:18.200+0100: 797125.468: [GC 797125.469: [ParNew >> >> Desired survivor size 69894144 bytes, new threshold 15 (max 15) >> >> - age ? 1: ? ?6101320 bytes, ? ?6101320 total >> >> - age ? 2: ? ?4446776 bytes, ? 10548096 total >> >> - age ? 3: ? ?3701384 bytes, ? 14249480 total >> >> - age ? 4: ? ?3438488 bytes, ? 17687968 total >> >> - age ? 5: ? ?3295360 bytes, ? 20983328 total >> >> - age ? 6: ? ?3403320 bytes, ? 24386648 total >> >> - age ? 7: ? ?3323368 bytes, ? 27710016 total >> >> - age ? 8: ? ?3665760 bytes, ? 31375776 total >> >> - age ? 9: ? ?2427904 bytes, ? 33803680 total >> >> - age ?10: ? ?1418656 bytes, ? 35222336 total >> >> - age ?11: ? ?1955192 bytes, ? 37177528 total >> >> - age ?12: ? ?2006064 bytes, ? 39183592 total >> >> - age ?13: ? ?1520768 bytes, ? 40704360 total >> >> - age ?14: ? ?1493728 bytes, ? 42198088 total >> >> - age ?15: ? ?2354376 bytes, ? 44552464 total >> >> : 607816K->62650K(682688K), 0.0779270 secs] >> >> 3378559K->2834643K(5106368K), 0.0784690 secs] [Times: user=0.58 >> >> sys=0.00, real=0.08 secs] >> >> ------------------------- >> >> >> >> Questions: >> >> >> >> 1) From the tenuring distributions, it seems that the application >> >> benefits from larger new gen and survivor spaces. >> >> The next thing we'll try is to run with -Xmn1g -XX:SurvivorRatio=2, >> >> and see if the ParNew times are still acceptable. >> >> Does this seem a sensible approach in this context? >> >> Are there other variables beyond ParNew times that limit scaling the >> >> new gen to a large size? >> >> >> >> 2) Given the object age demographics inherent to our application, we >> >> can not expect to see the majority of data get collected in the new >> >> gen. >> >> >> >> Our approach to fight the promotion failures consists of three aspects: >> >> a) Lower the overall allocation rate of our application (by improving >> >> wasteful hotspots), to decrease overall ParNew collection frequency. >> >> b) Configure the new gen and survivor spaces as large as possible, >> >> keeping an eye on ParNew times and overall new/tenured ratio. 
>> >> c) Try to refactor the data structures that form the bulk of promoted >> >> data, to retain only the strictly required subgraphs. >> >> >> >> Is there anything else I can try or measure, in order to better >> >> understand the problem? >> >> >> >> Thanks in advance, >> >> Taras >> >> >> >> >> >> On Wed, Feb 22, 2012 at 10:51 AM, Taras Tielkes >> >> wrote: >> >>> (this time properly responding to the list alias) >> >>> Hi Srinivas, >> >>> >> >>> We're running 1.6.0 u29 on Linux x64. My understanding is that >> >>> CompressedOops is enabled by default since u23. >> >>> >> >>> At least this page seems to support that: >> >>> >> >>> http://docs.oracle.com/javase/7/docs/technotes/guides/vm/performance-enhancements-7.html >> >>> >> >>> Regarding the other remarks (also from Todd and Chi), I'll comment >> >>> later. The first thing on my list is to collect >> >>> PrintTenuringDistribution data now. >> >>> >> >>> Kind regards, >> >>> Taras >> >>> >> >>> On Wed, Feb 22, 2012 at 10:50 AM, Taras Tielkes >> >>> wrote: >> >>>> Hi Srinivas, >> >>>> >> >>>> We're running 1.6.0 u29 on Linux x64. My understanding is that >> >>>> CompressedOops is enabled by default since u23. >> >>>> >> >>>> At least this page seems to support that: >> >>>> >> >>>> http://docs.oracle.com/javase/7/docs/technotes/guides/vm/performance-enhancements-7.html >> >>>> >> >>>> Regarding the other remarks (also from Todd and Chi), I'll comment >> >>>> later. The first thing on my list is to collect >> >>>> PrintTenuringDistribution data now. >> >>>> >> >>>> Kind regards, >> >>>> Taras >> >>>> >> >>>> On Wed, Feb 22, 2012 at 12:40 AM, Srinivas Ramakrishna >> >>>> wrote: >> >>>>> I agree that premature promotions are almost always the first and >> >>>>> most >> >>>>> important thing to fix when running >> >>>>> into fragmentation or overload issues with CMS. However, I can also >> >>>>> imagine >> >>>>> long-lived objects with a highly >> >>>>> non-stationary size distribution which can also cause problems for >> >>>>> CMS >> >>>>> despite best efforts to tune against >> >>>>> premature promotion. >> >>>>> >> >>>>> I didn't think Treas was running with MTT=0, although MTT > 0 is no >> >>>>> recipe >> >>>>> for avoiding premature promotion >> >>>>> with bursty loads that case overflow the survivor spaces -- as you >> >>>>> say large >> >>>>> survivor spaces with a low >> >>>>> TargetSurvivorRatio -- so as to leave plenty of space to >> >>>>> absorb/accommodate >> >>>>> spiking/bursty loads? is >> >>>>> definitely a "best practice" for CMS (and possibly for other >> >>>>> concurrent >> >>>>> collectors as well). >> >>>>> >> >>>>> One thing Taras can do to see if premature promotion might be an >> >>>>> issue is to >> >>>>> look at the tenuring >> >>>>> threshold in his case. A rough proxy (if PrintTenuringDistribution >> >>>>> is not >> >>>>> enabled) is to look at the >> >>>>> promotion volume per scavenge. It may be possible, if premature >> >>>>> promotion is >> >>>>> a cause, to see >> >>>>> some kind of medium-term correlation between high promotion volume >> >>>>> and >> >>>>> eventual promotion >> >>>>> failure despite frequent CMS collections. >> >>>>> >> >>>>> One other point which may or may not be relevant. I see that Taras >> >>>>> is not >> >>>>> using CompressedOops... >> >>>>> Using that alone would greatly decrease memory pressure and provide >> >>>>> more >> >>>>> breathing room to CMS, >> >>>>> which is also almost always a good idea. 
>> >>>>> >> >>>>> -- ramki >> >>>>> >> >>>>> On Tue, Feb 21, 2012 at 10:16 AM, Chi Ho Kwok >> >>>>> wrote: >> >>>>>> >> >>>>>> Hi Teras, >> >>>>>> >> >>>>>> I think you may want to look into sizing the new and especially the >> >>>>>> survivor spaces differently. We run something similar to what you >> >>>>>> described, >> >>>>>> high volume request processing with large dataset loading, and what >> >>>>>> we've >> >>>>>> seen at the start is that the survivor spaces are completely >> >>>>>> overloaded, >> >>>>>> causing premature promotions. >> >>>>>> >> >>>>>> We've configured our vm with the following goals/guideline: >> >>>>>> >> >>>>>> old space is for semi-permanent data, living for at least 30s, >> >>>>>> average ~10 >> >>>>>> minutes >> >>>>>> new space contains only temporary and just loaded data >> >>>>>> surviving objects from new should never reach old in 1 gc, so the >> >>>>>> survivor >> >>>>>> space may never be 100% full >> >>>>>> >> >>>>>> With jstat -gcutil `pidof java` 2000, we see things like: >> >>>>>> >> >>>>>> ? S0 ? ? S1 ? ? E ? ? ?O ? ? ?P ? ? YGC ? ? YGCT ? ?FGC ? ?FGCT >> >>>>>> GCT >> >>>>>> ?70.20 ? 0.00 ?19.65 ?57.60 ?59.90 124808 29474.299 ?2498 ?191.110 >> >>>>>> 29665.409 >> >>>>>> ?70.20 ? 0.00 ?92.89 ?57.60 ?59.90 124808 29474.299 ?2498 ?191.110 >> >>>>>> 29665.409 >> >>>>>> ?70.20 ? 0.00 ?93.47 ?57.60 ?59.90 124808 29474.299 ?2498 ?191.110 >> >>>>>> 29665.409 >> >>>>>> ? 0.00 ?65.69 ?78.07 ?58.09 ?59.90 124809 29474.526 ?2498 ?191.110 >> >>>>>> 29665.636 >> >>>>>> ?84.97 ? 0.00 ?48.19 ?58.57 ?59.90 124810 29474.774 ?2498 ?191.110 >> >>>>>> 29665.884 >> >>>>>> ?84.97 ? 0.00 ?81.30 ?58.57 ?59.90 124810 29474.774 ?2498 ?191.110 >> >>>>>> 29665.884 >> >>>>>> ? 0.00 ?62.64 ?27.22 ?59.12 ?59.90 124811 29474.992 ?2498 ?191.110 >> >>>>>> 29666.102 >> >>>>>> ? 0.00 ?62.64 ?54.47 ?59.12 ?59.90 124811 29474.992 ?2498 ?191.110 >> >>>>>> 29666.102 >> >>>>>> ?75.68 ? 0.00 ? 6.80 ?59.53 ?59.90 124812 29475.228 ?2498 ?191.110 >> >>>>>> 29666.338 >> >>>>>> ?75.68 ? 0.00 ?23.38 ?59.53 ?59.90 124812 29475.228 ?2498 ?191.110 >> >>>>>> 29666.338 >> >>>>>> ?75.68 ? 0.00 ?27.72 ?59.53 ?59.90 124812 29475.228 ?2498 ?191.110 >> >>>>>> 29666.338 >> >>>>>> >> >>>>>> If you follow the lines, you can see Eden fill up to 100% on line >> >>>>>> 4, >> >>>>>> surviving objects are copied into S1, S0 is collected and added >> >>>>>> 0.49% to >> >>>>>> Old. On line 5, another GC happened, with Eden->S0, S1->Old, etc. >> >>>>>> No objects >> >>>>>> is ever transferred from Eden to Old, unless there's a huge peak of >> >>>>>> requests. >> >>>>>> >> >>>>>> This is with a: 32GB heap, Mxn1200M, SurvivorRatio 2 (600MB Eden, >> >>>>>> 300MB >> >>>>>> S0, 300MB S1), MaxTenuringThreshold 1 (whatever is still alive in >> >>>>>> S0/1 on >> >>>>>> the second GC is copied to old, don't wait, web requests are quite >> >>>>>> bursty). >> >>>>>> With about 1 collection every 2-5 seconds, objects promoted to Old >> >>>>>> must live >> >>>>>> for at 4-10 seconds; as that's longer than an average request >> >>>>>> (50ms-1s), >> >>>>>> none of the temporary data ever makes it into Old, which is much >> >>>>>> more >> >>>>>> expensive to collect. It works even with a higher than default >> >>>>>> CMSInitiatingOccupancyFraction=76 to optimize for space available >> >>>>>> for the >> >>>>>> large data cache we have. 
>> >>>>>> >> >>>>>> >> >>>>>> With your config of 400MB Total new, with 350MB Eden, 25MB S0, 25MB >> >>>>>> S1 >> >>>>>> (SurvivorRatio 8), no tenuring threshold, I think loads of new >> >>>>>> objects get >> >>>>>> copied from Eden to Old directly, causing trouble for the CMS. You >> >>>>>> can use >> >>>>>> jstat to get live stats and tweak until it doesn't happen. If you >> >>>>>> can't make >> >>>>>> changes on live that easil, try doubling the new size indeed, with >> >>>>>> a 400 >> >>>>>> Eden, 200 S0, 200 S1 and?MaxTenuringThreshold?1 setting. It's >> >>>>>> probably >> >>>>>> overkill, but if should solve the problem if it is caused by >> >>>>>> premature >> >>>>>> promotion. >> >>>>>> >> >>>>>> >> >>>>>> Chi Ho Kwok >> >>>>>> >> >>>>>> >> >>>>>> On Tue, Feb 21, 2012 at 5:55 PM, Taras Tielkes >> >>>>>> >> >>>>>> wrote: >> >>>>>>> >> >>>>>>> Hi, >> >>>>>>> >> >>>>>>> We've removed the "-XX:+CMSScavengeBeforeRemark" setting from 50% >> >>>>>>> of >> >>>>>>> our production nodes. >> >>>>>>> After running for a few weeks, it seems that there's no impact >> >>>>>>> from >> >>>>>>> removing this option. >> >>>>>>> Which is good, since it seems we can remove it from the other >> >>>>>>> nodes as >> >>>>>>> well, simplifying our overall JVM configuration ;-) >> >>>>>>> >> >>>>>>> However, we're still seeing promotion failures on all nodes, once >> >>>>>>> every day or so. >> >>>>>>> >> >>>>>>> There's still the "Magic 1026": this accounts for ~60% of the >> >>>>>>> promotion failures that we're seeing (single ParNew thread thread, >> >>>>>>> 1026 failure size): >> >>>>>>> -------------------- >> >>>>>>> 2012-02-06T09:13:51.806+0100: 328095.085: [GC 328095.086: [ParNew: >> >>>>>>> 359895K->29357K(368640K), 0.0429070 secs] >> >>>>>>> 3471021K->3143476K(5201920K), 0.0434950 secs] [Times: user=0.32 >> >>>>>>> sys=0.00, real=0.04 secs] >> >>>>>>> 2012-02-06T09:13:55.922+0100: 328099.201: [GC 328099.201: [ParNew: >> >>>>>>> 357037K->31817K(368640K), 0.0429130 secs] >> >>>>>>> 3471156K->3148946K(5201920K), 0.0434930 secs] [Times: user=0.31 >> >>>>>>> sys=0.00, real=0.04 secs] >> >>>>>>> 2012-02-06T09:13:59.044+0100: 328102.324: [GC 328102.324: [ParNew >> >>>>>>> (promotion failure size = 1026) ?(promotion failed): >> >>>>>>> 359497K->368640K(368640K), 0.2226790 secs]328102.547: [CMS: >> >>>>>>> 3125609K->451515K(4833280K), 5.6225880 secs] 3476626K->4515 >> >>>>>>> 15K(5201920K), [CMS Perm : 124373K->124353K(262144K)], 5.8459380 >> >>>>>>> secs] >> >>>>>>> [Times: user=6.20 sys=0.01, real=5.85 secs] >> >>>>>>> 2012-02-06T09:14:05.243+0100: 328108.522: [GC 328108.523: [ParNew: >> >>>>>>> 327680K->40960K(368640K), 0.0319160 secs] >> >>>>>>> 779195K->497658K(5201920K), >> >>>>>>> 0.0325360 secs] [Times: user=0.21 sys=0.01, real=0.03 secs] >> >>>>>>> 2012-02-06T09:14:07.836+0100: 328111.116: [GC 328111.116: [ParNew: >> >>>>>>> 368640K->32785K(368640K), 0.0744670 secs] >> >>>>>>> 825338K->520234K(5201920K), >> >>>>>>> 0.0750390 secs] [Times: user=0.40 sys=0.02, real=0.08 secs] >> >>>>>>> -------------------- >> >>>>>>> Given the 1026 word size, I'm wondering if I should be hunting for >> >>>>>>> an >> >>>>>>> overuse of BufferedInputStream/BufferedOutoutStream, since both >> >>>>>>> have >> >>>>>>> 8192 as a default buffer size. 
>> >>>>>>> >> >>>>>>> The second group of promotion failures look like this (multiple >> >>>>>>> ParNew >> >>>>>>> threads, small failure sizes): >> >>>>>>> -------------------- >> >>>>>>> 2012-02-06T09:50:15.773+0100: 328756.964: [GC 328756.964: [ParNew: >> >>>>>>> 356116K->29934K(368640K), 0.0461100 secs] >> >>>>>>> 3203863K->2880162K(5201920K), 0.0468870 secs] [Times: user=0.34 >> >>>>>>> sys=0.01, real=0.05 secs] >> >>>>>>> 2012-02-06T09:50:19.153+0100: 328760.344: [GC 328760.344: [ParNew: >> >>>>>>> 357614K->30359K(368640K), 0.0454680 secs] >> >>>>>>> 3207842K->2882892K(5201920K), 0.0462280 secs] [Times: user=0.33 >> >>>>>>> sys=0.01, real=0.05 secs] >> >>>>>>> 2012-02-06T09:50:22.658+0100: 328763.849: [GC 328763.849: [ParNew >> >>>>>>> (1: >> >>>>>>> promotion failure size = 25) ?(4: promotion failure size = 25) >> >>>>>>> ?(6: >> >>>>>>> promotion failure size = 25) ?(7: promotion failure size = 144) >> >>>>>>> (promotion failed): 358039K->358358 >> >>>>>>> K(368640K), 0.2148680 secs]328764.064: [CMS: >> >>>>>>> 2854709K->446750K(4833280K), 5.8368270 secs] >> >>>>>>> 3210572K->446750K(5201920K), [CMS Perm : >> >>>>>>> 124670K->124644K(262144K)], >> >>>>>>> 6.0525230 secs] [Times: user=6.32 sys=0.00, real=6.05 secs] >> >>>>>>> 2012-02-06T09:50:29.896+0100: 328771.086: [GC 328771.087: [ParNew: >> >>>>>>> 327680K->22569K(368640K), 0.0227080 secs] >> >>>>>>> 774430K->469319K(5201920K), >> >>>>>>> 0.0235020 secs] [Times: user=0.16 sys=0.00, real=0.02 secs] >> >>>>>>> 2012-02-06T09:50:31.076+0100: 328772.266: [GC 328772.267: [ParNew: >> >>>>>>> 350249K->22264K(368640K), 0.0235480 secs] >> >>>>>>> 796999K->469014K(5201920K), >> >>>>>>> 0.0243000 secs] [Times: user=0.18 sys=0.01, real=0.02 secs] >> >>>>>>> -------------------- >> >>>>>>> >> >>>>>>> We're going to try to double the new size on a single node, to see >> >>>>>>> the >> >>>>>>> effects of that. >> >>>>>>> >> >>>>>>> Beyond this experiment, is there any additional data I can collect >> >>>>>>> to >> >>>>>>> better understand the nature of the promotion failures? >> >>>>>>> Am I facing collecting free list statistics at this point? >> >>>>>>> >> >>>>>>> Thanks, >> >>>>>>> Taras >> >>>>>> >> >>>>>> >> >>>>>> _______________________________________________ >> >>>>>> hotspot-gc-use mailing list >> >>>>>> hotspot-gc-use at openjdk.java.net >> >>>>>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >>>>>> >> >>>>> >> >> _______________________________________________ >> >> hotspot-gc-use mailing list >> >> hotspot-gc-use at openjdk.java.net >> >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > From taras.tielkes at gmail.com Sun Apr 15 09:41:02 2012 From: taras.tielkes at gmail.com (Taras Tielkes) Date: Sun, 15 Apr 2012 18:41:02 +0200 Subject: Promotion failures: indication of CMS fragmentation? In-Reply-To: References: <4EF9FCAC.3030208@oracle.com> <4F06A270.3010701@oracle.com> <4F0DBEC4.7040907@oracle.com> <4F1ECE7B.3040502@oracle.com> <4F1F2ED7.6060308@oracle.com> <4F20F78D.9070905@oracle.com> Message-ID: Hi Chi, I've sent you a decent chunk of the gc.log file off-list (hopefully not too large). 
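To keep the arithmetic straight for the settings listed below: with -Xmn800m and SurvivorRatio=4, eden works out to Xmn * 4 / 6 (roughly 533 MB) and each survivor space to Xmn / 6 (roughly 133 MB). The 682688K young gen capacity in the ParNew lines is eden plus one survivor (about 667 MB), and the "Desired survivor size 69894144 bytes" is roughly half of one survivor space, which I assume is the default TargetSurvivorRatio of 50 at work.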
For completeness, we're running with the following options (ignoring the diagnostic ones): ----- -server -Xms5g -Xmx5g -Xmn800m -XX:PermSize=256m -XX:MaxPermSize=256m -XX:SurvivorRatio=4 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+DisableExplicitGC -XX:+UseCMSInitiatingOccupancyOnly -XX:+CMSClassUnloadingEnabled -XX:CMSInitiatingOccupancyFraction=68 ----- Platform is Java 6u29 running on Linux 2.6 x64. Hardware is 2xquad Xeons, but pretty old ones (pre-Nehalem, no QPI). The gc logs will (typically) show big peaks at the start and end of the working day - this is nature of the domain our application targets. I would expect the live set to be below 1G (usually below 600M even). However, we can experience temporary spikes of higher volume longer-living object allocation bursts. We'll set up a jstat log for this machine. I do have historical jstat logs for one of the other machines, but that one is still running with a smaller new gen, and smaller survivor spaces. If there's any other relevant data that I can collect, let me know. Kind regards, Taras On Sun, Apr 15, 2012 at 6:15 PM, Chi Ho Kwok wrote: > Hi Teras, > > Sure thing. Just the previous CMS should be enough, it doesn't matter if > there is 10 or 1000 parnew's between that and the failure. > > As for the jstat failure, it looks like it looks in > /tmp/hsperfdata_[username] for the pid by default, maybe something > like?-J-Djava.io.tmpdir=[path, like /app/client/program/tomcat/temp] can > help; and from what I've seen, running jstat as the same user as the process > or root is required. Historical data is nice to have, but even just staring > at it for 15 minutes should give you a hint for the old gen usage. > > If the collection starts at 68, takes a while and the heap fills to 80%+ > before it's done when it's not busy, it's probably wise to lower the initial > occupancy factor or increase the thread count so it completes faster. We run > with?-XX:ParallelCMSThreads=3 on a 8 hw thread server, the default (2) was > too slow for us as we run with 76%, it still takes 15s on average for CMS to > scan and clean the old gen (while old gen grows to up to 80% full), much > longer can mean a promotion failure during request spikes. > > > Chi Ho Kwok > > > On Sun, Apr 15, 2012 at 5:08 PM, Taras Tielkes > wrote: >> >> Hi Chi, >> >> Is it o.k. if I send this off-list to you directly? If so, how much >> more do you need? Just enough to cover the previous CMS? >> We're running with ?-XX:CMSInitiatingOccupancyFraction=68 and >> -XX:+UseCMSInitiatingOccupancyOnly, by the way. >> >> I do have shell access, however, on that particular machine we're >> experiencing the "process id not found" issue with jstat. >> I think this can be worked around by fiddling with temp directory >> options, but we haven't tried that yet. >> Regarding the jstat output, I assume this would be most valuable to >> have for the exact moment when the promotion failure happens, correct? >> If so, we can try to set up jstat to run in the background >> continuously, to have more diagnostic data in the future. >> >> Kind regards, >> Taras >> >> On Sun, Apr 15, 2012 at 2:48 PM, Chi Ho Kwok wrote: >> > Hi Teras, >> > >> > Can you send me a larger chunk of the log? I'm interested in seeing when >> > the >> > last CMS was run and what it freed. Maybe it's kicking in too late, the >> > full >> > GC triggered by promotion failure only found 600M live data, rest was >> > garbage. If that's the cause, lowering?XX:CMSInitiatingOccupancyFraction >> > can >> > help. 
>> > >> > Also, do you have shell access to that machine? If so, try running >> > jstat, >> > you can see the usage of all generations live as it happens. >> > >> > >> > Chi Ho Kwok >> > >> > On Sun, Apr 15, 2012 at 2:34 PM, Taras Tielkes >> > wrote: >> >> >> >> Hi Chi, Srinivas, >> >> >> >> Optimizing the cost of ParNew (by lowering MTT) would be nice, but for >> >> now my priority is still to minimize the promotion failures. >> >> >> >> For example, on the machine running CMS with the "larger" young gen >> >> and survivor spaces (-Xmx5g -Xmn800 -XX:SurvivorRatio=4), I've just >> >> seen a promotion failure again. Below is a snippet of gc.log showing >> >> this. >> >> To put this into perspective, this is a first promotion failure on >> >> that machine in a couple of weeks. Still, zero failures would beat a >> >> single failure, since the clients connecting to this application will >> >> only wait a few seconds before timing out and terminating the >> >> connection. In addition, the promotion failures are occurring in peak >> >> usage moments. >> >> >> >> Apart from trying to eliminate the promotion failure pauses, my main >> >> goal is to learn how to understand the root cause in a case like this. >> >> Any suggestions for things to try or read up on are appreciated. >> >> >> >> Kind regards, >> >> Taras >> >> ------------------------------------------------ >> >> 2012-04-13T17:44:27.777+0200: 3296255.045: [GC 3296255.046: [ParNew >> >> Desired survivor size 69894144 bytes, new threshold 15 (max 15) >> >> - age ? 1: ? ?3684448 bytes, ? ?3684448 total >> >> - age ? 2: ? ? 824984 bytes, ? ?4509432 total >> >> - age ? 3: ? ? 885120 bytes, ? ?5394552 total >> >> - age ? 4: ? ? 756568 bytes, ? ?6151120 total >> >> - age ? 5: ? ? 696880 bytes, ? ?6848000 total >> >> - age ? 6: ? ? 890688 bytes, ? ?7738688 total >> >> - age ? 7: ? ?2631184 bytes, ? 10369872 total >> >> - age ? 8: ? ? 719976 bytes, ? 11089848 total >> >> - age ? 9: ? ? 724944 bytes, ? 11814792 total >> >> - age ?10: ? ? 750360 bytes, ? 12565152 total >> >> - age ?11: ? ? 934944 bytes, ? 13500096 total >> >> - age ?12: ? ? 521080 bytes, ? 14021176 total >> >> - age ?13: ? ? 543392 bytes, ? 14564568 total >> >> - age ?14: ? ? 906616 bytes, ? 15471184 total >> >> - age ?15: ? ? 504008 bytes, ? 15975192 total >> >> : 568932K->22625K(682688K), 0.0410180 secs] >> >> 3077079K->2531413K(5106368K), 0.0416940 secs] [Times: user=0.30 >> >> sys=0.01, real=0.05 secs] >> >> 2012-04-13T17:44:33.893+0200: 3296261.162: [GC 3296261.162: [ParNew >> >> Desired survivor size 69894144 bytes, new threshold 15 (max 15) >> >> - age ? 1: ? ?2975896 bytes, ? ?2975896 total >> >> - age ? 2: ? ? 742592 bytes, ? ?3718488 total >> >> - age ? 3: ? ? 812864 bytes, ? ?4531352 total >> >> - age ? 4: ? ? 873488 bytes, ? ?5404840 total >> >> - age ? 5: ? ? 746128 bytes, ? ?6150968 total >> >> - age ? 6: ? ? 685192 bytes, ? ?6836160 total >> >> - age ? 7: ? ? 888376 bytes, ? ?7724536 total >> >> - age ? 8: ? ?2621688 bytes, ? 10346224 total >> >> - age ? 9: ? ? 715608 bytes, ? 11061832 total >> >> - age ?10: ? ? 723336 bytes, ? 11785168 total >> >> - age ?11: ? ? 749856 bytes, ? 12535024 total >> >> - age ?12: ? ? 914632 bytes, ? 13449656 total >> >> - age ?13: ? ? 520944 bytes, ? 13970600 total >> >> - age ?14: ? ? 543224 bytes, ? 14513824 total >> >> - age ?15: ? ? 906040 bytes, ? 
15419864 total >> >> : 568801K->22726K(682688K), 0.0447800 secs] >> >> 3077589K->2532008K(5106368K), 0.0454710 secs] [Times: user=0.33 >> >> sys=0.00, real=0.05 secs] >> >> 2012-04-13T17:44:40.231+0200: 3296267.499: [GC 3296267.500: [ParNew >> >> (1: promotion failure size = 16) ?(2: promotion failure size = 56) >> >> (4: promotion failure >> >> size = 342) ?(5: promotion failure size = 1026) ?(6: promotion failure >> >> size = 278) ?(promotion failed) >> >> Desired survivor size 69894144 bytes, new threshold 15 (max 15) >> >> - age ? 1: ? ?2436840 bytes, ? ?2436840 total >> >> - age ? 2: ? ?1625136 bytes, ? ?4061976 total >> >> - age ? 3: ? ? 691664 bytes, ? ?4753640 total >> >> - age ? 4: ? ? 799992 bytes, ? ?5553632 total >> >> - age ? 5: ? ? 858344 bytes, ? ?6411976 total >> >> - age ? 6: ? ? 730200 bytes, ? ?7142176 total >> >> - age ? 7: ? ? 680072 bytes, ? ?7822248 total >> >> - age ? 8: ? ? 885960 bytes, ? ?8708208 total >> >> - age ? 9: ? ?2618544 bytes, ? 11326752 total >> >> - age ?10: ? ? 709168 bytes, ? 12035920 total >> >> - age ?11: ? ? 714576 bytes, ? 12750496 total >> >> - age ?12: ? ? 734976 bytes, ? 13485472 total >> >> - age ?13: ? ? 905048 bytes, ? 14390520 total >> >> - age ?14: ? ? 520320 bytes, ? 14910840 total >> >> - age ?15: ? ? 543056 bytes, ? 15453896 total >> >> : 568902K->568678K(682688K), 0.3130580 secs]3296267.813: [CMS: >> >> 2510091K->573489K(4423680K), 7.7481330 secs] >> >> 3078184K->573489K(5106368K), [CMS Perm : 144002K-> >> >> 143970K(262144K)], 8.0619690 secs] [Times: user=8.35 sys=0.01, >> >> real=8.06 >> >> secs] >> >> 2012-04-13T17:44:51.337+0200: 3296278.606: [GC 3296278.606: [ParNew >> >> Desired survivor size 69894144 bytes, new threshold 15 (max 15) >> >> - age ? 1: ? 33717528 bytes, ? 33717528 total >> >> : 546176K->43054K(682688K), 0.0515990 secs] >> >> 1119665K->616543K(5106368K), 0.0523550 secs] [Times: user=0.34 >> >> sys=0.00, real=0.05 secs] >> >> ------------------------------------------------ >> >> >> >> On Tue, Mar 20, 2012 at 10:12 PM, Srinivas Ramakrishna >> >> wrote: >> >> > As Chi-ho noted, about 3-4 MB of data does get promoted per scavenge, >> >> > after having >> >> > sloshed around in your survivor spaces some 15 times. I'd venture >> >> > that >> >> > whatever winnowing >> >> > of young objects was to ocur has in fact occured already within the >> >> > first 3-4 scavenges that >> >> > an object has survived, after which the drop-off in population is >> >> > less >> >> > sharp. So I'd suggest >> >> > lowering the MTT to about 3, while leaving the survivor ratio intact. >> >> > That should reduce your >> >> > copying costs and bring down your scavenge pauses further, while not >> >> > adversely affecting >> >> > your promotion rates (and concomitantly the fragmentation). >> >> > >> >> > One thing that was a bit puzzling about the stats below was that >> >> > you'd >> >> > expect the volume >> >> > of generation X in scavenge N to be no less than the volume of >> >> > generation X+1 in scavenge N+1, >> >> > but occasionally that natural invariant does not appear to hold, >> >> > which >> >> > is quite puzzling -- >> >> > indicating perhaps that either ages or populations are not being >> >> > correctly tracked. >> >> > >> >> > I don't know if anyone else has noticed that in their tenuring >> >> > distributions as well.... 
>> >> > >> >> > -- ramki >> >> > >> >> > On Tue, Mar 20, 2012 at 9:36 AM, Taras Tielkes >> >> > >> >> > wrote: >> >> >> Hi, >> >> >> >> >> >> I've collected -XX:+PrintTenuringDistribution data from a node in >> >> >> our >> >> >> production environment, running -Xmx5g -Xmn400m -XX:SurvivorRatio=8. >> >> >> On one other production node, we've configured a larger new gen, and >> >> >> larger survivor spaces (-Xmx5g -Xmn800m -XX:SurvivorRatio=4). >> >> >> This node has -XX:+PrintTenuringDistribution logging as well. >> >> >> >> >> >> The node running the larger new gen and survivor spaces has not run >> >> >> into a promotion failure yet, while the ones still running the old >> >> >> config have hit a few. >> >> >> The promotion failures are typically experienced at high load >> >> >> periods, >> >> >> which makes sense, as allocation and promotion will experience a >> >> >> spike >> >> >> in those periods as well. >> >> >> >> >> >> The inherent nature of the application implies relatively long >> >> >> sessions (towards a few hours), retaining a fair amout of state up >> >> >> to >> >> >> an hour. >> >> >> I believe this is the main reason of the relatively high promotion >> >> >> rate we're experiencing. >> >> >> >> >> >> >> >> >> Here's a fragment of gc log from one of the nodes running the older >> >> >> (smaller) new gen, including a promotion failure: >> >> >> ------------------------- >> >> >> 2012-03-15T18:32:17.785+0100: 796604.225: [GC 796604.225: [ParNew >> >> >> Desired survivor size 20971520 bytes, new threshold 8 (max 15) >> >> >> - age ? 1: ? ?2927728 bytes, ? ?2927728 total >> >> >> - age ? 2: ? ?2428512 bytes, ? ?5356240 total >> >> >> - age ? 3: ? ?2696376 bytes, ? ?8052616 total >> >> >> - age ? 4: ? ?2623576 bytes, ? 10676192 total >> >> >> - age ? 5: ? ?3365576 bytes, ? 14041768 total >> >> >> - age ? 6: ? ?2792272 bytes, ? 16834040 total >> >> >> - age ? 7: ? ?2233008 bytes, ? 19067048 total >> >> >> - age ? 8: ? ?2263824 bytes, ? 21330872 total >> >> >> : 358709K->29362K(368640K), 0.0461460 secs] >> >> >> 3479492K->3151874K(5201920K), 0.0467320 secs] [Times: user=0.34 >> >> >> sys=0.01, real=0.05 secs] >> >> >> 2012-03-15T18:32:21.546+0100: 796607.986: [GC 796607.986: [ParNew >> >> >> (0: >> >> >> promotion failure size = 25) ?(1: promotion failure size = 25) ?(2: >> >> >> promotion failure size = 25) ?(3: promotion failure size = 25) ?(4: >> >> >> promotion failure size = 25) ?(5 >> >> >> : promotion failure size = 25) ?(6: promotion failure size = 341) >> >> >> ?(7: >> >> >> promotion failure size = 25) ?(promotion failed) >> >> >> Desired survivor size 20971520 bytes, new threshold 8 (max 15) >> >> >> - age ? 1: ? ?3708208 bytes, ? ?3708208 total >> >> >> - age ? 2: ? ?2174384 bytes, ? ?5882592 total >> >> >> - age ? 3: ? ?2383256 bytes, ? ?8265848 total >> >> >> - age ? 4: ? ?2689912 bytes, ? 10955760 total >> >> >> - age ? 5: ? ?2621832 bytes, ? 13577592 total >> >> >> - age ? 6: ? ?3360440 bytes, ? 16938032 total >> >> >> - age ? 7: ? ?2784136 bytes, ? 19722168 total >> >> >> - age ? 8: ? ?2220232 bytes, ? 21942400 total >> >> >> : 357042K->356456K(368640K), 0.2734100 secs]796608.259: [CMS: >> >> >> 3124189K->516640K(4833280K), 6.8127070 secs] >> >> >> 3479554K->516640K(5201920K), [CMS Perm : 142423K->142284K(262144K)], >> >> >> 7.0867850 secs] [Times: user=7.32 sys=0.07, real=7.09 secs] >> >> >> 2012-03-15T18:32:30.279+0100: 796616.719: [GC 796616.720: [ParNew >> >> >> Desired survivor size 20971520 bytes, new threshold 1 (max 15) >> >> >> - age ? 1: ? 
29721456 bytes, ? 29721456 total >> >> >> : 327680K->40960K(368640K), 0.0403130 secs] >> >> >> 844320K->557862K(5201920K), 0.0409070 secs] [Times: user=0.27 >> >> >> sys=0.01, real=0.04 secs] >> >> >> 2012-03-15T18:32:32.701+0100: 796619.141: [GC 796619.141: [ParNew >> >> >> Desired survivor size 20971520 bytes, new threshold 15 (max 15) >> >> >> - age ? 1: ? 10310176 bytes, ? 10310176 total >> >> >> ------------------------- >> >> >> >> >> >> For contrast, here's a gc log fragment from the single node running >> >> >> the larger new gen and larger survivor spaces: >> >> >> (the fragment is from the same point in time, with the nodes >> >> >> experiencing equal load) >> >> >> ------------------------- >> >> >> 2012-03-15T18:32:12.067+0100: 797119.336: [GC 797119.336: [ParNew >> >> >> Desired survivor size 69894144 bytes, new threshold 15 (max 15) >> >> >> - age ? 1: ? ?5611536 bytes, ? ?5611536 total >> >> >> - age ? 2: ? ?3731888 bytes, ? ?9343424 total >> >> >> - age ? 3: ? ?3450672 bytes, ? 12794096 total >> >> >> - age ? 4: ? ?3314744 bytes, ? 16108840 total >> >> >> - age ? 5: ? ?3459888 bytes, ? 19568728 total >> >> >> - age ? 6: ? ?3334712 bytes, ? 22903440 total >> >> >> - age ? 7: ? ?3671960 bytes, ? 26575400 total >> >> >> - age ? 8: ? ?3841608 bytes, ? 30417008 total >> >> >> - age ? 9: ? ?2035392 bytes, ? 32452400 total >> >> >> - age ?10: ? ?1975056 bytes, ? 34427456 total >> >> >> - age ?11: ? ?2021344 bytes, ? 36448800 total >> >> >> - age ?12: ? ?1520752 bytes, ? 37969552 total >> >> >> - age ?13: ? ?1494176 bytes, ? 39463728 total >> >> >> - age ?14: ? ?2355136 bytes, ? 41818864 total >> >> >> - age ?15: ? ?1279000 bytes, ? 43097864 total >> >> >> : 603473K->61640K(682688K), 0.0756570 secs] >> >> >> 3373284K->2832383K(5106368K), 0.0762090 secs] [Times: user=0.56 >> >> >> sys=0.00, real=0.08 secs] >> >> >> 2012-03-15T18:32:18.200+0100: 797125.468: [GC 797125.469: [ParNew >> >> >> Desired survivor size 69894144 bytes, new threshold 15 (max 15) >> >> >> - age ? 1: ? ?6101320 bytes, ? ?6101320 total >> >> >> - age ? 2: ? ?4446776 bytes, ? 10548096 total >> >> >> - age ? 3: ? ?3701384 bytes, ? 14249480 total >> >> >> - age ? 4: ? ?3438488 bytes, ? 17687968 total >> >> >> - age ? 5: ? ?3295360 bytes, ? 20983328 total >> >> >> - age ? 6: ? ?3403320 bytes, ? 24386648 total >> >> >> - age ? 7: ? ?3323368 bytes, ? 27710016 total >> >> >> - age ? 8: ? ?3665760 bytes, ? 31375776 total >> >> >> - age ? 9: ? ?2427904 bytes, ? 33803680 total >> >> >> - age ?10: ? ?1418656 bytes, ? 35222336 total >> >> >> - age ?11: ? ?1955192 bytes, ? 37177528 total >> >> >> - age ?12: ? ?2006064 bytes, ? 39183592 total >> >> >> - age ?13: ? ?1520768 bytes, ? 40704360 total >> >> >> - age ?14: ? ?1493728 bytes, ? 42198088 total >> >> >> - age ?15: ? ?2354376 bytes, ? 44552464 total >> >> >> : 607816K->62650K(682688K), 0.0779270 secs] >> >> >> 3378559K->2834643K(5106368K), 0.0784690 secs] [Times: user=0.58 >> >> >> sys=0.00, real=0.08 secs] >> >> >> ------------------------- >> >> >> >> >> >> Questions: >> >> >> >> >> >> 1) From the tenuring distributions, it seems that the application >> >> >> benefits from larger new gen and survivor spaces. >> >> >> The next thing we'll try is to run with -Xmn1g -XX:SurvivorRatio=2, >> >> >> and see if the ParNew times are still acceptable. >> >> >> Does this seem a sensible approach in this context? >> >> >> Are there other variables beyond ParNew times that limit scaling the >> >> >> new gen to a large size? 
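For reference, a rough sketch of how those settings translate into space sizes (HotSpot carves the young gen into an eden and two survivor spaces with eden:survivor = SurvivorRatio:1; the figures below ignore alignment and rounding):
-----
-Xmn800m -XX:SurvivorRatio=4  ->  eden ~533m, two survivors of ~133m each
-Xmn1g   -XX:SurvivorRatio=2  ->  eden  512m, two survivors of  256m each
-----
The first line is consistent with the logs above: eden plus one survivor comes to ~667m, matching the 682688K young gen capacity, and the "Desired survivor size 69894144 bytes" figure is 50% (the default TargetSurvivorRatio) of a ~133m survivor space.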
>> >> >> >> >> >> 2) Given the object age demographics inherent to our application, we >> >> >> can not expect to see the majority of data get collected in the new >> >> >> gen. >> >> >> >> >> >> Our approach to fight the promotion failures consists of three >> >> >> aspects: >> >> >> a) Lower the overall allocation rate of our application (by >> >> >> improving >> >> >> wasteful hotspots), to decrease overall ParNew collection frequency. >> >> >> b) Configure the new gen and survivor spaces as large as possible, >> >> >> keeping an eye on ParNew times and overall new/tenured ratio. >> >> >> c) Try to refactor the data structures that form the bulk of >> >> >> promoted >> >> >> data, to retain only the strictly required subgraphs. >> >> >> >> >> >> Is there anything else I can try or measure, in order to better >> >> >> understand the problem? >> >> >> >> >> >> Thanks in advance, >> >> >> Taras >> >> >> >> >> >> >> >> >> On Wed, Feb 22, 2012 at 10:51 AM, Taras Tielkes >> >> >> wrote: >> >> >>> (this time properly responding to the list alias) >> >> >>> Hi Srinivas, >> >> >>> >> >> >>> We're running 1.6.0 u29 on Linux x64. My understanding is that >> >> >>> CompressedOops is enabled by default since u23. >> >> >>> >> >> >>> At least this page seems to support that: >> >> >>> >> >> >>> >> >> >>> http://docs.oracle.com/javase/7/docs/technotes/guides/vm/performance-enhancements-7.html >> >> >>> >> >> >>> Regarding the other remarks (also from Todd and Chi), I'll comment >> >> >>> later. The first thing on my list is to collect >> >> >>> PrintTenuringDistribution data now. >> >> >>> >> >> >>> Kind regards, >> >> >>> Taras >> >> >>> >> >> >>> On Wed, Feb 22, 2012 at 10:50 AM, Taras Tielkes >> >> >>> wrote: >> >> >>>> Hi Srinivas, >> >> >>>> >> >> >>>> We're running 1.6.0 u29 on Linux x64. My understanding is that >> >> >>>> CompressedOops is enabled by default since u23. >> >> >>>> >> >> >>>> At least this page seems to support that: >> >> >>>> >> >> >>>> >> >> >>>> http://docs.oracle.com/javase/7/docs/technotes/guides/vm/performance-enhancements-7.html >> >> >>>> >> >> >>>> Regarding the other remarks (also from Todd and Chi), I'll comment >> >> >>>> later. The first thing on my list is to collect >> >> >>>> PrintTenuringDistribution data now. >> >> >>>> >> >> >>>> Kind regards, >> >> >>>> Taras >> >> >>>> >> >> >>>> On Wed, Feb 22, 2012 at 12:40 AM, Srinivas Ramakrishna >> >> >>>> wrote: >> >> >>>>> I agree that premature promotions are almost always the first and >> >> >>>>> most >> >> >>>>> important thing to fix when running >> >> >>>>> into fragmentation or overload issues with CMS. However, I can >> >> >>>>> also >> >> >>>>> imagine >> >> >>>>> long-lived objects with a highly >> >> >>>>> non-stationary size distribution which can also cause problems >> >> >>>>> for >> >> >>>>> CMS >> >> >>>>> despite best efforts to tune against >> >> >>>>> premature promotion. >> >> >>>>> >> >> >>>>> I didn't think Treas was running with MTT=0, although MTT > 0 is >> >> >>>>> no >> >> >>>>> recipe >> >> >>>>> for avoiding premature promotion >> >> >>>>> with bursty loads that case overflow the survivor spaces -- as >> >> >>>>> you >> >> >>>>> say large >> >> >>>>> survivor spaces with a low >> >> >>>>> TargetSurvivorRatio -- so as to leave plenty of space to >> >> >>>>> absorb/accommodate >> >> >>>>> spiking/bursty loads? is >> >> >>>>> definitely a "best practice" for CMS (and possibly for other >> >> >>>>> concurrent >> >> >>>>> collectors as well). 
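As a concrete, purely illustrative reading of that advice for the configuration discussed in this thread - the 40 below is an assumed example value, the HotSpot default being 50:
-----
-Xmn800m -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=40 -XX:MaxTenuringThreshold=15
-----
A lower TargetSurvivorRatio makes the adaptive tenuring threshold back off earlier, so a burst of surviving objects lands in the slack left in the survivor spaces instead of overflowing straight into the old gen.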
>> >> >>>>> >> >> >>>>> One thing Taras can do to see if premature promotion might be an >> >> >>>>> issue is to >> >> >>>>> look at the tenuring >> >> >>>>> threshold in his case. A rough proxy (if >> >> >>>>> PrintTenuringDistribution >> >> >>>>> is not >> >> >>>>> enabled) is to look at the >> >> >>>>> promotion volume per scavenge. It may be possible, if premature >> >> >>>>> promotion is >> >> >>>>> a cause, to see >> >> >>>>> some kind of medium-term correlation between high promotion >> >> >>>>> volume >> >> >>>>> and >> >> >>>>> eventual promotion >> >> >>>>> failure despite frequent CMS collections. >> >> >>>>> >> >> >>>>> One other point which may or may not be relevant. I see that >> >> >>>>> Taras >> >> >>>>> is not >> >> >>>>> using CompressedOops... >> >> >>>>> Using that alone would greatly decrease memory pressure and >> >> >>>>> provide >> >> >>>>> more >> >> >>>>> breathing room to CMS, >> >> >>>>> which is also almost always a good idea. >> >> >>>>> >> >> >>>>> -- ramki >> >> >>>>> >> >> >>>>> On Tue, Feb 21, 2012 at 10:16 AM, Chi Ho Kwok >> >> >>>>> >> >> >>>>> wrote: >> >> >>>>>> >> >> >>>>>> Hi Teras, >> >> >>>>>> >> >> >>>>>> I think you may want to look into sizing the new and especially >> >> >>>>>> the >> >> >>>>>> survivor spaces differently. We run something similar to what >> >> >>>>>> you >> >> >>>>>> described, >> >> >>>>>> high volume request processing with large dataset loading, and >> >> >>>>>> what >> >> >>>>>> we've >> >> >>>>>> seen at the start is that the survivor spaces are completely >> >> >>>>>> overloaded, >> >> >>>>>> causing premature promotions. >> >> >>>>>> >> >> >>>>>> We've configured our vm with the following goals/guideline: >> >> >>>>>> >> >> >>>>>> old space is for semi-permanent data, living for at least 30s, >> >> >>>>>> average ~10 >> >> >>>>>> minutes >> >> >>>>>> new space contains only temporary and just loaded data >> >> >>>>>> surviving objects from new should never reach old in 1 gc, so >> >> >>>>>> the >> >> >>>>>> survivor >> >> >>>>>> space may never be 100% full >> >> >>>>>> >> >> >>>>>> With jstat -gcutil `pidof java` 2000, we see things like: >> >> >>>>>> >> >> >>>>>> ? S0 ? ? S1 ? ? E ? ? ?O ? ? ?P ? ? YGC ? ? YGCT ? ?FGC ? ?FGCT >> >> >>>>>> GCT >> >> >>>>>> ?70.20 ? 0.00 ?19.65 ?57.60 ?59.90 124808 29474.299 ?2498 >> >> >>>>>> ?191.110 >> >> >>>>>> 29665.409 >> >> >>>>>> ?70.20 ? 0.00 ?92.89 ?57.60 ?59.90 124808 29474.299 ?2498 >> >> >>>>>> ?191.110 >> >> >>>>>> 29665.409 >> >> >>>>>> ?70.20 ? 0.00 ?93.47 ?57.60 ?59.90 124808 29474.299 ?2498 >> >> >>>>>> ?191.110 >> >> >>>>>> 29665.409 >> >> >>>>>> ? 0.00 ?65.69 ?78.07 ?58.09 ?59.90 124809 29474.526 ?2498 >> >> >>>>>> ?191.110 >> >> >>>>>> 29665.636 >> >> >>>>>> ?84.97 ? 0.00 ?48.19 ?58.57 ?59.90 124810 29474.774 ?2498 >> >> >>>>>> ?191.110 >> >> >>>>>> 29665.884 >> >> >>>>>> ?84.97 ? 0.00 ?81.30 ?58.57 ?59.90 124810 29474.774 ?2498 >> >> >>>>>> ?191.110 >> >> >>>>>> 29665.884 >> >> >>>>>> ? 0.00 ?62.64 ?27.22 ?59.12 ?59.90 124811 29474.992 ?2498 >> >> >>>>>> ?191.110 >> >> >>>>>> 29666.102 >> >> >>>>>> ? 0.00 ?62.64 ?54.47 ?59.12 ?59.90 124811 29474.992 ?2498 >> >> >>>>>> ?191.110 >> >> >>>>>> 29666.102 >> >> >>>>>> ?75.68 ? 0.00 ? 6.80 ?59.53 ?59.90 124812 29475.228 ?2498 >> >> >>>>>> ?191.110 >> >> >>>>>> 29666.338 >> >> >>>>>> ?75.68 ? 0.00 ?23.38 ?59.53 ?59.90 124812 29475.228 ?2498 >> >> >>>>>> ?191.110 >> >> >>>>>> 29666.338 >> >> >>>>>> ?75.68 ? 
0.00 ?27.72 ?59.53 ?59.90 124812 29475.228 ?2498 >> >> >>>>>> ?191.110 >> >> >>>>>> 29666.338 >> >> >>>>>> >> >> >>>>>> If you follow the lines, you can see Eden fill up to 100% on >> >> >>>>>> line >> >> >>>>>> 4, >> >> >>>>>> surviving objects are copied into S1, S0 is collected and added >> >> >>>>>> 0.49% to >> >> >>>>>> Old. On line 5, another GC happened, with Eden->S0, S1->Old, >> >> >>>>>> etc. >> >> >>>>>> No objects >> >> >>>>>> is ever transferred from Eden to Old, unless there's a huge peak >> >> >>>>>> of >> >> >>>>>> requests. >> >> >>>>>> >> >> >>>>>> This is with a: 32GB heap, Mxn1200M, SurvivorRatio 2 (600MB >> >> >>>>>> Eden, >> >> >>>>>> 300MB >> >> >>>>>> S0, 300MB S1), MaxTenuringThreshold 1 (whatever is still alive >> >> >>>>>> in >> >> >>>>>> S0/1 on >> >> >>>>>> the second GC is copied to old, don't wait, web requests are >> >> >>>>>> quite >> >> >>>>>> bursty). >> >> >>>>>> With about 1 collection every 2-5 seconds, objects promoted to >> >> >>>>>> Old >> >> >>>>>> must live >> >> >>>>>> for at 4-10 seconds; as that's longer than an average request >> >> >>>>>> (50ms-1s), >> >> >>>>>> none of the temporary data ever makes it into Old, which is much >> >> >>>>>> more >> >> >>>>>> expensive to collect. It works even with a higher than default >> >> >>>>>> CMSInitiatingOccupancyFraction=76 to optimize for space >> >> >>>>>> available >> >> >>>>>> for the >> >> >>>>>> large data cache we have. >> >> >>>>>> >> >> >>>>>> >> >> >>>>>> With your config of 400MB Total new, with 350MB Eden, 25MB S0, >> >> >>>>>> 25MB >> >> >>>>>> S1 >> >> >>>>>> (SurvivorRatio 8), no tenuring threshold, I think loads of new >> >> >>>>>> objects get >> >> >>>>>> copied from Eden to Old directly, causing trouble for the CMS. >> >> >>>>>> You >> >> >>>>>> can use >> >> >>>>>> jstat to get live stats and tweak until it doesn't happen. If >> >> >>>>>> you >> >> >>>>>> can't make >> >> >>>>>> changes on live that easil, try doubling the new size indeed, >> >> >>>>>> with >> >> >>>>>> a 400 >> >> >>>>>> Eden, 200 S0, 200 S1 and?MaxTenuringThreshold?1 setting. It's >> >> >>>>>> probably >> >> >>>>>> overkill, but if should solve the problem if it is caused by >> >> >>>>>> premature >> >> >>>>>> promotion. >> >> >>>>>> >> >> >>>>>> >> >> >>>>>> Chi Ho Kwok >> >> >>>>>> >> >> >>>>>> >> >> >>>>>> On Tue, Feb 21, 2012 at 5:55 PM, Taras Tielkes >> >> >>>>>> >> >> >>>>>> wrote: >> >> >>>>>>> >> >> >>>>>>> Hi, >> >> >>>>>>> >> >> >>>>>>> We've removed the "-XX:+CMSScavengeBeforeRemark" setting from >> >> >>>>>>> 50% >> >> >>>>>>> of >> >> >>>>>>> our production nodes. >> >> >>>>>>> After running for a few weeks, it seems that there's no impact >> >> >>>>>>> from >> >> >>>>>>> removing this option. >> >> >>>>>>> Which is good, since it seems we can remove it from the other >> >> >>>>>>> nodes as >> >> >>>>>>> well, simplifying our overall JVM configuration ;-) >> >> >>>>>>> >> >> >>>>>>> However, we're still seeing promotion failures on all nodes, >> >> >>>>>>> once >> >> >>>>>>> every day or so. 
>> >> >>>>>>> >> >> >>>>>>> There's still the "Magic 1026": this accounts for ~60% of the >> >> >>>>>>> promotion failures that we're seeing (single ParNew thread >> >> >>>>>>> thread, >> >> >>>>>>> 1026 failure size): >> >> >>>>>>> -------------------- >> >> >>>>>>> 2012-02-06T09:13:51.806+0100: 328095.085: [GC 328095.086: >> >> >>>>>>> [ParNew: >> >> >>>>>>> 359895K->29357K(368640K), 0.0429070 secs] >> >> >>>>>>> 3471021K->3143476K(5201920K), 0.0434950 secs] [Times: user=0.32 >> >> >>>>>>> sys=0.00, real=0.04 secs] >> >> >>>>>>> 2012-02-06T09:13:55.922+0100: 328099.201: [GC 328099.201: >> >> >>>>>>> [ParNew: >> >> >>>>>>> 357037K->31817K(368640K), 0.0429130 secs] >> >> >>>>>>> 3471156K->3148946K(5201920K), 0.0434930 secs] [Times: user=0.31 >> >> >>>>>>> sys=0.00, real=0.04 secs] >> >> >>>>>>> 2012-02-06T09:13:59.044+0100: 328102.324: [GC 328102.324: >> >> >>>>>>> [ParNew >> >> >>>>>>> (promotion failure size = 1026) ?(promotion failed): >> >> >>>>>>> 359497K->368640K(368640K), 0.2226790 secs]328102.547: [CMS: >> >> >>>>>>> 3125609K->451515K(4833280K), 5.6225880 secs] 3476626K->4515 >> >> >>>>>>> 15K(5201920K), [CMS Perm : 124373K->124353K(262144K)], >> >> >>>>>>> 5.8459380 >> >> >>>>>>> secs] >> >> >>>>>>> [Times: user=6.20 sys=0.01, real=5.85 secs] >> >> >>>>>>> 2012-02-06T09:14:05.243+0100: 328108.522: [GC 328108.523: >> >> >>>>>>> [ParNew: >> >> >>>>>>> 327680K->40960K(368640K), 0.0319160 secs] >> >> >>>>>>> 779195K->497658K(5201920K), >> >> >>>>>>> 0.0325360 secs] [Times: user=0.21 sys=0.01, real=0.03 secs] >> >> >>>>>>> 2012-02-06T09:14:07.836+0100: 328111.116: [GC 328111.116: >> >> >>>>>>> [ParNew: >> >> >>>>>>> 368640K->32785K(368640K), 0.0744670 secs] >> >> >>>>>>> 825338K->520234K(5201920K), >> >> >>>>>>> 0.0750390 secs] [Times: user=0.40 sys=0.02, real=0.08 secs] >> >> >>>>>>> -------------------- >> >> >>>>>>> Given the 1026 word size, I'm wondering if I should be hunting >> >> >>>>>>> for >> >> >>>>>>> an >> >> >>>>>>> overuse of BufferedInputStream/BufferedOutoutStream, since both >> >> >>>>>>> have >> >> >>>>>>> 8192 as a default buffer size. 
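A quick sanity check on that hunch: the failure sizes in these messages are heap words (8 bytes each on this 64-bit JVM), and, assuming the 16-byte array header of a 64-bit HotSpot running with compressed oops, a default-sized stream buffer works out to exactly that size:
-----
byte[8192]:  16 bytes header + 8192 bytes data = 8208 bytes = 1026 words
-----
So the arithmetic is at least consistent with the 8K-buffer theory; a heap dump or allocation profiling would be needed to confirm it.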
>> >> >>>>>>> >> >> >>>>>>> The second group of promotion failures look like this (multiple >> >> >>>>>>> ParNew >> >> >>>>>>> threads, small failure sizes): >> >> >>>>>>> -------------------- >> >> >>>>>>> 2012-02-06T09:50:15.773+0100: 328756.964: [GC 328756.964: >> >> >>>>>>> [ParNew: >> >> >>>>>>> 356116K->29934K(368640K), 0.0461100 secs] >> >> >>>>>>> 3203863K->2880162K(5201920K), 0.0468870 secs] [Times: user=0.34 >> >> >>>>>>> sys=0.01, real=0.05 secs] >> >> >>>>>>> 2012-02-06T09:50:19.153+0100: 328760.344: [GC 328760.344: >> >> >>>>>>> [ParNew: >> >> >>>>>>> 357614K->30359K(368640K), 0.0454680 secs] >> >> >>>>>>> 3207842K->2882892K(5201920K), 0.0462280 secs] [Times: user=0.33 >> >> >>>>>>> sys=0.01, real=0.05 secs] >> >> >>>>>>> 2012-02-06T09:50:22.658+0100: 328763.849: [GC 328763.849: >> >> >>>>>>> [ParNew >> >> >>>>>>> (1: >> >> >>>>>>> promotion failure size = 25) ?(4: promotion failure size = 25) >> >> >>>>>>> ?(6: >> >> >>>>>>> promotion failure size = 25) ?(7: promotion failure size = 144) >> >> >>>>>>> (promotion failed): 358039K->358358 >> >> >>>>>>> K(368640K), 0.2148680 secs]328764.064: [CMS: >> >> >>>>>>> 2854709K->446750K(4833280K), 5.8368270 secs] >> >> >>>>>>> 3210572K->446750K(5201920K), [CMS Perm : >> >> >>>>>>> 124670K->124644K(262144K)], >> >> >>>>>>> 6.0525230 secs] [Times: user=6.32 sys=0.00, real=6.05 secs] >> >> >>>>>>> 2012-02-06T09:50:29.896+0100: 328771.086: [GC 328771.087: >> >> >>>>>>> [ParNew: >> >> >>>>>>> 327680K->22569K(368640K), 0.0227080 secs] >> >> >>>>>>> 774430K->469319K(5201920K), >> >> >>>>>>> 0.0235020 secs] [Times: user=0.16 sys=0.00, real=0.02 secs] >> >> >>>>>>> 2012-02-06T09:50:31.076+0100: 328772.266: [GC 328772.267: >> >> >>>>>>> [ParNew: >> >> >>>>>>> 350249K->22264K(368640K), 0.0235480 secs] >> >> >>>>>>> 796999K->469014K(5201920K), >> >> >>>>>>> 0.0243000 secs] [Times: user=0.18 sys=0.01, real=0.02 secs] >> >> >>>>>>> -------------------- >> >> >>>>>>> >> >> >>>>>>> We're going to try to double the new size on a single node, to >> >> >>>>>>> see >> >> >>>>>>> the >> >> >>>>>>> effects of that. >> >> >>>>>>> >> >> >>>>>>> Beyond this experiment, is there any additional data I can >> >> >>>>>>> collect >> >> >>>>>>> to >> >> >>>>>>> better understand the nature of the promotion failures? >> >> >>>>>>> Am I facing collecting free list statistics at this point? >> >> >>>>>>> >> >> >>>>>>> Thanks, >> >> >>>>>>> Taras >> >> >>>>>> >> >> >>>>>> >> >> >>>>>> _______________________________________________ >> >> >>>>>> hotspot-gc-use mailing list >> >> >>>>>> hotspot-gc-use at openjdk.java.net >> >> >>>>>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> >>>>>> >> >> >>>>> >> >> >> _______________________________________________ >> >> >> hotspot-gc-use mailing list >> >> >> hotspot-gc-use at openjdk.java.net >> >> >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> _______________________________________________ >> >> hotspot-gc-use mailing list >> >> hotspot-gc-use at openjdk.java.net >> >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> > >> > >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > From chkwok at digibites.nl Sun Apr 15 10:11:34 2012 From: chkwok at digibites.nl (Chi Ho Kwok) Date: Sun, 15 Apr 2012 19:11:34 +0200 Subject: Promotion failures: indication of CMS fragmentation? 
In-Reply-To: References: <4EF9FCAC.3030208@oracle.com> <4F06A270.3010701@oracle.com> <4F0DBEC4.7040907@oracle.com> <4F1ECE7B.3040502@oracle.com> <4F1F2ED7.6060308@oracle.com> <4F20F78D.9070905@oracle.com> Message-ID: Hi Teras, Hmm, it looks like it failed even tho there's tons of space available, 2.4G used, 1.8G free out of 4.2G CMS old gen. Or am I reading the next line wrong? (snipped age histogram) [GC 3296267.500: [ParNew (1: promotion failure size = 16) (2: promotion failure size = 56) (4: promotion failure size = 342) (5: promotion failure size = 1026) (6: promotion failure size = 278) (promotion failed): 568902K->568678K(682688K), 0.3130580 secs]3296267.813: *[CMS: 2510091K->573489K(4423680K), 7.7481330 secs]* 3078184K->573489K(5106368K), [CMS Perm : 144002K->143970K(262144K)], 8.0619690 secs] [Times: user=8.35 sys=0.01, real=8.06 secs] Normally I'd say fragmentation, but how can it not find a spot for a 16 bytes chunk? I'm completely out of ideas now - anyone else? Here's a brute force "solution": what the app "needs" is 600M of live data in the old gen, that's what left usually after collection. Increase "safety margin" by adding memory to the old gen pool if possible by increasing total heap size, and set initial occupancy ratio to a silly low number like 45%. Hopefully, it will survive until the next software/jvm/kernel patch that requires a restart of the service or machine. I've seen something similar in our logs as well, with 19%/2.9GB free, my guess is that CMS needs a few GB to play with... Nowadays we run with a larger safety margin, doubled the heap on that machine to 32GB, I haven't seen any CMS and promotion failures since then (Jan 2010). 128265.354: [GC 128265.355: [ParNew (promotion failed): 589631K->589631K(589824K), 0.3582340 secs]128265.713: *[CMS: 12965822K->10393148K(15990784K), 20.9654520 secs]*13462337K->10393148K(16580608K), [CMS Perm : 20604K->16846K(34456K)], 21.3239890 secs] [Times: user=22.06 sys=0.09, real=21.32 secs] Regards, Chi Ho On Sun, Apr 15, 2012 at 6:41 PM, Taras Tielkes wrote: > Hi Chi, > > I've sent you a decent chunk of the gc.log file off-list (hopefully > not too large). > > For completeness, we're running with the following options (ignoring > the diagnostic ones): > ----- > -server > -Xms5g > -Xmx5g > -Xmn800m > -XX:PermSize=256m > -XX:MaxPermSize=256m > -XX:SurvivorRatio=4 > -XX:+UseConcMarkSweepGC > -XX:+UseParNewGC > -XX:+DisableExplicitGC > -XX:+UseCMSInitiatingOccupancyOnly > -XX:+CMSClassUnloadingEnabled > -XX:CMSInitiatingOccupancyFraction=68 > ----- > Platform is Java 6u29 running on Linux 2.6 x64. > Hardware is 2xquad Xeons, but pretty old ones (pre-Nehalem, no QPI). > > The gc logs will (typically) show big peaks at the start and end of > the working day - this is nature of the domain our application > targets. > > I would expect the live set to be below 1G (usually below 600M even). > However, we can experience temporary spikes of higher volume > longer-living object allocation bursts. > > We'll set up a jstat log for this machine. I do have historical jstat > logs for one of the other machines, but that one is still running with > a smaller new gen, and smaller survivor spaces. If there's any other > relevant data that I can collect, let me know. > > Kind regards, > Taras > > On Sun, Apr 15, 2012 at 6:15 PM, Chi Ho Kwok wrote: > > Hi Teras, > > > > Sure thing. Just the previous CMS should be enough, it doesn't matter if > > there is 10 or 1000 parnew's between that and the failure. 
> > > > As for the jstat failure, it looks like it looks in > > /tmp/hsperfdata_[username] for the pid by default, maybe something > > like -J-Djava.io.tmpdir=[path, like /app/client/program/tomcat/temp] can > > help; and from what I've seen, running jstat as the same user as the > process > > or root is required. Historical data is nice to have, but even just > staring > > at it for 15 minutes should give you a hint for the old gen usage. > > > > If the collection starts at 68, takes a while and the heap fills to 80%+ > > before it's done when it's not busy, it's probably wise to lower the > initial > > occupancy factor or increase the thread count so it completes faster. We > run > > with -XX:ParallelCMSThreads=3 on a 8 hw thread server, the default (2) > was > > too slow for us as we run with 76%, it still takes 15s on average for > CMS to > > scan and clean the old gen (while old gen grows to up to 80% full), much > > longer can mean a promotion failure during request spikes. > > > > > > Chi Ho Kwok > > > > > > On Sun, Apr 15, 2012 at 5:08 PM, Taras Tielkes > > wrote: > >> > >> Hi Chi, > >> > >> Is it o.k. if I send this off-list to you directly? If so, how much > >> more do you need? Just enough to cover the previous CMS? > >> We're running with -XX:CMSInitiatingOccupancyFraction=68 and > >> -XX:+UseCMSInitiatingOccupancyOnly, by the way. > >> > >> I do have shell access, however, on that particular machine we're > >> experiencing the "process id not found" issue with jstat. > >> I think this can be worked around by fiddling with temp directory > >> options, but we haven't tried that yet. > >> Regarding the jstat output, I assume this would be most valuable to > >> have for the exact moment when the promotion failure happens, correct? > >> If so, we can try to set up jstat to run in the background > >> continuously, to have more diagnostic data in the future. > >> > >> Kind regards, > >> Taras > >> > >> On Sun, Apr 15, 2012 at 2:48 PM, Chi Ho Kwok > wrote: > >> > Hi Teras, > >> > > >> > Can you send me a larger chunk of the log? I'm interested in seeing > when > >> > the > >> > last CMS was run and what it freed. Maybe it's kicking in too late, > the > >> > full > >> > GC triggered by promotion failure only found 600M live data, rest was > >> > garbage. If that's the cause, > lowering XX:CMSInitiatingOccupancyFraction > >> > can > >> > help. > >> > > >> > Also, do you have shell access to that machine? If so, try running > >> > jstat, > >> > you can see the usage of all generations live as it happens. > >> > > >> > > >> > Chi Ho Kwok > >> > > >> > On Sun, Apr 15, 2012 at 2:34 PM, Taras Tielkes < > taras.tielkes at gmail.com> > >> > wrote: > >> >> > >> >> Hi Chi, Srinivas, > >> >> > >> >> Optimizing the cost of ParNew (by lowering MTT) would be nice, but > for > >> >> now my priority is still to minimize the promotion failures. > >> >> > >> >> For example, on the machine running CMS with the "larger" young gen > >> >> and survivor spaces (-Xmx5g -Xmn800 -XX:SurvivorRatio=4), I've just > >> >> seen a promotion failure again. Below is a snippet of gc.log showing > >> >> this. > >> >> To put this into perspective, this is a first promotion failure on > >> >> that machine in a couple of weeks. Still, zero failures would beat a > >> >> single failure, since the clients connecting to this application will > >> >> only wait a few seconds before timing out and terminating the > >> >> connection. In addition, the promotion failures are occurring in peak > >> >> usage moments. 
> >> >> > >> >> Apart from trying to eliminate the promotion failure pauses, my main > >> >> goal is to learn how to understand the root cause in a case like > this. > >> >> Any suggestions for things to try or read up on are appreciated. > >> >> > >> >> Kind regards, > >> >> Taras > >> >> ------------------------------------------------ > >> >> 2012-04-13T17:44:27.777+0200: 3296255.045: [GC 3296255.046: [ParNew > >> >> Desired survivor size 69894144 bytes, new threshold 15 (max 15) > >> >> - age 1: 3684448 bytes, 3684448 total > >> >> - age 2: 824984 bytes, 4509432 total > >> >> - age 3: 885120 bytes, 5394552 total > >> >> - age 4: 756568 bytes, 6151120 total > >> >> - age 5: 696880 bytes, 6848000 total > >> >> - age 6: 890688 bytes, 7738688 total > >> >> - age 7: 2631184 bytes, 10369872 total > >> >> - age 8: 719976 bytes, 11089848 total > >> >> - age 9: 724944 bytes, 11814792 total > >> >> - age 10: 750360 bytes, 12565152 total > >> >> - age 11: 934944 bytes, 13500096 total > >> >> - age 12: 521080 bytes, 14021176 total > >> >> - age 13: 543392 bytes, 14564568 total > >> >> - age 14: 906616 bytes, 15471184 total > >> >> - age 15: 504008 bytes, 15975192 total > >> >> : 568932K->22625K(682688K), 0.0410180 secs] > >> >> 3077079K->2531413K(5106368K), 0.0416940 secs] [Times: user=0.30 > >> >> sys=0.01, real=0.05 secs] > >> >> 2012-04-13T17:44:33.893+0200: 3296261.162: [GC 3296261.162: [ParNew > >> >> Desired survivor size 69894144 bytes, new threshold 15 (max 15) > >> >> - age 1: 2975896 bytes, 2975896 total > >> >> - age 2: 742592 bytes, 3718488 total > >> >> - age 3: 812864 bytes, 4531352 total > >> >> - age 4: 873488 bytes, 5404840 total > >> >> - age 5: 746128 bytes, 6150968 total > >> >> - age 6: 685192 bytes, 6836160 total > >> >> - age 7: 888376 bytes, 7724536 total > >> >> - age 8: 2621688 bytes, 10346224 total > >> >> - age 9: 715608 bytes, 11061832 total > >> >> - age 10: 723336 bytes, 11785168 total > >> >> - age 11: 749856 bytes, 12535024 total > >> >> - age 12: 914632 bytes, 13449656 total > >> >> - age 13: 520944 bytes, 13970600 total > >> >> - age 14: 543224 bytes, 14513824 total > >> >> - age 15: 906040 bytes, 15419864 total > >> >> : 568801K->22726K(682688K), 0.0447800 secs] > >> >> 3077589K->2532008K(5106368K), 0.0454710 secs] [Times: user=0.33 > >> >> sys=0.00, real=0.05 secs] > >> >> 2012-04-13T17:44:40.231+0200: 3296267.499: [GC 3296267.500: [ParNew > >> >> (1: promotion failure size = 16) (2: promotion failure size = 56) > >> >> (4: promotion failure > >> >> size = 342) (5: promotion failure size = 1026) (6: promotion > failure > >> >> size = 278) (promotion failed) > >> >> Desired survivor size 69894144 bytes, new threshold 15 (max 15) > >> >> - age 1: 2436840 bytes, 2436840 total > >> >> - age 2: 1625136 bytes, 4061976 total > >> >> - age 3: 691664 bytes, 4753640 total > >> >> - age 4: 799992 bytes, 5553632 total > >> >> - age 5: 858344 bytes, 6411976 total > >> >> - age 6: 730200 bytes, 7142176 total > >> >> - age 7: 680072 bytes, 7822248 total > >> >> - age 8: 885960 bytes, 8708208 total > >> >> - age 9: 2618544 bytes, 11326752 total > >> >> - age 10: 709168 bytes, 12035920 total > >> >> - age 11: 714576 bytes, 12750496 total > >> >> - age 12: 734976 bytes, 13485472 total > >> >> - age 13: 905048 bytes, 14390520 total > >> >> - age 14: 520320 bytes, 14910840 total > >> >> - age 15: 543056 bytes, 15453896 total > >> >> : 568902K->568678K(682688K), 0.3130580 secs]3296267.813: [CMS: > >> >> 2510091K->573489K(4423680K), 7.7481330 secs] > >> >> 
3078184K->573489K(5106368K), [CMS Perm : 144002K-> > >> >> 143970K(262144K)], 8.0619690 secs] [Times: user=8.35 sys=0.01, > >> >> real=8.06 > >> >> secs] > >> >> 2012-04-13T17:44:51.337+0200: 3296278.606: [GC 3296278.606: [ParNew > >> >> Desired survivor size 69894144 bytes, new threshold 15 (max 15) > >> >> - age 1: 33717528 bytes, 33717528 total > >> >> : 546176K->43054K(682688K), 0.0515990 secs] > >> >> 1119665K->616543K(5106368K), 0.0523550 secs] [Times: user=0.34 > >> >> sys=0.00, real=0.05 secs] > >> >> ------------------------------------------------ > >> >> > >> >> On Tue, Mar 20, 2012 at 10:12 PM, Srinivas Ramakrishna > >> >> wrote: > >> >> > As Chi-ho noted, about 3-4 MB of data does get promoted per > scavenge, > >> >> > after having > >> >> > sloshed around in your survivor spaces some 15 times. I'd venture > >> >> > that > >> >> > whatever winnowing > >> >> > of young objects was to ocur has in fact occured already within the > >> >> > first 3-4 scavenges that > >> >> > an object has survived, after which the drop-off in population is > >> >> > less > >> >> > sharp. So I'd suggest > >> >> > lowering the MTT to about 3, while leaving the survivor ratio > intact. > >> >> > That should reduce your > >> >> > copying costs and bring down your scavenge pauses further, while > not > >> >> > adversely affecting > >> >> > your promotion rates (and concomitantly the fragmentation). > >> >> > > >> >> > One thing that was a bit puzzling about the stats below was that > >> >> > you'd > >> >> > expect the volume > >> >> > of generation X in scavenge N to be no less than the volume of > >> >> > generation X+1 in scavenge N+1, > >> >> > but occasionally that natural invariant does not appear to hold, > >> >> > which > >> >> > is quite puzzling -- > >> >> > indicating perhaps that either ages or populations are not being > >> >> > correctly tracked. > >> >> > > >> >> > I don't know if anyone else has noticed that in their tenuring > >> >> > distributions as well.... > >> >> > > >> >> > -- ramki > >> >> > > >> >> > On Tue, Mar 20, 2012 at 9:36 AM, Taras Tielkes > >> >> > > >> >> > wrote: > >> >> >> Hi, > >> >> >> > >> >> >> I've collected -XX:+PrintTenuringDistribution data from a node in > >> >> >> our > >> >> >> production environment, running -Xmx5g -Xmn400m > -XX:SurvivorRatio=8. > >> >> >> On one other production node, we've configured a larger new gen, > and > >> >> >> larger survivor spaces (-Xmx5g -Xmn800m -XX:SurvivorRatio=4). > >> >> >> This node has -XX:+PrintTenuringDistribution logging as well. > >> >> >> > >> >> >> The node running the larger new gen and survivor spaces has not > run > >> >> >> into a promotion failure yet, while the ones still running the old > >> >> >> config have hit a few. > >> >> >> The promotion failures are typically experienced at high load > >> >> >> periods, > >> >> >> which makes sense, as allocation and promotion will experience a > >> >> >> spike > >> >> >> in those periods as well. > >> >> >> > >> >> >> The inherent nature of the application implies relatively long > >> >> >> sessions (towards a few hours), retaining a fair amout of state up > >> >> >> to > >> >> >> an hour. > >> >> >> I believe this is the main reason of the relatively high promotion > >> >> >> rate we're experiencing. 
> >> >> >> > >> >> >> > >> >> >> Here's a fragment of gc log from one of the nodes running the > older > >> >> >> (smaller) new gen, including a promotion failure: > >> >> >> ------------------------- > >> >> >> 2012-03-15T18:32:17.785+0100: 796604.225: [GC 796604.225: [ParNew > >> >> >> Desired survivor size 20971520 bytes, new threshold 8 (max 15) > >> >> >> - age 1: 2927728 bytes, 2927728 total > >> >> >> - age 2: 2428512 bytes, 5356240 total > >> >> >> - age 3: 2696376 bytes, 8052616 total > >> >> >> - age 4: 2623576 bytes, 10676192 total > >> >> >> - age 5: 3365576 bytes, 14041768 total > >> >> >> - age 6: 2792272 bytes, 16834040 total > >> >> >> - age 7: 2233008 bytes, 19067048 total > >> >> >> - age 8: 2263824 bytes, 21330872 total > >> >> >> : 358709K->29362K(368640K), 0.0461460 secs] > >> >> >> 3479492K->3151874K(5201920K), 0.0467320 secs] [Times: user=0.34 > >> >> >> sys=0.01, real=0.05 secs] > >> >> >> 2012-03-15T18:32:21.546+0100: 796607.986: [GC 796607.986: [ParNew > >> >> >> (0: > >> >> >> promotion failure size = 25) (1: promotion failure size = 25) > (2: > >> >> >> promotion failure size = 25) (3: promotion failure size = 25) > (4: > >> >> >> promotion failure size = 25) (5 > >> >> >> : promotion failure size = 25) (6: promotion failure size = 341) > >> >> >> (7: > >> >> >> promotion failure size = 25) (promotion failed) > >> >> >> Desired survivor size 20971520 bytes, new threshold 8 (max 15) > >> >> >> - age 1: 3708208 bytes, 3708208 total > >> >> >> - age 2: 2174384 bytes, 5882592 total > >> >> >> - age 3: 2383256 bytes, 8265848 total > >> >> >> - age 4: 2689912 bytes, 10955760 total > >> >> >> - age 5: 2621832 bytes, 13577592 total > >> >> >> - age 6: 3360440 bytes, 16938032 total > >> >> >> - age 7: 2784136 bytes, 19722168 total > >> >> >> - age 8: 2220232 bytes, 21942400 total > >> >> >> : 357042K->356456K(368640K), 0.2734100 secs]796608.259: [CMS: > >> >> >> 3124189K->516640K(4833280K), 6.8127070 secs] > >> >> >> 3479554K->516640K(5201920K), [CMS Perm : > 142423K->142284K(262144K)], > >> >> >> 7.0867850 secs] [Times: user=7.32 sys=0.07, real=7.09 secs] > >> >> >> 2012-03-15T18:32:30.279+0100: 796616.719: [GC 796616.720: [ParNew > >> >> >> Desired survivor size 20971520 bytes, new threshold 1 (max 15) > >> >> >> - age 1: 29721456 bytes, 29721456 total > >> >> >> : 327680K->40960K(368640K), 0.0403130 secs] > >> >> >> 844320K->557862K(5201920K), 0.0409070 secs] [Times: user=0.27 > >> >> >> sys=0.01, real=0.04 secs] > >> >> >> 2012-03-15T18:32:32.701+0100: 796619.141: [GC 796619.141: [ParNew > >> >> >> Desired survivor size 20971520 bytes, new threshold 15 (max 15) > >> >> >> - age 1: 10310176 bytes, 10310176 total > >> >> >> ------------------------- > >> >> >> > >> >> >> For contrast, here's a gc log fragment from the single node > running > >> >> >> the larger new gen and larger survivor spaces: > >> >> >> (the fragment is from the same point in time, with the nodes > >> >> >> experiencing equal load) > >> >> >> ------------------------- > >> >> >> 2012-03-15T18:32:12.067+0100: 797119.336: [GC 797119.336: [ParNew > >> >> >> Desired survivor size 69894144 bytes, new threshold 15 (max 15) > >> >> >> - age 1: 5611536 bytes, 5611536 total > >> >> >> - age 2: 3731888 bytes, 9343424 total > >> >> >> - age 3: 3450672 bytes, 12794096 total > >> >> >> - age 4: 3314744 bytes, 16108840 total > >> >> >> - age 5: 3459888 bytes, 19568728 total > >> >> >> - age 6: 3334712 bytes, 22903440 total > >> >> >> - age 7: 3671960 bytes, 26575400 total > >> >> >> - age 8: 3841608 bytes, 
30417008 total > >> >> >> - age 9: 2035392 bytes, 32452400 total > >> >> >> - age 10: 1975056 bytes, 34427456 total > >> >> >> - age 11: 2021344 bytes, 36448800 total > >> >> >> - age 12: 1520752 bytes, 37969552 total > >> >> >> - age 13: 1494176 bytes, 39463728 total > >> >> >> - age 14: 2355136 bytes, 41818864 total > >> >> >> - age 15: 1279000 bytes, 43097864 total > >> >> >> : 603473K->61640K(682688K), 0.0756570 secs] > >> >> >> 3373284K->2832383K(5106368K), 0.0762090 secs] [Times: user=0.56 > >> >> >> sys=0.00, real=0.08 secs] > >> >> >> 2012-03-15T18:32:18.200+0100: 797125.468: [GC 797125.469: [ParNew > >> >> >> Desired survivor size 69894144 bytes, new threshold 15 (max 15) > >> >> >> - age 1: 6101320 bytes, 6101320 total > >> >> >> - age 2: 4446776 bytes, 10548096 total > >> >> >> - age 3: 3701384 bytes, 14249480 total > >> >> >> - age 4: 3438488 bytes, 17687968 total > >> >> >> - age 5: 3295360 bytes, 20983328 total > >> >> >> - age 6: 3403320 bytes, 24386648 total > >> >> >> - age 7: 3323368 bytes, 27710016 total > >> >> >> - age 8: 3665760 bytes, 31375776 total > >> >> >> - age 9: 2427904 bytes, 33803680 total > >> >> >> - age 10: 1418656 bytes, 35222336 total > >> >> >> - age 11: 1955192 bytes, 37177528 total > >> >> >> - age 12: 2006064 bytes, 39183592 total > >> >> >> - age 13: 1520768 bytes, 40704360 total > >> >> >> - age 14: 1493728 bytes, 42198088 total > >> >> >> - age 15: 2354376 bytes, 44552464 total > >> >> >> : 607816K->62650K(682688K), 0.0779270 secs] > >> >> >> 3378559K->2834643K(5106368K), 0.0784690 secs] [Times: user=0.58 > >> >> >> sys=0.00, real=0.08 secs] > >> >> >> ------------------------- > >> >> >> > >> >> >> Questions: > >> >> >> > >> >> >> 1) From the tenuring distributions, it seems that the application > >> >> >> benefits from larger new gen and survivor spaces. > >> >> >> The next thing we'll try is to run with -Xmn1g > -XX:SurvivorRatio=2, > >> >> >> and see if the ParNew times are still acceptable. > >> >> >> Does this seem a sensible approach in this context? > >> >> >> Are there other variables beyond ParNew times that limit scaling > the > >> >> >> new gen to a large size? > >> >> >> > >> >> >> 2) Given the object age demographics inherent to our application, > we > >> >> >> can not expect to see the majority of data get collected in the > new > >> >> >> gen. > >> >> >> > >> >> >> Our approach to fight the promotion failures consists of three > >> >> >> aspects: > >> >> >> a) Lower the overall allocation rate of our application (by > >> >> >> improving > >> >> >> wasteful hotspots), to decrease overall ParNew collection > frequency. > >> >> >> b) Configure the new gen and survivor spaces as large as possible, > >> >> >> keeping an eye on ParNew times and overall new/tenured ratio. > >> >> >> c) Try to refactor the data structures that form the bulk of > >> >> >> promoted > >> >> >> data, to retain only the strictly required subgraphs. > >> >> >> > >> >> >> Is there anything else I can try or measure, in order to better > >> >> >> understand the problem? > >> >> >> > >> >> >> Thanks in advance, > >> >> >> Taras > >> >> >> > >> >> >> > >> >> >> On Wed, Feb 22, 2012 at 10:51 AM, Taras Tielkes > >> >> >> wrote: > >> >> >>> (this time properly responding to the list alias) > >> >> >>> Hi Srinivas, > >> >> >>> > >> >> >>> We're running 1.6.0 u29 on Linux x64. My understanding is that > >> >> >>> CompressedOops is enabled by default since u23. 
> >> >> >>> > >> >> >>> At least this page seems to support that: > >> >> >>> > >> >> >>> > >> >> >>> > http://docs.oracle.com/javase/7/docs/technotes/guides/vm/performance-enhancements-7.html > >> >> >>> > >> >> >>> Regarding the other remarks (also from Todd and Chi), I'll > comment > >> >> >>> later. The first thing on my list is to collect > >> >> >>> PrintTenuringDistribution data now. > >> >> >>> > >> >> >>> Kind regards, > >> >> >>> Taras > >> >> >>> > >> >> >>> On Wed, Feb 22, 2012 at 10:50 AM, Taras Tielkes > >> >> >>> wrote: > >> >> >>>> Hi Srinivas, > >> >> >>>> > >> >> >>>> We're running 1.6.0 u29 on Linux x64. My understanding is that > >> >> >>>> CompressedOops is enabled by default since u23. > >> >> >>>> > >> >> >>>> At least this page seems to support that: > >> >> >>>> > >> >> >>>> > >> >> >>>> > http://docs.oracle.com/javase/7/docs/technotes/guides/vm/performance-enhancements-7.html > >> >> >>>> > >> >> >>>> Regarding the other remarks (also from Todd and Chi), I'll > comment > >> >> >>>> later. The first thing on my list is to collect > >> >> >>>> PrintTenuringDistribution data now. > >> >> >>>> > >> >> >>>> Kind regards, > >> >> >>>> Taras > >> >> >>>> > >> >> >>>> On Wed, Feb 22, 2012 at 12:40 AM, Srinivas Ramakrishna > >> >> >>>> wrote: > >> >> >>>>> I agree that premature promotions are almost always the first > and > >> >> >>>>> most > >> >> >>>>> important thing to fix when running > >> >> >>>>> into fragmentation or overload issues with CMS. However, I can > >> >> >>>>> also > >> >> >>>>> imagine > >> >> >>>>> long-lived objects with a highly > >> >> >>>>> non-stationary size distribution which can also cause problems > >> >> >>>>> for > >> >> >>>>> CMS > >> >> >>>>> despite best efforts to tune against > >> >> >>>>> premature promotion. > >> >> >>>>> > >> >> >>>>> I didn't think Treas was running with MTT=0, although MTT > 0 > is > >> >> >>>>> no > >> >> >>>>> recipe > >> >> >>>>> for avoiding premature promotion > >> >> >>>>> with bursty loads that case overflow the survivor spaces -- as > >> >> >>>>> you > >> >> >>>>> say large > >> >> >>>>> survivor spaces with a low > >> >> >>>>> TargetSurvivorRatio -- so as to leave plenty of space to > >> >> >>>>> absorb/accommodate > >> >> >>>>> spiking/bursty loads is > >> >> >>>>> definitely a "best practice" for CMS (and possibly for other > >> >> >>>>> concurrent > >> >> >>>>> collectors as well). > >> >> >>>>> > >> >> >>>>> One thing Taras can do to see if premature promotion might be > an > >> >> >>>>> issue is to > >> >> >>>>> look at the tenuring > >> >> >>>>> threshold in his case. A rough proxy (if > >> >> >>>>> PrintTenuringDistribution > >> >> >>>>> is not > >> >> >>>>> enabled) is to look at the > >> >> >>>>> promotion volume per scavenge. It may be possible, if premature > >> >> >>>>> promotion is > >> >> >>>>> a cause, to see > >> >> >>>>> some kind of medium-term correlation between high promotion > >> >> >>>>> volume > >> >> >>>>> and > >> >> >>>>> eventual promotion > >> >> >>>>> failure despite frequent CMS collections. > >> >> >>>>> > >> >> >>>>> One other point which may or may not be relevant. I see that > >> >> >>>>> Taras > >> >> >>>>> is not > >> >> >>>>> using CompressedOops... > >> >> >>>>> Using that alone would greatly decrease memory pressure and > >> >> >>>>> provide > >> >> >>>>> more > >> >> >>>>> breathing room to CMS, > >> >> >>>>> which is also almost always a good idea. 
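For what it's worth, whether compressed oops are actually in effect can be checked directly; a minimal sketch (PrintFlagsFinal is available on these 6u2x releases, and jinfo may need to run as the same user as the target process):
-----
java -XX:+PrintFlagsFinal -version | grep UseCompressedOops   # defaults of the installed JDK
jinfo -flag UseCompressedOops <pid>                           # the running process
-----
If the reported value is true, the point above is already covered by the default settings.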
> >> >> >>>>> > >> >> >>>>> -- ramki > >> >> >>>>> > >> >> >>>>> On Tue, Feb 21, 2012 at 10:16 AM, Chi Ho Kwok > >> >> >>>>> > >> >> >>>>> wrote: > >> >> >>>>>> > >> >> >>>>>> Hi Teras, > >> >> >>>>>> > >> >> >>>>>> I think you may want to look into sizing the new and > especially > >> >> >>>>>> the > >> >> >>>>>> survivor spaces differently. We run something similar to what > >> >> >>>>>> you > >> >> >>>>>> described, > >> >> >>>>>> high volume request processing with large dataset loading, and > >> >> >>>>>> what > >> >> >>>>>> we've > >> >> >>>>>> seen at the start is that the survivor spaces are completely > >> >> >>>>>> overloaded, > >> >> >>>>>> causing premature promotions. > >> >> >>>>>> > >> >> >>>>>> We've configured our vm with the following goals/guideline: > >> >> >>>>>> > >> >> >>>>>> old space is for semi-permanent data, living for at least 30s, > >> >> >>>>>> average ~10 > >> >> >>>>>> minutes > >> >> >>>>>> new space contains only temporary and just loaded data > >> >> >>>>>> surviving objects from new should never reach old in 1 gc, so > >> >> >>>>>> the > >> >> >>>>>> survivor > >> >> >>>>>> space may never be 100% full > >> >> >>>>>> > >> >> >>>>>> With jstat -gcutil `pidof java` 2000, we see things like: > >> >> >>>>>> > >> >> >>>>>> S0 S1 E O P YGC YGCT FGC > FGCT > >> >> >>>>>> GCT > >> >> >>>>>> 70.20 0.00 19.65 57.60 59.90 124808 29474.299 2498 > >> >> >>>>>> 191.110 > >> >> >>>>>> 29665.409 > >> >> >>>>>> 70.20 0.00 92.89 57.60 59.90 124808 29474.299 2498 > >> >> >>>>>> 191.110 > >> >> >>>>>> 29665.409 > >> >> >>>>>> 70.20 0.00 93.47 57.60 59.90 124808 29474.299 2498 > >> >> >>>>>> 191.110 > >> >> >>>>>> 29665.409 > >> >> >>>>>> 0.00 65.69 78.07 58.09 59.90 124809 29474.526 2498 > >> >> >>>>>> 191.110 > >> >> >>>>>> 29665.636 > >> >> >>>>>> 84.97 0.00 48.19 58.57 59.90 124810 29474.774 2498 > >> >> >>>>>> 191.110 > >> >> >>>>>> 29665.884 > >> >> >>>>>> 84.97 0.00 81.30 58.57 59.90 124810 29474.774 2498 > >> >> >>>>>> 191.110 > >> >> >>>>>> 29665.884 > >> >> >>>>>> 0.00 62.64 27.22 59.12 59.90 124811 29474.992 2498 > >> >> >>>>>> 191.110 > >> >> >>>>>> 29666.102 > >> >> >>>>>> 0.00 62.64 54.47 59.12 59.90 124811 29474.992 2498 > >> >> >>>>>> 191.110 > >> >> >>>>>> 29666.102 > >> >> >>>>>> 75.68 0.00 6.80 59.53 59.90 124812 29475.228 2498 > >> >> >>>>>> 191.110 > >> >> >>>>>> 29666.338 > >> >> >>>>>> 75.68 0.00 23.38 59.53 59.90 124812 29475.228 2498 > >> >> >>>>>> 191.110 > >> >> >>>>>> 29666.338 > >> >> >>>>>> 75.68 0.00 27.72 59.53 59.90 124812 29475.228 2498 > >> >> >>>>>> 191.110 > >> >> >>>>>> 29666.338 > >> >> >>>>>> > >> >> >>>>>> If you follow the lines, you can see Eden fill up to 100% on > >> >> >>>>>> line > >> >> >>>>>> 4, > >> >> >>>>>> surviving objects are copied into S1, S0 is collected and > added > >> >> >>>>>> 0.49% to > >> >> >>>>>> Old. On line 5, another GC happened, with Eden->S0, S1->Old, > >> >> >>>>>> etc. > >> >> >>>>>> No objects > >> >> >>>>>> is ever transferred from Eden to Old, unless there's a huge > peak > >> >> >>>>>> of > >> >> >>>>>> requests. > >> >> >>>>>> > >> >> >>>>>> This is with a: 32GB heap, Mxn1200M, SurvivorRatio 2 (600MB > >> >> >>>>>> Eden, > >> >> >>>>>> 300MB > >> >> >>>>>> S0, 300MB S1), MaxTenuringThreshold 1 (whatever is still alive > >> >> >>>>>> in > >> >> >>>>>> S0/1 on > >> >> >>>>>> the second GC is copied to old, don't wait, web requests are > >> >> >>>>>> quite > >> >> >>>>>> bursty). 
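Pieced together from the description above, that configuration would look roughly as follows; only the flags explicitly mentioned are listed, and whatever else runs on that machine is not shown in this thread:
-----
-Xmx32g
-Xmn1200m
-XX:SurvivorRatio=2
-XX:MaxTenuringThreshold=1
-XX:CMSInitiatingOccupancyFraction=76
-----
i.e. a ~600m eden with two ~300m survivors, and surviving objects tenured on their second scavenge at the latest.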
> >> >> >>>>>> With about 1 collection every 2-5 seconds, objects promoted to > >> >> >>>>>> Old > >> >> >>>>>> must live > >> >> >>>>>> for at 4-10 seconds; as that's longer than an average request > >> >> >>>>>> (50ms-1s), > >> >> >>>>>> none of the temporary data ever makes it into Old, which is > much > >> >> >>>>>> more > >> >> >>>>>> expensive to collect. It works even with a higher than default > >> >> >>>>>> CMSInitiatingOccupancyFraction=76 to optimize for space > >> >> >>>>>> available > >> >> >>>>>> for the > >> >> >>>>>> large data cache we have. > >> >> >>>>>> > >> >> >>>>>> > >> >> >>>>>> With your config of 400MB Total new, with 350MB Eden, 25MB S0, > >> >> >>>>>> 25MB > >> >> >>>>>> S1 > >> >> >>>>>> (SurvivorRatio 8), no tenuring threshold, I think loads of new > >> >> >>>>>> objects get > >> >> >>>>>> copied from Eden to Old directly, causing trouble for the CMS. > >> >> >>>>>> You > >> >> >>>>>> can use > >> >> >>>>>> jstat to get live stats and tweak until it doesn't happen. If > >> >> >>>>>> you > >> >> >>>>>> can't make > >> >> >>>>>> changes on live that easil, try doubling the new size indeed, > >> >> >>>>>> with > >> >> >>>>>> a 400 > >> >> >>>>>> Eden, 200 S0, 200 S1 and MaxTenuringThreshold 1 setting. It's > >> >> >>>>>> probably > >> >> >>>>>> overkill, but if should solve the problem if it is caused by > >> >> >>>>>> premature > >> >> >>>>>> promotion. > >> >> >>>>>> > >> >> >>>>>> > >> >> >>>>>> Chi Ho Kwok > >> >> >>>>>> > >> >> >>>>>> > >> >> >>>>>> On Tue, Feb 21, 2012 at 5:55 PM, Taras Tielkes > >> >> >>>>>> > >> >> >>>>>> wrote: > >> >> >>>>>>> > >> >> >>>>>>> Hi, > >> >> >>>>>>> > >> >> >>>>>>> We've removed the "-XX:+CMSScavengeBeforeRemark" setting from > >> >> >>>>>>> 50% > >> >> >>>>>>> of > >> >> >>>>>>> our production nodes. > >> >> >>>>>>> After running for a few weeks, it seems that there's no > impact > >> >> >>>>>>> from > >> >> >>>>>>> removing this option. > >> >> >>>>>>> Which is good, since it seems we can remove it from the other > >> >> >>>>>>> nodes as > >> >> >>>>>>> well, simplifying our overall JVM configuration ;-) > >> >> >>>>>>> > >> >> >>>>>>> However, we're still seeing promotion failures on all nodes, > >> >> >>>>>>> once > >> >> >>>>>>> every day or so. 
> >> >> >>>>>>> > >> >> >>>>>>> There's still the "Magic 1026": this accounts for ~60% of the > >> >> >>>>>>> promotion failures that we're seeing (single ParNew thread > >> >> >>>>>>> thread, > >> >> >>>>>>> 1026 failure size): > >> >> >>>>>>> -------------------- > >> >> >>>>>>> 2012-02-06T09:13:51.806+0100: 328095.085: [GC 328095.086: > >> >> >>>>>>> [ParNew: > >> >> >>>>>>> 359895K->29357K(368640K), 0.0429070 secs] > >> >> >>>>>>> 3471021K->3143476K(5201920K), 0.0434950 secs] [Times: > user=0.32 > >> >> >>>>>>> sys=0.00, real=0.04 secs] > >> >> >>>>>>> 2012-02-06T09:13:55.922+0100: 328099.201: [GC 328099.201: > >> >> >>>>>>> [ParNew: > >> >> >>>>>>> 357037K->31817K(368640K), 0.0429130 secs] > >> >> >>>>>>> 3471156K->3148946K(5201920K), 0.0434930 secs] [Times: > user=0.31 > >> >> >>>>>>> sys=0.00, real=0.04 secs] > >> >> >>>>>>> 2012-02-06T09:13:59.044+0100: 328102.324: [GC 328102.324: > >> >> >>>>>>> [ParNew > >> >> >>>>>>> (promotion failure size = 1026) (promotion failed): > >> >> >>>>>>> 359497K->368640K(368640K), 0.2226790 secs]328102.547: [CMS: > >> >> >>>>>>> 3125609K->451515K(4833280K), 5.6225880 secs] 3476626K->4515 > >> >> >>>>>>> 15K(5201920K), [CMS Perm : 124373K->124353K(262144K)], > >> >> >>>>>>> 5.8459380 > >> >> >>>>>>> secs] > >> >> >>>>>>> [Times: user=6.20 sys=0.01, real=5.85 secs] > >> >> >>>>>>> 2012-02-06T09:14:05.243+0100: 328108.522: [GC 328108.523: > >> >> >>>>>>> [ParNew: > >> >> >>>>>>> 327680K->40960K(368640K), 0.0319160 secs] > >> >> >>>>>>> 779195K->497658K(5201920K), > >> >> >>>>>>> 0.0325360 secs] [Times: user=0.21 sys=0.01, real=0.03 secs] > >> >> >>>>>>> 2012-02-06T09:14:07.836+0100: 328111.116: [GC 328111.116: > >> >> >>>>>>> [ParNew: > >> >> >>>>>>> 368640K->32785K(368640K), 0.0744670 secs] > >> >> >>>>>>> 825338K->520234K(5201920K), > >> >> >>>>>>> 0.0750390 secs] [Times: user=0.40 sys=0.02, real=0.08 secs] > >> >> >>>>>>> -------------------- > >> >> >>>>>>> Given the 1026 word size, I'm wondering if I should be > hunting > >> >> >>>>>>> for > >> >> >>>>>>> an > >> >> >>>>>>> overuse of BufferedInputStream/BufferedOutoutStream, since > both > >> >> >>>>>>> have > >> >> >>>>>>> 8192 as a default buffer size. 
> >> >> >>>>>>> > >> >> >>>>>>> The second group of promotion failures look like this > (multiple > >> >> >>>>>>> ParNew > >> >> >>>>>>> threads, small failure sizes): > >> >> >>>>>>> -------------------- > >> >> >>>>>>> 2012-02-06T09:50:15.773+0100: 328756.964: [GC 328756.964: > >> >> >>>>>>> [ParNew: > >> >> >>>>>>> 356116K->29934K(368640K), 0.0461100 secs] > >> >> >>>>>>> 3203863K->2880162K(5201920K), 0.0468870 secs] [Times: > user=0.34 > >> >> >>>>>>> sys=0.01, real=0.05 secs] > >> >> >>>>>>> 2012-02-06T09:50:19.153+0100: 328760.344: [GC 328760.344: > >> >> >>>>>>> [ParNew: > >> >> >>>>>>> 357614K->30359K(368640K), 0.0454680 secs] > >> >> >>>>>>> 3207842K->2882892K(5201920K), 0.0462280 secs] [Times: > user=0.33 > >> >> >>>>>>> sys=0.01, real=0.05 secs] > >> >> >>>>>>> 2012-02-06T09:50:22.658+0100: 328763.849: [GC 328763.849: > >> >> >>>>>>> [ParNew > >> >> >>>>>>> (1: > >> >> >>>>>>> promotion failure size = 25) (4: promotion failure size = > 25) > >> >> >>>>>>> (6: > >> >> >>>>>>> promotion failure size = 25) (7: promotion failure size = > 144) > >> >> >>>>>>> (promotion failed): 358039K->358358 > >> >> >>>>>>> K(368640K), 0.2148680 secs]328764.064: [CMS: > >> >> >>>>>>> 2854709K->446750K(4833280K), 5.8368270 secs] > >> >> >>>>>>> 3210572K->446750K(5201920K), [CMS Perm : > >> >> >>>>>>> 124670K->124644K(262144K)], > >> >> >>>>>>> 6.0525230 secs] [Times: user=6.32 sys=0.00, real=6.05 secs] > >> >> >>>>>>> 2012-02-06T09:50:29.896+0100: 328771.086: [GC 328771.087: > >> >> >>>>>>> [ParNew: > >> >> >>>>>>> 327680K->22569K(368640K), 0.0227080 secs] > >> >> >>>>>>> 774430K->469319K(5201920K), > >> >> >>>>>>> 0.0235020 secs] [Times: user=0.16 sys=0.00, real=0.02 secs] > >> >> >>>>>>> 2012-02-06T09:50:31.076+0100: 328772.266: [GC 328772.267: > >> >> >>>>>>> [ParNew: > >> >> >>>>>>> 350249K->22264K(368640K), 0.0235480 secs] > >> >> >>>>>>> 796999K->469014K(5201920K), > >> >> >>>>>>> 0.0243000 secs] [Times: user=0.18 sys=0.01, real=0.02 secs] > >> >> >>>>>>> -------------------- > >> >> >>>>>>> > >> >> >>>>>>> We're going to try to double the new size on a single node, > to > >> >> >>>>>>> see > >> >> >>>>>>> the > >> >> >>>>>>> effects of that. > >> >> >>>>>>> > >> >> >>>>>>> Beyond this experiment, is there any additional data I can > >> >> >>>>>>> collect > >> >> >>>>>>> to > >> >> >>>>>>> better understand the nature of the promotion failures? > >> >> >>>>>>> Am I facing collecting free list statistics at this point? 
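On the free list question: one relatively cheap next step, assuming the HotSpot build in use accepts the flag, is to have CMS print its free list statistics around each collection and watch whether the largest free chunk keeps shrinking while total free space stays high, which is the usual fragmentation signature:
-----
-XX:PrintFLSStatistics=1
-----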
> >> >> >>>>>>> > >> >> >>>>>>> Thanks, > >> >> >>>>>>> Taras > >> >> >>>>>> > >> >> >>>>>> > >> >> >>>>>> _______________________________________________ > >> >> >>>>>> hotspot-gc-use mailing list > >> >> >>>>>> hotspot-gc-use at openjdk.java.net > >> >> >>>>>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > >> >> >>>>>> > >> >> >>>>> > >> >> >> _______________________________________________ > >> >> >> hotspot-gc-use mailing list > >> >> >> hotspot-gc-use at openjdk.java.net > >> >> >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > >> >> _______________________________________________ > >> >> hotspot-gc-use mailing list > >> >> hotspot-gc-use at openjdk.java.net > >> >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > >> > > >> > > >> _______________________________________________ > >> hotspot-gc-use mailing list > >> hotspot-gc-use at openjdk.java.net > >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120415/76d0dacc/attachment-0001.html From taras.tielkes at gmail.com Tue Apr 17 04:38:54 2012 From: taras.tielkes at gmail.com (Taras Tielkes) Date: Tue, 17 Apr 2012 13:38:54 +0200 Subject: Promotion failures: indication of CMS fragmentation? In-Reply-To: References: <4EF9FCAC.3030208@oracle.com> <4F06A270.3010701@oracle.com> <4F0DBEC4.7040907@oracle.com> <4F1ECE7B.3040502@oracle.com> <4F1F2ED7.6060308@oracle.com> <4F20F78D.9070905@oracle.com> Message-ID: Hi, Perhaps it's me, but I find it hard to actually understand the error message. The promotion failure error mentions 5 different word sizes, for (I assume) 5 different ParNew threads. Which of these threads actually failed to promote the data to tenured space? The one with the largest work size? It would be nice if the message could be improved/expanded in the future to make it more easy to diagnose such events. -tt On Sun, Apr 15, 2012 at 7:11 PM, Chi Ho Kwok wrote: > Hi Teras, > > Hmm, it looks like it failed even tho there's tons of space available, 2.4G > used, 1.8G free out of 4.2G CMS old gen. Or am I reading the next line > wrong? (snipped age histogram) > > [GC 3296267.500: [ParNew (1: promotion failure size = 16) ?(2: promotion > failure size = 56) ?(4: promotion failure size = 342) ?(5: promotion failure > size = 1026) ?(6: promotion failure size = 278) ?(promotion failed): > 568902K->568678K(682688K), 0.3130580 secs]3296267.813: [CMS: > 2510091K->573489K(4423680K), 7.7481330 secs] 3078184K->573489K(5106368K), > [CMS Perm : 144002K->143970K(262144K)], 8.0619690 secs] [Times: user=8.35 > sys=0.01, real=8.06 secs] > > > Normally I'd say fragmentation, but how can it not find a spot for a 16 > bytes chunk? I'm completely out of ideas now - anyone else? > > > Here's a brute force "solution": what the app "needs" is 600M of live data > in the old gen, that's what left usually after collection. Increase "safety > margin" by?adding memory to the old gen pool if possible by increasing total > heap size, and set initial occupancy ratio to a silly low number like 45%. > Hopefully, it will survive until the next software/jvm/kernel patch that > requires a restart of the service or machine. 
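As a sketch of that brute force route, measured against the option list Taras posts a bit further down in this exchange (the 8 GB figure is purely illustrative, chosen only to show the direction of the change):
-----
current: -Xms5g -Xmx5g ... -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=68
sketch:  -Xms8g -Xmx8g ... -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=45
-----
Only the heap size and the initiating fraction change; everything else stays as posted.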
> > > I've seen something similar in our logs as well, with 19%/2.9GB free, my > guess is that CMS needs a few GB to play with... Nowadays we run with a > larger safety margin, doubled the heap on that machine to 32GB, I haven't > seen any CMS and promotion failures since then (Jan 2010). > > 128265.354: [GC 128265.355: [ParNew (promotion failed): > 589631K->589631K(589824K), 0.3582340 secs]128265.713: [CMS: > 12965822K->10393148K(15990784K), 20.9654520 secs] > 13462337K->10393148K(16580608K), [CMS Perm : 20604K->16846K(34456K)], > 21.3239890 secs] [Times: user=22.06 sys=0.09, real=21.32 secs] > > > > Regards, > > Chi Ho > > > On Sun, Apr 15, 2012 at 6:41 PM, Taras Tielkes > wrote: >> >> Hi Chi, >> >> I've sent you a decent chunk of the gc.log file off-list (hopefully >> not too large). >> >> For completeness, we're running with the following options (ignoring >> the diagnostic ones): >> ----- >> -server >> -Xms5g >> -Xmx5g >> -Xmn800m >> -XX:PermSize=256m >> -XX:MaxPermSize=256m >> -XX:SurvivorRatio=4 >> -XX:+UseConcMarkSweepGC >> -XX:+UseParNewGC >> -XX:+DisableExplicitGC >> -XX:+UseCMSInitiatingOccupancyOnly >> -XX:+CMSClassUnloadingEnabled >> -XX:CMSInitiatingOccupancyFraction=68 >> ----- >> Platform is Java 6u29 running on Linux 2.6 x64. >> Hardware is 2xquad Xeons, but pretty old ones (pre-Nehalem, no QPI). >> >> The gc logs will (typically) show big peaks at the start and end of >> the working day - this is nature of the domain our application >> targets. >> >> I would expect the live set to be below 1G (usually below 600M even). >> However, we can experience temporary spikes of higher volume >> longer-living object allocation bursts. >> >> We'll set up a jstat log for this machine. I do have historical jstat >> logs for one of the other machines, but that one is still running with >> a smaller new gen, and smaller survivor spaces. If there's any other >> relevant data that I can collect, let me know. >> >> Kind regards, >> Taras >> >> On Sun, Apr 15, 2012 at 6:15 PM, Chi Ho Kwok wrote: >> > Hi Teras, >> > >> > Sure thing. Just the previous CMS should be enough, it doesn't matter if >> > there is 10 or 1000 parnew's between that and the failure. >> > >> > As for the jstat failure, it looks like it looks in >> > /tmp/hsperfdata_[username] for the pid by default, maybe something >> > like?-J-Djava.io.tmpdir=[path, like /app/client/program/tomcat/temp] can >> > help; and from what I've seen, running jstat as the same user as the >> > process >> > or root is required. Historical data is nice to have, but even just >> > staring >> > at it for 15 minutes should give you a hint for the old gen usage. >> > >> > If the collection starts at 68, takes a while and the heap fills to 80%+ >> > before it's done when it's not busy, it's probably wise to lower the >> > initial >> > occupancy factor or increase the thread count so it completes faster. We >> > run >> > with?-XX:ParallelCMSThreads=3 on a 8 hw thread server, the default (2) >> > was >> > too slow for us as we run with 76%, it still takes 15s on average for >> > CMS to >> > scan and clean the old gen (while old gen grows to up to 80% full), much >> > longer can mean a promotion failure during request spikes. >> > >> > >> > Chi Ho Kwok >> > >> > >> > On Sun, Apr 15, 2012 at 5:08 PM, Taras Tielkes >> > wrote: >> >> >> >> Hi Chi, >> >> >> >> Is it o.k. if I send this off-list to you directly? If so, how much >> >> more do you need? Just enough to cover the previous CMS? 
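For reference, a minimal form of the jstat invocation discussed in this exchange might look like the line below; the temp path is just Chi Ho's example path and <pid> stands for the target JVM's process id:
-----
# run as the same user as the JVM (or as root)
jstat -J-Djava.io.tmpdir=/app/client/program/tomcat/temp -gcutil <pid> 2000
-----
The 2000 is the sampling interval in milliseconds, matching the jstat sample shown elsewhere in this thread.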
>> >> We're running with ?-XX:CMSInitiatingOccupancyFraction=68 and >> >> -XX:+UseCMSInitiatingOccupancyOnly, by the way. >> >> >> >> I do have shell access, however, on that particular machine we're >> >> experiencing the "process id not found" issue with jstat. >> >> I think this can be worked around by fiddling with temp directory >> >> options, but we haven't tried that yet. >> >> Regarding the jstat output, I assume this would be most valuable to >> >> have for the exact moment when the promotion failure happens, correct? >> >> If so, we can try to set up jstat to run in the background >> >> continuously, to have more diagnostic data in the future. >> >> >> >> Kind regards, >> >> Taras >> >> >> >> On Sun, Apr 15, 2012 at 2:48 PM, Chi Ho Kwok >> >> wrote: >> >> > Hi Teras, >> >> > >> >> > Can you send me a larger chunk of the log? I'm interested in seeing >> >> > when >> >> > the >> >> > last CMS was run and what it freed. Maybe it's kicking in too late, >> >> > the >> >> > full >> >> > GC triggered by promotion failure only found 600M live data, rest was >> >> > garbage. If that's the cause, >> >> > lowering?XX:CMSInitiatingOccupancyFraction >> >> > can >> >> > help. >> >> > >> >> > Also, do you have shell access to that machine? If so, try running >> >> > jstat, >> >> > you can see the usage of all generations live as it happens. >> >> > >> >> > >> >> > Chi Ho Kwok >> >> > >> >> > On Sun, Apr 15, 2012 at 2:34 PM, Taras Tielkes >> >> > >> >> > wrote: >> >> >> >> >> >> Hi Chi, Srinivas, >> >> >> >> >> >> Optimizing the cost of ParNew (by lowering MTT) would be nice, but >> >> >> for >> >> >> now my priority is still to minimize the promotion failures. >> >> >> >> >> >> For example, on the machine running CMS with the "larger" young gen >> >> >> and survivor spaces (-Xmx5g -Xmn800 -XX:SurvivorRatio=4), I've just >> >> >> seen a promotion failure again. Below is a snippet of gc.log showing >> >> >> this. >> >> >> To put this into perspective, this is a first promotion failure on >> >> >> that machine in a couple of weeks. Still, zero failures would beat a >> >> >> single failure, since the clients connecting to this application >> >> >> will >> >> >> only wait a few seconds before timing out and terminating the >> >> >> connection. In addition, the promotion failures are occurring in >> >> >> peak >> >> >> usage moments. >> >> >> >> >> >> Apart from trying to eliminate the promotion failure pauses, my main >> >> >> goal is to learn how to understand the root cause in a case like >> >> >> this. >> >> >> Any suggestions for things to try or read up on are appreciated. >> >> >> >> >> >> Kind regards, >> >> >> Taras >> >> >> ------------------------------------------------ >> >> >> 2012-04-13T17:44:27.777+0200: 3296255.045: [GC 3296255.046: [ParNew >> >> >> Desired survivor size 69894144 bytes, new threshold 15 (max 15) >> >> >> - age ? 1: ? ?3684448 bytes, ? ?3684448 total >> >> >> - age ? 2: ? ? 824984 bytes, ? ?4509432 total >> >> >> - age ? 3: ? ? 885120 bytes, ? ?5394552 total >> >> >> - age ? 4: ? ? 756568 bytes, ? ?6151120 total >> >> >> - age ? 5: ? ? 696880 bytes, ? ?6848000 total >> >> >> - age ? 6: ? ? 890688 bytes, ? ?7738688 total >> >> >> - age ? 7: ? ?2631184 bytes, ? 10369872 total >> >> >> - age ? 8: ? ? 719976 bytes, ? 11089848 total >> >> >> - age ? 9: ? ? 724944 bytes, ? 11814792 total >> >> >> - age ?10: ? ? 750360 bytes, ? 12565152 total >> >> >> - age ?11: ? ? 934944 bytes, ? 13500096 total >> >> >> - age ?12: ? ? 521080 bytes, ? 14021176 total >> >> >> - age ?13: ? ? 
543392 bytes, ? 14564568 total >> >> >> - age ?14: ? ? 906616 bytes, ? 15471184 total >> >> >> - age ?15: ? ? 504008 bytes, ? 15975192 total >> >> >> : 568932K->22625K(682688K), 0.0410180 secs] >> >> >> 3077079K->2531413K(5106368K), 0.0416940 secs] [Times: user=0.30 >> >> >> sys=0.01, real=0.05 secs] >> >> >> 2012-04-13T17:44:33.893+0200: 3296261.162: [GC 3296261.162: [ParNew >> >> >> Desired survivor size 69894144 bytes, new threshold 15 (max 15) >> >> >> - age ? 1: ? ?2975896 bytes, ? ?2975896 total >> >> >> - age ? 2: ? ? 742592 bytes, ? ?3718488 total >> >> >> - age ? 3: ? ? 812864 bytes, ? ?4531352 total >> >> >> - age ? 4: ? ? 873488 bytes, ? ?5404840 total >> >> >> - age ? 5: ? ? 746128 bytes, ? ?6150968 total >> >> >> - age ? 6: ? ? 685192 bytes, ? ?6836160 total >> >> >> - age ? 7: ? ? 888376 bytes, ? ?7724536 total >> >> >> - age ? 8: ? ?2621688 bytes, ? 10346224 total >> >> >> - age ? 9: ? ? 715608 bytes, ? 11061832 total >> >> >> - age ?10: ? ? 723336 bytes, ? 11785168 total >> >> >> - age ?11: ? ? 749856 bytes, ? 12535024 total >> >> >> - age ?12: ? ? 914632 bytes, ? 13449656 total >> >> >> - age ?13: ? ? 520944 bytes, ? 13970600 total >> >> >> - age ?14: ? ? 543224 bytes, ? 14513824 total >> >> >> - age ?15: ? ? 906040 bytes, ? 15419864 total >> >> >> : 568801K->22726K(682688K), 0.0447800 secs] >> >> >> 3077589K->2532008K(5106368K), 0.0454710 secs] [Times: user=0.33 >> >> >> sys=0.00, real=0.05 secs] >> >> >> 2012-04-13T17:44:40.231+0200: 3296267.499: [GC 3296267.500: [ParNew >> >> >> (1: promotion failure size = 16) ?(2: promotion failure size = 56) >> >> >> (4: promotion failure >> >> >> size = 342) ?(5: promotion failure size = 1026) ?(6: promotion >> >> >> failure >> >> >> size = 278) ?(promotion failed) >> >> >> Desired survivor size 69894144 bytes, new threshold 15 (max 15) >> >> >> - age ? 1: ? ?2436840 bytes, ? ?2436840 total >> >> >> - age ? 2: ? ?1625136 bytes, ? ?4061976 total >> >> >> - age ? 3: ? ? 691664 bytes, ? ?4753640 total >> >> >> - age ? 4: ? ? 799992 bytes, ? ?5553632 total >> >> >> - age ? 5: ? ? 858344 bytes, ? ?6411976 total >> >> >> - age ? 6: ? ? 730200 bytes, ? ?7142176 total >> >> >> - age ? 7: ? ? 680072 bytes, ? ?7822248 total >> >> >> - age ? 8: ? ? 885960 bytes, ? ?8708208 total >> >> >> - age ? 9: ? ?2618544 bytes, ? 11326752 total >> >> >> - age ?10: ? ? 709168 bytes, ? 12035920 total >> >> >> - age ?11: ? ? 714576 bytes, ? 12750496 total >> >> >> - age ?12: ? ? 734976 bytes, ? 13485472 total >> >> >> - age ?13: ? ? 905048 bytes, ? 14390520 total >> >> >> - age ?14: ? ? 520320 bytes, ? 14910840 total >> >> >> - age ?15: ? ? 543056 bytes, ? 15453896 total >> >> >> : 568902K->568678K(682688K), 0.3130580 secs]3296267.813: [CMS: >> >> >> 2510091K->573489K(4423680K), 7.7481330 secs] >> >> >> 3078184K->573489K(5106368K), [CMS Perm : 144002K-> >> >> >> 143970K(262144K)], 8.0619690 secs] [Times: user=8.35 sys=0.01, >> >> >> real=8.06 >> >> >> secs] >> >> >> 2012-04-13T17:44:51.337+0200: 3296278.606: [GC 3296278.606: [ParNew >> >> >> Desired survivor size 69894144 bytes, new threshold 15 (max 15) >> >> >> - age ? 1: ? 33717528 bytes, ? 
33717528 total >> >> >> : 546176K->43054K(682688K), 0.0515990 secs] >> >> >> 1119665K->616543K(5106368K), 0.0523550 secs] [Times: user=0.34 >> >> >> sys=0.00, real=0.05 secs] >> >> >> ------------------------------------------------ >> >> >> >> >> >> On Tue, Mar 20, 2012 at 10:12 PM, Srinivas Ramakrishna >> >> >> wrote: >> >> >> > As Chi-ho noted, about 3-4 MB of data does get promoted per >> >> >> > scavenge, >> >> >> > after having >> >> >> > sloshed around in your survivor spaces some 15 times. I'd venture >> >> >> > that >> >> >> > whatever winnowing >> >> >> > of young objects was to ocur has in fact occured already within >> >> >> > the >> >> >> > first 3-4 scavenges that >> >> >> > an object has survived, after which the drop-off in population is >> >> >> > less >> >> >> > sharp. So I'd suggest >> >> >> > lowering the MTT to about 3, while leaving the survivor ratio >> >> >> > intact. >> >> >> > That should reduce your >> >> >> > copying costs and bring down your scavenge pauses further, while >> >> >> > not >> >> >> > adversely affecting >> >> >> > your promotion rates (and concomitantly the fragmentation). >> >> >> > >> >> >> > One thing that was a bit puzzling about the stats below was that >> >> >> > you'd >> >> >> > expect the volume >> >> >> > of generation X in scavenge N to be no less than the volume of >> >> >> > generation X+1 in scavenge N+1, >> >> >> > but occasionally that natural invariant does not appear to hold, >> >> >> > which >> >> >> > is quite puzzling -- >> >> >> > indicating perhaps that either ages or populations are not being >> >> >> > correctly tracked. >> >> >> > >> >> >> > I don't know if anyone else has noticed that in their tenuring >> >> >> > distributions as well.... >> >> >> > >> >> >> > -- ramki >> >> >> > >> >> >> > On Tue, Mar 20, 2012 at 9:36 AM, Taras Tielkes >> >> >> > >> >> >> > wrote: >> >> >> >> Hi, >> >> >> >> >> >> >> >> I've collected -XX:+PrintTenuringDistribution data from a node in >> >> >> >> our >> >> >> >> production environment, running -Xmx5g -Xmn400m >> >> >> >> -XX:SurvivorRatio=8. >> >> >> >> On one other production node, we've configured a larger new gen, >> >> >> >> and >> >> >> >> larger survivor spaces (-Xmx5g -Xmn800m -XX:SurvivorRatio=4). >> >> >> >> This node has -XX:+PrintTenuringDistribution logging as well. >> >> >> >> >> >> >> >> The node running the larger new gen and survivor spaces has not >> >> >> >> run >> >> >> >> into a promotion failure yet, while the ones still running the >> >> >> >> old >> >> >> >> config have hit a few. >> >> >> >> The promotion failures are typically experienced at high load >> >> >> >> periods, >> >> >> >> which makes sense, as allocation and promotion will experience a >> >> >> >> spike >> >> >> >> in those periods as well. >> >> >> >> >> >> >> >> The inherent nature of the application implies relatively long >> >> >> >> sessions (towards a few hours), retaining a fair amout of state >> >> >> >> up >> >> >> >> to >> >> >> >> an hour. >> >> >> >> I believe this is the main reason of the relatively high >> >> >> >> promotion >> >> >> >> rate we're experiencing. >> >> >> >> >> >> >> >> >> >> >> >> Here's a fragment of gc log from one of the nodes running the >> >> >> >> older >> >> >> >> (smaller) new gen, including a promotion failure: >> >> >> >> ------------------------- >> >> >> >> 2012-03-15T18:32:17.785+0100: 796604.225: [GC 796604.225: [ParNew >> >> >> >> Desired survivor size 20971520 bytes, new threshold 8 (max 15) >> >> >> >> - age ? 1: ? ?2927728 bytes, ? 
?2927728 total >> >> >> >> - age ? 2: ? ?2428512 bytes, ? ?5356240 total >> >> >> >> - age ? 3: ? ?2696376 bytes, ? ?8052616 total >> >> >> >> - age ? 4: ? ?2623576 bytes, ? 10676192 total >> >> >> >> - age ? 5: ? ?3365576 bytes, ? 14041768 total >> >> >> >> - age ? 6: ? ?2792272 bytes, ? 16834040 total >> >> >> >> - age ? 7: ? ?2233008 bytes, ? 19067048 total >> >> >> >> - age ? 8: ? ?2263824 bytes, ? 21330872 total >> >> >> >> : 358709K->29362K(368640K), 0.0461460 secs] >> >> >> >> 3479492K->3151874K(5201920K), 0.0467320 secs] [Times: user=0.34 >> >> >> >> sys=0.01, real=0.05 secs] >> >> >> >> 2012-03-15T18:32:21.546+0100: 796607.986: [GC 796607.986: [ParNew >> >> >> >> (0: >> >> >> >> promotion failure size = 25) ?(1: promotion failure size = 25) >> >> >> >> ?(2: >> >> >> >> promotion failure size = 25) ?(3: promotion failure size = 25) >> >> >> >> ?(4: >> >> >> >> promotion failure size = 25) ?(5 >> >> >> >> : promotion failure size = 25) ?(6: promotion failure size = 341) >> >> >> >> ?(7: >> >> >> >> promotion failure size = 25) ?(promotion failed) >> >> >> >> Desired survivor size 20971520 bytes, new threshold 8 (max 15) >> >> >> >> - age ? 1: ? ?3708208 bytes, ? ?3708208 total >> >> >> >> - age ? 2: ? ?2174384 bytes, ? ?5882592 total >> >> >> >> - age ? 3: ? ?2383256 bytes, ? ?8265848 total >> >> >> >> - age ? 4: ? ?2689912 bytes, ? 10955760 total >> >> >> >> - age ? 5: ? ?2621832 bytes, ? 13577592 total >> >> >> >> - age ? 6: ? ?3360440 bytes, ? 16938032 total >> >> >> >> - age ? 7: ? ?2784136 bytes, ? 19722168 total >> >> >> >> - age ? 8: ? ?2220232 bytes, ? 21942400 total >> >> >> >> : 357042K->356456K(368640K), 0.2734100 secs]796608.259: [CMS: >> >> >> >> 3124189K->516640K(4833280K), 6.8127070 secs] >> >> >> >> 3479554K->516640K(5201920K), [CMS Perm : >> >> >> >> 142423K->142284K(262144K)], >> >> >> >> 7.0867850 secs] [Times: user=7.32 sys=0.07, real=7.09 secs] >> >> >> >> 2012-03-15T18:32:30.279+0100: 796616.719: [GC 796616.720: [ParNew >> >> >> >> Desired survivor size 20971520 bytes, new threshold 1 (max 15) >> >> >> >> - age ? 1: ? 29721456 bytes, ? 29721456 total >> >> >> >> : 327680K->40960K(368640K), 0.0403130 secs] >> >> >> >> 844320K->557862K(5201920K), 0.0409070 secs] [Times: user=0.27 >> >> >> >> sys=0.01, real=0.04 secs] >> >> >> >> 2012-03-15T18:32:32.701+0100: 796619.141: [GC 796619.141: [ParNew >> >> >> >> Desired survivor size 20971520 bytes, new threshold 15 (max 15) >> >> >> >> - age ? 1: ? 10310176 bytes, ? 10310176 total >> >> >> >> ------------------------- >> >> >> >> >> >> >> >> For contrast, here's a gc log fragment from the single node >> >> >> >> running >> >> >> >> the larger new gen and larger survivor spaces: >> >> >> >> (the fragment is from the same point in time, with the nodes >> >> >> >> experiencing equal load) >> >> >> >> ------------------------- >> >> >> >> 2012-03-15T18:32:12.067+0100: 797119.336: [GC 797119.336: [ParNew >> >> >> >> Desired survivor size 69894144 bytes, new threshold 15 (max 15) >> >> >> >> - age ? 1: ? ?5611536 bytes, ? ?5611536 total >> >> >> >> - age ? 2: ? ?3731888 bytes, ? ?9343424 total >> >> >> >> - age ? 3: ? ?3450672 bytes, ? 12794096 total >> >> >> >> - age ? 4: ? ?3314744 bytes, ? 16108840 total >> >> >> >> - age ? 5: ? ?3459888 bytes, ? 19568728 total >> >> >> >> - age ? 6: ? ?3334712 bytes, ? 22903440 total >> >> >> >> - age ? 7: ? ?3671960 bytes, ? 26575400 total >> >> >> >> - age ? 8: ? ?3841608 bytes, ? 30417008 total >> >> >> >> - age ? 9: ? ?2035392 bytes, ? 32452400 total >> >> >> >> - age ?10: ? 
?1975056 bytes, ? 34427456 total >> >> >> >> - age ?11: ? ?2021344 bytes, ? 36448800 total >> >> >> >> - age ?12: ? ?1520752 bytes, ? 37969552 total >> >> >> >> - age ?13: ? ?1494176 bytes, ? 39463728 total >> >> >> >> - age ?14: ? ?2355136 bytes, ? 41818864 total >> >> >> >> - age ?15: ? ?1279000 bytes, ? 43097864 total >> >> >> >> : 603473K->61640K(682688K), 0.0756570 secs] >> >> >> >> 3373284K->2832383K(5106368K), 0.0762090 secs] [Times: user=0.56 >> >> >> >> sys=0.00, real=0.08 secs] >> >> >> >> 2012-03-15T18:32:18.200+0100: 797125.468: [GC 797125.469: [ParNew >> >> >> >> Desired survivor size 69894144 bytes, new threshold 15 (max 15) >> >> >> >> - age ? 1: ? ?6101320 bytes, ? ?6101320 total >> >> >> >> - age ? 2: ? ?4446776 bytes, ? 10548096 total >> >> >> >> - age ? 3: ? ?3701384 bytes, ? 14249480 total >> >> >> >> - age ? 4: ? ?3438488 bytes, ? 17687968 total >> >> >> >> - age ? 5: ? ?3295360 bytes, ? 20983328 total >> >> >> >> - age ? 6: ? ?3403320 bytes, ? 24386648 total >> >> >> >> - age ? 7: ? ?3323368 bytes, ? 27710016 total >> >> >> >> - age ? 8: ? ?3665760 bytes, ? 31375776 total >> >> >> >> - age ? 9: ? ?2427904 bytes, ? 33803680 total >> >> >> >> - age ?10: ? ?1418656 bytes, ? 35222336 total >> >> >> >> - age ?11: ? ?1955192 bytes, ? 37177528 total >> >> >> >> - age ?12: ? ?2006064 bytes, ? 39183592 total >> >> >> >> - age ?13: ? ?1520768 bytes, ? 40704360 total >> >> >> >> - age ?14: ? ?1493728 bytes, ? 42198088 total >> >> >> >> - age ?15: ? ?2354376 bytes, ? 44552464 total >> >> >> >> : 607816K->62650K(682688K), 0.0779270 secs] >> >> >> >> 3378559K->2834643K(5106368K), 0.0784690 secs] [Times: user=0.58 >> >> >> >> sys=0.00, real=0.08 secs] >> >> >> >> ------------------------- >> >> >> >> >> >> >> >> Questions: >> >> >> >> >> >> >> >> 1) From the tenuring distributions, it seems that the application >> >> >> >> benefits from larger new gen and survivor spaces. >> >> >> >> The next thing we'll try is to run with -Xmn1g >> >> >> >> -XX:SurvivorRatio=2, >> >> >> >> and see if the ParNew times are still acceptable. >> >> >> >> Does this seem a sensible approach in this context? >> >> >> >> Are there other variables beyond ParNew times that limit scaling >> >> >> >> the >> >> >> >> new gen to a large size? >> >> >> >> >> >> >> >> 2) Given the object age demographics inherent to our application, >> >> >> >> we >> >> >> >> can not expect to see the majority of data get collected in the >> >> >> >> new >> >> >> >> gen. >> >> >> >> >> >> >> >> Our approach to fight the promotion failures consists of three >> >> >> >> aspects: >> >> >> >> a) Lower the overall allocation rate of our application (by >> >> >> >> improving >> >> >> >> wasteful hotspots), to decrease overall ParNew collection >> >> >> >> frequency. >> >> >> >> b) Configure the new gen and survivor spaces as large as >> >> >> >> possible, >> >> >> >> keeping an eye on ParNew times and overall new/tenured ratio. >> >> >> >> c) Try to refactor the data structures that form the bulk of >> >> >> >> promoted >> >> >> >> data, to retain only the strictly required subgraphs. >> >> >> >> >> >> >> >> Is there anything else I can try or measure, in order to better >> >> >> >> understand the problem? >> >> >> >> >> >> >> >> Thanks in advance, >> >> >> >> Taras >> >> >> >> >> >> >> >> >> >> >> >> On Wed, Feb 22, 2012 at 10:51 AM, Taras Tielkes >> >> >> >> wrote: >> >> >> >>> (this time properly responding to the list alias) >> >> >> >>> Hi Srinivas, >> >> >> >>> >> >> >> >>> We're running 1.6.0 u29 on Linux x64. 
My understanding is that >> >> >> >>> CompressedOops is enabled by default since u23. >> >> >> >>> >> >> >> >>> At least this page seems to support that: >> >> >> >>> >> >> >> >>> >> >> >> >>> >> >> >> >>> http://docs.oracle.com/javase/7/docs/technotes/guides/vm/performance-enhancements-7.html >> >> >> >>> >> >> >> >>> Regarding the other remarks (also from Todd and Chi), I'll >> >> >> >>> comment >> >> >> >>> later. The first thing on my list is to collect >> >> >> >>> PrintTenuringDistribution data now. >> >> >> >>> >> >> >> >>> Kind regards, >> >> >> >>> Taras >> >> >> >>> >> >> >> >>> On Wed, Feb 22, 2012 at 10:50 AM, Taras Tielkes >> >> >> >>> wrote: >> >> >> >>>> Hi Srinivas, >> >> >> >>>> >> >> >> >>>> We're running 1.6.0 u29 on Linux x64. My understanding is that >> >> >> >>>> CompressedOops is enabled by default since u23. >> >> >> >>>> >> >> >> >>>> At least this page seems to support that: >> >> >> >>>> >> >> >> >>>> >> >> >> >>>> >> >> >> >>>> http://docs.oracle.com/javase/7/docs/technotes/guides/vm/performance-enhancements-7.html >> >> >> >>>> >> >> >> >>>> Regarding the other remarks (also from Todd and Chi), I'll >> >> >> >>>> comment >> >> >> >>>> later. The first thing on my list is to collect >> >> >> >>>> PrintTenuringDistribution data now. >> >> >> >>>> >> >> >> >>>> Kind regards, >> >> >> >>>> Taras >> >> >> >>>> >> >> >> >>>> On Wed, Feb 22, 2012 at 12:40 AM, Srinivas Ramakrishna >> >> >> >>>> wrote: >> >> >> >>>>> I agree that premature promotions are almost always the first >> >> >> >>>>> and >> >> >> >>>>> most >> >> >> >>>>> important thing to fix when running >> >> >> >>>>> into fragmentation or overload issues with CMS. However, I can >> >> >> >>>>> also >> >> >> >>>>> imagine >> >> >> >>>>> long-lived objects with a highly >> >> >> >>>>> non-stationary size distribution which can also cause problems >> >> >> >>>>> for >> >> >> >>>>> CMS >> >> >> >>>>> despite best efforts to tune against >> >> >> >>>>> premature promotion. >> >> >> >>>>> >> >> >> >>>>> I didn't think Treas was running with MTT=0, although MTT > 0 >> >> >> >>>>> is >> >> >> >>>>> no >> >> >> >>>>> recipe >> >> >> >>>>> for avoiding premature promotion >> >> >> >>>>> with bursty loads that case overflow the survivor spaces -- as >> >> >> >>>>> you >> >> >> >>>>> say large >> >> >> >>>>> survivor spaces with a low >> >> >> >>>>> TargetSurvivorRatio -- so as to leave plenty of space to >> >> >> >>>>> absorb/accommodate >> >> >> >>>>> spiking/bursty loads? is >> >> >> >>>>> definitely a "best practice" for CMS (and possibly for other >> >> >> >>>>> concurrent >> >> >> >>>>> collectors as well). >> >> >> >>>>> >> >> >> >>>>> One thing Taras can do to see if premature promotion might be >> >> >> >>>>> an >> >> >> >>>>> issue is to >> >> >> >>>>> look at the tenuring >> >> >> >>>>> threshold in his case. A rough proxy (if >> >> >> >>>>> PrintTenuringDistribution >> >> >> >>>>> is not >> >> >> >>>>> enabled) is to look at the >> >> >> >>>>> promotion volume per scavenge. It may be possible, if >> >> >> >>>>> premature >> >> >> >>>>> promotion is >> >> >> >>>>> a cause, to see >> >> >> >>>>> some kind of medium-term correlation between high promotion >> >> >> >>>>> volume >> >> >> >>>>> and >> >> >> >>>>> eventual promotion >> >> >> >>>>> failure despite frequent CMS collections. >> >> >> >>>>> >> >> >> >>>>> One other point which may or may not be relevant. I see that >> >> >> >>>>> Taras >> >> >> >>>>> is not >> >> >> >>>>> using CompressedOops... 
>> >> >> >>>>> Using that alone would greatly decrease memory pressure and >> >> >> >>>>> provide >> >> >> >>>>> more >> >> >> >>>>> breathing room to CMS, >> >> >> >>>>> which is also almost always a good idea. >> >> >> >>>>> >> >> >> >>>>> -- ramki >> >> >> >>>>> >> >> >> >>>>> On Tue, Feb 21, 2012 at 10:16 AM, Chi Ho Kwok >> >> >> >>>>> >> >> >> >>>>> wrote: >> >> >> >>>>>> >> >> >> >>>>>> Hi Teras, >> >> >> >>>>>> >> >> >> >>>>>> I think you may want to look into sizing the new and >> >> >> >>>>>> especially >> >> >> >>>>>> the >> >> >> >>>>>> survivor spaces differently. We run something similar to what >> >> >> >>>>>> you >> >> >> >>>>>> described, >> >> >> >>>>>> high volume request processing with large dataset loading, >> >> >> >>>>>> and >> >> >> >>>>>> what >> >> >> >>>>>> we've >> >> >> >>>>>> seen at the start is that the survivor spaces are completely >> >> >> >>>>>> overloaded, >> >> >> >>>>>> causing premature promotions. >> >> >> >>>>>> >> >> >> >>>>>> We've configured our vm with the following goals/guideline: >> >> >> >>>>>> >> >> >> >>>>>> old space is for semi-permanent data, living for at least >> >> >> >>>>>> 30s, >> >> >> >>>>>> average ~10 >> >> >> >>>>>> minutes >> >> >> >>>>>> new space contains only temporary and just loaded data >> >> >> >>>>>> surviving objects from new should never reach old in 1 gc, so >> >> >> >>>>>> the >> >> >> >>>>>> survivor >> >> >> >>>>>> space may never be 100% full >> >> >> >>>>>> >> >> >> >>>>>> With jstat -gcutil `pidof java` 2000, we see things like: >> >> >> >>>>>> >> >> >> >>>>>> ? S0 ? ? S1 ? ? E ? ? ?O ? ? ?P ? ? YGC ? ? YGCT ? ?FGC >> >> >> >>>>>> ?FGCT >> >> >> >>>>>> GCT >> >> >> >>>>>> ?70.20 ? 0.00 ?19.65 ?57.60 ?59.90 124808 29474.299 ?2498 >> >> >> >>>>>> ?191.110 >> >> >> >>>>>> 29665.409 >> >> >> >>>>>> ?70.20 ? 0.00 ?92.89 ?57.60 ?59.90 124808 29474.299 ?2498 >> >> >> >>>>>> ?191.110 >> >> >> >>>>>> 29665.409 >> >> >> >>>>>> ?70.20 ? 0.00 ?93.47 ?57.60 ?59.90 124808 29474.299 ?2498 >> >> >> >>>>>> ?191.110 >> >> >> >>>>>> 29665.409 >> >> >> >>>>>> ? 0.00 ?65.69 ?78.07 ?58.09 ?59.90 124809 29474.526 ?2498 >> >> >> >>>>>> ?191.110 >> >> >> >>>>>> 29665.636 >> >> >> >>>>>> ?84.97 ? 0.00 ?48.19 ?58.57 ?59.90 124810 29474.774 ?2498 >> >> >> >>>>>> ?191.110 >> >> >> >>>>>> 29665.884 >> >> >> >>>>>> ?84.97 ? 0.00 ?81.30 ?58.57 ?59.90 124810 29474.774 ?2498 >> >> >> >>>>>> ?191.110 >> >> >> >>>>>> 29665.884 >> >> >> >>>>>> ? 0.00 ?62.64 ?27.22 ?59.12 ?59.90 124811 29474.992 ?2498 >> >> >> >>>>>> ?191.110 >> >> >> >>>>>> 29666.102 >> >> >> >>>>>> ? 0.00 ?62.64 ?54.47 ?59.12 ?59.90 124811 29474.992 ?2498 >> >> >> >>>>>> ?191.110 >> >> >> >>>>>> 29666.102 >> >> >> >>>>>> ?75.68 ? 0.00 ? 6.80 ?59.53 ?59.90 124812 29475.228 ?2498 >> >> >> >>>>>> ?191.110 >> >> >> >>>>>> 29666.338 >> >> >> >>>>>> ?75.68 ? 0.00 ?23.38 ?59.53 ?59.90 124812 29475.228 ?2498 >> >> >> >>>>>> ?191.110 >> >> >> >>>>>> 29666.338 >> >> >> >>>>>> ?75.68 ? 0.00 ?27.72 ?59.53 ?59.90 124812 29475.228 ?2498 >> >> >> >>>>>> ?191.110 >> >> >> >>>>>> 29666.338 >> >> >> >>>>>> >> >> >> >>>>>> If you follow the lines, you can see Eden fill up to 100% on >> >> >> >>>>>> line >> >> >> >>>>>> 4, >> >> >> >>>>>> surviving objects are copied into S1, S0 is collected and >> >> >> >>>>>> added >> >> >> >>>>>> 0.49% to >> >> >> >>>>>> Old. On line 5, another GC happened, with Eden->S0, S1->Old, >> >> >> >>>>>> etc. 
>> >> >> >>>>>> No objects >> >> >> >>>>>> is ever transferred from Eden to Old, unless there's a huge >> >> >> >>>>>> peak >> >> >> >>>>>> of >> >> >> >>>>>> requests. >> >> >> >>>>>> >> >> >> >>>>>> This is with a: 32GB heap, Mxn1200M, SurvivorRatio 2 (600MB >> >> >> >>>>>> Eden, >> >> >> >>>>>> 300MB >> >> >> >>>>>> S0, 300MB S1), MaxTenuringThreshold 1 (whatever is still >> >> >> >>>>>> alive >> >> >> >>>>>> in >> >> >> >>>>>> S0/1 on >> >> >> >>>>>> the second GC is copied to old, don't wait, web requests are >> >> >> >>>>>> quite >> >> >> >>>>>> bursty). >> >> >> >>>>>> With about 1 collection every 2-5 seconds, objects promoted >> >> >> >>>>>> to >> >> >> >>>>>> Old >> >> >> >>>>>> must live >> >> >> >>>>>> for at 4-10 seconds; as that's longer than an average request >> >> >> >>>>>> (50ms-1s), >> >> >> >>>>>> none of the temporary data ever makes it into Old, which is >> >> >> >>>>>> much >> >> >> >>>>>> more >> >> >> >>>>>> expensive to collect. It works even with a higher than >> >> >> >>>>>> default >> >> >> >>>>>> CMSInitiatingOccupancyFraction=76 to optimize for space >> >> >> >>>>>> available >> >> >> >>>>>> for the >> >> >> >>>>>> large data cache we have. >> >> >> >>>>>> >> >> >> >>>>>> >> >> >> >>>>>> With your config of 400MB Total new, with 350MB Eden, 25MB >> >> >> >>>>>> S0, >> >> >> >>>>>> 25MB >> >> >> >>>>>> S1 >> >> >> >>>>>> (SurvivorRatio 8), no tenuring threshold, I think loads of >> >> >> >>>>>> new >> >> >> >>>>>> objects get >> >> >> >>>>>> copied from Eden to Old directly, causing trouble for the >> >> >> >>>>>> CMS. >> >> >> >>>>>> You >> >> >> >>>>>> can use >> >> >> >>>>>> jstat to get live stats and tweak until it doesn't happen. If >> >> >> >>>>>> you >> >> >> >>>>>> can't make >> >> >> >>>>>> changes on live that easil, try doubling the new size indeed, >> >> >> >>>>>> with >> >> >> >>>>>> a 400 >> >> >> >>>>>> Eden, 200 S0, 200 S1 and?MaxTenuringThreshold?1 setting. It's >> >> >> >>>>>> probably >> >> >> >>>>>> overkill, but if should solve the problem if it is caused by >> >> >> >>>>>> premature >> >> >> >>>>>> promotion. >> >> >> >>>>>> >> >> >> >>>>>> >> >> >> >>>>>> Chi Ho Kwok >> >> >> >>>>>> >> >> >> >>>>>> >> >> >> >>>>>> On Tue, Feb 21, 2012 at 5:55 PM, Taras Tielkes >> >> >> >>>>>> >> >> >> >>>>>> wrote: >> >> >> >>>>>>> >> >> >> >>>>>>> Hi, >> >> >> >>>>>>> >> >> >> >>>>>>> We've removed the "-XX:+CMSScavengeBeforeRemark" setting >> >> >> >>>>>>> from >> >> >> >>>>>>> 50% >> >> >> >>>>>>> of >> >> >> >>>>>>> our production nodes. >> >> >> >>>>>>> After running for a few weeks, it seems that there's no >> >> >> >>>>>>> impact >> >> >> >>>>>>> from >> >> >> >>>>>>> removing this option. >> >> >> >>>>>>> Which is good, since it seems we can remove it from the >> >> >> >>>>>>> other >> >> >> >>>>>>> nodes as >> >> >> >>>>>>> well, simplifying our overall JVM configuration ;-) >> >> >> >>>>>>> >> >> >> >>>>>>> However, we're still seeing promotion failures on all nodes, >> >> >> >>>>>>> once >> >> >> >>>>>>> every day or so. 
>> >> >> >>>>>>> >> >> >> >>>>>>> There's still the "Magic 1026": this accounts for ~60% of >> >> >> >>>>>>> the >> >> >> >>>>>>> promotion failures that we're seeing (single ParNew thread >> >> >> >>>>>>> thread, >> >> >> >>>>>>> 1026 failure size): >> >> >> >>>>>>> -------------------- >> >> >> >>>>>>> 2012-02-06T09:13:51.806+0100: 328095.085: [GC 328095.086: >> >> >> >>>>>>> [ParNew: >> >> >> >>>>>>> 359895K->29357K(368640K), 0.0429070 secs] >> >> >> >>>>>>> 3471021K->3143476K(5201920K), 0.0434950 secs] [Times: >> >> >> >>>>>>> user=0.32 >> >> >> >>>>>>> sys=0.00, real=0.04 secs] >> >> >> >>>>>>> 2012-02-06T09:13:55.922+0100: 328099.201: [GC 328099.201: >> >> >> >>>>>>> [ParNew: >> >> >> >>>>>>> 357037K->31817K(368640K), 0.0429130 secs] >> >> >> >>>>>>> 3471156K->3148946K(5201920K), 0.0434930 secs] [Times: >> >> >> >>>>>>> user=0.31 >> >> >> >>>>>>> sys=0.00, real=0.04 secs] >> >> >> >>>>>>> 2012-02-06T09:13:59.044+0100: 328102.324: [GC 328102.324: >> >> >> >>>>>>> [ParNew >> >> >> >>>>>>> (promotion failure size = 1026) ?(promotion failed): >> >> >> >>>>>>> 359497K->368640K(368640K), 0.2226790 secs]328102.547: [CMS: >> >> >> >>>>>>> 3125609K->451515K(4833280K), 5.6225880 secs] 3476626K->4515 >> >> >> >>>>>>> 15K(5201920K), [CMS Perm : 124373K->124353K(262144K)], >> >> >> >>>>>>> 5.8459380 >> >> >> >>>>>>> secs] >> >> >> >>>>>>> [Times: user=6.20 sys=0.01, real=5.85 secs] >> >> >> >>>>>>> 2012-02-06T09:14:05.243+0100: 328108.522: [GC 328108.523: >> >> >> >>>>>>> [ParNew: >> >> >> >>>>>>> 327680K->40960K(368640K), 0.0319160 secs] >> >> >> >>>>>>> 779195K->497658K(5201920K), >> >> >> >>>>>>> 0.0325360 secs] [Times: user=0.21 sys=0.01, real=0.03 secs] >> >> >> >>>>>>> 2012-02-06T09:14:07.836+0100: 328111.116: [GC 328111.116: >> >> >> >>>>>>> [ParNew: >> >> >> >>>>>>> 368640K->32785K(368640K), 0.0744670 secs] >> >> >> >>>>>>> 825338K->520234K(5201920K), >> >> >> >>>>>>> 0.0750390 secs] [Times: user=0.40 sys=0.02, real=0.08 secs] >> >> >> >>>>>>> -------------------- >> >> >> >>>>>>> Given the 1026 word size, I'm wondering if I should be >> >> >> >>>>>>> hunting >> >> >> >>>>>>> for >> >> >> >>>>>>> an >> >> >> >>>>>>> overuse of BufferedInputStream/BufferedOutoutStream, since >> >> >> >>>>>>> both >> >> >> >>>>>>> have >> >> >> >>>>>>> 8192 as a default buffer size. 
>> >> >> >>>>>>> >> >> >> >>>>>>> The second group of promotion failures look like this >> >> >> >>>>>>> (multiple >> >> >> >>>>>>> ParNew >> >> >> >>>>>>> threads, small failure sizes): >> >> >> >>>>>>> -------------------- >> >> >> >>>>>>> 2012-02-06T09:50:15.773+0100: 328756.964: [GC 328756.964: >> >> >> >>>>>>> [ParNew: >> >> >> >>>>>>> 356116K->29934K(368640K), 0.0461100 secs] >> >> >> >>>>>>> 3203863K->2880162K(5201920K), 0.0468870 secs] [Times: >> >> >> >>>>>>> user=0.34 >> >> >> >>>>>>> sys=0.01, real=0.05 secs] >> >> >> >>>>>>> 2012-02-06T09:50:19.153+0100: 328760.344: [GC 328760.344: >> >> >> >>>>>>> [ParNew: >> >> >> >>>>>>> 357614K->30359K(368640K), 0.0454680 secs] >> >> >> >>>>>>> 3207842K->2882892K(5201920K), 0.0462280 secs] [Times: >> >> >> >>>>>>> user=0.33 >> >> >> >>>>>>> sys=0.01, real=0.05 secs] >> >> >> >>>>>>> 2012-02-06T09:50:22.658+0100: 328763.849: [GC 328763.849: >> >> >> >>>>>>> [ParNew >> >> >> >>>>>>> (1: >> >> >> >>>>>>> promotion failure size = 25) ?(4: promotion failure size = >> >> >> >>>>>>> 25) >> >> >> >>>>>>> ?(6: >> >> >> >>>>>>> promotion failure size = 25) ?(7: promotion failure size = >> >> >> >>>>>>> 144) >> >> >> >>>>>>> (promotion failed): 358039K->358358 >> >> >> >>>>>>> K(368640K), 0.2148680 secs]328764.064: [CMS: >> >> >> >>>>>>> 2854709K->446750K(4833280K), 5.8368270 secs] >> >> >> >>>>>>> 3210572K->446750K(5201920K), [CMS Perm : >> >> >> >>>>>>> 124670K->124644K(262144K)], >> >> >> >>>>>>> 6.0525230 secs] [Times: user=6.32 sys=0.00, real=6.05 secs] >> >> >> >>>>>>> 2012-02-06T09:50:29.896+0100: 328771.086: [GC 328771.087: >> >> >> >>>>>>> [ParNew: >> >> >> >>>>>>> 327680K->22569K(368640K), 0.0227080 secs] >> >> >> >>>>>>> 774430K->469319K(5201920K), >> >> >> >>>>>>> 0.0235020 secs] [Times: user=0.16 sys=0.00, real=0.02 secs] >> >> >> >>>>>>> 2012-02-06T09:50:31.076+0100: 328772.266: [GC 328772.267: >> >> >> >>>>>>> [ParNew: >> >> >> >>>>>>> 350249K->22264K(368640K), 0.0235480 secs] >> >> >> >>>>>>> 796999K->469014K(5201920K), >> >> >> >>>>>>> 0.0243000 secs] [Times: user=0.18 sys=0.01, real=0.02 secs] >> >> >> >>>>>>> -------------------- >> >> >> >>>>>>> >> >> >> >>>>>>> We're going to try to double the new size on a single node, >> >> >> >>>>>>> to >> >> >> >>>>>>> see >> >> >> >>>>>>> the >> >> >> >>>>>>> effects of that. >> >> >> >>>>>>> >> >> >> >>>>>>> Beyond this experiment, is there any additional data I can >> >> >> >>>>>>> collect >> >> >> >>>>>>> to >> >> >> >>>>>>> better understand the nature of the promotion failures? >> >> >> >>>>>>> Am I facing collecting free list statistics at this point? 
>> >> >> >>>>>>> >> >> >> >>>>>>> Thanks, >> >> >> >>>>>>> Taras >> >> >> >>>>>> >> >> >> >>>>>> >> >> >> >>>>>> _______________________________________________ >> >> >> >>>>>> hotspot-gc-use mailing list >> >> >> >>>>>> hotspot-gc-use at openjdk.java.net >> >> >> >>>>>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> >> >>>>>> >> >> >> >>>>> >> >> >> >> _______________________________________________ >> >> >> >> hotspot-gc-use mailing list >> >> >> >> hotspot-gc-use at openjdk.java.net >> >> >> >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> >> _______________________________________________ >> >> >> hotspot-gc-use mailing list >> >> >> hotspot-gc-use at openjdk.java.net >> >> >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> > >> >> > >> >> _______________________________________________ >> >> hotspot-gc-use mailing list >> >> hotspot-gc-use at openjdk.java.net >> >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> > >> > >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > From schelwa at tibco.com Tue Apr 17 08:08:34 2012 From: schelwa at tibco.com (Shivkumar Chelwa) Date: Tue, 17 Apr 2012 15:08:34 +0000 Subject: CMS Full GC Message-ID: Hi, Till date I was using JRE 6u22 with following garbage collection parameters and the CMS cycle use to kick-in appropriately (when heap reaches 75%) -server -d64 -Xms2048m -Xmx2048m -javaagent: my-instrumentation.jar -XX:MaxPermSize=256m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+DisableExplicitGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -verbose:gc -Xloggc:LB01.log -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -Djava.library=/usr/java/jre/lib/sparcv9/server/libjvm.so -Xmx8192M -Xms8192M -Xss256K But I switched to JRE 6u29 and see the CMS Full GC happening randomly. Can you please help me undercover this mystery. Here is one of the log message from gc log file. 13475.239: [Full GC 13475.239: [CMS: 4321575K->3717474K(7898752K), 54.0602376 secs] 4412277K->3717474K(8339648K), [CMS Perm : 73791K->73339K(74048K)], 54.0615557 secs] [Times: user=53.97 sy s=0.12, real=54.06 secs] Kindly help. Regards, Shiv -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120417/5eb2f8ad/attachment.html From ysr1729 at gmail.com Tue Apr 17 12:06:30 2012 From: ysr1729 at gmail.com (Srinivas Ramakrishna) Date: Tue, 17 Apr 2012 12:06:30 -0700 Subject: CMS Full GC In-Reply-To: References: Message-ID: Is it possible that you are GC'ing here to expand perm gen. Check if permgen footprint changed between the two JVM releases (when running yr application). Now, CMS should quietly expand perm gen without doing a stop-world GC, but there was a temporary regression in that functionality before it was fixed again. I can't however recall the JVM versions where the regression was introduced and then fixed. But all of this is handwaving on my part. If you run 6u22 and 6u29 both with -XX:+PrintHeapAtGC, you might have more visibility into why the GC is kicking in. A longer log would allow the community to perhaps provide suggestions as well. Which reminds me that there is a bug in the printing of GC cause (as printed by jstat) which needs to be fixed. 
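For that comparison run, the extra flag slots straight into the logging portion of the option list Shiv posted, e.g.:
-----
-verbose:gc
-Xloggc:LB01.log
-XX:+PrintGCTimeStamps
-XX:+PrintGCDetails
-XX:+PrintHeapAtGC
-----
As a side note, the posted line carries both -Xms2048m/-Xmx2048m and -Xms8192M/-Xmx8192M; the right-most pair is the one that takes effect, which matches the roughly 8 GB heap visible in the Full GC entry.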
HotSpot/GC folk, have you noticed that we never see a "perm gen allocation" as the GC cause even when that's really the reason for a full gc? (not that that should happen here where CMS is being used.) -- ramki On Tue, Apr 17, 2012 at 8:08 AM, Shivkumar Chelwa wrote: > Hi,**** > > ** ** > > Till date I was using JRE 6u22 with following garbage collection > parameters and the CMS cycle use to kick-in appropriately (when heap > reaches 75%)**** > > ** ** > > ** ** > > -server -d64 -Xms2048m -Xmx2048m -javaagent: my-instrumentation.jar > -XX:MaxPermSize=256m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC > -XX:+DisableExplicitGC -XX:CMSInitiatingOccupancyFraction=75 > -XX:+UseCMSInitiatingOccupancyOnly -verbose:gc -Xloggc:LB01.log > -XX:+PrintGCTimeStamps -XX:+PrintGCDetails > -Djava.library=/usr/java/jre/lib/sparcv9/server/libjvm.so -Xmx8192M > -Xms8192M -Xss256K**** > > ** ** > > But I switched to JRE 6u29 and see the *CMS Full GC* happening randomly. > Can you please help me undercover this mystery. Here is one of the log > message from gc log file.**** > > ** ** > > 13475.239: [*Full GC* 13475.239: [CMS: 4321575K->3717474K(7898752K), *54.0602376 > secs*] 4412277K->3717474K(8339648K), [CMS Perm : 73791K->73339K(74048K)], > 54.0615557 secs] [Times: user=53.97 sy**** > > s=0.12, real=54.06 secs]**** > > ** ** > > ** ** > > Kindly help.**** > > ** ** > > ** ** > > Regards,**** > > Shiv**** > > ** ** > > ** ** > > ** ** > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120417/36357428/attachment.html From schelwa at tibco.com Tue Apr 17 12:55:53 2012 From: schelwa at tibco.com (Shivkumar Chelwa) Date: Tue, 17 Apr 2012 19:55:53 +0000 Subject: CMS Full GC In-Reply-To: References: Message-ID: Thanks Ramki. The perm gen size is well below the max setting. Only 70-80M is being used out of 256M, so I don't think it is an issue. Will -XX:+PrintHeapAtGC prints heap only when there is Full GC? Regards, Shiv ________________________________ From: Srinivas Ramakrishna [mailto:ysr1729 at gmail.com] Sent: 17 April 2012 15:07 To: Shivkumar Chelwa Cc: hotspot-gc-use at openjdk.java.net Subject: Re: CMS Full GC Is it possible that you are GC'ing here to expand perm gen. Check if permgen footprint changed between the two JVM releases (when running yr application). Now, CMS should quietly expand perm gen without doing a stop-world GC, but there was a temporary regression in that functionality before it was fixed again. I can't however recall the JVM versions where the regression was introduced and then fixed. But all of this is handwaving on my part. If you run 6u22 and 6u29 both with -XX:+PrintHeapAtGC, you might have more visibility into why the GC is kicking in. A longer log would allow the community to perhaps provide suggestions as well. Which reminds me that there is a bug in the printing of GC cause (as printed by jstat) which needs to be fixed. HotSpot/GC folk, have you noticed that we never see a "perm gen allocation" as the GC cause even when that's really the reason for a full gc? (not that that should happen here where CMS is being used.) 
-- ramki On Tue, Apr 17, 2012 at 8:08 AM, Shivkumar Chelwa > wrote: Hi, Till date I was using JRE 6u22 with following garbage collection parameters and the CMS cycle use to kick-in appropriately (when heap reaches 75%) -server -d64 -Xms2048m -Xmx2048m -javaagent: my-instrumentation.jar -XX:MaxPermSize=256m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+DisableExplicitGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -verbose:gc -Xloggc:LB01.log -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -Djava.library=/usr/java/jre/lib/sparcv9/server/libjvm.so -Xmx8192M -Xms8192M -Xss256K But I switched to JRE 6u29 and see the CMS Full GC happening randomly. Can you please help me undercover this mystery. Here is one of the log message from gc log file. 13475.239: [Full GC 13475.239: [CMS: 4321575K->3717474K(7898752K), 54.0602376 secs] 4412277K->3717474K(8339648K), [CMS Perm : 73791K->73339K(74048K)], 54.0615557 secs] [Times: user=53.97 sy s=0.12, real=54.06 secs] Kindly help. Regards, Shiv _______________________________________________ hotspot-gc-use mailing list hotspot-gc-use at openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120417/b38a6794/attachment.html From ysr1729 at gmail.com Tue Apr 17 13:52:16 2012 From: ysr1729 at gmail.com (Srinivas Ramakrishna) Date: Tue, 17 Apr 2012 13:52:16 -0700 Subject: CMS Full GC In-Reply-To: References: Message-ID: It'll print at every gc, minor as well as major. Sent from my iPhone On Apr 17, 2012, at 12:55 PM, Shivkumar Chelwa wrote: > Thanks Ramki. The perm gen size is well below the max setting. Only 70-80M is being used out of 256M, so I don?t think it is an issue. > > Will -XX:+PrintHeapAtGC prints heap only when there is Full GC? > > Regards, > Shiv > > From: Srinivas Ramakrishna [mailto:ysr1729 at gmail.com] > Sent: 17 April 2012 15:07 > To: Shivkumar Chelwa > Cc: hotspot-gc-use at openjdk.java.net > Subject: Re: CMS Full GC > > Is it possible that you are GC'ing here to expand perm gen. Check if permgen footprint changed between the two JVM releases (when running yr application). > Now, CMS should quietly expand perm gen without doing a stop-world GC, but there was a temporary regression in that functionality before it was fixed again. > I can't however recall the JVM versions where the regression was introduced and then fixed. But all of this is handwaving on my part. > If you run 6u22 and 6u29 both with -XX:+PrintHeapAtGC, you might have more visibility into why the GC is kicking in. A longer log would allow > the community to perhaps provide suggestions as well. > > Which reminds me that there is a bug in the printing of GC cause (as printed by jstat) which needs to be fixed. HotSpot/GC folk, have you noticed that we never > see a "perm gen allocation" as the GC cause even when that's really the reason for a full gc? (not that that should happen here where CMS is being used.) 
> > -- ramki > > On Tue, Apr 17, 2012 at 8:08 AM, Shivkumar Chelwa wrote: > Hi, > > Till date I was using JRE 6u22 with following garbage collection parameters and the CMS cycle use to kick-in appropriately (when heap reaches 75%) > > > -server -d64 -Xms2048m -Xmx2048m -javaagent: my-instrumentation.jar -XX:MaxPermSize=256m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+DisableExplicitGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -verbose:gc -Xloggc:LB01.log -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -Djava.library=/usr/java/jre/lib/sparcv9/server/libjvm.so -Xmx8192M -Xms8192M -Xss256K > > But I switched to JRE 6u29 and see the CMS Full GC happening randomly. Can you please help me undercover this mystery. Here is one of the log message from gc log file. > > 13475.239: [Full GC 13475.239: [CMS: 4321575K->3717474K(7898752K), 54.0602376 secs] 4412277K->3717474K(8339648K), [CMS Perm : 73791K->73339K(74048K)], 54.0615557 secs] [Times: user=53.97 sy > s=0.12, real=54.06 secs] > > > Kindly help. > > > Regards, > Shiv > > > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120417/c388ba58/attachment-0001.html From the.6th.month at gmail.com Wed Apr 18 01:16:18 2012 From: the.6th.month at gmail.com (the.6th.month at gmail.com) Date: Wed, 18 Apr 2012 16:16:18 +0800 Subject: does UseParallelOldGC guarantee a better full gc performance Message-ID: hi all: I'm currently using jdk 6u26. I just enabled UseParallelOldGC, expecting that would enhance the full gc efficiency and decrease the mark-sweep time by using multiple-core. The JAVA_OPTS is as below: -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintTenuringDistribution -Xloggc:gc.log-server -Xms4000m -Xmx4000m -Xss256k -Xmn256m -XX:PermSize=256m -XX:+UseParallelOldGC -server -Dorg.apache.jasper.compiler.Parser.STRICT_QUOTE_ESCAPING=false as shown in jinfo output, the settings have taken effect, and the ParallelGCThreads is 4 since the jvm is running on a four-core server. But what's strange is that the mark-sweep time remains almost unchanged (at around 6-8 seconds), do I miss something here? Does anyone have the same experience or any idea about the reason behind? Thanks very much for help All the best, Leon -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120418/5574cb47/attachment.html From sbordet at intalio.com Wed Apr 18 01:24:30 2012 From: sbordet at intalio.com (Simone Bordet) Date: Wed, 18 Apr 2012 10:24:30 +0200 Subject: does UseParallelOldGC guarantee a better full gc performance In-Reply-To: References: Message-ID: Hi, On Wed, Apr 18, 2012 at 10:16, the.6th.month at gmail.com wrote: > hi all: > I'm currently using jdk 6u26. I just enabled UseParallelOldGC, expecting > that would enhance the full gc efficiency and decrease the mark-sweep time > by using multiple-core. The JAVA_OPTS is as below: > -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintTenuringDistribution > -Xloggc:gc.log-server -Xms4000m -Xmx4000m -Xss256k -Xmn256m > -XX:PermSize=256m -XX:+UseParallelOldGC? 
-server > -Dorg.apache.jasper.compiler.Parser.STRICT_QUOTE_ESCAPING=false > as shown in jinfo output, the settings have taken effect, and the > ParallelGCThreads is 4 since the jvm is running on a four-core server. > But what's strange is that the mark-sweep time remains almost unchanged (at > around 6-8 seconds), do I miss something here? Does anyone have the same > experience or any idea about the reason behind? > Thanks very much for help The young generation is fairly small for a 4GiB heap. Can we see the lines you mention from the logs ? Simon -- http://cometd.org http://intalio.com http://bordet.blogspot.com ---- Finally, no matter how good the architecture and design are, to deliver bug-free software with optimal performance and reliability, the implementation technique must be flawless.?? Victoria Livschitz From the.6th.month at gmail.com Wed Apr 18 01:58:11 2012 From: the.6th.month at gmail.com (the.6th.month at gmail.com) Date: Wed, 18 Apr 2012 16:58:11 +0800 Subject: does UseParallelOldGC guarantee a better full gc performance In-Reply-To: References: Message-ID: Hi, Simon: this is the full gc log for your concern. 2012-04-18T16:47:24.824+0800: 988.392: [GC Desired survivor size 14876672 bytes, new threshold 1 (max 15) [PSYoungGen: 236288K->8126K(247616K)] 4054802K->3830711K(4081472K), 0.0512250 secs] [Times: user=0.15 sys=0.00, real=0.05 secs] 2012-04-18T16:47:24.875+0800: 988.443: [Full GC [PSYoungGen: 8126K->0K(247616K)] [PSOldGen: 3822585K->1751429K(3833856K)] 3830711K->1751429K(4081472K) [PSPermGen: 81721K->81721K(262144K)], 6.6108630 secs] [Times: user=6.62 sys=0.00, real=6.61 secs] the full gc time is almost unchanged since I enabled paralleloldgc. Do you have any recommendation for an appropriate young gen size? Thanks All the best, Leon On 18 April 2012 16:24, Simone Bordet wrote: > Hi, > > On Wed, Apr 18, 2012 at 10:16, the.6th.month at gmail.com > wrote: > > hi all: > > I'm currently using jdk 6u26. I just enabled UseParallelOldGC, expecting > > that would enhance the full gc efficiency and decrease the mark-sweep > time > > by using multiple-core. The JAVA_OPTS is as below: > > -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintTenuringDistribution > > -Xloggc:gc.log-server -Xms4000m -Xmx4000m -Xss256k -Xmn256m > > -XX:PermSize=256m -XX:+UseParallelOldGC -server > > -Dorg.apache.jasper.compiler.Parser.STRICT_QUOTE_ESCAPING=false > > as shown in jinfo output, the settings have taken effect, and the > > ParallelGCThreads is 4 since the jvm is running on a four-core server. > > But what's strange is that the mark-sweep time remains almost unchanged > (at > > around 6-8 seconds), do I miss something here? Does anyone have the same > > experience or any idea about the reason behind? > > Thanks very much for help > > The young generation is fairly small for a 4GiB heap. > > Can we see the lines you mention from the logs ? > > Simon > -- > http://cometd.org > http://intalio.com > http://bordet.blogspot.com > ---- > Finally, no matter how good the architecture and design are, > to deliver bug-free software with optimal performance and reliability, > the implementation technique must be flawless. Victoria Livschitz > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120418/c6acd54f/attachment.html From sbordet at intalio.com Wed Apr 18 02:19:34 2012 From: sbordet at intalio.com (Simone Bordet) Date: Wed, 18 Apr 2012 11:19:34 +0200 Subject: does UseParallelOldGC guarantee a better full gc performance In-Reply-To: References: Message-ID: Hi, On Wed, Apr 18, 2012 at 10:58, the.6th.month at gmail.com wrote: > Hi, Simon: > > this is the full gc log for your concern. > 2012-04-18T16:47:24.824+0800: 988.392: [GC > Desired survivor size 14876672 bytes, new threshold 1 (max 15) > ?[PSYoungGen: 236288K->8126K(247616K)] 4054802K->3830711K(4081472K), > 0.0512250 secs] [Times: user=0.15 sys=0.00, real=0.05 secs] > > 2012-04-18T16:47:24.875+0800: 988.443: [Full GC [PSYoungGen: > 8126K->0K(247616K)] [PSOldGen: 3822585K->1751429K(3833856K)] > 3830711K->1751429K(4081472K) [PSPermGen: 81721K->81721K(262144K)], 6.6108630 > secs] [Times: user=6.62 sys=0.00, real=6.61 secs] > > the full gc time is almost unchanged since I enabled paralleloldgc. > > Do you have any recommendation for an appropriate young gen size? Usually, applications generate a lot of short lived objects that can be reclaimed very efficiently in the young generation. If you have a small young generation, these objects will be promoted in old generation, where their collection is usually more expensive. It really depends what your application does, but I would remove the -Xmn option for now (leaving the default of 1/3 of the total heap, i.e. ~1.6 GiB), and see if you get benefits. As for the times being unchanged, I do not know. My experience is that UseParallelOldGC works as expected: I frequently see 1.5-2x gains on 2 cores, and I have seen 6x gains on 8 cores. Simon -- http://cometd.org http://intalio.com http://bordet.blogspot.com ---- Finally, no matter how good the architecture and design are, to deliver bug-free software with optimal performance and reliability, the implementation technique must be flawless.?? Victoria Livschitz From the.6th.month at gmail.com Wed Apr 18 03:07:05 2012 From: the.6th.month at gmail.com (the.6th.month at gmail.com) Date: Wed, 18 Apr 2012 18:07:05 +0800 Subject: does UseParallelOldGC guarantee a better full gc performance In-Reply-To: References: Message-ID: Hi, Simon: Thanks for your reply. That's really weird, I'll look into it and give the feedback later Thanks again. All the best, Leon On 18 April 2012 17:19, Simone Bordet wrote: > Hi, > > On Wed, Apr 18, 2012 at 10:58, the.6th.month at gmail.com > wrote: > > Hi, Simon: > > > > this is the full gc log for your concern. > > 2012-04-18T16:47:24.824+0800: 988.392: [GC > > Desired survivor size 14876672 bytes, new threshold 1 (max 15) > > [PSYoungGen: 236288K->8126K(247616K)] 4054802K->3830711K(4081472K), > > 0.0512250 secs] [Times: user=0.15 sys=0.00, real=0.05 secs] > > > > 2012-04-18T16:47:24.875+0800: 988.443: [Full GC [PSYoungGen: > > 8126K->0K(247616K)] [PSOldGen: 3822585K->1751429K(3833856K)] > > 3830711K->1751429K(4081472K) [PSPermGen: 81721K->81721K(262144K)], > 6.6108630 > > secs] [Times: user=6.62 sys=0.00, real=6.61 secs] > > > > the full gc time is almost unchanged since I enabled paralleloldgc. > > > > Do you have any recommendation for an appropriate young gen size? > > Usually, applications generate a lot of short lived objects that can > be reclaimed very efficiently in the young generation. > If you have a small young generation, these objects will be promoted > in old generation, where their collection is usually more expensive. 
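Simone's advice to drop the explicit -Xmn (quoted again just below) amounts to letting the VM size the young generation itself. A sketch only, reusing the heap values posted in this thread; with no -Xmn, the default NewRatio (typically 2 for the server VM) makes the young generation roughly a third of the heap:

    -Xms4000m -Xmx4000m -Xmn256m -XX:+UseParallelOldGC        (young generation pinned at 256m, as posted)
    -Xms4000m -Xmx4000m -XX:+UseParallelOldGC                 (no -Xmn; the default ratio applies)

Note also that the parallel collectors adjust generation and survivor sizes adaptively by default (UseAdaptiveSizePolicy), so the sizes observed in the logs can drift from the starting values.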
> > It really depends what your application does, but I would remove the > -Xmn option for now (leaving the default of 1/3 of the total heap, > i.e. ~1.6 GiB), and see if you get benefits. > > As for the times being unchanged, I do not know. > My experience is that UseParallelOldGC works as expected: I frequently > see 1.5-2x gains on 2 cores, and I have seen 6x gains on 8 cores. > > Simon > -- > http://cometd.org > http://intalio.com > http://bordet.blogspot.com > ---- > Finally, no matter how good the architecture and design are, > to deliver bug-free software with optimal performance and reliability, > the implementation technique must be flawless. Victoria Livschitz > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120418/58299b7d/attachment.html From the.6th.month at gmail.com Wed Apr 18 04:03:37 2012 From: the.6th.month at gmail.com (the.6th.month at gmail.com) Date: Wed, 18 Apr 2012 19:03:37 +0800 Subject: does UseParallelOldGC guarantee a better full gc performance In-Reply-To: References: Message-ID: hi, Simon: here is another gc-log fragment about full gc, you can see that although I've configured the jvm to UseParallelOldGC and increased the younggen size to 768m, it still takes 13 seconds to finish the full gc, even worse than before. {Heap before GC invocations=109 (full 2): PSYoungGen total 705984K, used 14531K [0x00000007d0000000, 0x0000000800000000, 0x0000000800000000) eden space 629120K, 0% used [0x00000007d0000000,0x00000007d0000000,0x00000007f6660000) from space 76864K, 18% used [0x00000007f6660000,0x00000007f7490da8,0x00000007fb170000) to space 76672K, 0% used [0x00000007fb520000,0x00000007fb520000,0x0000000800000000) ParOldGen total 3309568K, used 3279215K [0x0000000706000000, 0x00000007d0000000, 0x00000007d0000000) object space 3309568K, 99% used [0x0000000706000000,0x00000007ce25bd68,0x00000007d0000000) PSPermGen total 262144K, used 79139K [0x00000006f6000000, 0x0000000706000000, 0x0000000706000000) object space 262144K, 30% used [0x00000006f6000000,0x00000006fad48e38,0x0000000706000000) 2012-04-18T18:55:53.500+0800: 767.928: [Full GC [PSYoungGen: 14531K->0K(705984K)] [ParOldGen: 3279215K->1474447K(3309568K)] 3293746K->1474447K(4015552K) [PSPermGen: 79139K->74190K(262144K)], 13.0669910 secs] [Times: user=41.91 sys=0.19, real=13.06 secs] quite counter-intuitive, huh? Leon On 18 April 2012 18:07, the.6th.month at gmail.com wrote: > Hi, Simon: > Thanks for your reply. That's really weird, I'll look into it and give the > feedback later > Thanks again. > > All the best, > Leon > > > On 18 April 2012 17:19, Simone Bordet wrote: > >> Hi, >> >> On Wed, Apr 18, 2012 at 10:58, the.6th.month at gmail.com >> wrote: >> > Hi, Simon: >> > >> > this is the full gc log for your concern. 
>> > 2012-04-18T16:47:24.824+0800: 988.392: [GC >> > Desired survivor size 14876672 bytes, new threshold 1 (max 15) >> > [PSYoungGen: 236288K->8126K(247616K)] 4054802K->3830711K(4081472K), >> > 0.0512250 secs] [Times: user=0.15 sys=0.00, real=0.05 secs] >> > >> > 2012-04-18T16:47:24.875+0800: 988.443: [Full GC [PSYoungGen: >> > 8126K->0K(247616K)] [PSOldGen: 3822585K->1751429K(3833856K)] >> > 3830711K->1751429K(4081472K) [PSPermGen: 81721K->81721K(262144K)], >> 6.6108630 >> > secs] [Times: user=6.62 sys=0.00, real=6.61 secs] >> > >> > the full gc time is almost unchanged since I enabled paralleloldgc. >> > >> > Do you have any recommendation for an appropriate young gen size? >> >> Usually, applications generate a lot of short lived objects that can >> be reclaimed very efficiently in the young generation. >> If you have a small young generation, these objects will be promoted >> in old generation, where their collection is usually more expensive. >> >> It really depends what your application does, but I would remove the >> -Xmn option for now (leaving the default of 1/3 of the total heap, >> i.e. ~1.6 GiB), and see if you get benefits. >> >> As for the times being unchanged, I do not know. >> My experience is that UseParallelOldGC works as expected: I frequently >> see 1.5-2x gains on 2 cores, and I have seen 6x gains on 8 cores. >> >> Simon >> -- >> http://cometd.org >> http://intalio.com >> http://bordet.blogspot.com >> ---- >> Finally, no matter how good the architecture and design are, >> to deliver bug-free software with optimal performance and reliability, >> the implementation technique must be flawless. Victoria Livschitz >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120418/37188e0a/attachment-0001.html From sbordet at intalio.com Wed Apr 18 06:10:08 2012 From: sbordet at intalio.com (Simone Bordet) Date: Wed, 18 Apr 2012 15:10:08 +0200 Subject: does UseParallelOldGC guarantee a better full gc performance In-Reply-To: References: Message-ID: Hi, On Wed, Apr 18, 2012 at 13:03, the.6th.month at gmail.com wrote: > hi, Simon: > here is another gc-log fragment about full gc, you can see that although > I've configured the jvm to UseParallelOldGC and increased the younggen size > to 768m, it still takes 13 seconds to finish the full gc, even worse than > before. > > {Heap before GC invocations=109 (full 2): > ?PSYoungGen????? total 705984K, used 14531K [0x00000007d0000000, > 0x0000000800000000, 0x0000000800000000) > ? eden space 629120K, 0% used > [0x00000007d0000000,0x00000007d0000000,0x00000007f6660000) > ? from space 76864K, 18% used > [0x00000007f6660000,0x00000007f7490da8,0x00000007fb170000) > ? to?? space 76672K, 0% used > [0x00000007fb520000,0x00000007fb520000,0x0000000800000000) > ?ParOldGen?????? total 3309568K, used 3279215K [0x0000000706000000, > 0x00000007d0000000, 0x00000007d0000000) > ? object space 3309568K, 99% used > [0x0000000706000000,0x00000007ce25bd68,0x00000007d0000000) > ?PSPermGen?????? total 262144K, used 79139K [0x00000006f6000000, > 0x0000000706000000, 0x0000000706000000) > ? 
object space 262144K, 30% used > [0x00000006f6000000,0x00000006fad48e38,0x0000000706000000) > 2012-04-18T18:55:53.500+0800: 767.928: [Full GC [PSYoungGen: > 14531K->0K(705984K)] [ParOldGen: 3279215K->1474447K(3309568K)] > 3293746K->1474447K(4015552K) [PSPermGen: 79139K->74190K(262144K)], > 13.0669910 secs] [Times: user=41.91 sys=0.19, real=13.06 secs] > > quite counter-intuitive, huh? Well, maybe. But it shows that the parallel collector does its work, since you had a 41.91/13.06 = 3.2x gain on your 4 cores. The rest of the analysis depends on what your application does (e.g. allocation rate and whether uses caches, references, etc.) and on a complete GC log. Simon -- http://cometd.org http://intalio.com http://bordet.blogspot.com ---- Finally, no matter how good the architecture and design are, to deliver bug-free software with optimal performance and reliability, the implementation technique must be flawless.?? Victoria Livschitz From jon.masamitsu at oracle.com Wed Apr 18 09:19:31 2012 From: jon.masamitsu at oracle.com (Jon Masamitsu) Date: Wed, 18 Apr 2012 09:19:31 -0700 Subject: does UseParallelOldGC guarantee a better full gc performance In-Reply-To: References: Message-ID: <4F8EE993.8030502@oracle.com> Leon, In this log you see as part of an entry "PSOldGen:" which says you're using the serial mark sweep. I see in your later posts that "ParOldGen:" appears in your log and that is the parallel mark sweep collector. Jon On 4/18/2012 1:58 AM, the.6th.month at gmail.com wrote: > Hi, Simon: > > this is the full gc log for your concern. > 2012-04-18T16:47:24.824+0800: 988.392: [GC > Desired survivor size 14876672 bytes, new threshold 1 (max 15) > [PSYoungGen: 236288K->8126K(247616K)] 4054802K->3830711K(4081472K), > 0.0512250 secs] [Times: user=0.15 sys=0.00, real=0.05 secs] > > 2012-04-18T16:47:24.875+0800: 988.443: [Full GC [PSYoungGen: > 8126K->0K(247616K)] [PSOldGen: 3822585K->1751429K(3833856K)] > 3830711K->1751429K(4081472K) [PSPermGen: 81721K->81721K(262144K)], > 6.6108630 secs] [Times: user=6.62 sys=0.00, real=6.61 secs] > > the full gc time is almost unchanged since I enabled paralleloldgc. > > Do you have any recommendation for an appropriate young gen size? > > Thanks > > All the best, > Leon > > > On 18 April 2012 16:24, Simone Bordet wrote: > >> Hi, >> >> On Wed, Apr 18, 2012 at 10:16, the.6th.month at gmail.com >> wrote: >>> hi all: >>> I'm currently using jdk 6u26. I just enabled UseParallelOldGC, expecting >>> that would enhance the full gc efficiency and decrease the mark-sweep >> time >>> by using multiple-core. The JAVA_OPTS is as below: >>> -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintTenuringDistribution >>> -Xloggc:gc.log-server -Xms4000m -Xmx4000m -Xss256k -Xmn256m >>> -XX:PermSize=256m -XX:+UseParallelOldGC -server >>> -Dorg.apache.jasper.compiler.Parser.STRICT_QUOTE_ESCAPING=false >>> as shown in jinfo output, the settings have taken effect, and the >>> ParallelGCThreads is 4 since the jvm is running on a four-core server. >>> But what's strange is that the mark-sweep time remains almost unchanged >> (at >>> around 6-8 seconds), do I miss something here? Does anyone have the same >>> experience or any idea about the reason behind? >>> Thanks very much for help >> The young generation is fairly small for a 4GiB heap. >> >> Can we see the lines you mention from the logs ? 
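Jon's "PSOldGen:" versus "ParOldGen:" distinction above is the quickest way to tell from a log which old-generation collector actually ran. The final flag values give the same answer up front; a minimal sketch, assuming a JDK 6 update recent enough to have -XX:+PrintFlagsFinal (it appeared around 6u21, so check your update):

    java -server -XX:+UseParallelOldGC -XX:+PrintFlagsFinal -version | egrep 'UseParallel|ParallelGCThreads'

If UseParallelOldGC prints as true but full collections still log "PSOldGen:", the log was almost certainly produced by a run that did not get the flag.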
>> >> Simon >> -- >> http://cometd.org >> http://intalio.com >> http://bordet.blogspot.com >> ---- >> Finally, no matter how good the architecture and design are, >> to deliver bug-free software with optimal performance and reliability, >> the implementation technique must be flawless. Victoria Livschitz >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120418/0fbd072d/attachment.html From the.6th.month at gmail.com Wed Apr 18 09:27:01 2012 From: the.6th.month at gmail.com (the.6th.month at gmail.com) Date: Thu, 19 Apr 2012 00:27:01 +0800 Subject: does UseParallelOldGC guarantee a better full gc performance In-Reply-To: <4F8EE993.8030502@oracle.com> References: <4F8EE993.8030502@oracle.com> Message-ID: Hi, Jon, yup,,,I know, but what is weird is the paroldgen doesn't bring about better full gc performance as seen from JMX metrics but bring unexpected swap consumption. I am gonna look into my application instead for some inspiration. Leon On 19 April 2012 00:19, Jon Masamitsu wrote: > ** > Leon, > > In this log you see as part of an entry "PSOldGen:" which says you're > using the serial mark sweep. I see in your later posts that "ParOldGen:" > appears in your log and that is the parallel mark sweep collector. > > Jon > > > On 4/18/2012 1:58 AM, the.6th.month at gmail.com wrote: > > Hi, Simon: > > this is the full gc log for your concern. > 2012-04-18T16:47:24.824+0800: 988.392: [GC > Desired survivor size 14876672 bytes, new threshold 1 (max 15) > [PSYoungGen: 236288K->8126K(247616K)] 4054802K->3830711K(4081472K), > 0.0512250 secs] [Times: user=0.15 sys=0.00, real=0.05 secs] > > 2012-04-18T16:47:24.875+0800: 988.443: [Full GC [PSYoungGen: > 8126K->0K(247616K)] [PSOldGen: 3822585K->1751429K(3833856K)] > 3830711K->1751429K(4081472K) [PSPermGen: 81721K->81721K(262144K)], > 6.6108630 secs] [Times: user=6.62 sys=0.00, real=6.61 secs] > > the full gc time is almost unchanged since I enabled paralleloldgc. > > Do you have any recommendation for an appropriate young gen size? > > Thanks > > All the best, > Leon > > > On 18 April 2012 16:24, Simone Bordet wrote: > > > Hi, > > On Wed, Apr 18, 2012 at 10:16, the.6th.month at gmail.com wrote: > > hi all: > I'm currently using jdk 6u26. I just enabled UseParallelOldGC, expecting > that would enhance the full gc efficiency and decrease the mark-sweep > > time > > by using multiple-core. The JAVA_OPTS is as below: > -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintTenuringDistribution > -Xloggc:gc.log-server -Xms4000m -Xmx4000m -Xss256k -Xmn256m > -XX:PermSize=256m -XX:+UseParallelOldGC -server > -Dorg.apache.jasper.compiler.Parser.STRICT_QUOTE_ESCAPING=false > as shown in jinfo output, the settings have taken effect, and the > ParallelGCThreads is 4 since the jvm is running on a four-core server. > But what's strange is that the mark-sweep time remains almost unchanged > > (at > > around 6-8 seconds), do I miss something here? Does anyone have the same > experience or any idea about the reason behind? > Thanks very much for help > > The young generation is fairly small for a 4GiB heap. 
> > Can we see the lines you mention from the logs ? > > Simon > --http://cometd.orghttp://intalio.comhttp://bordet.blogspot.com > ---- > Finally, no matter how good the architecture and design are, > to deliver bug-free software with optimal performance and reliability, > the implementation technique must be flawless. Victoria Livschitz > _______________________________________________ > hotspot-gc-use mailing listhotspot-gc-use at openjdk.java.nethttp://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > > _______________________________________________ > hotspot-gc-use mailing listhotspot-gc-use at openjdk.java.nethttp://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120419/ecb9a050/attachment.html From jon.masamitsu at oracle.com Wed Apr 18 10:36:31 2012 From: jon.masamitsu at oracle.com (Jon Masamitsu) Date: Wed, 18 Apr 2012 10:36:31 -0700 Subject: does UseParallelOldGC guarantee a better full gc performance In-Reply-To: References: <4F8EE993.8030502@oracle.com> Message-ID: <4F8EFB9F.5030404@oracle.com> Leon, I don't think I've actually seen logs with the same flags except changing parallel old for serial old so hard for me to say. Simon's comment > Well, maybe. But it shows that the parallel collector does its work, > since you had a 41.91/13.06 = 3.2x gain on your 4 cores. says there is a parallel speed up, however, so I'll let you investigate you application and leave it at that. Jon On 4/18/2012 9:27 AM, the.6th.month at gmail.com wrote: > Hi, Jon, > yup,,,I know, but what is weird is the paroldgen doesn't bring about better > full gc performance as seen from JMX metrics but bring unexpected swap > consumption. > I am gonna look into my application instead for some inspiration. > > Leon > > On 19 April 2012 00:19, Jon Masamitsu wrote: > >> ** >> Leon, >> >> In this log you see as part of an entry "PSOldGen:" which says you're >> using the serial mark sweep. I see in your later posts that "ParOldGen:" >> appears in your log and that is the parallel mark sweep collector. >> >> Jon >> >> >> On 4/18/2012 1:58 AM, the.6th.month at gmail.com wrote: >> >> Hi, Simon: >> >> this is the full gc log for your concern. >> 2012-04-18T16:47:24.824+0800: 988.392: [GC >> Desired survivor size 14876672 bytes, new threshold 1 (max 15) >> [PSYoungGen: 236288K->8126K(247616K)] 4054802K->3830711K(4081472K), >> 0.0512250 secs] [Times: user=0.15 sys=0.00, real=0.05 secs] >> >> 2012-04-18T16:47:24.875+0800: 988.443: [Full GC [PSYoungGen: >> 8126K->0K(247616K)] [PSOldGen: 3822585K->1751429K(3833856K)] >> 3830711K->1751429K(4081472K) [PSPermGen: 81721K->81721K(262144K)], >> 6.6108630 secs] [Times: user=6.62 sys=0.00, real=6.61 secs] >> >> the full gc time is almost unchanged since I enabled paralleloldgc. >> >> Do you have any recommendation for an appropriate young gen size? >> >> Thanks >> >> All the best, >> Leon >> >> >> On 18 April 2012 16:24, Simone Bordet wrote: >> >> >> Hi, >> >> On Wed, Apr 18, 2012 at 10:16, the.6th.month at gmail.com wrote: >> >> hi all: >> I'm currently using jdk 6u26. I just enabled UseParallelOldGC, expecting >> that would enhance the full gc efficiency and decrease the mark-sweep >> >> time >> >> by using multiple-core. 
The JAVA_OPTS is as below: >> -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintTenuringDistribution >> -Xloggc:gc.log-server -Xms4000m -Xmx4000m -Xss256k -Xmn256m >> -XX:PermSize=256m -XX:+UseParallelOldGC -server >> -Dorg.apache.jasper.compiler.Parser.STRICT_QUOTE_ESCAPING=false >> as shown in jinfo output, the settings have taken effect, and the >> ParallelGCThreads is 4 since the jvm is running on a four-core server. >> But what's strange is that the mark-sweep time remains almost unchanged >> >> (at >> >> around 6-8 seconds), do I miss something here? Does anyone have the same >> experience or any idea about the reason behind? >> Thanks very much for help >> >> The young generation is fairly small for a 4GiB heap. >> >> Can we see the lines you mention from the logs ? >> >> Simon >> --http://cometd.orghttp://intalio.comhttp://bordet.blogspot.com >> ---- >> Finally, no matter how good the architecture and design are, >> to deliver bug-free software with optimal performance and reliability, >> the implementation technique must be flawless. Victoria Livschitz >> _______________________________________________ >> hotspot-gc-use mailing listhotspot-gc-use at openjdk.java.nethttp://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> >> _______________________________________________ >> hotspot-gc-use mailing listhotspot-gc-use at openjdk.java.nethttp://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> From ysr1729 at gmail.com Wed Apr 18 13:07:19 2012 From: ysr1729 at gmail.com (Srinivas Ramakrishna) Date: Wed, 18 Apr 2012 13:07:19 -0700 Subject: does UseParallelOldGC guarantee a better full gc performance In-Reply-To: <4F8EFB9F.5030404@oracle.com> References: <4F8EE993.8030502@oracle.com> <4F8EFB9F.5030404@oracle.com> Message-ID: On Wed, Apr 18, 2012 at 10:36 AM, Jon Masamitsu wrote: > Leon, > > I don't think I've actually seen logs with the same flags except changing > parallel old for serial old so hard for me to say. Simon's comment > > > Well, maybe. But it shows that the parallel collector does its work, > > since you had a 41.91/13.06 = 3.2x gain on your 4 cores. > I think Simon's "speed up" is a bit misleading. He shows that the wall-time of 13.06 s does user time eqvt work worth 41.91 seconds, so indeed a lot of user-level work is done in those 13.06 seconds. I'd call that "intrinsic parallelism" rather than speed-up. However, that's a misleading way to define speed-up because (for all that the user cares about) all of that parallel work may be overhead of the parallel algorithm so that the bottom-line speed-up disappears. Rather, Simon and Leon, you want to compare the wall-clock pause-time seen with parallel old with that seen with serial old (which i believe is what Leon may have been referring to) which is how speed-up should be defined when comparing a parallel algorithm with a serial couterpart. Leon, in the past we observed (and you will likely find some discussion in the archives) that a particular phase called the "deferred updates" phase was taking a bulk of the time when we encountered longer pauses with parallel old. That's phase when work is done single-threaded and would exhibit lower parallelism. Typically, but not always, this would happen during the full gc pauses during which maximal compaction was forced. 
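The maximal-compaction behavior described here is controlled by a pair of HotSpot flags, HeapFirstMaximumCompactionCount and HeapMaximumCompactionInterval (defaults of roughly 3 and 20, matching the "first and every 20 subsequent" schedule described just below), and per-phase timing of the parallel compactor can be requested with PrintParallelOldGCPhaseTimes. Treat the exact flag names and defaults as assumptions to confirm with -XX:+PrintFlagsFinal on your own build; a sketch only, with intentionally huge values to push maximal compactions out of the picture while watching the phase breakdown:

    -XX:+UseParallelOldGC -XX:HeapFirstMaximumCompactionCount=1000000
    -XX:HeapMaximumCompactionInterval=1000000 -XX:+PrintParallelOldGCPhaseTimes -XX:+PrintGCDetails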
(This is done by default during the first and every 20 subsequent full collections -- or so.) We worked around that by turning off maximal compaction and letting the dense prefix alone. I believe a bug may have been filed following that discussion and it had been my intention to try and fix it (per discussion on the list). Unfortunately, other matters intervened and I was unable to get back to that work. PrintParallelGC{Task,Phase}Times (i think) will give you more visibility into the various phases etc. and might help you diagnose the performance issue. -- ramki > says there is a parallel speed up, however, so I'll let you investigate > you application > and leave it at that. > > Jon > > > On 4/18/2012 9:27 AM, the.6th.month at gmail.com wrote: > > Hi, Jon, > > yup,,,I know, but what is weird is the paroldgen doesn't bring about > better > > full gc performance as seen from JMX metrics but bring unexpected swap > > consumption. > > I am gonna look into my application instead for some inspiration. > > > > Leon > > > > On 19 April 2012 00:19, Jon Masamitsu wrote: > > > >> ** > >> Leon, > >> > >> In this log you see as part of an entry "PSOldGen:" which says you're > >> using the serial mark sweep. I see in your later posts that > "ParOldGen:" > >> appears in your log and that is the parallel mark sweep collector. > >> > >> Jon > >> > >> > >> On 4/18/2012 1:58 AM, the.6th.month at gmail.com wrote: > >> > >> Hi, Simon: > >> > >> this is the full gc log for your concern. > >> 2012-04-18T16:47:24.824+0800: 988.392: [GC > >> Desired survivor size 14876672 bytes, new threshold 1 (max 15) > >> [PSYoungGen: 236288K->8126K(247616K)] 4054802K->3830711K(4081472K), > >> 0.0512250 secs] [Times: user=0.15 sys=0.00, real=0.05 secs] > >> > >> 2012-04-18T16:47:24.875+0800: 988.443: [Full GC [PSYoungGen: > >> 8126K->0K(247616K)] [PSOldGen: 3822585K->1751429K(3833856K)] > >> 3830711K->1751429K(4081472K) [PSPermGen: 81721K->81721K(262144K)], > >> 6.6108630 secs] [Times: user=6.62 sys=0.00, real=6.61 secs] > >> > >> the full gc time is almost unchanged since I enabled paralleloldgc. > >> > >> Do you have any recommendation for an appropriate young gen size? > >> > >> Thanks > >> > >> All the best, > >> Leon > >> > >> > >> On 18 April 2012 16:24, Simone Bordet < > sbordet at intalio.com> wrote: > >> > >> > >> Hi, > >> > >> On Wed, Apr 18, 2012 at 10:16, the.6th.month at gmail.com< > the.6th.month at gmail.com> wrote: > >> > >> hi all: > >> I'm currently using jdk 6u26. I just enabled UseParallelOldGC, expecting > >> that would enhance the full gc efficiency and decrease the mark-sweep > >> > >> time > >> > >> by using multiple-core. The JAVA_OPTS is as below: > >> -XX:+PrintGCDetails -XX:+PrintGCDateStamps > -XX:+PrintTenuringDistribution > >> -Xloggc:gc.log-server -Xms4000m -Xmx4000m -Xss256k -Xmn256m > >> -XX:PermSize=256m -XX:+UseParallelOldGC -server > >> -Dorg.apache.jasper.compiler.Parser.STRICT_QUOTE_ESCAPING=false > >> as shown in jinfo output, the settings have taken effect, and the > >> ParallelGCThreads is 4 since the jvm is running on a four-core server. > >> But what's strange is that the mark-sweep time remains almost unchanged > >> > >> (at > >> > >> around 6-8 seconds), do I miss something here? Does anyone have the > same > >> experience or any idea about the reason behind? > >> Thanks very much for help > >> > >> The young generation is fairly small for a 4GiB heap. > >> > >> Can we see the lines you mention from the logs ? 
> >> > >> Simon > >> --http://cometd.orghttp://intalio.comhttp://bordet.blogspot.com > >> ---- > >> Finally, no matter how good the architecture and design are, > >> to deliver bug-free software with optimal performance and reliability, > >> the implementation technique must be flawless. Victoria Livschitz > >> _______________________________________________ > >> hotspot-gc-use mailing listhotspot-gc-use at openjdk.java.nethttp:// > mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > >> > >> > >> _______________________________________________ > >> hotspot-gc-use mailing listhotspot-gc-use at openjdk.java.nethttp:// > mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > >> > >> > >> _______________________________________________ > >> hotspot-gc-use mailing list > >> hotspot-gc-use at openjdk.java.net > >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > >> > >> > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120418/7ed96c7a/attachment.html From Peter.B.Kessler at Oracle.COM Wed Apr 18 14:52:32 2012 From: Peter.B.Kessler at Oracle.COM (Peter B. Kessler) Date: Wed, 18 Apr 2012 14:52:32 -0700 Subject: does UseParallelOldGC guarantee a better full gc performance In-Reply-To: References: <4F8EE993.8030502@oracle.com> Message-ID: <4F8F37A0.2070700@Oracle.COM> "swap consumption"? How much *physical* memory do you have on this box? (It would be nice if the GC logs included the physical memory available.) What else is running on the box? ... peter the.6th.month at gmail.com wrote: > Hi, Jon, > yup,,,I know, but what is weird is the paroldgen doesn't bring about > better full gc performance as seen from JMX metrics but bring unexpected > swap consumption. > I am gonna look into my application instead for some inspiration. > > Leon > > On 19 April 2012 00:19, Jon Masamitsu > wrote: > > __ > Leon, > > In this log you see as part of an entry "PSOldGen:" which says you're > using the serial mark sweep. I see in your later posts that > "ParOldGen:" > appears in your log and that is the parallel mark sweep collector. > > Jon > > > On 4/18/2012 1:58 AM, the.6th.month at gmail.com > wrote: >> Hi, Simon: >> >> this is the full gc log for your concern. >> 2012-04-18T16:47:24.824+0800: 988.392: [GC >> Desired survivor size 14876672 bytes, new threshold 1 (max 15) >> [PSYoungGen: 236288K->8126K(247616K)] 4054802K->3830711K(4081472K), >> 0.0512250 secs] [Times: user=0.15 sys=0.00, real=0.05 secs] >> >> 2012-04-18T16:47:24.875+0800: 988.443: [Full GC [PSYoungGen: >> 8126K->0K(247616K)] [PSOldGen: 3822585K->1751429K(3833856K)] >> 3830711K->1751429K(4081472K) [PSPermGen: 81721K->81721K(262144K)], >> 6.6108630 secs] [Times: user=6.62 sys=0.00, real=6.61 secs] >> >> the full gc time is almost unchanged since I enabled paralleloldgc. >> >> Do you have any recommendation for an appropriate young gen size? >> >> Thanks >> >> All the best, >> Leon >> >> >> On 18 April 2012 16:24, Simone Bordet wrote: >> >>> Hi, >>> >>> On Wed, Apr 18, 2012 at 10:16, the.6th.month at gmail.com >>> wrote: >>>> hi all: >>>> I'm currently using jdk 6u26. I just enabled UseParallelOldGC, expecting >>>> that would enhance the full gc efficiency and decrease the mark-sweep >>> time >>>> by using multiple-core. 
The JAVA_OPTS is as below: >>>> -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintTenuringDistribution >>>> -Xloggc:gc.log-server -Xms4000m -Xmx4000m -Xss256k -Xmn256m >>>> -XX:PermSize=256m -XX:+UseParallelOldGC -server >>>> -Dorg.apache.jasper.compiler.Parser.STRICT_QUOTE_ESCAPING=false >>>> as shown in jinfo output, the settings have taken effect, and the >>>> ParallelGCThreads is 4 since the jvm is running on a four-core server. >>>> But what's strange is that the mark-sweep time remains almost unchanged >>> (at >>>> around 6-8 seconds), do I miss something here? Does anyone have the same >>>> experience or any idea about the reason behind? >>>> Thanks very much for help >>> The young generation is fairly small for a 4GiB heap. >>> >>> Can we see the lines you mention from the logs ? >>> >>> Simon >>> -- >>> http://cometd.org >>> http://intalio.com >>> http://bordet.blogspot.com >>> ---- >>> Finally, no matter how good the architecture and design are, >>> to deliver bug-free software with optimal performance and reliability, >>> the implementation technique must be flawless. Victoria Livschitz >>> _______________________________________________ >>> hotspot-gc-use mailing list >>> hotspot-gc-use at openjdk.java.net >>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>> >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > > > ------------------------------------------------------------------------ > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use From the.6th.month at gmail.com Thu Apr 19 01:51:54 2012 From: the.6th.month at gmail.com (the.6th.month at gmail.com) Date: Thu, 19 Apr 2012 16:51:54 +0800 Subject: does UseParallelOldGC guarantee a better full gc performance In-Reply-To: References: <4F8EE993.8030502@oracle.com> <4F8EFB9F.5030404@oracle.com> Message-ID: hi, Srinivas: that explains, i do observe that no performance gain has been obtained thru par old gc via the jmx mark_sweep_time (i have a monitoring system collecting that and print out with rrdtool). hopefully that's the result of maximum compaction, but i am keen to ask whether it will bring about any negative impact on performance, like leaving lots of fragmentations unreclaimed. all th best Leon On Apr 19, 2012 4:07 AM, "Srinivas Ramakrishna" wrote: > > > On Wed, Apr 18, 2012 at 10:36 AM, Jon Masamitsu wrote: > >> Leon, >> >> I don't think I've actually seen logs with the same flags except changing >> parallel old for serial old so hard for me to say. Simon's comment >> >> > Well, maybe. But it shows that the parallel collector does its work, >> > since you had a 41.91/13.06 = 3.2x gain on your 4 cores. >> > > I think Simon's "speed up" is a bit misleading. He shows that the > wall-time of 13.06 s > does user time eqvt work worth 41.91 seconds, so indeed a lot of > user-level work is > done in those 13.06 seconds. I'd call that "intrinsic parallelism" rather > than speed-up. 
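The user-versus-real distinction drawn above can be checked mechanically across a whole log rather than for one entry. A small sketch that pulls the [Times: ...] tail out of every Full GC record, assuming each record lands on a single line in the format quoted in this thread (the class name is illustrative and the log file is passed as the first argument, e.g. gc.log):

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class FullGcParallelism {
        // matches e.g. "[Times: user=41.91 sys=0.19, real=13.06 secs]"
        private static final Pattern TIMES =
                Pattern.compile("\\[Times: user=([0-9.]+) sys=([0-9.]+), real=([0-9.]+) secs\\]");

        public static void main(String[] args) throws Exception {
            BufferedReader in = new BufferedReader(new FileReader(args[0]));
            String line;
            while ((line = in.readLine()) != null) {
                if (!line.contains("Full GC")) {
                    continue; // only full collections are of interest here
                }
                Matcher m = TIMES.matcher(line);
                if (m.find()) {
                    double user = Double.parseDouble(m.group(1));
                    double real = Double.parseDouble(m.group(3));
                    // user/real is the "intrinsic parallelism" of the pause,
                    // not the speed-up relative to the serial old collector
                    System.out.printf("real=%6.2fs user=%6.2fs user/real=%.1f%n", real, user, user / real);
                }
            }
            in.close();
        }
    }

The speed-up being discussed is a different comparison: the same workload run once with -XX:+UseParallelOldGC and once with -XX:-UseParallelOldGC, comparing the real (wall-clock) pause times.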
> However, that's a misleading way to define speed-up because > (for all that the user cares about) all of that parallel work may be > overhead of the parallel algorithm > so that the bottom-line speed-up disappears. Rather, Simon and Leon, you > want to compare > the wall-clock pause-time seen with parallel old with that seen with > serial old (which i believe > is what Leon may have been referring to) which is how speed-up should be > defined when > comparing a parallel algorithm with a serial couterpart. > > Leon, in the past we observed (and you will likely find some discussion in > the archives) that > a particular phase called the "deferred updates" phase was taking a bulk > of the time > when we encountered longer pauses with parallel old. That's phase when > work is done > single-threaded and would exhibit lower parallelism. Typically, but not > always, this > would happen during the full gc pauses during which maximal compaction was > forced. > (This is done by default during the first and every 20 subsequent full > collections -- or so.) > We worked around that by turning off maximal compaction and letting the > dense prefix > alone. > > I believe a bug may have been filed following that discussion and it had > been my intention to > try and fix it (per discussion on the list). Unfortunately, other matters > intervened and I was > unable to get back to that work. > > PrintParallelGC{Task,Phase}Times (i think) will give you more visibility > into the various phases etc. and > might help you diagnose the performance issue. > > -- ramki > > >> says there is a parallel speed up, however, so I'll let you investigate >> you application >> and leave it at that. >> >> Jon >> >> >> On 4/18/2012 9:27 AM, the.6th.month at gmail.com wrote: >> > Hi, Jon, >> > yup,,,I know, but what is weird is the paroldgen doesn't bring about >> better >> > full gc performance as seen from JMX metrics but bring unexpected swap >> > consumption. >> > I am gonna look into my application instead for some inspiration. >> > >> > Leon >> > >> > On 19 April 2012 00:19, Jon Masamitsu wrote: >> > >> >> ** >> >> Leon, >> >> >> >> In this log you see as part of an entry "PSOldGen:" which says you're >> >> using the serial mark sweep. I see in your later posts that >> "ParOldGen:" >> >> appears in your log and that is the parallel mark sweep collector. >> >> >> >> Jon >> >> >> >> >> >> On 4/18/2012 1:58 AM, the.6th.month at gmail.com wrote: >> >> >> >> Hi, Simon: >> >> >> >> this is the full gc log for your concern. >> >> 2012-04-18T16:47:24.824+0800: 988.392: [GC >> >> Desired survivor size 14876672 bytes, new threshold 1 (max 15) >> >> [PSYoungGen: 236288K->8126K(247616K)] 4054802K->3830711K(4081472K), >> >> 0.0512250 secs] [Times: user=0.15 sys=0.00, real=0.05 secs] >> >> >> >> 2012-04-18T16:47:24.875+0800: 988.443: [Full GC [PSYoungGen: >> >> 8126K->0K(247616K)] [PSOldGen: 3822585K->1751429K(3833856K)] >> >> 3830711K->1751429K(4081472K) [PSPermGen: 81721K->81721K(262144K)], >> >> 6.6108630 secs] [Times: user=6.62 sys=0.00, real=6.61 secs] >> >> >> >> the full gc time is almost unchanged since I enabled paralleloldgc. >> >> >> >> Do you have any recommendation for an appropriate young gen size? 
>> >> >> >> Thanks >> >> >> >> All the best, >> >> Leon >> >> >> >> >> >> On 18 April 2012 16:24, Simone Bordet < >> sbordet at intalio.com> wrote: >> >> >> >> >> >> Hi, >> >> >> >> On Wed, Apr 18, 2012 at 10:16, the.6th.month at gmail.com< >> the.6th.month at gmail.com> wrote: >> >> >> >> hi all: >> >> I'm currently using jdk 6u26. I just enabled UseParallelOldGC, >> expecting >> >> that would enhance the full gc efficiency and decrease the mark-sweep >> >> >> >> time >> >> >> >> by using multiple-core. The JAVA_OPTS is as below: >> >> -XX:+PrintGCDetails -XX:+PrintGCDateStamps >> -XX:+PrintTenuringDistribution >> >> -Xloggc:gc.log-server -Xms4000m -Xmx4000m -Xss256k -Xmn256m >> >> -XX:PermSize=256m -XX:+UseParallelOldGC -server >> >> -Dorg.apache.jasper.compiler.Parser.STRICT_QUOTE_ESCAPING=false >> >> as shown in jinfo output, the settings have taken effect, and the >> >> ParallelGCThreads is 4 since the jvm is running on a four-core server. >> >> But what's strange is that the mark-sweep time remains almost unchanged >> >> >> >> (at >> >> >> >> around 6-8 seconds), do I miss something here? Does anyone have the >> same >> >> experience or any idea about the reason behind? >> >> Thanks very much for help >> >> >> >> The young generation is fairly small for a 4GiB heap. >> >> >> >> Can we see the lines you mention from the logs ? >> >> >> >> Simon >> >> --http://cometd.orghttp://intalio.comhttp://bordet.blogspot.com >> >> ---- >> >> Finally, no matter how good the architecture and design are, >> >> to deliver bug-free software with optimal performance and reliability, >> >> the implementation technique must be flawless. Victoria Livschitz >> >> _______________________________________________ >> >> hotspot-gc-use mailing listhotspot-gc-use at openjdk.java.nethttp:// >> mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> >> >> >> >> _______________________________________________ >> >> hotspot-gc-use mailing listhotspot-gc-use at openjdk.java.nethttp:// >> mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> >> >> >> >> _______________________________________________ >> >> hotspot-gc-use mailing list >> >> hotspot-gc-use at openjdk.java.net >> >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> >> >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120419/f51227da/attachment-0001.html From ysr1729 at gmail.com Fri Apr 20 02:44:26 2012 From: ysr1729 at gmail.com (Srinivas Ramakrishna) Date: Fri, 20 Apr 2012 02:44:26 -0700 Subject: does UseParallelOldGC guarantee a better full gc performance In-Reply-To: References: <4F8EE993.8030502@oracle.com> <4F8EFB9F.5030404@oracle.com> Message-ID: BTW, max compaction doesn't happen every time, i think it happens in the 4th gc and then every 20th gc or so. It;s those occasional gc's that would be impacted. (And that had been our experience with generally good performance but the occasional much slower pause. Don't know if your experience is similar.) No I don't think excessive deadwood is an issue. 
What is an issue is how well this keeps up, since in general the incidence of the deferred updates phase may be affected by the number and size of the deferred objects and their oop-richness, so I am not sure how good a mitigant avoiding maximal compaction is for long-lived JVM's with churn of latge objects in the old gen. -- ramki On Thu, Apr 19, 2012 at 1:51 AM, the.6th.month at gmail.com < the.6th.month at gmail.com> wrote: > hi, Srinivas: > that explains, i do observe that no performance gain has been obtained > thru par old gc via the jmx mark_sweep_time (i have a monitoring system > collecting that and print out with rrdtool). hopefully that's the result of > maximum compaction, but i am keen to ask whether it will bring about any > negative impact on performance, like leaving lots of fragmentations > unreclaimed. > > all th best > Leon > On Apr 19, 2012 4:07 AM, "Srinivas Ramakrishna" wrote: > >> >> >> On Wed, Apr 18, 2012 at 10:36 AM, Jon Masamitsu > > wrote: >> >>> Leon, >>> >>> I don't think I've actually seen logs with the same flags except changing >>> parallel old for serial old so hard for me to say. Simon's comment >>> >>> > Well, maybe. But it shows that the parallel collector does its work, >>> > since you had a 41.91/13.06 = 3.2x gain on your 4 cores. >>> >> >> I think Simon's "speed up" is a bit misleading. He shows that the >> wall-time of 13.06 s >> does user time eqvt work worth 41.91 seconds, so indeed a lot of >> user-level work is >> done in those 13.06 seconds. I'd call that "intrinsic parallelism" rather >> than speed-up. >> However, that's a misleading way to define speed-up because >> (for all that the user cares about) all of that parallel work may be >> overhead of the parallel algorithm >> so that the bottom-line speed-up disappears. Rather, Simon and Leon, you >> want to compare >> the wall-clock pause-time seen with parallel old with that seen with >> serial old (which i believe >> is what Leon may have been referring to) which is how speed-up should be >> defined when >> comparing a parallel algorithm with a serial couterpart. >> >> Leon, in the past we observed (and you will likely find some discussion >> in the archives) that >> a particular phase called the "deferred updates" phase was taking a bulk >> of the time >> when we encountered longer pauses with parallel old. That's phase when >> work is done >> single-threaded and would exhibit lower parallelism. Typically, but not >> always, this >> would happen during the full gc pauses during which maximal compaction >> was forced. >> (This is done by default during the first and every 20 subsequent full >> collections -- or so.) >> We worked around that by turning off maximal compaction and letting the >> dense prefix >> alone. >> >> I believe a bug may have been filed following that discussion and it had >> been my intention to >> try and fix it (per discussion on the list). Unfortunately, other matters >> intervened and I was >> unable to get back to that work. >> >> PrintParallelGC{Task,Phase}Times (i think) will give you more visibility >> into the various phases etc. and >> might help you diagnose the performance issue. >> >> -- ramki >> >> >>> says there is a parallel speed up, however, so I'll let you investigate >>> you application >>> and leave it at that. 
>>> >>> Jon >>> >>> >>> On 4/18/2012 9:27 AM, the.6th.month at gmail.com wrote: >>> > Hi, Jon, >>> > yup,,,I know, but what is weird is the paroldgen doesn't bring about >>> better >>> > full gc performance as seen from JMX metrics but bring unexpected swap >>> > consumption. >>> > I am gonna look into my application instead for some inspiration. >>> > >>> > Leon >>> > >>> > On 19 April 2012 00:19, Jon Masamitsu >>> wrote: >>> > >>> >> ** >>> >> Leon, >>> >> >>> >> In this log you see as part of an entry "PSOldGen:" which says you're >>> >> using the serial mark sweep. I see in your later posts that >>> "ParOldGen:" >>> >> appears in your log and that is the parallel mark sweep collector. >>> >> >>> >> Jon >>> >> >>> >> >>> >> On 4/18/2012 1:58 AM, the.6th.month at gmail.com wrote: >>> >> >>> >> Hi, Simon: >>> >> >>> >> this is the full gc log for your concern. >>> >> 2012-04-18T16:47:24.824+0800: 988.392: [GC >>> >> Desired survivor size 14876672 bytes, new threshold 1 (max 15) >>> >> [PSYoungGen: 236288K->8126K(247616K)] 4054802K->3830711K(4081472K), >>> >> 0.0512250 secs] [Times: user=0.15 sys=0.00, real=0.05 secs] >>> >> >>> >> 2012-04-18T16:47:24.875+0800: 988.443: [Full GC [PSYoungGen: >>> >> 8126K->0K(247616K)] [PSOldGen: 3822585K->1751429K(3833856K)] >>> >> 3830711K->1751429K(4081472K) [PSPermGen: 81721K->81721K(262144K)], >>> >> 6.6108630 secs] [Times: user=6.62 sys=0.00, real=6.61 secs] >>> >> >>> >> the full gc time is almost unchanged since I enabled paralleloldgc. >>> >> >>> >> Do you have any recommendation for an appropriate young gen size? >>> >> >>> >> Thanks >>> >> >>> >> All the best, >>> >> Leon >>> >> >>> >> >>> >> On 18 April 2012 16:24, Simone Bordet < >>> sbordet at intalio.com> wrote: >>> >> >>> >> >>> >> Hi, >>> >> >>> >> On Wed, Apr 18, 2012 at 10:16, the.6th.month at gmail.com< >>> the.6th.month at gmail.com> wrote: >>> >> >>> >> hi all: >>> >> I'm currently using jdk 6u26. I just enabled UseParallelOldGC, >>> expecting >>> >> that would enhance the full gc efficiency and decrease the mark-sweep >>> >> >>> >> time >>> >> >>> >> by using multiple-core. The JAVA_OPTS is as below: >>> >> -XX:+PrintGCDetails -XX:+PrintGCDateStamps >>> -XX:+PrintTenuringDistribution >>> >> -Xloggc:gc.log-server -Xms4000m -Xmx4000m -Xss256k -Xmn256m >>> >> -XX:PermSize=256m -XX:+UseParallelOldGC -server >>> >> -Dorg.apache.jasper.compiler.Parser.STRICT_QUOTE_ESCAPING=false >>> >> as shown in jinfo output, the settings have taken effect, and the >>> >> ParallelGCThreads is 4 since the jvm is running on a four-core server. >>> >> But what's strange is that the mark-sweep time remains almost >>> unchanged >>> >> >>> >> (at >>> >> >>> >> around 6-8 seconds), do I miss something here? Does anyone have the >>> same >>> >> experience or any idea about the reason behind? >>> >> Thanks very much for help >>> >> >>> >> The young generation is fairly small for a 4GiB heap. >>> >> >>> >> Can we see the lines you mention from the logs ? >>> >> >>> >> Simon >>> >> --http://cometd.orghttp://intalio.comhttp://bordet.blogspot.com >>> >> ---- >>> >> Finally, no matter how good the architecture and design are, >>> >> to deliver bug-free software with optimal performance and reliability, >>> >> the implementation technique must be flawless. 
Victoria Livschitz >>> >> _______________________________________________ >>> >> hotspot-gc-use mailing listhotspot-gc-use at openjdk.java.nethttp:// >>> mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>> >> >>> >> >>> >> _______________________________________________ >>> >> hotspot-gc-use mailing listhotspot-gc-use at openjdk.java.nethttp:// >>> mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>> >> >>> >> >>> >> _______________________________________________ >>> >> hotspot-gc-use mailing list >>> >> hotspot-gc-use at openjdk.java.net >>> >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>> >> >>> >> >>> _______________________________________________ >>> hotspot-gc-use mailing list >>> hotspot-gc-use at openjdk.java.net >>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120420/9cb8195e/attachment.html From the.6th.month at gmail.com Fri Apr 20 08:01:39 2012 From: the.6th.month at gmail.com (the.6th.month at gmail.com) Date: Fri, 20 Apr 2012 23:01:39 +0800 Subject: does UseParallelOldGC guarantee a better full gc performance In-Reply-To: References: <4F8EE993.8030502@oracle.com> <4F8EFB9F.5030404@oracle.com> Message-ID: Hi, Srinivas: Can you explain more about "since in general the incidence of the deferred updates phase may be affected by the number and size of the deferred objects and their oop-richness". I don't quite understand what it means and if it doesn't bother you too much, can you possible give some explanations about what a deferred object means. Thanks a million. All the best, Leon On 20 April 2012 17:44, Srinivas Ramakrishna wrote: > BTW, max compaction doesn't happen every time, i think it happens in the > 4th gc and then every 20th gc or so. > It;s those occasional gc's that would be impacted. (And that had been our > experience with generally good performance > but the occasional much slower pause. Don't know if your experience is > similar.) > > No I don't think excessive deadwood is an issue. What is an issue is how > well this keeps up, > since in general the incidence of the deferred updates phase may be > affected by the number and > size of the deferred objects and their oop-richness, so I am not sure how > good a mitigant > avoiding maximal compaction is for long-lived JVM's with churn of latge > objects in the old > gen. > > -- ramki > > > On Thu, Apr 19, 2012 at 1:51 AM, the.6th.month at gmail.com < > the.6th.month at gmail.com> wrote: > >> hi, Srinivas: >> that explains, i do observe that no performance gain has been obtained >> thru par old gc via the jmx mark_sweep_time (i have a monitoring system >> collecting that and print out with rrdtool). hopefully that's the result of >> maximum compaction, but i am keen to ask whether it will bring about any >> negative impact on performance, like leaving lots of fragmentations >> unreclaimed. >> >> all th best >> Leon >> On Apr 19, 2012 4:07 AM, "Srinivas Ramakrishna" >> wrote: >> >>> >>> >>> On Wed, Apr 18, 2012 at 10:36 AM, Jon Masamitsu < >>> jon.masamitsu at oracle.com> wrote: >>> >>>> Leon, >>>> >>>> I don't think I've actually seen logs with the same flags except >>>> changing >>>> parallel old for serial old so hard for me to say. Simon's comment >>>> >>>> > Well, maybe. But it shows that the parallel collector does its work, >>>> > since you had a 41.91/13.06 = 3.2x gain on your 4 cores. 
>>>> >>> >>> I think Simon's "speed up" is a bit misleading. He shows that the >>> wall-time of 13.06 s >>> does user time eqvt work worth 41.91 seconds, so indeed a lot of >>> user-level work is >>> done in those 13.06 seconds. I'd call that "intrinsic parallelism" >>> rather than speed-up. >>> However, that's a misleading way to define speed-up because >>> (for all that the user cares about) all of that parallel work may be >>> overhead of the parallel algorithm >>> so that the bottom-line speed-up disappears. Rather, Simon and Leon, you >>> want to compare >>> the wall-clock pause-time seen with parallel old with that seen with >>> serial old (which i believe >>> is what Leon may have been referring to) which is how speed-up should be >>> defined when >>> comparing a parallel algorithm with a serial couterpart. >>> >>> Leon, in the past we observed (and you will likely find some discussion >>> in the archives) that >>> a particular phase called the "deferred updates" phase was taking a bulk >>> of the time >>> when we encountered longer pauses with parallel old. That's phase when >>> work is done >>> single-threaded and would exhibit lower parallelism. Typically, but not >>> always, this >>> would happen during the full gc pauses during which maximal compaction >>> was forced. >>> (This is done by default during the first and every 20 subsequent full >>> collections -- or so.) >>> We worked around that by turning off maximal compaction and letting the >>> dense prefix >>> alone. >>> >>> I believe a bug may have been filed following that discussion and it had >>> been my intention to >>> try and fix it (per discussion on the list). Unfortunately, other >>> matters intervened and I was >>> unable to get back to that work. >>> >>> PrintParallelGC{Task,Phase}Times (i think) will give you more visibility >>> into the various phases etc. and >>> might help you diagnose the performance issue. >>> >>> -- ramki >>> >>> >>>> says there is a parallel speed up, however, so I'll let you investigate >>>> you application >>>> and leave it at that. >>>> >>>> Jon >>>> >>>> >>>> On 4/18/2012 9:27 AM, the.6th.month at gmail.com wrote: >>>> > Hi, Jon, >>>> > yup,,,I know, but what is weird is the paroldgen doesn't bring about >>>> better >>>> > full gc performance as seen from JMX metrics but bring unexpected swap >>>> > consumption. >>>> > I am gonna look into my application instead for some inspiration. >>>> > >>>> > Leon >>>> > >>>> > On 19 April 2012 00:19, Jon Masamitsu >>>> wrote: >>>> > >>>> >> ** >>>> >> Leon, >>>> >> >>>> >> In this log you see as part of an entry "PSOldGen:" which says you're >>>> >> using the serial mark sweep. I see in your later posts that >>>> "ParOldGen:" >>>> >> appears in your log and that is the parallel mark sweep collector. >>>> >> >>>> >> Jon >>>> >> >>>> >> >>>> >> On 4/18/2012 1:58 AM, the.6th.month at gmail.com wrote: >>>> >> >>>> >> Hi, Simon: >>>> >> >>>> >> this is the full gc log for your concern. 
>>>> >> 2012-04-18T16:47:24.824+0800: 988.392: [GC >>>> >> Desired survivor size 14876672 bytes, new threshold 1 (max 15) >>>> >> [PSYoungGen: 236288K->8126K(247616K)] 4054802K->3830711K(4081472K), >>>> >> 0.0512250 secs] [Times: user=0.15 sys=0.00, real=0.05 secs] >>>> >> >>>> >> 2012-04-18T16:47:24.875+0800: 988.443: [Full GC [PSYoungGen: >>>> >> 8126K->0K(247616K)] [PSOldGen: 3822585K->1751429K(3833856K)] >>>> >> 3830711K->1751429K(4081472K) [PSPermGen: 81721K->81721K(262144K)], >>>> >> 6.6108630 secs] [Times: user=6.62 sys=0.00, real=6.61 secs] >>>> >> >>>> >> the full gc time is almost unchanged since I enabled paralleloldgc. >>>> >> >>>> >> Do you have any recommendation for an appropriate young gen size? >>>> >> >>>> >> Thanks >>>> >> >>>> >> All the best, >>>> >> Leon >>>> >> >>>> >> >>>> >> On 18 April 2012 16:24, Simone Bordet < >>>> sbordet at intalio.com> wrote: >>>> >> >>>> >> >>>> >> Hi, >>>> >> >>>> >> On Wed, Apr 18, 2012 at 10:16, the.6th.month at gmail.com< >>>> the.6th.month at gmail.com> wrote: >>>> >> >>>> >> hi all: >>>> >> I'm currently using jdk 6u26. I just enabled UseParallelOldGC, >>>> expecting >>>> >> that would enhance the full gc efficiency and decrease the mark-sweep >>>> >> >>>> >> time >>>> >> >>>> >> by using multiple-core. The JAVA_OPTS is as below: >>>> >> -XX:+PrintGCDetails -XX:+PrintGCDateStamps >>>> -XX:+PrintTenuringDistribution >>>> >> -Xloggc:gc.log-server -Xms4000m -Xmx4000m -Xss256k -Xmn256m >>>> >> -XX:PermSize=256m -XX:+UseParallelOldGC -server >>>> >> -Dorg.apache.jasper.compiler.Parser.STRICT_QUOTE_ESCAPING=false >>>> >> as shown in jinfo output, the settings have taken effect, and the >>>> >> ParallelGCThreads is 4 since the jvm is running on a four-core >>>> server. >>>> >> But what's strange is that the mark-sweep time remains almost >>>> unchanged >>>> >> >>>> >> (at >>>> >> >>>> >> around 6-8 seconds), do I miss something here? Does anyone have >>>> the same >>>> >> experience or any idea about the reason behind? >>>> >> Thanks very much for help >>>> >> >>>> >> The young generation is fairly small for a 4GiB heap. >>>> >> >>>> >> Can we see the lines you mention from the logs ? >>>> >> >>>> >> Simon >>>> >> --http://cometd.orghttp://intalio.comhttp://bordet.blogspot.com >>>> >> ---- >>>> >> Finally, no matter how good the architecture and design are, >>>> >> to deliver bug-free software with optimal performance and >>>> reliability, >>>> >> the implementation technique must be flawless. Victoria Livschitz >>>> >> _______________________________________________ >>>> >> hotspot-gc-use mailing listhotspot-gc-use at openjdk.java.nethttp:// >>>> mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>>> >> >>>> >> >>>> >> _______________________________________________ >>>> >> hotspot-gc-use mailing listhotspot-gc-use at openjdk.java.nethttp:// >>>> mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>>> >> >>>> >> >>>> >> _______________________________________________ >>>> >> hotspot-gc-use mailing list >>>> >> hotspot-gc-use at openjdk.java.net >>>> >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>>> >> >>>> >> >>>> _______________________________________________ >>>> hotspot-gc-use mailing list >>>> hotspot-gc-use at openjdk.java.net >>>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>>> >>> >>> > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120420/6cbe0357/attachment-0001.html From aaisinzon at guidewire.com Fri Apr 20 09:24:08 2012 From: aaisinzon at guidewire.com (Alex Aisinzon) Date: Fri, 20 Apr 2012 16:24:08 +0000 Subject: Code cache References: <43E49E6EC0E84F41B98C68AB6D7820C4170FA028@sm-ex-02-vm.guidewire.com> <4320BF98-561E-43AC-85BC-9E291108AD9B@amd.com> Message-ID: <43E49E6EC0E84F41B98C68AB6D7820C41710145F@sm-ex-02-vm.guidewire.com> Eric I tried -XX:+UseCodeCacheFlushing and associated performance/scalability was markedly poorer than increasing the code cache. I will stick to tuning the code cache. Best Alex A -----Original Message----- From: Alex Aisinzon Sent: Thursday, April 12, 2012 1:31 PM To: 'Eric Caspole' Cc: hotspot-gc-use at openjdk.java.net Subject: RE: Code cache Hi Eric I thank you for the feedback. I will give this tuning a try. I have explored another approach: I have added the option -XX:+PrintCompilation to track code compilation. This option is not very documented. I could infer that, without a larger code cache, about 11000 methods were compiled before hitting the issue. When using a much larger cache (512MB), I saw that about 14000 methods were compiled. My understanding is that the code cache is 48MB for the platform I used (x64). A 14000/11000*48MB aka 61MB cache is likely to avoid the issue. I have started a performance test with a 64MB code cache to see if that indeed avoids the code cache full issue. If so, I would have a method to find the right code cache size. I will report when I have the results. I will also report if -XX:+UseCodeCacheFlushing option provides similar results to the larger code cache. As for your question on why our app is hitting this issue: our applications has become heavier in its use of compiled code so this is likely the consequence of that. Best Alex A -----Original Message----- From: Eric Caspole [mailto:eric.caspole at amd.com] Sent: Thursday, April 12, 2012 12:26 PM To: Alex Aisinzon Cc: hotspot-gc-use at openjdk.java.net Subject: Re: Code cache Hi Alex, You can try -XX:+UseCodeCacheFlushing where the JVM will selectively age out some compiled code and free up code cache space. This is not on by default in JDK 6 as far as I know. What is your application doing such that it frequently hits this problem? Regards, Eric On Apr 12, 2012, at 3:15 PM, Alex Aisinzon wrote: > Any feedback on this? > > > > Best > > > > Alex A > > > > From: Alex Aisinzon > Sent: Monday, April 09, 2012 11:38 AM > To: 'hotspot-gc-use at openjdk.java.net' > Subject: Code cache > > > > I ran performance tests on one of our apps and saw the following > error message in the GC logs: > > Java HotSpot(TM) 64-Bit Server VM warning: CodeCache is full. > Compiler has been disabled. > > Java HotSpot(TM) 64-Bit Server VM warning: Try increasing the code > cache size using -XX:ReservedCodeCacheSize= > > > > I scaled up the code cache to 512MB (- > XX:ReservedCodeCacheSize=512m) and markedly improved performance/ > scalability. > > > > I have a few questions: > > * Is there a logging option that shows how much of the code > cache is really used so that I find the right cache size without > oversizing it? > > * What factors play into the code cache utilization? I > would guess that the amount of code to compile is the dominant > factor. Are there other factors like load: I would guess that some > entries in the cache may get invalidated if not used much and load > could be a factor in this. 
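As an aside on the question above about how much of the code cache is really used: besides log-based options, the occupancy can be read at runtime through the standard java.lang.management API, since HotSpot of this vintage exposes the code cache as a non-heap memory pool named "Code Cache" (the pool name is VM-specific, so treat it as an assumption to verify). A minimal sketch:

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryPoolMXBean;
    import java.lang.management.MemoryUsage;

    public class CodeCachePeek {
        public static void main(String[] args) {
            for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
                // HotSpot publishes a non-heap pool named "Code Cache"; other VMs may not.
                if (pool.getName().contains("Code Cache")) {
                    MemoryUsage u = pool.getUsage();
                    System.out.printf("code cache: used=%dK committed=%dK max=%dK peak used=%dK%n",
                            u.getUsed() / 1024, u.getCommitted() / 1024, u.getMax() / 1024,
                            pool.getPeakUsage().getUsed() / 1024);
                }
            }
        }
    }

Watching the peak used value over a full test run gives a floor for -XX:ReservedCodeCacheSize without having to oversize to 512 MB up front.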
> > > > I was running on Sun JVM 1.6 update 30 64 bit on x86-64. > > > > Best > > > > Alex A > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use From aaisinzon at guidewire.com Fri Apr 20 09:26:10 2012 From: aaisinzon at guidewire.com (Alex Aisinzon) Date: Fri, 20 Apr 2012 16:26:10 +0000 Subject: G1 evolution/maturing Message-ID: <43E49E6EC0E84F41B98C68AB6D7820C417101474@sm-ex-02-vm.guidewire.com> Hi all I still see a lot of discussions around CMS. G1 is supposed to solve some of CMS's issues/limitations, namely fragmentation. I gave G1 a try about a year ago and it seemed not yet ready. Has G1 evolved much in this last year and, if so, which release should I try with? Best Alex Aisinzon -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120420/5a21901b/attachment.html From jon.masamitsu at oracle.com Fri Apr 20 10:05:38 2012 From: jon.masamitsu at oracle.com (Jon Masamitsu) Date: Fri, 20 Apr 2012 10:05:38 -0700 Subject: G1 evolution/maturing In-Reply-To: <43E49E6EC0E84F41B98C68AB6D7820C417101474@sm-ex-02-vm.guidewire.com> References: <43E49E6EC0E84F41B98C68AB6D7820C417101474@sm-ex-02-vm.guidewire.com> Message-ID: <4F919762.8010901@oracle.com> Alex, Over the last year there has been work to make G1 more stable, move more work to the concurrent phases, simplify some code to improve performance and adjust G1 policy for choosing regions to collect. Large heaps have typically been used in measuring performance so there is that bias in the improvements (meaning we probably don't have good numbers on how much performance with smaller heaps have changed). 7u4 is the release to try. Jon On 04/20/12 09:26, Alex Aisinzon wrote: > > Hi all > > I still see a lot of discussions around CMS. G1 is supposed to solve > some of CMS's issues/limitations, namely fragmentation. > > I gave G1 a try about a year ago and it seemed not yet ready. > > Has G1 evolved much in this last year and, if so, which release should > I try with? > > Best > > Alex Aisinzon > > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120420/82dcc3b6/attachment.html From eric.caspole at amd.com Fri Apr 20 10:34:13 2012 From: eric.caspole at amd.com (Eric Caspole) Date: Fri, 20 Apr 2012 13:34:13 -0400 Subject: Code cache In-Reply-To: <43E49E6EC0E84F41B98C68AB6D7820C41710145F@sm-ex-02-vm.guidewire.com> References: <43E49E6EC0E84F41B98C68AB6D7820C4170FA028@sm-ex-02-vm.guidewire.com> <4320BF98-561E-43AC-85BC-9E291108AD9B@amd.com> <43E49E6EC0E84F41B98C68AB6D7820C41710145F@sm-ex-02-vm.guidewire.com> Message-ID: <9B919A33-BA8D-4960-995F-2191747CE157@amd.com> Yes, if your live working set size of compiled methods is bigger or very close to the code cache size then +UseCodeCacheFlushing won't really help, because it will keep trying to recompile the methods and throw them away over and over. On Apr 20, 2012, at 12:24 PM, Alex Aisinzon wrote: > Eric > > I tried -XX:+UseCodeCacheFlushing and associated performance/ > scalability was markedly poorer than increasing the code cache. > I will stick to tuning the code cache. 
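For what it's worth, the sizing estimate described in the quoted messages (scale the platform default by the ratio of compiled-method counts seen with and without a roomy cache) is plain proportion arithmetic. A throwaway sketch using only the numbers quoted in this thread, purely to spell the heuristic out:

    public class CodeCacheEstimate {
        public static void main(String[] args) {
            double defaultCacheMb = 48;          // default ReservedCodeCacheSize quoted for x64 in this thread
            double methodsAtDefault = 11000;     // methods compiled before "CodeCache is full" was hit
            double methodsUnconstrained = 14000; // methods compiled once the 512 MB cache removed the ceiling
            double estimateMb = methodsUnconstrained / methodsAtDefault * defaultCacheMb;
            // ~61 MB, which is why the test described above rounds up to -XX:ReservedCodeCacheSize=64m
            System.out.printf("estimated code cache need: ~%.0f MB%n", estimateMb);
        }
    }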
> > Best > > Alex A > > -----Original Message----- > From: Alex Aisinzon > Sent: Thursday, April 12, 2012 1:31 PM > To: 'Eric Caspole' > Cc: hotspot-gc-use at openjdk.java.net > Subject: RE: Code cache > > Hi Eric > > I thank you for the feedback. I will give this tuning a try. > I have explored another approach: I have added the option -XX: > +PrintCompilation to track code compilation. > This option is not very documented. I could infer that, without a > larger code cache, about 11000 methods were compiled before hitting > the issue. > When using a much larger cache (512MB), I saw that about 14000 > methods were compiled. > My understanding is that the code cache is 48MB for the platform I > used (x64). A 14000/11000*48MB aka 61MB cache is likely to avoid > the issue. I have started a performance test with a 64MB code cache > to see if that indeed avoids the code cache full issue. > > If so, I would have a method to find the right code cache size. > I will report when I have the results. I will also report if -XX: > +UseCodeCacheFlushing option provides similar results to the larger > code cache. > > As for your question on why our app is hitting this issue: our > applications has become heavier in its use of compiled code so this > is likely the consequence of that. > > Best > > Alex A > > -----Original Message----- > From: Eric Caspole [mailto:eric.caspole at amd.com] > Sent: Thursday, April 12, 2012 12:26 PM > To: Alex Aisinzon > Cc: hotspot-gc-use at openjdk.java.net > Subject: Re: Code cache > > Hi Alex, > You can try -XX:+UseCodeCacheFlushing where the JVM will selectively > age out some compiled code and free up code cache space. This is not > on by default in JDK 6 as far as I know. > > What is your application doing such that it frequently hits this > problem? > > Regards, > Eric > > > On Apr 12, 2012, at 3:15 PM, Alex Aisinzon wrote: > >> Any feedback on this? >> >> >> >> Best >> >> >> >> Alex A >> >> >> >> From: Alex Aisinzon >> Sent: Monday, April 09, 2012 11:38 AM >> To: 'hotspot-gc-use at openjdk.java.net' >> Subject: Code cache >> >> >> >> I ran performance tests on one of our apps and saw the following >> error message in the GC logs: >> >> Java HotSpot(TM) 64-Bit Server VM warning: CodeCache is full. >> Compiler has been disabled. >> >> Java HotSpot(TM) 64-Bit Server VM warning: Try increasing the code >> cache size using -XX:ReservedCodeCacheSize= >> >> >> >> I scaled up the code cache to 512MB (- >> XX:ReservedCodeCacheSize=512m) and markedly improved performance/ >> scalability. >> >> >> >> I have a few questions: >> >> * Is there a logging option that shows how much of the code >> cache is really used so that I find the right cache size without >> oversizing it? >> >> * What factors play into the code cache utilization? I >> would guess that the amount of code to compile is the dominant >> factor. Are there other factors like load: I would guess that some >> entries in the cache may get invalidated if not used much and load >> could be a factor in this. >> >> >> >> I was running on Sun JVM 1.6 update 30 64 bit on x86-64. 
>> >> >> >> Best >> >> >> >> Alex A >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > > From taras.tielkes at gmail.com Fri Apr 20 12:46:44 2012 From: taras.tielkes at gmail.com (Taras Tielkes) Date: Fri, 20 Apr 2012 21:46:44 +0200 Subject: Faster card marking: chances for Java 6 backport Message-ID: Hi, Are there plans to port RFE 7068625 to Java 6? Thanks, -tt From jon.masamitsu at oracle.com Fri Apr 20 15:42:21 2012 From: jon.masamitsu at oracle.com (Jon Masamitsu) Date: Fri, 20 Apr 2012 15:42:21 -0700 Subject: Faster card marking: chances for Java 6 backport In-Reply-To: References: Message-ID: <4F91E64D.1070509@oracle.com> Taras, I haven't heard any discussions about a backport. I think it's a issue that the sustaining organization would have to consider (since it's to jdk6). Jon On 4/20/2012 12:46 PM, Taras Tielkes wrote: > Hi, > > Are there plans to port RFE 7068625 to Java 6? > > Thanks, > -tt > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use From ysr1729 at gmail.com Fri Apr 20 16:20:21 2012 From: ysr1729 at gmail.com (Srinivas Ramakrishna) Date: Fri, 20 Apr 2012 16:20:21 -0700 Subject: does UseParallelOldGC guarantee a better full gc performance In-Reply-To: References: <4F8EE993.8030502@oracle.com> <4F8EFB9F.5030404@oracle.com> Message-ID: Hi Leon -- (sorry for overloading standard replicated database terminology here which may have confused you.) Here's the relevant explanation from Peter Kessler:- http://markmail.org/message/fhoffb4ksczxk26q The URL also contains the discussion earlier this year on this list that I had alluded to before. -- ramki On Fri, Apr 20, 2012 at 8:01 AM, the.6th.month at gmail.com < the.6th.month at gmail.com> wrote: > Hi, Srinivas: > Can you explain more about "since in general the incidence of the deferred > updates phase may be affected by the number and size of the deferred > objects and their oop-richness". I don't quite understand what it means and > if it doesn't bother you too much, can you possible give some explanations > about what a deferred object means. > Thanks a million. > > All the best, > Leon > > > On 20 April 2012 17:44, Srinivas Ramakrishna wrote: > >> BTW, max compaction doesn't happen every time, i think it happens in the >> 4th gc and then every 20th gc or so. >> It;s those occasional gc's that would be impacted. (And that had been our >> experience with generally good performance >> but the occasional much slower pause. Don't know if your experience is >> similar.) >> >> No I don't think excessive deadwood is an issue. What is an issue is how >> well this keeps up, >> since in general the incidence of the deferred updates phase may be >> affected by the number and >> size of the deferred objects and their oop-richness, so I am not sure how >> good a mitigant >> avoiding maximal compaction is for long-lived JVM's with churn of latge >> objects in the old >> gen. >> >> -- ramki >> >> >> On Thu, Apr 19, 2012 at 1:51 AM, the.6th.month at gmail.com < >> the.6th.month at gmail.com> wrote: >> >>> hi, Srinivas: >>> that explains, i do observe that no performance gain has been obtained >>> thru par old gc via the jmx mark_sweep_time (i have a monitoring system >>> collecting that and print out with rrdtool). 
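The "jmx mark_sweep_time" metric mentioned here comes from the standard GarbageCollectorMXBeans, which is also the easiest thing to feed into rrdtool/cacti style graphs. A minimal polling sketch; note that the bean names depend on the collectors in use ("PS Scavenge"/"PS MarkSweep" with the throughput collector, "ParNew"/"ConcurrentMarkSweep" with CMS), so treat the names as assumptions to check against your own JVM:

    import java.lang.management.GarbageCollectorMXBean;
    import java.lang.management.ManagementFactory;

    public class GcTimePoller {
        public static void main(String[] args) throws InterruptedException {
            while (true) {
                for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                    // getCollectionTime() is the cumulative wall-clock time (ms) spent in this collector
                    System.out.printf("%s: count=%d time=%dms%n",
                            gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
                }
                Thread.sleep(60000); // sample once a minute; graph the deltas between samples
            }
        }
    }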
hopefully that's the result of >>> maximum compaction, but i am keen to ask whether it will bring about any >>> negative impact on performance, like leaving lots of fragmentations >>> unreclaimed. >>> >>> all th best >>> Leon >>> On Apr 19, 2012 4:07 AM, "Srinivas Ramakrishna" >>> wrote: >>> >>>> >>>> >>>> On Wed, Apr 18, 2012 at 10:36 AM, Jon Masamitsu < >>>> jon.masamitsu at oracle.com> wrote: >>>> >>>>> Leon, >>>>> >>>>> I don't think I've actually seen logs with the same flags except >>>>> changing >>>>> parallel old for serial old so hard for me to say. Simon's comment >>>>> >>>>> > Well, maybe. But it shows that the parallel collector does its work, >>>>> > since you had a 41.91/13.06 = 3.2x gain on your 4 cores. >>>>> >>>> >>>> I think Simon's "speed up" is a bit misleading. He shows that the >>>> wall-time of 13.06 s >>>> does user time eqvt work worth 41.91 seconds, so indeed a lot of >>>> user-level work is >>>> done in those 13.06 seconds. I'd call that "intrinsic parallelism" >>>> rather than speed-up. >>>> However, that's a misleading way to define speed-up because >>>> (for all that the user cares about) all of that parallel work may be >>>> overhead of the parallel algorithm >>>> so that the bottom-line speed-up disappears. Rather, Simon and Leon, >>>> you want to compare >>>> the wall-clock pause-time seen with parallel old with that seen with >>>> serial old (which i believe >>>> is what Leon may have been referring to) which is how speed-up should >>>> be defined when >>>> comparing a parallel algorithm with a serial couterpart. >>>> >>>> Leon, in the past we observed (and you will likely find some discussion >>>> in the archives) that >>>> a particular phase called the "deferred updates" phase was taking a >>>> bulk of the time >>>> when we encountered longer pauses with parallel old. That's phase when >>>> work is done >>>> single-threaded and would exhibit lower parallelism. Typically, but not >>>> always, this >>>> would happen during the full gc pauses during which maximal compaction >>>> was forced. >>>> (This is done by default during the first and every 20 subsequent full >>>> collections -- or so.) >>>> We worked around that by turning off maximal compaction and letting the >>>> dense prefix >>>> alone. >>>> >>>> I believe a bug may have been filed following that discussion and it >>>> had been my intention to >>>> try and fix it (per discussion on the list). Unfortunately, other >>>> matters intervened and I was >>>> unable to get back to that work. >>>> >>>> PrintParallelGC{Task,Phase}Times (i think) will give you more >>>> visibility into the various phases etc. and >>>> might help you diagnose the performance issue. >>>> >>>> -- ramki >>>> >>>> >>>>> says there is a parallel speed up, however, so I'll let you investigate >>>>> you application >>>>> and leave it at that. >>>>> >>>>> Jon >>>>> >>>>> >>>>> On 4/18/2012 9:27 AM, the.6th.month at gmail.com wrote: >>>>> > Hi, Jon, >>>>> > yup,,,I know, but what is weird is the paroldgen doesn't bring about >>>>> better >>>>> > full gc performance as seen from JMX metrics but bring unexpected >>>>> swap >>>>> > consumption. >>>>> > I am gonna look into my application instead for some inspiration. >>>>> > >>>>> > Leon >>>>> > >>>>> > On 19 April 2012 00:19, Jon Masamitsu >>>>> wrote: >>>>> > >>>>> >> ** >>>>> >> Leon, >>>>> >> >>>>> >> In this log you see as part of an entry "PSOldGen:" which says >>>>> you're >>>>> >> using the serial mark sweep. 
I see in your later posts that >>>>> "ParOldGen:" >>>>> >> appears in your log and that is the parallel mark sweep collector. >>>>> >> >>>>> >> Jon >>>>> >> >>>>> >> >>>>> >> On 4/18/2012 1:58 AM, the.6th.month at gmail.com wrote: >>>>> >> >>>>> >> Hi, Simon: >>>>> >> >>>>> >> this is the full gc log for your concern. >>>>> >> 2012-04-18T16:47:24.824+0800: 988.392: [GC >>>>> >> Desired survivor size 14876672 bytes, new threshold 1 (max 15) >>>>> >> [PSYoungGen: 236288K->8126K(247616K)] >>>>> 4054802K->3830711K(4081472K), >>>>> >> 0.0512250 secs] [Times: user=0.15 sys=0.00, real=0.05 secs] >>>>> >> >>>>> >> 2012-04-18T16:47:24.875+0800: 988.443: [Full GC [PSYoungGen: >>>>> >> 8126K->0K(247616K)] [PSOldGen: 3822585K->1751429K(3833856K)] >>>>> >> 3830711K->1751429K(4081472K) [PSPermGen: 81721K->81721K(262144K)], >>>>> >> 6.6108630 secs] [Times: user=6.62 sys=0.00, real=6.61 secs] >>>>> >> >>>>> >> the full gc time is almost unchanged since I enabled paralleloldgc. >>>>> >> >>>>> >> Do you have any recommendation for an appropriate young gen size? >>>>> >> >>>>> >> Thanks >>>>> >> >>>>> >> All the best, >>>>> >> Leon >>>>> >> >>>>> >> >>>>> >> On 18 April 2012 16:24, Simone Bordet < >>>>> sbordet at intalio.com> wrote: >>>>> >> >>>>> >> >>>>> >> Hi, >>>>> >> >>>>> >> On Wed, Apr 18, 2012 at 10:16, the.6th.month at gmail.com< >>>>> the.6th.month at gmail.com> wrote: >>>>> >> >>>>> >> hi all: >>>>> >> I'm currently using jdk 6u26. I just enabled UseParallelOldGC, >>>>> expecting >>>>> >> that would enhance the full gc efficiency and decrease the >>>>> mark-sweep >>>>> >> >>>>> >> time >>>>> >> >>>>> >> by using multiple-core. The JAVA_OPTS is as below: >>>>> >> -XX:+PrintGCDetails -XX:+PrintGCDateStamps >>>>> -XX:+PrintTenuringDistribution >>>>> >> -Xloggc:gc.log-server -Xms4000m -Xmx4000m -Xss256k -Xmn256m >>>>> >> -XX:PermSize=256m -XX:+UseParallelOldGC -server >>>>> >> -Dorg.apache.jasper.compiler.Parser.STRICT_QUOTE_ESCAPING=false >>>>> >> as shown in jinfo output, the settings have taken effect, and the >>>>> >> ParallelGCThreads is 4 since the jvm is running on a four-core >>>>> server. >>>>> >> But what's strange is that the mark-sweep time remains almost >>>>> unchanged >>>>> >> >>>>> >> (at >>>>> >> >>>>> >> around 6-8 seconds), do I miss something here? Does anyone have >>>>> the same >>>>> >> experience or any idea about the reason behind? >>>>> >> Thanks very much for help >>>>> >> >>>>> >> The young generation is fairly small for a 4GiB heap. >>>>> >> >>>>> >> Can we see the lines you mention from the logs ? >>>>> >> >>>>> >> Simon >>>>> >> --http://cometd.orghttp://intalio.comhttp://bordet.blogspot.com >>>>> >> ---- >>>>> >> Finally, no matter how good the architecture and design are, >>>>> >> to deliver bug-free software with optimal performance and >>>>> reliability, >>>>> >> the implementation technique must be flawless. 
Victoria Livschitz >>>>> >> _______________________________________________ >>>>> >> hotspot-gc-use mailing listhotspot-gc-use at openjdk.java.nethttp:// >>>>> mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>>>> >> >>>>> >> >>>>> >> _______________________________________________ >>>>> >> hotspot-gc-use mailing listhotspot-gc-use at openjdk.java.nethttp:// >>>>> mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>>>> >> >>>>> >> >>>>> >> _______________________________________________ >>>>> >> hotspot-gc-use mailing list >>>>> >> hotspot-gc-use at openjdk.java.net >>>>> >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>>>> >> >>>>> >> >>>>> _______________________________________________ >>>>> hotspot-gc-use mailing list >>>>> hotspot-gc-use at openjdk.java.net >>>>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>>>> >>>> >>>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120420/d1d387f6/attachment.html From taras.tielkes at gmail.com Sun Apr 22 13:24:31 2012 From: taras.tielkes at gmail.com (Taras Tielkes) Date: Sun, 22 Apr 2012 22:24:31 +0200 Subject: Two basic questions on -verbosegc and -XX:+PrintTenuringDistribution Message-ID: Hi, We're using a time-series database to store and aggregate monitoring data from our systems, including GC behavior. I'm thinking of adding two metrics: * total allocation (in K per minute) * total promotion (in K per minute) The gc logs are the source for this data, and I'd like to verify that my understanding of the numbers is correct. Here's an example verbosegc line of output (we're running ParNew+CMS): [GC 2136581.585: [ParNew:345951K->40960K(368640K), 0.0676780 secs] 3608692K->3323692K(5201920K), 0.0680220 secs] a) The delta between the ParNew "before" and "after" is: 345951K-40960K=304991K My understanding is that the 304991K is the total of (collected in young gen + promoted to tenured gen) Since this number of composed of two things, it's not directly useful by itself. b) The delta between the overall heap "before" and "after" is: 3608692K-3323692K=285000K I assume that this is effectively the volume that was collected in this ParNew cycle. Would it be correct to calculate the total allocation rate of the running application (in a given period) from summing the total heap deltas (in a given timespan)? I do realize that it's a "collected kilobytes" metric, but I think it's close enough to be used as a "delayed" allocation number, especially when looking at a timescale of 10 minutes or more. It has the additional convenience of requiring to parse the current gc.log line only, and not needing to correlate with the preceding ParNew event. c) I take it that the difference between the two deltas (ParNew delta and total heap delta) is effectively the promotion volume? In the example above, this would give a promotion volume of (345951K-40960K)-(3608692K-3323692K)=19991K d) When looking at -XX:+PrintTenuringDistribution output, I assume the distribution reflects the situation *after* the enclosing ParNew event in the log. 
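Working through a) to c) with the numbers from the sample line above, as a small self-check (the only assumption is the one stated in the message itself, namely that "collected" is close enough to "allocated" over a window of several minutes):

    public class ParNewDeltas {
        public static void main(String[] args) {
            // [GC ... [ParNew: 345951K->40960K(368640K), ...] 3608692K->3323692K(5201920K), ...]
            long youngBefore = 345951, youngAfter = 40960;   // ParNew occupancy before/after, in K
            long heapBefore = 3608692, heapAfter = 3323692;  // whole-heap occupancy before/after, in K

            long youngDelta = youngBefore - youngAfter;      // collected in young + promoted = 304991K
            long heapDelta = heapBefore - heapAfter;         // reclaimed from the heap as a whole = 285000K
            long promoted = youngDelta - heapDelta;          // left the young gen but stayed live = 19991K

            System.out.printf("young delta=%dK, heap delta=%dK, promoted=%dK%n",
                    youngDelta, heapDelta, promoted);
        }
    }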
Thanks in advance for any corrections, -tt From rainer.jung at kippdata.de Sun Apr 22 14:00:49 2012 From: rainer.jung at kippdata.de (Rainer Jung) Date: Sun, 22 Apr 2012 23:00:49 +0200 Subject: Two basic questions on -verbosegc and -XX:+PrintTenuringDistribution In-Reply-To: References: Message-ID: <4F947181.50003@kippdata.de> On 22.04.2012 22:24, Taras Tielkes wrote: > Hi, > > We're using a time-series database to store and aggregate monitoring > data from our systems, including GC behavior. > > I'm thinking of adding two metrics: > * total allocation (in K per minute) > * total promotion (in K per minute) > > The gc logs are the source for this data, and I'd like to verify that > my understanding of the numbers is correct. > > Here's an example verbosegc line of output (we're running ParNew+CMS): > [GC 2136581.585: [ParNew:345951K->40960K(368640K), 0.0676780 secs] > 3608692K->3323692K(5201920K), 0.0680220 secs] > > a) The delta between the ParNew "before" and "after" is: 345951K-40960K=304991K > My understanding is that the 304991K is the total of (collected in > young gen + promoted to tenured gen) > Since this number of composed of two things, it's not directly useful by itself. > > b) The delta between the overall heap "before" and "after" is: > 3608692K-3323692K=285000K > I assume that this is effectively the volume that was collected in > this ParNew cycle. > Would it be correct to calculate the total allocation rate of the > running application (in a given period) from summing the total heap > deltas (in a given timespan)? > > I do realize that it's a "collected kilobytes" metric, but I think > it's close enough to be used as a "delayed" allocation number, > especially when looking at a timescale of 10 minutes or more. > It has the additional convenience of requiring to parse the current > gc.log line only, and not needing to correlate with the preceding > ParNew event. > > c) I take it that the difference between the two deltas (ParNew delta > and total heap delta) is effectively the promotion volume? > In the example above, this would give a promotion volume of > (345951K-40960K)-(3608692K-3323692K)=19991K > > d) When looking at -XX:+PrintTenuringDistribution output, I assume the > distribution reflects the situation *after* the enclosing ParNew event > in the log. Have a look at -XX:+PrintHeapAtGC. This will help you get more precise numbers. Regards, Rainer From rednaxelafx at gmail.com Sun Apr 22 19:40:41 2012 From: rednaxelafx at gmail.com (Krystal Mok) Date: Mon, 23 Apr 2012 10:40:41 +0800 Subject: Two basic questions on -verbosegc and -XX:+PrintTenuringDistribution In-Reply-To: References: Message-ID: Hi Taras, d) When looking at -XX:+PrintTenuringDistribution output, I assume the > distribution reflects the situation *after* the enclosing ParNew event > in the log. That's right. The stats are actually printed after the collection has completed. FYI, to get accurate promotion size info, you don't always have to parse the GC log. There's a PerfData counter that keeps track of the promoted size (in bytes) in a minor GC. You could use jstat to fetch the value of that counter, like this: $ jstat -J-Djstat.showUnsupported=true -snap `pgrep java` | grep sun.gc.policy.promoted= sun.gc.policy.promoted=680475760 There are a couple of other counters that can be played in conjuntion, e.g. 
sun.gc.collector.0.invocations, which shows the number of minor GCs: $ jstat -J-Djstat.showUnsupported=true -snap `pgrep java` | grep sun.gc.collector.0.invocations= sun.gc.collector.0.invocations=23 - Kris On Mon, Apr 23, 2012 at 4:24 AM, Taras Tielkes wrote: > Hi, > > We're using a time-series database to store and aggregate monitoring > data from our systems, including GC behavior. > > I'm thinking of adding two metrics: > * total allocation (in K per minute) > * total promotion (in K per minute) > > The gc logs are the source for this data, and I'd like to verify that > my understanding of the numbers is correct. > > Here's an example verbosegc line of output (we're running ParNew+CMS): > [GC 2136581.585: [ParNew:345951K->40960K(368640K), 0.0676780 secs] > 3608692K->3323692K(5201920K), 0.0680220 secs] > > a) The delta between the ParNew "before" and "after" is: > 345951K-40960K=304991K > My understanding is that the 304991K is the total of (collected in > young gen + promoted to tenured gen) > Since this number of composed of two things, it's not directly useful by > itself. > > b) The delta between the overall heap "before" and "after" is: > 3608692K-3323692K=285000K > I assume that this is effectively the volume that was collected in > this ParNew cycle. > Would it be correct to calculate the total allocation rate of the > running application (in a given period) from summing the total heap > deltas (in a given timespan)? > > I do realize that it's a "collected kilobytes" metric, but I think > it's close enough to be used as a "delayed" allocation number, > especially when looking at a timescale of 10 minutes or more. > It has the additional convenience of requiring to parse the current > gc.log line only, and not needing to correlate with the preceding > ParNew event. > > c) I take it that the difference between the two deltas (ParNew delta > and total heap delta) is effectively the promotion volume? > In the example above, this would give a promotion volume of > (345951K-40960K)-(3608692K-3323692K)=19991K > > d) When looking at -XX:+PrintTenuringDistribution output, I assume the > distribution reflects the situation *after* the enclosing ParNew event > in the log. > > Thanks in advance for any corrections, > -tt > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120423/1972a8f3/attachment.html From the.6th.month at gmail.com Sun Apr 22 21:08:01 2012 From: the.6th.month at gmail.com (the.6th.month at gmail.com) Date: Mon, 23 Apr 2012 12:08:01 +0800 Subject: Two basic questions on -verbosegc and -XX:+PrintTenuringDistribution In-Reply-To: References: Message-ID: Hi, Krystal: those perf data are pretty interesting. Can I get them from JMX metrics, I have a system running aside to collect jmx metrics and reflect them to cacti and nagios graphs All the best Leon On 23 April 2012 10:40, Krystal Mok wrote: > Hi Taras, > > d) When looking at -XX:+PrintTenuringDistribution output, I assume the >> distribution reflects the situation *after* the enclosing ParNew event >> in the log. > > > That's right. The stats are actually printed after the collection has > completed. > > FYI, to get accurate promotion size info, you don't always have to parse > the GC log. 
There's a PerfData counter that keeps track of the promoted > size (in bytes) in a minor GC. You could use jstat to fetch the value of > that counter, like this: > > $ jstat -J-Djstat.showUnsupported=true -snap `pgrep java` | grep > sun.gc.policy.promoted= > sun.gc.policy.promoted=680475760 > > There are a couple of other counters that can be played in conjuntion, > e.g. sun.gc.collector.0.invocations, which shows the number of minor GCs: > > $ jstat -J-Djstat.showUnsupported=true -snap `pgrep java` | grep > sun.gc.collector.0.invocations= > sun.gc.collector.0.invocations=23 > > - Kris > > > On Mon, Apr 23, 2012 at 4:24 AM, Taras Tielkes wrote: > >> Hi, >> >> We're using a time-series database to store and aggregate monitoring >> data from our systems, including GC behavior. >> >> I'm thinking of adding two metrics: >> * total allocation (in K per minute) >> * total promotion (in K per minute) >> >> The gc logs are the source for this data, and I'd like to verify that >> my understanding of the numbers is correct. >> >> Here's an example verbosegc line of output (we're running ParNew+CMS): >> [GC 2136581.585: [ParNew:345951K->40960K(368640K), 0.0676780 secs] >> 3608692K->3323692K(5201920K), 0.0680220 secs] >> >> a) The delta between the ParNew "before" and "after" is: >> 345951K-40960K=304991K >> My understanding is that the 304991K is the total of (collected in >> young gen + promoted to tenured gen) >> Since this number of composed of two things, it's not directly useful by >> itself. >> >> b) The delta between the overall heap "before" and "after" is: >> 3608692K-3323692K=285000K >> I assume that this is effectively the volume that was collected in >> this ParNew cycle. >> Would it be correct to calculate the total allocation rate of the >> running application (in a given period) from summing the total heap >> deltas (in a given timespan)? >> >> I do realize that it's a "collected kilobytes" metric, but I think >> it's close enough to be used as a "delayed" allocation number, >> especially when looking at a timescale of 10 minutes or more. >> It has the additional convenience of requiring to parse the current >> gc.log line only, and not needing to correlate with the preceding >> ParNew event. >> >> c) I take it that the difference between the two deltas (ParNew delta >> and total heap delta) is effectively the promotion volume? >> In the example above, this would give a promotion volume of >> (345951K-40960K)-(3608692K-3323692K)=19991K >> >> d) When looking at -XX:+PrintTenuringDistribution output, I assume the >> distribution reflects the situation *after* the enclosing ParNew event >> in the log. >> >> Thanks in advance for any corrections, >> -tt >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120423/c8db605f/attachment.html From rednaxelafx at gmail.com Sun Apr 22 21:35:15 2012 From: rednaxelafx at gmail.com (Krystal Mok) Date: Mon, 23 Apr 2012 12:35:15 +0800 Subject: Two basic questions on -verbosegc and -XX:+PrintTenuringDistribution In-Reply-To: References: Message-ID: Hi Leon, I'm afraid not. 
I'm not aware of any built-in JMX beans that expose these counters. - Kris On Mon, Apr 23, 2012 at 12:08 PM, the.6th.month at gmail.com < the.6th.month at gmail.com> wrote: > Hi, Krystal: > those perf data are pretty interesting. Can I get them from JMX metrics, I > have a system running aside to collect jmx metrics and reflect them to > cacti and nagios graphs > > All the best > Leon > > > On 23 April 2012 10:40, Krystal Mok wrote: > >> Hi Taras, >> >> d) When looking at -XX:+PrintTenuringDistribution output, I assume the >>> distribution reflects the situation *after* the enclosing ParNew event >>> in the log. >> >> >> That's right. The stats are actually printed after the collection has >> completed. >> >> FYI, to get accurate promotion size info, you don't always have to parse >> the GC log. There's a PerfData counter that keeps track of the promoted >> size (in bytes) in a minor GC. You could use jstat to fetch the value of >> that counter, like this: >> >> $ jstat -J-Djstat.showUnsupported=true -snap `pgrep java` | grep >> sun.gc.policy.promoted= >> sun.gc.policy.promoted=680475760 >> >> There are a couple of other counters that can be played in conjuntion, >> e.g. sun.gc.collector.0.invocations, which shows the number of minor GCs: >> >> $ jstat -J-Djstat.showUnsupported=true -snap `pgrep java` | grep >> sun.gc.collector.0.invocations= >> sun.gc.collector.0.invocations=23 >> >> - Kris >> >> >> On Mon, Apr 23, 2012 at 4:24 AM, Taras Tielkes wrote: >> >>> Hi, >>> >>> We're using a time-series database to store and aggregate monitoring >>> data from our systems, including GC behavior. >>> >>> I'm thinking of adding two metrics: >>> * total allocation (in K per minute) >>> * total promotion (in K per minute) >>> >>> The gc logs are the source for this data, and I'd like to verify that >>> my understanding of the numbers is correct. >>> >>> Here's an example verbosegc line of output (we're running ParNew+CMS): >>> [GC 2136581.585: [ParNew:345951K->40960K(368640K), 0.0676780 secs] >>> 3608692K->3323692K(5201920K), 0.0680220 secs] >>> >>> a) The delta between the ParNew "before" and "after" is: >>> 345951K-40960K=304991K >>> My understanding is that the 304991K is the total of (collected in >>> young gen + promoted to tenured gen) >>> Since this number of composed of two things, it's not directly useful by >>> itself. >>> >>> b) The delta between the overall heap "before" and "after" is: >>> 3608692K-3323692K=285000K >>> I assume that this is effectively the volume that was collected in >>> this ParNew cycle. >>> Would it be correct to calculate the total allocation rate of the >>> running application (in a given period) from summing the total heap >>> deltas (in a given timespan)? >>> >>> I do realize that it's a "collected kilobytes" metric, but I think >>> it's close enough to be used as a "delayed" allocation number, >>> especially when looking at a timescale of 10 minutes or more. >>> It has the additional convenience of requiring to parse the current >>> gc.log line only, and not needing to correlate with the preceding >>> ParNew event. >>> >>> c) I take it that the difference between the two deltas (ParNew delta >>> and total heap delta) is effectively the promotion volume? >>> In the example above, this would give a promotion volume of >>> (345951K-40960K)-(3608692K-3323692K)=19991K >>> >>> d) When looking at -XX:+PrintTenuringDistribution output, I assume the >>> distribution reflects the situation *after* the enclosing ParNew event >>> in the log. 
>>> >>> Thanks in advance for any corrections, >>> -tt >>> _______________________________________________ >>> hotspot-gc-use mailing list >>> hotspot-gc-use at openjdk.java.net >>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>> >> >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120423/fbd76012/attachment-0001.html From bengt.rutisson at oracle.com Mon Apr 23 00:18:02 2012 From: bengt.rutisson at oracle.com (Bengt Rutisson) Date: Mon, 23 Apr 2012 09:18:02 +0200 Subject: Faster card marking: chances for Java 6 backport In-Reply-To: <4F91E64D.1070509@oracle.com> References: <4F91E64D.1070509@oracle.com> Message-ID: <4F95022A.7060103@oracle.com> Taras, Maybe I'm being a bit picky here, but just to be clear. The change for 7068625 is for faster card scanning - not marking. I agree with Jon, I don't think this will be backported to JDK6 unless there is an explicit customer request to do so. Bengt On 2012-04-21 00:42, Jon Masamitsu wrote: > Taras, > > I haven't heard any discussions about a backport. > I think it's a issue that the sustaining organization would > have to consider (since it's to jdk6). > > Jon > > On 4/20/2012 12:46 PM, Taras Tielkes wrote: >> Hi, >> >> Are there plans to port RFE 7068625 to Java 6? >> >> Thanks, >> -tt >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use From alexey.ragozin at gmail.com Mon Apr 23 01:05:52 2012 From: alexey.ragozin at gmail.com (Alexey Ragozin) Date: Mon, 23 Apr 2012 08:05:52 +0000 Subject: Two basic questions on -verbosegc and -XX:+PrintTenuringDistribution Message-ID: Hi, If you need this information for monitoring, you can get it with JMX. Some time ago I have written tool displaying GC metrics (similar to GC log). It is using attach API and JMX to get data from JVM. It is available at http://code.google.com/p/gridkit/downloads/detail?name=gcrep.jar Usage: java -jar gcrep.jar But you probably will be more interested in sources, you can find them here http://code.google.com/p/gridkit/source/browse/branches/aragozin-sandbox/young-gc-bench/src/main/java/org/gridkit/util/monitoring/MBeanGCMonitor.java?spec=svn1461&r=1461 Regards, Alexey > Date: Sun, 22 Apr 2012 22:24:31 +0200 > From: Taras Tielkes > Subject: Two basic questions on -verbosegc and > ? ? ? ?-XX:+PrintTenuringDistribution > To: hotspot-gc-use at openjdk.java.net > Message-ID: > ? ? ? ? > Content-Type: text/plain; charset=ISO-8859-1 > > Hi, > > We're using a time-series database to store and aggregate monitoring > data from our systems, including GC behavior. > > I'm thinking of adding two metrics: > * total allocation (in K per minute) > * total promotion (in K per minute) > > The gc logs are the source for this data, and I'd like to verify that > my understanding of the numbers is correct. 
> > Here's an example verbosegc line of output (we're running ParNew+CMS): > [GC 2136581.585: [ParNew:345951K->40960K(368640K), 0.0676780 secs] > 3608692K->3323692K(5201920K), 0.0680220 secs] > > a) The delta between the ParNew "before" and "after" is: 345951K-40960K=304991K > My understanding is that the 304991K is the total of (collected in > young gen + promoted to tenured gen) > Since this number of composed of two things, it's not directly useful by itself. > > b) The delta between the overall heap "before" and "after" is: > 3608692K-3323692K=285000K > I assume that this is effectively the volume that was collected in > this ParNew cycle. > Would it be correct to calculate the total allocation rate of the > running application (in a given period) from summing the total heap > deltas (in a given timespan)? > > I do realize that it's a "collected kilobytes" metric, but I think > it's close enough to be used as a "delayed" allocation number, > especially when looking at a timescale of 10 minutes or more. > It has the additional convenience of requiring to parse the current > gc.log line only, and not needing to correlate with the preceding > ParNew event. > > c) I take it that the difference between the two deltas (ParNew delta > and total heap delta) is effectively the promotion volume? > In the example above, this would give a promotion volume of > (345951K-40960K)-(3608692K-3323692K)=19991K > > d) When looking at -XX:+PrintTenuringDistribution output, I assume the > distribution reflects the situation *after* the enclosing ParNew event > in the log. > > Thanks in advance for any corrections, > -tt From the.6th.month at gmail.com Mon Apr 23 01:13:00 2012 From: the.6th.month at gmail.com (the.6th.month at gmail.com) Date: Mon, 23 Apr 2012 16:13:00 +0800 Subject: Two basic questions on -verbosegc and -XX:+PrintTenuringDistribution In-Reply-To: References: Message-ID: Hi, Alexey looks pretty cool, so basically you are parsing LastGCInfo to get those metrics, right? Leon On 23 April 2012 16:05, Alexey Ragozin wrote: > Hi, > > If you need this information for monitoring, you can get it with JMX. > Some time ago I have written tool displaying GC metrics (similar to GC > log). It is using attach API and JMX to get data from JVM. > > It is available at > http://code.google.com/p/gridkit/downloads/detail?name=gcrep.jar > Usage: java -jar gcrep.jar > > But you probably will be more interested in sources, you can find them here > > http://code.google.com/p/gridkit/source/browse/branches/aragozin-sandbox/young-gc-bench/src/main/java/org/gridkit/util/monitoring/MBeanGCMonitor.java?spec=svn1461&r=1461 > > Regards, > Alexey > > > Date: Sun, 22 Apr 2012 22:24:31 +0200 > > From: Taras Tielkes > > Subject: Two basic questions on -verbosegc and > > -XX:+PrintTenuringDistribution > > To: hotspot-gc-use at openjdk.java.net > > Message-ID: > > < > CA+R7V78bTOkvaYgwNCPC2MfiqdV-QBtOidz3nUXjb9bwZ5FrNg at mail.gmail.com> > > Content-Type: text/plain; charset=ISO-8859-1 > > > > Hi, > > > > We're using a time-series database to store and aggregate monitoring > > data from our systems, including GC behavior. > > > > I'm thinking of adding two metrics: > > * total allocation (in K per minute) > > * total promotion (in K per minute) > > > > The gc logs are the source for this data, and I'd like to verify that > > my understanding of the numbers is correct. 
> > > > Here's an example verbosegc line of output (we're running ParNew+CMS): > > [GC 2136581.585: [ParNew:345951K->40960K(368640K), 0.0676780 secs] > > 3608692K->3323692K(5201920K), 0.0680220 secs] > > > > a) The delta between the ParNew "before" and "after" is: > 345951K-40960K=304991K > > My understanding is that the 304991K is the total of (collected in > > young gen + promoted to tenured gen) > > Since this number of composed of two things, it's not directly useful by > itself. > > > > b) The delta between the overall heap "before" and "after" is: > > 3608692K-3323692K=285000K > > I assume that this is effectively the volume that was collected in > > this ParNew cycle. > > Would it be correct to calculate the total allocation rate of the > > running application (in a given period) from summing the total heap > > deltas (in a given timespan)? > > > > I do realize that it's a "collected kilobytes" metric, but I think > > it's close enough to be used as a "delayed" allocation number, > > especially when looking at a timescale of 10 minutes or more. > > It has the additional convenience of requiring to parse the current > > gc.log line only, and not needing to correlate with the preceding > > ParNew event. > > > > c) I take it that the difference between the two deltas (ParNew delta > > and total heap delta) is effectively the promotion volume? > > In the example above, this would give a promotion volume of > > (345951K-40960K)-(3608692K-3323692K)=19991K > > > > d) When looking at -XX:+PrintTenuringDistribution output, I assume the > > distribution reflects the situation *after* the enclosing ParNew event > > in the log. > > > > Thanks in advance for any corrections, > > -tt > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120423/63fb7345/attachment.html From alexey.ragozin at gmail.com Mon Apr 23 01:51:07 2012 From: alexey.ragozin at gmail.com (Alexey Ragozin) Date: Mon, 23 Apr 2012 08:51:07 +0000 Subject: Two basic questions on -verbosegc and -XX:+PrintTenuringDistribution In-Reply-To: References: Message-ID: Exactly. I'm polling JMX regularly to catch LastGCInfo for every collection (not ideal solution though). LastGCInfo have all in need (I'm mostly interested in STW pauses and allocation / reclaim rates). There are some limitations - I cannot reproduce tenuring distribution from JMX, CMS fragmentation metrics also available only through logs (very unfortunate). Regards, Alexey On Mon, Apr 23, 2012 at 8:13 AM, the.6th.month at gmail.com wrote: >> Hi, Alexey >> looks pretty cool, so basically you are parsing LastGCInfo to get those >> metrics, right? >> >> Leon >> >> On 23 April 2012 16:05, Alexey Ragozin wrote: >>> >>> Hi, >>> >>> If you need this information for monitoring, you can get it with JMX. >>> Some time ago I have written tool displaying GC metrics (similar to GC >>> log). It is using attach API and JMX to get data from JVM. 
>>> >>> It is available at >>> http://code.google.com/p/gridkit/downloads/detail?name=gcrep.jar >>> Usage: java -jar gcrep.jar >>> >>> But you probably will be more interested in sources, you can find them >>> here >>> >>> http://code.google.com/p/gridkit/source/browse/branches/aragozin-sandbox/young-gc-bench/src/main/java/org/gridkit/util/monitoring/MBeanGCMonitor.java?spec=svn1461&r=1461 >>> >>> Regards, >>> Alexey >>> >>> > Date: Sun, 22 Apr 2012 22:24:31 +0200 >>> > From: Taras Tielkes >>> > Subject: Two basic questions on -verbosegc and >>> > ? ? ? ?-XX:+PrintTenuringDistribution >>> > To: hotspot-gc-use at openjdk.java.net >>> > Message-ID: >>> > >>> > ? >>> > Content-Type: text/plain; charset=ISO-8859-1 >>> > >>> > Hi, >>> > >>> > We're using a time-series database to store and aggregate monitoring >>> > data from our systems, including GC behavior. >>> > >>> > I'm thinking of adding two metrics: >>> > * total allocation (in K per minute) >>> > * total promotion (in K per minute) >>> > >>> > The gc logs are the source for this data, and I'd like to verify that >>> > my understanding of the numbers is correct. >>> > >>> > Here's an example verbosegc line of output (we're running ParNew+CMS): >>> > [GC 2136581.585: [ParNew:345951K->40960K(368640K), 0.0676780 secs] >>> > 3608692K->3323692K(5201920K), 0.0680220 secs] >>> > >>> > a) The delta between the ParNew "before" and "after" is: >>> > 345951K-40960K=304991K >>> > My understanding is that the 304991K is the total of (collected in >>> > young gen + promoted to tenured gen) >>> > Since this number of composed of two things, it's not directly useful by >>> > itself. >>> > >>> > b) The delta between the overall heap "before" and "after" is: >>> > 3608692K-3323692K=285000K >>> > I assume that this is effectively the volume that was collected in >>> > this ParNew cycle. >>> > Would it be correct to calculate the total allocation rate of the >>> > running application (in a given period) from summing the total heap >>> > deltas (in a given timespan)? >>> > >>> > I do realize that it's a "collected kilobytes" metric, but I think >>> > it's close enough to be used as a "delayed" allocation number, >>> > especially when looking at a timescale of 10 minutes or more. >>> > It has the additional convenience of requiring to parse the current >>> > gc.log line only, and not needing to correlate with the preceding >>> > ParNew event. >>> > >>> > c) I take it that the difference between the two deltas (ParNew delta >>> > and total heap delta) is effectively the promotion volume? >>> > In the example above, this would give a promotion volume of >>> > (345951K-40960K)-(3608692K-3323692K)=19991K >>> > >>> > d) When looking at -XX:+PrintTenuringDistribution output, I assume the >>> > distribution reflects the situation *after* the enclosing ParNew event >>> > in the log. >>> > >>> > Thanks in advance for any corrections, >>> > -tt >>> _______________________________________________ >>> hotspot-gc-use mailing list >>> hotspot-gc-use at openjdk.java.net >>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> From kbbryant61 at gmail.com Mon Apr 23 12:28:07 2012 From: kbbryant61 at gmail.com (Kobe Bryant) Date: Mon, 23 Apr 2012 12:28:07 -0700 Subject: Two basic questions on -verbosegc and -XX:+PrintTenuringDistribution In-Reply-To: References: Message-ID: sorry for interjecting myself into this interesting conversation: > There's a PerfData counter that keeps track of the promoted size (in bytes) in a minor GC. 
>You could use jstat to fetch the value of that counter, like this: does this give me the number of promoted bytes in the last minor gc? so if I have to track promotion volumes at each gc I have to keep polling this metric (and even then I might miss an update and lose information, since this info is not cumulative), correct? Also, is there a similar metric to track size harvested from tenured space at each full GC? thank you /K On Sun, Apr 22, 2012 at 7:40 PM, Krystal Mok wrote: > Hi Taras, > > d) When looking at -XX:+PrintTenuringDistribution output, I assume the >> distribution reflects the situation *after* the enclosing ParNew event >> in the log. > > > That's right. The stats are actually printed after the collection has > completed. > > FYI, to get accurate promotion size info, you don't always have to parse > the GC log. There's a PerfData counter that keeps track of the promoted > size (in bytes) in a minor GC. You could use jstat to fetch the value of > that counter, like this: > > $ jstat -J-Djstat.showUnsupported=true -snap `pgrep java` | grep > sun.gc.policy.promoted= > sun.gc.policy.promoted=680475760 > > There are a couple of other counters that can be played in conjuntion, > e.g. sun.gc.collector.0.invocations, which shows the number of minor GCs: > > $ jstat -J-Djstat.showUnsupported=true -snap `pgrep java` | grep > sun.gc.collector.0.invocations= > sun.gc.collector.0.invocations=23 > > - Kris > > > On Mon, Apr 23, 2012 at 4:24 AM, Taras Tielkes wrote: > >> Hi, >> >> We're using a time-series database to store and aggregate monitoring >> data from our systems, including GC behavior. >> >> I'm thinking of adding two metrics: >> * total allocation (in K per minute) >> * total promotion (in K per minute) >> >> The gc logs are the source for this data, and I'd like to verify that >> my understanding of the numbers is correct. >> >> Here's an example verbosegc line of output (we're running ParNew+CMS): >> [GC 2136581.585: [ParNew:345951K->40960K(368640K), 0.0676780 secs] >> 3608692K->3323692K(5201920K), 0.0680220 secs] >> >> a) The delta between the ParNew "before" and "after" is: >> 345951K-40960K=304991K >> My understanding is that the 304991K is the total of (collected in >> young gen + promoted to tenured gen) >> Since this number of composed of two things, it's not directly useful by >> itself. >> >> b) The delta between the overall heap "before" and "after" is: >> 3608692K-3323692K=285000K >> I assume that this is effectively the volume that was collected in >> this ParNew cycle. >> Would it be correct to calculate the total allocation rate of the >> running application (in a given period) from summing the total heap >> deltas (in a given timespan)? >> >> I do realize that it's a "collected kilobytes" metric, but I think >> it's close enough to be used as a "delayed" allocation number, >> especially when looking at a timescale of 10 minutes or more. >> It has the additional convenience of requiring to parse the current >> gc.log line only, and not needing to correlate with the preceding >> ParNew event. >> >> c) I take it that the difference between the two deltas (ParNew delta >> and total heap delta) is effectively the promotion volume? >> In the example above, this would give a promotion volume of >> (345951K-40960K)-(3608692K-3323692K)=19991K >> >> d) When looking at -XX:+PrintTenuringDistribution output, I assume the >> distribution reflects the situation *after* the enclosing ParNew event >> in the log. 
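On the polling question just above: since these counters only surface through the jstat machinery, one blunt option is to run the exact jstat command shown earlier from a small wrapper and pair sun.gc.policy.promoted with sun.gc.collector.0.invocations, so each sample at least reveals whether a minor GC has happened since the previous one (two GCs between samples will still only show the most recent). A rough sketch, assuming jstat is on the PATH and the target PID is passed as an argument:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;

    public class JstatCounterPoll {
        public static void main(String[] args) throws Exception {
            String pid = args[0];
            ProcessBuilder pb = new ProcessBuilder(
                    "jstat", "-J-Djstat.showUnsupported=true", "-snap", pid);
            pb.redirectErrorStream(true);
            Process p = pb.start();
            BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()));
            for (String line; (line = r.readLine()) != null; ) {
                line = line.trim();
                if (line.startsWith("sun.gc.policy.promoted=")
                        || line.startsWith("sun.gc.collector.0.invocations=")) {
                    System.out.println(line); // e.g. sun.gc.policy.promoted=680475760
                }
            }
            p.waitFor();
        }
    }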
>> >> Thanks in advance for any corrections, >> -tt >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120423/77c4d3b6/attachment.html From taras.tielkes at gmail.com Mon Apr 23 13:03:38 2012 From: taras.tielkes at gmail.com (Taras Tielkes) Date: Mon, 23 Apr 2012 22:03:38 +0200 Subject: Two basic questions on -verbosegc and -XX:+PrintTenuringDistribution In-Reply-To: References: Message-ID: Hi, Sorry to return this thread to the original question :-) The additional data from jstat -snap is indeed quite useful. However, I think the totals easily harvested from the gc logs are accurate enough for my purposes, which is measuring overall allocation rate, and overall promotion rate. Performing a few manual calculations shows that the promotion volume I calculate as the differentce of ParNew delta and total heap delta is reasonably close to the "tenuring age 15" age group from the preceding ParNew. I just want to make sure I'm not missing something obvious here. The assumption is of course that PermGen is quite stable, and that promotion and CMS failures are relatively rate. Thanks, -tt On Mon, Apr 23, 2012 at 9:28 PM, Kobe Bryant wrote: > sorry for interjecting myself into this interesting conversation: > > ? >?There's a PerfData counter that keeps track of the promoted size (in > bytes) in a minor GC. > ? >You could use jstat to fetch the value of that counter, like this: > > does this give me the number of promoted bytes in the last minor gc? so if I > have to track promotion volumes > at each gc I have to keep polling this metric (and even then I might miss an > update and lose information, since > this info is not cumulative), correct? > > Also, is there a similar metric to track size harvested from tenured space > at each full GC? > > thank you > > /K > > On Sun, Apr 22, 2012 at 7:40 PM, Krystal Mok wrote: >> >> Hi Taras, >> >>> d) When looking at -XX:+PrintTenuringDistribution output, I assume the >>> distribution reflects the situation *after* the enclosing ParNew event >>> in the log. >> >> >> That's right. The stats are actually printed after the collection has >> completed. >> >> FYI, to get accurate promotion size info, you don't always have to parse >> the GC log. There's a PerfData counter that keeps track of the promoted size >> (in bytes) in a minor GC. You could use jstat to fetch the value of that >> counter, like this: >> >> $ jstat -J-Djstat.showUnsupported=true -snap `pgrep java` | grep >> sun.gc.policy.promoted= >> sun.gc.policy.promoted=680475760 >> >> There are a couple of other counters that can be played in conjuntion, >> e.g. sun.gc.collector.0.invocations, which shows the number of minor GCs: >> >> $ jstat -J-Djstat.showUnsupported=true -snap `pgrep java` | grep >> sun.gc.collector.0.invocations= >> sun.gc.collector.0.invocations=23 >> >> - Kris >> >> >> On Mon, Apr 23, 2012 at 4:24 AM, Taras Tielkes >> wrote: >>> >>> Hi, >>> >>> We're using a time-series database to store and aggregate monitoring >>> data from our systems, including GC behavior. 
>>> >>> I'm thinking of adding two metrics: >>> * total allocation (in K per minute) >>> * total promotion (in K per minute) >>> >>> The gc logs are the source for this data, and I'd like to verify that >>> my understanding of the numbers is correct. >>> >>> Here's an example verbosegc line of output (we're running ParNew+CMS): >>> [GC 2136581.585: [ParNew:345951K->40960K(368640K), 0.0676780 secs] >>> 3608692K->3323692K(5201920K), 0.0680220 secs] >>> >>> a) The delta between the ParNew "before" and "after" is: >>> 345951K-40960K=304991K >>> My understanding is that the 304991K is the total of (collected in >>> young gen + promoted to tenured gen) >>> Since this number of composed of two things, it's not directly useful by >>> itself. >>> >>> b) The delta between the overall heap "before" and "after" is: >>> 3608692K-3323692K=285000K >>> I assume that this is effectively the volume that was collected in >>> this ParNew cycle. >>> Would it be correct to calculate the total allocation rate of the >>> running application (in a given period) from summing the total heap >>> deltas (in a given timespan)? >>> >>> I do realize that it's a "collected kilobytes" metric, but I think >>> it's close enough to be used as a "delayed" allocation number, >>> especially when looking at a timescale of 10 minutes or more. >>> It has the additional convenience of requiring to parse the current >>> gc.log line only, and not needing to correlate with the preceding >>> ParNew event. >>> >>> c) I take it that the difference between the two deltas (ParNew delta >>> and total heap delta) is effectively the promotion volume? >>> In the example above, this would give a promotion volume of >>> (345951K-40960K)-(3608692K-3323692K)=19991K >>> >>> d) When looking at -XX:+PrintTenuringDistribution output, I assume the >>> distribution reflects the situation *after* the enclosing ParNew event >>> in the log. >>> >>> Thanks in advance for any corrections, >>> -tt >>> _______________________________________________ >>> hotspot-gc-use mailing list >>> hotspot-gc-use at openjdk.java.net >>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > From taras.tielkes at gmail.com Mon Apr 23 13:07:31 2012 From: taras.tielkes at gmail.com (Taras Tielkes) Date: Mon, 23 Apr 2012 22:07:31 +0200 Subject: Faster card marking: chances for Java 6 backport In-Reply-To: <4F95022A.7060103@oracle.com> References: <4F91E64D.1070509@oracle.com> <4F95022A.7060103@oracle.com> Message-ID: Hi Bengt, Thanks for the correction - you're completely right, of course. To me, the decision process for which performance improvements are backported to the previous release stream has never been completely clear. Given that the change in question seems quite an isolated fix, I though it would make sense to ask. Thanks, -tt On Mon, Apr 23, 2012 at 9:18 AM, Bengt Rutisson wrote: > > Taras, > > Maybe I'm being a bit picky here, but just to be clear. The change for > 7068625 is for faster card scanning - not marking. > > I agree with Jon, I don't think this will be backported to JDK6 unless > there is an explicit customer request to do so. 
> > Bengt > > On 2012-04-21 00:42, Jon Masamitsu wrote: >> Taras, >> >> I haven't heard any discussions about a backport. >> I think it's a issue that the sustaining organization would >> have to consider (since it's to jdk6). >> >> Jon >> >> On 4/20/2012 12:46 PM, Taras Tielkes wrote: >>> Hi, >>> >>> Are there plans to port RFE 7068625 to Java 6? >>> >>> Thanks, >>> -tt >>> _______________________________________________ >>> hotspot-gc-use mailing list >>> hotspot-gc-use at openjdk.java.net >>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use From ysr1729 at gmail.com Mon Apr 23 13:51:12 2012 From: ysr1729 at gmail.com (Srinivas Ramakrishna) Date: Mon, 23 Apr 2012 13:51:12 -0700 Subject: Two basic questions on -verbosegc and -XX:+PrintTenuringDistribution In-Reply-To: References: Message-ID: Yes, that's right. By the way, a few years ago, John had posted an awk script on this alias that did this for you. I recently had occasion to need to use it, and found it gave a few problems with the jdk 6 logs i was processing, so I fixed a few bugs in it and extended it to summarize and plot other metrics of interest to me. I am happy to share my modifications to John;s script here and on the OpenJDK PrintGCStats project page later this week. -- ramki On Mon, Apr 23, 2012 at 1:03 PM, Taras Tielkes wrote: > Hi, > > Sorry to return this thread to the original question :-) > The additional data from jstat -snap is indeed quite useful. > > However, I think the totals easily harvested from the gc logs are > accurate enough for my purposes, which is measuring overall allocation > rate, and overall promotion rate. > Performing a few manual calculations shows that the promotion volume I > calculate as the differentce of ParNew delta and total heap delta is > reasonably close to the "tenuring age 15" age group from the preceding > ParNew. > I just want to make sure I'm not missing something obvious here. The > assumption is of course that PermGen is quite stable, and that > promotion and CMS failures are relatively rate. > > Thanks, > -tt > > On Mon, Apr 23, 2012 at 9:28 PM, Kobe Bryant wrote: > > sorry for interjecting myself into this interesting conversation: > > > > > There's a PerfData counter that keeps track of the promoted size (in > > bytes) in a minor GC. > > >You could use jstat to fetch the value of that counter, like this: > > > > does this give me the number of promoted bytes in the last minor gc? so > if I > > have to track promotion volumes > > at each gc I have to keep polling this metric (and even then I might > miss an > > update and lose information, since > > this info is not cumulative), correct? > > > > Also, is there a similar metric to track size harvested from tenured > space > > at each full GC? > > > > thank you > > > > /K > > > > On Sun, Apr 22, 2012 at 7:40 PM, Krystal Mok > wrote: > >> > >> Hi Taras, > >> > >>> d) When looking at -XX:+PrintTenuringDistribution output, I assume the > >>> distribution reflects the situation *after* the enclosing ParNew event > >>> in the log. > >> > >> > >> That's right. The stats are actually printed after the collection has > >> completed. 
> >> > >> FYI, to get accurate promotion size info, you don't always have to parse > >> the GC log. There's a PerfData counter that keeps track of the promoted > size > >> (in bytes) in a minor GC. You could use jstat to fetch the value of that > >> counter, like this: > >> > >> $ jstat -J-Djstat.showUnsupported=true -snap `pgrep java` | grep > >> sun.gc.policy.promoted= > >> sun.gc.policy.promoted=680475760 > >> > >> There are a couple of other counters that can be played in conjuntion, > >> e.g. sun.gc.collector.0.invocations, which shows the number of minor > GCs: > >> > >> $ jstat -J-Djstat.showUnsupported=true -snap `pgrep java` | grep > >> sun.gc.collector.0.invocations= > >> sun.gc.collector.0.invocations=23 > >> > >> - Kris > >> > >> > >> On Mon, Apr 23, 2012 at 4:24 AM, Taras Tielkes > > >> wrote: > >>> > >>> Hi, > >>> > >>> We're using a time-series database to store and aggregate monitoring > >>> data from our systems, including GC behavior. > >>> > >>> I'm thinking of adding two metrics: > >>> * total allocation (in K per minute) > >>> * total promotion (in K per minute) > >>> > >>> The gc logs are the source for this data, and I'd like to verify that > >>> my understanding of the numbers is correct. > >>> > >>> Here's an example verbosegc line of output (we're running ParNew+CMS): > >>> [GC 2136581.585: [ParNew:345951K->40960K(368640K), 0.0676780 secs] > >>> 3608692K->3323692K(5201920K), 0.0680220 secs] > >>> > >>> a) The delta between the ParNew "before" and "after" is: > >>> 345951K-40960K=304991K > >>> My understanding is that the 304991K is the total of (collected in > >>> young gen + promoted to tenured gen) > >>> Since this number of composed of two things, it's not directly useful > by > >>> itself. > >>> > >>> b) The delta between the overall heap "before" and "after" is: > >>> 3608692K-3323692K=285000K > >>> I assume that this is effectively the volume that was collected in > >>> this ParNew cycle. > >>> Would it be correct to calculate the total allocation rate of the > >>> running application (in a given period) from summing the total heap > >>> deltas (in a given timespan)? > >>> > >>> I do realize that it's a "collected kilobytes" metric, but I think > >>> it's close enough to be used as a "delayed" allocation number, > >>> especially when looking at a timescale of 10 minutes or more. > >>> It has the additional convenience of requiring to parse the current > >>> gc.log line only, and not needing to correlate with the preceding > >>> ParNew event. > >>> > >>> c) I take it that the difference between the two deltas (ParNew delta > >>> and total heap delta) is effectively the promotion volume? > >>> In the example above, this would give a promotion volume of > >>> (345951K-40960K)-(3608692K-3323692K)=19991K > >>> > >>> d) When looking at -XX:+PrintTenuringDistribution output, I assume the > >>> distribution reflects the situation *after* the enclosing ParNew event > >>> in the log. 
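For the sun.gc.policy.promoted and sun.gc.collector.0.invocations counters shown earlier in this message, a small poller along the following lines can record samples over time. It is a hypothetical sketch (the class name and the 10-second interval are made up), and it deliberately does not assume whether the promoted counter is cumulative or per-collection, since that is exactly the open question here; it just prints the raw counter next to the minor-GC invocation count so the two can be correlated afterwards.

import java.io.BufferedReader;
import java.io.InputStreamReader;

// Hypothetical polling helper: samples the two PerfData counters shown above
// by invoking jstat periodically for a given pid.
public class PromotedSampler {
    public static void main(String[] args) throws Exception {
        String pid = args[0];
        while (true) {
            System.out.println(sample(pid, "sun.gc.policy.promoted=")
                    + "  " + sample(pid, "sun.gc.collector.0.invocations="));
            Thread.sleep(10000); // a shorter interval reduces the chance of missing a GC
        }
    }

    // Runs: jstat -J-Djstat.showUnsupported=true -snap <pid>  and greps one counter.
    static String sample(String pid, String prefix) throws Exception {
        Process p = new ProcessBuilder(
                "jstat", "-J-Djstat.showUnsupported=true", "-snap", pid).start();
        BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()));
        String line, hit = prefix + "?";
        while ((line = r.readLine()) != null) {
            if (line.startsWith(prefix)) hit = line.trim();
        }
        p.waitFor();
        return hit;
    }
}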
> >>> > >>> Thanks in advance for any corrections, > >>> -tt > >>> _______________________________________________ > >>> hotspot-gc-use mailing list > >>> hotspot-gc-use at openjdk.java.net > >>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > >> > >> > >> > >> _______________________________________________ > >> hotspot-gc-use mailing list > >> hotspot-gc-use at openjdk.java.net > >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > >> > > > > > > _______________________________________________ > > hotspot-gc-use mailing list > > hotspot-gc-use at openjdk.java.net > > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120423/17dc0686/attachment-0001.html From Bond.Chen at lombardrisk.com Tue Apr 24 02:49:31 2012 From: Bond.Chen at lombardrisk.com (Bond Chen) Date: Tue, 24 Apr 2012 10:49:31 +0100 Subject: Promotion Failed when the Old Generation Usage is very low. Message-ID: <4F96E7AB.9AAE.00F7.0@lombardrisk.com> Hi , We're suffering high frequent promotion failed and concurrent mode failure, cause very long GC pause(5 seconds to 1000 seconds even more some time) attached the '1st promote failed' and '49th promotion failed' of gc.log 1, The '1st promote failed' caused by the old generation usage is too high, no enough space for promotion, but the '49th promotion failed', only used 2615456K out of 10387456K, what happed? 2, Does the CMS throwing 'Concurrent Mode Failure' combat the old generation? move all objects together and leave only one free block? or Only 'Full GC' does this? 3, when will 'Promotion failure' cause ''Concurrent Mode Failure' and some time 'Full GC' ? 
Regards, Bond /****parameter ***/ ### New JVM Parameter #Below line changed per RH recommendation 15 Dec 2009 #export RUN_ARGS=" -d64 -server -Xms2048M -Xmx12144M -XX:PermSize=512m -XX:MaxPermSize=512m -Xss1024k " export RUN_ARGS=" -server -d64 -Xms2048M -Xmx12144M -XX:PermSize=512m -XX:MaxPermSize=512m -Xss1024k " export RUN_ARGS=" $RUN_ARGS -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:+UseTLAB -XX:+CMSIncrementalMode " #Below line commented per RH recommendation 15 Dec 2009 #export RUN_ARGS=" $RUN_ARGS -XX:+UseCMSCompactAtFullCollection " #Below line changed per RH recommendation 15 Dec 2009 #export RUN_ARGS=" $RUN_ARGS -XX:+CMSIncrementalPacing -XX:CMSIncrementalDutyCycleMin=0 -XX:CMSIncrementalDutyCycle=10 -XX:MaxTenuringThreshold=0 " export RUN_ARGS=" $RUN_ARGS -XX:+CMSIncrementalPacing -XX:CMSIncrementalDutyCycleMin=0 -XX:CMSIncrementalDutyCycle=10 -XX:MaxTenuringThreshold=32 " #Below 2 lines added per RH recommendation 15 Dec 2009 #export RUN_ARGS=" -XX:ParallelGCThreads=13 " #export RUN_ARGS=" -XX:SurvivorRatio=48 " #Below 2 lines added per RH recommendation 16 Dec 2009 RUN_ARGS=" $RUN_ARGS -XX:ParallelGCThreads=13 " RUN_ARGS=" $RUN_ARGS -XX:SurvivorRatio=48 " ### set for cluster monitor added on 25-Jun-2011 export RUN_ARGS="$RUN_ARGS -Djboss.cluster.monitor.switch=y"; export RUN_ARGS="$RUN_ARGS -Djboss.cluster.number=2"; #Below line changed with RELEASE_2009_1_SP10.2 on 26 Feb 2010 #export RUN_ARGS=" $RUN_ARGS -XX:CMSInitiatingOccupancyFraction=70 -XX:+PrintTenuringDistribution -Xmn512m -XX:+UseLargePages -XX:LargePageSizeInBytes=64k " export RUN_ARGS=" $RUN_ARGS -XX:CMSInitiatingOccupancyFraction=60 -Xmn2000m -XX:+UseLargePages -XX:LargePageSizeInBytes=64k " #Below line added with RELEASE_2009_1_SP10.2 on 26 Feb 2010 export RUN_ARGS=" $RUN_ARGS -XX:+CMSClassUnloadingEnabled -XX:+ExplicitGCInvokesConcurrent -XX:+AggressiveOpts " export RUN_ARGS=" $RUN_ARGS -XX:+PrintGCDetails -XX:+PrintGCApplicationStoppedTime -Xloggc:./gc_${start_ts}.log " export RUN_ARGS=" $RUN_ARGS -Dsun.rmi.dgc.server.gcInterval=18000000 -Dsun.rmi.dgc.client.gcInterval=18000000 -verbose:gc" /***parameter /** the 1st promotion failed **/ 169682.980: [GC Before GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 7127332 Max Chunk Size: 6041118 Number of Blocks: 1785 Av. Block Size: 3992 Tree Height: 24 Before GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 6834133 Max Chunk Size: 97353 Number of Blocks: 4773 Av. Block Size: 1431 Tree Height: 27 169682.981: [ParNew (promotion failed): 2007040K->2007040K(2007040K), 48.9558338 secs]169731.937: [CMS169741.903: [CMS-concurrent-sweep: 10.823/99.414 secs] [Times: user=127.09 sys=25.97, real=99.41 secs] (concurrent mode failure): 8681490K->2319271K(10387456K), 44.6304362 secs] 10395485K->2319271K(12394496K), [CMS Perm : 291584K->290856K(524288K)]After GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 1032711195 Max Chunk Size: 1032711195 Number of Blocks: 1 Av. 
Block Size: 1032711195 Tree Height: 1 After GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 0 Max Chunk Size: 0 Number of Blocks: 0 Tree Height: 0 icms_dc=16 , 93.5876901 secs] [Times: user=97.28 sys=21.58, real=93.59 secs] /** the 1st promotion failed **/ /** the 49th promotion failed ***/ 298786.901: [GC Before GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 236997970 Max Chunk Size: 236997970 Number of Blocks: 1 Av. Block Size: 236997970 Tree Height: 1 Before GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 0 Max Chunk Size: 0 Number of Blocks: 0 Tree Height: 0 298786.902: [ParNew (promotion failed): 2007039K->2007040K(2007040K), 4.5565939 secs]298791.458: [CMS: 2615456K->1813239K(10387456K), 19.2232319 secs] 4346089K->1813239K(12394496K), [CMS Perm : 299206K->299126K(524288K)]After GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 1097483360 Max Chunk Size: 1097483360 Number of Blocks: 1 Av. Block Size: 1097483360 Tree Height: 1 After GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 0 Max Chunk Size: 0 Number of Blocks: 0 Tree Height: 0 icms_dc=0 , 23.7805042 secs] [Times: user=25.47 sys=0.02, real=23.78 secs] Total time for which application threads were stopped: 23.7861234 seconds /** the 49th promotion failed ***/ This e-mail together with any attachments (the "Message") is confidential and may contain privileged information. If you are not the intended recipient (or have received this e-mail in error) please notify the sender immediately and delete this Message from your system. Any unauthorized copying, disclosure, distribution or use of this Message is strictly forbidden. From taras.tielkes at gmail.com Tue Apr 24 15:08:52 2012 From: taras.tielkes at gmail.com (Taras Tielkes) Date: Wed, 25 Apr 2012 00:08:52 +0200 Subject: Two basic questions on -verbosegc and -XX:+PrintTenuringDistribution In-Reply-To: References: Message-ID: Hi, One correction to my original post. To collect a metric representing the overall "allocation rate" of our application, I should be summing the ParNew deltas, not the "overall heap" deltas from the gc log. The "overall heap" delta will reflect the amount collected from ParNew. However, the ParNew delta will also include the volume of objects promoted to tenured gen. This is a more accurate (albeit delayed) representation of allocation volume, since we don't care how objects leave the new gen - by being collected or by being promoted. If it left the new gen by promotion, it was briefly before allocated, and thus should contribute to the reported allocation volume. Cheers, -tt On Mon, Apr 23, 2012 at 10:51 PM, Srinivas Ramakrishna wrote: > Yes, that's right. > > By the way, a few years ago, John had posted an awk script on this alias > that did this for you. > I recently had occasion to need to use it, and found it gave a few problems > with the jdk 6 logs i > was processing, so I fixed a few bugs in it and extended it to summarize and > plot other metrics > of interest to me. I am happy to share my modifications to John;s script > here and on > the OpenJDK PrintGCStats project page later this week. > > -- ramki > > > On Mon, Apr 23, 2012 at 1:03 PM, Taras Tielkes > wrote: >> >> Hi, >> >> Sorry to return this thread to the original question :-) >> The additional data from jstat -snap is indeed quite useful. 
>> >> However, I think the totals easily harvested from the gc logs are >> accurate enough for my purposes, which is measuring overall allocation >> rate, and overall promotion rate. >> Performing a few manual calculations shows that the promotion volume I >> calculate as the differentce of ParNew delta and total heap delta is >> reasonably close to the "tenuring age 15" age group from the preceding >> ParNew. >> I just want to make sure I'm not missing something obvious here. The >> assumption is of course that PermGen is quite stable, and that >> promotion and CMS failures are relatively rate. >> >> Thanks, >> -tt >> >> On Mon, Apr 23, 2012 at 9:28 PM, Kobe Bryant wrote: >> > sorry for interjecting myself into this interesting conversation: >> > >> > ? >?There's a PerfData counter that keeps track of the promoted size (in >> > bytes) in a minor GC. >> > ? >You could use jstat to fetch the value of that counter, like this: >> > >> > does this give me the number of promoted bytes in the last minor gc? so >> > if I >> > have to track promotion volumes >> > at each gc I have to keep polling this metric (and even then I might >> > miss an >> > update and lose information, since >> > this info is not cumulative), correct? >> > >> > Also, is there a similar metric to track size harvested from tenured >> > space >> > at each full GC? >> > >> > thank you >> > >> > /K >> > >> > On Sun, Apr 22, 2012 at 7:40 PM, Krystal Mok >> > wrote: >> >> >> >> Hi Taras, >> >> >> >>> d) When looking at -XX:+PrintTenuringDistribution output, I assume the >> >>> distribution reflects the situation *after* the enclosing ParNew event >> >>> in the log. >> >> >> >> >> >> That's right. The stats are actually printed after the collection has >> >> completed. >> >> >> >> FYI, to get accurate promotion size info, you don't always have to >> >> parse >> >> the GC log. There's a PerfData counter that keeps track of the promoted >> >> size >> >> (in bytes) in a minor GC. You could use jstat to fetch the value of >> >> that >> >> counter, like this: >> >> >> >> $ jstat -J-Djstat.showUnsupported=true -snap `pgrep java` | grep >> >> sun.gc.policy.promoted= >> >> sun.gc.policy.promoted=680475760 >> >> >> >> There are a couple of other counters that can be played in conjuntion, >> >> e.g. sun.gc.collector.0.invocations, which shows the number of minor >> >> GCs: >> >> >> >> $ jstat -J-Djstat.showUnsupported=true -snap `pgrep java` | grep >> >> sun.gc.collector.0.invocations= >> >> sun.gc.collector.0.invocations=23 >> >> >> >> - Kris >> >> >> >> >> >> On Mon, Apr 23, 2012 at 4:24 AM, Taras Tielkes >> >> >> >> wrote: >> >>> >> >>> Hi, >> >>> >> >>> We're using a time-series database to store and aggregate monitoring >> >>> data from our systems, including GC behavior. >> >>> >> >>> I'm thinking of adding two metrics: >> >>> * total allocation (in K per minute) >> >>> * total promotion (in K per minute) >> >>> >> >>> The gc logs are the source for this data, and I'd like to verify that >> >>> my understanding of the numbers is correct. 
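For the cross-check against the oldest tenuring age mentioned above, something like the following sketch can pull the per-age lines that -XX:+PrintTenuringDistribution writes into the gc log and report the last (deepest) age bucket of each distribution. The "- age N: X bytes, Y total" line shape is an assumption that may differ slightly between JVM versions, and the class name is made up for illustration.

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical cross-check helper for -XX:+PrintTenuringDistribution output.
public class OldestAgeBucket {
    // Assumed line shape: "- age   1:    8310184 bytes,    8310184 total"
    private static final Pattern AGE = Pattern.compile("-\\s*age\\s+(\\d+):\\s+(\\d+) bytes");

    public static void main(String[] args) throws Exception {
        BufferedReader in = new BufferedReader(new FileReader(args[0]));
        String line;
        int lastAge = -1;
        long lastAgeBytes = 0;
        boolean inBlock = false;
        while ((line = in.readLine()) != null) {
            Matcher m = AGE.matcher(line);
            if (m.find()) {
                // ages print in increasing order, so this ends up holding the
                // deepest age of the current distribution
                lastAge = Integer.parseInt(m.group(1));
                lastAgeBytes = Long.parseLong(m.group(2));
                inBlock = true;
            } else if (inBlock) {
                // first non-age line after a distribution: report its oldest bucket
                System.out.println("oldest age " + lastAge + ": ~" + (lastAgeBytes / 1024) + "K");
                inBlock = false;
            }
        }
        in.close();
    }
}

Comparing these per-collection numbers with the (ParNew delta - heap delta) figure gives the kind of sanity check described above.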
>> >>> >> >>> Here's an example verbosegc line of output (we're running ParNew+CMS): >> >>> [GC 2136581.585: [ParNew:345951K->40960K(368640K), 0.0676780 secs] >> >>> 3608692K->3323692K(5201920K), 0.0680220 secs] >> >>> >> >>> a) The delta between the ParNew "before" and "after" is: >> >>> 345951K-40960K=304991K >> >>> My understanding is that the 304991K is the total of (collected in >> >>> young gen + promoted to tenured gen) >> >>> Since this number of composed of two things, it's not directly useful >> >>> by >> >>> itself. >> >>> >> >>> b) The delta between the overall heap "before" and "after" is: >> >>> 3608692K-3323692K=285000K >> >>> I assume that this is effectively the volume that was collected in >> >>> this ParNew cycle. >> >>> Would it be correct to calculate the total allocation rate of the >> >>> running application (in a given period) from summing the total heap >> >>> deltas (in a given timespan)? >> >>> >> >>> I do realize that it's a "collected kilobytes" metric, but I think >> >>> it's close enough to be used as a "delayed" allocation number, >> >>> especially when looking at a timescale of 10 minutes or more. >> >>> It has the additional convenience of requiring to parse the current >> >>> gc.log line only, and not needing to correlate with the preceding >> >>> ParNew event. >> >>> >> >>> c) I take it that the difference between the two deltas (ParNew delta >> >>> and total heap delta) is effectively the promotion volume? >> >>> In the example above, this would give a promotion volume of >> >>> (345951K-40960K)-(3608692K-3323692K)=19991K >> >>> >> >>> d) When looking at -XX:+PrintTenuringDistribution output, I assume the >> >>> distribution reflects the situation *after* the enclosing ParNew event >> >>> in the log. >> >>> >> >>> Thanks in advance for any corrections, >> >>> -tt >> >>> _______________________________________________ >> >>> hotspot-gc-use mailing list >> >>> hotspot-gc-use at openjdk.java.net >> >>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> >> >> >> >> >> >> _______________________________________________ >> >> hotspot-gc-use mailing list >> >> hotspot-gc-use at openjdk.java.net >> >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> >> > >> > >> > _______________________________________________ >> > hotspot-gc-use mailing list >> > hotspot-gc-use at openjdk.java.net >> > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> > >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > From ysr1729 at gmail.com Tue Apr 24 22:09:32 2012 From: ysr1729 at gmail.com (Srinivas Ramakrishna) Date: Tue, 24 Apr 2012 22:09:32 -0700 Subject: CMS Full GC In-Reply-To: References: Message-ID: Hi Shiv -- Which version of the JDK are you on? As I said there was a temporary regression in this behaviour (i.e. expand without full gc) with CMS, which was fixed up later. Unfortunately, can't recall the CR# or the versions of that, although i can probably dig that up from the mercurial history if needed, i don't have the sources handy at the moment. More importantly, by default CMS does not collect the perm gen in a concurrent collection cycle, so you have to explicitly enable concurrent perm gen collection via -XX:+CMSClassUnloadingEnabled (and in older versions also -XX:+CMSPermGenSweepingEnabled). 
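One way to confirm which of the flags discussed in this thread a running VM actually has in effect (as opposed to what the launch scripts intend) is to ask the VM itself. The sketch below assumes the com.sun.management.HotSpotDiagnosticMXBean that Sun/Oracle JDKs ship; the class name and the particular flag list are chosen for illustration, and a flag that a given build does not recognize is simply reported as such.

import java.lang.management.ManagementFactory;
import com.sun.management.HotSpotDiagnosticMXBean;

// Hedged sketch: print the effective values of the CMS/perm gen flags
// mentioned in this thread from inside the running JVM.
public class ShowGcFlags {
    public static void main(String[] args) throws Exception {
        HotSpotDiagnosticMXBean hs = ManagementFactory.newPlatformMXBeanProxy(
                ManagementFactory.getPlatformMBeanServer(),
                "com.sun.management:type=HotSpotDiagnostic",
                HotSpotDiagnosticMXBean.class);
        String[] flags = { "CMSClassUnloadingEnabled", "CMSPermGenSweepingEnabled",
                           "PermSize", "MaxPermSize" };
        for (String f : flags) {
            try {
                System.out.println(f + " = " + hs.getVMOption(f).getValue());
            } catch (IllegalArgumentException e) {
                System.out.println(f + " is not recognized by this JVM");
            }
        }
    }
}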
If you are stuck on a version of the JVM where the perm gen expansion regression exists, you should explicitly set both -XX:PermSize and -XX:MaxPermSize to the maximum size of perm gen. (And definitely enable perm gen collection via he flags listed in the last para.) Hopefully that should get rid of these "unwanted" full collections. -- ramki On Tue, Apr 24, 2012 at 9:09 PM, Shivkumar Chelwa wrote: > ** > > Hi Ramki,**** > > ** ** > > I enabled ?jstat ?gccause? for the application instance and found > following few GC causes in the logs.**** > > ** ** > > 1. Allocation Failure ? not sure what that means**** > 2. Permanent Generation Full ? I have few doubts here.**** > 1. The MaxPermSize is set to 256m but the gc log file displays a > different size 74240K. See the following line from gc log file.**** > > 56876.963: [Full GC 56876.963: [CMS: 4181041K->3724534K(7898752K), > 77.5881180 secs] 4211397K->3724534K(8339648K), [CMS Perm : * > 73972K->73511K(74240K)],* 77.5901936 secs] [Times: user=77.47 sys=0.19, > real=77.59 secs]**** > > 1. Why should there be a ?Full GC? for permanent generation > collection?**** > 2. The permanent generation utilization is consistently over 99% > and after ?Full GC? it comes down to 60%, why it didn?t expand the > committed memory instead of doing a full gc?**** > 3. JConsole shows following stats for ?CMS Perm Gen? sizes**** > > ** i. **Used: 74,329 > kbytes **** > > ** ii. **Committed: 74,432 > kbytes **** > > ** iii. **Max: 262,144 > kbytes **** > > ** ** > > ** ** > > These are the garbage collection setting I am using for application,**** > > ** ** > > -server -d64 -javaagent:instrumentation.jar -XX:MaxPermSize=256m > -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+DisableExplicitGC > -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly > -verbose:gc -Xloggc:/logs/LB01.log -XX:+PrintGCTimeStamps > -XX:+PrintGCDetails -Xmx8192M -Xms8192M -Xss256K**** > > ** ** > > ** ** > > There are few lines from ?jstat ?gccause? output where it displays > ?Permanent Generation Full? as the gc cause. Also attaching the gc log file > and ?jstat ?gccasue? output for reference.**** > > ** ** > > S0 S1 E O P YGC YGCT FGC FGCT > GCT LGCC GCC **** > > 0.00 20.05 5.24 52.93 99.02 1625 63.648 1 0.000 63.648 > No GC Permanent Generation Full**** > > 0.00 20.05 5.24 52.93 99.02 1625 63.648 1 0.000 63.648 > No GC Permanent Generation Full**** > > 0.00 0.00 0.00 47.15 60.00 1625 63.648 1 77.588 141.236 > Permanent Generation Full No GC **** > > 0.00 0.00 41.19 47.15 60.02 1625 63.648 1 77.588 > 141.236 Permanent Generation Full No GC **** > > ** ** > > Thanks,**** > > Shiv**** > > ** ** > ------------------------------ > > *From:* Srinivas Ramakrishna [mailto:ysr1729 at gmail.com] > *Sent:* 17 April 2012 15:07 > *To:* Shivkumar Chelwa > *Cc:* **hotspot-gc-use at openjdk.java.net** > *Subject:* Re: CMS Full GC**** > > ** ** > > Is it possible that you are GC'ing here to expand perm gen. Check if > permgen footprint changed between the two JVM releases (when running yr > application). > > Now, CMS should quietly expand perm gen without doing a stop-world GC, but > there was a temporary regression in that functionality before it was fixed > again. > I can't however recall the JVM versions where the regression was > introduced and then fixed. But all of this is handwaving on my part. > If you run 6u22 and 6u29 both with -XX:+PrintHeapAtGC, you might have more > visibility into why the GC is kicking in. 
A longer log would allow > the community to perhaps provide suggestions as well. > > Which reminds me that there is a bug in the printing of GC cause (as > printed by jstat) which needs to be fixed. HotSpot/GC folk, have you > noticed that we never > see a "perm gen allocation" as the GC cause even when that's really the > reason for a full gc? (not that that should happen here where CMS is being > used.) > > -- ramki**** > > On Tue, Apr 17, 2012 at 8:08 AM, Shivkumar Chelwa > wrote:**** > > Hi,**** > > **** > > Till date I was using JRE 6u22 with following garbage collection > parameters and the CMS cycle use to kick-in appropriately (when heap > reaches 75%)**** > > **** > > **** > > -server -d64 -Xms2048m -Xmx2048m -javaagent: my-instrumentation.jar > -XX:MaxPermSize=256m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC > -XX:+DisableExplicitGC -XX:CMSInitiatingOccupancyFraction=75 > -XX:+UseCMSInitiatingOccupancyOnly -verbose:gc -Xloggc:LB01.log > -XX:+PrintGCTimeStamps -XX:+PrintGCDetails > -Djava.library=/usr/java/jre/lib/sparcv9/server/libjvm.so -Xmx8192M > -Xms8192M -Xss256K**** > > **** > > But I switched to JRE 6u29 and see the *CMS Full GC* happening randomly. > Can you please help me undercover this mystery. Here is one of the log > message from gc log file.**** > > **** > > 13475.239: [*Full GC* 13475.239: [CMS: 4321575K->3717474K(7898752K), *54.0602376 > secs*] 4412277K->3717474K(8339648K), [CMS Perm : 73791K->73339K(74048K)], > 54.0615557 secs] [Times: user=53.97 sy**** > > s=0.12, real=54.06 secs]**** > > **** > > **** > > Kindly help.**** > > **** > > **** > > Regards,**** > > Shiv**** > > **** > > **** > > **** > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use**** > > ** ** > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120424/84e221ff/attachment-0001.html From schelwa at tibco.com Tue Apr 24 22:55:57 2012 From: schelwa at tibco.com (Shivkumar Chelwa) Date: Wed, 25 Apr 2012 05:55:57 +0000 Subject: CMS Full GC In-Reply-To: Message-ID: Using JRE 6 update 29 on Solaris 10 SPARC(64 bit) Regards, Shiv schelwa at tibco.com From: Srinivas Ramakrishna [mailto:ysr1729 at gmail.com] Sent: Tuesday, April 24, 2012 10:09 PM To: Shivkumar Chelwa Cc: hotspot-gc-use at openjdk.java.net Subject: Re: CMS Full GC Hi Shiv -- Which version of the JDK are you on? As I said there was a temporary regression in this behaviour (i.e. expand without full gc) with CMS, which was fixed up later. Unfortunately, can't recall the CR# or the versions of that, although i can probably dig that up from the mercurial history if needed, i don't have the sources handy at the moment. More importantly, by default CMS does not collect the perm gen in a concurrent collection cycle, so you have to explicitly enable concurrent perm gen collection via -XX:+CMSClassUnloadingEnabled (and in older versions also -XX:+CMSPermGenSweepingEnabled). If you are stuck on a version of the JVM where the perm gen expansion regression exists, you should explicitly set both -XX:PermSize and -XX:MaxPermSize to the maximum size of perm gen. (And definitely enable perm gen collection via he flags listed in the last para.) Hopefully that should get rid of these "unwanted" full collections. -- ramki On Tue, Apr 24, 2012 at 9:09 PM, Shivkumar Chelwa > wrote: Hi Ramki, I enabled ?jstat ?gccause? 
for the application instance and found the following GC causes in the logs:

1. Allocation Failure - not sure what that means.
2. Permanent Generation Full - I have a few doubts here:
* The MaxPermSize is set to 256m, but the gc log file displays a different size, 74240K. See the following line from the gc log file:

56876.963: [Full GC 56876.963: [CMS: 4181041K->3724534K(7898752K), 77.5881180 secs] 4211397K->3724534K(8339648K), [CMS Perm : 73972K->73511K(74240K)], 77.5901936 secs] [Times: user=77.47 sys=0.19, real=77.59 secs]

* Why should there be a "Full GC" for a permanent generation collection?
* The permanent generation utilization is consistently over 99%, and after a "Full GC" it comes down to 60%. Why didn't it expand the committed memory instead of doing a full gc?
* JConsole shows the following stats for the "CMS Perm Gen" sizes:
  i.   Used:      74,329 kbytes
  ii.  Committed: 74,432 kbytes
  iii. Max:       262,144 kbytes

These are the garbage collection settings I am using for the application:

-server -d64 -javaagent:instrumentation.jar -XX:MaxPermSize=256m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+DisableExplicitGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -verbose:gc -Xloggc:/logs/LB01.log -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -Xmx8192M -Xms8192M -Xss256K

Here are a few lines from the jstat -gccause output where it displays "Permanent Generation Full" as the gc cause. Also attaching the gc log file and the jstat -gccause output for reference.

S0     S1     E      O      P      YGC   YGCT    FGC  FGCT     GCT      LGCC                       GCC
0.00  20.05   5.24  52.93  99.02   1625  63.648   1    0.000   63.648   No GC                      Permanent Generation Full
0.00  20.05   5.24  52.93  99.02   1625  63.648   1    0.000   63.648   No GC                      Permanent Generation Full
0.00   0.00   0.00  47.15  60.00   1625  63.648   1   77.588  141.236   Permanent Generation Full  No GC
0.00   0.00  41.19  47.15  60.02   1625  63.648   1   77.588  141.236   Permanent Generation Full  No GC

Thanks,
Shiv

________________________________
From: Srinivas Ramakrishna [mailto:ysr1729 at gmail.com]
Sent: 17 April 2012 15:07
To: Shivkumar Chelwa
Cc: hotspot-gc-use at openjdk.java.net
Subject: Re: CMS Full GC

Is it possible that you are GC'ing here to expand perm gen? Check if the permgen footprint changed between the two JVM releases (when running your application). Now, CMS should quietly expand perm gen without doing a stop-world GC, but there was a temporary regression in that functionality before it was fixed again. I can't however recall the JVM versions where the regression was introduced and then fixed. But all of this is handwaving on my part. If you run 6u22 and 6u29 both with -XX:+PrintHeapAtGC, you might have more visibility into why the GC is kicking in. A longer log would allow the community to perhaps provide suggestions as well.

Which reminds me that there is a bug in the printing of GC cause (as printed by jstat) which needs to be fixed. HotSpot/GC folk, have you noticed that we never see a "perm gen allocation" as the GC cause even when that's really the reason for a full gc? (not that that should happen here where CMS is being used.)
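As an aside on the jstat -gccause output shown above: if the goal is just to catch when the perm gen is driving collections, a small watcher along the following lines can run next to the application. This is a hypothetical helper sketched for illustration (the class name and the 10-second sampling interval are made up); it relies only on the standard periodic-sampling form of the jstat -gccause option already in use here.

import java.io.BufferedReader;
import java.io.InputStreamReader;

// Hypothetical watcher: tails `jstat -gccause <pid> 10000` and flags samples
// whose cause columns mention the perm gen.
public class GcCauseWatch {
    public static void main(String[] args) throws Exception {
        Process p = new ProcessBuilder("jstat", "-gccause", args[0], "10000")
                .redirectErrorStream(true).start();
        BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()));
        String line;
        while ((line = r.readLine()) != null) {
            // the cause strings span several whitespace-separated words, so a
            // simple substring check on the whole sample line is enough
            if (line.contains("Permanent Generation Full")) {
                System.out.println("perm-gen-driven collection: " + line);
            }
        }
    }
}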
-- ramki On Tue, Apr 17, 2012 at 8:08 AM, Shivkumar Chelwa > wrote: Hi, Till date I was using JRE 6u22 with following garbage collection parameters and the CMS cycle use to kick-in appropriately (when heap reaches 75%) -server -d64 -Xms2048m -Xmx2048m -javaagent: my-instrumentation.jar -XX:MaxPermSize=256m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+DisableExplicitGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -verbose:gc -Xloggc:LB01.log -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -Djava.library=/usr/java/jre/lib/sparcv9/server/libjvm.so -Xmx8192M -Xms8192M -Xss256K But I switched to JRE 6u29 and see the CMS Full GC happening randomly. Can you please help me undercover this mystery. Here is one of the log message from gc log file. 13475.239: [Full GC 13475.239: [CMS: 4321575K->3717474K(7898752K), 54.0602376 secs] 4412277K->3717474K(8339648K), [CMS Perm : 73791K->73339K(74048K)], 54.0615557 secs] [Times: user=53.97 sy s=0.12, real=54.06 secs] Kindly help. Regards, Shiv _______________________________________________ hotspot-gc-use mailing list hotspot-gc-use at openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120425/add7f67f/attachment-0001.html From ysr1729 at gmail.com Wed Apr 25 00:01:37 2012 From: ysr1729 at gmail.com (Srinivas Ramakrishna) Date: Wed, 25 Apr 2012 00:01:37 -0700 Subject: Promotion Failed when the Old Generation Usage is very low. In-Reply-To: <4F96E7AB.9AAE.00F7.0@lombardrisk.com> References: <4F96E7AB.9AAE.00F7.0@lombardrisk.com> Message-ID: Bond, you are apparently using an MP box. I'd suggest losing the "incremental" options entirely and dropping the max tenuring threshold to 8 or so. I'd make use of the size of the young gen and the survivor spaces to control promotion into the old gen, which i would size at two times your application footprint plus the size of the young gen as a starting point and refine from there. There have been some suggestions on this alias from Chi-Ho Kwok etc. on the importance of reducing promotion of very young objects into the old generation to prevent fragmentation. LOnger-lived objects typically imply (for most but not all applications) relatively stable and less non-stationary distributions which CMS block inventorying heuristics prefer. more inline below... On Tue, Apr 24, 2012 at 2:49 AM, Bond Chen wrote: > Hi , > > We're suffering high frequent promotion failed and concurrent mode > failure, cause very long GC pause(5 seconds to 1000 seconds even more some > time) attached the '1st promote failed' and '49th promotion failed' of > gc.log > > 1, The '1st promote failed' caused by the old generation usage is too > high, no enough space for promotion, but the '49th promotion failed', only > used > 2615456K out of 10387456K, what happed? > either a large object allocation or fragmentation or more likely both. > > 2, Does the CMS throwing 'Concurrent Mode Failure' combat the old > generation? move all objects together and leave only one free block? or > Only 'Full GC' does this? > concurrent mode failure results in compaction, yes. > > 3, when will 'Promotion failure' cause ''Concurrent Mode Failure' and some > time 'Full GC' ? > it's just a notional difference. Both should be called "concurrent mode failure". I think the newer mesages say "concurrent mode interrupted" and "full gc" respectively. 
In the latter case there is not an ongoing concurrent cycle that was interrupted. From the standpoint of the effect on the application (long pause for gc) and of the state of the heap after gc (fully compacted) there is little difference. For historical reasons, "concurrent mode failure" usually results in longer pauses because an ongoing concurrent collection phase first completes an ongoing phase before bailing to compaction, whereas in the latter case there is no such delay so is usually less painful. -- ramki > Regards, > Bond > > /****parameter ***/ > ### New JVM Parameter > #Below line changed per RH recommendation 15 Dec 2009 > #export RUN_ARGS=" -d64 -server -Xms2048M -Xmx12144M -XX:PermSize=512m > -XX:MaxPermSize=512m -Xss1024k " > export RUN_ARGS=" -server -d64 -Xms2048M -Xmx12144M -XX:PermSize=512m > -XX:MaxPermSize=512m -Xss1024k " > > export RUN_ARGS=" $RUN_ARGS -XX:+UseParNewGC -XX:+UseConcMarkSweepGC > -XX:+CMSParallelRemarkEnabled -XX:+UseTLAB -XX:+CMSIncrementalMode " > > #Below line commented per RH recommendation 15 Dec 2009 > #export RUN_ARGS=" $RUN_ARGS -XX:+UseCMSCompactAtFullCollection " > > #Below line changed per RH recommendation 15 Dec 2009 > #export RUN_ARGS=" $RUN_ARGS -XX:+CMSIncrementalPacing > -XX:CMSIncrementalDutyCycleMin=0 -XX:CMSIncrementalDutyCycle=10 > -XX:MaxTenuringThreshold=0 " > export RUN_ARGS=" $RUN_ARGS -XX:+CMSIncrementalPacing > -XX:CMSIncrementalDutyCycleMin=0 -XX:CMSIncrementalDutyCycle=10 > -XX:MaxTenuringThreshold=32 " > > #Below 2 lines added per RH recommendation 15 Dec 2009 > #export RUN_ARGS=" -XX:ParallelGCThreads=13 " > #export RUN_ARGS=" -XX:SurvivorRatio=48 " > > #Below 2 lines added per RH recommendation 16 Dec 2009 > RUN_ARGS=" $RUN_ARGS -XX:ParallelGCThreads=13 " > RUN_ARGS=" $RUN_ARGS -XX:SurvivorRatio=48 " > > ### set for cluster monitor added on 25-Jun-2011 > export RUN_ARGS="$RUN_ARGS -Djboss.cluster.monitor.switch=y"; > export RUN_ARGS="$RUN_ARGS -Djboss.cluster.number=2"; > > #Below line changed with RELEASE_2009_1_SP10.2 on 26 Feb 2010 > #export RUN_ARGS=" $RUN_ARGS -XX:CMSInitiatingOccupancyFraction=70 > -XX:+PrintTenuringDistribution -Xmn512m -XX:+UseLargePages > -XX:LargePageSizeInBytes=64k " > export RUN_ARGS=" $RUN_ARGS -XX:CMSInitiatingOccupancyFraction=60 > -Xmn2000m -XX:+UseLargePages -XX:LargePageSizeInBytes=64k " > > #Below line added with RELEASE_2009_1_SP10.2 on 26 Feb 2010 > export RUN_ARGS=" $RUN_ARGS -XX:+CMSClassUnloadingEnabled > -XX:+ExplicitGCInvokesConcurrent -XX:+AggressiveOpts " > > export RUN_ARGS=" $RUN_ARGS -XX:+PrintGCDetails > -XX:+PrintGCApplicationStoppedTime -Xloggc:./gc_${start_ts}.log " > > export RUN_ARGS=" $RUN_ARGS -Dsun.rmi.dgc.server.gcInterval=18000000 > -Dsun.rmi.dgc.client.gcInterval=18000000 -verbose:gc" > > > > /***parameter > > > > > > > > /** the 1st promotion failed **/ > > 169682.980: [GC Before GC: > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: 7127332 > Max Chunk Size: 6041118 > Number of Blocks: 1785 > Av. Block Size: 3992 > Tree Height: 24 > Before GC: > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: 6834133 > Max Chunk Size: 97353 > Number of Blocks: 4773 > Av. 
Block Size: 1431 > Tree Height: 27 > 169682.981: [ParNew (promotion failed): 2007040K->2007040K(2007040K), > 48.9558338 secs]169731.937: [CMS169741.903: [CMS-concurrent-sweep: > 10.823/99.414 secs] [Times: user=127.09 sys=25.97, real=99.41 secs] > (concurrent mode failure): 8681490K->2319271K(10387456K), 44.6304362 > secs] 10395485K->2319271K(12394496K), [CMS Perm : > 291584K->290856K(524288K)]After GC: > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: 1032711195 > Max Chunk Size: 1032711195 > Number of Blocks: 1 > Av. Block Size: 1032711195 > Tree Height: 1 > After GC: > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: 0 > Max Chunk Size: 0 > Number of Blocks: 0 > Tree Height: 0 > icms_dc=16 , 93.5876901 secs] [Times: user=97.28 sys=21.58, real=93.59 > secs] > > /** the 1st promotion failed **/ > > > > /** the 49th promotion failed ***/ > 298786.901: [GC Before GC: > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: 236997970 > Max Chunk Size: 236997970 > Number of Blocks: 1 > Av. Block Size: 236997970 > Tree Height: 1 > Before GC: > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: 0 > Max Chunk Size: 0 > Number of Blocks: 0 > Tree Height: 0 > 298786.902: [ParNew (promotion failed): 2007039K->2007040K(2007040K), > 4.5565939 secs]298791.458: [CMS: 2615456K->1813239K(10387456K), 19.2232319 > secs] 4346089K->1813239K(12394496K), [CMS Perm : > 299206K->299126K(524288K)]After GC: > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: 1097483360 > Max Chunk Size: 1097483360 > Number of Blocks: 1 > Av. Block Size: 1097483360 > Tree Height: 1 > After GC: > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: 0 > Max Chunk Size: 0 > Number of Blocks: 0 > Tree Height: 0 > icms_dc=0 , 23.7805042 secs] [Times: user=25.47 sys=0.02, real=23.78 secs] > Total time for which application threads were stopped: 23.7861234 seconds > /** the 49th promotion failed ***/ > > This e-mail together with any attachments (the "Message") is confidential > and may contain privileged information. If you are not the intended > recipient (or have received this e-mail in error) please notify the sender > immediately and delete this Message from your system. Any unauthorized > copying, disclosure, distribution or use of this Message is strictly > forbidden. > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120425/fd0eec46/attachment.html From Bond.Chen at lombardrisk.com Wed Apr 25 23:45:58 2012 From: Bond.Chen at lombardrisk.com (Bond Chen) Date: Thu, 26 Apr 2012 07:45:58 +0100 Subject: Promotion Failed when the Old Generation Usage is very low. In-Reply-To: References: <4F96E7AB.9AAE.00F7.0@lombardrisk.com> Message-ID: <4F995FA6.9AAE.00F7.0@lombardrisk.com> Hi Srinivas, Thanks very much for your response, very glad to have expert can talk about GC issue. 
For my question #2, your answer is that a 'concurrent mode failure' causes old generation compaction. I have attached a piece of gc log from a real production system of one of our clients. There are 4 ParNew GCs in it: the 1st one, at time 298550.966, had a promotion failure and concurrent mode failure; the 2nd and 3rd are OK; the 4th one, at 298786.902, had a promotion failure again, even though the whole old generation only used 2615456K out of 10387456K and had already been compacted by the 1st failure. This confuses me a lot.

Regards,
Bond

>>> Srinivas Ramakrishna 4/25/2012 3:01 PM >>>
Bond, you are apparently using an MP box. I'd suggest losing the "incremental" options entirely and dropping the max tenuring threshold to 8 or so. I'd make use of the size of the young gen and the survivor spaces to control promotion into the old gen, which i would size at two times your application footprint plus the size of the young gen as a starting point and refine from there. There have been some suggestions on this alias from Chi-Ho Kwok etc. on the importance of reducing promotion of very young objects into the old generation to prevent fragmentation. LOnger-lived objects typically imply (for most but not all applications) relatively stable and less non-stationary distributions which CMS block inventorying heuristics prefer.

more inline below...

On Tue, Apr 24, 2012 at 2:49 AM, Bond Chen wrote:
> Hi ,
>
> We're suffering high frequent promotion failed and concurrent mode
> failure, cause very long GC pause(5 seconds to 1000 seconds even more some
> time) attached the '1st promote failed' and '49th promotion failed' of
> gc.log
>
> 1, The '1st promote failed' caused by the old generation usage is too
> high, no enough space for promotion, but the '49th promotion failed', only
> used 2615456K out of 10387456K, what happed?
>
either a large object allocation or fragmentation or more likely both.

> 2, Does the CMS throwing 'Concurrent Mode Failure' combat the old
> generation? move all objects together and leave only one free block? or
> Only 'Full GC' does this?
>
concurrent mode failure results in compaction, yes.

> 3, when will 'Promotion failure' cause ''Concurrent Mode Failure' and some
> time 'Full GC' ?
>
it's just a notional difference. Both should be called "concurrent mode failure". I think the newer mesages say "concurrent mode interrupted" and "full gc" respectively. In the latter case there is not an ongoing concurrent cycle that was interrupted. From the standpoint of the effect on the application (long pause for gc) and of the state of the heap after gc (fully compacted) there is little difference. For historical reasons, "concurrent mode failure" usually results in longer pauses because an ongoing concurrent collection phase first completes an ongoing phase before bailing to compaction, whereas in the latter case there is no such delay so is usually less painful.
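To illustrate the fragmentation scenario described in the answer above (low overall occupancy, yet no single free block large enough for a promotion), here is a contrived sketch. It is not taken from the client's application; the sizes, counts and suggested flags are assumptions, and depending on heap sizing and tenuring settings it may or may not actually reproduce a "promotion failed". It is only meant to show how retaining interleaved chunks can leave the CMS old gen free lists full of small holes that later, larger promotions cannot use. Try it with something like -Xmx512m -Xmn64m -XX:+UseConcMarkSweepGC -XX:+PrintGCDetails.

import java.util.ArrayList;
import java.util.List;

// Contrived fragmentation sketch, not from the thread.
public class FragmentedOldGen {
    public static void main(String[] args) {
        // Phase 1: retain ~256MB of 64KB chunks; with a small young gen most
        // of them end up promoted into the CMS old gen.
        List<byte[]> retained = new ArrayList<byte[]>();
        for (int i = 0; i < 4000; i++) {
            retained.add(new byte[64 * 1024]);
        }
        // Phase 2: drop every other chunk. The old gen is now roughly half
        // empty, but (after a CMS sweep) its free space sits in isolated
        // ~64KB holes that cannot coalesce because live chunks separate them.
        for (int i = 0; i < retained.size(); i += 2) {
            retained.set(i, null);
        }
        // Phase 3: allocate and retain objects larger than any single hole;
        // if these need to be promoted, the old gen may be unable to satisfy
        // the promotion even though its overall occupancy is low.
        List<byte[]> big = new ArrayList<byte[]>();
        for (int i = 0; i < 300; i++) {
            big.add(new byte[1024 * 1024]);
        }
        System.out.println("retained=" + retained.size() + " big=" + big.size());
    }
}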
-- ramki > Regards, > Bond > > /****parameter ***/ > ### New JVM Parameter > #Below line changed per RH recommendation 15 Dec 2009 > #export RUN_ARGS=" -d64 -server -Xms2048M -Xmx12144M -XX:PermSize=512m > -XX:MaxPermSize=512m -Xss1024k " > export RUN_ARGS=" -server -d64 -Xms2048M -Xmx12144M -XX:PermSize=512m > -XX:MaxPermSize=512m -Xss1024k " > > export RUN_ARGS=" $RUN_ARGS -XX:+UseParNewGC -XX:+UseConcMarkSweepGC > -XX:+CMSParallelRemarkEnabled -XX:+UseTLAB -XX:+CMSIncrementalMode " > > #Below line commented per RH recommendation 15 Dec 2009 > #export RUN_ARGS=" $RUN_ARGS -XX:+UseCMSCompactAtFullCollection " > > #Below line changed per RH recommendation 15 Dec 2009 > #export RUN_ARGS=" $RUN_ARGS -XX:+CMSIncrementalPacing > -XX:CMSIncrementalDutyCycleMin=0 -XX:CMSIncrementalDutyCycle=10 > -XX:MaxTenuringThreshold=0 " > export RUN_ARGS=" $RUN_ARGS -XX:+CMSIncrementalPacing > -XX:CMSIncrementalDutyCycleMin=0 -XX:CMSIncrementalDutyCycle=10 > -XX:MaxTenuringThreshold=32 " > > #Below 2 lines added per RH recommendation 15 Dec 2009 > #export RUN_ARGS=" -XX:ParallelGCThreads=13 " > #export RUN_ARGS=" -XX:SurvivorRatio=48 " > > #Below 2 lines added per RH recommendation 16 Dec 2009 > RUN_ARGS=" $RUN_ARGS -XX:ParallelGCThreads=13 " > RUN_ARGS=" $RUN_ARGS -XX:SurvivorRatio=48 " > > ### set for cluster monitor added on 25-Jun-2011 > export RUN_ARGS="$RUN_ARGS -Djboss.cluster.monitor.switch=y"; > export RUN_ARGS="$RUN_ARGS -Djboss.cluster.number=2"; > > #Below line changed with RELEASE_2009_1_SP10.2 on 26 Feb 2010 > #export RUN_ARGS=" $RUN_ARGS -XX:CMSInitiatingOccupancyFraction=70 > -XX:+PrintTenuringDistribution -Xmn512m -XX:+UseLargePages > -XX:LargePageSizeInBytes=64k " > export RUN_ARGS=" $RUN_ARGS -XX:CMSInitiatingOccupancyFraction=60 > -Xmn2000m -XX:+UseLargePages -XX:LargePageSizeInBytes=64k " > > #Below line added with RELEASE_2009_1_SP10.2 on 26 Feb 2010 > export RUN_ARGS=" $RUN_ARGS -XX:+CMSClassUnloadingEnabled > -XX:+ExplicitGCInvokesConcurrent -XX:+AggressiveOpts " > > export RUN_ARGS=" $RUN_ARGS -XX:+PrintGCDetails > -XX:+PrintGCApplicationStoppedTime -Xloggc:./gc_${start_ts}.log " > > export RUN_ARGS=" $RUN_ARGS -Dsun.rmi.dgc.server.gcInterval=18000000 > -Dsun.rmi.dgc.client.gcInterval=18000000 -verbose:gc" > > > > /***parameter > > > > > > > > /** the 1st promotion failed **/ > > 169682.980: [GC Before GC: > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: 7127332 > Max Chunk Size: 6041118 > Number of Blocks: 1785 > Av. Block Size: 3992 > Tree Height: 24 > Before GC: > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: 6834133 > Max Chunk Size: 97353 > Number of Blocks: 4773 > Av. Block Size: 1431 > Tree Height: 27 > 169682.981: [ParNew (promotion failed): 2007040K->2007040K(2007040K), > 48.9558338 secs]169731.937: [CMS169741.903: [CMS-concurrent-sweep: > 10.823/99.414 secs] [Times: user=127.09 sys=25.97, real=99.41 secs] > (concurrent mode failure): 8681490K->2319271K(10387456K), 44.6304362 > secs] 10395485K->2319271K(12394496K), [CMS Perm : > 291584K->290856K(524288K)]After GC: > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: 1032711195 > Max Chunk Size: 1032711195 > Number of Blocks: 1 > Av. 
Block Size: 1032711195 > Tree Height: 1 > After GC: > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: 0 > Max Chunk Size: 0 > Number of Blocks: 0 > Tree Height: 0 > icms_dc=16 , 93.5876901 secs] [Times: user=97.28 sys=21.58, real=93.59 > secs] > > /** the 1st promotion failed **/ > > > > /** the 49th promotion failed ***/ > 298786.901: [GC Before GC: > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: 236997970 > Max Chunk Size: 236997970 > Number of Blocks: 1 > Av. Block Size: 236997970 > Tree Height: 1 > Before GC: > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: 0 > Max Chunk Size: 0 > Number of Blocks: 0 > Tree Height: 0 > 298786.902: [ParNew (promotion failed): 2007039K->2007040K(2007040K), > 4.5565939 secs]298791.458: [CMS: 2615456K->1813239K(10387456K), 19.2232319 > secs] 4346089K->1813239K(12394496K), [CMS Perm : > 299206K->299126K(524288K)]After GC: > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: 1097483360 > Max Chunk Size: 1097483360 > Number of Blocks: 1 > Av. Block Size: 1097483360 > Tree Height: 1 > After GC: > Statistics for BinaryTreeDictionary: > ------------------------------------ > Total Free Space: 0 > Max Chunk Size: 0 > Number of Blocks: 0 > Tree Height: 0 > icms_dc=0 , 23.7805042 secs] [Times: user=25.47 sys=0.02, real=23.78 secs] > Total time for which application threads were stopped: 23.7861234 seconds > /** the 49th promotion failed ***/ > > This e-mail together with any attachments (the "Message") is confidential > and may contain privileged information. If you are not the intended > recipient (or have received this e-mail in error) please notify the sender > immediately and delete this Message from your system. Any unauthorized > copying, disclosure, distribution or use of this Message is strictly > forbidden. > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > This e-mail together with any attachments (the "Message") is confidential and may contain privileged information. If you are not the intended recipient (or have received this e-mail in error) please notify the sender immediately and delete this Message from your system. Any unauthorized copying, disclosure, distribution or use of this Message is strictly forbidden. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120426/92573cf1/attachment-0001.html -------------- next part -------------- A non-text attachment was scrubbed... Name: gc_twoNeibouringPromotionFailure.log Type: application/octet-stream Size: 7272 bytes Desc: not available Url : http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120426/92573cf1/gc_twoNeibouringPromotionFailure-0001.log From ion.savin at tora.com Thu Apr 26 01:44:50 2012 From: ion.savin at tora.com (Ion Savin) Date: Thu, 26 Apr 2012 11:44:50 +0300 Subject: -Xloggc and -XX:+Max*=n values (was: Re: Two basic questions on -verbosegc and -XX:+PrintTenuringDistribution) In-Reply-To: <4F947181.50003@kippdata.de> References: <4F947181.50003@kippdata.de> Message-ID: <4F990B02.4020004@tora.com> Hi, > Have a look at -XX:+PrintHeapAtGC. This will help you get more precise > numbers. 
Is there any flag which can be used to have the max values for heap, young and perm listed in the GC log (-Xloggc)? I know there's -XX:+PrintCommandLineFlags but the output is sent to stdout not the gc log file. Regards, Ion Savin From Martin.Hare-Robertson at metaswitch.com Thu Apr 26 06:35:03 2012 From: Martin.Hare-Robertson at metaswitch.com (Martin Hare-Robertson) Date: Thu, 26 Apr 2012 13:35:03 +0000 Subject: PermGen Collection Issues Message-ID: <01E0A60827F5E5459B77A1D0FB9B524B46F3CB68@ENFICSMBX1.datcon.co.uk> Hi, I am hitting a "java.lang.OutOfMemoryError: PermGen space" in a situation where I think a great deal of the perm gen is actually eligible for collection. I am running Tomcat 6 using a 32 bit Hotspot JVM (1.6.0_07). I have had some trouble in the past with classloader leaks when I reload webapps within Tomcat. I have therefore been doing some testing to explicitly reload my webapp many times and then use heap dumps to track down any classloader leaks. I have a script for testing the reloading of my webapp which does the following: 1) Submit 5 login requests to ensure that the webapp is loaded and working. 2) Reload the webapp. 3) Wait 30 seconds to give threads from the old webapp sufficient time to terminate. 4) Goto 1) I have fixed a number of bugs and now think that my webapp isn't leaking any classloaders. However, when I run this script I find that OutOfMemoryErrors get thrown after ~30 iterations. However, the heap dump which was made at the time when the OutOfMemoryError is thrown shows that all of the old Classloaders are only weakly referenced (according to Eclipse MAT). This suggests to me that something has gone wrong with the garbage collection that it has thrown an OutOfMemoryError when there was memory which was only weakly referenced which should have been freed. I ran jstat to record the GC activity and the perm gen ends up stuck at 100% full and the last GC cause is always one of "Permanent Generation Full" and "Last ditch collection". My GC command line options are as follows: -Xms600m -Xmx600m -XX:+UseMembar -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:CMSInitiatingOccupancyFraction=80 -XX:+UseCMSInitiatingOccupancyOnly -XX:PermSize=120m -XX:MaxPermSize=120m -XX:NewSize=128m -XX:SurvivorRatio=8 -Xss120k My issue seems similar to this old bug (http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6545719) which claims to have been fixed in 6u4. When exactly is a "java.lang.OutOfMemoryError: PermGen space" error allowed to be thrown? Presumably this occurs when a thread attempts to allocate into the perm gen and the perm gen collector is unable to free up sufficient space? When the perm gen is collected would you expect all garbage to be collected or does the collector quit early? I have attached an example graph showing the perm gen occupancy which I am seeing during a test. This seems to show only a small amount of perm gen being freed at each collection. MartinHR ___ Here is the complete jstat output from my test run: -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120426/69fe168d/attachment-0001.html -------------- next part -------------- A non-text attachment was scrubbed... Name: jstat.out Type: application/octet-stream Size: 543377 bytes Desc: jstat.out Url : http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120426/69fe168d/jstat-0001.out -------------- next part -------------- A non-text attachment was scrubbed... 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120426/69fe168d/attachment-0001.html
-------------- next part --------------
A non-text attachment was scrubbed...
Name: jstat.out
Type: application/octet-stream
Size: 543377 bytes
Desc: jstat.out
Url : http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120426/69fe168d/jstat-0001.out
-------------- next part --------------
A non-text attachment was scrubbed...
Name: perm.png
Type: image/png
Size: 2338 bytes
Desc: perm.png
Url : http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120426/69fe168d/perm-0001.png

From holger.hoffstaette at googlemail.com Thu Apr 26 15:59:40 2012
From: holger.hoffstaette at googlemail.com (Holger Hoffstätte)
Date: Fri, 27 Apr 2012 00:59:40 +0200
Subject: PermGen Collection Issues
In-Reply-To: <01E0A60827F5E5459B77A1D0FB9B524B46F3CB68@ENFICSMBX1.datcon.co.uk>
References: <01E0A60827F5E5459B77A1D0FB9B524B46F3CB68@ENFICSMBX1.datcon.co.uk>
Message-ID: <4F99D35C.3000800@googlemail.com>

On 26.04.2012 15:35, Martin Hare-Robertson wrote:
> I am hitting a "java.lang.OutOfMemoryError: PermGen space" in a situation
> where I think a great deal of the perm gen is actually eligible for
> collection.

By default CMS does not collect classes; try with -XX:+CMSClassUnloadingEnabled.

> I am running Tomcat 6 using a 32 bit Hotspot JVM (1.6.0_07). I have had

Those are fairly old too. Also note that just because your app code does not leak, many (badly written) libraries keep static state or hidden internal threads alive unless they are explicitly shut down/cleaned up.

-h

From Martin.Hare-Robertson at metaswitch.com Fri Apr 27 01:22:33 2012
From: Martin.Hare-Robertson at metaswitch.com (Martin Hare-Robertson)
Date: Fri, 27 Apr 2012 08:22:33 +0000
Subject: PermGen Collection Issues
In-Reply-To: <01E0A60827F5E5459B77A1D0FB9B524B5DF833A9@ENFICSMBX1.datcon.co.uk>
References: <01E0A60827F5E5459B77A1D0FB9B524B5DF833A9@ENFICSMBX1.datcon.co.uk>
Message-ID: <01E0A60827F5E5459B77A1D0FB9B524B5DF843CF@ENFICSMBX1.datcon.co.uk>

>> I am hitting a "java.lang.OutOfMemoryError: PermGen space" in a situation
>> where I think a great deal of the perm gen is actually eligible for
>> collection.
>
> By default CMS does not collect classes; try with
> -XX:+CMSClassUnloadingEnabled.

Running with -XX:+CMSClassUnloadingEnabled improved the results as Tomcat survived 58 webapp reloads (compared to 30 without CMSClassUnloadingEnabled) before hitting the same endless (Permanent Generation Full/Last ditch collection) GC issues and throwing "OutOfMemoryError: PermGen space".

>> I am running Tomcat 6 using a 32 bit Hotspot JVM (1.6.0_07). I have had
>
> Those are fairly old too. Also note that just because your app code does
> not leak, many (badly written) libraries keep static state or hidden
> internal threads alive unless they are explicitly shut down/cleaned up.

The heap dumps which I have taken should reveal if any libraries are doing anything to keep old webapp classloaders alive. According to Eclipse MAT the only path to a GC root for the old classloaders is through weak references.

Is there anything else I could try to fix this? Is it possible that this is a JVM/GC bug?

From Martin.Hare-Robertson at metaswitch.com Fri Apr 27 06:04:15 2012
From: Martin.Hare-Robertson at metaswitch.com (Martin Hare-Robertson)
Date: Fri, 27 Apr 2012 13:04:15 +0000
Subject: Java 7 Perm Gen
Message-ID: <01E0A60827F5E5459B77A1D0FB9B524B5DF85709@ENFICSMBX1.datcon.co.uk>

Hi,

I see that one of the plans for Java 7+ is to retire the perm gen as a separate space.

Can you confirm what the progress is of this project as of Java 7u4 and when we can expect to see the complete removal of the Perm Gen? Are the former contents of the Perm Gen all being moved out into native memory?

Martin

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120427/2f7199a2/attachment.html

From jon.masamitsu at oracle.com Fri Apr 27 10:17:45 2012
From: jon.masamitsu at oracle.com (Jon Masamitsu)
Date: Fri, 27 Apr 2012 10:17:45 -0700
Subject: Java 7 Perm Gen
In-Reply-To: <01E0A60827F5E5459B77A1D0FB9B524B5DF85709@ENFICSMBX1.datcon.co.uk>
References: <01E0A60827F5E5459B77A1D0FB9B524B5DF85709@ENFICSMBX1.datcon.co.uk>
Message-ID: <4F9AD4B9.8000404@oracle.com>

The removal of the permanent generation is planned for jdk 8. It will not go into a jdk 7 update soon. Once it's stabilized, we'll consider it for a jdk 7 update.

The vast majority of the current contents of perm gen will go into native memory. There is some that may move to the Java heap.

Jon

On 4/27/2012 6:04 AM, Martin Hare-Robertson wrote:
> Hi,
>
> I see that one of the plans for Java 7+ is to retire the perm gen as a separate space.
>
> Can you confirm what the progress is of this project as of Java 7u4 and when we can expect to see the complete removal of the Perm Gen? Are the former contents of the Perm Gen all being moved out into native memory?
>
> Martin
>
>
> _______________________________________________
> hotspot-gc-use mailing list
> hotspot-gc-use at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20120427/71598783/attachment.html

From performanceguy at gmail.com Fri Apr 27 14:03:14 2012
From: performanceguy at gmail.com (John O'Brien)
Date: Fri, 27 Apr 2012 14:03:14 -0700
Subject: Question about -XX:+PrintTenuringDistribution and age not being printed.
Message-ID:

Hi everyone,

I understand that:

1) par-new has features that make it work with CMS.
2) par-scavenge does not have these features and is incompatible with CMS.
3) Otherwise they are the same core algorithm...both parallel stop the world copying collectors.

Why does PrintTenuringDistribution only print out the ages when ParNew is enabled? If they are the same algorithm then shouldn't they both print out age? par-scavenge does not print "ages" for me when PrintTenuringDistribution is on.

I use Parallel Old and Parallel Scavenge (ParNew can't be used with Parallel Old).

Ref: https://blogs.oracle.com/jonthecollector/entry/our_collectors

I searched the mailing lists and did not see anything, read some blogs and looked through some books.

Regards,
John

From jon.masamitsu at oracle.com Fri Apr 27 14:29:26 2012
From: jon.masamitsu at oracle.com (Jon Masamitsu)
Date: Fri, 27 Apr 2012 14:29:26 -0700
Subject: Question about -XX:+PrintTenuringDistribution and age not being printed.
In-Reply-To:
References:
Message-ID: <4F9B0FB6.8090803@oracle.com>

John,

On 4/27/2012 2:03 PM, John O'Brien wrote:
> Hi everyone,
>
> I understand that:
>
> 1) par-new has features that make it work with CMS.

Yes.

> 2) par-scavenge does not have these features and is incompatible with CMS.

Yes.

> 3) Otherwise they are the same core algorithm...both parallel stop the
> world copying collectors.

ParNew and Parallel Scavenge are two different implementations of parallel STW collectors. They share some code but much is different. ParallelScavenge supports UseAdaptiveSizePolicy and ParNew does not (never finished). ParallelScavenge varies the tenuring threshold to keep the survivor spaces from overflowing and also varies the sizes of the survivor spaces relative to eden, while ParNew has a fixed ratio between eden and the survivor sizes. It's hard to keep track of the differences.
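(To see the difference described above first-hand, a small allocation loop run once under each collector is enough. Everything below is an illustrative sketch: the class name, buffer sizes and iteration counts are made up, and the flags are the standard HotSpot 6/7 options already mentioned in this thread.)

// Suggested runs (heap size is arbitrary):
//   java -Xmx64m -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
//        -XX:MaxTenuringThreshold=3 -XX:+PrintTenuringDistribution TenuringDemo
//   java -Xmx64m -XX:+UseParallelGC -XX:+UseParallelOldGC
//        -XX:MaxTenuringThreshold=3 -XX:+PrintTenuringDistribution TenuringDemo
// The first run prints the per-age tables; the second does not, which is the
// behaviour being asked about in this thread.
import java.util.ArrayDeque;
import java.util.Deque;

public class TenuringDemo {
    public static void main(String[] args) {
        Deque<byte[]> window = new ArrayDeque<byte[]>();
        for (int i = 0; i < 2000000; i++) {
            window.addLast(new byte[1024]);   // freshly allocated, mostly short-lived
            if (window.size() > 2000) {       // keep a small sliding window alive so
                window.removeFirst();         // some objects age through the survivors
            }
        }
        System.out.println("kept " + window.size() + " buffers");
    }
}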
> Why does PrintTenuringDistribution only print out the ages when ParNew
> is enabled?
> If they are the same algorithm then shouldn't they both print out age?
> par-scavenge does not print "ages" for me when
> PrintTenuringDistribution is on.

Not the same algorithm.

> I use Parallel Old and Parallel Scavenge (ParNew can't be used with
> Parallel Old).

Correct.

Jon

> Ref: https://blogs.oracle.com/jonthecollector/entry/our_collectors
>
> I searched the mailing lists and did not see anything, read some
> blogs and looked through some books.
>
>
> Regards,
> John
> _______________________________________________
> hotspot-gc-use mailing list
> hotspot-gc-use at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use

From comp_ at gmx.net Fri Apr 27 14:57:30 2012
From: comp_ at gmx.net (Ion Savin)
Date: Sat, 28 Apr 2012 00:59:30 +0300
Subject: heap expanded after young gen gc?
Message-ID: <4F9B164A.5070204@gmx.net>

Hi,

From the GC log below it seems that the heap gets expanded after young gen collection also (131008K total at 0.030 got expanded to 262080K total at 0.184). I was under the impression that this happens only during Full GC (which might be triggered by the need to resize).

0.030: [GC [PSYoungGen: 128K->64K(192K)] 128K->128K(131008K), 0.0005310 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
0.184: [GC [PSYoungGen: 138K->64K(192K)] 261932K->261881K(262080K), 0.0021120 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
0.186: [Full GC [PSYoungGen: 64K->0K(192K)] [PSOldGen: 261817K->122K(130816K)] 261881K->122K(131008K) [PSPermGen: 2552K->2552K(21248K)], 0.0110530 secs] [Times: user=0.01 sys=0.01, real=0.01 secs]
0.297: [GC [PSYoungGen: 2K->32K(192K)] 261854K->261883K(262080K), 0.0013380 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]

If expansion can happen after a young gen collection, how is the -XX:MinHeapFreeRatio (and -XX:MaxHeapFreeRatio) flag interpreted, given that after a young gen GC objects might get promoted to the tenured generation, filling up the heap above the min ratio?

The heap is sized like this: -Xms128m -Xmx256m -Xmn256k

And the app is just generating collectable junk in a loop. What I would expect to happen is for the heap to fill up to 128m, a full gc to happen and, since the old gen free space is over the default 40% min heap free before expansion, no heap resize to happen.

Please advise. Thank you!

Regards,
Ion Savin

From jobrien at ieee.org Fri Apr 27 15:30:13 2012
From: jobrien at ieee.org (John O'Brien)
Date: Fri, 27 Apr 2012 15:30:13 -0700
Subject: Question about -XX:+PrintTenuringDistribution and age not being printed.
In-Reply-To: <4F9B0FB6.8090803@oracle.com>
References: <4F9B0FB6.8090803@oracle.com>
Message-ID:

Thanks for the quick response today.

To finish up: using Parallel Old, UseAdaptiveSizePolicy is true by default, and with -XX:MaxTenuringThreshold=3 I expected an incompatible parameter error. I did not get it. The logs show the tenuring threshold adjusted below and above the value of 5, e.g. "new threshold 7 (max 5)". Then I decided to switch off UseAdaptiveSizePolicy and see if the threshold would be stuck at 5, but no threshold logs were printed.

My question: Does -XX:MaxTenuringThreshold=3 work with ParallelScavenge? (Seems not).

Looks like my override is to fix the size of the survivor spaces through the use of some other flags? This will also turn off UseAdaptiveSizePolicy as it is not needed.

Regards,
John
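(For reference, the manual override John is describing would look roughly like the set of options below; the young gen size and survivor ratio are illustrative placeholders, not values recommended anywhere in this thread.)

-XX:+UseParallelGC -XX:+UseParallelOldGC
-XX:-UseAdaptiveSizePolicy
-Xmn256m -XX:SurvivorRatio=6
-XX:MaxTenuringThreshold=3

With UseAdaptiveSizePolicy switched off, eden and the survivor spaces should keep the fixed ratio given by -Xmn and -XX:SurvivorRatio, so the survivor sizes stop moving between collections; whether MaxTenuringThreshold is then honored as a fixed threshold is exactly the open question in this exchange.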
On Fri, Apr 27, 2012 at 2:29 PM, Jon Masamitsu wrote:
> John,
>
>
> On 4/27/2012 2:03 PM, John O'Brien wrote:
>> Hi everyone,
>>
>> I understand that:
>>
>> 1) par-new has features that make it work with CMS.
>
> Yes.
>
>> 2) par-scavenge does not have these features and is incompatible with CMS.
> Yes.
>> 3) Otherwise they are the same core algorithm...both parallel stop the
>> world copying collectors.
>
> ParNew and Parallel Scavenge are two different implementations of
> parallel STW collectors. They share some code but much is different.
> ParallelScavenge supports UseAdaptiveSizePolicy and ParNew does not
> (never finished). ParallelScavenge varies the tenuring threshold to keep
> the survivor spaces from overflowing and also varies the sizes of the
> survivor spaces relative to eden, while ParNew has a fixed ratio between
> eden and the survivor sizes. It's hard to keep track of the differences.
>
>> Why does PrintTenuringDistribution only print out the ages when ParNew
>> is enabled?
>> If they are the same algorithm then shouldn't they both print out age?
>> par-scavenge does not print "ages" for me when
>> PrintTenuringDistribution is on.
>
> Not the same algorithm.
>
>> I use Parallel Old and Parallel Scavenge (ParNew can't be used with
>> Parallel Old).
>
> Correct.
>
> Jon
>
>> Ref: https://blogs.oracle.com/jonthecollector/entry/our_collectors
>>
>> I searched the mailing lists and did not see anything, read some
>> blogs and looked through some books.
>>
>>
>> Regards,
>> John
>> _______________________________________________
>> hotspot-gc-use mailing list
>> hotspot-gc-use at openjdk.java.net
>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
> _______________________________________________
> hotspot-gc-use mailing list
> hotspot-gc-use at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use

From jon.masamitsu at oracle.com Fri Apr 27 18:09:52 2012
From: jon.masamitsu at oracle.com (Jon Masamitsu)
Date: Fri, 27 Apr 2012 18:09:52 -0700
Subject: Question about -XX:+PrintTenuringDistribution and age not being printed.
In-Reply-To:
References: <4F9B0FB6.8090803@oracle.com>
Message-ID: <4F9B4360.4090108@oracle.com>

On 4/27/2012 3:30 PM, John O'Brien wrote:
> Thanks for the quick response today.
>
> To finish up: using Parallel Old, UseAdaptiveSizePolicy is true by
> default, and with -XX:MaxTenuringThreshold=3 I expected an incompatible
> parameter error.

Our consistency checking is not that good but in this case MaxTenuringThreshold is used by ParallelScavenge but not in the same way as ParNew. ParallelScavenge picks a tenuring threshold that it thinks will keep the survivor spaces from overflowing. If the survivor spaces don't have much in them after a scavenge, ParallelScavenge may raise the tenuring threshold to better use the survivor spaces but not above MaxTenuringThreshold.

> I did not get it. The logs show the tenuring threshold adjusted
> below and above the value of 5, e.g. "new threshold 7 (max 5)".

I haven't seen that behavior. That's a bug.

> Then I decided to switch off UseAdaptiveSizePolicy and see if the
> threshold would be stuck at 5, but no threshold logs were printed.

The logging code is also guarded by UseAdaptiveSizePolicy.

> My question: Does -XX:MaxTenuringThreshold=3 work with
> ParallelScavenge? (Seems not).
I just looked at the code and I don't see what's wrong.

Jon

> Looks like my override is to fix the size of the survivor spaces through
> the use of some other flags? This will also turn off UseAdaptiveSizePolicy
> as it is not needed.
>
> Regards,
> John
>
>
> On Fri, Apr 27, 2012 at 2:29 PM, Jon Masamitsu wrote:
>> John,
>>
>>
>> On 4/27/2012 2:03 PM, John O'Brien wrote:
>>> Hi everyone,
>>>
>>> I understand that:
>>>
>>> 1) par-new has features that make it work with CMS.
>> Yes.
>>
>>> 2) par-scavenge does not have these features and is incompatible with CMS.
>> Yes.
>>> 3) Otherwise they are the same core algorithm...both parallel stop the
>>> world copying collectors.
>> ParNew and Parallel Scavenge are two different implementations of
>> parallel STW collectors. They share some code but much is different.
>> ParallelScavenge supports UseAdaptiveSizePolicy and ParNew does not
>> (never finished). ParallelScavenge varies the tenuring threshold to keep
>> the survivor spaces from overflowing and also varies the sizes of the
>> survivor spaces relative to eden, while ParNew has a fixed ratio between
>> eden and the survivor sizes. It's hard to keep track of the differences.
>>
>>> Why does PrintTenuringDistribution only print out the ages when ParNew
>>> is enabled?
>>> If they are the same algorithm then shouldn't they both print out age?
>>> par-scavenge does not print "ages" for me when
>>> PrintTenuringDistribution is on.
>> Not the same algorithm.
>>
>>> I use Parallel Old and Parallel Scavenge (ParNew can't be used with
>>> Parallel Old).
>> Correct.
>>
>> Jon
>>
>>> Ref: https://blogs.oracle.com/jonthecollector/entry/our_collectors
>>>
>>> I searched the mailing lists and did not see anything, read some
>>> blogs and looked through some books.
>>>
>>>
>>> Regards,
>>> John
>>> _______________________________________________
>>> hotspot-gc-use mailing list
>>> hotspot-gc-use at openjdk.java.net
>>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
>> _______________________________________________
>> hotspot-gc-use mailing list
>> hotspot-gc-use at openjdk.java.net
>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
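(One further note on exchanges like this: when it is unclear which collector and threshold values a VM actually settled on, later JDK 6 update releases and JDK 7 can print the final flag values at startup, for example:)

java -XX:+UseParallelGC -XX:+UseParallelOldGC -XX:MaxTenuringThreshold=3 -XX:+PrintFlagsFinal -version

The output lists every -XX flag together with the value the VM ended up using (MaxTenuringThreshold, UseAdaptiveSizePolicy, the survivor sizing flags and so on), which makes it easier to tell a mistyped or overridden flag from one the collector is simply ignoring.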