From y.s.ramakrishna at oracle.com  Tue Oct  5 23:06:50 2010
From: y.s.ramakrishna at oracle.com (Y. Srinivas Ramakrishna)
Date: Tue, 05 Oct 2010 23:06:50 -0700
Subject: HandlePromotionFailure flag?
Message-ID: <4CAC11FA.70203@oracle.com>


Is there anyone on this alias using 6.0 or later who explicitly turns off
the above flag in a production setting, i.e. uses -XX:-HandlePromotionFailure?
Let me know if you do so, along with the reason why you chose to do so.
I am considering removing the ability to disable this flag, at
least when using CMS (if not more broadly).

thanks!
-- ramki

From higuava at gmail.com  Thu Oct 14 10:42:47 2010
From: higuava at gmail.com (Hi Guava)
Date: Thu, 14 Oct 2010 13:42:47 -0400
Subject: Different Full GCs?
Message-ID: <AANLkTimQTqX421OyZEW0GEmCELTChr26oFK+NNnoZ4uT@mail.gmail.com>

I've seen different full GC messages and I don't quite understand them:
13011:1474.283: [Full GC 1474.283: [CMS:
3333048K->1021496K(12540352K), 6.3444520 secs]
3363388K->1021496K(12582848K), [CMS Perm : 30528K->30483K(30656K)],
6.3447880 secs]
34822:4101.808: [Full GC 4101.808: [ParNew: 667K->0K(42496K),
0.0210030 secs] 5953849K->5953209K(12582848K), 0.0211110 secs]
51586:25535.616: [Full GC 25535.616: [ParNew: 750K->0K(42496K),
0.0324350 secs] 5341677K->5340939K(12582848K), 0.0326130 secs]
63486:26306.646: [Full GC 26306.646: [CMS[Unloading class
sun.reflect.GeneratedConstructorAccessor20]

Both GC #1 and #4 are triggered by System.gc() in our code. I believe
they are the same type. There was less memory available during #4 so
it unloaded classes (soft reference?).
But full gc is the stop-the-world gc. Why does it mention CMS in the message?
GC #2 and #3 look weird to me. They were not triggered by System.gc().
They are always very short and the duration is about the same as young
generation GCs. In fact, the message is exactly like young generation
GCs except the extra word "Full". What are these short full gcs? Are
there different levels of full GCs?
I spent some time searching for an answer but I am still confused. Can
somebody help explain and suggest some reading materials?

Thanks!

The environment:
Java HotSpot(TM) 64-Bit Server VM Version 1.5.0_19-b02
Linux Version 2.6.9-89.0.20.ELsmp
amd64

-Xms12g
-Xmx12g
-XX:+PrintGC
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-XX:+UseConcMarkSweepGC

From y.s.ramakrishna at oracle.com  Thu Oct 14 11:20:20 2010
From: y.s.ramakrishna at oracle.com (Y. S. Ramakrishna)
Date: Thu, 14 Oct 2010 11:20:20 -0700
Subject: Different Full GCs?
In-Reply-To: <AANLkTimQTqX421OyZEW0GEmCELTChr26oFK+NNnoZ4uT@mail.gmail.com>
References: <AANLkTimQTqX421OyZEW0GEmCELTChr26oFK+NNnoZ4uT@mail.gmail.com>
Message-ID: <4CB749E4.2000909@oracle.com>


Hello --

On 10/14/10 10:42, Hi Guava wrote:
> I've seen different full GC messages and I don't quite understand them:
> 13011:1474.283: [Full GC 1474.283: [CMS:
> 3333048K->1021496K(12540352K), 6.3444520 secs]
> 3363388K->1021496K(12582848K), [CMS Perm : 30528K->30483K(30656K)],
> 6.3447880 secs]

This is a full gc; it collects the whole heap.

> 34822:4101.808: [Full GC 4101.808: [ParNew: 667K->0K(42496K),
> 0.0210030 secs] 5953849K->5953209K(12582848K), 0.0211110 secs]
> 51586:25535.616: [Full GC 25535.616: [ParNew: 750K->0K(42496K),
> 0.0324350 secs] 5341677K->5340939K(12582848K), 0.0326130 secs]

These two are not full gc's. They are mislabelled. They are likely
scavenges forced by the allocation policy interacting with
JNI critical sections that prevented an earlier scavenge attempt.
I think the labelling has been fixed in 6uXX.

> 63486:26306.646: [Full GC 26306.646: [CMS[Unloading class
> sun.reflect.GeneratedConstructorAccessor20]

Yes this is also a full gc.

In 6uXX, the first and the last would be labelled with an additional
"System.gc()"  label, and the two middle ones would not say "Full".
I don't have a bug id handy to point you to, but i might be able to
dig one up after some archeology.
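
The distinction drawn above (a "Full GC" record that wraps only a ParNew
collection is really a scavenge) can be checked mechanically. The following
is a minimal Python sketch, assuming the JDK 5 style -XX:+PrintGCDetails
record layout quoted in this thread. The name `classify` and the regular
expression are illustrative only and are not part of any HotSpot tool.

```python
import re

# Captures the collector tag that follows "[Full GC <timestamp>: [",
# e.g. "CMS" or "ParNew". The layout is assumed from the records in this thread.
FULL_GC_TAG = re.compile(r"\[Full GC [\d.]+: \[(\w+)")


def classify(record):
    """Return 'full', 'scavenge' or 'other' for one GC log record."""
    match = FULL_GC_TAG.search(record)
    if match is None:
        return "other"
    # A record labelled "Full GC" whose inner collector is ParNew only
    # collected the young generation, i.e. it is a mislabelled scavenge.
    return "scavenge" if match.group(1) == "ParNew" else "full"


# The four records from the original question, with wrapped lines rejoined.
# The last one is truncated in the original message and is kept that way.
records = [
    "13011:1474.283: [Full GC 1474.283: [CMS: 3333048K->1021496K(12540352K), "
    "6.3444520 secs] 3363388K->1021496K(12582848K), "
    "[CMS Perm : 30528K->30483K(30656K)], 6.3447880 secs]",
    "34822:4101.808: [Full GC 4101.808: [ParNew: 667K->0K(42496K), "
    "0.0210030 secs] 5953849K->5953209K(12582848K), 0.0211110 secs]",
    "51586:25535.616: [Full GC 25535.616: [ParNew: 750K->0K(42496K), "
    "0.0324350 secs] 5341677K->5340939K(12582848K), 0.0326130 secs]",
    "63486:26306.646: [Full GC 26306.646: [CMS[Unloading class "
    "sun.reflect.GeneratedConstructorAccessor20]",
]

labels = [classify(r) for r in records]
```

With these inputs, `labels` marks the first and last records as full
collections and the two middle ones as scavenges.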

> 
> Both GC #1 and #4 are triggered by System.gc() in our code. I believe
> they are the same type. There was less memory available during #4 so
> it unloaded classes (soft reference?).
> But full gc is the stop-the-world gc. Why does it mention CMS in the message?

You are right that the "CMS" is misleading in that sense.
The idea was that it collects the old generation which is typically
collected by CMS. I agree that the CMS label is misleading and probably
should be fixed; it's a consequence of our internal naming scheme for
generation "types".

> GC #2 and #3 look weird to me. They were not triggered by System.gc().
> They are always very short and the duration is about the same as young
> generation GCs. In fact, the message is exactly like young generation
> GCs except the extra word "Full". What are these short full gcs? Are
> there different levels of full GCs?

No, and you are right that these are just scavenges. What must have happened
is that your application probably does a few short-lived JNI critical
sections (Get{PrimitiveArray,String}Critical) which happen around the time
when another thread wants to do a large allocation which will not fit in
the current space available in Eden, so the allocator attempts to do a
scavenge, but is prevented from doing so because of the JNI critical section.
This is remembered and when the critical section is exited, a scavenge
is forced. At least that's my guess based on the messages above.
(BTW, what's the size of your Eden or Young Gen? The policy should
probably be a little smarter and not do those scavenges until an
allocation request (would) fail.)

> I spent some time searching for answer but I am still confused. Can
> somebody help explain and suggest some reading materials?

Hope that helps a bit. Try JDK 7 or 6u21 (or whatever is the latest) and
see if the confusing messages are gone. If they are still there,
let us know.

thanks.
-- ramki

> 
> Thanks!
> 
> The environment:
> Java HotSpot(TM) 64-Bit Server VM Version 1.5.0_19-b02
> Linux Version 2.6.9-89.0.20.ELsmp
> amd64
> 
> -Xms12g
> -Xmx12g
> -XX:+PrintGC
> -XX:+PrintGCDetails
> -XX:+PrintGCTimeStamps
> -XX:+UseConcMarkSweepGC
> _______________________________________________
> hotspot-gc-use mailing list
> hotspot-gc-use at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use

From adamh at basis.com  Thu Oct 14 11:31:23 2010
From: adamh at basis.com (Adam Hawthorne)
Date: Thu, 14 Oct 2010 14:31:23 -0400
Subject: CMS initial mark pauses
Message-ID: <AANLkTim-k4uEfa5+trZZoqxmOQAwxgEg9PVitERXPz-Z@mail.gmail.com>

Hi all,

I'm seeing a customer running CMS with some fairly long initial mark pauses,
especially relative to all the other pauses.  The machine is (I believe) an
Itanium 6-way running HP-UX, with HP's version of Hotspot, version 1.6.0.06.

What I'm seeing is that all the remark and young gc pauses are less than
500ms, and very consistent.  There are a lot of initial mark pauses that
also fall in this range.  That's our target, and things look really good.
The problem is that there are occasional pauses of up to 1.5s, which is
unacceptable for the customer.  Does anyone have any ideas what can cause
(seemingly random) long initial mark pauses?

Here's an example of one of the long mark pauses:

45946.930: [GC [1 CMS-initial-mark: 957254K(1598236K)] 1473999K(2151196K),
1.3505680 secs] [Times: user=1.34 sys=0.00, real=1.35 secs]

Here's an example of one of the more typical pauses:

45954.362: [GC [1 CMS-initial-mark: 963824K(1598236K)] 1001370K(2151196K),
0.1579016 secs] [Times: user=0.16 sys=0.00, real=0.16 secs]


From my understanding, initial mark pauses are supposed to be relatively
short, and usually shorter than remark pauses, but I don't have a remark
pause greater than 300ms.

Any help is appreciated.  Thanks,

Adam

--
Adam Hawthorne
Software Engineer
BASIS International Ltd.
www.basis.com
+1.505.345.5232 Phone

From y.s.ramakrishna at oracle.com  Thu Oct 14 11:56:31 2010
From: y.s.ramakrishna at oracle.com (Y. S. Ramakrishna)
Date: Thu, 14 Oct 2010 11:56:31 -0700
Subject: CMS initial mark pauses
In-Reply-To: <AANLkTim-k4uEfa5+trZZoqxmOQAwxgEg9PVitERXPz-Z@mail.gmail.com>
References: <AANLkTim-k4uEfa5+trZZoqxmOQAwxgEg9PVitERXPz-Z@mail.gmail.com>
Message-ID: <4CB7525F.9070403@oracle.com>

Hi Adam --

Do you have a fuller GC log (perhaps including
PrintCMSStatistics=2) to help make a sharper diagnosis?
It could be:-

    6412968 CMS: Long initial mark pauses

which we have unfortunately not gotten to addressing yet:
CMS initial mark work is (still) done single-threaded.
Usually there is little such work so we are usually fine,
but if survivor spaces are large and full and/or if CMS triggering
occupancy is such that CMS runs frequently then you can
be affected by long serial initial mark pauses because
the work is non-trivial. (CMS-remark and scavenges on
the other hand are done by several worker threads working
in parallel.)
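
One way to see the effect described above in the two records Adam posted is
to subtract the old generation occupancy from the whole-heap occupancy, which
leaves the young generation occupancy at the time of the initial mark. The
following is a rough Python sketch, assuming the first figure in a
CMS-initial-mark record is old gen used (capacity) and the second is total
heap used (capacity). The name `parse_initial_mark` is made up for this
example.

```python
import re

# Layout assumed from the two CMS-initial-mark records quoted in this thread.
INITIAL_MARK = re.compile(
    r"([\d.]+): \[GC \[1 CMS-initial-mark: (\d+)K\((\d+)K\)\]\s+"
    r"(\d+)K\((\d+)K\),\s+([\d.]+) secs\]"
)


def parse_initial_mark(record):
    """Extract timestamp, young gen occupancy (KB) and pause from one record."""
    m = INITIAL_MARK.search(record)
    if m is None:
        return None
    old_used_kb = int(m.group(2))
    heap_used_kb = int(m.group(4))
    return {
        "timestamp": float(m.group(1)),
        # Whatever is in use outside the old generation sits in the young gen.
        "young_used_kb": heap_used_kb - old_used_kb,
        "pause_secs": float(m.group(6)),
    }


long_mark = parse_initial_mark(
    "45946.930: [GC [1 CMS-initial-mark: 957254K(1598236K)] "
    "1473999K(2151196K), 1.3505680 secs]"
)
short_mark = parse_initial_mark(
    "45954.362: [GC [1 CMS-initial-mark: 963824K(1598236K)] "
    "1001370K(2151196K), 0.1579016 secs]"
)
```

Under that reading, the 1.35 s pause coincides with about 516745K in use in
the young generation, and the 0.16 s pause with about 37546K.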

Here's an excerpt from the "workaround" section of that bug
(reproduced here because i cannot seem to get bugs.sun.com
to display it) :-

> This is not really a viable workaround since it might lead to suboptimal
> heap configuration:
> (1) use no survivor spaces (at the risk of larger scavenge pauses, larger remark pauses,
>     even concurrent mode failures)
> (2) use a sufficiently large heap so as to be able to afford to set a
>     mark initiation threshold above the low water-mark (after a major
>     collection cycle). This will keep init-mark's riding on the coat-tails
>     of scavenges.
> *** (#1 of 2): 2006-04-13 09:53:14 PDT xxxx at oracle.com
> 
> Also, if using iCMS (Incremental CMS), drop the Incremental mode and revert to
> vanilla CMS.
> *** (#2 of 2): 2010-04-14 11:02:03 PDT xxxx at oracle.com

If you have support, you can try escalating it via your support channels
to get this addressed, especially if the workaround/retuning doesn't
do the job.

-- ramki


From higuava at gmail.com  Thu Oct 14 13:14:02 2010
From: higuava at gmail.com (Hi Guava)
Date: Thu, 14 Oct 2010 16:14:02 -0400
Subject: Different Full GCs?
In-Reply-To: <4CB749E4.2000909@oracle.com>
References: <AANLkTimQTqX421OyZEW0GEmCELTChr26oFK+NNnoZ4uT@mail.gmail.com>
	<4CB749E4.2000909@oracle.com>
Message-ID: <AANLkTimNJ0sDgjW1hQHydVwp8R9s+KfRkj-JyimGqs69@mail.gmail.com>

Thanks for your response. It helped a lot.
I think you are right about the mislabeled scavenges since I don't see
them under 6uXX. I also noticed additional (System) label in the full
gc messages in 6uXX.
You asked about the Eden or Young Gen size. I can only try to answer from
the logs, which are from our customer's production system. I think the
Young Gen size is 42496K because of "[ParNew: 750K->0K(42496K)" in the
log. I think the Eden size is 42496K / 8 * 6  = 31872K since
-XX:SurvivorRatio is not set and its default is 6.
Our application usually uses a heap size between 8G and 64G. It creates
a large number of short-lived objects in bursts. Should we use a large
Young generation size because of this? Do you have any recommendation
for Young/Tenured ratio and Eden/Survivor ratio?
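
The Eden estimate above can be written out as a small Python sketch. It
assumes that SurvivorRatio is the ratio of Eden to one survivor space, and it
takes the default of 6 from this message without verifying it for this JVM
build. The function name `eden_kb` and its parameters are hypothetical. The
result also depends on whether the 42496K printed by ParNew covers both
survivor spaces or, as appears to be the case in HotSpot's young generation
logging, Eden plus only one of them, so both readings are shown.

```python
def eden_kb(reported_kb, survivor_ratio, survivors_counted):
    """Estimate Eden size in KB.

    reported_kb       -- capacity printed in the log, e.g. 42496
    survivor_ratio    -- Eden size divided by the size of one survivor space
    survivors_counted -- how many survivor spaces reported_kb includes (1 or 2)
    """
    # reported = eden + survivors_counted * survivor, with eden = ratio * survivor
    return reported_kb * survivor_ratio / (survivor_ratio + survivors_counted)


# Reading used in the message: the figure covers Eden and both survivor spaces.
both_survivors = eden_kb(42496, 6, 2)

# Alternative reading (an assumption): the figure covers Eden plus one survivor.
one_survivor = eden_kb(42496, 6, 1)
```

The first reading reproduces the 31872K computed above. The second gives
roughly 36425K, so the estimate should be treated as approximate either way.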

Thanks.


From y.s.ramakrishna at oracle.com  Thu Oct 14 13:17:29 2010
From: y.s.ramakrishna at oracle.com (Y. S. Ramakrishna)
Date: Thu, 14 Oct 2010 13:17:29 -0700
Subject: CMS initial mark pauses
In-Reply-To: <4CB7525F.9070403@oracle.com>
References: <AANLkTim-k4uEfa5+trZZoqxmOQAwxgEg9PVitERXPz-Z@mail.gmail.com>
	<4CB7525F.9070403@oracle.com>
Message-ID: <4CB76559.6010708@oracle.com>

Just realized that this is on HPUX/Itanium on a
JVM built by HP. You'll of course need to go to HP to have that
addressed; sorry for not reading your email carefully enough to
note the platform information that you had included before
writing my response below.

-- ramki

On 10/14/10 11:56, Y. S. Ramakrishna wrote:
> Hi Adam --
> 
> Do you have a fuller GC log (perhaps including
> PrintCMSStatistics=2) to help make a sharper diagnosis?
> It could be:-
> 
>    6412968 CMS: Long initial mark pauses
> 
> which we have unfortunately not gotten to addressing yet:
> CMS initial mark work is (still) done single-threaded.
> Usually there is little such work so we are usually fine,
> but if survivor spaces are large and full and/or if CMS triggering
> occupancy is such that CMS runs frequently then you can
> be affected by long serial initial mark pauses because
> the work is non-trivial. (CMS-remark and scavenges on
> the other hand are done by several worker threads working
> in parallel.)
> 
> Here's an excerpt from the "workaround" section of that bug
> (reproduced here because i cannot seem to get bugs.sun.com
> to display it) :-
> 
>> This is not really a viable workaround since it might lead to suboptimal
>> heap configuration:
>> (1) use no survivor spaces (at the risk of larger scavenge pauses, 
>> larger remark pauses,
>>     even concurrent mode failures)
>> (2) use a sufficiently large heap so as to be able to afford to set a
>>     mark initiation threshold above the low water-mark (after a major
>>     collection cycle). This will keep init-mark's riding on the 
>> coat-tails
>>     of scavenges.
>> *** (#1 of 2): 2006-04-13 09:53:14 PDT xxxx at oracle.com
>>
>> Also, if using iCMS (Incremental CMS), drop the Incremental mode and 
>> revert to
>> vanilla CMS.
>> *** (#2 of 2): 2010-04-14 11:02:03 PDT xxxx at oracle.com
> 
> If you have support, you can try escalating it via your support channels
> to get this addressed, especially if the workaround/retuning doesn't
> do the job.
> 
> -- ramki

From y.s.ramakrishna at oracle.com  Thu Oct 14 16:00:30 2010
From: y.s.ramakrishna at oracle.com (Y. S. Ramakrishna)
Date: Thu, 14 Oct 2010 16:00:30 -0700
Subject: CMS initial mark pauses
In-Reply-To: <AANLkTi=sQGtFM4F9CePKdWvH1+qrBecMQgignNh_iy=c@mail.gmail.com>
References: <AANLkTim-k4uEfa5+trZZoqxmOQAwxgEg9PVitERXPz-Z@mail.gmail.com>
	<4CB7525F.9070403@oracle.com>
	<AANLkTi=sQGtFM4F9CePKdWvH1+qrBecMQgignNh_iy=c@mail.gmail.com>
Message-ID: <4CB78B8E.5050700@oracle.com>


Hi Adam --

...
> 
> I understood before that "initial" is not done in parallel.  I'm curious 
> - why not?

When it was first implemented CMS did all its work single-threaded
over a serial scavenger. It was incrementally parallelized over time
but because initial-mark pauses were usually not a concern (small edens,
small survivor spaces, initial mark immediately following a scavenge)
it never rose high enough in priority to parallelize. Clearly we have
reached a point where the old assumptions no longer hold and it's
time to parallelize it. Or better still to move to G1 which is fully
parallel and concurrent, and has other advantages as well.

> 
> I have CMSInitiatingOccupancyFraction=50 because I was concerned about 
> some finalization issues in our application, and I thought I remembered 
> reference processing wasn't done in young GC's.  After enabling 
> PrintReferenceGC, the logs imply  ParNewGC also clears references - is 
> that true?  If so, it may not be necessary for us to include that option 
> anyway.

Yes, scavenges do process unreachable Reference objects found in the young gen.
However, once these get into the old gen, you are right that you will need a
CMS cycle to identify them as unreachable and to process them appropriately.

> 
>     Here's an excerpt from the "workaround" section of that bug
>     (reproduced here because i cannot seem to get bugs.sun.com
>     <http://bugs.sun.com>
>     to display it) :-
> 
>         This is not really a viable workaround since it might lead to
>         suboptimal
>         heap configuration:
>         (1) use no survivor spaces (at the risk of larger scavenge
>         pauses, larger remark pauses,
>            even concurrent mode failures)
>         (2) use a sufficiently large heap so as to be able to afford to
>         set a
>            mark initiation threshold above the low water-mark (after a major
>            collection cycle). This will keep init-mark's riding on the
>         coat-tails
>            of scavenges.
>         *** (#1 of 2): 2006-04-13 09:53:14 PDT xxxx at oracle.com
>         <mailto:xxxx at oracle.com>
> 
> 
> 
> The customer's application appears to fit neatly in a 2.4G heap, and we 
> have -Xmx4g, so I believe we might be able to apply (2) here.  Is (1) 
> above required along with (2), or do these workarounds address the 
> problem independently?  I ask because (a) this customer is already 
> concerned about pause times, so I don't have a lot of room to increase 
> remark and scavenge times, and (b) I'm concerned about eliminating 
> survivor spaces since we've dealt with significant heap fragmentation in 
> the past.

Precisely. The two are actually additive, but either by itself may not
be sufficient, and as you pointed out (1) may not even be always feasible.

> 
> One other data point is that we have a large number of mostly idle 
> threads (3826 at one count), with most of the idle threads holding onto 
> approximately 2MB of object data.  I don't know if that would 
> significantly contribute to the initial mark pause, but my intuition is 
> that it would increase the time if some of that time is spent marking 
> the stack locals.

Yes, that could be, but probably less significant than a large Eden or survivor
space, given that when the CMS initial-mark pauses come immediately after
a scavenge, the pauses are much shorter, so the larger contribution is
from the large Eden. If you pour your GC logs into GCHisto, you
should probably see that the CMS initial-mark pauses increase as
the most recent scavenge becomes more distant (or you could plot that
via a spreadsheet and note that relationship).
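
For anyone who prefers a script to a spreadsheet, the relationship can be
tabulated with a few lines of Python. This is only a sketch: it assumes the
scavenge timestamps and the (timestamp, pause) pairs of the initial-mark
records have already been extracted from the log, and the function names and
the sample numbers below are made up for illustration.

```python
from bisect import bisect_right


def seconds_since_last_scavenge(scavenge_times, mark_time):
    """Gap between mark_time and the latest scavenge at or before it.

    scavenge_times must be sorted in ascending order. Returns None when no
    scavenge precedes the initial mark.
    """
    i = bisect_right(scavenge_times, mark_time)
    if i == 0:
        return None
    return mark_time - scavenge_times[i - 1]


def pair_pauses(scavenge_times, initial_marks):
    """Return (gap_secs, pause_secs) pairs sorted by gap, ready for plotting."""
    pairs = []
    for mark_time, pause in initial_marks:
        gap = seconds_since_last_scavenge(scavenge_times, mark_time)
        if gap is not None:
            pairs.append((gap, pause))
    return sorted(pairs)


# Made-up timestamps, not taken from any real log.
scavenges = [10.0, 20.0, 30.0]
marks = [(20.5, 0.15), (29.0, 1.30), (5.0, 0.20)]
pairs = pair_pauses(scavenges, marks)
```

If the diagnosis holds, the pause column should grow with the gap column.
The initial mark at 5.0 is dropped because no scavenge precedes it.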

-- ramki

> 
>  
> 
> 
> 
>         Also, if using iCMS (Incremental CMS), drop the Incremental
>         mode and revert to
>         vanilla CMS.
>         *** (#2 of 2): 2010-04-14 11:02:03 PDT xxxx at oracle.com
>         <mailto:xxxx at oracle.com>
> 
> 
>     If you have support, you can try escalating it via your support channels
>     to get this addressed, especially if the workaround/retuning doesn't
>     do the job.
> 
>     -- ramki
> 
> 
> My option seems to be to eliminate the CMSInitiatingOccupancyFraction=50 
> and keep the -Xmx4g.  Would it be prudent to set -Xms4g also?
> 
> And the log excerpt from a steady-state in the application.  The sigma 
> on pause times for young gc and remark is 17ms and 26ms - they're like 
> clockwork.  The initial mark is higher, 334ms due to the large-valued 
> outliers.
> 
> 

...

From adamh at basis.com  Fri Oct 15 11:49:44 2010
From: adamh at basis.com (Adam Hawthorne)
Date: Fri, 15 Oct 2010 14:49:44 -0400
Subject: CMS initial mark pauses
In-Reply-To: <4CB78B8E.5050700@oracle.com>
References: <AANLkTim-k4uEfa5+trZZoqxmOQAwxgEg9PVitERXPz-Z@mail.gmail.com>
	<4CB7525F.9070403@oracle.com>
	<AANLkTi=sQGtFM4F9CePKdWvH1+qrBecMQgignNh_iy=c@mail.gmail.com>
	<4CB78B8E.5050700@oracle.com>
Message-ID: <AANLkTimmzs0m6wjVYs3PfjsX_1ZySc7kcFTWYREUmUS2@mail.gmail.com>

On Thu, Oct 14, 2010 at 19:00, Y. S. Ramakrishna <y.s.ramakrishna at oracle.com> wrote:

>
> Hi Adam --
>
> ...
>
>
>> I understood before that "initial" is not done in parallel.  I'm curious -
>> why not?
>>
>
> When it was first implemented CMS did all its work single-threaded
> over a serial scavenger. It was incrementally parallelized over time
> but because initial-mark pauses were usually not a concern (small edens,
> small survivor spaces, initial mark immediately following a scavenge)
> it never rose high enough in priority to parallelize. Clearly we have
> reached a point where the old assumptions no longer hold and it's
> time to parallelize it. Or better still to move to G1 which is fully
> parallel and concurrent, and has other advantages as well.
>
>
Thanks for the history lesson!  We did mention G1 to our customer yesterday,
but I'm not yet familiar enough with its tuning knobs to be confident to
suggest it for a production system.  We've only done minimal testing
in-house, and not yet on the scale of this customer.

More generally, for ParGC and CMS, our heuristic has been to set heap size,
configure new size, and then if necessary, configure survivor spaces and
maybe some other knobs to fulfill our customer requirements.  I don't know
what the equivalent settings are for G1.  I'm curious if there's a similar
"recipe" for getting it configured and tuned.  When we tried earlier, we
didn't have much success with it.  Can anyone who's spent significant time
tuning it relate their experiences?  Is it worth trying on 2-4 core systems
with 1-4g of RAM?


>
>
>> I have CMSInitiatingOccupancyFraction=50 because I was concerned about
>> some finalization issues in our application, and I thought I remembered
>> reference processing wasn't done in young GC's.  After enabling
>> PrintReferenceGC, the logs imply  ParNewGC also clears references - is that
>> true?  If so, it may not be necessary for us to include that option anyway.
>>
>
> Yes, scavenges do process unreachable Reference objects found in the young
> gen.
> However, once these get into the old gen, you are right that you will need
> a
> CMS cycle to identify them as unreachable and to process them
> appropriately.


Thanks for the confirmation.


>        (1) use no survivor spaces (at the risk of larger scavenge
>>        pauses, larger remark pauses,
>>           even concurrent mode failures)
>>        (2) use a sufficiently large heap so as to be able to afford to
>>        set a
>>           mark initiation threshold above the low water-mark (after a
>> major
>>           collection cycle). This will keep init-mark's riding on the
>>        coat-tails
>>           of scavenges.
>>
>> The customer's application appears to fit neatly in a 2.4G heap, and we
>> have -Xmx4g, so I believe we might be able to apply (2) here.  Is (1) above
>> required along with (2), or do these workarounds address the problem
>> independently?  I ask because (a) this customer is already concerned about
>> pause times, so I don't have a lot of room to increase remark and scavenge
>> times, and (b) I'm concerned about eliminating survivor spaces since we've
>> dealt with significant heap fragmentation in the past.
>>
>
> Precisely. The two are actually additive, but either by itself may not
> be sufficient, and as you pointed out (1) may not even be always feasible.


I reduced the survivor spaces in my recommendation for today but did not
completely eliminate them, and increased the old gen size.  Unfortunately,
the customer made a mistake in the settings that disabled
-XX:+PrintGCDetails, so they failed to get new logs.  They reported that
their user experience was slightly worse, but without logs, I can't
determine whether the GC's are the problem or something else.


> One other data point is that we have a large number of mostly idle threads
>> (3826 at one count), with most of the idle threads holding onto
>> approximately 2MB of object data.  I don't know if that would significantly
>> contribute to the initial mark pause, but my intuition is that it would
>> increase the time if some of that time is spent marking the stack locals.
>>
>
> Yes, that could be, but probably less significant than a large Eden or
> survivor
> space, given that when the CMS initial-mark pauses come immediately after
> a scavenge, the pauses are much shorter, so the larger contribution is
> from the large Eden. If you pour your GC logs into GCHisto, you
> should probably see that the CMS initial-mark pauses increase as
> the most recent scavenge becomes more distant (or you could plot that
> via a spreadsheet and note that relationship).
>

Ok, I checked it in gchisto and you were exactly right.  This was
immediately obvious.

Thanks for your help again.



From adamh at basis.com  Mon Oct 18 13:49:30 2010
From: adamh at basis.com (Adam Hawthorne)
Date: Mon, 18 Oct 2010 16:49:30 -0400
Subject: CMS initial mark pauses
In-Reply-To: <4CB78B8E.5050700@oracle.com>
References: <AANLkTim-k4uEfa5+trZZoqxmOQAwxgEg9PVitERXPz-Z@mail.gmail.com>
	<4CB7525F.9070403@oracle.com>
	<AANLkTi=sQGtFM4F9CePKdWvH1+qrBecMQgignNh_iy=c@mail.gmail.com>
	<4CB78B8E.5050700@oracle.com>
Message-ID: <AANLkTimF_Qrk2RVDcosP=aWVYYjwAwCYbMnZC5pEJeuG@mail.gmail.com>

On Thu, Oct 14, 2010 at 19:00, Y. S. Ramakrishna <y.s.ramakrishna at oracle.com> wrote:

>
> Hi Adam --
>
> ...
>
>>
>> I have CMSInitiatingOccupancyFraction=50 because I was concerned about
>> some finalization issues in our application, and I thought I remembered
>> reference processing wasn't done in young GC's.  After enabling
>> PrintReferenceGC, the logs imply  ParNewGC also clears references - is that
>> true?  If so, it may not be necessary for us to include that option anyway.
>>
>
> Yes, scavenges do process unreachable Reference objects found in the young
> gen.
> However, once these get into the old gen, you are right that you will need
> a
> CMS cycle to identify them as unreachable and to process them
> appropriately.
>
>
>>    Here's an excerpt from the "workaround" section of that bug
>>    ...
>>
>  The customer's application appears to fit neatly in a 2.4G heap, and we
>> have -Xmx4g, so I believe we might be able to apply (2) here.  Is (1) above
>> required along with (2), or do these workarounds address the problem
>> independently?  I ask because (a) this customer is already concerned about
>> pause times, so I don't have a lot of room to increase remark and scavenge
>> times, and (b) I'm concerned about eliminating survivor spaces since we've
>> dealt with significant heap fragmentation in the past.
>>
>
> Precisely. The two are actually additive, but either by itself may not
> be sufficient, and as you pointed out (1) may not even be always feasible.
> ...
> -- ramki
>


Just a followup - I removed the CMSInitiatingOccupancyFraction, and I tried
to fulfill the spirit of the workaround by setting the SurvivorRatio to
significantly limit the survivor space size.  The customer mistyped one of
the logging parameters, so I wasn't able to get the logs from that day, but
the report was that performance suffered significantly.

I looked back at the logs and discovered that every initial-mark that
followed immediately after or even modestly soon after a young GC was well
within the pause time requirement.  Only those that came more than halfway
to the next young GC were long, just as Ramki predicted.
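
For anyone who wants to check the same relationship without GCHisto, here
is a minimal sketch of the calculation, assuming the scavenge timestamps
and the initial-mark timestamps have already been extracted from the GC
log.  The class name, method name and sample numbers are all hypothetical
and are not taken from the customer's logs.

```java
import java.util.Arrays;

class InitialMarkDistance {
    /**
     * Seconds between an initial-mark and the most recent scavenge at or
     * before it, or -1 if no scavenge precedes it.
     * sortedScavengeTimes must be in ascending order.
     */
    static double sinceLastScavenge(double[] sortedScavengeTimes, double initialMarkTime) {
        int idx = Arrays.binarySearch(sortedScavengeTimes, initialMarkTime);
        // When the key is absent, binarySearch returns -(insertionPoint) - 1,
        // so the preceding element sits at insertionPoint - 1 = -idx - 2.
        int prev = idx >= 0 ? idx : -idx - 2;
        return prev < 0 ? -1.0 : initialMarkTime - sortedScavengeTimes[prev];
    }

    public static void main(String[] args) {
        // Hypothetical timestamps in seconds since JVM start.
        double[] scavenges = {100.0, 110.0, 120.0};
        // Hypothetical {initial-mark timestamp, pause in seconds} pairs.
        double[][] initialMarks = {{110.5, 0.02}, {118.0, 0.30}};
        for (double[] im : initialMarks) {
            System.out.printf("since last scavenge=%.1fs pause=%.2fs%n",
                    sinceLastScavenge(scavenges, im[0]), im[1]);
        }
    }
}
```

Plotting the first column against the second should make any growth of
the pause with distance from the last scavenge visible.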

So all I did was remove the CMSInitiatingOccupancyFraction and set the heap
size to 4g, and the system was reported to be working well today.

I ran some tests with G1 today, but I'll post a separate thread about that.

Thanks for the help.

Adam

--
Adam Hawthorne
Software Engineer
BASIS International Ltd.
www.basis.com
+1.505.345.5232 Phone

From adamh at basis.com  Mon Oct 18 14:01:40 2010
From: adamh at basis.com (Adam Hawthorne)
Date: Mon, 18 Oct 2010 17:01:40 -0400
Subject: G1 performance
Message-ID: <AANLkTin4oZHvkiPT9Z5pzhGYcDqLbx=xUaZ-A5Rc_5ax@mail.gmail.com>

Hi all,

Over the weekend, we created a test to try to reproduce our pause time
issues I posted about last week so we could be more confident in our
recommendation to the customer.  While I had the machine provisioned, I ran
our test with G1.  I'm afraid the results were quite poor for our
application.  I have this machine for the next week, and I'll be trying out
different test configurations, but I'd like to continue to test G1 while I
have the machine available.  Is there more information about tuning G1?  Our
test box is 64-bit Linux, with 6u22 installed.  I tried two different
configurations.

-Xms4g -Xmx4g -XX:+UnlockExperimentalVMOptions -XX:+UseG1GC -XX:NewSize=600m
-XX:MaxGCPauseMillis=400 -XX:GCPauseIntervalMillis=3000 -XX:MaxPermSize=128m
-server -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:gc.log

WithNewSize.log<https://sites.google.com/a/basis.com/adam/files/WithNewSize.log?attredirects=0&d=1>

When that produced many long Full GC's, I tried decreasing the pause
interval and removed the NewSize setting:

-Xms4g -Xmx4g -XX:+UnlockExperimentalVMOptions -XX:+UseG1GC
-XX:MaxGCPauseMillis=400 -XX:GCPauseIntervalMillis=2000 -XX:MaxPermSize=128m
-server -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:gc.log

NoNewSize.log<https://sites.google.com/a/basis.com/adam/files/NoNewSize.log?attredirects=0&d=1>

The result was that there were a lot of Full GC's each taking about 7
seconds.  Young GC's performed well (except one of the first ones).  Do I
just need to reduce the pause interval, assuming the pause time requirement
is fixed?

In contrast, CMS was able to keep all pause times below 300ms with the same
test, with about 25% GC overhead.

I also tried various combinations of:

-XX:+G1YoungGenSize=600m -XX:+G1ParallelRSetUpdatingEnabled
-XX:+G1ParallelRSetScanningEnabled

and the JVM would not start with any of these options.  Did the names change
in a recent release?  If so, can someone send the new options?  It would
also be helpful if the following document could be updated:

http://www.oracle.com/technetwork/java/javase/tech/g1-intro-jsp-135488.html

If anyone is interested, I can run more tests with more logging, and I can
run the test again with other Java versions.

Adam

--
Adam Hawthorne
Software Engineer
BASIS International Ltd.
www.basis.com
+1.505.345.5232 Phone

From higuava at gmail.com  Mon Oct 25 17:32:48 2010
From: higuava at gmail.com (Hi Guava)
Date: Mon, 25 Oct 2010 20:32:48 -0400
Subject: Long young generation GC?
Message-ID: <AANLkTikvwa7bQDXZjToi5hsD8BEW4QdLCRq1GHTRk6-N@mail.gmail.com>

The third young generation GC took 439.2720750 secs but the user and
real time are only 0.08 seconds. What does it mean?

72667.213: [GC 72667.213: [ParNew: 38336K->4224K(38336K), 1.2473840 secs]
3443948K->3420569K(6549376K), 1.2474290 secs] [Times: user=0.64 sys=1.10,
real=1.25 secs]
72680.531: [GC 72680.532: [ParNew: 38336K->4221K(38336K), 0.2916570 secs]
3008948K->2979033K(6549376K), 0.2916710 secs] [Times: user=0.26 sys=0.03,
real=0.29 secs]
72681.425: [GC 72681.426: [ParNew: 38325K->4224K(38336K), 439.2720750
secs] 3013053K->2979055K(6549376K), 439.2720750 secs] [Times: user=0.08
sys=0.00, real=0.08 secs]

Environment:
Java HotSpot(TM) 64-Bit Server VM Version 1.6.0_20-b02
Linux Version 2.6.18-128.1.1.el5 on amd64
-Xms6400m
-Xmx6400m
-Xss256k
-XX:+UseConcMarkSweepGC
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-XX:+UseCompressedOops

From y.s.ramakrishna at oracle.com  Mon Oct 25 17:49:33 2010
From: y.s.ramakrishna at oracle.com (Y. Srinivas Ramakrishna)
Date: Mon, 25 Oct 2010 17:49:33 -0700
Subject: Long young generation GC?
In-Reply-To: <AANLkTikvwa7bQDXZjToi5hsD8BEW4QdLCRq1GHTRk6-N@mail.gmail.com>
References: <AANLkTikvwa7bQDXZjToi5hsD8BEW4QdLCRq1GHTRk6-N@mail.gmail.com>
Message-ID: <4CC6259D.5050303@oracle.com>

On 10/25/2010 5:32 PM, Hi Guava wrote:
> The third young generation GC took 439.2720750 secs but the user and
> real time are only 0.08 seconds. What does it mean?

The machine may be using NTP, and the time may have been changed?
JVM timestamps on Linux seem still to be based on TOD rather than
on TSC. Someone in the runtime team (cc'd) may have more detail on
why that might still be so.

-- ramki


>
> 72667.213: [GC 72667.213: [ParNew: 38336K->4224K(38336K), 1.2473840 secs]
> 3443948K->3420569K(6549376K), 1.2474290 secs] [Times: user=0.64 sys=1.10,
> real=1.25 secs]
> 72680.531: [GC 72680.532: [ParNew: 38336K->4221K(38336K), 0.2916570 secs]
> 3008948K->2979033K(6549376K), 0.2916710 secs] [Times: user=0.26 sys=0.03,
> real=0.29 secs]
> 72681.425: [GC 72681.426: [ParNew: 38325K->4224K(38336K), 439.2720750
> secs] 3013053K->2979055K(6549376K), 439.2720750 secs] [Times: user=0.08
> sys=0.00, real=0.08 secs]
>
> Environment:
> Java HotSpot(TM) 64-Bit Server VM Version 1.6.0_20-b02
> Linux Version 2.6.18-128.1.1.el5 on amd64
> -Xms6400m
> -Xmx6400m
> -Xss256k
> -XX:+UseConcMarkSweepGC
> -XX:+PrintGCDetails
> -XX:+PrintGCTimeStamps
> -XX:+UseCompressedOops
> _______________________________________________
> hotspot-gc-use mailing list
> hotspot-gc-use at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use


From y.s.ramakrishna at oracle.com  Mon Oct 25 17:51:57 2010
From: y.s.ramakrishna at oracle.com (Y. Srinivas Ramakrishna)
Date: Mon, 25 Oct 2010 17:51:57 -0700
Subject: Long young generation GC?
In-Reply-To: <4CC6259D.5050303@oracle.com>
References: <AANLkTikvwa7bQDXZjToi5hsD8BEW4QdLCRq1GHTRk6-N@mail.gmail.com>
	<4CC6259D.5050303@oracle.com>
Message-ID: <4CC6262D.6050608@oracle.com>

On 10/25/2010 5:49 PM, Y. Srinivas Ramakrishna wrote:
> On 10/25/2010 5:32 PM, Hi Guava wrote:
>> The third young generation GC took 439.2720750 secs but the user and
>> real time are only 0.08 seconds. What does it mean?
>
> The machine may be using NTP, and the time may have been changed?

Seems a rather large jump, so may not be NTP (which I am told uses
adjtime() to slowly accelerate the time forward or decelerate it backward),
but rather an abrupt, perhaps manual, change in TOD.
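
To illustrate the distinction at the Java level, here is a small sketch
that times the same piece of work with the time-of-day clock
(System.currentTimeMillis) and with System.nanoTime, which is intended
for measuring elapsed time.  The class and method names are hypothetical,
and this says nothing about which clock HotSpot's GC logging uses
internally; it only shows how a stepped time-of-day clock can skew a
duration while a monotonic source does not.

```java
class ClockSources {
    /** Elapsed milliseconds measured with System.nanoTime. */
    static long monotonicElapsedMillis(Runnable work) {
        long start = System.nanoTime();
        work.run();
        return (System.nanoTime() - start) / 1_000_000L;
    }

    /**
     * Elapsed milliseconds measured with the time-of-day clock. If the
     * clock is stepped while work runs, the result is inflated or negative.
     */
    static long wallClockElapsedMillis(Runnable work) {
        long start = System.currentTimeMillis();
        work.run();
        return System.currentTimeMillis() - start;
    }

    /** Sleeps for the given time, restoring the interrupt flag if interrupted. */
    static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        Runnable work = () -> sleepQuietly(200);
        System.out.println("monotonic ms:  " + monotonicElapsedMillis(work));
        System.out.println("wall-clock ms: " + wallClockElapsedMillis(work));
    }
}
```

On a machine whose clock is left alone the two numbers should agree
closely; a large difference would point at a time-of-day adjustment
during the measurement.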

Over to the experts....

> JVM timestamps on Linux seem still to be based on TOD rather than
> on TSC. Someone in the runtime team (cc'd) may have more detail on
> why that might still be so.
>
> -- ramki
>
>
>>
>> 72667.213: [GC 72667.213: [ParNew: 38336K->4224K(38336K), 1.2473840 secs]
>> 3443948K->3420569K(6549376K), 1.2474290 secs] [Times: user=0.64 sys=1.10,
>> real=1.25 secs]
>> 72680.531: [GC 72680.532: [ParNew: 38336K->4221K(38336K), 0.2916570 secs]
>> 3008948K->2979033K(6549376K), 0.2916710 secs] [Times: user=0.26 sys=0.03,
>> real=0.29 secs]
>> 72681.425: [GC 72681.426: [ParNew: 38325K->4224K(38336K), 439.2720750
>> secs] 3013053K->2979055K(6549376K), 439.2720750 secs] [Times: user=0.08
>> sys=0.00, real=0.08 secs]
>>
>> Environment:
>> Java HotSpot(TM) 64-Bit Server VM Version 1.6.0_20-b02
>> Linux Version 2.6.18-128.1.1.el5 on amd64
>> -Xms6400m
>> -Xmx6400m
>> -Xss256k
>> -XX:+UseConcMarkSweepGC
>> -XX:+PrintGCDetails
>> -XX:+PrintGCTimeStamps
>> -XX:+UseCompressedOops
>


From higuava at gmail.com  Tue Oct 26 07:07:04 2010
From: higuava at gmail.com (Hi Guava)
Date: Tue, 26 Oct 2010 10:07:04 -0400
Subject: Long young generation GC?
In-Reply-To: <4CC6262D.6050608@oracle.com>
References: <AANLkTikvwa7bQDXZjToi5hsD8BEW4QdLCRq1GHTRk6-N@mail.gmail.com>
	<4CC6259D.5050303@oracle.com> <4CC6262D.6050608@oracle.com>
Message-ID: <AANLkTi=rpk8EorzL3GL_=gP9ZUrSaoNYc=nXzhqt5e-_@mail.gmail.com>

Here is additional information about the machine running the JVM. It
is a virtual machine running in a private cloud. Could it be something
like swapping that caused the problem?

On Mon, Oct 25, 2010 at 8:51 PM, Y. Srinivas Ramakrishna
<y.s.ramakrishna at oracle.com> wrote:
> On 10/25/2010 5:49 PM, Y. Srinivas Ramakrishna wrote:
>>
>> On 10/25/2010 5:32 PM, Hi Guava wrote:
>>>
>>> The third young generation GC took 439.2720750 secs but the user and
>>> real time are only 0.08 seconds. What does it mean?
>>
>> The machine may be using NTP, and the time may have been changed?
>
> Seems a rather large jump, so may not be NTP (which i am told uses
> adjtime() to slowly accelerate the time forward or decelerate it backward),
> but rather an abrupt perhaps manual change in TOD.
>
> Over to the experts....
>
>> JVM timestamps on Linux seem still to be based on TOD rather than
>> on TSC. Someone in the runtime team (cc'd) may have more detail on
>> why that might still be so.
>>
>> -- ramki
>>
>>
>>>
>>> 72667.213: [GC 72667.213: [ParNew: 38336K->4224K(38336K), 1.2473840 secs]
>>> 3443948K->3420569K(6549376K), 1.2474290 secs] [Times: user=0.64 sys=1.10,
>>> real=1.25 secs]
>>> 72680.531: [GC 72680.532: [ParNew: 38336K->4221K(38336K), 0.2916570 secs]
>>> 3008948K->2979033K(6549376K), 0.2916710 secs] [Times: user=0.26 sys=0.03,
>>> real=0.29 secs]
>>> 72681.425: [GC 72681.426: [ParNew: 38325K->4224K(38336K), 439.2720750
>>> secs] 3013053K->2979055K(6549376K), 439.2720750 secs] [Times: user=0.08
>>> sys=0.00, real=0.08 secs]
>>>
>>> Environment:
>>> Java HotSpot(TM) 64-Bit Server VM Version 1.6.0_20-b02
>>> Linux Version 2.6.18-128.1.1.el5 on amd64
>>> -Xms6400m
>>> -Xmx6400m
>>> -Xss256k
>>> -XX:+UseConcMarkSweepGC
>>> -XX:+PrintGCDetails
>>> -XX:+PrintGCTimeStamps
>>> -XX:+UseCompressedOops
>>
>
>

From y.s.ramakrishna at oracle.com  Tue Oct 26 09:34:40 2010
From: y.s.ramakrishna at oracle.com (Y. S. Ramakrishna)
Date: Tue, 26 Oct 2010 09:34:40 -0700
Subject: Long young generation GC?
In-Reply-To: <AANLkTi=rpk8EorzL3GL_=gP9ZUrSaoNYc=nXzhqt5e-_@mail.gmail.com>
References: <AANLkTikvwa7bQDXZjToi5hsD8BEW4QdLCRq1GHTRk6-N@mail.gmail.com>	<4CC6259D.5050303@oracle.com>	<4CC6262D.6050608@oracle.com>
	<AANLkTi=rpk8EorzL3GL_=gP9ZUrSaoNYc=nXzhqt5e-_@mail.gmail.com>
Message-ID: <4CC70320.4080207@oracle.com>


On 10/26/10 07:07, Hi Guava wrote:
> Here is additional information about the machine running the JVM. It
> is a virtual machine running in a private cloud. Could it be something
> like swapping that caused problem?

Not swapping, but perhaps the management of "time" in a virtualized
setting (by that i mean that there may be interactions between the
host/hypervisor and the guest OS that could cause the JVM to observe
time jumps of this sort)? I'd suggest gathering more data on its
reproducibility (or otherwise) in both a VM and non-VM setting.

Over to the time experts in the runtime team who may have encountered
issues in VM settings previously. (I have heard of occasional such reports in
virtual settings before but don't know if any of these were definitively chased
down.) You might also want to check with the VM provider to see if they
might know of such issues.

-- ramki


> 
> On Mon, Oct 25, 2010 at 8:51 PM, Y. Srinivas Ramakrishna
> <y.s.ramakrishna at oracle.com> wrote:
>> On 10/25/2010 5:49 PM, Y. Srinivas Ramakrishna wrote:
>>> On 10/25/2010 5:32 PM, Hi Guava wrote:
>>>> The third young generation GC took 439.2720750 secs but the user and
>>>> real time are only 0.08 seconds. What does it mean?
>>> The machine may be using NTP, and the time may have been changed?
>> Seems a rather large jump, so may not be NTP (which i am told uses
>> adjtime() to slowly accelerate the time forward or decelerate it backward),
>> but rather an abrupt perhaps manual change in TOD.
>>
>> Over to the experts....
>>
>>> JVM timestamps on Linux seem still to be based on TOD rather than
>>> on TSC. Someone in the runtime team (cc'd) may have more detail on
>>> why that might still be so.
>>>
>>> -- ramki
>>>
>>>
>>>> 72667.213: [GC 72667.213: [ParNew: 38336K->4224K(38336K), 1.2473840 secs]
>>>> 3443948K->3420569K(6549376K), 1.2474290 secs] [Times: user=0.64 sys=1.10,
>>>> real=1.25 secs]
>>>> 72680.531: [GC 72680.532: [ParNew: 38336K->4221K(38336K), 0.2916570 secs]
>>>> 3008948K->2979033K(6549376K), 0.2916710 secs] [Times: user=0.26 sys=0.03,
>>>> real=0.29 secs]
>>>> 72681.425: [GC 72681.426: [ParNew: 38325K->4224K(38336K), 439.2720750
>>>> secs] 3013053K->2979055K(6549376K), 439.2720750 secs] [Times: user=0.08
>>>> sys=0.00, real=0.08 secs]
>>>>
>>>> Environment:
>>>> Java HotSpot(TM) 64-Bit Server VM Version 1.6.0_20-b02
>>>> Linux Version 2.6.18-128.1.1.el5 on amd64
>>>> -Xms6400m
>>>> -Xmx6400m
>>>> -Xss256k
>>>> -XX:+UseConcMarkSweepGC
>>>> -XX:+PrintGCDetails
>>>> -XX:+PrintGCTimeStamps
>>>> -XX:+UseCompressedOops
>>

From higuava at gmail.com  Tue Oct 26 10:49:53 2010
From: higuava at gmail.com (Hi Guava)
Date: Tue, 26 Oct 2010 13:49:53 -0400
Subject: Long young generation GC?
In-Reply-To: <4CC70320.4080207@oracle.com>
References: <AANLkTikvwa7bQDXZjToi5hsD8BEW4QdLCRq1GHTRk6-N@mail.gmail.com>
	<4CC6259D.5050303@oracle.com> <4CC6262D.6050608@oracle.com>
	<AANLkTi=rpk8EorzL3GL_=gP9ZUrSaoNYc=nXzhqt5e-_@mail.gmail.com>
	<4CC70320.4080207@oracle.com>
Message-ID: <AANLkTikfToCQu2AmojinM-fXXhE_bmX-7peamR5GBrgD@mail.gmail.com>

I now believe that this phenomenon is caused by the virtual machine.
It has nothing to do with the garbage collector or JVM. I searched in
the old logs and found this in all 3 old logs that I have. There are
multiple virtual machines configured the same way. This problem only
shows up in one of the virtual machines.
By the way, the 439-second GC is not a perception problem. It is
real. The users reported a stuck process, and they found that the CPUs
of the virtual machine were racing during that period.
Can I understand this discrepancy this way? The user, sys and real
times are measured in CPU cycles. They are short, as they are supposed
to be. The 439.2720750 time is the elapsed time. Since the virtual
machine was doing something else or not functioning correctly, the GC
took 439 seconds even though there was only 0.08 seconds of CPU time.

72681.425: [GC 72681.426: [ParNew: 38325K->4224K(38336K), 439.2720750
secs] 3013053K->2979055K(6549376K), 439.2720750 secs] [Times:
user=0.08 sys=0.00, real=0.08 secs]
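
To scan old logs for more entries like this one, a minimal sketch along
the following lines could help, assuming each entry has first been joined
back onto a single line (the mail client wrapped the one above).  The
class name, the regular expression and the 1-second cutoff are my own
assumptions, written against the entries quoted in this thread rather
than against any documented log grammar.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GcTimeDiscrepancy {
    // Captures the overall pause (group 1) and the user/sys/real values
    // (groups 2-4) from the tail of a -XX:+PrintGCDetails entry.
    private static final Pattern ENTRY = Pattern.compile(
            ", ([0-9.]+) secs\\] \\[Times: user=([0-9.]+) sys=([0-9.]+), real=([0-9.]+) secs\\]");

    /** Reported pause minus "real", in seconds, or NaN if the line does not match. */
    static double pauseMinusReal(String entry) {
        Matcher m = ENTRY.matcher(entry);
        if (!m.find()) {
            return Double.NaN;
        }
        return Double.parseDouble(m.group(1)) - Double.parseDouble(m.group(4));
    }

    public static void main(String[] args) {
        // The third entry from this thread, with the line wrapping removed.
        String entry = "72681.425: [GC 72681.426: [ParNew: 38325K->4224K(38336K), 439.2720750 secs] "
                + "3013053K->2979055K(6549376K), 439.2720750 secs] "
                + "[Times: user=0.08 sys=0.00, real=0.08 secs]";
        double gap = pauseMinusReal(entry);
        double threshold = 1.0; // hypothetical cutoff in seconds
        System.out.println(gap > threshold
                ? "suspicious entry, pause exceeds real by " + gap + " s"
                : "consistent entry");
    }
}
```

Entries where the two values differ by more than rounding error are the
ones worth lining up against what the users saw.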


On Tue, Oct 26, 2010 at 12:34 PM, Y. S. Ramakrishna
<y.s.ramakrishna at oracle.com> wrote:
>
>
> On 10/26/10 07:07, Hi Guava wrote:
>>
>> Here is additional information about the machine running the JVM. It
>> is a virtual machine running in a private cloud. Could it be something
>> like swapping that caused problem?
>
> Not swapping, but perhaps the management of "time" perhaps in a virtualized
> setting (by that i mean that there may be interactions between the
> host/hypervisor and the guest OS that could cause the JVM to observe
> time jumps of this sort)? I'd suggest gathering more data on its
> reproducibility (or otherwise) in both a VM and non-VM setting.
>
> Over to the time experts in the runtime team who may have encountered
> issues in VM settings previously. (I have heard of occasional such reports
> in
> virtual settings before but don't know if any of these were definitively
> chased
> down.) You might also want to check with the VM provider to see if they
> might know of such issues.
>
> -- ramki
>
>
>>
>> On Mon, Oct 25, 2010 at 8:51 PM, Y. Srinivas Ramakrishna
>> <y.s.ramakrishna at oracle.com> wrote:
>>>
>>> On 10/25/2010 5:49 PM, Y. Srinivas Ramakrishna wrote:
>>>>
>>>> On 10/25/2010 5:32 PM, Hi Guava wrote:
>>>>>
>>>>> The third young generation GC took 439.2720750 secs but the user and
>>>>> real time are only 0.08 seconds. What does it mean?
>>>>
>>>> The machine may be using NTP, and the time may have been changed?
>>>
>>> Seems a rather large jump, so may not be NTP (which i am told uses
>>> adjtime() to slowly accelerate the time forward or decelerate it
>>> backward),
>>> but rather an abrupt perhaps manual change in TOD.
>>>
>>> Over to the experts....
>>>
>>>> JVM timestamps on Linux seem still to be based on TOD rather than
>>>> on TSC. Someone in the runtime team (cc'd) may have more detail on
>>>> why that might still be so.
>>>>
>>>> -- ramki
>>>>
>>>>
>>>>> 72667.213: [GC 72667.213: [ParNew: 38336K->4224K(38336K), 1.2473840
>>>>> secs]
>>>>> 3443948K->3420569K(6549376K), 1.2474290 secs] [Times: user=0.64
>>>>> sys=1.10,
>>>>> real=1.25 secs]
>>>>> 72680.531: [GC 72680.532: [ParNew: 38336K->4221K(38336K), 0.2916570
>>>>> secs]
>>>>> 3008948K->2979033K(6549376K), 0.2916710 secs] [Times: user=0.26
>>>>> sys=0.03,
>>>>> real=0.29 secs]
>>>>> 72681.425: [GC 72681.426: [ParNew: 38325K->4224K(38336K), 439.2720750
>>>>> secs] 3013053K->2979055K(6549376K), 439.2720750 secs] [Times: user=0.08
>>>>> sys=0.00, real=0.08 secs]
>>>>>
>>>>> Environment:
>>>>> Java HotSpot(TM) 64-Bit Server VM Version 1.6.0_20-b02
>>>>> Linux Version 2.6.18-128.1.1.el5 on amd64
>>>>> -Xms6400m
>>>>> -Xmx6400m
>>>>> -Xss256k
>>>>> -XX:+UseConcMarkSweepGC
>>>>> -XX:+PrintGCDetails
>>>>> -XX:+PrintGCTimeStamps
>>>>> -XX:+UseCompressedOops
>>>
>

From y.s.ramakrishna at oracle.com  Tue Oct 26 10:58:44 2010
From: y.s.ramakrishna at oracle.com (Y. S. Ramakrishna)
Date: Tue, 26 Oct 2010 10:58:44 -0700
Subject: Long young generation GC?
In-Reply-To: <AANLkTikfToCQu2AmojinM-fXXhE_bmX-7peamR5GBrgD@mail.gmail.com>
References: <AANLkTikvwa7bQDXZjToi5hsD8BEW4QdLCRq1GHTRk6-N@mail.gmail.com>	<4CC6259D.5050303@oracle.com>	<4CC6262D.6050608@oracle.com>	<AANLkTi=rpk8EorzL3GL_=gP9ZUrSaoNYc=nXzhqt5e-_@mail.gmail.com>	<4CC70320.4080207@oracle.com>
	<AANLkTikfToCQu2AmojinM-fXXhE_bmX-7peamR5GBrgD@mail.gmail.com>
Message-ID: <4CC716D4.2030005@oracle.com>

"real" is elapsed time too, obtained from the OS via times(2).
So if it's reported so small when users see much more time elapse physically,
it must be the case that it's a bug in times(2) in a virtual setting.
Perhaps if you can boil this down to a small and reproducible test case
you can file a bug with the VM provider and with the JVM as well, the
latter perhaps a shadow of the former.
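
As a possible starting point for such a test case, here is a rough sketch
of a heartbeat probe.  It does not call times(2) itself, which plain Java
cannot do; it is only a Java-level proxy that sleeps for a fixed interval
and reports when the measured elapsed time overshoots the sleep by a lot,
or when the time-of-day clock and System.nanoTime disagree.  The class
name, interval, iteration count and threshold are hypothetical.

```java
class StallProbe {
    /** Milliseconds by which the observed elapsed time exceeded the requested sleep; never negative. */
    static long overshootMillis(long requestedSleepMillis, long elapsedNanos) {
        long elapsedMillis = elapsedNanos / 1_000_000L;
        return Math.max(0L, elapsedMillis - requestedSleepMillis);
    }

    public static void main(String[] args) throws InterruptedException {
        final long sleepMillis = 100;        // hypothetical heartbeat interval
        final long reportAboveMillis = 1000; // hypothetical reporting threshold
        for (int i = 0; i < 50; i++) {
            long wallStart = System.currentTimeMillis();
            long monoStart = System.nanoTime();
            Thread.sleep(sleepMillis);
            long monoNanos = System.nanoTime() - monoStart;
            long wallMillis = System.currentTimeMillis() - wallStart;

            long overshoot = overshootMillis(sleepMillis, monoNanos);
            long clockGap = Math.abs(wallMillis - monoNanos / 1_000_000L);
            if (overshoot > reportAboveMillis || clockGap > reportAboveMillis) {
                System.out.println("iteration " + i + ": overshoot=" + overshoot
                        + " ms, clock disagreement=" + clockGap + " ms");
            }
        }
    }
}
```

Running the same probe in the affected VM, in one of the identically
configured VMs that does not show the problem, and on bare metal would
give the kind of comparison suggested earlier in this thread.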

Over to the runtime team.
-- ramki

On 10/26/10 10:49, Hi Guava wrote:
> I now believe that this phenomenon is caused by the virtual machine.
> It has nothing to do with the garbage collector or JVM. I searched in
> the old logs and found this in all 3 old logs that I have. There are
> multiple virtual machines configured the same way. This problem only
> shows up in one of the virtual machines.
> By the way, the 439 seconds GC is not a perception problem. It is
> real. The users reported stuck process and they found the CPUs of the
> virtual machine was racing during that period.
> Can I understand this discrepancy this way? the user, sys and real
> times are measured in cpu cycles. They are short as they are supposed
> to be. The 439.2720750 time is the elapsed time. Since the virtual
> machine is doing something else or not functioning correctly, GC took
> 439 seconds even though there was only 0.08 seconds of cpu time.
> 
> 72681.425: [GC 72681.426: [ParNew: 38325K->4224K(38336K), 439.2720750
> secs] 3013053K->2979055K(6549376K), 439.2720750 secs] [Times:
> user=0.08 sys=0.00, real=0.08 secs]
> 
> 
> On Tue, Oct 26, 2010 at 12:34 PM, Y. S. Ramakrishna
> <y.s.ramakrishna at oracle.com> wrote:
>>
>> On 10/26/10 07:07, Hi Guava wrote:
>>> Here is additional information about the machine running the JVM. It
>>> is a virtual machine running in a private cloud. Could it be something
>>> like swapping that caused problem?
>> Not swapping, but perhaps the management of "time" perhaps in a virtualized
>> setting (by that i mean that there may be interactions between the
>> host/hypervisor and the guest OS that could cause the JVM to observe
>> time jumps of this sort)? I'd suggest gathering more data on its
>> reproducibility (or otherwise) in both a VM and non-VM setting.
>>
>> Over to the time experts in the runtime team who may have encountered
>> issues in VM settings previously. (I have heard of occasional such reports
>> in
>> virtual settings before but don't know if any of these were definitively
>> chased
>> down.) You might also want to check with the VM provider to see if they
>> might know of such issues.
>>
>> -- ramki
>>
>>
>>> On Mon, Oct 25, 2010 at 8:51 PM, Y. Srinivas Ramakrishna
>>> <y.s.ramakrishna at oracle.com> wrote:
>>>> On 10/25/2010 5:49 PM, Y. Srinivas Ramakrishna wrote:
>>>>> On 10/25/2010 5:32 PM, Hi Guava wrote:
>>>>>> The third young generation GC took 439.2720750 secs but the user and
>>>>>> real time are only 0.08 seconds. What does it mean?
>>>>> The machine may be using NTP, and the time may have been changed?
>>>> Seems a rather large jump, so may not be NTP (which i am told uses
>>>> adjtime() to slowly accelerate the time forward or decelerate it
>>>> backward),
>>>> but rather an abrupt perhaps manual change in TOD.
>>>>
>>>> Over to the experts....
>>>>
>>>>> JVM timestamps on Linux seem still to be based on TOD rather than
>>>>> on TSC. Someone in the runtime team (cc'd) may have more detail on
>>>>> why that might still be so.
>>>>>
>>>>> -- ramki
>>>>>
>>>>>
>>>>>> 72667.213: [GC 72667.213: [ParNew: 38336K->4224K(38336K), 1.2473840
>>>>>> secs]
>>>>>> 3443948K->3420569K(6549376K), 1.2474290 secs] [Times: user=0.64
>>>>>> sys=1.10,
>>>>>> real=1.25 secs]
>>>>>> 72680.531: [GC 72680.532: [ParNew: 38336K->4221K(38336K), 0.2916570
>>>>>> secs]
>>>>>> 3008948K->2979033K(6549376K), 0.2916710 secs] [Times: user=0.26
>>>>>> sys=0.03,
>>>>>> real=0.29 secs]
>>>>>> 72681.425: [GC 72681.426: [ParNew: 38325K->4224K(38336K), 439.2720750
>>>>>> secs] 3013053K->2979055K(6549376K), 439.2720750 secs] [Times: user=0.08
>>>>>> sys=0.00, real=0.08 secs]
>>>>>>
>>>>>> Environment:
>>>>>> Java HotSpot(TM) 64-Bit Server VM Version 1.6.0_20-b02
>>>>>> Linux Version 2.6.18-128.1.1.el5 on amd64
>>>>>> -Xms6400m
>>>>>> -Xmx6400m
>>>>>> -Xss256k
>>>>>> -XX:+UseConcMarkSweepGC
>>>>>> -XX:+PrintGCDetails
>>>>>> -XX:+PrintGCTimeStamps
>>>>>> -XX:+UseCompressedOops

From Dori.Rabin at Starhome.com  Wed Oct 27 05:05:03 2010
From: Dori.Rabin at Starhome.com (Rabin Dori)
Date: Wed, 27 Oct 2010 14:05:03 +0200
Subject: i would like to post to this list
Message-ID: <983CFBCFF00E9A498F2703DBD7155DC7295D82561B@ISR-IT-EX-01.starhome.local>

My email is : dori.rabin at starhome.com


From anthony.warden at nomura.com  Wed Oct 27 05:21:04 2010
From: anthony.warden at nomura.com (anthony.warden at nomura.com)
Date: Wed, 27 Oct 2010 13:21:04 +0100
Subject: i would like to post to this list
In-Reply-To: <983CFBCFF00E9A498F2703DBD7155DC7295D82561B@ISR-IT-EX-01.starhome.local>
References: <983CFBCFF00E9A498F2703DBD7155DC7295D82561B@ISR-IT-EX-01.starhome.local>
Message-ID: <2E97E78D7F99D64DA5108FE2E9F0E8280CAB6996@LONEV3201.EUROPE.NOM>

I think you just did!

 
From: hotspot-gc-use-bounces at openjdk.java.net
[mailto:hotspot-gc-use-bounces at openjdk.java.net] On Behalf Of Rabin Dori
Sent: 27 October 2010 13:05
To: hotspot-gc-use at openjdk.java.net
Subject: i would like to post to this list

 
My email is : dori.rabin at starhome.com

 
