From kinnari.darji at citi.com  Wed Aug  3 10:45:43 2011
From: kinnari.darji at citi.com (Darji, Kinnari )
Date: Wed, 3 Aug 2011 13:45:43 -0400
Subject: understanding GC logs
Message-ID: <21ED8E3420CDB647B88C7F80A7D64DAC0691E04E82@exnjmb89.nam.nsroot.net>


Hello GC team,
What do all these different times mean? Can someone please clarify?
Which of them is the time for which the application is stopped?

[GC 9768.668: [ParNew
Desired survivor size 10878976 bytes, new threshold 4 (max 4)
- age   1:     594288 bytes,     594288 total
- age   2:    2369912 bytes,    2964200 total
- age   3:    2877584 bytes,    5841784 total
- age   4:    3075264 bytes,    8917048 total
: 182066K->12384K(191744K), 0.0089120 secs] 2755986K->2586303K(10710272K), 0.0092180 secs] [Times: user=0.09 sys=0.00, real=0.01 secs]


Thank you
Kinnari

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20110803/d25a746e/attachment.html 

From y.s.ramakrishna at oracle.com  Wed Aug  3 11:07:52 2011
From: y.s.ramakrishna at oracle.com (Ramki Ramakrishna)
Date: Wed, 03 Aug 2011 11:07:52 -0700
Subject: understanding GC logs
In-Reply-To: <21ED8E3420CDB647B88C7F80A7D64DAC0691E04E82@exnjmb89.nam.nsroot.net>
References: <21ED8E3420CDB647B88C7F80A7D64DAC0691E04E82@exnjmb89.nam.nsroot.net>
Message-ID: <4E398E78.3000408@oracle.com>


On 8/3/2011 10:45 AM, Darji, Kinnari wrote:
>
> Hello GC team,
>
> What does this all different time mean? Can someone please clarify?
>
> What is the time application when application stops?
>
> [GC 9768.668: [ParNew
      ^^^^^^^^ JVM timestamp (seconds since start of JVM) at start of GC operation
>
> Desired survivor size 10878976 bytes, new threshold 4 (max 4)
> - age   1:     594288 bytes,     594288 total
> - age   2:    2369912 bytes,    2964200 total
> - age   3:    2877584 bytes,    5841784 total
> - age   4:    3075264 bytes,    8917048 total
> : 182066K->12384K(191744K), 0.0089120 secs]
                              ^^^^^^^^^ Duration of scavenge
> 2755986K->2586303K(10710272K), 0.0092180 secs]
                                 ^^^^^^^^^ Duration of whole GC operation (includes scavenge)
>
> [Times: user=0.09 sys=0.00, real=0.01 secs]
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  Process virtual user and system times, and real (elapsed) time during GC operation.

The time for which the application threads were stopped is about 9.2 ms.
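
As a rough illustration of how the fields annotated above can be pulled out of such a record, here is a minimal Python sketch. The function name parse_parnew_tail and the regular expressions are illustrative assumptions, not part of any JVM tooling, and they only target the exact line shape shown in this thread.

```python
import re

# Final line of the ParNew record from this thread (log-file line number removed).
RECORD = (": 182066K->12384K(191744K), 0.0089120 secs] "
          "2755986K->2586303K(10710272K), 0.0092180 secs] "
          "[Times: user=0.09 sys=0.00, real=0.01 secs]")

# ", <seconds> secs]" closes one bracketed phase. The [Times: ...] part does
# not match because its number follows "real=" rather than ", ".
PHASE = re.compile(r", (\d+\.\d+) secs\]")
TIMES = re.compile(r"user=(\d+\.\d+) sys=(\d+\.\d+), real=(\d+\.\d+) secs")


def parse_parnew_tail(record):
    """Return the scavenge and whole-GC durations plus the Times triple."""
    phases = [float(s) for s in PHASE.findall(record)]
    user, sys_, real = (float(s) for s in TIMES.search(record).groups())
    return {
        "scavenge_ms": phases[0] * 1000.0,   # first duration: the scavenge
        "whole_gc_ms": phases[1] * 1000.0,   # second duration: the whole GC operation
        "user_s": user,
        "sys_s": sys_,
        "real_s": real,
    }


result = parse_parnew_tail(RECORD)
print("scavenge %.3f ms, whole GC %.3f ms"
      % (result["scavenge_ms"], result["whole_gc_ms"]))
```

The whole-GC figure is the one that corresponds to the roughly 9.2 ms pause mentioned above.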

-- ramki

From y.s.ramakrishna at oracle.com  Wed Aug  3 11:36:17 2011
From: y.s.ramakrishna at oracle.com (Ramki Ramakrishna)
Date: Wed, 03 Aug 2011 11:36:17 -0700
Subject: understanding GC logs
In-Reply-To: <21ED8E3420CDB647B88C7F80A7D64DAC0691E04F84@exnjmb89.nam.nsroot.net>
References: <21ED8E3420CDB647B88C7F80A7D64DAC0691E04E82@exnjmb89.nam.nsroot.net>
	<4E398E78.3000408@oracle.com>
	<21ED8E3420CDB647B88C7F80A7D64DAC0691E04F84@exnjmb89.nam.nsroot.net>
Message-ID: <4E399521.2080805@oracle.com>


On 8/3/2011 11:18 AM, Darji, Kinnari wrote:
>
> Thanks Ramki
>
> So If I look at logs starting [GC and real times, that should be 
> almost application STW time. Am I correct?
>

Yes, except that the real time in that display has a resolution of only
10 ms.
(Thus the 9.2 ms showed up as 0.01 s in the [Times: ...] line, I think.)

But yes, that's the STW time.

One caveat though -- this only lists STW ops attributed to GC.
More generally, you would want to use -XX:+PrintSafepointStatistics to
see all STW operations (and details thereof), including of course the
GC ops (which are usually the most common type of STW op, but by
no means the only type).

-- ramki

> Thank you
>
> Kinnari

From kinnari.darji at citi.com  Wed Aug  3 11:43:57 2011
From: kinnari.darji at citi.com (Darji, Kinnari )
Date: Wed, 3 Aug 2011 14:43:57 -0400
Subject: understanding GC logs
In-Reply-To: <4E399521.2080805@oracle.com>
References: <21ED8E3420CDB647B88C7F80A7D64DAC0691E04E82@exnjmb89.nam.nsroot.net>
	<4E398E78.3000408@oracle.com>
	<21ED8E3420CDB647B88C7F80A7D64DAC0691E04F84@exnjmb89.nam.nsroot.net>
	<4E399521.2080805@oracle.com>
Message-ID: <21ED8E3420CDB647B88C7F80A7D64DAC0691EC735C@exnjmb89.nam.nsroot.net>

Oh, understood.
Thanks, I will start using the +PrintSafepointStatistics option.

Thank you
Kinnari


From kinnari.darji at citi.com  Wed Aug  3 14:40:13 2011
From: kinnari.darji at citi.com (Darji, Kinnari )
Date: Wed, 3 Aug 2011 17:40:13 -0400
Subject: understanding GC logs
In-Reply-To: <4E399521.2080805@oracle.com>
References: <21ED8E3420CDB647B88C7F80A7D64DAC0691E04E82@exnjmb89.nam.nsroot.net>
	<4E398E78.3000408@oracle.com>
	<21ED8E3420CDB647B88C7F80A7D64DAC0691E04F84@exnjmb89.nam.nsroot.net>
	<4E399521.2080805@oracle.com>
Message-ID: <21ED8E3420CDB647B88C7F80A7D64DAC0691EC78E3@exnjmb89.nam.nsroot.net>

Hi Ramki,
I am not sure what the problem is. The process dies with the following output when I enable +PrintSafepointStatistics:

java version "1.6.0_16"
Java(TM) SE Runtime Environment (build 1.6.0_16-b01)
Java HotSpot(TM) 64-Bit Server VM (build 14.2-b01, mixed mode)
     vmop_name               [threads: total initially_running wait_to_block] [time: spin block sync] [vmop_time  time_elapsed] page_trap_count
no vm operation                    [       7          1              1]          [     0     0     0]     [     0        0]          0

Polling page always armed
    0 VM operations coalesced during safepoint
Maximum sync time      0 ms
~

Can you please help?

Thank you
Kinnari


From y.s.ramakrishna at oracle.com  Wed Aug  3 14:47:51 2011
From: y.s.ramakrishna at oracle.com (Ramki Ramakrishna)
Date: Wed, 03 Aug 2011 14:47:51 -0700
Subject: PrintSafepointStatistics (was Re: understanding GC logs)
In-Reply-To: <21ED8E3420CDB647B88C7F80A7D64DAC0691EC78E3@exnjmb89.nam.nsroot.net>
References: <21ED8E3420CDB647B88C7F80A7D64DAC0691E04E82@exnjmb89.nam.nsroot.net>
	<4E398E78.3000408@oracle.com>
	<21ED8E3420CDB647B88C7F80A7D64DAC0691E04F84@exnjmb89.nam.nsroot.net>
	<4E399521.2080805@oracle.com>
	<21ED8E3420CDB647B88C7F80A7D64DAC0691EC78E3@exnjmb89.nam.nsroot.net>
Message-ID: <4E39C207.1050500@oracle.com>

Hi Kinnari -- hs14, which you are on, is rather old (current dev is hs22;
latest public is hs21). Is it possible that you could switch to a more
recent JDK? If that's not possible, send me an hs_err file and I can get a
ticket opened for you via the usual support channels. If the problem occurs
with a recent hs21 or hs22, we can certainly take a look here. In either
case, I have modified the subject line for relevance to the issue at hand,
and also cross-posted to hotspot-runtime-dev at openjdk.java.net, where
PrintSafepointStatistics expertise resides.

-- ramki


From kinnari.darji at citi.com  Thu Aug  4 08:01:27 2011
From: kinnari.darji at citi.com (Darji, Kinnari )
Date: Thu, 4 Aug 2011 11:01:27 -0400
Subject: PrintSafepointStatistics (was Re: understanding GC logs)
In-Reply-To: <4E39C207.1050500@oracle.com>
References: <21ED8E3420CDB647B88C7F80A7D64DAC0691E04E82@exnjmb89.nam.nsroot.net>
	<4E398E78.3000408@oracle.com>
	<21ED8E3420CDB647B88C7F80A7D64DAC0691E04F84@exnjmb89.nam.nsroot.net>
	<4E399521.2080805@oracle.com>
	<21ED8E3420CDB647B88C7F80A7D64DAC0691EC78E3@exnjmb89.nam.nsroot.net>
	<4E39C207.1050500@oracle.com>
Message-ID: <21ED8E3420CDB647B88C7F80A7D64DAC0691EC81CA@exnjmb89.nam.nsroot.net>

Hi Ramki,
I am running jdk-1.6.0_16 with the Java HotSpot(TM) 64-Bit Server VM (build 14.2-b01, mixed mode). I can't change the JDK version. Is there any other way to have this info printed in the GC logs with this JDK version?

Attaching the error file.

Thank you
Kinnari

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: HSError.txt
Url: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20110804/db804156/attachment.txt 

From y.s.ramakrishna at oracle.com  Thu Aug  4 10:12:20 2011
From: y.s.ramakrishna at oracle.com (Y. S. Ramakrishna)
Date: Thu, 04 Aug 2011 10:12:20 -0700
Subject: PrintSafepointStatistics (was Re: understanding GC logs)
In-Reply-To: <21ED8E3420CDB647B88C7F80A7D64DAC0691EC81CA@exnjmb89.nam.nsroot.net>
References: <21ED8E3420CDB647B88C7F80A7D64DAC0691E04E82@exnjmb89.nam.nsroot.net>
	<4E398E78.3000408@oracle.com>
	<21ED8E3420CDB647B88C7F80A7D64DAC0691E04F84@exnjmb89.nam.nsroot.net>
	<4E399521.2080805@oracle.com>
	<21ED8E3420CDB647B88C7F80A7D64DAC0691EC78E3@exnjmb89.nam.nsroot.net>
	<4E39C207.1050500@oracle.com>
	<21ED8E3420CDB647B88C7F80A7D64DAC0691EC81CA@exnjmb89.nam.nsroot.net>
Message-ID: <4E3AD2F4.6070308@oracle.com>


Hi Kinnari --

On 08/04/11 08:01, Darji, Kinnari wrote:
> Hi Ramki,
> 
> I am running jdk-1.6.0_16 and  Java HotSpot(TM) 64-Bit Server VM (build 
> 14.2-b01, mixed mode). I can't change JDK version.  Is there any other 
> way to have this info printed on GC logs with this JDK version?

I'll get you in touch, off-list, with support folk so they can help open a service
ticket based on your support contract for the older JDK.

As regards having this kind of info printed without recourse to -XX:+PrintSafepointStatistics,
try -XX:+PrintGCApplicationStoppedTime and -XX:+PrintGCApplicationConcurrentTime,
which should give you the times you want, albeit with none of the finer
details that -XX:+PrintSafepointStatistics would have provided you.
Here's a description of those flags from globals.hpp:

   product(bool, PrintGCApplicationConcurrentTime, false,                    \
           "Print the time the application has been running")                \
                                                                             \
   product(bool, PrintGCApplicationStoppedTime, false,                       \
           "Print the time the application has been stopped")                \
                                                                             \
(... basically between safepoints, or at safepoints respectively).
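
For what it's worth, the lines those two flags produce can be totalled with a few lines of script. The following is only a sketch: it assumes the "Application time: ... seconds" and "Total time for which application threads were stopped: ... seconds" line shapes, and the sample values and the function name summarize are made up for illustration rather than taken from a real log.

```python
import re

# Assumed line shapes for the two flags; check them against your own log.
STOPPED = re.compile(
    r"Total time for which application threads were stopped: (\d+\.\d+) seconds")
RUNNING = re.compile(r"Application time: (\d+\.\d+) seconds")


def summarize(lines):
    """Sum the stopped and running times (in seconds) found in the log lines."""
    stopped = 0.0
    running = 0.0
    for line in lines:
        match = STOPPED.search(line)
        if match:
            stopped += float(match.group(1))
            continue
        match = RUNNING.search(line)
        if match:
            running += float(match.group(1))
    return stopped, running


# Hypothetical sample lines, not output from the JVM discussed in this thread.
sample = [
    "Application time: 1.5000000 seconds",
    "Total time for which application threads were stopped: 0.0092180 seconds",
    "Application time: 0.5000000 seconds",
    "Total time for which application threads were stopped: 0.0007820 seconds",
]

stopped_s, running_s = summarize(sample)
print("stopped %.4f s, running %.4f s" % (stopped_s, running_s))
```

Unlike the [GC ...] records, these totals cover every safepoint pause, whether or not it was caused by GC.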

As regards:

> Attaching error file..

As I understood it, you were getting a JVM crash when you used
-XX:+PrintSafepointStatistics with 6u16. In that case, the JVM would
typically dump a file named hs_err_pid<pid>.log in the current working
directory of the process. That's what the support folks would
want (along with the core file, maybe, in some cases).

Please send the hs_err_*.log file so I can provide
that to the support folk.

It is possible that someone on the runtime list
might already recognize this problem as one that
has since been fixed.

-- ramki


From y.s.ramakrishna at oracle.com  Thu Aug  4 11:20:33 2011
From: y.s.ramakrishna at oracle.com (Y. S. Ramakrishna)
Date: Thu, 04 Aug 2011 11:20:33 -0700
Subject: PrintSafepointStatistics (was Re: understanding GC logs)
In-Reply-To: <21ED8E3420CDB647B88C7F80A7D64DAC0691F84295@exnjmb89.nam.nsroot.net>
References: <21ED8E3420CDB647B88C7F80A7D64DAC0691E04E82@exnjmb89.nam.nsroot.net>
	<4E398E78.3000408@oracle.com>
	<21ED8E3420CDB647B88C7F80A7D64DAC0691E04F84@exnjmb89.nam.nsroot.net>
	<4E399521.2080805@oracle.com>
	<21ED8E3420CDB647B88C7F80A7D64DAC0691EC78E3@exnjmb89.nam.nsroot.net>
	<4E39C207.1050500@oracle.com>
	<21ED8E3420CDB647B88C7F80A7D64DAC0691EC81CA@exnjmb89.nam.nsroot.net>
	<4E3AD2F4.6070308@oracle.com>
	<21ED8E3420CDB647B88C7F80A7D64DAC0691F84295@exnjmb89.nam.nsroot.net>
Message-ID: <4E3AE2F1.2040401@oracle.com>

There should be a header line that tells you what each column represents. It
was in one of your earlier emails, see below:
(The column header is printed more frequently in later versions of the JVM
making interpretation easier; see also more on that below.)
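
As a rough aid to reading those rows, the sketch below pairs each value with the column name from the header line of the earlier email (vmop_name, the three thread counts, the three sync times, vmop_time, time_elapsed, page_trap_count). The names SafepointRow and parse_safepoint_row and the regular expression are illustrative assumptions; this is not a tool shipped with the JVM, and it only targets the row layout shown in this thread.

```python
import re
from collections import namedtuple

# Field names follow the header line printed by the JVM in this thread.
SafepointRow = namedtuple("SafepointRow", [
    "vmop_name",
    "total", "initially_running", "wait_to_block",   # [threads: ...]
    "spin", "block", "sync",                         # [time: ...]
    "vmop_time", "time_elapsed",
    "page_trap_count",
])

ROW = re.compile(
    r"^(.+?)\s*"
    r"\[\s*(\d+)\s+(\d+)\s+(\d+)\s*\]\s*"
    r"\[\s*(\d+)\s+(\d+)\s+(\d+)\s*\]\s*"
    r"\[\s*(\d+)\s+(\d+)\s*\]\s*"
    r"(\d+)\s*$"
)


def parse_safepoint_row(line):
    """Parse one statistics row; return None for headers and other lines."""
    text = line.lstrip("> ").rstrip()   # drop any mail quote marker
    match = ROW.match(text)
    if match is None:
        return None
    name, *numbers = match.groups()
    return SafepointRow(name, *(int(n) for n in numbers))


# Two rows copied from the console output quoted in this thread.
rows = [
    "> GenCollectForAllocation            [      15          0              0]"
    "          [     0     0     0]     [    27      292]          0",
    "> BulkRevokeBias                     [      32          1              0]"
    "          [     0     0     0]     [     3     1905]          0",
]

parsed = [parse_safepoint_row(r) for r in rows]
for row in parsed:
    print(row.vmop_name, row.vmop_time, row.time_elapsed)
```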

On 08/04/11 10:34, Darji, Kinnari wrote:
> Ramki,
> I tried it one more time and my process came up fine. I see following logs in console log. Though I don't see anything on verbose GC logs. Is that proper output? If so, how do I interpret following logs?
> 
> Deoptimize                         [       8          0              0]          [     0     0     0]     [     0        0]          0
> Deoptimize                         [       9          0              0]          [     0     0     0]     [     0       26]          0
> Deoptimize                         [       9          0              0]          [     0     0     0]     [     0       46]          0
> Deoptimize                         [      13          0              0]          [     0     0     0]     [     0      860]          0
> Deoptimize                         [      13          0              0]          [     0     0     0]     [     0      714]          0
> Deoptimize                         [      15          0              0]          [     0     0     0]     [     0      760]          0
> Deoptimize                         [      15          0              0]          [     0     0     0]     [     0        0]          0
> Deoptimize                         [      15          0              0]          [     0     0     0]     [     1      310]          0
> GenCollectForAllocation            [      15          0              0]          [     0     0     0]     [    27      292]          0
> Deoptimize                         [      18          0              0]          [     0     0     0]     [     0      821]          0
> EnableBiasedLocking                [      30          0              0]          [     0     0     0]     [     1      355]          0
> BulkRevokeBias                     [      32          1              0]          [     0     0     0]     [     3     1905]          0
> RevokeBias                         [      32          0              1]          [     0     0     0]     [     3        9]          0
> RevokeBias                         [      31          0              2]          [     0     0     0]     [     0        4]          0
> RevokeBias                         [      30          0              0]          [     0     0     0]     [     1        1]          0
> RevokeBias                         [      29          0              0]          [     0     0     0]     [     2        7]          0
> RevokeBias                         [      28          0              1]          [     0     0     0]     [     2        2]          0
> BulkRevokeBias                     [      28          0              1]          [     0     0     0]     [     0        2]          0
> RevokeBias                         [      27          1              0]          [     0     0     0]     [     1        1]          0
> RevokeBias                         [      25          0              1]          [     0     0     0]     [     2        3]
> 
> Thank you
> Kinnari
> 

Here is the header line:

...
>>      vmop_name               [threads: total initially_running wait_to_block] [time: spin block sync] [vmop_time  time_elapsed] page_trap_count

Unfortunately, this is not the easiest thing to interpret if you are not familiar
with the details of the JVM safepoint protocol (it's intended for extreme performance
tuning or troubleshooting). On top of that, because the data is printed in "batches",
it is hard to align with GC or other logging.
(Later versions, hs20 and later, improved this somewhat by printing a JVM timestamp
against each entry, whereas above you need to reconstruct that info from the deltas,
which is a pain.) So my advice is to use a newer JVM if you can, and if you can't, then
just rely on the less detailed, but easier to align, output of the
-XX:+PrintGCApplicationConcurrentTime and -XX:+PrintGCApplicationStoppedTime flags.
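
To make the column header concrete, here is a minimal Python sketch that maps
one row of the output above onto the header's column names. The names ROW and
parse_row are hypothetical, and the pattern simply assumes the bracket layout
shown in this thread (the last column is optional because the final row quoted
above is cut off).

```python
import re

# Hypothetical pattern following the header quoted above:
# vmop_name [threads: total initially_running wait_to_block]
#           [time: spin block sync] [vmop_time time_elapsed] page_trap_count
ROW = re.compile(
    r"^(?P<vmop_name>\S+)\s+"
    r"\[\s*(?P<total>\d+)\s+(?P<initially_running>\d+)\s+(?P<wait_to_block>\d+)\s*\]\s+"
    r"\[\s*(?P<spin>\d+)\s+(?P<block>\d+)\s+(?P<sync>\d+)\s*\]\s+"
    r"\[\s*(?P<vmop_time>\d+)\s+(?P<time_elapsed>\d+)\s*\]"
    r"(?:\s+(?P<page_trap_count>\d+))?\s*$"
)


def parse_row(line):
    """Parse one statistics row; returns None for lines that do not match."""
    # Drop mail quoting ("> ") and surrounding whitespace before matching.
    m = ROW.match(line.lstrip("> ").rstrip())
    if m is None:
        return None
    row = {"vmop_name": m.group("vmop_name")}
    for key, value in m.groupdict().items():
        if key != "vmop_name":
            row[key] = int(value) if value is not None else None
    return row


sample = ("GenCollectForAllocation            [      15          0"
          "              0]          [     0     0     0]"
          "     [    27      292]          0")
row = parse_row(sample)
print(row["vmop_name"], row["vmop_time"], row["time_elapsed"])
```

The sketch deliberately says nothing about units; check the header and
documentation of the JVM version in use before interpreting the numbers.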

By the way, if the crash that you reported earlier does not in fact
happen, please make sure to tell the support engineering contacts, who
may have contacted you off-list, so they can close off any ticket they may
have opened for your report.

thanks.
-- ramki

From kinnari.darji at citi.com  Thu Aug  4 11:24:46 2011
From: kinnari.darji at citi.com (Darji, Kinnari )
Date: Thu, 4 Aug 2011 14:24:46 -0400
Subject: PrintSafepointStatistics (was Re: understanding GC logs)
In-Reply-To: <4E3AE2F1.2040401@oracle.com>
References: <21ED8E3420CDB647B88C7F80A7D64DAC0691E04E82@exnjmb89.nam.nsroot.net>
	<4E398E78.3000408@oracle.com>
	<21ED8E3420CDB647B88C7F80A7D64DAC0691E04F84@exnjmb89.nam.nsroot.net>
	<4E399521.2080805@oracle.com>
	<21ED8E3420CDB647B88C7F80A7D64DAC0691EC78E3@exnjmb89.nam.nsroot.net>
	<4E39C207.1050500@oracle.com>
	<21ED8E3420CDB647B88C7F80A7D64DAC0691EC81CA@exnjmb89.nam.nsroot.net>
	<4E3AD2F4.6070308@oracle.com>
	<21ED8E3420CDB647B88C7F80A7D64DAC0691F84295@exnjmb89.nam.nsroot.net>
	<4E3AE2F1.2040401@oracle.com>
Message-ID: <21ED8E3420CDB647B88C7F80A7D64DAC0691F84463@exnjmb89.nam.nsroot.net>

Sure, I will let them know..
Thanks a lot for your help. I do appreciate it.

Thank you
Kinnari



From matt.fowles at gmail.com  Fri Aug  5 10:51:31 2011
From: matt.fowles at gmail.com (Matt Fowles)
Date: Fri, 5 Aug 2011 13:51:31 -0400
Subject: Log Visualization Tools
Message-ID: <CAApERub+RpfrPEr-Gpeb=Kro2ZaaoB+zKvdakYvPgmZkcwhp4A@mail.gmail.com>

All~

What tools do people know of or have for parsing gc logs and
visualizing the results?

The only thing I can find, GCViewer, (from
http://www.tagtraum.com/gcviewer.html) seems like it has not been
updated for a while and does not parse a lot of more complicated logs
(-XX:+PrintTenuringDistribution or -XX:PrintCMSStatistics=1).

Are there more tools out there?  Are there in house tools that people
are willing to share?

Matt

From eric.caspole at amd.com  Fri Aug  5 12:40:38 2011
From: eric.caspole at amd.com (Eric Caspole)
Date: Fri, 5 Aug 2011 15:40:38 -0400
Subject: Log Visualization Tools
In-Reply-To: <CAApERub+RpfrPEr-Gpeb=Kro2ZaaoB+zKvdakYvPgmZkcwhp4A@mail.gmail.com>
References: <CAApERub+RpfrPEr-Gpeb=Kro2ZaaoB+zKvdakYvPgmZkcwhp4A@mail.gmail.com>
Message-ID: <956CA137-70AD-40A0-8757-56BE98A429A9@amd.com>

Sometimes I use HPjmeter for plain Xloggc, but I don't think it can  
do the fancy extra flags either.

On Aug 5, 2011, at 1:51 PM, Matt Fowles wrote:

> All~
>
> What tools do people know of or have for parsing gc logs and
> visualizing the results?
>
> The only thing I can find, GCViewer, (from
> http://www.tagtraum.com/gcviewer.html) seems like it has not been
> updated for a while and does not parse a lot of more complicated logs
> (-XX:+PrintTenuringDistribution or -XX:PrintCMSStatistics=1).
>
> Are there more tools out there?  Are there in house tools that people
> are willing to share?
>
> Matt
> _______________________________________________
> hotspot-gc-use mailing list
> hotspot-gc-use at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
>


From y.s.ramakrishna at oracle.com  Fri Aug  5 13:33:22 2011
From: y.s.ramakrishna at oracle.com (Ramki Ramakrishna)
Date: Fri, 05 Aug 2011 13:33:22 -0700
Subject: Log Visualization Tools
In-Reply-To: <4E3C5253.2050802@oracle.com>
References: <CAApERub+RpfrPEr-Gpeb=Kro2ZaaoB+zKvdakYvPgmZkcwhp4A@mail.gmail.com>	<956CA137-70AD-40A0-8757-56BE98A429A9@amd.com>
	<4E3C5253.2050802@oracle.com>
Message-ID: <4E3C5392.7080906@oracle.com>

Sorry for the noise: my response was sent to the wrong list in error; 
corrected herewith.

-- ramki

On 8/5/2011 1:28 PM, Ramki Ramakrishna wrote:
> Same here -- I sometimes use an internal homegrown awk script to extract the
> metrics, so they can be massaged into a data file amenable to plotting with
> gnuplot. That script does not take well to the extra output either, so we
> usually strip out the extra output and deal with only the more fundamental
> metrics. The extra output from the fancier flags has thus far been consumed
> only by humans or extracted on an ad-hoc basis into spreadsheets and such.
> This is clearly not a nice state of affairs. I believe there is work (or
> plans?) underway for some kind of logging framework into which the JVM will
> feed its metrics, and hopefully the tooling that consumes those logs will be
> able to deal with all these issues in a more uniform fashion once and for
> all. Unfortunately, I have no real details of that work, though.
>
> Then there is gchisto, which is GC-specific (but which also cannot consume
> the output from the fancier flags), but that has taken a back seat as other
> issues have intervened.
> In general, until GC logging formats are standardized, tools that consume
> textual output from the JVM/GC will tend to break unless changes to these
> text formats are carefully controlled. There has been some talk on and off
> about trying to standardize those formats, but I am not sure about the
> status of that. Maybe the logging framework mentioned earlier will provide
> a superstructure from which such textual standardization will result
> naturally.
>
> -- ramki
>

From matt.fowles at gmail.com  Fri Aug  5 14:06:58 2011
From: matt.fowles at gmail.com (Matt Fowles)
Date: Fri, 5 Aug 2011 17:06:58 -0400
Subject: Log Visualization Tools
In-Reply-To: <4E3C5392.7080906@oracle.com>
References: <CAApERub+RpfrPEr-Gpeb=Kro2ZaaoB+zKvdakYvPgmZkcwhp4A@mail.gmail.com>
	<956CA137-70AD-40A0-8757-56BE98A429A9@amd.com>
	<4E3C5253.2050802@oracle.com> <4E3C5392.7080906@oracle.com>
Message-ID: <CAApERuaYOUreWJ1F0mL23xS2HM=fENovoxYOm4iXTFyT14Yx3A@mail.gmail.com>

Ramki~

What rules govern the upgrade of logging formats?  Can the format only
be changed on major releases or can we just add a flag 'new format' to
minor releases?

Matt


From y.s.ramakrishna at oracle.com  Fri Aug  5 14:22:46 2011
From: y.s.ramakrishna at oracle.com (Ramki Ramakrishna)
Date: Fri, 05 Aug 2011 14:22:46 -0700
Subject: Log Visualization Tools
In-Reply-To: <CAApERuaYOUreWJ1F0mL23xS2HM=fENovoxYOm4iXTFyT14Yx3A@mail.gmail.com>
References: <CAApERub+RpfrPEr-Gpeb=Kro2ZaaoB+zKvdakYvPgmZkcwhp4A@mail.gmail.com>
	<956CA137-70AD-40A0-8757-56BE98A429A9@amd.com>
	<4E3C5253.2050802@oracle.com> <4E3C5392.7080906@oracle.com>
	<CAApERuaYOUreWJ1F0mL23xS2HM=fENovoxYOm4iXTFyT14Yx3A@mail.gmail.com>
Message-ID: <4E3C5F26.9090707@oracle.com>

Historically, the output of the non-standard ("extra fancy") flags has not been
governed by any rules. The basic logging format, however, has essentially not
changed since 1.4.2, as far as I recall. Of course, each collector, typically
released in a major new release, has had its little quirks even within the
basic format. Unfortunately, the situation has not been governed by any strict
rules: although we have never knowingly introduced changes to formatting in
minor releases, or even major releases, for fear of breaking existing
log-parsing scripts, I am sure some have slipped through. In the absence of a
spec for the format, QA has never written tests to protect against inadvertent
regressions. (Sorry for talking like that; I know it sounds rather like a
lawyer or politician talking when I read that email back! :-) I am hoping
things will get better, more standardized, going forward, so people can use
tools without fear of them breaking with a new release.

-- ramki

On 8/5/2011 2:06 PM, Matt Fowles wrote:
> Ramki~
>
> What rules govern the upgrade of logging formats?  Can the format only
> be changed on major releases or can we just add a flag 'new format' to
> minor releases?
>
> Matt
>

From matt.fowles at gmail.com  Sat Aug  6 13:15:03 2011
From: matt.fowles at gmail.com (Matt Fowles)
Date: Sat, 6 Aug 2011 16:15:03 -0400
Subject: Log Visualization Tools
In-Reply-To: <D5733C65-2432-4934-B0AA-39D51E37A135@kodewerk.com>
References: <CAApERub+RpfrPEr-Gpeb=Kro2ZaaoB+zKvdakYvPgmZkcwhp4A@mail.gmail.com>
	<956CA137-70AD-40A0-8757-56BE98A429A9@amd.com>
	<D5733C65-2432-4934-B0AA-39D51E37A135@kodewerk.com>
Message-ID: <CAApERuaYNVjT-d4ysNe+eSwe3yZ1vZtJgV1zNWtbcmGPQTCVLw@mail.gmail.com>

Kirk~

I appreciate the offer.  I actually already analyzed the logs in
question (by hand) and was just regretting the lack of tooling in this
space.  I am definitely interested in using your tools (and even
contributing back to them) once they come out.

Thanks,
Matt

On Sat, Aug 6, 2011 at 3:55 PM, Charles K Pepperdine <kirk at kodewerk.com> wrote:
> Hi Matt,
>
> If you send me the GC Log I'll happily analyze it for you. I've got some tooling that is close to release. Alpha should be by end of August.
>
> Regards,
> Kirk
>

From kirk at kodewerk.com  Sat Aug  6 12:55:14 2011
From: kirk at kodewerk.com (Charles K Pepperdine)
Date: Sat, 6 Aug 2011 21:55:14 +0200
Subject: Log Visualization Tools
In-Reply-To: <956CA137-70AD-40A0-8757-56BE98A429A9@amd.com>
References: <CAApERub+RpfrPEr-Gpeb=Kro2ZaaoB+zKvdakYvPgmZkcwhp4A@mail.gmail.com>
	<956CA137-70AD-40A0-8757-56BE98A429A9@amd.com>
Message-ID: <D5733C65-2432-4934-B0AA-39D51E37A135@kodewerk.com>

Hi Matt,

If you send me the GC Log I'll happily analyze it for you. I've got some tooling that is close to release. Alpha should be by end of August.

Regards,
Kirk



From chsu79 at gmail.com  Mon Aug 22 08:54:52 2011
From: chsu79 at gmail.com (Christian)
Date: Mon, 22 Aug 2011 17:54:52 +0200
Subject: Young GC pause time definitions
In-Reply-To: <92354D18-E0DC-4C02-8D61-91C076FB548D@kodewerk.com>
References: <CAJmT7L8OiDdYRbwrMy_GTc6tG04qzNC35sqeWd+D1VCHU5_yRQ@mail.gmail.com>
	<92354D18-E0DC-4C02-8D61-91C076FB548D@kodewerk.com>
Message-ID: <CAJmT7L-nkgKccCTpA23AZF7_JRF1d84ZD6GcDGZG4xTwdmSLVg@mail.gmail.com>

Sorry for my silence; I had been meaning to come back to this thread
only once I had new information of value to report. (I'll move this
discussion to hotspot-gc-use.)

The bug that was fixed in 6u19 was causing some of the increased GC
times that I could see. Upgrading the JDK did improve the situation. I
don't have exact numbers.

It would be interesting to learn how those of you with good insight into
the details of the GC would use publicly available flags to get
information that breaks down what the GC is spending time on. I will
have remote access to their lab environment for some time, at which
point I could try experiments and extract information from GC debug
output. PrintTenuringDistribution is definitely something I want to
enable because I'm curious about the survival rate.
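
For reference, a minimal Python sketch of pulling the per-age byte counts out
of -XX:+PrintTenuringDistribution output. The sample is the age table posted
at the start of the "understanding GC logs" thread; AGE and parse_ages are
hypothetical names. Note that one table alone is only a snapshot of the
survivor space: estimating survival across ages means comparing age N in one
collection with age N+1 in the next.

```python
import re

# Age table as posted earlier on this list (one young collection).
SAMPLE = """\
Desired survivor size 10878976 bytes, new threshold 4 (max 4)
- age   1:     594288 bytes,     594288 total
- age   2:    2369912 bytes,    2964200 total
- age   3:    2877584 bytes,    5841784 total
- age   4:    3075264 bytes,    8917048 total
"""

# Hypothetical pattern for one "- age N: X bytes, Y total" line.
AGE = re.compile(r"- age\s+(\d+):\s+(\d+) bytes,\s+(\d+) total")


def parse_ages(text):
    """Return {age: bytes_at_that_age} for one tenuring distribution table."""
    return {int(age): int(size) for age, size, _total in AGE.findall(text)}


ages = parse_ages(SAMPLE)
# The per-age values should add up to the running total on the last line.
print(ages, sum(ages.values()))
```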


On Mon, Aug 22, 2011 at 17:08, Charles K Pepperdine <kirk at kodewerk.com> wrote:
>>
>>
>> The customer site is running an old JDK 1.6.0_14, with
>> -XX:+UseParNewGC and -XX:+UseConcMarkSweepGC. It uses a 12 GB heap and a
>> relatively small 512 MB new size.
>
> This seems like a highly suspicious configuration that I would guess is at the root of the problem. Please use -XX:+PrintTenuringDistribution and post the gc log if you can.
>
> Regards,
> Kirk Pepperdine
>
>

From sergey.melderis at gmail.com  Mon Aug 29 10:34:42 2011
From: sergey.melderis at gmail.com (Sergejs Melderis)
Date: Mon, 29 Aug 2011 13:34:42 -0400
Subject: Default max heap size
Message-ID: <CAAU=Jf58YO=cyNhG5n6of_DCSWXPhdcD5VdKLiNFuva441+qwA@mail.gmail.com>

Hello.
I am trying to figure out how HotSpot chooses the default maximum heap size.
I posted this question to Stack Overflow, but got no answers.
I don't want to repeat it here, so here is the link to the question:
http://stackoverflow.com/questions/7194526/hotspot-default-max-heap-size

I searched the jdk source code, for the place where it is calculated.
I found function set_heap_size defined here
http://hg.openjdk.java.net/jdk6/jdk6/hotspot/file/dc40301aed45/src/share/vm/runtime/arguments.cpp

If I am not wrong, the calculation happens in the following lines:

if (FLAG_IS_DEFAULT(MaxHeapSize)) {
  julong reasonable_max = phys_mem / MaxRAMFraction;

  if (phys_mem <= MaxHeapSize * MinRAMFraction) {
    // Small physical memory, so use a minimum fraction of it for the heap
    reasonable_max = phys_mem / MinRAMFraction;
  } else {
    // Not-small physical memory, so require a heap at least
    // as large as MaxHeapSize
    reasonable_max = MAX2(reasonable_max, (julong)MaxHeapSize);
  }


MaxRAMFraction is 4, so reasonable_max is phys_mem / 4. So, unless
physical memory is very small,
the reasonable_max will be MAX2(reasonable_max, (julong)MaxHeapSize);

MAX2 is defined as
#define MAX2(a, b) (((a) < (b)) ? (b) : (a))

At the end reasonable_max is set as MaxHeapSize
FLAG_SET_ERGO(uintx, MaxHeapSize, (uintx)reasonable_max);

If I plug in the memory size of my test machine, reasonable_max comes out
very close to what I get from jmap -heap.
With 8 GB, 16 GB, or more of RAM, MaxHeapSize will be greater than 1
GB, which contradicts the documentation:
http://www.oracle.com/technetwork/java/javase/gc-tuning-6-140523.html#par_gc.ergonomics.default_size
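
The arithmetic can be checked with a small Python sketch that mirrors only
the excerpt quoted above. MaxRAMFraction = 4 is taken from the text; the
defaults used here for MinRAMFraction (2) and the initial MaxHeapSize (96 MB)
are assumptions for illustration, and the real set_heap_size does more
(for example, further limits on phys_mem and the result) that is not
modelled.

```python
MB = 1024 * 1024
GB = 1024 * MB


def reasonable_max_heap(phys_mem,
                        max_heap_size=96 * MB,   # assumed default, see above
                        max_ram_fraction=4,      # value quoted in this mail
                        min_ram_fraction=2):     # assumed default, see above
    """Mirror of the quoted excerpt only; all sizes are in bytes."""
    reasonable_max = phys_mem // max_ram_fraction
    if phys_mem <= max_heap_size * min_ram_fraction:
        # Small physical memory: use a minimum fraction of it for the heap.
        reasonable_max = phys_mem // min_ram_fraction
    else:
        # Not-small physical memory: heap at least as large as MaxHeapSize.
        reasonable_max = max(reasonable_max, max_heap_size)
    return reasonable_max


for ram_gb in (8, 16):
    print(ram_gb, "GB RAM ->", reasonable_max_heap(ram_gb * GB) // GB, "GB heap")
```

Under these assumptions, 8 GB of RAM gives a 2 GB default maximum heap and
16 GB gives 4 GB, which matches the observation that the result exceeds 1 GB.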

Thanks,

Sergey.

From david.tavoularis at mycom-int.com  Mon Aug 29 02:25:38 2011
From: david.tavoularis at mycom-int.com (David Tavoularis)
Date: Mon, 29 Aug 2011 11:25:38 +0200
Subject: Long "stop-the-world" pauses in CMS GC mode
Message-ID: <op.v0yxs0vga5r2ku@nb07-spt1.mycom-int.fr>

Hi,

I am trying to understand the cause of "stop-the-world" pauses in my application using CMS GC and a large heap (48GB).
The production server SF6900 (24 x dual-core UltraSparc-IV 1.35GHz, 48 working threads, 140GB RAM) is running on Solaris 9 and Java6u25.

I know that there are several possible causes :
1) OldGen fragmentation : to avoid it, I implemented an automatic FullGC in crontab at 2:30am
30 2 * * * /usr/jdk/instances/jdk1.6.0/bin/jmap -d64 -histo:live `/usr/bin/pgrep -f "XXXXXXX"` 2>&1 >/dev/null

2) Weak refs processing : a workaround (not tried yet) is to use -XX:+ParallelRefProcEnabled, as described in the following articles :
http://blogs.oracle.com/jonthecollector/entry/top_10_gc_reasons
http://stackoverflow.com/questions/4101540/how-can-i-lower-the-weak-ref-processing-time-during-gc
I have found out that it could be triggered by the daily unreferencing of a big object containing millions of small objects (using weak references).


The application has been running for almost a week and I can see some "stop-the-world" pauses longer than 10 seconds :
$ egrep "Total time for which application threads were stopped: [0-9][0-9]\." gc_201108232207.log
Total time for which application threads were stopped: 10.8630158 seconds <- due to weak refs
Total time for which application threads were stopped: 18.5259611 seconds
Total time for which application threads were stopped: 10.0777809 seconds <- due to weak refs
Total time for which application threads were stopped: 61.5576519 seconds
Total time for which application threads were stopped: 19.0205127 seconds
Total time for which application threads were stopped: 20.6893643 seconds
Total time for which application threads were stopped: 16.0048075 seconds
Total time for which application threads were stopped: 12.3665083 seconds <- due to weak refs
Total time for which application threads were stopped: 11.5213443 seconds <- due to weak refs
Total time for which application threads were stopped: 37.1018520 seconds <- due to weak refs
Total time for which application threads were stopped: 16.3988783 seconds <- due to weak refs
Total time for which application threads were stopped: 12.4057546 seconds

I have no explanation for 6 of them.
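
Note that the egrep pattern above requires exactly two digits before the
decimal point, so it cannot match a pause of 100 seconds or more (the Full
GC entries in the annex would presumably fall in that range). A minimal
sketch of a numeric filter, assuming the log line format shown above;
long_pauses is a hypothetical helper name and the threshold defaults to
10 seconds:

```shell
#!/bin/sh
# Print "<line number>: <pause> seconds" for every stopped-time entry that
# is at least MIN seconds long. Reads the GC log on standard input.
# long_pauses is a made-up name for this example.
long_pauses() {
    awk -v min="${1:-10}" '
        /Total time for which application threads were stopped:/ {
            # The pause length is the field that follows "stopped:".
            for (i = 1; i < NF; i++) {
                if ($i == "stopped:") {
                    secs = $(i + 1)
                    if (secs + 0 >= min + 0) print NR ": " secs " seconds"
                    break
                }
            }
        }'
}
```

Usage: long_pauses 10 < gc_201108232207.log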

For your information, here are the 6 "weak refs" log messages :
$ egrep "weak refs processing, [1-9][0-9]?" gc_201108232207.log | more
2011-08-24T10:13:49.641+0100: 43564.409: [GC[YG occupancy: 342791 K (943744 K)]43564.410: [Rescan (non-parallel) 43564.410: [grey object rescan, 0.7358794 secs]43565.146: [root rescan, 1.9033345 secs], 2.6398211 secs]43567.049: [weak refs processing, 8.2148555 secs] [1 CMS-remark: 26914465K(49283072K)] 27257257K(50226816K), 10.8566498 secs] [Times: user=10.85 sys=0.00, real=10.86 secs]
2011-08-25T12:33:22.658+0100: 138336.194: [GC[YG occupancy: 179985 K (943744 K)]138336.195: [Rescan (non-parallel) 138336.195: [grey object rescan, 0.5969886 secs]138336.792: [root rescan, 0.5114118 secs], 1.1089811 secs]138337.304: [weak refs processing, 8.8414246 secs] [1 CMS-remark: 20122279K(49283072K)] 20302264K(5226816K), 9.9514563 secs] [Times: user=9.94 sys=0.01, real=9.95 secs]
2011-08-26T07:22:55.233+0100: 206107.887: [GC[YG occupancy: 177014 K (943744 K)]206107.888: [Rescan (non-parallel) 206107.888: [grey object rescan, 0.4472730 secs]206108.335: [root rescan, 1.5575365 secs], 2.0053337 secs]206109.893: [weak refs processing, 10.3436973 secs] [1 CMS-remark: 19861286K(49283072K)] 20038301K(50226816K), 12.3572481 secs] [Times: user=12.22 sys=0.00, real=12.36 secs]
2011-08-26T07:51:55.531+0100: 207848.163: [GC[YG occupancy: 423184 K (943744 K)]207848.163: [Rescan (non-parallel) 207848.163: [grey object rescan, 0.4466552 secs]207848.610: [root rescan, 3.4207362 secs], 3.8680060 secs]207852.031: [weak refs processing, 7.6403893 secs] [1 CMS-remark: 19714349K(49283072K)] 20137533K(50226816K), 11.5130922 secs] [Times: user=11.51 sys=0.00, real=11.51 secs]
2011-08-27T15:18:48.928+0100: 321060.091: [GC[YG occupancy: 711567 K (943744 K)]321060.092: [Rescan (non-parallel) 321060.092: [grey object rescan, 0.4628955 secs]321060.555: [root rescan, 3.2087381 secs], 3.6721710 secs]321063.764: [weak refs processing, 33.3995481 secs] [1 CMS-remark: 19918243K(49283072K)] 20629810K(50226816K), 37.0910804 secs] [Times: user=37.04 sys=0.00, real=37.09 secs]
2011-08-28T11:17:12.144+0100: 392962.378: [GC[YG occupancy: 811576 K (943744 K)]392962.378: [Rescan (non-parallel) 392962.378: [grey object rescan, 0.4140054 secs]392962.793: [root rescan, 4.4323136 secs], 4.8469694 secs]392967.225: [weak refs processing, 11.5384812 secs] [1 CMS-remark: 19819290K(49283072K)] 20630867K(50226816K), 16.3885374 secs] [Times: user=16.35 sys=0.01, real=16.39 secs]


1. Here is the first pattern : a 61-second pause, but I don't see any suspicious message in GC logs:
2011-08-24T10:24:25.748+0100: 44200.509: [GC 44200.511: [ParNew
Desired survivor size 53673984 bytes, new threshold 1 (max 4)
- age 1: 101879520 bytes, 101879520 total
: 933589K->104832K(943744K), 0.3947382 secs] 21369469K->20703994K(50226816K), 0.3966779 secs] [Times: user=6.43 sys=0.04, real=0.40 secs]
Heap after GC invocations=1187 (full 12):
par new generation total 943744K, used 104832K [0xfffffff353c00000, 0xfffffff393c00000, 0xfffffff393c00000)
eden space 838912K, 0% used [0xfffffff353c00000, 0xfffffff353c00000, 0xfffffff386f40000)
from space 104832K, 100% used [0xfffffff386f40000, 0xfffffff38d5a0000, 0xfffffff38d5a0000)
to space 104832K, 0% used [0xfffffff38d5a0000, 0xfffffff38d5a0000, 0xfffffff393c00000)
concurrent mark-sweep generation total 49283072K, used 20599162K [0xfffffff393c00000, 0xffffffff53c00000, 0xffffffff53c00000)
concurrent-mark-sweep perm gen total 524288K, used 42905K [0xffffffff53c00000, 0xffffffff73c00000, 0xffffffff73c00000)
}
Total time for which application threads were stopped: 0.4110458 seconds
Application time: 39.5906692 seconds
{Heap before GC invocations=1187 (full 12):
par new generation total 943744K, used 943744K [0xfffffff353c00000, 0xfffffff393c00000, 0xfffffff393c00000)
eden space 838912K, 100% used [0xfffffff353c00000, 0xfffffff386f40000, 0xfffffff386f40000)
from space 104832K, 100% used [0xfffffff386f40000, 0xfffffff38d5a0000, 0xfffffff38d5a0000)
to space 104832K, 0% used [0xfffffff38d5a0000, 0xfffffff38d5a0000, 0xfffffff393c00000)
concurrent mark-sweep generation total 49283072K, used 20599162K [0xfffffff393c00000, 0xffffffff53c00000, 0xffffffff53c00000)
concurrent-mark-sweep perm gen total 524288K, used 42905K [0xffffffff53c00000, 0xffffffff73c00000, 0xffffffff73c00000)
2011-08-24T10:25:07.776+0100: 44242.537: [GC 44301.853: [ParNew
Desired survivor size 53673984 bytes, new threshold 1 (max 4)
- age 1: 99505080 bytes, 99505080 total
: 943744K->104832K(943744K), 0.2010508 secs] 21542906K->20852742K(50226816K), 0.2022636 secs] [Times: user=5.67 sys=0.02, real=59.52 secs]
Heap after GC invocations=1188 (full 12):
par new generation total 943744K, used 104832K [0xfffffff353c00000, 0xfffffff393c00000, 0xfffffff393c00000)
eden space 838912K, 0% used [0xfffffff353c00000, 0xfffffff353c00000, 0xfffffff386f40000)
from space 104832K, 100% used [0xfffffff38d5a0000, 0xfffffff393c00000, 0xfffffff393c00000)
to space 104832K, 0% used [0xfffffff386f40000, 0xfffffff386f40000, 0xfffffff38d5a0000)
concurrent mark-sweep generation total 49283072K, used 20747910K [0xfffffff393c00000, 0xffffffff53c00000, 0xffffffff53c00000)
concurrent-mark-sweep perm gen total 524288K, used 42905K [0xffffffff53c00000, 0xffffffff73c00000, 0xffffffff73c00000)
}
Total time for which application threads were stopped: 61.5576519 seconds
Application time: 0.0245838 seconds
Total time for which application threads were stopped: 9.8331189 seconds
Application time: 0.0012626 seconds
Total time for which application threads were stopped: 0.0090404 seconds
Application time: 0.0008943 seconds
Total time for which application threads were stopped: 0.0020415 seconds
Application time: 0.0008181 seconds
Total time for which application threads were stopped: 0.2338605 seconds
Application time: 0.0018822 seconds

The only suspicious thing is "[Times: user=5.67 sys=0.02, real=59.52 secs]", which means that the "real" duration is a lot higher than "user" CPU time.
Because "sys" duration is low, it also means that the server is not swapping.
What could explain this 61 seconds pause ?
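
One more observation, as arithmetic on the log lines above rather than an
explanation: the second ParNew entry carries two uptime stamps, 44242.537
before "[GC" and 44301.853 before "[ParNew". They differ by 59.316 seconds,
which is close to the reported real=59.52 secs, while the collection itself
is reported as about 0.2 seconds. This suggests, without proving it, that
most of the wall-clock time elapsed before the collection work started. A
minimal sketch that prints this gap for every ParNew entry, assuming the
line format shown above (gc_start_gap is a hypothetical helper name):

```shell
#!/bin/sh
# For each "<uptime1>: [GC <uptime2>: [ParNew" line read from standard
# input, print uptime1, uptime2 and their difference in seconds.
# gc_start_gap is a made-up name for this example.
gc_start_gap() {
    sed -n 's/^.*: \([0-9][0-9.]*\): \[GC \([0-9][0-9.]*\): \[ParNew.*$/\1 \2/p' |
        awk '{ printf "%s %s %.3f\n", $1, $2, $2 - $1 }'
}
```

Usage: gc_start_gap < gc_201108232207.log | sort -k3 -n | tail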


2. Here is the second pattern : a 20-second pause, in the middle of nowhere in GC logs :
{Heap before GC invocations=11132 (full 166):
par new generation total 943744K, used 882686K [0xfffffff353c00000, 0xfffffff393c00000, 0xfffffff393c00000)
eden space 838912K, 100% used [0xfffffff353c00000, 0xfffffff386f40000, 0xfffffff386f40000)
from space 104832K, 41% used [0xfffffff386f40000, 0xfffffff3899ffa48, 0xfffffff38d5a0000)
to space 104832K, 0% used [0xfffffff38d5a0000, 0xfffffff38d5a0000, 0xfffffff393c00000)
concurrent mark-sweep generation total 49283072K, used 19148140K [0xfffffff393c00000, 0xffffffff53c00000, 0xffffffff53c00000)
concurrent-mark-sweep perm gen total 524288K, used 44308K [0xffffffff53c00000, 0xffffffff73c00000, 0xffffffff73c00000)
2011-08-25T20:07:07.235+0100: 165560.417: [GC 165560.417: [ParNew
Desired survivor size 53673984 bytes, new threshold 4 (max 4)
- age 1: 26189384 bytes, 26189384 total
- age 2: 1713728 bytes, 27903112 total
: 882686K->34449K(943744K), 0.1280202 secs] 20030826K->19182589K(50226816K), 0.1285927 secs] [Times: user=3.94 sys=0.01, real=0.13 secs]
Heap after GC invocations=11133 (full 166):
par new generation total 943744K, used 34449K [0xfffffff353c00000, 0xfffffff393c00000, 0xfffffff393c00000)
eden space 838912K, 0% used [0xfffffff353c00000, 0xfffffff353c00000, 0xfffffff386f40000)
from space 104832K, 32% used [0xfffffff38d5a0000, 0xfffffff38f744468, 0xfffffff393c00000)
to space 104832K, 0% used [0xfffffff386f40000, 0xfffffff386f40000, 0xfffffff38d5a0000)
concurrent mark-sweep generation total 49283072K, used 19148140K [0xfffffff393c00000, 0xffffffff53c00000, 0xffffffff53c00000)
concurrent-mark-sweep perm gen total 524288K, used 44308K [0xffffffff53c00000, 0xffffffff73c00000, 0xffffffff73c00000)
}
Total time for which application threads were stopped: 0.1370098 seconds
Application time: 53.6273550 seconds
Total time for which application threads were stopped: 0.0429426 seconds
Application time: 0.0002318 seconds
Total time for which application threads were stopped: 0.0044294 seconds
Application time: 0.0002250 seconds
Total time for which application threads were stopped: 0.0016478 seconds
Application time: 59.0926108 seconds
Total time for which application threads were stopped: 0.0431387 seconds
Application time: 0.0002193 seconds
Total time for which application threads were stopped: 0.0020966 seconds
Application time: 0.0000956 seconds
Total time for which application threads were stopped: 0.0016358 seconds
Application time: 60.1048190 seconds
Total time for which application threads were stopped: 0.0481582 seconds
Application time: 0.0002207 seconds
Total time for which application threads were stopped: 0.0067752 seconds
Application time: 0.0001073 seconds
Total time for which application threads were stopped: 0.0016387 seconds
Application time: 60.7453974 seconds
Total time for which application threads were stopped: 0.0425995 seconds
Application time: 0.0002457 seconds
Total time for which application threads were stopped: 0.0019724 seconds
Application time: 0.0001005 seconds
Total time for which application threads were stopped: 0.0016210 seconds
Application time: 59.0845530 seconds
Total time for which application threads were stopped: 0.0424095 seconds
Application time: 0.0002314 seconds
Total time for which application threads were stopped: 0.0020107 seconds
Application time: 0.0000959 seconds
Total time for which application threads were stopped: 0.0015940 seconds
Application time: 60.7994458 seconds
Total time for which application threads were stopped: 0.0428210 seconds
Application time: 0.0002210 seconds
Total time for which application threads were stopped: 0.0020541 seconds
Application time: 0.0000974 seconds
Total time for which application threads were stopped: 0.0016126 seconds
Application time: 59.0963098 seconds
Total time for which application threads were stopped: 0.0592795 seconds
Application time: 0.0002622 seconds
Total time for which application threads were stopped: 0.0023229 seconds
Application time: 0.0000926 seconds
Total time for which application threads were stopped: 0.0016296 seconds
Application time: 60.1021141 seconds
Total time for which application threads were stopped: 0.0443986 seconds
Application time: 0.0002462 seconds
Total time for which application threads were stopped: 0.0021135 seconds
Application time: 0.0001076 seconds
Total time for which application threads were stopped: 0.0016165 seconds
Application time: 60.0324234 seconds
Total time for which application threads were stopped: 0.0437486 seconds
Application time: 0.0002286 seconds
Total time for which application threads were stopped: 0.0021017 seconds
Application time: 0.0001073 seconds
Total time for which application threads were stopped: 0.0016570 seconds
Application time: 60.4613330 seconds
Total time for which application threads were stopped: 0.0490276 seconds
Application time: 0.0002947 seconds
Total time for which application threads were stopped: 0.0024618 seconds
Application time: 0.0001238 seconds
Total time for which application threads were stopped: 0.0019863 seconds
Application time: 59.8201422 seconds
Total time for which application threads were stopped: 0.0455540 seconds
Application time: 0.0003668 seconds
Total time for which application threads were stopped: 0.0020906 seconds
Application time: 0.0001126 seconds
Total time for which application threads were stopped: 0.0016693 seconds
Application time: 60.0721521 seconds
Total time for which application threads were stopped: 0.0438111 seconds
Application time: 0.0002660 seconds
Total time for which application threads were stopped: 0.0019814 seconds
Application time: 0.0001018 seconds
Total time for which application threads were stopped: 0.0017817 seconds
Application time: 60.0825886 seconds
Total time for which application threads were stopped: 0.0440386 seconds
Application time: 0.0002197 seconds
Total time for which application threads were stopped: 0.0020655 seconds
Application time: 0.0001093 seconds
Total time for which application threads were stopped: 0.0016122 seconds
Application time: 59.6628580 seconds
Total time for which application threads were stopped: 0.0425082 seconds
Application time: 0.0002121 seconds
Total time for which application threads were stopped: 0.0020967 seconds
Application time: 0.0000935 seconds
Total time for which application threads were stopped: 0.0015909 seconds
Application time: 60.1951548 seconds
Total time for which application threads were stopped: 0.0432125 seconds
Application time: 0.0002274 seconds
Total time for which application threads were stopped: 0.0020316 seconds
Application time: 0.0001062 seconds
Total time for which application threads were stopped: 0.0016534 seconds
Application time: 59.5329171 seconds
Total time for which application threads were stopped: 20.6893643 seconds
Application time: 0.0002839 seconds
Total time for which application threads were stopped: 0.0076240 seconds
Application time: 0.0002137 seconds
Total time for which application threads were stopped: 0.0019918 seconds
Application time: 39.4376656 seconds
Total time for which application threads were stopped: 0.0612671 seconds
Application time: 0.0002478 seconds

Any idea ?


Thanks in advance for your help
--
David Tavoularis


[Annex]
Complete GC log file gc_201108232207.log.gz available here: http://dl.free.fr/gxrxlLsVS

JVM command line extract :
/usr/jdk/instances/jdk1.6.0/jre/bin/sparcv9/java -Dsun.rmi.dgc.checkInterval=2000 -server -Xms49152m -Xmx49152m -XX:PermSize=512m -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+DisableExplicitGC -XX:-CMSParallelRemarkEnabled -XX:CMSInitiatingOccupancyFraction=40 -XX:NewSize=1024m -XX:MaxNewSize=1024m -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime -XX:+PrintGCDateStamps -Xloggc:/logs/gc_201108232207.log -XX:+UseCompressedOops -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/data/heapdump

$ /usr/jdk/instances/jdk1.6.0/jre/bin/sparcv9/java -version
java version "1.6.0_25"
Java(TM) SE Runtime Environment (build 1.6.0_25-b06)
Java HotSpot(TM) 64-Bit Server VM (build 20.0-b11, mixed mode)

$ /usr/sbin/prtdiag | head -3
System Configuration: Sun Microsystems sun4u Sun Fire E6900
System clock frequency: 150 MHz
Memory size: 143360 Megabytes

$ mpstat | wc -l
49

$ uname -a
SunOS XXX 5.9 Generic_122300-05 sun4u sparc SUNW,Sun-Fire

For your information, Full GC automatically triggered at 2:30am :
$ grep Full gc_201108232207.log
2011-08-24T02:30:02.475+0100: 15737.603: [Full GC 15737.604: [CMS: 11972490K->5028118K(49283072K), 137.9859661 secs] 12141664K->5028118K(50226816K), [CMS Perm : 39558K->39491K(524288K)], 137.9867010 secs] [Times: user=133.02 sys=4.89, real=137.99 secs]
2011-08-25T02:30:05.142+0100: 102139.150: [Full GC 102139.150: [CMS: 18724122K->11970549K(49283072K), 433.4189517 secs] 18976948K->11970549K(50226816K), [CMS Perm : 44256K->42995K(524288K)], 433.4350620 secs] [Times: user=429.00 sys=3.89, real=433.44 secs]
2011-08-26T02:30:05.125+0100: 188538.009: [Full GC 188538.009: [CMS: 15865994K->12528867K(49283072K), 477.0168566 secs] 16343213K->12528867K(50226816K), [CMS Perm : 44324K->43408K(524288K)], 477.0175358 secs] [Times: user=476.76 sys=0.05, real=477.02 secs]
2011-08-27T02:30:03.084+0100: 274934.847: [Full GC 274934.849: [CMS: 14857264K->8811922K(49283072K), 312.4786042 secs] 15546860K->8811922K(50226816K), [CMS Perm : 44557K->43762K(524288K)], 312.4796506 secs] [Times: user=312.38 sys=0.11, real=312.48 secs]
2011-08-28T02:30:04.129+0100: 361334.770: [Full GC 361334.777: [CMS: 16479144K->5767617K(49283072K), 161.5857103 secs] 17318705K->5767617K(50226816K), [CMS Perm : 44127K->43481K(524288K)], 161.5863909 secs] [Times: user=161.21 sys=0.02, real=161.59 secs]
2011-08-29T02:30:03.316+0100: 447732.838: [Full GC 447732.838: [CMS: 13471208K->6989798K(49283072K), 173.7255263 secs] 13700543K->6989798K(50226816K), [CMS Perm : 43709K->43433K(524288K)], 173.7260186 secs] [Times: user=173.48 sys=0.01, real=173.73 secs]



From tronicek at fit.cvut.cz  Mon Aug 29 23:28:53 2011
From: tronicek at fit.cvut.cz ("Zdeněk Troníček")
Date: Tue, 30 Aug 2011 08:28:53 +0200
Subject: Long "stop-the-world" pauses in CMS GC mode
In-Reply-To: <op.v0yxs0vga5r2ku@nb07-spt1.mycom-int.fr>
References: <op.v0yxs0vga5r2ku@nb07-spt1.mycom-int.fr>
Message-ID: <f0aa040d183f549012171eb080548d52.squirrel@imap.fit.cvut.cz>

Hi,

This can happen when the machine is overloaded. As for swapping, I do not
think it would show up in the sys time, because these times are the times
of the application's threads.

Z.
-- 
Zdenek Tronicek
FIT CTU in Prague


David Tavoularis wrote:
> The only suspicious thing is "[Times: user=5.67 sys=0.02, real=59.52
> secs]", which means that the "real" duration is a lot higher than "user"
> CPU time.
> Because "sys" duration is low, it also means that the server is not
> swapping.
> What could explain this 61 seconds pause ?
>


From y.s.ramakrishna at oracle.com  Mon Aug 29 23:40:04 2011
From: y.s.ramakrishna at oracle.com (Ramki Ramakrishna)
Date: Mon, 29 Aug 2011 23:40:04 -0700
Subject: Default max heap size
In-Reply-To: <CAAU=Jf58YO=cyNhG5n6of_DCSWXPhdcD5VdKLiNFuva441+qwA@mail.gmail.com>
References: <CAAU=Jf58YO=cyNhG5n6of_DCSWXPhdcD5VdKLiNFuva441+qwA@mail.gmail.com>
Message-ID: <4E5C85C4.3090600@oracle.com>

Hi Sergejs --

You are right. This seems to have been changed in hs16/6u18 via:-

6887571 Increase default heap config sizes
<http://monaco.sfbay.sun.com/detail.jsf?cr=6887571>

Changeset: http://hg.openjdk.java.net/hsx/hsx16/baseline/rev/0799687b7385

The documentation you pointed to probably dates back to 6.0 FCS and is
likely obsolete in places. Unfortunately, I do not have a more up-to-date
counterpart of the document to point you to as the one place for
consolidated and current information.

The release notes for 6u18 however did list this change here:-

http://www.oracle.com/technetwork/java/javase/6u18-142093.html

Search for "Server JVM heap configuration ergonomics".

-- ramki

On 8/29/2011 10:34 AM, Sergejs Melderis wrote:
> Hello.
> I am trying to figure out how the hotspot chooses the default maximum heap size.
> I posted this  question to stackoverflow, but got no answers.
> I don't want to repeat it here, so here is the question
> http://stackoverflow.com/questions/7194526/hotspot-default-max-heap-size
>
> I searched the jdk source code, for the place where it is calculated.
> I found function set_heap_size defined here
> http://hg.openjdk.java.net/jdk6/jdk6/hotspot/file/dc40301aed45/src/share/vm/runtime/arguments.cpp
>
> If am not wrong, the calculation happens in the following lines
>
> if (FLAG_IS_DEFAULT(MaxHeapSize)) {
>     julong reasonable_max = phys_mem / MaxRAMFraction;
>
>     if (phys_mem<= MaxHeapSize * MinRAMFraction) {
>       // Small physical memory, so use a minimum fraction of it for the heap
>       reasonable_max = phys_mem / MinRAMFraction;
>     } else {
>       // Not-small physical memory, so require a heap at least
>       // as large as MaxHeapSize
>       reasonable_max = MAX2(reasonable_max, (julong)MaxHeapSize);
>     }
>
>
> MaxRAMFraction is 4, so reasonable_max is phys_mem / 4. So, unless
> physical memory is very small,
> the reasonable_max will be MAX2(reasonable_max, (julong)MaxHeapSize);
>
> MAX2 is defined as
> #define MAX2(a, b) (((a)<  (b)) ? (b) : (a))
>
> At the end reasonable_max is set as MaxHeapSize
> FLAG_SET_ERGO(uintx, MaxHeapSize, (uintx)reasonable_max);
>
> If I plug in the memory size on my test machine, the reasonable_max
> will be very close to what I get from jmap -heap.
> With RAM of 8, 16 GB, or more, the MaxHeapSize will be greater than 1
> GB, which contradicts the documentation
> http://www.oracle.com/technetwork/java/javase/gc-tuning-6-140523.html#par_gc.ergonomics.default_size
>
> Thanks,
>
> Sergey.

From y.s.ramakrishna at oracle.com  Mon Aug 29 23:53:05 2011
From: y.s.ramakrishna at oracle.com (Ramki Ramakrishna)
Date: Mon, 29 Aug 2011 23:53:05 -0700
Subject: Long "stop-the-world" pauses in CMS GC mode
In-Reply-To: <op.v0yxs0vga5r2ku@nb07-spt1.mycom-int.fr>
References: <op.v0yxs0vga5r2ku@nb07-spt1.mycom-int.fr>
Message-ID: <4E5C88D1.9040401@oracle.com>

Hi David -- you have a 48-core machine, but you have handicapped your JVM
by forcing -XX:-CMSParallelRemarkEnabled, which forces single-threaded
remarks on such a huge heap. Please remove that flag so that all 48 cores
can be used for the remark pause (it should then disappear from your
pause-time radar entirely, I think). If "ref proc" still turns out to be a
problem, you can then enable -XX:+ParallelRefProcEnabled to parallelize
that sub-phase as well.

As to the 20 s pause in the middle of nowhere, I am clueless, but switch
on -XX:+PrintSafepointStatistics to see what that long pause corresponds
to. Perhaps some kind of bulk bias revocation; I am not sure...
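
As a sketch of the flag changes suggested above, limited to the GC-related
options and leaving the rest of the command line as it is (GC_FLAGS is a
hypothetical variable name used only for this illustration):

```shell
#!/bin/sh
# Collector selection, unchanged from the original command line.
GC_FLAGS="-XX:+UseConcMarkSweepGC -XX:+UseParNewGC"

# -XX:-CMSParallelRemarkEnabled is simply dropped, so that the remark
# pause is no longer forced to run single-threaded.

# Optional second step, only if reference processing is still slow.
GC_FLAGS="$GC_FLAGS -XX:+ParallelRefProcEnabled"

# Diagnostic output to attribute the pauses that have no GC log entry.
GC_FLAGS="$GC_FLAGS -XX:+PrintSafepointStatistics"

echo "$GC_FLAGS"
```

It is probably safest to change one thing at a time (first drop the flag,
then add the others) so that the effect of each step is visible in the logs.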

-- ramki


On 8/29/2011 2:25 AM, David Tavoularis wrote:
> Hi,
>
> I am trying to understand the cause of "stop-the-world" pauses in my 
> application using CMS GC and a large heap (48GB).
> The production server SF6900 (24 x dual-core UltraSparc-IV 1.35GHz, 48 
> working threads, 140GB RAM) is running on Solaris 9 and Java6u25.
>
> I know that there are several possible causes :
> 1) OldGen fragmentation : to avoid it, I implemented an automatic 
> FullGC in crontab at 2:30am
> 30 2 * * * /usr/jdk/instances/jdk1.6.0/bin/jmap -d64 -histo:live 
> `/usr/bin/pgrep -f "XXXXXXX"` 2>&1 >/dev/null
>
> 2) Weak refs processing : a workaround (not tried yet) is to use 
> -XX:+ParallelRefProcEnabled, as described in the following articles :
> http://blogs.oracle.com/jonthecollector/entry/top_10_gc_reasons
> http://stackoverflow.com/questions/4101540/how-can-i-lower-the-weak-ref-processing-time-during-gc
> I have found out that it could be triggered by the daily unreferencing 
> of a big object containing millions of small objects (using weak 
> references).
>
>
> The application has been running for almost a week and I can see some 
> "stop-the-world" pauses longer than 10 seconds :
> $ egrep "Total time for which application threads were stopped: [0-9][0-9]\." gc_201108232207.log
> Total time for which application threads were stopped: 10.8630158 seconds <- due to weak refs
> Total time for which application threads were stopped: 18.5259611 seconds
> Total time for which application threads were stopped: 10.0777809 seconds <- due to weak refs
> Total time for which application threads were stopped: 61.5576519 seconds
> Total time for which application threads were stopped: 19.0205127 seconds
> Total time for which application threads were stopped: 20.6893643 seconds
> Total time for which application threads were stopped: 16.0048075 seconds
> Total time for which application threads were stopped: 12.3665083 seconds <- due to weak refs
> Total time for which application threads were stopped: 11.5213443 seconds <- due to weak refs
> Total time for which application threads were stopped: 37.1018520 seconds <- due to weak refs
> Total time for which application threads were stopped: 16.3988783 seconds <- due to weak refs
> Total time for which application threads were stopped: 12.4057546 seconds
>
> 6 of them have unknown explanation for me.
>
> For your information, here are the 6 "weak refs" log messages :
> $ egrep "weak refs processing, [1-9][0-9]?" gc_201108232207.log | more
> 2011-08-24T10:13:49.641+0100: 43564.409: [GC[YG occupancy: 342791 K 
> (943744 K)]43564.410: [Rescan (non-parallel) 43564.410: [grey object 
> rescan, 0.7358794 secs]43565.146: [root rescan, 1.9033345 secs], 
> 2.6398211 secs]43567.049: [weak refs processing, 8.2148555 secs] [1 
> CMS-remark: 26914465K(49283072K)] 27257257K(50226816K), 10.8566498 
> secs] [Times: user=10.85 sys=0.00, real=10.86 secs]
> 2011-08-25T12:33:22.658+0100: 138336.194: [GC[YG occupancy: 179985 K 
> (943744 K)]138336.195: [Rescan (non-parallel) 138336.195: [grey object 
> rescan, 0.5969886 secs]138336.792: [root rescan, 0.5114118 secs], 
> 1.1089811 secs]138337.304: [weak refs processing, 8.8414246 secs] [1 
> CMS-remark: 20122279K(49283072K)] 20302264K(5226816K), 9.9514563 secs] 
> [Times: user=9.94 sys=0.01, real=9.95 secs]
> 2011-08-26T07:22:55.233+0100: 206107.887: [GC[YG occupancy: 177014 K 
> (943744 K)]206107.888: [Rescan (non-parallel) 206107.888: [grey object 
> rescan, 0.4472730 secs]206108.335: [root rescan, 1.5575365 secs], 
> 2.0053337 secs]206109.893: [weak refs processing, 10.3436973 secs] [1 
> CMS-remark: 19861286K(49283072K)] 20038301K(50226816K), 12.3572481 
> secs] [Times: user=12.22 sys=0.00, real=12.36 secs]
> 2011-08-26T07:51:55.531+0100: 207848.163: [GC[YG occupancy: 423184 K 
> (943744 K)]207848.163: [Rescan (non-parallel) 207848.163: [grey object 
> rescan, 0.4466552 secs]207848.610: [root rescan, 3.4207362 secs], 
> 3.8680060 secs]207852.031: [weak refs processing, 7.6403893 secs] [1 
> CMS-remark: 19714349K(49283072K)] 20137533K(50226816K), 11.5130922 
> secs] [Times: user=11.51 sys=0.00, real=11.51 secs]
> 2011-08-27T15:18:48.928+0100: 321060.091: [GC[YG occupancy: 711567 K 
> (943744 K)]321060.092: [Rescan (non-parallel) 321060.092: [grey object 
> rescan, 0.4628955 secs]321060.555: [root rescan, 3.2087381 secs], 
> 3.6721710 secs]321063.764: [weak refs processing, 33.3995481 secs] [1 
> CMS-remark: 19918243K(49283072K)] 20629810K(50226816K), 37.0910804 
> secs] [Times: user=37.04 sys=0.00, real=37.09 secs]
> 2011-08-28T11:17:12.144+0100: 392962.378: [GC[YG occupancy: 811576 K 
> (943744 K)]392962.378: [Rescan (non-parallel) 392962.378: [grey object 
> rescan, 0.4140054 secs]392962.793: [root rescan, 4.4323136 secs], 
> 4.8469694 secs]392967.225: [weak refs processing, 11.5384812 secs] [1 
> CMS-remark: 19819290K(49283072K)] 20630867K(50226816K), 16.3885374 
> secs] [Times: user=16.35 sys=0.01, real=16.39 secs]
>
>
> 1. Here is the first pattern : a 61-second pause, but I don't see
> any suspicious message in GC logs:
> 2011-08-24T10:24:25.748+0100: 44200.509: [GC 44200.511: [ParNew
> Desired survivor size 53673984 bytes, new threshold 1 (max 4)
> - age 1: 101879520 bytes, 101879520 total
> : 933589K->104832K(943744K), 0.3947382 secs] 
> 21369469K->20703994K(50226816K), 0.3966779 secs] [Times: user=6.43 
> sys=0.04, real=0.40 secs]
> Heap after GC invocations=1187 (full 12):
> par new generation total 943744K, used 104832K [0xfffffff353c00000, 
> 0xfffffff393c00000, 0xfffffff393c00000)
> eden space 838912K, 0% used [0xfffffff353c00000, 0xfffffff353c00000, 
> 0xfffffff386f40000)
> from space 104832K, 100% used [0xfffffff386f40000, 0xfffffff38d5a0000, 
> 0xfffffff38d5a0000)
> to space 104832K, 0% used [0xfffffff38d5a0000, 0xfffffff38d5a0000, 
> 0xfffffff393c00000)
> concurrent mark-sweep generation total 49283072K, used 20599162K 
> [0xfffffff393c00000, 0xffffffff53c00000, 0xffffffff53c00000)
> concurrent-mark-sweep perm gen total 524288K, used 42905K 
> [0xffffffff53c00000, 0xffffffff73c00000, 0xffffffff73c00000)
> }
> Total time for which application threads were stopped: 0.4110458 seconds
> Application time: 39.5906692 seconds
> {Heap before GC invocations=1187 (full 12):
> par new generation total 943744K, used 943744K [0xfffffff353c00000, 
> 0xfffffff393c00000, 0xfffffff393c00000)
> eden space 838912K, 100% used [0xfffffff353c00000, 0xfffffff386f40000, 
> 0xfffffff386f40000)
> from space 104832K, 100% used [0xfffffff386f40000, 0xfffffff38d5a0000, 
> 0xfffffff38d5a0000)
> to space 104832K, 0% used [0xfffffff38d5a0000, 0xfffffff38d5a0000, 
> 0xfffffff393c00000)
> concurrent mark-sweep generation total 49283072K, used 20599162K 
> [0xfffffff393c00000, 0xffffffff53c00000, 0xffffffff53c00000)
> concurrent-mark-sweep perm gen total 524288K, used 42905K 
> [0xffffffff53c00000, 0xffffffff73c00000, 0xffffffff73c00000)
> 2011-08-24T10:25:07.776+0100: 44242.537: [GC 44301.853: [ParNew
> Desired survivor size 53673984 bytes, new threshold 1 (max 4)
> - age 1: 99505080 bytes, 99505080 total
> : 943744K->104832K(943744K), 0.2010508 secs] 
> 21542906K->20852742K(50226816K), 0.2022636 secs] [Times: user=5.67
> sys=0.02, real=59.52 secs]
> Heap after GC invocations=1188 (full 12):
> par new generation total 943744K, used 104832K [0xfffffff353c00000, 
> 0xfffffff393c00000, 0xfffffff393c00000)
> eden space 838912K, 0% used [0xfffffff353c00000, 0xfffffff353c00000, 
> 0xfffffff386f40000)
> from space 104832K, 100% used [0xfffffff38d5a0000, 0xfffffff393c00000, 
> 0xfffffff393c00000)
> to space 104832K, 0% used [0xfffffff386f40000, 0xfffffff386f40000, 
> 0xfffffff38d5a0000)
> concurrent mark-sweep generation total 49283072K, used 20747910K 
> [0xfffffff393c00000, 0xffffffff53c00000, 0xffffffff53c00000)
> concurrent-mark-sweep perm gen total 524288K, used 42905K 
> [0xffffffff53c00000, 0xffffffff73c00000, 0xffffffff73c00000)
> }
> *Total time for which application threads were stopped: 61.5576519 
> seconds*
> Application time: 0.0245838 seconds
> Total time for which application threads were stopped: 9.8331189 seconds
> Application time: 0.0012626 seconds
> Total time for which application threads were stopped: 0.0090404 seconds
> Application time: 0.0008943 seconds
> Total time for which application threads were stopped: 0.0020415 seconds
> Application time: 0.0008181 seconds
> Total time for which application threads were stopped: 0.2338605 seconds
> Application time: 0.0018822 seconds
>
> The only suspicious thing is "[Times: user=5.67 sys=0.02, real=59.52 
> secs]", which means that the "real" duration is a lot higher than 
> "user" CPU time.
> Because "sys" duration is low, it also means that the server is not 
> swapping.
> What could explain this 61 seconds pause ?
>
>
>
> *2. Here is the second pattern : a 20-second pause, in the middle of 
> nowhere in GC logs :*
> {Heap before GC invocations=11132 (full 166):
> par new generation total 943744K, used 882686K [0xfffffff353c00000, 
> 0xfffffff393c00000, 0xfffffff393c00000)
> eden space 838912K, 100% used [0xfffffff353c00000, 0xfffffff386f40000, 
> 0xfffffff386f40000)
> from space 104832K, 41% used [0xfffffff386f40000, 0xfffffff3899ffa48, 
> 0xfffffff38d5a0000)
> to space 104832K, 0% used [0xfffffff38d5a0000, 0xfffffff38d5a0000, 
> 0xfffffff393c00000)
> concurrent mark-sweep generation total 49283072K, used 19148140K 
> [0xfffffff393c00000, 0xffffffff53c00000, 0xffffffff53c00000)
> concurrent-mark-sweep perm gen total 524288K, used 44308K 
> [0xffffffff53c00000, 0xffffffff73c00000, 0xffffffff73c00000)
> 2011-08-25T20:07:07.235+0100: 165560.417: [GC 165560.417: [ParNew
> Desired survivor size 53673984 bytes, new threshold 4 (max 4)
> - age 1: 26189384 bytes, 26189384 total
> - age 2: 1713728 bytes, 27903112 total
> : 882686K->34449K(943744K), 0.1280202 secs] 
> 20030826K->19182589K(50226816K), 0.1285927 secs] [Times: user=3.94 
> sys=0.01, real=0.13 secs]
> Heap after GC invocations=11133 (full 166):
> par new generation total 943744K, used 34449K [0xfffffff353c00000, 
> 0xfffffff393c00000, 0xfffffff393c00000)
> eden space 838912K, 0% used [0xfffffff353c00000, 0xfffffff353c00000, 
> 0xfffffff386f40000)
> from space 104832K, 32% used [0xfffffff38d5a0000, 0xfffffff38f744468, 
> 0xfffffff393c00000)
> to space 104832K, 0% used [0xfffffff386f40000, 0xfffffff386f40000, 
> 0xfffffff38d5a0000)
> concurrent mark-sweep generation total 49283072K, used 19148140K 
> [0xfffffff393c00000, 0xffffffff53c00000, 0xffffffff53c00000)
> concurrent-mark-sweep perm gen total 524288K, used 44308K 
> [0xffffffff53c00000, 0xffffffff73c00000, 0xffffffff73c00000)
> }
> Total time for which application threads were stopped: 0.1370098 seconds
> Application time: 53.6273550 seconds
> Total time for which application threads were stopped: 0.0429426 seconds
> Application time: 0.0002318 seconds
> Total time for which application threads were stopped: 0.0044294 seconds
> Application time: 0.0002250 seconds
> Total time for which application threads were stopped: 0.0016478 seconds
> Application time: 59.0926108 seconds
> Total time for which application threads were stopped: 0.0431387 seconds
> Application time: 0.0002193 seconds
> Total time for which application threads were stopped: 0.0020966 seconds
> Application time: 0.0000956 seconds
> Total time for which application threads were stopped: 0.0016358 seconds
> Application time: 60.1048190 seconds
> Total time for which application threads were stopped: 0.0481582 seconds
> Application time: 0.0002207 seconds
> Total time for which application threads were stopped: 0.0067752 seconds
> Application time: 0.0001073 seconds
> Total time for which application threads were stopped: 0.0016387 seconds
> Application time: 60.7453974 seconds
> Total time for which application threads were stopped: 0.0425995 seconds
> Application time: 0.0002457 seconds
> Total time for which application threads were stopped: 0.0019724 seconds
> Application time: 0.0001005 seconds
> Total time for which application threads were stopped: 0.0016210 seconds
> Application time: 59.0845530 seconds
> Total time for which application threads were stopped: 0.0424095 seconds
> Application time: 0.0002314 seconds
> Total time for which application threads were stopped: 0.0020107 seconds
> Application time: 0.0000959 seconds
> Total time for which application threads were stopped: 0.0015940 seconds
> Application time: 60.7994458 seconds
> Total time for which application threads were stopped: 0.0428210 seconds
> Application time: 0.0002210 seconds
> Total time for which application threads were stopped: 0.0020541 seconds
> Application time: 0.0000974 seconds
> Total time for which application threads were stopped: 0.0016126 seconds
> Application time: 59.0963098 seconds
> Total time for which application threads were stopped: 0.0592795 seconds
> Application time: 0.0002622 seconds
> Total time for which application threads were stopped: 0.0023229 seconds
> Application time: 0.0000926 seconds
> Total time for which application threads were stopped: 0.0016296 seconds
> Application time: 60.1021141 seconds
> Total time for which application threads were stopped: 0.0443986 seconds
> Application time: 0.0002462 seconds
> Total time for which application threads were stopped: 0.0021135 seconds
> Application time: 0.0001076 seconds
> Total time for which application threads were stopped: 0.0016165 seconds
> Application time: 60.0324234 seconds
> Total time for which application threads were stopped: 0.0437486 seconds
> Application time: 0.0002286 seconds
> Total time for which application threads were stopped: 0.0021017 seconds
> Application time: 0.0001073 seconds
> Total time for which application threads were stopped: 0.0016570 seconds
> Application time: 60.4613330 seconds
> Total time for which application threads were stopped: 0.0490276 seconds
> Application time: 0.0002947 seconds
> Total time for which application threads were stopped: 0.0024618 seconds
> Application time: 0.0001238 seconds
> Total time for which application threads were stopped: 0.0019863 seconds
> Application time: 59.8201422 seconds
> Total time for which application threads were stopped: 0.0455540 seconds
> Application time: 0.0003668 seconds
> Total time for which application threads were stopped: 0.0020906 seconds
> Application time: 0.0001126 seconds
> Total time for which application threads were stopped: 0.0016693 seconds
> Application time: 60.0721521 seconds
> Total time for which application threads were stopped: 0.0438111 seconds
> Application time: 0.0002660 seconds
> Total time for which application threads were stopped: 0.0019814 seconds
> Application time: 0.0001018 seconds
> Total time for which application threads were stopped: 0.0017817 seconds
> Application time: 60.0825886 seconds
> Total time for which application threads were stopped: 0.0440386 seconds
> Application time: 0.0002197 seconds
> Total time for which application threads were stopped: 0.0020655 seconds
> Application time: 0.0001093 seconds
> Total time for which application threads were stopped: 0.0016122 seconds
> Application time: 59.6628580 seconds
> Total time for which application threads were stopped: 0.0425082 seconds
> Application time: 0.0002121 seconds
> Total time for which application threads were stopped: 0.0020967 seconds
> Application time: 0.0000935 seconds
> Total time for which application threads were stopped: 0.0015909 seconds
> Application time: 60.1951548 seconds
> Total time for which application threads were stopped: 0.0432125 seconds
> Application time: 0.0002274 seconds
> Total time for which application threads were stopped: 0.0020316 seconds
> Application time: 0.0001062 seconds
> Total time for which application threads were stopped: 0.0016534 seconds
> Application time: 59.5329171 seconds
> *Total time for which application threads were stopped: 20.6893643 
> seconds*
> Application time: 0.0002839 seconds
> Total time for which application threads were stopped: 0.0076240 seconds
> Application time: 0.0002137 seconds
> Total time for which application threads were stopped: 0.0019918 seconds
> Application time: 39.4376656 seconds
> Total time for which application threads were stopped: 0.0612671 seconds
> Application time: 0.0002478 seconds
>
> Any idea ?
>
>
> Thanks in advance for your help
> -- 
> David Tavoularis
>
>
>
>
>
>
> [Annex]
> Complete GC log file gc_201108232207.log.gz available here: 
> http://dl.free.fr/gxrxlLsVS
>
> JVM command line extract :
> /usr/jdk/instances/jdk1.6.0/jre/bin/sparcv9/java 
> -Dsun.rmi.dgc.checkInterval=2000 -server -Xms49152m -Xmx49152m 
> -XX:PermSize=512m -XX:+UseConcMarkSweepGC -XX:+UseParNewGC 
> -XX:+DisableExplicitGC -XX:-CMSParallelRemarkEnabled 
> -XX:CMSInitiatingOccupancyFraction=40 -XX:NewSize=1024m 
> -XX:MaxNewSize=1024m -XX:+PrintGCDetails -XX:+PrintGCTimeStamps 
> -XX:+PrintHeapAtGC -XX:+PrintTenuringDistribution 
> -XX:+PrintGCApplicationStoppedTime 
> -XX:+PrintGCApplicationConcurrentTime -XX:+PrintGCDateStamps 
> -Xloggc:/logs/gc_201108232207.log -XX:+UseCompressedOops 
> -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/data/heapdump
>
> *$ /usr/jdk/instances/jdk1.6.0/jre/bin/sparcv9/java -version*
> java version "1.6.0_25"
> Java(TM) SE Runtime Environment (build 1.6.0_25-b06)
> Java HotSpot(TM) 64-Bit Server VM (build 20.0-b11, mixed mode)
>
> *$ /usr/sbin/prtdiag | head -3*
> System Configuration: Sun Microsystems sun4u Sun Fire E6900
> System clock frequency: 150 MHz
> Memory size: 143360 Megabytes
>
> *$ mpstat | wc -l*
> 49
>
> *$ uname -a*
> SunOS XXX 5.9 Generic_122300-05 sun4u sparc SUNW,Sun-Fire
>
> For your information, Full GC automatically triggered at 2:30am :
> *$ grep Full gc_201108232207.log*
> 2011-08-24T02:30:02.475+0100: 15737.603: [Full GC 15737.604: [CMS: 
> 11972490K->5028118K(49283072K), 137.9859661 secs] 
> 12141664K->5028118K(50226816K), [CMS Perm : 39558K->39491K(524288K)], 
> 137.9867010 secs] [Times: user=133.02 sys=4.89, real=137.99 secs]
> 2011-08-25T02:30:05.142+0100: 102139.150: [Full GC 102139.150: [CMS: 
> 18724122K->11970549K(49283072K), 433.4189517 secs] 
> 18976948K->11970549K(50226816K), [CMS Perm : 44256K->42995K(524288K)], 
> 433.4350620 secs] [Times: user=429.00 sys=3.89, real=433.44 secs]
> 2011-08-26T02:30:05.125+0100: 188538.009: [Full GC 188538.009: [CMS: 
> 15865994K->12528867K(49283072K), 477.0168566 secs] 
> 16343213K->12528867K(50226816K), [CMS Perm : 44324K->43408K(524288K)], 
> 477.0175358 secs] [Times: user=476.76 sys=0.05, real=477.02 secs]
> 2011-08-27T02:30:03.084+0100: 274934.847: [Full GC 274934.849: [CMS: 
> 14857264K->8811922K(49283072K), 312.4786042 secs] 
> 15546860K->8811922K(50226816K), [CMS Perm : 44557K->43762K(524288K)], 
> 312.4796506 secs] [Times: user=312.38 sys=0.11, real=312.48 secs]
> 2011-08-28T02:30:04.129+0100: 361334.770: [Full GC 361334.777: [CMS: 
> 16479144K->5767617K(49283072K), 161.5857103 secs] 
> 17318705K->5767617K(50226816K), [CMS Perm : 44127K->43481K(524288K)], 
> 161.5863909 secs] [Times: user=161.21 sys=0.02, real=161.59 secs]
> 2011-08-29T02:30:03.316+0100: 447732.838: [Full GC 447732.838: [CMS: 
> 13471208K->6989798K(49283072K), 173.7255263 secs] 
> 13700543K->6989798K(50226816K), [CMS Perm : 43709K->43433K(524288K)], 
> 173.7260186 secs] [Times: user=173.48 sys=0.01, real=173.73 secs]
>
>
> ------------------------------------------------------------------------
>
> This electronic message contains information from Mycom which may be 
> privileged or confidential. The information is intended to be for the 
> use of the individual(s) or entity named above. If you are not the 
> intended recipient, be aware that any disclosure, copying, 
> distribution or any other use of the contents of this information is 
> prohibited. If you have received this electronic message in error, 
> please notify us by post or telephone (to the numbers or 
> correspondence address above) or by email (at the email address above) 
> immediately.

From y.s.ramakrishna at oracle.com  Tue Aug 30 00:19:52 2011
From: y.s.ramakrishna at oracle.com (Ramki Ramakrishna)
Date: Tue, 30 Aug 2011 00:19:52 -0700
Subject: Long "stop-the-world" pauses in CMS GC mode
In-Reply-To: <4E5C88D1.9040401@oracle.com>
References: <op.v0yxs0vga5r2ku@nb07-spt1.mycom-int.fr>
	<4E5C88D1.9040401@oracle.com>
Message-ID: <4E5C8F18.2070801@oracle.com>

David, I missed one of the longer pauses that you'd specifically drawn 
attention to:-

> On 8/29/2011 2:25 AM, David Tavoularis wrote:
>> *1. Here is the first pattern : a _61-second pause_, but I don't see 
>> any suspicious message in GC logs:*
>> ...
>> 2011-08-24T10:25:07.776+0100: 44242.537: [GC 44301.853: [ParNew
>> Desired survivor size 53673984 bytes, new threshold 1 (max 4)
>> - age 1: 99505080 bytes, 99505080 total
>> : 943744K->104832K(943744K), 0.2010508 secs] 
>> 21542906K->20852742K(50226816K), 0.2022636 secs] *[Times: user=5.67 
>> sys=0.02, real=59.52 secs]*

If you look at the timestamps above, the GC event starts off at
44242.537 seconds, but the GC itself does not commence until
44301.853 seconds, i.e. a full 59.32 seconds later. So the pause is
associated not with the GC work itself (which is correctly reported as
202 ms), but rather with a preamble to the GC; my guess is that it is
the time taken to bring threads to a safepoint. Once again,
-XX:+PrintSafepointStatistics (which I mentioned in my previous email
with respect to the 20 s pause in the middle of nowhere) would likely
provide some clues. I have heard apocryphal stories of -XX:+UseMembar
having worked to get rid of overly long safepointing pauses, and of
-XX:-UseBiasedLocking helping with pauses associated with bulk bias
revocations. But without +PrintSafepointStatistics data to draw
inferences from, those incantations would just constitute superstitious
mumbo-jumbo.
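
To make the arithmetic above concrete, here is a minimal Python sketch
that pulls the two uptime stamps and the "real" time out of a ParNew
record. It is not part of any JVM tooling: the helper name pre_gc_delay
and both regular expressions are illustrative assumptions derived only
from the record quoted above, and the record is assumed to sit on one
line, as it would in the log file rather than in this wrapped email.

```python
import re

# First uptime stamp: when the GC event was logged.
# Second uptime stamp: when the ParNew collection itself began.
HEADER = re.compile(r"(\d+\.\d+): \[GC (\d+\.\d+): \[ParNew")
TIMES = re.compile(r"\[Times: user=([\d.]+) sys=([\d.]+), real=([\d.]+) secs\]")


def pre_gc_delay(record):
    """Return (seconds between the two stamps, real seconds), or None."""
    header = HEADER.search(record)
    times = TIMES.search(record)
    if header is None or times is None:
        return None
    event_start = float(header.group(1))
    gc_start = float(header.group(2))
    return gc_start - event_start, float(times.group(3))


# The record from the 61-second pause, joined onto a single line.
record = (
    "2011-08-24T10:25:07.776+0100: 44242.537: [GC 44301.853: [ParNew "
    ": 943744K->104832K(943744K), 0.2010508 secs] "
    "21542906K->20852742K(50226816K), 0.2022636 secs] "
    "[Times: user=5.67 sys=0.02, real=59.52 secs]"
)
delay, real = pre_gc_delay(record)
print(f"delay before GC work: {delay:.3f} s, real: {real:.2f} s")
```

For this record the delay between the two stamps accounts for nearly
all of the reported "real" time, which is why the collection itself
(about 0.2 s) does not look like the culprit.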

-- ramki

>> Heap after GC invocations=1188 (full 12):
>> par new generation total 943744K, used 104832K [0xfffffff353c00000, 
>> 0xfffffff393c00000, 0xfffffff393c00000)
>> eden space 838912K, 0% used [0xfffffff353c00000, 0xfffffff353c00000, 
>> 0xfffffff386f40000)
>> from space 104832K, 100% used [0xfffffff38d5a0000, 
>> 0xfffffff393c00000, 0xfffffff393c00000)
>> to space 104832K, 0% used [0xfffffff386f40000, 0xfffffff386f40000, 
>> 0xfffffff38d5a0000)
>> concurrent mark-sweep generation total 49283072K, used 20747910K 
>> [0xfffffff393c00000, 0xffffffff53c00000, 0xffffffff53c00000)
>> concurrent-mark-sweep perm gen total 524288K, used 42905K 
>> [0xffffffff53c00000, 0xffffffff73c00000, 0xffffffff73c00000)
>> }
>> *Total time for which application threads were stopped: 61.5576519 
>> seconds*
>> Application time: 0.0245838 seconds
>> Total time for which application threads were stopped: 9.8331189 seconds
>> Application time: 0.0012626 seconds
>> Total time for which application threads were stopped: 0.0090404 seconds
>> Application time: 0.0008943 seconds
>> Total time for which application threads were stopped: 0.0020415 seconds
>> Application time: 0.0008181 seconds
>> Total time for which application threads were stopped: 0.2338605 seconds
>> Application time: 0.0018822 seconds
>>
>> The only suspicious thing is "[Times: user=5.67 sys=0.02, real=59.52 
>> secs]", which means that the "real" duration is a lot higher than 
>> "user" CPU time.
>> Because "sys" duration is low, it also means that the server is not 
>> swapping.
>> What could explain this 61 seconds pause ?

From david.tavoularis at mycom-int.com  Tue Aug 30 02:52:18 2011
From: david.tavoularis at mycom-int.com (David Tavoularis)
Date: Tue, 30 Aug 2011 11:52:18 +0200
Subject: Long "stop-the-world" pauses in CMS GC mode
In-Reply-To: <4E5C8F18.2070801@oracle.com>
References: <op.v0yxs0vga5r2ku@nb07-spt1.mycom-int.fr>
	<4E5C88D1.9040401@oracle.com> <4E5C8F18.2070801@oracle.com>
Message-ID: <op.v00tpgtma5r2ku@nb07-spt1.mycom-int.fr>

Hi Ramki, Zdeněk,

Thank you for your valuable answers.

> So the pause is associated not with GC work itself (which is correctly reported as 202 ms), but rather with a
> preamble to the GC, perhaps with bringing threads to a safepoint, I am guessing.
I will ask for -XX:+PrintSafepointStatistics to be added. What output should I expect? Will it be in the GC logs or on stdout?

> you have a 48 core machine but you have handicapped your JVM by forcing -XX:-CMSParallelRemarkEnabled,
> which forces single-threaded remarks on such a huge heap.
I will ask for it to be removed and let you know.

> If "ref proc" turns out to be still a problem, you can then enable +ParallelRefProcEnabled to parallelize that sub-phase as well.
I will not activate -XX:+ParallelRefProcEnabled, because according to http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7028845, it is broken in Java6u25 and fixed in Java6u27.

> I have heard apocryphal stories of -XX:+UseMembar having worked to get rid of overly long safepointing pauses,
> and I have heard -XX:-UseBiasedLocking for pauses associated with bulk bias revocations.
Good to know, but I won't use them until I get more info from -XX:+PrintSafepointStatistics and a new analysis after removing -XX:-CMSParallelRemarkEnabled.

>> The only suspicious thing is "[Times: user=5.67 sys=0.02, real=59.52 secs]", which means that the "real" duration is a lot higher than "user" CPU time.
>> Because "sys" duration is low, it also means that the server is not swapping.
> this can happen when the machine is overloaded. And as for swapping, I think it is not involved in the sys time because these times are times of the application thread.
In my experience, when a server is swapping, the "sys" time increases a lot.
I can confirm that there is no high CPU load on the server (max CPU usage was 30% over the last 7 days) and no disk swapping (according to the vmstat "sr", i.e. scan rate, metric).
According to Ramki, I need to understand the reason for the slow safepoint operation.
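
As a first step in that direction, the long stops can be listed straight from the existing GC log. The following is a minimal Python sketch under stated assumptions: the function name long_pauses and the 1-second threshold are hypothetical choices, and each "Total time for which application threads were stopped" record is assumed to sit on a single line in the log file (the line breaks in the excerpts come from email wrapping).

```python
import re

# Line format produced by -XX:+PrintGCApplicationStoppedTime, as seen in
# the excerpts above.
STOPPED = re.compile(
    r"Total time for which application threads were stopped: ([\d.]+) seconds"
)


def long_pauses(lines, threshold_secs=1.0):
    """Yield (line_number, pause_seconds) for stops longer than the threshold."""
    for number, line in enumerate(lines, start=1):
        match = STOPPED.search(line)
        if match and float(match.group(1)) > threshold_secs:
            yield number, float(match.group(1))


# A few lines taken from the second pattern (the 20-second pause).
sample = [
    "Total time for which application threads were stopped: 0.0016534 seconds",
    "Application time: 59.5329171 seconds",
    "Total time for which application threads were stopped: 20.6893643 seconds",
    "Application time: 0.0002839 seconds",
    "Total time for which application threads were stopped: 0.0076240 seconds",
]
for number, pause in long_pauses(sample):
    print(f"line {number}: stopped for {pause:.2f} s")
```

On a real log the same generator could be fed an open file object instead of the sample list; the reported line numbers then show where to look for a surrounding GC record, or the lack of one.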

Best Regards
--
David Tavoularis
Head of L3 Support, Capacity & Dimensioning
Mycom Group




From y.s.ramakrishna at oracle.com  Tue Aug 30 09:51:08 2011
From: y.s.ramakrishna at oracle.com (Y. S. Ramakrishna)
Date: Tue, 30 Aug 2011 09:51:08 -0700
Subject: Long "stop-the-world" pauses in CMS GC mode
In-Reply-To: <op.v00tpgtma5r2ku@nb07-spt1.mycom-int.fr>
References: <op.v0yxs0vga5r2ku@nb07-spt1.mycom-int.fr>
	<4E5C88D1.9040401@oracle.com> <4E5C8F18.2070801@oracle.com>
	<op.v00tpgtma5r2ku@nb07-spt1.mycom-int.fr>
Message-ID: <4E5D14FC.5050704@oracle.com>


On 08/30/11 02:52, David Tavoularis wrote:
> Hi Ramki, Zdeněk,
> 
> Thank you for your valuable answers.
> 
> > So the pause is associated not with GC work itself (which is
> > correctly reported as 202 ms), but rather with a preamble to the
> > GC, perhaps with bringing threads to a safepoint, I am guessing.
> I will ask for -XX:+PrintSafepointStatistics to be added. What output
> should I expect? Will it be in the GC logs or on stdout?

To stdout, I believe. With a recent JVM, these data (which are
batched into a record of several entries written out together) should have
a timestamp column associated with each safepoint operation, which will
allow the data to be aligned with the GC events in the GC logs even
though the two go to different I/O streams.

> 
> > you have a 48 core machine but you have handicapped your JVM by
> > forcing -XX:-CMSParallelRemarkEnabled, which forces single-threaded
> > remarks on such a huge heap.
> I will ask for it to be removed and let you know.

Thanks. I am guessing it must be "legacy" from a (much) earlier time when
there were bugs in the CMS parallel remark.

> 
> > If "ref proc" turns out to be still a problem, you can then enable
> > +ParallelRefProcEnabled to parallelize that sub-phase as well.
> I will not activate -XX:+ParallelRefProcEnabled, because according to 
> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7028845, it is broken 
> in Java6u25 and fixed in Java6u27.

Ah yes, good catch; sorry not to have remembered, as I had fixed
that problem a while back during JDK 7 development.

> 
> > I have heard apocryphal stories of -XX:+UseMembar having worked to
> > get rid of overly long safepointing pauses,
> > and I have heard -XX:-UseBiasedLocking for pauses associated with
> > bulk bias revocations.
> Good to know, but I won't use them until I get more info from
> -XX:+PrintSafepointStatistics and a new analysis after removing
> -XX:-CMSParallelRemarkEnabled.

Sounds good.

> 
> >> The only suspicious thing is "[Times: user=5.67 sys=0.02,
> >> real=59.52 secs]", which means that the "real" duration is a lot
> >> higher than "user" CPU time.
> >> Because "sys" duration is low, it also means that the server is not
> >> swapping.
> > this can happen when the machine is overloaded. And as for swapping,
> > I think it is not involved in the sys time because these times are
> > times of the application thread.
> In my experience, when a server is swapping, the "sys" time increases
> a lot.
> I can confirm that there is no high CPU load on the server (max CPU
> usage was 30% over the last 7 days) and no disk swapping (according to
> the vmstat "sr", i.e. scan rate, metric).
> According to Ramki, I need to understand the reason for the slow
> safepoint operation.

Right; sounds like a plan.

-- ramki