From kinnari.darji at citi.com Wed Aug 3 10:45:43 2011
From: kinnari.darji at citi.com (Darji, Kinnari )
Date: Wed, 3 Aug 2011 13:45:43 -0400
Subject: understanding GC logs
Message-ID: <21ED8E3420CDB647B88C7F80A7D64DAC0691E04E82@exnjmb89.nam.nsroot.net>

Hello GC team,

What do all these different times mean? Can someone please clarify?

How long is the application stopped?

[GC 9768.668: [ParNew
Desired survivor size 10878976 bytes, new threshold 4 (max 4)
- age 1: 594288 bytes, 594288 total
- age 2: 2369912 bytes, 2964200 total
- age 3: 2877584 bytes, 5841784 total
- age 4: 3075264 bytes, 8917048 total
: 182066K->12384K(191744K), 0.0089120 secs] 2755986K->2586303K(10710272K), 0.0092180 secs] [Times: user=0.09 sys=0.00, real=0.01 secs]

Thank you
Kinnari

From y.s.ramakrishna at oracle.com Wed Aug 3 11:07:52 2011
From: y.s.ramakrishna at oracle.com (Ramki Ramakrishna)
Date: Wed, 03 Aug 2011 11:07:52 -0700
Subject: understanding GC logs
In-Reply-To: <21ED8E3420CDB647B88C7F80A7D64DAC0691E04E82@exnjmb89.nam.nsroot.net>
References: <21ED8E3420CDB647B88C7F80A7D64DAC0691E04E82@exnjmb89.nam.nsroot.net>
Message-ID: <4E398E78.3000408@oracle.com>

On 8/3/2011 10:45 AM, Darji, Kinnari wrote:
>
> Hello GC team,
>
> What do all these different times mean? Can someone please clarify?
>
> How long is the application stopped?
>
> [GC 9768.668: [ParNew
      ^^^^^^^^ JVM timestamp (seconds since start of JVM) at start of GC operation
>
> Desired survivor size 10878976 bytes, new threshold 4 (max 4)
> - age 1: 594288 bytes, 594288 total
> - age 2: 2369912 bytes, 2964200 total
> - age 3: 2877584 bytes, 5841784 total
> - age 4: 3075264 bytes, 8917048 total
> : 182066K->12384K(191744K), 0.0089120 secs]
                              ^^^^^^^^^ Duration of scavenge
> 2755986K->2586303K(10710272K), 0.0092180 secs]
                                 ^^^^^^^^^ Duration of whole GC operation (includes scavenge)
>
> [Times: user=0.09 sys=0.00, real=0.01 secs]
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
          Process virtual user and system times, and real (elapsed) time during GC operation.

The time for which the application threads were stopped is about 9.2 ms.

-- ramki

From y.s.ramakrishna at oracle.com Wed Aug 3 11:36:17 2011
From: y.s.ramakrishna at oracle.com (Ramki Ramakrishna)
Date: Wed, 03 Aug 2011 11:36:17 -0700
Subject: understanding GC logs
In-Reply-To: <21ED8E3420CDB647B88C7F80A7D64DAC0691E04F84@exnjmb89.nam.nsroot.net>
References: <21ED8E3420CDB647B88C7F80A7D64DAC0691E04E82@exnjmb89.nam.nsroot.net>
	<4E398E78.3000408@oracle.com>
	<21ED8E3420CDB647B88C7F80A7D64DAC0691E04F84@exnjmb89.nam.nsroot.net>
Message-ID: <4E399521.2080805@oracle.com>

On 8/3/2011 11:18 AM, Darji, Kinnari wrote:
>
> Thanks Ramki
>
> So if I look at the log lines starting with [GC and their real times, that
> should be roughly the application STW time. Am I correct?
>
Yes. Except that the real time in that display has a resolution of only
10 ms. (Thus the 9.2 ms showed up as real=0.01 secs, I think.)

But yes, that's the STW time.

One caveat though -- this only lists STW ops attributed to GC.
More generally, you would want to use +PrintSafepointStatistics to
see all STW operations (and details thereof), including of course the
GC ops (which are usually the most common type of STW op, but by
no means the only type).

-- ramki

From kinnari.darji at citi.com Wed Aug 3 11:43:57 2011
From: kinnari.darji at citi.com (Darji, Kinnari )
Date: Wed, 3 Aug 2011 14:43:57 -0400
Subject: understanding GC logs
In-Reply-To: <4E399521.2080805@oracle.com>
References: <21ED8E3420CDB647B88C7F80A7D64DAC0691E04E82@exnjmb89.nam.nsroot.net>
	<4E398E78.3000408@oracle.com>
	<21ED8E3420CDB647B88C7F80A7D64DAC0691E04F84@exnjmb89.nam.nsroot.net>
	<4E399521.2080805@oracle.com>
Message-ID: <21ED8E3420CDB647B88C7F80A7D64DAC0691EC735C@exnjmb89.nam.nsroot.net>

Oh, understood. Thanks, I will start using the +PrintSafepointStatistics option.

Thank you
Kinnari

From kinnari.darji at citi.com Wed Aug 3 14:40:13 2011
From: kinnari.darji at citi.com (Darji, Kinnari )
Date: Wed, 3 Aug 2011 17:40:13 -0400
Subject: understanding GC logs
In-Reply-To: <4E399521.2080805@oracle.com>
References: <21ED8E3420CDB647B88C7F80A7D64DAC0691E04E82@exnjmb89.nam.nsroot.net>
	<4E398E78.3000408@oracle.com>
	<21ED8E3420CDB647B88C7F80A7D64DAC0691E04F84@exnjmb89.nam.nsroot.net>
	<4E399521.2080805@oracle.com>
Message-ID: <21ED8E3420CDB647B88C7F80A7D64DAC0691EC78E3@exnjmb89.nam.nsroot.net>

Hi Ramki,

I'm not sure what the problem is. The process dies with the following output when I enable +PrintSafepointStatistics:

java version "1.6.0_16"
Java(TM) SE Runtime Environment (build 1.6.0_16-b01)
Java HotSpot(TM) 64-Bit Server VM (build 14.2-b01, mixed mode)

vmop_name [threads: total initially_running wait_to_block] [time: spin block sync] [vmop_time time_elapsed] page_trap_count
no vm operation [ 7 1 1] [ 0 0 0] [ 0 0] 0

Polling page always armed
0 VM operations coalesced during safepoint
Maximum sync time 0 ms

Can you please help?
Thank you
Kinnari

From y.s.ramakrishna at oracle.com Wed Aug 3 14:47:51 2011
From: y.s.ramakrishna at oracle.com (Ramki Ramakrishna)
Date: Wed, 03 Aug 2011 14:47:51 -0700
Subject: PrintSafepointStatistics (was Re: understanding GC logs)
In-Reply-To: <21ED8E3420CDB647B88C7F80A7D64DAC0691EC78E3@exnjmb89.nam.nsroot.net>
References: <21ED8E3420CDB647B88C7F80A7D64DAC0691E04E82@exnjmb89.nam.nsroot.net>
	<4E398E78.3000408@oracle.com>
	<21ED8E3420CDB647B88C7F80A7D64DAC0691E04F84@exnjmb89.nam.nsroot.net>
	<4E399521.2080805@oracle.com>
	<21ED8E3420CDB647B88C7F80A7D64DAC0691EC78E3@exnjmb89.nam.nsroot.net>
Message-ID: <4E39C207.1050500@oracle.com>

Hi Kinnari -- hs14, which you are on, is rather old (current dev is hs22;
latest public is hs21). Is it possible for you to switch to a more recent
JDK? If that's not possible, send me an hs_err file and I can get a ticket
opened for you via the usual support channels. If the problem occurs with
a recent hs21 or hs22, we can certainly take a look here. In either case,
I have modified the subject line for relevance to the issue at hand, and
also cross-posted to hotspot-runtime-dev at o.j.n, where the
PrintSafepointStatistics expertise resides.

-- ramki

From kinnari.darji at citi.com Thu Aug 4 08:01:27 2011
From: kinnari.darji at citi.com (Darji, Kinnari )
Date: Thu, 4 Aug 2011 11:01:27 -0400
Subject: PrintSafepointStatistics (was Re: understanding GC logs)
In-Reply-To: <4E39C207.1050500@oracle.com>
References: <21ED8E3420CDB647B88C7F80A7D64DAC0691E04E82@exnjmb89.nam.nsroot.net>
	<4E398E78.3000408@oracle.com>
	<21ED8E3420CDB647B88C7F80A7D64DAC0691E04F84@exnjmb89.nam.nsroot.net>
	<4E399521.2080805@oracle.com>
	<21ED8E3420CDB647B88C7F80A7D64DAC0691EC78E3@exnjmb89.nam.nsroot.net>
	<4E39C207.1050500@oracle.com>
Message-ID: <21ED8E3420CDB647B88C7F80A7D64DAC0691EC81CA@exnjmb89.nam.nsroot.net>

Hi Ramki,

I am running jdk-1.6.0_16 and Java HotSpot(TM) 64-Bit Server VM (build
14.2-b01, mixed mode). I can't change the JDK version. Is there any other
way to have this info printed in the GC logs with this JDK version?

Attaching the error file.
Thank you
Kinnari
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: HSError.txt
Url: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20110804/db804156/attachment.txt

From y.s.ramakrishna at oracle.com Thu Aug 4 10:12:20 2011
From: y.s.ramakrishna at oracle.com (Y. S. Ramakrishna)
Date: Thu, 04 Aug 2011 10:12:20 -0700
Subject: PrintSafepointStatistics (was Re: understanding GC logs)
In-Reply-To: <21ED8E3420CDB647B88C7F80A7D64DAC0691EC81CA@exnjmb89.nam.nsroot.net>
References: <21ED8E3420CDB647B88C7F80A7D64DAC0691E04E82@exnjmb89.nam.nsroot.net>
	<4E398E78.3000408@oracle.com>
	<21ED8E3420CDB647B88C7F80A7D64DAC0691E04F84@exnjmb89.nam.nsroot.net>
	<4E399521.2080805@oracle.com>
	<21ED8E3420CDB647B88C7F80A7D64DAC0691EC78E3@exnjmb89.nam.nsroot.net>
	<4E39C207.1050500@oracle.com>
	<21ED8E3420CDB647B88C7F80A7D64DAC0691EC81CA@exnjmb89.nam.nsroot.net>
Message-ID: <4E3AD2F4.6070308@oracle.com>

Hi Kinnari --

On 08/04/11 08:01, Darji, Kinnari wrote:
> Hi Ramki,
>
> I am running jdk-1.6.0_16 and Java HotSpot(TM) 64-Bit Server VM (build
> 14.2-b01, mixed mode). I can't change the JDK version. Is there any other
> way to have this info printed in the GC logs with this JDK version?

I'll get you in touch, off-list, with the support folks so they can help
open a service ticket based on your support contract for the older JDK.

As regards having this kind of info printed without recourse to
+PrintSafepointStatistics, try -XX:+PrintGCApplicationStoppedTime and
-XX:+PrintGCApplicationConcurrentTime, which should give you the times
you want, albeit with none of the finer details that
+PrintSafepointStatistics would have provided. Here's a description of
those flags from globals.hpp:-

  product(bool, PrintGCApplicationConcurrentTime, false,                    \
          "Print the time the application has been running")                \
                                                                            \
  product(bool, PrintGCApplicationStoppedTime, false,                       \
          "Print the time the application has been stopped")                \
                                                                            \

(... basically between safepoints, or at safepoints, respectively).

As regards:

> Attaching the error file.

As I understood it, you were getting a JVM crash when you used
+PrintSafepointStatistics with 6u16. In that case, the JVM would
typically dump a file named hs_err_.log in the $CWD of your invoking shell.
That's what the support folks would want (along with the core file, in
some cases). Please send the hs_err_*.log file so I can provide it to the
support folks. It is possible that someone on the runtime list might
already recognize this problem as one that has since been fixed.

-- ramki

From y.s.ramakrishna at oracle.com Thu Aug 4 11:20:33 2011
From: y.s.ramakrishna at oracle.com (Y. S. Ramakrishna)
Date: Thu, 04 Aug 2011 11:20:33 -0700
Subject: PrintSafepointStatistics (was Re: understanding GC logs)
In-Reply-To: <21ED8E3420CDB647B88C7F80A7D64DAC0691F84295@exnjmb89.nam.nsroot.net>
References: <21ED8E3420CDB647B88C7F80A7D64DAC0691E04E82@exnjmb89.nam.nsroot.net>
	<4E398E78.3000408@oracle.com>
	<21ED8E3420CDB647B88C7F80A7D64DAC0691E04F84@exnjmb89.nam.nsroot.net>
	<4E399521.2080805@oracle.com>
	<21ED8E3420CDB647B88C7F80A7D64DAC0691EC78E3@exnjmb89.nam.nsroot.net>
	<4E39C207.1050500@oracle.com>
	<21ED8E3420CDB647B88C7F80A7D64DAC0691EC81CA@exnjmb89.nam.nsroot.net>
	<4E3AD2F4.6070308@oracle.com>
	<21ED8E3420CDB647B88C7F80A7D64DAC0691F84295@exnjmb89.nam.nsroot.net>
Message-ID: <4E3AE2F1.2040401@oracle.com>

There should be a header line that tells you what each column represents.
It was in one of your earlier emails; see below. (The column header is
printed more frequently in later versions of the JVM, making
interpretation easier; see also more on that below.)
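The header maps positionally onto each data row. Each row contains three thread counts, three time figures (spin, block, sync), the VM-operation time and elapsed time, and a trailing page-trap count. As an illustration only, here is a minimal Java sketch that labels one row with those header names. The class name and the regular expression are assumptions fitted solely to the hs14 rows pasted in this thread. The sketch deliberately does not interpret the units or meaning of the time columns.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper: label the numbers in one -XX:+PrintSafepointStatistics
 * data row with the column names from the hs14-era header line:
 *   vmop_name [threads: total initially_running wait_to_block]
 *             [time: spin block sync] [vmop_time time_elapsed] page_trap_count
 */
public class SafepointStatsRow {

    static final String[] COLUMNS = {
        "total", "initially_running", "wait_to_block",   // [threads: ...]
        "spin", "block", "sync",                          // [time: ...]
        "vmop_time", "time_elapsed",                      // third bracket
        "page_trap_count"                                 // trailing, optional
    };

    // vmop_name may contain spaces ("no vm operation"); page_trap_count is
    // optional because a pasted row can be cut short.
    private static final Pattern ROW = Pattern.compile(
        "\\s*(.+?)\\s*"
        + "\\[\\s*(\\d+)\\s+(\\d+)\\s+(\\d+)\\s*\\]\\s*"
        + "\\[\\s*(\\d+)\\s+(\\d+)\\s+(\\d+)\\s*\\]\\s*"
        + "\\[\\s*(\\d+)\\s+(\\d+)\\s*\\]"
        + "(?:\\s+(\\d+))?\\s*");

    /** Returns column name -> value, or null if the line is not a data row (e.g. the header). */
    public static Map<String, String> parse(String line) {
        Matcher m = ROW.matcher(line);
        if (!m.matches()) {
            return null;
        }
        Map<String, String> row = new LinkedHashMap<String, String>();
        row.put("vmop_name", m.group(1));
        for (int i = 0; i < COLUMNS.length; i++) {
            String value = m.group(i + 2);   // groups 2..10 hold the numbers
            if (value != null) {             // null only for a missing page_trap_count
                row.put(COLUMNS[i], value);
            }
        }
        return row;
    }

    public static void main(String[] args) {
        System.out.println(parse("BulkRevokeBias [ 32 1 0] [ 0 0 0] [ 3 1905] 0"));
    }
}
```

For the BulkRevokeBias row, the resulting map's time_elapsed entry is "1905". The header line itself returns null, because none of its bracketed groups contains numbers.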
On 08/04/11 10:34, Darji, Kinnari wrote:
> Ramki,
> I tried it one more time and my process came up fine. I see the following
> logs in the console log, though I don't see anything in the verbose GC
> logs. Is that proper output? If so, how do I interpret the following logs?
>
> Deoptimize              [  8  0  0]  [  0  0  0]  [  0     0]  0
> Deoptimize              [  9  0  0]  [  0  0  0]  [  0    26]  0
> Deoptimize              [  9  0  0]  [  0  0  0]  [  0    46]  0
> Deoptimize              [ 13  0  0]  [  0  0  0]  [  0   860]  0
> Deoptimize              [ 13  0  0]  [  0  0  0]  [  0   714]  0
> Deoptimize              [ 15  0  0]  [  0  0  0]  [  0   760]  0
> Deoptimize              [ 15  0  0]  [  0  0  0]  [  0     0]  0
> Deoptimize              [ 15  0  0]  [  0  0  0]  [  1   310]  0
> GenCollectForAllocation [ 15  0  0]  [  0  0  0]  [ 27   292]  0
> Deoptimize              [ 18  0  0]  [  0  0  0]  [  0   821]  0
> EnableBiasedLocking     [ 30  0  0]  [  0  0  0]  [  1   355]  0
> BulkRevokeBias          [ 32  1  0]  [  0  0  0]  [  3  1905]  0
> RevokeBias              [ 32  0  1]  [  0  0  0]  [  3     9]  0
> RevokeBias              [ 31  0  2]  [  0  0  0]  [  0     4]  0
> RevokeBias              [ 30  0  0]  [  0  0  0]  [  1     1]  0
> RevokeBias              [ 29  0  0]  [  0  0  0]  [  2     7]  0
> RevokeBias              [ 28  0  1]  [  0  0  0]  [  2     2]  0
> BulkRevokeBias          [ 28  0  1]  [  0  0  0]  [  0     2]  0
> RevokeBias              [ 27  1  0]  [  0  0  0]  [  1     1]  0
> RevokeBias              [ 25  0  1]  [  0  0  0]  [  2     3]
>
> Thank you
> Kinnari
>

This here:-

...
>> vmop_name [threads: total initially_running wait_to_block] [time: spin block sync] [vmop_time time_elapsed] page_trap_count

Unfortunately, this is not the easiest thing to interpret if you are not
familiar with the details of the JVM safepoint protocol (it's intended for
extreme performance tuning or troubleshooting). Not only that, but because
the data is printed in "batches", it's less than easy to align it with GC
or other logging. (Later versions (hs20 or later) fixed this somewhat by
providing a JVM timestamp against each entry -- whereas above you need to
reconstruct that info from the deltas, which is a pain.)

So my advice is to use a newer JVM if you can, and if you can't, then just
rely on the less detailed, but easier to align,
+PrintGCApplication{Concurrent,Stopped}Time flags.
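If only those two flags are available, their output can be totalled with very little code. The following is a hedged sketch, not a supported tool. It assumes the flags print lines worded "Total time for which application threads were stopped: N seconds" and "Application time: N seconds". That wording is an assumption about HotSpot 6/7-era output and should be checked against your own console log. The class name StoppedTimeSummary is hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical summarizer for -XX:+PrintGCApplicationStoppedTime and
 * -XX:+PrintGCApplicationConcurrentTime output read from stdin.
 * ASSUMPTION: the message wording below; verify it against a real log.
 */
public class StoppedTimeSummary {

    private static final Pattern STOPPED = Pattern.compile(
        "Total time for which application threads were stopped: (\\d+\\.\\d+) seconds");
    private static final Pattern RUNNING = Pattern.compile(
        "Application time: (\\d+\\.\\d+) seconds");

    /** Returns {total stopped secs, total running secs, longest single stop secs}. */
    public static double[] summarize(List<String> lines) {
        double stopped = 0.0, running = 0.0, longest = 0.0;
        for (String line : lines) {
            Matcher m = STOPPED.matcher(line);   // find() tolerates prefixes on the line
            if (m.find()) {
                double secs = Double.parseDouble(m.group(1));
                stopped += secs;
                longest = Math.max(longest, secs);
                continue;
            }
            m = RUNNING.matcher(line);
            if (m.find()) {
                running += Double.parseDouble(m.group(1));
            }
        }
        return new double[] { stopped, running, longest };
    }

    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        List<String> lines = new ArrayList<String>();
        for (String line = in.readLine(); line != null; line = in.readLine()) {
            lines.add(line);
        }
        double[] s = summarize(lines);
        double wall = s[0] + s[1];
        System.out.printf("stopped=%.6fs running=%.6fs longest stop=%.6fs stopped fraction=%.4f%n",
                s[0], s[1], s[2], wall > 0 ? s[0] / wall : 0.0);
    }
}
```

For example, run java StoppedTimeSummary < console.log. The stopped fraction is only meaningful on the assumption that the two kinds of line alternate and together cover the whole run. Unlike the GC log's [GC ...] lines, these totals include every safepoint, not only GC ones.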
By the way, if the crash that you reported earlier does not in fact
happen, please make sure to tell the support engineering contacts, who
may have contacted you off-list, so they can close off any ticket they
may have opened for your report.

thanks.
-- ramki

From kinnari.darji at citi.com Thu Aug 4 11:24:46 2011
From: kinnari.darji at citi.com (Darji, Kinnari )
Date: Thu, 4 Aug 2011 14:24:46 -0400
Subject: PrintSafepointStatistics (was Re: understanding GC logs)
In-Reply-To: <4E3AE2F1.2040401@oracle.com>
References: <21ED8E3420CDB647B88C7F80A7D64DAC0691E04E82@exnjmb89.nam.nsroot.net>
	<4E398E78.3000408@oracle.com>
	<21ED8E3420CDB647B88C7F80A7D64DAC0691E04F84@exnjmb89.nam.nsroot.net>
	<4E399521.2080805@oracle.com>
	<21ED8E3420CDB647B88C7F80A7D64DAC0691EC78E3@exnjmb89.nam.nsroot.net>
	<4E39C207.1050500@oracle.com>
	<21ED8E3420CDB647B88C7F80A7D64DAC0691EC81CA@exnjmb89.nam.nsroot.net>
	<4E3AD2F4.6070308@oracle.com>
	<21ED8E3420CDB647B88C7F80A7D64DAC0691F84295@exnjmb89.nam.nsroot.net>
	<4E3AE2F1.2040401@oracle.com>
Message-ID: <21ED8E3420CDB647B88C7F80A7D64DAC0691F84463@exnjmb89.nam.nsroot.net>

Sure, I will let them know. Thanks a lot for your help. I do appreciate it.

Thank you
Kinnari
-- ramki From matt.fowles at gmail.com Fri Aug 5 10:51:31 2011 From: matt.fowles at gmail.com (Matt Fowles) Date: Fri, 5 Aug 2011 13:51:31 -0400 Subject: Log Visualization Tools Message-ID: All~ What tools do people know of or have for parsing gc logs and visualizing the results? The only thing I can find, GCViewer, (from http://www.tagtraum.com/gcviewer.html) seems like it has not been updated for a while and does not parse a lot of more complicated logs (-XX:+PrintTenuringDistribution or -XX:PrintCMSStatistics=1). Are there more tools out there? Are there in house tools that people are willing to share? Matt From eric.caspole at amd.com Fri Aug 5 12:40:38 2011 From: eric.caspole at amd.com (Eric Caspole) Date: Fri, 5 Aug 2011 15:40:38 -0400 Subject: Log Visualization Tools In-Reply-To: References: Message-ID: <956CA137-70AD-40A0-8757-56BE98A429A9@amd.com> Sometimes I use HPjmeter for plain Xloggc, but I don't think it can do the fancy extra flags either. On Aug 5, 2011, at 1:51 PM, Matt Fowles wrote: > All~ > > What tools do people know of or have for parsing gc logs and > visualizing the results? > > The only thing I can find, GCViewer, (from > http://www.tagtraum.com/gcviewer.html) seems like it has not been > updated for a while and does not parse a lot of more complicated logs > (-XX:+PrintTenuringDistribution or -XX:PrintCMSStatistics=1). > > Are there more tools out there? Are there in house tools that people > are willing to share? 
> > Matt > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > From y.s.ramakrishna at oracle.com Fri Aug 5 13:33:22 2011 From: y.s.ramakrishna at oracle.com (Ramki Ramakrishna) Date: Fri, 05 Aug 2011 13:33:22 -0700 Subject: Log Visualization Tools In-Reply-To: <4E3C5253.2050802@oracle.com> References: <956CA137-70AD-40A0-8757-56BE98A429A9@amd.com> <4E3C5253.2050802@oracle.com> Message-ID: <4E3C5392.7080906@oracle.com> Sorry for the noise: my response was sent to the wrong list in error; corrected herewith. -- ramki On 8/5/2011 1:28 PM, Ramki Ramakrishna wrote: > Same here -- i sometimes use an internal homegrown awk script to > extract the > metrics, so they can be massaged into a data file amenable to plotting > with gnuplot. > That script does not take well to the extra output either, so we > usually strip out the > extra output and deal with only the more fundamental metrics. The > extra output > from the more fancy flags has thus far been consumed only by humans or > extracted on an ad-hoc basis into > spreadsheets and such. This is clearly not a nice state of affairs. I > believe there is > work (or plans?) underway for some kind of logging framework into > which the JVM will feed > its metrics, and hopefully the tooling that consumes those logs will > be able to > deal with all these issues in a more uniform fashion once and for > all.... Unfortunately, > I have no real details of that work, though... > > Then there is gchisto which is GC-specific (but which also cannot > consume the output > from the more fancy flags), but that has been placed on the backseat > as other issues > have intervened. > In general, until GC logging formats are standardized, tools that > consume textual > output from the JVM/GC will tend to break unless changes to these text > formats are > carefully controlled.
There has been some talk on and off about trying to > standardize those formats, but I am not sure about the status of that. > Maybe the > logging framework mentioned earlier will provide a superstructure from > which such > textual standardization will result naturally. > > -- ramki > > On 8/5/2011 12:40 PM, Eric Caspole wrote: >> Sometimes I use HPjmeter for plain Xloggc, but I don't think it can >> do the fancy extra flags either. >> >> On Aug 5, 2011, at 1:51 PM, Matt Fowles wrote: >> >>> All~ >>> >>> What tools do people know of or have for parsing gc logs and >>> visualizing the results? >>> >>> The only thing I can find, GCViewer, (from >>> http://www.tagtraum.com/gcviewer.html) seems like it has not been >>> updated for a while and does not parse a lot of more complicated logs >>> (-XX:+PrintTenuringDistribution or -XX:PrintCMSStatistics=1). >>> >>> Are there more tools out there? Are there in house tools that people >>> are willing to share? >>> >>> Matt >>> _______________________________________________ >>> hotspot-gc-use mailing list >>> hotspot-gc-use at openjdk.java.net >>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>> >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use From matt.fowles at gmail.com Fri Aug 5 14:06:58 2011 From: matt.fowles at gmail.com (Matt Fowles) Date: Fri, 5 Aug 2011 17:06:58 -0400 Subject: Log Visualization Tools In-Reply-To: <4E3C5392.7080906@oracle.com> References: <956CA137-70AD-40A0-8757-56BE98A429A9@amd.com> <4E3C5253.2050802@oracle.com> <4E3C5392.7080906@oracle.com> Message-ID: Ramki~ What rules govern the upgrade of logging formats? Can the format only be changed on major releases or can we just add a flag 'new format' to minor releases?
Matt On Fri, Aug 5, 2011 at 4:33 PM, Ramki Ramakrishna wrote: > Sorry for the noise: my response was sent to the wrong list in error; > corrected herewith. > > -- ramki > > On 8/5/2011 1:28 PM, Ramki Ramakrishna wrote: >> Same here -- i sometimes use an internal homegrown awk script to >> extract the >> metrics, so they can be massaged into a data file amenable to plotting >> with gnuplot. >> That script does not take well to the extra output either, so we >> usually strip out the >> extra output and deal with only the more fundamental metrics only. The >> extra output >> from the more fancy flags has thus far been consumed only by humans or >> extracted on an ad-hoc basis into >> spreadsheets and such. This is clearly not a nice state of affairs. I >> believe there is >> work (or plans?) underway for some kind of logging framework into >> which the JVM will feed >> its metrics, and hopefully the tooling that consumes those logs will >> be able to >> deal with all these issues in a more uniform fashion once and for >> all.... Unfortunately, >> I have no real details of that work, though... >> >> Then there is gchisto which is GC-specific (but which also cannot >> consume the output >> from the more fancy flags), but that has been placed on the backseat >> as other issues >> have intervened. >> In general, until GC logging formats are standardized, tools that >> consume textual >> output from the JVM/GC will tend to break unless changes to these text >> formats are >> carefully controlled. There has been some talk on and off about trying to >> standardize those formats, but I am not sure about the status of that. >> May be the >> logging framework mentioned earlier will provide a superstructure from >> which such >> textual standardization will result naturally. >> >> -- ramki >> >> On 8/5/2011 12:40 PM, Eric Caspole wrote: >>> Sometimes I use HPjmeter for plain Xloggc, but I don't think it can >>> do the fancy extra flags either. 
>>> >>> On Aug 5, 2011, at 1:51 PM, Matt Fowles wrote: >>> >>>> All~ >>>> >>>> What tools do people know of or have for parsing gc logs and >>>> visualizing the results? >>>> >>>> The only thing I can find, GCViewer, (from >>>> http://www.tagtraum.com/gcviewer.html) seems like it has not been >>>> updated for a while and does not parse a lot of more complicated logs >>>> (-XX:+PrintTenuringDistribution or -XX:PrintCMSStatistics=1). >>>> >>>> Are there more tools out there? Are there in house tools that people >>>> are willing to share? >>>> >>>> Matt >>>> _______________________________________________ >>>> hotspot-gc-use mailing list >>>> hotspot-gc-use at openjdk.java.net >>>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>>> >>> >>> _______________________________________________ >>> hotspot-gc-use mailing list >>> hotspot-gc-use at openjdk.java.net >>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > From y.s.ramakrishna at oracle.com Fri Aug 5 14:22:46 2011 From: y.s.ramakrishna at oracle.com (Ramki Ramakrishna) Date: Fri, 05 Aug 2011 14:22:46 -0700 Subject: Log Visualization Tools In-Reply-To: References: <956CA137-70AD-40A0-8757-56BE98A429A9@amd.com> <4E3C5253.2050802@oracle.com> <4E3C5392.7080906@oracle.com> Message-ID: <4E3C5F26.9090707@oracle.com> Historically, the non-standard ("extra fancy") flag output has not been governed by any rules. The basic logging format however has basically not changed since 1.4.2, as far as i recall. Of course, each collector, typically released in a major new release, has had its little quirks even within the basic format.
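To illustrate what consuming that basic format involves, here is a minimal C++ sketch. It is not the awk script mentioned in this thread, whose contents are not shown. It pulls the numbers out of one ParNew record in the form shown at the top of this thread and prints them as whitespace-separated columns that gnuplot can read. It assumes the whole record has already been collected into one string (with -XX:+PrintTenuringDistribution a single record spans several lines). The names are hypothetical, and any change to the text layout would break the pattern, which is exactly the fragility described here.

```cpp
#include <cassert>
#include <iomanip>
#include <optional>
#include <regex>
#include <sstream>
#include <string>

// Numbers from one ParNew record in the basic text format, e.g.
// "[GC 9768.668: [ParNew ... : 182066K->12384K(191744K), 0.0089120 secs]
//  2755986K->2586303K(10710272K), 0.0092180 secs]"
struct ParNewRecord {
    double timestamp_s;    // JVM uptime printed after "[GC"
    long young_before_k, young_after_k, young_capacity_k;
    double young_pause_s;  // duration of the scavenge
    long heap_before_k, heap_after_k, heap_capacity_k;
    double total_pause_s;  // duration of the whole GC operation
};

// Hypothetical helper; returns std::nullopt if the record does not match.
// [\s\S]*? lazily skips whatever sits between "[ParNew" and the young-gen
// numbers, including multi-line tenuring-distribution output.
std::optional<ParNewRecord> parse_parnew(const std::string& record) {
    static const std::regex re(
        R"(\[GC ([0-9.]+): \[ParNew[\s\S]*?: (\d+)K->(\d+)K\((\d+)K\), ([0-9.]+) secs\] )"
        R"((\d+)K->(\d+)K\((\d+)K\), ([0-9.]+) secs\])");
    std::smatch m;
    if (!std::regex_search(record, m, re)) return std::nullopt;
    return ParNewRecord{std::stod(m[1].str()),
                        std::stol(m[2].str()), std::stol(m[3].str()), std::stol(m[4].str()),
                        std::stod(m[5].str()),
                        std::stol(m[6].str()), std::stol(m[7].str()), std::stol(m[8].str()),
                        std::stod(m[9].str())};
}

// One whitespace-separated row per record (9 columns).
std::string to_columns(const ParNewRecord& r) {
    std::ostringstream out;
    out << std::fixed << std::setprecision(3) << r.timestamp_s << ' '
        << r.young_before_k << ' ' << r.young_after_k << ' ' << r.young_capacity_k << ' '
        << std::setprecision(7) << r.young_pause_s << ' '
        << r.heap_before_k << ' ' << r.heap_after_k << ' ' << r.heap_capacity_k << ' '
        << r.total_pause_s;
    return out.str();
}
```

Writing one row per record to a file and running gnuplot's plot 'file' using 1:9 would chart the whole-GC pause against JVM uptime. A record interrupted by output the pattern does not expect simply yields no row.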
Unfortunately, the situation has not been governed by any strict rules -- although we have never knowingly introduced changes to formatting in minor releases -- or even major releases -- for fear of breaking existing log-parsing scripts, I am sure some have slipped through. In the absence of a spec for the format, QA has never written tests to protect against inadvertent regressions. (Sorry for talking like that; i know it sounds rather like a lawyer or politician talking when i read that email back! :-) I am hoping things will get better, more standardized, going forward, so people can use tools without fear of them breaking with a new release. -- ramki On 8/5/2011 2:06 PM, Matt Fowles wrote: > Ramki~ > > What rules govern the upgrade of logging formats? Can the format only > be changed on major releases or can we just add a flag 'new format' to > minor releases? > > Matt > > On Fri, Aug 5, 2011 at 4:33 PM, Ramki Ramakrishna > wrote: >> Sorry for the noise: my response was sent to the wrong list in error; >> corrected herewith. >> >> -- ramki >> >> On 8/5/2011 1:28 PM, Ramki Ramakrishna wrote: >>> Same here -- i sometimes use an internal homegrown awk script to >>> extract the >>> metrics, so they can be massaged into a data file amenable to plotting >>> with gnuplot. >>> That script does not take well to the extra output either, so we >>> usually strip out the >>> extra output and deal with only the more fundamental metrics only. The >>> extra output >>> from the more fancy flags has thus far been consumed only by humans or >>> extracted on an ad-hoc basis into >>> spreadsheets and such. This is clearly not a nice state of affairs. I >>> believe there is >>> work (or plans?) underway for some kind of logging framework into >>> which the JVM will feed >>> its metrics, and hopefully the tooling that consumes those logs will >>> be able to >>> deal with all these issues in a more uniform fashion once and for >>> all.... 
Unfortunately, >>> I have no real details of that work, though... >>> >>> Then there is gchisto which is GC-specific (but which also cannot >>> consume the output >>> from the more fancy flags), but that has been placed on the backseat >>> as other issues >>> have intervened. >>> In general, until GC logging formats are standardized, tools that >>> consume textual >>> output from the JVM/GC will tend to break unless changes to these text >>> formats are >>> carefully controlled. There has been some talk on and off about trying to >>> standardize those formats, but I am not sure about the status of that. >>> May be the >>> logging framework mentioned earlier will provide a superstructure from >>> which such >>> textual standardization will result naturally. >>> >>> -- ramki >>> >>> On 8/5/2011 12:40 PM, Eric Caspole wrote: >>>> Sometimes I use HPjmeter for plain Xloggc, but I don't think it can >>>> do the fancy extra flags either. >>>> >>>> On Aug 5, 2011, at 1:51 PM, Matt Fowles wrote: >>>> >>>>> All~ >>>>> >>>>> What tools do people know of or have for parsing gc logs and >>>>> visualizing the results? >>>>> >>>>> The only thing I can find, GCViewer, (from >>>>> http://www.tagtraum.com/gcviewer.html) seems like it has not been >>>>> updated for a while and does not parse a lot of more complicated logs >>>>> (-XX:+PrintTenuringDistribution or -XX:PrintCMSStatistics=1). >>>>> >>>>> Are there more tools out there? Are there in house tools that people >>>>> are willing to share? 
>>>>> >>>>> Matt >>>>> _______________________________________________ >>>>> hotspot-gc-use mailing list >>>>> hotspot-gc-use at openjdk.java.net >>>>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>>>> >>>> _______________________________________________ >>>> hotspot-gc-use mailing list >>>> hotspot-gc-use at openjdk.java.net >>>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> From matt.fowles at gmail.com Sat Aug 6 13:15:03 2011 From: matt.fowles at gmail.com (Matt Fowles) Date: Sat, 6 Aug 2011 16:15:03 -0400 Subject: Log Visualization Tools In-Reply-To: References: <956CA137-70AD-40A0-8757-56BE98A429A9@amd.com> Message-ID: Kirk~ I appreciate the offer. I actually already analyzed the logs in question (by hand) and was just regretting the lack of tooling in this space. I am definitely interested in using your tools (and even contributing back to them) once they come out. Thanks, Matt On Sat, Aug 6, 2011 at 3:55 PM, Charles K Pepperdine wrote: > Hi Matt, > > If you send me the GC Log I'll happily analyze it for you. I've got some tooling that is close to release. Alpha should be by end of August. > > Regards, > Kirk > > On Aug 5, 2011, at 9:40 PM, Eric Caspole wrote: > >> Sometimes I use HPjmeter for plain Xloggc, but I don't think it can >> do the fancy extra flags either. >> >> On Aug 5, 2011, at 1:51 PM, Matt Fowles wrote: >> >>> All~ >>> >>> What tools do people know of or have for parsing gc logs and >>> visualizing the results? >>> >>> The only thing I can find, GCViewer, (from >>> http://www.tagtraum.com/gcviewer.html) seems like it has not been >>> updated for a while and does not parse a lot of more complicated logs >>> (-XX:+PrintTenuringDistribution or -XX:PrintCMSStatistics=1). >>> >>> Are there more tools out there? 
Are there in house tools that people >>> are willing to share? >>> >>> Matt >>> _______________________________________________ >>> hotspot-gc-use mailing list >>> hotspot-gc-use at openjdk.java.net >>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>> >> >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > From kirk at kodewerk.com Sat Aug 6 12:55:14 2011 From: kirk at kodewerk.com (Charles K Pepperdine) Date: Sat, 6 Aug 2011 21:55:14 +0200 Subject: Log Visualization Tools In-Reply-To: <956CA137-70AD-40A0-8757-56BE98A429A9@amd.com> References: <956CA137-70AD-40A0-8757-56BE98A429A9@amd.com> Message-ID: Hi Matt, If you send me the GC Log I'll happily analyze it for you. I've got some tooling that is close to release. Alpha should be by end of August. Regards, Kirk On Aug 5, 2011, at 9:40 PM, Eric Caspole wrote: > Sometimes I use HPjmeter for plain Xloggc, but I don't think it can > do the fancy extra flags either. > > On Aug 5, 2011, at 1:51 PM, Matt Fowles wrote: > >> All~ >> >> What tools do people know of or have for parsing gc logs and >> visualizing the results? >> >> The only thing I can find, GCViewer, (from >> http://www.tagtraum.com/gcviewer.html) seems like it has not been >> updated for a while and does not parse a lot of more complicated logs >> (-XX:+PrintTenuringDistribution or -XX:PrintCMSStatistics=1). >> >> Are there more tools out there? Are there in house tools that people >> are willing to share?
>> >> Matt >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use From chsu79 at gmail.com Mon Aug 22 08:54:52 2011 From: chsu79 at gmail.com (Christian) Date: Mon, 22 Aug 2011 17:54:52 +0200 Subject: Young GC pause time definitions In-Reply-To: <92354D18-E0DC-4C02-8D61-91C076FB548D@kodewerk.com> References: <92354D18-E0DC-4C02-8D61-91C076FB548D@kodewerk.com> Message-ID: Sorry for my silence, I have been meaning to come back to this thread first when I had new information of value to report. (I'll move this discussion to hotspot-gc-use.) The bug that was fixed in 6u19 was causing some of the increased GC times that I could see. Upgrading the jdk did improve the situation. I don't have exact numbers. It would be interesting to learn how you guys with good insight to the details of the GC would use publicly available flags to get information that break down what the GC is spending time on. I'm going to be able to have remote access to their lab environment for some time at which point I could try experiments, and extract information from gc debug output. PrintTenuringDistribution is definitely something I want to enable because I'm curious about the survival rate. On Mon, Aug 22, 2011 at 17:08, Charles K Pepperdine wrote: >> >> >> The customer site is running an old jdk 1.6.0_14, with >> -XX:+UseParNewGC and -XX:UseConcMarkSweepGC. Uses a 12 G heap, a >> relatively small 512Mb new size. > > This seems like a highly suspicious configuration that I would guess is at the root of the problem. Please use -XX:+PrintTenuringDistribution and post the gc log if you can. 
> > Regards, > Kirk Pepperdine > > From lawrence.chow at oracle.com Tue Aug 23 12:00:57 2011 From: lawrence.chow at oracle.com (lawrence.chow at oracle.com) Date: Tue, 23 Aug 2011 12:00:57 -0700 (PDT) Subject: Auto Reply: hotspot-gc-use Digest, Vol 42, Issue 10 Message-ID: Lawrence Chow will be out of the office on 08/20/11 through 08/29/11. Lawrence will return to the office on Tuesday, 08/30/11. Please contact Matt.Mille at oracle.com, Terry.Statt at oracle.com, or Mary.McCarthy at oracle.com if assistance is needed from a Java collaborator in my absence. From sergey.melderis at gmail.com Mon Aug 29 10:34:42 2011 From: sergey.melderis at gmail.com (Sergejs Melderis) Date: Mon, 29 Aug 2011 13:34:42 -0400 Subject: Default max heap size Message-ID: Hello. I am trying to figure out how HotSpot chooses the default maximum heap size. I posted this question to stackoverflow, but got no answers. I don't want to repeat it here, so here is the question http://stackoverflow.com/questions/7194526/hotspot-default-max-heap-size I searched the jdk source code for the place where it is calculated. I found the function set_heap_size defined here http://hg.openjdk.java.net/jdk6/jdk6/hotspot/file/dc40301aed45/src/share/vm/runtime/arguments.cpp If I am not wrong, the calculation happens in the following lines:

  if (FLAG_IS_DEFAULT(MaxHeapSize)) {
    julong reasonable_max = phys_mem / MaxRAMFraction;
    if (phys_mem <= MaxHeapSize * MinRAMFraction) {
      // Small physical memory, so use a minimum fraction of it for the heap
      reasonable_max = phys_mem / MinRAMFraction;
    } else {
      // Not-small physical memory, so require a heap at least
      // as large as MaxHeapSize
      reasonable_max = MAX2(reasonable_max, (julong)MaxHeapSize);
    }

MaxRAMFraction is 4, so reasonable_max is phys_mem / 4. So, unless physical memory is very small, the reasonable_max will be MAX2(reasonable_max, (julong)MaxHeapSize); MAX2 is defined as #define MAX2(a, b) (((a) < (b)) ?
(b) : (a)) At the end reasonable_max is set as MaxHeapSize FLAG_SET_ERGO(uintx, MaxHeapSize, (uintx)reasonable_max); If I plug in the memory size on my test machine, the reasonable_max will be very close to what I get from jmap -heap. With RAM of 8, 16 GB, or more, the MaxHeapSize will be greater than 1 GB, which contradicts the documentation http://www.oracle.com/technetwork/java/javase/gc-tuning-6-140523.html#par_gc.ergonomics.default_size Thanks, Sergey. From david.tavoularis at mycom-int.com Mon Aug 29 02:25:38 2011 From: david.tavoularis at mycom-int.com (David Tavoularis) Date: Mon, 29 Aug 2011 11:25:38 +0200 Subject: Long "stop-the-world" pauses in CMS GC mode Message-ID: Hi, I am trying to understand the cause of "stop-the-world" pauses in my application using CMS GC and a large heap (48GB). The production server SF6900 (24 x dual-core UltraSparc-IV 1.35GHz, 48 working threads, 140GB RAM) is running on Solaris 9 and Java6u25. I know that there are several possible causes : 1) OldGen fragmentation : to avoid it, I implemented an automatic FullGC in crontab at 2:30am 30 2 * * * /usr/jdk/instances/jdk1.6.0/bin/jmap -d64 -histo:live `/usr/bin/pgrep -f "XXXXXXX"` 2>&1 >/dev/null 2) Weak refs processing : a workaround (not tried yet) is to use -XX:+ParallelRefProcEnabled, as described in the following articles : http://blogs.oracle.com/jonthecollector/entry/top_10_gc_reasons http://stackoverflow.com/questions/4101540/how-can-i-lower-the-weak-ref-processing-time-during-gc I have found out that it could be triggered by the daily unreferencing of a big object containing millions of small objects (using weak references). The application has been running for almost a week and I can see some "stop-the-world" pauses longer than 10 seconds : $ egrep "Total time for which application threads were stopped: [0-9][0-9]\." 
gc_201108232207.log
Total time for which application threads were stopped: 10.8630158 seconds <- due to weak refs
Total time for which application threads were stopped: 18.5259611 seconds
Total time for which application threads were stopped: 10.0777809 seconds <- due to weak refs
Total time for which application threads were stopped: 61.5576519 seconds
Total time for which application threads were stopped: 19.0205127 seconds
Total time for which application threads were stopped: 20.6893643 seconds
Total time for which application threads were stopped: 16.0048075 seconds
Total time for which application threads were stopped: 12.3665083 seconds <- due to weak refs
Total time for which application threads were stopped: 11.5213443 seconds <- due to weak refs
Total time for which application threads were stopped: 37.1018520 seconds <- due to weak refs
Total time for which application threads were stopped: 16.3988783 seconds <- due to weak refs
Total time for which application threads were stopped: 12.4057546 seconds

I have no explanation for the other 6. For your information, here are the 6 "weak refs" log messages:
$ egrep "weak refs processing, [1-9][0-9]?"
gc_201108232207.log | more 2011-08-24T10:13:49.641+0100: 43564.409: [GC[YG occupancy: 342791 K (943744 K)]43564.410: [Rescan (non-parallel) 43564.410: [grey object rescan, 0.7358794 secs]43565.146: [root rescan, 1.9033345 secs], 2.6398211 secs]43567.049: [weak refs processing, 8.2148555 secs] [1 CMS-remark: 26914465K(49283072K)] 27257257K(50226816K), 10.8566498 secs] [Times: user=10.85 sys=0.00, real=10.86 secs] 2011-08-25T12:33:22.658+0100: 138336.194: [GC[YG occupancy: 179985 K (943744 K)]138336.195: [Rescan (non-parallel) 138336.195: [grey object rescan, 0.5969886 secs]138336.792: [root rescan, 0.5114118 secs], 1.1089811 secs]138337.304: [weak refs processing, 8.8414246 secs] [1 CMS-remark: 20122279K(49283072K)] 20302264K(5226816K), 9.9514563 secs] [Times: user=9.94 sys=0.01, real=9.95 secs] 2011-08-26T07:22:55.233+0100: 206107.887: [GC[YG occupancy: 177014 K (943744 K)]206107.888: [Rescan (non-parallel) 206107.888: [grey object rescan, 0.4472730 secs]206108.335: [root rescan, 1.5575365 secs], 2.0053337 secs]206109.893: [weak refs processing, 10.3436973 secs] [1 CMS-remark: 19861286K(49283072K)] 20038301K(50226816K), 12.3572481 secs] [Times: user=12.22 sys=0.00, real=12.36 secs] 2011-08-26T07:51:55.531+0100: 207848.163: [GC[YG occupancy: 423184 K (943744 K)]207848.163: [Rescan (non-parallel) 207848.163: [grey object rescan, 0.4466552 secs]207848.610: [root rescan, 3.4207362 secs], 3.8680060 secs]207852.031: [weak refs processing, 7.6403893 secs] [1 CMS-remark: 19714349K(49283072K)] 20137533K(50226816K), 11.5130922 secs] [Times: user=11.51 sys=0.00, real=11.51 secs] 2011-08-27T15:18:48.928+0100: 321060.091: [GC[YG occupancy: 711567 K (943744 K)]321060.092: [Rescan (non-parallel) 321060.092: [grey object rescan, 0.4628955 secs]321060.555: [root rescan, 3.2087381 secs], 3.6721710 secs]321063.764: [weak refs processing, 33.3995481 secs] [1 CMS-remark: 19918243K(49283072K)] 20629810K(50226816K), 37.0910804 secs] [Times: user=37.04 sys=0.00, real=37.09 secs] 
2011-08-28T11:17:12.144+0100: 392962.378: [GC[YG occupancy: 811576 K (943744 K)]392962.378: [Rescan (non-parallel) 392962.378: [grey object rescan, 0.4140054 secs]392962.793: [root rescan, 4.4323136 secs], 4.8469694 secs]392967.225: [weak refs processing, 11.5384812 secs] [1 CMS-remark: 19819290K(49283072K)] 20630867K(50226816K), 16.3885374 secs] [Times: user=16.35 sys=0.01, real=16.39 secs] 1. Here is the first pattern : a 61-second pause, but I don't see any suspicious message in GC logs: 2011-08-24T10:24:25.748+0100: 44200.509: [GC 44200.511: [ParNew Desired survivor size 53673984 bytes, new threshold 1 (max 4) - age 1: 101879520 bytes, 101879520 total : 933589K->104832K(943744K), 0.3947382 secs] 21369469K->20703994K(50226816K), 0.3966779 secs] [Times: user=6.43 sys=0.04, real=0.40 secs] Heap after GC invocations=1187 (full 12): par new generation total 943744K, used 104832K [0xfffffff353c00000, 0xfffffff393c00000, 0xfffffff393c00000) eden space 838912K, 0% used [0xfffffff353c00000, 0xfffffff353c00000, 0xfffffff386f40000) from space 104832K, 100% used [0xfffffff386f40000, 0xfffffff38d5a0000, 0xfffffff38d5a0000) to space 104832K, 0% used [0xfffffff38d5a0000, 0xfffffff38d5a0000, 0xfffffff393c00000) concurrent mark-sweep generation total 49283072K, used 20599162K [0xfffffff393c00000, 0xffffffff53c00000, 0xffffffff53c00000) concurrent-mark-sweep perm gen total 524288K, used 42905K [0xffffffff53c00000, 0xffffffff73c00000, 0xffffffff73c00000) } Total time for which application threads were stopped: 0.4110458 seconds Application time: 39.5906692 seconds {Heap before GC invocations=1187 (full 12): par new generation total 943744K, used 943744K [0xfffffff353c00000, 0xfffffff393c00000, 0xfffffff393c00000) eden space 838912K, 100% used [0xfffffff353c00000, 0xfffffff386f40000, 0xfffffff386f40000) from space 104832K, 100% used [0xfffffff386f40000, 0xfffffff38d5a0000, 0xfffffff38d5a0000) to space 104832K, 0% used [0xfffffff38d5a0000, 0xfffffff38d5a0000, 0xfffffff393c00000) 
concurrent mark-sweep generation total 49283072K, used 20599162K [0xfffffff393c00000, 0xffffffff53c00000, 0xffffffff53c00000) concurrent-mark-sweep perm gen total 524288K, used 42905K [0xffffffff53c00000, 0xffffffff73c00000, 0xffffffff73c00000) 2011-08-24T10:25:07.776+0100: 44242.537: [GC 44301.853: [ParNew Desired survivor size 53673984 bytes, new threshold 1 (max 4) - age 1: 99505080 bytes, 99505080 total : 943744K->104832K(943744K), 0.2010508 secs] 21542906K->20852742K(50226816K), 0.2022636 secs] [Times: user=5.67 sys=0.02, real=59.52 secs] Heap after GC invocations=1188 (full 12): par new generation total 943744K, used 104832K [0xfffffff353c00000, 0xfffffff393c00000, 0xfffffff393c00000) eden space 838912K, 0% used [0xfffffff353c00000, 0xfffffff353c00000, 0xfffffff386f40000) from space 104832K, 100% used [0xfffffff38d5a0000, 0xfffffff393c00000, 0xfffffff393c00000) to space 104832K, 0% used [0xfffffff386f40000, 0xfffffff386f40000, 0xfffffff38d5a0000) concurrent mark-sweep generation total 49283072K, used 20747910K [0xfffffff393c00000, 0xffffffff53c00000, 0xffffffff53c00000) concurrent-mark-sweep perm gen total 524288K, used 42905K [0xffffffff53c00000, 0xffffffff73c00000, 0xffffffff73c00000) } Total time for which application threads were stopped: 61.5576519 seconds Application time: 0.0245838 seconds Total time for which application threads were stopped: 9.8331189 seconds Application time: 0.0012626 seconds Total time for which application threads were stopped: 0.0090404 seconds Application time: 0.0008943 seconds Total time for which application threads were stopped: 0.0020415 seconds Application time: 0.0008181 seconds Total time for which application threads were stopped: 0.2338605 seconds Application time: 0.0018822 seconds The only suspicious thing is "[Times: user=5.67 sys=0.02, real=59.52 secs]", which means that the "real" duration is a lot higher than "user" CPU time. Because "sys" duration is low, it also means that the server is not swapping. 
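The two details pointed out here can be checked mechanically. The following minimal C++ sketch uses hypothetical helper names and assumes the exact text layout of these logs. It extracts the [Times:] triplet and the two uptime stamps at the start of a ParNew record. For the 0.40 s collection above, (user + sys) / real comes to about 16. For the 61-second one it is below 0.1, and the stamp printed before [GC (44242.537) is about 59.3 s earlier than the one printed before [ParNew (44301.853). The sketch only measures these quantities; it does not explain where the time went.

```cpp
#include <cassert>
#include <optional>
#include <regex>
#include <string>

// user/sys/real from a "[Times: user=U sys=S, real=R secs]" suffix.
struct GcTimes { double user_s, sys_s, real_s; };

std::optional<GcTimes> parse_times(const std::string& line) {
    static const std::regex re(
        R"(\[Times: user=([0-9.]+) sys=([0-9.]+), real=([0-9.]+) secs\])");
    std::smatch m;
    if (!std::regex_search(line, m, re)) return std::nullopt;
    return GcTimes{std::stod(m[1].str()), std::stod(m[2].str()), std::stod(m[3].str())};
}

// (user + sys) / real: average number of CPUs' worth of process CPU time
// consumed per elapsed second of the GC. A value far below the number of
// GC threads suggests the process spent most of that time not running.
double cpu_per_wall_second(const GcTimes& t) {
    return t.real_s > 0.0 ? (t.user_s + t.sys_s) / t.real_s : 0.0;
}

// Difference between the uptime stamp printed just before "[GC" and the
// one printed just before "[ParNew" on the same record.
std::optional<double> gc_to_parnew_gap(const std::string& line) {
    static const std::regex re(R"(([0-9.]+): \[GC ([0-9.]+): \[ParNew)");
    std::smatch m;
    if (!std::regex_search(line, m, re)) return std::nullopt;
    return std::stod(m[2].str()) - std::stod(m[1].str());
}
```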
What could explain this 61 seconds pause ?

2. Here is the second pattern : a 20-second pause, in the middle of nowhere in GC logs :
{Heap before GC invocations=11132 (full 166):
 par new generation total 943744K, used 882686K [0xfffffff353c00000, 0xfffffff393c00000, 0xfffffff393c00000)
  eden space 838912K, 100% used [0xfffffff353c00000, 0xfffffff386f40000, 0xfffffff386f40000)
  from space 104832K, 41% used [0xfffffff386f40000, 0xfffffff3899ffa48, 0xfffffff38d5a0000)
  to space 104832K, 0% used [0xfffffff38d5a0000, 0xfffffff38d5a0000, 0xfffffff393c00000)
 concurrent mark-sweep generation total 49283072K, used 19148140K [0xfffffff393c00000, 0xffffffff53c00000, 0xffffffff53c00000)
 concurrent-mark-sweep perm gen total 524288K, used 44308K [0xffffffff53c00000, 0xffffffff73c00000, 0xffffffff73c00000)
2011-08-25T20:07:07.235+0100: 165560.417: [GC 165560.417: [ParNew
Desired survivor size 53673984 bytes, new threshold 4 (max 4)
- age 1: 26189384 bytes, 26189384 total
- age 2: 1713728 bytes, 27903112 total
: 882686K->34449K(943744K), 0.1280202 secs] 20030826K->19182589K(50226816K), 0.1285927 secs] [Times: user=3.94 sys=0.01, real=0.13 secs]
Heap after GC invocations=11133 (full 166):
 par new generation total 943744K, used 34449K [0xfffffff353c00000, 0xfffffff393c00000, 0xfffffff393c00000)
  eden space 838912K, 0% used [0xfffffff353c00000, 0xfffffff353c00000, 0xfffffff386f40000)
  from space 104832K, 32% used [0xfffffff38d5a0000, 0xfffffff38f744468, 0xfffffff393c00000)
  to space 104832K, 0% used [0xfffffff386f40000, 0xfffffff386f40000, 0xfffffff38d5a0000)
 concurrent mark-sweep generation total 49283072K, used 19148140K [0xfffffff393c00000, 0xffffffff53c00000, 0xffffffff53c00000)
 concurrent-mark-sweep perm gen total 524288K, used 44308K [0xffffffff53c00000, 0xffffffff73c00000, 0xffffffff73c00000)
}
Total time for which application threads were stopped: 0.1370098 seconds
Application time: 53.6273550 seconds
Total time for which application threads were stopped: 0.0429426 seconds
Application time: 0.0002318 seconds
Total time for which application threads were stopped: 0.0044294 seconds
Application time: 0.0002250 seconds
Total time for which application threads were stopped: 0.0016478 seconds
Application time: 59.0926108 seconds
Total time for which application threads were stopped: 0.0431387 seconds
Application time: 0.0002193 seconds
Total time for which application threads were stopped: 0.0020966 seconds
Application time: 0.0000956 seconds
Total time for which application threads were stopped: 0.0016358 seconds
Application time: 60.1048190 seconds
Total time for which application threads were stopped: 0.0481582 seconds
Application time: 0.0002207 seconds
Total time for which application threads were stopped: 0.0067752 seconds
Application time: 0.0001073 seconds
Total time for which application threads were stopped: 0.0016387 seconds
Application time: 60.7453974 seconds
Total time for which application threads were stopped: 0.0425995 seconds
Application time: 0.0002457 seconds
Total time for which application threads were stopped: 0.0019724 seconds
Application time: 0.0001005 seconds
Total time for which application threads were stopped: 0.0016210 seconds
Application time: 59.0845530 seconds
Total time for which application threads were stopped: 0.0424095 seconds
Application time: 0.0002314 seconds
Total time for which application threads were stopped: 0.0020107 seconds
Application time: 0.0000959 seconds
Total time for which application threads were stopped: 0.0015940 seconds
Application time: 60.7994458 seconds
Total time for which application threads were stopped: 0.0428210 seconds
Application time: 0.0002210 seconds
Total time for which application threads were stopped: 0.0020541 seconds
Application time: 0.0000974 seconds
Total time for which application threads were stopped: 0.0016126 seconds
Application time: 59.0963098 seconds
Total time for which application threads were stopped: 0.0592795 seconds
Application time: 0.0002622 seconds
Total time for which application threads were stopped: 0.0023229 seconds
Application time: 0.0000926 seconds
Total time for which application threads were stopped: 0.0016296 seconds
Application time: 60.1021141 seconds
Total time for which application threads were stopped: 0.0443986 seconds
Application time: 0.0002462 seconds
Total time for which application threads were stopped: 0.0021135 seconds
Application time: 0.0001076 seconds
Total time for which application threads were stopped: 0.0016165 seconds
Application time: 60.0324234 seconds
Total time for which application threads were stopped: 0.0437486 seconds
Application time: 0.0002286 seconds
Total time for which application threads were stopped: 0.0021017 seconds
Application time: 0.0001073 seconds
Total time for which application threads were stopped: 0.0016570 seconds
Application time: 60.4613330 seconds
Total time for which application threads were stopped: 0.0490276 seconds
Application time: 0.0002947 seconds
Total time for which application threads were stopped: 0.0024618 seconds
Application time: 0.0001238 seconds
Total time for which application threads were stopped: 0.0019863 seconds
Application time: 59.8201422 seconds
Total time for which application threads were stopped: 0.0455540 seconds
Application time: 0.0003668 seconds
Total time for which application threads were stopped: 0.0020906 seconds
Application time: 0.0001126 seconds
Total time for which application threads were stopped: 0.0016693 seconds
Application time: 60.0721521 seconds
Total time for which application threads were stopped: 0.0438111 seconds
Application time: 0.0002660 seconds
Total time for which application threads were stopped: 0.0019814 seconds
Application time: 0.0001018 seconds
Total time for which application threads were stopped: 0.0017817 seconds
Application time: 60.0825886 seconds
Total time for which application threads were stopped: 0.0440386 seconds
Application time: 0.0002197 seconds
Total time for which application threads were stopped: 0.0020655 seconds
Application time: 0.0001093 seconds
Total time for which application threads were stopped: 0.0016122 seconds
Application time: 59.6628580 seconds
Total time for which application threads were stopped: 0.0425082 seconds
Application time: 0.0002121 seconds
Total time for which application threads were stopped: 0.0020967 seconds
Application time: 0.0000935 seconds
Total time for which application threads were stopped: 0.0015909 seconds
Application time: 60.1951548 seconds
Total time for which application threads were stopped: 0.0432125 seconds
Application time: 0.0002274 seconds
Total time for which application threads were stopped: 0.0020316 seconds
Application time: 0.0001062 seconds
Total time for which application threads were stopped: 0.0016534 seconds
Application time: 59.5329171 seconds
Total time for which application threads were stopped: 20.6893643 seconds
Application time: 0.0002839 seconds
Total time for which application threads were stopped: 0.0076240 seconds
Application time: 0.0002137 seconds
Total time for which application threads were stopped: 0.0019918 seconds
Application time: 39.4376656 seconds
Total time for which application threads were stopped: 0.0612671 seconds
Application time: 0.0002478 seconds

Any idea ?
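For readers working through a log like the one above: the single long stop can be pulled out mechanically instead of by eye, much like the egrep David used to list pauses over 10 seconds. The following is a minimal Java sketch (Java 7 or later), assuming the exact "Total time for which application threads were stopped: N seconds" wording that -XX:+PrintGCApplicationStoppedTime prints in this JDK 6 build. The class LongStopScanner, its method and the 10-second default threshold are illustrative choices, not part of any JDK tool. It searches the whole text rather than line by line, so it also works on a copy of the log whose line breaks were lost.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: lists the stop-the-world pauses above a threshold
// in a log written with -XX:+PrintGCApplicationStoppedTime (JDK 6 wording).
public class LongStopScanner {
    private static final Pattern STOPPED = Pattern.compile(
            "Total time for which application threads were stopped: ([0-9]+\\.[0-9]+) seconds");

    // Returns every stop duration, in seconds, that is >= thresholdSecs,
    // in the order the stops appear in the log text.
    static List<Double> longStops(CharSequence log, double thresholdSecs) {
        List<Double> result = new ArrayList<>();
        Matcher m = STOPPED.matcher(log);
        while (m.find()) {
            double secs = Double.parseDouble(m.group(1));
            if (secs >= thresholdSecs) {
                result.add(secs);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java LongStopScanner <gc-log> [threshold-seconds]");
            return;
        }
        String log = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        double threshold = args.length > 1 ? Double.parseDouble(args[1]) : 10.0;
        for (double secs : longStops(log, threshold)) {
            System.out.printf("stopped for %.3f s%n", secs);
        }
    }
}
```

Run on the excerpt above with the default threshold, it would report only the 20.689 s stop. Note that it only says how long the threads were stopped, not why; the stopped-time lines carry no cause.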
Thanks in advance for your help
--
David Tavoularis

[Annex]
Complete GC log file gc_201108232207.log.gz available here: http://dl.free.fr/gxrxlLsVS

JVM command line extract :
/usr/jdk/instances/jdk1.6.0/jre/bin/sparcv9/java -Dsun.rmi.dgc.checkInterval=2000 -server -Xms49152m -Xmx49152m -XX:PermSize=512m -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+DisableExplicitGC -XX:-CMSParallelRemarkEnabled -XX:CMSInitiatingOccupancyFraction=40 -XX:NewSize=1024m -XX:MaxNewSize=1024m -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime -XX:+PrintGCDateStamps -Xloggc:/logs/gc_201108232207.log -XX:+UseCompressedOops -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/data/heapdump

$ /usr/jdk/instances/jdk1.6.0/jre/bin/sparcv9/java -version
java version "1.6.0_25"
Java(TM) SE Runtime Environment (build 1.6.0_25-b06)
Java HotSpot(TM) 64-Bit Server VM (build 20.0-b11, mixed mode)

$ /usr/sbin/prtdiag | head -3
System Configuration: Sun Microsystems sun4u Sun Fire E6900
System clock frequency: 150 MHz
Memory size: 143360 Megabytes

$ mpstat | wc -l
49

$ uname -a
SunOS XXX 5.9 Generic_122300-05 sun4u sparc SUNW,Sun-Fire

For your information, Full GC automatically triggered at 2:30am :
$ grep Full gc_201108232207.log
2011-08-24T02:30:02.475+0100: 15737.603: [Full GC 15737.604: [CMS: 11972490K->5028118K(49283072K), 137.9859661 secs] 12141664K->5028118K(50226816K), [CMS Perm : 39558K->39491K(524288K)], 137.9867010 secs] [Times: user=133.02 sys=4.89, real=137.99 secs]
2011-08-25T02:30:05.142+0100: 102139.150: [Full GC 102139.150: [CMS: 18724122K->11970549K(49283072K), 433.4189517 secs] 18976948K->11970549K(50226816K), [CMS Perm : 44256K->42995K(524288K)], 433.4350620 secs] [Times: user=429.00 sys=3.89, real=433.44 secs]
2011-08-26T02:30:05.125+0100: 188538.009: [Full GC 188538.009: [CMS: 15865994K->12528867K(49283072K), 477.0168566 secs] 16343213K->12528867K(50226816K), [CMS Perm : 44324K->43408K(524288K)], 477.0175358 secs] [Times: user=476.76 sys=0.05, real=477.02 secs]
2011-08-27T02:30:03.084+0100: 274934.847: [Full GC 274934.849: [CMS: 14857264K->8811922K(49283072K), 312.4786042 secs] 15546860K->8811922K(50226816K), [CMS Perm : 44557K->43762K(524288K)], 312.4796506 secs] [Times: user=312.38 sys=0.11, real=312.48 secs]
2011-08-28T02:30:04.129+0100: 361334.770: [Full GC 361334.777: [CMS: 16479144K->5767617K(49283072K), 161.5857103 secs] 17318705K->5767617K(50226816K), [CMS Perm : 44127K->43481K(524288K)], 161.5863909 secs] [Times: user=161.21 sys=0.02, real=161.59 secs]
2011-08-29T02:30:03.316+0100: 447732.838: [Full GC 447732.838: [CMS: 13471208K->6989798K(49283072K), 173.7255263 secs] 13700543K->6989798K(50226816K), [CMS Perm : 43709K->43433K(524288K)], 173.7260186 secs] [Times: user=173.48 sys=0.01, real=173.73 secs]

________________________________
This electronic message contains information from Mycom which may be privileged or confidential. The information is intended to be for the use of the individual(s) or entity named above. If you are not the intended recipient, be aware that any disclosure, copying, distribution or any other use of the contents of this information is prohibited. If you have received this electronic message in error, please notify us by post or telephone (to the numbers or correspondence address above) or by email (at the email address above) immediately.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20110829/1b93fd8b/attachment-0001.html

From tronicek at fit.cvut.cz Mon Aug 29 23:28:53 2011
From: tronicek at fit.cvut.cz (Zdeněk Troníček)
Date: Tue, 30 Aug 2011 08:28:53 +0200
Subject: Long "stop-the-world" pauses in CMS GC mode
In-Reply-To: 
References: 
Message-ID: 

Hi,

this can happen when the machine is overloaded.
And as for swapping, I think it is not involved in the sys time because these times are times of the application thread.

Z.

--
Zdenek Tronicek
FIT CTU in Prague

David Tavoularis wrote:
> The only suspicious thing is "[Times: user=5.67 sys=0.02, real=59.52
> secs]", which means that the "real" duration is a lot higher than "user"
> CPU time.
> Because "sys" duration is low, it also means that the server is not
> swapping.
> What could explain this 61 seconds pause ?
>

From y.s.ramakrishna at oracle.com Mon Aug 29 23:40:04 2011
From: y.s.ramakrishna at oracle.com (Ramki Ramakrishna)
Date: Mon, 29 Aug 2011 23:40:04 -0700
Subject: Default max heap size
In-Reply-To: 
References: 
Message-ID: <4E5C85C4.3090600@oracle.com>

Hi Sergejs --

You are right. This seems to have been changed in hs16/6u18 via:

6887571 Increase default heap config sizes
Changeset: http://hg.openjdk.java.net/hsx/hsx16/baseline/rev/0799687b7385

The documentation you pointed to probably dates back to 6.0 FCS and is likely obsolete in places. Unfortunately, I do not have a more up-to-date counterpart of that document to point you to as the one place for consolidated, current information. The release notes for 6u18 did, however, list this change here:

http://www.oracle.com/technetwork/java/javase/6u18-142093.html

Search for "Server JVM heap configuration ergonomics".

-- ramki

On 8/29/2011 10:34 AM, Sergejs Melderis wrote:
> Hello.
> I am trying to figure out how the hotspot chooses the default maximum heap size.
> I posted this question to stackoverflow, but got no answers.
> I don't want to repeat it here, so here is the question
> http://stackoverflow.com/questions/7194526/hotspot-default-max-heap-size
>
> I searched the jdk source code, for the place where it is calculated.
> I found function set_heap_size defined here
> http://hg.openjdk.java.net/jdk6/jdk6/hotspot/file/dc40301aed45/src/share/vm/runtime/arguments.cpp
>
> If I am not wrong, the calculation happens in the following lines
>
>   if (FLAG_IS_DEFAULT(MaxHeapSize)) {
>     julong reasonable_max = phys_mem / MaxRAMFraction;
>
>     if (phys_mem <= MaxHeapSize * MinRAMFraction) {
>       // Small physical memory, so use a minimum fraction of it for the heap
>       reasonable_max = phys_mem / MinRAMFraction;
>     } else {
>       // Not-small physical memory, so require a heap at least
>       // as large as MaxHeapSize
>       reasonable_max = MAX2(reasonable_max, (julong)MaxHeapSize);
>     }
>
>
> MaxRAMFraction is 4, so reasonable_max is phys_mem / 4. So, unless
> physical memory is very small,
> the reasonable_max will be MAX2(reasonable_max, (julong)MaxHeapSize);
>
> MAX2 is defined as
> #define MAX2(a, b) (((a) < (b)) ? (b) : (a))
>
> At the end reasonable_max is set as MaxHeapSize
> FLAG_SET_ERGO(uintx, MaxHeapSize, (uintx)reasonable_max);
>
> If I plug in the memory size on my test machine, the reasonable_max
> will be very close to what I get from jmap -heap.
> With RAM of 8, 16 GB, or more, the MaxHeapSize will be greater than 1
> GB, which contradicts the documentation
> http://www.oracle.com/technetwork/java/javase/gc-tuning-6-140523.html#par_gc.ergonomics.default_size
>
> Thanks,
>
> Sergey.
> _______________________________________________
> hotspot-gc-use mailing list
> hotspot-gc-use at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20110829/a1011386/attachment.html

From y.s.ramakrishna at oracle.com Mon Aug 29 23:53:05 2011
From: y.s.ramakrishna at oracle.com (Ramki Ramakrishna)
Date: Mon, 29 Aug 2011 23:53:05 -0700
Subject: Long "stop-the-world" pauses in CMS GC mode
In-Reply-To: 
References: 
Message-ID: <4E5C88D1.9040401@oracle.com>

Hi David --

You have a 48-core machine, but you have handicapped your JVM by forcing -XX:-CMSParallelRemarkEnabled, which forces single-threaded remarks on such a huge heap. Please remove that flag so that all 48 cores can be used for the remark pause (it should then disappear from your pause-time radar entirely, I think).

If "ref proc" still turns out to be a problem, you can then enable -XX:+ParallelRefProcEnabled to parallelize that sub-phase as well.

As to the 20 s pause in the middle of nowhere, I am clueless, but switch on -XX:+PrintSafepointStatistics to see what that long pause corresponds to. Perhaps it is some kind of bulk bias revocation; I am not sure...

-- ramki

On 8/29/2011 2:25 AM, David Tavoularis wrote:
> Hi,
>
> I am trying to understand the cause of "stop-the-world" pauses in my
> application using CMS GC and a large heap (48GB).
> The production server SF6900 (24 x dual-core UltraSparc-IV 1.35GHz, 48
> working threads, 140GB RAM) is running on Solaris 9 and Java6u25.
> > I know that there are several possible causes : > 1) OldGen fragmentation : to avoid it, I implemented an automatic > FullGC in crontab at 2:30am > 30 2 * * * /usr/jdk/instances/jdk1.6.0/bin/jmap -d64 -histo:live > `/usr/bin/pgrep -f "XXXXXXX"` 2>&1 >/dev/null > > 2) Weak refs processing : a workaround (not tried yet) is to use > -XX:+ParallelRefProcEnabled, as described in the following articles : > http://blogs.oracle.com/jonthecollector/entry/top_10_gc_reasons > http://stackoverflow.com/questions/4101540/how-can-i-lower-the-weak-ref-processing-time-during-gc > I have found out that it could be triggered by the daily unreferencing > of a big object containing millions of small objects (using weak > references). > > > The application has been running for almost a week and I can see some > "stop-the-world" pauses longer than 10 seconds : > *$ egrep "Total time for which application threads were stopped: > [0-9][0-9]\." gc_201108232207.log* > Total time for which application threads were stopped: *10*.8630158 > seconds *<- due to weak refs* > Total time for which application threads were stopped: *18*.5259611 > seconds > Total time for which application threads were stopped: *10*.0777809 > seconds *<- due to weak refs* > Total time for which application threads were stopped: *61*.5576519 > seconds > Total time for which application threads were stopped: *19*.0205127 > seconds > Total time for which application threads were stopped: *20*.6893643 > seconds > Total time for which application threads were stopped: *16*.0048075 > seconds > Total time for which application threads were stopped: *12*.3665083 > seconds *<- due to weak refs* > Total time for which application threads were stopped: *11*.5213443 > seconds *<- due to weak refs* > Total time for which application threads were stopped: *37*.1018520 > seconds *<- due to weak refs* > Total time for which application threads were stopped: *16*.3988783 > seconds *<- due to weak refs* > Total time for which 
application threads were stopped: *12*.4057546 > seconds > > 6 of them have unknown explanation for me. > > For your information, here are the 6 "weak refs" log messages : > $ egrep "weak refs processing, [1-9][0-9]?" gc_201108232207.log | more > 2011-08-24T10:13:49.641+0100: 43564.409: [GC[YG occupancy: 342791 K > (943744 K)]43564.410: [Rescan (non-parallel) 43564.410: [grey object > rescan, 0.7358794 secs]43565.146: [root rescan, 1.9033345 secs], > 2.6398211 secs]43567.049: [weak refs processing, 8.2148555 secs] [1 > CMS-remark: 26914465K(49283072K)] 27257257K(50226816K), 10.8566498 > secs] *[Times: user=10.85 sys=0.00, real=10.86 secs]* > 2011-08-25T12:33:22.658+0100: 138336.194: [GC[YG occupancy: 179985 K > (943744 K)]138336.195: [Rescan (non-parallel) 138336.195: [grey object > rescan, 0.5969886 secs]138336.792: [root rescan, 0.5114118 secs], > 1.1089811 secs]138337.304: [weak refs processing, 8.8414246 secs] [1 > CMS-remark: 20122279K(49283072K)] 20302264K(5226816K), 9.9514563 secs] > *[Times: user=9.94 sys=0.01, real=9.95 secs]* > 2011-08-26T07:22:55.233+0100: 206107.887: [GC[YG occupancy: 177014 K > (943744 K)]206107.888: [Rescan (non-parallel) 206107.888: [grey object > rescan, 0.4472730 secs]206108.335: [root rescan, 1.5575365 secs], > 2.0053337 secs]206109.893: [weak refs processing, 10.3436973 secs] [1 > CMS-remark: 19861286K(49283072K)] 20038301K(50226816K), 12.3572481 > secs] *[Times: user=12.22 sys=0.00, real=12.36 secs]* > 2011-08-26T07:51:55.531+0100: 207848.163: [GC[YG occupancy: 423184 K > (943744 K)]207848.163: [Rescan (non-parallel) 207848.163: [grey object > rescan, 0.4466552 secs]207848.610: [root rescan, 3.4207362 secs], > 3.8680060 secs]207852.031: [weak refs processing, 7.6403893 secs] [1 > CMS-remark: 19714349K(49283072K)] 20137533K(50226816K), 11.5130922 > secs] *[Times: user=11.51 sys=0.00, real=11.51 secs]* > 2011-08-27T15:18:48.928+0100: 321060.091: [GC[YG occupancy: 711567 K > (943744 K)]321060.092: [Rescan (non-parallel) 321060.092: 
[grey object > rescan, 0.4628955 secs]321060.555: [root rescan, 3.2087381 secs], > 3.6721710 secs]321063.764: [weak refs processing, 33.3995481 secs] [1 > CMS-remark: 19918243K(49283072K)] 20629810K(50226816K), 37.0910804 > secs] *[Times: user=37.04 sys=0.00, real=37.09 secs]* > 2011-08-28T11:17:12.144+0100: 392962.378: [GC[YG occupancy: 811576 K > (943744 K)]392962.378: [Rescan (non-parallel) 392962.378: [grey object > rescan, 0.4140054 secs]392962.793: [root rescan, 4.4323136 secs], > 4.8469694 secs]392967.225: [weak refs processing, 11.5384812 secs] [1 > CMS-remark: 19819290K(49283072K)] 20630867K(50226816K), 16.3885374 > secs] *[Times: user=16.35 sys=0.01, real=16.39 secs]* > > > > > > > *1. Here is the first pattern : a _61-second pause_, but I don't see > any suspicious message in GC logs:* > 2011-08-24T10:24:25.748+0100: 44200.509: [GC 44200.511: [ParNew > Desired survivor size 53673984 bytes, new threshold 1 (max 4) > - age 1: 101879520 bytes, 101879520 total > : 933589K->104832K(943744K), 0.3947382 secs] > 21369469K->20703994K(50226816K), 0.3966779 secs] [Times: user=6.43 > sys=0.04, real=0.40 secs] > Heap after GC invocations=1187 (full 12): > par new generation total 943744K, used 104832K [0xfffffff353c00000, > 0xfffffff393c00000, 0xfffffff393c00000) > eden space 838912K, 0% used [0xfffffff353c00000, 0xfffffff353c00000, > 0xfffffff386f40000) > from space 104832K, 100% used [0xfffffff386f40000, 0xfffffff38d5a0000, > 0xfffffff38d5a0000) > to space 104832K, 0% used [0xfffffff38d5a0000, 0xfffffff38d5a0000, > 0xfffffff393c00000) > concurrent mark-sweep generation total 49283072K, used 20599162K > [0xfffffff393c00000, 0xffffffff53c00000, 0xffffffff53c00000) > concurrent-mark-sweep perm gen total 524288K, used 42905K > [0xffffffff53c00000, 0xffffffff73c00000, 0xffffffff73c00000) > } > Total time for which application threads were stopped: 0.4110458 seconds > Application time: 39.5906692 seconds > {Heap before GC invocations=1187 (full 12): > par new generation 
total 943744K, used 943744K [0xfffffff353c00000, > 0xfffffff393c00000, 0xfffffff393c00000) > eden space 838912K, 100% used [0xfffffff353c00000, 0xfffffff386f40000, > 0xfffffff386f40000) > from space 104832K, 100% used [0xfffffff386f40000, 0xfffffff38d5a0000, > 0xfffffff38d5a0000) > to space 104832K, 0% used [0xfffffff38d5a0000, 0xfffffff38d5a0000, > 0xfffffff393c00000) > concurrent mark-sweep generation total 49283072K, used 20599162K > [0xfffffff393c00000, 0xffffffff53c00000, 0xffffffff53c00000) > concurrent-mark-sweep perm gen total 524288K, used 42905K > [0xffffffff53c00000, 0xffffffff73c00000, 0xffffffff73c00000) > 2011-08-24T10:25:07.776+0100: 44242.537: [GC 44301.853: [ParNew > Desired survivor size 53673984 bytes, new threshold 1 (max 4) > - age 1: 99505080 bytes, 99505080 total > : 943744K->104832K(943744K), 0.2010508 secs] > 21542906K->20852742K(50226816K), 0.2022636 secs] *[Times: user=5.67 > sys=0.02, real=59.52 secs]* > Heap after GC invocations=1188 (full 12): > par new generation total 943744K, used 104832K [0xfffffff353c00000, > 0xfffffff393c00000, 0xfffffff393c00000) > eden space 838912K, 0% used [0xfffffff353c00000, 0xfffffff353c00000, > 0xfffffff386f40000) > from space 104832K, 100% used [0xfffffff38d5a0000, 0xfffffff393c00000, > 0xfffffff393c00000) > to space 104832K, 0% used [0xfffffff386f40000, 0xfffffff386f40000, > 0xfffffff38d5a0000) > concurrent mark-sweep generation total 49283072K, used 20747910K > [0xfffffff393c00000, 0xffffffff53c00000, 0xffffffff53c00000) > concurrent-mark-sweep perm gen total 524288K, used 42905K > [0xffffffff53c00000, 0xffffffff73c00000, 0xffffffff73c00000) > } > *Total time for which application threads were stopped: 61.5576519 > seconds* > Application time: 0.0245838 seconds > Total time for which application threads were stopped: 9.8331189 seconds > Application time: 0.0012626 seconds > Total time for which application threads were stopped: 0.0090404 seconds > Application time: 0.0008943 seconds > Total time for 
which application threads were stopped: 0.0020415 seconds > Application time: 0.0008181 seconds > Total time for which application threads were stopped: 0.2338605 seconds > Application time: 0.0018822 seconds > > The only suspicious thing is "[Times: user=5.67 sys=0.02, real=59.52 > secs]", which means that the "real" duration is a lot higher than > "user" CPU time. > Because "sys" duration is low, it also means that the server is not > swapping. > What could explain this 61 seconds pause ? > > > > *2. Here is the second pattern : a 20-second pause, in the middle of > nowhere in GC logs :* > {Heap before GC invocations=11132 (full 166): > par new generation total 943744K, used 882686K [0xfffffff353c00000, > 0xfffffff393c00000, 0xfffffff393c00000) > eden space 838912K, 100% used [0xfffffff353c00000, 0xfffffff386f40000, > 0xfffffff386f40000) > from space 104832K, 41% used [0xfffffff386f40000, 0xfffffff3899ffa48, > 0xfffffff38d5a0000) > to space 104832K, 0% used [0xfffffff38d5a0000, 0xfffffff38d5a0000, > 0xfffffff393c00000) > concurrent mark-sweep generation total 49283072K, used 19148140K > [0xfffffff393c00000, 0xffffffff53c00000, 0xffffffff53c00000) > concurrent-mark-sweep perm gen total 524288K, used 44308K > [0xffffffff53c00000, 0xffffffff73c00000, 0xffffffff73c00000) > 2011-08-25T20:07:07.235+0100: 165560.417: [GC 165560.417: [ParNew > Desired survivor size 53673984 bytes, new threshold 4 (max 4) > - age 1: 26189384 bytes, 26189384 total > - age 2: 1713728 bytes, 27903112 total > : 882686K->34449K(943744K), 0.1280202 secs] > 20030826K->19182589K(50226816K), 0.1285927 secs] [Times: user=3.94 > sys=0.01, real=0.13 secs] > Heap after GC invocations=11133 (full 166): > par new generation total 943744K, used 34449K [0xfffffff353c00000, > 0xfffffff393c00000, 0xfffffff393c00000) > eden space 838912K, 0% used [0xfffffff353c00000, 0xfffffff353c00000, > 0xfffffff386f40000) > from space 104832K, 32% used [0xfffffff38d5a0000, 0xfffffff38f744468, > 0xfffffff393c00000) > to 
space 104832K, 0% used [0xfffffff386f40000, 0xfffffff386f40000, > 0xfffffff38d5a0000) > concurrent mark-sweep generation total 49283072K, used 19148140K > [0xfffffff393c00000, 0xffffffff53c00000, 0xffffffff53c00000) > concurrent-mark-sweep perm gen total 524288K, used 44308K > [0xffffffff53c00000, 0xffffffff73c00000, 0xffffffff73c00000) > } > Total time for which application threads were stopped: 0.1370098 seconds > Application time: 53.6273550 seconds > Total time for which application threads were stopped: 0.0429426 seconds > Application time: 0.0002318 seconds > Total time for which application threads were stopped: 0.0044294 seconds > Application time: 0.0002250 seconds > Total time for which application threads were stopped: 0.0016478 seconds > Application time: 59.0926108 seconds > Total time for which application threads were stopped: 0.0431387 seconds > Application time: 0.0002193 seconds > Total time for which application threads were stopped: 0.0020966 seconds > Application time: 0.0000956 seconds > Total time for which application threads were stopped: 0.0016358 seconds > Application time: 60.1048190 seconds > Total time for which application threads were stopped: 0.0481582 seconds > Application time: 0.0002207 seconds > Total time for which application threads were stopped: 0.0067752 seconds > Application time: 0.0001073 seconds > Total time for which application threads were stopped: 0.0016387 seconds > Application time: 60.7453974 seconds > Total time for which application threads were stopped: 0.0425995 seconds > Application time: 0.0002457 seconds > Total time for which application threads were stopped: 0.0019724 seconds > Application time: 0.0001005 seconds > Total time for which application threads were stopped: 0.0016210 seconds > Application time: 59.0845530 seconds > Total time for which application threads were stopped: 0.0424095 seconds > Application time: 0.0002314 seconds > Total time for which application threads were stopped: 0.0020107 
seconds > Application time: 0.0000959 seconds > Total time for which application threads were stopped: 0.0015940 seconds > Application time: 60.7994458 seconds > Total time for which application threads were stopped: 0.0428210 seconds > Application time: 0.0002210 seconds > Total time for which application threads were stopped: 0.0020541 seconds > Application time: 0.0000974 seconds > Total time for which application threads were stopped: 0.0016126 seconds > Application time: 59.0963098 seconds > Total time for which application threads were stopped: 0.0592795 seconds > Application time: 0.0002622 seconds > Total time for which application threads were stopped: 0.0023229 seconds > Application time: 0.0000926 seconds > Total time for which application threads were stopped: 0.0016296 seconds > Application time: 60.1021141 seconds > Total time for which application threads were stopped: 0.0443986 seconds > Application time: 0.0002462 seconds > Total time for which application threads were stopped: 0.0021135 seconds > Application time: 0.0001076 seconds > Total time for which application threads were stopped: 0.0016165 seconds > Application time: 60.0324234 seconds > Total time for which application threads were stopped: 0.0437486 seconds > Application time: 0.0002286 seconds > Total time for which application threads were stopped: 0.0021017 seconds > Application time: 0.0001073 seconds > Total time for which application threads were stopped: 0.0016570 seconds > Application time: 60.4613330 seconds > Total time for which application threads were stopped: 0.0490276 seconds > Application time: 0.0002947 seconds > Total time for which application threads were stopped: 0.0024618 seconds > Application time: 0.0001238 seconds > Total time for which application threads were stopped: 0.0019863 seconds > Application time: 59.8201422 seconds > Total time for which application threads were stopped: 0.0455540 seconds > Application time: 0.0003668 seconds > Total time for which 
application threads were stopped: 0.0020906 seconds > Application time: 0.0001126 seconds > Total time for which application threads were stopped: 0.0016693 seconds > Application time: 60.0721521 seconds > Total time for which application threads were stopped: 0.0438111 seconds > Application time: 0.0002660 seconds > Total time for which application threads were stopped: 0.0019814 seconds > Application time: 0.0001018 seconds > Total time for which application threads were stopped: 0.0017817 seconds > Application time: 60.0825886 seconds > Total time for which application threads were stopped: 0.0440386 seconds > Application time: 0.0002197 seconds > Total time for which application threads were stopped: 0.0020655 seconds > Application time: 0.0001093 seconds > Total time for which application threads were stopped: 0.0016122 seconds > Application time: 59.6628580 seconds > Total time for which application threads were stopped: 0.0425082 seconds > Application time: 0.0002121 seconds > Total time for which application threads were stopped: 0.0020967 seconds > Application time: 0.0000935 seconds > Total time for which application threads were stopped: 0.0015909 seconds > Application time: 60.1951548 seconds > Total time for which application threads were stopped: 0.0432125 seconds > Application time: 0.0002274 seconds > Total time for which application threads were stopped: 0.0020316 seconds > Application time: 0.0001062 seconds > Total time for which application threads were stopped: 0.0016534 seconds > Application time: 59.5329171 seconds > *Total time for which application threads were stopped: 20.6893643 > seconds* > Application time: 0.0002839 seconds > Total time for which application threads were stopped: 0.0076240 seconds > Application time: 0.0002137 seconds > Total time for which application threads were stopped: 0.0019918 seconds > Application time: 39.4376656 seconds > Total time for which application threads were stopped: 0.0612671 seconds > Application 
time: 0.0002478 seconds > > Any idea ? > > > Thanks in advance for your help > -- > David Tavoularis > > > > > > > [Annex] > Complete GC log file gc_201108232207.log.gz available here: > http://dl.free.fr/gxrxlLsVS > > JVM command line extract : > /usr/jdk/instances/jdk1.6.0/jre/bin/sparcv9/java > -Dsun.rmi.dgc.checkInterval=2000 -server -Xms49152m -Xmx49152m > -XX:PermSize=512m -XX:+UseConcMarkSweepGC -XX:+UseParNewGC > -XX:+DisableExplicitGC -XX:-CMSParallelRemarkEnabled > -XX:CMSInitiatingOccupancyFraction=40 -XX:NewSize=1024m > -XX:MaxNewSize=1024m -XX:+PrintGCDetails -XX:+PrintGCTimeStamps > -XX:+PrintHeapAtGC -XX:+PrintTenuringDistribution > -XX:+PrintGCApplicationStoppedTime > -XX:+PrintGCApplicationConcurrentTime -XX:+PrintGCDateStamps > -Xloggc:/logs/gc_201108232207.log -XX:+UseCompressedOops > -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/data/heapdump > > *$ /usr/jdk/instances/jdk1.6.0/jre/bin/sparcv9/java -version* > java version "1.6.0_25" > Java(TM) SE Runtime Environment (build 1.6.0_25-b06) > Java HotSpot(TM) 64-Bit Server VM (build 20.0-b11, mixed mode) > > *$ /usr/sbin/prtdiag | head -3* > System Configuration: Sun Microsystems sun4u Sun Fire E6900 > System clock frequency: 150 MHz > Memory size: 143360 Megabytes > > *$ mpstat | wc -l* > 49 > > *$ uname -a* > SunOS XXX 5.9 Generic_122300-05 sun4u sparc SUNW,Sun-Fire > > For your information, Full GC automatically triggered at 2:30am : > *$ grep Full gc_201108232207.log* > 2011-08-24T02:30:02.475+0100: 15737.603: [Full GC 15737.604: [CMS: > 11972490K->5028118K(49283072K), 137.9859661 secs] > 12141664K->5028118K(50226816K), [CMS Perm : 39558K->39491K(524288K)], > 137.9867010 secs] [Times: user=133.02 sys=4.89, real=137.99 secs] > 2011-08-25T02:30:05.142+0100: 102139.150: [Full GC 102139.150: [CMS: > 18724122K->11970549K(49283072K), 433.4189517 secs] > 18976948K->11970549K(50226816K), [CMS Perm : 44256K->42995K(524288K)], > 433.4350620 secs] [Times: user=429.00 sys=3.89, real=433.44 secs] > 
2011-08-26T02:30:05.125+0100: 188538.009: [Full GC 188538.009: [CMS: > 15865994K->12528867K(49283072K), 477.0168566 secs] > 16343213K->12528867K(50226816K), [CMS Perm : 44324K->43408K(524288K)], > 477.0175358 secs] [Times: user=476.76 sys=0.05, real=477.02 secs] > 2011-08-27T02:30:03.084+0100: 274934.847: [Full GC 274934.849: [CMS: > 14857264K->8811922K(49283072K), 312.4786042 secs] > 15546860K->8811922K(50226816K), [CMS Perm : 44557K->43762K(524288K)], > 312.4796506 secs] [Times: user=312.38 sys=0.11, real=312.48 secs] > 2011-08-28T02:30:04.129+0100: 361334.770: [Full GC 361334.777: [CMS: > 16479144K->5767617K(49283072K), 161.5857103 secs] > 17318705K->5767617K(50226816K), [CMS Perm : 44127K->43481K(524288K)], > 161.5863909 secs] [Times: user=161.21 sys=0.02, real=161.59 secs] > 2011-08-29T02:30:03.316+0100: 447732.838: [Full GC 447732.838: [CMS: > 13471208K->6989798K(49283072K), 173.7255263 secs] > 13700543K->6989798K(50226816K), [CMS Perm : 43709K->43433K(524288K)], > 173.7260186 secs] [Times: user=173.48 sys=0.01, real=173.73 secs] -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20110829/88ca702d/attachment-0001.html

From y.s.ramakrishna at oracle.com Tue Aug 30 00:19:52 2011
From: y.s.ramakrishna at oracle.com (Ramki Ramakrishna)
Date: Tue, 30 Aug 2011 00:19:52 -0700
Subject: Long "stop-the-world" pauses in CMS GC mode
In-Reply-To: <4E5C88D1.9040401@oracle.com>
References: <4E5C88D1.9040401@oracle.com>
Message-ID: <4E5C8F18.2070801@oracle.com>

David,

I missed one of the longer pauses that you'd specifically drawn attention to:

> On 8/29/2011 2:25 AM, David Tavoularis wrote:
>> *1. Here is the first pattern : a _61-second pause_, but I don't see
>> any suspicious message in GC logs:*
>> ...
>> 2011-08-24T10:25:07.776+0100: 44242.537: [GC 44301.853: [ParNew
>> Desired survivor size 53673984 bytes, new threshold 1 (max 4)
>> - age 1: 99505080 bytes, 99505080 total
>> : 943744K->104832K(943744K), 0.2010508 secs]
>> 21542906K->20852742K(50226816K), 0.2022636 secs] *[Times: user=5.67
>> sys=0.02, real=59.52 secs]*

If you look at the timestamps above, the GC event starts off at 44242.537 seconds, but the GC itself does not commence until 44301.853 seconds, i.e. a full 59.32 seconds later. So the pause is associated not with the GC work itself (which is correctly reported as 202 ms) but rather with a preamble to the GC, perhaps with bringing threads to a safepoint, I am guessing. Once again, -XX:+PrintSafepointStatistics (which I mentioned in my previous email with regard to the 20 s pause in the middle of nowhere) would likely provide some clues.

I have heard apocryphal stories of -XX:+UseMembar having worked to get rid of overly long safepointing pauses, and of -XX:-UseBiasedLocking for pauses associated with bulk bias revocations. But without +PrintSafepointStatistics data to draw inferences from, those incantations would just constitute superstitious mumbo-jumbo.
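The timestamp reasoning above can be checked mechanically. A minimal Java sketch, assuming the JDK 6 ParNew line layout quoted above ("<uptime>: [GC <uptime>: [ParNew"), where, per the explanation above, the first stamp marks the start of the GC event and the second the moment the ParNew collection itself begins. The class GcPreambleGap and its method are hypothetical, written only for this illustration. For the 61-second event it gives about 59.32 s; for an ordinary collection such as "165560.417: [GC 165560.417: [ParNew" the gap is zero.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: measures the gap between the two uptime stamps on a
// JDK 6 ParNew log line, "<event start>: [GC <collection start>: [ParNew".
public class GcPreambleGap {
    private static final Pattern GC_START = Pattern.compile(
            "([0-9]+\\.[0-9]+): \\[GC ([0-9]+\\.[0-9]+): \\[ParNew");

    // Returns the preamble length in seconds, or -1 if no ParNew start is found.
    static double preambleSeconds(String line) {
        Matcher m = GC_START.matcher(line);
        if (!m.find()) {
            return -1;
        }
        double eventStart = Double.parseDouble(m.group(1));
        double collectionStart = Double.parseDouble(m.group(2));
        return collectionStart - eventStart;
    }

    public static void main(String[] args) {
        String slow = "2011-08-24T10:25:07.776+0100: 44242.537: [GC 44301.853: [ParNew";
        String normal = "2011-08-25T20:07:07.235+0100: 165560.417: [GC 165560.417: [ParNew";
        System.out.printf("slow event:   %.3f s before ParNew%n", preambleSeconds(slow));
        System.out.printf("normal event: %.3f s before ParNew%n", preambleSeconds(normal));
    }
}
```

A large gap only shows that time was lost before the collection started; it does not identify the cause, which is why safepoint statistics are still needed.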
-- ramki >> Heap after GC invocations=1188 (full 12): >> par new generation total 943744K, used 104832K [0xfffffff353c00000, >> 0xfffffff393c00000, 0xfffffff393c00000) >> eden space 838912K, 0% used [0xfffffff353c00000, 0xfffffff353c00000, >> 0xfffffff386f40000) >> from space 104832K, 100% used [0xfffffff38d5a0000, >> 0xfffffff393c00000, 0xfffffff393c00000) >> to space 104832K, 0% used [0xfffffff386f40000, 0xfffffff386f40000, >> 0xfffffff38d5a0000) >> concurrent mark-sweep generation total 49283072K, used 20747910K >> [0xfffffff393c00000, 0xffffffff53c00000, 0xffffffff53c00000) >> concurrent-mark-sweep perm gen total 524288K, used 42905K >> [0xffffffff53c00000, 0xffffffff73c00000, 0xffffffff73c00000) >> } >> *Total time for which application threads were stopped: 61.5576519 >> seconds* >> Application time: 0.0245838 seconds >> Total time for which application threads were stopped: 9.8331189 seconds >> Application time: 0.0012626 seconds >> Total time for which application threads were stopped: 0.0090404 seconds >> Application time: 0.0008943 seconds >> Total time for which application threads were stopped: 0.0020415 seconds >> Application time: 0.0008181 seconds >> Total time for which application threads were stopped: 0.2338605 seconds >> Application time: 0.0018822 seconds >> >> The only suspicious thing is "[Times: user=5.67 sys=0.02, real=59.52 >> secs]", which means that the "real" duration is a lot higher than >> "user" CPU time. >> Because "sys" duration is low, it also means that the server is not >> swapping. >> What could explain this 61-second pause? -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20110830/8ef5e553/attachment.html From david.tavoularis at mycom-int.com Tue Aug 30 02:52:18 2011 From: david.tavoularis at mycom-int.com (David Tavoularis) Date: Tue, 30 Aug 2011 11:52:18 +0200 Subject: Long "stop-the-world" pauses in CMS GC mode In-Reply-To: <4E5C8F18.2070801@oracle.com> References: <4E5C88D1.9040401@oracle.com> <4E5C8F18.2070801@oracle.com> Message-ID: Hi Ramki, Zdeněk, Thank you for your valuable answers. > So the pause is associated not with GC work itself (which is correctly reported as 202 ms), but rather with a > preamble to the GC, perhaps with bringing threads to a safepoint, I am guessing. I will ask to add -XX:+PrintSafepointStatistics. What is the expected output? Will it be in the GC logs or in stdout? > you have a 48 core machine but you have handicapped yr JVM by forcing -XX:-CMSParallelRemarkEnabled, > which forces single-threaded remarks on such a huge heap. I will ask to remove it and let you know. > If "ref proc" turns out to be still a problem, you can then enable +ParallelRefProcEnabled to parallelize that sub-phase as well. I will not activate -XX:+ParallelRefProcEnabled, because according to http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7028845, it is broken in Java6u25 and fixed in Java6u27. > I have heard apocryphal stories of -XX:+UseMembar having worked to get rid of overly long safepointing pauses, > and I have heard -XX:-UseBiasedLocking for pauses associated with bulk bias revocations. Good to know, but I won't use them until I get more info from -XX:+PrintSafepointStatistics and a new analysis after removing -XX:-CMSParallelRemarkEnabled. >> The only suspicious thing is "[Times: user=5.67 sys=0.02, real=59.52 secs]", which means that the "real" duration is a lot higher than "user" CPU time. >> Because "sys" duration is low, it also means that the server is not swapping. > this can happen when the machine is overloaded.
And as for swapping, I think it is not involved in the sys time because these times are times of the application thread. In my experience, when a server is swapping, the "sys" time increases a lot. I can confirm that there is no high CPU load on the server (max CPU usage is 30% in the last 7 days) and no disk swapping (according to vmstat "sr"="scan rate" metrics). According to Ramki, I need to understand the reason for the slow safepoint action. Best Regards -- David Tavoularis Head of L3 Support, Capacity & Dimensioning Mycom Group On Tue, 30 Aug 2011 09:19:52 +0200, Ramki Ramakrishna wrote: David, I missed one of the longer pauses that you'd specifically drawn attention to:- On 8/29/2011 2:25 AM, David Tavoularis wrote: 1. Here is the first pattern : a 61-second pause, but I don't see any suspicious message in GC logs: ... 2011-08-24T10:25:07.776+0100: 44242.537: [GC 44301.853: [ParNew Desired survivor size 53673984 bytes, new threshold 1 (max 4) - age 1: 99505080 bytes, 99505080 total : 943744K->104832K(943744K), 0.2010508 secs] 21542906K->20852742K(50226816K), 0.2022636 secs] [Times: user=5.67 sys=0.02, real=59.52 secs] If you look at the timestamps above, the GC event starts off at 44242.537 seconds, but then the GC itself does not commence until 44301.853 seconds, i.e. a full 59.32 seconds later. So the pause is associated not with GC work itself (which is correctly reported as 202 ms), but rather with a preamble to the GC, perhaps with bringing threads to a safepoint, I am guessing. Once again -XX:+PrintSafepointStatistics (which I mentioned in a previous email wrt the 20 s pause in the middle of nowhere) would likely provide some clues. I have heard apocryphal stories of -XX:+UseMembar having worked to get rid of overly long safepointing pauses, and I have heard -XX:-UseBiasedLocking for pauses associated with bulk bias revocations.
But, without +PrintSafepointStatistics data to draw inferences from, those incantations would just constitute superstitious mumbo-jumbo. -- ramki Heap after GC invocations=1188 (full 12): par new generation total 943744K, used 104832K [0xfffffff353c00000, 0xfffffff393c00000, 0xfffffff393c00000) eden space 838912K, 0% used [0xfffffff353c00000, 0xfffffff353c00000, 0xfffffff386f40000) from space 104832K, 100% used [0xfffffff38d5a0000, 0xfffffff393c00000, 0xfffffff393c00000) to space 104832K, 0% used [0xfffffff386f40000, 0xfffffff386f40000, 0xfffffff38d5a0000) concurrent mark-sweep generation total 49283072K, used 20747910K [0xfffffff393c00000, 0xffffffff53c00000, 0xffffffff53c00000) concurrent-mark-sweep perm gen total 524288K, used 42905K [0xffffffff53c00000, 0xffffffff73c00000, 0xffffffff73c00000) } Total time for which application threads were stopped: 61.5576519 seconds Application time: 0.0245838 seconds Total time for which application threads were stopped: 9.8331189 seconds Application time: 0.0012626 seconds Total time for which application threads were stopped: 0.0090404 seconds Application time: 0.0008943 seconds Total time for which application threads were stopped: 0.0020415 seconds Application time: 0.0008181 seconds Total time for which application threads were stopped: 0.2338605 seconds Application time: 0.0018822 seconds The only suspicious thing is "[Times: user=5.67 sys=0.02, real=59.52 secs]", which means that the "real" duration is a lot higher than "user" CPU time. Because "sys" duration is low, it also means that the server is not swapping. What could explain this 61-second pause?
-------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20110830/3b7bd2ed/attachment.html From y.s.ramakrishna at oracle.com Tue Aug 30 09:51:08 2011 From: y.s.ramakrishna at oracle.com (Y. S. Ramakrishna) Date: Tue, 30 Aug 2011 09:51:08 -0700 Subject: Long "stop-the-world" pauses in CMS GC mode In-Reply-To: References: <4E5C88D1.9040401@oracle.com> <4E5C8F18.2070801@oracle.com> Message-ID: <4E5D14FC.5050704@oracle.com> On 08/30/11 02:52, David Tavoularis wrote: > Hi Ramki, Zdeněk, > > Thank you for your valuable answers. > > > So the pause is associated not with GC work itself (which is > correctly reported as 202 ms), but rather with a > > preamble to the GC, perhaps with bringing threads to a safepoint, I > am guessing. > I will ask to add -XX:+PrintSafepointStatistics. What is the expected > output? Will it be in the GC logs or in stdout? To stdout, I believe. But with the latest JVM, these data (which are batched into a record of several entries written out together) should have a timestamp column associated with each safepoint operation, which will allow alignment of the data wrt the GC log events in the GC logs even though the two split off into different I/O streams. > > > you have a 48 core machine but you have handicapped yr JVM by forcing > -XX:-CMSParallelRemarkEnabled, > > which forces single-threaded remarks on such a huge heap. > I will ask to remove it and let you know. Thanks. I am guessing it must be "legacy" from a (much) earlier time when there were bugs in the CMS parallel remark.
> > > If "ref proc" turns out to be still a problem, you can then enable > +ParallelRefProcEnabled to parallelize that sub-phase as well. > I will not activate -XX:+ParallelRefProcEnabled, because according to > http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7028845, it is broken > in Java6u25 and fixed in Java6u27. Ah yes, good catch; sorry not to have remembered, as I had fixed that problem a while back during JDK 7 development. > > > I have heard apocryphal stories of -XX:+UseMembar having worked to > get rid of overly long safepointing pauses, > > and I have heard -XX:-UseBiasedLocking for pauses associated with > bulk bias revocations. > Good to know, but I won't use them until I get more info from > -XX:+PrintSafepointStatistics and a new analysis after removing > -XX:-CMSParallelRemarkEnabled. Sounds good. > > >> The only suspicious thing is "[Times: user=5.67 sys=0.02, real=59.52 > secs]", which means that the "real" duration is a lot higher than "user" > CPU time. > >> Because "sys" duration is low, it also means that the server is not > swapping. > > this can happen when the machine is overloaded. And as for swapping, > I think it is not involved in the sys time because these times are times > of the application thread. > In my experience, when a server is swapping, the "sys" time > increases a lot. > I can confirm that there is no high CPU load on the server (max CPU > usage is 30% in the last 7 days) and no disk swapping (according to > vmstat "sr"="scan rate" metrics). > According to Ramki, I need to understand the reason for the slow safepoint > action. Right; sounds like a plan. -- ramki
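Earlier in the thread Ramki cautioned that the [GC ...] lines only account for stop-the-world operations attributed to GC, while the "Total time for which application threads were stopped" lines report how long the application was actually halted, which, as the 61-second case shows, can far exceed the collection time. A minimal Java sketch, assuming one such entry per line exactly as in the log quoted above, that lists the stops above a threshold and sets the longest against the 0.2022636 s the ParNew entry itself reported; StoppedTimeScan, longStops and the 1-second threshold are hypothetical choices, not a JVM or library API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical scanner for the stopped-time lines in a GC log.
final class StoppedTimeScan {
    // Matches the stopped-time lines as they appear in the quoted log.
    private static final Pattern STOPPED = Pattern.compile(
        "Total time for which application threads were stopped: (\\d+\\.\\d+) seconds");

    // Returns, in log order, every stopped time (seconds) >= thresholdSecs.
    static List<Double> longStops(String logText, double thresholdSecs) {
        List<Double> result = new ArrayList<>();
        Matcher m = STOPPED.matcher(logText);
        while (m.find()) {
            double secs = Double.parseDouble(m.group(1));
            if (secs >= thresholdSecs) {
                result.add(secs);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Excerpt from the log in this thread (first three stopped-time entries).
        String log = String.join("\n",
            "Total time for which application threads were stopped: 61.5576519 seconds",
            "Application time: 0.0245838 seconds",
            "Total time for which application threads were stopped: 9.8331189 seconds",
            "Application time: 0.0012626 seconds",
            "Total time for which application threads were stopped: 0.0090404 seconds");
        List<Double> stops = longStops(log, 1.0);
        for (double s : stops) {
            System.out.printf(Locale.ROOT, "long stop: %.3f s%n", s);
        }
        // The first long stop follows the ParNew entry, whose own reported duration was 0.2022636 s.
        double gcReportedSecs = 0.2022636;
        System.out.printf(Locale.ROOT, "not spent in GC work: %.3f s%n",
            stops.get(0) - gcReportedSecs);
    }
}
```

For this excerpt it reports the 61.558 s and 9.833 s stops and shows that about 61.355 s of the first was not spent in the collection itself.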