From bkgood at gmail.com  Thu Sep  1 15:12:07 2016
From: bkgood at gmail.com (William Good)
Date: Thu, 1 Sep 2016 17:12:07 +0200
Subject: High termination times pre-concurrent cycle in G1
In-Reply-To: <3227fd9b-3f11-3e78-9768-1545b4153219@oracle.com>
References: <3227fd9b-3f11-3e78-9768-1545b4153219@oracle.com>
Message-ID: 

Yep, the marking cycle seems to help. I just don't know why. Objects
in old regions should die very infrequently, as everything produced
either survives indefinitely or is a byproduct of loading or
evaluation (a very fast operation, especially when compared to
frequency of evac pause * number of survivor regions). Thus a mark
cycle shouldn't reveal much to be collected in old regions, and my
understanding is that all the survivor spaces are marked+evac'd on
each evac pause.

Tried first with 12 workers (happens to be the number of physical
cores on my machine) and got the same pathological behavior. Then
tried with 2 and still see large termination time increases. Log file
attached.

William

On Wed, Aug 31, 2016 at 8:18 AM, yu.zhang at oracle.com wrote:
> It seems that after marking (clean up), the termination time drops. Maybe
> that is why you need a very low ihop so that you can have more marking
> cycle.
>
> The work distribution seems fine. But system time is high. Maybe some lock
> contention.
>
> I would agree to try lowering the gc threads, -XX:ParallelGCThreads=
>
> Jenny
>
>
> On 08/30/2016 04:08 PM, Vitaly Davidovich wrote:
>
> William,
>
> Have you tried running with a lower number (than the current 18) of parallel
> workers?
>
> On Tuesday, August 30, 2016, William Good wrote:
>>
>> I've been experiencing an issue in a production application using G1
>> for quite some time over a handful of 1.8.0 builds. The application is
>> relatively simple: it spends about 60s reading some parameters from
>> files on disk, and then starts serving web requests which merge some
>> input with those parameters, performs some computation and returns a
>> result. We're aiming to keep max total request time (as seen by remote
>> hosts) below 100 ms but from previous experience with parnew and cms
>> (and g1 on previous projects, for that matter), I didn't anticipate
>> this being a problem.
>>
>> The symptoms are an ever-increasing time spent in evacuation pauses,
>> and high parallel worker termination times stick out. With the
>> recommended set of G1 settings (max heap size and pause time target),
>> they increase sharply until I start seeing 500ms+ pause times and have
>> to kill the JVM.
>>
>> I found some time ago that first forcing a bunch of full GCs with
>> System.gc() at the phase (load -> serve) change and then forcing
>> frequent concurrent cycles with -XX:InitiatingHeapOccupancyPercent=1
>> seems to mitigate the problem. I'd prefer to have to do neither, as
>> the former makes redeployments very slow and the latter adds a couple
>> of neighboring 40ms pauses for remark and cleanup pauses that aren't
>> good for request time targets.
>>
>> I'm attaching a log file that details a short run, with the phase
>> change at about 60s from start. After a few evacuation pauses, one
>> lasts 160ms with nearly 100-120ms spent in parallel workers'
>> 'termination'. After this, a concurrent cycle runs and everything goes
>> back to normal. java params are at the top of the file.
>>
>> Generally this happens over a much longer period of time (and
>> especially if I haven't given the low
>> -XX:InitiatingHeapOccupancyPercent value) and over many different
>> builds of 1.8.0.
This was b101. It's running alone on a fairly hefty >> dual-socket Xeon box with 128GB of RAM on CentOS 7. >> >> I'd be more than happy to hear any ideas on what's going on here and >> how it could be fixed. >> >> Best, >> William > > > > -- > Sent from my phone > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > -------------- next part -------------- A non-text attachment was scrubbed... Name: 2workers.log.gz Type: application/x-gzip Size: 80535 bytes Desc: not available URL: From yu.zhang at oracle.com Thu Sep 1 23:34:17 2016 From: yu.zhang at oracle.com (yu.zhang at oracle.com) Date: Thu, 1 Sep 2016 16:34:17 -0700 Subject: High termination times pre-concurrent cycle in G1 In-Reply-To: References: <3227fd9b-3f11-3e78-9768-1545b4153219@oracle.com> Message-ID: William, I think it might be related to humongous objects. Can you do some experiments -XX:+UnlockExperimentalVMOptions -XX:+G1TraceEagerReclaimHumongousObjects Can you remove -XX:G1LogLevel=finest? Based on the output, can adjust the heap region size. But I think the termination is less with 2 worker threads. Thanks Jenny On 09/01/2016 08:12 AM, William Good wrote: > Yep, the marking cycle seems to help. I just don't know why. Objects > in old regions should die very infrequently, as everything produced > either survives indefinitely or is a byproduct of loading or > evaluation (a very fast operation,especially when compared to > frequency of evac pause * number of survivor regions). Thus a mark > cycle shouldn't reveal much to be collected in old regions, and my > understanding is that all the survivor spaces are marked+evac'd on > each evac pause. > > Tried first with 12 workers (happens to be the number of physical > cores on my machine) and got the same pathological behavior. Then > tried with 2 and still see large termination time increases. Log file > attached. > > William > > On Wed, Aug 31, 2016 at 8:18 AM, yu.zhang at oracle.com > wrote: >> It seems that after marking (clean up), the termination time drops. Maybe >> that is why you need a very low ihop so that you can have more marking >> cycle. >> >> The work distribution seems fine. But system time is high. Maybe some lock >> contention. >> >> I would agree to try lowering the gc threads, -XX:ParallelGCThreads= >> >> Jenny >> >> >> On 08/30/2016 04:08 PM, Vitaly Davidovich wrote: >> >> William, >> >> Have you tried running with a lower number (than the current 18) of parallel >> workers? >> >> On Tuesday, August 30, 2016, William Good wrote: >>> I've been experiencing an issue in a production application using G1 >>> for quite some time over a handful of 1.8.0 builds. The application is >>> relatively simple: it spends about 60s reading some parameters from >>> files on disk, and then starts serving web requests which merge some >>> input with those parameters, performs some computation and returns a >>> result. We're aiming to keep max total request time (as seen by remote >>> hosts) below 100 ms but from previous experience with parnew and cms >>> (and g1 on previous projects, for that matter), I didn't anticipate >>> this being a problem. >>> >>> The symptoms are an ever-increasing time spent in evacuation pauses, >>> and high parallel worker termination times stick out. 
With the >>> recommended set of G1 settings (max heap size and pause time target), >>> they increase sharply until I start seeing 500ms+ pause times and have >>> to kill the JVM. >>> >>> I found some time ago that first forcing a bunch of full GCs with >>> System.gc() at the phase (load -> serve) change and then forcing >>> frequent concurrent cycles with -XX:InitiatingHeapOccupancyPercent=1 >>> seems to mitigate the problem. I'd prefer to have to do neither, as >>> the former makes redeployments very slow and the latter adds a couple >>> of neighboring 40ms pauses for remark and cleanup pauses that aren't >>> good for request time targets. >>> >>> I'm attaching a log file that details a short run, with the phase >>> change at about 60s from start. After a few evacuation pauses, one >>> lasts 160ms with nearly 100-120ms spent in parallel workers' >>> 'termination'. After this, a concurrent cycle runs and everything goes >>> back to normal. java params are at the top of the file. >>> >>> Generally this happens over a much longer period of time (and >>> especially if I haven't given the low >>> -XX:InitiatingHeapOccupancyPercent value) and over many different >>> builds of 1.8.0. This was b101. It's running alone on a fairly hefty >>> dual-socket Xeon box with 128GB of RAM on CentOS 7. >>> >>> I'd be more than happy to hear any ideas on what's going on here and >>> how it could be fixed. >>> >>> Best, >>> William >> >> >> -- >> Sent from my phone >> >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> From bkgood at gmail.com Fri Sep 2 10:10:54 2016 From: bkgood at gmail.com (William Good) Date: Fri, 2 Sep 2016 12:10:54 +0200 Subject: High termination times pre-concurrent cycle in G1 In-Reply-To: References: <3227fd9b-3f11-3e78-9768-1545b4153219@oracle.com> Message-ID: Jenny, There's a log attached with those options. Termination times are less with 2 workers but eden is much smaller (~2 GB instead of ~7) and evac pauses are more frequent. So it's a nice improvement but not ideal. Also it appears to me that the 2 worker case shows that work distribution appears to degenerate over time (termination attempts increases) which mirrors what I perceive in the cases with higher worker counts. Thanks for your help so far! William On Fri, Sep 2, 2016 at 1:34 AM, yu.zhang at oracle.com wrote: > William, > > I think it might be related to humongous objects. Can you do some > experiments > > -XX:+UnlockExperimentalVMOptions -XX:+G1TraceEagerReclaimHumongousObjects > > Can you remove -XX:G1LogLevel=finest? > > Based on the output, can adjust the heap region size. > But I think the termination is less with 2 worker threads. > > Thanks > Jenny > > On 09/01/2016 08:12 AM, William Good wrote: >> >> Yep, the marking cycle seems to help. I just don't know why. Objects >> in old regions should die very infrequently, as everything produced >> either survives indefinitely or is a byproduct of loading or >> evaluation (a very fast operation,especially when compared to >> frequency of evac pause * number of survivor regions). Thus a mark >> cycle shouldn't reveal much to be collected in old regions, and my >> understanding is that all the survivor spaces are marked+evac'd on >> each evac pause. >> >> Tried first with 12 workers (happens to be the number of physical >> cores on my machine) and got the same pathological behavior. 
Then >> tried with 2 and still see large termination time increases. Log file >> attached. >> >> William >> >> On Wed, Aug 31, 2016 at 8:18 AM, yu.zhang at oracle.com >> wrote: >>> >>> It seems that after marking (clean up), the termination time drops. Maybe >>> that is why you need a very low ihop so that you can have more marking >>> cycle. >>> >>> The work distribution seems fine. But system time is high. Maybe some >>> lock >>> contention. >>> >>> I would agree to try lowering the gc threads, -XX:ParallelGCThreads= >>> >>> Jenny >>> >>> >>> On 08/30/2016 04:08 PM, Vitaly Davidovich wrote: >>> >>> William, >>> >>> Have you tried running with a lower number (than the current 18) of >>> parallel >>> workers? >>> >>> On Tuesday, August 30, 2016, William Good wrote: >>>> >>>> I've been experiencing an issue in a production application using G1 >>>> for quite some time over a handful of 1.8.0 builds. The application is >>>> relatively simple: it spends about 60s reading some parameters from >>>> files on disk, and then starts serving web requests which merge some >>>> input with those parameters, performs some computation and returns a >>>> result. We're aiming to keep max total request time (as seen by remote >>>> hosts) below 100 ms but from previous experience with parnew and cms >>>> (and g1 on previous projects, for that matter), I didn't anticipate >>>> this being a problem. >>>> >>>> The symptoms are an ever-increasing time spent in evacuation pauses, >>>> and high parallel worker termination times stick out. With the >>>> recommended set of G1 settings (max heap size and pause time target), >>>> they increase sharply until I start seeing 500ms+ pause times and have >>>> to kill the JVM. >>>> >>>> I found some time ago that first forcing a bunch of full GCs with >>>> System.gc() at the phase (load -> serve) change and then forcing >>>> frequent concurrent cycles with -XX:InitiatingHeapOccupancyPercent=1 >>>> seems to mitigate the problem. I'd prefer to have to do neither, as >>>> the former makes redeployments very slow and the latter adds a couple >>>> of neighboring 40ms pauses for remark and cleanup pauses that aren't >>>> good for request time targets. >>>> >>>> I'm attaching a log file that details a short run, with the phase >>>> change at about 60s from start. After a few evacuation pauses, one >>>> lasts 160ms with nearly 100-120ms spent in parallel workers' >>>> 'termination'. After this, a concurrent cycle runs and everything goes >>>> back to normal. java params are at the top of the file. >>>> >>>> Generally this happens over a much longer period of time (and >>>> especially if I haven't given the low >>>> -XX:InitiatingHeapOccupancyPercent value) and over many different >>>> builds of 1.8.0. This was b101. It's running alone on a fairly hefty >>>> dual-socket Xeon box with 128GB of RAM on CentOS 7. >>>> >>>> I'd be more than happy to hear any ideas on what's going on here and >>>> how it could be fixed. >>>> >>>> Best, >>>> William >>> >>> >>> >>> -- >>> Sent from my phone >>> >>> >>> _______________________________________________ >>> hotspot-gc-use mailing list >>> hotspot-gc-use at openjdk.java.net >>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>> >>> > -------------- next part -------------- A non-text attachment was scrubbed... 
Name: 2workers_tracehuge.log.gz
Type: application/x-gzip
Size: 58508 bytes
Desc: not available
URL: 

From thomas.schatzl at oracle.com  Fri Sep 2 11:43:10 2016
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Fri, 02 Sep 2016 13:43:10 +0200
Subject: High termination times pre-concurrent cycle in G1
In-Reply-To: 
References: <3227fd9b-3f11-3e78-9768-1545b4153219@oracle.com>
Message-ID: <1472816590.4493.20.camel@oracle.com>

Hi William,

  maybe it is a bit too much to ask, but would it be possible for you
to build and run a self-compiled version of the JDK in some test
environment?

Yesterday a patch to fix these kinds of problems has been pushed to the
8u repos. It will only be available with 8u122 though (as far as I
know), which has a GA of January 2017. Otherwise early access builds
will probably be available as soon as October this year, after 8u112
shipped.

Unfortunately, really diagnosing some of the potential causes of this
issue also requires a custom build too, containing some otherwise
compiled-out code.

That change (JDK-8152438) is known to fix similar issues that you see.

It particularly helps if your object graph is relatively deep and
narrow with one or more largish arrays of java.lang.Object that contain
the only references to many objects.

One thing that could be tried is decreasing region size to 2M or even
1M (what we typically do not recommend). This may keep these objects
out of the young gen, and so they are accessed differently. This may
have other negative consequences on performance though.
However then scalability of the evacuation pause should be better.

Thanks,
  Thomas
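A sketch of the object shape described above -- all names are made up
for illustration, not taken from William's application:

    // A deep, narrow graph: one largish Object[] holds the only
    // references to long chains of nodes. Whichever GC worker picks up
    // the array ends up walking most of the graph by itself; the other
    // workers find little to steal and wait in termination.
    final class DeepNarrowGraph {
        static final class Node { Node next; byte[] payload = new byte[32]; }

        static Object[] build(int chains, int depth) {
            Object[] roots = new Object[chains];   // the "largish array"
            for (int c = 0; c < chains; c++) {
                Node head = null;
                for (int d = 0; d < depth; d++) {  // fan-out of exactly one
                    Node n = new Node();
                    n.next = head;
                    head = n;
                }
                roots[c] = head;
            }
            return roots;
        }
    }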
From bkgood at gmail.com  Fri Sep 2 13:08:38 2016
From: bkgood at gmail.com (William Good)
Date: Fri, 2 Sep 2016 15:08:38 +0200
Subject: High termination times pre-concurrent cycle in G1
In-Reply-To: <1472816590.4493.20.camel@oracle.com>
References: <3227fd9b-3f11-3e78-9768-1545b4153219@oracle.com>
 <1472816590.4493.20.camel@oracle.com>
Message-ID: 

Thomas,

More than happy to build and test with a patched JDK. Let me know
what I need to do, at least what sources to grab (I think I can build
it once I've got them).

I learned this morning that my belief that significant tenuring
wasn't taking place was possibly wrong. Our problem seems to have
disappeared with an akka upgrade [1] and my guess for the relevant
fix [2] indicates that unintended tenuring was occurring at some
point (the tenuring has never been significant enough for us to
notice). However as long as I'm unable to reproduce with the new
version I'm happy to continue testing using the older version I know
to reproduce, as I don't think this G1 behavior is intended.

William

[1] http://akka.io/news/2016/04/01/akka-2.3.15-released.html
[2] https://github.com/akka/akka/issues/19216

On Fri, Sep 2, 2016 at 1:43 PM, Thomas Schatzl wrote:
> Hi William,
>
> maybe it is a bit too much to ask, but would it be possible for you
> to build and run a self-compiled version of the JDK in some test
> environment?
>
> Yesterday a patch to fix these kinds of problems has been pushed to the
> 8u repos. It will only be available with 8u122 though (as far as I
> know), which has a GA of January 2017. Otherwise early access builds
> will probably be available as soon as October this year, after 8u112
> shipped.
>
> Unfortunately, really diagnosing some of the potential causes of this
> issue also requires a custom build too, containing some otherwise
> compiled-out code.
>
> That change (JDK-8152438) is known to fix similar issues that you see.
>
> It particularly helps if your object graph is relatively deep and
> narrow with one or more largish arrays of java.lang.Object that contain
> the only references to many objects.
>
> One thing that could be tried is decreasing region size to 2M or even
> 1M (what we typically do not recommend). This may keep these objects
> out of the young gen, and so they are accessed differently. This may
> have other negative consequences on performance though.
> However then scalability of the evacuation pause should be better.
>
> Thanks,
>   Thomas

From thomas.schatzl at oracle.com  Fri Sep 2 14:01:40 2016
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Fri, 02 Sep 2016 16:01:40 +0200
Subject: High termination times pre-concurrent cycle in G1
In-Reply-To: 
References: <3227fd9b-3f11-3e78-9768-1545b4153219@oracle.com>
 <1472816590.4493.20.camel@oracle.com>
Message-ID: <1472824900.4493.34.camel@oracle.com>

Hi,

On Fri, 2016-09-02 at 15:08 +0200, William Good wrote:
> Thomas,
>
> More than happy to build and test with a patched JDK. Let me know
> what I need to do, at least what sources to grab (I think I can build
> it once I've got them).
>
> I learned this morning that my belief that significant tenuring
> wasn't taking place was possibly wrong. Our problem seems to have
> disappeared with an akka upgrade [1] and my guess for the relevant
> fix [2] indicates that unintended tenuring was occurring at some
> point (the tenuring has never been significant enough for us to
> notice). However as long as I'm unable to reproduce with the new
> version I'm happy to continue testing using the older version I know
> to reproduce, as I don't think this G1 behavior is intended.

this bug report makes me tend to believe that the suggested fix (for
JDK-8152438) will actually fix the issue. Let me explain:

Every G1 thread has two work queues, one fixed size public one where
others can steal from, and one resizable one that is private. Work
(references) is first put into the public one, and then if it is full,
into the private one.
Threads first process their private buffers, so that others can steal
from the public one while they are working on it.
Due to some conditions it can happen that the public queues are already
completely empty, while one thread is still busy for a long time with
its private one. I.e. the work in the private queue can be so that it
never generates more work in the public queue that others can steal and
continue work from. So they wait.

That can cause this high termination time. Of course this situation can
occur multiple times during a GC, so that every thread gets its fair
share of waiting :)

That mentioned fix has been pushed into the
http://hg.openjdk.java.net/jdk8u/jdk8u-dev/ repository. It should be a
matter of pulling it.
The README-builds file in the repo has build instructions.

It would be really nice if we could track down your problem to this
issue, or at least significantly improve it. (JDK9 has more significant
patches in that area iirc, but this one is probably the most
important).

Thanks,
  Thomas
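A toy model of the queue pair Thomas describes, for readers following
along -- class and member names are invented; the real implementation
is C++ inside HotSpot, not Java:

    import java.util.ArrayDeque;
    import java.util.concurrent.ConcurrentLinkedDeque;

    final class WorkerQueuePair {
        static final int PUBLIC_CAPACITY = 128;    // fixed size, stealable

        final ConcurrentLinkedDeque<Object> publicQueue = new ConcurrentLinkedDeque<>();
        final ArrayDeque<Object> privateOverflow = new ArrayDeque<>(); // resizable

        void push(Object task) {                   // owner only
            // size() is O(n) on this deque; acceptable in a sketch.
            if (publicQueue.size() < PUBLIC_CAPACITY) publicQueue.addLast(task);
            else privateOverflow.addLast(task);    // full: overflow goes private
        }

        Object popLocal() {                        // owner drains overflow first,
            Object t = privateOverflow.pollLast(); // keeping the public queue
            return t != null ? t : publicQueue.pollLast(); // open for thieves
        }

        Object steal() {                           // other workers see only this;
            return publicQueue.pollFirst();        // null while one owner grinds
        }                                          // through a private chain, so
    }                                              // thieves sit in termination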
From vitalyd at gmail.com  Fri Sep 2 14:46:01 2016
From: vitalyd at gmail.com (Vitaly Davidovich)
Date: Fri, 2 Sep 2016 10:46:01 -0400
Subject: High termination times pre-concurrent cycle in G1
In-Reply-To: 
References: <3227fd9b-3f11-3e78-9768-1545b4153219@oracle.com>
Message-ID: 

This particular run hits to-space exhaustion, followed by a full GC that
frees a bunch of humongous arrays and generally reduces heap occupancy
substantially (of course you paid 23s for that). There's a humongous
allocation that triggers a mixed cycle (~60s into the run), which is
followed by initiation of concurrent marking. That concurrent mark never
completes because the to-space exhaustion comes, and then a full GC; the
concurrent mark is aborted right after that.

Since you've reduced ParallelGCThreads to 2, you should explicitly bump
ConcGCThreads or else you're running with just 1, which looks like it
isn't keeping up.

On Fri, Sep 2, 2016 at 6:10 AM, William Good wrote:

> Jenny,
>
> There's a log attached with those options.
>
> Termination times are less with 2 workers but eden is much smaller (~2
> GB instead of ~7) and evac pauses are more frequent. So it's a nice
> improvement but not ideal. Also it appears to me that the 2 worker
> case shows that work distribution appears to degenerate over time
> (termination attempts increases) which mirrors what I perceive in the
> cases with higher worker counts.
>
> Thanks for your help so far!
>
> William
>
> On Fri, Sep 2, 2016 at 1:34 AM, yu.zhang at oracle.com wrote:
> > William,
> >
> > I think it might be related to humongous objects. Can you do some
> > experiments
> >
> > -XX:+UnlockExperimentalVMOptions -XX:+G1TraceEagerReclaimHumongousObjects
> >
> > Can you remove -XX:G1LogLevel=finest?
> >
> > Based on the output, can adjust the heap region size.
> > But I think the termination is less with 2 worker threads.
> >
> > Thanks
> > Jenny
> >
> > On 09/01/2016 08:12 AM, William Good wrote:
> >>
> >> Yep, the marking cycle seems to help. I just don't know why. Objects
> >> in old regions should die very infrequently, as everything produced
> >> either survives indefinitely or is a byproduct of loading or
> >> evaluation (a very fast operation, especially when compared to
> >> frequency of evac pause * number of survivor regions). Thus a mark
> >> cycle shouldn't reveal much to be collected in old regions, and my
> >> understanding is that all the survivor spaces are marked+evac'd on
> >> each evac pause.
> >>
> >> Tried first with 12 workers (happens to be the number of physical
> >> cores on my machine) and got the same pathological behavior. Then
> >> tried with 2 and still see large termination time increases. Log file
> >> attached.
> >>
> >> William
> >>
> >> On Wed, Aug 31, 2016 at 8:18 AM, yu.zhang at oracle.com
> >> wrote:
> >>>
> >>> It seems that after marking (clean up), the termination time drops. Maybe
> >>> that is why you need a very low ihop so that you can have more marking
> >>> cycle.
> >>>
> >>> The work distribution seems fine. But system time is high. Maybe some
> >>> lock
> >>> contention.
> >>>
> >>> I would agree to try lowering the gc threads, -XX:ParallelGCThreads=
> >>>
> >>> Jenny
> >>>
> >>>
> >>> On 08/30/2016 04:08 PM, Vitaly Davidovich wrote:
> >>>
> >>> William,
> >>>
> >>> Have you tried running with a lower number (than the current 18) of
> >>> parallel
> >>> workers?
> >>>
> >>> On Tuesday, August 30, 2016, William Good wrote:
> >>>>
> >>>> I've been experiencing an issue in a production application using G1
> >>>> for quite some time over a handful of 1.8.0 builds. The application is
> >>>> relatively simple: it spends about 60s reading some parameters from
> >>>> files on disk, and then starts serving web requests which merge some
> >>>> input with those parameters, performs some computation and returns a
> >>>> result.
We're aiming to keep max total request time (as seen by remote > >>>> hosts) below 100 ms but from previous experience with parnew and cms > >>>> (and g1 on previous projects, for that matter), I didn't anticipate > >>>> this being a problem. > >>>> > >>>> The symptoms are an ever-increasing time spent in evacuation pauses, > >>>> and high parallel worker termination times stick out. With the > >>>> recommended set of G1 settings (max heap size and pause time target), > >>>> they increase sharply until I start seeing 500ms+ pause times and have > >>>> to kill the JVM. > >>>> > >>>> I found some time ago that first forcing a bunch of full GCs with > >>>> System.gc() at the phase (load -> serve) change and then forcing > >>>> frequent concurrent cycles with -XX:InitiatingHeapOccupancyPercent=1 > >>>> seems to mitigate the problem. I'd prefer to have to do neither, as > >>>> the former makes redeployments very slow and the latter adds a couple > >>>> of neighboring 40ms pauses for remark and cleanup pauses that aren't > >>>> good for request time targets. > >>>> > >>>> I'm attaching a log file that details a short run, with the phase > >>>> change at about 60s from start. After a few evacuation pauses, one > >>>> lasts 160ms with nearly 100-120ms spent in parallel workers' > >>>> 'termination'. After this, a concurrent cycle runs and everything goes > >>>> back to normal. java params are at the top of the file. > >>>> > >>>> Generally this happens over a much longer period of time (and > >>>> especially if I haven't given the low > >>>> -XX:InitiatingHeapOccupancyPercent value) and over many different > >>>> builds of 1.8.0. This was b101. It's running alone on a fairly hefty > >>>> dual-socket Xeon box with 128GB of RAM on CentOS 7. > >>>> > >>>> I'd be more than happy to hear any ideas on what's going on here and > >>>> how it could be fixed. > >>>> > >>>> Best, > >>>> William > >>> > >>> > >>> > >>> -- > >>> Sent from my phone > >>> > >>> > >>> _______________________________________________ > >>> hotspot-gc-use mailing list > >>> hotspot-gc-use at openjdk.java.net > >>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > >>> > >>> > > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bkgood at gmail.com Mon Sep 5 10:02:40 2016 From: bkgood at gmail.com (William Good) Date: Mon, 5 Sep 2016 12:02:40 +0200 Subject: High termination times pre-concurrent cycle in G1 In-Reply-To: <1472824900.4493.34.camel@oracle.com> References: <3227fd9b-3f11-3e78-9768-1545b4153219@oracle.com> <1472816590.4493.20.camel@oracle.com> <1472824900.4493.34.camel@oracle.com> Message-ID: Hi Thomas, Attached is a log file from a run with this morning's tip. I don't see any change unfortunately. I'll try with jdk9 tip later today since you mention there are more fixes there. 
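For anyone wanting to reproduce the build: the 8u forest routine boils
down to roughly the following sketch (repo URL from Thomas's mail, boot
JDK path as in the summary below):

    hg clone http://hg.openjdk.java.net/jdk8u/jdk8u-dev
    cd jdk8u-dev
    sh ./get_source.sh        # pulls the nested repos (hotspot, jdk, ...)
    bash ./configure --with-boot-jdk=/var/tmp/jdk1.8.0_101
    make images               # images end up under build/<conf>/images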
configure summary looked like this, I think everything here is in order: Configuration summary: * Debug level: release * JDK variant: normal * JVM variants: server * OpenJDK target: OS: linux, CPU architecture: x86, address length: 64 Tools summary: * Boot JDK: java version "1.8.0_101" Java(TM) SE Runtime Environment (build 1.8.0_101-b13) Java HotSpot(TM) 64-Bit Server VM (build 25.101-b13, mixed mode) (at /var/tmp/jdk1.8.0_101) * C Compiler: gcc (GCC) 4.8.5 20150623 (Red Hat-4) version 4.8.5 (at /usr/bin/gcc) * C++ Compiler: g++ (GCC) 4.8.5 20150623 (Red Hat-4) version 4.8.5 (at /usr/bin/g++) Best, William On Fri, Sep 2, 2016 at 4:01 PM, Thomas Schatzl wrote: > Hi, > > On Fri, 2016-09-02 at 15:08 +0200, William Good wrote: >> Thomas, >> >> More than happy to build and test with a patched JDK. Let me know >> what I need to do, at least what sources to grab (I think I can build >> it once I've got them). >> >> I learned this morning that my belief that significant tenuring >> wasn't taking place was possibly wrong. Our problem seems to have >> disappeared with an akka upgrade [1] and my guess for the relevant >> fix [2] indicates that unintended tenuring was occurring at some >> point (teh tenuring has never been significant enough for us to >> notice). However as long as I'm unable to reproduce with the new >> version I'm happy to continue testing using the older version I know >> to reproduce, as I don't think this G1 behavior is intended. > > this bug report makes me tend to believe that the suggested fix (for > JDK-8152438) will actually fix the issue. Let me explain: > > Every G1 thread has two work queues, one fixed size public one where > others can steal from, and one resizable one that is private. Work > (references) is first put into the public one, and then if it is full, > into the private one. > Threads first process their private buffers, so that others can steal > from the public one while they are working on it. > Due to some conditions it can happen that the public queues are already > completely empty, while one thread is still busy for a long time with > its private one. I.e. the work in the private queue can be so that it > never generates more work in the public queue that others can steal and > continue work from. So they wait. > > That can cause this high termination time. Of course this situation can > occur multiple times during a GC, so that every thread gets his fair > share of waiting :) > > That mentioned fix has been pushed into the http://hg.openjdk.java.net/ > jdk8u/jdk8u-dev/ repository. It should be a matter of pulling it. > The README-builds file in the repo has build instructions. > > It would be really nice if we could track down your problem to this > issue, or at least significantly improve it. (JDK9 has more significant > patches in that area iirc, but this one is probably the most > important). > > Thanks, > Thomas > -------------- next part -------------- A non-text attachment was scrubbed... 
Name: tip.log.gz
Type: application/x-gzip
Size: 50424 bytes
Desc: not available
URL: 

From thomas.schatzl at oracle.com  Mon Sep 5 10:45:14 2016
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Mon, 05 Sep 2016 12:45:14 +0200
Subject: High termination times pre-concurrent cycle in G1
In-Reply-To: 
References: <3227fd9b-3f11-3e78-9768-1545b4153219@oracle.com>
 <1472816590.4493.20.camel@oracle.com>
 <1472824900.4493.34.camel@oracle.com>
Message-ID: <1473072314.4588.24.camel@oracle.com>

Hi William,

On Mon, 2016-09-05 at 12:02 +0200, William Good wrote:
> Hi Thomas,
>
> Attached is a log file from a run with this morning's tip. I don't
> see any change unfortunately.

  hmm. I talked to somebody else about it, and looked a bit at the
Akka code for this and related bugs: the reason why it does not help
may just be because the problem is the data structure itself, not the
gc.

I.e. the AbstractNodeQueue from Akka seems to be a linked list of
nodes, probably rather very deep and with low fan-out. So there may be
not a lot of work that can actually be done in parallel. Additionally,
the references into young gen really cause significant work.

As opposed to the case that JDK-8152438 fixes, that is, if a worker
thread does have a lot of work that could be taken over by other
threads available, but does not share it.

If that is indeed the case, jdk9 will not help a lot if anything.

If I look at another referenced bug [1], AbstractNodeQueue seems kind
of a "textbook" linked list. Fixing the application seems to be the
easiest option here.

Thanks for your effort in pinning this down,
  Thomas

[1] https://github.com/akka/akka/issues/17547

> I'll try with jdk9 tip later today since you mention there are more
> fixes there.
>
> configure summary looked like this, I think everything here is in
> order:
>
> Configuration summary:
> * Debug level:    release
> * JDK variant:    normal
> * JVM variants:   server
> * OpenJDK target: OS: linux, CPU architecture: x86, address length: 64
>
> Tools summary:
> * Boot JDK:       java version "1.8.0_101" Java(TM) SE Runtime
> Environment (build 1.8.0_101-b13) Java HotSpot(TM) 64-Bit Server VM
> (build 25.101-b13, mixed mode)  (at /var/tmp/jdk1.8.0_101)
> * C Compiler:     gcc (GCC) 4.8.5 20150623 (Red Hat-4) version 4.8.5
> (at /usr/bin/gcc)
> * C++ Compiler:   g++ (GCC) 4.8.5 20150623 (Red Hat-4) version 4.8.5
> (at /usr/bin/g++)

Yes, that's fine.

Thanks,
  Thomas

From yu.zhang at oracle.com  Tue Sep 6 16:25:39 2016
From: yu.zhang at oracle.com (yu.zhang at oracle.com)
Date: Tue, 6 Sep 2016 09:25:39 -0700
Subject: High termination times pre-concurrent cycle in G1
In-Reply-To: 
References: <3227fd9b-3f11-3e78-9768-1545b4153219@oracle.com>
 <1472816590.4493.20.camel@oracle.com>
 <1472824900.4493.34.camel@oracle.com>
Message-ID: <157af541-2e05-3582-b2e6-e9fd6ce3deea@oracle.com>

William,

While this might not help the termination time, I hope adjusting the
region size can get rid of some of the humongous objects. It seems the
humongous objects are filling old gen, about ~5g. The full gc can
clean 10g.

Can you try increasing the region size to 8m or 16m?

Thanks
Jenny
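Concretely, that experiment is just a region-size override on the
existing command line -- a sketch, with the heap size made up rather
than taken from the log:

    java -XX:+UseG1GC -Xms24g -Xmx24g \
         -XX:G1HeapRegionSize=16m \
         ... rest of the existing flags ...
    # G1 treats any allocation of at least half a region as humongous,
    # so with 16m regions only arrays of 8m and up stay humongous; the
    # smaller ones go back to being ordinary eden allocations.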
On 09/05/2016 03:02 AM, William Good wrote:
> Hi Thomas,
>
> Attached is a log file from a run with this morning's tip. I don't see
> any change unfortunately.
>
> I'll try with jdk9 tip later today since you mention there are more
> fixes there.
>
> configure summary looked like this, I think everything here is in order:
>
> Configuration summary:
> * Debug level:    release
> * JDK variant:    normal
> * JVM variants:   server
> * OpenJDK target: OS: linux, CPU architecture: x86, address length: 64
>
> Tools summary:
> * Boot JDK:       java version "1.8.0_101" Java(TM) SE Runtime
> Environment (build 1.8.0_101-b13) Java HotSpot(TM) 64-Bit Server VM
> (build 25.101-b13, mixed mode)  (at /var/tmp/jdk1.8.0_101)
> * C Compiler:     gcc (GCC) 4.8.5 20150623 (Red Hat-4) version 4.8.5
> (at /usr/bin/gcc)
> * C++ Compiler:   g++ (GCC) 4.8.5 20150623 (Red Hat-4) version 4.8.5
> (at /usr/bin/g++)

From charlie.hunt at oracle.com  Wed Sep 7 11:59:10 2016
From: charlie.hunt at oracle.com (charlie hunt)
Date: Wed, 7 Sep 2016 06:59:10 -0500
Subject: Java application getting paused when pfiles or pstack command run for it
In-Reply-To: 
References: 
Message-ID: <91E55761-6249-4ECA-87C2-FA779AA8CCD7@oracle.com>

Hi Amit,

pfiles, pstack and pldd stop the process while they do their work.

The following is directly from the p-tools (pfiles, etc.)
man page, and from Oracle docs on p-tools:

> These proc tools (p-tools) stop their target processes while
> inspecting them and reporting the results: pfiles, pldd, pmap, and
> pstack. A process can do nothing while it is stopped.

As for why it may take up to 15 seconds? It may be that there are a
large number of file descriptors in use, etc. or a (large) portion of
that time is spent getting all threads in the JVM and app to come to a
stopped state, or a combination of both.

hths,

charlie
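If a huge descriptor count is the suspect, it can be sanity-checked
first -- a Solaris-flavored sketch, with the PID taken from this thread:

    ls /proc/18387/fd | wc -l         # fd count; does not stop the process
    pfiles 18387 | grep -c S_IFSOCK   # sockets only; still stops the process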
> On Sep 7, 2016, at 5:33 AM, Amit Mishra wrote:
>
> Hello Charlie/team,
>
> I need your expert help on one of our production issues: whenever the
> pfiles command runs, it causes the Java application to freeze until
> the command gets completed (it is causing long pauses of up to 15
> seconds).
>
> During this whole time the situation appears as a JVM freeze: TD's
> (thread dumps) for the application PID stop generating, GC logs are
> getting paused and the application stops catering traffic.
>
> What could be the cause of it? The application Java version is 1.6u45
> (application PID is 18387).
>
> 3 samples from the top command when the pfiles, pldd and pstack
> commands caused the application freeze (as TD's were also not coming
> we treated this as a JVM freeze).
>
> ========================================================
> 1st Sample:
> load averages: 1.11, 1.00, 0.98; up 68+20:03:01 21:41:59
> 94 processes: 91 sleeping, 1 stopped, 2 on cpu
> CPU states: 85.6% idle, 0.4% user, 14.1% kernel, 0.0% iowait, 0.0% swap
> Memory: 64G phys mem, 30G free mem, 16G total swap, 16G free swap
>
> PID USERNAME LWP PRI NICE SIZE RES STATE TIME CPU COMMAND
> 5251 root 1 0 17 2872K 1396K cpu/2 0:00 6.82% pfiles 18387
> 18387 rkadm 999 0 0 29G 23G stop 878.5H 5.73% /usr/Java_1.6_45/bin/amd64/java -Djava.util.logging.manager=com.redknee.framewo
>
> ======================================================================
> 2nd Sample:
> PID USERNAME LWP PRI NICE SIZE RES STATE TIME CPU COMMAND
> 10145 root 1 0 17 2620K 1676K cpu/2 0:00 3.90% pldd 18387
> 5352 root 4 6 17 31M 27M sleep 0:01 3.33% pkgserv -N pkgchk
> 18387 rkadm 999 59 0 29G 23G stop 878.5H 0.82% /usr/Java_1.6_45/bin/amd64/java -Djava.util.logging.manager=com.redknee.framewo
>
> =================================================================
> 3rd Sample
> load averages: 1.21, 1.16, 1.15; up 68+18:02:37 19:41:35
> 88 processes: 85 sleeping, 1 stopped, 2 on cpu
> CPU states: 85.9% idle, 0.2% user, 13.8% kernel, 0.0% iowait, 0.0% swap
> Memory: 64G phys mem, 30G free mem, 16G total swap, 16G free swap
>
> PID USERNAME LWP PRI NICE SIZE RES STATE TIME CPU COMMAND
> 1250 rkadm 1 30 0 28M 11M cpu/1 0:09 12.49% pstack 18387
> 19859 rkadm 392 59 0 3322M 2342M sleep 69.6H 0.46% /usr/Java_1.6_45/bin/java -classpath /usr/Java_1.6_45/lib/tools.jar:/opt/redkne
> 1257 rkadm 1 59 0 3128K 1756K cpu/3 0:00 0.05% top -c -d 1 -s 1 100
> 1024 root 12 6 17 154M 109M sleep 118:09 0.01% /opt/IBM/SCM/client/../_jvm/bin/java -Xint -Xmx128m -Djlog.logCmdPort=1953 -Dsu
> 15036 root 32 59 0 138M 74M sleep 85:05 0.01% /usr/java/bin/java -Dviper.fifo.path=/var/run/smc898/boot.fifo -Xmx128m -Dsun.s
> 1075 noaccess 19 59 0 97M 91M sleep 69:05 0.01% /usr/java/bin/java -server -Xmx128m -XX:+UseParallelGC -XX:ParallelGCThreads=4
> 18285 rkadm 23 59 0 110M 53M sleep 42:39 0.01% orbd -ORBInitialPort 20000 -port 20100
> 18289 rkadm 21 59 0 105M 45M sleep 42:02 0.01% orbd -ORBInitialPort 21000 -port 21100 /opt/redknee/log/rkctl.log
> 185 root 1 59 0 2472K 1108K sleep 46:38 0.01% /usr/lib/inet/in.mpathd -a
> 1254 root 43 59 0 47M 22M sleep 12:32 0.00% /opt/IBM/ITM/sol606/ul/bin/kulagent
> 1467 root 44 59 0 61M 32M sleep 157:12 0.00% /opt/IBM/ITM/sol606/ux/bin/kuxagent
> 665 root 1 59 0 9084K 2596K sleep 1:41 0.00% /usr/lib/sendmail -bd -q15m
> 7 root 14 59 0 12M 10M sleep 1:58 0.00% /lib/svc/bin/svc.startd
> 502 root 1 59 0 1444K 748K sleep 0:18 0.00% /usr/lib/utmpd
> 676 root 1 100 -20 2652K 1424K sleep 5:32 0.00% /usr/lib/inet/xntpd
> 574 root 4 59 0 6412K 3068K sleep 4:19 0.00% /usr/lib/inet/inetd start
> 643 root 1 59 0 2468K 1304K sleep 0:00 0.00% /usr/lib/snmp/snmpdx -y -c /etc/snmp/conf
> 4933 root 2 59 0 590M 578M sleep 499:41 0.00% /usr/bin/dsmc schedule
> 5231 root 7 59 0 44M 17M sleep 2:59 0.00% /opt/IBM/ITM/sol606/ux/bin/kcawd
> 18387 rkadm 999 59 0 29G 23G stop 877.2H 0.00% /usr/Java_1.6_45/bin/amd64/java -Djava.util.logging.manager=com.redknee.framewo
>
> Thanks,
> Amit Mishra

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From charlie.hunt at oracle.com  Wed Sep 7 12:59:19 2016
From: charlie.hunt at oracle.com (Charlie Hunt)
Date: Wed, 7 Sep 2016 05:59:19 -0700 (PDT)
Subject: Java application getting paused when pfiles or pstack command run for it
Message-ID: <493861e7-b224-4303-95fb-2da7e8be0f91@default>

Your description sounds like it may be distributed GC running at its
default, every 60 minutes. A look at GC logs would help confirm that's
the case.

GC logs are usually the best place to start looking when looking for
sources of application pauses.

If it turns out to be distributed GC, you can change the frequency at
which it runs by adding the following system properties:
-Dsun.rmi.dgc.client.gcInterval=
-Dsun.rmi.dgc.server.gcInterval=

As an extreme example, you could effectively tell distributed GC to not
run by setting:
-Dsun.rmi.dgc.client.gcInterval=Long.MAX_VALUE
-Dsun.rmi.dgc.server.gcInterval=Long.MAX_VALUE

Or, you could add the following command line option:
-XX:+DisableExplicitGC

hths,

charlie

----- Original Message -----
From: amit.mishra at redknee.com
To: charlie.hunt at oracle.com
Cc: hotspot-gc-use at openjdk.java.net
Sent: Wednesday, September 7, 2016 7:08:23 AM GMT -06:00 US/Canada Central
Subject: RE: Java application getting paused when pfiles or pstack command run for it

Thank you very much Charles, but initially the customer reported a
production issue where the Java application process hangs periodically
for a few seconds at a fixed point of time every hour (say 17:41,
18:41, 19:41).

We tried to investigate it by taking TD's to see where application
threads are blocked, but the TD didn't come up, due to which we
conclude that it is a JVM freeze.

Further, to analyze process threads we were supposed to use pstack,
but as pstack itself is causing an application pause, in your opinion
what are the best commands/tools to analyze a JVM freeze when thread
dumps stopped coming?

Regards,
Amit

From: charlie hunt [mailto:charlie.hunt at oracle.com]
Sent: Wednesday, September 7, 2016 17:29
To: Amit Mishra
Cc: hotspot-gc-use at openjdk.java.net
Subject: Re: Java application getting paused when pfiles or pstack command run for it

Hi Amit,

pfiles, pstack and pldd stop the process while they do their work.

The following is directly from the p-tools (pfiles, etc.) man page, and
from Oracle docs on p-tools:

These proc tools (p-tools) stop their target processes while inspecting
them and reporting the results: pfiles, pldd, pmap, and pstack. A
process can do nothing while it is stopped.

As for why it may take up to 15 seconds?
It may be that there are a large number file descriptors in use, etc. or a (large) portion of that time is spent getting all threads in the JVM and app to come to stopped state, or a combination of both. hths, charlie On Sep 7, 2016, at 5:33 AM, Amit Mishra < amit.mishra at redknee.com > wrote: Hello Charlie/team, I need your expert help on one of Production issue whereas when pfiles command whenever runs then it cause java application freeze until command get completed.(it is causing long pauses of up-to 15 seconds) During this whole time situations appears as JVM freeze as TD?s for Application PID stopped generating, GC logs are getting paused and application stopped catering traffic. What could be the cause of it, Application Java version is 1.6u45.(Application PID is 18387) 3 samples from Top command when pfiles ,pldd and pstack command cause Application freeze(as TD?s was also not coming we treated this as JVM freeze). ======================================================== 1 st Sample: load averages: 1.11, 1.00, 0.98; up 68+20:03:01 21:41:59 94 processes: 91 sleeping, 1 stopped, 2 on cpu CPU states: 85.6% idle, 0.4% user, 14.1% kernel, 0.0% iowait, 0.0% swap Memory: 64G phys mem, 30G free mem, 16G total swap, 16G free swap PID USERNAME LWP PRI NICE SIZE RES STATE TIME CPU COMMAND 5251 root 1 0 17 2872K 1396K cpu/2 0:00 6.82% pfiles 18387 18387 rkadm 999 0 0 29G 23G stop 878.5H 5.73% /usr/Java_1.6_45/bin/amd64/java -Djava.util.logging.manager=com.redknee.framewo ====================================================================== 2 nd Sample: PID USERNAME LWP PRI NICE SIZE RES STATE TIME CPU COMMAND 10145 root 1 0 17 2620K 1676K cpu/2 0:00 3.90% pldd 18387 5352 root 4 6 17 31M 27M sleep 0:01 3.33% pkgserv -N pkgchk 18387 rkadm 999 59 0 29G 23G stop 878.5H 0.82% /usr/Java_1.6_45/bin/amd64/java -Djava.util.logging.manager=com.redknee.framewo ================================================================= 3 rd Sample load averages: 1.21, 1.16, 1.15; up 68+18:02:37 19:41:35 88 processes: 85 sleeping, 1 stopped, 2 on cpu CPU states: 85.9% idle, 0.2% user, 13.8% kernel, 0.0% iowait, 0.0% swap Memory: 64G phys mem, 30G free mem, 16G total swap, 16G free swap PID USERNAME LWP PRI NICE SIZE RES STATE TIME CPU COMMAND 1250 rkadm 1 30 0 28M 11M cpu/1 0:09 12.49% pstack 18387 19859 rkadm 392 59 0 3322M 2342M sleep 69.6H 0.46% /usr/Java_1.6_45/bin/java -classpath /usr/Java_1.6_45/lib/tools.jar:/opt/redkne 1257 rkadm 1 59 0 3128K 1756K cpu/3 0:00 0.05% top -c -d 1 -s 1 100 1024 root 12 6 17 154M 109M sleep 118:09 0.01% /opt/IBM/SCM/client/../_jvm/bin/java -Xint -Xmx128m -Djlog.logCmdPort=1953 -Dsu 15036 root 32 59 0 138M 74M sleep 85:05 0.01% /usr/java/bin/java -Dviper.fifo.path=/var/run/smc898/boot.fifo -Xmx128m -Dsun.s 1075 noaccess 19 59 0 97M 91M sleep 69:05 0.01% /usr/java/bin/java -server -Xmx128m -XX:+UseParallelGC -XX:ParallelGCThreads=4 18285 rkadm 23 59 0 110M 53M sleep 42:39 0.01% orbd -ORBInitialPort 20000 -port 20100 18289 rkadm 21 59 0 105M 45M sleep 42:02 0.01% orbd -ORBInitialPort 21000 -port 21100 /opt/redknee/log/rkctl.log 185 root 1 59 0 2472K 1108K sleep 46:38 0.01% /usr/lib/inet/in.mpathd -a 1254 root 43 59 0 47M 22M sleep 12:32 0.00% /opt/IBM/ITM/sol606/ul/bin/kulagent 1467 root 44 59 0 61M 32M sleep 157:12 0.00% /opt/IBM/ITM/sol606/ux/bin/kuxagent 665 root 1 59 0 9084K 2596K sleep 1:41 0.00% /usr/lib/sendmail -bd -q15m 7 root 14 59 0 12M 10M sleep 1:58 0.00% /lib/svc/bin/svc.startd 502 root 1 59 0 1444K 748K sleep 0:18 0.00% /usr/lib/utmpd 676 root 1 100 -20 2652K 1424K 
sleep 5:32 0.00% /usr/lib/inet/xntpd 574 root 4 59 0 6412K 3068K sleep 4:19 0.00% /usr/lib/inet/inetd start 643 root 1 59 0 2468K 1304K sleep 0:00 0.00% /usr/lib/snmp/snmpdx -y -c /etc/snmp/conf 4933 root 2 59 0 590M 578M sleep 499:41 0.00% /usr/bin/dsmc schedule 5231 root 7 59 0 44M 17M sleep 2:59 0.00% /opt/IBM/ITM/sol606/ux/bin/kcawd 1 8387 rkadm 999 59 0 29G 23G stop 877.2H 0.00% /usr/Java_1.6_45/bin/amd64/java -Djava.util.logging.manager=com.redknee.framewo Thanks, Amit Mishra -------------- next part -------------- An HTML attachment was scrubbed... URL: From amit.mishra at redknee.com Wed Sep 7 10:33:26 2016 From: amit.mishra at redknee.com (Amit Mishra) Date: Wed, 7 Sep 2016 10:33:26 +0000 Subject: Java application getting paused when pfiles or pstack command run for it Message-ID: Hello Charlie/team, I need your expert help on one of Production issue whereas when pfiles command whenever runs then it cause java application freeze until command get completed.(it is causing long pauses of up-to 15 seconds) During this whole time situations appears as JVM freeze as TD?s for Application PID stopped generating, GC logs are getting paused and application stopped catering traffic. What could be the cause of it, Application Java version is 1.6u45.(Application PID is 18387) 3 samples from Top command when pfiles ,pldd and pstack command cause Application freeze(as TD?s was also not coming we treated this as JVM freeze). ======================================================== 1st Sample: load averages: 1.11, 1.00, 0.98; up 68+20:03:01 21:41:59 94 processes: 91 sleeping, 1 stopped, 2 on cpu CPU states: 85.6% idle, 0.4% user, 14.1% kernel, 0.0% iowait, 0.0% swap Memory: 64G phys mem, 30G free mem, 16G total swap, 16G free swap PID USERNAME LWP PRI NICE SIZE RES STATE TIME CPU COMMAND 5251 root 1 0 17 2872K 1396K cpu/2 0:00 6.82% pfiles 18387 18387 rkadm 999 0 0 29G 23G stop 878.5H 5.73% /usr/Java_1.6_45/bin/amd64/java -Djava.util.logging.manager=com.redknee.framewo ====================================================================== 2nd Sample: PID USERNAME LWP PRI NICE SIZE RES STATE TIME CPU COMMAND 10145 root 1 0 17 2620K 1676K cpu/2 0:00 3.90% pldd 18387 5352 root 4 6 17 31M 27M sleep 0:01 3.33% pkgserv -N pkgchk 18387 rkadm 999 59 0 29G 23G stop 878.5H 0.82% /usr/Java_1.6_45/bin/amd64/java -Djava.util.logging.manager=com.redknee.framewo ================================================================= 3rd Sample load averages: 1.21, 1.16, 1.15; up 68+18:02:37 19:41:35 88 processes: 85 sleeping, 1 stopped, 2 on cpu CPU states: 85.9% idle, 0.2% user, 13.8% kernel, 0.0% iowait, 0.0% swap Memory: 64G phys mem, 30G free mem, 16G total swap, 16G free swap PID USERNAME LWP PRI NICE SIZE RES STATE TIME CPU COMMAND 1250 rkadm 1 30 0 28M 11M cpu/1 0:09 12.49% pstack 18387 19859 rkadm 392 59 0 3322M 2342M sleep 69.6H 0.46% /usr/Java_1.6_45/bin/java -classpath /usr/Java_1.6_45/lib/tools.jar:/opt/redkne 1257 rkadm 1 59 0 3128K 1756K cpu/3 0:00 0.05% top -c -d 1 -s 1 100 1024 root 12 6 17 154M 109M sleep 118:09 0.01% /opt/IBM/SCM/client/../_jvm/bin/java -Xint -Xmx128m -Djlog.logCmdPort=1953 -Dsu 15036 root 32 59 0 138M 74M sleep 85:05 0.01% /usr/java/bin/java -Dviper.fifo.path=/var/run/smc898/boot.fifo -Xmx128m -Dsun.s 1075 noaccess 19 59 0 97M 91M sleep 69:05 0.01% /usr/java/bin/java -server -Xmx128m -XX:+UseParallelGC -XX:ParallelGCThreads=4 18285 rkadm 23 59 0 110M 53M sleep 42:39 0.01% orbd -ORBInitialPort 20000 -port 20100 18289 rkadm 21 59 0 105M 45M sleep 42:02 0.01% orbd -ORBInitialPort 
21000 -port 21100 /opt/redknee/log/rkctl.log 185 root 1 59 0 2472K 1108K sleep 46:38 0.01% /usr/lib/inet/in.mpathd -a 1254 root 43 59 0 47M 22M sleep 12:32 0.00% /opt/IBM/ITM/sol606/ul/bin/kulagent 1467 root 44 59 0 61M 32M sleep 157:12 0.00% /opt/IBM/ITM/sol606/ux/bin/kuxagent 665 root 1 59 0 9084K 2596K sleep 1:41 0.00% /usr/lib/sendmail -bd -q15m 7 root 14 59 0 12M 10M sleep 1:58 0.00% /lib/svc/bin/svc.startd 502 root 1 59 0 1444K 748K sleep 0:18 0.00% /usr/lib/utmpd 676 root 1 100 -20 2652K 1424K sleep 5:32 0.00% /usr/lib/inet/xntpd 574 root 4 59 0 6412K 3068K sleep 4:19 0.00% /usr/lib/inet/inetd start 643 root 1 59 0 2468K 1304K sleep 0:00 0.00% /usr/lib/snmp/snmpdx -y -c /etc/snmp/conf 4933 root 2 59 0 590M 578M sleep 499:41 0.00% /usr/bin/dsmc schedule 5231 root 7 59 0 44M 17M sleep 2:59 0.00% /opt/IBM/ITM/sol606/ux/bin/kcawd 18387 rkadm 999 59 0 29G 23G stop 877.2H 0.00% /usr/Java_1.6_45/bin/amd64/java -Djava.util.logging.manager=com.redknee.framewo Thanks, Amit Mishra -------------- next part -------------- An HTML attachment was scrubbed... URL: From amit.mishra at redknee.com Wed Sep 7 12:08:19 2016 From: amit.mishra at redknee.com (Amit Mishra) Date: Wed, 7 Sep 2016 12:08:19 +0000 Subject: Java application getting paused when pfiles or pstack command run for it In-Reply-To: <91E55761-6249-4ECA-87C2-FA779AA8CCD7@oracle.com> References: <91E55761-6249-4ECA-87C2-FA779AA8CCD7@oracle.com> Message-ID: Thank you very much Charles, but initially Customer reported a Production issue where java Application process hangs periodically for a few seconds at fixed point of time of every hour.(say 17:41,18:41,19:41). We tried to investigate it by taking TD?s to see where application threads are blocked but TD didn?t came up due to which we conclude that it is JVM freeze. Further to analyze that we were supposed to analyze process threads using pstack but as pstack itself is causing application pause so in your opinion what are the best commands/tools to analyze JVM freeze when Thread dumps stopped coming. Regards, Amit From: charlie hunt [mailto:charlie.hunt at oracle.com] Sent: Wednesday, September 7, 2016 17:29 To: Amit Mishra Cc: hotspot-gc-use at openjdk.java.net Subject: Re: Java application getting paused when pfiles or pstack command run for it Hi Amit, pfiles, pstack and pldd stops the process while they do their work. The following is directly from the p-tools (pfiles,etc.) man page, and from Oracle docs on p-tools: These proc tools (p-tools) stop their target processes while inspecting them and reporting the results: files, pled, mmap, and stack. A process can do nothing while it is stopped. As for why it may take up to 15 seconds? It may be that there are a large number file descriptors in use, etc. or a (large) portion of that time is spent getting all threads in the JVM and app to come to stopped state, or a combination of both. hths, charlie On Sep 7, 2016, at 5:33 AM, Amit Mishra > wrote: Hello Charlie/team, I need your expert help on one of Production issue whereas when pfiles command whenever runs then it cause java application freeze until command get completed.(it is causing long pauses of up-to 15 seconds) During this whole time situations appears as JVM freeze as TD?s for Application PID stopped generating, GC logs are getting paused and application stopped catering traffic. 
What could be the cause of it, Application Java version is 1.6u45.(Application PID is 18387) 3 samples from Top command when pfiles ,pldd and pstack command cause Application freeze(as TD?s was also not coming we treated this as JVM freeze). ======================================================== 1st Sample: load averages: 1.11, 1.00, 0.98; up 68+20:03:01 21:41:59 94 processes: 91 sleeping, 1 stopped, 2 on cpu CPU states: 85.6% idle, 0.4% user, 14.1% kernel, 0.0% iowait, 0.0% swap Memory: 64G phys mem, 30G free mem, 16G total swap, 16G free swap PID USERNAME LWP PRI NICE SIZE RES STATE TIME CPU COMMAND 5251 root 1 0 17 2872K 1396K cpu/2 0:00 6.82% pfiles 18387 18387 rkadm 999 0 0 29G 23G stop 878.5H 5.73% /usr/Java_1.6_45/bin/amd64/java -Djava.util.logging.manager=com.redknee.framewo ====================================================================== 2nd Sample: PID USERNAME LWP PRI NICE SIZE RES STATE TIME CPU COMMAND 10145 root 1 0 17 2620K 1676K cpu/2 0:00 3.90% pldd 18387 5352 root 4 6 17 31M 27M sleep 0:01 3.33% pkgserv -N pkgchk 18387 rkadm 999 59 0 29G 23G stop 878.5H 0.82% /usr/Java_1.6_45/bin/amd64/java -Djava.util.logging.manager=com.redknee.framewo ================================================================= 3rd Sample load averages: 1.21, 1.16, 1.15; up 68+18:02:37 19:41:35 88 processes: 85 sleeping, 1 stopped, 2 on cpu CPU states: 85.9% idle, 0.2% user, 13.8% kernel, 0.0% iowait, 0.0% swap Memory: 64G phys mem, 30G free mem, 16G total swap, 16G free swap PID USERNAME LWP PRI NICE SIZE RES STATE TIME CPU COMMAND 1250 rkadm 1 30 0 28M 11M cpu/1 0:09 12.49% pstack 18387 19859 rkadm 392 59 0 3322M 2342M sleep 69.6H 0.46% /usr/Java_1.6_45/bin/java -classpath /usr/Java_1.6_45/lib/tools.jar:/opt/redkne 1257 rkadm 1 59 0 3128K 1756K cpu/3 0:00 0.05% top -c -d 1 -s 1 100 1024 root 12 6 17 154M 109M sleep 118:09 0.01% /opt/IBM/SCM/client/../_jvm/bin/java -Xint -Xmx128m -Djlog.logCmdPort=1953 -Dsu 15036 root 32 59 0 138M 74M sleep 85:05 0.01% /usr/java/bin/java -Dviper.fifo.path=/var/run/smc898/boot.fifo -Xmx128m -Dsun.s 1075 noaccess 19 59 0 97M 91M sleep 69:05 0.01% /usr/java/bin/java -server -Xmx128m -XX:+UseParallelGC -XX:ParallelGCThreads=4 18285 rkadm 23 59 0 110M 53M sleep 42:39 0.01% orbd -ORBInitialPort 20000 -port 20100 18289 rkadm 21 59 0 105M 45M sleep 42:02 0.01% orbd -ORBInitialPort 21000 -port 21100 /opt/redknee/log/rkctl.log 185 root 1 59 0 2472K 1108K sleep 46:38 0.01% /usr/lib/inet/in.mpathd -a 1254 root 43 59 0 47M 22M sleep 12:32 0.00% /opt/IBM/ITM/sol606/ul/bin/kulagent 1467 root 44 59 0 61M 32M sleep 157:12 0.00% /opt/IBM/ITM/sol606/ux/bin/kuxagent 665 root 1 59 0 9084K 2596K sleep 1:41 0.00% /usr/lib/sendmail -bd -q15m 7 root 14 59 0 12M 10M sleep 1:58 0.00% /lib/svc/bin/svc.startd 502 root 1 59 0 1444K 748K sleep 0:18 0.00% /usr/lib/utmpd 676 root 1 100 -20 2652K 1424K sleep 5:32 0.00% /usr/lib/inet/xntpd 574 root 4 59 0 6412K 3068K sleep 4:19 0.00% /usr/lib/inet/inetd start 643 root 1 59 0 2468K 1304K sleep 0:00 0.00% /usr/lib/snmp/snmpdx -y -c /etc/snmp/conf 4933 root 2 59 0 590M 578M sleep 499:41 0.00% /usr/bin/dsmc schedule 5231 root 7 59 0 44M 17M sleep 2:59 0.00% /opt/IBM/ITM/sol606/ux/bin/kcawd 18387 rkadm 999 59 0 29G 23G stop 877.2H 0.00% /usr/Java_1.6_45/bin/amd64/java -Djava.util.logging.manager=com.redknee.framewo Thanks, Amit Mishra -------------- next part -------------- An HTML attachment was scrubbed... 
From poonam.bajaj at oracle.com  Wed Sep 7 16:58:25 2016
From: poonam.bajaj at oracle.com (Poonam Bajaj Parhar)
Date: Wed, 7 Sep 2016 09:58:25 -0700
Subject: Java application getting paused when pfiles or pstack command run for it
In-Reply-To: <493861e7-b224-4303-95fb-2da7e8be0f91@default>
References: <493861e7-b224-4303-95fb-2da7e8be0f91@default>
Message-ID:

Hello Amit,

If you suspect that the JVM is hung, then you could try jstack -F to collect
the stack traces. And if that too does not work, then try collecting a core
file, and use jstack and/or native debuggers to collect the stack traces,
including native frames, to see what is going on.

Thanks,
Poonam

On 9/7/2016 5:59 AM, Charlie Hunt wrote:
> Your description sounds like it may be distributed GC running at its
> default, every 60 minutes. A look at GC logs would help confirm
> that's the case.
>
> GC logs are usually the best place to start looking when looking for
> sources of application pauses.
>
> If it turns out to be distributed GC, you can change the frequency at
> which it runs by adding the following system properties:
> -Dsun.rmi.dgc.client.gcInterval=
> -Dsun.rmi.dgc.server.gcInterval=
>
> As an extreme example, you could effectively tell distributed GC to
> not run by setting:
> -Dsun.rmi.dgc.client.gcInterval=Long.MAX_VALUE
> -Dsun.rmi.dgc.server.gcInterval=Long.MAX_VALUE
>
> Or, you could add the following command line option:
> -XX:+DisableExplicitGC
>
> hths,
>
> charlie
>
> [...]
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
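One note worth adding to Charlie's extreme example (an addition, not from his
message): the sun.rmi.dgc intervals are parsed as plain decimal longs, so
"Long.MAX_VALUE" has to be written out as its numeric value on a real command
line. A sketch:

    # effectively never run periodic distributed GC (Long.MAX_VALUE in ms)
    -Dsun.rmi.dgc.client.gcInterval=9223372036854775807
    -Dsun.rmi.dgc.server.gcInterval=9223372036854775807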
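To make Poonam's jstack suggestion concrete, a minimal sketch, assuming the
PID 18387 from this thread; note that gcore, like the other p-tools, also
stops the process while the image is written:

    jstack -F 18387                   # forced dump when the VM ignores jstack/kill -3
    gcore -o /var/tmp/core 18387      # writes /var/tmp/core.18387
    jstack /usr/Java_1.6_45/bin/amd64/java /var/tmp/core.18387   # stacks from the core, offline (-m adds native frames)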
From poonam.bajaj at oracle.com  Thu Sep 8 13:43:59 2016
From: poonam.bajaj at oracle.com (Poonam Bajaj Parhar)
Date: Thu, 8 Sep 2016 06:58:25 -0700
Subject: Java application getting paused when pfiles or pstack command run for it
In-Reply-To:
References: <493861e7-b224-4303-95fb-2da7e8be0f91@default>
Message-ID: <4c08b4c1-446d-8062-a969-4c0c8b39510e@oracle.com>

Hello Amit,

If it is not a complete JVM freeze, then a core file may not be necessary.
The stack traces collected from the unresponsive process should help in
getting some clues. From your description, it appears that there is some
periodic activity happening on a specific day of the week that is slowing
things down. Have you considered collecting a Java Flight Recording for the
slow hours, to see all the events happening in the application for that
duration?

Thanks,
Poonam

On 9/8/2016 4:43 AM, Amit Mishra wrote:
> Thanks, Poonam. Do you have an exact set of commands to be executed
> against the impacted application? [...]
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
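A caveat on the Flight Recorder idea, not from Poonam's message: JFR first
shipped with Oracle JDK 7u40, so the 1.6u45 process in this thread would need
a JDK upgrade before it applies. Under that assumption, a typical way to
capture one slow hour looks like this (the file path is a placeholder):

    # startup flags to permit recordings (Oracle JDK 7/8)
    -XX:+UnlockCommercialFeatures -XX:+FlightRecorder

    # later, record the slow hour from the running process
    jcmd <pid> JFR.start duration=60m filename=/var/tmp/slowhour.jfr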
From amit.mishra at redknee.com  Thu Sep 8 11:43:13 2016
From: amit.mishra at redknee.com (Amit Mishra)
Date: Thu, 8 Sep 2016 11:43:13 +0000
Subject: Java application getting paused when pfiles or pstack command run for it
In-Reply-To:
References: <493861e7-b224-4303-95fb-2da7e8be0f91@default>
Message-ID:

Thanks, Poonam. Do you have an exact set of commands to be executed against
the impacted application? I was thinking of a core dump, but it would cause a
complete application breakdown, while in the current scenario the pause
happens for only a few seconds, regularly after every hour, on a specific day
of the week (in our case Tuesday), not daily.

Also, the application becomes responsive on its own after a few seconds, so I
suspect it could be caused by those proc or similar commands being run
against the application PID.

Thanks, Charlie, for your response. The first thing we checked here was the
GC logs, which are all clean. The RMI DGC interval is 24 hours, and even
those System.gc() calls are turned into concurrent cycles by the
-XX:+ExplicitGCInvokesConcurrent flag we have in place:

argv[32]: -XX:+ExplicitGCInvokesConcurrent
argv[38]: -Dsun.rmi.dgc.server.gcInterval=86400000
argv[39]: -Dsun.rmi.dgc.client.gcInterval=86400000

Moreover, these application issues do not happen daily, but once a week, on a
particular day, during a few hours in the morning and evening.

I will collect top, netstat -an and ps -eaf next Tuesday, and if there are
any new findings I will share them with you.

Thank you very much once again for your kind support.

Regards,
Amit Mishra

From: Poonam Bajaj Parhar [mailto:poonam.bajaj at oracle.com]
Sent: Wednesday, September 7, 2016 22:28
To: Amit Mishra
Cc: hotspot-gc-use at openjdk.java.net
Subject: Re: Java application getting paused when pfiles or pstack command run for it

[...]
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
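As a quick sanity check on flags like the argv entries above, jinfo can read
system properties and VM flags from the live process; depending on the mode
it may briefly pause the target, much like the other tools discussed in this
thread, so schedule it accordingly. A sketch against PID 18387:

    jinfo -sysprops 18387 | grep sun.rmi.dgc
    jinfo -flag ExplicitGCInvokesConcurrent 18387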
From bkgood at gmail.com  Thu Sep 8 15:35:46 2016
From: bkgood at gmail.com (William Good)
Date: Thu, 8 Sep 2016 17:35:46 +0200
Subject: High termination times pre-concurrent cycle in G1
In-Reply-To: <157af541-2e05-3582-b2e6-e9fd6ce3deea@oracle.com>
References: <3227fd9b-3f11-3e78-9768-1545b4153219@oracle.com>
 <1472816590.4493.20.camel@oracle.com> <1472824900.4493.34.camel@oracle.com>
 <157af541-2e05-3582-b2e6-e9fd6ce3deea@oracle.com>
Message-ID:

Hi Jenny,

Attached are logs for runs with 8m and 16m region sizes. This is with the old
version of the application (before the akka patch). I was pleasantly
surprised by the 16m run and tried it a few times, but it seems these results
are reliable.

Thanks,
William

On Tue, Sep 6, 2016 at 6:25 PM, yu.zhang at oracle.com wrote:
> William,
>
> While this might not help the termination time, I hope adjusting the region
> size can get rid of some of the humongous objects.
>
> It seems the humongous objects are filling the old gen, about ~5g. The full
> gc can clean 10g. Can you try increasing the region size to 8m or 16m?
>
> Thanks
>
> Jenny
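A note for readers following the region-size experiment: G1 allocates any
object of half a region or larger as humongous, so the region size directly
sets the humongous threshold; with 16m regions, only objects of 8m and up
remain humongous. The flag accepts powers of two from 1m to 32m. A
hypothetical fragment of the 16m run's command line:

    -XX:+UseG1GC -XX:G1HeapRegionSize=16m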
> On 09/05/2016 03:02 AM, William Good wrote:
>> Hi Thomas,
>>
>> Attached is a log file from a run with this morning's tip. I don't see
>> any change unfortunately.
>>
>> I'll try with jdk9 tip later today since you mention there are more
>> fixes there.
>>
>> configure summary looked like this, I think everything here is in order:
>>
>> Configuration summary:
>> * Debug level:    release
>> * JDK variant:    normal
>> * JVM variants:   server
>> * OpenJDK target: OS: linux, CPU architecture: x86, address length: 64
>>
>> Tools summary:
>> * Boot JDK:       java version "1.8.0_101" Java(TM) SE Runtime
>>   Environment (build 1.8.0_101-b13) Java HotSpot(TM) 64-Bit Server VM
>>   (build 25.101-b13, mixed mode) (at /var/tmp/jdk1.8.0_101)
>> * C Compiler:     gcc (GCC) 4.8.5 20150623 (Red Hat-4) version 4.8.5
>>   (at /usr/bin/gcc)
>> * C++ Compiler:   g++ (GCC) 4.8.5 20150623 (Red Hat-4) version 4.8.5
>>   (at /usr/bin/g++)
>>
>> Best,
>> William
>>
>> On Fri, Sep 2, 2016 at 4:01 PM, Thomas Schatzl wrote:
>>> Hi,
>>>
>>> On Fri, 2016-09-02 at 15:08 +0200, William Good wrote:
>>>> Thomas,
>>>>
>>>> More than happy to build and test with a patched JDK. Let me know
>>>> what I need to do, at least what sources to grab (I think I can build
>>>> it once I've got them).
>>>>
>>>> I learned this morning that my belief that significant tenuring
>>>> wasn't taking place was possibly wrong. Our problem seems to have
>>>> disappeared with an akka upgrade [1] and my guess for the relevant
>>>> fix [2] indicates that unintended tenuring was occurring at some
>>>> point (the tenuring has never been significant enough for us to
>>>> notice). However, as long as I'm able to reproduce with the older
>>>> version I'm happy to continue testing with it, as I don't think this
>>>> G1 behavior is intended.
>>>
>>> this bug report makes me tend to believe that the suggested fix (for
>>> JDK-8152438) will actually fix the issue. Let me explain:
>>>
>>> Every G1 thread has two work queues, one fixed-size public one that
>>> others can steal from, and one resizable one that is private. Work
>>> (references) is first put into the public one, and then, if it is
>>> full, into the private one.
>>> Threads first process their private buffers, so that others can steal
>>> from the public one while they are working on it.
>>> Due to some conditions it can happen that the public queues are
>>> already completely empty while one thread is still busy for a long
>>> time with its private one. I.e. the work in the private queue can be
>>> such that it never generates more work in the public queue that others
>>> could steal and continue from. So they wait.
>>>
>>> That can cause this high termination time. Of course this situation
>>> can occur multiple times during a GC, so that every thread gets its
>>> fair share of waiting :)
>>>
>>> That mentioned fix has been pushed into the
>>> http://hg.openjdk.java.net/jdk8u/jdk8u-dev/ repository. It should be a
>>> matter of pulling it. The README-builds file in the repo has build
>>> instructions.
>>>
>>> It would be really nice if we could track down your problem to this
>>> issue, or at least significantly improve it. (JDK9 has more
>>> significant patches in that area iirc, but this one is probably the
>>> most important.)
>>>
>>> Thanks,
>>> Thomas
-------------- next part --------------
A non-text attachment was scrubbed...
Name: vary_region_sizes_logs.tar.bz
Type: application/x-bzip2
Size: 75668 bytes
Desc: not available
URL:

From yu.zhang at oracle.com  Fri Sep 9 01:14:46 2016
From: yu.zhang at oracle.com (yu.zhang at oracle.com)
Date: Thu, 8 Sep 2016 18:14:46 -0700
Subject: High termination times pre-concurrent cycle in G1
In-Reply-To:
References: <3227fd9b-3f11-3e78-9768-1545b4153219@oracle.com>
 <1472816590.4493.20.camel@oracle.com> <1472824900.4493.34.camel@oracle.com>
 <157af541-2e05-3582-b2e6-e9fd6ce3deea@oracle.com>
Message-ID: <56f5c29e-481d-10df-1797-da13768eaab8@oracle.com>

William,

Thanks for the update. Pleasantly surprised as well.

One explanation is that those humongous objects have pointers into the young
gen. Now that increasing the region size has turned them into ordinary
young-gen allocations, we no longer need to update those pointers.

In the tip.log, the update-remembered-set and object-copy time is not evenly
distributed among the threads. Maybe G1 is having difficulties with that
special kind of object array. Can you get a heap dump? I am curious what kind
of objects those are.

Thanks

Jenny

On 09/08/2016 08:35 AM, William Good wrote:
> [...]
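On the heap-dump request: jmap can write one from the live process; the
"live" suboption triggers a full GC first, so only reachable objects land in
the file. A sketch (the file path is a placeholder):

    jmap -dump:live,format=b,file=/var/tmp/heap.hprof <pid>

The resulting .hprof file can then be opened in a heap analyzer (jhat,
VisualVM or Eclipse MAT, for instance) to see what those large object arrays
actually are.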
From inurislamov at getintent.com  Fri Sep 23 09:40:56 2016
From: inurislamov at getintent.com (Ildar Nurislamov)
Date: Fri, 23 Sep 2016 12:40:56 +0300
Subject: JDK-8166500 Adaptive sizing for IHOP causes excessively long mixed GC pauses
Message-ID: <273BC628-AC88-4E21-AB27-32AE2021B8FA@getintent.com>

Hi Thomas Schatzl!

Thank you for such prompt responses.
I'm going to try your advice and send results next week.

Here are the log files you asked about:
https://www.dropbox.com/s/i9o4nuuh5gpsf1y/9noaihop_07_09_16.log.zip?dl=0
https://www.dropbox.com/s/xa3cfezvlqwwh6v/8u_log_07_09_16.log.zip?dl=0

-- 
Ildar Nurislamov
GetIntent, AdServer Team Leader
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From thomas.schatzl at oracle.com  Wed Sep 28 07:37:07 2016
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Wed, 28 Sep 2016 09:37:07 +0200
Subject: JDK-8166500 Adaptive sizing for IHOP causes excessively long mixed GC pauses
In-Reply-To: <273BC628-AC88-4E21-AB27-32AE2021B8FA@getintent.com>
References: <273BC628-AC88-4E21-AB27-32AE2021B8FA@getintent.com>
Message-ID: <1475048227.4430.4.camel@oracle.com>

Hi Ildar,

On Fri, 2016-09-23 at 12:40 +0300, Ildar Nurislamov wrote:
> Hi Thomas Schatzl!
>
> Thank you for such prompt responses.
> I'm going to try your advice and send results next week.
>
> Here are the log files you asked about:
> https://www.dropbox.com/s/i9o4nuuh5gpsf1y/9noaihop_07_09_16.log.zip?dl=0
> https://www.dropbox.com/s/xa3cfezvlqwwh6v/8u_log_07_09_16.log.zip?dl=0

thanks a lot for the logs. As you may have noticed, I closed JDK-8166500 as a
duplicate of the existing JDK-8159697 issue. They are the same after all.
We will continue working on improving the out-of-box experience of G1. :)

As hypothesized in the text for JDK-8166500, the 8u and 9-without-aihop logs
show the same general issue. The suggested tunings should improve mixed gc
times for now.

Thanks,
  Thomas

From inurislamov at getintent.com  Thu Sep 29 10:42:53 2016
From: inurislamov at getintent.com (Ildar Nurislamov)
Date: Thu, 29 Sep 2016 13:42:53 +0300
Subject: JDK-8166500 Adaptive sizing for IHOP causes excessively long mixed GC pauses
In-Reply-To: <1475048227.4430.4.camel@oracle.com>
References: <273BC628-AC88-4E21-AB27-32AE2021B8FA@getintent.com>
 <1475048227.4430.4.camel@oracle.com>
Message-ID:

Hi Thomas,

Thank you for the really helpful advice. I have performed 8-hour testing with:

-XX:+UnlockExperimentalVMOptions
-XX:G1NewSizePercent=2
-XX:G1HeapWastePercent=10
-XX:G1MixedGCCountTarget=12

and they improved the situation for both 8u and 9ea. The longest pause on 9ea
is now 400ms, with adaptive sizing for IHOP enabled.

I will continue testing and report if anything interesting pops out.

-- 
Ildar Nurislamov
GetIntent, AdServer Team Leader

> On Sep 28, 2016, at 10:37, Thomas Schatzl wrote:
>
> [...]
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
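To tie the thread's tuning advice together: G1NewSizePercent is an
experimental flag, so the unlock option must precede it on the command line.
A sketch of a complete 8u-style invocation; the heap sizing, log path and jar
name are placeholders, not from the thread:

    java -XX:+UseG1GC \
         -XX:+UnlockExperimentalVMOptions \
         -XX:G1NewSizePercent=2 \
         -XX:G1HeapWastePercent=10 \
         -XX:G1MixedGCCountTarget=12 \
         -Xms8g -Xmx8g \
         -Xloggc:gc.log -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
         -jar adserver.jar       # JDK 9 would use -Xlog:gc* instead of the Print flags

Roughly: a lower minimum young-gen size gives the pause-time controller room
to shrink the young gen, a higher heap-waste percentage lets mixed
collections finish earlier, and a higher mixed-GC count target spreads
old-region evacuation over more, shorter pauses.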