From matthew.miller at forgerock.com Wed Apr 2 17:43:25 2014
From: matthew.miller at forgerock.com (Matt Miller)
Date: Wed, 02 Apr 2014 13:43:25 -0400
Subject: -XX:+PrintClassHistogram Does a FullGC even with CMS enabled and -XX:+ExplicitGCInvokesConcurrent ?
Message-ID: <533C4C3D.7000906@forgerock.com>

Hi All,

It seems to me that even with 7 (tested both u45 and u51) the title holds true. I would expect that with ConcMarkSweep enabled and ExplicitGCInvokesConcurrent, -XX:+PrintClassHistogram would NOT do a FullGC, but instead start a CMS cycle. I thought maybe this was addressed by http://bugs.java.com/view_bug.do?bug_id=6797870 but I suppose this is not correct?

Is there a bug to fix PrintClassHistogram when using CMS? Perhaps I am just not finding the correct bug number? And of course even -XX:+DisableExplicitGC does not help...

Example:

$ jconsole -J-Xloggc:/Users/mmiller/gc.log -J-XX:+UseConcMarkSweepGC -J-XX:+PrintGCDetails -J-XX:+ExplicitGCInvokesConcurrent -J-XX:+PrintClassHistogram &
[1] 10759
$ kill -3 10759
$ cat gc.log
19.169: [Full GC19.169: [CMS: 5620K->6303K(7776K), 0.0338410 secs] 6950K->6303K(10208K), [CMS Perm : 13492K->13487K(21248K)], 0.0340110 secs] [Times: user=0.04 sys=0.00, real=0.03 secs]
 num     #instances         #bytes  class name
----------------------------------------------

From ysr1729 at gmail.com Wed Apr 2 18:41:58 2014
From: ysr1729 at gmail.com (Srinivas Ramakrishna)
Date: Wed, 2 Apr 2014 11:41:58 -0700
Subject: -XX:+PrintClassHistogram Does a FullGC even with CMS enabled and -XX:+ExplicitGCInvokesConcurrent ?
In-Reply-To: <533C4C3D.7000906@forgerock.com>
References: <533C4C3D.7000906@forgerock.com>
Message-ID:

Hi Matt --

It's been a while since we talked; great to hear from you.

As regards the behaviour of +PrintClassHistogram, I don't recall that interpretation ever having been implemented, and definitely not in the CR that you reference below. +PrintClassHistogram will do an explicit stop-world full gc, even with CMS, then produce a class histogram.

What the description in the CR you pointed to below is saying is that PrintClassHistogramBefore/AfterFullGC will not produce a histogram at the end of a concurrent gc cycle. However, it will at the end of a stop-world full gc cycle.

That having been said, I do not think it's a good idea to change the interpretation of the SIGQUIT-related +PrintClassHistogram depending on the GC in question. It looks like what you want here is the ability to get a class histogram after a concurrent gc cycle when presented with a SIGQUIT. (There's also the question of whether the same behaviour should also apply to jmap -histo:live.)

I think a change in behaviour along the lines of your suggestion requires discussion by users of concurrent gc's to see if that is desirable, and secondly whether both kinds of behaviours should be available (i.e. the current one, as well as an optional "concurrent" one).

thanks!
-- ramki

On Wed, Apr 2, 2014 at 10:43 AM, Matt Miller wrote:
> [...]
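For context on the two behaviours Ramki describes: the jmap tool exposes both variants, so a rough way to see the difference, using the pid from Matt's example, is:

  $ jmap -histo 10759        # walks the heap as-is; no collection is forced
  $ jmap -histo:live 10759   # counts only live objects, so it forces a full STW GC first

Only the :live variant behaves like the SIGQUIT path with +PrintClassHistogram.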
From matthew.miller at forgerock.com Wed Apr 2 19:00:11 2014
From: matthew.miller at forgerock.com (Matt Miller)
Date: Wed, 02 Apr 2014 15:00:11 -0400
Subject: -XX:+PrintClassHistogram Does a FullGC even with CMS enabled and -XX:+ExplicitGCInvokesConcurrent ?
In-Reply-To:
References: <533C4C3D.7000906@forgerock.com>
Message-ID: <533C5E3B.1030606@forgerock.com>

Hi Ramki!

Thanks for replying.

I would be OK with it if PrintClassHistogram did an "explicit" Full GC. It should probably put (System) into the log file though, to show that this is an explicit GC, and it should also be preventable with -XX:+DisableExplicitGC turned on in that case.

To me, it seems like a bug if you have -XX:+DisableExplicitGC turned on and +ExplicitGCInvokesConcurrent turned on, yet +PrintClassHistogram can still perform a Full STW GC. There should be a way to prevent any STW GCs from happening if you're specifying that you want CMS to do the work in the tenured space. There should also be a way (explicit GC) to force a Full STW GC when using CMS (that piece seems to work fine ;) ).

It's also hard to identify the cause of Full GCs in the log file, because jmap -histo:live can produce a Full GC, as can jmap -dump:live and a SIGQUIT with PrintClassHistogram turned on. As it stands right now, you just have to guess that something else caused the JVM to do a STW GC, because the log shows it's supposed to be using the CMS collector and the heap is not full (though maybe you're unlucky and you're taking a histogram while the heap is much closer to full?).

I've seen situations where customers have put in a cronjob to collect jmap -histo output every X minutes, and then "mysteriously" there are Full GCs in the CMS logs, which are very hard to trace back to their source and then stop from occurring...

-Matt

On 4/2/14, 2:41 PM, Srinivas Ramakrishna wrote:
> [...]
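Matt's attribution problem -- telling histogram-induced Full GCs apart from genuine ones -- can at least be narrowed down from the timestamps already in the log. A rough sketch, assuming -XX:+PrintGCDateStamps is enabled and a cron job like the one he describes:

  $ grep -F '[Full GC' gc.log     # list every Full GC with its timestamp
  $ crontab -l | grep -i jmap     # the schedule the histograms are taken on

Full GCs that land exactly on the cron minute boundaries are almost certainly the histogram collections rather than the collector's own work.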
From cri at itscope.de Wed Apr 9 11:26:30 2014
From: cri at itscope.de (Cornelius Riemenschneider)
Date: Wed, 9 Apr 2014 13:26:30 +0200
Subject: ridiculous ParNew pause times
Message-ID:

Hello,

we've got the following problem with the ParNew collector. Our log.gc usually looks like this:

2014-04-09T12:58:02.485+0200: 77357.712: [GC2014-04-09T12:58:02.485+0200: 77357.712: [ParNew: 2722925K->100795K(3145728K), 0.3202010 secs] 6998057K->4375934K(21495808K), 0.3205670 secs] [Times: user=4.10 sys=0.02, real=0.32 secs]
2014-04-09T12:58:06.256+0200: 77361.483: [GC2014-04-09T12:58:06.257+0200: 77361.483: [ParNew: 2722235K->101011K(3145728K), 0.3229910 secs] 6997374K->4376165K(21495808K), 0.3233580 secs] [Times: user=4.13 sys=0.02, real=0.33 secs]
2014-04-09T12:58:12.295+0200: 77367.522: [GC2014-04-09T12:58:12.296+0200: 77367.522: [ParNew: 2722451K->101057K(3145728K), 0.3215320 secs] 6997605K->4376216K(21495808K), 0.3219080 secs] [Times: user=4.12 sys=0.01, real=0.32 secs]
2014-04-09T12:58:18.461+0200: 77373.688: [GC2014-04-09T12:58:18.462+0200: 77373.688: [ParNew: 2722497K->2232K(3145728K), 0.2944540 secs] 6997656K->4376242K(21495808K), 0.2948280 secs] [Times: user=3.79 sys=0.00, real=0.29 secs]

But occasionally we have entries like these:

2014-04-09T09:56:12.840+0200: 66448.066: [GC2014-04-09T09:56:38.154+0200: 66473.381: [ParNew: 3139808K->524288K(3145728K), 0.8355990 secs] 6845512K->4585563K(21495808K), 26.1502640 secs] [Times: user=5.59 sys=0.16, real=26.16 secs]
2014-04-09T09:57:09.173+0200: 66504.400: [GC2014-04-09T09:57:24.871+0200: 66520.098: [ParNew: 2950573K->488555K(3145728K), 0.1876660 secs] 8701082K->6239064K(21495808K), 15.8858250 secs] [Times: user=2.38 sys=0.00, real=15.89 secs]
2014-04-09T12:58:34.661+0200: 77389.888: [GC2014-04-09T12:59:06.390+0200: 77421.616: [ParNew: 2623439K->2083K(3145728K), 0.0292270 secs] 6997709K->4376490K(21495808K), 31.7578950 secs] [Times: user=0.34 sys=0.02, real=31.76 secs]

which I can't explain at all. The real time of 31.76 sec equals a pause of 31.76 secs, in which the JVM does not respond to user requests, which is obviously bad. The application is _very_ allocation heavy, so generally pauses of 0.3 sec are okay.

Our GC settings for this server are:

-Xms21g
-Xmx21g
-XX:ReservedCodeCacheSize=256m
-XX:PermSize=256m
-XX:MaxPermSize=768m
-server
-verbose:gc
-Xloggc:log.gc
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-XX:+ExplicitGCInvokesConcurrent
-XX:NewRatio=5
-XX:SurvivorRatio=5
-XX:+UseConcMarkSweepGC
-XX:+UseParNewGC
-XX:+UseCMSInitiatingOccupancyOnly
-XX:CMSInitiatingOccupancyFraction=40
-XX:+CMSClassUnloadingEnabled
-XX:+CMSScavengeBeforeRemark
-Dsun.rmi.dgc.client.gcInterval=1209600000
-Dsun.rmi.dgc.server.gcInterval=1209600000

We run the Sun JDK 7u51 on a current Debian wheezy. We previously had issues with long ParNew pauses, but back then the sys time was high, so we concluded that the server was swapping, which we were able to prevent.

Do you have any idea or further hint at debugging options which might help us in finding the issue?

Regards,
Cornelius Riemenschneider
--
ITscope GmbH
Ludwig-Erhard-Allee 20
76131 Karlsruhe
Email: cornelius.riemenschneider at itscope.de
https://www.itscope.com
Handelsregister: AG Mannheim, HRB 232782
Sitz der Gesellschaft: Karlsruhe
Geschäftsführer: Alexander Münkel, Benjamin Mund, Stefan Reger
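Two standard HotSpot diagnostics that usually help with unexplained pauses like these (both exist in 7u51; the exact output format varies by version):

  -XX:+PrintGCApplicationStoppedTime
  -XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1

The first prints the total time threads were stopped for every safepoint, not just GCs; the second attributes each safepoint to its VM operation and shows the time-to-safepoint component, which makes it visible when the pause is spent before the collection even starts.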
From cri at itscope.de Wed Apr 9 11:56:26 2014
From: cri at itscope.de (Cornelius Riemenschneider)
Date: Wed, 9 Apr 2014 13:56:26 +0200
Subject: Why G1 doesn't cut it for our application
Message-ID:

Hello,

after recently switching to the latest Java 7 (u51), I was eager to try out G1. I used mainly http://www.slideshare.net/MonicaBeckwith/garbage-first-garbage-collector-g1-gc-migration-to-expectations-and-advanced-tuning for tuning, but I hit a roadblock which makes it impossible for us to use G1.

Our allocation pattern sometimes includes huge objects, sometimes in the range of ~120MB, sometimes ~600MB, but I've seen about 1.2GB as well. This is obviously unfriendly to the GC. Our tuned CMS mostly handles this, but sometimes we hit problems, so we had high expectations for G1. G1, in our case, triggers FullGC way more often than CMS, even when the heap is mostly empty.

An example log excerpt for this:

399934.892: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: humongous allocation request failed, allocation request: 117245600 bytes]
399934.892: [G1Ergonomics (Heap Sizing) expand the heap, requested expansion amount: 83886080 bytes, attempted expansion amount: 83886080 bytes]
399934.892: [G1Ergonomics (Heap Sizing) did not expand the heap, reason: heap expansion operation failed]
399934.893: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: humongous allocation request failed, allocation request: 117245600 bytes]
399934.893: [G1Ergonomics (Heap Sizing) expand the heap, requested expansion amount: 83886080 bytes, attempted expansion amount: 83886080 bytes]
399934.893: [G1Ergonomics (Heap Sizing) did not expand the heap, reason: heap expansion operation failed]
399934.893: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: allocation request failed, allocation request: 117245600 bytes]
399934.893: [G1Ergonomics (Heap Sizing) expand the heap, requested expansion amount: 117245600 bytes, attempted expansion amount: 117440512 bytes]
399934.893: [G1Ergonomics (Heap Sizing) did not expand the heap, reason: heap expansion operation failed]
2014-04-09T12:12:49.602+0200: 399934.894: [Full GC 11G->8118M(20G), 20.8752850 secs]
   [Eden: 8192.0K(1016.0M)->0.0B(2728.0M) Survivors: 96.0M->0.0B Heap: 11.6G(20.0G)->8118.8M(20.0G)]
 [Times: user=37.77 sys=0.00, real=20.88 secs]
2014-04-09T12:13:10.479+0200: 399955.770: [GC concurrent-mark-abort]

We have a total of 20G for the heap available, and try to allocate objects in the 120MB range. 9 GB of the heap are free, so these should fit in without problems, and even in Eden there is a lot of free space. The attempted heap expansion fails because we use

-Xms20g
-Xmx20g

which is the maximum the server is able to handle. Still, G1 gets us a FullGC here. This FullGC may be faster than a CMS FullGC, but these happen way too often to be tolerated, especially as this server is responsible for a web application with which users directly interact -- 20 sec pauses after clicking are simply not tolerable.

Besides using CMS, or not doing large allocations (which is sometimes impossible, given that we deal with a lot of data), do you have other ideas? Is it known that an allocation pattern with a lot of huge objects breaks G1? The above linked presentation suggests increasing the G1 region size when humongous allocation requests are encountered, so that these allocations go in eden, but we cannot increase the region size beyond 32M, so this fix doesn't work for us.

Regards,
Cornelius Riemenschneider
--
ITscope GmbH
Ludwig-Erhard-Allee 20
76131 Karlsruhe
Email: cornelius.riemenschneider at itscope.de
https://www.itscope.com
Handelsregister: AG Mannheim, HRB 232782
Sitz der Gesellschaft: Karlsruhe
Geschäftsführer: Alexander Münkel, Benjamin Mund, Stefan Reger
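For readers wondering how a 117,245,600-byte request can fail on a mostly empty heap: in G1, any allocation larger than half a region takes the humongous path and must be placed in contiguous free regions -- with the 32M maximum region size, this request needs ceil(117245600 / 33554432) = 4 regions in a row, and a fragmented free list may not supply them. A minimal sketch that provokes the pattern (a hypothetical standalone class; the sizes are borrowed from the log above, everything else is made up):

  // HumongousDemo.java -- run with, e.g.:
  //   java -Xms2g -Xmx2g -XX:+UseG1GC -XX:G1HeapRegionSize=32m \
  //        -XX:+PrintAdaptiveSizePolicy -verbose:gc HumongousDemo
  import java.util.ArrayList;
  import java.util.List;

  public class HumongousDemo {
      public static void main(String[] args) {
          List<byte[]> live = new ArrayList<>();
          for (int i = 0; i < 100_000; i++) {
              // Ordinary 1MB allocations churn and fragment the region list...
              live.add(new byte[1 << 20]);
              if (live.size() > 512) live.subList(0, 256).clear();
              // ...while a humongous request (> half a region; here needing
              // 4 contiguous 32M regions) must be satisfied now and then.
              if (i % 200 == 0) {
                  byte[] big = new byte[117_245_600];
                  big[0] = 1; // keep it briefly live
              }
          }
      }
  }

-XX:+PrintAdaptiveSizePolicy is what produces the "G1Ergonomics" lines quoted above, so the demo's log can be compared with the excerpt directly.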
From vitalyd at gmail.com Wed Apr 9 12:49:10 2014
From: vitalyd at gmail.com (Vitaly Davidovich)
Date: Wed, 9 Apr 2014 08:49:10 -0400
Subject: ridiculous ParNew pause times
In-Reply-To:
References:
Message-ID:

This is typically caused by one of:

1) heavy swapping (this won't count towards sys time because the kernel is not using cpu while waiting for I/O to complete)
2) an oversubscribed machine where gc threads aren't getting enough cpu time

Have you looked at stats on the machine when these pauses occur, specifically around swap activity? Is your machine running multiple JVMs or any other noisy neighbor apps?

Sent from my phone

On Apr 9, 2014 7:26 AM, "Cornelius Riemenschneider" wrote:
> [...]
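To check Vitaly's two suspects while a pause is actually in progress (standard procps/sysstat tools; a sketch, substitute the real pid):

  $ vmstat 1              # si/so columns show swap-in/swap-out per second
  $ sar -B 1              # paging and fault rates (sysstat package)
  $ top -H -p <jvm-pid>   # per-thread view: are the GC threads getting cpu?

High si/so during the long ParNew would confirm swapping; idle GC threads in top would point at cpu oversubscription instead.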
> > An example log excerpt for this: > > 399934.892: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: > humongous allocation request failed, allocation request: 117245600 bytes] > > 399934.892: [G1Ergonomics (Heap Sizing) expand the heap, requested expansion > amount: 83886080 bytes, attempted expansion amount: 83886080 bytes] > > 399934.892: [G1Ergonomics (Heap Sizing) did not expand the heap, reason: heap > expansion operation failed] > > 399934.893: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: > humongous allocation request failed, allocation request: 117245600 bytes] > > 399934.893: [G1Ergonomics (Heap Sizing) expand the heap, requested expansion > amount: 83886080 bytes, attempted expansion amount: 83886080 bytes] > > 399934.893: [G1Ergonomics (Heap Sizing) did not expand the heap, reason: heap > expansion operation failed] > > 399934.893: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: > allocation request failed, allocation request: 117245600 bytes] > > 399934.893: [G1Ergonomics (Heap Sizing) expand the heap, requested expansion > amount: 117245600 bytes, attempted expansion amount: 117440512 bytes] > > 399934.893: [G1Ergonomics (Heap Sizing) did not expand the heap, reason: heap > expansion operation failed] > > 2014-04-09T12:12:49.602+0200: 399934.894: [Full GC 11G->8118M(20G), 20.8752850 secs] > > [Eden: 8192.0K(1016.0M)->0.0B(2728.0M) Survivors: 96.0M->0.0B Heap: > 11.6G(20.0G)->8118.8M(20.0G)] > > [Times: user=37.77 sys=0.00, real=20.88 secs] > > 2014-04-09T12:13:10.479+0200: 399955.770: [GC concurrent-mark-abort] > > We have a total of 20G for the heap available, and try to allocate objects in > the 120MB range. > > 9 GB of the heap are free, so these should fit in without problems, even in Eden > is a lot of free space. > > The attempted heap expansion fails, because we use > > -Xms20g > > -Xmx20g > > which is the maximum the server is able to handle. > > Still, G1 gets us a FullGC here. This FullGC may be faster than a CMS FullGC, > but these happen way too often to be tolerated, especially as this server is > responsible for a web > > application with which users directly interact ? 20 secs pause after clicking > are simply not tolerable. > > Besides using CMS, or not doing large allocations (which is sometimes > impossible, given that we deal with a lot of data), > > do you have oher ideas? > > Is it known that an allocation pattern with a lot of huge objects breaks G1? > > The above linked presentation suggests to increase the G1 region size when > humongous allocation requests are encountered, so these allocation go in eden, > but we can > > not increase the region size beyond 32M, so this fix doesn?t work for us. 
From cri at itscope.de Wed Apr 9 14:38:12 2014
From: cri at itscope.de (Cornelius Riemenschneider)
Date: Wed, 9 Apr 2014 16:38:12 +0200
Subject: AW: Why G1 doesn't cut it for our application
In-Reply-To: <53455A30.3040905@oracle.com>
References:
Message-ID:

Hello,

thanks for your answer. Unfortunately, building from source is not on the table for our production systems. Will these changes only go into Java 8, or will we see them in 7u60 or later versions of Java 7?

Regards,
Cornelius Riemenschneider
--
ITscope GmbH
Ludwig-Erhard-Allee 20
76131 Karlsruhe
Email: cornelius.riemenschneider at itscope.de
https://www.itscope.com
Handelsregister: AG Mannheim, HRB 232782
Sitz der Gesellschaft: Karlsruhe
Geschäftsführer: Alexander Münkel, Benjamin Mund, Stefan Reger

-----Original Message-----
From: Jesper Wilhelmsson [mailto:jesper.wilhelmsson at oracle.com]
Sent: Wednesday, 9 April 2014 16:33
To: Cornelius Riemenschneider; hotspot-gc-use at openjdk.java.net
Subject: Re: Why G1 doesn't cut it for our application

[...]
From cri at itscope.de Wed Apr 9 14:41:36 2014
From: cri at itscope.de (Cornelius Riemenschneider)
Date: Wed, 9 Apr 2014 16:41:36 +0200
Subject: AW: ridiculous ParNew pause times
In-Reply-To:
References:
Message-ID:

Hey,

thanks for the hints :-)

The server runs one JVM and one redis instance (an event-based, single-threaded in-memory NoSQL datastore). Redis stores about 5G of data, which are written to disk from time to time -- and it now turns out that the redis saves align perfectly with our long ParNew times. By initiating a redis save I was able to trigger these garbage collections:

2014-04-09T15:06:07.892+0200: 85043.119: [GC2014-04-09T15:06:18.089+0200: 85053.315: [ParNew: 2964426K->338263K(3145728K), 0.1532950 secs] 7167668K->4541505K(21495808K), 10.3502410 secs] [Times: user=1.93 sys=0.01, real=10.35 secs]
2014-04-09T15:06:39.203+0200: 85074.429: [GC2014-04-09T15:07:11.026+0200: 85106.252: [ParNew: 3145728K->405516K(3145728K), 0.3429470 secs] 7645283K->5212053K(21495808K), 32.1663210 secs] [Times: user=2.96 sys=0.01, real=32.17 secs]

I monitored the system during the GCs and swapping definitely does not happen. Because redis is single-threaded (and only used by the JVM, which is in a STW phase during ParNew), the only things active during ParNew are redis, trying to write 5G of memory to disk as fast as possible, and the GC. I wasn't able to pinpoint the issue yet; do you have an idea why the JVM could block on I/O during a GC? Disk access is probably way slower than usual during these phases, as everything is on the / partition, a RAID 1 disk array. Or does anyone know the right options for the perf tool to monitor blocking I/O during garbage collections?

Regards,
Cornelius Riemenschneider
--
ITscope GmbH
Ludwig-Erhard-Allee 20
76131 Karlsruhe
Email: cornelius.riemenschneider at itscope.de
https://www.itscope.com
Handelsregister: AG Mannheim, HRB 232782
Sitz der Gesellschaft: Karlsruhe
Geschäftsführer: Alexander Münkel, Benjamin Mund, Stefan Reger

-----Original Message-----
From: Vitaly Davidovich [mailto:vitalyd at gmail.com]
Sent: Wednesday, 9 April 2014 14:49
To: Cornelius Riemenschneider
Cc: hotspot-gc-use
Subject: Re: ridiculous ParNew pause times

[...]
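The correlation is also easy to provoke on demand instead of waiting for the scheduled save (standard redis-cli commands; a sketch):

  $ redis-cli lastsave      # unix time of the last completed dump
  $ redis-cli bgsave        # fork a background save now...
  $ tail -f log.gc          # ...and watch for a long ParNew in parallel
  $ redis-cli info persistence | grep rdb_bgsave_in_progress

Note that BGSAVE works by fork(): the child shares the ~5G address space copy-on-write with the parent, so on top of the raw disk traffic there is extra page-copying and writeback pressure while the dump runs.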
From jesper.wilhelmsson at oracle.com Wed Apr 9 14:46:17 2014
From: jesper.wilhelmsson at oracle.com (Jesper Wilhelmsson)
Date: Wed, 09 Apr 2014 16:46:17 +0200
Subject: AW: Why G1 doesn't cut it for our application
In-Reply-To:
References:
Message-ID: <53455D39.2070706@oracle.com>

Hi,

The changes I refer to are available in the 8u20 repository. There are no present plans to backport them to 7. If you have a test setup with similar properties, a run with a recent G1 could tell us if the problem is fixed or if we need to work more on it.

/Jesper

Cornelius Riemenschneider skrev 9/4/14 16:38:
> [...]
From thomas.schatzl at oracle.com Wed Apr 9 14:47:18 2014
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Wed, 09 Apr 2014 16:47:18 +0200
Subject: Why G1 doesn't cut it for our application
In-Reply-To:
References:
Message-ID: <1397054838.2710.101.camel@cirrus>

Hi Cornelius,

On Wed, 2014-04-09 at 13:56 +0200, Cornelius Riemenschneider wrote:
> [...]
> Is it known that an allocation pattern with a lot of huge objects
> breaks G1?

Current releases with G1 all have problems with many large objects. The only workaround at this time I can think of, for the case when these large objects are rather short-lived, is to increase the frequency of the concurrent marking (decreasing InitiatingHeapOccupancyPercent to a value where marking runs more frequently) to reclaim them faster.

Beginning with 8u20, effort has been put in to decrease this problem, in particular for shorter-lived large objects. If the heap is relatively empty, as in your case, one change that sorts the free region list (https://bugs.openjdk.java.net/browse/JDK-8036025) tends to help a lot. This change has been pushed to the 8u20 repository already, and there may be a Java Early Access download for it already.

We have been working on a variety of other improvements in that area lately, like a method to reclaim short-lived large objects at every GC (https://bugs.openjdk.java.net/browse/JDK-8027959), or, in case of a dense heap, allocating objects in "tail regions" of large objects (https://bugs.openjdk.java.net/browse/JDK-8031381). There are some more ideas floating around.

> The above linked presentation suggests to increase the G1 region size
> when humongous allocation requests are encountered, so these
> allocations go in eden, but we can not increase the region size beyond
> 32M, so this fix doesn't work for us.

As mentioned, the only suggestion I can think of at this time is to decrease the InitiatingHeapOccupancyPercent appropriately so that the marking will more frequently try to reclaim these large objects, leading to more space available.

hth,
Thomas
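Spelled out as flags, Thomas's interim workaround is simply (a sketch; the right percentage is workload-dependent and has to be found by experiment):

  -XX:+UseG1GC -XX:G1HeapRegionSize=32m -XX:InitiatingHeapOccupancyPercent=25

A lower value starts concurrent marking cycles earlier, so dead humongous regions are reclaimed sooner, at the price of more frequent marking work.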
From cri at itscope.de Wed Apr 9 15:00:06 2014
From: cri at itscope.de (Cornelius Riemenschneider)
Date: Wed, 9 Apr 2014 17:00:06 +0200
Subject: Why G1 doesn't cut it for our application
In-Reply-To: <1397054838.2710.101.camel@cirrus>
References:
Message-ID:

Hi,

The server having problems with the 120MB allocations has InitiatingHeapOccupancyPercent=45; the server with the bigger allocations (600MB, 1.2GB) had InitiatingHeapOccupancyPercent=0, but it allocates objects quickly, so that didn't really help.

Another problem is that our objects are at least sometimes not short-lived: eden is collected every few seconds, but our objects may live for 20-30 sec, because we load a lot of data from MySQL, process it, munge it into a customer-specified format, and then write out a ~100MB zip file containing text files. The raw data for the text files is obviously stored in the heap, per file, and gets quite large.

If I still encounter problems with 8u20 or later, I'll write again, but I can't promise I'll get to it soon.

Regards,
Cornelius Riemenschneider
--
ITscope GmbH
Ludwig-Erhard-Allee 20
76131 Karlsruhe
Email: cornelius.riemenschneider at itscope.de
https://www.itscope.com
Handelsregister: AG Mannheim, HRB 232782
Sitz der Gesellschaft: Karlsruhe
Geschäftsführer: Alexander Münkel, Benjamin Mund, Stefan Reger

-----Original Message-----
From: Thomas Schatzl [mailto:thomas.schatzl at oracle.com]
Sent: Wednesday, 9 April 2014 16:47
To: Cornelius Riemenschneider
Cc: hotspot-gc-use at openjdk.java.net
Subject: Re: Why G1 doesn't cut it for our application

[...]
From vitalyd at gmail.com Wed Apr 9 15:06:27 2014
From: vitalyd at gmail.com (Vitaly Davidovich)
Date: Wed, 9 Apr 2014 11:06:27 -0400
Subject: AW: ridiculous ParNew pause times
In-Reply-To:
References:
Message-ID:
> > 2014-04-09T15:06:07.892+0200: 85043.119: [GC2014-04-09T15:06:18.089+0200: > 85053.315: [ParNew: 2964426K->338263K(3145728K), 0.1532950 secs] > 7167668K->4541505K(21495808K), 10.3502410 secs] [Times: user=1.93 sys=0.01, > real=10.35 secs] > > 2014-04-09T15:06:39.203+0200: 85074.429: [GC2014-04-09T15:07:11.026+0200: > 85106.252: [ParNew: 3145728K->405516K(3145728K), 0.3429470 secs] > 7645283K->5212053K(21495808K), 32.1663210 secs] [Times: user=2.96 sys=0.01, > real=32.17 secs] > > I monitored the system during the GCs and swapping definitly does not > happen. > > Because redis is single-threaded (and only used by the JVM which is during > ParNew in a STW phase), during ParNew there is only redis active, trying to > write 5G of memory to disk as fast as possible and the GC. > > I wasn't able to pinpoint the issue yet, do you have an idea why the jvm > could block on I/O during a GC? > > Disk access is during these phases probably way slower than usual, as > everything is on the / partition, a RAID 1 disk array. > > Or does anyone know the right perf options for the perf tool to monitor > blocking i/o during garbage collections? > > > > Regards, > > Cornelius Riemenschneider > > -- > > ITscope GmbH > > Ludwig-Erhard-Allee 20 > > 76131 Karlsruhe > > Email: cornelius.riemenschneider at itscope.de > > https://www.itscope.com > > Handelsregister: AG Mannheim, HRB 232782 > > Sitz der Gesellschaft: Karlsruhe > > Gesch?ftsf?hrer: Alexander M?nkel, Benjamin Mund, Stefan Reger > > > > *Von:* Vitaly Davidovich [mailto:vitalyd at gmail.com] > *Gesendet:* Mittwoch, 9. April 2014 14:49 > *An:* Cornelius Riemenschneider > *Cc:* hotspot-gc-use > *Betreff:* Re: ridiculous ParNew pause times > > > > This is typically caused by one of: > > 1) heavy swapping (this won't count towards sys time because kernel is not > using cpu while waiting for I/O to complete) > 2) oversubscribed machine where gc threads aren't getting enough cpu time > due > > Have you looked at stats on the machine when these pauses occur, > specifically around swap activity? Is your machine running multiple JVMs or > any other noisy neighbor apps? 
> > Sent from my phone > > On Apr 9, 2014 7:26 AM, "Cornelius Riemenschneider" > wrote: > > Hello, > > we've got the following problem with the ParNew collector: > > Our log.gc usually looks like this: > > 2014-04-09T12:58:02.485+0200: 77357.712: [GC2014-04-09T12:58:02.485+0200: > 77357.712: [ParNew: 2722925K->100795K(3145728K), 0.3202010 secs] > 6998057K->4375934K(21495808K), 0.3205670 secs] [Times: user=4.10 sys=0.02, > real=0.32 secs] > > 2014-04-09T12:58:06.256+0200: 77361.483: [GC2014-04-09T12:58:06.257+0200: > 77361.483: [ParNew: 2722235K->101011K(3145728K), 0.3229910 secs] > 6997374K->4376165K(21495808K), 0.3233580 secs] [Times: user=4.13 sys=0.02, > real=0.33 secs] > > 2014-04-09T12:58:12.295+0200: 77367.522: [GC2014-04-09T12:58:12.296+0200: > 77367.522: [ParNew: 2722451K->101057K(3145728K), 0.3215320 secs] > 6997605K->4376216K(21495808K), 0.3219080 secs] [Times: user=4.12 sys=0.01, > real=0.32 secs] > > 2014-04-09T12:58:18.461+0200: 77373.688: [GC2014-04-09T12:58:18.462+0200: > 77373.688: [ParNew: 2722497K->2232K(3145728K), 0.2944540 secs] > 6997656K->4376242K(21495808K), 0.2948280 secs] [Times: user=3.79 sys=0.00, > real=0.29 secs] > > But occasionally we have entries like these: > > 2014-04-09T09:56:12.840+0200: 66448.066: [GC2014-04-09T09:56:38.154+0200: > 66473.381: [ParNew: 3139808K->524288K(3145728K), 0.8355990 secs] > 6845512K->4585563K(21495808K), 26.1502640 secs] [Times: user=5.59 sys=0.16, > real=26.16 secs] > > 2014-04-09T09:57:09.173+0200: 66504.400: [GC2014-04-09T09:57:24.871+0200: > 66520.098: [ParNew: 2950573K->488555K(3145728K), 0.1876660 secs] > 8701082K->6239064K(21495808K), 15.8858250 secs] [Times: user=2.38 sys=0.00, > real=15.89 secs] > > 2014-04-09T12:58:34.661+0200: 77389.888: [GC2014-04-09T12:59:06.390+0200: > 77421.616: [ParNew: 2623439K->2083K(3145728K), 0.0292270 secs] > 6997709K->4376490K(21495808K), 31.7578950 secs] [Times: user=0.34 sys=0.02, > real=31.76 secs] > > which I can't explain at all. > > The real time of 31.76sec equals a pause of 31.76secs, in which the jvm > does not respond to user requests, which is obviously bad. > > The application is _*very*_ allocation heavy, so generally pauses of > 0.3sec are okay. > > Our GC settings for this server are: > > -Xms21g > > -Xmx21g > > -XX:ReservedCodeCacheSize=256m > > -XX:PermSize=256m > > -XX:MaxPermSize=768m > > -server > > -verbose:gc > > -Xloggc:log.gc > > -XX:+PrintGCDetails > > -XX:+PrintGCDateStamps > > -XX:+ExplicitGCInvokesConcurrent > > -XX:NewRatio=5 > > -XX:SurvivorRatio=5 > > -XX:+UseConcMarkSweepGC > > -XX:+UseParNewGC > > -XX:+UseCMSInitiatingOccupancyOnly > > -XX:CMSInitiatingOccupancyFraction=40 > > -XX:+CMSClassUnloadingEnabled > > -XX:+CMSScavengeBeforeRemark > > -Dsun.rmi.dgc.client.gcInterval=1209600000 > > -Dsun.rmi.dgc.server.gcInterval=1209600000 > > > > We run the sun jdk 7u51 on a current debian wheezy. > > We previously had issues with long ParNew pauses, but back then, the sys > time was high, so we concluded that the server was swapping, > > which we were able to prevent. > > Do you have any idea or further hint at debugging options which might help > us in finding the issue? 
From vitalyd at gmail.com Wed Apr 9 15:10:49 2014
From: vitalyd at gmail.com (Vitaly Davidovich)
Date: Wed, 9 Apr 2014 11:10:49 -0400
Subject: Why G1 doesn't cut it for our application
In-Reply-To: References: <1397054838.2710.101.camel@cirrus> Message-ID:

If you're storing entire file content in memory for large files, I think GC will very frequently have problems. Is there a way for you to make this streaming? I know that's not as simple as tuning GC ... or, maybe it will be simpler after all :)

Sent from my phone

On Apr 9, 2014 11:00 AM, "Cornelius Riemenschneider" wrote:
> Hi,
> The server having problems with the 120MB allocation has InitiatingHeapOccupancyPercent=45; the server with the bigger allocations (600MB, 1.2GB) had InitiatingHeapOccupancyPercent=0, but it allocates objects quickly, so it didn't really help.
> Another problem is that our objects are at least sometimes not short-lived - eden is collected every few seconds, but our objects may live for 20-30 sec, because we load a lot of data from mysql, process the data, munge it into a customer-specified format, and then write out a ~100MB zip file containing text files.
> The raw data for the text files are obviously stored in the heap, per file, and get quite large.
> If I still encounter problems with 8u20 or later, I'll write again, but I can't promise I'll get to it soon.
>
> Regards,
> Cornelius Riemenschneider
> --
> ITscope GmbH
> Ludwig-Erhard-Allee 20
> 76131 Karlsruhe
> Email: cornelius.riemenschneider at itscope.de
> https://www.itscope.com
> Handelsregister: AG Mannheim, HRB 232782
> Sitz der Gesellschaft: Karlsruhe
> Geschäftsführer: Alexander Münkel, Benjamin Mund, Stefan Reger
>
> -----Ursprüngliche Nachricht-----
> Von: Thomas Schatzl [mailto:thomas.schatzl at oracle.com]
> Gesendet: Mittwoch, 9. April 2014 16:47
> An: Cornelius Riemenschneider
> Cc: hotspot-gc-use at openjdk.java.net
> Betreff: Re: Why G1 doesn't cut it for our application
>
> Hi Cornelius,
>
> On Wed, 2014-04-09 at 13:56 +0200, Cornelius Riemenschneider wrote:
> > Hello,
> >
> > after recently switching to the latest java 7 (u51), I was eager to try out G1.
> >
> > I used mainly http://www.slideshare.net/MonicaBeckwith/garbage-first-garbage-collector-g1-gc-migration-to-expectations-and-advanced-tuning for tuning, but I hit a roadblock which makes it impossible for us to use G1.
> >
> > Our allocation pattern sometimes includes huge objects, sometimes in the range of ~120MB, sometimes ~600MB, but I've seen about 1.2GB as well. This is obviously unfriendly to the GC.
> >
> > Our tuned CMS mostly handles this, but sometimes we hit problems, so we had high expectations for G1.
> >
> > G1, in our case, triggers FullGC way more often than CMS, even when the heap is mostly empty.
> >
> >[...]
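A bit of arithmetic may help frame the answer that follows (illustrative, using the 32m region-size ceiling mentioned later in the thread): G1 treats any allocation larger than half a region as humongous, so with 32 MB regions the threshold is 16 MB, and a humongous object must be placed in a run of contiguous free regions. The ~120MB allocations therefore need ceil(117245600 / 33554432) = 4 adjacent free regions, and a 1.2 GB object roughly 39 of them. Once the old generation is fragmented, such runs can be unavailable even with 9 GB nominally free, which is exactly the situation where G1 falls back to a full GC.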
> > We have a total of 20G for the heap available, and try to allocate objects in the 120MB range.
> >
> > 9 GB of the heap are free, so these should fit in without problems; even in eden there is a lot of free space.
> >
> > Still, G1 gets us a FullGC here. This FullGC may be faster than a CMS FullGC, but these happen way too often to be tolerated, especially as this server is responsible for a web application with which users directly interact - a 20 sec pause after clicking is simply not tolerable.
> >
> > Besides using CMS, or not doing large allocations (which is sometimes impossible, given that we deal with a lot of data), do you have other ideas?
> >
> > Is it known that an allocation pattern with a lot of huge objects breaks G1?
>
> Current releases with G1 all have problems with many large objects.
>
> The only workaround at this time I can think of, for the case when these large objects are rather short-lived, is to increase the frequency of the concurrent marking (decreasing InitiatingHeapOccupancyPercent to a value where marking is running more frequently) to reclaim them faster.
>
> Beginning with 8u20, effort has been put in to decrease this problem, in particular for shorter-lived large objects.
>
> If the heap is relatively empty, as in your case, one change that sorts the free region list (https://bugs.openjdk.java.net/browse/JDK-8036025) tends to help a lot. This change has been pushed to the 8u20 repository already, and there may be a Java Early Access download for it already.
>
> We have been working on a variety of other improvements in that area lately, like a method to reclaim short-lived large objects at every GC (https://bugs.openjdk.java.net/browse/JDK-8027959), or, in case of a dense heap, allocating objects in "tail regions" of large objects (https://bugs.openjdk.java.net/browse/JDK-8031381).
>
> There are some more ideas floating around.
>
> > The above linked presentation suggests to increase the G1 region size when humongous allocation requests are encountered, so these allocations go in eden, but we can not increase the region size beyond 32M, so this fix doesn't work for us.
>
> As mentioned, the only suggestion I can think of at this time is to decrease the InitiatingHeapOccupancyPercent appropriately so that the marking will more frequently try to reclaim these large objects, leading to more space available.
>
> hth,
>   Thomas
>
> _______________________________________________
> hotspot-gc-use mailing list
> hotspot-gc-use at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
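Thomas's workaround amounts to a one-flag change. A sketch of what that might look like on the command line (the value 25 is an arbitrary example, not a number recommended in this thread; the right setting depends on how fast the old generation fills):

  java -XX:+UseG1GC -XX:G1HeapRegionSize=32m -XX:InitiatingHeapOccupancyPercent=25 ...

A lower value starts the concurrent marking cycle earlier and more often, so the humongous regions get reclaimed sooner and longer runs of free regions stay available.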
From wolfgang.pedot at finkzeit.at Wed Apr 9 15:29:21 2014
From: wolfgang.pedot at finkzeit.at (Wolfgang Pedot)
Date: Wed, 09 Apr 2014 17:29:21 +0200
Subject: ridiculous ParNew pause times
In-Reply-To: References: Message-ID: <53456751.40408@finkzeit.at>

A crazy idea: could the gc-logging be the problem here because the redis-dump is using up all the IO bandwidth?

Wolfgang

Am 09.04.2014 17:06, schrieb Vitaly Davidovich:
> Interesting. How much RAM does this machine have? You're using about 21-22g for jvm and another 5g for redis, right? Is there room to spare in physical memory?
>
> If there's no swap activity, then I don't think there's any I/O interference. I'm guessing that redis writing out 5g of memory is blowing the cpu caches for the jvm, and causing page faults. Although if there's no swap activity, it seems these faults would be soft, and it's hard to explain how that would balloon to 30 sec pauses and not show up in sys time then.
>
> Are there lots of context switches in the java process when this happens?
>
> Sent from my phone
>
> On Apr 9, 2014 10:41 AM, "Cornelius Riemenschneider" wrote:
> > Hey,
> >
> > thanks for the hints :-)
> >
> > The server runs one JVM and one redis instance (an event-based, single-threaded in-memory NoSQL datastore).
> >
> > Redis stores about 5G of data, which are written to disk from time to time - it now turns out that the redis saves align perfectly with our long ParNew times.
> >
> > By initiating a redis save I was able to get these garbage collections.
> >
> > [...]

_______________________________________________
hotspot-gc-use mailing list
hotspot-gc-use at openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use

From vitalyd at gmail.com Wed Apr 9 15:40:03 2014
From: vitalyd at gmail.com (Vitaly Davidovich)
Date: Wed, 9 Apr 2014 11:40:03 -0400
Subject: ridiculous ParNew pause times
In-Reply-To: <53456751.40408@finkzeit.at> References: <53456751.40408@finkzeit.at> Message-ID:

That's an interesting theory. Does the jvm fsync log writes? If it doesn't, then I'd expect the write syscalls to not actually block. Perhaps there's some contention in the kernel though?

Cornelius, maybe you can strace the jvm and see what syscalls are made in this scenario?

Sent from my phone

On Apr 9, 2014 11:29 AM, "Wolfgang Pedot" wrote:
> A crazy idea: could the gc-logging be the problem here because the redis-dump is using up all the IO bandwidth?
>
> Wolfgang
>
> Am 09.04.2014 17:06, schrieb Vitaly Davidovich:
>> [...]
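A concrete way to follow up on the strace suggestion (illustrative invocation; <jvm-pid> is a placeholder):

  $ strace -f -T -e trace=write,fsync -o jvm-syscalls.txt -p <jvm-pid>

-f follows all JVM threads, -T appends the time spent in each syscall, and the trace filter keeps the output down to the calls that matter here. A write(2) on the gc log file descriptor taking tens of seconds would then show up directly in the -T timings.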
>> [...]

_______________________________________________
hotspot-gc-use mailing list
hotspot-gc-use at openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use

From gustav.r.akesson at gmail.com Wed Apr 9 15:48:16 2014
From: gustav.r.akesson at gmail.com (Gustav Åkesson)
Date: Wed, 9 Apr 2014 17:48:16 +0200
Subject: ridiculous ParNew pause times
In-Reply-To: References: <53456751.40408@finkzeit.at> Message-ID:

Hi,

I think that is an idea worth investigating. I've had many cases where GC logging was causing long YGCs when the disk is heavily loaded. Try to log to a file on a RAM drive, if you have the possibility.

Best Regards,
Gustav Åkesson

Den 9 apr 2014 17:40 skrev "Vitaly Davidovich" :
> That's an interesting theory. Does the jvm fsync log writes? If it doesn't, then I'd expect the write syscalls to not actually block. Perhaps there's some contention in the kernel though?
>
> Cornelius, maybe you can strace the jvm and see what syscalls are made in this scenario?
>
> [...]
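Gustav's RAM-drive suggestion takes only a few shell commands (illustrative; any tmpfs mount point works, and /dev/shm usually exists already on Linux):

  $ sudo mkdir -p /mnt/gclog
  $ sudo mount -t tmpfs -o size=64m tmpfs /mnt/gclog

then restart the JVM with -Xloggc:/mnt/gclog/log.gc, so the synchronous log writes hit RAM instead of the contended disk. Note that a tmpfs vanishes on reboot, so the log should be copied off periodically.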
>>> [...]

_______________________________________________
hotspot-gc-use mailing list
hotspot-gc-use at openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use

From cri at itscope.de Wed Apr 9 16:40:10 2014
From: cri at itscope.de (Cornelius Riemenschneider)
Date: Wed, 9 Apr 2014 18:40:10 +0200
Subject: AW: ridiculous ParNew pause times
In-Reply-To: <53456751.40408@finkzeit.at> References: Message-ID:

Bingo!
First, I tried to get a tool which shows me which process writes to which file and how long that takes, but I was unable to find one I could master.
Based on your suggestion I moved the log.gc file to a ramdisk and performed extensive load testing - now my biggest outlier is

2014-04-09T18:29:13.873+0200: 383.372: [GC2014-04-09T18:29:13.874+0200: 383.372: [ParNew: 431041K->208698K(3145728K), 1.8312460 secs] 11055599K->11203421K(21495808K), 1.8315130 secs] [Times: user=2.61 sys=0.03, real=1.83 secs]

which is okay.
When moving log.gc back to the harddisk to which redis saves, I get pauses as long as 45 sec.

So I agree with Vitaly that redis probably thrashes some caches (though not all, as it only occupies one cpu core of a dual-socket server), which would explain a slowdown of garbage collections, but the real cause of the long spikes is actually writing the log to disk.

Could you guys check whether you write the gc log synchronously?
If yes, I'd suggest (a) moving the writing of the log out of the STW phase and (b) using asynchronous I/O for writing the log.

Thanks for all your suggestions and your help!

Regards,
Cornelius Riemenschneider
--
ITscope GmbH
Ludwig-Erhard-Allee 20
76131 Karlsruhe
Email: cornelius.riemenschneider at itscope.de
https://www.itscope.com
Handelsregister: AG Mannheim, HRB 232782
Sitz der Gesellschaft: Karlsruhe
Geschäftsführer: Alexander Münkel, Benjamin Mund, Stefan Reger

-----Ursprüngliche Nachricht-----
Von: Wolfgang Pedot [mailto:wolfgang.pedot at finkzeit.at]
Gesendet: Mittwoch, 9. April 2014 17:29
An: Vitaly Davidovich
Cc: Cornelius Riemenschneider; hotspot-gc-use
Betreff: Re: ridiculous ParNew pause times

A crazy idea: could the gc-logging be the problem here because the redis-dump is using up all the IO bandwidth?

Wolfgang

Am 09.04.2014 17:06, schrieb Vitaly Davidovich:
> [...]
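Suggestion (b) can be pictured with a small, purely illustrative Java sketch (HotSpot's GC logging is native code, and the class below is made up for the purpose): the thread that is inside the pause only appends to an in-memory queue, and a daemon thread performs the blocking writes.

  import java.io.FileWriter;
  import java.io.IOException;
  import java.util.concurrent.BlockingQueue;
  import java.util.concurrent.LinkedBlockingQueue;

  class AsyncGcLogWriter {
      private final BlockingQueue<String> queue = new LinkedBlockingQueue<String>(8192);

      AsyncGcLogWriter(String path) throws IOException {
          final FileWriter out = new FileWriter(path, true);
          Thread writer = new Thread(new Runnable() {
              public void run() {
                  try {
                      while (true) {
                          out.write(queue.take());  // the blocking write happens here, off the pause path
                          out.flush();
                      }
                  } catch (InterruptedException e) {
                      // shutting down
                  } catch (IOException e) {
                      // give up on logging rather than stall
                  }
              }
          }, "gc-log-writer");
          writer.setDaemon(true);
          writer.start();
      }

      // Called from the "pause" side: enqueue only, never touch the disk;
      // if the queue is full, the line is dropped instead of blocking.
      void log(String line) {
          queue.offer(line);
      }
  }

The design trade-off is visible in log(): bounded memory and no blocking, at the price of possibly dropping lines under extreme backlog.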
_______________________________________________
hotspot-gc-use mailing list
hotspot-gc-use at openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use

From ysr1729 at gmail.com Wed Apr 9 16:49:55 2014
From: ysr1729 at gmail.com (Srinivas Ramakrishna)
Date: Wed, 9 Apr 2014 09:49:55 -0700
Subject: Why G1 doesn't cut it for our application
In-Reply-To: References: Message-ID:

Maybe look at the +PrintHeapAtGCExtended (or similar) option, which gives you a breakdown of space usage in each of the regions.

The messages here state that the attempt is to increase the heap size by a mere 117 MB. It would have been nice to also print the size of the allocation request that was failing and causing a fallback to full gc. It seems a bit strange that with only 11 G of the 17 G in the old gen used (i.e. 6 GB free), the space was so fragmented as to prevent, presumably here, a 117 MB allocation (OTOH perhaps the heap expansion request of 117 MB should not be construed as a request for 117 MB).

Knowing the distribution of free space in the regions before the full gc, and the size of the allocation request that failed, might provide some clues on how G1 may have to be "paced" or tuned to keep sufficiently many completely free (and hopefully contiguous) regions to accommodate larger-than-region-size object allocations.

What region size are you using? I would definitely try a large region size and a larger heap, if you have sufficient free ram (make the space available to the old generation three times the footprint of your application, just to give G1 sufficient head room to manage the space). I wouldn't necessarily want to force large object allocations into eden unless you know that these objects have a relatively short lifetime (which would seem to be a dubious use of large objects, since it's likely then that a very small portion of that object is actually used in that case).

What settings do you use with CMS? And what settings are you using with G1?

USD 0.02.
-- ramki

On Wed, Apr 9, 2014 at 4:56 AM, Cornelius Riemenschneider wrote:
> Hello,
>
> after recently switching to the latest java 7 (u51), I was eager to try out G1.
>
> I used mainly http://www.slideshare.net/MonicaBeckwith/garbage-first-garbage-collector-g1-gc-migration-to-expectations-and-advanced-tuning for tuning, but I hit a roadblock which makes it impossible for us to use G1.
>
> Our allocation pattern sometimes includes huge objects, sometimes in the range of ~120MB, sometimes ~600MB, but I've seen about 1.2GB as well. This is obviously unfriendly to the GC.
>
> Our tuned CMS mostly handles this, but sometimes we hit problems, so we had high expectations for G1.
>
> G1, in our case, triggers FullGC way more often than CMS, even when the heap is mostly empty.
>
> An example log excerpt for this:
>
> 399934.892: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: humongous allocation request failed, allocation request: 117245600 bytes]
> 399934.892: [G1Ergonomics (Heap Sizing) expand the heap, requested expansion amount: 83886080 bytes, attempted expansion amount: 83886080 bytes]
> 399934.892: [G1Ergonomics (Heap Sizing) did not expand the heap, reason: heap expansion operation failed]
> 399934.893: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: humongous allocation request failed, allocation request: 117245600 bytes]
> 399934.893: [G1Ergonomics (Heap Sizing) expand the heap, requested expansion amount: 83886080 bytes, attempted expansion amount: 83886080 bytes]
> 399934.893: [G1Ergonomics (Heap Sizing) did not expand the heap, reason: heap expansion operation failed]
> 399934.893: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: allocation request failed, allocation request: 117245600 bytes]
> 399934.893: [G1Ergonomics (Heap Sizing) expand the heap, requested expansion amount: 117245600 bytes, attempted expansion amount: 117440512 bytes]
> 399934.893: [G1Ergonomics (Heap Sizing) did not expand the heap, reason: heap expansion operation failed]
> 2014-04-09T12:12:49.602+0200: 399934.894: [Full GC 11G->8118M(20G), 20.8752850 secs]
> [Eden: 8192.0K(1016.0M)->0.0B(2728.0M) Survivors: 96.0M->0.0B Heap: 11.6G(20.0G)->8118.8M(20.0G)]
> [Times: user=37.77 sys=0.00, real=20.88 secs]
> 2014-04-09T12:13:10.479+0200: 399955.770: [GC concurrent-mark-abort]
>
> We have a total of 20G for the heap available, and try to allocate objects in the 120MB range.
>
> 9 GB of the heap are free, so these should fit in without problems; even in eden there is a lot of free space.
>
> The attempted heap expansion fails because we use
> -Xms20g
> -Xmx20g
> which is the maximum the server is able to handle.
>
> Still, G1 gets us a FullGC here. This FullGC may be faster than a CMS FullGC, but these happen way too often to be tolerated, especially as this server is responsible for a web application with which users directly interact - a 20 sec pause after clicking is simply not tolerable.
>
> Besides using CMS, or not doing large allocations (which is sometimes impossible, given that we deal with a lot of data), do you have other ideas?
>
> Is it known that an allocation pattern with a lot of huge objects breaks G1?
>
> The above linked presentation suggests to increase the G1 region size when humongous allocation requests are encountered, so these allocations go in eden, but we can not increase the region size beyond 32M, so this fix doesn't work for us.
>
> Regards,
> Cornelius Riemenschneider
> --
> ITscope GmbH
> Ludwig-Erhard-Allee 20
> 76131 Karlsruhe
> Email: cornelius.riemenschneider at itscope.de
> https://www.itscope.com
> Handelsregister: AG Mannheim, HRB 232782
> Sitz der Gesellschaft: Karlsruhe
> Geschäftsführer: Alexander Münkel, Benjamin Mund, Stefan Reger
>
> _______________________________________________
> hotspot-gc-use mailing list
> hotspot-gc-use at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
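For anyone who wants to reproduce this in isolation, a toy allocator along the following lines works (entirely hypothetical test code, not from this thread), run with the poster's heap flags plus -XX:+PrintAdaptiveSizePolicy, which to my understanding is the flag that produces the "[G1Ergonomics (Heap Sizing) ...]" lines quoted above:

  import java.util.ArrayList;
  import java.util.List;

  // Toy humongous-allocation workload (hypothetical):
  // java -Xms20g -Xmx20g -XX:+UseG1GC -XX:+PrintAdaptiveSizePolicy HumongousChurn
  public class HumongousChurn {
      public static void main(String[] args) throws InterruptedException {
          List<byte[]> live = new ArrayList<byte[]>();
          for (int i = 0; i < 1000; i++) {
              live.add(new byte[117245600]);  // same size as the failing request in the log
              if (live.size() > 8) {
                  // drop a random survivor so the old generation fragments over time
                  live.remove((int) (Math.random() * live.size()));
              }
              Thread.sleep(50);
          }
      }
  }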
URL: From gustav.r.akesson at gmail.com Wed Apr 9 17:05:59 2014 From: gustav.r.akesson at gmail.com (=?ISO-8859-1?Q?Gustav_=C5kesson?=) Date: Wed, 9 Apr 2014 19:05:59 +0200 Subject: AW: ridiculous ParNew pause times In-Reply-To: References: <53456751.40408@finkzeit.at> Message-ID: Hi, Yes the GC-logging is synchronous and part of the GC cycle, so any delay will add to the pause. Best regards, Gustav ?kesson Den 9 apr 2014 18:40 skrev "Cornelius Riemenschneider" : > Bingo! > First, i tried to get a tool which shows me which process writes to which > file and how long that takes, but I was unable to find one I could master > to use. > Based on your suggestion I moved the log.gc file to a ramdisk and > performed extensive load testing - now my biggest outlier is > 2014-04-09T18:29:13.873+0200: 383.372: [GC2014-04-09T18:29:13.874+0200: > 383.372: [ParNew: 431041K->208698K(3145728K), 1.8312460 secs] > 11055599K->11203421K(21495808K), 1.8315130 secs] [Times: user=2.61 > sys=0.03, real=1.83 secs] > which is okay. > When moving log.gc back to the harddisk to which redis saves I get pauses > as long as 45sec. > > So I agree with Vitaly that redis probably thrashes some caches (though > not all, as it only occupies one cpu core of a dual-socket server) which > would explain a slowdown of garbage collections, but the real cause of the > long spikes is actually writing the log to disk. > > Could you guys check if you write the gc log synchronous? > If yes, I'd suggest (a) moving writing the log out of the STW phase and > (b) using asynchronous I/O for writing the log. > > Thanks for all your suggestions and your help! > > Regards, > Cornelius Riemenschneider > -- > ITscope GmbH > Ludwig-Erhard-Allee 20 > 76131 Karlsruhe > Email: cornelius.riemenschneider at itscope.de > https://www.itscope.com > Handelsregister: AG Mannheim, HRB 232782 > Sitz der Gesellschaft: Karlsruhe > Gesch?ftsf?hrer: Alexander M?nkel, Benjamin Mund, Stefan Reger > > > -----Urspr?ngliche Nachricht----- > Von: Wolfgang Pedot [mailto:wolfgang.pedot at finkzeit.at] > Gesendet: Mittwoch, 9. April 2014 17:29 > An: Vitaly Davidovich > Cc: Cornelius Riemenschneider; hotspot-gc-use > Betreff: Re: ridiculous ParNew pause times > > A crazy idea: could the gc-logging be the problem here because the > redis-dump is using up all the IO bandwidth? > > Wolfgang > > Am 09.04.2014 17:06, schrieb Vitaly Davidovich: > > Interesting. How much RAM does this machine have? You're using about > > 21-22g for jvm and another 5g for redis, right? Is there room to spare > > in physical memory? > > > > If there's no swap activity, then I don't think there's any I/O > > interference. I'm guessing that redis writing out 5g of memory is > > blowing the cpu caches for the jvm, and causing page faults. Although > > if there's no swap activity, it seems these faults would be soft and > > it's hard to explain how that would balloon to 30 sec pauses and not > > show up in sys time then. > > > > Are there lots of context switches in the java process when this happens? > > > > Sent from my phone > > > > On Apr 9, 2014 10:41 AM, "Cornelius Riemenschneider" > > wrote: > > > > Hey,____ > > > > thanks for the hints :-)____ > > > > The server runs one JVM and one redis instance (A event-based, > > single threaded in-memory nosql-datastore).____ > > > > Redis stores about 5G of data, which are written to disk from time > > to time ? 
it now turns out, that the redis saves align perfectly > > with our long ParNew times.____ > > > > By initiating a redis save I was able to get these garbage > > collections.____ > > > > 2014-04-09T15:06:07.892+0200: 85043.119: > > [GC2014-04-09T15:06:18.089+0200: 85053.315: [ParNew: > > 2964426K->338263K(3145728K), 0.1532950 secs] > > 7167668K->4541505K(21495808K), 10.3502410 secs] [Times: user=1.93 > > sys=0.01, real=10.35 secs]____ > > > > 2014-04-09T15:06:39.203+0200: 85074.429: > > [GC2014-04-09T15:07:11.026+0200: 85106.252: [ParNew: > > 3145728K->405516K(3145728K), 0.3429470 secs] > > 7645283K->5212053K(21495808K), 32.1663210 secs] [Times: user=2.96 > > sys=0.01, real=32.17 secs]____ > > > > I monitored the system during the GCs and swapping definitly does > > not happen.____ > > > > Because redis is single-threaded (and only used by the JVM which is > > during ParNew in a STW phase), during ParNew there is only redis > > active, trying to write 5G of memory to disk as fast as possible and > > the GC.____ > > > > I wasn?t able to pinpoint the issue yet, do you have an idea why the > > jvm could block on I/O during a GC?____ > > > > Disk access is during these phases probably way slower than usual, > > as everything is on the / partition, a RAID 1 disk array.____ > > > > Or does anyone know the right perf options for the perf tool to > > monitor blocking i/o during garbage collections?____ > > > > __ __ > > > > Regards,____ > > > > Cornelius Riemenschneider____ > > > > --____ > > > > ITscope GmbH____ > > > > Ludwig-Erhard-Allee 20____ > > > > 76131 Karlsruhe____ > > > > Email: cornelius.riemenschneider at itscope.de > > ____ > > > > https://www.itscope.com____ > > > > Handelsregister: AG Mannheim, HRB 232782____ > > > > Sitz der Gesellschaft: Karlsruhe____ > > > > Gesch?ftsf?hrer: Alexander M?nkel, Benjamin Mund, Stefan Reger____ > > > > __ __ > > > > *Von:*Vitaly Davidovich [mailto:vitalyd at gmail.com > > ] > > *Gesendet:* Mittwoch, 9. April 2014 14:49 > > *An:* Cornelius Riemenschneider > > *Cc:* hotspot-gc-use > > *Betreff:* Re: ridiculous ParNew pause times____ > > > > __ __ > > > > This is typically caused by one of:____ > > > > 1) heavy swapping (this won't count towards sys time because kernel > > is not using cpu while waiting for I/O to complete) > > 2) oversubscribed machine where gc threads aren't getting enough cpu > > time due____ > > > > Have you looked at stats on the machine when these pauses occur, > > specifically around swap activity? 
Is your machine running multiple
> > JVMs or any other noisy neighbor apps?
> >
> > Sent from my phone
> >
> > On Apr 9, 2014 7:26 AM, "Cornelius Riemenschneider" wrote:
> >
> >     Hello,
> >
> >     we've got the following problem with the ParNew collector:
> >
> >     Our log.gc usually looks like this:
> >
> >     2014-04-09T12:58:02.485+0200: 77357.712: [GC2014-04-09T12:58:02.485+0200: 77357.712: [ParNew: 2722925K->100795K(3145728K), 0.3202010 secs] 6998057K->4375934K(21495808K), 0.3205670 secs] [Times: user=4.10 sys=0.02, real=0.32 secs]
> >
> >     2014-04-09T12:58:06.256+0200: 77361.483: [GC2014-04-09T12:58:06.257+0200: 77361.483: [ParNew: 2722235K->101011K(3145728K), 0.3229910 secs] 6997374K->4376165K(21495808K), 0.3233580 secs] [Times: user=4.13 sys=0.02, real=0.33 secs]
> >
> >     2014-04-09T12:58:12.295+0200: 77367.522: [GC2014-04-09T12:58:12.296+0200: 77367.522: [ParNew: 2722451K->101057K(3145728K), 0.3215320 secs] 6997605K->4376216K(21495808K), 0.3219080 secs] [Times: user=4.12 sys=0.01, real=0.32 secs]
> >
> >     2014-04-09T12:58:18.461+0200: 77373.688: [GC2014-04-09T12:58:18.462+0200: 77373.688: [ParNew: 2722497K->2232K(3145728K), 0.2944540 secs] 6997656K->4376242K(21495808K), 0.2948280 secs] [Times: user=3.79 sys=0.00, real=0.29 secs]
> >
> >     But occasionally we have entries like these:
> >
> >     2014-04-09T09:56:12.840+0200: 66448.066: [GC2014-04-09T09:56:38.154+0200: 66473.381: [ParNew: 3139808K->524288K(3145728K), 0.8355990 secs] 6845512K->4585563K(21495808K), 26.1502640 secs] [Times: user=5.59 sys=0.16, real=26.16 secs]
> >
> >     2014-04-09T09:57:09.173+0200: 66504.400: [GC2014-04-09T09:57:24.871+0200: 66520.098: [ParNew: 2950573K->488555K(3145728K), 0.1876660 secs] 8701082K->6239064K(21495808K), 15.8858250 secs] [Times: user=2.38 sys=0.00, real=15.89 secs]
> >
> >     2014-04-09T12:58:34.661+0200: 77389.888: [GC2014-04-09T12:59:06.390+0200: 77421.616: [ParNew: 2623439K->2083K(3145728K), 0.0292270 secs] 6997709K->4376490K(21495808K), 31.7578950 secs] [Times: user=0.34 sys=0.02, real=31.76 secs]
> >
> >     which I can't explain at all.
> >
> >     The real time of 31.76 sec equals a pause of 31.76 secs, in which
> >     the jvm does not respond to user requests, which is obviously bad.
> >
> >     The application is *very* allocation heavy, so generally pauses of
> >     0.3 sec are okay.
> >
> >     Our GC settings for this server are:
> >
> >     -Xms21g
> >     -Xmx21g
> >     -XX:ReservedCodeCacheSize=256m
> >     -XX:PermSize=256m
> >     -XX:MaxPermSize=768m
> >     -server
> >     -verbose:gc
> >     -Xloggc:log.gc
> >     -XX:+PrintGCDetails
> >     -XX:+PrintGCDateStamps
> >     -XX:+ExplicitGCInvokesConcurrent
> >     -XX:NewRatio=5
> >     -XX:SurvivorRatio=5
> >     -XX:+UseConcMarkSweepGC
> >     -XX:+UseParNewGC
> >     -XX:+UseCMSInitiatingOccupancyOnly
> >     -XX:CMSInitiatingOccupancyFraction=40
> >     -XX:+CMSClassUnloadingEnabled
> >     -XX:+CMSScavengeBeforeRemark
> >     -Dsun.rmi.dgc.client.gcInterval=1209600000
> >     -Dsun.rmi.dgc.server.gcInterval=1209600000
> >
> >     We run the sun jdk 7u51 on a current debian wheezy.
> >
> >     We previously had issues with long ParNew pauses, but back then, the
> >     sys time was high, so we concluded that the server was swapping,
> >     which we were able to prevent.
> >
> >     Do you have any idea or further hint at debugging options which
> >     might help us in finding the issue?
> >
> >     Regards,
> >
> >     Cornelius Riemenschneider
> >     --
> >     ITscope GmbH
> >     Ludwig-Erhard-Allee 20
> >     76131 Karlsruhe
> >     Email: cornelius.riemenschneider at itscope.de
> >     https://www.itscope.com
> >     Handelsregister: AG Mannheim, HRB 232782
> >     Sitz der Gesellschaft: Karlsruhe
> >     Geschäftsführer: Alexander Münkel, Benjamin Mund, Stefan Reger
> >
> >
> >     _______________________________________________
> >     hotspot-gc-use mailing list
> >     hotspot-gc-use at openjdk.java.net
> >     http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
>
>
> _______________________________________________
> hotspot-gc-use mailing list
> hotspot-gc-use at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From gustav.r.akesson at gmail.com Wed Apr 9 17:18:26 2014
From: gustav.r.akesson at gmail.com (=?ISO-8859-1?Q?Gustav_=C5kesson?=)
Date: Wed, 9 Apr 2014 19:18:26 +0200
Subject: AW: ridiculous ParNew pause times
In-Reply-To: 
References: <53456751.40408@finkzeit.at>
Message-ID: 

Hi,

Oh, and another thing - when conference speakers suggest always having GC
logging enabled in production, they should also mention the synchronous
logging... folks seem to think there is minimal impact from GC logging.

Best regards,
Gustav Åkesson

On 9 Apr 2014 19:05, "Gustav Åkesson" wrote:

> Hi,
>
> Yes, the GC logging is synchronous and part of the GC cycle, so any delay
> will add to the pause.
>
> Best regards,
> Gustav Åkesson
>
> On 9 Apr 2014 18:40, "Cornelius Riemenschneider" wrote:
>
>> Bingo!
>> First, I tried to get a tool which shows me which process writes to
>> which file and how long that takes, but I was unable to find one I
>> could master.
>> Based on your suggestion I moved the log.gc file to a ramdisk and
>> performed extensive load testing - now my biggest outlier is
>> 2014-04-09T18:29:13.873+0200: 383.372: [GC2014-04-09T18:29:13.874+0200: 383.372: [ParNew: 431041K->208698K(3145728K), 1.8312460 secs] 11055599K->11203421K(21495808K), 1.8315130 secs] [Times: user=2.61 sys=0.03, real=1.83 secs]
>> which is okay.
>> When moving log.gc back to the hard disk to which redis saves, I get
>> pauses as long as 45 sec.
>>
>> So I agree with Vitaly that redis probably thrashes some caches (though
>> not all, as it only occupies one cpu core of a dual-socket server),
>> which would explain a slowdown of garbage collections, but the real
>> cause of the long spikes is actually writing the log to disk.
>>
>> Could you guys check whether you write the gc log synchronously?
>> If yes, I'd suggest (a) moving writing the log out of the STW phase and
>> (b) using asynchronous I/O for writing the log.
>>
>> Thanks for all your suggestions and your help!
>>
>> Regards,
>> Cornelius Riemenschneider
>> --
>> ITscope GmbH
>> Ludwig-Erhard-Allee 20
>> 76131 Karlsruhe
>> Email: cornelius.riemenschneider at itscope.de
>> https://www.itscope.com
>> Handelsregister: AG Mannheim, HRB 232782
>> Sitz der Gesellschaft: Karlsruhe
>> Geschäftsführer: Alexander Münkel, Benjamin Mund, Stefan Reger
>>
>>
>> -----Original Message-----
>> From: Wolfgang Pedot [mailto:wolfgang.pedot at finkzeit.at]
>> Sent: Wednesday, 9 April 2014 17:29
>> To: Vitaly Davidovich
>> Cc: Cornelius Riemenschneider; hotspot-gc-use
>> Subject: Re: ridiculous ParNew pause times
>>
>> A crazy idea: could the gc-logging be the problem here because the
>> redis-dump is using up all the IO bandwidth?
>>
>> Wolfgang
>>
>> On 09.04.2014 17:06, Vitaly Davidovich wrote:
>> > Interesting. How much RAM does this machine have? You're using about
>> > 21-22g for the jvm and another 5g for redis, right? Is there room to
>> > spare in physical memory?
>> >
>> > If there's no swap activity, then I don't think there's any I/O
>> > interference. I'm guessing that redis writing out 5g of memory is
>> > blowing the cpu caches for the jvm, and causing page faults. Although
>> > if there's no swap activity, it seems these faults would be soft and
>> > it's hard to explain how that would balloon to 30 sec pauses and not
>> > show up in sys time then.
>> >
>> > Are there lots of context switches in the java process when this happens?
>> >
>> > Sent from my phone
>> >
>> > On Apr 9, 2014 10:41 AM, "Cornelius Riemenschneider" wrote:
>> >
>> >     Hey,
>> >
>> >     thanks for the hints :-)
>> >
>> >     The server runs one JVM and one redis instance (an event-based,
>> >     single-threaded in-memory nosql datastore).
>> >
>> >     Redis stores about 5G of data, which are written to disk from time
>> >     to time -- it now turns out that the redis saves align perfectly
>> >     with our long ParNew times.
>> >
>> >     By initiating a redis save I was able to get these garbage
>> >     collections:
>> >
>> >     2014-04-09T15:06:07.892+0200: 85043.119: [GC2014-04-09T15:06:18.089+0200: 85053.315: [ParNew: 2964426K->338263K(3145728K), 0.1532950 secs] 7167668K->4541505K(21495808K), 10.3502410 secs] [Times: user=1.93 sys=0.01, real=10.35 secs]
>> >
>> >     2014-04-09T15:06:39.203+0200: 85074.429: [GC2014-04-09T15:07:11.026+0200: 85106.252: [ParNew: 3145728K->405516K(3145728K), 0.3429470 secs] 7645283K->5212053K(21495808K), 32.1663210 secs] [Times: user=2.96 sys=0.01, real=32.17 secs]
>> >
>> >     I monitored the system during the GCs and swapping definitely does
>> >     not happen.
>> >
>> >     Because redis is single-threaded (and only used by the JVM, which
>> >     is in a STW phase during ParNew), during ParNew only redis is
>> >     active, trying to write 5G of memory to disk as fast as possible,
>> >     plus the GC.
>> >
>> >     I wasn't able to pinpoint the issue yet; do you have an idea why
>> >     the jvm could block on I/O during a GC?
>> >
>> >     Disk access is probably way slower than usual during these phases,
>> >     as everything is on the / partition, a RAID 1 disk array.
>> >
>> >     Or does anyone know the right options for the perf tool to monitor
>> >     blocking I/O during garbage collections?
>> >
>> >     Regards,
>> >     Cornelius Riemenschneider
>> >     [...]
>> >
>> >     From: Vitaly Davidovich [mailto:vitalyd at gmail.com]
>> >     Sent: Wednesday, 9 April 2014 14:49
>> >     To: Cornelius Riemenschneider
>> >     Cc: hotspot-gc-use
>> >     Subject: Re: ridiculous ParNew pause times
>> >
>> >     This is typically caused by one of:
>> >
>> >     1) heavy swapping (this won't count towards sys time because the
>> >     kernel is not using cpu while waiting for I/O to complete)
>> >     2) oversubscribed machine where gc threads aren't getting enough
>> >     cpu time due
>> >
>> >     Have you looked at stats on the machine when these pauses occur,
>> >     specifically around swap activity? Is your machine running
>> >     multiple JVMs or any other noisy neighbor apps?
>> >
>> >     Sent from my phone
>> >
>> >     On Apr 9, 2014 7:26 AM, "Cornelius Riemenschneider" wrote:
>> >
>> >     [...] (original message quoted in full earlier in this digest)
>>
>> _______________________________________________
>> hotspot-gc-use mailing list
>> hotspot-gc-use at openjdk.java.net
>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From yu.zhang at oracle.com Wed Apr 9 17:41:26 2014
From: yu.zhang at oracle.com (YU ZHANG)
Date: Wed, 09 Apr 2014 10:41:26 -0700
Subject: Why G1 doesn't cut it for our application
In-Reply-To: 
References: 
Message-ID: <53458646.1020107@oracle.com>

Cornelius,

Humongous objects are treated specially in G1: they can only be cleaned
up after marking or a full gc. Usually we recommend increasing the region
size to avoid humongous allocations (an allocation is humongous if the
object size is more than half the region size), but that would not help
in your case -- the maximum heap region size is 32m.

As Thomas mentioned, there are some improvements in 8u20, and more
improvements under investigation. I am curious to know the performance
after 8u20.

In addition to Thomas' suggestion, you can try to reduce the Eden size,
and use -XX:+G1PrintRegionLivenessInfo to print out region information
after the marking phase.

Thanks,
Jenny
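[Editor's note: a minimal sketch of the humongous threshold Jenny
describes. The region size mirrors the 32m maximum mentioned above, the
allocation size is taken from the log quoted below, and the class and
variable names are illustrative, not from the thread.]

    public class HumongousThreshold {
        public static void main(String[] args) {
            long regionSize = 32L * 1024 * 1024;    // -XX:G1HeapRegionSize=32m, the maximum
            long threshold = regionSize / 2;        // humongous if object size > half a region
            long request = 117245600L;              // the failing allocation from the log below
            System.out.println(request + " > " + threshold + " : " + (request > threshold));
            // prints "117245600 > 16777216 : true" -- humongous even at the largest region size
        }
    }

Such requests therefore always bypass Eden and, as Thomas explains later
in the thread, need a contiguous run of free regions.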
On 4/9/2014 4:56 AM, Cornelius Riemenschneider wrote:
> Hello,
>
> after recently switching to the latest java 7 (u51), I was eager to
> try out G1.
>
> I used mainly
> http://www.slideshare.net/MonicaBeckwith/garbage-first-garbage-collector-g1-gc-migration-to-expectations-and-advanced-tuning
> for tuning, but I hit a roadblock which makes it impossible for us to
> use G1.
>
> Our allocation pattern sometimes includes huge objects, sometimes in
> the range of ~120MB, sometimes ~600MB, but I've seen about 1.2GB as
> well. This is obviously unfriendly to the GC.
>
> Our tuned CMS mostly handles this, but sometimes we hit problems, so
> we had high expectations for G1.
>
> G1, in our case, triggers FullGC way more often than CMS, even when
> the heap is mostly empty.
>
> An example log excerpt for this:
>
> 399934.892: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: humongous allocation request failed, allocation request: 117245600 bytes]
> 399934.892: [G1Ergonomics (Heap Sizing) expand the heap, requested expansion amount: 83886080 bytes, attempted expansion amount: 83886080 bytes]
> 399934.892: [G1Ergonomics (Heap Sizing) did not expand the heap, reason: heap expansion operation failed]
> 399934.893: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: humongous allocation request failed, allocation request: 117245600 bytes]
> 399934.893: [G1Ergonomics (Heap Sizing) expand the heap, requested expansion amount: 83886080 bytes, attempted expansion amount: 83886080 bytes]
> 399934.893: [G1Ergonomics (Heap Sizing) did not expand the heap, reason: heap expansion operation failed]
> 399934.893: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: allocation request failed, allocation request: 117245600 bytes]
> 399934.893: [G1Ergonomics (Heap Sizing) expand the heap, requested expansion amount: 117245600 bytes, attempted expansion amount: 117440512 bytes]
> 399934.893: [G1Ergonomics (Heap Sizing) did not expand the heap, reason: heap expansion operation failed]
> 2014-04-09T12:12:49.602+0200: 399934.894: [Full GC 11G->8118M(20G), 20.8752850 secs]
>    [Eden: 8192.0K(1016.0M)->0.0B(2728.0M) Survivors: 96.0M->0.0B Heap: 11.6G(20.0G)->8118.8M(20.0G)]
>    [Times: user=37.77 sys=0.00, real=20.88 secs]
> 2014-04-09T12:13:10.479+0200: 399955.770: [GC concurrent-mark-abort]
>
> We have a total of 20G available for the heap, and try to allocate
> objects in the 120MB range.
>
> 9 GB of the heap are free, so these should fit in without problems;
> even in Eden there is a lot of free space.
>
> The attempted heap expansion fails because we use
>
> -Xms20g
> -Xmx20g
>
> which is the maximum the server is able to handle.
>
> Still, G1 gets us a FullGC here. This FullGC may be faster than a CMS
> FullGC, but these happen way too often to be tolerated, especially as
> this server is responsible for a web application with which users
> directly interact -- a 20 sec pause after clicking is simply not
> tolerable.
>
> Besides using CMS, or not doing large allocations (which is sometimes
> impossible, given that we deal with a lot of data), do you have other
> ideas?
>
> Is it known that an allocation pattern with a lot of huge objects
> breaks G1?
>
> The above linked presentation suggests increasing the G1 region size
> when humongous allocation requests are encountered, so these
> allocations go in eden, but we cannot increase the region size beyond
> 32M, so this fix doesn't work for us.
>
> Regards,
>
> Cornelius Riemenschneider
> --
> ITscope GmbH
> Ludwig-Erhard-Allee 20
> 76131 Karlsruhe
> Email: cornelius.riemenschneider at itscope.de
> https://www.itscope.com
> Handelsregister: AG Mannheim, HRB 232782
> Sitz der Gesellschaft: Karlsruhe
> Geschäftsführer: Alexander Münkel, Benjamin Mund, Stefan Reger
>
>
> _______________________________________________
> hotspot-gc-use mailing list
> hotspot-gc-use at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From holger.hoffstaette at googlemail.com Wed Apr 9 17:42:07 2014
From: holger.hoffstaette at googlemail.com (=?UTF-8?B?SG9sZ2VyIEhvZmZzdMOkdHRl?=)
Date: Wed, 09 Apr 2014 19:42:07 +0200
Subject: ridiculous ParNew pause times
In-Reply-To: 
References: 
Message-ID: <5345866F.70604@googlemail.com>

On 04/09/14 18:40, Cornelius Riemenschneider wrote:
> Bingo! First, I tried to get a tool which shows me which process
> writes to which file and how long that takes, but I was unable to
> find one I could master. Based on your suggestion I moved the log.gc
> file to a ramdisk and performed extensive load testing [...] which is
> okay.

Another idea: are your vm.dirty_(expire|writeback)_centisecs and
especially vm.dirty_(background)_ratio sysctl settings default, aka
ridiculously high? This can result in writeback storms of a huge number
of accumulated dirty buffers and is a common reason for periodic stalls,
which can take forever when another process issues fsync() at
inappropriate moments.

Just a guess.

-h

From vitalyd at gmail.com Wed Apr 9 18:15:43 2014
From: vitalyd at gmail.com (Vitaly Davidovich)
Date: Wed, 9 Apr 2014 14:15:43 -0400
Subject: AW: ridiculous ParNew pause times
In-Reply-To: 
References: <53456751.40408@finkzeit.at>
Message-ID: 

Ouch, that's very unfortunate. I'm curious if there's a good reason why
it has to be synchronous? It seems the log entries could be flushed
async by a dedicated (or some other VM worker) thread and not risk
stalling the app like this.

Sent from my phone
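[Editor's note: the sketch below is not HotSpot code. It is a user-land
illustration of the dedicated-writer-thread idea Vitaly suggests: callers
enqueue log lines and return immediately, while one background thread does
the actual disk I/O. All class, method and file names are hypothetical.]

    import java.io.FileWriter;
    import java.io.IOException;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    /** Hypothetical async logger: producers enqueue, one thread drains to disk. */
    class AsyncGcLog {
        private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();

        AsyncGcLog(String path) {
            Thread writer = new Thread(() -> {
                try (FileWriter out = new FileWriter(path, true)) {
                    while (true) {
                        out.write(queue.take());   // blocks until a line is available
                        out.flush();
                    }
                } catch (IOException | InterruptedException e) {
                    // writer thread exits; real code would handle shutdown properly
                }
            }, "gc-log-writer");
            writer.setDaemon(true);                 // never blocks the pause path
            writer.start();
        }

        void log(String line) {
            queue.offer(line + System.lineSeparator());  // returns immediately
        }
    }

A real VM would additionally have to bound the queue and define ordering
and shutdown semantics; see Per Liden's note on JEP 158 further down in
this digest for where that work actually happened.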
On Apr 9, 2014 1:05 PM, "Gustav Åkesson" wrote:

> Hi,
>
> Yes, the GC logging is synchronous and part of the GC cycle, so any
> delay will add to the pause.
>
> Best regards,
> Gustav Åkesson
>
> On 9 Apr 2014 18:40, "Cornelius Riemenschneider" wrote:
>
>> Bingo!
>> [...] (message quoted in full earlier in this digest)

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From cri at itscope.de Thu Apr 10 09:07:55 2014
From: cri at itscope.de (=?windows-1252?Q?Cornelius_Riemenschneider?=)
Date: Thu, 10 Apr 2014 11:07:55 +0200
Subject: AW: ridiculous ParNew pause times
In-Reply-To: <5345866F.70604@googlemail.com>
References: 
Message-ID: 

Yes, these settings are currently set to the default values.
I'll investigate changing them as well, but for now I'll move the gc log
to a ramdisk - waiting for the log to be written is a very stupid reason
to delay something as awful as a garbage collection.

Regards,
Cornelius Riemenschneider
--
ITscope GmbH
Ludwig-Erhard-Allee 20
76131 Karlsruhe
Email: cornelius.riemenschneider at itscope.de
https://www.itscope.com
Handelsregister: AG Mannheim, HRB 232782
Sitz der Gesellschaft: Karlsruhe
Geschäftsführer: Alexander Münkel, Benjamin Mund, Stefan Reger


-----Original Message-----
From: hotspot-gc-use [mailto:hotspot-gc-use-bounces at openjdk.java.net] On Behalf Of Holger Hoffstätte
Sent: Wednesday, 9 April 2014 19:42
To: hotspot-gc-use
Subject: Re: ridiculous ParNew pause times

On 04/09/14 18:40, Cornelius Riemenschneider wrote:
> Bingo! First, I tried to get a tool which shows me which process
> writes to which file and how long that takes [...] which is okay.

Another idea: are your vm.dirty_(expire|writeback)_centisecs and
especially vm.dirty_(background)_ratio sysctl settings default, aka
ridiculously high? [...]

Just a guess.
-h

_______________________________________________
hotspot-gc-use mailing list
hotspot-gc-use at openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use

From cri at itscope.de Thu Apr 10 11:17:07 2014
From: cri at itscope.de (=?windows-1252?Q?Cornelius_Riemenschneider?=)
Date: Thu, 10 Apr 2014 13:17:07 +0200
Subject: AW: Why G1 doesn't cut it for our application
In-Reply-To: 
References: 
Message-ID: 

Hey,

first, we run two different types of servers -- one of them is very
allocation heavy, with the possibility of huge objects which live longer.
The other one usually has only small allocations, but some requests may
trigger the allocation of up to ~150MB of data at once. This data is used
as a whole, but is very short-lived.

Explanation: we sometimes ask the question: given two unsorted sets of
integer IDs, how many common ids are in both? Aka the cardinality of the
intersection of the two sets. The fastest way we found to determine this
is:

- for both sets, determine the biggest integer per set (linear scan of
  the array)
- allocate a long array which has as many bits as the largest id requires
- for each id, set the bit which corresponds to the id to 1 (example:
  id 3 in our set would mean the first long in our long array, and then
  the 3rd bit there)

Now we can intersect these sets very fast by a bitwise AND of each long
in the array and using Long.popcnt, which is documented to compile to the
POPCNT instruction on our CPUs. The intersect-and-count is very fast (I
don't have exact numbers on this, but it should be <300ms), but the
objects required are rather huge.
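[Editor's note: a sketch of the bitmap intersection described above,
assuming non-negative int IDs. "Long.popcnt" in the message presumably
refers to Long.bitCount, the JDK method HotSpot intrinsifies to POPCNT;
the class and method names here are illustrative.]

    /** Count ids common to both sets via long[] bitmaps and popcount. */
    class BitsetIntersection {
        static long[] toBitmap(int[] ids) {
            int max = 0;
            for (int id : ids) max = Math.max(max, id);       // linear scan for the largest id
            long[] bits = new long[(max >>> 6) + 1];          // one bit per possible id
            for (int id : ids) bits[id >>> 6] |= 1L << (id & 63);
            return bits;
        }

        static long intersectionSize(long[] a, long[] b) {
            long count = 0;
            int n = Math.min(a.length, b.length);
            for (int i = 0; i < n; i++)
                count += Long.bitCount(a[i] & b[i]);          // intrinsic; compiles to POPCNT
            return count;
        }
    }

Note the footprint: a largest id around one billion already yields a
bitmap of roughly 125 MB, in line with the humongous allocation sizes in
the logs.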
I even found this gem in our log:

485471.317: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: humongous allocation request failed, allocation request: 81000016 bytes]
485471.317: [G1Ergonomics (Heap Sizing) expand the heap, requested expansion amount: 67108864 bytes, attempted expansion amount: 67108864 bytes]
485471.317: [G1Ergonomics (Heap Sizing) did not expand the heap, reason: heap expansion operation failed]
485471.319: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: humongous allocation request failed, allocation request: 81000016 bytes]
485471.319: [G1Ergonomics (Heap Sizing) expand the heap, requested expansion amount: 67108864 bytes, attempted expansion amount: 67108864 bytes]
485471.319: [G1Ergonomics (Heap Sizing) did not expand the heap, reason: heap expansion operation failed]
485471.319: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: allocation request failed, allocation request: 81000016 bytes]
485471.319: [G1Ergonomics (Heap Sizing) expand the heap, requested expansion amount: 81000016 bytes, attempted expansion amount: 83886080 bytes]
485471.319: [G1Ergonomics (Heap Sizing) did not expand the heap, reason: heap expansion operation failed]
2014-04-10T11:58:26.028+0200: 485471.319: [Full GC 11G->7884M(20G), 20.8011100 secs]
   [Eden: 0.0B(1584.0M)->0.0B(1600.0M) Survivors: 8192.0K->0.0B Heap: 11.3G(20.0G)->7884.1M(20.0G)]
   [Times: user=37.88 sys=0.00, real=20.81 secs]

About 80MB of allocation and 9GB free, but it still requires a fullgc
(though still with 7u51).

The settings used there are:

-Xms20g
-Xmx20g
-XX:ReservedCodeCacheSize=256m
-XX:PermSize=256m
-XX:MaxPermSize=768m
-XX:+ExplicitGCInvokesConcurrent
-XX:+PrintAdaptiveSizePolicy
-XX:+UseG1GC
-XX:MaxGCPauseMillis=80
-XX:InitiatingHeapOccupancyPercent=45
-XX:G1ReservePercent=15

A CMS config which has way fewer problems for this server would be:

-Xms20g
-Xmx20g
-XX:ReservedCodeCacheSize=256m
-XX:PermSize=256m
-XX:MaxPermSize=768m
-XX:NewRatio=3
-XX:SurvivorRatio=5
-XX:+UseConcMarkSweepGC
-XX:+UseParNewGC
-XX:+UseCMSInitiatingOccupancyOnly
-XX:CMSInitiatingOccupancyFraction=75
-XX:+CMSClassUnloadingEnabled
-XX:+ExplicitGCInvokesConcurrent
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-XX:+CMSScavengeBeforeRemark

I can't increase the heap further without deploying more RAM.

I tested a lot more different options with our allocation-heavy servers,
because restarting these has no direct impact on our customers. The
configuration which worked best used a high pause time goal (600ms), but
a limit on the new size (6g), so that we still have enough old space for
the big objects.

I also tried setting the region size to 32mb, but this didn't cause a
noticeable difference from the default. Setting InitiatingHeapSize to 0,
1 or 45 didn't impact the amount of fullgcs either (at least not
noticeably).

When I test with 8u20, I'll run with +PrintHeapAtGCExtended.

Regards,
Cornelius Riemenschneider
--
ITscope GmbH
Ludwig-Erhard-Allee 20
76131 Karlsruhe
Email: cornelius.riemenschneider at itscope.de
https://www.itscope.com
Handelsregister: AG Mannheim, HRB 232782
Sitz der Gesellschaft: Karlsruhe
Geschäftsführer: Alexander Münkel, Benjamin Mund, Stefan Reger


From: Srinivas Ramakrishna [mailto:ysr1729 at gmail.com]
Sent: Wednesday, 9 April 2014 18:50
To: Cornelius Riemenschneider
Cc: hotspot-gc-use at openjdk.java.net
Subject: Re: Why G1 doesn't cut it for our application

May be look at the +PrintHeapAtGCExtended (or similar) option which gives
you a breakdown of space usage in each of the regions. The messages here
state that the attempt is to increase the heap size by a mere 117 MB. It
would have been nice to also print the size of the allocation request
that was failing and causing a fallback to full gc. It seems a bit
strange that with only 11 G of the 17 G in the old gen used (i.e. 6 GB
free), the space was so fragmented as to prevent, presumably here, a
117 MB allocation (OTOH perhaps the heap expansion request of 117 MB
should not be construed as a request for 117 MB). Knowing the
distribution of free space in the regions before the full gc and the size
of the allocation request that failed might provide some clues on how G1
may have to be "paced" or tuned to keep sufficiently many completely free
(and hopefully contiguous) regions to accommodate larger-than-region-size
object allocations.

What region size are you using? I would definitely try a large region
size and a larger heap, if you have sufficient free ram (make the space
available to the old generation three times the footprint of your
application, just to give G1 sufficient head room to manage the space). I
wouldn't necessarily want to force large object allocations into Eden
unless you know that these objects have a relatively short lifetime
(which would seem to be a dubious use of large objects, since it's likely
then that a very small portion of that object is actually used in that
case).

What settings do you use with CMS? And what settings are you using with
G1?

USD 0.02.
-- ramki

On Wed, Apr 9, 2014 at 4:56 AM, Cornelius Riemenschneider wrote:

[...] (original message quoted in full earlier in this digest)

_______________________________________________
hotspot-gc-use mailing list
hotspot-gc-use at openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From thomas.schatzl at oracle.com Thu Apr 10 13:11:23 2014
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Thu, 10 Apr 2014 15:11:23 +0200
Subject: AW: Why G1 doesn't cut it for our application
In-Reply-To: 
References: 
Message-ID: <1397135483.2748.47.camel@cirrus>

Hi,

On Thu, 2014-04-10 at 13:17 +0200, Cornelius Riemenschneider wrote:
> Hey,
>
> first, we run two different types of servers -- one of them is very
> allocation heavy, with the possibility of huge objects which live
> longer.
>
> The other one usually has only small allocations, but some requests
> may trigger the allocation of up to ~150MB of data at once.
>
> This data is used as a whole, but is very short-lived. Explanation:

That's the perfect case for the change in JDK-8027959.

> We sometimes ask the question: given two unsorted sets of integer IDs,
> how many common ids are in both?
>
> Aka the cardinality of the intersection of the two sets.
>
> The fastest way we found to determine this is:
> [...]
>
> I even found this gem in our log:
>
> 485471.317: [G1Ergonomics (Heap Sizing) attempt heap expansion,
> reason: humongous allocation request failed, allocation request:
> 81000016 bytes]
> 485471.317: [G1Ergonomics (Heap Sizing) expand the heap, requested
> expansion amount: 67108864 bytes, attempted expansion amount:
> 67108864 bytes]
> [...]

Note that these log messages come from different threads, so an
"allocation request: 81000016 bytes" followed by "attempted expansion
amount: 67108864 bytes" is perfectly natural :)

> 2014-04-10T11:58:26.028+0200: 485471.319: [Full GC 11G->7884M(20G),
> 20.8011100 secs]
>    [Eden: 0.0B(1584.0M)->0.0B(1600.0M) Survivors: 8192.0K->0.0B Heap:
> 11.3G(20.0G)->7884.1M(20.0G)]
>    [Times: user=37.88 sys=0.00, real=20.81 secs]

Not sure why G1 immediately tries to do a full GC here.

> About 80MB of allocation and 9GB free, but it still requires a fullgc
> (though still with 7u51).

Imo there is a bug (or at least some documentation issue) in the total
size accounting for large objects: only the space that is actually
occupied by the object is counted as such, not the actual space it takes
(object size rounded up to whole regions).

So if, for example, you have lots of objects that are region_size + 1
byte large, the value shown sums up these region_size + 1 bytes, but in
reality 2 * region_size bytes are taken up. So in extreme cases, this
value can be off by almost 50% :) This seems like one of these cases. You
can find out with the PrintHeapAtGCExtended flag by looking at whether
the HC (humongous object) regions are almost empty.

Unfortunately, typical sizing rules-of-thumb tend to create exactly these
objects: i.e. start with an element size that is a power of two, and
double that every time a reallocation is required. Taking the object
header into account, you end up with objects of region_size + 16 bytes
size :/

But this is what JDK-8031381 is about :)

> The settings used there are:
[...]
> I can't increase the heap further without deploying more RAM.
>
> I tested a lot more different options with our allocation-heavy
> servers, because restarting these has no direct impact on our
> customers.
>
> The configuration which worked best used a high pause time goal
> (600ms), but a limit on the new size (6g), so that we still have
> enough old space for the big objects.
>
> I also tried setting the region size to 32mb, but this didn't cause a
> noticeable difference from the default. Setting InitiatingHeapSize to
> 0, 1 or 45 didn't impact the amount of fullgcs either (at least not
> noticeably).

The allocation rate is too high for InitiatingHeapSize to make a
difference. Not exactly sure why CMS would work better in this case, but
probably because it just allocates (most of) these objects into the young
gen (not sure if there is some large object threshold; if so, it is
probably higher than 32M), getting rid of them easily at the next young
collection.

> When I test with 8u20, I'll run with +PrintHeapAtGCExtended.

You also need to add -XX:+PrintHeapAtGC so that the "extended" output is
shown btw.

Thomas

From thomas.schatzl at oracle.com Fri Apr 11 10:00:33 2014
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Fri, 11 Apr 2014 12:00:33 +0200
Subject: Why G1 doesn't cut it for our application
In-Reply-To: 
References: 
Message-ID: <1397210433.2710.23.camel@cirrus>

hi,

On Thu, 2014-04-10 at 16:08 +0200, Cornelius Riemenschneider wrote:
> Hi,
> I finally tried 8u20 b5 (a precompiled early access build) on a test
> server.
> Triggering FullGCs with our not-so-allocation-heavy application by hand
> (without the load of our real users) is hard even with 7u51, so I
> didn't try that.
> On the other hand, load-testing our allocation-heavy software is very
> easy, and triggered loads of FullGCs with 7u51.
> 8u20 does way better with my current test.

That's great to hear :) Thanks for trying out the new build.

> I can't really interpret the output of PrintHeapAtGCExtended, so I've
> attached the (hopefully relevant) log from before and after the fullgc.

Let me try to explain the relevant information: every line in this log
represents a region of the heap, from the bottom address to the top
address. Relevant for this problem are the columns on the left (up to but
not including "TS"), and the percentage before "used".

So, what's important here are the "HS", "HC" and "F" strings. HS means
that a humongous object starts in that particular region; HC regions are
regions that the object in the previous humongous start region extends
into. "F" marks regions that are completely free for allocation.

So, any new humongous object needs a contiguous sequence of free regions
for allocation, and the GC is looking for long contiguous sequences of
"F" regions. This seems to be the problem: longer sequences of free
regions are often broken up by single regions with content.

Another aspect is how much of the area occupied by large objects is
actually taken up by the object: the "used" percentage for the "HS"
region shows that for the entire object. E.g. something like 51% is
really bad, while 90%+ is probably okay. From what I saw, in your case
most objects are very large, so the used/total space percentage is
typically good.
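[Editor's note: a worked illustration of the region rounding Thomas
described in his previous message (the region_size + 16 bytes case). The
8 MB region size and 16-byte array header are assumptions chosen for the
example, not measurements from this thread; the class name is
illustrative.]

    public class RegionRounding {
        public static void main(String[] args) {
            long regionSize = 8L * 1024 * 1024;      // assume 8 MB G1 regions
            long payload = regionSize;               // power-of-two buffer, e.g. new long[1024 * 1024]
            long objectSize = payload + 16;          // array header pushes it just past one region
            long regionsUsed = (objectSize + regionSize - 1) / regionSize;
            System.out.println(regionsUsed);         // 2 -- the second HC region is nearly empty
        }
    }

So an object only 16 bytes over a region boundary occupies two regions,
which is exactly the near-50% waste case described above.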
> Is this a problem which can be fixed with other gc settings on my end, The only setting I can see that could affect something from the user end would be increasing heap region size. Not the best option, but possibly worth a try as it reduces the required number of consecutive free regions (but then it also reduces the number of total available regions). >or can you improve G1 further to target our allocation pattern? Jdk8u20-b08 includes a set of further optimizations for G1, some that should directly improve your use case. I do not know when/if it will be available for download in the Java Early Access program. Further improvements will be included with later releases only though, i.e. 8u40/9. You could subscribe to notifications for the mentioned CRs to at least get information about changes. JDK-8038487 "G1: use mixed GC instead of Full GC to clear out space for failing humoungous object allocations" is the one I did not remember last time, and probably addresses the problem with your application directly. I added the log you sent to the CR as an example - thanks for discussing your problem here. Contributions are always welcome to make things happen faster though ;) Thomas From per.liden at oracle.com Fri Apr 11 13:06:07 2014 From: per.liden at oracle.com (Per Liden) Date: Fri, 11 Apr 2014 15:06:07 +0200 Subject: AW: ridiculous ParNew pause times In-Reply-To: References: Message-ID: <5347E8BF.4010509@oracle.com> Hi, As someone already mentioned, writes to the GC log file are synchronous. They are buffered fwrite()s on a stdio FILE stream. But there's no fsync()-ing going on. There's work ongoing to improve/redesign the whole logging framework in the VM (JEP 158, http://openjdk.java.net/jeps/158). I briefly talked to the guys involved in that and passed on the feedback that some kind of support for async logging might be worth considering. If you want to get involved or follow that effort I'd suggest you hang out on serviceability-dev at openjdk.java.net. cheers, /Per On 04/10/2014 11:07 AM, Cornelius Riemenschneider wrote: > Yes, these settings are currently set to the default values. > I'll investigate changing them as well, but for now I'll move the gc log to a ramdisk - > waiting for the log to be written is a very stupid reason to delay something awful as a garbage collection. > > Regards, > Cornelius Riemenschneider > -- > ITscope GmbH > Ludwig-Erhard-Allee 20 > 76131 Karlsruhe > Email: cornelius.riemenschneider at itscope.de > https://www.itscope.com > Handelsregister: AG Mannheim, HRB 232782 > Sitz der Gesellschaft: Karlsruhe > Gesch?ftsf?hrer: Alexander M?nkel, Benjamin Mund, Stefan Reger > > > -----Urspr?ngliche Nachricht----- > Von: hotspot-gc-use [mailto:hotspot-gc-use-bounces at openjdk.java.net] Im Auftrag von Holger Hoffst?tte > Gesendet: Mittwoch, 9. April 2014 19:42 > An: hotspot-gc-use > Betreff: Re: ridiculous ParNew pause times > > On 04/09/14 18:40, Cornelius Riemenschneider wrote: >> Bingo! First, i tried to get a tool which shows me which process >> writes to which file and how long that takes, but I was unable to find >> one I could master to use. Based on your suggestion I moved the log.gc >> file to a ramdisk and performed extensive load testing - now my >> biggest outlier is 2014-04-09T18:29:13.873+0200: 383.372: >> [GC2014-04-09T18:29:13.874+0200: 383.372: [ParNew: >> 431041K->208698K(3145728K), 1.8312460 secs] >> 11055599K->11203421K(21495808K), 1.8315130 secs] [Times: user=2.61 >> sys=0.03, real=1.83 secs] which is okay. 
> > Another idea: are your vm.dirty_(expire|writeback)_centisecs and especially vm.dirty_(background)_ratio sysctl settings default, aka ridiculously high? This can result in writeback storms of a huge number of accumulated dirty buffers and is a common reason for periodic stalls, which can take forever when another process issues fsync() at inappropriate moments.
> > Just a guess.
> > -h
>
> _______________________________________________
> hotspot-gc-use mailing list
> hotspot-gc-use at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
>

From vipin.sharma at verizon.com Thu Apr 24 06:17:58 2014
From: vipin.sharma at verizon.com (Sharma, Vipin K)
Date: Thu, 24 Apr 2014 02:17:58 -0400
Subject: NewSize and SurvivorRatio needed with UseAdaptiveSizePolicy ?
Message-ID: <94424C8743C3F7479E3F25C7CC00BD4104B3A5F991@FLDP1LUMXC7V81.us.one.verizon.com>

Hi All,

I am working on an old Java application, and a few days back we observed the application crash due to a memory issue.

Below are the parameters we are using:
-Xms32m -Xmx2048m -Xss1m -XX:+UseParallelGC -XX:NewRatio=2 -XX:NewSize=8m -XX:MaxNewSize=64m -XX:SurvivorRatio=25 -XX:+UseAdaptiveSizePolicy

Looking at these parameters, a few questions come to mind, and I am looking for expert advice:

1. When we are using the UseAdaptiveSizePolicy option, what is the use of NewSize, MaxNewSize, NewRatio and SurvivorRatio? Shall I remove these?

2. Using the above options, are we overriding values set by UseAdaptiveSizePolicy?

Thanks,
Vipin Sharma
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From jon.masamitsu at oracle.com Thu Apr 24 21:31:01 2014
From: jon.masamitsu at oracle.com (Jon Masamitsu)
Date: Thu, 24 Apr 2014 14:31:01 -0700
Subject: NewSize and SurvivorRatio needed with UseAdaptiveSizePolicy ?
In-Reply-To: <94424C8743C3F7479E3F25C7CC00BD4104B3A5F991@FLDP1LUMXC7V81.us.one.verizon.com>
References: <94424C8743C3F7479E3F25C7CC00BD4104B3A5F991@FLDP1LUMXC7V81.us.one.verizon.com>
Message-ID: <53598295.5070200@oracle.com>

On 4/23/14 11:17 PM, Sharma, Vipin K wrote:
>
> Hi All,
>
> I am working on an old Java application, and a few days back we observed
> the application crash due to a memory issue.
>
> Below are the parameters we are using:
>
> -Xms32m -Xmx2048m -Xss1m -XX:+UseParallelGC -XX:NewRatio=2
> -XX:NewSize=8m -XX:MaxNewSize=64m -XX:SurvivorRatio=25
> -XX:+UseAdaptiveSizePolicy
>
> Looking at these parameters, a few questions come to mind, and I am looking
> for expert advice:
>
> 1. When we are using the UseAdaptiveSizePolicy option, what is the use of
> NewSize, MaxNewSize, NewRatio and SurvivorRatio? Shall I remove these?
>
> 2. Using the above options, are we overriding values set by
> UseAdaptiveSizePolicy?
>

UseAdaptiveSizePolicy affects how the heap grows and shrinks. UseAdaptiveSizePolicy grows and shrinks the heap within whatever size it is given for the heap. There are default sizes, but if you have preferred sizes, continue to use them.

Jon

> Thanks,
>
> *Vipin Sharma*
>
>
>
> _______________________________________________
> hotspot-gc-use mailing list
> hotspot-gc-use at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
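As a concrete sketch of that point (the application class name here is just a placeholder, not from Vipin's setup):

  java -Xms32m -Xmx2048m -XX:NewSize=8m -XX:MaxNewSize=64m -XX:+UseAdaptiveSizePolicy -XX:+PrintGCDetails MyApp

With these flags the adaptive policy still runs, but it can only resize the young generation between the 8m and 64m bounds it was handed, and grow or shrink the whole heap between 32m and 2048m. The explicit sizes set the bounds; the policy adapts within them.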
From Denny.Kettwig at werum.com Fri Apr 25 08:23:28 2014
From: Denny.Kettwig at werum.com (Denny Kettwig)
Date: Fri, 25 Apr 2014 08:23:28 +0000
Subject: Full GC Single Thread?
Message-ID: <6175F8C4FE407D4F830EDA25C27A43173B66AC1F@Werum1790.werum.net>

Hey folks,

a simple question I cannot find an answer to:

Is a Full GC a single-threaded operation by default, and if that is the case, will it become multi-threaded by using -XX:+UseParallelOldGC?

We are on JDK 1.6 u22.

Kind Regards

Denny
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From charlie.hunt at oracle.com Fri Apr 25 11:41:17 2014
From: charlie.hunt at oracle.com (charlie hunt)
Date: Fri, 25 Apr 2014 06:41:17 -0500
Subject: Full GC Single Thread?
In-Reply-To: <6175F8C4FE407D4F830EDA25C27A43173B66AC1F@Werum1790.werum.net>
References: <6175F8C4FE407D4F830EDA25C27A43173B66AC1F@Werum1790.werum.net>
Message-ID: <2B9F5993-0FAF-4DD0-B5EE-1FAB9BECF68A@oracle.com>

If you are not specifying a GC as a command line option on JDK 6u22, and your system is identified by HotSpot as a server class machine [1], HotSpot will automatically use ParallelGC as the default GC.

Full GCs in ParallelGC (the portion that collects the old generation) are single-threaded.

If you explicitly set -XX:+UseParallelOldGC, Full GCs will be multi-threaded (both the old generation part and the young generation part).

Fwiw, if you also enable -XX:+PrintGCDetails you can look at the "CPU" info, i.e. the user, sys and real values, and get a sense of the parallelism realized on a given GC. In simplistic terms, the bigger the difference between user time and real time, the greater the parallelism.

hths,

charlie

PS: Stating the obvious, Java 6 is a rather old technology (iirc circa 2006 or 2007). You'll likely get better GC and app performance by moving to a more recent Java 7 update release or Java 8.

[1]: A server class machine is a system that has 2 or more GB of RAM and two or more virtual processors (not CPU sockets; hardware threads, as in the value returned by Runtime.availableProcessors()).

On Apr 25, 2014, at 3:23 AM, Denny Kettwig wrote:
> Hey folks,
>
> a simple question I cannot find an answer to:
>
> Is a Full GC a single-threaded operation by default, and if that is the case, will it become multi-threaded by using -XX:+UseParallelOldGC?
>
> We are on JDK 1.6 u22.
>
> Kind Regards
>
> Denny
>
> _______________________________________________
> hotspot-gc-use mailing list
> hotspot-gc-use at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From vipin.sharma at verizon.com Fri Apr 25 14:08:46 2014
From: vipin.sharma at verizon.com (Sharma, Vipin K)
Date: Fri, 25 Apr 2014 10:08:46 -0400
Subject: NewSize and SurvivorRatio needed with UseAdaptiveSizePolicy ?
In-Reply-To: <53598295.5070200@oracle.com>
References: <94424C8743C3F7479E3F25C7CC00BD4104B3A5F991@FLDP1LUMXC7V81.us.one.verizon.com> <53598295.5070200@oracle.com>
Message-ID: <94424C8743C3F7479E3F25C7CC00BD4104B3A5FC86@FLDP1LUMXC7V81.us.one.verizon.com>

On 4/23/14 11:17 PM, Sharma, Vipin K wrote:

Hi All,

I am working on an old Java application, and a few days back we observed the application crash due to a memory issue.

Below are the parameters we are using:
-Xms32m -Xmx2048m -Xss1m -XX:+UseParallelGC -XX:NewRatio=2 -XX:NewSize=8m -XX:MaxNewSize=64m -XX:SurvivorRatio=25 -XX:+UseAdaptiveSizePolicy

Looking at these parameters, a few questions come to mind, and I am looking for expert advice:

1.
When we are using the UseAdaptiveSizePolicy option, what is the use of NewSize, MaxNewSize, NewRatio and SurvivorRatio? Shall I remove these?

2. Using the above options, are we overriding values set by UseAdaptiveSizePolicy?

Thanks,
Vipin Sharma

On 4/25/2014 3:01 AM, Jon Wrote:

UseAdaptiveSizePolicy affects how the heap grows and shrinks. UseAdaptiveSizePolicy grows and shrinks the heap within whatever size it is given for the heap. There are default sizes, but if you have preferred sizes, continue to use them.

Jon

Hi Jon,

My understanding from your answer is: these options will override values used by UseAdaptiveSizePolicy.

Also, my Xmx is 2048m, MaxNewSize=64m and NewRatio=2; it means the following will be the heap structure when all parts of the heap are allocated their maximum memory:

Young (64 M) + Old (128 M) + rest of the memory only for the Permanent Generation area.

I feel we are not using heap memory efficiently: more than 1.8 GB is allocated for the Permanent Generation only. Is that a correct understanding?

Thanks
Vipin Sharma

From vipin.sharma at verizon.com Tue Apr 29 09:54:06 2014
From: vipin.sharma at verizon.com (Sharma, Vipin K)
Date: Tue, 29 Apr 2014 05:54:06 -0400
Subject: is it correct Heap Structure on basis of given JVM Parameters ?
Message-ID: <94424C8743C3F7479E3F25C7CC00BD4104B3A60142@FLDP1LUMXC7V81.us.one.verizon.com>

Hi All,

In my Java application (JDK 7) the JVM parameters are
-Xms32m -Xmx2048m -Xss1m -XX:+UseParallelGC -XX:NewRatio=2 -XX:NewSize=8m -XX:MaxNewSize=64m -XX:SurvivorRatio=25 -XX:+UseAdaptiveSizePolicy

Xmx2048m : Maximum heap memory is 2GB
MaxNewSize=64m : Maximum young generation size is 64 MB
NewRatio=2 : The old generation size will be double the young generation size, so the maximum old generation size will be 64*2

As per my understanding, the following will be the heap structure when all parts of the heap are allocated their maximum memory:

Young (64 M) + Old (128 M) + rest of the memory only for the Permanent Generation area.

I feel we are not using heap memory efficiently: more than 1.8 GB is allocated for the Permanent Generation only.

Is that a correct understanding? If not, then what will the heap structure be in case we give max memory to all parts?

Thanks,
Vipin Sharma
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From jon.masamitsu at oracle.com Wed Apr 30 02:16:36 2014
From: jon.masamitsu at oracle.com (Jon Masamitsu)
Date: Tue, 29 Apr 2014 19:16:36 -0700
Subject: is it correct Heap Structure on basis of given JVM Parameters ?
In-Reply-To: <94424C8743C3F7479E3F25C7CC00BD4104B3A60142@FLDP1LUMXC7V81.us.one.verizon.com>
References: <94424C8743C3F7479E3F25C7CC00BD4104B3A60142@FLDP1LUMXC7V81.us.one.verizon.com>
Message-ID: <53605D04.2040208@oracle.com>

Vipin,

If you want to see the heap after it has fully expanded, use -Xms2048m (not -Xms32m) and -XX:NewSize=64m (not -XX:NewSize=8m) and add -XX:+PrintGCDetails and -version to your java command line.
You'll see something like this:

$JAVA_HOME/bin/java -XX:+PrintGCDetails -Xms2048m -XX:NewSize=64m -XX:NewRatio=2 -XX:MaxNewSize=64m -XX:SurvivorRatio=25 -Xmx2048m -version
java version "1.7.0"
Java(TM) SE Runtime Environment (build 1.7.0-b147)
Java HotSpot(TM) Server VM (build 21.0-b17, mixed mode)
Heap
 PSYoungGen total 63168K, used 2432K [0xf7c00000, 0xfbc00000, 0xfbc00000)
  eden space 60800K, 4% used [0xf7c00000,0xf7e600d0,0xfb760000)
  from space 2368K, 0% used [0xfb9b0000,0xfb9b0000,0xfbc00000)
  to space 2368K, 0% used [0xfb760000,0xfb760000,0xfb9b0000)
 PSOldGen total 2031616K, used 0K [0x7bc00000, 0xf7c00000, 0xf7c00000)
  object space 2031616K, 0% used [0x7bc00000,0x7bc00000,0xf7c00000)
 PSPermGen total 16384K, used 1356K [0x77c00000, 0x78c00000, 0x7bc00000)
  object space 16384K, 8% used [0x77c00000,0x77d532d0,0x78c00000)

Jon

On 4/29/2014 2:54 AM, Sharma, Vipin K wrote:
>
> Hi All,
>
> In my Java application (JDK 7) the JVM parameters are
>
> -Xms32m -Xmx2048m -Xss1m -XX:+UseParallelGC -XX:NewRatio=2
> -XX:NewSize=8m -XX:MaxNewSize=64m -XX:SurvivorRatio=25
> -XX:+UseAdaptiveSizePolicy
>
> Xmx2048m : Maximum heap memory is 2GB
>
> MaxNewSize=64m : Maximum young generation size is 64 MB
>
> NewRatio=2 : The old generation size will be double
> the young generation size, so the maximum old generation size will be 64*2
>
> As per my understanding, the following will be the heap structure when all parts of
> the heap are allocated their maximum memory:
>
> Young (64 M) + Old (128 M) + rest of the memory only for the Permanent
> Generation area.
>
> I feel we are not using heap memory efficiently: more than 1.8 GB is
> allocated for the Permanent Generation only.
>
> Is that a correct understanding? If not, then what will the heap structure be
> in case we give max memory to all parts?
>
> Thanks,
>
> Vipin Sharma
>
>
>
> _______________________________________________
> hotspot-gc-use mailing list
> hotspot-gc-use at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From Andreas.Mueller at mgm-tp.com Wed Apr 30 11:43:25 2014
From: Andreas.Mueller at mgm-tp.com (Andreas Müller)
Date: Wed, 30 Apr 2014 11:43:25 +0000
Subject: New blog post: Controlling GC pauses with the GarbageFirst Collector
Message-ID: <46FF8393B58AD84D95E444264805D98FBDE1B762@edata01.mgm-edv.de>

Hi all,

My new blog article has just been published here:
http://blog.mgm-tp.com/2014/04/controlling-gc-pauses-with-g1-collector/

Please feel free to comment, either by replying directly or by email.

Best regards / Mit freundlichen Grüßen

Andreas Mueller

mgm technology partners GmbH
Frankfurter Ring 105a
80807 München
Tel. +49 (89) 35 86 80-633
Fax +49 (89) 35 86 80-288
E-Mail Andreas.Mueller at mgm-tp.com
Innovation Implemented.
Registered office: München
Managing director: Hamarz Mehmanesh
Commercial register: AG München HRB 105068
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
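(Judging by the title, the article is about steering G1 with a pause-time goal rather than explicit generation sizes. As a hedged illustration only - the flag is the standard G1 pause target, but the values below are invented for this example and are not taken from the article:

  java -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -Xms4g -Xmx4g MyApp

MaxGCPauseMillis sets the pause target that G1 then tries to meet by sizing the young generation on its own.)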
From christopherhurst at hotmail.com Wed Apr 30 14:58:39 2014
From: christopherhurst at hotmail.com (Chris Hurst)
Date: Wed, 30 Apr 2014 14:58:39 -0000
Subject: Minor GC difference Java 7 vs Java 8
Message-ID:

Hi,

Has anyone seen anything similar to this ...

On a Java 6 (range of versions, 32-bit Solaris) application, using parallel old GC, non-adaptive.

Using a very heavy test performance load, we see minor GCs around the 5ms mark, and some very rare (say 3 or 4 instances in 12 hours) ~20ms pauses. The number of such pauses is random (though always few compared with the total number of GCs) and large, ~20ms (this value appears the same for all such points). We have a large number of minor GCs in our runs, and only a full GC at startup. These freak GCs can be bunched or spread out, and we can run for many hours without one (though still doing minor GCs).

What's odd is that if I use Java 7 (range of versions, 32-bit), the result is very close, but the spikes (1 or 2, arguably fewer) are now 30-40ms (depends on the run; arguably even rarer). Has anyone experienced anything similar? Why would Java 7 up to double a minor GC pause? The GC throughput is approximately the same - arguably 7 has better throughput - but that freak minor GC makes it unusable due to latency. In terms of the change in spike height (20ms (J6) vs 40ms (J7)), this is very reproducible, though the number of points and when they occur varies slightly. The overall GC graph and throughput are similar otherwise, as is the resultant memory dump at the end. The test should be constant load: multiple clients just doing the same thing over and over.

Has anyone seen anything similar? I was hoping someone might have seen a change in defaults, thread timeouts, or a default data structure size change that would account for this. I was hoping the marked increase might be a giveaway to someone, as it's way off our average minor GC time.

We have looked at GC logs, heap dumps, processor activity, background processes, amount of disc access, safepoints, etc. We trace the message rate into and out of the application for variation, compare heap dumps at the end, etc. Nothing stands out so far.

Chris
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
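(A hedged footnote for anyone chasing similar outliers: HotSpot's stock flags for separating GC work from safepoint synchronization are along these lines - the invocation is illustrative only, not the poster's actual command line:

  java -XX:+PrintGCDetails -XX:+PrintGCApplicationStoppedTime -XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1 ...

-XX:+PrintGCApplicationStoppedTime reports the total stopped time per pause, so an outlier that is mostly time-to-safepoint rather than GC work shows up immediately.)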