From monica.beckwith at oracle.com Sat Jun 1 07:50:12 2013 From: monica.beckwith at oracle.com (Monica Beckwith) Date: Sat, 01 Jun 2013 09:50:12 -0500 Subject: Bug in G1GC it performs Full GC when code cache is full resulting in overkill In-Reply-To: References: <3807DE76-D6CA-451F-AC72-771332825905@salesforce.com> <5C892669-3515-4DCC-BB72-96AE89EBE5F8@salesforce.com> <51A63C5F.9040808@oracle.com> <51A7BCC7.4050700@oracle.com> Message-ID: <51AA0A24.2010103@oracle.com> comments inlined - On 5/30/2013 5:32 PM, Darius D. wrote: > Hi, > > Monica, thanks a lot for Your additional insight about H-Objects, as > i've already mentioned in this thread, your great JavaOne presentation > about G1GC was key in solving our problem. > Thanks for your feedback on the slides. I am very glad that you found them useful. This year also (for J1) we are hoping to provide more insight into G1 with some real world case studies (provided the talks get accepted :)). > You are right that it is all about "fitting" H-Object in old gen. In > our case no matter how high (within confines of what server memory > allowed us) we set the heap size, we were still getting H-Obj alloc > failures. Actually even drastically cutting H-Object count we were > still getting some Full GCs, only after we increased region size step > by step to 16M they ceased. It was subtle fragmentation issue, made > worse by rather small default region size. > > The following happened: > > 1) Big web request came in, 1-1.5GB of various allocations were done > to generate that Json string, triggering young gc in progress. > 2) After young gc heap was a bit fragmented cause there was temporary, > but still "live" data. All that data now got into fresh survivor/old > regions. The "health" of heap now really depends on how long ago last > mixed GC ran. > 3) Into such "sprayed" heap we start to allocate our big object. I > have no idea how a set of regions is chosen for humongous region, but > i think we were generating a total of ~30 humongous objects ( > "generating" as in resizing StringBuffer somewhere deep in web > framework till 30M fits) and that was too much for G1GC to cope. > 4) Reducing allocation rate is not enough unfortunately, those small > ones that slip are really dangerous - they are immediately allocated > in OldGen, fragmenting the heap further. > 5) It is now a race between big web requests and next mixed gc. We > could reliably reproduce Full GCs in testing by generating several > well timed big requests :) > > Getting region size up from default (actually I got a question, why > G1GC is aiming for thousands of regions, what are the drawbacks of > larger than "default" region) is more about reducing fragmentation by > keeping those large but temporary objects where they belong - in > nursery where G1GC can collect them efficiently. There are two parts to your question - The default region size for a particular configuration depends on your minimum heap size. You can increase the region size (by power of 2) up-to the max limit of 32M (minimum = 1M). The effect of changing the region size would be that previously H-objs are no longer H-objs. Changing the region size, would result in a change in the entries for Remembered Set (RSet) tables (hence, you may observe a change in the RSet times during a pause). I can think of other things like cross-region references can get affected as well. And the sizes of collection sets will get affected as well. 
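[Illustrative aside: to make the relationship between region size and H-objs concrete, here is a small Java sketch. It is not HotSpot code; the roughly-2048-region target Monica describes next, the power-of-two rounding, the 1M/32M bounds, and the "at least half a region is humongous" rule are taken from this thread, and the helper names and exact rounding step are assumptions for illustration only.]

    // Illustrative sketch only (not the heapRegion.cpp logic): how the default
    // region size and the humongous threshold relate, per the descriptions in
    // this thread. Names and the round-down-to-power-of-two step are assumptions.
    public class G1RegionMath {
        static final long MIN_REGION = 1L << 20;   // 1M lower bound
        static final long MAX_REGION = 32L << 20;  // 32M upper bound

        // Default region size: aim for roughly 2048 regions over the minimum
        // heap, as a power of two, clamped to [1M, 32M].
        static long defaultRegionSize(long minHeapBytes) {
            long size = Math.max(minHeapBytes / 2048, MIN_REGION);
            size = Long.highestOneBit(size);                    // round down to a power of two
            return Math.min(Math.max(size, MIN_REGION), MAX_REGION);
        }

        // An allocation of at least half a region is treated as humongous.
        static boolean isHumongous(long allocBytes, long regionSize) {
            return allocBytes >= regionSize / 2;
        }

        public static void main(String[] args) {
            long region = defaultRegionSize(26L << 30);          // ~26G minimum heap
            System.out.println("region size = " + (region >> 20) + "M");
            System.out.println(isHumongous(32L << 20, region));      // 32 MB object, default regions
            System.out.println(isHumongous(32L << 20, MAX_REGION));  // still humongous at 32M regions
        }
    }

With these numbers a ~26G heap comes out at 8M regions, which is consistent with the "region size 8192K" shown in the GC logs in this thread (assuming -Xms there was also 26G), and a 32 MB allocation remains humongous at every allowed region size.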
Now the first part of your question - The 2048 number of regions is just a means to get a number between 1M and 32M for the region size, the real limits are the max (32M) and min (1M) bounds. As to the max region size - the upper bound of 32M is deliberately introduced, since there is a tradeoff between larger regions and the effectiveness of the 'cleanup' phase of the multi-phased marking cycle. Since cleanup reclaims *completely* free regions and returns them to the free regions list, for larger sized regions, these *completely* free regions may not be that easy to come by. You can read more in heapRegion.cpp. hope this helps. > > So we have 3 important todo and tunables in avoiding H-Object caused > FullGC: > > > 1) Code changes - any profiler that can record every allocation stack > trace will help, set it to record each alloc above 1/2 heap region > size and limit them as much as possible > 2) IHOP should be tuned to allow mixed GCs frequently enough even if > your app is behaving perfectly ( stable old gen + temporaries in > nursery with little promotion going on ). > 3) G1 region size can be increased to reduce heap fragmentation caused > by H-Objects if they are temporary (by reducing them to ordinary > objects allocated in young gen) > > Makes sense. -Monica > Darius. > > > > > > On Thu, May 30, 2013 at 11:55 PM, Monica Beckwith > > wrote: > > +1 to enabling PrintAdaptiveSizePolicy. > Darius, > As you have already mentioned - Any object with size greater or > equal to a half region is called a "humongous" object (H-obj). The > max region size for G1 is 32M. So yes, even if you set your region > size to max, your 32M object will be considered humongous. > Now, there are a couple of things that we should be aware of with > respect to humongous regions (H-region)/objects - > > 1. The H-obj allocation will happen directly into the old generation > 1. There will be a check for marking threshold (IHOP), and a > concurrent cycle will be initiated if necessary > 2. The H-regions are not included in an evacuation pause, since > it's just going to increase the copying expense. > 1. But if the H-obj(s) are dead, they get freed at the end of > the multi-phased concurrent marking cycle. > > So, I think, if you have to work with H-objs and increasing the > region size doesn't help (as is your case), then maybe you should > try limiting your nursery so as to allow more space for the old > generation, so as to sustain your H-objs till they die or if they > are a part of your live data set, then it's all the more necessary > to be able to fit them in your old gen. > > -Monica > > > On 5/30/2013 1:54 PM, Martin Makundi wrote: >> Hi! >> >>> I'd strongly suggest that Martin should add >>> -XX:+PrintAdaptiveSizePolicy to his JVM options. In our case that was >>> what we needed to solve the mystery of FullGCs with gigabytes of heap >>> free. >> Thanks, will add that tomorrow. >> >>> Actually with some minor googling around i've found: >>> >>> https://forums.oracle.com/forums/thread.jspa?messageID=10869877 >>> >>> I suspect it could be same story as ours, "humongous allocation >>> request failed" is bad for JVM health, FullGC will occur immediately. >>> >>> Remember, any allocation that is larger than half of G1GC region size >>> will get allocated as "humongous" object on heap, that does not care >>> about regions etc. In our case we were failing to allocate 32 >>> megabytes with over 50% of heap free! >> Any solution to such problem or is it a bug in g1gc? 
Is there a way to >> log what code is performing the memory allocation if that happens to >> be the case? >> >> ** >> Martin >>> Best regards, >>> >>> Darius. >>> >>> >>> >>> On Wed, May 29, 2013 at 8:35 PM, John Cuthbertson >>> wrote: >>>> Hi Martin, >>>> >>>> I'm going to fill in bit more detail to Charlie's replies.... >>>> >>>> On 5/29/2013 8:49 AM, Martin Makundi wrote: >>>>> Hi! >>>>> >>>>>> A bit of constructive criticism ;-) It would be good practice to set one option at a time and measure its performance to determine whether it improves performance rather than choosing an option because of something you read in text. In short, always measure and reason about whether what you've observed for an improvement or regression makes sense. And, also run multiple times to get a sense of noise versus real improvement or regression. >>>>> Thanks. That's one of the reasons we never changed our options. Once >>>>> we found someting that works very well, we know that its always n! >>>>> work to test changes and the system was running very nice indeed >>>>> before the previous tweak ;) >>>>> >>>>>>>> -XX:+UseFastAccessorMethod (the default is disabled) >>>>>>> Fast sounds good, the description of it is "Use optimized versions of >>>>>>> GetField" which sounds good. I see no harm in this. >>>>>> These would be JNI operations. >>>>>> >>>>>> A quick at the HotSpot source suggests UseFastAccessorMethods >>>>>> is mostly confined to interpreter operations. >>>>> Thanks for the info. Doesn't say much to me, but does not seem to harm >>>>> anything. Will try setting it off at some point in time. >>>>> >>>>>>>> -XX:+UseNUMA (Are you running a JVM that spans NUMA memory nodes? >>>>>>>> Or, do you have multiple JVMs running on a NUMA or non-NUMA system?) >>>>>> Key point is that you should use -XX:+UseNUMA only when you are >>>>>> deploying a JVM that spans NUMA nodes. >>>>> Thanks for the info. Doesn't say much to me, but does not seem to harm >>>>> anything. Will try setting it off at some point in time. >>>> The fast accessor methods flag creates specialized (i.e. short and >>>> optimized) interpreter entry points for accessor methods (those that >>>> just return the value in one of the object's fields). In most >>>> applications the bulk of the execution time is spent executing JIT >>>> compiled code; only a few percent is typically spent in Hotspot's >>>> interpreter. The JIT compiler will always try to inline accessor methods >>>> into their caller. So, unless your application is spending a ton of time >>>> interpreting, this flag should make no difference. >>>> >>>>>>>> -XX:CMSInitiatingOccupancyFraction=70 (This is applicable to CMS GC, and not applicable to G1 GC) >>>>>>> Again, not documented thoroughly where it applies and where not, jvm >>>>>>> gave no warning/error about it so we assumed it's valid. >>>>>> There's always the HotSpot source code ;-) >>>>>> >>>>>> It's also quite well documented in various slide ware on the internet. >>>>>> It's also quite well documented in the Java Performance book. :-) >>>>> Uh.. does it say somewhere that Do not use >>>>> XX:CMSInitiatingOccupancyFraction with G1GC? ;) I know performance >>>>> tuning is your bread and butter but is not ours... is more like we are >>>>> just driving the car and you are the mechanic...different >>>>> perspective.. just trying to fill'er'up to go..leaded or unleaded... >>>>> ;) >>>> The G1 equivalent of this flag is InitiatingHeapOccupancyPercent (IHOP >>>> for short). Actually both G1 and CMS accept and observe IHOP. 
>>>> CMSInitiatingOccupancyFraction was superseded by IHOP. The JVM still >>>> accepts the old flag name - but it is CMS only and doesn't affect G1. >>>> >>>>>>>> -XX:InitiatingHeapOccupancyPercent=0 (You realize this will force G1's concurrent cycle to run continuously?) >>>>>>> Yes, that's what we figured out, we don't want it to sit lazy and end >>>>>>> up in a situation where it is required to do a Full GC. This switch >>>>>>> was specifically chosen in a situation we had a memory leak and tried >>>>>>> to aggressively fight against it before we found the root cause. Maybe >>>>>>> we should try without this switch now, and see what effect it has. >>>>>> Having GC logs to see what available head room you have between the >>>>>> initiating of a G1 concurrent cycle and available regions / heap space would be most appropriate. >>>>> Hmm.. I don't thoroughly understand the logs either, but, here is a snap: >>>>> >>>>> 2013-05-29T17:28:56.119+0300: 38905.407: [GC pause (young), 0.29261000 secs] >>>>> [Parallel Time: 288.8 ms] >>>>> [GC Worker Start (ms): 38905407.6 38905407.7 38905407.7 38905407.9 >>>>> Avg: 38905407.7, Min: 38905407.6, Max: 38905407.9, Diff: 0.3] >>>>> [Ext Root Scanning (ms): 22.8 16.3 18.1 22.0 >>>>> Avg: 19.8, Min: 16.3, Max: 22.8, Diff: 6.6] >>>>> [SATB Filtering (ms): 0.0 0.1 0.0 0.0 >>>>> Avg: 0.0, Min: 0.0, Max: 0.1, Diff: 0.1] >>>>> [Update RS (ms): 31.9 37.3 35.1 33.3 >>>>> Avg: 34.4, Min: 31.9, Max: 37.3, Diff: 5.5] >>>>> [Processed Buffers : 102 106 119 104 >>>>> Sum: 431, Avg: 107, Min: 102, Max: 119, Diff: 17] >>>>> [Scan RS (ms): 0.0 0.0 0.1 0.0 >>>>> Avg: 0.0, Min: 0.0, Max: 0.1, Diff: 0.1] >>>>> [Object Copy (ms): 228.2 229.1 229.5 227.3 >>>>> Avg: 228.5, Min: 227.3, Max: 229.5, Diff: 2.2] >>>>> [Termination (ms): 0.0 0.0 0.0 0.0 >>>>> Avg: 0.0, Min: 0.0, Max: 0.0, Diff: 0.0] >>>>> [Termination Attempts : 4 1 11 4 >>>>> Sum: 20, Avg: 5, Min: 1, Max: 11, Diff: 10] >>>>> [GC Worker End (ms): 38905690.5 38905690.5 38905690.5 38905690.5 >>>>> Avg: 38905690.5, Min: 38905690.5, Max: 38905690.5, Diff: 0.0] >>>>> [GC Worker (ms): 282.9 282.8 282.8 282.6 >>>>> Avg: 282.8, Min: 282.6, Max: 282.9, Diff: 0.3] >>>>> [GC Worker Other (ms): 5.9 6.0 6.0 6.2 >>>>> Avg: 6.0, Min: 5.9, Max: 6.2, Diff: 0.3] >>>>> [Complete CSet Marking: 0.0 ms] >>>>> [Clear CT: 0.1 ms] >>>>> [Other: 3.7 ms] >>>>> [Choose CSet: 0.0 ms] >>>>> [Ref Proc: 2.8 ms] >>>>> [Ref Enq: 0.1 ms] >>>>> [Free CSet: 0.3 ms] >>>>> [Eden: 48M(5032M)->0B(5032M) Survivors: 288M->288M Heap: >>>>> 15790M(26624M)->15741M(26624M)] >>>>> [Times: user=1.14 sys=0.00, real=0.29 secs] >>>>> Heap after GC invocations=575 (full 157): >>>>> garbage-first heap total 27262976K, used 16119181K >>>>> [0x0000000160000000, 0x00000007e0000000, 0x00000007e0000000) >>>>> region size 8192K, 36 young (294912K), 36 survivors (294912K) >>>>> compacting perm gen total 524288K, used 164479K [0x00000007e0000000, >>>>> 0x0000000800000000, 0x0000000800000000) >>>>> the space 524288K, 31% used [0x00000007e0000000, >>>>> 0x00000007ea09ff68, 0x00000007ea0a0000, 0x0000000800000000) >>>>> No shared spaces configured. 
>>>>> } >>>>> {Heap before GC invocations=575 (full 157): >>>>> garbage-first heap total 27262976K, used 16119181K >>>>> [0x0000000160000000, 0x00000007e0000000, 0x00000007e0000000) >>>>> region size 8192K, 37 young (303104K), 36 survivors (294912K) >>>>> compacting perm gen total 524288K, used 164479K [0x00000007e0000000, >>>>> 0x0000000800000000, 0x0000000800000000) >>>>> the space 524288K, 31% used [0x00000007e0000000, >>>>> 0x00000007ea09ff68, 0x00000007ea0a0000, 0x0000000800000000) >>>>> No shared spaces configured. >>>>> 2013-05-29T17:28:56.413+0300: 38905.701: [Full GC >>>>> 15742M->14497M(26624M), 56.7731320 secs] >>>>> >>>>> That's the third Full GC today after the change to 26G and change from >>>>> occupancypercent=0. Tomorrow will be trying again with >>>>> occupancypercent=0 >>>> What did you set the IHOP value to? >>>> >>>>>>>> -noclassgc (This is rarely needed and haven't seen an app that required it for quite some time) >>>>>>> Jvm 1.6 stopped the world for couple of minutes several times per day >>>>>>> while unloading classes, so we used noclassgc to disable that. We do >>>>>>> not know if this is necessary for latest 1.7 to avoid class unload >>>>>>> pause, but we continued to use this switch and found no harm in it. >>>>>>> Can't afford testing that in production ;) >>>>>> Haven't seen a case where unloading classes cause a several minute pause. >>>>>> Are you sure your system is not swapping? And, do you have GC logs you >>>>>> can share that illustrate the behavior and that -noclassgc fixed it? >>>>> We deleted swap partition long time ago, we simply do not risk swapping at all. >>>>> >>>>> We had this class unloading problem several times per day like half a >>>>> year ago, and fixed it with noclasssgc, that was a no-brainer, single >>>>> parameter that made the difference. >>>>> >>>>> It is also discussed here (they do not discuss noclassgc though, we >>>>> figured that out somehow) >>>>> http://stackoverflow.com/questions/2833983/meaning-of-the-unloading-class-messages >>>> G1 only performs class unloading during a full GC. But if you're not >>>> running out of perm space or compiled code cache - you can leave this flag. >>>> >>>>>>>> -XX:+ UseGCOverheadLimit >>>>>>>> -XX:ReservedCodeCacheSize=48, that is the default for 7u21. You might consider setting it higher if you have the available space, and more importantly if you think you're running out of code space. >>>>>>> For our sun jvm linux 64bit 48m is maximum, jvm won't start if higher value. >>>>>> If you can't go larger than -XX:ReservedCodeCacheSize=48m, that may suggest you have memory constraints and may also suggest you don't have enough swap space defined, and you may be experiencing swapping during JVM execution. I've got a Linux system that has 32 GB of RAM, I can set ReservedCodeCacheSize=256m with no issues, even with -Xms30g and -Xms30g. >>>>> It is also documented that 48m is maximum >>>>> http://www.oracle.com/technetwork/java/javase/tech/vmoptions-jsp-140102.html >>>>> "maximum code cache size. [Solaris 64-bit, amd64, and -server x86: >>>>> 48m" >>>>> >>>>> >>>> That's the default max code cache size. When the JIT compiler compiles a >>>> Java method it places the generated code into the code cache. When >>>> there's no more room in the code cache, a warning is issued and JIT >>>> compilation is stopped. You can set it higher. IIRC there was time in >>>> the past when the size was limited in order to use short branches in >>>> compiled code. I don't think we've had that restriction for a while. 
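[Illustrative aside: since the thread subject is the code cache filling up, one way to check whether you are actually approaching the ReservedCodeCacheSize limit, before raising it, is to read the "Code Cache" memory pool through the standard management API. A minimal sketch; the pool name "Code Cache" is what HotSpot 6/7 reports, other JVMs or later versions may name it differently.]

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryPoolMXBean;
    import java.lang.management.MemoryUsage;

    // Minimal sketch: report JIT code cache occupancy so you can tell whether
    // the code-cache-full warning John mentions is actually in play.
    // Assumes the HotSpot memory pool name "Code Cache".
    public class CodeCacheUsage {
        public static void main(String[] args) {
            for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
                if ("Code Cache".equals(pool.getName())) {
                    MemoryUsage u = pool.getUsage();
                    System.out.printf("Code Cache: %d KB used of %d KB max%n",
                            u.getUsed() / 1024, u.getMax() / 1024);
                }
            }
        }
    }

The same pool should also be visible over JMX (for example in JConsole or VisualVM under Memory Pools), which avoids adding any code to the application.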
>>>> >>>> HTHs >>>> >>>> JohnC >>>> _______________________________________________ >>>> hotspot-gc-use mailing list >>>> hotspot-gc-use at openjdk.java.net >>>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>> _______________________________________________ >>> hotspot-gc-use mailing list >>> hotspot-gc-use at openjdk.java.net >>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > -- > Monica Beckwith | Principal Member of Technical Staff > VOIP: +15124011274 > Oracle Java Performance > > Oracle is committed to developing practices and products that help protect the environment > > > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use -- Monica Beckwith | Principal Member of Technical Staff VOIP: +15124011274 Oracle Java Performance Oracle is committed to developing practices and products that help protect the environment From chunt at salesforce.com Sun Jun 2 08:42:17 2013 From: chunt at salesforce.com (Charlie Hunt) Date: Sun, 2 Jun 2013 10:42:17 -0500 Subject: Bug in G1GC it performs Full GC when code cache is full resulting in overkill In-Reply-To: <51AA0A24.2010103@oracle.com> References: <3807DE76-D6CA-451F-AC72-771332825905@salesforce.com> <5C892669-3515-4DCC-BB72-96AE89EBE5F8@salesforce.com> <51A63C5F.9040808@oracle.com> <51A7BCC7.4050700@oracle.com> <51AA0A24.2010103@oracle.com> Message-ID: <0014A986-1651-44C6-8D0D-B77E200C18D6@salesforce.com> Monica / John: thanks for jumping in, much appreciated! I'm kinda swamped with other stuff at the moment. On Jun 1, 2013, at 9:50 AM, Monica Beckwith wrote: > comments inlined - > On 5/30/2013 5:32 PM, Darius D. wrote: >> Hi, >> >> Monica, thanks a lot for Your additional insight about H-Objects, as i've already mentioned in this thread, your great JavaOne presentation about G1GC was key in solving our problem. >> > Thanks for your feedback on the slides. I am very glad that you found them useful.
This year also (for J1) we are hoping to provide more insight into G1 with some real world case studies (provided the talks get accepted :)). >> You are right that it is all about "fitting" H-Object in old gen. In our case no matter how high (within confines of what server memory allowed us) we set the heap size, we were still getting H-Obj alloc failures. Actually even drastically cutting H-Object count we were still getting some Full GCs, only after we increased region size step by step to 16M they ceased. It was subtle fragmentation issue, made worse by rather small default region size. >> >> The following happened: >> >> 1) Big web request came in, 1-1.5GB of various allocations were done to generate that Json string, triggering young gc in progress. >> 2) After young gc heap was a bit fragmented cause there was temporary, but still "live" data. All that data now got into fresh survivor/old regions. The "health" of heap now really depends on how long ago last mixed GC ran. >> 3) Into such "sprayed" heap we start to allocate our big object. I have no idea how a set of regions is chosen for humongous region, but i think we were generating a total of ~30 humongous objects ( "generating" as in resizing StringBuffer somewhere deep in web framework till 30M fits) and that was too much for G1GC to cope. >> 4) Reducing allocation rate is not enough unfortunately, those small ones that slip are really dangerous - they are immediately allocated in OldGen, fragmenting the heap further. >> 5) It is now a race between big web requests and next mixed gc. We could reliably reproduce Full GCs in testing by generating several well timed big requests :) >> >> Getting region size up from default (actually I got a question, why G1GC is aiming for thousands of regions, what are the drawbacks of larger than "default" region) is more about reducing fragmentation by keeping those large but temporary objects where they belong - in nursery where G1GC can collect them efficiently. > There are two parts to your question - > The default region size for a particular configuration depends on your minimum heap size. You can increase the region size (by power of 2) up-to the max limit of 32M (minimum = 1M). The effect of changing the region size would be that previously H-objs are no longer H-objs. Changing the region size, would result in a change in the entries for Remembered Set (RSet) tables (hence, you may observe a change in the RSet times during a pause). I can think of other things like cross-region references can get affected as well. And the sizes of collection sets will get affected as well. > > Now the first part of your question - The 2048 number of regions is just a means to get a number between 1M and 32M for the region size, the real limits are the max (32M) and min (1M) bounds. As to the max region size - the upper bound of 32M is deliberately introduced, since there is a tradeoff between larger regions and the effectiveness of the 'cleanup' phase of the multi-phased marking cycle. Since cleanup reclaims *completely* free regions and returns them to the free regions list, for larger sized regions, these *completely* free regions may not be that easy to come by. > You can read more in heapRegion.cpp. > > hope this helps. 
> >> >> So we have 3 important todo and tunables in avoiding H-Object caused FullGC: >> >> >> 1) Code changes - any profiler that can record every allocation stack trace will help, set it to record each alloc above 1/2 heap region size and limit them as much as possible >> 2) IHOP should be tuned to allow mixed GCs frequently enough even if your app is behaving perfectly ( stable old gen + temporaries in nursery with little promotion going on ). >> 3) G1 region size can be increased to reduce heap fragmentation caused by H-Objects if they are temporary (by reducing them to ordinary objects allocated in young gen) >> >> > Makes sense. > > -Monica > >> Darius. >> >> >> >> >> >> On Thu, May 30, 2013 at 11:55 PM, Monica Beckwith wrote: >> +1 to enabling PrintAdaptiveSizePolicy. >> Darius, >> As you have already mentioned - Any object with size greater or equal to a half region is called a "humongous" object (H-obj). The max region size for G1 is 32M. So yes, even if you set your region size to max, your 32M object will be considered humongous. >> Now, there are a couple of things that we should be aware of with respect to humongous regions (H-region)/objects - >> The H-obj allocation will happen directly into the old generation >> There will be a check for marking threshold (IHOP), and a concurrent cycle will be initiated if necessary >> The H-regions are not included in an evacuation pause, since it's just going to increase the copying expense. >> But if the H-obj(s) are dead, they get freed at the end of the multi-phased concurrent marking cycle. >> So, I think, if you have to work with H-objs and increasing the region size doesn't help (as is your case), then maybe you should try limiting your nursery so as to allow more space for the old generation, so as to sustain your H-objs till they die or if they are a part of your live data set, then it's all the more necessary to be able to fit them in your old gen. >> >> -Monica >> >> >> On 5/30/2013 1:54 PM, Martin Makundi wrote: >>> Hi! >>> >>>> I'd strongly suggest that Martin should add >>>> -XX:+PrintAdaptiveSizePolicy to his JVM options. In our case that was >>>> what we needed to solve the mystery of FullGCs with gigabytes of heap >>>> free. >>> Thanks, will add that tomorrow. >>> >>>> Actually with some minor googling around i've found: >>>> >>>> https://forums.oracle.com/forums/thread.jspa?messageID=10869877 >>>> >>>> I suspect it could be same story as ours, "humongous allocation >>>> request failed" is bad for JVM health, FullGC will occur immediately. >>>> >>>> Remember, any allocation that is larger than half of G1GC region size >>>> will get allocated as "humongous" object on heap, that does not care >>>> about regions etc. In our case we were failing to allocate 32 >>>> megabytes with over 50% of heap free! >>> Any solution to such problem or is it a bug in g1gc? Is there a way to >>> log what code is performing the memory allocation if that happens to >>> be the case? >>> >>> ** >>> Martin >>>> Best regards, >>>> >>>> Darius. >>>> >>>> >>>> >>>> On Wed, May 29, 2013 at 8:35 PM, John Cuthbertson >>>> wrote: >>>>> Hi Martin, >>>>> >>>>> I'm going to fill in bit more detail to Charlie's replies.... >>>>> >>>>> On 5/29/2013 8:49 AM, Martin Makundi wrote: >>>>>> Hi! >>>>>> >>>>>>> A bit of constructive criticism ;-) It would be good practice to set one option at a time and measure its performance to determine whether it improves performance rather than choosing an option because of something you read in text. 
In short, always measure and reason about whether what you've observed for an improvement or regression makes sense. And, also run multiple times to get a sense of noise versus real improvement or regression. >>>>>> Thanks. That's one of the reasons we never changed our options. Once >>>>>> we found someting that works very well, we know that its always n! >>>>>> work to test changes and the system was running very nice indeed >>>>>> before the previous tweak ;) >>>>>> >>>>>>>>> -XX:+UseFastAccessorMethod (the default is disabled) >>>>>>>> Fast sounds good, the description of it is "Use optimized versions of >>>>>>>> GetField" which sounds good. I see no harm in this. >>>>>>> These would be JNI operations. >>>>>>> >>>>>>> A quick at the HotSpot source suggests UseFastAccessorMethods >>>>>>> is mostly confined to interpreter operations. >>>>>> Thanks for the info. Doesn't say much to me, but does not seem to harm >>>>>> anything. Will try setting it off at some point in time. >>>>>> >>>>>>>>> -XX:+UseNUMA (Are you running a JVM that spans NUMA memory nodes? >>>>>>>>> Or, do you have multiple JVMs running on a NUMA or non-NUMA system?) >>>>>>> Key point is that you should use -XX:+UseNUMA only when you are >>>>>>> deploying a JVM that spans NUMA nodes. >>>>>> Thanks for the info. Doesn't say much to me, but does not seem to harm >>>>>> anything. Will try setting it off at some point in time. >>>>> The fast accessor methods flag creates specialized (i.e. short and >>>>> optimized) interpreter entry points for accessor methods (those that >>>>> just return the value in one of the object's fields). In most >>>>> applications the bulk of the execution time is spent executing JIT >>>>> compiled code; only a few percent is typically spent in Hotspot's >>>>> interpreter. The JIT compiler will always try to inline accessor methods >>>>> into their caller. So, unless your application is spending a ton of time >>>>> interpreting, this flag should make no difference. >>>>> >>>>>>>>> -XX:CMSInitiatingOccupancyFraction=70 (This is applicable to CMS GC, and not applicable to G1 GC) >>>>>>>> Again, not documented thoroughly where it applies and where not, jvm >>>>>>>> gave no warning/error about it so we assumed it's valid. >>>>>>> There's always the HotSpot source code ;-) >>>>>>> >>>>>>> It's also quite well documented in various slide ware on the internet. >>>>>>> It's also quite well documented in the Java Performance book. :-) >>>>>> Uh.. does it say somewhere that Do not use >>>>>> XX:CMSInitiatingOccupancyFraction with G1GC? ;) I know performance >>>>>> tuning is your bread and butter but is not ours... is more like we are >>>>>> just driving the car and you are the mechanic...different >>>>>> perspective.. just trying to fill'er'up to go..leaded or unleaded... >>>>>> ;) >>>>> The G1 equivalent of this flag is InitiatingHeapOccupancyPercent (IHOP >>>>> for short). Actually both G1 and CMS accept and observe IHOP. >>>>> CMSInitiatingOccupancyFraction was superseded by IHOP. The JVM still >>>>> accepts the old flag name - but it is CMS only and doesn't affect G1. >>>>> >>>>>>>>> -XX:InitiatingHeapOccupancyPercent=0 (You realize this will force G1's concurrent cycle to run continuously?) >>>>>>>> Yes, that's what we figured out, we don't want it to sit lazy and end >>>>>>>> up in a situation where it is required to do a Full GC. This switch >>>>>>>> was specifically chosen in a situation we had a memory leak and tried >>>>>>>> to aggressively fight against it before we found the root cause. 
Maybe >>>>>>>> we should try without this switch now, and see what effect it has. >>>>>>> Having GC logs to see what available head room you have between the >>>>>>> initiating of a G1 concurrent cycle and available regions / heap space would be most appropriate. >>>>>> Hmm.. I don't thoroughly understand the logs either, but, here is a snap: >>>>>> >>>>>> 2013-05-29T17:28:56.119+0300: 38905.407: [GC pause (young), 0.29261000 secs] >>>>>> [Parallel Time: 288.8 ms] >>>>>> [GC Worker Start (ms): 38905407.6 38905407.7 38905407.7 38905407.9 >>>>>> Avg: 38905407.7, Min: 38905407.6, Max: 38905407.9, Diff: 0.3] >>>>>> [Ext Root Scanning (ms): 22.8 16.3 18.1 22.0 >>>>>> Avg: 19.8, Min: 16.3, Max: 22.8, Diff: 6.6] >>>>>> [SATB Filtering (ms): 0.0 0.1 0.0 0.0 >>>>>> Avg: 0.0, Min: 0.0, Max: 0.1, Diff: 0.1] >>>>>> [Update RS (ms): 31.9 37.3 35.1 33.3 >>>>>> Avg: 34.4, Min: 31.9, Max: 37.3, Diff: 5.5] >>>>>> [Processed Buffers : 102 106 119 104 >>>>>> Sum: 431, Avg: 107, Min: 102, Max: 119, Diff: 17] >>>>>> [Scan RS (ms): 0.0 0.0 0.1 0.0 >>>>>> Avg: 0.0, Min: 0.0, Max: 0.1, Diff: 0.1] >>>>>> [Object Copy (ms): 228.2 229.1 229.5 227.3 >>>>>> Avg: 228.5, Min: 227.3, Max: 229.5, Diff: 2.2] >>>>>> [Termination (ms): 0.0 0.0 0.0 0.0 >>>>>> Avg: 0.0, Min: 0.0, Max: 0.0, Diff: 0.0] >>>>>> [Termination Attempts : 4 1 11 4 >>>>>> Sum: 20, Avg: 5, Min: 1, Max: 11, Diff: 10] >>>>>> [GC Worker End (ms): 38905690.5 38905690.5 38905690.5 38905690.5 >>>>>> Avg: 38905690.5, Min: 38905690.5, Max: 38905690.5, Diff: 0.0] >>>>>> [GC Worker (ms): 282.9 282.8 282.8 282.6 >>>>>> Avg: 282.8, Min: 282.6, Max: 282.9, Diff: 0.3] >>>>>> [GC Worker Other (ms): 5.9 6.0 6.0 6.2 >>>>>> Avg: 6.0, Min: 5.9, Max: 6.2, Diff: 0.3] >>>>>> [Complete CSet Marking: 0.0 ms] >>>>>> [Clear CT: 0.1 ms] >>>>>> [Other: 3.7 ms] >>>>>> [Choose CSet: 0.0 ms] >>>>>> [Ref Proc: 2.8 ms] >>>>>> [Ref Enq: 0.1 ms] >>>>>> [Free CSet: 0.3 ms] >>>>>> [Eden: 48M(5032M)->0B(5032M) Survivors: 288M->288M Heap: >>>>>> 15790M(26624M)->15741M(26624M)] >>>>>> [Times: user=1.14 sys=0.00, real=0.29 secs] >>>>>> Heap after GC invocations=575 (full 157): >>>>>> garbage-first heap total 27262976K, used 16119181K >>>>>> [0x0000000160000000, 0x00000007e0000000, 0x00000007e0000000) >>>>>> region size 8192K, 36 young (294912K), 36 survivors (294912K) >>>>>> compacting perm gen total 524288K, used 164479K [0x00000007e0000000, >>>>>> 0x0000000800000000, 0x0000000800000000) >>>>>> the space 524288K, 31% used [0x00000007e0000000, >>>>>> 0x00000007ea09ff68, 0x00000007ea0a0000, 0x0000000800000000) >>>>>> No shared spaces configured. >>>>>> } >>>>>> {Heap before GC invocations=575 (full 157): >>>>>> garbage-first heap total 27262976K, used 16119181K >>>>>> [0x0000000160000000, 0x00000007e0000000, 0x00000007e0000000) >>>>>> region size 8192K, 37 young (303104K), 36 survivors (294912K) >>>>>> compacting perm gen total 524288K, used 164479K [0x00000007e0000000, >>>>>> 0x0000000800000000, 0x0000000800000000) >>>>>> the space 524288K, 31% used [0x00000007e0000000, >>>>>> 0x00000007ea09ff68, 0x00000007ea0a0000, 0x0000000800000000) >>>>>> No shared spaces configured. >>>>>> 2013-05-29T17:28:56.413+0300: 38905.701: [Full GC >>>>>> 15742M->14497M(26624M), 56.7731320 secs] >>>>>> >>>>>> That's the third Full GC today after the change to 26G and change from >>>>>> occupancypercent=0. Tomorrow will be trying again with >>>>>> occupancypercent=0 >>>>> What did you set the IHOP value to? 
>>>>> >>>>>>>>> -noclassgc (This is rarely needed and haven't seen an app that required it for quite some time) >>>>>>>> Jvm 1.6 stopped the world for couple of minutes several times per day >>>>>>>> while unloading classes, so we used noclassgc to disable that. We do >>>>>>>> not know if this is necessary for latest 1.7 to avoid class unload >>>>>>>> pause, but we continued to use this switch and found no harm in it. >>>>>>>> Can't afford testing that in production ;) >>>>>>> Haven't seen a case where unloading classes cause a several minute pause. >>>>>>> Are you sure your system is not swapping? And, do you have GC logs you >>>>>>> can share that illustrate the behavior and that -noclassgc fixed it? >>>>>> We deleted swap partition long time ago, we simply do not risk swapping at all. >>>>>> >>>>>> We had this class unloading problem several times per day like half a >>>>>> year ago, and fixed it with noclasssgc, that was a no-brainer, single >>>>>> parameter that made the difference. >>>>>> >>>>>> It is also discussed here (they do not discuss noclassgc though, we >>>>>> figured that out somehow) >>>>>> http://stackoverflow.com/questions/2833983/meaning-of-the-unloading-class-messages >>>>> G1 only performs class unloading during a full GC. But if you're not >>>>> running out of perm space or compiled code cache - you can leave this flag. >>>>> >>>>>>>>> -XX:+ UseGCOverheadLimit >>>>>>>>> -XX:ReservedCodeCacheSize=48, that is the default for 7u21. You might consider setting it higher if you have the available space, and more importantly if you think you're running out of code space. >>>>>>>> For our sun jvm linux 64bit 48m is maximum, jvm won't start if higher value. >>>>>>> If you can't go larger than -XX:ReservedCodeCacheSize=48m, that may suggest you have memory constraints and may also suggest you don't have enough swap space defined, and you may be experiencing swapping during JVM execution. I've got a Linux system that has 32 GB of RAM, I can set ReservedCodeCacheSize=256m with no issues, even with -Xms30g and -Xms30g. >>>>>> It is also documented that 48m is maximum >>>>>> http://www.oracle.com/technetwork/java/javase/tech/vmoptions-jsp-140102.html >>>>>> "maximum code cache size. [Solaris 64-bit, amd64, and -server x86: >>>>>> 48m" >>>>>> >>>>>> >>>>> That's the default max code cache size. When the JIT compiler compiles a >>>>> Java method it places the generated code into the code cache. When >>>>> there's no more room in the code cache, a warning is issued and JIT >>>>> compilation is stopped. You can set it higher. IIRC there was time in >>>>> the past when the size was limited in order to use short branches in >>>>> compiled code. I don't think we've had that restriction for a while. 
>>>>> >>>>> HTHs >>>>> >>>>> JohnC >>>>> _______________________________________________ >>>>> hotspot-gc-use mailing list >>>>> hotspot-gc-use at openjdk.java.net >>>>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>>> _______________________________________________ >>>> hotspot-gc-use mailing list >>>> hotspot-gc-use at openjdk.java.net >>>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>> _______________________________________________ >>> hotspot-gc-use mailing list >>> hotspot-gc-use at openjdk.java.net >>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> -- >> >> Monica Beckwith | Principal Member of Technical Staff >> VOIP: +15124011274 >> Oracle Java Performance >> >> Oracle is committed to developing practices and products that help protect the environment >> >> >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > -- > > Monica Beckwith | Principal Member of Technical Staff > VOIP: +15124011274 > Oracle Java Performance > > Oracle is committed to developing practices and products that help protect the environment > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use From martin.makundi at koodaripalvelut.com Sun Jun 2 19:10:39 2013 From: martin.makundi at koodaripalvelut.com (Martin Makundi) Date: Mon, 3 Jun 2013 05:10:39 +0300 Subject: Bug in G1GC it performs Full GC when code cache is full resulting in overkill In-Reply-To: <0014A986-1651-44C6-8D0D-B77E200C18D6@salesforce.com> References: <3807DE76-D6CA-451F-AC72-771332825905@salesforce.com> <5C892669-3515-4DCC-BB72-96AE89EBE5F8@salesforce.com> <51A63C5F.9040808@oracle.com> <51A7BCC7.4050700@oracle.com> <51AA0A24.2010103@oracle.com> <0014A986-1651-44C6-8D0D-B77E200C18D6@salesforce.com> Message-ID: Hi! We seem to be suffering from humongous and something else too... I will attempt to investigate how the cause can be captured to see if they can be made smaller. Why does it need to do a stop the world instead of just making space for the items that need to be placed? 1.
Here some humongous: 35747.355: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: humongous allocation request failed, allocation request: 47906832 bytes] 35747.355: [G1Ergonomics (Heap Sizing) expand the heap, requested expansion amount: 50331648 bytes, attempted expansion amount: 50331648 bytes] 35747.355: [G1Ergonomics (Heap Sizing) did not expand the heap, reason: heap expansion operation failed] 35747.356: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: humongous allocation request failed, allocation request: 47906832 bytes] 35747.356: [G1Ergonomics (Heap Sizing) expand the heap, requested expansion amount: 50331648 bytes, attempted expansion amount: 50331648 bytes] 35747.356: [G1Ergonomics (Heap Sizing) did not expand the heap, reason: heap expansion operation failed] 35747.356: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: allocation request failed, allocation request: 47906832 bytes] 35747.356: [G1Ergonomics (Heap Sizing) expand the heap, requested expansion amount: 47906832 bytes, attempted expansion amount: 50331648 bytes] 35747.356: [G1Ergonomics (Heap Sizing) did not expand the heap, reason: heap expansion operation failed] {Heap before GC invocations=729 (full 242): garbage-first heap total 27262976K, used 18459177K [0x0000000160000000, 0x00000007e0000000, 0x00000007e0000000) region size 8192K, 30 young (245760K), 30 survivors (245760K) compacting perm gen total 524288K, used 171121K [0x00000007e0000000, 0x0000000800000000, 0x0000000800000000) the space 524288K, 32% used [0x00000007e0000000, 0x00000007ea71c778, 0x00000007ea71c800, 0x0000000800000000) No shared spaces configured. 2013-05-31T16:01:01.323+0300: 35747.356: [Full GC2013-05-31T16:01:01.857+0300: 35747.890: [GC concurrent-mark-end, 51.0519450 sec] 2. Here memory is simply full: 44577.340: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: allocation request failed, allocation request: 80 bytes] 44577.340: [G1Ergonomics (Heap Sizing) expand the heap, requested expansion amount: 8388608 bytes, attempted expansion amount: 8388608 bytes] 44577.340: [G1Ergonomics (Heap Sizing) did not expand the heap, reason: heap expansion operation failed] {Heap before GC invocations=845 (full 281): garbage-first heap total 27262976K, used 26896156K [0x0000000160000000, 0x00000007e0000000, 0x00000007e0000000) region size 8192K, 0 young (0K), 0 survivors (0K) compacting perm gen total 524288K, used 173165K [0x00000007e0000000, 0x0000000800000000, 0x0000000800000000) the space 524288K, 33% used [0x00000007e0000000, 0x00000007ea91b5d8, 0x00000007ea91b600, 0x0000000800000000) No shared spaces configured. 2013-05-31T18:28:11.307+0300: 44577.340: [Full GC 26265M->12778M(26624M), 51.0529510 secs] ** Martin 2013/6/2 Charlie Hunt : > Monica / John: thanks for jumping in, much appreciated! I'm kinda swamped > with other stuff at the moment. > > On Jun 1, 2013, at 9:50 AM, Monica Beckwith wrote: > > comments inlined - > On 5/30/2013 5:32 PM, Darius D. wrote: > > Hi, > > Monica, thanks a lot for Your additional insight about H-Objects, as i've > already mentioned in this thread, your great JavaOne presentation about G1GC > was key in solving our problem. > > Thanks for your feedback on the slides. I am very glad that you found them > useful. This year also (for J1) we are hoping to provide more insight into > G1 with some real world case studies (provided the talks get accepted :)). > > You are right that it is all about "fitting" H-Object in old gen. 
In our > case no matter how high (within confines of what server memory allowed us) > we set the heap size, we were still getting H-Obj alloc failures. Actually > even drastically cutting H-Object count we were still getting some Full GCs, > only after we increased region size step by step to 16M they ceased. It was > subtle fragmentation issue, made worse by rather small default region size. > > The following happened: > > 1) Big web request came in, 1-1.5GB of various allocations were done to > generate that Json string, triggering young gc in progress. > 2) After young gc heap was a bit fragmented cause there was temporary, but > still "live" data. All that data now got into fresh survivor/old regions. > The "health" of heap now really depends on how long ago last mixed GC ran. > 3) Into such "sprayed" heap we start to allocate our big object. I have no > idea how a set of regions is chosen for humongous region, but i think we > were generating a total of ~30 humongous objects ( "generating" as in > resizing StringBuffer somewhere deep in web framework till 30M fits) and > that was too much for G1GC to cope. > 4) Reducing allocation rate is not enough unfortunately, those small ones > that slip are really dangerous - they are immediately allocated in OldGen, > fragmenting the heap further. > 5) It is now a race between big web requests and next mixed gc. We could > reliably reproduce Full GCs in testing by generating several well timed big > requests :) > > > Getting region size up from default (actually I got a question, why G1GC is > aiming for thousands of regions, what are the drawbacks of larger than > "default" region) is more about reducing fragmentation by keeping those > large but temporary objects where they belong - in nursery where G1GC can > collect them efficiently. > > There are two parts to your question - > The default region size for a particular configuration depends on your > minimum heap size. You can increase the region size (by power of 2) up-to > the max limit of 32M (minimum = 1M). The effect of changing the region size > would be that previously H-objs are no longer H-objs. Changing the region > size, would result in a change in the entries for Remembered Set (RSet) > tables (hence, you may observe a change in the RSet times during a pause). I > can think of other things like cross-region references can get affected as > well. And the sizes of collection sets will get affected as well. > > Now the first part of your question - The 2048 number of regions is just a > means to get a number between 1M and 32M for the region size, the real > limits are the max (32M) and min (1M) bounds. As to the max region size - > the upper bound of 32M is deliberately introduced, since there is a tradeoff > between larger regions and the effectiveness of the 'cleanup' phase of the > multi-phased marking cycle. Since cleanup reclaims *completely* free regions > and returns them to the free regions list, for larger sized regions, these > *completely* free regions may not be that easy to come by. > You can read more in heapRegion.cpp. > > hope this helps. 
> > > So we have 3 important todo and tunables in avoiding H-Object caused FullGC: > > > 1) Code changes - any profiler that can record every allocation stack trace > will help, set it to record each alloc above 1/2 heap region size and limit > them as much as possible > 2) IHOP should be tuned to allow mixed GCs frequently enough even if your > app is behaving perfectly ( stable old gen + temporaries in nursery with > little promotion going on ). > 3) G1 region size can be increased to reduce heap fragmentation caused by > H-Objects if they are temporary (by reducing them to ordinary objects > allocated in young gen) > > > Makes sense. > > -Monica > > Darius. > > > > > > On Thu, May 30, 2013 at 11:55 PM, Monica Beckwith > wrote: >> >> +1 to enabling PrintAdaptiveSizePolicy. >> Darius, >> As you have already mentioned - Any object with size greater or equal to a >> half region is called a "humongous" object (H-obj). The max region size for >> G1 is 32M. So yes, even if you set your region size to max, your 32M object >> will be considered humongous. >> Now, there are a couple of things that we should be aware of with respect >> to humongous regions (H-region)/objects - >> >> The H-obj allocation will happen directly into the old generation >> >> There will be a check for marking threshold (IHOP), and a concurrent cycle >> will be initiated if necessary >> >> The H-regions are not included in an evacuation pause, since it's just >> going to increase the copying expense. >> >> But if the H-obj(s) are dead, they get freed at the end of the >> multi-phased concurrent marking cycle. >> >> So, I think, if you have to work with H-objs and increasing the region >> size doesn't help (as is your case), then maybe you should try limiting your >> nursery so as to allow more space for the old generation, so as to sustain >> your H-objs till they die or if they are a part of your live data set, then >> it's all the more necessary to be able to fit them in your old gen. >> >> -Monica >> >> >> On 5/30/2013 1:54 PM, Martin Makundi wrote: >> >> Hi! >> >> I'd strongly suggest that Martin should add >> -XX:+PrintAdaptiveSizePolicy to his JVM options. In our case that was >> what we needed to solve the mystery of FullGCs with gigabytes of heap >> free. >> >> Thanks, will add that tomorrow. >> >> Actually with some minor googling around i've found: >> >> https://forums.oracle.com/forums/thread.jspa?messageID=10869877 >> >> I suspect it could be same story as ours, "humongous allocation >> request failed" is bad for JVM health, FullGC will occur immediately. >> >> Remember, any allocation that is larger than half of G1GC region size >> will get allocated as "humongous" object on heap, that does not care >> about regions etc. In our case we were failing to allocate 32 >> megabytes with over 50% of heap free! >> >> Any solution to such problem or is it a bug in g1gc? Is there a way to >> log what code is performing the memory allocation if that happens to >> be the case? >> >> ** >> Martin >> >> Best regards, >> >> Darius. >> >> >> >> On Wed, May 29, 2013 at 8:35 PM, John Cuthbertson >> wrote: >> >> Hi Martin, >> >> I'm going to fill in bit more detail to Charlie's replies.... >> >> On 5/29/2013 8:49 AM, Martin Makundi wrote: >> >> Hi! >> >> A bit of constructive criticism ;-) It would be good practice to set one >> option at a time and measure its performance to determine whether it >> improves performance rather than choosing an option because of something you >> read in text. 
In short, always measure and reason about whether what you've >> observed for an improvement or regression makes sense. And, also run >> multiple times to get a sense of noise versus real improvement or >> regression. >> >> Thanks. That's one of the reasons we never changed our options. Once >> we found someting that works very well, we know that its always n! >> work to test changes and the system was running very nice indeed >> before the previous tweak ;) >> >> -XX:+UseFastAccessorMethod (the default is disabled) >> >> Fast sounds good, the description of it is "Use optimized versions of >> GetField" which sounds good. I see no harm in this. >> >> These would be JNI operations. >> >> A quick at the HotSpot source suggests UseFastAccessorMethods >> is mostly confined to interpreter operations. >> >> Thanks for the info. Doesn't say much to me, but does not seem to harm >> anything. Will try setting it off at some point in time. >> >> -XX:+UseNUMA (Are you running a JVM that spans NUMA memory nodes? >> Or, do you have multiple JVMs running on a NUMA or non-NUMA system?) >> >> Key point is that you should use -XX:+UseNUMA only when you are >> deploying a JVM that spans NUMA nodes. >> >> Thanks for the info. Doesn't say much to me, but does not seem to harm >> anything. Will try setting it off at some point in time. >> >> The fast accessor methods flag creates specialized (i.e. short and >> optimized) interpreter entry points for accessor methods (those that >> just return the value in one of the object's fields). In most >> applications the bulk of the execution time is spent executing JIT >> compiled code; only a few percent is typically spent in Hotspot's >> interpreter. The JIT compiler will always try to inline accessor methods >> into their caller. So, unless your application is spending a ton of time >> interpreting, this flag should make no difference. >> >> -XX:CMSInitiatingOccupancyFraction=70 (This is applicable to CMS GC, and >> not applicable to G1 GC) >> >> Again, not documented thoroughly where it applies and where not, jvm >> gave no warning/error about it so we assumed it's valid. >> >> There's always the HotSpot source code ;-) >> >> It's also quite well documented in various slide ware on the internet. >> It's also quite well documented in the Java Performance book. :-) >> >> Uh.. does it say somewhere that Do not use >> XX:CMSInitiatingOccupancyFraction with G1GC? ;) I know performance >> tuning is your bread and butter but is not ours... is more like we are >> just driving the car and you are the mechanic...different >> perspective.. just trying to fill'er'up to go..leaded or unleaded... >> ;) >> >> The G1 equivalent of this flag is InitiatingHeapOccupancyPercent (IHOP >> for short). Actually both G1 and CMS accept and observe IHOP. >> CMSInitiatingOccupancyFraction was superseded by IHOP. The JVM still >> accepts the old flag name - but it is CMS only and doesn't affect G1. >> >> -XX:InitiatingHeapOccupancyPercent=0 (You realize this will force G1's >> concurrent cycle to run continuously?) >> >> Yes, that's what we figured out, we don't want it to sit lazy and end >> up in a situation where it is required to do a Full GC. This switch >> was specifically chosen in a situation we had a memory leak and tried >> to aggressively fight against it before we found the root cause. Maybe >> we should try without this switch now, and see what effect it has. 
>> >> Having GC logs to see what available head room you have between the >> initiating of a G1 concurrent cycle and available regions / heap space >> would be most appropriate. >> >> Hmm.. I don't thoroughly understand the logs either, but, here is a snap: >> >> 2013-05-29T17:28:56.119+0300: 38905.407: [GC pause (young), 0.29261000 >> secs] >> [Parallel Time: 288.8 ms] >> [GC Worker Start (ms): 38905407.6 38905407.7 38905407.7 >> 38905407.9 >> Avg: 38905407.7, Min: 38905407.6, Max: 38905407.9, Diff: 0.3] >> [Ext Root Scanning (ms): 22.8 16.3 18.1 22.0 >> Avg: 19.8, Min: 16.3, Max: 22.8, Diff: 6.6] >> [SATB Filtering (ms): 0.0 0.1 0.0 0.0 >> Avg: 0.0, Min: 0.0, Max: 0.1, Diff: 0.1] >> [Update RS (ms): 31.9 37.3 35.1 33.3 >> Avg: 34.4, Min: 31.9, Max: 37.3, Diff: 5.5] >> [Processed Buffers : 102 106 119 104 >> Sum: 431, Avg: 107, Min: 102, Max: 119, Diff: 17] >> [Scan RS (ms): 0.0 0.0 0.1 0.0 >> Avg: 0.0, Min: 0.0, Max: 0.1, Diff: 0.1] >> [Object Copy (ms): 228.2 229.1 229.5 227.3 >> Avg: 228.5, Min: 227.3, Max: 229.5, Diff: 2.2] >> [Termination (ms): 0.0 0.0 0.0 0.0 >> Avg: 0.0, Min: 0.0, Max: 0.0, Diff: 0.0] >> [Termination Attempts : 4 1 11 4 >> Sum: 20, Avg: 5, Min: 1, Max: 11, Diff: 10] >> [GC Worker End (ms): 38905690.5 38905690.5 38905690.5 >> 38905690.5 >> Avg: 38905690.5, Min: 38905690.5, Max: 38905690.5, Diff: 0.0] >> [GC Worker (ms): 282.9 282.8 282.8 282.6 >> Avg: 282.8, Min: 282.6, Max: 282.9, Diff: 0.3] >> [GC Worker Other (ms): 5.9 6.0 6.0 6.2 >> Avg: 6.0, Min: 5.9, Max: 6.2, Diff: 0.3] >> [Complete CSet Marking: 0.0 ms] >> [Clear CT: 0.1 ms] >> [Other: 3.7 ms] >> [Choose CSet: 0.0 ms] >> [Ref Proc: 2.8 ms] >> [Ref Enq: 0.1 ms] >> [Free CSet: 0.3 ms] >> [Eden: 48M(5032M)->0B(5032M) Survivors: 288M->288M Heap: >> 15790M(26624M)->15741M(26624M)] >> [Times: user=1.14 sys=0.00, real=0.29 secs] >> Heap after GC invocations=575 (full 157): >> garbage-first heap total 27262976K, used 16119181K >> [0x0000000160000000, 0x00000007e0000000, 0x00000007e0000000) >> region size 8192K, 36 young (294912K), 36 survivors (294912K) >> compacting perm gen total 524288K, used 164479K [0x00000007e0000000, >> 0x0000000800000000, 0x0000000800000000) >> the space 524288K, 31% used [0x00000007e0000000, >> 0x00000007ea09ff68, 0x00000007ea0a0000, 0x0000000800000000) >> No shared spaces configured. >> } >> {Heap before GC invocations=575 (full 157): >> garbage-first heap total 27262976K, used 16119181K >> [0x0000000160000000, 0x00000007e0000000, 0x00000007e0000000) >> region size 8192K, 37 young (303104K), 36 survivors (294912K) >> compacting perm gen total 524288K, used 164479K [0x00000007e0000000, >> 0x0000000800000000, 0x0000000800000000) >> the space 524288K, 31% used [0x00000007e0000000, >> 0x00000007ea09ff68, 0x00000007ea0a0000, 0x0000000800000000) >> No shared spaces configured. >> 2013-05-29T17:28:56.413+0300: 38905.701: [Full GC >> 15742M->14497M(26624M), 56.7731320 secs] >> >> That's the third Full GC today after the change to 26G and change from >> occupancypercent=0. Tomorrow will be trying again with >> occupancypercent=0 >> >> What did you set the IHOP value to? >> >> -noclassgc (This is rarely needed and haven't seen an app that required it >> for quite some time) >> >> Jvm 1.6 stopped the world for couple of minutes several times per day >> while unloading classes, so we used noclassgc to disable that. We do >> not know if this is necessary for latest 1.7 to avoid class unload >> pause, but we continued to use this switch and found no harm in it. 
>> Can't afford testing that in production ;) >> >> Haven't seen a case where unloading classes cause a several minute pause. >> Are you sure your system is not swapping? And, do you have GC logs you >> can share that illustrate the behavior and that -noclassgc fixed it? >> >> We deleted swap partition long time ago, we simply do not risk swapping at >> all. >> >> We had this class unloading problem several times per day like half a >> year ago, and fixed it with noclasssgc, that was a no-brainer, single >> parameter that made the difference. >> >> It is also discussed here (they do not discuss noclassgc though, we >> figured that out somehow) >> >> http://stackoverflow.com/questions/2833983/meaning-of-the-unloading-class-messages >> >> G1 only performs class unloading during a full GC. But if you're not >> running out of perm space or compiled code cache - you can leave this >> flag. >> >> -XX:+ UseGCOverheadLimit >> -XX:ReservedCodeCacheSize=48, that is the default for 7u21. You might >> consider setting it higher if you have the available space, and more >> importantly if you think you're running out of code space. >> >> For our sun jvm linux 64bit 48m is maximum, jvm won't start if higher >> value. >> >> If you can't go larger than -XX:ReservedCodeCacheSize=48m, that may >> suggest you have memory constraints and may also suggest you don't have >> enough swap space defined, and you may be experiencing swapping during JVM >> execution. I've got a Linux system that has 32 GB of RAM, I can set >> ReservedCodeCacheSize=256m with no issues, even with -Xms30g and -Xms30g. >> >> It is also documented that 48m is maximum >> >> http://www.oracle.com/technetwork/java/javase/tech/vmoptions-jsp-140102.html >> "maximum code cache size. [Solaris 64-bit, amd64, and -server x86: >> 48m" >> >> >> That's the default max code cache size. When the JIT compiler compiles a >> Java method it places the generated code into the code cache. When >> there's no more room in the code cache, a warning is issued and JIT >> compilation is stopped. You can set it higher. IIRC there was time in >> the past when the size was limited in order to use short branches in >> compiled code. I don't think we've had that restriction for a while. 
>> >> HTHs >> >> JohnC >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> >> -- >> >> >> Monica Beckwith | Principal Member of Technical Staff >> VOIP: +15124011274 >> Oracle Java Performance >> >> Oracle is committed to developing practices and products >> that help protect the environment > > > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > > -- > > > Monica Beckwith | Principal Member of Technical Staff > VOIP: +15124011274 > Oracle Java Performance > > Oracle is committed to developing practices and > products that help protect the environment > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > From thomas.schatzl at oracle.com Mon Jun 3 04:18:12 2013 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Mon, 03 Jun 2013 13:18:12 +0200 Subject: Bug in G1GC it performs Full GC when code cache is full resulting in overkill In-Reply-To: References: <3807DE76-D6CA-451F-AC72-771332825905@salesforce.com> <5C892669-3515-4DCC-BB72-96AE89EBE5F8@salesforce.com> <51A63C5F.9040808@oracle.com> <51A7BCC7.4050700@oracle.com> <51AA0A24.2010103@oracle.com> <0014A986-1651-44C6-8D0D-B77E200C18D6@salesforce.com> Message-ID: <1370258292.2707.43.camel@cirrus> Hi, On Mon, 2013-06-03 at 05:10 +0300, Martin Makundi wrote: > Hi! > > We seem to be suffering from humongous and something else too... I > will attempt to investigate how the cause can be captured to see if > they can be made smaller. > > Why does it need to do a stop the world instead of just making space > for the items that need to be placed? There is no reason for the current behavior other than that trying to make space using a targeted mixed collection in this case is just not implemented in G1. Thomas From martin.makundi at koodaripalvelut.com Mon Jun 3 04:23:41 2013 From: martin.makundi at koodaripalvelut.com (Martin Makundi) Date: Mon, 3 Jun 2013 14:23:41 +0300 Subject: Bug in G1GC it performs Full GC when code cache is full resulting in overkill In-Reply-To: <1370258292.2707.43.camel@cirrus> References: <3807DE76-D6CA-451F-AC72-771332825905@salesforce.com> <5C892669-3515-4DCC-BB72-96AE89EBE5F8@salesforce.com> <51A63C5F.9040808@oracle.com> <51A7BCC7.4050700@oracle.com> <51AA0A24.2010103@oracle.com> <0014A986-1651-44C6-8D0D-B77E200C18D6@salesforce.com> <1370258292.2707.43.camel@cirrus> Message-ID: Hi! > > We seem to be suffering from humongous and something else too... I > > will attempt to investigate how the cause can be captured to see if > > they can be made smaller. > > > > Why does it need to do a stop the world instead of just making space > > for the items that need to be placed? 
> > There is no reason for the current behavior other than that trying to > make space using a targeted mixed collection in this case is just not > implemented in G1. So could we improve it by adding that and also allowing varying-size regions, to allow smaller and larger (even bigger than 32m) objects? ** Martin > > > From thomas.schatzl at oracle.com Mon Jun 3 04:53:13 2013 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Mon, 03 Jun 2013 13:53:13 +0200 Subject: Bug in G1GC it performs Full GC when code cache is full resulting in overkill In-Reply-To: References: <3807DE76-D6CA-451F-AC72-771332825905@salesforce.com> <5C892669-3515-4DCC-BB72-96AE89EBE5F8@salesforce.com> <51A63C5F.9040808@oracle.com> <51A7BCC7.4050700@oracle.com> <51AA0A24.2010103@oracle.com> <0014A986-1651-44C6-8D0D-B77E200C18D6@salesforce.com> <1370258292.2707.43.camel@cirrus> Message-ID: <1370260393.2707.63.camel@cirrus> Hi, On Mon, 2013-06-03 at 14:23 +0300, Martin Makundi wrote: > Hi! > > > > We seem to be suffering from humongous and something else too... I > > > will attempt to investigate how the cause can be captured to see if > > > they can be made smaller. > > > > > > Why does it need to do a stop the world instead of just making space > > > for the items that need to be placed? > > > > There is no reason for the current behavior other than that trying to > > make space using a targeted mixed collection in this case is just not > > implemented in G1. > > So could we improve it by adding that Yes, patches are welcome :) I think a patch needs approximately the following changes: - instead of issuing a full gc vm operation in that case, issue a special mixed gc. - in this mixed gc first scan the region table to find the place (area) where the least amount of regions need to be evacuated, ie. least total amount of cost. (If you want to optimize for performance of these mixed gcs, there may be other goals worth pursuing. A first prototype might just take the first N regions). Since g1 keeps remembered sets for all regions anyway, i.e. can evacuate everything at any time, there is no other limitation here I can think of. Put the regions into the list of evacuation candidates at the top, and tell g1 that it absolutely should evacuate these number of regions. - sort of pre-reserve already free regions in that area so that the mixed gc does not allocate into them during evacuation :) - do the mixed gc as usual - allocate the large object in that now free place. All functionality is there, things just need to be wired together differently. > and also allowing varying-size > regions, to allow smaller and larger (even bigger than 32m) objects? "Humonguous regions" already enable the use of arbitrarily sized objects. I am not sure what overall advantage arbitrarily sized regions would have; while they might solve this particular problem better (in which way?) they would require massive changes in the collector/VM to support them. Also, this would likely have a large impact on achievable performance. E.g. evenly sized power-of-two regions give a lot of opportunities for optimizations in many places. There are other options, like cutting large objects into multiple smaller arrays (arraylets etc), but they require compiler support. 
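To make the size arithmetic concrete: with region size 8192K (8,388,608 bytes), any single allocation of 4 MB or more is humongous, and a humongous object of S bytes needs ceil(S / 8 MB) contiguous free regions. The 46,137,360-byte allocation request that shows up in the ergonomics logs elsewhere in this thread therefore needs six contiguous regions, which is exactly the 50,331,648-byte expansion G1 attempts there. A minimal application-level sketch of the "cut it into smaller arrays" idea follows; the ChunkedBuffer name and the 2 MB chunk size are made up for illustration, not an existing API. The point is simply to keep every piece well below the half-region threshold so it stays an ordinary young-gen allocation:

import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: accumulate a large payload as a list of small
// arrays instead of one contiguous array, so no single allocation ever
// reaches the humongous threshold (half the G1 region size).
public final class ChunkedBuffer {
    private static final int CHUNK = 2 * 1024 * 1024; // 2 MB, well under 4 MB for 8 MB regions

    private final List<byte[]> chunks = new ArrayList<byte[]>();
    private byte[] current = new byte[CHUNK];
    private int pos = 0;

    public void write(byte[] src, int off, int len) {
        while (len > 0) {
            if (pos == current.length) {      // current chunk is full, start a new one
                chunks.add(current);
                current = new byte[CHUNK];
                pos = 0;
            }
            int n = Math.min(len, current.length - pos);
            System.arraycopy(src, off, current, pos, n);
            pos += n;
            off += n;
            len -= n;
        }
    }

    public long size() {
        return (long) chunks.size() * CHUNK + pos;
    }
}

Whether something like this is practical depends, of course, on whether the code that builds the large object can be changed at all; it is the same trade-off the arraylet approach makes, just done by hand in the application.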
Hth, Thomas From martin.makundi at koodaripalvelut.com Tue Jun 4 21:26:49 2013 From: martin.makundi at koodaripalvelut.com (Martin Makundi) Date: Wed, 5 Jun 2013 07:26:49 +0300 Subject: Bug in G1GC it performs Full GC when code cache is full resulting in overkill In-Reply-To: <1370260393.2707.63.camel@cirrus> References: <3807DE76-D6CA-451F-AC72-771332825905@salesforce.com> <5C892669-3515-4DCC-BB72-96AE89EBE5F8@salesforce.com> <51A63C5F.9040808@oracle.com> <51A7BCC7.4050700@oracle.com> <51AA0A24.2010103@oracle.com> <0014A986-1651-44C6-8D0D-B77E200C18D6@salesforce.com> <1370258292.2707.43.camel@cirrus> <1370260393.2707.63.camel@cirrus> Message-ID: Meanwhile, is there a feature like "Finding which methods allocated large objects" for Sun/OpenJDK JAVA? See http://publib.boulder.ibm.com/infocenter/javasdk/v5r0/topic/com.ibm.java.doc.diagnostics.50/diag/tools/gcpd_stackevents.html for -Xdump:stack:events=allocation,filter=#1k or -Xdump:stack:events=allocation,filter=#2m..4m ** Martin 2013/6/3 Thomas Schatzl : > Hi, > > On Mon, 2013-06-03 at 14:23 +0300, Martin Makundi wrote: >> Hi! >> >> > > We seem to be suffering from humongous and something else too... I >> > > will attempt to investigate how the cause can be captured to see if >> > > they can be made smaller. >> > > >> > > Why does it need to do a stop the world instead of just making space >> > > for the items that need to be placed? >> > >> > There is no reason for the current behavior other than that trying to >> > make space using a targeted mixed collection in this case is just not >> > implemented in G1. >> >> So could we improve it by adding that > > Yes, patches are welcome :) > > I think a patch needs approximately the following changes: > > - instead of issuing a full gc vm operation in that case, issue a > special mixed gc. > > - in this mixed gc first scan the region table to find the place (area) > where the least amount of regions need to be evacuated, ie. least total > amount of cost. > (If you want to optimize for performance of these mixed gcs, there may > be other goals worth pursuing. A first prototype might just take the > first N regions). > > Since g1 keeps remembered sets for all regions anyway, i.e. can evacuate > everything at any time, there is no other limitation here I can think > of. > > Put the regions into the list of evacuation candidates at the top, and > tell g1 that it absolutely should evacuate these number of regions. > > - sort of pre-reserve already free regions in that area so that the > mixed gc does not allocate into them during evacuation :) > > - do the mixed gc as usual > > - allocate the large object in that now free place. > > All functionality is there, things just need to be wired together > differently. > >> and also allowing varying-size >> regions, to allow smaller and larger (even bigger than 32m) objects? > > "Humonguous regions" already enable the use of arbitrarily sized > objects. > > I am not sure what overall advantage arbitrarily sized regions would > have; while they might solve this particular problem better (in which > way?) they would require massive changes in the collector/VM to support > them. Also, this would likely have a large impact on achievable > performance. E.g. evenly sized power-of-two regions give a lot of > opportunities for optimizations in many places. > > There are other options, like cutting large objects into multiple > smaller arrays (arraylets etc), but they require compiler support. 
> > Hth, > Thomas > > From thomas.schatzl at oracle.com Wed Jun 5 03:05:03 2013 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Wed, 05 Jun 2013 12:05:03 +0200 Subject: Bug in G1GC it performs Full GC when code cache is full resulting in overkill In-Reply-To: References: <3807DE76-D6CA-451F-AC72-771332825905@salesforce.com> <5C892669-3515-4DCC-BB72-96AE89EBE5F8@salesforce.com> <51A63C5F.9040808@oracle.com> <51A7BCC7.4050700@oracle.com> <51AA0A24.2010103@oracle.com> <0014A986-1651-44C6-8D0D-B77E200C18D6@salesforce.com> <1370258292.2707.43.camel@cirrus> <1370260393.2707.63.camel@cirrus> Message-ID: <1370426703.2599.16.camel@cirrus> Hi, On Wed, 2013-06-05 at 07:26 +0300, Martin Makundi wrote: > Meanwhile, is there a feature like "Finding which methods allocated > large objects" for Sun/OpenJDK JAVA? I am not so proficient with the options to track allocations on the Java level: there are a few possibilites that come to my mind right away, but all require some extra effort: using dtrace (on solaris), or byte code rewriting. Afaik there are also JFR events to track memory allocations. I think you can also do this using the JVMTI interface, but I don't know much about it. There are certainly profilers available that use it. For dtrace/solaris I found some pre-baked script collection that should give you almost what you want at http://www.gousios.gr/sw/jdt.html , the jmemstat script. It should be simple to add a threshold for the printout. The documentation at http://docs.oracle.com/javase/6/docs/technotes/guides/vm/dtrace.html gives an overview about the available probes. Not sure about the current state of Linux/systemtap support. Regarding bytecode rewriting, I found https://code.google.com/p/java-allocation-instrumenter/ which on the front page already gives an example for almost what you want to achieve. Likely others can give a more detailed answer about further options. Hth, Thomas From martin.makundi at koodaripalvelut.com Wed Jun 5 22:53:32 2013 From: martin.makundi at koodaripalvelut.com (Martin Makundi) Date: Thu, 6 Jun 2013 08:53:32 +0300 Subject: Bug in G1GC it performs Full GC when code cache is full resulting in overkill In-Reply-To: <1370426703.2599.16.camel@cirrus> References: <3807DE76-D6CA-451F-AC72-771332825905@salesforce.com> <5C892669-3515-4DCC-BB72-96AE89EBE5F8@salesforce.com> <51A63C5F.9040808@oracle.com> <51A7BCC7.4050700@oracle.com> <51AA0A24.2010103@oracle.com> <0014A986-1651-44C6-8D0D-B77E200C18D6@salesforce.com> <1370258292.2707.43.camel@cirrus> <1370260393.2707.63.camel@cirrus> <1370426703.2599.16.camel@cirrus> Message-ID: Hi! > Regarding bytecode rewriting, I found > https://code.google.com/p/java-allocation-instrumenter/ which on the > front page already gives an example for almost what you want to achieve. Thanks, this seems to work, will try that to see how it impacts and what we can find out. The first large object we found was a stringbuilder from a logger that collects stacktraces to dump them optionally after a certain timeout... 
need to redesign that up front ;) ** Martin From thomas.schatzl at oracle.com Mon Jun 10 07:32:55 2013 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Mon, 10 Jun 2013 16:32:55 +0200 Subject: Bug in G1GC it performs Full GC when code cache is full resulting in overkill In-Reply-To: References: <3807DE76-D6CA-451F-AC72-771332825905@salesforce.com> <5C892669-3515-4DCC-BB72-96AE89EBE5F8@salesforce.com> <51A63C5F.9040808@oracle.com> <51A7BCC7.4050700@oracle.com> <51AA0A24.2010103@oracle.com> <0014A986-1651-44C6-8D0D-B77E200C18D6@salesforce.com> <1370258292.2707.43.camel@cirrus> <1370260393.2707.63.camel@cirrus> <1370426703.2599.16.camel@cirrus> <1370870693.2814.26.camel@cirrus> Message-ID: <1370874775.2814.59.camel@cirrus> Hi, On Mon, 2013-06-10 at 16:32 +0300, Martin Makundi wrote: > Hi! > > >> Does this make sense? > > > > To me, the log indicates that there are multiple threads trying to > > allocate large objects, all of them failing to do so, only then > > triggering the gc. > > Sizes of the allocations seem too close to each other to be multiple threads. > > Is it possible that the gc is racing against itself in some > re-allocation algorithm? The allocation after the full gc occurs inside the safepoint - so no. The collector tries this allocation twice before actually doing the collection, both of them failing. So you get two sets of messages. I.e. first the gc tries outside the safepoint and fails, then schedules a safepoint. Within that safepoint the code tries again first in the hope that another previous safepoint might have freed enough space. This explains the messages, as every time you get the "attempt heap expansion", "expand the heap" and "did not expand the heap" message. > > I assume you checked that the log messages are not possibly printed > > after the gc? The instrumentation invokes the allocation sampler after > > allocation. So in this case, the threads will block at the allocation, > > wait for gc, and only after that print the expected messages. There may > > also be a delay between the gc output and the allocation information. > > I did check, not even a single 30mb allocation in today's log. > > > Maybe looking at the code for the instrumentation helps finding this > > issue (this is the first one such bytecode rewriting tool that looked > > simple to use that I found as example). > > You mean change instrumentation or debug it? We use exactly this one: > https://code.google.com/p/java-allocation-instrumenter/ Something like that; I really do not know why the messages do not show up in the log. The only idea I would have at the moment is that there is something wrong with the instrumentation, i.e. for some reason not all allocation sites are instrumented. Maybe rebuilding this instrumenter with current libraries may help. 
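For reference, a minimal self-contained variant of the sampler being discussed, using the same AllocationRecorder/Sampler API from java-allocation-instrumenter that appears in the quoted code further down; the 4 MB threshold and the class name are assumptions, and the JVM still has to be started with the instrumenter jar as a -javaagent:

import com.google.monitoring.runtime.instrumentation.AllocationRecorder;
import com.google.monitoring.runtime.instrumentation.Sampler;

public final class LargeAllocationTracer {
    // Half of an 8 MB G1 region: anything at or above this is humongous.
    private static final long THRESHOLD_BYTES = 4L * 1024 * 1024;

    public static void install() {
        AllocationRecorder.addSampler(new Sampler() {
            @Override
            public void sampleAllocation(int count, String desc, Object newObj, long size) {
                if (size >= THRESHOLD_BYTES) {
                    // The callback runs on the allocating thread, so the stack
                    // trace below points at the allocation site.
                    System.err.println("large allocation: " + size + " bytes of " + desc
                            + (count != -1 ? " (array length " + count + ")" : ""));
                    Thread.dumpStack();
                }
            }
        });
    }
}

install() would be called once, early in application startup, so the sampler is registered before the interesting allocations happen.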
hth, Thomas From martin.makundi at koodaripalvelut.com Mon Jun 10 08:23:08 2013 From: martin.makundi at koodaripalvelut.com (Martin Makundi) Date: Mon, 10 Jun 2013 18:23:08 +0300 Subject: Bug in G1GC it performs Full GC when code cache is full resulting in overkill In-Reply-To: <1370874775.2814.59.camel@cirrus> References: <3807DE76-D6CA-451F-AC72-771332825905@salesforce.com> <5C892669-3515-4DCC-BB72-96AE89EBE5F8@salesforce.com> <51A63C5F.9040808@oracle.com> <51A7BCC7.4050700@oracle.com> <51AA0A24.2010103@oracle.com> <0014A986-1651-44C6-8D0D-B77E200C18D6@salesforce.com> <1370258292.2707.43.camel@cirrus> <1370260393.2707.63.camel@cirrus> <1370426703.2599.16.camel@cirrus> <1370870693.2814.26.camel@cirrus> <1370874775.2814.59.camel@cirrus> Message-ID: Ok, will rebuild it. However, it does work if I explicitly instantiate, for example a huge string or something. And I can see it working on the logs: 2013-06-10T17:18:35.179+0300 - 37sec - at java.lang.ThreadLocal$ThreadLocalMap.access$100(ThreadLocal.java:261) at java.lang.ThreadLocal.set(ThreadLocal.java:183) at com.google.monitoring.runtime.instrumentation.AllocationRecorder.recordAllocation(Unknown Source) at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1517) at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1493) at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1416) Nevertheless, I rebuilt it and will investigate logs again tomorrow. ** Martin 2013/6/10 Thomas Schatzl : > Hi, > > On Mon, 2013-06-10 at 16:32 +0300, Martin Makundi wrote: >> Hi! >> >> >> Does this make sense? >> > >> > To me, the log indicates that there are multiple threads trying to >> > allocate large objects, all of them failing to do so, only then >> > triggering the gc. >> >> Sizes of the allocations seem too close to each other to be multiple threads. >> >> Is it possible that the gc is racing against itself in some >> re-allocation algorithm? > > The allocation after the full gc occurs inside the safepoint - so no. > > The collector tries this allocation twice before actually doing the > collection, both of them failing. So you get two sets of messages. I.e. > first the gc tries outside the safepoint and fails, then schedules a > safepoint. Within that safepoint the code tries again first in the hope > that another previous safepoint might have freed enough space. > > This explains the messages, as every time you get the "attempt heap > expansion", "expand the heap" and "did not expand the heap" message. > >> > I assume you checked that the log messages are not possibly printed >> > after the gc? The instrumentation invokes the allocation sampler after >> > allocation. So in this case, the threads will block at the allocation, >> > wait for gc, and only after that print the expected messages. There may >> > also be a delay between the gc output and the allocation information. >> >> I did check, not even a single 30mb allocation in today's log. >> >> > Maybe looking at the code for the instrumentation helps finding this >> > issue (this is the first one such bytecode rewriting tool that looked >> > simple to use that I found as example). >> >> You mean change instrumentation or debug it? We use exactly this one: >> https://code.google.com/p/java-allocation-instrumenter/ > > Something like that; I really do not know why the messages do not show > up in the log. The only idea I would have at the moment is that there is > something wrong with the instrumentation, i.e. 
for some reason not all > allocation sites are instrumented. > > Maybe rebuilding this instrumenter with current libraries may help. > > hth, > Thomas > > From martin.makundi at koodaripalvelut.com Mon Jun 10 06:32:40 2013 From: martin.makundi at koodaripalvelut.com (Martin Makundi) Date: Mon, 10 Jun 2013 16:32:40 +0300 Subject: Bug in G1GC it performs Full GC when code cache is full resulting in overkill In-Reply-To: <1370870693.2814.26.camel@cirrus> References: <3807DE76-D6CA-451F-AC72-771332825905@salesforce.com> <5C892669-3515-4DCC-BB72-96AE89EBE5F8@salesforce.com> <51A63C5F.9040808@oracle.com> <51A7BCC7.4050700@oracle.com> <51AA0A24.2010103@oracle.com> <0014A986-1651-44C6-8D0D-B77E200C18D6@salesforce.com> <1370258292.2707.43.camel@cirrus> <1370260393.2707.63.camel@cirrus> <1370426703.2599.16.camel@cirrus> <1370870693.2814.26.camel@cirrus> Message-ID: Hi! >> Does this make sense? > > To me, the log indicates that there are multiple threads trying to > allocate large objects, all of them failing to do so, only then > triggering the gc. Sizes of the allocations seem too close to each other to be multiple threads. Is it possible that the gc is racing against itself in some re-allocation algorithm? > I assume you checked that the log messages are not possibly printed > after the gc? The instrumentation invokes the allocation sampler after > allocation. So in this case, the threads will block at the allocation, > wait for gc, and only after that print the expected messages. There may > also be a delay between the gc output and the allocation information. I did check, not even a single 30mb allocation in today's log. > Maybe looking at the code for the instrumentation helps finding this > issue (this is the first one such bytecode rewriting tool that looked > simple to use that I found as example). You mean change instrumentation or debug it? We use exactly this one: https://code.google.com/p/java-allocation-instrumenter/ ** Martin > > Thomas > > From thomas.schatzl at oracle.com Mon Jun 10 06:24:53 2013 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Mon, 10 Jun 2013 15:24:53 +0200 Subject: Bug in G1GC it performs Full GC when code cache is full resulting in overkill In-Reply-To: References: <3807DE76-D6CA-451F-AC72-771332825905@salesforce.com> <5C892669-3515-4DCC-BB72-96AE89EBE5F8@salesforce.com> <51A63C5F.9040808@oracle.com> <51A7BCC7.4050700@oracle.com> <51AA0A24.2010103@oracle.com> <0014A986-1651-44C6-8D0D-B77E200C18D6@salesforce.com> <1370258292.2707.43.camel@cirrus> <1370260393.2707.63.camel@cirrus> <1370426703.2599.16.camel@cirrus> Message-ID: <1370870693.2814.26.camel@cirrus> Hi, On Mon, 2013-06-10 at 15:22 +0300, Martin Makundi wrote: > >> Regarding bytecode rewriting, I found > >> https://code.google.com/p/java-allocation-instrumenter/ which on the > >> front page already gives an example for almost what you want to achieve. > > > > Thanks, this seems to work, will try that to see how it impacts and > > what we can find out. The first large object we found was a > > stringbuilder from a logger that collects stacktraces to dump them > > optionally after a certain timeout... need to redesign that up front > > ;) > > Interesting, my allocationrecorder did not catch the allocation: > >[...] 
> >While gc log says: > > 28787.582: [G1Ergonomics (Heap Sizing) attempt heap expansion, > reason: humongous allocation request failed, allocation request: > 58720272 bytes] > 28787.582: [G1Ergonomics (Heap Sizing) expand the heap, requested > expansion amount: 58720256 bytes, attempted expansion amount: 58720256 > bytes] > 28787.582: [G1Ergonomics (Heap Sizing) did not expand the heap, > reason: heap expansion operation failed] > 28787.585: --""-- > 28787.585: --""-- > 28787.585: --""-- > 28787.585: --""-- > 28787.585: --""-- > 28787.585: --""-- > [ gc executed message ] > > Does this make sense? To me, the log indicates that there are multiple threads trying to allocate large objects, all of them failing to do so, only then triggering the gc. I assume you checked that the log messages are not possibly printed after the gc? The instrumentation invokes the allocation sampler after allocation. So in this case, the threads will block at the allocation, wait for gc, and only after that print the expected messages. There may also be a delay between the gc output and the allocation information. This should not be a problem, as this does not influence a stack trace in the AllocationSampler. In very quick tests the correct amount of allocations were always reported, also after full gc. Maybe looking at the code for the instrumentation helps finding this issue (this is the first one such bytecode rewriting tool that looked simple to use that I found as example). Thomas From martin.makundi at koodaripalvelut.com Mon Jun 10 05:22:57 2013 From: martin.makundi at koodaripalvelut.com (Martin Makundi) Date: Mon, 10 Jun 2013 15:22:57 +0300 Subject: Bug in G1GC it performs Full GC when code cache is full resulting in overkill In-Reply-To: References: <3807DE76-D6CA-451F-AC72-771332825905@salesforce.com> <5C892669-3515-4DCC-BB72-96AE89EBE5F8@salesforce.com> <51A63C5F.9040808@oracle.com> <51A7BCC7.4050700@oracle.com> <51AA0A24.2010103@oracle.com> <0014A986-1651-44C6-8D0D-B77E200C18D6@salesforce.com> <1370258292.2707.43.camel@cirrus> <1370260393.2707.63.camel@cirrus> <1370426703.2599.16.camel@cirrus> Message-ID: >> Regarding bytecode rewriting, I found >> https://code.google.com/p/java-allocation-instrumenter/ which on the >> front page already gives an example for almost what you want to achieve. > > Thanks, this seems to work, will try that to see how it impacts and > what we can find out. The first large object we found was a > stringbuilder from a logger that collects stacktraces to dump them > optionally after a certain timeout... need to redesign that up front > ;) Interesting, my allocationrecorder did not catch the allocation: AllocationRecorder.addSampler(new Sampler() { @Override public void sampleAllocation(int count, String name, Object newObj, long size) { if (Bytes.megabytes(30).bytes() < size) { Utils.errorLog(Sampler.class, "I just allocated the object " + newObj + " of type " + name + " whose size is " + Bytes.bytes(size).megabytes() + " megabytes" + ((count != -1) ? 
(" and it is an array of size " + count) : "")); } } }); While gc log says: 28787.582: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: humongous allocation request failed, allocation request: 58720272 bytes] 28787.582: [G1Ergonomics (Heap Sizing) expand the heap, requested expansion amount: 58720256 bytes, attempted expansion amount: 58720256 bytes] 28787.582: [G1Ergonomics (Heap Sizing) did not expand the heap, reason: heap expansion operation failed] 28787.585: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: humongous allocation request failed, allocation request: 58720272 bytes] 28787.585: [G1Ergonomics (Heap Sizing) expand the heap, requested expansion amount: 58720256 bytes, attempted expansion amount: 58720256 bytes] 28787.585: [G1Ergonomics (Heap Sizing) did not expand the heap, reason: heap expansion operation failed] 28787.585: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: allocation request failed, allocation request: 58720272 bytes] 28787.585: [G1Ergonomics (Heap Sizing) expand the heap, requested expansion amount: 58720272 bytes, attempted expansion amount: 67108864 bytes] 28787.585: [G1Ergonomics (Heap Sizing) did not expand the heap, reason: heap expansion operation failed] {Heap before GC invocations=774 (full 231): garbage-first heap total 31457280K, used 22000762K [0x00007fdbf7800000, 0x00007fe377800000, 0x00007fe377800000) region size 8192K, 69 young (565248K), 69 survivors (565248K) compacting perm gen total 524288K, used 169935K [0x00007fe377800000, 0x00007fe397800000, 0x00007fe397800000) the space 524288K, 32% used [0x00007fe377800000, 0x00007fe381df3e28, 0x00007fe381df4000, 0x00007fe397800000) No shared spaces configured. 2013-06-10T14:25:52.166+0300: 28787.585: [Full GC 21485M->17350M(30720M), 67.3563480 secs] [Times: user=96.62 sys=0.15, real=67.34 secs] Heap after GC invocations=775 (full 232): garbage-first heap total 31457280K, used 17766944K [0x00007fdbf7800000, 0x00007fe377800000, 0x00007fe377800000) region size 8192K, 0 young (0K), 0 survivors (0K) compacting perm gen total 524288K, used 169935K [0x00007fe377800000, 0x00007fe397800000, 0x00007fe397800000) the space 524288K, 32% used [0x00007fe377800000, 0x00007fe381df3e28, 0x00007fe381df4000, 0x00007fe397800000) No shared spaces configured. } Does this make sense? ** Martin From martin.makundi at koodaripalvelut.com Tue Jun 11 08:52:54 2013 From: martin.makundi at koodaripalvelut.com (Martin Makundi) Date: Tue, 11 Jun 2013 18:52:54 +0300 Subject: Bug in G1GC it performs Full GC when code cache is full resulting in overkill In-Reply-To: References: <3807DE76-D6CA-451F-AC72-771332825905@salesforce.com> <5C892669-3515-4DCC-BB72-96AE89EBE5F8@salesforce.com> <51A63C5F.9040808@oracle.com> <51A7BCC7.4050700@oracle.com> <51AA0A24.2010103@oracle.com> <0014A986-1651-44C6-8D0D-B77E200C18D6@salesforce.com> <1370258292.2707.43.camel@cirrus> <1370260393.2707.63.camel@cirrus> <1370426703.2599.16.camel@cirrus> <1370870693.2814.26.camel@cirrus> <1370874775.2814.59.camel@cirrus> Message-ID: Hi! Same result with re-compiled instrumentation. 
It works if I explicitly instantiate a large stringbuffer, for example, but not when the gc log says: 18176.048: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: humongous allocation request failed, allocation request: 46137360 bytes] 18176.048: [G1Ergonomics (Heap Sizing) expand the heap, requested expansion amount: 8388608 bytes, attempted expansion amount: 8388608 bytes] 18176.048: [G1Ergonomics (Heap Sizing) did not expand the heap, reason: heap expansion operation failed] 18176.079: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: humongous allocation request failed, allocation request: 46137360 bytes] 18176.079: [G1Ergonomics (Heap Sizing) expand the heap, requested expansion amount: 8388608 bytes, attempted expansion amount: 8388608 bytes] 18176.079: [G1Ergonomics (Heap Sizing) did not expand the heap, reason: heap expansion operation failed] 18176.079: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: allocation request failed, allocation request: 46137360 bytes] 18176.079: [G1Ergonomics (Heap Sizing) expand the heap, requested expansion amount: 46137360 bytes, attempted expansion amount: 50331648 bytes] 18176.079: [G1Ergonomics (Heap Sizing) did not expand the heap, reason: heap expansion operation failed] {Heap before GC invocations=525 (full 191): garbage-first heap total 31457280K, used 24727426K [0x00007f8344000000, 0x00007f8ac4000000, 0x00007f8ac4000000) region size 8192K, 54 young (442368K), 54 survivors (442368K) compacting perm gen total 524288K, used 152746K [0x00007f8ac4000000, 0x00007f8ae4000000, 0x00007f8ae4000000) the space 524288K, 29% used [0x00007f8ac4000000, 0x00007f8acd52ab68, 0x00007f8acd52ac00, 0x00007f8ae4000000) No shared spaces configured. 2013-06-11T12:07:03.152+0300: 18176.079: [Full GC 24147M->14651M(30720M), 59.0086320 secs] [Times: user=82.99 sys=0.43, real=59.00 secs] Heap after GC invocations=526 (full 192): garbage-first heap total 31457280K, used 15003001K [0x00007f8344000000, 0x00007f8ac4000000, 0x00007f8ac4000000) region size 8192K, 0 young (0K), 0 survivors (0K) compacting perm gen total 524288K, used 152746K [0x00007f8ac4000000, 0x00007f8ae4000000, 0x00007f8ae4000000) the space 524288K, 29% used [0x00007f8ac4000000, 0x00007f8acd52ab68, 0x00007f8acd52ac00, 0x00007f8ae4000000) No shared spaces configured. } ** Martin 2013/6/10 Martin Makundi : > Ok, will rebuild it. However, it does work if I explicitly > instantiate, for example a huge string or something. > > And I can see it working on the logs: > 2013-06-10T17:18:35.179+0300 - 37sec - at > java.lang.ThreadLocal$ThreadLocalMap.access$100(ThreadLocal.java:261) > at java.lang.ThreadLocal.set(ThreadLocal.java:183) > at com.google.monitoring.runtime.instrumentation.AllocationRecorder.recordAllocation(Unknown > Source) > at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1517) > at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1493) > at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1416) > > Nevertheless, I rebuilt it and will investigate logs again tomorrow. > > ** > Martin > > 2013/6/10 Thomas Schatzl : >> Hi, >> >> On Mon, 2013-06-10 at 16:32 +0300, Martin Makundi wrote: >>> Hi! >>> >>> >> Does this make sense? >>> > >>> > To me, the log indicates that there are multiple threads trying to >>> > allocate large objects, all of them failing to do so, only then >>> > triggering the gc. >>> >>> Sizes of the allocations seem too close to each other to be multiple threads. 
>>> >>> Is it possible that the gc is racing against itself in some >>> re-allocation algorithm? >> >> The allocation after the full gc occurs inside the safepoint - so no. >> >> The collector tries this allocation twice before actually doing the >> collection, both of them failing. So you get two sets of messages. I.e. >> first the gc tries outside the safepoint and fails, then schedules a >> safepoint. Within that safepoint the code tries again first in the hope >> that another previous safepoint might have freed enough space. >> >> This explains the messages, as every time you get the "attempt heap >> expansion", "expand the heap" and "did not expand the heap" message. >> >>> > I assume you checked that the log messages are not possibly printed >>> > after the gc? The instrumentation invokes the allocation sampler after >>> > allocation. So in this case, the threads will block at the allocation, >>> > wait for gc, and only after that print the expected messages. There may >>> > also be a delay between the gc output and the allocation information. >>> >>> I did check, not even a single 30mb allocation in today's log. >>> >>> > Maybe looking at the code for the instrumentation helps finding this >>> > issue (this is the first one such bytecode rewriting tool that looked >>> > simple to use that I found as example). >>> >>> You mean change instrumentation or debug it? We use exactly this one: >>> https://code.google.com/p/java-allocation-instrumenter/ >> >> Something like that; I really do not know why the messages do not show >> up in the log. The only idea I would have at the moment is that there is >> something wrong with the instrumentation, i.e. for some reason not all >> allocation sites are instrumented. >> >> Maybe rebuilding this instrumenter with current libraries may help. >> >> hth, >> Thomas >> >>