From jon.masamitsu at oracle.com  Mon Jun  2 22:52:14 2014
From: jon.masamitsu at oracle.com (Jon Masamitsu)
Date: Mon, 02 Jun 2014 15:52:14 -0700
Subject: Minor GC difference Java 7 vs Java 8
In-Reply-To: References: , <53879CEB.2070803@Oracle.COM>
Message-ID: <538D001E.1050801@oracle.com>

On 06/01/2014 05:14 PM, Chris Hurst wrote:
> Hi,
>
> Thanks for the replies. I've been stuck on this issue for about a year
> and had raised it with Oracle support without getting anywhere, but last
> weekend I managed to get a lot further with it.
>
> I wrote a trivial program that continuously fills the young gen and
> releases the garbage, for use with some DTrace tests, and it showed
> similar spikes. From there I could narrow the issue down to the number
> of parallel GC threads: anything less than the number of cores (i.e.
> cores-1) removed the spike. For the test program, normal young GCs
> oscillated around 1ms but spiked at about 15ms (need to check).
> Reducing the parallel threads worked on the real application in a
> similar way.
> We managed to identify some very minor tasks (so small that they
> weren't showing up on some of our less fine-grained CPU monitoring)
> that occasionally competed for CPU; the effect was relatively
> surprising, but now that we understand the cause we can tune the
> Java 6 GC better.
> The spikes were again larger than I would have expected, and they all
> appear to be very close in size. I wouldn't have predicted this from
> the issue, but that's fine ;-)
>
> Currently the Java 7 version is still not quite as good on overall
> throughput when not spiking, though I will recheck these results, as
> our most recent tests were around tuning Java 6 with the new info.
> We're happy with our Java 6 GC performance.
>
> Although we can now reduce these already rare spikes (potentially to
> zero), I can't 100% guarantee they won't occur, so I would still like
> to understand why Java 7 appears to handle this scenario less
> efficiently.
>
> Using DTrace we were mostly seeing yields, and a stack trace pointed
> us toward some JDK 7 changes and some newer Java options that might be
> related ?? ...
>
> taskqueue.cpp
>
> libc.so.1`lwp_yield+0x15 libjvm.so`__1cCosFyield6F_v_+0x257
> libjvm.so`__1cWParallelTaskTerminatorFyield6M_v_+0x18
> libjvm.so`__1cWParallelTaskTerminatorRoffer_termination6MpnUTerminatorTerminator__b_+0xe8
> libjvm.so`__1cJStealTaskFdo_it6MpnNGCTaskManager_I_v_+0x378
> libjvm.so`__1cMGCTaskThreadDrun6M_v_+0x19f libjvm.so`java_start+0x1f2
> libc.so.1`_thr_setup+0x4e libc.so.1`_lwp_start 17:29
>
> uintx WorkStealingHardSpins         = 4096   {experimental}
> uintx WorkStealingSleepMillis       = 1      {experimental}
> uintx WorkStealingSpinToYieldRatio  = 10     {experimental}
> uintx WorkStealingYieldsBeforeSleep = 5000   {experimental}
>
> I haven't had a chance to play with these yet, but could they be
> involved, e.g. JDK 7 tuned to be more friendly to other applications
> at the cost of latency (spin to yield)? Would that make sense?

Chris,

My best recollection is that there was a performance regression
reported internally and the change to 5000 was made to fix that
regression. Increasing the number of yields done before a sleep made
this code work more like the previous behavior. Let me know if you
need better information and I can see what I can dig up.

By the way, when you tuned down the number of ParallelGCThreads, you
saw little or no increase in the STW pause times?

You're using UseParallelGC?

Jon
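As a reproduction sketch (not Chris's actual test code), a minimal young-gen churn program along the lines he describes above could look like the following. The 1 KB allocation size and the 1-in-256 retention ratio are arbitrary choices; the point is only to allocate short-lived garbage at a steady rate so minor GCs run continuously, e.g. under -XX:+UseParallelOldGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps.

public class YoungGenChurn {
    // Small live set so that almost everything allocated dies young.
    private static final int LIVE_SLOTS = 1024;
    private static final Object[] live = new Object[LIVE_SLOTS];

    public static void main(String[] args) {
        long allocations = 0;
        long lastReport = System.nanoTime();
        while (true) {
            // Allocate a short-lived 1 KB object; most of these die in eden.
            byte[] garbage = new byte[1024];
            // Retain roughly 1 in 256 so the object graph is not completely trivial.
            if ((allocations & 0xFF) == 0) {
                live[(int) ((allocations >>> 8) % LIVE_SLOTS)] = garbage;
            }
            allocations++;
            // Report about once per second; GC pauses show up as gaps between reports.
            long now = System.nanoTime();
            if (now - lastReport > 1_000_000_000L) {
                System.out.println("allocated ~" + allocations + " arrays so far");
                lastReport = now;
            }
        }
    }
}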
> We would like to move to Java 7 for support reasons; also, as we are
> on Solaris, the extra memory overhead of Java 8 (64-bit only), even
> with compressed oops, gives us another latency hit.
>
> Chris
>
> PS -XX:+AlwaysPreTouch is on.
>
> > Date: Thu, 29 May 2014 13:47:39 -0700
> > From: Peter.B.Kessler at Oracle.COM
> > To: christopherhurst at hotmail.com; hotspot-gc-use at openjdk.java.net
> > Subject: Re: Minor GC difference Java 7 vs Java 8
> >
> > Are the -XX:+PrintGCDetails "[Times: user=0.01 sys=0.00, real=0.03
> > secs]" reports for the long pauses different from the short pauses?
> > I'm hoping for some anomalous sys time, or user/real ratio, that would
> > indicate it was something happening on the machine that is interfering
> > with the collector. But you'd think that would show up as occasional
> > 15ms blips in your message processing latency outside of when the
> > collector goes off.
> >
> > Does -XX:+PrintHeapAtGC show anything anomalous about the space
> > occupancy after the long pauses? E.g., more objects getting copied to
> > the survivor space, or promoted to the old generation? You could infer
> > the numbers from -XX:+PrintGCDetails output if you didn't want to deal
> > with the volume produced by -XX:+PrintHeapAtGC.
> >
> > You don't say how large or how stable your old generation size is.
> > If you have to get new pages from the OS to expand the old generation,
> > or give pages back to the OS because the old generation can shrink,
> > that's extra work. You can infer this traffic from -XX:+PrintHeapAtGC
> > output by looking at the "committed" values for the generations. E.g.,
> > in "ParOldGen total 43008K, used 226K [0xba400000, 0xbce00000,
> > 0xe4e00000)" those three hex numbers are the start address for the
> > generation, the end of the committed memory for that generation, and
> > the end of the reserved memory for that generation. There's a similar
> > report for the young generation. Running with -Xms equal to -Xmx
> > should prevent pages from being acquired from or returned to the OS
> > during the run.
> >
> > Are you running with -XX:+AlwaysPreTouch? Even if you've reserved
> > and committed the address space, the first time you touch new pages
> > the OS wants to zero them, which takes time. That flag forces all the
> > zeroing at initialization. If you know your page size, you should be
> > able to see the generations (mostly the old generation) crossing a
> > page boundary for the first time in the -XX:+PrintHeapAtGC output.
> >
> > Or it could be some change in the collector between JDK 6 and JDK 7.
> >
> > Posting some log snippets might let sharper eyes see something.
> >
> > ... peter
> >
> > On 04/30/14 07:58, Chris Hurst wrote:
> > > Hi,
> > >
> > > Has anyone seen anything similar to this ...
> > >
> > > On a Java 6 (range of versions, 32-bit Solaris) application, using
> > > parallel old GC, non-adaptive: under a very heavy performance test
> > > load we see minor GCs around the 5ms mark, and some very rare (say
> > > 3 or 4 instances in 12 hours) pauses of about 20ms. The number of
> > > such pauses is random (though always few compared with the total
> > > number of GCs) and they are large, ~20ms (this value appears to be
> > > the same for all such points). We have a large number of minor GCs
> > > in our runs, and only a full GC at startup. These freak GCs can be
> > > bunched or spread out, and we can run for many hours without one
> > > (though still doing minor GCs).
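A quick aside on Peter's point about the committed values: the [start, committed-end, reserved-end) addresses in the -XX:+PrintHeapAtGC output can be diffed mechanically. Below is a rough sketch, not an official tool; it assumes the generation lines look like the ParOldGen example quoted above, which may not hold for every collector or JDK build.

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CommittedSizes {
    // Matches lines such as:
    //   ParOldGen total 43008K, used 226K [0xba400000, 0xbce00000, 0xe4e00000)
    private static final Pattern GEN = Pattern.compile(
        "(\\w+)\\s+total\\s+\\d+K, used \\d+K \\[0x([0-9a-fA-F]+), 0x([0-9a-fA-F]+), 0x([0-9a-fA-F]+)\\)");

    public static void main(String[] args) throws Exception {
        try (BufferedReader in = new BufferedReader(new FileReader(args[0]))) {
            String line;
            while ((line = in.readLine()) != null) {
                Matcher m = GEN.matcher(line);
                if (m.find()) {
                    long start = Long.parseLong(m.group(2), 16);
                    long committedEnd = Long.parseLong(m.group(3), 16);
                    long reservedEnd = Long.parseLong(m.group(4), 16);
                    // A change in the committed figure between successive GCs means
                    // pages were acquired from or returned to the OS at that point.
                    System.out.printf("%-12s committed=%dK reserved=%dK%n",
                        m.group(1), (committedEnd - start) / 1024, (reservedEnd - start) / 1024);
                }
            }
        }
    }
}

Run as "java CommittedSizes gc.log" and watch whether the committed size of any generation moves during the run.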
> > > > > > What's odd is that if I use Java 7 (range of versions 32bit) the > result is very close but the spikes (1 or 2 arguably less) are now > 30-40ms (depends on run arguably even rarer). Has anyone experienced > anything similar why would Java 7 up to double a minor GC / The GC > throughput is approximately the same arguably 7 is better throughput > just but that freak minor GC makes it usable due to latency. > > > > > > In terms of the change in spike height (20 (J6)vs40(J7)) this is > very reproducible though the number of points and when they occur > varies slightly. The over all GC graph , throughput is similar > otherwise as is the resultant memory dump at the end. The test should > be constant load, multiple clients just doing the same thing over and > over. > > > > > > Has anyone seen anything similar, I was hoping someone might have > seen a change in defaults, thread timeout, default data structure size > change that would account for this. I was hoping the marked increase > might be a give away to someone as its way off our average minor GC time. > > > > > > We have looked at gclogs, heap dumps, processor activity, > background processes, amount of disc access, safepoints etc etc. , we > trace message rate into out of the application for variation, compare > heap dumps at end etc. nothing stands out so far. > > > > > > Chris > > > > > > > > > > > > > > > > > > > > > > > > > > > _______________________________________________ > > > hotspot-gc-use mailing list > > > hotspot-gc-use at openjdk.java.net > > > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jon.masamitsu at oracle.com Tue Jun 3 17:37:24 2014 From: jon.masamitsu at oracle.com (Jon Masamitsu) Date: Tue, 03 Jun 2014 10:37:24 -0700 Subject: Minor GC difference Java 7 vs Java 8 In-Reply-To: References: , <53879CEB.2070803@Oracle.COM> , <538D001E.1050801@oracle.com> Message-ID: <538E07D4.2020407@oracle.com> On 06/03/2014 06:51 AM, Chris Hurst wrote: > Hi, > > Reducing parallel threads on the simple testbed sample code actually > reduced minor GC STW's (presumably in this scenario multithreading is > inefficient due to the simplicity of the task), however the reverse > was true for the real application though for a signal thread reduction > the difference is hard to measure sub ms if at all. I'd expect the > difference to be because the real application will have a more complex > object graph and be tenuring with a very small amount of promotion to > old. If a spike does occur with parallel gc threads reduced ie > increasing other process activity on the box the duration of the spike > appears unaffected by the reduction in parallel GC threads which I > guess I would expect. > > Any suggestions as to what value to use for > WorkStealingYieldsBeforeSleep, otherwise I'll just trying doubling it > ? If you have any additional notes , details on this change that would > be most helpful. WorkStealingYieldsBeforeSleep is one of those flags that just depends on your needs so I don't have any good suggestion. Jon > > Chris > > PS This particular application uses the following (we regularly test > other GC permutations including G1 / CMS, our goal is lower latency) .. 
> > -Xms1536m > -Xmx1536m > -XX:+PrintCommandLineFlags > -XX:+UseParallelOldGC > -XX:+UnlockDiagnosticVMOptions > -XX:PermSize=50M > -XX:-UseAdaptiveSizePolicy > -XX:+AlwaysPreTouch > -XX:MaxTenuringThreshold=15 > -XX:InitialTenuringThreshold=15 > -XX:+DTraceMonitorProbes > -XX:+ExtendedDTraceProbes > -Dsun.rmi.dgc.client.gcInterval=604800000 > -Dsun.rmi.dgc.server.gcInterval=604800000 > > ------------------------------------------------------------------------ > Date: Mon, 2 Jun 2014 15:52:14 -0700 > From: jon.masamitsu at oracle.com > To: christopherhurst at hotmail.com > CC: hotspot-gc-use at openjdk.java.net > Subject: Re: Minor GC difference Java 7 vs Java 8 > > > On 06/01/2014 05:14 PM, Chris Hurst wrote: > > Hi, > > Thanks for the replies, I've been stuck on this issue for about a > year and had raised it with Oracle support but hadn't got anywhere > but last weekend I managed to get a lot further with it .. > > I wrote a trivial program to continuously fill young gen and > release the garbage for use with some DTrace tests and this showed > a similar issue spikes wise from there I could work out the issue > as the parallel GC threads, i.e. anything less than number of > cores removed the spike (ie cores-1), for the test program normal > young GC oscillated about 1ms but spiked at about 15ms(need to > check) (Reducing the parallel threads worked on the real > application in a similar way). > We managed to identify some very minor tasks (they were so small > they weren't showing up on some of our less fine grained CPU > monitoring) that occasionally competed for CPU, the effect was > surprising relatively but now we understand the cause we can tune > the Java 6 GC better. > The spikes were again larger than I would have expected and all > appear to be every close in size, I wouldn't have predicted this > from the issue but that's fine ;-) > > Currently the Java 7 version is still not quite as good on overall > throughput when not spiking though I will recheck these results, > as our most recent tests were around tuning J6 with the new info. > We're happy with our Java 6 GC performance. > > Although we can now reduce these already rare spikes (potentially > to zero), I can't 100% guarantee they won't occur so I would still > like to understand why Java 7 appears to handle this scenario less > efficiently. > > Using Dtrace we were mostly seeing yields and looking at a stack > trace pointed us toward some JDK 7 changes and some newer java > options that might be related ?? ... > > taskqueue.cpp > > libc.so.1`lwp_yield+0x15 libjvm.so`__1cCosFyield6F_v_+0x257 > libjvm.so`__1cWParallelTaskTerminatorFyield6M_v_+0x18 > libjvm.so`__1cWParallelTaskTerminatorRoffer_termination6MpnUTerminatorTerminator__b_+0xe8 > libjvm.so`__1cJStealTaskFdo_it6MpnNGCTaskManager_I_v_+0x378 > libjvm.so`__1cMGCTaskThreadDrun6M_v_+0x19f > libjvm.so`java_start+0x1f2 libc.so.1`_thr_setup+0x4e > libc.so.1`_lwp_start 17:29 > > uintx WorkStealingHardSpins = 4096 > {experimental} > uintx WorkStealingSleepMillis = 1 > {experimental} > uintx WorkStealingSpinToYieldRatio = 10 > {experimental} > uintx WorkStealingYieldsBeforeSleep = 5000 > {experimental} > > I haven't had a chance to play with these as yet but could these > be involved eg j7 tuned to be more friendly to other applications > at the cost of latency (spin to yield) ? Would that make sense ? > > > Chris, > > My best recollection is that there was a performance regression > reported internally and the change to 5000 was to fix > that regression. 
Increasing the number of yield's done before > a sleep made this code work more like the previous behavior. > Let me know if you need better information and I can see what > I can dig up. > > By the way, when you tuned down the number of ParallelGCThreads, > you saw little or no increase in the STW pause times? > > You're using UseParallelGC? > > Jon > > We would like to move to Java 7 for support reasons, also as we > are on Solaris the extra memory over head of J8 (64bit only) even > with compressed oops gives us another latency hit. > > Chris > > PS -XX:+AlwaysPreTouch is on. > > > Date: Thu, 29 May 2014 13:47:39 -0700 > > From: Peter.B.Kessler at Oracle.COM > > To: christopherhurst at hotmail.com > ; > hotspot-gc-use at openjdk.java.net > > > Subject: Re: Minor GC difference Java 7 vs Java 8 > > > > Are the -XX:+PrintGCDetails "[Times: user=0.01 sys=0.00, > real=0.03 secs]" reports for the long pauses different from the > short pauses? I'm hoping for some anomalous sys time, or user/real > ratio, that would indicate it was something happening on the > machine that is interfering with the collector. But you'd think > that would show up as occasional 15ms blips in your message > processing latency outside of when the collector goes off. > > > > Does -XX:+PrintHeapAtGC show anything anomalous about the space > occupancy after the long pauses? E.g., more objects getting copied > to the survivor space, or promoted to the old generation? You > could infer the numbers from -XX:+PrintGCDetails output if you > didn't want to deal with the volume produced by -XX:+PrintHeapAtGC. > > > > You don't say how large or how stable your old generation size > is. If you have to get new pages from the OS to expand the old > generation, or give pages back to the OS because the old > generation can shrink, that's extra work. You can infer this > traffic from -XX:+PrintHeapAtGC output by looking at the > "committed" values for the generations. E.g., in "ParOldGen total > 43008K, used 226K [0xba400000, 0xbce00000, 0xe4e00000)" those > three hex numbers are the start address for the generation, the > end of the committed memory for that generation, and the end of > the reserved memory for that generation. There's a similar report > for the young generation. Running with -Xms equal to -Xmx should > prevent pages from being acquired from or returned to the OS > during the run. > > > > Are you running with -XX:+AlwaysPreTouch? Even if you've > reserved and committed the address space, the first time you touch > new pages the OS wants to zero them, which takes time. That flags > forces all the zeroing at initialization. If you know your page > size, you should be able to see the generations (mostly the old > generation) crossing a page boundary for the first time in the > -XX:+PrintHeapAtGC output. > > > > Or it could be some change in the collector between JDK-6 and JDK-7. > > > > Posting some log snippets might let sharper eyes see something. > > > > ... peter > > > > On 04/30/14 07:58, Chris Hurst wrote: > > > Hi, > > > > > > Has anyone seen anything similar to this ... > > > > > > On java 6 (range of versions 32bit Solaris) application , > using parallel old gc, non adapative. Using a very heavy test > performance load we see minor GC's around the 5ms mark and some > very rare say 3or4 ish instances in 12 hours say 20ms pauses the > number of pauses is random (though always few compares with the > total number of GC's) and large ~20ms (this value appears the same > for all such points.) 
We have a large number of minor GC's in our > runs, only a full GC at startup. These freak GC's can be bunched > or spread out and we can run for many hours without one (though > doing minor GC's). > > > > > > What's odd is that if I use Java 7 (range of versions 32bit) > the result is very close but the spikes (1 or 2 arguably less) are > now 30-40ms (depends on run arguably even rarer). Has anyone > experienced anything similar why would Java 7 up to double a minor > GC / The GC throughput is approximately the same arguably 7 is > better throughput just but that freak minor GC makes it usable due > to latency. > > > > > > In terms of the change in spike height (20 (J6)vs40(J7)) this > is very reproducible though the number of points and when they > occur varies slightly. The over all GC graph , throughput is > similar otherwise as is the resultant memory dump at the end. The > test should be constant load, multiple clients just doing the same > thing over and over. > > > > > > Has anyone seen anything similar, I was hoping someone might > have seen a change in defaults, thread timeout, default data > structure size change that would account for this. I was hoping > the marked increase might be a give away to someone as its way off > our average minor GC time. > > > > > > We have looked at gclogs, heap dumps, processor activity, > background processes, amount of disc access, safepoints etc etc. , > we trace message rate into out of the application for variation, > compare heap dumps at end etc. nothing stands out so far. > > > > > > Chris > > > > > > > > > > > > > > > > > > > > > > > > > > > _______________________________________________ > > > hotspot-gc-use mailing list > > > hotspot-gc-use at openjdk.java.net > > > > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From christopherhurst at hotmail.com Tue Jun 3 13:51:31 2014 From: christopherhurst at hotmail.com (Chris Hurst) Date: Tue, 3 Jun 2014 13:51:31 +0000 Subject: Minor GC difference Java 7 vs Java 8 In-Reply-To: <538D001E.1050801@oracle.com> References: , <53879CEB.2070803@Oracle.COM> , <538D001E.1050801@oracle.com> Message-ID: Hi, Reducing parallel threads on the simple testbed sample code actually reduced minor GC STW's (presumably in this scenario multithreading is inefficient due to the simplicity of the task), however the reverse was true for the real application though for a signal thread reduction the difference is hard to measure sub ms if at all. I'd expect the difference to be because the real application will have a more complex object graph and be tenuring with a very small amount of promotion to old. If a spike does occur with parallel gc threads reduced ie increasing other process activity on the box the duration of the spike appears unaffected by the reduction in parallel GC threads which I guess I would expect. Any suggestions as to what value to use for WorkStealingYieldsBeforeSleep, otherwise I'll just trying doubling it ? If you have any additional notes , details on this change that would be most helpful. Chris PS This particular application uses the following (we regularly test other GC permutations including G1 / CMS, our goal is lower latency) .. 
-Xms1536m -Xmx1536m -XX:+PrintCommandLineFlags -XX:+UseParallelOldGC -XX:+UnlockDiagnosticVMOptions -XX:PermSize=50M -XX:-UseAdaptiveSizePolicy -XX:+AlwaysPreTouch -XX:MaxTenuringThreshold=15 -XX:InitialTenuringThreshold=15 -XX:+DTraceMonitorProbes -XX:+ExtendedDTraceProbes -Dsun.rmi.dgc.client.gcInterval=604800000 -Dsun.rmi.dgc.server.gcInterval=604800000 Date: Mon, 2 Jun 2014 15:52:14 -0700 From: jon.masamitsu at oracle.com To: christopherhurst at hotmail.com CC: hotspot-gc-use at openjdk.java.net Subject: Re: Minor GC difference Java 7 vs Java 8 On 06/01/2014 05:14 PM, Chris Hurst wrote: Hi, Thanks for the replies, I've been stuck on this issue for about a year and had raised it with Oracle support but hadn't got anywhere but last weekend I managed to get a lot further with it .. I wrote a trivial program to continuously fill young gen and release the garbage for use with some DTrace tests and this showed a similar issue spikes wise from there I could work out the issue as the parallel GC threads, i.e. anything less than number of cores removed the spike (ie cores-1), for the test program normal young GC oscillated about 1ms but spiked at about 15ms(need to check) (Reducing the parallel threads worked on the real application in a similar way). We managed to identify some very minor tasks (they were so small they weren't showing up on some of our less fine grained CPU monitoring) that occasionally competed for CPU, the effect was surprising relatively but now we understand the cause we can tune the Java 6 GC better. The spikes were again larger than I would have expected and all appear to be every close in size, I wouldn't have predicted this from the issue but that's fine ;-) Currently the Java 7 version is still not quite as good on overall throughput when not spiking though I will recheck these results, as our most recent tests were around tuning J6 with the new info. We're happy with our Java 6 GC performance. Although we can now reduce these already rare spikes (potentially to zero), I can't 100% guarantee they won't occur so I would still like to understand why Java 7 appears to handle this scenario less efficiently. Using Dtrace we were mostly seeing yields and looking at a stack trace pointed us toward some JDK 7 changes and some newer java options that might be related ?? ... taskqueue.cpp libc.so.1`lwp_yield+0x15 libjvm.so`__1cCosFyield6F_v_+0x257 libjvm.so`__1cWParallelTaskTerminatorFyield6M_v_+0x18 libjvm.so`__1cWParallelTaskTerminatorRoffer_termination6MpnUTerminatorTerminator__b_+0xe8 libjvm.so`__1cJStealTaskFdo_it6MpnNGCTaskManager_I_v_+0x378 libjvm.so`__1cMGCTaskThreadDrun6M_v_+0x19f libjvm.so`java_start+0x1f2 libc.so.1`_thr_setup+0x4e libc.so.1`_lwp_start 17:29 uintx WorkStealingHardSpins = 4096 {experimental} uintx WorkStealingSleepMillis = 1 {experimental} uintx WorkStealingSpinToYieldRatio = 10 {experimental} uintx WorkStealingYieldsBeforeSleep = 5000 {experimental} I haven't had a chance to play with these as yet but could these be involved eg j7 tuned to be more friendly to other applications at the cost of latency (spin to yield) ? Would that make sense ? Chris, My best recollection is that there was a performance regression reported internally and the change to 5000 was to fix that regression. Increasing the number of yield's done before a sleep made this code work more like the previous behavior. Let me know if you need better information and I can see what I can dig up. 
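For anyone who wants to experiment with these work-stealing knobs: they are marked {experimental} in the flag dump above, so on a stock HotSpot build they have to be unlocked first. Using the doubling Chris mentions (rather than any value recommended in this thread), the command line would gain something like:

-XX:+UnlockExperimentalVMOptions -XX:WorkStealingYieldsBeforeSleep=10000

Adding -XX:+PrintFlagsFinal is an easy way to confirm the new value actually took effect.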
By the way, when you tuned down the number of ParallelGCThreads, you saw little or no increase in the STW pause times? You're using UseParallelGC? Jon We would like to move to Java 7 for support reasons, also as we are on Solaris the extra memory over head of J8 (64bit only) even with compressed oops gives us another latency hit. Chris PS -XX:+AlwaysPreTouch is on. > Date: Thu, 29 May 2014 13:47:39 -0700 > From: Peter.B.Kessler at Oracle.COM > To: christopherhurst at hotmail.com; hotspot-gc-use at openjdk.java.net > Subject: Re: Minor GC difference Java 7 vs Java 8 > > Are the -XX:+PrintGCDetails "[Times: user=0.01 sys=0.00, real=0.03 secs]" reports for the long pauses different from the short pauses? I'm hoping for some anomalous sys time, or user/real ratio, that would indicate it was something happening on the machine that is interfering with the collector. But you'd think that would show up as occasional 15ms blips in your message processing latency outside of when the collector goes off. > > Does -XX:+PrintHeapAtGC show anything anomalous about the space occupancy after the long pauses? E.g., more objects getting copied to the survivor space, or promoted to the old generation? You could infer the numbers from -XX:+PrintGCDetails output if you didn't want to deal with the volume produced by -XX:+PrintHeapAtGC. > > You don't say how large or how stable your old generation size is. If you have to get new pages from the OS to expand the old generation, or give pages back to the OS because the old generation can shrink, that's extra work. You can infer this traffic from -XX:+PrintHeapAtGC output by looking at the "committed" values for the generations. E.g., in "ParOldGen total 43008K, used 226K [0xba400000, 0xbce00000, 0xe4e00000)" those three hex numbers are the start address for the generation, the end of the committed memory for that generation, and the end of the reserved memory for that generation. There's a similar report for the young generation. Running with -Xms equal to -Xmx should prevent pages from being acquired from or returned to the OS during the run. > > Are you running with -XX:+AlwaysPreTouch? Even if you've reserved and committed the address space, the first time you touch new pages the OS wants to zero them, which takes time. That flags forces all the zeroing at initialization. If you know your page size, you should be able to see the generations (mostly the old generation) crossing a page boundary for the first time in the -XX:+PrintHeapAtGC output. > > Or it could be some change in the collector between JDK-6 and JDK-7. > > Posting some log snippets might let sharper eyes see something. > > ... peter > > On 04/30/14 07:58, Chris Hurst wrote: > > Hi, > > > > Has anyone seen anything similar to this ... > > > > On java 6 (range of versions 32bit Solaris) application , using parallel old gc, non adapative. Using a very heavy test performance load we see minor GC's around the 5ms mark and some very rare say 3or4 ish instances in 12 hours say 20ms pauses the number of pauses is random (though always few compares with the total number of GC's) and large ~20ms (this value appears the same for all such points.) We have a large number of minor GC's in our runs, only a full GC at startup. These freak GC's can be bunched or spread out and we can run for many hours without one (though doing minor GC's). 
> > > > What's odd is that if I use Java 7 (range of versions 32bit) the result is very close but the spikes (1 or 2 arguably less) are now 30-40ms (depends on run arguably even rarer). Has anyone experienced anything similar why would Java 7 up to double a minor GC / The GC throughput is approximately the same arguably 7 is better throughput just but that freak minor GC makes it usable due to latency. > > > > In terms of the change in spike height (20 (J6)vs40(J7)) this is very reproducible though the number of points and when they occur varies slightly. The over all GC graph , throughput is similar otherwise as is the resultant memory dump at the end. The test should be constant load, multiple clients just doing the same thing over and over. > > > > Has anyone seen anything similar, I was hoping someone might have seen a change in defaults, thread timeout, default data structure size change that would account for this. I was hoping the marked increase might be a give away to someone as its way off our average minor GC time. > > > > We have looked at gclogs, heap dumps, processor activity, background processes, amount of disc access, safepoints etc etc. , we trace message rate into out of the application for variation, compare heap dumps at end etc. nothing stands out so far. > > > > Chris > > > > > > > > > > > > > > > > > > _______________________________________________ > > hotspot-gc-use mailing list > > hotspot-gc-use at openjdk.java.net > > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From martin.makundi at koodaripalvelut.com Sat Jun 7 08:03:58 2014 From: martin.makundi at koodaripalvelut.com (Martin Makundi) Date: Sat, 7 Jun 2014 11:03:58 +0300 Subject: Why does G1GC do Full GC when it only needs 40 bytes? Message-ID: Why does G1GC do Full GC when it only needs only 40 bytes? Is there a way to tune this so that it would try to free up "some" chunk of memory and escape the full gc when enough memory has been freed? This would lower the freeze time? See logs: [Times: user=12.89 sys=0.16, real=1.25 secs] 2014-06-06T17:11:48.727+0300: 34022.631: [GC concurrent-root-region-scan-start] 2014-06-06T17:11:48.727+0300: 34022.631: [GC concurrent-root-region-scan-end, 0.0000180 secs] 2014-06-06T17:11:48.727+0300: 34022.631: [GC concurrent-mark-start] 34022.632: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: allocation request failed, allocation request: 40 bytes] 34022.632: [G1Ergonomics (Heap Sizing) expand the heap, requested expansion amount: 8388608 bytes, attempted expansion amount: 8388608 bytes] 34022.632: [G1Ergonomics (Heap Sizing) did not expand the heap, reason: heap expansion operation failed] {Heap before GC invocations=1867 (full 1): garbage-first heap total 31457280K, used 31453781K [0x00007f1724000000, 0x00007f1ea4000000, 0x00007f1ea4000000) region size 8192K, 0 young (0K), 0 survivors (0K) compacting perm gen total 524288K, used 166271K [0x00007f1ea4000000, 0x00007f1ec4000000, 0x00007f1ec4000000) the space 524288K, 31% used [0x00007f1ea4000000, 0x00007f1eae25fdc8, 0x00007f1eae25fe00, 0x00007f1ec4000000) No shared spaces configured. 
2014-06-06T17:11:48.728+0300: 34022.632: [Full GC 29G->13G(30G), 47.7918670 secs]


Settings:

-server -XX:InitiatingHeapOccupancyPercent=0 -Xss4096k -XX:MaxPermSize=512m -XX:PermSize=512m -Xms20G -Xmx30G -Xnoclassgc -XX:-OmitStackTraceInFastThrow -XX:+UseNUMA -XX:+UseFastAccessorMethods -XX:ReservedCodeCacheSize=128m -XX:-UseStringCache -XX:+UseGCOverheadLimit -Duser.timezone=EET -XX:+UseCompressedOops -XX:+DisableExplicitGC -XX:+AggressiveOpts -XX:CMSInitiatingOccupancyFraction=70 -XX:+ParallelRefProcEnabled -XX:+UseAdaptiveSizePolicy -XX:MaxGCPauseMillis=500 -XX:+UseG1GC -XX:G1HeapRegionSize=8M -XX:GCPauseIntervalMillis=10000 -XX:+PrintGCDetails -XX:+PrintHeapAtGC -XX:+PrintAdaptiveSizePolicy -XX:+PrintGCDateStamps -XX:+PrintGC -Xloggc:gc.log

From yu.zhang at oracle.com  Mon Jun  9 05:55:55 2014
From: yu.zhang at oracle.com (YU ZHANG)
Date: Sun, 08 Jun 2014 22:55:55 -0700
Subject: Why does G1GC do Full GC when it only needs 40 bytes?
In-Reply-To: References: Message-ID: <53954C6B.30406@oracle.com>

Martin,

The log shows that the full GC happened when G1 tried to satisfy a
40-byte allocation request. -XX:InitiatingHeapOccupancyPercent=0 will
make concurrent GCs trigger constantly, which is probably too often.

What is the jdk version? Can you share the gc log?

Thanks,
Jenny

On 6/7/2014 1:03 AM, Martin Makundi wrote:
> Why does G1GC do Full GC when it only needs 40 bytes?
>
> Is there a way to tune this so that it would try to free up "some"
> chunk of memory and escape the full gc when enough memory has been
> freed? This would lower the freeze time?
>
> See logs:
>
> [Times: user=12.89 sys=0.16, real=1.25 secs]
> 2014-06-06T17:11:48.727+0300: 34022.631: [GC concurrent-root-region-scan-start]
> 2014-06-06T17:11:48.727+0300: 34022.631: [GC concurrent-root-region-scan-end, 0.0000180 secs]
> 2014-06-06T17:11:48.727+0300: 34022.631: [GC concurrent-mark-start]
> 34022.632: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: allocation request failed, allocation request: 40 bytes]
> 34022.632: [G1Ergonomics (Heap Sizing) expand the heap, requested expansion amount: 8388608 bytes, attempted expansion amount: 8388608 bytes]
> 34022.632: [G1Ergonomics (Heap Sizing) did not expand the heap, reason: heap expansion operation failed]
> {Heap before GC invocations=1867 (full 1):
> garbage-first heap total 31457280K, used 31453781K [0x00007f1724000000, 0x00007f1ea4000000, 0x00007f1ea4000000)
> region size 8192K, 0 young (0K), 0 survivors (0K)
> compacting perm gen total 524288K, used 166271K [0x00007f1ea4000000, 0x00007f1ec4000000, 0x00007f1ec4000000)
> the space 524288K, 31% used [0x00007f1ea4000000, 0x00007f1eae25fdc8, 0x00007f1eae25fe00, 0x00007f1ec4000000)
> No shared spaces configured.
> 2014-06-06T17:11:48.728+0300: 34022.632: [Full GC 29G->13G(30G), > 47.7918670 secs] > > > Settings: > > -server -XX:InitiatingHeapOccupancyPercent=0 -Xss4096k > -XX:MaxPermSize=512m -XX:PermSize=512m -Xms20G -Xmx30G -Xnoclassgc > -XX:-OmitStackTraceInFastThrow -XX:+UseNUMA > -XX:+UseFastAccessorMethods -XX:ReservedCodeCacheSize=128m > -XX:-UseStringCache -XX:+UseGCOverheadLimit -Duser.timezone=EET > -XX:+UseCompressedOops -XX:+DisableExplicitGC -XX:+AggressiveOpts > -XX:CMSInitiatingOccupancyFraction=70 -XX:+ParallelRefProcEnabled > -XX:+UseAdaptiveSizePolicy -XX:MaxGCPauseMillis=500 -XX:+UseG1GC > -XX:G1HeapRegionSize=8M -XX:GCPauseIntervalMillis=10000 > -XX:+PrintGCDetails -XX:+PrintHeapAtGC -XX:+PrintAdaptiveSizePolicy > -XX:+PrintGCDateStamps -XX:+PrintGC -Xloggc:gc.log > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use -------------- next part -------------- An HTML attachment was scrubbed... URL: From martin.makundi at koodaripalvelut.com Mon Jun 9 14:29:28 2014 From: martin.makundi at koodaripalvelut.com (Martin Makundi) Date: Mon, 9 Jun 2014 17:29:28 +0300 Subject: Why does G1GC do Full GC when it only needs 40 bytes? In-Reply-To: References: Message-ID: 1. "-XX:InitiatingHeapOccupancyPercent=0 should trigger concurrent gcs too much. " We use InitiatingHeapOccupancyPercent to make sure the gc is always active. How is InitiatingHeapOccupancyPercent related to Full GC:s? 2. "What is the jdk version? java version "1.7.0_55" Java(TM) SE Runtime Environment (build 1.7.0_55-b13) Java HotSpot(TM) 64-Bit Server VM (build 24.55-b03, mixed mode) 3. Can you share the gc log?" Log is available at 81.22.250.165/log 4. I wonder if it is possible to have mixed set of different/automatically varying sizes of G1HeapRegionSize's small for smaller objects and larger for larger objects? ** Martin 2014-06-07 11:03 GMT+03:00 Martin Makundi < martin.makundi at koodaripalvelut.com>: > Why does G1GC do Full GC when it only needs only 40 bytes? > > Is there a way to tune this so that it would try to free up "some" chunk > of memory and escape the full gc when enough memory has been freed? This > would lower the freeze time? > > See logs: > > [Times: user=12.89 sys=0.16, real=1.25 secs] > 2014-06-06T17:11:48.727+0300: 34022.631: [GC > concurrent-root-region-scan-start] > 2014-06-06T17:11:48.727+0300: 34022.631: [GC > concurrent-root-region-scan-end, 0.0000180 secs] > 2014-06-06T17:11:48.727+0300: 34022.631: [GC concurrent-mark-start] > 34022.632: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: > allocation request failed, allocation request: 40 bytes] > 34022.632: [G1Ergonomics (Heap Sizing) expand the heap, requested > expansion amount: 8388608 bytes, attempted expansion amount: 8388608 bytes] > 34022.632: [G1Ergonomics (Heap Sizing) did not expand the heap, reason: > heap expansion operation failed] > {Heap before GC invocations=1867 (full 1): > garbage-first heap total 31457280K, used 31453781K [0x00007f1724000000, > 0x00007f1ea4000000, 0x00007f1ea4000000) > region size 8192K, 0 young (0K), 0 survivors (0K) > compacting perm gen total 524288K, used 166271K [0x00007f1ea4000000, > 0x00007f1ec4000000, 0x00007f1ec4000000) > the space 524288K, 31% used [0x00007f1ea4000000, 0x00007f1eae25fdc8, > 0x00007f1eae25fe00, 0x00007f1ec4000000) > No shared spaces configured. 
> 2014-06-06T17:11:48.728+0300: 34022.632: [Full GC 29G->13G(30G), > 47.7918670 secs] > > > Settings: > > -server -XX:InitiatingHeapOccupancyPercent=0 -Xss4096k > -XX:MaxPermSize=512m -XX:PermSize=512m -Xms20G -Xmx30G -Xnoclassgc > -XX:-OmitStackTraceInFastThrow -XX:+UseNUMA -XX:+UseFastAccessorMethods > -XX:ReservedCodeCacheSize=128m -XX:-UseStringCache -XX:+UseGCOverheadLimit > -Duser.timezone=EET -XX:+UseCompressedOops -XX:+DisableExplicitGC > -XX:+AggressiveOpts -XX:CMSInitiatingOccupancyFraction=70 > -XX:+ParallelRefProcEnabled -XX:+UseAdaptiveSizePolicy > -XX:MaxGCPauseMillis=500 -XX:+UseG1GC -XX:G1HeapRegionSize=8M > -XX:GCPauseIntervalMillis=10000 -XX:+PrintGCDetails -XX:+PrintHeapAtGC > -XX:+PrintAdaptiveSizePolicy -XX:+PrintGCDateStamps -XX:+PrintGC > -Xloggc:gc.log > -------------- next part -------------- An HTML attachment was scrubbed... URL: From yaoshengzhe at gmail.com Mon Jun 9 18:42:48 2014 From: yaoshengzhe at gmail.com (yao) Date: Mon, 9 Jun 2014 11:42:48 -0700 Subject: Why does G1GC do Full GC when it only needs 40 bytes? In-Reply-To: References: Message-ID: Hi Martin, Just quickly went through your gc log and following lines caught my attention. 2014-06-06T17:11:29.109+0300: 34003.013: [GC pause (young) 34003.013: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 127096, predicted base time: 293.06 ms, remaining time: 206.94 ms, target pause time: 500.00 ms] 34003.013: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 872 regions, survivors: 113 regions, predicted young region time: 880.35 ms] 34003.013: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 872 regions, survivors: 113 regions, old: 0 regions, predicted pause time: 1173.41 ms, target pause time: 500.00 ms] 34003.014: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: region allocation request failed, allocation request: 4194304 bytes] 34003.014: [G1Ergonomics (Heap Sizing) expand the heap, requested expansion amount: 4194304 bytes, attempted expansion amount: 8388608 bytes] 34003.014: [G1Ergonomics (Heap Sizing) did not expand the heap, reason: heap expansion operation failed] 34021.371: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: recent GC overhead higher than threshold after GC, recent GC overhead: 12.44 %, threshold: 10.00 %, uncommitted: 0 bytes, calculated expansion amount: 0 bytes (20.00 %)] 34021.371: [G1Ergonomics (Concurrent Cycles) request concurrent cycle initiation, reason: occupancy higher than threshold, occupancy: 32212254720 bytes, allocation request: 0 bytes, threshold: 0 bytes (0.00 %), source: end of GC] (to-space exhausted), 18.3580340 secs] [Parallel Time: 15011.2 ms, GC Workers: 13] [GC Worker Start (ms): Min: 34003013.8, Avg: 34003018.9, Max: 34003045.1, Diff: 31.3] [Ext Root Scanning (ms): Min: 0.0, Avg: 8.4, Max: 16.7, Diff: 16.7, Sum: 109.6] [Update RS (ms): Min: 0.0, Avg: 2.7, Max: 6.8, Diff: 6.8, Sum: 34.9] [Processed Buffers: Min: 0, Avg: 39.9, Max: 153, Diff: 153, Sum: 519] [Scan RS (ms): Min: 0.2, Avg: 22.6, Max: 35.7, Diff: 35.5, Sum: 294.2] [Object Copy (ms): Min: 14949.9, Avg: 14968.9, Max: 15009.4, Diff: 59.5, Sum: 194595.3] [Termination (ms): Min: 0.1, Avg: 3.1, Max: 5.1, Diff: 5.0, Sum: 40.9] [GC Worker Other (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 1.0] [GC Worker Total (ms): Min: 14979.6, Avg: 15005.8, Max: 15011.0, Diff: 31.3, Sum: 195075.8] [GC Worker End (ms): Min: 34018024.7, Avg: 34018024.7, Max: 34018024.8, Diff: 0.1] [Code Root Fixup: 0.0 ms] [Clear CT: 2.0 ms] [Other: 3344.9 
ms] [Choose CSet: 0.0 ms] [Ref Proc: 199.9 ms] [Ref Enq: 0.9 ms] [Free CSet: 1.0 ms] [Eden: 6976.0M(11.1G)->0.0B(11.1G) Survivors: 904.0M->0.0B Heap: 30.0G(30.0G)->30.0G(30.0G)] Typically, it means that there is a increasing object allocation in a short amount of time and G1 think it needs more heap. However, the following GC indicates live object size is only ~15G, half of your heap. Combine these two, it means that probably we need to ask G1 to do more clean up for your application to make sure there is enough space to handle such allocation traffic. 2014-06-06T13:48:17.164+0300: 21811.068: [Full GC 29G->15G(30G), 53.9089520 secs] [Eden: 0.0B(5456.0M)->0.0B(1536.0M) Survivors: 0.0B->0.0B Heap: 29.9G(30.0G)->15.1G(30.0G)] To avoid full GC, here are two important G1 parameters you can tune (suggested numbers are based on my personal experience) -XX:*G1ReservePercent*=20 (default: 10) Sets the percentage of reserve memory to keep free so as to reduce the risk of to-space overflows. The default is 10 percent. When you increase or decrease the percentage, make sure to adjust the total Java heap by the same amount. This setting is not available in Java HotSpot VM, build 23. -XX:*G1HeapWastePercent*=5 (default: 10) Sets the percentage of heap that you are willing to waste. The Java HotSpot VM does not initiate the mixed garbage collection cycle when the reclaimable percentage is less than the heap waste percentage. The default is 10 percent. This setting is not available in Java HotSpot VM, build 23. In this way, G1 will reserve some heap (20%) for emergency and do more clean up, which should help with your situation. In addition, even if you want to make sure GC is always "on", it is still better to set InitiatingHeapOccupancyPercent slight higher than your live object ratio. For example, you may want to set it to 55% (slightly larger than 15.1 / 30). The good news is, we know one of our production services running with 100GB heap under G1 but there is no full GC at all and its throughput is also better than CMS. Please try to tune these parameters and see if it helps. I found this page is also super useful: http://www.oracle.com/technetwork/articles/java/g1gc-1984535.html Best Shengzhe On Mon, Jun 9, 2014 at 7:29 AM, Martin Makundi < martin.makundi at koodaripalvelut.com> wrote: > 1. "-XX:InitiatingHeapOccupancyPercent=0 should trigger concurrent gcs > too much. " > > We use InitiatingHeapOccupancyPercent to make sure the gc is always > active. How is InitiatingHeapOccupancyPercent related to Full GC:s? > > 2. "What is the jdk version? > > java version "1.7.0_55" > Java(TM) SE Runtime Environment (build 1.7.0_55-b13) > Java HotSpot(TM) 64-Bit Server VM (build 24.55-b03, mixed mode) > > > 3. Can you share the gc log?" > > Log is available at 81.22.250.165/log > > 4. I wonder if it is possible to have mixed set of different/automatically > varying sizes of G1HeapRegionSize's small for smaller objects and larger > for larger objects? > > > ** > Martin > > > 2014-06-07 11:03 GMT+03:00 Martin Makundi < > martin.makundi at koodaripalvelut.com>: > > Why does G1GC do Full GC when it only needs only 40 bytes? >> >> Is there a way to tune this so that it would try to free up "some" chunk >> of memory and escape the full gc when enough memory has been freed? This >> would lower the freeze time? 
>> >> See logs: >> >> [Times: user=12.89 sys=0.16, real=1.25 secs] >> 2014-06-06T17:11:48.727+0300: 34022.631: [GC >> concurrent-root-region-scan-start] >> 2014-06-06T17:11:48.727+0300: 34022.631: [GC >> concurrent-root-region-scan-end, 0.0000180 secs] >> 2014-06-06T17:11:48.727+0300: 34022.631: [GC concurrent-mark-start] >> 34022.632: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: >> allocation request failed, allocation request: 40 bytes] >> 34022.632: [G1Ergonomics (Heap Sizing) expand the heap, requested >> expansion amount: 8388608 bytes, attempted expansion amount: 8388608 bytes] >> 34022.632: [G1Ergonomics (Heap Sizing) did not expand the heap, reason: >> heap expansion operation failed] >> {Heap before GC invocations=1867 (full 1): >> garbage-first heap total 31457280K, used 31453781K >> [0x00007f1724000000, 0x00007f1ea4000000, 0x00007f1ea4000000) >> region size 8192K, 0 young (0K), 0 survivors (0K) >> compacting perm gen total 524288K, used 166271K [0x00007f1ea4000000, >> 0x00007f1ec4000000, 0x00007f1ec4000000) >> the space 524288K, 31% used [0x00007f1ea4000000, 0x00007f1eae25fdc8, >> 0x00007f1eae25fe00, 0x00007f1ec4000000) >> No shared spaces configured. >> 2014-06-06T17:11:48.728+0300: 34022.632: [Full GC 29G->13G(30G), >> 47.7918670 secs] >> >> >> Settings: >> >> -server -XX:InitiatingHeapOccupancyPercent=0 -Xss4096k >> -XX:MaxPermSize=512m -XX:PermSize=512m -Xms20G -Xmx30G -Xnoclassgc >> -XX:-OmitStackTraceInFastThrow -XX:+UseNUMA -XX:+UseFastAccessorMethods >> -XX:ReservedCodeCacheSize=128m -XX:-UseStringCache -XX:+UseGCOverheadLimit >> -Duser.timezone=EET -XX:+UseCompressedOops -XX:+DisableExplicitGC >> -XX:+AggressiveOpts -XX:CMSInitiatingOccupancyFraction=70 >> -XX:+ParallelRefProcEnabled -XX:+UseAdaptiveSizePolicy >> -XX:MaxGCPauseMillis=500 -XX:+UseG1GC -XX:G1HeapRegionSize=8M >> -XX:GCPauseIntervalMillis=10000 -XX:+PrintGCDetails -XX:+PrintHeapAtGC >> -XX:+PrintAdaptiveSizePolicy -XX:+PrintGCDateStamps -XX:+PrintGC >> -Xloggc:gc.log >> > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From yu.zhang at oracle.com Mon Jun 9 21:04:00 2014 From: yu.zhang at oracle.com (YU ZHANG) Date: Mon, 09 Jun 2014 14:04:00 -0700 Subject: Why does G1GC do Full GC when it only needs 40 bytes? In-Reply-To: References: Message-ID: <53962140.6090800@oracle.com> Thanks for the log. Please see my comments in line Thanks, Jenny On 6/9/2014 7:29 AM, Martin Makundi wrote: > 1. "-XX:InitiatingHeapOccupancyPercent=0 should trigger concurrent gcs > too much. " > > We use InitiatingHeapOccupancyPercent to make sure the gc is always > active. How is InitiatingHeapOccupancyPercent related to Full GC:s? InitiatingHeapOccupancyPercentdetermines when concurrent gcs are triggered, so that we can do mixed gc. In your case,the heap usage after full gc is 15.5g, you can try to set this to 60. But this is not the reason why you get full gcs. The mixed gc should be able to clean more. In the gc log, there are a lot of messages like: 1. "750.559: [G1Ergonomics (CSet Construction) finish adding old regions to CSet, reason: reclaimable percentage not over threshold, old: 38 regions, max: 379 regions, reclaimable: 3172139640 bytes (9.99 %), threshold: 10.00 %]" the 10% threshold is controlled by -XX:G1HeapWastePercent=10. 
You can try 5% so that more mixed gcs will happen after the concurrent cycles. 2. Depending on if target pause time =500 is critical. Currently, the gc pause is > 500ms. If keeping gc pause <500 is critical, you need to decrease Eden size. We need to take another look at the gc logs then. If you can relax the target pause time, please increase it, so that we do not see too many " 5099.012: [G1Ergonomics (CSet Construction) finish adding old regions to CSet, reason: predicted time is too high, predicted time: 3.34 ms, remaining time: 0.00 ms, old: 100 regions, min: 100 regions]", maybe even to try, -XX:G1MixedGCCountTarget=<4>. This determines the minimum old regions added to CSet This workload has high reference process time. It is good that you have enabled parallel reference processing. It seems the parallel work is distributed among the 13 gc threads well. Though the work termination time is high. If you are not cpu bound, you can try to increase the gc threads to 16 using -XX:ParallelGCThreads=16, ASSUMING you are not cpu bound, your system has 20-cpu threads? There are a lot of system cpu activities, which might contribute to longer gc pause. I used to see people reported this is due to logging. Not sure if this applies to your case. > > 2. "What is the jdk version? > > java version "1.7.0_55" > Java(TM) SE Runtime Environment (build 1.7.0_55-b13) > Java HotSpot(TM) 64-Bit Server VM (build 24.55-b03, mixed mode) > > > 3. Can you share the gc log?" > > Log is available at 81.22.250.165/log > > 4. I wonder if it is possible to have mixed set of > different/automatically varying sizes of G1HeapRegionSize's small for > smaller objects and larger for larger objects? Currently we do not have mixed region size. > > > ** > Martin > > > 2014-06-07 11:03 GMT+03:00 Martin Makundi > >: > > Why does G1GC do Full GC when it only needs only 40 bytes? > > Is there a way to tune this so that it would try to free up "some" > chunk of memory and escape the full gc when enough memory has been > freed? This would lower the freeze time? > > See logs: > > [Times: user=12.89 sys=0.16, real=1.25 secs] > 2014-06-06T17:11:48.727+0300: 34022.631: [GC > concurrent-root-region-scan-start] > 2014-06-06T17:11:48.727+0300: 34022.631: [GC > concurrent-root-region-scan-end, 0.0000180 secs] > 2014-06-06T17:11:48.727+0300: 34022.631: [GC concurrent-mark-start] > 34022.632: [G1Ergonomics (Heap Sizing) attempt heap expansion, > reason: allocation request failed, allocation request: 40 bytes] > 34022.632: [G1Ergonomics (Heap Sizing) expand the heap, requested > expansion amount: 8388608 bytes, attempted expansion amount: > 8388608 bytes] > 34022.632: [G1Ergonomics (Heap Sizing) did not expand the heap, > reason: heap expansion operation failed] > {Heap before GC invocations=1867 (full 1): > garbage-first heap total 31457280K, used 31453781K > [0x00007f1724000000, 0x00007f1ea4000000, 0x00007f1ea4000000) > region size 8192K, 0 young (0K), 0 survivors (0K) > compacting perm gen total 524288K, used 166271K > [0x00007f1ea4000000, 0x00007f1ec4000000, 0x00007f1ec4000000) > the space 524288K, 31% used [0x00007f1ea4000000, > 0x00007f1eae25fdc8, 0x00007f1eae25fe00, 0x00007f1ec4000000) > No shared spaces configured. 
> 2014-06-06T17:11:48.728+0300: 34022.632: [Full GC 29G->13G(30G), > 47.7918670 secs] > > > Settings: > > -server -XX:InitiatingHeapOccupancyPercent=0 -Xss4096k > -XX:MaxPermSize=512m -XX:PermSize=512m -Xms20G -Xmx30G -Xnoclassgc > -XX:-OmitStackTraceInFastThrow -XX:+UseNUMA > -XX:+UseFastAccessorMethods -XX:ReservedCodeCacheSize=128m > -XX:-UseStringCache -XX:+UseGCOverheadLimit -Duser.timezone=EET > -XX:+UseCompressedOops -XX:+DisableExplicitGC -XX:+AggressiveOpts > -XX:CMSInitiatingOccupancyFraction=70 -XX:+ParallelRefProcEnabled > -XX:+UseAdaptiveSizePolicy -XX:MaxGCPauseMillis=500 -XX:+UseG1GC > -XX:G1HeapRegionSize=8M -XX:GCPauseIntervalMillis=10000 > -XX:+PrintGCDetails -XX:+PrintHeapAtGC > -XX:+PrintAdaptiveSizePolicy -XX:+PrintGCDateStamps -XX:+PrintGC > -Xloggc:gc.log > > > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use -------------- next part -------------- An HTML attachment was scrubbed... URL: From martin.makundi at koodaripalvelut.com Tue Jun 10 01:03:48 2014 From: martin.makundi at koodaripalvelut.com (Martin Makundi) Date: Tue, 10 Jun 2014 04:03:48 +0300 Subject: Why does G1GC do Full GC when it only needs 40 bytes? In-Reply-To: <53962140.6090800@oracle.com> References: <53962140.6090800@oracle.com> Message-ID: > > > 1. "-XX:InitiatingHeapOccupancyPercent=0 should trigger concurrent gcs > too much. " > > We use InitiatingHeapOccupancyPercent to make sure the gc is always > active. How is InitiatingHeapOccupancyPercent related to Full GC:s? > > InitiatingHeapOccupancyPercent determines when concurrent gcs are > triggered, so that we can do mixed gc. In your case,the heap usage after > full gc is 15.5g, you can try to set this to 60. > > But this is not the reason why you get full gcs. > > The mixed gc should be able to clean more. > Thanks. Do you mean InitiatingHeapOccupancyPercent=0 is disabling some features? I thought it simply triggers the clean to start earlier (always on). Will try different values, but would like to know what harm =0 does? > In the gc log, there are a lot of messages like: > 1. "750.559: [G1Ergonomics (CSet Construction) finish adding old regions > to CSet, reason: reclaimable percentage not over threshold, old: 38 > regions, max: 379 regions, reclaimable: 3172139640 bytes (9.99 %), > threshold: 10.00 %]" > > the 10% threshold is controlled by -XX:G1HeapWastePercent=10. You can try > 5% so that more mixed gcs will happen after the concurrent cycles. > Thanks, will try that. > 2. Depending on if target pause time =500 is critical. > Currently, the gc pause is > 500ms. If keeping gc pause <500 is critical, > you need to decrease Eden size. > We need to take another look at the gc logs then. > > If you can relax the target pause time, please increase it, so that we do > not see too many > It's a web application so response time is important for users, will try 75 and GCPauseIntervalMillis=1000 > " 5099.012: [G1Ergonomics (CSet Construction) finish adding old regions > to CSet, reason: predicted time is too high, predicted time: 3.34 ms, > remaining time: 0.00 ms, old: 100 regions, min: 100 regions]", maybe even > to try, -XX:G1MixedGCCountTarget=<4>. This determines the minimum old > regions added to CSet > Will try that too. > > This workload has high reference process time. It is good that you have > enabled parallel reference processing. It seems the parallel work is > distributed among the 13 gc threads well. 
Though the work termination time > is high. If you are not cpu bound, you can try to increase the gc threads > to 16 using -XX:ParallelGCThreads=16, ASSUMING you are not cpu bound, your > system has 20-cpu threads? > There are 16 cpus so I assume there are 16 threads... > > There are a lot of system cpu activities, which might contribute to longer > gc pause. I used to see people reported this is due to logging. Not sure > if this applies to your case. > Which are these? Do you mean gc logging being on might contribute to something? > > 2. "What is the jdk version? > > java version "1.7.0_55" > Java(TM) SE Runtime Environment (build 1.7.0_55-b13) > Java HotSpot(TM) 64-Bit Server VM (build 24.55-b03, mixed mode) > > > 3. Can you share the gc log?" > > Log is available at 81.22.250.165/log > > 4. I wonder if it is possible to have mixed set of > different/automatically varying sizes of G1HeapRegionSize's small for > smaller objects and larger for larger objects? > > Currently we do not have mixed region size. > Has this been tried in some gc? ** Martin > > > ** > Martin > > > 2014-06-07 11:03 GMT+03:00 Martin Makundi < > martin.makundi at koodaripalvelut.com>: > >> Why does G1GC do Full GC when it only needs only 40 bytes? >> >> Is there a way to tune this so that it would try to free up "some" >> chunk of memory and escape the full gc when enough memory has been freed? >> This would lower the freeze time? >> >> See logs: >> >> [Times: user=12.89 sys=0.16, real=1.25 secs] >> 2014-06-06T17:11:48.727+0300: 34022.631: [GC >> concurrent-root-region-scan-start] >> 2014-06-06T17:11:48.727+0300: 34022.631: [GC >> concurrent-root-region-scan-end, 0.0000180 secs] >> 2014-06-06T17:11:48.727+0300: 34022.631: [GC concurrent-mark-start] >> 34022.632: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: >> allocation request failed, allocation request: 40 bytes] >> 34022.632: [G1Ergonomics (Heap Sizing) expand the heap, requested >> expansion amount: 8388608 bytes, attempted expansion amount: 8388608 bytes] >> 34022.632: [G1Ergonomics (Heap Sizing) did not expand the heap, reason: >> heap expansion operation failed] >> {Heap before GC invocations=1867 (full 1): >> garbage-first heap total 31457280K, used 31453781K >> [0x00007f1724000000, 0x00007f1ea4000000, 0x00007f1ea4000000) >> region size 8192K, 0 young (0K), 0 survivors (0K) >> compacting perm gen total 524288K, used 166271K [0x00007f1ea4000000, >> 0x00007f1ec4000000, 0x00007f1ec4000000) >> the space 524288K, 31% used [0x00007f1ea4000000, 0x00007f1eae25fdc8, >> 0x00007f1eae25fe00, 0x00007f1ec4000000) >> No shared spaces configured. 
>> 2014-06-06T17:11:48.728+0300: 34022.632: [Full GC 29G->13G(30G), >> 47.7918670 secs] >> >> >> Settings: >> >> -server -XX:InitiatingHeapOccupancyPercent=0 -Xss4096k >> -XX:MaxPermSize=512m -XX:PermSize=512m -Xms20G -Xmx30G -Xnoclassgc >> -XX:-OmitStackTraceInFastThrow -XX:+UseNUMA -XX:+UseFastAccessorMethods >> -XX:ReservedCodeCacheSize=128m -XX:-UseStringCache -XX:+UseGCOverheadLimit >> -Duser.timezone=EET -XX:+UseCompressedOops -XX:+DisableExplicitGC >> -XX:+AggressiveOpts -XX:CMSInitiatingOccupancyFraction=70 >> -XX:+ParallelRefProcEnabled -XX:+UseAdaptiveSizePolicy >> -XX:MaxGCPauseMillis=500 -XX:+UseG1GC -XX:G1HeapRegionSize=8M >> -XX:GCPauseIntervalMillis=10000 -XX:+PrintGCDetails -XX:+PrintHeapAtGC >> -XX:+PrintAdaptiveSizePolicy -XX:+PrintGCDateStamps -XX:+PrintGC >> -Xloggc:gc.log >> > > > > _______________________________________________ > hotspot-gc-use mailing listhotspot-gc-use at openjdk.java.nethttp://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From martin.makundi at koodaripalvelut.com Tue Jun 10 01:09:43 2014 From: martin.makundi at koodaripalvelut.com (Martin Makundi) Date: Tue, 10 Jun 2014 04:09:43 +0300 Subject: Why does G1GC do Full GC when it only needs 40 bytes? In-Reply-To: References: Message-ID: > > t means that there is a increasing object allocation in a short amount of > time and G1 think it needs more heap. However, the following GC indicates > live object size is only ~15G, half of your heap. Combine these two, it > means that probably we need to ask G1 to do more clean up for your > application to make sure there is enough space to handle such allocation > traffic. > > 2014-06-06T13:48:17.164+0300: 21811.068: [Full GC 29G->15G(30G), 53.9089520 secs] > [Eden: 0.0B(5456.0M)->0.0B(1536.0M) Survivors: 0.0B->0.0B Heap: 29.9G(30.0G)->15.1G(30.0G)] > > To avoid full GC, here are two important G1 parameters you can tune > (suggested numbers are based on my personal experience) > > -XX:*G1ReservePercent*=20 (default: 10) > Sets the percentage of reserve memory to keep free so as to reduce the > risk of to-space overflows. The default is 10 percent. When you increase or > decrease the percentage, make sure to adjust the total Java heap by the > same amount. This setting is not available in Java HotSpot VM, build 23. > Thanks. If possible, I would try to avoid this because we try to maximize the amount of -XX:+UseCompressedOops memory we can have. Otherwise we would probably need to double our memory and also there is some overhead (processing power wise) with non compressed oops. I assume not using -XX:+UseCompressedOops will somewhat double our hw costs... > -XX:*G1HeapWastePercent*=5 (default: 10) > Sets the percentage of heap that you are willing to waste. The Java > HotSpot VM does not initiate the mixed garbage collection cycle when the > reclaimable percentage is less than the heap waste percentage. The default > is 10 percent. This setting is not available in Java HotSpot VM, build 23. > > In this way, G1 will reserve some heap (20%) for emergency and do more > clean up, which should help with your situation. > > In addition, even if you want to make sure GC is always "on", it is still > better to set InitiatingHeapOccupancyPercent slight higher than your live > object ratio. For example, you may want to set it to 55% (slightly larger > than 15.1 / 30). > Thanks, will try these. 
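A side note on the compressed-oops worry above: with the default 8-byte object alignment, compressed oops only cover heaps up to roughly 32 GB, so the current -Xmx30G is already close to that ceiling. If the heap ever had to grow past it, one option to check, assuming the flag is available on the 64-bit JDK 7 build in use, is to raise the object alignment instead of giving up compressed oops entirely, at the cost of some extra padding per object:

-XX:+UseCompressedOops -XX:ObjectAlignmentInBytes=16

which roughly doubles the address range that compressed oops can cover.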
> > The good news is, we know one of our production services running with > 100GB heap under G1 but there is no full GC at all and its throughput is > also better than CMS. Please try to tune these parameters and see if it > helps. I found this page is also super useful: > http://www.oracle.com/technetwork/articles/java/g1gc-1984535.html > Thanks. ** Martin > > > Best > Shengzhe > > > On Mon, Jun 9, 2014 at 7:29 AM, Martin Makundi < > martin.makundi at koodaripalvelut.com> wrote: > >> 1. "-XX:InitiatingHeapOccupancyPercent=0 should trigger concurrent gcs >> too much. " >> >> We use InitiatingHeapOccupancyPercent to make sure the gc is always >> active. How is InitiatingHeapOccupancyPercent related to Full GC:s? >> >> 2. "What is the jdk version? >> >> java version "1.7.0_55" >> Java(TM) SE Runtime Environment (build 1.7.0_55-b13) >> Java HotSpot(TM) 64-Bit Server VM (build 24.55-b03, mixed mode) >> >> >> 3. Can you share the gc log?" >> >> Log is available at 81.22.250.165/log >> >> 4. I wonder if it is possible to have mixed set of >> different/automatically varying sizes of G1HeapRegionSize's small for >> smaller objects and larger for larger objects? >> >> >> ** >> Martin >> >> >> 2014-06-07 11:03 GMT+03:00 Martin Makundi < >> martin.makundi at koodaripalvelut.com>: >> >> Why does G1GC do Full GC when it only needs only 40 bytes? >>> >>> Is there a way to tune this so that it would try to free up "some" chunk >>> of memory and escape the full gc when enough memory has been freed? This >>> would lower the freeze time? >>> >>> See logs: >>> >>> [Times: user=12.89 sys=0.16, real=1.25 secs] >>> 2014-06-06T17:11:48.727+0300: 34022.631: [GC >>> concurrent-root-region-scan-start] >>> 2014-06-06T17:11:48.727+0300: 34022.631: [GC >>> concurrent-root-region-scan-end, 0.0000180 secs] >>> 2014-06-06T17:11:48.727+0300: 34022.631: [GC concurrent-mark-start] >>> 34022.632: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: >>> allocation request failed, allocation request: 40 bytes] >>> 34022.632: [G1Ergonomics (Heap Sizing) expand the heap, requested >>> expansion amount: 8388608 bytes, attempted expansion amount: 8388608 bytes] >>> 34022.632: [G1Ergonomics (Heap Sizing) did not expand the heap, reason: >>> heap expansion operation failed] >>> {Heap before GC invocations=1867 (full 1): >>> garbage-first heap total 31457280K, used 31453781K >>> [0x00007f1724000000, 0x00007f1ea4000000, 0x00007f1ea4000000) >>> region size 8192K, 0 young (0K), 0 survivors (0K) >>> compacting perm gen total 524288K, used 166271K [0x00007f1ea4000000, >>> 0x00007f1ec4000000, 0x00007f1ec4000000) >>> the space 524288K, 31% used [0x00007f1ea4000000, 0x00007f1eae25fdc8, >>> 0x00007f1eae25fe00, 0x00007f1ec4000000) >>> No shared spaces configured. 
>>> 2014-06-06T17:11:48.728+0300: 34022.632: [Full GC 29G->13G(30G), >>> 47.7918670 secs] >>> >>> >>> Settings: >>> >>> -server -XX:InitiatingHeapOccupancyPercent=0 -Xss4096k >>> -XX:MaxPermSize=512m -XX:PermSize=512m -Xms20G -Xmx30G -Xnoclassgc >>> -XX:-OmitStackTraceInFastThrow -XX:+UseNUMA -XX:+UseFastAccessorMethods >>> -XX:ReservedCodeCacheSize=128m -XX:-UseStringCache -XX:+UseGCOverheadLimit >>> -Duser.timezone=EET -XX:+UseCompressedOops -XX:+DisableExplicitGC >>> -XX:+AggressiveOpts -XX:CMSInitiatingOccupancyFraction=70 >>> -XX:+ParallelRefProcEnabled -XX:+UseAdaptiveSizePolicy >>> -XX:MaxGCPauseMillis=500 -XX:+UseG1GC -XX:G1HeapRegionSize=8M >>> -XX:GCPauseIntervalMillis=10000 -XX:+PrintGCDetails -XX:+PrintHeapAtGC >>> -XX:+PrintAdaptiveSizePolicy -XX:+PrintGCDateStamps -XX:+PrintGC >>> -Xloggc:gc.log >>> >> >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From yu.zhang at oracle.com Tue Jun 10 01:29:39 2014 From: yu.zhang at oracle.com (YU ZHANG) Date: Mon, 09 Jun 2014 18:29:39 -0700 Subject: Why does G1GC do Full GC when it only needs 40 bytes? In-Reply-To: References: <53962140.6090800@oracle.com> Message-ID: <53965F83.60108@oracle.com> On 6/9/2014 6:03 PM, Martin Makundi wrote: > > >> 1. "-XX:InitiatingHeapOccupancyPercent=0 should trigger >> concurrent gcs too much. " >> >> We use InitiatingHeapOccupancyPercent to make sure the gc is >> always active. How is InitiatingHeapOccupancyPercent related to >> Full GC:s? > InitiatingHeapOccupancyPercentdetermines when concurrent gcs are > triggered, so that we can do mixed gc. In your case,the heap > usage after full gc is 15.5g, you can try to set this to 60. > > But this is not the reason why you get full gcs. > > The mixed gc should be able to clean more. > > > Thanks. Do you mean InitiatingHeapOccupancyPercent=0 is disabling some > features? I thought it simply triggers the clean to start earlier > (always on). Will try different values, but would like to know what > harm =0 does? By setting InitiatingHeapOccupancyPercent=0, you might have wasted the concurrent gc work. Since when concurrent cycles are triggered, most of them are live objects. The concurrent cycles have to go through all the objects, but can not clean them. In your case, it might not be too bad, as the mixed gcs can clean ~1g of heap, and the cleanup can clean some, too. But setting it to the right value should make g1 keeping up with the application, while not wasting too much work. > In the gc log, there are a lot of messages like: > 1. "750.559: [G1Ergonomics (CSet Construction) finish adding old > regions to CSet, reason: reclaimable percentage not over > threshold, old: 38 regions, max: 379 regions, reclaimable: > 3172139640 bytes (9.99 %), threshold: 10.00 %]" > > the 10% threshold is controlled by -XX:G1HeapWastePercent=10. You > can try 5% so that more mixed gcs will happen after the concurrent > cycles. > > > Thanks, will try that. > > 2. Depending on if target pause time =500 is critical. > Currently, the gc pause is > 500ms. If keeping gc pause <500 is > critical, you need to decrease Eden size. > We need to take another look at the gc logs then. 
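Since the question of what InitiatingHeapOccupancyPercent should be keeps coming up, the arithmetic is simple enough to sketch: take the heap occupancy left after a full (or mixed) collection as the live set and add a margin on top. A minimal sketch in Java - the 15.1G/30G figures are the ones from the log quoted above, the 10-point margin is an assumption, and the result (~61 here) lands in the same 55-60 range suggested in this thread:

    public class IhopEstimate {
        public static void main(String[] args) {
            double liveGb = 15.1;  // heap left after the Full GC in the log above
            double heapGb = 30.0;  // -Xmx
            int marginPct = 10;    // assumed safety margin on top of the live ratio
            int ihop = (int) Math.ceil(liveGb / heapGb * 100.0) + marginPct;
            System.out.println("-XX:InitiatingHeapOccupancyPercent=" + ihop);
        }
    }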
> > If you can relax the target pause time, please increase it, so > that we do not see too many > > > It's a web application so response time is important for users, will > try 75 and GCPauseIntervalMillis=1000 > > " 5099.012: [G1Ergonomics (CSet Construction) finish adding old > regions to CSet, reason: predicted time is too high, predicted > time: 3.34 ms, remaining time: 0.00 ms, old: 100 regions, min: 100 > regions]", maybe even to try, -XX:G1MixedGCCountTarget=<4>. This > determines the minimum old regions added to CSet > > > Will try that too. > > > This workload has high reference process time. It is good that > you have enabled parallel reference processing. It seems the > parallel work is distributed among the 13 gc threads well. Though > the work termination time is high. If you are not cpu bound, you > can try to increase the gc threads to 16 using > -XX:ParallelGCThreads=16, ASSUMING you are not cpu bound, your > system has 20-cpu threads? > > > There are 16 cpus so I assume there are 16 threads... Then setting it to 16 might not be a good idea. We usually do not want to use all the cpus. > > > There are a lot of system cpu activities, which might contribute > to longer gc pause. I used to see people reported this is due to > logging. Not sure if this applies to your case. > > > Which are these? Do you mean gc logging being on might contribute to > something? GC logging itself should not have much overhead. IIRR, the cases were when there were logging activities from other processes, gc logging has to wait writing to log. > > >> >> 2. "What is the jdk version? >> >> java version "1.7.0_55" >> Java(TM) SE Runtime Environment (build 1.7.0_55-b13) >> Java HotSpot(TM) 64-Bit Server VM (build 24.55-b03, mixed mode) >> >> >> 3. Can you share the gc log?" >> >> Log is available at 81.22.250.165/log >> >> 4. I wonder if it is possible to have mixed set of >> different/automatically varying sizes of G1HeapRegionSize's >> small for smaller objects and larger for larger objects? > Currently we do not have mixed region size. > > > Has this been tried in some gc? No. > > ** > Martin > >> >> >> ** >> Martin >> >> >> 2014-06-07 11:03 GMT+03:00 Martin Makundi >> > >: >> >> Why does G1GC do Full GC when it only needs only 40 bytes? >> >> Is there a way to tune this so that it would try to free up >> "some" chunk of memory and escape the full gc when enough >> memory has been freed? This would lower the freeze time? 
>> >> See logs: >> >> [Times: user=12.89 sys=0.16, real=1.25 secs] >> 2014-06-06T17:11:48.727+0300: 34022.631: [GC >> concurrent-root-region-scan-start] >> 2014-06-06T17:11:48.727+0300: 34022.631: [GC >> concurrent-root-region-scan-end, 0.0000180 secs] >> 2014-06-06T17:11:48.727+0300: 34022.631: [GC >> concurrent-mark-start] >> 34022.632: [G1Ergonomics (Heap Sizing) attempt heap >> expansion, reason: allocation request failed, allocation >> request: 40 bytes] >> 34022.632: [G1Ergonomics (Heap Sizing) expand the heap, >> requested expansion amount: 8388608 bytes, attempted >> expansion amount: 8388608 bytes] >> 34022.632: [G1Ergonomics (Heap Sizing) did not expand the >> heap, reason: heap expansion operation failed] >> {Heap before GC invocations=1867 (full 1): >> garbage-first heap total 31457280K, used 31453781K >> [0x00007f1724000000, 0x00007f1ea4000000, 0x00007f1ea4000000) >> region size 8192K, 0 young (0K), 0 survivors (0K) >> compacting perm gen total 524288K, used 166271K >> [0x00007f1ea4000000, 0x00007f1ec4000000, 0x00007f1ec4000000) >> the space 524288K, 31% used [0x00007f1ea4000000, >> 0x00007f1eae25fdc8, 0x00007f1eae25fe00, 0x00007f1ec4000000) >> No shared spaces configured. >> 2014-06-06T17:11:48.728+0300: 34022.632: [Full GC >> 29G->13G(30G), 47.7918670 secs] >> >> >> Settings: >> >> -server -XX:InitiatingHeapOccupancyPercent=0 -Xss4096k >> -XX:MaxPermSize=512m -XX:PermSize=512m -Xms20G -Xmx30G >> -Xnoclassgc -XX:-OmitStackTraceInFastThrow -XX:+UseNUMA >> -XX:+UseFastAccessorMethods -XX:ReservedCodeCacheSize=128m >> -XX:-UseStringCache -XX:+UseGCOverheadLimit >> -Duser.timezone=EET -XX:+UseCompressedOops >> -XX:+DisableExplicitGC -XX:+AggressiveOpts >> -XX:CMSInitiatingOccupancyFraction=70 >> -XX:+ParallelRefProcEnabled -XX:+UseAdaptiveSizePolicy >> -XX:MaxGCPauseMillis=500 -XX:+UseG1GC -XX:G1HeapRegionSize=8M >> -XX:GCPauseIntervalMillis=10000 -XX:+PrintGCDetails >> -XX:+PrintHeapAtGC -XX:+PrintAdaptiveSizePolicy >> -XX:+PrintGCDateStamps -XX:+PrintGC -Xloggc:gc.log >> >> >> >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From graham at vast.com Fri Jun 13 04:48:26 2014 From: graham at vast.com (graham sanderson) Date: Thu, 12 Jun 2014 23:48:26 -0500 Subject: CMSEdenChunksRecordAlways & CMSParallelInitialMarkEnabled Message-ID: I was investigating abortable preclean timeouts in our app (and associated long remark pause) so had a look at the old jdk6 code I had on my box, wondered about recording eden chunks during certain eden slow allocation paths (I wasn?t sure if TLAB allocation is just a CAS bump), and saw what looked perfect in the latest code, so was excited to install 1.7.0_60-b19 I wanted to ask what you consider the stability of these two options to be (I?m pretty sure at least the first one is new in this release) I have just installed locally on my mac, and am aware of http://bugs.java.com/bugdatabase/view_bug.do?bug_id=8021809 which I could reproduce, however I wasn?t able to reproduce it without -XX:-UseCMSCompactAtFullCollection (is this your understanding too?) We are running our application with 8 gig young generation (6.4g eden), on boxes with 32 cores? 
so parallelism is good for short pauses we already have -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled we have seen a few long(ish) initial marks, so -XX:+CMSParallelInitialMarkEnabled sounds good as for -XX:+CMSEdenChunksRecordAlways my question is: what constitutes a slow path such that an eden chunk is potentially recorded? TLAB allocation, or more horrific things; basically (and I'll test our app with -XX:+CMSPrintEdenSurvivorChunks) is it likely that I'll actually get fewer samples using -XX:+CMSEdenChunksRecordAlways in a highly multithreaded app than I would with sampling, or put another way... what sort of app allocation patterns if any might avoid the slow path altogether and might leave me with just one chunk? Thanks, Graham P.S. less relevant I think, but our old generation is 16g P.P.S. I suspect the abortable preclean timeouts mostly happen after a burst of very high allocation rate followed by an almost complete lull... this is one of the patterns that can happen in our application

From gustav.r.akesson at gmail.com Sat Jun 14 23:29:11 2014 From: gustav.r.akesson at gmail.com (Gustav Åkesson) Date: Sun, 15 Jun 2014 01:29:11 +0200 Subject: CMSEdenChunksRecordAlways & CMSParallelInitialMarkEnabled In-Reply-To: References: Message-ID: Hi, Even though I won't answer all your questions I'd like to share my experience with these settings (plus additional thoughts) even though I haven't yet had the time to dig into details. We've been using these flags for several months in production (yes, Java 7 even before latest update release) and we've seen a lot of improvements for CMS old gen STW. During execution occasional initial mark of 1.5s could occur, but using these settings combined CMS pauses are consistently around ~100ms (on a high-end machine like yours, they are 20-30ms). We're using 1gb and 2gb heaps with roughly half/half old/new. Obviously, YMMV but this is at least the behavior of this particular application - we've had nothing but positive outcome from using these settings. Additionally, the pauses are rather deterministic. Not sure what your heap size settings are, but what I've also observed is that setting Xms != Xmx could also cause occasional long initial mark when heap capacity is slightly increased. I had a discussion a while back ( http://mail.openjdk.java.net/pipermail/hotspot-gc-use/2014-February/001795.html ) regarding this, and this seems to be an issue with CMS. Also, swapping/paging is another factor which could cause indeterministic / occasional long STW GCs. If you're on Linux, try swappiness=0 and see if pauses get more stable.
Best Regards, Gustav Åkesson

On Fri, Jun 13, 2014 at 6:48 AM, graham sanderson wrote: > I was investigating abortable preclean timeouts in our app (and associated > long remark pause) so had a look at the old jdk6 code I had on my box, > wondered about recording eden chunks during certain eden slow allocation > paths (I wasn't sure if TLAB allocation is just a CAS bump), and saw what > looked perfect in the latest code, so was excited to install 1.7.0_60-b19 > > I wanted to ask what you consider the stability of these two options to be > (I'm pretty sure at least the first one is new in this release) > > I have just installed locally on my mac, and am aware of > http://bugs.java.com/bugdatabase/view_bug.do?bug_id=8021809 which I could > reproduce, however I wasn't able to reproduce it without -XX:-UseCMSCompactAtFullCollection > (is this your understanding too?) > > We are running our application with 8 gig young generation (6.4g eden), on > boxes with 32 cores... so parallelism is good for short pauses > > we already have > > -XX:+UseParNewGC > -XX:+UseConcMarkSweepGC > -XX:+CMSParallelRemarkEnabled > > we have seen a few long(ish) initial marks, so > > -XX:+CMSParallelInitialMarkEnabled sounds good > > as for > > -XX:+CMSEdenChunksRecordAlways > > my question is: what constitutes a slow path such that an eden chunk is > potentially recorded? TLAB allocation, or more horrific things; basically > (and I'll test our app with -XX:+CMSPrintEdenSurvivorChunks) is it likely > that I'll actually get fewer samples using -XX:+CMSEdenChunksRecordAlways in > a highly multithreaded app than I would with sampling, or put another way... > what sort of app allocation patterns if any might avoid the slow path > altogether and might leave me with just one chunk? > > Thanks, > > Graham > > P.S. less relevant I think, but our old generation is 16g > P.P.S. I suspect the abortable preclean timeouts mostly happen after a > burst of very high allocation rate followed by an almost complete lull... > this is one of the patterns that can happen in our application > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > >

From graham at vast.com Sun Jun 15 00:05:04 2014 From: graham at vast.com (graham sanderson) Date: Sat, 14 Jun 2014 19:05:04 -0500 Subject: CMSEdenChunksRecordAlways & CMSParallelInitialMarkEnabled In-Reply-To: References: Message-ID: Thanks for the answer Gustav, The fact that you have been running in production for months makes me confident enough to try this on at least one of our nodes... (this is actually cassandra) Current GC related options are at the bottom - these nodes have 256G of RAM, and they aren't swapping, and we are certainly used to a pause within the first 10 seconds or so, but the nodes haven't even joined the ring yet, so we don't really care. yeah ms != mx is bad; we want one heap size and to stick with it. I will gather data via -XX:+CMSEdenChunksRecordAlways, however I'd be interested if a developer has an answer as to when we expect potential chunk recording? Otherwise I'll have to go dig into the code a bit deeper - my assumption was that this call would not be in the inlined allocation code, but I had thought that even allocation of a new TLAB was inlined by the compilers - perhaps not.
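For anyone wanting to try the same experiment, the flag combination being discussed sits on top of an ordinary ParNew/CMS configuration (such as the one listed next) roughly as follows - a sketch, assuming 7u60 or later, with the print flag only there to observe how many eden chunks each cycle actually records:

    -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
    -XX:+CMSParallelRemarkEnabled
    -XX:+CMSParallelInitialMarkEnabled    (parallelize the initial-mark pause as well)
    -XX:+CMSEdenChunksRecordAlways        (record eden chunk boundaries during allocation instead of relying on sampling)
    -XX:+CMSPrintEdenSurvivorChunks       (observation only; drop once the chunk counts look reasonable)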
Current GC related settings - note we were running with a lower CMSInitiatingOccupancyFraction until recently - seems to have gotten changed back by accident, but that is kind of tangential. -Xms24576M -Xmx24576M -Xmn8192M -XX:+HeapDumpOnOutOfMemoryError -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseTLAB -XX:+UseCondCardMark -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintHeapAtGC -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime -XX:+PrintPromotionFailure -XX:PrintFLSStatistics=1 -Xloggc:/var/log/cassandra/gc.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=30 -XX:GCLogFileSize=20M -XX:+PrintGCApplicationConcurrentTime Thanks, Graham P.S. Note tuning here is rather interesting since we use this cassandra cluster for lots of different data with very different usage patterns - sometimes we'll suddenly dump 50G of data in over the course of a few minutes. Also cassandra doesn't really mind a node being paused for a while due to GC, but things get a little more annoying if they pause at the same time... even though promotion failure can be worse for us (that is a separate issue), we've seen STW pauses up to about 6-8 seconds in remark (presumably when things go horribly wrong and you only get one chunk). Basically I'm on a mission to minimize all pauses, since their effects can propagate (timeouts are very short in a lot of places) I will report back with my findings

On Jun 14, 2014, at 6:29 PM, Gustav Åkesson wrote: > Hi, > > Even though I won't answer all your questions I'd like to share my experience with these settings (plus additional thoughts) even though I haven't yet had the time to dig into details. > > We've been using these flags for several months in production (yes, Java 7 even before latest update release) and we've seen a lot of improvements for CMS old gen STW. During execution occasional initial mark of 1.5s could occur, but using these settings combined CMS pauses are consistently around ~100ms (on a high-end machine like yours, they are 20-30ms). We're using 1gb and 2gb heaps with roughly half/half old/new. Obviously, YMMV but this is at least the behavior of this particular application - we've had nothing but positive outcome from using these settings. Additionally, the pauses are rather deterministic. > > Not sure what your heap size settings are, but what I've also observed is that setting Xms != Xmx could also cause occasional long initial mark when heap capacity is slightly increased. I had a discussion a while back ( http://mail.openjdk.java.net/pipermail/hotspot-gc-use/2014-February/001795.html ) regarding this, and this seems to be an issue with CMS. > > Also, swapping/paging is another factor which could cause indeterministic / occasional long STW GCs. If you're on Linux, try swappiness=0 and see if pauses get more stable.
> > > Best Regards, > Gustav ?kesson > > > On Fri, Jun 13, 2014 at 6:48 AM, graham sanderson wrote: > I was investigating abortable preclean timeouts in our app (and associated long remark pause) so had a look at the old jdk6 code I had on my box, wondered about recording eden chunks during certain eden slow allocation paths (I wasn?t sure if TLAB allocation is just a CAS bump), and saw what looked perfect in the latest code, so was excited to install 1.7.0_60-b19 > > I wanted to ask what you consider the stability of these two options to be (I?m pretty sure at least the first one is new in this release) > > I have just installed locally on my mac, and am aware of http://bugs.java.com/bugdatabase/view_bug.do?bug_id=8021809 which I could reproduce, however I wasn?t able to reproduce it without -XX:-UseCMSCompactAtFullCollection (is this your understanding too?) > > We are running our application with 8 gig young generation (6.4g eden), on boxes with 32 cores? so parallelism is good for short pauses > > we already have > > -XX:+UseParNewGC > -XX:+UseConcMarkSweepGC > -XX:+CMSParallelRemarkEnabled > > we have seen a few long(isn) initial marks, so > > -XX:+CMSParallelInitialMarkEnabled sounds good > > as for > > -XX:+CMSEdenChunksRecordAlways > > my question is: what constitutes a slow path such an eden chunk is potentially recorded? TLAB allocation, or more horrific things; basically (and I?ll test our app with -XX:+CMSPrintEdenSurvivorChunks) is it likely that I?ll actually get less samples using -XX:+CMSEdenChunksRecordAlways in a highly multithread app than I would with sampling, or put another way? what sort of app allocation patterns if any might avoid the slow path altogether and might leave me with just one chunk? > > Thanks, > > Graham > > P.S. less relevant I think, but our old generation is 16g > P.P.S. I suspect the abortable preclean timeouts mostly happen after a burst of very high allocation rate followed by an almost complete lull? this is one of the patterns that can happen in our application > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 1574 bytes Desc: not available URL: From jresch at cleversafe.com Fri Jun 20 16:51:02 2014 From: jresch at cleversafe.com (Jason Resch) Date: Fri, 20 Jun 2014 11:51:02 -0500 Subject: Fwd: Reference Processing in G1 remark phase vs. throughput collector In-Reply-To: <53A1E684.1020000@cleversafe.com> References: <53A1E684.1020000@cleversafe.com> Message-ID: <53A46676.8040903@cleversafe.com> -------- Original Message -------- Subject: Reference Processing in G1 remark phase vs. throughput collector Date: Wed, 18 Jun 2014 14:20:36 -0500 From: Jason Resch To: hotspot-gc-dev at openjdk.java.net Hello, We've recently been experimenting with the G1 collector for our application, and we noticed something odd with reference processing times in the G1. It is not clear to us if this is expected or indicative of a bug, but I thought I would mention it to this list to see if there is a reasonable explanation for this result. 
We are seeing that during the remark phase when non-strong references are processed, it takes around 20 times longer than the throughput collector spends processing the same number of references. As an example, here is some output for references processing times we observed: 2014-05-23T19:58:12.805+0000: 11446.605: [GC remark 11446.618: [GC ref-proc11446.618: [SoftReference, 0 refs, 0.0040400 secs]11446.622: [WeakReference, 11131810 refs, 8.7176900 secs]11455.340: [FinalReference, 2273593 refs, 2.0022000 secs]11457.342: [PhantomReference, 297950 refs, 0.3004680 secs]11457.643: [JNI Weak Reference, 0.0000040 secs], 13.7534950 secs], 13.8035420 secs] We see the G1 spent 8.7 seconds were spent processing 11 million weak references 2014-05-30T05:57:24.002+0000: 32724.998: [Full GC32726.138: [SoftReference, 154 refs, 0.0050380 secs]32726.143: [WeakReference, 7713339 refs, 0.3449380 secs]32726.488: [FinalReference, 1966941 refs, 0.1005860 secs]32726.588: [PhantomReference, 650797 refs, 0.0631680 secs]32726.652: [JNI Weak Reference, 0.0000060 secs] [PSYoungGen: 1012137K->0K(14784384K)] [ParOldGen: 16010001K->5894387K(16384000K)] 17022139K->5894387K(31168384K) [PSPermGen: 39256K->39256K(39552K)], 4.3463290 secs] [Times: user=98.05 sys=0.00, real=4.35 secs] While the throughput collector spent 0.34 seconds processing 7.7 million weak references In summary, the G1 collector processed weak references at a rate of 1.27 million per second, while the throughput collector processed them at 22.36 million references per second. Is there a fundamental design reason that explains why the G1 collector should be so much slower in this regard, or might there be ways to improve upon it? Jason -------------- next part -------------- An HTML attachment was scrubbed... URL: From graham at vast.com Sat Jun 21 16:52:45 2014 From: graham at vast.com (graham sanderson) Date: Sat, 21 Jun 2014 11:52:45 -0500 Subject: CMSEdenChunksRecordAlways & CMSParallelInitialMarkEnabled In-Reply-To: References: Message-ID: <8217B498-8868-453D-B8DC-39A718310D75@vast.com> Note this works great for us too ? given formatting in this email is a bit flaky, I?ll refer you to our numbers I posted in a Cassandra issue I opened to add these flags as defaults for ParNew/CMS (on the appropriate JVMs) https://issues.apache.org/jira/browse/CASSANDRA-7432 On Jun 14, 2014, at 7:05 PM, graham sanderson wrote: > Thanks for the answer Gustav, > > The fact that you have been running in production for months makes me confident enough to try this on at least one our nodes? (this is actually cassandra) > > Current GC related options are at the bottom - these nodes have 256G of RAM, and they aren?t swapping, and we are certainly used to a pause within the first 10 seconds or so, but the nodes haven?t even joined the ring yet, so we don?t really care. yeah ms != mx is bad; we want one heap size and to stick with it. > > I will gather data via -XX:+CMSEdenChunksRecordAlways, however I?d be interested if a developer has an answer as to when we expect potential chunk recording? Otherwise I?ll have to go dig into the code a bit deeper - my assumption was that this call would not be in the inlined allocation code, but I had thought that even allocation of a new TLAB was inlined by the compilers - perhaps not. > > Current GC related settings - note we were running with a lower CMSInitiatingOccupancyFraction until recently - seems to have gotten changed back by accident, but that is kind of tangential. 
> > -Xms24576M > -Xmx24576M > -Xmn8192M > -XX:+HeapDumpOnOutOfMemoryError > -XX:+UseParNewGC > -XX:+UseConcMarkSweepGC > -XX:+CMSParallelRemarkEnabled > -XX:SurvivorRatio=8 > -XX:MaxTenuringThreshold=1 > -XX:CMSInitiatingOccupancyFraction=70 > -XX:+UseCMSInitiatingOccupancyOnly > -XX:+UseTLAB > -XX:+UseCondCardMark > -XX:+PrintGCDetails > -XX:+PrintGCDateStamps > -XX:+PrintHeapAtGC > -XX:+PrintTenuringDistribution > -XX:+PrintGCApplicationStoppedTime > -XX:+PrintPromotionFailure > -XX:PrintFLSStatistics=1 > -Xloggc:/var/log/cassandra/gc.log > -XX:+UseGCLogFileRotation > -XX:NumberOfGCLogFiles=30 > -XX:GCLogFileSize=20M > -XX:+PrintGCApplicationConcurrentTime > > Thanks, Graham > > P.S. Note tuning here is rather interesting since we use this cassandra cluster for lots of different data with very different usage patterns - sometimes we?ll suddenly dump 50G of data in over the course of a few minutes. Also cassandra doesn?t really mind a node being paused for a while due to GC, but things get a little more annoying if they pause at the same time? even though promotion failure can we worse for us (that is a separate issue), we?ve seen STW pauses up to about 6-8 seconds in re mark (presumably when things go horribly wrong and you only get one chunk). Basically I?m on a mission to minimize all pauses, since their effects can propagate (timeouts are very short in a lot of places) > > I will report back with my findings > > On Jun 14, 2014, at 6:29 PM, Gustav ?kesson wrote: > >> Hi, >> >> Even though I won't answer all your questions I'd like to share my experience with these settings (plus additional thoughts) even though I haven't yet have had the time to dig into details. >> >> We've been using these flags for several months in production (yes, Java 7 even before latest update release) and we've seen a lot of improvements for CMS old gen STW. During execution occasional initial mark of 1.5s could occur, but using these settings combined CMS pauses are consistently around ~100ms (on high-end machine as yours, they are 20-30ms). We're using 1gb and 2gb heaps with roughly half/half old/new. Obviously, YMMV but this is at least the behavior of this particular application - we've had nothing but positive outcome from using these settings. Additionally, the pauses are rather deterministic. >> >> Not sure what your heap size settings are, but what I've also observed is that setting Xms != Xmx could also cause occasional long initial mark when heap capacity is slightly increased. I had a discussion a while back ( http://mail.openjdk.java.net/pipermail/hotspot-gc-use/2014-February/001795.html ) regarding this, and this seems to be an issue with CMS. >> >> Also, swapping/paging is another factor which could cause indeterministic / occasional long STW GCs. If you're on Linux, try swappiness=0 and see if pauses get more stable. 
>> >> >> Best Regards, >> Gustav ?kesson >> >> >> On Fri, Jun 13, 2014 at 6:48 AM, graham sanderson wrote: >> I was investigating abortable preclean timeouts in our app (and associated long remark pause) so had a look at the old jdk6 code I had on my box, wondered about recording eden chunks during certain eden slow allocation paths (I wasn?t sure if TLAB allocation is just a CAS bump), and saw what looked perfect in the latest code, so was excited to install 1.7.0_60-b19 >> >> I wanted to ask what you consider the stability of these two options to be (I?m pretty sure at least the first one is new in this release) >> >> I have just installed locally on my mac, and am aware of http://bugs.java.com/bugdatabase/view_bug.do?bug_id=8021809 which I could reproduce, however I wasn?t able to reproduce it without -XX:-UseCMSCompactAtFullCollection (is this your understanding too?) >> >> We are running our application with 8 gig young generation (6.4g eden), on boxes with 32 cores? so parallelism is good for short pauses >> >> we already have >> >> -XX:+UseParNewGC >> -XX:+UseConcMarkSweepGC >> -XX:+CMSParallelRemarkEnabled >> >> we have seen a few long(isn) initial marks, so >> >> -XX:+CMSParallelInitialMarkEnabled sounds good >> >> as for >> >> -XX:+CMSEdenChunksRecordAlways >> >> my question is: what constitutes a slow path such an eden chunk is potentially recorded? TLAB allocation, or more horrific things; basically (and I?ll test our app with -XX:+CMSPrintEdenSurvivorChunks) is it likely that I?ll actually get less samples using -XX:+CMSEdenChunksRecordAlways in a highly multithread app than I would with sampling, or put another way? what sort of app allocation patterns if any might avoid the slow path altogether and might leave me with just one chunk? >> >> Thanks, >> >> Graham >> >> P.S. less relevant I think, but our old generation is 16g >> P.P.S. I suspect the abortable preclean timeouts mostly happen after a burst of very high allocation rate followed by an almost complete lull? this is one of the patterns that can happen in our application >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 1574 bytes Desc: not available URL: From yu.zhang at oracle.com Mon Jun 23 21:00:45 2014 From: yu.zhang at oracle.com (YU ZHANG) Date: Mon, 23 Jun 2014 14:00:45 -0700 Subject: Fwd: Reference Processing in G1 remark phase vs. throughput collector In-Reply-To: <53A46676.8040903@cleversafe.com> References: <53A1E684.1020000@cleversafe.com> <53A46676.8040903@cleversafe.com> Message-ID: <53A8957D.30603@oracle.com> Jason, Can you try -XX:+ParallelRefProcEnabled? The default is disabled. Thanks, Jenny On 6/20/2014 9:51 AM, Jason Resch wrote: > > > > -------- Original Message -------- > Subject: Reference Processing in G1 remark phase vs. throughput > collector > Date: Wed, 18 Jun 2014 14:20:36 -0500 > From: Jason Resch > To: hotspot-gc-dev at openjdk.java.net > > > > Hello, > > We've recently been experimenting with the G1 collector for our > application, and we noticed something odd with reference processing > times in the G1. 
It is not clear to us if this is expected or > indicative of a bug, but I thought I would mention it to this list to > see if there is a reasonable explanation for this result. > > We are seeing that during the remark phase when non-strong references > are processed, it takes around 20 times longer than the throughput > collector spends processing the same number of references. As an > example, here is some output for references processing times we observed: > > 2014-05-23T19:58:12.805+0000: 11446.605: [GC remark 11446.618: [GC > ref-proc11446.618: [SoftReference, 0 refs, 0.0040400 > secs]11446.622: [WeakReference, 11131810 refs, 8.7176900 > secs]11455.340: [FinalReference, 2273593 refs, 2.0022000 > secs]11457.342: [PhantomReference, 297950 refs, 0.3004680 > secs]11457.643: [JNI Weak Reference, 0.0000040 secs], 13.7534950 > secs], 13.8035420 secs] > > > We see the G1 spent 8.7 seconds processing 11 million weak > references > > 2014-05-30T05:57:24.002+0000: 32724.998: [Full GC32726.138: > [SoftReference, 154 refs, 0.0050380 secs]32726.143: > [WeakReference, 7713339 refs, 0.3449380 secs]32726.488: > [FinalReference, 1966941 refs, 0.1005860 secs]32726.588: > [PhantomReference, 650797 refs, 0.0631680 secs]32726.652: [JNI > Weak Reference, 0.0000060 secs] [PSYoungGen: > 1012137K->0K(14784384K)] [ParOldGen: > 16010001K->5894387K(16384000K)] 17022139K->5894387K(31168384K) > [PSPermGen: 39256K->39256K(39552K)], 4.3463290 secs] [Times: > user=98.05 sys=0.00, real=4.35 secs] > > While the throughput collector spent 0.34 seconds processing 7.7 > million weak references > > > In summary, the G1 collector processed weak references at a rate of > 1.27 million per second, while the throughput collector processed them > at 22.36 million references per second. Is there a fundamental design > reason that explains why the G1 collector should be so much slower in > this regard, or might there be ways to improve upon it? > > > Jason > > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use

From serkanozal86 at hotmail.com Sat Jun 28 12:21:15 2014 From: serkanozal86 at hotmail.com (serkan özal) Date: Sat, 28 Jun 2014 15:21:15 +0300 Subject: Compressed-OOP's on JVM Message-ID: Hi all, As you know, sometimes, although compressed oops are used, if the Java heap size is < 4Gb and it can be moved into the low virtual address space (below 4Gb), then compressed oops can be used without encoding/decoding. (https://wikis.oracle.com/display/HotSpotInternals/CompressedOops) In a 64-bit JVM with compressed oops enabled and with minimum heap size 1G and maximum heap size 1G, object references are 4 bytes. In this case, the compressed oop is the real native address. But in a 64-bit JVM with compressed oops enabled and with minimum heap size 4G and maximum heap size 8G, object references are also 4 bytes, and in this case the compressed oop needs to be encoded/decoded (by 3-bit shifting) before getting the real native address. In both cases compressed oops are enabled, but how can I detect whether compressed oops are used as native addresses or whether they need to be encoded/decoded? If they are encoded/decoded, what is the value of the bit shift? Thanks in advance. -- Serkan ÖZAL -------------- next part -------------- An HTML attachment was scrubbed...
URL: From bernd-2014 at eckenfels.net Sat Jun 28 17:36:21 2014 From: bernd-2014 at eckenfels.net (Bernd Eckenfels) Date: Sat, 28 Jun 2014 19:36:21 +0200 Subject: Compressed-OOP's on =?utf-8?Q?JVM=E2=80=8F?= In-Reply-To: References: Message-ID: <20140628193621.00000b7f.bernd-2014@eckenfels.net> Hello, you can use -XX:+PrintCompressedOopsMode like this: > java.exe -Xmx1960m-XX:+UnlockDiagnosticVMOptions > -XX:+PrintCompressedOopsMode -version heap address: 0x0000000080600000, size: 2042 MB, zero based Compressed Oops, 32-bits Oops java version "1.7.0_51" Java(TM) SE Runtime Environment (build 1.7.0_51-b13) Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode) > java.exe -Xmx1970m-XX:+UnlockDiagnosticVMOptions > -XX:+PrintCompressedOopsMode -version heap address: 0x000000077fc00000, size: 2052 MB, zero based Compressed Oops > java.exe -Xmx26g-XX:+UnlockDiagnosticVMOptions -XX:+PrintCompressedOopsMode -version heap address: 0x000000017ae00000, size: 26706 MB, zero based Compressed Oops > java.exe -Xmx27g-XX:+UnlockDiagnosticVMOptions -XX:+PrintCompressedOopsMode -version Protected page at the reserved heap base: 0x000000013f3e0000 / 65536 bytes heap address: 0x000000013f3f0000, size: 27730 MB, Compressed Oops with base: 0x000000013f3ef000 > java.exe -Xmx31g-XX:+UnlockDiagnosticVMOptions -XX:+PrintCompressedOopsMode -version Protected page at the reserved heap base: 0x000000013f7a0000 / 65536 bytes heap address: 0x000000013f7b0000, size: 31826 MB, Compressed Oops with base: 0x000000013f7af000 > java.exe -Xmx32g-XX:+UnlockDiagnosticVMOptions -XX:+PrintCompressedOopsMode -version java version "1.7.0_25" The last one does not use C-OOPs. Gruss Bernd PS: please dont spam so many lists simulatanously with user questions. It not only bothers a lot of people, it also makes the target for a followup discussion hard to figure out. Am Sat, 28 Jun 2014 15:21:15 +0300 schrieb serkan ?zal : > Hi all, > As you know, sometimes, although compressed-oops are used, if java > heap size < 4Gb and it can be moved into low virtual address space > (below 4Gb) then compressed oops can be used without > encoding/decoding. > (https://wikis.oracle.com/display/HotSpotInternals/CompressedOops) In > 64 bit JVM with compressed-oops enable and and with minimum heap size > 1G and maximum heap size 1G, object references are 4 byte. In this > case, compressed-oop is real native address. But in 64 bit JVM with > compressed-oops enable and and with minimum heap size 4G and maximum > heap size 8G, object references are 4 byte. But in this case, > compressed-oop is needed to be encoded/decoded (by 3 bit shifting) > before getting real native address. In both of cases, compressed-oop > is enable, but how can I detect compressed-oops are used as native > address or are they need to be encoded/decoded ? If they are > encoded/decoded, what is the value of bit shifting ? Thanks in > advance. -- Serkan ?ZAL From bernd-2014 at eckenfels.net Sat Jun 28 21:51:52 2014 From: bernd-2014 at eckenfels.net (Bernd Eckenfels) Date: Sat, 28 Jun 2014 23:51:52 +0200 Subject: Compressed-OOP's on =?utf-8?Q?JVM=E2=80=8F?= In-Reply-To: References: <20140628193621.00000b7f.bernd-2014@eckenfels.net> Message-ID: <20140628235152.00001042.bernd-2014@eckenfels.net> Hello, JOL does print out the C-OOPs characteristics. It is printed with VMSupport#vmDetails(). I think there are a lot of heuristics used, for hotspot it queries getVMOptions() on the HotSpotDiagnotic MBean with "UseCompressedOops" and "ObjectAlignmentInBytes" options. 
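Done by hand, the heuristic described above boils down to a few lines against the HotSpotDiagnostic MXBean. A minimal sketch (the class name is made up; it assumes a 64-bit HotSpot, since ObjectAlignmentInBytes is only defined there, and note that this yields the shift used when oops are actually scaled - whether the VM chose the unscaled zero-shift mode or a non-zero base is what -XX:+PrintCompressedOopsMode above reports):

    import java.lang.management.ManagementFactory;
    import com.sun.management.HotSpotDiagnosticMXBean;

    public class CoopsCheck {
        public static void main(String[] args) {
            HotSpotDiagnosticMXBean hs =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            boolean coops =
                    Boolean.parseBoolean(hs.getVMOption("UseCompressedOops").getValue());
            // Throws IllegalArgumentException on a 32-bit VM, where this flag does not exist.
            int align =
                    Integer.parseInt(hs.getVMOption("ObjectAlignmentInBytes").getValue());
            int shift = Integer.numberOfTrailingZeros(align); // 3 for the default 8-byte alignment
            System.out.println("UseCompressedOops = " + coops);
            System.out.println("ObjectAlignmentInBytes = " + align + ", oop shift when scaled = " + shift);
        }
    }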
http://openjdk.java.net/projects/code-tools/jol/ http://hg.openjdk.java.net/code-tools/jol/file/b4bc510cbad0/jol-core/src/main/java/org/openjdk/jol/util/VMSupport.java#l400 Gruss Bernd Am Sun, 29 Jun 2014 00:28:48 +0300 schrieb serkan ?zal : > Thanks Bernd, > How can I check it programmatically? > > > Date: Sat, 28 Jun 2014 19:36:21 +0200 > > From: bernd-2014 at eckenfels.net > > To: serkanozal86 at hotmail.com > > CC: hotspot-gc-use at openjdk.java.net > > Subject: Re: Compressed-OOP's on JVM? > > > > Hello, > > > > you can use -XX:+PrintCompressedOopsMode like this: > > > > > java.exe -Xmx1960m-XX:+UnlockDiagnosticVMOptions > > > -XX:+PrintCompressedOopsMode -version > > heap address: 0x0000000080600000, size: 2042 MB, zero based > > Compressed Oops, 32-bits Oops java version "1.7.0_51" > > Java(TM) SE Runtime Environment (build 1.7.0_51-b13) > > Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode) > > > java.exe -Xmx1970m-XX:+UnlockDiagnosticVMOptions > > > -XX:+PrintCompressedOopsMode -version > > heap address: 0x000000077fc00000, size: 2052 MB, zero based > > Compressed Oops > > > java.exe -Xmx26g-XX:+UnlockDiagnosticVMOptions > > > -XX:+PrintCompressedOopsMode -version > > heap address: 0x000000017ae00000, size: 26706 MB, zero based > > Compressed Oops > > > java.exe -Xmx27g-XX:+UnlockDiagnosticVMOptions > > > -XX:+PrintCompressedOopsMode -version > > Protected page at the reserved heap base: 0x000000013f3e0000 / 65536 > > bytes heap address: 0x000000013f3f0000, size: 27730 MB, Compressed > > Oops with base: 0x000000013f3ef000 > > > java.exe -Xmx31g-XX:+UnlockDiagnosticVMOptions > > > -XX:+PrintCompressedOopsMode -version > > Protected page at the reserved heap base: 0x000000013f7a0000 / 65536 > > bytes heap address: 0x000000013f7b0000, size: 31826 MB, Compressed > > Oops with base: 0x000000013f7af000 > > > java.exe -Xmx32g-XX:+UnlockDiagnosticVMOptions > > > -XX:+PrintCompressedOopsMode -version > > java version "1.7.0_25" > > > > The last one does not use C-OOPs. > > > > Gruss > > Bernd > > > > PS: please dont spam so many lists simulatanously with user > > questions. It not only bothers a lot of people, it also makes the > > target for a followup discussion hard to figure out. > > > > Am Sat, 28 Jun 2014 15:21:15 +0300 > > schrieb serkan ?zal : > > > > > Hi all, > > > As you know, sometimes, although compressed-oops are used, if java > > > heap size < 4Gb and it can be moved into low virtual address space > > > (below 4Gb) then compressed oops can be used without > > > encoding/decoding. > > > (https://wikis.oracle.com/display/HotSpotInternals/CompressedOops) > > > In 64 bit JVM with compressed-oops enable and and with minimum > > > heap size 1G and maximum heap size 1G, object references are 4 > > > byte. In this case, compressed-oop is real native address. But in > > > 64 bit JVM with compressed-oops enable and and with minimum heap > > > size 4G and maximum heap size 8G, object references are 4 byte. > > > But in this case, compressed-oop is needed to be encoded/decoded > > > (by 3 bit shifting) before getting real native address. In both > > > of cases, compressed-oop is enable, but how can I detect > > > compressed-oops are used as native address or are they need to > > > be encoded/decoded ? If they are encoded/decoded, what is the > > > value of bit shifting ? Thanks in advance. 
-- Serkan ?ZAL > From dhd at exnet.com Mon Jun 30 15:28:57 2014 From: dhd at exnet.com (Damon Hart-Davis) Date: Mon, 30 Jun 2014 16:28:57 +0100 Subject: MinHeapFreeRatio / MaxHeapFreeRatio In-Reply-To: <4FAA3867.2020808@oracle.com> References: <4FAA3867.2020808@oracle.com> Message-ID: <7A18E4BF-6CF7-493C-B1EF-90B05D01973A@exnet.com> Hi Jon, Did CMS get fixed to observe -XX:MaxHeapFreeRatio in JDK8? Rgds Damon > > On 5/8/2012 11:47 PM, Damon Hart-Davis wrote: >> Hi, >> >> Note: I should have been clear that memory is not given back to the OS even when the heap is (say) 60%+ free and I force GCs. >> >> Rgds >> >> Damon >> >> >> On 9 May 2012, at 06:46, Damon Hart-Davis wrote: >> >>> Hi, >>> >>> First time on this list, so hello! I see a few familiar names from days of yore! >>> >>> I am running my favourite (Tomcat/Web) app in a very constrained memory environment (a SheevaPlug) along with all the usual Internet server junk (sendmail, ntp, sshd, etc) and I am aiming to squeeze it into the even tighter space of a Raspberry Pi in due course. >>> >>> As such, I want the JVM to give memory back to the OS whenever possible. In code I set a target to keep about 25% of the heap free, and so for example I stop cacheing some stuff when below that, and cache more vigorously above that. >>> >>> Here are some relevant options: >>> >>> CATALINA_OPTS="-Xmx100m -Xms64m" >>> # Cap size of non-(main-)heap components. >>> CATALINA_OPTS="$CATALINA_OPTS -XX:MaxPermSize=48m" >>> # Trim thread stack size. >>> CATALINA_OPTS="$CATALINA_OPTS -Xss256k" >>> # Keep the new generation well within the target 25% free... >>> CATALINA_OPTS="$CATALINA_OPTS -XX:NewRatio=5" >>> # Be aggressive about giving memory back to the system above target 25% free. >>> CATALINA_OPTS="$CATALINA_OPTS -XX:MinHeapFreeRatio=10 -XX:MaxHeapFreeRatio=26" >>> # Run in incremental GC mode to minimise pauses. >>> CATALINA_OPTS="$CATALINA_OPTS -Xincgc" >>> >>> I see some evidence that the free ratios are being observed by when the heap is expanded, but I have never seen any variant of JDK 6 or 7, including the Oracle embedded 6, actually give memory back to the OS with this app. >>> >>> Am I still doing something wrong? B^> >>> >>> Rgds >>> >>> Damon >>> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > From dhd at exnet.com Mon Jun 30 23:31:39 2014 From: dhd at exnet.com (Damon Hart-Davis) Date: Tue, 1 Jul 2014 00:31:39 +0100 Subject: MinHeapFreeRatio / MaxHeapFreeRatio In-Reply-To: <7A18E4BF-6CF7-493C-B1EF-90B05D01973A@exnet.com> References: <4FAA3867.2020808@oracle.com> <7A18E4BF-6CF7-493C-B1EF-90B05D01973A@exnet.com> Message-ID: <06E69C5B-6D77-40B7-B4B7-72BF6109F075@exnet.com> Hi, Thanks, so: % java -fullversion java full version "1.8.0-b132" Should be fine if I give it a whirl somehow then! Rgds Damon On 30 Jun 2014, at 16:28, Damon Hart-Davis wrote: > Hi Jon, > > Did CMS get fixed to observe -XX:MaxHeapFreeRatio in JDK8? > > Rgds > > Damon > >> >> On 5/8/2012 11:47 PM, Damon Hart-Davis wrote: >>> Hi, >>> >>> Note: I should have been clear that memory is not given back to the OS even when the heap is (say) 60%+ free and I force GCs. 
>>> >>> Rgds >>> >>> Damon >>> >>> >>> On 9 May 2012, at 06:46, Damon Hart-Davis wrote: >>> >>>> Hi, >>>> >>>> First time on this list, so hello! I see a few familiar names from days of yore! >>>> >>>> I am running my favourite (Tomcat/Web) app in a very constrained memory environment (a SheevaPlug) along with all the usual Internet server junk (sendmail, ntp, sshd, etc) and I am aiming to squeeze it into the even tighter space of a Raspberry Pi in due course. >>>> >>>> As such, I want the JVM to give memory back to the OS whenever possible. In code I set a target to keep about 25% of the heap free, and so for example I stop cacheing some stuff when below that, and cache more vigorously above that. >>>> >>>> Here are some relevant options: >>>> >>>> CATALINA_OPTS="-Xmx100m -Xms64m" >>>> # Cap size of non-(main-)heap components. >>>> CATALINA_OPTS="$CATALINA_OPTS -XX:MaxPermSize=48m" >>>> # Trim thread stack size. >>>> CATALINA_OPTS="$CATALINA_OPTS -Xss256k" >>>> # Keep the new generation well within the target 25% free... >>>> CATALINA_OPTS="$CATALINA_OPTS -XX:NewRatio=5" >>>> # Be aggressive about giving memory back to the system above target 25% free. >>>> CATALINA_OPTS="$CATALINA_OPTS -XX:MinHeapFreeRatio=10 -XX:MaxHeapFreeRatio=26" >>>> # Run in incremental GC mode to minimise pauses. >>>> CATALINA_OPTS="$CATALINA_OPTS -Xincgc" >>>> >>>> I see some evidence that the free ratios are being observed by when the heap is expanded, but I have never seen any variant of JDK 6 or 7, including the Oracle embedded 6, actually give memory back to the OS with this app. >>>> >>>> Am I still doing something wrong? B^> >>>> >>>> Rgds >>>> >>>> Damon >>>> >>> _______________________________________________ >>> hotspot-gc-use mailing list >>> hotspot-gc-use at openjdk.java.net >>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >
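Whichever collector is in use, one way to see whether -XX:MinHeapFreeRatio / -XX:MaxHeapFreeRatio are having any effect on a given build is to watch the committed heap size from inside the application rather than relying on GC log output. A small sketch (the class name is made up; a shrinking "committed" value is the JVM-side signal that heap pages were uncommitted, though OS-level resident memory can lag behind):

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryMXBean;
    import java.lang.management.MemoryUsage;

    public class CommittedHeapWatch {
        public static void main(String[] args) throws InterruptedException {
            MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
            while (true) {
                MemoryUsage heap = mem.getHeapMemoryUsage();
                System.out.printf("heap used=%dK committed=%dK max=%dK%n",
                        heap.getUsed() / 1024, heap.getCommitted() / 1024, heap.getMax() / 1024);
                Thread.sleep(10000); // sample every 10 seconds
            }
        }
    }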