From yu.zhang at oracle.com  Thu Jan  2 08:24:59 2014
From: yu.zhang at oracle.com (YU ZHANG)
Date: Thu, 02 Jan 2014 08:24:59 -0800
Subject: G1 GC clean up time is too long
In-Reply-To:
References: <52B5037C.8010704@servergy.com>
Message-ID: <52C592DB.9060007@oracle.com>

Yao,

Thanks for the feedback. Glad to know that the tuning helps. We are working
on improving G1 performance. If the current build does not meet your
requirements, I hope the future builds (JDK 8), with more improvements, work
for your workload.

Thanks,
Jenny

On 12/31/2013 5:27 PM, yao wrote:
> Hi Folks,
>
> Sorry for reporting the GC performance results late; we are in the code
> freeze period for the holiday season and cannot do any production-related
> deployment.
>
> First, I'd like to say thank you to Jenny, Monica and Thomas. Your
> suggestions are really helpful and help us to understand G1 GC behavior.
> We did NOT observe any full GCs after adjusting the suggested parameters.
> That is really awesome; we tried these new parameters on Dec 26 and full
> GCs have disappeared since then (at least until I am writing this email,
> at 3:37pm EST, Dec 30).
>
> G1 parameters:
> -XX:MaxGCPauseMillis=100
> -XX:G1HeapRegionSize=32m
> -XX:InitiatingHeapOccupancyPercent=65
> -XX:G1ReservePercent=20
> -XX:G1HeapWastePercent=5
> -XX:G1MixedGCLiveThresholdPercent=75
>
> We've reduced MaxGCPauseMillis to 100 since our real-time system is
> focused on low pause times; if the system cannot respond within 50
> milliseconds, the response is useless to the client. However, the current
> 99th-percentile read latency is still slightly higher than on the CMS
> machines, but they are pretty close (14 ms vs 12 ms). One thing we can do
> now is to increase the heap size on the G1 machines; for now, the G1 heap
> is only 90 percent of the CMS machines' heap. This is because we observed
> our server process being killed by the OOM killer on the G1 machines, so
> we decided to decrease the heap size there. Since G1ReservePercent was
> increased, we think it should be safe to increase the G1 heap to the same
> size as on the CMS machines. We believe it would give the G1 machines
> better performance because 40 percent of the heap is used for the block
> cache.
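For reference, the options Shengzhe lists above would sit on a launcher command
line roughly as in the sketch below. The heap size, logging options, classpath
and main class are placeholders rather than values taken from this thread, and
-XX:+UnlockExperimentalVMOptions is included only because
G1MixedGCLiveThresholdPercent is an experimental option on at least some 7u
builds:

    # sketch only: sizes, log options, classpath and main class are placeholders
    java -Xms66g -Xmx66g \
         -XX:+UseG1GC \
         -XX:MaxGCPauseMillis=100 \
         -XX:G1HeapRegionSize=32m \
         -XX:InitiatingHeapOccupancyPercent=65 \
         -XX:G1ReservePercent=20 \
         -XX:+UnlockExperimentalVMOptions \
         -XX:G1HeapWastePercent=5 \
         -XX:G1MixedGCLiveThresholdPercent=75 \
         -Xloggc:gc.log -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
         -XX:+PrintAdaptiveSizePolicy -XX:+PrintTenuringDistribution \
         -cp app.jar com.example.Main

The PrintAdaptiveSizePolicy and PrintTenuringDistribution flags are only a
guess at how the "[G1Ergonomics ...]" and tenuring-distribution lines in the
logs below were produced.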
> > Thanks > -Shengzhe > > G1 Logs > > 2013-12-30T08:25:26.727-0500: 308692.158: [GC pause (young) > Desired survivor size 234881024 bytes, new threshold 14 (max 15) > - age 1: 16447904 bytes, 16447904 total > - age 2: 30614384 bytes, 47062288 total > - age 3: 16122104 bytes, 63184392 total > - age 4: 16542280 bytes, 79726672 total > - age 5: 14249520 bytes, 93976192 total > - age 6: 15187728 bytes, 109163920 total > - age 7: 15073808 bytes, 124237728 total > - age 8: 17903552 bytes, 142141280 total > - age 9: 17031280 bytes, 159172560 total > - age 10: 16854792 bytes, 176027352 total > - age 11: 19192480 bytes, 195219832 total > - age 12: 20491176 bytes, 215711008 total > - age 13: 16367528 bytes, 232078536 total > - age 14: 15536120 bytes, 247614656 total > 308692.158: [G1Ergonomics (CSet Construction) start choosing CSet, > _pending_cards: 32768, predicted base time: 38.52 ms, remaining time: > 61.48 ms, target pause time: 100.00 ms] > 308692.158: [G1Ergonomics (CSet Construction) add young regions to > CSet, eden: 91 regions, survivors: 14 regions, predicted young region > time: 27.76 ms] > 308692.158: [G1Ergonomics (CSet Construction) finish choosing CSet, > eden: 91 regions, survivors: 14 regions, old: 0 regions, predicted > pause time: 66.28 ms, target pause time: 100.00 ms] > 308692.233: [G1Ergonomics (Concurrent Cycles) request concurrent > cycle initiation, reason: occupancy higher than threshold, occupancy: > 52143587328 bytes, allocation request: 0 bytes, threshold: 46172576125 > bytes (65.00 %), source: end of GC] > , 0.0749020 secs] > [Parallel Time: 53.9 ms, GC Workers: 18] > [GC Worker Start (ms): Min: 308692158.6 , Avg: > 308692159.0 , Max: 308692159.4 , > Diff: 0.8] > [Ext Root Scanning (ms): Min: 3.9, Avg: 4.5, Max: 6.4, Diff: > 2.4, Sum: 81.9] > [Update RS (ms): Min: 10.2, Avg: 11.6, Max: 12.2, Diff: 2.0, > Sum: 209.0] > [Processed Buffers: Min: 15, Avg: 22.5, Max: 31, Diff: 16, > Sum: 405] > [Scan RS (ms): Min: 7.8, Avg: 8.0, Max: 8.3, Diff: 0.5, Sum: 144.3] > [Object Copy (ms): Min: 28.3, Avg: 28.4, Max: 28.5, Diff: 0.2, > Sum: 510.7] > [Termination (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: > 1.2] > [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, > Sum: 0.5] > [GC Worker Total (ms): Min: 52.3, Avg: 52.6, Max: 53.1, Diff: > 0.8, Sum: 947.5] > [GC Worker End (ms): Min: 308692211.6 , Avg: > 308692211.7 , Max: 308692211.7 , > Diff: 0.1] > [Code Root Fixup: 0.0 ms] > [Clear CT: 9.8 ms] > [Other: 11.1 ms] > [Choose CSet: 0.0 ms] > [Ref Proc: 2.4 ms] > [Ref Enq: 0.4 ms] > [Free CSet: 1.1 ms] > [Eden: 2912.0M(2912.0M)->0.0B(3616.0M) Survivors: 448.0M->416.0M > Heap: 51.7G(66.2G)->48.9G(66.2G)] > [Times: user=1.07 sys=0.01, real=0.08 secs] > 308697.312: [G1Ergonomics (Concurrent Cycles) initiate concurrent > cycle, reason: concurrent cycle initiation requested] > 2013-12-30T08:25:31.881-0500: 308697.312: [GC pause (young) (initial-mark) > Desired survivor size 268435456 bytes, new threshold 15 (max 15) > - age 1: 17798336 bytes, 17798336 total > - age 2: 15275456 bytes, 33073792 total > - age 3: 27940176 bytes, 61013968 total > - age 4: 15716648 bytes, 76730616 total > - age 5: 16474656 bytes, 93205272 total > - age 6: 14249232 bytes, 107454504 total > - age 7: 15187536 bytes, 122642040 total > - age 8: 15073808 bytes, 137715848 total > - age 9: 17362752 bytes, 155078600 total > - age 10: 17031280 bytes, 172109880 total > - age 11: 16854792 bytes, 188964672 total > - age 12: 19124800 bytes, 208089472 total > - age 13: 20491176 bytes, 228580648 total > 
- age 14: 16367528 bytes, 244948176 total > 308697.313: [G1Ergonomics (CSet Construction) start choosing CSet, > _pending_cards: 31028, predicted base time: 37.87 ms, remaining time: > 62.13 ms, target pause time: 100.00 ms] > 308697.313: [G1Ergonomics (CSet Construction) add young regions to > CSet, eden: 113 regions, survivors: 13 regions, predicted young region > time: 27.99 ms] > 308697.313: [G1Ergonomics (CSet Construction) finish choosing CSet, > eden: 113 regions, survivors: 13 regions, old: 0 regions, predicted > pause time: 65.86 ms, target pause time: 100.00 ms] > , 0.0724890 secs] > [Parallel Time: 51.9 ms, GC Workers: 18] > [GC Worker Start (ms): Min: 308697313.3 , Avg: > 308697313.7 , Max: 308697314.0 , > Diff: 0.6] > [Ext Root Scanning (ms): Min: 4.3, Avg: 5.7, Max: 16.7, Diff: > 12.3, Sum: 101.8] > [Update RS (ms): Min: 0.0, Avg: 9.3, Max: 10.4, Diff: 10.4, Sum: > 166.9] > [Processed Buffers: Min: 0, Avg: 22.0, Max: 30, Diff: 30, > Sum: 396] > [Scan RS (ms): Min: 6.4, Avg: 8.5, Max: 13.0, Diff: 6.5, Sum: 152.3] > [Object Copy (ms): Min: 22.5, Avg: 27.1, Max: 27.7, Diff: 5.2, > Sum: 487.0] > [Termination (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: > 1.0] > [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, > Sum: 0.6] > [GC Worker Total (ms): Min: 50.2, Avg: 50.5, Max: 50.9, Diff: > 0.6, Sum: 909.5] > [GC Worker End (ms): Min: 308697364.2 , Avg: > 308697364.2 , Max: 308697364.3 , > Diff: 0.1] > [Code Root Fixup: 0.0 ms] > [Clear CT: 9.9 ms] > [Other: 10.8 ms] > [Choose CSet: 0.0 ms] > [Ref Proc: 2.8 ms] > [Ref Enq: 0.4 ms] > [Free CSet: 0.9 ms] > [Eden: 3616.0M(3616.0M)->0.0B(3520.0M) Survivors: 416.0M->448.0M > Heap: 52.5G(66.2G)->49.0G(66.2G)] > [Times: user=1.01 sys=0.00, real=0.07 secs] > 2013-12-30T08:25:31.954-0500: 308697.385: [GC > concurrent-root-region-scan-start] > 2013-12-30T08:25:31.967-0500: 308697.398: [GC > concurrent-root-region-scan-end, 0.0131710 secs] > 2013-12-30T08:25:31.967-0500: 308697.398: [GC concurrent-mark-start] > 2013-12-30T08:25:36.566-0500: 308701.997: [GC concurrent-mark-end, > 4.5984140 secs] > 2013-12-30T08:25:36.570-0500: 308702.002: [GC remark > 2013-12-30T08:25:36.573-0500: 308702.004: [GC ref-proc, 0.0126990 > secs], 0.0659540 secs] > [Times: user=0.87 sys=0.00, real=0.06 secs] > 2013-12-30T08:25:36.641-0500: 308702.072: [GC cleanup 52G->52G(66G), > 0.5487830 secs] > [Times: user=9.66 sys=0.06, real=0.54 secs] > 2013-12-30T08:25:37.190-0500: 308702.622: [GC concurrent-cleanup-start] > 2013-12-30T08:25:37.190-0500: 308702.622: [GC concurrent-cleanup-end, > 0.0000480 secs] > > > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140102/ab765b84/attachment.html From ryebrye at gmail.com Thu Jan 2 09:57:26 2014 From: ryebrye at gmail.com (Ryan Gardner) Date: Thu, 2 Jan 2014 12:57:26 -0500 Subject: G1 GC clean up time is too long In-Reply-To: References: <52B5037C.8010704@servergy.com> Message-ID: I've also fought with cleanup times being long with a large heap and G1. In my case, I was suspicious that the RSet coarsening was increasing the time for GC Cleanups. 
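One quick way to test that suspicion on an existing log is to pull the
coarsening counts and the cleanup pauses out together. This is only a sketch:
it assumes the RSet summary output discussed below (-XX:+G1SummarizeRSetStats)
is already being written, and that the log file is named gc.log:

    grep -e "coarsenings" -e "GC cleanup" gc.log

Shengzhe runs essentially the same filter later in this thread, and the cleanup
times there do climb as the coarsening count climbs.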
If you have a way to test different settings in a non-production environment,
you could consider experimenting with:

-XX:+UnlockExperimentalVMOptions
-XX:G1RSetRegionEntries=4096

and different values for G1RSetRegionEntries - 4096 was a sweet spot for me,
but your application may behave differently.

You can turn on:

-XX:+UnlockDiagnosticVMOptions
-XX:+G1SummarizeRSetStats
-XX:G1SummarizeRSetStatsPeriod=20

to get it to print what it is doing and get some more insight into those times.

The specific number of RSetRegionEntries I set (4096) was, in theory, supposed
to be close to what G1 was already choosing based on my region size (also 32m)
and number of regions - but it did not seem to be.

Also, if you have more memory available, I have found G1 to take the extra
memory and not increase pause times much. As you increase the total heap size,
the size of your smallest possible collection will also increase, since it is
set to a percentage of the total heap... In my case I was tuning an application
that was a cache, so it had tons of heap space but wasn't churning it over
much...

I ended up going as low as:

-XX:G1NewSizePercent=1

to let G1 feel free to use as few regions as possible to achieve smaller pause
times.

I've been running in production on 1.7u40 for several months now with 92GB
heaps and a worst-case cleanup pause time of around 370ms - prior to tuning the
RSet region entries, the cleanup phase was getting worse and worse over time
and in testing would sometimes be over 1 second.

I meant to dive into the OpenJDK code to look at where the default
RSetRegionEntries value is calculated, but didn't get around to it.

Hope that helps,

Ryan Gardner

On Dec 31, 2013 8:29 PM, "yao" wrote:

> Hi Folks,
>
> Sorry for reporting GC performance result late, we are in the code freeze
> period for the holiday season and cannot do any production related
> deployment.
>
> First, I'd like to say thank you to Jenny, Monica and Thomas. Your
> suggestions are really helpful and help us to understand G1 GC behavior. We
> did NOT observe any full GCs after adjusting suggested parameters. That is
> really awesome, we tried these new parameters on Dec 26 and full GC
> disappeared since then (at least until I am writing this email, at 3:37pm
> EST, Dec 30).
>
> G1 parameters:
>
> -XX:MaxGCPauseMillis=100 -XX:G1HeapRegionSize=32m
> -XX:InitiatingHeapOccupancyPercent=65 -XX:G1ReservePercent=20
> -XX:G1HeapWastePercent=5 -XX:G1MixedGCLiveThresholdPercent=75
>
> We've reduced MaxGCPauseMillis to 100 since our real-time system is focus
> on low pause, if system cannot give response in 50 milliseconds, it's
> totally useless for the client. However, current read latency 99 percentile
> is still slightly higher than CMS machines but they are pretty close (14
> millis vs 12 millis). One thing we can do now is to increase heap size for
> G1 machines, for now, the heap size for G1 is only 90 percent of those CMS
> machines. This is because we observed our server process is killed by OOM
> killer on G1 machines and we decided to decrease heap size on G1 machines.
> Since G1ReservePercent was increased, we think it should be safe to
> increase G1 heap to be same as CMS machine. We believe it could make G1
> machine give us better performance because 40 percent of heap will be used
> for block cache.
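On the question of where the default comes from: Jenny's reply further down in
the thread gives the formula (a base of 256, scaled with the log of the region
size in MB). Reading that as region_size_log_mb = log2(region size in bytes) - 20,
which is an assumption about the formula rather than a reading of the HotSpot
source, a 32m region size works out to:

    region_size_log_mb          = log2(32m) - 20 = 25 - 20 = 5
    default G1RSetRegionEntries = 256 * (region_size_log_mb + 1) = 256 * 6 = 1536

If that reading is right, an explicitly set 4096 actually sits well above the
default rather than below it.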
> > Thanks > -Shengzhe > > G1 Logs > > 2013-12-30T08:25:26.727-0500: 308692.158: [GC pause (young) > Desired survivor size 234881024 bytes, new threshold 14 (max 15) > - age 1: 16447904 bytes, 16447904 total > - age 2: 30614384 bytes, 47062288 total > - age 3: 16122104 bytes, 63184392 total > - age 4: 16542280 bytes, 79726672 total > - age 5: 14249520 bytes, 93976192 total > - age 6: 15187728 bytes, 109163920 total > - age 7: 15073808 bytes, 124237728 total > - age 8: 17903552 bytes, 142141280 total > - age 9: 17031280 bytes, 159172560 total > - age 10: 16854792 bytes, 176027352 total > - age 11: 19192480 bytes, 195219832 total > - age 12: 20491176 bytes, 215711008 total > - age 13: 16367528 bytes, 232078536 total > - age 14: 15536120 bytes, 247614656 total > 308692.158: [G1Ergonomics (CSet Construction) start choosing CSet, > _pending_cards: 32768, predicted base time: 38.52 ms, remaining time: 61.48 > ms, target pause time: 100.00 ms] > 308692.158: [G1Ergonomics (CSet Construction) add young regions to CSet, > eden: 91 regions, survivors: 14 regions, predicted young region time: 27.76 > ms] > 308692.158: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: > 91 regions, survivors: 14 regions, old: 0 regions, predicted pause time: > 66.28 ms, target pause time: 100.00 ms] > 308692.233: [G1Ergonomics (Concurrent Cycles) request concurrent cycle > initiation, reason: occupancy higher than threshold, occupancy: 52143587328 > bytes, allocation request: 0 bytes, threshold: 46172576125 bytes (65.00 %), > source: end of GC] > , 0.0749020 secs] > [Parallel Time: 53.9 ms, GC Workers: 18] > [GC Worker Start (ms): Min: 308692158.6, Avg: 308692159.0, Max: > 308692159.4, Diff: 0.8] > [Ext Root Scanning (ms): Min: 3.9, Avg: 4.5, Max: 6.4, Diff: 2.4, > Sum: 81.9] > [Update RS (ms): Min: 10.2, Avg: 11.6, Max: 12.2, Diff: 2.0, Sum: > 209.0] > [Processed Buffers: Min: 15, Avg: 22.5, Max: 31, Diff: 16, Sum: > 405] > [Scan RS (ms): Min: 7.8, Avg: 8.0, Max: 8.3, Diff: 0.5, Sum: 144.3] > [Object Copy (ms): Min: 28.3, Avg: 28.4, Max: 28.5, Diff: 0.2, Sum: > 510.7] > [Termination (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 1.2] > [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: > 0.5] > [GC Worker Total (ms): Min: 52.3, Avg: 52.6, Max: 53.1, Diff: 0.8, > Sum: 947.5] > [GC Worker End (ms): Min: 308692211.6, Avg: 308692211.7, Max: > 308692211.7, Diff: 0.1] > [Code Root Fixup: 0.0 ms] > [Clear CT: 9.8 ms] > [Other: 11.1 ms] > [Choose CSet: 0.0 ms] > [Ref Proc: 2.4 ms] > [Ref Enq: 0.4 ms] > [Free CSet: 1.1 ms] > [Eden: 2912.0M(2912.0M)->0.0B(3616.0M) Survivors: 448.0M->416.0M Heap: > 51.7G(66.2G)->48.9G(66.2G)] > [Times: user=1.07 sys=0.01, real=0.08 secs] > 308697.312: [G1Ergonomics (Concurrent Cycles) initiate concurrent cycle, > reason: concurrent cycle initiation requested] > 2013-12-30T08:25:31.881-0500: 308697.312: [GC pause (young) (initial-mark) > Desired survivor size 268435456 bytes, new threshold 15 (max 15) > - age 1: 17798336 bytes, 17798336 total > - age 2: 15275456 bytes, 33073792 total > - age 3: 27940176 bytes, 61013968 total > - age 4: 15716648 bytes, 76730616 total > - age 5: 16474656 bytes, 93205272 total > - age 6: 14249232 bytes, 107454504 total > - age 7: 15187536 bytes, 122642040 total > - age 8: 15073808 bytes, 137715848 total > - age 9: 17362752 bytes, 155078600 total > - age 10: 17031280 bytes, 172109880 total > - age 11: 16854792 bytes, 188964672 total > - age 12: 19124800 bytes, 208089472 total > - age 13: 20491176 bytes, 228580648 total > - age 14: 
16367528 bytes, 244948176 total > 308697.313: [G1Ergonomics (CSet Construction) start choosing CSet, > _pending_cards: 31028, predicted base time: 37.87 ms, remaining time: 62.13 > ms, target pause time: 100.00 ms] > 308697.313: [G1Ergonomics (CSet Construction) add young regions to CSet, > eden: 113 regions, survivors: 13 regions, predicted young region time: > 27.99 ms] > 308697.313: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: > 113 regions, survivors: 13 regions, old: 0 regions, predicted pause time: > 65.86 ms, target pause time: 100.00 ms] > , 0.0724890 secs] > [Parallel Time: 51.9 ms, GC Workers: 18] > [GC Worker Start (ms): Min: 308697313.3, Avg: 308697313.7, Max: > 308697314.0, Diff: 0.6] > [Ext Root Scanning (ms): Min: 4.3, Avg: 5.7, Max: 16.7, Diff: 12.3, > Sum: 101.8] > [Update RS (ms): Min: 0.0, Avg: 9.3, Max: 10.4, Diff: 10.4, Sum: > 166.9] > [Processed Buffers: Min: 0, Avg: 22.0, Max: 30, Diff: 30, Sum: > 396] > [Scan RS (ms): Min: 6.4, Avg: 8.5, Max: 13.0, Diff: 6.5, Sum: 152.3] > [Object Copy (ms): Min: 22.5, Avg: 27.1, Max: 27.7, Diff: 5.2, Sum: > 487.0] > [Termination (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 1.0] > [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: > 0.6] > [GC Worker Total (ms): Min: 50.2, Avg: 50.5, Max: 50.9, Diff: 0.6, > Sum: 909.5] > [GC Worker End (ms): Min: 308697364.2, Avg: 308697364.2, Max: > 308697364.3, Diff: 0.1] > [Code Root Fixup: 0.0 ms] > [Clear CT: 9.9 ms] > [Other: 10.8 ms] > [Choose CSet: 0.0 ms] > [Ref Proc: 2.8 ms] > [Ref Enq: 0.4 ms] > [Free CSet: 0.9 ms] > [Eden: 3616.0M(3616.0M)->0.0B(3520.0M) Survivors: 416.0M->448.0M Heap: > 52.5G(66.2G)->49.0G(66.2G)] > [Times: user=1.01 sys=0.00, real=0.07 secs] > 2013-12-30T08:25:31.954-0500: 308697.385: [GC > concurrent-root-region-scan-start] > 2013-12-30T08:25:31.967-0500: 308697.398: [GC > concurrent-root-region-scan-end, 0.0131710 secs] > 2013-12-30T08:25:31.967-0500: 308697.398: [GC concurrent-mark-start] > 2013-12-30T08:25:36.566-0500: 308701.997: [GC concurrent-mark-end, > 4.5984140 secs] > 2013-12-30T08:25:36.570-0500: 308702.002: [GC remark > 2013-12-30T08:25:36.573-0500: 308702.004: [GC ref-proc, 0.0126990 secs], > 0.0659540 secs] > [Times: user=0.87 sys=0.00, real=0.06 secs] > 2013-12-30T08:25:36.641-0500: 308702.072: [GC cleanup 52G->52G(66G), > 0.5487830 secs] > [Times: user=9.66 sys=0.06, real=0.54 secs] > 2013-12-30T08:25:37.190-0500: 308702.622: [GC concurrent-cleanup-start] > 2013-12-30T08:25:37.190-0500: 308702.622: [GC concurrent-cleanup-end, > 0.0000480 secs] > > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140102/d574df84/attachment-0001.html From yu.zhang at oracle.com Thu Jan 2 10:49:58 2014 From: yu.zhang at oracle.com (YU ZHANG) Date: Thu, 02 Jan 2014 10:49:58 -0800 Subject: G1 GC clean up time is too long In-Reply-To: References: <52B5037C.8010704@servergy.com> Message-ID: <52C5B4D6.8010908@oracle.com> Ryan, Please see my comments in line. Thanks, Jenny On 1/2/2014 9:57 AM, Ryan Gardner wrote: > > I've also fought with cleanup times being long with a large heap and > G1. In my case, I was suspicious that the RSet coarsening was > increasing the time for GC Cleanups. 
> > If you have a way to test different settings in a non-production > environment, you could consider experimenting with: > > > -XX:+UnlockExperimentalVMOptions > > -XX:G1RSetRegionEntries=4096 > > and different values for the RSetRegionEntries - 4096 was a sweet spot > for me, but your application may behave differently. > > You can turn on: > > -XX:+UnlockDiagnosticVMOptions > > -XX:+G1SummarizeRSetStats > > -XX:G1SummarizeRSetStatsPeriod=20 > > to get it to spit out what it is doing to get some more insight into > those times. > > > The specific number of RSetRegionEntries I set (4096) was, in theory, > supposed to be close to what it was setting based on my region size > (also 32m) and number of regions- but it did not seem to be. > If G1RSetRegionEntries not set, it is decided by G1RSetRegionEntriesBase*(region_size_log_mb+1). G1SetRegionEntriesBase is a constant(256). region_size_log_mb is related to heap region size(region_size_mb-20). If you have 92G heap, and 32m regions size, I guess the default value is bigger than 4096? Assuming my guess was right, you decide to reduce the entries as not seeing 'coarsenings' in the G1SummarizeRSetStats output? Did you see the cards for old or young regions increase as the clean up time increase? Also in your log, when clean up time increase, is it update RS or scan RS? > > Also, if you have more memory available, I have found G1 to take the > extra memory and not increase pause times much. As you increase the > total heap size, the size of your smallest possible collection will > also increase since it sets it to a percentage of total heap... In my > case I was tuning an applicaiton that was a cache, so it had tons heap > space but wasn't churning it over much... > > I ended up going as low as: > > -XX:G1NewSizePercent=1 > > to let G1 feel free to use as few regions as possible to achieve > smaller pause times. > G1NewSizePercent(default 5) allows G1 to allocate this percent of heap as young gen size. Lowering it should results smaller young gen. So the young gc pause is smaller. > > I've been running in production on 1.7u40 for several months now with > 92GB heaps and a worst-case cleanup pause time of around 370ms - prior > to tuning the rset region entries, the cleanup phase was getting worse > and worse over time and in testing would sometimes be over 1 second. > > I meant to dive into the OpenJDK code to look at where the default > RSetRegionEntries are calculated, but didn't get around to it. > > > Hope that helps, > > Ryan Gardner > > > On Dec 31, 2013 8:29 PM, "yao" > wrote: > > Hi Folks, > > Sorry for reporting GC performance result late, we are in the code > freeze period for the holiday season and cannot do any production > related deployment. > > First, I'd like to say thank you to Jenny, Monica and Thomas. Your > suggestions are really helpful and help us to understand G1 GC > behavior. We did NOT observe any full GCs after adjusting > suggested parameters. That is really awesome, we tried these new > parameters on Dec 26 and full GC disappeared since then (at least > until I am writing this email, at 3:37pm EST, Dec 30). > > G1 parameters: > *-XX:MaxGCPauseMillis=100 > *-XX:G1HeapRegionSize=32m > *-XX:InitiatingHeapOccupancyPercent=65 > *-XX:G1ReservePercent=20 > *-XX:G1HeapWastePercent=5 > -XX:G1MixedGCLiveThresholdPercent=75 > > * > We've reduced**MaxGCPauseMillis to 100 since our real-time system > is focus on low pause, if system cannot give response in 50 > milliseconds, it's totally useless for the client. 
However, > current read latency 99 percentile is still slightly higher than > CMS machines but they are pretty close (14 millis vs 12 millis). > One thing we can do now is to increase heap size for G1 machines, > for now, the heap size for G1 is only 90 percent of those CMS > machines. This is because we observed our server process is killed > by OOM killer on G1 machines and we decided to decrease heap size > on G1 machines. Since G1ReservePercent was increased, we think it > should be safe to increase G1 heap to be same as CMS machine. We > believe it could make G1 machine give us better performance > because 40 percent of heap will be used for block cache. > > Thanks > -Shengzhe > > G1 Logs > > 2013-12-30T08:25:26.727-0500: 308692.158: [GC pause (young) > Desired survivor size 234881024 bytes, new threshold 14 (max 15) > - age 1: 16447904 bytes, 16447904 total > - age 2: 30614384 bytes, 47062288 total > - age 3: 16122104 bytes, 63184392 total > - age 4: 16542280 bytes, 79726672 total > - age 5: 14249520 bytes, 93976192 total > - age 6: 15187728 bytes, 109163920 total > - age 7: 15073808 bytes, 124237728 total > - age 8: 17903552 bytes, 142141280 total > - age 9: 17031280 bytes, 159172560 total > - age 10: 16854792 bytes, 176027352 total > - age 11: 19192480 bytes, 195219832 total > - age 12: 20491176 bytes, 215711008 total > - age 13: 16367528 bytes, 232078536 total > - age 14: 15536120 bytes, 247614656 total > 308692.158: [G1Ergonomics (CSet Construction) start choosing > CSet, _pending_cards: 32768, predicted base time: 38.52 ms, > remaining time: 61.48 ms, target pause time: 100.00 ms] > 308692.158: [G1Ergonomics (CSet Construction) add young regions > to CSet, eden: 91 regions, survivors: 14 regions, predicted young > region time: 27.76 ms] > 308692.158: [G1Ergonomics (CSet Construction) finish choosing > CSet, eden: 91 regions, survivors: 14 regions, old: 0 regions, > predicted pause time: 66.28 ms, target pause time: 100.00 ms] > 308692.233: [G1Ergonomics (Concurrent Cycles) request concurrent > cycle initiation, reason: occupancy higher than threshold, > occupancy: 52143587328 bytes, allocation request: 0 bytes, > threshold: 46172576125 bytes (65.00 %), source: end of GC] > , 0.0749020 secs] > [Parallel Time: 53.9 ms, GC Workers: 18] > [GC Worker Start (ms): Min: 308692158.6 , > Avg: 308692159.0 , Max: 308692159.4 > , Diff: 0.8] > [Ext Root Scanning (ms): Min: 3.9, Avg: 4.5, Max: 6.4, Diff: > 2.4, Sum: 81.9] > [Update RS (ms): Min: 10.2, Avg: 11.6, Max: 12.2, Diff: 2.0, > Sum: 209.0] > [Processed Buffers: Min: 15, Avg: 22.5, Max: 31, Diff: > 16, Sum: 405] > [Scan RS (ms): Min: 7.8, Avg: 8.0, Max: 8.3, Diff: 0.5, Sum: > 144.3] > [Object Copy (ms): Min: 28.3, Avg: 28.4, Max: 28.5, Diff: > 0.2, Sum: 510.7] > [Termination (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, > Sum: 1.2] > [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: > 0.1, Sum: 0.5] > [GC Worker Total (ms): Min: 52.3, Avg: 52.6, Max: 53.1, > Diff: 0.8, Sum: 947.5] > [GC Worker End (ms): Min: 308692211.6 , > Avg: 308692211.7 , Max: 308692211.7 > , Diff: 0.1] > [Code Root Fixup: 0.0 ms] > [Clear CT: 9.8 ms] > [Other: 11.1 ms] > [Choose CSet: 0.0 ms] > [Ref Proc: 2.4 ms] > [Ref Enq: 0.4 ms] > [Free CSet: 1.1 ms] > [Eden: 2912.0M(2912.0M)->0.0B(3616.0M) Survivors: > 448.0M->416.0M Heap: 51.7G(66.2G)->48.9G(66.2G)] > [Times: user=1.07 sys=0.01, real=0.08 secs] > 308697.312: [G1Ergonomics (Concurrent Cycles) initiate concurrent > cycle, reason: concurrent cycle initiation requested] > 2013-12-30T08:25:31.881-0500: 
308697.312: [GC pause (young) > (initial-mark) > Desired survivor size 268435456 bytes, new threshold 15 (max 15) > - age 1: 17798336 bytes, 17798336 total > - age 2: 15275456 bytes, 33073792 total > - age 3: 27940176 bytes, 61013968 total > - age 4: 15716648 bytes, 76730616 total > - age 5: 16474656 bytes, 93205272 total > - age 6: 14249232 bytes, 107454504 total > - age 7: 15187536 bytes, 122642040 total > - age 8: 15073808 bytes, 137715848 total > - age 9: 17362752 bytes, 155078600 total > - age 10: 17031280 bytes, 172109880 total > - age 11: 16854792 bytes, 188964672 total > - age 12: 19124800 bytes, 208089472 total > - age 13: 20491176 bytes, 228580648 total > - age 14: 16367528 bytes, 244948176 total > 308697.313: [G1Ergonomics (CSet Construction) start choosing > CSet, _pending_cards: 31028, predicted base time: 37.87 ms, > remaining time: 62.13 ms, target pause time: 100.00 ms] > 308697.313: [G1Ergonomics (CSet Construction) add young regions > to CSet, eden: 113 regions, survivors: 13 regions, predicted young > region time: 27.99 ms] > 308697.313: [G1Ergonomics (CSet Construction) finish choosing > CSet, eden: 113 regions, survivors: 13 regions, old: 0 regions, > predicted pause time: 65.86 ms, target pause time: 100.00 ms] > , 0.0724890 secs] > [Parallel Time: 51.9 ms, GC Workers: 18] > [GC Worker Start (ms): Min: 308697313.3 , > Avg: 308697313.7 , Max: 308697314.0 > , Diff: 0.6] > [Ext Root Scanning (ms): Min: 4.3, Avg: 5.7, Max: 16.7, > Diff: 12.3, Sum: 101.8] > [Update RS (ms): Min: 0.0, Avg: 9.3, Max: 10.4, Diff: 10.4, > Sum: 166.9] > [Processed Buffers: Min: 0, Avg: 22.0, Max: 30, Diff: 30, > Sum: 396] > [Scan RS (ms): Min: 6.4, Avg: 8.5, Max: 13.0, Diff: 6.5, > Sum: 152.3] > [Object Copy (ms): Min: 22.5, Avg: 27.1, Max: 27.7, Diff: > 5.2, Sum: 487.0] > [Termination (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, > Sum: 1.0] > [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: > 0.1, Sum: 0.6] > [GC Worker Total (ms): Min: 50.2, Avg: 50.5, Max: 50.9, > Diff: 0.6, Sum: 909.5] > [GC Worker End (ms): Min: 308697364.2 , > Avg: 308697364.2 , Max: 308697364.3 > , Diff: 0.1] > [Code Root Fixup: 0.0 ms] > [Clear CT: 9.9 ms] > [Other: 10.8 ms] > [Choose CSet: 0.0 ms] > [Ref Proc: 2.8 ms] > [Ref Enq: 0.4 ms] > [Free CSet: 0.9 ms] > [Eden: 3616.0M(3616.0M)->0.0B(3520.0M) Survivors: > 416.0M->448.0M Heap: 52.5G(66.2G)->49.0G(66.2G)] > [Times: user=1.01 sys=0.00, real=0.07 secs] > 2013-12-30T08:25:31.954-0500: 308697.385: [GC > concurrent-root-region-scan-start] > 2013-12-30T08:25:31.967-0500: 308697.398: [GC > concurrent-root-region-scan-end, 0.0131710 secs] > 2013-12-30T08:25:31.967-0500: 308697.398: [GC concurrent-mark-start] > 2013-12-30T08:25:36.566-0500: 308701.997: [GC concurrent-mark-end, > 4.5984140 secs] > 2013-12-30T08:25:36.570-0500: 308702.002: [GC remark > 2013-12-30T08:25:36.573-0500: 308702.004: [GC ref-proc, 0.0126990 > secs], 0.0659540 secs] > [Times: user=0.87 sys=0.00, real=0.06 secs] > 2013-12-30T08:25:36.641-0500: 308702.072: [GC cleanup > 52G->52G(66G), 0.5487830 secs] > [Times: user=9.66 sys=0.06, real=0.54 secs] > 2013-12-30T08:25:37.190-0500: 308702.622: [GC > concurrent-cleanup-start] > 2013-12-30T08:25:37.190-0500: 308702.622: [GC > concurrent-cleanup-end, 0.0000480 secs] > > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > > > _______________________________________________ > hotspot-gc-use mailing list > 
hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140102/763edb09/attachment.html From yaoshengzhe at gmail.com Thu Jan 2 14:35:10 2014 From: yaoshengzhe at gmail.com (yao) Date: Thu, 2 Jan 2014 14:35:10 -0800 Subject: G1 GC clean up time is too long In-Reply-To: <52C5B4D6.8010908@oracle.com> References: <52B5037C.8010704@servergy.com> <52C5B4D6.8010908@oracle.com> Message-ID: Hi Ryan, I've enabled gc logging options you mentioned and it looks like rset coarsenings is a problem for large gc clean up time. I will take your suggestions and try different G1RSetRegionEntries values. Thank you very much. Happy New Year -Shengzhe *Typical RSet Log* Concurrent RS processed 184839720 cards Of 960997 completed buffers: 930426 ( 96.8%) by conc RS threads. 30571 ( 3.2%) by mutator threads. Conc RS threads times(s) 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 \ 0.00 Total heap region rem set sizes = 5256086K. Max = 8640K. Static structures = 347K, free_lists = 7420K. 1166427614 occupied cards represented. Max size region = 296:(O)[0x00007fc7a6000000,0x00007fc7a8000000,0x00007fc7a8000000], size = 8641K, occupied = 1797K. Did 25790 coarsenings. Output of *$ cat gc-hbase-1388692019.log | grep "coarsenings\|\(GC cleanup\)"* Did 0 coarsenings. Did 0 coarsenings. Did 0 coarsenings. Did 0 coarsenings. Did 0 coarsenings. Did 0 coarsenings. Did 72 coarsenings. Did 224 coarsenings. 2014-01-02T15:12:03.031-0500: 1452.619: [GC cleanup 44G->43G(66G), 0.0376940 secs] Did 1015 coarsenings. Did 1476 coarsenings. Did 2210 coarsenings. 2014-01-02T15:25:37.483-0500: 2267.070: [GC cleanup 43G->42G(66G), 0.0539190 secs] Did 4123 coarsenings. Did 4817 coarsenings. Did 5362 coarsenings. 2014-01-02T15:40:19.499-0500: 3149.087: [GC cleanup 44G->42G(66G), 0.0661880 secs] Did 6316 coarsenings. Did 6842 coarsenings. Did 7213 coarsenings. 2014-01-02T15:54:42.812-0500: 4012.400: [GC cleanup 43G->42G(66G), 0.0888960 secs] Did 7458 coarsenings. Did 7739 coarsenings. Did 8214 coarsenings. 2014-01-02T16:09:04.009-0500: 4873.597: [GC cleanup 44G->43G(66G), 0.1171540 secs] Did 8958 coarsenings. Did 8973 coarsenings. Did 9056 coarsenings. Did 9543 coarsenings. 2014-01-02T16:23:51.359-0500: 5760.947: [GC cleanup 44G->43G(66G), 0.1526980 secs] Did 9561 coarsenings. Did 9873 coarsenings. Did 10209 coarsenings. 2014-01-02T16:39:04.462-0500: 6674.050: [GC cleanup 44G->43G(66G), 0.1923330 secs] Did 10599 coarsenings. Did 10849 coarsenings. Did 11178 coarsenings. 2014-01-02T16:46:57.445-0500: 7147.033: [GC cleanup 44G->44G(66G), 0.2353640 secs] Did 11746 coarsenings. Did 12701 coarsenings. 2014-01-02T16:53:17.536-0500: 7527.124: [GC cleanup 44G->44G(66G), 0.3489450 secs] Did 13272 coarsenings. Did 14682 coarsenings. 2014-01-02T16:58:00.726-0500: 7810.314: [GC cleanup 44G->44G(66G), 0.4271240 secs] Did 16630 coarsenings. 2014-01-02T17:01:37.077-0500: 8026.664: [GC cleanup 44G->44G(66G), 0.5089060 secs] Did 17612 coarsenings. Did 21654 coarsenings. 2014-01-02T17:06:02.566-0500: 8292.154: [GC cleanup 44G->44G(66G), 0.5531680 secs] Did 23774 coarsenings. Did 24074 coarsenings. 2014-01-02T17:11:24.795-0500: 8614.383: [GC cleanup 44G->44G(66G), 0.5290600 secs] Did 24768 coarsenings. 2014-01-02T17:17:23.219-0500: 8972.807: [GC cleanup 44G->44G(66G), 0.5382620 secs] Did 25790 coarsenings. Did 27047 coarsenings. 
2014-01-02T17:23:00.551-0500: 9310.139: [GC cleanup 45G->44G(66G), 0.5107910 secs] Did 28558 coarsenings. 2014-01-02T17:28:22.157-0500: 9631.745: [GC cleanup 45G->44G(66G), 0.4902690 secs] Did 29272 coarsenings. Did 29335 coarsenings. On Thu, Jan 2, 2014 at 10:49 AM, YU ZHANG wrote: > Ryan, > > Please see my comments in line. > > Thanks, > Jenny > > On 1/2/2014 9:57 AM, Ryan Gardner wrote: > > I've also fought with cleanup times being long with a large heap and G1. > In my case, I was suspicious that the RSet coarsening was increasing the > time for GC Cleanups. > > If you have a way to test different settings in a non-production > environment, you could consider experimenting with: > > > -XX:+UnlockExperimentalVMOptions > > -XX:G1RSetRegionEntries=4096 > > and different values for the RSetRegionEntries - 4096 was a sweet spot for > me, but your application may behave differently. > > You can turn on: > > -XX:+UnlockDiagnosticVMOptions > > -XX:+G1SummarizeRSetStats > > -XX:G1SummarizeRSetStatsPeriod=20 > > to get it to spit out what it is doing to get some more insight into those > times. > > > The specific number of RSetRegionEntries I set (4096) was, in theory, > supposed to be close to what it was setting based on my region size (also > 32m) and number of regions- but it did not seem to be. > > If G1RSetRegionEntries not set, it is decided by > G1RSetRegionEntriesBase*(region_size_log_mb+1). > G1SetRegionEntriesBase is a constant(256). region_size_log_mb is related > to heap region size(region_size_mb-20). > > If you have 92G heap, and 32m regions size, I guess the default value is > bigger than 4096? > Assuming my guess was right, you decide to reduce the entries as not > seeing 'coarsenings' in the G1SummarizeRSetStats output? Did you see the > cards for old or young regions increase as the clean up time increase? > Also in your log, when clean up time increase, is it update RS or scan RS? > > Also, if you have more memory available, I have found G1 to take the > extra memory and not increase pause times much. As you increase the total > heap size, the size of your smallest possible collection will also increase > since it sets it to a percentage of total heap... In my case I was tuning > an applicaiton that was a cache, so it had tons heap space but wasn't > churning it over much... > > I ended up going as low as: > > -XX:G1NewSizePercent=1 > > to let G1 feel free to use as few regions as possible to achieve smaller > pause times. > > G1NewSizePercent(default 5) allows G1 to allocate this percent of heap as > young gen size. Lowering it should results smaller young gen. So the > young gc pause is smaller. > > I've been running in production on 1.7u40 for several months now with > 92GB heaps and a worst-case cleanup pause time of around 370ms - prior to > tuning the rset region entries, the cleanup phase was getting worse and > worse over time and in testing would sometimes be over 1 second. > > I meant to dive into the OpenJDK code to look at where the default > RSetRegionEntries are calculated, but didn't get around to it. > > > Hope that helps, > > Ryan Gardner > > > On Dec 31, 2013 8:29 PM, "yao" wrote: > >> Hi Folks, >> >> Sorry for reporting GC performance result late, we are in the code >> freeze period for the holiday season and cannot do any production related >> deployment. >> >> First, I'd like to say thank you to Jenny, Monica and Thomas. Your >> suggestions are really helpful and help us to understand G1 GC behavior. 
We >> did NOT observe any full GCs after adjusting suggested parameters. That is >> really awesome, we tried these new parameters on Dec 26 and full GC >> disappeared since then (at least until I am writing this email, at 3:37pm >> EST, Dec 30). >> >> G1 parameters: >> >> *-XX:MaxGCPauseMillis=100 *-XX:G1HeapRegionSize=32m >> >> *-XX:InitiatingHeapOccupancyPercent=65 *-XX:G1ReservePercent=20 >> >> >> >> *-XX:G1HeapWastePercent=5 -XX:G1MixedGCLiveThresholdPercent=75 * >> We've reduced MaxGCPauseMillis to 100 since our real-time system is >> focus on low pause, if system cannot give response in 50 milliseconds, it's >> totally useless for the client. However, current read latency 99 percentile >> is still slightly higher than CMS machines but they are pretty close (14 >> millis vs 12 millis). One thing we can do now is to increase heap size for >> G1 machines, for now, the heap size for G1 is only 90 percent of those CMS >> machines. This is because we observed our server process is killed by OOM >> killer on G1 machines and we decided to decrease heap size on G1 machines. >> Since G1ReservePercent was increased, we think it should be safe to >> increase G1 heap to be same as CMS machine. We believe it could make G1 >> machine give us better performance because 40 percent of heap will be used >> for block cache. >> >> Thanks >> -Shengzhe >> >> G1 Logs >> >> 2013-12-30T08:25:26.727-0500: 308692.158: [GC pause (young) >> Desired survivor size 234881024 bytes, new threshold 14 (max 15) >> - age 1: 16447904 bytes, 16447904 total >> - age 2: 30614384 bytes, 47062288 total >> - age 3: 16122104 bytes, 63184392 total >> - age 4: 16542280 bytes, 79726672 total >> - age 5: 14249520 bytes, 93976192 total >> - age 6: 15187728 bytes, 109163920 total >> - age 7: 15073808 bytes, 124237728 total >> - age 8: 17903552 bytes, 142141280 total >> - age 9: 17031280 bytes, 159172560 total >> - age 10: 16854792 bytes, 176027352 total >> - age 11: 19192480 bytes, 195219832 total >> - age 12: 20491176 bytes, 215711008 total >> - age 13: 16367528 bytes, 232078536 total >> - age 14: 15536120 bytes, 247614656 total >> 308692.158: [G1Ergonomics (CSet Construction) start choosing CSet, >> _pending_cards: 32768, predicted base time: 38.52 ms, remaining time: 61.48 >> ms, target pause time: 100.00 ms] >> 308692.158: [G1Ergonomics (CSet Construction) add young regions to CSet, >> eden: 91 regions, survivors: 14 regions, predicted young region time: 27.76 >> ms] >> 308692.158: [G1Ergonomics (CSet Construction) finish choosing CSet, >> eden: 91 regions, survivors: 14 regions, old: 0 regions, predicted pause >> time: 66.28 ms, target pause time: 100.00 ms] >> 308692.233: [G1Ergonomics (Concurrent Cycles) request concurrent cycle >> initiation, reason: occupancy higher than threshold, occupancy: 52143587328 >> bytes, allocation request: 0 bytes, threshold: 46172576125 bytes (65.00 %), >> source: end of GC] >> , 0.0749020 secs] >> [Parallel Time: 53.9 ms, GC Workers: 18] >> [GC Worker Start (ms): Min: 308692158.6, Avg: 308692159.0, Max: >> 308692159.4, Diff: 0.8] >> [Ext Root Scanning (ms): Min: 3.9, Avg: 4.5, Max: 6.4, Diff: 2.4, >> Sum: 81.9] >> [Update RS (ms): Min: 10.2, Avg: 11.6, Max: 12.2, Diff: 2.0, Sum: >> 209.0] >> [Processed Buffers: Min: 15, Avg: 22.5, Max: 31, Diff: 16, Sum: >> 405] >> [Scan RS (ms): Min: 7.8, Avg: 8.0, Max: 8.3, Diff: 0.5, Sum: 144.3] >> [Object Copy (ms): Min: 28.3, Avg: 28.4, Max: 28.5, Diff: 0.2, Sum: >> 510.7] >> [Termination (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: >> 1.2] >> [GC 
Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, >> Sum: 0.5] >> [GC Worker Total (ms): Min: 52.3, Avg: 52.6, Max: 53.1, Diff: 0.8, >> Sum: 947.5] >> [GC Worker End (ms): Min: 308692211.6, Avg: 308692211.7, Max: >> 308692211.7, Diff: 0.1] >> [Code Root Fixup: 0.0 ms] >> [Clear CT: 9.8 ms] >> [Other: 11.1 ms] >> [Choose CSet: 0.0 ms] >> [Ref Proc: 2.4 ms] >> [Ref Enq: 0.4 ms] >> [Free CSet: 1.1 ms] >> [Eden: 2912.0M(2912.0M)->0.0B(3616.0M) Survivors: 448.0M->416.0M Heap: >> 51.7G(66.2G)->48.9G(66.2G)] >> [Times: user=1.07 sys=0.01, real=0.08 secs] >> 308697.312: [G1Ergonomics (Concurrent Cycles) initiate concurrent cycle, >> reason: concurrent cycle initiation requested] >> 2013-12-30T08:25:31.881-0500: 308697.312: [GC pause (young) (initial-mark) >> Desired survivor size 268435456 bytes, new threshold 15 (max 15) >> - age 1: 17798336 bytes, 17798336 total >> - age 2: 15275456 bytes, 33073792 total >> - age 3: 27940176 bytes, 61013968 total >> - age 4: 15716648 bytes, 76730616 total >> - age 5: 16474656 bytes, 93205272 total >> - age 6: 14249232 bytes, 107454504 total >> - age 7: 15187536 bytes, 122642040 total >> - age 8: 15073808 bytes, 137715848 total >> - age 9: 17362752 bytes, 155078600 total >> - age 10: 17031280 bytes, 172109880 total >> - age 11: 16854792 bytes, 188964672 total >> - age 12: 19124800 bytes, 208089472 total >> - age 13: 20491176 bytes, 228580648 total >> - age 14: 16367528 bytes, 244948176 total >> 308697.313: [G1Ergonomics (CSet Construction) start choosing CSet, >> _pending_cards: 31028, predicted base time: 37.87 ms, remaining time: 62.13 >> ms, target pause time: 100.00 ms] >> 308697.313: [G1Ergonomics (CSet Construction) add young regions to CSet, >> eden: 113 regions, survivors: 13 regions, predicted young region time: >> 27.99 ms] >> 308697.313: [G1Ergonomics (CSet Construction) finish choosing CSet, >> eden: 113 regions, survivors: 13 regions, old: 0 regions, predicted pause >> time: 65.86 ms, target pause time: 100.00 ms] >> , 0.0724890 secs] >> [Parallel Time: 51.9 ms, GC Workers: 18] >> [GC Worker Start (ms): Min: 308697313.3, Avg: 308697313.7, Max: >> 308697314.0, Diff: 0.6] >> [Ext Root Scanning (ms): Min: 4.3, Avg: 5.7, Max: 16.7, Diff: 12.3, >> Sum: 101.8] >> [Update RS (ms): Min: 0.0, Avg: 9.3, Max: 10.4, Diff: 10.4, Sum: >> 166.9] >> [Processed Buffers: Min: 0, Avg: 22.0, Max: 30, Diff: 30, Sum: >> 396] >> [Scan RS (ms): Min: 6.4, Avg: 8.5, Max: 13.0, Diff: 6.5, Sum: 152.3] >> [Object Copy (ms): Min: 22.5, Avg: 27.1, Max: 27.7, Diff: 5.2, Sum: >> 487.0] >> [Termination (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: >> 1.0] >> [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, >> Sum: 0.6] >> [GC Worker Total (ms): Min: 50.2, Avg: 50.5, Max: 50.9, Diff: 0.6, >> Sum: 909.5] >> [GC Worker End (ms): Min: 308697364.2, Avg: 308697364.2, Max: >> 308697364.3, Diff: 0.1] >> [Code Root Fixup: 0.0 ms] >> [Clear CT: 9.9 ms] >> [Other: 10.8 ms] >> [Choose CSet: 0.0 ms] >> [Ref Proc: 2.8 ms] >> [Ref Enq: 0.4 ms] >> [Free CSet: 0.9 ms] >> [Eden: 3616.0M(3616.0M)->0.0B(3520.0M) Survivors: 416.0M->448.0M Heap: >> 52.5G(66.2G)->49.0G(66.2G)] >> [Times: user=1.01 sys=0.00, real=0.07 secs] >> 2013-12-30T08:25:31.954-0500: 308697.385: [GC >> concurrent-root-region-scan-start] >> 2013-12-30T08:25:31.967-0500: 308697.398: [GC >> concurrent-root-region-scan-end, 0.0131710 secs] >> 2013-12-30T08:25:31.967-0500: 308697.398: [GC concurrent-mark-start] >> 2013-12-30T08:25:36.566-0500: 308701.997: [GC concurrent-mark-end, >> 4.5984140 secs] >> 
2013-12-30T08:25:36.570-0500: 308702.002: [GC remark >> 2013-12-30T08:25:36.573-0500: 308702.004: [GC ref-proc, 0.0126990 secs], >> 0.0659540 secs] >> [Times: user=0.87 sys=0.00, real=0.06 secs] >> 2013-12-30T08:25:36.641-0500: 308702.072: [GC cleanup 52G->52G(66G), >> 0.5487830 secs] >> [Times: user=9.66 sys=0.06, real=0.54 secs] >> 2013-12-30T08:25:37.190-0500: 308702.622: [GC concurrent-cleanup-start] >> 2013-12-30T08:25:37.190-0500: 308702.622: [GC concurrent-cleanup-end, >> 0.0000480 secs] >> >> >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> > > _______________________________________________ > hotspot-gc-use mailing listhotspot-gc-use at openjdk.java.nethttp://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140102/5d821f95/attachment.html From ryebrye at gmail.com Thu Jan 2 15:13:54 2014 From: ryebrye at gmail.com (Ryan Gardner) Date: Thu, 2 Jan 2014 18:13:54 -0500 Subject: G1 GC clean up time is too long In-Reply-To: References: <52B5037C.8010704@servergy.com> <52C5B4D6.8010908@oracle.com> Message-ID: Be sure to try different values... 4196 for me was half the size of the default value yet yielded far fewer coarsenings (which made no sense to me at the time) I'm going to try to dig up my logs from my tuning earlier to reply to the previous email - it was a few months ago so the specifics aren't fresh in my mind. I seem to remember that there was a slight tradeoff for rset scanning. I tried 1024, 2048, 4096, 8192 and the sweet spot for me was 4096. Let me know what you find. I'm curious to see if your results match mine. Ryan On Jan 2, 2014 5:35 PM, "yao" wrote: > Hi Ryan, > > I've enabled gc logging options you mentioned and it looks like rset > coarsenings is a problem for large gc clean up time. I will take your > suggestions and try different G1RSetRegionEntries values. Thank you very > much. > > Happy New Year > -Shengzhe > > *Typical RSet Log* > Concurrent RS processed 184839720 cards > Of 960997 completed buffers: > 930426 ( 96.8%) by conc RS threads. > 30571 ( 3.2%) by mutator threads. > Conc RS threads times(s) > 0.00 0.00 0.00 0.00 0.00 0.00 0.00 > 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 > 0.00 0.00 0.00 \ > 0.00 > Total heap region rem set sizes = 5256086K. Max = 8640K. > Static structures = 347K, free_lists = 7420K. > 1166427614 occupied cards represented. > Max size region = > 296:(O)[0x00007fc7a6000000,0x00007fc7a8000000,0x00007fc7a8000000], size = > 8641K, occupied = 1797K. > Did 25790 coarsenings. > > Output of *$ cat gc-hbase-1388692019.log | grep "coarsenings\|\(GC > cleanup\)"* > Did 0 coarsenings. > Did 0 coarsenings. > Did 0 coarsenings. > Did 0 coarsenings. > Did 0 coarsenings. > Did 0 coarsenings. > Did 72 coarsenings. > Did 224 coarsenings. > 2014-01-02T15:12:03.031-0500: 1452.619: [GC cleanup 44G->43G(66G), > 0.0376940 secs] > Did 1015 coarsenings. > Did 1476 coarsenings. > Did 2210 coarsenings. > 2014-01-02T15:25:37.483-0500: 2267.070: [GC cleanup 43G->42G(66G), > 0.0539190 secs] > Did 4123 coarsenings. > Did 4817 coarsenings. > Did 5362 coarsenings. 
> 2014-01-02T15:40:19.499-0500: 3149.087: [GC cleanup 44G->42G(66G), > 0.0661880 secs] > Did 6316 coarsenings. > Did 6842 coarsenings. > Did 7213 coarsenings. > 2014-01-02T15:54:42.812-0500: 4012.400: [GC cleanup 43G->42G(66G), > 0.0888960 secs] > Did 7458 coarsenings. > Did 7739 coarsenings. > Did 8214 coarsenings. > 2014-01-02T16:09:04.009-0500: 4873.597: [GC cleanup 44G->43G(66G), > 0.1171540 secs] > Did 8958 coarsenings. > Did 8973 coarsenings. > Did 9056 coarsenings. > Did 9543 coarsenings. > 2014-01-02T16:23:51.359-0500: 5760.947: [GC cleanup 44G->43G(66G), > 0.1526980 secs] > Did 9561 coarsenings. > Did 9873 coarsenings. > Did 10209 coarsenings. > 2014-01-02T16:39:04.462-0500: 6674.050: [GC cleanup 44G->43G(66G), > 0.1923330 secs] > Did 10599 coarsenings. > Did 10849 coarsenings. > Did 11178 coarsenings. > 2014-01-02T16:46:57.445-0500: 7147.033: [GC cleanup 44G->44G(66G), > 0.2353640 secs] > Did 11746 coarsenings. > Did 12701 coarsenings. > 2014-01-02T16:53:17.536-0500: 7527.124: [GC cleanup 44G->44G(66G), > 0.3489450 secs] > Did 13272 coarsenings. > Did 14682 coarsenings. > 2014-01-02T16:58:00.726-0500: 7810.314: [GC cleanup 44G->44G(66G), > 0.4271240 secs] > Did 16630 coarsenings. > 2014-01-02T17:01:37.077-0500: 8026.664: [GC cleanup 44G->44G(66G), > 0.5089060 secs] > Did 17612 coarsenings. > Did 21654 coarsenings. > 2014-01-02T17:06:02.566-0500: 8292.154: [GC cleanup 44G->44G(66G), > 0.5531680 secs] > Did 23774 coarsenings. > Did 24074 coarsenings. > 2014-01-02T17:11:24.795-0500: 8614.383: [GC cleanup 44G->44G(66G), > 0.5290600 secs] > Did 24768 coarsenings. > 2014-01-02T17:17:23.219-0500: 8972.807: [GC cleanup 44G->44G(66G), > 0.5382620 secs] > Did 25790 coarsenings. > Did 27047 coarsenings. > 2014-01-02T17:23:00.551-0500: 9310.139: [GC cleanup 45G->44G(66G), > 0.5107910 secs] > Did 28558 coarsenings. > 2014-01-02T17:28:22.157-0500: 9631.745: [GC cleanup 45G->44G(66G), > 0.4902690 secs] > Did 29272 coarsenings. > Did 29335 coarsenings. > > > On Thu, Jan 2, 2014 at 10:49 AM, YU ZHANG wrote: > >> Ryan, >> >> Please see my comments in line. >> >> Thanks, >> Jenny >> >> On 1/2/2014 9:57 AM, Ryan Gardner wrote: >> >> I've also fought with cleanup times being long with a large heap and >> G1. In my case, I was suspicious that the RSet coarsening was increasing >> the time for GC Cleanups. >> >> If you have a way to test different settings in a non-production >> environment, you could consider experimenting with: >> >> >> -XX:+UnlockExperimentalVMOptions >> >> -XX:G1RSetRegionEntries=4096 >> >> and different values for the RSetRegionEntries - 4096 was a sweet spot >> for me, but your application may behave differently. >> >> You can turn on: >> >> -XX:+UnlockDiagnosticVMOptions >> >> -XX:+G1SummarizeRSetStats >> >> -XX:G1SummarizeRSetStatsPeriod=20 >> >> to get it to spit out what it is doing to get some more insight into >> those times. >> >> >> The specific number of RSetRegionEntries I set (4096) was, in theory, >> supposed to be close to what it was setting based on my region size (also >> 32m) and number of regions- but it did not seem to be. >> >> If G1RSetRegionEntries not set, it is decided by >> G1RSetRegionEntriesBase*(region_size_log_mb+1). >> G1SetRegionEntriesBase is a constant(256). region_size_log_mb is related >> to heap region size(region_size_mb-20). >> >> If you have 92G heap, and 32m regions size, I guess the default value is >> bigger than 4096? 
>> Assuming my guess was right, you decide to reduce the entries as not >> seeing 'coarsenings' in the G1SummarizeRSetStats output? Did you see the >> cards for old or young regions increase as the clean up time increase? >> Also in your log, when clean up time increase, is it update RS or scan RS? >> >> Also, if you have more memory available, I have found G1 to take the >> extra memory and not increase pause times much. As you increase the total >> heap size, the size of your smallest possible collection will also increase >> since it sets it to a percentage of total heap... In my case I was tuning >> an applicaiton that was a cache, so it had tons heap space but wasn't >> churning it over much... >> >> I ended up going as low as: >> >> -XX:G1NewSizePercent=1 >> >> to let G1 feel free to use as few regions as possible to achieve smaller >> pause times. >> >> G1NewSizePercent(default 5) allows G1 to allocate this percent of heap as >> young gen size. Lowering it should results smaller young gen. So the >> young gc pause is smaller. >> >> I've been running in production on 1.7u40 for several months now with >> 92GB heaps and a worst-case cleanup pause time of around 370ms - prior to >> tuning the rset region entries, the cleanup phase was getting worse and >> worse over time and in testing would sometimes be over 1 second. >> >> I meant to dive into the OpenJDK code to look at where the default >> RSetRegionEntries are calculated, but didn't get around to it. >> >> >> Hope that helps, >> >> Ryan Gardner >> >> >> On Dec 31, 2013 8:29 PM, "yao" wrote: >> >>> Hi Folks, >>> >>> Sorry for reporting GC performance result late, we are in the code >>> freeze period for the holiday season and cannot do any production related >>> deployment. >>> >>> First, I'd like to say thank you to Jenny, Monica and Thomas. Your >>> suggestions are really helpful and help us to understand G1 GC behavior. We >>> did NOT observe any full GCs after adjusting suggested parameters. That is >>> really awesome, we tried these new parameters on Dec 26 and full GC >>> disappeared since then (at least until I am writing this email, at 3:37pm >>> EST, Dec 30). >>> >>> G1 parameters: >>> >>> *-XX:MaxGCPauseMillis=100 *-XX:G1HeapRegionSize=32m >>> >>> *-XX:InitiatingHeapOccupancyPercent=65 *-XX:G1ReservePercent=20 >>> >>> >>> >>> *-XX:G1HeapWastePercent=5 -XX:G1MixedGCLiveThresholdPercent=75 * >>> We've reduced MaxGCPauseMillis to 100 since our real-time system is >>> focus on low pause, if system cannot give response in 50 milliseconds, it's >>> totally useless for the client. However, current read latency 99 percentile >>> is still slightly higher than CMS machines but they are pretty close (14 >>> millis vs 12 millis). One thing we can do now is to increase heap size for >>> G1 machines, for now, the heap size for G1 is only 90 percent of those CMS >>> machines. This is because we observed our server process is killed by OOM >>> killer on G1 machines and we decided to decrease heap size on G1 machines. >>> Since G1ReservePercent was increased, we think it should be safe to >>> increase G1 heap to be same as CMS machine. We believe it could make G1 >>> machine give us better performance because 40 percent of heap will be used >>> for block cache. 
>>> >>> Thanks >>> -Shengzhe >>> >>> G1 Logs >>> >>> 2013-12-30T08:25:26.727-0500: 308692.158: [GC pause (young) >>> Desired survivor size 234881024 bytes, new threshold 14 (max 15) >>> - age 1: 16447904 bytes, 16447904 total >>> - age 2: 30614384 bytes, 47062288 total >>> - age 3: 16122104 bytes, 63184392 total >>> - age 4: 16542280 bytes, 79726672 total >>> - age 5: 14249520 bytes, 93976192 total >>> - age 6: 15187728 bytes, 109163920 total >>> - age 7: 15073808 bytes, 124237728 total >>> - age 8: 17903552 bytes, 142141280 total >>> - age 9: 17031280 bytes, 159172560 total >>> - age 10: 16854792 bytes, 176027352 total >>> - age 11: 19192480 bytes, 195219832 total >>> - age 12: 20491176 bytes, 215711008 total >>> - age 13: 16367528 bytes, 232078536 total >>> - age 14: 15536120 bytes, 247614656 total >>> 308692.158: [G1Ergonomics (CSet Construction) start choosing CSet, >>> _pending_cards: 32768, predicted base time: 38.52 ms, remaining time: 61.48 >>> ms, target pause time: 100.00 ms] >>> 308692.158: [G1Ergonomics (CSet Construction) add young regions to >>> CSet, eden: 91 regions, survivors: 14 regions, predicted young region time: >>> 27.76 ms] >>> 308692.158: [G1Ergonomics (CSet Construction) finish choosing CSet, >>> eden: 91 regions, survivors: 14 regions, old: 0 regions, predicted pause >>> time: 66.28 ms, target pause time: 100.00 ms] >>> 308692.233: [G1Ergonomics (Concurrent Cycles) request concurrent cycle >>> initiation, reason: occupancy higher than threshold, occupancy: 52143587328 >>> bytes, allocation request: 0 bytes, threshold: 46172576125 bytes (65.00 %), >>> source: end of GC] >>> , 0.0749020 secs] >>> [Parallel Time: 53.9 ms, GC Workers: 18] >>> [GC Worker Start (ms): Min: 308692158.6, Avg: 308692159.0, Max: >>> 308692159.4, Diff: 0.8] >>> [Ext Root Scanning (ms): Min: 3.9, Avg: 4.5, Max: 6.4, Diff: 2.4, >>> Sum: 81.9] >>> [Update RS (ms): Min: 10.2, Avg: 11.6, Max: 12.2, Diff: 2.0, Sum: >>> 209.0] >>> [Processed Buffers: Min: 15, Avg: 22.5, Max: 31, Diff: 16, Sum: >>> 405] >>> [Scan RS (ms): Min: 7.8, Avg: 8.0, Max: 8.3, Diff: 0.5, Sum: 144.3] >>> [Object Copy (ms): Min: 28.3, Avg: 28.4, Max: 28.5, Diff: 0.2, >>> Sum: 510.7] >>> [Termination (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: >>> 1.2] >>> [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, >>> Sum: 0.5] >>> [GC Worker Total (ms): Min: 52.3, Avg: 52.6, Max: 53.1, Diff: 0.8, >>> Sum: 947.5] >>> [GC Worker End (ms): Min: 308692211.6, Avg: 308692211.7, Max: >>> 308692211.7, Diff: 0.1] >>> [Code Root Fixup: 0.0 ms] >>> [Clear CT: 9.8 ms] >>> [Other: 11.1 ms] >>> [Choose CSet: 0.0 ms] >>> [Ref Proc: 2.4 ms] >>> [Ref Enq: 0.4 ms] >>> [Free CSet: 1.1 ms] >>> [Eden: 2912.0M(2912.0M)->0.0B(3616.0M) Survivors: 448.0M->416.0M >>> Heap: 51.7G(66.2G)->48.9G(66.2G)] >>> [Times: user=1.07 sys=0.01, real=0.08 secs] >>> 308697.312: [G1Ergonomics (Concurrent Cycles) initiate concurrent >>> cycle, reason: concurrent cycle initiation requested] >>> 2013-12-30T08:25:31.881-0500: 308697.312: [GC pause (young) >>> (initial-mark) >>> Desired survivor size 268435456 bytes, new threshold 15 (max 15) >>> - age 1: 17798336 bytes, 17798336 total >>> - age 2: 15275456 bytes, 33073792 total >>> - age 3: 27940176 bytes, 61013968 total >>> - age 4: 15716648 bytes, 76730616 total >>> - age 5: 16474656 bytes, 93205272 total >>> - age 6: 14249232 bytes, 107454504 total >>> - age 7: 15187536 bytes, 122642040 total >>> - age 8: 15073808 bytes, 137715848 total >>> - age 9: 17362752 bytes, 155078600 total >>> - age 10: 17031280 
bytes, 172109880 total >>> - age 11: 16854792 bytes, 188964672 total >>> - age 12: 19124800 bytes, 208089472 total >>> - age 13: 20491176 bytes, 228580648 total >>> - age 14: 16367528 bytes, 244948176 total >>> 308697.313: [G1Ergonomics (CSet Construction) start choosing CSet, >>> _pending_cards: 31028, predicted base time: 37.87 ms, remaining time: 62.13 >>> ms, target pause time: 100.00 ms] >>> 308697.313: [G1Ergonomics (CSet Construction) add young regions to >>> CSet, eden: 113 regions, survivors: 13 regions, predicted young region >>> time: 27.99 ms] >>> 308697.313: [G1Ergonomics (CSet Construction) finish choosing CSet, >>> eden: 113 regions, survivors: 13 regions, old: 0 regions, predicted pause >>> time: 65.86 ms, target pause time: 100.00 ms] >>> , 0.0724890 secs] >>> [Parallel Time: 51.9 ms, GC Workers: 18] >>> [GC Worker Start (ms): Min: 308697313.3, Avg: 308697313.7, Max: >>> 308697314.0, Diff: 0.6] >>> [Ext Root Scanning (ms): Min: 4.3, Avg: 5.7, Max: 16.7, Diff: >>> 12.3, Sum: 101.8] >>> [Update RS (ms): Min: 0.0, Avg: 9.3, Max: 10.4, Diff: 10.4, Sum: >>> 166.9] >>> [Processed Buffers: Min: 0, Avg: 22.0, Max: 30, Diff: 30, Sum: >>> 396] >>> [Scan RS (ms): Min: 6.4, Avg: 8.5, Max: 13.0, Diff: 6.5, Sum: >>> 152.3] >>> [Object Copy (ms): Min: 22.5, Avg: 27.1, Max: 27.7, Diff: 5.2, >>> Sum: 487.0] >>> [Termination (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: >>> 1.0] >>> [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, >>> Sum: 0.6] >>> [GC Worker Total (ms): Min: 50.2, Avg: 50.5, Max: 50.9, Diff: 0.6, >>> Sum: 909.5] >>> [GC Worker End (ms): Min: 308697364.2, Avg: 308697364.2, Max: >>> 308697364.3, Diff: 0.1] >>> [Code Root Fixup: 0.0 ms] >>> [Clear CT: 9.9 ms] >>> [Other: 10.8 ms] >>> [Choose CSet: 0.0 ms] >>> [Ref Proc: 2.8 ms] >>> [Ref Enq: 0.4 ms] >>> [Free CSet: 0.9 ms] >>> [Eden: 3616.0M(3616.0M)->0.0B(3520.0M) Survivors: 416.0M->448.0M >>> Heap: 52.5G(66.2G)->49.0G(66.2G)] >>> [Times: user=1.01 sys=0.00, real=0.07 secs] >>> 2013-12-30T08:25:31.954-0500: 308697.385: [GC >>> concurrent-root-region-scan-start] >>> 2013-12-30T08:25:31.967-0500: 308697.398: [GC >>> concurrent-root-region-scan-end, 0.0131710 secs] >>> 2013-12-30T08:25:31.967-0500: 308697.398: [GC concurrent-mark-start] >>> 2013-12-30T08:25:36.566-0500: 308701.997: [GC concurrent-mark-end, >>> 4.5984140 secs] >>> 2013-12-30T08:25:36.570-0500: 308702.002: [GC remark >>> 2013-12-30T08:25:36.573-0500: 308702.004: [GC ref-proc, 0.0126990 secs], >>> 0.0659540 secs] >>> [Times: user=0.87 sys=0.00, real=0.06 secs] >>> 2013-12-30T08:25:36.641-0500: 308702.072: [GC cleanup 52G->52G(66G), >>> 0.5487830 secs] >>> [Times: user=9.66 sys=0.06, real=0.54 secs] >>> 2013-12-30T08:25:37.190-0500: 308702.622: [GC concurrent-cleanup-start] >>> 2013-12-30T08:25:37.190-0500: 308702.622: [GC concurrent-cleanup-end, >>> 0.0000480 secs] >>> >>> >>> >>> _______________________________________________ >>> hotspot-gc-use mailing list >>> hotspot-gc-use at openjdk.java.net >>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>> >>> >> >> _______________________________________________ >> hotspot-gc-use mailing listhotspot-gc-use at openjdk.java.nethttp://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140102/d97f0482/attachment-0001.html From ysr1729 at gmail.com Fri Jan 3 01:12:02 2014 From: ysr1729 at gmail.com (Srinivas Ramakrishna) Date: Fri, 3 Jan 2014 01:12:02 -0800 Subject: G1: higher perm gen footprint or a possible perm gen leak? Message-ID: I haven't narrowed it down sufficiently yet, but has anyone noticed if G1 causes a higher perm gen footprint or, worse, a perm gen leak perhaps? I do realize that G1 does not today (as of 7u40 at least) collect the perm gen concurrently, rather deferring its collection to a stop-world full gc. However, it has just come to my attention that despite full stop-world gc's (on account of the perm gen getting full), G1 still uses more perm gen space (in some instacnes substantially more) than ParallelOldGC even after the full stop-world gc's, in some of our experiments. (PS: Also noticed that the default gc logging for G1 does not print the perm gen usage at full gc, unlike other collectors; looks like an oversight in logging perhaps one that has been fixed recently; i was on 7u40 i think.) While I need to collect more data using non-ParallelOld, non-G1 collectors (escpeially CMS) to see how things look and to get closer to the root cause, I wondered if anyone else had come across a similar issue and to check if this is a known issue. I'll post more details after gathering more data, but in case anyone has experienced this, please do share. thank you in advance, and Happy New Year! -- ramki -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140103/cffda9b9/attachment.html From wolfgang.pedot at finkzeit.at Fri Jan 3 07:33:14 2014 From: wolfgang.pedot at finkzeit.at (Wolfgang Pedot) Date: Fri, 03 Jan 2014 16:33:14 +0100 Subject: G1: higher perm gen footprint or a possible perm gen leak? In-Reply-To: References: Message-ID: <52C6D83A.8070309@finkzeit.at> Hi, I am using G1 on 7u45 for an application-server which has a "healthy" permGen churn because it generates a lot of short-lived dynamic classes (JavaScript). Currently permGen is sized at a little over 1GB and depending on usage there can be up to 2 full GCs per day (usually only 1). I have not noticed an increased permGen usage with G1 (increased size just before switching to G1) but I have noticed something odd about the permGen-usage after a collect. The class-count will always fall back to the same level which is currently 65k but the permGen usage after collect can either be ~0.8GB or ~0.55GB. There are always 3 collects resulting in 0.8GB followed by one scoring 0.55GB so there seems to be some kind of "rythm" going on. The full GCs are always triggered by permGen getting full and the loaded class count goes significantly higher after a 0.55GB collect (165k vs 125k) so I guess some classes just get unloaded later... I can not tell if this behaviour is due to G1 or some other factor in this application but I do know that I have no leak because the after-collect values are fairly stable over weeks. So I have not experienced this but am sharing anyway ;) happy new year Wolfgang Am 03.01.2014 10:12, schrieb Srinivas Ramakrishna: > I haven't narrowed it down sufficiently yet, but has anyone noticed if > G1 causes a higher perm gen footprint or, worse, a perm gen leak perhaps? > I do realize that G1 does not today (as of 7u40 at least) collect the > perm gen concurrently, rather deferring its collection to a stop-world full > gc. 
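Since, as Ramki notes above, the G1 log on these builds does not report perm gen occupancy at full GC, one low-effort way to watch it from outside the GC log is jstat; a sketch, with <pid> standing in for the JVM's process id:

    # PC/PU = perm gen capacity/used in KB, FGC = full GC count; sample every 10 seconds
    jstat -gcold <pid> 10000

Sampling PC and PU across a few full GCs gives comparable before/after perm gen numbers for G1 and ParallelOldGC without changing any GC flags.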
However, it has just come to my attention that despite full > stop-world gc's (on account of the perm gen getting full), G1 still uses > more perm gen > space (in some instacnes substantially more) than ParallelOldGC even > after the full stop-world gc's, in some of our experiments. (PS: Also > noticed > that the default gc logging for G1 does not print the perm gen usage at > full gc, unlike other collectors; looks like an oversight in logging > perhaps one > that has been fixed recently; i was on 7u40 i think.) > > While I need to collect more data using non-ParallelOld, non-G1 > collectors (escpeially CMS) to see how things look and to get closer to > the root > cause, I wondered if anyone else had come across a similar issue and to > check if this is a known issue. > > I'll post more details after gathering more data, but in case anyone has > experienced this, please do share. > > thank you in advance, and Happy New Year! > -- ramki > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > From jocf83 at gmail.com Fri Jan 3 07:47:05 2014 From: jocf83 at gmail.com (Jose Otavio Carlomagno Filho) Date: Fri, 3 Jan 2014 13:47:05 -0200 Subject: G1: higher perm gen footprint or a possible perm gen leak? In-Reply-To: <52C6D83A.8070309@finkzeit.at> References: <52C6D83A.8070309@finkzeit.at> Message-ID: We recently switched to G1 in our application and started experiencing this type of behaviour too. Turns out G1 was not causing the problem, it was only exposing it to us. Our application would generate a large number of proxy classes and that would cause the Perm Gen to fill up until a full GC was performed by G1. When using ParallelOldGC, this would not happen because full GCs would be executed much more frequently (when the old gen was full), which prevented the perm gen from filling up. You can find more info about our problem and our analysis here: http://stackoverflow.com/questions/20274317/g1-garbage-collector-perm-gen-fills-up-indefinitely-until-a-full-gc-is-performe I recommend you use a profiling too to investigate the root cause of your Perm Gen getting filled up. There's a chance it is a leak, but as I said, in our case, it was our own application's fault and G1 exposed the problem to us. Regards, Jose On Fri, Jan 3, 2014 at 1:33 PM, Wolfgang Pedot wrote: > Hi, > > I am using G1 on 7u45 for an application-server which has a "healthy" > permGen churn because it generates a lot of short-lived dynamic classes > (JavaScript). Currently permGen is sized at a little over 1GB and > depending on usage there can be up to 2 full GCs per day (usually only > 1). I have not noticed an increased permGen usage with G1 (increased > size just before switching to G1) but I have noticed something odd about > the permGen-usage after a collect. The class-count will always fall back > to the same level which is currently 65k but the permGen usage after > collect can either be ~0.8GB or ~0.55GB. There are always 3 collects > resulting in 0.8GB followed by one scoring 0.55GB so there seems to be > some kind of "rythm" going on. The full GCs are always triggered by > permGen getting full and the loaded class count goes significantly > higher after a 0.55GB collect (165k vs 125k) so I guess some classes > just get unloaded later... 
> > I can not tell if this behaviour is due to G1 or some other factor in > this application but I do know that I have no leak because the > after-collect values are fairly stable over weeks. > > So I have not experienced this but am sharing anyway ;) > > happy new year > Wolfgang > > Am 03.01.2014 10:12, schrieb Srinivas Ramakrishna: > > I haven't narrowed it down sufficiently yet, but has anyone noticed if > > G1 causes a higher perm gen footprint or, worse, a perm gen leak perhaps? > > I do realize that G1 does not today (as of 7u40 at least) collect the > > perm gen concurrently, rather deferring its collection to a stop-world > full > > gc. However, it has just come to my attention that despite full > > stop-world gc's (on account of the perm gen getting full), G1 still uses > > more perm gen > > space (in some instacnes substantially more) than ParallelOldGC even > > after the full stop-world gc's, in some of our experiments. (PS: Also > > noticed > > that the default gc logging for G1 does not print the perm gen usage at > > full gc, unlike other collectors; looks like an oversight in logging > > perhaps one > > that has been fixed recently; i was on 7u40 i think.) > > > > While I need to collect more data using non-ParallelOld, non-G1 > > collectors (escpeially CMS) to see how things look and to get closer to > > the root > > cause, I wondered if anyone else had come across a similar issue and to > > check if this is a known issue. > > > > I'll post more details after gathering more data, but in case anyone has > > experienced this, please do share. > > > > thank you in advance, and Happy New Year! > > -- ramki > > > > > > _______________________________________________ > > hotspot-gc-use mailing list > > hotspot-gc-use at openjdk.java.net > > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140103/4cbf11aa/attachment.html From yu.zhang at oracle.com Fri Jan 3 10:05:26 2014 From: yu.zhang at oracle.com (YU ZHANG) Date: Fri, 03 Jan 2014 10:05:26 -0800 Subject: G1: higher perm gen footprint or a possible perm gen leak? In-Reply-To: References: <52C6D83A.8070309@finkzeit.at> Message-ID: <52C6FBE6.6040904@oracle.com> Very interesting post. Like someone mentioned in the comments, with -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled, CMS can clean classes in PermGen with minor GC. But G1 can only unload class during full gc. Full GC in G1 is slow as it is single threaded. Thanks, Jenny On 1/3/2014 7:47 AM, Jose Otavio Carlomagno Filho wrote: > We recently switched to G1 in our application and started experiencing > this type of behaviour too. Turns out G1 was not causing the problem, > it was only exposing it to us. > > Our application would generate a large number of proxy classes and > that would cause the Perm Gen to fill up until a full GC was performed > by G1. When using ParallelOldGC, this would not happen because full > GCs would be executed much more frequently (when the old gen was > full), which prevented the perm gen from filling up. 
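For anyone reproducing that comparison on a class-churning workload like the ones described here, the CMS side is typically run with something along these lines (a sketch only; heap sizes and main class are placeholders):

    # let CMS cycles unload classes instead of waiting for a stop-world full GC
    java -Xms64g -Xmx64g -XX:+UseConcMarkSweepGC \
         -XX:+CMSClassUnloadingEnabled \
         -XX:+PrintGCDetails -XX:+PrintGCTimeStamps MyServer

As Ramki clarifies further down in the thread, the class unloading happens as part of CMS's concurrent (major) cycles, not at minor GCs.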
> > You can find more info about our problem and our analysis here: > http://stackoverflow.com/questions/20274317/g1-garbage-collector-perm-gen-fills-up-indefinitely-until-a-full-gc-is-performe > > I recommend you use a profiling too to investigate the root cause of > your Perm Gen getting filled up. There's a chance it is a leak, but as > I said, in our case, it was our own application's fault and G1 exposed > the problem to us. > > Regards, > Jose > > > On Fri, Jan 3, 2014 at 1:33 PM, Wolfgang Pedot > > wrote: > > Hi, > > I am using G1 on 7u45 for an application-server which has a "healthy" > permGen churn because it generates a lot of short-lived dynamic > classes > (JavaScript). Currently permGen is sized at a little over 1GB and > depending on usage there can be up to 2 full GCs per day (usually only > 1). I have not noticed an increased permGen usage with G1 (increased > size just before switching to G1) but I have noticed something odd > about > the permGen-usage after a collect. The class-count will always > fall back > to the same level which is currently 65k but the permGen usage after > collect can either be ~0.8GB or ~0.55GB. There are always 3 collects > resulting in 0.8GB followed by one scoring 0.55GB so there seems to be > some kind of "rythm" going on. The full GCs are always triggered by > permGen getting full and the loaded class count goes significantly > higher after a 0.55GB collect (165k vs 125k) so I guess some classes > just get unloaded later... > > I can not tell if this behaviour is due to G1 or some other factor in > this application but I do know that I have no leak because the > after-collect values are fairly stable over weeks. > > So I have not experienced this but am sharing anyway ;) > > happy new year > Wolfgang > > Am 03.01.2014 10:12, schrieb Srinivas Ramakrishna: > > I haven't narrowed it down sufficiently yet, but has anyone > noticed if > > G1 causes a higher perm gen footprint or, worse, a perm gen leak > perhaps? > > I do realize that G1 does not today (as of 7u40 at least) > collect the > > perm gen concurrently, rather deferring its collection to a > stop-world full > > gc. However, it has just come to my attention that despite full > > stop-world gc's (on account of the perm gen getting full), G1 > still uses > > more perm gen > > space (in some instacnes substantially more) than ParallelOldGC even > > after the full stop-world gc's, in some of our experiments. (PS: > Also > > noticed > > that the default gc logging for G1 does not print the perm gen > usage at > > full gc, unlike other collectors; looks like an oversight in logging > > perhaps one > > that has been fixed recently; i was on 7u40 i think.) > > > > While I need to collect more data using non-ParallelOld, non-G1 > > collectors (escpeially CMS) to see how things look and to get > closer to > > the root > > cause, I wondered if anyone else had come across a similar issue > and to > > check if this is a known issue. > > > > I'll post more details after gathering more data, but in case > anyone has > > experienced this, please do share. > > > > thank you in advance, and Happy New Year! 
> > -- ramki > > > > > > _______________________________________________ > > hotspot-gc-use mailing list > > hotspot-gc-use at openjdk.java.net > > > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140103/239d98fd/attachment.html From ysr1729 at gmail.com Fri Jan 3 11:30:47 2014 From: ysr1729 at gmail.com (Srinivas Ramakrishna) Date: Fri, 3 Jan 2014 11:30:47 -0800 Subject: G1: higher perm gen footprint or a possible perm gen leak? In-Reply-To: <52C6FBE6.6040904@oracle.com> References: <52C6D83A.8070309@finkzeit.at> <52C6FBE6.6040904@oracle.com> Message-ID: Thanks everyone for sharing yr experiences. As I indicated, I do realize that G1 does not collect perm gen concurrently. What was surprising was that G1's use of perm gen was much higher following its stop-world full gc's which would have collected the perm gen. As a result, G1 needed a perm gen quite a bit more than twice that given to parallel gc to be able to run an application for a certain length of time. I'll provide more data on perm gen dynamics when I have it. My guess would be that somehow G1's use of regions in the perm gen is causing a dilation of perm gen footprint on account of fragmentation in the G1 perm gen regions. If that were the case, I would expect a modest increase in the perm gen footprint, but it seemed the increase in footprint was much higher. I'll collect and post more concrete numbers when I get a chance. -- ramki On Fri, Jan 3, 2014 at 10:05 AM, YU ZHANG wrote: > Very interesting post. Like someone mentioned in the comments, with > -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled, CMS can clean > classes in PermGen with minor GC. But G1 can only unload class during full > gc. Full GC in G1 is slow as it is single threaded. > > Thanks, > Jenny > > On 1/3/2014 7:47 AM, Jose Otavio Carlomagno Filho wrote: > > We recently switched to G1 in our application and started experiencing > this type of behaviour too. Turns out G1 was not causing the problem, it > was only exposing it to us. > > Our application would generate a large number of proxy classes and that > would cause the Perm Gen to fill up until a full GC was performed by G1. > When using ParallelOldGC, this would not happen because full GCs would be > executed much more frequently (when the old gen was full), which prevented > the perm gen from filling up. > > You can find more info about our problem and our analysis here: > http://stackoverflow.com/questions/20274317/g1-garbage-collector-perm-gen-fills-up-indefinitely-until-a-full-gc-is-performe > > I recommend you use a profiling too to investigate the root cause of > your Perm Gen getting filled up. There's a chance it is a leak, but as I > said, in our case, it was our own application's fault and G1 exposed the > problem to us. 
> > Regards, > Jose > > > On Fri, Jan 3, 2014 at 1:33 PM, Wolfgang Pedot > wrote: > >> Hi, >> >> I am using G1 on 7u45 for an application-server which has a "healthy" >> permGen churn because it generates a lot of short-lived dynamic classes >> (JavaScript). Currently permGen is sized at a little over 1GB and >> depending on usage there can be up to 2 full GCs per day (usually only >> 1). I have not noticed an increased permGen usage with G1 (increased >> size just before switching to G1) but I have noticed something odd about >> the permGen-usage after a collect. The class-count will always fall back >> to the same level which is currently 65k but the permGen usage after >> collect can either be ~0.8GB or ~0.55GB. There are always 3 collects >> resulting in 0.8GB followed by one scoring 0.55GB so there seems to be >> some kind of "rythm" going on. The full GCs are always triggered by >> permGen getting full and the loaded class count goes significantly >> higher after a 0.55GB collect (165k vs 125k) so I guess some classes >> just get unloaded later... >> >> I can not tell if this behaviour is due to G1 or some other factor in >> this application but I do know that I have no leak because the >> after-collect values are fairly stable over weeks. >> >> So I have not experienced this but am sharing anyway ;) >> >> happy new year >> Wolfgang >> >> Am 03.01.2014 10:12, schrieb Srinivas Ramakrishna: >> > I haven't narrowed it down sufficiently yet, but has anyone noticed if >> > G1 causes a higher perm gen footprint or, worse, a perm gen leak >> perhaps? >> > I do realize that G1 does not today (as of 7u40 at least) collect the >> > perm gen concurrently, rather deferring its collection to a stop-world >> full >> > gc. However, it has just come to my attention that despite full >> > stop-world gc's (on account of the perm gen getting full), G1 still uses >> > more perm gen >> > space (in some instacnes substantially more) than ParallelOldGC even >> > after the full stop-world gc's, in some of our experiments. (PS: Also >> > noticed >> > that the default gc logging for G1 does not print the perm gen usage at >> > full gc, unlike other collectors; looks like an oversight in logging >> > perhaps one >> > that has been fixed recently; i was on 7u40 i think.) >> > >> > While I need to collect more data using non-ParallelOld, non-G1 >> > collectors (escpeially CMS) to see how things look and to get closer to >> > the root >> > cause, I wondered if anyone else had come across a similar issue and to >> > check if this is a known issue. >> > >> > I'll post more details after gathering more data, but in case anyone has >> > experienced this, please do share. >> > >> > thank you in advance, and Happy New Year! 
>> > -- ramki >> > >> > >> > _______________________________________________ >> > hotspot-gc-use mailing list >> > hotspot-gc-use at openjdk.java.net >> > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> > >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> > > > > _______________________________________________ > hotspot-gc-use mailing listhotspot-gc-use at openjdk.java.nethttp://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140103/75e83946/attachment-0001.html From ysr1729 at gmail.com Fri Jan 3 11:36:42 2014 From: ysr1729 at gmail.com (Srinivas Ramakrishna) Date: Fri, 3 Jan 2014 11:36:42 -0800 Subject: G1: higher perm gen footprint or a possible perm gen leak? In-Reply-To: <52C6FBE6.6040904@oracle.com> References: <52C6D83A.8070309@finkzeit.at> <52C6FBE6.6040904@oracle.com> Message-ID: Hi Jenny -- On Fri, Jan 3, 2014 at 10:05 AM, YU ZHANG wrote: > Very interesting post. Like someone mentioned in the comments, with > -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled, CMS can clean > classes in PermGen with minor GC. But G1 can only unload class during full > gc. Full GC in G1 is slow as it is single threaded. > One small correction: CMS collects perm gen in major gc cycles, albeit concurrently with that flag enabled. The perm gen isn't cleaned at a minor gc with any of our collectors, since global reachability isn't checked at minor gc's. -- ramki > > > Thanks, > Jenny > > On 1/3/2014 7:47 AM, Jose Otavio Carlomagno Filho wrote: > > We recently switched to G1 in our application and started experiencing > this type of behaviour too. Turns out G1 was not causing the problem, it > was only exposing it to us. > > Our application would generate a large number of proxy classes and that > would cause the Perm Gen to fill up until a full GC was performed by G1. > When using ParallelOldGC, this would not happen because full GCs would be > executed much more frequently (when the old gen was full), which prevented > the perm gen from filling up. > > You can find more info about our problem and our analysis here: > http://stackoverflow.com/questions/20274317/g1-garbage-collector-perm-gen-fills-up-indefinitely-until-a-full-gc-is-performe > > I recommend you use a profiling too to investigate the root cause of > your Perm Gen getting filled up. There's a chance it is a leak, but as I > said, in our case, it was our own application's fault and G1 exposed the > problem to us. > > Regards, > Jose > > > On Fri, Jan 3, 2014 at 1:33 PM, Wolfgang Pedot > wrote: > >> Hi, >> >> I am using G1 on 7u45 for an application-server which has a "healthy" >> permGen churn because it generates a lot of short-lived dynamic classes >> (JavaScript). Currently permGen is sized at a little over 1GB and >> depending on usage there can be up to 2 full GCs per day (usually only >> 1). I have not noticed an increased permGen usage with G1 (increased >> size just before switching to G1) but I have noticed something odd about >> the permGen-usage after a collect. 
The class-count will always fall back >> to the same level which is currently 65k but the permGen usage after >> collect can either be ~0.8GB or ~0.55GB. There are always 3 collects >> resulting in 0.8GB followed by one scoring 0.55GB so there seems to be >> some kind of "rythm" going on. The full GCs are always triggered by >> permGen getting full and the loaded class count goes significantly >> higher after a 0.55GB collect (165k vs 125k) so I guess some classes >> just get unloaded later... >> >> I can not tell if this behaviour is due to G1 or some other factor in >> this application but I do know that I have no leak because the >> after-collect values are fairly stable over weeks. >> >> So I have not experienced this but am sharing anyway ;) >> >> happy new year >> Wolfgang >> >> Am 03.01.2014 10:12, schrieb Srinivas Ramakrishna: >> > I haven't narrowed it down sufficiently yet, but has anyone noticed if >> > G1 causes a higher perm gen footprint or, worse, a perm gen leak >> perhaps? >> > I do realize that G1 does not today (as of 7u40 at least) collect the >> > perm gen concurrently, rather deferring its collection to a stop-world >> full >> > gc. However, it has just come to my attention that despite full >> > stop-world gc's (on account of the perm gen getting full), G1 still uses >> > more perm gen >> > space (in some instacnes substantially more) than ParallelOldGC even >> > after the full stop-world gc's, in some of our experiments. (PS: Also >> > noticed >> > that the default gc logging for G1 does not print the perm gen usage at >> > full gc, unlike other collectors; looks like an oversight in logging >> > perhaps one >> > that has been fixed recently; i was on 7u40 i think.) >> > >> > While I need to collect more data using non-ParallelOld, non-G1 >> > collectors (escpeially CMS) to see how things look and to get closer to >> > the root >> > cause, I wondered if anyone else had come across a similar issue and to >> > check if this is a known issue. >> > >> > I'll post more details after gathering more data, but in case anyone has >> > experienced this, please do share. >> > >> > thank you in advance, and Happy New Year! >> > -- ramki >> > >> > >> > _______________________________________________ >> > hotspot-gc-use mailing list >> > hotspot-gc-use at openjdk.java.net >> > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> > >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> > > > > _______________________________________________ > hotspot-gc-use mailing listhotspot-gc-use at openjdk.java.nethttp://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140103/7a4c9945/attachment.html From yu.zhang at oracle.com Fri Jan 3 11:53:34 2014 From: yu.zhang at oracle.com (YU ZHANG) Date: Fri, 03 Jan 2014 11:53:34 -0800 Subject: G1: higher perm gen footprint or a possible perm gen leak? In-Reply-To: References: <52C6D83A.8070309@finkzeit.at> <52C6FBE6.6040904@oracle.com> Message-ID: <52C7153E.9070206@oracle.com> Ramki, The perm gen data would be very interesting. 
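For gathering that kind of perm gen data from inside the process, independent of collector and log format, the standard java.lang.management API is enough; a hedged sketch, where the class name, the 10-second interval and the use of System.out are made up for illustration:

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryPoolMXBean;
    import java.lang.management.MemoryUsage;

    /** Logs perm gen occupancy from inside the JVM being measured;
     *  intended to be started on a daemon thread by the application. */
    public class PermGenSampler implements Runnable {
        @Override
        public void run() {
            while (!Thread.currentThread().isInterrupted()) {
                for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
                    // the pool name contains "Perm" for the serial, parallel, CMS and G1 perm gen pools
                    if (pool.getName().contains("Perm")) {
                        MemoryUsage u = pool.getUsage();
                        System.out.printf("%s: used=%dK committed=%dK max=%dK%n",
                                pool.getName(), u.getUsed() / 1024,
                                u.getCommitted() / 1024, u.getMax() / 1024);
                    }
                }
                try {
                    Thread.sleep(10000);  // sampling interval, 10 seconds here
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }
    }

The same readings are also visible remotely through JMX, for example in jconsole's memory pool view, which avoids touching the application at all.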
And thanks for correcting me on my previous post: "One small correction: CMS collects perm gen in major gc cycles, albeit concurrently with that flag enabled. The perm gen isn't cleaned at a minor gc with any of our collectors, since global reachability isn't checked at minor gc's." Thanks, Jenny On 1/3/2014 11:30 AM, Srinivas Ramakrishna wrote: > Thanks everyone for sharing yr experiences. As I indicated, I do > realize that G1 does not collect perm gen concurrently. > What was surprising was that G1's use of perm gen was much higher > following its stop-world full gc's > which would have collected the perm gen. As a result, G1 needed a perm > gen quite a bit more than twice that > given to parallel gc to be able to run an application for a certain > length of time. > > I'll provide more data on perm gen dynamics when I have it. My guess > would be that somehow G1's use of > regions in the perm gen is causing a dilation of perm gen footprint on > account of fragmentation in the G1 perm > gen regions. If that were the case, I would expect a modest increase > in the perm gen footprint, but it seemed the increase in > footprint was much higher. I'll collect and post more concrete numbers > when I get a chance. > > -- ramki > > > > On Fri, Jan 3, 2014 at 10:05 AM, YU ZHANG > wrote: > > Very interesting post. Like someone mentioned in the comments, > with -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled, CMS > can clean classes in PermGen with minor GC. But G1 can only > unload class during full gc. Full GC in G1 is slow as it is > single threaded. > > Thanks, > Jenny > > On 1/3/2014 7:47 AM, Jose Otavio Carlomagno Filho wrote: >> We recently switched to G1 in our application and started >> experiencing this type of behaviour too. Turns out G1 was not >> causing the problem, it was only exposing it to us. >> >> Our application would generate a large number of proxy classes >> and that would cause the Perm Gen to fill up until a full GC was >> performed by G1. When using ParallelOldGC, this would not happen >> because full GCs would be executed much more frequently (when the >> old gen was full), which prevented the perm gen from filling up. >> >> You can find more info about our problem and our analysis here: >> http://stackoverflow.com/questions/20274317/g1-garbage-collector-perm-gen-fills-up-indefinitely-until-a-full-gc-is-performe >> >> I recommend you use a profiling too to investigate the root cause >> of your Perm Gen getting filled up. There's a chance it is a >> leak, but as I said, in our case, it was our own application's >> fault and G1 exposed the problem to us. >> >> Regards, >> Jose >> >> >> On Fri, Jan 3, 2014 at 1:33 PM, Wolfgang Pedot >> > >> wrote: >> >> Hi, >> >> I am using G1 on 7u45 for an application-server which has a >> "healthy" >> permGen churn because it generates a lot of short-lived >> dynamic classes >> (JavaScript). Currently permGen is sized at a little over 1GB and >> depending on usage there can be up to 2 full GCs per day >> (usually only >> 1). I have not noticed an increased permGen usage with G1 >> (increased >> size just before switching to G1) but I have noticed >> something odd about >> the permGen-usage after a collect. The class-count will >> always fall back >> to the same level which is currently 65k but the permGen >> usage after >> collect can either be ~0.8GB or ~0.55GB. There are always 3 >> collects >> resulting in 0.8GB followed by one scoring 0.55GB so there >> seems to be >> some kind of "rythm" going on. 
The full GCs are always >> triggered by >> permGen getting full and the loaded class count goes >> significantly >> higher after a 0.55GB collect (165k vs 125k) so I guess some >> classes >> just get unloaded later... >> >> I can not tell if this behaviour is due to G1 or some other >> factor in >> this application but I do know that I have no leak because the >> after-collect values are fairly stable over weeks. >> >> So I have not experienced this but am sharing anyway ;) >> >> happy new year >> Wolfgang >> >> Am 03.01.2014 10:12, schrieb Srinivas Ramakrishna: >> > I haven't narrowed it down sufficiently yet, but has anyone >> noticed if >> > G1 causes a higher perm gen footprint or, worse, a perm gen >> leak perhaps? >> > I do realize that G1 does not today (as of 7u40 at least) >> collect the >> > perm gen concurrently, rather deferring its collection to a >> stop-world full >> > gc. However, it has just come to my attention that despite full >> > stop-world gc's (on account of the perm gen getting full), >> G1 still uses >> > more perm gen >> > space (in some instacnes substantially more) than >> ParallelOldGC even >> > after the full stop-world gc's, in some of our experiments. >> (PS: Also >> > noticed >> > that the default gc logging for G1 does not print the perm >> gen usage at >> > full gc, unlike other collectors; looks like an oversight >> in logging >> > perhaps one >> > that has been fixed recently; i was on 7u40 i think.) >> > >> > While I need to collect more data using non-ParallelOld, non-G1 >> > collectors (escpeially CMS) to see how things look and to >> get closer to >> > the root >> > cause, I wondered if anyone else had come across a similar >> issue and to >> > check if this is a known issue. >> > >> > I'll post more details after gathering more data, but in >> case anyone has >> > experienced this, please do share. >> > >> > thank you in advance, and Happy New Year! >> > -- ramki >> > >> > >> > _______________________________________________ >> > hotspot-gc-use mailing list >> > hotspot-gc-use at openjdk.java.net >> >> > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> > >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> >> >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140103/dffcd240/attachment.html From wolfgang.pedot at finkzeit.at Fri Jan 3 12:46:43 2014 From: wolfgang.pedot at finkzeit.at (Wolfgang Pedot) Date: Fri, 03 Jan 2014 21:46:43 +0100 Subject: G1: higher perm gen footprint or a possible perm gen leak? In-Reply-To: <52C6FBE6.6040904@oracle.com> References: <52C6D83A.8070309@finkzeit.at> <52C6FBE6.6040904@oracle.com> Message-ID: <52C721B3.9010909@finkzeit.at> Looks like the mail you quoted (from Jose Otavio Carlomagno Filho) was in response to mine but I have not received it... Just to clarify: I know why permGen fills up and its an expected behaviour in this application. 
Having 1-2 full GCs a day is certainly not ideal but its also no killer and I like how G1 handles the young/old heap. What makes me wonder is why after every 4th full GC permGen usage drops a good 250MB lower than the 3 collects before and there is space for significantly more classes afterwards (165k vs 125k). Something else in permGen must get cleaned up at that time... That rythm keeps constant so far no matter how much time passes between full GCs. I dont really think G1 causes this 3-1 rythm specifically but whats interesting is that CMS with ClassUnloading never got significantly below that 0.8GB if I remember correctly. regards Wolfgang PS: my older question about G1 and incremental permGen possibility to this mailing list is actually linked in that stackoverflow-thread so we have a complete circle here ;) Am 03.01.2014 19:05, schrieb YU ZHANG: > Very interesting post. Like someone mentioned in the comments, with > -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled, CMS can clean > classes in PermGen with minor GC. But G1 can only unload class during > full gc. Full GC in G1 is slow as it is single threaded. > > Thanks, > Jenny > > On 1/3/2014 7:47 AM, Jose Otavio Carlomagno Filho wrote: >> We recently switched to G1 in our application and started experiencing >> this type of behaviour too. Turns out G1 was not causing the problem, >> it was only exposing it to us. >> >> Our application would generate a large number of proxy classes and >> that would cause the Perm Gen to fill up until a full GC was performed >> by G1. When using ParallelOldGC, this would not happen because full >> GCs would be executed much more frequently (when the old gen was >> full), which prevented the perm gen from filling up. >> >> You can find more info about our problem and our analysis here: >> http://stackoverflow.com/questions/20274317/g1-garbage-collector-perm-gen-fills-up-indefinitely-until-a-full-gc-is-performe >> >> I recommend you use a profiling too to investigate the root cause of >> your Perm Gen getting filled up. There's a chance it is a leak, but as >> I said, in our case, it was our own application's fault and G1 exposed >> the problem to us. >> >> Regards, >> Jose >> >> >> On Fri, Jan 3, 2014 at 1:33 PM, Wolfgang Pedot >> > wrote: >> >> Hi, >> >> I am using G1 on 7u45 for an application-server which has a "healthy" >> permGen churn because it generates a lot of short-lived dynamic >> classes >> (JavaScript). Currently permGen is sized at a little over 1GB and >> depending on usage there can be up to 2 full GCs per day (usually only >> 1). I have not noticed an increased permGen usage with G1 (increased >> size just before switching to G1) but I have noticed something odd >> about >> the permGen-usage after a collect. The class-count will always >> fall back >> to the same level which is currently 65k but the permGen usage after >> collect can either be ~0.8GB or ~0.55GB. There are always 3 collects >> resulting in 0.8GB followed by one scoring 0.55GB so there seems to be >> some kind of "rythm" going on. The full GCs are always triggered by >> permGen getting full and the loaded class count goes significantly >> higher after a 0.55GB collect (165k vs 125k) so I guess some classes >> just get unloaded later... >> >> I can not tell if this behaviour is due to G1 or some other factor in >> this application but I do know that I have no leak because the >> after-collect values are fairly stable over weeks. 
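One way to find out what the "deeper" collections actually remove would be to compare what is left in the perm gen after one of the ~0.8GB collects with what is left after a ~0.55GB collect; a sketch, with <pid> as a placeholder:

    # per-class-loader view of the perm gen (JDK 7): loader, class count, bytes, alive/dead
    jmap -permstat <pid>

    # live-object class histogram; note that the :live option forces a full GC first
    jmap -histo:live <pid>

Running jmap -permstat shortly after each kind of collection should show which class loaders, and roughly how many classes and bytes, only disappear on the deeper ones.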
>> >> So I have not experienced this but am sharing anyway ;) >> >> happy new year >> Wolfgang >> >> Am 03.01.2014 10:12, schrieb Srinivas Ramakrishna: >> > I haven't narrowed it down sufficiently yet, but has anyone >> noticed if >> > G1 causes a higher perm gen footprint or, worse, a perm gen leak >> perhaps? >> > I do realize that G1 does not today (as of 7u40 at least) >> collect the >> > perm gen concurrently, rather deferring its collection to a >> stop-world full >> > gc. However, it has just come to my attention that despite full >> > stop-world gc's (on account of the perm gen getting full), G1 >> still uses >> > more perm gen >> > space (in some instacnes substantially more) than ParallelOldGC even >> > after the full stop-world gc's, in some of our experiments. (PS: >> Also >> > noticed >> > that the default gc logging for G1 does not print the perm gen >> usage at >> > full gc, unlike other collectors; looks like an oversight in logging >> > perhaps one >> > that has been fixed recently; i was on 7u40 i think.) >> > >> > While I need to collect more data using non-ParallelOld, non-G1 >> > collectors (escpeially CMS) to see how things look and to get >> closer to >> > the root >> > cause, I wondered if anyone else had come across a similar issue >> and to >> > check if this is a known issue. >> > >> > I'll post more details after gathering more data, but in case >> anyone has >> > experienced this, please do share. >> > >> > thank you in advance, and Happy New Year! >> > -- ramki >> > >> > >> > _______________________________________________ >> > hotspot-gc-use mailing list >> > hotspot-gc-use at openjdk.java.net >> >> > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> > >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> >> >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > From bernd-2013 at eckenfels.net Fri Jan 3 12:59:10 2014 From: bernd-2013 at eckenfels.net (Bernd Eckenfels) Date: Fri, 03 Jan 2014 21:59:10 +0100 Subject: G1: higher perm gen footprint or a possible perm gen leak? In-Reply-To: <52C721B3.9010909@finkzeit.at> References: <52C6D83A.8070309@finkzeit.at> <52C6FBE6.6040904@oracle.com> <52C721B3.9010909@finkzeit.at> Message-ID: Am 03.01.2014, 21:46 Uhr, schrieb Wolfgang Pedot : > What makes > me wonder is why after every 4th full GC permGen usage drops a good > 250MB lower than the 3 collects before and there is space for > significantly more classes afterwards (165k vs 125k). Could be softreference or (more likely) finalizer related? Gruss Bernd From thomas.schatzl at oracle.com Fri Jan 3 13:52:46 2014 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Fri, 03 Jan 2014 22:52:46 +0100 Subject: G1: higher perm gen footprint or a possible perm gen leak? In-Reply-To: References: <52C6D83A.8070309@finkzeit.at> <52C6FBE6.6040904@oracle.com> Message-ID: <1388785966.6059.2.camel@cirrus> Hi, On Fri, 2014-01-03 at 11:30 -0800, Srinivas Ramakrishna wrote: > Thanks everyone for sharing yr experiences. As I indicated, I do > realize that G1 does not collect perm gen concurrently. 
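If soft references or finalizers are the suspects, as Bernd suggests above, their processing can be made visible in the GC log and soft reference retention can be tightened with standard HotSpot options; the value below is only illustrative:

    # log how many Soft/Weak/Final/Phantom references each collection processes
    -XX:+PrintGCDetails -XX:+PrintReferenceGC

    # default is 1000 ms of retention per free MB of heap; lowering it clears
    # softly reachable objects (and whatever they keep alive) more aggressively
    -XX:SoftRefLRUPolicyMSPerMB=100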
> What was surprising was that G1's use of perm gen was much higher > following its stop-world full gc's > which would have collected the perm gen. As a result, G1 needed a perm > gen quite a bit more than twice that > given to parallel gc to be able to run an application for a certain > length of time. Maybe explained by different soft reference policies? I.e. maybe the input for the soft reference processing is different in both collectors, making it behave differently, possibly keeping alive more objects/classes for longer. > I'll provide more data on perm gen dynamics when I have it. My guess > would be that somehow G1's use of > regions in the perm gen is causing a dilation of perm gen footprint on > account of fragmentation in the G1 perm > gen regions. If that were the case, I would expect a modest increase > in the perm gen footprint, but it seemed the increase in > footprint was much higher. I'll collect and post more concrete numbers > when I get a chance. (G1) Perm gen is never region based. Thomas From ysr1729 at gmail.com Fri Jan 3 14:02:23 2014 From: ysr1729 at gmail.com (Srinivas Ramakrishna) Date: Fri, 3 Jan 2014 14:02:23 -0800 Subject: G1: higher perm gen footprint or a possible perm gen leak? In-Reply-To: <1388785966.6059.2.camel@cirrus> References: <52C6D83A.8070309@finkzeit.at> <52C6FBE6.6040904@oracle.com> <1388785966.6059.2.camel@cirrus> Message-ID: Hi Thomas -- On Fri, Jan 3, 2014 at 1:52 PM, Thomas Schatzl wrote: > Hi, > > On Fri, 2014-01-03 at 11:30 -0800, Srinivas Ramakrishna wrote: > > Thanks everyone for sharing yr experiences. As I indicated, I do > > realize that G1 does not collect perm gen concurrently. > > What was surprising was that G1's use of perm gen was much higher > > following its stop-world full gc's > > which would have collected the perm gen. As a result, G1 needed a perm > > gen quite a bit more than twice that > > given to parallel gc to be able to run an application for a certain > > length of time. > > Maybe explained by different soft reference policies? I.e. maybe the > input for the soft reference processing is different in both collectors, > making it behave differently, possibly keeping alive more > objects/classes for longer. > Thanks for that thought; i'll keep that in mind. > > > I'll provide more data on perm gen dynamics when I have it. My guess > > would be that somehow G1's use of > > regions in the perm gen is causing a dilation of perm gen footprint on > > account of fragmentation in the G1 perm > > gen regions. If that were the case, I would expect a modest increase > > in the perm gen footprint, but it seemed the increase in > > footprint was much higher. I'll collect and post more concrete numbers > > when I get a chance. > > (G1) Perm gen is never region based. > > Ah, thanks for correcting that misconception of mine. So we can cross that off. -- ramki > Thomas > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140103/996d4c42/attachment.html From yaoshengzhe at gmail.com Mon Jan 6 12:03:36 2014 From: yaoshengzhe at gmail.com (yao) Date: Mon, 6 Jan 2014 12:03:36 -0800 Subject: java process memory usage is higher than Xmx Message-ID: Hi All, I have a java process (HBase region server process ) running under Java 7 (1.7.0_40-b43) with G1 enabled. Both Xms and Xmx are the same. After running process for a few hours, I see the actual memory used by the process is about 10 percent higher than given Xmx. 
Has anyone experienced the similar when use Java 7 or G1 ? Is there useful tools to diagnose the cause ? I've tried jmap but the output doesn't say anything about high memory usage. FYI, the java process use a large heap (90GB), but the actual memory usage ($ top) is about 99GB. Thanks Shengzhe -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140106/44ab95e1/attachment.html From jon.masamitsu at oracle.com Mon Jan 6 12:01:41 2014 From: jon.masamitsu at oracle.com (Jon Masamitsu) Date: Mon, 06 Jan 2014 12:01:41 -0800 Subject: G1: higher perm gen footprint or a possible perm gen leak? In-Reply-To: <52C721B3.9010909@finkzeit.at> References: <52C6D83A.8070309@finkzeit.at> <52C6FBE6.6040904@oracle.com> <52C721B3.9010909@finkzeit.at> Message-ID: <52CB0BA5.2080202@oracle.com> On 01/03/2014 12:46 PM, Wolfgang Pedot wrote: > Looks like the mail you quoted (from Jose Otavio Carlomagno Filho) was > in response to mine but I have not received it... > > Just to clarify: > I know why permGen fills up and its an expected behaviour in this > application. Having 1-2 full GCs a day is certainly not ideal but its > also no killer and I like how G1 handles the young/old heap. What makes > me wonder is why after every 4th full GC permGen usage drops a good > 250MB lower than the 3 collects before and there is space for > significantly more classes afterwards (165k vs 125k). Something else in > permGen must get cleaned up at that time... > That rythm keeps constant so far no matter how much time passes between > full GCs. > > I dont really think G1 causes this 3-1 rythm specifically but whats > interesting is that CMS with ClassUnloading never got significantly > below that 0.8GB if I remember correctly. Try -XX:MarkSweepAlwaysCompactCount=1 which should make every full GC compact out all the dead space. Alternatively try -XX:MarkSweepAlwaysCompactCount=8 and see if that changes the pattern. product(uintx, MarkSweepAlwaysCompactCount, 4, \ "How often should we fully compact the heap (ignoring the dead " \ "space parameters)") Jon > > regards > Wolfgang > > PS: my older question about G1 and incremental permGen possibility to > this mailing list is actually linked in that stackoverflow-thread so we > have a complete circle here ;) > > > > Am 03.01.2014 19:05, schrieb YU ZHANG: >> Very interesting post. Like someone mentioned in the comments, with >> -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled, CMS can clean >> classes in PermGen with minor GC. But G1 can only unload class during >> full gc. Full GC in G1 is slow as it is single threaded. >> >> Thanks, >> Jenny >> >> On 1/3/2014 7:47 AM, Jose Otavio Carlomagno Filho wrote: >>> We recently switched to G1 in our application and started experiencing >>> this type of behaviour too. Turns out G1 was not causing the problem, >>> it was only exposing it to us. >>> >>> Our application would generate a large number of proxy classes and >>> that would cause the Perm Gen to fill up until a full GC was performed >>> by G1. When using ParallelOldGC, this would not happen because full >>> GCs would be executed much more frequently (when the old gen was >>> full), which prevented the perm gen from filling up. 
>>> >>> You can find more info about our problem and our analysis here: >>> http://stackoverflow.com/questions/20274317/g1-garbage-collector-perm-gen-fills-up-indefinitely-until-a-full-gc-is-performe >>> >>> I recommend you use a profiling too to investigate the root cause of >>> your Perm Gen getting filled up. There's a chance it is a leak, but as >>> I said, in our case, it was our own application's fault and G1 exposed >>> the problem to us. >>> >>> Regards, >>> Jose >>> >>> >>> On Fri, Jan 3, 2014 at 1:33 PM, Wolfgang Pedot >>> > wrote: >>> >>> Hi, >>> >>> I am using G1 on 7u45 for an application-server which has a "healthy" >>> permGen churn because it generates a lot of short-lived dynamic >>> classes >>> (JavaScript). Currently permGen is sized at a little over 1GB and >>> depending on usage there can be up to 2 full GCs per day (usually only >>> 1). I have not noticed an increased permGen usage with G1 (increased >>> size just before switching to G1) but I have noticed something odd >>> about >>> the permGen-usage after a collect. The class-count will always >>> fall back >>> to the same level which is currently 65k but the permGen usage after >>> collect can either be ~0.8GB or ~0.55GB. There are always 3 collects >>> resulting in 0.8GB followed by one scoring 0.55GB so there seems to be >>> some kind of "rythm" going on. The full GCs are always triggered by >>> permGen getting full and the loaded class count goes significantly >>> higher after a 0.55GB collect (165k vs 125k) so I guess some classes >>> just get unloaded later... >>> >>> I can not tell if this behaviour is due to G1 or some other factor in >>> this application but I do know that I have no leak because the >>> after-collect values are fairly stable over weeks. >>> >>> So I have not experienced this but am sharing anyway ;) >>> >>> happy new year >>> Wolfgang >>> >>> Am 03.01.2014 10:12, schrieb Srinivas Ramakrishna: >>> > I haven't narrowed it down sufficiently yet, but has anyone >>> noticed if >>> > G1 causes a higher perm gen footprint or, worse, a perm gen leak >>> perhaps? >>> > I do realize that G1 does not today (as of 7u40 at least) >>> collect the >>> > perm gen concurrently, rather deferring its collection to a >>> stop-world full >>> > gc. However, it has just come to my attention that despite full >>> > stop-world gc's (on account of the perm gen getting full), G1 >>> still uses >>> > more perm gen >>> > space (in some instacnes substantially more) than ParallelOldGC even >>> > after the full stop-world gc's, in some of our experiments. (PS: >>> Also >>> > noticed >>> > that the default gc logging for G1 does not print the perm gen >>> usage at >>> > full gc, unlike other collectors; looks like an oversight in logging >>> > perhaps one >>> > that has been fixed recently; i was on 7u40 i think.) >>> > >>> > While I need to collect more data using non-ParallelOld, non-G1 >>> > collectors (escpeially CMS) to see how things look and to get >>> closer to >>> > the root >>> > cause, I wondered if anyone else had come across a similar issue >>> and to >>> > check if this is a known issue. >>> > >>> > I'll post more details after gathering more data, but in case >>> anyone has >>> > experienced this, please do share. >>> > >>> > thank you in advance, and Happy New Year! 
>>> > -- ramki >>> > >>> > >>> > _______________________________________________ >>> > hotspot-gc-use mailing list >>> > hotspot-gc-use at openjdk.java.net >>> >>> > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>> > >>> >>> _______________________________________________ >>> hotspot-gc-use mailing list >>> hotspot-gc-use at openjdk.java.net >>> >>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>> >>> >>> >>> >>> _______________________________________________ >>> hotspot-gc-use mailing list >>> hotspot-gc-use at openjdk.java.net >>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use From daubman at gmail.com Mon Jan 6 12:54:29 2014 From: daubman at gmail.com (Aaron Daubman) Date: Mon, 6 Jan 2014 15:54:29 -0500 Subject: java process memory usage is higher than Xmx In-Reply-To: References: Message-ID: > > > I've tried jmap but the output doesn't say anything about high memory > usage. FYI, the java process use a large heap (90GB), but the actual memory > usage ($ top) is about 99GB. > > Is this the RES column from top? You might also see: http://plumbr.eu/blog/why-does-my-java-process-consume-more-memory-than-xmx Although I would not expect permgen and stack to sum up to 9G... Are you using JNI, bytebuffers or anything else that would allocate off-heap memory? -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140106/77be33ea/attachment.html From yaoshengzhe at gmail.com Mon Jan 6 13:33:45 2014 From: yaoshengzhe at gmail.com (yao) Date: Mon, 6 Jan 2014 13:33:45 -0800 Subject: java process memory usage is higher than Xmx In-Reply-To: References: Message-ID: Hi Aaron, Is this the RES column from top? > Yes, it is. Are you using JNI, bytebuffers or anything else that would allocate > off-heap memory? > It is original HBase region server process, we never modify the code. It might use off-heap memory internally but the problem is, similar machine running under Java 6 with CMS do not have this problem and the real memory usage is very close to Xmx. In our case, the permgen seems not very high and I don't think stack would be a problem. I am now wondering, does anyone use G1 (with large heap) experience this memory usage issue (larger than Xmx a lot) ? Perm Generation: capacity = 33554432 (32.0MB) used = 31725872 (30.256149291992188MB) free = 1828560 (1.7438507080078125MB) 94.55046653747559% used -Shengzhe -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140106/292eb2a6/attachment.html From thomas.schatzl at oracle.com Mon Jan 6 13:43:12 2014 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Mon, 06 Jan 2014 22:43:12 +0100 Subject: java process memory usage is higher than Xmx In-Reply-To: References: Message-ID: <1389044592.5005.3.camel@cirrus> Hi, On Mon, 2014-01-06 at 12:03 -0800, yao wrote: > Hi All, > > I have a java process (HBase region server process ) running under > Java 7 (1.7.0_40-b43) with G1 enabled. Both Xms and Xmx are the same. 
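On the direct-buffer point raised above: memory obtained through JNI or ByteBuffer.allocateDirect lives outside the Java heap, so it shows up in RES but is invisible to -Xmx and to jmap's heap summary. A tiny illustrative snippet (class name and sizes are arbitrary):

    import java.nio.ByteBuffer;

    public class OffHeapDemo {
        public static void main(String[] args) throws Exception {
            // 1 GB of native memory: RES grows by about 1 GB, Java heap usage does not
            ByteBuffer buf = ByteBuffer.allocateDirect(1024 * 1024 * 1024);
            System.out.println("direct capacity: " + buf.capacity() + " bytes");
            Thread.sleep(60000);  // keep the process alive long enough to check it with top
        }
    }

The total that direct buffers may take is capped by -XX:MaxDirectMemorySize, which defaults to roughly the maximum heap size when unset. As the replies that follow show, in this particular case the bulk of the overshoot is G1's remembered sets rather than direct buffers.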
> After running process for a few hours, I see the actual memory used by > the process is about 10 percent higher than given Xmx. Has anyone > experienced the similar when use Java 7 or G1 ? Is there useful tools > to diagnose the cause ? > > > I've tried jmap but the output doesn't say anything about high memory > usage. FYI, the java process use a large heap (90GB), but the actual > memory usage ($ top) is about 99GB. > Possibly remembered set size. Can you enable -XX:+UnlockDiagnosticVMOptions and -XX: +G1SummarizeRSetStats? The line Total heap region rem set sizes = 5256086K. Max = 8640K. gives you a good idea about remembered set size memory usage. I copied above line from one of your responses to the "G1 GC clean up time is too long" thread, and it seems the remembered set takes ~5GB there. Hth, Thomas From yaoshengzhe at gmail.com Mon Jan 6 13:49:43 2014 From: yaoshengzhe at gmail.com (yao) Date: Mon, 6 Jan 2014 13:49:43 -0800 Subject: java process memory usage is higher than Xmx In-Reply-To: <1389044592.5005.3.camel@cirrus> References: <1389044592.5005.3.camel@cirrus> Message-ID: Hi Thomas, Possibly remembered set size. > > Can you enable -XX:+UnlockDiagnosticVMOptions and -XX: > +G1SummarizeRSetStats? > > The line > > Total heap region rem set sizes = 5256086K. Max = 8640K. > > gives you a good idea about remembered set size memory usage. > You are right, rem set occupies ~7GB Concurrent RS processed -1863202596 cards Of 12780432 completed buffers: 12611979 ( 98.7%) by conc RS threads. 168453 ( 1.3%) by mutator threads. Conc RS threads times(s) 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 \ 0.00 Total heap region rem set sizes = 7520648K. Max = 13256K. Static structures = 347K, free_lists = 28814K. 141012349 occupied cards represented. Max size region = 93:(O)[0x00007fcb48000000,0x00007fcb4a000000,0x00007fcb4a000000], size = 13257K, occupied = 2639K. Did 0 coarsenings. I copied above line from one of your responses to the "G1 GC clean up > time is too long" thread, and it seems the remembered set takes ~5GB > there. > I did set -XX:G1RSetRegionEntries=4096 to avoid coarsenings; however, the cleanup time seems not being reduced, it is still around 700 milliseconds, althrough there is no coarsenings. Any hint for tuning ? Because I want process with G1 use the same heap as CMS to compare the performance. But I cannot do so if rem set is that large, the process will be likely killed by OOM killed if I gave more memory. -Shengzhe On Mon, Jan 6, 2014 at 1:43 PM, Thomas Schatzl wrote: > Hi, > > On Mon, 2014-01-06 at 12:03 -0800, yao wrote: > > Hi All, > > > > I have a java process (HBase region server process ) running under > > Java 7 (1.7.0_40-b43) with G1 enabled. Both Xms and Xmx are the same. > > After running process for a few hours, I see the actual memory used by > > the process is about 10 percent higher than given Xmx. Has anyone > > experienced the similar when use Java 7 or G1 ? Is there useful tools > > to diagnose the cause ? > > > > > > I've tried jmap but the output doesn't say anything about high memory > > usage. FYI, the java process use a large heap (90GB), but the actual > > memory usage ($ top) is about 99GB. > > > Possibly remembered set size. > > Can you enable -XX:+UnlockDiagnosticVMOptions and -XX: > +G1SummarizeRSetStats? > > The line > > Total heap region rem set sizes = 5256086K. Max = 8640K. > > gives you a good idea about remembered set size memory usage. 
> > I copied above line from one of your responses to the "G1 GC clean up > time is too long" thread, and it seems the remembered set takes ~5GB > there. > > Hth, > Thomas > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140106/9ed3f35b/attachment.html From yu.zhang at oracle.com Mon Jan 6 14:05:38 2014 From: yu.zhang at oracle.com (YU ZHANG) Date: Mon, 06 Jan 2014 14:05:38 -0800 Subject: java process memory usage is higher than Xmx In-Reply-To: <1389044592.5005.3.camel@cirrus> References: <1389044592.5005.3.camel@cirrus> Message-ID: <52CB28B2.7010705@oracle.com> I did a study on G1 vs ParallelgGC native memory footprint. The source for G1 using more memory includes: mtGC: mainly for RS related data structure. internal: internal for tracking Thread: g1 has more internal threads and thread related data structures In Yao's previous email "It might use off-heap memory internally but the problem is, similar machine running under Java 6 with CMS do not have this problem and the real memory usage is very close to Xmx." I am not quite familiar with CMS, does CMS need to keep a similar RS kinda data structure? Thanks, Jenny On 1/6/2014 1:43 PM, Thomas Schatzl wrote: > Hi, > > On Mon, 2014-01-06 at 12:03 -0800, yao wrote: >> Hi All, >> >> I have a java process (HBase region server process ) running under >> Java 7 (1.7.0_40-b43) with G1 enabled. Both Xms and Xmx are the same. >> After running process for a few hours, I see the actual memory used by >> the process is about 10 percent higher than given Xmx. Has anyone >> experienced the similar when use Java 7 or G1 ? Is there useful tools >> to diagnose the cause ? >> >> >> I've tried jmap but the output doesn't say anything about high memory >> usage. FYI, the java process use a large heap (90GB), but the actual >> memory usage ($ top) is about 99GB. >> > Possibly remembered set size. > > Can you enable -XX:+UnlockDiagnosticVMOptions and -XX: > +G1SummarizeRSetStats? > > The line > > Total heap region rem set sizes = 5256086K. Max = 8640K. > > gives you a good idea about remembered set size memory usage. > > I copied above line from one of your responses to the "G1 GC clean up > time is too long" thread, and it seems the remembered set takes ~5GB > there. > > Hth, > Thomas > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use From thomas.schatzl at oracle.com Mon Jan 6 15:19:56 2014 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Tue, 07 Jan 2014 00:19:56 +0100 Subject: java process memory usage is higher than Xmx In-Reply-To: References: <1389044592.5005.3.camel@cirrus> Message-ID: <1389050396.5530.31.camel@cirrus> Hi Shengzhe, On Mon, 2014-01-06 at 13:49 -0800, yao wrote: > Hi Thomas, > Possibly remembered set size. > Can you enable -XX:+UnlockDiagnosticVMOptions and -XX: > +G1SummarizeRSetStats? > The line > Total heap region rem set sizes = 5256086K. Max = 8640K. > gives you a good idea about remembered set size memory usage. > > You are right, rem set occupies ~7GB Could you add -XX:+G1SummarizeConcMark? The GC then shows some details about the work done during cleanup phases at VM exit. At 7 GB remembered set size most likely the phase that tries to minimize the remembered set is dominant. It should show up as large "RS scrub total time" compared to the "Final counting total time". 
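(As an illustration, the diagnostic switches mentioned in this thread can be combined on one command line - a sketch only, keep the existing heap and G1 options and just add the extra output:

-XX:+UnlockDiagnosticVMOptions
-XX:+G1SummarizeRSetStats
-XX:+G1SummarizeConcMark
-XX:+PrintGCDetails -XX:+PrintGCDateStamps

G1SummarizeRSetStats produces the "Total heap region rem set sizes" summary quoted earlier, and G1SummarizeConcMark prints the cleanup breakdown at VM exit, where the "RS scrub total time" and "Final counting total time" mentioned above can be compared.)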
> > Concurrent RS processed -1863202596 cards > Of 12780432 completed buffers: > 12611979 ( 98.7%) by conc RS threads. > 168453 ( 1.3%) by mutator threads. > Conc RS threads times(s) > 0.00 0.00 0.00 0.00 0.00 0.00 0.00 > 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 > 0.00 0.00 0.00 \ > 0.00 > Total heap region rem set sizes = 7520648K. Max = 13256K. > Static structures = 347K, free_lists = 28814K. > 141012349 occupied cards represented. > Max size region = > 93:(O)[0x00007fcb48000000,0x00007fcb4a000000,0x00007fcb4a000000], size > = 13257K, occupied = 2639K. > Did 0 coarsenings. > > I copied above line from one of your responses to the "G1 GC > clean up > time is too long" thread, and it seems the remembered set > takes ~5GB > there. > > I did set -XX:G1RSetRegionEntries=4096 to avoid coarsenings; however, > the cleanup time seems not being reduced, it is still around 700 > milliseconds, althrough there is no coarsenings. > > Any hint for tuning ? Because I want process with G1 use the same heap > as CMS to compare the performance. But I cannot do so if rem set is You could still compare performance with the same total memory usage. > that large, the process will be likely killed by OOM killed if I gave > more memory. The default value of G1RSetRegionEntries at 32M region size should be 1536 (= (log(region size) - log(1MB) + 1) * G1RSetRegionEntriesBase by default); the chosen value means that you allow G1 to keep a larger remembered set (ie. less coarsening) per region. The "RS scrub" part of cleanup is roughly dependent on remembered set size, and this is the main knob to turn here imo. So it seems that increasing the G1RSetRegionEntries is counter-productive for decreasing gc cleanup time, because scrubbing coarsened remembered sets looks fast. I do not have numbers though, just a feeling. Coarsening mostly increases gc pause time (RS Scan time to be exact). Otoh you mentioned that gc cleanup time did not change when changing G1RSetRegionEntries. It's best to measure where the time is spent using the G1SummarizeConcMark switch and then possibly change the G1RSetRegionEntries value (and measuring the impact). Thomas From yaoshengzhe at gmail.com Mon Jan 6 15:31:55 2014 From: yaoshengzhe at gmail.com (yao) Date: Mon, 6 Jan 2014 15:31:55 -0800 Subject: G1 Full GC without to-space exhausted Message-ID: Hi All We have some interesting G1 GC logs and want to share with you. G1 triggers full GC without to-space exhausted. G1 parameters we use: -server -XX:MaxGCPauseMillis=100 -XX:G1HeapRegionSize=32m -XX:InitiatingHeapOccupancyPercent=65 -XX:G1ReservePercent=20 -XX:G1HeapWastePercent=5 -XX:G1MixedGCLiveThresholdPercent=75 -XX:G1RSetRegionEntries=4096 Note: 1. This machine do not have any full GCs after applying "*-XX:G1ReservePercent=20 -XX:G1HeapWastePercent=5 -XX:G1MixedGCLiveThresholdPercent=75*" since Dec 26, 2013 2. Yesterday, we set -XX:G1RSetRegionEntries=4096 to reduce RSet coarsening and we've observed following two full GCs after a few hours (~ 20 hours). 3. 
Another production machine with similar traffic (without -XX:G1RSetRegionEntries=4096) do not have full GCs so far (since Dec 26, 2013) *First Full GC*2014-01-06T17:21:11.644-0500: 72496.707: [GC pause (young) Desired survivor size 234881024 bytes, new threshold 3 (max 15) - age 1: 91549360 bytes, 91549360 total - age 2: 83989936 bytes, 175539296 total - age 3: 80986496 bytes, 256525792 total 72496.708: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 29358, predicted base time: 43.17 ms, remaining time: 56.83 ms, target pause time: 100.0\ 0 ms] 72496.708: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 91 regions, survivors: 14 regions, predicted young region time: 34.53 ms] 72496.708: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 91 regions, survivors: 14 regions, old: 0 regions, predicted pause time: 77.70 ms, target pause t\ ime: 100.00 ms] 72496.786: [G1Ergonomics (Concurrent Cycles) request concurrent cycle initiation, reason: occupancy higher than threshold, occupancy: 56841207808 bytes, allocation reques\ t: 0 bytes, threshold: 46172576125 bytes (65.00 %), source: end of GC] , 0.0788860 secs] [Parallel Time: 56.7 ms, GC Workers: 18] [GC Worker Start (ms): Min: 72496707.9, Avg: 72496708.1, Max: 72496708.3, Diff: 0.4] [Ext Root Scanning (ms): Min: 3.1, Avg: 3.6, Max: 5.2, Diff: 2.0, Sum: 64.6] [Update RS (ms): Min: 8.9, Avg: 10.2, Max: 13.1, Diff: 4.3, Sum: 183.5] [Processed Buffers: Min: 5, Avg: 19.4, Max: 26, Diff: 21, Sum: 349] [Scan RS (ms): Min: 3.9, Avg: 6.8, Max: 7.2, Diff: 3.3, Sum: 121.7] [Object Copy (ms): Min: 35.2, Avg: 35.3, Max: 35.4, Diff: 0.2, Sum: 635.2] [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.3] [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.9] [GC Worker Total (ms): Min: 55.7, Avg: 55.9, Max: 56.1, Diff: 0.4, Sum: 1006.2] [GC Worker End (ms): Min: 72496764.0, Avg: 72496764.0, Max: 72496764.1, Diff: 0.1] [Code Root Fixup: 0.0 ms] [Clear CT: 9.7 ms] [Other: 12.5 ms] [Choose CSet: 0.0 ms] [Ref Proc: 6.5 ms] [Ref Enq: 1.3 ms] [Free CSet: 0.6 ms] [Eden: 2912.0M(2912.0M)->0.0B(2944.0M) Survivors: 448.0M->416.0M Heap: 55.9G(66.2G)->53.3G(66.2G)] [Times: user=1.14 sys=0.01, real=0.07 secs] *2014-01-06T17:21:17.773-0500: 72502.837: [Full GC 55G->44G(66G), 42.9123930 secs]* [Eden: 1856.0M(2944.0M)->0.0B(8224.0M) Survivors: 416.0M->0.0B Heap: 55.1G(66.2G)->44.2G(66.2G)] [Times: user=89.27 sys=0.32, real=42.91 secs] *Second Full GC* 2014-01-06T17:22:19.756-0500: 72564.819: [GC pause (young) Desired survivor size 234881024 bytes, new threshold 1 (max 15) - age 1: 480804360 bytes, 480804360 total 72564.819: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 20485, predicted base time: 39.00 ms, remaining time: 61.00 ms, target pause time: 100.0\ 0 ms] 72564.819: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 91 regions, survivors: 14 regions, predicted young region time: 58.68 ms] 72564.820: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 91 regions, survivors: 14 regions, old: 0 regions, predicted pause time: 97.68 ms, target pause t\ ime: 100.00 ms] 72564.921: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: recent GC overhead higher than threshold after GC, recent GC overhead: 35.26 %, threshold: 10.00 %,\ uncommitted: 0 bytes, calculated expansion amount: 0 bytes (20.00 %)] , 0.1015370 secs] [Parallel Time: 86.3 ms, GC Workers: 18] [GC Worker Start (ms): Min: 72564819.8, Avg: 72564820.0, Max: 72564820.1, 
Diff: 0.3] [Ext Root Scanning (ms): Min: 3.1, Avg: 3.7, Max: 5.4, Diff: 2.2, Sum: 65.8] [SATB Filtering (ms): Min: 0.0, Avg: 0.2, Max: 4.3, Diff: 4.3, Sum: 4.3] [Update RS (ms): Min: 1.0, Avg: 8.7, Max: 69.5, Diff: 68.5, Sum: 156.8] [Processed Buffers: Min: 4, Avg: 18.6, Max: 24, Diff: 20, Sum: 335] [Scan RS (ms): Min: 0.0, Avg: 1.4, Max: 1.8, Diff: 1.8, Sum: 25.6] [Object Copy (ms): Min: 12.4, Avg: 71.3, Max: 75.2, Diff: 62.9, Sum: 1284.0] [Termination (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 2.2] [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.6] [GC Worker Total (ms): Min: 85.4, Avg: 85.5, Max: 85.7, Diff: 0.3, Sum: 1539.3] [GC Worker End (ms): Min: 72564905.5, Avg: 72564905.5, Max: 72564905.6, Diff: 0.1] [Code Root Fixup: 0.0 ms] [Clear CT: 5.8 ms] [Other: 9.5 ms] [Choose CSet: 0.0 ms] [Ref Proc: 5.7 ms] [Ref Enq: 1.5 ms] [Free CSet: 0.4 ms] [Eden: 2912.0M(2912.0M)->0.0B(2912.0M) Survivors: 448.0M->448.0M Heap: 48.4G(66.2G)->46.1G(66.2G)] [Times: user=1.62 sys=0.04, real=0.10 secs] *2014-01-06T17:22:21.027-0500: 72566.090: [Full GC 47G->42G(66G), 39.4019900 secs]* [Eden: 1344.0M(2912.0M)->0.0B(4640.0M) Survivors: 448.0M->0.0B Heap: 47.4G(66.2G)->42.1G(66.2G)] [Times: user=85.26 sys=0.25, real=39.39 secs] *RSet Summarize* Concurrent RS processed -1783517237 cards Of 13181224 completed buffers: 13012752 ( 98.7%) by conc RS threads. 168472 ( 1.3%) by mutator threads. Conc RS threads times(s) 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 Total heap region rem set sizes = 7388596K. Max = 11538K. Static structures = 347K, free_lists = 27631K. 123115917 occupied cards represented. Max size region = 84:(O)[0x00007fcb36000000,0x00007fcb37ffe258,0x00007fcb38000000], size = 11539K, occupied = 1472K. Did 0 coarsenings. Thanks Shengzhe -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140106/95ee6d50/attachment-0001.html From yaoshengzhe at gmail.com Mon Jan 6 15:49:17 2014 From: yaoshengzhe at gmail.com (yao) Date: Mon, 6 Jan 2014 15:49:17 -0800 Subject: java process memory usage is higher than Xmx In-Reply-To: <1389050396.5530.31.camel@cirrus> References: <1389044592.5005.3.camel@cirrus> <1389050396.5530.31.camel@cirrus> Message-ID: Hi Thomas, Thanks for your good explanation, very informational and helpful. Thanks -Shengzhe On Mon, Jan 6, 2014 at 3:19 PM, Thomas Schatzl wrote: > Hi Shengzhe, > > On Mon, 2014-01-06 at 13:49 -0800, yao wrote: > > Hi Thomas, > > Possibly remembered set size. > > Can you enable -XX:+UnlockDiagnosticVMOptions and -XX: > > +G1SummarizeRSetStats? > > The line > > Total heap region rem set sizes = 5256086K. Max = 8640K. > > gives you a good idea about remembered set size memory usage. > > > > You are right, rem set occupies ~7GB > > Could you add -XX:+G1SummarizeConcMark? The GC then shows some details > about the work done during cleanup phases at VM exit. > > At 7 GB remembered set size most likely the phase that tries to minimize > the remembered set is dominant. > > It should show up as large "RS scrub total time" compared to the "Final > counting total time". > > > > Concurrent RS processed -1863202596 cards > > Of 12780432 completed buffers: > > 12611979 ( 98.7%) by conc RS threads. > > 168453 ( 1.3%) by mutator threads. 
> > Conc RS threads times(s) > > 0.00 0.00 0.00 0.00 0.00 0.00 0.00 > > 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 > > 0.00 0.00 0.00 \ > > 0.00 > > Total heap region rem set sizes = 7520648K. Max = 13256K. > > Static structures = 347K, free_lists = 28814K. > > 141012349 occupied cards represented. > > Max size region = > > 93:(O)[0x00007fcb48000000,0x00007fcb4a000000,0x00007fcb4a000000], size > > = 13257K, occupied = 2639K. > > Did 0 coarsenings. > > > > I copied above line from one of your responses to the "G1 GC > > clean up > > time is too long" thread, and it seems the remembered set > > takes ~5GB > > there. > > > > I did set -XX:G1RSetRegionEntries=4096 to avoid coarsenings; however, > > the cleanup time seems not being reduced, it is still around 700 > > milliseconds, althrough there is no coarsenings. > > > > Any hint for tuning ? Because I want process with G1 use the same heap > > as CMS to compare the performance. But I cannot do so if rem set is > > You could still compare performance with the same total memory usage. > > > that large, the process will be likely killed by OOM killed if I gave > > more memory. > > The default value of G1RSetRegionEntries at 32M region size should be > 1536 (= (log(region size) - log(1MB) + 1) * G1RSetRegionEntriesBase by > default); the chosen value means that you allow G1 to keep a larger > remembered set (ie. less coarsening) per region. > > The "RS scrub" part of cleanup is roughly dependent on remembered set > size, and this is the main knob to turn here imo. > > So it seems that increasing the G1RSetRegionEntries is > counter-productive for decreasing gc cleanup time, because scrubbing > coarsened remembered sets looks fast. I do not have numbers though, just > a feeling. > Coarsening mostly increases gc pause time (RS Scan time to be exact). > > Otoh you mentioned that gc cleanup time did not change when changing > G1RSetRegionEntries. > > It's best to measure where the time is spent using the > G1SummarizeConcMark switch and then possibly change the > G1RSetRegionEntries value (and measuring the impact). > > Thomas > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140106/5f6e9078/attachment.html From bengt.rutisson at oracle.com Tue Jan 7 01:21:15 2014 From: bengt.rutisson at oracle.com (Bengt Rutisson) Date: Tue, 07 Jan 2014 10:21:15 +0100 Subject: G1: higher perm gen footprint or a possible perm gen leak? In-Reply-To: <1388785966.6059.2.camel@cirrus> References: <52C6D83A.8070309@finkzeit.at> <52C6FBE6.6040904@oracle.com> <1388785966.6059.2.camel@cirrus> Message-ID: <52CBC70B.2010901@oracle.com> Hi all, First just a note about the missing PermGen data in the full GC output. That has been fixed for JDK 8 where (there the metadata information is printed instead of course) but I don't think it has been backported to 7u. G1: Output for full GCs with +PrintGCDetails should contain perm gen/meta data size change info https://bugs.openjdk.java.net/browse/JDK-8010738 On 2014-01-03 22:52, Thomas Schatzl wrote: > Hi, > > On Fri, 2014-01-03 at 11:30 -0800, Srinivas Ramakrishna wrote: >> Thanks everyone for sharing yr experiences. As I indicated, I do >> realize that G1 does not collect perm gen concurrently. >> What was surprising was that G1's use of perm gen was much higher >> following its stop-world full gc's >> which would have collected the perm gen. 
As a result, G1 needed a perm >> gen quite a bit more than twice that >> given to parallel gc to be able to run an application for a certain >> length of time. > Maybe explained by different soft reference policies? I.e. maybe the > input for the soft reference processing is different in both collectors, > making it behave differently, possibly keeping alive more > objects/classes for longer. Yes, this would be my first guess too. We have seen differences between ParallelGC and G1 in the soft reference handling. I think this is mostly due to the different way they estimate the used and free space on the heap since the actual calculation based on that data is then the same for all collectors. On the other hand, as I recall, we saw the opposite behavior. That G1 is more aggressive about cleaning soft references than ParallelGC. Maybe playing around a bit with -XX:SoftRefLRUPolicyMSPerMB can help? Bengt > >> I'll provide more data on perm gen dynamics when I have it. My guess >> would be that somehow G1's use of >> regions in the perm gen is causing a dilation of perm gen footprint on >> account of fragmentation in the G1 perm >> gen regions. If that were the case, I would expect a modest increase >> in the perm gen footprint, but it seemed the increase in >> footprint was much higher. I'll collect and post more concrete numbers >> when I get a chance. > (G1) Perm gen is never region based. > > Thomas > > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use From Shane.Cox at theice.com Wed Jan 8 05:05:16 2014 From: Shane.Cox at theice.com (Shane Cox) Date: Wed, 8 Jan 2014 08:05:16 -0500 Subject: ParNew pauses longer in JDK7 Message-ID: <752D1F18B064FC46BF3DE5166BD3CE9C01CAFC0E@AT-BP-IXMX-09.theice.com> While benchmarking my application on JDK7, I noticed that minor GC pauses are longer compared to JDK6. One clue may relate to heap size. I noticed that heap size (Xmx) has much more impact on minor GC in JDK7. This is what I have observed: JDK6 w/ 1GB heap: avg minor GC pause = 3.9ms JDK6 w/ 10GB heap: avg minor GC pause = 3.9ms JDK7 w/ 1GB heap: avg minor GC pause = 5ms JDK7 w/ 10GB heap: avg minor GC pause = 13.3ms GC logs attached. Platform info below. Any help understanding this behavior would be appreciated. java version "1.6.0_27" Java(TM) SE Runtime Environment (build 1.6.0_27-b07) Java HotSpot(TM) 64-Bit Server VM (build 20.2-b06, mixed mode) java version "1.7.0_45" Java(TM) SE Runtime Environment (build 1.7.0_45-b18) Java HotSpot(TM) 64-Bit Server VM (build 24.45-b08, mixed mode) -d64 -Xms1000m -Xmx1000m -XX:MaxNewSize=168M -XX:PermSize=48m -XX:MaxPermSize=96m -Xnoclassgc -XX:+DisableExplicitGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=80 -XX:+PrintGCDateStamps -XX:+PrintGCDetails -verbose:gc -Xloggc:./logs/gc-output.log HP ProLiant DL360 G7 Intel(R) Xeon(R) CPU X5670 @ 2.93GHz Red Hat Enterprise Linux Server release 6.4 (Santiago) Linux ll-lt-fxmr-03 2.6.32-358.2.1.el6.x86_64 #1 SMP Wed Feb 20 12:17:37 EST 2013 x86_64 x86_64 x86_64 GNU/Linux ________________________________ This message may contain confidential information and is intended for specific recipients unless explicitly noted otherwise. If you have reason to believe you are not an intended recipient of this message, please delete it and notify the sender. This message may not represent the opinion of IntercontinentalExchange, Inc. 
(ICE), its subsidiaries or affiliates, and does not constitute a contract or guarantee. Unencrypted electronic mail is not secure and the recipient of this message is expected to provide safeguards from viruses and pursue alternate means of communication where privacy or a binding message is desired. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140108/08531ea5/attachment-0001.html -------------- next part -------------- A non-text attachment was scrubbed... Name: gcLogs.tar.gz Type: application/x-gzip Size: 69885 bytes Desc: gcLogs.tar.gz Url : http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140108/08531ea5/gcLogs.tar-0001.gz From wolfgang.pedot at finkzeit.at Wed Jan 8 05:26:41 2014 From: wolfgang.pedot at finkzeit.at (Wolfgang Pedot) Date: Wed, 08 Jan 2014 14:26:41 +0100 Subject: G1: higher perm gen footprint or a possible perm gen leak? In-Reply-To: <52CB0BA5.2080202@oracle.com> References: <52C6D83A.8070309@finkzeit.at> <52C6FBE6.6040904@oracle.com> <52C721B3.9010909@finkzeit.at> <52CB0BA5.2080202@oracle.com> Message-ID: <52CD5211.4000802@finkzeit.at> Hi, >> I dont really think G1 causes this 3-1 rythm specifically but whats >> interesting is that CMS with ClassUnloading never got significantly >> below that 0.8GB if I remember correctly. > Try > > -XX:MarkSweepAlwaysCompactCount=1 > > which should make every full GC compact out all > the dead space. > > Alternatively try > > -XX:MarkSweepAlwaysCompactCount=8 > > and see if that changes the pattern. > thats it, with a value of 1 all PermGen collects reach the same usage. Since the compacting collects are not visibly slower than "normal" full GCs I guess I?ll lower that value on the live system to increase time between full GCs. thanks for the tip Wolfgang From Andreas.Mueller at mgm-tp.com Wed Jan 8 05:53:16 2014 From: Andreas.Mueller at mgm-tp.com (=?iso-8859-1?Q?Andreas_M=FCller?=) Date: Wed, 8 Jan 2014 13:53:16 +0000 Subject: ParNew pauses longer in JDK7 Message-ID: <46FF8393B58AD84D95E444264805D98FBDE13A2A@edata01.mgm-edv.de> Hi Shane, >While benchmarking my application on JDK7, I noticed that minor GC pauses are longer compared to JDK6. One clue may relate to heap size. I noticed that heap size (Xmx) has much more impact on minor GC in JDK7. This is what I have observed: >JDK6 w/ 1GB heap: avg minor GC pause = 3.9ms >JDK6 w/ 10GB heap: avg minor GC pause = 3.9ms >JDK7 w/ 1GB heap: avg minor GC pause = 5ms >JDK7 w/ 10GB heap: avg minor GC pause = 13.3ms Very interesting: Only your comparison with Java 6 made clear to me that this is a bug in Java7! You can probably work around that problem and make Java 7 perform better by explicitly setting the NewSize to a much higher value than the default of around 160 MB which I see in all the GC logs. I have observed before that the CMS collector does not perform well for small (and default) NewSize. Find the details here: http://blog.mgm-tp.com/2013/12/benchmarking-g1-and-other-java-7-garbage-collectors/ in the section about the GarbageOnly benchmark and, in particular, figures 2+3. I have further observed that the increase in accumulated pause time (shown in figure 3) for smaller values of NewSize comes about because ParNew pauses get LONGER when NewSize gets SMALLER (which is odd enough). I have a figure showing that but removed it from my blog post because it is (too) long already. I will send it to you in an extra mail, though. 
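(A concrete sketch of that workaround, with an illustrative value only: give the young generation an explicit size, for example

-XX:NewSize=512m -XX:MaxNewSize=512m

added to the existing CMS options, instead of relying on the ~160 MB default. The 512m figure is just an example - the right size depends on heap size and allocation rate; the point is simply not to run CMS/ParNew with the small default NewSize.)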
It did, however, not occur to me to check that rather odd behavior with older Java versions (as you did with Java 6). Thanks for asking your question! You helped me to understand that there is another bug which I unknowingly documented on my blog post. The first one is already highlighted in figure 11 of the same article and was also discussed on this mailing list some weeks ago. Mit freundlichen Gr??en/Best regards Andreas M?ller mgm technology partners GmbH Frankfurter Ring 105a 80807 M?nchen Tel. +49 (89) 35 86 80-633 Fax +49 (89) 35 86 80-288 E-Mail Andreas.Mueller at mgm-tp.com Innovation Implemented. Sitz der Gesellschaft: M?nchen Gesch?ftsf?hrer: Hamarz Mehmanesh Handelsregister: AG M?nchen HRB 105068 -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140108/928708b7/attachment.html From Andreas.Mueller at mgm-tp.com Wed Jan 8 06:05:50 2014 From: Andreas.Mueller at mgm-tp.com (=?iso-8859-1?Q?Andreas_M=FCller?=) Date: Wed, 8 Jan 2014 14:05:50 +0000 Subject: AW: Re: ParNew pauses longer in JDK7 Message-ID: <46FF8393B58AD84D95E444264805D98FBDE13A41@edata01.mgm-edv.de> Hi Shane, your problem is documented in the purple lines of the attached plot: New gen pauses get LONGER when new gen size gets smaller for the CMS collector only. As you can see from figure 3 of my blog post this translates into much more accumulated pause time because new gen pauses also get MORE FREQUENT when new gen size gets smaller. The fast growth of accumulated pause time directly translates into a sharp decrease of GC throughput (figure 2) as is explained (even with a formula) in the text. Remeasuring those purple lines with Java 6 would probably show that this is a Java 7 problem. So far I have only measured one point (at NewSize=160 MB) in comparison to the solid purple line which confirmed what you observed with your benchmark. As there is a link to the source code of my benchmark in the article anybody can use it to reproduce the difference in Java 6 and 7. Best regards Andreas Von: Andreas M?ller Gesendet: Mittwoch, 8. Januar 2014 14:53 An: 'Shane.Cox at theice.com' Cc: hotspot-gc-use at openjdk.java.net Betreff: Re: ParNew pauses longer in JDK7 Hi Shane, >While benchmarking my application on JDK7, I noticed that minor GC pauses are longer compared to JDK6. One clue may relate to heap size. I noticed that heap size (Xmx) has much more impact on minor GC in JDK7. This is what I have observed: >JDK6 w/ 1GB heap: avg minor GC pause = 3.9ms >JDK6 w/ 10GB heap: avg minor GC pause = 3.9ms >JDK7 w/ 1GB heap: avg minor GC pause = 5ms >JDK7 w/ 10GB heap: avg minor GC pause = 13.3ms Very interesting: Only your comparison with Java 6 made clear to me that this is a bug in Java7! You can probably work around that problem and make Java 7 perform better by explicitly setting the NewSize to a much higher value than the default of around 160 MB which I see in all the GC logs. I have observed before that the CMS collector does not perform well for small (and default) NewSize. Find the details here: http://blog.mgm-tp.com/2013/12/benchmarking-g1-and-other-java-7-garbage-collectors/ in the section about the GarbageOnly benchmark and, in particular, figures 2+3. I have further observed that the increase in accumulated pause time (shown in figure 3) for smaller values of NewSize comes about because ParNew pauses get LONGER when NewSize gets SMALLER (which is odd enough). 
I have a figure showing that but removed it from my blog post because it is (too) long already. I will send it to you in an extra mail, though. It did, however, not occur to me to check that rather odd behavior with older Java versions (as you did with Java 6). Thanks for asking your question! You helped me to understand that there is another bug which I unknowingly documented on my blog post. The first one is already highlighted in figure 11 of the same article and was also discussed on this mailing list some weeks ago. Mit freundlichen Gr??en/Best regards Andreas M?ller mgm technology partners GmbH Frankfurter Ring 105a 80807 M?nchen Tel. +49 (89) 35 86 80-633 Fax +49 (89) 35 86 80-288 E-Mail Andreas.Mueller at mgm-tp.com Innovation Implemented. Sitz der Gesellschaft: M?nchen Gesch?ftsf?hrer: Hamarz Mehmanesh Handelsregister: AG M?nchen HRB 105068 -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140108/a40d3fd5/attachment-0001.html -------------- next part -------------- A non-text attachment was scrubbed... Name: NewGenPauseDuration.png Type: image/png Size: 61103 bytes Desc: NewGenPauseDuration.png Url : http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140108/a40d3fd5/NewGenPauseDuration-0001.png From gustav.r.akesson at gmail.com Mon Jan 13 02:50:00 2014 From: gustav.r.akesson at gmail.com (=?ISO-8859-1?Q?Gustav_=C5kesson?=) Date: Mon, 13 Jan 2014 11:50:00 +0100 Subject: Long remark due to young generation occupancy Message-ID: Hi, This is a topic which has been discussed before, but I think I have some new findings. We're experiencing problems with CMS pauses. Settings we are using. -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=68 -XX:MaxTenuringThreshold=0 -XX:+UseParNewGC -XX:+ScavengeBeforeFullGC -XX:CMSWaitDuration=30000 -Xmx2048M -Xms2048M -Xmn1024M Note that MaxTenuringThreshold is 0. This is only done during test to provoke the CMS to run more frequently (otherwise it runs once every day...). Due to this, promotion to old generation is around 400K to 1M per second. We have an allocation rate of roughly 1G per second, meaning that YGC runs once every second. We're running JDK7u17. This is a log entry when running with above settings. This entry is the typical example to all of the CMS collections in this test. 
*2014-01-13T09:31:52.504+0100: 661.675: [GC [1 CMS-initial-mark: 524986K(1048576K)] 526507K(2096192K), 0.0023550 secs] [Times: user=0.00 sys=0.00, real=0.01 secs]*2014-01-13T09:31:52.506+0100: 661.677: [CMS-concurrent-mark-start] 2014-01-13T09:31:52.644+0100: 661.815: [CMS-concurrent-mark: 0.138/0.138 secs] [Times: user=1.96 sys=0.11, real=0.13 secs] 2014-01-13T09:31:52.644+0100: 661.815: [CMS-concurrent-preclean-start] 2014-01-13T09:31:52.655+0100: 661.826: [CMS-concurrent-preclean: 0.010/0.011 secs] [Times: user=0.14 sys=0.02, real=0.02 secs] 2014-01-13T09:31:52.655+0100: 661.826: [CMS-concurrent-abortable-preclean-start] 2014-01-13T09:31:53.584+0100: 662.755: [GC 662.755: [ParNew Desired survivor size 491520 bytes, new threshold 0 (max 0) : 1046656K->0K(1047616K), 0.0039870 secs] 1571642K->525579K(2096192K), 0.0043310 secs] [Times: user=0.04 sys=0.00, real=0.01 secs] 2014-01-13T09:31:54.146+0100: 663.317: [CMS-concurrent-abortable-preclean: 0.831/1.491 secs] [Times: user=16.76 sys=1.54, real=1.49 secs] *2014-01-13T09:31:54.148+0100: 663.319: [GC[YG occupancy: 552670 K (1047616 K)]663.319: [Rescan (parallel) , 0.2000060 secs]663.519: [weak refs processing, 0.0008740 secs]663.520: [scrub string table, 0.0006940 secs] [1 CMS-remark: 525579K(1048576K)] 1078249K(2096192K), 0.2017690 secs] [Times: user=3.53 sys=0.01, real=0.20 secs]*2014-01-13T09:31:54.350+0100: 663.521: [CMS-concurrent-sweep-start] 2014-01-13T09:31:54.846+0100: 664.017: [GC 664.017: [ParNew Desired survivor size 491520 bytes, new threshold 0 (max 0) : 1046656K->0K(1047616K), 0.0033500 secs] 1330075K->284041K(2096192K), 0.0034660 secs] [Times: user=0.04 sys=0.00, real=0.00 secs] 2014-01-13T09:31:55.020+0100: 664.191: [CMS-concurrent-sweep: 0.665/0.670 secs] [Times: user=7.77 sys=0.71, real=0.67 secs] 2014-01-13T09:31:55.020+0100: 664.191: [CMS-concurrent-reset-start] 2014-01-13T09:31:55.023+0100: 664.194: [CMS-concurrent-reset: 0.003/0.003 secs] [Times: user=0.03 sys=0.00, real=0.00 secs] The initial pause is fine. Then I investigated how to reduce the remark phase, and activated -XX:+CMSScavengeBeforeRemark. That flag partly solves this issue (not active in the log above), but I've seen cases when it does not scavenge (I suspect JNI critical section), which is bad and generates yet again long remark pause. And yet again the pause is correlated towards the occupancy in young. So instead, I tried setting... -XX:CMSScheduleRemarkEdenPenetration=0 -XX:CMSScheduleRemarkEdenSizeThreshold=0 This is a log entry with the settings at the top plus the two above... 
*2014-01-13T10:18:25.757+0100: 590.198: [GC [1 CMS-initial-mark: 524654K(1048576K)] 526646K(2096192K), 0.0029130 secs] [Times: user=0.00 sys=0.00, real=0.01 secs]*2014-01-13T10:18:25.760+0100: 590.201: [CMS-concurrent-mark-start] 2014-01-13T10:18:25.904+0100: 590.345: [CMS-concurrent-mark: 0.144/0.144 secs] [Times: user=1.98 sys=0.15, real=0.14 secs] 2014-01-13T10:18:25.904+0100: 590.346: [CMS-concurrent-preclean-start] 2014-01-13T10:18:25.912+0100: 590.354: [CMS-concurrent-preclean: 0.008/0.008 secs] [Times: user=0.11 sys=0.00, real=0.01 secs] 2014-01-13T10:18:25.912+0100: 590.354: [CMS-concurrent-abortable-preclean-start] 2014-01-13T10:18:26.836+0100: 591.278: [GC 591.278: [ParNew Desired survivor size 491520 bytes, new threshold 0 (max 0) : 1046656K->0K(1047616K), 0.0048160 secs] 1571310K->525477K(2096192K), 0.0049240 secs] [Times: user=0.05 sys=0.00, real=0.01 secs] 2014-01-13T10:18:26.842+0100: 591.283: [CMS-concurrent-abortable-preclean: 0.608/0.929 secs] [Times: user=10.77 sys=0.97, real=0.93 secs] *2014-01-13T10:18:26.843+0100: 591.285: [GC[YG occupancy: 20938 K (1047616 K)]591.285: [Rescan (parallel) , 0.0024770 secs]591.287: [weak refs processing, 0.0007760 secs]591.288: [scrub string table, 0.0006440 secs] [1 CMS-remark: 525477K(1048576K)] 546415K(2096192K), 0.0040480 secs] [Times: user=0.03 sys=0.00, real=0.00 secs]*2014-01-13T10:18:26.848+0100: 591.289: [CMS-concurrent-sweep-start] 2014-01-13T10:18:27.573+0100: 592.015: [CMS-concurrent-sweep: 0.726/0.726 secs] [Times: user=8.50 sys=0.76, real=0.73 secs] 2014-01-13T10:18:27.573+0100: 592.015: [CMS-concurrent-reset-start] 2014-01-13T10:18:27.576+0100: 592.017: [CMS-concurrent-reset: 0.003/0.003 secs] [Times: user=0.03 sys=0.01, real=0.00 secs] This means that when I set these two, CMS STWs go from ~200ms to below 10ms. I'm leaning towards activating... -XX:CMSScheduleRemarkEdenPenetration=0 -XX:CMSScheduleRemarkEdenSizeThreshold=0 -XX:CMSMaxAbortablePrecleanTime=30000 What I have seen with these flags is that as soon as a young is completely collected during abortable preclean, the remark is scheduled and since it can start when eden is nearly empty, it is ridicously fast. In case it takes a long time for preclean to catch a young collection, it is also fine because no promotion is being made. We can live with the pause of young plus a consecutive remark (for us, young is ~10ms). So, to the question - is there any obvious drawbacks with the three settings above? Why does eden have to be 50% (default) in order for a remark to be scheduled (besides spreading the pause)? It does only seem to do harm. Any reason? -XX:+CMSScavengeBeforeRemark I'm thinking to avoid since it can't be completely trusted. Usually it helps, but that is not good enough since the pauses get irregular in case it fails. And with these settings above, it will only add to the CMS pause. Best Regards, Gustav ?kesson -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140113/18b50992/attachment.html From stoth at miami-holdings.com Tue Jan 14 12:10:40 2014 From: stoth at miami-holdings.com (Steven Toth) Date: Tue, 14 Jan 2014 20:10:40 +0000 Subject: Seeking assistance with long garbage collection pauses with G1GC In-Reply-To: References: <0FF288A8F3C0E44A83B55C4E260BF0032AFC0B4F@DNY2I1EXC03.miamiholdings.corp> <0FF288A8F3C0E44A83B55C4E260BF0032AFD04E3@DNY2I1EXC03.miamiholdings.corp> Message-ID: <0FF288A8F3C0E44A83B55C4E260BF0032AFFEEFD@DNY2I1EXC03.miamiholdings.corp> Charlie, thank you very much. After disabling THP on our servers and running for the past few weeks in production we've received no long pauses for GC's. That was a lifesaver. -Steve -----Original Message----- From: charlie hunt [mailto:charlesjhunt at gmail.com] Sent: Thursday, December 12, 2013 11:51 AM To: Steven Toth Cc: hotspot-gc-use at openjdk.java.net; Randy Foster Subject: Re: Seeking assistance with long garbage collection pauses with G1GC Fyi, G1 was not officially supported on until JDK 1.7.0_04, aka 7u4. Not only are there many improvements in 7u4 vs 7u3, but many improvements since 7u4. I'd recommend you work with 7u40 or 7u45. All the above said, copy times look incredibly high for a 3 gb Java heap. Depending on your version of RHEL, if transparent huge pages are an available feature on your version RHEL, disable it. You might be seeing huge page coalescing which is contributing to your high sys time. Alternatively you may be paging / swapping, or possibly having high thread context switching. You might also need to throttle back the number GC threads. hths, charlie ... On Dec 10, 2013, at 6:16 PM, Steven Toth wrote: > Hello, > > We've been struggling with long pauses with the G1GC garbage collector for several weeks now and was hoping to get some assistance. > > We have a Java app running in a standalone JVM on RHEL. The app listens for data on one or more sockets, queues the data, and has scheduled threads pulling the data off the queue and persisting it. The data is wide, over 700 data elements per record, though all of the data elements are small Strings, Integers, or Longs. > > The app runs smoothly for periods of time, sometimes 30 minutes to an hour, but then we experience one or more long garbage collection pauses. The logs indicate the majority of the pause time is spent in the Object Copy time. The long pauses also have high sys time relative to the other shorter collections. > > Here are the JVM details: > > java version "1.7.0_03" > Java(TM) SE Runtime Environment (build 1.7.0_03-b04) Java HotSpot(TM) 64-Bit Server VM (build 22.1-b02, mixed mode) > > Here are the JVM options: > > -XX:MaxPermSize=256m -XX:PermSize=256m -Xms3G -Xmx3G -XX:+UseG1GC -XX:G1HeapRegionSize=32M -XX:-UseGCOverheadLimit \ -Xloggc:logs/gc-STAT5-collector.log -XX:+PrintGCDetails -XX:+PrintGCDateStamps \ -XX:+PrintGCTimeStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=8 -XX:GCLogFileSize=10M -XX:+PrintGCApplicationStoppedTime \ -XX:MaxNewSize=1G -XX:NewSize=1G \ -XX:+PrintGCApplicationStoppedTime -XX:+PrintTenuringDistribution -XX:+PrintAdaptiveSizePolicy > > After several iterations of experimenting with an assortment of options (including no options other than -Xms and -Xmx) the aforementioned options have given us the best performance with the fewest amount of long pauses. However we're still experiencing several dozen garbage collections a day that range from 1-5 seconds. 
> > The process is taskset to 4 cores (all on the same socket), but is barely using 2 of them. All of the processes on this box are pinned to their own cores (with 0 and 1 unused). The machine has plenty of free memory (20+G) and top shows the process using 2.5G of RES memory. > > A day's worth of garbage collection logs are attached, but here is an example of the GC log output with high Object Copy and sys time. There are numerous GC events comparable to the example below with near identical Eden/Survivors/Heap sizes that take well under 100 millis whereas this example took over 2 seconds. > > [Object Copy (ms): 2090.4 2224.0 2484.0 2160.1 1603.9 2071.2 887.8 1608.1 1992.0 2030.5 1692.5 1583.9 2140.3 1703.0 2174.0 1949.5 1941.1 2190.1 2153.3 1604.1 1930.8 1892.6 1651.9 > > [Eden: 1017M(1017M)->0B(1016M) Survivors: 7168K->8192K Heap: 1062M(3072M)->47M(3072M)] > > [Times: user=2.24 sys=7.22, real=2.49 secs] > > Any help would be greatly appreciated. > > Thanks. > > -Steve > > > ****Confidentiality Note**** This e-mail may contain confidential and or privileged information and is solely for the use of the sender's intended recipient(s). Any review, dissemination, copying, printing or other use of this e-mail by any other persons or entities is prohibited. If you have received this e-mail in error, please contact the sender immediately by reply email and delete the material from any computer. > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use ***Confidentiality Note*** This e-mail may contain confidential and or privileged information and is solely for the use of the sender's intended recipient(s). Any review, dissemination, copying, printing or other use of this e-mail by any other persons or entities is prohibited. If you have received this e-mail in error, please contact the sender immediately by reply email and delete the material from any computer. From bernd-2014 at eckenfels.net Tue Jan 14 13:34:01 2014 From: bernd-2014 at eckenfels.net (Bernd Eckenfels) Date: Tue, 14 Jan 2014 22:34:01 +0100 Subject: Seeking assistance with long garbage collection pauses with G1GC In-Reply-To: <0FF288A8F3C0E44A83B55C4E260BF0032AFFEEFD@DNY2I1EXC03.miamiholdings.corp> References: <0FF288A8F3C0E44A83B55C4E260BF0032AFC0B4F@DNY2I1EXC03.miamiholdings.corp> <0FF288A8F3C0E44A83B55C4E260BF0032AFD04E3@DNY2I1EXC03.miamiholdings.corp> <0FF288A8F3C0E44A83B55C4E260BF0032AFFEEFD@DNY2I1EXC03.miamiholdings.corp> Message-ID: Hello, I wonder if there is anything the VM can do to avoid this? Maybe with some memadvice, changed allocation pattern or similiar? Is this only a problem when the VM does not use LargePages itself, or is it affected in that case as well? Gruss Bernd Am 14.01.2014, 21:10 Uhr, schrieb Steven Toth : > Charlie, thank you very much. After disabling THP on our servers and > running for the past few weeks in production we've received no long > pauses for GC's. > > That was a lifesaver. > > -Steve > > > -----Original Message----- > From: charlie hunt [mailto:charlesjhunt at gmail.com] > Sent: Thursday, December 12, 2013 11:51 AM > To: Steven Toth > Cc: hotspot-gc-use at openjdk.java.net; Randy Foster > Subject: Re: Seeking assistance with long garbage collection pauses with > G1GC > > Fyi, G1 was not officially supported on until JDK 1.7.0_04, aka 7u4. > > Not only are there many improvements in 7u4 vs 7u3, but many > improvements since 7u4. 
I'd recommend you work with 7u40 or 7u45. > > All the above said, copy times look incredibly high for a 3 gb Java > heap. Depending on your version of RHEL, if transparent huge pages are > an available feature on your version RHEL, disable it. You might be > seeing huge page coalescing which is contributing to your high sys time. > Alternatively you may be paging / swapping, or possibly having high > thread context switching. > > You might also need to throttle back the number GC threads. > > hths, > > charlie ... > > On Dec 10, 2013, at 6:16 PM, Steven Toth > wrote: > >> Hello, >> >> We've been struggling with long pauses with the G1GC garbage collector >> for several weeks now and was hoping to get some assistance. >> >> We have a Java app running in a standalone JVM on RHEL. The app >> listens for data on one or more sockets, queues the data, and has >> scheduled threads pulling the data off the queue and persisting it. The >> data is wide, over 700 data elements per record, though all of the data >> elements are small Strings, Integers, or Longs. >> >> The app runs smoothly for periods of time, sometimes 30 minutes to an >> hour, but then we experience one or more long garbage collection >> pauses. The logs indicate the majority of the pause time is spent in >> the Object Copy time. The long pauses also have high sys time relative >> to the other shorter collections. >> >> Here are the JVM details: >> >> java version "1.7.0_03" >> Java(TM) SE Runtime Environment (build 1.7.0_03-b04) Java HotSpot(TM) >> 64-Bit Server VM (build 22.1-b02, mixed mode) >> >> Here are the JVM options: >> >> -XX:MaxPermSize=256m -XX:PermSize=256m -Xms3G -Xmx3G -XX:+UseG1GC >> -XX:G1HeapRegionSize=32M -XX:-UseGCOverheadLimit \ >> -Xloggc:logs/gc-STAT5-collector.log -XX:+PrintGCDetails >> -XX:+PrintGCDateStamps \ -XX:+PrintGCTimeStamps >> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=8 >> -XX:GCLogFileSize=10M -XX:+PrintGCApplicationStoppedTime \ >> -XX:MaxNewSize=1G -XX:NewSize=1G \ -XX:+PrintGCApplicationStoppedTime >> -XX:+PrintTenuringDistribution -XX:+PrintAdaptiveSizePolicy >> >> After several iterations of experimenting with an assortment of options >> (including no options other than -Xms and -Xmx) the aforementioned >> options have given us the best performance with the fewest amount of >> long pauses. However we're still experiencing several dozen garbage >> collections a day that range from 1-5 seconds. >> >> The process is taskset to 4 cores (all on the same socket), but is >> barely using 2 of them. All of the processes on this box are pinned to >> their own cores (with 0 and 1 unused). The machine has plenty of free >> memory (20+G) and top shows the process using 2.5G of RES memory. >> >> A day's worth of garbage collection logs are attached, but here is an >> example of the GC log output with high Object Copy and sys time. There >> are numerous GC events comparable to the example below with near >> identical Eden/Survivors/Heap sizes that take well under 100 millis >> whereas this example took over 2 seconds. >> >> [Object Copy (ms): 2090.4 2224.0 2484.0 2160.1 1603.9 2071.2 887.8 >> 1608.1 1992.0 2030.5 1692.5 1583.9 2140.3 1703.0 2174.0 1949.5 1941.1 >> 2190.1 2153.3 1604.1 1930.8 1892.6 1651.9 >> >> [Eden: 1017M(1017M)->0B(1016M) Survivors: 7168K->8192K Heap: >> 1062M(3072M)->47M(3072M)] >> >> [Times: user=2.24 sys=7.22, real=2.49 secs] >> >> Any help would be greatly appreciated. >> >> Thanks. 
>> >> -Steve >> >> >> ****Confidentiality Note**** This e-mail may contain confidential and >> or privileged information and is solely for the use of the sender's >> intended recipient(s). Any review, dissemination, copying, printing or >> other use of this e-mail by any other persons or entities is >> prohibited. If you have received this e-mail in error, please contact >> the sender immediately by reply email and delete the material from any >> computer. >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > ***Confidentiality Note*** This e-mail may contain confidential and or > privileged information and is solely for the use of the sender's > intended recipient(s). Any review, dissemination, copying, printing or > other use of this e-mail by any other persons or entities is prohibited. > If you have received this e-mail in error, please contact the sender > immediately by reply email and delete the material from any computer. > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use -- http://bernd.eckenfels.net From carlo.fernando at baml.com Tue Jan 14 13:47:33 2014 From: carlo.fernando at baml.com (Fernando, Carlo) Date: Tue, 14 Jan 2014 21:47:33 +0000 Subject: Trying to understand what happens during GenCollectForALlocation Message-ID: <204609DC9565564AA71E9B4312EA323240A38D8F@smtp_mail.bankofamerica.com> Hello. I'm trying to reduce our server latency and I'm trying to understand the meaning of this SafePointStatistic output. A snippet of the GC log: 2014-01-10T08:54:12.767+0000: 110949.481: [GC 110949.481: [ParNew: 65673K->83K(76480K), 0.0013940 secs] 88634K->23044K(251264K), 0.0014490 secs] [Times: user=0.00 sys=0.00, real=0.01 secs] Total time for which application threads were stopped: 0.0048290 seconds A snippet of the Safepoint log which what I think correlates to the GC log: 110949.484: GenCollectForAllocation [ 55 0 0 ] [ 0 0 0 0 4 ] 0 Is it correct to say that the whole duration of the GC is 1ms but because of the safepoint, total STW was 4ms? Also, what could the possible cause be of the 4ms pause? In addition, I also noticed 1 output where real time was larger than user+sys. Would that indicate some type of cpu starvation? 2014-01-10T13:11:04.727+0000: 126361.441: [GC 126361.473: [ParNew: 65676K->72K(76480K), 0.0014070 secs] 89630K->24025K(251264K), 0.0014720 secs] [Times: user=0.01 sys=0.00, real=0.03 secs] Total time for which application threads were stopped: 0.0335640 seconds 126361.438: GenCollectForAllocation [ 55 0 0 ] [ 0 0 0 0 33 ] 0 Please let me know if you need any other info. Thanks -carlo ---------------------------------------------------------------------- This message, and any attachments, is for the intended recipient(s) only, may contain information that is privileged, confidential and/or proprietary and subject to important terms and conditions available at http://www.bankofamerica.com/emaildisclaimer. If you are not the intended recipient, please delete this message. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140114/4ad90b4e/attachment.html From jon.masamitsu at oracle.com Wed Jan 15 10:24:34 2014 From: jon.masamitsu at oracle.com (Jon Masamitsu) Date: Wed, 15 Jan 2014 10:24:34 -0800 Subject: Long remark due to young generation occupancy In-Reply-To: References: Message-ID: <52D6D262.6010106@oracle.com> > -XX:CMSScheduleRemarkEdenPenetration=0 Schedule the remark pause immediately after the next young collection. > -XX:CMSScheduleRemarkEdenSizeThreshold=0 Any sized eden should allow scheduling of the remark pause. That is, no eden is too small to schedule. > -XX:CMSMaxAbortablePrecleanTime=30000 Wait up to 30 seconds for the remark to be scheduled after a young collection. Otherwise, wait only up to the default of 5 seconds. > So, to the question - is there any obvious drawbacks with the three > settings above? Why does eden have to be 50% (default) in order for a > remark to be scheduled (besides spreading the pause)? It does only > seem to do harm. Any reason? The default is 50% to try and place the remark pause between two young pauses (spread it out as you say). I don't believe it is always the case that the remark pause is very small if it is scheduled immediately after a young collection. In such cases we still want to spread out the pauses. If the remark is delayed to wait for the next young collection, the sweeping is also delayed. You're not using up space in the CMS (tenured) generation but you're also not collecting garbage and not making additional space available for reuse (which the concurrent sweep does). Jon On 01/13/2014 02:50 AM, Gustav ?kesson wrote: > Hi, > This is a topic which has been discussed before, but I think I have > some new findings. We're experiencing problems with CMS pauses. > Settings we are using. > -XX:+UseConcMarkSweepGC > -XX:CMSInitiatingOccupancyFraction=68 > -XX:MaxTenuringThreshold=0 > -XX:+UseParNewGC > -XX:+ScavengeBeforeFullGC > -XX:CMSWaitDuration=30000 > -Xmx2048M > -Xms2048M > -Xmn1024M > Note that MaxTenuringThreshold is 0. This is only done during test to > provoke the CMS to run more frequently (otherwise it runs once every > day...). Due to this, promotion to old generation is around 400K to 1M > per second. > We have an allocation rate of roughly 1G per second, meaning that YGC > runs once every second. > We're running JDK7u17. > This is a log entry when running with above settings. This entry is > the typical example to all of the CMS collections in this test. 
> *2014-01-13T09:31:52.504+0100: 661.675: [GC [1 CMS-initial-mark: > 524986K(1048576K)] 526507K(2096192K), 0.0023550 secs] [Times: > user=0.00 sys=0.00, real=0.01 secs] > *2014-01-13T09:31:52.506+0100: 661.677: [CMS-concurrent-mark-start] > 2014-01-13T09:31:52.644+0100: 661.815: [CMS-concurrent-mark: > 0.138/0.138 secs] [Times: user=1.96 sys=0.11, real=0.13 secs] > 2014-01-13T09:31:52.644+0100: 661.815: [CMS-concurrent-preclean-start] > 2014-01-13T09:31:52.655+0100: 661.826: [CMS-concurrent-preclean: > 0.010/0.011 secs] [Times: user=0.14 sys=0.02, real=0.02 secs] > 2014-01-13T09:31:52.655+0100: 661.826: > [CMS-concurrent-abortable-preclean-start] > 2014-01-13T09:31:53.584+0100: 662.755: [GC 662.755: [ParNew > Desired survivor size 491520 bytes, new threshold 0 (max 0) > : 1046656K->0K(1047616K), 0.0039870 secs] 1571642K->525579K(2096192K), > 0.0043310 secs] [Times: user=0.04 sys=0.00, real=0.01 secs] > 2014-01-13T09:31:54.146+0100: 663.317: > [CMS-concurrent-abortable-preclean: 0.831/1.491 secs] [Times: > user=16.76 sys=1.54, real=1.49 secs] > *2014-01-13T09:31:54.148+0100: 663.319: [GC[YG occupancy: 552670 K > (1047616 K)]663.319: [Rescan (parallel) , 0.2000060 secs]663.519: > [weak refs processing, 0.0008740 secs]663.520: [scrub string table, > 0.0006940 secs] [1 CMS-remark: 525579K(1048576K)] 1078249K(2096192K), > 0.2017690 secs] [Times: user=3.53 sys=0.01, real=0.20 secs] > *2014-01-13T09:31:54.350+0100: 663.521: [CMS-concurrent-sweep-start] > 2014-01-13T09:31:54.846+0100: 664.017: [GC 664.017: [ParNew > Desired survivor size 491520 bytes, new threshold 0 (max 0) > : 1046656K->0K(1047616K), 0.0033500 secs] 1330075K->284041K(2096192K), > 0.0034660 secs] [Times: user=0.04 sys=0.00, real=0.00 secs] > 2014-01-13T09:31:55.020+0100: 664.191: [CMS-concurrent-sweep: > 0.665/0.670 secs] [Times: user=7.77 sys=0.71, real=0.67 secs] > 2014-01-13T09:31:55.020+0100: 664.191: [CMS-concurrent-reset-start] > 2014-01-13T09:31:55.023+0100: 664.194: [CMS-concurrent-reset: > 0.003/0.003 secs] [Times: user=0.03 sys=0.00, real=0.00 secs] > The initial pause is fine. Then I investigated how to reduce the > remark phase, and activated -XX:+CMSScavengeBeforeRemark. That flag > partly solves this issue (not active in the log above), but I've seen > cases when it does not scavenge (I suspect JNI critical section), > which is bad and generates yet again long remark pause. And yet again > the pause is correlated towards the occupancy in young. > So instead, I tried setting... > -XX:CMSScheduleRemarkEdenPenetration=0 > -XX:CMSScheduleRemarkEdenSizeThreshold=0 > This is a log entry with the settings at the top plus the two above... 
> *2014-01-13T10:18:25.757+0100: 590.198: [GC [1 CMS-initial-mark: > 524654K(1048576K)] 526646K(2096192K), 0.0029130 secs] [Times: > user=0.00 sys=0.00, real=0.01 secs] > *2014-01-13T10:18:25.760+0100: 590.201: [CMS-concurrent-mark-start] > 2014-01-13T10:18:25.904+0100: 590.345: [CMS-concurrent-mark: > 0.144/0.144 secs] [Times: user=1.98 sys=0.15, real=0.14 secs] > 2014-01-13T10:18:25.904+0100: 590.346: [CMS-concurrent-preclean-start] > 2014-01-13T10:18:25.912+0100: 590.354: [CMS-concurrent-preclean: > 0.008/0.008 secs] [Times: user=0.11 sys=0.00, real=0.01 secs] > 2014-01-13T10:18:25.912+0100: 590.354: > [CMS-concurrent-abortable-preclean-start] > 2014-01-13T10:18:26.836+0100: 591.278: [GC 591.278: [ParNew > Desired survivor size 491520 bytes, new threshold 0 (max 0) > : 1046656K->0K(1047616K), 0.0048160 secs] 1571310K->525477K(2096192K), > 0.0049240 secs] [Times: user=0.05 sys=0.00, real=0.01 secs] > 2014-01-13T10:18:26.842+0100: 591.283: > [CMS-concurrent-abortable-preclean: 0.608/0.929 secs] [Times: > user=10.77 sys=0.97, real=0.93 secs] > *2014-01-13T10:18:26.843+0100: 591.285: [GC[YG occupancy: 20938 K > (1047616 K)]591.285: [Rescan (parallel) , 0.0024770 secs]591.287: > [weak refs processing, 0.0007760 secs]591.288: [scrub string table, > 0.0006440 secs] [1 CMS-remark: 525477K(1048576K)] 546415K(2096192K), > 0.0040480 secs] [Times: user=0.03 sys=0.00, real=0.00 secs] > *2014-01-13T10:18:26.848+0100: 591.289: [CMS-concurrent-sweep-start] > 2014-01-13T10:18:27.573+0100: 592.015: [CMS-concurrent-sweep: > 0.726/0.726 secs] [Times: user=8.50 sys=0.76, real=0.73 secs] > 2014-01-13T10:18:27.573+0100: 592.015: [CMS-concurrent-reset-start] > 2014-01-13T10:18:27.576+0100: 592.017: [CMS-concurrent-reset: > 0.003/0.003 secs] [Times: user=0.03 sys=0.01, real=0.00 secs] > This means that when I set these two, CMS STWs go from ~200ms to below > 10ms. > I'm leaning towards activating... > -XX:CMSScheduleRemarkEdenPenetration=0 > -XX:CMSScheduleRemarkEdenSizeThreshold=0 > -XX:CMSMaxAbortablePrecleanTime=30000 > What I have seen with these flags is that as soon as a young is > completely collected during abortable preclean, the remark is > scheduled and since it can start when eden is nearly empty, it is > ridicously fast. In case it takes a long time for preclean to catch a > young collection, it is also fine because no promotion is being made. > We can live with the pause of young plus a consecutive remark (for us, > young is ~10ms). > So, to the question - is there any obvious drawbacks with the three > settings above? Why does eden have to be 50% (default) in order for a > remark to be scheduled (besides spreading the pause)? It does only > seem to do harm. Any reason? > -XX:+CMSScavengeBeforeRemark I'm thinking to avoid since it can't be > completely trusted. Usually it helps, but that is not good enough > since the pauses get irregular in case it fails. And with these > settings above, it will only add to the CMS pause. > Best Regards, > Gustav ?kesson > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use -------------- next part -------------- An HTML attachment was scrubbed... 
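To make the scheduling rule described above concrete, here is a minimal sketch in plain Java (not HotSpot source; the class and method names are invented for illustration) of the condition under which the remark can be scheduled once abortable preclean is running, assuming the semantics of CMSScheduleRemarkEdenPenetration and CMSScheduleRemarkEdenSizeThreshold given earlier in the thread. With the assumed defaults the remark waits for eden to refill to roughly half, which is why its pause tracks young-generation occupancy; with both flags at 0, as in the second log, even an empty eden qualifies.

public class RemarkScheduleModel {

    // Returns true when, per the description above, abortable preclean would stop
    // and the remark pause could be scheduled. Illustrative model only.
    static boolean remarkSchedulable(long edenUsedBytes, long edenCapacityBytes,
                                     long edenSizeThresholdBytes,  // CMSScheduleRemarkEdenSizeThreshold
                                     int edenPenetrationPercent) { // CMSScheduleRemarkEdenPenetration
        if (edenUsedBytes < edenSizeThresholdBytes) {
            return true; // "no eden is too small to schedule"
        }
        double penetration = 100.0 * edenUsedBytes / edenCapacityBytes;
        return penetration >= edenPenetrationPercent;
    }

    public static void main(String[] args) {
        long edenCapacity = 1024L << 20; // ~1 GB eden, similar to the logs in this thread

        // Assumed defaults (penetration 50%, size threshold 2 MB): the remark waits
        // until eden has refilled to about half capacity.
        System.out.println(remarkSchedulable(20L << 20, edenCapacity, 2L << 20, 50));  // false
        System.out.println(remarkSchedulable(550L << 20, edenCapacity, 2L << 20, 50)); // true

        // Both flags set to 0: any eden occupancy, even zero, allows the remark.
        System.out.println(remarkSchedulable(0L, edenCapacity, 0L, 0));                // true
    }
}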
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140115/59f03f13/attachment.html From gustav.r.akesson at gmail.com Wed Jan 15 11:41:14 2014 From: gustav.r.akesson at gmail.com (=?ISO-8859-1?Q?Gustav_=C5kesson?=) Date: Wed, 15 Jan 2014 20:41:14 +0100 Subject: Long remark due to young generation occupancy In-Reply-To: <52D6D262.6010106@oracle.com> References: <52D6D262.6010106@oracle.com> Message-ID: Hi Jon, Thanks for looking into this. A clarification for "Wait up to 30 seconds for the remark to be scheduled after a young collection": Is this really the case? Is this timeout used after a young collection? I was under the impression that CMSMaxAbortablePrecleanTime precleans and waits for a young collection, and if one occurs (or timeout) the remark is scheduled. Here we wait for 30 seconds to a young to happen. Right..? The work for remark is to revisit updated objects and trace from roots again (missing something? ah, and reference processing, but that is practically no overhead for us). What is usually the biggest cost of the remark? To scan the dirty cards or to trace from roots? Perhaps this depends on the application - you're talking about "not always the case". What do you refer to? If we have en empty young generation, what could bring the remark phase to e.g. 200ms on a high-end server like ours? For my application, it seems that it tracing from roots that is the most expensive. In such scenario, spreading the pause seems as beneficial as not running a young collection prior to initial mark (which is highly dependent on occupancy in young). Especially since young collection is so fast, at least for us. Regarding the last section that we wait a long time for sweeping - does this really matter? Yes, we have a lot of floating garbage in case young collections are infrequent and we keep on precleaning, but that also means no promotions. The garbage is just sitting there on the heap taking space, but no one is claiming that space until a young collection. And by then the sweeping proceeds. Or am I missing something? Best Regards, Gustav ?kesson On Wed, Jan 15, 2014 at 7:24 PM, Jon Masamitsu wrote: > > -XX:CMSScheduleRemarkEdenPenetration=0 > > Schedule the remark pause immediately after the > next young collection. > > -XX:CMSScheduleRemarkEdenSizeThreshold=0 > > Any sized eden should allow scheduling of the remark > pause. That is, no eden is too small to schedule. > > -XX:CMSMaxAbortablePrecleanTime=30000 > > > Wait up to 30 seconds for the remark to be scheduled > after a young collection. Otherwise, wait only up to > the default of 5 seconds. > > > So, to the question - is there any obvious drawbacks with the three > settings above? Why does eden have to be 50% (default) in order for a > remark to be scheduled (besides spreading the pause)? It does only seem to > do harm. Any reason? > > > The default is 50% to try and place the remark pause between two young > pauses > (spread it out as you say). I don't believe it is always the case that > the remark > pause is very small if it is scheduled immediately after a young > collection. In > such cases we still want to spread out the pauses. > > If the remark is delayed to wait for the next young collection, > the sweeping is also delayed. You're not using up space in the > CMS (tenured) generation but you're also not collecting garbage > and not making additional space available for reuse (which the > concurrent sweep does). 
> > Jon > > > On 01/13/2014 02:50 AM, Gustav ?kesson wrote: > > Hi, > > This is a topic which has been discussed before, but I think I have some > new findings. We're experiencing problems with CMS pauses. > > Settings we are using. > > -XX:+UseConcMarkSweepGC > -XX:CMSInitiatingOccupancyFraction=68 > -XX:MaxTenuringThreshold=0 > -XX:+UseParNewGC > -XX:+ScavengeBeforeFullGC > -XX:CMSWaitDuration=30000 > -Xmx2048M > -Xms2048M > -Xmn1024M > > Note that MaxTenuringThreshold is 0. This is only done during test to > provoke the CMS to run more frequently (otherwise it runs once every > day...). Due to this, promotion to old generation is around 400K to 1M per > second. > > We have an allocation rate of roughly 1G per second, meaning that YGC runs > once every second. > > We're running JDK7u17. > > > This is a log entry when running with above settings. This entry is the > typical example to all of the CMS collections in this test. > > > *2014-01-13T09:31:52.504+0100: 661.675: [GC [1 CMS-initial-mark: > 524986K(1048576K)] 526507K(2096192K), 0.0023550 secs] [Times: user=0.00 > sys=0.00, real=0.01 secs] *2014-01-13T09:31:52.506+0100: 661.677: > [CMS-concurrent-mark-start] > 2014-01-13T09:31:52.644+0100: 661.815: [CMS-concurrent-mark: 0.138/0.138 > secs] [Times: user=1.96 sys=0.11, real=0.13 secs] > 2014-01-13T09:31:52.644+0100: 661.815: [CMS-concurrent-preclean-start] > 2014-01-13T09:31:52.655+0100: 661.826: [CMS-concurrent-preclean: > 0.010/0.011 secs] [Times: user=0.14 sys=0.02, real=0.02 secs] > 2014-01-13T09:31:52.655+0100: 661.826: > [CMS-concurrent-abortable-preclean-start] > 2014-01-13T09:31:53.584+0100: 662.755: [GC 662.755: [ParNew > Desired survivor size 491520 bytes, new threshold 0 (max 0) > : 1046656K->0K(1047616K), 0.0039870 secs] 1571642K->525579K(2096192K), > 0.0043310 secs] [Times: user=0.04 sys=0.00, real=0.01 secs] > 2014-01-13T09:31:54.146+0100: 663.317: [CMS-concurrent-abortable-preclean: > 0.831/1.491 secs] [Times: user=16.76 sys=1.54, real=1.49 secs] > > *2014-01-13T09:31:54.148+0100: 663.319: [GC[YG occupancy: 552670 K > (1047616 K)]663.319: [Rescan (parallel) , 0.2000060 secs]663.519: [weak > refs processing, 0.0008740 secs]663.520: [scrub string table, 0.0006940 > secs] [1 CMS-remark: 525579K(1048576K)] 1078249K(2096192K), 0.2017690 secs] > [Times: user=3.53 sys=0.01, real=0.20 secs] *2014-01-13T09:31:54.350+0100: > 663.521: [CMS-concurrent-sweep-start] > 2014-01-13T09:31:54.846+0100: 664.017: [GC 664.017: [ParNew > Desired survivor size 491520 bytes, new threshold 0 (max 0) > : 1046656K->0K(1047616K), 0.0033500 secs] 1330075K->284041K(2096192K), > 0.0034660 secs] [Times: user=0.04 sys=0.00, real=0.00 secs] > 2014-01-13T09:31:55.020+0100: 664.191: [CMS-concurrent-sweep: 0.665/0.670 > secs] [Times: user=7.77 sys=0.71, real=0.67 secs] > 2014-01-13T09:31:55.020+0100: 664.191: [CMS-concurrent-reset-start] > 2014-01-13T09:31:55.023+0100: 664.194: [CMS-concurrent-reset: 0.003/0.003 > secs] [Times: user=0.03 sys=0.00, real=0.00 secs] > > The initial pause is fine. Then I investigated how to reduce the remark > phase, and activated -XX:+CMSScavengeBeforeRemark. That flag partly solves > this issue (not active in the log above), but I've seen cases when it does > not scavenge (I suspect JNI critical section), which is bad and generates > yet again long remark pause. And yet again the pause is correlated towards > the occupancy in young. > > So instead, I tried setting... 
> > -XX:CMSScheduleRemarkEdenPenetration=0 > -XX:CMSScheduleRemarkEdenSizeThreshold=0 > > This is a log entry with the settings at the top plus the two above... > > > *2014-01-13T10:18:25.757+0100: 590.198: [GC [1 CMS-initial-mark: > 524654K(1048576K)] 526646K(2096192K), 0.0029130 secs] [Times: user=0.00 > sys=0.00, real=0.01 secs] *2014-01-13T10:18:25.760+0100: 590.201: > [CMS-concurrent-mark-start] > 2014-01-13T10:18:25.904+0100: 590.345: [CMS-concurrent-mark: 0.144/0.144 > secs] [Times: user=1.98 sys=0.15, real=0.14 secs] > 2014-01-13T10:18:25.904+0100: 590.346: [CMS-concurrent-preclean-start] > 2014-01-13T10:18:25.912+0100: 590.354: [CMS-concurrent-preclean: > 0.008/0.008 secs] [Times: user=0.11 sys=0.00, real=0.01 secs] > 2014-01-13T10:18:25.912+0100: 590.354: > [CMS-concurrent-abortable-preclean-start] > 2014-01-13T10:18:26.836+0100: 591.278: [GC 591.278: [ParNew > Desired survivor size 491520 bytes, new threshold 0 (max 0) > : 1046656K->0K(1047616K), 0.0048160 secs] 1571310K->525477K(2096192K), > 0.0049240 secs] [Times: user=0.05 sys=0.00, real=0.01 secs] > 2014-01-13T10:18:26.842+0100: 591.283: [CMS-concurrent-abortable-preclean: > 0.608/0.929 secs] [Times: user=10.77 sys=0.97, real=0.93 secs] > > *2014-01-13T10:18:26.843+0100: 591.285: [GC[YG occupancy: 20938 K (1047616 > K)]591.285: [Rescan (parallel) , 0.0024770 secs]591.287: [weak refs > processing, 0.0007760 secs]591.288: [scrub string table, 0.0006440 secs] [1 > CMS-remark: 525477K(1048576K)] 546415K(2096192K), 0.0040480 secs] [Times: > user=0.03 sys=0.00, real=0.00 secs] *2014-01-13T10:18:26.848+0100: > 591.289: [CMS-concurrent-sweep-start] > 2014-01-13T10:18:27.573+0100: 592.015: [CMS-concurrent-sweep: 0.726/0.726 > secs] [Times: user=8.50 sys=0.76, real=0.73 secs] > 2014-01-13T10:18:27.573+0100: 592.015: [CMS-concurrent-reset-start] > 2014-01-13T10:18:27.576+0100: 592.017: [CMS-concurrent-reset: 0.003/0.003 > secs] [Times: user=0.03 sys=0.01, real=0.00 secs] > > This means that when I set these two, CMS STWs go from ~200ms to below > 10ms. > > I'm leaning towards activating... > > -XX:CMSScheduleRemarkEdenPenetration=0 > -XX:CMSScheduleRemarkEdenSizeThreshold=0 > -XX:CMSMaxAbortablePrecleanTime=30000 > > > What I have seen with these flags is that as soon as a young is completely > collected during abortable preclean, the remark is scheduled and since it > can start when eden is nearly empty, it is ridicously fast. In case it > takes a long time for preclean to catch a young collection, it is also fine > because no promotion is being made. We can live with the pause of young > plus a consecutive remark (for us, young is ~10ms). > > So, to the question - is there any obvious drawbacks with the three > settings above? Why does eden have to be 50% (default) in order for a > remark to be scheduled (besides spreading the pause)? It does only seem to > do harm. Any reason? > > -XX:+CMSScavengeBeforeRemark I'm thinking to avoid since it can't be > completely trusted. Usually it helps, but that is not good enough since the > pauses get irregular in case it fails. And with these settings above, it > will only add to the CMS pause. 
> > > Best Regards, > > Gustav ?kesson > > > _______________________________________________ > hotspot-gc-use mailing listhotspot-gc-use at openjdk.java.nethttp://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140115/64e00278/attachment-0001.html From jon.masamitsu at oracle.com Wed Jan 15 17:22:00 2014 From: jon.masamitsu at oracle.com (Jon Masamitsu) Date: Wed, 15 Jan 2014 17:22:00 -0800 Subject: Long remark due to young generation occupancy In-Reply-To: References: <52D6D262.6010106@oracle.com> Message-ID: <52D73438.2070407@oracle.com> On 1/15/2014 11:41 AM, Gustav ?kesson wrote: > Hi Jon, > > Thanks for looking into this. > > A clarification for "Wait up to 30 seconds for the remark to be > scheduled after a young collection": > Is this really the case? Is this timeout used after a young > collection? I was under the impression that > CMSMaxAbortablePrecleanTime precleans and waits for a young > collection, and if one occurs (or timeout) the remark is scheduled. > Here we wait for 30 seconds to a young to happen. Right..? I think it is 1) concurrent marking runs 2) concurrent precleaning runs 3) concurrent abortable precleaning runs until the remark can be scheduled or the timeout is reached With CMSMaxAbortablePrecleanTime set to 30000, the abortable precleaning runs for 30 seconds unless aborted. Scheduling the remark means waiting for the next young collection to empty eden and then waiting for allocations to fill up eden to the CMSScheduleRemarkEdenPenetration percentage. When CMSScheduleRemarkEdenPenetration is reached abortable precleaning is aborted. 4) remark runs > > The work for remark is to revisit updated objects and trace from > roots again (missing something? ah, and reference processing, but that > is practically no overhead for us). What is usually the biggest cost > of the remark? To scan the dirty cards or to trace from roots? Perhaps > this depends on the application - you're talking about "not always the > case". What do you refer to? If we have en empty young generation, > what could bring the remark phase to e.g. 200ms on a high-end server > like ours? There could be thousands of thread stacks to scan. Some applications make heavy use of soft References. Class unloading happens at remark. The young gen is not necessarily empty after a young collection. Some applications make good use of the survivor spaces. > > For my application, it seems that it tracing from roots that is the > most expensive. In such scenario, spreading the pause seems as > beneficial as not running a young collection prior to initial mark > (which is highly dependent on occupancy in young). Especially since > young collection is so fast, at least for us. > > Regarding the last section that we wait a long time for sweeping - > does this really matter? Yes, we have a lot of floating garbage in > case young collections are infrequent and we keep on precleaning, but > that also means no promotions. The garbage is just sitting there on > the heap taking space, but no one is claiming that space until a young > collection. And by then the sweeping proceeds. Or am I missing something? 
Some applications can be easily scaled up to the point where the allocation rate (and promotion rate because of their object lifetimes) exceeds the rate at which CMS can collect. Such applications sometimes are run with very little excess space in the heap and any delay in any part of the CMS collection can mean CMS loses the race and falls back to a full collection. That's all I'm saying. If you're not in that situation, don't worry about it. Jon > > > Best Regards, > > Gustav ?kesson > > > > > > > On Wed, Jan 15, 2014 at 7:24 PM, Jon Masamitsu > > wrote: > >> -XX:CMSScheduleRemarkEdenPenetration=0 > Schedule the remark pause immediately after the > next young collection. >> -XX:CMSScheduleRemarkEdenSizeThreshold=0 > Any sized eden should allow scheduling of the remark > pause. That is, no eden is too small to schedule. >> -XX:CMSMaxAbortablePrecleanTime=30000 > Wait up to 30 seconds for the remark to be scheduled > after a young collection. Otherwise, wait only up to > the default of 5 seconds. > > >> So, to the question - is there any obvious drawbacks with the >> three settings above? Why does eden have to be 50% (default) in >> order for a remark to be scheduled (besides spreading the pause)? >> It does only seem to do harm. Any reason? > > The default is 50% to try and place the remark pause between two > young pauses > (spread it out as you say). I don't believe it is always the > case that the remark > pause is very small if it is scheduled immediately after a young > collection. In > such cases we still want to spread out the pauses. > > If the remark is delayed to wait for the next young collection, > the sweeping is also delayed. You're not using up space in the > CMS (tenured) generation but you're also not collecting garbage > and not making additional space available for reuse (which the > concurrent sweep does). > > Jon > > > On 01/13/2014 02:50 AM, Gustav ?kesson wrote: >> Hi, >> This is a topic which has been discussed before, but I think I >> have some new findings. We're experiencing problems with CMS pauses. >> Settings we are using. >> -XX:+UseConcMarkSweepGC >> -XX:CMSInitiatingOccupancyFraction=68 >> -XX:MaxTenuringThreshold=0 >> -XX:+UseParNewGC >> -XX:+ScavengeBeforeFullGC >> -XX:CMSWaitDuration=30000 >> -Xmx2048M >> -Xms2048M >> -Xmn1024M >> Note that MaxTenuringThreshold is 0. This is only done during >> test to provoke the CMS to run more frequently (otherwise it runs >> once every day...). Due to this, promotion to old generation is >> around 400K to 1M per second. >> We have an allocation rate of roughly 1G per second, meaning that >> YGC runs once every second. >> We're running JDK7u17. >> This is a log entry when running with above settings. This entry >> is the typical example to all of the CMS collections in this test. 
>> *2014-01-13T09:31:52.504+0100: 661.675: [GC [1 CMS-initial-mark: >> 524986K(1048576K)] 526507K(2096192K), 0.0023550 secs] [Times: >> user=0.00 sys=0.00, real=0.01 secs] >> *2014-01-13T09:31:52.506+0100: 661.677: [CMS-concurrent-mark-start] >> 2014-01-13T09:31:52.644+0100: 661.815: [CMS-concurrent-mark: >> 0.138/0.138 secs] [Times: user=1.96 sys=0.11, real=0.13 secs] >> 2014-01-13T09:31:52.644+0100: 661.815: >> [CMS-concurrent-preclean-start] >> 2014-01-13T09:31:52.655+0100: 661.826: [CMS-concurrent-preclean: >> 0.010/0.011 secs] [Times: user=0.14 sys=0.02, real=0.02 secs] >> 2014-01-13T09:31:52.655+0100: 661.826: >> [CMS-concurrent-abortable-preclean-start] >> 2014-01-13T09:31:53.584+0100: 662.755: [GC 662.755: [ParNew >> Desired survivor size 491520 bytes, new threshold 0 (max 0) >> : 1046656K->0K(1047616K), 0.0039870 secs] >> 1571642K->525579K(2096192K), 0.0043310 secs] [Times: user=0.04 >> sys=0.00, real=0.01 secs] >> 2014-01-13T09:31:54.146+0100: 663.317: >> [CMS-concurrent-abortable-preclean: 0.831/1.491 secs] [Times: >> user=16.76 sys=1.54, real=1.49 secs] >> *2014-01-13T09:31:54.148+0100: 663.319: [GC[YG occupancy: 552670 >> K (1047616 K)]663.319: [Rescan (parallel) , 0.2000060 >> secs]663.519: [weak refs processing, 0.0008740 secs]663.520: >> [scrub string table, 0.0006940 secs] [1 CMS-remark: >> 525579K(1048576K)] 1078249K(2096192K), 0.2017690 secs] [Times: >> user=3.53 sys=0.01, real=0.20 secs] >> *2014-01-13T09:31:54.350+0100: 663.521: [CMS-concurrent-sweep-start] >> 2014-01-13T09:31:54.846+0100: 664.017: [GC 664.017: [ParNew >> Desired survivor size 491520 bytes, new threshold 0 (max 0) >> : 1046656K->0K(1047616K), 0.0033500 secs] >> 1330075K->284041K(2096192K), 0.0034660 secs] [Times: user=0.04 >> sys=0.00, real=0.00 secs] >> 2014-01-13T09:31:55.020+0100: 664.191: [CMS-concurrent-sweep: >> 0.665/0.670 secs] [Times: user=7.77 sys=0.71, real=0.67 secs] >> 2014-01-13T09:31:55.020+0100: 664.191: [CMS-concurrent-reset-start] >> 2014-01-13T09:31:55.023+0100: 664.194: [CMS-concurrent-reset: >> 0.003/0.003 secs] [Times: user=0.03 sys=0.00, real=0.00 secs] >> The initial pause is fine. Then I investigated how to reduce the >> remark phase, and activated -XX:+CMSScavengeBeforeRemark. That >> flag partly solves this issue (not active in the log above), but >> I've seen cases when it does not scavenge (I suspect JNI critical >> section), which is bad and generates yet again long remark pause. >> And yet again the pause is correlated towards the occupancy in young. >> So instead, I tried setting... >> -XX:CMSScheduleRemarkEdenPenetration=0 >> -XX:CMSScheduleRemarkEdenSizeThreshold=0 >> This is a log entry with the settings at the top plus the two >> above... 
>> *2014-01-13T10:18:25.757+0100: 590.198: [GC [1 CMS-initial-mark: >> 524654K(1048576K)] 526646K(2096192K), 0.0029130 secs] [Times: >> user=0.00 sys=0.00, real=0.01 secs] >> *2014-01-13T10:18:25.760+0100: 590.201: [CMS-concurrent-mark-start] >> 2014-01-13T10:18:25.904+0100: 590.345: [CMS-concurrent-mark: >> 0.144/0.144 secs] [Times: user=1.98 sys=0.15, real=0.14 secs] >> 2014-01-13T10:18:25.904+0100: 590.346: >> [CMS-concurrent-preclean-start] >> 2014-01-13T10:18:25.912+0100: 590.354: [CMS-concurrent-preclean: >> 0.008/0.008 secs] [Times: user=0.11 sys=0.00, real=0.01 secs] >> 2014-01-13T10:18:25.912+0100: 590.354: >> [CMS-concurrent-abortable-preclean-start] >> 2014-01-13T10:18:26.836+0100: 591.278: [GC 591.278: [ParNew >> Desired survivor size 491520 bytes, new threshold 0 (max 0) >> : 1046656K->0K(1047616K), 0.0048160 secs] >> 1571310K->525477K(2096192K), 0.0049240 secs] [Times: user=0.05 >> sys=0.00, real=0.01 secs] >> 2014-01-13T10:18:26.842+0100: 591.283: >> [CMS-concurrent-abortable-preclean: 0.608/0.929 secs] [Times: >> user=10.77 sys=0.97, real=0.93 secs] >> *2014-01-13T10:18:26.843+0100: 591.285: [GC[YG occupancy: 20938 K >> (1047616 K)]591.285: [Rescan (parallel) , 0.0024770 secs]591.287: >> [weak refs processing, 0.0007760 secs]591.288: [scrub string >> table, 0.0006440 secs] [1 CMS-remark: 525477K(1048576K)] >> 546415K(2096192K), 0.0040480 secs] [Times: user=0.03 sys=0.00, >> real=0.00 secs] >> *2014-01-13T10:18:26.848+0100: 591.289: [CMS-concurrent-sweep-start] >> 2014-01-13T10:18:27.573+0100: 592.015: [CMS-concurrent-sweep: >> 0.726/0.726 secs] [Times: user=8.50 sys=0.76, real=0.73 secs] >> 2014-01-13T10:18:27.573+0100: 592.015: [CMS-concurrent-reset-start] >> 2014-01-13T10:18:27.576+0100: 592.017: [CMS-concurrent-reset: >> 0.003/0.003 secs] [Times: user=0.03 sys=0.01, real=0.00 secs] >> This means that when I set these two, CMS STWs go from ~200ms to >> below 10ms. >> I'm leaning towards activating... >> -XX:CMSScheduleRemarkEdenPenetration=0 >> -XX:CMSScheduleRemarkEdenSizeThreshold=0 >> -XX:CMSMaxAbortablePrecleanTime=30000 >> What I have seen with these flags is that as soon as a young is >> completely collected during abortable preclean, the remark is >> scheduled and since it can start when eden is nearly empty, it is >> ridicously fast. In case it takes a long time for preclean to >> catch a young collection, it is also fine because no promotion is >> being made. We can live with the pause of young plus a >> consecutive remark (for us, young is ~10ms). >> So, to the question - is there any obvious drawbacks with the >> three settings above? Why does eden have to be 50% (default) in >> order for a remark to be scheduled (besides spreading the pause)? >> It does only seem to do harm. Any reason? >> -XX:+CMSScavengeBeforeRemark I'm thinking to avoid since it can't >> be completely trusted. Usually it helps, but that is not good >> enough since the pauses get irregular in case it fails. And with >> these settings above, it will only add to the CMS pause. >> Best Regards, >> Gustav ?kesson >> >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > -------------- next part -------------- An HTML attachment was scrubbed... 
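For anyone who wants to reproduce the comparison discussed in this thread, a throwaway allocation-churn program along the following lines can be launched with the flag sets quoted above and the resulting GC logs compared. This is a hypothetical sketch, not the original test: the class name, allocation sizes and retention ratio are invented; only the JVM flags in the comment come from the thread.

// Example launch (flags taken from the thread; adjust to taste):
//   java -Xms2048M -Xmx2048M -Xmn1024M -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
//        -XX:CMSInitiatingOccupancyFraction=68 -XX:MaxTenuringThreshold=0
//        -XX:CMSScheduleRemarkEdenPenetration=0 -XX:CMSScheduleRemarkEdenSizeThreshold=0
//        -XX:CMSMaxAbortablePrecleanTime=30000
//        -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintTenuringDistribution
//        AllocationChurn
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.ThreadLocalRandom;

public class AllocationChurn {
    public static void main(String[] args) {
        Deque<byte[]> retained = new ArrayDeque<>();   // small, slowly rotating live set
        ThreadLocalRandom rnd = ThreadLocalRandom.current();
        while (true) {
            byte[] shortLived = new byte[64 * 1024];   // bulk of the allocation, dies in eden
            shortLived[0] = 1;
            if (rnd.nextInt(5_000) == 0) {
                retained.addLast(shortLived);          // occasional long-lived promotion
                if (retained.size() > 3_000) {
                    retained.removeFirst();            // caps live data at roughly 200 MB
                }
            }
        }
    }
}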
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140115/be0a04ce/attachment.html From gustav.r.akesson at gmail.com Thu Jan 16 22:50:51 2014 From: gustav.r.akesson at gmail.com (=?ISO-8859-1?Q?Gustav_=C5kesson?=) Date: Fri, 17 Jan 2014 07:50:51 +0100 Subject: Long remark due to young generation occupancy In-Reply-To: References: <52D6D262.6010106@oracle.com> <52D73438.2070407@oracle.com> Message-ID: Hi, (Sorry for spam, Jon - didn't reply below to all in gc-use) There could be thousands of thread stacks to scan. Some applications make heavy use of soft References. Class unloading happens at remark. The young gen is not necessarily empty after a young collection. Some applications make good use of the survivor spaces. Perhaps our application is less dependant on this, since we're only having a couple of hundred threads, no use of soft references and hardly every unload any classes. Also our aim is to have every request's allocation die in eden (or first collection from survivor). Likely my ideas presented here is well-suited for our application due to these reasons - our biggest remark bottleneck is the size of young generation. Some applications can be easily scaled up to the point where the allocation rate (and promotion rate because of their object lifetimes) exceeds the rate at which CMS can collect. Such applications sometimes are run with very little excess space in the heap and any delay in any part of the CMS collection can mean CMS loses the race and falls back to a full collection. That's all I'm saying. If you're not in that situation, don't worry about it. But if the application is scaled up to an extreme allocation rate (and promotion rate) then we will also hit YGCs more often, which means that the abortable preclean will exit and schedule remark and then sweep. Then it doesn't matter if abortable preclean if 5000 or 30000 in case YGC hits e.g. every 3s - right? On the other hand, in case the allocation rate (and thus, also promotion rate) is low then the abortable preclean runs and the garbage is not bothering anyone sitting on the heap waiting for a YGC. An update for this experiment - during the night I ran 57 CMS collections and 52 of them were below 10ms. The other 5 were pretty long - 100ms to 200ms and yet again the pauses can be correlated towards the occupancy of young. In the long pauses, after exiting the abortable preclean 100ms lapsed before starting the remarking, making eden have roughly 120mb of occupancy. Folks, I'd very much appreciate if we could keep this discussion alive and please give any input possible regarding these flags. Any input or experience is appreciated Thanks for your insights, Jon. Best Regards, Gustav ?kesson On Thu, Jan 16, 2014 at 9:26 AM, Gustav ?kesson wrote: > Hi, > > > There could be thousands of thread stacks to scan. Some applications make > heavy > use of soft References. Class unloading happens at remark. The young gen > is not > necessarily empty after a young collection. Some applications make good > use of the > survivor spaces. > > > Perhaps our application is less dependant on this, since we're only having > a couple of hundred threads, no use of soft references and hardly every > unload any classes. Also our aim is to have every request's allocation die > in eden (or first collection from survivor). Likely my ideas presented here > is well-suited for our application due to these reasons - our biggest > remark bottleneck is the size of young generation. 
> > > > Some applications can be easily scaled up to the point where the > allocation rate (and promotion rate because of their object lifetimes) > exceeds the rate at which CMS can collect. Such applications sometimes > are run with very little excess space in the heap and any delay in > any part of the CMS collection can mean CMS loses the race and > falls back to a full collection. That's all I'm saying. If you're not > in that situation, don't worry about it. > > > > But if the application is scaled up to an extreme allocation rate (and > promotion rate) then we will also hit YGCs more often, which means that the > abortable preclean will exit and schedule remark and then sweep. Then it > doesn't matter if abortable preclean if 5000 or 30000 in case YGC hits e.g. > every 3s - right? On the other hand, in case the allocation rate (and thus, > also promotion rate) is low then the abortable preclean runs and > the garbage is not bothering anyone sitting on the heap waiting for a YGC. > > > An update for this experiment - during the night I ran 57 CMS collections > and 52 of them were below 10ms. The other 5 were pretty long - 100ms to > 200ms and yet again the pauses can be correlated towards the occupancy of > young. In the long pauses, after exiting the abortable preclean 100ms > lapsed before starting the remarking, making eden have roughly 120mb of > occupancy. > > > Folks, I'd very much appreciate if we could keep this discussion alive and > please give any input possible regarding these flags. Any input or > experience is appreciated > > Thanks for your insights, Jon. > > > Best Regards, > > Gustav ?kesson > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140117/bf18a946/attachment.html From gustav.r.akesson at gmail.com Thu Jan 16 22:59:56 2014 From: gustav.r.akesson at gmail.com (=?ISO-8859-1?Q?Gustav_=C5kesson?=) Date: Fri, 17 Jan 2014 07:59:56 +0100 Subject: Long remark due to young generation occupancy In-Reply-To: References: <52D6D262.6010106@oracle.com> <52D73438.2070407@oracle.com> Message-ID: Hi, An update on this experiment. I managed to track down the issue with 5 of the 57 collections (which still had 100-200ms pauses). It was due to the abortable preclean sleeping. When abortable preclean scans dirty cards and less than 100 cards were scanned, then it sleeps for 100ms. In case that happens and a YG is processed, then it will take ~100ms (default) to reach the remark, which means that eden will fill up again. When I lowered the sleep time then the highest GC (out of 50) was 43ms and 33ms. Rest was <11ms. Most of them were 7-8ms. Best Regards, Gustav ?kesson -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140117/a4a60327/attachment.html From gustav.r.akesson at gmail.com Wed Jan 22 03:12:22 2014 From: gustav.r.akesson at gmail.com (=?ISO-8859-1?Q?Gustav_=C5kesson?=) Date: Wed, 22 Jan 2014 12:12:22 +0100 Subject: Fragmentation and UseCMSInitiatingOccupancyOnly Message-ID: Hi, In case UseCMSInitiatingOccupancyOnly is enabled we instruct CMS to start at X% of old gen, and not try to figure out by itself when to start. My understanding is that when flag is disabled, CMS is aiming for X%, but uses statistics of previous collections (GC rate, GC time) to determine when to initiate. 
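(As a side note, one way to confirm which of these initiation settings are actually in effect in a running JVM is the HotSpot diagnostic MXBean. The sketch below is illustrative: the class name is invented, and the last two flag names are included only on the assumption that they feed the derived default trigger when CMSInitiatingOccupancyFraction is not set explicitly.)

import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class ShowCmsInitiationFlags {
    public static void main(String[] args) {
        HotSpotDiagnosticMXBean hs =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        String[] flags = {
                "UseConcMarkSweepGC",
                "CMSInitiatingOccupancyFraction",
                "UseCMSInitiatingOccupancyOnly",
                "CMSTriggerRatio",     // assumed: part of the derived default trigger
                "MinHeapFreeRatio"     // assumed: likewise
        };
        for (String f : flags) {
            try {
                System.out.println(hs.getVMOption(f)); // prints the value and its origin
            } catch (IllegalArgumentException e) {
                System.out.println(f + ": not available in this VM");
            }
        }
    }
}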
My question is whether enabling UseCMSInitiatingOccupancyOnly increases the risk of promotion failure (and hence a Full GC) due to fragmentation - that is, will CMS always honor the X% rule and rather let a promotion failure happen than start a cycle prematurely? Or, in case the flag is disabled, is CMS smart enough to start prior to X% when the heap is fragmented, instead of generating a promotion failure?

Best Regards,

Gustav Åkesson

From bernd-2014 at eckenfels.net Wed Jan 22 03:37:27 2014
From: bernd-2014 at eckenfels.net (Bernd Eckenfels)
Date: Wed, 22 Jan 2014 12:37:27 +0100
Subject: Fragmentation and UseCMSInitiatingOccupancyOnly
In-Reply-To:
References:
Message-ID: <95E1766F-DADA-411D-A48E-8CC603169926@eckenfels.net>

If you set the percentage low, the risk is that the old gen will be permanently over the threshold (this might be wanted?).
If you set it high, then AF might happen due to fragmentation or background collection beeing too slow. > > I think fragmentation is not honored, therefore your desired oldheap size should account for that an have plenty of (untouched) headroom. > > Bernd > >> Am 22.01.2014 um 12:12 schrieb Gustav ?kesson : >> >> Hi, >> >> In case UseCMSInitiatingOccupancyOnly is enabled we instruct CMS to start at X% of old gen, and not try to figure out by itself when to start. My understanding is that when flag is disabled, CMS is aiming for X%, but uses statistics of previous collections (GC rate, GC time) to determine when to initiate. >> >> My question is whether enabling UseCMSInitiatingOccupancyOnly increases the risk of promotion failure (FullGC) due to fragmentation, meaning that it will always honor X% rule and rather generate promotion failure event than run CMS prematurely? >> >> Or, in case flag is disabled, is CMS smart enough to start prior to X% when heap is fragmented instead of generating promotion failure? >> >> >> Best Regards, >> >> Gustav ?kesson >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use From bernd-2014 at eckenfels.net Wed Jan 22 13:04:07 2014 From: bernd-2014 at eckenfels.net (Bernd Eckenfels) Date: Wed, 22 Jan 2014 22:04:07 +0100 Subject: Fragmentation and UseCMSInitiatingOccupancyOnly In-Reply-To: References: <95E1766F-DADA-411D-A48E-8CC603169926@eckenfels.net> Message-ID: Hello Gustav, first of all, I noticed you are talking about a 512mb heap. Can you maybe elaborate what hardware that is, and what pause times you see and expect? Whats your newsize? Are setting Xmx/Xms to same values? With parold and smaller heaps having a smaller initial size will reduce pause times even more. I would expect that in most common scenarios ParallelOld is much more reliable (no concurrent mode risk falling back to serial gc; defragmenting), easy to tune (consistent behaviour). And the pause times should be comparable to your young collections (and certainly much smaller than full non-paralle collections). But, back to your question, I am not so familiar with the AdaptiveSizing and FreeList code, but the rest of CMS does not seem to care about freechunks vs. fragmentation when considering free memory. Maybe somebody else can comment on that. BTW: I think you can use a larger (more than default 10%) safty margin instead of using a lower Occupancy setting if you want to keep the dynamic adjustment property but not want to risk concurrent mode failures. Am 22.01.2014, 20:37 Uhr, schrieb Gustav ?kesson : > Not sure I understand the answer - in case the flag is disabled, is > little contiguous free space (i.e. fragmentation) in oldgen a variable From denny.kettwig at werum.de Thu Jan 23 00:38:10 2014 From: denny.kettwig at werum.de (Denny Kettwig) Date: Thu, 23 Jan 2014 08:38:10 +0000 Subject: AW: Unexplanable events in GC logs Message-ID: <6175F8C4FE407D4F830EDA25C27A43173B66291C@Werum1790.werum.net> Hey folks, in one of our recent cluster systems we found 2 unexplainable events within the GC logs. I'd like to address these events to you and ask for your advice. We are running a clustered jBoss System with 4 nodes, every node has the same configuration. 
We use the following parameters: -Xms10g -Xmx10g -Xmn3g -Xss2048k -XX:+ExplicitGCInvokesConcurrent -XX:+CMSClassUnloadingEnabled -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:ParallelGCThreads=22 -XX:SurvivorRatio=8 -XX:TargetSurvivorRatio=90 -XX:PermSize=512M -XX:MaxPermSize=512m -Dsun.rmi.dgc.client.gcInterval=3600000 -Dsun.rmi.dgc.server.gcInterval=3600000 Event 1 The first event is a very frequent CMS collections on the first node for about 11 hours. We are talking here about a peak value of 258 CMS collections per hour. From my knowledge this event starts without any reason and ends without any reason since the old space is below 10% usage in this time frame. I do not know what might have caused this CMS collection. We already experienced similar events on this cluster in the past, but at a much higher heap usage (above 90%) and under high load with most likely a huge heap fragmentation and the frequent CMS collections ended with a single very long Full GC that defragmented the heap. In the current event all this is not the case. I attached the relevant part of the GC log. Event 2 The second event is just as confusing for me as the first one. At a certain point in time a full GC takes place on all 4 Nodes and I cannot find a reason for it. Here are the relevant parts: N1 2013-12-20T05:43:28.041+0100: 768231.382: [GC 768231.383: [ParNew: 2618493K->92197K(2831168K), 0.3007160 secs] 3120456K->594160K(10171200K), 0.3017204 secs] [Times: user=0.91 sys=0.00, real=0.30 secs] 2013-12-20T05:45:31.864+0100: 768355.209: [Full GC 768355.210: [CMS: 501963K->496288K(7340032K), 4.2140085 secs] 1397267K->496288K(10171200K), [CMS Perm : 203781K->178528K(524288K)], 4.2148018 secs] [Times: user=4.21 sys=0.00, real=4.21 secs] 2013-12-20T05:52:40.591+0100: 768783.949: [GC 768783.949: [ParNew: 2516608K->48243K(2831168K), 0.2649174 secs] 3012896K->544532K(10171200K), 0.2659039 secs] [Times: user=0.47 sys=0.00, real=0.27 secs] N2 2013-12-20T04:57:21.524+0100: 765208.310: [GC 765208.311: [ParNew: 924566K->111068K(2831168K), 0.1790573 secs] 1416514K->603015K(10171200K), 0.1797121 secs] [Times: user=1.08 sys=0.00, real=0.19 secs] 2013-12-20T04:57:21.711+0100: 765208.499: [GC [1 CMS-initial-mark: 491947K(7340032K)] 603015K(10171200K), 0.3289639 secs] [Times: user=0.33 sys=0.00, real=0.33 secs] 2013-12-20T04:57:22.039+0100: 765208.828: [CMS-concurrent-mark-start] 2013-12-20T04:57:22.616+0100: 765209.405: [CMS-concurrent-mark: 0.577/0.577 secs] [Times: user=3.53 sys=0.05, real=0.58 secs] 2013-12-20T04:57:22.616+0100: 765209.405: [CMS-concurrent-preclean-start] 2013-12-20T04:57:22.647+0100: 765209.430: [CMS-concurrent-preclean: 0.024/0.024 secs] [Times: user=0.03 sys=0.00, real=0.03 secs] 2013-12-20T04:57:22.647+0100: 765209.430: [CMS-concurrent-abortable-preclean-start] CMS: abort preclean due to time 2013-12-20T04:57:28.060+0100: 765214.848: [CMS-concurrent-abortable-preclean: 4.208/5.418 secs] [Times: user=4.10 sys=0.03, real=5.41 secs] 2013-12-20T04:57:28.076+0100: 765214.857: [GC[YG occupancy: 124872 K (2831168 K)]765214.857: [Rescan (parallel) , 0.0580906 secs]765214.916: [weak refs processing, 0.0000912 secs]765214.916: [class unloading, 0.0769742 secs]765214.993: [scrub symbol & string tables, 0.0612689 secs] [1 CMS-remark: 491947K(7340032K)] 616820K(10171200K), 0.2256506 secs] [Times: user=0.48 sys=0.00, real=0.22 secs] 2013-12-20T04:57:28.294+0100: 765215.083: [CMS-concurrent-sweep-start] 2013-12-20T04:57:29.043+0100: 765215.834: [CMS-concurrent-sweep: 0.750/0.750 secs] [Times: user=0.76 sys=0.00, 
real=0.75 secs] 2013-12-20T04:57:29.043+0100: 765215.834: [CMS-concurrent-reset-start] 2013-12-20T04:57:29.074+0100: 765215.856: [CMS-concurrent-reset: 0.022/0.022 secs] [Times: user=0.03 sys=0.00, real=0.03 secs] 2013-12-20T05:47:54.607+0100: 768241.413: [Full GC 768241.414: [CMS: 491947K->464864K(7340032K), 4.5053183 secs] 1910412K->464864K(10171200K), [CMS Perm : 183511K->168997K(524288K)], 4.5059088 secs] [Times: user=4.49 sys=0.02, real=4.51 secs] 2013-12-20T06:47:59.098+0100: 771845.954: [GC 771845.954: [ParNew: 1507774K->58195K(2831168K), 0.2307012 secs] 1972638K->523059K(10171200K), 0.2313931 secs] [Times: user=0.37 sys=0.00, real=0.23 secs] N3 2013-12-20T05:46:25.441+0100: 767981.526: [GC 767981.526: [ParNew: 2695212K->166641K(2831168K), 0.3278475 secs] 3268057K->739486K(10171200K), 0.3284853 secs] [Times: user=1.62 sys=0.00, real=0.33 secs] 2013-12-20T05:49:55.467+0100: 768191.578: [Full GC 768191.578: [CMS: 572844K->457790K(7340032K), 3.7687762 secs] 1216176K->457790K(10171200K), [CMS Perm : 181711K->169514K(524288K)], 3.7692999 secs] [Times: user=3.76 sys=0.00, real=3.79 secs] 2013-12-20T06:49:59.249+0100: 771795.415: [GC 771795.415: [ParNew: 1585077K->72146K(2831168K), 0.2632945 secs] 2042868K->529936K(10171200K), 0.2639889 secs] [Times: user=0.41 sys=0.00, real=0.27 secs] N4 2013-12-20T05:48:21.551+0100: 767914.067: [GC 767914.068: [ParNew: 2656327K->119432K(2831168K), 0.2603676 secs] 3222693K->685799K(10171200K), 0.2609581 secs] [Times: user=1.14 sys=0.00, real=0.26 secs] 2013-12-20T05:49:03.939+0100: 767956.457: [Full GC 767956.457: [CMS: 566366K->579681K(7340032K), 6.1324841 secs] 3149011K->579681K(10171200K), [CMS Perm : 190240K->174791K(524288K)], 6.1331389 secs] [Times: user=6.13 sys=0.00, real=6.13 secs] 2013-12-20T05:50:10.262+0100: 768022.762: [GC 768022.763: [ParNew: 2516608K->83922K(2831168K), 0.2157015 secs] 3096289K->663603K(10171200K), 0.2162262 secs] [Times: user=0.41 sys=0.00, real=0.22 secs] Between these Full GC are only a few minutes or as between N3 and N4 a few seconds. I made some research on possible reasons for a Full GC and this is the list I gathered so far: 1. Running out of old gen 2. Running out of perm gen 3. Calling System.gc() (indicated by System in the ouput) 4. Not having enough free space in Survivor Space to copy objects from Eden (promotion failed) 5. Running out of old gen before a concurrent collection can free it (Concurrent Mode Failure) 6. Having high fragmentation and not enough space for a larger object in old gen However none of these 6 conditions are fulfilled by any of the above shown full GC. So once again I'm lost and do not have an explanation for this. If you need the full logs for further analysis please let me know. Kind Regards Denny [cid:image002.png at 01CEF67E.8AEB4630] Werum Software & Systems AG Wulf-Werum-Strasse 3 | 21337 Lueneburg | Germany Tel. +49(0)4131/8900-983 | Fax +49(0)4131/8900-20 mailto:denny.kettwig at werum.de | http://www.werum.de VAT No. DE 116 083 850 | RG Lueneburg HRB 2262 Chairman of Supervisory Board: Johannes Zimmermann Executive Board: Hartmut Krome, Ruediger Schlierenkaemper, Hans-Peter Subel -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140123/b68b9c03/attachment.html -------------- next part -------------- A non-text attachment was scrubbed... 
Name: image001.jpg Type: image/jpeg Size: 1089 bytes Desc: image001.jpg Url : http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140123/b68b9c03/image001.jpg -------------- next part -------------- A non-text attachment was scrubbed... Name: image002.png Type: image/png Size: 6441 bytes Desc: image002.png Url : http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140123/b68b9c03/image002.png From matthew.miller at forgerock.com Thu Jan 23 03:58:48 2014 From: matthew.miller at forgerock.com (Matt Miller) Date: Thu, 23 Jan 2014 06:58:48 -0500 Subject: AW: Unexplanable events in GC logs In-Reply-To: <6175F8C4FE407D4F830EDA25C27A43173B66291C@Werum1790.werum.net> References: <6175F8C4FE407D4F830EDA25C27A43173B66291C@Werum1790.werum.net> Message-ID: <52E103F8.2070009@forgerock.com> Hi Denny, Another reason for a Full GC that you did not list is: running jmap against the process. If you run a histo:live or take a live object heap dump, a Full GC will happen (and it doesn't show as a System GC). Is it possible that you are taking a heap histogram or heap dump using the live option? -Matt On 1/23/14, 3:38 AM, Denny Kettwig wrote: > > Hey folks, > > in one of our recent cluster systems we found 2 unexplainable events > within the GC logs. I'd like to address these events to you and ask > for your advice. We are running a clustered jBoss System with 4 nodes, > every node has the same configuration. We use the following parameters: > > -Xms10g > > -Xmx10g > > -Xmn3g > > -Xss2048k > > -XX:+ExplicitGCInvokesConcurrent > > -XX:+CMSClassUnloadingEnabled > > -XX:+UseParNewGC > > -XX:+UseConcMarkSweepGC > > -XX:ParallelGCThreads=22 > > -XX:SurvivorRatio=8 > > -XX:TargetSurvivorRatio=90 > > -XX:PermSize=512M > > -XX:MaxPermSize=512m > > -Dsun.rmi.dgc.client.gcInterval=3600000 > > -Dsun.rmi.dgc.server.gcInterval=3600000 > > *Event 1* > > The first event is a very frequent CMS collections on the first node > for about 11 hours. We are talking here about a peak value of 258 CMS > collections per hour. From my knowledge this event starts without any > reason and ends without any reason since the old space is below 10% > usage in this time frame. I do not know what might have caused this > CMS collection. We already experienced similar events on this cluster > in the past, but at a much higher heap usage (above 90%) and under > high load with most likely a huge heap fragmentation and the frequent > CMS collections ended with a single very long Full GC that > defragmented the heap. In the current event all this is not the case. > *I attached the relevant part of the GC log*. > > *Event 2* > > The second event is just as confusing for me as the first one. At a > certain point in time a full GC takes place on *all 4 Nodes *and I > cannot find a reason for it. 
Here are the relevant parts: > > N1 > > 2013-12-20T05:43:28.041+0100: 768231.382: [GC 768231.383: [ParNew: > 2618493K->92197K(2831168K), 0.3007160 secs] > 3120456K->594160K(10171200K), 0.3017204 secs] [Times: user=0.91 > sys=0.00, real=0.30 secs] > > 2013-12-20T05:45:31.864+0100: 768355.209: [Full GC 768355.210: [CMS: > 501963K->496288K(7340032K), 4.2140085 secs] > 1397267K->496288K(10171200K), [CMS Perm : 203781K->178528K(524288K)], > 4.2148018 secs] [Times: user=4.21 sys=0.00, real=4.21 secs] > > 2013-12-20T05:52:40.591+0100: 768783.949: [GC 768783.949: [ParNew: > 2516608K->48243K(2831168K), 0.2649174 secs] > 3012896K->544532K(10171200K), 0.2659039 secs] [Times: user=0.47 > sys=0.00, real=0.27 secs] > > N2 > > 2013-12-20T04:57:21.524+0100: 765208.310: [GC 765208.311: [ParNew: > 924566K->111068K(2831168K), 0.1790573 secs] > 1416514K->603015K(10171200K), 0.1797121 secs] [Times: user=1.08 > sys=0.00, real=0.19 secs] > > 2013-12-20T04:57:21.711+0100: 765208.499: [GC [1 CMS-initial-mark: > 491947K(7340032K)] 603015K(10171200K), 0.3289639 secs] [Times: > user=0.33 sys=0.00, real=0.33 secs] > > 2013-12-20T04:57:22.039+0100: 765208.828: [CMS-concurrent-mark-start] > > 2013-12-20T04:57:22.616+0100: 765209.405: [CMS-concurrent-mark: > 0.577/0.577 secs] [Times: user=3.53 sys=0.05, real=0.58 secs] > > 2013-12-20T04:57:22.616+0100: 765209.405: [CMS-concurrent-preclean-start] > > 2013-12-20T04:57:22.647+0100: 765209.430: [CMS-concurrent-preclean: > 0.024/0.024 secs] [Times: user=0.03 sys=0.00, real=0.03 secs] > > 2013-12-20T04:57:22.647+0100: 765209.430: > [CMS-concurrent-abortable-preclean-start] > > CMS: abort preclean due to time 2013-12-20T04:57:28.060+0100: > 765214.848: [CMS-concurrent-abortable-preclean: 4.208/5.418 secs] > [Times: user=4.10 sys=0.03, real=5.41 secs] > > 2013-12-20T04:57:28.076+0100: 765214.857: [GC[YG occupancy: 124872 K > (2831168 K)]765214.857: [Rescan (parallel) , 0.0580906 > secs]765214.916: [weak refs processing, 0.0000912 secs]765214.916: > [class unloading, 0.0769742 secs]765214.993: [scrub symbol & string > tables, 0.0612689 secs] [1 CMS-remark: 491947K(7340032K)] > 616820K(10171200K), 0.2256506 secs] [Times: user=0.48 sys=0.00, > real=0.22 secs] > > 2013-12-20T04:57:28.294+0100: 765215.083: [CMS-concurrent-sweep-start] > > 2013-12-20T04:57:29.043+0100: 765215.834: [CMS-concurrent-sweep: > 0.750/0.750 secs] [Times: user=0.76 sys=0.00, real=0.75 secs] > > 2013-12-20T04:57:29.043+0100: 765215.834: [CMS-concurrent-reset-start] > > 2013-12-20T04:57:29.074+0100: 765215.856: [CMS-concurrent-reset: > 0.022/0.022 secs] [Times: user=0.03 sys=0.00, real=0.03 secs] > > 2013-12-20T05:47:54.607+0100: 768241.413: [Full GC 768241.414: [CMS: > 491947K->464864K(7340032K), 4.5053183 secs] > 1910412K->464864K(10171200K), [CMS Perm : 183511K->168997K(524288K)], > 4.5059088 secs] [Times: user=4.49 sys=0.02, real=4.51 secs] > > 2013-12-20T06:47:59.098+0100: 771845.954: [GC 771845.954: [ParNew: > 1507774K->58195K(2831168K), 0.2307012 secs] > 1972638K->523059K(10171200K), 0.2313931 secs] [Times: user=0.37 > sys=0.00, real=0.23 secs] > > N3 > > 2013-12-20T05:46:25.441+0100: 767981.526: [GC 767981.526: [ParNew: > 2695212K->166641K(2831168K), 0.3278475 secs] > 3268057K->739486K(10171200K), 0.3284853 secs] [Times: user=1.62 > sys=0.00, real=0.33 secs] > > 2013-12-20T05:49:55.467+0100: 768191.578: [Full GC 768191.578: [CMS: > 572844K->457790K(7340032K), 3.7687762 secs] > 1216176K->457790K(10171200K), [CMS Perm : 181711K->169514K(524288K)], > 3.7692999 secs] [Times: user=3.76 sys=0.00, real=3.79 
secs] > > 2013-12-20T06:49:59.249+0100: 771795.415: [GC 771795.415: [ParNew: > 1585077K->72146K(2831168K), 0.2632945 secs] > 2042868K->529936K(10171200K), 0.2639889 secs] [Times: user=0.41 > sys=0.00, real=0.27 secs] > > N4 > > 2013-12-20T05:48:21.551+0100: 767914.067: [GC 767914.068: [ParNew: > 2656327K->119432K(2831168K), 0.2603676 secs] > 3222693K->685799K(10171200K), 0.2609581 secs] [Times: user=1.14 > sys=0.00, real=0.26 secs] > > 2013-12-20T05:49:03.939+0100: 767956.457: [Full GC 767956.457: [CMS: > 566366K->579681K(7340032K), 6.1324841 secs] > 3149011K->579681K(10171200K), [CMS Perm : 190240K->174791K(524288K)], > 6.1331389 secs] [Times: user=6.13 sys=0.00, real=6.13 secs] > > 2013-12-20T05:50:10.262+0100: 768022.762: [GC 768022.763: [ParNew: > 2516608K->83922K(2831168K), 0.2157015 secs] > 3096289K->663603K(10171200K), 0.2162262 secs] [Times: user=0.41 > sys=0.00, real=0.22 secs] > > Between these Full GC are only a few minutes or as between N3 and N4 a > few seconds. I made some research on possible reasons for a Full GC > and this is the list I gathered so far: > > 1.Running out of old gen > > 2.Running out of perm gen > > 3.Calling System.gc() (indicated by System in the ouput) > > 4.Not having enough free space in Survivor Space to copy objects from > Eden (promotion failed) > > 5.Running out of old gen before a concurrent collection can free it > (Concurrent Mode Failure) > > 6.Having high fragmentation and not enough space for a larger object > in old gen > > However none of these 6 conditions are fulfilled by any of the above > shown full GC. So once again I'm lost and do not have an explanation > for this. > > If you need the full logs for further analysis please let me know. > > Kind Regards > > Denny > > Beschreibung: werum-hr > > cid:image002.png at 01CEF67E.8AEB4630 > > Werum Software & Systems AG > > Wulf-Werum-Strasse 3 | 21337 Lueneburg | Germany > > Tel. +49(0)4131/8900-983 | Fax +49(0)4131/8900-20 > > mailto:denny.kettwig at werum.de | http://www.werum.de > > VAT No. DE 116 083 850 | RG Lueneburg HRB 2262 > > Chairman of Supervisory Board: Johannes Zimmermann > > Executive Board: Hartmut Krome, Ruediger Schlierenkaemper, Hans-Peter > Subel > > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140123/dcdc9453/attachment-0001.html -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 1089 bytes Desc: not available Url : http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140123/dcdc9453/attachment-0001.jpe -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 6441 bytes Desc: not available Url : http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140123/dcdc9453/attachment-0001.png From denny.kettwig at werum.de Thu Jan 23 04:43:56 2014 From: denny.kettwig at werum.de (Denny Kettwig) Date: Thu, 23 Jan 2014 12:43:56 +0000 Subject: Unexplanable events in GC logs Message-ID: <6175F8C4FE407D4F830EDA25C27A43173B662980@Werum1790.werum.net> Thank you Matt! An option I never considered, this is very likely the case. Any ideas for the CMS issue? 
-Denny Von: hotspot-gc-use-bounces at openjdk.java.net [mailto:hotspot-gc-use-bounces at openjdk.java.net] Im Auftrag von Matt Miller Gesendet: Thursday, January 23, 2014 1:02 PM An: hotspot-gc-use at openjdk.java.net Betreff: Re: AW: Unexplanable events in GC logs Hi Denny, Another reason for a Full GC that you did not list is: running jmap against the process. If you run a histo:live or take a live object heap dump, a Full GC will happen (and it doesn't show as a System GC). Is it possible that you are taking a heap histogram or heap dump using the live option? -Matt On 1/23/14, 3:38 AM, Denny Kettwig wrote: Hey folks, in one of our recent cluster systems we found 2 unexplainable events within the GC logs. I'd like to address these events to you and ask for your advice. We are running a clustered jBoss System with 4 nodes, every node has the same configuration. We use the following parameters: -Xms10g -Xmx10g -Xmn3g -Xss2048k -XX:+ExplicitGCInvokesConcurrent -XX:+CMSClassUnloadingEnabled -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:ParallelGCThreads=22 -XX:SurvivorRatio=8 -XX:TargetSurvivorRatio=90 -XX:PermSize=512M -XX:MaxPermSize=512m -Dsun.rmi.dgc.client.gcInterval=3600000 -Dsun.rmi.dgc.server.gcInterval=3600000 Event 1 The first event is a very frequent CMS collections on the first node for about 11 hours. We are talking here about a peak value of 258 CMS collections per hour. From my knowledge this event starts without any reason and ends without any reason since the old space is below 10% usage in this time frame. I do not know what might have caused this CMS collection. We already experienced similar events on this cluster in the past, but at a much higher heap usage (above 90%) and under high load with most likely a huge heap fragmentation and the frequent CMS collections ended with a single very long Full GC that defragmented the heap. In the current event all this is not the case. I attached the relevant part of the GC log. Event 2 The second event is just as confusing for me as the first one. At a certain point in time a full GC takes place on all 4 Nodes and I cannot find a reason for it. 
Here are the relevant parts: N1 2013-12-20T05:43:28.041+0100: 768231.382: [GC 768231.383: [ParNew: 2618493K->92197K(2831168K), 0.3007160 secs] 3120456K->594160K(10171200K), 0.3017204 secs] [Times: user=0.91 sys=0.00, real=0.30 secs] 2013-12-20T05:45:31.864+0100: 768355.209: [Full GC 768355.210: [CMS: 501963K->496288K(7340032K), 4.2140085 secs] 1397267K->496288K(10171200K), [CMS Perm : 203781K->178528K(524288K)], 4.2148018 secs] [Times: user=4.21 sys=0.00, real=4.21 secs] 2013-12-20T05:52:40.591+0100: 768783.949: [GC 768783.949: [ParNew: 2516608K->48243K(2831168K), 0.2649174 secs] 3012896K->544532K(10171200K), 0.2659039 secs] [Times: user=0.47 sys=0.00, real=0.27 secs] N2 2013-12-20T04:57:21.524+0100: 765208.310: [GC 765208.311: [ParNew: 924566K->111068K(2831168K), 0.1790573 secs] 1416514K->603015K(10171200K), 0.1797121 secs] [Times: user=1.08 sys=0.00, real=0.19 secs] 2013-12-20T04:57:21.711+0100: 765208.499: [GC [1 CMS-initial-mark: 491947K(7340032K)] 603015K(10171200K), 0.3289639 secs] [Times: user=0.33 sys=0.00, real=0.33 secs] 2013-12-20T04:57:22.039+0100: 765208.828: [CMS-concurrent-mark-start] 2013-12-20T04:57:22.616+0100: 765209.405: [CMS-concurrent-mark: 0.577/0.577 secs] [Times: user=3.53 sys=0.05, real=0.58 secs] 2013-12-20T04:57:22.616+0100: 765209.405: [CMS-concurrent-preclean-start] 2013-12-20T04:57:22.647+0100: 765209.430: [CMS-concurrent-preclean: 0.024/0.024 secs] [Times: user=0.03 sys=0.00, real=0.03 secs] 2013-12-20T04:57:22.647+0100: 765209.430: [CMS-concurrent-abortable-preclean-start] CMS: abort preclean due to time 2013-12-20T04:57:28.060+0100: 765214.848: [CMS-concurrent-abortable-preclean: 4.208/5.418 secs] [Times: user=4.10 sys=0.03, real=5.41 secs] 2013-12-20T04:57:28.076+0100: 765214.857: [GC[YG occupancy: 124872 K (2831168 K)]765214.857: [Rescan (parallel) , 0.0580906 secs]765214.916: [weak refs processing, 0.0000912 secs]765214.916: [class unloading, 0.0769742 secs]765214.993: [scrub symbol & string tables, 0.0612689 secs] [1 CMS-remark: 491947K(7340032K)] 616820K(10171200K), 0.2256506 secs] [Times: user=0.48 sys=0.00, real=0.22 secs] 2013-12-20T04:57:28.294+0100: 765215.083: [CMS-concurrent-sweep-start] 2013-12-20T04:57:29.043+0100: 765215.834: [CMS-concurrent-sweep: 0.750/0.750 secs] [Times: user=0.76 sys=0.00, real=0.75 secs] 2013-12-20T04:57:29.043+0100: 765215.834: [CMS-concurrent-reset-start] 2013-12-20T04:57:29.074+0100: 765215.856: [CMS-concurrent-reset: 0.022/0.022 secs] [Times: user=0.03 sys=0.00, real=0.03 secs] 2013-12-20T05:47:54.607+0100: 768241.413: [Full GC 768241.414: [CMS: 491947K->464864K(7340032K), 4.5053183 secs] 1910412K->464864K(10171200K), [CMS Perm : 183511K->168997K(524288K)], 4.5059088 secs] [Times: user=4.49 sys=0.02, real=4.51 secs] 2013-12-20T06:47:59.098+0100: 771845.954: [GC 771845.954: [ParNew: 1507774K->58195K(2831168K), 0.2307012 secs] 1972638K->523059K(10171200K), 0.2313931 secs] [Times: user=0.37 sys=0.00, real=0.23 secs] N3 2013-12-20T05:46:25.441+0100: 767981.526: [GC 767981.526: [ParNew: 2695212K->166641K(2831168K), 0.3278475 secs] 3268057K->739486K(10171200K), 0.3284853 secs] [Times: user=1.62 sys=0.00, real=0.33 secs] 2013-12-20T05:49:55.467+0100: 768191.578: [Full GC 768191.578: [CMS: 572844K->457790K(7340032K), 3.7687762 secs] 1216176K->457790K(10171200K), [CMS Perm : 181711K->169514K(524288K)], 3.7692999 secs] [Times: user=3.76 sys=0.00, real=3.79 secs] 2013-12-20T06:49:59.249+0100: 771795.415: [GC 771795.415: [ParNew: 1585077K->72146K(2831168K), 0.2632945 secs] 2042868K->529936K(10171200K), 0.2639889 secs] [Times: 
user=0.41 sys=0.00, real=0.27 secs] N4 2013-12-20T05:48:21.551+0100: 767914.067: [GC 767914.068: [ParNew: 2656327K->119432K(2831168K), 0.2603676 secs] 3222693K->685799K(10171200K), 0.2609581 secs] [Times: user=1.14 sys=0.00, real=0.26 secs] 2013-12-20T05:49:03.939+0100: 767956.457: [Full GC 767956.457: [CMS: 566366K->579681K(7340032K), 6.1324841 secs] 3149011K->579681K(10171200K), [CMS Perm : 190240K->174791K(524288K)], 6.1331389 secs] [Times: user=6.13 sys=0.00, real=6.13 secs] 2013-12-20T05:50:10.262+0100: 768022.762: [GC 768022.763: [ParNew: 2516608K->83922K(2831168K), 0.2157015 secs] 3096289K->663603K(10171200K), 0.2162262 secs] [Times: user=0.41 sys=0.00, real=0.22 secs] Between these Full GC are only a few minutes or as between N3 and N4 a few seconds. I made some research on possible reasons for a Full GC and this is the list I gathered so far: 1. Running out of old gen 2. Running out of perm gen 3. Calling System.gc() (indicated by System in the ouput) 4. Not having enough free space in Survivor Space to copy objects from Eden (promotion failed) 5. Running out of old gen before a concurrent collection can free it (Concurrent Mode Failure) 6. Having high fragmentation and not enough space for a larger object in old gen However none of these 6 conditions are fulfilled by any of the above shown full GC. So once again I'm lost and do not have an explanation for this. If you need the full logs for further analysis please let me know. Kind Regards Denny [cid:image002.png at 01CEF67E.8AEB4630] Werum Software & Systems AG Wulf-Werum-Strasse 3 | 21337 Lueneburg | Germany Tel. +49(0)4131/8900-983 | Fax +49(0)4131/8900-20 mailto:denny.kettwig at werum.de | http://www.werum.de VAT No. DE 116 083 850 | RG Lueneburg HRB 2262 Chairman of Supervisory Board: Johannes Zimmermann Executive Board: Hartmut Krome, Ruediger Schlierenkaemper, Hans-Peter Subel _______________________________________________ hotspot-gc-use mailing list hotspot-gc-use at openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140123/128be8d5/attachment.html -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.jpg Type: image/jpeg Size: 1089 bytes Desc: image001.jpg Url : http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140123/128be8d5/image001.jpg -------------- next part -------------- A non-text attachment was scrubbed... Name: image002.png Type: image/png Size: 6441 bytes Desc: image002.png Url : http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140123/128be8d5/image002.png From holger.hoffstaette at googlemail.com Thu Jan 23 05:22:42 2014 From: holger.hoffstaette at googlemail.com (=?UTF-8?B?SG9sZ2VyIEhvZmZzdMOkdHRl?=) Date: Thu, 23 Jan 2014 14:22:42 +0100 Subject: Unexplanable events in GC logs In-Reply-To: <6175F8C4FE407D4F830EDA25C27A43173B662980@Werum1790.werum.net> References: <6175F8C4FE407D4F830EDA25C27A43173B662980@Werum1790.werum.net> Message-ID: <52E117A2.7070405@googlemail.com> On 01/23/14 13:43, Denny Kettwig wrote: > An option I never considered, this is very likely the case. Any ideas > for the CMS issue? You do know that RMI periodically calls System.gc(), right? CMS will normally ignore this, but since you have enabled ExplicitGCInvokesConcurrent it might (probably will) mess with any estimations done by CMS. Have you tried without it? 
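(To make that mechanism concrete: RMI's distributed GC requests a collection once per sun.rmi.dgc.*.gcInterval. The sketch below is not the actual sun.rmi implementation, just an illustration of what that periodic request amounts to for the collector; the property name and the 3600000 ms fallback simply mirror the gcInterval settings used on these nodes.)

    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    public class PeriodicDgcSketch {
        public static void main(String[] args) {
            // Interval as configured on these nodes (falls back to 3600000 ms = 1 hour).
            long intervalMs = Long.getLong("sun.rmi.dgc.server.gcInterval", 3600000L);
            ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
            timer.scheduleAtFixedRate(new Runnable() {
                public void run() {
                    // What the periodic DGC request boils down to for the collector:
                    // with -XX:+ExplicitGCInvokesConcurrent this starts a concurrent CMS
                    // cycle; without the flag each call would be a stop-the-world Full GC.
                    System.gc();
                }
            }, intervalMs, intervalMs, TimeUnit.MILLISECONDS);
        }
    }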
-h From denny.kettwig at werum.de Thu Jan 23 06:10:47 2014 From: denny.kettwig at werum.de (Denny Kettwig) Date: Thu, 23 Jan 2014 14:10:47 +0000 Subject: AW: Unexplanable events in GC logs In-Reply-To: <52E117A2.7070405@googlemail.com> References: <6175F8C4FE407D4F830EDA25C27A43173B662980@Werum1790.werum.net> <52E117A2.7070405@googlemail.com> Message-ID: <6175F8C4FE407D4F830EDA25C27A43173B6629DA@Werum1790.werum.net> The RMI call interval is set by: -Dsun.rmi.dgc.client.gcInterval=3600000 -Dsun.rmi.dgc.server.gcInterval=3600000 And except for this particular part of the log, the CMS collection occurs once per hour. -------------------------------------------- Von: hotspot-gc-use-bounces at openjdk.java.net [mailto:hotspot-gc-use-bounces at openjdk.java.net] Im Auftrag von Holger Hoffstätte Gesendet: Thursday, January 23, 2014 2:25 PM An: hotspot-gc-use at openjdk.java.net Betreff: Re: Unexplanable events in GC logs On 01/23/14 13:43, Denny Kettwig wrote: > An option I never considered, this is very likely the case. Any ideas > for the CMS issue? You do know that RMI periodically calls System.gc(), right? CMS will normally ignore this, but since you have enabled ExplicitGCInvokesConcurrent it might (probably will) mess with any estimations done by CMS. Have you tried without it? -h _______________________________________________ hotspot-gc-use mailing list hotspot-gc-use at openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use From bernd-2014 at eckenfels.net Thu Jan 23 15:49:38 2014 From: bernd-2014 at eckenfels.net (Bernd Eckenfels) Date: Fri, 24 Jan 2014 00:49:38 +0100 Subject: AW: Unexplanable events in GC logs In-Reply-To: <6175F8C4FE407D4F830EDA25C27A43173B66291C@Werum1790.werum.net> References: <6175F8C4FE407D4F830EDA25C27A43173B66291C@Werum1790.werum.net> Message-ID: Am 23.01.2014, 09:38 Uhr, schrieb Denny Kettwig : > The first event is a very frequent CMS collections on the first node for > about 11 hours. We are talking here about a peak value of 258 CMS > collections per hour. From my knowledge this event starts without any > reason and ends without any reason since the old space is below 10% > usage in this time frame. I do not know what might have caused this CMS > collection. We already experienced similar events on this cluster in the > past, but at a much higher heap usage (above 90%) and under high load > with most likely a huge heap fragmentation and the frequent CMS > collections ended with a single very long Full GC that defragmented the > heap. In the current event all this is not the case. I attached the > relevant part of the GC log. I had discussed similar problems here in the past as well. I haven't really found the reason, but some things considered were the code cache filling up and a hosed estimator (using OccupancyOnly might help here). Greetings Bernd From jon.masamitsu at oracle.com Fri Jan 24 13:20:20 2014 From: jon.masamitsu at oracle.com (Jon Masamitsu) Date: Fri, 24 Jan 2014 13:20:20 -0800 Subject: AW: Unexplanable events in GC logs In-Reply-To: <6175F8C4FE407D4F830EDA25C27A43173B66291C@Werum1790.werum.net> References: <6175F8C4FE407D4F830EDA25C27A43173B66291C@Werum1790.werum.net> Message-ID: <52E2D914.4010406@oracle.com> Denny, Does your application use JNI critical regions (GetPrimitiveArrayCritical or GetStringCritical)? What JDK release is this? Jon On 1/23/2014 12:38 AM, Denny Kettwig wrote: > > Hey folks, > > in one of our recent cluster systems we found 2 unexplainable events > within the GC logs.
I'd like to address these events to you and ask > for your advice. We are running a clustered jBoss System with 4 nodes, > every node has the same configuration. We use the following parameters: > > -Xms10g > > -Xmx10g > > -Xmn3g > > -Xss2048k > > -XX:+ExplicitGCInvokesConcurrent > > -XX:+CMSClassUnloadingEnabled > > -XX:+UseParNewGC > > -XX:+UseConcMarkSweepGC > > -XX:ParallelGCThreads=22 > > -XX:SurvivorRatio=8 > > -XX:TargetSurvivorRatio=90 > > -XX:PermSize=512M > > -XX:MaxPermSize=512m > > -Dsun.rmi.dgc.client.gcInterval=3600000 > > -Dsun.rmi.dgc.server.gcInterval=3600000 > > *Event 1* > > The first event is a very frequent CMS collections on the first node > for about 11 hours. We are talking here about a peak value of 258 CMS > collections per hour. From my knowledge this event starts without any > reason and ends without any reason since the old space is below 10% > usage in this time frame. I do not know what might have caused this > CMS collection. We already experienced similar events on this cluster > in the past, but at a much higher heap usage (above 90%) and under > high load with most likely a huge heap fragmentation and the frequent > CMS collections ended with a single very long Full GC that > defragmented the heap. In the current event all this is not the case. > *I attached the relevant part of the GC log*. > > *Event 2* > > The second event is just as confusing for me as the first one. At a > certain point in time a full GC takes place on *all 4 Nodes *and I > cannot find a reason for it. Here are the relevant parts: > > N1 > > 2013-12-20T05:43:28.041+0100: 768231.382: [GC 768231.383: [ParNew: > 2618493K->92197K(2831168K), 0.3007160 secs] > 3120456K->594160K(10171200K), 0.3017204 secs] [Times: user=0.91 > sys=0.00, real=0.30 secs] > > 2013-12-20T05:45:31.864+0100: 768355.209: [Full GC 768355.210: [CMS: > 501963K->496288K(7340032K), 4.2140085 secs] > 1397267K->496288K(10171200K), [CMS Perm : 203781K->178528K(524288K)], > 4.2148018 secs] [Times: user=4.21 sys=0.00, real=4.21 secs] > > 2013-12-20T05:52:40.591+0100: 768783.949: [GC 768783.949: [ParNew: > 2516608K->48243K(2831168K), 0.2649174 secs] > 3012896K->544532K(10171200K), 0.2659039 secs] [Times: user=0.47 > sys=0.00, real=0.27 secs] > > N2 > > 2013-12-20T04:57:21.524+0100: 765208.310: [GC 765208.311: [ParNew: > 924566K->111068K(2831168K), 0.1790573 secs] > 1416514K->603015K(10171200K), 0.1797121 secs] [Times: user=1.08 > sys=0.00, real=0.19 secs] > > 2013-12-20T04:57:21.711+0100: 765208.499: [GC [1 CMS-initial-mark: > 491947K(7340032K)] 603015K(10171200K), 0.3289639 secs] [Times: > user=0.33 sys=0.00, real=0.33 secs] > > 2013-12-20T04:57:22.039+0100: 765208.828: [CMS-concurrent-mark-start] > > 2013-12-20T04:57:22.616+0100: 765209.405: [CMS-concurrent-mark: > 0.577/0.577 secs] [Times: user=3.53 sys=0.05, real=0.58 secs] > > 2013-12-20T04:57:22.616+0100: 765209.405: [CMS-concurrent-preclean-start] > > 2013-12-20T04:57:22.647+0100: 765209.430: [CMS-concurrent-preclean: > 0.024/0.024 secs] [Times: user=0.03 sys=0.00, real=0.03 secs] > > 2013-12-20T04:57:22.647+0100: 765209.430: > [CMS-concurrent-abortable-preclean-start] > > CMS: abort preclean due to time 2013-12-20T04:57:28.060+0100: > 765214.848: [CMS-concurrent-abortable-preclean: 4.208/5.418 secs] > [Times: user=4.10 sys=0.03, real=5.41 secs] > > 2013-12-20T04:57:28.076+0100: 765214.857: [GC[YG occupancy: 124872 K > (2831168 K)]765214.857: [Rescan (parallel) , 0.0580906 > secs]765214.916: [weak refs processing, 0.0000912 secs]765214.916: > [class unloading, 
0.0769742 secs]765214.993: [scrub symbol & string > tables, 0.0612689 secs] [1 CMS-remark: 491947K(7340032K)] > 616820K(10171200K), 0.2256506 secs] [Times: user=0.48 sys=0.00, > real=0.22 secs] > > 2013-12-20T04:57:28.294+0100: 765215.083: [CMS-concurrent-sweep-start] > > 2013-12-20T04:57:29.043+0100: 765215.834: [CMS-concurrent-sweep: > 0.750/0.750 secs] [Times: user=0.76 sys=0.00, real=0.75 secs] > > 2013-12-20T04:57:29.043+0100: 765215.834: [CMS-concurrent-reset-start] > > 2013-12-20T04:57:29.074+0100: 765215.856: [CMS-concurrent-reset: > 0.022/0.022 secs] [Times: user=0.03 sys=0.00, real=0.03 secs] > > 2013-12-20T05:47:54.607+0100: 768241.413: [Full GC 768241.414: [CMS: > 491947K->464864K(7340032K), 4.5053183 secs] > 1910412K->464864K(10171200K), [CMS Perm : 183511K->168997K(524288K)], > 4.5059088 secs] [Times: user=4.49 sys=0.02, real=4.51 secs] > > 2013-12-20T06:47:59.098+0100: 771845.954: [GC 771845.954: [ParNew: > 1507774K->58195K(2831168K), 0.2307012 secs] > 1972638K->523059K(10171200K), 0.2313931 secs] [Times: user=0.37 > sys=0.00, real=0.23 secs] > > N3 > > 2013-12-20T05:46:25.441+0100: 767981.526: [GC 767981.526: [ParNew: > 2695212K->166641K(2831168K), 0.3278475 secs] > 3268057K->739486K(10171200K), 0.3284853 secs] [Times: user=1.62 > sys=0.00, real=0.33 secs] > > 2013-12-20T05:49:55.467+0100: 768191.578: [Full GC 768191.578: [CMS: > 572844K->457790K(7340032K), 3.7687762 secs] > 1216176K->457790K(10171200K), [CMS Perm : 181711K->169514K(524288K)], > 3.7692999 secs] [Times: user=3.76 sys=0.00, real=3.79 secs] > > 2013-12-20T06:49:59.249+0100: 771795.415: [GC 771795.415: [ParNew: > 1585077K->72146K(2831168K), 0.2632945 secs] > 2042868K->529936K(10171200K), 0.2639889 secs] [Times: user=0.41 > sys=0.00, real=0.27 secs] > > N4 > > 2013-12-20T05:48:21.551+0100: 767914.067: [GC 767914.068: [ParNew: > 2656327K->119432K(2831168K), 0.2603676 secs] > 3222693K->685799K(10171200K), 0.2609581 secs] [Times: user=1.14 > sys=0.00, real=0.26 secs] > > 2013-12-20T05:49:03.939+0100: 767956.457: [Full GC 767956.457: [CMS: > 566366K->579681K(7340032K), 6.1324841 secs] > 3149011K->579681K(10171200K), [CMS Perm : 190240K->174791K(524288K)], > 6.1331389 secs] [Times: user=6.13 sys=0.00, real=6.13 secs] > > 2013-12-20T05:50:10.262+0100: 768022.762: [GC 768022.763: [ParNew: > 2516608K->83922K(2831168K), 0.2157015 secs] > 3096289K->663603K(10171200K), 0.2162262 secs] [Times: user=0.41 > sys=0.00, real=0.22 secs] > > Between these Full GC are only a few minutes or as between N3 and N4 a > few seconds. I made some research on possible reasons for a Full GC > and this is the list I gathered so far: > > 1.Running out of old gen > > 2.Running out of perm gen > > 3.Calling System.gc() (indicated by System in the ouput) > > 4.Not having enough free space in Survivor Space to copy objects from > Eden (promotion failed) > > 5.Running out of old gen before a concurrent collection can free it > (Concurrent Mode Failure) > > 6.Having high fragmentation and not enough space for a larger object > in old gen > > However none of these 6 conditions are fulfilled by any of the above > shown full GC. So once again I'm lost and do not have an explanation > for this. > > If you need the full logs for further analysis please let me know. > > Kind Regards > > Denny > > Beschreibung: werum-hr > > cid:image002.png at 01CEF67E.8AEB4630 > > Werum Software & Systems AG > > Wulf-Werum-Strasse 3 | 21337 Lueneburg | Germany > > Tel. 
+49(0)4131/8900-983 | Fax +49(0)4131/8900-20 > > mailto:denny.kettwig at werum.de | http://www.werum.de > > VAT No. DE 116 083 850 | RG Lueneburg HRB 2262 > > Chairman of Supervisory Board: Johannes Zimmermann > > Executive Board: Hartmut Krome, Ruediger Schlierenkaemper, Hans-Peter > Subel > > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140124/e29d81a2/attachment.html -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 1089 bytes Desc: not available Url : http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140124/e29d81a2/attachment.jpe -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 6441 bytes Desc: not available Url : http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140124/e29d81a2/attachment.png From bphinz at users.sourceforge.net Sat Jan 25 09:03:25 2014 From: bphinz at users.sourceforge.net (Brian Hinz) Date: Sat, 25 Jan 2014 12:03:25 -0500 Subject: Why does Clipboard.getData() use huge amount of heap memory? Message-ID: Hi, Apologies in advance if this is not the right place, but I maintain an open source java VNC viewer (TigerVNC) and I'm pretty much stumped over an OOM exception that gets thrown when I try to access the system clipboard and it contains a large amount of text data. It seems that the heap size jumps more than 10x the actual size of the clipboard data. For example, if I select the whole contents of a 20Mb text file and copy it all to the clipboard (outside the java app) then try to access the clipboard from my app while monitoring the heap size using jconsole, I see the heap size jump by 200-400Mb. I've isolated the source of the exception to the call to Clipboard.getData() (see code below). I've tried using different DataFlavors, etc., but all have the same result. Is this just an inefficiency in the implementation of getData that I'll have to live with? Any suggestions? TIA, -brian

public synchronized void checkClipboard() {
    SecurityManager sm = System.getSecurityManager();
    try {
      if (sm != null) sm.checkSystemClipboardAccess();
      Clipboard cb = Toolkit.getDefaultToolkit().getSystemClipboard();
      DataFlavor flavor = DataFlavor.stringFlavor;
      if (cb != null && cb.isDataFlavorAvailable(flavor)) {
        StringReader reader = null;
        try {
          reader = new StringReader((String)cb.getData(flavor));
          reader.read(clipBuf);
        } catch(java.lang.OutOfMemoryError e) {
          vlog.error("Too much data on local clipboard for VncViewer to handle!");
        } finally {
          if (reader != null) reader.close();
        }
        clipBuf.flip();
        String newContents = clipBuf.toString();
        if (!cc.clipboardDialog.compareContentsTo(newContents)) {
          cc.clipboardDialog.setContents(newContents);
          if (cc.viewer.sendClipboard.getValue())
            cc.writeClientCutText(newContents, newContents.length());
        }
        clipBuf.clear();
        // clear out the heap memory used by cb.getData() or else it starts to accumulate
        System.gc();
      }
    } catch(java.lang.Exception e) {
      vlog.debug("Exception getting clipboard data: " + e.getMessage());
    }
  }
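(One way to keep the copies bounded is to ask AWT for a Reader over the best available text flavor and append into a capped buffer in chunks, instead of materializing the full String and then pushing it through a StringReader into a CharBuffer. The sketch below is illustrative only; readClipboardText, the 8 KB chunk and the maxChars cap are made-up names and values, not TigerVNC code. Depending on the platform flavor, the data-transfer layer may still buffer the whole transfer internally, so this bounds what the application keeps rather than everything AWT allocates.)

    import java.awt.Toolkit;
    import java.awt.datatransfer.Clipboard;
    import java.awt.datatransfer.DataFlavor;
    import java.awt.datatransfer.Transferable;
    import java.awt.datatransfer.UnsupportedFlavorException;
    import java.io.IOException;
    import java.io.Reader;

    public class ClipboardTextSketch {
        static String readClipboardText(int maxChars) throws IOException {
            Clipboard cb = Toolkit.getDefaultToolkit().getSystemClipboard();
            Transferable contents = cb.getContents(null);
            if (contents == null) return null;
            DataFlavor flavor = DataFlavor.selectBestTextFlavor(contents.getTransferDataFlavors());
            if (flavor == null) return null;
            StringBuilder text = new StringBuilder();
            Reader reader = null;
            try {
                reader = flavor.getReaderForText(contents);
                char[] chunk = new char[8192];
                int n;
                // Stop early instead of letting a huge clipboard blow up the heap.
                while (text.length() < maxChars && (n = reader.read(chunk)) != -1) {
                    text.append(chunk, 0, Math.min(n, maxChars - text.length()));
                }
            } catch (UnsupportedFlavorException e) {
                return null;
            } finally {
                if (reader != null) reader.close();
            }
            return text.toString();
        }
    }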
-------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140125/a10371e5/attachment-0001.html From bernd-2014 at eckenfels.net Sat Jan 25 11:35:18 2014 From: bernd-2014 at eckenfels.net (Bernd Eckenfels) Date: Sat, 25 Jan 2014 20:35:18 +0100 Subject: Why does Clipboard.getData() use huge amount of heap memory? In-Reply-To: References: Message-ID: <1A94445B-6814-4E46-A474-2932755311DE@eckenfels.net> Not sure about the clipboard API, but with your reading from a StringReader you essentially double the amount of space used. (And UTF-16 chars also double the byte count for single-byte text.) Did you try to get a byte[] array instead? You can place a ByteBuffer on top of it with no additional copy. Bernd > Am 25.01.2014 um 18:03 schrieb Brian Hinz : > > Hi, > > Apologies in advance if this is not the right place, but I maintain an open source java VNC viewer (TigerVNC) and I'm pretty much stumped over an OOM exception that gets thrown when I try to access the system clipboard and it contains a large amount of text data. It seems that the heap size jumps more than 10x the actual size of the clipboard data. For example, if I select the whole contents of a 20Mb text file and copy it all to the clipboard (outside the java app) then try to access the clipboard from my app while monitoring the heap size using jconsole, I see the heap size jump by 200-400Mb. I've isolated the source of the exception to the call to Clipboard.getData() (see code below). I've tried using different DataFlavors, etc., but all have the same result. Is this just an inefficiency in the implementation of getData that I'll have to live with? Any suggestions? > > TIA, > -brian > >
> public synchronized void checkClipboard() {
>     SecurityManager sm = System.getSecurityManager();
>     try {
>       if (sm != null) sm.checkSystemClipboardAccess();
>       Clipboard cb = Toolkit.getDefaultToolkit().getSystemClipboard();
>       DataFlavor flavor = DataFlavor.stringFlavor;
>       if (cb != null && cb.isDataFlavorAvailable(flavor)) {
>         StringReader reader = null;
>         try {
>           reader = new StringReader((String)cb.getData(flavor));
>           reader.read(clipBuf);
>         } catch(java.lang.OutOfMemoryError e) {
>           vlog.error("Too much data on local clipboard for VncViewer to handle!");
>         } finally {
>           if (reader != null) reader.close();
>         }
>         clipBuf.flip();
>         String newContents = clipBuf.toString();
>         if (!cc.clipboardDialog.compareContentsTo(newContents)) {
>           cc.clipboardDialog.setContents(newContents);
>           if (cc.viewer.sendClipboard.getValue())
>             cc.writeClientCutText(newContents, newContents.length());
>         }
>         clipBuf.clear();
>         // clear out the heap memory used by cb.getData() or else it starts to accumulate
>         System.gc();
>       }
>     } catch(java.lang.Exception e) {
>       vlog.debug("Exception getting clipboard data: " + e.getMessage());
>     }
>   }
> _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
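(To illustrate the byte[] and ByteBuffer suggestion above: if the clipboard text can be obtained as a byte array in a known charset, which is platform-dependent and not shown here, wrapping the array costs no extra copy, and a CharsetDecoder can then produce characters in bounded chunks. The helper below is a sketch with made-up names and sizes, not a drop-in replacement.)

    import java.nio.ByteBuffer;
    import java.nio.CharBuffer;
    import java.nio.charset.Charset;
    import java.nio.charset.CharsetDecoder;
    import java.nio.charset.CoderResult;

    public class ByteBufferDecodeSketch {
        static String decodeInChunks(byte[] raw, String charsetName, int maxChars) {
            // ByteBuffer.wrap() shares the array: no additional copy of the payload.
            ByteBuffer in = ByteBuffer.wrap(raw);
            CharsetDecoder decoder = Charset.forName(charsetName).newDecoder();
            CharBuffer out = CharBuffer.allocate(8192);
            StringBuilder text = new StringBuilder();
            CoderResult result;
            do {
                // Decode at most one CharBuffer's worth of characters per pass.
                result = decoder.decode(in, out, true);
                out.flip();
                int room = maxChars - text.length();
                text.append(out, 0, Math.min(out.length(), room));
                out.clear();
            } while (result.isOverflow() && text.length() < maxChars);
            // Drain anything the decoder still holds (a no-op for most charsets).
            decoder.flush(out);
            out.flip();
            if (text.length() < maxChars) {
                text.append(out, 0, Math.min(out.length(), maxChars - text.length()));
            }
            return text.toString();
        }
    }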
-------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140125/460d8bc3/attachment.html From denny.kettwig at werum.de Mon Jan 27 01:50:44 2014 From: denny.kettwig at werum.de (Denny Kettwig) Date: Mon, 27 Jan 2014 09:50:44 +0000 Subject: AW: AW: Unexplanable events in GC logs In-Reply-To: <52E2D914.4010406@oracle.com> References: <6175F8C4FE407D4F830EDA25C27A43173B66291C@Werum1790.werum.net> <52E2D914.4010406@oracle.com> Message-ID: <6175F8C4FE407D4F830EDA25C27A43173B662E80@Werum1790.werum.net> Hey Jon, > Does your application use JNI critical regions? No. > What JDK release is this? JDK 1.6 u23 Regards, Denny Von: hotspot-gc-use-bounces at openjdk.java.net [mailto:hotspot-gc-use-bounces at openjdk.java.net] Im Auftrag von Jon Masamitsu Gesendet: Friday, January 24, 2014 10:24 PM An: hotspot-gc-use at openjdk.java.net Betreff: Re: AW: Unexplanable events in GC logs Denny, Does your application use JNI critical regions (GetPrimitiveArrayCritical or GetStringCritical)? What JDK release is this? Jon -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20140127/366e4d45/attachment.html