G1 GC clean up time is too long
yao
yaoshengzhe at gmail.com
Thu Jan 2 14:35:10 PST 2014
Hi Ryan,
I've enabled the GC logging options you mentioned, and it looks like RSet
coarsening is indeed the cause of the long GC cleanup times. I will take your
suggestion and try different G1RSetRegionEntries values. Thank you very much.
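For reference, the experimental option I plan to sweep next (on top of the
logging options already enabled; 4096 is just the starting point you
suggested, not a final choice):

-XX:+UnlockExperimentalVMOptions -XX:G1RSetRegionEntries=4096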
Happy New Year
-Shengzhe
*Typical RSet Log*
Concurrent RS processed 184839720 cards
Of 960997 completed buffers:
930426 ( 96.8%) by conc RS threads.
30571 ( 3.2%) by mutator threads.
Conc RS threads times(s)
     0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Total heap region rem set sizes = 5256086K. Max = 8640K.
Static structures = 347K, free_lists = 7420K.
1166427614 occupied cards represented.
Max size region =
296:(O)[0x00007fc7a6000000,0x00007fc7a8000000,0x00007fc7a8000000], size =
8641K, occupied = 1797K.
Did 25790 coarsenings.
Output of *$ cat gc-hbase-1388692019.log | grep "coarsenings\|\(GC cleanup\)"*
Did 0 coarsenings.
Did 0 coarsenings.
Did 0 coarsenings.
Did 0 coarsenings.
Did 0 coarsenings.
Did 0 coarsenings.
Did 72 coarsenings.
Did 224 coarsenings.
2014-01-02T15:12:03.031-0500: 1452.619: [GC cleanup 44G->43G(66G),
0.0376940 secs]
Did 1015 coarsenings.
Did 1476 coarsenings.
Did 2210 coarsenings.
2014-01-02T15:25:37.483-0500: 2267.070: [GC cleanup 43G->42G(66G),
0.0539190 secs]
Did 4123 coarsenings.
Did 4817 coarsenings.
Did 5362 coarsenings.
2014-01-02T15:40:19.499-0500: 3149.087: [GC cleanup 44G->42G(66G),
0.0661880 secs]
Did 6316 coarsenings.
Did 6842 coarsenings.
Did 7213 coarsenings.
2014-01-02T15:54:42.812-0500: 4012.400: [GC cleanup 43G->42G(66G),
0.0888960 secs]
Did 7458 coarsenings.
Did 7739 coarsenings.
Did 8214 coarsenings.
2014-01-02T16:09:04.009-0500: 4873.597: [GC cleanup 44G->43G(66G),
0.1171540 secs]
Did 8958 coarsenings.
Did 8973 coarsenings.
Did 9056 coarsenings.
Did 9543 coarsenings.
2014-01-02T16:23:51.359-0500: 5760.947: [GC cleanup 44G->43G(66G),
0.1526980 secs]
Did 9561 coarsenings.
Did 9873 coarsenings.
Did 10209 coarsenings.
2014-01-02T16:39:04.462-0500: 6674.050: [GC cleanup 44G->43G(66G),
0.1923330 secs]
Did 10599 coarsenings.
Did 10849 coarsenings.
Did 11178 coarsenings.
2014-01-02T16:46:57.445-0500: 7147.033: [GC cleanup 44G->44G(66G),
0.2353640 secs]
Did 11746 coarsenings.
Did 12701 coarsenings.
2014-01-02T16:53:17.536-0500: 7527.124: [GC cleanup 44G->44G(66G),
0.3489450 secs]
Did 13272 coarsenings.
Did 14682 coarsenings.
2014-01-02T16:58:00.726-0500: 7810.314: [GC cleanup 44G->44G(66G),
0.4271240 secs]
Did 16630 coarsenings.
2014-01-02T17:01:37.077-0500: 8026.664: [GC cleanup 44G->44G(66G),
0.5089060 secs]
Did 17612 coarsenings.
Did 21654 coarsenings.
2014-01-02T17:06:02.566-0500: 8292.154: [GC cleanup 44G->44G(66G),
0.5531680 secs]
Did 23774 coarsenings.
Did 24074 coarsenings.
2014-01-02T17:11:24.795-0500: 8614.383: [GC cleanup 44G->44G(66G),
0.5290600 secs]
Did 24768 coarsenings.
2014-01-02T17:17:23.219-0500: 8972.807: [GC cleanup 44G->44G(66G),
0.5382620 secs]
Did 25790 coarsenings.
Did 27047 coarsenings.
2014-01-02T17:23:00.551-0500: 9310.139: [GC cleanup 45G->44G(66G),
0.5107910 secs]
Did 28558 coarsenings.
2014-01-02T17:28:22.157-0500: 9631.745: [GC cleanup 45G->44G(66G),
0.4902690 secs]
Did 29272 coarsenings.
Did 29335 coarsenings.
On Thu, Jan 2, 2014 at 10:49 AM, YU ZHANG <yu.zhang at oracle.com> wrote:
> Ryan,
>
> Please see my comments in line.
>
> Thanks,
> Jenny
>
> On 1/2/2014 9:57 AM, Ryan Gardner wrote:
>
> I've also fought with cleanup times being long with a large heap and G1.
> In my case, I was suspicious that the RSet coarsening was increasing the
> time for GC Cleanups.
>
> If you have a way to test different settings in a non-production
> environment, you could consider experimenting with:
>
>
> -XX:+UnlockExperimentalVMOptions
>
> -XX:G1RSetRegionEntries=4096
>
> and different values for the RSetRegionEntries - 4096 was a sweet spot for
> me, but your application may behave differently.
>
> You can turn on:
>
> -XX:+UnlockDiagnosticVMOptions
>
> -XX:+G1SummarizeRSetStats
>
> -XX:G1SummarizeRSetStatsPeriod=20
>
> to make it print what it is doing and get some more insight into those
> times.
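> As a rough sketch of how these fit together (not my exact command line; the
> heap size and the entries value are placeholders, and the rest of the
> command line is elided):
>
>   java -XX:+UseG1GC -Xms92g -Xmx92g \
>     -XX:+UnlockExperimentalVMOptions -XX:G1RSetRegionEntries=4096 \
>     -XX:+UnlockDiagnosticVMOptions -XX:+G1SummarizeRSetStats \
>     -XX:G1SummarizeRSetStatsPeriod=20 ...
>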
>
>
> The specific number of RSetRegionEntries I set (4096) was, in theory,
> supposed to be close to the default it computes based on my region size
> (also 32m) and number of regions - but it did not seem to be.
>
> If G1RSetRegionEntries is not set, it is computed as
> G1RSetRegionEntriesBase * (region_size_log_mb + 1).
> G1RSetRegionEntriesBase is a constant (256), and region_size_log_mb is
> derived from the heap region size (the log2 of the region size in bytes
> minus 20).
>
> If you have a 92G heap and a 32m region size, I guess the default value is
> bigger than 4096?
> Assuming my guess is right, did you decide to reduce the entries because you
> were not seeing 'coarsenings' in the G1SummarizeRSetStats output? Did you
> see the cards for old or young regions increase as the cleanup time
> increased? Also, in your log, when the cleanup time increases, is it Update
> RS or Scan RS that grows?
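> (A quick way to check, assuming the usual -XX:+PrintGCDetails output: run
> something like grep -E "Update RS|Scan RS" over the gc log and see whether
> either of them grows along with the cleanup times.)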
>
> Also, if you have more memory available, I have found G1 to take the extra
> memory and not increase pause times much. As you increase the total heap
> size, the size of your smallest possible collection will also increase,
> since it is set as a percentage of the total heap... In my case I was tuning
> an application that was a cache, so it had tons of heap space but wasn't
> churning it over much...
>
> I ended up going as low as:
>
> -XX:G1NewSizePercent=1
>
> to let G1 feel free to use as few regions as possible to achieve smaller
> pause times.
>
> G1NewSizePercent (default 5) allows G1 to allocate this percentage of the
> heap as the minimum young gen size. Lowering it should result in a smaller
> young gen, so the young GC pauses are smaller.
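> (For example, on a 92G heap that would mean a minimum young gen of roughly
> 0.9G at 1 percent, versus about 4.6G at the default 5 percent.)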
>
> I've been running in production on 1.7u40 for several months now with
> 92GB heaps and a worst-case cleanup pause time of around 370ms - prior to
> tuning the rset region entries, the cleanup phase was getting worse and
> worse over time and in testing would sometimes be over 1 second.
>
> I meant to dive into the OpenJDK code to look at where the default
> RSetRegionEntries are calculated, but didn't get around to it.
>
>
> Hope that helps,
>
> Ryan Gardner
>
>
> On Dec 31, 2013 8:29 PM, "yao" <yaoshengzhe at gmail.com> wrote:
>
>> Hi Folks,
>>
>> Sorry for reporting the GC performance results late; we are in the code
>> freeze period for the holiday season and cannot do any production-related
>> deployment.
>>
>> First, I'd like to say thank you to Jenny, Monica and Thomas. Your
>> suggestions are really helpful and helped us understand G1 GC behavior. We
>> did NOT observe any full GCs after adjusting the suggested parameters. That
>> is really awesome; we tried the new parameters on Dec 26 and full GCs have
>> not appeared since then (at least as of writing this email, at 3:37pm EST,
>> Dec 30).
>>
>> G1 parameters:
>>
>>   -XX:MaxGCPauseMillis=100
>>   -XX:G1HeapRegionSize=32m
>>   -XX:InitiatingHeapOccupancyPercent=65
>>   -XX:G1ReservePercent=20
>>   -XX:G1HeapWastePercent=5
>>   -XX:G1MixedGCLiveThresholdPercent=75
>>
>> We've reduced MaxGCPauseMillis to 100 since our real-time system is focused
>> on low pause times; if the system cannot respond within 50 milliseconds,
>> the response is useless to the client. However, the current 99th-percentile
>> read latency is still slightly higher than on the CMS machines, though they
>> are pretty close (14 millis vs 12 millis). One thing we can do now is to
>> increase the heap size on the G1 machines; for now, the G1 heap is only 90
>> percent of the CMS machines' heap. This is because we observed our server
>> process being killed by the OOM killer on the G1 machines and decided to
>> decrease their heap size. Since G1ReservePercent was increased, we think it
>> should be safe to increase the G1 heap to the same size as on the CMS
>> machines. We believe this could give the G1 machines better performance,
>> because 40 percent of the heap will be used for the block cache.
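>> (As a rough sanity check against the numbers in the logs below: on the
>> current ~66G G1 heap, G1ReservePercent=20 keeps roughly 13G free as
>> reserve, and 40 percent for the block cache is about 26G.)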
>>
>> Thanks
>> -Shengzhe
>>
>> G1 Logs
>>
>> 2013-12-30T08:25:26.727-0500: 308692.158: [GC pause (young)
>> Desired survivor size 234881024 bytes, new threshold 14 (max 15)
>> - age 1: 16447904 bytes, 16447904 total
>> - age 2: 30614384 bytes, 47062288 total
>> - age 3: 16122104 bytes, 63184392 total
>> - age 4: 16542280 bytes, 79726672 total
>> - age 5: 14249520 bytes, 93976192 total
>> - age 6: 15187728 bytes, 109163920 total
>> - age 7: 15073808 bytes, 124237728 total
>> - age 8: 17903552 bytes, 142141280 total
>> - age 9: 17031280 bytes, 159172560 total
>> - age 10: 16854792 bytes, 176027352 total
>> - age 11: 19192480 bytes, 195219832 total
>> - age 12: 20491176 bytes, 215711008 total
>> - age 13: 16367528 bytes, 232078536 total
>> - age 14: 15536120 bytes, 247614656 total
>> 308692.158: [G1Ergonomics (CSet Construction) start choosing CSet,
>> _pending_cards: 32768, predicted base time: 38.52 ms, remaining time: 61.48
>> ms, target pause time: 100.00 ms]
>> 308692.158: [G1Ergonomics (CSet Construction) add young regions to CSet,
>> eden: 91 regions, survivors: 14 regions, predicted young region time: 27.76
>> ms]
>> 308692.158: [G1Ergonomics (CSet Construction) finish choosing CSet,
>> eden: 91 regions, survivors: 14 regions, old: 0 regions, predicted pause
>> time: 66.28 ms, target pause time: 100.00 ms]
>> 308692.233: [G1Ergonomics (Concurrent Cycles) request concurrent cycle
>> initiation, reason: occupancy higher than threshold, occupancy: 52143587328
>> bytes, allocation request: 0 bytes, threshold: 46172576125 bytes (65.00 %),
>> source: end of GC]
>> , 0.0749020 secs]
>> [Parallel Time: 53.9 ms, GC Workers: 18]
>> [GC Worker Start (ms): Min: 308692158.6, Avg: 308692159.0, Max:
>> 308692159.4, Diff: 0.8]
>> [Ext Root Scanning (ms): Min: 3.9, Avg: 4.5, Max: 6.4, Diff: 2.4,
>> Sum: 81.9]
>> [Update RS (ms): Min: 10.2, Avg: 11.6, Max: 12.2, Diff: 2.0, Sum:
>> 209.0]
>> [Processed Buffers: Min: 15, Avg: 22.5, Max: 31, Diff: 16, Sum:
>> 405]
>> [Scan RS (ms): Min: 7.8, Avg: 8.0, Max: 8.3, Diff: 0.5, Sum: 144.3]
>> [Object Copy (ms): Min: 28.3, Avg: 28.4, Max: 28.5, Diff: 0.2, Sum:
>> 510.7]
>> [Termination (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum:
>> 1.2]
>> [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1,
>> Sum: 0.5]
>> [GC Worker Total (ms): Min: 52.3, Avg: 52.6, Max: 53.1, Diff: 0.8,
>> Sum: 947.5]
>> [GC Worker End (ms): Min: 308692211.6, Avg: 308692211.7, Max:
>> 308692211.7, Diff: 0.1]
>> [Code Root Fixup: 0.0 ms]
>> [Clear CT: 9.8 ms]
>> [Other: 11.1 ms]
>> [Choose CSet: 0.0 ms]
>> [Ref Proc: 2.4 ms]
>> [Ref Enq: 0.4 ms]
>> [Free CSet: 1.1 ms]
>> [Eden: 2912.0M(2912.0M)->0.0B(3616.0M) Survivors: 448.0M->416.0M Heap:
>> 51.7G(66.2G)->48.9G(66.2G)]
>> [Times: user=1.07 sys=0.01, real=0.08 secs]
>> 308697.312: [G1Ergonomics (Concurrent Cycles) initiate concurrent cycle,
>> reason: concurrent cycle initiation requested]
>> 2013-12-30T08:25:31.881-0500: 308697.312: [GC pause (young) (initial-mark)
>> Desired survivor size 268435456 bytes, new threshold 15 (max 15)
>> - age 1: 17798336 bytes, 17798336 total
>> - age 2: 15275456 bytes, 33073792 total
>> - age 3: 27940176 bytes, 61013968 total
>> - age 4: 15716648 bytes, 76730616 total
>> - age 5: 16474656 bytes, 93205272 total
>> - age 6: 14249232 bytes, 107454504 total
>> - age 7: 15187536 bytes, 122642040 total
>> - age 8: 15073808 bytes, 137715848 total
>> - age 9: 17362752 bytes, 155078600 total
>> - age 10: 17031280 bytes, 172109880 total
>> - age 11: 16854792 bytes, 188964672 total
>> - age 12: 19124800 bytes, 208089472 total
>> - age 13: 20491176 bytes, 228580648 total
>> - age 14: 16367528 bytes, 244948176 total
>> 308697.313: [G1Ergonomics (CSet Construction) start choosing CSet,
>> _pending_cards: 31028, predicted base time: 37.87 ms, remaining time: 62.13
>> ms, target pause time: 100.00 ms]
>> 308697.313: [G1Ergonomics (CSet Construction) add young regions to CSet,
>> eden: 113 regions, survivors: 13 regions, predicted young region time:
>> 27.99 ms]
>> 308697.313: [G1Ergonomics (CSet Construction) finish choosing CSet,
>> eden: 113 regions, survivors: 13 regions, old: 0 regions, predicted pause
>> time: 65.86 ms, target pause time: 100.00 ms]
>> , 0.0724890 secs]
>> [Parallel Time: 51.9 ms, GC Workers: 18]
>> [GC Worker Start (ms): Min: 308697313.3, Avg: 308697313.7, Max:
>> 308697314.0, Diff: 0.6]
>> [Ext Root Scanning (ms): Min: 4.3, Avg: 5.7, Max: 16.7, Diff: 12.3,
>> Sum: 101.8]
>> [Update RS (ms): Min: 0.0, Avg: 9.3, Max: 10.4, Diff: 10.4, Sum:
>> 166.9]
>> [Processed Buffers: Min: 0, Avg: 22.0, Max: 30, Diff: 30, Sum:
>> 396]
>> [Scan RS (ms): Min: 6.4, Avg: 8.5, Max: 13.0, Diff: 6.5, Sum: 152.3]
>> [Object Copy (ms): Min: 22.5, Avg: 27.1, Max: 27.7, Diff: 5.2, Sum:
>> 487.0]
>> [Termination (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum:
>> 1.0]
>> [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1,
>> Sum: 0.6]
>> [GC Worker Total (ms): Min: 50.2, Avg: 50.5, Max: 50.9, Diff: 0.6,
>> Sum: 909.5]
>> [GC Worker End (ms): Min: 308697364.2, Avg: 308697364.2, Max:
>> 308697364.3, Diff: 0.1]
>> [Code Root Fixup: 0.0 ms]
>> [Clear CT: 9.9 ms]
>> [Other: 10.8 ms]
>> [Choose CSet: 0.0 ms]
>> [Ref Proc: 2.8 ms]
>> [Ref Enq: 0.4 ms]
>> [Free CSet: 0.9 ms]
>> [Eden: 3616.0M(3616.0M)->0.0B(3520.0M) Survivors: 416.0M->448.0M Heap:
>> 52.5G(66.2G)->49.0G(66.2G)]
>> [Times: user=1.01 sys=0.00, real=0.07 secs]
>> 2013-12-30T08:25:31.954-0500: 308697.385: [GC
>> concurrent-root-region-scan-start]
>> 2013-12-30T08:25:31.967-0500: 308697.398: [GC
>> concurrent-root-region-scan-end, 0.0131710 secs]
>> 2013-12-30T08:25:31.967-0500: 308697.398: [GC concurrent-mark-start]
>> 2013-12-30T08:25:36.566-0500: 308701.997: [GC concurrent-mark-end,
>> 4.5984140 secs]
>> 2013-12-30T08:25:36.570-0500: 308702.002: [GC remark
>> 2013-12-30T08:25:36.573-0500: 308702.004: [GC ref-proc, 0.0126990 secs],
>> 0.0659540 secs]
>> [Times: user=0.87 sys=0.00, real=0.06 secs]
>> 2013-12-30T08:25:36.641-0500: 308702.072: [GC cleanup 52G->52G(66G),
>> 0.5487830 secs]
>> [Times: user=9.66 sys=0.06, real=0.54 secs]
>> 2013-12-30T08:25:37.190-0500: 308702.622: [GC concurrent-cleanup-start]
>> 2013-12-30T08:25:37.190-0500: 308702.622: [GC concurrent-cleanup-end,
>> 0.0000480 secs]
>>
>>
>>
>> _______________________________________________
>> hotspot-gc-use mailing list
>> hotspot-gc-use at openjdk.java.net
>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
>>