G1 GC clean up time is too long

Ryan Gardner ryebrye at gmail.com
Thu Jan 2 15:13:54 PST 2014


Be sure to try different values... 4096 for me was half the size of the
default value, yet it yielded far fewer coarsenings (which made no sense
to me at the time).

I'm going to try to dig up my logs from my tuning earlier to reply to the
previous email - it was a few months ago so the specifics aren't fresh in
my mind.

I seem to remember that there was a slight tradeoff for rset scanning. I
tried 1024, 2048, 4096, 8192 and the sweet spot for me was 4096.
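
Each run was basically the same setup with just that one value changed
(e.g. -XX:G1RSetRegionEntries=1024, then 2048, 4096, 8192), and I compared
the runs with roughly the same kind of grep you used below:

  grep "coarsenings\|GC cleanup" gc.log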

Let me know what you find. I'm curious to see if your results match mine.

Ryan
On Jan 2, 2014 5:35 PM, "yao" <yaoshengzhe at gmail.com> wrote:

> Hi Ryan,
>
> I've enabled the GC logging options you mentioned, and it looks like RSet
> coarsening is a problem for the large GC cleanup times. I will take your
> suggestions and try different G1RSetRegionEntries values. Thank you very
> much.
>
> Happy New Year
> -Shengzhe
>
> Typical RSet Log:
>  Concurrent RS processed 184839720 cards
>   Of 960997 completed buffers:
>        930426 ( 96.8%) by conc RS threads.
>         30571 (  3.2%) by mutator threads.
>   Conc RS threads times(s)
>           0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
>           0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
>   Total heap region rem set sizes = 5256086K.  Max = 8640K.
>   Static structures = 347K, free_lists = 7420K.
>     1166427614 occupied cards represented.
>     Max size region =
> 296:(O)[0x00007fc7a6000000,0x00007fc7a8000000,0x00007fc7a8000000], size =
> 8641K, occupied = 1797K.
>     Did 25790 coarsenings.
>
> Output of $ cat gc-hbase-1388692019.log | grep "coarsenings\|\(GC cleanup\)"
>     Did 0 coarsenings.
>     Did 0 coarsenings.
>     Did 0 coarsenings.
>     Did 0 coarsenings.
>     Did 0 coarsenings.
>     Did 0 coarsenings.
>     Did 72 coarsenings.
>     Did 224 coarsenings.
> 2014-01-02T15:12:03.031-0500: 1452.619: [GC cleanup 44G->43G(66G),
> 0.0376940 secs]
>     Did 1015 coarsenings.
>     Did 1476 coarsenings.
>     Did 2210 coarsenings.
> 2014-01-02T15:25:37.483-0500: 2267.070: [GC cleanup 43G->42G(66G),
> 0.0539190 secs]
>     Did 4123 coarsenings.
>     Did 4817 coarsenings.
>     Did 5362 coarsenings.
> 2014-01-02T15:40:19.499-0500: 3149.087: [GC cleanup 44G->42G(66G),
> 0.0661880 secs]
>     Did 6316 coarsenings.
>     Did 6842 coarsenings.
>     Did 7213 coarsenings.
> 2014-01-02T15:54:42.812-0500: 4012.400: [GC cleanup 43G->42G(66G),
> 0.0888960 secs]
>     Did 7458 coarsenings.
>     Did 7739 coarsenings.
>     Did 8214 coarsenings.
> 2014-01-02T16:09:04.009-0500: 4873.597: [GC cleanup 44G->43G(66G),
> 0.1171540 secs]
>     Did 8958 coarsenings.
>     Did 8973 coarsenings.
>     Did 9056 coarsenings.
>     Did 9543 coarsenings.
> 2014-01-02T16:23:51.359-0500: 5760.947: [GC cleanup 44G->43G(66G),
> 0.1526980 secs]
>     Did 9561 coarsenings.
>     Did 9873 coarsenings.
>     Did 10209 coarsenings.
> 2014-01-02T16:39:04.462-0500: 6674.050: [GC cleanup 44G->43G(66G),
> 0.1923330 secs]
>     Did 10599 coarsenings.
>     Did 10849 coarsenings.
>     Did 11178 coarsenings.
> 2014-01-02T16:46:57.445-0500: 7147.033: [GC cleanup 44G->44G(66G),
> 0.2353640 secs]
>     Did 11746 coarsenings.
>     Did 12701 coarsenings.
> 2014-01-02T16:53:17.536-0500: 7527.124: [GC cleanup 44G->44G(66G),
> 0.3489450 secs]
>     Did 13272 coarsenings.
>     Did 14682 coarsenings.
> 2014-01-02T16:58:00.726-0500: 7810.314: [GC cleanup 44G->44G(66G),
> 0.4271240 secs]
>     Did 16630 coarsenings.
> 2014-01-02T17:01:37.077-0500: 8026.664: [GC cleanup 44G->44G(66G),
> 0.5089060 secs]
>     Did 17612 coarsenings.
>     Did 21654 coarsenings.
> 2014-01-02T17:06:02.566-0500: 8292.154: [GC cleanup 44G->44G(66G),
> 0.5531680 secs]
>     Did 23774 coarsenings.
>     Did 24074 coarsenings.
> 2014-01-02T17:11:24.795-0500: 8614.383: [GC cleanup 44G->44G(66G),
> 0.5290600 secs]
>     Did 24768 coarsenings.
> 2014-01-02T17:17:23.219-0500: 8972.807: [GC cleanup 44G->44G(66G),
> 0.5382620 secs]
>     Did 25790 coarsenings.
>     Did 27047 coarsenings.
> 2014-01-02T17:23:00.551-0500: 9310.139: [GC cleanup 45G->44G(66G),
> 0.5107910 secs]
>     Did 28558 coarsenings.
> 2014-01-02T17:28:22.157-0500: 9631.745: [GC cleanup 45G->44G(66G),
> 0.4902690 secs]
>     Did 29272 coarsenings.
>     Did 29335 coarsenings.
>
>
> On Thu, Jan 2, 2014 at 10:49 AM, YU ZHANG <yu.zhang at oracle.com> wrote:
>
>>  Ryan,
>>
>> Please see my comments inline.
>>
>> Thanks,
>> Jenny
>>
>> On 1/2/2014 9:57 AM, Ryan Gardner wrote:
>>
>>  I've also fought with cleanup times being long with a large heap and
>> G1. In my case, I was suspicious that the RSet coarsening was increasing
>> the time for GC Cleanups.
>>
>> If you have a way to test different settings in a non-production
>> environment, you could consider experimenting with:
>>
>>
>>  -XX:+UnlockExperimentalVMOptions
>>
>> -XX:G1RSetRegionEntries=4096
>>
>> and different values for the RSetRegionEntries - 4096 was a sweet spot
>> for me, but your application may behave differently.
>>
>> You can turn on:
>>
>> -XX:+UnlockDiagnosticVMOptions
>>
>> -XX:+G1SummarizeRSetStats
>>
>> -XX:G1SummarizeRSetStatsPeriod=20
>>
>> to get it to spit out what it is doing and gain some more insight into
>> those times.
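>>
>> Pulled together, the whole set looks something like this on the command
>> line (everything besides these flags is just a placeholder for whatever
>> startup options you already use):
>>
>>   java -XX:+UseG1GC \
>>        -XX:+UnlockExperimentalVMOptions -XX:G1RSetRegionEntries=4096 \
>>        -XX:+UnlockDiagnosticVMOptions -XX:+G1SummarizeRSetStats \
>>        -XX:G1SummarizeRSetStatsPeriod=20 \
>>        ... YourApplication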
>>
>>
>>  The specific RSetRegionEntries value I set (4096) was, in theory, supposed
>> to be close to what the default would have been based on my region size
>> (also 32m) and number of regions - but it did not seem to be.
>>
>> If G1RSetRegionEntries is not set, it is computed as
>> G1RSetRegionEntriesBase*(region_size_log_mb+1).
>> G1RSetRegionEntriesBase is a constant (256). region_size_log_mb is derived
>> from the heap region size (the log2 of the region size in bytes minus 20,
>> i.e. the log2 of the region size in MB).
>>
>> If you have a 92G heap and a 32m region size, I guess the default value is
>> bigger than 4096?
>> Assuming my guess is right, did you decide to reduce the entries because
>> you were not seeing 'coarsenings' in the G1SummarizeRSetStats output? Did
>> you see the cards for old or young regions increase as the cleanup time
>> increased? Also, in your log, when the cleanup time increases, is it update
>> RS or scan RS?
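>>
>> To make sure we read that calculation the same way, here is a tiny sketch
>> of it as described above (the exact derivation of region_size_log_mb is my
>> assumption, not a copy of the HotSpot source):
>>
>>   // Sketch of the default G1RSetRegionEntries calculation described above.
>>   class G1RSetDefaultSketch {
>>       static long defaultEntries(long regionSizeBytes) {
>>           long base = 256;  // G1RSetRegionEntriesBase
>>           // assumption: region_size_log_mb ~ log2 of the region size in MB
>>           long regionSizeLogMb = 63 - Long.numberOfLeadingZeros(regionSizeBytes >> 20);
>>           return base * (regionSizeLogMb + 1);
>>       }
>>       public static void main(String[] args) {
>>           System.out.println(defaultEntries(32L * 1024 * 1024));  // 32m regions
>>       }
>>   }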
>>
>>  Also, if you have more memory available, I have found G1 to take the
>> extra memory and not increase pause times much. As you increase the total
>> heap size, the size of your smallest possible collection will also
>> increase, since it is set to a percentage of the total heap... In my case I
>> was tuning an application that was a cache, so it had tons of heap space
>> but wasn't churning it over much...
>>
>> I ended up going as low as:
>>
>> -XX:G1NewSizePercent=1
>>
>> to let G1 feel free to use as few regions as possible to achieve smaller
>> pause times.
>>
>> G1NewSizePercent (default 5) allows G1 to allocate this percentage of the
>> heap as the young gen size. Lowering it should result in a smaller young
>> gen, so the young GC pause is smaller.
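>> For example, on a 92G heap the default of 5 percent keeps the young gen at
>> a minimum of roughly 4.6G, while G1NewSizePercent=1 lowers that floor to
>> roughly 0.9G.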
>>
>>  I've been running in production on 1.7u40 for several months now with
>> 92GB heaps and a worst-case cleanup pause time of around 370ms. Prior to
>> tuning the RSet region entries, the cleanup phase was getting worse and
>> worse over time, and in testing it would sometimes be over 1 second.
>>
>> I meant to dive into the OpenJDK code to look at where the default
>> RSetRegionEntries are calculated, but didn't get around to it.
>>
>>
>>  Hope that helps,
>>
>> Ryan Gardner
>>
>>
>>  On Dec 31, 2013 8:29 PM, "yao" <yaoshengzhe at gmail.com> wrote:
>>
>>>   Hi Folks,
>>>
>>>  Sorry for reporting the GC performance results late; we are in the code
>>> freeze period for the holiday season and cannot do any production-related
>>> deployment.
>>>
>>> First, I'd like to say thank you to Jenny, Monica and Thomas. Your
>>> suggestions are really helpful and have helped us understand G1 GC
>>> behavior. We did NOT observe any full GCs after adjusting the suggested
>>> parameters. That is really awesome; we tried these new parameters on Dec
>>> 26 and full GCs have disappeared since then (at least until I am writing
>>> this email, at 3:37pm EST, Dec 30).
>>>
>>> G1 parameters:
>>>
>>> -XX:MaxGCPauseMillis=100
>>> -XX:G1HeapRegionSize=32m
>>> -XX:InitiatingHeapOccupancyPercent=65
>>> -XX:G1ReservePercent=20
>>> -XX:G1HeapWastePercent=5
>>> -XX:G1MixedGCLiveThresholdPercent=75
>>>
>>>  We've reduced MaxGCPauseMillis to 100 since our real-time system is
>>> focused on low pause times; if the system cannot respond within 50
>>> milliseconds, the response is totally useless for the client. However, the
>>> current read-latency 99th percentile is still slightly higher than on the
>>> CMS machines, though they are pretty close (14 millis vs 12 millis). One
>>> thing we can do now is increase the heap size for the G1 machines; for
>>> now, the G1 heap size is only 90 percent of that of the CMS machines. This
>>> is because we observed our server process being killed by the OOM killer
>>> on the G1 machines and decided to decrease the heap size there. Since
>>> G1ReservePercent was increased, we think it should be safe to increase the
>>> G1 heap to the same size as on the CMS machines. We believe it could give
>>> the G1 machines better performance, because 40 percent of the heap will be
>>> used for block cache.
>>>
>>>  Thanks
>>>  -Shengzhe
>>>
>>>  G1 Logs
>>>
>>>  2013-12-30T08:25:26.727-0500: 308692.158: [GC pause (young)
>>> Desired survivor size 234881024 bytes, new threshold 14 (max 15)
>>> - age   1:   16447904 bytes,   16447904 total
>>> - age   2:   30614384 bytes,   47062288 total
>>> - age   3:   16122104 bytes,   63184392 total
>>> - age   4:   16542280 bytes,   79726672 total
>>> - age   5:   14249520 bytes,   93976192 total
>>> - age   6:   15187728 bytes,  109163920 total
>>> - age   7:   15073808 bytes,  124237728 total
>>> - age   8:   17903552 bytes,  142141280 total
>>> - age   9:   17031280 bytes,  159172560 total
>>> - age  10:   16854792 bytes,  176027352 total
>>> - age  11:   19192480 bytes,  195219832 total
>>> - age  12:   20491176 bytes,  215711008 total
>>> - age  13:   16367528 bytes,  232078536 total
>>> - age  14:   15536120 bytes,  247614656 total
>>>  308692.158: [G1Ergonomics (CSet Construction) start choosing CSet,
>>> _pending_cards: 32768, predicted base time: 38.52 ms, remaining time: 61.48
>>> ms, target pause time: 100.00 ms]
>>>  308692.158: [G1Ergonomics (CSet Construction) add young regions to
>>> CSet, eden: 91 regions, survivors: 14 regions, predicted young region time:
>>> 27.76 ms]
>>>  308692.158: [G1Ergonomics (CSet Construction) finish choosing CSet,
>>> eden: 91 regions, survivors: 14 regions, old: 0 regions, predicted pause
>>> time: 66.28 ms, target pause time: 100.00 ms]
>>>  308692.233: [G1Ergonomics (Concurrent Cycles) request concurrent cycle
>>> initiation, reason: occupancy higher than threshold, occupancy: 52143587328
>>> bytes, allocation request: 0 bytes, threshold: 46172576125 bytes (65.00 %),
>>> source: end of GC]
>>> , 0.0749020 secs]
>>>    [Parallel Time: 53.9 ms, GC Workers: 18]
>>>       [GC Worker Start (ms): Min: 308692158.6, Avg: 308692159.0, Max:
>>> 308692159.4, Diff: 0.8]
>>>       [Ext Root Scanning (ms): Min: 3.9, Avg: 4.5, Max: 6.4, Diff: 2.4,
>>> Sum: 81.9]
>>>       [Update RS (ms): Min: 10.2, Avg: 11.6, Max: 12.2, Diff: 2.0, Sum:
>>> 209.0]
>>>          [Processed Buffers: Min: 15, Avg: 22.5, Max: 31, Diff: 16, Sum:
>>> 405]
>>>       [Scan RS (ms): Min: 7.8, Avg: 8.0, Max: 8.3, Diff: 0.5, Sum: 144.3]
>>>       [Object Copy (ms): Min: 28.3, Avg: 28.4, Max: 28.5, Diff: 0.2,
>>> Sum: 510.7]
>>>       [Termination (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum:
>>> 1.2]
>>>       [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1,
>>> Sum: 0.5]
>>>       [GC Worker Total (ms): Min: 52.3, Avg: 52.6, Max: 53.1, Diff: 0.8,
>>> Sum: 947.5]
>>>       [GC Worker End (ms): Min: 308692211.6, Avg: 308692211.7, Max:
>>> 308692211.7, Diff: 0.1]
>>>    [Code Root Fixup: 0.0 ms]
>>>    [Clear CT: 9.8 ms]
>>>    [Other: 11.1 ms]
>>>       [Choose CSet: 0.0 ms]
>>>       [Ref Proc: 2.4 ms]
>>>       [Ref Enq: 0.4 ms]
>>>        [Free CSet: 1.1 ms]
>>>    [Eden: 2912.0M(2912.0M)->0.0B(3616.0M) Survivors: 448.0M->416.0M
>>> Heap: 51.7G(66.2G)->48.9G(66.2G)]
>>>  [Times: user=1.07 sys=0.01, real=0.08 secs]
>>>  308697.312: [G1Ergonomics (Concurrent Cycles) initiate concurrent
>>> cycle, reason: concurrent cycle initiation requested]
>>> 2013-12-30T08:25:31.881-0500: 308697.312: [GC pause (young)
>>> (initial-mark)
>>> Desired survivor size 268435456 bytes, new threshold 15 (max 15)
>>> - age   1:   17798336 bytes,   17798336 total
>>>  - age   2:   15275456 bytes,   33073792 total
>>> - age   3:   27940176 bytes,   61013968 total
>>> - age   4:   15716648 bytes,   76730616 total
>>> - age   5:   16474656 bytes,   93205272 total
>>>  - age   6:   14249232 bytes,  107454504 total
>>> - age   7:   15187536 bytes,  122642040 total
>>> - age   8:   15073808 bytes,  137715848 total
>>> - age   9:   17362752 bytes,  155078600 total
>>>  - age  10:   17031280 bytes,  172109880 total
>>> - age  11:   16854792 bytes,  188964672 total
>>> - age  12:   19124800 bytes,  208089472 total
>>> - age  13:   20491176 bytes,  228580648 total
>>>  - age  14:   16367528 bytes,  244948176 total
>>>  308697.313: [G1Ergonomics (CSet Construction) start choosing CSet,
>>> _pending_cards: 31028, predicted base time: 37.87 ms, remaining time: 62.13
>>> ms, target pause time: 100.00 ms]
>>>  308697.313: [G1Ergonomics (CSet Construction) add young regions to
>>> CSet, eden: 113 regions, survivors: 13 regions, predicted young region
>>> time: 27.99 ms]
>>>  308697.313: [G1Ergonomics (CSet Construction) finish choosing CSet,
>>> eden: 113 regions, survivors: 13 regions, old: 0 regions, predicted pause
>>> time: 65.86 ms, target pause time: 100.00 ms]
>>> , 0.0724890 secs]
>>>    [Parallel Time: 51.9 ms, GC Workers: 18]
>>>       [GC Worker Start (ms): Min: 308697313.3, Avg: 308697313.7, Max:
>>> 308697314.0, Diff: 0.6]
>>>       [Ext Root Scanning (ms): Min: 4.3, Avg: 5.7, Max: 16.7, Diff:
>>> 12.3, Sum: 101.8]
>>>       [Update RS (ms): Min: 0.0, Avg: 9.3, Max: 10.4, Diff: 10.4, Sum:
>>> 166.9]
>>>          [Processed Buffers: Min: 0, Avg: 22.0, Max: 30, Diff: 30, Sum:
>>> 396]
>>>       [Scan RS (ms): Min: 6.4, Avg: 8.5, Max: 13.0, Diff: 6.5, Sum:
>>> 152.3]
>>>       [Object Copy (ms): Min: 22.5, Avg: 27.1, Max: 27.7, Diff: 5.2,
>>> Sum: 487.0]
>>>       [Termination (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum:
>>> 1.0]
>>>       [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1,
>>> Sum: 0.6]
>>>       [GC Worker Total (ms): Min: 50.2, Avg: 50.5, Max: 50.9, Diff: 0.6,
>>> Sum: 909.5]
>>>       [GC Worker End (ms): Min: 308697364.2, Avg: 308697364.2, Max:
>>> 308697364.3, Diff: 0.1]
>>>    [Code Root Fixup: 0.0 ms]
>>>    [Clear CT: 9.9 ms]
>>>    [Other: 10.8 ms]
>>>       [Choose CSet: 0.0 ms]
>>>       [Ref Proc: 2.8 ms]
>>>       [Ref Enq: 0.4 ms]
>>>        [Free CSet: 0.9 ms]
>>>    [Eden: 3616.0M(3616.0M)->0.0B(3520.0M) Survivors: 416.0M->448.0M
>>> Heap: 52.5G(66.2G)->49.0G(66.2G)]
>>>  [Times: user=1.01 sys=0.00, real=0.07 secs]
>>> 2013-12-30T08:25:31.954-0500: 308697.385: [GC
>>> concurrent-root-region-scan-start]
>>> 2013-12-30T08:25:31.967-0500: 308697.398: [GC
>>> concurrent-root-region-scan-end, 0.0131710 secs]
>>> 2013-12-30T08:25:31.967-0500: 308697.398: [GC concurrent-mark-start]
>>> 2013-12-30T08:25:36.566-0500: 308701.997: [GC concurrent-mark-end,
>>> 4.5984140 secs]
>>> 2013-12-30T08:25:36.570-0500: 308702.002: [GC remark
>>> 2013-12-30T08:25:36.573-0500: 308702.004: [GC ref-proc, 0.0126990 secs],
>>> 0.0659540 secs]
>>>  [Times: user=0.87 sys=0.00, real=0.06 secs]
>>> 2013-12-30T08:25:36.641-0500: 308702.072: [GC cleanup 52G->52G(66G),
>>> 0.5487830 secs]
>>>  [Times: user=9.66 sys=0.06, real=0.54 secs]
>>> 2013-12-30T08:25:37.190-0500: 308702.622: [GC concurrent-cleanup-start]
>>> 2013-12-30T08:25:37.190-0500: 308702.622: [GC concurrent-cleanup-end,
>>> 0.0000480 secs]