G1 GC clean up time is too long

yao yaoshengzhe at gmail.com
Thu Jan 2 14:35:10 PST 2014


Hi Ryan,

I've enabled the GC logging options you mentioned, and it looks like RSet
coarsening is behind the long GC cleanup times. I will take your
suggestion and try different G1RSetRegionEntries values. Thank you very
much.
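
Concretely, these are the extra options I plan to test, starting from the
value that worked for you and adjusting from there:

  -XX:+UnlockExperimentalVMOptions
  -XX:G1RSetRegionEntries=4096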

Happy New Year
-Shengzhe

Typical RSet log:
 Concurrent RS processed 184839720 cards
  Of 960997 completed buffers:
       930426 ( 96.8%) by conc RS threads.
        30571 (  3.2%) by mutator threads.
  Conc RS threads times(s)
          0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00
          0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00
  Total heap region rem set sizes = 5256086K.  Max = 8640K.
  Static structures = 347K, free_lists = 7420K.
    1166427614 occupied cards represented.
    Max size region =
296:(O)[0x00007fc7a6000000,0x00007fc7a8000000,0x00007fc7a8000000], size =
8641K, occupied = 1797K.
    Did 25790 coarsenings.

Output of $ cat gc-hbase-1388692019.log | grep "coarsenings\|\(GC cleanup\)":
    Did 0 coarsenings.
    Did 0 coarsenings.
    Did 0 coarsenings.
    Did 0 coarsenings.
    Did 0 coarsenings.
    Did 0 coarsenings.
    Did 72 coarsenings.
    Did 224 coarsenings.
2014-01-02T15:12:03.031-0500: 1452.619: [GC cleanup 44G->43G(66G),
0.0376940 secs]
    Did 1015 coarsenings.
    Did 1476 coarsenings.
    Did 2210 coarsenings.
2014-01-02T15:25:37.483-0500: 2267.070: [GC cleanup 43G->42G(66G),
0.0539190 secs]
    Did 4123 coarsenings.
    Did 4817 coarsenings.
    Did 5362 coarsenings.
2014-01-02T15:40:19.499-0500: 3149.087: [GC cleanup 44G->42G(66G),
0.0661880 secs]
    Did 6316 coarsenings.
    Did 6842 coarsenings.
    Did 7213 coarsenings.
2014-01-02T15:54:42.812-0500: 4012.400: [GC cleanup 43G->42G(66G),
0.0888960 secs]
    Did 7458 coarsenings.
    Did 7739 coarsenings.
    Did 8214 coarsenings.
2014-01-02T16:09:04.009-0500: 4873.597: [GC cleanup 44G->43G(66G),
0.1171540 secs]
    Did 8958 coarsenings.
    Did 8973 coarsenings.
    Did 9056 coarsenings.
    Did 9543 coarsenings.
2014-01-02T16:23:51.359-0500: 5760.947: [GC cleanup 44G->43G(66G),
0.1526980 secs]
    Did 9561 coarsenings.
    Did 9873 coarsenings.
    Did 10209 coarsenings.
2014-01-02T16:39:04.462-0500: 6674.050: [GC cleanup 44G->43G(66G),
0.1923330 secs]
    Did 10599 coarsenings.
    Did 10849 coarsenings.
    Did 11178 coarsenings.
2014-01-02T16:46:57.445-0500: 7147.033: [GC cleanup 44G->44G(66G),
0.2353640 secs]
    Did 11746 coarsenings.
    Did 12701 coarsenings.
2014-01-02T16:53:17.536-0500: 7527.124: [GC cleanup 44G->44G(66G),
0.3489450 secs]
    Did 13272 coarsenings.
    Did 14682 coarsenings.
2014-01-02T16:58:00.726-0500: 7810.314: [GC cleanup 44G->44G(66G),
0.4271240 secs]
    Did 16630 coarsenings.
2014-01-02T17:01:37.077-0500: 8026.664: [GC cleanup 44G->44G(66G),
0.5089060 secs]
    Did 17612 coarsenings.
    Did 21654 coarsenings.
2014-01-02T17:06:02.566-0500: 8292.154: [GC cleanup 44G->44G(66G),
0.5531680 secs]
    Did 23774 coarsenings.
    Did 24074 coarsenings.
2014-01-02T17:11:24.795-0500: 8614.383: [GC cleanup 44G->44G(66G),
0.5290600 secs]
    Did 24768 coarsenings.
2014-01-02T17:17:23.219-0500: 8972.807: [GC cleanup 44G->44G(66G),
0.5382620 secs]
    Did 25790 coarsenings.
    Did 27047 coarsenings.
2014-01-02T17:23:00.551-0500: 9310.139: [GC cleanup 45G->44G(66G),
0.5107910 secs]
    Did 28558 coarsenings.
2014-01-02T17:28:22.157-0500: 9631.745: [GC cleanup 45G->44G(66G),
0.4902690 secs]
    Did 29272 coarsenings.
    Did 29335 coarsenings.


On Thu, Jan 2, 2014 at 10:49 AM, YU ZHANG <yu.zhang at oracle.com> wrote:

>  Ryan,
>
> Please see my comments in line.
>
> Thanks,
> Jenny
>
> On 1/2/2014 9:57 AM, Ryan Gardner wrote:
>
>  I've also fought with cleanup times being long with a large heap and G1.
> In my case, I was suspicious that the RSet coarsening was increasing the
> time for GC Cleanups.
>
> If you have a way to test different settings in a non-production
> environment, you could consider experimenting with:
>
>
>  -XX:+UnlockExperimentalVMOptions
>
> -XX:G1RSetRegionEntries=4096
>
> and different values for the RSetRegionEntries - 4096 was a sweet spot for
> me, but your application may behave differently.
>
> You can turn on:
>
> -XX:+UnlockDiagnosticVMOptions
>
> -XX:+G1SummarizeRSetStats
>
> -XX:G1SummarizeRSetStatsPeriod=20
>
> to get it to spit out what it is doing and give you some more insight into
> those times.
>
>
>  The specific number of RSetRegionEntries I set (4096) was, in theory,
> supposed to be close to what G1 was already setting based on my region size
> (also 32m) and number of regions, but it did not seem to be.
>
> If G1RSetRegionEntries is not set, it is computed as
> G1RSetRegionEntriesBase * (region_size_log_mb + 1).
> G1RSetRegionEntriesBase is a constant (256). region_size_log_mb is derived
> from the heap region size (the log2 of the region size in bytes, minus 20,
> i.e. the log2 of the region size in MB).
>
> If you have a 92G heap and 32m region size, I guess the default value is
> bigger than 4096?
> Assuming my guess was right, did you decide to reduce the entries because you
> were not seeing 'coarsenings' in the G1SummarizeRSetStats output?  Did you see
> the cards for old or young regions increase as the cleanup time increased?
> Also, in your log, when the cleanup time increases, is it the Update RS or the
> Scan RS time that grows?
>
>  Also, if you have more memory available, I have found G1 to take the
> extra memory and not increase pause times much. As you increase the total
> heap size, the size of your smallest possible collection will also increase,
> since it is set to a percentage of the total heap... In my case I was tuning
> an application that was a cache, so it had tons of heap space but wasn't
> churning it over much...
>
> I ended up going as low as:
>
> -XX:G1NewSizePercent=1
>
> to let G1 feel free to use as few regions as possible to achieve smaller
> pause times.
>
> G1NewSizePercent (default 5) sets the minimum percentage of the heap that G1
> uses for the young gen.  Lowering it should result in a smaller young gen, so
> the young GC pauses are shorter.
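>
> As a rough worked example (my arithmetic): with a 92G heap, the 5% default
> keeps the young gen at no less than about 4.6G, while 1% lets it shrink to
> roughly 0.9G, which means much less young-gen copying work per pause.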
>
>  I've been running in production on 1.7u40 for several months now with
> 92GB heaps and a worst-case cleanup pause time of around 370ms. Prior to
> tuning the RSet region entries, the cleanup phase was getting worse and
> worse over time and in testing would sometimes be over 1 second.
>
> I meant to dive into the OpenJDK code to look at where the default
> RSetRegionEntries are calculated, but didn't get around to it.
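>
> (For reference, working through the formula above for 32m regions, if I am
> reading the code right: region_size_log_mb = log2(32m) - 20 = 25 - 20 = 5,
> so the default comes out to 256 * (5 + 1) = 1536 entries.)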
>
>
>  Hope that helps,
>
> Ryan Gardner
>
>
>  On Dec 31, 2013 8:29 PM, "yao" <yaoshengzhe at gmail.com> wrote:
>
>>   Hi Folks,
>>
>>  Sorry for reporting the GC performance results late; we are in the code
>> freeze period for the holiday season and cannot do any production-related
>> deployment.
>>
>> First, I'd like to say thank you to Jenny, Monica and Thomas. Your
>> suggestions are really helpful and helped us understand G1 GC behavior. We
>> did NOT observe any full GCs after adjusting the suggested parameters. That
>> is really awesome; we tried the new parameters on Dec 26 and full GCs have
>> not appeared since then (at least as of the time I am writing this email,
>> 3:37pm EST, Dec 30).
>>
>> G1 parameters:
>>
>> -XX:MaxGCPauseMillis=100
>> -XX:G1HeapRegionSize=32m
>> -XX:InitiatingHeapOccupancyPercent=65
>> -XX:G1ReservePercent=20
>> -XX:G1HeapWastePercent=5
>> -XX:G1MixedGCLiveThresholdPercent=75
>>
>>  We've reduced MaxGCPauseMillis to 100 since our real-time system is
>> focused on low pauses; if the system cannot respond within 50 milliseconds,
>> the response is useless to the client. However, the current 99th-percentile
>> read latency is still slightly higher than on the CMS machines, though they
>> are pretty close (14 ms vs 12 ms). One thing we can do now is increase the
>> heap size on the G1 machines; for now, the G1 heap is only 90 percent of the
>> CMS machines' heap. This is because we observed our server process being
>> killed by the OOM killer on the G1 machines, so we decided to decrease their
>> heap size. Since G1ReservePercent was increased, we think it should be safe
>> to grow the G1 heap to the same size as on the CMS machines. We believe this
>> could give the G1 machines better performance, because 40 percent of the
>> heap is used for the block cache.
>>
>>  Thanks
>>  -Shengzhe
>>
>>  G1 Logs
>>
>>  2013-12-30T08:25:26.727-0500: 308692.158: [GC pause (young)
>> Desired survivor size 234881024 bytes, new threshold 14 (max 15)
>> - age   1:   16447904 bytes,   16447904 total
>> - age   2:   30614384 bytes,   47062288 total
>> - age   3:   16122104 bytes,   63184392 total
>> - age   4:   16542280 bytes,   79726672 total
>> - age   5:   14249520 bytes,   93976192 total
>> - age   6:   15187728 bytes,  109163920 total
>> - age   7:   15073808 bytes,  124237728 total
>> - age   8:   17903552 bytes,  142141280 total
>> - age   9:   17031280 bytes,  159172560 total
>> - age  10:   16854792 bytes,  176027352 total
>> - age  11:   19192480 bytes,  195219832 total
>> - age  12:   20491176 bytes,  215711008 total
>> - age  13:   16367528 bytes,  232078536 total
>> - age  14:   15536120 bytes,  247614656 total
>>  308692.158: [G1Ergonomics (CSet Construction) start choosing CSet,
>> _pending_cards: 32768, predicted base time: 38.52 ms, remaining time: 61.48
>> ms, target pause time: 100.00 ms]
>>  308692.158: [G1Ergonomics (CSet Construction) add young regions to CSet,
>> eden: 91 regions, survivors: 14 regions, predicted young region time: 27.76
>> ms]
>>  308692.158: [G1Ergonomics (CSet Construction) finish choosing CSet,
>> eden: 91 regions, survivors: 14 regions, old: 0 regions, predicted pause
>> time: 66.28 ms, target pause time: 100.00 ms]
>>  308692.233: [G1Ergonomics (Concurrent Cycles) request concurrent cycle
>> initiation, reason: occupancy higher than threshold, occupancy: 52143587328
>> bytes, allocation request: 0 bytes, threshold: 46172576125 bytes (65.00 %),
>> source: end of GC]
>> , 0.0749020 secs]
>>    [Parallel Time: 53.9 ms, GC Workers: 18]
>>       [GC Worker Start (ms): Min: 308692158.6, Avg: 308692159.0, Max:
>> 308692159.4, Diff: 0.8]
>>       [Ext Root Scanning (ms): Min: 3.9, Avg: 4.5, Max: 6.4, Diff: 2.4,
>> Sum: 81.9]
>>       [Update RS (ms): Min: 10.2, Avg: 11.6, Max: 12.2, Diff: 2.0, Sum:
>> 209.0]
>>          [Processed Buffers: Min: 15, Avg: 22.5, Max: 31, Diff: 16, Sum:
>> 405]
>>       [Scan RS (ms): Min: 7.8, Avg: 8.0, Max: 8.3, Diff: 0.5, Sum: 144.3]
>>       [Object Copy (ms): Min: 28.3, Avg: 28.4, Max: 28.5, Diff: 0.2, Sum:
>> 510.7]
>>       [Termination (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum:
>> 1.2]
>>       [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1,
>> Sum: 0.5]
>>       [GC Worker Total (ms): Min: 52.3, Avg: 52.6, Max: 53.1, Diff: 0.8,
>> Sum: 947.5]
>>       [GC Worker End (ms): Min: 308692211.6, Avg: 308692211.7, Max:
>> 308692211.7, Diff: 0.1]
>>    [Code Root Fixup: 0.0 ms]
>>    [Clear CT: 9.8 ms]
>>    [Other: 11.1 ms]
>>       [Choose CSet: 0.0 ms]
>>       [Ref Proc: 2.4 ms]
>>       [Ref Enq: 0.4 ms]
>>        [Free CSet: 1.1 ms]
>>    [Eden: 2912.0M(2912.0M)->0.0B(3616.0M) Survivors: 448.0M->416.0M Heap:
>> 51.7G(66.2G)->48.9G(66.2G)]
>>  [Times: user=1.07 sys=0.01, real=0.08 secs]
>>  308697.312: [G1Ergonomics (Concurrent Cycles) initiate concurrent cycle,
>> reason: concurrent cycle initiation requested]
>> 2013-12-30T08:25:31.881-0500: 308697.312: [GC pause (young) (initial-mark)
>> Desired survivor size 268435456 bytes, new threshold 15 (max 15)
>> - age   1:   17798336 bytes,   17798336 total
>>  - age   2:   15275456 bytes,   33073792 total
>> - age   3:   27940176 bytes,   61013968 total
>> - age   4:   15716648 bytes,   76730616 total
>> - age   5:   16474656 bytes,   93205272 total
>>  - age   6:   14249232 bytes,  107454504 total
>> - age   7:   15187536 bytes,  122642040 total
>> - age   8:   15073808 bytes,  137715848 total
>> - age   9:   17362752 bytes,  155078600 total
>>  - age  10:   17031280 bytes,  172109880 total
>> - age  11:   16854792 bytes,  188964672 total
>> - age  12:   19124800 bytes,  208089472 total
>> - age  13:   20491176 bytes,  228580648 total
>>  - age  14:   16367528 bytes,  244948176 total
>>  308697.313: [G1Ergonomics (CSet Construction) start choosing CSet,
>> _pending_cards: 31028, predicted base time: 37.87 ms, remaining time: 62.13
>> ms, target pause time: 100.00 ms]
>>  308697.313: [G1Ergonomics (CSet Construction) add young regions to CSet,
>> eden: 113 regions, survivors: 13 regions, predicted young region time:
>> 27.99 ms]
>>  308697.313: [G1Ergonomics (CSet Construction) finish choosing CSet,
>> eden: 113 regions, survivors: 13 regions, old: 0 regions, predicted pause
>> time: 65.86 ms, target pause time: 100.00 ms]
>> , 0.0724890 secs]
>>    [Parallel Time: 51.9 ms, GC Workers: 18]
>>       [GC Worker Start (ms): Min: 308697313.3, Avg: 308697313.7, Max:
>> 308697314.0, Diff: 0.6]
>>       [Ext Root Scanning (ms): Min: 4.3, Avg: 5.7, Max: 16.7, Diff: 12.3,
>> Sum: 101.8]
>>       [Update RS (ms): Min: 0.0, Avg: 9.3, Max: 10.4, Diff: 10.4, Sum:
>> 166.9]
>>          [Processed Buffers: Min: 0, Avg: 22.0, Max: 30, Diff: 30, Sum:
>> 396]
>>       [Scan RS (ms): Min: 6.4, Avg: 8.5, Max: 13.0, Diff: 6.5, Sum: 152.3]
>>       [Object Copy (ms): Min: 22.5, Avg: 27.1, Max: 27.7, Diff: 5.2, Sum:
>> 487.0]
>>       [Termination (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum:
>> 1.0]
>>       [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1,
>> Sum: 0.6]
>>       [GC Worker Total (ms): Min: 50.2, Avg: 50.5, Max: 50.9, Diff: 0.6,
>> Sum: 909.5]
>>       [GC Worker End (ms): Min: 308697364.2, Avg: 308697364.2, Max:
>> 308697364.3, Diff: 0.1]
>>    [Code Root Fixup: 0.0 ms]
>>    [Clear CT: 9.9 ms]
>>    [Other: 10.8 ms]
>>       [Choose CSet: 0.0 ms]
>>       [Ref Proc: 2.8 ms]
>>       [Ref Enq: 0.4 ms]
>>        [Free CSet: 0.9 ms]
>>    [Eden: 3616.0M(3616.0M)->0.0B(3520.0M) Survivors: 416.0M->448.0M Heap:
>> 52.5G(66.2G)->49.0G(66.2G)]
>>  [Times: user=1.01 sys=0.00, real=0.07 secs]
>> 2013-12-30T08:25:31.954-0500: 308697.385: [GC
>> concurrent-root-region-scan-start]
>> 2013-12-30T08:25:31.967-0500: 308697.398: [GC
>> concurrent-root-region-scan-end, 0.0131710 secs]
>> 2013-12-30T08:25:31.967-0500: 308697.398: [GC concurrent-mark-start]
>> 2013-12-30T08:25:36.566-0500: 308701.997: [GC concurrent-mark-end,
>> 4.5984140 secs]
>> 2013-12-30T08:25:36.570-0500: 308702.002: [GC remark
>> 2013-12-30T08:25:36.573-0500: 308702.004: [GC ref-proc, 0.0126990 secs],
>> 0.0659540 secs]
>>  [Times: user=0.87 sys=0.00, real=0.06 secs]
>> 2013-12-30T08:25:36.641-0500: 308702.072: [GC cleanup 52G->52G(66G),
>> 0.5487830 secs]
>>  [Times: user=9.66 sys=0.06, real=0.54 secs]
>> 2013-12-30T08:25:37.190-0500: 308702.622: [GC concurrent-cleanup-start]
>> 2013-12-30T08:25:37.190-0500: 308702.622: [GC concurrent-cleanup-end,
>> 0.0000480 secs]
>>
>>
>>
>
>
>
> _______________________________________________
> hotspot-gc-use mailing list
> hotspot-gc-use at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
>
>