G1 GC cleanup time is too long
YU ZHANG
yu.zhang at oracle.com
Thu Jan 2 10:49:58 PST 2014
Ryan,
Please see my comments in line.
Thanks,
Jenny
On 1/2/2014 9:57 AM, Ryan Gardner wrote:
>
> I've also fought with cleanup times being long with a large heap and
> G1. In my case, I was suspicious that the RSet coarsening was
> increasing the time for GC Cleanups.
>
> If you have a way to test different settings in a non-production
> environment, you could consider experimenting with:
>
>
> -XX:+UnlockExperimentalVMOptions
>
> -XX:G1RSetRegionEntries=4096
>
> and different values for the RSetRegionEntries - 4096 was a sweet spot
> for me, but your application may behave differently.
>
> You can turn on:
>
> -XX:+UnlockDiagnosticVMOptions
>
> -XX:+G1SummarizeRSetStats
>
> -XX:G1SummarizeRSetStatsPeriod=20
>
> to have it spit out what it is doing and give you some more insight
> into those times.
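>
> As one illustration, those pieces might be combined on a command line
> like this (the heap size and jar name here are placeholders, not from
> my actual setup):
>
> java -Xmx92g -XX:+UseG1GC -XX:G1HeapRegionSize=32m \
>   -XX:+UnlockExperimentalVMOptions -XX:G1RSetRegionEntries=4096 \
>   -XX:+UnlockDiagnosticVMOptions -XX:+G1SummarizeRSetStats \
>   -XX:G1SummarizeRSetStatsPeriod=20 -jar MyApp.jar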
>
>
> The specific number of RSetRegionEntries I set (4096) was, in theory,
> supposed to be close to what G1 was computing based on my region size
> (also 32m) and number of regions, but it did not seem to be.
>
If G1RSetRegionEntries is not set, it is computed as
G1RSetRegionEntriesBase * (region_size_log_mb + 1).
G1RSetRegionEntriesBase is a constant (256), and region_size_log_mb is
derived from the heap region size: log2 of the region size in bytes
minus 20, i.e. the log of the region size in megabytes.
With your 32m region size (the heap size does not enter into it), that
works out to 256 * (5 + 1) = 1536, so I guess the default value was
actually smaller than your 4096?
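For concreteness, here is a small Java sketch of that sizing rule as I
read it from HeapRegionRemSet::setup_remset_size() in the VM source
(the class and method names below are mine; only the 256 constant and
the formula come from HotSpot):

// Sketch: how G1RSetRegionEntries defaults when the flag is unset.
public class RSetEntriesDefault {

    // G1RSetRegionEntriesBase in the VM source.
    static final int ENTRIES_BASE = 256;

    // Default G1RSetRegionEntries for a region size in megabytes.
    static int defaultRegionEntries(int regionSizeMB) {
        // region_size_log_mb = log2(region size in bytes) - 20,
        // floored at 0, i.e. log2 of the region size in megabytes.
        int regionSizeLogMB =
            Math.max(31 - Integer.numberOfLeadingZeros(regionSizeMB), 0);
        return ENTRIES_BASE * (regionSizeLogMB + 1);
    }

    public static void main(String[] args) {
        // 32m regions: log2(32) = 5, so 256 * (5 + 1) = 1536.
        System.out.println(defaultRegionEntries(32));
    }
}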
Assuming my reading is right, did you decide to raise the entries
because you were seeing 'coarsenings' in the G1SummarizeRSetStats
output? Did you see the cards for old or young regions increase as the
cleanup time increased? Also, in your log, when the cleanup time
increases, does the time go to update RS or to scan RS?
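(In the logs below, those numbers show up as the [Update RS (ms): ...]
and [Scan RS (ms): ...] lines of each young pause; the coarsenings are
in the G1SummarizeRSetStats output.)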
>
> Also, if you have more memory available, I have found that G1 will
> take the extra memory without increasing pause times much. As you
> increase the total heap size, the size of your smallest possible
> collection will also increase, since G1 sets it to a percentage of
> the total heap... In my case I was tuning an application that was a
> cache, so it had tons of heap space but wasn't churning it over
> much...
>
> I ended up going as low as:
>
> -XX:G1NewSizePercent=1
>
> to let G1 feel free to use as few regions as possible to achieve
> smaller pause times.
>
G1NewSizePercent (default 5) sets the minimum percentage of the heap
that G1 will use for the young gen. Lowering it should result in a
smaller young gen, so the young GC pauses are shorter.
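As a worked example against the log below: with the default
G1NewSizePercent=5 on a 66.2G heap, the young gen floor is about
0.05 * 66.2G = 3.3G, which matches the ~2912M eden plus ~448M
survivors in the log; G1NewSizePercent=1 lowers that floor to roughly
0.7G, giving G1 far more room to shrink eden to meet the pause target.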
>
> I've been running in production on 1.7u40 for several months now with
> 92GB heaps and a worst-case cleanup pause time of around 370ms - prior
> to tuning the rset region entries, the cleanup phase was getting worse
> and worse over time and in testing would sometimes be over 1 second.
>
> I meant to dive into the OpenJDK code to look at where the default
> RSetRegionEntries are calculated, but didn't get around to it.
>
>
> Hope that helps,
>
> Ryan Gardner
>
>
> On Dec 31, 2013 8:29 PM, "yao" <yaoshengzhe at gmail.com> wrote:
>
> Hi Folks,
>
> Sorry for reporting the GC performance results late; we are in the
> code freeze period for the holiday season and cannot do any
> production-related deployment.
>
> First, I'd like to say thank you to Jenny, Monica and Thomas. Your
> suggestions are really helpful and helped us understand G1 GC
> behavior. We did NOT observe any full GCs after adjusting the
> suggested parameters. That is really awesome: we tried these new
> parameters on Dec 26 and full GCs have been gone since then (at
> least as of the time of this email, 3:37pm EST, Dec 30).
>
> G1 parameters:
> -XX:MaxGCPauseMillis=100
> -XX:G1HeapRegionSize=32m
> -XX:InitiatingHeapOccupancyPercent=65
> -XX:G1ReservePercent=20
> -XX:G1HeapWastePercent=5
> -XX:G1MixedGCLiveThresholdPercent=75
>
> We've reduced MaxGCPauseMillis to 100 since our real-time system
> focuses on low pause times; if the system cannot respond within 50
> milliseconds, the response is useless to the client. However, the
> current 99th-percentile read latency is still slightly higher than
> on the CMS machines, though they are pretty close (14 millis vs 12
> millis). One thing we can do now is to increase the heap size on the
> G1 machines; for now, the G1 heap is only 90 percent of the CMS
> machines'. This is because we observed our server process being
> killed by the OOM killer on the G1 machines, and we decided to
> decrease the heap size there. Since G1ReservePercent was increased,
> we think it should be safe to grow the G1 heap to match the CMS
> machines. We believe this could give the G1 machines better
> performance, because 40 percent of the heap is used for block cache.
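>
> (As a rough check: with G1ReservePercent=20 on the current 66.2G
> heap, G1 tries to keep about 0.20 * 66.2G = 13.2G free as a reserve
> against evacuation failures, so growing the heap back to the CMS
> size should still leave comfortable headroom.)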
>
> Thanks
> -Shengzhe
>
> G1 Logs
>
> 2013-12-30T08:25:26.727-0500: 308692.158: [GC pause (young)
> Desired survivor size 234881024 bytes, new threshold 14 (max 15)
> - age 1: 16447904 bytes, 16447904 total
> - age 2: 30614384 bytes, 47062288 total
> - age 3: 16122104 bytes, 63184392 total
> - age 4: 16542280 bytes, 79726672 total
> - age 5: 14249520 bytes, 93976192 total
> - age 6: 15187728 bytes, 109163920 total
> - age 7: 15073808 bytes, 124237728 total
> - age 8: 17903552 bytes, 142141280 total
> - age 9: 17031280 bytes, 159172560 total
> - age 10: 16854792 bytes, 176027352 total
> - age 11: 19192480 bytes, 195219832 total
> - age 12: 20491176 bytes, 215711008 total
> - age 13: 16367528 bytes, 232078536 total
> - age 14: 15536120 bytes, 247614656 total
> 308692.158: [G1Ergonomics (CSet Construction) start choosing
> CSet, _pending_cards: 32768, predicted base time: 38.52 ms,
> remaining time: 61.48 ms, target pause time: 100.00 ms]
> 308692.158: [G1Ergonomics (CSet Construction) add young regions
> to CSet, eden: 91 regions, survivors: 14 regions, predicted young
> region time: 27.76 ms]
> 308692.158: [G1Ergonomics (CSet Construction) finish choosing
> CSet, eden: 91 regions, survivors: 14 regions, old: 0 regions,
> predicted pause time: 66.28 ms, target pause time: 100.00 ms]
> 308692.233: [G1Ergonomics (Concurrent Cycles) request concurrent
> cycle initiation, reason: occupancy higher than threshold,
> occupancy: 52143587328 bytes, allocation request: 0 bytes,
> threshold: 46172576125 bytes (65.00 %), source: end of GC]
> , 0.0749020 secs]
> [Parallel Time: 53.9 ms, GC Workers: 18]
> [GC Worker Start (ms): Min: 308692158.6, Avg: 308692159.0,
> Max: 308692159.4, Diff: 0.8]
> [Ext Root Scanning (ms): Min: 3.9, Avg: 4.5, Max: 6.4, Diff:
> 2.4, Sum: 81.9]
> [Update RS (ms): Min: 10.2, Avg: 11.6, Max: 12.2, Diff: 2.0,
> Sum: 209.0]
> [Processed Buffers: Min: 15, Avg: 22.5, Max: 31, Diff:
> 16, Sum: 405]
> [Scan RS (ms): Min: 7.8, Avg: 8.0, Max: 8.3, Diff: 0.5, Sum:
> 144.3]
> [Object Copy (ms): Min: 28.3, Avg: 28.4, Max: 28.5, Diff:
> 0.2, Sum: 510.7]
> [Termination (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1,
> Sum: 1.2]
> [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff:
> 0.1, Sum: 0.5]
> [GC Worker Total (ms): Min: 52.3, Avg: 52.6, Max: 53.1,
> Diff: 0.8, Sum: 947.5]
> [GC Worker End (ms): Min: 308692211.6, Avg: 308692211.7,
> Max: 308692211.7, Diff: 0.1]
> [Code Root Fixup: 0.0 ms]
> [Clear CT: 9.8 ms]
> [Other: 11.1 ms]
> [Choose CSet: 0.0 ms]
> [Ref Proc: 2.4 ms]
> [Ref Enq: 0.4 ms]
> [Free CSet: 1.1 ms]
> [Eden: 2912.0M(2912.0M)->0.0B(3616.0M) Survivors:
> 448.0M->416.0M Heap: 51.7G(66.2G)->48.9G(66.2G)]
> [Times: user=1.07 sys=0.01, real=0.08 secs]
> 308697.312: [G1Ergonomics (Concurrent Cycles) initiate concurrent
> cycle, reason: concurrent cycle initiation requested]
> 2013-12-30T08:25:31.881-0500: 308697.312: [GC pause (young)
> (initial-mark)
> Desired survivor size 268435456 bytes, new threshold 15 (max 15)
> - age 1: 17798336 bytes, 17798336 total
> - age 2: 15275456 bytes, 33073792 total
> - age 3: 27940176 bytes, 61013968 total
> - age 4: 15716648 bytes, 76730616 total
> - age 5: 16474656 bytes, 93205272 total
> - age 6: 14249232 bytes, 107454504 total
> - age 7: 15187536 bytes, 122642040 total
> - age 8: 15073808 bytes, 137715848 total
> - age 9: 17362752 bytes, 155078600 total
> - age 10: 17031280 bytes, 172109880 total
> - age 11: 16854792 bytes, 188964672 total
> - age 12: 19124800 bytes, 208089472 total
> - age 13: 20491176 bytes, 228580648 total
> - age 14: 16367528 bytes, 244948176 total
> 308697.313: [G1Ergonomics (CSet Construction) start choosing
> CSet, _pending_cards: 31028, predicted base time: 37.87 ms,
> remaining time: 62.13 ms, target pause time: 100.00 ms]
> 308697.313: [G1Ergonomics (CSet Construction) add young regions
> to CSet, eden: 113 regions, survivors: 13 regions, predicted young
> region time: 27.99 ms]
> 308697.313: [G1Ergonomics (CSet Construction) finish choosing
> CSet, eden: 113 regions, survivors: 13 regions, old: 0 regions,
> predicted pause time: 65.86 ms, target pause time: 100.00 ms]
> , 0.0724890 secs]
> [Parallel Time: 51.9 ms, GC Workers: 18]
> [GC Worker Start (ms): Min: 308697313.3, Avg: 308697313.7,
> Max: 308697314.0, Diff: 0.6]
> [Ext Root Scanning (ms): Min: 4.3, Avg: 5.7, Max: 16.7,
> Diff: 12.3, Sum: 101.8]
> [Update RS (ms): Min: 0.0, Avg: 9.3, Max: 10.4, Diff: 10.4,
> Sum: 166.9]
> [Processed Buffers: Min: 0, Avg: 22.0, Max: 30, Diff: 30,
> Sum: 396]
> [Scan RS (ms): Min: 6.4, Avg: 8.5, Max: 13.0, Diff: 6.5,
> Sum: 152.3]
> [Object Copy (ms): Min: 22.5, Avg: 27.1, Max: 27.7, Diff:
> 5.2, Sum: 487.0]
> [Termination (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1,
> Sum: 1.0]
> [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff:
> 0.1, Sum: 0.6]
> [GC Worker Total (ms): Min: 50.2, Avg: 50.5, Max: 50.9,
> Diff: 0.6, Sum: 909.5]
> [GC Worker End (ms): Min: 308697364.2, Avg: 308697364.2,
> Max: 308697364.3, Diff: 0.1]
> [Code Root Fixup: 0.0 ms]
> [Clear CT: 9.9 ms]
> [Other: 10.8 ms]
> [Choose CSet: 0.0 ms]
> [Ref Proc: 2.8 ms]
> [Ref Enq: 0.4 ms]
> [Free CSet: 0.9 ms]
> [Eden: 3616.0M(3616.0M)->0.0B(3520.0M) Survivors:
> 416.0M->448.0M Heap: 52.5G(66.2G)->49.0G(66.2G)]
> [Times: user=1.01 sys=0.00, real=0.07 secs]
> 2013-12-30T08:25:31.954-0500: 308697.385: [GC
> concurrent-root-region-scan-start]
> 2013-12-30T08:25:31.967-0500: 308697.398: [GC
> concurrent-root-region-scan-end, 0.0131710 secs]
> 2013-12-30T08:25:31.967-0500: 308697.398: [GC concurrent-mark-start]
> 2013-12-30T08:25:36.566-0500: 308701.997: [GC concurrent-mark-end,
> 4.5984140 secs]
> 2013-12-30T08:25:36.570-0500: 308702.002: [GC remark
> 2013-12-30T08:25:36.573-0500: 308702.004: [GC ref-proc, 0.0126990
> secs], 0.0659540 secs]
> [Times: user=0.87 sys=0.00, real=0.06 secs]
> 2013-12-30T08:25:36.641-0500: 308702.072: [GC cleanup
> 52G->52G(66G), 0.5487830 secs]
> [Times: user=9.66 sys=0.06, real=0.54 secs]
> 2013-12-30T08:25:37.190-0500: 308702.622: [GC
> concurrent-cleanup-start]
> 2013-12-30T08:25:37.190-0500: 308702.622: [GC
> concurrent-cleanup-end, 0.0000480 secs]
>
>
>