G1 "to-space exhausted" causes used heap space to increase?
Chi Ho Kwok
chkwok at digibites.nl
Tue Mar 3 11:43:25 UTC 2015
Hi,
When there are live objects during an eden collection, they must be copied
to new, empty regions. With a huge eden, this may require more space than
is available, causing to-space exhaustion. We always run with a fixed new
generation size to avoid this kind of issue: when ergonomics thinks it can
hit the pause time target with a very large eden, it will allocate one,
since that can be more efficient, but that's a bit too unpredictable for us
in production.
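To make the risk concrete, here's a rough back-of-the-envelope sketch (a Python illustration, not anything G1 actually computes; all numbers are taken from the failing collection in the log below):

```python
# Rough sketch of why a huge eden can exhaust to-space, using the
# numbers from the failing collection in the log below (values in GiB).
heap = 45.0             # -Xmx45G
used_before_gc = 39.8   # "Heap: 39.8G(45.0G)->..." in the failing pause
eden = 25.2             # eden size chosen by ergonomics
survived = 3.2          # survivors after the pause (320M -> 3264M)

free_at_gc = heap - used_before_gc  # space available as to-space
survival_rate = survived / eden     # ~12.7% of eden stayed live

print(f"free at GC start: {free_at_gc:.1f}G, copy demand: {survived:.1f}G")
# Only ~5.2G of free regions back a 25.2G eden, so a slightly higher
# survival rate (or extra promotion to old) runs out of to-space.
# With a fixed 9G eden, the same survival rate needs far less headroom:
print(f"copy demand with a 9G eden: {9.0 * survival_rate:.2f}G")
```

The point of the sketch: the safety margin scales with the gap between free space and eden size, and a fixed, smaller eden keeps that gap comfortable.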
We usually set a NewRatio of 4 to 10. When set to 4, the eden is fixed at
1/5th of the full heap, or ~9GB in your case. That also pretty much
guarantees a small, stable eden collection pause, in your case of ~21ms
(60.67/807*288 regions). Your promotion failure happened when 25.2G of eden
produced 3.2G of survivors; with a 9G eden that would only be ~1.2G, which
shouldn't be any issue if the old collector runs regularly. The old
collector is only triggered after a young collection, by the way, so having
them spaced closer (smaller eden -> full more quickly) gives it more
chances to run and add almost empty regions to the next mixed GC run.
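The pause estimate can be sanity-checked with a little arithmetic (a sketch; the per-region cost is reused from the ergonomics line in the log below, and the 32M region size is assumed from the original flags):

```python
# Sanity-check the ~21ms estimate for a fixed ~9G eden, reusing the
# predicted young region cost from the log (60.67 ms for 807 regions).
region_mb = 32        # -XX:G1HeapRegionSize=32M
heap_gb = 45
new_ratio = 4         # young gen ~= heap / (NewRatio + 1)

young_gb = heap_gb / (new_ratio + 1)              # 9.0 GB
young_regions = int(young_gb * 1024 / region_mb)  # 288 regions

ms_per_region = 60.67 / 807   # predicted cost per region from the log
pause_ms = young_regions * ms_per_region
print(f"{young_regions} regions -> ~{pause_ms:.1f} ms predicted pause")
```

which comes out around 21-22 ms, assuming the per-region copy cost stays roughly the same at the smaller eden size.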
Con: GC will run more often, with smaller pauses, and promote more objects
to the old generation, which requires more work to collect (a concurrent
scan is required). But as your collections run once every several minutes,
this extra overhead is basically zero. Our prod young collectors run
multiple times per second on a 4G eden, so you're not pushing the limits of
the collector at all.
Kind regards,
--
Chi Ho Kwok
Digibites Technology
chkwok at digibites.nl
On 2 March 2015 at 18:44, Christopher Berner <christopherberner at gmail.com>
wrote:
> Hello,
>
> I work on the Presto project (https://github.com/facebook/presto) and am
> trying to understand the behavior of G1. We run a 45GB heap on the worker
> machines with "-XX:G1HeapRegionSize=32M", and it works smoothly, except
> that every day a few machines hit a "to-space exhausted" failure and
> either die due to an OutOfMemory error or do a full GC with such a long
> pause that they fail our health checks and are restarted by our service
> manager.
>
> Looking at the GC logs, the sequence of events is always the same. The
> young gen is quite large (~50% of the heap), and every collection is fast,
> but then it hits a "to-space exhausted" failure which appears to increase
> heap used (see log below). After that the young gen is tiny and it never
> recovers.
>
> Two questions: 1) Why does heap usage increase in the middle of the GC
> cycle? 2) Looking at some of the logs, it appears that it starts a full GC
> but also throws an OutOfMemory error concurrently (they show up a hundred
> lines or so apart in stdout). Why would there be an OutOfMemory error
> before the full GC finishes?
>
> Thanks for any help!
> Christopher
>
> 2015-03-02T00:56:32.131-0800: 199078.406: [GC pause (GCLocker Initiated
> GC) (young) 199078.407: [G1Ergonomics (CSet Construction) start choosing
> CSet, _pending_cards: 16136, predicted base
>
> time: 30.29 ms, remaining time: 169.71 ms, target pause time: 200.00 ms]
>
> 199078.407: [G1Ergonomics (CSet Construction) add young regions to CSet,
> eden: 805 regions, survivors: 11 regions, predicted young region time:
> 56.53 ms]
>
> 199078.407: [G1Ergonomics (CSet Construction) finish choosing CSet, eden:
> 805 regions, survivors: 11 regions, old: 0 regions, predicted pause time:
> 86.82 ms, target pause time: 200.00 ms]
>
> , 0.0722119 secs]
>
> [Parallel Time: 46.7 ms, GC Workers: 28]
>
> [GC Worker Start (ms): Min: 199078406.9, Avg: 199078407.2, Max:
> 199078407.5, Diff: 0.6]
>
> [Ext Root Scanning (ms): Min: 0.8, Avg: 1.4, Max: 3.9, Diff: 3.1,
> Sum: 39.7]
>
> [Update RS (ms): Min: 0.0, Avg: 2.1, Max: 3.4, Diff: 3.4, Sum: 58.9]
>
> [Processed Buffers: Min: 0, Avg: 6.5, Max: 22, Diff: 22, Sum: 182]
>
> [Scan RS (ms): Min: 0.0, Avg: 0.2, Max: 0.3, Diff: 0.3, Sum: 5.3]
>
> [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.4, Diff: 0.4,
> Sum: 0.7]
>
> [Object Copy (ms): Min: 40.1, Avg: 41.3, Max: 43.7, Diff: 3.6, Sum:
> 1155.3]
>
> [Termination (ms): Min: 0.8, Avg: 0.9, Max: 1.1, Diff: 0.3, Sum:
> 25.8]
>
> [GC Worker Other (ms): Min: 0.0, Avg: 0.1, Max: 0.3, Diff: 0.2, Sum:
> 3.9]
>
> [GC Worker Total (ms): Min: 45.7, Avg: 46.1, Max: 46.3, Diff: 0.6,
> Sum: 1289.7]
>
> [GC Worker End (ms): Min: 199078453.2, Avg: 199078453.3, Max:
> 199078453.4, Diff: 0.2]
>
> [Code Root Fixup: 0.3 ms]
>
> [Code Root Purge: 0.0 ms]
>
> [Clear CT: 3.0 ms]
>
> [Other: 22.2 ms]
>
> [Choose CSet: 0.0 ms]
>
> [Ref Proc: 18.0 ms]
>
> [Ref Enq: 0.5 ms]
>
> [Redirty Cards: 0.9 ms]
>
> [Humongous Reclaim: 0.0 ms]
>
> [Free CSet: 1.7 ms]
>
> [Eden: 25.2G(25.1G)->0.0B(25.2G) Survivors: 352.0M->320.0M Heap:
> 39.7G(45.0G)->14.6G(45.0G)]
>
> [Times: user=1.37 sys=0.00, real=0.08 secs]
>
> 2015-03-02T01:38:44.545-0800: 201610.820: [GC pause (GCLocker Initiated
> GC) (young) 201610.820: [G1Ergonomics (CSet Construction) start choosing
> CSet, _pending_cards: 56252, predicted base
>
> time: 35.00 ms, remaining time: 165.00 ms, target pause time: 200.00 ms]
>
> 201610.820: [G1Ergonomics (CSet Construction) add young regions to CSet,
> eden: 807 regions, survivors: 10 regions, predicted young region time:
> 60.67 ms]
>
> 201610.820: [G1Ergonomics (CSet Construction) finish choosing CSet, eden:
> 807 regions, survivors: 10 regions, old: 0 regions, predicted pause time:
> 95.67 ms, target pause time: 200.00 ms]
>
> 201611.305: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason:
> region allocation request failed, allocation request: 3058176 bytes]
>
> 201611.319: [G1Ergonomics (Heap Sizing) expand the heap, requested
> expansion amount: 3058176 bytes, attempted expansion amount: 33554432 bytes]
>
> 201611.319: [G1Ergonomics (Heap Sizing) did not expand the heap, reason:
> heap already fully expanded]
>
> 201619.914: [G1Ergonomics (Concurrent Cycles) request concurrent cycle
> initiation, reason: occupancy higher than threshold, occupancy: 44291850240
> bytes, allocation request: 0 bytes, thres
>
> hold: 21743271900 bytes (45.00 %), source: end of GC]
>
> (to-space exhausted), 9.0961593 secs]
>
> [Parallel Time: 8209.7 ms, GC Workers: 28]
>
> [GC Worker Start (ms): Min: 201610864.0, Avg: 201610864.2, Max:
> 201610864.4, Diff: 0.5]
>
> [Ext Root Scanning (ms): Min: 1.2, Avg: 1.7, Max: 4.7, Diff: 3.6,
> Sum: 47.8]
>
> [Update RS (ms): Min: 0.0, Avg: 4.7, Max: 6.0, Diff: 6.0, Sum: 131.1]
>
> [Processed Buffers: Min: 0, Avg: 27.4, Max: 48, Diff: 48, Sum:
> 766]
>
> [Scan RS (ms): Min: 0.1, Avg: 0.3, Max: 1.2, Diff: 1.1, Sum: 7.1]
>
> [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.5, Diff: 0.5,
> Sum: 0.8]
>
> [Object Copy (ms): Min: 8200.9, Avg: 8202.2, Max: 8207.2, Diff: 6.3,
> Sum: 229661.8]
>
> [Termination (ms): Min: 0.0, Avg: 0.2, Max: 0.3, Diff: 0.3, Sum: 7.0]
>
> [GC Worker Other (ms): Min: 0.0, Avg: 0.1, Max: 0.2, Diff: 0.2, Sum:
> 2.8]
>
> [GC Worker Total (ms): Min: 8209.0, Avg: 8209.2, Max: 8209.5, Diff:
> 0.6, Sum: 229858.3]
>
> [GC Worker End (ms): Min: 201619073.3, Avg: 201619073.4, Max:
> 201619073.5, Diff: 0.2]
>
> [Code Root Fixup: 0.3 ms]
>
> [Code Root Purge: 0.0 ms]
>
> [Clear CT: 3.0 ms]
>
> [Other: 883.1 ms]
>
> [Evacuation Failure: 788.4 ms]
>
> [Choose CSet: 0.0 ms]
>
> [Ref Proc: 45.0 ms]
>
> [Ref Enq: 0.6 ms]
>
> [Redirty Cards: 1.4 ms]
>
> [Humongous Reclaim: 0.1 ms]
>
> [Free CSet: 0.6 ms]
>
> [Eden: 25.2G(25.2G)->0.0B(32.0M) Survivors: 320.0M->3264.0M Heap:
> 39.8G(45.0G)->44.1G(45.0G)]
>
> [Times: user=46.07 sys=2.21, real=9.10 secs]
>
> _______________________________________________
> hotspot-gc-use mailing list
> hotspot-gc-use at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
>
>