G1 out of memory behaviour
Wolfgang Pedot
wolfgang.pedot at finkzeit.at
Wed Oct 30 11:19:22 PDT 2013
Hi (again),
yesterday I had a pretty bad out-of memory situation (Heap was
completely full), unfortunately the VM was so unresponsive that I could
not find out what the original problem was but I suspect there was a
single thread allocating memory in an endless loop. We have had that
situation once before due to a scripting error and with CMS the system
was recoverable by terminating that thread.
The situation yesterday was only recoverable by terminating the VM
because it was so extremely unresponsive.
Heap-Usage jumped from a normal 6-7GB in old-gen to the full 14.6GB
within minutes and cpu usage toggled between 100% (1 core active) and
1200% (all 12 cores active) in the end. There appear to be some
unusually large humongous-allocations in the log (40-100MB) normally we
should not have objects these sizes, I am guessing these are huge arrays.
I managed to get the server down semi-gracefully but it took a very long
time to clean up. In the end the gclog was filled with collects like this:
2013-10-29T16:58:44.561+0100: 111441.449: [GC pause (young)
Desired survivor size 50331648 bytes, new threshold 15 (max 15)
111441.450: [G1Ergonomics (CSet Construction) start choosing CSet,
_pending_cards: 0, predicted base time: 38.89 ms, remaining time: 261.11
ms, target pause time: 300.00 ms]
111441.450: [G1Ergonomics (CSet Construction) add young regions to
CSet, eden: 0 regions, survivors: 0 regions, predicted young region
time: 0.00 ms]
111441.450: [G1Ergonomics (CSet Construction) finish choosing CSet,
eden: 0 regions, survivors: 0 regions, old: 0 regions, predicted pause
time: 38.89 ms, target pause time: 300.00 ms]
111441.503: [G1Ergonomics (Heap Sizing) attempt heap expansion,
reason: recent GC overhead higher than threshold after GC, recent GC
overhead: 92.91 %, threshold: 10.00 %, uncommitted: 0 bytes, calculated
expansion amount: 0 bytes (20.00 %)]
, 0.0535920 secs]
[Parallel Time: 51.0 ms, GC Workers: 12]
[GC Worker Start (ms): 111441449.7 111441449.7 111441449.7
111441449.7 111441449.7 111441449.7 111441449.8 111441449.8
111441449.8 111441449.8 111441449.8 111441449.8
Min: 111441449.7, Avg: 111441449.7, Max: 111441449.8, Diff: 0.1]
[Ext Root Scanning (ms): 31.7 40.4 42.6 31.6 31.9 34.1
32.4 39.8 31.7 40.4 31.7 38.0
Min: 31.6, Avg: 35.5, Max: 42.6, Diff: 11.0, Sum: 426.3]
[SATB Filtering (ms): 0.0 0.0 0.0 0.3 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0
Min: 0.0, Avg: 0.0, Max: 0.3, Diff: 0.3, Sum: 0.3]
[Update RS (ms): 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0
Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Processed Buffers: 1 0 0 0 0 0 0 0 0 0 0 0
Min: 0, Avg: 0.1, Max: 1, Diff: 1, Sum: 1]
[Scan RS (ms): 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0
Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Object Copy (ms): 2.5 3.2 8.2 2.5 2.5 1.2 2.2 3.1 2.4
3.2 2.5 3.0
Min: 1.2, Avg: 3.1, Max: 8.2, Diff: 6.9, Sum: 36.6]
[Termination (ms): 16.7 7.2 0.0 16.5 16.4 15.5 16.2 7.8
16.7 7.2 16.6 9.7
Min: 0.0, Avg: 12.2, Max: 16.7, Diff: 16.7, Sum: 146.4]
[Termination Attempts: 1 1 1 1 1 1 1 1 1 1 1 1
Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 12]
[GC Worker Other (ms): 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0
Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
[GC Worker Total (ms): 50.9 50.9 50.8 50.8 50.8 50.8 50.8
50.8 50.8 50.8 50.8 50.8
Min: 50.8, Avg: 50.8, Max: 50.9, Diff: 0.1, Sum: 609.8]
[GC Worker End (ms): 111441500.6 111441500.6 111441500.6
111441500.6 111441500.6 111441500.6 111441500.6 111441500.6
111441500.6 111441500.6 111441500.6 111441500.6
Min: 111441500.6, Avg: 111441500.6, Max: 111441500.6, Diff: 0.0]
[Code Root Fixup: 0.0 ms]
[Clear CT: 0.1 ms]
[Other: 2.6 ms]
[Choose CSet: 0.0 ms]
[Ref Proc: 1.0 ms]
[Ref Enq: 0.1 ms]
[Free CSet: 0.0 ms]
[Eden: 0.0B(744.0M)->0.0B(744.0M) Survivors: 0.0B->0.0B Heap:
14.6G(14.6G)->14.6G(14.6G)]
[Times: user=0.61 sys=0.00, real=0.05 secs]
I read this output as "not a single byte available anywhere".
What puzzles me is why there has not been a single visible
OutOfMemoryError during the hole time while there are a whole bunch of
different exceptions in the log. If the problem was a single thread an
OOM could have terminated it. This application has been running for
years (several weeks since the last update) and there has only been the
one OOM situation before.
Is there any documentation available on what triggers an OOM with G1?
regards
Wolfgang Pedot
More information about the hotspot-gc-use
mailing list