G1 out of memory behaviour
Wolfgang Pedot
wolfgang.pedot at finkzeit.at
Thu Oct 31 05:53:27 PDT 2013
Hi,
thanks for your explanations and effort; my additional comments are below.
>
>> What puzzles me is why there has not been a single visible
>> OutOfMemoryError during the whole time, while there is a whole bunch of
>> different exceptions in the log. If the problem was a single thread,
>> an OOM could have terminated it. This application has been running for
>> years (several weeks since the last update) and there has only been
>> the one OOM situation before.
>
> The cause for this behavior is likely the large object/LOB.
>
> So the application allocates this LOB, does something, and additional
> allocations trigger the full gc because the heap is completely full.
>
> This full gc can reclaim some space (there's no log output after the
> full gc).
>
> This reclaimed space is large enough for G1 to continue for a little
> while (i.e. the GC thinks everything is "okay"), however with only a
> very small young gen, so these young GCs likely follow very closely upon
> each other (explaining the high gc overhead of 92%), but making some
> progress at least.
As I read the logs, the young gen is actually 0B and the CSet in those
collections consists of 0 regions, so they do not seem to help much. There
is some progress during the full GCs, but because the values are reported
in GB it's not possible to get exact numbers. I can see up to ~20 young
collections per second over quite a long time, and the GC overhead
reaches values above 99.5%.
I guess the extreme sluggishness comes from the fact that G1 scaled the
young gen down to 0 regions in this case. CMS with a fixed eden size
would probably throw an OOM once the latest survivors no longer fit into
the old gen.
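For comparison, the CMS setup I have in mind would look roughly like this
(just an illustration, the sizes are made up):

-Xmx14g -XX:+UseConcMarkSweepGC -XX:NewSize=1g -XX:MaxNewSize=1g

With eden fixed like that, a young collection that can no longer promote
its survivors forces a full GC, and if that full GC cannot free enough
old gen either, the VM gives up with an OutOfMemoryError instead of
looping (at least that is my understanding).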
I will try to replicate this behaviour on the test system (much smaller
heap) and see what happens there.
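Roughly what I plan to run there (only a sketch, the class name and sizes
are made up; the idea is one large allocation that nearly fills the heap
plus a steady stream of small, partly retained allocations, started with
something like -Xmx256m -XX:+UseG1GC -verbose:gc):

import java.util.ArrayList;
import java.util.List;

public class NearFullHeapTest {

    public static void main(String[] args) {
        // One large "LOB" taking roughly 80% of the maximum heap,
        // kept alive for the whole run.
        long max = Runtime.getRuntime().maxMemory();
        byte[] lob = new byte[(int) Math.min(Integer.MAX_VALUE - 16, max * 8 / 10)];
        System.out.println("allocated LOB of " + lob.length + " bytes");

        // Small allocations, every tenth one retained, so the remaining
        // space slowly fills up and the young collections have less and
        // less reclaimable garbage to work with.
        List<byte[]> retained = new ArrayList<byte[]>();
        long i = 0;
        while (true) {
            byte[] chunk = new byte[64 * 1024];
            if (i++ % 10 == 0) {
                retained.add(chunk);
            }
        }
    }
}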
Here are some of the full GCs. The first one was in the morning (caused
by perm gen) and that's what normally happens. As you can see, the fun
begins around 16:32 and there is not much time between those collections
to do anything else.
2013-10-29T08:51:39.412+0100: 82216.300: [Full GC 12G->5801M(14G),
15.0086860 secs]
2013-10-29T16:32:52.984+0100: 109889.872: [Full GC 14G->10G(14G),
21.9944800 secs]
2013-10-29T16:33:25.881+0100: 109922.769: [Full GC 14G->12G(14G),
23.9814220 secs]
2013-10-29T16:33:55.213+0100: 109952.101: [Full GC 14G->13G(14G),
24.5648510 secs]
2013-10-29T16:34:23.449+0100: 109980.337: [Full GC 14G->13G(14G),
25.0227810 secs]
2013-10-29T16:34:50.487+0100: 110007.375: [Full GC 14G->13G(14G),
24.7523580 secs]
2013-10-29T16:35:22.647+0100: 110039.535: [Full GC 14G->13G(14G),
25.5301280 secs]
2013-10-29T16:35:50.341+0100: 110067.229: [Full GC 14G->13G(14G),
26.2003390 secs]
2013-10-29T16:36:18.202+0100: 110095.089: [Full GC 14G->14G(14G),
25.1388210 secs]
2013-10-29T16:36:49.917+0100: 110126.805: [Full GC 14G->14G(14G),
25.6623660 secs]
2013-10-29T16:37:24.125+0100: 110161.013: [Full GC 14G->14G(14G),
26.1288850 secs]
2013-10-29T16:37:51.479+0100: 110188.367: [Full GC 14G->14G(14G),
25.5675010 secs]
2013-10-29T16:38:18.206+0100: 110215.094: [Full GC 14G->14G(14G),
25.4115790 secs]
2013-10-29T16:38:44.440+0100: 110241.328: [Full GC 14G->14G(14G),
25.2001720 secs]
2013-10-29T16:39:10.771+0100: 110267.659: [Full GC 14G->14G(14G),
25.0099190 secs]
2013-10-29T16:39:36.640+0100: 110293.528: [Full GC 14G->14G(14G),
24.8488030 secs]
2013-10-29T16:40:02.489+0100: 110319.377: [Full GC 14G->14G(14G),
26.0805850 secs]
2013-10-29T16:40:29.287+0100: 110346.175: [Full GC 14G->14G(14G),
27.2401450 secs]
.
.
.
Would it be feasible to use the GC overhead to decide that it is time for
an OOM error?
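As far as I know, the parallel collector already has something along those
lines: with -XX:+UseGCOverheadLimit it throws
"java.lang.OutOfMemoryError: GC overhead limit exceeded" once more than
-XX:GCTimeLimit (default 98) percent of the time is spent in GC while less
than -XX:GCHeapFreeLimit (default 2) percent of the heap is recovered. I do
not know whether G1 honours those flags, which is partly why I am asking.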
Wolfgang