G1 collector consuming up to ~35GB memory?

Peter B. Kessler Peter.B.Kessler at Oracle.COM
Fri Jul 20 00:41:14 UTC 2012


I am not a trained reader of G1 logs, so I can't help you there.  The "(to-space overflow)" notation in the log means that a young collection ran out of free space to evacuate surviving objects into; that suggests your application is promoting more objects than G1 was expecting, and it may be what pushes G1 into the full collection.

I'm surprised by the absolute time numbers, though.  Computing the "scalability" of the collector as ((user + sys) / real), the G1 full collection looks like

	[Times: user=1810.46 sys=37.43, real=1100.24 secs]

for a scalability of 1.7.  The G1 young generation collection comes in at

	[Times: user=179.15 sys=26.29, real=136.73 secs]

for a scalability of 1.5.  In contrast, the ParallelOld collection reports

	[Times: user=5470.99 sys=22.04, real=266.11 secs]

which indicates a scalability of about 20.6.  That would be good, except that it looks like ParallelOld is just burning far more user time over somewhat more real time (compared with the full collection on the 60GB G1 heap), which doesn't win any points.  I'll assume that you are running on a machine with many processors, and were hoping that GC would scale to use several of them.
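
By "scalability" I just mean CPU seconds divided by wall-clock seconds for the pause.  Spelling the arithmetic out as a throwaway sketch (the class and method names are my own invention, not part of any JDK tool):

    public class GcScalability {
        // CPU seconds burned per wall-clock second of a GC pause.
        static double scalability(double user, double sys, double real) {
            return (user + sys) / real;
        }

        public static void main(String[] args) {
            System.out.printf("G1 full GC:     %.1f%n", scalability(1810.46, 37.43, 1100.24)); // ~1.7
            System.out.printf("G1 young pause: %.1f%n", scalability(179.15, 26.29, 136.73));   // ~1.5
            System.out.printf("ParallelOld:    %.1f%n", scalability(5470.99, 22.04, 266.11));  // ~20.6
        }
    }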

Poor scaling might happen if your heap is filled with a graph that doesn't allow a lot of parallelism in the collector.  Parallel marking basically depends on the bushiness of the object graph.  For example, a long linked list of tiny objects doesn't provide any opportunity for parallel marking.  A nicely balanced tree provides lots of opportunity for parallel marking.  In the ParallelOld collector, a heap with a few widely-scattered small unreachable objects doesn't give a lot of opportunity for parallel compaction.  (Though, there is supposed to be a "deadwood" calculation that usually keeps us from trying to compact such heaps.)  Does your sample application accurately model the object graph you expect to see in production?
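
To make the "bushiness" point concrete, here is a throwaway sketch of the two graph shapes I have in mind.  It is purely my own illustration, with invented class names and sizes, not anything taken from your benchmark or from HotSpot:

    public class GraphShapes {
        // A long chain: marking can only follow one reference at a time, so a
        // single GC thread ends up doing essentially all of the marking work.
        static final class ListNode { ListNode next; }

        // A balanced tree: every node fans out, so marking threads can steal
        // whole subtrees from each other and make progress in parallel.
        static final class TreeNode { TreeNode left, right; }

        static ListNode buildChain(long length) {
            ListNode head = null;
            for (long i = 0; i < length; i++) {
                ListNode node = new ListNode();
                node.next = head;
                head = node;
            }
            return head;
        }

        static TreeNode buildTree(int depth) {
            if (depth == 0) return null;
            TreeNode node = new TreeNode();
            node.left = buildTree(depth - 1);
            node.right = buildTree(depth - 1);
            return node;
        }

        public static void main(String[] args) {
            ListNode chain = buildChain(50_000_000L);  // hostile to parallel marking
            TreeNode tree  = buildTree(25);            // friendly to parallel marking
            System.out.println((chain != null) + " " + (tree != null));  // keep both graphs reachable
        }
    }

Filling most of a large heap with the first shape would make any parallel marker look bad; filling it with the second gives the worker threads something to share.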

I hope that someone who knows more about G1 logs can teach us all how to read yours.

			... peter

Neil Ferguson wrote:
> Hi Peter
> 
> I haven't yet spent any time tuning the G1 collector for these tests,
> and it seems that it's performing full GCs a lot of the time. Here are
> the times from my GC logs:
> 
> 60G heap: 2741.088: [Full GC 61408M->51197M(61440M), 239.5237130 secs]
> 120G heap: 4577.455: [Full GC 122841M->112633M(122880M), 525.9004520 secs]
> 240G heap: 13500.655: [Full GC 245725M->235520M(245760M), 1100.4050500 secs]
> 
> I'm not sure why it's performing full GCs, as I don't yet understand
> the details of the G1 logs. However, if anyone could shed any light on
> this, I'd be most grateful. Here's a section of the logs for the 240G
> heap:
> 
> 2012-06-19T19:29:42.849+0000: 13500.654: [GC concurrent-root-region-scan-start]
> 2012-06-19T19:29:42.849+0000: 13500.654: [GC concurrent-root-region-scan-end, 0.0000380]
> 2012-06-19T19:29:42.849+0000: 13500.654: [GC concurrent-mark-start]
> 2012-06-19T19:29:42.850+0000: 13500.655: [Full GC 245725M->235520M(245760M), 1100.4050500 secs]
>  [Times: user=1810.46 sys=37.43, real=1100.24 secs]
> 2012-06-19T19:48:03.257+0000: 14601.062: [GC concurrent-mark-abort]
> 2012-06-19T19:51:04.824+0000: 14782.629: [GC pause (young) (to-space overflow), 136.75016300 secs]
>    [Parallel Time: 122281.2 ms]
>       [GC Worker Start (ms):  14782633.1  14782633.1  14782633.2
> 14782633.2  14782633.2  14782633.3  14782633.3  14782633.3  14782633.3
> 14782633.4  14782633.4  14782633.4  14782633.4  14782633.5  14782633.6
> 14782633.6  14782633.6  14782633.7  14782633.7  14782633.7  14782633.8
> 14782633.9  14782634.0
>        Avg: 14782633.5, Min: 14782633.1, Max: 14782634.0, Diff:   0.8]
>       [Ext Root Scanning (ms):  0.4  0.3  0.4  0.4  0.2  0.2  0.3  0.2
>  0.2  0.1  0.1  0.1  0.1  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
> 0.0
>        Avg:   0.1, Min:   0.0, Max:   0.4, Diff:   0.4]
>       [Update RS (ms):  14.2  14.7  12.6  12.2  13.1  14.4  14.3  14.4
>  14.2  14.5  14.3  12.9  14.4  12.1  14.3  14.5  14.4  14.1  12.1
> 12.0  14.2  11.9  13.5
>        Avg:  13.6, Min:  11.9, Max:  14.7, Diff:   2.8]
>          [Processed Buffers : 3 4 4 3 3 4 4 3 3 3 4 3 3 3 3 3 3 3 2 2 3 3 3
>           Sum: 72, Avg: 3, Min: 2, Max: 4, Diff: 2]
>       [Scan RS (ms):  1319.1  1318.7  1320.8  1321.1  1320.3  1319.0
> 1319.0  1319.0  1319.1  1318.9  1319.1  1320.4  1318.9  1321.2  1318.9
> 1318.8  1318.9  1319.0  1321.1  1321.1  1318.8  1321.0  1319.4
>        Avg: 1319.6, Min: 1318.7, Max: 1321.2, Diff:   2.5]
>       [Object Copy (ms):  120944.7  120944.8  120944.9  120944.8
> 120944.7  120944.8  120944.8  120944.8  120944.8  120944.8  120944.8
> 120944.8  120944.8  120944.8  120944.7  120944.8  120944.8  120944.8
> 120944.7  120944.7  120944.8  120944.8  120944.8
>        Avg: 120944.8, Min: 120944.7, Max: 120944.9, Diff:   0.2]
>       [Termination (ms):  0.2  0.1  0.0  0.1  0.2  0.1  0.1  0.1  0.1
> 0.1  0.1  0.1  0.1  0.1  0.2  0.1  0.0  0.1  0.1  0.1  0.1  0.1  0.1
>        Avg:   0.1, Min:   0.0, Max:   0.2, Diff:   0.2]
>          [Termination Attempts : 21 17 1 14 17 16 14 10 13 21 16 19 12
> 11 17 12 12 13 16 14 12 15 17
>           Sum: 330, Avg: 14, Min: 1, Max: 21, Diff: 20]
>       [GC Worker End (ms):  14904912.9  14904911.8  14904912.0
> 14904911.8  14904911.9  14904912.2  14904912.8  14904912.1  14904911.8
> 14904912.9  14904912.5  14904911.8  14904911.8  14904912.7  14904912.3
> 14904913.0  14904912.4  14904913.1  14904911.8  14904912.6  14904911.8
> 14904912.6  14904911.8
>        Avg: 14904912.3, Min: 14904911.8, Max: 14904913.1, Diff:   1.3]
>       [GC Worker (ms):  122279.8  122278.7  122278.8  122278.6
> 122278.7  122278.9  122279.5  122278.8  122278.5  122279.5  122279.1
> 122278.4  122278.4  122279.2  122278.7  122279.4  122278.8  122279.4
> 122278.1  122278.9  122278.0  122278.7  122277.8
>        Avg: 122278.8, Min: 122277.8, Max: 122279.8, Diff:   2.0]
>       [GC Worker Other (ms):  2.6  2.6  2.6  2.7  2.8  2.8  2.8  2.8
> 2.8  2.9  2.9  2.9  2.9  3.0  3.1  3.1  3.1  3.2  3.2  3.3  3.3  3.4
> 3.5
>        Avg:   3.0, Min:   2.6, Max:   3.5, Diff:   0.8]
>    [Clear CT:  38.1 ms]
>    [Other: 14430.8 ms]
>       [Choose CSet:   0.0 ms]
>       [Ref Proc:   8.0 ms]
>       [Ref Enq:   0.0 ms]
>       [Free CSet:   4.7 ms]
>    [Eden: 10176M(49152M)->0B(49152M) Survivors: 0B->0B Heap: 245696M(245760M)->245696M(245760M)]
>  [Times: user=179.15 sys=26.29, real=136.73 secs]
> 
> I can provide full logs if necessary.
> 
> BTW, the numbers from the parallel collector are similar, as I'd expect:
> 
> 2012-06-17T08:11:24.048+0000: 58877.099: [Full GC
>   [PSYoungGen: 15728640K->7747432K(18350080K)]
>   [ParOldGen: 41943038K->41943038K(41943040K)]
>   57671678K->49690470K(60293120K)
>   [PSPermGen: 3512K->3512K(21248K)], 266.1496160 secs]
>   [Times: user=5470.99 sys=22.04, real=266.11 secs]
> 
> On Thu, Jul 19, 2012 at 8:06 PM, Peter B. Kessler
> <Peter.B.Kessler at oracle.com> wrote:
>> If you don't mind sharing, what kinds of collection times are you seeing
>> with G1 (and with the ParallelOld collector if you have those numbers) on
>> heaps of these sizes?
>>
>>                         ... peter
>>
>>
>> Neil Ferguson wrote:
>>> Hi all.
>>>
>>> When using the G1 collector with large heap sizes I'm seeing the JVM
>>> process grow much larger than my heap size, sometimes by as much as 35GB.
>>> Here are some examples:
>>>
>>> Heap size 60GB, java process size: 85GB
>>> Heap size 120GB, java process size: 158GB
>>> Heap size 240GB, java process size: 274GB
>>> Heap size 480GB, process killed by Linux
>>>
>>> I'm quoting the resident set as reported by the Linux ps command. The
>>> virtual memory size of the process is typically a few GB larger. This is on
>>> a machine with 512GB RAM, running Linux  2.6.32-220.el6.x86_64 and Java
>>> 1.7.0_05-b05.
>>>
>>> Typically the process size grows gradually over the first 30 minutes or so
>>> of my test, then flattens out and doesn't grow any more. I'm running a
>>> fairly trivial benchmark, which repeatedly adds items to a bounded (LRU)
>>> cache. There's typically around 10GB free on the heap in each of my tests.
>>>
>>> Does anyone know if this is the expected behaviour for G1? It seems rather
>>> a lot of memory to be using, over the heap size. I've run the same test with
>>> the parallel collector with a heap size of 60GB, and the process size
>>> stabilizes quickly to around 62GB.
>>>
>>> Thanks.
>>>
>>> Neil.
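
P.S. For what it's worth, my mental model of the benchmark you describe (repeatedly adding items to a bounded LRU cache) is roughly the sketch below.  It is only a guess at the shape of the workload; the cache bound, key type, and value size are invented, and the real benchmark may differ in ways that matter to the collector:

    import java.util.LinkedHashMap;
    import java.util.Map;

    public class LruCacheLoad {
        static final int MAX_ENTRIES = 5_000_000;  // cache bound (invented for illustration)
        static final int VALUE_BYTES = 8 * 1024;   // payload per entry (invented for illustration)

        public static void main(String[] args) {
            // Access-ordered LinkedHashMap that evicts its least-recently-used
            // entry once the bound is exceeded.
            Map<Long, byte[]> cache =
                new LinkedHashMap<Long, byte[]>(MAX_ENTRIES, 0.75f, true) {
                    @Override
                    protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
                        return size() > MAX_ENTRIES;
                    }
                };
            // Steady allocation plus eviction churn; run until interrupted.
            for (long key = 0; ; key++) {
                cache.put(key, new byte[VALUE_BYTES]);
            }
        }
    }

If the real cached values are long chains of small objects rather than flat arrays, that alone could change both the promotion rate and how well either collector parallelizes.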


