G1 collector consuming up to ~35GB memory?
Neil Ferguson
nferguson at gmail.com
Mon Jul 23 21:41:00 UTC 2012
I'm running a fairly simple benchmark here, and although it's not
production code, I'd expect its object graph to be similar to that of a
large cache in production.
Essentially I'm creating a big map containing linked lists, where each
list contains 15 integers, on average. So I have lots of small linked
lists (in a big array), but not one large one. I'm then accessing the
cache with a hit rate of 90%, creating a new item on a cache miss, and
evicting the LRU item. This generates around 50MB of data per second
(which, admittedly, is probably more than I'd expect to generate in
production).
The code is attached (BigRamTester.java). I'm running it with the following
command:

  java -cp . -mx60G -ms60G -XX:MaxDirectMemorySize=1G -Xloggc:./gc.log \
       -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
       -XX:+PrintGCDateStamps -XX:+UseG1GC -XX:+PrintClassHistogram \
       com.neilferguson.memory.BigRamTester
I'm also surprised by the "scalability" of G1. The machine I'm running my
tests on has 32 cores, so it seems strange that the collection isn't more
parallel. Presumably there's no configuration parameter for this, and G1 will
just try to use as many cores as it can?
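(If there is a knob after all, I'd guess it's the generic HotSpot one, something
along the lines of -XX:ParallelGCThreads=<n> for the stop-the-world worker
threads and -XX:ConcGCThreads=<n> for the concurrent marking threads, but I
haven't verified how, or whether, G1 honours those in this build, so treat that
as a guess rather than a recommendation.)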
On Fri, Jul 20, 2012 at 1:41 AM, Peter B. Kessler
<Peter.B.Kessler at oracle.com> wrote:
> I am not a trained reader of G1 logs, so I can't help you there. The
> "(to-space overflow)" notation in the log might mean that your application
> is promoting more objects than G1 was expecting, possibly causing the full
> collection.
>
> I'm surprised by the absolute time numbers, though. Computing the
> "scalability" of the collector as (user + sys) / real for the G1 full
> collection, that looks like
>
>
> [Times: user=1810.46 sys=37.43, real=1100.24 secs]
>
> for a scalability of 1.7. The G1 young generation collection comes in at
>
>
> [Times: user=179.15 sys=26.29, real=136.73 secs]
>
> for a scalability of 1.5. In contrast, the ParallelOld collection reports
>
>
> [Times: user=5470.99 sys=22.04, real=266.11 secs]
>
> which indicates a scalability of about 20.6. That would be good except that it
> looks like it's just burning more user time in somewhat more real time (than
> the 60GB G1 heap), which doesn't win any points. I'll assume that you are
> running on a machine with many processors, and were hoping that GC would
> scale to use several of them.
>
> Poor scaling might happen if your heap is filled with a graph that doesn't
> allow a lot of parallelism in the collector. Parallel marking basically
> depends on the bushiness of the object graph. For example, a long linked
> list of tiny objects doesn't provide any opportunity for parallel marking.
> A nicely balanced tree provides lots of opportunity for parallel marking.
> In the ParallelOld collector, a heap with a few widely-scattered small
> unreachable objects doesn't give a lot of opportunity for parallel
> compaction. (Though, there is supposed to be a "deadwood" calculation that
> usually keeps us from trying to compact such heaps.) Does your sample
> application accurately model the object graph you expect to see in
> production?
>
> I hope that someone who knows more about G1 logs can teach us all how to
> read yours.
>
>
> ... peter
>
> Neil Ferguson wrote:
>>
>> Hi Peter
>>
>> I haven't yet spent any time tuning the G1 collector for these tests,
>> and it seems that it's performing full GCs a lot of the time. Here are
>> the times from my GC logs:
>>
>> 60G heap: 2741.088: [Full GC 61408M->51197M(61440M), 239.5237130 secs]
>> 120G heap: 4577.455: [Full GC 122841M->112633M(122880M), 525.9004520 secs]
>> 240G heap: 13500.655: [Full GC 245725M->235520M(245760M), 1100.4050500 secs]
>>
>> I'm not sure why it's performing full GCs, as I don't yet understand
>> the details of the G1 logs. However, if anyone could shed any light on
>> this, I'd be most grateful. Here's a section of the logs for the 240G
>> heap:
>>
>> 2012-06-19T19:29:42.849+0000: 13500.654: [GC concurrent-root-region-scan-start]
>> 2012-06-19T19:29:42.849+0000: 13500.654: [GC concurrent-root-region-scan-end, 0.0000380]
>> 2012-06-19T19:29:42.849+0000: 13500.654: [GC concurrent-mark-start]
>> 2012-06-19T19:29:42.850+0000: 13500.655: [Full GC 245725M->235520M(245760M), 1100.4050500 secs]
>>  [Times: user=1810.46 sys=37.43, real=1100.24 secs]
>> 2012-06-19T19:48:03.257+0000: 14601.062: [GC concurrent-mark-abort]
>> 2012-06-19T19:51:04.824+0000: 14782.629: [GC pause (young) (to-space overflow), 136.75016300 secs]
>> [Parallel Time: 122281.2 ms]
>>      [GC Worker Start (ms): 14782633.1 14782633.1 14782633.2 14782633.2
>>       14782633.2 14782633.3 14782633.3 14782633.3 14782633.3 14782633.4
>>       14782633.4 14782633.4 14782633.4 14782633.5 14782633.6 14782633.6
>>       14782633.6 14782633.7 14782633.7 14782633.7 14782633.8 14782633.9
>>       14782634.0
>> Avg: 14782633.5, Min: 14782633.1, Max: 14782634.0, Diff: 0.8]
>> [Ext Root Scanning (ms): 0.4 0.3 0.4 0.4 0.2 0.2 0.3 0.2
>> 0.2 0.1 0.1 0.1 0.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
>> 0.0
>> Avg: 0.1, Min: 0.0, Max: 0.4, Diff: 0.4]
>> [Update RS (ms): 14.2 14.7 12.6 12.2 13.1 14.4 14.3 14.4
>> 14.2 14.5 14.3 12.9 14.4 12.1 14.3 14.5 14.4 14.1 12.1
>> 12.0 14.2 11.9 13.5
>> Avg: 13.6, Min: 11.9, Max: 14.7, Diff: 2.8]
>> [Processed Buffers : 3 4 4 3 3 4 4 3 3 3 4 3 3 3 3 3 3 3 2 2 3 3
>> 3
>> Sum: 72, Avg: 3, Min: 2, Max: 4, Diff: 2]
>>      [Scan RS (ms): 1319.1 1318.7 1320.8 1321.1 1320.3 1319.0 1319.0 1319.0
>>       1319.1 1318.9 1319.1 1320.4 1318.9 1321.2 1318.9 1318.8 1318.9 1319.0
>>       1321.1 1321.1 1318.8 1321.0 1319.4
>> Avg: 1319.6, Min: 1318.7, Max: 1321.2, Diff: 2.5]
>>      [Object Copy (ms): 120944.7 120944.8 120944.9 120944.8 120944.7 120944.8
>>       120944.8 120944.8 120944.8 120944.8 120944.8 120944.8 120944.8 120944.8
>>       120944.7 120944.8 120944.8 120944.8 120944.7 120944.7 120944.8 120944.8
>>       120944.8
>> Avg: 120944.8, Min: 120944.7, Max: 120944.9, Diff: 0.2]
>> [Termination (ms): 0.2 0.1 0.0 0.1 0.2 0.1 0.1 0.1 0.1
>> 0.1 0.1 0.1 0.1 0.1 0.2 0.1 0.0 0.1 0.1 0.1 0.1 0.1 0.1
>> Avg: 0.1, Min: 0.0, Max: 0.2, Diff: 0.2]
>> [Termination Attempts : 21 17 1 14 17 16 14 10 13 21 16 19 12
>> 11 17 12 12 13 16 14 12 15 17
>> Sum: 330, Avg: 14, Min: 1, Max: 21, Diff: 20]
>>      [GC Worker End (ms): 14904912.9 14904911.8 14904912.0 14904911.8
>>       14904911.9 14904912.2 14904912.8 14904912.1 14904911.8 14904912.9
>>       14904912.5 14904911.8 14904911.8 14904912.7 14904912.3 14904913.0
>>       14904912.4 14904913.1 14904911.8 14904912.6 14904911.8 14904912.6
>>       14904911.8
>> Avg: 14904912.3, Min: 14904911.8, Max: 14904913.1, Diff: 1.3]
>>      [GC Worker (ms): 122279.8 122278.7 122278.8 122278.6 122278.7 122278.9
>>       122279.5 122278.8 122278.5 122279.5 122279.1 122278.4 122278.4 122279.2
>>       122278.7 122279.4 122278.8 122279.4 122278.1 122278.9 122278.0 122278.7
>>       122277.8
>> Avg: 122278.8, Min: 122277.8, Max: 122279.8, Diff: 2.0]
>> [GC Worker Other (ms): 2.6 2.6 2.6 2.7 2.8 2.8 2.8 2.8
>> 2.8 2.9 2.9 2.9 2.9 3.0 3.1 3.1 3.1 3.2 3.2 3.3 3.3 3.4
>> 3.5
>> Avg: 3.0, Min: 2.6, Max: 3.5, Diff: 0.8]
>> [Clear CT: 38.1 ms]
>> [Other: 14430.8 ms]
>> [Choose CSet: 0.0 ms]
>> [Ref Proc: 8.0 ms]
>> [Ref Enq: 0.0 ms]
>> [Free CSet: 4.7 ms]
>> [Eden: 10176M(49152M)->0B(49152M) Survivors: 0B->0B Heap:
>> 245696M(245760M)->245696M(245760M)]
>> [Times: user=179.15 sys=26.29, real=136.73 secs]
>>
>> I can provide full logs if necessary.
>>
>> BTW, the numbers from the parallel collector are similar, as I'd expect:
>>
>> 2012-06-17T08:11:24.048+0000: 58877.099: [Full GC [PSYoungGen:
>> 15728640K->7747432K(18350080K)] [ParOldGen:
>> 41943038K->41943038K(41943040K)] 57671678K->49690470K(60293120K)
>> [PSPermGen: 3512K->3512K(21248K)], 266.1496160 secs] [Times:
>> user=5470.99 sys=22.04, real=266.11 secs]
>>
>> On Thu, Jul 19, 2012 at 8:06 PM, Peter B. Kessler
>> <Peter.B.Kessler at oracle.com> wrote:
>>>
>>> If you don't mind sharing, what kinds of collection times are you seeing
>>> with G1 (and with the ParallelOld collector if you have those numbers) on
>>> heaps of these sizes?
>>>
>>> ... peter
>>>
>>>
>>> Neil Ferguson wrote:
>>>>
>>>> Hi all.
>>>>
>>>> When using the G1 collector with large heap sizes I'm seeing the JVM
>>>> process grow much larger than my heap size, sometimes by as much as
>>>> 35GB.
>>>> Here are some examples:
>>>>
>>>> Heap size 60GB, java process size: 85GB
>>>> Heap size 120GB, java process size: 158GB
>>>> Heap size 240GB, java process size: 274GB
>>>> Heap size 480GB, process killed by Linux
>>>>
>>>> I'm quoting the resident set as reported by the Linux ps command. The
>>>> virtual memory size of the process is typically a few GB larger. This is on
>>>> a machine with 512GB RAM, running Linux 2.6.32-220.el6.x86_64 and Java
>>>> 1.7.0_05-b05.
>>>>
>>>> Typically the process size grows gradually over the first 30 minutes or so
>>>> of my test, then flattens out and doesn't grow any more. I'm running a
>>>> fairly trivial benchmark, which repeatedly adds items to a bounded (LRU)
>>>> cache. There's typically around 10GB free on the heap in each of my tests.
>>>>
>>>> Does anyone know if this is the expected behaviour for G1? It seems like
>>>> rather a lot of memory to be using over and above the heap size. I've run
>>>> the same test with the parallel collector with a heap size of 60GB, and the
>>>> process size stabilizes quickly at around 62GB.
>>>>
>>>> Thanks.
>>>>
>>>> Neil.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: BigRamTester.java
Type: application/octet-stream
Size: 7347 bytes
Desc: not available
URL: <https://mail.openjdk.org/pipermail/hotspot-gc-dev/attachments/20120723/d030b446/BigRamTester.java>