G1 collector consuming up to ~35GB memory?
Neil Ferguson
nferguson at gmail.com
Mon Jul 23 21:43:10 UTC 2012
And just after I've sent this I've realised I'm using a LinkedHashMap,
so there will be a big linked list. I can try a test with a standard
HashMap.
On Mon, Jul 23, 2012 at 10:41 PM, Neil Ferguson <nferguson at gmail.com> wrote:
> I'm running a reasonably simple benchmark here, and although it's not
> production code, I'd expect it to have a reasonably similar object
> graph to a large cache in production.
>
> Essentially I'm creating a big map containing linked lists, where each
> list contains 15 integers, on average. So I have lots of small linked
> lists (in a big array), but not one large one. I'm then accessing the
> cache with a hit rate of 90%, creating a new item on a cache miss, and
> evicting the LRU item. This generates around 50MB of data per second
> (which, admittedly, is probably more than I'd expect to generate in
> production).
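For concreteness, the access pattern described above can be sketched as follows. This is an illustrative, scaled-down sketch, not the attached BigRamTester.java; the class name, capacity, and iteration counts are made up and far below 60GB-heap scale.

```java
import java.util.LinkedHashMap;
import java.util.LinkedList;
import java.util.Map;
import java.util.Random;

// Illustrative sketch (not the attached BigRamTester.java): a bounded LRU map
// whose values are small linked lists of ~15 Integers, driven at roughly a
// 90% hit rate, creating a new value (and evicting the LRU entry) on a miss.
public class LruSketch {
    static final int CAPACITY = 100_000;
    static final Random RND = new Random(42);

    // access-order LinkedHashMap evicts its least-recently-used entry
    static final Map<Long, LinkedList<Integer>> CACHE =
        new LinkedHashMap<Long, LinkedList<Integer>>(CAPACITY, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, LinkedList<Integer>> e) {
                return size() > CAPACITY;
            }
        };

    static LinkedList<Integer> newValue() {
        LinkedList<Integer> list = new LinkedList<>();
        int n = 10 + RND.nextInt(11);          // 10..20 elements, ~15 on average
        for (int i = 0; i < n; i++) list.add(RND.nextInt());
        return list;
    }

    public static void main(String[] args) {
        long nextKey = 0, hits = 0, misses = 0;
        for (int op = 0; op < 1_000_000; op++) {
            long key;
            if (nextKey > 0 && RND.nextDouble() < 0.9) {
                // revisit a recent key: usually a hit, occasionally already evicted
                long window = Math.min(nextKey, CAPACITY);
                key = nextKey - 1 - (long) (RND.nextDouble() * window);
            } else {
                key = nextKey++;               // brand-new key: guaranteed miss
            }
            if (CACHE.get(key) != null) {
                hits++;
            } else {
                misses++;                      // miss: create the item, maybe evict LRU
                CACHE.put(key, newValue());
            }
        }
        System.out.printf("hit rate: %.2f, cache size: %d%n",
                          hits / (double) (hits + misses), CACHE.size());
    }
}
```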
>
> The code is attached (BigRamTester.java). I'm running with the
> following command "java -cp . -mx60G -ms60G -XX:MaxDirectMemorySize=1G
> -Xloggc:./gc.log -XX:+PrintGC -XX:+PrintGCDetails
> -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+UseG1GC
> -XX:+PrintClassHistogram com.neilferguson.memory.BigRamTester"
>
> I'm also surprised by the "scalability" of G1. The machine I'm
> running my tests on has 32 cores, so it seems strange that the
> collection isn't more parallel. Presumably there's no configuration
> parameter for this, and G1 will try to use as many cores as it can?
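For what it's worth, the GC worker-thread counts are in fact tunable in HotSpot (by default they are derived from the CPU count); the values below are illustrative, not recommendations:

```shell
# -XX:ParallelGCThreads: STW parallel GC worker threads (used by G1
#   evacuation pauses and full GCs); -XX:ConcGCThreads: concurrent
#   marking threads. Defaults are sized from the number of CPUs.
java -XX:+UseG1GC -XX:ParallelGCThreads=16 -XX:ConcGCThreads=4 \
     com.neilferguson.memory.BigRamTester
```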
>
> On Fri, Jul 20, 2012 at 1:41 AM, Peter B. Kessler
> <Peter.B.Kessler at oracle.com> wrote:
>> I am not a trained reader of G1 logs, so I can't help you there. The
>> "(to-space overflow)" notation in the log might mean that your application
>> is promoting more objects than G1 was expecting, possibly causing the full
>> collection.
>>
>> I'm surprised by the absolute time numbers, though. Computing the
>> "scalability" of the collector as ((user + sys) / real) for the G1 full
>> collection, that looks like
>>
>>
>> [Times: user=1810.46 sys=37.43, real=1100.24 secs]
>>
>> for a scalability of 1.7. The G1 young generation collection comes in at
>>
>>
>> [Times: user=179.15 sys=26.29, real=136.73 secs]
>>
>> for a scalability of 1.5. In contrast, the ParallelOld collection reports
>>
>>
>> [Times: user=5470.99 sys=22.04, real=266.11 secs]
>>
>> which indicates a scalability of 20.6. That would be good except that it
>> looks like it's just burning more user time in somewhat more real time (than
>> the 60GB G1 heap), which doesn't win any points. I'll assume that you are
>> running on a machine with many processors, and were hoping that GC would
>> scale to use several of them.
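The arithmetic behind those ratios is just (user + sys) / real, i.e. the average number of CPUs the pause kept busy; a throwaway helper (the class and method names are mine) applied to the three Times lines quoted above:

```java
// Throwaway helper: GC "scalability" as (user + sys) / real from a
// [Times: ...] line, i.e. the average CPU parallelism during the pause.
public class GcScalability {
    static double scalability(double user, double sys, double real) {
        return (user + sys) / real;
    }

    public static void main(String[] args) {
        System.out.printf("G1 full GC:  %.1f%n", scalability(1810.46, 37.43, 1100.24)); // ~1.7
        System.out.printf("G1 young GC: %.1f%n", scalability(179.15, 26.29, 136.73));   // ~1.5
        System.out.printf("ParallelOld: %.1f%n", scalability(5470.99, 22.04, 266.11));  // ~20.6
    }
}
```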
>>
>> Poor scaling might happen if your heap is filled with a graph that doesn't
>> allow a lot of parallelism in the collector. Parallel marking basically
>> depends on the bushiness of the object graph. For example, a long linked
>> list of tiny objects doesn't provide any opportunity for parallel marking.
>> A nicely balanced tree provides lots of opportunity for parallel marking.
>> In the ParallelOld collector, a heap with a few widely-scattered small
>> unreachable objects doesn't give a lot of opportunity for parallel
>> compaction. (Though, there is supposed to be a "deadwood" calculation that
>> usually keeps us from trying to compact such heaps.) Does your sample
>> application accurately model the object graph you expect to see in
>> production?
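The dependence on graph shape can be seen with a toy fork/join traversal. This is only a sketch of the idea, nothing like the real marking code: with a balanced tree, each fork hands a whole subtree to another worker; a singly linked chain exposes only one successor at a time, so there is nothing to steal.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Toy illustration of why marking parallelism depends on graph "bushiness":
// forked subtrees of a balanced tree can be stolen by idle workers, while a
// chain (right-child-only "tree") serializes the whole traversal.
public class Bushiness {
    static class Node {
        final Node left, right;
        Node(Node l, Node r) { left = l; right = r; }
    }

    static class MarkTask extends RecursiveTask<Long> {
        final Node node;
        MarkTask(Node n) { node = n; }
        @Override protected Long compute() {
            if (node == null) return 0L;
            MarkTask leftTask = new MarkTask(node.left);
            leftTask.fork();                        // another worker may steal this
            long rightCount = new MarkTask(node.right).compute();
            return 1 + rightCount + leftTask.join();
        }
    }

    static Node balanced(int depth) {               // bushy: lots of stealable work
        return depth == 0 ? null
                          : new Node(balanced(depth - 1), balanced(depth - 1));
    }

    static Node chain(int n) {                      // degenerate: one successor each
        Node head = null;
        for (int i = 0; i < n; i++) head = new Node(null, head);
        return head;
    }

    public static void main(String[] args) {
        ForkJoinPool pool = new ForkJoinPool();
        System.out.println(pool.invoke(new MarkTask(balanced(18)))); // 2^18 - 1 nodes
        System.out.println(pool.invoke(new MarkTask(chain(5_000))));
    }
}
```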
>>
>> I hope that someone who knows more about G1 logs can teach us all how to
>> read yours.
>>
>>
>> ... peter
>>
>> Neil Ferguson wrote:
>>>
>>> Hi Peter
>>>
>>> I haven't yet spent any time tuning the G1 collector for these tests,
>>> and it seems that it's performing full GCs a lot of the time. Here are
>>> the times from my GC logs:
>>>
>>> 60G heap: 2741.088: [Full GC 61408M->51197M(61440M), 239.5237130 secs]
>>> 120G heap: 4577.455: [Full GC 122841M->112633M(122880M), 525.9004520 secs]
>>> 240G heap: 13500.655: [Full GC 245725M->235520M(245760M), 1100.4050500 secs]
>>>
>>> I'm not sure why it's performing full GCs, as I don't yet understand
>>> the details of the G1 logs. However, if anyone could shed any light on
>>> this, I'd be most grateful. Here's a section of the logs for the 240G
>>> heap:
>>>
>>> 2012-06-19T19:29:42.849+0000: 13500.654: [GC
>>> concurrent-root-region-scan-start]
>>> 2012-06-19T19:29:42.849+0000: 13500.654: [GC
>>> concurrent-root-region-scan-end, 0.0000380]
>>> 2012-06-19T19:29:42.849+0000: 13500.654: [GC concurrent-mark-start]
>>> 2012-06-19T19:29:42.850+0000: 13500.655: [Full GC
>>> 245725M->235520M(245760M), 1100.4050500 secs]
>>> [Times: user=1810.46 sys=37.43, real=1100.24 secs]
>>> 2012-06-19T19:48:03.257+0000: 14601.062: [GC concurrent-mark-abort]
>>> 2012-06-19T19:51:04.824+0000: 14782.629: [GC pause (young) (to-space
>>> overflow), 136.75016300 secs]
>>> [Parallel Time: 122281.2 ms]
>>> [GC Worker Start (ms): 14782633.1 14782633.1 14782633.2
>>> 14782633.2 14782633.2 14782633.3 14782633.3 14782633.3 14782633.3
>>> 14782633.4 14782633.4 14782633.4 14782633.4 14782633.5 14782633.6
>>> 14782633.6 14782633.6 14782633.7 14782633.7 14782633.7 14782633.8
>>> 14782633.9 14782634.0
>>> Avg: 14782633.5, Min: 14782633.1, Max: 14782634.0, Diff: 0.8]
>>> [Ext Root Scanning (ms): 0.4 0.3 0.4 0.4 0.2 0.2 0.3 0.2
>>> 0.2 0.1 0.1 0.1 0.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
>>> 0.0
>>> Avg: 0.1, Min: 0.0, Max: 0.4, Diff: 0.4]
>>> [Update RS (ms): 14.2 14.7 12.6 12.2 13.1 14.4 14.3 14.4
>>> 14.2 14.5 14.3 12.9 14.4 12.1 14.3 14.5 14.4 14.1 12.1
>>> 12.0 14.2 11.9 13.5
>>> Avg: 13.6, Min: 11.9, Max: 14.7, Diff: 2.8]
>>> [Processed Buffers : 3 4 4 3 3 4 4 3 3 3 4 3 3 3 3 3 3 3 2 2 3 3
>>> 3
>>> Sum: 72, Avg: 3, Min: 2, Max: 4, Diff: 2]
>>> [Scan RS (ms): 1319.1 1318.7 1320.8 1321.1 1320.3 1319.0
>>> 1319.0 1319.0 1319.1 1318.9 1319.1 1320.4 1318.9 1321.2 1318.9
>>> 1318.8 1318.9 1319.0 1321.1 1321.1 1318.8 1321.0 1319.4
>>> Avg: 1319.6, Min: 1318.7, Max: 1321.2, Diff: 2.5]
>>> [Object Copy (ms): 120944.7 120944.8 120944.9 120944.8
>>> 120944.7 120944.8 120944.8 120944.8 120944.8 120944.8 120944.8
>>> 120944.8 120944.8 120944.8 120944.7 120944.8 120944.8 120944.8
>>> 120944.7 120944.7 120944.8 120944.8 120944.8
>>> Avg: 120944.8, Min: 120944.7, Max: 120944.9, Diff: 0.2]
>>> [Termination (ms): 0.2 0.1 0.0 0.1 0.2 0.1 0.1 0.1 0.1
>>> 0.1 0.1 0.1 0.1 0.1 0.2 0.1 0.0 0.1 0.1 0.1 0.1 0.1 0.1
>>> Avg: 0.1, Min: 0.0, Max: 0.2, Diff: 0.2]
>>> [Termination Attempts : 21 17 1 14 17 16 14 10 13 21 16 19 12
>>> 11 17 12 12 13 16 14 12 15 17
>>> Sum: 330, Avg: 14, Min: 1, Max: 21, Diff: 20]
>>> [GC Worker End (ms): 14904912.9 14904911.8 14904912.0
>>> 14904911.8 14904911.9 14904912.2 14904912.8 14904912.1 14904911.8
>>> 14904912.9 14904912.5 14904911.8 14904911.8 14904912.7 14904912.3
>>> 14904913.0 14904912.4 14904913.1 14904911.8 14904912.6 14904911.8
>>> 14904912.6 14904911.8
>>> Avg: 14904912.3, Min: 14904911.8, Max: 14904913.1, Diff: 1.3]
>>> [GC Worker (ms): 122279.8 122278.7 122278.8 122278.6
>>> 122278.7 122278.9 122279.5 122278.8 122278.5 122279.5 122279.1
>>> 122278.4 122278.4 122279.2 122278.7 122279.4 122278.8 122279.4
>>> 122278.1 122278.9 122278.0 122278.7 122277.8
>>> Avg: 122278.8, Min: 122277.8, Max: 122279.8, Diff: 2.0]
>>> [GC Worker Other (ms): 2.6 2.6 2.6 2.7 2.8 2.8 2.8 2.8
>>> 2.8 2.9 2.9 2.9 2.9 3.0 3.1 3.1 3.1 3.2 3.2 3.3 3.3 3.4
>>> 3.5
>>> Avg: 3.0, Min: 2.6, Max: 3.5, Diff: 0.8]
>>> [Clear CT: 38.1 ms]
>>> [Other: 14430.8 ms]
>>> [Choose CSet: 0.0 ms]
>>> [Ref Proc: 8.0 ms]
>>> [Ref Enq: 0.0 ms]
>>> [Free CSet: 4.7 ms]
>>> [Eden: 10176M(49152M)->0B(49152M) Survivors: 0B->0B Heap:
>>> 245696M(245760M)->245696M(245760M)]
>>> [Times: user=179.15 sys=26.29, real=136.73 secs]
>>>
>>> I can provide full logs if necessary.
>>>
>>> BTW, the numbers from the parallel collector are similar, as I'd expect:
>>>
>>> 2012-06-17T08:11:24.048+0000: 58877.099: [Full GC [PSYoungGen:
>>> 15728640K->7747432K(18350080K)] [ParOldGen:
>>> 41943038K->41943038K(41943040K)] 57671678K->49690470K(60293120K)
>>> [PSPermGen: 3512K->3512K(21248K)], 266.1496160 secs] [Times:
>>> user=5470.99 sys=22.04, real=266.11 secs]
>>>
>>> On Thu, Jul 19, 2012 at 8:06 PM, Peter B. Kessler
>>> <Peter.B.Kessler at oracle.com> wrote:
>>>>
>>>> If you don't mind sharing, what kinds of collection times are you seeing
>>>> with G1 (and with the ParallelOld collector if you have those numbers) on
>>>> heaps of these sizes?
>>>>
>>>> ... peter
>>>>
>>>>
>>>> Neil Ferguson wrote:
>>>>>
>>>>> Hi all.
>>>>>
>>>>> When using the G1 collector with large heap sizes I'm seeing the JVM
>>>>> process grow much larger than my heap size, sometimes by as much as
>>>>> 35GB.
>>>>> Here are some examples:
>>>>>
>>>>> Heap size 60GB, java process size: 85GB
>>>>> Heap size 120GB, java process size: 158GB
>>>>> Heap size 240GB, java process size: 274GB
>>>>> Heap size 480GB, process killed by Linux
>>>>>
>>>>> I'm quoting the resident set as reported by the Linux ps command. The
>>>>> virtual memory size of the process is typically a few GB larger. This is
>>>>> on
>>>>> a machine with 512GB RAM, running Linux 2.6.32-220.el6.x86_64 and Java
>>>>> 1.7.0_05-b05.
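For anyone reproducing this, the same figures can be read from inside the JVM on Linux. A sketch only: /proc is Linux-specific, the class name is mine, and the parsing is deliberately crude.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Linux-only sketch: read this JVM's resident (VmRSS) and virtual (VmSize)
// sizes from /proc/self/status, the same figures `ps` reports as RSS/VSZ.
public class ProcStatus {
    static long kb(String field) throws IOException {
        for (String line : Files.readAllLines(Paths.get("/proc/self/status"))) {
            if (line.startsWith(field + ":")) {
                return Long.parseLong(line.replaceAll("\\D+", "")); // keep digits only
            }
        }
        throw new IllegalArgumentException("no field " + field);
    }

    public static void main(String[] args) throws IOException {
        System.out.println("VmRSS  kB: " + kb("VmRSS"));
        System.out.println("VmSize kB: " + kb("VmSize"));
    }
}
```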
>>>>>
>>>>> Typically the process size grows gradually over the first 30 minutes or
>>>>> so
>>>>> of my test, then flattens out and doesn't grow any more. I'm running a
>>>>> fairly trivial benchmark, which repeatedly adds items to a bounded (LRU)
>>>>> cache. There's typically around 10GB free on the heap in each of my
>>>>> tests.
>>>>>
>>>>> Does anyone know if this is the expected behaviour for G1? It seems
>>>>> rather a lot of memory to be using beyond the heap size. I've run
>>>>> the same test with the parallel collector with a heap size of 60GB,
>>>>> and the process size stabilizes quickly at around 62GB.
>>>>>
>>>>> Thanks.
>>>>>
>>>>> Neil.