G1 collector consuming up to ~35GB of memory?

Peter B. Kessler Peter.B.Kessler at Oracle.COM
Mon Jul 23 23:44:07 UTC 2012


Thanks for sending out the example code.  I tried changing "storage" to a HashMap and running in a much smaller heap (1280m) on a much smaller machine (2 processors), scaling RESERVED_SPACE to leave enough room that the heap occupancy after a young collection isn't high enough to force a full collection.  With -XX:+UseParallelOldGC I'm getting collections like

	2012-07-23 16:11:28 - Hit rate = 0.9011470722399617; total hits = 80996; total misses = 8885; total reads = 89881; free memory = 437MB; max time taken = 240.813062ms; uptime = 538 seconds
	2012-07-23 16:11:29 - Hit rate = 0.8990989373520204; total hits = 80125; total misses = 8992; total reads = 89117; free memory = 428MB; max time taken = 240.813062ms; uptime = 539 seconds
	539.973: [GC [PSYoungGen: 237088K->32K(354944K)] 774836K->537780K(1229440K), 0.0032309 secs] [Times: user=0.01 sys=0.00, real=0.00 secs]
	2012-07-23 16:11:30 - Hit rate = 0.9000791518489203; total hits = 80738; total misses = 8963; total reads = 89701; free memory = 670MB; max time taken = 240.813062ms; uptime = 540 seconds
	2012-07-23 16:11:31 - Hit rate = 0.8995499570058182; total hits = 80552; total misses = 8995; total reads = 89547; free memory = 659MB; max time taken = 240.813062ms; uptime = 541 seconds

Comparing runs with -XX:ParallelGCThreads=1 and -XX:ParallelGCThreads=2, the early collections show that your data structures allow scaling in the collector.  (It's hard to see scaling with collections as short as the one shown above.)  I'll leave it to you to see whether changing from a LinkedHashMap to a HashMap scales to bigger heaps and more processors.  If so, that would be something else to add to the "GC-friendly programming" guide.
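
For concreteness, here is a rough sketch of the two shapes (the class and method names are my own invention, not from BigRamTester).  An access-ordered LinkedHashMap threads every entry onto one long doubly-linked list, which may give marking a single sequential chain to chase; a plain HashMap leaves the entries hanging off the bucket array, a bushier graph that parallel marking has a better chance of dividing up.

    import java.util.HashMap;
    import java.util.LinkedHashMap;
    import java.util.Map;

    class CacheShapes {
        static final int CAPACITY = 1_000_000;

        // LRU variant: accessOrder=true threads every entry onto one
        // doubly-linked list, about the least bushy object graph there is.
        static Map<Long, byte[]> lruStorage() {
            return new LinkedHashMap<Long, byte[]>(CAPACITY, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
                    return size() > CAPACITY;
                }
            };
        }

        // Flat variant: entries are reachable only through the bucket array,
        // so parallel markers can split the work among themselves.  (Eviction
        // would have to be handled some other way, e.g. evicting a random
        // entry on a miss.)
        static Map<Long, byte[]> flatStorage() {
            return new HashMap<Long, byte[]>(CAPACITY);
        }
    }

The flat variant gives up cheap LRU eviction, which is part of what the experiment would tell you: whether the eviction policy is worth what it costs the collector.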

With the -XX:+UseG1GC collector, I'm seeing collections like

	2012-07-23 16:17:27 - Hit rate = 0.8992040485432123; total hits = 36603; total misses = 4103; total reads = 40706; free memory = 327MB; max time taken = 6.043111ms; uptime = 166 seconds
	2012-07-23 16:17:28 - Hit rate = 0.9004803215213449; total hits = 36745; total misses = 4061; total reads = 40806; free memory = 322MB; max time taken = 6.043111ms; uptime = 167 seconds
        168.151: [GC pause (young) (initial-mark), 0.20408485 secs]
           [Parallel Time: 193.3 ms]
              [GC Worker Start Time (ms):  168151.1  168151.7
               Avg: 168151.4, Min: 168151.1, Max: 168151.7, Diff:   0.6]
              [Update RS (ms):  5.2  11.6
               Avg:   8.4, Min:   5.2, Max:  11.6, Diff:   6.4]
                 [Processed Buffers : 4 4
                  Sum: 8, Avg: 4, Min: 4, Max: 4, Diff: 0]
              [Ext Root Scanning (ms):  0.7  0.6
               Avg:   0.7, Min:   0.6, Max:   0.7, Diff:   0.2]
              [Mark Stack Scanning (ms):  0.0  0.0
               Avg:   0.0, Min:   0.0, Max:   0.0, Diff:   0.0]
              [Scan RS (ms):  6.7  0.1
               Avg:   3.4, Min:   0.1, Max:   6.7, Diff:   6.7]
              [Object Copy (ms):  180.6  180.4
               Avg: 180.5, Min: 180.4, Max: 180.6, Diff:   0.2]
              [Termination (ms):  0.0  0.0
               Avg:   0.0, Min:   0.0, Max:   0.0, Diff:   0.0]
                 [Termination Attempts : 1 1
                  Sum: 2, Avg: 1, Min: 1, Max: 1, Diff: 0]
              [GC Worker End Time (ms):  168344.3  168344.3
               Avg: 168344.3, Min: 168344.3, Max: 168344.3, Diff:   0.0]
              [GC Worker Times (ms):  193.3  192.6
               Avg: 192.9, Min: 192.6, Max: 193.3, Diff:   0.6]
              [Parallel Other:   0.4 ms]
           [Clear CT:   0.4 ms]
           [Other:  10.4 ms]
              [Choose CSet:   0.0 ms]
              [Ref Proc:   0.1 ms]
              [Ref Enq:   0.0 ms]
        168.355: [GC concurrent-mark-start]
           [Eden: 372M(372M)->0B(425M) Survivors: 54M->1024K Heap: 958M(1280M)->588M(1280M)]
         [Times: user=0.36 sys=0.00, real=0.20 secs]
        2012-07-23 16:17:29 - Hit rate = 0.9016960738240697; total hits = 21106; total misses = 2301; total reads = 23407; free memory = 690MB; max time taken = 154.873942ms; uptime = 168 seconds
        2012-07-23 16:17:30 - Hit rate = 0.9012887513318788; total hits = 35527; total misses = 3891; total reads = 39418; free memory = 686MB; max time taken = 154.873942ms; uptime = 169 seconds
        170.382: [GC concurrent-mark-end, 2.0273056 sec]
        170.402: [GC remark 170.436: [GC ref-proc, 0.0001604 secs], 0.0400842 secs]
         [Times: user=0.02 sys=0.00, real=0.04 secs]
        170.443: [GC concurrent-count-start]
        170.852: [GC concurrent-count-end, 0.4090470]
        170.852: [GC cleanup 597M->556M(1280M), 0.0041084 secs]
         [Times: user=0.01 sys=0.01, real=0.00 secs]
        170.856: [GC concurrent-cleanup-start]
        170.856: [GC concurrent-cleanup-end, 0.0000913]
	2012-07-23 16:17:31 - Hit rate = 0.9031118541689138; total hits = 33491; total misses = 3593; total reads = 37084; free memory = 723MB; max time taken = 154.873942ms; uptime = 170 seconds
	2012-07-23 16:17:32 - Hit rate = 0.8994589496674278; total hits = 36241; total misses = 4051; total reads = 40292; free memory = 718MB; max time taken = 154.873942ms; uptime = 171 seconds

But as I said, I'm not a trained G1 log reader.  Except for the full collections you force by calling System.gc(), I haven't seen any full collections with either collector, so I can't speculate as to how they will scale.

I hope the total hits/misses/reads numbers are not your measure of throughput.  I might expect throughput to drop while the concurrent phases of the G1 collector are running, and it might be that the write-barrier overhead is an issue as well (or maybe not).
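
If it would help to tell pauses apart from plain slowdown in those report lines, a sampler along these lines (the names are hypothetical; nothing here is from your benchmark) prints reads per wall-clock second together with the gap between samples.  A gap much longer than the one-second sleep suggests the sampling thread, and presumably the mutators, were stopped by the collector.

    import java.util.concurrent.atomic.AtomicLong;

    class ThroughputProbe implements Runnable {
        // Incremented by the cache-reading threads on every read.
        static final AtomicLong READS = new AtomicLong();

        @Override
        public void run() {
            long last = System.nanoTime();
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    Thread.sleep(1000);
                } catch (InterruptedException e) {
                    return;
                }
                long now = System.nanoTime();
                long reads = READS.getAndSet(0);
                // A gap much longer than the 1000ms sleep means this thread
                // (and presumably the mutator threads) was stopped.
                System.out.printf("reads/sec = %d, sample gap = %.1f ms%n",
                                  reads, (now - last) / 1e6);
                last = now;
            }
        }
    }

Run as a daemon thread next to the benchmark loop, that would make a 200ms parallel pause show up in the gap column instead of being smeared into the hit-rate line.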

			... peter

Neil Ferguson wrote:
> And just after I've sent this I've realised I'm using a LinkedHashMap,
> so there will be a big linked list. I can try a test with a standard
> HashMap.
> 
> On Mon, Jul 23, 2012 at 10:41 PM, Neil Ferguson <nferguson at gmail.com> wrote:
>> I'm running a reasonably simple benchmark here, and although it's not
>> production code, I'd expect it to have a reasonably similar object
>> graph to a large cache in production.
>>
>> Essentially I'm creating a big map containing linked lists, where each
>> list contains 15 integers, on average. So I have lots of small linked
>> lists (in a big array), but not one large one. I'm then accessing the
>> cache with a hit rate of 90%, creating a new item on a cache miss, and
>> evicting the LRU item. This generates around 50MB of data per second
>> (which, admittedly, is probably more than I'd expect to generate in
>> production).
>>
>> The code is attached (BigRamTester.java). I'm running with the
>> following command "java -cp . -mx60G -ms60G -XX:MaxDirectMemorySize=1G
>> -Xloggc:./gc.log -XX:+PrintGC -XX:+PrintGCDetails
>> -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+UseG1GC
>> -XX:+PrintClassHistogram com.neilferguson.memory.BigRamTester"
>>
>> I am also surprised by the "scalability" of G1. The machine I'm
>> running my tests on has got 32 cores, so it seems strange that the
>> collection isn't more parallel. Presumably there's no configuration
>> parameter for this, and G1 will try and eat as many cores as it can?
>>
>> On Fri, Jul 20, 2012 at 1:41 AM, Peter B. Kessler
>> <Peter.B.Kessler at oracle.com> wrote:
>>> I am not a trained reader of G1 logs, so I can't help you there.  The
>>> "(to-space overflow)" notation in the log might mean that your application
>>> is promoting more objects than G1 was expecting, possibly causing the full
>>> collection.
>>>
>>> I'm surprised by the absolute time numbers, though.  Computing the
>>> "scalability" of the collector as (user+sys/real) for the G1 full
>>> collection, that looks like
>>>
>>>
>>>         [Times: user=1810.46 sys=37.43, real=1100.24 secs]
>>>
>>> for a scalability of 1.7.  The G1 young generation collection comes in at
>>>
>>>
>>>         [Times: user=179.15 sys=26.29, real=136.73 secs]
>>>
>>> for a scalability of 1.5.  In contrast, the ParallelOld collection reports
>>>
>>>
>>>         [Times: user=5470.99 sys=22.04, real=266.11 secs]
>>>
>>> which indicates a scalability of 20.1.  That would be good except that it
>>> looks like it's just burning more user time in somewhat more real time (than
>>> the 60GB G1 heap), which doesn't win any points.  I'll assume that you are
>>> running on a machine with many processors, and were hoping that GC would
>>> scale to use several of them.
>>>
>>> Poor scaling might happen if your heap is filled with a graph that doesn't
>>> allow a lot of parallelism in the collector.  Parallel marking basically
>>> depends on the bushiness of the object graph.  For example, a long linked
>>> list of tiny objects doesn't provide any opportunity for parallel marking.
>>> A nicely balanced tree provides lots of opportunity for parallel marking.
>>> In the ParallelOld collector, a heap with a few widely-scattered small
>>> unreachable objects doesn't give a lot of opportunity for parallel
>>> compaction.  (Though, there is supposed to be a "deadwood" calculation that
>>> usually keeps us from trying to compact such heaps.)  Does your sample
>>> application accurately model the object graph you expect to see in
>>> production?
>>>
>>> I hope that someone who knows more about G1 logs can teach us all how to
>>> read yours.
>>>
>>>
>>>                         ... peter
>>>
>>> Neil Ferguson wrote:
>>>> Hi Peter
>>>>
>>>> I haven't yet spent any time tuning the G1 collector for these tests,
>>>> and it seems that it's performing full GCs a lot of the time. Here are
>>>> the times from my GC logs:
>>>>
>>>> 60G heap: 2741.088: [Full GC 61408M->51197M(61440M), 239.5237130 secs]
>>>> 120G heap: 4577.455: [Full GC 122841M->112633M(122880M), 525.9004520 secs]
>>>> 240G heap: 13500.655: [Full GC 245725M->235520M(245760M), 1100.4050500
>>>> secs]
>>>>
>>>> I'm not sure why it's performing full GCs, as I don't yet understand
>>>> the details of the G1 logs. However, if anyone could shed any light on
>>>> this, I'd be most grateful. Here's a section of the logs for the 240G
>>>> heap:
>>>>
>>>> 2012-06-19T19:29:42.849+0000: 13500.654: [GC
>>>> concurrent-root-region-scan-start]
>>>> 2012-06-19T19:29:42.849+0000: 13500.654: [GC
>>>> concurrent-root-region-scan-end, 0.0000380]
>>>> 2012-06-19T19:29:42.849+0000: 13500.654: [GC concurrent-mark-start]
>>>> 2012-06-19T19:29:42.850+0000: 13500.655: [Full GC
>>>> 245725M->235520M(245760M), 1100.4050500 secs]
>>>>  [Times: user=1810.46 sys=37.43, real=1100.24 secs]
>>>> 2012-06-19T19:48:03.257+0000: 14601.062: [GC concurrent-mark-abort]
>>>> 2012-06-19T19:51:04.824+0000: 14782.629: [GC pause (young) (to-space
>>>> overflow), 136.75016300 secs]
>>>>    [Parallel Time: 122281.2 ms]
>>>>       [GC Worker Start (ms):  14782633.1  14782633.1  14782633.2
>>>> 14782633.2  14782633.2  14782633.3  14782633.3  14782633.3  14782633.3
>>>> 14782633.4  14782633.4  14782633.4  14782633.4  14782633.5  14782633.6
>>>> 14782633.6  14782633.6  14782633.7  14782633.7  14782633.7  14782633.8
>>>> 14782633.9  14782634.0
>>>>        Avg: 14782633.5, Min: 14782633.1, Max: 14782634.0, Diff:   0.8]
>>>>       [Ext Root Scanning (ms):  0.4  0.3  0.4  0.4  0.2  0.2  0.3  0.2
>>>>  0.2  0.1  0.1  0.1  0.1  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
>>>> 0.0
>>>>        Avg:   0.1, Min:   0.0, Max:   0.4, Diff:   0.4]
>>>>       [Update RS (ms):  14.2  14.7  12.6  12.2  13.1  14.4  14.3  14.4
>>>>  14.2  14.5  14.3  12.9  14.4  12.1  14.3  14.5  14.4  14.1  12.1
>>>> 12.0  14.2  11.9  13.5
>>>>        Avg:  13.6, Min:  11.9, Max:  14.7, Diff:   2.8]
>>>>          [Processed Buffers : 3 4 4 3 3 4 4 3 3 3 4 3 3 3 3 3 3 3 2 2 3 3
>>>> 3
>>>>           Sum: 72, Avg: 3, Min: 2, Max: 4, Diff: 2]
>>>>       [Scan RS (ms):  1319.1  1318.7  1320.8  1321.1  1320.3  1319.0
>>>> 1319.0  1319.0  1319.1  1318.9  1319.1  1320.4  1318.9  1321.2  1318.9
>>>> 1318.8  1318.9  1319.0  1321.1  1321.1  1318.8  1321.0  1319.4
>>>>        Avg: 1319.6, Min: 1318.7, Max: 1321.2, Diff:   2.5]
>>>>       [Object Copy (ms):  120944.7  120944.8  120944.9  120944.8
>>>> 120944.7  120944.8  120944.8  120944.8  120944.8  120944.8  120944.8
>>>> 120944.8  120944.8  120944.8  120944.7  120944.8  120944.8  120944.8
>>>> 120944.7  120944.7  120944.8  120944.8  120944.8
>>>>        Avg: 120944.8, Min: 120944.7, Max: 120944.9, Diff:   0.2]
>>>>       [Termination (ms):  0.2  0.1  0.0  0.1  0.2  0.1  0.1  0.1  0.1
>>>> 0.1  0.1  0.1  0.1  0.1  0.2  0.1  0.0  0.1  0.1  0.1  0.1  0.1  0.1
>>>>        Avg:   0.1, Min:   0.0, Max:   0.2, Diff:   0.2]
>>>>          [Termination Attempts : 21 17 1 14 17 16 14 10 13 21 16 19 12
>>>> 11 17 12 12 13 16 14 12 15 17
>>>>           Sum: 330, Avg: 14, Min: 1, Max: 21, Diff: 20]
>>>>       [GC Worker End (ms):  14904912.9  14904911.8  14904912.0
>>>> 14904911.8  14904911.9  14904912.2  14904912.8  14904912.1  14904911.8
>>>> 14904912.9  14904912.5  14904911.8  14904911.8  14904912.7  14904912.3
>>>> 14904913.0  14904912.4  14904913.1  14904911.8  14904912.6  14904911.8
>>>> 14904912.6  14904911.8
>>>>        Avg: 14904912.3, Min: 14904911.8, Max: 14904913.1, Diff:   1.3]
>>>>       [GC Worker (ms):  122279.8  122278.7  122278.8  122278.6
>>>> 122278.7  122278.9  122279.5  122278.8  122278.5  122279.5  122279.1
>>>> 122278.4  122278.4  122279.2  122278.7  122279.4  122278.8  122279.4
>>>> 122278.1  122278.9  122278.0  122278.7  122277.8
>>>>        Avg: 122278.8, Min: 122277.8, Max: 122279.8, Diff:   2.0]
>>>>       [GC Worker Other (ms):  2.6  2.6  2.6  2.7  2.8  2.8  2.8  2.8
>>>> 2.8  2.9  2.9  2.9  2.9  3.0  3.1  3.1  3.1  3.2  3.2  3.3  3.3  3.4
>>>> 3.5
>>>>        Avg:   3.0, Min:   2.6, Max:   3.5, Diff:   0.8]
>>>>    [Clear CT:  38.1 ms]
>>>>    [Other: 14430.8 ms]
>>>>       [Choose CSet:   0.0 ms]
>>>>       [Ref Proc:   8.0 ms]
>>>>       [Ref Enq:   0.0 ms]
>>>>       [Free CSet:   4.7 ms]
>>>>    [Eden: 10176M(49152M)->0B(49152M) Survivors: 0B->0B Heap:
>>>> 245696M(245760M)->245696M(245760M)]
>>>>  [Times: user=179.15 sys=26.29, real=136.73 secs]
>>>>
>>>> I can provide full logs if necessary.
>>>>
>>>> BTW, the numbers from the parallel collector are similar, as I'd expect:
>>>>
>>>> 2012-06-17T08:11:24.048+0000: 58877.099: [Full GC [PSYoungGen:
>>>> 15728640K->7747432K(18350080K)] [ParOldGen:
>>>> 41943038K->41943038K(41943040K)] 57671678K->49690470K(60293120K)
>>>> [PSPermGen: 3512K->3512K(21248K)], 266.1496160 secs] [Times:
>>>> user=5470.99 sys=22.04, real=266.11 secs]
>>>>
>>>> On Thu, Jul 19, 2012 at 8:06 PM, Peter B. Kessler
>>>> <Peter.B.Kessler at oracle.com> wrote:
>>>>> If you don't mind sharing, what kinds of collection times are you seeing
>>>>> with G1 (and with the ParallelOld collector if you have those numbers) on
>>>>> heaps of these sizes?
>>>>>
>>>>>                         ... peter
>>>>>
>>>>>
>>>>> Neil Ferguson wrote:
>>>>>> Hi all.
>>>>>>
>>>>>> When using the G1 collector with large heap sizes I'm seeing the JVM
>>>>>> process grow much larger than my heap size, sometimes by as much as
>>>>>> 35GB.
>>>>>> Here are some examples:
>>>>>>
>>>>>> Heap size 60GB, java process size: 85GB
>>>>>> Heap size 120GB, java process size: 158GB
>>>>>> Heap size 240GB, java process size: 274GB
>>>>>> Heap size 480GB, process killed by Linux
>>>>>>
>>>>>> I'm quoting the resident set as reported by the Linux ps command. The
>>>>>> virtual memory size of the process is typically a few GB larger. This is
>>>>>> on
>>>>>> a machine with 512GB RAM, running Linux 2.6.32-220.el6.x86_64 and Java
>>>>>> 1.7.0_05-b05.
>>>>>>
>>>>>> Typically the process size grows gradually over the first 30 minutes or
>>>>>> so
>>>>>> of my test, then flattens out and doesn't grow any more. I'm running a
>>>>>> fairly trivial benchmark, which repeatedly adds items to a bounded (LRU)
>>>>>> cache. There's typically around 10GB free on the heap in each of my
>>>>>> tests.
>>>>>>
>>>>>> Does anyone know if this is the expected behaviour for G1? It seems
>>>>>> rather
>>>>>> a lot of memory to be using, over the heap size. I've run the same test
>>>>>> with
>>>>>> the parallel collector with a heap size of 60GB, and the process size
>>>>>> stabilizes quickly to around 62GB.
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>> Neil.


