Java 8 + Docker container - CMS collector leaves around instances that have no GC roots

Leo Korinth leo.korinth at oracle.com
Mon Nov 26 10:03:26 UTC 2018


Hi!

> 
> The docker cgroups limit (set via --memory) is set to 8G. Here's the
> docker stats output of that container right now (been running for around
> 16 hours now):

Okay, so you have about 2 gig to use outside the java heap. The docker 
image itself will use some space. The JVM will use memory (off the java 
heap) for the code cache, thread stacks, class metadata, mmapped files 
(_mmap is used by Elasticsearch_) and other things.
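
If you want a rough view of where that off-heap memory goes inside the 
JVM itself, a small sketch like the one below (the class name is mine, 
just for illustration) prints the heap, non-heap and per-pool figures 
that are exposed over JMX:

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryPoolMXBean;

// Small sketch: print heap and non-heap usage plus the individual memory
// pools (Metaspace, Code Cache, Compressed Class Space, ...) of the
// current JVM.
public class OffHeapOverview {
    public static void main(String[] args) {
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        System.out.println("heap     : " + mem.getHeapMemoryUsage());
        System.out.println("non-heap : " + mem.getNonHeapMemoryUsage());
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            System.out.printf("%-25s %s%n", pool.getName(), pool.getUsage());
        }
    }
}

Note that memory-mapped files and direct ByteBuffers do not show up in 
these pools; they are reported under the java.nio BufferPool beans 
instead.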

I cannot give you a good estimate of how much this is, but a quick look at 
https://www.elastic.co/guide/en/elasticsearch/guide/current/heap-sizing.html 
suggests that you should give at most half of the memory to the java 
heap. If you give 6 gig to the heap, they seem to suggest 12 gig of 
memory (instead of 8).

Now, I do not know enough about Elasticsearch to say whether this is 
sufficient, and that guide does not seem to mention docker. I would 
suggest that you try to find a configuration that also _limits_ the 
file cache use of Elasticsearch. If Elasticsearch does not understand 
that it is running under docker, it _might_ use huge file caches. Try 
to limit the file caches and give the docker container x gig for the 
java heap, y gig for caches and some extra for slack.
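
One way to double-check what limit the process actually sees from inside 
the container is to read the cgroup files directly. A rough sketch, 
assuming the usual cgroup v1 mount point (the paths and the class name 
are mine and may differ on your host):

import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Rough sketch: print the memory limit and current usage that the cgroup
// (v1) imposes on this container. /sys/fs/cgroup/memory is the common
// default mount point and may differ on your system.
public class CgroupMemory {
    public static void main(String[] args) throws Exception {
        Path base = Paths.get("/sys/fs/cgroup/memory");
        long limit = readLong(base.resolve("memory.limit_in_bytes"));
        long usage = readLong(base.resolve("memory.usage_in_bytes"));
        System.out.printf("cgroup limit: %d MiB, usage: %d MiB%n",
                limit >> 20, usage >> 20);
    }

    private static long readLong(Path p) throws Exception {
        return Long.parseLong(Files.readAllLines(p).get(0).trim());
    }
}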

Hope this helps.
Thanks, Leo


> 
> CONTAINER ID   NAME            CPU %     MEM USAGE / LIMIT   MEM %     NET I/O   BLOCK I/O          PIDS
> da5ee21b1d     elasticsearch   94.62%    3.657GiB / 8GiB     45.71%    0B / 0B   1.21MB / 18.2GB    227
>> You will need _more_ than 6 gig of memory; exactly how much is hard to
>> say. You will probably find the limit faster if you use Xms==Xmx, and
>> you may not need days of running the application.
> That's interesting and useful - I'll give that a try. That will
> certainly help speed up the investigation.
> 
>>
>>>
>>> Notice the vast difference between the live and non-live instances of
>>> the same class. This isn't just in one "snapshot". I have been
>>> monitoring this for more than a day and this pattern continues. Even
>>> taking heap dumps and using tools like visualvm shows that these
>>> instances have "no GC root" and I have even checked the gc log files to
>>> see that the CMS collector does occasionally run. However these objects
>>> never seem to get collected.
>>
>> What makes you believe they never get collected?
> I have gc logs enabled on this setup. I notice that the Full GC gets
> triggered once in a while. However, even after that Full GC completes, I
> still see a vast amount of "non-live" objects staying around in the
> heap (via the jmap -histo output as well as a heap dump in VisualVM).
> This is the "total" in heap via jmap -histo:
> 
>             #instances   #bytes
> Total      22880189     1644503392 (around 1.5 G)
> 
> and this is the -histo:live
> 
> Total       1292631      102790184 (around 98M, not even 100M)
> 
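
A side note on the two jmap variants: plain "jmap -histo" also counts 
objects that are dead but not yet collected, while "jmap -histo:live" 
forces a full GC first and only counts reachable objects, so a large gap 
between the two totals is expected and not by itself a leak. A tiny 
experiment to see this (the class name is made up, just for illustration):

// Allocate a lot of short-lived garbage, keep the process alive, then compare
//   jmap -histo <pid>        (also counts dead-but-not-yet-collected objects)
//   jmap -histo:live <pid>   (forces a full GC first, counts only live objects)
public class HistoDemo {
    public static void main(String[] args) throws Exception {
        for (int i = 0; i < 1_000_000; i++) {
            byte[] garbage = new byte[1024]; // becomes unreachable right away
        }
        System.out.println("ready for jmap...");
        Thread.sleep(600_000); // leave time to run jmap against this process
    }
}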
> 
> Some of these non-live objects hold on to the ByteBuffer(s) which keep
> filling up the non-heap memory too (right now the non-heap "mapped"
> ByteBuffer memory, as shown in the JMX MBean, is around 2.5G). The Full
> GC log message looks like this:
> 
> 2018-11-24T00:57:00.665+0000: 59655.295: [Full GC (Heap Inspection Initiated GC)
>   2018-11-24T00:57:00.665+0000: 59655.295: [CMS: 711842K->101527K(989632K), 0.53220752 secs]
>   1016723K->101527K(1986432K), [Metaspace: 48054K->48054K(1093632K)], 0.5325692 secs]
>   [Times: user=0.53 sys=0.00, real=0.53 secs]
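
On the "mapped" number you mention: the same "direct" and "mapped" 
buffer pools can also be read from inside the process, for example with 
a sketch like the one below (class name is mine). The important detail 
is that the native memory behind a MappedByteBuffer is only released 
once the buffer object itself is collected, so dead-but-not-yet-collected 
buffers keep that memory alive.

import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;

// Sketch: print the "direct" and "mapped" buffer pools (the same figures
// that are exposed as java.nio:type=BufferPool over JMX).
public class BufferPools {
    public static void main(String[] args) {
        for (BufferPoolMXBean pool :
                ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            System.out.printf("%-6s count=%d used=%d bytes capacity=%d bytes%n",
                    pool.getName(), pool.getCount(),
                    pool.getMemoryUsed(), pool.getTotalCapacity());
        }
    }
}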
> 
> The jmap -heap output is this:
> 
> Server compiler detected.
> JVM version is 25.172-b11
> 
> using parallel threads in the new generation.
> using thread-local object allocation.
> Concurrent Mark-Sweep GC
> 
> Heap Configuration:
>     MinHeapFreeRatio         = 40
>     MaxHeapFreeRatio         = 70
>     MaxHeapSize              = 6442450944 (6144.0MB)
>     NewSize                  = 1134100480 (1081.5625MB)
>     MaxNewSize               = 1134100480 (1081.5625MB)
>     OldSize                  = 1013383168 (966.4375MB)
>     NewRatio                 = 2
>     SurvivorRatio            = 8
>     MetaspaceSize            = 21807104 (20.796875MB)
>     CompressedClassSpaceSize = 1073741824 (1024.0MB)
>     MaxMetaspaceSize         = 17592186044415 MB
>     G1HeapRegionSize         = 0 (0.0MB)
> 
> Heap Usage:
> New Generation (Eden + 1 Survivor Space):
>     capacity = 1020723200 (973.4375MB)
>     used     = 900988112 (859.2492218017578MB)
>     free     = 119735088 (114.18827819824219MB)
>     88.26958297802969% used
> Eden Space:
>     capacity = 907345920 (865.3125MB)
>     used     = 879277776 (838.5446319580078MB)
>     free     = 28068144 (26.767868041992188MB)
>     96.9065663512324% used
> From Space:
>     capacity = 113377280 (108.125MB)
>     used     = 21710336 (20.70458984375MB)
>     free     = 91666944 (87.42041015625MB)
>     19.148753612716764% used
> To Space:
>     capacity = 113377280 (108.125MB)
>     used     = 0 (0.0MB)
>     free     = 113377280 (108.125MB)
>     0.0% used
> concurrent mark-sweep generation:
>     capacity = 1013383168 (966.4375MB)
>     used     = 338488264 (322.8075637817383MB)
>     free     = 674894904 (643.6299362182617MB)
>     33.40180443968061% used
> 
> 14231 interned Strings occupying 2258280 bytes.
> 
> So based on these various logs and data, I notice that they aren't being
> collected. Even at the time when the process gets chosen to be killed, I
> still see this vast number of non-live objects holding on. I don't have
> much knowledge in the GC area, so a genuine question: is it just a wrong
> expectation that whenever a Full GC runs, it is supposed to clear out
> the non-live objects? Or is it usual for the collector not to choose
> them for collection for some reason?
>>
>>>
>>> I realize this data may not be enough to narrow down the issue, but what
>>> I am looking for is some kind of help/input/hints/suggestions on what I
>>> should be trying to figure out why these instances aren't GCed. Is this
>>> something that's expected in certain situations?
>>
>> Being killed by the oom-killer suggests that your limit is too low
>> and/or your -Xmx is too large. If an increasing number of objects did
>> not get collected, you would get an OutOfMemoryError (not be killed by
>> the oom-killer). What is likely happening is that your java heap slowly
>> grows (not unusual with CMS, which does not compact old objects) and
>> that the memory consumed by your docker container exceeds your limit.
> You are right. The heap usage (taking into consideration the
> uncollected non-live objects) does indeed grow very slowly. There
> aren't, however, too many (live) objects on the heap (even after days,
> when this process is about to be killed). I'll have to read up on the
> CMS compaction behaviour that you mentioned.
> 
> Here's one example output from dmesg when this process was killed
> previously:
> 
> [Wed Aug  8 03:21:13 2018] java invoked oom-killer: gfp_mask=0xd0,
> order=0, oom_score_adj=-600
> [Wed Aug  8 03:21:13 2018] java
> cpuset=390dfe142d30d73b43a35996ae11b9647fc0598651c8959e95788edbbc9916b5
> mems_allowed=0
> [Wed Aug  8 03:21:13 2018] CPU: 7 PID: 15827 Comm: java Not tainted
> 3.10.0-693.5.2.el7.x86_64 #1
> ...
> [Wed Aug  8 03:21:13 2018]  ffff88081472dee0 00000000fe941297
> ffff8808062cfcb8 ffffffff816a3e51
> [Wed Aug  8 03:21:13 2018]  ffff8808062cfd48 ffffffff8169f246
> ffff88080d6d3500 0000000000000001
> [Wed Aug  8 03:21:13 2018]  0000000000000000 0000000000000000
> ffff8808062cfcf8 0000000000000046
> [Wed Aug  8 03:21:13 2018] Call Trace:
> [Wed Aug  8 03:21:13 2018]  [<ffffffff816a3e51>] dump_stack+0x19/0x1b
> [Wed Aug  8 03:21:13 2018]  [<ffffffff8169f246>] dump_header+0x90/0x229
> [Wed Aug  8 03:21:13 2018]  [<ffffffff81185ef6>] ?
> find_lock_task_mm+0x56/0xc0
> [Wed Aug  8 03:21:13 2018]  [<ffffffff811f14f8>] ?
> try_get_mem_cgroup_from_mm+0x28/0x60
> [Wed Aug  8 03:21:13 2018]  [<ffffffff811863a4>]
> oom_kill_process+0x254/0x3d0
> [Wed Aug  8 03:21:13 2018]  [<ffffffff812b7eec>] ? selinux_capable+0x1c/0x40
> [Wed Aug  8 03:21:13 2018]  [<ffffffff811f5216>]
> mem_cgroup_oom_synchronize+0x546/0x570
> [Wed Aug  8 03:21:13 2018]  [<ffffffff811f4690>] ?
> mem_cgroup_charge_common+0xc0/0xc0
> [Wed Aug  8 03:21:13 2018]  [<ffffffff81186c34>]
> pagefault_out_of_memory+0x14/0x90
> [Wed Aug  8 03:21:13 2018]  [<ffffffff8169d60e>] mm_fault_error+0x68/0x12b
> [Wed Aug  8 03:21:13 2018]  [<ffffffff816b02b1>] __do_page_fault+0x391/0x450
> [Wed Aug  8 03:21:13 2018]  [<ffffffff816b03a5>] do_page_fault+0x35/0x90
> [Wed Aug  8 03:21:13 2018]  [<ffffffff816ac5c8>] page_fault+0x28/0x30
> [Wed Aug  8 03:21:13 2018] Task in
> /docker/390dfe142d30d73b43a35996ae11b9647fc0598651c8959e95788edbbc9916b5
> killed as a result of limit of
> /docker/390dfe142d30d73b43a35996ae11b9647fc0598651c8959e95788edbbc9916b5
> [Wed Aug  8 03:21:13 2018] memory: usage 8388608kB, limit 8388608kB,
> failcnt 426669
> [Wed Aug  8 03:21:13 2018] memory+swap: usage 8388608kB, limit
> 8388608kB, failcnt 152
> [Wed Aug  8 03:21:13 2018] kmem: usage 5773512kB, limit
> 9007199254740988kB, failcnt 0
> [Wed Aug  8 03:21:13 2018] Memory cgroup stats for
> /docker/390dfe142d30d73b43a35996ae11b9647fc0598651c8959e95788edbbc9916b5:
> cache:14056KB rss:2601040KB rss_huge:2484224KB mapped_file:6428KB
> swap:0KB inactive_anon:0KB active_anon:142640KB inactive_file:4288KB
> active_file:3324KB unevictable:2464692KB
> [Wed Aug  8 03:21:13 2018] [ pid ]   uid  tgid total_vm      rss nr_ptes
> swapents oom_score_adj name
> [Wed Aug  8 03:21:13 2018] [15582]  1000 15582  3049827   654465
> 1446        0          -600 java
> [Wed Aug  8 03:21:13 2018] Memory cgroup out of memory: Kill process
> 12625 (java) score 0 or sacrifice child
> [Wed Aug  8 03:21:13 2018] Killed process 15582 (java)
> total-vm:12199308kB, anon-rss:2599320kB, file-rss:18540kB, shmem-rss:0kB
> 
> 
>> How big the limit should be is hard to tell, but it _must_ be larger
>> than your "-Xmx" (the JVM is using more memory than the java heap, so
>> this would be true even without the addition of docker).
> Agreed, and that's where the ByteBuffer(s) come into the picture
> (around 2.5 GB right now and growing steadily).
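
If the growth really is in mapped ByteBuffers, that fits the picture: 
the mapping only goes away when the MappedByteBuffer object itself gets 
collected. A small illustration (file name and class name are made up) 
of how the native side outlives the last reference until a collection 
actually runs:

import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

// Illustration only: the 64 MiB mapping lives outside the java heap, and
// dropping the reference does not release it; that only happens once the
// MappedByteBuffer object is actually collected.
public class MappedExample {
    public static void main(String[] args) throws Exception {
        try (RandomAccessFile raf = new RandomAccessFile("/tmp/example.dat", "rw");
             FileChannel ch = raf.getChannel()) {
            raf.setLength(64L << 20); // 64 MiB backing file
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, 64L << 20);
            buf.put(0, (byte) 1); // touch the mapping
            buf = null;           // now unreachable, but the mapping is still there
            System.gc();          // only a hint; until a collection runs, the
                                  // native memory stays allocated
        }
    }
}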
> 
> -Jaikiran
> _______________________________________________
> hotspot-gc-use mailing list
> hotspot-gc-use at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
> 

