Java 8 + Docker container - CMS collector leaves around instances that have no GC roots

jwhiting at redhat.com
Mon Nov 26 10:26:41 UTC 2018


Hi Jaikiran,
 Have a look at these blog posts by old friends :). Along with the
other replies you received, they may help you diagnose the root cause
of the issue, in particular via native memory tracking (NMT).

https://developers.redhat.com/blog/2017/03/14/java-inside-docker/
https://developers.redhat.com/blog/2017/04/04/openjdk-and-containers/
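
For the native memory tracking route specifically, a typical sequence
looks like the following (assuming a JDK 8 jcmd on the PATH; <pid> is
a placeholder for your Elasticsearch process id):

```shell
# Start the JVM with NMT enabled (summary mode has modest overhead):
java -XX:NativeMemoryTracking=summary -Xms2G -Xmx6G ...

# Snapshot native allocations by category (Java Heap, Thread, GC,
# Internal, etc.) for the running JVM:
jcmd <pid> VM.native_memory summary

# Record a baseline, let the process run while memory grows, then diff:
jcmd <pid> VM.native_memory baseline
jcmd <pid> VM.native_memory summary.diff
```

Comparing the "committed" totals in that output against the cgroup
limit should show whether the growth is on the Java heap or in one of
the native categories.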

Regards,
Jeremy

On Fri, 2018-11-23 at 19:25 +0530, Jaikiran Pai wrote:
> Hi,
> 
> I'm looking for some inputs in debugging a high memory usage issue
> (and subsequently the process being killed) in one of the applications
> I deal with. From what I have looked into so far, this appears to be
> something to do with the CMS collector, so I hope this is the right
> place for this question.
> 
> A bit of background - the application I'm dealing with is the
> Elasticsearch server, version 1.7.5. We use Java 8:
> 
> java version "1.8.0_172"
> Java(TM) SE Runtime Environment (build 1.8.0_172-b11)
> Java HotSpot(TM) 64-Bit Server VM (build 25.172-b11, mixed mode)
> 
> To add to the complexity of debugging this issue, it runs as a Docker
> container on Docker version 18.03.0-ce, on a CentOS 7 host VM, kernel
> version 3.10.0-693.5.2.el7.x86_64.
> 
> We have been noticing that this container/process keeps getting
> killed by the oom-killer every few days. The dmesg logs suggest that
> the process has hit the limits set at the Docker cgroup level. After
> debugging this over the past day or so, I've reached a point where I
> can't make much sense of the data I'm looking at. The JVM process is
> started with the following parameters (of relevance):
> 
> java -Xms2G -Xmx6G -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
> -XX:CMSInitiatingOccupancyFraction=75
> -XX:+UseCMSInitiatingOccupancyOnly
> -XX:+HeapDumpOnOutOfMemoryError -XX:+DisableExplicitGC ....
> 
> As you can see, it uses the CMS collector and initiates a collection
> when the tenured/old generation reaches 75% occupancy.
> 
> After a few hours/days of running, I notice that even though the CMS
> collector runs almost every hour or so, there is a huge number of
> objects _with no GC roots_ that never get collected. These objects
> internally seem to hold on to ByteBuffer(s) which, as a result (from
> what I see), never get released, and the non-heap memory keeps
> building up until the process gets killed. To give an example, here's
> the jmap -histo output (only the relevant parts):
> 
>    1:        861642      196271400  [B
>    2:        198776       28623744  org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame
>    3:        676722       21655104  org.apache.lucene.store.ByteArrayDataInput
>    4:        202398       19430208  org.apache.lucene.codecs.lucene41.Lucene41PostingsWriter$IntBlockTermState
>    5:        261819       18850968  org.apache.lucene.util.fst.FST$Arc
>    6:        178661       17018376  [C
>    7:         31452       16856024  [I
>    8:        203911        8049352  [J
>    9:         85700        5484800  java.nio.DirectByteBufferR
>   10:        168935        5405920  java.util.concurrent.ConcurrentHashMap$Node
>   11:         89948        5105328  [Ljava.lang.Object;
>   12:        148514        4752448  org.apache.lucene.util.WeakIdentityMap$IdentityWeakReference
> 
> ....
> 
> Total       5061244      418712248
> 
> The above output is without the "live" option. Running jmap
> -histo:live returns something like this (again, only the relevant
> parts):
> 
>   13:         31753        1016096  org.apache.lucene.util.WeakIdentityMap$IdentityWeakReference
>   ...
>   44:           887         127728  org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame
>   ...
>   50:          3054          97728  org.apache.lucene.store.ByteArrayDataInput
>   ...
>   59:           888          85248  org.apache.lucene.codecs.lucene41.Lucene41PostingsWriter$IntBlockTermState
> 
>   Total       1177783      138938920
> 
> 
> Notice the vast difference between the live and non-live instance
> counts for the same classes. This isn't just one "snapshot". I have
> been monitoring this for more than a day and the pattern continues.
> Taking heap dumps and inspecting them with tools like VisualVM also
> shows that these instances have "no GC root", and I have checked the
> GC log files to confirm that the CMS collector does occasionally run.
> However, these objects never seem to get collected.
> 
> I realize this data may not be enough to narrow down the issue, but
> what I am looking for is some help/input/hints/suggestions on how to
> figure out why these instances aren't GCed. Is this something that's
> expected in certain situations?
> 
> -Jaikiran
> 
> 
> 
> 
> 
> 
> 
> _______________________________________________
> hotspot-gc-use mailing list
> hotspot-gc-use at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
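
One more note on the histograms you posted: plain jmap -histo counts
unreachable-but-not-yet-collected objects, whereas -histo:live forces
a full GC first, so a large gap between the two mostly measures
floating garbage rather than a leak on the Java heap. The
DirectByteBuffer entries are the ones to watch, since their backing
memory lives outside the heap and is only released once the buffer
object itself is collected. A minimal standalone sketch (not your
application's code) of that behaviour:

```java
import java.nio.ByteBuffer;

public class DirectBufferSketch {
    public static void main(String[] args) {
        // A direct buffer's backing store is allocated in native memory,
        // outside the Java heap, so it shows up in process RSS (and NMT
        // output) but not in heap usage.
        ByteBuffer buf = ByteBuffer.allocateDirect(1024 * 1024); // 1 MiB native
        System.out.println(buf.isDirect());  // true
        System.out.println(buf.capacity());  // 1048576

        // Dropping the last reference makes the Java object garbage, but
        // the 1 MiB of native memory is freed only when a GC actually
        // collects the buffer and runs its Cleaner.
        buf = null;
    }
}
```

Also worth checking: with -XX:+DisableExplicitGC, the System.gc() call
that the NIO direct-memory allocator makes when it runs low on direct
memory becomes a no-op, which is a known way for direct buffers to
pile up between CMS cycles.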

-- 
Jeremy Whiting
Senior Software Engineer, Middleware Performance Team
Red Hat

------------------------------------------------------------
Registered Address: Red Hat UK Ltd, Peninsular House, 30 Monument
Street, London. United Kingdom.
Registered in England and Wales under Company Registration No.
03798903. Directors: Michael Cunningham (US), Michael
O'Neill (Ireland), Eric Shander (US)


