Java 8 + Docker container - CMS collector leaves around instances that have no GC roots
jwhiting at redhat.com
Mon Nov 26 10:26:41 UTC 2018
Hi Jaikiran
Have a look at these blog posts by some old friends :) They might be
helpful (along with the other replies you received) in diagnosing the
root cause of the issue, in particular native memory tracking.
https://developers.redhat.com/blog/2017/03/14/java-inside-docker/
https://developers.redhat.com/blog/2017/04/04/openjdk-and-containers/
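
If you haven't enabled it already, native memory tracking has to be
switched on at JVM startup and can then be queried with jcmd. Roughly
(summary level shown; <pid> and the rest of the flags are placeholders):

java -XX:NativeMemoryTracking=summary <your usual flags> ...
jcmd <pid> VM.native_memory summary        # native allocations by category
jcmd <pid> VM.native_memory baseline       # record a baseline
jcmd <pid> VM.native_memory summary.diff   # growth since the baseline

Comparing a couple of diffs taken hours apart should show which native
category is actually growing.
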
Regards,
Jeremy
On Fri, 2018-11-23 at 19:25 +0530, Jaikiran Pai wrote:
> Hi,
>
> I'm looking for some inputs in debugging a high memory usage issue
> (and subsequently the process being killed) in one of the applications
> I deal with. From what I have looked into so far, this appears to be
> something to do with the CMS collector, so I hope this is the right
> place to ask this question.
>
> A bit of a background - The application that I'm dealing with is
> ElasticSearch server version 1.7.5. We use Java 8:
>
> java version "1.8.0_172"
> Java(TM) SE Runtime Environment (build 1.8.0_172-b11)
> Java HotSpot(TM) 64-Bit Server VM (build 25.172-b11, mixed mode)
>
> To add to the complexity in debugging this issue, this runs as a
> Docker container on Docker version 18.03.0-ce, on a CentOS 7 host VM
> with kernel version 3.10.0-693.5.2.el7.x86_64.
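>
> (For reference, the memory limit the container runs with can be read
> from inside it via the cgroup v1 files, something along these lines,
> purely illustrative:)
>
> cat /sys/fs/cgroup/memory/memory.limit_in_bytes   # limit the kernel enforces for this cgroup
> cat /sys/fs/cgroup/memory/memory.usage_in_bytes   # current usage charged to the cgroup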
>
> We have been noticing that this container/process keeps getting killed
> by the oom-killer every few days. The dmesg logs suggest that the
> process has hit the "limits" set at the Docker cgroup level. After
> debugging this over the past day or so, I've reached a point where I
> can't make much sense of the data I'm looking at. The JVM process is
> started using the following params (of relevance):
>
> java -Xms2G -Xmx6G -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
> -XX:CMSInitiatingOccupancyFraction=75
> -XX:+UseCMSInitiatingOccupancyOnly
> -XX:+HeapDumpOnOutOfMemoryError -XX:+DisableExplicitGC ....
>
> As you can see, it uses the CMS collector, with a GC cycle initiated
> once the tenured/old gen reaches 75% occupancy.
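>
> (GC logging is also enabled so that collector activity can be checked;
> the exact flags and log path aren't shown above, but they are along
> these lines:)
>
> # illustrative only, the real command line and log path differ
> java ... -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<path-to-gc.log> ...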
>
> After a few hours/days of running, I notice that even though the CMS
> collector does run almost every hour or so, there are a huge number of
> objects _with no GC roots_ that never get collected. These objects
> internally seem to hold on to ByteBuffer(s) which (from what I see) as
> a result never get released, so the non-heap memory keeps building up
> until the process gets killed. To give an example, here's the jmap
> -histo output (only relevant parts):
>
>  num     #instances         #bytes  class name
>    1:        861642      196271400  [B
>    2:        198776       28623744  org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame
>    3:        676722       21655104  org.apache.lucene.store.ByteArrayDataInput
>    4:        202398       19430208  org.apache.lucene.codecs.lucene41.Lucene41PostingsWriter$IntBlockTermState
>    5:        261819       18850968  org.apache.lucene.util.fst.FST$Arc
>    6:        178661       17018376  [C
>    7:         31452       16856024  [I
>    8:        203911        8049352  [J
>    9:         85700        5484800  java.nio.DirectByteBufferR
>   10:        168935        5405920  java.util.concurrent.ConcurrentHashMap$Node
>   11:         89948        5105328  [Ljava.lang.Object;
>   12:        148514        4752448  org.apache.lucene.util.WeakIdentityMap$IdentityWeakReference
>
> ....
>
> Total       5061244      418712248
>
> The above output is without the "live" option. Running jmap -histo:live
> returns something like this (again only relevant parts):
>
>  num     #instances         #bytes  class name
>   13:         31753        1016096  org.apache.lucene.util.WeakIdentityMap$IdentityWeakReference
> ...
>   44:           887         127728  org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame
> ...
>   50:          3054          97728  org.apache.lucene.store.ByteArrayDataInput
> ...
>   59:           888          85248  org.apache.lucene.codecs.lucene41.Lucene41PostingsWriter$IntBlockTermState
>
> Total       1177783      138938920
>
>
> Notice the vast difference between the live and non-live instance
> counts of the same classes. This isn't just in one "snapshot"; I have
> been monitoring this for more than a day and the pattern continues.
> Taking heap dumps and inspecting them in tools like VisualVM also shows
> that these instances have "no GC root", and I have checked the gc log
> files to confirm that the CMS collector does occasionally run. However,
> these objects never seem to get collected.
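>
> (For reference, the two histograms above were captured with commands
> along these lines; <pid> is a placeholder:)
>
> jmap -histo <pid>        # counts all objects on the heap, reachable or not
> jmap -histo:live <pid>   # triggers a full GC first, then counts only live objects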
>
> I realize this data may not be enough to narrow down the issue, but
> what I am looking for is some kind of help/input/hints/suggestions on
> how I should go about figuring out why these instances aren't GCed. Is
> this something that's expected in certain situations?
>
> -Jaikiran
>
--
Jeremy Whiting
Senior Software Engineer, Middleware Performance Team
Red Hat
------------------------------------------------------------
Registered Address: Red Hat UK Ltd, Peninsular House, 30 Monument
Street, London. United Kingdom.
Registered in England and Wales under Company Registration No.
03798903. Directors: Michael Cunningham (US), Michael O'Neill
(Ireland), Eric Shander (US)