Troubleshoot memory leak without taking heap dump of Production application

Thu Nov 10 18:15:57 UTC 2016

Hello Amit,

The fact that the Full GCs are not able to reclaim space indicates that 
some strong root is holding on to the growing objects in the Java heap.

Issue time: heap usage around 28G

 num     #instances         #bytes  class name
----------------------------------------------
   1:     118037170     6600874168  [C
   2:     103071116     5771982496  java.util.HashMap$Entry
   3:     101560457     5687385592  com.redknee.product.s5600.ipc.xgen.AcctSessionInfo
   4:     118042761     4721710440  java.lang.String
   5:       9942863     3020272632  [Ljava.lang.Object;
   6:       7537560     2737186632  [Ljava.util.HashMap$Entry;
   7:       1453865      639700600  com.redknee.product.s5600.ipc.xgen.PdpContextID
   8:       7537148      542674656  java.util.HashMap

I would focus my attention on 
'com.redknee.product.s5600.ipc.xgen.AcctSessionInfo' instances and try 
to determine what is holding them and preventing them from getting 
collected by the Full GCs.
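
One lightweight way to narrow down which classes to watch is to diff two 
class histograms taken at non-issue and issue time. The sketch below is 
only an illustration, not a tool mentioned in this thread: the class name 
HistoDiff and its methods are hypothetical, it assumes the four-column 
"num / #instances / #bytes / class name" layout of the histograms in this 
thread (newer JDKs may append a module column), and it treats a class that 
is missing from the earlier snapshot as 0 bytes, which overstates growth 
for classes that merely fell outside the listed rows.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch: rank classes by byte growth between two histogram snapshots. */
class HistoDiff {

    /** Parses rows such as "   1:  118037170  6600874168  [C" into class name -> bytes. */
    static Map<String, Long> parse(String histo) {
        Map<String, Long> bytesByClass = new HashMap<>();
        for (String line : histo.split("\n")) {
            String[] f = line.trim().split("\\s+");
            // Data rows have exactly four fields and a rank ending in ':';
            // the header and the dashed rule do not match and are skipped.
            if (f.length == 4 && f[0].endsWith(":")) {
                bytesByClass.put(f[3], Long.parseLong(f[2]));
            }
        }
        return bytesByClass;
    }

    /** Returns (class, after - before) pairs, largest byte growth first. */
    static List<Map.Entry<String, Long>> growth(Map<String, Long> before,
                                                Map<String, Long> after) {
        List<Map.Entry<String, Long>> deltas = new ArrayList<>();
        for (Map.Entry<String, Long> e : after.entrySet()) {
            // Assumption: a class absent from the earlier snapshot counts as 0 bytes.
            long prev = before.getOrDefault(e.getKey(), 0L);
            deltas.add(new AbstractMap.SimpleEntry<>(e.getKey(), e.getValue() - prev));
        }
        deltas.sort((a, b) -> Long.compare(b.getValue(), a.getValue()));
        return deltas;
    }

    public static void main(String[] args) {
        // A few rows copied from the two histograms in this thread.
        String before = String.join("\n",
            "   3:       2671564     1175488160  com.redknee.product.s5600.ipc.xgen.PdpContextID",
            "   4:      17247420      903565648  [C",
            "   6:      17208464      688338560  java.lang.String",
            "   7:       7843562      439239472  java.util.HashMap$Entry");
        String after = String.join("\n",
            "   1:     118037170     6600874168  [C",
            "   2:     103071116     5771982496  java.util.HashMap$Entry",
            "   3:     101560457     5687385592  com.redknee.product.s5600.ipc.xgen.AcctSessionInfo",
            "   4:     118042761     4721710440  java.lang.String",
            "   7:       1453865      639700600  com.redknee.product.s5600.ipc.xgen.PdpContextID");
        for (Map.Entry<String, Long> e : growth(parse(before), parse(after))) {
            System.out.printf("%+15d  %s%n", e.getValue(), e.getKey());
        }
    }
}
```

Run against full snapshots taken at intervals, a diff like this shows 
which classes grow together, which can hint at the owning data structure 
before any heap dump is taken.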

Heap dumps are the best way to figure that out, if you could collect one 
from your production system when the issue starts occurring. If that is 
not possible, would it be possible to run a JVMTI agent to collect the 
reference path information for these objects? A long time ago, I wrote a 
JVMTI agent that, given a class name, prints the reference path 
information for the instances of that class:
https://blogs.oracle.com/poonam/entry/jvmti_agent_to_print_reference

And if you have access to the code where instances of AcctSessionInfo 
are created and stored in a HashMap, I would suggest looking at the 
source code around that too, to see if anything is obviously going wrong 
with the storage of these instances.
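
As a rough illustration of the kind of storage problem to look for, the 
sketch below shows a long-lived map that keeps session objects strongly 
reachable when the removal path is skipped. Everything in it is 
hypothetical: SessionRegistry, SessionInfo, onSessionStart and 
onSessionStop are made-up names, and the real AcctSessionInfo code is not 
shown in this thread. The nearly equal instance counts of 
java.util.HashMap$Entry and AcctSessionInfo in the issue-time histogram 
are consistent with roughly one map entry per instance, but that is an 
inference, not a confirmed diagnosis.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical registry showing how a long-lived map can pin session objects. */
class SessionRegistry {

    /** Hypothetical stand-in for AcctSessionInfo. */
    static final class SessionInfo {
        final String sessionId;
        SessionInfo(String sessionId) { this.sessionId = sessionId; }
    }

    // A map reachable from a static field is reachable from a GC root, so
    // every value it holds survives every Full GC until it is removed.
    // (Single-threaded sketch; a real registry would need a concurrent map.)
    private static final Map<String, SessionInfo> SESSIONS = new HashMap<>();

    static void onSessionStart(String id) {
        SESSIONS.put(id, new SessionInfo(id));
    }

    // If some code path never reaches this call (for example a timeout, an
    // error, or a lost stop message), the entry stays in the map forever.
    static void onSessionStop(String id) {
        SESSIONS.remove(id);
    }

    static int liveSessions() {
        return SESSIONS.size();
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) {
            onSessionStart("s" + i);
            if (i % 2 == 0) {
                onSessionStop("s" + i); // only half of the sessions are ever stopped
            }
        }
        System.out.println("sessions still held: " + liveSessions());
    }
}
```

When reviewing the real code, the useful question is the same one this 
sketch raises: for every place that puts an instance into the map, is 
there a removal that is guaranteed to run on every exit path?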

Thanks,
Poonam

On 11/9/2016 11:38 PM, Amit Mishra wrote:
>
> Hello Charlie/Poonam/team,
>
> I need your help and suggestions on how to troubleshoot a memory leak 
> without taking a heap dump.
>
> We are facing random promotion failures followed by continuous 
> concurrent mode failures and Full GC events that impact our standalone 
> application for a long time, until it is restarted.
>
> Application GC remains stable for more than a week with a smooth 
> sawtooth pattern, and then something happens within an hour or so that 
> results in severe GC failures and ultimately application failure.
>
> We have checked the traffic pattern, the application logs, and the logs 
> of dependent applications, but nothing indicates why heap usage 
> suddenly started to climb at one point, resulting in the CMS failures. 
> (The traffic pattern is fairly stable, and there are no scheduled or 
> cron jobs at the time of the issue.)
>
> We cannot take a heap dump because this is a standalone application 
> with a large heap (32G).
>
> We collected histograms at issue time and at non-issue time and found 
> that the instances of 2-3 classes suddenly grew from 200-300 MB to 5G+, 
> but we are not sure how to dig into the code to find out what causes 
> the instances of those classes to surge.
>
> Please guide me on how to troubleshoot this issue with a lightweight 
> tool that can pinpoint the methods or calls that lead to this memory 
> leak, as we cannot take a heap dump, which is a very high-impact 
> operation.
>
> One more question: why is Full GC not able to clean the generations 
> even after multiple attempts, leaving a continuous loop of GC failures 
> that was resolved only by an application restart? Does this indicate 
> that no new objects were being created and that it was only the GC 
> algorithm that started failing and increased heap usage?
>
> Many thanks in advance for your kind support and guidance.
>
> This is the GC graph, and the GC file is attached.
>
> [Image: GC graph (image002.jpg)]
>
> Histogram snapshots:
>
> java.util.HashMap$Entry was only 400 MB before the issue and 5.5G 
> during it; the same is true for the AcctSessionInfo and 
> java.lang.String class instances.
>
> Non-issue time:
>
>  num     #instances         #bytes  class name
> ----------------------------------------------
>    1:      13613915     2219936904  [Ljava.lang.Object;
>    2:      10065566     1569906056  [Ljava.util.HashMap$Entry;
>    3:       2671564     1175488160  com.redknee.product.s5600.ipc.xgen.PdpContextID
>    4:      17247420      903565648  [C
>    5:      10055084      723966048  java.util.HashMap
>    6:      17208464      688338560  java.lang.String
>    7:       7843562      439239472  java.util.HashMap$Entry
>    8:      10065566      402622640  java.util.HashMap$FrontCache
>
> Issue time: heap usage around 28G
>
>  num     #instances         #bytes  class name
> ----------------------------------------------
>    1:     118037170     6600874168  [C
>    2:     103071116     5771982496  java.util.HashMap$Entry
>    3:     101560457     5687385592  com.redknee.product.s5600.ipc.xgen.AcctSessionInfo
>    4:     118042761     4721710440  java.lang.String
>    5:       9942863     3020272632  [Ljava.lang.Object;
>    6:       7537560     2737186632  [Ljava.util.HashMap$Entry;
>    7:       1453865      639700600  com.redknee.product.s5600.ipc.xgen.PdpContextID
>    8:       7537148      542674656  java.util.HashMap
>
> Thanks,
>
> Amit Mishra
>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: image/jpeg
Size: 34533 bytes
Desc: not available
URL: <http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20161110/6b6879a4/attachment-0001.jpe>