the amazing tales of the search for the invisible man! or, where's my gc root

kirk kirk at kodewerk.com
Mon Apr 20 09:43:54 UTC 2009


Ok, new information. You are using CMS. By default, CMS doesn't collect 
perm-space. The class-loader related bits that I saw would be in 
perm-space. IBM heap is structured very differently and it doesn't 
contain a perm-space. Your conformation of no leak in IBM JVM points 
directly to a perm-space leak. Read Frank Kieviet's blogs about perm 
space. 
http://blogs.sun.com/fkieviet/entry/classloader_leaks_the_dreaded_java

I am on the tipping point about CMS not collecting perm-space by default 
and perm-space leaks in general. But I can charge nice sums of money 
fixing them and look like Scotty in the process so I shalln't bite the 
hand that feeds.. ;-) In your case I'll just be looking for more beer!

I was chatting to Tony about the number of int[] arrays in the profile. 
He mentioned that holes in heap maybe represented as int[] with no GC 
root. This can be a bit confusing at first glance.

JiraPluginManager and DefaultPicoContainer.. hummm. I need to take 
another look.
Ok for the Default PicoContainer I checked the first 10 of 89 instances 
and was able to locate a GC root for each of them. The first instance 
has one GC root as a Java frame. Tracing back you get right into apache 
commons classloaders once again. There are a number of weak links buy 
more importantly, there are a number of array slots that also keep a 
grip on these objects. Strangely enough, the first one that I randomly 
selected had no GC root. IMHO, this object should be collectable and 
since it hasn't been there are one of two possibilities. It is either in 
perm space or it is a dead object that wasn't collected prior to the 
heap dump being taken. A heap dump contains *all* objects, both dead and 
alive. Best to run GC twice prior to dumping heap.

8 instances of JiraPluginManager. 6 have no GC roots and should be 
collectable. The first instance is listed as a JavaFrame and the last is 
tied up with apache commons classloading.

See if you can set the CMS perm space sweeping flag, this the 
System.gc() twice and then re-take a heap dump and send it. I'll have 
another look

Regards,
Kirk


Jed Wesley-Smith wrote:
> Hi Kirk,
>
> Having had a chance to look more into your findings I think we are 
> getting somewhere, but I still have some questions.
>
> The java.lang.reflect.Method count definitely looks like a problem, 
> but I it is a by-product of one of the things I was using to eliminate 
> (noisy) references - specifically the weak references that were coming 
> java.lang.reflect.Proxy.loaderToCache and java.lang.reflect.Proxy. 
> Unfortunately removing the caching meant that these proxies get 
> recreated and hence the Method objects get recreated. I removed this 
> code and rerun the tests and can confirm that the number of Method 
> objects is now much smaller (10k).
>
> Of more specific interest to us is the number of objects that have no 
> GC root. I was hoping that the NetBeans analyzer might be able to 
> identify a root that all the other analyzers had failed to find. For 
> instance if I look at the instances of the "JiraPluginManager" I see 
> over a dozen instances (there should normally only be one and never be 
> more than two - while the plugin system is reloading). If I ask the 
> NetBeans profiler (like every other profiler) for the nearest GC root 
> it says that none is found. There should only be about 50MB of 
> reachable heap but there is ~80MB dark heap that will not be GC'd by 
> the Sun JVM. Another interesting case in point is the hundreds of 
> DefaultPicoContainer instances with no GC root. These guys have fairly 
> tightly circular reference chains with the various ComponentAdapter 
> instances, but will not die. Also check out the ASTReference 
> instances. These seem to have some of the tightest
>
> We have definitely established that there is no memory leak under the 
> IBM JDK. The aforementioned Proxy cache clearing was causing problems 
> with heap growth but once we removed that it went away.
>
> Our stab in the dark theory about this is that there is somehow just 
> too much densely circular referenced heap chains living in the old gen 
> that are somehow causing the GC algorithm to choke. The fact that the 
> CMS collector helps to some extent does seem to support this theory. 
> On the converse to this theory is that reducing the application's 
> workload to zero and repeatedly asking it to full GC it will not 
> release any further heap.
>
> We have looked at all finalizers and the most that any of them do is 
> to call JarFile.close(). Following is the list of objects that may be 
> finalized:
>
> 1    com.sun.crypto.provider.PBEKey
> 71    java.io.FileInputStream
> 10    java.io.FileOutputStream
> 7    java.lang.ClassLoader$NativeLibrary
> 14    java.net.SocksSocketImpl
> 22    java.util.Timer$1
> 1    java.util.concurrent.ScheduledThreadPoolExecutor
> 2    java.util.concurrent.ThreadPoolExecutor
> 227    java.util.jar.JarFile
> 288    java.util.zip.Inflater
> 143    org.apache.felix.framework.cache.JarContent
> 2    org.apache.log4j.ConsoleAppender
> 6    org.apache.log4j.RollingFileAppender
> 1    org.hsqldb.Database
> 8    org.hsqldb.jdbc.jdbcConnection
> 1    org.hsqldb.persist.NIOLockFile
> 1    org.hsqldb.scriptio.ScriptWriterText
> 86    sun.net.www.protocol.jar.URLJarFile
>
> None of them contain references back to any of the application.
>
> If anyone can help us understand why the JiraPluginManager and other 
> instances listed above remain under the Sun JVM it would be most 
> appreciated.
>
> cheers,
> jed.
>
> kirk wrote:
>> Hi Jed,
>>
>> I've had a quick look at the heap dump. I'm having a little trouble 
>> understanding what is in there. What I can see is a large number of 
>> java.lang.reflect.Method objects being held. There seems to be two 
>> competing patterns of references holding onto these objects. I've 
>> attached some screenshots rather than use words.
>>
>> The scary thing is that the references include ClassLoader.scl, 
>> JDK12Hooks.systemClassLoader as well as Apache commons logging 
>> LogFactory. With this type of the complex entanglement it would seem 
>> unlikely that these objects would ever be collected. The other 
>> pattern also includes the spiders web of references. It also includes 
>> UberspecImpl and a whole bunch of static collections. IME, static 
>> collections are involved in the vast majority of leaks I've diagnosed.
>>
>> Interestingly enough the a portion of the 2nd largest consumer of 
>> memory is also tangled up in the JDK12Hooks. Random sampling leads me 
>> to AST parse trees and "no reference". Looks like much of this is 
>> tied up with Velocity. In fact the largest consumer of memory at 24% 
>> is char[]. I'm failing to find anything that is not tied up with 
>> Velocity (AST parsing).
>>
>> Needs more investigation. Be interesting to run a test with 
>> generations turned on. NetBeans generations is a true count unlike 
>> that provided by YourKit.
>>
>> Regards,
>> Kirk
>>
>>
>> Jed Wesley-Smith wrote:
>>> Classes as well. We end up getting an OOME although the profilers 
>>> report only a third of the heap is reachable.
>>>
>>> Although I indicated we saw this on the IBM jdk analysis of that 
>>> dump showed a completely different issue that apparently may not be 
>>> a problem (due to reflection optimisation on that jdk) - the dead 
>>> objects appear to have been correctly cleared. We are reproducing 
>>> this to verify.
>>>
>>> Additionally we tried running with -client on the sun jvms as we saw 
>>> a bug that might have caused it reported against server only but 
>>> without success.
>>>
>>> cheers,
>>> jed.
>>>
>>> On 16/04/2009, at 12:51 AM, Tony Printezis 
>>> <Antonios.Printezis at sun.com> wrote:
>>>
>>>> OK, I'll bite.
>>>>
>>>> When you say: "a large section of memory (a plugin framework)" do 
>>>> you mean only objects in the young / old gen, or also classes in 
>>>> the perm gen?
>>>>
>>>> How do you know that said memory is not being reclaimed? Do you 
>>>> eventually get an OOM?
>>>>
>>>> Given that it happens with two different JVMs (I assume you use 
>>>> HotSpot on Linux and Mac, as well as the IBM JDK), it's unlikely to 
>>>> be a GC bug, as both JVMs would need to have the same bug. Not 
>>>> impossible, but unlikely, IMHO.
>>>>
>>>> Tony
>>>>
>>>> Jed Wesley-Smith wrote:
>>>>> all,
>>>>>
>>>>> I am writing to this list in some desperation hoping for some 
>>>>> expert advice. We (the JIRA development team at Atlassian) have 
>>>>> been hunting memory leaks for some weeks and in the process have 
>>>>> tracked down and removed every possible reference to a large 
>>>>> section of memory (a plugin framework) that we could find. 
>>>>> Starting with all strong references and proceeding to remove soft 
>>>>> and weak references - even things like clearing the 
>>>>> java.lang.reflect.Proxy cache - and even Finalizer references 
>>>>> until both YourKit, Eclipse MAT, JProfiler and jhat all report 
>>>>> that the memory in question is dead and should be collectable, but 
>>>>> inexplicably _the JVM still holds on to it_. There are no JNI 
>>>>> Global references either, yet this memory remains uncollectable!
>>>>>
>>>>> This happens for the 1.5 and 1.6 JVMs on Linux and Mac, and the 
>>>>> IBM 1.6 JDK on Linux.
>>>>>
>>>>> So my question is, how on earth do I search for what is 
>>>>> referencing this uncollectable memory? Are there any other tools 
>>>>> that can help find why this memory is not collected? Can I query 
>>>>> the VM directly somehow?
>>>>>
>>>>> I fear this is a JVM GC bug as no known memory analysis tool can 
>>>>> find the heap root (i.e. according to "the rules" there is no heap 
>>>>> root). Are there any known GC memory leaks caused by ClassLoaders 
>>>>> being dropped for instance?
>>>>>
>>>>> The application is creating and disposing of a lot of ClassLoaders 
>>>>> via OSGi (Apache Felix) with Spring OSGi. It creates a lot of 
>>>>> java.lang.reflect.Proxy class instances.
>>>>>
>>>>> We have written this up and added an example heap dump here:
>>>>> http://jira.atlassian.com/browse/JRA-16932
>>>>>
>>>>> Having come to the end of our tethers here, if anyone can help in 
>>>>> any way it would be massively appreciated.
>>>>>
>>>>> cheers,
>>>>> Jed Wesley-Smith
>>>>> JIRA Team @ Atlassian
>
>




More information about the hotspot-gc-dev mailing list