the amazing tales of the search for the invisible man! or, where's my gc root

Mon Apr 20 08:13:06 UTC 2009

Hi Kirk,

Having had a chance to look more into your findings I think we are 
getting somewhere, but I still have some questions.

The java.lang.reflect.Method count definitely looks like a problem, but 
it is a by-product of one of the things I was using to eliminate 
(noisy) references - specifically the weak references that were coming 
from java.lang.reflect.Proxy.loaderToCache and java.lang.reflect.Proxy. 
Unfortunately removing the caching meant that these proxies get 
recreated and hence the Method objects get recreated. I removed this 
code and reran the tests and can confirm that the number of Method 
objects is now much smaller (10k).

Of more specific interest to us is the number of objects that have no GC 
root. I was hoping that the NetBeans analyzer might be able to identify 
a root that all the other analyzers had failed to find. For instance if 
I look at the instances of the "JiraPluginManager" I see over a dozen 
instances (there should normally only be one and never be more than two 
- while the plugin system is reloading). If I ask the NetBeans profiler 
(like every other profiler) for the nearest GC root it says that none is 
found. There should only be about 50MB of reachable heap but there is 
~80MB of dark heap that will not be GC'd by the Sun JVM. Another 
interesting case in point is the hundreds of DefaultPicoContainer 
instances with no GC root. These guys have fairly tightly circular 
reference chains with the various ComponentAdapter instances, but will 
not die. Also check out the ASTReference instances. These seem to have 
some of the tightest

We have definitely established that there is no memory leak under the 
IBM JDK. The aforementioned Proxy cache clearing was causing problems 
with heap growth but once we removed that it went away.

Our stab-in-the-dark theory about this is that there are somehow just 
too many densely circular reference chains living in the old gen, and 
that these are somehow causing the GC algorithm to choke. The fact that 
the CMS collector helps to some extent does seem to support this theory. 
Counting against it, reducing the application's workload to zero and 
repeatedly asking for a full GC does not release any further heap.
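
A minimal Java sketch of the kind of check behind this observation, for 
anyone who wants to compare behaviour across JVMs: it builds a tightly 
circular ring of objects with no GC root, requests full GCs and reports 
whether the ring was reclaimed. The names CycleProbe and Node are made 
up for illustration and are not part of JIRA; memory.gc() is only a 
request to the VM, so a negative result on its own proves little.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.ref.WeakReference;

// Illustrative only: CycleProbe and Node are hypothetical names, not JIRA classes.
final class CycleProbe {

    // A node in a deliberately circular object graph.
    static final class Node {
        Node next;
        final byte[] payload = new byte[1024];
    }

    // Builds a ring of `size` nodes and returns a weak reference to one of them.
    // No strong reference to the ring survives this method.
    static WeakReference<Node> buildUnrootedRing(int size) {
        Node first = new Node();
        Node current = first;
        for (int i = 1; i < size; i++) {
            current.next = new Node();
            current = current.next;
        }
        current.next = first; // close the cycle
        return new WeakReference<>(first);
    }

    // Requests a full GC up to `attempts` times and reports whether the
    // weakly referenced object has been cleared.
    static boolean isCollected(WeakReference<?> ref, int attempts) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        for (int i = 0; i < attempts && ref.get() != null; i++) {
            memory.gc(); // same effect as System.gc(): a request, not a guarantee
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return ref.get() == null;
    }

    // Heap currently in use, as reported by the management API.
    static long usedHeapBytes() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed();
    }

    public static void main(String[] args) {
        WeakReference<Node> ref = buildUnrootedRing(10_000);
        System.out.println("ring collected: " + isCollected(ref, 10));
        System.out.println("used heap after GC: " + usedHeapBytes() + " bytes");
    }
}
```

A tracing collector should normally reclaim such a ring regardless of 
how dense the cycle is, which is why the behaviour described above is 
so puzzling.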

We have looked at all finalizers and the most that any of them does is 
call JarFile.close(). The following is the list of objects that may be 
finalized:

1    com.sun.crypto.provider.PBEKey
71    java.io.FileInputStream
10    java.io.FileOutputStream
7    java.lang.ClassLoader$NativeLibrary
14    java.net.SocksSocketImpl
22    java.util.Timer$1
1    java.util.concurrent.ScheduledThreadPoolExecutor
2    java.util.concurrent.ThreadPoolExecutor
227    java.util.jar.JarFile
288    java.util.zip.Inflater
143    org.apache.felix.framework.cache.JarContent
2    org.apache.log4j.ConsoleAppender
6    org.apache.log4j.RollingFileAppender
1    org.hsqldb.Database
8    org.hsqldb.jdbc.jdbcConnection
1    org.hsqldb.persist.NIOLockFile
1    org.hsqldb.scriptio.ScriptWriterText
86    sun.net.www.protocol.jar.URLJarFile

None of them contains references back into the application.
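
To rule out a backlog in the finalizer queue as the thing keeping this 
memory alive, the pending-finalization count can be polled through the 
standard management API. A minimal sketch follows; FinalizerBacklog is a 
hypothetical helper name, and getObjectPendingFinalizationCount() is 
marked deprecated in recent JDKs although it is still present.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

// Illustrative only: FinalizerBacklog is a hypothetical helper name.
final class FinalizerBacklog {

    // Approximate number of objects whose finalizers have not yet run.
    @SuppressWarnings("deprecation") // deprecated in recent JDKs, still present
    static int pendingFinalization() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        return memory.getObjectPendingFinalizationCount();
    }

    public static void main(String[] args) {
        // Request a collection so unreachable finalizable objects are discovered.
        System.gc();
        System.out.println("pending finalization: " + pendingFinalization());
    }
}
```

A count that stays high across repeated GCs would point at a stalled 
finalizer thread rather than at a missing GC root.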

If anyone can help us understand why the JiraPluginManager and other 
instances listed above remain under the Sun JVM it would be most 
appreciated.

cheers,
jed.

kirk wrote:
> Hi Jed,
>
> I've had a quick look at the heap dump. I'm having a little trouble 
> understanding what is in there. What I can see is a large number of 
> java.lang.reflect.Method objects being held. There seem to be two 
> competing patterns of references holding onto these objects. I've 
> attached some screenshots rather than use words.
>
> The scary thing is that the references include ClassLoader.scl, 
> JDK12Hooks.systemClassLoader as well as Apache commons logging 
> LogFactory. With this type of complex entanglement it would seem 
> unlikely that these objects would ever be collected. The other pattern 
> also includes the spider's web of references. It also includes 
> UberspecImpl and a whole bunch of static collections. IME, static 
> collections are involved in the vast majority of leaks I've diagnosed.
>
> Interestingly enough, a portion of the 2nd largest consumer of 
> memory is also tangled up in the JDK12Hooks. Random sampling leads me 
> to AST parse trees and "no reference". Looks like much of this is tied 
> up with Velocity. In fact the largest consumer of memory at 24% is 
> char[]. I'm failing to find anything that is not tied up with Velocity 
> (AST parsing).
>
> Needs more investigation. Be interesting to run a test with 
> generations turned on. NetBeans generations is a true count unlike 
> that provided by YourKit.
>
> Regards,
> Kirk
>
>
> Jed Wesley-Smith wrote:
>> Classes as well. We end up getting an OOME although the profilers 
>> report only a third of the heap is reachable.
>>
>> Although I indicated we saw this on the IBM jdk, analysis of that dump 
>> showed a completely different issue that apparently may not be a 
>> problem (due to reflection optimisation on that jdk) - the dead 
>> objects appear to have been correctly cleared. We are reproducing 
>> this to verify.
>>
>> Additionally we tried running with -client on the sun jvms, as we saw 
>> a bug report (against server only) that might have explained it, but 
>> without success.
>>
>> cheers,
>> jed.
>>
>> On 16/04/2009, at 12:51 AM, Tony Printezis 
>> <Antonios.Printezis at sun.com> wrote:
>>
>>> OK, I'll bite.
>>>
>>> When you say: "a large section of memory (a plugin framework)" do 
>>> you mean only objects in the young / old gen, or also classes in the 
>>> perm gen?
>>>
>>> How do you know that said memory is not being reclaimed? Do you 
>>> eventually get an OOM?
>>>
>>> Given that it happens with two different JVMs (I assume you use 
>>> HotSpot on Linux and Mac, as well as the IBM JDK), it's unlikely to 
>>> be a GC bug, as both JVMs would need to have the same bug. Not 
>>> impossible, but unlikely, IMHO.
>>>
>>> Tony
>>>
>>> Jed Wesley-Smith wrote:
>>>> all,
>>>>
>>>> I am writing to this list in some desperation hoping for some 
>>>> expert advice. We (the JIRA development team at Atlassian) have 
>>>> been hunting memory leaks for some weeks and in the process have 
>>>> tracked down and removed every possible reference to a large 
>>>> section of memory (a plugin framework) that we could find. Starting 
>>>> with all strong references and proceeding to remove soft and weak 
>>>> references - even things like clearing the java.lang.reflect.Proxy 
>>>> cache - and even Finalizer references until YourKit, Eclipse 
>>>> MAT, JProfiler and jhat all report that the memory in question is 
>>>> dead and should be collectable, but inexplicably _the JVM still 
>>>> holds on to it_. There are no JNI Global references either, yet 
>>>> this memory remains uncollectable!
>>>>
>>>> This happens for the 1.5 and 1.6 JVMs on Linux and Mac, and the IBM 
>>>> 1.6 JDK on Linux.
>>>>
>>>> So my question is, how on earth do I search for what is referencing 
>>>> this uncollectable memory? Are there any other tools that can help 
>>>> find why this memory is not collected? Can I query the VM directly 
>>>> somehow?
>>>>
>>>> I fear this is a JVM GC bug as no known memory analysis tool can 
>>>> find the heap root (i.e. according to "the rules" there is no heap 
>>>> root). Are there any known GC memory leaks caused by ClassLoaders 
>>>> being dropped for instance?
>>>>
>>>> The application is creating and disposing of a lot of ClassLoaders 
>>>> via OSGi (Apache Felix) with Spring OSGi. It creates a lot of 
>>>> java.lang.reflect.Proxy class instances.
>>>>
>>>> We have written this up and added an example heap dump here:
>>>> http://jira.atlassian.com/browse/JRA-16932
>>>>
>>>> Having come to the end of our tethers here, if anyone can help in 
>>>> any way it would be massively appreciated.
>>>>
>>>> cheers,
>>>> Jed Wesley-Smith
>>>> JIRA Team @ Atlassian