the amazing tales of the search for the invisible man! or, where's my gc root
Jed Wesley-Smith
jed at atlassian.com
Mon Apr 20 08:13:06 UTC 2009
Hi Kirk,
Having had a chance to look more into your findings, I think we are
getting somewhere, but I still have some questions.
The java.lang.reflect.Method count definitely looks like a problem,
but it is a by-product of one of the things I was using to eliminate
(noisy) references - specifically the weak references that were coming
from java.lang.reflect.Proxy.loaderToCache and java.lang.reflect.Proxy.
Unfortunately, removing the caching meant that these proxies get
recreated, and hence the Method objects get recreated. I removed this
code, reran the tests, and can confirm that the number of Method
objects is now much smaller (10k).
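For anyone playing along at home, here is a toy that shows why
defeating the Proxy cache multiplies Method objects (a minimal sketch
only - the class name and details are mine, not JIRA code):

    import java.lang.reflect.InvocationHandler;
    import java.lang.reflect.Method;
    import java.lang.reflect.Proxy;

    public class ProxyCacheDemo {
        public static void main(String[] args) {
            InvocationHandler handler = new InvocationHandler() {
                public Object invoke(Object proxy, Method method,
                        Object[] args) {
                    return null;
                }
            };
            ClassLoader loader = ProxyCacheDemo.class.getClassLoader();

            Object p1 = Proxy.newProxyInstance(loader,
                    new Class<?>[] { Runnable.class }, handler);
            Object p2 = Proxy.newProxyInstance(loader,
                    new Class<?>[] { Runnable.class }, handler);

            // Prints "true": Proxy caches the generated class per
            // loader, so its Method objects are created only once.
            // Defeat the cache and every call makes a fresh class -
            // and fresh Method instances.
            System.out.println(p1.getClass() == p2.getClass());
        }
    }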
Of more specific interest to us is the number of objects that have no GC
root. I was hoping that the NetBeans analyzer might be able to identify
a root that all the other analyzers had failed to find. For instance,
if I look at the instances of "JiraPluginManager" I see over a dozen
(there should normally be only one, and never more than two - while
the plugin system is reloading). If I ask the NetBeans profiler (like
every other profiler) for the nearest GC root, it says that none is
found. There should only be about 50MB of reachable heap, but there is
~80MB of dark heap that will not be GC'd by the Sun JVM. Another
interesting case in point is the hundreds of DefaultPicoContainer
instances with no GC root. These guys have fairly tightly circular
reference chains with the various ComponentAdapter instances, but will
not die. Also check out the ASTReference instances. These seem to have
some of the tightest circular reference chains of all.
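(To be clear about what we expect: circular references alone should
never keep objects alive under a tracing collector. A toy sketch -
names are mine - of the behaviour we are not seeing:)

    import java.lang.ref.WeakReference;

    public class CycleDemo {
        static class Node {
            Node next;
        }

        public static void main(String[] args)
                throws InterruptedException {
            Node a = new Node();
            Node b = new Node();
            a.next = b;
            b.next = a; // tight two-object cycle, container <-> adapter style

            WeakReference<Node> probe = new WeakReference<Node>(a);
            a = null;
            b = null; // the cycle is now unreachable from any GC root

            System.gc(); // only a hint, but usually honoured here
            Thread.sleep(100);
            // Should print "null": the collector reclaims the cycle
            // even though the two objects still reference each other.
            System.out.println(probe.get());
        }
    }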
We have definitely established that there is no memory leak under the
IBM JDK. The aforementioned Proxy cache clearing was causing problems
with heap growth, but once we removed that code the growth went away.
Our stab-in-the-dark theory is that there are somehow just too many
densely circular reference chains living in the old gen, and that
these are causing the GC algorithm to choke. The fact that the CMS
collector helps to some extent does seem to support this theory.
Against the theory is the fact that reducing the application's
workload to zero and repeatedly asking it to full GC does not release
any further heap.
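(By "repeatedly asking it to full GC" I mean essentially this kind of
probe - a sketch only, the class name is made up:)

    public class HeapProbe {
        public static void main(String[] args)
                throws InterruptedException {
            Runtime rt = Runtime.getRuntime();
            for (int i = 0; i < 5; i++) {
                System.gc();              // request a full GC (a hint)
                System.runFinalization(); // let pending finalizers run
                Thread.sleep(500);
                long used = rt.totalMemory() - rt.freeMemory();
                System.out.println("used after GC " + i + ": "
                        + (used / (1024 * 1024)) + " MB");
            }
        }
    }

On our instance the used figure settles well above the ~50MB the
profilers say is reachable.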
We have looked at all the finalizers, and the most that any of them
does is call JarFile.close(). Following is the list of objects that
may be finalized:
1 com.sun.crypto.provider.PBEKey
71 java.io.FileInputStream
10 java.io.FileOutputStream
7 java.lang.ClassLoader$NativeLibrary
14 java.net.SocksSocketImpl
22 java.util.Timer$1
1 java.util.concurrent.ScheduledThreadPoolExecutor
2 java.util.concurrent.ThreadPoolExecutor
227 java.util.jar.JarFile
288 java.util.zip.Inflater
143 org.apache.felix.framework.cache.JarContent
2 org.apache.log4j.ConsoleAppender
6 org.apache.log4j.RollingFileAppender
1 org.hsqldb.Database
8 org.hsqldb.jdbc.jdbcConnection
1 org.hsqldb.persist.NIOLockFile
1 org.hsqldb.scriptio.ScriptWriterText
86 sun.net.www.protocol.jar.URLJarFile
None of them contains references back to any part of the application.
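(The heaviest of those finalizers boil down to this shape - an
illustrative sketch, the class name is mine:)

    import java.io.IOException;
    import java.util.jar.JarFile;

    public class JarHandle {
        private final JarFile jar;

        public JarHandle(String path) throws IOException {
            this.jar = new JarFile(path);
        }

        @Override
        protected void finalize() throws Throwable {
            try {
                jar.close(); // the most any of our finalizers do
            } finally {
                super.finalize();
            }
        }
    }

Objects like this need an extra GC cycle (discover, enqueue, run the
finalizer, then reclaim), but that only delays collection - it cannot
pin heap forever.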
If anyone can help us understand why the JiraPluginManager and the
other instances listed above remain uncollected under the Sun JVM, it
would be most appreciated.
cheers,
jed.
kirk wrote:
> Hi Jed,
>
> I've had a quick look at the heap dump. I'm having a little trouble
> understanding what is in there. What I can see is a large number of
> java.lang.reflect.Method objects being held. There seem to be two
> competing patterns of references holding onto these objects. I've
> attached some screenshots rather than use words.
>
> The scary thing is that the references include ClassLoader.scl,
> JDK12Hooks.systemClassLoader as well as Apache commons logging
> LogFactory. With this type of complex entanglement it would seem
> unlikely that these objects would ever be collected. The other pattern
> also includes the spider's web of references. It also includes
> UberspectImpl and a whole bunch of static collections. IME, static
> collections are involved in the vast majority of leaks I've diagnosed.
>
> Interestingly enough, a portion of the 2nd largest consumer of
> memory is also tangled up in the JDK12Hooks. Random sampling leads me
> to AST parse trees and "no reference". It looks like much of this is
> tied up with Velocity. In fact, the largest consumer of memory, at
> 24%, is char[]. I'm failing to find anything that is not tied up with
> Velocity (AST parsing).
>
> Needs more investigation. It would be interesting to run a test with
> generations turned on. The generations count in NetBeans is a true
> count, unlike the one provided by YourKit.
>
> Regards,
> Kirk
>
>
> Jed Wesley-Smith wrote:
>> Classes as well. We end up getting an OOME, although the profilers
>> report that only a third of the heap is reachable.
>>
>> Although I indicated we saw this on the IBM JDK, analysis of that
>> dump showed a completely different issue that apparently may not be
>> a problem (due to reflection optimisation on that JDK) - the dead
>> objects appear to have been correctly cleared. We are reproducing
>> this to verify.
>>
>> Additionally, we tried running with -client on the Sun JVMs (as we
>> saw a bug reported against -server only that might have caused it),
>> but without success.
>>
>> cheers,
>> jed.
>>
>> On 16/04/2009, at 12:51 AM, Tony Printezis
>> <Antonios.Printezis at sun.com> wrote:
>>
>>> OK, I'll bite.
>>>
>>> When you say: "a large section of memory (a plugin framework)" do
>>> you mean only objects in the young / old gen, or also classes in the
>>> perm gen?
>>>
>>> How do you know that said memory is not being reclaimed? Do you
>>> eventually get an OOM?
>>>
>>> Given that it happens with two different JVMs (I assume you use
>>> HotSpot on Linux and Mac, as well as the IBM JDK), it's unlikely to
>>> be a GC bug, as both JVMs would need to have the same bug. Not
>>> impossible, but unlikely, IMHO.
>>>
>>> Tony
>>>
>>> Jed Wesley-Smith wrote:
>>>> all,
>>>>
>>>> I am writing to this list in some desperation hoping for some
>>>> expert advice. We (the JIRA development team at Atlassian) have
>>>> been hunting memory leaks for some weeks and in the process have
>>>> tracked down and removed every possible reference to a large
>>>> section of memory (a plugin framework) that we could find. Starting
>>>> with all strong references and proceeding to remove soft and weak
>>>> references - even things like clearing the java.lang.reflect.Proxy
>>>> cache - and even Finalizer references, until YourKit, Eclipse
>>>> MAT, JProfiler and jhat all report that the memory in question is
>>>> dead and should be collectable, but inexplicably _the JVM still
>>>> holds on to it_. There are no JNI Global references either, yet
>>>> this memory remains uncollectable!
>>>>
>>>> This happens for the 1.5 and 1.6 JVMs on Linux and Mac, and the IBM
>>>> 1.6 JDK on Linux.
>>>>
>>>> So my question is, how on earth do I search for what is referencing
>>>> this uncollectable memory? Are there any other tools that can help
>>>> find why this memory is not collected? Can I query the VM directly
>>>> somehow?
>>>>
>>>> I fear this is a JVM GC bug as no known memory analysis tool can
>>>> find the heap root (i.e. according to "the rules" there is no heap
>>>> root). Are there any known GC memory leaks caused by ClassLoaders
>>>> being dropped, for instance?
>>>>
>>>> The application is creating and disposing of a lot of ClassLoaders
>>>> via OSGi (Apache Felix) with Spring OSGi. It creates a lot of
>>>> java.lang.reflect.Proxy class instances.
>>>>
>>>> We have written this up and added an example heap dump here:
>>>> http://jira.atlassian.com/browse/JRA-16932
>>>>
>>>> Having come to the end of our tethers here, if anyone can help in
>>>> any way it would be massively appreciated.
>>>>
>>>> cheers,
>>>> Jed Wesley-Smith
>>>> JIRA Team @ Atlassian