G1 garbage collection Ext Root Scanning time increases linearly as application runs
John Cuthbertson
john.cuthbertson at oracle.com
Wed Feb 20 10:56:07 PST 2013
Hi Ramki,
This is what I was thinking. An internal group has also seen the same
problem and has offered to run with an instrumented build. If Ashley is
willing I could supply a temporary patch.
JohnC
On 2/19/2013 10:24 PM, Srinivas Ramakrishna wrote:
> Perhaps Ashley could build an instrumented JVM with time traces around
> the various external root groups that are scanned serially, and the answer
> would be immediate?
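>
> For concreteness, a minimal sketch of the kind of timing wrapper meant here
> (illustration only; the class below and the scan_*() names in the usage
> comment are hypothetical, not actual HotSpot code):
>
>   // Hypothetical scoped timer: prints the elapsed wall-clock time for one
>   // serially-scanned root group when the scope ends.
>   #include <chrono>
>   #include <cstdio>
>
>   class RootGroupTimer {
>     const char* _name;
>     std::chrono::steady_clock::time_point _start;
>   public:
>     explicit RootGroupTimer(const char* name)
>       : _name(name), _start(std::chrono::steady_clock::now()) {}
>     ~RootGroupTimer() {
>       double ms = std::chrono::duration<double, std::milli>(
>                       std::chrono::steady_clock::now() - _start).count();
>       std::printf("  [%s: %.2f ms]\n", _name, ms);
>     }
>   };
>
>   // Usage sketch, one scope per serial root group:
>   //   { RootGroupTimer t("SystemDictionary roots"); scan_system_dictionary(); }
>   //   { RootGroupTimer t("CodeCache roots");        scan_code_cache();        }
>   //   { RootGroupTimer t("StringTable roots");      scan_string_table();      }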
>
> ysr1729
>
> On Feb 19, 2013, at 17:38, John Cuthbertson
> <john.cuthbertson at oracle.com> wrote:
>
>> Hi Ashley,
>>
>> Off the top of my head there's also the interned string table. I'll
>> have to look at the code to figure out what else it could be.
>>
>> Thanks for the info.
>>
>> JohnC
>>
>> On 2/19/2013 5:11 PM, Ashley Taylor wrote:
>>>
>>> Hi John
>>>
>>> I reran my application with the JIT log turned on. It seems that
>>> once the application has been running for a while there is very
>>> little activity within the JIT log, but the pause times keep
>>> climbing. I ran it for 4 hours and the ‘Ext Root Scan’ time had climbed
>>> to 40ms.
>>>
>>> At the 4 hour point I also performed a full GC to see how many
>>> classes would be unloaded, and it was only 50. We have around 5500
>>> loaded classes.
>>>
>>> The number of loaded classes also does not increase once the
>>> application has run for a while.
>>>
>>> I also used jstat to see how full the permanent generation is. It is
>>> slowly climbing, and the full GC did not seem to reduce it at all;
>>> however, the full GC did fix the pause time.
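>>>
>>> (For reference, the invocation used was along the lines of the following,
>>> with the PID as a placeholder:
>>>
>>>   jstat -gcutil <pid> 10s
>>>
>>> which prints utilization percentages every 10 seconds; on JDK 7 the P
>>> column is the permanent generation occupancy.)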
>>>
>>> The permanent region is currently at 89.17% and seems to increase by
>>> 0.01% every couple of minutes.
>>>
>>> Are there any other GC events that only happen at a full GC?
>>>
>>> Cheers,
>>>
>>> Ashley
>>>
>>> From: hotspot-gc-use-bounces at openjdk.java.net
>>> [mailto:hotspot-gc-use-bounces at openjdk.java.net] On Behalf Of John
>>> Cuthbertson
>>> Sent: Wednesday, 20 February 2013 9:10 a.m.
>>> To: hotspot-gc-use at openjdk.java.net
>>> Subject: Re: G1 garbage collection Ext Root Scanning time increases
>>> linearly as application runs
>>>
>>> Hi Ashley,
>>>
>>> Basically, as you surmise, one of the GC worker threads is being held up
>>> when processing a single root. I've seen a similar issue that's
>>> caused by filling up the code cache (where JIT compiled methods are
>>> held). The code cache is treated as a single root and so is claimed
>>> in its entirety by a single GC worker thread. As the code cache
>>> fills up, the thread that claims the code cache to scan starts
>>> getting held up.
>>>
>>> A full GC clears the issue because that's where G1 currently does
>>> class unloading: the full GC unloads a whole bunch of classes,
>>> allowing the compiled code of any of the unloaded classes'
>>> methods to be freed by the nmethod sweeper. So after a full GC the
>>> number of compiled methods in the code cache is smaller.
>>>
>>> It could also be just the sheer number of loaded classes, as the
>>> system dictionary is also treated as a single claimable root.
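>>>
>>> To illustrate the claiming scheme described above (illustration only, not
>>> the actual G1 code; the group and function names are made up):
>>>
>>>   // Each GC worker atomically claims whole root groups. Whichever worker
>>>   // claims an oversized group (code cache, system dictionary) scans all
>>>   // of it alone, so one worker's Ext Root Scanning time keeps growing.
>>>   #include <atomic>
>>>   #include <cstddef>
>>>
>>>   struct RootGroup { const char* name; void (*scan)(); };
>>>
>>>   static void scan_code_cache()        { /* walk every compiled method */ }
>>>   static void scan_system_dictionary() { /* walk every loaded class    */ }
>>>   static void scan_string_table()      { /* walk interned strings      */ }
>>>
>>>   static RootGroup root_groups[] = {
>>>     {"CodeCache",        scan_code_cache},
>>>     {"SystemDictionary", scan_system_dictionary},
>>>     {"StringTable",      scan_string_table},
>>>   };
>>>   static std::atomic<std::size_t> next_group{0};
>>>
>>>   // Run concurrently by every GC worker thread during Ext Root Scanning.
>>>   void scan_strong_roots() {
>>>     const std::size_t n = sizeof(root_groups) / sizeof(root_groups[0]);
>>>     for (;;) {
>>>       std::size_t i = next_group.fetch_add(1); // claim the next unclaimed group
>>>       if (i >= n) break;                       // nothing left to claim
>>>       root_groups[i].scan();                   // scanned entirely by this worker
>>>     }
>>>   }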
>>>
>>> I think there are a couple of existing CRs tracking this. I'll see if I
>>> can find the numbers.
>>>
>>> Regards,
>>>
>>> JohnC
>>>
>>> On 2/19/2013 11:24 AM, Ashley Taylor wrote:
>>>
>>> Hi Matt
>>>
>>> Seems that the issue I’m experiencing is unrelated to JNI; I see the
>>> same issue with the JNI calls mocked.
>>>
>>> Reading that post I noticed that your GC pauses were still
>>> increasing after a full GC. In our case a full GC will fix the
>>> issue.
>>>
>>> Will have to keep hunting for the cause in my application.
>>>
>>>
>>> Cheers,
>>>
>>> Ashley
>>>
>>> From: Matt Fowles [mailto:matt.fowles at gmail.com]
>>> Sent: Tuesday, 19 February 2013 3:49 p.m.
>>> To: Ashley Taylor
>>> Cc: hotspot-gc-use at openjdk.java.net
>>> Subject: Re: G1 garbage collection Ext Root Scanning time
>>> increases linearly as application runs
>>>
>>> Ashley~
>>>
>>> The issue I was seeing was actually in CMS, not G1, but it was
>>> eventually tracked down to leaking LocalReferences in the JNI.
>>> Each LocalRef (or likely GlobalRef) adds 4 bytes to a section
>>> that has to be scanned every GC. If these build up without
>>> bound, you end up with growing GC times.
>>>
>>> The issue that I found essentially boiled down to GetMethodID
>>> calls creating LocalRefs that were never freed.
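>>>
>>> To make the pattern concrete, here is a minimal sketch of that kind of
>>> leak and its fix (illustration only; the function and class names are
>>> made up and are not the code from my application):
>>>
>>>   #include <jni.h>
>>>
>>>   // Called repeatedly from a long-running native thread that stays
>>>   // attached to the JVM and never returns to Java.
>>>   void poll_status(JNIEnv* env, jobject monitor) {
>>>     // GetObjectClass returns a NEW local reference on every call. In a
>>>     // thread that never returns to Java these accumulate without bound,
>>>     // and the table holding them is scanned as a root on every GC.
>>>     jclass cls = env->GetObjectClass(monitor);
>>>     jmethodID mid = env->GetMethodID(cls, "status", "()I");
>>>     if (mid != nullptr) {
>>>       (void) env->CallIntMethod(monitor, mid);
>>>     }
>>>     // The fix: delete local refs once you are done with them...
>>>     env->DeleteLocalRef(cls);
>>>     // ...or bracket each iteration with PushLocalFrame/PopLocalFrame so
>>>     // every local ref created inside the frame is released in one go:
>>>     //   env->PushLocalFrame(16);
>>>     //   ... JNI calls ...
>>>     //   env->PopLocalFrame(nullptr);
>>>   }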
>>>
>>> You can find the full painful search here:
>>>
>>> http://web.archiveorange.com/archive/v/Dp7Rf33tij5BFBNRpVja
>>>
>>> My minimal reproduction is
>>>
>>> http://web.archiveorange.com/archive/v/Dp7Rf33tij5BFBNRpVja#YnJRjM4IVyt54TV
>>>
>>> I sincerely hope my painful experience can save you time ;-)
>>>
>>> Matt
>>>
>>> On Mon, Feb 18, 2013 at 9:29 PM, Ashley Taylor
>>> <ashley.taylor at sli-systems.com> wrote:
>>>
>>> Hi Matt
>>>
>>> Thanks for the quick response.
>>>
>>> Yes, we do have JNI in this setup. I will disable the JNI link
>>> and rerun the test.
>>>
>>> If it is JNI, can you elaborate on what you mean by a leaked handle in
>>> a JNI thread, and how we would go about identifying and fixing that?
>>>
>>> Cheers,
>>>
>>> Ashley
>>>
>>> From: Matt Fowles [mailto:matt.fowles at gmail.com]
>>> Sent: Tuesday, 19 February 2013 3:04 p.m.
>>> To: Ashley Taylor
>>> Cc: hotspot-gc-use at openjdk.java.net
>>> Subject: Re: G1 garbage collection Ext Root Scanning time
>>> increases linearly as application runs
>>>
>>> Ashley~
>>>
>>> Do you have any JNI in the setup? I saw a similar issue that
>>> was painstakingly tracked down to a leaked handle in a JNI thread.
>>>
>>> Matt
>>>
>>> On Mon, Feb 18, 2013 at 8:12 PM, Ashley Taylor
>>> <ashley.taylor at sli-systems.com> wrote:
>>>
>>> Hi,
>>>
>>> We are testing the performance of the G1 garbage collection.
>>>
>>> Our goal is to be able to remove the full GC pause that
>>> eventually happens when we use CMS.
>>>
>>> We have noticed that the garbage collection pause time starts
>>> off really well but over time it keeps climbing.
>>>
>>> Looking at the logs, we see that the section that is increasing
>>> linearly with time is Ext Root Scanning.
>>>
>>> Here is the Ext Root Scanning 1 hour into the application, where the
>>> total GC pause is around 80ms:
>>>
>>> [Ext Root Scanning (ms): 11.5 0.8 1.5 1.8 1.6 4.8 1.2 1.5
>>> 1.2 1.4 1.1 1.6 1.2 1.1 1.1 1.1 1.2 1.2
>>>
>>> Avg: 2.1, Min: 0.8, Max: 11.5, Diff: 10.7]
>>>
>>> Here is a snapshot after 19 hours, where the pause is around 280ms:
>>>
>>> [Ext Root Scanning (ms): 1.2 184.7 1.3 1.3 1.8 6.3
>>> 1.7 1.2 1.5 1.2 1.2 1.1 1.2 1.1 1.2 1.1 1.2 1.2
>>>
>>> Avg: 11.8, Min: 1.1, Max: 184.7, Diff: 183.6]
>>>
>>> It seems that some task is increasing linearly with time, which
>>> only affects one thread.
>>>
>>> After manually triggering a full GC, the total pause time returns
>>> to around 80ms.
>>>
>>> After full GC
>>>
>>> [Ext Root Scanning (ms): 2.4 1.7 4.5 2.6 4.6 2.1 2.1 1.7
>>> 2.1 1.8 1.8 2.2 0.6 0.0 0.0 0.0 0.0 0.0
>>>
>>> Avg: 1.7, Min: 0.0, Max: 4.6, Diff: 4.6]
>>>
>>> The test is run with a constant load applied to the application
>>> that should hold the machine at around load 6.
>>>
>>> We have around 3GB of data within the heap which will very
>>> rarely become garbage; the lifetime of these objects would be several
>>> hours to days. The rest will only live for tens of milliseconds.
>>>
>>> The JVM memory usage floats between 4 and 6GB.
>>>
>>> I have checked a thread dump; there are no threads that have very
>>> large stack traces.
>>>
>>> What could cause these increasing pause durations? Is there any
>>> way to get more information about what that thread is actually
>>> doing, or any tuning options?
>>>
>>> Environment
>>>
>>> JVM Arguments
>>>
>>> -Xms8g
>>>
>>> -Xmx8g
>>>
>>> -XX:+UseG1GC
>>>
>>> -XX:InitiatingHeapOccupancyPercent=0 # We found that having this at
>>> zero greatly reduced the frequency of GC pauses over 500ms,
>>> and the overhead is not that noticeable to our application
>>>
>>> -XX:MaxGCPauseMillis=70
>>>
>>> -XX:+UseLargePages
>>>
>>> Java Version
>>>
>>> java version "1.7.0_13"
>>>
>>> Java(TM) SE Runtime Environment (build 1.7.0_13-b20)
>>>
>>> Java HotSpot(TM) 64-Bit Server VM (build 23.7-b01, mixed mode)
>>>
>>> Operating System
>>>
>>> A Red Hat 5.8 machine.
>>>
>>> The machine has 12 cores / 24 threads and 48GB of RAM.
>>>
>>> Cheers,
>>>
>>> *Ashley Taylor*
>>>
>>> Software Engineer
>>>
>>> Email: ashley.taylor at sli-systems.com
>>>
>>> Website: www.sli-systems.com
>>>
>>> Blog: blog.sli-systems.com
>>>
>>> Podcast: EcommercePodcast.com
>>>
>>> Twitter: www.twitter.com/slisystems
>>>