G1 garbage collection Ext Root Scanning time increases linearly as application runs
John Cuthbertson
john.cuthbertson at oracle.com
Wed Feb 20 10:56:07 PST 2013
Hi Ramki,
This is what I was thinking. An internal group has also seen the same
problem and has offered to run with an instrumented build. If Ashley is
willing I could supply a temporary patch.
JohnC
On 2/19/2013 10:24 PM, Srinivas Ramakrishna wrote:
> Perhaps Ashley could build an instrumented JVM with time traces around
> the various external root groups that are scanned serially, and the answer
> would be immediate?
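>
> For concreteness, a minimal sketch of the kind of timing wrapper meant here
> (illustration only; the class below and the scan_*() names in the usage
> comment are hypothetical, not actual HotSpot code):
>
>   // Hypothetical scoped timer: prints the elapsed wall-clock time for one
>   // serially-scanned root group when the scope ends.
>   #include <chrono>
>   #include <cstdio>
>
>   class RootGroupTimer {
>     const char* _name;
>     std::chrono::steady_clock::time_point _start;
>   public:
>     explicit RootGroupTimer(const char* name)
>       : _name(name), _start(std::chrono::steady_clock::now()) {}
>     ~RootGroupTimer() {
>       double ms = std::chrono::duration<double, std::milli>(
>                       std::chrono::steady_clock::now() - _start).count();
>       std::printf("  [%s: %.2f ms]\n", _name, ms);
>     }
>   };
>
>   // Usage sketch, one scope per serial root group:
>   //   { RootGroupTimer t("SystemDictionary roots"); scan_system_dictionary(); }
>   //   { RootGroupTimer t("CodeCache roots");        scan_code_cache();        }
>   //   { RootGroupTimer t("StringTable roots");      scan_string_table();      }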
>
> ysr1729
>
> On Feb 19, 2013, at 17:38, John Cuthbertson
> <john.cuthbertson at oracle.com> wrote:
>
>> Hi Ashley,
>>
>> Off the top of my head there's also the interned string table. I'll
>> have to look at the code to figure out what else it could be.
>>
>> Thanks for the info.
>>
>> JohnC
>>
>> On 2/19/2013 5:11 PM, Ashley Taylor wrote:
>>>
>>> Hi John
>>>
>>> I reran my application with the JIT log turned on. It seems that
>>> once the application has been running for a while there is very
>>> little activity within the JIT log, but the pause times keep
>>> climbing. I ran it for 4 hours and the ‘Ext Root Scan’ time had climbed
>>> to 40ms.
>>>
>>> At the 4 hour point I also performed a full GC to see how many
>>> classes would be unloaded, and it was only 50. We have around 5500
>>> loaded classes.
>>>
>>> The number of loaded classes also does not increase once the
>>> application has run for a while.
>>>
>>> I also used jstat to see how full the permanent generation is. It is
>>> slowly climbing, and the full GC did not seem to reduce it at all;
>>> however, the full GC did fix the pause time.
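>>>
>>> (For reference, the invocation used was along the lines of the following,
>>> with the PID as a placeholder:
>>>
>>>   jstat -gcutil <pid> 10s
>>>
>>> which prints utilization percentages every 10 seconds; on JDK 7 the P
>>> column is the permanent generation occupancy.)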
>>>
>>> The permanent region is currently at 89.17% and seems to increase by
>>> 0.01% every couple of minutes.
>>>
>>> Are there any other GC events that only happen at a full GC?
>>>
>>> Cheers,
>>>
>>> Ashley
>>>
>>> From: hotspot-gc-use-bounces at openjdk.java.net
>>> [mailto:hotspot-gc-use-bounces at openjdk.java.net] On Behalf Of John
>>> Cuthbertson
>>> Sent: Wednesday, 20 February 2013 9:10 a.m.
>>> To: hotspot-gc-use at openjdk.java.net
>>> Subject: Re: G1 garbage collection Ext Root Scanning time increases
>>> linearly as application runs
>>>
>>> Hi Ashley,
>>>
>>> Basically, as you surmise, one of the GC worker threads is being held up
>>> when processing a single root. I've seen a similar issue that's
>>> caused by filling up the code cache (where JIT compiled methods are
>>> held). The code cache is treated as a single root and so is claimed
>>> in its entirety by a single GC worker thread. As the code cache
>>> fills up, the thread that claims the code cache to scan starts
>>> getting held up.
>>>
>>> A full GC clears the issue because that's where G1 currently does
>>> class unloading: the full GC unloads a whole bunch of classes,
>>> allowing the compiled code of any of the unloaded classes'
>>> methods to be freed by the nmethod sweeper. So after a full GC the
>>> number of compiled methods in the code cache is smaller.
>>>
>>> It could also be just the sheer number of loaded classes, as the
>>> system dictionary is also treated as a single claimable root.
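>>>
>>> To illustrate the claiming scheme described above (illustration only, not
>>> the actual G1 code; the group and function names are made up):
>>>
>>>   // Each GC worker atomically claims whole root groups. Whichever worker
>>>   // claims an oversized group (code cache, system dictionary) scans all
>>>   // of it alone, so one worker's Ext Root Scanning time keeps growing.
>>>   #include <atomic>
>>>   #include <cstddef>
>>>
>>>   struct RootGroup { const char* name; void (*scan)(); };
>>>
>>>   static void scan_code_cache()        { /* walk every compiled method */ }
>>>   static void scan_system_dictionary() { /* walk every loaded class    */ }
>>>   static void scan_string_table()      { /* walk interned strings      */ }
>>>
>>>   static RootGroup root_groups[] = {
>>>     {"CodeCache",        scan_code_cache},
>>>     {"SystemDictionary", scan_system_dictionary},
>>>     {"StringTable",      scan_string_table},
>>>   };
>>>   static std::atomic<std::size_t> next_group{0};
>>>
>>>   // Run concurrently by every GC worker thread during Ext Root Scanning.
>>>   void scan_strong_roots() {
>>>     const std::size_t n = sizeof(root_groups) / sizeof(root_groups[0]);
>>>     for (;;) {
>>>       std::size_t i = next_group.fetch_add(1); // claim the next unclaimed group
>>>       if (i >= n) break;                       // nothing left to claim
>>>       root_groups[i].scan();                   // scanned entirely by this worker
>>>     }
>>>   }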
>>>
>>> I think there are a couple of existing CRs tracking this. I'll see if I
>>> can find the numbers.
>>>
>>> Regards,
>>>
>>> JohnC
>>>
>>> On 2/19/2013 11:24 AM, Ashley Taylor wrote:
>>>
>>> Hi Matt
>>>
>>> Seems that the issue I’m experiencing is unrelated to JNI; I see the
>>> same issue with the JNI calls mocked.
>>>
>>> Reading that post I noticed that your GC pauses were still
>>> increasing after a full GC. In our case a full GC will fix the
>>> issue.
>>>
>>> Will have to keep hunting for the cause in my application.
>>>
>>>
>>> Cheers,
>>>
>>> Ashley
>>>
>>> From: Matt Fowles [mailto:matt.fowles at gmail.com]
>>> Sent: Tuesday, 19 February 2013 3:49 p.m.
>>> To: Ashley Taylor
>>> Cc: hotspot-gc-use at openjdk.java.net
>>> Subject: Re: G1 garbage collection Ext Root Scanning time
>>> increases linearly as application runs
>>>
>>> Ashley~
>>>
>>> The issue I was seeing was actually in CMS, not G1, but it was
>>> eventually tracked down to leaking LocalReferences in the JNI.
>>> Each LocalRef (or likely GlobalRef) adds 4 bytes to a section
>>> that has to be scanned every GC. If these build up without
>>> bound, you end up with growing GC times.
>>>
>>> The issue that I found essentially boiled down to GetMethodID
>>> calls creating LocalRefs that were never freed.
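>>>
>>> To make the pattern concrete, here is a minimal sketch of that kind of
>>> leak and its fix (illustration only; the function and class names are
>>> made up and are not the code from my application):
>>>
>>>   #include <jni.h>
>>>
>>>   // Called repeatedly from a long-running native thread that stays
>>>   // attached to the JVM and never returns to Java.
>>>   void poll_status(JNIEnv* env, jobject monitor) {
>>>     // GetObjectClass returns a NEW local reference on every call. In a
>>>     // thread that never returns to Java these accumulate without bound,
>>>     // and the table holding them is scanned as a root on every GC.
>>>     jclass cls = env->GetObjectClass(monitor);
>>>     jmethodID mid = env->GetMethodID(cls, "status", "()I");
>>>     if (mid != nullptr) {
>>>       (void) env->CallIntMethod(monitor, mid);
>>>     }
>>>     // The fix: delete local refs once you are done with them...
>>>     env->DeleteLocalRef(cls);
>>>     // ...or bracket each iteration with PushLocalFrame/PopLocalFrame so
>>>     // every local ref created inside the frame is released in one go:
>>>     //   env->PushLocalFrame(16);
>>>     //   ... JNI calls ...
>>>     //   env->PopLocalFrame(nullptr);
>>>   }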
>>>
>>> You can find the full painful search here:
>>>
>>> http://web.archiveorange.com/archive/v/Dp7Rf33tij5BFBNRpVja
>>>
>>> My minimal reproduction is
>>>
>>> http://web.archiveorange.com/archive/v/Dp7Rf33tij5BFBNRpVja#YnJRjM4IVyt54TV
>>>
>>> I sincerely hope my painful experience can save you time ;-)
>>>
>>> Matt
>>>
>>> On Mon, Feb 18, 2013 at 9:29 PM, Ashley Taylor
>>> <ashley.taylor at sli-systems.com> wrote:
>>>
>>> Hi Matt
>>>
>>> Thanks for the quick response.
>>>
>>> Yes, we do have JNI in this setup. I will disable the JNI link
>>> and rerun the test.
>>>
>>> If it is JNI, can you elaborate on what you mean by a leaked handle in
>>> a JNI thread, and how we would go about identifying and fixing that?
>>>
>>> Cheers,
>>>
>>> Ashley
>>>
>>> From: Matt Fowles [mailto:matt.fowles at gmail.com]
>>> Sent: Tuesday, 19 February 2013 3:04 p.m.
>>> To: Ashley Taylor
>>> Cc: hotspot-gc-use at openjdk.java.net
>>> Subject: Re: G1 garbage collection Ext Root Scanning time
>>> increases linearly as application runs
>>>
>>> Ashley~
>>>
>>> Do you have any JNI in the setup? I saw a similar issue that
>>> was painstakingly tracked down to a leaked handle in a JNI thread.
>>>
>>> Matt
>>>
>>> On Mon, Feb 18, 2013 at 8:12 PM, Ashley Taylor
>>> <ashley.taylor at sli-systems.com> wrote:
>>>
>>> Hi,
>>>
>>> We are testing the performance of the G1 garbage collection.
>>>
>>> Our goal is to be able to remove the full GC pause that
>>> eventually happens when we use CMS.
>>>
>>> We have noticed that the garbage collection pause time starts
>>> off really well but over time it keeps climbing.
>>>
>>> Looking at the logs, we see that the section that is increasing
>>> linearly with time is Ext Root Scanning.
>>>
>>> Here is the Ext Root Scanning 1 hour into the application, where the
>>> total GC pause is around 80ms:
>>>
>>> [Ext Root Scanning (ms): 11.5 0.8 1.5 1.8 1.6 4.8 1.2 1.5
>>> 1.2 1.4 1.1 1.6 1.2 1.1 1.1 1.1 1.2 1.2
>>>
>>> Avg: 2.1, Min: 0.8, Max: 11.5, Diff: 10.7]
>>>
>>> Here is a snapshot after 19 hours, where the pause is around 280ms:
>>>
>>> [Ext Root Scanning (ms): 1.2 184.7 1.3 1.3 1.8 6.3
>>> 1.7 1.2 1.5 1.2 1.2 1.1 1.2 1.1 1.2 1.1 1.2 1.2
>>>
>>> Avg: 11.8, Min: 1.1, Max: 184.7, Diff: 183.6]
>>>
>>> It seems that some task is increasing linearly with time, which
>>> only affects one thread.
>>>
>>> After manually triggering a full GC, the total pause time returns
>>> to around 80ms.
>>>
>>> After full GC
>>>
>>> [Ext Root Scanning (ms): 2.4 1.7 4.5 2.6 4.6 2.1 2.1 1.7
>>> 2.1 1.8 1.8 2.2 0.6 0.0 0.0 0.0 0.0 0.0
>>>
>>> Avg: 1.7, Min: 0.0, Max: 4.6, Diff: 4.6]
>>>
>>> The test is run with a constant load applied to the application
>>> that should hold the machine at around load 6.
>>>
>>> We have around 3GB of data within the heap which will very
>>> rarely become garbage; the lifetime of these objects would be several
>>> hours to days. The rest will only live for tens of milliseconds.
>>>
>>> The JVM memory usage floats between 4 and 6GB.
>>>
>>> I have checked a thread dump; there are no threads that have very
>>> large stack traces.
>>>
>>> What could cause these increasing pause durations? Is there any
>>> way to get more information about what that thread is actually
>>> doing, or any tuning options?
>>>
>>> Environment
>>>
>>> JVM Arguments
>>>
>>> -Xms8g
>>>
>>> -Xmx8g
>>>
>>> -XX:+UseG1GC
>>>
>>> -XX:InitiatingHeapOccupancyPercent=0 # We found that having this at
>>> zero greatly reduced the frequency of GC pauses over 500ms,
>>> and the overhead is not that noticeable to our application
>>>
>>> -XX:MaxGCPauseMillis=70
>>>
>>> -XX:+UseLargePages
>>>
>>> Java Version
>>>
>>> java version "1.7.0_13"
>>>
>>> Java(TM) SE Runtime Environment (build 1.7.0_13-b20)
>>>
>>> Java HotSpot(TM) 64-Bit Server VM (build 23.7-b01, mixed mode)
>>>
>>> Operating System
>>>
>>> A Red Hat 5.8 machine.
>>>
>>> The machine has 12 cores / 24 threads and 48GB of RAM.
>>>
>>> Cheers,
>>>
>>> *Ashley Taylor*
>>>
>>> Software Engineer
>>>
>>> Email: ashley.taylor at sli-systems.com
>>>
>>> Website: www.sli-systems.com
>>>
>>> Blog: blog.sli-systems.com
>>>
>>> Podcast: EcommercePodcast.com
>>>
>>> Twitter: www.twitter.com/slisystems
>>>