From john.cuthbertson at oracle.com Fri Feb 1 14:17:50 2013
From: john.cuthbertson at oracle.com (John Cuthbertson)
Date: Fri, 01 Feb 2013 14:17:50 -0800
Subject: java 1.7.0u4 GarbageCollectionNotificationInfo API
In-Reply-To:
References:
Message-ID: <510C3F0E.9030400@oracle.com>

Hi Taras,

I'm going to cc the serviceability alias. I think they might be best suited to answer some of your questions. I believe they own the API and the GC provides the data.

Answer 1: It should be milliseconds, but there was a bug (http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7087969) that is now fixed in hs24 and you could be running into that.

Answer 2: This sounds like a bug. Do you have a test case you can share?

Answer 3: I'll leave that to the serviceability guys.

Regards,

JohnC

On 1/28/2013 1:11 PM, Taras Tielkes wrote:
>
> Hi,
>
> I'm playing around with the new(ish) GarbageCollectionNotificationInfo API. We're using ParNew+CMS in all our systems, and my first goal is a comparison between -XX:+PrintGCDetails -verbose:gc output and the actual data coming through the notification API. I'm using Java 1.7.0u6 for the experiments.
>
> So far, I have a number of questions:
>
> 1) duration times
>
> The javadoc for gcInfo.getDuration() describes the returned value as expressed in milliseconds. However, the values differ from the gc logs by several orders of magnitude. How are they calculated?
>
> On a 1-core Linux x64 VM, the values actually look like microseconds, but on a Win32 machine I still can't figure out any resemblance to gc log timings.
>
> Apart from the unit, what should the value represent? Real time or user time?
>
> 2) CMS events with cause "No GC"
>
> How exactly do the phases of CMS map to the notifications emitted for the CMS collector?
>
> I sometimes get events with cause "No GC". Does this indicate a background CMS cycle being initiated by hitting the occupancy fraction threshold?
>
> 3) Eden/Survivor
>
> It seems that the MemoryUsage API treats Eden and Survivor separately, i.e. survivor is not a subset of eden. This is different from the gc log presentation. Is my understanding correct?
>
> In general, I think it would be useful to have a code sample for the GC notification API that generates output as close as possible to -XX:+PrintGCDetails -verbose:gc, as far as the data required to do so is available.
>
> The API looks quite promising, it seems it could really benefit from a bit of documentation love :)
>
> Thanks,
> -tt
>
> _______________________________________________
> hotspot-gc-use mailing list
> hotspot-gc-use at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20130201/37dca1ab/attachment.html
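Taras asks above for a code sample for the GC notification API. As a reference point only (this is not code from the thread; the class name and output format are made up, and it assumes JDK 7u4 or later where the com.sun.management notification classes are available), a minimal sketch of subscribing to the notifications and printing the duration, cause, and per-pool usage might look like this. getDuration() is the value question 1 is about, and Eden and Survivor show up as separate pools, as noted in question 3.

    import java.lang.management.GarbageCollectorMXBean;
    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryUsage;
    import java.util.Map;

    import javax.management.Notification;
    import javax.management.NotificationEmitter;
    import javax.management.NotificationListener;
    import javax.management.openmbean.CompositeData;

    import com.sun.management.GarbageCollectionNotificationInfo;
    import com.sun.management.GcInfo;

    public class GcNotificationSample {
        public static void main(String[] args) throws Exception {
            NotificationListener listener = new NotificationListener() {
                public void handleNotification(Notification n, Object handback) {
                    if (!GarbageCollectionNotificationInfo.GARBAGE_COLLECTION_NOTIFICATION
                            .equals(n.getType())) {
                        return;
                    }
                    GarbageCollectionNotificationInfo info =
                            GarbageCollectionNotificationInfo.from((CompositeData) n.getUserData());
                    GcInfo gc = info.getGcInfo();
                    // getDuration() is documented as milliseconds; see the discussion of
                    // bug 7087969 above for the ticks/milliseconds issue in older builds.
                    System.out.println(info.getGcName() + " (" + info.getGcCause() + ", "
                            + info.getGcAction() + "): " + gc.getDuration() + " ms");
                    // Eden and Survivor are reported as separate pools here, unlike the
                    // nested presentation in the -XX:+PrintGCDetails log.
                    for (Map.Entry<String, MemoryUsage> e : gc.getMemoryUsageAfterGc().entrySet()) {
                        MemoryUsage before = gc.getMemoryUsageBeforeGc().get(e.getKey());
                        System.out.println("  " + e.getKey() + ": " + before.getUsed()
                                + " -> " + e.getValue().getUsed() + " bytes");
                    }
                }
            };
            // In HotSpot, each GarbageCollectorMXBean is also a NotificationEmitter.
            for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
                ((NotificationEmitter) bean).addNotificationListener(listener, null, null);
            }
            // Allocate some garbage so a few collections (and notifications) happen.
            for (int i = 0; i < 5_000_000; i++) {
                byte[] garbage = new byte[128];
            }
            Thread.sleep(2000); // notifications are delivered asynchronously
        }
    }

Running this alongside -XX:+PrintGCDetails -verbose:gc makes it easy to compare the two outputs, keeping bug 7087969 in mind when comparing durations on pre-hs24 builds.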
From reachbach at yahoo.com Sun Feb 3 23:26:58 2013
From: reachbach at yahoo.com (Bharath R)
Date: Sun, 3 Feb 2013 23:26:58 -0800 (PST)
Subject: G1 status in JDK1.6 Vs JDK1.7
In-Reply-To: <1359962096.18794.YahooMailNeo@web162101.mail.bf1.yahoo.com>
References: <1359962096.18794.YahooMailNeo@web162101.mail.bf1.yahoo.com>
Message-ID: <1359962818.10581.YahooMailNeo@web162103.mail.bf1.yahoo.com>

Hi,

Is the G1 GC 1.6 port on par with the 1.7 in terms of stability / quality? If that is true, I intend to begin experimenting with it in production and gradually roll it out across our deployment based on the outcome. On a related note, we intend to use G1 for an online system with a very low pause time requirement (<10ms). The hardware is heterogeneous in terms of memory (ranges between 12G - 32G available to the application process) with comparable CPU configuration. CMS required considerable tuning to achieve acceptable results and I'm hoping G1 would fare better without myriad config options or overrides. I'd like to know of comparisons / experience operating G1 in production under such conditions. Thanks in advance.

-Bharath

P.S: Using RTJ is not an option for us :)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20130203/14664a74/attachment.html
From jesper.wilhelmsson at oracle.com Wed Feb 6 16:21:20 2013
From: jesper.wilhelmsson at oracle.com (Jesper Wilhelmsson)
Date: Thu, 07 Feb 2013 01:21:20 +0100
Subject: G1 status in JDK1.6 Vs JDK1.7
In-Reply-To: <1359962818.10581.YahooMailNeo@web162103.mail.bf1.yahoo.com>
References: <1359962096.18794.YahooMailNeo@web162101.mail.bf1.yahoo.com> <1359962818.10581.YahooMailNeo@web162103.mail.bf1.yahoo.com>
Message-ID: <5112F380.5040403@oracle.com>

Hi Bharath,

The first supported release of G1 was with 7u4. The 7u4 version came with significant improvements and I do not recommend doing performance evaluations with earlier versions. If you decide to move to JDK 7 and try G1 please share your experiences.
/Jesper

On 4/2/13 8:26 AM, Bharath R wrote:
> Hi,
>
> Is the G1 GC 1.6 port on par with the 1.7 in terms of stability / quality? If that is true, I intend to begin experimenting with it in production and gradually roll it out across our deployment based on the outcome. On a related note, we intend to use G1 for an online system with a very low pause time requirement (<10ms). The hardware is heterogeneous in terms of memory (ranges between 12G - 32G available to the application process) with comparable CPU configuration. CMS required considerable tuning to achieve acceptable results and I'm hoping G1 would fare better without myriad config options or overrides.
> I'd like to know of comparisons / experience operating G1 in production under such conditions. Thanks in advance.
>
> -Bharath
>
> P.S: Using RTJ is not an option for us :)
>
> _______________________________________________
> hotspot-gc-use mailing list
> hotspot-gc-use at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
>
From dahouet at gmail.com Wed Feb 13 04:40:24 2013
From: dahouet at gmail.com (Nicolas VIAL)
Date: Wed, 13 Feb 2013 13:40:24 +0100
Subject: ParNew Allocation Failure
Message-ID:

Hello

I'm trying to use HugePages on a high memory server.
Seems to be working fine except for this kind of error message:

ParNew occured at 2013-02-13 13:34:57.469, took 77ms (Allocation Failure) eden(-838912) old(+103897)

JVM started with:
-d64 -XX:+UseCompressedOops -server -Xms30720M -Xmx30720M -XX:+UseLargePages -XX:PermSize=512m -XX:MaxPermSize=512m -XX:NewSize=1024m -XX:MaxNewSize=1024m -XX:InitialCodeCacheSize=256m -XX:ReservedCodeCacheSize=256m -XX:CompileThreshold=1000 -XX:+UseParNewGC -XX:+PrintGCDetails

Statistics:
gc(ParNew)[count=20, time=2839], gc(MarkSweepCompact)[count=2, time=472], eden[used=405995, commited=838912], survivor[used=104832, commited=104832], old[used=2771853, commited=30408704], perm[used=126512, commited=1048576], code[used=24987, commited=262144], compile[count=7492, time=69911, invalidated=0, failed=4, threads=2], threads[count=391, daemon=25, total=397, internal=15], class[loaded=18682, unloaded=0, initialized=11504, loadtime=8524, inittime=2783, veriftime=4048], descriptors[open=335], os[loadavg=0%, physicalfree=4557952, swapfree=4194296, virtual=39611076], cpu[load=0%], disk[rate=75319, used=35%]

Using:
java version "1.7.0_13"
Java(TM) SE Runtime Environment (build 1.7.0_13-b20)
Java HotSpot(TM) 64-Bit Server VM (build 23.7-b01, mixed mode)

Hope I can get some help

Thanks
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20130213/1ed10867/attachment.html
From bernd.eckenfels at googlemail.com Wed Feb 13 05:07:23 2013
From: bernd.eckenfels at googlemail.com (Bernd Eckenfels)
Date: Wed, 13 Feb 2013 14:07:23 +0100
Subject: ParNew Allocation Failure
In-Reply-To:
References:
Message-ID:

Am 13.02.2013, 13:40 Uhr, schrieb Nicolas VIAL :

> ParNew occured at 2013-02-13 13:34:57.469, took 77ms (Allocation Failure)
> eden(-838912) old(+103897)

77ms does not seem like very long. How often do you see them? If you want to reduce that, you will need to reduce NewSize. Do you mean the deadline is only violated with UseLargePages but not without? Did you check for memory pressure on heap memory?

Is that a 32GB system? 30GB looks a bit large for that.

Gruss
Bernd

--
https://plus.google.com/u/1/108084227682171831683/about
From taras.tielkes at gmail.com Sun Feb 17 03:15:07 2013
From: taras.tielkes at gmail.com (Taras Tielkes)
Date: Sun, 17 Feb 2013 12:15:07 +0100
Subject: java 1.7.0u4 GarbageCollectionNotificationInfo API
In-Reply-To: <510C3F0E.9030400@oracle.com>
References: <510C3F0E.9030400@oracle.com>
Message-ID:

Hi John,

Thanks for the feedback. The milliseconds/ticks issue indeed seems to be bug 7087969. Will the upcoming 7u14 contain hs24, and the fix?

Regarding the "No GC" cause, I think http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=8006954 might be the underlying issue. If I understand correctly, it's now fixed for hs24 as well, and will hopefully be part of 7u14.

I'll post any follow-up questions regarding the GC Notification API to the serviceability-dev mailing list.

Kind regards,
-tt

On Fri, Feb 1, 2013 at 11:17 PM, John Cuthbertson <john.cuthbertson at oracle.com> wrote:

> Hi Taras,
>
> I'm going to cc the serviceability alias. I think they might be best suited to answer some of your questions. I believe they own the API and the GC provides the data.
>
> Answer 1: It should be milliseconds, but there was a bug (http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7087969) that is now fixed in hs24 and you could be running into that.
>
> Answer 2: This sounds like a bug. Do you have a test case you can share?
> > Answer 3: I'll leave that to the serviceability guys. > > Regards, > > JohnC > > > On 1/28/2013 1:11 PM, Taras Tielkes wrote: > > > Hi, > > I'm playing around with the new(ish) GarbageCollectionNotificationInfo > API. We're using ParNew+CMS in all our systems, and my first goal is a > comparison between -XX:+PrintGCDetails -verbose:gc output and the actual > data coming through the notification API. I'm using Java 1.7.0u6 for the > experiments. > > So far, I have a number of questions: > 1) duration times > > The javadoc for gcInfo.getDuration() describes the returned value as > expressed in milliseconds. However, the values differ to the gc logs by > several orders of magnitude. How are they calculated? > > On a 1-core Linux x64 VM, the values actually look like microseconds, > but on a Win32 machines I still can't figure out any resemblance to gc log > timings. > > Apart from the unit, what should the value represent? Real time or user > time? > > 2) CMS events with cause "No GC" > > How exactly do the phases of CMS map to the notifications emitted for > the CMS collector? > > I sometimes get events with cause "No GC". Does this indicate a > background CMS cycle being initiated by hitting the occupancy fraction > threshold? > > 3) Eden/Survivor > > It seems that the MemoryUsage API treats Eden and Survivor separately, > i.e. survivor is not a subset of eden. This is different from the gc log > presentation. Is my understanding correct? > > In general, I think it would be useful to have a code sample for the GC > notification API that generates output as close as possible to > -XX:+PrintGCDetails -verbose:gc, as far as the data required to do so is > available. > > The API looks quite promising, it seems it could really benefit from a > bit of documentation love :) > > Thanks, > -tt > > > > _______________________________________________ > hotspot-gc-use mailing listhotspot-gc-use at openjdk.java.nethttp://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20130217/3b971380/attachment.html From ashley.taylor at sli-systems.com Mon Feb 18 17:12:11 2013 From: ashley.taylor at sli-systems.com (Ashley Taylor) Date: Tue, 19 Feb 2013 01:12:11 +0000 Subject: G1 garbage collection Ext Root Scanning time increase linearly as application runs Message-ID: <407A2CFDD3D8024187AFF7A7A4CC34344C5EE597@ex-nz1.globalbrain.net> Hi, We are testing the performance of the G1 garbage collection. Our goal is to be able to remove the full gc pause that eventually happens when we CMS. We have noticed that the garbage collection pause time starts off really well but over time it keeps climbing. Looking at the logs we see that the section that is increasing linearly with time is the Ext Root Scanning Here is a Root Scanning 1 Hour into the application here the total gc pause is around 80ms [Ext Root Scanning (ms): 11.5 0.8 1.5 1.8 1.6 4.8 1.2 1.5 1.2 1.4 1.1 1.6 1.2 1.1 1.1 1.1 1.2 1.2 Avg: 2.1, Min: 0.8, Max: 11.5, Diff: 10.7] Here is a snap shot after 19 hours. Here the pause is around 280ms [Ext Root Scanning (ms): 1.2 184.7 1.3 1.3 1.8 6.3 1.7 1.2 1.5 1.2 1.2 1.1 1.2 1.1 1.2 1.1 1.2 1.2 Avg: 11.8, Min: 1.1, Max: 184.7, Diff: 183.6] It seems that some task is linearly increasing with time, which only effects one thread. 
After manually firing a full gc the total pause time returns back to around 80ms After full GC [Ext Root Scanning (ms): 2.4 1.7 4.5 2.6 4.6 2.1 2.1 1.7 2.1 1.8 1.8 2.2 0.6 0.0 0.0 0.0 0.0 0.0 Avg: 1.7, Min: 0.0, Max: 4.6, Diff: 4.6] The test is run with a constant load applied on the application that should hold the machine at around load 6. We have around 3GB of data within the heap which will very rarely become garbage, life of these objects would be several hours to days. the rest will only live for 10s of milliseconds. The JVM memory usage floats between 4-6gb. Have checked a thread dump. There are no threads that have very large stack traces. What could cause this increasing pause durations? Is there any way to get more information out of what that thread is actually trying to do, or any tuning options? Environment JVM Arguments -Xms8g -Xmx8g -XX:+UseG1GC -XX:InitiatingHeapOccupancyPercent=0 #found that having this at zero has greatly reduced the frequency of GC pause over 500ms and the overhead is not that noticeable to our application -XX:MaxGCPauseMillis=70 -XX:+UseLargePages Environment java version "1.7.0_13" Java(TM) SE Runtime Environment (build 1.7.0_13-b20) Java HotSpot(TM) 64-Bit Server VM (build 23.7-b01, mixed mode) Operating System redhat 5.8 machine. The machine has 12 cores/ 24threads and 48gb of ram. Cheers, Ashley Taylor Software Engineer Email: ashley.taylor at sli-systems.com Website: www.sli-systems.com Blog: blog.sli-systems.com Podcast: EcommercePodcast.com Twitter: www.twitter.com/slisystems [sli_logo_2011] -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20130219/2b6e9da6/attachment.html -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 8602 bytes Desc: image001.png Url : http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20130219/2b6e9da6/image001.png From matt.fowles at gmail.com Mon Feb 18 18:03:38 2013 From: matt.fowles at gmail.com (Matt Fowles) Date: Mon, 18 Feb 2013 21:03:38 -0500 Subject: G1 garbage collection Ext Root Scanning time increase linearly as application runs In-Reply-To: <407A2CFDD3D8024187AFF7A7A4CC34344C5EE597@ex-nz1.globalbrain.net> References: <407A2CFDD3D8024187AFF7A7A4CC34344C5EE597@ex-nz1.globalbrain.net> Message-ID: Ashley~ Do you have any JNI in the setup? I saw a similar issue that was painstakingly tracked down to a leaked handle in a JNI thread. Matt On Mon, Feb 18, 2013 at 8:12 PM, Ashley Taylor < ashley.taylor at sli-systems.com> wrote: > Hi,**** > > ** ** > > We are testing the performance of the G1 garbage collection.**** > > Our goal is to be able to remove the full gc pause that eventually happens > when we CMS.**** > > ** ** > > We have noticed that the garbage collection pause time starts off really > well but over time it keeps climbing.**** > > ** ** > > Looking at the logs we see that the section that is increasing linearly > with time is the Ext Root Scanning**** > > Here is a Root Scanning 1 Hour into the application here the total gc > pause is around 80ms**** > > [Ext Root Scanning (ms): 11.5 0.8 1.5 1.8 1.6 4.8 1.2 1.5 1.2 > 1.4 1.1 1.6 1.2 1.1 1.1 1.1 1.2 1.2**** > > Avg: 2.1, Min: 0.8, Max: 11.5, Diff: 10.7]**** > > ** ** > > ** ** > > Here is a snap shot after 19 hours. 
Here the pause is around 280ms **** > > [Ext Root Scanning (ms): 1.2 184.7 1.3 1.3 1.8 6.3 1.7 1.2 > 1.5 1.2 1.2 1.1 1.2 1.1 1.2 1.1 1.2 1.2**** > > Avg: 11.8, Min: 1.1, Max: 184.7, Diff: 183.6]**** > > ** ** > > It seems that some task is linearly increasing with time, which only > effects one thread.**** > > ** ** > > After manually firing a full gc the total pause time returns back to > around 80ms**** > > ** ** > > After full GC**** > > [Ext Root Scanning (ms): 2.4 1.7 4.5 2.6 4.6 2.1 2.1 1.7 2.1 > 1.8 1.8 2.2 0.6 0.0 0.0 0.0 0.0 0.0**** > > Avg: 1.7, Min: 0.0, Max: 4.6, Diff: 4.6]**** > > ** ** > > ** ** > > The test is run with a constant load applied on the application that > should hold the machine at around load 6.**** > > We have around 3GB of data within the heap which will very rarely become > garbage, life of these objects would be several hours to days.**** > > the rest will only live for 10s of milliseconds.**** > > The JVM memory usage floats between 4-6gb.**** > > ** ** > > Have checked a thread dump. There are no threads that have very large > stack traces.**** > > What could cause this increasing pause durations? Is there any way to get > more information out of what that thread is actually trying to do, or any > tuning options?**** > > ** ** > > ** ** > > Environment**** > > ** ** > > JVM Arguments**** > > -Xms8g**** > > -Xmx8g **** > > -XX:+UseG1GC **** > > -XX:InitiatingHeapOccupancyPercent=0 #found that having this at zero has > greatly reduced the frequency of GC pause over 500ms and the overhead is > not that noticeable to our application**** > > -XX:MaxGCPauseMillis=70**** > > -XX:+UseLargePages**** > > ** ** > > ** ** > > Environment**** > > java version "1.7.0_13"**** > > Java(TM) SE Runtime Environment (build 1.7.0_13-b20)**** > > Java HotSpot(TM) 64-Bit Server VM (build 23.7-b01, mixed mode)**** > > ** ** > > ** ** > > Operating System**** > > redhat 5.8 machine.**** > > The machine has 12 cores/ 24threads and 48gb of ram.**** > > ** ** > > ** ** > > ** ** > > Cheers,**** > > *Ashley Taylor* > > Software Engineer**** > > Email: ashley.taylor at sli-systems.com**** > > Website: www.sli-systems.com**** > > Blog: blog.sli-systems.com**** > > Podcast: EcommercePodcast.com **** > > Twitter: www.twitter.com/slisystems**** > > ** ** > > [image: sli_logo_2011]** > > ** ** > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20130218/a1e85188/attachment-0001.html From ashley.taylor at sli-systems.com Mon Feb 18 18:29:36 2013 From: ashley.taylor at sli-systems.com (Ashley Taylor) Date: Tue, 19 Feb 2013 02:29:36 +0000 Subject: G1 garbage collection Ext Root Scanning time increase linearly as application runs In-Reply-To: References: <407A2CFDD3D8024187AFF7A7A4CC34344C5EE597@ex-nz1.globalbrain.net> Message-ID: <407A2CFDD3D8024187AFF7A7A4CC34344C5EE6C7@ex-nz1.globalbrain.net> Hi Matt Thanks for the quick response. Yes we do have JNI in this setup, I will disable the JNI link and rerun the test. If it is JNI can you elaborate what you mean by leaked handle in a JNI thread and how we would go about identifying and fixing that. Cheers, Ashley From: Matt Fowles [mailto:matt.fowles at gmail.com] Sent: Tuesday, 19 February 2013 3:04 p.m. 
To: Ashley Taylor Cc: hotspot-gc-use at openjdk.java.net Subject: Re: G1 garbage collection Ext Root Scanning time increase linearly as application runs Ashley~ Do you have any JNI in the setup? I saw a similar issue that was painstakingly tracked down to a leaked handle in a JNI thread. Matt On Mon, Feb 18, 2013 at 8:12 PM, Ashley Taylor > wrote: Hi, We are testing the performance of the G1 garbage collection. Our goal is to be able to remove the full gc pause that eventually happens when we CMS. We have noticed that the garbage collection pause time starts off really well but over time it keeps climbing. Looking at the logs we see that the section that is increasing linearly with time is the Ext Root Scanning Here is a Root Scanning 1 Hour into the application here the total gc pause is around 80ms [Ext Root Scanning (ms): 11.5 0.8 1.5 1.8 1.6 4.8 1.2 1.5 1.2 1.4 1.1 1.6 1.2 1.1 1.1 1.1 1.2 1.2 Avg: 2.1, Min: 0.8, Max: 11.5, Diff: 10.7] Here is a snap shot after 19 hours. Here the pause is around 280ms [Ext Root Scanning (ms): 1.2 184.7 1.3 1.3 1.8 6.3 1.7 1.2 1.5 1.2 1.2 1.1 1.2 1.1 1.2 1.1 1.2 1.2 Avg: 11.8, Min: 1.1, Max: 184.7, Diff: 183.6] It seems that some task is linearly increasing with time, which only effects one thread. After manually firing a full gc the total pause time returns back to around 80ms After full GC [Ext Root Scanning (ms): 2.4 1.7 4.5 2.6 4.6 2.1 2.1 1.7 2.1 1.8 1.8 2.2 0.6 0.0 0.0 0.0 0.0 0.0 Avg: 1.7, Min: 0.0, Max: 4.6, Diff: 4.6] The test is run with a constant load applied on the application that should hold the machine at around load 6. We have around 3GB of data within the heap which will very rarely become garbage, life of these objects would be several hours to days. the rest will only live for 10s of milliseconds. The JVM memory usage floats between 4-6gb. Have checked a thread dump. There are no threads that have very large stack traces. What could cause this increasing pause durations? Is there any way to get more information out of what that thread is actually trying to do, or any tuning options? Environment JVM Arguments -Xms8g -Xmx8g -XX:+UseG1GC -XX:InitiatingHeapOccupancyPercent=0 #found that having this at zero has greatly reduced the frequency of GC pause over 500ms and the overhead is not that noticeable to our application -XX:MaxGCPauseMillis=70 -XX:+UseLargePages Environment java version "1.7.0_13" Java(TM) SE Runtime Environment (build 1.7.0_13-b20) Java HotSpot(TM) 64-Bit Server VM (build 23.7-b01, mixed mode) Operating System redhat 5.8 machine. The machine has 12 cores/ 24threads and 48gb of ram. Cheers, Ashley Taylor Software Engineer Email: ashley.taylor at sli-systems.com Website: www.sli-systems.com Blog: blog.sli-systems.com Podcast: EcommercePodcast.com Twitter: www.twitter.com/slisystems _______________________________________________ hotspot-gc-use mailing list hotspot-gc-use at openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20130219/44a34a6e/attachment.html From matt.fowles at gmail.com Mon Feb 18 18:49:00 2013 From: matt.fowles at gmail.com (Matt Fowles) Date: Mon, 18 Feb 2013 21:49:00 -0500 Subject: G1 garbage collection Ext Root Scanning time increase linearly as application runs In-Reply-To: <407A2CFDD3D8024187AFF7A7A4CC34344C5EE6C7@ex-nz1.globalbrain.net> References: <407A2CFDD3D8024187AFF7A7A4CC34344C5EE597@ex-nz1.globalbrain.net> <407A2CFDD3D8024187AFF7A7A4CC34344C5EE6C7@ex-nz1.globalbrain.net> Message-ID: Ashley~ The issue I was seeing was actually in CMS not G1, but it was eventually tracked down to leaking LocalReferences in the JNI. Each LocalRef (or likely GlobalRef) adds 4 bytes to a section that has to be scanned every GC. If these build up without bound, you end up with growing GC times. The issue that I found essentially boiled down to GetMethodID calls creating a LocalRef and not being freed. You can find the full painful search here: http://web.archiveorange.com/archive/v/Dp7Rf33tij5BFBNRpVja My minimal reproduction is http://web.archiveorange.com/archive/v/Dp7Rf33tij5BFBNRpVja#YnJRjM4IVyt54TV I sincerely hope my painful experience can save you time ;-) Matt On Mon, Feb 18, 2013 at 9:29 PM, Ashley Taylor < ashley.taylor at sli-systems.com> wrote: > Hi Matt**** > > Thanks for the quick response.**** > > ** ** > > Yes we do have JNI in this setup, I will disable the JNI link and rerun > the test.**** > > If it is JNI can you elaborate what you mean by leaked handle in a JNI > thread and how we would go about identifying and fixing that.**** > > ** ** > > Cheers,**** > > Ashley**** > > ** ** > > *From:* Matt Fowles [mailto:matt.fowles at gmail.com] > *Sent:* Tuesday, 19 February 2013 3:04 p.m. > *To:* Ashley Taylor > *Cc:* hotspot-gc-use at openjdk.java.net > *Subject:* Re: G1 garbage collection Ext Root Scanning time increase > linearly as application runs**** > > ** ** > > Ashley~**** > > ** ** > > Do you have any JNI in the setup? I saw a similar issue that was > painstakingly tracked down to a leaked handle in a JNI thread.**** > > ** ** > > Matt**** > > ** ** > > On Mon, Feb 18, 2013 at 8:12 PM, Ashley Taylor < > ashley.taylor at sli-systems.com> wrote:**** > > Hi,**** > > **** > > We are testing the performance of the G1 garbage collection.**** > > Our goal is to be able to remove the full gc pause that eventually happens > when we CMS.**** > > **** > > We have noticed that the garbage collection pause time starts off really > well but over time it keeps climbing.**** > > **** > > Looking at the logs we see that the section that is increasing linearly > with time is the Ext Root Scanning**** > > Here is a Root Scanning 1 Hour into the application here the total gc > pause is around 80ms**** > > [Ext Root Scanning (ms): 11.5 0.8 1.5 1.8 1.6 4.8 1.2 1.5 1.2 > 1.4 1.1 1.6 1.2 1.1 1.1 1.1 1.2 1.2**** > > Avg: 2.1, Min: 0.8, Max: 11.5, Diff: 10.7]**** > > **** > > **** > > Here is a snap shot after 19 hours. 
Here the pause is around 280ms **** > > [Ext Root Scanning (ms): 1.2 184.7 1.3 1.3 1.8 6.3 1.7 1.2 > 1.5 1.2 1.2 1.1 1.2 1.1 1.2 1.1 1.2 1.2**** > > Avg: 11.8, Min: 1.1, Max: 184.7, Diff: 183.6]**** > > **** > > It seems that some task is linearly increasing with time, which only > effects one thread.**** > > **** > > After manually firing a full gc the total pause time returns back to > around 80ms**** > > **** > > After full GC**** > > [Ext Root Scanning (ms): 2.4 1.7 4.5 2.6 4.6 2.1 2.1 1.7 2.1 > 1.8 1.8 2.2 0.6 0.0 0.0 0.0 0.0 0.0**** > > Avg: 1.7, Min: 0.0, Max: 4.6, Diff: 4.6]**** > > **** > > **** > > The test is run with a constant load applied on the application that > should hold the machine at around load 6.**** > > We have around 3GB of data within the heap which will very rarely become > garbage, life of these objects would be several hours to days.**** > > the rest will only live for 10s of milliseconds.**** > > The JVM memory usage floats between 4-6gb.**** > > **** > > Have checked a thread dump. There are no threads that have very large > stack traces.**** > > What could cause this increasing pause durations? Is there any way to get > more information out of what that thread is actually trying to do, or any > tuning options?**** > > **** > > **** > > Environment**** > > **** > > JVM Arguments**** > > -Xms8g**** > > -Xmx8g **** > > -XX:+UseG1GC **** > > -XX:InitiatingHeapOccupancyPercent=0 #found that having this at zero has > greatly reduced the frequency of GC pause over 500ms and the overhead is > not that noticeable to our application**** > > -XX:MaxGCPauseMillis=70**** > > -XX:+UseLargePages**** > > **** > > **** > > Environment**** > > java version "1.7.0_13"**** > > Java(TM) SE Runtime Environment (build 1.7.0_13-b20)**** > > Java HotSpot(TM) 64-Bit Server VM (build 23.7-b01, mixed mode)**** > > **** > > **** > > Operating System**** > > redhat 5.8 machine.**** > > The machine has 12 cores/ 24threads and 48gb of ram.**** > > **** > > **** > > **** > > Cheers,**** > > *Ashley Taylor***** > > Software Engineer**** > > Email: ashley.taylor at sli-systems.com**** > > Website: www.sli-systems.com**** > > Blog: blog.sli-systems.com**** > > Podcast: EcommercePodcast.com **** > > Twitter: www.twitter.com/slisystems**** > > **** > > **** > > **** > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use**** > > ** ** > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20130218/ee77bd33/attachment-0001.html From ashley.taylor at sli-systems.com Tue Feb 19 11:24:16 2013 From: ashley.taylor at sli-systems.com (Ashley Taylor) Date: Tue, 19 Feb 2013 19:24:16 +0000 Subject: G1 garbage collection Ext Root Scanning time increase linearly as application runs In-Reply-To: References: <407A2CFDD3D8024187AFF7A7A4CC34344C5EE597@ex-nz1.globalbrain.net> <407A2CFDD3D8024187AFF7A7A4CC34344C5EE6C7@ex-nz1.globalbrain.net> Message-ID: <407A2CFDD3D8024187AFF7A7A4CC34344C5F03F9@ex-nz1.globalbrain.net> Hi Matt Seems that the issue I'm experiencing is unrelated to JNI same issue with JNI calls mocked. Reading that post I noticed that your gc pauses where still increasing after a full gc. In our case a full gc will fix the issue. Will have to keep hunting for the cause in my application. 
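A minimal sketch (not from the thread; the pool name "Code Cache" is HotSpot-specific and the logging interval is arbitrary) of one low-effort way to keep hunting: periodically log two of the structures that this HotSpot generation scans from a single external root, code cache occupancy and the loaded class count, both of which come up later in the thread, so their growth can be lined up against the Ext Root Scanning times in the GC log.

    import java.lang.management.ClassLoadingMXBean;
    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryPoolMXBean;

    public class RootSuspectLogger {
        public static void main(String[] args) throws Exception {
            ClassLoadingMXBean classes = ManagementFactory.getClassLoadingMXBean();
            MemoryPoolMXBean codeCache = null;
            for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
                if ("Code Cache".equals(pool.getName())) { // HotSpot-specific pool name
                    codeCache = pool;
                }
            }
            while (true) {
                long codeCacheUsed = (codeCache == null) ? -1 : codeCache.getUsage().getUsed();
                System.out.println(System.currentTimeMillis()
                        + " codeCacheUsed=" + codeCacheUsed
                        + " loadedClasses=" + classes.getLoadedClassCount()
                        + " totalLoaded=" + classes.getTotalLoadedClassCount()
                        + " unloaded=" + classes.getUnloadedClassCount());
                Thread.sleep(60_000); // once a minute; line up timestamps with the GC log
            }
        }
    }

In a real application this would more naturally run as a daemon thread; a standalone main() is used here only to keep the sketch self-contained.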
Cheers, Ashley From: Matt Fowles [mailto:matt.fowles at gmail.com] Sent: Tuesday, 19 February 2013 3:49 p.m. To: Ashley Taylor Cc: hotspot-gc-use at openjdk.java.net Subject: Re: G1 garbage collection Ext Root Scanning time increase linearly as application runs Ashley~ The issue I was seeing was actually in CMS not G1, but it was eventually tracked down to leaking LocalReferences in the JNI. Each LocalRef (or likely GlobalRef) adds 4 bytes to a section that has to be scanned every GC. If these build up without bound, you end up with growing GC times. The issue that I found essentially boiled down to GetMethodID calls creating a LocalRef and not being freed. You can find the full painful search here: http://web.archiveorange.com/archive/v/Dp7Rf33tij5BFBNRpVja My minimal reproduction is http://web.archiveorange.com/archive/v/Dp7Rf33tij5BFBNRpVja#YnJRjM4IVyt54TV I sincerely hope my painful experience can save you time ;-) Matt On Mon, Feb 18, 2013 at 9:29 PM, Ashley Taylor > wrote: Hi Matt Thanks for the quick response. Yes we do have JNI in this setup, I will disable the JNI link and rerun the test. If it is JNI can you elaborate what you mean by leaked handle in a JNI thread and how we would go about identifying and fixing that. Cheers, Ashley From: Matt Fowles [mailto:matt.fowles at gmail.com] Sent: Tuesday, 19 February 2013 3:04 p.m. To: Ashley Taylor Cc: hotspot-gc-use at openjdk.java.net Subject: Re: G1 garbage collection Ext Root Scanning time increase linearly as application runs Ashley~ Do you have any JNI in the setup? I saw a similar issue that was painstakingly tracked down to a leaked handle in a JNI thread. Matt On Mon, Feb 18, 2013 at 8:12 PM, Ashley Taylor > wrote: Hi, We are testing the performance of the G1 garbage collection. Our goal is to be able to remove the full gc pause that eventually happens when we CMS. We have noticed that the garbage collection pause time starts off really well but over time it keeps climbing. Looking at the logs we see that the section that is increasing linearly with time is the Ext Root Scanning Here is a Root Scanning 1 Hour into the application here the total gc pause is around 80ms [Ext Root Scanning (ms): 11.5 0.8 1.5 1.8 1.6 4.8 1.2 1.5 1.2 1.4 1.1 1.6 1.2 1.1 1.1 1.1 1.2 1.2 Avg: 2.1, Min: 0.8, Max: 11.5, Diff: 10.7] Here is a snap shot after 19 hours. Here the pause is around 280ms [Ext Root Scanning (ms): 1.2 184.7 1.3 1.3 1.8 6.3 1.7 1.2 1.5 1.2 1.2 1.1 1.2 1.1 1.2 1.1 1.2 1.2 Avg: 11.8, Min: 1.1, Max: 184.7, Diff: 183.6] It seems that some task is linearly increasing with time, which only effects one thread. After manually firing a full gc the total pause time returns back to around 80ms After full GC [Ext Root Scanning (ms): 2.4 1.7 4.5 2.6 4.6 2.1 2.1 1.7 2.1 1.8 1.8 2.2 0.6 0.0 0.0 0.0 0.0 0.0 Avg: 1.7, Min: 0.0, Max: 4.6, Diff: 4.6] The test is run with a constant load applied on the application that should hold the machine at around load 6. We have around 3GB of data within the heap which will very rarely become garbage, life of these objects would be several hours to days. the rest will only live for 10s of milliseconds. The JVM memory usage floats between 4-6gb. Have checked a thread dump. There are no threads that have very large stack traces. What could cause this increasing pause durations? Is there any way to get more information out of what that thread is actually trying to do, or any tuning options? 
Environment

JVM Arguments
-Xms8g
-Xmx8g
-XX:+UseG1GC
-XX:InitiatingHeapOccupancyPercent=0 #found that having this at zero has greatly reduced the frequency of GC pause over 500ms and the overhead is not that noticeable to our application
-XX:MaxGCPauseMillis=70
-XX:+UseLargePages

Environment
java version "1.7.0_13"
Java(TM) SE Runtime Environment (build 1.7.0_13-b20)
Java HotSpot(TM) 64-Bit Server VM (build 23.7-b01, mixed mode)

Operating System
redhat 5.8 machine.
The machine has 12 cores/ 24threads and 48gb of ram.

Cheers,
Ashley Taylor
Software Engineer
Email: ashley.taylor at sli-systems.com
Website: www.sli-systems.com
Blog: blog.sli-systems.com
Podcast: EcommercePodcast.com
Twitter: www.twitter.com/slisystems

_______________________________________________
hotspot-gc-use mailing list
hotspot-gc-use at openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20130219/62cca774/attachment-0001.html
From john.cuthbertson at oracle.com Tue Feb 19 12:01:54 2013
From: john.cuthbertson at oracle.com (John Cuthbertson)
Date: Tue, 19 Feb 2013 12:01:54 -0800
Subject: G1 garbage collection Ext Root Scanning time increase linearly as application runs
In-Reply-To: <407A2CFDD3D8024187AFF7A7A4CC34344C5F03F9@ex-nz1.globalbrain.net>
References: <407A2CFDD3D8024187AFF7A7A4CC34344C5EE597@ex-nz1.globalbrain.net> <407A2CFDD3D8024187AFF7A7A4CC34344C5EE6C7@ex-nz1.globalbrain.net> <407A2CFDD3D8024187AFF7A7A4CC34344C5F03F9@ex-nz1.globalbrain.net>
Message-ID: <5123DA32.806@oracle.com>

Hi Ashley,

Basically, as you surmise, one of the GC worker threads is being held up when processing a single root. I've seen a similar issue that's caused by filling up the code cache (where JIT compiled methods are held). The code cache is treated as a single root and so is claimed in its entirety by a single GC worker thread. As the code cache fills up, the thread that claims the code cache to scan starts getting held up.

A full GC clears the issue because that's where G1 currently does class unloading: the full GC unloads a whole bunch of classes, allowing the compiled code of any of the unloaded classes' methods to be freed by the nmethod sweeper. So after a full GC the number of compiled methods in the code cache is less.

It could also be just the sheer number of loaded classes, as the system dictionary is also treated as a single claimable root.

I think there's a couple of existing CRs to track this. I'll see if I can find the numbers.

Regards,

JohnC

On 2/19/2013 11:24 AM, Ashley Taylor wrote:
>
> Hi Matt
>
> Seems that the issue I'm experiencing is unrelated to JNI same issue with JNI calls mocked.
>
> Reading that post I noticed that your gc pauses where still increasing after a full gc. In our case a full gc will fix the issue.
>
> Will have to keep hunting for the cause in my application.
>
> Cheers,
>
> Ashley
>
> *From:* Matt Fowles [mailto:matt.fowles at gmail.com]
> *Sent:* Tuesday, 19 February 2013 3:49 p.m.
> *To:* Ashley Taylor
> *Cc:* hotspot-gc-use at openjdk.java.net
> *Subject:* Re: G1 garbage collection Ext Root Scanning time increase linearly as application runs
>
> Ashley~
>
> The issue I was seeing was actually in CMS not G1, but it was eventually tracked down to leaking LocalReferences in the JNI. Each LocalRef (or likely GlobalRef) adds 4 bytes to a section that has to
> be scanned every GC.
If these build up without bound, you end up with > growing GC times. > > The issue that I found essentially boiled down to GetMethodID calls > creating a LocalRef and not being freed. > > You can find the full painful search here: > > http://web.archiveorange.com/archive/v/Dp7Rf33tij5BFBNRpVja > > My minimal reproduction is > > http://web.archiveorange.com/archive/v/Dp7Rf33tij5BFBNRpVja#YnJRjM4IVyt54TV > > I sincerely hope my painful experience can save you time ;-) > > Matt > > On Mon, Feb 18, 2013 at 9:29 PM, Ashley Taylor > > > wrote: > > Hi Matt > > Thanks for the quick response. > > Yes we do have JNI in this setup, I will disable the JNI link and > rerun the test. > > If it is JNI can you elaborate what you mean by leaked handle in a JNI > thread and how we would go about identifying and fixing that. > > Cheers, > > Ashley > > *From:*Matt Fowles [mailto:matt.fowles at gmail.com > ] > *Sent:* Tuesday, 19 February 2013 3:04 p.m. > *To:* Ashley Taylor > *Cc:* hotspot-gc-use at openjdk.java.net > > *Subject:* Re: G1 garbage collection Ext Root Scanning time increase > linearly as application runs > > Ashley~ > > Do you have any JNI in the setup? I saw a similar issue that was > painstakingly tracked down to a leaked handle in a JNI thread. > > Matt > > On Mon, Feb 18, 2013 at 8:12 PM, Ashley Taylor > > > wrote: > > Hi, > > We are testing the performance of the G1 garbage collection. > > Our goal is to be able to remove the full gc pause that eventually > happens when we CMS. > > We have noticed that the garbage collection pause time starts off > really well but over time it keeps climbing. > > Looking at the logs we see that the section that is increasing > linearly with time is the Ext Root Scanning > > Here is a Root Scanning 1 Hour into the application here the total gc > pause is around 80ms > > [Ext Root Scanning (ms): 11.5 0.8 1.5 1.8 1.6 4.8 1.2 1.5 1.2 > 1.4 1.1 1.6 1.2 1.1 1.1 1.1 1.2 1.2 > > Avg: 2.1, Min: 0.8, Max: 11.5, Diff: 10.7] > > Here is a snap shot after 19 hours. Here the pause is around 280ms > > [Ext Root Scanning (ms): 1.2 184.7 1.3 1.3 1.8 6.3 1.7 > 1.2 1.5 1.2 1.2 1.1 1.2 1.1 1.2 1.1 1.2 1.2 > > Avg: 11.8, Min: 1.1, Max: 184.7, Diff: 183.6] > > It seems that some task is linearly increasing with time, which only > effects one thread. > > After manually firing a full gc the total pause time returns back to > around 80ms > > After full GC > > [Ext Root Scanning (ms): 2.4 1.7 4.5 2.6 4.6 2.1 2.1 1.7 2.1 > 1.8 1.8 2.2 0.6 0.0 0.0 0.0 0.0 0.0 > > Avg: 1.7, Min: 0.0, Max: 4.6, Diff: 4.6] > > The test is run with a constant load applied on the application that > should hold the machine at around load 6. > > We have around 3GB of data within the heap which will very rarely > become garbage, life of these objects would be several hours to days. > > the rest will only live for 10s of milliseconds. > > The JVM memory usage floats between 4-6gb. > > Have checked a thread dump. There are no threads that have very large > stack traces. > > What could cause this increasing pause durations? Is there any way to > get more information out of what that thread is actually trying to do, > or any tuning options? 
> > Environment > > JVM Arguments > > -Xms8g > > -Xmx8g > > -XX:+UseG1GC > > -XX:InitiatingHeapOccupancyPercent=0 #found that having this at zero > has greatly reduced the frequency of GC pause over 500ms and the > overhead is not that noticeable to our application > > -XX:MaxGCPauseMillis=70 > > -XX:+UseLargePages > > Environment > > java version "1.7.0_13" > > Java(TM) SE Runtime Environment (build 1.7.0_13-b20) > > Java HotSpot(TM) 64-Bit Server VM (build 23.7-b01, mixed mode) > > Operating System > > redhat 5.8 machine. > > The machine has 12 cores/ 24threads and 48gb of ram. > > Cheers, > > *Ashley Taylor* > > Software Engineer > > Email:ashley.taylor at sli-systems.com > > Website: www.sli-systems.com > > Blog: blog.sli-systems.com > > Podcast: EcommercePodcast.com > > Twitter: www.twitter.com/slisystems > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20130219/a0e6cb98/attachment-0001.html From ashley.taylor at sli-systems.com Tue Feb 19 17:11:04 2013 From: ashley.taylor at sli-systems.com (Ashley Taylor) Date: Wed, 20 Feb 2013 01:11:04 +0000 Subject: G1 garbage collection Ext Root Scanning time increase linearly as application runs In-Reply-To: <5123DA32.806@oracle.com> References: <407A2CFDD3D8024187AFF7A7A4CC34344C5EE597@ex-nz1.globalbrain.net> <407A2CFDD3D8024187AFF7A7A4CC34344C5EE6C7@ex-nz1.globalbrain.net> <407A2CFDD3D8024187AFF7A7A4CC34344C5F03F9@ex-nz1.globalbrain.net> <5123DA32.806@oracle.com> Message-ID: <407A2CFDD3D8024187AFF7A7A4CC34344C5F085D@ex-nz1.globalbrain.net> Hi John I reran my application with the JIT log turned on. It seems that once the application has been running for a while there is very little activity within the JIT log but the pause times keep climbing, I ran it for 4 hours and the 'Ext Root Scan' had climbed to 40ms. At the 4 hour point I also performed a full gc to see how many classes would be unload and it was only 50. We have around 5500 loaded classes. The number of loaded classes also does not increase once the application has run for a while. I also used jstat to see how full the permanent memory region is, it is slowly climbing the full gc did not seem to reduce it at all, however the full gc did fix the pause time. The permanent region is currently at 89.17% and seems to increase by 0.01% every couple of minutes. Is there any other GC events that only happen at a full gc? Cheers, Ashley From: hotspot-gc-use-bounces at openjdk.java.net [mailto:hotspot-gc-use-bounces at openjdk.java.net] On Behalf Of John Cuthbertson Sent: Wednesday, 20 February 2013 9:10 a.m. To: hotspot-gc-use at openjdk.java.net Subject: Re: G1 garbage collection Ext Root Scanning time increase linearly as application runs Hi Ashley, Basically as you surmise one the GC worker threads is being held up when processing a single root. I've seen s similar issue that's caused by filling up the code cache (where JIT compiled methods are held). The code cache is treated as a single root and so is claimed in its entirety by a single GC worker thread. 
As a the code cache fills up, the thread that claims the code cache to scan starts getting held up. A full GC clears the issue because that's where G1 currently does class unloading: the full GC unloads a whole bunch of classes allowing any the compiled code of any of the unloaded classes' methods to be freed by the nmethod sweeper. So after a a full GC the number of compiled methods in the code cache is less. It could also be the just the sheer number of loaded classes as the system dictionary is also treated as a single claimable root. I think there's a couple existing CRs to track this. I'll see if I can find the numbers. Regards, JohnC On 2/19/2013 11:24 AM, Ashley Taylor wrote: Hi Matt Seems that the issue I'm experiencing is unrelated to JNI same issue with JNI calls mocked. Reading that post I noticed that your gc pauses where still increasing after a full gc. In our case a full gc will fix the issue. Will have to keep hunting for the cause in my application. Cheers, Ashley From: Matt Fowles [mailto:matt.fowles at gmail.com] Sent: Tuesday, 19 February 2013 3:49 p.m. To: Ashley Taylor Cc: hotspot-gc-use at openjdk.java.net Subject: Re: G1 garbage collection Ext Root Scanning time increase linearly as application runs Ashley~ The issue I was seeing was actually in CMS not G1, but it was eventually tracked down to leaking LocalReferences in the JNI. Each LocalRef (or likely GlobalRef) adds 4 bytes to a section that has to be scanned every GC. If these build up without bound, you end up with growing GC times. The issue that I found essentially boiled down to GetMethodID calls creating a LocalRef and not being freed. You can find the full painful search here: http://web.archiveorange.com/archive/v/Dp7Rf33tij5BFBNRpVja My minimal reproduction is http://web.archiveorange.com/archive/v/Dp7Rf33tij5BFBNRpVja#YnJRjM4IVyt54TV I sincerely hope my painful experience can save you time ;-) Matt On Mon, Feb 18, 2013 at 9:29 PM, Ashley Taylor > wrote: Hi Matt Thanks for the quick response. Yes we do have JNI in this setup, I will disable the JNI link and rerun the test. If it is JNI can you elaborate what you mean by leaked handle in a JNI thread and how we would go about identifying and fixing that. Cheers, Ashley From: Matt Fowles [mailto:matt.fowles at gmail.com] Sent: Tuesday, 19 February 2013 3:04 p.m. To: Ashley Taylor Cc: hotspot-gc-use at openjdk.java.net Subject: Re: G1 garbage collection Ext Root Scanning time increase linearly as application runs Ashley~ Do you have any JNI in the setup? I saw a similar issue that was painstakingly tracked down to a leaked handle in a JNI thread. Matt On Mon, Feb 18, 2013 at 8:12 PM, Ashley Taylor > wrote: Hi, We are testing the performance of the G1 garbage collection. Our goal is to be able to remove the full gc pause that eventually happens when we CMS. We have noticed that the garbage collection pause time starts off really well but over time it keeps climbing. Looking at the logs we see that the section that is increasing linearly with time is the Ext Root Scanning Here is a Root Scanning 1 Hour into the application here the total gc pause is around 80ms [Ext Root Scanning (ms): 11.5 0.8 1.5 1.8 1.6 4.8 1.2 1.5 1.2 1.4 1.1 1.6 1.2 1.1 1.1 1.1 1.2 1.2 Avg: 2.1, Min: 0.8, Max: 11.5, Diff: 10.7] Here is a snap shot after 19 hours. 
Here the pause is around 280ms [Ext Root Scanning (ms): 1.2 184.7 1.3 1.3 1.8 6.3 1.7 1.2 1.5 1.2 1.2 1.1 1.2 1.1 1.2 1.1 1.2 1.2 Avg: 11.8, Min: 1.1, Max: 184.7, Diff: 183.6] It seems that some task is linearly increasing with time, which only effects one thread. After manually firing a full gc the total pause time returns back to around 80ms After full GC [Ext Root Scanning (ms): 2.4 1.7 4.5 2.6 4.6 2.1 2.1 1.7 2.1 1.8 1.8 2.2 0.6 0.0 0.0 0.0 0.0 0.0 Avg: 1.7, Min: 0.0, Max: 4.6, Diff: 4.6] The test is run with a constant load applied on the application that should hold the machine at around load 6. We have around 3GB of data within the heap which will very rarely become garbage, life of these objects would be several hours to days. the rest will only live for 10s of milliseconds. The JVM memory usage floats between 4-6gb. Have checked a thread dump. There are no threads that have very large stack traces. What could cause this increasing pause durations? Is there any way to get more information out of what that thread is actually trying to do, or any tuning options? Environment JVM Arguments -Xms8g -Xmx8g -XX:+UseG1GC -XX:InitiatingHeapOccupancyPercent=0 #found that having this at zero has greatly reduced the frequency of GC pause over 500ms and the overhead is not that noticeable to our application -XX:MaxGCPauseMillis=70 -XX:+UseLargePages Environment java version "1.7.0_13" Java(TM) SE Runtime Environment (build 1.7.0_13-b20) Java HotSpot(TM) 64-Bit Server VM (build 23.7-b01, mixed mode) Operating System redhat 5.8 machine. The machine has 12 cores/ 24threads and 48gb of ram. Cheers, Ashley Taylor Software Engineer Email: ashley.taylor at sli-systems.com Website: www.sli-systems.com Blog: blog.sli-systems.com Podcast: EcommercePodcast.com Twitter: www.twitter.com/slisystems _______________________________________________ hotspot-gc-use mailing list hotspot-gc-use at openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use _______________________________________________ hotspot-gc-use mailing list hotspot-gc-use at openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20130220/a0c199a3/attachment-0001.html From john.cuthbertson at oracle.com Tue Feb 19 17:38:24 2013 From: john.cuthbertson at oracle.com (John Cuthbertson) Date: Tue, 19 Feb 2013 17:38:24 -0800 Subject: G1 garbage collection Ext Root Scanning time increase linearly as application runs In-Reply-To: <407A2CFDD3D8024187AFF7A7A4CC34344C5F085D@ex-nz1.globalbrain.net> References: <407A2CFDD3D8024187AFF7A7A4CC34344C5EE597@ex-nz1.globalbrain.net> <407A2CFDD3D8024187AFF7A7A4CC34344C5EE6C7@ex-nz1.globalbrain.net> <407A2CFDD3D8024187AFF7A7A4CC34344C5F03F9@ex-nz1.globalbrain.net> <5123DA32.806@oracle.com> <407A2CFDD3D8024187AFF7A7A4CC34344C5F085D@ex-nz1.globalbrain.net> Message-ID: <51242910.8080604@oracle.com> Hi Ashely, Off the top of my head there's also the intern string table. I'll have to look at the code to figure out what else it could be. Thanks for the info. JohnC On 2/19/2013 5:11 PM, Ashley Taylor wrote: > > Hi John > > I reran my application with the JIT log turned on. It seems that once > the application has been running for a while there is very little > activity within the JIT log but the pause times keep climbing, I ran > it for 4 hours and the 'Ext Root Scan' had climbed to 40ms. 
> > At the 4 hour point I also performed a full gc to see how many classes > would be unload and it was only 50. We have around 5500 loaded classes. > > The number of loaded classes also does not increase once the > application has run for a while. > > I also used jstat to see how full the permanent memory region is, it > is slowly climbing the full gc did not seem to reduce it at all, > however the full gc did fix the pause time. > > The permanent region is currently at 89.17% and seems to increase by > 0.01% every couple of minutes. > > Is there any other GC events that only happen at a full gc? > > Cheers, > > Ashley > > *From:*hotspot-gc-use-bounces at openjdk.java.net > [mailto:hotspot-gc-use-bounces at openjdk.java.net] *On Behalf Of *John > Cuthbertson > *Sent:* Wednesday, 20 February 2013 9:10 a.m. > *To:* hotspot-gc-use at openjdk.java.net > *Subject:* Re: G1 garbage collection Ext Root Scanning time increase > linearly as application runs > > Hi Ashley, > > Basically as you surmise one the GC worker threads is being held up > when processing a single root. I've seen s similar issue that's caused > by filling up the code cache (where JIT compiled methods are held). > The code cache is treated as a single root and so is claimed in its > entirety by a single GC worker thread. As a the code cache fills up, > the thread that claims the code cache to scan starts getting held up. > > A full GC clears the issue because that's where G1 currently does > class unloading: the full GC unloads a whole bunch of classes allowing > any the compiled code of any of the unloaded classes' methods to be > freed by the nmethod sweeper. So after a a full GC the number of > compiled methods in the code cache is less. > > It could also be the just the sheer number of loaded classes as the > system dictionary is also treated as a single claimable root. > > I think there's a couple existing CRs to track this. I'll see if I can > find the numbers. > > Regards, > > JohnC > > On 2/19/2013 11:24 AM, Ashley Taylor wrote: > > Hi Matt > > Seems that the issue I'm experiencing is unrelated to JNI same > issue with JNI calls mocked. > > Reading that post I noticed that your gc pauses where still > increasing after a full gc. In our case a full gc will fix the issue. > > Will have to keep hunting for the cause in my application. > > > Cheers, > > Ashley > > *From:*Matt Fowles [mailto:matt.fowles at gmail.com] > *Sent:* Tuesday, 19 February 2013 3:49 p.m. > *To:* Ashley Taylor > *Cc:* hotspot-gc-use at openjdk.java.net > > *Subject:* Re: G1 garbage collection Ext Root Scanning time > increase linearly as application runs > > Ashley~ > > The issue I was seeing was actually in CMS not G1, but it was > eventually tracked down to leaking LocalReferences in the JNI. > Each LocalRef (or likely GlobalRef) adds 4 bytes to a section > that has to be scanned every GC. If these build up without bound, > you end up with growing GC times. > > The issue that I found essentially boiled down to GetMethodID > calls creating a LocalRef and not being freed. > > You can find the full painful search here: > > http://web.archiveorange.com/archive/v/Dp7Rf33tij5BFBNRpVja > > My minimal reproduction is > > http://web.archiveorange.com/archive/v/Dp7Rf33tij5BFBNRpVja#YnJRjM4IVyt54TV > > I sincerely hope my painful experience can save you time ;-) > > Matt > > On Mon, Feb 18, 2013 at 9:29 PM, Ashley Taylor > > wrote: > > Hi Matt > > Thanks for the quick response. 
> > Yes we do have JNI in this setup, I will disable the JNI link and > rerun the test. > > If it is JNI can you elaborate what you mean by leaked handle in a > JNI thread and how we would go about identifying and fixing that. > > Cheers, > > Ashley > > *From:*Matt Fowles [mailto:matt.fowles at gmail.com > ] > *Sent:* Tuesday, 19 February 2013 3:04 p.m. > *To:* Ashley Taylor > *Cc:* hotspot-gc-use at openjdk.java.net > > *Subject:* Re: G1 garbage collection Ext Root Scanning time > increase linearly as application runs > > Ashley~ > > Do you have any JNI in the setup? I saw a similar issue that was > painstakingly tracked down to a leaked handle in a JNI thread. > > Matt > > On Mon, Feb 18, 2013 at 8:12 PM, Ashley Taylor > > wrote: > > Hi, > > We are testing the performance of the G1 garbage collection. > > Our goal is to be able to remove the full gc pause that eventually > happens when we CMS. > > We have noticed that the garbage collection pause time starts off > really well but over time it keeps climbing. > > Looking at the logs we see that the section that is increasing > linearly with time is the Ext Root Scanning > > Here is a Root Scanning 1 Hour into the application here the total > gc pause is around 80ms > > [Ext Root Scanning (ms): 11.5 0.8 1.5 1.8 1.6 4.8 1.2 1.5 > 1.2 1.4 1.1 1.6 1.2 1.1 1.1 1.1 1.2 1.2 > > Avg: 2.1, Min: 0.8, Max: 11.5, Diff: 10.7] > > Here is a snap shot after 19 hours. Here the pause is around 280ms > > [Ext Root Scanning (ms): 1.2 184.7 1.3 1.3 1.8 6.3 > 1.7 1.2 1.5 1.2 1.2 1.1 1.2 1.1 1.2 1.1 1.2 1.2 > > Avg: 11.8, Min: 1.1, Max: 184.7, Diff: 183.6] > > It seems that some task is linearly increasing with time, which > only effects one thread. > > After manually firing a full gc the total pause time returns back > to around 80ms > > After full GC > > [Ext Root Scanning (ms): 2.4 1.7 4.5 2.6 4.6 2.1 2.1 1.7 > 2.1 1.8 1.8 2.2 0.6 0.0 0.0 0.0 0.0 0.0 > > Avg: 1.7, Min: 0.0, Max: 4.6, Diff: 4.6] > > The test is run with a constant load applied on the application > that should hold the machine at around load 6. > > We have around 3GB of data within the heap which will very rarely > become garbage, life of these objects would be several hours to days. > > the rest will only live for 10s of milliseconds. > > The JVM memory usage floats between 4-6gb. > > Have checked a thread dump. There are no threads that have very > large stack traces. > > What could cause this increasing pause durations? Is there any way > to get more information out of what that thread is actually trying > to do, or any tuning options? > > Environment > > JVM Arguments > > -Xms8g > > -Xmx8g > > -XX:+UseG1GC > > -XX:InitiatingHeapOccupancyPercent=0 #found that having this at > zero has greatly reduced the frequency of GC pause over 500ms and > the overhead is not that noticeable to our application > > -XX:MaxGCPauseMillis=70 > > -XX:+UseLargePages > > Environment > > java version "1.7.0_13" > > Java(TM) SE Runtime Environment (build 1.7.0_13-b20) > > Java HotSpot(TM) 64-Bit Server VM (build 23.7-b01, mixed mode) > > Operating System > > redhat 5.8 machine. > > The machine has 12 cores/ 24threads and 48gb of ram. 
> > Cheers, > > *Ashley Taylor* > > Software Engineer > > Email:ashley.taylor at sli-systems.com > > > Website: www.sli-systems.com > > Blog: blog.sli-systems.com > > Podcast: EcommercePodcast.com > > Twitter: www.twitter.com/slisystems > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > > > > _______________________________________________ > > hotspot-gc-use mailing list > > hotspot-gc-use at openjdk.java.net > > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20130219/0c13f81c/attachment-0001.html From ysr1729 at gmail.com Tue Feb 19 22:24:55 2013 From: ysr1729 at gmail.com (Srinivas Ramakrishna) Date: Tue, 19 Feb 2013 22:24:55 -0800 Subject: G1 garbage collection Ext Root Scanning time increase linearly as application runs In-Reply-To: <51242910.8080604@oracle.com> References: <407A2CFDD3D8024187AFF7A7A4CC34344C5EE597@ex-nz1.globalbrain.net> <407A2CFDD3D8024187AFF7A7A4CC34344C5EE6C7@ex-nz1.globalbrain.net> <407A2CFDD3D8024187AFF7A7A4CC34344C5F03F9@ex-nz1.globalbrain.net> <5123DA32.806@oracle.com> <407A2CFDD3D8024187AFF7A7A4CC34344C5F085D@ex-nz1.globalbrain.net> <51242910.8080604@oracle.com> Message-ID: <147D1F53-A270-4CBA-8490-A242477EB1E1@gmail.com> Perhaps Ashley could build an instrumented jvm with time trace around the various external root groups scanned serially and the answer would be immediate? ysr1729 On Feb 19, 2013, at 17:38, John Cuthbertson wrote: > Hi Ashely, > > Off the top of my head there's also the intern string table. I'll have to look at the code to figure out what else it could be. > > Thanks for the info. > > JohnC > > On 2/19/2013 5:11 PM, Ashley Taylor wrote: >> Hi John >> >> I reran my application with the JIT log turned on. It seems that once the application has been running for a while there is very little activity within the JIT log but the pause times keep climbing, I ran it for 4 hours and the ?Ext Root Scan? had climbed to 40ms. >> >> At the 4 hour point I also performed a full gc to see how many classes would be unload and it was only 50. We have around 5500 loaded classes. >> The number of loaded classes also does not increase once the application has run for a while. >> >> I also used jstat to see how full the permanent memory region is, it is slowly climbing the full gc did not seem to reduce it at all, however the full gc did fix the pause time. >> >> The permanent region is currently at 89.17% and seems to increase by 0.01% every couple of minutes. >> >> Is there any other GC events that only happen at a full gc? >> >> Cheers, >> Ashley >> >> >> >> From: hotspot-gc-use-bounces at openjdk.java.net [mailto:hotspot-gc-use-bounces at openjdk.java.net] On Behalf Of John Cuthbertson >> Sent: Wednesday, 20 February 2013 9:10 a.m. >> To: hotspot-gc-use at openjdk.java.net >> Subject: Re: G1 garbage collection Ext Root Scanning time increase linearly as application runs >> >> Hi Ashley, >> >> Basically as you surmise one the GC worker threads is being held up when processing a single root. I've seen s similar issue that's caused by filling up the code cache (where JIT compiled methods are held). The code cache is treated as a single root and so is claimed in its entirety by a single GC worker thread. 
As a the code cache fills up, the thread that claims the code cache to scan starts getting held up. >> >> A full GC clears the issue because that's where G1 currently does class unloading: the full GC unloads a whole bunch of classes allowing any the compiled code of any of the unloaded classes' methods to be freed by the nmethod sweeper. So after a a full GC the number of compiled methods in the code cache is less. >> >> It could also be the just the sheer number of loaded classes as the system dictionary is also treated as a single claimable root. >> >> I think there's a couple existing CRs to track this. I'll see if I can find the numbers. >> >> Regards, >> >> JohnC >> >> On 2/19/2013 11:24 AM, Ashley Taylor wrote: >> Hi Matt >> >> Seems that the issue I?m experiencing is unrelated to JNI same issue with JNI calls mocked. >> Reading that post I noticed that your gc pauses where still increasing after a full gc. In our case a full gc will fix the issue. >> Will have to keep hunting for the cause in my application. >> >> >> Cheers, >> Ashley >> >> From: Matt Fowles [mailto:matt.fowles at gmail.com] >> Sent: Tuesday, 19 February 2013 3:49 p.m. >> To: Ashley Taylor >> Cc: hotspot-gc-use at openjdk.java.net >> Subject: Re: G1 garbage collection Ext Root Scanning time increase linearly as application runs >> >> Ashley~ >> >> The issue I was seeing was actually in CMS not G1, but it was eventually tracked down to leaking LocalReferences in the JNI. Each LocalRef (or likely GlobalRef) adds 4 bytes to a section that has to be scanned every GC. If these build up without bound, you end up with growing GC times. >> >> The issue that I found essentially boiled down to GetMethodID calls creating a LocalRef and not being freed. >> >> You can find the full painful search here: >> >> http://web.archiveorange.com/archive/v/Dp7Rf33tij5BFBNRpVja >> >> My minimal reproduction is >> >> http://web.archiveorange.com/archive/v/Dp7Rf33tij5BFBNRpVja#YnJRjM4IVyt54TV >> >> I sincerely hope my painful experience can save you time ;-) >> >> Matt >> >> >> >> >> On Mon, Feb 18, 2013 at 9:29 PM, Ashley Taylor wrote: >> Hi Matt >> Thanks for the quick response. >> >> Yes we do have JNI in this setup, I will disable the JNI link and rerun the test. >> If it is JNI can you elaborate what you mean by leaked handle in a JNI thread and how we would go about identifying and fixing that. >> >> Cheers, >> Ashley >> >> From: Matt Fowles [mailto:matt.fowles at gmail.com] >> Sent: Tuesday, 19 February 2013 3:04 p.m. >> To: Ashley Taylor >> Cc: hotspot-gc-use at openjdk.java.net >> Subject: Re: G1 garbage collection Ext Root Scanning time increase linearly as application runs >> >> Ashley~ >> >> Do you have any JNI in the setup? I saw a similar issue that was painstakingly tracked down to a leaked handle in a JNI thread. >> >> Matt >> >> >> On Mon, Feb 18, 2013 at 8:12 PM, Ashley Taylor wrote: >> Hi, >> >> We are testing the performance of the G1 garbage collection. >> Our goal is to be able to remove the full gc pause that eventually happens when we CMS. >> >> We have noticed that the garbage collection pause time starts off really well but over time it keeps climbing. 
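If the single slow root is the system dictionary, as suggested above, the loaded-class count should drift upwards in step with the pause times. A minimal sketch of logging that from inside the application with the standard ClassLoadingMXBean follows; the class name and the one-minute interval are arbitrary, and the same bean is also reachable remotely over JMX.

    import java.lang.management.ClassLoadingMXBean;
    import java.lang.management.ManagementFactory;

    public class ClassCountWatcher {
        public static void main(String[] args) throws InterruptedException {
            ClassLoadingMXBean cl = ManagementFactory.getClassLoadingMXBean();
            while (true) {
                System.out.println(System.currentTimeMillis()
                        + " loaded=" + cl.getLoadedClassCount()
                        + " totalLoaded=" + cl.getTotalLoadedClassCount()
                        + " unloaded=" + cl.getUnloadedClassCount());
                Thread.sleep(60000);   // one sample a minute, easy to line up with GC log timestamps
            }
        }
    }

A flat loaded-class count over a run where Ext Root Scanning keeps growing, which is what Ashley reports above, would point at one of the other serially scanned roots instead.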
>> >> Looking at the logs we see that the section that is increasing linearly with time is the Ext Root Scanning >> Here is a Root Scanning 1 Hour into the application here the total gc pause is around 80ms >> [Ext Root Scanning (ms): 11.5 0.8 1.5 1.8 1.6 4.8 1.2 1.5 1.2 1.4 1.1 1.6 1.2 1.1 1.1 1.1 1.2 1.2 >> Avg: 2.1, Min: 0.8, Max: 11.5, Diff: 10.7] >> >> >> Here is a snap shot after 19 hours. Here the pause is around 280ms >> [Ext Root Scanning (ms): 1.2 184.7 1.3 1.3 1.8 6.3 1.7 1.2 1.5 1.2 1.2 1.1 1.2 1.1 1.2 1.1 1.2 1.2 >> Avg: 11.8, Min: 1.1, Max: 184.7, Diff: 183.6] >> >> It seems that some task is linearly increasing with time, which only effects one thread. >> >> After manually firing a full gc the total pause time returns back to around 80ms >> >> After full GC >> [Ext Root Scanning (ms): 2.4 1.7 4.5 2.6 4.6 2.1 2.1 1.7 2.1 1.8 1.8 2.2 0.6 0.0 0.0 0.0 0.0 0.0 >> Avg: 1.7, Min: 0.0, Max: 4.6, Diff: 4.6] >> >> >> The test is run with a constant load applied on the application that should hold the machine at around load 6. >> We have around 3GB of data within the heap which will very rarely become garbage, life of these objects would be several hours to days. >> the rest will only live for 10s of milliseconds. >> The JVM memory usage floats between 4-6gb. >> >> Have checked a thread dump. There are no threads that have very large stack traces. >> What could cause this increasing pause durations? Is there any way to get more information out of what that thread is actually trying to do, or any tuning options? >> >> >> Environment >> >> JVM Arguments >> -Xms8g >> -Xmx8g >> -XX:+UseG1GC >> -XX:InitiatingHeapOccupancyPercent=0 #found that having this at zero has greatly reduced the frequency of GC pause over 500ms and the overhead is not that noticeable to our application >> -XX:MaxGCPauseMillis=70 >> -XX:+UseLargePages >> >> >> Environment >> java version "1.7.0_13" >> Java(TM) SE Runtime Environment (build 1.7.0_13-b20) >> Java HotSpot(TM) 64-Bit Server VM (build 23.7-b01, mixed mode) >> >> >> Operating System >> redhat 5.8 machine. >> The machine has 12 cores/ 24threads and 48gb of ram. >> >> >> >> Cheers, >> Ashley Taylor >> Software Engineer >> Email: ashley.taylor at sli-systems.com >> Website: www.sli-systems.com >> Blog: blog.sli-systems.com >> Podcast: EcommercePodcast.com >> Twitter: www.twitter.com/slisystems >> >> >> >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> >> >> >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20130219/dd586629/attachment-0001.html From john.cuthbertson at oracle.com Wed Feb 20 10:56:07 2013 From: john.cuthbertson at oracle.com (John Cuthbertson) Date: Wed, 20 Feb 2013 10:56:07 -0800 Subject: G1 garbage collection Ext Root Scanning time increase linearly as application runs In-Reply-To: <147D1F53-A270-4CBA-8490-A242477EB1E1@gmail.com> References: <407A2CFDD3D8024187AFF7A7A4CC34344C5EE597@ex-nz1.globalbrain.net> <407A2CFDD3D8024187AFF7A7A4CC34344C5EE6C7@ex-nz1.globalbrain.net> <407A2CFDD3D8024187AFF7A7A4CC34344C5F03F9@ex-nz1.globalbrain.net> <5123DA32.806@oracle.com> <407A2CFDD3D8024187AFF7A7A4CC34344C5F085D@ex-nz1.globalbrain.net> <51242910.8080604@oracle.com> <147D1F53-A270-4CBA-8490-A242477EB1E1@gmail.com> Message-ID: <51251C47.6090009@oracle.com> Hi Ramki, This is what I was thinking. An internal group has also seen the same problem and has offered to run with an instrumented build. If Ashley is willing I could supply a temporary patch. JohnC On 2/19/2013 10:24 PM, Srinivas Ramakrishna wrote: > Perhaps Ashley could build an instrumented jvm with time trace around > the various external root groups scanned serially and the answer would > be immediate? > > ysr1729 > > On Feb 19, 2013, at 17:38, John Cuthbertson > > wrote: > >> Hi Ashely, >> >> Off the top of my head there's also the intern string table. I'll >> have to look at the code to figure out what else it could be. >> >> Thanks for the info. >> >> JohnC >> >> On 2/19/2013 5:11 PM, Ashley Taylor wrote: >>> >>> Hi John >>> >>> I reran my application with the JIT log turned on. It seems that >>> once the application has been running for a while there is very >>> little activity within the JIT log but the pause times keep >>> climbing, I ran it for 4 hours and the ?Ext Root Scan? had climbed >>> to 40ms. >>> >>> At the 4 hour point I also performed a full gc to see how many >>> classes would be unload and it was only 50. We have around 5500 >>> loaded classes. >>> >>> The number of loaded classes also does not increase once the >>> application has run for a while. >>> >>> I also used jstat to see how full the permanent memory region is, it >>> is slowly climbing the full gc did not seem to reduce it at all, >>> however the full gc did fix the pause time. >>> >>> The permanent region is currently at 89.17% and seems to increase by >>> 0.01% every couple of minutes. >>> >>> Is there any other GC events that only happen at a full gc? >>> >>> Cheers, >>> >>> Ashley >>> >>> *From:*hotspot-gc-use-bounces at openjdk.java.net >>> [mailto:hotspot-gc-use-bounces at openjdk.java.net] *On Behalf Of *John >>> Cuthbertson >>> *Sent:* Wednesday, 20 February 2013 9:10 a.m. >>> *To:* hotspot-gc-use at openjdk.java.net >>> *Subject:* Re: G1 garbage collection Ext Root Scanning time increase >>> linearly as application runs >>> >>> Hi Ashley, >>> >>> Basically as you surmise one the GC worker threads is being held up >>> when processing a single root. I've seen s similar issue that's >>> caused by filling up the code cache (where JIT compiled methods are >>> held). The code cache is treated as a single root and so is claimed >>> in its entirety by a single GC worker thread. As a the code cache >>> fills up, the thread that claims the code cache to scan starts >>> getting held up. 
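The code-cache theory above is also cheap to check from inside the JVM: the JIT code cache is exposed as a non-heap memory pool, so its occupancy can be logged next to the GC log and compared with the growing Ext Root Scanning maxima. A rough sketch is below; the pool is normally named "Code Cache" on HotSpot 7, and the class name and sampling interval are made up.

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryPoolMXBean;
    import java.lang.management.MemoryUsage;

    public class CodeCacheWatcher {
        public static void main(String[] args) throws InterruptedException {
            while (true) {
                for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
                    if (pool.getName().contains("Code Cache")) {
                        MemoryUsage u = pool.getUsage();
                        System.out.println(System.currentTimeMillis()
                                + " code cache used=" + (u.getUsed() >> 10) + "KB"
                                + " committed=" + (u.getCommitted() >> 10) + "KB"
                                + " max=" + (u.getMax() >> 10) + "KB");
                    }
                }
                Thread.sleep(60000);   // sample once a minute alongside the GC log
            }
        }
    }

If the used figure levels off while the root-scanning max keeps climbing, the code cache is probably not the root that the unlucky worker thread is stuck on.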
>>> >>> A full GC clears the issue because that's where G1 currently does >>> class unloading: the full GC unloads a whole bunch of classes >>> allowing any the compiled code of any of the unloaded classes' >>> methods to be freed by the nmethod sweeper. So after a a full GC the >>> number of compiled methods in the code cache is less. >>> >>> It could also be the just the sheer number of loaded classes as the >>> system dictionary is also treated as a single claimable root. >>> >>> I think there's a couple existing CRs to track this. I'll see if I >>> can find the numbers. >>> >>> Regards, >>> >>> JohnC >>> >>> On 2/19/2013 11:24 AM, Ashley Taylor wrote: >>> >>> Hi Matt >>> >>> Seems that the issue I?m experiencing is unrelated to JNI same >>> issue with JNI calls mocked. >>> >>> Reading that post I noticed that your gc pauses where still >>> increasing after a full gc. In our case a full gc will fix the >>> issue. >>> >>> Will have to keep hunting for the cause in my application. >>> >>> >>> Cheers, >>> >>> Ashley >>> >>> *From:*Matt Fowles [mailto:matt.fowles at gmail.com] >>> *Sent:* Tuesday, 19 February 2013 3:49 p.m. >>> *To:* Ashley Taylor >>> *Cc:* hotspot-gc-use at openjdk.java.net >>> >>> *Subject:* Re: G1 garbage collection Ext Root Scanning time >>> increase linearly as application runs >>> >>> Ashley~ >>> >>> The issue I was seeing was actually in CMS not G1, but it was >>> eventually tracked down to leaking LocalReferences in the JNI. >>> Each LocalRef (or likely GlobalRef) adds 4 bytes to a section >>> that has to be scanned every GC. If these build up without >>> bound, you end up with growing GC times. >>> >>> The issue that I found essentially boiled down to GetMethodID >>> calls creating a LocalRef and not being freed. >>> >>> You can find the full painful search here: >>> >>> http://web.archiveorange.com/archive/v/Dp7Rf33tij5BFBNRpVja >>> >>> My minimal reproduction is >>> >>> http://web.archiveorange.com/archive/v/Dp7Rf33tij5BFBNRpVja#YnJRjM4IVyt54TV >>> >>> I sincerely hope my painful experience can save you time ;-) >>> >>> Matt >>> >>> On Mon, Feb 18, 2013 at 9:29 PM, Ashley Taylor >>> >> > wrote: >>> >>> Hi Matt >>> >>> Thanks for the quick response. >>> >>> Yes we do have JNI in this setup, I will disable the JNI link >>> and rerun the test. >>> >>> If it is JNI can you elaborate what you mean by leaked handle in >>> a JNI thread and how we would go about identifying and fixing that. >>> >>> Cheers, >>> >>> Ashley >>> >>> *From:*Matt Fowles [mailto:matt.fowles at gmail.com >>> ] >>> *Sent:* Tuesday, 19 February 2013 3:04 p.m. >>> *To:* Ashley Taylor >>> *Cc:* hotspot-gc-use at openjdk.java.net >>> >>> *Subject:* Re: G1 garbage collection Ext Root Scanning time >>> increase linearly as application runs >>> >>> Ashley~ >>> >>> Do you have any JNI in the setup? I saw a similar issue that >>> was painstakingly tracked down to a leaked handle in a JNI thread. >>> >>> Matt >>> >>> On Mon, Feb 18, 2013 at 8:12 PM, Ashley Taylor >>> >> > wrote: >>> >>> Hi, >>> >>> We are testing the performance of the G1 garbage collection. >>> >>> Our goal is to be able to remove the full gc pause that >>> eventually happens when we CMS. >>> >>> We have noticed that the garbage collection pause time starts >>> off really well but over time it keeps climbing. 
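The interned-string table mentioned earlier in the thread is another root that is scanned serially, and it is easy to stress in isolation to see whether table growth alone reproduces climbing Ext Root Scanning numbers. The toy driver below is purely a sketch; the class name, count and sleep values are invented, and it is meant to be run with the same G1 flags while watching the GC log.

    public class InternStress {
        public static void main(String[] args) throws InterruptedException {
            int n = args.length > 0 ? Integer.parseInt(args[0]) : 1000000;
            for (int i = 0; i < n; i++) {
                // Every unique string interned here adds an entry to the VM-wide string table,
                // which G1 walks as one of the external roots during each pause.
                ("ext-root-stress-" + i).intern();
                if (i % 100000 == 0) {
                    Thread.sleep(10);   // let a few young GCs happen while the table grows
                }
            }
            System.out.println("interned " + n + " strings - check Ext Root Scanning in the GC log");
            Thread.sleep(60000);        // keep the VM alive so later pauses are logged too
        }
    }

The jmap figures Ashley posts further down show the application's own intern table growing only about 10% over 15 hours, so in this particular case the string table looks like a less likely culprit than the other serial roots, but the experiment is a quick way to rule it in or out.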
>>> >>> Looking at the logs we see that the section that is increasing >>> linearly with time is the Ext Root Scanning >>> >>> Here is a Root Scanning 1 Hour into the application here the >>> total gc pause is around 80ms >>> >>> [Ext Root Scanning (ms): 11.5 0.8 1.5 1.8 1.6 4.8 1.2 1.5 >>> 1.2 1.4 1.1 1.6 1.2 1.1 1.1 1.1 1.2 1.2 >>> >>> Avg: 2.1, Min: 0.8, Max: 11.5, Diff: 10.7] >>> >>> Here is a snap shot after 19 hours. Here the pause is around 280ms >>> >>> [Ext Root Scanning (ms): 1.2 184.7 1.3 1.3 1.8 6.3 >>> 1.7 1.2 1.5 1.2 1.2 1.1 1.2 1.1 1.2 1.1 1.2 1.2 >>> >>> Avg: 11.8, Min: 1.1, Max: 184.7, Diff: 183.6] >>> >>> It seems that some task is linearly increasing with time, which >>> only effects one thread. >>> >>> After manually firing a full gc the total pause time returns >>> back to around 80ms >>> >>> After full GC >>> >>> [Ext Root Scanning (ms): 2.4 1.7 4.5 2.6 4.6 2.1 2.1 1.7 >>> 2.1 1.8 1.8 2.2 0.6 0.0 0.0 0.0 0.0 0.0 >>> >>> Avg: 1.7, Min: 0.0, Max: 4.6, Diff: 4.6] >>> >>> The test is run with a constant load applied on the application >>> that should hold the machine at around load 6. >>> >>> We have around 3GB of data within the heap which will very >>> rarely become garbage, life of these objects would be several >>> hours to days. >>> >>> the rest will only live for 10s of milliseconds. >>> >>> The JVM memory usage floats between 4-6gb. >>> >>> Have checked a thread dump. There are no threads that have very >>> large stack traces. >>> >>> What could cause this increasing pause durations? Is there any >>> way to get more information out of what that thread is actually >>> trying to do, or any tuning options? >>> >>> Environment >>> >>> JVM Arguments >>> >>> -Xms8g >>> >>> -Xmx8g >>> >>> -XX:+UseG1GC >>> >>> -XX:InitiatingHeapOccupancyPercent=0 #found that having this at >>> zero has greatly reduced the frequency of GC pause over 500ms >>> and the overhead is not that noticeable to our application >>> >>> -XX:MaxGCPauseMillis=70 >>> >>> -XX:+UseLargePages >>> >>> Environment >>> >>> java version "1.7.0_13" >>> >>> Java(TM) SE Runtime Environment (build 1.7.0_13-b20) >>> >>> Java HotSpot(TM) 64-Bit Server VM (build 23.7-b01, mixed mode) >>> >>> Operating System >>> >>> redhat 5.8 machine. >>> >>> The machine has 12 cores/ 24threads and 48gb of ram. >>> >>> Cheers, >>> >>> *Ashley Taylor* >>> >>> Software Engineer >>> >>> Email:ashley.taylor at sli-systems.com >>> >>> >>> Website: www.sli-systems.com >>> >>> Blog: blog.sli-systems.com >>> >>> Podcast: EcommercePodcast.com >>> >>> Twitter: www.twitter.com/slisystems >>> >>> >>> _______________________________________________ >>> hotspot-gc-use mailing list >>> hotspot-gc-use at openjdk.java.net >>> >>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>> >>> >>> >>> >>> _______________________________________________ >>> >>> hotspot-gc-use mailing list >>> >>> hotspot-gc-use at openjdk.java.net >>> >>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>> >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20130220/1ac5eedd/attachment-0001.html From ashley.taylor at sli-systems.com Wed Feb 20 11:12:54 2013 From: ashley.taylor at sli-systems.com (Ashley Taylor) Date: Wed, 20 Feb 2013 19:12:54 +0000 Subject: G1 garbage collection Ext Root Scanning time increase linearly as application runs In-Reply-To: <51251C47.6090009@oracle.com> References: <407A2CFDD3D8024187AFF7A7A4CC34344C5EE597@ex-nz1.globalbrain.net> <407A2CFDD3D8024187AFF7A7A4CC34344C5EE6C7@ex-nz1.globalbrain.net> <407A2CFDD3D8024187AFF7A7A4CC34344C5F03F9@ex-nz1.globalbrain.net> <5123DA32.806@oracle.com> <407A2CFDD3D8024187AFF7A7A4CC34344C5F085D@ex-nz1.globalbrain.net> <51242910.8080604@oracle.com> <147D1F53-A270-4CBA-8490-A242477EB1E1@gmail.com> <51251C47.6090009@oracle.com> Message-ID: <407A2CFDD3D8024187AFF7A7A4CC34344C5F1EF3@ex-nz1.globalbrain.net> Hi John, I would be willing to run the instrumented build. I ran the test for 15 hours, watching the intern string table using jmap. It grew by about 10%: Start: 10917 interned Strings occupying 951752 bytes. End: 11801 interned Strings occupying 1031976 bytes. Over the same test the Ext Root Scanning pauses increased to 130ms. Cheers, Ashley From: John Cuthbertson [mailto:john.cuthbertson at oracle.com] Sent: Thursday, 21 February 2013 7:56 a.m. To: Srinivas Ramakrishna Cc: Ashley Taylor; hotspot-gc-use at openjdk.java.net Subject: Re: G1 garbage collection Ext Root Scanning time increase linearly as application runs Hi Ramki, This is what I was thinking. An internal group has also seen the same problem and has offered to run with an instrumented build. If Ashley is willing I could supply a temporary patch. JohnC On 2/19/2013 10:24 PM, Srinivas Ramakrishna wrote: Perhaps Ashley could build an instrumented jvm with time trace around the various external root groups scanned serially and the answer would be immediate? ysr1729 On Feb 19, 2013, at 17:38, John Cuthbertson > wrote: Hi Ashely, Off the top of my head there's also the intern string table. I'll have to look at the code to figure out what else it could be. Thanks for the info. JohnC On 2/19/2013 5:11 PM, Ashley Taylor wrote: Hi John I reran my application with the JIT log turned on. It seems that once the application has been running for a while there is very little activity within the JIT log but the pause times keep climbing, I ran it for 4 hours and the ?Ext Root Scan? had climbed to 40ms. At the 4 hour point I also performed a full gc to see how many classes would be unload and it was only 50. We have around 5500 loaded classes. The number of loaded classes also does not increase once the application has run for a while. I also used jstat to see how full the permanent memory region is, it is slowly climbing the full gc did not seem to reduce it at all, however the full gc did fix the pause time. The permanent region is currently at 89.17% and seems to increase by 0.01% every couple of minutes. Is there any other GC events that only happen at a full gc? Cheers, Ashley From: hotspot-gc-use-bounces at openjdk.java.net [mailto:hotspot-gc-use-bounces at openjdk.java.net] On Behalf Of John Cuthbertson Sent: Wednesday, 20 February 2013 9:10 a.m. To: hotspot-gc-use at openjdk.java.net Subject: Re: G1 garbage collection Ext Root Scanning time increase linearly as application runs Hi Ashley, Basically as you surmise one the GC worker threads is being held up when processing a single root. 
I've seen s similar issue that's caused by filling up the code cache (where JIT compiled methods are held). The code cache is treated as a single root and so is claimed in its entirety by a single GC worker thread. As a the code cache fills up, the thread that claims the code cache to scan starts getting held up. A full GC clears the issue because that's where G1 currently does class unloading: the full GC unloads a whole bunch of classes allowing any the compiled code of any of the unloaded classes' methods to be freed by the nmethod sweeper. So after a a full GC the number of compiled methods in the code cache is less. It could also be the just the sheer number of loaded classes as the system dictionary is also treated as a single claimable root. I think there's a couple existing CRs to track this. I'll see if I can find the numbers. Regards, JohnC On 2/19/2013 11:24 AM, Ashley Taylor wrote: Hi Matt Seems that the issue I?m experiencing is unrelated to JNI same issue with JNI calls mocked. Reading that post I noticed that your gc pauses where still increasing after a full gc. In our case a full gc will fix the issue. Will have to keep hunting for the cause in my application. Cheers, Ashley From: Matt Fowles [mailto:matt.fowles at gmail.com] Sent: Tuesday, 19 February 2013 3:49 p.m. To: Ashley Taylor Cc: hotspot-gc-use at openjdk.java.net Subject: Re: G1 garbage collection Ext Root Scanning time increase linearly as application runs Ashley~ The issue I was seeing was actually in CMS not G1, but it was eventually tracked down to leaking LocalReferences in the JNI. Each LocalRef (or likely GlobalRef) adds 4 bytes to a section that has to be scanned every GC. If these build up without bound, you end up with growing GC times. The issue that I found essentially boiled down to GetMethodID calls creating a LocalRef and not being freed. You can find the full painful search here: http://web.archiveorange.com/archive/v/Dp7Rf33tij5BFBNRpVja My minimal reproduction is http://web.archiveorange.com/archive/v/Dp7Rf33tij5BFBNRpVja#YnJRjM4IVyt54TV I sincerely hope my painful experience can save you time ;-) Matt On Mon, Feb 18, 2013 at 9:29 PM, Ashley Taylor > wrote: Hi Matt Thanks for the quick response. Yes we do have JNI in this setup, I will disable the JNI link and rerun the test. If it is JNI can you elaborate what you mean by leaked handle in a JNI thread and how we would go about identifying and fixing that. Cheers, Ashley From: Matt Fowles [mailto:matt.fowles at gmail.com] Sent: Tuesday, 19 February 2013 3:04 p.m. To: Ashley Taylor Cc: hotspot-gc-use at openjdk.java.net Subject: Re: G1 garbage collection Ext Root Scanning time increase linearly as application runs Ashley~ Do you have any JNI in the setup? I saw a similar issue that was painstakingly tracked down to a leaked handle in a JNI thread. Matt On Mon, Feb 18, 2013 at 8:12 PM, Ashley Taylor > wrote: Hi, We are testing the performance of the G1 garbage collection. Our goal is to be able to remove the full gc pause that eventually happens when we CMS. We have noticed that the garbage collection pause time starts off really well but over time it keeps climbing. Looking at the logs we see that the section that is increasing linearly with time is the Ext Root Scanning Here is a Root Scanning 1 Hour into the application here the total gc pause is around 80ms [Ext Root Scanning (ms): 11.5 0.8 1.5 1.8 1.6 4.8 1.2 1.5 1.2 1.4 1.1 1.6 1.2 1.1 1.1 1.1 1.2 1.2 Avg: 2.1, Min: 0.8, Max: 11.5, Diff: 10.7] Here is a snap shot after 19 hours. 
Here the pause is around 280ms [Ext Root Scanning (ms): 1.2 184.7 1.3 1.3 1.8 6.3 1.7 1.2 1.5 1.2 1.2 1.1 1.2 1.1 1.2 1.1 1.2 1.2 Avg: 11.8, Min: 1.1, Max: 184.7, Diff: 183.6] It seems that some task is linearly increasing with time, which only effects one thread. After manually firing a full gc the total pause time returns back to around 80ms After full GC [Ext Root Scanning (ms): 2.4 1.7 4.5 2.6 4.6 2.1 2.1 1.7 2.1 1.8 1.8 2.2 0.6 0.0 0.0 0.0 0.0 0.0 Avg: 1.7, Min: 0.0, Max: 4.6, Diff: 4.6] The test is run with a constant load applied on the application that should hold the machine at around load 6. We have around 3GB of data within the heap which will very rarely become garbage, life of these objects would be several hours to days. the rest will only live for 10s of milliseconds. The JVM memory usage floats between 4-6gb. Have checked a thread dump. There are no threads that have very large stack traces. What could cause this increasing pause durations? Is there any way to get more information out of what that thread is actually trying to do, or any tuning options? Environment JVM Arguments -Xms8g -Xmx8g -XX:+UseG1GC -XX:InitiatingHeapOccupancyPercent=0 #found that having this at zero has greatly reduced the frequency of GC pause over 500ms and the overhead is not that noticeable to our application -XX:MaxGCPauseMillis=70 -XX:+UseLargePages Environment java version "1.7.0_13" Java(TM) SE Runtime Environment (build 1.7.0_13-b20) Java HotSpot(TM) 64-Bit Server VM (build 23.7-b01, mixed mode) Operating System redhat 5.8 machine. The machine has 12 cores/ 24threads and 48gb of ram. Cheers, Ashley Taylor Software Engineer Email: ashley.taylor at sli-systems.com Website: www.sli-systems.com Blog: blog.sli-systems.com Podcast: EcommercePodcast.com Twitter: www.twitter.com/slisystems _______________________________________________ hotspot-gc-use mailing list hotspot-gc-use at openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use _______________________________________________ hotspot-gc-use mailing list hotspot-gc-use at openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use _______________________________________________ hotspot-gc-use mailing list hotspot-gc-use at openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20130220/e5959e52/attachment-0001.html From reachbach at yahoo.com Fri Feb 22 01:26:58 2013 From: reachbach at yahoo.com (Bharath R) Date: Fri, 22 Feb 2013 01:26:58 -0800 (PST) Subject: G1 status in JDK1.6 Vs JDK1.7 In-Reply-To: <5112F380.5040403@oracle.com> References: <1359962096.18794.YahooMailNeo@web162101.mail.bf1.yahoo.com> <1359962818.10581.YahooMailNeo@web162103.mail.bf1.yahoo.com> <5112F380.5040403@oracle.com> Message-ID: <1361525218.18687.YahooMailNeo@web162102.mail.bf1.yahoo.com> Jesper, Thanks for the clarification. I'm now running benchmarks against JDK7. -Bharath ________________________________ From: Jesper Wilhelmsson To: Bharath R Cc: "hotspot-gc-use at openjdk.java.net" Sent: Thursday, February 7, 2013 5:51 AM Subject: Re: G1 status in JDK1.6 Vs JDK1.7 Hi Bharath, The first supported release of G1 was with 7u4. The 7u4 version came with significant improvements and I do not recommend doing performance evaluations with earlier versions. If you decide to move to JDK 7 and try G1 please share your experiences. 
/Jesper On 4/2/13 8:26 AM, Bharath R wrote: > Hi, > > Is the G1 GC 1.6 port on par with the 1.7 in terms of stability / > quality? If that is true, I intend to begin experimenting with it in > production and gradually roll it out across our deployment based on the > outcome. On a related note, we intend to use G1 for an online system > with a very low pause time requirement ( <10ms). The hardware is > heterogeneous in terms of memory (ranges between 12G - 32G available to > the application process) with comparable CPU configuration. CMS required > considerable tuning to achieve acceptable results and I'm hoping G1 > would fare better without myraid config options or overrides. > I'd like to know of comparisons / experience operating G1 in production > under such conditions. Thanks in advance. > > -Bharath > > P.S: Using RTJ is not an option for us :) > > > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20130222/ee654e59/attachment.html
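For the kind of JDK 7 G1 benchmarking Bharath describes, the standard GarbageCollectorMXBean gives a cheap cross-check on the GC logs when comparing collectors or JDK builds: collection counts and accumulated pause time per collector. The sketch below is only illustrative; bean names depend on the collector in use, for example "G1 Young Generation" and "G1 Old Generation" under G1, and the class name and interval are arbitrary.

    import java.lang.management.GarbageCollectorMXBean;
    import java.lang.management.ManagementFactory;

    public class GcPauseSummary {
        public static void main(String[] args) throws InterruptedException {
            while (true) {
                for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                    System.out.println(gc.getName()
                            + " collections=" + gc.getCollectionCount()
                            + " totalTimeMs=" + gc.getCollectionTime());
                }
                Thread.sleep(60000);
            }
        }
    }

getCollectionTime() is cumulative, so it only gives a coarse average pause; for a sub-10ms pause target the detailed GC log remains the primary source.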