From ryebrye at gmail.com  Tue May  7 07:09:30 2013
From: ryebrye at gmail.com (Ryan Gardner)
Date: Tue, 7 May 2013 10:09:30 -0400
Subject: Is "-XX:G1OldCSetRegionLiveThresholdPercent" a flag in java 7?
Message-ID:

In the slides posted for the G1 tuning session at Java One 2012 here:

http://www.myexpospace.com/JavaOne2012/SessionFiles/CON6583_PDF_6583_0001.pdf

I see "-XX:G1OldCSetRegionLiveThresholdPercent" being listed as one of
the options to try to tune long mixed GCs.

I tried using this on Java 1.7.0_21 but it comes back as being an
unrecognized VM option.

Is there another secret flag I need to enable to try to tune these bits
more?

From jesper.wilhelmsson at oracle.com  Tue May  7 08:57:54 2013
From: jesper.wilhelmsson at oracle.com (Jesper Wilhelmsson)
Date: Tue, 07 May 2013 17:57:54 +0200
Subject: Is "-XX:G1OldCSetRegionLiveThresholdPercent" a flag in java 7?
In-Reply-To:
References:
Message-ID: <51892482.5090501@oracle.com>

Hi Ryan,

-XX:G1OldCSetRegionLiveThresholdPercent has been replaced by
-XX:G1MixedGCLiveThresholdPercent

It is also an experimental option, which means you should only use it if
you know what you are doing. To enable experimental options, use
-XX:+UnlockExperimentalVMOptions as shown in some of the examples in the
presentation.
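For example, a complete invocation would look something like this (the
threshold value here is only a placeholder to show the syntax, not a
recommendation):

  java -XX:+UseG1GC \
       -XX:+UnlockExperimentalVMOptions \
       -XX:G1MixedGCLiveThresholdPercent=85 \
       ...

Note that -XX:+UnlockExperimentalVMOptions has to appear before the
experimental option it unlocks on the command line.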
Hth,
/Jesper


Ryan Gardner skrev 7/5/13 4:09 PM:
> In the slides posted for the G1 tuning session at Java One 2012 here:
>
> http://www.myexpospace.com/JavaOne2012/SessionFiles/CON6583_PDF_6583_0001.pdf
>
> I see "-XX:G1OldCSetRegionLiveThresholdPercent" being listed as one of
> the options to try to tune long mixed GCs.
>
> I tried using this on Java 1.7.0_21 but it comes back as being an
> unrecognized VM option.
>
> Is there another secret flag I need to enable to try to tune these
> bits more?
>
> _______________________________________________
> hotspot-gc-use mailing list
> hotspot-gc-use at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use

From monica.beckwith at oracle.com  Tue May  7 09:54:39 2013
From: monica.beckwith at oracle.com (Monica Beckwith)
Date: Tue, 07 May 2013 11:54:39 -0500
Subject: Is "-XX:G1OldCSetRegionLiveThresholdPercent" a flag in java 7?
In-Reply-To: <51892482.5090501@oracle.com>
References: <51892482.5090501@oracle.com>
Message-ID: <518931CF.8060702@oracle.com>

Ryan,

We have also renamed a couple of other flags. You will find them in
this CR:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=8001424

Thanks!
Monica

On 5/7/2013 10:57 AM, Jesper Wilhelmsson wrote:
> Hi Ryan,
>
> -XX:G1OldCSetRegionLiveThresholdPercent has been replaced by
> -XX:G1MixedGCLiveThresholdPercent
>
> It is also an experimental option, which means you should only use it
> if you know what you are doing. To enable experimental options, use
> -XX:+UnlockExperimentalVMOptions as shown in some of the examples in
> the presentation.
>
> Hth,
> /Jesper
>
> Ryan Gardner skrev 7/5/13 4:09 PM:
>> [...]

-- 
Monica Beckwith | Principal Member of Technical Staff
VOIP: +15124011274
Oracle Java Performance

From the.6th.month at gmail.com  Thu May 16 21:19:22 2013
From: the.6th.month at gmail.com (the.6th.month at gmail.com)
Date: Fri, 17 May 2013 12:19:22 +0800
Subject: unexpected full gc time spike
Message-ID:

hi, all:

We just had a situation that I don't quite understand with CMS GC. When
I examined the GC log, I found a CMS cycle during which a ParNew
promotion failure and a concurrent mode failure happened at the same
time, and the resulting full GC lasted slightly over three minutes.
Here is the gc log:

2013-05-17T10:12:55.983+0800: 45168.774: [CMS-concurrent-mark: 7.056/7.860 secs] [Times: user=14.90 sys=0.45, real=7.86 secs]
2013-05-17T10:12:55.984+0800: 45168.775: [CMS-concurrent-preclean-start]
2013-05-17T10:12:56.753+0800: 45169.544: [CMS-concurrent-preclean: 0.676/0.770 secs] [Times: user=0.83 sys=0.15, real=0.77 secs]
2013-05-17T10:12:56.753+0800: 45169.544: [CMS-concurrent-abortable-preclean-start]
2013-05-17T10:12:58.460+0800: 45171.251: [GC 45171.252: [ParNew (promotion failed)
Desired survivor size 67108864 bytes, new threshold 1 (max 6)
- age   1:   70527216 bytes,   70527216 total
: 917504K->917504K(917504K), 177.3558880 secs]45348.608: [CMS CMS: abort preclean due to time 2013-05-17T10:15:56.197+0800: 45348.989: [CMS-concurrent-abortable-preclean: 2.037/179.444 secs] [Times: user=44.72 sys=13.59, real=179.45 secs]
 (concurrent mode failure): 3017323K->2177093K(3047424K), 16.5528620 secs] 3879476K->2177093K(3964928K), [CMS Perm : 91333K->91008K(262144K)], 193.9097970 secs] [Times: user=58.79 sys=13.55, real=193.91 secs]

The usual CMS full GC time was roughly 100ms-400ms, but this time it
lasted for 193 seconds. I understand that when a ParNew GC happens
during a CMS cycle and the to-space is not large enough to hold all
surviving objects, or the remaining space in the old gen cannot cope
with the promotions into it, a full GC happens. But I don't understand
why it hangs so long.

I am using oracle jdk 1.6.0_37, and the jvm options we use are:
-Xms4000m -Xmx4000m -Xmn1G -Xss256k -XX:PermSize=256m -XX:SurvivorRatio=6
-XX:MaxTenuringThreshold=6 -XX:+DisableExplicitGC -Xnoclassgc -Xverify:none
-XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled
-XX:+UseCMSCompactAtFullCollection -XX:+UseFastAccessorMethods
-XX:+CMSPermGenSweepingEnabled -XX:+CMSClassUnloadingEnabled
-XX:+UseCompressedOops -XX:CMSInitiatingOccupancyFraction=90
-XX:+UseCMSInitiatingOccupancyOnly

Could it be a bug that causes the long full GC in case of promotion
failure, or something else? Could anyone offer me some help? I'd really
appreciate it.

Looking forward to any reply.

All the best,
Leon

From ysr1729 at gmail.com  Thu May 16 21:38:44 2013
From: ysr1729 at gmail.com (Srinivas Ramakrishna)
Date: Thu, 16 May 2013 21:38:44 -0700
Subject: unexpected full gc time spike
In-Reply-To:
References:
Message-ID:

Hi Leon --

Yes, there are a couple of performance bugs related to promotion
failure handling with ParNew+CMS that can cause this time to balloon.
Here the unwind of the failed promotion took 177 s.
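For what it's worth, the 177 s and the total are right there in the log
line if you pull it apart (rounding slightly):

   177.36 s   ParNew (promotion failed)  -- unwinding the failed promotion
 +  16.55 s   (concurrent mode failure)  -- the compacting collection itself
  ---------
   193.91 s   total real time, against user=58.79 sys=13.55

user being so much smaller than real is consistent with the time going
into a mostly serialized slow path rather than parallel copying.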
I have at least a partial fix for this which I had written up a few
months ago but never quite got around to collecting sufficient
performance data to submit it as an official patch.

I'll try and revive that patch and submit it... Maybe someone else can
check whether it helps sufficiently with performance under promotion
failure.

-- ramki

On Thu, May 16, 2013 at 9:19 PM, the.6th.month at gmail.com wrote:
> hi, all:
> [...]
> The usual CMS full GC time was roughly 100ms-400ms, but this time it
> lasted for 193 seconds. I understand that when a ParNew GC happens
> during a CMS cycle and the to-space is not large enough to hold all
> surviving objects, or the remaining space in the old gen cannot cope
> with the promotions into it, a full GC happens. But I don't understand
> why it hangs so long.
> I am using oracle jdk 1.6.0_37, and the jvm options we use are: > -Xms4000m -Xmx4000m -Xmn1G -Xss256k -XX:PermSize=256m -XX:SurvivorRatio=6 > -XX:MaxTenuringThreshold=6 -XX:+DisableExplicitGC -Xnoclassgc -Xverify:none > -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled > -XX:+UseCMSCompactAtFullCollection -XX:+UseFastAccessorMethods > -XX:+CMSPermGenSweepingEnabled -XX:+CMSClassUnloadingEnabled > -XX:+UseCompressedOops -XX:CMSInitiatingOccupancyFraction=90 > -XX:+UseCMSInitiatingOccupancyOnly > > Could it be a bug that results in the long full gc in case of promotion > failure or something else? Could anyone offer me some help, and I really > appreciate your help. > > Looking forward to any reply. > > All the best, > Leon > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > From the.6th.month at gmail.com Thu May 16 21:39:56 2013 From: the.6th.month at gmail.com (the.6th.month at gmail.com) Date: Fri, 17 May 2013 12:39:56 +0800 Subject: unexpected full gc time spike In-Reply-To: References: Message-ID: thanks very much indeed, hope we can see your patch soon On 17 May 2013 12:38, Srinivas Ramakrishna wrote: > Hi Leon -- > > Yes, there are a couple of performance bugs related to promotion > failure handling with ParNew+CMS that can cause this time to balloon. > Here the unwind of the failed promotion took 177 s. I have at least a > partial fix for this which I had written up a few months ago but never > quite got around to collecting sufficient performance data to submit > it as an official patch. > > I'll try and revive that patch and submit it... May be someone else > can check if it helps sufficiently in the performance with promotion > failure. > > -- ramki > > > On Thu, May 16, 2013 at 9:19 PM, the.6th.month at gmail.com > wrote: > > hi, all: > > We just had a situation that I don't quite understand with CMS gc. When I > > examined the gc log, I found that there was a cms gc which resulted in a > > parnew promotion failure and concurrent mode failure at the same time, > and > > then the full gc lasted for slightly over three minutes. Here is the gc > log: > > 2013-05-17T10:12:55.983+0800: 45168.774: [CMS-concurrent-mark: > 7.056/7.860 > > secs] [Times: user=14.90 sys=0.45, real=7.86 secs] > > 2013-05-17T10:12:55.984+0800: 45168.775: [CMS-concurrent-preclean-start] > > 2013-05-17T10:12:56.753+0800: 45169.544: [CMS-concurrent-preclean: > > 0.676/0.770 secs] [Times: user=0.83 sys=0.15, real=0.77 secs] > > 2013-05-17T10:12:56.753+0800: 45169.544: > > [CMS-concurrent-abortable-preclean-start] > > 2013-05-17T10:12:58.460+0800: 45171.251: [GC 45171.252: [ParNew > (promotion > > failed) > > Desired survivor size 67108864 bytes, new threshold 1 (max 6) > > - age 1: 70527216 bytes, 70527216 total > > : 917504K->917504K(917504K), 177.3558880 secs]45348.608: [CMS CMS: abort > > preclean due to time 2013-05-17T10:15:56.197+0800: 45348.989: > > [CMS-concurrent-abortable-preclean: 2.037/179.444 secs] [Times: > user=44.72 > > sys=13.59, real=179.45 secs] > > (concurrent mode failure): 3017323K->2177093K(3047424K), 16.5528620 > secs] > > 3879476K->2177093K(3964928K), [CMS Perm : 91333K->91008K(262144K)], > > 193.9097970 secs] [Times: user=58.79 sys=13.55, real=193.91 secs] > > > > the usual cms full gc time was roughly 100ms-400ms, but this time it > lasted > > for 193 seconds. 
I understand that when there's a parnew gc happens > during > > cms and to space is not large enough to hold all survived objects, or the > > remaining space in old gen cannot cope with memory allocation in old gen, > > full gc happens. But I don't understand why it hangs so long. > > I am using oracle jdk 1.6.0_37, and the jvm options we use are: > > -Xms4000m -Xmx4000m -Xmn1G -Xss256k -XX:PermSize=256m -XX:SurvivorRatio=6 > > -XX:MaxTenuringThreshold=6 -XX:+DisableExplicitGC -Xnoclassgc > -Xverify:none > > -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled > > -XX:+UseCMSCompactAtFullCollection -XX:+UseFastAccessorMethods > > -XX:+CMSPermGenSweepingEnabled -XX:+CMSClassUnloadingEnabled > > -XX:+UseCompressedOops -XX:CMSInitiatingOccupancyFraction=90 > > -XX:+UseCMSInitiatingOccupancyOnly > > > > Could it be a bug that results in the long full gc in case of promotion > > failure or something else? Could anyone offer me some help, and I really > > appreciate your help. > > > > Looking forward to any reply. > > > > All the best, > > Leon > > > > > > _______________________________________________ > > hotspot-gc-use mailing list > > hotspot-gc-use at openjdk.java.net > > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20130517/1fdd6f2a/attachment.html From the.6th.month at gmail.com Thu May 16 22:06:54 2013 From: the.6th.month at gmail.com (the.6th.month at gmail.com) Date: Fri, 17 May 2013 13:06:54 +0800 Subject: unexpected full gc time spike In-Reply-To: References: Message-ID: hi, Ramki: btw, could you possibly explain what the bugs are and how those bugs affect the fallback fullgc time? I am really curious about the reason. thanks very much. all the best, Leon On 17 May 2013 12:39, "the.6th.month at gmail.com" wrote: > thanks very much indeed, hope we can see your patch soon > > > On 17 May 2013 12:38, Srinivas Ramakrishna wrote: > >> Hi Leon -- >> >> Yes, there are a couple of performance bugs related to promotion >> failure handling with ParNew+CMS that can cause this time to balloon. >> Here the unwind of the failed promotion took 177 s. I have at least a >> partial fix for this which I had written up a few months ago but never >> quite got around to collecting sufficient performance data to submit >> it as an official patch. >> >> I'll try and revive that patch and submit it... May be someone else >> can check if it helps sufficiently in the performance with promotion >> failure. >> >> -- ramki >> >> >> On Thu, May 16, 2013 at 9:19 PM, the.6th.month at gmail.com >> wrote: >> > hi, all: >> > We just had a situation that I don't quite understand with CMS gc. When >> I >> > examined the gc log, I found that there was a cms gc which resulted in a >> > parnew promotion failure and concurrent mode failure at the same time, >> and >> > then the full gc lasted for slightly over three minutes. 
Here is the gc >> log: >> > 2013-05-17T10:12:55.983+0800: 45168.774: [CMS-concurrent-mark: >> 7.056/7.860 >> > secs] [Times: user=14.90 sys=0.45, real=7.86 secs] >> > 2013-05-17T10:12:55.984+0800: 45168.775: [CMS-concurrent-preclean-start] >> > 2013-05-17T10:12:56.753+0800: 45169.544: [CMS-concurrent-preclean: >> > 0.676/0.770 secs] [Times: user=0.83 sys=0.15, real=0.77 secs] >> > 2013-05-17T10:12:56.753+0800: 45169.544: >> > [CMS-concurrent-abortable-preclean-start] >> > 2013-05-17T10:12:58.460+0800: 45171.251: [GC 45171.252: [ParNew >> (promotion >> > failed) >> > Desired survivor size 67108864 bytes, new threshold 1 (max 6) >> > - age 1: 70527216 bytes, 70527216 total >> > : 917504K->917504K(917504K), 177.3558880 secs]45348.608: [CMS CMS: abort >> > preclean due to time 2013-05-17T10:15:56.197+0800: 45348.989: >> > [CMS-concurrent-abortable-preclean: 2.037/179.444 secs] [Times: >> user=44.72 >> > sys=13.59, real=179.45 secs] >> > (concurrent mode failure): 3017323K->2177093K(3047424K), 16.5528620 >> secs] >> > 3879476K->2177093K(3964928K), [CMS Perm : 91333K->91008K(262144K)], >> > 193.9097970 secs] [Times: user=58.79 sys=13.55, real=193.91 secs] >> > >> > the usual cms full gc time was roughly 100ms-400ms, but this time it >> lasted >> > for 193 seconds. I understand that when there's a parnew gc happens >> during >> > cms and to space is not large enough to hold all survived objects, or >> the >> > remaining space in old gen cannot cope with memory allocation in old >> gen, >> > full gc happens. But I don't understand why it hangs so long. >> > I am using oracle jdk 1.6.0_37, and the jvm options we use are: >> > -Xms4000m -Xmx4000m -Xmn1G -Xss256k -XX:PermSize=256m >> -XX:SurvivorRatio=6 >> > -XX:MaxTenuringThreshold=6 -XX:+DisableExplicitGC -Xnoclassgc >> -Xverify:none >> > -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled >> > -XX:+UseCMSCompactAtFullCollection -XX:+UseFastAccessorMethods >> > -XX:+CMSPermGenSweepingEnabled -XX:+CMSClassUnloadingEnabled >> > -XX:+UseCompressedOops -XX:CMSInitiatingOccupancyFraction=90 >> > -XX:+UseCMSInitiatingOccupancyOnly >> > >> > Could it be a bug that results in the long full gc in case of promotion >> > failure or something else? Could anyone offer me some help, and I really >> > appreciate your help. >> > >> > Looking forward to any reply. >> > >> > All the best, >> > Leon >> > >> > >> > _______________________________________________ >> > hotspot-gc-use mailing list >> > hotspot-gc-use at openjdk.java.net >> > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> > >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20130517/4695f7cc/attachment-0001.html From ysr1729 at gmail.com Fri May 17 09:37:42 2013 From: ysr1729 at gmail.com (Srinivas Ramakrishna) Date: Fri, 17 May 2013 09:37:42 -0700 Subject: unexpected full gc time spike In-Reply-To: References: Message-ID: Hi Leon -- Here's the history of that discussion, starting with this email (follow subject thread): http://mail.openjdk.java.net/pipermail/hotspot-gc-use/2012-October/001370.html On Thu, May 16, 2013 at 10:06 PM, the.6th.month at gmail.com wrote: > hi, Ramki: > btw, could you possibly explain what the bugs are and how those bugs affect > the fallback fullgc time? I am really curious about the reason. > thanks very much. 
> > all the best, > Leon > > On 17 May 2013 12:39, "the.6th.month at gmail.com" > wrote: >> >> thanks very much indeed, hope we can see your patch soon >> >> >> On 17 May 2013 12:38, Srinivas Ramakrishna wrote: >>> >>> Hi Leon -- >>> >>> Yes, there are a couple of performance bugs related to promotion >>> failure handling with ParNew+CMS that can cause this time to balloon. >>> Here the unwind of the failed promotion took 177 s. I have at least a >>> partial fix for this which I had written up a few months ago but never >>> quite got around to collecting sufficient performance data to submit >>> it as an official patch. >>> >>> I'll try and revive that patch and submit it... May be someone else >>> can check if it helps sufficiently in the performance with promotion >>> failure. >>> >>> -- ramki >>> >>> >>> On Thu, May 16, 2013 at 9:19 PM, the.6th.month at gmail.com >>> wrote: >>> > hi, all: >>> > We just had a situation that I don't quite understand with CMS gc. When >>> > I >>> > examined the gc log, I found that there was a cms gc which resulted in >>> > a >>> > parnew promotion failure and concurrent mode failure at the same time, >>> > and >>> > then the full gc lasted for slightly over three minutes. Here is the gc >>> > log: >>> > 2013-05-17T10:12:55.983+0800: 45168.774: [CMS-concurrent-mark: >>> > 7.056/7.860 >>> > secs] [Times: user=14.90 sys=0.45, real=7.86 secs] >>> > 2013-05-17T10:12:55.984+0800: 45168.775: >>> > [CMS-concurrent-preclean-start] >>> > 2013-05-17T10:12:56.753+0800: 45169.544: [CMS-concurrent-preclean: >>> > 0.676/0.770 secs] [Times: user=0.83 sys=0.15, real=0.77 secs] >>> > 2013-05-17T10:12:56.753+0800: 45169.544: >>> > [CMS-concurrent-abortable-preclean-start] >>> > 2013-05-17T10:12:58.460+0800: 45171.251: [GC 45171.252: [ParNew >>> > (promotion >>> > failed) >>> > Desired survivor size 67108864 bytes, new threshold 1 (max 6) >>> > - age 1: 70527216 bytes, 70527216 total >>> > : 917504K->917504K(917504K), 177.3558880 secs]45348.608: [CMS CMS: >>> > abort >>> > preclean due to time 2013-05-17T10:15:56.197+0800: 45348.989: >>> > [CMS-concurrent-abortable-preclean: 2.037/179.444 secs] [Times: >>> > user=44.72 >>> > sys=13.59, real=179.45 secs] >>> > (concurrent mode failure): 3017323K->2177093K(3047424K), 16.5528620 >>> > secs] >>> > 3879476K->2177093K(3964928K), [CMS Perm : 91333K->91008K(262144K)], >>> > 193.9097970 secs] [Times: user=58.79 sys=13.55, real=193.91 secs] >>> > >>> > the usual cms full gc time was roughly 100ms-400ms, but this time it >>> > lasted >>> > for 193 seconds. I understand that when there's a parnew gc happens >>> > during >>> > cms and to space is not large enough to hold all survived objects, or >>> > the >>> > remaining space in old gen cannot cope with memory allocation in old >>> > gen, >>> > full gc happens. But I don't understand why it hangs so long. >>> > I am using oracle jdk 1.6.0_37, and the jvm options we use are: >>> > -Xms4000m -Xmx4000m -Xmn1G -Xss256k -XX:PermSize=256m >>> > -XX:SurvivorRatio=6 >>> > -XX:MaxTenuringThreshold=6 -XX:+DisableExplicitGC -Xnoclassgc >>> > -Xverify:none >>> > -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled >>> > -XX:+UseCMSCompactAtFullCollection -XX:+UseFastAccessorMethods >>> > -XX:+CMSPermGenSweepingEnabled -XX:+CMSClassUnloadingEnabled >>> > -XX:+UseCompressedOops -XX:CMSInitiatingOccupancyFraction=90 >>> > -XX:+UseCMSInitiatingOccupancyOnly >>> > >>> > Could it be a bug that results in the long full gc in case of promotion >>> > failure or something else? 
Could anyone offer me some help, and I >>> > really >>> > appreciate your help. >>> > >>> > Looking forward to any reply. >>> > >>> > All the best, >>> > Leon >>> > >>> > >>> > _______________________________________________ >>> > hotspot-gc-use mailing list >>> > hotspot-gc-use at openjdk.java.net >>> > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>> > >> >> > From ysr1729 at gmail.com Fri May 17 10:05:16 2013 From: ysr1729 at gmail.com (Srinivas Ramakrishna) Date: Fri, 17 May 2013 10:05:16 -0700 Subject: unexpected full gc time spike In-Reply-To: References: Message-ID: Looks like the search functionality of bugs.sun.com is no longer available. I tried searching the new bugzilla portal for the bug I had submitted around that time, but that doesn't bring up the bug when i use the normal search terms, so I do not know if the bug report is still in review or not, and whether it ever made it into the set of hotspot/gc bugs or not, but the Review ID i recvd was:- "Your Report (Review ID: 2391561) - Promotion failure code does not scale " I'll try and dig up the (raw, tentative) patch and send it in soon. -- ramki On Fri, May 17, 2013 at 9:37 AM, Srinivas Ramakrishna wrote: > Hi Leon -- > > Here's the history of that discussion, starting with this email > (follow subject thread): > > http://mail.openjdk.java.net/pipermail/hotspot-gc-use/2012-October/001370.html > > On Thu, May 16, 2013 at 10:06 PM, the.6th.month at gmail.com > wrote: >> hi, Ramki: >> btw, could you possibly explain what the bugs are and how those bugs affect >> the fallback fullgc time? I am really curious about the reason. >> thanks very much. >> >> all the best, >> Leon >> >> On 17 May 2013 12:39, "the.6th.month at gmail.com" >> wrote: >>> >>> thanks very much indeed, hope we can see your patch soon >>> >>> >>> On 17 May 2013 12:38, Srinivas Ramakrishna wrote: >>>> >>>> Hi Leon -- >>>> >>>> Yes, there are a couple of performance bugs related to promotion >>>> failure handling with ParNew+CMS that can cause this time to balloon. >>>> Here the unwind of the failed promotion took 177 s. I have at least a >>>> partial fix for this which I had written up a few months ago but never >>>> quite got around to collecting sufficient performance data to submit >>>> it as an official patch. >>>> >>>> I'll try and revive that patch and submit it... May be someone else >>>> can check if it helps sufficiently in the performance with promotion >>>> failure. >>>> >>>> -- ramki >>>> >>>> >>>> On Thu, May 16, 2013 at 9:19 PM, the.6th.month at gmail.com >>>> wrote: >>>> > hi, all: >>>> > We just had a situation that I don't quite understand with CMS gc. When >>>> > I >>>> > examined the gc log, I found that there was a cms gc which resulted in >>>> > a >>>> > parnew promotion failure and concurrent mode failure at the same time, >>>> > and >>>> > then the full gc lasted for slightly over three minutes. 
Here is the gc >>>> > log: >>>> > 2013-05-17T10:12:55.983+0800: 45168.774: [CMS-concurrent-mark: >>>> > 7.056/7.860 >>>> > secs] [Times: user=14.90 sys=0.45, real=7.86 secs] >>>> > 2013-05-17T10:12:55.984+0800: 45168.775: >>>> > [CMS-concurrent-preclean-start] >>>> > 2013-05-17T10:12:56.753+0800: 45169.544: [CMS-concurrent-preclean: >>>> > 0.676/0.770 secs] [Times: user=0.83 sys=0.15, real=0.77 secs] >>>> > 2013-05-17T10:12:56.753+0800: 45169.544: >>>> > [CMS-concurrent-abortable-preclean-start] >>>> > 2013-05-17T10:12:58.460+0800: 45171.251: [GC 45171.252: [ParNew >>>> > (promotion >>>> > failed) >>>> > Desired survivor size 67108864 bytes, new threshold 1 (max 6) >>>> > - age 1: 70527216 bytes, 70527216 total >>>> > : 917504K->917504K(917504K), 177.3558880 secs]45348.608: [CMS CMS: >>>> > abort >>>> > preclean due to time 2013-05-17T10:15:56.197+0800: 45348.989: >>>> > [CMS-concurrent-abortable-preclean: 2.037/179.444 secs] [Times: >>>> > user=44.72 >>>> > sys=13.59, real=179.45 secs] >>>> > (concurrent mode failure): 3017323K->2177093K(3047424K), 16.5528620 >>>> > secs] >>>> > 3879476K->2177093K(3964928K), [CMS Perm : 91333K->91008K(262144K)], >>>> > 193.9097970 secs] [Times: user=58.79 sys=13.55, real=193.91 secs] >>>> > >>>> > the usual cms full gc time was roughly 100ms-400ms, but this time it >>>> > lasted >>>> > for 193 seconds. I understand that when there's a parnew gc happens >>>> > during >>>> > cms and to space is not large enough to hold all survived objects, or >>>> > the >>>> > remaining space in old gen cannot cope with memory allocation in old >>>> > gen, >>>> > full gc happens. But I don't understand why it hangs so long. >>>> > I am using oracle jdk 1.6.0_37, and the jvm options we use are: >>>> > -Xms4000m -Xmx4000m -Xmn1G -Xss256k -XX:PermSize=256m >>>> > -XX:SurvivorRatio=6 >>>> > -XX:MaxTenuringThreshold=6 -XX:+DisableExplicitGC -Xnoclassgc >>>> > -Xverify:none >>>> > -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled >>>> > -XX:+UseCMSCompactAtFullCollection -XX:+UseFastAccessorMethods >>>> > -XX:+CMSPermGenSweepingEnabled -XX:+CMSClassUnloadingEnabled >>>> > -XX:+UseCompressedOops -XX:CMSInitiatingOccupancyFraction=90 >>>> > -XX:+UseCMSInitiatingOccupancyOnly >>>> > >>>> > Could it be a bug that results in the long full gc in case of promotion >>>> > failure or something else? Could anyone offer me some help, and I >>>> > really >>>> > appreciate your help. >>>> > >>>> > Looking forward to any reply. >>>> > >>>> > All the best, >>>> > Leon >>>> > >>>> > >>>> > _______________________________________________ >>>> > hotspot-gc-use mailing list >>>> > hotspot-gc-use at openjdk.java.net >>>> > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>>> > >>> >>> >> From the.6th.month at gmail.com Fri May 17 10:14:01 2013 From: the.6th.month at gmail.com (the.6th.month at gmail.com) Date: Sat, 18 May 2013 01:14:01 +0800 Subject: unexpected full gc time spike In-Reply-To: References: Message-ID: thanks ramki, looking forward to it Leon On 18 May 2013 01:05, "Srinivas Ramakrishna" wrote: > Looks like the search functionality of bugs.sun.com is no longer > available. 
I tried searching the new bugzilla portal for the bug I had > submitted around that time, but that doesn't bring up the bug when i > use the normal search terms, so I do not know if the bug report is > still in review or not, and whether it ever made it into the set of > hotspot/gc bugs or not, but the Review ID i recvd was:- > > "Your Report (Review ID: 2391561) - Promotion failure code does not > scale " > > I'll try and dig up the (raw, tentative) patch and send it in soon. > > -- ramki > > > On Fri, May 17, 2013 at 9:37 AM, Srinivas Ramakrishna > wrote: > > Hi Leon -- > > > > Here's the history of that discussion, starting with this email > > (follow subject thread): > > > > > http://mail.openjdk.java.net/pipermail/hotspot-gc-use/2012-October/001370.html > > > > On Thu, May 16, 2013 at 10:06 PM, the.6th.month at gmail.com > > wrote: > >> hi, Ramki: > >> btw, could you possibly explain what the bugs are and how those bugs > affect > >> the fallback fullgc time? I am really curious about the reason. > >> thanks very much. > >> > >> all the best, > >> Leon > >> > >> On 17 May 2013 12:39, "the.6th.month at gmail.com" < > the.6th.month at gmail.com> > >> wrote: > >>> > >>> thanks very much indeed, hope we can see your patch soon > >>> > >>> > >>> On 17 May 2013 12:38, Srinivas Ramakrishna wrote: > >>>> > >>>> Hi Leon -- > >>>> > >>>> Yes, there are a couple of performance bugs related to promotion > >>>> failure handling with ParNew+CMS that can cause this time to balloon. > >>>> Here the unwind of the failed promotion took 177 s. I have at least a > >>>> partial fix for this which I had written up a few months ago but never > >>>> quite got around to collecting sufficient performance data to submit > >>>> it as an official patch. > >>>> > >>>> I'll try and revive that patch and submit it... May be someone else > >>>> can check if it helps sufficiently in the performance with promotion > >>>> failure. > >>>> > >>>> -- ramki > >>>> > >>>> > >>>> On Thu, May 16, 2013 at 9:19 PM, the.6th.month at gmail.com > >>>> wrote: > >>>> > hi, all: > >>>> > We just had a situation that I don't quite understand with CMS gc. > When > >>>> > I > >>>> > examined the gc log, I found that there was a cms gc which resulted > in > >>>> > a > >>>> > parnew promotion failure and concurrent mode failure at the same > time, > >>>> > and > >>>> > then the full gc lasted for slightly over three minutes. 
Here is > the gc > >>>> > log: > >>>> > 2013-05-17T10:12:55.983+0800: 45168.774: [CMS-concurrent-mark: > >>>> > 7.056/7.860 > >>>> > secs] [Times: user=14.90 sys=0.45, real=7.86 secs] > >>>> > 2013-05-17T10:12:55.984+0800: 45168.775: > >>>> > [CMS-concurrent-preclean-start] > >>>> > 2013-05-17T10:12:56.753+0800: 45169.544: [CMS-concurrent-preclean: > >>>> > 0.676/0.770 secs] [Times: user=0.83 sys=0.15, real=0.77 secs] > >>>> > 2013-05-17T10:12:56.753+0800: 45169.544: > >>>> > [CMS-concurrent-abortable-preclean-start] > >>>> > 2013-05-17T10:12:58.460+0800: 45171.251: [GC 45171.252: [ParNew > >>>> > (promotion > >>>> > failed) > >>>> > Desired survivor size 67108864 bytes, new threshold 1 (max 6) > >>>> > - age 1: 70527216 bytes, 70527216 total > >>>> > : 917504K->917504K(917504K), 177.3558880 secs]45348.608: [CMS CMS: > >>>> > abort > >>>> > preclean due to time 2013-05-17T10:15:56.197+0800: 45348.989: > >>>> > [CMS-concurrent-abortable-preclean: 2.037/179.444 secs] [Times: > >>>> > user=44.72 > >>>> > sys=13.59, real=179.45 secs] > >>>> > (concurrent mode failure): 3017323K->2177093K(3047424K), 16.5528620 > >>>> > secs] > >>>> > 3879476K->2177093K(3964928K), [CMS Perm : 91333K->91008K(262144K)], > >>>> > 193.9097970 secs] [Times: user=58.79 sys=13.55, real=193.91 secs] > >>>> > > >>>> > the usual cms full gc time was roughly 100ms-400ms, but this time it > >>>> > lasted > >>>> > for 193 seconds. I understand that when there's a parnew gc happens > >>>> > during > >>>> > cms and to space is not large enough to hold all survived objects, > or > >>>> > the > >>>> > remaining space in old gen cannot cope with memory allocation in old > >>>> > gen, > >>>> > full gc happens. But I don't understand why it hangs so long. > >>>> > I am using oracle jdk 1.6.0_37, and the jvm options we use are: > >>>> > -Xms4000m -Xmx4000m -Xmn1G -Xss256k -XX:PermSize=256m > >>>> > -XX:SurvivorRatio=6 > >>>> > -XX:MaxTenuringThreshold=6 -XX:+DisableExplicitGC -Xnoclassgc > >>>> > -Xverify:none > >>>> > -XX:+UseParNewGC -XX:+UseConcMarkSweepGC > -XX:+CMSParallelRemarkEnabled > >>>> > -XX:+UseCMSCompactAtFullCollection -XX:+UseFastAccessorMethods > >>>> > -XX:+CMSPermGenSweepingEnabled -XX:+CMSClassUnloadingEnabled > >>>> > -XX:+UseCompressedOops -XX:CMSInitiatingOccupancyFraction=90 > >>>> > -XX:+UseCMSInitiatingOccupancyOnly > >>>> > > >>>> > Could it be a bug that results in the long full gc in case of > promotion > >>>> > failure or something else? Could anyone offer me some help, and I > >>>> > really > >>>> > appreciate your help. > >>>> > > >>>> > Looking forward to any reply. > >>>> > > >>>> > All the best, > >>>> > Leon > >>>> > > >>>> > > >>>> > _______________________________________________ > >>>> > hotspot-gc-use mailing list > >>>> > hotspot-gc-use at openjdk.java.net > >>>> > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > >>>> > > >>> > >>> > >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20130518/6633aaca/attachment-0001.html From jon.masamitsu at oracle.com Fri May 17 10:29:48 2013 From: jon.masamitsu at oracle.com (Jon Masamitsu) Date: Fri, 17 May 2013 10:29:48 -0700 Subject: unexpected full gc time spike In-Reply-To: References: Message-ID: <5196690C.6040900@oracle.com> I found it 8005060: Promotion failure code does not scale On 5/17/2013 10:05 AM, Srinivas Ramakrishna wrote: > Looks like the search functionality of bugs.sun.com is no longer > available. 
I tried searching the new bugzilla portal for the bug I had > submitted around that time, but that doesn't bring up the bug when i > use the normal search terms, so I do not know if the bug report is > still in review or not, and whether it ever made it into the set of > hotspot/gc bugs or not, but the Review ID i recvd was:- > > "Your Report (Review ID: 2391561) - Promotion failure code does not scale " > > I'll try and dig up the (raw, tentative) patch and send it in soon. > > -- ramki > > > On Fri, May 17, 2013 at 9:37 AM, Srinivas Ramakrishna wrote: >> Hi Leon -- >> >> Here's the history of that discussion, starting with this email >> (follow subject thread): >> >> http://mail.openjdk.java.net/pipermail/hotspot-gc-use/2012-October/001370.html >> >> On Thu, May 16, 2013 at 10:06 PM, the.6th.month at gmail.com >> wrote: >>> hi, Ramki: >>> btw, could you possibly explain what the bugs are and how those bugs affect >>> the fallback fullgc time? I am really curious about the reason. >>> thanks very much. >>> >>> all the best, >>> Leon >>> >>> On 17 May 2013 12:39, "the.6th.month at gmail.com" >>> wrote: >>>> thanks very much indeed, hope we can see your patch soon >>>> >>>> >>>> On 17 May 2013 12:38, Srinivas Ramakrishna wrote: >>>>> Hi Leon -- >>>>> >>>>> Yes, there are a couple of performance bugs related to promotion >>>>> failure handling with ParNew+CMS that can cause this time to balloon. >>>>> Here the unwind of the failed promotion took 177 s. I have at least a >>>>> partial fix for this which I had written up a few months ago but never >>>>> quite got around to collecting sufficient performance data to submit >>>>> it as an official patch. >>>>> >>>>> I'll try and revive that patch and submit it... May be someone else >>>>> can check if it helps sufficiently in the performance with promotion >>>>> failure. >>>>> >>>>> -- ramki >>>>> >>>>> >>>>> On Thu, May 16, 2013 at 9:19 PM, the.6th.month at gmail.com >>>>> wrote: >>>>>> hi, all: >>>>>> We just had a situation that I don't quite understand with CMS gc. When >>>>>> I >>>>>> examined the gc log, I found that there was a cms gc which resulted in >>>>>> a >>>>>> parnew promotion failure and concurrent mode failure at the same time, >>>>>> and >>>>>> then the full gc lasted for slightly over three minutes. 
Here is the gc
>>>>>> log: [...]
>
> _______________________________________________
> hotspot-gc-use mailing list
> hotspot-gc-use at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use

From the.6th.month at gmail.com  Sat May 18 02:05:49 2013
From: the.6th.month at gmail.com (the.6th.month at gmail.com)
Date: Sat, 18 May 2013 17:05:49 +0800
Subject: unexpected full gc time spike
In-Reply-To: <5196690C.6040900@oracle.com>
References: <5196690C.6040900@oracle.com>
Message-ID:

Hi, Jon & Ramki:

Sorry, I can't get access to that page; the browser says the webpage is
currently unavailable. But I did look through the whole mail thread
regarding this issue, and I am wondering if I get it right: if each
thread failed fast and fell back to the single-threaded full GC
immediately, there wouldn't be such a long pause. But under the current
mechanism there is no global flag to coordinate the fallback, so each
thread keeps retrying the allocation of a new PLAB, which can result in
heavy lock contention and hence the extremely long pause.
Is that correct?
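To make sure I'm asking about the right mechanism, here is a toy model
of what I have in mind -- this is made-up code, not the HotSpot sources,
and all class, method and constant names are invented:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Toy model of promotion with per-thread PLABs -- NOT HotSpot code. */
public class PromotionFailureToy {
    static final Object expandLock = new Object();   // shared old-gen lock
    static long oldGenFreeWords = 1_000_000;         // shared free space
    static final int PLAB_WORDS = 4096;
    static final int OBJ_WORDS = 16;

    // When a worker's PLAB is exhausted it refills from the shared
    // free space under the lock; returns 0 when nothing is left.
    static long refillPlab() {
        synchronized (expandLock) {
            if (oldGenFreeWords < PLAB_WORDS) return 0;
            oldGenFreeWords -= PLAB_WORDS;
            return PLAB_WORDS;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(8);
        long start = System.nanoTime();
        for (int t = 0; t < 8; t++) {
            pool.execute(() -> {
                long plab = 0;
                for (int obj = 0; obj < 2_000_000; obj++) {
                    if (plab < OBJ_WORDS) {
                        // Once the shared space is exhausted, this lock
                        // is taken again for every single object -- the
                        // promotions no longer fail fast, they serialize.
                        plab = refillPlab();
                        if (plab == 0) continue;
                    }
                    plab -= OBJ_WORDS;   // "copy" one object
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.MINUTES);
        System.out.printf("simulated promotion phase: %.2f s%n",
                (System.nanoTime() - start) / 1e9);
    }
}

In a model like this, once the shared space is gone every remaining
promotion attempt serializes on the lock, which is the kind of
pathology I suspect.

Leon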
On 18 May 2013 01:29, Jon Masamitsu wrote:
> I found it
>
> 8005060: Promotion failure code does not scale
>
> [...]
From bengt.rutisson at oracle.com  Sun May 19 22:39:40 2013
From: bengt.rutisson at oracle.com (Bengt Rutisson)
Date: Mon, 20 May 2013 07:39:40 +0200
Subject: unexpected full gc time spike
In-Reply-To:
References: <5196690C.6040900@oracle.com>
Message-ID: <5199B71C.5070807@oracle.com>

Hi Leon,

Here is the link to the bug that should be available to you:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=8005060

Ramki,

The bug report made it in to the hotspot/gc bug set just a couple of
days after you filed it, in December 2012.
It has been classified as an enhancement, and since we are in a bug
fixing phase right now we don't have anybody assigned to fixing it. I
haven't been following this thread closely enough to have a strong
opinion about whether it is a bug or an enhancement. Let me know if you
think we should update the report to be a bug rather than an
enhancement.

If you have a patch it would be great to try to get it out for review.
Sounds like a good thing to fix no matter how we define the bug report :)

Thanks,
Bengt

On 5/18/13 11:05 AM, the.6th.month at gmail.com wrote:
> Hi, Jon & Ramki:
> [...]

_______________________________________________
hotspot-gc-use mailing list
hotspot-gc-use at openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
From java at java4.info  Thu May 23 05:25:27 2013
From: java at java4.info (Florian Binder)
Date: Thu, 23 May 2013 14:25:27 +0200
Subject: Missing memory
Message-ID: <519E0AB7.3010202@java4.info>

Hi all,

I am running a jboss application with an embedded h2-database using the
CMS collector. It uses the following memory configuration:
-Xms8G -Xmx8G -Xmn2G

After running a while I got the following interesting issue: after a
young collection the application uses only 3172435K (of 8178944K). But
in the statistics for the BinaryTreeDictionary I see only 1976982 words
(~16MB) of Total Free Space. So I am wondering about the 2GB which are
neither used nor in the free list space. Might it be in a TLAB or PLAB,
or where?

The annoying problem occurs during the next young collection, which
does not have enough space in the old generation and fails with
"promotion failed", resulting in a 17s stop-the-world collection. After
this collection I have 446204324 of Total Free Space, which seems
correct. A concurrent collection is not running, due to low usage of
the old generation.

I am running it on an 8 core machine with Java HotSpot(TM) 64-Bit
Server VM (build 23.7-b01, mixed mode) (1.7.0_17). Detailed information
can be found below.

Thank you for your help,
Flo

############ The startup parameters are: ############
-server \
-Xms8G -Xmx8G \
-Xmn2G \
-XX:MaxPermSize=256m \
-verbose:gc \
-XX:+PrintGC \
-XX:+PrintGCDateStamps \
-XX:+PrintGCDetails \
-XX:+UseConcMarkSweepGC \
-XX:CMSInitiatingOccupancyFraction=80 \
-XX:+PrintFlagsFinal \
-XX:PrintFLSStatistics=1 \
-XX:+PrintTenuringDistribution \
-XX:+PrintGCApplicationConcurrentTime \
-XX:+PrintGCApplicationStoppedTime \
-XX:+UseLargePages \
-XX:LargePageSizeInBytes=4m \

############ The relevant gc-log snippet: ############
2013-05-23T01:04:57.536-0400: [GC Before GC:
Statistics for BinaryTreeDictionary:
------------------------------------
Total Free Space: 4459853
Max Chunk Size: 2117113
Number of Blocks: 10
Av. Block Size: 445985
Tree Height: 6
Before GC:
Statistics for BinaryTreeDictionary:
------------------------------------
Total Free Space: 6837777
Max Chunk Size: 6832640
Number of Blocks: 6
Av. Block Size: 1139629
Tree Height: 5
[ParNew
Desired survivor size 107347968 bytes, new threshold 2 (max 6)
- age   1:   59250760 bytes,   59250760 total
- age   2:   72435232 bytes,  131685992 total
: 1887488K->206257K(1887488K), 0,1177960 secs] 4788275K->3172435K(8178944K)After GC:
Statistics for BinaryTreeDictionary:
------------------------------------
Total Free Space: 1976982
Max Chunk Size: 1969801
Number of Blocks: 2
Av. Block Size: 988491
Tree Height: 2
After GC:
Statistics for BinaryTreeDictionary:
------------------------------------
Total Free Space: 6837777
Max Chunk Size: 6832640
Number of Blocks: 6
Av. Block Size: 1139629
Tree Height: 5
, 0,1179200 secs] [Times: user=0,80 sys=0,00, real=0,12 secs]
Total time for which application threads were stopped: 0,1186510 seconds
Application time: 0,7920070 seconds
2013-05-23T01:04:58.447-0400: [GC Before GC:
Statistics for BinaryTreeDictionary:
------------------------------------
Total Free Space: 1976982
Max Chunk Size: 1969801
Number of Blocks: 2
Av. Block Size: 988491
Tree Height: 2
Before GC:
Statistics for BinaryTreeDictionary:
------------------------------------
Total Free Space: 6837777
Max Chunk Size: 6832640
Number of Blocks: 6
Av. Block Size: 1139629
Tree Height: 5
[ParNew (promotion failed)
Desired survivor size 107347968 bytes, new threshold 2 (max 6)
- age   1:   57903280 bytes,   57903280 total
- age   2:   52076168 bytes,  109979448 total
: 1884081K->1878750K(1887488K), 2,5295040 secs][CMSCMS: Large block 0x000000071b3bb2e0
: 3020224K->2805356K(6291456K), 15,0643760 secs] 4850259K->2805356K(8178944K), [CMS Perm : 80257K->79027K(133764K)]After GC:
Statistics for BinaryTreeDictionary:
------------------------------------
Total Free Space: 446204324
Max Chunk Size: 446204324
Number of Blocks: 1
Av. Block Size: 446204324
Tree Height: 1
After GC:
Statistics for BinaryTreeDictionary:
------------------------------------
Total Free Space: 0
Max Chunk Size: 0
Number of Blocks: 0
Tree Height: 0
, 17,5940190 secs] [Times: user=18,00 sys=0,78, real=17,59 secs]
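A back-of-the-envelope reading of those numbers, assuming the FLS
statistics are printed in heap words (8 bytes each on a 64-bit JVM):

  dictionary free after the young GC:  1976982 words * 8  ~    15 MB
  old gen capacity:                    6291456 K          ~  6144 MB
  old gen used (3172435K - 206257K):   2966178 K          ~  2897 MB
  expected old gen free:               6144 - 2897        ~  3247 MB

so only ~15 MB of the free old gen space shows up in the
BinaryTreeDictionary before the failure, while after the full GC the
dictionary again accounts for everything (446204324 words * 8 ~ 3.3 GB).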
From Philip.Lee at smartstream-stp.com Thu May 23 05:47:14 2013
From: Philip.Lee at smartstream-stp.com (Philip Lee)
Date: Thu, 23 May 2013 12:47:14 +0000
Subject: Missing memory
In-Reply-To: <519E0AB7.3010202@java4.info>
References: <519E0AB7.3010202@java4.info>
Message-ID: <62EC155DFFC99A4EB1D3BBC6CD0A395D01BFF1EDC1@briexch0002.sst.stp>

Hi,

We have seen a few problems using the CMS collector with JBoss, one of
which was that the CMS collector does not collect objects that have
finalize() methods. The version of JBoss that we were using (5.1) made
heavy use of objects with finalize() methods within the implementation
of its VFS component.

We ended up switching to the parallel collector, which gave us maximum
pause times of around 5s on a 5G heap.

- Phil
________________________________________
From: hotspot-gc-use-bounces at openjdk.java.net [hotspot-gc-use-bounces at openjdk.java.net] on behalf of Florian Binder [java at java4.info]
Sent: 23 May 2013 13:25
To: hotspot-gc-use at openjdk.java.net
Subject: Missing memory

> Hi all,
>
> I am running a jboss application with an embedded h2 database using the
> CMS collector.
> [...]
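Philip's finalizer observation can be checked directly in the GC log; a
hedged sketch (the application name is a placeholder):

    java -XX:+UseConcMarkSweepGC \
         -XX:+PrintGCDetails -XX:+PrintReferenceGC \
         -XX:+ParallelRefProcEnabled \
         YourApp

-XX:+PrintReferenceGC logs how many soft/weak/final/phantom references
each collection processes; a large or growing FinalReference count is the
usual signature of finalizer-heavy code. Objects with a finalize() method
need at least two collections before their memory is reclaimed, so they
linger in the old generation.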
From java at java4.info Thu May 23 09:36:10 2013
From: java at java4.info (Florian Binder)
Date: Thu, 23 May 2013 18:36:10 +0200
Subject: Missing memory
In-Reply-To: <62EC155DFFC99A4EB1D3BBC6CD0A395D01BFF1EDC1@briexch0002.sst.stp>
References: <519E0AB7.3010202@java4.info> <62EC155DFFC99A4EB1D3BBC6CD0A395D01BFF1EDC1@briexch0002.sst.stp>
Message-ID: <519E457A.8070509@java4.info>

Ok, I found it:

Statistics for BinaryTreeDictionary:
------------------------------------
Total Free Space: 97824599
Max Chunk Size: 67277447
Number of Blocks: 9063
Av. Block Size: 10793
Tree Height: 67
Statistics for IndexedFreeLists:
--------------------------------
Total Free Space: 322670900
Max Chunk Size: 256
Number of Blocks: 53951619
Av. Block Size: 5
free=420495499 frag=0,9744

They are in the IndexedFreeLists. There seem to be a lot of very small
objects in the old generation, which are removed soon :(

/Flo

On 23.05.2013 14:47, Philip Lee wrote:
> Hi,
>
> We have seen a few problems using the CMS collector with JBoss, one of
> which was that the CMS collector does not collect objects that have
> finalize() methods.
> [...]
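The unit behind these numbers makes the two mails line up: FLS statistics
are reported in heap words, 8 bytes each on a 64-bit JVM. A
back-of-envelope check (the arithmetic is mine, not from the original
mail):

    1,976,982 words   * 8 bytes/word ~  15.8 MB  (the "~ 16MB" in the first mail)
    322,670,900 words * 8 bytes/word ~  2.46 GB  (the "missing" ~2 GB, parked in
                                                  ~54 million chunks of 5 words, 40 bytes each)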
From martin.makundi at koodaripalvelut.com Sat May 25 21:32:37 2013
From: martin.makundi at koodaripalvelut.com (Martin Makundi)
Date: Sun, 26 May 2013 07:32:37 +0300
Subject: Bug in G1GC it performs Full GC when code cache is full resulting in overkill
Message-ID:

Hi!

For a long time we have had about 10 GB more memory than we need, but
about 1-3 times per day in production G1GC performs a Full GC without any
apparent reason.

Recently we installed the Appdynamics profiler, which also shows Code
Cache memory levels. To our surprise we noticed that every time the code
cache becomes almost full, G1GC performs a Full GC, which we of course
consider overkill, because with our memory size the Full GC takes nearly
60 seconds every time!

Is this a bug in G1GC, or is there a configuration option to disable such
behavior?

See profiler snapshots at:
http://eisler.vps.kotisivut.com/logs/g1gc-code-cache-full-gc-bug-illustration.png

The issue is not an isolated occurrence; it occurs daily.

Similar posts can be found on the web where G1GC performs a Full GC with
no apparent reason:
http://grokbase.com/t/openjdk/hotspot-gc-use/1192sy84j5/g1c-strange-full-gc-behavior
http://grokbase.com/p/openjdk/hotspot-gc-use/123ydf9c92/puzzling-why-is-a-full-gc-triggered-here

**
Martin

From chunt at salesforce.com Sun May 26 07:21:34 2013
From: chunt at salesforce.com (Charlie Hunt)
Date: Sun, 26 May 2013 07:21:34 -0700
Subject: Bug in G1GC it performs Full GC when code cache is full resulting in overkill
In-Reply-To:
References:
Message-ID: <3807DE76-D6CA-451F-AC72-771332825905@salesforce.com>

Which version of the JDK/JRE are you using?

One of the links you referenced below was using JDK 6, where there is no
official support for G1. The other link suggests it could have been RMI
DGC or a System.gc().

Sent from my iPhone

On May 25, 2013, at 11:43 PM, "Martin Makundi" wrote:

> it occurs daily.
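Independent of which collector is at fault, a code cache that fills up
can be confirmed and given head room directly; a hedged sketch (the 128m
is illustrative, not a recommendation for this system):

    java -XX:+UseG1GC \
         -XX:ReservedCodeCacheSize=128m \
         -XX:+UseCodeCacheFlushing \
         YourApp

When the cache does fill, HotSpot prints "CodeCache is full. Compiler has
been disabled." on stdout, which makes the condition easy to correlate
with the profiler graphs; -XX:+UseCodeCacheFlushing additionally lets the
JVM evict cold compiled methods instead of stopping compilation.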
From martin.makundi at koodaripalvelut.com Sun May 26 08:20:36 2013
From: martin.makundi at koodaripalvelut.com (Martin Makundi)
Date: Sun, 26 May 2013 18:20:36 +0300
Subject: Bug in G1GC it performs Full GC when code cache is full resulting in overkill
In-Reply-To: <3807DE76-D6CA-451F-AC72-771332825905@salesforce.com>
References: <3807DE76-D6CA-451F-AC72-771332825905@salesforce.com>
Message-ID:

Sorry, forgot to mention, using:

java version "1.7.0_21"
Java(TM) SE Runtime Environment (build 1.7.0_21-b11)
Java HotSpot(TM) 64-Bit Server VM (build 23.21-b01, mixed mode)

Linux version 3.0.1.stk64 (dfn at localhost.localdomain) (gcc version
4.5.1 20100924 (Red Hat 4.5.1-4) (GCC) ) #1 SMP Sat Aug 13 12:53:46
EDT 2011

-Dclassworlds.conf=/usr/share/maven/maven/bin/m2.conf
-Dmaven.home=/usr/share/maven/maven
-Duser.timezone=EET
-XX:+AggressiveOpts
-XX:+DisableExplicitGC
-XX:+ParallelRefProcEnabled
-XX:+PrintGCDateStamps
-XX:+PrintGCDetails
-XX:+PrintHeapAtGC
-XX:+UseAdaptiveSizePolicy
-XX:+UseCompressedOops
-XX:+UseFastAccessorMethods
-XX:+UseG1GC
-XX:+UseGCOverheadLimit
-XX:+UseNUMA
-XX:+UseStringCache
-XX:CMSInitiatingOccupancyFraction=70
-XX:GCPauseIntervalMillis=10000
-XX:InitiatingHeapOccupancyPercent=0
-XX:MaxGCPauseMillis=500
-XX:MaxPermSize=512m
-XX:PermSize=512m
-XX:ReservedCodeCacheSize=48m
-Xloggc:gc.log
-Xmaxf1
-Xms30G
-Xmx30G
-Xnoclassgc
-Xss4096k

**
Martin

2013/5/26 Charlie Hunt :
> Which version of the JDK/JRE are you using?
> [...]
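One way to see which of these switches the running JVM actually accepts,
and what they resolve to, is a flags dump (the grep pattern is only an
example):

    java -XX:+UseG1GC -XX:+PrintFlagsFinal -version \
        | grep -i 'InitiatingHeapOccupancyPercent\|ReservedCodeCacheSize'

Note that flags belonging to another collector, such as the CMS-only
CMSInitiatingOccupancyFraction in the list above, are still parsed as
valid globals and silently ignored, which is why no warning appears.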
From darius.ski at gmail.com Mon May 27 15:51:37 2013
From: darius.ski at gmail.com (Darius D.)
Date: Tue, 28 May 2013 01:51:37 +0300
Subject: Bug in G1GC it performs Full GC when code cache is full resulting in overkill
In-Reply-To:
References: <3807DE76-D6CA-451F-AC72-771332825905@salesforce.com>
Message-ID:

Hi,

Since I see a reference to my old post about G1 problems, I felt the need
to share our success story with the G1 collector.

We have been using G1 in production since ~1.7_04, the main reason being
that CMS was generating way too many Full GCs (in retrospect this was
already a hint at what the real problem was ...).

It turned out that we were getting Full GCs due to humongous object
allocation failures: our application generates JSON objects sized at
quite a few megabytes, and due to an unfortunate design of the web
framework this caused plenty of reallocation during the character
encoding phase, spraying the heap in the process. Once proper
instrumentation was in place for G1GC in mid-late 2012, we were seeing
humongous allocation failures of some 30+ megabytes or so. No wonder that
in a busy heap there was not enough contiguous space for an object this
large (remember, the realloc chain burned 16, 8, 4, 2 etc. megabyte sized
chunks down to half of G1HeapRegionSize).

So we set out to fix it:

1) We got a performance patch merged into our open source web framework
that slashed reallocations, and rewrote our own code that was generating
the JSON String to limit reallocation.
2) We tuned -XX:G1HeapRegionSize once we saw a proper explanation of G1GC
in Monica's JavaOne presentation. For whatever reason, for a ~7-8GB heap
we were getting thousands of G1GC heap regions, and we arrived at
-XX:G1HeapRegionSize=16m after some testing.

The fact is, after all this tuning G1GC has been performing amazingly for
us { -XX:G1HeapRegionSize=16m -XX:InitiatingHeapOccupancyPercent=33
-XX:MaxGCPauseMillis=250 -XX:+UseG1GC -Xmx7168m -Xms7168m
-XX:MaxPermSize=768m -XX:ReservedCodeCacheSize=128m }. We haven't seen a
Full GC in production for months now, and we have to work really hard to
generate them in testing too. Our GC pauses are sub 0.3s, giving us the
desired web app performance.

Keep up the great work! :)

Darius.

On Sun, May 26, 2013 at 6:20 PM, Martin Makundi
 wrote:
> Sorry, forgot to mention, using:
> [...]
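The region arithmetic behind Darius's point 2, as far as G1's default
sizing rules go (roughly 2048 target regions, power-of-two region sizes,
"humongous" meaning at least half a region); a sketch, not his exact JVM
output:

    7168 MB heap / 2048 regions ~ 3.5 MB  -> 4 MB regions by default (~1800 regions)
    humongous threshold = region size / 2 -> 2 MB by default,
                                             8 MB with -XX:G1HeapRegionSize=16m

So JSON buffers of "quite a few megabytes" were humongous allocations
under the default sizing, and most of the intermediate reallocation
chunks stopped qualifying as humongous once the region size was raised
to 16m.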
From martin.makundi at koodaripalvelut.com Mon May 27 19:09:08 2013
From: martin.makundi at koodaripalvelut.com (Martin Makundi)
Date: Tue, 28 May 2013 05:09:08 +0300
Subject: Bug in G1GC it performs Full GC when code cache is full resulting in overkill
In-Reply-To:
References: <3807DE76-D6CA-451F-AC72-771332825905@salesforce.com>
Message-ID:

Hi!

We actually recorded this bug on 1.7.0_06 and upgraded to 1.7.0_21-b11
just a couple of days ago, so we mistakenly reported the wrong version as
being in use when the bug was in effect. We haven't seen any Full GCs
since the upgrade, so it seems this bug is fixed. The code cache memory
profiles look much nicer now in the profiler, too.

If the situation changes, I will report back. Until then, I consider this
problem solved by the upgrade.

**
Martin

2013/5/28 Darius D.
> Hi,
>
> since I see a reference to my old post with G1 problems, I felt the
> need to share our success story with G1 collector.
> [...]
From chunt at salesforce.com Tue May 28 10:39:24 2013
From: chunt at salesforce.com (Charlie Hunt)
Date: Tue, 28 May 2013 12:39:24 -0500
Subject: Bug in G1GC it performs Full GC when code cache is full resulting in overkill
In-Reply-To:
References: <3807DE76-D6CA-451F-AC72-771332825905@salesforce.com>
Message-ID:

Hi Martin,

There are a few cmd line options in your list that you likely don't need.
We'll address those in a different email.

Do you have GC logs you can share that exhibit the "unexpected Full GC"
with G1? At a minimum, several GC events before the Full GC event and a
couple after.

thanks,

charlie ...

On May 26, 2013, at 10:20 AM, Martin Makundi wrote:

> Sorry, forgot to mention, using:
> [...]
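For a log with enough context around the Full GC, something along these
lines is usually sufficient (Martin's option list already contains most
of it):

    java -XX:+UseG1GC -Xloggc:gc.log \
         -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintHeapAtGC \
         YourApp

The interesting stretch runs from the last few young/mixed pauses before
the "[Full GC ...]" line through the first pauses after it, since that
shows whether the heap, the permanent generation, or neither was actually
exhausted at the trigger point.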
From chunt at salesforce.com Tue May 28 11:00:48 2013
From: chunt at salesforce.com (Charlie Hunt)
Date: Tue, 28 May 2013 13:00:48 -0500
Subject: Bug in G1GC it performs Full GC when code cache is full resulting in overkill
In-Reply-To:
References: <3807DE76-D6CA-451F-AC72-771332825905@salesforce.com>
Message-ID:

Hi Martin,

On the subject of cmd line options ...

Here's a list of options that I think look a bit questionable, and I'd
like to understand why you feel the need to set them:

-XX:+UseFastAccessorMethods (the default is disabled)
-XX:+UseNUMA (Are you running a JVM that spans NUMA memory nodes? Or, do
you have multiple JVMs running on a NUMA or non-NUMA system?)
-XX:+UseStringCache (Do you have evidence that this helps? And, do you
know what it does?)
-XX:CMSInitiatingOccupancyFraction=70 (This is applicable to CMS GC, and
not applicable to G1 GC)
-XX:GCPauseIntervalMillis=10000 (Would like to understand the
justification for setting this, and to a 10 second value. This will
impact G1.)
-XX:InitiatingHeapOccupancyPercent=0 (You realize this will force G1's
concurrent cycle to run continuously?)
-Xmaxf1 (I've never seen this used before; can you share what it does and
what you expect it to do?)
-Xnoclassgc (This is rarely needed, and I haven't seen an app that
required it for quite some time)

These you don't need to set as they are the default with 1.7.0_21 when
you specify -XX:+UseG1GC, hence you can remove them:

-XX:+UseAdaptiveSizePolicy
-XX:+UseCompressedOops (this gets auto-enabled based on the size of the
Java heap with 64-bit JVMs --- and you might realize slightly better
performance if you can run with -Xmx26g / -Xms26g; that should give you
zero-based compressed oops)
-XX:+UseGCOverheadLimit
-XX:ReservedCodeCacheSize=48m, that is the default for 7u21. You might
consider setting it higher if you have the available space, and more
importantly if you think you're running out of code space.

thanks,

charlie ...

On May 26, 2013, at 10:20 AM, Martin Makundi wrote:

> Sorry, forgot to mention, using:
> [...]

From martin.makundi at koodaripalvelut.com Tue May 28 11:35:06 2013
From: martin.makundi at koodaripalvelut.com (Martin Makundi)
Date: Tue, 28 May 2013 21:35:06 +0300
Subject: Bug in G1GC it performs Full GC when code cache is full resulting in overkill
In-Reply-To:
References: <3807DE76-D6CA-451F-AC72-771332825905@salesforce.com>
Message-ID:

Hi!

> On the subject of cmd line options ...

Thanks for the detailed feedback; here is what we based our decisions upon:

> Here's a list of options that I think look a bit questionable, and I'd like to understand why you feel the need to set them:

They are not very clearly documented, so there are a lot of 'shotgun'
options.

> -XX:+UseFastAccessorMethods (the default is disabled)

Fast sounds good; the description of it is "Use optimized versions of
GetField", which sounds good. I see no harm in this.
> -XX:+UseNUMA (Are you running a JVM that spans NUMA memory nodes? Or, do you have multiple JVMs running on a NUMA or non-NUMA system?)

Single JVM, 64-bit Linux. I do not know the technical details, but
switched it on based on this:

"NUMA Performance Metrics: When evaluated against the SPEC JBB 2005
benchmark on an 8-chip Opteron machine, NUMA-aware systems showed the
following performance increases:
32 bit - About 30 percent increase in performance with NUMA-aware allocator
64 bit - About 40 percent increase in performance with NUMA-aware allocator"

> -XX:+UseStringCache (Do you have evidence that this helps? And, do you know what it does?)

I assume it is some sort of string interning solution. Don't know exactly
what it does, but our application uses a high amount of redundant
strings, and a smaller memory footprint is a good idea. Again, very
little documentation about this is available, but it seems
straightforward. Haven't benchmarked it personally.

> -XX:CMSInitiatingOccupancyFraction=70 (This is applicable to CMS GC, and not applicable to G1 GC)

Again, it is not documented thoroughly where it applies and where not;
the jvm gave no warning/error about it, so we assumed it's valid.

> -XX:GCPauseIntervalMillis=10000 (Would like to understand the justification for setting this, and to a 10 second value. This will impact G1.)

I understand what matters is the ratio
MaxGCPauseMillis/GCPauseIntervalMillis, and a larger
GCPauseIntervalMillis makes it less aggressive and thus less overhead?

> -XX:InitiatingHeapOccupancyPercent=0 (You realize this will force G1's concurrent cycle to run continuously?)

Yes, that's what we figured out; we don't want it to sit lazy and end up
in a situation where it is required to do a Full GC. This switch was
specifically chosen in a situation where we had a memory leak and tried
to fight it aggressively before we found the root cause. Maybe we should
try without this switch now and see what effect it has.

> -Xmaxf1 (I've never seen this used before; can you share what it does and what you expect it to do?)

Again, referring to the previous memory leak issues, we did not want the
application to fight with other applications for available memory.
-Xmaxf1 keeps the memory reservation fixed at the initial value, which is
equal to the maximum value.

> -Xnoclassgc (This is rarely needed, and I haven't seen an app that required it for quite some time)

Jvm 1.6 stopped the world for a couple of minutes several times per day
while unloading classes, so we used noclassgc to disable that. We do not
know if this is necessary on the latest 1.7 to avoid the class unload
pause, but we continued to use this switch and found no harm in it. Can't
afford testing that in production ;)

> These you don't need to set as they are the default with 1.7.0_21 when you specify -XX:+UseG1GC, hence you can remove them:

Some of them are set explicitly just to keep track amidst jvm upgrades.

> -XX:+UseAdaptiveSizePolicy
> -XX:+UseCompressedOops (this gets auto-enabled based on the size of the Java heap with 64-bit JVMs --- and you might realize slightly better performance if you can run with -Xmx26g / -Xms26g; that should give you zero-based compressed oops)

Thanks, good to know, will try that. Is it exactly 26g, or a bit more or
a bit less?

> -XX:+UseGCOverheadLimit
> -XX:ReservedCodeCacheSize=48m, that is the default for 7u21. You might consider setting it higher if you have the available space, and more importantly if you think you're running out of code space.

For our sun jvm linux 64bit, 48m is the maximum; the jvm won't start with
a higher value.

**
Martin

> On May 26, 2013, at 10:20 AM, Martin Makundi wrote:
> [...]
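For reference on that last exchange about the marking threshold: the
setting Martin zeroed out has a non-trivial default in this JDK line (45,
to the best of my knowledge), so the two values behave very differently:

    -XX:InitiatingHeapOccupancyPercent=45   default: start concurrent marking
                                            at 45% total heap occupancy
    -XX:InitiatingHeapOccupancyPercent=0    start concurrent marking cycles
                                            essentially back to back

Continuous marking burns CPU all the time but keeps mixed collections
supplied with candidate regions; the default only pays that cost once
occupancy crosses the threshold.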
From martin.makundi at koodaripalvelut.com Wed May 29 04:10:21 2013
From: martin.makundi at koodaripalvelut.com (Martin Makundi)
Date: Wed, 29 May 2013 14:10:21 +0300
Subject: Bug in G1GC it performs Full GC when code cache is full resulting in overkill
In-Reply-To:
References: <3807DE76-D6CA-451F-AC72-771332825905@salesforce.com>
Message-ID:

Hi!

These changes resulted in two Full GCs already within 8 hours of
deployment:
- memory reduction to 26g
- removed InitiatingHeapOccupancyPercent=0

Neither change had a noticeable effect on performance; we will first put
back InitiatingHeapOccupancyPercent=0 to see if it makes a difference.

**
Martin

2013/5/28 Martin Makundi :
> Hi!
>
>> On the subject of cmd line options ...
>
> Thanks for the detailed feedback; here is what we based our decisions upon:
> [...]
> > > ** > Martin > >> >> >> On May 26, 2013, at 10:20 AM, Martin Makundi wrote: >> >>> Sorry, forgot to mention, using: >>> >>> java version "1.7.0_21" >>> Java(TM) SE Runtime Environment (build 1.7.0_21-b11) >>> Java HotSpot(TM) 64-Bit Server VM (build 23.21-b01, mixed mode) >>> >>> Linux version 3.0.1.stk64 (dfn at localhost.localdomain) (gcc version >>> 4.5.1 20100924 (Red Hat 4.5.1-4) (GCC) ) #1 SMP Sat Aug 13 12:53:46 >>> EDT 2011 >>> >>> -Dclassworlds.conf=/usr/share/maven/maven/bin/m2.conf >>> -Dmaven.home=/usr/share/maven/maven >>> -Duser.timezone=EET >>> -XX:+AggressiveOpts >>> -XX:+DisableExplicitGC >>> -XX:+ParallelRefProcEnabled >>> -XX:+PrintGCDateStamps >>> -XX:+PrintGCDetails >>> -XX:+PrintHeapAtGC >>> -XX:+UseAdaptiveSizePolicy >>> -XX:+UseCompressedOops >>> -XX:+UseFastAccessorMethods >>> -XX:+UseG1GC >>> -XX:+UseGCOverheadLimit >>> -XX:+UseNUMA >>> -XX:+UseStringCache >>> -XX:CMSInitiatingOccupancyFraction=70 >>> -XX:GCPauseIntervalMillis=10000 >>> -XX:InitiatingHeapOccupancyPercent=0 >>> -XX:MaxGCPauseMillis=500 >>> -XX:MaxPermSize=512m >>> -XX:PermSize=512m >>> -XX:ReservedCodeCacheSize=48m >>> -Xloggc:gc.log >>> -Xmaxf1 >>> -Xms30G >>> -Xmx30G >>> -Xnoclassgc >>> -Xss4096k >>> >>> >>> ** >>> Martin >>> >>> 2013/5/26 Charlie Hunt : >>>> Which version of the JDK/JRE are you using? >>>> >>>> One of the links you referenced below was using JDK 6, where there is no official support for G1. The other link suggests it could have been RMI DGC or a System.gc(). >>>> >>>> >>>> >>>> Sent from my iPhone >>>> >>>> On May 25, 2013, at 11:43 PM, "Martin Makundi" wrote: >>>> >>>>> it occurs daily. >> From chunt at salesforce.com Wed May 29 08:10:13 2013 From: chunt at salesforce.com (Charlie Hunt) Date: Wed, 29 May 2013 10:10:13 -0500 Subject: Bug in G1GC it performs Full GC when code cache is full resulting in overkill In-Reply-To: References: <3807DE76-D6CA-451F-AC72-771332825905@salesforce.com> Message-ID: <5C892669-3515-4DCC-BB72-96AE89EBE5F8@salesforce.com> A bit of constructive criticism ;-) It would be good practice to set one option at a time and measure its performance to determine whether it improves performance rather than choosing an option because of something you read in text. In short, always measure and reason about whether what you've observed for an improvement or regression makes sense. And, also run multiple times to get a sense of noise versus real improvement or regression. Addl comments embedded below. hths, charlie ... On May 28, 2013, at 1:35 PM, Martin Makundi wrote: > Hi! > >> On the subject of cmd line options ... > > Thanks for the detailed feedback, here is what we based our decisions upon: > >> Here's a list of options that I think look a bit questionable, and I'd like to understand why you feel the need to set them: > > They are not very clearly documented, so there are a lot of 'shotgun' options. > >> -XX:+UseFastAccessorMethod (the default is disabled) > > Fast sounds good, the description of it is "Use optimized versions of > GetField" which sounds good. I see no harm in this. These would be JNI operations. A quick at the HotSpot source suggests UseFastAccessorMethods is mostly confined to interpreter operations. > >> -XX:+UseNUMA (Are you running a JVM that spans NUMA memory nodes? Or, do you have multiple JVMs running on a NUMA or non-NUMA system?) 
>
> Single JVM, 64-bit Linux. I do not know the technical details, but
> switched it on based on this:
>
> "NUMA Performance Metrics: When evaluated against the SPEC JBB 2005
> benchmark on an 8-chip Opteron machine, NUMA-aware systems showed the
> following performance increases:
> 32 bit - About 30 percent increase in performance with NUMA-aware allocator
> 64 bit - About 40 percent increase in performance with NUMA-aware allocator"

A bit of missing context here ... the underlying system should be a NUMA
system. IIRC, on that particular 8-chip AMD system, there could be as
many as two "hops" to access memory on a given node.

Key point is that you should use -XX:+UseNUMA only when you are deploying
a JVM that spans NUMA nodes. If you're on a system that is not a NUMA
architecture, then you shouldn't use it. If you have multiple JVMs on a
NUMA system, it would be a better practice to bind those JVMs to a NUMA
node (CPU & memory node), unless the two JVMs are so disparate that it
doesn't make sense to give an entire NUMA node to one JVM.

>> -XX:+UseStringCache (Do you have evidence that this helps? And, do you know what it does?)
>
> I assume it is some sort of string interning solution. Don't know
> exactly what it does, but our application uses a high amount of
> redundant strings, and a smaller memory footprint is a good idea.
> [...]

I won't go into the details of what it does. I don't think I can say what
it does without possibly being at risk of violating a binding separation
agreement.

I'll just say that you should measure the perf difference with it off
versus on if you think it might help.

>> -XX:CMSInitiatingOccupancyFraction=70 (This is applicable to CMS GC, and not applicable to G1 GC)
>
> Again, it is not documented thoroughly where it applies and where not;
> the jvm gave no warning/error about it, so we assumed it's valid.

There's always the HotSpot source code ;-)

It's also quite well documented in various slideware on the internet.
It's also quite well documented in the Java Performance book. :-)

>> -XX:GCPauseIntervalMillis=10000 (Would like to understand the justification for setting this, and to a 10 second value. This will impact G1.)
>
> I understand what matters is the ratio
> MaxGCPauseMillis/GCPauseIntervalMillis, and a larger
> GCPauseIntervalMillis makes it less aggressive and thus less overhead?

That's the intention. But in practice, in work I've done with G1, I
rarely find I need to set GCPauseIntervalMillis differently from the
default.

>> -XX:InitiatingHeapOccupancyPercent=0 (You realize this will force G1's concurrent cycle to run continuously?)
>
> Yes, that's what we figured out; we don't want it to sit lazy and end
> up in a situation where it is required to do a Full GC.
> [...]

Having GC logs to see what head room you have between the initiation of a
G1 concurrent cycle and the available regions / heap space would be most
appropriate.

>> -Xmaxf1 (I've never seen this used before; can you share what it does and what you expect it to do?)
>
> Again, referring to the previous memory leak issues, we did not want the
> application to fight with other applications for available memory.
> -Xmaxf1 keeps the memory reservation fixed at the initial value, which
> is equal to the maximum value.

Ok

>> -Xnoclassgc (This is rarely needed, and I haven't seen an app that required it for quite some time)
>
> Jvm 1.6 stopped the world for a couple of minutes several times per day
> while unloading classes, so we used noclassgc to disable that.
> [...]

Haven't seen a case where unloading classes causes a several minute
pause. Are you sure your system is not swapping? And do you have GC logs
you can share that illustrate the behavior, and that -Xnoclassgc fixed
it?

>> These you don't need to set as they are the default with 1.7.0_21 when you specify -XX:+UseG1GC, hence you can remove them:
>
> Some of them are set explicitly just to keep track amidst jvm upgrades.

You can do as you wish. ;-) I tend to like to keep the list of JVM
options as short as possible, and when migrating to newer versions I do a
dump of -XX:+PrintFlagsFinal to get the defaults, and then also check the
default values after selecting the collector I'm gonna use, i.e.
-XX:+UseG1GC, and whether I'm gonna use -XX:+AggressiveOpts, because I
know those will also set other options too. That prevents some other
command line option changing default values without my noticing it.

>> -XX:+UseAdaptiveSizePolicy
>> -XX:+UseCompressedOops (this gets auto-enabled based on the size of the Java heap with 64-bit JVMs --- and you might realize slightly better performance if you can run with -Xmx26g / -Xms26g; that should give you zero-based compressed oops)
>
> Thanks, good to know, will try that. Is it exactly 26g, or a bit more or
> a bit less?

Not exactly 26g, but in that area. 26g almost always gives you zero base.
I haven't seen one that hasn't.

>> -XX:+UseGCOverheadLimit
>> -XX:ReservedCodeCacheSize=48m, that is the default for 7u21. You might consider setting it higher if you have the available space, and more importantly if you think you're running out of code space.
>
> For our sun jvm linux 64bit, 48m is the maximum; the jvm won't start
> with a higher value.

If you can't go larger than -XX:ReservedCodeCacheSize=48m, that may
suggest you have memory constraints and may also suggest you don't have
enough swap space defined, and you may be experiencing swapping during
JVM execution. I've got a Linux system that has 32 GB of RAM; I can set
ReservedCodeCacheSize=256m with no issues, even with -Xms30g and -Xmx30g.
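The per-node binding Charlie describes is done outside the JVM on Linux;
a hedged sketch with hypothetical node numbers, heap sizes, and
application names:

    numactl --cpunodebind=0 --membind=0 java -Xms8g -Xmx8g ... App1
    numactl --cpunodebind=1 --membind=1 java -Xms8g -Xmx8g ... App2

Each JVM then runs on, and allocates from, a single memory node, so
-XX:+UseNUMA (which splits the young generation across nodes inside one
JVM) is unnecessary in that deployment.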
> > > ** > Martin > >> >> >> On May 26, 2013, at 10:20 AM, Martin Makundi wrote: >> >>> Sorry, forgot to mention, using: >>> >>> java version "1.7.0_21" >>> Java(TM) SE Runtime Environment (build 1.7.0_21-b11) >>> Java HotSpot(TM) 64-Bit Server VM (build 23.21-b01, mixed mode) >>> >>> Linux version 3.0.1.stk64 (dfn at localhost.localdomain) (gcc version >>> 4.5.1 20100924 (Red Hat 4.5.1-4) (GCC) ) #1 SMP Sat Aug 13 12:53:46 >>> EDT 2011 >>> >>> -Dclassworlds.conf=/usr/share/maven/maven/bin/m2.conf >>> -Dmaven.home=/usr/share/maven/maven >>> -Duser.timezone=EET >>> -XX:+AggressiveOpts >>> -XX:+DisableExplicitGC >>> -XX:+ParallelRefProcEnabled >>> -XX:+PrintGCDateStamps >>> -XX:+PrintGCDetails >>> -XX:+PrintHeapAtGC >>> -XX:+UseAdaptiveSizePolicy >>> -XX:+UseCompressedOops >>> -XX:+UseFastAccessorMethods >>> -XX:+UseG1GC >>> -XX:+UseGCOverheadLimit >>> -XX:+UseNUMA >>> -XX:+UseStringCache >>> -XX:CMSInitiatingOccupancyFraction=70 >>> -XX:GCPauseIntervalMillis=10000 >>> -XX:InitiatingHeapOccupancyPercent=0 >>> -XX:MaxGCPauseMillis=500 >>> -XX:MaxPermSize=512m >>> -XX:PermSize=512m >>> -XX:ReservedCodeCacheSize=48m >>> -Xloggc:gc.log >>> -Xmaxf1 >>> -Xms30G >>> -Xmx30G >>> -Xnoclassgc >>> -Xss4096k >>> >>> >>> ** >>> Martin >>> >>> 2013/5/26 Charlie Hunt : >>>> Which version of the JDK/JRE are you using? >>>> >>>> One of the links you referenced below was using JDK 6, where there is no official support for G1. The other link suggests it could have been RMI DGC or a System.gc(). >>>> >>>> >>>> >>>> Sent from my iPhone >>>> >>>> On May 25, 2013, at 11:43 PM, "Martin Makundi" wrote: >>>> >>>>> it occurs daily. >> From martin.makundi at koodaripalvelut.com Wed May 29 08:49:46 2013 From: martin.makundi at koodaripalvelut.com (Martin Makundi) Date: Wed, 29 May 2013 18:49:46 +0300 Subject: Bug in G1GC it performs Full GC when code cache is full resulting in overkill In-Reply-To: <5C892669-3515-4DCC-BB72-96AE89EBE5F8@salesforce.com> References: <3807DE76-D6CA-451F-AC72-771332825905@salesforce.com> <5C892669-3515-4DCC-BB72-96AE89EBE5F8@salesforce.com> Message-ID: Hi! > A bit of constructive criticism ;-) It would be good practice to set one option at a time and measure its performance to determine whether it improves performance rather than choosing an option because of something you read in text. In short, always measure and reason about whether what you've observed for an improvement or regression makes sense. And, also run multiple times to get a sense of noise versus real improvement or regression. Thanks. That's one of the reasons we never changed our options. Once we found someting that works very well, we know that its always n! work to test changes and the system was running very nice indeed before the previous tweak ;) >>> -XX:+UseFastAccessorMethod (the default is disabled) >> >> Fast sounds good, the description of it is "Use optimized versions of >> GetField" which sounds good. I see no harm in this. > > These would be JNI operations. > > A quick at the HotSpot source suggests UseFastAccessorMethods > is mostly confined to interpreter operations. Thanks for the info. Doesn't say much to me, but does not seem to harm anything. Will try setting it off at some point in time. >>> -XX:+UseNUMA (Are you running a JVM that spans NUMA memory nodes? >>> Or, do you have multiple JVMs running on a NUMA or non-NUMA system?) >> > > Key point is that you should use -XX:+UseNUMA only when you are > deploying a JVM that spans NUMA nodes. Thanks for the info. 
Doesn't say much to me, but it does not seem to harm anything. Will try
setting it off at some point in time.

>>> -XX:+UseNUMA (Are you running a JVM that spans NUMA memory nodes? Or, do you have multiple JVMs running on a NUMA or non-NUMA system?)
>
> Key point is that you should use -XX:+UseNUMA only when you are
> deploying a JVM that spans NUMA nodes.

Thanks for the info. Doesn't say much to me, but it does not seem to harm
anything. Will try setting it off at some point in time.

>>> -XX:+UseStringCache (Do you have evidence that this helps? And, do you know what it does?)
>
> I'll just say that you should measure the perf difference with it off
> versus on if you think it might help.

No visible impact on performance in production, really. If we test it
with a bogus test case, well ... the results are only as informative as
our tests are close to production. Which is unlikely.

>>> -XX:CMSInitiatingOccupancyFraction=70 (This is applicable to CMS GC, and not applicable to G1 GC)
>
> There's always the HotSpot source code ;-)
>
> It's also quite well documented in various slideware on the internet.
> It's also quite well documented in the Java Performance book. :-)

Uh.. does it say somewhere "Do not use -XX:CMSInitiatingOccupancyFraction
with G1GC"? ;) I know performance tuning is your bread and butter, but it
is not ours... it is more like we are just driving the car and you are
the mechanic... different perspectives.. just trying to fill'er'up to
go.. leaded or unleaded... ;)

>>> -XX:InitiatingHeapOccupancyPercent=0 (You realize this will force G1's concurrent cycle to run continuously?)
>>
>> Yes, that's what we figured out; we don't want it to sit lazy and end
>> up in a situation where it is required to do a Full GC.
>> [...]
>
> Having GC logs to see what head room you have between the initiation of
> a G1 concurrent cycle and the available regions / heap space would be
> most appropriate.

Hmm..
I don't thoroughly understand the logs either, but, here is a snap: 2013-05-29T17:28:56.119+0300: 38905.407: [GC pause (young), 0.29261000 secs] [Parallel Time: 288.8 ms] [GC Worker Start (ms): 38905407.6 38905407.7 38905407.7 38905407.9 Avg: 38905407.7, Min: 38905407.6, Max: 38905407.9, Diff: 0.3] [Ext Root Scanning (ms): 22.8 16.3 18.1 22.0 Avg: 19.8, Min: 16.3, Max: 22.8, Diff: 6.6] [SATB Filtering (ms): 0.0 0.1 0.0 0.0 Avg: 0.0, Min: 0.0, Max: 0.1, Diff: 0.1] [Update RS (ms): 31.9 37.3 35.1 33.3 Avg: 34.4, Min: 31.9, Max: 37.3, Diff: 5.5] [Processed Buffers : 102 106 119 104 Sum: 431, Avg: 107, Min: 102, Max: 119, Diff: 17] [Scan RS (ms): 0.0 0.0 0.1 0.0 Avg: 0.0, Min: 0.0, Max: 0.1, Diff: 0.1] [Object Copy (ms): 228.2 229.1 229.5 227.3 Avg: 228.5, Min: 227.3, Max: 229.5, Diff: 2.2] [Termination (ms): 0.0 0.0 0.0 0.0 Avg: 0.0, Min: 0.0, Max: 0.0, Diff: 0.0] [Termination Attempts : 4 1 11 4 Sum: 20, Avg: 5, Min: 1, Max: 11, Diff: 10] [GC Worker End (ms): 38905690.5 38905690.5 38905690.5 38905690.5 Avg: 38905690.5, Min: 38905690.5, Max: 38905690.5, Diff: 0.0] [GC Worker (ms): 282.9 282.8 282.8 282.6 Avg: 282.8, Min: 282.6, Max: 282.9, Diff: 0.3] [GC Worker Other (ms): 5.9 6.0 6.0 6.2 Avg: 6.0, Min: 5.9, Max: 6.2, Diff: 0.3] [Complete CSet Marking: 0.0 ms] [Clear CT: 0.1 ms] [Other: 3.7 ms] [Choose CSet: 0.0 ms] [Ref Proc: 2.8 ms] [Ref Enq: 0.1 ms] [Free CSet: 0.3 ms] [Eden: 48M(5032M)->0B(5032M) Survivors: 288M->288M Heap: 15790M(26624M)->15741M(26624M)] [Times: user=1.14 sys=0.00, real=0.29 secs] Heap after GC invocations=575 (full 157): garbage-first heap total 27262976K, used 16119181K [0x0000000160000000, 0x00000007e0000000, 0x00000007e0000000) region size 8192K, 36 young (294912K), 36 survivors (294912K) compacting perm gen total 524288K, used 164479K [0x00000007e0000000, 0x0000000800000000, 0x0000000800000000) the space 524288K, 31% used [0x00000007e0000000, 0x00000007ea09ff68, 0x00000007ea0a0000, 0x0000000800000000) No shared spaces configured. } {Heap before GC invocations=575 (full 157): garbage-first heap total 27262976K, used 16119181K [0x0000000160000000, 0x00000007e0000000, 0x00000007e0000000) region size 8192K, 37 young (303104K), 36 survivors (294912K) compacting perm gen total 524288K, used 164479K [0x00000007e0000000, 0x0000000800000000, 0x0000000800000000) the space 524288K, 31% used [0x00000007e0000000, 0x00000007ea09ff68, 0x00000007ea0a0000, 0x0000000800000000) No shared spaces configured. 2013-05-29T17:28:56.413+0300: 38905.701: [Full GC 15742M->14497M(26624M), 56.7731320 secs] That's the third Full GC today after the change to 26G and change from occupancypercent=0. Tomorrow will be trying again with occupancypercent=0 >>> -noclassgc (This is rarely needed and haven't seen an app that required it for quite some time) >> >> Jvm 1.6 stopped the world for couple of minutes several times per day >> while unloading classes, so we used noclassgc to disable that. We do >> not know if this is necessary for latest 1.7 to avoid class unload >> pause, but we continued to use this switch and found no harm in it. >> Can't afford testing that in production ;) > > Haven't seen a case where unloading classes cause a several minute pause. > Are you sure your system is not swapping? And, do you have GC logs you > can share that illustrate the behavior and that -noclassgc fixed it? We deleted swap partition long time ago, we simply do not risk swapping at all. 
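(Generic sanity check, not actual output from our box: something like

$ free -m       # the Swap row should show 0 total, 0 used
$ vmstat 5 3    # the si/so columns should stay at 0

is enough to confirm that nothing ever pages.)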
We had this class unloading problem several times per day like half a
year ago, and fixed it with -noclassgc; that was a no-brainer, a single
parameter that made the difference.

It is also discussed here (they do not discuss noclassgc though, we
figured that out somehow)
http://stackoverflow.com/questions/2833983/meaning-of-the-unloading-class-messages

>>> -XX:+UseGCOverheadLimit
>>> -XX:ReservedCodeCacheSize=48m, that is the default for 7u21. You might consider setting it higher if you have the available space, and more importantly if you think you're running out of code space.
>>
>> For our Sun JVM on 64-bit Linux 48m is the maximum, the JVM won't start with a higher value.
>
> If you can't go larger than -XX:ReservedCodeCacheSize=48m, that may suggest you have memory constraints and may also suggest you don't have enough swap space defined, and you may be experiencing swapping during JVM execution. I've got a Linux system that has 32 GB of RAM, I can set ReservedCodeCacheSize=256m with no issues, even with -Xms30g and -Xmx30g.

It is also documented that 48m is the maximum
http://www.oracle.com/technetwork/java/javase/tech/vmoptions-jsp-140102.html
"maximum code cache size. [Solaris 64-bit, amd64, and -server x86:
48m"

**
Martin

>
>
>>>
>>> On May 26, 2013, at 10:20 AM, Martin Makundi wrote:
>>>
>>>> Sorry, forgot to mention, using:
>>>>
>>>> java version "1.7.0_21"
>>>> Java(TM) SE Runtime Environment (build 1.7.0_21-b11)
>>>> Java HotSpot(TM) 64-Bit Server VM (build 23.21-b01, mixed mode)
>>>>
>>>> Linux version 3.0.1.stk64 (dfn at localhost.localdomain) (gcc version
>>>> 4.5.1 20100924 (Red Hat 4.5.1-4) (GCC) ) #1 SMP Sat Aug 13 12:53:46
>>>> EDT 2011
>>>>
>>>> -Dclassworlds.conf=/usr/share/maven/maven/bin/m2.conf
>>>> -Dmaven.home=/usr/share/maven/maven
>>>> -Duser.timezone=EET
>>>> -XX:+AggressiveOpts
>>>> -XX:+DisableExplicitGC
>>>> -XX:+ParallelRefProcEnabled
>>>> -XX:+PrintGCDateStamps
>>>> -XX:+PrintGCDetails
>>>> -XX:+PrintHeapAtGC
>>>> -XX:+UseAdaptiveSizePolicy
>>>> -XX:+UseCompressedOops
>>>> -XX:+UseFastAccessorMethods
>>>> -XX:+UseG1GC
>>>> -XX:+UseGCOverheadLimit
>>>> -XX:+UseNUMA
>>>> -XX:+UseStringCache
>>>> -XX:CMSInitiatingOccupancyFraction=70
>>>> -XX:GCPauseIntervalMillis=10000
>>>> -XX:InitiatingHeapOccupancyPercent=0
>>>> -XX:MaxGCPauseMillis=500
>>>> -XX:MaxPermSize=512m
>>>> -XX:PermSize=512m
>>>> -XX:ReservedCodeCacheSize=48m
>>>> -Xloggc:gc.log
>>>> -Xmaxf1
>>>> -Xms30G
>>>> -Xmx30G
>>>> -Xnoclassgc
>>>> -Xss4096k
>>>>
>>>>
>>>> **
>>>> Martin
>>>>
>>>> 2013/5/26 Charlie Hunt :
>>>>> Which version of the JDK/JRE are you using?
>>>>>
>>>>> One of the links you referenced below was using JDK 6, where there is no official support for G1. The other link suggests it could have been RMI DGC or a System.gc().
>>>>>
>>>>>
>>>>>
>>>>> Sent from my iPhone
>>>>>
>>>>> On May 25, 2013, at 11:43 PM, "Martin Makundi" wrote:
>>>>>
>>>>>> it occurs daily.
>>>
>

From chunt at salesforce.com  Wed May 29 09:31:28 2013
From: chunt at salesforce.com (Charlie Hunt)
Date: Wed, 29 May 2013 11:31:28 -0500
Subject: Fwd: Bug in G1GC it performs Full GC when code cache is full resulting in overkill
References: <9A54ECD2-B470-4797-9EFB-E81DD91C6B3F@salesforce.com>
Message-ID: 

Forgot to cc hotspot-gc-use ... will try to remember on future replies.

Begin forwarded message:

> From: Charlie Hunt 
> Subject: Re: Bug in G1GC it performs Full GC when code cache is full resulting in overkill
> Date: May 29, 2013 11:28:35 AM CDT
> To: Martin Makundi 
>
> Couple of comments below.
>
> charlie ...
> > On May 29, 2013, at 10:49 AM, Martin Makundi wrote: > >> Hi! >> >>> A bit of constructive criticism ;-) It would be good practice to set one option at a time and measure its performance to determine whether it improves performance rather than choosing an option because of something you read in text. In short, always measure and reason about whether what you've observed for an improvement or regression makes sense. And, also run multiple times to get a sense of noise versus real improvement or regression. >> >> >> Thanks. That's one of the reasons we never changed our options. Once >> we found someting that works very well, we know that its always n! >> work to test changes and the system was running very nice indeed >> before the previous tweak ;) >> >>>>> -XX:+UseFastAccessorMethod (the default is disabled) >>>> >>>> Fast sounds good, the description of it is "Use optimized versions of >>>> GetField" which sounds good. I see no harm in this. >>> >>> These would be JNI operations. >>> >>> A quick at the HotSpot source suggests UseFastAccessorMethods >>> is mostly confined to interpreter operations. >> >> Thanks for the info. Doesn't say much to me, but does not seem to harm >> anything. Will try setting it off at some point in time. >> >>>>> -XX:+UseNUMA (Are you running a JVM that spans NUMA memory nodes? >>>>> Or, do you have multiple JVMs running on a NUMA or non-NUMA system?) >>>> >>> >>> Key point is that you should use -XX:+UseNUMA only when you are >>> deploying a JVM that spans NUMA nodes. >> >> Thanks for the info. Doesn't say much to me, but does not seem to harm >> anything. Will try setting it off at some point in time. > > You either have a NUMA system and deploying a single JVM on it, or you're on a non-NUMA system. > > Come to think of it, you're on a Linux system. I don't recall the exact details of how the numa-allocator works on Linux. I was first thinking of it in terms of how it's handled on Solaris with Solaris lgroups. I won't go into that. :-) On Linux it may just do round robin .. would have to go look again at the HotSpot source to see what it does on Linux. Depending on what it does, you may not see any difference with it. However, that could change in the future and you could be caught off guard with such a change. ;-) > >> >>>>> -XX:+UseStringCache (Do you have evidence that this helps? And, do you know what it does?) >>>> >>>> I assume it is some sort of string interning solution. Don't know >>>> exactly what it does, but our application uses high amount of >>>> redundant strings, smaller memory footprint is a good idea. Again, >>>> very little documentation about this available but seems >>>> straightforward. Haven't benchmarked it personally. >>> >>> I won't go into the details of what it does. I don't think I can say what it does without possibly being at risk of binding separation agreement. >>> >>> I'll just say that you should measure the perf difference with it off versus on if you think it might help. >> >> No visible impact on performance, really, in production. If we test it >> with a bogus test case, well.. results are as informative as our tests >> are close to the production. Which is unlikely. >> >>>>> -XX:CMSInitiatingOccupancyFraction=70 (This is applicable to CMS GC, and not applicable to G1 GC) >>>> >>>> Again, not documented thoroughly where it applies and where not, jvm >>>> gave no warning/error about it so we assumed it's valid. 
>>> >>> There's always the HotSpot source code ;-) >>> >>> It's also quite well documented in various slide ware on the internet. >>> It's also quite well documented in the Java Performance book. :-) >> >> Uh.. does it say somewhere that Do not use >> XX:CMSInitiatingOccupancyFraction with G1GC? ;) I know performance >> tuning is your bread and butter but is not ours... is more like we are >> just driving the car and you are the mechanic...different >> perspective.. just trying to fill'er'up to go..leaded or unleaded... >> ;) > > Well, uh, the command line options says "CMS" in it. Isn't that enough to imply that it's CMS specific? Additionally, if the description says, "The percent of old generation space occupancy at which the first CMS garbage collection cycle should start. Subsequent starts of the CMS cycle are determined at a HotSpot ergonomically computed occupancy.", isn't that enough to imply it's CMS GC specific? That description comes directly from Java Performance. > > To use your analogy, would you put diesel fuel in your gasoline powered vehicle when the label at the pump says "diesel fuel"? > >> >>>>> -XX:InitiatingHeapOccupancyPercent=0 (You realize this will force G1's concurrent cycle to run continuously?) >>>> >>>> Yes, that's what we figured out, we don't want it to sit lazy and end >>>> up in a situation where it is required to do a Full GC. This switch >>>> was specifically chosen in a situation we had a memory leak and tried >>>> to aggressively fight against it before we found the root cause. Maybe >>>> we should try without this switch now, and see what effect it has. >>> >>> Having GC logs to see what available head room you have between the >>> initiating of a G1 concurrent cycle and available regions / heap space would be most appropriate. >> >> Hmm.. 
I don't thoroughly understand the logs either, but, here is a snap: >> >> 2013-05-29T17:28:56.119+0300: 38905.407: [GC pause (young), 0.29261000 secs] >> [Parallel Time: 288.8 ms] >> [GC Worker Start (ms): 38905407.6 38905407.7 38905407.7 38905407.9 >> Avg: 38905407.7, Min: 38905407.6, Max: 38905407.9, Diff: 0.3] >> [Ext Root Scanning (ms): 22.8 16.3 18.1 22.0 >> Avg: 19.8, Min: 16.3, Max: 22.8, Diff: 6.6] >> [SATB Filtering (ms): 0.0 0.1 0.0 0.0 >> Avg: 0.0, Min: 0.0, Max: 0.1, Diff: 0.1] >> [Update RS (ms): 31.9 37.3 35.1 33.3 >> Avg: 34.4, Min: 31.9, Max: 37.3, Diff: 5.5] >> [Processed Buffers : 102 106 119 104 >> Sum: 431, Avg: 107, Min: 102, Max: 119, Diff: 17] >> [Scan RS (ms): 0.0 0.0 0.1 0.0 >> Avg: 0.0, Min: 0.0, Max: 0.1, Diff: 0.1] >> [Object Copy (ms): 228.2 229.1 229.5 227.3 >> Avg: 228.5, Min: 227.3, Max: 229.5, Diff: 2.2] >> [Termination (ms): 0.0 0.0 0.0 0.0 >> Avg: 0.0, Min: 0.0, Max: 0.0, Diff: 0.0] >> [Termination Attempts : 4 1 11 4 >> Sum: 20, Avg: 5, Min: 1, Max: 11, Diff: 10] >> [GC Worker End (ms): 38905690.5 38905690.5 38905690.5 38905690.5 >> Avg: 38905690.5, Min: 38905690.5, Max: 38905690.5, Diff: 0.0] >> [GC Worker (ms): 282.9 282.8 282.8 282.6 >> Avg: 282.8, Min: 282.6, Max: 282.9, Diff: 0.3] >> [GC Worker Other (ms): 5.9 6.0 6.0 6.2 >> Avg: 6.0, Min: 5.9, Max: 6.2, Diff: 0.3] >> [Complete CSet Marking: 0.0 ms] >> [Clear CT: 0.1 ms] >> [Other: 3.7 ms] >> [Choose CSet: 0.0 ms] >> [Ref Proc: 2.8 ms] >> [Ref Enq: 0.1 ms] >> [Free CSet: 0.3 ms] >> [Eden: 48M(5032M)->0B(5032M) Survivors: 288M->288M Heap: >> 15790M(26624M)->15741M(26624M)] >> [Times: user=1.14 sys=0.00, real=0.29 secs] >> Heap after GC invocations=575 (full 157): >> garbage-first heap total 27262976K, used 16119181K >> [0x0000000160000000, 0x00000007e0000000, 0x00000007e0000000) >> region size 8192K, 36 young (294912K), 36 survivors (294912K) >> compacting perm gen total 524288K, used 164479K [0x00000007e0000000, >> 0x0000000800000000, 0x0000000800000000) >> the space 524288K, 31% used [0x00000007e0000000, >> 0x00000007ea09ff68, 0x00000007ea0a0000, 0x0000000800000000) >> No shared spaces configured. >> } >> {Heap before GC invocations=575 (full 157): >> garbage-first heap total 27262976K, used 16119181K >> [0x0000000160000000, 0x00000007e0000000, 0x00000007e0000000) >> region size 8192K, 37 young (303104K), 36 survivors (294912K) >> compacting perm gen total 524288K, used 164479K [0x00000007e0000000, >> 0x0000000800000000, 0x0000000800000000) >> the space 524288K, 31% used [0x00000007e0000000, >> 0x00000007ea09ff68, 0x00000007ea0a0000, 0x0000000800000000) >> No shared spaces configured. >> 2013-05-29T17:28:56.413+0300: 38905.701: [Full GC >> 15742M->14497M(26624M), 56.7731320 secs] >> >> That's the third Full GC today after the change to 26G and change from >> occupancypercent=0. Tomorrow will be trying again with >> occupancypercent=0 > > I saw your follow-up post with additional GC logs. Thanks! That'll really help! > >> >>>>> -noclassgc (This is rarely needed and haven't seen an app that required it for quite some time) >>>> >>>> Jvm 1.6 stopped the world for couple of minutes several times per day >>>> while unloading classes, so we used noclassgc to disable that. We do >>>> not know if this is necessary for latest 1.7 to avoid class unload >>>> pause, but we continued to use this switch and found no harm in it. >>>> Can't afford testing that in production ;) >>> >>> Haven't seen a case where unloading classes cause a several minute pause. >>> Are you sure your system is not swapping? 
And, do you have GC logs you
>>> can share that illustrate the behavior and that -noclassgc fixed it?
>>
>> We deleted swap partition long time ago, we simply do not risk swapping at all.
>
> You may need some additional swap, or you'll need additional memory for backing reserved space even though the application may not use it.
>
>>
>> We had this class unloading problem several times per day like half a
>> year ago, and fixed it with -noclassgc; that was a no-brainer, a single
>> parameter that made the difference.
>
> Ok, if you're convinced it fixes your issue, then use it. :-) Class unloading issues usually imply that the perm gen size needs increasing, or that the initial perm gen size could be increased as an alternative.
>
>>
>> It is also discussed here (they do not discuss noclassgc though, we
>> figured that out somehow)
>> http://stackoverflow.com/questions/2833983/meaning-of-the-unloading-class-messages
>>
>>>>> -XX:+UseGCOverheadLimit
>>>>> -XX:ReservedCodeCacheSize=48m, that is the default for 7u21. You might consider setting it higher if you have the available space, and more importantly if you think you're running out of code space.
>>>>
>>>> For our Sun JVM on 64-bit Linux 48m is the maximum, the JVM won't start with a higher value.
>>>
>>> If you can't go larger than -XX:ReservedCodeCacheSize=48m, that may suggest you have memory constraints and may also suggest you don't have enough swap space defined, and you may be experiencing swapping during JVM execution. I've got a Linux system that has 32 GB of RAM, I can set ReservedCodeCacheSize=256m with no issues, even with -Xms30g and -Xmx30g.
>>
>> It is also documented that 48m is the maximum
>> http://www.oracle.com/technetwork/java/javase/tech/vmoptions-jsp-140102.html
>> "maximum code cache size. [Solaris 64-bit, amd64, and -server x86:
>> 48m"
>>
>
> The 48m in the documentation is the default maximum code cache size, not an absolute limit on what you can set. The option itself sets the maximum code cache size, and it's no different than setting -Xmx: there is a default -Xmx value if you don't set one, and you can specify a larger one using -Xmx.
> > Fwiw, here's the output from 7u21 on my Linux x64 system with 32 GB of RAM, notice I also specified +AlwaysPreTouch which forces every page to be touched as part of the command execution to illustrate the memory has been reserved, committed and touched: > $ java -Xmx26g -Xms26g -XX:+AlwaysPreTouch -XX:ReservedCodeCacheSize=256m -version > java version "1.7.0_21" > Java(TM) SE Runtime Environment (build 1.7.0_21-b11) > Java HotSpot(TM) 64-Bit Server VM (build 23.21-b01, mixed mode) > > >> ** >> Martin >> >>> >>> >>>>> >>>>> On May 26, 2013, at 10:20 AM, Martin Makundi wrote: >>>>> >>>>>> Sorry, forgot to mention, using: >>>>>> >>>>>> java version "1.7.0_21" >>>>>> Java(TM) SE Runtime Environment (build 1.7.0_21-b11) >>>>>> Java HotSpot(TM) 64-Bit Server VM (build 23.21-b01, mixed mode) >>>>>> >>>>>> Linux version 3.0.1.stk64 (dfn at localhost.localdomain) (gcc version >>>>>> 4.5.1 20100924 (Red Hat 4.5.1-4) (GCC) ) #1 SMP Sat Aug 13 12:53:46 >>>>>> EDT 2011 >>>>>> >>>>>> -Dclassworlds.conf=/usr/share/maven/maven/bin/m2.conf >>>>>> -Dmaven.home=/usr/share/maven/maven >>>>>> -Duser.timezone=EET >>>>>> -XX:+AggressiveOpts >>>>>> -XX:+DisableExplicitGC >>>>>> -XX:+ParallelRefProcEnabled >>>>>> -XX:+PrintGCDateStamps >>>>>> -XX:+PrintGCDetails >>>>>> -XX:+PrintHeapAtGC >>>>>> -XX:+UseAdaptiveSizePolicy >>>>>> -XX:+UseCompressedOops >>>>>> -XX:+UseFastAccessorMethods >>>>>> -XX:+UseG1GC >>>>>> -XX:+UseGCOverheadLimit >>>>>> -XX:+UseNUMA >>>>>> -XX:+UseStringCache >>>>>> -XX:CMSInitiatingOccupancyFraction=70 >>>>>> -XX:GCPauseIntervalMillis=10000 >>>>>> -XX:InitiatingHeapOccupancyPercent=0 >>>>>> -XX:MaxGCPauseMillis=500 >>>>>> -XX:MaxPermSize=512m >>>>>> -XX:PermSize=512m >>>>>> -XX:ReservedCodeCacheSize=48m >>>>>> -Xloggc:gc.log >>>>>> -Xmaxf1 >>>>>> -Xms30G >>>>>> -Xmx30G >>>>>> -Xnoclassgc >>>>>> -Xss4096k >>>>>> >>>>>> >>>>>> ** >>>>>> Martin >>>>>> >>>>>> 2013/5/26 Charlie Hunt : >>>>>>> Which version of the JDK/JRE are you using? >>>>>>> >>>>>>> One of the links you referenced below was using JDK 6, where there is no official support for G1. The other link suggests it could have been RMI DGC or a System.gc(). >>>>>>> >>>>>>> >>>>>>> >>>>>>> Sent from my iPhone >>>>>>> >>>>>>> On May 25, 2013, at 11:43 PM, "Martin Makundi" wrote: >>>>>>> >>>>>>>> it occurs daily. >>>>> >>> > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20130529/9978f7a5/attachment-0001.html From martin.makundi at koodaripalvelut.com Wed May 29 09:47:17 2013 From: martin.makundi at koodaripalvelut.com (Martin Makundi) Date: Wed, 29 May 2013 19:47:17 +0300 Subject: Bug in G1GC it performs Full GC when code cache is full resulting in overkill In-Reply-To: References: <9A54ECD2-B470-4797-9EFB-E81DD91C6B3F@salesforce.com> Message-ID: Hi! > To use your analogy, would you put diesel fuel in your gasoline powered > vehicle when the label at the pump says "diesel fuel"? Well, eh.. to use your analogy, we are like the old lady who distinguishes cars by color only =) You are looking down on us from a high tower...was it 12 years JVM tuning? ..huh..we are yet just toddlers... ;) > You may need some additional swap, or you'll need additional memory for > backing reserved space even though the application may not use it. We had memory fixed so swapping should not be an issue if the problem is inside jvm..which seems to be. We have free memory on linux side so physical memory as such is not a problem. 
>> We had this class unloading problem several times per day like half a >> year ago, and fixed it with noclasssgc, that was a no-brainer, single >> parameter that made the difference. > > Ok, if you're convinced it fixes your issue, then use it. :-) Usually class > unloading issues generally implies perm gen size needs increases, or initial > perm gen size could use increasing as an alternative. Also permgen was measured at that time and it was not an issue, there was plenty of permgen space, but probably some unreferenced loaded classes. I read somewhere that if your code uses lots of reflection it might generate some moss...anyways, works now so I quit guessing =) > -XX:ReservedCodeCacheSize=48, that is the default for 7u21. You might > consider setting it higher if you have the available space, and more > importantly if you think you're running out of code space. > > For our sun jvm linux 64bit 48m is maximum, jvm won't start if higher value. > > > If you can't go larger than -XX:ReservedCodeCacheSize=48m, that may suggest > you have memory constraints and may also suggest you don't have enough swap > space defined, and you may be experiencing swapping during JVM execution. > I've got a Linux system that has 32 GB of RAM, I can set > ReservedCodeCacheSize=256m with no issues, even with -Xms30g and -Xms30g. > > > It is also documented that 48m is maximum > > http://www.oracle.com/technetwork/java/javase/tech/vmoptions-jsp-140102.html > > "maximum code cache size. [Solaris 64-bit, amd64, and -server x86: > > 48m" > > > > The documentation means that the command line option sets the maximum code > cache size, not that it is the absolute maximum you can set. Rather it sets > the default maximum code cache size. It's not any different than setting > -Xmx, there is a default -Xmx value if you don't set one, and you can > specify a larger one using -Xmx. > > Fwiw, here's the output from 7u21 on my Linux x64 system with 32 GB of RAM, > notice I also specified +AlwaysPreTouch which forces every page to be > touched as part of the command execution to illustrate the memory has been > reserved, committed and touched: > $ java -Xmx26g -Xms26g -XX:+AlwaysPreTouch -XX:ReservedCodeCacheSize=256m > -version > java version "1.7.0_21" > Java(TM) SE Runtime Environment (build 1.7.0_21-b11) > Java HotSpot(TM) 64-Bit Server VM (build 23.21-b01, mixed mode) Ok, you are right, it seems to work. What do you recommend for code cache size or how to find a good value for it? 
** Martin > > > > > > On May 26, 2013, at 10:20 AM, Martin Makundi wrote: > > > Sorry, forgot to mention, using: > > > java version "1.7.0_21" > > Java(TM) SE Runtime Environment (build 1.7.0_21-b11) > > Java HotSpot(TM) 64-Bit Server VM (build 23.21-b01, mixed mode) > > > Linux version 3.0.1.stk64 (dfn at localhost.localdomain) (gcc version > > 4.5.1 20100924 (Red Hat 4.5.1-4) (GCC) ) #1 SMP Sat Aug 13 12:53:46 > > EDT 2011 > > > -Dclassworlds.conf=/usr/share/maven/maven/bin/m2.conf > > -Dmaven.home=/usr/share/maven/maven > > -Duser.timezone=EET > > -XX:+AggressiveOpts > > -XX:+DisableExplicitGC > > -XX:+ParallelRefProcEnabled > > -XX:+PrintGCDateStamps > > -XX:+PrintGCDetails > > -XX:+PrintHeapAtGC > > -XX:+UseAdaptiveSizePolicy > > -XX:+UseCompressedOops > > -XX:+UseFastAccessorMethods > > -XX:+UseG1GC > > -XX:+UseGCOverheadLimit > > -XX:+UseNUMA > > -XX:+UseStringCache > > -XX:CMSInitiatingOccupancyFraction=70 > > -XX:GCPauseIntervalMillis=10000 > > -XX:InitiatingHeapOccupancyPercent=0 > > -XX:MaxGCPauseMillis=500 > > -XX:MaxPermSize=512m > > -XX:PermSize=512m > > -XX:ReservedCodeCacheSize=48m > > -Xloggc:gc.log > > -Xmaxf1 > > -Xms30G > > -Xmx30G > > -Xnoclassgc > > -Xss4096k > > > > ** > > Martin > > > 2013/5/26 Charlie Hunt : > > Which version of the JDK/JRE are you using? > > > One of the links you referenced below was using JDK 6, where there is no > official support for G1. The other link suggests it could have been RMI DGC > or a System.gc(). > > > > > Sent from my iPhone > > > On May 25, 2013, at 11:43 PM, "Martin Makundi" > wrote: > > > it occurs daily. > > > > > > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > From chunt at salesforce.com Wed May 29 10:25:35 2013 From: chunt at salesforce.com (Charlie Hunt) Date: Wed, 29 May 2013 12:25:35 -0500 Subject: Bug in G1GC it performs Full GC when code cache is full resulting in overkill In-Reply-To: References: <9A54ECD2-B470-4797-9EFB-E81DD91C6B3F@salesforce.com> Message-ID: <6208C497-21E4-4A26-B825-13F23E702898@salesforce.com> On May 29, 2013, at 11:47 AM, Martin Makundi wrote: > Hi! > >> To use your analogy, would you put diesel fuel in your gasoline powered >> vehicle when the label at the pump says "diesel fuel"? > > Well, eh.. to use your analogy, we are like the old lady who > distinguishes cars by color only =) You are looking down on us from a > high tower...was it 12 years JVM tuning? ..huh..we are yet just > toddlers... ;) > >> You may need some additional swap, or you'll need additional memory for >> backing reserved space even though the application may not use it. > > We had memory fixed so swapping should not be an issue if the problem > is inside jvm..which seems to be. > > We have free memory on linux side so physical memory as such is not a problem. Having free memory is one thing. How much free memory do you have? Do you have 2x more than your Java heap size, including -Xmx and what you're specifying for MaxPermSize? > >>> We had this class unloading problem several times per day like half a >>> year ago, and fixed it with noclasssgc, that was a no-brainer, single >>> parameter that made the difference. >> >> Ok, if you're convinced it fixes your issue, then use it. :-) Usually class >> unloading issues generally implies perm gen size needs increases, or initial >> perm gen size could use increasing as an alternative. 
>
> Also permgen was measured at that time and it was not an issue, there
> was plenty of permgen space, but probably some unreferenced loaded
> classes. I read somewhere that if your code uses lots of reflection
> it might generate some moss...anyways, works now so I quit guessing =)

The question is: did you set both -XX:PermSize and -XX:MaxPermSize to the same value? If not, when perm gen needs to expand from the initial size, or gets close to that initial size, the JVM may attempt to unload classes to free up space before expanding perm gen. But, if you'd rather use -noclassgc, then go ahead. :-)

>
>> -XX:ReservedCodeCacheSize=48m, that is the default for 7u21. You might
>> consider setting it higher if you have the available space, and more
>> importantly if you think you're running out of code space.
>>
>> For our Sun JVM on 64-bit Linux 48m is the maximum, the JVM won't start with a higher value.
>>
>>
>> If you can't go larger than -XX:ReservedCodeCacheSize=48m, that may suggest
>> you have memory constraints and may also suggest you don't have enough swap
>> space defined, and you may be experiencing swapping during JVM execution.
>> I've got a Linux system that has 32 GB of RAM, I can set
>> ReservedCodeCacheSize=256m with no issues, even with -Xms30g and -Xmx30g.
>>
>>
>> It is also documented that 48m is the maximum
>>
>> http://www.oracle.com/technetwork/java/javase/tech/vmoptions-jsp-140102.html
>>
>> "maximum code cache size. [Solaris 64-bit, amd64, and -server x86:
>>
>> 48m"
>>
>>
>>
>> The 48m in the documentation is the default maximum code cache size, not an
>> absolute limit on what you can set. The option itself sets the maximum code
>> cache size, and it's no different than setting -Xmx: there is a default
>> -Xmx value if you don't set one, and you can specify a larger one using -Xmx.
>>
>> Fwiw, here's the output from 7u21 on my Linux x64 system with 32 GB of RAM,
>> notice I also specified +AlwaysPreTouch which forces every page to be
>> touched as part of the command execution to illustrate the memory has been
>> reserved, committed and touched:
>> $ java -Xmx26g -Xms26g -XX:+AlwaysPreTouch -XX:ReservedCodeCacheSize=256m
>> -version
>> java version "1.7.0_21"
>> Java(TM) SE Runtime Environment (build 1.7.0_21-b11)
>> Java HotSpot(TM) 64-Bit Server VM (build 23.21-b01, mixed mode)
>
> Ok, you are right, it seems to work. What do you recommend for code
> cache size or how to find a good value for it?

There's no magic number for what to increase it to. It's probably even more of a dark art than suggesting a Java heap configuration size. We don't know how much code you've got and how much of it will be executed enough times to compile, or require a de-opt and re-opt.

However, you can monitor the occupancy of the code cache in a couple of different ways. There's a JMX MBean for the code cache where you can get the occupancy and size of the code cache. There's also a plug-in for VisualVM that monitors code cache size and occupancy; you can get a copy of the plug-in and install it into VisualVM. The web site for it is: https://java.net/projects/memorypoolview. If you're monitoring code cache occupancy in production today, you should probably put it on your short list.

In JDK 7u21, code cache flushing is enabled by default. So when the code cache gets close to full, it will attempt to flush the oldest compilations to make space available. I noticed some recent commits for JDK 8 that improve the behavior of code cache flushing.
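Fwiw, a minimal sketch of the MBean approach, using nothing but the standard java.lang.management API. The pool name "Code Cache" is what HotSpot reports for the JIT code cache; treat the exact string as an assumption if you're on some other VM:

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

public class CodeCacheWatcher {
    public static void main(String[] args) {
        // The JIT code cache shows up as one of the non-heap memory pools.
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if ("Code Cache".equals(pool.getName())) {
                MemoryUsage u = pool.getUsage();
                System.out.printf("code cache: used=%dK committed=%dK max=%dK (%.1f%% of max)%n",
                        u.getUsed() / 1024, u.getCommitted() / 1024, u.getMax() / 1024,
                        100.0 * u.getUsed() / u.getMax());
            }
        }
    }
}

Run that in-process on a schedule, or poll the same MBean over a remote JMX connection, and you have a poor man's code cache monitor.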
If your application requires a huge amount of code cache space, flushing may not be the best option. Ideally, you'd like to not have to rely on flushing. You can disable code cache flushing by using -XX:-UseCodeCacheFlushing. But, realize that if you run out of code cache space, JIT compilation will be halted. If you monitor code cache with code cache flushing enabled and you see code cache occupancy grow near capacity and drop back, it's a symptom that code cache flushing is taken place and new (JIT compilation) activations are occurring. If you disable code cache flushing, and you see code cache occupancy grow near capacity and you've noticed application throughput slows down, or your response times have increased, you can expect that code cache space has been exhausted. If you have code cache flushing disabled, if you run out of code cache, you'll get a message in your log that says: CodeCache is full. Compiler has been disabled. Try increasing the code cache size using -XX:ReservedCodeCacheSize= > ** > Martin > > >> >> >> >> >> >> On May 26, 2013, at 10:20 AM, Martin Makundi wrote: >> >> >> Sorry, forgot to mention, using: >> >> >> java version "1.7.0_21" >> >> Java(TM) SE Runtime Environment (build 1.7.0_21-b11) >> >> Java HotSpot(TM) 64-Bit Server VM (build 23.21-b01, mixed mode) >> >> >> Linux version 3.0.1.stk64 (dfn at localhost.localdomain) (gcc version >> >> 4.5.1 20100924 (Red Hat 4.5.1-4) (GCC) ) #1 SMP Sat Aug 13 12:53:46 >> >> EDT 2011 >> >> >> -Dclassworlds.conf=/usr/share/maven/maven/bin/m2.conf >> >> -Dmaven.home=/usr/share/maven/maven >> >> -Duser.timezone=EET >> >> -XX:+AggressiveOpts >> >> -XX:+DisableExplicitGC >> >> -XX:+ParallelRefProcEnabled >> >> -XX:+PrintGCDateStamps >> >> -XX:+PrintGCDetails >> >> -XX:+PrintHeapAtGC >> >> -XX:+UseAdaptiveSizePolicy >> >> -XX:+UseCompressedOops >> >> -XX:+UseFastAccessorMethods >> >> -XX:+UseG1GC >> >> -XX:+UseGCOverheadLimit >> >> -XX:+UseNUMA >> >> -XX:+UseStringCache >> >> -XX:CMSInitiatingOccupancyFraction=70 >> >> -XX:GCPauseIntervalMillis=10000 >> >> -XX:InitiatingHeapOccupancyPercent=0 >> >> -XX:MaxGCPauseMillis=500 >> >> -XX:MaxPermSize=512m >> >> -XX:PermSize=512m >> >> -XX:ReservedCodeCacheSize=48m >> >> -Xloggc:gc.log >> >> -Xmaxf1 >> >> -Xms30G >> >> -Xmx30G >> >> -Xnoclassgc >> >> -Xss4096k >> >> >> >> ** >> >> Martin >> >> >> 2013/5/26 Charlie Hunt : >> >> Which version of the JDK/JRE are you using? >> >> >> One of the links you referenced below was using JDK 6, where there is no >> official support for G1. The other link suggests it could have been RMI DGC >> or a System.gc(). >> >> >> >> >> Sent from my iPhone >> >> >> On May 25, 2013, at 11:43 PM, "Martin Makundi" >> wrote: >> >> >> it occurs daily. >> >> >> >> >> >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20130529/dd5f29d1/attachment-0001.html From john.cuthbertson at oracle.com Wed May 29 10:35:27 2013 From: john.cuthbertson at oracle.com (John Cuthbertson) Date: Wed, 29 May 2013 10:35:27 -0700 Subject: Bug in G1GC it performs Full GC when code cache is full resulting in overkill In-Reply-To: References: <3807DE76-D6CA-451F-AC72-771332825905@salesforce.com> <5C892669-3515-4DCC-BB72-96AE89EBE5F8@salesforce.com> Message-ID: <51A63C5F.9040808@oracle.com> Hi Martin, I'm going to fill in bit more detail to Charlie's replies.... On 5/29/2013 8:49 AM, Martin Makundi wrote: > Hi! > >> A bit of constructive criticism ;-) It would be good practice to set one option at a time and measure its performance to determine whether it improves performance rather than choosing an option because of something you read in text. In short, always measure and reason about whether what you've observed for an improvement or regression makes sense. And, also run multiple times to get a sense of noise versus real improvement or regression. > > Thanks. That's one of the reasons we never changed our options. Once > we found someting that works very well, we know that its always n! > work to test changes and the system was running very nice indeed > before the previous tweak ;) > >>>> -XX:+UseFastAccessorMethod (the default is disabled) >>> Fast sounds good, the description of it is "Use optimized versions of >>> GetField" which sounds good. I see no harm in this. >> These would be JNI operations. >> >> A quick at the HotSpot source suggests UseFastAccessorMethods >> is mostly confined to interpreter operations. > Thanks for the info. Doesn't say much to me, but does not seem to harm > anything. Will try setting it off at some point in time. > >>>> -XX:+UseNUMA (Are you running a JVM that spans NUMA memory nodes? >>>> Or, do you have multiple JVMs running on a NUMA or non-NUMA system?) >> Key point is that you should use -XX:+UseNUMA only when you are >> deploying a JVM that spans NUMA nodes. > Thanks for the info. Doesn't say much to me, but does not seem to harm > anything. Will try setting it off at some point in time. The fast accessor methods flag creates specialized (i.e. short and optimized) interpreter entry points for accessor methods (those that just return the value in one of the object's fields). In most applications the bulk of the execution time is spent executing JIT compiled code; only a few percent is typically spent in Hotspot's interpreter. The JIT compiler will always try to inline accessor methods into their caller. So, unless your application is spending a ton of time interpreting, this flag should make no difference. >>>> -XX:CMSInitiatingOccupancyFraction=70 (This is applicable to CMS GC, and not applicable to G1 GC) >>> Again, not documented thoroughly where it applies and where not, jvm >>> gave no warning/error about it so we assumed it's valid. >> There's always the HotSpot source code ;-) >> >> It's also quite well documented in various slide ware on the internet. >> It's also quite well documented in the Java Performance book. :-) > Uh.. does it say somewhere that Do not use > XX:CMSInitiatingOccupancyFraction with G1GC? ;) I know performance > tuning is your bread and butter but is not ours... is more like we are > just driving the car and you are the mechanic...different > perspective.. just trying to fill'er'up to go..leaded or unleaded... 
> ;) The G1 equivalent of this flag is InitiatingHeapOccupancyPercent (IHOP for short). Actually both G1 and CMS accept and observe IHOP. CMSInitiatingOccupancyFraction was superseded by IHOP. The JVM still accepts the old flag name - but it is CMS only and doesn't affect G1. >>>> -XX:InitiatingHeapOccupancyPercent=0 (You realize this will force G1's concurrent cycle to run continuously?) >>> Yes, that's what we figured out, we don't want it to sit lazy and end >>> up in a situation where it is required to do a Full GC. This switch >>> was specifically chosen in a situation we had a memory leak and tried >>> to aggressively fight against it before we found the root cause. Maybe >>> we should try without this switch now, and see what effect it has. >> Having GC logs to see what available head room you have between the >> initiating of a G1 concurrent cycle and available regions / heap space would be most appropriate. > Hmm.. I don't thoroughly understand the logs either, but, here is a snap: > > 2013-05-29T17:28:56.119+0300: 38905.407: [GC pause (young), 0.29261000 secs] > [Parallel Time: 288.8 ms] > [GC Worker Start (ms): 38905407.6 38905407.7 38905407.7 38905407.9 > Avg: 38905407.7, Min: 38905407.6, Max: 38905407.9, Diff: 0.3] > [Ext Root Scanning (ms): 22.8 16.3 18.1 22.0 > Avg: 19.8, Min: 16.3, Max: 22.8, Diff: 6.6] > [SATB Filtering (ms): 0.0 0.1 0.0 0.0 > Avg: 0.0, Min: 0.0, Max: 0.1, Diff: 0.1] > [Update RS (ms): 31.9 37.3 35.1 33.3 > Avg: 34.4, Min: 31.9, Max: 37.3, Diff: 5.5] > [Processed Buffers : 102 106 119 104 > Sum: 431, Avg: 107, Min: 102, Max: 119, Diff: 17] > [Scan RS (ms): 0.0 0.0 0.1 0.0 > Avg: 0.0, Min: 0.0, Max: 0.1, Diff: 0.1] > [Object Copy (ms): 228.2 229.1 229.5 227.3 > Avg: 228.5, Min: 227.3, Max: 229.5, Diff: 2.2] > [Termination (ms): 0.0 0.0 0.0 0.0 > Avg: 0.0, Min: 0.0, Max: 0.0, Diff: 0.0] > [Termination Attempts : 4 1 11 4 > Sum: 20, Avg: 5, Min: 1, Max: 11, Diff: 10] > [GC Worker End (ms): 38905690.5 38905690.5 38905690.5 38905690.5 > Avg: 38905690.5, Min: 38905690.5, Max: 38905690.5, Diff: 0.0] > [GC Worker (ms): 282.9 282.8 282.8 282.6 > Avg: 282.8, Min: 282.6, Max: 282.9, Diff: 0.3] > [GC Worker Other (ms): 5.9 6.0 6.0 6.2 > Avg: 6.0, Min: 5.9, Max: 6.2, Diff: 0.3] > [Complete CSet Marking: 0.0 ms] > [Clear CT: 0.1 ms] > [Other: 3.7 ms] > [Choose CSet: 0.0 ms] > [Ref Proc: 2.8 ms] > [Ref Enq: 0.1 ms] > [Free CSet: 0.3 ms] > [Eden: 48M(5032M)->0B(5032M) Survivors: 288M->288M Heap: > 15790M(26624M)->15741M(26624M)] > [Times: user=1.14 sys=0.00, real=0.29 secs] > Heap after GC invocations=575 (full 157): > garbage-first heap total 27262976K, used 16119181K > [0x0000000160000000, 0x00000007e0000000, 0x00000007e0000000) > region size 8192K, 36 young (294912K), 36 survivors (294912K) > compacting perm gen total 524288K, used 164479K [0x00000007e0000000, > 0x0000000800000000, 0x0000000800000000) > the space 524288K, 31% used [0x00000007e0000000, > 0x00000007ea09ff68, 0x00000007ea0a0000, 0x0000000800000000) > No shared spaces configured. > } > {Heap before GC invocations=575 (full 157): > garbage-first heap total 27262976K, used 16119181K > [0x0000000160000000, 0x00000007e0000000, 0x00000007e0000000) > region size 8192K, 37 young (303104K), 36 survivors (294912K) > compacting perm gen total 524288K, used 164479K [0x00000007e0000000, > 0x0000000800000000, 0x0000000800000000) > the space 524288K, 31% used [0x00000007e0000000, > 0x00000007ea09ff68, 0x00000007ea0a0000, 0x0000000800000000) > No shared spaces configured. 
> 2013-05-29T17:28:56.413+0300: 38905.701: [Full GC > 15742M->14497M(26624M), 56.7731320 secs] > > That's the third Full GC today after the change to 26G and change from > occupancypercent=0. Tomorrow will be trying again with > occupancypercent=0 What did you set the IHOP value to? > >>>> -noclassgc (This is rarely needed and haven't seen an app that required it for quite some time) >>> Jvm 1.6 stopped the world for couple of minutes several times per day >>> while unloading classes, so we used noclassgc to disable that. We do >>> not know if this is necessary for latest 1.7 to avoid class unload >>> pause, but we continued to use this switch and found no harm in it. >>> Can't afford testing that in production ;) >> Haven't seen a case where unloading classes cause a several minute pause. >> Are you sure your system is not swapping? And, do you have GC logs you >> can share that illustrate the behavior and that -noclassgc fixed it? > We deleted swap partition long time ago, we simply do not risk swapping at all. > > We had this class unloading problem several times per day like half a > year ago, and fixed it with noclasssgc, that was a no-brainer, single > parameter that made the difference. > > It is also discussed here (they do not discuss noclassgc though, we > figured that out somehow) > http://stackoverflow.com/questions/2833983/meaning-of-the-unloading-class-messages G1 only performs class unloading during a full GC. But if you're not running out of perm space or compiled code cache - you can leave this flag. > >>>> -XX:+ UseGCOverheadLimit >>>> -XX:ReservedCodeCacheSize=48, that is the default for 7u21. You might consider setting it higher if you have the available space, and more importantly if you think you're running out of code space. >>> For our sun jvm linux 64bit 48m is maximum, jvm won't start if higher value. >> If you can't go larger than -XX:ReservedCodeCacheSize=48m, that may suggest you have memory constraints and may also suggest you don't have enough swap space defined, and you may be experiencing swapping during JVM execution. I've got a Linux system that has 32 GB of RAM, I can set ReservedCodeCacheSize=256m with no issues, even with -Xms30g and -Xms30g. > It is also documented that 48m is maximum > http://www.oracle.com/technetwork/java/javase/tech/vmoptions-jsp-140102.html > "maximum code cache size. [Solaris 64-bit, amd64, and -server x86: > 48m" > > That's the default max code cache size. When the JIT compiler compiles a Java method it places the generated code into the code cache. When there's no more room in the code cache, a warning is issued and JIT compilation is stopped. You can set it higher. IIRC there was time in the past when the size was limited in order to use short branches in compiled code. I don't think we've had that restriction for a while. HTHs JohnC From darius.ski at gmail.com Wed May 29 12:11:24 2013 From: darius.ski at gmail.com (Darius D.) Date: Wed, 29 May 2013 22:11:24 +0300 Subject: Bug in G1GC it performs Full GC when code cache is full resulting in overkill In-Reply-To: <51A63C5F.9040808@oracle.com> References: <3807DE76-D6CA-451F-AC72-771332825905@salesforce.com> <5C892669-3515-4DCC-BB72-96AE89EBE5F8@salesforce.com> <51A63C5F.9040808@oracle.com> Message-ID: Hi, I'd strongly suggest that Martin should add -XX:+PrintAdaptiveSizePolicy to his JVM options. In our case that was what we needed to solve the mystery of FullGCs with gigabytes of heap free. 
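E.g. keep the rest of the flags as they are and just add it (a sketch, not Martin's exact command line):

$ java -XX:+UseG1GC -XX:+PrintGCDetails -XX:+PrintAdaptiveSizePolicy -Xloggc:gc.log ...

The extra "G1Ergonomics" lines it writes to gc.log spell out why G1 made each decision, e.g. why a concurrent cycle was requested or why an allocation forced the collector's hand.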
Actually with some minor googling around i've found: https://forums.oracle.com/forums/thread.jspa?messageID=10869877 I suspect it could be same story as ours, "humongous allocation request failed" is bad for JVM health, FullGC will occur immediately. Remember, any allocation that is larger than half of G1GC region size will get allocated as "humongous" object on heap, that does not care about regions etc. In our case we were failing to allocate 32 megabytes with over 50% of heap free! Best regards, Darius. On Wed, May 29, 2013 at 8:35 PM, John Cuthbertson wrote: > Hi Martin, > > I'm going to fill in bit more detail to Charlie's replies.... > > On 5/29/2013 8:49 AM, Martin Makundi wrote: >> Hi! >> >>> A bit of constructive criticism ;-) It would be good practice to set one option at a time and measure its performance to determine whether it improves performance rather than choosing an option because of something you read in text. In short, always measure and reason about whether what you've observed for an improvement or regression makes sense. And, also run multiple times to get a sense of noise versus real improvement or regression. >> >> Thanks. That's one of the reasons we never changed our options. Once >> we found someting that works very well, we know that its always n! >> work to test changes and the system was running very nice indeed >> before the previous tweak ;) >> >>>>> -XX:+UseFastAccessorMethod (the default is disabled) >>>> Fast sounds good, the description of it is "Use optimized versions of >>>> GetField" which sounds good. I see no harm in this. >>> These would be JNI operations. >>> >>> A quick at the HotSpot source suggests UseFastAccessorMethods >>> is mostly confined to interpreter operations. >> Thanks for the info. Doesn't say much to me, but does not seem to harm >> anything. Will try setting it off at some point in time. >> >>>>> -XX:+UseNUMA (Are you running a JVM that spans NUMA memory nodes? >>>>> Or, do you have multiple JVMs running on a NUMA or non-NUMA system?) >>> Key point is that you should use -XX:+UseNUMA only when you are >>> deploying a JVM that spans NUMA nodes. >> Thanks for the info. Doesn't say much to me, but does not seem to harm >> anything. Will try setting it off at some point in time. > > The fast accessor methods flag creates specialized (i.e. short and > optimized) interpreter entry points for accessor methods (those that > just return the value in one of the object's fields). In most > applications the bulk of the execution time is spent executing JIT > compiled code; only a few percent is typically spent in Hotspot's > interpreter. The JIT compiler will always try to inline accessor methods > into their caller. So, unless your application is spending a ton of time > interpreting, this flag should make no difference. > >>>>> -XX:CMSInitiatingOccupancyFraction=70 (This is applicable to CMS GC, and not applicable to G1 GC) >>>> Again, not documented thoroughly where it applies and where not, jvm >>>> gave no warning/error about it so we assumed it's valid. >>> There's always the HotSpot source code ;-) >>> >>> It's also quite well documented in various slide ware on the internet. >>> It's also quite well documented in the Java Performance book. :-) >> Uh.. does it say somewhere that Do not use >> XX:CMSInitiatingOccupancyFraction with G1GC? ;) I know performance >> tuning is your bread and butter but is not ours... is more like we are >> just driving the car and you are the mechanic...different >> perspective.. 
just trying to fill'er'up to go..leaded or unleaded... >> ;) > > The G1 equivalent of this flag is InitiatingHeapOccupancyPercent (IHOP > for short). Actually both G1 and CMS accept and observe IHOP. > CMSInitiatingOccupancyFraction was superseded by IHOP. The JVM still > accepts the old flag name - but it is CMS only and doesn't affect G1. > >>>>> -XX:InitiatingHeapOccupancyPercent=0 (You realize this will force G1's concurrent cycle to run continuously?) >>>> Yes, that's what we figured out, we don't want it to sit lazy and end >>>> up in a situation where it is required to do a Full GC. This switch >>>> was specifically chosen in a situation we had a memory leak and tried >>>> to aggressively fight against it before we found the root cause. Maybe >>>> we should try without this switch now, and see what effect it has. >>> Having GC logs to see what available head room you have between the >>> initiating of a G1 concurrent cycle and available regions / heap space would be most appropriate. >> Hmm.. I don't thoroughly understand the logs either, but, here is a snap: >> >> 2013-05-29T17:28:56.119+0300: 38905.407: [GC pause (young), 0.29261000 secs] >> [Parallel Time: 288.8 ms] >> [GC Worker Start (ms): 38905407.6 38905407.7 38905407.7 38905407.9 >> Avg: 38905407.7, Min: 38905407.6, Max: 38905407.9, Diff: 0.3] >> [Ext Root Scanning (ms): 22.8 16.3 18.1 22.0 >> Avg: 19.8, Min: 16.3, Max: 22.8, Diff: 6.6] >> [SATB Filtering (ms): 0.0 0.1 0.0 0.0 >> Avg: 0.0, Min: 0.0, Max: 0.1, Diff: 0.1] >> [Update RS (ms): 31.9 37.3 35.1 33.3 >> Avg: 34.4, Min: 31.9, Max: 37.3, Diff: 5.5] >> [Processed Buffers : 102 106 119 104 >> Sum: 431, Avg: 107, Min: 102, Max: 119, Diff: 17] >> [Scan RS (ms): 0.0 0.0 0.1 0.0 >> Avg: 0.0, Min: 0.0, Max: 0.1, Diff: 0.1] >> [Object Copy (ms): 228.2 229.1 229.5 227.3 >> Avg: 228.5, Min: 227.3, Max: 229.5, Diff: 2.2] >> [Termination (ms): 0.0 0.0 0.0 0.0 >> Avg: 0.0, Min: 0.0, Max: 0.0, Diff: 0.0] >> [Termination Attempts : 4 1 11 4 >> Sum: 20, Avg: 5, Min: 1, Max: 11, Diff: 10] >> [GC Worker End (ms): 38905690.5 38905690.5 38905690.5 38905690.5 >> Avg: 38905690.5, Min: 38905690.5, Max: 38905690.5, Diff: 0.0] >> [GC Worker (ms): 282.9 282.8 282.8 282.6 >> Avg: 282.8, Min: 282.6, Max: 282.9, Diff: 0.3] >> [GC Worker Other (ms): 5.9 6.0 6.0 6.2 >> Avg: 6.0, Min: 5.9, Max: 6.2, Diff: 0.3] >> [Complete CSet Marking: 0.0 ms] >> [Clear CT: 0.1 ms] >> [Other: 3.7 ms] >> [Choose CSet: 0.0 ms] >> [Ref Proc: 2.8 ms] >> [Ref Enq: 0.1 ms] >> [Free CSet: 0.3 ms] >> [Eden: 48M(5032M)->0B(5032M) Survivors: 288M->288M Heap: >> 15790M(26624M)->15741M(26624M)] >> [Times: user=1.14 sys=0.00, real=0.29 secs] >> Heap after GC invocations=575 (full 157): >> garbage-first heap total 27262976K, used 16119181K >> [0x0000000160000000, 0x00000007e0000000, 0x00000007e0000000) >> region size 8192K, 36 young (294912K), 36 survivors (294912K) >> compacting perm gen total 524288K, used 164479K [0x00000007e0000000, >> 0x0000000800000000, 0x0000000800000000) >> the space 524288K, 31% used [0x00000007e0000000, >> 0x00000007ea09ff68, 0x00000007ea0a0000, 0x0000000800000000) >> No shared spaces configured. 
>> } >> {Heap before GC invocations=575 (full 157): >> garbage-first heap total 27262976K, used 16119181K >> [0x0000000160000000, 0x00000007e0000000, 0x00000007e0000000) >> region size 8192K, 37 young (303104K), 36 survivors (294912K) >> compacting perm gen total 524288K, used 164479K [0x00000007e0000000, >> 0x0000000800000000, 0x0000000800000000) >> the space 524288K, 31% used [0x00000007e0000000, >> 0x00000007ea09ff68, 0x00000007ea0a0000, 0x0000000800000000) >> No shared spaces configured. >> 2013-05-29T17:28:56.413+0300: 38905.701: [Full GC >> 15742M->14497M(26624M), 56.7731320 secs] >> >> That's the third Full GC today after the change to 26G and change from >> occupancypercent=0. Tomorrow will be trying again with >> occupancypercent=0 > > What did you set the IHOP value to? > >> >>>>> -noclassgc (This is rarely needed and haven't seen an app that required it for quite some time) >>>> Jvm 1.6 stopped the world for couple of minutes several times per day >>>> while unloading classes, so we used noclassgc to disable that. We do >>>> not know if this is necessary for latest 1.7 to avoid class unload >>>> pause, but we continued to use this switch and found no harm in it. >>>> Can't afford testing that in production ;) >>> Haven't seen a case where unloading classes cause a several minute pause. >>> Are you sure your system is not swapping? And, do you have GC logs you >>> can share that illustrate the behavior and that -noclassgc fixed it? >> We deleted swap partition long time ago, we simply do not risk swapping at all. >> >> We had this class unloading problem several times per day like half a >> year ago, and fixed it with noclasssgc, that was a no-brainer, single >> parameter that made the difference. >> >> It is also discussed here (they do not discuss noclassgc though, we >> figured that out somehow) >> http://stackoverflow.com/questions/2833983/meaning-of-the-unloading-class-messages > > G1 only performs class unloading during a full GC. But if you're not > running out of perm space or compiled code cache - you can leave this flag. > >> >>>>> -XX:+ UseGCOverheadLimit >>>>> -XX:ReservedCodeCacheSize=48, that is the default for 7u21. You might consider setting it higher if you have the available space, and more importantly if you think you're running out of code space. >>>> For our sun jvm linux 64bit 48m is maximum, jvm won't start if higher value. >>> If you can't go larger than -XX:ReservedCodeCacheSize=48m, that may suggest you have memory constraints and may also suggest you don't have enough swap space defined, and you may be experiencing swapping during JVM execution. I've got a Linux system that has 32 GB of RAM, I can set ReservedCodeCacheSize=256m with no issues, even with -Xms30g and -Xms30g. >> It is also documented that 48m is maximum >> http://www.oracle.com/technetwork/java/javase/tech/vmoptions-jsp-140102.html >> "maximum code cache size. [Solaris 64-bit, amd64, and -server x86: >> 48m" >> >> > > That's the default max code cache size. When the JIT compiler compiles a > Java method it places the generated code into the code cache. When > there's no more room in the code cache, a warning is issued and JIT > compilation is stopped. You can set it higher. IIRC there was time in > the past when the size was limited in order to use short branches in > compiled code. I don't think we've had that restriction for a while. 
>
> HTHs
>
> JohnC
> _______________________________________________
> hotspot-gc-use mailing list
> hotspot-gc-use at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use

From martin.makundi at koodaripalvelut.com  Thu May 30 11:37:33 2013
From: martin.makundi at koodaripalvelut.com (Martin Makundi)
Date: Thu, 30 May 2013 21:37:33 +0300
Subject: Bug in G1GC it performs Full GC when code cache is full resulting in overkill
In-Reply-To: <6208C497-21E4-4A26-B825-13F23E702898@salesforce.com>
References: <9A54ECD2-B470-4797-9EFB-E81DD91C6B3F@salesforce.com>
	<6208C497-21E4-4A26-B825-13F23E702898@salesforce.com>
Message-ID: 

Hi!

> Having free memory is one thing. How much free memory do you have? Do you
> have 2x more than your Java heap size, including -Xmx and what you're
> specifying for MaxPermSize?

Do we need 2x more (or just 2x) memory relative to the Java heap size? Why? Currently we have 40 GB of RAM and 26-30 GB allocated to Java (fixed heap, -Xms equal to -Xmx). The rest is for system needs.

> The question is: did you set both -XX:PermSize and -XX:MaxPermSize to the same
> value? If not, when perm gen needs to expand from the initial size, or gets
> close to that initial size, the JVM may attempt to unload classes to free up
> space before expanding perm gen.

Yes, exactly for that reason we have set both equal.

> But, if you'd rather use -noclassgc, then go ahead. :-)

We had to use that too, PermSize alone didn't do the job.

> -XX:ReservedCodeCacheSize=48m, that is the default for 7u21.
>
> There's no magic number for what to increase it to. It's probably even more
> of a dark art than suggesting a Java heap configuration size. We don't know
> how much code you've got and how much of it will be executed enough times to
> compile, or require a de-opt and re-opt.

Ok. We set it to 256m and code cache usage is now approximately 20-25% most of the time. A bit over-sized, but luckily we aren't on 8-bit hardware anymore, so we can afford it ;)

> However, you can monitor the occupancy of the code cache in a couple of
> different ways. There's a JMX MBean for the code cache where you can get the
> occupancy and size of the code cache. There's also a plug-in for VisualVM that
> monitors code cache size and occupancy; you can get a copy of the plug-in and
> install it into VisualVM. The web site for it is:
> https://java.net/projects/memorypoolview. If you're monitoring code cache
> occupancy in production today, you should probably put it on your short
> list.

We have quite a nice view of our server stats from the New Relic and AppDynamics monitors. Will try 128m though...
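(Back-of-envelope budget, with a guessed thread count since I don't have the exact figure at hand: 30G heap + 512m perm + 256m code cache + roughly 500 threads * 4m stacks is about 33G of the 40G, so there should still be a few GB of headroom left for the OS.)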
>
> On May 26, 2013, at 10:20 AM, Martin Makundi wrote:
>> Sorry, forgot to mention, using:
>>
>> java version "1.7.0_21"
>> Java(TM) SE Runtime Environment (build 1.7.0_21-b11)
>> Java HotSpot(TM) 64-Bit Server VM (build 23.21-b01, mixed mode)
>>
>> Linux version 3.0.1.stk64 (dfn at localhost.localdomain) (gcc version
>> 4.5.1 20100924 (Red Hat 4.5.1-4) (GCC) ) #1 SMP Sat Aug 13 12:53:46
>> EDT 2011
>>
>> -Dclassworlds.conf=/usr/share/maven/maven/bin/m2.conf
>> -Dmaven.home=/usr/share/maven/maven
>> -Duser.timezone=EET
>> -XX:+AggressiveOpts
>> -XX:+DisableExplicitGC
>> -XX:+ParallelRefProcEnabled
>> -XX:+PrintGCDateStamps
>> -XX:+PrintGCDetails
>> -XX:+PrintHeapAtGC
>> -XX:+UseAdaptiveSizePolicy
>> -XX:+UseCompressedOops
>> -XX:+UseFastAccessorMethods
>> -XX:+UseG1GC
>> -XX:+UseGCOverheadLimit
>> -XX:+UseNUMA
>> -XX:+UseStringCache
>> -XX:CMSInitiatingOccupancyFraction=70
>> -XX:GCPauseIntervalMillis=10000
>> -XX:InitiatingHeapOccupancyPercent=0
>> -XX:MaxGCPauseMillis=500
>> -XX:MaxPermSize=512m
>> -XX:PermSize=512m
>> -XX:ReservedCodeCacheSize=48m
>> -Xloggc:gc.log
>> -Xmaxf1
>> -Xms30G
>> -Xmx30G
>> -Xnoclassgc
>> -Xss4096k
>>
>> **
>> Martin
>>
>> 2013/5/26 Charlie Hunt :
>>> Which version of the JDK/JRE are you using?
>>>
>>> One of the links you referenced below was using JDK 6, where there is no
>>> official support for G1. The other link suggests it could have been RMI DGC
>>> or a System.gc().
>>>
>>> Sent from my iPhone
>>>
>>> On May 25, 2013, at 11:43 PM, "Martin Makundi" wrote:
>>>> it occurs daily.
>
> _______________________________________________
> hotspot-gc-use mailing list
> hotspot-gc-use at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use

From martin.makundi at koodaripalvelut.com Thu May 30 11:54:04 2013
From: martin.makundi at koodaripalvelut.com (Martin Makundi)
Date: Thu, 30 May 2013 21:54:04 +0300
Subject: Bug in G1GC it performs Full GC when code cache is full resulting in overkill
In-Reply-To: 
References: <3807DE76-D6CA-451F-AC72-771332825905@salesforce.com>
 <5C892669-3515-4DCC-BB72-96AE89EBE5F8@salesforce.com>
 <51A63C5F.9040808@oracle.com>
Message-ID: 

Hi!

> I'd strongly suggest that Martin should add
> -XX:+PrintAdaptiveSizePolicy to his JVM options. In our case that was
> what we needed to solve the mystery of FullGCs with gigabytes of heap
> free.

Thanks, will add that tomorrow.

> Actually with some minor googling around I've found:
>
> https://forums.oracle.com/forums/thread.jspa?messageID=10869877
>
> I suspect it could be the same story as ours; a "humongous allocation
> request failed" is bad for JVM health, a Full GC will occur immediately.
>
> Remember, any allocation that is larger than half of the G1GC region size
> will get allocated as a "humongous" object on the heap, which does not care
> about regions etc. In our case we were failing to allocate 32
> megabytes with over 50% of the heap free!

Any solution to such a problem, or is it a bug in G1GC? Is there a way to
log what code is performing the memory allocation if that happens to
be the case?

** Martin

>
> Best regards,
>
> Darius.
>
> On Wed, May 29, 2013 at 8:35 PM, John Cuthbertson wrote:
>>> Hi Martin,
>>>
>>> I'm going to fill in a bit more detail to Charlie's replies....
>>>
>>> On 5/29/2013 8:49 AM, Martin Makundi wrote:
>>>> Hi!
>>>> A bit of constructive criticism ;-) It would be good practice to set one
>>>> option at a time and measure its performance to determine whether it
>>>> improves performance, rather than choosing an option because of something
>>>> you read in text. In short, always measure and reason about whether what
>>>> you've observed for an improvement or regression makes sense. And, also
>>>> run multiple times to get a sense of noise versus real improvement or
>>>> regression.
>>>
>>> Thanks. That's one of the reasons we never changed our options. Once
>>> we found something that works very well, we know that it's always n!
>>> work to test changes, and the system was running very nice indeed
>>> before the previous tweak ;)
>>>
>>>>>> -XX:+UseFastAccessorMethods (the default is disabled)
>>>>> Fast sounds good, the description of it is "Use optimized versions of
>>>>> GetField" which sounds good. I see no harm in this.
>>>> These would be JNI operations.
>>>>
>>>> A quick look at the HotSpot source suggests UseFastAccessorMethods
>>>> is mostly confined to interpreter operations.
>>> Thanks for the info. Doesn't say much to me, but does not seem to harm
>>> anything. Will try setting it off at some point in time.
>>>
>>>>>> -XX:+UseNUMA (Are you running a JVM that spans NUMA memory nodes?
>>>>>> Or, do you have multiple JVMs running on a NUMA or non-NUMA system?)
>>>> Key point is that you should use -XX:+UseNUMA only when you are
>>>> deploying a JVM that spans NUMA nodes.
>>> Thanks for the info. Doesn't say much to me, but does not seem to harm
>>> anything. Will try setting it off at some point in time.
>>
>> The fast accessor methods flag creates specialized (i.e. short and
>> optimized) interpreter entry points for accessor methods (those that
>> just return the value in one of the object's fields). In most
>> applications the bulk of the execution time is spent executing JIT
>> compiled code; only a few percent is typically spent in Hotspot's
>> interpreter. The JIT compiler will always try to inline accessor methods
>> into their caller. So, unless your application is spending a ton of time
>> interpreting, this flag should make no difference.
>>
>>>>>> -XX:CMSInitiatingOccupancyFraction=70 (This is applicable to CMS GC, and not applicable to G1 GC)
>>>>> Again, it is not documented thoroughly where it applies and where not; the
>>>>> JVM gave no warning/error about it so we assumed it's valid.
>>>> There's always the HotSpot source code ;-)
>>>>
>>>> It's also quite well documented in various slideware on the internet.
>>>> It's also quite well documented in the Java Performance book. :-)
>>> Uh.. does it say somewhere "Do not use
>>> -XX:CMSInitiatingOccupancyFraction with G1GC"? ;) I know performance
>>> tuning is your bread and butter but it is not ours... it is more like we
>>> are just driving the car and you are the mechanic... different
>>> perspective.. just trying to fill'er'up to go.. leaded or unleaded...
>>> ;)
>>
>> The G1 equivalent of this flag is InitiatingHeapOccupancyPercent (IHOP
>> for short). Actually both G1 and CMS accept and observe IHOP.
>> CMSInitiatingOccupancyFraction was superseded by IHOP. The JVM still
>> accepts the old flag name - but it is CMS only and doesn't affect G1.
>>
>>>>>> -XX:InitiatingHeapOccupancyPercent=0 (You realize this will force G1's concurrent cycle to run continuously?)
>>>>> Yes, that's what we figured out, we don't want it to sit lazy and end
>>>>> up in a situation where it is required to do a Full GC.
>>>>> This switch
>>>>> was specifically chosen in a situation we had a memory leak and tried
>>>>> to aggressively fight against it before we found the root cause. Maybe
>>>>> we should try without this switch now, and see what effect it has.
>>>> Having GC logs to see what available head room you have between the
>>>> initiating of a G1 concurrent cycle and available regions / heap space
>>>> would be most appropriate.
>>> Hmm.. I don't thoroughly understand the logs either, but, here is a snap:
>>>
>>> 2013-05-29T17:28:56.119+0300: 38905.407: [GC pause (young), 0.29261000 secs]
>>>    [Parallel Time: 288.8 ms]
>>>       [GC Worker Start (ms):  38905407.6  38905407.7  38905407.7  38905407.9
>>>        Avg: 38905407.7, Min: 38905407.6, Max: 38905407.9, Diff: 0.3]
>>>       [Ext Root Scanning (ms):  22.8  16.3  18.1  22.0
>>>        Avg: 19.8, Min: 16.3, Max: 22.8, Diff: 6.6]
>>>       [SATB Filtering (ms):  0.0  0.1  0.0  0.0
>>>        Avg: 0.0, Min: 0.0, Max: 0.1, Diff: 0.1]
>>>       [Update RS (ms):  31.9  37.3  35.1  33.3
>>>        Avg: 34.4, Min: 31.9, Max: 37.3, Diff: 5.5]
>>>          [Processed Buffers : 102 106 119 104
>>>           Sum: 431, Avg: 107, Min: 102, Max: 119, Diff: 17]
>>>       [Scan RS (ms):  0.0  0.0  0.1  0.0
>>>        Avg: 0.0, Min: 0.0, Max: 0.1, Diff: 0.1]
>>>       [Object Copy (ms):  228.2  229.1  229.5  227.3
>>>        Avg: 228.5, Min: 227.3, Max: 229.5, Diff: 2.2]
>>>       [Termination (ms):  0.0  0.0  0.0  0.0
>>>        Avg: 0.0, Min: 0.0, Max: 0.0, Diff: 0.0]
>>>          [Termination Attempts : 4 1 11 4
>>>           Sum: 20, Avg: 5, Min: 1, Max: 11, Diff: 10]
>>>       [GC Worker End (ms):  38905690.5  38905690.5  38905690.5  38905690.5
>>>        Avg: 38905690.5, Min: 38905690.5, Max: 38905690.5, Diff: 0.0]
>>>       [GC Worker (ms):  282.9  282.8  282.8  282.6
>>>        Avg: 282.8, Min: 282.6, Max: 282.9, Diff: 0.3]
>>>       [GC Worker Other (ms):  5.9  6.0  6.0  6.2
>>>        Avg: 6.0, Min: 5.9, Max: 6.2, Diff: 0.3]
>>>    [Complete CSet Marking:  0.0 ms]
>>>    [Clear CT:  0.1 ms]
>>>    [Other:  3.7 ms]
>>>       [Choose CSet:  0.0 ms]
>>>       [Ref Proc:  2.8 ms]
>>>       [Ref Enq:  0.1 ms]
>>>       [Free CSet:  0.3 ms]
>>>    [Eden: 48M(5032M)->0B(5032M) Survivors: 288M->288M Heap: 15790M(26624M)->15741M(26624M)]
>>>  [Times: user=1.14 sys=0.00, real=0.29 secs]
>>> Heap after GC invocations=575 (full 157):
>>>  garbage-first heap   total 27262976K, used 16119181K [0x0000000160000000, 0x00000007e0000000, 0x00000007e0000000)
>>>   region size 8192K, 36 young (294912K), 36 survivors (294912K)
>>>  compacting perm gen  total 524288K, used 164479K [0x00000007e0000000, 0x0000000800000000, 0x0000000800000000)
>>>    the space 524288K,  31% used [0x00000007e0000000, 0x00000007ea09ff68, 0x00000007ea0a0000, 0x0000000800000000)
>>> No shared spaces configured.
>>> }
>>> {Heap before GC invocations=575 (full 157):
>>>  garbage-first heap   total 27262976K, used 16119181K [0x0000000160000000, 0x00000007e0000000, 0x00000007e0000000)
>>>   region size 8192K, 37 young (303104K), 36 survivors (294912K)
>>>  compacting perm gen  total 524288K, used 164479K [0x00000007e0000000, 0x0000000800000000, 0x0000000800000000)
>>>    the space 524288K,  31% used [0x00000007e0000000, 0x00000007ea09ff68, 0x00000007ea0a0000, 0x0000000800000000)
>>> No shared spaces configured.
>>> 2013-05-29T17:28:56.413+0300: 38905.701: [Full GC 15742M->14497M(26624M), 56.7731320 secs]
>>>
>>> That's the third Full GC today after the change to 26G and change from
>>> occupancypercent=0. Tomorrow will be trying again with
>>> occupancypercent=0
>>
>> What did you set the IHOP value to?
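For reference, G1's own reasoning can be made visible in exactly this kind of log; a possible flag combination (illustrative, all stock HotSpot flags, on top of what is already in use):

    java -XX:+UseG1GC \
         -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:gc.log \
         -XX:+PrintAdaptiveSizePolicy \
         ...

With -XX:+PrintAdaptiveSizePolicy, G1 emits "G1Ergonomics" lines stating why a concurrent cycle or a heap expansion was requested (occupancy threshold, humongous allocation, and so on), which makes Full GCs like the one above much easier to attribute.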
>> [...]
>>
>> HTHs
>>
>> JohnC
>> _______________________________________________
>> hotspot-gc-use mailing list
>> hotspot-gc-use at openjdk.java.net
>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
> _______________________________________________
> hotspot-gc-use mailing list
> hotspot-gc-use at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use

From monica.beckwith at oracle.com Thu May 30 13:55:35 2013
From: monica.beckwith at oracle.com (Monica Beckwith)
Date: Thu, 30 May 2013 15:55:35 -0500
Subject: Bug in G1GC it performs Full GC when code cache is full resulting in overkill
In-Reply-To: 
References: <3807DE76-D6CA-451F-AC72-771332825905@salesforce.com>
 <5C892669-3515-4DCC-BB72-96AE89EBE5F8@salesforce.com>
 <51A63C5F.9040808@oracle.com>
Message-ID: <51A7BCC7.4050700@oracle.com>

+1 to enabling PrintAdaptiveSizePolicy.
Darius,
As you have already mentioned - any object with a size greater than or equal
to half a region is called a "humongous" object (H-obj). The max region size
for G1 is 32M. So yes, even if you set your region size to the max, your 32M
object will be considered humongous.
Now, there are a couple of things that we should be aware of with respect to
humongous regions (H-regions)/objects -

1. The H-obj allocation will happen directly into the old generation.
   - There will be a check against the marking threshold (IHOP), and a
     concurrent cycle will be initiated if necessary.
2. The H-regions are not included in an evacuation pause, since that is just
   going to increase the copying expense.
   - But if the H-obj(s) are dead, they get freed at the end of the
     multi-phased concurrent marking cycle.

So, I think, if you have to work with H-objs and increasing the region size
doesn't help (as is your case), then maybe you should try limiting your
nursery so as to allow more space for the old generation, so as to sustain
your H-objs till they die; or, if they are a part of your live data set, then
it's all the more necessary to be able to fit them in your old gen.

-Monica
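To make the half-region rule concrete, a minimal sketch (the region size, array length, and class name are illustrative, not taken from this thread):

    // Run with: java -XX:+UseG1GC -XX:G1HeapRegionSize=8m HumongousDemo
    public class HumongousDemo {
        public static void main(String[] args) {
            // 8 MB is at least half of an 8 MB region, so this single array
            // is allocated as a humongous object, directly in the old generation.
            byte[] big = new byte[8 * 1024 * 1024];
            System.out.println("allocated " + big.length + " bytes");
        }
    }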
On 5/30/2013 1:54 PM, Martin Makundi wrote:
> Hi!
>
>> I'd strongly suggest that Martin should add
>> -XX:+PrintAdaptiveSizePolicy to his JVM options. In our case that was
>> what we needed to solve the mystery of FullGCs with gigabytes of heap
>> free.
>
> Thanks, will add that tomorrow.
>
> [...]
>
> Any solution to such a problem, or is it a bug in G1GC? Is there a way to
> log what code is performing the memory allocation if that happens to
> be the case?
>
> [...]

-- 
Monica Beckwith | Principal Member of Technical Staff
VOIP: +15124011274
Oracle Java Performance
From darius.ski at gmail.com Thu May 30 15:32:55 2013
From: darius.ski at gmail.com (Darius D.)
Date: Fri, 31 May 2013 01:32:55 +0300
Subject: Bug in G1GC it performs Full GC when code cache is full resulting in overkill
In-Reply-To: <51A7BCC7.4050700@oracle.com>
References: <3807DE76-D6CA-451F-AC72-771332825905@salesforce.com>
 <5C892669-3515-4DCC-BB72-96AE89EBE5F8@salesforce.com>
 <51A63C5F.9040808@oracle.com>
 <51A7BCC7.4050700@oracle.com>
Message-ID: 

Hi,

Monica, thanks a lot for your additional insight about H-Objects; as I've
already mentioned in this thread, your great JavaOne presentation about G1GC
was key in solving our problem.

You are right that it is all about "fitting" the H-Object in the old gen. In
our case, no matter how high (within the confines of what server memory
allowed us) we set the heap size, we were still getting H-Obj alloc failures.
Actually, even after drastically cutting the H-Object count we were still
getting some Full GCs; only after we increased the region size step by step
to 16M did they cease. It was a subtle fragmentation issue, made worse by the
rather small default region size. The following happened:

1) A big web request came in, and 1-1.5GB of various allocations were done to
generate that JSON string, triggering young GCs while it was in progress.

2) After the young GC the heap was a bit fragmented, because there was
temporary, but still "live", data. All that data now got into fresh
survivor/old regions. The "health" of the heap now really depends on how long
ago the last mixed GC ran.

3) Into such a "sprayed" heap we start to allocate our big object. I have no
idea how a set of regions is chosen for a humongous region, but I think we
were generating a total of ~30 humongous objects ("generating" as in resizing
a StringBuffer somewhere deep in the web framework till 30M fits), and that
was too much for G1GC to cope with.

4) Reducing the allocation rate is not enough, unfortunately; the small ones
that slip through are really dangerous - they are immediately allocated in
the old gen, fragmenting the heap further.

5) It is now a race between big web requests and the next mixed GC. We could
reliably reproduce Full GCs in testing by generating several well-timed big
requests :)

Getting the region size up from the default (actually I have a question: why
does G1GC aim for thousands of regions, and what are the drawbacks of a
larger-than-default region?) is more about reducing fragmentation by keeping
those large but temporary objects where they belong - in the nursery, where
G1GC can collect them efficiently.

So we have 3 important todos and tunables for avoiding H-Object caused Full
GCs (see the sketch after this list):

1) Code changes - any profiler that can record every allocation stack trace
will help; set it to record each alloc above 1/2 heap region size and limit
them as much as possible.

2) IHOP should be tuned to allow mixed GCs frequently enough even if your app
is behaving perfectly (stable old gen + temporaries in nursery with little
promotion going on).

3) The G1 region size can be increased to reduce heap fragmentation caused by
H-Objects if they are temporary (by reducing them to ordinary objects
allocated in the young gen).

Darius.
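As a command-line sketch of points 2) and 3) (the numbers are illustrative examples, not recommendations; G1HeapRegionSize and InitiatingHeapOccupancyPercent are the stock HotSpot flag names):

    java -XX:+UseG1GC \
         -XX:G1HeapRegionSize=16m \
         -XX:InitiatingHeapOccupancyPercent=45 \
         -XX:+PrintAdaptiveSizePolicy \
         -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:gc.log \
         ...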
On Thu, May 30, 2013 at 11:55 PM, Monica Beckwith <monica.beckwith at oracle.com> wrote:

> +1 to enabling PrintAdaptiveSizePolicy.
> Darius,
> As you have already mentioned - any object with a size greater than or
> equal to half a region is called a "humongous" object (H-obj). The max
> region size for G1 is 32M. So yes, even if you set your region size to
> the max, your 32M object will be considered humongous.
>
> [...]
From martin.makundi at koodaripalvelut.com Thu May 30 20:41:20 2013
From: martin.makundi at koodaripalvelut.com (Martin Makundi)
Date: Fri, 31 May 2013 06:41:20 +0300
Subject: Bug in G1GC it performs Full GC when code cache is full resulting in overkill
In-Reply-To: <51A7BCC7.4050700@oracle.com>
References: <3807DE76-D6CA-451F-AC72-771332825905@salesforce.com>
 <5C892669-3515-4DCC-BB72-96AE89EBE5F8@salesforce.com>
 <51A63C5F.9040808@oracle.com>
 <51A7BCC7.4050700@oracle.com>
Message-ID: 

Hi!

> So, I think, if you have to work with H-objs and increasing the region
> size doesn't help (as is your case), then maybe you should try limiting your
> nursery so as to allow more space for the old generation, so as to sustain
> your H-objs till they die; or, if they are a part of your live data set, then
> it's all the more necessary to be able to fit them in your old gen.

1. I will post my logs from today's results.
2. How can this ("limiting your nursery") be achieved in order to reach the
goal?
3. Is there a way to adjust the "region size" (what region??)?
4. Isn't G1GC with adaptivesizepolicy supposed to handle all this
automatically, i.e., is it a bug in the algorithm that it fails in these
situations?

> Monica, thanks a lot for your additional insight about H-Objects; as
> I've already mentioned in this thread, your great JavaOne presentation
> about G1GC was key in solving our problem.

Thanks for the hint, googled
http://www.myexpospace.com/JavaOne2012/SessionFiles/CON6583_PDF_6583_0001.pdf
Also
http://www.slideshare.net/C2B2/g1-garbage-collector-big-heaps-and-low-pauses
was nice.

> 1) Code changes - any profiler that can record every allocation stack trace
> will help; set it to record each alloc above 1/2 heap region size and limit
> them as much as possible.

What profiler can do such a thing, and moreover, what profiler can do that in
a production environment with low enough overhead that it can be run in
production?

> 2) IHOP should be tuned to allow mixed GCs frequently enough even if your
> app is behaving perfectly (stable old gen + temporaries in nursery with
> little promotion going on).

How to do that? We have set InitiatingHeapOccupancyPercent=0, and neither
this nor the default rids us of Full GCs.

> 3) The G1 region size can be increased to reduce heap fragmentation caused
> by H-Objects if they are temporary (by reducing them to ordinary objects
> allocated in the young gen).

Region size can be adjusted with G1HeapRegionSize? How does it interact with
adaptivesizepolicy; shouldn't adaptivesizepolicy automatically handle that
for me? Shouldn't G1GC be able to handle different size regions
simultaneously for best fit?

How about the tuning parameter G1MixedGCLiveThresholdPercent? Can we see from
the logs how it is performing? Should I enable -XX:+PrintGC with "fine" mode?

How about the tuning parameter G1MixedGCCountTarget? Can we see from the logs
how it is performing? Does the parameter G1MixedGCCountTarget have any effect
when InitiatingHeapOccupancyPercent=0? Can it 'run out'? What is the default
value for G1MixedGCCountTarget and what is the maximum value?

How about the tuning parameter G1HeapWastePercent? Can we see from the logs
how it is performing? How does it interact with adaptivesizepolicy?

Anybody know if the Azul JVM handles these issues automatically?

** Martin
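On question 2, one blunt sketch of capping the nursery from the command line (the sizes are illustrative only; note the caveat that pinning the young generation with -Xmn overrides G1's adaptive young-generation sizing, trading pause-time ergonomics for a larger, more stable old generation):

    java -XX:+UseG1GC -Xms30g -Xmx30g \
         -Xmn2g \
         -XX:G1HeapRegionSize=32m \
         ...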
>
> -Monica
>
> [...]