From matt.khan at db.com  Wed Apr 14 04:28:03 2010
From: matt.khan at db.com (Matt Khan)
Date: Wed, 14 Apr 2010 12:28:03 +0100
Subject: Avoiding 1 long CMS with a big heap
Message-ID: <OFF2843927.0084A8CF-ON80257705.003E2F96-80257705.003EFE1F@db.com>

Hi

I have been experimenting with larger heap sizes to see if I can reduce 
the frequency of my pauses. The switches I have used are:

-Xms16192m 
-Xmx16192m 
-Xmn16000m 
-XX:+UseCompressedOops 
-XX:+UseConcMarkSweepGC 
-XX:+CMSIncrementalMode 
-XX:+CMSIncrementalPacing 
-XX:+UseParNewGC 
-XX:MaxTenuringThreshold=1 
-XX:+PrintTenuringDistribution 
-XX:SurvivorRatio=1022 
-XX:TargetSurvivorRatio=90 
-XX:+DisableExplicitGC 
-XX:+PrintGCDetails 
-XX:+PrintGCDateStamps 
-XX:+PrintGCApplicationStoppedTime 
-XX:+PrintGCApplicationConcurrentTime
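The generation sizes these switches imply can be worked out by hand and checked against the logs further down. The sketch below makes three assumptions about HotSpot's layout. First, the young gen is eden plus two equal survivor spaces, each `young / (SurvivorRatio + 2)`. Second, the young capacity printed in GC logs is eden plus one survivor. Third, the "desired survivor size" is `TargetSurvivorRatio` percent of one survivor. The class and method names are illustrative only.

```java
/** Sketch: generation sizes implied by -Xmx/-Xmn/-XX:SurvivorRatio/-XX:TargetSurvivorRatio. */
public class HeapSizing {
    /** Old (CMS) generation in KB: whatever -Xmx leaves after -Xmn. */
    static long oldGenK(long xmxMb, long xmnMb) {
        return (xmxMb - xmnMb) * 1024;
    }

    /** One survivor space in KB, assuming young / (SurvivorRatio + 2). */
    static long survivorK(long xmnMb, int survivorRatio) {
        return xmnMb * 1024 / (survivorRatio + 2);
    }

    /** Eden in KB: young gen minus both survivor spaces. */
    static long edenK(long xmnMb, int survivorRatio) {
        return xmnMb * 1024 - 2 * survivorK(xmnMb, survivorRatio);
    }

    /** Young capacity as printed by ParNew: eden plus one survivor (the other is kept empty). */
    static long reportedYoungK(long xmnMb, int survivorRatio) {
        return edenK(xmnMb, survivorRatio) + survivorK(xmnMb, survivorRatio);
    }

    /** "Desired survivor size" in bytes: TargetSurvivorRatio percent of one survivor space. */
    static long desiredSurvivorBytes(long xmnMb, int survivorRatio, int targetSurvivorRatio) {
        return survivorK(xmnMb, survivorRatio) * 1024 * targetSurvivorRatio / 100;
    }

    public static void main(String[] args) {
        long xmx = 16192, xmn = 16000;   // -Xmx16192m -Xmn16000m
        int ratio = 1022, target = 90;   // -XX:SurvivorRatio=1022 -XX:TargetSurvivorRatio=90
        System.out.println("old gen         : " + oldGenK(xmx, xmn) + "K");
        System.out.println("eden            : " + edenK(xmn, ratio) + "K");
        System.out.println("survivor (each) : " + survivorK(xmn, ratio) + "K");
        System.out.println("young (logged)  : " + reportedYoungK(xmn, ratio) + "K");
        System.out.println("heap (logged)   : " + (reportedYoungK(xmn, ratio) + oldGenK(xmx, xmn)) + "K");
        System.out.println("desired survivor: " + desiredSurvivorBytes(xmn, ratio, target) + " bytes");
    }
}
```

With these flags the sketch gives an old gen of 196608K, eden of 16352000K, survivors of 16000K each, a logged young capacity of 16368000K, a logged heap of 16564608K and a desired survivor size of 14745600 bytes. The same figures appear in the GC log lines in this thread, which is a quick way to confirm the flags were applied as intended.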

This works quite well: our average pause time is ~4ms, roughly every 30s 
(the interval ranges from 5s to 60s, but is most commonly about 30s).

There is one wrinkle, and that is an initial, very long CMS pause. It happens 
about 55 minutes after start (3342s of JVM uptime in the log below):

2010-04-13T23:05:29.072+0000: 3342.854: [GC [1 CMS-initial-mark: 12018K(196608K)] 7811496K(16564608K), 4.2147116 secs] [Times: user=4.15 sys=0.06, real=4.22 secs]
Total time for which application threads were stopped: 4.2243800 seconds
2010-04-13T23:05:33.288+0000: 3347.070: [CMS-concurrent-mark-start]
Application time: 0.0021391 seconds
Total time for which application threads were stopped: 0.0046091 seconds
2010-04-13T23:05:33.343+0000: 3347.125: [CMS-concurrent-mark: 0.048/0.056 secs] [Times: user=0.35 sys=0.15, real=0.06 secs]
2010-04-13T23:05:33.344+0000: 3347.126: [CMS-concurrent-preclean-start]
2010-04-13T23:05:33.347+0000: 3347.128: [CMS-concurrent-preclean: 0.003/0.003 secs] [Times: user=0.01 sys=0.01, real=0.00 secs]
2010-04-13T23:05:33.347+0000: 3347.129: [CMS-concurrent-abortable-preclean-start]
 CMS: abort preclean due to time 2010-04-13T23:05:38.456+0000: 3352.238: [CMS-concurrent-abortable-preclean: 0.573/5.109 secs] [Times: user=0.94 sys=0.24, real=5.11 secs]
Application time: 4.1574245 seconds
2010-04-13T23:05:38.461+0000: 3352.243: [GC[YG occupancy: 7824966 K (16368000 K)]3352.243: [Rescan (parallel) , 4.2295340 secs]3356.473: [weak refs processing, 0.0000422 secs] [1 CMS-remark: 12018K(196608K)] 7836984K(16564608K), 4.2298963 secs] [Times: user=36.56 sys=1.65, real=4.23 secs]
Total time for which application threads were stopped: 4.2351364 seconds
2010-04-13T23:05:42.692+0000: 3356.473: [CMS-concurrent-sweep-start]
Application time: 0.0003415 seconds
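A claim like "~4ms average pause" can be checked against a long run by summarising the `-XX:+PrintGCApplicationStoppedTime` lines mechanically, which also picks out outliers like the two ~4.2s pauses above. Note that this flag reports every safepoint stop, not only GC pauses. The following is a minimal Java sketch that assumes the line format shown above; the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StoppedTimeScan {
    // Line format written by -XX:+PrintGCApplicationStoppedTime in the log above.
    private static final Pattern STOPPED = Pattern.compile(
            "Total time for which application threads were stopped: ([0-9.]+) seconds");

    /** Extracts every stop-the-world duration (in seconds) from the given log lines. */
    static List<Double> stoppedTimes(List<String> logLines) {
        List<Double> pauses = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = STOPPED.matcher(line);
            if (m.find()) {
                pauses.add(Double.parseDouble(m.group(1)));
            }
        }
        return pauses;
    }

    /** Longest pause in seconds, or 0.0 if there were none. */
    static double maxPause(List<Double> pauses) {
        return pauses.stream().mapToDouble(Double::doubleValue).max().orElse(0.0);
    }

    /** Number of pauses strictly longer than the threshold. */
    static long countOver(List<Double> pauses, double thresholdSeconds) {
        return pauses.stream().filter(p -> p > thresholdSeconds).count();
    }

    public static void main(String[] args) {
        // Lines copied from the log above; for a real run read the whole file,
        // e.g. with java.nio.file.Files.readAllLines(path).
        List<String> sample = Arrays.asList(
                "Total time for which application threads were stopped: 4.2243800 seconds",
                "Application time: 0.0021391 seconds",
                "Total time for which application threads were stopped: 0.0046091 seconds",
                "Total time for which application threads were stopped: 4.2351364 seconds");
        List<Double> pauses = stoppedTimes(sample);
        System.out.printf("pauses=%d max=%.4fs over 1s=%d%n",
                pauses.size(), maxPause(pauses), countOver(pauses, 1.0));
    }
}
```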

What I would like to understand is why there is such a long pause when the 
tenured generation is nowhere near full (~12MB of 192MB occupied), and 
hence whether this can be avoided.

Cheers
Matt

Matt Khan
--------------------------------------------------
GFFX Auto Trading
Deutsche Bank, London


---

This e-mail may contain confidential and/or privileged information. If you are not the intended recipient (or have received this e-mail in error) please notify the sender immediately and delete this e-mail. Any unauthorized copying, disclosure or distribution of the material in this e-mail is strictly forbidden.

Please refer to http://www.db.com/en/content/eu_disclosures.htm for additional EU corporate and regulatory disclosures.

From jon.masamitsu at oracle.com  Wed Apr 14 06:57:34 2010
From: jon.masamitsu at oracle.com (Jon Masamitsu)
Date: Wed, 14 Apr 2010 06:57:34 -0700
Subject: Avoiding 1 long CMS with a big heap
In-Reply-To: <OFF2843927.0084A8CF-ON80257705.003E2F96-80257705.003EFE1F@db.com>
References: <OFF2843927.0084A8CF-ON80257705.003E2F96-80257705.003EFE1F@db.com>
Message-ID: <4BC5C9CE.4060309@oracle.com>

Matt,

What jdk release are you using?

Can you send the GC log for a few of the young generation collections
before the initial mark (the very long pause of ~4secs)?  A few young
generation collections after the end of the CMS cycle may be
interesting too.

Does the long pause occur with every initial mark (CMS-initial-mark)
or just the first one?

What type of platform are you running on?

Why are you using CMSIncrementalMode?

You seem to have decided to promote (copy to the tenured
generation) everything that survives a young gen collection
(small MaxTenuringThreshold, small survivor spaces).
Is that to minimize the young gen collection pauses?

Jon


From matt.khan at db.com  Wed Apr 14 07:24:05 2010
From: matt.khan at db.com (Matt Khan)
Date: Wed, 14 Apr 2010 15:24:05 +0100
Subject: Avoiding 1 long CMS with a big heap
In-Reply-To: <4BC5C9CE.4060309@oracle.com>
Message-ID: <OFA8AED64F.016F36BE-ON80257705.004D349E-80257705.004F1BE1@db.com>

Hi Jon

>> What jdk release are you using?
6u18

>> What type of  platform are you running on?
Solaris 10 x86 (Sun x4600)

>> You seem to have decided to promote (copy to the tenured generation) 
>> everything that survives a young gen collection (small 
>> MaxTenuringThreshold, small survivor spaces). Is that to minimize the 
>> young gen collection pauses?
This post 
(http://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2010-April/001671.html) 
has a bit more info on our allocation behaviour. Basically, static data is 
loaded at startup, and after that most objects don't live very long before 
they're eligible for eviction. My thinking is that since the set of live 
objects at any one time is small and short-lived, the probability of 
anything surviving 2 young collections is minimal, so I might as well 
marginally increase the interval between young collections by making 
active use of that space in eden.

The target is simply consistently low pauses.
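This reasoning can be checked against the `Desired survivor size ... new threshold 1 (max 1)` lines in the logs below. The sketch is a simplified model of how we understand HotSpot's age table to pick the next tenuring threshold: accumulate surviving bytes by age until they exceed `TargetSurvivorRatio` percent of one survivor space, then cap the result at `MaxTenuringThreshold`. It is an illustration, not the JVM source, and the 16-entry table size is an assumption.

```java
public class TenuringThreshold {
    // Assumption: an age table with 16 entries (ages 0..15).
    static final int TABLE_SIZE = 16;

    /**
     * Simplified model of choosing the next tenuring threshold.
     * bytesByAge[a] = bytes of surviving objects of age a (index 0 unused; missing ages count as 0).
     */
    static int compute(long[] bytesByAge, long survivorCapacityBytes,
                       int targetSurvivorRatio, int maxTenuringThreshold) {
        long desired = survivorCapacityBytes * targetSurvivorRatio / 100;
        long total = 0;
        int age = 1;
        while (age < TABLE_SIZE) {
            total += age < bytesByAge.length ? bytesByAge[age] : 0;
            if (total > desired) {
                break; // adding this age overflows the target, so it becomes the threshold
            }
            age++;
        }
        return Math.min(age, maxTenuringThreshold);
    }

    public static void main(String[] args) {
        long survivor = 16_384_000L;              // one survivor space: -Xmn16000m, SurvivorRatio=1022
        long[] observed = {0, 8_049_864L, 112L};  // ages 1 and 2 from the ParNew at 22.436s
        System.out.println("max 1  -> " + compute(observed, survivor, 90, 1));
        System.out.println("max 15 -> " + compute(observed, survivor, 90, 15));
        long[] overflow = {0, 15_000_000L};       // hypothetical: age 1 alone exceeds 90%
        System.out.println("overflow -> " + compute(overflow, survivor, 90, 15));
    }
}
```

With the observed ages the model returns 1, matching `new threshold 1 (max 1)` in the log. Only the cap is doing any work here: the roughly 8MB of survivors never approaches the 14745600-byte target.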

>> Why are you using CMSIncrementalMode?
In the past we found that it helps smooth out the pause times a little. I 
left it in on this run to minimise the number of changes between what we 
have now and what I'm benchmarking. To be honest, I was expecting a single 
CMS run and no more, as I was working on the understanding that CMS always 
runs one cycle near startup in order to collect some stats on what is 
going on.

>> Does the long pause occur with every initial mark (CMS-initial-mark) or 
>> just the first one?
The benchmark has run for 16hrs so far and there have been 2 initial marks: 
the first one I expected, but the 2nd one I didn't.

2010-04-13T22:09:58.463+0000: 12.249: [GC [1 CMS-initial-mark: 3679K(196608K)] 3679K(16564608K), 0.0009694 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
2010-04-13T23:05:29.072+0000: 3342.854: [GC [1 CMS-initial-mark: 12018K(196608K)] 7811496K(16564608K), 4.2147116 secs] [Times: user=4.15 sys=0.06, real=4.22 secs]

>> Can you send the GC log for a few of the young generation collections 
>> before the initial mark (the very long pause of ~ 4secs)?
Here you go. I have stripped out the stopped/concurrent time entries to 
show all the GC events leading up to the long pause, plus the next 5 young 
collections.

2010-04-13T22:09:58.234+0000: 12.019: [Full GC 12.021: [CMS: 0K->3679K(196608K), 0.2269262 secs] 4251520K->3679K(16564608K), [CMS Perm : 21247K->21209K(21248K)], 0.2286580 secs] [Times: user=0.22 sys=0.01, real=0.23 secs]
2010-04-13T22:09:58.463+0000: 12.249: [GC [1 CMS-initial-mark: 3679K(196608K)] 3679K(16564608K), 0.0009694 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
2010-04-13T22:09:58.464+0000: 12.250: [CMS-concurrent-mark-start]
2010-04-13T22:10:01.522+0000: 15.308: [CMS-concurrent-mark: 0.024/3.058 secs] [Times: user=3.29 sys=1.09, real=3.06 secs]
2010-04-13T22:10:01.522+0000: 15.308: [CMS-concurrent-preclean-start]
2010-04-13T22:10:01.539+0000: 15.325: [CMS-concurrent-preclean: 0.016/0.017 secs] [Times: user=0.03 sys=0.05, real=0.02 secs]
2010-04-13T22:10:01.539+0000: 15.325: [CMS-concurrent-abortable-preclean-start]
2010-04-13T22:10:08.650+0000: 22.436: [GC 22.437: [ParNew
Desired survivor size 14745600 bytes, new threshold 1 (max 1)
- age   1:    8049864 bytes,    8049864 total
- age   2:        112 bytes,    8049976 total
: 16352000K->7981K(16368000K), 0.0305746 secs] 16355679K->11661K(16564608K), 0.0312194 secs] [Times: user=0.14 sys=0.05, real=0.03 secs]
 CMS: abort preclean due to time 2010-04-13T22:13:56.812+0000: 250.598: [CMS-concurrent-abortable-preclean: 0.013/235.273 secs] [Times: user=48.48 sys=11.66, real=235.27 secs]
2010-04-13T22:13:56.814+0000: 250.599: [GC[YG occupancy: 7376304 K (16368000 K)]250.600: [Rescan (parallel) , 1.9740400 secs]252.574: [weak refs processing, 0.0015189 secs] [1 CMS-remark: 3679K(196608K)] 7379983K(16564608K), 1.9759675 secs] [Times: user=17.08 sys=0.82, real=1.98 secs]
2010-04-13T22:13:58.791+0000: 252.576: [CMS-concurrent-sweep-start]
2010-04-13T22:13:58.796+0000: 252.581: [CMS-concurrent-sweep: 0.005/0.005 secs] [Times: user=0.04 sys=0.02, real=0.01 secs]
2010-04-13T22:13:58.800+0000: 252.585: [CMS-concurrent-reset-start]
2010-04-13T22:13:58.805+0000: 252.591: [CMS-concurrent-reset: 0.006/0.006 secs] [Times: user=0.03 sys=0.01, real=0.01 secs]
2010-04-13T22:47:09.067+0000: 2242.850: [GC 2242.851: [ParNew
Desired survivor size 14745600 bytes, new threshold 1 (max 1)
- age   1:    7803368 bytes,    7803368 total
: 16359981K->12388K(16368000K), 0.0589077 secs] 16363570K->24406K(16564608K) icms_dc=5 , 0.0593692 secs] [Times: user=0.18 sys=0.06, real=0.06 secs]
2010-04-13T23:05:29.072+0000: 3342.854: [GC [1 CMS-initial-mark: 12018K(196608K)] 7811496K(16564608K), 4.2147116 secs] [Times: user=4.15 sys=0.06, real=4.22 secs]
2010-04-13T23:05:33.288+0000: 3347.070: [CMS-concurrent-mark-start]
2010-04-13T23:05:33.343+0000: 3347.125: [CMS-concurrent-mark: 0.048/0.056 secs] [Times: user=0.35 sys=0.15, real=0.06 secs]
2010-04-13T23:05:33.344+0000: 3347.126: [CMS-concurrent-preclean-start]
2010-04-13T23:05:33.347+0000: 3347.128: [CMS-concurrent-preclean: 0.003/0.003 secs] [Times: user=0.01 sys=0.01, real=0.00 secs]
2010-04-13T23:05:33.347+0000: 3347.129: [CMS-concurrent-abortable-preclean-start]
 CMS: abort preclean due to time 2010-04-13T23:05:38.456+0000: 3352.238: [CMS-concurrent-abortable-preclean: 0.573/5.109 secs] [Times: user=0.94 sys=0.24, real=5.11 secs]
2010-04-13T23:05:38.461+0000: 3352.243: [GC[YG occupancy: 7824966 K (16368000 K)]3352.243: [Rescan (parallel) , 4.2295340 secs]3356.473: [weak refs processing, 0.0000422 secs] [1 CMS-remark: 12018K(196608K)] 7836984K(16564608K), 4.2298963 secs] [Times: user=36.56 sys=1.65, real=4.23 secs]
2010-04-13T23:05:42.692+0000: 3356.473: [CMS-concurrent-sweep-start]
2010-04-13T23:05:42.765+0000: 3356.547: [CMS-concurrent-sweep: 0.018/0.074 secs] [Times: user=0.13 sys=0.05, real=0.07 secs]
2010-04-13T23:05:42.768+0000: 3356.549: [CMS-concurrent-reset-start]
2010-04-13T23:05:42.771+0000: 3356.552: [CMS-concurrent-reset: 0.003/0.003 secs] [Times: user=0.01 sys=0.01, real=0.00 secs]
2010-04-13T23:40:49.463+0000: 5463.242: [GC 5463.243: [ParNew
Desired survivor size 14745600 bytes, new threshold 1 (max 1)
- age   1:    4876232 bytes,    4876232 total
: 16364388K->6895K(16368000K), 0.0338554 secs] 16375925K->21493K(16564608K) icms_dc=0 , 0.0342923 secs] [Times: user=0.12 sys=0.04, real=0.03 secs]
2010-04-14T00:41:32.410+0000: 9106.186: [GC 9106.186: [ParNew
Desired survivor size 14745600 bytes, new threshold 1 (max 1)
- age   1:    7471944 bytes,    7471944 total
: 16358895K->8975K(16368000K), 0.0301672 secs] 16373493K->23921K(16564608K) icms_dc=0 , 0.0306159 secs] [Times: user=0.12 sys=0.03, real=0.03 secs]
2010-04-14T01:42:59.422+0000: 12793.193: [GC 12793.193: [ParNew
Desired survivor size 14745600 bytes, new threshold 1 (max 1)
- age   1:    6720592 bytes,    6720592 total
: 16360975K->10859K(16368000K), 0.1057987 secs] 16375921K->28778K(16564608K) icms_dc=0 , 0.1062338 secs] [Times: user=0.12 sys=0.14, real=0.11 secs]
2010-04-14T02:46:12.795+0000: 16586.561: [GC 16586.561: [ParNew
Desired survivor size 14745600 bytes, new threshold 1 (max 1)
- age   1:    6370400 bytes,    6370400 total
: 16362859K->7334K(16368000K), 0.0498029 secs] 16380778K->27405K(16564608K) icms_dc=0 , 0.0502474 secs] [Times: user=0.10 sys=0.04, real=0.05 secs]
2010-04-14T03:54:06.986+0000: 20660.747: [GC 20660.747: [ParNew
Desired survivor size 14745600 bytes, new threshold 1 (max 1)
- age   1:    5864544 bytes,    5864544 total
: 16359334K->8803K(16368000K), 0.0434378 secs] 16379405K->30657K(16564608K) icms_dc=0 , 0.0439089 secs] [Times: user=0.10 sys=0.03, real=0.04 secs]

Cheers
Matt


From jon.masamitsu at oracle.com  Wed Apr 14 10:37:19 2010
From: jon.masamitsu at oracle.com (Jon Masamitsu)
Date: Wed, 14 Apr 2010 10:37:19 -0700
Subject: Avoiding 1 long CMS with a big heap
In-Reply-To: <OFA8AED64F.016F36BE-ON80257705.004D349E-80257705.004F1BE1@db.com>
References: <OFA8AED64F.016F36BE-ON80257705.004D349E-80257705.004F1BE1@db.com>
Message-ID: <4BC5FD4F.6030506@oracle.com>

On 04/14/10 07:24, Matt Khan wrote:
>
>  
> 2010-04-13T22:47:09.067+0000: 2242.850: [GC 2242.851: [ParNew
> Desired survivor size 14745600 bytes, new threshold 1 (max 1)
> - age   1:    7803368 bytes,    7803368 total
> : 16359981K->12388K(16368000K), 0.0589077 secs] 16363570K->24406K(16564608K) icms_dc=5 , 0.0593692 secs] [Times: user=0.18 sys=0.06, real=0.06 secs]
> 2010-04-13T23:05:29.072+0000: 3342.854: [GC [1 CMS-initial-mark: 12018K(196608K)] 7811496K(16564608K), 4.2147116 secs] [Times: user=4.15 sys=0.06, real=4.22 secs]
>
I'm assuming that there was no GC activity between the ParNew collection and
the initial-mark above.

That says that the application has been filling up the young gen for a
while.  Most of that might be dead when the initial-mark starts, but we
don't know, so we have to assume that it's live.  Live objects in the
young gen can keep objects alive in the tenured (CMS) generation, so we
need to look at the young gen, and your young gen is large, so there's
lots of GC work there.

The first initial-mark had a full GC just before it (right?) so the
young gen was likely empty.

If you run your test case longer and see mostly longer initial-marks,
that would suggest that my guess is right.
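This explanation can be checked against the numbers already in the thread. If the initial mark has to treat the young gen as roots, its cost should track young-gen occupancy. That occupancy can be read off the initial-mark line as the whole-heap figure minus the CMS-gen figure. A small Java sketch of that arithmetic follows; the class and method names are illustrative, and reading user/real as "effective threads" is a rough heuristic rather than anything the log states.

```java
public class InitialMarkMath {
    /** Young-gen occupancy (K): whole-heap occupancy minus CMS-gen occupancy on the initial-mark line. */
    static long youngOccupancyK(long heapUsedK, long cmsGenUsedK) {
        return heapUsedK - cmsGenUsedK;
    }

    /** K of young gen covered per second of pause (0 if there was nothing to scan). */
    static double scanRateKPerSec(long youngK, double pauseSecs) {
        return youngK == 0 ? 0.0 : youngK / pauseSecs;
    }

    /** user/real from [Times: ...]: about 1 suggests one busy thread, about N suggests N threads. */
    static double effectiveThreads(double userSecs, double realSecs) {
        return userSecs / realSecs;
    }

    public static void main(String[] args) {
        // 1st initial mark (12.249s): 3679K(196608K)] 3679K(16564608K), 0.0009694 secs
        long first = youngOccupancyK(3679, 3679);
        // 2nd initial mark (3342.854s): 12018K(196608K)] 7811496K(16564608K), 4.2147116 secs
        long second = youngOccupancyK(7811496, 12018);
        System.out.println("young gen at 1st initial mark: " + first + "K (pause 0.0009694s)");
        System.out.println("young gen at 2nd initial mark: " + second + "K (pause 4.2147116s)");
        System.out.printf("implied scan rate: %.0f K/s%n", scanRateKPerSec(second, 4.2147116));
        System.out.printf("initial mark ~%.1f threads, remark ~%.1f threads%n",
                effectiveThreads(4.15, 4.22), effectiveThreads(36.56, 4.23));
    }
}
```

For the two initial marks in this thread this gives 0K of young-gen occupancy (a 0.001s pause) versus 7799478K (4.21s). The user/real ratio of roughly 1 for the initial mark, against roughly 8.6 for the parallel remark, is consistent with the initial mark running on a single thread.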
> 2010-04-13T23:05:33.288+0000: 3347.070: [CMS-concurrent-mark-start]
> 2010-04-13T23:05:33.343+0000: 3347.125: [CMS-concurrent-mark: 0.048/0.056 secs] [Times: user=0.35 sys=0.15, real=0.06 secs]
> 2010-04-13T23:05:33.344+0000: 3347.126: [CMS-concurrent-preclean-start]
> 2010-04-13T23:05:33.347+0000: 3347.128: [CMS-concurrent-preclean: 0.003/0.003 secs] [Times: user=0.01 sys=0.01, real=0.00 secs]
> 2010-04-13T23:05:33.347+0000: 3347.129: [CMS-concurrent-abortable-preclean-start]
>  CMS: abort preclean due to time 2010-04-13T23:05:38.456+0000: 3352.238: [CMS-concurrent-abortable-preclean: 0.573/5.109 secs] [Times: user=0.94 sys=0.24, real=5.11 secs]
> 2010-04-13T23:05:38.461+0000: 3352.243: [GC[YG occupancy: 7824966 K (16368000 K)]3352.243: [Rescan (parallel) , 4.2295340 secs]3356.473: [weak refs processing, 0.0000422 secs] [1 CMS-remark: 12018K(196608K)] 7836984K(16564608K), 4.2298963 secs] [Times: user=36.56 sys=1.65, real=4.23 secs]
>
Also note that your remark pause is long.  If you add
-XX:+CMSScavengeBeforeRemark, it will schedule a ParNew collection before
the remark.  If this causes the remark pause to drop significantly, that
also suggests that the issue is just lots of work with lots of live
objects in the young gen (because the ParNew collection will reduce the
number of live objects in the young gen).
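When comparing runs with and without `-XX:+CMSScavengeBeforeRemark`, it can help to have the application report which flags actually took effect. A minimal sketch using `com.sun.management.HotSpotDiagnosticMXBean`, which ships with HotSpot JVMs. `ManagementFactory.getPlatformMXBean` needs Java 7 or later, so on the 6u18 JVM used here the bean would have to be obtained through `ManagementFactory.newPlatformMXBeanProxy` instead. The class name is hypothetical.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class VmFlagCheck {
    /**
     * Current value of a HotSpot -XX flag, or null if this JVM does not recognise it
     * (getVMOption throws IllegalArgumentException for unknown names).
     */
    static String flagValue(String name) {
        // Java 7+ API; see the note above for Java 6.
        HotSpotDiagnosticMXBean hotspot =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        try {
            return hotspot.getVMOption(name).getValue();
        } catch (IllegalArgumentException unknownFlag) {
            return null; // e.g. CMS flags on JDKs where CMS has been removed
        }
    }

    public static void main(String[] args) {
        String[] flags = {"UseConcMarkSweepGC", "CMSIncrementalMode",
                "CMSScavengeBeforeRemark", "MaxTenuringThreshold", "SurvivorRatio"};
        for (String f : flags) {
            String v = flagValue(f);
            System.out.println(f + " = " + (v == null ? "(not known to this JVM)" : v));
        }
    }
}
```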


From y.s.ramakrishna at oracle.com  Wed Apr 14 10:42:48 2010
From: y.s.ramakrishna at oracle.com (Y. Srinivas Ramakrishna)
Date: Wed, 14 Apr 2010 10:42:48 -0700
Subject: Avoiding 1 long CMS with a big heap
In-Reply-To: <4BC5FD4F.6030506@oracle.com>
References: <OFA8AED64F.016F36BE-ON80257705.004D349E-80257705.004F1BE1@db.com>
	<4BC5FD4F.6030506@oracle.com>
Message-ID: <4BC5FE98.2070502@oracle.com>

Right, this is a performance bug in CMS in general (init
mark is single-threaded) and is exacerbated by iCMS (because
of the way the init-mark is scheduled currently between scavenges
so as to temporally separate pauses). Just drop the incremental
option and you would be fine in this case.

I'll annotate an existing bug for long init mark pauses
with this info about the pathology being exacerbated
by iCMS .... and see if we can fix it sooner ...

-- ramki


From shaun.hennessy at alcatel-lucent.com  Wed Apr 14 13:29:09 2010
From: shaun.hennessy at alcatel-lucent.com (Shaun Hennessy)
Date: Wed, 14 Apr 2010 16:29:09 -0400
Subject: understanding GC logs
In-Reply-To: <148A7A57-B616-4CA4-9571-0F0216B0F650@oracle.com>
References: <dc6011320909281007k19f52f54v8d086edcd19b0bae@mail.gmail.com>
	<4AC0EEAE.5010705@Sun.COM>
	<dc6011320909281106r398fa9fak9384e89012d0c52d@mail.gmail.com>
	<4AC145AE.30804@sun.com> <4AC148B6.7010608@Sun.COM>
	<4AC1493A.2030004@sun.com> <4B4B3ECB.5090105@alcatel-lucent.com>
	<4B4B937C.4080907@alcatel-lucent.com>
	<4B4B9FBC.4040103@sun.com> <4B4BA6AF.5080300@sun.com>
	<4B50C9BF.8060202@alcatel-lucent.com> <4B50DE95.4060901@Sun.COM>
	<4BABA011.7020801@alcatel-lucent.com>
	<4BB20B1C.4020608@alcatel-lucent.com>
	<148A7A57-B616-4CA4-9571-0F0216B0F650@oracle.com>
Message-ID: <4BC62595.2000002@alcatel-lucent.com>

An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20100414/f6d8198f/attachment.html 

From matt.khan at db.com  Wed Apr 14 14:15:27 2010
From: matt.khan at db.com (Matt Khan)
Date: Wed, 14 Apr 2010 22:15:27 +0100
Subject: Avoiding 1 long CMS with a big heap
In-Reply-To: <4BC5FD4F.6030506@oracle.com>
Message-ID: <OF4B34409F.E11FAA9E-ON80257705.0072008C-80257705.0074C5BE@db.com>

>> The first initial-mark had a full GC just before it (right?) so the 
>> young gen was likely empty.
That's right.

>> If you run your test case longer and see mostly longer initial-marks, 
>> that would suggest that my guess is right.
The young gen was definitely not empty when the 2nd initial mark was 
triggered. There has been no further CMS activity since then (approaching 
24hrs now), and occupancy of tenured has hovered around 30MB all day.

I still don't understand *why* the 2nd initial mark happened. I thought 
CMS was only triggered once occupancy passed a threshold (IIRC 50% by 
default). Since occupancy was actually only about 6% (12MB of 192MB), why 
did it feel the need to do anything at all?

>> If you add -XX:+CMSScavengeBeforeRemark it will schedule a ParNew 
>> collection before the remark.
Slightly naive question: why isn't this the default behaviour? Is it 
because a "normal" heap has a bigger tenured generation than eden, so the 
cost isn't skewed the way it is in my configuration?

>> Just drop the incremental option and you would be fine in this case.
I'll give that a try and report back.

Cheers
Matt


From jon.masamitsu at oracle.com  Wed Apr 14 14:58:18 2010
From: jon.masamitsu at oracle.com (Jon Masamitsu)
Date: Wed, 14 Apr 2010 14:58:18 -0700
Subject: Avoiding 1 long CMS with a big heap
In-Reply-To: <OF4B34409F.E11FAA9E-ON80257705.0072008C-80257705.0074C5BE@db.com>
References: <OF4B34409F.E11FAA9E-ON80257705.0072008C-80257705.0074C5BE@db.com>
Message-ID: <4BC63A7A.8010908@oracle.com>

On 04/14/10 14:15, Matt Khan wrote:
> ...
>
> I still don't understand *why* the 2nd initial mark happened. I 
> thought CMS was only triggered once the occupation passed the 
> threshold (IIRC this is 50% by default), since occupation was actually 
> more like 10% or so then why did it even feel the need to do anything 
> at all?
>

CMS tries to project how early to start a cycle based on past
behavior.  When there hasn't been much data gathered about
collections, the projection can just be wrong.
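This point can be illustrated with a toy model. The sketch is deliberately simplified and is not HotSpot's actual heuristic, which has more inputs and padding. In the model, a cycle starts either when occupancy crosses a fixed fraction, or when the projected time until the CMS gen fills is shorter than the estimated length of a concurrent cycle. That projected time is free space divided by an estimated promotion rate. The `occupancyOnly` parameter is meant to mirror what `-XX:+UseCMSInitiatingOccupancyOnly` does. Only the occupancy and capacity in the example come from the logs; the rates, the 240s cycle estimate and the 0.5 fraction (from the "IIRC 50%" above) are hypothetical.

```java
public class CmsTriggerModel {
    /**
     * Toy model of a CMS start decision (illustration only, not the JVM's code).
     *
     * @param usedK              current CMS-gen occupancy (K)
     * @param capacityK          CMS-gen capacity (K)
     * @param promotionRateKps   estimated promotion rate into the CMS gen (K/s), from past scavenges
     * @param cycleSeconds       estimated duration of a concurrent cycle, from past cycles
     * @param initiatingFraction occupancy fraction (0..1) that always starts a cycle
     * @param occupancyOnly      if true, ignore the time-based projection entirely
     */
    static boolean shouldStart(double usedK, double capacityK, double promotionRateKps,
                               double cycleSeconds, double initiatingFraction, boolean occupancyOnly) {
        if (usedK / capacityK >= initiatingFraction) {
            return true;                        // plain occupancy trigger
        }
        if (occupancyOnly || promotionRateKps <= 0) {
            return false;                       // no projection to consult
        }
        double secondsUntilFull = (capacityK - usedK) / promotionRateKps;
        return secondsUntilFull <= cycleSeconds; // start early so the cycle finishes in time
    }

    public static void main(String[] args) {
        double used = 12018, capacity = 196608; // CMS gen at the 2nd initial mark
        double cycle = 240;                     // hypothetical estimate (1st cycle ran ~12s to ~253s)
        System.out.println("rate 2.5 K/s   -> " + shouldStart(used, capacity, 2.5, cycle, 0.5, false));
        System.out.println("rate 1000 K/s  -> " + shouldStart(used, capacity, 1000, cycle, 0.5, false));
        System.out.println("occupancy only -> " + shouldStart(used, capacity, 1000, cycle, 0.5, true));
    }
}
```

With 12018K of 196608K used, a promotion-rate estimate of 2.5 K/s leaves about 74,000s until full and nothing starts. A badly skewed estimate of 1000 K/s projects about 185s, which is shorter than the assumed cycle, so the model starts a cycle at ~6% occupancy.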

> >> If you add -XX:+CMSScavengeBeforeRemark it will schedule a ParNew 
> collection before the remark.
> slightly naive Q.... why isn't this default behaviour? is it because a 
> "normal" heap has a bigger tenured than eden hence the cost isn't 
> skewed in the way I have it configured?

That would put a young gen collection and a remark back-to-back, and users
would see it as a single longer pause.  We work to keep space between those
two pauses.

From matt.khan at db.com  Wed Apr 14 15:31:29 2010
From: matt.khan at db.com (Matt Khan)
Date: Wed, 14 Apr 2010 23:31:29 +0100
Subject: Avoiding 1 long CMS with a big heap
In-Reply-To: <4BC63A7A.8010908@oracle.com>
Message-ID: <OFC6DFCA4B.8B1DCF24-ON80257705.007B2DF6-80257705.007BBBC2@db.com>

>> CMS tries to project how early to start a cycle based on past behavior. 
>> When there hasn't been much data gathered about collections, the 
>> projection can just be wrong.
OK, so it seems to be a question of how I can mitigate this risk, i.e. is 
there anything I can do to influence this behaviour so that the collector 
is less likely to make such an incorrect call?

Ideally it would be possible to eliminate the risk entirely, but based on 
your reply it sounds like I'd need to be able to guarantee that "this 
object really is not reachable by anything that's in tenured", which 
appears to amount to explicit memory management.

Cheers
Matt


From jon.masamitsu at oracle.com  Thu Apr 15 15:51:52 2010
From: jon.masamitsu at oracle.com (Jon Masamitsu)
Date: Thu, 15 Apr 2010 15:51:52 -0700
Subject: understanding GC logs
In-Reply-To: <4BC62595.2000002@alcatel-lucent.com>
References: <dc6011320909281007k19f52f54v8d086edcd19b0bae@mail.gmail.com>
	<4AC0EEAE.5010705@Sun.COM>
	<dc6011320909281106r398fa9fak9384e89012d0c52d@mail.gmail.com>
	<4AC145AE.30804@sun.com> <4AC148B6.7010608@Sun.COM>
	<4AC1493A.2030004@sun.com> <4B4B3ECB.5090105@alcatel-lucent.com>
	<4B4B937C.4080907@alcatel-lucent.com>
	<4B4B9FBC.4040103@sun.com> <4B4BA6AF.5080300@sun.com>
	<4B50C9BF.8060202@alcatel-lucent.com> <4B50DE95.4060901@Sun.COM>
	<4BABA011.7020801@alcatel-lucent.com>
	<4BB20B1C.4020608@alcatel-lucent.com>
	<148A7A57-B616-4CA4-9571-0F0216B0F650@oracle.com>
	<4BC62595.2000002@alcatel-lucent.com>
Message-ID: <4BC79888.5020606@oracle.com>

Shaun,

I'm going to answer over several replies.  Here are
the first 2 answers.

On 04/14/10 13:29, Shaun Hennessy wrote:
> Still working on log output w/ different releases / log options,
> but a few other follow up questions on the substance
>
>>>
>>> - Promotion Failure
>>> 4896.478: [GC 4896.478: [ParNew: 4894353K->587864K(5017600K), 0.4789909 secs] 8473688K->4268560K(13619200K), 0.4791812 secs] [Times: user=1.00 sys=0.61, real=0.48 secs]
>>> 4897.812: [GC 4897.812: [ParNew: 4888664K->545903K(5017600K), 0.4105613 secs] 8569360K->4326583K(13619200K), 0.4107560 secs] [Times: user=1.06 sys=0.55, real=0.41 secs]
>>> 4899.057: [GC 4899.058: [ParNew: 4846703K->638966K(5017600K), 0.2759734 secs] 8627383K->4496987K(13619200K), 0.2761637 secs] [Times: user=1.13 sys=0.36, real=0.28 secs]
>>> 4900.101: [GC 4900.101: [ParNew: 4939768K->630721K(5017600K), 0.5117751 secs] 8797789K->4607020K(13619200K), 0.5119662 secs] [Times: user=0.84 sys=0.66, real=0.51 secs]
>>> 4900.615: [GC 4900.615: [ParNew: 651487K->487288K(5017600K), 0.0780183 secs] 4627786K->4463587K(13619200K), 0.0781687 secs] [Times: user=0.96 sys=0.00, real=0.08 secs]
>>> *4901.581: [GC 4901.581: [ParNew (promotion failed): 4788088K->4780999K(5017600K), 2.8947499 secs]4904.476: [CMS: 4003090K->1530872K(8601600K), 7.5122451 secs] 8764387K->1530872K(13619200K), [CMS Perm : 671102K->671102K(819200K)], 10.4072102 secs] [Times: user=11.03 sys=1.09, real=10.41 secs]*
>>> 4913.024: [GC 4913.024: [ParNew: 4300800K->316807K(5017600K), 0.0615917 secs] 5831672K->1847679K(13619200K), 0.0617857 secs] [Times: user=0.74 sys=0.00, real=0.06 secs]
>>> 4914.015: [GC 4914.015: [ParNew: 4617607K->475077K(5017600K), 0.0771389 secs] 6148479K->2005949K(13619200K), 0.0773290 secs] [Times: user=0.95 sys=0.00, real=0.08 secs]
>>> 4914.908: [GC 4914.908: [ParNew: 4775877K->586339K(5017600K), 0.0857102 secs] 6306749K->2117211K(13619200K), 0.0859046 secs] [Times: user=1.06 sys=0.00, real=0.09 secs]
>>> 4915.816: [GC 4915.816: [ParNew: 4887139K->476398K(5017600K), 0.1841627 secs] 6418011K->2152868K(13619200K), 0.1843556 secs] [Times: user=1.32 sys=0.07, real=0.18 secs]
>>
>>> 8) What exactly is occurring during this promotion-failed
>>> collection? Based on the next example I assume it's a (successful)
>>> scavenge. What exactly is this: which thread(s), serial or
>>> ParallelGCThreads? STW? Are we simply compacting the tenured gen, or
>>> are we actually GCing the tenured gen?
>>
>> A promotion failure is a scavenge that does not succeed because there
>> is not enough space in the old gen to do all the needed promotions.
>> The scavenge is in essence unwound and then a full STW compaction of
>> the entire heap is done.
> 1) Just so I am clear: is "compaction" compacting the heap (making it
> contiguous) AND garbage collection at the same time? The "CMS:
> 4003090K->1530872K(8601600K)" shows me my total heap is 1.5GB following
> this action, whereas the previous ParNew showed the total heap at 4.4GB.
"Compaction" is a feature of some garbage collectors, and in this case a
garbage collection with compaction is happening.
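A toy model may help with the question of whether compaction means reclaiming garbage and making the heap contiguous at the same time. The Java sketch below is hypothetical and is not HotSpot's code: it assumes marking has already flagged each object live or dead, skips the dead ones, and gives each live object the next free offset (sliding it toward the start of the heap), so all free space ends up in one block. Updating references and physically moving the bytes are left out; the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical toy model, not HotSpot code. The "heap" is a list of objects
// laid out back to back; marking is assumed to have already set each
// object's live flag.
final class ToySlidingCompaction {

    static final class Obj {
        final String name;
        final int sizeK;
        final boolean live;
        Obj(String name, int sizeK, boolean live) {
            this.name = name; this.sizeK = sizeK; this.live = live;
        }
    }

    // A surviving object and the offset (in K) it is slid down to.
    static final class Placed {
        final String name;
        final int offsetK;
        Placed(String name, int offsetK) { this.name = name; this.offsetK = offsetK; }
        @Override public String toString() { return name + "@" + offsetK; }
    }

    // Walk the heap in address order: dead objects are skipped (the
    // "collection" part) and each live object gets the next free offset
    // (the "compaction" part), so survivors keep their relative order.
    static List<Placed> compact(List<Obj> heap) {
        List<Placed> result = new ArrayList<>();
        int nextFreeK = 0;
        for (Obj o : heap) {
            if (o.live) {
                result.add(new Placed(o.name, nextFreeK));
                nextFreeK += o.sizeK;
            }
        }
        return result;
    }

    // After compaction all free space is one block at the end of the heap.
    static int contiguousFreeK(List<Obj> heap, int capacityK) {
        int liveK = 0;
        for (Obj o : heap) if (o.live) liveK += o.sizeK;
        return capacityK - liveK;
    }

    public static void main(String[] args) {
        List<Obj> heap = Arrays.asList(
            new Obj("a", 4, true), new Obj("b", 8, false),
            new Obj("c", 2, true), new Obj("d", 6, false),
            new Obj("e", 3, true));
        System.out.println(compact(heap));              // [a@0, c@4, e@6]
        System.out.println(contiguousFreeK(heap, 32));  // 23
    }
}
```

In this example 23K of the 32K heap is free once the dead objects are discarded. Without compaction that space would be split into blocks of 8K, 6K and 9K, so nothing larger than 9K could be placed; after sliding, it is a single 23K block.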
>
>
> 2) Also just noticed this: what is the reason for the previous minor
> collection?
> 4900.615 [ParNew: 651487K->487288K(5017600K)] --> Normally the minors
> start at about 4,800,000K and get reduced to about 600,000K, whereas this
> one started at 651487K.
> Using survivor spaces isn't impacting/changing anything, is it?
>
> Maybe it's because at 4900.615 we finished at 487288K, and then less
> than 1 second later, at 4901.581, we have another ParNew (the failure)
> with the 4788088K->4780999K entry, so in less than 1 second we allocated
> about 4GB.
> So perhaps this answers my previous question about why a minor comes
> along at only 651487K: we're trying to allocate a 4GB object / 4GB worth
> of large objects?
> I guess this explains why I had a promotion failure / confirms
> fragmentation: despite having more than 4GB free in tenured, I probably
> didn't have 4GB contiguous.

Hard to say why the collection happened with only 651487K used. You're
using JNI critical sections, which affect garbage collection, so something
could be going wrong with that interaction.
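To follow the arithmetic in logs like the ones quoted above, it can help to derive per-scavenge figures from each ParNew entry. A minimal Java sketch, assuming the JDK 6 -XX:+PrintGCDetails layout shown in this thread with the wrapped lines rejoined: the young gen shrinks by (before - after), the whole heap only by its own (before - after), and the difference is what was promoted. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch for the ParNew log format quoted in this thread (wrapped lines
// rejoined). Class and method names are hypothetical.
final class ParNewEntry {
    // "[ParNew: youngBefore->youngAfter(youngCap), t secs] heapBefore->heapAfter(heapCap)".
    // The heap totals are optional because a failed scavenge is followed by a CMS entry.
    private static final Pattern PARNEW = Pattern.compile(
        "\\[ParNew( \\(promotion failed\\))?: (\\d+)K->(\\d+)K\\((\\d+)K\\),\\s*[0-9.]+ secs\\]"
        + "(?:\\s*(\\d+)K->(\\d+)K\\((\\d+)K\\))?");

    final boolean promotionFailed;
    final long youngBeforeK, youngAfterK, youngCapacityK;
    final long heapBeforeK, heapAfterK; // -1 when no heap totals follow the ParNew entry

    private ParNewEntry(Matcher m) {
        promotionFailed = m.group(1) != null;
        youngBeforeK = Long.parseLong(m.group(2));
        youngAfterK = Long.parseLong(m.group(3));
        youngCapacityK = Long.parseLong(m.group(4));
        heapBeforeK = m.group(5) == null ? -1 : Long.parseLong(m.group(5));
        heapAfterK = m.group(6) == null ? -1 : Long.parseLong(m.group(6));
    }

    static ParNewEntry parse(String line) {
        Matcher m = PARNEW.matcher(line);
        if (!m.find()) throw new IllegalArgumentException("no ParNew entry in: " + line);
        return new ParNewEntry(m);
    }

    // The young gen shrank by (youngBefore - youngAfter) but the whole heap only
    // by (heapBefore - heapAfter); the difference left the young gen without
    // being freed, i.e. it was promoted to the old gen.
    long promotedK() {
        if (promotionFailed || heapBeforeK < 0)
            throw new IllegalStateException("only meaningful for a successful scavenge");
        return (youngBeforeK - youngAfterK) - (heapBeforeK - heapAfterK);
    }

    // Young-gen allocation between two consecutive scavenges.
    static long allocatedBetweenK(ParNewEntry earlier, ParNewEntry later) {
        return later.youngBeforeK - earlier.youngAfterK;
    }

    public static void main(String[] args) {
        ParNewEntry before = parse("4900.615: [GC 4900.615: [ParNew: 651487K->487288K(5017600K), "
            + "0.0780183 secs] 4627786K->4463587K(13619200K), 0.0781687 secs]");
        ParNewEntry failed = parse("4901.581: [GC 4901.581: [ParNew (promotion failed): "
            + "4788088K->4780999K(5017600K), 2.8947499 secs]4904.476: [CMS: ...");
        System.out.println(before.promotedK());                 // 0
        System.out.println(failed.promotionFailed);             // true
        System.out.println(allocatedBetweenK(before, failed));  // 4300800
    }
}
```

For the 4900.615 scavenge this reports 0K promoted (the young gen and the whole heap both shrank by 164199K). The allocation between 4900.615 and the failed scavenge at 4901.581 comes out at 4300800K, the same as the young-gen occupancy at the first scavenge after the full collection (`ParNew: 4300800K->316807K` at 4913.024), which is consistent with eden simply filling up; the log alone does not show whether that was one large allocation or many small ones.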


From jon.masamitsu at oracle.com  Fri Apr 16 11:03:11 2010
From: jon.masamitsu at oracle.com (Jon Masamitsu)
Date: Fri, 16 Apr 2010 11:03:11 -0700
Subject: understanding GC logs
In-Reply-To: <4BC62595.2000002@alcatel-lucent.com>
References: <dc6011320909281007k19f52f54v8d086edcd19b0bae@mail.gmail.com>
	<4AC0EEAE.5010705@Sun.COM>
	<dc6011320909281106r398fa9fak9384e89012d0c52d@mail.gmail.com>
	<4AC145AE.30804@sun.com> <4AC148B6.7010608@Sun.COM>
	<4AC1493A.2030004@sun.com> <4B4B3ECB.5090105@alcatel-lucent.com>
	<4B4B937C.4080907@alcatel-lucent.com>
	<4B4B9FBC.4040103@sun.com> <4B4BA6AF.5080300@sun.com>
	<4B50C9BF.8060202@alcatel-lucent.com> <4B50DE95.4060901@Sun.COM>
	<4BABA011.7020801@alcatel-lucent.com>
	<4BB20B1C.4020608@alcatel-lucent.com>
	<148A7A57-B616-4CA4-9571-0F0216B0F650@oracle.com>
	<4BC62595.2000002@alcatel-lucent.com>
Message-ID: <4BC8A65F.40408@oracle.com>

On 4/14/10 1:29 PM, Shaun Hennessy wrote:
> ...
>
>>
>>>
>>>
>>>
>>> promotion failed, and full GC
>>> 50786.124: [GC 50786.124: [ParNew: 4606713K->338518K(5017600K), 
>>> 0.0961884 secs] 12303455K->8081859K(13619200K), 0.0963907 secs] 
>>> [Times: user=0.91 sys=0.01, real=0.10 secs]
>>> 50787.373: [GC 50787.373: [ParNew: 4639318K->272229K(5017600K), 
>>> 0.0749353 secs] 12382659K->8053730K(13619200K), 0.0751408 secs] 
>>> [Times: user=0.75 sys=0.00, real=0.08 secs]
>>> 50788.483: [GC 50788.483: [ParNew: 4573029K->393397K(5017600K), 
>>> 0.0837182 secs] 12354530K->8185595K(13619200K), 0.0839321 secs] 
>>> [Times: user=1.03 sys=0.00, real=0.08 secs]
>>> 50789.590: [GC 50789.590: [ParNew (promotion failed): 
>>> 4694264K->4612345K(5017600K), 1.5974678 secs] 
>>> 12486461K->12447305K(13619200K), 1.5976765 secs] [Times : user=2.38 
>>> sys=0.20, real=1.60 secs]
>>> GC locker: Trying a full collection because scavenge failed
>>> 50791.188: [Full GC 50791.188: [CMS: 7834959K->1227325K(8601600K), 
>>> 6.7102106 secs] 12447305K->1227325K(13619200K), [CMS Perm : 
>>> 670478K->670478K(819200K)], 6.7103417 secs] [Times: user=6.71 
>>> sys=0.00, real=6.71 secs]
>>> 50798.982: [GC 50798.982: [ParNew: 4300800K->217359K(5017600K), 
>>> 0.0364557 secs] 5528125K->1444685K(13619200K), 0.0366630 secs] 
>>> [Times: user=0.44 sys=0.00, real=0.04 secs]
>>> 50800.246: [GC 50800.246: [ParNew: 4518167K->198753K(5017600K), 
>>> 0.0368620 secs] 5745493K->1426078K(13619200K), 0.0370604 secs] 
>>> [Times: user=0.46 sys=0.01, real=0.04 secs]
>>> 9) Probably understanding what the scavenge is doing will help me
>>> understand this case, but the logic seems simple enough: full GC on
>>> promotion failure & scavenge failed.
>>
>> Yes, full STW compaction.
>>
> 3) So in the previous case (promotion failed) you said a "full STW
> compaction of the entire heap is done", and here in the (promotion
> failed; GC locker; Full GC) case it is also a "full STW compaction".
> What is the difference between the 2 scenarios (i.e. cause and effect)?
> From below, your "The "GC locker" message says that after a JNI critical
> section was exited the GC wanted to do a scavenge but did not think
> there was enough room in the old gen so it does a full STW compaction."
> So is the cause of the difference between the 2 cases simply the fact
> that the tenured generation was fuller in this latest case? And in terms
> of effect, is anything different actually being done in the 2 cases, or
> are just the log messages different, with both being the exact same type
> of GC?
>
The collections are the same (both full STW compactions). The only
difference is the circumstances under which the GCs were done. JNI
critical sections can temporarily delay a GC, and when the critical
section is exited the delayed GC happens. That's about all I know about
that.
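The behavior described here (a GC requested while a thread is inside a JNI critical section is held back and run once the section is exited) can be modelled in a few lines. This is a deliberately simplified, hypothetical Java sketch, not HotSpot's implementation: it counts threads inside critical sections, records a pending request, and lets the last thread out run the deferred collection. It does not model which kind of collection then runs; per the "GC locker" explanation quoted above, that depends on whether the old gen looks roomy enough for a scavenge.

```java
// Hypothetical toy model of the behavior described above, not HotSpot code.
final class ToyGcLocker {
    private int threadsInCritical = 0;
    private boolean gcPending = false;
    private int gcsRun = 0;

    synchronized void enterCritical() {
        threadsInCritical++;
    }

    synchronized void exitCritical() {
        if (threadsInCritical == 0) throw new IllegalStateException("not in a critical section");
        threadsInCritical--;
        // The last thread out runs the collection that was held back.
        if (threadsInCritical == 0 && gcPending) {
            gcPending = false;
            runGc("deferred");
        }
    }

    // Returns true if the GC ran immediately, false if it was deferred.
    synchronized boolean requestGc() {
        if (threadsInCritical > 0) {
            gcPending = true;
            return false;
        }
        runGc("immediate");
        return true;
    }

    private void runGc(String why) {
        gcsRun++;
        System.out.println("GC #" + gcsRun + " (" + why + ")");
    }

    synchronized int gcsRun() { return gcsRun; }
    synchronized boolean gcPending() { return gcPending; }

    public static void main(String[] args) {
        ToyGcLocker locker = new ToyGcLocker();
        locker.enterCritical();                  // simulate two threads entering
        locker.enterCritical();                  // JNI critical sections
        System.out.println(locker.requestGc());  // false: held back
        locker.exitCritical();                   // one still inside, nothing runs
        locker.exitCritical();                   // prints "GC #1 (deferred)"
    }
}
```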
>
> 4) Forgetting these specific examples, if we are having
> frequent/troublesome promotion failures, is there anything beyond

Those are the usual remedies.  I suppose you could also try to reduce the
amount of allocations you do.

>
> a) Try 6u20
> b) Try a lower InitiatingOccupancy
> c) Try a smaller Eden Gen &  bigger Tenured / or simply bigger Heap
> - Does the use of Survivor Space vs not using Survivor make it more or 
> less likely to have promotion failures?
Having survivor spaces is better in general.


From matt.khan at db.com  Tue Apr 20 06:20:07 2010
From: matt.khan at db.com (Matt Khan)
Date: Tue, 20 Apr 2010 14:20:07 +0100
Subject: What influences young generation pause times?
Message-ID: <OF859D83F9.94C8439F-ON8025770B.00485204-8025770B.004940C2@db.com>

Hi

My understanding is that young gen pause times are related to the size of 
the set of live objects but what does this really mean? for example does 
it mean number of objects in the live set? total size [of them]? 

and what other factors influence the young gen pause time?

Cheers
Matt

Matt Khan
--------------------------------------------------
GFFX Auto Trading
Deutsche Bank, London



From tony.printezis at oracle.com  Tue Apr 20 07:04:32 2010
From: tony.printezis at oracle.com (Tony Printezis)
Date: Tue, 20 Apr 2010 10:04:32 -0400
Subject: What influences young generation pause times?
In-Reply-To: <OF859D83F9.94C8439F-ON8025770B.00485204-8025770B.004940C2@db.com>
References: <OF859D83F9.94C8439F-ON8025770B.00485204-8025770B.004940C2@db.com>
Message-ID: <4BCDB470.8090102@oracle.com>

Matt,

For most applications, object copying is the predominant cost during 
young GCs. Basically, the more objects survive the collection and need 
to be copied, the higher the young GC times will be. Young GCs do not 
touch the dead objects in the young generation so the number of overall 
objects in the young generation will not really affect pause times, just 
the number of live ones.

There are other costs during young GCs, like scanning the application 
thread stacks, scanning the card table to find old-to-young references, 
etc. However, typically, those costs are very small compared to copying 
the live objects.
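To make the card-table cost more concrete, here is a minimal toy sketch in Java, assuming an old generation addressed by byte offset and 512-byte cards (the card size is an illustrative assumption, and none of this is HotSpot's code). A write barrier dirties the card covering any updated old-gen field; a young GC then walks the small card array and only needs to scan the old-gen memory under dirty cards for old-to-young references, which keeps this cost low when few old-gen fields change between young GCs.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model, not HotSpot code. CARD_SHIFT = 9 (512-byte cards) is an
// illustrative assumption.
final class ToyCardTable {
    static final int CARD_SHIFT = 9;
    static final byte CLEAN = 0;
    static final byte DIRTY = 1;

    private final byte[] cards;

    ToyCardTable(long oldGenBytes) {
        long cardSize = 1L << CARD_SHIFT;
        cards = new byte[(int) ((oldGenBytes + cardSize - 1) >>> CARD_SHIFT)];
    }

    int cardCount() { return cards.length; }

    // Write barrier: run after the application stores a reference into the
    // old-gen field at this byte offset. This toy does not check whether the
    // stored value is young; it just marks the card.
    void postWriteBarrier(long fieldOffset) {
        cards[(int) (fieldOffset >>> CARD_SHIFT)] = DIRTY;
    }

    // Young GC side: walk the small card array; only dirty cards would have
    // the old-gen memory underneath scanned for young references. Cards are
    // cleaned once scanned.
    List<Integer> scanAndCleanDirtyCards() {
        List<Integer> scanned = new ArrayList<>();
        for (int i = 0; i < cards.length; i++) {
            if (cards[i] == DIRTY) {
                scanned.add(i);
                cards[i] = CLEAN;
            }
        }
        return scanned;
    }

    public static void main(String[] args) {
        ToyCardTable table = new ToyCardTable(1 << 20);  // 1MB old gen -> 2048 cards
        table.postWriteBarrier(0);
        table.postWriteBarrier(100);   // same card as offset 0
        table.postWriteBarrier(1024);
        table.postWriteBarrier(1030);  // same card as offset 1024
        System.out.println(table.scanAndCleanDirtyCards());  // [0, 2]
        System.out.println(table.scanAndCleanDirtyCards());  // []
    }
}
```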

Hope this helps,

Tony


From matt.khan at db.com  Tue Apr 20 09:04:08 2010
From: matt.khan at db.com (Matt Khan)
Date: Tue, 20 Apr 2010 17:04:08 +0100
Subject: What influences young generation pause times?
In-Reply-To: <4BCDB470.8090102@oracle.com>
Message-ID: <OF9AD5912A.88CC330D-ON8025770B.0056D10E-8025770B.00584526@db.com>

Hi Tony

>> Basically, the more objects survive the collection and need to be
>> copied, the higher the young GC times will be.
so when does a concurrent collector enter a STW pause? 

for example if I look at figure 6, p10 in the memory management white 
paper (http://java.sun.com/products/hotspot/whitepaper.html) then that 
makes it look like there is a single STW pause per young collection that 
is made shorter because there are n threads doing the work. Is that an 
accurate depiction of when it pauses or just a convenient visualisation?

My reason for asking is that my app doesn't exhibit this single pause per 
young collection, instead I see a succession of short pauses between GC 
logs (example below) & I'd like to understand what causes those pauses. 
This app is using CMS (params used below) but there is no CMS activity 
reported at this time because very little enters the tenured generation and 
hence there is no collection required.

Total time for which application threads were stopped: 0.0051359 seconds
Application time: 99.9576332 seconds
2010-04-13T19:14:53.185+0000: 368542.855: [GC 368542.855: [ParNew
Desired survivor size 14450688 bytes, new threshold 1 (max 1)
- age   1:    3377144 bytes,    3377144 total
: 2986668K->4491K(2998976K), 0.0254753 secs] 3076724K->94963K(3130048K)
icms_dc=0 , 0.0259072 secs] [Times: user=0.25 sys=0.01, real=0.03 secs]
Total time for which application threads were stopped: 0.0330759 seconds
Application time: 190.7387185 seconds
Total time for which application threads were stopped: 0.0060798 seconds
Application time: 9.2698867 seconds
Total time for which application threads were stopped: 0.0051861 seconds
Application time: 290.7195886 seconds
Total time for which application threads were stopped: 0.0065455 seconds
Application time: 9.2792321 seconds
Total time for which application threads were stopped: 0.0051541 seconds
Application time: 290.7292153 seconds
Total time for which application threads were stopped: 0.0063071 seconds
Application time: 9.2696694 seconds
Total time for which application threads were stopped: 0.0052036 seconds
Application time: 290.7093779 seconds
Total time for which application threads were stopped: 0.0065365 seconds
Application time: 9.2793591 seconds
Total time for which application threads were stopped: 0.0051265 seconds
Application time: 290.7301471 seconds
Total time for which application threads were stopped: 0.0070431 seconds
Application time: 9.2694376 seconds
Total time for which application threads were stopped: 0.0051428 seconds
Application time: 119.4074368 seconds
Total time for which application threads were stopped: 0.0059739 seconds
Application time: 39.8647697 seconds
2010-04-13T19:40:52.550+0000: 370102.218: [GC 370102.219: [ParNew
Desired survivor size 14450688 bytes, new threshold 1 (max 1)
- age   1:    2911824 bytes,    2911824 total

-Xms3072m 
-Xmx3072m 
-Xmn2944m 
-XX:+DisableExplicitGC 
-XX:+PrintGCDetails 
-XX:+PrintGCDateStamps 
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintGCApplicationConcurrentTime
-XX:MaxTenuringThreshold=1 
-XX:SurvivorRatio=190 
-XX:TargetSurvivorRatio=90
-XX:+UseConcMarkSweepGC 
-XX:+UseParNewGC 
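The "Desired survivor size 14450688 bytes" lines and the 2998976K ParNew capacity in this log can be reproduced from the flags. A worked sketch in Java, under stated assumptions: each survivor space is young / (SurvivorRatio + 2), rounded down to a 64K boundary (the rounding is inferred because it makes these numbers match, not taken from documentation); the desired size is TargetSurvivorRatio percent of one survivor space; and ParNew reports eden plus one survivor space as its capacity. The class name is hypothetical.

```java
// Worked arithmetic for the flags above. ASSUMPTION: each survivor space is
// rounded down to a 64K boundary; this is inferred from the log figures.
final class SurvivorSizing {
    static final long K = 1024;
    static final long ALIGNMENT = 64 * K; // assumed, see lead-in

    // young = eden + 2 survivors, with eden ~= SurvivorRatio * survivor
    static long survivorBytes(long youngBytes, int survivorRatio) {
        long raw = youngBytes / (survivorRatio + 2);
        return raw / ALIGNMENT * ALIGNMENT;
    }

    // TargetSurvivorRatio is a percentage of one survivor space.
    static long desiredSurvivorBytes(long survivorBytes, int targetSurvivorRatio) {
        return survivorBytes * targetSurvivorRatio / 100;
    }

    // ParNew reports eden plus one survivor space as its capacity.
    static long parNewCapacityK(long youngBytes, long survivorBytes) {
        long edenBytes = youngBytes - 2 * survivorBytes;
        return (edenBytes + survivorBytes) / K;
    }

    public static void main(String[] args) {
        long young = 2944L * K * K;                        // -Xmn2944m
        long survivor = survivorBytes(young, 190);         // -XX:SurvivorRatio=190
        System.out.println(survivor);                            // 16056320
        System.out.println(desiredSurvivorBytes(survivor, 90));  // 14450688
        System.out.println(parNewCapacityK(young, survivor));    // 2998976
    }
}
```

With one survivor space at about 15MB, the 3377144 and 2911824 bytes of age-1 objects in the log fit comfortably under the desired size.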

Cheers
Matt

Matt Khan
--------------------------------------------------
GFFX Auto Trading
Deutsche Bank, London



From tony.printezis at oracle.com  Tue Apr 20 10:51:58 2010
From: tony.printezis at oracle.com (Tony Printezis)
Date: Tue, 20 Apr 2010 13:51:58 -0400
Subject: What influences young generation pause times?
In-Reply-To: <OF9AD5912A.88CC330D-ON8025770B.0056D10E-8025770B.00584526@db.com>
References: <OF9AD5912A.88CC330D-ON8025770B.0056D10E-8025770B.00584526@db.com>
Message-ID: <4BCDE9BE.1070504@oracle.com>

Matt,

The pauses that you see seem to be non-GC safepoints. Maybe they are 
biased lock revocation safepoints. If the log doesn't show a [GC...] 
line at a safepoint, then there is no GC activity during it (at least I 
can't think of an occasion when this would be the case).
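This rule of thumb can be applied mechanically to a -XX:+PrintGCApplicationStoppedTime log. A minimal Java sketch, assuming the plain-text line formats shown in Matt's excerpt: a stop is counted as GC-related only if a "[GC" or "[Full GC" line appeared since the preceding "Application time:" line. It only separates GC from non-GC pauses; it says nothing about what the non-GC safepoints were for. The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: classify stopped-time entries as GC or non-GC using the rule
// "no [GC...] line since the last Application time: line => no GC work".
final class SafepointPauses {
    static final String STOPPED = "Total time for which application threads were stopped: ";

    static final class Pause {
        final double seconds;
        final boolean gc;
        Pause(double seconds, boolean gc) { this.seconds = seconds; this.gc = gc; }
    }

    static List<Pause> classify(List<String> lines) {
        List<Pause> pauses = new ArrayList<>();
        boolean sawGc = false;
        for (String line : lines) {
            if (line.startsWith("Application time:")) {
                sawGc = false;  // the application ran; the next stop starts fresh
            } else if (line.startsWith(STOPPED)) {
                String secs = line.substring(STOPPED.length()).replace("seconds", "").trim();
                pauses.add(new Pause(Double.parseDouble(secs), sawGc));
                sawGc = false;
            } else if (line.contains("[GC") || line.contains("[Full GC")) {
                sawGc = true;
            }
        }
        return pauses;
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList(
            "Application time: 99.9576332 seconds",
            "2010-04-13T19:14:53.185+0000: 368542.855: [GC 368542.855: [ParNew",
            "Total time for which application threads were stopped: 0.0330759 seconds",
            "Application time: 190.7387185 seconds",
            "Total time for which application threads were stopped: 0.0060798 seconds",
            "Application time: 9.2698867 seconds",
            "Total time for which application threads were stopped: 0.0051861 seconds");
        int gc = 0, nonGc = 0;
        for (Pause p : classify(log)) {
            if (p.gc) gc++; else nonGc++;
        }
        System.out.println("GC pauses: " + gc + ", non-GC pauses: " + nonGc);  // GC pauses: 1, non-GC pauses: 2
    }
}
```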

Tony


From y.s.ramakrishna at oracle.com  Tue Apr 20 11:00:21 2010
From: y.s.ramakrishna at oracle.com (Y. Srinivas Ramakrishna)
Date: Tue, 20 Apr 2010 11:00:21 -0700
Subject: What influences young generation pause times?
In-Reply-To: <4BCDE9BE.1070504@oracle.com>
References: <OF9AD5912A.88CC330D-ON8025770B.0056D10E-8025770B.00584526@db.com>
	<4BCDE9BE.1070504@oracle.com>
Message-ID: <4BCDEBB5.8030708@oracle.com>

You could use -XX:+PrintSafepointStatistics etc. to figure these out.

-- ramki

On 04/20/10 10:51, Tony Printezis wrote:
> Matt,
> 
> The pauses that you see seem to be non-GC safepoints. Maybe they are 
> biased lock revocation safepoints. If the log doesn't show a [GC...] 
> line at a safepoint, then there is not GC activity during it (at least I 
> can't think of an occasion when this would be the case).
> 
> Tony


From tony.printezis at oracle.com  Tue Apr 20 11:11:59 2010
From: tony.printezis at oracle.com (Tony Printezis)
Date: Tue, 20 Apr 2010 14:11:59 -0400
Subject: What influences young generation pause times?
In-Reply-To: <n2tfb5ec5091004201027h25f96f3fqafbfdffe8ecb0ab9@mail.gmail.com>
References: <4BCDB470.8090102@oracle.com>	
	<OF9AD5912A.88CC330D-ON8025770B.0056D10E-8025770B.00584526@db.com>
	<n2tfb5ec5091004201027h25f96f3fqafbfdffe8ecb0ab9@mail.gmail.com>
Message-ID: <4BCDEE6F.8030807@oracle.com>

Osvaldo,

You misunderstand how a copying GC (which is the algorithm our young gen 
GCs implement) works. It does not first mark the live objects and then
copy them. Instead, it copies the objects as it comes across them 
(i.e., at the same time it discovers they are live). So, there is no 
opportunity to find big blocks of live objects and not copy them. The 
end of the GC would be the only time you would be able to do that but, 
by then, you've already copied all the objects anyway.
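Cheney's algorithm is the textbook form of the copy-as-you-discover behavior described here, and it also shows why young GC cost tracks the live objects rather than the dead ones. The sketch is a toy semispace collector in Java, not HotSpot's ParNew (which adds parallel GC threads, survivor aging and promotion): each object is copied the first time it is reached, a forwarding map stops it from being copied twice, and the to-space list doubles as the work queue. Unreachable objects are never visited. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Toy Cheney-style copying collector; one instance per collection.
final class ToyCopyingGc {
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        Obj(String name) { this.name = name; }
    }

    private final Map<Obj, Obj> forwarding = new IdentityHashMap<>();
    private final List<Obj> toSpace = new ArrayList<>();

    // Copy an object the first time it is reached; later visits just
    // follow the forwarding pointer.
    private Obj evacuate(Obj o) {
        Obj copy = forwarding.get(o);
        if (copy == null) {
            copy = new Obj(o.name);
            copy.refs.addAll(o.refs);  // still point at from-space until scanned
            toSpace.add(copy);
            forwarding.put(o, copy);
        }
        return copy;
    }

    // Roots are updated in place to their copies; returns the to-space.
    List<Obj> collect(List<Obj> roots) {
        for (int i = 0; i < roots.size(); i++) roots.set(i, evacuate(roots.get(i)));
        // Scan pointer: copies before 'scan' already point into to-space.
        for (int scan = 0; scan < toSpace.size(); scan++) {
            Obj o = toSpace.get(scan);
            for (int i = 0; i < o.refs.size(); i++) o.refs.set(i, evacuate(o.refs.get(i)));
        }
        return toSpace;
    }

    public static void main(String[] args) {
        Obj a = new Obj("a"), b = new Obj("b"), c = new Obj("c");
        a.refs.add(b); b.refs.add(c); c.refs.add(a);  // a live cycle
        for (int i = 0; i < 997; i++) {               // dead objects: they point at
            Obj dead = new Obj("dead" + i);            // live data, but nothing
            dead.refs.add(a);                          // points at them
        }
        List<Obj> roots = new ArrayList<>(Arrays.asList(a));
        List<Obj> survivors = new ToyCopyingGc().collect(roots);
        System.out.println(survivors.size());  // 3
    }
}
```

The work done is three copies and three reference fix-ups, regardless of the 997 dead objects, which is the point about pause times depending on the live set.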

Regarding calling young GCs explicitly from an application: I can see 
how, in the case of single-threaded applications, the application might 
know "We are between transactions and, maybe, we have lots of garbage 
and not much live in the young gen. So let's do a young GC to clean up 
the young gen at maybe low overhead since we'll copy very little." 
However, how will this work in the case of multi-threaded applications, 
which are the vast majority of applications we see from our customers? A 
thread might be between transactions, but what about the other 50, 300, 
or even 2,000 threads? If a particular time is good to do a young GC for 
a particular thread, it does not mean that it's also good for the rest. 
Additionally, I would be willing to bet money that if we provided such 
an API, library writers would abuse it thinking that "hey, the end of 
this library call will be a great time to do a young GC!", without 
taking into consideration that many other threads could be doing 
something totally different at the same time (we've seen way too many 
libraries that call System.gc() already...).

My two cents,

Tony

Osvaldo Doederlein wrote:
> If you allow some intermission... is the young-gen collector smart 
> enough to avoid semispace copying in some favorable conditions? Let's 
> say I am lucky and when young-GC is triggered, after marking I have 
> [LLLLLLLLLLDDDD] where L=live objects, D=dead. it's stupid to copy the 
> block [LLLLLLLLLL] to another space. I'd expect the collector to have 
> some heuristic like: look at the top address and total size of the 
> remaining live data, and if it is densely populated (say >90% live 
> space - e.g. [LLLDLLLLLLDDDD]), just finish GC without any compaction 
> or semispace flipping.
>
> I would expect this scenario to happen in the real world with very 
> small frequency, because young-GC must be triggered at a "lucky" time, 
> e.g. after some application transactions commit and before any newer 
> transaction begins - but if the collector already accounts the live 
> set size at the marking phase, the cost to attempt this new 
> optimization is virtually zero. And we might hint the VM to make sure 
> the optimal case doesn't depend on good luck. The JVM could expose an 
> API that allows an application (or a container) to request a 
> "lightweight GC", i.e., perform only young-GC, and only if the 
> young-gen is >N% full. E.g., System.fastGC(0.8) for N=80%. A JavaEE 
> application server could invoke this when it detects idle periods 
> (zero running transactions / zero background processes doing anything 
> important); or even after every transaction commit if the VM uses 
> TLABs (in that case we only collect the TLAB; the whole thing only 
> makes sense for large enough TLABs). For single-threaded processes 
> (Ok, almost-single-threaded...) it's much simpler, just call the 
> lightweight-GC API at special places where major activity ends and 
> tons of allocated data are liberated, e.g. after the render-frame step 
> of your game loop, or after importing each file in your batch ETL 
> program, etc.
>
> A+
> Osvaldo
>
> 2010/4/20 Matt Khan <matt.khan at db.com <mailto:matt.khan at db.com>>
>
>     Hi Tony
>
>     >> Basically, the more objects survive the collection and need to be
>     copied, the higher the young GC times will be.
>     so when does a concurrent collector enter a STW pause?
>
>     for example if I look at figure 6, p10 in the memory management white
>     paper (http://java.sun.com/products/hotspot/whitepaper.html) then that
>     makes it look like there is a single STW pause per young
>     collection that
>     is made shorter because there are n threads doing the work. Is that an
>     accurate depiction of when it pauses or just a convenient
>     visualisation?
>
>     My reason for asking is that my app doesn't exhibit this single
>     pause per
>     young collection, instead I see a succession of short pauses
>     between GC
>     logs (example below) & I'd like to understand what causes those
>     pauses.
>     This app is using CMS (params used below) but there is no CMS activity
>     reported at this time because v little enters the tenured
>     generation and
>     hence there is no collection required.
>
>     Total time for which application threads were stopped: 0.0051359 seconds
>     Application time: 99.9576332 seconds
>     2010-04-13T19:14:53.185+0000: 368542.855: [GC 368542.855: [ParNew
>     Desired survivor size 14450688 bytes, new threshold 1 (max 1)
>     - age   1:    3377144 bytes,    3377144 total
>     : 2986668K->4491K(2998976K), 0.0254753 secs] 3076724K->94963K(3130048K) icms_dc=0 , 0.0259072 secs] [Times: user=0.25 sys=0.01, real=0.03 secs]
>     Total time for which application threads were stopped: 0.0330759 seconds
>     Application time: 190.7387185 seconds
>     Total time for which application threads were stopped: 0.0060798 seconds
>     Application time: 9.2698867 seconds
>     Total time for which application threads were stopped: 0.0051861 seconds
>     Application time: 290.7195886 seconds
>     Total time for which application threads were stopped: 0.0065455 seconds
>     Application time: 9.2792321 seconds
>     Total time for which application threads were stopped: 0.0051541 seconds
>     Application time: 290.7292153 seconds
>     Total time for which application threads were stopped: 0.0063071 seconds
>     Application time: 9.2696694 seconds
>     Total time for which application threads were stopped: 0.0052036 seconds
>     Application time: 290.7093779 seconds
>     Total time for which application threads were stopped: 0.0065365 seconds
>     Application time: 9.2793591 seconds
>     Total time for which application threads were stopped: 0.0051265 seconds
>     Application time: 290.7301471 seconds
>     Total time for which application threads were stopped: 0.0070431 seconds
>     Application time: 9.2694376 seconds
>     Total time for which application threads were stopped: 0.0051428 seconds
>     Application time: 119.4074368 seconds
>     Total time for which application threads were stopped: 0.0059739 seconds
>     Application time: 39.8647697 seconds
>     2010-04-13T19:40:52.550+0000: 370102.218: [GC 370102.219: [ParNew
>     Desired survivor size 14450688 bytes, new threshold 1 (max 1)
>     - age   1:    2911824 bytes,    2911824 total
>
>     -Xms3072m
>     -Xmx3072m
>     -Xmn2944m
>     -XX:+DisableExplicitGC
>     -XX:+PrintGCDetails
>     -XX:+PrintGCDateStamps
>     -XX:+PrintGCApplicationStoppedTime
>     -XX:+PrintGCApplicationConcurrentTime
>     -XX:MaxTenuringThreshold=1
>     -XX:SurvivorRatio=190
>     -XX:TargetSurvivorRatio=90
>     -XX:+UseConcMarkSweepGC
>     -XX:+UseParNewGC
>
>     Cheers
>     Matt
>
>     Matt Khan
>     --------------------------------------------------
>     GFFX Auto Trading
>     Deutsche Bank, London
>
>
>
>     ---
>
>     This e-mail may contain confidential and/or privileged
>     information. If you are not the intended recipient (or have
>     received this e-mail in error) please notify the sender
>     immediately and delete this e-mail. Any unauthorized copying,
>     disclosure or distribution of the material in this e-mail is
>     strictly forbidden.
>
>     Please refer to http://www.db.com/en/content/eu_disclosures.htm
>     for additional EU corporate and regulatory disclosures.
>     _______________________________________________
>     hotspot-gc-use mailing list
>     hotspot-gc-use at openjdk.java.net
>     <mailto:hotspot-gc-use at openjdk.java.net>
>     http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
>
>
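The survivor numbers in the log above follow directly from the flags. A minimal sketch of the arithmetic, assuming the usual HotSpot sizing rules: one survivor space is young / (SurvivorRatio + 2), the ParNew capacity in parentheses is eden plus one survivor space, and "Desired survivor size" is survivor capacity * TargetSurvivorRatio / 100. The 64 KiB rounding is an assumption inferred from these particular numbers, not something stated in the thread.

```java
final class YoungGenSizing {
    // Assumption: survivor sizes are rounded down to a 64 KiB boundary.
    // Inferred from the log values, not taken from HotSpot documentation.
    static final long ALIGNMENT = 64L * 1024;

    /** One survivor space: young / (SurvivorRatio + 2), aligned down. */
    static long survivorBytes(long youngBytes, int survivorRatio) {
        long raw = youngBytes / (survivorRatio + 2);
        return raw - (raw % ALIGNMENT);
    }

    /** Occupancy target used when computing the tenuring threshold. */
    static long desiredSurvivorBytes(long survivorBytes, int targetSurvivorRatio) {
        return survivorBytes * targetSurvivorRatio / 100;
    }

    public static void main(String[] args) {
        long young = 2944L * 1024 * 1024;            // -Xmn2944m
        long survivor = survivorBytes(young, 190);    // -XX:SurvivorRatio=190
        long eden = young - 2 * survivor;             // two survivor spaces
        System.out.println("survivor        = " + survivor / 1024 + "K");
        System.out.println("eden+1 survivor = " + (eden + survivor) / 1024 + "K");
        // -XX:TargetSurvivorRatio=90
        System.out.println("desired         = " + desiredSurvivorBytes(survivor, 90) + " bytes");
    }
}
```

Running main prints 15680K, 2998976K and 14450688 bytes, which match the (2998976K) capacity and "Desired survivor size 14450688 bytes" in the ParNew entries. With -XX:SurvivorRatio=190 each survivor space is only about 15 MB.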

From opinali at gmail.com  Tue Apr 20 10:27:37 2010
From: opinali at gmail.com (Osvaldo Doederlein)
Date: Tue, 20 Apr 2010 14:27:37 -0300
Subject: What influences young generation pause times?
In-Reply-To: <OF9AD5912A.88CC330D-ON8025770B.0056D10E-8025770B.00584526@db.com>
References: <4BCDB470.8090102@oracle.com>
	<OF9AD5912A.88CC330D-ON8025770B.0056D10E-8025770B.00584526@db.com>
Message-ID: <n2tfb5ec5091004201027h25f96f3fqafbfdffe8ecb0ab9@mail.gmail.com>

If you'll allow a digression... is the young-gen collector smart enough to
avoid semispace copying under favorable conditions? Let's say I am lucky
and when young-GC is triggered, after marking I have [LLLLLLLLLLDDDD] where
L=live objects, D=dead. It's stupid to copy the block [LLLLLLLLLL] to
another space. I'd expect the collector to have some heuristic like: look at
the top address and total size of the remaining live data, and if it is
densely populated (say 90% or more live space, e.g. [LLLDLLLLLLDDDD]), just
finish GC without any compaction or semispace flipping.
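A minimal sketch of the proposed density check, assuming a hypothetical per-slot liveness map produced by a separate marking pass (which, as Tony points out in his reply, HotSpot's copying young collectors do not have). DensityHeuristic and its methods are illustrative names only.

```java
final class DensityHeuristic {
    /**
     * Hypothetical check from the proposal: given a per-slot liveness map of
     * the young gen (true = live), skip copying if the prefix up to the
     * top-most live slot is at least thresholdPercent live.
     */
    static boolean canSkipCopy(boolean[] live, int thresholdPercent) {
        int top = -1;       // index of the highest live slot
        int liveCount = 0;  // total number of live slots
        for (int i = 0; i < live.length; i++) {
            if (live[i]) {
                top = i;
                liveCount++;
            }
        }
        if (top < 0) {
            return true; // nothing survived, so there is nothing to copy
        }
        // Integer comparison avoids floating-point rounding at the threshold.
        return liveCount * 100L >= (long) thresholdPercent * (top + 1);
    }

    /** Parses the L/D notation used in the message, e.g. "LLLDLLLLLLDDDD". */
    static boolean[] parse(String layout) {
        boolean[] live = new boolean[layout.length()];
        for (int i = 0; i < layout.length(); i++) {
            live[i] = layout.charAt(i) == 'L';
        }
        return live;
    }

    public static void main(String[] args) {
        System.out.println(canSkipCopy(parse("LLLLLLLLLLDDDD"), 90)); // true: 10 of 10 live
        System.out.println(canSkipCopy(parse("LLLDLLLLLLDDDD"), 90)); // true: 9 of 10 live
        System.out.println(canSkipCopy(parse("LDLDLDLDLDDDDD"), 90)); // false: 5 of 9 live
    }
}
```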

I would expect this scenario to happen very rarely in the real world,
because young-GC must be triggered at a "lucky" time, e.g. after some
application transactions commit and before any newer transaction begins.
But if the collector already accounts for the live set size in the marking
phase, the cost of attempting this new optimization is virtually zero. And we
could give the VM a hint so that the optimal case doesn't depend on good luck.
The JVM could expose an API that allows an application (or a container) to
request a "lightweight GC", i.e., perform only a young-GC, and only if the
young-gen is >N% full. E.g., System.fastGC(0.8) for N=80%. A JavaEE
application server could invoke this when it detects idle periods (zero
running transactions / zero background processes doing anything important),
or even after every transaction commit if the VM uses TLABs (in that case we
only collect the TLAB; the whole thing only makes sense for large enough
TLABs). For single-threaded processes (OK, almost-single-threaded...) it's
much simpler: just call the lightweight-GC API at special places where major
activity ends and tons of allocated data are freed, e.g. after the
render-frame step of your game loop, or after importing each file in your
batch ETL program, etc.
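System.fastGC(0.8) is a proposal, not an existing API: standard Java has no way to request a young-only collection, and System.gc() requests a full collection (and is ignored under -XX:+DisableExplicitGC). Only the "is the young gen more than N% full?" half can be sketched today, using java.lang.management. Matching the pool by the substring "Eden" is an assumption, since pool names differ by collector and JVM version.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.OptionalLong;

final class YoungOccupancy {
    /** True if used/capacity is at least thresholdPercent (e.g. 80 for N=80%). */
    static boolean aboveThreshold(long used, long capacity, int thresholdPercent) {
        if (capacity <= 0) {
            return false; // unknown or empty capacity: never trigger
        }
        return used * 100 >= (long) thresholdPercent * capacity;
    }

    /**
     * Current eden occupancy in percent, if an eden pool is visible.
     * Assumption: the pool name contains "Eden" (e.g. "Par Eden Space" with ParNew).
     */
    static OptionalLong edenPercent() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP || !pool.getName().contains("Eden")) {
                continue;
            }
            MemoryUsage usage = pool.getUsage(); // null if the pool is no longer valid
            if (usage == null) {
                continue;
            }
            // getMax() is -1 when undefined; fall back to the committed size.
            long capacity = usage.getMax() > 0 ? usage.getMax() : usage.getCommitted();
            if (capacity > 0) {
                return OptionalLong.of(usage.getUsed() * 100 / capacity);
            }
        }
        return OptionalLong.empty();
    }

    public static void main(String[] args) {
        OptionalLong pct = edenPercent();
        System.out.println(pct.isPresent() ? "eden " + pct.getAsLong() + "% full" : "no eden pool found");
    }
}
```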

A+
Osvaldo

2010/4/20 Matt Khan <matt.khan at db.com>

> Hi Tony
>
> >> Basically, the more objects survive the collection and need to be
> copied, the higher the young GC times will be.
> so when does a concurrent collector enter a STW pause?
>
> for example if I look at figure 6, p10 in the memory management white
> paper (http://java.sun.com/products/hotspot/whitepaper.html) then that
> makes it look like there is a single STW pause per young collection that
> is made shorter because there are n threads doing the work. Is that an
> accurate depiction of when it pauses or just a convenient visualisation?
>
> My reason for asking is that my app doesn't exhibit this single pause per
> young collection, instead I see a succession of short pauses between GC
> logs (example below) & I'd like to understand what causes those pauses.
> This app is using CMS (params used below) but there is no CMS activity
> reported at this time because v little enters the tenured generation and
> hence there is no collection required.
>
> Total time for which application threads were stopped: 0.0051359 seconds
> Application time: 99.9576332 seconds
> 2010-04-13T19:14:53.185+0000: 368542.855: [GC 368542.855: [ParNew
> Desired survivor size 14450688 bytes, new threshold 1 (max 1)
> - age   1:    3377144 bytes,    3377144 total
> : 2986668K->4491K(2998976K), 0.0254753 secs] 3076724K->94963K(3130048K) icms_dc=0 , 0.0259072 secs] [Times: user=0.25 sys=0.01, real=0.03 secs]
> Total time for which application threads were stopped: 0.0330759 seconds
> Application time: 190.7387185 seconds
> Total time for which application threads were stopped: 0.0060798 seconds
> Application time: 9.2698867 seconds
> Total time for which application threads were stopped: 0.0051861 seconds
> Application time: 290.7195886 seconds
> Total time for which application threads were stopped: 0.0065455 seconds
> Application time: 9.2792321 seconds
> Total time for which application threads were stopped: 0.0051541 seconds
> Application time: 290.7292153 seconds
> Total time for which application threads were stopped: 0.0063071 seconds
> Application time: 9.2696694 seconds
> Total time for which application threads were stopped: 0.0052036 seconds
> Application time: 290.7093779 seconds
> Total time for which application threads were stopped: 0.0065365 seconds
> Application time: 9.2793591 seconds
> Total time for which application threads were stopped: 0.0051265 seconds
> Application time: 290.7301471 seconds
> Total time for which application threads were stopped: 0.0070431 seconds
> Application time: 9.2694376 seconds
> Total time for which application threads were stopped: 0.0051428 seconds
> Application time: 119.4074368 seconds
> Total time for which application threads were stopped: 0.0059739 seconds
> Application time: 39.8647697 seconds
> 2010-04-13T19:40:52.550+0000: 370102.218: [GC 370102.219: [ParNew
> Desired survivor size 14450688 bytes, new threshold 1 (max 1)
> - age   1:    2911824 bytes,    2911824 total
>
> -Xms3072m
> -Xmx3072m
> -Xmn2944m
> -XX:+DisableExplicitGC
> -XX:+PrintGCDetails
> -XX:+PrintGCDateStamps
> -XX:+PrintGCApplicationStoppedTime
> -XX:+PrintGCApplicationConcurrentTime
> -XX:MaxTenuringThreshold=1
> -XX:SurvivorRatio=190
> -XX:TargetSurvivorRatio=90
> -XX:+UseConcMarkSweepGC
> -XX:+UseParNewGC
>
> Cheers
> Matt
>
> Matt Khan
> --------------------------------------------------
> GFFX Auto Trading
> Deutsche Bank, London
>
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20100420/7dfc2ada/attachment-0001.html 

From opinali at gmail.com  Tue Apr 20 13:10:01 2010
From: opinali at gmail.com (Osvaldo Doederlein)
Date: Tue, 20 Apr 2010 17:10:01 -0300
Subject: What influences young generation pause times?
In-Reply-To: <4BCDEE6F.8030807@oracle.com>
References: <4BCDB470.8090102@oracle.com>
	<OF9AD5912A.88CC330D-ON8025770B.0056D10E-8025770B.00584526@db.com>
	<n2tfb5ec5091004201027h25f96f3fqafbfdffe8ecb0ab9@mail.gmail.com>
	<4BCDEE6F.8030807@oracle.com>
Message-ID: <r2tfb5ec5091004201310o71780c67pd10edebf82805f20@mail.gmail.com>

Hi Tony,

> Osvaldo,
>
> You misunderstand how a copying GC (which is the algorithm our young gen
> GCs implement) works. It does not first mark the live objects, and then
> copies them. Instead, it copies the objects as it comes across them (i.e.,
> at the same time it discovers they are live). So, there is no opportunity to
> find big blocks of live objects and not copy them. The end of the GC would
> be the only time you would be able to do that but, by then, you've already
> copied all the objects anyway.
>

That's right, but (as my thinking went) that's just one possible
implementation. What I failed to notice is that an ideal solution would
require combining both algorithms in a way that is not possible without a
hit for the unlucky case, where copying is needed.
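A toy sketch of the copy-as-you-discover behavior Tony describes, written in the style of Cheney's algorithm (chosen here for illustration; HotSpot's parallel young collectors are considerably more elaborate). Each object is evacuated the moment it is first reached, a forwarding pointer is left behind, and a scan index walks the to-space. There is no separate mark phase, so there is never a point where a dense block of live objects could be left in place.

```java
import java.util.ArrayList;
import java.util.List;

final class CheneySketch {
    static final class Obj {
        final String name;
        final Obj[] fields;
        Obj forward; // forwarding pointer, set once this object has been copied

        Obj(String name, int nFields) {
            this.name = name;
            this.fields = new Obj[nFields];
        }
    }

    /** Copies everything reachable from roots into a fresh to-space and returns it. */
    static List<Obj> collect(List<Obj> roots) {
        List<Obj> toSpace = new ArrayList<>();
        // Roots are evacuated first; each copy happens on first discovery.
        for (int i = 0; i < roots.size(); i++) {
            roots.set(i, evacuate(roots.get(i), toSpace));
        }
        // Scan pointer: walk copied objects in order and evacuate what they reference.
        for (int scan = 0; scan < toSpace.size(); scan++) {
            Obj copy = toSpace.get(scan);
            for (int f = 0; f < copy.fields.length; f++) {
                copy.fields[f] = evacuate(copy.fields[f], toSpace);
            }
        }
        return toSpace; // dead objects were never visited at all
    }

    private static Obj evacuate(Obj o, List<Obj> toSpace) {
        if (o == null) {
            return null;
        }
        if (o.forward != null) {
            return o.forward; // already copied: reuse the forwarding pointer
        }
        Obj copy = new Obj(o.name, o.fields.length);
        // The copy still points at from-space objects until it is scanned.
        System.arraycopy(o.fields, 0, copy.fields, 0, o.fields.length);
        o.forward = copy;
        toSpace.add(copy);
        return copy;
    }
}
```

The work done is proportional to the number of surviving objects, which is the point made earlier in the thread: the more objects survive and need to be copied, the longer the young GC.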


> Regarding calling young GCs explicitly from an application: I can see how,
> in the case of single-threaded applications, the application might know "We
> are between transactions and, maybe, we have lots of garbage and not much
> live in the young gen. So let's do a young GC to clean up the young gen at
> maybe low overhead since we'll copy very little." However, how will this
> work in the case of multi-threaded applications, which are the vast majority
> of applications we see from our customers? A thread might be between
> transactions, but what about the other 50, 300, or even 2,000 threads? If a
> particular time is good to do a young GC


Not all multithreaded apps are heavily multithreaded; for each mammoth
website with 2000 concurrent transactions, you'll find a thousand corporate
apps with peaks of 5 concurrent transactions and very frequent fully idle
periods. Well, admittedly, for these apps cutting the cost of young-GC
copying is irrelevant; it's the former, larger apps that need it. I guess at
least my speculation about (big) TLAB collection is valid? And what about
the large number of non-EE apps, notably media-heavy / RIA / games,
which are typically "almost-single-threaded" = event dispatch thread plus
two or three application threads, usually tightly controlled (so it's
trivial and cheap to force all these threads to stop in a barrier when you
want to clean up). These apps are often very sensitive to latency: a
50ms stop-the-world pause at the wrong time is enough to cause
visible or audible stuttering.
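A rough sketch of "force all these threads to stop in a barrier when you want to clean up", using java.util.concurrent.CyclicBarrier, whose barrier action runs once, in the last thread to arrive, while all the others are still waiting. The frame/worker structure and the cleanup hook are hypothetical. There is no young-only GC call to put there; System.gc() would request a full collection and is ignored under -XX:+DisableExplicitGC.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;

final class FrameBarrier {
    /**
     * Runs `frames` frames on `threads` worker threads. After each frame every
     * worker parks at the barrier; the last one to arrive runs `cleanup` once
     * while the others are still waiting, then everyone continues.
     */
    static void run(int threads, int frames, Runnable work, Runnable cleanup) {
        CyclicBarrier barrier = new CyclicBarrier(threads, cleanup);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                try {
                    for (int f = 0; f < frames; f++) {
                        work.run();      // per-frame application work
                        barrier.await(); // all workers stopped here: a quiet point
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } catch (BrokenBarrierException e) {
                    // another worker was interrupted or cleanup failed; stop this worker
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    public static void main(String[] args) {
        // Hypothetical cleanup hook: prints once per frame (twice here).
        run(3, 2, () -> { }, () -> System.out.println("all workers parked: clean up here"));
    }
}
```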

[Java GC should not cater only to the Enterprise side. If you peek into some
dev communities, e.g. javagaming.org, people are always whining about
insufficient control over GC behavior. What we want is something like RTSJ's
scoped heaps: you "enter" some execution phase, allocate lots of young
objects, then you "leave" this phase and request a GC that will be basically
free. But of course, we need something that works in the JavaSE and JavaME
platforms, without the complexities and mutator costs of RTSJ.]


> for a particular thread, it does not mean that it's also good for the rest.
> Additionally, I would be willing to bet money that if we provided such an
> API, library writers will abuse it thinking that "hey, the end of this
> library call will be a great time to do a young GC!", without taking into
> consideration that many other threads could be doing something totally
> different at the same time (we've seen way too many libraries that call
> System.gc() already...).
>

This is true, but I guess the problem could be handled by the SecurityManager
and/or VM options, maybe allowing only certain packages to induce GC
in any way. There is precedent for that (-XX:+DisableExplicitGC, and the
default configuration of app servers to use that option). The problem exists,
but it's not something a "lightweight GC API" proposal would introduce: even
today I sometimes find app code that invokes Good Old System.gc(). Please
let's not use the "developers will shoot themselves in the foot" argument as
a reason not to provide a solution for a very painful problem. :)
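A rough sketch of the "only certain packages may induce GC" idea. It swaps in a caller check via java.lang.StackWalker (Java 9+) instead of a SecurityManager policy. GcGate and the package prefixes are hypothetical, and the gated call is plain System.gc(), a full-collection request rather than the proposed lightweight young GC.

```java
import java.util.Set;

final class GcGate {
    private static final StackWalker WALKER =
            StackWalker.getInstance(StackWalker.Option.RETAIN_CLASS_REFERENCE);

    private final Set<String> allowedPrefixes;

    GcGate(Set<String> allowedPrefixes) {
        this.allowedPrefixes = Set.copyOf(allowedPrefixes);
    }

    /** True if pkg equals one of the prefixes or is a sub-package of one. */
    static boolean isAllowed(String pkg, Set<String> prefixes) {
        for (String prefix : prefixes) {
            if (pkg.equals(prefix) || pkg.startsWith(prefix + ".")) {
                return true;
            }
        }
        return false;
    }

    /** Forwards to System.gc() only when the direct caller's package is allowed. */
    boolean requestGc() {
        Class<?> caller = WALKER.getCallerClass();
        if (!isAllowed(caller.getPackageName(), allowedPrefixes)) {
            return false; // e.g. a library that thinks its return is a good GC point
        }
        System.gc(); // full-GC request; HotSpot ignores it under -XX:+DisableExplicitGC
        return true;
    }
}
```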

A+
Osvaldo


>
> My two cents,
>
> Tony

From matt.khan at db.com  Tue Apr 20 13:30:32 2010
From: matt.khan at db.com (Matt Khan)
Date: Tue, 20 Apr 2010 21:30:32 +0100
Subject: What influences young generation pause times?
In-Reply-To: <4BCDF1EB.2010605@oracle.com>
Message-ID: <OFE7E9B80D.EF6A1B05-ON8025770B.006FA02B-8025770B.0070A8F6@db.com>

>> You might need a VM with this fix. 6782663: Data produced by PrintGCApplicationConcurrentTime and PrintGCApplicationStoppedTime is not accurate
OK so that's the early access for 6u21 by the looks of it. I'll repeat a 
run on that JVM (and with the safepoint stats).

>> If the log doesn't show a [GC...] line at a safepoint, then there is not GC activity during it
does this mean that the actual time stopped due to GC is the one 
immediately after the GC line? 

>> You could use -XX:+PrintSafepointStatistics etc. to figure these out.
which other flags should I look at?

Cheers
Matt

Matt Khan
--------------------------------------------------
GFFX Auto Trading
Deutsche Bank, London



From jon.masamitsu at oracle.com  Tue Apr 20 13:39:34 2010
From: jon.masamitsu at oracle.com (Jon Masamitsu)
Date: Tue, 20 Apr 2010 13:39:34 -0700
Subject: What influences young generation pause times?
In-Reply-To: <OFE7E9B80D.EF6A1B05-ON8025770B.006FA02B-8025770B.0070A8F6@db.com>
References: <OFE7E9B80D.EF6A1B05-ON8025770B.006FA02B-8025770B.0070A8F6@db.com>
Message-ID: <4BCE1106.1060400@oracle.com>

On 04/20/10 13:30, Matt Khan wrote:
>
> >> You might need a VM with this fix. 6782663: Data produced by 
> PrintGCApplicationConcurrentTime and PrintGCApplicationStoppedTime is 
> not accurate
> OK so that's the early access for 6u21 by the looks of it. I'll repeat 
> a run on that JVM (and with the safepoint stats).
>
> >> If the log doesn't show a [GC...] line at a safepoint, then there 
> is not GC activity during it
> does this mean that the actual time stopped due to GC is the one 
> immediately after the GC line? 

As you can see, more than GC can contribute to a STW pause, so the "time
stopped" might include other times that don't log any message. The GC
entries in the log print their own times.
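Following Jon's point, a stop that comes right after a [GC ...] entry covers the collection plus any other safepoint work, while stops with no GC entry are caused by something else. A small sketch that splits the PrintGCApplicationStoppedTime totals this way. StoppedTimeSplitter is a hypothetical helper; it assumes unwrapped log lines in the format quoted in this thread, and its totals are only as accurate as the flags themselves (see bug 6782663 above).

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class StoppedTimeSplitter {
    private static final Pattern STOPPED =
            Pattern.compile("Total time for which application threads were stopped: ([0-9.]+) seconds");

    /** Returns {seconds stopped right after a GC entry, seconds stopped with no GC entry}. */
    static double[] split(List<String> lines) {
        double afterGc = 0;
        double noGc = 0;
        boolean sawGc = false; // a [GC ...] entry appeared since the last stop line
        for (String line : lines) {
            if (line.contains("[GC")) {
                sawGc = true;
            }
            Matcher m = STOPPED.matcher(line);
            if (m.find()) {
                double secs = Double.parseDouble(m.group(1));
                if (sawGc) {
                    afterGc += secs;
                } else {
                    noGc += secs;
                }
                sawGc = false;
            }
        }
        return new double[] {afterGc, noGc};
    }
}
```

Applied to the excerpt Matt posted, only the 0.0330759 s stop follows a ParNew entry (whose own time is 0.0259072 s); the roughly 5-7 ms stops between them come from non-GC safepoints.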
>
>
> >> You could use -XX:+PrintSafepointStatistics etc. to figure these out.
> which other flags should I look at?
I don't know of any others.
>
> Cheers
> Matt
>
> Matt Khan
> --------------------------------------------------
> GFFX Auto Trading
> Deutsche Bank, London
>

From y.s.ramakrishna at oracle.com  Tue Apr 20 13:56:57 2010
From: y.s.ramakrishna at oracle.com (Y. Srinivas Ramakrishna)
Date: Tue, 20 Apr 2010 13:56:57 -0700
Subject: What influences young generation pause times?
In-Reply-To: <OFE7E9B80D.EF6A1B05-ON8025770B.006FA02B-8025770B.0070A8F6@db.com>
References: <OFE7E9B80D.EF6A1B05-ON8025770B.006FA02B-8025770B.0070A8F6@db.com>
Message-ID: <4BCE1519.808@oracle.com>

On 04/20/10 13:30, Matt Khan wrote:
...
>  >> You could use -XX:+PrintSafepointStatistics etc. to figure these out.
> which other flags should I look at?

   product(bool, PrintSafepointStatistics, false,                            \
           "print statistics about safepoint synchronization")               \
                                                                             \
   product(intx, PrintSafepointStatisticsCount, 300,                         \
           "total number of safepoint statistics collected "                 \
           "before printing them out")                                       \


The output might go to stdout rather than to the GC log file, in case you
have redirected one and not the other.

Once you know what those safepoints are for, you can use more specific
flags to see what's happening there (for example as Tony conjectured
perhaps biased locking; or global deoptimization due to class loading etc.)

-- ramki

From matt.khan at db.com  Tue Apr 20 14:14:35 2010
From: matt.khan at db.com (Matt Khan)
Date: Tue, 20 Apr 2010 22:14:35 +0100
Subject: What influences young generation pause times?
In-Reply-To: <4BCE1519.808@oracle.com>
Message-ID: <OFCD6F96CE.C4149986-ON8025770B.00742C7E-8025770B.0074B1AA@db.com>

perhaps this should be on the runtime list but since it's relevant to this 
thread I'll ask here....

I'm periodically getting output like

RevokeBias                         [     238          0              1]    [     0     0     0]     [     0        7]          0 

headed by one of these

Total time for wh [     0     0     0]     [     0       14]          0 

the lines are overwhelmingly RevokeBias or BulkRevokeBias 

Can you explain what these values are? I'm not especially familiar with 
the hotspot source code so am unsure where I'd find this info in the src 
alone. 

Cheers
Matt

Matt Khan
--------------------------------------------------
GFFX Auto Trading
Deutsche Bank, London



From y.s.ramakrishna at oracle.com  Tue Apr 20 15:37:08 2010
From: y.s.ramakrishna at oracle.com (Y. Srinivas Ramakrishna)
Date: Tue, 20 Apr 2010 15:37:08 -0700
Subject: What influences young generation pause times?
In-Reply-To: <OFCD6F96CE.C4149986-ON8025770B.00742C7E-8025770B.0074B1AA@db.com>
References: <OFCD6F96CE.C4149986-ON8025770B.00742C7E-8025770B.0074B1AA@db.com>
Message-ID: <4BCE2C94.8090006@oracle.com>


On 04/20/10 14:14, Matt Khan wrote:
> 
> perhaps this should be on the runtime list but since it's relevant to 
> this thread I'll ask here....
> 
> I'm periodically getting output like
> 
> RevokeBias                         [     238          0              1]          [     0     0     0]     [     0        7]          0  
> 
> headed by one of these
> 
> Total time for wh [     0     0     0]     [     0       14]          0  
> 
> the lines are overwhelmingly RevokeBias or BulkRevokeBias
> 
> Can you explain what these values are? I'm not especially familiar with 
> the hotspot source code so am unsure where I'd find this info in the src 
> alone.


If you are asking about the columns in each record, check the header
line which should have been printed earlier. If you use a more recent
JVM (hs18), you should see the column headings printed more frequently
than the sole initial appearance in logs produced by older JVMs.

For background on biased locking and associated flags, and what
is involved in bias revocation etc, read
http://portal.acm.org/citation.cfm?doid=1167473.1167496

Experiment with -XX:-UseBiasedLocking if it's not working well for
you, or if it is interfering with your predictability
objectives (read the article for when biased locking may not work well
and how it may affect predictability), and contact your
Java support for further tuning help, questions, to report a
performance bug, etc.

all the best.
-- ramki

> 
> Cheers
> Matt
> 
> Matt Khan
> --------------------------------------------------
> GFFX Auto Trading
> Deutsche Bank, London
> 
> 


From tony.printezis at oracle.com  Wed Apr 21 08:01:49 2010
From: tony.printezis at oracle.com (Tony Printezis)
Date: Wed, 21 Apr 2010 11:01:49 -0400
Subject: What influences young generation pause times?
In-Reply-To: <r2tfb5ec5091004201310o71780c67pd10edebf82805f20@mail.gmail.com>
References: <4BCDB470.8090102@oracle.com>	
	<OF9AD5912A.88CC330D-ON8025770B.0056D10E-8025770B.00584526@db.com>	
	<n2tfb5ec5091004201027h25f96f3fqafbfdffe8ecb0ab9@mail.gmail.com>	
	<4BCDEE6F.8030807@oracle.com>
	<r2tfb5ec5091004201310o71780c67pd10edebf82805f20@mail.gmail.com>
Message-ID: <4BCF135D.8030808@oracle.com>

Osvaldo,

Osvaldo Doederlein wrote:
>
>     Regarding calling young GCs explicitly from an application: I can
>     see how, in the case of single-threaded applications, the
>     application might know "We are between transactions and, maybe, we
>     have lots of garbage and not much live in the young gen. So let's
>     do a young GC to clean up the young gen at maybe low overhead
>     since we'll copy very little." However, how will this work in the
>     case of multi-threaded applications, which are the vast majority
>     of applications we see from our customers? A thread might be
>     between transactions, but what about the other 50, 300, or even
>     2,000 threads? If a particular time is good to do a young GC 
>
>
> Not all multithreaded apps are heavily multithreaded; for each mammoth 
> website with 2000 concurrent transactions, you'll find a thousand 
> corporate apps with peaks of 5 concurrent transactions and very 
> frequent full-idle periods. Well, admittedly for these apps, cutting 
> the cost of young-GCs copying is irrelevant;
:-)
> it's the former, larger apps that need it. I guess at least my 
> speculation about (big) TLAB collection is valid?
I'm not quite sure what you mean by big TLAB collection. Do you mean
collecting just the TLABs of a thread? You really cannot do that without
scanning the entire young gen to find all the references into them.
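To illustrate Tony's point: a TLAB is just a chunk of eden handed to one thread, and any other young object may hold a reference into it, so finding those references means examining everything. A toy sketch in which Obj, the addresses and the region bounds are all made up; roots and old-to-young references are ignored for brevity.

```java
import java.util.ArrayList;
import java.util.List;

final class TlabReferrers {
    /** Toy object: an address plus the addresses it references. */
    static final class Obj {
        final long address;
        final long[] refs;

        Obj(long address, long... refs) {
            this.address = address;
            this.refs = refs;
        }
    }

    /** Finds objects outside [start, end) that point into it; every object must be examined. */
    static List<Obj> referrersInto(List<Obj> youngGen, long start, long end) {
        List<Obj> result = new ArrayList<>();
        for (Obj o : youngGen) {
            if (o.address >= start && o.address < end) {
                continue; // object lives inside the region itself
            }
            for (long r : o.refs) {
                if (r >= start && r < end) {
                    result.add(o);
                    break;
                }
            }
        }
        return result;
    }
}
```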
> And what about the large number of non-EE apps - remarkably 
> media-heavy / RIA / games, which are typically 
> "almost-single-threaded" = event dispatch thread plus two or three 
> application threads, typically tightly controlled (so it's trivial and 
> cheap to force all these threads to stop in a barrier when you want to 
> clean up).
I haven't talked to many customers who would be willing to introduce 
their own safepoints in their Java code.
> These apps are often very sensitive to latency: a stop-the-world 50ms 
> pause at the wrong time is sufficient to result in visible or audible 
> stuttering.
>
> [Java GC should not care only for the Enterprise side. If you peek 
> into some dev communities - e.g. javagaming.org - people are always whining about insufficient 
> control over GC behavior. What we want is something like RTSJ's scoped 
> heaps - you "enter" some execution phase, allocate lots of Young 
> objects, then you "leave" this phase and request GC that will be 
> basically free - but of course, we need something that works in the 
> JavaSE and JavaME platforms, without the complexities and mutator 
> costs of RTSJ.]
Let's assume you want to use something like RTSJ's scopes (and let's say 
they are simplified and assume only one thread will enter them, which 
removes a lot of the complexity associated with RTSJ's scopes). Then the 
only way to be able to reclaim a scope when you leave it is to ensure 
that there are no references from outside the scope into it. You can 
ensure that this is the case by introducing write barriers and throwing 
an exception if such a reference is created (which is what the RTSJ 
does), but now a lot of the existing code won't work correctly with this 
restriction. So, you just cannot do what you're proposing without some 
extra costs.
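A rough model of the restriction described above, for a single-threaded scope: an emulated write barrier that refuses to let an object outside the scope point into it. RTSJ has its own error type for this; here `Scope`, `Node`, `Node.store` and `IllegalScopeReferenceException` are hypothetical names, and the check is an explicit method call rather than a barrier emitted by the VM. It shows both costs mentioned: every reference store pays for the check, and ordinary code that caches a scoped object somewhere longer-lived now fails.

```java
import java.util.ArrayList;
import java.util.List;

// Thrown when a store would make an object outside the scope reference one inside it.
final class IllegalScopeReferenceException extends RuntimeException {
    IllegalScopeReferenceException(String msg) {
        super(msg);
    }
}

// Hypothetical scoped memory area: objects allocated through it belong to it.
final class Scope {
    private final List<Node> allocated = new ArrayList<>();

    Node allocate(String name) {
        Node n = new Node(name, this);
        allocated.add(n);
        return n;
    }

    // Reclaiming needs no tracing, because the store check guarantees that
    // no outside object references anything in here.
    void exit() {
        allocated.clear();
    }
}

final class Node {
    final String name;
    final Scope scope;   // null means "ordinary heap", i.e. outside every scope
    Node next;

    Node(String name, Scope scope) {
        this.name = name;
        this.scope = scope;
    }

    // Emulated write barrier: must run on every reference store.
    static void store(Node holder, Node value) {
        if (value != null && value.scope != null && holder.scope != value.scope) {
            throw new IllegalScopeReferenceException(
                    holder.name + " (outside) -> " + value.name + " (scoped)");
        }
        holder.next = value;
    }
}
```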

We have been working on, and thinking about GC, for a long time. Trust 
me, if there was a way to do cheaply what you're proposing, we would 
have done it a long time ago. With the semantics of Java it's not 
straightforward. And, after many years of doing this stuff, I can also 
assure you that nothing in GC is "free". ;-)
>  
>
>     for a particular thread, it does not mean that it's also good for
>     the rest. Additionally, I would be willing to bet money that if we
>     provided such an API, library writers will abuse it thinking that
>     "hey, the end of this library call will be a great time to do a
>     young GC!", without taking into consideration that many other
>     threads could be doing something totally different at the same
>     time (we've seen way too many libraries that call System.gc()
>     already...).
>
>
> This is true, but I guess the problem could be handled by the Security 
> manager and/or VM options, maybe allowing only certain packages to 
> induce GC in any way. There is precedent for that 
> (-XX:+DisableExplicitGC, and default configuration of app servers to 
> use that option). The problem exists but it's not new in any 
> "lightweight GC API" proposal - even to current date I sometimes find 
> some app code that invokes Good Old System.gc(). Please let's not use 
> the "developers will shoot themselves in the foot" argument as a reason 
> not to provide a solution for a very painful problem. :)
On the contrary, our mission is to prevent developers from shooting 
themselves in the foot. :-)
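The package-based restriction proposed above could be reduced to a policy check like the one below. It is purely hypothetical: no such hook exists in the JDK (the real control discussed here is `-XX:+DisableExplicitGC`, which makes the VM ignore `System.gc()` calls), and `GcRequestPolicy` and the package names are made up.

```java
import java.util.List;

// Hypothetical policy: only classes in allow-listed packages may request a GC.
final class GcRequestPolicy {
    private final List<String> allowedPackagePrefixes;

    GcRequestPolicy(List<String> allowedPackagePrefixes) {
        this.allowedPackagePrefixes = List.copyOf(allowedPackagePrefixes);
    }

    boolean mayRequestGc(String callerClassName) {
        for (String prefix : allowedPackagePrefixes) {
            // Match whole package segments, so "com.acme" does not admit "com.acmex".
            if (callerClassName.startsWith(prefix + ".")) {
                return true;
            }
        }
        return false;
    }
}
```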

Tony


From opinali at gmail.com  Wed Apr 21 11:55:30 2010
From: opinali at gmail.com (Osvaldo Doederlein)
Date: Wed, 21 Apr 2010 15:55:30 -0300
Subject: What influences young generation pause times?
In-Reply-To: <4BCF135D.8030808@oracle.com>
References: <4BCDB470.8090102@oracle.com>
	<OF9AD5912A.88CC330D-ON8025770B.0056D10E-8025770B.00584526@db.com>
	<n2tfb5ec5091004201027h25f96f3fqafbfdffe8ecb0ab9@mail.gmail.com>
	<4BCDEE6F.8030807@oracle.com>
	<r2tfb5ec5091004201310o71780c67pd10edebf82805f20@mail.gmail.com>
	<4BCF135D.8030808@oracle.com>
Message-ID: <h2zfb5ec5091004211155pdce28774m8c483610a18e6624@mail.gmail.com>

Tony,

2010/4/21 Tony Printezis <tony.printezis at oracle.com>

> I'm not quite sure what you mean by big TLAB collection. Do you mean
> collecting just the TLABs of a thread? You really cannot do that without
> scanning the entire young gen to find all the references into them.


I didn't know that, I was supposing that the TLAB would have a remset for
incoming references from the rest of the young gen.


> Let's assume you want to use something like RTSJ's scopes (and let's say
> they are simplified and assume only one thread will enter them, which
> removes a lot of the complexity associated with RTSJ's scopes). Then the
> only way to be able to reclaim a scope when you leave it is to ensure that
> there are no references from outside the scope into it. You can ensure that
> this is the case by introducing write barriers and throwing an exception if
> such a reference is created (which is what the RTSJ does), but now a lot of
> the existing code won't work correctly with this restriction. So, you just
> cannot do what you're proposing without some extra costs.
>

To make my suggestion more clear, the intention is just enabling something
similar to RTSJ. While "inside" some scope, the program can introduce
references from old to new objects, as long as these refs are all cleared
when "exiting" the scope. By "request GC that will be basically free" I
didn't mean something identical to RTSJ (free() a whole heap block), but
only trigger the young-GC -- at that time, all those young objects are
unreachable. Except perhaps a few objects that existed before entering the
scope - not a problem, they are all compacted in the beginning of the YG, so
young-GC will be only slightly slower than for zero live objects, no big
deal. (I'm also not wishing for real-time GC guarantees like in RTSJ.)
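The relaxed rule described above (outside objects may point into the scope while it is open, as long as those references are cleared before exit) implies bookkeeping along these lines. A minimal sketch with hypothetical names (`LenientScope`, `Cell`, `recordStore`): each outside-to-inside store is remembered, and `exit()` succeeds only if none of the remembered holders still points into the scope. The remembered holders are effectively a remembered set, and every store has to go through the barrier for the check to be sound.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// A heap cell with one reference field; `scope` is null for ordinary heap objects.
final class Cell {
    final LenientScope scope;
    Cell ref;

    Cell(LenientScope scope) {
        this.scope = scope;
    }
}

// Hypothetical scope that tolerates outside -> inside references while open,
// but requires them all to be cleared by the time it is exited.
final class LenientScope {
    // Outside cells that have, at some point, stored a reference into this scope.
    private final Set<Cell> rememberedHolders =
            Collections.newSetFromMap(new IdentityHashMap<>());
    private final List<Cell> allocated = new ArrayList<>();

    Cell allocate() {
        Cell c = new Cell(this);
        allocated.add(c);
        return c;
    }

    // Emulated write barrier: remember the holder instead of throwing.
    static void recordStore(Cell holder, Cell value) {
        holder.ref = value;
        if (value != null && value.scope != null && holder.scope != value.scope) {
            value.scope.rememberedHolders.add(holder);
        }
    }

    // Returns true if the scope could be reclaimed wholesale; false (scope stays
    // open) if some outside holder still points into it.
    boolean exit() {
        for (Cell holder : rememberedHolders) {
            if (holder.ref != null && holder.ref.scope == this) {
                return false;
            }
        }
        rememberedHolders.clear();
        allocated.clear();
        return true;
    }
}
```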


> We have been working on, and thinking about GC, for a long time. Trust me,
> if there was a way to do cheaply what you're proposing, we would have done
> it a long time ago. With the semantics of Java it's not straightforward.
> And, after many years of doing this stuff, I can also assure you that
> nothing in GC is "free". ;-)


I certainly believe this - I'm not suggesting anything that would be really
free. In all suggestions I would expect some hit in overall throughput
(thread-local young gens would have this cost because we potentially
increase the working set, so reduced cache efficiency alone should more than
offset the cycles saved in GC). In other cases - fine-grained explicit GC
API + strict threading/allocation behavior required for near-instant young
GC - the cost is programming complexity. These costs would be fine for a
great many apps.

Let's put it differently: why not add a simple API like I suggested
(trigger only young-GC, perhaps with a parameter to only do that if the
young gen has less than some % of free space)? I'd expect this to be trivial
to implement. Maybe we could have this in some sun.* package, and only in
non-release builds for extra protection. Let people kick the tires and give
you feedback - whether this produces significant benefits for some apps.
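The decision part of the proposed API is easy to sketch. The JDK has no young-only GC call, so `YoungGcRequest.requestIfNeeded` (a hypothetical name) only makes the decision, using the "less than some % of free space" rule, and delegates the collection to a caller-supplied hook. Reading pool usage through `java.lang.management` is real API; locating eden by the substring "Eden" is an assumption, since pool names vary by collector (for example "PS Eden Space" or "Par Eden Space").

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.Optional;

// Hypothetical "young GC only if nearly full" helper. The JDK has no
// young-only GC call, so the collection itself is delegated to `collector`.
final class YoungGcRequest {
    private YoungGcRequest() {}

    // Pure decision rule: collect when free space < minFreeFraction of capacity.
    static boolean shouldCollect(long used, long capacity, double minFreeFraction) {
        if (capacity <= 0) {
            return false; // unknown or undefined capacity
        }
        double free = (double) (capacity - used) / capacity;
        return free < minFreeFraction;
    }

    // Finds the eden pool by name; the "Eden" substring match is an assumption.
    static Optional<MemoryUsage> edenUsage() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getName().contains("Eden")) {
                return Optional.ofNullable(pool.getUsage()); // null if the pool is invalid
            }
        }
        return Optional.empty();
    }

    // Runs `collector` only if eden's free space is below the threshold.
    static boolean requestIfNeeded(double minFreeFraction, Runnable collector) {
        Optional<MemoryUsage> eden = edenUsage();
        if (!eden.isPresent()) {
            return false;
        }
        MemoryUsage u = eden.get();
        if (shouldCollect(u.getUsed(), u.getCommitted(), minFreeFraction)) {
            collector.run();
            return true;
        }
        return false;
    }
}
```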

A+
Osvaldo


-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20100421/9432d250/attachment.html 

From dreamingpacer at gmail.com  Wed Apr 21 20:08:18 2010
From: dreamingpacer at gmail.com (dreamingpacer)
Date: Fri, 22 Apr 2010 11:08:18 +0800
Subject: In JDK 6, do all collectors use ergonomics?
Message-ID: <4bcfbd9f.13838d0a.4043.ffffbcda@mx.google.com>

Hi,
  If not, which ones do? Thanks.

Cheers,
mengyoyou
2010-04-22



From tony.printezis at oracle.com  Fri Apr 23 12:10:44 2010
From: tony.printezis at oracle.com (Tony Printezis)
Date: Fri, 23 Apr 2010 15:10:44 -0400
Subject: What influences young generation pause times?
In-Reply-To: <h2zfb5ec5091004211155pdce28774m8c483610a18e6624@mail.gmail.com>
References: <4BCDB470.8090102@oracle.com>
	<OF9AD5912A.88CC330D-ON8025770B.0056D10E-8025770B.00584526@db.com>
	<n2tfb5ec5091004201027h25f96f3fqafbfdffe8ecb0ab9@mail.gmail.com>
	<4BCDEE6F.8030807@oracle.com>
	<r2tfb5ec5091004201310o71780c67pd10edebf82805f20@mail.gmail.com>
	<4BCF135D.8030808@oracle.com>
	<h2zfb5ec5091004211155pdce28774m8c483610a18e6624@mail.gmail.com>
Message-ID: <4BD1F0B4.8060302@oracle.com>

Osvaldo,

Osvaldo Doederlein wrote:
> Tony,
>
> 2010/4/21 Tony Printezis <tony.printezis at oracle.com>
>
>     I'm not quite sure what you mean by big TLAB collection. Do you mean
>     collecting just the TLABs of a thread? You really cannot do that
>     without scanning the entire young gen to find all the references
>     into them.
>
>
> I didn't know that, I was supposing that the TLAB would have a remset 
> for incoming references from the rest of the young gen.
>
Oh, no. There are too many references within the young gen; keeping 
track of all of them would be too expensive.
>
>     Let's assume you want to use something like RTSJ's scopes (and
>     let's say they are simplified and assume only one thread will
>     enter them, which removes a lot of the complexity associated with
>     RTSJ's scopes). Then the only way to be able to reclaim a scope
>     when you leave it is to ensure that there are no references from
>     outside the scope into it. You can ensure that this is the case by
>     introducing write barriers and throwing an exception if such a
>     reference is created (which is what the RTSJ does), but now a lot
>     of the existing code won't work correctly with this restriction.
>     So, you just cannot do what you're proposing without some extra costs.
>
>
> To make my suggestion more clear, the intention is just enabling 
> something similar to RTSJ. While "inside" some scope, the program can 
> introduce references from old to new objects, as long as these refs 
> are all cleared when "exiting" the scope. By "request GC that will be 
> basically free" I didn't mean something identical to RTSJ (free() a 
> whole heap block), but only trigger the young-GC --
Here you are assuming that what you're referring to as a "young GC" will 
only touch the objects that just got allocated by that thread (and 
there'd be extra costs associated with ensuring that). Or did you really 
mean an actual young GC? Again, in the single-threaded case, this might 
work. It won't in the multi-threaded case: if each thread that completes 
a transaction triggers a young GC, then the system will be overwhelmed 
by young GCs.
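The arithmetic behind "overwhelmed": if every completed transaction requests a young GC and requests are not coalesced, the request rate is the number of threads times the per-thread transaction rate, and because each young GC is a stop-the-world pause, the fraction of time spent paused is that rate times the pause length. The sketch below (a hypothetical `GcRequestLoad` helper) just encodes those two products.

```java
// Back-of-the-envelope model: one young GC per completed transaction,
// no coalescing of concurrent requests, fixed stop-the-world pause length.
final class GcRequestLoad {
    private GcRequestLoad() {}

    static double youngGcsPerSecond(int threads, double txPerSecondPerThread) {
        return threads * txPerSecondPerThread;
    }

    // Fraction of wall-clock time spent paused; values >= 1.0 are unsustainable.
    static double pausedFraction(int threads, double txPerSecondPerThread, double pauseSeconds) {
        return youngGcsPerSecond(threads, txPerSecondPerThread) * pauseSeconds;
    }
}
```

With illustrative numbers (not measurements), 200 threads each completing 10 transactions per second would request 2,000 young GCs per second; even at 1 ms per pause the paused fraction is 2.0, i.e. more pause time than wall-clock time. A single thread at the same rate would be paused only 1% of the time, which is why the idea can work in the single-threaded case.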
> at that time, all those young objects are unreachable. Except perhaps 
> a few objects that existed before entering the scope - not a problem, 
> they are all compacted in the beginning of the YG, so young-GC will be 
> only slightly slower than for zero live objects, no big deal.
As I said, there's nothing "free" in GC. If you copy all the objects (as 
we do now), you have to copy the ones that existed when you entered the 
scope anyway. If you introduce a way to only copy the survivors of the 
objects that were created from inside the scope, you'd still have to 
scan the ones you didn't copy to find and update references to the 
objects you're moving. If you don't want to scan them, you'd have to 
maintain remembered sets. You might think that implementing something 
like what you're proposing is simple but, really, it's not.
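A toy model of the remembered-set option mentioned above (hypothetical names throughout; a real collector works on raw memory and card tables, not Java arrays): a write barrier records each outside slot that receives a reference into the region, and collection evacuates only the region objects reachable from those slots, rewriting the slots to point at the copies. Stack roots are left out for brevity. Even in this simplified form, the barrier runs on every store and the remembered set costs memory, which is the price of not scanning everything else.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Toy object with a fixed number of reference fields.
final class HeapObj {
    final boolean inRegion;   // allocated in the region being collected?
    final HeapObj[] fields;

    HeapObj(boolean inRegion, int fieldCount) {
        this.inRegion = inRegion;
        this.fields = new HeapObj[fieldCount];
    }
}

final class RegionCollector {
    // Remembered set: slots in outside objects that received a reference into the region.
    private final List<HeapObj> remsetHolders = new ArrayList<>();
    private final List<Integer> remsetIndexes = new ArrayList<>();

    // Emulated write barrier: every reference store must go through here.
    void store(HeapObj holder, int index, HeapObj value) {
        holder.fields[index] = value;
        if (!holder.inRegion && value != null && value.inRegion) {
            remsetHolders.add(holder);
            remsetIndexes.add(index);
        }
    }

    // Evacuates region objects reachable from remembered slots and rewrites those
    // slots to the copies. Returns the number of objects copied; the rest is garbage.
    int collect() {
        Map<HeapObj, HeapObj> forwarding = new IdentityHashMap<>();
        for (int i = 0; i < remsetHolders.size(); i++) {
            HeapObj holder = remsetHolders.get(i);
            int index = remsetIndexes.get(i);
            HeapObj target = holder.fields[index];
            if (target != null && target.inRegion) { // slot may have been overwritten since
                holder.fields[index] = evacuate(target, forwarding);
            }
        }
        remsetHolders.clear();
        remsetIndexes.clear();
        return forwarding.size();
    }

    // Copies obj (and the region objects it reaches) exactly once, via forwarding entries.
    private static HeapObj evacuate(HeapObj obj, Map<HeapObj, HeapObj> forwarding) {
        HeapObj copy = forwarding.get(obj);
        if (copy != null) {
            return copy;
        }
        copy = new HeapObj(false, obj.fields.length); // survivors leave the region
        forwarding.put(obj, copy);
        for (int i = 0; i < obj.fields.length; i++) {
            HeapObj f = obj.fields[i];
            copy.fields[i] = (f != null && f.inRegion) ? evacuate(f, forwarding) : f;
        }
        return copy;
    }
}
```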
> (I'm also not wishing for real-time GC guarantees like in RTSJ.)
>
Good. :-)
>
>     We have been working on, and thinking about GC, for a long time.
>     Trust me, if there was a way to do cheaply what you're proposing,
>     we would have done it a long time ago. With the semantics of Java
>     it's not straightforward. And, after many years of doing this
>     stuff, I can also assure you that nothing in GC is "free". ;-)
>
>
> I certainly believe this - I'm not suggesting anything that would be 
> really free. In all suggestions I would expect some hit in overall 
> throughput (thread-local young gens would have this cost because we 
> potentially increase the working set, so reduced cache efficiency 
> alone should more than offset the cycles saved in GC).
And also because the concept of a thread-local object is not built into the 
language. If we were doing Erlang, it'd be a different story...
> In other cases - fine-grained explicit GC API + strict 
> threading/allocation behavior required for near-instant young GC - the 
> cost is programming complexity. These costs would be fine for a great 
> many apps.
>
> Let's put it differently: why not add a simple API like I suggested 
> (trigger only young-GC, perhaps with a parameter to only do that if 
> the young gen has less than some % of free space)? I'd expect this to 
> be trivial to implement.
It will definitely not be complicated to implement.
> Maybe we could have this in some sun.* package, and only in 
> non-release builds for extra protection. Let people kick the tires and 
> give you feedback - whether this produces significant benefits for 
> some apps.
There are no plans to do this at the moment. In all honesty, we all have 
our hands full right now and such an API would be low in our (very long) 
priority list.

Tony

From opinali at gmail.com  Fri Apr 23 13:38:06 2010
From: opinali at gmail.com (Osvaldo Doederlein)
Date: Fri, 23 Apr 2010 17:38:06 -0300
Subject: What influences young generation pause times?
In-Reply-To: <4BD1F0B4.8060302@oracle.com>
References: <4BCDB470.8090102@oracle.com>
	<OF9AD5912A.88CC330D-ON8025770B.0056D10E-8025770B.00584526@db.com>
	<n2tfb5ec5091004201027h25f96f3fqafbfdffe8ecb0ab9@mail.gmail.com>
	<4BCDEE6F.8030807@oracle.com>
	<r2tfb5ec5091004201310o71780c67pd10edebf82805f20@mail.gmail.com>
	<4BCF135D.8030808@oracle.com>
	<h2zfb5ec5091004211155pdce28774m8c483610a18e6624@mail.gmail.com>
	<4BD1F0B4.8060302@oracle.com>
Message-ID: <z2wfb5ec5091004231338m7c884834l29fab5cd71296aa6@mail.gmail.com>

Hi Tony,

2010/4/23 Tony Printezis <tony.printezis at oracle.com>

>> To make my suggestion more clear, the intention is just enabling something
>> similar to RTSJ. While "inside" some scope, the program can introduce
>> references from old to new objects, as long as these refs are all cleared
>> when "exiting" the scope. By "request GC that will be basically free" I
>> didn't mean something identical to RTSJ (free() a whole heap block), but
>> only trigger the young-GC --
> Here you are assuming that what you're referring to as a "young GC" will
> only touch the objects that just got allocated by that thread (and there'd
> be extra costs associated with ensuring that). Or did you really mean an
> actual young GC? Again, in the single-threaded case, this might work. It
> won't in the multi-threaded case: if each thread that completes a
> transaction triggers a young GC, then the system will be overwhelmed by
> young GCs.
>

I'm not ignoring the cost of marking - the discussion started because
copying alone can be a significant fraction of the total cost of
generational GC. If the young-gen is basically empty of live objects, I
think we would get a significant speedup compared to an average young-GC (that
doesn't necessarily happen in the "perfect" time, with lots of live young
objects). But I see now that the idea doesn't work at all for nontrivial
multithreaded apps.


>> at that time, all those young objects are unreachable. Except perhaps a
>> few objects that existed before entering the scope - not a problem, they are
>> all compacted in the beginning of the YG, so young-GC will be only slightly
>> slower than for zero live objects, no big deal.
>>
> As I said, there's nothing "free" in GC. If you copy all the objects (as we
> do now), you have to copy the ones that existed when you entered the scope
> anyway. If you introduce a way to only copy the survivors of the objects
> that were created from inside the scope, you'd still have to scan the ones
> you didn't copy to find and update references to the objects you're moving.
> If you don't want to scan them, you'd have to maintain remembered sets. You
> might think that implementing something like what you're proposing is simple
> but, really, it's not.
>

My assumption here is that the old generation is very stable - the program
loops in a cycle of phases that create lots of new objects, then in the end
of each cycle virtually all these objects die, they are collected, goto next
cycle. Any objects that you may have in the YG initially (before the program
stabilizes in such loop) would be promoted to the old gen after a few
collections. The app developer would measure the worst-case allocation
volume for each cycle, and size the young-gen so that volume fits.
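Under that assumption the young gen can be sized with simple arithmetic. With HotSpot's `-XX:SurvivorRatio=R`, eden is R times the size of each of the two survivor spaces, so eden = young * R / (R + 2); to fit a worst-case per-cycle allocation A in eden, `-Xmn` must be at least A * (R + 2) / R. A small helper (the name `YoungGenSizing` is made up) that does the calculation, ignoring any alignment HotSpot may apply to the final sizes:

```java
// Smallest young gen (in MB) whose eden holds `worstCaseAllocMb`, given
// -XX:SurvivorRatio=R (eden : one survivor space = R : 1, two survivor spaces).
final class YoungGenSizing {
    private YoungGenSizing() {}

    static long minYoungGenMb(long worstCaseAllocMb, int survivorRatio) {
        if (survivorRatio <= 0) {
            throw new IllegalArgumentException("SurvivorRatio must be positive");
        }
        // young = eden * (R + 2) / R, rounded up to a whole megabyte
        long numerator = worstCaseAllocMb * (survivorRatio + 2L);
        return (numerator + survivorRatio - 1) / survivorRatio;
    }

    // Eden size (in MB, rounded down) for a given young gen size.
    static long edenMb(long youngGenMb, int survivorRatio) {
        return youngGenMb * survivorRatio / (survivorRatio + 2L);
    }
}
```

For example, a worst case of 512 MB per cycle with `SurvivorRatio=8` gives `minYoungGenMb(512, 8) == 640`, i.e. roughly `-Xmn640m`.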


>> Let's put it differently: why not add a simple API like I suggested
>> (trigger only young-GC, perhaps with a parameter to only do that if the
>> young gen has less than some % of free space)? I'd expect this to be trivial
>> to implement.
>>
>
> It will definitely not be complicated to implement.
>
>> Maybe we could have this in some sun.* package, and only in non-release
>> builds for extra protection. Let people kick the tires and give you feedback
>> - whether this produces significant benefits for some apps.
>>
> There are no plans to do this at the moment. In all honesty, we all have
> our hands full right now and such an API would be low in our (very long)
> priority list.
>

This looks like an interesting OpenJDK experiment. Maybe I should find some
time to write a patch, then test a couple of apps that I think would
benefit from the idea.

A+
Osvaldo