From guanxiaohua at gmail.com Wed Sep 2 11:47:22 2009 From: guanxiaohua at gmail.com (Tony Guan) Date: Wed, 2 Sep 2009 13:47:22 -0500 Subject: capturing method entry/exit efficiently Message-ID: <2fcb552b0909021147r4a9425dds961bf020cb8ee4a4@mail.gmail.com> Dear all, One update for yesterday's mail is that I am now able to add some call_vm() code in TemplateTable::invokespecial(), to get my own code executed. This is much more economical compared with JVMTI. But I just cannot find a suitable place for monitoring the method_exit. Any idea about that? And the compiled method is still a nightmare for me. Thanks! Tony > Dear all, > > My current research project with hotspot requires me to do something > particular whenever a method (interpreted or compiled) is invoked. I > need to know the thread and the method at the invocation time. What I > am trying to do is some VM hacking based on the methods called. > Question 1: Can I use BCI to achieve this? > > I am now able to capture the method_entry/exit events by writing a > JVMTI agent, but it's not what I really need to do. By using JVMTI, > performance deteriorates a lot. And I am not sure if the compiled > method can still be captured. (Though I know java1.5 has some JVMPI > support in the compilation part, but not java1.7. Am I right?) > Question 2: I am trying to find a way to enable the > notify_method_entry/exit by partly simulating a JVMTI agent, that > means that I modify several parts of hotspot without actually using > an external JVMTI agent. Is it feasible (in terms of performance)? > > Question 3: Is there some better way to capture the method_entry/exit event? > > Thanks for diluting the question marks in my mind! 
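[Editor's note: Question 1 asks about BCI. The usual way that route is realized outside the VM is a java.lang.instrument agent, which also survives JIT compilation because the rewritten bytecode is what the compiler sees. A minimal sketch follows; the class name, the "com/example/" package filter, and the jar name are all invented for illustration, and the actual rewriting step (e.g. with a bytecode library such as ASM) is left as a comment.]

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;

// Illustrative skeleton of a bytecode-instrumentation agent that sees
// every class as it is loaded and could insert method entry/exit hooks.
public class EntryExitAgent {

    static class EntryExitTransformer implements ClassFileTransformer {
        public byte[] transform(ClassLoader loader, String className,
                                Class<?> classBeingRedefined,
                                ProtectionDomain pd, byte[] classBytes) {
            if (className != null && className.startsWith("com/example/")) {
                // A real agent would rewrite classBytes here (e.g. with a
                // bytecode library) to call a static hook at each method
                // entry and before each return/throw.
            }
            return null; // null means "leave the class file unchanged"
        }
    }

    // Wired up via: java -javaagent:entryexit.jar -jar yourapp.jar
    // (with "Premain-Class: EntryExitAgent" in the agent jar's manifest)
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new EntryExitTransformer());
    }
}
```

Unlike global JVMTI MethodEntry/MethodExit events, which typically force the whole VM into a slow path, this approach pays a cost only in the methods actually instrumented.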
> > Tony (Xiaohua Guan) > > > ------------------------------ > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use > > > End of hotspot-gc-use Digest, Vol 21, Issue 1 > ********************************************* > -- Xiaohua Guan (Tony) Department of Computer Science and Engineering University of Nebraska-Lincoln 103A Avery Hall Lincoln, NE 68588-0115 Tel: 402-472-3884 Email: xguan at cse.unl.edu URL: http://www.cse.unl.edu/~xguan (This email is encoded as Unicode(UTF-8)) From Peter.Kessler at Sun.COM Thu Sep 3 10:47:23 2009 From: Peter.Kessler at Sun.COM (Peter B. Kessler) Date: Thu, 03 Sep 2009 10:47:23 -0700 Subject: capturing method entry/exit efficiently In-Reply-To: <2fcb552b0909021147r4a9425dds961bf020cb8ee4a4@mail.gmail.com> References: <2fcb552b0909021147r4a9425dds961bf020cb8ee4a4@mail.gmail.com> Message-ID: <4AA0012B.2090509@Sun.COM> Since hotspot-gc-use at openjdk.java.net is the HotSpot GC users mailing list, you might not be reaching the people who can help you with your project. I would suggest asking on hotspot-runtime-dev at openjdk.java.net for help with the interpreter (e.g., TemplateTable), or hotspot-compiler-dev at openjdk.java.net for help with the runtime compiler, or serviceability-dev at openjdk.java.net for help with the monitoring frameworks (e.g., JVMTI). But the garbage collectors don't have anything to do with method entry or exit. You might also want to look at bytecode rewriting, e.g., via your own classloader, to add instrumentation to method entries and exits for the methods of the classes you are interested in. I think there are tools available that make bytecode rewriting not as difficult as it sounds. ... peter Tony Guan wrote: > Dear all, > > One update for yesterday's mail is that I am now able to add some > call_vm() code in TemplateTable::invokespecial(), to get my own code > executed. 
This is much more economical compared with JVMTI. But I just > cannot find a suitable place for monitoring the method_exit. > > Any idea about that? And the compiled method is still a nightmare for me. > > Thanks! > > Tony > >> Dear all, >> >> My current research project with hotspot requires me to do something >> particular whenever a method (interpreted or compiled) is invoked. I >> need to know the thread and the method at the invocation time. What I >> am trying to do is some VM hacking based on the methods called. >> Question 1: Can I use BCI to achieve this? >> >> I am now able to capture the method_entry/exit events by writing a >> JVMTI agent, but it's not what I really need to do. By using JVMTI, >> performance deteriorates a lot. And I am not sure if the compiled >> method can still be captured. (Though I know java1.5 has some JVMPI >> support in the compilation part, but not java1.7. Am I right?) >> Question 2: I am trying to find a way to enable the >> notify_method_entry/exit by partly simulating a JVMTI agent, that >> means that I modify several parts of hotspot without actually using >> an external JVMTI agent. Is it feasible (in terms of performance)? >> >> Question 3: Is there some better way to capture the method_entry/exit event? >> >> Thanks for diluting the question marks in my mind! 
>> >> Tony (Xiaohua Guan) >> >> >> ------------------------------ >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> >> >> End of hotspot-gc-use Digest, Vol 21, Issue 1 >> ********************************************* >> > > > From jeff.lloyd at algorithmics.com Thu Sep 10 13:06:54 2009 From: jeff.lloyd at algorithmics.com (jeff.lloyd at algorithmics.com) Date: Thu, 10 Sep 2009 16:06:54 -0400 Subject: Young generation configuration Message-ID: <0FCC438D62A5E643AA3F57D3417B220D0A7AE2ED@TORMAIL.algorithmics.com> Hi, I'm new to this list and I have a few questions about tuning my young generation gc. I have chosen to use the CMS garbage collector because my application is a relatively large reporting server that has a web front end and therefore needs to have minimal pauses. I am using java 1.6.0_16 64-bit on redhat 5.2 intel 8x3GHz and 64GB ram. The machine is dedicated to this JVM. My steady-state was calculated as follows: - A typical number of users logged in and viewed several reports - Stopped user actions and performed a manual full GC - Look at the amount of heap used and take that number as the steady-state memory requirement In this case my heap usage was ~10GB. In order to handle variance or spikes I sized my old generation at 15-20GB. I sized my young generation at 32-42GB and used survivor ratios of 1, 2, 3 and 6. My goal is to maximize throughput and minimize pauses. I'm willing to sacrifice ram to increase speed. I have attached several of my many gc logs. The file gc_48G.txt is just using CMS without any other tuning, and the results are much worse than what I have been able to accomplish with other settings. The best results are in the files gc_52G_20Gold_32Gyoung_2sr.txt and gc_57G_15Gold_42Gyoung_1sr.txt. The problem is that some of the pauses are just too long. 
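[Editor's note: Jeff's steady-state measurement (manual full GC, then read the used heap) can also be approximated from inside the JVM. A minimal sketch, keeping in mind that Runtime.gc()/System.gc() is only a request to the VM (and is ignored under -XX:+DisableExplicitGC), so the GC log remains the authoritative source:]

```java
// Rough in-process version of the steady-state measurement: request a
// full collection, then read the used heap. Treat the result as an
// estimate and cross-check it against the GC log.
public class SteadyState {
    public static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        rt.gc(); // only a hint; the VM may not run a full collection
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        System.out.printf("steady-state estimate: %.1f MB%n",
                          usedHeapBytes() / (1024.0 * 1024.0));
    }
}
```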
Is there a way to reduce the pause times any further than I have now? Am I heading in the right direction? I ask because the default settings are so different from what I have been heading towards. The best reference I have found on what good gc logs look like comes from brief examples presented at JavaOne this year by Tony Printezis and Charlie Hunt. But I don't seem to be able to get logs that resemble their tenuring patterns. I think I have a lot of medium-lived objects instead of nice short-lived ones. Are there any good practices for apps with objects like this? Thanks, Jeff -------------------------------------------------------------------------- This email and any files transmitted with it are confidential and proprietary to Algorithmics Incorporated and its affiliates ("Algorithmics"). If received in error, use is prohibited. Please destroy, and notify sender. Sender does not waive confidentiality or privilege. Internet communications cannot be guaranteed to be timely, secure, error or virus-free. Algorithmics does not accept liability for any errors or omissions. Any commitment intended to bind Algorithmics must be reduced to writing and signed by an authorized signatory. -------------------------------------------------------------------------- -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20090910/55defd8d/attachment-0001.html -------------- next part -------------- A non-text attachment was scrubbed... 
Name: gc.zip Type: application/x-zip-compressed Size: 66850 bytes Desc: gc.zip Url : http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20090910/55defd8d/attachment-0001.bin From tony.printezis at sun.com Fri Sep 11 08:22:17 2009 From: tony.printezis at sun.com (Tony Printezis) Date: Fri, 11 Sep 2009 11:22:17 -0400 Subject: Young generation configuration In-Reply-To: <0FCC438D62A5E643AA3F57D3417B220D0A7AE2ED@TORMAIL.algorithmics.com> References: <0FCC438D62A5E643AA3F57D3417B220D0A7AE2ED@TORMAIL.algorithmics.com> Message-ID: <4AAA6B29.3030008@sun.com> Jeff, Hi. I had a very brief look at your logs. Yes, your app does seem to need to copy quite a lot (I don't think I've ever seen 1-2GB of data being copied in age 1!!!). From what I've seen from the space sizes, you're doing the right thing (i.e., you're consistent with what we talked about during the talk): you have quite large young gen and a reasonably sized old gen. But the sheer amount of surviving objects is what's getting you. How much larger can you make your young gen? I think in this case, the larger, the better. Maybe, you can also try MaxTenuringThreshold=1. This goes against our general advice, but this might decrease the amount of objects being copied during young GCs, at the expense of more frequent CMS cycles... Tony jeff.lloyd at algorithmics.com wrote: > > Hi, > > > > I'm new to this list and I have a few questions about tuning my young > generation gc. > > > > I have chosen to use the CMS garbage collector because my application > is a relatively large reporting server that has a web front end and > therefore needs to have minimal pauses. > > > > I am using java 1.6.0_16 64-bit on redhat 5.2 intel 8x3GHz and 64GB ram. > > > > The machine is dedicated to this JVM. 
> > > > My steady-state was calculated as follows: > > - A typical number of users logged in and viewed several reports > > - Stopped user actions and performed a manual full GC > > - Look at the amount of heap used and take that number as the > steady-state memory requirement > > > > In this case my heap usage was ~10GB. In order to handle variance or > spikes I sized my old generation at 15-20GB. > > > > I sized my young generation at 32-42GB and used survivor ratios of 1, > 2, 3 and 6. > > > > My goal is to maximize throughput and minimize pauses. I'm willing to > sacrifice ram to increase speed. > > > > I have attached several of my many gc logs. The file gc_48G.txt is > just using CMS without any other tuning, and the results are much > worse than what I have been able to accomplish with other settings. > The best results are in the files gc_52G_20Gold_32Gyoung_2sr.txt and > gc_57G_15Gold_42Gyoung_1sr.txt. > > > > The problem is that some of the pauses are just too long. > > > > Is there a way to reduce the pause time any more than I have it now? > > Am I heading in the right direction? I ask because the default > settings are so different than what I have been heading towards. > > > > The best reference I have found on what good gc logs look like come > from brief examples presented at JavaOne this year by Tony Printezis > and Charlie Hunt. But I don't seem to be able to get logs that > resemble their tenuring patterns. > > > > I think I have a lot of medium-lived objects instead of nice > short-lived ones. > > > > Are there any good practices for apps with objects like this? > > > > Thanks, > > Jeff > > > > > ------------------------------------------------------------------------ > This email and any files transmitted with it are confidential and > proprietary to Algorithmics Incorporated and its affiliates > ("Algorithmics"). If received in error, use is prohibited. Please > destroy, and notify sender. Sender does not waive confidentiality or > privilege. 
Internet communications cannot be guaranteed to be timely, > secure, error or virus-free. Algorithmics does not accept liability > for any errors or omissions. Any commitment intended to bind > Algorithmics must be reduced to writing and signed by an authorized > signatory. > ------------------------------------------------------------------------ > ------------------------------------------------------------------------ > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use -- --------------------------------------------------------------------- | Tony Printezis, Staff Engineer | Sun Microsystems Inc. | | | MS UBUR02-311 | | e-mail: tony.printezis at sun.com | 35 Network Drive | | office: +1 781 442 0998 (x20998) | Burlington, MA 01803-2756, USA | --------------------------------------------------------------------- e-mail client: Thunderbird (Linux) From Paul.Hohensee at Sun.COM Fri Sep 11 10:22:33 2009 From: Paul.Hohensee at Sun.COM (Paul Hohensee) Date: Fri, 11 Sep 2009 13:22:33 -0400 Subject: Young generation configuration In-Reply-To: <4AAA6B29.3030008@sun.com> References: <0FCC438D62A5E643AA3F57D3417B220D0A7AE2ED@TORMAIL.algorithmics.com> <4AAA6B29.3030008@sun.com> Message-ID: <4AAA8759.8010904@sun.com> Another alternative mentioned in Tony and Charlie's J1 slides is the parallel collector. If, as Tony says, you can make the young gen large enough to avoid promotion, and you really do have a steady state old gen, then which old gen collector you use wouldn't matter much to pause times, given that young gen pause times seem to be your immediate problem. It may be that you just need more hardware threads to collect such a big young gen too. You might vary the number of gc threads to see how that affects collection times. If there are significant differences, then you need more hardware threads, i.e., a bigger machine. 
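[Editor's note: Paul's suggestion to vary the number of GC threads can be tried as a flag sweep along these lines. The heap sizes echo Jeff's 52G/32G configuration; the thread count, log-file name, and application jar are made up for illustration.]

```
# Rerun the same workload several times, changing only ParallelGCThreads,
# then compare the young-GC pause times across the resulting logs.
java -Xms52g -Xmx52g -Xmn32g -XX:+UseConcMarkSweepGC \
     -XX:ParallelGCThreads=4 \
     -verbose:gc -XX:+PrintGCDetails -XX:+PrintTenuringDistribution \
     -Xloggc:gc_4threads.log -jar reporting-server.jar
```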
You might also try using compressed pointers via -XX:+UseCompressedOops. That should cut down the total survivor size significantly, perhaps enough that your current hardware threads can collect significantly faster. Heap size will be limited to < 32gb, but your app will probably fit. A more efficient version of compressed pointers will be available in 6u18, btw. I notice that none of your logs shows more than age 7 stats even though the tenuring threshold is 15. It'd be nice to see if anything dies before then. Paul Tony Printezis wrote: > Jeff, > > Hi. I had a very brief look at your logs. Yes, your app does seem to > need to copy quite a lot (I don't think I've ever seen 1-2GB of data > being copied in age 1!!!). From what I've seen from the space sizes, > you're doing the right thing (i.e., you're consistent with what we > talked about during the talk): you have quite large young gen and a > reasonably sized old gen. But the sheer amount of surviving objects is > what's getting you. How much larger can you make your young gen? I think > in this case, the larger, the better. Maybe, you can also try > MaxTenuringThreshold=1. This goes against our general advice, but this > might decrease the amount of objects being copied during young GCs, at > the expense of more frequent CMS cycles... > > Tony > > jeff.lloyd at algorithmics.com wrote: > >> Hi, >> >> >> >> I'm new to this list and I have a few questions about tuning my young >> generation gc. >> >> >> >> I have chosen to use the CMS garbage collector because my application >> is a relatively large reporting server that has a web front end and >> therefore needs to have minimal pauses. >> >> >> >> I am using java 1.6.0_16 64-bit on redhat 5.2 intel 8x3GHz and 64GB ram. >> >> >> >> The machine is dedicated to this JVM. 
>> >> >> >> My steady-state was calculated as follows: >> >> - A typical number of users logged in and viewed several reports >> >> - Stopped user actions and performed a manual full GC >> >> - Look at the amount of heap used and take that number as the >> steady-state memory requirement >> >> >> >> In this case my heap usage was ~10GB. In order to handle variance or >> spikes I sized my old generation at 15-20GB. >> >> >> >> I sized my young generation at 32-42GB and used survivor ratios of 1, >> 2, 3 and 6. >> >> >> >> My goal is to maximize throughput and minimize pauses. I'm willing to >> sacrifice ram to increase speed. >> >> >> >> I have attached several of my many gc logs. The file gc_48G.txt is >> just using CMS without any other tuning, and the results are much >> worse than what I have been able to accomplish with other settings. >> The best results are in the files gc_52G_20Gold_32Gyoung_2sr.txt and >> gc_57G_15Gold_42Gyoung_1sr.txt. >> >> >> >> The problem is that some of the pauses are just too long. >> >> >> >> Is there a way to reduce the pause time any more than I have it now? >> >> Am I heading in the right direction? I ask because the default >> settings are so different than what I have been heading towards. >> >> >> >> The best reference I have found on what good gc logs look like come >> from brief examples presented at JavaOne this year by Tony Printezis >> and Charlie Hunt. But I don't seem to be able to get logs that >> resemble their tenuring patterns. >> >> >> >> I think I have a lot of medium-lived objects instead of nice >> short-lived ones. >> >> >> >> Are there any good practices for apps with objects like this? >> >> >> >> Thanks, >> >> Jeff >> >> >> >> >> ------------------------------------------------------------------------ >> This email and any files transmitted with it are confidential and >> proprietary to Algorithmics Incorporated and its affiliates >> ("Algorithmics"). If received in error, use is prohibited. 
Please >> destroy, and notify sender. Sender does not waive confidentiality or >> privilege. Internet communications cannot be guaranteed to be timely, >> secure, error or virus-free. Algorithmics does not accept liability >> for any errors or omissions. Any commitment intended to bind >> Algorithmics must be reduced to writing and signed by an authorized >> signatory. >> ------------------------------------------------------------------------ >> ------------------------------------------------------------------------ >> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> > > From Y.S.Ramakrishna at Sun.COM Fri Sep 11 11:13:54 2009 From: Y.S.Ramakrishna at Sun.COM (Y.S.Ramakrishna at Sun.COM) Date: Fri, 11 Sep 2009 11:13:54 -0700 Subject: Young generation configuration In-Reply-To: <4AAA8759.8010904@sun.com> References: <0FCC438D62A5E643AA3F57D3417B220D0A7AE2ED@TORMAIL.algorithmics.com> <4AAA6B29.3030008@sun.com> <4AAA8759.8010904@sun.com> Message-ID: <4AAA9362.3080009@Sun.COM> Just some very general remarks ... >> jeff.lloyd at algorithmics.com wrote: ... >>> My goal is to maximize throughput and minimize pauses. I'm willing to >>> sacrifice ram to increase speed. Ah, but you may not be able to achieve a joint optimum there; on the contrary, maximal throughput is often achieved at maximal pause times. Lowering pause times to within budget currently often involves giving up some throughput. You need to define the maximum pause time you can stand and the minimum throughput you can tolerate, and solve that optimization problem. ... >>> The problem is that some of the pauses are just too long. Hmm, good, we are getting closer :-) How long is "too long"? ... >>> Is there a way to reduce the pause time any more than I have it now? yes, but you will likely give up on throughput. >>> >>> Am I heading in the right direction? 
I ask because the default >>> settings are so different than what I have been heading towards. Depending on your boundary conditions (constraints on your objective metrics, and if you can define a suitable utility or objective function) there may be multiple optimal configurations, or none at all, which will meet your constraints. >>> I think I have a lot of medium-lived objects instead of nice >>> short-lived ones. You also have some short-lived ones (maybe about 80%?), but yes you do have quite a few (~15%?) medium-lived ones. The total volume of such medium-lived objects is proportional to the transactional rate that your server is subject to, and also proportional to the longevity of those transactions (where I am using transactions loosely to mean how long it takes for the records associated with those transactions to flush their state). You mention that your application is a "reporting server". What is your estimate of the (expected/measured) lifetime of such a "reporting transaction"? Does it match the kinds of object lifetimes you are seeing here? -- ramki From jeff.lloyd at algorithmics.com Fri Sep 11 13:39:02 2009 From: jeff.lloyd at algorithmics.com (jeff.lloyd at algorithmics.com) Date: Fri, 11 Sep 2009 16:39:02 -0400 Subject: Young generation configuration In-Reply-To: <4AAA6B29.3030008@sun.com> References: <0FCC438D62A5E643AA3F57D3417B220D0A7AE2ED@TORMAIL.algorithmics.com> <4AAA6B29.3030008@sun.com> Message-ID: <0FCC438D62A5E643AA3F57D3417B220D0A813803@TORMAIL.algorithmics.com> Hi Tony, We do have a lot of data that we create/copy within the application. We hold big trees/graphs of data representing large portfolio structures in memory per user. Slicing and dicing the data creates similar strains. I'll try to increase the YG and play more with MTT to see if I can speed things up. The problem is that we have an interactive web interface so the pauses need to be relatively quick or the UI responsiveness suffers. 
If I set MTT to 1, then I am guessing I may need to boost my OG size because it will fill up faster. Would it make sense to increase the OG size and reduce the initiating occupancy fraction? Thanks! Jeff -----Original Message----- From: Antonios.Printezis at sun.com [mailto:Antonios.Printezis at sun.com] On Behalf Of Tony Printezis Sent: Friday, September 11, 2009 11:22 AM To: Jeff Lloyd Cc: hotspot-gc-use at openjdk.java.net Subject: Re: Young generation configuration Jeff, Hi. I had a very brief look at your logs. Yes, your app does seem to need to copy quite a lot (I don't think I've ever seen 1-2GB of data being copied in age 1!!!). From what I've seen from the space sizes, you're doing the right thing (i.e., you're consistent with what we talked about during the talk): you have quite large young gen and a reasonably sized old gen. But the sheer amount of surviving objects is what's getting you. How much larger can you make your young gen? I think in this case, the larger, the better. Maybe, you can also try MaxTenuringThreshold=1. This goes against our general advice, but this might decrease the amount of objects being copied during young GCs, at the expense of more frequent CMS cycles... Tony jeff.lloyd at algorithmics.com wrote: > > Hi, > > > > I'm new to this list and I have a few questions about tuning my young > generation gc. > > > > I have chosen to use the CMS garbage collector because my application > is a relatively large reporting server that has a web front end and > therefore needs to have minimal pauses. > > > > I am using java 1.6.0_16 64-bit on redhat 5.2 intel 8x3GHz and 64GB ram. > > > > The machine is dedicated to this JVM. > > > > My steady-state was calculated as follows: > > - A typical number of users logged in and viewed several reports > > - Stopped user actions and performed a manual full GC > > - Look at the amount of heap used and take that number as the > steady-state memory requirement > > > > In this case my heap usage was ~10GB. 
In order to handle variance or > spikes I sized my old generation at 15-20GB. > > > > I sized my young generation at 32-42GB and used survivor ratios of 1, > 2, 3 and 6. > > > > My goal is to maximize throughput and minimize pauses. I'm willing to > sacrifice ram to increase speed. > > > > I have attached several of my many gc logs. The file gc_48G.txt is > just using CMS without any other tuning, and the results are much > worse than what I have been able to accomplish with other settings. > The best results are in the files gc_52G_20Gold_32Gyoung_2sr.txt and > gc_57G_15Gold_42Gyoung_1sr.txt. > > > > The problem is that some of the pauses are just too long. > > > > Is there a way to reduce the pause time any more than I have it now? > > Am I heading in the right direction? I ask because the default > settings are so different than what I have been heading towards. > > > > The best reference I have found on what good gc logs look like come > from brief examples presented at JavaOne this year by Tony Printezis > and Charlie Hunt. But I don't seem to be able to get logs that > resemble their tenuring patterns. > > > > I think I have a lot of medium-lived objects instead of nice > short-lived ones. > > > > Are there any good practices for apps with objects like this? > > > > Thanks, > > Jeff > > > > > ------------------------------------------------------------------------ > This email and any files transmitted with it are confidential and > proprietary to Algorithmics Incorporated and its affiliates > ("Algorithmics"). If received in error, use is prohibited. Please > destroy, and notify sender. Sender does not waive confidentiality or > privilege. Internet communications cannot be guaranteed to be timely, > secure, error or virus-free. Algorithmics does not accept liability > for any errors or omissions. Any commitment intended to bind > Algorithmics must be reduced to writing and signed by an authorized > signatory. 
> ------------------------------------------------------------------------ > ------------------------------------------------------------------------ > > _______________________________________________ > hotspot-gc-use mailing list > hotspot-gc-use at openjdk.java.net > http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use -- --------------------------------------------------------------------- | Tony Printezis, Staff Engineer | Sun Microsystems Inc. | | | MS UBUR02-311 | | e-mail: tony.printezis at sun.com | 35 Network Drive | | office: +1 781 442 0998 (x20998) | Burlington, MA 01803-2756, USA | --------------------------------------------------------------------- e-mail client: Thunderbird (Linux) -------------------------------------------------------------------------- This email and any files transmitted with it are confidential and proprietary to Algorithmics Incorporated and its affiliates ("Algorithmics"). If received in error, use is prohibited. Please destroy, and notify sender. Sender does not waive confidentiality or privilege. Internet communications cannot be guaranteed to be timely, secure, error or virus-free. Algorithmics does not accept liability for any errors or omissions. Any commitment intended to bind Algorithmics must be reduced to writing and signed by an authorized signatory. -------------------------------------------------------------------------- From jeff.lloyd at algorithmics.com Fri Sep 11 13:49:45 2009 From: jeff.lloyd at algorithmics.com (jeff.lloyd at algorithmics.com) Date: Fri, 11 Sep 2009 16:49:45 -0400 Subject: Young generation configuration In-Reply-To: <4AAA8759.8010904@sun.com> References: <0FCC438D62A5E643AA3F57D3417B220D0A7AE2ED@TORMAIL.algorithmics.com> <4AAA6B29.3030008@sun.com> <4AAA8759.8010904@sun.com> Message-ID: <0FCC438D62A5E643AA3F57D3417B220D0A813819@TORMAIL.algorithmics.com> Thanks for your response Paul. I'll take another look at the parallel collector. 
That's a good point about the -XX:+UseCompressedOops. We started off with heaps bigger than 32G so I had left that option out. I'll put it back in and definitely try out 6u18 when it's available. What about the option -XX:+UseAdaptiveGCBoundary? I don't see it referenced very often. Would it be helpful in a case like mine? I'm not sure I understand your last paragraph. What is the period of time that you would be interested in seeing? Jeff -----Original Message----- From: Paul.Hohensee at Sun.COM [mailto:Paul.Hohensee at Sun.COM] Sent: Friday, September 11, 2009 1:23 PM To: Tony Printezis Cc: Jeff Lloyd; hotspot-gc-use at openjdk.java.net Subject: Re: Young generation configuration Another alternative mentioned in Tony and Charlie's J1 slides is the parallel collector. If, as Tony says, you can make the young gen large enough to avoid promotion, and you really do have a steady state old gen, then which old gen collector you use wouldn't matter much to pause times, given that young gen pause times seem to be your immediate problem. It may be that you just need more hardware threads to collect such a big young gen too. You might vary the number of gc threads to see how that affects collection times. If there are significant differences, then you need more hardware threads, i.e., a bigger machine. You might also try using compressed pointers via -XX:+UseCompressedOops. That should cut down the total survivor size significantly, perhaps enough that your current hardware threads can collect significantly faster. Heap size will be limited to < 32gb, but your app will probably fit. A more efficient version of compressed pointers will be available in 6u18, btw. I notice that none of your logs shows more than age 7 stats even though the tenuring threshold is 15. It'd be nice to see if anything dies before then. Paul Tony Printezis wrote: > Jeff, > > Hi. I had a very brief look at your logs. 
Yes, your app does seem to > need to copy quite a lot (I don't think I've ever seen 1-2GB of data > being copied in age 1!!!). From what I've seen from the space sizes, > you're doing the right thing (i.e., you're consistent with what we > talked about during the talk): you have quite large young gen and a > reasonably sized old gen. But the sheer amount of surviving objects is > what's getting you. How much larger can you make your young gen? I think > in this case, the larger, the better. Maybe, you can also try > MaxTenuringThreshold=1. This goes against our general advice, but this > might decrease the amount of objects being copied during young GCs, at > the expense of more frequent CMS cycles... > > Tony > > jeff.lloyd at algorithmics.com wrote: > >> Hi, >> >> >> >> I'm new to this list and I have a few questions about tuning my young >> generation gc. >> >> >> >> I have chosen to use the CMS garbage collector because my application >> is a relatively large reporting server that has a web front end and >> therefore needs to have minimal pauses. >> >> >> >> I am using java 1.6.0_16 64-bit on redhat 5.2 intel 8x3GHz and 64GB ram. >> >> >> >> The machine is dedicated to this JVM. >> >> >> >> My steady-state was calculated as follows: >> >> - A typical number of users logged in and viewed several reports >> >> - Stopped user actions and performed a manual full GC >> >> - Look at the amount of heap used and take that number as the >> steady-state memory requirement >> >> >> >> In this case my heap usage was ~10GB. In order to handle variance or >> spikes I sized my old generation at 15-20GB. >> >> >> >> I sized my young generation at 32-42GB and used survivor ratios of 1, >> 2, 3 and 6. >> >> >> >> My goal is to maximize throughput and minimize pauses. I'm willing to >> sacrifice ram to increase speed. >> >> >> >> I have attached several of my many gc logs. 
The file gc_48G.txt is >> just using CMS without any other tuning, and the results are much >> worse than what I have been able to accomplish with other settings. >> The best results are in the files gc_52G_20Gold_32Gyoung_2sr.txt and >> gc_57G_15Gold_42Gyoung_1sr.txt. >> >> >> >> The problem is that some of the pauses are just too long. >> >> >> >> Is there a way to reduce the pause time any more than I have it now? >> >> Am I heading in the right direction? I ask because the default >> settings are so different than what I have been heading towards. >> >> >> >> The best reference I have found on what good gc logs look like come >> from brief examples presented at JavaOne this year by Tony Printezis >> and Charlie Hunt. But I don't seem to be able to get logs that >> resemble their tenuring patterns. >> >> >> >> I think I have a lot of medium-lived objects instead of nice >> short-lived ones. >> >> >> >> Are there any good practices for apps with objects like this? >> >> >> >> Thanks, >> >> Jeff >> >> >> >> >> ------------------------------------------------------------------------ >> This email and any files transmitted with it are confidential and >> proprietary to Algorithmics Incorporated and its affiliates >> ("Algorithmics"). If received in error, use is prohibited. Please >> destroy, and notify sender. Sender does not waive confidentiality or >> privilege. Internet communications cannot be guaranteed to be timely, >> secure, error or virus-free. Algorithmics does not accept liability >> for any errors or omissions. Any commitment intended to bind >> Algorithmics must be reduced to writing and signed by an authorized >> signatory. 
>> ------------------------------------------------------------------------
>>
>> _______________________________________________
>> hotspot-gc-use mailing list
>> hotspot-gc-use at openjdk.java.net
>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
>>

From tony.printezis at sun.com  Fri Sep 11 13:54:50 2009
From: tony.printezis at sun.com (Tony Printezis)
Date: Fri, 11 Sep 2009 16:54:50 -0400
Subject: Young generation configuration
In-Reply-To: <0FCC438D62A5E643AA3F57D3417B220D0A813803@TORMAIL.algorithmics.com>
References: <0FCC438D62A5E643AA3F57D3417B220D0A7AE2ED@TORMAIL.algorithmics.com> <4AAA6B29.3030008@sun.com> <0FCC438D62A5E643AA3F57D3417B220D0A813803@TORMAIL.algorithmics.com>
Message-ID: <4AAAB91A.9020302@sun.com>

jeff.lloyd at algorithmics.com wrote:
> Hi Tony,
>
> We do have a lot of data that we create/copy within the application. We
> hold big trees/graphs of data representing large portfolio structures in
> memory per user. Slicing and dicing the data creates similar strains.
>
> I'll try to increase the YG and play more with MTT to see if I can speed
> things up.
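Since much of this thread turns on how the young generation splits into eden and survivor spaces, a back-of-the-envelope sketch of the arithmetic may help (it assumes HotSpot's convention that -XX:SurvivorRatio=R makes eden R times the size of each survivor space, so each survivor space is young/(R+2); the sizes below are just the figures from this thread):

```python
# Rough sketch of HotSpot young-gen sizing: with -XX:SurvivorRatio=R,
# eden is R times the size of each survivor space, and the young gen
# holds eden plus two survivor spaces, so survivor = young / (R + 2).

def young_gen_layout(young_gb, survivor_ratio):
    survivor = young_gb / (survivor_ratio + 2)
    eden = young_gb - 2 * survivor
    return eden, survivor

# The survivor ratios tried in this thread, for a 32G young gen:
for r in (1, 2, 3, 6):
    eden, surv = young_gen_layout(32, r)
    print(f"SurvivorRatio={r}: eden ~{eden:.1f}G, each survivor ~{surv:.1f}G")
```

For example, SurvivorRatio=2 on a 32G young gen gives 8G per survivor space, which bounds how much can be copied per scavenge before objects are promoted prematurely.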
The problem is that we have an interactive web interface so > the pauses need to be relatively quick or the UI responsiveness suffers. > > If I set MTT to 1, then I am guessing I may need to boost my OG size > because it will fill up faster. Would it make sense to increase the OG > size and reduce the initiating occupancy fraction? > Definitely. Someone was paying attention during the talk. :-) But, concentrate first on whether the young GC times are good enough. Tony > -----Original Message----- > From: Antonios.Printezis at sun.com [mailto:Antonios.Printezis at sun.com] On > Behalf Of Tony Printezis > Sent: Friday, September 11, 2009 11:22 AM > To: Jeff Lloyd > Cc: hotspot-gc-use at openjdk.java.net > Subject: Re: Young generation configuration > > Jeff, > > Hi. I had a very brief look at your logs. Yes, your app does seem to > need to copy quite a lot (I don't think I've ever seen 1-2GB of data > being copied in age 1!!!). From what I've seen from the space sizes, > you're doing the right thing (i.e., you're consistent with what we > talked about during the talk): you have quite large young gen and a > reasonably sized old gen. But the sheer amount of surviving objects is > what's getting you. How much larger can you make your young gen? I think > > in this case, the larger, the better. Maybe, you can also try > MaxTenuringThreshold=1. This goes against our general advice, but this > might decrease the amount of objects being copied during young GCs, at > the expense of more frequent CMS cycles... > > Tony > > jeff.lloyd at algorithmics.com wrote: > >> Hi, >> >> >> >> I'm new to this list and I have a few questions about tuning my young >> generation gc. >> >> >> >> I have chosen to use the CMS garbage collector because my application >> is a relatively large reporting server that has a web front end and >> therefore needs to have minimal pauses. >> >> >> >> I am using java 1.6.0_16 64-bit on redhat 5.2 intel 8x3GHz and 64GB >> > ram. 
> >> >> >> The machine is dedicated to this JVM. >> >> >> >> My steady-state was calculated as follows: >> >> - A typical number of users logged in and viewed several >> > reports > >> - Stopped user actions and performed a manual full GC >> >> - Look at the amount of heap used and take that number as the >> > > >> steady-state memory requirement >> >> >> >> In this case my heap usage was ~10GB. In order to handle variance or >> spikes I sized my old generation at 15-20GB. >> >> >> >> I sized my young generation at 32-42GB and used survivor ratios of 1, >> 2, 3 and 6. >> >> >> >> My goal is to maximize throughput and minimize pauses. I'm willing to >> > > >> sacrifice ram to increase speed. >> >> >> >> I have attached several of my many gc logs. The file gc_48G.txt is >> just using CMS without any other tuning, and the results are much >> worse than what I have been able to accomplish with other settings. >> The best results are in the files gc_52G_20Gold_32Gyoung_2sr.txt and >> gc_57G_15Gold_42Gyoung_1sr.txt. >> >> >> >> The problem is that some of the pauses are just too long. >> >> >> >> Is there a way to reduce the pause time any more than I have it now? >> >> Am I heading in the right direction? I ask because the default >> settings are so different than what I have been heading towards. >> >> >> >> The best reference I have found on what good gc logs look like come >> from brief examples presented at JavaOne this year by Tony Printezis >> and Charlie Hunt. But I don't seem to be able to get logs that >> resemble their tenuring patterns. >> >> >> >> I think I have a lot of medium-lived objects instead of nice >> short-lived ones. >> >> >> >> Are there any good practices for apps with objects like this? 
>> >> Thanks,
>> >> Jeff
>> >>
>> >> ------------------------------------------------------------------------
>>

--
Tony Printezis, Staff Engineer | Sun Microsystems Inc.
MS UBUR02-311, 35 Network Drive, Burlington, MA 01803-2756, USA
e-mail: tony.printezis at sun.com | office: +1 781 442 0998 (x20998)
e-mail client: Thunderbird (Linux)

From jeff.lloyd at algorithmics.com  Fri Sep 11 14:06:01 2009
From: jeff.lloyd at algorithmics.com (jeff.lloyd at algorithmics.com)
Date: Fri, 11 Sep 2009 17:06:01 -0400
Subject: Young generation configuration
In-Reply-To: <4AAA9362.3080009@Sun.COM>
References: <0FCC438D62A5E643AA3F57D3417B220D0A7AE2ED@TORMAIL.algorithmics.com> <4AAA6B29.3030008@sun.com> <4AAA8759.8010904@sun.com> <4AAA9362.3080009@Sun.COM>
Message-ID: <0FCC438D62A5E643AA3F57D3417B220D0A813842@TORMAIL.algorithmics.com>

Hi Ramki,

I did not know that lower pause times and higher throughput were
generally incompatible. Good to know - it makes sense too.

I'm trying to find out how long "too long" is. Bankers can be fickle.
:-) Honestly, I think "too long" constitutes a noticeable pause in GUI
interactions.

How did you measure the proportion of short-lived and medium-lived
objects?

We typically expect a "session" to be live for most of the day, with
multiple reports of seconds or minutes in duration executed within that
session. So yes, I am seeing my "steady state" continue for a long
time, with blips of activity throughout the day. We cache a lot of
results, which can lead to a general upward trend, but it doesn't seem
to be our current source of object volume.

Thanks for your help,
Jeff

-----Original Message-----
From: Y.S.Ramakrishna at Sun.COM [mailto:Y.S.Ramakrishna at Sun.COM]
Sent: Friday, September 11, 2009 2:14 PM
To: Jeff Lloyd
Cc: hotspot-gc-use at openjdk.java.net
Subject: Re: Young generation configuration

Just some very general remarks ...

>> jeff.lloyd at algorithmics.com wrote:
...
>>> My goal is to maximize throughput and minimize pauses. I'm willing to
>>> sacrifice ram to increase speed.
Ah, but you may not be able to achieve a joint optimum there; on the
contrary, maximal throughput is often achieved at maximal pause times.
Lowering pause times to within budget currently often involves giving up
some throughput. You need to define the maximum pause time you can stand
and the minimum throughput you can tolerate, and solve that optimization
problem.

...
>>> The problem is that some of the pauses are just too long.

Hmm, good, we are getting closer :-) How long is "too long"?

...
>>> Is there a way to reduce the pause time any more than I have it now?

Yes, but you will likely give up on throughput.

>>> Am I heading in the right direction? I ask because the default
>>> settings are so different than what I have been heading towards.

Depending on your boundary conditions (constraints on your objective
metrics, and whether you can define a suitable utility or objective
function) there may be multiple optimal configurations, or none at all,
which will meet your constraints.

>>> I think I have a lot of medium-lived objects instead of nice
>>> short-lived ones.

You also have some short-lived ones (maybe about 80%?), but yes, you do
have quite some (~15%?) medium-lived ones. The total volume of such
medium-lived objects is proportional to the transactional rate that your
server is subject to, and also proportional to the longevity of those
transactions (where I am using transactions loosely to mean how long it
takes for the records associated with those transactions to flush their
state). You mention that your application is a "reporting server". What
is your estimate of the (expected/measured) lifetime of such a
"reporting transaction"? Does it match the kinds of object lifetimes you
are seeing here?

-- ramki

--------------------------------------------------------------------------
This email and any files transmitted with it are confidential and
proprietary to Algorithmics Incorporated and its affiliates
("Algorithmics").
If received in error, use is prohibited. Please destroy, and notify sender. Sender does not waive confidentiality or privilege. Internet communications cannot be guaranteed to be timely, secure, error or virus-free. Algorithmics does not accept liability for any errors or omissions. Any commitment intended to bind Algorithmics must be reduced to writing and signed by an authorized signatory. -------------------------------------------------------------------------- From Paul.Hohensee at Sun.COM Fri Sep 11 14:13:46 2009 From: Paul.Hohensee at Sun.COM (Paul Hohensee) Date: Fri, 11 Sep 2009 17:13:46 -0400 Subject: Young generation configuration In-Reply-To: <0FCC438D62A5E643AA3F57D3417B220D0A813819@TORMAIL.algorithmics.com> References: <0FCC438D62A5E643AA3F57D3417B220D0A7AE2ED@TORMAIL.algorithmics.com> <4AAA6B29.3030008@sun.com> <4AAA8759.8010904@sun.com> <0FCC438D62A5E643AA3F57D3417B220D0A813819@TORMAIL.algorithmics.com> Message-ID: <4AAABD8A.7000900@sun.com> You can try out compressed pointers in 6u14. It just won't be quite as fast as the version that's going into 6u18. 6u14 with compressed pointers will still be quite a bit faster than without. One of the gc guys may correct me, but UseAdaptiveGCBoundary allows the vm to ergonomically move the boundary between old and young generations, effectively resizing them. I don't know if it's bit-rotted, and I seem to remember that there wasn't much benefit. But maybe we just didn't have a good use case. What I meant by the last paragraph was that with the tenuring threshold set at 15 (which is what the log says), and with only 7 young gcs in the log, we can't see at what age (or if) between 8 and 15 the survivor size goes down to something reasonable. If it doesn't, it might be worth it to us to revisit increasing the age limit for 64-bit. Paul jeff.lloyd at algorithmics.com wrote: > Thanks for your response Paul. > > I'll take another look at the parallel collector. 
> That's a good point about the -XX:+UseCompressedOops. We started off
> with heaps bigger than 32G so I had left that option out. I'll put it
> back in and definitely try out 6u18 when it's available.
>
> What about the option -XX:+UseAdaptiveGCBoundary? I don't see it
> referenced very often. Would it be helpful in a case like mine?
>
> I'm not sure I understand your last paragraph. What is the period of
> time that you would be interested in seeing?
>
> Jeff
>
> -----Original Message-----
> From: Paul.Hohensee at Sun.COM [mailto:Paul.Hohensee at Sun.COM]
> Sent: Friday, September 11, 2009 1:23 PM
> To: Tony Printezis
> Cc: Jeff Lloyd; hotspot-gc-use at openjdk.java.net
> Subject: Re: Young generation configuration
>
> Another alternative mentioned in Tony and Charlie's J1 slides is the
> parallel collector. If, as Tony says, you can make the young gen large
> enough to avoid promotion, and you really do have a steady-state old
> gen, then which old gen collector you use wouldn't matter much to pause
> times, given that young gen pause times seem to be your immediate
> problem.
>
> It may be that you just need more hardware threads to collect such a
> big young gen too. You might vary the number of gc threads to see how
> that affects collection times. If there are significant differences,
> then you need more hardware threads, i.e., a bigger machine.
>
> You might also try using compressed pointers via -XX:+UseCompressedOops.
> That should cut down the total survivor size significantly, perhaps
> enough that your current hardware threads can collect significantly
> faster. Heap size will be limited to < 32gb, but your app will probably
> fit. A more efficient version of compressed pointers will be available
> in 6u18, btw.
>
> I notice that none of your logs shows more than age 7 stats even though
> the tenuring threshold is 15. It'd be nice to see if anything dies
> before then.
>
> Paul
>
> Tony Printezis wrote:
>
>> Jeff,
>>
>> Hi.
I had a very brief look at your logs. Yes, your app does seem to >> need to copy quite a lot (I don't think I've ever seen 1-2GB of data >> being copied in age 1!!!). From what I've seen from the space sizes, >> you're doing the right thing (i.e., you're consistent with what we >> talked about during the talk): you have quite large young gen and a >> reasonably sized old gen. But the sheer amount of surviving objects is >> > > >> what's getting you. How much larger can you make your young gen? I >> > think > >> in this case, the larger, the better. Maybe, you can also try >> MaxTenuringThreshold=1. This goes against our general advice, but this >> > > >> might decrease the amount of objects being copied during young GCs, at >> > > >> the expense of more frequent CMS cycles... >> >> Tony >> >> jeff.lloyd at algorithmics.com wrote: >> >> >>> Hi, >>> >>> >>> >>> I'm new to this list and I have a few questions about tuning my young >>> > > >>> generation gc. >>> >>> >>> >>> I have chosen to use the CMS garbage collector because my application >>> > > >>> is a relatively large reporting server that has a web front end and >>> therefore needs to have minimal pauses. >>> >>> >>> >>> I am using java 1.6.0_16 64-bit on redhat 5.2 intel 8x3GHz and 64GB >>> > ram. > >>> >>> >>> The machine is dedicated to this JVM. >>> >>> >>> >>> My steady-state was calculated as follows: >>> >>> - A typical number of users logged in and viewed several >>> > reports > >>> - Stopped user actions and performed a manual full GC >>> >>> - Look at the amount of heap used and take that number as >>> > the > >>> steady-state memory requirement >>> >>> >>> >>> In this case my heap usage was ~10GB. In order to handle variance or >>> > > >>> spikes I sized my old generation at 15-20GB. >>> >>> >>> >>> I sized my young generation at 32-42GB and used survivor ratios of 1, >>> > > >>> 2, 3 and 6. >>> >>> >>> >>> My goal is to maximize throughput and minimize pauses. 
I'm willing >>> > to > >>> sacrifice ram to increase speed. >>> >>> >>> >>> I have attached several of my many gc logs. The file gc_48G.txt is >>> just using CMS without any other tuning, and the results are much >>> worse than what I have been able to accomplish with other settings. >>> The best results are in the files gc_52G_20Gold_32Gyoung_2sr.txt and >>> gc_57G_15Gold_42Gyoung_1sr.txt. >>> >>> >>> >>> The problem is that some of the pauses are just too long. >>> >>> >>> >>> Is there a way to reduce the pause time any more than I have it now? >>> >>> Am I heading in the right direction? I ask because the default >>> settings are so different than what I have been heading towards. >>> >>> >>> >>> The best reference I have found on what good gc logs look like come >>> from brief examples presented at JavaOne this year by Tony Printezis >>> and Charlie Hunt. But I don't seem to be able to get logs that >>> resemble their tenuring patterns. >>> >>> >>> >>> I think I have a lot of medium-lived objects instead of nice >>> short-lived ones. >>> >>> >>> >>> Are there any good practices for apps with objects like this? >>> >>> >>> >>> Thanks, >>> >>> Jeff >>> >>> >>> >>> >>> >>> > ------------------------------------------------------------------------ > >>> This email and any files transmitted with it are confidential and >>> proprietary to Algorithmics Incorporated and its affiliates >>> ("Algorithmics"). If received in error, use is prohibited. Please >>> destroy, and notify sender. Sender does not waive confidentiality or >>> privilege. Internet communications cannot be guaranteed to be timely, >>> > > >>> secure, error or virus-free. Algorithmics does not accept liability >>> for any errors or omissions. Any commitment intended to bind >>> Algorithmics must be reduced to writing and signed by an authorized >>> signatory. 
>>>
>>> ------------------------------------------------------------------------
>>>

From tony.printezis at sun.com  Fri Sep 11 14:17:55 2009
From: tony.printezis at sun.com (Tony Printezis)
Date: Fri, 11 Sep 2009 17:17:55 -0400
Subject: Young generation configuration
In-Reply-To: <4AAABD8A.7000900@sun.com>
References: <0FCC438D62A5E643AA3F57D3417B220D0A7AE2ED@TORMAIL.algorithmics.com> <4AAA6B29.3030008@sun.com> <4AAA8759.8010904@sun.com> <0FCC438D62A5E643AA3F57D3417B220D0A813819@TORMAIL.algorithmics.com> <4AAABD8A.7000900@sun.com>
Message-ID: <4AAABE83.90308@sun.com>

Paul Hohensee wrote:
> You can try out compressed pointers in 6u14. It just won't be quite as
> fast as the version that's going into 6u18. 6u14 with compressed
> pointers will still be quite a bit faster than without.
>
> One of the gc guys may correct me, but UseAdaptiveGCBoundary allows
> the vm to ergonomically move the boundary between old and young
> generations, effectively resizing them.
I don't know if it's bit-rotted, and I seem > to remember > that there wasn't much benefit. But maybe we just didn't have a good > use case. > Also, it's ParallelGC-only, IIRC. > What I meant by the last paragraph was that with the tenuring threshold > set at > 15 (which is what the log says), and with only 7 young gcs in the log, > we can't > see at what age (or if) between 8 and 15 the survivor size goes down to > something > reasonable. If it doesn't, it might be worth it to us to revisit > increasing the age > limit for 64-bit. > Paul, the problem in Jeff's case is that even at age 1 he copies 1GB or so. So, maybe, setting a small MTT and having more CMS cycles might be the right option for him. Tony > jeff.lloyd at algorithmics.com wrote: > >> Thanks for your response Paul. >> >> I'll take another look at the parallel collector. >> >> That's a good point about the -XX:+UseCompressedOops. We started off >> with heaps bigger than 32G so I had left that option out. I'll put it >> back in and definitely try out 6u18 when it's available. >> >> What about the option -XX:+UseAdaptiveGCBoundary? I don't see it >> referenced very often. Would it be helpful in a case like mine? >> >> I'm not sure I understand your last paragraph. What is the period of >> time that you would be interested in seeing? >> >> Jeff >> >> -----Original Message----- >> From: Paul.Hohensee at Sun.COM [mailto:Paul.Hohensee at Sun.COM] >> Sent: Friday, September 11, 2009 1:23 PM >> To: Tony Printezis >> Cc: Jeff Lloyd; hotspot-gc-use at openjdk.java.net >> Subject: Re: Young generation configuration >> >> Another alternative mentioned in Tony and Charlie's J1 slides is the >> parallel >> collector. If, as Tony says, you can make the young gen large enough to >> >> avoid >> promotion, and you really do have a steady state old gen, then which old >> gen >> collector you use wouldn't matter much to pause times, given that young >> gen pause times seem to be your immediate problem. 
>> >> It may be that you just need more hardware threads to collect such a big >> >> young >> gen too. You might vary the number of gc threads to see how that >> affects >> collection times. If there's significant differences, then you need >> more >> hardware threads, i.e., a bigger machine. >> >> You might also try using compressed pointers via -XX:+UseCompressedOops. >> That should cut down the total survivor size significantly, perhaps >> enough >> to that your current hardware threads can collect significantly faster. >> >> Heap size >> will be limited to < 32gb, but you're app will probably fit. A more >> efficient >> version of compressed pointers will be available in 6u18, btw. >> >> I notice that none of your logs shows more than age 7 stats even though >> the >> tenuring threshold is 15. It'd be nice to see if anything dies before >> then. >> >> Paul >> >> Tony Printezis wrote: >> >> >>> Jeff, >>> >>> Hi. I had a very brief look at your logs. Yes, your app does seem to >>> need to copy quite a lot (I don't think I've ever seen 1-2GB of data >>> being copied in age 1!!!). From what I've seen from the space sizes, >>> you're doing the right thing (i.e., you're consistent with what we >>> talked about during the talk): you have quite large young gen and a >>> reasonably sized old gen. But the sheer amount of surviving objects is >>> >>> >> >> >>> what's getting you. How much larger can you make your young gen? I >>> >>> >> think >> >> >>> in this case, the larger, the better. Maybe, you can also try >>> MaxTenuringThreshold=1. This goes against our general advice, but this >>> >>> >> >> >>> might decrease the amount of objects being copied during young GCs, at >>> >>> >> >> >>> the expense of more frequent CMS cycles... >>> >>> Tony >>> >>> jeff.lloyd at algorithmics.com wrote: >>> >>> >>> >>>> Hi, >>>> >>>> >>>> >>>> I'm new to this list and I have a few questions about tuning my young >>>> >>>> >> >> >>>> generation gc. 
>>>> >>>> >>>> >>>> I have chosen to use the CMS garbage collector because my application >>>> >>>> >> >> >>>> is a relatively large reporting server that has a web front end and >>>> therefore needs to have minimal pauses. >>>> >>>> >>>> >>>> I am using java 1.6.0_16 64-bit on redhat 5.2 intel 8x3GHz and 64GB >>>> >>>> >> ram. >> >> >>>> >>>> >>>> The machine is dedicated to this JVM. >>>> >>>> >>>> >>>> My steady-state was calculated as follows: >>>> >>>> - A typical number of users logged in and viewed several >>>> >>>> >> reports >> >> >>>> - Stopped user actions and performed a manual full GC >>>> >>>> - Look at the amount of heap used and take that number as >>>> >>>> >> the >> >> >>>> steady-state memory requirement >>>> >>>> >>>> >>>> In this case my heap usage was ~10GB. In order to handle variance or >>>> >>>> >> >> >>>> spikes I sized my old generation at 15-20GB. >>>> >>>> >>>> >>>> I sized my young generation at 32-42GB and used survivor ratios of 1, >>>> >>>> >> >> >>>> 2, 3 and 6. >>>> >>>> >>>> >>>> My goal is to maximize throughput and minimize pauses. I'm willing >>>> >>>> >> to >> >> >>>> sacrifice ram to increase speed. >>>> >>>> >>>> >>>> I have attached several of my many gc logs. The file gc_48G.txt is >>>> just using CMS without any other tuning, and the results are much >>>> worse than what I have been able to accomplish with other settings. >>>> The best results are in the files gc_52G_20Gold_32Gyoung_2sr.txt and >>>> gc_57G_15Gold_42Gyoung_1sr.txt. >>>> >>>> >>>> >>>> The problem is that some of the pauses are just too long. >>>> >>>> >>>> >>>> Is there a way to reduce the pause time any more than I have it now? >>>> >>>> Am I heading in the right direction? I ask because the default >>>> settings are so different than what I have been heading towards. >>>> >>>> >>>> >>>> The best reference I have found on what good gc logs look like come >>>> from brief examples presented at JavaOne this year by Tony Printezis >>>> and Charlie Hunt. 
But I don't seem to be able to get logs that >>>> resemble their tenuring patterns. >>>> >>>> >>>> >>>> I think I have a lot of medium-lived objects instead of nice >>>> short-lived ones. >>>> >>>> >>>> >>>> Are there any good practices for apps with objects like this? >>>> >>>> >>>> >>>> Thanks, >>>> >>>> Jeff >>>> >>>> >>>> >>>> >>>> >>>> >>>> >> ------------------------------------------------------------------------ >> >> >>>> This email and any files transmitted with it are confidential and >>>> proprietary to Algorithmics Incorporated and its affiliates >>>> ("Algorithmics"). If received in error, use is prohibited. Please >>>> destroy, and notify sender. Sender does not waive confidentiality or >>>> privilege. Internet communications cannot be guaranteed to be timely, >>>> >>>> >> >> >>>> secure, error or virus-free. Algorithmics does not accept liability >>>> for any errors or omissions. Any commitment intended to bind >>>> Algorithmics must be reduced to writing and signed by an authorized >>>> signatory. >>>> >>>> >>>> >> ------------------------------------------------------------------------ >> >> ------------------------------------------------------------------------ >> >> >>>> _______________________________________________ >>>> hotspot-gc-use mailing list >>>> hotspot-gc-use at openjdk.java.net >>>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>>> >>>> >>>> >>> >>> >>> >> >> -------------------------------------------------------------------------- >> This email and any files transmitted with it are confidential and proprietary to Algorithmics Incorporated and its affiliates ("Algorithmics"). If received in error, use is prohibited. Please destroy, and notify sender. Sender does not waive confidentiality or privilege. Internet communications cannot be guaranteed to be timely, secure, error or virus-free. Algorithmics does not accept liability for any errors or omissions. 
>> --------------------------------------------------------------------------
>>

--
Tony Printezis, Staff Engineer | Sun Microsystems Inc.
MS UBUR02-311, 35 Network Drive, Burlington, MA 01803-2756, USA
e-mail: tony.printezis at sun.com | office: +1 781 442 0998 (x20998)
e-mail client: Thunderbird (Linux)

From Y.S.Ramakrishna at Sun.COM  Fri Sep 11 15:19:46 2009
From: Y.S.Ramakrishna at Sun.COM (Y.S.Ramakrishna at Sun.COM)
Date: Fri, 11 Sep 2009 15:19:46 -0700
Subject: Young generation configuration
In-Reply-To: <0FCC438D62A5E643AA3F57D3417B220D0A813842@TORMAIL.algorithmics.com>
References: <0FCC438D62A5E643AA3F57D3417B220D0A7AE2ED@TORMAIL.algorithmics.com> <4AAA6B29.3030008@sun.com> <4AAA8759.8010904@sun.com> <4AAA9362.3080009@Sun.COM> <0FCC438D62A5E643AA3F57D3417B220D0A813842@TORMAIL.algorithmics.com>
Message-ID: <4AAACD02.4040109@Sun.COM>

Hi Jeff --

On 09/11/09 14:06, jeff.lloyd at algorithmics.com wrote:
> Hi Ramki,
>
> I did not know that lower pause times and higher throughput were
> generally incompatible. Good to know - it makes sense too.
>
> I'm trying to find out how long "too long" is. Bankers can be fickle.
> :-) Honestly, I think "too long" constitutes a noticeable pause in GUI
> interactions.

So, maybe around one 200 ms pause per second or so at the most? (If you
think that is not suitable, think up a suitable figure like that.)
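The arithmetic for turning a tolerable pause cadence like this into a GC overhead budget is simple; a minimal sketch (the 200 ms per second figure is just this thread's illustrative number, not a recommendation):

```python
# If the application can tolerate one pause of pause_ms every
# interval_ms of wall-clock time, the implied GC overhead budget
# (fraction of wall time spent paused) is pause_ms / interval_ms.

def gc_overhead_budget(pause_ms, interval_ms):
    return pause_ms / interval_ms

budget = gc_overhead_budget(200, 1000)  # one 200 ms pause per second
print(f"implied GC overhead budget: {budget:.0%}")
```

Comparing that fraction against the GC overhead actually observed in the logs tells you whether the current configuration is inside or outside the budget.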
That would give us the requisite pause time budget, and it implicitly
defines a GC overhead budget of 200/1000 = 20%. That is actually quite
high, but still lower than the overhead I saw in some of your logs from
a quick browse. As Tony pointed out, that's because of the excessive
copying you were doing of relatively long-lived data, which you may be
better off tenuring more quickly and letting the concurrent collector
deal with (modulo your and Tony's earlier remarks regarding the slightly
increased pressure on the concurrent collector -- probably unavoidable
if you are to meet your pause time goals; see below).

> How did you measure the proportion of short-lived and medium-lived
> objects?

Oh, I was playing somewhat fast and loose. I was taking the ratio of
(age 1 survivors):(Eden size) to get a rough read on short:(not short).
I sampled a single GC from one of your log files, but that would be the
way to figure this out (while averaging over a sufficiently large set of
samples). Of course, "long" and "short" are relative; age 1 just tells
you what survived that was allocated in the last GC epoch. If GCs happen
frequently, less data would die and more would qualify as "not short" by
that kind of loose definition (so my "long" and "short" were relative to
the given GC period).

> We typically expect a "session" to be live for most of the day, and

How much typical session data do you have? What is the rate at which
sessions get created? Does this happen perhaps mostly at the start of
the day? (In which case you would see lots of promotion activity at the
start of the day, but not so much later in the day.) Or is the session
creation rate uniform through the typical day?

> multiple reports of seconds or minutes in duration executed within that
> session. So yes, I am seeing my "steady state" continue for a long

Let's say 1 minute.
So during that 1 minute, how much data do you produce, and of that, how
much needs to be saved into the session in the form of the "result" from
that report? Looks like that result would constitute data that you want
to tenure sooner rather than later. Depending on how long the
intermediate results needed to generate the final result live (you
mentioned large trees of intermediate objects, I think, in an earlier
email), you may want to copy them in the survivor spaces, or -- if that
data is so large as to cost excessive copying time -- just promote that
too. Luckily, in typical cases, if data wants to be large, it also wants
to live long.

> time, with blips of activity throughout the day. We cache a lot of
> results, which can lead to a general upward trend, but it doesn't seem
> to be our current source of object volume.

The cached data will tenure. Best to tenure it soon, if the proportion
of cached data is large. (I am guessing that if you cache, you probably
find it saves computation later -- so it also saves allocation later;
thus I might naively expect that you will initially tenure lots of data
as your caches fill, and later in steady state tenure less as well as
perhaps allocate less.)

If I look at one random tenuring distribution sample out of your logs,
I see:

- age 1: 2151744736 bytes, 2151744736 total
- age 2:  897330448 bytes, 3049075184 total
- age 3: 1274314280 bytes, 4323389464 total
- age 4: 1351603024 bytes, 5674992488 total
- age 5: 1529394376 bytes, 7204386864 total
- age 6: 1219001160 bytes, 8423388024 total

which is very flat -- indicating that anything that survives a scavenge
appears to live on for quite a while (lots of assumptions about steady
loads and so on). Experimenting with an MTT of 1 or 2 might be useful,
cf. your previous emails with Tony et al.
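One way to check mechanically for this kind of flat tenuring distribution is to compare the survivor volume at the oldest printed age against age 1. A sketch (the 50% cutoff is an arbitrary illustrative threshold, and the single-snapshot comparison assumes a steady load, as noted above):

```python
import re

# One -XX:+PrintTenuringDistribution sample (the one quoted above).
sample = """
- age 1: 2151744736 bytes, 2151744736 total
- age 2:  897330448 bytes, 3049075184 total
- age 3: 1274314280 bytes, 4323389464 total
- age 4: 1351603024 bytes, 5674992488 total
- age 5: 1529394376 bytes, 7204386864 total
- age 6: 1219001160 bytes, 8423388024 total
"""

# Map age -> bytes surviving at that age.
ages = {int(age): int(size) for age, size in
        re.findall(r"- age\s+(\d+):\s+(\d+) bytes", sample)}

oldest = max(ages)
# Under steady load, a rough proxy for cohort survival: how much of the
# age-1 volume is still around at the oldest printed age.
retention = ages[oldest] / ages[1]
print(f"age-{oldest} volume is {retention:.0%} of age-1 volume")
if retention > 0.5:
    print("flat distribution: survivors keep surviving, so a low "
          "MaxTenuringThreshold may cut copying, at the cost of more "
          "frequent CMS cycles")
```

For the sample above this reports roughly 57%, consistent with the "very flat" reading of the distribution.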
(Yes, you will want to increase yr OG size, as you noted, but no, it will not fill up much faster, because the rate at which you promote will be nearly the same: most data that survives a single scavenge here tends to live -- above -- for at least 6 scavenges, after which it promotes anyway; you are just promoting that same data a bit sooner without wasting effort in copying it back and forth. It is true that some small amount of intermediate data will promote, but that's probably OK.) You will then want to play with the initiating occupancy fraction once you get an idea about the rate at which it's filling up versus the rate at which CMS is able to collect versus the effect on scavenges of letting the CMS gen fill up more before collecting versus the effect of doing more frequent or less frequent CMS cycles (and its effect on mutator throughput and available CPU and memory bandwidth). Yes, as Paul noted, definitely +UseCompressedOops to relieve heap pressure (reduce GC overhead) and speed up mutators by improving cache efficiency.

-- ramki

From Paul.Hohensee at Sun.COM Fri Sep 11 17:00:02 2009 From: Paul.Hohensee at Sun.COM (Paul Hohensee) Date: Fri, 11 Sep 2009 20:00:02 -0400 Subject: Young generation configuration In-Reply-To: <4AAABE83.90308@sun.com> References: <0FCC438D62A5E643AA3F57D3417B220D0A7AE2ED@TORMAIL.algorithmics.com> <4AAA6B29.3030008@sun.com> <4AAA8759.8010904@sun.com> <0FCC438D62A5E643AA3F57D3417B220D0A813819@TORMAIL.algorithmics.com> <4AAABD8A.7000900@sun.com> <4AAABE83.90308@sun.com> Message-ID: <4AAAE482.7070200@sun.com>

Could be, but that would lead to a lot of concurrent overhead, reducing his throughput. Such a balancing act. :)

Paul

Tony Printezis wrote:
> Paul Hohensee wrote:
>> You can try out compressed pointers in 6u14. It just won't be quite as
>> fast as the version that's going into 6u18. 6u14 with compressed pointers
>> will still be quite a bit faster than without.
>> >> One of the gc guys may correct me, but UseAdaptiveGCBoundary allows >> the vm to ergonomically move the boundary between old and young >> generations, >> effectively resizing them. I don't know if it's bit-rotted, and I >> seem to remember >> that there wasn't much benefit. But maybe we just didn't have a good >> use case. >> > Also, it's ParallelGC-only, IIRC. >> What I meant by the last paragraph was that with the tenuring >> threshold set at >> 15 (which is what the log says), and with only 7 young gcs in the >> log, we can't >> see at what age (or if) between 8 and 15 the survivor size goes down >> to something >> reasonable. If it doesn't, it might be worth it to us to revisit >> increasing the age >> limit for 64-bit. >> > Paul, the problem in Jeff's case is that even at age 1 he copies 1GB > or so. So, maybe, setting a small MTT and having more CMS cycles might > be the right option for him. > > Tony >> jeff.lloyd at algorithmics.com wrote: >> >>> Thanks for your response Paul. >>> >>> I'll take another look at the parallel collector. >>> That's a good point about the -XX:+UseCompressedOops. We started off >>> with heaps bigger than 32G so I had left that option out. I'll put it >>> back in and definitely try out 6u18 when it's available. >>> >>> What about the option -XX:+UseAdaptiveGCBoundary? I don't see it >>> referenced very often. Would it be helpful in a case like mine? >>> >>> I'm not sure I understand your last paragraph. What is the period of >>> time that you would be interested in seeing? >>> >>> Jeff >>> >>> -----Original Message----- >>> From: Paul.Hohensee at Sun.COM [mailto:Paul.Hohensee at Sun.COM] Sent: >>> Friday, September 11, 2009 1:23 PM >>> To: Tony Printezis >>> Cc: Jeff Lloyd; hotspot-gc-use at openjdk.java.net >>> Subject: Re: Young generation configuration >>> >>> Another alternative mentioned in Tony and Charlie's J1 slides is the >>> parallel >>> collector. 
>>> If, as Tony says, you can make the young gen large enough to avoid
>>> promotion, and you really do have a steady state old gen, then which
>>> old gen collector you use wouldn't matter much to pause times, given
>>> that young gen pause times seem to be your immediate problem.
>>>
>>> It may be that you just need more hardware threads to collect such a
>>> big young gen too. You might vary the number of gc threads to see how
>>> that affects collection times. If there are significant differences,
>>> then you need more hardware threads, i.e., a bigger machine.
>>>
>>> You might also try using compressed pointers via -XX:+UseCompressedOops.
>>> That should cut down the total survivor size significantly, perhaps
>>> enough that your current hardware threads can collect significantly
>>> faster. Heap size will be limited to < 32gb, but your app will probably
>>> fit. A more efficient version of compressed pointers will be available
>>> in 6u18, btw.
>>>
>>> I notice that none of your logs shows more than age 7 stats even though
>>> the tenuring threshold is 15. It'd be nice to see if anything dies
>>> before then.
>>>
>>> Paul
>>>
>>> Tony Printezis wrote:
>>>> Jeff,
>>>>
>>>> Hi. I had a very brief look at your logs. Yes, your app does seem
>>>> to need to copy quite a lot (I don't think I've ever seen 1-2GB of
>>>> data being copied in age 1!!!). From what I've seen from the space
>>>> sizes, you're doing the right thing (i.e., you're consistent with
>>>> what we talked about during the talk): you have quite large young
>>>> gen and a reasonably sized old gen. But the sheer amount of
>>>> surviving objects is what's getting you. How much larger can you
>>>> make your young gen? I think in this case, the larger, the better.
>>>> Maybe, you can also try MaxTenuringThreshold=1.
This goes against our general advice, but this >>>> >>> >>>> might decrease the amount of objects being copied during young GCs, at >>>> >>> >>>> the expense of more frequent CMS cycles... >>>> >>>> Tony >>>> >>>> jeff.lloyd at algorithmics.com wrote: >>>> >>>>> Hi, >>>>> >>>>> >>>>> >>>>> I'm new to this list and I have a few questions about tuning my young >>>>> >>> >>>>> generation gc. >>>>> >>>>> >>>>> >>>>> I have chosen to use the CMS garbage collector because my application >>>>> >>> >>>>> is a relatively large reporting server that has a web front end >>>>> and therefore needs to have minimal pauses. >>>>> >>>>> >>>>> I am using java 1.6.0_16 64-bit on redhat 5.2 intel 8x3GHz and 64GB >>>>> >>> ram. >>> >>>>> >>>>> >>>>> The machine is dedicated to this JVM. >>>>> >>>>> >>>>> >>>>> My steady-state was calculated as follows: >>>>> >>>>> - A typical number of users logged in and viewed several >>>>> >>> reports >>> >>>>> - Stopped user actions and performed a manual full GC >>>>> >>>>> - Look at the amount of heap used and take that number as >>>>> >>> the >>>>> steady-state memory requirement >>>>> >>>>> >>>>> >>>>> In this case my heap usage was ~10GB. In order to handle variance or >>>>> >>> >>>>> spikes I sized my old generation at 15-20GB. >>>>> >>>>> >>>>> >>>>> I sized my young generation at 32-42GB and used survivor ratios of 1, >>>>> >>> >>>>> 2, 3 and 6. >>>>> >>>>> >>>>> >>>>> My goal is to maximize throughput and minimize pauses. I'm willing >>>>> >>> to >>>>> sacrifice ram to increase speed. >>>>> >>>>> >>>>> >>>>> I have attached several of my many gc logs. The file gc_48G.txt >>>>> is just using CMS without any other tuning, and the results are >>>>> much worse than what I have been able to accomplish with other >>>>> settings. The best results are in the files >>>>> gc_52G_20Gold_32Gyoung_2sr.txt and gc_57G_15Gold_42Gyoung_1sr.txt. >>>>> >>>>> >>>>> >>>>> The problem is that some of the pauses are just too long. 
>>>>> Is there a way to reduce the pause time any more than I have it now?
>>>>>
>>>>> Am I heading in the right direction? I ask because the default
>>>>> settings are so different than what I have been heading towards.
>>>>>
>>>>> The best reference I have found on what good gc logs look like comes
>>>>> from brief examples presented at JavaOne this year by Tony Printezis
>>>>> and Charlie Hunt. But I don't seem to be able to get logs that
>>>>> resemble their tenuring patterns.
>>>>>
>>>>> I think I have a lot of medium-lived objects instead of nice
>>>>> short-lived ones.
>>>>>
>>>>> Are there any good practices for apps with objects like this?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Jeff
>>>>>
>>> ------------------------------------------------------------------------
>>>>> This email and any files transmitted with it are confidential and
>>>>> proprietary to Algorithmics Incorporated and its affiliates
>>>>> ("Algorithmics"). If received in error, use is prohibited. Please
>>>>> destroy, and notify sender. Sender does not waive confidentiality
>>>>> or privilege. Internet communications cannot be guaranteed to be
>>>>> timely, secure, error or virus-free. Algorithmics does not accept
>>>>> liability for any errors or omissions. Any commitment intended to
>>>>> bind Algorithmics must be reduced to writing and signed by an
>>>>> authorized signatory.
>>> ------------------------------------------------------------------------
>>>>> _______________________________________________
>>>>> hotspot-gc-use mailing list
>>>>> hotspot-gc-use at openjdk.java.net
>>>>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
>>>
>> _______________________________________________
>> hotspot-gc-use mailing list
>> hotspot-gc-use at openjdk.java.net
>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use

From tcogan50 at gmail.com Fri Sep 11 19:54:33 2009 From: tcogan50 at gmail.com (tcogan50 at gmail.com) Date: Fri, 11 Sep 2009 22:54:33 -0400 Subject: hotspot-gc-use Digest, Vol 21, Issue 7 Message-ID: <4aab0d91.1402be0a.1873.5dfa@mx.google.com>

stop

From jeff.lloyd at algorithmics.com Mon Sep 14 13:12:15 2009 From: jeff.lloyd at algorithmics.com (jeff.lloyd at algorithmics.com) Date: Mon, 14 Sep 2009 16:12:15 -0400 Subject: Young generation configuration In-Reply-To: <4AAABD8A.7000900@sun.com> References: <0FCC438D62A5E643AA3F57D3417B220D0A7AE2ED@TORMAIL.algorithmics.com> <4AAA6B29.3030008@sun.com> <4AAA8759.8010904@sun.com> <0FCC438D62A5E643AA3F57D3417B220D0A813819@TORMAIL.algorithmics.com> <4AAABD8A.7000900@sun.com> Message-ID: <0FCC438D62A5E643AA3F57D3417B220D0A813E38@TORMAIL.algorithmics.com>

Ah - I see what you mean about the last paragraph. I hadn't counted the number of gc's relative to the mtt yet.

For what it's worth, the pause time to collect that much YG garbage is too large for me, so I'll be decreasing the YG anyway.

Thanks again.

Jeff

-----Original Message-----
From: Paul.Hohensee at Sun.COM [mailto:Paul.Hohensee at Sun.COM]
Sent: Friday, September 11, 2009 5:14 PM
To: Jeff Lloyd
Cc: hotspot-gc-use at openjdk.java.net
Subject: Re: Young generation configuration

You can try out compressed pointers in 6u14. It just won't be quite as fast as the version that's going into 6u18. 6u14 with compressed pointers will still be quite a bit faster than without.

One of the gc guys may correct me, but UseAdaptiveGCBoundary allows the vm to ergonomically move the boundary between old and young generations, effectively resizing them. I don't know if it's bit-rotted, and I seem to remember that there wasn't much benefit. But maybe we just didn't have a good use case.

What I meant by the last paragraph was that with the tenuring threshold set at 15 (which is what the log says), and with only 7 young gcs in the log, we can't see at what age (or if) between 8 and 15 the survivor size goes down to something reasonable. If it doesn't, it might be worth it to us to revisit increasing the age limit for 64-bit.
Paul jeff.lloyd at algorithmics.com wrote: > Thanks for your response Paul. > > I'll take another look at the parallel collector. > > That's a good point about the -XX:+UseCompressedOops. We started off > with heaps bigger than 32G so I had left that option out. I'll put it > back in and definitely try out 6u18 when it's available. > > What about the option -XX:+UseAdaptiveGCBoundary? I don't see it > referenced very often. Would it be helpful in a case like mine? > > I'm not sure I understand your last paragraph. What is the period of > time that you would be interested in seeing? > > Jeff > > -----Original Message----- > From: Paul.Hohensee at Sun.COM [mailto:Paul.Hohensee at Sun.COM] > Sent: Friday, September 11, 2009 1:23 PM > To: Tony Printezis > Cc: Jeff Lloyd; hotspot-gc-use at openjdk.java.net > Subject: Re: Young generation configuration > > Another alternative mentioned in Tony and Charlie's J1 slides is the > parallel > collector. If, as Tony says, you can make the young gen large enough to > > avoid > promotion, and you really do have a steady state old gen, then which old > gen > collector you use wouldn't matter much to pause times, given that young > gen pause times seem to be your immediate problem. > > It may be that you just need more hardware threads to collect such a big > > young > gen too. You might vary the number of gc threads to see how that > affects > collection times. If there's significant differences, then you need > more > hardware threads, i.e., a bigger machine. > > You might also try using compressed pointers via -XX:+UseCompressedOops. > That should cut down the total survivor size significantly, perhaps > enough > to that your current hardware threads can collect significantly faster. > > Heap size > will be limited to < 32gb, but you're app will probably fit. A more > efficient > version of compressed pointers will be available in 6u18, btw. 
> > I notice that none of your logs shows more than age 7 stats even though > the > tenuring threshold is 15. It'd be nice to see if anything dies before > then. > > Paul > > Tony Printezis wrote: > >> Jeff, >> >> Hi. I had a very brief look at your logs. Yes, your app does seem to >> need to copy quite a lot (I don't think I've ever seen 1-2GB of data >> being copied in age 1!!!). From what I've seen from the space sizes, >> you're doing the right thing (i.e., you're consistent with what we >> talked about during the talk): you have quite large young gen and a >> reasonably sized old gen. But the sheer amount of surviving objects is >> > > >> what's getting you. How much larger can you make your young gen? I >> > think > >> in this case, the larger, the better. Maybe, you can also try >> MaxTenuringThreshold=1. This goes against our general advice, but this >> > > >> might decrease the amount of objects being copied during young GCs, at >> > > >> the expense of more frequent CMS cycles... >> >> Tony >> >> jeff.lloyd at algorithmics.com wrote: >> >> >>> Hi, >>> >>> >>> >>> I'm new to this list and I have a few questions about tuning my young >>> > > >>> generation gc. >>> >>> >>> >>> I have chosen to use the CMS garbage collector because my application >>> > > >>> is a relatively large reporting server that has a web front end and >>> therefore needs to have minimal pauses. >>> >>> >>> >>> I am using java 1.6.0_16 64-bit on redhat 5.2 intel 8x3GHz and 64GB >>> > ram. > >>> >>> >>> The machine is dedicated to this JVM. >>> >>> >>> >>> My steady-state was calculated as follows: >>> >>> - A typical number of users logged in and viewed several >>> > reports > >>> - Stopped user actions and performed a manual full GC >>> >>> - Look at the amount of heap used and take that number as >>> > the > >>> steady-state memory requirement >>> >>> >>> >>> In this case my heap usage was ~10GB. In order to handle variance or >>> > > >>> spikes I sized my old generation at 15-20GB. 
>>> >>> >>> >>> I sized my young generation at 32-42GB and used survivor ratios of 1, >>> > > >>> 2, 3 and 6. >>> >>> >>> >>> My goal is to maximize throughput and minimize pauses. I'm willing >>> > to > >>> sacrifice ram to increase speed. >>> >>> >>> >>> I have attached several of my many gc logs. The file gc_48G.txt is >>> just using CMS without any other tuning, and the results are much >>> worse than what I have been able to accomplish with other settings. >>> The best results are in the files gc_52G_20Gold_32Gyoung_2sr.txt and >>> gc_57G_15Gold_42Gyoung_1sr.txt. >>> >>> >>> >>> The problem is that some of the pauses are just too long. >>> >>> >>> >>> Is there a way to reduce the pause time any more than I have it now? >>> >>> Am I heading in the right direction? I ask because the default >>> settings are so different than what I have been heading towards. >>> >>> >>> >>> The best reference I have found on what good gc logs look like come >>> from brief examples presented at JavaOne this year by Tony Printezis >>> and Charlie Hunt. But I don't seem to be able to get logs that >>> resemble their tenuring patterns. >>> >>> >>> >>> I think I have a lot of medium-lived objects instead of nice >>> short-lived ones. >>> >>> >>> >>> Are there any good practices for apps with objects like this? >>> >>> >>> >>> Thanks, >>> >>> Jeff >>> >>> >>> >>> >>> >>> > ------------------------------------------------------------------------ > >>> This email and any files transmitted with it are confidential and >>> proprietary to Algorithmics Incorporated and its affiliates >>> ("Algorithmics"). If received in error, use is prohibited. Please >>> destroy, and notify sender. Sender does not waive confidentiality or >>> privilege. Internet communications cannot be guaranteed to be timely, >>> > > >>> secure, error or virus-free. Algorithmics does not accept liability >>> for any errors or omissions. 
Any commitment intended to bind >>> Algorithmics must be reduced to writing and signed by an authorized >>> signatory. >>> >>> > ------------------------------------------------------------------------ > > ------------------------------------------------------------------------ > >>> _______________________________________________ >>> hotspot-gc-use mailing list >>> hotspot-gc-use at openjdk.java.net >>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>> >>> >> >> > > > ------------------------------------------------------------------------ -- > This email and any files transmitted with it are confidential and proprietary to Algorithmics Incorporated and its affiliates ("Algorithmics"). If received in error, use is prohibited. Please destroy, and notify sender. Sender does not waive confidentiality or privilege. Internet communications cannot be guaranteed to be timely, secure, error or virus-free. Algorithmics does not accept liability for any errors or omissions. Any commitment intended to bind Algorithmics must be reduced to writing and signed by an authorized signatory. > ------------------------------------------------------------------------ -- > -------------------------------------------------------------------------- This email and any files transmitted with it are confidential and proprietary to Algorithmics Incorporated and its affiliates ("Algorithmics"). If received in error, use is prohibited. Please destroy, and notify sender. Sender does not waive confidentiality or privilege. Internet communications cannot be guaranteed to be timely, secure, error or virus-free. Algorithmics does not accept liability for any errors or omissions. Any commitment intended to bind Algorithmics must be reduced to writing and signed by an authorized signatory. 
-------------------------------------------------------------------------- From jeff.lloyd at algorithmics.com Mon Sep 14 14:12:04 2009 From: jeff.lloyd at algorithmics.com (jeff.lloyd at algorithmics.com) Date: Mon, 14 Sep 2009 17:12:04 -0400 Subject: Young generation configuration In-Reply-To: <4AAAE482.7070200@sun.com> References: <0FCC438D62A5E643AA3F57D3417B220D0A7AE2ED@TORMAIL.algorithmics.com> <4AAA6B29.3030008@sun.com> <4AAA8759.8010904@sun.com> <0FCC438D62A5E643AA3F57D3417B220D0A813819@TORMAIL.algorithmics.com> <4AAABD8A.7000900@sun.com> <4AAABE83.90308@sun.com> <4AAAE482.7070200@sun.com> Message-ID: <0FCC438D62A5E643AA3F57D3417B220D0A813ED4@TORMAIL.algorithmics.com> Is there somewhere I can download a balancing pole? :-) Thanks, you guys have been great help. Jeff -----Original Message----- From: Paul.Hohensee at Sun.COM [mailto:Paul.Hohensee at Sun.COM] Sent: Friday, September 11, 2009 8:00 PM To: Tony Printezis Cc: Jeff Lloyd; hotspot-gc-use at openjdk.java.net Subject: Re: Young generation configuration Could be, but that would lead to a lot of concurrent overhead, reducing his throughput. Such a balancing act. :) Paul Tony Printezis wrote: > > > Paul Hohensee wrote: >> You can try out compressed pointers in 6u14. It just won't be quite as >> fast as the version that's going into 6u18. 6u14 with compressed >> pointers >> will still be quite a bit faster than without. >> >> One of the gc guys may correct me, but UseAdaptiveGCBoundary allows >> the vm to ergonomically move the boundary between old and young >> generations, >> effectively resizing them. I don't know if it's bit-rotted, and I >> seem to remember >> that there wasn't much benefit. But maybe we just didn't have a good >> use case. >> > Also, it's ParallelGC-only, IIRC. 
>> What I meant by the last paragraph was that with the tenuring >> threshold set at >> 15 (which is what the log says), and with only 7 young gcs in the >> log, we can't >> see at what age (or if) between 8 and 15 the survivor size goes down >> to something >> reasonable. If it doesn't, it might be worth it to us to revisit >> increasing the age >> limit for 64-bit. >> > Paul, the problem in Jeff's case is that even at age 1 he copies 1GB > or so. So, maybe, setting a small MTT and having more CMS cycles might > be the right option for him. > > Tony >> jeff.lloyd at algorithmics.com wrote: >> >>> Thanks for your response Paul. >>> >>> I'll take another look at the parallel collector. >>> That's a good point about the -XX:+UseCompressedOops. We started off >>> with heaps bigger than 32G so I had left that option out. I'll put it >>> back in and definitely try out 6u18 when it's available. >>> >>> What about the option -XX:+UseAdaptiveGCBoundary? I don't see it >>> referenced very often. Would it be helpful in a case like mine? >>> >>> I'm not sure I understand your last paragraph. What is the period of >>> time that you would be interested in seeing? >>> >>> Jeff >>> >>> -----Original Message----- >>> From: Paul.Hohensee at Sun.COM [mailto:Paul.Hohensee at Sun.COM] Sent: >>> Friday, September 11, 2009 1:23 PM >>> To: Tony Printezis >>> Cc: Jeff Lloyd; hotspot-gc-use at openjdk.java.net >>> Subject: Re: Young generation configuration >>> >>> Another alternative mentioned in Tony and Charlie's J1 slides is the >>> parallel >>> collector. If, as Tony says, you can make the young gen large >>> enough to >>> >>> avoid >>> promotion, and you really do have a steady state old gen, then which >>> old >>> gen >>> collector you use wouldn't matter much to pause times, given that young >>> gen pause times seem to be your immediate problem. >>> >>> It may be that you just need more hardware threads to collect such a >>> big >>> >>> young >>> gen too. 
You might vary the number of gc threads to see how that >>> affects >>> collection times. If there's significant differences, then you need >>> more >>> hardware threads, i.e., a bigger machine. >>> >>> You might also try using compressed pointers via >>> -XX:+UseCompressedOops. >>> That should cut down the total survivor size significantly, perhaps >>> enough >>> to that your current hardware threads can collect significantly faster. >>> >>> Heap size >>> will be limited to < 32gb, but you're app will probably fit. A more >>> efficient >>> version of compressed pointers will be available in 6u18, btw. >>> >>> I notice that none of your logs shows more than age 7 stats even though >>> the >>> tenuring threshold is 15. It'd be nice to see if anything dies before >>> then. >>> >>> Paul >>> >>> Tony Printezis wrote: >>> >>>> Jeff, >>>> >>>> Hi. I had a very brief look at your logs. Yes, your app does seem >>>> to need to copy quite a lot (I don't think I've ever seen 1-2GB of >>>> data being copied in age 1!!!). From what I've seen from the space >>>> sizes, you're doing the right thing (i.e., you're consistent with >>>> what we talked about during the talk): you have quite large young >>>> gen and a reasonably sized old gen. But the sheer amount of >>>> surviving objects is >>>> >>> >>>> what's getting you. How much larger can you make your young gen? I >>>> >>> think >>>> in this case, the larger, the better. Maybe, you can also try >>>> MaxTenuringThreshold=1. This goes against our general advice, but this >>>> >>> >>>> might decrease the amount of objects being copied during young GCs, at >>>> >>> >>>> the expense of more frequent CMS cycles... >>>> >>>> Tony >>>> >>>> jeff.lloyd at algorithmics.com wrote: >>>> >>>>> Hi, >>>>> >>>>> >>>>> >>>>> I'm new to this list and I have a few questions about tuning my young >>>>> >>> >>>>> generation gc. 
>>>>> >>>>> >>>>> >>>>> I have chosen to use the CMS garbage collector because my application >>>>> >>> >>>>> is a relatively large reporting server that has a web front end >>>>> and therefore needs to have minimal pauses. >>>>> >>>>> >>>>> I am using java 1.6.0_16 64-bit on redhat 5.2 intel 8x3GHz and 64GB >>>>> >>> ram. >>> >>>>> >>>>> >>>>> The machine is dedicated to this JVM. >>>>> >>>>> >>>>> >>>>> My steady-state was calculated as follows: >>>>> >>>>> - A typical number of users logged in and viewed several >>>>> >>> reports >>> >>>>> - Stopped user actions and performed a manual full GC >>>>> >>>>> - Look at the amount of heap used and take that number as >>>>> >>> the >>>>> steady-state memory requirement >>>>> >>>>> >>>>> >>>>> In this case my heap usage was ~10GB. In order to handle variance or >>>>> >>> >>>>> spikes I sized my old generation at 15-20GB. >>>>> >>>>> >>>>> >>>>> I sized my young generation at 32-42GB and used survivor ratios of 1, >>>>> >>> >>>>> 2, 3 and 6. >>>>> >>>>> >>>>> >>>>> My goal is to maximize throughput and minimize pauses. I'm willing >>>>> >>> to >>>>> sacrifice ram to increase speed. >>>>> >>>>> >>>>> >>>>> I have attached several of my many gc logs. The file gc_48G.txt >>>>> is just using CMS without any other tuning, and the results are >>>>> much worse than what I have been able to accomplish with other >>>>> settings. The best results are in the files >>>>> gc_52G_20Gold_32Gyoung_2sr.txt and gc_57G_15Gold_42Gyoung_1sr.txt. >>>>> >>>>> >>>>> >>>>> The problem is that some of the pauses are just too long. >>>>> >>>>> >>>>> >>>>> Is there a way to reduce the pause time any more than I have it now? >>>>> >>>>> Am I heading in the right direction? I ask because the default >>>>> settings are so different than what I have been heading towards. 
>>>>> >>>>> >>>>> >>>>> The best reference I have found on what good gc logs look like >>>>> come from brief examples presented at JavaOne this year by Tony >>>>> Printezis and Charlie Hunt. But I don't seem to be able to get >>>>> logs that resemble their tenuring patterns. >>>>> >>>>> >>>>> >>>>> I think I have a lot of medium-lived objects instead of nice >>>>> short-lived ones. >>>>> >>>>> >>>>> >>>>> Are there any good practices for apps with objects like this? >>>>> >>>>> >>>>> >>>>> Thanks, >>>>> >>>>> Jeff >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>> ------------------------------------------------------------------------ >>> >>> >>>>> This email and any files transmitted with it are confidential and >>>>> proprietary to Algorithmics Incorporated and its affiliates >>>>> ("Algorithmics"). If received in error, use is prohibited. Please >>>>> destroy, and notify sender. Sender does not waive confidentiality >>>>> or privilege. Internet communications cannot be guaranteed to be >>>>> timely, >>>>> >>> >>>>> secure, error or virus-free. Algorithmics does not accept >>>>> liability for any errors or omissions. Any commitment intended to >>>>> bind Algorithmics must be reduced to writing and signed by an >>>>> authorized signatory. >>>>> >>>>> >>> ------------------------------------------------------------------------ >>> >>> >>> ------------------------------------------------------------------------ >>> >>> >>>>> _______________________________________________ >>>>> hotspot-gc-use mailing list >>>>> hotspot-gc-use at openjdk.java.net >>>>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >>>>> >>>> >>> >>> ------------------------------------------------------------------------ -- >>> >>> This email and any files transmitted with it are confidential and >>> proprietary to Algorithmics Incorporated and its affiliates >>> ("Algorithmics"). If received in error, use is prohibited. Please >>> destroy, and notify sender. 
Sender does not waive confidentiality or >>> privilege. Internet communications cannot be guaranteed to be >>> timely, secure, error or virus-free. Algorithmics does not accept >>> liability for any errors or omissions. Any commitment intended to >>> bind Algorithmics must be reduced to writing and signed by an >>> authorized signatory. >>> ------------------------------------------------------------------------ -- >>> >>> >> _______________________________________________ >> hotspot-gc-use mailing list >> hotspot-gc-use at openjdk.java.net >> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use >> > From jeff.lloyd at algorithmics.com Mon Sep 14 14:52:41 2009 From: jeff.lloyd at algorithmics.com (jeff.lloyd at algorithmics.com) Date: Mon, 14 Sep 2009 17:52:41 -0400 Subject: Young generation configuration In-Reply-To: <4AAACD02.4040109@Sun.COM> References: <0FCC438D62A5E643AA3F57D3417B220D0A7AE2ED@TORMAIL.algorithmics.com> <4AAA6B29.3030008@sun.com> <4AAA8759.8010904@sun.com> <4AAA9362.3080009@Sun.COM> <0FCC438D62A5E643AA3F57D3417B220D0A813842@TORMAIL.algorithmics.com> <4AAACD02.4040109@Sun.COM> Message-ID: <0FCC438D62A5E643AA3F57D3417B220D0A813EFA@TORMAIL.algorithmics.com> Thanks for all the information Ramki. 
I had to lower my YG to 1G in order to reduce my typical YG GC to under one second, and under .5 sec for many gc's. I'm now playing with initiating occupancy fraction settings to avoid the CMS failures I'm getting. But it's looking so much better. Our app login and logout produces truck loads of garbage, so figuring out the initiating occupancy fraction settings is a bit tricky. Everything is definitely much clearer now. Thanks! Jeff -----Original Message----- From: Y.S.Ramakrishna at Sun.COM [mailto:Y.S.Ramakrishna at Sun.COM] Sent: Friday, September 11, 2009 6:20 PM To: Jeff Lloyd Cc: hotspot-gc-use at openjdk.java.net Subject: Re: Young generation configuration Hi Jeff -- On 09/11/09 14:06, jeff.lloyd at algorithmics.com wrote: > Hi Ramki, > > I did not know that lower pause times and higher throughput were > generally incompatible. Good to know - it makes sense too. > > I'm trying to find out how long "too long" is. Bankers can be fickle. > :-) Honestly, I think "too long" constitutes a noticeable pause in GUI > interactions. So, maybe around one 200 ms pause per second or so at the most? (If you think that is not suitable, think up a suitable figure like that.) That would give us the requisite pause time budget and implicitly define a GC overhead budget of 200/1000 = 20% (which is actually quite high, but still lower than the overhead I saw in some of your logs from a quick browse). As Tony pointed out, that's because of the excessive copying you were doing of relatively long-lived data that you may be better off tenuring more quickly and letting the concurrent collector deal with it (modulo yr & Tony's earlier remarks re the slightly increased pressure (see below) -- probably unavoidable if you are to meet yr pause time goals -- on the concurrent collector). > > How did you measure the proportion of short-lived and medium-lived > objects? Oh, I was playing somewhat fast and loose. 
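The pause-budget arithmetic in that reply (one 200 ms pause per 1000 ms period = a 20% GC overhead budget) can be sketched as a tiny helper; the function name is illustrative, not from any JVM tool:

```python
def gc_overhead(pause_ms, period_ms):
    """Fraction of wall-clock time spent in GC pauses.

    One 200 ms pause per 1000 ms period implies a 20% overhead budget.
    """
    return pause_ms / period_ms

print(gc_overhead(200, 1000))  # 0.2
```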
I was taking the ratio of (age 1 survivors): (Eden size) to get a rough read on the short:(not short). I sampled a single GC from one of yr log files, but that would be the way to figure this out (while averaging over a sufficiently large set of samples). Of course, "long" and "short" are relative, and age 1 just tells you what survived that was allocated in the last GC epoch. If GCs happen frequently, less data would die and more would qualify as "not short" by that kind of loose definition (so my "long" and "short" were relative to the given GC period). > > We typically expect a "session" be live for most of the day, and How much typical session data do you have? What is the rate at which sessions get created? Does this happen perhaps mostly at the start of the day? (In which case you would see lots of promotion activity at the start of the day, but not so much later in the day?) Or is the session creation rate uniform through the typical day? > multiple reports of seconds or minutes in duration executed within that > session. So yes, I am seeing my "steady state" continue for a long Let's say 1 minute. So during that 1 minute, how much data do you produce and of that how much needs to be saved into the session in the form of the "result" from that report? Looks like that result would constitute data that you want to tenure sooner rather than later. Depending on how long the intermediate results needed to generate the final result are needed (you mentioned large trees of intermediate objects, I think, in an earlier email), you may want to copy them in the survivor spaces, or -- if that data is so large as to cost excessive copying time -- just promote that too. Luckily, in typical cases, if data wants to be large, it also wants to live long. > time, with blips of activity throughout the day. We cache a lot of > results, which can lead to a general upward trend, but it doesn't seem > to be our current source of object volume. The cached data will tenure. 
Best to tenure it soon, if the proportion of cached data is large. (I am guessing that if you cache, you probably find it saves computation later -- so it also saves allocation later; thus I might naively expect that you will initially tenure lots of data as your caches fill, and later in steady state tenure less as well as perhaps allocate less.) If I look at one random tenuring distribution sample out of yr logs, I see:
- age 1: 2151744736 bytes, 2151744736 total
- age 2:  897330448 bytes, 3049075184 total
- age 3: 1274314280 bytes, 4323389464 total
- age 4: 1351603024 bytes, 5674992488 total
- age 5: 1529394376 bytes, 7204386864 total
- age 6: 1219001160 bytes, 8423388024 total
which is very flat -- indicating that anything that survives a scavenge appears to live on for quite a while (lots of assumptions about steady loads and so on). Experimenting with an MTT of 1 or 2 might be useful, cf yr previous emails with Tony et al. (Yes you will want to increase yr OG size, as you noted, but no it will not fill up much faster because the rate at which you promote will be nearly the same, because most data that survives a single scavenge here tends to live -- above -- for at least 6 scavenges after which it promotes anyway; you are just promoting that same data a bit sooner without wasting effort in copying it back and forth. It is true that some small amount of intermediate data will promote but that's probably OK). You will then want to play with initiating occupancy fraction once you get an idea about the rate at which it's filling up versus the rate at which CMS is able to collect versus the effect on scavenges of letting the CMS gen fill up more before collecting versus the effect of doing more frequent or less frequent CMS cycles (and its effect on mutator throughput and available CPU and memory bandwidth). Yes, as Paul noted, definitely +UseCompressedOops to relieve heap pressure (reduce GC overhead) and speed up mutators by improving cache efficiency. 
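Pulling such a per-age breakdown out of a -XX:+PrintTenuringDistribution log can be done mechanically. A sketch, using the sample lines quoted above; the "flatness" heuristic at the end is ad hoc, not anything the JVM computes:

```python
import re

# Sample lines in the format produced by -XX:+PrintTenuringDistribution,
# taken from the log excerpt quoted above.
SAMPLE_LOG = """\
- age   1: 2151744736 bytes, 2151744736 total
- age   2:  897330448 bytes, 3049075184 total
- age   3: 1274314280 bytes, 4323389464 total
- age   4: 1351603024 bytes, 5674992488 total
- age   5: 1529394376 bytes, 7204386864 total
- age   6: 1219001160 bytes, 8423388024 total
"""

AGE_LINE = re.compile(r"- age\s+(\d+):\s+(\d+) bytes")

def per_age_bytes(log_text):
    """Return [(age, bytes surviving at that age), ...]."""
    return [(int(a), int(b)) for a, b in AGE_LINE.findall(log_text)]

ages = per_age_bytes(SAMPLE_LOG)
# "Flat" here: the oldest age still holds more than half the bytes of
# age 1, i.e. whatever survives one scavenge tends to keep surviving --
# exactly the case where a low MaxTenuringThreshold can cut copying costs.
is_flat = ages[-1][1] > ages[0][1] // 2
```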
-- ramki From jeff.lloyd at algorithmics.com Thu Sep 17 07:52:48 2009 From: jeff.lloyd at algorithmics.com (jeff.lloyd at algorithmics.com) Date: Thu, 17 Sep 2009 10:52:48 -0400 Subject: GC working well now - thanks! Message-ID: <0FCC438D62A5E643AA3F57D3417B220D0A888FA3@TORMAIL.algorithmics.com> Hi, I just wanted to say thank you very much to everyone who gave me some time on this list. You've been very helpful, and I believe my problem is solved. For anyone who is interested, I took the old-school approach to using the CMS collector: The only way to reduce the gui pauses was to make the YG relatively small - in our case 1G. That kept the ParNew pauses under 1 second most of the time, and the GUI felt responsive. However I started getting CMS failures so I radically changed my OG size. Since my steady-state size is 10G, I decided to give myself a 50% buffer and leave 5G for quick tenuring of temporary objects that survived the ParNew YG GC. Then since my machine has lots of physical ram I set the initiating occupancy fraction to 50%, and the total OG size at 30G. That's probably higher than it needs to be, but at 20G I was still getting CMS failures followed by a full GC. 
Below is the full set of GC parameters I used: -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:gc.txt -XX:+PrintTenuringDistribution -XX:+UseConcMarkSweepGC -Xmn1g -XX:CMSInitiatingOccupancyFraction=50 -XX:+DoEscapeAnalysis -XX:+UseCompressedOops I'm attaching the log file for anyone who may be curious to see what it looks like. When I view it in Visual GC the YG is very active and the OG has long rolling hills with room to spare at the top of the hills. Thanks again. Jeff -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20090917/cc54130a/attachment-0001.html -------------- next part -------------- A non-text attachment was scrubbed... 
Name: gc3_31_30old_1young_50iof.zip Type: application/x-zip-compressed Size: 22253 bytes Desc: gc3_31_30old_1young_50iof.zip Url : http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20090917/cc54130a/attachment-0001.bin From Sujit.Das at cognizant.com Tue Sep 22 12:46:51 2009 From: Sujit.Das at cognizant.com (Sujit.Das at cognizant.com) Date: Wed, 23 Sep 2009 01:16:51 +0530 Subject: Question on ParallelGCThreads Message-ID: <19B27FD5AF2EAA49A66F787911CF519596D051@CTSINCHNSXUU.cts.com> Hi All, We use CMS collector for old generation collection (option -XX:+UseConcMarkSweepGC) and parallel copying collector for young generation (option -XX:+ UseParNewGC). We use ParallelGCThreads command line option (-XX:ParallelGCThreads=) to control number of garbage collector threads. My question is: 1. Is ParallelGCThreads option applicable for only minor GC or is it applicable for old generation GC also? 2. Since CMS collector is a non-compacting collector and if application faces memory fragmentation issue then reducing # of ParallelGCThreads is an option to reduce fragmentation. Please confirm the understanding. This is based on my reading that each garbage collection thread reserves a part of the old generation for promotions and the division of the available space into these "promotion buffers" can cause a fragmentation effect. Reducing the number of garbage collector threads will reduce this fragmentation effect as will increasing the size of the old generation. Thanks, Sujit This e-mail and any files transmitted with it are for the sole use of the intended recipient(s) and may contain confidential and privileged information. If you are not the intended recipient, please contact the sender by reply e-mail and destroy all copies of the original message. Any unauthorized review, use, disclosure, dissemination, forwarding, printing or copying of this email or any action taken in reliance on this e-mail is strictly prohibited and may be unlawful. 
-------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20090923/22947409/attachment.html From Jon.Masamitsu at Sun.COM Tue Sep 22 13:56:12 2009 From: Jon.Masamitsu at Sun.COM (Jon Masamitsu) Date: Tue, 22 Sep 2009 13:56:12 -0700 Subject: Question on ParallelGCThreads In-Reply-To: <19B27FD5AF2EAA49A66F787911CF519596D051@CTSINCHNSXUU.cts.com> References: <19B27FD5AF2EAA49A66F787911CF519596D051@CTSINCHNSXUU.cts.com> Message-ID: <4AB939EC.2070206@Sun.COM> On 09/22/09 12:46, Sujit.Das at cognizant.com wrote: > Hi All, > > We use CMS collector for old generation collection (option > -XX:+UseConcMarkSweepGC) and parallel copying collector for young > generation (option -XX:+ UseParNewGC). We use ParallelGCThreads command > line option (-XX:ParallelGCThreads=) to control number > of garbage collector threads. > > My question is: > 1. Is ParallelGCThreads option applicable for only minor GC or is it > applicable for old generation GC also? Parts of the old generation collection that stop-the-world and do work with multiple GC threads also use ParallelGCThreads. This would be the initial-mark and remark phases assuming you're using a recent release (parallel initial-mark and parallel remark were not in the first release of CMS). Additionally, the concurrent marking that uses multiple GC threads (introduced in jdk6) may be affected by ParallelGCThreads. The number of GC threads used in the concurrent marking is a fraction of ParallelGCThreads. > > 2. Since CMS collector is a non-compacting collector and if application > faces memory fragmentation issue then reducing # of ParallelGCThreads is > an option to reduce fragmentation. Please confirm the understanding. > This is based on my reading that each garbage collection thread reserves > a part of the old generation for promotions and the division of the > available space into these "promotion buffers" can cause a fragmentation > effect. 
> Reducing the number of garbage collector threads will reduce this
> fragmentation effect, as will increasing the size of the old
> generation.

Yes, the promotion-local-allocation-buffers (PLABs) can fragment the old
generation, although that is not the most common cause. There might have
been an investigation of this type of fragmentation recently; I'll ask
around. Increasing the size of the old gen ameliorates the effects of
fragmentation by giving objects more time to die and allowing CMS more
time to coalesce dead space into larger blocks.

> Thanks,
> Sujit
>
> This e-mail and any files transmitted with it are for the sole use of
> the intended recipient(s) and may contain confidential and privileged
> information. If you are not the intended recipient, please contact the
> sender by reply e-mail and destroy all copies of the original message.
> Any unauthorized review, use, disclosure, dissemination, forwarding,
> printing or copying of this email or any action taken in reliance on
> this e-mail is strictly prohibited and may be unlawful.

_______________________________________________
hotspot-gc-use mailing list
hotspot-gc-use at openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use

From Y.S.Ramakrishna at Sun.COM Mon Sep 28 16:37:26 2009
From: Y.S.Ramakrishna at Sun.COM (Y.S.Ramakrishna at Sun.COM)
Date: Mon, 28 Sep 2009 16:37:26 -0700
Subject: unexplained CMS pauses
In-Reply-To: <4AC145AE.30804@sun.com>
References: <4AC0EEAE.5010705@Sun.COM> <4AC145AE.30804@sun.com>
Message-ID: <4AC148B6.7010608@Sun.COM>

Hi Paul, that would be 100 us + minor gc time (= 30 ms) = 30.1 ms.
It does not explain the 10- to 40-fold increase in txn times observed
here (of 300 ms - 1200 ms).

-- ramki

On 09/28/09 16:24, Paul Hohensee wrote:
> Might have nothing to do with concurrent mark running, rather with
> minor collections happening during a java->native call.
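[Editor's sketch] Jon's point that concurrent marking uses only a
fraction of ParallelGCThreads can be made concrete. The formula below is
an assumption based on HotSpot sources of this era, where the default
concurrent marking thread count is roughly (ParallelGCThreads + 3) / 4
with integer division; check your JDK's sources for the exact rule.

```python
def default_conc_gc_threads(parallel_gc_threads):
    """Assumed HotSpot-era default: concurrent marking uses roughly a
    quarter of ParallelGCThreads, rounded up (exact rule may differ)."""
    return (parallel_gc_threads + 3) // 4

# e.g. with -XX:ParallelGCThreads=8, concurrent marking would use 2 threads
for n in (2, 4, 8, 16):
    print(n, "->", default_conc_gc_threads(n))
```

So lowering ParallelGCThreads also throttles concurrent marking, which
is worth keeping in mind when tuning for the fragmentation effect
discussed above.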
>
> Minor collections are stop-the-world, and if one occurs during a
> java->native call, the java thread making the call to native will
> block on return from the call to native until the minor collection is
> over. If the outliers are outliers in the time it takes to execute the
> native call, and the minor gc duration and timing are exactly right,
> you could end up spending most of the minor gc time blocked. Pause
> time would then be 100ms + minor gc time.
>
> Paul
>
> Shane Cox wrote:
>> Ramki,
>> We're running Solaris 10 on 8-core Intel Xeons. 1 Java instance per
>> box. JRE 1.6, update 14. 64-bit JVM.
>>
>> Our app is making JNI calls. Each call requires approximately the
>> same amount of work (so we expect them to perform similarly).
>> Internally, we measure how long it takes to perform these calls. +99%
>> of these calls complete in less than 100 micros. However, we have
>> outliers in the 300-1200ms range. After some research, we have found
>> that these extreme outliers coincide with the Concurrent Mark phase of
>> CMS (based on timestamps), and ONLY if there is a minor GC during that
>> phase.
>>
>> In a given day, our app will execute CMS collections 500-1000 times.
>> Of these, less than 10 will have a minor collection execute during the
>> Concurrent Mark. For each of these, our app reports a large pause in
>> the 300-1200ms range. In fact, all of the large pauses reported by
>> our app correlate with a minor GC during a Concurrent Mark, without
>> exception.
>>
>> Thanks
>>
>> On Mon, Sep 28, 2009 at 1:13 PM, wrote:
>>
>> What platform are you on (#cpu's etc.),
>> and when you say "app reports a pause of 300 ms",
>> is it that the odd transaction sees a latency
>> of 300 ms (coincident with concurrent mark),
>> whereas most transactions complete much more
>> quickly?
>>
>> I am trying to first understand how you determine
>> that the application is seeing "long pauses"
>> when a minor gc occurs during concurrent mark.
>>
>> PS: for example, if two gc pauses (say a scavenge of 30 ms
>> and an initial mark of 13 ms) occur in quick succession,
>> your application might notice a pause of 43 ms, etc.
>>
>> -- ramki
>>
>> On 09/28/09 10:07, Shane Cox wrote:
>>
>> Our application is reporting long pauses when a minor GC
>> occurs during the Concurrent Mark phase of CMS. The output
>> below is a specific example. All of the GC pauses are less
>> than 30ms (initial mark, remark, minor GC). However, our app
>> reported a 300ms pause.
>>
>> 56750.934: [GC [1 CMS-initial-mark: 702464K(1402496K)]
>> 719045K(1551616K), 0.0131859 secs]
>> 56750.947: [CMS-concurrent-mark-start]
>> 56752.133: [GC 56752.133: [ParNew: 144393K->12122K(149120K),
>> 0.0237615 secs] 846857K->719330K(1551616K), 0.0239988 secs]
>> 56752.162: [CMS-concurrent-mark: 1.188/1.215 secs]
>> 56752.162: [CMS-concurrent-preclean-start]
>> 56752.243: [CMS-concurrent-preclean: 0.070/0.081 secs]
>> 56752.243: [CMS-concurrent-abortable-preclean-start]
>> 56752.765: [CMS-concurrent-abortable-preclean: 0.143/0.522 secs]
>> 56752.766: [GC[YG occupancy: 77423 K (149120 K)]56752.766:
>> [Rescan (parallel) , 0.0065730 secs]56752.773: [weak refs
>> processing, 0.0001983 secs] [1 CMS-remark: 707208K(1402496K)]
>> 784631K(1551616K), 0.0068908 secs]
>> 56752.773: [CMS-concurrent-sweep-start]
>> 56753.209: [CMS-concurrent-sweep: 0.436/0.436 secs]
>> 56753.209: [CMS-concurrent-reset-start]
>> 56753.219: [CMS-concurrent-reset: 0.010/0.010 secs]
>>
>> We only observe this behavior when a minor GC occurs during
>> the Concurrent Mark (which is rare). Our app has reported
>> pauses of up to 1.2 seconds ... which is generally the time it
>> takes to perform a Concurrent Mark.
>>
>> Any insight/help that you could provide would be much
>> appreciated.
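[Editor's sketch] The correlation Shane describes, large outliers only
when a ParNew collection lands inside a CMS concurrent-mark window, can
be checked mechanically against the GC log. This sketch assumes log
lines in the -XX:+PrintGCDetails style shown above; the function name
and parsing are illustrative, not part of any JDK tool.

```python
import re

def minor_gcs_during_mark(log_lines):
    """Return timestamps of ParNew collections that fall inside a
    CMS-concurrent-mark window, given PrintGCDetails-style lines."""
    mark_start = None
    hits = []
    for line in log_lines:
        m = re.match(r'(\d+\.\d+): \[(.*)', line.strip())
        if not m:
            continue
        ts, rest = float(m.group(1)), m.group(2)
        if rest.startswith('CMS-concurrent-mark-start'):
            mark_start = ts          # window opens
        elif rest.startswith('CMS-concurrent-mark:'):
            mark_start = None        # window closes
        elif 'ParNew' in rest and mark_start is not None:
            hits.append(ts)          # minor GC inside the mark window
    return hits

log = """\
56750.934: [GC [1 CMS-initial-mark: 702464K(1402496K)] 719045K(1551616K), 0.0131859 secs]
56750.947: [CMS-concurrent-mark-start]
56752.133: [GC 56752.133: [ParNew: 144393K->12122K(149120K), 0.0237615 secs] 846857K->719330K(1551616K), 0.0239988 secs]
56752.162: [CMS-concurrent-mark: 1.188/1.215 secs]
""".splitlines()
print(minor_gcs_during_mark(log))  # the ParNew at 56752.133 is inside the window
```

On the sample log above this flags the ParNew at 56752.133, which falls
inside the 56750.947 - 56752.162 concurrent-mark window.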
>>
>> Thanks

From Paul.Hohensee at Sun.COM Mon Sep 28 16:39:38 2009
From: Paul.Hohensee at Sun.COM (Paul Hohensee)
Date: Mon, 28 Sep 2009 19:39:38 -0400
Subject: unexplained CMS pauses
In-Reply-To: <4AC148B6.7010608@Sun.COM>
References: <4AC0EEAE.5010705@Sun.COM> <4AC145AE.30804@sun.com> <4AC148B6.7010608@Sun.COM>
Message-ID: <4AC1493A.2030004@sun.com>

Ouch. Missed that.

Paul

Y.S.Ramakrishna at Sun.COM wrote:
> Hi Paul, that would be 100 us + minor gc time (= 30 ms) = 30.1 ms.
> It does not explain the 10- to 40-fold increase in txn times observed
> here (of 300 ms - 1200 ms).
>
> -- ramki
>
> On 09/28/09 16:24, Paul Hohensee wrote:
>> Might have nothing to do with concurrent mark running, rather with
>> minor collections happening during a java->native call.
>>
>> Minor collections are stop-the-world, and if one occurs during a
>> java->native call, the java thread making the call to native will
>> block on return from the call to native until the minor collection
>> is over. If the outliers are outliers in the time it takes to
>> execute the native call, and the minor gc duration and timing are
>> exactly right, you could end up spending most of the minor gc time
>> blocked. Pause time would then be 100ms + minor gc time.
>>
>> Paul
>>
>> Shane Cox wrote:
>>> Ramki,
>>> We're running Solaris 10 on 8-core Intel Xeons. 1 Java instance
>>> per box. JRE 1.6, update 14. 64-bit JVM.
>>>
>>> Our app is making JNI calls. Each call requires approximately the
>>> same amount of work (so we expect them to perform similarly).
>>> Internally, we measure how long it takes to perform these calls.
>>> +99% of these calls complete in less than 100 micros. However, we
>>> have outliers in the 300-1200ms range. After some research, we have
>>> found that these extreme outliers coincide with the Concurrent Mark
>>> phase of CMS (based on timestamps), and ONLY if there is a minor GC
>>> during that phase.
>>>
>>> In a given day, our app will execute CMS collections 500-1000
>>> times. Of these, less than 10 will have a minor collection execute
>>> during the Concurrent Mark. For each of these, our app reports a
>>> large pause in the 300-1200ms range. In fact, all of the large
>>> pauses reported by our app correlate with a minor GC during a
>>> Concurrent Mark, without exception.
>>>
>>> Thanks
>>>
>>> On Mon, Sep 28, 2009 at 1:13 PM, wrote:
>>>
>>> What platform are you on (#cpu's etc.),
>>> and when you say "app reports a pause of 300 ms",
>>> is it that the odd transaction sees a latency
>>> of 300 ms (coincident with concurrent mark),
>>> whereas most transactions complete much more
>>> quickly?
>>>
>>> I am trying to first understand how you determine
>>> that the application is seeing "long pauses"
>>> when a minor gc occurs during concurrent mark.
>>>
>>> PS: for example, if two gc pauses (say a scavenge of 30 ms
>>> and an initial mark of 13 ms) occur in quick succession,
>>> your application might notice a pause of 43 ms, etc.
>>>
>>> -- ramki
>>>
>>> On 09/28/09 10:07, Shane Cox wrote:
>>>
>>> Our application is reporting long pauses when a minor GC
>>> occurs during the Concurrent Mark phase of CMS. The output
>>> below is a specific example. All of the GC pauses are less
>>> than 30ms (initial mark, remark, minor GC). However, our app
>>> reported a 300ms pause.
>>>
>>> 56750.934: [GC [1 CMS-initial-mark: 702464K(1402496K)]
>>> 719045K(1551616K), 0.0131859 secs]
>>> 56750.947: [CMS-concurrent-mark-start]
>>> 56752.133: [GC 56752.133: [ParNew: 144393K->12122K(149120K),
>>> 0.0237615 secs] 846857K->719330K(1551616K), 0.0239988 secs]
>>> 56752.162: [CMS-concurrent-mark: 1.188/1.215 secs]
>>> 56752.162: [CMS-concurrent-preclean-start]
>>> 56752.243: [CMS-concurrent-preclean: 0.070/0.081 secs]
>>> 56752.243: [CMS-concurrent-abortable-preclean-start]
>>> 56752.765: [CMS-concurrent-abortable-preclean: 0.143/0.522 secs]
>>> 56752.766: [GC[YG occupancy: 77423 K (149120 K)]56752.766:
>>> [Rescan (parallel) , 0.0065730 secs]56752.773: [weak refs
>>> processing, 0.0001983 secs] [1 CMS-remark: 707208K(1402496K)]
>>> 784631K(1551616K), 0.0068908 secs]
>>> 56752.773: [CMS-concurrent-sweep-start]
>>> 56753.209: [CMS-concurrent-sweep: 0.436/0.436 secs]
>>> 56753.209: [CMS-concurrent-reset-start]
>>> 56753.219: [CMS-concurrent-reset: 0.010/0.010 secs]
>>>
>>> We only observe this behavior when a minor GC occurs during
>>> the Concurrent Mark (which is rare). Our app has reported
>>> pauses of up to 1.2 seconds ... which is generally the time it
>>> takes to perform a Concurrent Mark.
>>>
>>> Any insight/help that you could provide would be much
>>> appreciated.
>>>
>>> Thanks
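[Editor's sketch] Ramki's arithmetic rebutting the java->native theory
can be written out explicitly: even the worst case under Paul's model,
a 100-microsecond native call whose thread then blocks for an entire
~30 ms minor GC, caps out around 30.1 ms, an order of magnitude short of
the 300-1200 ms outliers. A minimal sketch (function name illustrative):

```python
def worst_case_blocked_latency_ms(native_call_us, minor_gc_ms):
    """Upper bound on observed call latency if the thread returning
    from native blocks for the full minor GC pause (Paul's model)."""
    return native_call_us / 1000.0 + minor_gc_ms

worst = worst_case_blocked_latency_ms(100, 30)   # 0.1 ms + 30 ms = 30.1 ms
print(worst, worst < 300)  # far below the 300-1200 ms outliers observed
```

This is why the thread concluded the native-call-blocking hypothesis
alone cannot account for the outliers.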