hotspot-gc-use Digest, Vol 21, Issue 7

tcogan50 at gmail.com tcogan50 at gmail.com
Fri Sep 11 19:54:33 PDT 2009


stop

-----Original Message-----
From: hotspot-gc-use-request at openjdk.java.net
Sent: Friday, September 11, 2009 8:00 PM
To: hotspot-gc-use at openjdk.java.net
Subject: hotspot-gc-use Digest, Vol 21, Issue 7

Send hotspot-gc-use mailing list submissions to
	hotspot-gc-use at openjdk.java.net

To subscribe or unsubscribe via the World Wide Web, visit
	http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
or, via email, send a message with subject or body 'help' to
	hotspot-gc-use-request at openjdk.java.net

You can reach the person managing the list at
	hotspot-gc-use-owner at openjdk.java.net

When replying, please edit your Subject line so it is more specific
than "Re: Contents of hotspot-gc-use digest..."


Today's Topics:

   1. Re: Young generation configuration (Tony Printezis)
   2. Re: Young generation configuration (Y.S.Ramakrishna at Sun.COM)
   3. Re: Young generation configuration (Paul Hohensee)


----------------------------------------------------------------------

Message: 1
Date: Fri, 11 Sep 2009 17:17:55 -0400
From: Tony Printezis <tony.printezis at sun.com>
Subject: Re: Young generation configuration
To: Paul Hohensee <Paul.Hohensee at sun.com>
Cc: hotspot-gc-use at openjdk.java.net
Message-ID: <4AAABE83.90308 at sun.com>
Content-Type: text/plain; CHARSET=US-ASCII; format=flowed



Paul Hohensee wrote:
> You can try out compressed pointers in 6u14.  It just won't be quite as
> fast as the version that's going into 6u18.  6u14 with compressed pointers
> will still be quite a bit faster than without.
>
> One of the gc guys may correct me, but UseAdaptiveGCBoundary allows
> the vm to ergonomically move the boundary between old and young generations,
> effectively resizing them.  I don't know if it's bit-rotted, and I seem 
> to remember
> that there wasn't much benefit.  But maybe we just didn't have a good 
> use case.
>   
Also, it's ParallelGC-only, IIRC.
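(For reference, enabling it would look something like the following --
illustrative only, given that the flag is ParallelGC-specific and may
have bit-rotted:

    java -XX:+UseParallelGC -XX:+UseAdaptiveGCBoundary ...
)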
> What I meant by the last paragraph was that with the tenuring threshold 
> set at
> 15 (which is what the log says), and with only 7 young gcs in the log, 
> we can't
> see at what age (or if) between 8 and 15 the survivor size goes down to 
> something
> reasonable.  If it doesn't, it might be worth it to us to revisit 
> increasing the age
> limit for 64-bit.
>   
Paul, the problem in Jeff's case is that even at age 1 he copies 1GB or 
so. So, maybe, setting a small MTT and having more CMS cycles might be 
the right option for him.
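(Expressed as flags, Tony's suggestion would be something like the
following -- illustrative, not a tested setting:

    java -XX:+UseConcMarkSweepGC -XX:MaxTenuringThreshold=1 ...

A low MaxTenuringThreshold promotes survivors almost immediately,
trading copying work in young GCs for more frequent CMS cycles in
the old gen.)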

Tony
> jeff.lloyd at algorithmics.com wrote:
>   
>> Thanks for your response Paul.
>>
>> I'll take another look at the parallel collector.  
>>
>> That's a good point about the -XX:+UseCompressedOops.  We started off
>> with heaps bigger than 32G so I had left that option out.  I'll put it
>> back in and definitely try out 6u18 when it's available.
>>
>> What about the option -XX:+UseAdaptiveGCBoundary?  I don't see it
>> referenced very often.  Would it be helpful in a case like mine?
>>
>> I'm not sure I understand your last paragraph.  What is the period of
>> time that you would be interested in seeing?
>>
>> Jeff
>>
>> -----Original Message-----
>> From: Paul.Hohensee at Sun.COM [mailto:Paul.Hohensee at Sun.COM] 
>> Sent: Friday, September 11, 2009 1:23 PM
>> To: Tony Printezis
>> Cc: Jeff Lloyd; hotspot-gc-use at openjdk.java.net
>> Subject: Re: Young generation configuration
>>
>> Another alternative mentioned in Tony and Charlie's J1 slides is the
>> parallel collector.  If, as Tony says, you can make the young gen
>> large enough to avoid promotion, and you really do have a steady-state
>> old gen, then which old gen collector you use wouldn't matter much to
>> pause times, given that young gen pause times seem to be your
>> immediate problem.
>>
>> It may be that you just need more hardware threads to collect such a
>> big young gen too.  You might vary the number of gc threads to see how
>> that affects collection times.  If there are significant differences,
>> then you need more hardware threads, i.e., a bigger machine.
>>
>> You might also try using compressed pointers via -XX:+UseCompressedOops.
>> That should cut down the total survivor size significantly, perhaps
>> enough that your current hardware threads can collect significantly
>> faster.  Heap size will be limited to < 32gb, but your app will
>> probably fit.  A more efficient version of compressed pointers will be
>> available in 6u18, btw.
>>
>> I notice that none of your logs shows more than age 7 stats even
>> though the tenuring threshold is 15.  It'd be nice to see if anything
>> dies before then.
>>
>> Paul
>>
>> Tony Printezis wrote:
>>> Jeff,
>>>
>>> Hi. I had a very brief look at your logs. Yes, your app does seem to
>>> need to copy quite a lot (I don't think I've ever seen 1-2GB of data
>>> being copied in age 1!!!). From what I've seen from the space sizes,
>>> you're doing the right thing (i.e., you're consistent with what we
>>> talked about during the talk): you have quite a large young gen and a
>>> reasonably sized old gen. But the sheer amount of surviving objects
>>> is what's getting you. How much larger can you make your young gen?
>>> I think in this case, the larger, the better.  Maybe you can also try
>>> MaxTenuringThreshold=1. This goes against our general advice, but it
>>> might decrease the amount of objects being copied during young GCs,
>>> at the expense of more frequent CMS cycles...
>>>
>>> Tony
>>>
>>> jeff.lloyd at algorithmics.com wrote:
>>>> Hi,
>>>>
>>>> I'm new to this list and I have a few questions about tuning my
>>>> young generation gc.
>>>>
>>>> I have chosen to use the CMS garbage collector because my
>>>> application is a relatively large reporting server that has a web
>>>> front end and therefore needs to have minimal pauses.
>>>>
>>>> I am using java 1.6.0_16 64-bit on redhat 5.2 intel 8x3GHz and
>>>> 64GB ram.
>>>>
>>>> The machine is dedicated to this JVM.
>>>>
>>>> My steady-state was calculated as follows:
>>>>
>>>> - A typical number of users logged in and viewed several reports
>>>> - Stopped user actions and performed a manual full GC
>>>> - Looked at the amount of heap used and took that number as the
>>>>   steady-state memory requirement
>>>>
>>>> In this case my heap usage was ~10GB.  In order to handle variance
>>>> or spikes I sized my old generation at 15-20GB.
>>>>
>>>> I sized my young generation at 32-42GB and used survivor ratios
>>>> of 1, 2, 3 and 6.
>>>>
>>>> My goal is to maximize throughput and minimize pauses.  I'm willing
>>>> to sacrifice ram to increase speed.
>>>>
>>>> I have attached several of my many gc logs.  The file gc_48G.txt is
>>>> just using CMS without any other tuning, and the results are much
>>>> worse than what I have been able to accomplish with other settings.
>>>> The best results are in the files gc_52G_20Gold_32Gyoung_2sr.txt
>>>> and gc_57G_15Gold_42Gyoung_1sr.txt.
>>>>
>>>> The problem is that some of the pauses are just too long.
>>>>
>>>> Is there a way to reduce the pause time any more than I have it
>>>> now?  Am I heading in the right direction?  I ask because the
>>>> default settings are so different from what I have been heading
>>>> towards.
>>>>
>>>> The best reference I have found on what good gc logs look like
>>>> comes from brief examples presented at JavaOne this year by Tony
>>>> Printezis and Charlie Hunt.  But I don't seem to be able to get
>>>> logs that resemble their tenuring patterns.
>>>>
>>>> I think I have a lot of medium-lived objects instead of nice
>>>> short-lived ones.
>>>>
>>>> Are there any good practices for apps with objects like this?
>>>>
>>>> Thanks,
>>>> Jeff

-- 
---------------------------------------------------------------------
| Tony Printezis, Staff Engineer   | Sun Microsystems Inc.          |
|                                  | MS UBUR02-311                  |
| e-mail: tony.printezis at sun.com   | 35 Network Drive               |
| office: +1 781 442 0998 (x20998) | Burlington, MA 01803-2756, USA |
---------------------------------------------------------------------
e-mail client: Thunderbird (Linux)




------------------------------

Message: 2
Date: Fri, 11 Sep 2009 15:19:46 -0700
From: Y.S.Ramakrishna at Sun.COM
Subject: Re: Young generation configuration
To: jeff.lloyd at algorithmics.com
Cc: hotspot-gc-use at openjdk.java.net
Message-ID: <4AAACD02.4040109 at Sun.COM>
Content-Type: text/plain; CHARSET=US-ASCII; format=flowed

Hi Jeff --

On 09/11/09 14:06, jeff.lloyd at algorithmics.com wrote:
> Hi Ramki,
> 
> I did not know that lower pause times and higher throughput were
> generally incompatible.  Good to know - it makes sense too.
> 
> I'm trying to find out how long "too long" is.  Bankers can be fickle.
> :-)  Honestly, I think "too long" constitutes a noticeable pause in GUI
> interactions.

So, maybe around one 200 ms pause per second at the most?
(If that figure doesn't seem suitable, think up one like it that is.)
That would give us the requisite pause time budget and implicitly
define a GC overhead budget of 200/1000 = 20%. That is actually
quite high, but still lower than the overhead I saw in some of
your logs from a quick browse. As Tony pointed out, though, that's
because of the excessive copying you were doing of relatively
long-lived data, which you may be better off tenuring more quickly
and letting the concurrent collector deal with (modulo your and
Tony's earlier remarks about the slightly (see below) increased
pressure on the concurrent collector -- probably unavoidable if
you are to meet your pause time goals).
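
(As a worked example with illustrative numbers: one 200 ms scavenge
every second costs 0.2 s of every 1 s of wall time, i.e. 20% GC
overhead; one 200 ms scavenge every 4 seconds would be
200/4000 = 5%.)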

> 
> How did you measure the proportion of short-lived and medium-lived
> objects?

Oh, I was playing somewhat fast and loose. I was taking the ratio
of (age 1 survivors):(Eden size) to get a rough read on
short:(not short). I sampled a single GC from one of your log files,
but that would be the way to figure this out (while averaging
over a sufficiently large set of samples). Of course, "long" and
"short" are relative, and age 1 just tells you what survived that
was allocated in the last GC epoch. If GCs happen frequently, less
data would die and more would qualify as "not short" by that kind
of loose definition, so my "long" and "short" were relative to the
given GC period.
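
A quick sketch of that measurement (a hypothetical helper, not
something from this thread; it assumes the standard
-XX:+PrintTenuringDistribution log format, and the class name and
argument order are made up for illustration):

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Usage: java Age1Ratio gc.log <eden-bytes>
    // Averages the "- age 1:" lines of a tenuring-distribution log
    // and reports the result as a fraction of the Eden size you pass.
    public class Age1Ratio {
        public static void main(String[] args) throws IOException {
            long edenBytes = Long.parseLong(args[1]);
            Pattern age1 = Pattern.compile("- age\\s+1:\\s+(\\d+) bytes");
            long total = 0;
            int samples = 0;
            BufferedReader in = new BufferedReader(new FileReader(args[0]));
            String line;
            while ((line = in.readLine()) != null) {
                Matcher m = age1.matcher(line);
                if (m.find()) {            // one age-1 line per scavenge
                    total += Long.parseLong(m.group(1));
                    samples++;
                }
            }
            in.close();
            if (samples > 0) {
                double avg = (double) total / samples;
                System.out.printf("avg age-1 survivors: %.0f bytes (%.1f%% of Eden)%n",
                                  avg, 100.0 * avg / edenBytes);
            }
        }
    }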

> 
> We typically expect a "session" to be live for most of the day, and

How much typical session data do you have? What is the rate at
which sessions get created? Does this happen perhaps mostly at the
start of the day? (In which case you would see lots of promotion
activity at the start of the day, but not so much later in the day?)
Or is the session creation rate uniform through the typical day?

> multiple reports of seconds or minutes in duration executed within that
> session.  So yes, I am seeing my "steady state" continue for a long

Let's say 1 minute. So during that 1 minute, how much data do you
produce, and of that, how much needs to be saved into the session
in the form of the "result" from that report? Looks like that
result would constitute data that you want to tenure sooner
rather than later. Depending on how long the intermediate
results needed to generate the final result must stick around (you
mentioned large trees of intermediate objects, I think, in an
earlier email), you may want to copy them in the survivor
spaces, or -- if that data is so large as to cost excessive
copying time -- just promote that too. Luckily, in typical
cases, if data wants to be large, it also wants to live
long.

> time, with blips of activity throughout the day.  We cache a lot of
> results, which can lead to a general upward trend, but it doesn't seem
> to be our current source of object volume.

The cached data will tenure. Best to tenure it soon, if the
proportion of cached data is large. (I am guessing that
if you cache, you probably find it saves computation later --
so it also saves allocation later; thus I might naively
expect that you will initially tenure lots of data as your
caches fill, and later in steady state tenure less as well
as perhaps allocate less.)

If I look at one random tenuring distribution sample out of your
logs, I see:

- age   1: 2151744736 bytes, 2151744736 total
- age   2:  897330448 bytes, 3049075184 total
- age   3: 1274314280 bytes, 4323389464 total
- age   4: 1351603024 bytes, 5674992488 total
- age   5: 1529394376 bytes, 7204386864 total
- age   6: 1219001160 bytes, 8423388024 total

which is very flat -- indicating that anything that survives
a scavenge appears to live on for quite a while (lots of
assumptions about steady loads and so on). Experimenting
with an MTT of 1 or 2 might be useful, cf. your previous emails
with Tony et al. (Yes, you will want to increase your OG size,
as you noted, but no, it will not fill up much faster, because
the rate at which you promote will be nearly the same: most data
that survives a single scavenge here tends to live -- see above --
for at least 6 scavenges, after which it promotes anyway; you are
just promoting that same data a bit sooner without wasting effort
copying it back and forth. It is true that some small amount
of intermediate data will promote, but that's probably OK.)
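
(Illustratively, such an experiment might use flags like the
following -- the threshold value is a placeholder to vary, not a
recommendation from this thread:

    java -XX:+UseConcMarkSweepGC -XX:MaxTenuringThreshold=2 \
         -verbose:gc -XX:+PrintGCDetails -XX:+PrintTenuringDistribution ...
)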

You will then want to play with the initiating occupancy fraction
once you get an idea of the rate at which the old gen fills
up, versus the rate at which CMS is able to collect, versus
the effect on scavenges of letting the CMS gen fill up more
before collecting, versus the effect of doing more frequent
or less frequent CMS cycles (and their effect on mutator
throughput and available CPU and memory bandwidth).

Yes, as Paul noted, definitely +UseCompressedOops to relieve
heap pressure (reduce GC overhead) and speed up mutators
by improving cache efficiency.
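
(Pulling the thread's suggestions into one illustrative command
line -- every size and value below is a placeholder to be tuned
against your own logs, not a recommendation:

    java -Xms28g -Xmx28g -Xmn16g \
         -XX:+UseCompressedOops \
         -XX:+UseConcMarkSweepGC \
         -XX:MaxTenuringThreshold=2 \
         -XX:CMSInitiatingOccupancyFraction=70 \
         -XX:+UseCMSInitiatingOccupancyOnly \
         -XX:+PrintGCDetails -XX:+PrintTenuringDistribution ...

Note that compressed oops caps the usable heap below 32gb, so the
52-57GB configurations from the earlier logs would need to shrink
to use it.)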

-- ramki


------------------------------

Message: 3
Date: Fri, 11 Sep 2009 20:00:02 -0400
From: Paul Hohensee <Paul.Hohensee at Sun.COM>
Subject: Re: Young generation configuration
To: Tony Printezis <tony.printezis at Sun.COM>
Cc: hotspot-gc-use at openjdk.java.net
Message-ID: <4AAAE482.7070200 at sun.com>
Content-Type: text/plain; CHARSET=US-ASCII; format=flowed

Could be, but that would lead to a lot of concurrent overhead, reducing
his throughput.  Such a balancing act. :)

Paul

Tony Printezis wrote:
>
> Paul, the problem in Jeff's case is that even at age 1 he copies 1GB
> or so. So, maybe, setting a small MTT and having more CMS cycles might
> be the right option for him.
>
> Tony

[The entire original message is not included]

