Proposal: MaxTenuringThreshold and age field size changes

Fri Jun 6 19:46:17 UTC 2008

I just came across a neglected process here that was last tuned so long
ago that it was still using the never-tenure approach. So yesterday I
was downstairs retuning it. My goals were to reduce object promotions,
minor collection frequency, and collection times.

So here is what I did.

This is before:

	JVM_ARGS="${JVM_ARGS} -XX:NewSize=100m"
	JVM_ARGS="${JVM_ARGS} -XX:MaxNewSize=100m"
	JVM_ARGS="${JVM_ARGS} -XX:SurvivorRatio=128"
	JVM_ARGS="${JVM_ARGS} -XX:MaxTenuringThreshold=0"
	JVM_ARGS="${JVM_ARGS} -XX:TargetSurvivorRatio=100"
	JVM_ARGS="${JVM_ARGS} -XX:CMSInitiatingOccupancyFraction=40"

This is my first guess, based on what I thought would be close:

	JVM_ARGS="${JVM_ARGS} -XX:NewSize=200m"
	JVM_ARGS="${JVM_ARGS} -XX:MaxNewSize=200m"
	JVM_ARGS="${JVM_ARGS} -XX:SurvivorRatio=10"
	JVM_ARGS="${JVM_ARGS} -XX:MaxTenuringThreshold=15"
	JVM_ARGS="${JVM_ARGS} -XX:-CMSPermGenPrecleaningEnabled"
	JVM_ARGS="${JVM_ARGS} -XX:TargetSurvivorRatio=50"
	JVM_ARGS="${JVM_ARGS} -XX:CMSInitiatingOccupancyFraction=60"
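
To put rough numbers on the two configurations: SurvivorRatio is the
ratio of eden to one survivor space, and the young generation holds
eden plus two survivor spaces. The sketch below works that out in
integer KB. The function names are made up for illustration, and a real
JVM rounds these sizes to its own alignment, so treat the output as
approximate.

```shell
# Rough young-generation layout from NewSize (in MB) and SurvivorRatio.
# SurvivorRatio = eden / one survivor space; young = eden + 2 survivors.
# Integer KB arithmetic only; actual JVM sizes are alignment-rounded.
survivor_kb() {
  new_mb=$1
  ratio=$2
  echo $(( new_mb * 1024 / (ratio + 2) ))
}

eden_kb() {
  new_mb=$1
  ratio=$2
  echo $(( new_mb * 1024 - 2 * (new_mb * 1024 / (ratio + 2)) ))
}

echo "before: survivor=$(survivor_kb 100 128) KB, eden=$(eden_kb 100 128) KB"
echo "after:  survivor=$(survivor_kb 200 10) KB, eden=$(eden_kb 200 10) KB"
```

By this estimate each survivor space grows from under 1 MB to roughly
16.7 MB, which is what gives objects room to age in the young
generation instead of being promoted at the first scavenge.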

The test ran really well. I achieved all three goals right off the bat.
So the only thing left to do was adjust the tenuring to minimize copies
while still keeping promotions low.

The data indicated 4 slots would be optimal. After re-running, my
numbers did not improve. In fact, the collection time increased
slightly. So I upped the slots to 6 to maximize object death at the
expense of an extra 2 copies. The results were significantly better.

The lesson here is that promotions are much more expensive than copies.
I thought I'd share that.
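
Assuming the "slots" above correspond to the tenuring threshold (the
message does not name the flag), the final adjustment would amount to a
one-line change on top of the first-guess settings. The
PrintTenuringDistribution line is an assumption as well, shown as one
way to collect the age data that such a decision is based on.

```shell
# Sketch only: assumes "slots" means MaxTenuringThreshold.
JVM_ARGS="${JVM_ARGS} -XX:MaxTenuringThreshold=6"
# Assumption: age data is gathered from the tenuring distribution log.
JVM_ARGS="${JVM_ARGS} -XX:+PrintTenuringDistribution"
```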

-----Original Message-----
From: hotspot-gc-dev-bounces at openjdk.java.net
[mailto:hotspot-gc-dev-bounces at openjdk.java.net] On Behalf Of
hotspot-gc-dev-request at openjdk.java.net
Sent: Thursday, June 05, 2008 2:00 PM
To: hotspot-gc-dev at openjdk.java.net
Subject: hotspot-gc-dev Digest, Vol 12, Issue 4

Today's Topics:

   1. Re: Proposal: MaxTenuringThreshold and age field size changes
      (Y Srinivas Ramakrishna)
   2. Re: Proposal: MaxTenuringThreshold and age field size changes
      (kirk)
   3. Re: Proposal: MaxTenuringThreshold and age field size changes
      (Jon Masamitsu)
   4. hg: jdk7/hotspot-gc/hotspot: 6629727: assertion in
      set_trap_state()	in methodDataOop.hpp is too strong.
      (jon.masamitsu at sun.com)

----------------------------------------------------------------------

Message: 1
Date: Thu, 05 Jun 2008 01:03:11 -0700
From: Y Srinivas Ramakrishna <Y.S.Ramakrishna at Sun.COM>
Subject: Re: Proposal: MaxTenuringThreshold and age field size changes
To: Nicolas Michael <email at nmichael.de>
Cc: Tony Printezis <tony.printezis at Sun.COM>,
	hotspot-gc-dev at openjdk.java.net
Message-ID: <f81dbe8d5a24.48473b4f at sun.com>
Content-Type: text/plain; charset=us-ascii

Hi Nick --

You make a good point.

I think there are scenarios in which the use of PLABs in the survivor
spaces can cause survivor space overflow, even when the total space in
the survivor space should otherwise have been enough. That then sets in
motion a series of "nepotistical" cycles, which are caused when young
objects Z that are prematurely promoted die in the old generation while
holding references to now-dead objects Z' in the young generation. Call
these objects "zombies" because they are dead but not recognized as
such.

Z keeps Z' alive, because a scavenge does not know that Z is dead: it
considers all references from the old gen as roots. Worse, if Z' has
references to Z and Z' stays in the young generation forever (which it
can under the circumstances you describe), then Z will not be
recognized as dead by a CMS collection (which currently treats all
objects in the young generation as roots).

This is a well-understood problem when spaces are collected
independently in this manner. (The workaround is to have the CMS
collector not treat the young generation as a source of roots but
rather mark through the young gen objects starting from roots.)

Of course, when MTT=15, eventually every such Z' will be forced to
promote to the old gen, and the garbage cycle will all (hopefully) move
into the old gen and thence be reclaimed by the old gen CMS collection.

This would explain the behaviour difference you saw.

Now to come back to the first point I made above: the use of multiple
scavenger threads and their use of PLABs can sometimes cause this kind
of overflow to happen (especially if there is the occasional large
object). You might want to switch off the use of survivor space PLABs
(or fix them at a very small value) or just use a single-threaded
scavenger and see whether this kind of behaviour is reduced, because
overflow becomes much less likely.
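
As a hedged sketch of that experiment in the JVM_ARGS style used
earlier in this thread: the first option limits the scavenger to one
worker thread. The second is left commented out because the PLAB flag
names are an assumption taken from later HotSpot releases and may not
exist in a given build, so verify them against your JVM before use.

```shell
# Option A: run the scavenger with a single GC worker thread.
JVM_ARGS="${JVM_ARGS} -XX:ParallelGCThreads=1"
# Option B (assumption: flag names as in later HotSpot releases):
# stop adaptive PLAB resizing and pin survivor PLABs to a small size.
# JVM_ARGS="${JVM_ARGS} -XX:-ResizePLAB -XX:YoungPLABSize=<small value>"
```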

I think there is an open bug to tune the adaptive PLAB sizing code to
eliminate this kind of pathological behaviour, but we have not had the
opportunity to get to that bug.

If you have PrintGCDetails logs, they would probably show the premature
promotion happening. PrintTenuringDistribution would not show any
objects of age greater than 2 initially, yet some objects would be seen
to be promoted, and then, by virtue of the cross-generational
references from a promoted zombie, we would artificially extend the
lifetime of objects in the young generation (creating zombies Z'), and
so on. I believe it was partially this kind of behaviour on the part of
generational scavenger implementations that caused some people in the
past to start advocating the clearing of all references in objects that
they knew they would be dropping references to (which, of course, we
all know is difficult to do correctly and fraught with all kinds of
problems and errors).
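
One way to check the "no objects older than age 2" observation is to
pull the highest reported age out of a PrintTenuringDistribution log.
The sketch below assumes the usual "- age N: ... bytes, ... total" line
layout. The helper name and the sample lines are made up for
illustration and are not taken from Nick's logs.

```shell
# Print the highest object age found in PrintTenuringDistribution
# output read from stdin (0 if no age lines are present).
max_age_seen() {
  awk '/^- age/ { age = $3; sub(/:/, "", age)
                  if (age + 0 > max) max = age + 0 }
       END { print max + 0 }'
}

# Hypothetical sample input, for illustration only.
printf '%s\n' \
  'Desired survivor size 8716288 bytes, new threshold 15 (max 15)' \
  '- age   1:     537104 bytes,     537104 total' \
  '- age   2:     120000 bytes,     657104 total' | max_age_seen
```

If the maximum age stays low while the GC log still reports growth in
the old generation, that would be consistent with the premature
promotion described above.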

-- ramki

------------------------------

Message: 2
Date: Thu, 05 Jun 2008 09:51:42 +0200
From: kirk <kirk.pepperdine at gmail.com>
Subject: Re: Proposal: MaxTenuringThreshold and age field size changes
To: Jon Masamitsu <Jon.Masamitsu at Sun.COM>
Cc: hotspot-gc-dev at openjdk.java.net, Tony Printezis
	<tony.printezis at Sun.COM>,	Y Srinivas Ramakrishna
	<Y.S.Ramakrishna at Sun.COM>
Message-ID: <48479B0E.5060403 at javaperformancetuning.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Hi, I really enjoyed reading Nick's description of how he ran into
trouble and what he did to fix it.

My preference for delaying premature promotions has always been to
increase the size of survivor spaces. Using larger tenuring thresholds
hasn't sounded appealing because of the cost of copying. Having GC run
less often always seemed like a more desirable goal. And I think this
is especially true in young, where it is mostly about live object
harvesting and having GC run less often gives objects a better chance
to expire.

If these bits are so important, I think that limiting MTT to 31 seems
reasonable, at least today. If you have a bit to spare, then 63 seems
even safer.
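
The 31 and 63 figures follow from the 2^n-1 limit discussed in this
thread, where n is the number of age bits. A small sketch of that
arithmetic follows. The function names are hypothetical, and the
clamping behaviour illustrates the proposal rather than what any JVM
release actually does.

```shell
# Largest age representable in an age field of the given width: 2^n - 1.
max_mtt() {
  echo $(( (1 << $1) - 1 ))
}

# Clamp a requested MaxTenuringThreshold to what the age field can hold.
clamp_mtt() {
  requested=$1
  limit=$(max_mtt "$2")
  if [ "$requested" -gt "$limit" ]; then
    echo "$limit"
  else
    echo "$requested"
  fi
}

for bits in 4 5 6; do
  echo "age bits=$bits -> max MTT=$(max_mtt "$bits")"
done
```

Under this scheme a request for MTT=24 with 4 age bits would be capped
at 15 instead of silently turning into never-tenure.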

Regards,
Kirk Pepperdine

Jon Masamitsu wrote:
> Since  higher MTT's  are important for some applications
> and we really don't know how high is high enough, should
> we take John P's suggestion a step further and use the
> upper 15 bits of any MTT as the tenuring age (i.e.,
> setting MTT to 60, 61, 62 or 63 would tenure at
> age 60)?  We just have to increment the age every
> n-th scavenge as appropriate.
>
> Y Srinivas Ramakrishna wrote On 06/04/08 14:54,:
>
>   
>> Hi Nick --
>>
>> thanks for sharing that experience and for the nice description!
>> Looking at the PrintTenuringDistribution output for your application
>> with the old, 5-age-bit JVM run with MTT=24 would probably be
>> illuminating. By the way, would you expect the objects surviving
>> 24 scavenges (in that original configuration) to live for a pretty
>> long time?
>> Do you know if your object age distribution has a long thin tail,
>> and, if so, where it falls to 0? If it is the case that most
>> applications have a long thin tail and that the population in that
>> tail (age > MTT) is large, then NeverTenure is probably a bad idea.
>>
>> As you found, the basic rule is always that if a small transient
>> overload causes survivor space overflow, that in turn can cause a
>> much longer-term "ringing effect" because of "nepotism" (which
>> begets more nepotism, ...), the effects of which can last much
>> longer than the initial transient that set it off.
>> And, yes, NeverTenure will lead to overflow in long-tailed
>> distributions unless your survivor spaces are large enough to
>> accommodate the tail; which is just another way of saying, if the
>> tail does not fit, you will have overflow which will cause nepotism
>> and its ill effects.
>>
>> Thanks again for sharing that experience and for the nice
>> explanation of your experiments.
>>
>> -- ramki
>>
>>  
>>
>>     
>>> So, as a conclusion: Yes, we are missing the 5th age bit.
>>> NeverTenure works very badly for us. MTT=15 with our original eden
>>> size is too small, but increasing the eden size allows us to get
>>> similar behavior with MTT=15 as with the original configuration
>>> (MTT=24 and 5 age bits).
>>>
>>> I think Tony's suggestion to limit the configurable MTT to 2^n-1
>>> (with n being the age bits) is a good solution. At least according
>>> to my tests, this is much better than activating the "never tenure"
>>> policy when the user is not aware of this.
>>>
>>> I hope some of this may have been helpful for you. I will send some
>>> more detailed results and logfiles directly to Tony.
>>>
>>> Nick.
>>>    
>>>
>>>       
>
>
>   

------------------------------

Message: 3
Date: Thu, 05 Jun 2008 08:39:01 -0700
From: Jon Masamitsu <Jon.Masamitsu at Sun.COM>
Subject: Re: Proposal: MaxTenuringThreshold and age field size changes
To: hotspot-gc-dev at openjdk.java.net
Message-ID: <48480895.7010502 at Sun.COM>
Content-Type: text/plain; format=flowed; charset=ISO-8859-1

Jon Masamitsu wrote:
> Since  higher MTT's  are important for some applications
> and we really don't know how high is high enough, should
> we take John P's suggestion a step further and use the
> upper 15 bits of any MTT as the tenuring age (i.e.

Sorry, guys.  That should have been 4 bits.  You can
stop wondering where the heck I got 15 bits now.

> setting MTT to 60, 61, 62 or 63 would tenure at
> age 60)?  We just have to increment the age every
> n-th scavenge as appropriate.
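
One possible reading of the corrected suggestion (4 bits, not 15) is
sketched below: keep only the upper 4 significant bits of the requested
MTT and bump the stored age every n-th scavenge, where n is 2 raised to
the number of dropped bits. The function name and the two-value output
format are made up for illustration; this is an interpretation of the
proposal, not HotSpot code.

```shell
AGE_BITS=4

# Print "<effective tenuring age> <scavenges per age increment>" for a
# requested MTT, keeping only as many upper bits as the age field holds.
effective_mtt() {
  mtt=$1
  shift_by=0
  while [ $(( mtt >> shift_by )) -ge $(( 1 << AGE_BITS )) ]; do
    shift_by=$(( shift_by + 1 ))
  done
  stored=$(( mtt >> shift_by ))
  echo "$(( stored << shift_by )) $(( 1 << shift_by ))"
}

for mtt in 15 24 60 61 62 63; do
  echo "MTT=$mtt -> $(effective_mtt "$mtt")"
done
```

With this reading, MTT values 60 through 63 all tenure at age 60 with
the age incremented every 4th scavenge, matching the example in the
quoted text.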

------------------------------

Message: 4
Date: Thu, 05 Jun 2008 17:56:18 +0000
From: jon.masamitsu at sun.com
Subject: hg: jdk7/hotspot-gc/hotspot: 6629727: assertion in
	set_trap_state()	in methodDataOop.hpp is too strong.
To: jdk7-changes at openjdk.java.net, hotspot-gc-dev at openjdk.java.net
Message-ID: <20080605175622.8667328041 at hg.openjdk.java.net>

Changeset: 0b27f3512f9e
Author:    jmasa
Date:      2008-06-04 13:51 -0700
URL:
http://hg.openjdk.java.net/jdk7/hotspot-gc/hotspot/rev/0b27f3512f9e

6629727: assertion in set_trap_state() in methodDataOop.hpp is too
strong.
Summary: The assertion can failure due to race conditions.
Reviewed-by: never

! src/share/vm/oops/methodDataOop.hpp

End of hotspot-gc-dev Digest, Vol 12, Issue 4
*********************************************