Proposal: MaxTenuringThreshold and age field size changes

Nicolas Michael email at nmichael.de
Thu Jun 5 06:26:57 UTC 2008


Hi Ramki,

please see inline.

Y Srinivas Ramakrishna wrote:
> Hi Nick --
> 
> thanks for sharing that experience and for the nice description! 
> Looking at the PrintTenuringDistribution output for your application
> with the old, 5-age bit JVM run with MTT=24 would probably be
> illuminating. By the way, would you expect the objects surviving
> 24 scavenges (in that original configuration) to live for a pretty long time?
> Do you know if your object age distribution has a long thin tail,
> and, if so, where it falls to 0? If it is the case that most applications
> have a long thin tail and that the population in that tail (age > MTT)
> is large, then NeverTenure is probably a bad idea.

Basically, all our scenarios work like this: Upon an incoming request, 
we create some hundred kilobytes of objects. Most of those objects die 
after the request has been processed, which usually takes something 
between 10ms and 100ms. A few kilobytes of objects remain alive in the 
Java heap until the session is terminated, which is signaled through a 
further request. The session length may vary between a few seconds, 
some minutes, or even hours, depending on the scenario. There are no 
session-related objects dying "in between". So, assuming a fixed 
session duration of 60 seconds for all sessions, all objects that are 
not just temporary and survive the first gc cycle will only die after 
60 seconds. With a gc interval of, let's say, 6 seconds, this would be 
10 cycles. MTT >= 10 would be sufficient to collect all those objects 
in the young gen. As you can see, in this example there is no "thin 
tail": We have the same number of objects at each age 1-10.

Let's assume a session duration of 120 seconds. As you can easily see, 
with the same gc pattern, all those objects would survive 20 gc cycles. 
With a max MTT of 15, they would all tenure into old after 15 cycles. 
It's quite obvious that for this scenario, it would be better to set 
MTT=1, since we would avoid copying those objects 14 times before all of 
them tenure anyway.
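
(Expressed as a flag, and tuned only for this 120-second case, that 
would simply be

    -XX:MaxTenuringThreshold=1

i.e. one copy into a survivor space and then promotion, instead of 15 
copies followed by promotion anyway.)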

Another way would be to use "never tenure" for 120-second sessions. 
This would allow us to keep them in the young gen for 20 cycles, 
provided the survivor spaces are large enough. But imagine a third 
scenario with a session duration of 10 minutes. Such a scenario would 
definitely overflow the survivor spaces with the "never tenure" policy.
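
(Just as a sketch of what such a configuration would look like, since 
we don't actually run this: the policy and the survivor sizing are 
controlled by flags like

    -XX:+NeverTenure -XX:SurvivorRatio=<small ratio for large survivors>

where the ratio value is only a placeholder.)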

We need a set of JVM settings that fits all scenarios. Assuming that 
scenarios with mixed session durations run simultaneously, our aim is to 
collect the objects for most scenarios in the young gen (before any of 
them tenure), and accept that the objects of some scenarios (after a 
reasonable amount of copying) tenure into old.

The question is where we draw that line... With our original gc 
intervals, 24 cycles seemed to be a good trade-off. Now that we can 
only set MTT <= 15, MTT=15 with a stretched gc interval (by enlarging 
the eden) achieves the same effect.
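
In flag terms, that is roughly the following combination (the young gen 
size is a placeholder for whatever stretches the scavenge interval 
enough, and PrintTenuringDistribution is only there to verify the 
resulting age distribution):

    -XX:MaxTenuringThreshold=15 -XX:NewSize=<larger young gen>
    -XX:MaxNewSize=<larger young gen> -XX:+PrintTenuringDistribution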

> As you found, the basic rule is always that if a small transient overload
> causes survivor space overflow, that in turn can cause a much longer-term "ringing effect"
> because of "nepotism" (which begets more nepotism, ....,) the effects of
> which can last much longer than the initial transient that set it off.
> And, yes, NeverTenure will lead to overflow in long-tailed distributions
> unless your survivor spaces are large enough to accommodate the tail;
> which is just another way of saying, if the tail does not fit, you will
> have overflow which will cause nepotism and its ill-effects.

Unfortunately, it is not only long-tailed distributions that lead to 
survivor space overflow! This is what I meant by my sentence "most 
surprisingly..." in my previous mail:

I've run a scenario where all objects died after 2 gc cycles. With 
MTT=15, they were nicely collected after 2 cycles and the survivor 
spaces were only something like 10% full.

The *same* scenario with MTT=24 ("never tenure") filled up the survivor 
spaces to 100%, even causing tenuring of (live or dead, I don't know) 
objects into the old gen. It's obvious that 90% of the objects filling 
up the survivor spaces must have been dead already; gc just didn't 
collect them!

This doesn't always happen, but it happens often (so it is 
reproducible). To make it clearer: The same scenario with the same 
configuration (MTT=24) doesn't necessarily fill up the survivor spaces 
to 100%. I've also had runs where it only filled 20% of the survivor 
spaces. That's still a factor of 2 compared to MTT=15 (which still 
means there's 50% garbage being copied around), but not as bad as 
filling them up to 100%.

And it gets even better: During a 30-minute run, I've even seen a 
change in gc behavior (without any change in the load): For the first 20 
minutes, the target survivor space was always 100% full after a 
collection. Then, all of a sudden, from the next collection on, it was 
only 20% full for the remaining 10 minutes.

Tony has an explanation for this (Tony? You can probably explain this 
better than me). This is one of the main reasons why "never tenure" 
works very poorly for us.

Nick.

> 
> Thanks again for sharing that experience and for the nice
> explanation of your experiments.
> 
> -- ramki
> 
>> So, as a conclusion: Yes, we are missing the 5th age bit. NeverTenure
>> works very badly for us. MTT=15 with our original eden size is too
>> small, but increasing the eden size allows us to get similar behavior
>> with MTT=15 as with the original configuration (MTT=24 and 5 age bits).
>>
>> I think Tony's suggestion to limit the configurable MTT to 2^n-1 (with
>> n being the number of age bits) is a good solution. At least according
>> to my tests, this is much better than activating the "never tenure"
>> policy when the user is not aware of it.
>>
>> I hope some of this may have been helpful for you. I will send some more
>> detailed results and logfiles directly to Tony.
>>
>> Nick.
> 


