G1 performance

Thu May 9 20:02:31 UTC 2013

Can we have a link to the bug when you file it? I'm trying to make sense of these graphs, but I think I need the explanation.

-- Kirk

On 2013-05-09, at 7:56 PM, Srinivas Ramakrishna <ysr1729 at gmail.com> wrote:

> On Thu, May 9, 2013 at 8:09 AM, Jon Masamitsu <jon.masamitsu at oracle.com> wrote:
>> 
>> On 5/9/13 12:22 AM, Srinivas Ramakrishna wrote:
> ...
>> 
>>> 
>>> As regards ParallelGC's adaptive sizing, I have recently found that
>>> the filter that maintains the estimate
>>> of the promotion volume (which serves, among other things, as input to
>>> the decision whether a
>>> scavenge might fail) produces estimates that are far too pessimistic
>>> and, in a heap-constrained situation,
>>> can end up locking the JVM into what appears to be a self-sustaining
>>> suboptimal state from which
>>> it will not easily be dislodged.  A smallish tweak to the filter will
>>> produce much better behaviour.
>>> I will submit a patch when I have had a chance to test its performance.
>> 
>> 
>> Ok. You have my attention.  What's the nature of the tweak?
> 
> :-)
> 
> The padded promotion average calculation adds an absolute deviation
> estimate to the mean estimate.
> There are three issues:
> . The absolute deviation is positive (naturally) even when the signed
>   deviation is negative (that is, when promotion volume is declining
>   with respect to the prediction), which increases the padding, which
>   increases the deviation, which .... A fix is to use a deviation of 0
>   when it's negative (the case above). This reduces the pessimism in
>   the prediction.
> . The filter gives much more weightage to the historical estimate than
>   to the current value. This can be good, since it prevents oscillation
>   or over-reaction to small short-term changes by acting as a low-pass
>   filter, but the weightage is so skewed that the reaction becomes
>   extremely sluggish. Tuning the weightage to favour past and present
>   equally seems to work well (of course, this needs to be tested over a
>   range of benchmarks to see if that value is generally good).
> . When we get stuck in a back-to-back full GC state (because of the
>   above two, but mostly because of the first reason), I think there is
>   no new promotion estimate telemetry coming into the filter. This
>   leaves the output of the filter locked up in the past. We need to
>   either decay the filter output at each GC event (major or minor) or,
>   better, just feed it a promotion volume estimate computed by
>   comparing the old gen sizes before and after the collection (with a
>   floor at 0).
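A minimal Python sketch of the three changes listed above, not the HotSpot implementation. The class name `PaddedAverage`, the function `promoted_estimate`, and the default weight and padding values are all hypothetical, chosen only for illustration. It also assumes that the promotion estimate for a collection is the growth of old gen occupancy across it, which the message does not spell out.

```python
from dataclasses import dataclass


@dataclass
class PaddedAverage:
    """Exponentially weighted mean plus a padded deviation (illustrative only)."""
    weight: float = 0.5           # share given to the newest sample (assumed value)
    padding: float = 3.0          # multiples of deviation added to the mean (assumed)
    clamp_negative: bool = True   # change 1: treat a negative deviation as 0
    average: float = 0.0
    deviation: float = 0.0
    samples: int = 0

    def _blend(self, new: float, old: float) -> float:
        # Change 2: weight = 0.5 favours past and present equally.
        return self.weight * new + (1.0 - self.weight) * old

    def sample(self, value: float) -> float:
        """Feed one promotion-volume sample and return the padded estimate."""
        if self.samples == 0:
            self.average = value
        else:
            self.average = self._blend(value, self.average)
        self.samples += 1
        diff = value - self.average
        dev_sample = max(0.0, diff) if self.clamp_negative else abs(diff)
        self.deviation = self._blend(dev_sample, self.deviation)
        return self.padded()

    def padded(self) -> float:
        return self.average + self.padding * self.deviation


def promoted_estimate(old_used_before: int, old_used_after: int) -> int:
    """Change 3: derive a sample from old gen occupancy, floored at 0,
    so the filter still receives input during back-to-back full GCs."""
    return max(0, old_used_after - old_used_before)
```

With these assumed parameters, feeding 100 and then 0 yields a padded estimate of 125 when the absolute deviation is used and 50 when the negative deviation is clamped to 0, so in the unclamped case the estimate rises even though the promotion volume fell.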
> 
> I ran a simulation of a range of filters in Excel with the same inputs
> as the telemetric signal we were getting and plotted the outputs.
> I need to recheck the data (as my Excel skills are less than stellar),
> but I am attaching a preliminary plot. (I'll make sure to recheck and
> attach a checked plot to the bug that I am planning to submit once we
> have done some more testing here with the proposed changes.)
> The first is a "long-term" plot of the outputs; the second zooms into
> the first few moments after the spike and its decline.
> 
> The outputs correspond to a series of changes I made to the filter
> (which I'll explain in the bug that I file), showing the effect of
> each change on the filter's response.
> 
> -- ramki
> <BetterPredictionFilter_time.tiff><BetterPredictionFilter_time_start.tiff>