G1 performance

Srinivas Ramakrishna ysr1729 at gmail.com
Thu May 9 23:21:41 UTC 2013


On Thu, May 9, 2013 at 1:02 PM, Kirk Pepperdine <kirk at kodewerk.com> wrote:
> can we have a link to the bug when you file it? I'm trying to make sense of these graphs but... I think I need the explanation.

Sure; I'll do that.

The x-axis is time and the y-axis is the signal value. "promo" is the
input signal; the paddedAv* series are the output signals (1 is the
original filter, 5 is the suggested final filter as described, and 2, 3
and 4 are the outputs of other filters I tried en route to 5). Details
will be in the bug.

-- ramki

>
> -- Kirk
>
>
> On 2013-05-09, at 7:56 PM, Srinivas Ramakrishna <ysr1729 at gmail.com> wrote:
>
>> On Thu, May 9, 2013 at 8:09 AM, Jon Masamitsu <jon.masamitsu at oracle.com> wrote:
>>>
>>> On 5/9/13 12:22 AM, Srinivas Ramakrishna wrote:
>> ...
>>>
>>>>
>>>> As regards ParallelGC's adaptive sizing, I have recently found that
>>>> the filter that maintains the estimate of the promotion volume
>>>> (which serves, among other things, as input to the decision whether
>>>> a scavenge might fail) produces estimates that are far too
>>>> pessimistic and, in a heap-constrained situation, can end up locking
>>>> the JVM into what appears to be a self-sustaining suboptimal state
>>>> from which it will not easily be dislodged. A smallish tweak to the
>>>> filter will produce much better behaviour.
>>>> I will submit a patch when I have had a chance to test its performance.
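
For context, here is a simplified sketch of the decision that this
estimate feeds. The function name and signature are hypothetical
(HotSpot's actual check lives in PSScavenge::should_attempt_scavenge);
the point is that when the padded promotion estimate does not fit in
the old gen's free space, the scavenge is skipped in favour of a full
gc, which is why a too-pessimistic estimate can pin the JVM in
back-to-back full gcs:

    #include <algorithm>
    #include <cstddef>

    // Sketch of the scavenge-feasibility check driven by the padded
    // promotion estimate (hypothetical free function; the real logic
    // is a member of PSScavenge in HotSpot).
    static bool should_attempt_scavenge(size_t padded_avg_promoted,
                                        size_t young_used,
                                        size_t old_gen_free) {
      // The promotion volume cannot exceed what is live in the young gen.
      size_t promotion_estimate = std::min(padded_avg_promoted, young_used);
      // Scavenge only if the estimated promotion fits in old-gen free space.
      return promotion_estimate < old_gen_free;
    }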
>>>
>>>
>>> Ok. You have my attention.  What's the nature of the tweak?
>>
>> :-)
>>
>> The padded promotion average calculation adds an absolute-deviation
>> estimate to the mean estimate.
>> There are three issues (a sketch combining all three fixes follows
>> this list):
>> . the absolute deviation is (naturally) positive even when the raw
>> deviation is negative (i.e., when the promotion volume is declining
>> relative to the prediction), which increases the padding, which
>> increases the deviation, and so on. A fix is to use a deviation of 0
>> when the raw deviation is negative (the case above). This reduces
>> the pessimism in the prediction.
>> . the filter gives much more weight to the historical estimate than
>> to the current value. This can be good, since it prevents oscillation
>> or over-reaction to small short-term changes, acting as a low-pass
>> filter, but the weighting is so skewed that the response becomes
>> extremely sluggish. Tuning the weight to favour past and present
>> equally seems to work well (of course this needs to be tested over a
>> range of benchmarks to see whether that value is generally good).
>> . when we get stuck in a back-to-back full gc state (because of the
>> above two issues, but mostly the first), I think no new
>> promotion-estimate telemetry comes into the filter. This leaves the
>> output of the filter locked in the past. We need to either decay the
>> filter output at each gc event (major or minor), or, better, just
>> feed it a promotion volume estimate computed by comparing the old
>> gen sizes before and after the collection (with a floor at 0).
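
For concreteness, here is a minimal, self-contained sketch of a filter
with all three tweaks applied. This is illustrative only: HotSpot's
real filter is AdaptivePaddedAverage in gcUtil.hpp, and the class name,
the 0..1 'weight' semantics and the constants below are assumptions
made for exposition, not the actual code or the proposed patch:

    #include <algorithm>
    #include <cstdio>

    class PaddedPromoAverage {
      double _weight;   // fraction of weight given to the new sample (0..1)
      double _padding;  // number of deviations added as safety padding
      double _avg;      // weighted average of the promotion volume
      double _dev;      // weighted average of the (clamped) deviation

    public:
      PaddedPromoAverage(double weight, double padding)
        : _weight(weight), _padding(padding), _avg(0.0), _dev(0.0) {}

      void sample(double promoted) {
        _avg = (1.0 - _weight) * _avg + _weight * promoted;
        // Tweak 1: pad only for positive deviations. When the promotion
        // volume falls below the prediction, feed in 0 rather than the
        // absolute error, so a declining signal no longer inflates the
        // padding (and hence the pessimism).
        double dev = std::max(promoted - _avg, 0.0);
        _dev = (1.0 - _weight) * _dev + _weight * dev;
      }

      // Tweak 3: during back-to-back full gcs, keep telemetry flowing
      // by feeding the filter the old-gen growth across the collection,
      // floored at 0, instead of letting the output stay frozen.
      void sample_from_old_gen(double old_used_before, double old_used_after) {
        sample(std::max(old_used_after - old_used_before, 0.0));
      }

      double padded_average() const { return _avg + _padding * _dev; }
    };

    int main() {
      // Tweak 2: weight past and present equally (0.5) rather than a
      // history-heavy weighting, so the filter tracks a falling signal
      // without excessive lag.
      PaddedPromoAverage promo(0.5, 3.0);
      const double signal[] = {100, 400, 50, 40, 30, 20, 10, 5};
      for (double s : signal) {
        promo.sample(s);
        std::printf("sample = %6.1f  padded = %8.2f\n",
                    s, promo.padded_average());
      }
      return 0;
    }

Running this against a spike-then-decline input like the one above shows
the padded output tracking the decline instead of staying pinned high.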
>>
>> I ran a simulation of a range of filters in Excel, with the same
>> inputs as the telemetric signal we were getting, and plotted the
>> outputs. I need to recheck the data (as my Excel skills are less than
>> stellar), but I am attaching a preliminary plot. (I'll make sure to
>> recheck and attach a checked plot to the bug that I am planning to
>> submit once we have done some more testing here with the proposed
>> changes.) The first is a "long-term" plot of the outputs, and the
>> second zooms into the first few moments after the spike and its
>> decline.
>>
>> The outputs correspond to a series of changes I made to the filter
>> (which I'll explain in the bug that I file), showing the effect of
>> each change on the filter's response.
>>
>> -- ramki
>> <BetterPredictionFilter_time.tiff><BetterPredictionFilter_time_start.tiff>
>


