G1 performance

Srinivas Ramakrishna ysr1729 at gmail.com
Thu May 9 17:56:32 UTC 2013


On Thu, May 9, 2013 at 8:09 AM, Jon Masamitsu <jon.masamitsu at oracle.com> wrote:
>
> On 5/9/13 12:22 AM, Srinivas Ramakrishna wrote:
...
>
>>
>> As regards ParallelGC's adaptive sizing, I have recently found that
>> the filter that maintains the estimate of the promotion volume (which
>> serves, among other things, as input to the decision whether a
>> scavenge might fail) produces estimates that are far too pessimistic
>> and, in a heap-constrained situation, can end up locking the JVM into
>> what appears to be a self-sustaining suboptimal state from which it
>> will not easily be dislodged.  A smallish tweak to the filter will
>> produce much better behaviour. I will submit a patch when I have had
>> a chance to test its performance.
>
>
> Ok. You have my attention.  What's the nature of the tweak?

:-)

The padded promotion average calculation adds an absolute-deviation
estimate to the mean estimate. There are three issues:
. The absolute deviation is (naturally) positive even when the actual
  deviation is negative, i.e. when the promotion volume is declining
  relative to the prediction. That increases the padding, which
  increases the deviation estimate, and so on. A fix is to use a
  deviation of 0 when it is negative (the case above), which reduces
  the pessimism in the prediction.
. The filter gives much more weight to the historical estimate than
  to the current value. This can be good since it prevents oscillation
  or over-reaction to small short-term changes, acting as a low-pass
  filter, but the weighting is so skewed that the response becomes
  extremely sluggish. Tuning the weighting to favour past and present
  equally seems to work well (of course this needs to be tested over
  a range of benchmarks to see whether that value is generally good).
. When we get stuck in a state of back-to-back full GCs (because of
  the above two issues, but mostly the first), I think no new
  promotion-estimate telemetry comes into the filter. This leaves the
  output of the filter locked up in the past. We need to either decay
  the filter output at each GC event (major or minor), or, better,
  feed it a promotion-volume estimate computed by comparing the
  old-gen sizes before and after the collection (with a floor at 0).
  (A small code sketch illustrating all three changes follows this
  list.)
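
To make the three changes concrete, here is a minimal, self-contained
C++ sketch of a padded exponentially weighted average filter with all
three applied. This is illustrative only, not the actual HotSpot
code: the class name, the alpha/padding values, and the full-GC hook
are made up for the example.

// Illustrative sketch only -- not the HotSpot implementation. Models
// a padded exponentially weighted average (prediction = mean +
// padding * deviation) with the three changes discussed above.
#include <algorithm>
#include <cstdio>

class PaddedPromotionFilter {            // hypothetical name
  double _avg;      // EWMA of the promotion volume
  double _dev;      // EWMA of the (clamped) deviation from the mean
  double _alpha;    // weight given to the newest sample
  double _padding;  // multiplier on the deviation term

 public:
  PaddedPromotionFilter(double alpha, double padding)
      : _avg(0.0), _dev(0.0), _alpha(alpha), _padding(padding) {}

  // Fold one promotion-volume sample into the filter.
  void sample(double promoted) {
    double err = promoted - _avg;
    // Change 1: when promotion volume falls below the prediction,
    // treat the deviation as 0 rather than folding in its magnitude,
    // which would inflate the padding and feed back on itself.
    double clamped = std::max(err, 0.0);
    // Change 2: alpha ~ 0.5 weights past and present equally; a much
    // smaller alpha (heavily favouring history) makes the response
    // extremely sluggish.
    _avg += _alpha * err;
    _dev += _alpha * (clamped - _dev);
  }

  // Change 3: during back-to-back full GCs no scavenge telemetry
  // arrives, so feed the filter an estimate derived from old-gen
  // occupancy before and after the collection, floored at 0.
  void full_gc_sample(double old_used_before, double old_used_after) {
    sample(std::max(old_used_after - old_used_before, 0.0));
  }

  // The padded prediction: mean plus a safety margin.
  double padded_average() const { return _avg + _padding * _dev; }
};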

I ran a simulation of a range of filters in Excel, using the same
inputs as the telemetry signal we were getting, and plotted the
outputs. I need to recheck the data (as my Excel skills are less than
stellar), but I am attaching two preliminary plots. (I'll make sure
to recheck and attach a verified plot to the bug that I am planning
to submit once we have done some more testing here with the proposed
changes.) The first is a "long-term" plot of the outputs; the second
zooms in on the first few moments after the spike and its decline.
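
For anyone who wants to reproduce the experiment without Excel, a toy
driver along the lines below (building on the hypothetical
PaddedPromotionFilter sketch earlier in this mail; the spike shape
and parameter values are made up) feeds a brief spike into a
history-heavy filter and an equally weighted one, and prints the
responses for plotting.

// Toy re-creation of the spreadsheet experiment, using the
// PaddedPromotionFilter sketch above. Columns: time step, input
// signal, history-heavy response, equally-weighted response.
int main() {
  PaddedPromotionFilter sluggish(0.05, 3.0);  // heavily favours history
  PaddedPromotionFilter balanced(0.50, 3.0);  // past and present equal
  for (int i = 0; i < 100; i++) {
    double promoted = (i >= 10 && i < 15) ? 100.0 : 5.0;  // brief spike
    sluggish.sample(promoted);
    balanced.sample(promoted);
    std::printf("%3d %7.2f %7.2f %7.2f\n", i, promoted,
                sluggish.padded_average(), balanced.padded_average());
  }
  return 0;
}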

The plotted outputs correspond to a series of changes I made to the
filter (which I'll explain in the bug that I file), showing the
effect of each change on the filter's response.

-- ramki
-------------- next part --------------
A non-text attachment was scrubbed...
Name: BetterPredictionFilter_time.tiff
Type: image/tiff
Size: 111354 bytes
Desc: not available
URL: <https://mail.openjdk.org/pipermail/hotspot-gc-dev/attachments/20130509/52bee8e8/BetterPredictionFilter_time.tiff>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: BetterPredictionFilter_time_start.tiff
Type: image/tiff
Size: 151350 bytes
Desc: not available
URL: <https://mail.openjdk.org/pipermail/hotspot-gc-dev/attachments/20130509/52bee8e8/BetterPredictionFilter_time_start.tiff>

