Strange G1 behavior

Kirk Pepperdine kirk at kodewerk.com
Fri Oct 20 12:45:29 UTC 2017


> On Oct 20, 2017, at 1:41 PM, Thomas Schatzl <thomas.schatzl at oracle.com> wrote:
> 
> Hi all,
> 
> On Tue, 2017-10-17 at 23:48 +0200, Guillaume Lederrey wrote:
>> Quick note before going to bed...
>> 
>> On 17 October 2017 at 23:28, Kirk Pepperdine <kirk at kodewerk.com>
>> wrote:
>>> Hi all,
>>> [...]
>>> This log looks different in that the mixed collections are actually
>>> recovering space. However there seems to be an issue with RSet
> >>> update times just as heap occupancy jumps, though I would view this
> >>> as a normal response to increasing tenured occupancies. The spike
> >>> in tenured occupancy does force young to shrink to a size that
> >>> should see “to-space” with no room to accept incoming survivors.
>>> 
> >>> Specific recommendations: the app is churning through enough weak
> >>> references that it would benefit from parallelizing reference
> >>> processing (off by default). I would double max heap and limit the
> >>> shrinking of young to 20% to start with (default is 5%).
>>> 
>> 
> >> I'll double max heap tomorrow. Parallel ref processing is already
> >> enabled (-XX:+ParallelRefProcEnabled), and young is already limited
> >> to max 25% (-XX:G1MaxNewSizePercent=25). I'll add
> >> -XX:G1NewSizePercent=20 (if that's the correct option).
> 
> Did that help?
> 
> I am not convinced that increasing the min young gen helps, as it will
> only lengthen the time between mixed gcs, which potentially means that
> more data could accumulate to be promoted, but the time goal within the
> collection (the amount of memory reclaimed) will stay the same.
> Of course, if increasing the eden gives the objects in there enough
> time to die, then it's a win.

In my experience promotion rates are exacerbated by an overly small young gen (which translates into an overly small to-space). In these cases I believe it only adds to the overall pressure on tenured and is part of the reason why the full GC recovers as much as it does. Not promoting has the benefit of not requiring a mixed collection to clean things up. Thus larger survivor spaces can still play a positive role, just as they do in other generational collectors. Mileage will vary with each application.
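
To make the earlier suggestion concrete (the heap size below is a placeholder, since I don't have the current -Xmx in front of me, and note that the G1 size-percent flags are experimental in JDK 8 so they need unlocking), the sort of command line I have in mind would look roughly like:

  java -Xms<2x current> -Xmx<2x current> \
       -XX:+UseG1GC \
       -XX:+ParallelRefProcEnabled \
       -XX:+UnlockExperimentalVMOptions \
       -XX:G1NewSizePercent=20 \
       -XX:G1MaxNewSizePercent=25 \
       ... <your application>

Treat the percentages as a starting point to experiment with, not a known-good setting.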

> 
> The problem with that is that during the time from start of marking to
> the end of the mixed gc, more data is promoted than reclaimed ;)

Absolutely… and this is a case of the tail wagging the dog. An overly small young gen results in premature promotion, which results in more pressure on tenured, which in turn results in more GC activity in tenured. GC activity in tenured is still to be avoided unless there is a good reason not to avoid it.

> One problem is the marking algorithm G1 uses in JDK8, which can overflow
> easily, causing it to restart marking ("concurrent-mark-reset-
> for-overflow" message). [That has been fixed in JDK9]
> 
> To fix that, set -XX:MarkStackSize to the same value as
> -XX:MarkStackSizeMax (i.e. -XX:MarkStackSize=512M
> -XX:MarkStackSizeMax=512M - probably a bit lower is fine too, and since
> you set the initial mark stack size to the same as max I think you can
> leave MarkStackSizeMax off from the command line).

This is great information. Unfortunately there isn’t any data to help anyone understand what a reasonable setting should be. Would it also be reasonable to double the mark stack size when you see these failures? Also, is the max size of the stack bigger if you configure a larger heap?
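
Spelling your suggestion out as flags (the 512M figure is taken straight from your mail; as you say, something lower is probably fine too, and MarkStackSizeMax may be redundant once the initial size matches it):

  -XX:MarkStackSize=512M -XX:MarkStackSizeMax=512M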


> 
> I do not think region liveness information is interesting any more (-
> XX:+G1PrintRegionLivenessInfo), so you could remove it again.

+1, sorry I forgot to mention this… although having a clean run (one without failures) with the data would be intellectually interesting.
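
For the record, should you keep it around for one clean run, the sort of flag set I would expect to produce that data (G1PrintRegionLivenessInfo is a diagnostic option, so it needs unlocking) is roughly:

  -XX:+UnlockDiagnosticVMOptions \
  -XX:+G1PrintRegionLivenessInfo \
  -XX:+PrintGCDetails \
  -XX:+PrintGCDateStamps \
  -Xloggc:gc.log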

Kind regards,
Kirk



