Strange G1 behavior
Guillaume Lederrey
guillaume.lederrey at gmail.com
Fri Oct 20 13:03:56 UTC 2017
Thanks for your continuing interest in our issue!
I have been firefighting another issue with a user sending a bit too much
traffic our way. The good news is that this allowed us to tune our throttling
and will probably result in a slightly smoother load in the future, which can
only help...
I have prepared a change with your suggested improvements [1]. But I will
wait until next Monday to deploy it. I'll send the new logs as soon as I
have them.
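In particular, to check whether the marking restarts go away, I plan to
simply count the overflow message Thomas mentions below in our GC logs,
roughly like this (the log file name is just a placeholder for ours):

    # count marking restarts caused by mark stack overflow
    grep -c "concurrent-mark-reset-for-overflow" gc.log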
Kirk earlier suggested doubling the size of the heap (from 16GB to 32GB).
I have not yet implemented that suggestion. Do you think it makes sense to
bundle that change with the changes suggested by Thomas? Or should I keep
it for later?
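For clarity, here is a rough sketch of what the GC-relevant part of our
command line would look like if I bundle everything. This is a draft for
discussion, not the deployed configuration: the 32GB heap and the 512M mark
stack values still need to be confirmed, and I am assuming
-XX:+UnlockExperimentalVMOptions is already present, since
G1MaxNewSizePercent requires it on JDK8.

    # Draft only; values to be confirmed before deployment.
    # Max heap doubled from 16g, as Kirk suggested:
    -Xmx32g
    # Already in place:
    -XX:+UseG1GC
    -XX:+ParallelRefProcEnabled
    -XX:+UnlockExperimentalVMOptions
    -XX:G1MaxNewSizePercent=25
    # New: keep young from shrinking below 20% of the heap, per Kirk:
    -XX:G1NewSizePercent=20
    # New: size the mark stack up front to avoid overflow restarts, per Thomas:
    -XX:MarkStackSize=512M
    -XX:MarkStackSizeMax=512M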
Thanks again for your help!
[1] https://gerrit.wikimedia.org/r/#/c/385364/
On 20 October 2017 at 14:45, Kirk Pepperdine <kirk at kodewerk.com> wrote:
>
> > On Oct 20, 2017, at 1:41 PM, Thomas Schatzl <thomas.schatzl at oracle.com>
> wrote:
> >
> > Hi all,
> >
> > On Tue, 2017-10-17 at 23:48 +0200, Guillaume Lederrey wrote:
> >> Quick note before going to bed...
> >>
> >> On 17 October 2017 at 23:28, Kirk Pepperdine <kirk at kodewerk.com>
> >> wrote:
> >>> Hi all,
> >>> [...]
> > >>> This log looks different in that the mixed collections are actually
> > >>> recovering space. However, there seems to be an issue with RSet
> > >>> update times just as heap occupancy jumps, though I would view this
> > >>> as a normal response to increasing tenured occupancy. The spike in
> > >>> tenured occupancy does force young to shrink to a size where
> > >>> “to-space” has no room to accept incoming survivors.
> >>>
> > >>> Specific recommendations: the app is churning through enough weak
> > >>> references that it would benefit from parallelizing reference
> > >>> processing (off by default). I would double max heap and limit the
> > >>> shrinking of young to 20% to start with (default is 5%).
> >>>
> >>
> > >> I'll double max heap tomorrow. Parallel ref processing is already
> > >> enabled (-XX:+ParallelRefProcEnabled), and young is already limited
> > >> to max 25% (-XX:G1MaxNewSizePercent=25). I'll add
> > >> -XX:G1NewSizePercent=20 (if that's the correct option).
> >
> > Did that help?
> >
> > I am not convinced that increasing the min young gen helps, as it will
> > only lengthen the time between mixed gcs, which potentially means that
> > more data could accumulate to be promoted, but the time goal within the
> > collection (the amount of memory reclaimed) will stay the same.
> > Of course, if increasing the eden gives the objects in there enough
> > time to die, then it's a win.
>
> In my experience promotion rates are exacerbated by an overly small young
> gen (which translates into an overly small to-space). In these cases I
> believe it only adds to the overall pressure on tenured and is part of the
> reason why the full GC recovers as much as it does. Not promoting has the
> benefit of not requiring a mixed collection to clean things up. Thus larger
> survivors can still play a positive role, as they do in generational
> collectors. Mileage will vary with each application.
>
> >
> > The problem with that is that during the time from start of marking to
> > the end of the mixed gc, more data is promoted than reclaimed ;)
>
> Absolutely… and this is a case of the tail wagging the dog. An overly
> small young gen results in premature promotion, which results in more
> pressure on tenured, which results in more GC activity in tenured. GC
> activity in tenured is still to be avoided unless it shouldn’t be avoided.
>
> > One problem is the marking algorithm G1 uses in JDK8, which can overflow
> > easily, causing it to restart marking ("concurrent-mark-reset-
> > for-overflow" message). [That has been fixed in JDK9.]
> >
> > To fix that, set -XX:MarkStackSize to the same value as
> > -XX:MarkStackSizeMax (i.e. -XX:MarkStackSize=512M
> > -XX:MarkStackSizeMax=512M - probably a bit lower is fine too, and since
> > you set the initial mark stack size to the same as max I think you can
> > leave MarkStackSizeMax off from the command line).
>
> This is great information. Unfortunately there isn’t any data to help
> anyone understand what a reasonable setting should be. Would it also be
> reasonable to double the mark stack size when you see these failures?
> Also, is the max size of the stack bigger if you configure a larger heap?
>
>
> >
> > I do not think region liveness information is interesting any more
> > (-XX:+G1PrintRegionLivenessInfo), so you could remove it again.
>
> +1, sorry I forgot to mention this… although having a clean run (one
> without failures) with the data would be intellectually interesting.
>
> Kind regards,
> Kirk
>
>
--
mobile : +41 76 573 32 40
skype : Guillaume.Lederrey
Freenode: gehel
projects :
* https://signs-web.herokuapp.com/