Bug in G1
Thomas Schatzl
thomas.schatzl at oracle.com
Fri Jul 21 14:34:27 UTC 2017
Hi Kirk,
On Fri, 2017-07-21 at 10:34 +0300, Kirk Pepperdine wrote:
> Hi all,
>
> A while back I mentioned to Erik at JFokus that I was seeing a
> puzzling behavior in G1 where, without any obvious failure, heap
> occupancy after collections would spike, which would trigger a full
> collection, which would (unexpectedly) completely recover everything
> down to the expected live set. Yesterday, while working with Simone
> Bordet on the problem, we came to the realization that we were seeing
> a pattern: prior to the ramp up to the Full, Survivor space would be
> ergonomically resized to 0 -> 0. The only way to reset the situation
> was to run a full collection. In our minds it does not make any sense
> to resize survivor space to 0. So far this is an observation
> from a single GC log but I recall seeing the pattern in many other
> logs. Before I go through the exercise of building a super grep to
> run over my G1 log repo I’d like to ask: under what conditions would
> it make sense to have the survivor space resized to 0? And if not,
> would this be a bug in G1? We tried reproducing the behavior in some
> test applications but I fear we often only see this happening in
> production applications that have been running for several days. It’s
> a behavior that I’ve seen in 1.7.0 and 1.8.0. No word on 9.
This sounds similar to https://bugs.openjdk.java.net/browse/JDK-8037500.
Could you please post the types of collections for a few more gcs
before the zero-sized ones? It would be particularly interesting to see
whether there is a mixed gc with to-space exhaustion just before this
sequence, and whether there are log messages about attempts to start
marking.
As for why that bug has been closed as "won't fix": in addition to the
reasons stated there (the performance impact seemed minor at the time),
we no longer have a reproducer to test any changes against.
There have also been some changes in 9 in how the next gc is
calculated, so I do not know whether 9 is affected as well (in
particular, one of these young-only gcs would not be issued any more).
I can think of at least one more reason, other than those stated in
the CR, why this occurs, at least for 8u60+ builds. There is the
possibility, particularly in conjunction with humongous object
allocation, that immediately after restarting the mutators a young gc
that reclaims zero space is issued, e.g.:
- young-gc: has X regions left at the end, starts the mutators
- mutator 1 allocates exactly X regions as humongous objects
- mutator 2 allocates, finds that there are no regions left, and issues
  a young-gc request; in this young-gc eden and survivor are obviously
  of zero size
[...and so on...]
Note that this pattern could repeat multiple times, as a young gc may
reclaim space from humongous objects (eager reclaim!), until at some
point it runs into a full gc.
Logging that shows humongous object allocation (something about
reaching the threshold and starting marking) could confirm this
situation.
No guarantees about that being the actual issue though.
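As for the "super grep" over your log repo: a sketch like the one
below might already be enough to find candidate files. The
"Survivors: 0.0B->0.0B" pattern is only an assumption about the
-XX:+PrintGCDetails heap transition lines, so it may need adjusting to
your actual log format.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class ZeroSurvivorGrep {
    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        try (Stream<Path> files = Files.walk(root)) {
            files.filter(Files::isRegularFile)
                 .filter(p -> p.toString().endsWith(".log"))
                 .forEach(ZeroSurvivorGrep::scan);
        }
    }

    // Print every line whose survivor transition is zero-sized.
    static void scan(Path log) {
        try (Stream<String> lines = Files.lines(log)) {
            lines.filter(l -> l.contains("Survivors: 0.0B->0.0B"))
                 .forEach(l -> System.out.println(log + ": " + l.trim()));
        } catch (IOException e) {
            System.err.println("could not read " + log + ": " + e);
        }
    }
}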
Thanks,
Thomas