ParNew - how does it decide if Full GC is needed
Peter B. Kessler
Peter.B.Kessler at Oracle.COM
Fri May 9 18:01:28 UTC 2014
On 05/08/14 18:04, Vitaly Davidovich wrote:
> Thanks Peter, I understand the mess that a promotion failure causes now. I'm interested in your opinion on Ramki's last point, which is to defer the full gc until the next scavenge (i.e., remember that you think you may have a promotion failure on the next scavenge, and then do a full gc right before that scavenge).
The algorithm used for a full collection is not well-suited to a heap in which there's a lot of garbage. It involves (at least) two passes: an object-graph-order marking pass to identify live objects, and then an address-order pass that looks at every object and moves it if it is live (for the compacting collectors), or puts it on a free-list if it isn't (for the non-moving collectors). In contrast, scavenging is a single object-graph-order pass that examines only the live objects. That's why it is such a win for edens, where we expect the garbage ratio to be high.
Time a young generation collection on a typical 10GB eden, and one on a similarly-populated 10GB old generation. For science!
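If you'd rather not do the bookkeeping by hand, the java.lang.management GC beans report per-collector counts and accumulated time. A rough, hypothetical harness (the class name and sizes are invented; run it once with a large -Xmn and once with a small one to shift the work between the young and full collectors):

    import java.lang.management.GarbageCollectorMXBean;
    import java.lang.management.ManagementFactory;
    import java.util.ArrayList;
    import java.util.List;

    public class GcTiming {
        // Keep a fraction of the allocations live so the collections
        // have real work to do.
        static final List<byte[]> live = new ArrayList<>();

        public static void main(String[] args) {
            for (int i = 0; i < 1_000_000; i++) {
                byte[] b = new byte[64 * 1024];
                if (i % 100 == 0) {
                    live.add(b);                    // ~1% survives
                }
                if (live.size() > 10_000) {
                    live.subList(0, 5_000).clear(); // cap the live set
                }
            }
            for (GarbageCollectorMXBean gc :
                     ManagementFactory.getGarbageCollectorMXBeans()) {
                System.out.printf("%s: %d collections, %d ms total%n",
                        gc.getName(), gc.getCollectionCount(),
                        gc.getCollectionTime());
            }
        }
    }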
If we wait until the eden is full again, when we know the old generation is also full, then we can't scavenge the young generation. Maybe that wouldn't bother you, because you are hoping there is no next collection. You've chosen to use the throughput collector, where the focus is on getting the collections done in the most efficient manner. Ramki is suggesting the low-pause collector, where the focus is on doing most of the collection work concurrently with the application. If there are cycles to spare (CPU and memory), such a collection might complete without interfering with the application, so maybe Ramki is not as concerned about the cost of a failed promotion and full collection. One size does not fit all. You haven't said why you made the choice you did.
> I think you'll find that there are many JVM deployments out there that either restart their JVM daily or force a GC off peak hours. For those cases, you want to keep on running out of eden as much as possible, since it's likely that there won't be a next scavenge: either the JVM is restarted, or a forced gc is induced off hours, at which point you don't care how long it takes. It sounds like that's what ParNew does, so maybe that's worth a try.
Certainly there are applications that only run during banking hours. Or that run with hot spares and fail over rather than take a GC pause. Just as certainly, there are applications that run for months and shape their heaps to provide the levels of service they need, with the help of the right collector. When benchmarking application performance I've been known to run with -Xms512g -Xmx512g -Xmn384g just to see how things go without any interference from the collector. (I'm in Oracle Labs, but I could probably arrange for a sales representative to call you if you want to buy a big machine. :-)
I still haven't seen the output of -XX:+PrintHeapAtGC from your application. It may be that we can squeeze enough space into the eden to put off the collection for long enough. Though, it seems brittle.
What was the result of running with smaller survivor spaces to give more space to the eden? What was the result of running a larger heap with more space in the young generation? What happens if you run with an extravagant -Xms64g -Xmx64g -Xmn32g? This might seem farcical on a machine with, for example, only 16GB of RAM, but if you really don't care about the duration of the forced collections before the day begins, and really think you don't allocate more than 16GB during the day, then your operating system might well swap out the parts of the heap that you aren't using any more and keep in memory the parts of the heap that you are using. If you ever collect, it will be a disaster. If you ever need more live data than you have memory, you will page yourself to death. (Object-graph-order traversals of your swap space! I'm curious how that works out off an SSD.) Brittle isn't the word for this; maybe "pre-stressed"? I have a hard time even suggesting such a setup, but you seem determined. If it works, write it up.
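For concreteness, that experiment spelled out as a command line (sizes are illustrative only; the flags are the standard HotSpot ones mentioned in this thread, and YourApp is a placeholder):

    java -Xms64g -Xmx64g -Xmn32g \
         -XX:+UseParallelGC \
         -XX:+PrintGCDetails -XX:+PrintHeapAtGC \
         YourApp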
... peter
> Also, in my example here, the induced GC took nearly 7 secs (compared to 1+ sec for a young collection with a larger space) on a fairly small tenured gen and reclaimed only a nominal amount -- one could say it was a waste of time, but I do appreciate that this setup is not the norm.
>
> Thanks, this has been a very educational discussion.
>
> Sent from my phone
>
> On May 8, 2014 7:11 PM, "Peter B. Kessler" <Peter.B.Kessler at oracle.com> wrote:
>
> Recovering from promotion failure is slow. The advantage of scavenges is that you only touch the live objects, and there aren't many of those. When a scavenge finishes successfully, you can just reset the allocation pointer in the eden, because everything is either unreachable or has been copied somewhere else. When a promotion fails, you have an eden with some live objects in it, but you don't know where they are. So (at least with techniques we know about) you have to pick up each young generation object, decide whether it's still reachable and whether it has already been copied out, compact the live objects into the space in the eden, and then run around updating all the pointers to the live objects that you moved. Touching each object in eden is painful (because there are lots of them) and not terribly satisfying (because most of them are not even reachable).
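> For illustration, here is a contrived allocator that can provoke exactly that recovery path (everything about it is invented for the example; try it with something like -Xmx256m -Xmn128m and -XX:+PrintGCDetails, and expect that it may end in an OutOfMemoryError -- it is a throwaway experiment, not production code):
>
>     import java.util.ArrayList;
>     import java.util.List;
>
>     public class PromotionFailureDemo {
>         public static void main(String[] args) {
>             // Fill the old generation until almost nothing fits.
>             List<byte[]> oldGen = new ArrayList<>();
>             try {
>                 while (true) oldGen.add(new byte[1024 * 1024]);
>             } catch (OutOfMemoryError e) {
>                 // Free a little so the VM can keep running.
>                 for (int i = 0; i < 8 && !oldGen.isEmpty(); i++) {
>                     oldGen.remove(oldGen.size() - 1);
>                 }
>             }
>             // Allocate young objects that are live at scavenge time;
>             // with no head room in the old gen, promotion can fail.
>             List<byte[]> survivors = new ArrayList<>();
>             for (int i = 0; i < 10_000; i++) {
>                 survivors.add(new byte[256 * 1024]);
>                 if (survivors.size() > 128) {
>                     survivors.clear();  // let most of it die
>                 }
>             }
>         }
>     }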
>
> Much better to do a successful scavenge that empties the young generation and a full collection on the old generation to create space for the *next* scavenge using a collector that's designed for the old generation.
>
> Your situation is unusual. You might have to do more work to get the behavior you want.
>
> ... peter
>
> On 05/08/14 15:57, Vitaly Davidovich wrote:
>
> Jon,
>
> Thanks. So ParNew's behavior of not triggering a full gc preemptively seems a better fit for my use case. In fact, given our setup, allocation rate, and workload, we will not see another young gc. What's the purpose of doing a preemptive full gc (with all the baggage it comes with) in parallel old? Why not just wait until the next young collection (if that even happens) and take the full gc hit then? I'm failing to see the advantage of taking that hit eagerly, even after reading Peter's description. Is it to avoid a promotion failure that it thinks will happen next time? And if so, does it think the preemptive full gc is faster than handling a promotion failure next time?
>
> Thanks guys
>
> Sent from my phone
>
>
> On 05/08/2014 01:24 PM, Srinivas Ramakrishna wrote:
>
> The 98% old gen occupancy triggered one of my two neurons.
> I think there was gc policy code (I don't know if it's still there) that would proactively precipitate a full gc when it realized (based on recent/historical promotion volume stats) that the next minor gc would not be able to promote its survivors into the head room remaining in the old gen. (Don't ask me why it's better to do it now rather than the next time the young gen fills up, relying on the same check.) Again, I am not looking at the code, as it takes some effort to get to the box where I keep a copy of the hotspot/openjdk code.
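> Reduced to a cartoon, the check Ramki remembers would look something like this (invented names, not the actual HotSpot code):
>
>     class PromotionPolicySketch {
>         // If the promotion volume we expect at the next scavenge
>         // (historical average plus some padding) will not fit in the
>         // old gen's remaining head room, do the full gc now rather
>         // than risk a promotion failure at that scavenge.
>         static boolean fullGcBeforeNextScavenge(long avgPromotedBytes,
>                                                 long oldGenFreeBytes,
>                                                 double padFactor) {
>             return (long) (avgPromotedBytes * padFactor) > oldGenFreeBytes;
>         }
>     }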
>
>
> The UseParallelGC collector will do a full GC after a young GC if it thinks
> the next young GC will not succeed (per Peter's explanation). I don't think
> the ParNew GC will do that. I looked for that code but did not find it. I looked in
> the do_collection() code and the ParNew::collect() code.
>
> The only case I could find where a full GC followed a young GC with ParNew was
> if the collection failed to free enough space for the allocation. Given the amount
> of free space in the young gen after the collection, that's unlikely. Of course, there
> could be a bug.
>
> Jon
>
> Hopefully Jon & co. will quickly confirm or shoot down the imaginings of my foggy memory!
> -- ramki
>
>
> On Thu, May 8, 2014 at 12:55 PM, Vitaly Davidovich <vitalyd at gmail.com> wrote:
>
> I captured some usage and capacity stats via jstat right after that full gc that started this email thread. It showed 0% usage of survivor spaces (which makes sense now that I know that a full gc empties that out irrespective of tenuring threshold and object age); eden usage went down to like 10%; tenured usage was very high, 98%. Last gc cause was recorded as "Allocation Failure". So it's true that the tenured doesn't have much breathing room here, but what prompted this email is I don't understand why that even matters considering young gen got cleaned up quite nicely.
>
>
> On Thu, May 8, 2014 at 3:36 PM, Srinivas Ramakrishna <ysr1729 at gmail.com> wrote:
>
>
> By the way, as others have noted, -XX:+PrintGCDetails at max verbosity level would be your friend to get more visibility into this. Include -XX:+PrintHeapAtGC for even better visibility. For good measure, after the puzzling full gc happens (and hopefully before another GC happens) capture jstat data re the heap (old gen), for direct allocation visibility.
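> Concretely, something along these lines (the pid is a placeholder):
>
>     jstat -gccause <pid> 1s    # space utilization plus the last GC cause
>     jstat -gccapacity <pid>    # generation capacities, for old-gen head room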
>
> -- ramki
>
>
> On Thu, May 8, 2014 at 12:34 PM, Srinivas Ramakrishna <ysr1729 at gmail.com> wrote:
>
> Hi Vitaly --
>
>
> On Thu, May 8, 2014 at 11:38 AM, Vitaly Davidovich <vitalyd at gmail.com> wrote:
>
> Hi Jon,
>
> Nope, we're not using CMS here; this is the throughput/parallel collector setup.
>
> I was browsing some of the gc code in openjdk, and noticed a few places where each generation attempts to decide (upfront from what I can tell, i.e. before doing the collection) whether it thinks it's "safe" to perform the collection (and if it's not, it punts to the next generation) and also whether some amount of promoted bytes will fit.
>
> I didn't dig too much yet, but a cursory scan of that code leads me to think that perhaps the defNew generation is asking the next gen (i.e. tenured) whether it could handle some estimated promotion amount, and given the large imbalance between Young and Tenured size, tenured is reporting that things won't fit -- this then causes a full gc. Is that at all possible from what you know?
>
>
> If that were to happen, you wouldn't see the minor gc that precedes the full gc in the log snippet you posted.
>
> The only situation I know of where a minor GC is followed immediately by a major one is when the minor gc didn't manage to fit an allocation request in the space available. But, thinking more about it, that can't be the case here, because one would expect that eden knows the largest object it can allocate: if a request is larger than will fit in the young gen, the allocator would just go look for space in the older generation, and if it didn't fit there either, the old gen would precipitate a gc that collects the entire heap. (All this should be taken with a dose of salt, as I don't have the code in front of me as I type, and I haven't looked at the allocation policy code in ages.)
>
>
> On your first remark about compaction, just to make sure I understand, you're saying that a full GC prefers to move all live objects into tenured (this means taking objects out of survivor space and eden), irrespective of whether their tenuring threshold has been exceeded? If that compaction/migration of objects into tenured overflows tenured, then it attempts to compact the young gen, with overflow into survivor space from eden. So basically, this generation knows how to perform compaction and it's not just a copying collection?
>
>
> That is correct. A full gc does in fact move all survivors from the young gen into the old gen. This is a limitation (artificial nepotism can ensue when "too young" objects that will soon die get dragged into the old generation) that I have been lobbying to fix for a while now. I think there's even an old, perhaps still open, bug for this.
>
>
> Is there a way to get the young gen to print an age table of objects in its survivor space? I couldn't find one, but perhaps I'm blind.
>
>
> +PrintTenuringDistribution (for ParNew/DefNew, perhaps also G1?)
>
>
> Also, as a confirmation, System.gc() always invokes a full gc with the parallel collector, right? I believe so, but just wanted to double check while we're on the topic.
>
>
> Right. (Not sure what happens if a JNI critical section is in force -- whether the gc is skipped or we wait for the JNI CS to complete; hopefully others can fill in the blanks/inaccuracies in my comments above, since they are based on how things used to be, in code I haven't looked at recently.)
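> For completeness, the trivial check (assuming -XX:+UseParallelGC and -verbose:gc; the class is made up):
>
>     public class ExplicitGc {
>         public static void main(String[] args) {
>             // With the parallel collector this is a stop-the-world full GC.
>             // The real flag -XX:+DisableExplicitGC turns it into a no-op.
>             System.gc();
>         }
>     }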
>
> -- ramki
>
>
> Thanks
>
>
> On Thu, May 8, 2014 at 1:39 PM, Jon Masamitsu <jon.masamitsu at oracle.com> wrote:
>
>
> On 05/07/2014 05:55 PM, Vitaly Davidovich wrote:
>
>
> Yes, I know :) This is some cruft that needs to be cleaned up.
>
> So my suspicion is that full gc is triggered precisely because old gen occupancy is almost 100%, but I'd appreciate confirmation on that. What's surprising is that even though old gen is almost full, young gen has lots of room now. In fact, this system is restarted daily so we never see another young gc before the restart.
>
> The other odd observation is that survivor spaces are completely empty after this full gc despite tenuring threshold not being adjusted.
>
>
> The full gc algorithm used compacts everything (old gen and young gen) into
> the old gen unless it does not all fit. If the old gen overflows, the young gen
> is compacted into itself. Live objects in the young gen are compacted into eden first and
> then into the survivor spaces.
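> Schematically, the destination logic Jon describes (invented names, not the HotSpot code):
>
>     class FullGcCompactionSketch {
>         // Old gen first; on overflow the young gen is compacted into
>         // itself, eden first, then the survivor spaces.
>         static String destination(long liveBytes, long oldCapacity,
>                                   long edenCapacity) {
>             if (liveBytes <= oldCapacity) {
>                 return "everything compacted into the old gen";
>             }
>             long overflow = liveBytes - oldCapacity;
>             if (overflow <= edenCapacity) {
>                 return "overflow compacted into eden";
>             }
>             return "overflow into eden, remainder into the survivor spaces";
>         }
>     }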
>
> My intuitive thinking is that there was no real reason for the full gc to occur; whatever allocation failed in young could now succeed and whatever was tenured fit, albeit very tightly.
>
>
> Still puzzling about the full GC. Are you using CMS? If you have PrintGCDetails output,
> that might help.
>
> Jon
>
> Sent from my phone
>
> On May 7, 2014 8:40 PM, "Bernd Eckenfels" <bernd-2014 at eckenfels.net> wrote:
>
> On Wed, 7 May 2014 19:34:20 -0400,
> Vitaly Davidovich <vitalyd at gmail.com> wrote:
>
> > The vm args are:
> >
> > -Xms16384m -Xmx16384m -Xmn16384m -XX:NewSize=12288m
> > -XX:MaxNewSize=12288m -XX:SurvivorRatio=10
>
> Hmm... you have conflicting arguments here: MaxNewSize overwrites Xmn.
> You will get a 16384-12288 = 4gb old gen, which is quite low. As you can see
> from your full GC, the steady state after the full GC has filled it nearly
> completely.
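> A consistent way to spell what was probably intended (assuming the effective 12g young gen is what you want; -Xmn and NewSize/MaxNewSize set the same thing, so pick one):
>
>     -Xms16384m -Xmx16384m -Xmn12288m -XX:SurvivorRatio=10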
>
> Regards,
> Bernd