ParNew - how does it decide if Full GC is needed
Peter B. Kessler
Peter.B.Kessler at Oracle.COM
Thu May 8 22:40:18 UTC 2014
On 05/08/14 14:44, Vitaly Davidovich wrote:
> Hi Peter,
>
> Thanks for the insight. A few questions ...
>
> So we get an allocation failure in eden, scavenger starts a young collection. Eden is at ~10gb at this point, with 2 survivor spaces of 1gb each. At the time this young collection runs, tenured only has about 1gb of data (out of 4gb+ capacity). Looking at the total used heap size post young GC:
>
> 29524.949: [GC 11279905K->4112377K(15728640K), 1.6319030 secs]
>
> That remaining 4gb of live data does make the tenured generation reach 98% occupancy, but eden is now totally clean with lots of space (10gb). It's also unclear why it decided to overflow entirely into tenured space -- why not keep 1gb in the survivor space (that's the survivor space capacity in my setup) and only promote the remaining live objects to tenured? In our case, this would've been better because presumably a full GC would not have triggered and we could've finished the day without any more gc events due to a lot of headroom left in young. Instead, a full GC was triggered, took nearly 7 secs, and didn't really reclaim much -- it was a waste of time, and seems unnecessary.
Without the output of -XX:+PrintHeapAtGC, I'm not going to try to guess the state of the eden and survivors before and after a young generation collection. E.g., how do you know that there isn't 1GB of space in a survivor at the end of the collection you cited? How full was the old generation before this collection? How much of the eden and survivor space survived this collection? (You say, below, "eden usage went down to like 10%". Taken literally, that implies 1GB of stuff left in eden that didn't fit in the old generation: that would be a promotion failure, and probably a disaster for performance. If I interpret "eden" to mean the young space, and remember that your survivors are 10% of the size of your eden, then this might mean that eden is empty and one survivor is full: that's just as things should be. But I don't like speculating.)
> Also, when these young and old gc events occurred, the only prior gc events were two forced full gc's very early on in the JVM's lifetime. What historical promotion info is used by the subsequent GC event in this case? Is there effectively no promotion history (due to all prior GC events having been forced via System.gc()) or does the scavenger assume some worst case scenario there?
You could look at the code (I pointed you at the tip of the iceberg, but maybe this one doesn't go down that far :-), or you could wait for someone else to provide the answer. That code has changed enough that I don't want to give you stale data. I know there are initial values and rolling averages, policies around the various kinds of collections, etc., but I can't recite the details. (Also, I apologize for pointing to an older OpenJDK changeset. If you really go digging, start from the changeset that corresponds to the JVM you are running.)
> Finally, what would be the recommended settings (other than raising max heap size and thus giving more room to tenured) for a setup such as this? That is, a JVM that runs for about 8-9 hours before being restarted. The gc allocation rate is fairly low, but does creep up over the course of the day. The amount of truly long-lived objects is somewhat close to tenured capacity, but smaller. The young gen is sized aggressively large to prevent (or at least make an effort in preventing) young GCs from occurring at all. But, if they do occur, vast majority of objects there are garbage.
If you are not expecting, or hope to prevent, any young collections, then why have survivor spaces at all? That's 2GB that isn't in the eden where it could be useful staving off young collections. The downside of smaller survivors is that when a young collection happens objects (presumably short-lived objects) may make it into the old generation, which might then fill up and cause a full collection. But if you don't have any young generation collections, you won't have any full collections either. (Maybe unless you have objects that are so large they get allocated directly in the old generation. There's policy code for that, too.)
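To put numbers on that 2GB (this is just the standard SurvivorRatio arithmetic, not anything measured): with -XX:NewSize=12288m and -XX:SurvivorRatio=10 the young generation is carved into 10 + 1 + 1 = 12 parts, so

    eden          = 12288 MB * 10/12 = 10240 MB
    each survivor = 12288 MB *  1/12 =  1024 MB

which matches the ~10GB eden and two 1GB survivors you describe. A larger -XX:SurvivorRatio would hand most of that 2GB back to eden; the particular value is a tuning choice I won't prescribe from here.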
> Thanks
> By the way, would the ParNew collector handle this type of setup better than PS? Or hard to say?
The scavenger for ParNew is not that different from the one for PS. There might be policy differences around the edges, but if your eden is large enough that you don't do any young generation collections, the allocation parts are probably indistinguishable.
It seems brittle to be depending on there being no collections. Usage patterns change over time. Code changes over time. Libraries change over time. JVMs change over time. Machines change over time. Mostly Java programmers don't think about allocating a few objects here and there, but it all adds up. You might sleep easier at night by doubling the size of the heap, even if that means buying more memory. But you'd still have to worry about that pending collection. You made it to 8+ hours before the collection you show: how much more time do you need?
... peter
> On Thu, May 8, 2014 at 5:16 PM, Peter B. Kessler <Peter.B.Kessler at oracle.com> wrote:
>
> The "problem", if you want to call it that, is that when the young generation has filled up before the next collection it is probably too late. The scavenger is optimistic and thinks everything can be promoted. It just goes ahead and starts a young collection. It gets a promotion failure if it runs out of space in the old generation, painfully recovers from the promotion failure and then causes a full collection. Instead we use the promotion history at the end of each young generation collection to decide to do a full collection preemptively. That way we can sneak in that last scavenge (usually pretty fast, and usually emptying the whole eden) before we invoke a full collection, which doesn't handle massive amounts of garbage well (e.g., in the young generation). If we were pessimistic, given Vitaly's heap layout, we'd do nothing but full collections.
>
> I think all the policy code (for the parallel scavenger) is in PSScavenge::invoke(), e.g.,
>
> http://hg.openjdk.java.net/jdk8/jdk8/hotspot/file/2f6dc76eb8e5/src/share/vm/gc_implementation/parallelScavenge/psScavenge.cpp
>
> starting at line 210. The policy decision is made in PSAdaptiveSizePolicy::should_full_GC
>
> http://hg.openjdk.java.net/jdk8/jdk8/hotspot/file/2f6dc76eb8e5/src/share/vm/gc_implementation/parallelScavenge/psAdaptiveSizePolicy.cpp
>
> starting at line 162. Look at all those lovely fish!
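>
> The heart of that test is roughly the comparison below. To be clear, this is a from-memory paraphrase with made-up names, not the actual HotSpot source; see psAdaptiveSizePolicy.cpp for the real thing:
>
>     #include <stddef.h>
>
>     // Sketch: after a scavenge, ask for a preemptive full collection if the
>     // historical promotion volume (padded with some slack) would no longer
>     // fit in the old generation's remaining free space.
>     bool should_full_gc(size_t padded_avg_promoted_bytes, size_t old_gen_free_bytes) {
>       return padded_avg_promoted_bytes > old_gen_free_bytes;
>     }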
>
> It looks like setting -XX:+PrintGCDetails -XX:+Verbose (a "develop" flag) would tell you what choices are being made (and probably produce a lot of other output as well :-). In a product build -XX:+PrintGCDetails -XX:+PrintHeapAtGC, as has been suggested by others, should get enough information to figure out what's going on.
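>
> For example, something along these lines on the launch command (the application name and the rest of the flags here are just placeholders):
>
>     java -XX:+PrintGCDetails -XX:+PrintHeapAtGC -Xloggc:gc.log ... YourApp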
>
> I've cited the code for the parallel scavenger, because Vitaly said "this is the throughput/parallel collector setup". The other collectors have similar policy code.
>
> ... peter
>
>
> On 05/08/14 13:24, Srinivas Ramakrishna wrote:
>
> The 98% old gen occupancy triggered one of my two neurons.
> I think there was gc policy code (don't know if it's still there) that would proactively precipitate a full gc when it realized (based on recent/historical promotion volume stats) that the next minor gc would not be able to promote its survivors into the headroom remaining in old. (Don't ask me why it's better to do it now rather than the next time the young gen fills up and just rely on the same check). Again I am not looking at the code (as it takes some effort to get to the box where I keep a copy of the hotspot/openjdk code.)
>
> Hopefully Jon & co. will quickly confirm or shoot down the imaginations of my foggy memory!
> -- ramki
>
>
> On Thu, May 8, 2014 at 12:55 PM, Vitaly Davidovich <vitalyd at gmail.com> wrote:
>
> I captured some usage and capacity stats via jstat right after that full gc that started this email thread. It showed 0% usage of survivor spaces (which makes sense now that I know that a full gc empties that out irrespective of tenuring threshold and object age); eden usage went down to like 10%; tenured usage was very high, 98%. Last gc cause was recorded as "Allocation Failure". So it's true that the tenured doesn't have much breathing room here, but what prompted this email is I don't understand why that even matters considering young gen got cleaned up quite nicely.
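>
> (For reference, that kind of snapshot comes from something like
>
>     jstat -gccause <pid>
>
> with <pid> being the JVM's process id; it prints the survivor, eden and old gen utilization percentages along with the cause of the last GC.)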
>
>
> On Thu, May 8, 2014 at 3:36 PM, Srinivas Ramakrishna <ysr1729 at gmail.com> wrote:
>
>
> By the way, as others have noted, -XX:+PrintGCDetails at max verbosity level would be your friend to get more visibility into this. Include -XX:+PrintHeapAtGC for even better visibility. For good measure, after the puzzling full gc happens (and hopefully before another GC happens) capture jstat data re the heap (old gen), for direct allocation visibility.
>
> -- ramki
>
>
> On Thu, May 8, 2014 at 12:34 PM, Srinivas Ramakrishna <ysr1729 at gmail.com> wrote:
>
> Hi Vitaly --
>
>
> On Thu, May 8, 2014 at 11:38 AM, Vitaly Davidovich <vitalyd at gmail.com> wrote:
>
> Hi Jon,
>
> Nope, we're not using CMS here; this is the throughput/parallel collector setup.
>
> I was browsing some of the gc code in openjdk, and noticed a few places where each generation attempts to decide (upfront from what I can tell, i.e. before doing the collection) whether it thinks it's "safe" to perform the collection (and if it's not, it punts to the next generation) and also whether some amount of promoted bytes will fit.
>
> I didn't dig too much yet, but a cursory scan of that code leads me to think that perhaps the defNew generation is asking the next gen (i.e. tenured) whether it could handle some estimated promotion amount, and given the large imbalance between Young and Tenured size, tenured is reporting that things won't fit -- this then causes a full gc. Is that at all possible from what you know?
>
>
> If that were to happen, you wouldn't see the minor gc that precedes the full gc in the log snippet you posted.
>
> The only situation I know where a minor GC is followed immediately by a major is when a minor gc didn't manage to fit an allocation request in the space available. But, thinking more about that, it can't be because one would expect that Eden knows the largest object it can allocate, so if the request is larger than will fit in young, the allocator would just go look for space in the older generation. If that didn't fit, the old gen would precipitate a gc which would collect the entire heap (all this should be taken with a dose of salt as I don't have the code in front of me as I type, and I haven't looked at the allocation policy code in ages).
>
>
> On your first remark about compaction, just to make sure I understand, you're saying that a full GC prefers to move all live objects into tenured (this means taking objects out of survivor space and eden), irrespective of whether their tenuring threshold has been exceeded? If that compaction/migration of objects into tenured overflows tenured, then it attempts to compact the young gen, with overflow into survivor space from eden. So basically, this generation knows how to perform compaction and it's not just a copying collection?
>
>
> That is correct. A full gc does in fact move all survivors from young gen into the old gen. This is a limitation (artificial nepotism can ensue because of "too young" objects that will soon die, getting artificially dragged into the old generation) that I had been lobbying to fix for a while now. I think there's even an old, perhaps still open, bug for this.
>
>
> Is there a way to get the young gen to print an age table of objects in its survivor space? I couldn't find one, but perhaps I'm blind.
>
>
> +PrintTenuringDistribution (for ParNew/DefNew, perhaps also G1?)
>
>
> Also, as a confirmation, System.gc() always invokes a full gc with the parallel collector, right? I believe so, but just wanted to double check while we're on the topic.
>
>
> Right. (Not sure what happens if a JNI critical section is in force -- whether the gc is skipped or we wait for the JNI CS to exit/complete; hopefully others can fill in the blanks/inaccuracies in my comments above, since they are based on the way things used to be a while ago, in code I haven't looked at recently.)
>
> -- ramki
>
>
> Thanks
>
>
> On Thu, May 8, 2014 at 1:39 PM, Jon Masamitsu <jon.masamitsu at oracle.com> wrote:
>
>
> On 05/07/2014 05:55 PM, Vitaly Davidovich wrote:
>
>
> Yes, I know :) This is some cruft that needs to be cleaned up.
>
> So my suspicion is that full gc is triggered precisely because old gen occupancy is almost 100%, but I'd appreciate confirmation on that. What's surprising is that even though old gen is almost full, young gen has lots of room now. In fact, this system is restarted daily so we never see another young gc before the restart.
>
> The other odd observation is that survivor spaces are completely empty after this full gc despite tenuring threshold not being adjusted.
>
>
> The full gc algorithm used compacts everything (old gen and young gen) into
> the old gen unless it does not all fit. If the old gen overflows, the young gen
> is compacted into itself. Live data in the young gen is compacted into eden first and
> then into the survivor spaces.
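>
> Written out as a sketch (not the actual HotSpot code, and the survivor split is illustrative), the destinations are tried in this order, spilling back into the young gen only when the old gen cannot hold everything:
>
>     // Order in which a full collection places live data.
>     enum Destination { OLD_GEN, EDEN, SURVIVOR_0, SURVIVOR_1 };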
>
> My intuitive thinking is that there was no real reason for the full gc to occur; whatever allocation failed in young could now succeed and whatever was tenured fit, albeit very tightly.
>
>
> Still puzzling about the full GC. Are you using CMS? If you have PrintGCDetails output,
> that might help.
>
> Jon
>
> Sent from my phone
>
>
> On May 7, 2014 8:40 PM, "Bernd Eckenfels" <bernd-2014 at eckenfels.net> wrote:
>
> On Wed, 7 May 2014 19:34:20 -0400,
> Vitaly Davidovich <vitalyd at gmail.com> wrote:
>
>
> > The vm args are:
> >
> > -Xms16384m -Xmx16384m -Xmn16384m -XX:NewSize=12288m
> > -XX:MaxNewSize=12288m -XX:SurvivorRatio=10
>
> Hmm... you have conflicting arguments here, MaxNewSize overwrites Xmn.
> You will get 16384-12288=4gb old size, that's quite low. As you can see
> in your FullGC the steady state after FullGC has filled it nearly
> completely.
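>
> One way to make the sizing self-consistent would be to drop the redundant
> -Xmn (just an illustration, keeping your other values):
>
>     -Xms16384m -Xmx16384m -XX:NewSize=12288m -XX:MaxNewSize=12288m -XX:SurvivorRatio=10
>
> or shrink the young gen so the old gen keeps more headroom.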
>
> Regards
> Bernd