ParNew - how does it decide if Full GC is needed
Vitaly Davidovich
vitalyd at gmail.com
Thu May 8 21:45:34 UTC 2014
By the way, would the ParNew collector handle this type of setup better
than PS? Or hard to say?
On Thu, May 8, 2014 at 5:44 PM, Vitaly Davidovich <vitalyd at gmail.com> wrote:
> Hi Peter,
>
> Thanks for the insight. A few questions ...
>
> So we get an allocation failure in eden, scavenger starts a young
> collection. Eden is at ~10gb at this point, with 2 survivor spaces of 1gb
> each. At the time this young collection runs, tenured only has about 1gb
> of data (out of 4gb+ capacity). Looking at the total used heap size post
> young GC:
>
> 29524.949: [GC 11279905K->4112377K(15728640K), 1.6319030 secs]
>
> That remaining 4gb of live data does make the tenured generation reach 98%
> occupancy, but eden is now totally clean with lots of space (10gb). It's
> also unclear why it decided to overflow entirely into tenured space -- why
> not keep 1gb in the survivor space (that's the survivor space capacity in
> my setup) and only promote the remaining live objects to tenured? In our
> case, this would've been better because presumably a full GC would not have
> triggered and we could've finished the day without any more gc events due
> to a lot of headroom left in young. Instead, a full GC was triggered, took
> nearly 7 secs, and didn't really reclaim much -- it was a waste of time,
> and seems unnecessary.
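(A rough back-of-the-envelope reading of that line, taking it as the usual
used-before -> used-after (committed) format: the 15728640K committed figure is
eden 10240m + one survivor 1024m + old 4096m = 15360m, and the ~4gb that remains
after the scavenge, with eden reported clean, must be sitting almost entirely in
tenured -- hence the 98% occupancy. Since roughly 3gb of young-gen data survived
the scavenge and a survivor space only holds 1gb, most of it had to be promoted
in any case; the open question is why none of it stayed in the survivor space.)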
>
> Also, when these young and old gc events occurred, the only prior gc
> events were two forced full gc's very early on in the JVM's lifetime. What
> historical promotion info is used by the subsequent GC event in this case?
> Is there effectively no promotion history (due to all prior GC events
> having been forced via System.gc()) or does the scavenger assume some worst
> case scenario there?
>
> Finally, what would be the recommended settings (other than raising max
> heap size and thus giving more room to tenured) for a setup such as this?
> That is, a JVM that runs for about 8-9 hours before being restarted. The
> gc allocation rate is fairly low, but does creep up over the course of the
> day. The amount of truly long-lived objects is somewhat close to tenured
> capacity, but smaller. The young gen is sized aggressively large to
> prevent (or at least make an effort in preventing) young GCs from occurring
> at all. But, if they do occur, the vast majority of objects there are garbage.
>
> Thanks
>
>
> On Thu, May 8, 2014 at 5:16 PM, Peter B. Kessler <Peter.B.Kessler at oracle.com> wrote:
>
>> The "problem", if you want to call it that, is that by the time the young
>> generation has filled up and the next collection is due, it is probably too
>> late. The scavenger is optimistic and assumes everything can be promoted.
>> It just goes ahead and starts a young collection. If it runs out of space
>> in the old generation it gets a promotion failure, painfully recovers from
>> the promotion failure, and then causes a full collection. Instead, we use
>> the promotion history at the end of each young generation collection to
>> decide whether to do a full collection preemptively. That way we can sneak
>> in one last scavenge (usually pretty fast, and usually emptying the whole
>> eden) before we invoke a full collection, which doesn't handle massive
>> amounts of garbage well (e.g., in the young generation). If we were
>> pessimistic, given Vitaly's heap layout, we'd do nothing but full
>> collections.
>>
>> I think all the policy code (for the parallel scavenger) is in
>> PSScavenge::invoke(), e.g.,
>>
>> http://hg.openjdk.java.net/jdk8/jdk8/hotspot/file/2f6dc76eb8e5/src/share/vm/gc_implementation/parallelScavenge/psScavenge.cpp
>>
>> starting at line 210. The policy decision is made in
>> PSAdaptiveSizePolicy::should_full_GC
>>
>> http://hg.openjdk.java.net/jdk8/jdk8/hotspot/file/2f6dc76eb8e5/src/share/vm/gc_implementation/parallelScavenge/psAdaptiveSizePolicy.cpp
>>
>> starting at line 162. Look at all those lovely fish!
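To make the shape of that decision concrete, here is a minimal, self-contained
sketch of the kind of check Peter describes. The names, the 3-sigma padding, and
the example numbers are illustrative assumptions, not the actual
PSAdaptiveSizePolicy code -- see the file cited above for the real thing:

    // Illustrative sketch only -- not HotSpot code.  After a scavenge, compare
    // an estimate of the next scavenge's promotion volume (a "padded" average
    // of recent promotions) against the free space left in the old generation,
    // and ask for a preemptive full collection if the next scavenge looks
    // unlikely to fit.
    #include <cstddef>
    #include <cstdio>

    struct PromotionHistory {
      double avg_promoted_bytes;   // decaying average of bytes promoted per scavenge
      double stddev_promoted;      // spread of recent promotion volumes
      // Pad the average with a safety margin so we err on the side of a
      // preemptive full GC rather than a promotion failure.
      double padded_average() const {
        return avg_promoted_bytes + 3.0 * stddev_promoted;
      }
    };

    // true => do a full collection now, before the next scavenge risks a
    // promotion failure.
    bool should_full_gc(const PromotionHistory& history, size_t old_free_bytes) {
      return history.padded_average() > static_cast<double>(old_free_bytes);
    }

    int main() {
      // Numbers roughly in the ballpark of this thread: ~3 GB promoted by the
      // last scavenge, ~80 MB of head room left in a 4 GB old gen.
      PromotionHistory h{3.0e9, 0.2e9};
      std::printf("full GC? %s\n",
                  should_full_gc(h, 80UL * 1024 * 1024) ? "yes" : "no");
      return 0;
    }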
>>
>> It looks like setting -XX:+PrintGCDetails -XX:+Verbose (a "develop" flag)
>> would tell you what choices are being made (and probably produce a lot of
>> other output as well :-). In a product build -XX:+PrintGCDetails
>> -XX:+PrintHeapAtGC, as has been suggested by others, should get enough
>> information to figure out what's going on.
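For example, an invocation along these lines (the gc.log path is just a
placeholder) should capture that information without needing a develop build:

    java -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC \
         -Xloggc:gc.log <existing application arguments>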
>>
>> I've cited the code for the parallel scavenger, because Vitaly said "this
>> is the throughput/parallel collector setup". The other collectors have
>> similar policy code.
>>
>> ... peter
>>
>>
>> On 05/08/14 13:24, Srinivas Ramakrishna wrote:
>>
>>> The 98% old gen occupancy triggered one of my two neurons.
>>> I think there was gc policy code (don't know if it's still there) that
>>> would proactively precipitate a full gc when it realized (based on
>>> recent/historical promotion volume stats) that the next minor gc would not
>>> be able to promote its survivors into the head room remaining in old.
>>> (Don't ask me why it's better to do it now rather than the next time the
>>> young gen fills up and just rely on the same check.) Again, I am not looking
>>> at the code (as it takes some effort to get to the box where I keep a copy
>>> of the hotspot/openjdk code).
>>>
>>> Hopefully Jon & co. will quickly confirm or shoot down the imaginations of
>>> my foggy memory!
>>> -- ramki
>>>
>>>
>>> On Thu, May 8, 2014 at 12:55 PM, Vitaly Davidovich <vitalyd at gmail.com> wrote:
>>>
>>> I captured some usage and capacity stats via jstat right after that
>>> full gc that started this email thread. It showed 0% usage of survivor
>>> spaces (which makes sense now that I know that a full gc empties them out
>>> irrespective of tenuring threshold and object age); eden usage went down to
>>> like 10%; tenured usage was very high, 98%. Last gc cause was recorded as
>>> "Allocation Failure". So it's true that the tenured doesn't have much
>>> breathing room here, but what prompted this email is that I don't understand why
>>> that even matters considering young gen got cleaned up quite nicely.
>>>
>>>
>>> On Thu, May 8, 2014 at 3:36 PM, Srinivas Ramakrishna <ysr1729 at gmail.com> wrote:
>>>
>>>
>>> By the way, as others have noted, -XX:+PrintGCDetails at max
>>> verbosity level would be your friend to get more visibility into this.
>>> Include -XX:+PrintHeapAtGC for even better visibility. For good measure,
>>> after the puzzling full gc happens (and hopefully before another GC
>>> happens) capture jstat data re the heap (old gen), for direct allocation
>>> visibility.
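For instance (illustrative only; <pid> stands for the JVM's process id):

    jstat -gccause <pid>        # space utilization plus the current/last GC cause
    jstat -gcold <pid> 1s 10    # old gen capacity and usage, sampled every second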
>>>
>>> -- ramki
>>>
>>>
>>> On Thu, May 8, 2014 at 12:34 PM, Srinivas Ramakrishna <ysr1729 at gmail.com> wrote:
>>>
>>> Hi Vitaly --
>>>
>>>
>>> On Thu, May 8, 2014 at 11:38 AM, Vitaly Davidovich <vitalyd at gmail.com> wrote:
>>>
>>> Hi Jon,
>>>
>>> Nope, we're not using CMS here; this is the
>>> throughput/parallel collector setup.
>>>
>>> I was browsing some of the gc code in openjdk, and
>>> noticed a few places where each generation attempts to decide (upfront from
>>> what I can tell, i.e. before doing the collection) whether it thinks it's
>>> "safe" to perform the collection (and if it's not, it punts to the next
>>> generation) and also whether some amount of promoted bytes will fit.
>>>
>>> I didn't dig too much yet, but a cursory scan of that
>>> code leads me to think that perhaps the defNew generation is asking the
>>> next gen (i.e. tenured) whether it could handle some estimated promotion
>>> amount, and given the large imbalance between Young and Tenured size,
>>> tenured is reporting that things won't fit -- this then causes a full gc.
>>> Is that at all possible from what you know?
>>>
>>>
>>> If that were to happen, you wouldn't see the minor gc that
>>> precedes the full gc in the log snippet you posted.
>>>
>>> The only situation I know where a minor GC is followed
>>> immediately by a major is when a minor gc didn't manage to fit an
>>> allocation request in the space available. But, thinking more about that,
>>> it can't be because one would expect that Eden knows the largest object it
>>> can allocate, so if the request is larger than will fit in young, the
>>> allocator would just go look for space in the older generation. If that
>>> didn't fit, the old gen would precipitate a gc which would collect the
>>> entire heap (all this should be taken with a grain of salt as I don't have
>>> the code in front of me as I type, and I haven't looked at the allocation
>>> policy code in ages).
>>>
>>>
>>> On your first remark about compaction, just to make sure
>>> I understand, you're saying that a full GC prefers to move all live objects
>>> into tenured (this means taking objects out of survivor space and eden),
>>> irrespective of whether their tenuring threshold has been exceeded? If that
>>> compaction/migration of objects into tenured overflows tenured, then it
>>> attempts to compact the young gen, with overflow into survivor space from
>>> eden. So basically, this generation knows how to perform compaction and
>>> it's not just a copying collection?
>>>
>>>
>>> That is correct. A full gc does in fact move all survivors
>>> from young gen into the old gen. This is a limitation (artificial nepotism
>>> can ensue because of "too young" objects that will soon die, getting
>>> artificially dragged into the old generation) that I have been lobbying to
>>> fix for a while now. I think there's even an old, perhaps still open, bug
>>> for this.
>>>
>>>
>>> Is there a way to get the young gen to print an age
>>> table of objects in its survivor space? I couldn't find one, but perhaps
>>> I'm blind.
>>>
>>>
>>> +PrintTenuringDistribution (for ParNew/DefNew, perhaps also
>>> G1?)
>>>
>>>
>>> Also, as a confirmation, System.gc() always invokes a
>>> full gc with the parallel collector, right? I believe so, but just wanted
>>> to double check while we're on the topic.
>>>
>>>
>>> Right. (Not sure what happens if a JNI critical section is in
>>> force -- whether it's skipped or we wait for the JNI CS to exit/complete;
>>> hopefully others can fill in the blanks/inaccuracies in my comments above,
>>> since they are based on the way things used to be a while ago in code I
>>> haven't looked at recently.)
>>>
>>> -- ramki
>>>
>>>
>>> Thanks
>>>
>>>
>>> On Thu, May 8, 2014 at 1:39 PM, Jon Masamitsu <jon.masamitsu at oracle.com> wrote:
>>>
>>>
>>> On 05/07/2014 05:55 PM, Vitaly Davidovich wrote:
>>>
>>>>
>>>> Yes, I know :) This is some cruft that needs to be
>>>> cleaned up.
>>>>
>>>> So my suspicion is that full gc is triggered
>>>> precisely because old gen occupancy is almost 100%, but I'd appreciate
>>>> confirmation on that. What's surprising is that even though old gen is
>>>> almost full, young gen has lots of room now. In fact, this system is
>>>> restarted daily so we never see another young gc before the restart.
>>>>
>>>> The other odd observation is that survivor spaces
>>>> are completely empty after this full gc despite tenuring threshold not
>>>> being adjusted.
>>>>
>>>>
>>> The full gc algorithm used compacts everything (old gen and young gen)
>>> into the old gen unless it does not all fit. If the old gen overflows,
>>> the young gen is compacted into itself. Live data in the young gen is
>>> compacted into eden first and then into the survivor spaces.
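A minimal sketch of just that destination ordering, with made-up space names
and sizes -- not the HotSpot mark-compact code:

    // Illustrative only: live data is directed at the old gen first; on
    // overflow, the young gen is compacted into itself, eden before survivors.
    #include <cstddef>
    #include <vector>

    struct Space { const char* name; size_t capacity_mb; size_t used_mb; };

    // Returns the space a chunk of 'live_mb' megabytes of live data would be
    // compacted into, or nullptr on a compaction failure.
    Space* compaction_target(std::vector<Space*>& order, size_t live_mb) {
      for (Space* s : order) {
        if (s->used_mb + live_mb <= s->capacity_mb) {
          s->used_mb += live_mb;
          return s;
        }
      }
      return nullptr;
    }

    int main() {
      Space old_gen{"old", 4096, 1024}, eden{"eden", 10240, 0}, surv{"survivor", 1024, 0};
      std::vector<Space*> order{&old_gen, &eden, &surv};
      return compaction_target(order, 3000) == nullptr;  // 0 if the live data fit
    }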
>>>
>>> My intuitive thinking is that there was no real
>>>> reason for the full gc to occur; whatever allocation failed in young could
>>>> now succeed and whatever was tenured fit, albeit very tightly.
>>>>
>>>>
>>> Still puzzling about the full GC. Are you using
>>> CMS? If you have PrintGCDetails output,
>>> that might help.
>>>
>>> Jon
>>>
>>> Sent from my phone
>>>>
>>>>
>>>> On May 7, 2014 8:40 PM, "Bernd Eckenfels" <bernd-2014 at eckenfels.net> wrote:
>>>>
>>>> On Wed, 7 May 2014 19:34:20 -0400,
>>>> Vitaly Davidovich <vitalyd at gmail.com> wrote:
>>>>
>>>>
>>>> > The vm args are:
>>>> >
>>>> > -Xms16384m -Xmx16384m -Xmn16384m -XX:NewSize=12288m
>>>> > -XX:MaxNewSize=12288m -XX:SurvivorRatio=10
>>>>
>>>> Hmm... you have conflicting arguments here; MaxNewSize overrides Xmn.
>>>> You will get a 16384-12288=4gb old size, which is quite low. As you can
>>>> see from your full GC, the steady state after the full GC has filled it
>>>> nearly completely.
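(For what it's worth, assuming NewSize/MaxNewSize win over -Xmn and the usual
SurvivorRatio arithmetic, that works out to: young = 12288m, each survivor =
12288m / (10 + 2) = 1024m, eden = 12288m - 2*1024m = 10240m, old = 16384m -
12288m = 4096m. That matches the ~10gb eden, 1gb survivors, and 4gb tenured
described earlier, and the 15728640K (15360m) committed figure in the GC line,
which counts eden + one survivor + old.)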
>>>>
>>>> Regards,
>>>> Bernd