ParNew - how does it decide if Full GC is needed

Fri May 9 01:04:51 UTC 2014

Thanks Peter, I understand the mess that a promotion failure causes now.
I'm interested in your opinion on Ramki's last point, which is to defer the
full gc until the next scavenge (i.e. remember that you think you may have
promotion failure on next scavenge, and then do a full gc right before that
next scavenge).
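
To make the two options concrete, here is a toy sketch of the difference between running the full gc eagerly and remembering the prediction until eden actually fills up again. It is not HotSpot code: the class, the Policy enum and the onEdenFull() hook are all made up for illustration, and the "prediction" is simply passed in as a boolean.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: every name here is hypothetical, nothing mirrors HotSpot internals.
public class DeferredFullGc {
    public enum Policy { EAGER, DEFERRED }

    private final Policy policy;
    private boolean fullGcPending = false;          // "the next scavenge will probably fail"
    public final List<String> log = new ArrayList<>();

    public DeferredFullGc(Policy policy) {
        this.policy = policy;
    }

    // Called each time eden fills up. predictedToFailNext is the collector's
    // guess, made after this scavenge, about the following one.
    public void onEdenFull(boolean predictedToFailNext) {
        if (fullGcPending) {
            // Deferred work is paid for only now; the full gc also empties young.
            log.add("full");
            fullGcPending = false;
            return;
        }
        log.add("young");
        if (predictedToFailNext) {
            if (policy == Policy.EAGER) {
                log.add("full");                    // pay now, even if eden never fills again
            } else {
                fullGcPending = true;               // remember, pay only if eden fills again
            }
        }
    }

    public static void main(String[] args) {
        DeferredFullGc eager = new DeferredFullGc(Policy.EAGER);
        DeferredFullGc deferred = new DeferredFullGc(Policy.DEFERRED);
        // One scavenge, then the JVM is restarted before eden fills again.
        eager.onEdenFull(true);
        deferred.onEdenFull(true);
        System.out.println("eager:    " + eager.log);     // [young, full]
        System.out.println("deferred: " + deferred.log);  // [young]
    }
}
```

In the restart-daily scenario the deferred variant never pays for the full gc at all, which is the whole appeal for a setup like mine.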

I think you'll find that there are many JVM deployments out there that
either restart their JVM daily or force GC off peak hours.  For those
cases, you want to keep allocating out of eden for as long as possible since
it's likely that there won't be a next scavenge, either because the JVM is
restarted or a forced gc is induced off hours, at which point you don't
care how long it takes.  It sounds like that's what ParNew does, so maybe
that's worth a try.

Also, in my example here, the induced GC took nearly 7 secs (as compared to
1+ sec for young with a larger space) on a fairly small tenured and
reclaimed some very nominal amount - one could say it was a waste of time
doing it, but I do appreciate that this setup is not the norm.

Thanks, this has been a very educational discussion.

On May 8, 2014 7:11 PM, "Peter B. Kessler" <Peter.B.Kessler at oracle.com>
wrote:

> Recovering from promotion failure is slow.  The advantage of scavenges is
> that you only touch the live objects, and there aren't many of those.  When
> a scavenge finishes successfully, you can just reset the allocation pointer
> in the eden because everything is either unreachable, or has been copied
> somewhere else.  When a promotion fails, you have an eden with some live
> objects in it, but you don't know where they are.  So (at least with
> techniques we know about) you have to pick up each young generation object
> and decide if it's still reachable or not, whether it has already been
> copied out, and compact the live objects into the space in the eden, and
> then run around updating all the pointers to the live objects that you
> moved.  Touching each object in eden is painful (because there are lots of
> them) and not terribly satisfying (because most of them are unreachable).
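
As a back-of-the-envelope illustration of that cost difference, the toy model below compares how many objects each path has to visit. The object counts are invented (assuming 1% of the young generation survives), the method names are hypothetical, and the pointer-updating pass is ignored, so treat it as a sketch of the scaling argument rather than a measurement.

```java
// Toy cost model: the numbers are invented for illustration only.
public class ScavengeCost {
    // A successful scavenge copies the live objects out and then resets the
    // eden allocation pointer, so its work scales with the live set.
    public static long touchedBySuccessfulScavenge(long liveObjects) {
        return liveObjects;
    }

    // After a promotion failure every young-generation object has to be
    // visited to decide whether it is reachable or already copied, so the
    // work scales with everything that was allocated.
    public static long touchedAfterPromotionFailure(long liveObjects, long deadObjects) {
        return liveObjects + deadObjects;
    }

    public static void main(String[] args) {
        long live = 1_000_000L;    // assumed survivors
        long dead = 99_000_000L;   // assumed garbage
        long ok = touchedBySuccessfulScavenge(live);
        long failed = touchedAfterPromotionFailure(live, dead);
        System.out.println("successful scavenge touches:   " + ok);
        System.out.println("promotion failure touches:     " + failed);
        System.out.println("ratio:                         " + (failed / ok) + "x");
    }
}
```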
>
> Much better to do a successful scavenge that empties the young generation
> and a full collection on the old generation to create space for the *next*
> scavenge using a collector that's designed for the old generation.
>
> Your situation is unusual.  You might have to do more work to get the
> behavior you want.
>
>                         ... peter
>
> On 05/08/14 15:57, Vitaly Davidovich wrote:
>
>> Jon,
>>
>> Thanks.  So ParNew's behavior of not triggering a full gc preemptively
>> seems a better fit for my use case.  In fact, given our setup, allocation
>> rate, and workload, we will not have another young gc.  What's the purpose
>> of doing a preemptive full gc (with all the baggage it comes with) in
>> parallel old? Why not just wait until the next young collection (if that
>> even happens) and take the full gc hit then? I'm failing to see the
>> advantage of taking that hit eagerly, even after reading Peter's
>> description.  Is it to avoid the promotion failure that it thinks will
>> happen next time? And if so, does it think that doing the preemptive full
>> gc is faster than handling a promotion failure next time?
>>
>> Thanks guys
>>
>>
>> On 05/08/2014 01:24 PM, Srinivas Ramakrishna wrote:
>>
>>> The 98% old gen occupancy triggered one of my two neurons.
>>> I think there was gc policy code (don't know if it's still there) that
>>> would proactively precipitate a full gc when it realized (based on
>>> recent/historical promotion volume stats) that the next minor gc would not
>>> be able to promote its survivors into the head room remaining in old.
>>> (Don't ask me why it's better to do it now rather than the next time the
>>> young gen fills up and just rely on the same check). Again I am not looking
>>> at the code (as it takes some effort to get to the box where I keep a copy
>>> of the hotspot/openjdk code.)
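
The check described above can be sketched roughly as follows. This is an illustration of the idea only, not the HotSpot policy code: the class name, the exponentially weighted average and its weight parameter are all assumptions, and the sample numbers are made up apart from the 4096 MB old gen at about 98% occupancy reported in this thread.

```java
// Hypothetical sketch of "will the next scavenge's promotions fit in old?".
// Not the HotSpot implementation; the averaging scheme is an assumption.
public class PromotionCheck {
    private final double weight;          // weight of the newest sample, 0 < weight <= 1
    private double avgPromotedMb = 0.0;
    private boolean seeded = false;

    public PromotionCheck(double weight) {
        this.weight = weight;
    }

    // Record how much the scavenge that just finished promoted into old.
    public void recordScavenge(long promotedMb) {
        if (!seeded) {
            avgPromotedMb = promotedMb;
            seeded = true;
        } else {
            avgPromotedMb = weight * promotedMb + (1.0 - weight) * avgPromotedMb;
        }
    }

    // True if the estimated promotion volume exceeds the old-gen headroom,
    // i.e. the policy would rather collect old now than risk a failed scavenge.
    public boolean shouldCollectOldNow(long oldCapacityMb, long oldUsedMb) {
        long headroomMb = oldCapacityMb - oldUsedMb;
        return avgPromotedMb > headroomMb;
    }

    public static void main(String[] args) {
        PromotionCheck check = new PromotionCheck(0.5);
        check.recordScavenge(100);   // assumed history
        check.recordScavenge(300);   // average is now 200 MB
        // Old gen of 4096 MB at roughly 98% occupancy leaves about 82 MB free.
        System.out.println(check.shouldCollectOldNow(4096, 4014));  // true
        System.out.println(check.shouldCollectOldNow(4096, 1000));  // false
    }
}
```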
>>>
>>
>> The UseParallelGC collector will do a full GC after a young GC if it
>> thinks the next young GC will not succeed (per Peter's explanation).  I
>> don't think the ParNew GC will do that.  I looked for that code but did
>> not find it.  I looked in the do_collection() code and the
>> ParNew::collect() code.
>>
>> The only case I could find where a full GC followed a young GC with
>> ParNew was if the collection failed to free enough space for the
>> allocation.  Given the amount of free space in the young gen after the
>> collection, that's unlikely.  Of course, there could be a bug.
>>
>> Jon
>>
>>> Hopefully Jon & co. will quickly confirm or shoot down the imaginations of
>>> my foggy memory!
>>> -- ramki
>>>
>>>
>>> On Thu, May 8, 2014 at 12:55 PM, Vitaly Davidovich <vitalyd at gmail.com> wrote:
>>>
>>>     I captured some usage and capacity stats via jstat right after that
>>> full gc that started this email thread.  It showed 0% usage of survivor
>>> spaces (which makes sense now that I know that a full gc empties that out
>>> irrespective of tenuring threshold and object age); eden usage went down to
>>> about 10%; tenured usage was very high, 98%.  Last gc cause was recorded as
>>> "Allocation Failure".  So it's true that the tenured doesn't have much
>>> breathing room here, but what prompted this email is I don't understand why
>>> that even matters considering young gen got cleaned up quite nicely.
>>>
>>>
>>>     On Thu, May 8, 2014 at 3:36 PM, Srinivas Ramakrishna <ysr1729 at gmail.com> wrote:
>>>
>>>
>>>         By the way, as others have noted, -XX:+PrintGCDetails at max
>>> verbosity level would be your friend to get more visibility into this.
>>> Include -XX:+PrintHeapAtGC for even better visibility. For good measure,
>>> after the puzzling full gc happens (and hopefully before another GC
>>> happens) capture jstat data re the heap (old gen), for direct allocation
>>> visibility.
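
Alongside the flags and jstat mentioned above, per-pool occupancy can also be read from inside the process with the standard java.lang.management API. The sketch below is only a complement, not something proposed in this thread: the class name is made up, and the pool names it prints depend on the collector and JVM version.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

// Minimal in-process heap snapshot; HeapPoolSnapshot is a hypothetical name.
public class HeapPoolSnapshot {
    // Returns one line per heap memory pool with used/committed/max bytes.
    public static String snapshot() {
        StringBuilder sb = new StringBuilder();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue;                       // skip code cache, metaspace, etc.
            }
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue;                       // pool is no longer valid
            }
            // getMax() is -1 when the maximum is undefined for the pool.
            sb.append(String.format("%-28s used=%d committed=%d max=%d%n",
                    pool.getName(), usage.getUsed(), usage.getCommitted(), usage.getMax()));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(snapshot());
    }
}
```

Calling snapshot() right after the puzzling full gc (for example from a scheduled task) would give roughly the same old gen and eden picture as a jstat sample taken at that moment.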
>>>
>>>         -- ramki
>>>
>>>
>>>         On Thu, May 8, 2014 at 12:34 PM, Srinivas Ramakrishna <ysr1729 at gmail.com> wrote:
>>>
>>>             Hi Vitaly --
>>>
>>>
>>>             On Thu, May 8, 2014 at 11:38 AM, Vitaly Davidovich <vitalyd at gmail.com> wrote:
>>>
>>>                 Hi Jon,
>>>
>>>                 Nope, we're not using CMS here; this is the
>>> throughput/parallel collector setup.
>>>
>>>                 I was browsing some of the gc code in openjdk, and
>>> noticed a few places where each generation attempts to decide (upfront from
>>> what I can tell, i.e. before doing the collection) whether it thinks it's
>>> "safe" to perform the collection (and if it's not, it punts to the next
>>> generation) and also whether some amount of promoted bytes will fit.
>>>
>>>                 I didn't dig too much yet, but a cursory scan of that
>>> code leads me to think that perhaps the defNew generation is asking the
>>> next gen (i.e. tenured) whether it could handle some estimated promotion
>>> amount, and given the large imbalance between Young and Tenured size,
>>> tenured is reporting that things won't fit -- this then causes a full gc.
>>> Is that at all possible from what you know?
>>>
>>>
>>>             If that were to happen, you wouldn't see the minor gc that
>>> precedes the full gc in the log snippet you posted.
>>>
>>>             The only situation I know where a minor GC is followed
>>> immediately by a major is when a minor gc didn't manage to fit an
>>> allocation request in the space available. But, thinking more about that,
>>> it can't be because one would expect that Eden knows the largest object it
>>> can allocate, so if the request is larger than will fit in young, the
>>> allocator would just go look for space in the older generation. If that
>>> didn't fit, the old gen would precipitate a gc which would collect the
>>> entire heap (all this should be taken with a dose of salt as I don't have
>>> the code in front of me as I type, and I haven't looked at the allocation
>>> policy code in ages).
>>>
>>>
>>>                 On your first remark about compaction, just to make sure
>>> I understand, you're saying that a full GC prefers to move all live objects
>>> into tenured (this means taking objects out of survivor space and eden),
>>> irrespective of whether their tenuring threshold has been exceeded? If that
>>> compaction/migration of objects into tenured overflows tenured, then it
>>> attempts to compact the young gen, with overflow into survivor space from
>>> eden.  So basically, this generation knows how to perform compaction and
>>> it's not just a copying collection?
>>>
>>>
>>>             That is correct. A full gc does in fact move all survivors
>>> from young gen into the old gen. This is a limitation (artificial nepotism
>>> can ensue because of "too young" objects that will soon die, getting
>>> artificially dragged into the old generation) that I had been lobbying to
>>> fix for a while now. I think there's even an old, perhaps still open, bug
>>> for this.
>>>
>>>
>>>                 Is there a way to get the young gen to print an age
>>> table of objects in its survivor space? I couldn't find one, but perhaps
>>> I'm blind.
>>>
>>>
>>>             +PrintTenuringDistribution (for ParNew/DefNew, perhaps also
>>> G1?)
>>>
>>>
>>>                 Also, as a confirmation, System.gc() always invokes a
>>> full gc with the parallel collector, right? I believe so, but just wanted
>>> to double check while we're on the topic.
>>>
>>>
>>>             Right. (Not sure what happens if JNI critical section is in
>>> force -- whether it's skipped or we wait for the JNI CS to exit/complete;
>>> hopefully others can fill in the blanks/inaccuracies in my comments above,
>>> since they are based on how things used to be a while ago in code I
>>> haven't looked at recently.)
>>>
>>>             -- ramki
>>>
>>>
>>>                 Thanks
>>>
>>>
>>>                 On Thu, May 8, 2014 at 1:39 PM, Jon Masamitsu <jon.masamitsu at oracle.com> wrote:
>>>
>>>
>>>                     On 05/07/2014 05:55 PM, Vitaly Davidovich wrote:
>>>
>>>>
>>>>                     Yes, I know :) This is some cruft that needs to be
>>>> cleaned up.
>>>>
>>>>                     So my suspicion is that full gc is triggered
>>>> precisely because old gen occupancy is almost 100%, but I'd appreciate
>>>> confirmation on that. What's surprising is that even though old gen is
>>>> almost full, young gen has lots of room now. In fact, this system is
>>>> restarted daily so we never see another young gc before the restart.
>>>>
>>>>                     The other odd observation is that survivor spaces
>>>> are completely empty after this full gc despite tenuring threshold not
>>>> being adjusted.
>>>>
>>>>
>>>                     The full gc algorithm used compacts everything (old
>>>                     gen and young gen) into the old gen unless it does
>>>                     not all fit. If the old gen overflows, the young gen
>>>                     is compacted into itself. Live data in the young gen
>>>                     is compacted into eden first and then into the
>>>                     survivor spaces.
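
A toy model of where the live data ends up under that description: fill the old gen first, then eden, then the survivor spaces. It is a simplification with made-up names that treats all live data as one sliding block and the two survivor spaces as one combined capacity; it is not the actual compaction code. The capacities come from the flags in this thread, while the 4014 MB live figure is an assumption chosen to land near the reported 98% old gen occupancy.

```java
import java.util.Arrays;

// Toy model of the post-full-gc layout; not the real mark-compact algorithm.
public class FullGcLayout {
    // Returns {inOld, inEden, inSurvivor} in MB after compacting liveMb of data.
    public static long[] compact(long liveMb, long oldCapMb, long edenCapMb, long survivorCapMb) {
        long inOld = Math.min(liveMb, oldCapMb);        // everything goes to old if it fits
        long rest = liveMb - inOld;
        long inEden = Math.min(rest, edenCapMb);        // overflow is compacted into eden first
        rest -= inEden;
        long inSurvivor = Math.min(rest, survivorCapMb); // and only then into the survivors
        rest -= inSurvivor;
        if (rest > 0) {
            throw new IllegalStateException("live data does not fit in the heap");
        }
        return new long[] { inOld, inEden, inSurvivor };
    }

    public static void main(String[] args) {
        // Old 4096 MB, eden 10240 MB, survivors 2 x 1024 MB (from the flags in this thread).
        System.out.println(Arrays.toString(compact(4014, 4096, 10240, 2048)));  // [4014, 0, 0]
        System.out.println(Arrays.toString(compact(5000, 4096, 10240, 2048)));  // [4096, 904, 0]
    }
}
```

The first case matches the observation that the survivor spaces were empty after the full gc regardless of the tenuring threshold.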
>>>
>>>>                     My intuitive thinking is that there was no real
>>>> reason for the full gc to occur; whatever allocation failed in young could
>>>> now succeed and whatever was tenured fit, albeit very tightly.
>>>>
>>>>
>>>                     Still puzzled about the full GC.  Are you using
>>>                     CMS?  If you have PrintGCDetails output, that might
>>>                     help.
>>>
>>>                     Jon
>>>
>>>>
>>>>                     On May 7, 2014 8:40 PM, "Bernd Eckenfels" <bernd-2014 at eckenfels.net> wrote:
>>>>
>>>>                         On Wed, 7 May 2014 19:34:20 -0400,
>>>>                         Vitaly Davidovich <vitalyd at gmail.com> wrote:
>>>>
>>>>                         > The vm args are:
>>>>                         >
>>>>                         > -Xms16384m -Xmx16384m -Xmn16384m -XX:NewSize=12288m
>>>>                         > -XX:MaxNewSize=12288m -XX:SurvivorRatio=10
>>>>
>>>>                         Hmm... you have conflicting arguments here;
>>>>                         MaxNewSize overrides Xmn. You will get
>>>>                         16384 - 12288 = 4096 MB (4 GB) of old gen, which
>>>>                         is quite low. As you can see in your full GC,
>>>>                         the steady state after the full GC has filled it
>>>>                         nearly completely.
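
Spelled out, the sizing arithmetic for those flags looks like this. It assumes the usual meaning of -XX:SurvivorRatio (eden size divided by the size of one survivor space) and ignores alignment and any adaptive resizing the parallel collector may apply, so the survivor and eden figures are nominal. The class and method names are made up.

```java
// Nominal generation sizes implied by the flags in this thread (all values in MB).
public class HeapSizing {
    public static long oldMb(long heapMb, long youngMb) {
        return heapMb - youngMb;
    }

    // Young gen = eden + 2 survivor spaces, with eden = ratio * one survivor.
    public static long survivorMb(long youngMb, int survivorRatio) {
        return youngMb / (survivorRatio + 2);
    }

    public static long edenMb(long youngMb, int survivorRatio) {
        return youngMb - 2 * survivorMb(youngMb, survivorRatio);
    }

    public static void main(String[] args) {
        long heapMb = 16384;       // -Xms16384m -Xmx16384m
        long youngMb = 12288;      // -XX:NewSize=12288m -XX:MaxNewSize=12288m
        int survivorRatio = 10;    // -XX:SurvivorRatio=10
        System.out.println("old      = " + oldMb(heapMb, youngMb) + " MB");            // 4096
        System.out.println("survivor = " + survivorMb(youngMb, survivorRatio) + " MB"); // 1024 each
        System.out.println("eden     = " + edenMb(youngMb, survivorRatio) + " MB");     // 10240
    }
}
```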
>>>>
>>>>                         Regards
>>>>                         Bernd
>>>>                         _______________________________________________
>>>>                         hotspot-gc-use mailing list
>>>>                         hotspot-gc-use at openjdk.java.net
>>>>                         http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use