ParNew promotion failed, no expected OOM.

Jon Masamitsu jon.masamitsu at oracle.com
Wed Apr 11 23:02:18 PDT 2012


Dawid,

I haven't used these myself, but you can try the flags

PrintClassHistogramBeforeFullGC
PrintClassHistogramAfterFullGC

and see what gets collected.
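For example, something along these lines (the heap settings and the
main class are placeholders for your test setup):

  java -XX:+PrintClassHistogramBeforeFullGC \
       -XX:+PrintClassHistogramAfterFullGC \
       -Xmx100m -verbose:gc YourTest

Diffing the before/after histograms around a single full GC should
show which classes account for the couple of kilobytes that get
dropped.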

Jon


On 4/11/2012 11:53 AM, Dawid Weiss wrote:
>> I haven't looked at your numbers but the OOM due to the
>> GC overhead is thrown very conservatively.  In addition to
> I realize this, but this seems like a good example of a case where the
> GC overhead limit should fire... or so I think. There doesn't seem to
> be any space left at all --
>
> 69016K->69014K(81152K)
>
> I realize these are full GCs -- that's what -verbose:gc reports. (I
> included the detailed output because I had asked for it; otherwise all
> you would see is repeated full GCs and no progress from the
> application itself.)
>
> What's puzzling to me is that this routine only allocates memory (hard
> references, there is nothing to collect), yet the garbage collector
> _does_ drop around 2 KB on every full GC... Also, this routine is
> normally blazing fast and should either complete or OOM very quickly,
> but instead it stalls as if 99% of the time were spent doing full
> collections. I really cannot explain this.
>
> Is there any way to see which objects get _dropped_ on full GC runs?
> I'm curious what these dropped objects are.
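> (I know I could take live-object snapshots with something like
> jmap -histo:live <pid>, but the :live option itself forces a full GC,
> and catching the heap right before and after one particular
> collection by hand seems difficult.)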
>
> Dawid
>
>
>> spending too much time doing GC, the policy looks at how
>> much free space is available in the heap.  It may be that
>> there is enough free space in the heap such that the policy
>> does not want to trigger an OOM.
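>> (If I remember the defaults correctly, the overhead OOM is thrown
>> only when more than 98% of total time is spent in GC *and* less
>> than 2% of the heap is free after a full collection.  Those
>> thresholds are controlled by
>>
>>   java -XX:GCTimeLimit=98 -XX:GCHeapFreeLimit=2 ...
>>
>> so even a few percent of reclaimable space keeps the policy from
>> firing.)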
>>
>> You see the "promotion failure" message when the GC
>> policy thinks there is enough space in the old gen to
>> support a young collection.  It's supposed to be the
>> exception case, and I wonder a bit why you see
>> "promotion failure" messages repeatedly instead of
>> just seeing full collections.  But I can see how the
>> policy could get stuck in a situation where it keeps
>> thinking there is enough space in the old gen when
>> in the end there isn't.  Anyway, those are basically
>> full collections.
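>> (For more detail on the failures themselves, the
>> -XX:+PrintPromotionFailure flag should print the size of the
>> allocation that failed to promote.)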
>>
>> Jon
>>
>> On 04/11/12 07:24, Dawid Weiss wrote:
>>> Hi there,
>>>
>>> We are measuring certain aspects of our algorithm with a test suite
>>> which attempts to run close to the physical heap's maximum size. We do
>>> it with a form of binary search on the size of the data passed to the
>>> algorithm, where the lower bound is always "succeeded without an OOM"
>>> and the upper bound is always "threw an OOM". This works nicely, but
>>> occasionally we hit an effective deadlock in which full GCs are
>>> invoked repeatedly; the application still makes progress, but overall
>>> it is several orders of magnitude slower than usual (hours instead of
>>> seconds).
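>>>
>>> In (simplified) Java the search loop is roughly the following;
>>> runOnce(size) stands in for our algorithm, and the names are
>>> placeholders:
>>>
>>>   long lo = 0, hi = INITIAL_SIZE;   // lo: largest size that succeeded,
>>>   while (hi - lo > TOLERANCE) {     // hi: smallest size that OOMed
>>>     long mid = lo + (hi - lo) / 2;
>>>     try {
>>>       runOnce(mid);                 // run the algorithm on 'mid' units of data
>>>       lo = mid;                     // completed without an OOM
>>>     } catch (OutOfMemoryError e) {
>>>       hi = mid;                     // threw an OOM
>>>     }
>>>   }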
>>>
>>> GC logs look like this:
>>>
>>> [GC [ParNew (promotion failed): 17023K->18905K(19136K), 0.0220371
>>> secs][CMS: 69016K->69014K(81152K), 0.1370901 secs]
>>> 86038K->86038K(100288K), [CMS Perm : 21285K->21285K(35724K)],
>>> 0.1591765 secs] [Times: user=0.20 sys=0.00, real=0.16 secs]
>>> [GC [ParNew (promotion failed): 17023K->18700K(19136K), 0.0170617
>>> secs][CMS: 69016K->69014K(81152K), 0.1235417 secs]
>>> 86038K->86038K(100288K), [CMS Perm : 21285K->21285K(35724K)],
>>> 0.1406872 secs] [Times: user=0.18 sys=0.00, real=0.14 secs]
>>> [GC [ParNew (promotion failed): 17023K->18700K(19136K), 0.0191855
>>> secs][CMS: 69016K->69014K(81152K), 0.1296462 secs]
>>> 86038K->86038K(100288K), [CMS Perm : 21285K->21285K(35724K)],
>>> 0.1488816 secs] [Times: user=0.18 sys=0.00, real=0.15 secs]
>>> [GC [ParNew (promotion failed): 17023K->18700K(19136K), 0.0232418
>>> secs][CMS: 69016K->69014K(81152K), 0.1300695 secs]
>>> 86038K->86038K(100288K), [CMS Perm : 21285K->21285K(35724K)],
>>> 0.1533590 secs] [Times: user=0.20 sys=0.00, real=0.15 secs]
>>> [GC [ParNew (promotion failed): 17023K->18905K(19136K), 0.0190998
>>> secs][CMS: 69016K->69014K(81152K), 0.1319668 secs]
>>> 86038K->86038K(100288K), [CMS Perm : 21285K->21285K(35724K)],
>>> 0.1511436 secs] [Times: user=0.18 sys=0.00, real=0.15 secs]
>>> [GC [ParNew (promotion failed): 17023K->18905K(19136K), 0.0168998
>>> secs][CMS: 69017K->69015K(81152K), 0.1359254 secs]
>>> 86038K->86038K(100288K), [CMS Perm : 21285K->21285K(35724K)],
>>> 0.1528776 secs] [Times: user=0.20 sys=0.01, real=0.16 secs]
>>> [GC [ParNew (promotion failed): 17023K->18905K(19136K), 0.0214651
>>> secs][CMS: 69017K->69015K(81152K), 0.1209494 secs]
>>> 86039K->86039K(100288K), [CMS Perm : 21285K->21285K(35724K)],
>>> 0.1424941 secs] [Times: user=0.18 sys=0.00, real=0.14 secs]
>>> [GC [ParNew (promotion failed): 17023K->18700K(19136K), 0.0200897
>>> secs][CMS: 69017K->69015K(81152K), 0.1244227 secs]
>>> 86039K->86039K(100288K), [CMS Perm : 21285K->21285K(35724K)],
>>> 0.1445654 secs] [Times: user=0.18 sys=0.00, real=0.14 secs]
>>> [GC [ParNew (promotion failed): 17023K->18905K(19136K), 0.0203377
>>> secs][CMS: 69017K->69015K(81152K), 0.1353857 secs]
>>> 86039K->86039K(100288K), [CMS Perm : 21285K->21285K(35724K)],
>>> 0.1558016 secs] [Times: user=0.19 sys=0.00, real=0.16 secs]
>>> [GC [ParNew (promotion failed): 17023K->18700K(19136K), 0.0201951
>>> secs][CMS: 69017K->69015K(81152K), 0.1289750 secs]
>>> 86039K->86039K(100288K), [CMS Perm : 21285K->21285K(35724K)],
>>> 0.1492306 secs] [Times: user=0.19 sys=0.00, real=0.15 secs]
>>> [GC [ParNew (promotion failed): 17023K->18700K(19136K), 0.0206677
>>> secs][CMS: 69017K->69015K(81152K), 0.1280734 secs]
>>> 86039K->86039K(100288K), [CMS Perm : 21285K->21285K(35724K)],
>>> 0.1488114 secs] [Times: user=0.18 sys=0.00, real=0.15 secs]
>>> [GC [ParNew (promotion failed): 17023K->18905K(19136K), 0.0150225
>>> secs][CMS: 69017K->69015K(81152K), 0.1301056 secs]
>>> 86039K->86039K(100288K), [CMS Perm : 21285K->21285K(35724K)],
>>> 0.1451940 secs] [Times: user=0.19 sys=0.01, real=0.14 secs]
>>>
>>> The heap limit is intentionally kept smallish, and the routine where
>>> this happens is in fact computational (it allocates objects
>>> sporadically but never releases them until it finishes).
>>>
>>> This behavior is easy to reproduce on my Mac (quad core):
>>>
>>> java version "1.6.0_31"
>>> Java(TM) SE Runtime Environment (build 1.6.0_31-b04-414-11M3626)
>>> Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01-414, mixed mode)
>>>
>>> I read a bit about the nature of "promotion failed" and it's clear to
>>> me (or so I think) why this is happening here. My questions are:
>>>
>>> 1) Why isn't an OOM being triggered by the GC overhead limit? This
>>> case should easily fall within the default thresholds.
>>> 2) Is there anything one can do to prevent situations like the above
>>> (other than manually fiddling with the limits)?
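>>>
>>> (By "fiddling with limits" I mean, e.g., making the overhead OOM
>>> fire earlier with something like
>>>
>>>   java -XX:GCTimeLimit=90 -XX:GCHeapFreeLimit=5 ...
>>>
>>> but I would rather not hard-code such values into the test suite.)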
>>>
>>> Thanks in advance for any pointers and feedback,
>>>
>>> Dawid