ParNew promotion failed, no expected OOM.

Wed Apr 11 11:24:05 PDT 2012

I believe this case is not tripping the space threshold of the "gc
overhead" limit. As I have commented in the past, I think the GC overhead
limit should consider not just the space free in the whole heap, but
rather the difference between the old gen capacity and the sum of the
space used in the young gen and the old gen after a major GC has
completed, as a percentage of the old gen capacity. It almost seems as
though you have a largish object in the young gen which will not fit in
the space free in the old gen, so it will never be promoted unless
sufficient space clears up in the old gen, and from what you are
describing, that won't happen until your program terminates its
computation.
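
To make the difference between the two measures concrete, here is a
minimal sketch that plugs the figures from the last log entry quoted
below into both. It is not HotSpot code: the class and method names are
made up for illustration, and the 2% free-space limit is an assumption
(I am taking the default GCHeapFreeLimit to apply here). Heap used is
young plus old; the perm gen is reported separately in the log.

```java
// Illustrative sketch only; not HotSpot code. All names are hypothetical.
final class OverheadSpaceCheck {

    // Free space in the whole heap, as a percentage of heap capacity.
    static double heapFreePercent(long heapUsedK, long heapCapacityK) {
        return 100.0 * (heapCapacityK - heapUsedK) / heapCapacityK;
    }

    // Proposed measure: old gen capacity minus everything still used in
    // young + old after a major GC, as a percentage of old gen capacity.
    static double oldGenHeadroomPercent(long youngPlusOldUsedK, long oldCapacityK) {
        return 100.0 * (oldCapacityK - youngPlusOldUsedK) / oldCapacityK;
    }

    public static void main(String[] args) {
        long heapUsedK = 86039;      // from "86039K->86039K(100288K)"
        long heapCapacityK = 100288; // same log entry
        long oldCapacityK = 81152;   // from "[CMS: 69017K->69015K(81152K)"
        double assumedFreeLimitPercent = 2.0; // assumption, see lead-in

        double heapFree = heapFreePercent(heapUsedK, heapCapacityK);
        double headroom = oldGenHeadroomPercent(heapUsedK, oldCapacityK);

        System.out.printf("whole-heap free: %.1f%% (below limit: %b)%n",
                heapFree, heapFree < assumedFreeLimitPercent);
        System.out.printf("old gen headroom: %.1f%% (below limit: %b)%n",
                headroom, headroom < assumedFreeLimitPercent);
    }
}
```

With these numbers the whole heap still looks roughly 14% free, so a
whole-heap check would not fire, while the old gen headroom comes out
negative (about -6%), which is the situation described above.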

I think we need to fix the space criteria for overhead limit to deal
gracefully with these kinds of situations.

On an unrelated note, for such a small heap, you should probably use
ParallelOldGC rather than CMS, but I realize that you didn't explicitly
ask for CMS; the Mac just gave it to you because that's the default.
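
If you want to confirm which collectors a given run actually ended up
with (for example after adding -XX:+UseParallelOldGC to the command
line), a small check along these lines should do. It uses only the
standard java.lang.management API; the class name is made up for
illustration. The names printed are whatever the JVM reports, typically
"ParNew" and "ConcurrentMarkSweep" under CMS and "PS Scavenge" and
"PS MarkSweep" under the parallel collectors.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Illustrative helper; the class name is hypothetical.
final class ShowCollectors {

    // Names of the collectors registered in the running JVM.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<String>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // Print each collector with its cumulative count and time so far.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", time=" + gc.getCollectionTime() + " ms");
        }
    }
}
```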

-- ramki

On Wed, Apr 11, 2012 at 7:24 AM, Dawid Weiss <dawid.weiss at gmail.com> wrote:

> Hi there,
>
> We are measuring certain aspects of our algorithm with a test suite
> which attempts to run close to the physical heap's maximum size. We do
> it by doing a form of binary search based on the size of data passed
> to the algorithm, where the lower bound is always "succeeded without
> an OOM" and the upper bound is "threw an OOM". This works nicely, but
> occasionally we experience an effective deadlock in which full GCs are
> repeatedly invoked; the application makes progress, but overall it's
> several orders of magnitude slower than usual (hours instead of
> seconds).
>
> GC logs look like this:
>
> [GC [ParNew (promotion failed): 17023K->18905K(19136K), 0.0220371
> secs][CMS: 69016K->69014K(81152K), 0.1370901 secs]
> 86038K->86038K(100288K), [CMS Perm : 21285K->21285K(35724K)],
> 0.1591765 secs] [Times: user=0.20 sys=0.00, real=0.16 secs]
> [GC [ParNew (promotion failed): 17023K->18700K(19136K), 0.0170617
> secs][CMS: 69016K->69014K(81152K), 0.1235417 secs]
> 86038K->86038K(100288K), [CMS Perm : 21285K->21285K(35724K)],
> 0.1406872 secs] [Times: user=0.18 sys=0.00, real=0.14 secs]
> [GC [ParNew (promotion failed): 17023K->18700K(19136K), 0.0191855
> secs][CMS: 69016K->69014K(81152K), 0.1296462 secs]
> 86038K->86038K(100288K), [CMS Perm : 21285K->21285K(35724K)],
> 0.1488816 secs] [Times: user=0.18 sys=0.00, real=0.15 secs]
> [GC [ParNew (promotion failed): 17023K->18700K(19136K), 0.0232418
> secs][CMS: 69016K->69014K(81152K), 0.1300695 secs]
> 86038K->86038K(100288K), [CMS Perm : 21285K->21285K(35724K)],
> 0.1533590 secs] [Times: user=0.20 sys=0.00, real=0.15 secs]
> [GC [ParNew (promotion failed): 17023K->18905K(19136K), 0.0190998
> secs][CMS: 69016K->69014K(81152K), 0.1319668 secs]
> 86038K->86038K(100288K), [CMS Perm : 21285K->21285K(35724K)],
> 0.1511436 secs] [Times: user=0.18 sys=0.00, real=0.15 secs]
> [GC [ParNew (promotion failed): 17023K->18905K(19136K), 0.0168998
> secs][CMS: 69017K->69015K(81152K), 0.1359254 secs]
> 86038K->86038K(100288K), [CMS Perm : 21285K->21285K(35724K)],
> 0.1528776 secs] [Times: user=0.20 sys=0.01, real=0.16 secs]
> [GC [ParNew (promotion failed): 17023K->18905K(19136K), 0.0214651
> secs][CMS: 69017K->69015K(81152K), 0.1209494 secs]
> 86039K->86039K(100288K), [CMS Perm : 21285K->21285K(35724K)],
> 0.1424941 secs] [Times: user=0.18 sys=0.00, real=0.14 secs]
> [GC [ParNew (promotion failed): 17023K->18700K(19136K), 0.0200897
> secs][CMS: 69017K->69015K(81152K), 0.1244227 secs]
> 86039K->86039K(100288K), [CMS Perm : 21285K->21285K(35724K)],
> 0.1445654 secs] [Times: user=0.18 sys=0.00, real=0.14 secs]
> [GC [ParNew (promotion failed): 17023K->18905K(19136K), 0.0203377
> secs][CMS: 69017K->69015K(81152K), 0.1353857 secs]
> 86039K->86039K(100288K), [CMS Perm : 21285K->21285K(35724K)],
> 0.1558016 secs] [Times: user=0.19 sys=0.00, real=0.16 secs]
> [GC [ParNew (promotion failed): 17023K->18700K(19136K), 0.0201951
> secs][CMS: 69017K->69015K(81152K), 0.1289750 secs]
> 86039K->86039K(100288K), [CMS Perm : 21285K->21285K(35724K)],
> 0.1492306 secs] [Times: user=0.19 sys=0.00, real=0.15 secs]
> [GC [ParNew (promotion failed): 17023K->18700K(19136K), 0.0206677
> secs][CMS: 69017K->69015K(81152K), 0.1280734 secs]
> 86039K->86039K(100288K), [CMS Perm : 21285K->21285K(35724K)],
> 0.1488114 secs] [Times: user=0.18 sys=0.00, real=0.15 secs]
> [GC [ParNew (promotion failed): 17023K->18905K(19136K), 0.0150225
> secs][CMS: 69017K->69015K(81152K), 0.1301056 secs]
> 86039K->86039K(100288K), [CMS Perm : 21285K->21285K(35724K)],
> 0.1451940 secs] [Times: user=0.19 sys=0.01, real=0.14 secs]
>
> The heap limit is intentionally left smallish and the routine where
> this happens is in fact computational (it does allocate sporadic
> objects but never releases them until finished).
>
> This behavior is easy to reproduce on my Mac (quad core),
>
> java version "1.6.0_31"
> Java(TM) SE Runtime Environment (build 1.6.0_31-b04-414-11M3626)
> Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01-414, mixed mode)
>
> I read a bit about the nature of "promotion failed" and it's clear to
> me (or so I think) why this is happening here. My questions are:
>
> 1) why isn't OOM being triggered by gc overhead limit? It should
> easily be falling within the default thresholds,
> 2) is there anything one can do to prevent a situation like the above
> (other than manually fiddling with limits)?
>
> Thanks in advance for any pointers and feedback,
>
> Dawid
> _______________________________________________
> hotspot-gc-use mailing list
> hotspot-gc-use at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
>