ParNew promotion failed, no expected OOM.
Dawid Weiss
dawid.weiss at gmail.com
Wed Apr 11 07:24:28 PDT 2012
Hi there,
We are measuring certain aspects of our algorithm with a test suite
that tries to run close to the physical heap's maximum size. We do
this with a form of binary search over the size of the data passed to
the algorithm, where the lower bound is always "succeeded without an
OOM" and the upper bound is "threw an OOM". This works nicely, but
occasionally we hit an effective livelock in which full GCs are
invoked over and over; the application still makes progress, but
overall it is several orders of magnitude slower than usual (hours
instead of seconds).
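For concreteness, the driver loop looks roughly like this (the names
runAlgorithm, generateData and the size constants are illustrative,
not our actual code):

  // Sketch of the search over input size (hypothetical names).
  long lo = MIN_SIZE;            // largest size known to succeed
  long hi = MAX_SIZE;            // smallest size known to fail
  while (hi - lo > TOLERANCE) {
    long mid = lo + (hi - lo) / 2;
    try {
      runAlgorithm(generateData(mid));
      lo = mid;                  // completed without an OOM: raise the lower bound
    } catch (OutOfMemoryError e) {
      hi = mid;                  // threw an OOM: lower the upper bound
    }
  }
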
GC logs look like this:
[GC [ParNew (promotion failed): 17023K->18905K(19136K), 0.0220371
secs][CMS: 69016K->69014K(81152K), 0.1370901 secs]
86038K->86038K(100288K), [CMS Perm : 21285K->21285K(35724K)],
0.1591765 secs] [Times: user=0.20 sys=0.00, real=0.16 secs]
[GC [ParNew (promotion failed): 17023K->18700K(19136K), 0.0170617
secs][CMS: 69016K->69014K(81152K), 0.1235417 secs]
86038K->86038K(100288K), [CMS Perm : 21285K->21285K(35724K)],
0.1406872 secs] [Times: user=0.18 sys=0.00, real=0.14 secs]
[GC [ParNew (promotion failed): 17023K->18700K(19136K), 0.0191855
secs][CMS: 69016K->69014K(81152K), 0.1296462 secs]
86038K->86038K(100288K), [CMS Perm : 21285K->21285K(35724K)],
0.1488816 secs] [Times: user=0.18 sys=0.00, real=0.15 secs]
[GC [ParNew (promotion failed): 17023K->18700K(19136K), 0.0232418
secs][CMS: 69016K->69014K(81152K), 0.1300695 secs]
86038K->86038K(100288K), [CMS Perm : 21285K->21285K(35724K)],
0.1533590 secs] [Times: user=0.20 sys=0.00, real=0.15 secs]
[GC [ParNew (promotion failed): 17023K->18905K(19136K), 0.0190998
secs][CMS: 69016K->69014K(81152K), 0.1319668 secs]
86038K->86038K(100288K), [CMS Perm : 21285K->21285K(35724K)],
0.1511436 secs] [Times: user=0.18 sys=0.00, real=0.15 secs]
[GC [ParNew (promotion failed): 17023K->18905K(19136K), 0.0168998
secs][CMS: 69017K->69015K(81152K), 0.1359254 secs]
86038K->86038K(100288K), [CMS Perm : 21285K->21285K(35724K)],
0.1528776 secs] [Times: user=0.20 sys=0.01, real=0.16 secs]
[GC [ParNew (promotion failed): 17023K->18905K(19136K), 0.0214651
secs][CMS: 69017K->69015K(81152K), 0.1209494 secs]
86039K->86039K(100288K), [CMS Perm : 21285K->21285K(35724K)],
0.1424941 secs] [Times: user=0.18 sys=0.00, real=0.14 secs]
[GC [ParNew (promotion failed): 17023K->18700K(19136K), 0.0200897
secs][CMS: 69017K->69015K(81152K), 0.1244227 secs]
86039K->86039K(100288K), [CMS Perm : 21285K->21285K(35724K)],
0.1445654 secs] [Times: user=0.18 sys=0.00, real=0.14 secs]
[GC [ParNew (promotion failed): 17023K->18905K(19136K), 0.0203377
secs][CMS: 69017K->69015K(81152K), 0.1353857 secs]
86039K->86039K(100288K), [CMS Perm : 21285K->21285K(35724K)],
0.1558016 secs] [Times: user=0.19 sys=0.00, real=0.16 secs]
[GC [ParNew (promotion failed): 17023K->18700K(19136K), 0.0201951
secs][CMS: 69017K->69015K(81152K), 0.1289750 secs]
86039K->86039K(100288K), [CMS Perm : 21285K->21285K(35724K)],
0.1492306 secs] [Times: user=0.19 sys=0.00, real=0.15 secs]
[GC [ParNew (promotion failed): 17023K->18700K(19136K), 0.0206677
secs][CMS: 69017K->69015K(81152K), 0.1280734 secs]
86039K->86039K(100288K), [CMS Perm : 21285K->21285K(35724K)],
0.1488114 secs] [Times: user=0.18 sys=0.00, real=0.15 secs]
[GC [ParNew (promotion failed): 17023K->18905K(19136K), 0.0150225
secs][CMS: 69017K->69015K(81152K), 0.1301056 secs]
86039K->86039K(100288K), [CMS Perm : 21285K->21285K(35724K)],
0.1451940 secs] [Times: user=0.19 sys=0.01, real=0.14 secs]
The heap limit is intentionally kept smallish, and the routine where
this happens is purely computational: it allocates the occasional
object, but nothing is released until the routine finishes.
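In shape it is something like this (simplified, not the real code;
compute, iterations and ALLOCATION_INTERVAL are placeholders):

  // Mostly CPU-bound work with occasional small allocations, all of
  // which stay reachable until the loop completes.
  // (imports: java.util.List, java.util.ArrayList)
  List<int[]> retained = new ArrayList<int[]>();
  for (long i = 0; i < iterations; i++) {
    compute(i);                       // computation, no allocation
    if (i % ALLOCATION_INTERVAL == 0) {
      retained.add(new int[128]);     // sporadic allocation, kept alive
    }
  }
  // nothing in 'retained' becomes garbage until the loop is done
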
This behavior is easy to reproduce on my Mac (quad core), running:
java version "1.6.0_31"
Java(TM) SE Runtime Environment (build 1.6.0_31-b04-414-11M3626)
Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01-414, mixed mode)
I have read a bit about the nature of "promotion failed", and I think
I understand why it is happening here. My questions are:
1) Why isn't an OOM triggered by the GC overhead limit? This situation
should easily fall within the default thresholds (if I read the docs
right, 98% of the time spent in GC while recovering less than 2% of
the heap).
2) Is there anything one can do to prevent situations like the above
(other than manually fiddling with limits)?
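To be concrete about what I mean by a manual workaround (purely
illustrative, not code we want to keep): the obvious hack is to give
each attempt a wall-clock budget and count a timeout the same as a
failed run, along these lines:

  // Hypothetical guard around one attempt; runAlgorithm/generateData
  // are the same placeholders as in the sketch above.
  // (imports: java.util.concurrent.*)
  ExecutorService executor = Executors.newSingleThreadExecutor();
  final long size = candidateSize;     // the input size under test
  Future<?> attempt = executor.submit(new Runnable() {
    public void run() {
      runAlgorithm(generateData(size));
    }
  });
  try {
    attempt.get(TIMEOUT_SECONDS, TimeUnit.SECONDS);
    // finished in time: counts as "succeeded without an OOM"
  } catch (TimeoutException e) {
    attempt.cancel(true);              // assume GC thrashing, count as a failure
  } catch (ExecutionException e) {
    // e.getCause() may be the OutOfMemoryError we were probing for
  } catch (InterruptedException e) {
    Thread.currentThread().interrupt();
  } finally {
    executor.shutdownNow();
  }

That only works if the computation notices the interrupt, which is
part of why we would prefer a cleaner answer.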
Thanks in advance for any pointers and feedback,
Dawid