Heap OOM for no apparent reason?

John Coomes John.Coomes at oracle.com
Fri Jun 3 07:28:43 UTC 2011


Y. Srinivas Ramakrishna (y.s.ramakrishna at oracle.com) wrote:
> Sorry, sent previous email without addressing all of the issues.
> 
> On 6/2/2011 6:15 PM, Raman Gupta wrote:
> > It would be *really* handy if there were a switch like:
> >
> > -XX:+StackTraceOnOutOfMemoryError
> 
> Yes, that would be handy, and probably not too difficult.
> But I also wonder if something like OnOutOfMemoryError or the
> like would already get you enough info to get close to
> the problem ... (although, maybe because it's executed in
> a separate shell, by the time the command executes the
> process has likely gone well past the point when the problem
> occurred).

No need to worry, the OnOutOfMemoryError commands are run while the
JVM is at a safepoint.  This worked for me:

  java -XX:OnOutOfMemoryError='jstack %p' ...
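
The option also takes a list of semicolon-separated commands, so
something like this (an untested sketch -- adjust to taste) should
additionally grab a class histogram at the same point:

  java -XX:+HeapDumpOnOutOfMemoryError \
       -XX:OnOutOfMemoryError='jstack %p; jmap -histo %p' ...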

-John

> > to force the stack trace to be shown. Obviously, looking at every line
> > of code of every library my application uses, including core JDK
> > libraries, for code paths where large amounts of heap may be allocated
> > and the associated OOME is caught and swallowed, is pretty much
> > impossible.
> 
> Try looking for places where OOME (or a supertype of it) is caught.
> I am hoping there aren't too many of those...
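> 
> For example, anything along these lines (an illustrative pattern
> only, not actual Infinispan code -- the class and method names are
> made up) would swallow both the failure and its stack trace:
> 
>   class CacheSegment {
>       byte[] tryAllocate(int requestedSize) {
>           try {
>               return new byte[requestedSize];   // the large allocation
>           } catch (OutOfMemoryError e) {        // or catch (Throwable t)
>               return null;   // swallowed: nothing is ever printed
>           }
>       }
>   }
> 
> so grepping for "catch (OutOfMemoryError" and "catch (Throwable"
> might be a quick first pass.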
> 
> >
> > I think my next step is to increase the max heap size to a large value
> > which hopefully allows the large allocation to occur without failure,
> > and then periodically take heap dumps to isolate it.
> 
> Yes, that seems reasonable, or maybe use an allocation profiler
> with the larger heap and find it that way...
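> 
> (One option, assuming you are on a recent Sun/Oracle JDK, is the
> bundled hprof agent, e.g.
> 
>   java -agentlib:hprof=heap=sites,depth=16 ...
> 
> which ranks allocation sites by live bytes; any other allocation
> profiler would do just as well.)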
> 
> -- ramki
> 
> >
> > Thanks,
> > Raman
> >
> > On 06/02/2011 01:46 PM, Y. S. Ramakrishna wrote:
> >> If your code is not catching the OOM exception, you'd
> >> expect to see the stack trace when the program dies.
> >> If it catches the exception and carries on, you'd want
> >> it to print the exception detail. I don't know of
> >> cases where the exception would just disappear.
> >>
> >> In your case, the report to stdout/stderr(?) that an OOM occurred
> >> and that the heap is being dumped comes from inside the JVM,
> >> because you have asked for -XX:+HeapDumpOnOutOfMemoryError.
> >> After this point, your allocating thread would have gotten
> >> an OOME which it probably caught and swallowed, and hence
> >> the silence wrt the stack trace you would normally see. You
> >> will want to look at your Infinispan code to see how
> >> it deals with the inability to allocate said large objects.
> >>
> >> Recall that object size is limited by the size of and
> >> available space in the largest area (Eden or Old) in your
> >> Java heap. As Kirk noted, the full gc was to attempt allocation
> >> of an object that didn't fit into the available space in
> >> Eden or in Old (so from that you can estimate the size of
> >> the request).
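> >>
> >> (A rough back-of-the-envelope from Kirk's log below, assuming the
> >> OOME came right after that last full gc:
> >>
> >>   old gen free       = 174784K - 52158K = 122626K  (~120 MB)
> >>   young gen capacity =  57856K                     (~56 MB)
> >>
> >> so the failed request was presumably bigger than roughly 120 MB.)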
> >>
> >> Note also that the JDK libraries will resize hashtables under
> >> you and that can also cause large allocation requests
> >> (but I don't know how they handle OOMs resulting from such
> >> allocations).
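> >>
> >> (Purely for illustration -- made-up sizes, not Infinispan code:
> >>
> >>   // java.util.HashMap
> >>   Map<Integer, Integer> m = new HashMap<Integer, Integer>();
> >>   for (int i = 0; i < 50000000; i++) {
> >>       m.put(i, i);    // each resize allocates a new backing array
> >>   }                   // roughly twice the size of the old one
> >>
> >> a single put() that crosses the threshold asks for the whole new
> >> table in one allocation.)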
> >>
> >> -- ramki
> >>
> >> On 06/02/11 09:56, Raman Gupta wrote:
> >>> I do tend to think that somewhere a large object or array is being
> >>> created. In particular, Infinispan is one library we are using that
> >>> may be allocating large chunks of memory -- indeed, replacing
> >>> Infinispan with a local cache does seem to "fix" the problem.
> >>>
> >>> However, more information from the JVM would really be useful in
> >>> isolating the offending code in Infinispan. Ideally,
> >>>
> >>> a) any large allocations should show up as part of the heap dump if
> >>> the allocation succeeded but then some other subsequent code caused
> >>> the OOM, or
> >>>
> >>> b) if the allocation itself failed, the OOM exception should include a
> >>> stack trace that would allow me to isolate the allocation point  (as
> >>> it does normally, but for some reason in this case doesn't).
> >>>
> >>> In this case the heap dump shows plenty of room in heap, and there is
> >>> no stack trace at the OOM, so I don't really have any way to isolate
> >>> the offending allocation point. In which situations does the OOM
> >>> exception get printed without an associated stack trace?
> >>>
> >>> Cheers,
> >>> Raman
> >>>
> >>>
> >>> On 06/02/2011 12:01 PM, Charles K Pepperdine wrote:
> >>>> Are you trying to create a humongous object or array? Accidentally?
> >>>>
> >>>> On Jun 2, 2011, at 5:35 PM, Raman Gupta wrote:
> >>>>
> >>>>> I did check the database but didn't find anything relevant. My search
> >>>>> terms may not be optimal, though I did scan through all the results
> >>>>> returned by "java.lang.OutOfMemoryError: Java heap space" as well as
> >>>>> "0K->0K".
> >>>>>
> >>>>> I also suspected a bug in the collector and so I tried the same
> >>>>> test with the G1 collector, with the same OOM result. I didn't
> >>>>> save the log from the G1 test, but I can quite easily redo the
> >>>>> test with any set of JVM parameters that may be helpful in
> >>>>> debugging -- the OOM seems to be easily and consistently
> >>>>> reproducible with this application.
> >>>>>
> >>>>> Cheers,
> >>>>> Raman
> >>>>>
> >>>>> On 06/02/2011 09:20 AM, Charles K Pepperdine wrote:
> >>>>>> 2011-05-27T11:35:18.214-0400: 147.823: [Full GC [PSYoungGen: 7662K->0K(72896K)] [PSOldGen: 169093K->62518K(174784K)] 176755K->62518K(247680K) [PSPermGen: 27342K->27342K(55232K)], 0.8074580 secs] [Times: user=0.29 sys=0.03, real=0.81 secs]
> >>>>>> UseAdaptiveSizePolicy actions to meet  *** throughput goal ***
> >>>>>>                        GC overhead (%)
> >>>>>>     Young generation:        2.65      (attempted to grow)
> >>>>>>     Tenured generation:      0.54      (attempted to grow)
> >>>>>>     Tenuring threshold:    (attempted to decrease to balance GC costs) = 1
> >>>>>> 2011-05-27T11:35:19.021-0400: 148.631: [GC [PSYoungGen: 0K->0K(57856K)] 62518K->62518K(232640K), 0.0009670 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
> >>>>>> UseAdaptiveSizePolicy actions to meet  *** throughput goal ***
> >>>>>>                        GC overhead (%)
> >>>>>>     Young generation:        2.01      (attempted to grow)
> >>>>>>     Tenured generation:      0.54      (no change)
> >>>>>>     Tenuring threshold:    (attempted to decrease to balance GC costs) = 1
> >>>>>> 2011-05-27T11:35:19.022-0400: 148.632: [Full GC [PSYoungGen: 0K->0K(57856K)] [PSOldGen: 62518K->52158K(174784K)] 62518K->52158K(232640K) [PSPermGen: 27342K->26866K(61056K)], 0.3614330 secs] [Times: user=0.32 sys=0.02, real=0.37 secs]
> >>>>>>
> >>>>>> First Full GC looks normal as PSOldGen is (174784-169093)K =
> >>>>>> 5691K from being full. Nothing happening in Perm.
> >>>>>> The second is where things start to get weird. I don't see why
> >>>>>> that GC was called. Stranger still, it's ~800ms *after* the full
> >>>>>> gc and yet no application thread allocated any memory out of
> >>>>>> young gen.
> >>>>>> For some reason that "failed" young gen collection triggers an
> >>>>>> immediate Full GC.
> >>>>>>
> >>>>>> Bug in the collector? Did you check the bug database?
> >>>>>>
> >>>>>> Regards,
> >>>>>> Kirk
> >>>>>>
> 


