Heap OOM for no apparent reason?
Raman Gupta
rocketraman at fastmail.fm
Fri Jun 3 01:15:24 UTC 2011
It would be *really* handy if there were a switch like:
-XX:+StackTraceOnOutOfMemoryError
to force the stack trace to be shown. Obviously, auditing every line
of code in every library my application uses, including the core JDK
libraries, for code paths where a large amount of heap may be
allocated and the resulting OOME caught and swallowed is pretty much
impossible.
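
The kind of code I'm worried about would look something like this (a
made-up sketch, not actual Infinispan code):

  public class SwallowedOome {
      static byte[] tryAllocate(int size) {
          try {
              return new byte[size];  // large speculative allocation
          } catch (OutOfMemoryError e) {
              return null;            // swallowed: no trace is ever logged
          }
      }

      public static void main(String[] args) {
          // Run with a small -Xmx and this fails silently -- nothing
          // reaches stderr even though the JVM threw an OOME.
          byte[] b = tryAllocate(256 * 1024 * 1024);
          System.out.println(b == null ? "allocation swallowed" : "ok");
      }
  }
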
I think my next step is to increase the max heap size to a value
large enough that the large allocation hopefully succeeds, and then
take periodic heap dumps to isolate it.
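
Concretely, I plan to run with something like this (the heap size is
just a first guess for this app):

  java -Xmx4g -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp ...

and grab periodic dumps with jmap while the test runs:

  jmap -dump:format=b,file=/tmp/heap.hprof <pid>
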
Thanks,
Raman
On 06/02/2011 01:46 PM, Y. S. Ramakrishna wrote:
> If your code is not catching the OOM exception, you'd
> expect to see the stack trace when the program dies.
> If it catches the exception and carries on, you'd want
> it to print the exception detail. I don't know of
> cases where the exception would just disappear.
>
> In your case, the report to stdout/stderr(?) that an OOM occurred
> and that the heap is being dumped comes from inside the JVM,
> because you have asked for -XX:+HeapDumpOnOutOfMemoryError.
> After this point, your allocating thread would have gotten
> an OOME which it probably caught and swallowed, hence
> the silence wrt the stack trace you would normally see. You
> will want to look at your Infinispan code to see how
> it deals with the inability to allocate said large objects.
>
> Recall that object size is limited by the size of, and the
> available space in, the largest area (Eden or Old) of your
> Java heap. As Kirk noted, the full GC was an attempt to
> allocate an object that didn't fit into the available space in
> Eden or in Old (so from that you can estimate the size of
> the request).
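>
> To make that concrete with the numbers from your log: after the
> second full GC the old gen had 174784K - 52158K = 122626K (~120 MB)
> free, and the young gen only 57856K (~56 MB) of capacity, so if the
> failed request is what provoked those collections, it was presumably
> larger than roughly 120 MB.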
>
> Note also that the JDK libraries will resize hashtables under
> you, and that can also cause large allocation requests
> (but I don't know how they handle OOMs resulting from such
> allocations).
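>
> For instance, a sketch along these lines (sizes purely illustrative,
> and this is my own toy code, not anything from your stack) makes
> HashMap ask for one large contiguous array each time its table
> doubles:
>
>   import java.util.HashMap;
>
>   public class ResizeDemo {
>       public static void main(String[] args) {
>           // With the default load factor of 0.75, pushing the map
>           // past capacity*0.75 entries makes it allocate a bucket
>           // array twice the previous size and rehash into it. Around
>           // 2^22 entries the new table alone is 2^23 references --
>           // roughly 32-64 MB requested as a single contiguous array.
>           HashMap<Integer, Integer> map = new HashMap<Integer, Integer>();
>           for (int i = 0; i < (1 << 22); i++) {
>               map.put(i, i);
>           }
>       }
>   }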
>
> -- ramki
>
> On 06/02/11 09:56, Raman Gupta wrote:
>> I do tend to think that somewhere a large object or array is being
>> created. In particular, Infinispan is one library we are using that
>> may be allocating large chunks of memory -- indeed, replacing
>> Infinispan with a local cache does seem to "fix" the problem.
>>
>> However, more information from the JVM would really be useful in
>> isolating the offending code in Infinispan. Ideally,
>>
>> a) any large allocations should show up as part of the heap dump if
>> the allocation succeeded but then some other subsequent code caused
>> the OOM, or
>>
>> b) if the allocation itself failed, the OOM exception should include a
>> stack trace that would allow me to isolate the allocation point (as
>> it does normally, but for some reason in this case doesn't).
>>
>> In this case the heap dump shows plenty of room in the heap, and
>> there is no stack trace at the OOM, so I don't really have any way
>> to isolate the offending allocation point. In which situations does
>> the OOM exception get printed without an associated stack trace?
>>
>> Cheers,
>> Raman
>>
>>
>> On 06/02/2011 12:01 PM, Charles K Pepperdine wrote:
>>> Are you trying to create a humongous object or array? Accidentally?
>>>
>>> On Jun 2, 2011, at 5:35 PM, Raman Gupta wrote:
>>>
>>>> I did check the database but didn't find anything relevant. My search
>>>> terms may not be optimal, though I did scan through all the results
>>>> returned by "java.lang.OutOfMemoryError: Java heap space" as well as
>>>> "0K->0K".
>>>>
>>>> I also suspected a bug in the collector and so I tried the same test
>>>> with the G1 collector, with the same OOM result. I didn't save the
>>>> log from the G1 test, but I can quite easily redo the test with any
>>>> set of JVM parameters that may be helpful in debugging -- the OOM
>>>> seems to be easily and consistently reproducible with this
>>>> application.
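>>>>
>>>> For reference, I'd rerun with something along these lines (exact
>>>> flag set open to suggestions):
>>>>
>>>>   java -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
>>>>        -XX:+PrintAdaptiveSizePolicy -Xloggc:gc.log \
>>>>        -XX:+HeapDumpOnOutOfMemoryError ...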
>>>>
>>>> Cheers,
>>>> Raman
>>>>
>>>> On 06/02/2011 09:20 AM, Charles K Pepperdine wrote:
>>>>> 2011-05-27T11:35:18.214-0400: 147.823: [Full GC [PSYoungGen: 7662K->0K(72896K)] [PSOldGen: 169093K->62518K(174784K)] 176755K->62518K(247680K) [PSPermGen: 27342K->27342K(55232K)], 0.8074580 secs] [Times: user=0.29 sys=0.03, real=0.81 secs]
>>>>> UseAdaptiveSizePolicy actions to meet *** throughput goal ***
>>>>>    GC overhead (%)
>>>>>      Young generation: 2.65 (attempted to grow)
>>>>>      Tenured generation: 0.54 (attempted to grow)
>>>>>    Tenuring threshold: (attempted to decrease to balance GC costs) = 1
>>>>> 2011-05-27T11:35:19.021-0400: 148.631: [GC [PSYoungGen: 0K->0K(57856K)] 62518K->62518K(232640K), 0.0009670 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
>>>>> UseAdaptiveSizePolicy actions to meet *** throughput goal ***
>>>>>    GC overhead (%)
>>>>>      Young generation: 2.01 (attempted to grow)
>>>>>      Tenured generation: 0.54 (no change)
>>>>>    Tenuring threshold: (attempted to decrease to balance GC costs) = 1
>>>>> 2011-05-27T11:35:19.022-0400: 148.632: [Full GC [PSYoungGen: 0K->0K(57856K)] [PSOldGen: 62518K->52158K(174784K)] 62518K->52158K(232640K) [PSPermGen: 27342K->26866K(61056K)], 0.3614330 secs] [Times: user=0.32 sys=0.02, real=0.37 secs]
>>>>> The first Full GC looks normal, as PSOldGen is only
>>>>> (174784-169093)K = 5691K away from being full. Nothing is
>>>>> happening in Perm.
>>>>> The second record is where things start to get weird. I don't see
>>>>> why that GC was called. Stranger still, it runs ~800ms *after* the
>>>>> full GC, and yet no application thread allocated any memory out of
>>>>> the young gen. For some reason that "failed" young gen collection
>>>>> triggers an immediate Full GC.
>>>>>
>>>>> Bug in the collector? Did you check the bug database?
>>>>>
>>>>> Regards,
>>>>> Kirk
>>>>>