Heap OOM for no apparent reason?

Y. Srinivas Ramakrishna y.s.ramakrishna at oracle.com
Fri Jun 3 05:54:45 UTC 2011


Sorry, sent previous email without addressing all of the issues.

On 6/2/2011 6:15 PM, Raman Gupta wrote:
> It would be *really* handy if there were a switch like:
>
> -XX:+StackTraceOnOutOfMemoryError

Yes, that would be handy, and probably not too difficult.
But I also wonder whether something like -XX:OnOutOfMemoryError
or the like would already get you enough info to get close to
the problem... (although, because the command is executed in a
separate shell, by the time it runs the process has likely gone
well past the point where the problem occurred).
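
For concreteness, something along these lines (a sketch only -- the
application class name is made up, and I'm assuming the %p pid
substitution works for OnOutOfMemoryError the way it does for OnError):

   java -XX:+HeapDumpOnOutOfMemoryError \
        -XX:OnOutOfMemoryError="jstack -l %p" \
        com.example.MyApp

That would at least capture the thread stacks at roughly the time of
the failure, even if the allocating frame has already unwound by then.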

>
> to force the stack trace to be shown. Obviously looking at every line
> of code of every library my application uses, including core JDK
> libraries, for code paths where large amounts of heap may be allocated
> and the associated OOME is caught and swallowed, is pretty much
> impossible.

Try looking for places where OOME (or a supertype such as Error or
Throwable) is caught. I am hoping there aren't too many of those...
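
The kind of pattern to grep for is roughly this (purely illustrative;
the cache/key/log names are all made up):

   try {
       cache.put(key, bigValue);   // the large allocation happens in here
   } catch (Throwable t) {         // also catches OutOfMemoryError
       log.warn("put failed", t);  // trace goes to a log, or nowhere at all
   }

A catch of Throwable, Error, or OutOfMemoryError itself anywhere on the
allocation path would explain why the trace never reaches stderr.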

>
> I think my next step is to increase the max heap size to a large value
> which hopefully allows the large allocation to occur without failure,
> and then periodically take heap dumps to isolate it.

Yes, that seems reasonable; or maybe use an allocation profiler with
the larger heap and find it that way...
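
If you go the periodic-heap-dump route, jmap -dump can be scripted from
outside the process, or you can trigger dumps from inside the JVM.
A rough sketch of the latter (the class name and the once-a-minute
interval are just placeholders):

   import com.sun.management.HotSpotDiagnosticMXBean;
   import java.io.IOException;
   import java.lang.management.ManagementFactory;

   // Background thread that dumps the heap of the running JVM once a
   // minute; start it from the application's initialization code.
   public class PeriodicHeapDumper implements Runnable {
       public void run() {
           try {
               HotSpotDiagnosticMXBean diag =
                   ManagementFactory.newPlatformMXBeanProxy(
                       ManagementFactory.getPlatformMBeanServer(),
                       "com.sun.management:type=HotSpotDiagnostic",
                       HotSpotDiagnosticMXBean.class);
               while (true) {
                   // true = dump only live objects (forces a full GC first)
                   diag.dumpHeap("heap-" + System.currentTimeMillis()
                       + ".hprof", true);
                   Thread.sleep(60000L);
               }
           } catch (IOException e) {
               e.printStackTrace();
           } catch (InterruptedException e) {
               Thread.currentThread().interrupt();
           }
       }
   }

Started with new Thread(new PeriodicHeapDumper(), "heap-dumper").start(),
that gives you a series of .hprof files to compare in a heap analyzer
as the suspect data structure grows.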

-- ramki

>
> Thanks,
> Raman
>
> On 06/02/2011 01:46 PM, Y. S. Ramakrishna wrote:
>> If your code is not catching the OOM exception, you'd expect to see
>> the stack trace when the program dies. If it catches the exception
>> and carries on, you'd want it to print the exception detail. I don't
>> know of cases where the exception would just disappear.
>>
>> In your case the report to stdout/stderr(?) that an OOM occurred and
>> that the heap is being dumped comes from inside the JVM, because you
>> have asked for -XX:+HeapDumpOnOutOfMemoryError. After this point,
>> your allocating thread would have gotten an OOME which it probably
>> caught and swallowed, hence the silence wrt the stack trace you
>> would normally see. You will want to look at your Infinispan code to
>> see how it deals with the inability to allocate said large objects.
>>
>> Recall that object size is limited by the size of, and available
>> space in, the largest area (Eden or Old) in your Java heap. As Kirk
>> noted, the full GC was an attempt to allocate an object that didn't
>> fit into the available space in Eden or in Old (so from that you can
>> estimate the size of the request).
>>
>> Note also that the JDK libraries will resize hashtables under you,
>> and that can also cause large allocation requests (but I don't know
>> how they handle OOMs resulting from such allocations).
>>
>> -- ramki
>>
>> On 06/02/11 09:56, Raman Gupta wrote:
>>> I do tend to think that somewhere a large object or array is being
>>> created. In particular, Infinispan is one library we are using that
>>> may be allocating large chunks of memory -- indeed, replacing
>>> Infinispan with a local cache does seem to "fix" the problem.
>>>
>>> However, more information from the JVM would really be useful in
>>> isolating the offending code in Infinispan. Ideally,
>>>
>>> a) any large allocations should show up as part of the heap dump if
>>> the allocation succeeded but then some other subsequent code caused
>>> the OOM, or
>>>
>>> b) if the allocation itself failed, the OOM exception should include a
>>> stack trace that would allow me to isolate the allocation point  (as
>>> it does normally, but for some reason in this case doesn't).
>>>
>>> In this case the heap dump shows plenty of room in heap, and there is
>>> no stack trace at the OOM, so I don't really have any way to isolate
>>> the offending allocation point. In which situations does the OOM
>>> exception get printed without an associated stack trace?
>>>
>>> Cheers,
>>> Raman
>>>
>>>
>>> On 06/02/2011 12:01 PM, Charles K Pepperdine wrote:
>>>> Are you trying to create a humongous object or array? Accidentally?
>>>>
>>>> On Jun 2, 2011, at 5:35 PM, Raman Gupta wrote:
>>>>
>>>>> I did check the database but didn't find anything relevant. My search
>>>>> terms may not be optimal, though I did scan through all the results
>>>>> returned by "java.lang.OutOfMemoryError: Java heap space" as well as
>>>>> "0K->0K".
>>>>>
>>>>> I also suspected a bug in the collector and so I tried the same
>>>>> test with the G1 collector, with the same OOM result. I didn't
>>>>> save the log from the G1 test, but I can quite easily redo the
>>>>> test with any set of JVM parameters that may be helpful in
>>>>> debugging -- the OOM seems to be easily and consistently
>>>>> reproducible with this application.
>>>>>
>>>>> Cheers,
>>>>> Raman
>>>>>
>>>>> On 06/02/2011 09:20 AM, Charles K Pepperdine wrote:
>>>>>> 2011-05-27T11:35:18.214-0400: 147.823: [Full GC [PSYoungGen: 7662K->0K(72896K)] [PSOldGen: 169093K->62518K(174784K)] 176755K->62518K(247680K) [PSPermGen: 27342K->27342K(55232K)], 0.8074580 secs] [Times: user=0.29 sys=0.03, real=0.81 secs]
>>>>>> UseAdaptiveSizePolicy actions to meet  *** throughput goal ***
>>>>>>                        GC overhead (%)
>>>>>>     Young generation:        2.65      (attempted to grow)
>>>>>>     Tenured generation:      0.54      (attempted to grow)
>>>>>>     Tenuring threshold:    (attempted to decrease to balance GC costs) = 1
>>>>>> 2011-05-27T11:35:19.021-0400: 148.631: [GC [PSYoungGen: 0K->0K(57856K)] 62518K->62518K(232640K), 0.0009670 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
>>>>>> UseAdaptiveSizePolicy actions to meet  *** throughput goal ***
>>>>>>                        GC overhead (%)
>>>>>>     Young generation:        2.01      (attempted to grow)
>>>>>>     Tenured generation:      0.54      (no change)
>>>>>>     Tenuring threshold:    (attempted to decrease to balance GC costs) = 1
>>>>>> 2011-05-27T11:35:19.022-0400: 148.632: [Full GC [PSYoungGen: 0K->0K(57856K)] [PSOldGen: 62518K->52158K(174784K)] 62518K->52158K(232640K) [PSPermGen: 27342K->26866K(61056K)], 0.3614330 secs] [Times: user=0.32 sys=0.02, real=0.37 secs]
>>>>>> The first Full GC looks normal, as PSOldGen is (174784-169093)K =
>>>>>> 5691K from being full. Nothing happening in Perm.
>>>>>> The second is where things start to get weird. I don't see why
>>>>>> that GC was called. Stranger still, it's ~800ms *after* the full
>>>>>> GC and yet no application thread allocated any memory out of the
>>>>>> young gen. For some reason that "failed" young gen collection
>>>>>> triggers an immediate Full GC.
>>>>>>
>>>>>> Bug in the collector? Did you check the bug database?
>>>>>>
>>>>>> Regards,
>>>>>> Kirk
>>>>>>



