Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
David Holmes
david.holmes at oracle.com
Fri Jan 17 04:38:14 UTC 2014
On 17/01/2014 1:31 PM, srikalyan chandrashekar wrote:
> Hi David, the disassembled code is also attached to the bug. Per my
Sorry missed that.
> analysis the exception was thrown when Reference Handler was on line 143
> as put in the earlier email.
But if the numbers in the disassembly match the BCI then 65 shows:
65: instanceof #11 // class sun/misc/Cleaner
which makes more sense: the runtime instanceof check might encounter an
OOME condition. I wish there was some easy way to trace into the full
call chain, as TraceExceptions doesn't show you any runtime frames :(
Still, it is easy enough to check:
    // Fast path for cleaners
    boolean isCleaner = false;
    try {
        isCleaner = r instanceof Cleaner;
    } catch (OutOfMemoryError oome) {
        // the instanceof check itself hit the OOME; go around the loop again
        continue;
    }
    if (isCleaner) {
        ((Cleaner)r).clean();
        continue;
    }
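
For anyone who wants to try the shape of that guard outside the JDK, here is a minimal standalone sketch. FakeCleaner and processOne are made-up names standing in for sun.misc.Cleaner and the body of the handler loop; this is not the Reference.java code, and in an ordinary run the catch branch is never taken. It only shows that the guarded check compiles and keeps the fast path intact.

```java
public class GuardedHandlerSketch {

    /** Hypothetical stand-in for sun.misc.Cleaner, for illustration only. */
    static final class FakeCleaner {
        boolean cleaned;

        void clean() {
            cleaned = true;
        }
    }

    /**
     * Handles one item the way the guarded fast path above would.
     * Returns a label describing which branch was taken.
     */
    static String processOne(Object r) {
        boolean isCleaner = false;
        try {
            // Assumption under test in the thread: this check may throw OOME.
            isCleaner = r instanceof FakeCleaner;
        } catch (OutOfMemoryError oome) {
            // Corresponds to "continue" in the handler loop.
            return "skipped";
        }
        if (isCleaner) {
            ((FakeCleaner) r).clean();
            return "cleaned";
        }
        // Anything else would go on to the normal enqueue path.
        return "enqueued";
    }

    public static void main(String[] args) {
        System.out.println(processOne(new FakeCleaner())); // prints "cleaned"
        System.out.println(processOne("not a cleaner"));   // prints "enqueued"
    }
}
```
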
Thanks,
David
> --
> Thanks
> kalyan
>
> On 1/16/14 6:16 PM, David Holmes wrote:
>> On 17/01/2014 4:48 AM, srikalyan wrote:
>>> Hi David
>>>
>>> On 1/15/14, 9:04 PM, David Holmes wrote:
>>>> On 16/01/2014 10:19 AM, srikalyan chandrashekar wrote:
>>>>> Hi Peter/David, we could finally get a trace of exception with
>>>>> fastdebug
>>>>> build and ReferenceHandler modified (with runImpl() added and called
>>>>> from run()). The logs, disassembled code is available in JIRA
>>>>> <https://bugs.openjdk.java.net/browse/JDK-8022321> as attachments.
>>>>
>>>> All I can see is the log for the OOMECatchingTest program not one for
>>>> the actual ReferenceHandler ??
>>>>
>>> Please search for ReferenceHandler in the log.
>>>>> Observations from the log:
>>>>>
>>>>> Root Cause:
>>>>> 1) UncaughtException is being dispatched from Reference.java:143
>>>>> 141 Reference<Object> r;
>>>>> 142 synchronized (lock) {
>>>>> 143 if (pending != null) {
>>>>> 144 r = pending;
>>>>> 145 pending = r.discovered;
>>>>> 146 r.discovered = null;
>>>>>
>>>>> pending field in Reference is touched and updated by the collector, so
>>>>> at line 143 when the execution context is in Reference handler there
>>>>> might have been an Exception pending due to allocation done by
>>>>> collector
>>>>> which causes ReferenceHandler thread to die.
>>>>
>>>> Sorry but the GC does not trigger asynchronous exceptions so this
>>>> explanation does not make any sense to me. What part of the log led
>>>> you to this conclusion?
>>> ------------------ Log Excerpt begins ------------------
>>> Exception <a 'java/lang/OutOfMemoryError'> (0x00000000ff7808e8)
>>> thrown
>>> [/home/srikalyc/work/ora2013/infracleanup/jdk8/hotspot/src/share/vm/gc_interface/collectedHeap.inline.hpp,
>>>
>>> line 168]
>>> for thread 0x00007feed80cf800
>>> Exception <a 'java/lang/OutOfMemoryError'> (0x00000000ff7808e8)
>>> thrown in interpreter method <{method} {0x00007feeddd3c600} 'runImpl'
>>> '()V' in 'java/lang/ref/Reference$ReferenceHandler'>
>>> at bci 65 for thread 0x00007feed80cf800
>>> Exception <a 'java/lang/OutOfMemoryError'> (0x00000000ff7808e8)
>>> thrown in interpreter method <{method} {0x00007feeddd3c478} 'run'
>>> '()V' in 'java/lang/ref/Reference$ReferenceHandler'>
>>> at bci 1 for thread 0x00007feed80cf800
>>> Exception <a 'java/lang/OutOfMemoryError'> (0x00000000ff780868)
>>> thrown
>>> [/home/srikalyc/work/ora2013/infracleanup/jdk8/hotspot/src/share/vm/gc_interface/collectedHeap.inline.hpp,
>>>
>>> line 157]
>>> for thread 0x00007feed80cf800
>>> Exception <a 'java/lang/OutOfMemoryError'> (0x00000000ff780868)
>>> thrown in interpreter method <{method} {0x00007feeddcaaf90}
>>> 'uncaughtException' '(Ljava/lang/Thread;Ljava/lang/Throwable;)V' in '>
>>> at bci 48 for thread 0x00007feed80cf800
>>> Exception <a 'java/lang/OutOfMemoryError'> (0x00000000ff780868)
>>> thrown in interpreter method <{method} {0x00007feeddca7298}
>>> 'dispatchUncaughtException' '(Ljava/lang/Throwable;)V' in 'java/lang/>
>>> at bci 6 for thread 0x00007feed80cf800
>>> ------------------ Log Excerpt ends ------------------
>>> Sorry if it is a wrong understanding.
>>
>> What you are seeing there is an OOME escaping the run() method which
>> will cause the uncaughtExceptionHandler to be run which then triggers
>> a second OOME (likely as it tries to report information about the
>> first OOME). The first exception occurred in runImpl at BCI 65. Can
>> you disassemble (javap -c) the class you used so we can see what is at
>> BCI 65.
>>
>> Thanks,
>> David
>>
>>>>
>>>>> Suggested fix:
>>>>> - As proposed earlier putting an outer guard(try-catch on OOME) in the
>>>>> ReferenceHandler will fix the issue, if ReferenceHandler is considered
>>>>> as part of the GC sub system then it should be alive even in the midst
>>>>> of an OOME so i feel that the additional guard should be allowed,
>>>>> however i might still be ignorant of vital implications.
>>>>> - Apart from the above changes, Peter's suggestion to create and
>>>>> call a
>>>>> private runImpl() from run() in ReferenceHandler makes sense to me.
>>>>
>>>> Why would we need this?
>>>>
>>>> David
>>>> -----
>>>>
>>>>>
>>>>> ---
>>>>> Thanks
>>>>> kalyan
>>>>>
>>>>> On 01/13/2014 03:57 PM, srikalyan wrote:
>>>>>>
>>>>>> On 1/11/14, 6:15 AM, Peter Levart wrote:
>>>>>>>
>>>>>>> On 01/10/2014 10:51 PM, srikalyan chandrashekar wrote:
>>>>>>>> Hi Peter the version you provided ran indefinitely(i put a 10
>>>>>>>> minute
>>>>>>>> timeout) and the program got interrupted(no error),
>>>>>>>
>>>>>>> Did you run it with or without fastdebug & -XX:+TraceExceptions ? If
>>>>>>> with, it might be that fastdebug and/or -XX:+TraceExceptions changes
>>>>>>> the execution a bit so that we can no longer reproduce the wrong
>>>>>>> behaviour.
>>>>>> With fastdebug & -XX:+TraceExceptions. I will try combinations of
>>>>>> possible options (i.e. without -XX:+TraceExceptions on debug build etc.)
>>>>>> soon.
>>>>>>>
>>>>>>>> even if there were to be an error you cannot print the "string" of
>>>>>>>> thread to console(these have been attempted earlier).
>>>>>>>
>>>>>>> ...it has been attempted to print toString in uncaught exception
>>>>>>> handler. At that time, the heap is still full. I'm printing it after
>>>>>>> the GC has cleared the heap. You can try that it works by commenting
>>>>>>> out the "try {" and corresponding "} catch (OOME x) {}" exception
>>>>>>> handler...
>>>>>> Since there is a GC call prior to printing string i will give that a
>>>>>> shot with non-debug build.
>>>>>>>
>>>>>>>> - The test is running in interpreter mode; what I am watching for is
>>>>>>>> one error with a trace. Without the fastdebug build and
>>>>>>>> -XX:+TraceExceptions I am able to reproduce at least 5
>>>>>>>> failures out of 1000 runs, but with fastdebug+Trace no luck
>>>>>>>> yet (already past a few thousand runs).
>>>>>>>
>>>>>>> It might be interesting to try with fastdebug build but without the
>>>>>>> -XX:+TraceExceptions option to see what has an effect on it. It
>>>>>>> might
>>>>>>> also be interesting to try the modified ReferenceHandler (the one
>>>>>>> with private runImpl() method called from run()) and with normal
>>>>>>> non-fastdebug JDK. This info might be useful when one starts to
>>>>>>> inspect the exception handling code in interpreter...
>>>>>>>
>>>>>>> Regards, Peter
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Thanks
>>>>>> kalyan
>>>>>> Ph: (408)-585-8040
>>>>>>
>>>>>>>>
>>>>>>>> ---
>>>>>>>> Thanks
>>>>>>>> kalyan
>>>>>>>>
>>>>>>>> On 01/10/2014 02:57 AM, Peter Levart wrote:
>>>>>>>>> On 01/10/2014 09:31 AM, Peter Levart wrote:
>>>>>>>>>> Since we suspect there's something wrong with exception handling
>>>>>>>>>> in interpreter, I devised a hypothetical reproducer that tries to
>>>>>>>>>> simulate ReferenceHandler in many aspects, but doesn't require to
>>>>>>>>>> be a ReferenceHandler:
>>>>>>>>>>
>>>>>>>>>> http://cr.openjdk.java.net/~plevart/misc/OOME/OOMECatchingTest.java
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> This is designed to run indefinitely and only terminate if/when
>>>>>>>>>> thread dies. Could you run this program in the environment that
>>>>>>>>>> causes the OOMEInReferenceHandler test to fail and see if it
>>>>>>>>>> terminates?
>>>>>>>>>
>>>>>>>>> I forgot to mention that in order for this long-running program to
>>>>>>>>> exhibit interpreter behaviour, it should be run with -Xint option.
>>>>>>>>> So I suggest:
>>>>>>>>>
>>>>>>>>> -Xmx24M -XX:-UseTLAB -Xint
>>>>>>>>>
>>>>>>>>> Regards, Peter
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>
>