Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently

srikalyan chandrashekar srikalyan.chandrashekar at oracle.com
Thu Jan 16 00:19:03 UTC 2014


Hi Peter/David, we finally got an exception trace with a fastdebug 
build and the modified ReferenceHandler (with runImpl() added and called 
from run()). The logs and disassembled code are available as attachments 
in JIRA <https://bugs.openjdk.java.net/browse/JDK-8022321>.

Observations from the log:

Root Cause:
1) The uncaught exception is being dispatched from Reference.java:143:
141                   Reference<Object> r;
142                   synchronized (lock) {
143                       if (pending != null) {
144                           r = pending;
145                           pending = r.discovered;
146                           r.discovered = null;

The pending field in Reference is read and updated by the collector, so 
when the ReferenceHandler thread is executing line 143 there might 
already be an exception pending because of an allocation done by the 
collector, and that exception causes the ReferenceHandler thread to die.
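
To illustrate the failure mode only (this is not the JDK code): a minimal 
sketch, assuming a stand-in worker thread whose run() has no guard. The 
class name UnguardedThreadSketch, the method runUnguarded() and the 
simulated OutOfMemoryError are all hypothetical. An Error that escapes 
run() is dispatched to the thread's uncaught exception handler and the 
thread terminates.

```java
import java.util.concurrent.atomic.AtomicReference;

public class UnguardedThreadSketch {

    /**
     * Starts an unguarded worker (hypothetical stand-in for the handler
     * thread) and returns the Throwable dispatched to its uncaught
     * exception handler, or null if nothing escaped run().
     */
    static Throwable runUnguarded() {
        AtomicReference<Throwable> dispatched = new AtomicReference<>();
        Thread worker = new Thread(() -> {
            // Simulated: stands in for an OOME surfacing inside the loop body.
            throw new OutOfMemoryError("simulated");
        }, "Sketch Unguarded Handler");
        // Record whatever escapes run(); the worker dies right after this call.
        worker.setUncaughtExceptionHandler((t, e) -> dispatched.set(e));
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return dispatched.get();
    }

    public static void main(String[] args) {
        System.out.println("thread died with: " + runUnguarded());
    }
}
```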

Suggested fix:
- As proposed earlier, putting an outer guard (try-catch on OOME) in the 
ReferenceHandler will fix the issue. If the ReferenceHandler is 
considered part of the GC subsystem, it should stay alive even in the 
midst of an OOME, so I feel the additional guard should be allowed; 
however, I might still be unaware of vital implications.
- Apart from the above change, Peter's suggestion to create a private 
runImpl() and call it from run() in ReferenceHandler makes sense to me.
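
The shape of the two suggestions above could be sketched roughly as 
follows. This is only an illustration under stated assumptions, not the 
actual patch: GuardedHandlerSketch, Handler, simulate() and the counters 
are hypothetical names, and the OutOfMemoryError is thrown artificially 
rather than coming from the VM.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class GuardedHandlerSketch {

    /** Hypothetical stand-in for the ReferenceHandler thread (not the JDK class). */
    static class Handler extends Thread {
        private final int iterations;
        final AtomicInteger processed = new AtomicInteger();
        final AtomicInteger survivedOomes = new AtomicInteger();

        Handler(int iterations) {
            super("Sketch Reference Handler");
            this.iterations = iterations;
        }

        @Override
        public void run() {
            for (int i = 0; i < iterations; i++) {
                try {
                    runImpl(i);
                } catch (OutOfMemoryError x) {
                    // Outer guard: swallow the OOME and keep looping.
                    // The counter update does not allocate.
                    survivedOomes.incrementAndGet();
                }
            }
        }

        /** One unit of work; every third call simulates an OOME. */
        private void runImpl(int i) {
            if (i % 3 == 0) {
                throw new OutOfMemoryError("simulated");
            }
            processed.incrementAndGet();
        }
    }

    /** Runs the handler to completion; returns {processed, survivedOomes}. */
    static int[] simulate(int iterations) {
        Handler h = new Handler(iterations);
        h.start();
        try {
            h.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new int[] { h.processed.get(), h.survivedOomes.get() };
    }

    public static void main(String[] args) {
        int[] r = simulate(9);
        System.out.println("processed=" + r[0] + " survivedOomes=" + r[1]);
    }
}
```

With 9 iterations the simulated error fires for i = 0, 3 and 6, so the 
thread survives three OOMEs and still completes the other six units of 
work. A real handler would loop forever instead of a fixed number of 
times.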


---
Thanks
kalyan

On 01/13/2014 03:57 PM, srikalyan wrote:
>
> On 1/11/14, 6:15 AM, Peter Levart wrote:
>>
>> On 01/10/2014 10:51 PM, srikalyan chandrashekar wrote:
>>> Hi Peter, the version you provided ran indefinitely (I put a 10 minute 
>>> timeout) and the program got interrupted (no error),
>>
>> Did you run it with or without fastdebug & -XX:+TraceExceptions? If 
>> with, it might be that fastdebug and/or -XX:+TraceExceptions changes 
>> the execution a bit so that we can no longer reproduce the wrong 
>> behaviour.
> With fastdebug & -XX:+TraceExceptions. I will try combinations of the 
> possible options (i.e. without -XX:+TraceExceptions on a debug build, etc.) soon.
>>
>>> even if there were an error, you cannot print the "string" of the 
>>> thread to the console (this has been attempted earlier).
>>
>> ...it has been attempted to print toString in uncaught exception 
>> handler. At that time, the heap is still full. I'm printing it after 
>> the GC has cleared the heap. You can try that it works by commenting 
>> out the "try {" and corresponding "} catch (OOME x) {}" exception 
>> handler...
> Since there is a GC call prior to printing the string, I will give that 
> a shot with a non-debug build.
>>
>>> - The test is running in interpreter mode; what I am watching for is 
>>> one error with a trace. Without the fastdebug build and 
>>> -XX:+TraceExceptions I am able to reproduce at least 5 failures out 
>>> of 1000 runs, but with fastdebug+Trace no luck yet (already past a 
>>> few thousand runs).
>>
>> It might be interesting to try with the fastdebug build but without the 
>> -XX:+TraceExceptions option to see what has an effect on it. It might 
>> also be interesting to try the modified ReferenceHandler (the one 
>> with private runImpl() method called from run()) and with normal 
>> non-fastdebug JDK. This info might be useful when one starts to 
>> inspect the exception handling code in interpreter...
>>
>> Regards, Peter
>>
>
> -- 
> Thanks
> kalyan
> Ph: (408)-585-8040
>
>>>
>>> ---
>>> Thanks
>>> kalyan
>>>
>>> On 01/10/2014 02:57 AM, Peter Levart wrote:
>>>> On 01/10/2014 09:31 AM, Peter Levart wrote:
>>>>> Since we suspect there's something wrong with exception handling 
>>>>> in the interpreter, I devised a hypothetical reproducer that tries 
>>>>> to simulate ReferenceHandler in many aspects, but doesn't need to 
>>>>> be a ReferenceHandler:
>>>>>
>>>>> http://cr.openjdk.java.net/~plevart/misc/OOME/OOMECatchingTest.java
>>>>>
>>>>> This is designed to run indefinitely and only terminate if/when 
>>>>> thread dies. Could you run this program in the environment that 
>>>>> causes the OOMEInReferenceHandler test to fail and see if it 
>>>>> terminates?
>>>>
>>>> I forgot to mention that in order for this long-running program to 
>>>> exhibit interpreter behaviour, it should be run with -Xint option. 
>>>> So I suggest:
>>>>
>>>> -Xmx24M -XX:-UseTLAB -Xint
>>>>
>>>> Regards, Peter
>>>>
>>>
>>



