Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently

srikalyan chandrashekar srikalyan.chandrashekar at oracle.com
Tue Jan 7 21:33:31 UTC 2014


Hi David, running a fastdebug build with -XX:+TraceExceptions produced a useful trace 
<http://cr.openjdk.java.net/%7Esrikchan/OOME_exception_trace.log>. The 
native method wait(long) is where the OOME is being thrown; the deepest 
call is in

src/share/vm/gc_interface/collectedHeap.inline.hpp, line 157

------------------- Excerpt Begins ---------------------

147  if (!gc_overhead_limit_was_exceeded) {
148    // -XX:+HeapDumpOnOutOfMemoryError and -XX:OnOutOfMemoryError support
149    report_java_out_of_memory("Java heap space");
150
151    if (JvmtiExport::should_post_resource_exhausted()) {
152      JvmtiExport::post_resource_exhausted(
153        JVMTI_RESOURCE_EXHAUSTED_OOM_ERROR | JVMTI_RESOURCE_EXHAUSTED_JAVA_HEAP,
154        "Java heap space");
155    }
156
157    THROW_OOP_0(Universe::out_of_memory_error_java_heap());
158  } else {

------------------- Excerpt Ends ---------------------


It would be helpful if David or someone else on the team could explain the 
latent aspects and the probable cause.
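
To make the discussion concrete, here is a minimal sketch of a narrowly guarded handler loop. It is not the JDK source: the class GuardedHandlerSketch, the Stats counters and the drain() method are hypothetical names used only for illustration, and the OOME is simulated with an explicit throw, whereas in the trace above it originates inside the VM allocation path under wait(long). The sketch only shows that an OOME caught per step can be counted instead of silently ignored, so the loop survives and the failure stays visible.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical stand-in for a handler loop; NOT the java.lang.ref.Reference code.
public class GuardedHandlerSketch {

    /** Counters that make a swallowed OOME visible to the caller. */
    public static final class Stats {
        public int processed; // steps that completed normally
        public int oomeSeen;  // steps abandoned because of OutOfMemoryError
    }

    /**
     * Drains the queue one step at a time. An OutOfMemoryError thrown by a
     * single step is recorded and the loop moves on to the next step, so the
     * loop itself does not terminate.
     */
    public static Stats drain(Queue<Runnable> pending) {
        Stats stats = new Stats();
        Runnable step;
        while ((step = pending.poll()) != null) {
            try {
                step.run();
                stats.processed++;
            } catch (OutOfMemoryError oome) {
                // Only an int increment here: the handler avoids allocating
                // while the heap may still be exhausted.
                stats.oomeSeen++;
            }
        }
        return stats;
    }

    public static void main(String[] args) {
        Queue<Runnable> pending = new ArrayDeque<>();
        pending.add(() -> { });
        // Simulated failure (assumption): a real OOME would come from the VM.
        pending.add(() -> { throw new OutOfMemoryError("simulated"); });
        pending.add(() -> { });

        Stats stats = drain(pending);
        System.out.println("processed=" + stats.processed
                + " oomeSeen=" + stats.oomeSeen);
    }
}
```

Whether counting is enough for the real Reference Handler is exactly the open question in this thread: a step that is skipped here corresponds to a reference that was not cleaned or enqueued.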

---
Thanks
kalyan

On 01/06/2014 04:40 PM, David Holmes wrote:
> Back from vacation ...
>
> On 20/12/2013 4:49 PM, David Holmes wrote:
>> On 20/12/2013 12:57 PM, srikalyan chandrashekar wrote:
>>> Hi David, thanks for your comments. The unguarded part (clean and enqueue)
>>> in the Reference Handler thread does not seem to create any new objects,
>>> so it is the application (the test in this case) which is adding objects
>>> to the heap and causing the Reference Handler to die with OOME.
>>
>> The ReferenceHandler thread can only get OOME if it allocates (directly
>> or indirectly) - so there has to be something in the unguarded part that
>> causes this. Again it may be an implicit action in the VM - similar to
>> the class load issue for InterruptedException.
>
> Run a debug VM with -XX:+TraceExceptions to see where the OOME is 
> triggered.
>
> David
> -----
>
>> David
>>
>>> I am still unsure about the side effects of the code change and agree with
>>> your thoughts (on the memory exhaustion test's reliability).
>>>
>>> PS: hotspot dev alias removed from CC.
>>>
>>> -- 
>>> Thanks
>>> kalyan
>>>
>>> On 12/19/13 5:08 PM, David Holmes wrote:
>>>> Hi Kalyan,
>>>>
>>>> This is not a hotspot issue so I'm moving this to core-libs, please
>>>> drop hotspot from any replies.
>>>>
>>>> On 20/12/2013 6:26 AM, srikalyan wrote:
>>>>> Hi all, I have been working on the bug JDK-8022321
>>>>> <https://bugs.openjdk.java.net/browse/JDK-8022321>. This is a sporadic
>>>>> failure and the webrev is available here:
>>>>> http://cr.openjdk.java.net/~srikchan/Regression/JDK-8022321_OOMEInReferenceHandler-webrev/
>>>>>
>>>>
>>>> I'm really not sure what to make of this. We have a test that triggers
>>>> an out-of-memory condition but the OOME can actually turn up in the
>>>> ReferenceHandler thread causing it to terminate and the test to fail.
>>>> We previously accounted for the non-obvious occurrences of OOME due to
>>>> the Object.wait and the possible need to load the InterruptedException
>>>> class - but still the OOME can appear where we don't want it. So
>>>> finally you have just placed the whole for(;;) loop in a
>>>> try/catch(OOME) that ignores the OOME. I'm certain that makes the test
>>>> happy, but I'm not sure it is really what we want for the
>>>> ReferenceHandler thread. If the OOME occurs while cleaning, or
>>>> enqueuing then we will fail to clean and/or enqueue but there would be
>>>> no indication that has occurred and I think that is a bigger problem
>>>> than this test failing.
>>>>
>>>> There may be no way to make this test 100% reliable. In fact I'd
>>>> suggest that no memory exhaustion test can be 100% reliable.
>>>>
>>>> David
>>>>
>>>>> *Root cause: still not known*
>>>>> There are 2 places where an OOME is possible:
>>>>> 1) Cleaner.clean()
>>>>> 2) ReferenceQueue.enqueue()
>>>>>
>>>>> 1) The cleanup code in turn has 2 places that could potentially
>>>>> throw an OOME:
>>>>>      a) The thunk, which is run from the clean() method. This Runnable
>>>>> is passed to Cleaner and appears in the following classes:
>>>>>          java/nio/DirectByteBuffer.java
>>>>>          sun/misc/Perf.java
>>>>>          sun/nio/fs/NativeBuffer.java
>>>>>          sun/nio/ch/IOVecWrapper.java
>>>>>          sun/misc/Cleaner/ExitOnThrow.java
>>>>> However, none of the above implementations ever creates an
>>>>> object in the clean() code.
>>>>>      b) A new PrivilegedAction is created in the try/catch block of the
>>>>> clean() method, but for this object to be created and held
>>>>> responsible for the OOME, an exception (other than OOME) has to be
>>>>> thrown first.
>>>>>
>>>>> 2) No new heap objects are created in the enqueue method or anywhere in
>>>>> the deep call stack (VM.addFinalRefCount() etc.), so this cannot be a
>>>>> potential cause.
>>>>>
>>>>> *Experimental change to java.lang.ref.Reference.java*:
>>>>> - Put one more guard (a try/catch block for OOME) in the Reference
>>>>> Handler thread, which may give the Reference Handler a chance to
>>>>> clean up. This fixes the test failure (several thousand runs with 0
>>>>> failures).
>>>>> - Without the above change the test fails at least 3-5 times in every
>>>>> 1000 runs.
>>>>>
>>>>> *PS*: The code change is to a very critical part of the JDK and I am
>>>>> not fully aware of the consequences of the change, hence I am seeking
>>>>> expert help here. I appreciate your time and input on this.
>>>>>
>>>



