Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
David Holmes
david.holmes at oracle.com
Wed Jan 8 06:30:17 UTC 2014
On 8/01/2014 4:19 PM, David Holmes wrote:
> On 8/01/2014 7:33 AM, srikalyan chandrashekar wrote:
>> Hi David, running a fastdebug build with TraceExceptions produced a useful
>> trace <http://cr.openjdk.java.net/%7Esrikchan/OOME_exception_trace.log>. The
>> native method wait(long) is where the OOME is being thrown; the deepest
>> call is in
>>
>> src/share/vm/gc_interface/collectedHeap.inline.hpp, line 157
>
> Yes but it is the caller that is of interest:
>
> Exception <a 'java/lang/OutOfMemoryError'> (0x00000000d6a01840)
> thrown
> [/HUDSON/workspace/8-2-build-linux-amd64/jdk8/1317/hotspot/src/share/vm/runtime/objectMonitor.cpp,
> line 1649]
> for thread 0x00007f78c40d2800
> Exception <a 'java/lang/OutOfMemoryError'> (0x00000000d6a01840)
> thrown in interpreter method <{method} {0x00007f78b4800ae0} 'wait'
> '(J)V' in 'java/lang/Object'>
> at bci 0 for thread 0x00007f78c40d2800
>
> The ReferenceHandler thread gets the OOME trying to allocate the
> InterruptedException.
However, we already have a catch block around the wait(), so how is this
OOME getting through? A bug in exception handling in the interpreter?
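
To make the question concrete, here is a minimal sketch of the kind of guard being discussed. It is an illustration only, not the actual Reference.java source: the class name GuardedWaitSketch and the helper waitQuietly are hypothetical, and a timed wait is used only so the example terminates. An OOME raised while the VM tries to allocate the InterruptedException should, in principle, be caught by the same handler as the InterruptedException itself.

```java
public class GuardedWaitSketch {

    /**
     * Hypothetical helper: waits on the lock once and swallows both
     * InterruptedException and OutOfMemoryError.
     * Returns true if the wait returned normally, false if either
     * throwable was caught.
     */
    public static boolean waitQuietly(Object lock, long millis) {
        try {
            synchronized (lock) {
                // Object.wait may need to allocate an InterruptedException,
                // which is where an implicit OOME could originate.
                lock.wait(millis);
            }
            return true;
        } catch (InterruptedException | OutOfMemoryError x) {
            // Deliberately ignored so the calling loop can keep running.
            return false;
        }
    }

    public static void main(String[] args) {
        Object lock = new Object();

        // No interrupt pending: the timed wait returns normally.
        System.out.println(waitQuietly(lock, 10)); // prints true

        // Interrupt pending: wait throws InterruptedException, which is caught.
        Thread.currentThread().interrupt();
        System.out.println(waitQuietly(lock, 10)); // prints false
    }
}
```

If a guard of this shape is in place and the OOME still escapes, the handler itself is not the problem, which is what points the suspicion at the interpreter.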
David
> David
> -----
>
>> ------------------- Excerpt Begins ---------------------
>>
>> 147   if (!gc_overhead_limit_was_exceeded) {
>> 148     // -XX:+HeapDumpOnOutOfMemoryError and -XX:OnOutOfMemoryError support
>> 149     report_java_out_of_memory("Java heap space");
>> 150
>> 151     if (JvmtiExport::should_post_resource_exhausted()) {
>> 152       JvmtiExport::post_resource_exhausted(
>> 153         JVMTI_RESOURCE_EXHAUSTED_OOM_ERROR | JVMTI_RESOURCE_EXHAUSTED_JAVA_HEAP,
>> 154         "Java heap space");
>> 155     }
>> 156
>> 157     THROW_OOP_0(Universe::out_of_memory_error_java_heap());
>> 158   } else {
>>
>> ------------------- Excerpt Ends ---------------------
>>
>>
>> It would be helpful if David or someone else on the team could explain
>> the underlying aspects and probable cause.
>>
>> ---
>> Thanks
>> kalyan
>>
>> On 01/06/2014 04:40 PM, David Holmes wrote:
>>> Back from vacation ...
>>>
>>> On 20/12/2013 4:49 PM, David Holmes wrote:
>>>> On 20/12/2013 12:57 PM, srikalyan chandrashekar wrote:
>>>>> Hi David, thanks for your comments. The unguarded part (clean and
>>>>> enqueue) in the Reference Handler thread does not seem to create any
>>>>> new objects, so it is the application (the test in this case) that is
>>>>> adding objects to the heap and causing the Reference Handler to die
>>>>> with OOME.
>>>>
>>>> The ReferenceHandler thread can only get OOME if it allocates (directly
>>>> or indirectly), so there has to be something in the unguarded part that
>>>> causes this. Again, it may be an implicit action in the VM, similar to
>>>> the class load issue for InterruptedException.
>>>
>>> Run a debug VM with -XX:+TraceExceptions to see where the OOME is
>>> triggered.
>>>
>>> David
>>> -----
>>>
>>>> David
>>>>
>>>>> I am still unsure about the side effects of the code change and agree
>>>>> with your thoughts (on the reliability of memory exhaustion tests).
>>>>>
>>>>> PS: hotspot dev alias removed from CC.
>>>>>
>>>>> --
>>>>> Thanks
>>>>> kalyan
>>>>>
>>>>> On 12/19/13 5:08 PM, David Holmes wrote:
>>>>>> Hi Kalyan,
>>>>>>
>>>>>> This is not a hotspot issue so I'm moving this to core-libs, please
>>>>>> drop hotspot from any replies.
>>>>>>
>>>>>> On 20/12/2013 6:26 AM, srikalyan wrote:
>>>>>>> Hi all, I have been working on bug JDK-8022321
>>>>>>> <https://bugs.openjdk.java.net/browse/JDK-8022321>. This is a
>>>>>>> sporadic failure, and the webrev is available here:
>>>>>>> http://cr.openjdk.java.net/~srikchan/Regression/JDK-8022321_OOMEInReferenceHandler-webrev/
>>>>>>>
>>>>>>
>>>>>> I'm really not sure what to make of this. We have a test that triggers
>>>>>> an out-of-memory condition, but the OOME can actually turn up in the
>>>>>> ReferenceHandler thread, causing it to terminate and the test to fail.
>>>>>> We previously accounted for the non-obvious occurrences of OOME due to
>>>>>> the Object.wait and the possible need to load the InterruptedException
>>>>>> class, but still the OOME can appear where we don't want it. So
>>>>>> finally you have just placed the whole for(;;) loop in a
>>>>>> try/catch(OOME) that ignores the OOME. I'm certain that makes the test
>>>>>> happy, but I'm not sure it is really what we want for the
>>>>>> ReferenceHandler thread. If the OOME occurs while cleaning or
>>>>>> enqueuing, then we will fail to clean and/or enqueue, but there would
>>>>>> be no indication that this has occurred, and I think that is a bigger
>>>>>> problem than this test failing.
>>>>>>
>>>>>> There may be no way to make this test 100% reliable. In fact I'd
>>>>>> suggest that no memory exhaustion test can be 100% reliable.
>>>>>>
>>>>>> David
>>>>>>
>>>>>>> *Root cause: still not known*
>>>>>>> There are 2 places where an OOME is possible:
>>>>>>> 1) Cleaner.clean()
>>>>>>> 2) ReferenceQueue.enqueue()
>>>>>>>
>>>>>>> 1) The cleanup code in turn has 2 places with the potential to
>>>>>>> throw OOME:
>>>>>>>   a) The thunk Runnable that is run from the clean() method. This
>>>>>>>   Runnable is passed to Cleaner and appears in the following classes:
>>>>>>>     java/nio/DirectByteBuffer.java
>>>>>>>     sun/misc/Perf.java
>>>>>>>     sun/nio/fs/NativeBuffer.java
>>>>>>>     sun/nio/ch/IOVecWrapper.java
>>>>>>>     sun/misc/Cleaner/ExitOnThrow.java
>>>>>>>   However, none of the above implementations ever creates an
>>>>>>>   object in the clean() code.
>>>>>>>   b) The new PrivilegedAction created in the try/catch Exception
>>>>>>>   block of the clean() method. But for this object to be created,
>>>>>>>   and so be held responsible for the OOME, an Exception (other than
>>>>>>>   OOME) has to be thrown first.
>>>>>>>
>>>>>>> 2) No new heap objects are created in the enqueue method, nor
>>>>>>> anywhere in the deep call stack (VM.addFinalRefCount() etc), so this
>>>>>>> cannot be a potential cause.
>>>>>>>
>>>>>>> *Experimental change to java.lang.Reference.java*:
>>>>>>> - Put one more guard (try/catch with OOME block) in the Reference
>>>>>>> Handler thread, which may give the Reference Handler a chance to
>>>>>>> clean up. This fixes the test failure (several thousand runs with
>>>>>>> 0 failures).
>>>>>>> - Without the above change the test fails at least 3-5 times per
>>>>>>> 1000 runs.
>>>>>>>
>>>>>>> *PS*: The code change is to a very critical part of the JDK and I
>>>>>>> am not fully aware of the consequences of the change, hence I am
>>>>>>> seeking expert help here. I appreciate your time and input on this.
>>>>>>>
>>>>>
>>