Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently

srikalyan chandrashekar srikalyan.chandrashekar at oracle.com
Thu Jan 9 23:59:02 UTC 2014


David/Peter you are right, the logs trace came from passed run, i am 
trying to simulate the failure and get the logs for failed runs(2000+ 
runs done and still no failure), will get back to you once i have the 
data from failed run. Sorry for the confusion.

---
Thanks
kalyan

On 01/08/2014 11:22 PM, David Holmes wrote:
> Thanks Peter.
>
> Kalyan: Can you confirm, as Peter asked, that the TraceExceptions 
> output came from a failed run?
>
> AFAICS the Trace info is printed after each bytecode where there is a 
> pending exception - though I'm not 100% sure on the printing within 
> the VM runtime. Based on that I think we see the Trace output in run() 
> at the point where wait() returns, so it may well be caught after that 
> - in which case this was not a failing run.
>
> I also can't reproduce the problem :(
>
> David
>
> On 8/01/2014 10:34 PM, Peter Levart wrote:
>> On 01/08/2014 07:30 AM, David Holmes wrote:
>>> On 8/01/2014 4:19 PM, David Holmes wrote:
>>>> On 8/01/2014 7:33 AM, srikalyan chandrashekar wrote:
>>>>> Hi David, TraceExceptions with fastdebug build produced some nice 
>>>>> trace
>>>>> <http://cr.openjdk.java.net/%7Esrikchan/OOME_exception_trace.log> 
>>>>> . The
>>>>> native method wait(long) is where the OOME if being thrown, the 
>>>>> deepest
>>>>> call is in
>>>>>
>>>>> src/share/vm/gc_interface/collectedHeap.inline.hpp, line 157
>>>>
>>>> Yes but it is the caller that is of interest:
>>>>
>>>> Exception <a 'java/lang/OutOfMemoryError'> (0x00000000d6a01840)
>>>> thrown
>>>> [/HUDSON/workspace/8-2-build-linux-amd64/jdk8/1317/hotspot/src/share/vm/runtime/objectMonitor.cpp, 
>>>>
>>>>
>>>> line 1649]
>>>> for thread 0x00007f78c40d2800
>>>> Exception <a 'java/lang/OutOfMemoryError'> (0x00000000d6a01840)
>>>>   thrown in interpreter method <{method} {0x00007f78b4800ae0} 'wait'
>>>> '(J)V' in 'java/lang/Object'>
>>>>   at bci 0 for thread 0x00007f78c40d2800
>>>>
>>>> The ReferenceHandler thread gets the OOME trying to allocate the
>>>> InterruptedException.
>>>
>>> However we already have a catch block around the wait() so how is this
>>> OOME getting through? A bug in exception handling in the interpreter ??
>>>
>>
>> Might be. And it may have something to do with the fact that the
>> Thread.run() method is the 1st call frame on the thread's stack (seems
>> like corner case). The last few meaningful TraceExceptions records are:
>>
>>
>> Exception <a 'java/lang/OutOfMemoryError'> (0x00000000d6a01840)
>> thrown
>> [/HUDSON/workspace/8-2-build-linux-amd64/jdk8/1317/hotspot/src/share/vm/gc_interface/collectedHeap.inline.hpp, 
>>
>> line 157]
>> for thread 0x00007f78c40d2800
>> Exception <a 'java/lang/OutOfMemoryError'> (0x00000000d6a01840)
>> thrown
>> [/HUDSON/workspace/8-2-build-linux-amd64/jdk8/1317/hotspot/src/share/vm/runtime/objectMonitor.cpp, 
>>
>> line 1649]
>> for thread 0x00007f78c40d2800
>> Exception <a 'java/lang/OutOfMemoryError'> (0x00000000d6a01840)
>>   thrown in interpreter method <{method} {0x00007f78b4800ae0} 'wait'
>> '(J)V' in 'java/lang/Object'>
>>   at bci 0 for thread 0x00007f78c40d2800
>> Exception <a 'java/lang/OutOfMemoryError'> (0x00000000d6a01840)
>>   thrown in interpreter method <{method} {0x00007f78b4800ca8} 'wait'
>> '()V' in 'java/lang/Object'>
>>   at *bci 2* for thread 0x00007f78c40d2800
>> Exception <a 'java/lang/OutOfMemoryError'> (0x00000000d6a01840)
>>   thrown in interpreter method <{method} {0x00007f78b48d2250} 'run'
>> '()V' in 'java/lang/ref/Reference$ReferenceHandler'>
>>   at *bci 36* for thread 0x00007f78c40d2800
>>
>>
>> Here's the relevant bytecodes:
>>
>>
>> public class java.lang.Object
>>
>>    public final void wait() throws java.lang.InterruptedException;
>>      descriptor: ()V
>>      flags: ACC_PUBLIC, ACC_FINAL
>>      Code:
>>        stack=3, locals=1, args_size=1
>>           0: aload_0
>>           1: lconst_0
>> *         2: invokevirtual #73                 // Method wait:(J)V*
>>           5: return
>>        LineNumberTable:
>>          line 502: 0
>>          line 503: 5
>>      Exceptions:
>>        throws java.lang.InterruptedException
>>
>>
>> class java.lang.ref.Reference$ReferenceHandler extends java.lang.Thread
>>
>>    public void run();
>>      descriptor: ()V
>>      flags: ACC_PUBLIC
>>      Code:
>>        stack=2, locals=5, args_size=1
>>           0: invokestatic  #62                 // Method
>> java/lang/ref/Reference.access$100:()Ljava/lang/ref/Reference$Lock;
>>           3: dup
>>           4: astore_2
>>           5: monitorenter
>>           6: invokestatic  #61                 // Method
>> java/lang/ref/Reference.access$200:()Ljava/lang/ref/Reference;
>>           9: ifnull        33
>>          12: invokestatic  #61                 // Method
>> java/lang/ref/Reference.access$200:()Ljava/lang/ref/Reference;
>>          15: astore_1
>>          16: aload_1
>>          17: invokestatic  #64                 // Method
>> java/lang/ref/Reference.access$300:(Ljava/lang/ref/Reference;)Ljava/lang/ref/Reference; 
>>
>>          20: invokestatic  #63                 // Method
>> java/lang/ref/Reference.access$202:(Ljava/lang/ref/Reference;)Ljava/lang/ref/Reference; 
>>
>>          23: pop
>>          24: aload_1
>>          25: aconst_null
>>          26: invokestatic  #65                 // Method
>> java/lang/ref/Reference.access$302:(Ljava/lang/ref/Reference;Ljava/lang/ref/Reference;)Ljava/lang/ref/Reference; 
>>
>>          29: pop
>>          30: goto          52
>>          33: invokestatic  #62                 // Method
>> java/lang/ref/Reference.access$100:()Ljava/lang/ref/Reference$Lock;
>> *        36: invokevirtual #59                 // Method
>> java/lang/Object.wait:()V*
>>          39: goto          43
>>          42: astore_3
>>          43: goto          47
>>          46: astore_3
>>          47: aload_2
>>          48: monitorexit
>>          49: goto          0
>>          52: aload_2
>>          53: monitorexit
>>          54: goto          64
>>          57: astore        4
>>          59: aload_2
>>          60: monitorexit
>>          61: aload         4
>>          63: athrow
>>          64: aload_1
>>          65: instanceof    #38                 // class sun/misc/Cleaner
>>          68: ifeq          81
>>          71: aload_1
>>          72: checkcast     #38                 // class sun/misc/Cleaner
>>          75: invokevirtual #67                 // Method
>> sun/misc/Cleaner.clean:()V
>>          78: goto          0
>>          81: aload_1
>>          82: getfield      #57                 // Field
>> java/lang/ref/Reference.queue:Ljava/lang/ref/ReferenceQueue;
>>          85: astore_2
>>          86: aload_2
>>          87: getstatic     #58                 // Field
>> java/lang/ref/ReferenceQueue.NULL:Ljava/lang/ref/ReferenceQueue;
>>          90: if_acmpeq     99
>>          93: aload_2
>>          94: aload_1
>>          95: invokevirtual #66                 // Method
>> java/lang/ref/ReferenceQueue.enqueue:(Ljava/lang/ref/Reference;)Z
>>          98: pop
>>          99: goto          0
>>        Exception table:
>>           from    to  target type
>> *            33    39    42   Class java/lang/OutOfMemoryError*
>>              33    43    46   Class java/lang/InterruptedException
>>               6    49    57   any
>>              52    54    57   any
>>              57    61    57   any
>>        StackMapTable: number_of_entries = 11
>>             frame_type = 0 /* same */
>>             frame_type = 253 /* append */
>>            offset_delta = 32
>>            locals = [ top, class java/lang/Object ]
>>               frame_type = 72 /* same_locals_1_stack_item */
>>              stack = [ class java/lang/OutOfMemoryError ]
>>               frame_type = 0 /* same */
>>               frame_type = 66 /* same_locals_1_stack_item */
>>              stack = [ class java/lang/InterruptedException ]
>>               frame_type = 0 /* same */
>>               frame_type = 255 /* full_frame */
>>              offset_delta = 4
>>              locals = [ class java/lang/ref/Reference$ReferenceHandler,
>> class java/lang/ref/Reference, class java/lang/Object ]
>>              stack = []
>>               frame_type = 255 /* full_frame */
>>              offset_delta = 4
>>              locals = [ class java/lang/ref/Reference$ReferenceHandler,
>> top, class java/lang/Object ]
>>              stack = [ class java/lang/Throwable ]
>>               frame_type = 255 /* full_frame */
>>              offset_delta = 6
>>              locals = [ class java/lang/ref/Reference$ReferenceHandler,
>> class java/lang/ref/Reference ]
>>              stack = []
>>               frame_type = 16 /* same */
>>               frame_type = 250 /* chop */
>>              offset_delta = 17
>>
>>          LineNumberTable:
>>            line 136: 0
>>            line 137: 6
>>            line 138: 12
>>            line 139: 16
>>            line 140: 24
>>            line 157: 33
>>            line 158: 39
>>            line 159: 43
>>            line 160: 47
>>            line 162: 52
>>            line 165: 64
>>            line 166: 71
>>            line 167: 78
>>            line 170: 81
>>            line 171: 86
>>            line 172: 99
>>    }
>>
>>
>> ... if I understand correctly the TraceExceptions' output, one record is
>> printed for each abruptly completed method. The last record printed
>> shows that Object.wait() method completed abruptly by throwing OOME.
>> Even if the OOME was caught and ignored, we would still get this
>> TraceExceptions record. Am I right? There's no record of abruptly
>> terminated ReferenceHandler.run() method in the output. This might be
>> because there's no method "calling" the ReferenceHandler.run() method,
>> since it is the 1st method on the call-stack. Are you sure the output is
>> from a failed test?
>>
>> Can we make the following adjustment to the ReferenceHandler and see
>> what gets printed and/or whether we still get uncaught OOME:
>>
>>
>>      private static class ReferenceHandler extends Thread {
>>      ...
>>          public void run() {
>>              runImpl();
>>          }
>>
>>          private void runImpl() {
>>              // ... the content of the original run() method
>>          }
>>      ...
>>
>>
>>
>>
>> Regards, Peter
>>
>>> David
>>>
>>>> David
>>>> -----
>>>>
>>>>> ------------------- Excerpt Begins ---------------------
>>>>>
>>>>> 147  if (!gc_overhead_limit_was_exceeded) {
>>>>> 148    // -XX:+HeapDumpOnOutOfMemoryError and -XX:OnOutOfMemoryError
>>>>> support
>>>>> 149    report_java_out_of_memory("Java heap space");
>>>>> 150
>>>>> 151    if (JvmtiExport::should_post_resource_exhausted()) {
>>>>> 152      JvmtiExport::post_resource_exhausted(
>>>>> 153        JVMTI_RESOURCE_EXHAUSTED_OOM_ERROR |
>>>>> JVMTI_RESOURCE_EXHAUSTED_JAVA_HEAP,
>>>>> 154        "Java heap space");
>>>>> 155    }
>>>>> 156
>>>>> 157 THROW_OOP_0(Universe::out_of_memory_error_java_heap());
>>>>> 158  } else {
>>>>>
>>>>> ------------------- Excerpt Ends ---------------------
>>>>>
>>>>>
>>>>> Would be helpful if David/some one else in the team could explain the
>>>>> latent aspects/probable cause.
>>>>>
>>>>> ---
>>>>> Thanks
>>>>> kalyan
>>>>>
>>>>> On 01/06/2014 04:40 PM, David Holmes wrote:
>>>>>> Back from vacation ...
>>>>>>
>>>>>> On 20/12/2013 4:49 PM, David Holmes wrote:
>>>>>>> On 20/12/2013 12:57 PM, srikalyan chandrashekar wrote:
>>>>>>>> Hi David Thanks for your comments, the unguarded part(clean and
>>>>>>>> enqueue)
>>>>>>>> in the Reference Handler thread does not seem to create any new
>>>>>>>> objects,
>>>>>>>> so it is the application(the test in this case) which is adding
>>>>>>>> objects
>>>>>>>> to heap and causing the Reference Handler to die with OOME.
>>>>>>>
>>>>>>> The ReferenceHandler thread can only get OOME if it allocates
>>>>>>> (directly
>>>>>>> or indirectly) - so there has to be something in the unguarded part
>>>>>>> that
>>>>>>> causes this. Again it may be an implicit action in the VM -
>>>>>>> similar to
>>>>>>> the class load issue for InterruptedException.
>>>>>>
>>>>>> Run a debug VM with -XX:+TraceExceptions to see where the OOME is
>>>>>> triggered.
>>>>>>
>>>>>> David
>>>>>> -----
>>>>>>
>>>>>>> David
>>>>>>>
>>>>>>> I am still
>>>>>>>> unsure about the side effects of the code change and agree with 
>>>>>>>> your
>>>>>>>> thoughts(on memory exhaustion test's reliability).
>>>>>>>>
>>>>>>>> PS: hotspot dev alias removed from CC.
>>>>>>>>
>>>>>>>> -- 
>>>>>>>> Thanks
>>>>>>>> kalyan
>>>>>>>>
>>>>>>>> On 12/19/13 5:08 PM, David Holmes wrote:
>>>>>>>>> Hi Kalyan,
>>>>>>>>>
>>>>>>>>> This is not a hotspot issue so I'm moving this to core-libs, 
>>>>>>>>> please
>>>>>>>>> drop hotspot from any replies.
>>>>>>>>>
>>>>>>>>> On 20/12/2013 6:26 AM, srikalyan wrote:
>>>>>>>>>> Hi all,  I have been working on the bug JDK-8022321
>>>>>>>>>> <https://bugs.openjdk.java.net/browse/JDK-8022321> , this is a
>>>>>>>>>> sporadic
>>>>>>>>>> failure and the webrev is available here
>>>>>>>>>> http://cr.openjdk.java.net/~srikchan/Regression/JDK-8022321_OOMEInReferenceHandler-webrev/ 
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I'm really not sure what to make of this. We have a test that
>>>>>>>>> triggers
>>>>>>>>> an out-of-memory condition but the OOME can actually turn up 
>>>>>>>>> in the
>>>>>>>>> ReferenceHandler thread causing it to terminate and the test to
>>>>>>>>> fail.
>>>>>>>>> We previously accounted for the non-obvious occurrences of OOME
>>>>>>>>> due to
>>>>>>>>> the Object.wait and the possible need to load the
>>>>>>>>> InterruptedException
>>>>>>>>> class - but still the OOME can appear where we don't want it. So
>>>>>>>>> finally you have just placed the whole for(;;) loop in a
>>>>>>>>> try/catch(OOME) that ignores the OOME. I'm certain that makes the
>>>>>>>>> test
>>>>>>>>> happy, but I'm not sure it is really what we want for the
>>>>>>>>> ReferenceHandler thread. If the OOME occurs while cleaning, or
>>>>>>>>> enqueuing then we will fail to clean and/or enqueue but there
>>>>>>>>> would be
>>>>>>>>> no indication that has occurred and I think that is a bigger
>>>>>>>>> problem
>>>>>>>>> than this test failing.
>>>>>>>>>
>>>>>>>>> There may be no way to make this test 100% reliable. In fact I'd
>>>>>>>>> suggest that no memory exhaustion test can be 100% reliable.
>>>>>>>>>
>>>>>>>>> David
>>>>>>>>>
>>>>>>>>>> *
>>>>>>>>>> **"Root Cause:Still not known"*
>>>>>>>>>> 2 places where there is a possibility for OOME
>>>>>>>>>> 1) Cleaner.clean()
>>>>>>>>>> 2) ReferenceQueue.enqueue()
>>>>>>>>>>
>>>>>>>>>> 1)  The cleanup code in turn has 2 places where there is 
>>>>>>>>>> potential
>>>>>>>>>> for
>>>>>>>>>> throwing OOME,
>>>>>>>>>>      a) thunk Thread which is run from clean() method. This
>>>>>>>>>> Runnable is
>>>>>>>>>> passed to Cleaner and appears in the following classes
>>>>>>>>>>          java/nio/DirectByteBuffer.java
>>>>>>>>>>          sun/misc/Perf.java
>>>>>>>>>>          sun/nio/fs/NativeBuffer.java
>>>>>>>>>>          sun/nio/ch/IOVecWrapper.java
>>>>>>>>>>          sun/misc/Cleaner/ExitOnThrow.java
>>>>>>>>>> However none of the above overridden implementations ever
>>>>>>>>>> create an
>>>>>>>>>> object in the clean() code.
>>>>>>>>>>      b) new PrivilegedAction created in try catch Exception
>>>>>>>>>> block of
>>>>>>>>>> clean() method but for this object to be created and to be held
>>>>>>>>>> responsible for OOME an Exception(other than OOME) has to be
>>>>>>>>>> thrown.
>>>>>>>>>>
>>>>>>>>>> 2) No new heap objects are created in the enqueue method nor
>>>>>>>>>> anywhere in
>>>>>>>>>> the deep call stack (VM.addFinalRefCount() etc) so this cannot
>>>>>>>>>> be a
>>>>>>>>>> potential cause.
>>>>>>>>>>
>>>>>>>>>> *Experimental change to java.lang.Reference.java* :
>>>>>>>>>> - Put one more guard (try catch with OOME block) in the 
>>>>>>>>>> Reference
>>>>>>>>>> Handler Thread which may give the Reference Handler a chance to
>>>>>>>>>> cleanup.
>>>>>>>>>> This is fixing the test failure (several 1000 runs with 0
>>>>>>>>>> failures)
>>>>>>>>>> - Without the above change the test fails atleast 3-5 times for
>>>>>>>>>> every
>>>>>>>>>> 1000 run.
>>>>>>>>>>
>>>>>>>>>> *PS*: The code change is to a very critical part of JDK and i am
>>>>>>>>>> fully
>>>>>>>>>> not aware of the consequences of the change, hence seeking 
>>>>>>>>>> expert
>>>>>>>>>> help
>>>>>>>>>> here. Appreciate your time and inputs towards this.
>>>>>>>>>>
>>>>>>>>
>>>>>
>>




More information about the core-libs-dev mailing list