Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently

Peter Levart peter.levart at gmail.com
Fri Jan 10 08:31:39 UTC 2014


On 01/10/2014 12:59 AM, srikalyan chandrashekar wrote:
> David/Peter you are right, the logs trace came from passed run, i am 
> trying to simulate the failure and get the logs for failed runs(2000+ 
> runs done and still no failure), will get back to you once i have the 
> data from failed run. Sorry for the confusion.

I doubt the logs will be any different. A simple test that throws an 
exception inside Thread.run() without catching it shows that 
TraceExceptions doesn't report the fact that Thread.run() terminates 
abruptly (as David pointed out, pending exception is reported after 
every bytecode executed and there's no bytecode that invoked Thread.run()).
While you're at it, testing, could you also test the modified 
ReferenceHandler (the one that calls private runImpl() from it's run() 
method) so that we get a proof of incorrect behaviour.

Since we suspect there's something wrong with exception handling in 
interpreter, I devised a hypothetical reproducer that tries to simulate 
ReferenceHandler in many aspects, but doesn't require to be a 
ReferenceHandler:

http://cr.openjdk.java.net/~plevart/misc/OOME/OOMECatchingTest.java

This is designed to run indefinitely and only terminate if/when thread 
dies. Could you run this program in the environment that causes the 
OOMEInReferenceHandler test to fail and see if it terminates?


Regards, Peter

>
> ---
> Thanks
> kalyan
>
> On 01/08/2014 11:22 PM, David Holmes wrote:
>> Thanks Peter.
>>
>> Kalyan: Can you confirm, as Peter asked, that the TraceExceptions 
>> output came from a failed run?
>>
>> AFAICS the Trace info is printed after each bytecode where there is a 
>> pending exception - though I'm not 100% sure on the printing within 
>> the VM runtime. Based on that I think we see the Trace output in 
>> run() at the point where wait() returns, so it may well be caught 
>> after that - in which case this was not a failing run.
>>
>> I also can't reproduce the problem :(
>>
>> David
>>
>> On 8/01/2014 10:34 PM, Peter Levart wrote:
>>> On 01/08/2014 07:30 AM, David Holmes wrote:
>>>> On 8/01/2014 4:19 PM, David Holmes wrote:
>>>>> On 8/01/2014 7:33 AM, srikalyan chandrashekar wrote:
>>>>>> Hi David, TraceExceptions with fastdebug build produced some nice 
>>>>>> trace
>>>>>> <http://cr.openjdk.java.net/%7Esrikchan/OOME_exception_trace.log> 
>>>>>> . The
>>>>>> native method wait(long) is where the OOME if being thrown, the 
>>>>>> deepest
>>>>>> call is in
>>>>>>
>>>>>> src/share/vm/gc_interface/collectedHeap.inline.hpp, line 157
>>>>>
>>>>> Yes but it is the caller that is of interest:
>>>>>
>>>>> Exception <a 'java/lang/OutOfMemoryError'> (0x00000000d6a01840)
>>>>> thrown
>>>>> [/HUDSON/workspace/8-2-build-linux-amd64/jdk8/1317/hotspot/src/share/vm/runtime/objectMonitor.cpp, 
>>>>>
>>>>>
>>>>> line 1649]
>>>>> for thread 0x00007f78c40d2800
>>>>> Exception <a 'java/lang/OutOfMemoryError'> (0x00000000d6a01840)
>>>>>   thrown in interpreter method <{method} {0x00007f78b4800ae0} 'wait'
>>>>> '(J)V' in 'java/lang/Object'>
>>>>>   at bci 0 for thread 0x00007f78c40d2800
>>>>>
>>>>> The ReferenceHandler thread gets the OOME trying to allocate the
>>>>> InterruptedException.
>>>>
>>>> However we already have a catch block around the wait() so how is this
>>>> OOME getting through? A bug in exception handling in the 
>>>> interpreter ??
>>>>
>>>
>>> Might be. And it may have something to do with the fact that the
>>> Thread.run() method is the 1st call frame on the thread's stack (seems
>>> like corner case). The last few meaningful TraceExceptions records are:
>>>
>>>
>>> Exception <a 'java/lang/OutOfMemoryError'> (0x00000000d6a01840)
>>> thrown
>>> [/HUDSON/workspace/8-2-build-linux-amd64/jdk8/1317/hotspot/src/share/vm/gc_interface/collectedHeap.inline.hpp, 
>>>
>>> line 157]
>>> for thread 0x00007f78c40d2800
>>> Exception <a 'java/lang/OutOfMemoryError'> (0x00000000d6a01840)
>>> thrown
>>> [/HUDSON/workspace/8-2-build-linux-amd64/jdk8/1317/hotspot/src/share/vm/runtime/objectMonitor.cpp, 
>>>
>>> line 1649]
>>> for thread 0x00007f78c40d2800
>>> Exception <a 'java/lang/OutOfMemoryError'> (0x00000000d6a01840)
>>>   thrown in interpreter method <{method} {0x00007f78b4800ae0} 'wait'
>>> '(J)V' in 'java/lang/Object'>
>>>   at bci 0 for thread 0x00007f78c40d2800
>>> Exception <a 'java/lang/OutOfMemoryError'> (0x00000000d6a01840)
>>>   thrown in interpreter method <{method} {0x00007f78b4800ca8} 'wait'
>>> '()V' in 'java/lang/Object'>
>>>   at *bci 2* for thread 0x00007f78c40d2800
>>> Exception <a 'java/lang/OutOfMemoryError'> (0x00000000d6a01840)
>>>   thrown in interpreter method <{method} {0x00007f78b48d2250} 'run'
>>> '()V' in 'java/lang/ref/Reference$ReferenceHandler'>
>>>   at *bci 36* for thread 0x00007f78c40d2800
>>>
>>>
>>> Here's the relevant bytecodes:
>>>
>>>
>>> public class java.lang.Object
>>>
>>>    public final void wait() throws java.lang.InterruptedException;
>>>      descriptor: ()V
>>>      flags: ACC_PUBLIC, ACC_FINAL
>>>      Code:
>>>        stack=3, locals=1, args_size=1
>>>           0: aload_0
>>>           1: lconst_0
>>> *         2: invokevirtual #73                 // Method wait:(J)V*
>>>           5: return
>>>        LineNumberTable:
>>>          line 502: 0
>>>          line 503: 5
>>>      Exceptions:
>>>        throws java.lang.InterruptedException
>>>
>>>
>>> class java.lang.ref.Reference$ReferenceHandler extends java.lang.Thread
>>>
>>>    public void run();
>>>      descriptor: ()V
>>>      flags: ACC_PUBLIC
>>>      Code:
>>>        stack=2, locals=5, args_size=1
>>>           0: invokestatic  #62                 // Method
>>> java/lang/ref/Reference.access$100:()Ljava/lang/ref/Reference$Lock;
>>>           3: dup
>>>           4: astore_2
>>>           5: monitorenter
>>>           6: invokestatic  #61                 // Method
>>> java/lang/ref/Reference.access$200:()Ljava/lang/ref/Reference;
>>>           9: ifnull        33
>>>          12: invokestatic  #61                 // Method
>>> java/lang/ref/Reference.access$200:()Ljava/lang/ref/Reference;
>>>          15: astore_1
>>>          16: aload_1
>>>          17: invokestatic  #64                 // Method
>>> java/lang/ref/Reference.access$300:(Ljava/lang/ref/Reference;)Ljava/lang/ref/Reference; 
>>>
>>>          20: invokestatic  #63                 // Method
>>> java/lang/ref/Reference.access$202:(Ljava/lang/ref/Reference;)Ljava/lang/ref/Reference; 
>>>
>>>          23: pop
>>>          24: aload_1
>>>          25: aconst_null
>>>          26: invokestatic  #65                 // Method
>>> java/lang/ref/Reference.access$302:(Ljava/lang/ref/Reference;Ljava/lang/ref/Reference;)Ljava/lang/ref/Reference; 
>>>
>>>          29: pop
>>>          30: goto          52
>>>          33: invokestatic  #62                 // Method
>>> java/lang/ref/Reference.access$100:()Ljava/lang/ref/Reference$Lock;
>>> *        36: invokevirtual #59                 // Method
>>> java/lang/Object.wait:()V*
>>>          39: goto          43
>>>          42: astore_3
>>>          43: goto          47
>>>          46: astore_3
>>>          47: aload_2
>>>          48: monitorexit
>>>          49: goto          0
>>>          52: aload_2
>>>          53: monitorexit
>>>          54: goto          64
>>>          57: astore        4
>>>          59: aload_2
>>>          60: monitorexit
>>>          61: aload         4
>>>          63: athrow
>>>          64: aload_1
>>>          65: instanceof    #38                 // class 
>>> sun/misc/Cleaner
>>>          68: ifeq          81
>>>          71: aload_1
>>>          72: checkcast     #38                 // class 
>>> sun/misc/Cleaner
>>>          75: invokevirtual #67                 // Method
>>> sun/misc/Cleaner.clean:()V
>>>          78: goto          0
>>>          81: aload_1
>>>          82: getfield      #57                 // Field
>>> java/lang/ref/Reference.queue:Ljava/lang/ref/ReferenceQueue;
>>>          85: astore_2
>>>          86: aload_2
>>>          87: getstatic     #58                 // Field
>>> java/lang/ref/ReferenceQueue.NULL:Ljava/lang/ref/ReferenceQueue;
>>>          90: if_acmpeq     99
>>>          93: aload_2
>>>          94: aload_1
>>>          95: invokevirtual #66                 // Method
>>> java/lang/ref/ReferenceQueue.enqueue:(Ljava/lang/ref/Reference;)Z
>>>          98: pop
>>>          99: goto          0
>>>        Exception table:
>>>           from    to  target type
>>> *            33    39    42   Class java/lang/OutOfMemoryError*
>>>              33    43    46   Class java/lang/InterruptedException
>>>               6    49    57   any
>>>              52    54    57   any
>>>              57    61    57   any
>>>        StackMapTable: number_of_entries = 11
>>>             frame_type = 0 /* same */
>>>             frame_type = 253 /* append */
>>>            offset_delta = 32
>>>            locals = [ top, class java/lang/Object ]
>>>               frame_type = 72 /* same_locals_1_stack_item */
>>>              stack = [ class java/lang/OutOfMemoryError ]
>>>               frame_type = 0 /* same */
>>>               frame_type = 66 /* same_locals_1_stack_item */
>>>              stack = [ class java/lang/InterruptedException ]
>>>               frame_type = 0 /* same */
>>>               frame_type = 255 /* full_frame */
>>>              offset_delta = 4
>>>              locals = [ class java/lang/ref/Reference$ReferenceHandler,
>>> class java/lang/ref/Reference, class java/lang/Object ]
>>>              stack = []
>>>               frame_type = 255 /* full_frame */
>>>              offset_delta = 4
>>>              locals = [ class java/lang/ref/Reference$ReferenceHandler,
>>> top, class java/lang/Object ]
>>>              stack = [ class java/lang/Throwable ]
>>>               frame_type = 255 /* full_frame */
>>>              offset_delta = 6
>>>              locals = [ class java/lang/ref/Reference$ReferenceHandler,
>>> class java/lang/ref/Reference ]
>>>              stack = []
>>>               frame_type = 16 /* same */
>>>               frame_type = 250 /* chop */
>>>              offset_delta = 17
>>>
>>>          LineNumberTable:
>>>            line 136: 0
>>>            line 137: 6
>>>            line 138: 12
>>>            line 139: 16
>>>            line 140: 24
>>>            line 157: 33
>>>            line 158: 39
>>>            line 159: 43
>>>            line 160: 47
>>>            line 162: 52
>>>            line 165: 64
>>>            line 166: 71
>>>            line 167: 78
>>>            line 170: 81
>>>            line 171: 86
>>>            line 172: 99
>>>    }
>>>
>>>
>>> ... if I understand correctly the TraceExceptions' output, one 
>>> record is
>>> printed for each abruptly completed method. The last record printed
>>> shows that Object.wait() method completed abruptly by throwing OOME.
>>> Even if the OOME was caught and ignored, we would still get this
>>> TraceExceptions record. Am I right? There's no record of abruptly
>>> terminated ReferenceHandler.run() method in the output. This might be
>>> because there's no method "calling" the ReferenceHandler.run() method,
>>> since it is the 1st method on the call-stack. Are you sure the 
>>> output is
>>> from a failed test?
>>>
>>> Can we make the following adjustment to the ReferenceHandler and see
>>> what gets printed and/or whether we still get uncaught OOME:
>>>
>>>
>>>      private static class ReferenceHandler extends Thread {
>>>      ...
>>>          public void run() {
>>>              runImpl();
>>>          }
>>>
>>>          private void runImpl() {
>>>              // ... the content of the original run() method
>>>          }
>>>      ...
>>>
>>>
>>>
>>>
>>> Regards, Peter
>>>
>>>> David
>>>>
>>>>> David
>>>>> -----
>>>>>
>>>>>> ------------------- Excerpt Begins ---------------------
>>>>>>
>>>>>> 147  if (!gc_overhead_limit_was_exceeded) {
>>>>>> 148    // -XX:+HeapDumpOnOutOfMemoryError and -XX:OnOutOfMemoryError
>>>>>> support
>>>>>> 149    report_java_out_of_memory("Java heap space");
>>>>>> 150
>>>>>> 151    if (JvmtiExport::should_post_resource_exhausted()) {
>>>>>> 152      JvmtiExport::post_resource_exhausted(
>>>>>> 153        JVMTI_RESOURCE_EXHAUSTED_OOM_ERROR |
>>>>>> JVMTI_RESOURCE_EXHAUSTED_JAVA_HEAP,
>>>>>> 154        "Java heap space");
>>>>>> 155    }
>>>>>> 156
>>>>>> 157 THROW_OOP_0(Universe::out_of_memory_error_java_heap());
>>>>>> 158  } else {
>>>>>>
>>>>>> ------------------- Excerpt Ends ---------------------
>>>>>>
>>>>>>
>>>>>> Would be helpful if David/some one else in the team could explain 
>>>>>> the
>>>>>> latent aspects/probable cause.
>>>>>>
>>>>>> ---
>>>>>> Thanks
>>>>>> kalyan
>>>>>>
>>>>>> On 01/06/2014 04:40 PM, David Holmes wrote:
>>>>>>> Back from vacation ...
>>>>>>>
>>>>>>> On 20/12/2013 4:49 PM, David Holmes wrote:
>>>>>>>> On 20/12/2013 12:57 PM, srikalyan chandrashekar wrote:
>>>>>>>>> Hi David Thanks for your comments, the unguarded part(clean and
>>>>>>>>> enqueue)
>>>>>>>>> in the Reference Handler thread does not seem to create any new
>>>>>>>>> objects,
>>>>>>>>> so it is the application(the test in this case) which is adding
>>>>>>>>> objects
>>>>>>>>> to heap and causing the Reference Handler to die with OOME.
>>>>>>>>
>>>>>>>> The ReferenceHandler thread can only get OOME if it allocates
>>>>>>>> (directly
>>>>>>>> or indirectly) - so there has to be something in the unguarded 
>>>>>>>> part
>>>>>>>> that
>>>>>>>> causes this. Again it may be an implicit action in the VM -
>>>>>>>> similar to
>>>>>>>> the class load issue for InterruptedException.
>>>>>>>
>>>>>>> Run a debug VM with -XX:+TraceExceptions to see where the OOME is
>>>>>>> triggered.
>>>>>>>
>>>>>>> David
>>>>>>> -----
>>>>>>>
>>>>>>>> David
>>>>>>>>
>>>>>>>> I am still
>>>>>>>>> unsure about the side effects of the code change and agree 
>>>>>>>>> with your
>>>>>>>>> thoughts(on memory exhaustion test's reliability).
>>>>>>>>>
>>>>>>>>> PS: hotspot dev alias removed from CC.
>>>>>>>>>
>>>>>>>>> -- 
>>>>>>>>> Thanks
>>>>>>>>> kalyan
>>>>>>>>>
>>>>>>>>> On 12/19/13 5:08 PM, David Holmes wrote:
>>>>>>>>>> Hi Kalyan,
>>>>>>>>>>
>>>>>>>>>> This is not a hotspot issue so I'm moving this to core-libs, 
>>>>>>>>>> please
>>>>>>>>>> drop hotspot from any replies.
>>>>>>>>>>
>>>>>>>>>> On 20/12/2013 6:26 AM, srikalyan wrote:
>>>>>>>>>>> Hi all,  I have been working on the bug JDK-8022321
>>>>>>>>>>> <https://bugs.openjdk.java.net/browse/JDK-8022321> , this is a
>>>>>>>>>>> sporadic
>>>>>>>>>>> failure and the webrev is available here
>>>>>>>>>>> http://cr.openjdk.java.net/~srikchan/Regression/JDK-8022321_OOMEInReferenceHandler-webrev/ 
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I'm really not sure what to make of this. We have a test that
>>>>>>>>>> triggers
>>>>>>>>>> an out-of-memory condition but the OOME can actually turn up 
>>>>>>>>>> in the
>>>>>>>>>> ReferenceHandler thread causing it to terminate and the test to
>>>>>>>>>> fail.
>>>>>>>>>> We previously accounted for the non-obvious occurrences of OOME
>>>>>>>>>> due to
>>>>>>>>>> the Object.wait and the possible need to load the
>>>>>>>>>> InterruptedException
>>>>>>>>>> class - but still the OOME can appear where we don't want it. So
>>>>>>>>>> finally you have just placed the whole for(;;) loop in a
>>>>>>>>>> try/catch(OOME) that ignores the OOME. I'm certain that makes 
>>>>>>>>>> the
>>>>>>>>>> test
>>>>>>>>>> happy, but I'm not sure it is really what we want for the
>>>>>>>>>> ReferenceHandler thread. If the OOME occurs while cleaning, or
>>>>>>>>>> enqueuing then we will fail to clean and/or enqueue but there
>>>>>>>>>> would be
>>>>>>>>>> no indication that has occurred and I think that is a bigger
>>>>>>>>>> problem
>>>>>>>>>> than this test failing.
>>>>>>>>>>
>>>>>>>>>> There may be no way to make this test 100% reliable. In fact I'd
>>>>>>>>>> suggest that no memory exhaustion test can be 100% reliable.
>>>>>>>>>>
>>>>>>>>>> David
>>>>>>>>>>
>>>>>>>>>>> *
>>>>>>>>>>> **"Root Cause:Still not known"*
>>>>>>>>>>> 2 places where there is a possibility for OOME
>>>>>>>>>>> 1) Cleaner.clean()
>>>>>>>>>>> 2) ReferenceQueue.enqueue()
>>>>>>>>>>>
>>>>>>>>>>> 1)  The cleanup code in turn has 2 places where there is 
>>>>>>>>>>> potential
>>>>>>>>>>> for
>>>>>>>>>>> throwing OOME,
>>>>>>>>>>>      a) thunk Thread which is run from clean() method. This
>>>>>>>>>>> Runnable is
>>>>>>>>>>> passed to Cleaner and appears in the following classes
>>>>>>>>>>>          java/nio/DirectByteBuffer.java
>>>>>>>>>>>          sun/misc/Perf.java
>>>>>>>>>>>          sun/nio/fs/NativeBuffer.java
>>>>>>>>>>>          sun/nio/ch/IOVecWrapper.java
>>>>>>>>>>>          sun/misc/Cleaner/ExitOnThrow.java
>>>>>>>>>>> However none of the above overridden implementations ever
>>>>>>>>>>> create an
>>>>>>>>>>> object in the clean() code.
>>>>>>>>>>>      b) new PrivilegedAction created in try catch Exception
>>>>>>>>>>> block of
>>>>>>>>>>> clean() method but for this object to be created and to be held
>>>>>>>>>>> responsible for OOME an Exception(other than OOME) has to be
>>>>>>>>>>> thrown.
>>>>>>>>>>>
>>>>>>>>>>> 2) No new heap objects are created in the enqueue method nor
>>>>>>>>>>> anywhere in
>>>>>>>>>>> the deep call stack (VM.addFinalRefCount() etc) so this cannot
>>>>>>>>>>> be a
>>>>>>>>>>> potential cause.
>>>>>>>>>>>
>>>>>>>>>>> *Experimental change to java.lang.Reference.java* :
>>>>>>>>>>> - Put one more guard (try catch with OOME block) in the 
>>>>>>>>>>> Reference
>>>>>>>>>>> Handler Thread which may give the Reference Handler a chance to
>>>>>>>>>>> cleanup.
>>>>>>>>>>> This is fixing the test failure (several 1000 runs with 0
>>>>>>>>>>> failures)
>>>>>>>>>>> - Without the above change the test fails atleast 3-5 times for
>>>>>>>>>>> every
>>>>>>>>>>> 1000 run.
>>>>>>>>>>>
>>>>>>>>>>> *PS*: The code change is to a very critical part of JDK and 
>>>>>>>>>>> i am
>>>>>>>>>>> fully
>>>>>>>>>>> not aware of the consequences of the change, hence seeking 
>>>>>>>>>>> expert
>>>>>>>>>>> help
>>>>>>>>>>> here. Appreciate your time and inputs towards this.
>>>>>>>>>>>
>>>>>>>>>
>>>>>>
>>>
>




More information about the core-libs-dev mailing list