RFC 7038914: VM could throw uncaught OOME in ReferenceHandler thread

Peter Levart peter.levart at gmail.com
Fri May 10 10:52:15 UTC 2013


Hi David, Thomas,

I think I understand you now, David. You want to minimize the 
possibility of vacuously passing the test. So here's my attempt at that:


public class OOMEInReferenceHandler {
     static Object[] fillHeap() {
         Object[] first = null, last = null;
         int size = 1 << 20;
         while (size > 0) {
             try {
                 Object[] array = new Object[size];
                 if (first == null) {
                     first = array;
                 } else {
                     last[0] = array;
                 }
                 last = array;
             } catch (OutOfMemoryError oome) {
                 size = size >>> 1;
             }
         }
         return first;
     }

     public static void main(String[] args) throws Exception {
         ThreadGroup tg = Thread.currentThread().getThreadGroup();
         for (
             ThreadGroup tgn = tg;
             tgn != null;
             tg = tgn, tgn = tg.getParent()
             )
             ;

         Thread[] threads = new Thread[tg.activeCount()];
         Thread referenceHandlerThread = null;
         int n = tg.enumerate(threads);
         for (int i = 0; i < n; i++) {
             if ("Reference Handler".equals(threads[i].getName())) {
                 referenceHandlerThread = threads[i];
             }
         }

         if (referenceHandlerThread == null) {
             throw new IllegalStateException("Couldn't find Reference 
Handler thread.");
         }

         ReferenceQueue<Object> refQueue = new ReferenceQueue<>();
         Object referent = new Object();
         WeakReference<Object> weakRef = new WeakReference<>(referent, 
refQueue);

         Object waste = fillHeap();

         referenceHandlerThread.interrupt();

         // allow referenceHandlerThread some time to throw OOME
         Thread.sleep(500L);

         // release waste & referent
         waste = null;
         referent = null;

         // wait at most 10 seconds for success or failure
         for (int i = 0; i < 20; i++) {
             if (refQueue.poll() != null) {
                 // Reference Handler thread still working -> success
                 return;
             }
             System.gc();
             Thread.sleep(500L); // wait a little to allow GC to do it's 
work before allocating objects
             if (!referenceHandlerThread.isAlive()) {
                 // Reference Handler thread died -> failure
                 throw new Exception("Reference Handler thread died.");
             }
         }

         // no sure answer after 10 seconds
         throw new IllegalStateException("Reference Handler thread stuck.");
     }
}


While executing the above test with the patch to ReferenceHandler 
applied, I noticed a strange behaviour. I can reproduce this behaviour 
reliably on both JDK7 and JDK8. When the patch is applied as proposed:

                         try {
                             lock.wait();
                         } catch (InterruptedException | 
OutOfMemoryError  x) { }

... I still get the following output from the test (reliably, always):

Exception: java.lang.OutOfMemoryError thrown from the 
UncaughtExceptionHandler in thread "Reference Handler"
Exception in thread "main" java.lang.Exception: Reference Handler thread 
died.
         at OOMEInReferenceHandler.main(OOMEInReferenceHandler.java:80)

But when i change the patch to the following:

                         try {
                             lock.wait();
                         } catch (OutOfMemoryError | 
InterruptedException  x) { }

...the test reliably and always passes.

My explanation to his behaviour is that the order of exception handlers 
changes the order of class referencing. In the former variation (that 
still throws OOME) the following seems to be happening:

wait() is interrupted and InterruptedException instance creation is 
attempted. Because this is the 1st reference to InterruptedException 
class in the lifetime of the JVM, loading of InterruptedException class 
is attempted which fails because of OOME. This OOME is caught by handler 
and ignored. But after handling of this OOME, another reference to 
InterruptedException class is attempted by exception handlers themselves 
(I don't know how exception handlers work exactly, but I have a feeling 
this is happening). Because InterruptedException class was not 
successfully loaded the 1st time tried, every reference to this class 
must throw NoClassDefFoundError, so this is attempted, but creation of 
NoClassDefFoundError fails because there's no heap space and another 
OOME is thrown - this time out of exception handling block, which is 
propagated and kills the thread.

If the order of exception handlers is reversed, this second OOME is 
caught and ignored.

To be sure the order of class referencing in handlers is not disturbed 
by changes in compilers, I suggest the patch to be modified to this more 
explicit variant:

                         try {
                             try {
                                 lock.wait();
                             } catch (InterruptedException  x) { }
                         } catch (OutOfMemoryError x) {}


..which also passes the test.

Regards, Peter


On 05/10/2013 05:20 AM, David Holmes wrote:
> On 9/05/2013 10:13 PM, Peter Levart wrote:
>> On 05/09/2013 01:33 PM, David Holmes wrote:
>>> On 9/05/2013 9:16 PM, Peter Levart wrote:
>>>> Hi David,
>>>>
>>>> On 05/09/2013 12:02 PM, David Holmes wrote:
>>>>> Hi Peter,
>>>>>
>>>>> Thanks for clarifying the test details. A few follow ups:
>>>>>
>>>>> - The reference class does get initialized early in the VM startup
>>>>> well before Main will be loaded. But the use of the weakref doesn't
>>>>> hurt.
>>>>
>>>> Ok, so if this is guaranteed, we can remove the weakref construction.
>>>>
>>>>>
>>>>> - 500ms may not be enough on some platforms, particularly some
>>>>> embedded systems. Ideally we could code up a handshake using another
>>>>> weakref that would guarantee that the handler thread really survived
>>>>> past the OOM. But at some point this just becomes a no-reg-hard
>>>>> situation :)
>>>>
>>>> If we used a weakref then there should be a delay between the
>>>> referenceHandlerThread.interrupt() and the release of the
>>>> WeakReference's referent, otherwise the WeakReference could be cleared
>>>> and enqueued before Reference Handler attempts to throw
>>>> InterruptedException (this really happens - I tried without delay and
>>>> the clearing/enqueueing was quicker than interrupt). After this 
>>>> initial
>>>> delay (currently set to 500ms) releasing the referent and waiting for
>>>> WeakReference to be enqueued (while executing System.gc()) is 
>>>> analogous
>>>> to testing for referenceHandlerThread.isAlive() - the only difference
>>>> being the code that has to be executed in Reference Handler while
>>>> unwinding the stack until the state of thread changes to 
>>>> TERMINATED. Can
>>>> this be delayed for a long time?
>>>
>>> What I was thinking was that after we interrupt we then create a new
>>> weakref
>>
>> Can't do that immediately (no space)!
>>
>>> and we loop until the ref gets cleared, or we detect the reference
>>> thread is not alive (with a sleep in between). One of those two
>>> conditions must become true.
>>
>> To create a weakref after interrupt, we have to 1st wait some time (to
>> give Reference Handler thread a chance to throw OOME), then free the
>> heap (release 'waste' and possibly do some System.gc()) to get some
>> space for weakref creation. After we do that, a chance that Reference
>> Handler thread is just in the process of unwinding the stack after
>> uncaught OOME and referenceHandlerThread.isAlive() still returns true is
>> minimal. Do you think we should wait some more to be sure thread is
>> still alive?
>
> "create a new weakref" was the wrong thing to say. We should already 
> have the weakref and then clear the string ref (wrapped by the 
> weakref) and wait for the weakref to be cleared (or not) - which would 
> require looping with system.gc() call.
>
>>>
>>>>>
>>>>> - I'm not certain that setting waste=null is sufficient to guarantee
>>>>> the next allocation will succeed under all GC's. GC folk can
>>>>> confim/deny that.
>>>>
>>>> We can pre-allocate the test-fail exception before doing fillHeap() so
>>>> that we can conditionally throw it later, like this:
>>>
>>> So when we do the throw there is logic in the launcher that will try
>>> to print the stacktrace, which may also encounter OOME unless we have
>>> freed the heap. So I think we want to release memory, I just wasn't
>>> certain that simply setting waste=null would guarantee it.
>>
>> 'waste' is local variable and goes out of scope when main() is finished.
>
> Good point. But again I'm not certain that once waste is null, or out 
> of scope, that an attempted allocation (eg for stacktrace) will force 
> GC to run, release everything, and have the allocation succeed. This 
> is what I'd like GC folk to confirm/refute.
>
>> So in final variation I haven't even bothered to set it to null at the
>> end. What do you suggest for successful test failure? Setting 'waste' to
>> null, followed by a System.gc() and Thread.sleep()? Can we signal test
>> failure in another way? System.exit(1)?
>
> The test harness expects exceptions for failures. System.exit is not 
> allowed in general.
>
> David
> -----
>
>> Regards, Peter
>>
>>>
>>> David
>>> ------
>>>
>>>> public class Test {
>>>>      static Object[] fillHeap() {
>>>>          Object[] first = null, last = null;
>>>>          int size = 1 << 20;
>>>>          while (size > 0) {
>>>>              try {
>>>>                  Object[] array = new Object[size];
>>>>                  if (first == null) {
>>>>                      first = array;
>>>>                  } else {
>>>>                      last[0] = array;
>>>>                  }
>>>>                  last = array;
>>>>              } catch (OutOfMemoryError oome) {
>>>>                  size = size >>> 1;
>>>>              }
>>>>          }
>>>>          return first;
>>>>      }
>>>>
>>>>      public static void main(String[] args) throws Exception {
>>>>          ThreadGroup tg = Thread.currentThread().getThreadGroup();
>>>>          for (
>>>>              ThreadGroup tgn = tg;
>>>>              tgn != null;
>>>>              tg = tgn, tgn = tg.getParent()
>>>>              )
>>>>              ;
>>>>
>>>>          Thread[] threads = new Thread[tg.activeCount()];
>>>>          Thread referenceHandlerThread = null;
>>>>          int n = tg.enumerate(threads);
>>>>          for (int i = 0; i < n; i++) {
>>>>              if ("Reference Handler".equals(threads[i].getName())) {
>>>>                  referenceHandlerThread = threads[i];
>>>>              }
>>>>          }
>>>>
>>>>          if (referenceHandlerThread == null) {
>>>>              throw new IllegalStateException("Couldn't find Reference
>>>> Handler thread.");
>>>>          }
>>>>
>>>>          // pre-allocate failure so that we don't cause OOME later
>>>>          Exception failure = new Exception("Reference Handler thread
>>>> died.");
>>>>
>>>>          Object waste = fillHeap();
>>>>
>>>>          referenceHandlerThread.interrupt();
>>>>
>>>>          // allow referenceHandlerThread some time to throw OOME
>>>>          Thread.sleep(500L);
>>>>
>>>>          if (waste != null && // keep 'waste' reference alive until 
>>>> this
>>>> point
>>>>              !referenceHandlerThread.isAlive()// check that the 
>>>> handler
>>>> is still alive
>>>>              ) {
>>>>              throw failure;
>>>>          }
>>>>      }
>>>> }
>>>>
>>>>
>>>>
>>>> Regards, Peter
>>>>
>>>>>
>>>>> Thanks,
>>>>> David
>>>>>
>>>>> On 9/05/2013 6:35 PM, Peter Levart wrote:
>>>>>> On 05/09/2013 07:53 AM, David Holmes wrote:
>>>>>>> Hi Thomas,
>>>>>>>
>>>>>>> On 9/05/2013 1:28 AM, Thomas Schatzl wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>>    please review the latest webrev for this patch that is
>>>>>>>> available at
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~tschatzl/7038914/webrev.2/
>>>>>>>>
>>>>>>>> As mentioned, it incorporates the fix and reproducer from Peter
>>>>>>>> Levart.
>>>>>>>
>>>>>>> Fix is fine.
>>>>>>>
>>>>>>> I'm not sure about the test (sorry I didn't pay it attention
>>>>>>> earlier).
>>>>>>> Have you instrumented the code to verify that the test actually
>>>>>>> triggers an OOME? It may be possible that if the OOME did kill the
>>>>>>> reference thread that we wouldn't necessarily detect it using the
>>>>>>> sleeps etc. Also I don't understand the need for the actual weakRef
>>>>>>> usage and System.gc() etc. If the heap is full then the interrupt
>>>>>>> will
>>>>>>> wake the reference handler thread and the allocation will 
>>>>>>> trigger the
>>>>>>> OOME. It doesn't matter if there are any references to process. The
>>>>>>> main logic might then reduce to:
>>>>>>>
>>>>>>>  referenceHandlerThread.interrupt();
>>>>>>>  for (int i = 0; i < 10; i++) {
>>>>>>>    if (!referenceHandlerThread.isAlive())
>>>>>>>        throw new Exception("Reference-handler thread died");
>>>>>>>    Thread.sleep(1000);
>>>>>>>  }
>>>>>>>
>>>>>>> which just gives it 10s to die else assumes all ok. ?
>>>>>>>
>>>>>>> But I can live with existing test (it just might vacuously pass)
>>>>>>>
>>>>>>> Cheers,
>>>>>>> David
>>>>>>
>>>>>> Hi David, Thomas,
>>>>>>
>>>>>> Yes, this was just a left-over from my bug reproducer that
>>>>>> demonstrated
>>>>>> the situation where WeakReference was cleared, but nothing was
>>>>>> enqueue-d
>>>>>> into the ReferenceQueue. Reference Handler thread otherwise just
>>>>>> sleeps
>>>>>> in lock.wait() most of the time if there's nothing to do and
>>>>>> interrupting it's thread will trigger allocation of
>>>>>> InterruptedException
>>>>>> and consequently OOME nevertheless.
>>>>>>
>>>>>> One think to watch: Reference Handler thread is started in the
>>>>>> j.l.r.Reference static initializer, so the Reference class must be
>>>>>> initialized. I don't know if there is any guarantee that this is 
>>>>>> done
>>>>>> before entering main(). So there should be something explicit in
>>>>>> main()
>>>>>> method that ensures it.
>>>>>>
>>>>>> Also, throwing test-fail exception should be done after releasing 
>>>>>> the
>>>>>> heap or we'll just trigger another OOME without any message to
>>>>>> describe
>>>>>> the situation.
>>>>>>
>>>>>> Waiting in a loop for ReferenceHandler thread is not needed since 
>>>>>> the
>>>>>> test will normally succeed and so the whole loop will execute to the
>>>>>> end. Only when the test fails the loop will exit faster. In my
>>>>>> experience, the thread dies between 10-20ms after interrupting, so
>>>>>> waiting for about 500ms is enough I think. Also no System.gc() and
>>>>>> additional waiting is needed to report the outcome - just releasing
>>>>>> the
>>>>>> reference to 'waste' variable is enuugh in my experience to restore
>>>>>> the
>>>>>> heap into working condition.
>>>>>>
>>>>>> Here's the updated test that does all that:
>>>>>>
>>>>>>
>>>>>> public class OOMEInReferenceHandler {
>>>>>>      static Object[] fillHeap() {
>>>>>>          Object[] first = null, last = null;
>>>>>>          int size = 1 << 20;
>>>>>>          while (size > 0) {
>>>>>>              try {
>>>>>>                  Object[] array = new Object[size];
>>>>>>                  if (first == null) {
>>>>>>                      first = array;
>>>>>>                  }
>>>>>>                  else {
>>>>>>                      last[0] = array;
>>>>>>                  }
>>>>>>                  last = array;
>>>>>>              }
>>>>>>              catch (OutOfMemoryError oome) {
>>>>>>                  size = size >>> 1;
>>>>>>              }
>>>>>>          }
>>>>>>          return first;
>>>>>>      }
>>>>>>
>>>>>>      public static void main(String[] args) throws Exception {
>>>>>>          // ensure ReferenceHandler thread is started
>>>>>>          new WeakReference<Object>(new Object());
>>>>>>
>>>>>>          ThreadGroup tg = Thread.currentThread().getThreadGroup();
>>>>>>          for (ThreadGroup tgn = tg;
>>>>>>               tgn != null;
>>>>>>               tg = tgn, tgn = tg.getParent());
>>>>>>
>>>>>>          Thread[] threads = new Thread[tg.activeCount()];
>>>>>>          Thread referenceHandlerThread = null;
>>>>>>          int n = tg.enumerate(threads);
>>>>>>          for (int i = 0; i < n; i++) {
>>>>>>              if("Reference Handler".equals(threads[i].getName())) {
>>>>>>                  referenceHandlerThread = threads[i];
>>>>>>              }
>>>>>>          }
>>>>>>
>>>>>>          if (referenceHandlerThread == null) {
>>>>>>              throw new IllegalStateException("Couldn't find 
>>>>>> Reference
>>>>>> Handler thread.");
>>>>>>          }
>>>>>>
>>>>>>          Object waste = fillHeap();
>>>>>>
>>>>>>          referenceHandlerThread.interrupt();
>>>>>>
>>>>>>          Thread.sleep(500L);
>>>>>>
>>>>>>          // release waste so we can allocate and throw normal
>>>>>> exceptions
>>>>>>          waste = null;
>>>>>>
>>>>>>           // check that the handler is still alive
>>>>>>          if (!referenceHandlerThread.isAlive()) {
>>>>>>              throw new Exception("Reference Handler thread died.");
>>>>>>          }
>>>>>>      }
>>>>>> }
>>>>>>
>>>>>>
>>>>>>
>>>>>> Regards, Peter
>>>>>>
>>>>>>>
>>>>>>>> Bugs.sun.com:
>>>>>>>> http://bugs.sun.com/view_bug.do?bug_id=7038914
>>>>>>>>
>>>>>>>> JIRA:
>>>>>>>> https://jbs.oracle.com/bugs/browse/JDK-7038914
>>>>>>>>
>>>>>>>> Testing:
>>>>>>>> JPRT, with the new reproducer passing
>>>>>>>>
>>>>>>>> In need of reviewers and a sponsor for this patch; as mentioned
>>>>>>>> earlier
>>>>>>>> I will set Peter (plevart) as author for this patch, i.e. send a
>>>>>>>> patchset to the sponsor for this change with that author set.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Thomas
>>>>>>>>
>>>>>>
>>>>
>>




More information about the core-libs-dev mailing list