RFC 7038914: VM could throw uncaught OOME in ReferenceHandler thread

David Holmes david.holmes at oracle.com
Thu May 9 11:33:22 UTC 2013


On 9/05/2013 9:16 PM, Peter Levart wrote:
> Hi David,
>
> On 05/09/2013 12:02 PM, David Holmes wrote:
>> Hi Peter,
>>
>> Thanks for clarifying the test details. A few follow ups:
>>
>> - The reference class does get initialized early in the VM startup
>> well before Main will be loaded. But the use of the weakref doesn't hurt.
>
> Ok, so if this is guaranteed, we can remove the weakref construction.
>
>>
>> - 500ms may not be enough on some platforms, particularly some
>> embedded systems. Ideally we could code up a handshake using another
>> weakref that would guarantee that the handler thread really survived
>> past the OOM. But at some point this just becomes a no-reg-hard
>> situation :)
>
> If we used a weakref then there should be a delay between the
> referenceHandlerThread.interrupt() and the release of the
> WeakReference's referent, otherwise the WeakReference could be cleared
> and enqueued before Reference Handler attempts to throw
> InterruptedException (this really happens - I tried without delay and
> the clearing/enqueueing was quicker than interrupt). After this initial
> delay (currently set to 500ms) releasing the referent and waiting for
> WeakReference to be enqueued (while executing System.gc()) is analogous
> to testing for referenceHandlerThread.isAlive() - the only difference
> being the code that has to be executed in Reference Handler while
> unwinding the stack until the state of thread changes to TERMINATED. Can
> this be delayed for a long time?

What I was thinking was that after we interrupt we then create a new 
weakref and we loop until the ref gets cleared, or we detect the 
reference thread is not alive (with a sleep in between). One of those 
two conditions must become true.

>>
>> - I'm not certain that setting waste=null is sufficient to guarantee
>> the next allocation will succeed under all GC's. GC folk can
>> confim/deny that.
>
> We can pre-allocate the test-fail exception before doing fillHeap() so
> that we can conditionally throw it later, like this:

So when we do the throw there is logic in the launcher that will try to 
print the stacktrace, which may also encounter OOME unless we have freed 
the heap. So I think we want to release memory, I just wasn't certain 
that simply setting waste=null would guarantee it.

David
------

> public class Test {
>      static Object[] fillHeap() {
>          Object[] first = null, last = null;
>          int size = 1 << 20;
>          while (size > 0) {
>              try {
>                  Object[] array = new Object[size];
>                  if (first == null) {
>                      first = array;
>                  } else {
>                      last[0] = array;
>                  }
>                  last = array;
>              } catch (OutOfMemoryError oome) {
>                  size = size >>> 1;
>              }
>          }
>          return first;
>      }
>
>      public static void main(String[] args) throws Exception {
>          ThreadGroup tg = Thread.currentThread().getThreadGroup();
>          for (
>              ThreadGroup tgn = tg;
>              tgn != null;
>              tg = tgn, tgn = tg.getParent()
>              )
>              ;
>
>          Thread[] threads = new Thread[tg.activeCount()];
>          Thread referenceHandlerThread = null;
>          int n = tg.enumerate(threads);
>          for (int i = 0; i < n; i++) {
>              if ("Reference Handler".equals(threads[i].getName())) {
>                  referenceHandlerThread = threads[i];
>              }
>          }
>
>          if (referenceHandlerThread == null) {
>              throw new IllegalStateException("Couldn't find Reference
> Handler thread.");
>          }
>
>          // pre-allocate failure so that we don't cause OOME later
>          Exception failure = new Exception("Reference Handler thread
> died.");
>
>          Object waste = fillHeap();
>
>          referenceHandlerThread.interrupt();
>
>          // allow referenceHandlerThread some time to throw OOME
>          Thread.sleep(500L);
>
>          if (waste != null && // keep 'waste' reference alive until this
> point
>              !referenceHandlerThread.isAlive()// check that the handler
> is still alive
>              ) {
>              throw failure;
>          }
>      }
> }
>
>
>
> Regards, Peter
>
>>
>> Thanks,
>> David
>>
>> On 9/05/2013 6:35 PM, Peter Levart wrote:
>>> On 05/09/2013 07:53 AM, David Holmes wrote:
>>>> Hi Thomas,
>>>>
>>>> On 9/05/2013 1:28 AM, Thomas Schatzl wrote:
>>>>> Hi,
>>>>>
>>>>>    please review the latest webrev for this patch that is available at
>>>>>
>>>>> http://cr.openjdk.java.net/~tschatzl/7038914/webrev.2/
>>>>>
>>>>> As mentioned, it incorporates the fix and reproducer from Peter
>>>>> Levart.
>>>>
>>>> Fix is fine.
>>>>
>>>> I'm not sure about the test (sorry I didn't pay it attention earlier).
>>>> Have you instrumented the code to verify that the test actually
>>>> triggers an OOME? It may be possible that if the OOME did kill the
>>>> reference thread that we wouldn't necessarily detect it using the
>>>> sleeps etc. Also I don't understand the need for the actual weakRef
>>>> usage and System.gc() etc. If the heap is full then the interrupt will
>>>> wake the reference handler thread and the allocation will trigger the
>>>> OOME. It doesn't matter if there are any references to process. The
>>>> main logic might then reduce to:
>>>>
>>>>  referenceHandlerThread.interrupt();
>>>>  for (int i = 0; i < 10; i++) {
>>>>    if (!referenceHandlerThread.isAlive())
>>>>        throw new Exception("Reference-handler thread died");
>>>>    Thread.sleep(1000);
>>>>  }
>>>>
>>>> which just gives it 10s to die else assumes all ok. ?
>>>>
>>>> But I can live with existing test (it just might vacuously pass)
>>>>
>>>> Cheers,
>>>> David
>>>
>>> Hi David, Thomas,
>>>
>>> Yes, this was just a left-over from my bug reproducer that demonstrated
>>> the situation where WeakReference was cleared, but nothing was enqueue-d
>>> into the ReferenceQueue. Reference Handler thread otherwise just sleeps
>>> in lock.wait() most of the time if there's nothing to do and
>>> interrupting it's thread will trigger allocation of InterruptedException
>>> and consequently OOME nevertheless.
>>>
>>> One think to watch: Reference Handler thread is started in the
>>> j.l.r.Reference static initializer, so the Reference class must be
>>> initialized. I don't know if there is any guarantee that this is done
>>> before entering main(). So there should be something explicit in main()
>>> method that ensures it.
>>>
>>> Also, throwing test-fail exception should be done after releasing the
>>> heap or we'll just trigger another OOME without any message to describe
>>> the situation.
>>>
>>> Waiting in a loop for ReferenceHandler thread is not needed since the
>>> test will normally succeed and so the whole loop will execute to the
>>> end. Only when the test fails the loop will exit faster. In my
>>> experience, the thread dies between 10-20ms after interrupting, so
>>> waiting for about 500ms is enough I think. Also no System.gc() and
>>> additional waiting is needed to report the outcome - just releasing the
>>> reference to 'waste' variable is enuugh in my experience to restore the
>>> heap into working condition.
>>>
>>> Here's the updated test that does all that:
>>>
>>>
>>> public class OOMEInReferenceHandler {
>>>      static Object[] fillHeap() {
>>>          Object[] first = null, last = null;
>>>          int size = 1 << 20;
>>>          while (size > 0) {
>>>              try {
>>>                  Object[] array = new Object[size];
>>>                  if (first == null) {
>>>                      first = array;
>>>                  }
>>>                  else {
>>>                      last[0] = array;
>>>                  }
>>>                  last = array;
>>>              }
>>>              catch (OutOfMemoryError oome) {
>>>                  size = size >>> 1;
>>>              }
>>>          }
>>>          return first;
>>>      }
>>>
>>>      public static void main(String[] args) throws Exception {
>>>          // ensure ReferenceHandler thread is started
>>>          new WeakReference<Object>(new Object());
>>>
>>>          ThreadGroup tg = Thread.currentThread().getThreadGroup();
>>>          for (ThreadGroup tgn = tg;
>>>               tgn != null;
>>>               tg = tgn, tgn = tg.getParent());
>>>
>>>          Thread[] threads = new Thread[tg.activeCount()];
>>>          Thread referenceHandlerThread = null;
>>>          int n = tg.enumerate(threads);
>>>          for (int i = 0; i < n; i++) {
>>>              if("Reference Handler".equals(threads[i].getName())) {
>>>                  referenceHandlerThread = threads[i];
>>>              }
>>>          }
>>>
>>>          if (referenceHandlerThread == null) {
>>>              throw new IllegalStateException("Couldn't find Reference
>>> Handler thread.");
>>>          }
>>>
>>>          Object waste = fillHeap();
>>>
>>>          referenceHandlerThread.interrupt();
>>>
>>>          Thread.sleep(500L);
>>>
>>>          // release waste so we can allocate and throw normal exceptions
>>>          waste = null;
>>>
>>>           // check that the handler is still alive
>>>          if (!referenceHandlerThread.isAlive()) {
>>>              throw new Exception("Reference Handler thread died.");
>>>          }
>>>      }
>>> }
>>>
>>>
>>>
>>> Regards, Peter
>>>
>>>>
>>>>> Bugs.sun.com:
>>>>> http://bugs.sun.com/view_bug.do?bug_id=7038914
>>>>>
>>>>> JIRA:
>>>>> https://jbs.oracle.com/bugs/browse/JDK-7038914
>>>>>
>>>>> Testing:
>>>>> JPRT, with the new reproducer passing
>>>>>
>>>>> In need of reviewers and a sponsor for this patch; as mentioned
>>>>> earlier
>>>>> I will set Peter (plevart) as author for this patch, i.e. send a
>>>>> patchset to the sponsor for this change with that author set.
>>>>>
>>>>> Thanks,
>>>>> Thomas
>>>>>
>>>
>



More information about the core-libs-dev mailing list