RFC 7038914: VM could throw uncaught OOME in ReferenceHandler thread

David Holmes david.holmes at oracle.com
Fri May 10 03:34:43 UTC 2013


string ref -> strong ref  <sigh>

David

On 10/05/2013 1:20 PM, David Holmes wrote:
> On 9/05/2013 10:13 PM, Peter Levart wrote:
>> On 05/09/2013 01:33 PM, David Holmes wrote:
>>> On 9/05/2013 9:16 PM, Peter Levart wrote:
>>>> Hi David,
>>>>
>>>> On 05/09/2013 12:02 PM, David Holmes wrote:
>>>>> Hi Peter,
>>>>>
>>>>> Thanks for clarifying the test details. A few follow ups:
>>>>>
>>>>> - The reference class does get initialized early in the VM startup
>>>>> well before Main will be loaded. But the use of the weakref doesn't
>>>>> hurt.
>>>>
>>>> Ok, so if this is guaranteed, we can remove the weakref construction.
>>>>
>>>>>
>>>>> - 500ms may not be enough on some platforms, particularly some
>>>>> embedded systems. Ideally we could code up a handshake using another
>>>>> weakref that would guarantee that the handler thread really survived
>>>>> past the OOM. But at some point this just becomes a no-reg-hard
>>>>> situation :)
>>>>
>>>> If we used a weakref then there should be a delay between the
>>>> referenceHandlerThread.interrupt() and the release of the
>>>> WeakReference's referent, otherwise the WeakReference could be cleared
>>>> and enqueued before Reference Handler attempts to throw
>>>> InterruptedException (this really happens - I tried without delay and
>>>> the clearing/enqueueing was quicker than interrupt). After this initial
>>>> delay (currently set to 500ms) releasing the referent and waiting for
>>>> WeakReference to be enqueued (while executing System.gc()) is analogous
>>>> to testing for referenceHandlerThread.isAlive() - the only difference
>>>> being the code that has to be executed in Reference Handler while
>>>> unwinding the stack until the state of thread changes to TERMINATED.
>>>> Can
>>>> this be delayed for a long time?
>>>
>>> What I was thinking was that after we interrupt we then create a new
>>> weakref
>>
>> Can't do that immediately (no space)!
>>
>>> and we loop until the ref gets cleared, or we detect the reference
>>> thread is not alive (with a sleep in between). One of those two
>>> conditions must become true.
>>
>> To create a weakref after interrupt, we have to 1st wait some time (to
>> give Reference Handler thread a chance to throw OOME), then free the
>> heap (release 'waste' and possibly do some System.gc()) to get some
>> space for weakref creation. After we do that, a chance that Reference
>> Handler thread is just in the process of unwinding the stack after
>> uncaught OOME and referenceHandlerThread.isAlive() still returns true is
>> minimal. Do you think we should wait some more to be sure thread is
>> still alive?
>
> "create a new weakref" was the wrong thing to say. We should already
> have the weakref and then clear the string ref (wrapped by the weakref)
> and wait for the weakref to be cleared (or not) - which would require
> looping with system.gc() call.
>
>>>
>>>>>
>>>>> - I'm not certain that setting waste=null is sufficient to guarantee
>>>>> the next allocation will succeed under all GC's. GC folk can
>>>>> confim/deny that.
>>>>
>>>> We can pre-allocate the test-fail exception before doing fillHeap() so
>>>> that we can conditionally throw it later, like this:
>>>
>>> So when we do the throw there is logic in the launcher that will try
>>> to print the stacktrace, which may also encounter OOME unless we have
>>> freed the heap. So I think we want to release memory, I just wasn't
>>> certain that simply setting waste=null would guarantee it.
>>
>> 'waste' is local variable and goes out of scope when main() is finished.
>
> Good point. But again I'm not certain that once waste is null, or out of
> scope, that an attempted allocation (eg for stacktrace) will force GC to
> run, release everything, and have the allocation succeed. This is what
> I'd like GC folk to confirm/refute.
>
>> So in final variation I haven't even bothered to set it to null at the
>> end. What do you suggest for successful test failure? Setting 'waste' to
>> null, followed by a System.gc() and Thread.sleep()? Can we signal test
>> failure in another way? System.exit(1)?
>
> The test harness expects exceptions for failures. System.exit is not
> allowed in general.
>
> David
> -----
>
>> Regards, Peter
>>
>>>
>>> David
>>> ------
>>>
>>>> public class Test {
>>>>      static Object[] fillHeap() {
>>>>          Object[] first = null, last = null;
>>>>          int size = 1 << 20;
>>>>          while (size > 0) {
>>>>              try {
>>>>                  Object[] array = new Object[size];
>>>>                  if (first == null) {
>>>>                      first = array;
>>>>                  } else {
>>>>                      last[0] = array;
>>>>                  }
>>>>                  last = array;
>>>>              } catch (OutOfMemoryError oome) {
>>>>                  size = size >>> 1;
>>>>              }
>>>>          }
>>>>          return first;
>>>>      }
>>>>
>>>>      public static void main(String[] args) throws Exception {
>>>>          ThreadGroup tg = Thread.currentThread().getThreadGroup();
>>>>          for (
>>>>              ThreadGroup tgn = tg;
>>>>              tgn != null;
>>>>              tg = tgn, tgn = tg.getParent()
>>>>              )
>>>>              ;
>>>>
>>>>          Thread[] threads = new Thread[tg.activeCount()];
>>>>          Thread referenceHandlerThread = null;
>>>>          int n = tg.enumerate(threads);
>>>>          for (int i = 0; i < n; i++) {
>>>>              if ("Reference Handler".equals(threads[i].getName())) {
>>>>                  referenceHandlerThread = threads[i];
>>>>              }
>>>>          }
>>>>
>>>>          if (referenceHandlerThread == null) {
>>>>              throw new IllegalStateException("Couldn't find Reference
>>>> Handler thread.");
>>>>          }
>>>>
>>>>          // pre-allocate failure so that we don't cause OOME later
>>>>          Exception failure = new Exception("Reference Handler thread
>>>> died.");
>>>>
>>>>          Object waste = fillHeap();
>>>>
>>>>          referenceHandlerThread.interrupt();
>>>>
>>>>          // allow referenceHandlerThread some time to throw OOME
>>>>          Thread.sleep(500L);
>>>>
>>>>          if (waste != null && // keep 'waste' reference alive until
>>>> this
>>>> point
>>>>              !referenceHandlerThread.isAlive()// check that the handler
>>>> is still alive
>>>>              ) {
>>>>              throw failure;
>>>>          }
>>>>      }
>>>> }
>>>>
>>>>
>>>>
>>>> Regards, Peter
>>>>
>>>>>
>>>>> Thanks,
>>>>> David
>>>>>
>>>>> On 9/05/2013 6:35 PM, Peter Levart wrote:
>>>>>> On 05/09/2013 07:53 AM, David Holmes wrote:
>>>>>>> Hi Thomas,
>>>>>>>
>>>>>>> On 9/05/2013 1:28 AM, Thomas Schatzl wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>>    please review the latest webrev for this patch that is
>>>>>>>> available at
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~tschatzl/7038914/webrev.2/
>>>>>>>>
>>>>>>>> As mentioned, it incorporates the fix and reproducer from Peter
>>>>>>>> Levart.
>>>>>>>
>>>>>>> Fix is fine.
>>>>>>>
>>>>>>> I'm not sure about the test (sorry I didn't pay it attention
>>>>>>> earlier).
>>>>>>> Have you instrumented the code to verify that the test actually
>>>>>>> triggers an OOME? It may be possible that if the OOME did kill the
>>>>>>> reference thread that we wouldn't necessarily detect it using the
>>>>>>> sleeps etc. Also I don't understand the need for the actual weakRef
>>>>>>> usage and System.gc() etc. If the heap is full then the interrupt
>>>>>>> will
>>>>>>> wake the reference handler thread and the allocation will trigger
>>>>>>> the
>>>>>>> OOME. It doesn't matter if there are any references to process. The
>>>>>>> main logic might then reduce to:
>>>>>>>
>>>>>>>  referenceHandlerThread.interrupt();
>>>>>>>  for (int i = 0; i < 10; i++) {
>>>>>>>    if (!referenceHandlerThread.isAlive())
>>>>>>>        throw new Exception("Reference-handler thread died");
>>>>>>>    Thread.sleep(1000);
>>>>>>>  }
>>>>>>>
>>>>>>> which just gives it 10s to die else assumes all ok. ?
>>>>>>>
>>>>>>> But I can live with existing test (it just might vacuously pass)
>>>>>>>
>>>>>>> Cheers,
>>>>>>> David
>>>>>>
>>>>>> Hi David, Thomas,
>>>>>>
>>>>>> Yes, this was just a left-over from my bug reproducer that
>>>>>> demonstrated
>>>>>> the situation where WeakReference was cleared, but nothing was
>>>>>> enqueue-d
>>>>>> into the ReferenceQueue. Reference Handler thread otherwise just
>>>>>> sleeps
>>>>>> in lock.wait() most of the time if there's nothing to do and
>>>>>> interrupting it's thread will trigger allocation of
>>>>>> InterruptedException
>>>>>> and consequently OOME nevertheless.
>>>>>>
>>>>>> One think to watch: Reference Handler thread is started in the
>>>>>> j.l.r.Reference static initializer, so the Reference class must be
>>>>>> initialized. I don't know if there is any guarantee that this is done
>>>>>> before entering main(). So there should be something explicit in
>>>>>> main()
>>>>>> method that ensures it.
>>>>>>
>>>>>> Also, throwing test-fail exception should be done after releasing the
>>>>>> heap or we'll just trigger another OOME without any message to
>>>>>> describe
>>>>>> the situation.
>>>>>>
>>>>>> Waiting in a loop for ReferenceHandler thread is not needed since the
>>>>>> test will normally succeed and so the whole loop will execute to the
>>>>>> end. Only when the test fails the loop will exit faster. In my
>>>>>> experience, the thread dies between 10-20ms after interrupting, so
>>>>>> waiting for about 500ms is enough I think. Also no System.gc() and
>>>>>> additional waiting is needed to report the outcome - just releasing
>>>>>> the
>>>>>> reference to 'waste' variable is enuugh in my experience to restore
>>>>>> the
>>>>>> heap into working condition.
>>>>>>
>>>>>> Here's the updated test that does all that:
>>>>>>
>>>>>>
>>>>>> public class OOMEInReferenceHandler {
>>>>>>      static Object[] fillHeap() {
>>>>>>          Object[] first = null, last = null;
>>>>>>          int size = 1 << 20;
>>>>>>          while (size > 0) {
>>>>>>              try {
>>>>>>                  Object[] array = new Object[size];
>>>>>>                  if (first == null) {
>>>>>>                      first = array;
>>>>>>                  }
>>>>>>                  else {
>>>>>>                      last[0] = array;
>>>>>>                  }
>>>>>>                  last = array;
>>>>>>              }
>>>>>>              catch (OutOfMemoryError oome) {
>>>>>>                  size = size >>> 1;
>>>>>>              }
>>>>>>          }
>>>>>>          return first;
>>>>>>      }
>>>>>>
>>>>>>      public static void main(String[] args) throws Exception {
>>>>>>          // ensure ReferenceHandler thread is started
>>>>>>          new WeakReference<Object>(new Object());
>>>>>>
>>>>>>          ThreadGroup tg = Thread.currentThread().getThreadGroup();
>>>>>>          for (ThreadGroup tgn = tg;
>>>>>>               tgn != null;
>>>>>>               tg = tgn, tgn = tg.getParent());
>>>>>>
>>>>>>          Thread[] threads = new Thread[tg.activeCount()];
>>>>>>          Thread referenceHandlerThread = null;
>>>>>>          int n = tg.enumerate(threads);
>>>>>>          for (int i = 0; i < n; i++) {
>>>>>>              if("Reference Handler".equals(threads[i].getName())) {
>>>>>>                  referenceHandlerThread = threads[i];
>>>>>>              }
>>>>>>          }
>>>>>>
>>>>>>          if (referenceHandlerThread == null) {
>>>>>>              throw new IllegalStateException("Couldn't find Reference
>>>>>> Handler thread.");
>>>>>>          }
>>>>>>
>>>>>>          Object waste = fillHeap();
>>>>>>
>>>>>>          referenceHandlerThread.interrupt();
>>>>>>
>>>>>>          Thread.sleep(500L);
>>>>>>
>>>>>>          // release waste so we can allocate and throw normal
>>>>>> exceptions
>>>>>>          waste = null;
>>>>>>
>>>>>>           // check that the handler is still alive
>>>>>>          if (!referenceHandlerThread.isAlive()) {
>>>>>>              throw new Exception("Reference Handler thread died.");
>>>>>>          }
>>>>>>      }
>>>>>> }
>>>>>>
>>>>>>
>>>>>>
>>>>>> Regards, Peter
>>>>>>
>>>>>>>
>>>>>>>> Bugs.sun.com:
>>>>>>>> http://bugs.sun.com/view_bug.do?bug_id=7038914
>>>>>>>>
>>>>>>>> JIRA:
>>>>>>>> https://jbs.oracle.com/bugs/browse/JDK-7038914
>>>>>>>>
>>>>>>>> Testing:
>>>>>>>> JPRT, with the new reproducer passing
>>>>>>>>
>>>>>>>> In need of reviewers and a sponsor for this patch; as mentioned
>>>>>>>> earlier
>>>>>>>> I will set Peter (plevart) as author for this patch, i.e. send a
>>>>>>>> patchset to the sponsor for this change with that author set.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Thomas
>>>>>>>>
>>>>>>
>>>>
>>



More information about the core-libs-dev mailing list