RFR: 8016341 java/lang/ref/OOMEInReferenceHandler.java failing intermittently

Peter Levart peter.levart at gmail.com
Mon Jul 8 07:49:22 UTC 2013


On 07/08/2013 09:19 AM, David Holmes wrote:
> Hi Peter,
>
> On 8/07/2013 5:01 PM, Peter Levart wrote:
>> On 07/05/2013 01:22 AM, David Holmes wrote:
>>> Hi Peter,
>>>
>>> On 2/07/2013 5:19 PM, Peter Levart wrote:
>>>> Looking at original code once again, I think this was actually a bug.
>>>> The WeakReference instance constructed in (old) line 82, can be GCed
>>>> right away, since nobody is using the local variable after assignment.
>>>
>>> Of course. Doh! I was to busy thinking about the lifetime of the
>>> referent object to consider the reference itself.
>>>
>>>> If WeakReference is GCed it can not be enqueued. The promotion of 
>>>> local
>>>> variable into a field is one way to fix this. The other would be to 
>>>> use
>>>> the local variable somewhere down the code path, like for example in a
>>>> final throw statement:
>>>>
>>>>   110          throw new IllegalStateException("Reference Handler 
>>>> thread
>>>> stuck. weakRef.get(): " + weakRef.get());
>>>>
>>>>
>>>> This would also reveal some more info about the WeakReference when
>>>> there's no sure answer after 10 seconds and could be added to the test
>>>> anyway.
>>>
>>> Okay I've modified the test as suggested. Updated webrev at same
>>> location:
>>>
>>> http://cr.openjdk.java.net/~dholmes/8016341/webrev/
>>>
>>> In testing it though I simply exposed the remaining flaws in the
>>> ReferenceHandler code. We can still kill the ReferenceHandler thread
>>> with an OOME when it tries to load the Cleaner class (running with a
>>> 5M heap triggers this nicely if you use G1):
>>>
>>>                // Fast path for cleaners
>>>                 if (r instanceof Cleaner) {
>>>                     ((Cleaner)r).clean();
>>>                     continue;
>>>                 }
>>>
>>> and if that passes the clean() might throw OOME (it internally tries
>>> to do a System.exit if an exception occurs but will likely encounter
>>> another OOME trying to create the PrivilegedAction).
>>>
>>> Even the:
>>>
>>>                ReferenceQueue q = r.queue;
>>>                 if (q != ReferenceQueue.NULL) q.enqueue(r);
>>>
>>> might throw OOME because enqueue() might have to load the
>>> FinalReference class.
>>>
>>> So really catching the OOME around the wait() only patches a small
>>> hole. We can't simply put a try/catch in the for(;;) loop because that
>>> doesn't address the problem that if the class loading throws OOME then
>>> subsequent attempts to load that class will also fail. We would have
>>> to preload all possible classes. Even then we might just send the
>>> ReferenceHandler thread into a busy loop of throwing OOME catching it
>>> and retrying!
>>>
>>> So I can fix the test to deal with the Xcomp issue but we may still
>>> see intermittent failures, and the ReferenceHandler thread may still
>>> die from OOME.
>>
>> Hi David,
>>
>> I think the test is fine now.
>>
>> Regarding the resilience of ReferenceHandler thread, I think we should
>> try to preload FinalReference, Cleaner and InterruptedException in the
>> initialization phase of ReferenceHandler thread. There should be no
>> danger of OOME in related cases then.
>
> We would need more than that as we need to ensure there is zero 
> classloading needed at any point where OOME might be thrown. That 
> would include PrivilegedAction and any transitive closure therefrom.

Hi David,

I currently don't see any other possibility for another class to be 
loaded in the ReferenceHandler's run() method, apart from those that you 
indicated and what gets loaded in the the Cleaner.thunk's run() method.

>
>> As far as Cleaner's exception
>> handler is concerned, I think it is not universally wise to just exit
>> the VM when any exception is thrown by the Cleaner's thunk.run() method.
>> What if that exception is OOME? That does not mean there's something
>> wrong with Cleaner.thunk's code. Not only will this kill
>> ReferenceHandler thread, but entire VM.
>
> Yes "fatal errors" are not uncommon in the JDK, or at least not 
> completely rare. The general question is whether continuing might do 
> more harm than good given that some "critical" action has failed. Hard 
> to know in general so aborting is fairly common response (ala hotspot 
> running out of C heap memory).

Ok then. if we should stick with terminating the VM, what about the 
following "allocation-conservative" Cleaner.clean() method:


     private static final class TerminateAction implements 
PrivilegedAction<Void> {
         final Error error = new Error("Cleaner terminated abnormally");
         @Override
         public Void run() {
             try {
                 if (System.err != null) error.printStackTrace();
             } finally {
                 System.exit(1);
             }
             return null;
         }
     }

     // pre-allocate single instance of TerminateAction
     private static final TerminateAction terminateAction = new 
TerminateAction();

     /**
      * Runs this cleaner, if it has not been run before.
      */
     public void clean() {
         if (!remove(this))
             return;
         try {
             thunk.run();
         } catch (final Throwable x) {
             synchronized (terminateAction) {
                 if (terminateAction.error.getCause() == null)
                     terminateAction.error.initCause(x);
             }
             AccessController.doPrivileged(terminateAction);
         }
     }


Regards, Peter

>
> Thanks,
> David
> -----
>
>> If the purpose of exiting VM was attracting attention to the possible
>> bug in Cleaner.thunk's code, then this absolutely works, but wouldn't
>> simple message to System.err be enough? Like for example:
>>
>>
>>      public void clean() {
>>          if (!remove(this))
>>              return;
>>          try {
>>              thunk.run();
>>          } catch (final Throwable x) {
>>              try {
>>                  new Error("Cleaner caught exception", x)
>>                      .printStackTrace();
>>              } catch (OutOfMemoryError oome) {
>>                  // can't do much
>>              }
>>          }
>>      }
>>
>>
>> Regards, Peter
>>
>>>
>>> David
>>> -----
>>>
>>>
>>>>
>>>> Regards, Peter
>>>>
>>>> On 07/02/2013 06:38 AM, David Holmes wrote:
>>>>> This recently added test was found to fail under some conditions -
>>>>> namely client compiler with -Xcomp. It seems that the use of all 
>>>>> local
>>>>> variables enabled the compiler to optimize things in a way that
>>>>> stopped the weakref from being enqueued as expected. Simple fix 
>>>>> was to
>>>>> make the weakref a field.
>>>>>
>>>>> http://cr.openjdk.java.net/~dholmes/8016341/webrev/
>>>>>
>>>>> Thanks,
>>>>> David
>>>>
>>




More information about the core-libs-dev mailing list