RFR: 8016341 java/lang/ref/OOMEInReferenceHandler.java failing intermittently

David Holmes david.holmes at oracle.com
Mon Jul 8 08:37:16 UTC 2013


On 8/07/2013 5:49 PM, Peter Levart wrote:
> On 07/08/2013 09:19 AM, David Holmes wrote:
>>>> In testing it though I simply exposed the remaining flaws in the
>>>> ReferenceHandler code. We can still kill the ReferenceHandler thread
>>>> with an OOME when it tries to load the Cleaner class (running with a
>>>> 5M heap triggers this nicely if you use G1):
>>>>
>>>>                // Fast path for cleaners
>>>>                 if (r instanceof Cleaner) {
>>>>                     ((Cleaner)r).clean();
>>>>                     continue;
>>>>                 }
>>>>
>>>> and if that passes the clean() might throw OOME (it internally tries
>>>> to do a System.exit if an exception occurs but will likely encounter
>>>> another OOME trying to create the PrivilegedAction).
>>>>
>>>> Even the:
>>>>
>>>>                ReferenceQueue q = r.queue;
>>>>                 if (q != ReferenceQueue.NULL) q.enqueue(r);
>>>>
>>>> might throw OOME because enqueue() might have to load the
>>>> FinalReference class.
>>>>
>>>> So really catching the OOME around the wait() only patches a small
>>>> hole. We can't simply put a try/catch in the for(;;) loop because that
>>>> doesn't address the problem that if the class loading throws OOME then
>>>> subsequent attempts to load that class will also fail. We would have
>>>> to preload all possible classes. Even then we might just send the
>>>> ReferenceHandler thread into a busy loop of throwing OOME catching it
>>>> and retrying!
>>>>
>>>> So I can fix the test to deal with the Xcomp issue but we may still
>>>> see intermittent failures, and the ReferenceHandler thread may still
>>>> die from OOME.
>>>
>>> Hi David,
>>>
>>> I think the test is fine now.
>>>
>>> Regarding the resilience of ReferenceHandler thread, I think we should
>>> try to preload FinalReference, Cleaner and InterruptedException in the
>>> initialization phase of ReferenceHandler thread. There should be no
>>> danger of OOME in related cases then.
>>
>> We would need more than that as we need to ensure there is zero
>> classloading needed at any point where OOME might be thrown. That
>> would include PrivilegedAction and any transitive closure therefrom.
>
> Hi David,
>
> I currently don't see any other possibility for another class to be
> loaded in the ReferenceHandler's run() method, apart from those that you
> indicated and what gets loaded in the the Cleaner.thunk's run() method.

Under "normal" conditions all the classes should already be loaded once 
VM initialization is complete. It is only with very small heaps that we 
can get the problem during VM initialization itself ... in which case 
maybe we don't care? Or perhaps that too should be a fatal error? (If a 
Cleaner has been registered then Cleaner must already be loaded.) 
Anything loaded via the run() method will be handled the enclosing code 
in Cleaner.

>>
>>> As far as Cleaner's exception
>>> handler is concerned, I think it is not universally wise to just exit
>>> the VM when any exception is thrown by the Cleaner's thunk.run() method.
>>> What if that exception is OOME? That does not mean there's something
>>> wrong with Cleaner.thunk's code. Not only will this kill
>>> ReferenceHandler thread, but entire VM.
>>
>> Yes "fatal errors" are not uncommon in the JDK, or at least not
>> completely rare. The general question is whether continuing might do
>> more harm than good given that some "critical" action has failed. Hard
>> to know in general so aborting is fairly common response (ala hotspot
>> running out of C heap memory).
>
> Ok then. if we should stick with terminating the VM, what about the
> following "allocation-conservative" Cleaner.clean() method:

Adding a try/finally in the PrivilegedAction.run() would seem to suffice.

David
-----

>
>      private static final class TerminateAction implements
> PrivilegedAction<Void> {
>          final Error error = new Error("Cleaner terminated abnormally");
>          @Override
>          public Void run() {
>              try {
>                  if (System.err != null) error.printStackTrace();
>              } finally {
>                  System.exit(1);
>              }
>              return null;
>          }
>      }
>
>      // pre-allocate single instance of TerminateAction
>      private static final TerminateAction terminateAction = new
> TerminateAction();
>
>      /**
>       * Runs this cleaner, if it has not been run before.
>       */
>      public void clean() {
>          if (!remove(this))
>              return;
>          try {
>              thunk.run();
>          } catch (final Throwable x) {
>              synchronized (terminateAction) {
>                  if (terminateAction.error.getCause() == null)
>                      terminateAction.error.initCause(x);
>              }
>              AccessController.doPrivileged(terminateAction);
>          }
>      }
>
>
> Regards, Peter
>
>>
>> Thanks,
>> David
>> -----
>>
>>> If the purpose of exiting VM was attracting attention to the possible
>>> bug in Cleaner.thunk's code, then this absolutely works, but wouldn't
>>> simple message to System.err be enough? Like for example:
>>>
>>>
>>>      public void clean() {
>>>          if (!remove(this))
>>>              return;
>>>          try {
>>>              thunk.run();
>>>          } catch (final Throwable x) {
>>>              try {
>>>                  new Error("Cleaner caught exception", x)
>>>                      .printStackTrace();
>>>              } catch (OutOfMemoryError oome) {
>>>                  // can't do much
>>>              }
>>>          }
>>>      }
>>>
>>>
>>> Regards, Peter
>>>
>>>>
>>>> David
>>>> -----
>>>>
>>>>
>>>>>
>>>>> Regards, Peter
>>>>>
>>>>> On 07/02/2013 06:38 AM, David Holmes wrote:
>>>>>> This recently added test was found to fail under some conditions -
>>>>>> namely client compiler with -Xcomp. It seems that the use of all
>>>>>> local
>>>>>> variables enabled the compiler to optimize things in a way that
>>>>>> stopped the weakref from being enqueued as expected. Simple fix
>>>>>> was to
>>>>>> make the weakref a field.
>>>>>>
>>>>>> http://cr.openjdk.java.net/~dholmes/8016341/webrev/
>>>>>>
>>>>>> Thanks,
>>>>>> David
>>>>>
>>>
>



More information about the core-libs-dev mailing list