Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently

David Holmes david.holmes at oracle.com
Wed Jan 22 02:19:05 UTC 2014


Hi Peter,

On 22/01/2014 12:00 AM, Peter Levart wrote:
> Hi, David, Kalyan,
>
> Summing up the discussion, I propose the following patch for
> ReferenceHandler:
>
> http://cr.openjdk.java.net/~plevart/jdk9-dev/OOMEInReferenceHandler/webrev.01/

I can live with it, though it maybe that once Cleaner has been preloaded 
instanceof can no longer throw OOME. Can't be 100% sure. And there's 
some duplication/verbosity in the commentary that could be trimmed down :)

Any specific reason to use Unsafe to do the preload rather than 
Class.forName ? Does this force Unsafe to be loaded earlier than it 
otherwise would?

Thanks,
David

>
> all 10 java/lang/ref tests pass on my PC (including
> OOMEInReferenceHandler).
>
> I kindly ask Kalyan to try to re-run the OOMEInReferenceHandler test
> with this code and report any failure.
>
>
> Thanks, Peter
>
> On 01/21/2014 08:57 AM, David Holmes wrote:
>> On 21/01/2014 4:54 PM, Peter Levart wrote:
>>>
>>> On 01/21/2014 03:22 AM, David Holmes wrote:
>>>> Hi Peter,
>>>>
>>>> I do not see Cleaner being loaded prior to the main class on either
>>>> Windows or Linux. Which platform are you on? Did you see it loaded
>>>> before the main class or as part of executing it?
>>>
>>> Before. The main class is empty:
>>>
>>> public class Test { public static void main(String... a) {} }
>>>
>>> Here's last few lines of -verbose:class:
>>>
>>> [Loaded java.util.TimeZone from
>>> /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar]
>>> [Loaded sun.util.calendar.ZoneInfo from
>>> /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar]
>>> [Loaded sun.util.calendar.ZoneInfoFile from
>>> /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar]
>>> [Loaded sun.util.calendar.ZoneInfoFile$1 from
>>> /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar]
>>> [Loaded java.io.DataInput from
>>> /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar]
>>> [Loaded java.io.DataInputStream from
>>> /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar]
>>> *[Loaded sun.misc.Cleaner from
>>> /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar]*
>>
>> Curious. I wonder what the controlling factor is ??
>>
>>>
>>> I'm on linux, 64bit and using official EA build 121 of JDK 8...
>>>
>>> But if I try with JDK 7u45, I don't see it. So perhaps it would be good
>>> to trigger Cleaner loading and initialization as part of
>>> ReferenceHandler initialization to play things safe.
>>
>>
>> If we do that for Cleaner we may as well do it for
>> InterruptedException too.
>>
>>>> Also, it is not that I think ReferenceHandler is responsible for
>>>> reporting OOME, but that it is responsible for reporting that it was
>>>> unable to perform a clean or enqueue because of OOME.
>>>
>>> This would be necessary if we skipped a Reference because of OOME, but
>>> if we just re-try until we eventually succeed, nothing is lost, nothing
>>> to report (but a slow response)...
>>
>> Agreed - just trying to clarify things.
>>
>>>>
>>>> Your suggested approach seems okay though I'm not sure why we
>>>> shouldn't help things along by calling System.gc() ourselves rather
>>>> than just yielding and hoping things will get cleaned up elsewhere.
>>>> But for the present purposes your approach will suffice I think.
>>>
>>> Maybe my understanding is wrong but isn't the fact that OOME is rised a
>>> consequence of that VM has already attempted to clear things up
>>> (executing a GC round synchronously) but didn't succeed to make enough
>>> free space to satisfy the allocation request? If this is only how some
>>> collectors/allocators are implemented and not a general rule, then we
>>> should put a System.gc() in place of Thread.yield(). Should we also
>>> combine that with Thread.yield()? I'm concerned of a possibility that we
>>> spin, consume too much CPU (ReferenceHandler thread has MAX priority) so
>>> that other threads dont' get enough CPU time to proceed and clean things
>>> up (we hope other threads will also get OOME and release things as their
>>> stacks unwind...).
>>
>> You are probably right about the System.gc() - OOME should be thrown
>> after GC fails to create space, so it really needs some other thread
>> to drop live references to allow further space to be reclaimed.
>>
>> But note that Thread.yield() can behave badly on some linux systems
>> too, so spinning is still a possibility - but either way this would
>> only be "really bad" on a uniprocessor system where yield() is
>> unlikely to misbehave.
>>
>> David
>> -----
>>
>>>
>>> Regards, Peter
>>>
>>>>
>>>> Thanks,
>>>> David
>>>>
>>>> On 20/01/2014 6:42 PM, Peter Levart wrote:
>>>>> On 01/20/2014 09:00 AM, Peter Levart wrote:
>>>>>> On 01/20/2014 02:51 AM, David Holmes wrote:
>>>>>>> Hi Peter,
>>>>>>>
>>>>>>> On 17/01/2014 11:24 PM, Peter Levart wrote:
>>>>>>>> On 01/17/2014 02:13 PM, Peter Levart wrote:
>>>>>>>>>>> // Fast path for cleaners
>>>>>>>>>>> boolean isCleaner = false;
>>>>>>>>>>> try {
>>>>>>>>>>>   isCleaner = r instanceof Cleaner;
>>>>>>>>>>> } catch (OutofMemoryError oome) {
>>>>>>>>>>>   continue;
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> if (isCleaner) {
>>>>>>>>>>>   ((Cleaner)r).clean();
>>>>>>>>>>>   continue;
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Hi David, Kalyan,
>>>>>>>>>>
>>>>>>>>>> I've caught-up now. Just thinking: is "instanceof Cleaner"
>>>>>>>>>> throwing
>>>>>>>>>> OOME as a result of loading the Cleaner class? Wouldn't the above
>>>>>>>>>> code then throw some error also in ((Cleaner)r) - the checkcast,
>>>>>>>>>> since Cleaner class would not be successfully initialized?
>>>>>>>>>
>>>>>>>>> Well, no. The above code would just skip Cleaner processing in
>>>>>>>>> this
>>>>>>>>> situation. And will never be doing it again after the heap is
>>>>>>>>> freed...
>>>>>>>>> So it might be good to load and initialize Cleaner class as
>>>>>>>>> part of
>>>>>>>>> ReferenceHandler initialization to ensure correct operation...
>>>>>>>>
>>>>>>>> Well, yes and no. Let me try once more:
>>>>>>>>
>>>>>>>> Above code will skip Cleaner processing if the 1st time "instanceof
>>>>>>>> Cleaner" is executed, OOME is thrown as a consequence of full heap
>>>>>>>> while
>>>>>>>> loading and initializing the Cleaner class.
>>>>>>>
>>>>>>> Yes - I was assuming that this would not fail the very first time
>>>>>>> and
>>>>>>> so the Cleaner class would already be loaded. Failing to be able to
>>>>>>> load the Cleaner class was one of the potential issues flagged
>>>>>>> earlier with this problem. I was actually assuming that Cleaner
>>>>>>> would
>>>>>>> be loaded already due to some actual Cleaner subclasses being used,
>>>>>>> but this does not happen as part of the default initialization. :(
>>>>>>> The irony being that if the Cleaner class is not loaded then r can
>>>>>>> not be an instance of Cleaner and so we would fail to load the class
>>>>>>> in a case where we didn't need it anyway.
>>>>>>>
>>>>>>> What I wanted to focus on here was an OOME from the instanceof
>>>>>>> itself, but as you say that might trigger classloading of Cleaner
>>>>>>> (which is not what I was interested in).
>>>>>>>
>>>>>>>> The 2nd time the "instanceof
>>>>>>>> Cleaner" is executed after such OOME, the same line would throw
>>>>>>>> NoClassDefFoundError as a consequence of referencing a class that
>>>>>>>> failed
>>>>>>>> initialization. Am I right?
>>>>>>>
>>>>>>> instanceof is not one of the class initialization triggers, so we
>>>>>>> should not see an OOME generated due to a class initialization
>>>>>>> exception and so the class will not be put into the Erroneous state
>>>>>>> and so subsequent attempts to use the class will not automatically
>>>>>>> trigger NoClassdefFoundError.
>>>>>>>
>>>>>>> If OOME occurs during actual loading/linking of the class Cleaner it
>>>>>>> is unclear what would happen on subsequent attempts. OOME is not a
>>>>>>> LinkageError that must be rethrown on subsequent attempts, and it is
>>>>>>> potentially a transient condition, so I would expect a re-load
>>>>>>> attempt to be allowed. However we are now deep into the details of
>>>>>>> the VM and it may well depend on the exact place from which the OOME
>>>>>>> originates.
>>>>>>>
>>>>>>> The bottom line with the current problem is that there are multiple
>>>>>>> non-obvious paths by which the ReferenceHandler can encounter an
>>>>>>> OOME. In such cases we do not want the ReferenceHandler to terminate
>>>>>>> - which implies catching the OOME and continuing. However we also do
>>>>>>> not want to silently skip Cleaner processing or reference queue
>>>>>>> processing - as that would lead to hard to diagnoze bugs. But trying
>>>>>>> to report the problem may not be possible due to being
>>>>>>> out-of-memory.
>>>>>>> It may be that we need to break things up into multiple try/catch
>>>>>>> blocks, where each catch does a System.gc() and then reports that
>>>>>>> the
>>>>>>> OOME occurred. Of course the reporting must still be in a try/catch
>>>>>>> for the OOME. Though at some point letting the ReferenceHandler die
>>>>>>> may be the only way to "report" a major memory problem.
>>>>>>>
>>>>>>> David
>>>>>>
>>>>>> Hm... If I give -verbose:class option to run a simple test program:
>>>>>>
>>>>>> public class Test { public static void main(String... a) {} }
>>>>>>
>>>>>> I see Cleaner class being loaded before Test class. I don't see by
>>>>>> which tread or if it might get loaded after main() starts, but I
>>>>>> suspect that loading of Cleaner is not a problem here. Initialization
>>>>>> of Cleaner class is not performed by ReferenceHandler thread as you
>>>>>> pointed out. The instanceof does not trigger it and if it returns
>>>>>> true
>>>>>> then Cleaner has already been initialized. So there must be some
>>>>>> other
>>>>>> cause for instanceof throwing OOME...
>>>>>>
>>>>>> What do you say about this variant of ReferenceHandler.run() method:
>>>>>>
>>>>>>         public void run() {
>>>>>>             for (;;) {
>>>>>>                 Reference r;
>>>>>>                 Cleaner c;
>>>>>>                 synchronized (lock) {
>>>>>>                     r = pending;
>>>>>>                     if (r != null) {
>>>>>>                         // instanceof operator might throw OOME
>>>>>> sometimes. Just retry after
>>>>>>                         // yielding - might have better luck next
>>>>>> time...
>>>>>>                         try {
>>>>>>                             c = r instanceof Cleaner ? (Cleaner) r :
>>>>>> null;
>>>>>>                         } catch (OutOfMemoryError x) {
>>>>>>                             Thread.yield();
>>>>>>                             continue;
>>>>>>                         }
>>>>>>                         pending = r.discovered;
>>>>>>                         r.discovered = null;
>>>>>>                     } else {
>>>>>>                         // The waiting on the lock may cause an OOME
>>>>>> because it may try to allocate
>>>>>>                         // exception objects, so also catch OOME here
>>>>>> to avoid silent exit of the
>>>>>>                         // reference handler thread.
>>>>>>                         //
>>>>>>                         // Explicitly define the order of the two
>>>>>> exceptions we catch here
>>>>>>                         // when waiting for the lock.
>>>>>>                         //
>>>>>>                         // We do not want to try to potentially load
>>>>>> the InterruptedException class
>>>>>>                         // (which would be done if this was its first
>>>>>> use, and InterruptedException
>>>>>>                         // were checked first) in this situation.
>>>>>>                         //
>>>>>>                         // This may lead to the VM not ever trying to
>>>>>> load the InterruptedException
>>>>>>                         // class again.
>>>>>>                         try {
>>>>>>                             try {
>>>>>>                                 lock.wait();
>>>>>>                             } catch (OutOfMemoryError x) { }
>>>>>>                         } catch (InterruptedException x) { }
>>>>>>                         continue;
>>>>>>                     }
>>>>>>                 }
>>>>>>
>>>>>>                 // Fast path for cleaners
>>>>>>                 if (c != null) {
>>>>>>                     c.clean();
>>>>>>                     continue;
>>>>>>                 }
>>>>>>
>>>>>>                 ReferenceQueue q = r.queue;
>>>>>>                 if (q != ReferenceQueue.NULL) q.enqueue(r);
>>>>>>             }
>>>>>>         }
>>>>>>
>>>>>>
>>>>>>
>>>>>> ... it tries to not consume and skip Cleaner instances when OOME is
>>>>>> caught.
>>>>>>
>>>>>> I don't think ReferenceHandler is to make responsible for reporting
>>>>>> OOMEs. Full heap is a global condition and ReferenceHandler is the
>>>>>> last to accuse for it.
>>>>>>
>>>>>> Regards, Peter
>>>>>
>>>>> Hi David,
>>>>>
>>>>> I think the following variation is even better. It executes
>>>>> Thread.yield() after catching OOME but outside synchronized block so
>>>>> that given CPU slice can be used by GC threads to make progress
>>>>> enqueueing pending References (they are not able to enqueue them while
>>>>> ReferenceHandler is holding the lock):
>>>>>
>>>>>
>>>>>          public void run() {
>>>>>              for (;;) {
>>>>>                  Reference r;
>>>>>                  Cleaner c;
>>>>>                  try {
>>>>>                      try {
>>>>>                          synchronized (lock) {
>>>>>                              r = pending;
>>>>>                              if (r != null) {
>>>>>                                  // 'instanceof' might throw OOME
>>>>> sometimes so do this before
>>>>>                                  // unlinking 'r' from the 'pending'
>>>>> chain...
>>>>>                                  c = r instanceof Cleaner ?
>>>>> (Cleaner) r
>>>>> : null;
>>>>>                                  // unlink 'r' from 'pending' chain
>>>>>                                  pending = r.discovered;
>>>>>                                  r.discovered = null;
>>>>>                              } else {
>>>>>                                  // The waiting on the lock may
>>>>> cause an
>>>>> OOME because it may try to allocate
>>>>>                                  // exception objects.
>>>>>                                  lock.wait();
>>>>>                                  continue;
>>>>>                              }
>>>>>                          }
>>>>>                      } catch (OutOfMemoryError x) {
>>>>>                          // Catch OOME from 'r instanceof Cleaner' or
>>>>> 'lock.wait()' 1st so that we don't
>>>>>                          // try to potentially load the
>>>>> InterruptedException class
>>>>>                          // (which would be done if this was its first
>>>>> use, and InterruptedException
>>>>>                          // were checked first) in this situation.
>>>>>                          // Give other threads CPU time so they
>>>>> hopefully release some objects and GC
>>>>>                          // clears some heap.
>>>>>                          // Also prevent CPU intensive spinning in
>>>>> case
>>>>> 'r instanceof Cleaner' above
>>>>>                          // persistently throws OOME for some time...
>>>>>                          Thread.yield();
>>>>>                          // retry
>>>>>                          continue;
>>>>>                      }
>>>>>                  } catch (InterruptedException x) {
>>>>>                      // Catch InterruptedException from 'lock.wait()'
>>>>> and retry
>>>>>                      continue;
>>>>>                  }
>>>>>
>>>>>                  // Fast path for cleaners
>>>>>                  if (c != null) {
>>>>>                      c.clean();
>>>>>                      continue;
>>>>>                  }
>>>>>
>>>>>                  ReferenceQueue q = r.queue;
>>>>>                  if (q != ReferenceQueue.NULL) q.enqueue(r);
>>>>>              }
>>>>>          }
>>>>>
>>>>>
>>>>> Regards, Peter
>>>>>
>>>
>



More information about the core-libs-dev mailing list