Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently

Peter Levart peter.levart at gmail.com
Tue Jan 21 14:00:26 UTC 2014


Hi, David, Kalyan,

Summing up the discussion, I propose the following patch for 
ReferenceHandler:

http://cr.openjdk.java.net/~plevart/jdk9-dev/OOMEInReferenceHandler/webrev.01/

all 10 java/lang/ref tests pass on my PC (including OOMEInReferenceHandler).

I kindly ask Kalyan to try to re-run the OOMEInReferenceHandler test 
with this code and report any failure.


Thanks, Peter

On 01/21/2014 08:57 AM, David Holmes wrote:
> On 21/01/2014 4:54 PM, Peter Levart wrote:
>>
>> On 01/21/2014 03:22 AM, David Holmes wrote:
>>> Hi Peter,
>>>
>>> I do not see Cleaner being loaded prior to the main class on either
>>> Windows or Linux. Which platform are you on? Did you see it loaded
>>> before the main class or as part of executing it?
>>
>> Before. The main class is empty:
>>
>> public class Test { public static void main(String... a) {} }
>>
>> Here's last few lines of -verbose:class:
>>
>> [Loaded java.util.TimeZone from
>> /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar]
>> [Loaded sun.util.calendar.ZoneInfo from
>> /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar]
>> [Loaded sun.util.calendar.ZoneInfoFile from
>> /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar]
>> [Loaded sun.util.calendar.ZoneInfoFile$1 from
>> /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar]
>> [Loaded java.io.DataInput from
>> /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar]
>> [Loaded java.io.DataInputStream from
>> /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar]
>> *[Loaded sun.misc.Cleaner from
>> /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar]*
>
> Curious. I wonder what the controlling factor is ??
>
>>
>> I'm on linux, 64bit and using official EA build 121 of JDK 8...
>>
>> But if I try with JDK 7u45, I don't see it. So perhaps it would be good
>> to trigger Cleaner loading and initialization as part of
>> ReferenceHandler initialization to play things safe.
>
>
> If we do that for Cleaner we may as well do it for 
> InterruptedException too.
>
>>> Also, it is not that I think ReferenceHandler is responsible for
>>> reporting OOME, but that it is responsible for reporting that it was
>>> unable to perform a clean or enqueue because of OOME.
>>
>> This would be necessary if we skipped a Reference because of OOME, but
>> if we just re-try until we eventually succeed, nothing is lost, nothing
>> to report (but a slow response)...
>
> Agreed - just trying to clarify things.
>
>>>
>>> Your suggested approach seems okay though I'm not sure why we
>>> shouldn't help things along by calling System.gc() ourselves rather
>>> than just yielding and hoping things will get cleaned up elsewhere.
>>> But for the present purposes your approach will suffice I think.
>>
>> Maybe my understanding is wrong but isn't the fact that OOME is rised a
>> consequence of that VM has already attempted to clear things up
>> (executing a GC round synchronously) but didn't succeed to make enough
>> free space to satisfy the allocation request? If this is only how some
>> collectors/allocators are implemented and not a general rule, then we
>> should put a System.gc() in place of Thread.yield(). Should we also
>> combine that with Thread.yield()? I'm concerned of a possibility that we
>> spin, consume too much CPU (ReferenceHandler thread has MAX priority) so
>> that other threads dont' get enough CPU time to proceed and clean things
>> up (we hope other threads will also get OOME and release things as their
>> stacks unwind...).
>
> You are probably right about the System.gc() - OOME should be thrown 
> after GC fails to create space, so it really needs some other thread 
> to drop live references to allow further space to be reclaimed.
>
> But note that Thread.yield() can behave badly on some linux systems 
> too, so spinning is still a possibility - but either way this would 
> only be "really bad" on a uniprocessor system where yield() is 
> unlikely to misbehave.
>
> David
> -----
>
>>
>> Regards, Peter
>>
>>>
>>> Thanks,
>>> David
>>>
>>> On 20/01/2014 6:42 PM, Peter Levart wrote:
>>>> On 01/20/2014 09:00 AM, Peter Levart wrote:
>>>>> On 01/20/2014 02:51 AM, David Holmes wrote:
>>>>>> Hi Peter,
>>>>>>
>>>>>> On 17/01/2014 11:24 PM, Peter Levart wrote:
>>>>>>> On 01/17/2014 02:13 PM, Peter Levart wrote:
>>>>>>>>>> // Fast path for cleaners
>>>>>>>>>> boolean isCleaner = false;
>>>>>>>>>> try {
>>>>>>>>>>   isCleaner = r instanceof Cleaner;
>>>>>>>>>> } catch (OutofMemoryError oome) {
>>>>>>>>>>   continue;
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> if (isCleaner) {
>>>>>>>>>>   ((Cleaner)r).clean();
>>>>>>>>>>   continue;
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hi David, Kalyan,
>>>>>>>>>
>>>>>>>>> I've caught-up now. Just thinking: is "instanceof Cleaner" 
>>>>>>>>> throwing
>>>>>>>>> OOME as a result of loading the Cleaner class? Wouldn't the above
>>>>>>>>> code then throw some error also in ((Cleaner)r) - the checkcast,
>>>>>>>>> since Cleaner class would not be successfully initialized?
>>>>>>>>
>>>>>>>> Well, no. The above code would just skip Cleaner processing in 
>>>>>>>> this
>>>>>>>> situation. And will never be doing it again after the heap is
>>>>>>>> freed...
>>>>>>>> So it might be good to load and initialize Cleaner class as 
>>>>>>>> part of
>>>>>>>> ReferenceHandler initialization to ensure correct operation...
>>>>>>>
>>>>>>> Well, yes and no. Let me try once more:
>>>>>>>
>>>>>>> Above code will skip Cleaner processing if the 1st time "instanceof
>>>>>>> Cleaner" is executed, OOME is thrown as a consequence of full heap
>>>>>>> while
>>>>>>> loading and initializing the Cleaner class.
>>>>>>
>>>>>> Yes - I was assuming that this would not fail the very first time 
>>>>>> and
>>>>>> so the Cleaner class would already be loaded. Failing to be able to
>>>>>> load the Cleaner class was one of the potential issues flagged
>>>>>> earlier with this problem. I was actually assuming that Cleaner 
>>>>>> would
>>>>>> be loaded already due to some actual Cleaner subclasses being used,
>>>>>> but this does not happen as part of the default initialization. :(
>>>>>> The irony being that if the Cleaner class is not loaded then r can
>>>>>> not be an instance of Cleaner and so we would fail to load the class
>>>>>> in a case where we didn't need it anyway.
>>>>>>
>>>>>> What I wanted to focus on here was an OOME from the instanceof
>>>>>> itself, but as you say that might trigger classloading of Cleaner
>>>>>> (which is not what I was interested in).
>>>>>>
>>>>>>> The 2nd time the "instanceof
>>>>>>> Cleaner" is executed after such OOME, the same line would throw
>>>>>>> NoClassDefFoundError as a consequence of referencing a class that
>>>>>>> failed
>>>>>>> initialization. Am I right?
>>>>>>
>>>>>> instanceof is not one of the class initialization triggers, so we
>>>>>> should not see an OOME generated due to a class initialization
>>>>>> exception and so the class will not be put into the Erroneous state
>>>>>> and so subsequent attempts to use the class will not automatically
>>>>>> trigger NoClassdefFoundError.
>>>>>>
>>>>>> If OOME occurs during actual loading/linking of the class Cleaner it
>>>>>> is unclear what would happen on subsequent attempts. OOME is not a
>>>>>> LinkageError that must be rethrown on subsequent attempts, and it is
>>>>>> potentially a transient condition, so I would expect a re-load
>>>>>> attempt to be allowed. However we are now deep into the details of
>>>>>> the VM and it may well depend on the exact place from which the OOME
>>>>>> originates.
>>>>>>
>>>>>> The bottom line with the current problem is that there are multiple
>>>>>> non-obvious paths by which the ReferenceHandler can encounter an
>>>>>> OOME. In such cases we do not want the ReferenceHandler to terminate
>>>>>> - which implies catching the OOME and continuing. However we also do
>>>>>> not want to silently skip Cleaner processing or reference queue
>>>>>> processing - as that would lead to hard to diagnoze bugs. But trying
>>>>>> to report the problem may not be possible due to being 
>>>>>> out-of-memory.
>>>>>> It may be that we need to break things up into multiple try/catch
>>>>>> blocks, where each catch does a System.gc() and then reports that 
>>>>>> the
>>>>>> OOME occurred. Of course the reporting must still be in a try/catch
>>>>>> for the OOME. Though at some point letting the ReferenceHandler die
>>>>>> may be the only way to "report" a major memory problem.
>>>>>>
>>>>>> David
>>>>>
>>>>> Hm... If I give -verbose:class option to run a simple test program:
>>>>>
>>>>> public class Test { public static void main(String... a) {} }
>>>>>
>>>>> I see Cleaner class being loaded before Test class. I don't see by
>>>>> which tread or if it might get loaded after main() starts, but I
>>>>> suspect that loading of Cleaner is not a problem here. Initialization
>>>>> of Cleaner class is not performed by ReferenceHandler thread as you
>>>>> pointed out. The instanceof does not trigger it and if it returns 
>>>>> true
>>>>> then Cleaner has already been initialized. So there must be some 
>>>>> other
>>>>> cause for instanceof throwing OOME...
>>>>>
>>>>> What do you say about this variant of ReferenceHandler.run() method:
>>>>>
>>>>>         public void run() {
>>>>>             for (;;) {
>>>>>                 Reference r;
>>>>>                 Cleaner c;
>>>>>                 synchronized (lock) {
>>>>>                     r = pending;
>>>>>                     if (r != null) {
>>>>>                         // instanceof operator might throw OOME
>>>>> sometimes. Just retry after
>>>>>                         // yielding - might have better luck next
>>>>> time...
>>>>>                         try {
>>>>>                             c = r instanceof Cleaner ? (Cleaner) r :
>>>>> null;
>>>>>                         } catch (OutOfMemoryError x) {
>>>>>                             Thread.yield();
>>>>>                             continue;
>>>>>                         }
>>>>>                         pending = r.discovered;
>>>>>                         r.discovered = null;
>>>>>                     } else {
>>>>>                         // The waiting on the lock may cause an OOME
>>>>> because it may try to allocate
>>>>>                         // exception objects, so also catch OOME here
>>>>> to avoid silent exit of the
>>>>>                         // reference handler thread.
>>>>>                         //
>>>>>                         // Explicitly define the order of the two
>>>>> exceptions we catch here
>>>>>                         // when waiting for the lock.
>>>>>                         //
>>>>>                         // We do not want to try to potentially load
>>>>> the InterruptedException class
>>>>>                         // (which would be done if this was its first
>>>>> use, and InterruptedException
>>>>>                         // were checked first) in this situation.
>>>>>                         //
>>>>>                         // This may lead to the VM not ever trying to
>>>>> load the InterruptedException
>>>>>                         // class again.
>>>>>                         try {
>>>>>                             try {
>>>>>                                 lock.wait();
>>>>>                             } catch (OutOfMemoryError x) { }
>>>>>                         } catch (InterruptedException x) { }
>>>>>                         continue;
>>>>>                     }
>>>>>                 }
>>>>>
>>>>>                 // Fast path for cleaners
>>>>>                 if (c != null) {
>>>>>                     c.clean();
>>>>>                     continue;
>>>>>                 }
>>>>>
>>>>>                 ReferenceQueue q = r.queue;
>>>>>                 if (q != ReferenceQueue.NULL) q.enqueue(r);
>>>>>             }
>>>>>         }
>>>>>
>>>>>
>>>>>
>>>>> ... it tries to not consume and skip Cleaner instances when OOME is
>>>>> caught.
>>>>>
>>>>> I don't think ReferenceHandler is to make responsible for reporting
>>>>> OOMEs. Full heap is a global condition and ReferenceHandler is the
>>>>> last to accuse for it.
>>>>>
>>>>> Regards, Peter
>>>>
>>>> Hi David,
>>>>
>>>> I think the following variation is even better. It executes
>>>> Thread.yield() after catching OOME but outside synchronized block so
>>>> that given CPU slice can be used by GC threads to make progress
>>>> enqueueing pending References (they are not able to enqueue them while
>>>> ReferenceHandler is holding the lock):
>>>>
>>>>
>>>>          public void run() {
>>>>              for (;;) {
>>>>                  Reference r;
>>>>                  Cleaner c;
>>>>                  try {
>>>>                      try {
>>>>                          synchronized (lock) {
>>>>                              r = pending;
>>>>                              if (r != null) {
>>>>                                  // 'instanceof' might throw OOME
>>>> sometimes so do this before
>>>>                                  // unlinking 'r' from the 'pending'
>>>> chain...
>>>>                                  c = r instanceof Cleaner ? 
>>>> (Cleaner) r
>>>> : null;
>>>>                                  // unlink 'r' from 'pending' chain
>>>>                                  pending = r.discovered;
>>>>                                  r.discovered = null;
>>>>                              } else {
>>>>                                  // The waiting on the lock may 
>>>> cause an
>>>> OOME because it may try to allocate
>>>>                                  // exception objects.
>>>>                                  lock.wait();
>>>>                                  continue;
>>>>                              }
>>>>                          }
>>>>                      } catch (OutOfMemoryError x) {
>>>>                          // Catch OOME from 'r instanceof Cleaner' or
>>>> 'lock.wait()' 1st so that we don't
>>>>                          // try to potentially load the
>>>> InterruptedException class
>>>>                          // (which would be done if this was its first
>>>> use, and InterruptedException
>>>>                          // were checked first) in this situation.
>>>>                          // Give other threads CPU time so they
>>>> hopefully release some objects and GC
>>>>                          // clears some heap.
>>>>                          // Also prevent CPU intensive spinning in 
>>>> case
>>>> 'r instanceof Cleaner' above
>>>>                          // persistently throws OOME for some time...
>>>>                          Thread.yield();
>>>>                          // retry
>>>>                          continue;
>>>>                      }
>>>>                  } catch (InterruptedException x) {
>>>>                      // Catch InterruptedException from 'lock.wait()'
>>>> and retry
>>>>                      continue;
>>>>                  }
>>>>
>>>>                  // Fast path for cleaners
>>>>                  if (c != null) {
>>>>                      c.clean();
>>>>                      continue;
>>>>                  }
>>>>
>>>>                  ReferenceQueue q = r.queue;
>>>>                  if (q != ReferenceQueue.NULL) q.enqueue(r);
>>>>              }
>>>>          }
>>>>
>>>>
>>>> Regards, Peter
>>>>
>>




More information about the core-libs-dev mailing list