Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently

Peter Levart peter.levart at gmail.com
Sat Jan 25 00:44:41 UTC 2014


On 01/22/2014 03:19 AM, David Holmes wrote:
> Hi Peter,
>
> On 22/01/2014 12:00 AM, Peter Levart wrote:
>> Hi, David, Kalyan,
>>
>> Summing up the discussion, I propose the following patch for
>> ReferenceHandler:
>>
>> http://cr.openjdk.java.net/~plevart/jdk9-dev/OOMEInReferenceHandler/webrev.01/ 
>>
>
> I can live with it, though it maybe that once Cleaner has been 
> preloaded instanceof can no longer throw OOME. Can't be 100% sure. And 
> there's some duplication/verbosity in the commentary that could be 
> trimmed down :)
>
> Any specific reason to use Unsafe to do the preload rather than 
> Class.forName ? Does this force Unsafe to be loaded earlier than it 
> otherwise would?

Good question. In systemDictionary.hpp they are both on the preloaded 
list in this order:

   do_klass(Reference_klass, java_lang_ref_Reference,                   
Pre                 ) \
...
   do_klass(misc_Unsafe_klass, 
sun_misc_Unsafe,                           Pre                 ) \


So when Reference is initialized, the Unsafe is already loaded. But I 
don't know if it is already initialized. This should be studied.

I'll try to find out what is the case and get back to you.

Regards, Peter


>
> Thanks,
> David
>
>>
>> all 10 java/lang/ref tests pass on my PC (including
>> OOMEInReferenceHandler).
>>
>> I kindly ask Kalyan to try to re-run the OOMEInReferenceHandler test
>> with this code and report any failure.
>>
>>
>> Thanks, Peter
>>
>> On 01/21/2014 08:57 AM, David Holmes wrote:
>>> On 21/01/2014 4:54 PM, Peter Levart wrote:
>>>>
>>>> On 01/21/2014 03:22 AM, David Holmes wrote:
>>>>> Hi Peter,
>>>>>
>>>>> I do not see Cleaner being loaded prior to the main class on either
>>>>> Windows or Linux. Which platform are you on? Did you see it loaded
>>>>> before the main class or as part of executing it?
>>>>
>>>> Before. The main class is empty:
>>>>
>>>> public class Test { public static void main(String... a) {} }
>>>>
>>>> Here's last few lines of -verbose:class:
>>>>
>>>> [Loaded java.util.TimeZone from
>>>> /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar]
>>>> [Loaded sun.util.calendar.ZoneInfo from
>>>> /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar]
>>>> [Loaded sun.util.calendar.ZoneInfoFile from
>>>> /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar]
>>>> [Loaded sun.util.calendar.ZoneInfoFile$1 from
>>>> /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar]
>>>> [Loaded java.io.DataInput from
>>>> /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar]
>>>> [Loaded java.io.DataInputStream from
>>>> /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar]
>>>> *[Loaded sun.misc.Cleaner from
>>>> /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar]*
>>>
>>> Curious. I wonder what the controlling factor is ??
>>>
>>>>
>>>> I'm on linux, 64bit and using official EA build 121 of JDK 8...
>>>>
>>>> But if I try with JDK 7u45, I don't see it. So perhaps it would be 
>>>> good
>>>> to trigger Cleaner loading and initialization as part of
>>>> ReferenceHandler initialization to play things safe.
>>>
>>>
>>> If we do that for Cleaner we may as well do it for
>>> InterruptedException too.
>>>
>>>>> Also, it is not that I think ReferenceHandler is responsible for
>>>>> reporting OOME, but that it is responsible for reporting that it was
>>>>> unable to perform a clean or enqueue because of OOME.
>>>>
>>>> This would be necessary if we skipped a Reference because of OOME, but
>>>> if we just re-try until we eventually succeed, nothing is lost, 
>>>> nothing
>>>> to report (but a slow response)...
>>>
>>> Agreed - just trying to clarify things.
>>>
>>>>>
>>>>> Your suggested approach seems okay though I'm not sure why we
>>>>> shouldn't help things along by calling System.gc() ourselves rather
>>>>> than just yielding and hoping things will get cleaned up elsewhere.
>>>>> But for the present purposes your approach will suffice I think.
>>>>
>>>> Maybe my understanding is wrong but isn't the fact that OOME is 
>>>> rised a
>>>> consequence of that VM has already attempted to clear things up
>>>> (executing a GC round synchronously) but didn't succeed to make enough
>>>> free space to satisfy the allocation request? If this is only how some
>>>> collectors/allocators are implemented and not a general rule, then we
>>>> should put a System.gc() in place of Thread.yield(). Should we also
>>>> combine that with Thread.yield()? I'm concerned of a possibility 
>>>> that we
>>>> spin, consume too much CPU (ReferenceHandler thread has MAX 
>>>> priority) so
>>>> that other threads dont' get enough CPU time to proceed and clean 
>>>> things
>>>> up (we hope other threads will also get OOME and release things as 
>>>> their
>>>> stacks unwind...).
>>>
>>> You are probably right about the System.gc() - OOME should be thrown
>>> after GC fails to create space, so it really needs some other thread
>>> to drop live references to allow further space to be reclaimed.
>>>
>>> But note that Thread.yield() can behave badly on some linux systems
>>> too, so spinning is still a possibility - but either way this would
>>> only be "really bad" on a uniprocessor system where yield() is
>>> unlikely to misbehave.
>>>
>>> David
>>> -----
>>>
>>>>
>>>> Regards, Peter
>>>>
>>>>>
>>>>> Thanks,
>>>>> David
>>>>>
>>>>> On 20/01/2014 6:42 PM, Peter Levart wrote:
>>>>>> On 01/20/2014 09:00 AM, Peter Levart wrote:
>>>>>>> On 01/20/2014 02:51 AM, David Holmes wrote:
>>>>>>>> Hi Peter,
>>>>>>>>
>>>>>>>> On 17/01/2014 11:24 PM, Peter Levart wrote:
>>>>>>>>> On 01/17/2014 02:13 PM, Peter Levart wrote:
>>>>>>>>>>>> // Fast path for cleaners
>>>>>>>>>>>> boolean isCleaner = false;
>>>>>>>>>>>> try {
>>>>>>>>>>>>   isCleaner = r instanceof Cleaner;
>>>>>>>>>>>> } catch (OutofMemoryError oome) {
>>>>>>>>>>>>   continue;
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>> if (isCleaner) {
>>>>>>>>>>>>   ((Cleaner)r).clean();
>>>>>>>>>>>>   continue;
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Hi David, Kalyan,
>>>>>>>>>>>
>>>>>>>>>>> I've caught-up now. Just thinking: is "instanceof Cleaner"
>>>>>>>>>>> throwing
>>>>>>>>>>> OOME as a result of loading the Cleaner class? Wouldn't the 
>>>>>>>>>>> above
>>>>>>>>>>> code then throw some error also in ((Cleaner)r) - the 
>>>>>>>>>>> checkcast,
>>>>>>>>>>> since Cleaner class would not be successfully initialized?
>>>>>>>>>>
>>>>>>>>>> Well, no. The above code would just skip Cleaner processing in
>>>>>>>>>> this
>>>>>>>>>> situation. And will never be doing it again after the heap is
>>>>>>>>>> freed...
>>>>>>>>>> So it might be good to load and initialize Cleaner class as
>>>>>>>>>> part of
>>>>>>>>>> ReferenceHandler initialization to ensure correct operation...
>>>>>>>>>
>>>>>>>>> Well, yes and no. Let me try once more:
>>>>>>>>>
>>>>>>>>> Above code will skip Cleaner processing if the 1st time 
>>>>>>>>> "instanceof
>>>>>>>>> Cleaner" is executed, OOME is thrown as a consequence of full 
>>>>>>>>> heap
>>>>>>>>> while
>>>>>>>>> loading and initializing the Cleaner class.
>>>>>>>>
>>>>>>>> Yes - I was assuming that this would not fail the very first time
>>>>>>>> and
>>>>>>>> so the Cleaner class would already be loaded. Failing to be 
>>>>>>>> able to
>>>>>>>> load the Cleaner class was one of the potential issues flagged
>>>>>>>> earlier with this problem. I was actually assuming that Cleaner
>>>>>>>> would
>>>>>>>> be loaded already due to some actual Cleaner subclasses being 
>>>>>>>> used,
>>>>>>>> but this does not happen as part of the default initialization. :(
>>>>>>>> The irony being that if the Cleaner class is not loaded then r can
>>>>>>>> not be an instance of Cleaner and so we would fail to load the 
>>>>>>>> class
>>>>>>>> in a case where we didn't need it anyway.
>>>>>>>>
>>>>>>>> What I wanted to focus on here was an OOME from the instanceof
>>>>>>>> itself, but as you say that might trigger classloading of Cleaner
>>>>>>>> (which is not what I was interested in).
>>>>>>>>
>>>>>>>>> The 2nd time the "instanceof
>>>>>>>>> Cleaner" is executed after such OOME, the same line would throw
>>>>>>>>> NoClassDefFoundError as a consequence of referencing a class that
>>>>>>>>> failed
>>>>>>>>> initialization. Am I right?
>>>>>>>>
>>>>>>>> instanceof is not one of the class initialization triggers, so we
>>>>>>>> should not see an OOME generated due to a class initialization
>>>>>>>> exception and so the class will not be put into the Erroneous 
>>>>>>>> state
>>>>>>>> and so subsequent attempts to use the class will not automatically
>>>>>>>> trigger NoClassdefFoundError.
>>>>>>>>
>>>>>>>> If OOME occurs during actual loading/linking of the class 
>>>>>>>> Cleaner it
>>>>>>>> is unclear what would happen on subsequent attempts. OOME is not a
>>>>>>>> LinkageError that must be rethrown on subsequent attempts, and 
>>>>>>>> it is
>>>>>>>> potentially a transient condition, so I would expect a re-load
>>>>>>>> attempt to be allowed. However we are now deep into the details of
>>>>>>>> the VM and it may well depend on the exact place from which the 
>>>>>>>> OOME
>>>>>>>> originates.
>>>>>>>>
>>>>>>>> The bottom line with the current problem is that there are 
>>>>>>>> multiple
>>>>>>>> non-obvious paths by which the ReferenceHandler can encounter an
>>>>>>>> OOME. In such cases we do not want the ReferenceHandler to 
>>>>>>>> terminate
>>>>>>>> - which implies catching the OOME and continuing. However we 
>>>>>>>> also do
>>>>>>>> not want to silently skip Cleaner processing or reference queue
>>>>>>>> processing - as that would lead to hard to diagnoze bugs. But 
>>>>>>>> trying
>>>>>>>> to report the problem may not be possible due to being
>>>>>>>> out-of-memory.
>>>>>>>> It may be that we need to break things up into multiple try/catch
>>>>>>>> blocks, where each catch does a System.gc() and then reports that
>>>>>>>> the
>>>>>>>> OOME occurred. Of course the reporting must still be in a 
>>>>>>>> try/catch
>>>>>>>> for the OOME. Though at some point letting the ReferenceHandler 
>>>>>>>> die
>>>>>>>> may be the only way to "report" a major memory problem.
>>>>>>>>
>>>>>>>> David
>>>>>>>
>>>>>>> Hm... If I give -verbose:class option to run a simple test program:
>>>>>>>
>>>>>>> public class Test { public static void main(String... a) {} }
>>>>>>>
>>>>>>> I see Cleaner class being loaded before Test class. I don't see by
>>>>>>> which tread or if it might get loaded after main() starts, but I
>>>>>>> suspect that loading of Cleaner is not a problem here. 
>>>>>>> Initialization
>>>>>>> of Cleaner class is not performed by ReferenceHandler thread as you
>>>>>>> pointed out. The instanceof does not trigger it and if it returns
>>>>>>> true
>>>>>>> then Cleaner has already been initialized. So there must be some
>>>>>>> other
>>>>>>> cause for instanceof throwing OOME...
>>>>>>>
>>>>>>> What do you say about this variant of ReferenceHandler.run() 
>>>>>>> method:
>>>>>>>
>>>>>>>         public void run() {
>>>>>>>             for (;;) {
>>>>>>>                 Reference r;
>>>>>>>                 Cleaner c;
>>>>>>>                 synchronized (lock) {
>>>>>>>                     r = pending;
>>>>>>>                     if (r != null) {
>>>>>>>                         // instanceof operator might throw OOME
>>>>>>> sometimes. Just retry after
>>>>>>>                         // yielding - might have better luck next
>>>>>>> time...
>>>>>>>                         try {
>>>>>>>                             c = r instanceof Cleaner ? (Cleaner) 
>>>>>>> r :
>>>>>>> null;
>>>>>>>                         } catch (OutOfMemoryError x) {
>>>>>>>                             Thread.yield();
>>>>>>>                             continue;
>>>>>>>                         }
>>>>>>>                         pending = r.discovered;
>>>>>>>                         r.discovered = null;
>>>>>>>                     } else {
>>>>>>>                         // The waiting on the lock may cause an 
>>>>>>> OOME
>>>>>>> because it may try to allocate
>>>>>>>                         // exception objects, so also catch OOME 
>>>>>>> here
>>>>>>> to avoid silent exit of the
>>>>>>>                         // reference handler thread.
>>>>>>>                         //
>>>>>>>                         // Explicitly define the order of the two
>>>>>>> exceptions we catch here
>>>>>>>                         // when waiting for the lock.
>>>>>>>                         //
>>>>>>>                         // We do not want to try to potentially 
>>>>>>> load
>>>>>>> the InterruptedException class
>>>>>>>                         // (which would be done if this was its 
>>>>>>> first
>>>>>>> use, and InterruptedException
>>>>>>>                         // were checked first) in this situation.
>>>>>>>                         //
>>>>>>>                         // This may lead to the VM not ever 
>>>>>>> trying to
>>>>>>> load the InterruptedException
>>>>>>>                         // class again.
>>>>>>>                         try {
>>>>>>>                             try {
>>>>>>>                                 lock.wait();
>>>>>>>                             } catch (OutOfMemoryError x) { }
>>>>>>>                         } catch (InterruptedException x) { }
>>>>>>>                         continue;
>>>>>>>                     }
>>>>>>>                 }
>>>>>>>
>>>>>>>                 // Fast path for cleaners
>>>>>>>                 if (c != null) {
>>>>>>>                     c.clean();
>>>>>>>                     continue;
>>>>>>>                 }
>>>>>>>
>>>>>>>                 ReferenceQueue q = r.queue;
>>>>>>>                 if (q != ReferenceQueue.NULL) q.enqueue(r);
>>>>>>>             }
>>>>>>>         }
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> ... it tries to not consume and skip Cleaner instances when OOME is
>>>>>>> caught.
>>>>>>>
>>>>>>> I don't think ReferenceHandler is to make responsible for reporting
>>>>>>> OOMEs. Full heap is a global condition and ReferenceHandler is the
>>>>>>> last to accuse for it.
>>>>>>>
>>>>>>> Regards, Peter
>>>>>>
>>>>>> Hi David,
>>>>>>
>>>>>> I think the following variation is even better. It executes
>>>>>> Thread.yield() after catching OOME but outside synchronized block so
>>>>>> that given CPU slice can be used by GC threads to make progress
>>>>>> enqueueing pending References (they are not able to enqueue them 
>>>>>> while
>>>>>> ReferenceHandler is holding the lock):
>>>>>>
>>>>>>
>>>>>>          public void run() {
>>>>>>              for (;;) {
>>>>>>                  Reference r;
>>>>>>                  Cleaner c;
>>>>>>                  try {
>>>>>>                      try {
>>>>>>                          synchronized (lock) {
>>>>>>                              r = pending;
>>>>>>                              if (r != null) {
>>>>>>                                  // 'instanceof' might throw OOME
>>>>>> sometimes so do this before
>>>>>>                                  // unlinking 'r' from the 'pending'
>>>>>> chain...
>>>>>>                                  c = r instanceof Cleaner ?
>>>>>> (Cleaner) r
>>>>>> : null;
>>>>>>                                  // unlink 'r' from 'pending' chain
>>>>>>                                  pending = r.discovered;
>>>>>>                                  r.discovered = null;
>>>>>>                              } else {
>>>>>>                                  // The waiting on the lock may
>>>>>> cause an
>>>>>> OOME because it may try to allocate
>>>>>>                                  // exception objects.
>>>>>>                                  lock.wait();
>>>>>>                                  continue;
>>>>>>                              }
>>>>>>                          }
>>>>>>                      } catch (OutOfMemoryError x) {
>>>>>>                          // Catch OOME from 'r instanceof 
>>>>>> Cleaner' or
>>>>>> 'lock.wait()' 1st so that we don't
>>>>>>                          // try to potentially load the
>>>>>> InterruptedException class
>>>>>>                          // (which would be done if this was its 
>>>>>> first
>>>>>> use, and InterruptedException
>>>>>>                          // were checked first) in this situation.
>>>>>>                          // Give other threads CPU time so they
>>>>>> hopefully release some objects and GC
>>>>>>                          // clears some heap.
>>>>>>                          // Also prevent CPU intensive spinning in
>>>>>> case
>>>>>> 'r instanceof Cleaner' above
>>>>>>                          // persistently throws OOME for some 
>>>>>> time...
>>>>>>                          Thread.yield();
>>>>>>                          // retry
>>>>>>                          continue;
>>>>>>                      }
>>>>>>                  } catch (InterruptedException x) {
>>>>>>                      // Catch InterruptedException from 
>>>>>> 'lock.wait()'
>>>>>> and retry
>>>>>>                      continue;
>>>>>>                  }
>>>>>>
>>>>>>                  // Fast path for cleaners
>>>>>>                  if (c != null) {
>>>>>>                      c.clean();
>>>>>>                      continue;
>>>>>>                  }
>>>>>>
>>>>>>                  ReferenceQueue q = r.queue;
>>>>>>                  if (q != ReferenceQueue.NULL) q.enqueue(r);
>>>>>>              }
>>>>>>          }
>>>>>>
>>>>>>
>>>>>> Regards, Peter
>>>>>>
>>>>
>>




More information about the core-libs-dev mailing list