Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
Peter Levart
peter.levart at gmail.com
Sat Jan 25 00:44:41 UTC 2014
On 01/22/2014 03:19 AM, David Holmes wrote:
> Hi Peter,
>
> On 22/01/2014 12:00 AM, Peter Levart wrote:
>> Hi, David, Kalyan,
>>
>> Summing up the discussion, I propose the following patch for
>> ReferenceHandler:
>>
>> http://cr.openjdk.java.net/~plevart/jdk9-dev/OOMEInReferenceHandler/webrev.01/
>>
>
> I can live with it, though it maybe that once Cleaner has been
> preloaded instanceof can no longer throw OOME. Can't be 100% sure. And
> there's some duplication/verbosity in the commentary that could be
> trimmed down :)
>
> Any specific reason to use Unsafe to do the preload rather than
> Class.forName ? Does this force Unsafe to be loaded earlier than it
> otherwise would?
Good question. In systemDictionary.hpp they are both on the preloaded
list in this order:
do_klass(Reference_klass, java_lang_ref_Reference,
Pre ) \
...
do_klass(misc_Unsafe_klass,
sun_misc_Unsafe, Pre ) \
So when Reference is initialized, the Unsafe is already loaded. But I
don't know if it is already initialized. This should be studied.
I'll try to find out what is the case and get back to you.
Regards, Peter
>
> Thanks,
> David
>
>>
>> all 10 java/lang/ref tests pass on my PC (including
>> OOMEInReferenceHandler).
>>
>> I kindly ask Kalyan to try to re-run the OOMEInReferenceHandler test
>> with this code and report any failure.
>>
>>
>> Thanks, Peter
>>
>> On 01/21/2014 08:57 AM, David Holmes wrote:
>>> On 21/01/2014 4:54 PM, Peter Levart wrote:
>>>>
>>>> On 01/21/2014 03:22 AM, David Holmes wrote:
>>>>> Hi Peter,
>>>>>
>>>>> I do not see Cleaner being loaded prior to the main class on either
>>>>> Windows or Linux. Which platform are you on? Did you see it loaded
>>>>> before the main class or as part of executing it?
>>>>
>>>> Before. The main class is empty:
>>>>
>>>> public class Test { public static void main(String... a) {} }
>>>>
>>>> Here's last few lines of -verbose:class:
>>>>
>>>> [Loaded java.util.TimeZone from
>>>> /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar]
>>>> [Loaded sun.util.calendar.ZoneInfo from
>>>> /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar]
>>>> [Loaded sun.util.calendar.ZoneInfoFile from
>>>> /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar]
>>>> [Loaded sun.util.calendar.ZoneInfoFile$1 from
>>>> /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar]
>>>> [Loaded java.io.DataInput from
>>>> /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar]
>>>> [Loaded java.io.DataInputStream from
>>>> /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar]
>>>> *[Loaded sun.misc.Cleaner from
>>>> /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar]*
>>>
>>> Curious. I wonder what the controlling factor is ??
>>>
>>>>
>>>> I'm on linux, 64bit and using official EA build 121 of JDK 8...
>>>>
>>>> But if I try with JDK 7u45, I don't see it. So perhaps it would be
>>>> good
>>>> to trigger Cleaner loading and initialization as part of
>>>> ReferenceHandler initialization to play things safe.
>>>
>>>
>>> If we do that for Cleaner we may as well do it for
>>> InterruptedException too.
>>>
>>>>> Also, it is not that I think ReferenceHandler is responsible for
>>>>> reporting OOME, but that it is responsible for reporting that it was
>>>>> unable to perform a clean or enqueue because of OOME.
>>>>
>>>> This would be necessary if we skipped a Reference because of OOME, but
>>>> if we just re-try until we eventually succeed, nothing is lost,
>>>> nothing
>>>> to report (but a slow response)...
>>>
>>> Agreed - just trying to clarify things.
>>>
>>>>>
>>>>> Your suggested approach seems okay though I'm not sure why we
>>>>> shouldn't help things along by calling System.gc() ourselves rather
>>>>> than just yielding and hoping things will get cleaned up elsewhere.
>>>>> But for the present purposes your approach will suffice I think.
>>>>
>>>> Maybe my understanding is wrong but isn't the fact that OOME is
>>>> rised a
>>>> consequence of that VM has already attempted to clear things up
>>>> (executing a GC round synchronously) but didn't succeed to make enough
>>>> free space to satisfy the allocation request? If this is only how some
>>>> collectors/allocators are implemented and not a general rule, then we
>>>> should put a System.gc() in place of Thread.yield(). Should we also
>>>> combine that with Thread.yield()? I'm concerned of a possibility
>>>> that we
>>>> spin, consume too much CPU (ReferenceHandler thread has MAX
>>>> priority) so
>>>> that other threads dont' get enough CPU time to proceed and clean
>>>> things
>>>> up (we hope other threads will also get OOME and release things as
>>>> their
>>>> stacks unwind...).
>>>
>>> You are probably right about the System.gc() - OOME should be thrown
>>> after GC fails to create space, so it really needs some other thread
>>> to drop live references to allow further space to be reclaimed.
>>>
>>> But note that Thread.yield() can behave badly on some linux systems
>>> too, so spinning is still a possibility - but either way this would
>>> only be "really bad" on a uniprocessor system where yield() is
>>> unlikely to misbehave.
>>>
>>> David
>>> -----
>>>
>>>>
>>>> Regards, Peter
>>>>
>>>>>
>>>>> Thanks,
>>>>> David
>>>>>
>>>>> On 20/01/2014 6:42 PM, Peter Levart wrote:
>>>>>> On 01/20/2014 09:00 AM, Peter Levart wrote:
>>>>>>> On 01/20/2014 02:51 AM, David Holmes wrote:
>>>>>>>> Hi Peter,
>>>>>>>>
>>>>>>>> On 17/01/2014 11:24 PM, Peter Levart wrote:
>>>>>>>>> On 01/17/2014 02:13 PM, Peter Levart wrote:
>>>>>>>>>>>> // Fast path for cleaners
>>>>>>>>>>>> boolean isCleaner = false;
>>>>>>>>>>>> try {
>>>>>>>>>>>> isCleaner = r instanceof Cleaner;
>>>>>>>>>>>> } catch (OutofMemoryError oome) {
>>>>>>>>>>>> continue;
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>> if (isCleaner) {
>>>>>>>>>>>> ((Cleaner)r).clean();
>>>>>>>>>>>> continue;
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Hi David, Kalyan,
>>>>>>>>>>>
>>>>>>>>>>> I've caught-up now. Just thinking: is "instanceof Cleaner"
>>>>>>>>>>> throwing
>>>>>>>>>>> OOME as a result of loading the Cleaner class? Wouldn't the
>>>>>>>>>>> above
>>>>>>>>>>> code then throw some error also in ((Cleaner)r) - the
>>>>>>>>>>> checkcast,
>>>>>>>>>>> since Cleaner class would not be successfully initialized?
>>>>>>>>>>
>>>>>>>>>> Well, no. The above code would just skip Cleaner processing in
>>>>>>>>>> this
>>>>>>>>>> situation. And will never be doing it again after the heap is
>>>>>>>>>> freed...
>>>>>>>>>> So it might be good to load and initialize Cleaner class as
>>>>>>>>>> part of
>>>>>>>>>> ReferenceHandler initialization to ensure correct operation...
>>>>>>>>>
>>>>>>>>> Well, yes and no. Let me try once more:
>>>>>>>>>
>>>>>>>>> Above code will skip Cleaner processing if the 1st time
>>>>>>>>> "instanceof
>>>>>>>>> Cleaner" is executed, OOME is thrown as a consequence of full
>>>>>>>>> heap
>>>>>>>>> while
>>>>>>>>> loading and initializing the Cleaner class.
>>>>>>>>
>>>>>>>> Yes - I was assuming that this would not fail the very first time
>>>>>>>> and
>>>>>>>> so the Cleaner class would already be loaded. Failing to be
>>>>>>>> able to
>>>>>>>> load the Cleaner class was one of the potential issues flagged
>>>>>>>> earlier with this problem. I was actually assuming that Cleaner
>>>>>>>> would
>>>>>>>> be loaded already due to some actual Cleaner subclasses being
>>>>>>>> used,
>>>>>>>> but this does not happen as part of the default initialization. :(
>>>>>>>> The irony being that if the Cleaner class is not loaded then r can
>>>>>>>> not be an instance of Cleaner and so we would fail to load the
>>>>>>>> class
>>>>>>>> in a case where we didn't need it anyway.
>>>>>>>>
>>>>>>>> What I wanted to focus on here was an OOME from the instanceof
>>>>>>>> itself, but as you say that might trigger classloading of Cleaner
>>>>>>>> (which is not what I was interested in).
>>>>>>>>
>>>>>>>>> The 2nd time the "instanceof
>>>>>>>>> Cleaner" is executed after such OOME, the same line would throw
>>>>>>>>> NoClassDefFoundError as a consequence of referencing a class that
>>>>>>>>> failed
>>>>>>>>> initialization. Am I right?
>>>>>>>>
>>>>>>>> instanceof is not one of the class initialization triggers, so we
>>>>>>>> should not see an OOME generated due to a class initialization
>>>>>>>> exception and so the class will not be put into the Erroneous
>>>>>>>> state
>>>>>>>> and so subsequent attempts to use the class will not automatically
>>>>>>>> trigger NoClassdefFoundError.
>>>>>>>>
>>>>>>>> If OOME occurs during actual loading/linking of the class
>>>>>>>> Cleaner it
>>>>>>>> is unclear what would happen on subsequent attempts. OOME is not a
>>>>>>>> LinkageError that must be rethrown on subsequent attempts, and
>>>>>>>> it is
>>>>>>>> potentially a transient condition, so I would expect a re-load
>>>>>>>> attempt to be allowed. However we are now deep into the details of
>>>>>>>> the VM and it may well depend on the exact place from which the
>>>>>>>> OOME
>>>>>>>> originates.
>>>>>>>>
>>>>>>>> The bottom line with the current problem is that there are
>>>>>>>> multiple
>>>>>>>> non-obvious paths by which the ReferenceHandler can encounter an
>>>>>>>> OOME. In such cases we do not want the ReferenceHandler to
>>>>>>>> terminate
>>>>>>>> - which implies catching the OOME and continuing. However we
>>>>>>>> also do
>>>>>>>> not want to silently skip Cleaner processing or reference queue
>>>>>>>> processing - as that would lead to hard to diagnoze bugs. But
>>>>>>>> trying
>>>>>>>> to report the problem may not be possible due to being
>>>>>>>> out-of-memory.
>>>>>>>> It may be that we need to break things up into multiple try/catch
>>>>>>>> blocks, where each catch does a System.gc() and then reports that
>>>>>>>> the
>>>>>>>> OOME occurred. Of course the reporting must still be in a
>>>>>>>> try/catch
>>>>>>>> for the OOME. Though at some point letting the ReferenceHandler
>>>>>>>> die
>>>>>>>> may be the only way to "report" a major memory problem.
>>>>>>>>
>>>>>>>> David
>>>>>>>
>>>>>>> Hm... If I give -verbose:class option to run a simple test program:
>>>>>>>
>>>>>>> public class Test { public static void main(String... a) {} }
>>>>>>>
>>>>>>> I see Cleaner class being loaded before Test class. I don't see by
>>>>>>> which tread or if it might get loaded after main() starts, but I
>>>>>>> suspect that loading of Cleaner is not a problem here.
>>>>>>> Initialization
>>>>>>> of Cleaner class is not performed by ReferenceHandler thread as you
>>>>>>> pointed out. The instanceof does not trigger it and if it returns
>>>>>>> true
>>>>>>> then Cleaner has already been initialized. So there must be some
>>>>>>> other
>>>>>>> cause for instanceof throwing OOME...
>>>>>>>
>>>>>>> What do you say about this variant of ReferenceHandler.run()
>>>>>>> method:
>>>>>>>
>>>>>>> public void run() {
>>>>>>> for (;;) {
>>>>>>> Reference r;
>>>>>>> Cleaner c;
>>>>>>> synchronized (lock) {
>>>>>>> r = pending;
>>>>>>> if (r != null) {
>>>>>>> // instanceof operator might throw OOME
>>>>>>> sometimes. Just retry after
>>>>>>> // yielding - might have better luck next
>>>>>>> time...
>>>>>>> try {
>>>>>>> c = r instanceof Cleaner ? (Cleaner)
>>>>>>> r :
>>>>>>> null;
>>>>>>> } catch (OutOfMemoryError x) {
>>>>>>> Thread.yield();
>>>>>>> continue;
>>>>>>> }
>>>>>>> pending = r.discovered;
>>>>>>> r.discovered = null;
>>>>>>> } else {
>>>>>>> // The waiting on the lock may cause an
>>>>>>> OOME
>>>>>>> because it may try to allocate
>>>>>>> // exception objects, so also catch OOME
>>>>>>> here
>>>>>>> to avoid silent exit of the
>>>>>>> // reference handler thread.
>>>>>>> //
>>>>>>> // Explicitly define the order of the two
>>>>>>> exceptions we catch here
>>>>>>> // when waiting for the lock.
>>>>>>> //
>>>>>>> // We do not want to try to potentially
>>>>>>> load
>>>>>>> the InterruptedException class
>>>>>>> // (which would be done if this was its
>>>>>>> first
>>>>>>> use, and InterruptedException
>>>>>>> // were checked first) in this situation.
>>>>>>> //
>>>>>>> // This may lead to the VM not ever
>>>>>>> trying to
>>>>>>> load the InterruptedException
>>>>>>> // class again.
>>>>>>> try {
>>>>>>> try {
>>>>>>> lock.wait();
>>>>>>> } catch (OutOfMemoryError x) { }
>>>>>>> } catch (InterruptedException x) { }
>>>>>>> continue;
>>>>>>> }
>>>>>>> }
>>>>>>>
>>>>>>> // Fast path for cleaners
>>>>>>> if (c != null) {
>>>>>>> c.clean();
>>>>>>> continue;
>>>>>>> }
>>>>>>>
>>>>>>> ReferenceQueue q = r.queue;
>>>>>>> if (q != ReferenceQueue.NULL) q.enqueue(r);
>>>>>>> }
>>>>>>> }
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> ... it tries to not consume and skip Cleaner instances when OOME is
>>>>>>> caught.
>>>>>>>
>>>>>>> I don't think ReferenceHandler is to make responsible for reporting
>>>>>>> OOMEs. Full heap is a global condition and ReferenceHandler is the
>>>>>>> last to accuse for it.
>>>>>>>
>>>>>>> Regards, Peter
>>>>>>
>>>>>> Hi David,
>>>>>>
>>>>>> I think the following variation is even better. It executes
>>>>>> Thread.yield() after catching OOME but outside synchronized block so
>>>>>> that given CPU slice can be used by GC threads to make progress
>>>>>> enqueueing pending References (they are not able to enqueue them
>>>>>> while
>>>>>> ReferenceHandler is holding the lock):
>>>>>>
>>>>>>
>>>>>> public void run() {
>>>>>> for (;;) {
>>>>>> Reference r;
>>>>>> Cleaner c;
>>>>>> try {
>>>>>> try {
>>>>>> synchronized (lock) {
>>>>>> r = pending;
>>>>>> if (r != null) {
>>>>>> // 'instanceof' might throw OOME
>>>>>> sometimes so do this before
>>>>>> // unlinking 'r' from the 'pending'
>>>>>> chain...
>>>>>> c = r instanceof Cleaner ?
>>>>>> (Cleaner) r
>>>>>> : null;
>>>>>> // unlink 'r' from 'pending' chain
>>>>>> pending = r.discovered;
>>>>>> r.discovered = null;
>>>>>> } else {
>>>>>> // The waiting on the lock may
>>>>>> cause an
>>>>>> OOME because it may try to allocate
>>>>>> // exception objects.
>>>>>> lock.wait();
>>>>>> continue;
>>>>>> }
>>>>>> }
>>>>>> } catch (OutOfMemoryError x) {
>>>>>> // Catch OOME from 'r instanceof
>>>>>> Cleaner' or
>>>>>> 'lock.wait()' 1st so that we don't
>>>>>> // try to potentially load the
>>>>>> InterruptedException class
>>>>>> // (which would be done if this was its
>>>>>> first
>>>>>> use, and InterruptedException
>>>>>> // were checked first) in this situation.
>>>>>> // Give other threads CPU time so they
>>>>>> hopefully release some objects and GC
>>>>>> // clears some heap.
>>>>>> // Also prevent CPU intensive spinning in
>>>>>> case
>>>>>> 'r instanceof Cleaner' above
>>>>>> // persistently throws OOME for some
>>>>>> time...
>>>>>> Thread.yield();
>>>>>> // retry
>>>>>> continue;
>>>>>> }
>>>>>> } catch (InterruptedException x) {
>>>>>> // Catch InterruptedException from
>>>>>> 'lock.wait()'
>>>>>> and retry
>>>>>> continue;
>>>>>> }
>>>>>>
>>>>>> // Fast path for cleaners
>>>>>> if (c != null) {
>>>>>> c.clean();
>>>>>> continue;
>>>>>> }
>>>>>>
>>>>>> ReferenceQueue q = r.queue;
>>>>>> if (q != ReferenceQueue.NULL) q.enqueue(r);
>>>>>> }
>>>>>> }
>>>>>>
>>>>>>
>>>>>> Regards, Peter
>>>>>>
>>>>
>>
More information about the core-libs-dev
mailing list