Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
Peter Levart
peter.levart at gmail.com
Tue Jan 21 14:00:26 UTC 2014
Hi, David, Kalyan,
Summing up the discussion, I propose the following patch for
ReferenceHandler:
http://cr.openjdk.java.net/~plevart/jdk9-dev/OOMEInReferenceHandler/webrev.01/
all 10 java/lang/ref tests pass on my PC (including OOMEInReferenceHandler).
I kindly ask Kalyan to try to re-run the OOMEInReferenceHandler test
with this code and report any failure.
Thanks, Peter
On 01/21/2014 08:57 AM, David Holmes wrote:
> On 21/01/2014 4:54 PM, Peter Levart wrote:
>>
>> On 01/21/2014 03:22 AM, David Holmes wrote:
>>> Hi Peter,
>>>
>>> I do not see Cleaner being loaded prior to the main class on either
>>> Windows or Linux. Which platform are you on? Did you see it loaded
>>> before the main class or as part of executing it?
>>
>> Before. The main class is empty:
>>
>> public class Test { public static void main(String... a) {} }
>>
>> Here's last few lines of -verbose:class:
>>
>> [Loaded java.util.TimeZone from
>> /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar]
>> [Loaded sun.util.calendar.ZoneInfo from
>> /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar]
>> [Loaded sun.util.calendar.ZoneInfoFile from
>> /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar]
>> [Loaded sun.util.calendar.ZoneInfoFile$1 from
>> /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar]
>> [Loaded java.io.DataInput from
>> /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar]
>> [Loaded java.io.DataInputStream from
>> /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar]
>> *[Loaded sun.misc.Cleaner from
>> /home/peter/Apps64/jdk1.8.0-ea-b121/jre/lib/rt.jar]*
>
> Curious. I wonder what the controlling factor is ??
>
>>
>> I'm on linux, 64bit and using official EA build 121 of JDK 8...
>>
>> But if I try with JDK 7u45, I don't see it. So perhaps it would be good
>> to trigger Cleaner loading and initialization as part of
>> ReferenceHandler initialization to play things safe.
>
>
> If we do that for Cleaner we may as well do it for
> InterruptedException too.
>
>>> Also, it is not that I think ReferenceHandler is responsible for
>>> reporting OOME, but that it is responsible for reporting that it was
>>> unable to perform a clean or enqueue because of OOME.
>>
>> This would be necessary if we skipped a Reference because of OOME, but
>> if we just re-try until we eventually succeed, nothing is lost, nothing
>> to report (but a slow response)...
>
> Agreed - just trying to clarify things.
>
>>>
>>> Your suggested approach seems okay though I'm not sure why we
>>> shouldn't help things along by calling System.gc() ourselves rather
>>> than just yielding and hoping things will get cleaned up elsewhere.
>>> But for the present purposes your approach will suffice I think.
>>
>> Maybe my understanding is wrong but isn't the fact that OOME is rised a
>> consequence of that VM has already attempted to clear things up
>> (executing a GC round synchronously) but didn't succeed to make enough
>> free space to satisfy the allocation request? If this is only how some
>> collectors/allocators are implemented and not a general rule, then we
>> should put a System.gc() in place of Thread.yield(). Should we also
>> combine that with Thread.yield()? I'm concerned of a possibility that we
>> spin, consume too much CPU (ReferenceHandler thread has MAX priority) so
>> that other threads dont' get enough CPU time to proceed and clean things
>> up (we hope other threads will also get OOME and release things as their
>> stacks unwind...).
>
> You are probably right about the System.gc() - OOME should be thrown
> after GC fails to create space, so it really needs some other thread
> to drop live references to allow further space to be reclaimed.
>
> But note that Thread.yield() can behave badly on some linux systems
> too, so spinning is still a possibility - but either way this would
> only be "really bad" on a uniprocessor system where yield() is
> unlikely to misbehave.
>
> David
> -----
>
>>
>> Regards, Peter
>>
>>>
>>> Thanks,
>>> David
>>>
>>> On 20/01/2014 6:42 PM, Peter Levart wrote:
>>>> On 01/20/2014 09:00 AM, Peter Levart wrote:
>>>>> On 01/20/2014 02:51 AM, David Holmes wrote:
>>>>>> Hi Peter,
>>>>>>
>>>>>> On 17/01/2014 11:24 PM, Peter Levart wrote:
>>>>>>> On 01/17/2014 02:13 PM, Peter Levart wrote:
>>>>>>>>>> // Fast path for cleaners
>>>>>>>>>> boolean isCleaner = false;
>>>>>>>>>> try {
>>>>>>>>>> isCleaner = r instanceof Cleaner;
>>>>>>>>>> } catch (OutofMemoryError oome) {
>>>>>>>>>> continue;
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> if (isCleaner) {
>>>>>>>>>> ((Cleaner)r).clean();
>>>>>>>>>> continue;
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hi David, Kalyan,
>>>>>>>>>
>>>>>>>>> I've caught-up now. Just thinking: is "instanceof Cleaner"
>>>>>>>>> throwing
>>>>>>>>> OOME as a result of loading the Cleaner class? Wouldn't the above
>>>>>>>>> code then throw some error also in ((Cleaner)r) - the checkcast,
>>>>>>>>> since Cleaner class would not be successfully initialized?
>>>>>>>>
>>>>>>>> Well, no. The above code would just skip Cleaner processing in
>>>>>>>> this
>>>>>>>> situation. And will never be doing it again after the heap is
>>>>>>>> freed...
>>>>>>>> So it might be good to load and initialize Cleaner class as
>>>>>>>> part of
>>>>>>>> ReferenceHandler initialization to ensure correct operation...
>>>>>>>
>>>>>>> Well, yes and no. Let me try once more:
>>>>>>>
>>>>>>> Above code will skip Cleaner processing if the 1st time "instanceof
>>>>>>> Cleaner" is executed, OOME is thrown as a consequence of full heap
>>>>>>> while
>>>>>>> loading and initializing the Cleaner class.
>>>>>>
>>>>>> Yes - I was assuming that this would not fail the very first time
>>>>>> and
>>>>>> so the Cleaner class would already be loaded. Failing to be able to
>>>>>> load the Cleaner class was one of the potential issues flagged
>>>>>> earlier with this problem. I was actually assuming that Cleaner
>>>>>> would
>>>>>> be loaded already due to some actual Cleaner subclasses being used,
>>>>>> but this does not happen as part of the default initialization. :(
>>>>>> The irony being that if the Cleaner class is not loaded then r can
>>>>>> not be an instance of Cleaner and so we would fail to load the class
>>>>>> in a case where we didn't need it anyway.
>>>>>>
>>>>>> What I wanted to focus on here was an OOME from the instanceof
>>>>>> itself, but as you say that might trigger classloading of Cleaner
>>>>>> (which is not what I was interested in).
>>>>>>
>>>>>>> The 2nd time the "instanceof
>>>>>>> Cleaner" is executed after such OOME, the same line would throw
>>>>>>> NoClassDefFoundError as a consequence of referencing a class that
>>>>>>> failed
>>>>>>> initialization. Am I right?
>>>>>>
>>>>>> instanceof is not one of the class initialization triggers, so we
>>>>>> should not see an OOME generated due to a class initialization
>>>>>> exception and so the class will not be put into the Erroneous state
>>>>>> and so subsequent attempts to use the class will not automatically
>>>>>> trigger NoClassdefFoundError.
>>>>>>
>>>>>> If OOME occurs during actual loading/linking of the class Cleaner it
>>>>>> is unclear what would happen on subsequent attempts. OOME is not a
>>>>>> LinkageError that must be rethrown on subsequent attempts, and it is
>>>>>> potentially a transient condition, so I would expect a re-load
>>>>>> attempt to be allowed. However we are now deep into the details of
>>>>>> the VM and it may well depend on the exact place from which the OOME
>>>>>> originates.
>>>>>>
>>>>>> The bottom line with the current problem is that there are multiple
>>>>>> non-obvious paths by which the ReferenceHandler can encounter an
>>>>>> OOME. In such cases we do not want the ReferenceHandler to terminate
>>>>>> - which implies catching the OOME and continuing. However we also do
>>>>>> not want to silently skip Cleaner processing or reference queue
>>>>>> processing - as that would lead to hard to diagnoze bugs. But trying
>>>>>> to report the problem may not be possible due to being
>>>>>> out-of-memory.
>>>>>> It may be that we need to break things up into multiple try/catch
>>>>>> blocks, where each catch does a System.gc() and then reports that
>>>>>> the
>>>>>> OOME occurred. Of course the reporting must still be in a try/catch
>>>>>> for the OOME. Though at some point letting the ReferenceHandler die
>>>>>> may be the only way to "report" a major memory problem.
>>>>>>
>>>>>> David
>>>>>
>>>>> Hm... If I give -verbose:class option to run a simple test program:
>>>>>
>>>>> public class Test { public static void main(String... a) {} }
>>>>>
>>>>> I see Cleaner class being loaded before Test class. I don't see by
>>>>> which tread or if it might get loaded after main() starts, but I
>>>>> suspect that loading of Cleaner is not a problem here. Initialization
>>>>> of Cleaner class is not performed by ReferenceHandler thread as you
>>>>> pointed out. The instanceof does not trigger it and if it returns
>>>>> true
>>>>> then Cleaner has already been initialized. So there must be some
>>>>> other
>>>>> cause for instanceof throwing OOME...
>>>>>
>>>>> What do you say about this variant of ReferenceHandler.run() method:
>>>>>
>>>>> public void run() {
>>>>> for (;;) {
>>>>> Reference r;
>>>>> Cleaner c;
>>>>> synchronized (lock) {
>>>>> r = pending;
>>>>> if (r != null) {
>>>>> // instanceof operator might throw OOME
>>>>> sometimes. Just retry after
>>>>> // yielding - might have better luck next
>>>>> time...
>>>>> try {
>>>>> c = r instanceof Cleaner ? (Cleaner) r :
>>>>> null;
>>>>> } catch (OutOfMemoryError x) {
>>>>> Thread.yield();
>>>>> continue;
>>>>> }
>>>>> pending = r.discovered;
>>>>> r.discovered = null;
>>>>> } else {
>>>>> // The waiting on the lock may cause an OOME
>>>>> because it may try to allocate
>>>>> // exception objects, so also catch OOME here
>>>>> to avoid silent exit of the
>>>>> // reference handler thread.
>>>>> //
>>>>> // Explicitly define the order of the two
>>>>> exceptions we catch here
>>>>> // when waiting for the lock.
>>>>> //
>>>>> // We do not want to try to potentially load
>>>>> the InterruptedException class
>>>>> // (which would be done if this was its first
>>>>> use, and InterruptedException
>>>>> // were checked first) in this situation.
>>>>> //
>>>>> // This may lead to the VM not ever trying to
>>>>> load the InterruptedException
>>>>> // class again.
>>>>> try {
>>>>> try {
>>>>> lock.wait();
>>>>> } catch (OutOfMemoryError x) { }
>>>>> } catch (InterruptedException x) { }
>>>>> continue;
>>>>> }
>>>>> }
>>>>>
>>>>> // Fast path for cleaners
>>>>> if (c != null) {
>>>>> c.clean();
>>>>> continue;
>>>>> }
>>>>>
>>>>> ReferenceQueue q = r.queue;
>>>>> if (q != ReferenceQueue.NULL) q.enqueue(r);
>>>>> }
>>>>> }
>>>>>
>>>>>
>>>>>
>>>>> ... it tries to not consume and skip Cleaner instances when OOME is
>>>>> caught.
>>>>>
>>>>> I don't think ReferenceHandler is to make responsible for reporting
>>>>> OOMEs. Full heap is a global condition and ReferenceHandler is the
>>>>> last to accuse for it.
>>>>>
>>>>> Regards, Peter
>>>>
>>>> Hi David,
>>>>
>>>> I think the following variation is even better. It executes
>>>> Thread.yield() after catching OOME but outside synchronized block so
>>>> that given CPU slice can be used by GC threads to make progress
>>>> enqueueing pending References (they are not able to enqueue them while
>>>> ReferenceHandler is holding the lock):
>>>>
>>>>
>>>> public void run() {
>>>> for (;;) {
>>>> Reference r;
>>>> Cleaner c;
>>>> try {
>>>> try {
>>>> synchronized (lock) {
>>>> r = pending;
>>>> if (r != null) {
>>>> // 'instanceof' might throw OOME
>>>> sometimes so do this before
>>>> // unlinking 'r' from the 'pending'
>>>> chain...
>>>> c = r instanceof Cleaner ?
>>>> (Cleaner) r
>>>> : null;
>>>> // unlink 'r' from 'pending' chain
>>>> pending = r.discovered;
>>>> r.discovered = null;
>>>> } else {
>>>> // The waiting on the lock may
>>>> cause an
>>>> OOME because it may try to allocate
>>>> // exception objects.
>>>> lock.wait();
>>>> continue;
>>>> }
>>>> }
>>>> } catch (OutOfMemoryError x) {
>>>> // Catch OOME from 'r instanceof Cleaner' or
>>>> 'lock.wait()' 1st so that we don't
>>>> // try to potentially load the
>>>> InterruptedException class
>>>> // (which would be done if this was its first
>>>> use, and InterruptedException
>>>> // were checked first) in this situation.
>>>> // Give other threads CPU time so they
>>>> hopefully release some objects and GC
>>>> // clears some heap.
>>>> // Also prevent CPU intensive spinning in
>>>> case
>>>> 'r instanceof Cleaner' above
>>>> // persistently throws OOME for some time...
>>>> Thread.yield();
>>>> // retry
>>>> continue;
>>>> }
>>>> } catch (InterruptedException x) {
>>>> // Catch InterruptedException from 'lock.wait()'
>>>> and retry
>>>> continue;
>>>> }
>>>>
>>>> // Fast path for cleaners
>>>> if (c != null) {
>>>> c.clean();
>>>> continue;
>>>> }
>>>>
>>>> ReferenceQueue q = r.queue;
>>>> if (q != ReferenceQueue.NULL) q.enqueue(r);
>>>> }
>>>> }
>>>>
>>>>
>>>> Regards, Peter
>>>>
>>
More information about the core-libs-dev
mailing list