Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
srikalyan chandrashekar
srikalyan.chandrashekar at oracle.com
Wed Jan 8 20:08:15 UTC 2014
Hi Peter, the jtreg test configuration is @run main/othervm -Xmx24M
-XX:-UseTLAB OOMEInReferenceHandler. Even with these options you still have
to run the test many times (around 1000 runs) to capture one or more
failures. The platform may not have an effect; however, I used a 64-bit
Ubuntu 12.04 LTS, 8 GB, 2-core workstation with either JDK 7 or JDK 8.
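As a rough check on why so many runs are needed, here is a minimal Java sketch. It assumes each run fails independently with a fixed probability p, taking p from the 3 to 5 failures per 1000 runs reported for the unmodified code elsewhere in this thread; the class and method names are illustrative only.

```java
public class FailureOdds {
    // Probability of seeing at least one failure in n runs, assuming each
    // run fails independently with probability p (an assumption, not a
    // measured property of the test).
    static double atLeastOneFailure(double p, int n) {
        return 1.0 - Math.pow(1.0 - p, n);
    }

    public static void main(String[] args) {
        for (double p : new double[] {0.003, 0.005}) {
            System.out.printf("p=%.3f: P(>=1 failure in 1000 runs) = %.3f%n",
                    p, atLeastOneFailure(p, 1000));
        }
        // Chance that an unfixed build would pass 3000 runs with no failure
        // (3000 is an illustrative run count).
        System.out.printf("P(0 failures in 3000 runs | p=0.003) = %.5f%n",
                1.0 - atLeastOneFailure(0.003, 3000));
    }
}
```

Under that assumption, 1000 runs catch at least one failure with probability of roughly 95% (p = 0.003) to 99% (p = 0.005), and an unfixed build would pass 3000 runs with zero failures only about once in 8000 attempts.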
---
Thanks
kalyan
On 01/08/2014 05:53 AM, Peter Levart wrote:
> Hi Kalyan,
>
> What hardware/OS/JVM and what JVM options are you using to reproduce
> this failure? I would really like to reproduce this myself, but all
> attempts on my PC have so far been unsuccessful. I might be able to
> get access to a machine that is similar to yours...
>
> Regards, Peter
>
> On 01/07/2014 09:55 PM, srikalyan chandrashekar wrote:
>> Peter, attempts to get state information out (to the console or
>> otherwise) from within the Reference Handler's exception handlers have
>> been unsuccessful. However, David's suggestion produced a useful trace
>> with a fastdebug build, which gives some information; see the log here
>> <http://cr.openjdk.java.net/%7Esrikchan/OOME_exception_trace.log>.
>> ---
>> Thanks
>> kalyan
>> On 01/07/2014 12:42 AM, Peter Levart wrote:
>>> On 01/07/2014 03:15 AM, srikalyan chandrashekar wrote:
>>>> Sure David, will give that a try. So far we have attempted to:
>>>> 1. Print state data (as per the test creator Peter Levart's input),
>>>
>>> Hi Kalyan,
>>>
>>> Have you been able to reproduce the OOME in that set-up? What was
>>> the result?
>>>
>>> Regards, Peter
>>>
>>>> 2. Use a UEH (uncaught exception handler, per Mandy's input).
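For readers unfamiliar with approach 2, here is a minimal Java sketch of an uncaught exception handler. It simulates the failure by throwing an OutOfMemoryError from an ordinary worker thread rather than exhausting the heap, and it assumes the system thread is named "Reference Handler" and that no SecurityManager blocks setUncaughtExceptionHandler. The class name UehSketch is hypothetical.

```java
import java.util.concurrent.atomic.AtomicReference;

public class UehSketch {
    // Looks up a live thread by name; returns null if none matches.
    static Thread findThread(String name) {
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (name.equals(t.getName())) {
                return t;
            }
        }
        return null;
    }

    // Starts a thread that dies with a simulated OOME and returns whatever
    // its uncaught exception handler recorded.
    static Throwable runAndCapture() {
        AtomicReference<Throwable> death = new AtomicReference<>();
        Thread worker = new Thread(() -> {
            throw new OutOfMemoryError("simulated");
        }, "simulated-handler");
        worker.setUncaughtExceptionHandler((t, e) -> death.set(e));
        worker.start();
        try {
            worker.join(); // the handler runs on the dying thread before it terminates
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
        }
        return death.get();
    }

    public static void main(String[] args) {
        System.out.println("captured: " + runAndCapture());
        Thread refHandler = findThread("Reference Handler");
        if (refHandler != null) {
            // Same idea applied to the real thread (assumes no SecurityManager).
            refHandler.setUncaughtExceptionHandler((t, e) -> {
                System.err.println(t.getName() + " died: " + e);
                e.printStackTrace();
            });
        }
    }
}
```

The handler itself allocates (string concatenation, printing), so under genuine heap exhaustion it may fail as well, which is consistent with the difficulty of getting state information out of the dying thread.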
>>>>
>>>> --
>>>> Thanks
>>>> kalyan
>>>>
>>>> On 1/6/14 4:40 PM, David Holmes wrote:
>>>>> Back from vacation ...
>>>>>
>>>>> On 20/12/2013 4:49 PM, David Holmes wrote:
>>>>>> On 20/12/2013 12:57 PM, srikalyan chandrashekar wrote:
>>>>>>> Hi David, thanks for your comments. The unguarded part (clean and
>>>>>>> enqueue) in the Reference Handler thread does not seem to create any
>>>>>>> new objects, so it is the application (the test in this case) that is
>>>>>>> adding objects to the heap and causing the Reference Handler to die
>>>>>>> with an OOME.
>>>>>>
>>>>>> The ReferenceHandler thread can only get an OOME if it allocates
>>>>>> (directly or indirectly), so there has to be something in the
>>>>>> unguarded part that causes this. Again, it may be an implicit action
>>>>>> in the VM, similar to the class-loading issue for InterruptedException.
>>>>>
>>>>> Run a debug VM with -XX:+TraceExceptions to see where the OOME is
>>>>> triggered.
>>>>>
>>>>> David
>>>>> -----
>>>>>
>>>>>> David
>>>>>>
>>>>>>> I am still unsure about the side effects of the code change and agree
>>>>>>> with your thoughts (on the memory exhaustion test's reliability).
>>>>>>>
>>>>>>> PS: hotspot dev alias removed from CC.
>>>>>>>
>>>>>>> --
>>>>>>> Thanks
>>>>>>> kalyan
>>>>>>>
>>>>>>> On 12/19/13 5:08 PM, David Holmes wrote:
>>>>>>>> Hi Kalyan,
>>>>>>>>
>>>>>>>> This is not a hotspot issue, so I'm moving this to core-libs;
>>>>>>>> please drop hotspot from any replies.
>>>>>>>>
>>>>>>>> On 20/12/2013 6:26 AM, srikalyan wrote:
>>>>>>>>> Hi all, I have been working on bug JDK-8022321
>>>>>>>>> <https://bugs.openjdk.java.net/browse/JDK-8022321>, a sporadic
>>>>>>>>> failure. The webrev is available here:
>>>>>>>>> http://cr.openjdk.java.net/~srikchan/Regression/JDK-8022321_OOMEInReferenceHandler-webrev/
>>>>>>>>
>>>>>>>> I'm really not sure what to make of this. We have a test that
>>>>>>>> triggers an out-of-memory condition, but the OOME can actually turn
>>>>>>>> up in the ReferenceHandler thread, causing it to terminate and the
>>>>>>>> test to fail. We previously accounted for the non-obvious
>>>>>>>> occurrences of OOME due to Object.wait and the possible need to load
>>>>>>>> the InterruptedException class, but the OOME can still appear where
>>>>>>>> we don't want it. So finally you have just placed the whole for(;;)
>>>>>>>> loop in a try/catch(OOME) that ignores the OOME. I'm certain that
>>>>>>>> makes the test happy, but I'm not sure it is really what we want for
>>>>>>>> the ReferenceHandler thread. If the OOME occurs while cleaning or
>>>>>>>> enqueuing, then we will fail to clean and/or enqueue, but there
>>>>>>>> would be no indication that this has occurred, and I think that is a
>>>>>>>> bigger problem than this test failing.
>>>>>>>>
>>>>>>>> There may be no way to make this test 100% reliable. In fact I'd
>>>>>>>> suggest that no memory exhaustion test can be 100% reliable.
>>>>>>>>
>>>>>>>> David
>>>>>>>>
>>>>>>>>> *Root Cause: Still not known*
>>>>>>>>> There are 2 places where an OOME is possible:
>>>>>>>>> 1) Cleaner.clean()
>>>>>>>>> 2) ReferenceQueue.enqueue()
>>>>>>>>>
>>>>>>>>> 1) The cleanup code in turn has 2 places with the potential to throw
>>>>>>>>> an OOME:
>>>>>>>>> a) The thunk Runnable, which is run from the clean() method. This
>>>>>>>>> Runnable is passed to the Cleaner and appears in the following
>>>>>>>>> classes:
>>>>>>>>> java/nio/DirectByteBuffer.java
>>>>>>>>> sun/misc/Perf.java
>>>>>>>>> sun/nio/fs/NativeBuffer.java
>>>>>>>>> sun/nio/ch/IOVecWrapper.java
>>>>>>>>> sun/misc/Cleaner/ExitOnThrow.java
>>>>>>>>> However, none of the above implementations ever creates an object
>>>>>>>>> in the clean() code path.
>>>>>>>>> b) A new PrivilegedAction is created in the try/catch(Exception)
>>>>>>>>> block of the clean() method, but for this object to be created, and
>>>>>>>>> thus be held responsible for the OOME, an exception (other than
>>>>>>>>> OOME) has to be thrown first.
>>>>>>>>>
>>>>>>>>> 2) No new heap objects are created in the enqueue method or anywhere
>>>>>>>>> deeper in its call stack (VM.addFinalRefCount(), etc.), so this
>>>>>>>>> cannot be the cause.
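The allocation argument in 1b can be illustrated with a small Java model. MiniCleaner is a hypothetical stand-in, not sun.misc.Cleaner: it catches Exception, following the description in 1b (the actual Cleaner source may differ in detail, for example in the exception type it catches), and a counter stands in for the PrivilegedAction so that the failure-path allocation is observable.

```java
public class CleanerPathSketch {
    // Hypothetical stand-in for a cleaner: runs a thunk and only allocates a
    // failure-reporting object if the thunk throws an Exception.
    static final class MiniCleaner {
        private final Runnable thunk;
        int failurePathAllocations = 0;

        MiniCleaner(Runnable thunk) {
            this.thunk = thunk;
        }

        void clean() {
            try {
                thunk.run(); // normal path: the cleaner itself allocates nothing here
            } catch (final Exception x) {
                // Failure path: a new object is allocated only after an exception,
                // analogous to the PrivilegedAction described in 1b.
                Runnable report = new Runnable() {
                    @Override
                    public void run() {
                        failurePathAllocations++;
                    }
                };
                report.run();
            }
        }
    }

    public static void main(String[] args) {
        MiniCleaner quiet = new MiniCleaner(() -> { });
        quiet.clean();
        MiniCleaner failing = new MiniCleaner(() -> {
            throw new IllegalStateException("thunk failed");
        });
        failing.clean();
        System.out.println(quiet.failurePathAllocations + " " + failing.failurePathAllocations);
    }
}
```

Running main prints "0 1". An OutOfMemoryError thrown by the thunk is an Error, not an Exception, so this model lets it propagate out of clean() without reaching the allocating catch block.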
>>>>>>>>>
>>>>>>>>> *Experimental change to java/lang/ref/Reference.java*:
>>>>>>>>> - Put one more guard (a try/catch block for OOME) in the Reference
>>>>>>>>> Handler thread, which may give the Reference Handler a chance to
>>>>>>>>> clean up. This fixes the test failure (several thousand runs with 0
>>>>>>>>> failures).
>>>>>>>>> - Without the above change, the test fails at least 3-5 times in
>>>>>>>>> every 1000 runs.
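The guard placement in the experimental change, and the concern David raises about it, can be modelled with a hypothetical Java loop that drains a queue of pending work items. This is not the java.lang.ref.Reference code: a Runnable stands in for clean()/enqueue(), the OOME is simulated by throwing it, and the swallowed-error counter is an illustrative addition, not part of the proposed change.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model of a handler loop with the extra OOME guard; not the
// real java.lang.ref.Reference code.
public class GuardedHandlerSketch {
    private final Deque<Runnable> pending = new ArrayDeque<>();
    private int swallowedOomes = 0;

    void add(Runnable work) {
        pending.add(work);
    }

    // Runs every pending item. An OOME from one item is caught so the loop
    // (and its thread) survives, but that item's work is not retried.
    void drain() {
        Runnable work;
        while ((work = pending.poll()) != null) {
            try {
                work.run(); // stands in for clean() / enqueue()
            } catch (OutOfMemoryError e) {
                swallowedOomes++; // illustrative: without this, the loss is silent
            }
        }
    }

    int swallowed() {
        return swallowedOomes;
    }

    public static void main(String[] args) {
        int[] ran = new int[1];
        GuardedHandlerSketch handler = new GuardedHandlerSketch();
        handler.add(() -> ran[0]++);
        handler.add(() -> { throw new OutOfMemoryError("simulated"); });
        handler.add(() -> ran[0]++);
        handler.drain();
        System.out.println("ran=" + ran[0] + " swallowed=" + handler.swallowed());
    }
}
```

With three items where the second throws, the loop survives and the third item still runs, but the second item's work is simply lost; without something like the counter, nothing records that loss.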
>>>>>>>>>
>>>>>>>>> *PS*: The code change is to a very critical part of the JDK and I am
>>>>>>>>> not fully aware of its consequences, hence I am seeking expert help
>>>>>>>>> here. I appreciate your time and input on this.
>>>>>>>>>
>>>>>>>
>>>>
>>>
>>
>
More information about the core-libs-dev mailing list