Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently
david.holmes at oracle.com
Fri Dec 20 06:49:15 UTC 2013
On 20/12/2013 12:57 PM, srikalyan chandrashekar wrote:
> Hi David, thanks for your comments. The unguarded part (clean and enqueue)
> in the Reference Handler thread does not seem to create any new objects,
> so it is the application (the test in this case) that is adding objects
> to the heap and causing the Reference Handler to die with OOME.
The ReferenceHandler thread can only get OOME if it allocates (directly
or indirectly) - so there has to be something in the unguarded part that
causes this. Again it may be an implicit action in the VM - similar to
the class load issue for InterruptedException.
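
As a rough illustration of such an implicit action, here is a minimal sketch (not taken from the JDK sources) of code that contains no `new` expression, yet makes the VM do extra work the first time a rarely taken path runs. The names `LazyInitDemo`, `Rare` and `step` are hypothetical, and class initialization, which is observable from Java code, stands in here for the class loading mentioned above.

```java
public class LazyInitDemo {
    // Set by Rare's static initializer so we can observe when the VM initializes it.
    static boolean rareInitialized = false;

    // Hypothetical stand-in for a class first needed on a rarely taken path.
    static class Rare {
        static { rareInitialized = true; }
        static void touch() { }
    }

    // Contains no explicit allocation, yet the first call on the rare path
    // makes the VM initialize Rare (loading it first if necessary).
    static void step(boolean rarePath) {
        if (rarePath) {
            Rare.touch();
        }
    }

    public static void main(String[] args) {
        step(false);
        System.out.println("after common path: " + rareInitialized);
        step(true);
        System.out.println("after rare path: " + rareInitialized);
    }
}
```

Run on its own, this should print `false` and then `true`: the class is only prepared when it is first used, and that preparation is the kind of hidden step that may need memory at an inconvenient moment.
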
> I am still
> unsure about the side effects of the code change and agree with your
> thoughts (on memory exhaustion test's reliability).
> PS: hotspot dev alias removed from CC.
> On 12/19/13 5:08 PM, David Holmes wrote:
>> Hi Kalyan,
>> This is not a hotspot issue so I'm moving this to core-libs, please
>> drop hotspot from any replies.
>> On 20/12/2013 6:26 AM, srikalyan wrote:
>>> Hi all, I have been working on the bug JDK-8022321
>>> <https://bugs.openjdk.java.net/browse/JDK-8022321> , this is a sporadic
>>> failure and the webrev is available here
>> I'm really not sure what to make of this. We have a test that triggers
>> an out-of-memory condition but the OOME can actually turn up in the
>> ReferenceHandler thread causing it to terminate and the test to fail.
>> We previously accounted for the non-obvious occurrences of OOME due to
>> the Object.wait and the possible need to load the InterruptedException
>> class - but still the OOME can appear where we don't want it. So
>> finally you have just placed the whole for(;;) loop in a
>> try/catch(OOME) that ignores the OOME. I'm certain that makes the test
>> happy, but I'm not sure it is really what we want for the
>> ReferenceHandler thread. If the OOME occurs while cleaning, or
>> enqueuing then we will fail to clean and/or enqueue but there would be
>> no indication that has occurred and I think that is a bigger problem
>> than this test failing.
>> There may be no way to make this test 100% reliable. In fact I'd
>> suggest that no memory exhaustion test can be 100% reliable.
>>> *"Root Cause: Still not known"*
>>> 2 places where there is a possibility for OOME
>>> 1) Cleaner.clean()
>>> 2) ReferenceQueue.enqueue()
>>> 1) The cleanup code in turn has 2 places where there is potential for
>>> throwing OOME,
>>> a) thunk Thread which is run from clean() method. This Runnable is
>>> passed to Cleaner and appears in the following classes
>>> However none of the above overridden implementations ever create an
>>> object in the clean() code.
>>> b) new PrivilegedAction created in try catch Exception block of
>>> clean() method but for this object to be created and to be held
>>> responsible for OOME an Exception(other than OOME) has to be thrown.
>>> 2) No new heap objects are created in the enqueue method nor anywhere in
>>> the deep call stack (VM.addFinalRefCount() etc) so this cannot be a
>>> potential cause.
>>> *Experimental change to java.lang.Reference.java* :
>>> - Put one more guard (try catch with OOME block) in the Reference
>>> Handler Thread which may give the Reference Handler a chance to cleanup.
>>> This is fixing the test failure (several 1000 runs with 0 failures)
>>> - Without the above change the test fails at least 3-5 times for every
>>> 1000 runs.
>>> *PS*: The code change is to a very critical part of the JDK and I am not
>>> fully aware of the consequences of the change, hence seeking expert help
>>> here. Appreciate your time and inputs towards this.
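
The trade-off discussed in this thread can be sketched as follows. This is only a simplified model under stated assumptions, not the actual change in the webrev and not the real `java.lang.ref.Reference` code: `GuardedHandlerSketch`, `Pending` and `runHandler` are hypothetical names, a finite list replaces the real `for(;;)` loop, and the OOME is thrown synthetically rather than by exhausting the heap.

```java
import java.util.ArrayList;
import java.util.List;

public class GuardedHandlerSketch {
    // Hypothetical stand-in for one pending reference to clean and/or enqueue.
    interface Pending {
        void process();
    }

    // Simplified handler loop with the extra guard: an OOME from process()
    // is swallowed so the loop keeps going, but that item's work is lost.
    // Returns how many items were fully processed.
    static int runHandler(List<Pending> queue) {
        int completed = 0;
        for (Pending p : queue) {
            try {
                p.process();
                completed++;
            } catch (OutOfMemoryError ignored) {
                // Deliberately empty: nothing records that this item was skipped.
            }
        }
        return completed;
    }

    public static void main(String[] args) {
        List<Pending> queue = new ArrayList<>();
        queue.add(() -> { });
        // Synthetic OOME; no memory is actually exhausted here.
        queue.add(() -> { throw new OutOfMemoryError("simulated"); });
        queue.add(() -> { });

        int completed = runHandler(queue);
        System.out.println("items queued: " + queue.size());
        System.out.println("items completed: " + completed);
    }
}
```

The loop survives the error, which matches the test passing with the guard in place, but one of the three items is silently dropped. That is the concern raised above: a failed clean or enqueue leaves no indication that it happened.
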