RFR: 8055232 (ref) Exceptions while processing Reference pending list

Peter Levart peter.levart at gmail.com
Fri Sep 19 06:43:13 UTC 2014


Hi,

This story has a long tail. It started with:

     https://bugs.openjdk.java.net/browse/JDK-7038914

Some stress tests triggered OOME in ReferenceHandler thread which would 
die. The first attempt at fixing this was the following discussion:

https://www.mail-archive.com/core-libs-dev%40openjdk.java.net/msg16250.html

Which resulted in patch:

     http://hg.openjdk.java.net/jdk8/jdk8/jdk/rev/0b8dab7fec54

This assumed that ReferenceHandler thread doing Object.wait() could be 
interrupted (by stress test - normal application don't do that) and 
failed to allocate InterruptedException object. A jtreg test was 
designed which triggered that situation and a fix would catch OOME and 
ignore it.

But the stress tests (the same or some other, I don't know) apparently 
were not entirely happy with this fix. The following discussion 
describes this:

https://www.mail-archive.com/core-libs-dev%40openjdk.java.net/msg23596.html

The other "unprotected" point at which OOME could be thrown and was 
later confirmed by debugging is (r instanceof Cleaner) test. The 
assumption was that it could trigger Cleaner class loading at 1st 
execution which would cause OOME to be thrown. The fix that finally 
silenced stress tests was the following:

     http://hg.openjdk.java.net/jdk9/dev/jdk/rev/d04102f69d46

This part of code (the j.l.r.Reference class and its members) has 
undergone further changes afterwards for fixing another bug:

     https://bugs.openjdk.java.net/browse/JDK-6857566

With following patch:

     http://hg.openjdk.java.net/jdk9/jdk9/jdk/rev/9934d34ed3c0

But this did not change the code paths - just factored-out the content 
of the loop into a separate method that could be used from outside.

All well until kim.barrett at oracle.com noticed that the 2nd fix 
introduced a potentially illegal situation. There is a 
j.l.r.Reference.Lock inner class and a singleton object assigned to 
static field in j.l.Reference class with the following notice:

     /* Object used to synchronize with the garbage collector.  The 
collector
      * must acquire this lock at the beginning of each collection 
cycle.  It is
      * therefore critical that any code holding this lock complete as 
quickly
      * as possible, allocate no new objects, and avoid calling user code.
      */
     static private class Lock { }
     private static Lock lock = new Lock();

The conflicting part is "allocate no new objects". Catching OOME inside 
a synchronized block holding this lock implies that new objects could be 
allocated. I have a feeling that the 2nd fix prevented that by 
pre-loading the Cleaner class at Reference class initialization time. 
But because it was hard to reproduce the situation where OOME was thrown 
from (r instanceof Cleaner) check, we nevertheless handled this 
hypothetical situation. Perhaps it would be better that we didn't and 
just see if OOME returned after just adding the pre-loading of Cleaner 
class...

So here we are, at an attempt to clean this up:

     https://bugs.openjdk.java.net/browse/JDK-8055232

We can move (r instanceof Cleaner) check outside of synchronized block 
to where it was before the 2nd fix and wait what stress tests will show. 
Another possibility is to move the instanceof check outside of 
synchronized block, but handle the hypothetical OOME by re-linking the 
unlinked reference back into the pending chain:

http://cr.openjdk.java.net/~plevart/jdk9-dev/ReferenceHandlerExceptions/webrev.01/

What would you suggest?

Regards, Peter




More information about the core-libs-dev mailing list