RFR (XS): 8014890 : Reference queues may return more entries than expected

David Holmes david.holmes at oracle.com
Mon Jul 1 00:43:14 UTC 2013

Hi Thomas,

Sorry for the delay in looking into this deeper but I've been OOTO a bit 
this past week.

I'm backing up to the start to explore the apparent problem ...

On 19/06/2013 7:08 AM, Thomas Schatzl wrote:
> Hi all,
>    can I have reviews for the following change?
> It happens if multiple threads are enqueuing and dequeuing reference
> objects into a reference queue, that Reference objects may be enqueued
> at multiple times.
> This is because when java.lang.ref.ReferenceQueue.poll() returns and
> inactivates a Reference object, another thread may just be during
> enqueuing it again.
> In the test case provided, the two threads that conflict are the
> reference handler thread and the program (main) thread.
> Relevant code:
> ReferenceHandlerThread.run():
> 0: [...]
> 1: ReferenceQueue q = r.queue; // r is the reference
> 2: if (r != ReferenceQueue.NULL)
> 3:   q.enqueue().
> ReferenceQueue::poll()(reallyPoll()) code (I removed lots of code here)
> 1: synchronized(lock) {
> 2:   [...]
> 3:   r.queue = ReferenceQueue.NULL;
> 3:}
> I.e. while ReferenceQueue.poll() sets the Reference's queue to the NULL
> queue so that that reference will not be enqueued again (or at most into
> the NULL queue which is a nop), it happens that if the task switch
> occurs between lines 2 and 3 of the reference handler thread, q still
> contains the old queue reference, and the reference handler thread will
> enqueue the Reference into the original queue again.

Let's set some initial conditions here. For poll() to remove 'r' it must 
already have been enqueued on this queue. That means that r.queue == 
ENQUEUED. That means the ReferenceHandler thread would actually enqueue 
it to the ENQUEUED instance - which is harmless.

So it seems to me that we must have a race between two enqueue attempts 
for a problem to arise here. So lets assume that there is a concurrent 
r.enqueue() with the reference handlers attempt to enqueue, and a poll() 
is mixed in. Consider the following:

refThread reads target q:

   ReferenceQueue q = r.queue; // r is the reference
   if (r != ReferenceQueue.NULL)

(Note: the check for NULL is at best an optimization; it isn't needed)

Thread_1 does r.enqueue()

   So r.queue == ENQUEUED

Thread_2 does q.poll() and removes r.

   r.queue == NULL

refThread continues and executes:


and so we have enqueued 'r' twice. Which seems to be the problem 
scenario observed by the test, as we can then poll() and get 'r' a 
second time.

But is it actually a problem if this happens? If I have two concurrent 
attempts to enqueue a reference and concurrent attempt to dequeue it 
(via poll) it seems quite plausible to see: enqueue, poll, enqueue - and 
so the reference ends up in the queue. The test program is of the form:

           for (int i = 0; i < iterations; i++) {
                 queue = new ReferenceQueue<Object>();

                 for (int j = 0; j < refs.length; j++) {
                     refs[j] = new WeakReference(new Object(), queue);

                 System.gc(); // enqueues references into the list of 
discovered references

                 // now manually enqueue all of them
                 for (int j = 0; j < refs.length; j++) {
                 // and get them back. There should be exactly numReferences
                 // entries in the queue now.
                 int foundReferences = 0;
                 while (queue.poll() != null) {

and naively we would assume that until the enqueue loop is complete (at 
which point all refs are enqueued) then the ReferenceHandler thread will 
not process any of those references as they are still strongly 
reachable. If it processes them after then it is a no-op and either way 
the poll() will find the exact number of references enqueued.

But that is not guaranteed to happen. As JLS 12.6.1 states:

"Optimizing transformations of a program can be designed that reduce the 
number of objects that are reachable to be less than those which would 
naively be considered reachable."

So in fact we might find that one or more references are no longer 
reachable before their enqueue() operation is invoked and so 
consequently we can indeed get this race between the reference handler's 
attempt to enqueue and the test programs attempt to enqueue. I would say 
this is a bug in the test program as it needs to ensure that the 
references are guaranteed to be strongly reachable until after they have 
had enqueue() invoked.

So your proposed fix really just masks the invalid assumption that a 
ReferenceHandler based enqueue and a direct r.enqueue can't possibly be 

Let's consider the case of two concurrent r.enqueue() calls, with an 
interleaving poll(). Because Reference.enqueue lacks synchronization on 
the reference while reading the queue you can get an interleaving where 
the second enqueue() might see NULL, ENQUEUED or the actual queue 
depending on the timing. I don't see any issue that requires a code 
change. Only an omniscient observer can determine the exact order in 
which the two enqueue()'s and the poll() occur, so pretty much any 
outcome is valid.

So I don't think there is actually a bug in the reference code per-se, 
at least not in relation to this test program.

Now the synchronization is still "all over the place". There are races 
due to lack of synchronized (the ref should be locked whenever any of 
its fields, ie queue & next, are modified), and lack of volatile on 
fields accessed without synchronization. But whether any of those races 
are actually a bug is a separate matter and very difficult to determine. 
Bugs would arise where multi-field invariants are violated due to lack 
of sync, but AFAICS that does not occur. Even JMM issues, reading stale 
fields, don't seem to cause any problem here.


> You can achieve the same effect by simply calling
> ReferenceQueue.enqueue() (i.e. without the reference handler thread, or
> within the reference handler thread doing the != NULL check), it's just
> that in such a case the "old" ReferenceQueue is stored in some register.
> The guard for ReferenceQueue.NULL does not have any effect except for
> possibly saving the virtual call. Simply calling r.enqueue() exhibits
> the same problem.
> The proposed solution is to filter out References within
> ReferenceQueue.enqueue() again. At that point we can check whether the
> given Reference is actually meant for this queue or not. Already removed
> References are known to be "inactive" (as enqueue and poll are mutually
> exclusive using a lock), in particular the References' queue is
> different (actually the NULL queue) to the queue it is being enqueued.
> This change should pose no change in semantics, as the ReferenceQueue of
> the Reference can only be set in its constructor, and as soon as the
> Reference is removed from a ReferenceQueue, its ReferenceQueue will be
> the NULL queue. (I.e. before this change you could not enqueue such an
> "inactive" Reference multiple times anyway)
> (too many References and queues here :)
> Webrev with test case
> http://cr.openjdk.java.net/~tschatzl/8014890/webrev/
> https://jbs.oracle.com/bugs/browse/JDK-8014890
> bugs.sun.com
> http://bugs.sun.com/view_bug.do?bug_id=8014890
> Testing:
> jck, jprt, manual testing
> Note that I also need a sponsor to push in case this change is approved.
> Thanks,
>    Thomas

More information about the core-libs-dev mailing list