RFR (XS): 8014890 : Reference queues may return more entries than expected
David Holmes
david.holmes at oracle.com
Mon Jul 1 00:43:14 UTC 2013
Hi Thomas,
Sorry for the delay in looking into this deeper but I've been OOTO a bit
this past week.
I'm backing up to the start to explore the apparent problem ...
On 19/06/2013 7:08 AM, Thomas Schatzl wrote:
> Hi all,
>
> can I have reviews for the following change?
>
> It happens if multiple threads are enqueuing and dequeuing reference
> objects into a reference queue, that Reference objects may be enqueued
> at multiple times.
>
> This is because when java.lang.ref.ReferenceQueue.poll() returns and
> inactivates a Reference object, another thread may just be during
> enqueuing it again.
>
> In the test case provided, the two threads that conflict are the
> reference handler thread and the program (main) thread.
>
> Relevant code:
>
> ReferenceHandlerThread.run():
>
> 0: [...]
> 1: ReferenceQueue q = r.queue; // r is the reference
> 2: if (r != ReferenceQueue.NULL)
> 3: q.enqueue().
>
> ReferenceQueue::poll()(reallyPoll()) code (I removed lots of code here)
>
> 1: synchronized(lock) {
> 2: [...]
> 3: r.queue = ReferenceQueue.NULL;
> 3:}
>
> I.e. while ReferenceQueue.poll() sets the Reference's queue to the NULL
> queue so that that reference will not be enqueued again (or at most into
> the NULL queue which is a nop), it happens that if the task switch
> occurs between lines 2 and 3 of the reference handler thread, q still
> contains the old queue reference, and the reference handler thread will
> enqueue the Reference into the original queue again.
Let's set some initial conditions here. For poll() to remove 'r' it must
already have been enqueued on this queue. That means that r.queue ==
ENQUEUED. That means the ReferenceHandler thread would actually enqueue
it to the ENQUEUED instance - which is harmless.
So it seems to me that we must have a race between two enqueue attempts
for a problem to arise here. So lets assume that there is a concurrent
r.enqueue() with the reference handlers attempt to enqueue, and a poll()
is mixed in. Consider the following:
refThread reads target q:
ReferenceQueue q = r.queue; // r is the reference
if (r != ReferenceQueue.NULL)
(Note: the check for NULL is at best an optimization; it isn't needed)
Thread_1 does r.enqueue()
So r.queue == ENQUEUED
Thread_2 does q.poll() and removes r.
r.queue == NULL
refThread continues and executes:
q.enqueue(r)
and so we have enqueued 'r' twice. Which seems to be the problem
scenario observed by the test, as we can then poll() and get 'r' a
second time.
But is it actually a problem if this happens? If I have two concurrent
attempts to enqueue a reference and concurrent attempt to dequeue it
(via poll) it seems quite plausible to see: enqueue, poll, enqueue - and
so the reference ends up in the queue. The test program is of the form:
for (int i = 0; i < iterations; i++) {
queue = new ReferenceQueue<Object>();
for (int j = 0; j < refs.length; j++) {
refs[j] = new WeakReference(new Object(), queue);
}
System.gc(); // enqueues references into the list of
discovered references
// now manually enqueue all of them
for (int j = 0; j < refs.length; j++) {
refs[j].enqueue();
}
// and get them back. There should be exactly numReferences
// entries in the queue now.
int foundReferences = 0;
while (queue.poll() != null) {
foundReferences++;
}
and naively we would assume that until the enqueue loop is complete (at
which point all refs are enqueued) then the ReferenceHandler thread will
not process any of those references as they are still strongly
reachable. If it processes them after then it is a no-op and either way
the poll() will find the exact number of references enqueued.
But that is not guaranteed to happen. As JLS 12.6.1 states:
"Optimizing transformations of a program can be designed that reduce the
number of objects that are reachable to be less than those which would
naively be considered reachable."
So in fact we might find that one or more references are no longer
reachable before their enqueue() operation is invoked and so
consequently we can indeed get this race between the reference handler's
attempt to enqueue and the test programs attempt to enqueue. I would say
this is a bug in the test program as it needs to ensure that the
references are guaranteed to be strongly reachable until after they have
had enqueue() invoked.
So your proposed fix really just masks the invalid assumption that a
ReferenceHandler based enqueue and a direct r.enqueue can't possibly be
concurrent.
Let's consider the case of two concurrent r.enqueue() calls, with an
interleaving poll(). Because Reference.enqueue lacks synchronization on
the reference while reading the queue you can get an interleaving where
the second enqueue() might see NULL, ENQUEUED or the actual queue
depending on the timing. I don't see any issue that requires a code
change. Only an omniscient observer can determine the exact order in
which the two enqueue()'s and the poll() occur, so pretty much any
outcome is valid.
So I don't think there is actually a bug in the reference code per-se,
at least not in relation to this test program.
Now the synchronization is still "all over the place". There are races
due to lack of synchronized (the ref should be locked whenever any of
its fields, ie queue & next, are modified), and lack of volatile on
fields accessed without synchronization. But whether any of those races
are actually a bug is a separate matter and very difficult to determine.
Bugs would arise where multi-field invariants are violated due to lack
of sync, but AFAICS that does not occur. Even JMM issues, reading stale
fields, don't seem to cause any problem here.
Cheers,
David
-----
> You can achieve the same effect by simply calling
> ReferenceQueue.enqueue() (i.e. without the reference handler thread, or
> within the reference handler thread doing the != NULL check), it's just
> that in such a case the "old" ReferenceQueue is stored in some register.
>
> The guard for ReferenceQueue.NULL does not have any effect except for
> possibly saving the virtual call. Simply calling r.enqueue() exhibits
> the same problem.
>
> The proposed solution is to filter out References within
> ReferenceQueue.enqueue() again. At that point we can check whether the
> given Reference is actually meant for this queue or not. Already removed
> References are known to be "inactive" (as enqueue and poll are mutually
> exclusive using a lock), in particular the References' queue is
> different (actually the NULL queue) to the queue it is being enqueued.
>
> This change should pose no change in semantics, as the ReferenceQueue of
> the Reference can only be set in its constructor, and as soon as the
> Reference is removed from a ReferenceQueue, its ReferenceQueue will be
> the NULL queue. (I.e. before this change you could not enqueue such an
> "inactive" Reference multiple times anyway)
>
> (too many References and queues here :)
>
> Webrev with test case
> http://cr.openjdk.java.net/~tschatzl/8014890/webrev/
>
> JIRA:
> https://jbs.oracle.com/bugs/browse/JDK-8014890
>
> bugs.sun.com
> http://bugs.sun.com/view_bug.do?bug_id=8014890
>
> Testing:
> jck, jprt, manual testing
>
> Note that I also need a sponsor to push in case this change is approved.
>
> Thanks,
> Thomas
>
More information about the core-libs-dev
mailing list