RFR (XS): 8014890 : Reference queues may return more entries than expected

David Holmes david.holmes at oracle.com
Mon Jul 1 07:51:27 UTC 2013


Well you can ignore what I wrote below - sorry. Somehow I got it in my 
head that multiple enqueue's were intended/supported when of course they 
are not. :(

So the proposed fix is okay - though I'd simplify the comment to just:

// Check that since getting the lock this reference hasn't already been
// enqueued (and even then removed)

The synchronization is problematic as I mention below but there is no 
easy fix due to the lock-ordering problem, and any attempt at such a fix 
would be much riskier. So this fix is fine - thanks.

David
-----

On 1/07/2013 10:43 AM, David Holmes wrote:
> Hi Thomas,
>
> Sorry for the delay in looking into this deeper but I've been OOTO a bit
> this past week.
>
> I'm backing up to the start to explore the apparent problem ...
>
> On 19/06/2013 7:08 AM, Thomas Schatzl wrote:
>> Hi all,
>>
>>    can I have reviews for the following change?
>>
>> It happens if multiple threads are enqueuing and dequeuing reference
>> objects into a reference queue, that Reference objects may be enqueued
>> at multiple times.
>>
>> This is because when java.lang.ref.ReferenceQueue.poll() returns and
>> inactivates a Reference object, another thread may just be during
>> enqueuing it again.
>>
>> In the test case provided, the two threads that conflict are the
>> reference handler thread and the program (main) thread.
>>
>> Relevant code:
>>
>> ReferenceHandlerThread.run():
>>
>> 0: [...]
>> 1: ReferenceQueue q = r.queue; // r is the reference
>> 2: if (r != ReferenceQueue.NULL)
>> 3:   q.enqueue().
>>
>> ReferenceQueue::poll()(reallyPoll()) code (I removed lots of code here)
>>
>> 1: synchronized(lock) {
>> 2:   [...]
>> 3:   r.queue = ReferenceQueue.NULL;
>> 3:}
>>
>> I.e. while ReferenceQueue.poll() sets the Reference's queue to the NULL
>> queue so that that reference will not be enqueued again (or at most into
>> the NULL queue which is a nop), it happens that if the task switch
>> occurs between lines 2 and 3 of the reference handler thread, q still
>> contains the old queue reference, and the reference handler thread will
>> enqueue the Reference into the original queue again.
>
> Let's set some initial conditions here. For poll() to remove 'r' it must
> already have been enqueued on this queue. That means that r.queue ==
> ENQUEUED. That means the ReferenceHandler thread would actually enqueue
> it to the ENQUEUED instance - which is harmless.
>
> So it seems to me that we must have a race between two enqueue attempts
> for a problem to arise here. So lets assume that there is a concurrent
> r.enqueue() with the reference handlers attempt to enqueue, and a poll()
> is mixed in. Consider the following:
>
> refThread reads target q:
>
>    ReferenceQueue q = r.queue; // r is the reference
>    if (r != ReferenceQueue.NULL)
>
> (Note: the check for NULL is at best an optimization; it isn't needed)
>
> Thread_1 does r.enqueue()
>
>    So r.queue == ENQUEUED
>
> Thread_2 does q.poll() and removes r.
>
>    r.queue == NULL
>
> refThread continues and executes:
>
>    q.enqueue(r)
>
> and so we have enqueued 'r' twice. Which seems to be the problem
> scenario observed by the test, as we can then poll() and get 'r' a
> second time.
>
> But is it actually a problem if this happens? If I have two concurrent
> attempts to enqueue a reference and concurrent attempt to dequeue it
> (via poll) it seems quite plausible to see: enqueue, poll, enqueue - and
> so the reference ends up in the queue. The test program is of the form:
>
>            for (int i = 0; i < iterations; i++) {
>                  queue = new ReferenceQueue<Object>();
>
>                  for (int j = 0; j < refs.length; j++) {
>                      refs[j] = new WeakReference(new Object(), queue);
>                  }
>
>                  System.gc(); // enqueues references into the list of
> discovered references
>
>                  // now manually enqueue all of them
>                  for (int j = 0; j < refs.length; j++) {
>                      refs[j].enqueue();
>                  }
>                  // and get them back. There should be exactly
> numReferences
>                  // entries in the queue now.
>                  int foundReferences = 0;
>                  while (queue.poll() != null) {
>                      foundReferences++;
>                  }
>
> and naively we would assume that until the enqueue loop is complete (at
> which point all refs are enqueued) then the ReferenceHandler thread will
> not process any of those references as they are still strongly
> reachable. If it processes them after then it is a no-op and either way
> the poll() will find the exact number of references enqueued.
>
> But that is not guaranteed to happen. As JLS 12.6.1 states:
>
> "Optimizing transformations of a program can be designed that reduce the
> number of objects that are reachable to be less than those which would
> naively be considered reachable."
>
> So in fact we might find that one or more references are no longer
> reachable before their enqueue() operation is invoked and so
> consequently we can indeed get this race between the reference handler's
> attempt to enqueue and the test programs attempt to enqueue. I would say
> this is a bug in the test program as it needs to ensure that the
> references are guaranteed to be strongly reachable until after they have
> had enqueue() invoked.
>
> So your proposed fix really just masks the invalid assumption that a
> ReferenceHandler based enqueue and a direct r.enqueue can't possibly be
> concurrent.
>
> Let's consider the case of two concurrent r.enqueue() calls, with an
> interleaving poll(). Because Reference.enqueue lacks synchronization on
> the reference while reading the queue you can get an interleaving where
> the second enqueue() might see NULL, ENQUEUED or the actual queue
> depending on the timing. I don't see any issue that requires a code
> change. Only an omniscient observer can determine the exact order in
> which the two enqueue()'s and the poll() occur, so pretty much any
> outcome is valid.
>
> So I don't think there is actually a bug in the reference code per-se,
> at least not in relation to this test program.
>
> Now the synchronization is still "all over the place". There are races
> due to lack of synchronized (the ref should be locked whenever any of
> its fields, ie queue & next, are modified), and lack of volatile on
> fields accessed without synchronization. But whether any of those races
> are actually a bug is a separate matter and very difficult to determine.
> Bugs would arise where multi-field invariants are violated due to lack
> of sync, but AFAICS that does not occur. Even JMM issues, reading stale
> fields, don't seem to cause any problem here.
>
> Cheers,
> David
> -----
>
>
>
>> You can achieve the same effect by simply calling
>> ReferenceQueue.enqueue() (i.e. without the reference handler thread, or
>> within the reference handler thread doing the != NULL check), it's just
>> that in such a case the "old" ReferenceQueue is stored in some register.
>>
>> The guard for ReferenceQueue.NULL does not have any effect except for
>> possibly saving the virtual call. Simply calling r.enqueue() exhibits
>> the same problem.
>>
>> The proposed solution is to filter out References within
>> ReferenceQueue.enqueue() again. At that point we can check whether the
>> given Reference is actually meant for this queue or not. Already removed
>> References are known to be "inactive" (as enqueue and poll are mutually
>> exclusive using a lock), in particular the References' queue is
>> different (actually the NULL queue) to the queue it is being enqueued.
>>
>> This change should pose no change in semantics, as the ReferenceQueue of
>> the Reference can only be set in its constructor, and as soon as the
>> Reference is removed from a ReferenceQueue, its ReferenceQueue will be
>> the NULL queue. (I.e. before this change you could not enqueue such an
>> "inactive" Reference multiple times anyway)
>>
>> (too many References and queues here :)
>>
>> Webrev with test case
>> http://cr.openjdk.java.net/~tschatzl/8014890/webrev/
>>
>> JIRA:
>> https://jbs.oracle.com/bugs/browse/JDK-8014890
>>
>> bugs.sun.com
>> http://bugs.sun.com/view_bug.do?bug_id=8014890
>>
>> Testing:
>> jck, jprt, manual testing
>>
>> Note that I also need a sponsor to push in case this change is approved.
>>
>> Thanks,
>>    Thomas
>>



More information about the core-libs-dev mailing list