RFR [8038333] java/lang/ref/EarlyTimeout.java failed

Ivan Gerasimov ivan.gerasimov at oracle.com
Thu Mar 27 07:26:19 UTC 2014


David, Mandy, thank you for comments!

Here's what we want to achieve in the test:
Two EarlyTimeout child threads should both be blocked in remove(TIMEOUT) 
at the moment the weakReference gets enqueued.
This is the situation, when the bug 6853696 manifests itself.

If we made sure the weakReference is enqueued before remove() is called 
by the child threads, then they both would wait for specified amount of 
time even without the fix.

That's why we make the main thread wait for the children to start before 
calling System.gc().
I understand it may seem strange at the first glance.
But we want the child thread to be inside the remove() function when the 
main thread is unblocked, that's why it is done this way.

Unfortunately, we cannot easily make sure that remove() was called by 
the child threads at the moment the GC decides to enqueue the weakReference.
Therefore, the test is probabilistic by its nature, and we cannot be 
absolutely sure that at least one child thread will get non-null 
reference returned from remove().

I'm withdrawing the second webrev: While moving startedSignal.await() 
after System.gc() reduces the number of cases when nonNullRefCount == 0, 
it also greatly decreases the chances of occurring the situation the 
test was designed for.

I believe the correct fix here is to ignore the (quite rare) situation 
when nonNullRefCount upon the completion.
I suggest to return to the very first trivial fix:
http://cr.openjdk.java.net/~igerasim/8038333/1/webrev/

Sincerely yours,
Ivan

On 27.03.2014 4:30, Mandy Chung wrote:
> On 3/26/2014 3:57 PM, David Holmes wrote:
>> Ivan,
>>
>> I think the problem is that the EarlyTimeout threads can complete 
>> their remove(TIMEOUT) before the main thread has started them all, 
>> cleared the reference and called System.gc().
>>
>> Depending on exactly what is being tested, the EarlyTimeout threads 
>> may need to wait on another latch that is signalled by the main 
>> thread after the gc() call returns. Still no guarantee that the gc 
>> will do its work before the timeout elapses.
>>
>
> This is similar to what I have been thinking.  I believe the 
> EarlyTimeout threads don't need the startedSignal.countDown; instead 
> it can have a signal latch with 1 count.  The EarlyTimeout threads 
> awaits on the signal latch.  The main thread will call 
> signal.countDown after System.gc() and we can add a check 
> weakReference.isEnqueued to make sure before we awake the threads to 
> proceed the queue.remove call.
>
> Ivan - for the error message, perhaps you can simply do this:
>
>       if (nonNullRefCount != 1) {
>            throw new RuntimeException(nonNullRefCount + " references 
> were removed from queue");
>       }
>
>
> Mandy
>
>> David
>>
>> On 27/03/2014 6:18 AM, Ivan Gerasimov wrote:
>>> Thank you Mandy!
>>>
>>>> Are you able to reproduce the test failure?
>>>
>>> Yes, I could easily reproduce the failure when I reduced the timeout to
>>> 10 ms.
>>> With the timeout reduced, the test fails every third time on my 
>>> machine.
>>>
>>>> I think the test verifies that only one thread gets the reference is a
>>>> good test.
>>>>
>>> But if none of the threads could get the reference, it should not cause
>>> the failure of the test.
>>> It only means that during this particular run we could not have tested
>>> what we needed.
>>> We could retry, but I'm not sure it's worth complicating the test.
>>> It's easier to ignore the failure, taking into account that it happens
>>> rarely.
>>>
>>>> I think the race is due to the threads get to call queue.remove as
>>>> soon as both threads decrement the count of the latch that can be well
>>>> before the reference is enqueued.
>>>>
>>> The problem is that we have no way to block the main thread until there
>>> is a reference in the queue.
>>> I improved the situation a bit, having moved the await() after the call
>>> of gc().
>>>
>>>> It'd be good to add additional information in the test to help
>>>> diagnosing test failure.
>>>>
>>> I added reporting to stderr about being unable to remove a reference
>>> from the queue.
>>> I believe we shouldn't treat it as an error, and should simply 
>>> ignore it.
>>>
>>> Would you please have a look at the updated webrev:
>>> http://cr.openjdk.java.net/~igerasim/8038333/1/webrev/
>>>
>>> Sincerely yours,
>>> Ivan
>>>
>>>
>
>
>




More information about the core-libs-dev mailing list