jmx-dev [PATCH] JDK-7146162: javax/management/remote/mandatory/connection/BrokenConnectionTest.java failing intermittently

Wed Dec 19 02:00:49 PST 2012

On 12/18/2012 08:21 PM, shanliang wrote:
> It is good to call
>    gotIOException((IOException)e);
> 
> but no need to call in next
>    restart((IOException)e);
> 
> the method "gotIOException" should call "restart" if necessary,
> otherwise "restart" may be called 2 times and generated 2 times
> "Reconnection notification".

Yes, it seems that the restart() will be called exactly twice or not at
all. I will remove the restart() call at the line 199.

Updated webrev to
http://cr.openjdk.java.net/~jbachorik/JDK-7146162/webrev.05

-JB-

> 
> Shanliang
> 
>   Jaroslav Bachorik wrote:
>> I've updated the webrev - no additional synchronisation is required.
>> Please, review:
>> http://cr.openjdk.java.net/~jbachorik/JDK-7146162/webrev.04
>>
>> -JB-
>>
>> On 10/31/2012 01:59 PM, Jaroslav Bachorik wrote:
>>  
>>> On 10/30/2012 05:10 PM, Eamonn McManus wrote:
>>>    
>>>> This area has historically caused a lot of problems and I am not
>>>> surprised to see that there are more. While I don't know what the best
>>>> way to fix the issue at hand is, I don't think this proposed change is
>>>> it. The reason is that the checkConnection and gotIOException methods
>>>> do blocking operations, and it is generally not a good idea to do
>>>> blocking operations in a synchronized block. Is there a way to avoid
>>>> the race condition without that?
>>>>       
>>> The important part is calling the gotIOException() method even from the
>>> heart-beat checker. I've tried to return the synchronization block back
>>> to the original state and the test passes with the check period of 10ms
>>> which pushes the probability of data races rather high.
>>>
>>> It seems that the worst that can happen would be one additional
>>> checkConnection() call - in case when the state gets set to TERMINATED
>>> by another thread right after it has been checked in the synchronized
>>> block the loop condition might evaluate to true if the state value has
>>> not been flushed yet.
>>>
>>> I could change the "state" variable to be volatile but I am not sure
>>> whether it's worth the hassle.
>>>
>>> -JB-
>>>
>>>    
>>>> Éamonn
>>>>
>>>>
>>>> 2012/10/29 Jaroslav Bachorik <jaroslav.bachorik at oracle.com>:
>>>>      
>>>>> I am looking for a sponsor and reviewers.
>>>>>
>>>>> The webrev is available at
>>>>> http://cr.openjdk.java.net/~jbachorik/JDK-7146162/webrev.03
>>>>>
>>>>> As explained in the issue the failure is caused by the RMI connection
>>>>> heart-beat thread racing against the thread executing the MBean
>>>>> operation and encountering the IOException. The heart beat thread sets
>>>>> the the admin state to "terminated" but does not send the failure
>>>>> notifications. On the other side the operation thread determines the
>>>>> state to be already terminated and skips the notifications as well.
>>>>>
>>>>> The fix adds the call to handle an ioexception, including sending the
>>>>> failure notifications, to the hear-beat connection failure handler.
>>>>> Also
>>>>> it widens the synchronized block since the whole code block
>>>>> checking for
>>>>> the connection failure and recovering must be run atomically,
>>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> -JB-
>>>>>         
>>
>>   
> 
>