jmx-dev [PATCH] JDK-7170447: Intermittent DeadListenerTest.java failure

Jaroslav Bachorik jaroslav.bachorik at oracle.com
Thu Jan 10 01:34:36 PST 2013


On 01/10/2013 10:05 AM, shanliang wrote:
> Jaroslav Bachorik wrote:
>> On 01/09/2013 03:25 PM, shanliang wrote:
>>  
>>> Jaroslav Bachorik wrote:
>>>    
>>>> On 01/09/2013 02:44 PM, shanliang wrote:
>>>>  
>>>>      
>>>>> Let's forget the JMX implementation at first. If an MBean is
>>>>> unregistered, a user at client side calls "removeNotificationListener"
>>>>> on the MBean, what should happen? if the user calls "isRegistered" on
>>>>> the MBean, what should happen?
>>>>>
>>>>> I have done 2 tests, I used only one thread:
>>>>>
>>>>> 1)
>>>>> ......
>>>>> localServer.unregisterMBean(myMBean);
>>>>> boolean isRegistered = remoteClientServer.isRegistered(myMBean));
>>>>>
>>>>> I got isRegistered = false;
>>>>>
>>>>> 2)
>>>>> ......
>>>>> localServer.unregisterMBean(myMBean);
>>>>> System.out.println("isRegistered =
>>>>> "+remoteClientServer.sRegistered(myMBean));
>>>>> remoteClientServer.removeNotificationListener(myMBean, listener);
>>>>>
>>>>> I did not get an exception.
>>>>>
>>>>> The 1) told that the client could know the MBean was unregistered,
>>>>> then
>>>>> the client should throw an exception for the call of
>>>>> "removeNotificationListener" in 2).
>>>>>             
>>>> Yes, but then it would not test the listener leakage as it was supposed
>>>> to test but rather the fact that the client throws the appropriate
>>>> exception. The fact that the mbean was unregistered does not
>>>> necessarily
>>>> mean that the listeners were released. The main problem remains - the
>>>> listeners are being cleaned-up asynchronously and the clean-up process
>>>> might race against the other uses of the JMX API.
>>>>         
>>> client.removeNotificationListener is not a right way here to test
>>> listener leak, we could use some other ways, for example we keep the
>>> listener in a weak reference, then after the mbean is removed, the weak
>>> reference should be empty after some time. Another way is like
>>> DeadListenerTest does to check whether clean has done at server side:
>>> use reflection to get the "listenerMap" at server side and make sure it
>>> is empty, but this need to add a private method to the class
>>> ClientNotifForwarder.
>>>     
>>
>> There will still be problems with timing. You need either to wait for
>> the GC to kick in to clean up the weak ref. And the listenerMap will not
>> be purged of the unregistered MBean listeners until the notification is
>> generated, processed on the ClientNotificationForwarder and forwarded to
>> the server. So there goes the timing issue again.
>>
>> The problem is that the "unregisterMBean" operation does not guarantee
>> that the listeners have been unregistered at the time it returns. So,
>> one way or the other we will need to wait an arbitrary amount of time
>> before checking for the memory leak.
>>   
> Yes we need to wait, but you can use a cycle like:
>        long maxWaitingTime = 3000;
>        long startTime = System.currentTimeMillis();
>        while ( weakReference.get != null
>                && System.currentTimeMillis() < startTime +
> maxWaitingTime) {
>            System.gc();
>            Thread.sleep(100);
>            System.gc();
>        }
> 
>       if (weakReference.get != null) {
>          // failed
>       }
> 

Sounds reasonable. I'll update the test.

-JB-

> Shanliang
>> -JB-
>>
>>  
>>> I think we have 3 things to do here:
>>> 1) modify the test to not use removeNotificationListener for testing
>>> listener leak
>>> 2) create a new bug about a client does not throw an exception after an
>>> mbean is unregistered
>>> 3) create a bug about a client does not throw a same exception as at
>>> server side.
>>>
>>> I will do 2) and 3), if you like you can continue 1), it might need to
>>> do fix also in the JMX implementation.
>>>
>>> Shanliang
>>>    
>>>>  
>>>>      
>>>>> The test "DeadListenerTest" got passed in some machines because of the
>>>>> timeout for waiting a notification. I think its failure just tells
>>>>> a new
>>>>> bug.
>>>>>
>>>>> To set a longer timeout just hides the real bug, and the test might
>>>>> fail
>>>>> again one day if running condition is changed and you might need
>>>>> longer
>>>>> timeout again.
>>>>>             
>>>> Yes, I agree with you that extending the timeout just lessens the
>>>> likelihood of the race condition and does not prevent it.
>>>>
>>>>  
>>>>      
>>>>> Shanliang
>>>>>
>>>>> Jaroslav Bachorik wrote:
>>>>>           
>>>>>> On 01/09/2013 11:08 AM, shanliang wrote:
>>>>>>  
>>>>>>               
>>>>>>> Jaroslav Bachorik wrote:
>>>>>>>                      
>>>>>>>> On 01/09/2013 09:40 AM, shanliang wrote:
>>>>>>>>  
>>>>>>>>                            
>>>>>>>>> I still have no idea why the test failed, but I do not see why a
>>>>>>>>> longer
>>>>>>>>> timeout can fix the test. Have you reproduced the problem and
>>>>>>>>> tested
>>>>>>>>> your fix? if yes then possible the long timeout hided a real
>>>>>>>>> problem.
>>>>>>>>>                                         
>>>>>>>> Yes, I can reproduce the problem (using the fastbuild bits and
>>>>>>>> -Xcomp
>>>>>>>> switch) and verify that the fix makes the test pass.
>>>>>>>>
>>>>>>>> The ClientNotifForwarder scans the notifications for
>>>>>>>> MBeanServerNotification.UNREGISTRATION_NOTIFICATION and removes the
>>>>>>>> appropriate notification listeners in a separate thread. Thus,
>>>>>>>> calling
>>>>>>>> "removeNotificationListener" on the main thread is prone to racing.
>>>>>>>>                                 
>>>>>>> It is true that ClientNotifForwarder scans the notifications for
>>>>>>> MBeanServerNotification.UNREGISTRATION_NOTIFICATION and removes the
>>>>>>> appropriate notification listeners in a separate thread. This is
>>>>>>> for a
>>>>>>> client connection to do clean if a user never calls
>>>>>>> removeNotificationListener.
>>>>>>>
>>>>>>> But calling directly removeNotificationListener from a client should
>>>>>>> still get exception if the clean has not been done. As I said, if
>>>>>>> the
>>>>>>> client checked and found the listener was still there, then the
>>>>>>> client
>>>>>>> sent a request to its server to remove the listener at server side,
>>>>>>> the
>>>>>>> server should find that the MBean in question was not registered,
>>>>>>> so the
>>>>>>> server should throw an exception. The bug might be here.
>>>>>>>                         
>>>>>> This won't work. The server side listeners are removed upon receiving
>>>>>> the "unregistered" notification which is delivered from the
>>>>>> ClientNotificationForwarder and it may have not run yet (since it
>>>>>> runs
>>>>>> in a separate executor thread). The result is that the attempt to
>>>>>> remove
>>>>>> the notification listener on the server will succeed as well failing
>>>>>> the
>>>>>> test subsequently.
>>>>>>
>>>>>> -JB-
>>>>>>
>>>>>>  
>>>>>>               
>>>>>>> Shanliang
>>>>>>>                      
>>>>>>>>  
>>>>>>>>                            
>>>>>>>>> The timeout you made longer was used to wait a notification which
>>>>>>>>> should
>>>>>>>>> never arrive.
>>>>>>>>>                                         
>>>>>>>> Well, it can be used to allow more time to process the "unregister"
>>>>>>>> notification, too.
>>>>>>>>
>>>>>>>> When I think more of this I am more inclined to fix the race
>>>>>>>> condition.
>>>>>>>> An updated webrev will follow.
>>>>>>>>
>>>>>>>>  
>>>>>>>>                            
>>>>>>>>> To remove a listener from a client side, we did:
>>>>>>>>> 1) at client side, check whether it was added in the client side
>>>>>>>>> 2) at server side, check whether the MBean in question was
>>>>>>>>> registered in
>>>>>>>>> the MBeanServer (!!!)
>>>>>>>>> 3) at server side, check whether the listener was added.
>>>>>>>>>
>>>>>>>>> So 2) tells that we did not rely on a "unregister" notification.
>>>>>>>>> Anyway,
>>>>>>>>> if you use a SAME thread to call "unregister" operation to
>>>>>>>>> unregister an
>>>>>>>>> mbean, then any following call (without any time break) to use the
>>>>>>>>> mbean
>>>>>>>>> should fail, like "removeNotificationListener", "isRegistered"
>>>>>>>>> etc.
>>>>>>>>>
>>>>>>>>> I do see a bug here, if we remove a listener from a non-existing
>>>>>>>>> MBeam,
>>>>>>>>> we get "ListenerNotFoundException" at a client side, but get
>>>>>>>>> "InstanceNotFoundException" at server side, I think we should
>>>>>>>>> create a
>>>>>>>>> bug, because both implemented the same interface
>>>>>>>>> MBeanServerConnection.
>>>>>>>>>                                         
>>>>>>>> Yes, it is rather inconsistent.
>>>>>>>>
>>>>>>>> -JB-
>>>>>>>>
>>>>>>>>  
>>>>>>>>                            
>>>>>>>>> Shanliang
>>>>>>>>>
>>>>>>>>> Jaroslav Bachorik wrote:
>>>>>>>>>                                     
>>>>>>>>>> Looking for review and a sponsor.
>>>>>>>>>>
>>>>>>>>>> Webrev at http://cr.openjdk.java.net/~jbachorik/7170447/webrev.00
>>>>>>>>>>
>>>>>>>>>> In this issue the timing is the problem.
>>>>>>>>>> MBeanServer.unregisterMBean()
>>>>>>>>>> fires the "unregister" notification which is sent to the server
>>>>>>>>>> asynchronously. Thus it may happen that the "unregister"
>>>>>>>>>> notification
>>>>>>>>>> has not been yet processed at the time of invoking
>>>>>>>>>> removeNotificationListener() and the notification listeners
>>>>>>>>>> hasn't
>>>>>>>>>> been
>>>>>>>>>> cleaned up leading to the test failure.
>>>>>>>>>>
>>>>>>>>>> There is no synchronization between the client and the server and
>>>>>>>>>> such
>>>>>>>>>> race condition can occur occasionally. Normally, the execution is
>>>>>>>>>> fast
>>>>>>>>>> enough to behave like the "unregister" notification is processed
>>>>>>>>>> synchronously with the unregisterMBean() operation but it seems
>>>>>>>>>> that
>>>>>>>>>> using fastdebug Server VM bits with the -Xcomp option strains
>>>>>>>>>> the CPU
>>>>>>>>>> enough to make this problem appear.
>>>>>>>>>>
>>>>>>>>>> There is no proper fix for this - the only thing that work is
>>>>>>>>>> waiting a
>>>>>>>>>> bit longer in the main thread to give the notification processing
>>>>>>>>>> thread
>>>>>>>>>> some time to clean up the listeners.
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>>
>>>>>>>>>> -JB-
>>>>>>>>>>                                                   
>>>>>>>>                                 
>>>>>>>                         
>>>>>>                   
>>>>>             
>>>>         
>>>     
>>
>>   
> 
> 



More information about the serviceability-dev mailing list