jmx-dev [PATCH] JDK-7170447: Intermittent DeadListenerTest.java failure

shanliang shanliang.jiang at oracle.com
Thu Jan 10 01:05:11 PST 2013


Jaroslav Bachorik wrote:
> On 01/09/2013 03:25 PM, shanliang wrote:
>   
>> Jaroslav Bachorik wrote:
>>     
>>> On 01/09/2013 02:44 PM, shanliang wrote:
>>>  
>>>       
>>>> Let's forget the JMX implementation at first. If an MBean is
>>>> unregistered, a user at client side calls "removeNotificationListener"
>>>> on the MBean, what should happen? if the user calls "isRegistered" on
>>>> the MBean, what should happen?
>>>>
>>>> I have done 2 tests, I used only one thread:
>>>>
>>>> 1)
>>>> ......
>>>> localServer.unregisterMBean(myMBean);
>>>> boolean isRegistered = remoteClientServer.isRegistered(myMBean));
>>>>
>>>> I got isRegistered = false;
>>>>
>>>> 2)
>>>> ......
>>>> localServer.unregisterMBean(myMBean);
>>>> System.out.println("isRegistered =
>>>> "+remoteClientServer.sRegistered(myMBean));
>>>> remoteClientServer.removeNotificationListener(myMBean, listener);
>>>>
>>>> I did not get an exception.
>>>>
>>>> The 1) told that the client could know the MBean was unregistered, then
>>>> the client should throw an exception for the call of
>>>> "removeNotificationListener" in 2).
>>>>     
>>>>         
>>> Yes, but then it would not test the listener leakage as it was supposed
>>> to test but rather the fact that the client throws the appropriate
>>> exception. The fact that the mbean was unregistered does not necessarily
>>> mean that the listeners were released. The main problem remains - the
>>> listeners are being cleaned-up asynchronously and the clean-up process
>>> might race against the other uses of the JMX API.
>>>   
>>>       
>> client.removeNotificationListener is not a right way here to test
>> listener leak, we could use some other ways, for example we keep the
>> listener in a weak reference, then after the mbean is removed, the weak
>> reference should be empty after some time. Another way is like
>> DeadListenerTest does to check whether clean has done at server side:
>> use reflection to get the "listenerMap" at server side and make sure it
>> is empty, but this need to add a private method to the class
>> ClientNotifForwarder.
>>     
>
> There will still be problems with timing. You need either to wait for
> the GC to kick in to clean up the weak ref. And the listenerMap will not
> be purged of the unregistered MBean listeners until the notification is
> generated, processed on the ClientNotificationForwarder and forwarded to
> the server. So there goes the timing issue again.
>
> The problem is that the "unregisterMBean" operation does not guarantee
> that the listeners have been unregistered at the time it returns. So,
> one way or the other we will need to wait an arbitrary amount of time
> before checking for the memory leak.
>   
Yes we need to wait, but you can use a cycle like:
        long maxWaitingTime = 3000;
        long startTime = System.currentTimeMillis();
        while ( weakReference.get != null
                && System.currentTimeMillis() < startTime + 
maxWaitingTime) {
            System.gc();
            Thread.sleep(100);
            System.gc();
        }

       if (weakReference.get != null) {
          // failed
       }

Shanliang
> -JB-
>
>   
>> I think we have 3 things to do here:
>> 1) modify the test to not use removeNotificationListener for testing
>> listener leak
>> 2) create a new bug about a client does not throw an exception after an
>> mbean is unregistered
>> 3) create a bug about a client does not throw a same exception as at
>> server side.
>>
>> I will do 2) and 3), if you like you can continue 1), it might need to
>> do fix also in the JMX implementation.
>>
>> Shanliang
>>     
>>>  
>>>       
>>>> The test "DeadListenerTest" got passed in some machines because of the
>>>> timeout for waiting a notification. I think its failure just tells a new
>>>> bug.
>>>>
>>>> To set a longer timeout just hides the real bug, and the test might fail
>>>> again one day if running condition is changed and you might need longer
>>>> timeout again.
>>>>     
>>>>         
>>> Yes, I agree with you that extending the timeout just lessens the
>>> likelihood of the race condition and does not prevent it.
>>>
>>>  
>>>       
>>>> Shanliang
>>>>
>>>> Jaroslav Bachorik wrote:
>>>>    
>>>>         
>>>>> On 01/09/2013 11:08 AM, shanliang wrote:
>>>>>  
>>>>>      
>>>>>           
>>>>>> Jaroslav Bachorik wrote:
>>>>>>           
>>>>>>             
>>>>>>> On 01/09/2013 09:40 AM, shanliang wrote:
>>>>>>>  
>>>>>>>               
>>>>>>>               
>>>>>>>> I still have no idea why the test failed, but I do not see why a
>>>>>>>> longer
>>>>>>>> timeout can fix the test. Have you reproduced the problem and tested
>>>>>>>> your fix? if yes then possible the long timeout hided a real
>>>>>>>> problem.
>>>>>>>>                         
>>>>>>>>                 
>>>>>>> Yes, I can reproduce the problem (using the fastbuild bits and -Xcomp
>>>>>>> switch) and verify that the fix makes the test pass.
>>>>>>>
>>>>>>> The ClientNotifForwarder scans the notifications for
>>>>>>> MBeanServerNotification.UNREGISTRATION_NOTIFICATION and removes the
>>>>>>> appropriate notification listeners in a separate thread. Thus,
>>>>>>> calling
>>>>>>> "removeNotificationListener" on the main thread is prone to racing.
>>>>>>>                   
>>>>>>>               
>>>>>> It is true that ClientNotifForwarder scans the notifications for
>>>>>> MBeanServerNotification.UNREGISTRATION_NOTIFICATION and removes the
>>>>>> appropriate notification listeners in a separate thread. This is for a
>>>>>> client connection to do clean if a user never calls
>>>>>> removeNotificationListener.
>>>>>>
>>>>>> But calling directly removeNotificationListener from a client should
>>>>>> still get exception if the clean has not been done. As I said, if the
>>>>>> client checked and found the listener was still there, then the client
>>>>>> sent a request to its server to remove the listener at server side,
>>>>>> the
>>>>>> server should find that the MBean in question was not registered,
>>>>>> so the
>>>>>> server should throw an exception. The bug might be here.
>>>>>>             
>>>>>>             
>>>>> This won't work. The server side listeners are removed upon receiving
>>>>> the "unregistered" notification which is delivered from the
>>>>> ClientNotificationForwarder and it may have not run yet (since it runs
>>>>> in a separate executor thread). The result is that the attempt to
>>>>> remove
>>>>> the notification listener on the server will succeed as well failing
>>>>> the
>>>>> test subsequently.
>>>>>
>>>>> -JB-
>>>>>
>>>>>  
>>>>>      
>>>>>           
>>>>>> Shanliang
>>>>>>           
>>>>>>             
>>>>>>>  
>>>>>>>               
>>>>>>>               
>>>>>>>> The timeout you made longer was used to wait a notification which
>>>>>>>> should
>>>>>>>> never arrive.
>>>>>>>>                         
>>>>>>>>                 
>>>>>>> Well, it can be used to allow more time to process the "unregister"
>>>>>>> notification, too.
>>>>>>>
>>>>>>> When I think more of this I am more inclined to fix the race
>>>>>>> condition.
>>>>>>> An updated webrev will follow.
>>>>>>>
>>>>>>>  
>>>>>>>               
>>>>>>>               
>>>>>>>> To remove a listener from a client side, we did:
>>>>>>>> 1) at client side, check whether it was added in the client side
>>>>>>>> 2) at server side, check whether the MBean in question was
>>>>>>>> registered in
>>>>>>>> the MBeanServer (!!!)
>>>>>>>> 3) at server side, check whether the listener was added.
>>>>>>>>
>>>>>>>> So 2) tells that we did not rely on a "unregister" notification.
>>>>>>>> Anyway,
>>>>>>>> if you use a SAME thread to call "unregister" operation to
>>>>>>>> unregister an
>>>>>>>> mbean, then any following call (without any time break) to use the
>>>>>>>> mbean
>>>>>>>> should fail, like "removeNotificationListener", "isRegistered" etc.
>>>>>>>>
>>>>>>>> I do see a bug here, if we remove a listener from a non-existing
>>>>>>>> MBeam,
>>>>>>>> we get "ListenerNotFoundException" at a client side, but get
>>>>>>>> "InstanceNotFoundException" at server side, I think we should
>>>>>>>> create a
>>>>>>>> bug, because both implemented the same interface
>>>>>>>> MBeanServerConnection.
>>>>>>>>                         
>>>>>>>>                 
>>>>>>> Yes, it is rather inconsistent.
>>>>>>>
>>>>>>> -JB-
>>>>>>>
>>>>>>>  
>>>>>>>               
>>>>>>>               
>>>>>>>> Shanliang
>>>>>>>>
>>>>>>>> Jaroslav Bachorik wrote:
>>>>>>>>                      
>>>>>>>>                 
>>>>>>>>> Looking for review and a sponsor.
>>>>>>>>>
>>>>>>>>> Webrev at http://cr.openjdk.java.net/~jbachorik/7170447/webrev.00
>>>>>>>>>
>>>>>>>>> In this issue the timing is the problem.
>>>>>>>>> MBeanServer.unregisterMBean()
>>>>>>>>> fires the "unregister" notification which is sent to the server
>>>>>>>>> asynchronously. Thus it may happen that the "unregister"
>>>>>>>>> notification
>>>>>>>>> has not been yet processed at the time of invoking
>>>>>>>>> removeNotificationListener() and the notification listeners hasn't
>>>>>>>>> been
>>>>>>>>> cleaned up leading to the test failure.
>>>>>>>>>
>>>>>>>>> There is no synchronization between the client and the server and
>>>>>>>>> such
>>>>>>>>> race condition can occur occasionally. Normally, the execution is
>>>>>>>>> fast
>>>>>>>>> enough to behave like the "unregister" notification is processed
>>>>>>>>> synchronously with the unregisterMBean() operation but it seems
>>>>>>>>> that
>>>>>>>>> using fastdebug Server VM bits with the -Xcomp option strains
>>>>>>>>> the CPU
>>>>>>>>> enough to make this problem appear.
>>>>>>>>>
>>>>>>>>> There is no proper fix for this - the only thing that work is
>>>>>>>>> waiting a
>>>>>>>>> bit longer in the main thread to give the notification processing
>>>>>>>>> thread
>>>>>>>>> some time to clean up the listeners.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>>
>>>>>>>>> -JB-
>>>>>>>>>                                 
>>>>>>>>>                   
>>>>>>>                   
>>>>>>>               
>>>>>>             
>>>>>>             
>>>>>         
>>>>>           
>>>>     
>>>>         
>>>   
>>>       
>>     
>
>   

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/serviceability-dev/attachments/20130110/c9203ef0/attachment-0001.html 


More information about the serviceability-dev mailing list