jmx-dev [PATCH] JDK-7170447: Intermittent DeadListenerTest.java failure

shanliang shanliang.jiang at oracle.com
Thu Jan 10 03:53:10 PST 2013


Instead to wait GC, you can also to wait the 
MBeanServerNotification.UNREGISTRATION_NOTIFICATION, when you receive 
it, then your listener must be removed too. Of course this solution is 
implementation dependent, but the test is already implementation dependent.

Shanliang


Jaroslav Bachorik wrote:
> On 01/10/2013 10:05 AM, shanliang wrote:
>   
>> Jaroslav Bachorik wrote:
>>     
>>> On 01/09/2013 03:25 PM, shanliang wrote:
>>>  
>>>       
>>>> Jaroslav Bachorik wrote:
>>>>    
>>>>         
>>>>> On 01/09/2013 02:44 PM, shanliang wrote:
>>>>>  
>>>>>      
>>>>>           
>>>>>> Let's forget the JMX implementation at first. If an MBean is
>>>>>> unregistered, a user at client side calls "removeNotificationListener"
>>>>>> on the MBean, what should happen? if the user calls "isRegistered" on
>>>>>> the MBean, what should happen?
>>>>>>
>>>>>> I have done 2 tests, I used only one thread:
>>>>>>
>>>>>> 1)
>>>>>> ......
>>>>>> localServer.unregisterMBean(myMBean);
>>>>>> boolean isRegistered = remoteClientServer.isRegistered(myMBean));
>>>>>>
>>>>>> I got isRegistered = false;
>>>>>>
>>>>>> 2)
>>>>>> ......
>>>>>> localServer.unregisterMBean(myMBean);
>>>>>> System.out.println("isRegistered =
>>>>>> "+remoteClientServer.sRegistered(myMBean));
>>>>>> remoteClientServer.removeNotificationListener(myMBean, listener);
>>>>>>
>>>>>> I did not get an exception.
>>>>>>
>>>>>> The 1) told that the client could know the MBean was unregistered,
>>>>>> then
>>>>>> the client should throw an exception for the call of
>>>>>> "removeNotificationListener" in 2).
>>>>>>             
>>>>>>             
>>>>> Yes, but then it would not test the listener leakage as it was supposed
>>>>> to test but rather the fact that the client throws the appropriate
>>>>> exception. The fact that the mbean was unregistered does not
>>>>> necessarily
>>>>> mean that the listeners were released. The main problem remains - the
>>>>> listeners are being cleaned-up asynchronously and the clean-up process
>>>>> might race against the other uses of the JMX API.
>>>>>         
>>>>>           
>>>> client.removeNotificationListener is not a right way here to test
>>>> listener leak, we could use some other ways, for example we keep the
>>>> listener in a weak reference, then after the mbean is removed, the weak
>>>> reference should be empty after some time. Another way is like
>>>> DeadListenerTest does to check whether clean has done at server side:
>>>> use reflection to get the "listenerMap" at server side and make sure it
>>>> is empty, but this need to add a private method to the class
>>>> ClientNotifForwarder.
>>>>     
>>>>         
>>> There will still be problems with timing. You need either to wait for
>>> the GC to kick in to clean up the weak ref. And the listenerMap will not
>>> be purged of the unregistered MBean listeners until the notification is
>>> generated, processed on the ClientNotificationForwarder and forwarded to
>>> the server. So there goes the timing issue again.
>>>
>>> The problem is that the "unregisterMBean" operation does not guarantee
>>> that the listeners have been unregistered at the time it returns. So,
>>> one way or the other we will need to wait an arbitrary amount of time
>>> before checking for the memory leak.
>>>   
>>>       
>> Yes we need to wait, but you can use a cycle like:
>>        long maxWaitingTime = 3000;
>>        long startTime = System.currentTimeMillis();
>>        while ( weakReference.get != null
>>                && System.currentTimeMillis() < startTime +
>> maxWaitingTime) {
>>            System.gc();
>>            Thread.sleep(100);
>>            System.gc();
>>        }
>>
>>       if (weakReference.get != null) {
>>          // failed
>>       }
>>     
>
> Still you need an arbitrary timeout which might be reached under extreme
> conditions making this test to fail intermittently. But I'd say that's
> the nature of tests for memory leak fixes, due to the unpredictable
> nature of the GC runs. Unless you take a heap dump and do a reachability
> analysis you can not be sure whether a reference is dangling somwehwere
> or it just hasn't been collected yet :/
>
> -JB-
>
>   
>> Shanliang
>>     
>>> -JB-
>>>
>>>  
>>>       
>>>> I think we have 3 things to do here:
>>>> 1) modify the test to not use removeNotificationListener for testing
>>>> listener leak
>>>> 2) create a new bug about a client does not throw an exception after an
>>>> mbean is unregistered
>>>> 3) create a bug about a client does not throw a same exception as at
>>>> server side.
>>>>
>>>> I will do 2) and 3), if you like you can continue 1), it might need to
>>>> do fix also in the JMX implementation.
>>>>
>>>> Shanliang
>>>>    
>>>>         
>>>>>  
>>>>>      
>>>>>           
>>>>>> The test "DeadListenerTest" got passed in some machines because of the
>>>>>> timeout for waiting a notification. I think its failure just tells
>>>>>> a new
>>>>>> bug.
>>>>>>
>>>>>> To set a longer timeout just hides the real bug, and the test might
>>>>>> fail
>>>>>> again one day if running condition is changed and you might need
>>>>>> longer
>>>>>> timeout again.
>>>>>>             
>>>>>>             
>>>>> Yes, I agree with you that extending the timeout just lessens the
>>>>> likelihood of the race condition and does not prevent it.
>>>>>
>>>>>  
>>>>>      
>>>>>           
>>>>>> Shanliang
>>>>>>
>>>>>> Jaroslav Bachorik wrote:
>>>>>>           
>>>>>>             
>>>>>>> On 01/09/2013 11:08 AM, shanliang wrote:
>>>>>>>  
>>>>>>>               
>>>>>>>               
>>>>>>>> Jaroslav Bachorik wrote:
>>>>>>>>                      
>>>>>>>>                 
>>>>>>>>> On 01/09/2013 09:40 AM, shanliang wrote:
>>>>>>>>>  
>>>>>>>>>                            
>>>>>>>>>                   
>>>>>>>>>> I still have no idea why the test failed, but I do not see why a
>>>>>>>>>> longer
>>>>>>>>>> timeout can fix the test. Have you reproduced the problem and
>>>>>>>>>> tested
>>>>>>>>>> your fix? if yes then possible the long timeout hided a real
>>>>>>>>>> problem.
>>>>>>>>>>                                         
>>>>>>>>>>                     
>>>>>>>>> Yes, I can reproduce the problem (using the fastbuild bits and
>>>>>>>>> -Xcomp
>>>>>>>>> switch) and verify that the fix makes the test pass.
>>>>>>>>>
>>>>>>>>> The ClientNotifForwarder scans the notifications for
>>>>>>>>> MBeanServerNotification.UNREGISTRATION_NOTIFICATION and removes the
>>>>>>>>> appropriate notification listeners in a separate thread. Thus,
>>>>>>>>> calling
>>>>>>>>> "removeNotificationListener" on the main thread is prone to racing.
>>>>>>>>>                                 
>>>>>>>>>                   
>>>>>>>> It is true that ClientNotifForwarder scans the notifications for
>>>>>>>> MBeanServerNotification.UNREGISTRATION_NOTIFICATION and removes the
>>>>>>>> appropriate notification listeners in a separate thread. This is
>>>>>>>> for a
>>>>>>>> client connection to do clean if a user never calls
>>>>>>>> removeNotificationListener.
>>>>>>>>
>>>>>>>> But calling directly removeNotificationListener from a client should
>>>>>>>> still get exception if the clean has not been done. As I said, if
>>>>>>>> the
>>>>>>>> client checked and found the listener was still there, then the
>>>>>>>> client
>>>>>>>> sent a request to its server to remove the listener at server side,
>>>>>>>> the
>>>>>>>> server should find that the MBean in question was not registered,
>>>>>>>> so the
>>>>>>>> server should throw an exception. The bug might be here.
>>>>>>>>                         
>>>>>>>>                 
>>>>>>> This won't work. The server side listeners are removed upon receiving
>>>>>>> the "unregistered" notification which is delivered from the
>>>>>>> ClientNotificationForwarder and it may have not run yet (since it
>>>>>>> runs
>>>>>>> in a separate executor thread). The result is that the attempt to
>>>>>>> remove
>>>>>>> the notification listener on the server will succeed as well failing
>>>>>>> the
>>>>>>> test subsequently.
>>>>>>>
>>>>>>> -JB-
>>>>>>>
>>>>>>>  
>>>>>>>               
>>>>>>>               
>>>>>>>> Shanliang
>>>>>>>>                      
>>>>>>>>                 
>>>>>>>>>  
>>>>>>>>>                            
>>>>>>>>>                   
>>>>>>>>>> The timeout you made longer was used to wait a notification which
>>>>>>>>>> should
>>>>>>>>>> never arrive.
>>>>>>>>>>                                         
>>>>>>>>>>                     
>>>>>>>>> Well, it can be used to allow more time to process the "unregister"
>>>>>>>>> notification, too.
>>>>>>>>>
>>>>>>>>> When I think more of this I am more inclined to fix the race
>>>>>>>>> condition.
>>>>>>>>> An updated webrev will follow.
>>>>>>>>>
>>>>>>>>>  
>>>>>>>>>                            
>>>>>>>>>                   
>>>>>>>>>> To remove a listener from a client side, we did:
>>>>>>>>>> 1) at client side, check whether it was added in the client side
>>>>>>>>>> 2) at server side, check whether the MBean in question was
>>>>>>>>>> registered in
>>>>>>>>>> the MBeanServer (!!!)
>>>>>>>>>> 3) at server side, check whether the listener was added.
>>>>>>>>>>
>>>>>>>>>> So 2) tells that we did not rely on a "unregister" notification.
>>>>>>>>>> Anyway,
>>>>>>>>>> if you use a SAME thread to call "unregister" operation to
>>>>>>>>>> unregister an
>>>>>>>>>> mbean, then any following call (without any time break) to use the
>>>>>>>>>> mbean
>>>>>>>>>> should fail, like "removeNotificationListener", "isRegistered"
>>>>>>>>>> etc.
>>>>>>>>>>
>>>>>>>>>> I do see a bug here, if we remove a listener from a non-existing
>>>>>>>>>> MBeam,
>>>>>>>>>> we get "ListenerNotFoundException" at a client side, but get
>>>>>>>>>> "InstanceNotFoundException" at server side, I think we should
>>>>>>>>>> create a
>>>>>>>>>> bug, because both implemented the same interface
>>>>>>>>>> MBeanServerConnection.
>>>>>>>>>>                                         
>>>>>>>>>>                     
>>>>>>>>> Yes, it is rather inconsistent.
>>>>>>>>>
>>>>>>>>> -JB-
>>>>>>>>>
>>>>>>>>>  
>>>>>>>>>                            
>>>>>>>>>                   
>>>>>>>>>> Shanliang
>>>>>>>>>>
>>>>>>>>>> Jaroslav Bachorik wrote:
>>>>>>>>>>                                     
>>>>>>>>>>                     
>>>>>>>>>>> Looking for review and a sponsor.
>>>>>>>>>>>
>>>>>>>>>>> Webrev at http://cr.openjdk.java.net/~jbachorik/7170447/webrev.00
>>>>>>>>>>>
>>>>>>>>>>> In this issue the timing is the problem.
>>>>>>>>>>> MBeanServer.unregisterMBean()
>>>>>>>>>>> fires the "unregister" notification which is sent to the server
>>>>>>>>>>> asynchronously. Thus it may happen that the "unregister"
>>>>>>>>>>> notification
>>>>>>>>>>> has not been yet processed at the time of invoking
>>>>>>>>>>> removeNotificationListener() and the notification listeners
>>>>>>>>>>> hasn't
>>>>>>>>>>> been
>>>>>>>>>>> cleaned up leading to the test failure.
>>>>>>>>>>>
>>>>>>>>>>> There is no synchronization between the client and the server and
>>>>>>>>>>> such
>>>>>>>>>>> race condition can occur occasionally. Normally, the execution is
>>>>>>>>>>> fast
>>>>>>>>>>> enough to behave like the "unregister" notification is processed
>>>>>>>>>>> synchronously with the unregisterMBean() operation but it seems
>>>>>>>>>>> that
>>>>>>>>>>> using fastdebug Server VM bits with the -Xcomp option strains
>>>>>>>>>>> the CPU
>>>>>>>>>>> enough to make this problem appear.
>>>>>>>>>>>
>>>>>>>>>>> There is no proper fix for this - the only thing that work is
>>>>>>>>>>> waiting a
>>>>>>>>>>> bit longer in the main thread to give the notification processing
>>>>>>>>>>> thread
>>>>>>>>>>> some time to clean up the listeners.
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>>
>>>>>>>>>>> -JB-
>>>>>>>>>>>                                                   
>>>>>>>>>>>                       
>>>>>>>>>                                 
>>>>>>>>>                   
>>>>>>>>                         
>>>>>>>>                 
>>>>>>>                   
>>>>>>>               
>>>>>>             
>>>>>>             
>>>>>         
>>>>>           
>>>>     
>>>>         
>>>   
>>>       
>>     
>
>   

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/serviceability-dev/attachments/20130110/c2e94c85/attachment-0001.html 


More information about the serviceability-dev mailing list