jmx-dev [PATCH] JDK-7170447: Intermittent DeadListenerTest.java failure

Jaroslav Bachorik jaroslav.bachorik at oracle.com
Thu Jan 10 05:49:09 PST 2013


On 01/10/2013 02:18 PM, shanliang wrote:
> The weakListener is unnecessary, the test does already the same
> verification:
> 171         Set<?> setForUnreg = listenerMap.get(name);
> 172         assertTrue("No trace of unregistered MBean: " + setForUnreg,
> setForUnreg == null);

Addressed.

> 
> All other are OK for me.

So, http://cr.openjdk.java.net/~jbachorik/7170447/webrev.02 could be the
final version.

Thanks for the review!

-JB-

> 
> Shanliang
> 
> 
> Jaroslav Bachorik wrote:
>> On 01/10/2013 01:09 PM, Jaroslav Bachorik wrote:
>>  
>>> On 01/10/2013 12:53 PM, shanliang wrote:
>>>    
>>>> Instead to wait GC, you can also to wait the
>>>> MBeanServerNotification.UNREGISTRATION_NOTIFICATION, when you receive
>>>> it, then your listener must be removed too. Of course this solution is
>>>>       
>>> The problem is that the *NotificationForwarder implementations swallow
>>> this kind of notification and just perform the cleanup. No other
>>> listener will ever receive this notification.
>>>
>>> The "unregisterMBean" operation's semantics is not clearly defined.
>>> Intuitively, when unregistering an MBean all the associated listeners
>>> should be gone before the method returns. But this is not the case -
>>> currently the listeners are sanitized some time after the
>>> "unregisterMBean" operation started, eventually. There is no easy way to
>>> notify the API user that the listeners were removed. I am afraid that in
>>> order to resolve these problems new APIs would need to be introduced and
>>> the whole mechanism of delivering notification should be revisited (as
>>> it was planned for JMX 2.0, anyway).
>>>
>>> As for fixing the test - checking the weak references works fine as well
>>> as increasing the timeout. They both can fail when the system is
>>> extremely busy but the GC based solution will be in general faster than
>>> the one with increased timeout.
>>>     
>>
>> Updated webrev: http://cr.openjdk.java.net/~jbachorik/7170447/webrev.01
>>
>>  
>>> -JB-
>>>
>>>    
>>>> implementation dependent, but the test is already implementation
>>>> dependent.
>>>>
>>>> Shanliang
>>>>
>>>>
>>>> Jaroslav Bachorik wrote:
>>>>      
>>>>> On 01/10/2013 10:05 AM, shanliang wrote:
>>>>>  
>>>>>        
>>>>>> Jaroslav Bachorik wrote:
>>>>>>             
>>>>>>> On 01/09/2013 03:25 PM, shanliang wrote:
>>>>>>>  
>>>>>>>                 
>>>>>>>> Jaroslav Bachorik wrote:
>>>>>>>>                        
>>>>>>>>> On 01/09/2013 02:44 PM, shanliang wrote:
>>>>>>>>>  
>>>>>>>>>                              
>>>>>>>>>> Let's forget the JMX implementation at first. If an MBean is
>>>>>>>>>> unregistered, a user at client side calls
>>>>>>>>>> "removeNotificationListener"
>>>>>>>>>> on the MBean, what should happen? if the user calls
>>>>>>>>>> "isRegistered" on
>>>>>>>>>> the MBean, what should happen?
>>>>>>>>>>
>>>>>>>>>> I have done 2 tests, I used only one thread:
>>>>>>>>>>
>>>>>>>>>> 1)
>>>>>>>>>> ......
>>>>>>>>>> localServer.unregisterMBean(myMBean);
>>>>>>>>>> boolean isRegistered = remoteClientServer.isRegistered(myMBean));
>>>>>>>>>>
>>>>>>>>>> I got isRegistered = false;
>>>>>>>>>>
>>>>>>>>>> 2)
>>>>>>>>>> ......
>>>>>>>>>> localServer.unregisterMBean(myMBean);
>>>>>>>>>> System.out.println("isRegistered =
>>>>>>>>>> "+remoteClientServer.sRegistered(myMBean));
>>>>>>>>>> remoteClientServer.removeNotificationListener(myMBean, listener);
>>>>>>>>>>
>>>>>>>>>> I did not get an exception.
>>>>>>>>>>
>>>>>>>>>> The 1) told that the client could know the MBean was
>>>>>>>>>> unregistered,
>>>>>>>>>> then
>>>>>>>>>> the client should throw an exception for the call of
>>>>>>>>>> "removeNotificationListener" in 2).
>>>>>>>>>>                                           
>>>>>>>>> Yes, but then it would not test the listener leakage as it was
>>>>>>>>> supposed
>>>>>>>>> to test but rather the fact that the client throws the appropriate
>>>>>>>>> exception. The fact that the mbean was unregistered does not
>>>>>>>>> necessarily
>>>>>>>>> mean that the listeners were released. The main problem remains
>>>>>>>>> - the
>>>>>>>>> listeners are being cleaned-up asynchronously and the clean-up
>>>>>>>>> process
>>>>>>>>> might race against the other uses of the JMX API.
>>>>>>>>>                                   
>>>>>>>> client.removeNotificationListener is not a right way here to test
>>>>>>>> listener leak, we could use some other ways, for example we keep
>>>>>>>> the
>>>>>>>> listener in a weak reference, then after the mbean is removed, the
>>>>>>>> weak
>>>>>>>> reference should be empty after some time. Another way is like
>>>>>>>> DeadListenerTest does to check whether clean has done at server
>>>>>>>> side:
>>>>>>>> use reflection to get the "listenerMap" at server side and make
>>>>>>>> sure it
>>>>>>>> is empty, but this need to add a private method to the class
>>>>>>>> ClientNotifForwarder.
>>>>>>>>                           
>>>>>>> There will still be problems with timing. You need either to wait
>>>>>>> for
>>>>>>> the GC to kick in to clean up the weak ref. And the listenerMap will
>>>>>>> not
>>>>>>> be purged of the unregistered MBean listeners until the
>>>>>>> notification is
>>>>>>> generated, processed on the ClientNotificationForwarder and
>>>>>>> forwarded to
>>>>>>> the server. So there goes the timing issue again.
>>>>>>>
>>>>>>> The problem is that the "unregisterMBean" operation does not
>>>>>>> guarantee
>>>>>>> that the listeners have been unregistered at the time it returns.
>>>>>>> So,
>>>>>>> one way or the other we will need to wait an arbitrary amount of
>>>>>>> time
>>>>>>> before checking for the memory leak.
>>>>>>>                     
>>>>>> Yes we need to wait, but you can use a cycle like:
>>>>>>        long maxWaitingTime = 3000;
>>>>>>        long startTime = System.currentTimeMillis();
>>>>>>        while ( weakReference.get != null
>>>>>>                && System.currentTimeMillis() < startTime +
>>>>>> maxWaitingTime) {
>>>>>>            System.gc();
>>>>>>            Thread.sleep(100);
>>>>>>            System.gc();
>>>>>>        }
>>>>>>
>>>>>>       if (weakReference.get != null) {
>>>>>>          // failed
>>>>>>       }
>>>>>>               
>>>>> Still you need an arbitrary timeout which might be reached under
>>>>> extreme
>>>>> conditions making this test to fail intermittently. But I'd say that's
>>>>> the nature of tests for memory leak fixes, due to the unpredictable
>>>>> nature of the GC runs. Unless you take a heap dump and do a
>>>>> reachability
>>>>> analysis you can not be sure whether a reference is dangling
>>>>> somwehwere
>>>>> or it just hasn't been collected yet :/
>>>>>
>>>>> -JB-
>>>>>
>>>>>  
>>>>>        
>>>>>> Shanliang
>>>>>>             
>>>>>>> -JB-
>>>>>>>
>>>>>>>  
>>>>>>>                 
>>>>>>>> I think we have 3 things to do here:
>>>>>>>> 1) modify the test to not use removeNotificationListener for
>>>>>>>> testing
>>>>>>>> listener leak
>>>>>>>> 2) create a new bug about a client does not throw an exception
>>>>>>>> after an
>>>>>>>> mbean is unregistered
>>>>>>>> 3) create a bug about a client does not throw a same exception
>>>>>>>> as at
>>>>>>>> server side.
>>>>>>>>
>>>>>>>> I will do 2) and 3), if you like you can continue 1), it might
>>>>>>>> need to
>>>>>>>> do fix also in the JMX implementation.
>>>>>>>>
>>>>>>>> Shanliang
>>>>>>>>                        
>>>>>>>>>  
>>>>>>>>>                              
>>>>>>>>>> The test "DeadListenerTest" got passed in some machines because
>>>>>>>>>> of the
>>>>>>>>>> timeout for waiting a notification. I think its failure just
>>>>>>>>>> tells
>>>>>>>>>> a new
>>>>>>>>>> bug.
>>>>>>>>>>
>>>>>>>>>> To set a longer timeout just hides the real bug, and the test
>>>>>>>>>> might
>>>>>>>>>> fail
>>>>>>>>>> again one day if running condition is changed and you might need
>>>>>>>>>> longer
>>>>>>>>>> timeout again.
>>>>>>>>>>                                           
>>>>>>>>> Yes, I agree with you that extending the timeout just lessens the
>>>>>>>>> likelihood of the race condition and does not prevent it.
>>>>>>>>>
>>>>>>>>>  
>>>>>>>>>                              
>>>>>>>>>> Shanliang
>>>>>>>>>>
>>>>>>>>>> Jaroslav Bachorik wrote:
>>>>>>>>>>                                       
>>>>>>>>>>> On 01/09/2013 11:08 AM, shanliang wrote:
>>>>>>>>>>>  
>>>>>>>>>>>                                               
>>>>>>>>>>>> Jaroslav Bachorik wrote:
>>>>>>>>>>>>                                                          
>>>>>>>>>>>>> On 01/09/2013 09:40 AM, shanliang wrote:
>>>>>>>>>>>>>  
>>>>>>>>>>>>>                                            
>>>>>>>>>>>>>                        
>>>>>>>>>>>>>> I still have no idea why the test failed, but I do not see
>>>>>>>>>>>>>> why a
>>>>>>>>>>>>>> longer
>>>>>>>>>>>>>> timeout can fix the test. Have you reproduced the problem and
>>>>>>>>>>>>>> tested
>>>>>>>>>>>>>> your fix? if yes then possible the long timeout hided a real
>>>>>>>>>>>>>> problem.
>>>>>>>>>>>>>>                                                                                      
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yes, I can reproduce the problem (using the fastbuild bits and
>>>>>>>>>>>>> -Xcomp
>>>>>>>>>>>>> switch) and verify that the fix makes the test pass.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The ClientNotifForwarder scans the notifications for
>>>>>>>>>>>>> MBeanServerNotification.UNREGISTRATION_NOTIFICATION and
>>>>>>>>>>>>> removes the
>>>>>>>>>>>>> appropriate notification listeners in a separate thread. Thus,
>>>>>>>>>>>>> calling
>>>>>>>>>>>>> "removeNotificationListener" on the main thread is prone to
>>>>>>>>>>>>> racing.
>>>>>>>>>>>>>                                                  
>>>>>>>>>>>>>                         
>>>>>>>>>>>> It is true that ClientNotifForwarder scans the notifications
>>>>>>>>>>>> for
>>>>>>>>>>>> MBeanServerNotification.UNREGISTRATION_NOTIFICATION and removes
>>>>>>>>>>>> the
>>>>>>>>>>>> appropriate notification listeners in a separate thread.
>>>>>>>>>>>> This is
>>>>>>>>>>>> for a
>>>>>>>>>>>> client connection to do clean if a user never calls
>>>>>>>>>>>> removeNotificationListener.
>>>>>>>>>>>>
>>>>>>>>>>>> But calling directly removeNotificationListener from a client
>>>>>>>>>>>> should
>>>>>>>>>>>> still get exception if the clean has not been done. As I
>>>>>>>>>>>> said, if
>>>>>>>>>>>> the
>>>>>>>>>>>> client checked and found the listener was still there, then the
>>>>>>>>>>>> client
>>>>>>>>>>>> sent a request to its server to remove the listener at server
>>>>>>>>>>>> side,
>>>>>>>>>>>> the
>>>>>>>>>>>> server should find that the MBean in question was not
>>>>>>>>>>>> registered,
>>>>>>>>>>>> so the
>>>>>>>>>>>> server should throw an exception. The bug might be here.
>>>>>>>>>>>>                                                               
>>>>>>>>>>> This won't work. The server side listeners are removed upon
>>>>>>>>>>> receiving
>>>>>>>>>>> the "unregistered" notification which is delivered from the
>>>>>>>>>>> ClientNotificationForwarder and it may have not run yet
>>>>>>>>>>> (since it
>>>>>>>>>>> runs
>>>>>>>>>>> in a separate executor thread). The result is that the
>>>>>>>>>>> attempt to
>>>>>>>>>>> remove
>>>>>>>>>>> the notification listener on the server will succeed as well
>>>>>>>>>>> failing
>>>>>>>>>>> the
>>>>>>>>>>> test subsequently.
>>>>>>>>>>>
>>>>>>>>>>> -JB-
>>>>>>>>>>>
>>>>>>>>>>>  
>>>>>>>>>>>                                               
>>>>>>>>>>>> Shanliang
>>>>>>>>>>>>                                                          
>>>>>>>>>>>>>  
>>>>>>>>>>>>>                                            
>>>>>>>>>>>>>                        
>>>>>>>>>>>>>> The timeout you made longer was used to wait a notification
>>>>>>>>>>>>>> which
>>>>>>>>>>>>>> should
>>>>>>>>>>>>>> never arrive.
>>>>>>>>>>>>>>                                                                                      
>>>>>>>>>>>>>
>>>>>>>>>>>>> Well, it can be used to allow more time to process the
>>>>>>>>>>>>> "unregister"
>>>>>>>>>>>>> notification, too.
>>>>>>>>>>>>>
>>>>>>>>>>>>> When I think more of this I am more inclined to fix the race
>>>>>>>>>>>>> condition.
>>>>>>>>>>>>> An updated webrev will follow.
>>>>>>>>>>>>>
>>>>>>>>>>>>>  
>>>>>>>>>>>>>                                            
>>>>>>>>>>>>>                        
>>>>>>>>>>>>>> To remove a listener from a client side, we did:
>>>>>>>>>>>>>> 1) at client side, check whether it was added in the
>>>>>>>>>>>>>> client side
>>>>>>>>>>>>>> 2) at server side, check whether the MBean in question was
>>>>>>>>>>>>>> registered in
>>>>>>>>>>>>>> the MBeanServer (!!!)
>>>>>>>>>>>>>> 3) at server side, check whether the listener was added.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> So 2) tells that we did not rely on a "unregister"
>>>>>>>>>>>>>> notification.
>>>>>>>>>>>>>> Anyway,
>>>>>>>>>>>>>> if you use a SAME thread to call "unregister" operation to
>>>>>>>>>>>>>> unregister an
>>>>>>>>>>>>>> mbean, then any following call (without any time break) to
>>>>>>>>>>>>>> use the
>>>>>>>>>>>>>> mbean
>>>>>>>>>>>>>> should fail, like "removeNotificationListener",
>>>>>>>>>>>>>> "isRegistered"
>>>>>>>>>>>>>> etc.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I do see a bug here, if we remove a listener from a
>>>>>>>>>>>>>> non-existing
>>>>>>>>>>>>>> MBeam,
>>>>>>>>>>>>>> we get "ListenerNotFoundException" at a client side, but get
>>>>>>>>>>>>>> "InstanceNotFoundException" at server side, I think we should
>>>>>>>>>>>>>> create a
>>>>>>>>>>>>>> bug, because both implemented the same interface
>>>>>>>>>>>>>> MBeanServerConnection.
>>>>>>>>>>>>>>                                                                                      
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yes, it is rather inconsistent.
>>>>>>>>>>>>>
>>>>>>>>>>>>> -JB-
>>>>>>>>>>>>>
>>>>>>>>>>>>>  
>>>>>>>>>>>>>                                            
>>>>>>>>>>>>>                        
>>>>>>>>>>>>>> Shanliang
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Jaroslav Bachorik wrote:
>>>>>>>>>>>>>>                                                       
>>>>>>>>>>>>>>                          
>>>>>>>>>>>>>>> Looking for review and a sponsor.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Webrev at
>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~jbachorik/7170447/webrev.00
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> In this issue the timing is the problem.
>>>>>>>>>>>>>>> MBeanServer.unregisterMBean()
>>>>>>>>>>>>>>> fires the "unregister" notification which is sent to the
>>>>>>>>>>>>>>> server
>>>>>>>>>>>>>>> asynchronously. Thus it may happen that the "unregister"
>>>>>>>>>>>>>>> notification
>>>>>>>>>>>>>>> has not been yet processed at the time of invoking
>>>>>>>>>>>>>>> removeNotificationListener() and the notification listeners
>>>>>>>>>>>>>>> hasn't
>>>>>>>>>>>>>>> been
>>>>>>>>>>>>>>> cleaned up leading to the test failure.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> There is no synchronization between the client and the
>>>>>>>>>>>>>>> server and
>>>>>>>>>>>>>>> such
>>>>>>>>>>>>>>> race condition can occur occasionally. Normally, the
>>>>>>>>>>>>>>> execution is
>>>>>>>>>>>>>>> fast
>>>>>>>>>>>>>>> enough to behave like the "unregister" notification is
>>>>>>>>>>>>>>> processed
>>>>>>>>>>>>>>> synchronously with the unregisterMBean() operation but it
>>>>>>>>>>>>>>> seems
>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>> using fastdebug Server VM bits with the -Xcomp option
>>>>>>>>>>>>>>> strains
>>>>>>>>>>>>>>> the CPU
>>>>>>>>>>>>>>> enough to make this problem appear.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> There is no proper fix for this - the only thing that
>>>>>>>>>>>>>>> work is
>>>>>>>>>>>>>>> waiting a
>>>>>>>>>>>>>>> bit longer in the main thread to give the notification
>>>>>>>>>>>>>>> processing
>>>>>>>>>>>>>>> thread
>>>>>>>>>>>>>>> some time to clean up the listeners.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -JB-
>>>>>>>>>>>>>>>                                                 
>>>>>>>>>>>>>>>                                                   
>>>>>>>>>>>>>                                                  
>>>>>>>>>>>>>                         
>>>>>>>>>>>>                                                               
>>>>>>>>>>>                                                     
>>>>>>>>>>                                           
>>>>>>>>>                                   
>>>>>>>>                           
>>>>>>>                     
>>>>>>               
>>>>>           
>>>>       
>>
>>   
> 
> 



More information about the serviceability-dev mailing list