jmx-dev [PATCH] JDK-7170447: Intermittent DeadListenerTest.java failure

shanliang shanliang.jiang at oracle.com
Thu Jan 10 07:14:40 PST 2013


It is OK for me, thanks for fixing the bug!

Shanliang

Jaroslav Bachorik wrote:
> On 01/10/2013 02:18 PM, shanliang wrote:
>   
>> The weakListener is unnecessary, the test does already the same
>> verification:
>> 171         Set<?> setForUnreg = listenerMap.get(name);
>> 172         assertTrue("No trace of unregistered MBean: " + setForUnreg,
>> setForUnreg == null);
>>     
>
> Addressed.
>
>   
>> All other are OK for me.
>>     
>
> So, http://cr.openjdk.java.net/~jbachorik/7170447/webrev.02 could be the
> final version.
>
> Thanks for the review!
>
> -JB-
>
>   
>> Shanliang
>>
>>
>> Jaroslav Bachorik wrote:
>>     
>>> On 01/10/2013 01:09 PM, Jaroslav Bachorik wrote:
>>>  
>>>       
>>>> On 01/10/2013 12:53 PM, shanliang wrote:
>>>>    
>>>>         
>>>>> Instead to wait GC, you can also to wait the
>>>>> MBeanServerNotification.UNREGISTRATION_NOTIFICATION, when you receive
>>>>> it, then your listener must be removed too. Of course this solution is
>>>>>       
>>>>>           
>>>> The problem is that the *NotificationForwarder implementations swallow
>>>> this kind of notification and just perform the cleanup. No other
>>>> listener will ever receive this notification.
>>>>
>>>> The "unregisterMBean" operation's semantics is not clearly defined.
>>>> Intuitively, when unregistering an MBean all the associated listeners
>>>> should be gone before the method returns. But this is not the case -
>>>> currently the listeners are sanitized some time after the
>>>> "unregisterMBean" operation started, eventually. There is no easy way to
>>>> notify the API user that the listeners were removed. I am afraid that in
>>>> order to resolve these problems new APIs would need to be introduced and
>>>> the whole mechanism of delivering notification should be revisited (as
>>>> it was planned for JMX 2.0, anyway).
>>>>
>>>> As for fixing the test - checking the weak references works fine as well
>>>> as increasing the timeout. They both can fail when the system is
>>>> extremely busy but the GC based solution will be in general faster than
>>>> the one with increased timeout.
>>>>     
>>>>         
>>> Updated webrev: http://cr.openjdk.java.net/~jbachorik/7170447/webrev.01
>>>
>>>  
>>>       
>>>> -JB-
>>>>
>>>>    
>>>>         
>>>>> implementation dependent, but the test is already implementation
>>>>> dependent.
>>>>>
>>>>> Shanliang
>>>>>
>>>>>
>>>>> Jaroslav Bachorik wrote:
>>>>>      
>>>>>           
>>>>>> On 01/10/2013 10:05 AM, shanliang wrote:
>>>>>>  
>>>>>>        
>>>>>>             
>>>>>>> Jaroslav Bachorik wrote:
>>>>>>>             
>>>>>>>               
>>>>>>>> On 01/09/2013 03:25 PM, shanliang wrote:
>>>>>>>>  
>>>>>>>>                 
>>>>>>>>                 
>>>>>>>>> Jaroslav Bachorik wrote:
>>>>>>>>>                        
>>>>>>>>>                   
>>>>>>>>>> On 01/09/2013 02:44 PM, shanliang wrote:
>>>>>>>>>>  
>>>>>>>>>>                              
>>>>>>>>>>                     
>>>>>>>>>>> Let's forget the JMX implementation at first. If an MBean is
>>>>>>>>>>> unregistered, a user at client side calls
>>>>>>>>>>> "removeNotificationListener"
>>>>>>>>>>> on the MBean, what should happen? if the user calls
>>>>>>>>>>> "isRegistered" on
>>>>>>>>>>> the MBean, what should happen?
>>>>>>>>>>>
>>>>>>>>>>> I have done 2 tests, I used only one thread:
>>>>>>>>>>>
>>>>>>>>>>> 1)
>>>>>>>>>>> ......
>>>>>>>>>>> localServer.unregisterMBean(myMBean);
>>>>>>>>>>> boolean isRegistered = remoteClientServer.isRegistered(myMBean));
>>>>>>>>>>>
>>>>>>>>>>> I got isRegistered = false;
>>>>>>>>>>>
>>>>>>>>>>> 2)
>>>>>>>>>>> ......
>>>>>>>>>>> localServer.unregisterMBean(myMBean);
>>>>>>>>>>> System.out.println("isRegistered =
>>>>>>>>>>> "+remoteClientServer.sRegistered(myMBean));
>>>>>>>>>>> remoteClientServer.removeNotificationListener(myMBean, listener);
>>>>>>>>>>>
>>>>>>>>>>> I did not get an exception.
>>>>>>>>>>>
>>>>>>>>>>> The 1) told that the client could know the MBean was
>>>>>>>>>>> unregistered,
>>>>>>>>>>> then
>>>>>>>>>>> the client should throw an exception for the call of
>>>>>>>>>>> "removeNotificationListener" in 2).
>>>>>>>>>>>                                           
>>>>>>>>>>>                       
>>>>>>>>>> Yes, but then it would not test the listener leakage as it was
>>>>>>>>>> supposed
>>>>>>>>>> to test but rather the fact that the client throws the appropriate
>>>>>>>>>> exception. The fact that the mbean was unregistered does not
>>>>>>>>>> necessarily
>>>>>>>>>> mean that the listeners were released. The main problem remains
>>>>>>>>>> - the
>>>>>>>>>> listeners are being cleaned-up asynchronously and the clean-up
>>>>>>>>>> process
>>>>>>>>>> might race against the other uses of the JMX API.
>>>>>>>>>>                                   
>>>>>>>>>>                     
>>>>>>>>> client.removeNotificationListener is not a right way here to test
>>>>>>>>> listener leak, we could use some other ways, for example we keep
>>>>>>>>> the
>>>>>>>>> listener in a weak reference, then after the mbean is removed, the
>>>>>>>>> weak
>>>>>>>>> reference should be empty after some time. Another way is like
>>>>>>>>> DeadListenerTest does to check whether clean has done at server
>>>>>>>>> side:
>>>>>>>>> use reflection to get the "listenerMap" at server side and make
>>>>>>>>> sure it
>>>>>>>>> is empty, but this need to add a private method to the class
>>>>>>>>> ClientNotifForwarder.
>>>>>>>>>                           
>>>>>>>>>                   
>>>>>>>> There will still be problems with timing. You need either to wait
>>>>>>>> for
>>>>>>>> the GC to kick in to clean up the weak ref. And the listenerMap will
>>>>>>>> not
>>>>>>>> be purged of the unregistered MBean listeners until the
>>>>>>>> notification is
>>>>>>>> generated, processed on the ClientNotificationForwarder and
>>>>>>>> forwarded to
>>>>>>>> the server. So there goes the timing issue again.
>>>>>>>>
>>>>>>>> The problem is that the "unregisterMBean" operation does not
>>>>>>>> guarantee
>>>>>>>> that the listeners have been unregistered at the time it returns.
>>>>>>>> So,
>>>>>>>> one way or the other we will need to wait an arbitrary amount of
>>>>>>>> time
>>>>>>>> before checking for the memory leak.
>>>>>>>>                     
>>>>>>>>                 
>>>>>>> Yes we need to wait, but you can use a cycle like:
>>>>>>>        long maxWaitingTime = 3000;
>>>>>>>        long startTime = System.currentTimeMillis();
>>>>>>>        while ( weakReference.get != null
>>>>>>>                && System.currentTimeMillis() < startTime +
>>>>>>> maxWaitingTime) {
>>>>>>>            System.gc();
>>>>>>>            Thread.sleep(100);
>>>>>>>            System.gc();
>>>>>>>        }
>>>>>>>
>>>>>>>       if (weakReference.get != null) {
>>>>>>>          // failed
>>>>>>>       }
>>>>>>>               
>>>>>>>               
>>>>>> Still you need an arbitrary timeout which might be reached under
>>>>>> extreme
>>>>>> conditions making this test to fail intermittently. But I'd say that's
>>>>>> the nature of tests for memory leak fixes, due to the unpredictable
>>>>>> nature of the GC runs. Unless you take a heap dump and do a
>>>>>> reachability
>>>>>> analysis you can not be sure whether a reference is dangling
>>>>>> somwehwere
>>>>>> or it just hasn't been collected yet :/
>>>>>>
>>>>>> -JB-
>>>>>>
>>>>>>  
>>>>>>        
>>>>>>             
>>>>>>> Shanliang
>>>>>>>             
>>>>>>>               
>>>>>>>> -JB-
>>>>>>>>
>>>>>>>>  
>>>>>>>>                 
>>>>>>>>                 
>>>>>>>>> I think we have 3 things to do here:
>>>>>>>>> 1) modify the test to not use removeNotificationListener for
>>>>>>>>> testing
>>>>>>>>> listener leak
>>>>>>>>> 2) create a new bug about a client does not throw an exception
>>>>>>>>> after an
>>>>>>>>> mbean is unregistered
>>>>>>>>> 3) create a bug about a client does not throw a same exception
>>>>>>>>> as at
>>>>>>>>> server side.
>>>>>>>>>
>>>>>>>>> I will do 2) and 3), if you like you can continue 1), it might
>>>>>>>>> need to
>>>>>>>>> do fix also in the JMX implementation.
>>>>>>>>>
>>>>>>>>> Shanliang
>>>>>>>>>                        
>>>>>>>>>                   
>>>>>>>>>>  
>>>>>>>>>>                              
>>>>>>>>>>                     
>>>>>>>>>>> The test "DeadListenerTest" got passed in some machines because
>>>>>>>>>>> of the
>>>>>>>>>>> timeout for waiting a notification. I think its failure just
>>>>>>>>>>> tells
>>>>>>>>>>> a new
>>>>>>>>>>> bug.
>>>>>>>>>>>
>>>>>>>>>>> To set a longer timeout just hides the real bug, and the test
>>>>>>>>>>> might
>>>>>>>>>>> fail
>>>>>>>>>>> again one day if running condition is changed and you might need
>>>>>>>>>>> longer
>>>>>>>>>>> timeout again.
>>>>>>>>>>>                                           
>>>>>>>>>>>                       
>>>>>>>>>> Yes, I agree with you that extending the timeout just lessens the
>>>>>>>>>> likelihood of the race condition and does not prevent it.
>>>>>>>>>>
>>>>>>>>>>  
>>>>>>>>>>                              
>>>>>>>>>>                     
>>>>>>>>>>> Shanliang
>>>>>>>>>>>
>>>>>>>>>>> Jaroslav Bachorik wrote:
>>>>>>>>>>>                                       
>>>>>>>>>>>                       
>>>>>>>>>>>> On 01/09/2013 11:08 AM, shanliang wrote:
>>>>>>>>>>>>  
>>>>>>>>>>>>                                               
>>>>>>>>>>>>                         
>>>>>>>>>>>>> Jaroslav Bachorik wrote:
>>>>>>>>>>>>>                                                          
>>>>>>>>>>>>>                           
>>>>>>>>>>>>>> On 01/09/2013 09:40 AM, shanliang wrote:
>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>                                            
>>>>>>>>>>>>>>                        
>>>>>>>>>>>>>>                             
>>>>>>>>>>>>>>> I still have no idea why the test failed, but I do not see
>>>>>>>>>>>>>>> why a
>>>>>>>>>>>>>>> longer
>>>>>>>>>>>>>>> timeout can fix the test. Have you reproduced the problem and
>>>>>>>>>>>>>>> tested
>>>>>>>>>>>>>>> your fix? if yes then possible the long timeout hided a real
>>>>>>>>>>>>>>> problem.
>>>>>>>>>>>>>>>                                                                                      
>>>>>>>>>>>>>>>                               
>>>>>>>>>>>>>> Yes, I can reproduce the problem (using the fastbuild bits and
>>>>>>>>>>>>>> -Xcomp
>>>>>>>>>>>>>> switch) and verify that the fix makes the test pass.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The ClientNotifForwarder scans the notifications for
>>>>>>>>>>>>>> MBeanServerNotification.UNREGISTRATION_NOTIFICATION and
>>>>>>>>>>>>>> removes the
>>>>>>>>>>>>>> appropriate notification listeners in a separate thread. Thus,
>>>>>>>>>>>>>> calling
>>>>>>>>>>>>>> "removeNotificationListener" on the main thread is prone to
>>>>>>>>>>>>>> racing.
>>>>>>>>>>>>>>                                                  
>>>>>>>>>>>>>>                         
>>>>>>>>>>>>>>                             
>>>>>>>>>>>>> It is true that ClientNotifForwarder scans the notifications
>>>>>>>>>>>>> for
>>>>>>>>>>>>> MBeanServerNotification.UNREGISTRATION_NOTIFICATION and removes
>>>>>>>>>>>>> the
>>>>>>>>>>>>> appropriate notification listeners in a separate thread.
>>>>>>>>>>>>> This is
>>>>>>>>>>>>> for a
>>>>>>>>>>>>> client connection to do clean if a user never calls
>>>>>>>>>>>>> removeNotificationListener.
>>>>>>>>>>>>>
>>>>>>>>>>>>> But calling directly removeNotificationListener from a client
>>>>>>>>>>>>> should
>>>>>>>>>>>>> still get exception if the clean has not been done. As I
>>>>>>>>>>>>> said, if
>>>>>>>>>>>>> the
>>>>>>>>>>>>> client checked and found the listener was still there, then the
>>>>>>>>>>>>> client
>>>>>>>>>>>>> sent a request to its server to remove the listener at server
>>>>>>>>>>>>> side,
>>>>>>>>>>>>> the
>>>>>>>>>>>>> server should find that the MBean in question was not
>>>>>>>>>>>>> registered,
>>>>>>>>>>>>> so the
>>>>>>>>>>>>> server should throw an exception. The bug might be here.
>>>>>>>>>>>>>                                                               
>>>>>>>>>>>>>                           
>>>>>>>>>>>> This won't work. The server side listeners are removed upon
>>>>>>>>>>>> receiving
>>>>>>>>>>>> the "unregistered" notification which is delivered from the
>>>>>>>>>>>> ClientNotificationForwarder and it may have not run yet
>>>>>>>>>>>> (since it
>>>>>>>>>>>> runs
>>>>>>>>>>>> in a separate executor thread). The result is that the
>>>>>>>>>>>> attempt to
>>>>>>>>>>>> remove
>>>>>>>>>>>> the notification listener on the server will succeed as well
>>>>>>>>>>>> failing
>>>>>>>>>>>> the
>>>>>>>>>>>> test subsequently.
>>>>>>>>>>>>
>>>>>>>>>>>> -JB-
>>>>>>>>>>>>
>>>>>>>>>>>>  
>>>>>>>>>>>>                                               
>>>>>>>>>>>>                         
>>>>>>>>>>>>> Shanliang
>>>>>>>>>>>>>                                                          
>>>>>>>>>>>>>                           
>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>                                            
>>>>>>>>>>>>>>                        
>>>>>>>>>>>>>>                             
>>>>>>>>>>>>>>> The timeout you made longer was used to wait a notification
>>>>>>>>>>>>>>> which
>>>>>>>>>>>>>>> should
>>>>>>>>>>>>>>> never arrive.
>>>>>>>>>>>>>>>                                                                                      
>>>>>>>>>>>>>>>                               
>>>>>>>>>>>>>> Well, it can be used to allow more time to process the
>>>>>>>>>>>>>> "unregister"
>>>>>>>>>>>>>> notification, too.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> When I think more of this I am more inclined to fix the race
>>>>>>>>>>>>>> condition.
>>>>>>>>>>>>>> An updated webrev will follow.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>                                            
>>>>>>>>>>>>>>                        
>>>>>>>>>>>>>>                             
>>>>>>>>>>>>>>> To remove a listener from a client side, we did:
>>>>>>>>>>>>>>> 1) at client side, check whether it was added in the
>>>>>>>>>>>>>>> client side
>>>>>>>>>>>>>>> 2) at server side, check whether the MBean in question was
>>>>>>>>>>>>>>> registered in
>>>>>>>>>>>>>>> the MBeanServer (!!!)
>>>>>>>>>>>>>>> 3) at server side, check whether the listener was added.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> So 2) tells that we did not rely on a "unregister"
>>>>>>>>>>>>>>> notification.
>>>>>>>>>>>>>>> Anyway,
>>>>>>>>>>>>>>> if you use a SAME thread to call "unregister" operation to
>>>>>>>>>>>>>>> unregister an
>>>>>>>>>>>>>>> mbean, then any following call (without any time break) to
>>>>>>>>>>>>>>> use the
>>>>>>>>>>>>>>> mbean
>>>>>>>>>>>>>>> should fail, like "removeNotificationListener",
>>>>>>>>>>>>>>> "isRegistered"
>>>>>>>>>>>>>>> etc.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I do see a bug here, if we remove a listener from a
>>>>>>>>>>>>>>> non-existing
>>>>>>>>>>>>>>> MBeam,
>>>>>>>>>>>>>>> we get "ListenerNotFoundException" at a client side, but get
>>>>>>>>>>>>>>> "InstanceNotFoundException" at server side, I think we should
>>>>>>>>>>>>>>> create a
>>>>>>>>>>>>>>> bug, because both implemented the same interface
>>>>>>>>>>>>>>> MBeanServerConnection.
>>>>>>>>>>>>>>>                                                                                      
>>>>>>>>>>>>>>>                               
>>>>>>>>>>>>>> Yes, it is rather inconsistent.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -JB-
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>                                            
>>>>>>>>>>>>>>                        
>>>>>>>>>>>>>>                             
>>>>>>>>>>>>>>> Shanliang
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Jaroslav Bachorik wrote:
>>>>>>>>>>>>>>>                                                       
>>>>>>>>>>>>>>>                          
>>>>>>>>>>>>>>>                               
>>>>>>>>>>>>>>>> Looking for review and a sponsor.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Webrev at
>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~jbachorik/7170447/webrev.00
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> In this issue the timing is the problem.
>>>>>>>>>>>>>>>> MBeanServer.unregisterMBean()
>>>>>>>>>>>>>>>> fires the "unregister" notification which is sent to the
>>>>>>>>>>>>>>>> server
>>>>>>>>>>>>>>>> asynchronously. Thus it may happen that the "unregister"
>>>>>>>>>>>>>>>> notification
>>>>>>>>>>>>>>>> has not been yet processed at the time of invoking
>>>>>>>>>>>>>>>> removeNotificationListener() and the notification listeners
>>>>>>>>>>>>>>>> hasn't
>>>>>>>>>>>>>>>> been
>>>>>>>>>>>>>>>> cleaned up leading to the test failure.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> There is no synchronization between the client and the
>>>>>>>>>>>>>>>> server and
>>>>>>>>>>>>>>>> such
>>>>>>>>>>>>>>>> race condition can occur occasionally. Normally, the
>>>>>>>>>>>>>>>> execution is
>>>>>>>>>>>>>>>> fast
>>>>>>>>>>>>>>>> enough to behave like the "unregister" notification is
>>>>>>>>>>>>>>>> processed
>>>>>>>>>>>>>>>> synchronously with the unregisterMBean() operation but it
>>>>>>>>>>>>>>>> seems
>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>> using fastdebug Server VM bits with the -Xcomp option
>>>>>>>>>>>>>>>> strains
>>>>>>>>>>>>>>>> the CPU
>>>>>>>>>>>>>>>> enough to make this problem appear.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> There is no proper fix for this - the only thing that
>>>>>>>>>>>>>>>> work is
>>>>>>>>>>>>>>>> waiting a
>>>>>>>>>>>>>>>> bit longer in the main thread to give the notification
>>>>>>>>>>>>>>>> processing
>>>>>>>>>>>>>>>> thread
>>>>>>>>>>>>>>>> some time to clean up the listeners.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> -JB-
>>>>>>>>>>>>>>>>                                                 
>>>>>>>>>>>>>>>>                                                   
>>>>>>>>>>>>>>>>                                 
>>>>>>>>>>>>>>                                                  
>>>>>>>>>>>>>>                         
>>>>>>>>>>>>>>                             
>>>>>>>>>>>>>                                                               
>>>>>>>>>>>>>                           
>>>>>>>>>>>>                                                     
>>>>>>>>>>>>                         
>>>>>>>>>>>                                           
>>>>>>>>>>>                       
>>>>>>>>>>                                   
>>>>>>>>>>                     
>>>>>>>>>                           
>>>>>>>>>                   
>>>>>>>>                     
>>>>>>>>                 
>>>>>>>               
>>>>>>>               
>>>>>>           
>>>>>>             
>>>>>       
>>>>>           
>>>   
>>>       
>>     
>
>   

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/serviceability-dev/attachments/20130110/9248c139/attachment-0001.html 


More information about the serviceability-dev mailing list