jmx-dev [PATCH] JDK-7170447: Intermittent DeadListenerTest.java failure
shanliang
shanliang.jiang at oracle.com
Thu Jan 10 05:18:32 PST 2013
The weakListener is unnecessary, the test does already the same
verification:
171 Set<?> setForUnreg = listenerMap.get(name);
172 assertTrue("No trace of unregistered MBean: " +
setForUnreg, setForUnreg == null);
All other are OK for me.
Shanliang
Jaroslav Bachorik wrote:
> On 01/10/2013 01:09 PM, Jaroslav Bachorik wrote:
>
>> On 01/10/2013 12:53 PM, shanliang wrote:
>>
>>> Instead to wait GC, you can also to wait the
>>> MBeanServerNotification.UNREGISTRATION_NOTIFICATION, when you receive
>>> it, then your listener must be removed too. Of course this solution is
>>>
>> The problem is that the *NotificationForwarder implementations swallow
>> this kind of notification and just perform the cleanup. No other
>> listener will ever receive this notification.
>>
>> The "unregisterMBean" operation's semantics is not clearly defined.
>> Intuitively, when unregistering an MBean all the associated listeners
>> should be gone before the method returns. But this is not the case -
>> currently the listeners are sanitized some time after the
>> "unregisterMBean" operation started, eventually. There is no easy way to
>> notify the API user that the listeners were removed. I am afraid that in
>> order to resolve these problems new APIs would need to be introduced and
>> the whole mechanism of delivering notification should be revisited (as
>> it was planned for JMX 2.0, anyway).
>>
>> As for fixing the test - checking the weak references works fine as well
>> as increasing the timeout. They both can fail when the system is
>> extremely busy but the GC based solution will be in general faster than
>> the one with increased timeout.
>>
>
> Updated webrev: http://cr.openjdk.java.net/~jbachorik/7170447/webrev.01
>
>
>> -JB-
>>
>>
>>> implementation dependent, but the test is already implementation dependent.
>>>
>>> Shanliang
>>>
>>>
>>> Jaroslav Bachorik wrote:
>>>
>>>> On 01/10/2013 10:05 AM, shanliang wrote:
>>>>
>>>>
>>>>> Jaroslav Bachorik wrote:
>>>>>
>>>>>
>>>>>> On 01/09/2013 03:25 PM, shanliang wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>> Jaroslav Bachorik wrote:
>>>>>>>
>>>>>>>
>>>>>>>> On 01/09/2013 02:44 PM, shanliang wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> Let's forget the JMX implementation at first. If an MBean is
>>>>>>>>> unregistered, a user at client side calls
>>>>>>>>> "removeNotificationListener"
>>>>>>>>> on the MBean, what should happen? if the user calls
>>>>>>>>> "isRegistered" on
>>>>>>>>> the MBean, what should happen?
>>>>>>>>>
>>>>>>>>> I have done 2 tests, I used only one thread:
>>>>>>>>>
>>>>>>>>> 1)
>>>>>>>>> ......
>>>>>>>>> localServer.unregisterMBean(myMBean);
>>>>>>>>> boolean isRegistered = remoteClientServer.isRegistered(myMBean));
>>>>>>>>>
>>>>>>>>> I got isRegistered = false;
>>>>>>>>>
>>>>>>>>> 2)
>>>>>>>>> ......
>>>>>>>>> localServer.unregisterMBean(myMBean);
>>>>>>>>> System.out.println("isRegistered =
>>>>>>>>> "+remoteClientServer.sRegistered(myMBean));
>>>>>>>>> remoteClientServer.removeNotificationListener(myMBean, listener);
>>>>>>>>>
>>>>>>>>> I did not get an exception.
>>>>>>>>>
>>>>>>>>> The 1) told that the client could know the MBean was unregistered,
>>>>>>>>> then
>>>>>>>>> the client should throw an exception for the call of
>>>>>>>>> "removeNotificationListener" in 2).
>>>>>>>>>
>>>>>>>>>
>>>>>>>> Yes, but then it would not test the listener leakage as it was
>>>>>>>> supposed
>>>>>>>> to test but rather the fact that the client throws the appropriate
>>>>>>>> exception. The fact that the mbean was unregistered does not
>>>>>>>> necessarily
>>>>>>>> mean that the listeners were released. The main problem remains - the
>>>>>>>> listeners are being cleaned-up asynchronously and the clean-up
>>>>>>>> process
>>>>>>>> might race against the other uses of the JMX API.
>>>>>>>>
>>>>>>>>
>>>>>>> client.removeNotificationListener is not a right way here to test
>>>>>>> listener leak, we could use some other ways, for example we keep the
>>>>>>> listener in a weak reference, then after the mbean is removed, the
>>>>>>> weak
>>>>>>> reference should be empty after some time. Another way is like
>>>>>>> DeadListenerTest does to check whether clean has done at server side:
>>>>>>> use reflection to get the "listenerMap" at server side and make
>>>>>>> sure it
>>>>>>> is empty, but this need to add a private method to the class
>>>>>>> ClientNotifForwarder.
>>>>>>>
>>>>>>>
>>>>>> There will still be problems with timing. You need either to wait for
>>>>>> the GC to kick in to clean up the weak ref. And the listenerMap will
>>>>>> not
>>>>>> be purged of the unregistered MBean listeners until the notification is
>>>>>> generated, processed on the ClientNotificationForwarder and
>>>>>> forwarded to
>>>>>> the server. So there goes the timing issue again.
>>>>>>
>>>>>> The problem is that the "unregisterMBean" operation does not guarantee
>>>>>> that the listeners have been unregistered at the time it returns. So,
>>>>>> one way or the other we will need to wait an arbitrary amount of time
>>>>>> before checking for the memory leak.
>>>>>>
>>>>>>
>>>>> Yes we need to wait, but you can use a cycle like:
>>>>> long maxWaitingTime = 3000;
>>>>> long startTime = System.currentTimeMillis();
>>>>> while ( weakReference.get != null
>>>>> && System.currentTimeMillis() < startTime +
>>>>> maxWaitingTime) {
>>>>> System.gc();
>>>>> Thread.sleep(100);
>>>>> System.gc();
>>>>> }
>>>>>
>>>>> if (weakReference.get != null) {
>>>>> // failed
>>>>> }
>>>>>
>>>>>
>>>> Still you need an arbitrary timeout which might be reached under extreme
>>>> conditions making this test to fail intermittently. But I'd say that's
>>>> the nature of tests for memory leak fixes, due to the unpredictable
>>>> nature of the GC runs. Unless you take a heap dump and do a reachability
>>>> analysis you can not be sure whether a reference is dangling somwehwere
>>>> or it just hasn't been collected yet :/
>>>>
>>>> -JB-
>>>>
>>>>
>>>>
>>>>> Shanliang
>>>>>
>>>>>
>>>>>> -JB-
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>> I think we have 3 things to do here:
>>>>>>> 1) modify the test to not use removeNotificationListener for testing
>>>>>>> listener leak
>>>>>>> 2) create a new bug about a client does not throw an exception
>>>>>>> after an
>>>>>>> mbean is unregistered
>>>>>>> 3) create a bug about a client does not throw a same exception as at
>>>>>>> server side.
>>>>>>>
>>>>>>> I will do 2) and 3), if you like you can continue 1), it might need to
>>>>>>> do fix also in the JMX implementation.
>>>>>>>
>>>>>>> Shanliang
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> The test "DeadListenerTest" got passed in some machines because
>>>>>>>>> of the
>>>>>>>>> timeout for waiting a notification. I think its failure just tells
>>>>>>>>> a new
>>>>>>>>> bug.
>>>>>>>>>
>>>>>>>>> To set a longer timeout just hides the real bug, and the test might
>>>>>>>>> fail
>>>>>>>>> again one day if running condition is changed and you might need
>>>>>>>>> longer
>>>>>>>>> timeout again.
>>>>>>>>>
>>>>>>>>>
>>>>>>>> Yes, I agree with you that extending the timeout just lessens the
>>>>>>>> likelihood of the race condition and does not prevent it.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> Shanliang
>>>>>>>>>
>>>>>>>>> Jaroslav Bachorik wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> On 01/09/2013 11:08 AM, shanliang wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Jaroslav Bachorik wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> On 01/09/2013 09:40 AM, shanliang wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> I still have no idea why the test failed, but I do not see why a
>>>>>>>>>>>>> longer
>>>>>>>>>>>>> timeout can fix the test. Have you reproduced the problem and
>>>>>>>>>>>>> tested
>>>>>>>>>>>>> your fix? if yes then possible the long timeout hided a real
>>>>>>>>>>>>> problem.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>> Yes, I can reproduce the problem (using the fastbuild bits and
>>>>>>>>>>>> -Xcomp
>>>>>>>>>>>> switch) and verify that the fix makes the test pass.
>>>>>>>>>>>>
>>>>>>>>>>>> The ClientNotifForwarder scans the notifications for
>>>>>>>>>>>> MBeanServerNotification.UNREGISTRATION_NOTIFICATION and
>>>>>>>>>>>> removes the
>>>>>>>>>>>> appropriate notification listeners in a separate thread. Thus,
>>>>>>>>>>>> calling
>>>>>>>>>>>> "removeNotificationListener" on the main thread is prone to
>>>>>>>>>>>> racing.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>> It is true that ClientNotifForwarder scans the notifications for
>>>>>>>>>>> MBeanServerNotification.UNREGISTRATION_NOTIFICATION and removes
>>>>>>>>>>> the
>>>>>>>>>>> appropriate notification listeners in a separate thread. This is
>>>>>>>>>>> for a
>>>>>>>>>>> client connection to do clean if a user never calls
>>>>>>>>>>> removeNotificationListener.
>>>>>>>>>>>
>>>>>>>>>>> But calling directly removeNotificationListener from a client
>>>>>>>>>>> should
>>>>>>>>>>> still get exception if the clean has not been done. As I said, if
>>>>>>>>>>> the
>>>>>>>>>>> client checked and found the listener was still there, then the
>>>>>>>>>>> client
>>>>>>>>>>> sent a request to its server to remove the listener at server
>>>>>>>>>>> side,
>>>>>>>>>>> the
>>>>>>>>>>> server should find that the MBean in question was not registered,
>>>>>>>>>>> so the
>>>>>>>>>>> server should throw an exception. The bug might be here.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>> This won't work. The server side listeners are removed upon
>>>>>>>>>> receiving
>>>>>>>>>> the "unregistered" notification which is delivered from the
>>>>>>>>>> ClientNotificationForwarder and it may have not run yet (since it
>>>>>>>>>> runs
>>>>>>>>>> in a separate executor thread). The result is that the attempt to
>>>>>>>>>> remove
>>>>>>>>>> the notification listener on the server will succeed as well
>>>>>>>>>> failing
>>>>>>>>>> the
>>>>>>>>>> test subsequently.
>>>>>>>>>>
>>>>>>>>>> -JB-
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Shanliang
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> The timeout you made longer was used to wait a notification
>>>>>>>>>>>>> which
>>>>>>>>>>>>> should
>>>>>>>>>>>>> never arrive.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>> Well, it can be used to allow more time to process the
>>>>>>>>>>>> "unregister"
>>>>>>>>>>>> notification, too.
>>>>>>>>>>>>
>>>>>>>>>>>> When I think more of this I am more inclined to fix the race
>>>>>>>>>>>> condition.
>>>>>>>>>>>> An updated webrev will follow.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> To remove a listener from a client side, we did:
>>>>>>>>>>>>> 1) at client side, check whether it was added in the client side
>>>>>>>>>>>>> 2) at server side, check whether the MBean in question was
>>>>>>>>>>>>> registered in
>>>>>>>>>>>>> the MBeanServer (!!!)
>>>>>>>>>>>>> 3) at server side, check whether the listener was added.
>>>>>>>>>>>>>
>>>>>>>>>>>>> So 2) tells that we did not rely on a "unregister" notification.
>>>>>>>>>>>>> Anyway,
>>>>>>>>>>>>> if you use a SAME thread to call "unregister" operation to
>>>>>>>>>>>>> unregister an
>>>>>>>>>>>>> mbean, then any following call (without any time break) to
>>>>>>>>>>>>> use the
>>>>>>>>>>>>> mbean
>>>>>>>>>>>>> should fail, like "removeNotificationListener", "isRegistered"
>>>>>>>>>>>>> etc.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I do see a bug here, if we remove a listener from a non-existing
>>>>>>>>>>>>> MBeam,
>>>>>>>>>>>>> we get "ListenerNotFoundException" at a client side, but get
>>>>>>>>>>>>> "InstanceNotFoundException" at server side, I think we should
>>>>>>>>>>>>> create a
>>>>>>>>>>>>> bug, because both implemented the same interface
>>>>>>>>>>>>> MBeanServerConnection.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>> Yes, it is rather inconsistent.
>>>>>>>>>>>>
>>>>>>>>>>>> -JB-
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> Shanliang
>>>>>>>>>>>>>
>>>>>>>>>>>>> Jaroslav Bachorik wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Looking for review and a sponsor.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Webrev at
>>>>>>>>>>>>>> http://cr.openjdk.java.net/~jbachorik/7170447/webrev.00
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> In this issue the timing is the problem.
>>>>>>>>>>>>>> MBeanServer.unregisterMBean()
>>>>>>>>>>>>>> fires the "unregister" notification which is sent to the server
>>>>>>>>>>>>>> asynchronously. Thus it may happen that the "unregister"
>>>>>>>>>>>>>> notification
>>>>>>>>>>>>>> has not been yet processed at the time of invoking
>>>>>>>>>>>>>> removeNotificationListener() and the notification listeners
>>>>>>>>>>>>>> hasn't
>>>>>>>>>>>>>> been
>>>>>>>>>>>>>> cleaned up leading to the test failure.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> There is no synchronization between the client and the
>>>>>>>>>>>>>> server and
>>>>>>>>>>>>>> such
>>>>>>>>>>>>>> race condition can occur occasionally. Normally, the
>>>>>>>>>>>>>> execution is
>>>>>>>>>>>>>> fast
>>>>>>>>>>>>>> enough to behave like the "unregister" notification is
>>>>>>>>>>>>>> processed
>>>>>>>>>>>>>> synchronously with the unregisterMBean() operation but it seems
>>>>>>>>>>>>>> that
>>>>>>>>>>>>>> using fastdebug Server VM bits with the -Xcomp option strains
>>>>>>>>>>>>>> the CPU
>>>>>>>>>>>>>> enough to make this problem appear.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> There is no proper fix for this - the only thing that work is
>>>>>>>>>>>>>> waiting a
>>>>>>>>>>>>>> bit longer in the main thread to give the notification
>>>>>>>>>>>>>> processing
>>>>>>>>>>>>>> thread
>>>>>>>>>>>>>> some time to clean up the listeners.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -JB-
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/serviceability-dev/attachments/20130110/c9e6fdf9/attachment-0001.html
More information about the serviceability-dev
mailing list